DataFrame.
groupBy
Groups the DataFrame using the specified columns, so we can run aggregation on them. See GroupedData for all the available aggregate functions.
DataFrame
GroupedData
groupby() is an alias for groupBy().
groupby()
groupBy()
New in version 1.3.0.
Column
columns to group by. Each element should be a column name (string) or an expression (Column).
Examples
>>> df.groupBy().avg().collect() [Row(avg(age)=3.5)] >>> sorted(df.groupBy('name').agg({'age': 'mean'}).collect()) [Row(name='Alice', avg(age)=2.0), Row(name='Bob', avg(age)=5.0)] >>> sorted(df.groupBy(df.name).avg().collect()) [Row(name='Alice', avg(age)=2.0), Row(name='Bob', avg(age)=5.0)] >>> sorted(df.groupBy(['name', df.age]).count().collect()) [Row(name='Alice', age=2, count=1), Row(name='Bob', age=5, count=1)]