pyspark.pandas.groupby.GroupBy.mean

GroupBy.mean(numeric_only: Optional[bool] = True) → FrameLike

Compute mean of groups, excluding missing values.

Parameters

numeric_only : bool, default True
    Include only float, int, and boolean columns. If None, will attempt to use everything, then use only numeric data.

New in version 3.4.0.

Returns

pyspark.pandas.Series or pyspark.pandas.DataFrame

See also

pyspark.pandas.Series.groupby
pyspark.pandas.DataFrame.groupby

Examples

>>> df = ps.DataFrame({'A': [1, 1, 2, 1, 2],
...                    'B': [np.nan, 2, 3, 4, 5],
...                    'C': [1, 2, 1, 1, 2],
...                    'D': [True, False, True, False, True]})

Group by one column and return the mean of the remaining columns in each group.

>>> df.groupby('A').mean().sort_index()  
     B         C         D
A
1  3.0  1.333333  0.333333
2  4.0  1.500000  1.000000
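
The value 3.0 for column B in group 1 follows from missing values being excluded: the NaN is dropped, leaving (2 + 4) / 2. The sketch below checks this using plain pandas rather than pandas-on-Spark, so that it runs without a Spark session; it assumes the two libraries agree on this point, which the output above suggests but this snippet does not itself verify. The names `pdf`, `group_mean`, and `manual` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same 'A' and 'B' columns as the example above, built with plain pandas.
# Assumption: pandas-on-Spark handles missing values the same way here.
pdf = pd.DataFrame({'A': [1, 1, 2, 1, 2],
                    'B': [np.nan, 2, 3, 4, 5]})

# Group-wise mean; NaN entries are skipped rather than propagated.
group_mean = pdf.groupby('A')['B'].mean()

# Manual check: sum of the non-missing values divided by their count.
grouped_b = pdf.groupby('A')['B']
manual = grouped_b.sum() / grouped_b.count()

print(group_mean)
print(manual)
```

Both printed Series should agree, since `count()` counts only non-missing entries and `sum()` skips NaN by default.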
