pyspark.pandas.groupby.GroupBy.mean
GroupBy.mean(numeric_only: Optional[bool] = True) → FrameLike
Compute mean of groups, excluding missing values.
- Parameters
- numeric_only : bool, default True
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data.
New in version 3.4.0.
- Returns
- pyspark.pandas.Series or pyspark.pandas.DataFrame
Examples
>>> import numpy as np
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'A': [1, 1, 2, 1, 2],
...                    'B': [np.nan, 2, 3, 4, 5],
...                    'C': [1, 2, 1, 1, 2],
...                    'D': [True, False, True, False, True]})
Groupby one column and return the mean of the remaining columns in each group.
>>> df.groupby('A').mean().sort_index()
     B         C         D
A
1  3.0  1.333333  0.333333
2  4.0  1.500000  1.000000
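To make "excluding missing values" concrete, the following is a minimal pure-Python sketch that reproduces the B and D columns of the result above. It is only an illustration of the arithmetic, not how pandas-on-Spark computes the mean; `group_means` is a hypothetical helper name introduced here, and the sketch assumes missing values appear as `None` or NaN.

```python
import math
from collections import defaultdict


def group_means(keys, values):
    """Mean of `values` per key, skipping None/NaN entries (hypothetical helper)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for key, value in zip(keys, values):
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue  # a missing value adds to neither the sum nor the count
        totals[key] += value  # booleans count as 1 (True) or 0 (False)
        counts[key] += 1
    # Simplification: a group containing only missing values is omitted here.
    return {key: totals[key] / counts[key] for key in sorted(counts)}


# Same data as columns A, B and D in the example above.
A = [1, 1, 2, 1, 2]
B = [float("nan"), 2, 3, 4, 5]
D = [True, False, True, False, True]

means_b = group_means(A, B)
means_d = group_means(A, D)
print(means_b)
print(means_d)
```

For group 1, column B holds NaN, 2 and 4; the NaN is dropped, so the mean is (2 + 4) / 2 = 3.0 rather than a division by 3. Column D shows why boolean columns yield fractional means: group 1 has one True out of three rows.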