GroupBy.sum
Compute sum of group values
New in version 3.3.0.
Parameters
numeric_only : bool, optional
Include only float, int, and boolean columns. If None, will attempt to use everything, then use only numeric data. This parameter has no effect, since only numeric columns are supported here.
New in version 3.4.0.
min_count : int, default 0
The required number of valid (non-NA) values to perform the operation. If fewer than min_count non-NA values are present, the result is NA.
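The min_count rule can be illustrated with a small pure-Python sketch applied to the values of a single group. The helper sum_with_min_count is hypothetical and not part of pyspark.pandas; it assumes missing values are represented as None or float NaN, whereas pandas-on-Spark evaluates the rule on Spark columns.

```python
import math
from typing import Iterable, Optional


def sum_with_min_count(values: Iterable[Optional[float]], min_count: int = 0) -> float:
    """Hypothetical helper: sum the non-missing values of one group.

    Returns NaN (standing in for NA) when fewer than `min_count`
    non-missing values are present.
    """
    # Assumption: None and float NaN both count as missing.
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    if len(present) < min_count:
        return math.nan
    # bool is a subclass of int, so True/False add up as 1/0.
    return sum(present)


# Column C of groups "a" and "b" from the examples, with min_count=3.
print(sum_with_min_count([3, 4, 4], min_count=3))  # 11
print(sum_with_min_count([3], min_count=3))        # nan
```

The real method reports 11.0 rather than 11 for group "a" in the min_count=3 example, because the result column must hold NaN for group "b" and is therefore a float column.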
See also
pyspark.pandas.Series.groupby
pyspark.pandas.DataFrame.groupby
Notes
There is a behavior difference between pandas-on-Spark and pandas:
when there is a non-numeric aggregation column, it is ignored even if numeric_only is False.
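A rough pure-Python sketch of this behavior, assuming the frame is a dict of equal-length lists and that a column counts as numeric when every value is an int, float, or bool. The helper groupby_sum is hypothetical; pandas-on-Spark decides which columns are numeric from the Spark column types, not from Python values.

```python
from collections import defaultdict
from typing import Any, Dict, List


def is_numeric_column(values: List[Any]) -> bool:
    # Assumption: int, float and bool are numeric; anything else (e.g. str) is not.
    return all(isinstance(v, (int, float, bool)) for v in values)


def groupby_sum(data: Dict[str, List[Any]], by: str) -> Dict[Any, Dict[str, Any]]:
    """Hypothetical helper: group rows by column `by` and sum the numeric columns.

    Non-numeric columns are dropped silently, with no numeric_only switch.
    """
    value_cols = [c for c in data if c != by and is_numeric_column(data[c])]
    # Each new group key starts with a zero for every numeric column.
    sums: Dict[Any, Dict[str, Any]] = defaultdict(lambda: {c: 0 for c in value_cols})
    for i, key in enumerate(data[by]):
        for c in value_cols:
            sums[key][c] += data[c][i]
    # Sort by group key, like .sort_index() in the examples.
    return {k: sums[k] for k in sorted(sums)}


df = {"A": [1, 2, 1, 2], "B": [True, False, False, True],
      "C": [3, 4, 3, 4], "D": ["a", "a", "b", "a"]}
print(groupby_sum(df, "A"))  # {1: {'B': 1, 'C': 6}, 2: {'B': 1, 'C': 8}}
print(groupby_sum(df, "D"))  # {'a': {'A': 5, 'B': 2, 'C': 11}, 'b': {'A': 1, 'B': 0, 'C': 3}}
```

On the data used in the examples, this reproduces the first two results: column D holds strings, so it is dropped when grouping by A, while the boolean column B is summed as 0/1.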
Examples
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({"A": [1, 2, 1, 2], "B": [True, False, False, True],
...                    "C": [3, 4, 3, 4], "D": ["a", "a", "b", "a"]})
>>> df.groupby("A").sum().sort_index()
   B  C
A
1  1  6
2  1  8
>>> df.groupby("D").sum().sort_index()
   A  B   C
D
a  5  2  11
b  1  0   3
>>> df.groupby("D").sum(min_count=3).sort_index()
     A    B     C
D
a  5.0  2.0  11.0
b  NaN  NaN   NaN