pyspark.pandas.groupby.GroupBy.quantile

GroupBy.quantile(q: float = 0.5, accuracy: int = 10000) → FrameLike

Return group values at the given quantile.

New in version 3.4.0.

Parameters

q : float, default 0.5 (50% quantile): Value between 0 and 1 giving the quantile to compute.
accuracy : int, optional, default 10000: Accuracy of the approximation. A larger value means better accuracy. The relative error is 1.0 / accuracy. This parameter is specific to pandas-on-Spark.
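
As a rough illustration of the accuracy parameter, the sketch below computes the relative error 1.0 / accuracy for a few settings. The helper names relative_error and rank_slack are hypothetical and not part of pyspark. Reading the relative error as a fraction of the group's row count is an assumption made here for illustration; the source only states the 1.0 / accuracy relationship.

```python
def relative_error(accuracy: int = 10000) -> float:
    """Relative error implied by an accuracy setting: 1.0 / accuracy."""
    if accuracy <= 0:
        raise ValueError("accuracy must be a positive integer")
    return 1.0 / accuracy


def rank_slack(n_rows: int, accuracy: int = 10000) -> float:
    """Hypothetical helper: the relative error scaled by a group's row count.

    Assumption: the error is interpreted relative to the number of rows.
    """
    return n_rows * relative_error(accuracy)


for acc in (100, 10000, 1000000):
    # A larger accuracy gives a smaller relative error.
    print(acc, relative_error(acc), rank_slack(1_000_000, acc))
```

With the default accuracy of 10000 the relative error is 0.0001, so raising accuracy tightens the approximation.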

Returns

pyspark.pandas.Series or pyspark.pandas.DataFrame: Return type determined by caller of GroupBy object.

See also

pyspark.pandas.Series.quantile
pyspark.pandas.DataFrame.quantile
pyspark.sql.functions.percentile_approx

Notes

Unlike pandas, quantile in pandas-on-Spark uses a distributed percentile approximation algorithm, so the result may differ from the pandas result. The interpolation parameter is not supported yet.
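
To see why the two libraries can disagree, the following pure-Python sketch compares a quantile with linear interpolation (the pandas default) against one that always returns an element of the data. The function element_quantile is a simplified stand-in chosen for illustration; it is an assumption, not the algorithm Spark actually runs, and both function names are hypothetical.

```python
import math


def linear_quantile(values, q):
    """Quantile with linear interpolation between the two nearest elements."""
    data = sorted(values)
    pos = (len(data) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def element_quantile(values, q):
    """Smallest element whose cumulative share of rows reaches q.

    Assumption: a simplified, exact stand-in for a non-interpolating
    percentile; it never returns a value absent from the data.
    """
    data = sorted(values)
    rank = max(math.ceil(q * len(data)), 1)
    return data[rank - 1]


odd = [1, 2, 3]      # both approaches pick the middle element
even = [1, 2, 3, 4]  # interpolation falls between 2 and 3

print(linear_quantile(odd, 0.5), element_quantile(odd, 0.5))
print(linear_quantile(even, 0.5), element_quantile(even, 0.5))
```

For groups with an odd number of rows the two median values coincide, while for an even number of rows the interpolated value can lie between two elements.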

Examples

>>> df = ps.DataFrame([
...     ['a', 1], ['a', 2], ['a', 3],
...     ['b', 1], ['b', 3], ['b', 5]
... ], columns=['key', 'val'])

Group by one column and return the quantile of the remaining columns in each group.

>>> df.groupby('key').quantile()
     val
key
a    2.0
b    3.0
