pyspark.pandas.DataFrame.idxmax

DataFrame.idxmax(axis: Union[int, str] = 0) → Series

Return index of first occurrence of maximum over requested axis. NA/null values are excluded.

Note

This API collects all rows that hold a maximum value using to_pandas(), on the assumption that the number of such rows is usually small.
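
The strategy described in this note can be sketched in plain pandas, which stands in for the distributed frame here so the snippet runs without a Spark session. The helper name `idxmax_via_filter` is hypothetical, and this illustrates the idea only; it is not the actual pandas-on-Spark implementation. It computes each column's maximum, keeps only the rows that hold at least one of those maxima, and then runs `idxmax` on that small subset.

```python
import pandas as pd


def idxmax_via_filter(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: run idxmax on only the rows that hold a column maximum."""
    col_max = df.max()                      # one maximum per column (NaN is skipped)
    holds_max = df.eq(col_max).any(axis=1)  # rows where some column equals its maximum
    candidates = df[holds_max]              # the small subset that would be collected
    return candidates.idxmax()


pdf = pd.DataFrame({'a': [1, 2, 3, 2],
                    'b': [4.0, 2.0, 3.0, 1.0],
                    'c': [300, 200, 400, 200]})
result = idxmax_via_filter(pdf)
```

For this frame only the rows labelled 0 and 2 survive the filter, and `result` agrees with calling `pdf.idxmax()` on the full frame.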

Parameters

axis : 0 or ‘index’
    Can only be set to 0 for now.

Returns

Series

See also

Series.idxmax

Examples

>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> psdf = ps.DataFrame({'a': [1, 2, 3, 2],
...                     'b': [4.0, 2.0, 3.0, 1.0],
...                     'c': [300, 200, 400, 200]})
>>> psdf
   a    b    c
0  1  4.0  300
1  2  2.0  200
2  3  3.0  400
3  2  1.0  200

>>> psdf.idxmax()
a    2
b    0
c    2
dtype: int64

For a DataFrame with MultiIndex columns

>>> psdf = ps.DataFrame({'a': [1, 2, 3, 2],
...                     'b': [4.0, 2.0, 3.0, 1.0],
...                     'c': [300, 200, 400, 200]})
>>> psdf.columns = pd.MultiIndex.from_tuples([('a', 'x'), ('b', 'y'), ('c', 'z')])
>>> psdf
   a    b    c
   x    y    z
0  1  4.0  300
1  2  2.0  200
2  3  3.0  400
3  2  1.0  200

>>> psdf.idxmax()
a  x    2
b  y    0
c  z    2
dtype: int64
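
The two rules in the summary, first occurrence on ties and exclusion of NA/null values, can be checked on a small frame. This sketch uses plain pandas, on the assumption that pandas-on-Spark returns the same labels for the same data; the frame `pdf` is made up for illustration.

```python
import numpy as np
import pandas as pd

pdf = pd.DataFrame({'a': [1, 3, 3, 2],              # maximum 3 appears at labels 1 and 2
                    'b': [np.nan, 2.0, 5.0, 1.0]})  # the NaN at label 0 is ignored
labels = pdf.idxmax()
```

Here `labels['a']` is 1, the first of the two tied rows, and `labels['b']` is 2, because the missing value never competes for the maximum.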
