pyspark.pandas.DataFrame.idxmax
DataFrame.idxmax(axis: Union[int, str] = 0) → Series
Return index of first occurrence of maximum over requested axis. NA/null values are excluded.
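The "first occurrence" and NA-exclusion rules can be illustrated with a small sketch. It uses plain pandas as a stand-in so it runs without a Spark session, on the assumption that the pandas-on-Spark method follows the semantics described above; the frame `pdf` and its labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column 'a' has a tie for its maximum and a missing value.
pdf = pd.DataFrame(
    {
        "a": [1.0, 3.0, 3.0, np.nan],  # maximum 3.0 appears at 'r1' and 'r2'
        "b": [np.nan, 2.0, 5.0, 1.0],  # the NaN at 'r0' is ignored
    },
    index=["r0", "r1", "r2", "r3"],
)

# One index label per column: the first row holding that column's maximum.
result = pdf.idxmax()
print(result)
```

For column `a` the tie is resolved in favour of the earlier label `r1`, and the missing values in both columns do not affect the result.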
Note
This API collects all rows that hold a maximum value using to_pandas(), on the assumption that the number of such rows is usually small.
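The strategy in this note can be sketched roughly as follows. This is not the library's implementation: `idxmax_via_max_rows` is a hypothetical helper, and plain pandas stands in for the distributed frame so the example runs without Spark. The idea is to compute the per-column maxima first, keep only the rows that hold at least one of them, and resolve the index labels on that small subset.

```python
import pandas as pd


def idxmax_via_max_rows(df: pd.DataFrame) -> pd.Series:
    """Illustrative only: find idxmax by first shrinking to the rows with maxima."""
    # Step 1: one maximum per column (a plain aggregation).
    col_max = df.max()
    # Step 2: keep rows that hold the maximum of at least one column.
    candidates = df[df.eq(col_max).any(axis=1)]
    # Step 3: the candidate set is assumed to be small, so finish locally.
    return candidates.idxmax()


pdf = pd.DataFrame({"a": [1, 2, 3, 2],
                    "b": [4.0, 2.0, 3.0, 1.0],
                    "c": [300, 200, 400, 200]})
result = idxmax_via_max_rows(pdf)
print(result)
```

Because filtering preserves row order, the first occurrence among the candidate rows is also the first occurrence in the full frame. The approach becomes costly only when many rows share a maximum, which is the case the note assumes to be rare.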
- Parameters
- axis: 0 or ‘index’
Can only be set to 0 now.
- Returns
- Series
Examples
>>> psdf = ps.DataFrame({'a': [1, 2, 3, 2],
...                      'b': [4.0, 2.0, 3.0, 1.0],
...                      'c': [300, 200, 400, 200]})
>>> psdf
   a    b    c
0  1  4.0  300
1  2  2.0  200
2  3  3.0  400
3  2  1.0  200
>>> psdf.idxmax()
a    2
b    0
c    2
dtype: int64
For Multi-column Index
>>> psdf = ps.DataFrame({'a': [1, 2, 3, 2],
...                      'b': [4.0, 2.0, 3.0, 1.0],
...                      'c': [300, 200, 400, 200]})
>>> psdf.columns = pd.MultiIndex.from_tuples([('a', 'x'), ('b', 'y'), ('c', 'z')])
>>> psdf
   a    b    c
   x    y    z
0  1  4.0  300
1  2  2.0  200
2  3  3.0  400
3  2  1.0  200
>>> psdf.idxmax()
a  x    2
b  y    0
c  z    2
dtype: int64