pyspark.pandas.groupby.GroupBy.tail
- GroupBy.tail(n=5)
Return last n rows of each group.
Similar to .apply(lambda x: x.tail(n)), but it returns a subset of rows from the original DataFrame with original index and order preserved (as_index flag is ignored).
Negative values of n are not supported before pandas on Spark 3.4 (with pandas 1.4+).
- Returns
- DataFrame or Series
Examples
>>> df = ps.DataFrame({'a': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
...                    'b': [2, 3, 1, 4, 6, 9, 8, 10, 7, 5],
...                    'c': [3, 5, 2, 5, 1, 2, 6, 4, 3, 6]},
...                   columns=['a', 'b', 'c'],
...                   index=[7, 2, 3, 1, 3, 4, 9, 10, 5, 6])
>>> df
    a   b  c
7   1   2  3
2   1   3  5
3   1   1  2
1   1   4  5
3   2   6  1
4   2   9  2
9   2   8  6
10  3  10  4
5   3   7  3
6   3   5  6
>>> df.groupby('a').tail(2).sort_index()
   a  b  c
1  1  4  5
3  1  1  2
4  2  9  2
5  3  7  3
6  3  5  6
9  2  8  6
>>> df.groupby('a')['b'].tail(2).sort_index()
1    4
3    1
4    9
5    7
6    5
9    8
Name: b, dtype: int64
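The row selection in the examples above can be sketched in plain Python, without Spark or pandas. This is only an illustration of the documented semantics (last n rows per group, original index and order kept), not the library's implementation. The helper name `group_tail` and the list-of-dicts layout are hypothetical, and only non-negative n is handled here.

```python
from collections import defaultdict


def group_tail(rows, key, n=5):
    """Keep the last n rows of each group, in original row order.

    Hypothetical helper for illustration; handles n >= 0 only.
    """
    # Record the positions of the rows belonging to each group.
    positions = defaultdict(list)
    for pos, row in enumerate(rows):
        positions[row[key]].append(pos)

    # Mark the last n positions of every group as kept.
    keep = set()
    for group_positions in positions.values():
        if n > 0:
            keep.update(group_positions[-n:])

    # Emit the kept rows in their original order, index labels untouched.
    return [row for pos, row in enumerate(rows) if pos in keep]


# Index labels and columns 'a' and 'b' from the example DataFrame.
index = [7, 2, 3, 1, 3, 4, 9, 10, 5, 6]
a = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
b = [2, 3, 1, 4, 6, 9, 8, 10, 7, 5]
rows = [{"index": i, "a": x, "b": y} for i, x, y in zip(index, a, b)]

result = group_tail(rows, "a", 2)
tail_index = [row["index"] for row in result]
print(tail_index)
```

Sorting `tail_index` gives the same index labels as the `sort_index()` output shown above.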
Supports GroupBy positional indexing since pandas on Spark 3.4 (with pandas 1.4+). A negative n excludes the first |n| rows of each group:
>>> df = ps.DataFrame([["g", "g0"],
...                   ["g", "g1"],
...                   ["g", "g2"],
...                   ["g", "g3"],
...                   ["h", "h0"],
...                   ["h", "h1"]], columns=["A", "B"])
>>> df.groupby("A").tail(-1)
   A   B
3  g  g3
2  g  g2
1  g  g1
5  h  h1
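As a rough mental model of the negative-n case, the per-group selection can be written as a single Python slice. The sketch below assumes the pandas 1.4+ semantics that the example above suggests: a positive n keeps the last n rows, a negative n drops the first |n| rows, and n = 0 keeps nothing. The helper `tail_of_group` is hypothetical and operates on plain lists, not on Spark data.

```python
def tail_of_group(group, n):
    """Rows kept from one group for tail(n); hypothetical illustration only."""
    if n == 0:
        # group[-0:] would return the whole group, so handle zero explicitly.
        return []
    # n > 0: group[-n:] is the last n rows.
    # n < 0: -n is positive, so group[-n:] skips the first |n| rows.
    return group[-n:]


g = ["g0", "g1", "g2", "g3"]
h = ["h0", "h1"]

print(tail_of_group(g, -1))
print(tail_of_group(h, -1))
print(tail_of_group(g, 2))
```

With n = -1, group g keeps g1, g2 and g3 and group h keeps h1, the same rows as the `tail(-1)` output above.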