pyspark.pandas.Index.drop_duplicates
- Index.drop_duplicates(keep='first')
Return Index with duplicate values removed.
- Parameters
- keep : {‘first’, ‘last’, False}, default ‘first’
  Method to handle dropping duplicates:
  - ‘first’ : Drop duplicates except for the first occurrence.
  - ‘last’ : Drop duplicates except for the last occurrence.
  - False : Drop all duplicates.
- Returns
- deduplicated : Index
See also
Series.drop_duplicates
Equivalent method on Series.
DataFrame.drop_duplicates
Equivalent method on DataFrame.
Examples
Generate an Index with duplicate values.
>>> idx = ps.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx.drop_duplicates().sort_values()
Index(['beetle', 'cow', 'hippo', 'lama'], dtype='object')
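The three keep modes described under Parameters can be illustrated with a plain-Python sketch on the same values. This is only a model of the documented semantics, not how pandas-on-Spark implements the method: the helper name drop_duplicates_sketch is hypothetical, it works on an ordinary list, and it preserves input order, whereas the example above sorts the result and so should not be read as guaranteeing any particular order.

```python
from collections import Counter


def drop_duplicates_sketch(values, keep="first"):
    """Hypothetical helper modelling the documented `keep` semantics on a list."""
    if keep is False:
        # Drop every value that occurs more than once.
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]
    if keep == "first":
        # Keep only the first occurrence of each value.
        seen = set()
        result = []
        for v in values:
            if v not in seen:
                seen.add(v)
                result.append(v)
        return result
    if keep == "last":
        # Keeping the last occurrence equals keeping the first one
        # in the reversed sequence, then restoring the original direction.
        return drop_duplicates_sketch(values[::-1], keep="first")[::-1]
    raise ValueError("keep must be 'first', 'last' or False")


values = ['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo']

kept_first = drop_duplicates_sketch(values, keep="first")
kept_last = drop_duplicates_sketch(values, keep="last")
kept_none = drop_duplicates_sketch(values, keep=False)

print(kept_first)  # ['lama', 'cow', 'beetle', 'hippo']
print(kept_last)   # ['cow', 'beetle', 'lama', 'hippo']
print(kept_none)   # ['cow', 'beetle', 'hippo']
```

With keep='first' and keep='last' the set of surviving values is the same and only the retained position differs. With keep=False, 'lama' disappears entirely because it occurs more than once.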