pyspark.pandas.Index.drop_duplicates

Index.drop_duplicates(keep='first')

Return Index with duplicate values removed.

Parameters
keep : {‘first’, ‘last’, False}, default ‘first’

Method to handle dropping duplicates:

- ‘first’ : Drop duplicates except for the first occurrence.
- ‘last’ : Drop duplicates except for the last occurrence.
- False : Drop all duplicates.

Returns
deduplicated : Index

See also

Series.drop_duplicates

Equivalent method on Series.

DataFrame.drop_duplicates

Equivalent method on DataFrame.

Examples

Generate an Index with duplicate values.

>>> idx = ps.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx.drop_duplicates().sort_values()
Index(['beetle', 'cow', 'hippo', 'lama'], dtype='object')
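
The three `keep` options described under Parameters can be illustrated with a small plain-Python sketch that runs without a Spark session. The helper `drop_duplicates_sketch` is hypothetical and is not part of pyspark; it only mirrors the documented semantics on an ordinary list and says nothing about how pandas-on-Spark implements the method or orders its result.

```python
from collections import Counter


def drop_duplicates_sketch(values, keep="first"):
    """Hypothetical helper: mimic the documented `keep` semantics on a list."""
    if keep == "first":
        seen = set()
        result = []
        for value in values:
            # Keep a value only the first time it is encountered.
            if value not in seen:
                seen.add(value)
                result.append(value)
        return result
    if keep == "last":
        # Keeping the last occurrence equals keeping the first occurrence
        # of the reversed sequence, then restoring the original direction.
        return drop_duplicates_sketch(values[::-1], keep="first")[::-1]
    if keep is False:
        # Drop every value that appears more than once.
        counts = Counter(values)
        return [value for value in values if counts[value] == 1]
    raise ValueError("keep must be 'first', 'last', or False")


values = ['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo']

print(drop_duplicates_sketch(values, keep="first"))
print(drop_duplicates_sketch(values, keep="last"))
print(drop_duplicates_sketch(values, keep=False))
```

With `keep='first'` and `keep='last'` the same four distinct labels survive and only the retained position of 'lama' differs, whereas `keep=False` removes 'lama' entirely because it occurs more than once.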