cudf.Series.drop_duplicates

Series.drop_duplicates(keep='first', inplace=False, ignore_index=False)

Return Series with duplicate values removed.

Parameters:
keep : {‘first’, ‘last’, False}, default ‘first’

Method to handle dropping duplicates:

  • ‘first’ : Drop duplicates except for the first occurrence.

  • ‘last’ : Drop duplicates except for the last occurrence.

  • False : Drop every occurrence of any duplicated value.

inplace : bool, default False

If True, perform the operation in place and return None.

ignore_index : bool, default False

If True, the resulting index is relabeled 0, 1, …, n - 1.

Returns:
Series or None

Series with duplicates dropped, or None if inplace=True.

Examples

>>> s = cudf.Series(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'],
...               name='animal')
>>> s
0      lama
1       cow
2      lama
3    beetle
4      lama
5     hippo
Name: animal, dtype: object

The keep parameter controls which occurrence of each duplicated value is retained. The default, ‘first’, keeps the first occurrence in each set of duplicated entries. Note that the returned rows are not guaranteed to be in sorted order.

>>> s.drop_duplicates()
0      lama
1       cow
3    beetle
5     hippo
Name: animal, dtype: object

Setting keep to ‘last’ retains the last occurrence in each set of duplicated entries.

>>> s.drop_duplicates(keep='last')
1       cow
3    beetle
4      lama
5     hippo
Name: animal, dtype: object

Setting keep to False discards every entry whose value occurs more than once. Setting inplace to True modifies s in place and returns None.

>>> s.drop_duplicates(keep=False, inplace=True)
>>> s
1       cow
3    beetle
5     hippo
Name: animal, dtype: object
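
The three keep options can also be modeled without a GPU. The following is a minimal pure-Python sketch of the selection rule, not cuDF's implementation: the helper name drop_duplicates_sketch is hypothetical, it returns (index label, value) pairs instead of a Series, and it sorts the result by original position to match the outputs shown above, which the real method does not promise. It also assumes that ignore_index=True relabels the result 0, 1, …, n - 1, as the same-named pandas parameter does.

```python
from collections import Counter


def drop_duplicates_sketch(values, keep="first", ignore_index=False):
    """Hypothetical pure-Python model of the `keep` semantics.

    Returns a list of (index_label, value) pairs rather than a Series.
    """
    if keep not in ("first", "last", False):
        raise ValueError("keep must be 'first', 'last', or False")

    labelled = list(enumerate(values))

    if keep is False:
        # Keep only values that occur exactly once in the input.
        counts = Counter(values)
        kept = [(i, v) for i, v in labelled if counts[v] == 1]
    else:
        # For 'last', scan from the end so the first value we meet
        # is its final occurrence in the original order.
        ordered = labelled if keep == "first" else reversed(labelled)
        seen = set()
        kept = []
        for i, v in ordered:
            if v not in seen:
                seen.add(v)
                kept.append((i, v))
        # Assumption: present rows in original positional order.
        kept.sort()

    if ignore_index:
        # Assumption: relabel the surviving rows 0, 1, ..., n - 1.
        kept = [(new, v) for new, (_, v) in enumerate(kept)]
    return kept


animals = ["lama", "cow", "lama", "beetle", "lama", "hippo"]
first = drop_duplicates_sketch(animals)
last = drop_duplicates_sketch(animals, keep="last")
unique_only = drop_duplicates_sketch(animals, keep=False)
relabelled = drop_duplicates_sketch(animals, keep=False, ignore_index=True)
```

Here first, last, and unique_only carry the same index labels as the three Series outputs in the examples above, while relabelled holds the keep=False result with labels 0 through 2.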