cudf.DataFrame.dropna

DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Drop rows (or columns) containing nulls from a DataFrame.

Parameters:
axis: {0, 1}, optional

Whether to drop rows (axis=0, default) or columns (axis=1) containing nulls.

how: {'any', 'all'}, optional

Specifies how to decide whether to drop a row (or column). 'any' (default) drops rows (or columns) containing at least one null value. 'all' drops only rows (or columns) in which every value is null.

thresh: int, optional

If specified, drop every row (or column) containing fewer than thresh non-null values.

subset: list, optional

List of columns to consider when dropping rows (all columns are considered by default). Alternatively, when dropping columns, subset is a list of rows to consider.

inplace: bool, default False

If True, modify the DataFrame in place and return None.

Returns:
Copy of the DataFrame with rows/columns containing nulls dropped, or None if inplace=True.
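
The row-dropping rules described under Parameters can be modelled in plain Python. The sketch below is only an illustration of the documented semantics, not cuDF's implementation: keep_row is a hypothetical helper, None stands in for a null, and it assumes that thresh, when given, takes precedence over how and counts only the columns named in subset.

```python
def keep_row(row, how="any", thresh=None, subset=None):
    """Return True if `row` would survive a row-wise dropna.

    `row` is a dict mapping column name to value; None models a null.
    Hypothetical helper for illustration only.
    """
    # Only the columns in `subset` are inspected (all columns by default).
    cols = list(row) if subset is None else list(subset)
    values = [row[c] for c in cols]
    non_null = sum(v is not None for v in values)

    if thresh is not None:
        # Assumption: thresh overrides `how` when both are supplied.
        return non_null >= thresh
    if how == "any":
        # Dropped if at least one inspected value is null.
        return non_null == len(values)
    if how == "all":
        # Dropped only if every inspected value is null.
        return non_null > 0
    raise ValueError(f"invalid how: {how!r}")


rows = [
    {"name": "Alfred", "toy": "Batmobile", "born": "1940-04-25"},
    {"name": "Batman", "toy": None, "born": None},
    {"name": "Catwoman", "toy": "Bullwhip", "born": None},
]

kept_any = [r["name"] for r in rows if keep_row(r)]
kept_all = [r["name"] for r in rows if keep_row(r, how="all")]
kept_thresh = [r["name"] for r in rows if keep_row(r, thresh=2)]
kept_subset = [r["name"] for r in rows if keep_row(r, subset=["name", "born"])]

print(kept_any)     # ['Alfred']
print(kept_all)     # ['Alfred', 'Batman', 'Catwoman']
print(kept_thresh)  # ['Alfred', 'Catwoman']
print(kept_subset)  # ['Alfred']
```

The same three rows are used in the Examples section, so the names printed here can be compared against the rows that dropna keeps there.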

See also

cudf.DataFrame.isna

Indicate null values.

cudf.DataFrame.notna

Indicate non-null values.

cudf.DataFrame.fillna

Replace null values.

cudf.Series.dropna

Drop null values.

cudf.Index.dropna

Drop null indices.

Examples

>>> import cudf
>>> import numpy as np
>>> df = cudf.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
...                    "toy": ['Batmobile', None, 'Bullwhip'],
...                    "born": [np.datetime64("1940-04-25"),
...                             np.datetime64("NaT"),
...                             np.datetime64("NaT")]})
>>> df
       name        toy                 born
0    Alfred  Batmobile  1940-04-25 00:00:00
1    Batman       <NA>                 <NA>
2  Catwoman   Bullwhip                 <NA>

Drop the rows where at least one element is null.

>>> df.dropna()
     name        toy       born
0  Alfred  Batmobile 1940-04-25

Drop the columns where at least one element is null.

>>> df.dropna(axis='columns')
       name
0    Alfred
1    Batman
2  Catwoman

Drop the rows where all elements are null.

>>> df.dropna(how='all')
       name        toy                 born
0    Alfred  Batmobile  1940-04-25 00:00:00
1    Batman       <NA>                 <NA>
2  Catwoman   Bullwhip                 <NA>

Keep only the rows with at least 2 non-null values.

>>> df.dropna(thresh=2)
       name        toy                 born
0    Alfred  Batmobile  1940-04-25 00:00:00
2  Catwoman   Bullwhip                 <NA>

Define in which columns to look for null values.

>>> df.dropna(subset=['name', 'born'])
     name        toy       born
0  Alfred  Batmobile 1940-04-25

Drop the rows containing nulls in place, so that the same variable holds the result.

>>> df.dropna(inplace=True)
>>> df
     name        toy       born
0  Alfred  Batmobile 1940-04-25
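
The examples above do not cover subset combined with column-wise dropping, where subset names the rows to inspect. The following is a minimal sketch of that case. It assumes a default integer index, and it falls back to pandas (whose dropna accepts the same arguments) when cudf is not installed. The alias xdf and the variable names are illustrative only.

```python
try:
    import cudf as xdf  # the GPU DataFrame library documented here
except ImportError:
    import pandas as xdf  # assumption: CPU stand-in with the same dropna arguments

df = xdf.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": ["Batmobile", None, "Bullwhip"],
})

# Inspect only row 2: it has no nulls, so every column is kept.
by_row_2 = df.dropna(axis=1, subset=[2])

# Inspect only row 1: "toy" is null there, so that column is dropped.
by_row_1 = df.dropna(axis=1, subset=[1])

print(list(by_row_2.columns))
print(list(by_row_1.columns))
```

Restricting the check to specific rows is useful when only some records are required to be complete, for example a header-like first row.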