cudf.DataFrame.query

DataFrame.query(expr, local_dict=None)

Filter rows with a boolean expression, using Numba to compile the expression into a GPU kernel.

See pandas.DataFrame.query.

Parameters:
expr : str

A boolean expression. Names in the expression refer to columns. The name index can be used to refer to the row index in place of the index's own name, but this is not supported for a MultiIndex.

Names starting with @ refer to Python variables.

An output value is null if any of its input values are null, regardless of the expression.

local_dict : dict, default None

Local variables to use in the query, keyed by the names that the expression references with @.

Returns:
filtered : DataFrame
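The null rule in the expr description is stricter than SQL-style three-valued logic: a row with a null input gets a null result even when the rest of the expression would be true on its own. The pure-Python sketch below models that rule on a list of row dictionaries; it is a conceptual model, not cudf's Numba kernel. The helper query_rows is hypothetical, None stands in for a null value, the columns argument is assumed to list the columns the expression reads, and rows whose result is null are assumed to be dropped just like rows whose result is False.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence

Row = Dict[str, Any]


def query_rows(
    rows: Sequence[Row],
    columns: Sequence[str],
    predicate: Callable[..., Any],
) -> List[Row]:
    """Hypothetical model: keep rows whose result is True under the null rule.

    `columns` lists the columns the expression reads, in the order they are
    passed to `predicate`. None stands in for a null value.
    """
    kept: List[Row] = []
    for row in rows:
        values = [row[name] for name in columns]
        result: Optional[bool]
        if any(v is None for v in values):
            # Any null input makes the output null; the expression is ignored.
            result = None
        else:
            result = bool(predicate(*values))
        # Assumption: a null result is filtered out just like False.
        if result is True:
            kept.append(row)
    return kept


# The expression from the first example below, as a Python function.
def expr(a, b):
    return (a == 2 and b == 4) or (b == 3)


rows = [
    {"a": 1, "b": 3},
    {"a": 2, "b": 4},
    {"a": 2, "b": 5},
    {"a": None, "b": 3},  # b == 3 holds, but a is null
]
filtered = query_rows(rows, ["a", "b"], expr)
print(filtered)  # [{'a': 1, 'b': 3}, {'a': 2, 'b': 4}]
```

Under SQL-style logic the last row would be kept, because anything OR TRUE is TRUE; under the rule described for expr, its null input makes the whole result null, so the row is dropped.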

Examples

>>> df = cudf.DataFrame({
...     "a": [1, 2, 2],
...     "b": [3, 4, 5],
... })
>>> expr = "(a == 2 and b == 4) or (b == 3)"
>>> df.query(expr)
   a  b
0  1  3
1  2  4

Datetime conditionals:

>>> import numpy as np
>>> import datetime
>>> df = cudf.DataFrame()
>>> data = np.array(['2018-10-07', '2018-10-08'], dtype='datetime64')
>>> df['datetimes'] = data
>>> search_date = datetime.datetime.strptime('2018-10-08', '%Y-%m-%d')
>>> df.query('datetimes==@search_date')
   datetimes
1 2018-10-08

Using local_dict:

>>> import numpy as np
>>> import datetime
>>> df = cudf.DataFrame()
>>> data = np.array(['2018-10-07', '2018-10-08'], dtype='datetime64')
>>> df['datetimes'] = data
>>> search_date2 = datetime.datetime.strptime('2018-10-08', '%Y-%m-%d')
>>> df.query('datetimes==@search_date',
...          local_dict={'search_date': search_date2})
   datetimes
1 2018-10-08

Pandas Compatibility Note

DataFrame.query

One difference from pandas is that query currently supports only numeric, datetime, timedelta, or bool dtypes.
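When a frame may hold columns of other types (string columns, for example), a pre-flight dtype check can surface the problem before query is called. The sketch below is a hypothetical helper, unsupported_query_columns, that compares each dtype's NumPy kind code against the types listed in this note. It is an approximation under stated assumptions, not cudf's own check: dtypes without a .kind attribute, or whose kind code is not in the list (for instance some extension dtypes), are reported as unsupported, which may be stricter or looser than what cudf actually accepts.

```python
from typing import Any, List, Mapping

# NumPy dtype.kind codes for the dtypes named in the note:
# b = bool, i = signed int, u = unsigned int, f = float,
# m = timedelta64, M = datetime64.
QUERYABLE_KINDS = frozenset("biufmM")


def is_queryable_kind(kind: Any) -> bool:
    """Return True if `kind` is one of the assumed queryable kind codes."""
    return kind in QUERYABLE_KINDS


def unsupported_query_columns(dtypes: Mapping[Any, Any]) -> List[Any]:
    """Hypothetical helper: names of columns whose dtype kind is not queryable.

    `dtypes` maps column name -> dtype. Anything with an .items() method
    works, including the Series returned by DataFrame.dtypes.
    Dtype objects without a .kind attribute are reported as unsupported.
    """
    return [
        name
        for name, dtype in dtypes.items()
        if not is_queryable_kind(getattr(dtype, "kind", None))
    ]
```

For example, unsupported_query_columns(df[["a", "b"]].dtypes) would return an empty list for the integer columns of the first example; passing only the columns an expression uses keeps unrelated columns from being flagged.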