cudf.DataFrame.query
- DataFrame.query(expr, local_dict=None)
Query with a boolean expression using Numba to compile a GPU kernel.
- Parameters:
- expr : str
A boolean expression. Names in the expression refer to columns. The name index can be used in place of the index name, but this is not supported for MultiIndex.
Names starting with @ refer to Python variables.
An output value is null if any of its input values are null, regardless of the expression.
- local_dict : dict
A mapping of variable names to values that @-prefixed names in expr can refer to.
- Returns:
- filtered : DataFrame
The rows of the DataFrame for which expr evaluates to True.
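To make the rules above concrete, here is a CPU-only, pure-Python model of the documented semantics: column names in expr are read row by row, @name is looked up in local_dict, and a null (None) in any referenced column makes that row's result null. It is a sketch, not how cudf works (cudf compiles a GPU kernel with Numba). The helper name toy_query, the dict-of-lists input, resolving @name only from local_dict (not from the caller's variables), and dropping rows whose result is null are all assumptions for illustration.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical input format: column name -> list of values, None meaning null.
Columns = Dict[str, List[Optional[Any]]]


def toy_query(
    columns: Columns,
    expr: str,
    local_dict: Optional[Dict[str, Any]] = None,
) -> Tuple[Columns, List[int]]:
    """Hypothetical CPU model of the documented DataFrame.query semantics.

    Returns the filtered columns and the original row positions that were kept.
    """
    local_dict = local_dict or {}
    # Rewrite "@name" into a plain identifier so Python's parser accepts it.
    py_expr = re.sub(r"@([A-Za-z_]\w*)", r"__ref_\1", expr)
    env = {"__ref_" + name: value for name, value in local_dict.items()}
    # Only columns whose names appear in the expression are treated as inputs.
    identifiers = set(re.findall(r"[A-Za-z_]\w*", py_expr))
    inputs = [name for name in columns if name in identifiers]
    code = compile(py_expr, "<query>", "eval")
    nrows = len(next(iter(columns.values()), []))
    kept: List[int] = []
    for i in range(nrows):
        row = {name: columns[name][i] for name in inputs}
        if any(value is None for value in row.values()):
            # A null input makes the result null; assumed to drop the row.
            continue
        # Evaluate with no builtins; names resolve to row values or @-refs.
        if eval(code, {"__builtins__": {}}, {**env, **row}):
            kept.append(i)
    filtered = {name: [values[i] for i in kept] for name, values in columns.items()}
    return filtered, kept
```

With the data and expression from the first example below, toy_query keeps positions 0 and 1, matching the output shown there. Changing a to [1, None, 2] and querying "a > 0 or b > 0" drops position 1 even though b > 0 holds there, which is the "regardless of the expression" rule.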
Examples
>>> df = cudf.DataFrame({
...     "a": [1, 2, 2],
...     "b": [3, 4, 5],
... })
>>> expr = "(a == 2 and b == 4) or (b == 3)"
>>> df.query(expr)
   a  b
0  1  3
1  2  4
DateTime conditionals:
>>> import numpy as np
>>> import datetime
>>> df = cudf.DataFrame()
>>> data = np.array(['2018-10-07', '2018-10-08'], dtype='datetime64')
>>> df['datetimes'] = data
>>> search_date = datetime.datetime.strptime('2018-10-08', '%Y-%m-%d')
>>> df.query('datetimes==@search_date')
   datetimes
1 2018-10-08
Using local_dict:
>>> import numpy as np
>>> import datetime
>>> df = cudf.DataFrame()
>>> data = np.array(['2018-10-07', '2018-10-08'], dtype='datetime64')
>>> df['datetimes'] = data
>>> search_date2 = datetime.datetime.strptime('2018-10-08', '%Y-%m-%d')
>>> df.query('datetimes==@search_date',
...          local_dict={'search_date': search_date2})
   datetimes
1 2018-10-08
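In the two examples above, @search_date is found among the caller's variables in the first and supplied by local_dict in the second. The following pure-Python sketch models that lookup. It assumes a precedence of local_dict first, then the caller's local variables, then the caller's globals; the helper resolve_refs is hypothetical and not part of cudf. It uses inspect.currentframe(), which may return None on Python implementations other than CPython.

```python
import inspect
import re
from typing import Any, Dict, Optional


def resolve_refs(expr: str, local_dict: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: look up the value of every @name in expr.

    Assumed precedence: local_dict, then the caller's locals, then the
    caller's globals. Raises NameError for names found in none of them.
    """
    frame = inspect.currentframe()
    caller = frame.f_back if frame is not None else None
    try:
        env: Dict[str, Any] = {}
        if caller is not None:
            # Later updates win, so locals shadow globals.
            env.update(caller.f_globals)
            env.update(caller.f_locals)
        # Explicit local_dict entries shadow everything else.
        env.update(local_dict or {})
    finally:
        # Drop frame references promptly to avoid reference cycles.
        del frame, caller
    resolved: Dict[str, Any] = {}
    for name in re.findall(r"@([A-Za-z_]\w*)", expr):
        if name not in env:
            raise NameError(f"{name!r} not defined in the calling environment")
        resolved[name] = env[name]
    return resolved
```

Merging the three namespaces into one dictionary in increasing order of priority keeps the precedence rule in one place; each later update simply overwrites names defined earlier.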
Pandas Compatibility Note
One difference from pandas is that DataFrame.query currently supports only numeric, datetime, timedelta, or bool dtypes.
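The restriction in the note above can be expressed as an explicit pre-flight check. This stdlib-only sketch is illustrative: the helper name check_query_dtypes, the dtype-name strings it matches (NumPy-style names such as "int64" or "datetime64[ns]"), and the choice to check only the columns the expression references are assumptions, not cudf's API.

```python
import re
from typing import Dict, Iterable

# Assumed NumPy-style names for the supported families: integer and float
# (numeric), bool, and datetime64/timedelta64 with common time units.
_SUPPORTED = re.compile(
    r"u?int(8|16|32|64)|float(32|64)|bool|(datetime|timedelta)64\[(s|ms|us|ns)\]"
)


def check_query_dtypes(dtypes: Dict[str, str], used_columns: Iterable[str]) -> None:
    """Hypothetical check: raise TypeError if a referenced column is unsupported.

    dtypes maps column name -> dtype name; used_columns lists the columns
    that the query expression refers to.
    """
    bad = {
        name: dtypes[name]
        for name in used_columns
        if not _SUPPORTED.fullmatch(dtypes[name])
    }
    if bad:
        raise TypeError(
            "query only supports numeric, datetime, timedelta, or bool dtypes; "
            f"unsupported columns: {bad}"
        )
```

Using fullmatch on exact names, rather than a prefix test, avoids accidentally accepting a name such as "interval" just because it starts with "int".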