Pandas Compatibility Notes
Pandas Compatibility Note
pandas.DataFrame.transpose(), pandas.DataFrame.T
The copy parameter is not supported because the default and only behavior is copy=True.
Pandas Compatibility Note
pandas.DataFrame.all(), pandas.Series.all()
Parameters currently not supported are axis, bool_only, level.
Pandas Compatibility Note
pandas.DataFrame.any(), pandas.Series.any()
Parameters currently not supported are axis, bool_only, level.
Pandas Compatibility Note
Parameters currently not supported are axis and numeric_only.
Pandas Compatibility Note
Diff currently only supports numeric dtype columns.
Pandas Compatibility Note
pandas.DataFrame.empty, pandas.Series.empty
If a DataFrame/Series contains only null values, it is still not considered empty. See the example above.
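The distinction between "all null" and "empty" can be sketched with plain pandas, which follows the same rule. The variable name all_null is illustrative only, and the sketch assumes the cuDF objects behave the same way, as the note describes.

```python
import numpy as np
import pandas as pd

# A Series holding only missing values still has two rows, so it is not empty.
all_null = pd.Series([np.nan, np.nan])
print(all_null.empty)

# Dropping the nulls leaves zero rows, which is what `empty` actually tests.
without_nulls = all_null.dropna()
print(without_nulls.empty)
```

If the intent is to detect "no usable data", dropping nulls before checking empty is one possible approach.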
Pandas Compatibility Note
Additional kwargs are not supported.
Bitwise and logical operators are not dtype-dependent. Specifically, & must be used for bitwise operators on integers, not and, which is specifically for the logical and between booleans.
Only numerical types are currently supported.
Operators generally will not cast automatically. Users are responsible for casting columns to suitable types before evaluating a function.
Multiple assignments to the same name (i.e. a sequence of assignment statements where later statements are conditioned upon the output of earlier statements) are not supported.
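A minimal sketch of how these restrictions might shape an expression, written against pandas so that it runs without a GPU. It assumes the cuDF method in question accepts the same expression strings, and splitting dependent assignments into separate calls is a suggested workaround, not something the note itself prescribes. The column names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [1.5, 2.5, 3.5]})

# Cast explicitly up front rather than relying on automatic type promotion.
df["a"] = df["a"].astype("float64")

# Combine comparisons with & rather than the keyword `and`.
mask = df.eval("(a > 1.0) & (b < 3.0)")

# Keep each assignment in its own call instead of chaining dependent
# assignments inside a single expression.
step1 = df.eval("c = a + b")
step2 = step1.eval("d = c * 2.0")
print(mask.tolist())
print(step2["d"].tolist())
```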
Pandas Compatibility Note
The parameters min_periods, ignore_na, axis, and times are not yet supported. Behavior is defined only for data that begins with a valid (non-null) element.
Currently, only mean is a supported method.
Pandas Compatibility Note
pandas.DataFrame.from_arrow
This method does not exist in pandas, but it is similar to how pyarrow.Table.to_pandas() works for PyArrow Tables, i.e. it does not support automatically setting index column(s).
Pandas Compatibility Note
pandas.DataFrame.interleave_columns
This method does not exist in pandas, but it can be run as pd.Series(np.vstack(df.to_numpy()).reshape((-1,))).
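To make the pandas equivalent concrete, here is a small worked example of the expression from the note. The two-column frame is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# to_numpy() yields one row per record; flattening row by row interleaves
# the columns: x0, y0, x1, y1.
interleaved = pd.Series(np.vstack(df.to_numpy()).reshape((-1,)))
print(interleaved.tolist())  # [1, 3, 2, 4]
```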
Pandas Compatibility Note
other must be a single DataFrame for now.
on is not supported yet due to lack of multi-index support.
Pandas Compatibility Note
pandas.DataFrame.max(), pandas.Series.max()
Parameters currently not supported are level, numeric_only.
Pandas Compatibility Note
DataFrame merges in cuDF result in non-deterministic row ordering.
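When downstream code depends on row order, one possible approach is to sort explicitly after merging. The sketch below uses pandas so it runs anywhere. It assumes the same merge, sort_values, and reset_index calls are available on the cuDF DataFrame, and the frames and column names are illustrative.

```python
import pandas as pd

left = pd.DataFrame({"key": [3, 1, 2], "l": ["c", "a", "b"]})
right = pd.DataFrame({"key": [2, 3, 1], "r": [20, 30, 10]})

merged = left.merge(right, on="key")

# Impose an explicit order instead of relying on whatever the merge produced.
stable = merged.sort_values("key").reset_index(drop=True)
print(stable)
```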
Pandas Compatibility Note
pandas.DataFrame.min(), pandas.Series.min()
Parameters currently not supported are level, numeric_only.
Pandas Compatibility Note
One notable difference from pandas arises when the DataFrame has non-numeric types and pandas would return a Series. cuDF returns a DataFrame instead, because it does not support mixed types within a Series.
Pandas Compatibility Note
One difference from pandas is that query currently only supports numeric, datetime, timedelta, or bool dtypes.
Pandas Compatibility Note
Note: One difference from pandas is that NA is used for rows that do not match, rather than NaN. One side effect of this is that the column http_status retains an integer dtype in cuDF, where it is cast to float in pandas.
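The pandas side of this difference can be reproduced as follows. The sketch assumes the note refers to reindex-style alignment (the note itself does not name the method), and the index labels are made up. The nullable Int64 variant shows pandas behavior that is closer to what the note describes for cuDF.

```python
import pandas as pd

df = pd.DataFrame({"http_status": [200, 404]}, index=["Firefox", "Chrome"])

# With the default int64 dtype, the unmatched row is filled with NaN,
# which forces the column to float64.
reindexed = df.reindex(["Firefox", "Safari"])
print(reindexed["http_status"].dtype)

# With the nullable Int64 dtype, the unmatched row becomes <NA> and the
# integer dtype is retained.
nullable = df.astype("Int64").reindex(["Firefox", "Safari"])
print(nullable["http_status"].dtype)
```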
Pandas Compatibility Note
Not Supporting: level
Rename will not overwrite column names. If a list with duplicates is passed, column names will be postfixed with a number.
Pandas Compatibility Note
pandas.DataFrame.replace(), pandas.Series.replace()
Parameters that are currently not supported are: limit, regex, method
Pandas Compatibility Note
pandas.DataFrame.resample(), pandas.Series.resample()
Note that the dtype of the index (or the ‘on’ column if using ‘on=’) in the result will be of a frequency closest to the resampled frequency. For example, if resampling from nanoseconds to milliseconds, the index will be of dtype ‘datetime64[ms]’.
Pandas Compatibility Note
pandas.DataFrame.sample(), pandas.Series.sample()
When sampling from axis=0/'index', random_state can be either a numpy random state (numpy.random.RandomState) or a cupy random state (cupy.random.RandomState). When a numpy random state is used, the output is guaranteed to match the output of the corresponding pandas method call, but generating the sample may be slow. If exact pandas equivalence is not required, using a cupy random state will achieve better performance, especially when sampling a large number of items. It is advised to use the ndarray type that matches the random state for the weights array.
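As a sketch of the numpy random state path, the pandas call below is the reference that, per the note, a cuDF call with the same numpy random state is expected to match. The frame is illustrative, and the cupy variant is shown only as a comment because it needs a GPU.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": range(10)})

# Two fresh RandomState objects with the same seed give the same sample.
first = df.sample(n=3, random_state=np.random.RandomState(42))
second = df.sample(n=3, random_state=np.random.RandomState(42))
print(first.equals(second))

# On a GPU, the faster path described above would pass a cupy state instead,
# e.g. random_state=cupy.random.RandomState(42), without pandas equivalence.
```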
Pandas Compatibility Note
pandas.DataFrame.skew(), pandas.Series.skew()
The axis parameter is not currently supported.
Pandas Compatibility Note
pandas.DataFrame.sort_index(), pandas.Series.sort_index()
Not supporting: kind, sort_remaining=False
Pandas Compatibility Note
pandas.DataFrame.sort_values(), pandas.Series.sort_values()
Only axis='index' is supported.
Not supporting: inplace, kind
Pandas Compatibility Note
pandas.DataFrame.truncate(), pandas.Series.truncate()
The copy parameter is only present for API compatibility, but copy=False is not supported. This method always generates a copy.
Pandas Compatibility Note
pandas.DataFrame.where(), pandas.Series.where()
Note that where treats missing values as falsy, in parallel with pandas' treatment of nullable data:
>>> gsr = cudf.Series([1, 2, 3])
>>> gsr.where([True, False, cudf.NA])
0       1
1    <NA>
2    <NA>
dtype: int64
>>> gsr.where([True, False, False])
0       1
1    <NA>
2    <NA>
dtype: int64
Pandas Compatibility Note
Please note map currently only supports fixed-width numeric type functions.
Pandas Compatibility Note
Note: One difference from pandas is that NA is used for rows that do not match, rather than NaN. One side effect of this is that the series retains an integer dtype in cuDF, where it is cast to float in pandas.
Pandas Compatibility Note
Only scalar values are supported for changing the name attribute.
Pandas Compatibility Note
Only axis='index' is supported.
The inplace and kind arguments are currently unsupported.
Pandas Compatibility Note
pandas.Series.list.sort_values
This method does not exist in pandas but it can be run as:
>>> import pandas as pd
>>> s = pd.Series([[3, 2, 1], [2, 4, 3]])
>>> print(s.apply(sorted))
0    [1, 2, 3]
1    [2, 3, 4]
dtype: object
Pandas Compatibility Note
The parameters case and na are not yet supported and will raise a NotImplementedError if anything other than the default value is set. The flags parameter currently only supports re.DOTALL and re.MULTILINE.
Pandas Compatibility Note
The flags parameter currently only supports re.DOTALL and re.MULTILINE.
Some characters need to be escaped when passing in pat. For example, '$' has a special meaning in regex and must be escaped to match the literal character.
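The escaping rule can be illustrated with pandas and the standard re module. Using str.count here is an assumption made for illustration, since the note does not name the method it belongs to, and the sample strings are made up.

```python
import re

import pandas as pd

s = pd.Series(["$5", "10$$", "none"])

# An escaped dollar sign matches the literal character.
counts = s.str.count(r"\$")
print(counts.tolist())  # [1, 2, 0]

# re.escape builds the escaped pattern when the literal text is not known
# ahead of time.
escaped = re.escape("$")
print(escaped)
```

An unescaped "$" would instead match the end of each string, which is rarely what a search for the literal character intends.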
Pandas Compatibility Note
The na parameter is not yet supported, as cuDF uses native strings instead of Python objects.
Pandas Compatibility Note
The flags parameter currently only supports re.DOTALL and re.MULTILINE.
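For reference, the effect of the two supported flags can be seen with the standard re module alone. This sketch only demonstrates the regex semantics of the flags, not the cuDF string methods themselves, and the sample text is made up.

```python
import re

text = "first line\nsecond line"

# Without flags, ^ anchors only at the start of the whole string.
plain_starts = re.findall(r"^\w+", text)
# re.MULTILINE lets ^ match at the start of every line.
multiline_starts = re.findall(r"^\w+", text, re.MULTILINE)
print(plain_starts, multiline_starts)

# Without flags, . does not match a newline, so this search fails.
plain_match = re.search(r"line.second", text)
# re.DOTALL lets . match the newline as well.
dotall_match = re.search(r"line.second", text, re.DOTALL)
print(plain_match is None, dotall_match is not None)
```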
Pandas Compatibility Note
Parameters case and na are currently not supported. The flags parameter currently only supports re.DOTALL and re.MULTILINE.
Pandas Compatibility Note
The parameter expand is not yet supported and will raise a NotImplementedError if anything other than the default value is set.
Pandas Compatibility Note
The parameters case and flags are not yet supported and will raise a NotImplementedError if anything other than the default value is set.
Pandas Compatibility Note
The numeric_only, min_count
Pandas Compatibility Note
Parameter freq is unsupported.
Pandas Compatibility Note
cuDF's groupby.apply is limited compared to pandas. In some situations, pandas returns the grouped keys as part of the index while cuDF does not due to redundancy. For example:
>>> import cudf
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'a': [1, 1, 2, 2],
...     'b': [1, 2, 1, 2],
...     'c': [1, 2, 3, 4],
... })
>>> gdf = cudf.from_pandas(df)
>>> df.groupby('a')[["b", "c"]].apply(lambda x: x.iloc[[0]])
     b  c
a
1 0  1  1
2 2  1  3
>>> gdf.groupby('a')[["b", "c"]].apply(lambda x: x.iloc[[0]])
   b  c
0  1  1
2  1  3
Pandas Compatibility Note
pandas.DatetimeIndex.strftime()
The following date format identifiers are not yet supported: %c, %x, %X
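One possible workaround is to spell out an explicit format instead of the locale-dependent shortcuts. The sketch below uses the standard datetime module. It assumes the C/POSIX locale conventions, where %x corresponds to %m/%d/%y and %X to %H:%M:%S, and it assumes the same explicit format strings can be passed to the cuDF strftime call. Other locales format these differently.

```python
from datetime import datetime

ts = datetime(2024, 1, 2, 3, 4, 5)

# Explicit stand-in for %x under the C locale (assumption: month/day/year).
date_part = ts.strftime("%m/%d/%y")
# Explicit stand-in for %X under the C locale.
time_part = ts.strftime("%H:%M:%S")
# An approximate stand-in for %c: abbreviated day and month names, then
# day of month, time, and year.
full = ts.strftime("%a %b %d %H:%M:%S %Y")
print(date_part, time_part, full)
```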
Pandas Compatibility Note
An important difference from pandas is that this function does not accept mixed numeric/non-numeric type sequences, for example [1, 'a']. A TypeError will be raised when such input is received, regardless of the errors parameter.
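For contrast, the sketch below shows how pandas handles the same mixed input with errors="coerce", alongside a small pre-check that could run before handing data to the cuDF function. The helper has_mixed_types is hypothetical and not part of either library.

```python
import pandas as pd


def has_mixed_types(values):
    """Hypothetical helper: True if values mix numbers with non-numbers."""
    kinds = {isinstance(v, (int, float)) for v in values}
    return len(kinds) > 1


mixed = [1, "a"]

# pandas accepts the mixed sequence and coerces the bad entry to NaN.
coerced = pd.to_numeric(pd.Series(mixed), errors="coerce")
print(coerced.tolist())

# A pre-check like this flags inputs that cuDF would reject with a TypeError.
print(has_mixed_types(mixed), has_mixed_types([1, 2.5]))
```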