Pandas Compatibility Notes

Pandas Compatibility Note

pandas.DataFrame.transpose(), pandas.DataFrame.T

The copy parameter is not supported because copying (copy=True) is the default and only behavior.

[source]

Pandas Compatibility Note

pandas.DataFrame.agg()

  • Not supporting: axis, *args, **kwargs

[source]

Pandas Compatibility Note

pandas.DataFrame.all(), pandas.Series.all()

Parameters currently not supported are axis, bool_only, level.

[source]

Pandas Compatibility Note

pandas.DataFrame.any(), pandas.Series.any()

Parameters currently not supported are axis, bool_only, level.

[source]

Pandas Compatibility Note

pandas.DataFrame.count()

Parameters currently not supported are axis and numeric_only.

[source]

Pandas Compatibility Note

pandas.DataFrame.diff()

Diff currently only supports numeric dtype columns.

[source]

Pandas Compatibility Note

pandas.DataFrame.empty, pandas.Series.empty

A DataFrame or Series that contains only null values is still not considered empty.

[source]

Pandas Compatibility Note

pandas.DataFrame.eval()

  • Additional kwargs are not supported.

  • Bitwise and logical operators are not dtype-dependent. Specifically, & must be used for bitwise operations on integers; and is reserved for the logical AND between booleans.

  • Only numerical types are currently supported.

  • Operators generally will not cast automatically. Users are responsible for casting columns to suitable types before evaluating a function.

  • Multiple assignments to the same name (i.e. a sequence of assignment statements where later statements depend on the output of earlier statements) are not supported.
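The distinction between & and and in the list above mirrors plain Python semantics. A minimal sketch using only built-in integers and booleans (no cuDF involved; the variable names are illustrative):

```python
# Plain-Python illustration of bitwise `&` versus logical `and`.
a, b = 6, 3  # binary 110 and 011

bitwise = a & b    # bitwise AND of the integer bit patterns
logical = a and b  # logical `and` returns the last operand when both are truthy

# On booleans the two operators give the same answer.
bool_bitwise = True & False
bool_logical = True and False

print(bitwise, logical, bool_bitwise, bool_logical)  # 2 3 False False
```

The two results differ for integers, which is why an expression intended as a bitwise operation should be written with &.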

[source]

Pandas Compatibility Note

pandas.DataFrame.ewm()

The parameters min_periods, ignore_na, axis, and times are not yet supported. Behavior is defined only for data that begins with a valid (non-null) element.

Currently, only mean is a supported method.

[source]

Pandas Compatibility Note

pandas.DataFrame.from_arrow

This method does not exist in pandas, but it is similar to how pyarrow.Table.to_pandas() works for PyArrow Tables; that is, it does not support automatically setting index column(s).

[source]

Pandas Compatibility Note

pandas.DataFrame.interleave_columns

This method does not exist in pandas but it can be run as pd.Series(np.vstack(df.to_numpy()).reshape((-1,))).
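The expression above flattens the frame row by row, so the values of each row appear next to each other in the result. A pure-Python sketch of the same ordering, where columns is a hypothetical stand-in for a DataFrame with equal-length columns:

```python
# Pure-Python sketch of interleaving equal-length columns row by row.
# `columns` is a hypothetical stand-in for a DataFrame's columns.
columns = {"a": [1, 2, 3], "b": [10, 20, 30]}

# zip(*...) walks the columns in lockstep, yielding one row at a time;
# flattening the rows gives a0, b0, a1, b1, ...
interleaved = [value for row in zip(*columns.values()) for value in row]

print(interleaved)  # [1, 10, 2, 20, 3, 30]
```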

[source]

Pandas Compatibility Note

pandas.DataFrame.join()

  • other must be a single DataFrame for now.

  • on is not supported yet due to lack of multi-index support.

[source]

Pandas Compatibility Note

pandas.DataFrame.max(), pandas.Series.max()

Parameters currently not supported are level, numeric_only.

[source]

Pandas Compatibility Note

pandas.DataFrame.merge()

DataFrame merges in cuDF result in non-deterministic row ordering.
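Because the row order is not guaranteed, code that compares merge results may want to impose an explicit order first. A plain-Python sketch of the idea, with rows modeled as hypothetical tuples rather than real cuDF output:

```python
# Two hypothetical merge results holding the same rows in a different order.
result_a = [(1, "x", 10.0), (2, "y", 20.0), (3, "z", 30.0)]
result_b = [(3, "z", 30.0), (1, "x", 10.0), (2, "y", 20.0)]

# A direct comparison is order-sensitive, so it reports a mismatch.
same_order = result_a == result_b

# Sorting both sides by the full row imposes a deterministic order.
same_rows = sorted(result_a) == sorted(result_b)

print(same_order, same_rows)  # False True
```

With a real DataFrame, the analogous step would presumably be a sort_values call on the key columns before comparing.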

[source]

Pandas Compatibility Note

pandas.DataFrame.min(), pandas.Series.min()

Parameters currently not supported are level, numeric_only.

[source]

Pandas Compatibility Note

pandas.DataFrame.transpose()

axis parameter is currently not supported.

[source]

Pandas Compatibility Note

pandas.DataFrame.nlargest()

  • Only a single column is supported in columns

[source]

Pandas Compatibility Note

pandas.DataFrame.nsmallest()

  • Only a single column is supported in columns

[source]

Pandas Compatibility Note

pandas.DataFrame.quantile()

One notable difference from pandas arises when the DataFrame has non-numeric types and pandas would return a Series. cuDF returns a DataFrame instead, because it does not support mixed types within a Series.

[source]

Pandas Compatibility Note

pandas.DataFrame.query()

One difference from pandas is that query currently only supports numeric, datetime, timedelta, or bool dtypes.

[source]

Pandas Compatibility Note

pandas.DataFrame.reindex()

Note: One difference from Pandas is that NA is used for rows that do not match, rather than NaN. One side effect of this is that the column http_status retains an integer dtype in cuDF where it is cast to float in Pandas.

[source]

Pandas Compatibility Note

pandas.DataFrame.rename()

  • Not Supporting: level

Rename will not overwrite column names. If a list with duplicates is passed, column names will be postfixed with a number.

[source]

Pandas Compatibility Note

pandas.DataFrame.replace(), pandas.Series.replace()

Parameters that are currently not supported are: limit, regex, method

[source]

Pandas Compatibility Note

pandas.DataFrame.resample(), pandas.Series.resample()

Note that the dtype of the index (or the ‘on’ column if using ‘on=’) in the result will be of a frequency closest to the resampled frequency. For example, if resampling from nanoseconds to milliseconds, the index will be of dtype ‘datetime64[ms]’.

[source]

Pandas Compatibility Note

pandas.DataFrame.sample(), pandas.Series.sample()

When sampling from axis=0/'index', random_state can be either a numpy random state (numpy.random.RandomState) or a cupy random state (cupy.random.RandomState). When a numpy random state is used, the output is guaranteed to match the output of the corresponding pandas method call, but generating the sample may be slow. If exact pandas equivalence is not required, using a cupy random state will achieve better performance, especially when sampling a large number of items. It is advisable to use a weights array whose ndarray type matches the random state.

[source]

Pandas Compatibility Note

pandas.DataFrame.skew(), pandas.Series.skew()

The axis parameter is not currently supported.

[source]

Pandas Compatibility Note

pandas.DataFrame.sort_index(), pandas.Series.sort_index()

  • Not supporting: kind, sort_remaining=False

[source]

Pandas Compatibility Note

pandas.DataFrame.sort_values(), pandas.Series.sort_values()

  • Only axis='index' is supported.

  • Not supporting: inplace, kind

[source]

Pandas Compatibility Note

pandas.DataFrame.truncate(), pandas.Series.truncate()

The copy parameter is only present for API compatibility, but copy=False is not supported. This method always generates a copy.

[source]

Pandas Compatibility Note

pandas.DataFrame.where(), pandas.Series.where()

Note that where treats missing values as falsy, in parallel with pandas' treatment of nullable data:

>>> import cudf
>>> gsr = cudf.Series([1, 2, 3])
>>> gsr.where([True, False, cudf.NA])
0       1
1    <NA>
2    <NA>
dtype: int64
>>> gsr.where([True, False, False])
0       1
1    <NA>
2    <NA>
dtype: int64

[source]

Pandas Compatibility Note

pandas.Series.count()

The level parameter is currently not supported.

[source]

Pandas Compatibility Note

pandas.Series.cov()

The min_periods parameter is not yet supported.

[source]

Pandas Compatibility Note

pandas.Series.map()

Please note map currently only supports fixed-width numeric type functions.

[source]

Pandas Compatibility Note

pandas.Series.reindex()

Note: One difference from Pandas is that NA is used for rows that do not match, rather than NaN. One side effect of this is that the series retains an integer dtype in cuDF where it is cast to float in Pandas.

[source]

Pandas Compatibility Note

pandas.Series.rename()

  • Supports scalar values only for changing name attribute

[source]

Pandas Compatibility Note

pandas.Series.sort_values()

  • Only axis='index' is supported.

  • The inplace and kind arguments are currently unsupported.

[source]

Pandas Compatibility Note

pandas.Series.list.sort_values

This method does not exist in pandas but it can be run as:

>>> import pandas as pd
>>> s = pd.Series([[3, 2, 1], [2, 4, 3]])
>>> print(s.apply(sorted))
0    [1, 2, 3]
1    [2, 3, 4]
dtype: object

[source]

Pandas Compatibility Note

pandas.Series.str.contains()

The parameters case and na are not yet supported and will raise a NotImplementedError if anything other than the default value is set. The flags parameter currently only supports re.DOTALL and re.MULTILINE.
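For reference, the two supported flags behave as follows in Python's standard re module. This sketch uses re directly rather than cuDF, so it illustrates only the flag semantics; the sample text and patterns are made up for the example:

```python
import re

text = "first line\nsecond line"

# Without re.DOTALL, "." does not match the newline between the lines.
no_dotall = re.search(r"line.second", text)
with_dotall = re.search(r"line.second", text, flags=re.DOTALL)

# Without re.MULTILINE, "^" anchors only at the start of the whole string.
no_multiline = re.findall(r"^\w+", text)
with_multiline = re.findall(r"^\w+", text, flags=re.MULTILINE)

print(no_dotall is None, with_dotall is not None)  # True True
print(no_multiline, with_multiline)  # ['first'] ['first', 'second']
```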

[source]

Pandas Compatibility Note

pandas.Series.str.count()

  • flags parameter currently only supports re.DOTALL and re.MULTILINE.

  • Some characters need to be escaped when passed in pat. For example, '$' has a special meaning in regex and must be escaped to match the literal character.
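The effect of escaping can be seen with Python's standard re module. This is a sketch of the regex semantics only, not of cuDF's string engine, and the sample text is invented:

```python
import re

text = "cost: $5, $10"

# Unescaped, "$" is an end-of-string anchor and matches no character,
# so the only match is a single empty string at the end.
unescaped = re.findall(r"$", text)

# Escaped, "\$" matches the literal dollar sign.
escaped = re.findall(r"\$", text)

# re.escape builds the escaped pattern from a literal string.
auto = re.escape("$")

print(unescaped, escaped, auto)  # [''] ['$', '$'] \$
```

Counting the escaped matches gives 2 here, whereas the unescaped pattern never matches a dollar sign at all.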

[source]

Pandas Compatibility Note

pandas.Series.str.endswith()

na parameter is not yet supported, as cudf uses native strings instead of Python objects.

[source]

Pandas Compatibility Note

pandas.Series.str.extract()

The flags parameter currently only supports re.DOTALL and re.MULTILINE.

[source]

Pandas Compatibility Note

pandas.Series.str.findall()

The flags parameter currently only supports re.DOTALL and re.MULTILINE.

[source]

Pandas Compatibility Note

pandas.Series.str.match()

Parameters case and na are currently not supported. The flags parameter currently only supports re.DOTALL and re.MULTILINE.

[source]

Pandas Compatibility Note

pandas.Series.str.partition()

The parameter expand is not yet supported and will raise a NotImplementedError if anything other than the default value is set.

[source]

Pandas Compatibility Note

pandas.Series.str.replace()

The parameters case and flags are not yet supported and will raise a NotImplementedError if anything other than the default value is set.

[source]

Pandas Compatibility Note

pandas.core.groupby.DataFrameGroupBy.shift(), pandas.core.groupby.SeriesGroupBy.shift()

Parameter freq is unsupported.

[source]

Pandas Compatibility Note

pandas.core.groupby.DataFrameGroupBy.apply(), pandas.core.groupby.SeriesGroupBy.apply()

cuDF's groupby.apply is limited compared to pandas. In some situations, pandas returns the grouped keys as part of the index, while cuDF omits them because they are redundant. For example:

>>> import pandas as pd
>>> import cudf
>>> df = pd.DataFrame({
...     'a': [1, 1, 2, 2],
...     'b': [1, 2, 1, 2],
...     'c': [1, 2, 3, 4],
... })
>>> gdf = cudf.from_pandas(df)
>>> df.groupby('a')[["b", "c"]].apply(lambda x: x.iloc[[0]])
     b  c
a
1 0  1  1
2 2  1  3
>>> gdf.groupby('a')[["b", "c"]].apply(lambda x: x.iloc[[0]])
   b  c
0  1  1
2  1  3

[source]

Pandas Compatibility Note

pandas.core.groupby.DataFrameGroupBy.first(), pandas.core.groupby.SeriesGroupBy.first()

The numeric_only and min_count parameters are currently not supported.

[source]

Pandas Compatibility Note

pandas.core.groupby.DataFrameGroupBy.last(), pandas.core.groupby.SeriesGroupBy.last()

The numeric_only and min_count parameters are currently not supported.

[source]

Pandas Compatibility Note

pandas.core.groupby.DataFrameGroupBy.max(), pandas.core.groupby.SeriesGroupBy.max()

The numeric_only and min_count parameters are currently not supported.

[source]

Pandas Compatibility Note

pandas.core.groupby.DataFrameGroupBy.mean(), pandas.core.groupby.SeriesGroupBy.mean()

The numeric_only and min_count parameters are currently not supported.

[source]

Pandas Compatibility Note

pandas.core.groupby.DataFrameGroupBy.min(), pandas.core.groupby.SeriesGroupBy.min()

The numeric_only and min_count parameters are currently not supported.

[source]

Pandas Compatibility Note

pandas.core.groupby.DataFrameGroupBy.prod(), pandas.core.groupby.SeriesGroupBy.prod()

The numeric_only and min_count parameters are currently not supported.

[source]

Pandas Compatibility Note

pandas.core.groupby.DataFrameGroupBy.sum(), pandas.core.groupby.SeriesGroupBy.sum()

The numeric_only and min_count parameters are currently not supported.

[source]

Pandas Compatibility Note

pandas.DatetimeIndex.strftime()

The following date format identifiers are not yet supported: %c, %x, %X
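One possible workaround for the unsupported locale-dependent identifiers is to spell out an explicit format. The sketch below uses Python's standard datetime module rather than cuDF. The explicit formats shown are what %x and %X produce in the default C locale, which is an assumption about the desired output, and it is also assumed that the explicit directives are accepted by the target method:

```python
from datetime import datetime

ts = datetime(2024, 1, 5, 13, 4, 9)

# Explicit stand-in for %x (C-locale date representation).
date_part = ts.strftime("%m/%d/%y")

# Explicit stand-in for %X (C-locale time representation).
time_part = ts.strftime("%H:%M:%S")

print(date_part, time_part)  # 01/05/24 13:04:09
```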

[source]

Pandas Compatibility Note

pandas.to_numeric()

An important difference from pandas is that this function does not accept sequences that mix numeric and non-numeric types, for example [1, 'a']. A TypeError is raised for such input, regardless of the errors parameter.
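Since mixed input raises regardless of the errors parameter, one option is to separate the values before conversion. A minimal plain-Python sketch; split_numeric is a hypothetical helper written for this example, not part of cuDF or pandas:

```python
from numbers import Number


def split_numeric(values):
    """Hypothetical helper: separate numeric items from everything else."""
    numeric, other = [], []
    for value in values:
        # Note that bool is a Number subclass, so it counts as numeric here.
        if isinstance(value, Number):
            numeric.append(value)
        else:
            other.append(value)
    return numeric, other


numeric, other = split_numeric([1, "a", 2.5])
print(numeric, other)  # [1, 2.5] ['a']
```

Only the numeric list would then be passed on for conversion; how to handle the leftover items depends on the application.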

[source]