Pandas Compatibility Notes

Pandas Compatibility Note

DataFrame.transpose, DataFrame.T

The copy parameter is not supported because the default and only behavior is copy=True.

Pandas Compatibility Note

DataFrame.agg

  • Not supported: axis, *args, **kwargs

Pandas Compatibility Note

DataFrame.all, Series.all

Parameters currently not supported are axis, bool_only, level.

Pandas Compatibility Note

DataFrame.any, Series.any

Parameters currently not supported are axis, bool_only, level.

Pandas Compatibility Note

DataFrame.count

Parameters currently not supported are axis and numeric_only.

Pandas Compatibility Note

DataFrame.diff

Diff currently only supports numeric dtype columns.
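
One way to work within this limit is to select the numeric columns before differencing. The sketch below uses pandas as a stand-in so that it runs without a GPU; it assumes that select_dtypes is available with the same meaning on a cuDF DataFrame. The frame and column names are hypothetical.

```python
import pandas as pd

# Hypothetical frame with one numeric and one string column.
df = pd.DataFrame({"a": [1, 3, 6], "label": ["x", "y", "z"]})

# Keep only numeric columns, then take row-to-row differences.
numeric = df.select_dtypes(include="number")
result = numeric.diff()
```

The string column is dropped before diff is called, so only column a appears in the result.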

Pandas Compatibility Note

DataFrame.empty, Series.empty

A DataFrame or Series that contains only null values is still not considered empty.

Pandas Compatibility Note

DataFrame.eval

  • Additional kwargs are not supported.

  • Bitwise and logical operators are not dtype-dependent. Specifically, use & for bitwise operations on integers; the keyword and is strictly the logical AND between booleans.

  • Only numerical types are currently supported.

  • Operators generally will not cast automatically. Users are responsible for casting columns to suitable types before evaluating a function.

  • Multiple assignments to the same name (i.e. a sequence of assignment statements where later statements depend on the output of earlier statements) are not supported.
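
The distinction between the bitwise and logical operators, and the need for explicit casts, can be illustrated in plain Python. This is only a sketch of the semantics the notes above rely on; it does not call cuDF, and the variable names are hypothetical.

```python
flags_a, flags_b = 0b110, 0b011

# Bitwise AND combines the two integers bit by bit.
bitwise = flags_a & flags_b            # 0b010

# Logical AND operates on truth values only.
logical = (flags_a > 0) and (flags_b > 0)

# Casting is explicit: convert operands to a common type before combining them.
ratio = float(flags_a) / float(flags_b)
```

In an eval expression the same rule applies: pick the operator by intent, and cast columns to a suitable type before evaluating.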

Pandas Compatibility Note

DataFrame.from_arrow

  • Does not support automatically setting index column(s) similar to how to_pandas works for PyArrow Tables.

Pandas Compatibility Note

DataFrame.join

  • other must be a single DataFrame for now.

  • on is not supported yet due to lack of multi-index support.

Pandas Compatibility Note

DataFrame.kurtosis

Parameters currently not supported are level and numeric_only.

Pandas Compatibility Note

DataFrame.max, Series.max

Parameters currently not supported are level, numeric_only.

Pandas Compatibility Note

DataFrame.median, Series.median

Parameters currently not supported are level and numeric_only.

Pandas Compatibility Note

DataFrame.merge

DataFrame merges in cuDF result in non-deterministic row ordering.
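
When downstream code depends on row order, a common approach is to sort the merged result explicitly. The sketch below uses pandas as a stand-in so that it runs without a GPU; it assumes the same merge, sort_values, and reset_index calls apply to a cuDF DataFrame. The frames and column names are hypothetical.

```python
import pandas as pd

left = pd.DataFrame({"key": [2, 1, 3], "x": [20, 10, 30]})
right = pd.DataFrame({"key": [3, 1, 2], "y": [300, 100, 200]})

merged = left.merge(right, on="key")

# Impose an explicit order instead of relying on the merge's output order.
ordered = merged.sort_values("key").reset_index(drop=True)
```

Sorting on the join key (or on any column set that uniquely identifies a row) makes the result independent of the order in which the merge emitted rows.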

Pandas Compatibility Note

DataFrame.min, Series.min

Parameters currently not supported are level, numeric_only.

Pandas Compatibility Note

DataFrame.mode

The axis parameter is currently not supported.

Pandas Compatibility Note

DataFrame.nlargest

  • Only a single column is supported in columns

Pandas Compatibility Note

DataFrame.nsmallest

  • Only a single column is supported in columns

Pandas Compatibility Note

DataFrame.product, Series.product

Parameters currently not supported are level and numeric_only.

Pandas Compatibility Note

DataFrame.quantile

One notable difference from pandas arises when the DataFrame has non-numeric types and pandas would return a Series. cuDF returns a DataFrame instead, because it does not support mixed types in a Series.

Pandas Compatibility Note

DataFrame.query

One difference from pandas is that query currently only supports numeric, datetime, timedelta, or bool dtypes.

Pandas Compatibility Note

DataFrame.reindex

Note: One difference from Pandas is that NA is used for rows that do not match, rather than NaN. One side effect of this is that the column http_status retains an integer dtype in cuDF where it is cast to float in Pandas.
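
The dtype effect can be reproduced in pandas alone. In the sketch below, default int64 data is cast to float when a missing row is introduced, while pandas' nullable Int64 dtype fills the gap with NA and stays integer, which is offered here as an analogy to the cuDF behavior described above. The snippet was written against pandas only, and the index labels are hypothetical.

```python
import pandas as pd

status = pd.Series([200, 404], index=["Firefox", "Chrome"])

# Default int64 data: the missing row becomes NaN, forcing a cast to float64.
as_float = status.reindex(["Firefox", "Safari"])

# Nullable Int64 data: the missing row becomes <NA> and the dtype stays integer.
as_nullable = status.astype("Int64").reindex(["Firefox", "Safari"])
```

Code that checks dtypes after a reindex should therefore expect an integer dtype from cuDF where pandas with default dtypes reports float64.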

Pandas Compatibility Note

DataFrame.rename

  • Not supported: level

Rename will not overwrite column names. If a list with duplicates is passed, column names will be postfixed with a number.

Pandas Compatibility Note

DataFrame.replace, Series.replace

Parameters currently not supported are limit, regex, and method.

Pandas Compatibility Note

DataFrame.resample, Series.resample

Note that the dtype of the index (or the 'on' column if using 'on=') in the result will be of a frequency closest to the resampled frequency. For example, if resampling from nanoseconds to milliseconds, the index will be of dtype 'datetime64[ms]'.

Pandas Compatibility Note

DataFrame.sample, Series.sample

When sampling from axis=0/'index', random_state can be either a numpy random state (numpy.random.RandomState) or a cupy random state (cupy.random.RandomState). When a numpy random state is used, the output is guaranteed to match the output of the corresponding pandas method call, but generating the sample may be slow. If exact pandas equivalence is not required, using a cupy random state will achieve better performance, especially when sampling a large number of items. For the weights array, use the ndarray type that matches the random state.
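
The reproducible path can be sketched with pandas and a seeded numpy random state. The snippet runs on pandas only so that it needs no GPU; per the note above, passing the same numpy random state to the cuDF method is expected to give the matching sample. The frame and seed are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": range(10)})

# Two generators seeded identically produce identical samples.
first = df.sample(n=3, random_state=np.random.RandomState(42))
second = df.sample(n=3, random_state=np.random.RandomState(42))
```

When exact agreement with pandas is not needed, the note above suggests a cupy.random.RandomState instead, together with a cupy array for any weights.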

Pandas Compatibility Note

DataFrame.skew, Series.skew, Frame.skew

The axis parameter is not currently supported.

Pandas Compatibility Note

DataFrame.sort_index, Series.sort_index

  • Not supported: kind, sort_remaining=False

Pandas Compatibility Note

DataFrame.sort_values, Series.sort_values

  • Supports axis='index' only.

  • Not supported: inplace, kind

Pandas Compatibility Note

DataFrame.std, Series.std

Parameters currently not supported are level and numeric_only.

Pandas Compatibility Note

DataFrame.sum, Series.sum

Parameters currently not supported are level, numeric_only.

Pandas Compatibility Note

DataFrame.truncate, Series.truncate

The copy parameter is only present for API compatibility, but copy=False is not supported. This method always generates a copy.

Pandas Compatibility Note

DataFrame.var, Series.var

Parameters currently not supported are level and numeric_only.

Pandas Compatibility Note

DataFrame.where, Series.where

Note that where treats missing values as falsy, consistent with pandas' treatment of nullable data:

>>> import cudf
>>> gsr = cudf.Series([1, 2, 3])
>>> gsr.where([True, False, cudf.NA])
0       1
1    <NA>
2    <NA>
dtype: int64
>>> gsr.where([True, False, False])
0       1
1    <NA>
2    <NA>
dtype: int64

Pandas Compatibility Note

Series.count

The level parameter is currently not supported.

Pandas Compatibility Note

Series.cov

The min_periods parameter is not yet supported.

Pandas Compatibility Note

Series.map

map currently supports only fixed-width numeric type functions.

Pandas Compatibility Note

Series.reindex

Note: One difference from Pandas is that NA is used for rows that do not match, rather than NaN. One side effect of this is that the series retains an integer dtype in cuDF where it is cast to float in Pandas.

Pandas Compatibility Note

Series.rename

  • Supports scalar values only for changing the name attribute.

  • The inplace and level parameters are not supported.

Pandas Compatibility Note

Series.sort_values

  • Supports axis='index' only.

  • The inplace and kind arguments are currently not supported.

Pandas Compatibility Note

ListMethods.sort_values

The inplace and kind arguments are currently not supported.

Pandas Compatibility Note

StringMethods.contains

The parameters case and na are not yet supported and will raise a NotImplementedError if anything other than the default value is set. The flags parameter currently only supports re.DOTALL and re.MULTILINE.
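
For readers unsure what the two supported flags change, the sketch below shows their effect using Python's standard re module. It illustrates the flag semantics only and does not call cuDF; the sample text is hypothetical.

```python
import re

text = "first line\nsecond line"

# re.MULTILINE: ^ and $ match at every line boundary, not only at the string ends.
starts = re.findall(r"^\w+", text, flags=re.MULTILINE)

# re.DOTALL: "." also matches the newline character.
spans_lines = re.search(r"first.*second", text, flags=re.DOTALL) is not None
without_dotall = re.search(r"first.*second", text) is not None
```

Patterns that depend on any other flag, such as re.IGNORECASE, fall outside what the note above lists as supported.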

Pandas Compatibility Note

StringMethods.count

  • The flags parameter currently supports only re.DOTALL and re.MULTILINE.

  • Some characters need to be escaped when passed in pat. For example, '$' has a special meaning in regex and must be escaped to match the literal character.
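
The escaping rule can be sketched with the standard re module, where re.escape produces a pattern that matches a literal string. The example counts literal dollar signs in plain Python strings; passing the escaped pattern as pat to the cuDF method is an assumption, and the sample data is hypothetical.

```python
import re

prices = ["$5", "10$", "no price", "$$"]

# Unescaped, "$" is an end-of-string anchor rather than a literal character.
literal_dollar = re.escape("$")

# Count literal dollar signs in each string.
counts = [len(re.findall(literal_dollar, s)) for s in prices]
```

Writing the pattern by hand as r"\$" is equivalent; re.escape is mainly useful when the literal text comes from a variable.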

Pandas Compatibility Note

StringMethods.endswith

The na parameter is not yet supported, as cuDF uses native strings instead of Python objects.

Pandas Compatibility Note

StringMethods.extract

The flags parameter currently only supports re.DOTALL and re.MULTILINE.

Pandas Compatibility Note

StringMethods.findall

The flags parameter currently only supports re.DOTALL and re.MULTILINE.

Pandas Compatibility Note

StringMethods.match

Parameters case and na are currently not supported. The flags parameter currently only supports re.DOTALL and re.MULTILINE.

Pandas Compatibility Note

StringMethods.partition

The parameter expand is not yet supported and will raise a NotImplementedError if anything other than the default value is set.

Pandas Compatibility Note

StringMethods.replace

The parameters case and flags are not yet supported and will raise a NotImplementedError if anything other than the default value is set.

Pandas Compatibility Note

DataFrameGroupBy.idxmax

The numeric_only and min_count parameters are currently not supported.

Pandas Compatibility Note

DataFrameGroupBy.idxmin

The numeric_only and min_count parameters are currently not supported.

Pandas Compatibility Note

DataFrameGroupBy.nunique

The numeric_only and min_count parameters are currently not supported.

Pandas Compatibility Note

GroupBy.shift

Parameter freq is unsupported.

Pandas Compatibility Note

GroupBy.apply

cuDF's groupby.apply is limited compared to pandas. In some situations, pandas returns the grouped keys as part of the index, while cuDF omits them because they are redundant. For example:

>>> import cudf
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'a': [1, 1, 2, 2],
...     'b': [1, 2, 1, 2],
...     'c': [1, 2, 3, 4],
... })
>>> gdf = cudf.from_pandas(df)
>>> df.groupby('a')[["b", "c"]].apply(lambda x: x.iloc[[0]])
     b  c
a
1 0  1  1
2 2  1  3
>>> gdf.groupby('a')[["b", "c"]].apply(lambda x: x.iloc[[0]])
   b  c
0  1  1
2  1  3

Pandas Compatibility Note

GroupBy.first

The numeric_only and min_count parameters are currently not supported.

Pandas Compatibility Note

GroupBy.idxmax

The numeric_only and min_count parameters are currently not supported.

Pandas Compatibility Note

GroupBy.idxmin

The numeric_only and min_count parameters are currently not supported.

Pandas Compatibility Note

GroupBy.last

The numeric_only and min_count parameters are currently not supported.

Pandas Compatibility Note

GroupBy.max

The numeric_only and min_count parameters are currently not supported.

Pandas Compatibility Note

GroupBy.mean

The numeric_only and min_count parameters are currently not supported.

Pandas Compatibility Note

GroupBy.median

The numeric_only and min_count parameters are currently not supported.

Pandas Compatibility Note

GroupBy.min

The numeric_only and min_count parameters are currently not supported.

Pandas Compatibility Note

GroupBy.nunique

The numeric_only and min_count parameters are currently not supported.

Pandas Compatibility Note

GroupBy.prod

The numeric_only and min_count parameters are currently not supported.

Pandas Compatibility Note

GroupBy.sum

The numeric_only and min_count parameters are currently not supported.

Pandas Compatibility Note

series.DatetimeProperties.strftime

The following date format identifiers are not yet supported: %c, %x, %X
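
Because %c, %x, and %X are shorthand for locale-dependent layouts, one workaround is to spell the layout out with explicit numeric directives. The sketch below uses Python's standard datetime module and assumes C-locale-style layouts for the date and time; it does not call cuDF, and the timestamp is hypothetical.

```python
from datetime import datetime

moment = datetime(2024, 3, 14, 9, 26, 53)

# Explicit layouts standing in for the locale-dependent shortcuts.
date_only = moment.strftime("%m/%d/%y")            # stands in for %x
time_only = moment.strftime("%H:%M:%S")            # stands in for %X
full = moment.strftime("%Y-%m-%d %H:%M:%S")        # an explicit substitute for %c
```

An explicit format also keeps the output stable across machines with different locale settings.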

Pandas Compatibility Note

cudf.to_numeric

An important difference from pandas is that this function does not accept sequences that mix numeric and non-numeric types, for example [1, 'a']. A TypeError is raised when such input is received, regardless of the errors parameter.
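
Since the errors parameter does not help with mixed input, one option is to check or normalize the sequence before conversion. The helper below, is_homogeneous, is hypothetical and written in plain Python; it is a sketch of a pre-check, not part of cuDF.

```python
from numbers import Number

def is_homogeneous(values):
    """Return True when values are all numeric or all strings (hypothetical helper)."""
    all_numeric = all(isinstance(v, Number) for v in values)
    all_strings = all(isinstance(v, str) for v in values)
    return all_numeric or all_strings

mixed = [1, "a"]
clean = ["1", "2.5"]

# Converting every element to str yields a uniform sequence of strings.
as_strings = [str(v) for v in mixed]
```

Whether to reject mixed input or normalize it to strings first depends on the data; once the input is uniform, the errors parameter governs unparseable values as usual.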
