Pandas Compatibility Notes

Pandas Compatibility Note

MultiIndex.get_loc

The return type of this function may deviate from that of the corresponding pandas method. If the index is neither lexicographically sorted nor unique, a best-effort attempt is made to coerce the found indices into a slice. For example:

>>> import pandas as pd
>>> import cudf
>>> x = pd.MultiIndex.from_tuples(
...     [(2, 1, 1), (1, 2, 3), (1, 2, 1),
...      (1, 1, 1), (1, 1, 1), (2, 2, 1)]
... )
>>> x.get_loc(1)
array([False,  True,  True,  True,  True, False])
>>> cudf.from_pandas(x).get_loc(1)
slice(1, 5, 1)
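
Because the two libraries can describe the same matches in different shapes, code that must run against both may want to normalize the result before using it. The following is a minimal sketch, not part of either library: `loc_to_positions` is a hypothetical helper name, and it assumes the result is always an integer, a slice, or a boolean mask, and that the caller passes the index length. It uses only the standard library and is exercised here with the literal values from the example above.

```python
import numbers


def loc_to_positions(loc, length):
    """Normalize a get_loc-style result to a list of integer positions.

    Hypothetical helper: `loc` is assumed to be an integer, a slice,
    or an iterable boolean mask; `length` is the length of the index.
    """
    if isinstance(loc, slice):
        # slice.indices resolves None bounds and clamps to the index length.
        return list(range(*loc.indices(length)))
    if isinstance(loc, numbers.Integral):
        return [int(loc)]
    # Otherwise treat the value as a boolean mask (list or array-like).
    return [i for i, flag in enumerate(loc) if flag]


# Values copied from the example above (6 entries in the index).
mask_result = [False, True, True, True, True, False]
slice_result = slice(1, 5, 1)

# Both forms describe the same four positions.
assert loc_to_positions(mask_result, 6) == loc_to_positions(slice_result, 6)
```

Comparing positions rather than raw return values keeps downstream code independent of which form a given backend happens to return.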

Pandas Compatibility Note

groupby.fillna

This function may return its result in a different format from the corresponding pandas method. For example:

>>> df = pd.DataFrame({'k': [1, 1, 2], 'v': [2, None, 4]})
>>> gdf = cudf.from_pandas(df)
>>> df.groupby('k').fillna({'v': 4}) # pandas
       v
k
1 0  2.0
  1  4.0
2 2  4.0
>>> gdf.groupby('k').fillna({'v': 4}) # cudf
     v
0  2.0
1  4.0
2  4.0

Pandas Compatibility Note

groupby.apply

cuDF's groupby.apply is limited compared to pandas. In some situations, pandas returns the grouped keys as part of the index, while cuDF omits them because they are redundant. For example:

>>> df = pd.DataFrame({
...     'a': [1, 1, 2, 2],
...     'b': [1, 2, 1, 2],
...     'c': [1, 2, 3, 4]})
>>> gdf = cudf.from_pandas(df)
>>> df.groupby('a').apply(lambda x: x.iloc[[0]])
     a  b  c
a
1 0  1  1  1
2 2  2  1  3
>>> gdf.groupby('a').apply(lambda x: x.iloc[[0]])
   a  b  c
0  1  1  1
2  2  1  3
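
If the two results need to be compared, one option is to make the pandas side drop the group keys from its index. The sketch below runs on pandas only (no GPU required) and shows two ways to do this, using the `group_keys` parameter of `DataFrame.groupby` and `DataFrame.droplevel`. It only inspects the index, on the assumption that the columns of the result can vary with the pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 2, 2],
    'b': [1, 2, 1, 2],
    'c': [1, 2, 3, 4]})

# Default pandas behaviour: the group key becomes an outer index level.
with_keys = df.groupby('a').apply(lambda x: x.iloc[[0]])

# Option 1: ask pandas not to add the group keys to the index.
without_keys = df.groupby('a', group_keys=False).apply(lambda x: x.iloc[[0]])

# Option 2: drop the outer (group key) index level after the fact.
dropped = with_keys.droplevel(0)

# Only the index is shown; depending on the pandas version, the grouping
# column 'a' may or may not also appear among the result's columns.
print(list(with_keys.index))
print(list(without_keys.index))
print(list(dropped.index))
```

Either option leaves the original row labels as the index, which is the shape shown for cuDF above.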
