cudf.DataFrame.reindex

DataFrame.reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=<NA>, limit=None, tolerance=None)

Conform DataFrame to new index. Places NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.
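The conforming behaviour described above can be modelled for a single column in plain Python. This is only a sketch of the semantics, not cuDF code: the helper name reindex_column is hypothetical, None stands in for <NA>, and labels in the old index are assumed to be unique.

```python
# Plain-Python model of reindexing one column; this is not how cuDF implements it.
NA = None  # stand-in for cuDF's <NA> marker (assumption made for this sketch)


def reindex_column(values, old_index, new_index, fill_value=NA):
    """Return values arranged to follow new_index.

    Labels that are absent from old_index receive fill_value.
    Assumes the labels in old_index are unique.
    """
    lookup = dict(zip(old_index, values))
    return [lookup.get(label, fill_value) for label in new_index]


old_index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
http_status = [200, 200, 404, 404, 301]
new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10', 'Chrome']

conformed = reindex_column(http_status, old_index, new_index)
filled = reindex_column(http_status, old_index, new_index, fill_value=0)

print(conformed)  # [404, None, None, 404, 200]
print(filled)     # [404, 0, 0, 404, 200]
```

Labels present only in the old index ('Firefox', 'Konqueror') are dropped, and labels present only in the new index receive the fill value.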

Parameters:
labels : Index, Series-convertible, optional, default None

New labels / index to conform the axis specified by axis to.

index : Index, Series-convertible, optional, default None

The index labels specifying the index to conform to.

columns : array-like, optional, default None

The column labels specifying the columns to conform to.

axis : Axis to target.

Can be either the axis name (index, columns) or number (0, 1).

method : Not supported
copy : boolean, default True

Return a new object, even if the passed indexes are the same.

level : Not supported
fill_value : Value to use for missing values.

Defaults to NA, but can be any “compatible” value.

limit : Not supported
tolerance : Not supported
Returns:
DataFrame with changed index.

Examples

DataFrame.reindex supports two calling conventions:

* (index=index_labels, columns=column_labels, ...)
* (labels, axis={'index', 'columns'}, ...)

We _highly_ recommend using keyword arguments to clarify your intent.

Create a dataframe with some fictional data.

>>> index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
>>> df = cudf.DataFrame({'http_status': [200, 200, 404, 404, 301],
...                    'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
...                      index=index)
>>> df
           http_status  response_time
Firefox            200           0.04
Chrome             200           0.02
Safari             404           0.07
IE10               404           0.08
Konqueror          301           1.00
>>> new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
...              'Chrome']
>>> df.reindex(new_index)
               http_status  response_time
Safari                 404           0.07
Iceweasel             <NA>           <NA>
Comodo Dragon         <NA>           <NA>
IE10                   404           0.08
Chrome                 200           0.02

Pandas Compatibility Note

DataFrame.reindex

Note: One difference from Pandas is that NA is used for rows that do not match, rather than NaN. One side effect of this is that the column http_status retains an integer dtype in cuDF where it is cast to float in Pandas.
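The dtype difference in this note can be observed on the pandas side with a short check. The sketch below runs in pandas only (no GPU needed); the variable names pdf, as_float and as_nullable are chosen for illustration. It also shows that pandas' opt-in nullable Int64 dtype keeps the integer type, which is analogous to the cuDF result shown above.

```python
import pandas as pd

index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
pdf = pd.DataFrame({'http_status': [200, 200, 404, 404, 301]}, index=index)
new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10', 'Chrome']

# Default int64 column: unmatched rows become NaN, which forces a cast to float64.
as_float = pdf.reindex(new_index)

# Nullable Int64 column: unmatched rows become <NA> and the integer dtype is kept.
as_nullable = pdf.astype({'http_status': 'Int64'}).reindex(new_index)

print(as_float['http_status'].dtype)     # float64
print(as_nullable['http_status'].dtype)  # Int64
```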

We can fill in the missing values by passing a value to the keyword fill_value.

>>> df.reindex(new_index, fill_value=0)
               http_status  response_time
Safari                 404           0.07
Iceweasel                0           0.00
Comodo Dragon            0           0.00
IE10                   404           0.08
Chrome                 200           0.02

We can also reindex the columns.

>>> df.reindex(columns=['http_status', 'user_agent'])
           http_status  user_agent
Firefox            200        <NA>
Chrome             200        <NA>
Safari             404        <NA>
IE10               404        <NA>
Konqueror          301        <NA>

Or we can use “axis-style” keyword arguments, passing the labels positionally and naming the axis:

>>> df.reindex(['http_status', 'user_agent'], axis="columns")
           http_status  user_agent
Firefox            200        <NA>
Chrome             200        <NA>
Safari             404        <NA>
IE10               404        <NA>
Konqueror          301        <NA>
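The first calling convention also allows index and columns to be passed together, which none of the examples above do. The following is a minimal sketch of that usage. The alias xdf and the fallback to pandas are assumptions made so the snippet can run on a machine without cuDF or a GPU; under pandas the missing entries appear as NaN rather than <NA>.

```python
try:
    import cudf as xdf      # GPU path
except Exception:           # assumption: fall back to pandas if cuDF or a GPU is unavailable
    import pandas as xdf

index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
df = xdf.DataFrame({'http_status': [200, 200, 404, 404, 301],
                    'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
                   index=index)

# Conform rows and columns in a single call using keyword arguments.
out = df.reindex(index=['Safari', 'Iceweasel', 'Chrome'],
                 columns=['http_status', 'user_agent'])

print(out.shape)  # (3, 2)
```

The result has one row per requested index label and one column per requested column label; 'Iceweasel' and 'user_agent' did not exist in df, so their cells hold missing values.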