cudf.DataFrame.reindex

DataFrame.reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=<NA>, limit=None, tolerance=None)

Conform the DataFrame to a new index, placing NA in locations that have no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.
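To make the row semantics concrete, here is a minimal pure-Python sketch. It is not cuDF's implementation. It models a table as a list of index labels plus a dict of equal-length column lists, uses None as a stand-in for <NA>, and assumes the old index has unique labels. The helper name reindex_rows is hypothetical.

```python
# Illustrative sketch only (not cuDF code): label-based row reindexing.
# A "table" here is an index list plus a dict of equal-length column lists.

NA = None  # hypothetical stand-in for cuDF's <NA> missing-value marker


def reindex_rows(index, columns, new_index, fill_value=NA):
    """Return (new_index, new_columns) conformed to new_index.

    Labels found in the old index keep their values; labels that are absent
    receive fill_value in every column. Assumes the old index is unique.
    """
    position = {label: i for i, label in enumerate(index)}  # label -> old row
    new_columns = {}
    for name, values in columns.items():
        new_columns[name] = [
            values[position[label]] if label in position else fill_value
            for label in new_index
        ]
    return list(new_index), new_columns


index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
columns = {
    'http_status': [200, 200, 404, 404, 301],
    'response_time': [0.04, 0.02, 0.07, 0.08, 1.0],
}
new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10', 'Chrome']

_, result = reindex_rows(index, columns, new_index)
print(result['http_status'])    # [404, None, None, 404, 200]

_, filled = reindex_rows(index, columns, new_index, fill_value=0)
print(filled['response_time'])  # [0.07, 0, 0, 0.08, 0.02]
```

The lookup dict makes each new label an O(1) probe, so the sketch runs in time proportional to the size of the output rather than rescanning the old index per label.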

Parameters:

labels (Index, Series-convertible, optional, default None): New labels or index to conform the axis specified by axis to.
index (Index, Series-convertible, optional, default None): The index labels specifying the index to conform to.
columns (array-like, optional, default None): The column labels specifying the columns to conform to.
axis (optional, default None): Axis to target. Can be either the axis name ('index', 'columns') or number (0, 1).
method: Not supported.
copy (bool, default True): Return a new object, even if the passed indexes are the same.
level: Not supported.
fill_value (default NA): Value to use for missing values. Defaults to NA, but can be any “compatible” value.
limit: Not supported.
tolerance: Not supported.

Returns:

DataFrame with changed index.

Examples

DataFrame.reindex supports two calling conventions:

* (index=index_labels, columns=column_labels, ...)
* (labels, axis={'index', 'columns'}, ...)

We highly recommend using keyword arguments to clarify your intent.
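The two conventions describe the same request in different ways. The sketch below uses a hypothetical helper, resolve_targets (not part of cuDF), to map both onto a (row labels, column labels) pair. It assumes that mixing the conventions in one call is an error; the real method's handling of mixed arguments may differ.

```python
# Hypothetical helper, not part of cuDF: it only illustrates how the two
# calling conventions of DataFrame.reindex describe the same targets.

# Axis spellings accepted in this sketch, taken from the parameter description.
_AXIS_NUMBERS = {'index': 0, 'columns': 1, 0: 0, 1: 1}


def resolve_targets(labels=None, *, index=None, columns=None, axis=None):
    """Return the (row_labels, column_labels) pair a call asks for."""
    uses_keywords = index is not None or columns is not None
    uses_axis_style = labels is not None or axis is not None
    if uses_keywords and uses_axis_style:
        # Assumption for this sketch: mixing the conventions is rejected.
        raise TypeError("pass index=/columns= or labels with axis=, not both")
    if not uses_axis_style:
        return index, columns
    # With only labels given, the target defaults to the index (axis 0).
    axis_number = 0 if axis is None else _AXIS_NUMBERS[axis]
    return (labels, None) if axis_number == 0 else (None, labels)


cols = ['http_status', 'user_agent']
print(resolve_targets(columns=cols))          # (None, ['http_status', 'user_agent'])
print(resolve_targets(cols, axis='columns'))  # (None, ['http_status', 'user_agent'])
print(resolve_targets(cols, axis=1))          # (None, ['http_status', 'user_agent'])
print(resolve_targets(['Safari', 'IE10']))    # (['Safari', 'IE10'], None)
```

Making index and columns keyword-only in the sketch mirrors the recommendation above: the keyword form leaves no doubt about which axis a list of labels targets.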

Create a dataframe with some fictional data.

>>> import cudf
>>> index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
>>> df = cudf.DataFrame({'http_status': [200, 200, 404, 404, 301],
...                      'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
...                     index=index)
>>> df
           http_status  response_time
Firefox            200           0.04
Chrome             200           0.02
Safari             404           0.07
IE10               404           0.08
Konqueror          301           1.00
>>> new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
...              'Chrome']
>>> df.reindex(new_index)
              http_status response_time
Safari                404          0.07
Iceweasel            <NA>          <NA>
Comodo Dragon        <NA>          <NA>
IE10                  404          0.08
Chrome                200          0.02

Pandas Compatibility Note

pandas.DataFrame.reindex

Note: One difference from Pandas is that NA is used for rows that do not match, rather than NaN. As a side effect, the column http_status retains an integer dtype in cuDF, whereas Pandas casts it to float.
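The upcast described in this note can be reproduced in pandas itself. The sketch below assumes pandas is installed and uses only pandas, not cuDF. It also shows two ways pandas can keep an integer column: passing an integer fill_value, or using the nullable "Int64" extension dtype, which fills unmatched rows with pd.NA much like the cuDF output above.

```python
import pandas as pd

index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
df = pd.DataFrame({'http_status': [200, 200, 404, 404, 301],
                   'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
                  index=index)
new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10', 'Chrome']

# pandas fills unmatched rows with NaN, so the int64 column becomes float64.
print(df.reindex(new_index)['http_status'].dtype)  # float64

# An integer fill value never introduces NaN, so the column stays int64.
print(df.reindex(new_index, fill_value=0)['http_status'].dtype)  # int64

# The nullable "Int64" dtype fills with pd.NA and keeps an integer dtype.
nullable = df.astype({'http_status': 'Int64'})
print(nullable.reindex(new_index)['http_status'].dtype)  # Int64
```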

We can fill in the missing values by passing a value to the keyword fill_value.

>>> df.reindex(new_index, fill_value=0)
               http_status  response_time
Safari                 404           0.07
Iceweasel                0           0.00
Comodo Dragon            0           0.00
IE10                   404           0.08
Chrome                 200           0.02

We can also reindex the columns.

>>> df.reindex(columns=['http_status', 'user_agent'])
           http_status user_agent
Firefox            200       <NA>
Chrome             200       <NA>
Safari             404       <NA>
IE10               404       <NA>
Konqueror          301       <NA>

Or we can use “axis-style” keyword arguments, passing the labels positionally and naming the target axis:

>>> df.reindex(['http_status', 'user_agent'], axis='columns')
           http_status user_agent
Firefox            200       <NA>
Chrome             200       <NA>
Safari             404       <NA>
IE10               404       <NA>
Konqueror          301       <NA>
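Both calls above request the same columns. As a rough mental model, column reindexing keeps the requested columns that already exist, drops the ones not requested, and adds an all-missing column for each new label. The pure-Python sketch below illustrates this; it is not cuDF code, the helper name reindex_columns is hypothetical, and None stands in for <NA>.

```python
# Illustrative sketch only (not cuDF code): label-based column reindexing.

NA = None  # hypothetical stand-in for cuDF's <NA> missing-value marker


def reindex_columns(columns, n_rows, new_columns, fill_value=NA):
    """Return a dict with exactly the columns in new_columns, in that order.

    Existing columns are copied; new labels become columns of fill_value.
    Columns not listed in new_columns are dropped.
    """
    return {
        name: list(columns[name]) if name in columns else [fill_value] * n_rows
        for name in new_columns
    }


columns = {
    'http_status': [200, 200, 404, 404, 301],
    'response_time': [0.04, 0.02, 0.07, 0.08, 1.0],
}
result = reindex_columns(columns, 5, ['http_status', 'user_agent'])
print(list(result))          # ['http_status', 'user_agent']
print(result['user_agent'])  # [None, None, None, None, None]
```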