cudf.DataFrame.reindex
- DataFrame.reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=<NA>, limit=None, tolerance=None)
Conform DataFrame to new index. Places NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.
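As a rough mental model of the conforming step, the pure-Python sketch below reindexes a dict of column lists. Each label in the new index takes the row that had the same label in the old index, and labels with no match receive a fill value. It does not use cuDF and is not its implementation: `reindex_rows` is a hypothetical helper, `None` stands in for cuDF's `<NA>`, and the old index is assumed to contain unique labels.

```python
from typing import Any, Dict, Hashable, List, Optional, Sequence


def reindex_rows(
    data: Dict[str, List[Any]],
    index: Sequence[Hashable],
    new_index: Sequence[Hashable],
    fill_value: Optional[Any] = None,
) -> Dict[str, List[Any]]:
    """Hypothetical toy model of row reindexing (not cuDF's implementation).

    None plays the role of cuDF's <NA>; labels in `index` are assumed unique.
    """
    # Map each existing label to its row position.
    position = {label: i for i, label in enumerate(index)}
    result: Dict[str, List[Any]] = {}
    for name, values in data.items():
        # Keep the matching row if the label exists, otherwise emit fill_value.
        result[name] = [
            values[position[label]] if label in position else fill_value
            for label in new_index
        ]
    return result


data = {
    "http_status": [200, 200, 404, 404, 301],
    "response_time": [0.04, 0.02, 0.07, 0.08, 1.0],
}
index = ["Firefox", "Chrome", "Safari", "IE10", "Konqueror"]
new_index = ["Safari", "Iceweasel", "Comodo Dragon", "IE10", "Chrome"]

print(reindex_rows(data, index, new_index))
# {'http_status': [404, None, None, 404, 200], 'response_time': [0.07, None, None, 0.08, 0.02]}
```

The toy model does no dtype handling; it only shows which rows are kept, reordered, or filled.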
- Parameters:
- labels : Index, Series-convertible, optional, default None
New labels / index to conform the axis specified by axis to.
- index : Index, Series-convertible, optional, default None
The index labels specifying the index to conform to.
- columns : array-like, optional, default None
The column labels specifying the columns to conform to.
- axis : Axis to target.
Can be either the axis name ('index', 'columns') or number (0, 1).
- method : Not supported
- copy : boolean, default True
Return a new object, even if the passed indexes are the same.
- level : Not supported
- fill_value : Value to use for missing values.
Defaults to NA, but can be any “compatible” value.
- limit : Not supported
- tolerance : Not supported
- Returns:
- DataFrame with changed index.
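The labels/axis pair and the index/columns pair in the parameter list are two spellings of the same request. The sketch below shows one way the first form could be normalized into the second. `normalize_reindex_args` is a hypothetical helper, not part of cuDF, and raising `TypeError` when both forms are mixed is an assumption made for illustration; treating `axis=None` as the index follows the positional `df.reindex(new_index)` example on this page.

```python
from typing import Any, Optional, Tuple, Union

Axis = Optional[Union[int, str]]


def normalize_reindex_args(
    labels: Any = None,
    index: Any = None,
    columns: Any = None,
    axis: Axis = None,
) -> Tuple[Any, Any]:
    """Hypothetical helper: map (labels, axis) onto (index, columns).

    Not cuDF's implementation; the error handling is an assumption.
    """
    if labels is not None:
        if index is not None or columns is not None:
            # Assumed policy: mixing the two conventions is ambiguous.
            raise TypeError("pass either labels/axis or index/columns, not both")
        if axis in (None, 0, "index"):
            # No axis (or axis 0 / 'index') targets the row index.
            index = labels
        elif axis in (1, "columns"):
            columns = labels
        else:
            raise ValueError(f"unknown axis: {axis!r}")
    return index, columns


# Both calls describe the same column reindex.
print(normalize_reindex_args(["http_status", "user_agent"], axis="columns"))
print(normalize_reindex_args(columns=["http_status", "user_agent"]))
# (None, ['http_status', 'user_agent']) is printed twice
```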
Examples
DataFrame.reindex supports two calling conventions:
- (index=index_labels, columns=column_labels, ...)
- (labels, axis={'index', 'columns'}, ...)
We highly recommend using keyword arguments to clarify your intent.
Create a DataFrame with some fictional data.
>>> import cudf
>>> index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
>>> df = cudf.DataFrame({'http_status': [200, 200, 404, 404, 301],
...                    'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
...                   index=index)
>>> df
           http_status  response_time
Firefox            200           0.04
Chrome             200           0.02
Safari             404           0.07
IE10               404           0.08
Konqueror          301           1.00
>>> new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
...              'Chrome']
>>> df.reindex(new_index)
               http_status  response_time
Safari                 404           0.07
Iceweasel             <NA>           <NA>
Comodo Dragon         <NA>           <NA>
IE10                   404           0.08
Chrome                 200           0.02
Pandas Compatibility Note
Note: One difference from Pandas is that NA is used for rows that do not match, rather than NaN. One side effect of this is that the column http_status retains an integer dtype in cuDF, whereas Pandas casts it to float.
We can fill in the missing values by passing a value to the keyword fill_value.
>>> df.reindex(new_index, fill_value=0)
               http_status  response_time
Safari                 404           0.07
Iceweasel                0           0.00
Comodo Dragon            0           0.00
IE10                   404           0.08
Chrome                 200           0.02
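The dtype difference described in the compatibility note follows from where the missing marker lives. NaN is itself a floating-point value, so an integer column that has to store it must become float; cuDF records <NA> in a separate null mask, so the stored integer values are untouched. The toy `column_kind` function below is hypothetical (neither cuDF nor pandas code) and only mimics this reasoning on plain Python lists, using `None` for a masked entry.

```python
import math
from typing import Any, List


def column_kind(values: List[Any]) -> str:
    """Hypothetical toy dtype inference (not cuDF or pandas code).

    None models an entry flagged in a null mask: it is not a stored value,
    so it does not influence the kind. NaN is a float, so it forces "float".
    """
    present = [v for v in values if v is not None]
    if all(isinstance(v, int) for v in present):
        return "int"
    return "float"


# http_status after reindexing with new_index, under three missing-value conventions.
with_na = [404, None, None, 404, 200]           # cuDF-style <NA> (masked entries)
with_nan = [404, math.nan, math.nan, 404, 200]  # pandas-style NaN stored in the data
with_fill = [404, 0, 0, 404, 200]               # fill_value=0

print(column_kind(with_na), column_kind(with_nan), column_kind(with_fill))
# int float int
```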
We can also reindex the columns.
>>> df.reindex(columns=['http_status', 'user_agent'])
           http_status  user_agent
Firefox            200        <NA>
Chrome             200        <NA>
Safari             404        <NA>
IE10               404        <NA>
Konqueror          301        <NA>
Or we can use “axis-style” keyword arguments:
>>> df.reindex(['http_status', 'user_agent'], axis='columns')
           http_status  user_agent
Firefox            200        <NA>
Chrome             200        <NA>
Safari             404        <NA>
IE10               404        <NA>
Konqueror          301        <NA>