cudf.DataFrame

class cudf.DataFrame(data=None, index=None, columns=None, dtype=None, nan_as_null=True)

A GPU DataFrame object.

Parameters
data : array-like, Iterable, dict, or DataFrame

Dict can contain Series, arrays, constants, or list-like objects.

index : Index or array-like

Index to use for the resulting frame. Defaults to a RangeIndex if the input data carries no indexing information and no index is provided.

columns : Index or array-like

Column labels to use for resulting frame. Will default to RangeIndex (0, 1, 2, …, n) if no column labels are provided.

dtype : dtype, default None

Data type to force. Only a single dtype is allowed. If None, infer.

nan_as_null : bool, default True

If None/True, converts np.nan values to null values. If False, leaves np.nan values as is.

Examples

Build a DataFrame with __setitem__:

>>> import cudf
>>> df = cudf.DataFrame()
>>> df['key'] = [0, 1, 2, 3, 4]
>>> df['val'] = [float(i + 10) for i in range(5)]  # insert column
>>> df
   key   val
0    0  10.0
1    1  11.0
2    2  12.0
3    3  13.0
4    4  14.0

Build DataFrame via dict of columns:

>>> import numpy as np
>>> from datetime import datetime, timedelta
>>> t0 = datetime.strptime('2018-10-07 12:00:00', '%Y-%m-%d %H:%M:%S')
>>> n = 5
>>> df = cudf.DataFrame({
...     'id': np.arange(n),
...     'datetimes': np.array(
...         [t0 + timedelta(seconds=x) for x in range(n)])
... })
>>> df
    id            datetimes
0    0  2018-10-07 12:00:00
1    1  2018-10-07 12:00:01
2    2  2018-10-07 12:00:02
3    3  2018-10-07 12:00:03
4    4  2018-10-07 12:00:04

Build DataFrame via list of rows as tuples:

>>> df = cudf.DataFrame([
...     (5, "cats", "jump", np.nan),
...     (2, "dogs", "dig", 7.5),
...     (3, "cows", "moo", -2.1, "occasionally"),
... ])
>>> df
   0     1     2     3             4
0  5  cats  jump  <NA>          <NA>
1  2  dogs   dig   7.5          <NA>
2  3  cows   moo  -2.1  occasionally

Convert from a Pandas DataFrame:

>>> import pandas as pd
>>> pdf = pd.DataFrame({'a': [0, 1, 2, 3], 'b': [0.1, 0.2, None, 0.3]})
>>> pdf
   a    b
0  0  0.1
1  1  0.2
2  2  NaN
3  3  0.3
>>> df = cudf.from_pandas(pdf)
>>> df
   a     b
0  0   0.1
1  1   0.2
2  2  <NA>
3  3   0.3
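
Control NaN handling with nan_as_null (a minimal sketch, not part of the original examples). It assumes cuDF and a CUDA-capable GPU are available; the try/except guard and the count_nulls helper are illustrative names introduced here so that the snippet does nothing on machines without cuDF. Per the parameter description above, True should turn the single np.nan into a null, and False should leave it as a regular NaN value.

```python
import numpy as np

try:
    import cudf  # needs a CUDA-capable GPU and a RAPIDS install
except Exception:  # no cuDF or no GPU: the sketch below is skipped
    cudf = None


def count_nulls(nan_as_null):
    """Build a one-column frame from [1.0, NaN, 3.0] and count its nulls.

    Hypothetical helper for illustration only.
    """
    data = {"x": np.array([1.0, np.nan, 3.0])}
    df = cudf.DataFrame(data, nan_as_null=nan_as_null)
    return int(df["x"].null_count)


if cudf is not None:
    print(count_nulls(True))   # NaN converted to a null
    print(count_nulls(False))  # NaN kept as a floating-point value
```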

Attributes

T

Transpose index and columns.

at

Alias for DataFrame.loc; provided for compatibility with Pandas.

axes

Return a list representing the axes of the DataFrame.

columns

Return the column labels of the DataFrame.

dtypes

Return the dtypes in this object.

empty

Indicate whether the DataFrame is empty.

iat

Alias for DataFrame.iloc; provided for compatibility with Pandas.

index

Get the labels for the rows.

ndim

Dimension of the data.

shape

Returns a tuple representing the dimensionality of the DataFrame.

size

Return the number of elements in the underlying data.

values

Return a CuPy representation of the DataFrame.

values_host

Return a NumPy representation of the data.
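
A short sketch of reading several of the attributes listed above. The alias xdf and the pandas fallback are assumptions made only so the snippet also runs on a machine without a GPU; the attribute names themselves come from the list above and are shared with pandas.

```python
try:
    import cudf as xdf  # GPU-backed; needs CUDA and a RAPIDS install
except Exception:
    import pandas as xdf  # CPU stand-in exposing the same attribute names

df = xdf.DataFrame({"key": [0, 1, 2], "val": [10.0, 11.0, 12.0]})

# Collect a few structural attributes into plain Python values.
summary = {
    "shape": tuple(df.shape),     # (rows, columns)
    "ndim": df.ndim,              # number of dimensions
    "size": int(df.size),         # rows * columns
    "empty": bool(df.empty),      # True only when there is no data
    "columns": list(df.columns),  # column labels
}
print(summary)
```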

Methods

abs()

Return a Series/DataFrame with absolute numeric value of each element.

add(other[, axis, level, fill_value])

Get Addition of DataFrame or Series and other, element-wise (binary operator add).

add_prefix(prefix)

Prefix labels with string prefix.

add_suffix(suffix)

Suffix labels with string suffix.

agg(aggs[, axis])

Aggregate using one or more operations over the specified axis.

all([axis, bool_only, skipna, level])

Return whether all elements are True in DataFrame.

any([axis, bool_only, skipna, level])

Return whether any element is True in the DataFrame.

append(other[, ignore_index, ...])

Append rows of other to the end of caller, returning a new object.

apply(func[, axis, raw, result_type, args])

Apply a function along an axis of the DataFrame.

apply_chunks(func, incols, outcols[, ...])

Transform user-specified chunks using the user-provided function.

apply_rows(func, incols, outcols, kwargs[, ...])

Apply a row-wise user defined function.

applymap(func[, na_action])

Apply a function to a DataFrame elementwise.

argsort([by, axis, kind, order, ascending, ...])

Return the integer indices that would sort the values.

assign(**kwargs)

Assign columns to DataFrame from keyword arguments.

astype(dtype[, copy, errors])

Cast the object to the given dtype.

backfill([value, axis, inplace, limit])

Synonym for Series.fillna() with method='bfill'.

bfill([value, axis, inplace, limit])

Synonym for Series.fillna() with method='bfill'.

clip([lower, upper, inplace, axis])

Trim values at input threshold(s).

copy([deep])

Make a copy of this object's indices and data.

corr([method, min_periods])

Compute the correlation matrix of a DataFrame.

count([axis, level, numeric_only])

Count non-NA cells for each column or row.

cov(**kwargs)

Compute the covariance matrix of a DataFrame.

cummax([axis])

Return cumulative max of the IndexedFrame.

cummin([axis])

Return cumulative min of the IndexedFrame.

cumprod([axis])

Return cumulative product of the IndexedFrame.

cumsum([axis])

Return cumulative sum of the IndexedFrame.

describe([percentiles, include, exclude, ...])

Generate descriptive statistics.

deserialize(header, frames)

Generate an object from a serialized representation.

device_deserialize(header, frames)

Perform device-side deserialization tasks.

device_serialize()

Serialize data and metadata associated with device memory.

diff([periods, axis])

First discrete difference of element.

div(other[, axis, level, fill_value])

Get Floating division of DataFrame or Series and other, element-wise (binary operator truediv).

divide(other[, axis, level, fill_value])

Get Floating division of DataFrame or Series and other, element-wise (binary operator truediv).

dot(other[, reflect])

Get dot product of frame and other, (binary operator dot).

drop([labels, axis, index, columns, level, ...])

Drop specified labels from rows or columns.

drop_duplicates([subset, keep, inplace, ...])

Return DataFrame with duplicate rows removed, optionally only considering certain subset of columns.

dropna([axis, how, thresh, subset, inplace])

Drop rows (or columns) containing nulls.

eq(other[, axis, level, fill_value])

Get Equal to of DataFrame or Series and other, element-wise (binary operator eq).

equals(other, **kwargs)

Test whether two objects contain the same elements.

eval(expr[, inplace])

Evaluate a string describing operations on DataFrame columns.

explode(column[, ignore_index])

Transform each element of a list-like to a row, replicating index values.

ffill([value, axis, inplace, limit])

Synonym for Series.fillna() with method='ffill'.

fillna([value, method, axis, inplace, limit])

Fill null values with value or specified method.

first(offset)

Select initial periods of time series data based on a date offset.

floordiv(other[, axis, level, fill_value])

Get Integer division of DataFrame or Series and other, element-wise (binary operator floordiv).

from_arrow(table)

Convert from PyArrow Table to DataFrame.

from_pandas(dataframe[, nan_as_null])

Convert from a Pandas DataFrame.

from_records(data[, index, columns, nan_as_null])

Convert structured or record ndarray to DataFrame.

ge(other[, axis, level, fill_value])

Get Greater than or equal to of DataFrame or Series and other, element-wise (binary operator ge).

groupby([by, axis, level, as_index, sort, ...])

Group using a mapper or by a Series of columns.

gt(other[, axis, level, fill_value])

Get Greater than of DataFrame or Series and other, element-wise (binary operator gt).

hash_values([method])

Compute the hash of values in this column.

head([n])

Return the first n rows.

host_deserialize(header, frames)

Deserialize data and metadata produced by host_serialize.

host_serialize()

Serialize data and metadata associated with host memory.

info([verbose, buf, max_cols, memory_usage, ...])

Print a concise summary of a DataFrame.

insert(loc, name, value[, nan_as_null])

Add a column to DataFrame at the index specified by loc.

interleave_columns()

Interleave Series columns of a table into a single column.

interpolate([method, axis, limit, inplace, ...])

Interpolate data values between some points.

isin(values)

Whether each element in the DataFrame is contained in values.

isna()

Identify missing values.

isnull()

Identify missing values.

items()

Iterate over (column name, Series) pairs.

join(other[, on, how, lsuffix, rsuffix, sort])

Join columns with other DataFrame on index or on a key column.

keys()

Get the columns.

kurt([axis, skipna, level, numeric_only])

Return Fisher's unbiased kurtosis of a sample.

kurtosis([axis, skipna, level, numeric_only])

Return Fisher's unbiased kurtosis of a sample.

last(offset)

Select final periods of time series data based on a date offset.

le(other[, axis, level, fill_value])

Get Less than or equal to of DataFrame or Series and other, element-wise (binary operator le).

lt(other[, axis, level, fill_value])

Get Less than of DataFrame or Series and other, element-wise (binary operator lt).

mask(cond[, other, inplace])

Replace values where the condition is True.

max([axis, skipna, level, numeric_only])

Return the maximum of the values in the DataFrame.

mean([axis, skipna, level, numeric_only])

Return the mean of the values for the requested axis.

median([axis, skipna, level, numeric_only])

Return the median of the values for the requested axis.

melt(**kwargs)

Unpivots a DataFrame from wide format to long format, optionally leaving identifier variables set.

memory_usage([index, deep])

Return the memory usage of an object.

merge(right[, on, left_on, right_on, ...])

Merge GPU DataFrame objects by performing a database-style join operation by columns or indexes.

min([axis, skipna, level, numeric_only])

Return the minimum of the values in the DataFrame.

mod(other[, axis, level, fill_value])

Get Modulo of DataFrame or Series and other, element-wise (binary operator mod).

mode([axis, numeric_only, dropna])

Get the mode(s) of each element along the selected axis.

mul(other[, axis, level, fill_value])

Get Multiplication of DataFrame or Series and other, element-wise (binary operator mul).

multiply(other[, axis, level, fill_value])

Get Multiplication of DataFrame or Series and other, element-wise (binary operator mul).

nans_to_nulls()

Convert NaNs (if any) to nulls.

ne(other[, axis, level, fill_value])

Get Not equal to of DataFrame or Series and other, element-wise (binary operator ne).

nlargest(n, columns[, keep])

Get the rows of the DataFrame sorted by the n largest values of columns.

notna()

Identify non-missing values.

notnull()

Identify non-missing values.

nsmallest(n, columns[, keep])

Get the rows of the DataFrame sorted by the n smallest values of columns.

nunique([axis, dropna])

Count number of distinct elements in specified axis.

pad([value, axis, inplace, limit])

Synonym for Series.fillna() with method='ffill'.

partition_by_hash(columns, nparts[, keep_index])

Partition the dataframe by the hashed value of data in columns.

pct_change([periods, fill_method, limit, freq])

Calculates the percent change between sequential elements in the DataFrame.

pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

pivot(index, columns[, values])

Return reshaped DataFrame organized by the given index and column values.

pop(item)

Return a column and drop it from the DataFrame.

pow(other[, axis, level, fill_value])

Get Exponential of DataFrame or Series and other, element-wise (binary operator pow).

prod([axis, skipna, dtype, level, ...])

Return product of the values in the DataFrame.

product([axis, skipna, dtype, level, ...])

Return product of the values in the DataFrame.

quantile([q, axis, numeric_only, ...])

Return values at the given quantile.

quantiles([q, interpolation])

Return values at the given quantile.

query(expr[, local_dict])

Query with a boolean expression using Numba to compile a GPU kernel.

radd(other[, axis, level, fill_value])

Get Addition of DataFrame or Series and other, element-wise (binary operator radd).

rank([axis, method, numeric_only, ...])

Compute numerical data ranks (1 through n) along axis.

rdiv(other[, axis, level, fill_value])

Get Floating division of DataFrame or Series and other, element-wise (binary operator rtruediv).

reindex([labels, index, columns, axis, ...])

Conform DataFrame to new index.

rename([mapper, index, columns, axis, copy, ...])

Alter column and index labels.

repeat(repeats[, axis])

Repeats elements consecutively.

replace([to_replace, value, inplace, limit, ...])

Replace values given in to_replace with value.

resample(rule[, axis, closed, label, ...])

Convert the frequency of ("resample") the given time series data.

reset_index([level, drop, inplace, ...])

Reset the index of the DataFrame, or a level of it.

rfloordiv(other[, axis, level, fill_value])

Get Integer division of DataFrame or Series and other, element-wise (binary operator rfloordiv).

rmod(other[, axis, level, fill_value])

Get Modulo of DataFrame or Series and other, element-wise (binary operator rmod).

rmul(other[, axis, level, fill_value])

Get Multiplication of DataFrame or Series and other, element-wise (binary operator rmul).

rolling(window[, min_periods, center, axis, ...])

Rolling window calculations.

round([decimals, how])

Round to a variable number of decimal places.

rpow(other[, axis, level, fill_value])

Get Exponential of DataFrame or Series and other, element-wise (binary operator rpow).

rsub(other[, axis, level, fill_value])

Get Subtraction of DataFrame or Series and other, element-wise (binary operator rsub).

rtruediv(other[, axis, level, fill_value])

Get Floating division of DataFrame or Series and other, element-wise (binary operator rtruediv).

sample([n, frac, replace, weights, ...])

Return a random sample of items from an axis of object.

scale()

Scale values to [0, 1] in float64.

scatter_by_map(map_index[, map_size, keep_index])

Scatter to a list of dataframes.

searchsorted(values[, side, ascending, ...])

Find indices where elements should be inserted to maintain order.

select_dtypes([include, exclude])

Return a subset of the DataFrame’s columns based on the column dtypes.

serialize()

Generate an equivalent serializable representation of an object.

set_index(keys[, drop, append, inplace, ...])

Return a new DataFrame with a new index.

shift([periods, freq, axis, fill_value])

Shift values by periods positions.

skew([axis, skipna, level, numeric_only])

Return unbiased Fisher-Pearson skew of a sample.

sort_index([axis, level, ascending, ...])

Sort object by labels (along an axis).

sort_values(by[, axis, ascending, inplace, ...])

Sort by the values along either axis.

stack([level, dropna])

Stack the prescribed level(s) from columns to index.

std([axis, skipna, level, ddof, numeric_only])

Return sample standard deviation of the DataFrame.

sub(other[, axis, level, fill_value])

Get Subtraction of DataFrame or Series and other, element-wise (binary operator sub).

subtract(other[, axis, level, fill_value])

Get Subtraction of DataFrame or Series and other, element-wise (binary operator sub).

sum([axis, skipna, dtype, level, ...])

Return sum of the values in the DataFrame.

sum_of_squares([dtype])

Return the sum of squares of values.

swaplevel([i, j, axis])

Swap level i with level j.

tail([n])

Return the last n rows as a new DataFrame.

take(indices[, axis])

Return a new frame containing the rows specified by indices.

tile(count)

Repeats the rows count times to form a new Frame.

to_arrow([preserve_index])

Convert to a PyArrow Table.

to_csv([path_or_buf, sep, na_rep, columns, ...])

Write a DataFrame to the CSV file format.

to_cupy([dtype, copy, na_value])

Convert the Frame to a CuPy array.

to_dlpack()

Converts a cuDF object into a DLPack tensor.

to_feather(path, *args, **kwargs)

Write a DataFrame to the Feather format.

to_hdf(path_or_buf, key, *args, **kwargs)

Write the contained data to an HDF5 file using HDFStore.

to_json([path_or_buf])

Convert the cuDF object to a JSON string.

to_numpy([dtype, copy, na_value])

Convert the Frame to a NumPy array.

to_orc(fname[, compression])

Write a DataFrame to the ORC format.

to_pandas([nullable])

Convert to a Pandas DataFrame.

to_parquet(path, *args, **kwargs)

Write a DataFrame to the Parquet format.

to_records([index])

Convert to a NumPy recarray.

to_string()

Convert to a string.

to_struct([name])

Return a struct Series composed of the columns of the DataFrame.

transpose()

Transpose index and columns.

truediv(other[, axis, level, fill_value])

Get Floating division of DataFrame or Series and other, element-wise (binary operator truediv).

unstack([level, fill_value])

Pivot one or more levels of the (necessarily hierarchical) index labels.

update(other[, join, overwrite, ...])

Modify a DataFrame in place using non-NA values from another DataFrame.

value_counts([subset, normalize, sort, ...])

Return a Series containing counts of unique rows in the DataFrame.

var([axis, skipna, level, ddof, numeric_only])

Return unbiased variance of the DataFrame.

where(cond[, other, inplace])

Replace values where the condition is False.

iterrows

itertuples

to_dict
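
The where and mask entries above are easy to confuse, since one replaces values where the condition is False and the other where it is True. The sketch below contrasts them on a single column. The alias xdf and the pandas fallback are assumptions made only so the snippet also runs without a GPU; the calls used (where, mask, to_numpy) all appear in the method list above.

```python
try:
    import cudf as xdf  # GPU-backed; needs CUDA and a RAPIDS install
except Exception:
    import pandas as xdf  # CPU stand-in with the same method names

df = xdf.DataFrame({"a": [1, 2, 3, 4]})
cond = df > 2  # boolean frame: False, False, True, True

kept = df.where(cond, 0)   # keep values where cond is True, replace the rest
hidden = df.mask(cond, 0)  # replace values where cond is True, keep the rest

# Copy the results to host memory as plain Python lists for inspection.
kept_values = kept["a"].to_numpy().tolist()
hidden_values = hidden["a"].to_numpy().tolist()
print(kept_values, hidden_values)
```

With the pandas stand-in this yields [0, 0, 3, 4] for where and [1, 2, 0, 0] for mask; cuDF is expected to give the same values.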