cudf.DataFrame

class cudf.DataFrame(data=None, index=None, columns=None, dtype=None, nan_as_null=True)

A GPU DataFrame object.

Parameters:

data (array-like, Iterable, dict, or DataFrame): Dict can contain Series, arrays, constants, or list-like objects.
index (Index or array-like): Index to use for the resulting frame. Defaults to a RangeIndex if the input data carries no indexing information and no index is provided.
columns (Index or array-like): Column labels to use for the resulting frame. Defaults to a RangeIndex (0, 1, 2, …, n) if no column labels are provided.
dtype (dtype, default None): Data type to force. Only a single dtype is allowed. If None, infer.
nan_as_null (bool, default True): If None or True, converts np.nan values to null values. If False, leaves np.nan values as is.
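A short sketch of how `index` and `dtype` combine in a single constructor call. It assumes `cudf` is installed on a machine with a supported NVIDIA GPU; if the import fails it falls back to pandas, whose `DataFrame` constructor accepts the same `index` and `dtype` arguments, so the sketch also runs on a CPU. The alias `xdf` is a hypothetical name used only for this fallback.

```python
try:
    import cudf as xdf  # GPU DataFrame library; needs an NVIDIA GPU and CUDA
except ImportError:
    import pandas as xdf  # CPU fallback; same index/dtype constructor arguments

# Dict of columns, explicit row labels, and one dtype forced on every column.
df = xdf.DataFrame(
    {"x": [1, 2, 3], "y": [4, 5, 6]},
    index=[10, 20, 30],
    dtype="float64",
)

print(df.dtypes)        # both columns report float64
print(df.loc[20, "y"])  # 5.0, looked up by the row label 20
```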

Examples

Build DataFrame with __setitem__:

>>> import cudf
>>> df = cudf.DataFrame()
>>> df['key'] = [0, 1, 2, 3, 4]
>>> df['val'] = [float(i + 10) for i in range(5)]  # insert column
>>> df
   key   val
0    0  10.0
1    1  11.0
2    2  12.0
3    3  13.0
4    4  14.0

Build DataFrame via dict of columns:

>>> import numpy as np
>>> from datetime import datetime, timedelta
>>> t0 = datetime.strptime('2018-10-07 12:00:00', '%Y-%m-%d %H:%M:%S')
>>> n = 5
>>> df = cudf.DataFrame({
...     'id': np.arange(n),
...     'datetimes': np.array(
...         [(t0 + timedelta(seconds=x)) for x in range(n)])
... })
>>> df
    id            datetimes
0    0  2018-10-07 12:00:00
1    1  2018-10-07 12:00:01
2    2  2018-10-07 12:00:02
3    3  2018-10-07 12:00:03
4    4  2018-10-07 12:00:04

Build DataFrame via list of rows as tuples:

>>> df = cudf.DataFrame([
...     (5, "cats", "jump", np.nan),
...     (2, "dogs", "dig", 7.5),
...     (3, "cows", "moo", -2.1, "occasionally"),
... ])
>>> df
   0     1     2     3             4
0  5  cats  jump  <NA>          <NA>
1  2  dogs   dig   7.5          <NA>
2  3  cows   moo  -2.1  occasionally

Convert from a Pandas DataFrame:

>>> import pandas as pd
>>> pdf = pd.DataFrame({'a': [0, 1, 2, 3], 'b': [0.1, 0.2, None, 0.3]})
>>> pdf
   a    b
0  0  0.1
1  1  0.2
2  2  NaN
3  3  0.3
>>> df = cudf.from_pandas(pdf)
>>> df
   a     b
0  0   0.1
1  1   0.2
2  2  <NA>
3  3   0.3

Attributes

`T`	Transpose index and columns.
`at`	Alias for `DataFrame.loc`; provided for compatibility with Pandas.
`axes`	Return a list representing the axes of the DataFrame.
`columns`	Return the column labels.
`dtypes`	Return the dtypes in this object.
`empty`	Indicator whether DataFrame or Series is empty.
`iat`	Alias for `DataFrame.iloc`; provided for compatibility with Pandas.
`index`	Get the labels for the rows.
`ndim`	Dimension of the data.
`shape`	Return a tuple representing the dimensionality of the DataFrame.
`size`	Return the number of elements in the underlying data.
`values`	Return a CuPy representation of the DataFrame.
`values_host`	Return a NumPy representation of the data.
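The shape-related attributes in the table can be read directly from a small frame. This is an illustrative sketch using the same hypothetical `xdf` alias, which falls back to pandas when `cudf` cannot be imported. Both columns share one dtype because cuDF may refuse to transpose columns of differing dtypes, so the sketch avoids relying on type promotion.

```python
try:
    import cudf as xdf  # GPU DataFrame library; needs an NVIDIA GPU and CUDA
except ImportError:
    import pandas as xdf  # CPU fallback exposing the same attributes

# Both columns share one dtype so the transpose below needs no type promotion.
df = xdf.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

print(df.shape)          # (3, 2): rows, columns
print(df.ndim)           # 2
print(df.size)           # 6 = 3 rows * 2 columns
print(df.empty)          # False
print(df.T.shape)        # (2, 3): transposing swaps the axes
print(list(df.columns))  # ['a', 'b']
```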

iloc

Select values by position.

Examples

Series

>>> import cudf
>>> s = cudf.Series([10, 20, 30])
>>> s
0    10
1    20
2    30
dtype: int64
>>> s.iloc[2]
30

DataFrame

Selecting rows and columns by position.

>>> df = cudf.DataFrame({'a': range(20),
...                      'b': range(20),
...                      'c': range(20)})

Select a single row using an integer index.

>>> df.iloc[1]
a    1
b    1
c    1
Name: 1, dtype: int64

Select multiple rows using a list of integers.

>>> df.iloc[[0, 2, 9, 18]]
     a   b   c
0    0   0   0
2    2   2   2
9    9   9   9
18  18  18  18

Select rows using a slice.

>>> df.iloc[3:10:2]
   a  b  c
3  3  3  3
5  5  5  5
7  7  7  7
9  9  9  9

Select both rows and columns.

>>> df.iloc[[1, 3, 5, 7], 2]
1    1
3    3
5    5
7    7
Name: c, dtype: int64

Setting values using iloc.

>>> df.iloc[:4] = 0
>>> df
     a   b   c
0    0   0   0
1    0   0   0
2    0   0   0
3    0   0   0
4    4   4   4
5    5   5   5
6    6   6   6
7    7   7   7
8    8   8   8
9    9   9   9
[10 more rows]

loc

Select rows and columns by label or boolean mask.

Examples

Series

>>> import cudf
>>> series = cudf.Series([10, 11, 12], index=['a', 'b', 'c'])
>>> series
a    10
b    11
c    12
dtype: int64
>>> series.loc['b']
11

DataFrame

DataFrame with string index.

>>> df = cudf.DataFrame({'a': [0, 1, 2, 3, 4], 'b': [5, 6, 7, 8, 9]},
...                     index=['a', 'b', 'c', 'd', 'e'])
>>> df
   a  b
a  0  5
b  1  6
c  2  7
d  3  8
e  4  9

Select a single row by label.

>>> df.loc['a']
a    0
b    5
Name: a, dtype: int64

Select multiple rows and a single column.

>>> df.loc[['a', 'c', 'e'], 'b']
a    5
c    7
e    9
Name: b, dtype: int64

Selection by boolean mask.

>>> df.loc[df.a > 2]
   a  b
d  3  8
e  4  9

Setting values using loc.

>>> df.loc[['a', 'c', 'e'], 'a'] = 0
>>> df
   a  b
a  0  5
b  1  6
c  0  7
d  3  8
e  0  9
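`loc` and `iloc` diverge when integer row labels do not match row positions, and their slices treat the stop bound differently. A minimal sketch, again assuming the hypothetical `xdf` alias that falls back to pandas when `cudf` is unavailable:

```python
try:
    import cudf as xdf  # GPU DataFrame library; needs an NVIDIA GPU and CUDA
except ImportError:
    import pandas as xdf  # CPU fallback with the same loc/iloc semantics

# Integer row labels deliberately differ from the row positions.
df = xdf.DataFrame({"v": [100, 200, 300]}, index=[2, 0, 1])

by_label = df.loc[2, "v"]     # row labelled 2 is the first row -> 100
by_position = df.iloc[2, 0]   # third row by position -> 300
print(by_label, by_position)  # 100 300

# Slice bounds: label slices include the stop label, positional ones do not.
ordered = xdf.DataFrame({"v": [10, 20, 30, 40]})  # default RangeIndex 0..3
print(len(ordered.loc[0:2]))   # 3 rows (labels 0, 1, 2)
print(len(ordered.iloc[0:2]))  # 2 rows (positions 0, 1)
```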

Methods

`abs`()	Return a Series/DataFrame with absolute numeric value of each element.
`add`(other[, axis, level, fill_value])	Get Addition of DataFrame or Series and other, element-wise (binary operator add).
`add_prefix`(prefix)	Prefix labels with string prefix.
`add_suffix`(suffix)	Suffix labels with string suffix.
`agg`(aggs[, axis])	Aggregate using one or more operations over the specified axis.
`all`([axis, bool_only, skipna])	Return whether all elements are True in DataFrame.
`any`([axis, bool_only, skipna])	Return whether any element is True in DataFrame.
`apply`(func[, axis, raw, result_type, args])	Apply a function along an axis of the DataFrame.
`apply_chunks`(func, incols, outcols[, ...])	Transform user-specified chunks using the user-provided function.
`apply_rows`(func, incols, outcols, kwargs[, ...])	Apply a row-wise user defined function.
`applymap`(func[, na_action])	Apply a function to a DataFrame element-wise.
`argsort`([by, axis, kind, order, ascending, ...])	Return the integer indices that would sort the values.
`assign`(**kwargs)	Assign columns to DataFrame from keyword arguments.
`astype`(dtype[, copy, errors])	Cast the object to the given dtype.
`backfill`([value, axis, inplace, limit])	Synonym for `DataFrame.fillna()` with `method='bfill'`.
`bfill`([value, axis, inplace, limit])	Synonym for `DataFrame.fillna()` with `method='bfill'`.
`clip`([lower, upper, inplace, axis])	Trim values at input threshold(s).
`convert_dtypes`([infer_objects, ...])	Convert columns to the best possible nullable dtypes.
`copy`([deep])	Make a copy of this object's indices and data.
`corr`([method, min_periods])	Compute the correlation matrix of a DataFrame.
`count`([axis, numeric_only])	Count `non-NA` cells for each column or row.
`cov`(**kwargs)	Compute the covariance matrix of a DataFrame.
`cummax`([axis])	Return the cumulative maximum of the DataFrame.
`cummin`([axis])	Return the cumulative minimum of the DataFrame.
`cumprod`([axis])	Return the cumulative product of the DataFrame.
`cumsum`([axis])	Return the cumulative sum of the DataFrame.
`describe`([percentiles, include, exclude])	Generate descriptive statistics.
`deserialize`(header, frames)	Generate an object from a serialized representation.
`device_deserialize`(header, frames)	Perform device-side deserialization tasks.
`device_serialize`()	Serialize data and metadata associated with device memory.
`diff`([periods, axis])	First discrete difference of element.
`div`(other[, axis, level, fill_value])	Get Floating division of DataFrame or Series and other, element-wise (binary operator truediv).
`divide`(other[, axis, level, fill_value])	Get Floating division of DataFrame or Series and other, element-wise (binary operator truediv).
`dot`(other[, reflect])	Get dot product of frame and other (binary operator dot).
`drop`([labels, axis, index, columns, level, ...])	Drop specified labels from rows or columns.
`drop_duplicates`([subset, keep, inplace, ...])	Return DataFrame with duplicate rows removed.
`dropna`([axis, how, thresh, subset, inplace])	Drop rows (or columns) containing nulls.
`duplicated`([subset, keep])	Return boolean Series denoting duplicate rows.
`eq`(other[, axis, level, fill_value])	Get Equal to of DataFrame or Series and other, element-wise (binary operator eq).
`equals`(other)	Test whether two objects contain the same elements.
`eval`(expr[, inplace])	Evaluate a string describing operations on DataFrame columns.
`explode`(column[, ignore_index])	Transform each element of a list-like to a row, replicating index values.
`ffill`([value, axis, inplace, limit])	Synonym for `DataFrame.fillna()` with `method='ffill'`.
`fillna`([value, method, axis, inplace, limit])	Fill null values with `value` or specified `method`.
`first`(offset)	Select initial periods of time series data based on a date offset.
`floordiv`(other[, axis, level, fill_value])	Get Integer division of DataFrame or Series and other, element-wise (binary operator floordiv).
`from_arrow`(table)	Convert from PyArrow Table to DataFrame.
`from_dict`(data[, orient, dtype, columns])	Construct DataFrame from dict of array-like or dicts.
`from_pandas`(dataframe[, nan_as_null])	Convert from a Pandas DataFrame.
`from_records`(data[, index, columns, nan_as_null])	Convert structured or record ndarray to DataFrame.
`ge`(other[, axis, level, fill_value])	Get Greater than or equal to of DataFrame or Series and other, element-wise (binary operator ge).
`groupby`([by, axis, level, as_index, sort, ...])	Group using a mapper or by a Series of columns.
`gt`(other[, axis, level, fill_value])	Get Greater than of DataFrame or Series and other, element-wise (binary operator gt).
`hash_values`([method, seed])	Compute the hash of values in each row.
`head`([n])	Return the first n rows.
`host_deserialize`(header, frames)	Perform host-side deserialization tasks.
`host_serialize`()	Serialize data and metadata associated with host memory.
`info`([verbose, buf, max_cols, memory_usage, ...])	Print a concise summary of a DataFrame.
`insert`(loc, name, value[, nan_as_null])	Add a column to DataFrame at the index specified by loc.
`interleave_columns`()	Interleave Series columns of a table into a single column.
`interpolate`([method, axis, limit, inplace, ...])	Interpolate data values between some points.
`isin`(values)	Whether each element in the DataFrame is contained in values.
`isna`()	Identify missing values.
`isnull`()	Identify missing values.
`items`()	Iterate over (column name, Series) pairs.
`iterrows`()	Iteration is unsupported.
`itertuples`([index, name])	Iteration is unsupported.
`join`(other[, on, how, lsuffix, rsuffix, sort])	Join columns with other DataFrame on index or on a key column.
`keys`()	Get the columns.
`kurt`([axis, skipna, numeric_only])	Return Fisher's unbiased kurtosis of a sample.
`kurtosis`([axis, skipna, numeric_only])	Return Fisher's unbiased kurtosis of a sample.
`last`(offset)	Select final periods of time series data based on a date offset.
`le`(other[, axis, level, fill_value])	Get Less than or equal to of DataFrame or Series and other, element-wise (binary operator le).
`lt`(other[, axis, level, fill_value])	Get Less than of DataFrame or Series and other, element-wise (binary operator lt).
`map`(func[, na_action])	Apply a function to a DataFrame element-wise.
`mask`(cond[, other, inplace])	Replace values where the condition is True.
`max`([axis, skipna, numeric_only])	Return the maximum of the values in the DataFrame.
`mean`([axis, skipna, numeric_only])	Return the mean of the values for the requested axis.
`median`([axis, skipna, level, numeric_only])	Return the median of the values for the requested axis.
`melt`(**kwargs)	Unpivot a DataFrame from wide format to long format, optionally leaving identifier variables set.
`memory_usage`([index, deep])	Return the memory usage of an object.
`merge`(right[, on, left_on, right_on, ...])	Merge GPU DataFrame objects by performing a database-style join operation by columns or indexes.
`min`([axis, skipna, numeric_only])	Return the minimum of the values in the DataFrame.
`mod`(other[, axis, level, fill_value])	Get Modulo of DataFrame or Series and other, element-wise (binary operator mod).
`mode`([axis, numeric_only, dropna])	Get the mode(s) of each element along the selected axis.
`mul`(other[, axis, level, fill_value])	Get Multiplication of DataFrame or Series and other, element-wise (binary operator mul).
`multiply`(other[, axis, level, fill_value])	Get Multiplication of DataFrame or Series and other, element-wise (binary operator mul).
`nans_to_nulls`()	Convert NaNs (if any) to nulls.
`ne`(other[, axis, level, fill_value])	Get Not equal to of DataFrame or Series and other, element-wise (binary operator ne).
`nlargest`(n, columns[, keep])	Return the first n rows ordered by columns in descending order.
`notna`()	Identify non-missing values.
`notnull`()	Identify non-missing values.
`nsmallest`(n, columns[, keep])	Return the first n rows ordered by columns in ascending order.
`nunique`([axis, dropna])	Count number of distinct elements in specified axis.
`pad`([value, axis, inplace, limit])	Synonym for `DataFrame.fillna()` with `method='ffill'`.
`partition_by_hash`(columns, nparts[, keep_index])	Partition the dataframe by the hashed value of data in columns.
`pct_change`([periods, fill_method, limit, freq])	Calculate the percent change between sequential elements in the DataFrame.
`pipe`(func, *args, **kwargs)	Apply `func(self, *args, **kwargs)`.
`pivot`(*, columns[, index, values])	Return reshaped DataFrame organized by the given index and column values.
`pivot_table`([values, index, columns, ...])	Create a spreadsheet-style pivot table as a DataFrame.
`pop`(item)	Return a column and drop it from the DataFrame.
`pow`(other[, axis, level, fill_value])	Get Exponential of DataFrame or Series and other, element-wise (binary operator pow).
`prod`([axis, skipna, dtype, numeric_only, ...])	Return product of the values in the DataFrame.
`product`([axis, skipna, dtype, numeric_only, ...])	Return product of the values in the DataFrame.
`quantile`([q, axis, numeric_only, ...])	Return values at the given quantile.
`query`(expr[, local_dict])	Query with a boolean expression using Numba to compile a GPU kernel.
`radd`(other[, axis, level, fill_value])	Get Addition of DataFrame or Series and other, element-wise (binary operator radd).
`rank`([axis, method, numeric_only, ...])	Compute numerical data ranks (1 through n) along axis.
`rdiv`(other[, axis, level, fill_value])	Get Floating division of DataFrame or Series and other, element-wise (binary operator rtruediv).
`reindex`([labels, index, columns, axis, ...])	Conform DataFrame to new index.
`rename`([mapper, index, columns, axis, copy, ...])	Alter column and index labels.
`repeat`(repeats[, axis])	Repeats elements consecutively.
`replace`([to_replace, value, inplace, limit, ...])	Replace values given in `to_replace` with `value`.
`resample`(rule[, axis, closed, label, ...])	Convert the frequency of ("resample") the given time series data.
`reset_index`([level, drop, inplace, ...])	Reset the index of the DataFrame, or a level of it.
`rfloordiv`(other[, axis, level, fill_value])	Get Integer division of DataFrame or Series and other, element-wise (binary operator rfloordiv).
`rmod`(other[, axis, level, fill_value])	Get Modulo of DataFrame or Series and other, element-wise (binary operator rmod).
`rmul`(other[, axis, level, fill_value])	Get Multiplication of DataFrame or Series and other, element-wise (binary operator rmul).
`rolling`(window[, min_periods, center, axis, ...])	Rolling window calculations.
`round`([decimals, how])	Round to a variable number of decimal places.
`rpow`(other[, axis, level, fill_value])	Get Exponential of DataFrame or Series and other, element-wise (binary operator rpow).
`rsub`(other[, axis, level, fill_value])	Get Subtraction of DataFrame or Series and other, element-wise (binary operator rsub).
`rtruediv`(other[, axis, level, fill_value])	Get Floating division of DataFrame or Series and other, element-wise (binary operator rtruediv).
`sample`([n, frac, replace, weights, ...])	Return a random sample of items from an axis of object.
`scale`()	Scale values to [0, 1] in float64.
`scatter_by_map`(map_index[, map_size, ...])	Scatter to a list of dataframes.
`searchsorted`(values[, side, ascending, ...])	Find indices where elements should be inserted to maintain order.
`select_dtypes`([include, exclude])	Return a subset of the DataFrame's columns based on the column dtypes.
`serialize`()	Generate an equivalent serializable representation of an object.
`set_index`(keys[, drop, append, inplace, ...])	Return a new DataFrame with a new index.
`shift`([periods, freq, axis, fill_value])	Shift values by periods positions.
`skew`([axis, skipna, numeric_only])	Return unbiased Fisher-Pearson skew of a sample.
`sort_index`([axis, level, ascending, ...])	Sort object by labels (along an axis).
`sort_values`(by[, axis, ascending, inplace, ...])	Sort by the values along either axis.
`squeeze`([axis])	Squeeze 1-dimensional axis objects into scalars.
`stack`([level, dropna, future_stack])	Stack the prescribed level(s) from columns to index.
`std`([axis, skipna, ddof, numeric_only])	Return sample standard deviation of the DataFrame.
`sub`(other[, axis, level, fill_value])	Get Subtraction of DataFrame or Series and other, element-wise (binary operator sub).
`subtract`(other[, axis, level, fill_value])	Get Subtraction of DataFrame or Series and other, element-wise (binary operator sub).
`sum`([axis, skipna, dtype, numeric_only, ...])	Return sum of the values in the DataFrame.
`swaplevel`([i, j, axis])	Swap level i with level j.
`tail`([n])	Return the last n rows as a new DataFrame or Series.
`take`(indices[, axis])	Return a new frame containing the rows specified by indices.
`tile`(count)	Repeats the rows count times to form a new Frame.
`to_arrow`([preserve_index])	Convert to a PyArrow Table.
`to_csv`([path_or_buf, sep, na_rep, columns, ...])	Write a dataframe to csv file format.
`to_cupy`([dtype, copy, na_value])	Convert the Frame to a CuPy array.
`to_dict`([orient, into])	Convert the DataFrame to a dictionary.
`to_dlpack`()	Converts a cuDF object into a DLPack tensor.
`to_feather`(path, *args, **kwargs)	Write a DataFrame to the feather format.
`to_hdf`(path_or_buf, key, *args, **kwargs)	Write the contained data to an HDF5 file using HDFStore.
`to_json`([path_or_buf])	Convert the cuDF object to a JSON string.
`to_numpy`([dtype, copy, na_value])	Convert the Frame to a NumPy array.
`to_orc`(fname[, compression, statistics, ...])	Write a DataFrame to the ORC format.
`to_pandas`(*[, nullable, arrow_type])	Convert to a Pandas DataFrame.
`to_parquet`(path[, engine, compression, ...])	Write a DataFrame to the parquet format.
`to_records`([index])	Convert to a NumPy recarray.
`to_string`()	Convert to string.
`to_struct`([name])	Return a struct Series composed of the columns of the DataFrame.
`transpose`()	Transpose index and columns.
`truediv`(other[, axis, level, fill_value])	Get Floating division of DataFrame or Series and other, element-wise (binary operator truediv).
`truncate`([before, after, axis, copy])	Truncate a Series or DataFrame before and after some index value.
`unstack`([level, fill_value])	Pivot one or more levels of the (necessarily hierarchical) index labels.
`update`(other[, join, overwrite, ...])	Modify a DataFrame in place using non-NA values from another DataFrame.
`value_counts`([subset, normalize, sort, ...])	Return a Series containing counts of unique rows in the DataFrame.
`var`([axis, skipna, ddof, numeric_only])	Return unbiased variance of the DataFrame.
`where`(cond[, other, inplace])	Replace values where the condition is False.
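The table lists methods individually; in practice several are chained. The sketch below uses illustrative data and the hypothetical `xdf` alias (pandas fallback when `cudf` is unavailable) to combine `fillna`, `merge`, `groupby`, `agg`, `sort_index`, `sort_values`, and `add` with `fill_value`. `sort_index` is called after `groupby` so the result does not depend on whether group keys come back sorted, which may differ between cuDF and pandas defaults.

```python
try:
    import cudf as xdf  # GPU DataFrame library; needs an NVIDIA GPU and CUDA
except ImportError:
    import pandas as xdf  # CPU fallback with matching method names

# Illustrative data: per-store sales with one missing unit count.
sales = xdf.DataFrame({
    "store": ["a", "b", "a", "c", "b"],
    "units": [3, None, 5, 2, 4],
})
stores = xdf.DataFrame({
    "store": ["a", "b", "c"],
    "region": ["east", "west", "east"],
})

# fillna: treat the missing unit count as zero.
sales["units"] = sales["units"].fillna(0)

# merge: a left join attaches each store's region to every sales row.
merged = sales.merge(stores, on="store", how="left")

# groupby + agg: total units per region, with keys sorted explicitly.
totals = merged.groupby("region").agg({"units": "sum"}).sort_index()

# sort_values: regions ordered by total, largest first (east 10, west 4).
ranked = totals.sort_values("units", ascending=False)
print(ranked)

# add with fill_value: a value missing on one side is replaced by 0
# before adding, giving 11.0, 20.0, 3.0.
left = xdf.DataFrame({"x": [1.0, None, 3.0]})
right = xdf.DataFrame({"x": [10.0, 20.0, None]})
summed = left.add(right, fill_value=0)
print(summed)
```

With pandas the `units` column becomes float64 because of the missing value, whereas cuDF can keep an integer column with a null; the totals compare equal either way.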