cudf.read_avro

cudf.read_avro(filepath_or_buffer, engine='cudf', columns=None, skiprows=None, num_rows=None, **kwargs)

Load an Avro dataset into a DataFrame.

Parameters
filepath_or_buffer : str, path object, bytes, or file-like object

Either a path to a file (a str, pathlib.Path, or py._path.local.LocalPath), a URL (including http, ftp, and S3 locations), Python bytes of raw binary data, or any object with a read() method (such as a file object returned by the built-in open() function, or a BytesIO).

engine : ['cudf'], default 'cudf'

Parser engine to use.

columns : list, default None

If not None, only these columns will be read.

skiprows : int, default None

If not None, the number of rows to skip from the start of the file.

num_rows : int, default None

If not None, the total number of rows to read.

Returns
DataFrame

Notes

  • cuDF supports local and remote data stores. See the configuration details for the available sources.

Examples

>>> import pandavro
>>> import pandas as pd
>>> import cudf
>>> pandas_df = pd.DataFrame()
>>> pandas_df['numbers'] = [10, 20, 30]
>>> pandas_df['text'] = ["hello", "rapids", "ai"]
>>> pandas_df
   numbers    text
0       10   hello
1       20  rapids
2       30      ai
>>> pandavro.to_avro("data.avro", pandas_df)
>>> cudf.read_avro("data.avro")
   numbers    text
0       10   hello
1       20  rapids
2       30      ai
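
The columns, skiprows, and num_rows parameters together select a window of the file. As a rough mental model only, the sketch below applies the same selection to a plain Python list of records. It is not cuDF's implementation and does not read Avro data. The helper name `select_window` and the sample records are hypothetical, introduced here for illustration, and the sketch assumes that num_rows counts rows after the skipped ones, which is one reading of the parameter descriptions above.

```python
def select_window(records, columns=None, skiprows=None, num_rows=None):
    """Model the row/column selection described for cudf.read_avro.

    Hypothetical helper for illustration: it operates on a list of dicts,
    not on Avro data, and is not part of cuDF.
    """
    start = skiprows or 0  # None means nothing is skipped
    # Assumption: num_rows counts rows read after the skipped ones.
    stop = None if num_rows is None else start + num_rows
    window = records[start:stop]
    if columns is not None:
        # Keep only the requested columns, in the requested order.
        window = [{name: row[name] for name in columns} for row in window]
    return window


# Same data as the example above, as plain records.
records = [
    {"numbers": 10, "text": "hello"},
    {"numbers": 20, "text": "rapids"},
    {"numbers": 30, "text": "ai"},
]

# Roughly analogous to:
#   cudf.read_avro("data.avro", columns=["text"], skiprows=1, num_rows=1)
result = select_window(records, columns=["text"], skiprows=1, num_rows=1)
print(result)  # [{'text': 'rapids'}]
```

Leaving all three arguments as None returns every record unchanged, which mirrors the default call shown in the example above.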