cudf.read_avro

cudf.read_avro(filepath_or_buffer, columns=None, skiprows=None, num_rows=None, storage_options=None)

Load an Avro dataset into a DataFrame.

Parameters:

filepath_or_buffer : str, path object, bytes, or file-like object
    Either a path to a file (a str, pathlib.Path, or py._path.local.LocalPath), a URL (including http, ftp, and S3 locations), Python bytes of raw binary data, or any object with a read() method (such as the file handle returned by the built-in open() function, or a BytesIO).
columns : list, default None
    If not None, only these columns will be read.
skiprows : int, default None
    If not None, the number of rows to skip from the start of the file.
num_rows : int, default None
    If not None, the total number of rows to read.
storage_options : dict, optional, default None
    Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://” or “gcs://”) the key-value pairs are forwarded to fsspec.open. See the fsspec and urllib documentation for more details.
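The row-selection parameters can be easier to reason about with a small model. The sketch below assumes that skiprows and num_rows compose as "skip first, then limit", which the descriptions above suggest but do not state outright. The helpers row_window and read_avro_window are hypothetical names introduced for illustration and are not part of cuDF.

```python
def row_window(total_rows, skiprows=None, num_rows=None):
    """Model the row indices returned by a skip-then-limit read.

    Assumption: rows are skipped first, then at most num_rows are read.
    """
    start = min(skiprows or 0, total_rows)
    if num_rows is None:
        stop = total_rows
    else:
        stop = min(start + num_rows, total_rows)
    return range(start, stop)


def read_avro_window(path, columns=None, skiprows=None, num_rows=None):
    """Hypothetical thin wrapper that passes the selection options through."""
    import cudf  # imported lazily: cuDF needs a GPU-enabled environment

    return cudf.read_avro(
        path, columns=columns, skiprows=skiprows, num_rows=num_rows
    )


# For a 3-row file, skipping 1 row and reading 1 row selects index 1 only.
print(list(row_window(3, skiprows=1, num_rows=1)))
```

Under the same assumption, a call such as read_avro_window("data.avro", columns=["text"], skiprows=1, num_rows=1) would be expected to return one row with the single column text.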

Returns:

DataFrame

Notes

cuDF supports local and remote data stores. See the cuDF documentation on remote data sources for configuration details.
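For HTTP(S) sources, storage_options is described above as being forwarded to urllib.request.Request as header options. The following sketch shows what that forwarding amounts to, using only the standard library. The URL, the header values, and the helper read_remote_avro are placeholders chosen for illustration, and the exact forwarding done inside cuDF may differ in detail.

```python
import urllib.request

# Hypothetical URL and header values, for illustration only.
url = "https://example.com/data.avro"
storage_options = {
    "User-Agent": "avro-reader-example",
    "Authorization": "Bearer <token>",
}

# The key-value pairs become request headers, roughly as if by:
request = urllib.request.Request(url, headers=storage_options)
print(request.header_items())


def read_remote_avro(url, storage_options):
    """Hypothetical helper: read a remote Avro file with custom headers."""
    import cudf  # imported lazily: cuDF needs a GPU-enabled environment

    return cudf.read_avro(url, storage_options=storage_options)
```

Building the Request object does not open a network connection, so the header mapping can be inspected before any data is fetched.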

Examples

>>> import pandavro
>>> import pandas as pd
>>> import cudf
>>> pandas_df = pd.DataFrame()
>>> pandas_df['numbers'] = [10, 20, 30]
>>> pandas_df['text'] = ["hello", "rapids", "ai"]
>>> pandas_df
   numbers    text
0       10   hello
1       20  rapids
2       30      ai
>>> pandavro.to_avro("data.avro", pandas_df)
>>> cudf.read_avro("data.avro")
   numbers    text
0       10   hello
1       20  rapids
2       30      ai