cudf.read_avro

cudf.read_avro(filepath_or_buffer, columns=None, skiprows=None, num_rows=None, storage_options=None)

Load an Avro dataset into a DataFrame.

Parameters:
filepath_or_buffer : str, path object, bytes, or file-like object

A path to a file (a str, pathlib.Path, or py._path.local.LocalPath), a URL (including http, ftp, and S3 locations), Python bytes of raw binary data, or any object with a read() method (such as the file object returned by the builtin open() function, or a BytesIO).

columns : list, default None

If not None, only these columns will be read.

skiprows : int, default None

If not None, the number of rows to skip from the start of the file.

num_rows : int, default None

If not None, the total number of rows to read.

storage_options : dict, optional, default None

Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. those starting with “s3://” or “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details.
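
As a hedged illustration of the storage_options description above, the sketch below prepares one dictionary for an HTTP(S) URL and one for an S3 URL. The header value is a placeholder, "anon" is an s3fs filesystem option that is assumed to reach s3fs through fsspec (s3fs must be installed), and read_remote_avro is a hypothetical helper, not part of cuDF. The reader argument exists only so the helper can be exercised on a machine without a GPU.

```python
def read_remote_avro(url, storage_options=None, reader=None):
    """Hypothetical helper: forward storage_options to cudf.read_avro."""
    if reader is None:
        import cudf  # imported lazily; requires a CUDA-capable GPU

        reader = cudf.read_avro
    return reader(url, storage_options=storage_options)


# For HTTP(S) URLs the key-value pairs are sent as request headers.
# Placeholder header value, not a real credential.
HTTP_OPTIONS = {"User-Agent": "avro-example/0.1"}

# For s3:// URLs the key-value pairs go to fsspec.open. "anon" is an s3fs
# option (assumption: s3fs is installed) that requests unsigned access.
S3_OPTIONS = {"anon": True}
```

A call would then look like read_remote_avro("https://example.com/data.avro", HTTP_OPTIONS) or read_remote_avro("s3://example-bucket/data.avro", S3_OPTIONS); both locations are placeholders.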

Returns:
DataFrame

Notes

  • cuDF supports local and remote data stores. See the configuration details for the available sources.
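
For local data, the forms listed under filepath_or_buffer can be sketched as follows. This is only an illustration: avro_sources and read_each are hypothetical helpers, and the reader argument is an assumption added so the code can run without a GPU. By default read_each calls cudf.read_avro.

```python
from io import BytesIO
from pathlib import Path


def avro_sources(path):
    """Return the same local Avro file in each accepted input form."""
    path = Path(path)
    raw = path.read_bytes()  # whole file as Python bytes
    return {
        "str": str(path),  # plain string path
        "path": path,  # pathlib.Path object
        "bytes": raw,  # raw binary data
        "file-like": BytesIO(raw),  # object with a read() method
    }


def read_each(path, reader=None):
    """Read every input form with the reader (cudf.read_avro by default)."""
    if reader is None:
        import cudf  # imported lazily; requires a CUDA-capable GPU

        reader = cudf.read_avro
    return {name: reader(src) for name, src in avro_sources(path).items()}
```

Calling read_each("data.avro") on an existing Avro file would return one DataFrame per input form, each expected to hold the same data.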

Examples

>>> import pandavro
>>> import pandas as pd
>>> import cudf
>>> pandas_df = pd.DataFrame()
>>> pandas_df['numbers'] = [10, 20, 30]
>>> pandas_df['text'] = ["hello", "rapids", "ai"]
>>> pandas_df
   numbers    text
0       10   hello
1       20  rapids
2       30      ai
>>> pandavro.to_avro("data.avro", pandas_df)
>>> cudf.read_avro("data.avro")
   numbers    text
0       10   hello
1       20  rapids
2       30      ai
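
The columns, skiprows, and num_rows parameters can be combined to read a file in row windows. The sketch below is a minimal illustration, not a cuDF API: iter_avro_chunks is a hypothetical helper, the caller is assumed to know total_rows in advance, and num_rows is assumed to count rows after the skipped ones. The reader argument is an assumption added so the helper can be exercised without a GPU; by default it is cudf.read_avro.

```python
def iter_avro_chunks(path, total_rows, chunk_rows, columns=None, reader=None):
    """Yield successive row windows of an Avro file (hypothetical helper)."""
    if chunk_rows <= 0:
        raise ValueError("chunk_rows must be positive")
    if reader is None:
        import cudf  # imported lazily; requires a CUDA-capable GPU

        reader = cudf.read_avro
    for start in range(0, total_rows, chunk_rows):
        # The last window may be shorter than chunk_rows.
        count = min(chunk_rows, total_rows - start)
        # Skip `start` rows, then read `count` rows of the selected columns.
        yield reader(path, columns=columns, skiprows=start, num_rows=count)
```

With the three-row data.avro written above, iter_avro_chunks("data.avro", total_rows=3, chunk_rows=2, columns=["numbers"]) would request rows 0 to 1 and then row 2, each restricted to the numbers column.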