cudf.read_avro
- cudf.read_avro(filepath_or_buffer, columns=None, skiprows=None, num_rows=None, storage_options=None)
Load an Avro dataset into a DataFrame.
- Parameters:
- filepath_or_buffer : str, path object, bytes, or file-like object
Either a path to a file (a str, pathlib.Path, or py._path.local.LocalPath), a URL (including http, ftp, and S3 locations), Python bytes of raw binary data, or any object with a read() method (such as a file handle returned by the built-in open(), or a BytesIO).
- columns : list, default None
If not None, only these columns will be read.
- skiprows : int, default None
If not None, the number of rows to skip from the start of the file.
- num_rows : int, default None
If not None, the total number of rows to read.
- storage_options : dict, optional, default None
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. those starting with “s3://” or “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details.
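The columns, skiprows, and num_rows parameters together select a rectangular window of the file. The plain-Python model below illustrates that selection on the same three rows used in the Examples section. It is not cuDF code: `select_window` is a hypothetical helper, and it assumes that num_rows counts rows after the skipped ones, which is how the parameter descriptions read. The corresponding real call would be cudf.read_avro("data.avro", columns=["text"], skiprows=1, num_rows=1).

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def select_window(
    records: List[Record],
    columns: Optional[List[str]] = None,
    skiprows: Optional[int] = None,
    num_rows: Optional[int] = None,
) -> List[Record]:
    """Hypothetical model of read_avro's row/column selection (not cuDF code)."""
    # skiprows=None is treated as "skip nothing".
    start = 0 if skiprows is None else skiprows
    # num_rows=None is treated as "read through the last row";
    # otherwise it counts rows after the skipped ones (assumption).
    stop = None if num_rows is None else start + num_rows
    window = records[start:stop]
    # columns=None keeps every field; otherwise only the listed fields survive.
    if columns is None:
        return [dict(r) for r in window]
    return [{name: r[name] for name in columns} for r in window]


rows = [
    {"numbers": 10, "text": "hello"},
    {"numbers": 20, "text": "rapids"},
    {"numbers": 30, "text": "ai"},
]
print(select_window(rows, columns=["text"], skiprows=1, num_rows=1))
```

With all three parameters left at None, the model returns every row and column, matching the unfiltered read in the Examples section.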
- Returns:
- DataFrame
Notes
cuDF supports local and remote data stores. See the cuDF I/O configuration documentation for the available sources.
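For an HTTP(S) source, the storage_options description says the key-value pairs become request headers on a urllib.request.Request. The standard-library sketch below shows what that forwarding amounts to without making any network call. The URL and header values are placeholders, and this only approximates what cuDF does internally; the matching cuDF call would be cudf.read_avro(url, storage_options=storage_options).

```python
import urllib.request

# Placeholder options a caller might pass as storage_options for an HTTP(S) URL.
storage_options = {"User-Agent": "my-avro-reader", "Authorization": "Bearer <token>"}
url = "https://example.com/data.avro"  # placeholder, not a real dataset

# Each key-value pair is attached to the request as a header.
req = urllib.request.Request(url, headers=storage_options)

# urllib stores header names via str.capitalize(), so "User-Agent" becomes "User-agent".
print(req.get_header("User-agent"))
print(req.get_header("Authorization"))
```

For "s3://" or "gcs://" URLs the same dictionary would instead go to fsspec.open, where the accepted keys depend on the fsspec backend in use.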
Examples
>>> import pandavro
>>> import pandas as pd
>>> import cudf
>>> pandas_df = pd.DataFrame()
>>> pandas_df['numbers'] = [10, 20, 30]
>>> pandas_df['text'] = ["hello", "rapids", "ai"]
>>> pandas_df
   numbers    text
0       10   hello
1       20  rapids
2       30      ai
>>> pandavro.to_avro("data.avro", pandas_df)
>>> cudf.read_avro("data.avro")
   numbers    text
0       10   hello
1       20  rapids
2       30      ai