cudf.read_avro
- cudf.read_avro(filepath_or_buffer, columns=None, skiprows=None, num_rows=None, storage_options=None)
Load an Avro dataset into a DataFrame.
- Parameters:
- filepath_or_buffer : str, path object, bytes, or file-like object
Either a path to a file (a str, pathlib.Path, or py._path.local.LocalPath), a URL (including HTTP, FTP, and S3 locations), Python bytes of raw binary data, or any object with a read() method (such as a file object returned by the builtin open() function, or a BytesIO).
- columns : list, default None
If not None, only these columns will be read.
- skiprows : int, default None
If not None, the number of rows to skip from the start of the file.
- num_rows : int, default None
If not None, the total number of rows to read.
- storage_options : dict, optional, default None
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with "s3://" or "gcs://") the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details.
- Returns:
- DataFrame
Notes
cuDF supports local and remote data stores; see the cuDF documentation for configuration details on the available sources.
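As a sketch of a remote read, the bucket name and path below are hypothetical placeholders, and the storage_options dict is forwarded to fsspec.open as described under Parameters:

>>> df = cudf.read_avro(
...     "s3://my-bucket/data.avro",       # hypothetical bucket and path
...     storage_options={"anon": True},   # fsspec/s3fs option for anonymous access
... )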
Examples
>>> import fastavro
>>> import cudf
>>> schema = {"type": "record", "name": "test",
...           "fields": [{"name": "numbers", "type": "long"},
...                      {"name": "text", "type": "string"}]}
>>> records = [{"numbers": 10, "text": "hello"},
...            {"numbers": 20, "text": "rapids"},
...            {"numbers": 30, "text": "ai"}]
>>> with open("data.avro", "wb") as f:
...     fastavro.writer(f, schema, records)
>>> cudf.read_avro("data.avro")
   numbers    text
0       10   hello
1       20  rapids
2       30      ai
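Reusing the data.avro file written above, the row- and column-selection parameters can be combined; the output shown here is illustrative of skipping the first record and reading a single row of one column:

>>> cudf.read_avro("data.avro", columns=["numbers"], skiprows=1, num_rows=1)
   numbers
0       20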