cudf.read_text

cudf.read_text(filepath_or_buffer, delimiter=None, byte_range=None, strip_delimiters=False, compression=None, compression_offsets=None, storage_options=None)

Read a text source into a Series of strings, splitting the text into rows at each occurrence of a delimiter.

Parameters:
filepath_or_buffer: str, path object, or file-like object

Either a path to a file (a str, pathlib.Path, or py._path.local.LocalPath), a URL (including http, ftp, and S3 locations), or any object with a read() method (such as a file object returned by the builtin open() function, or a StringIO).

delimiter: string, default None

The delimiter used to split the text into separate rows of the output cudf column. The delimiter may be one or more characters long.

byte_range: list or tuple, default None

Byte range within the input file to be read. The first number is the offset in bytes, the second number is the range size in bytes. The output contains all rows that start inside the byte range (i.e. at or after the offset, and before the end at offset + size), which may include rows that continue past the end.
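The selection rule can be modeled in plain Python. The sketch below only illustrates the semantics described above and is not cudf's implementation; the helper name `rows_in_byte_range` and the sample data are hypothetical, and it assumes a non-empty delimiter that is kept at the end of each row.

```python
def rows_in_byte_range(data: bytes, delimiter: bytes, byte_range):
    """Return the rows of `data` whose first byte lies inside `byte_range`.

    `byte_range` is (offset, size), as in the parameter description.
    This is an illustrative model, not the cudf implementation.
    """
    if not delimiter:
        raise ValueError("delimiter must be non-empty")
    offset, size = byte_range
    end = offset + size
    rows = []
    start = 0
    while start < len(data):
        idx = data.find(delimiter, start)
        # A row runs up to and including its delimiter, or to end of input.
        stop = len(data) if idx == -1 else idx + len(delimiter)
        # Keep the row if it STARTS inside the range, even if it ends later.
        if offset <= start < end:
            rows.append(data[start:stop])
        start = stop
    return rows


data = b"aa;bbbb;cc;dd"  # rows start at bytes 0, 3, 8 and 11

# The range covers bytes 2..5; only the row starting at byte 3 begins
# inside it, and that row is returned whole although it ends at byte 7.
print(rows_in_byte_range(data, b";", (2, 4)))  # [b'bbbb;']

# Two adjacent ranges that together cover the whole input.
print(rows_in_byte_range(data, b";", (0, 6)))  # [b'aa;', b'bbbb;']
print(rows_in_byte_range(data, b";", (6, 7)))  # [b'cc;', b'dd']
```

Under this rule, adjacent byte ranges that tile the input return every row exactly once, which is what makes the option suitable for reading a large file in pieces.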

strip_delimiters: boolean, default False

Unlike the str.split() function, read_text by default preserves the delimiter at the end of each field in the output, so a;b;c becomes ['a;', 'b;', 'c'] when ; is the delimiter. Setting this option to True strips these trailing delimiters, leaving only the contents between delimiters in the resulting column: ['a', 'b', 'c'].
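The difference between the two modes can be reproduced with a small pure-Python model. This is a sketch of the documented example only, not cudf's code: the helper name `split_keep_delimiter` is hypothetical, and its handling of inputs that end with a delimiter is an assumption that is not stated here.

```python
def split_keep_delimiter(text: str, delimiter: str, strip_delimiters: bool = False):
    """Split `text` at `delimiter`, keeping the delimiter on each row by default.

    Illustrative model of the documented behavior, not the cudf implementation.
    """
    if not delimiter:
        raise ValueError("delimiter must be non-empty")
    rows = []
    start = 0
    while True:
        idx = text.find(delimiter, start)
        if idx == -1:
            break
        end = idx + len(delimiter)
        # Keep or drop the trailing delimiter depending on the flag.
        rows.append(text[start:idx] if strip_delimiters else text[start:end])
        start = end
    # Assumption: leftover text after the last delimiter forms a final row.
    if start < len(text):
        rows.append(text[start:])
    return rows


print(split_keep_delimiter("a;b;c", ";"))                         # ['a;', 'b;', 'c']
print(split_keep_delimiter("a;b;c", ";", strip_delimiters=True))  # ['a', 'b', 'c']
print("a;b;c".split(";"))                                         # ['a', 'b', 'c']
```

With the default setting, concatenating the rows reproduces the input exactly, which is not possible once the delimiters have been stripped.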

compression: string, default None

The compression format of the input. Currently only bgzip is supported, and it requires the input to be given as a path to a file.

compression_offsets: list or tuple, default None

The virtual begin and end offsets associated with the provided compression. For bgzip, each offset is composed of a local uncompressed offset inside a BGZIP block (lower 16 bits) and the start offset of this BGZIP block in the compressed file (upper 48 bits). The begin offset points to the first byte to be read; the end offset points one past the last byte to be read.
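The bit layout described above amounts to simple integer arithmetic. The following sketch packs and unpacks such a virtual offset; the helper names `make_virtual_offset` and `split_virtual_offset` and the sample numbers are hypothetical and are not part of cudf.

```python
def make_virtual_offset(block_start: int, within_block: int) -> int:
    """Pack a BGZIP virtual offset.

    block_start:  byte offset of the BGZIP block in the compressed file
                  (upper 48 bits).
    within_block: uncompressed offset inside that block (lower 16 bits).
    """
    if not 0 <= within_block < (1 << 16):
        raise ValueError("within_block must fit in 16 bits")
    if not 0 <= block_start < (1 << 48):
        raise ValueError("block_start must fit in 48 bits")
    return (block_start << 16) | within_block


def split_virtual_offset(virtual_offset: int):
    """Unpack a virtual offset into (block_start, within_block)."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF


# Hypothetical values: begin at the first byte of the block at compressed
# offset 0, end one past uncompressed byte 99 of the block at offset 4096.
begin = make_virtual_offset(0, 0)
end = make_virtual_offset(4096, 100)
print(begin, end)                 # 0 268435556
print(split_virtual_offset(end))  # (4096, 100)

# The pair would then be passed as compression_offsets, for example:
# cudf.read_text(path, delimiter="\n", compression="bgzip",
#                compression_offsets=[begin, end])   # `path` is a placeholder
```

In practice the block start offsets have to come from the compressed file itself, for example from an index built when the file was written; the numbers above are for illustration only.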

storage_options: dict, optional, default None

Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs, the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. those starting with "s3://" or "gcs://"), the key-value pairs are forwarded to fsspec.open. Please see the fsspec and urllib documentation for more details.

Returns:
result: Series