cudf.read_text
- cudf.read_text(filepath_or_buffer, delimiter=None, byte_range=None, strip_delimiters=False, compression=None, compression_offsets=None, storage_options=None)
Read text from a file or buffer into a Series, splitting it into rows at each delimiter.
- Parameters:
- filepath_or_buffer: str, path object, or file-like object
Either a path to a file (a str, pathlib.Path, or py._path.local.LocalPath), a URL (including http, ftp, and S3 locations), or any object with a read() method (such as a file handle returned by the builtin open() function, or a StringIO).
- delimiter: string, default None
The delimiter that should be used for splitting text chunks into separate cudf column rows. The delimiter may be one or more characters.
- byte_range: list or tuple, default None
Byte range within the input file to be read. The first number is the offset in bytes, the second number is the range size in bytes. The output contains all rows that start inside the byte range (i.e. at or after the offset, and before the end at offset + size), which may include rows that continue past the end.
- strip_delimiters: boolean, default False
Unlike the str.split() function, read_text preserves the delimiter at the end of a field in the output by default, meaning a;b;c will turn into ['a;', 'b;', 'c'] when using ; as a delimiter. Setting this option to True will strip these trailing delimiters, leaving only the contents between delimiters in the resulting column: ['a', 'b', 'c'].
- compression: string, default None
The compression type of the input. Currently only bgzip is supported, and it requires the path to a file as input.
- compression_offsets: list or tuple, default None
The virtual begin and end offset associated with the provided compression. For bgzip, they are composed of a local uncompressed offset inside a BGZIP block (lower 16 bits) and the start offset of this BGZIP block in the compressed file (upper 48 bits). The start offset points to the first byte to be read, the end offset points one past the last byte to be read.
- storage_options: dict, optional, default None
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with "s3://" and "gcs://") the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details.
- Returns:
- result: Series
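The row semantics described for delimiter, byte_range and strip_delimiters can be illustrated with a small pure-Python emulation. This is only a sketch of the documented behavior, not how cudf implements it: the function emulate_read_text is hypothetical, it runs on the CPU and returns a list rather than a Series, and its handling of input that ends with a delimiter is an arbitrary choice that the description above does not cover.

```python
def emulate_read_text(text, delimiter, byte_range=None, strip_delimiters=False):
    """Hypothetical CPU emulation of the documented read_text row semantics."""
    if not delimiter:
        raise ValueError("delimiter must be a non-empty string")

    data = text.encode("utf-8")
    delim = delimiter.encode("utf-8")

    # Split into (start_offset, row_bytes) pairs, keeping each trailing delimiter.
    rows = []
    start = 0
    while start < len(data):
        pos = data.find(delim, start)
        end = len(data) if pos == -1 else pos + len(delim)
        rows.append((start, data[start:end]))
        start = end

    # Keep only rows that START inside [offset, offset + size); a kept row
    # may still extend past the end of the range.
    if byte_range is not None:
        offset, size = byte_range
        rows = [(s, r) for s, r in rows if offset <= s < offset + size]

    result = []
    for _, row in rows:
        if strip_delimiters and row.endswith(delim):
            row = row[: -len(delim)]
        result.append(row.decode("utf-8"))
    return result


kept = emulate_read_text("a;b;c", ";")
stripped = emulate_read_text("a;b;c", ";", strip_delimiters=True)
ranged = emulate_read_text("abc;def;ghi", ";", byte_range=(2, 4))
```

In the last call the range covers bytes 2 to 5. Only the row beginning at byte 4 starts inside it, so that row is returned whole even though it ends after the range does.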