CSV

pylibcudf.io.csv.read_csv(SourceInfo source_info, *, compression_type compression=compression_type.AUTO, size_t byte_range_offset=0, size_t byte_range_size=0, list col_names=None, unicode prefix=u'', bool mangle_dupe_cols=True, list usecols=None, size_type nrows=-1, size_type skiprows=0, size_type skipfooter=0, size_type header=0, unicode lineterminator=u'\n', unicode delimiter=None, unicode thousands=None, unicode decimal=u'.', unicode comment=None, bool delim_whitespace=False, bool skipinitialspace=False, bool skip_blank_lines=True, quote_style quoting=quote_style.MINIMAL, unicode quotechar=u'"', bool doublequote=True, list parse_dates=None, list parse_hex=None, dtypes=None, list true_values=None, list false_values=None, list na_values=None, bool keep_default_na=True, bool na_filter=True, bool dayfirst=False)

Reads a CSV file into a TableWithMetadata.

Parameters:
source_info : SourceInfo

The SourceInfo to read the CSV file from.

compression : compression_type, default CompressionType.AUTO

The compression format of the CSV source.

byte_range_offset : size_t, default 0

Number of bytes to skip from the start of the source.

byte_range_size : size_t, default 0

Number of bytes to read. The default of 0 reads all bytes.
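The two byte-range parameters can be combined to read a large source in pieces. The sketch below only computes the (byte_range_offset, byte_range_size) pairs; byte_ranges is a hypothetical helper, not part of pylibcudf, and how the reader treats rows that straddle a range boundary is not specified here.

```python
def byte_ranges(total_bytes, chunk_bytes):
    """Yield (offset, size) pairs that cover total_bytes in order.

    Hypothetical helper for illustration; not part of pylibcudf.
    """
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    for offset in range(0, total_bytes, chunk_bytes):
        # The final chunk may be shorter than chunk_bytes.
        yield offset, min(chunk_bytes, total_bytes - offset)


ranges = list(byte_ranges(total_bytes=10, chunk_bytes=4))
print(ranges)
```

Each pair could then be passed as byte_range_offset and byte_range_size. The helper never yields a size of 0, which matters because 0 means "read all bytes".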

col_names : list, default None

The column names to use.

prefix : str, default ''

The prefix to apply to the column names.

mangle_dupe_cols : bool, default True

If True, rename duplicate column names.

usecols : list, default None

The columns to read, given as string column names or integer column indices.

nrows : size_type, default -1

The number of rows to read. The default of -1 reads all rows.

skiprows : size_type, default 0

The number of rows to skip from the start before reading.

skipfooter : size_type, default 0

The number of rows to skip from the end.

header : size_type, default 0

The index of the row that will be used for header names. Pass -1 to use default column names.

lineterminator : str, default '\n'

The character used to determine the end of a line.

delimiter : str, default ","

The character used to separate fields in a row.

thousands : str, default None

The character used as the thousands separator. Cannot match delimiter.

decimal : str, default '.'

The character used as the decimal separator. Cannot match delimiter.
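To make the roles of thousands and decimal concrete, here is a pure-Python sketch of the conversion they describe. parse_number is a hypothetical helper used only for illustration; it does not reflect how the reader is implemented.

```python
def parse_number(text, thousands=None, decimal="."):
    """Convert a numeric field to float using the given separators.

    Hypothetical helper that mimics the roles of the `thousands` and
    `decimal` parameters; not part of pylibcudf.
    """
    if thousands is not None:
        # Drop grouping characters before converting.
        text = text.replace(thousands, "")
    if decimal != ".":
        # Normalise the decimal separator to the one float() expects.
        text = text.replace(decimal, ".")
    return float(text)


european = parse_number("1.234,56", thousands=".", decimal=",")
plain = parse_number("1234.56")
print(european, plain)
```

With thousands="." and decimal=",", the field 1.234,56 is read as the same value as 1234.56 under the defaults.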

comment : str, default None

The character that marks the start of a comment line. Comment lines are skipped by the reader.

delim_whitespace : bool, default False

If True, treat whitespace as the field delimiter.

skipinitialspace : bool, default False

If True, skip whitespace after the delimiter.

skip_blank_lines : bool, default True

If True, ignore empty lines (otherwise line values are parsed as null).

quoting : QuoteStyle, default QuoteStyle.MINIMAL

The quoting style used in the input CSV data. One of { QuoteStyle.MINIMAL, QuoteStyle.ALL, QuoteStyle.NONNUMERIC, QuoteStyle.NONE }.

quotechar : str, default '"'

The character used to indicate quoting.

doublequote : bool, default True

If True, a quote inside a value is double-quoted.
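The convention that quotechar and doublequote describe is the common CSV one, in which a doubled quote character inside a quoted field stands for one literal quote. The sketch below shows that convention with Python's standard csv module rather than pylibcudf; treating the two as equivalent here is an assumption.

```python
import csv
import io

# One record whose second field contains a delimiter and embedded quotes.
raw = 'id,comment\n1,"She said ""hi"", then left"\n'

# Standard-library stand-in for the quoting convention, not pylibcudf itself.
rows = list(csv.reader(io.StringIO(raw), quotechar='"', doublequote=True))
print(rows[1])
```

The quoted field keeps its embedded comma, and each doubled quote collapses to a single quote character.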

parse_dates : list, default None

Integer column indices or string column names of the columns to read as datetime.

parse_hex : list, default None

Integer column indices or string column names of the columns to read as hexadecimal.

dtypes : Union[Dict[str, DataType], List[DataType]], default None

A list of data types or a dictionary mapping column names to a DataType.

true_values : List[str], default None

A list of additional values to recognize as True.

false_values : List[str], default None

A list of additional values to recognize as False.

na_values : List[str], default None

A list of additional values to recognize as null.

keep_default_na : bool, default True

Whether to keep the built-in default N/A values.

na_filter : bool, default True

Whether to detect missing values. Setting this to False can improve performance.

dayfirst : bool, default False

If True, interpret dates as being in the DD/MM format.
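The effect of dayfirst is easiest to see on a date whose day and month are both 12 or less. The sketch below uses datetime.strptime as a stand-in for the two interpretations; the format strings are illustrative assumptions, since the date formats the reader accepts are not listed here.

```python
from datetime import datetime

ambiguous = "03/04/2021"

# dayfirst=False: the first number is taken as the month (MM/DD).
month_first = datetime.strptime(ambiguous, "%m/%d/%Y")
# dayfirst=True: the first number is taken as the day (DD/MM).
day_first = datetime.strptime(ambiguous, "%d/%m/%Y")

print(month_first.date(), day_first.date())
```

The same text yields 4 March 2021 under the month-first reading and 3 April 2021 under the day-first reading.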

Returns:
TableWithMetadata

The Table and its corresponding metadata (column names) that were read in.
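A minimal usage sketch follows, assuming pylibcudf is installed on a machine with a CUDA-capable GPU and that the installed version exposes the keyword-argument signature shown above. The helpers write_sample_csv and read_sample are hypothetical names introduced for illustration; only SourceInfo, read_csv, DataType, and TypeId come from pylibcudf.

```python
import os
import tempfile

SAMPLE = "id,price\n1,2.5\n2,3.75\n"


def write_sample_csv(text=SAMPLE):
    """Write text to a temporary .csv file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as f:
        f.write(text)
    return path


def read_sample(path):
    """Read the sample file with pylibcudf (needs a GPU at call time)."""
    # Imported lazily so this module loads even where pylibcudf is absent.
    import pylibcudf as plc

    source = plc.io.SourceInfo([path])
    return plc.io.csv.read_csv(
        source,
        delimiter=",",
        header=0,  # the first row holds the column names
        dtypes={
            "id": plc.DataType(plc.TypeId.INT64),
            "price": plc.DataType(plc.TypeId.FLOAT64),
        },
    )


path = write_sample_csv()
# On a GPU machine with pylibcudf installed:
# result = read_sample(path)  # a TableWithMetadata
```

The dtypes argument is given here in its dictionary form, keyed by column name; a list of DataType objects in column order is the other accepted form.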