CSV
- pylibcudf.io.csv.read_csv(SourceInfo source_info, *, compression_type compression=compression_type.AUTO, size_t byte_range_offset=0, size_t byte_range_size=0, list col_names=None, unicode prefix=u'', bool mangle_dupe_cols=True, list usecols=None, size_type nrows=-1, size_type skiprows=0, size_type skipfooter=0, size_type header=0, unicode lineterminator=u'\n', unicode delimiter=None, unicode thousands=None, unicode decimal=u'.', unicode comment=None, bool delim_whitespace=False, bool skipinitialspace=False, bool skip_blank_lines=True, quote_style quoting=quote_style.MINIMAL, unicode quotechar=u'"', bool doublequote=True, list parse_dates=None, list parse_hex=None, dtypes=None, list true_values=None, list false_values=None, list na_values=None, bool keep_default_na=True, bool na_filter=True, bool dayfirst=False)
Reads a CSV file into a TableWithMetadata.
- Parameters:
- source_info : SourceInfo
The SourceInfo to read the CSV file from.
- compression : compression_type, default CompressionType.AUTO
The compression format of the CSV source.
- byte_range_offset : size_t, default 0
Number of bytes to skip from the start of the source.
- byte_range_size : size_t, default 0
Number of bytes to read. The default of 0 reads all bytes.
- col_names : list, default None
The column names to use.
- prefix : str, default ‘’
The prefix to apply to the column names.
- mangle_dupe_cols : bool, default True
If True, rename duplicate column names.
- usecols : list, default None
The string column names or integer column indices of the columns to read.
- nrows : size_type, default -1
The number of rows to read.
- skiprows : size_type, default 0
The number of rows to skip from the start before reading.
- skipfooter : size_type, default 0
The number of rows to skip from the end.
- header : size_type, default 0
The index of the row that will be used for header names. Pass -1 to use default column names.
- lineterminator : str, default ‘\n’
The character used to determine the end of a line.
- delimiter : str, default “,”
The character used to separate fields in a row.
- thousands : str, default None
The character used as the thousands separator. Cannot match delimiter.
- decimal : str, default ‘.’
The character used as the decimal separator. Cannot match delimiter.
- comment : str, default None
The character used to identify the start of a comment line. Comment lines are skipped by the reader.
- delim_whitespace : bool, default False
If True, treat whitespace as the field delimiter.
- skipinitialspace : bool, default False
If True, skip whitespace after the delimiter.
- skip_blank_lines : bool, default True
If True, ignore empty lines (otherwise line values are parsed as null).
- quoting : QuoteStyle, default QuoteStyle.MINIMAL
The quoting style used in the input CSV data. One of { QuoteStyle.MINIMAL, QuoteStyle.ALL, QuoteStyle.NONNUMERIC, QuoteStyle.NONE }.
- quotechar : str, default ‘"’
The character used to indicate quoting.
- doublequote : bool, default True
If True, a quote inside a value is double-quoted.
- parse_dates : list, default None
A list of integer column indices or string column names of columns to read as datetime.
- parse_hex : list, default None
A list of integer column indices or string column names of columns to read as hexadecimal.
- dtypes : Union[Dict[str, DataType], List[DataType]], default None
A list of data types or a dictionary mapping column names to a DataType.
- true_values : List[str], default None
A list of additional values to recognize as True.
- false_values : List[str], default None
A list of additional values to recognize as False.
- na_values : List[str], default None
A list of additional values to recognize as null.
- keep_default_na : bool, default True
Whether to keep the built-in default N/A values.
- na_filter : bool, default True
Whether to detect missing values. If False, can improve performance.
- dayfirst : bool, default False
If True, interpret dates as being in the DD/MM format.
- Returns:
- TableWithMetadata
The Table and its corresponding metadata (column names) that were read in.
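The parameters above can be combined as in the following minimal sketch. It assumes the keyword-argument signature documented on this page, that pylibcudf is installed with a CUDA-capable GPU available, and that `pylibcudf.io.SourceInfo` accepts a list of file paths. The sample data, `READ_CSV_KWARGS`, and `read_sample` are illustrative names, not part of the library.

```python
import os
import tempfile

# Sample input: semicolon-delimited, with "missing" standing in for null.
CSV_TEXT = "id;name;score\n1;alice;3.5\n2;bob;missing\n"

# Keyword arguments drawn from the parameter list above (illustrative choice).
READ_CSV_KWARGS = {
    "delimiter": ";",            # fields are separated by semicolons
    "header": 0,                 # the first row holds the column names
    "na_values": ["missing"],    # extra token to recognize as null
    "usecols": ["id", "score"],  # read only these columns
}


def read_sample():
    """Read CSV_TEXT with pylibcudf; return None if pylibcudf is not installed."""
    try:
        import pylibcudf as plc
    except ImportError:
        return None

    # Write the sample to a temporary file so SourceInfo can be given a path.
    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
        f.write(CSV_TEXT)
        path = f.name
    try:
        # Assumption: SourceInfo takes a list of sources, here one file path.
        source = plc.io.SourceInfo([path])
        return plc.io.csv.read_csv(source, **READ_CSV_KWARGS)
    finally:
        os.remove(path)
```

Under those assumptions, calling `read_sample()` should return a TableWithMetadata holding the `id` and `score` columns. Without pylibcudf installed it returns None.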