CSV

class pylibcudf.io.csv.CsvWriterOptions

The settings to use for write_csv

For details, see cudf::io::csv_writer_options

Methods

builder(SinkInfo sink, Table table)

Create a CsvWriterOptionsBuilder object

static builder(SinkInfo sink, Table table)

Create a CsvWriterOptionsBuilder object

For details, see cudf::io::csv_writer_options::builder()

Parameters:

sink (SinkInfo): The sink used for writer output
table (Table): Table to be written to output

Returns:

CsvWriterOptionsBuilder: Builder to build CsvWriterOptions

class pylibcudf.io.csv.CsvWriterOptionsBuilder

Builder to build options for write_csv

For details, see cudf::io::csv_writer_options_builder

Methods

`build`(self)	Create a CsvWriterOptions object
`false_value`(self, unicode val)	Sets string used for values == 0
`include_header`(self, bool val)	Enables or disables writing headers to the CSV output.
`inter_column_delimiter`(self, unicode delim)	Sets character used for separating column values.
`line_terminator`(self, unicode term)	Sets character used for separating lines.
`na_rep`(self, unicode val)	Sets string used for null entries.
`names`(self, list names)	Sets optional column names.
`rows_per_chunk`(self, int val)	Sets maximum number of rows to process for each file write.
`true_value`(self, unicode val)	Sets string used for values != 0

build(self) → CsvWriterOptions

Create a CsvWriterOptions object

false_value(self, unicode val) → CsvWriterOptionsBuilder

Sets string used for values == 0

Parameters:

val (str): String to represent values == 0

Returns:

CsvWriterOptionsBuilder: Builder to build CsvWriterOptions

include_header(self, bool val) → CsvWriterOptionsBuilder

Enables or disables writing headers to the CSV output.

Parameters:

val (bool): True to write headers, False to omit them

Returns:

CsvWriterOptionsBuilder: Builder to build CsvWriterOptions

inter_column_delimiter(self, unicode delim) → CsvWriterOptionsBuilder

Sets character used for separating column values.

Parameters:

delim (str): Character to delimit column values

Returns:

CsvWriterOptionsBuilder: Builder to build CsvWriterOptions

line_terminator(self, unicode term) → CsvWriterOptionsBuilder

Sets character used for separating lines.

Parameters:

term (str): Character to represent line termination

Returns:

CsvWriterOptionsBuilder: Builder to build CsvWriterOptions

na_rep(self, unicode val) → CsvWriterOptionsBuilder

Sets string used for null entries.

Parameters:

val (str): String to represent null value

Returns:

CsvWriterOptionsBuilder: Builder to build CsvWriterOptions

names(self, list names) → CsvWriterOptionsBuilder

Sets optional column names.

Parameters:

names (list[str]): Column names

Returns:

CsvWriterOptionsBuilder: Builder to build CsvWriterOptions

rows_per_chunk(self, int val) → CsvWriterOptionsBuilder

Sets maximum number of rows to process for each file write.

Parameters:

val (int): Number of rows per chunk

Returns:

CsvWriterOptionsBuilder: Builder to build CsvWriterOptions

true_value(self, unicode val) → CsvWriterOptionsBuilder

Sets string used for values != 0

Parameters:

val (str): String to represent values != 0

Returns:

CsvWriterOptionsBuilder: Builder to build CsvWriterOptions
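
Because every setter above returns the builder, the calls can be chained and finished with build(). The following is a minimal sketch of that pattern, not a recommended configuration: the helper name configure_csv_writer is hypothetical, and the concrete values ("NA", "true", "false", 100_000 rows) are arbitrary illustrative choices rather than library defaults. It assumes `builder` was obtained from CsvWriterOptions.builder(sink, table).

```python
def configure_csv_writer(builder, column_names):
    """Apply a fixed set of formatting choices and return the built options.

    `builder` is expected to be the object returned by
    pylibcudf.io.csv.CsvWriterOptions.builder(sink, table).
    All values below are illustrative, not library defaults.
    """
    return (
        builder
        .names(list(column_names))      # header labels, one per column
        .include_header(True)           # write the header row
        .inter_column_delimiter(",")    # field separator
        .line_terminator("\n")          # row separator
        .na_rep("NA")                   # text written for null entries
        .true_value("true")             # text written for values != 0
        .false_value("false")           # text written for values == 0
        .rows_per_chunk(100_000)        # upper bound on rows per file write
        .build()                        # produces a CsvWriterOptions object
    )
```

The helper takes the builder as an argument rather than creating it, so the same formatting choices can be reused for different sinks and tables.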

pylibcudf.io.csv.read_csv(SourceInfo source_info, *, compression_type compression=compression_type.AUTO, size_t byte_range_offset=0, size_t byte_range_size=0, list col_names=None, unicode prefix=u'', bool mangle_dupe_cols=True, list usecols=None, size_type nrows=-1, size_type skiprows=0, size_type skipfooter=0, size_type header=0, unicode lineterminator=u'\n', unicode delimiter=None, unicode thousands=None, unicode decimal=u'.', unicode comment=None, bool delim_whitespace=False, bool skipinitialspace=False, bool skip_blank_lines=True, quote_style quoting=quote_style.MINIMAL, unicode quotechar=u'"', bool doublequote=True, list parse_dates=None, list parse_hex=None, dtypes=None, list true_values=None, list false_values=None, list na_values=None, bool keep_default_na=True, bool na_filter=True, bool dayfirst=False)

Reads a CSV file into a TableWithMetadata.

For details, see read_csv().

Parameters:

source_info (SourceInfo): The SourceInfo to read the CSV file from.
compression (compression_type, default CompressionType.AUTO): The compression format of the CSV source.
byte_range_offset (size_t, default 0): Number of bytes to skip from source start.
byte_range_size (size_t, default 0): Number of bytes to read. By default, will read all bytes.
col_names (list, default None): The column names to use.
prefix (str, default ''): The prefix to apply to the column names.
mangle_dupe_cols (bool, default True): If True, rename duplicate column names.
usecols (list, default None): Specify the string column names/integer column indices of columns to be read.
nrows (size_type, default -1): The number of rows to read.
skiprows (size_type, default 0): The number of rows to skip from the start before reading.
skipfooter (size_type, default 0): The number of rows to skip from the end.
header (size_type, default 0): The index of the row that will be used for header names. Pass -1 to use default column names.
lineterminator (str, default '\n'): The character used to determine the end of a line.
delimiter (str, default ","): The character used to separate fields in a row.
thousands (str, default None): The character used as the thousands separator. Cannot match delimiter.
decimal (str, default '.'): The character used as the decimal separator. Cannot match delimiter.
comment (str, default None): The character used to identify the start of a comment line. Comment lines are skipped by the reader.
delim_whitespace (bool, default False): If True, treat whitespace as the field delimiter.
skipinitialspace (bool, default False): If True, skip whitespace after the delimiter.
skip_blank_lines (bool, default True): If True, ignore empty lines (otherwise line values are parsed as null).
quoting (QuoteStyle, default QuoteStyle.MINIMAL): The quoting style used in the input CSV data. One of { QuoteStyle.MINIMAL, QuoteStyle.ALL, QuoteStyle.NONNUMERIC, QuoteStyle.NONE }
quotechar (str, default '"'): The character used to indicate quoting.
doublequote (bool, default True): If True, a quote inside a value is double-quoted.
parse_dates (list, default None): A list of integer column indices/string column names of columns to read as datetime.
parse_hex (list, default None): A list of integer column indices/string column names of columns to read as hexadecimal.
dtypes (Union[Dict[str, DataType], List[DataType]], default None): A list of data types or a dictionary mapping column names to a DataType.
true_values (List[str], default None): A list of additional values to recognize as True.
false_values (List[str], default None): A list of additional values to recognize as False.
na_values (List[str], default None): A list of additional values to recognize as null.
keep_default_na (bool, default True): Whether to keep the built-in default N/A values.
na_filter (bool, default True): Whether to detect missing values. If False, can improve performance.
dayfirst (bool, default False): If True, interpret dates as being in the DD/MM format.

Returns:

TableWithMetadata: The Table and its corresponding metadata (column names) that were read in.
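
As an illustration of byte_range_offset and byte_range_size, the sketch below splits a file into consecutive byte ranges and issues one read_csv call per range. The helper names byte_ranges and read_csv_in_ranges are hypothetical. The sketch assumes the file has no header row (hence header=-1 with explicit col_names), that pylibcudf.io.types.SourceInfo accepts a list of file paths, and that a SourceInfo can be reused across calls. How rows that straddle a range boundary are assigned is not stated on this page, so verify that against the libcudf documentation before relying on this pattern.

```python
import os


def byte_ranges(total_bytes, chunk_bytes):
    """Split [0, total_bytes) into (offset, size) pairs of at most chunk_bytes.

    No pair has size 0, which matters because byte_range_size=0
    means "read all bytes" in read_csv.
    """
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    return [
        (offset, min(chunk_bytes, total_bytes - offset))
        for offset in range(0, total_bytes, chunk_bytes)
    ]


def read_csv_in_ranges(path, column_names, chunk_bytes):
    """Read a header-less CSV file as a list of TableWithMetadata, one per range."""
    # Imported lazily so this module can be loaded on machines without a GPU.
    import pylibcudf as plc

    source = plc.io.types.SourceInfo([str(path)])
    results = []
    for offset, size in byte_ranges(os.path.getsize(path), chunk_bytes):
        results.append(
            plc.io.csv.read_csv(
                source,
                byte_range_offset=offset,
                byte_range_size=size,
                col_names=list(column_names),
                header=-1,  # assumption: the file has no header row
            )
        )
    return results
```

For example, byte_ranges(10, 4) yields the pairs (0, 4), (4, 4), and (8, 2), which cover the ten bytes exactly once.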

pylibcudf.io.csv.write_csv(CsvWriterOptions options) → void

Write to CSV format.

The table to write, output paths, and options are encapsulated by the options object.

For details, see write_csv().

Parameters:

options (CsvWriterOptions): Settings for controlling writing behavior
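
Since the table and the output location both travel inside the options object, a write is a three-step sequence: create a sink, build the options, call write_csv. The following is a minimal sketch under stated assumptions: the helper name write_table_as_csv is hypothetical, `table` is an existing pylibcudf Table, and pylibcudf.io.types.SinkInfo is assumed to accept a list of output file paths.

```python
def write_table_as_csv(table, path, column_names):
    """Write `table` to the CSV file at `path` with a header row."""
    # Imported lazily so this module can be loaded on machines without a GPU.
    import pylibcudf as plc

    # Assumption: SinkInfo takes a list of sinks, here a single file path.
    sink = plc.io.types.SinkInfo([str(path)])

    options = (
        plc.io.csv.CsvWriterOptions.builder(sink, table)
        .names(list(column_names))  # one name per column in `table`
        .include_header(True)
        .build()
    )

    # write_csv takes only the options object and returns nothing.
    plc.io.csv.write_csv(options)
```

The helper passes column names explicitly because the Table argument shown in the builder signature carries no names of its own in this sketch.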