CSV

class pylibcudf.io.csv.CsvWriterOptions

The settings to use for write_csv

For details, see cudf::io::csv_writer_options

Methods

builder(SinkInfo sink, Table table)

Create a CsvWriterOptionsBuilder object

static builder(SinkInfo sink, Table table)

Create a CsvWriterOptionsBuilder object

For details, see cudf::io::csv_writer_options::builder()

Parameters:
sink: SinkInfo

The sink used for writer output

table: Table

Table to be written to output

Returns:
CsvWriterOptionsBuilder

Builder to build CsvWriterOptions
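Each builder setter returns the builder itself, so options are typically configured in one chained expression: CsvWriterOptions.builder(sink, table), then any setters such as include_header(False), then build(), with the result passed to write_csv. Because pylibcudf needs a GPU, the sketch below models only this builder pattern in pure Python. SketchCsvWriterOptions, SketchCsvWriterOptionsBuilder, and sketch_builder are hypothetical names, not part of pylibcudf, and the include_header=True default is an assumption.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


# Hypothetical stand-ins that model the builder pattern; NOT pylibcudf classes.
@dataclass(frozen=True)
class SketchCsvWriterOptions:
    sink: object
    table: object
    include_header: bool = True  # assumed default
    names: Optional[List[str]] = None


class SketchCsvWriterOptionsBuilder:
    def __init__(self, sink: object, table: object) -> None:
        self._opts = SketchCsvWriterOptions(sink=sink, table=table)

    def include_header(self, val: bool) -> "SketchCsvWriterOptionsBuilder":
        self._opts = replace(self._opts, include_header=val)
        return self  # returning the builder is what makes chaining possible

    def names(self, names: List[str]) -> "SketchCsvWriterOptionsBuilder":
        self._opts = replace(self._opts, names=list(names))
        return self

    def build(self) -> SketchCsvWriterOptions:
        # Freezing the result is a design choice of this sketch only.
        return self._opts


def sketch_builder(sink: object, table: object) -> SketchCsvWriterOptionsBuilder:
    return SketchCsvWriterOptionsBuilder(sink, table)


opts = sketch_builder("out.csv", [[1, 3], [2, 4]]).include_header(False).names(["a", "b"]).build()
```

The chained form keeps all settings next to each other and makes unset options fall back to their defaults.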

class pylibcudf.io.csv.CsvWriterOptionsBuilder

Builder to build options for write_csv

For details, see cudf::io::csv_writer_options_builder

Methods

build(self)

Create a CsvWriterOptions object

false_value(self, unicode val)

Sets the string used for values == 0.

include_header(self, bool val)

Enables or disables writing the header row to the CSV output.

inter_column_delimiter(self, unicode delim)

Sets the character used to separate column values.

line_terminator(self, unicode term)

Sets the string used to terminate lines.

na_rep(self, unicode val)

Sets the string used to represent null entries.

names(self, list names)

Sets optional column names.

rows_per_chunk(self, int val)

Sets the maximum number of rows to process for each file write.

true_value(self, unicode val)

Sets the string used for values != 0.

build(self) CsvWriterOptions

Create a CsvWriterOptions object

false_value(self, unicode val) CsvWriterOptionsBuilder

Sets the string used for values == 0.

Parameters:
val: str

String to represent values == 0

Returns:
CsvWriterOptionsBuilder

Builder to build CsvWriterOptions

include_header(self, bool val) CsvWriterOptionsBuilder

Enables or disables writing the header row to the CSV output.

Parameters:
val: bool

True to write the header row, False to omit it.

Returns:
CsvWriterOptionsBuilder

Builder to build CsvWriterOptions

inter_column_delimiter(self, unicode delim) CsvWriterOptionsBuilder

Sets the character used to separate column values.

Parameters:
delim: str

Character to delimit column values

Returns:
CsvWriterOptionsBuilder

Builder to build CsvWriterOptions

line_terminator(self, unicode term) CsvWriterOptionsBuilder

Sets the string used to terminate lines.

Parameters:
term: str

String used to mark the end of each line

Returns:
CsvWriterOptionsBuilder

Builder to build CsvWriterOptions
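To see how inter_column_delimiter and line_terminator shape the output, the following pure-Python sketch places the delimiter between fields and the terminator after every row, including the last. join_fields is a hypothetical helper, not a pylibcudf function, and it does not quote fields that contain the delimiter, which a production CSV writer needs to handle.

```python
from typing import List, Sequence


def join_fields(rows: List[Sequence[str]], delim: str = ",", term: str = "\n") -> str:
    """Hypothetical helper: delim between fields, term after every row (no quoting)."""
    return "".join(delim.join(row) + term for row in rows)


# Tab-separated values with Windows-style line endings.
tsv = join_fields([["x", "y"], ["1", "2"]], delim="\t", term="\r\n")
```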

na_rep(self, unicode val) CsvWriterOptionsBuilder

Sets the string used to represent null entries.

Parameters:
val: str

String to represent null value

Returns:
CsvWriterOptionsBuilder

Builder to build CsvWriterOptions
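na_rep controls how null entries appear in the output. In this hypothetical pure-Python model, format_column renders a column to strings and uses None to stand in for a null entry. The empty-string default is an assumption about the writer's default, not a value documented on this page.

```python
from typing import List, Optional, Sequence


def format_column(values: Sequence[Optional[object]], na_rep: str = "") -> List[str]:
    """Hypothetical model: None marks a null entry and is replaced by na_rep."""
    return [na_rep if v is None else str(v) for v in values]


formatted = format_column([1, None, 3], na_rep="NULL")  # ['1', 'NULL', '3']
```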

names(self, list names) CsvWriterOptionsBuilder

Sets optional column names.

Parameters:
names: list[str]

Column names

Returns:
CsvWriterOptionsBuilder

Builder to build CsvWriterOptions
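names supplies the column names written in the header row. The sketch below assumes, hypothetically, that the writer otherwise falls back to names already associated with the table, and that exactly one name must be given per column; header_line and table_names are illustrative only and are not part of pylibcudf.

```python
from typing import List, Optional, Sequence


def header_line(table_names: Sequence[str], names: Optional[List[str]] = None, delim: str = ",") -> str:
    """Hypothetical: prefer explicit names, else fall back to table_names."""
    chosen = list(table_names) if names is None else names
    if len(chosen) != len(table_names):
        # Assumed validation: one name per column.
        raise ValueError(f"expected {len(table_names)} names, got {len(chosen)}")
    return delim.join(chosen)


default_header = header_line(["x", "y"])              # 'x,y'
renamed_header = header_line(["x", "y"], ["a", "b"])  # 'a,b'
```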

rows_per_chunk(self, int val) CsvWriterOptionsBuilder

Sets the maximum number of rows to process for each file write.

Parameters:
val: int

Number of rows per chunk

Returns:
CsvWriterOptionsBuilder

Builder to build CsvWriterOptions

true_value(self, unicode val) CsvWriterOptionsBuilder

Sets the string used for values != 0.

Parameters:
val: str

String to represent values != 0

Returns:
CsvWriterOptionsBuilder

Builder to build CsvWriterOptions
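true_value and false_value map numeric truth values to text: values != 0 are written as true_value and values == 0 as false_value. The following pure-Python sketch applies that documented rule to a list of integers. format_bools is hypothetical, and its defaults "true" and "false" are assumptions rather than values documented here.

```python
from typing import List, Sequence


def format_bools(values: Sequence[int], true_value: str = "true", false_value: str = "false") -> List[str]:
    """Hypothetical model of the documented rule: != 0 -> true_value, == 0 -> false_value."""
    return [true_value if v != 0 else false_value for v in values]


flags = format_bools([1, 0, 5], true_value="Y", false_value="N")  # ['Y', 'N', 'Y']
```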

pylibcudf.io.csv.read_csv(SourceInfo source_info, *, compression_type compression=compression_type.AUTO, size_t byte_range_offset=0, size_t byte_range_size=0, list col_names=None, unicode prefix=u'', bool mangle_dupe_cols=True, list usecols=None, size_type nrows=-1, size_type skiprows=0, size_type skipfooter=0, size_type header=0, unicode lineterminator=u'\n', unicode delimiter=None, unicode thousands=None, unicode decimal=u'.', unicode comment=None, bool delim_whitespace=False, bool skipinitialspace=False, bool skip_blank_lines=True, quote_style quoting=quote_style.MINIMAL, unicode quotechar=u'"', bool doublequote=True, list parse_dates=None, list parse_hex=None, dtypes=None, list true_values=None, list false_values=None, list na_values=None, bool keep_default_na=True, bool na_filter=True, bool dayfirst=False)

Reads a CSV file into a TableWithMetadata.

For details, see read_csv().

Parameters:
source_info: SourceInfo

The SourceInfo to read the CSV file from.

compression: CompressionType, default CompressionType.AUTO

The compression format of the CSV source.

byte_range_offset: size_t, default 0

Number of bytes to skip from source start.

byte_range_size: size_t, default 0

Number of bytes to read. The default, 0, reads to the end of the source.

col_names: list, default None

The column names to use.

prefix: str, default ''

The prefix to apply to the column names.

mangle_dupe_cols: bool, default True

If True, rename duplicate column names.

usecols: list, default None

The columns to read, given as string column names or integer column indices.

nrows: size_type, default -1

The number of rows to read. The default, -1, reads all rows.

skiprows: size_type, default 0

The number of rows to skip from the start before reading.

skipfooter: size_type, default 0

The number of rows to skip from the end.

header: size_type, default 0

The index of the row that will be used for header names. Pass -1 to use default column names.

lineterminator: str, default '\n'

The character used to determine the end of a line.

delimiter: str, default ','

The character used to separate fields in a row.

thousands: str, default None

The character used as the thousands separator. Cannot match delimiter.

decimal: str, default '.'

The character used as the decimal separator. Cannot match delimiter.

comment: str, default None

The character that marks the start of a comment line; comment lines are skipped by the reader.

delim_whitespace: bool, default False

If True, treat whitespace as the field delimiter.

skipinitialspace: bool, default False

If True, skip whitespace after the delimiter.

skip_blank_lines: bool, default True

If True, skip empty lines; otherwise, empty lines are parsed as rows of null values.

quoting: QuoteStyle, default QuoteStyle.MINIMAL

The quoting style used in the input CSV data. One of QuoteStyle.MINIMAL, QuoteStyle.ALL, QuoteStyle.NONNUMERIC, or QuoteStyle.NONE.

quotechar: str, default '"'

The character used to indicate quoting.

doublequote: bool, default True

If True, two consecutive quote characters inside a quoted field are read as a single literal quote character.

parse_dates: list, default None

The columns to read as datetimes, given as integer column indices or string column names.

parse_hex: list, default None

The columns to read as hexadecimal values, given as integer column indices or string column names.

dtypes: Union[Dict[str, DataType], List[DataType]], default None

A list of data types or a dictionary mapping column names to a DataType.

true_values: List[str], default None

A list of additional values to recognize as True.

false_values: List[str], default None

A list of additional values to recognize as False.

na_values: List[str], default None

A list of additional values to recognize as null.

keep_default_na: bool, default True

Whether to keep the built-in default N/A values.

na_filter: bool, default True

Whether to detect missing values. Setting this to False can improve performance when the data contains no missing values.

dayfirst: bool, default False

If True, interpret dates as being in the DD/MM format.

Returns:
TableWithMetadata

The Table and its corresponding metadata (column names) that were read in.
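Several read_csv parameters interact: rows are tokenized using delimiter, quotechar, and doublequote, blank lines are dropped, skiprows and skipfooter trim the ends, header selects the row that supplies column names, and nrows caps the number of data rows. The pure-Python model below approximates that pipeline with Python's standard csv module. It is a hypothetical sketch, not pylibcudf's implementation; the order of the steps, counting header relative to the rows left after skiprows, and naming columns "0", "1", ... when header is -1 are all assumptions.

```python
import csv
import io
from typing import List, Tuple


def read_csv_sketch(
    text: str,
    *,
    delimiter: str = ",",
    quotechar: str = '"',
    doublequote: bool = True,
    skiprows: int = 0,
    skipfooter: int = 0,
    nrows: int = -1,
    header: int = 0,
) -> Tuple[List[str], List[List[str]]]:
    """Hypothetical pure-Python model of a few read_csv options."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=delimiter,
        quotechar=quotechar,
        doublequote=doublequote,
    )
    # csv.reader yields [] for an empty line; drop those (like skip_blank_lines=True).
    rows = [row for row in reader if row]
    rows = rows[skiprows:]
    if skipfooter > 0:
        rows = rows[:-skipfooter]
    if header >= 0:
        # Assumption: header indexes the rows remaining after skiprows.
        names = rows[header]
        rows = rows[header + 1:]
    else:
        # Assumption: positional names when there is no header row.
        names = [str(i) for i in range(len(rows[0]))] if rows else []
    if nrows >= 0:
        rows = rows[:nrows]
    return names, rows


sample = 'generated by tool\nname,quote\nx,"say ""hi"""\ny,plain\n-- end --\n'
names, rows = read_csv_sketch(sample, skiprows=1, skipfooter=1)
# names == ['name', 'quote']; rows == [['x', 'say "hi"'], ['y', 'plain']]
```

Note how doublequote turns the doubled quote characters inside the quoted field into single literal quotes in the parsed value.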

pylibcudf.io.csv.write_csv(CsvWriterOptions options) void

Write to CSV format.

The table to write, the output sink, and all writer settings are encapsulated in the options object.

For details, see write_csv().

Parameters:
options: CsvWriterOptions

Settings for controlling writing behavior
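write_csv receives everything through the options object: the sink, the table, and settings such as include_header and rows_per_chunk. The pure-Python model below shows one way a per-write row limit like rows_per_chunk can work: a hypothetical write_csv_sketch writes the header once to an io.StringIO sink, then formats and writes the rows chunk by chunk. It is not pylibcudf's implementation; the default chunk size (the maximum 32-bit signed integer) is an assumption, and the sketch always uses "," and "\n".

```python
import io
from typing import List, Sequence, TextIO


def write_csv_sketch(
    sink: TextIO,
    names: Sequence[str],
    rows: List[Sequence[str]],
    *,
    include_header: bool = True,
    rows_per_chunk: int = 2**31 - 1,  # assumed default: max 32-bit signed int
) -> None:
    """Hypothetical model of a chunked CSV write; not pylibcudf's implementation."""
    if rows_per_chunk <= 0:
        raise ValueError("rows_per_chunk must be positive")
    if include_header:
        sink.write(",".join(names) + "\n")
    # Format and write at most rows_per_chunk rows per write call.
    for start in range(0, len(rows), rows_per_chunk):
        chunk = rows[start:start + rows_per_chunk]
        sink.write("".join(",".join(row) + "\n" for row in chunk))


def render(rows_per_chunk: int, include_header: bool = True) -> str:
    buf = io.StringIO()
    data = [["1", "2"], ["3", "4"], ["5", "6"]]
    write_csv_sketch(buf, ["a", "b"], data,
                     include_header=include_header, rows_per_chunk=rows_per_chunk)
    return buf.getvalue()
```

For example, render(1), render(2), and render(100) all return "a,b\n1,2\n3,4\n5,6\n"; in this sketch the chunk size only changes how many write calls reach the sink, not the bytes written.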