JSON

class pylibcudf.io.json.JsonReaderOptions

The settings to use for read_json

For details, see cudf::io::json_reader_options

Methods

allow_nonnumeric_numbers(self, bool val)

allow_numeric_leading_zeros(self, bool val)

allow_unquoted_control_chars(self, bool val)

builder(SourceInfo source)

Create a JsonReaderOptionsBuilder object

enable_dayfirst(self, bool val)

enable_experimental(self, bool val)

enable_keep_quotes(self, bool keep_quotes)

Set whether the reader should keep quotes of string values.

enable_lines(self, bool val)

Set whether to read the file as one JSON object per line.

enable_mixed_types_as_string(self, ...)

Set whether to parse mixed types as a string column.

enable_normalize_single_quotes(self, bool val)

enable_normalize_whitespace(self, bool val)

enable_prune_columns(self, bool prune_columns)

Set whether to prune columns on read, selected based on the set_dtypes option.

set_byte_range_offset(self, size_t offset)

Set number of bytes to skip from source start.

set_byte_range_size(self, size_t size)

Set number of bytes to read.

set_delimiter(self, unicode val)

set_dtypes(self, list types)

Set data types for columns to be read.

set_na_values(self, list vals)

set_strict_validation(self, bool val)

allow_nonnumeric_numbers(self, bool val) void
allow_numeric_leading_zeros(self, bool val) void
allow_unquoted_control_chars(self, bool val) void
static builder(SourceInfo source)

Create a JsonReaderOptionsBuilder object

For details, see cudf::io::json_reader_options::builder()

Parameters:
source: SourceInfo

The source to read the JSON file from.

Returns:
JsonReaderOptionsBuilder

Builder to build JsonReaderOptions

enable_dayfirst(self, bool val) void
enable_experimental(self, bool val) void
enable_keep_quotes(self, bool keep_quotes) void

Set whether the reader should keep quotes of string values.

Parameters:
keep_quotes: bool

Boolean value to indicate whether the reader should keep quotes of string values

Returns:
None
enable_lines(self, bool val) void

Set whether to read the file as one JSON object per line.

Parameters:
val: bool

Boolean value to enable/disable the option to read each line as a JSON object

Returns:
None
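To make the "one JSON object per line" layout concrete, here is a small sketch that uses only Python's standard json module rather than pylibcudf. The sample text and variable names are made up for illustration; it shows the input shape this option expects, not how the reader parses it.

```python
import json

# Hypothetical sample input: one complete JSON object on each line.
json_lines_text = '{"a": 1, "b": "x"}\n{"a": 2, "b": "y"}\n'

# Parse each non-empty line independently as its own JSON document.
records = [
    json.loads(line)
    for line in json_lines_text.splitlines()
    if line.strip()
]

print(records)
```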
enable_mixed_types_as_string(self, bool mixed_types_as_string) void

Set whether to parse mixed types as a string column. Also allows a struct to be forced to read as a string column via the schema.

Parameters:
mixed_types_as_string: bool

Boolean value to enable/disable parsing mixed types as a string column

Returns:
None
enable_normalize_single_quotes(self, bool val) void
enable_normalize_whitespace(self, bool val) void
enable_prune_columns(self, bool prune_columns) void

Set whether to prune columns on read, selected based on the set_dtypes option.

Parameters:
prune_columns: bool

When true and dtypes have been set with set_dtypes, the reader returns only the columns named there. When false, all columns are returned regardless of set_dtypes.

Returns:
None
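The interaction between prune_columns and set_dtypes can be modelled in a few lines of plain Python. This is only a sketch of the rule as described above, not the reader's implementation; select_columns is a hypothetical helper, and keeping the file's column order is an arbitrary choice made for the example.

```python
def select_columns(all_columns, dtype_columns, prune_columns):
    """Model which column names would be returned, per the rule above.

    all_columns   -- names of every column present in the input
    dtype_columns -- names passed via set_dtypes, or None if it was not called
    prune_columns -- the flag given to enable_prune_columns
    """
    if prune_columns and dtype_columns is not None:
        wanted = set(dtype_columns)
        return [name for name in all_columns if name in wanted]
    # Pruning disabled, or no dtypes were set: everything is returned.
    return list(all_columns)


pruned = select_columns(["a", "b", "c"], ["a", "c"], prune_columns=True)
unpruned = select_columns(["a", "b", "c"], ["a", "c"], prune_columns=False)
```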
set_byte_range_offset(self, size_t offset) void

Set number of bytes to skip from source start.

Parameters:
offset: size_t

Number of bytes of offset

Returns:
None
set_byte_range_size(self, size_t size) void

Set number of bytes to read.

Parameters:
size: size_t

Number of bytes to read

Returns:
None
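One way to use the offset and size options together is to split a source into consecutive byte ranges. The sketch below only does the arithmetic; byte_ranges is a hypothetical helper, and each (offset, size) pair it yields could be passed to set_byte_range_offset and set_byte_range_size. How the reader treats records that straddle a range boundary is not described here, so check the underlying cudf documentation before relying on it.

```python
def byte_ranges(total_size, range_size):
    """Yield (offset, size) pairs that together cover total_size bytes."""
    if range_size <= 0:
        raise ValueError("range_size must be positive")
    for offset in range(0, total_size, range_size):
        # The final range is shortened so it does not run past the end.
        yield offset, min(range_size, total_size - offset)


ranges = list(byte_ranges(total_size=250, range_size=100))
print(ranges)
```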
set_delimiter(self, unicode val) void
set_dtypes(self, list types) void

Set data types for columns to be read.

Parameters:
types: list

Either a list of dtypes, or a list of (column name, dtype, list of child tuples) tuples, which allows a nested column hierarchy to be described

Returns:
None
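The nested form of types is easier to see with an example. The sketch below assumes each entry is a (name, dtype, children) tuple, which is one reading of the description above, and uses placeholder strings where real code would pass pylibcudf dtype objects. The flatten helper is hypothetical and exists only to show the hierarchy.

```python
# Placeholder strings stand in for real dtype objects in this sketch.
nested_dtypes = [
    ("id", "INT64", []),
    ("user", "STRUCT", [
        ("name", "STRING", []),
        ("age", "INT32", []),
    ]),
]


def flatten(dtypes, prefix=""):
    """Return dotted column paths paired with their placeholder dtypes."""
    paths = []
    for name, dtype, children in dtypes:
        path = f"{prefix}{name}"
        paths.append((path, dtype))
        # Recurse into child columns, extending the dotted prefix.
        paths.extend(flatten(children, prefix=path + "."))
    return paths


flat = flatten(nested_dtypes)
```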
set_na_values(self, list vals) void
set_strict_validation(self, bool val) void
class pylibcudf.io.json.JsonWriterOptions

The settings to use for write_json

For details, see cudf::io::json_writer_options

Methods

builder(SinkInfo sink, Table table)

Create a JsonWriterOptionsBuilder object

set_compression(self, compression_type comptype)

Sets compression type to be used

set_false_value(self, unicode val)

Sets string used for values == 0

set_rows_per_chunk(self, size_type val)

Sets the maximum number of rows to process for each file write.

set_true_value(self, unicode val)

Sets string used for values != 0

static builder(SinkInfo sink, Table table)

Create a JsonWriterOptionsBuilder object

Parameters:
sink: SinkInfo

The sink used for writer output

table: Table

Table to be written to output

Returns:
JsonWriterOptionsBuilder

Builder to build JsonWriterOptions

set_compression(self, compression_type comptype) void

Sets compression type to be used

Parameters:
comptype: CompressionType

Compression type for sink

Returns:
None
set_false_value(self, unicode val) void

Sets string used for values == 0

Parameters:
val: str

String to represent values == 0

Returns:
None
set_rows_per_chunk(self, size_type val) void

Sets the maximum number of rows to process for each file write.

Parameters:
val: size_type

Number of rows per chunk

Returns:
None
set_true_value(self, unicode val) void

Sets string used for values != 0

Parameters:
val: str

String to represent values != 0

Returns:
None
pylibcudf.io.json.chunked_read_json(JsonReaderOptions options, int chunk_size=100000000) tuple

Reads chunks of a JSON file into a TableWithMetadata.

Parameters:
options: JsonReaderOptions

Settings for controlling reading behavior

chunk_size: int, default 100_000_000 bytes

The number of bytes to be read in chunks. The chunk_size should be set to at least row_size.

Returns:
tuple

A tuple of (columns, column_name, child_names)
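A call to chunked_read_json might look like the sketch below. It is not taken from the library's own examples: read_in_chunks is a hypothetical wrapper, constructing SourceInfo from a list containing a file path and calling build() on the builder are assumptions about the wider pylibcudf API, and enabling line mode is a guess that the input holds one object per line. The import is done inside the function because pylibcudf needs a GPU-enabled installation.

```python
def read_in_chunks(path, chunk_size=100_000_000):
    """Read a JSON file in byte chunks and return the documented tuple."""
    import pylibcudf as plc  # imported lazily; needs a GPU-enabled install

    # Assumption: SourceInfo accepts a list of file paths.
    source = plc.io.SourceInfo([path])
    # Assumption: the builder exposes build() to produce JsonReaderOptions.
    options = plc.io.json.JsonReaderOptions.builder(source).build()
    # Assumption: the input is laid out as one JSON object per line.
    options.enable_lines(True)

    columns, column_names, child_names = plc.io.json.chunked_read_json(
        options, chunk_size=chunk_size
    )
    return columns, column_names, child_names
```

The three returned values follow the tuple described above; a smaller chunk_size lowers the amount read per step, but it should still be at least as large as one row.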

pylibcudf.io.json.read_json(JsonReaderOptions options) TableWithMetadata

Read from JSON format.

The source to read from and options are encapsulated by the options object.

For details, see cudf::io::read_json().

Parameters:
options: JsonReaderOptions

Settings for controlling reading behavior

Returns:
TableWithMetadata

The Table and its corresponding metadata (column names) that were read in.
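As a rough end-to-end sketch, the options object can be built from a source, adjusted with the setters documented above, and handed to read_json. The wrapper name read_json_lines is hypothetical, and both the SourceInfo constructor taking a list of paths and the builder's build() method are assumptions about the wider pylibcudf API rather than something stated on this page.

```python
def read_json_lines(path, keep_quotes=False):
    """Read a line-delimited JSON file and return a TableWithMetadata."""
    import pylibcudf as plc  # imported lazily; needs a GPU-enabled install

    # Assumption: SourceInfo accepts a list of file paths.
    source = plc.io.SourceInfo([path])
    # Assumption: the builder exposes build() to produce JsonReaderOptions.
    options = plc.io.json.JsonReaderOptions.builder(source).build()

    # Setters documented on JsonReaderOptions.
    options.enable_lines(True)
    options.enable_keep_quotes(keep_quotes)

    return plc.io.json.read_json(options)
```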

pylibcudf.io.json.write_json(JsonWriterOptions options) void

Writes a set of columns to JSON format.

Parameters:
options: JsonWriterOptions

Settings for controlling writing behavior

Returns:
None
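Writing follows the same pattern as reading. The sketch below assumes an existing pylibcudf Table is passed in, that SinkInfo can be constructed from a list containing an output path, and that the builder exposes build(); none of these is confirmed on this page. write_json_file is a hypothetical wrapper name.

```python
def write_json_file(table, path, true_value="true", false_value="false"):
    """Write a pylibcudf Table to a JSON file with custom boolean strings."""
    import pylibcudf as plc  # imported lazily; needs a GPU-enabled install

    # Assumption: SinkInfo accepts a list of output paths.
    sink = plc.io.SinkInfo([path])
    # Assumption: the builder exposes build() to produce JsonWriterOptions.
    options = plc.io.json.JsonWriterOptions.builder(sink, table).build()

    # Setters documented on JsonWriterOptions.
    options.set_true_value(true_value)
    options.set_false_value(false_value)

    plc.io.json.write_json(options)
```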