JSON

pylibcudf.io.json.chunked_read_json(SourceInfo source_info, list dtypes=None, compression_type compression=compression_type.AUTO, bool keep_quotes=False, bool mixed_types_as_string=False, bool prune_columns=False, json_recovery_mode_t recovery_mode=json_recovery_mode_t.FAIL, int chunk_size=100000000) → tuple

Reads a JSON file in chunks.

Parameters:
source_info: SourceInfo

The SourceInfo object to read the JSON file from.

dtypes: list, default None

Set data types for the columns in the JSON file.

Each element of the list is a tuple of the form (column_name, column_dtype, list of child dtypes). The list of child dtypes is empty if the column is not a nested type (list or struct dtype); otherwise each of its elements has the same form, (column_child_name, column_child_type, list of grandchild dtypes).

compression: CompressionType, default CompressionType.AUTO

The compression format of the JSON source.

keep_quotes: bool, default False

Whether the reader should keep quotes of string values.

mixed_types_as_string: bool, default False

If True, mixed type columns are returned as string columns. If False, parsing mixed type columns will throw an error.

prune_columns: bool, default False

Whether to only read columns specified in dtypes.

recovery_mode: JSONRecoveryMode, default JSONRecoveryMode.FAIL

Whether to raise an error or set corresponding values to null when encountering an invalid JSON line.

chunk_size: int, default 100_000_000

The number of bytes to read per chunk. chunk_size should be at least as large as the size of a single row.

Returns:
tuple

A tuple of (columns, column_name, child_names)
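
The nested dtypes format described above may be easier to follow with a concrete value. The following is a minimal sketch of the shape only: the column names are hypothetical, plain strings such as "INT64" stand in for the data type objects a real call would take, and flatten_names is an illustrative helper, not part of pylibcudf.

```python
# Shape of a `dtypes` argument: nested (name, dtype, children) triples.
# Strings stand in for real dtype objects; the column names are made up.
dtypes = [
    ("id", "INT64", []),        # flat column: empty child list
    ("name", "STRING", []),
    ("address", "STRUCT", [     # nested column: one triple per child
        ("city", "STRING", []),
        ("zip", "INT32", []),
    ]),
]


def flatten_names(spec, prefix=""):
    """Return dotted paths for every column and nested child in the spec."""
    paths = []
    for name, _dtype, children in spec:
        path = f"{prefix}{name}"
        paths.append(path)
        # Children use the same triple format, so recurse into them.
        paths.extend(flatten_names(children, prefix=path + "."))
    return paths


print(flatten_names(dtypes))
```

Walking the structure recursively works because every child entry uses the same three-element format as a top-level column.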

pylibcudf.io.json.read_json(SourceInfo source_info, list dtypes=None, compression_type compression=compression_type.AUTO, bool lines=False, size_t byte_range_offset=0, size_t byte_range_size=0, bool keep_quotes=False, bool mixed_types_as_string=False, bool prune_columns=False, json_recovery_mode_t recovery_mode=json_recovery_mode_t.FAIL) → TableWithMetadata

Reads a JSON file into a TableWithMetadata.

Parameters:
source_info: SourceInfo

The SourceInfo object to read the JSON file from.

dtypes: list, default None

Set data types for the columns in the JSON file.

Each element of the list is a tuple of the form (column_name, column_dtype, list of child dtypes). The list of child dtypes is empty if the column is not a nested type (list or struct dtype); otherwise each of its elements has the same form, (column_child_name, column_child_type, list of grandchild dtypes).

compression: CompressionType, default CompressionType.AUTO

The compression format of the JSON source.

lines: bool, default False

Whether to read the source in the JSON lines format.

byte_range_offset: size_t, default 0

Number of bytes to skip from source start.

byte_range_size: size_t, default 0

Number of bytes to read. The default of 0 reads all bytes.

keep_quotes: bool, default False

Whether the reader should keep quotes of string values.

mixed_types_as_string: bool, default False

If True, mixed type columns are returned as string columns. If False, parsing mixed type columns will throw an error.

prune_columns: bool, default False

Whether to only read columns specified in dtypes.

recovery_mode: JSONRecoveryMode, default JSONRecoveryMode.FAIL

Whether to raise an error or set corresponding values to null when encountering an invalid JSON line.

Returns:
TableWithMetadata

The Table and its corresponding metadata (column names) that were read in.
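
The byte_range_offset and byte_range_size parameters allow a large source to be read in pieces. The following is a minimal sketch of computing the (offset, size) pairs for such a read. The helper byte_ranges is hypothetical and not part of pylibcudf; each pair it returns would be passed as byte_range_offset and byte_range_size to a separate read_json call. How the reader treats a record that straddles a range boundary is not covered here.

```python
def byte_ranges(total_bytes, range_bytes):
    """Split [0, total_bytes) into consecutive (offset, size) pairs."""
    if range_bytes <= 0:
        raise ValueError("range_bytes must be positive")
    ranges = []
    offset = 0
    while offset < total_bytes:
        # The last range may be shorter than range_bytes.
        size = min(range_bytes, total_bytes - offset)
        ranges.append((offset, size))
        offset += size
    return ranges


print(byte_ranges(250, 100))
```

Every size produced is strictly positive, which matters because a byte_range_size of 0 is the documented default and means the whole source.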

pylibcudf.io.json.write_json(SinkInfo sink_info, TableWithMetadata table_w_meta, unicode na_rep=u'', bool include_nulls=False, bool lines=False, size_type rows_per_chunk=numeric_limits[size_type].max(), unicode true_value=u'true', unicode false_value=u'false') → void

Writes a Table to JSON format.

Parameters:
sink_info: SinkInfo

The SinkInfo object to write the JSON to.

table_w_meta: TableWithMetadata

The TableWithMetadata object containing the Table to write.

na_rep: str, default “”

The string representation for null values.

include_nulls: bool, default False

Whether to write null values as ‘null’.

lines: bool, default False

If True, write output in the JSON lines format.

rows_per_chunk: size_type, defaults to length of the input table

The maximum number of rows to write at a time.

true_value: str, default “true”

The string representation for values != 0 in INT8 types.

false_value: str, default “false”

The string representation for values == 0 in INT8 types.
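
The difference that the lines parameter selects between can be illustrated with the standard library alone. This is a sketch of the two layouts, not of write_json itself: the sample rows and both helper functions are hypothetical, the non-lines layout is assumed to be a single JSON array of records, and the exact whitespace the writer emits may differ.

```python
import json

# Made-up sample rows; None plays the role of a null value.
rows = [{"a": 1, "b": True}, {"a": None, "b": False}]


def to_json_array(records):
    """One JSON document: an array holding every record."""
    return json.dumps(records, separators=(",", ":"))


def to_json_lines(records):
    """JSON lines: one complete JSON object per line."""
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records)


print(to_json_array(rows))
print(to_json_lines(rows))
```

In the JSON lines layout each line parses on its own, which is what makes that format convenient for reading a source in chunks or byte ranges.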