JSON
- pylibcudf.io.json.chunked_read_json(SourceInfo source_info, list dtypes=None, compression_type compression=compression_type.AUTO, bool keep_quotes=False, bool mixed_types_as_string=False, bool prune_columns=False, json_recovery_mode_t recovery_mode=json_recovery_mode_t.FAIL, int chunk_size=100000000) → tuple
Reads a JSON file in chunks of chunk_size bytes and returns the result as a tuple.
- Parameters:
- source_info: SourceInfo
The SourceInfo object to read the JSON file from.
- dtypes: list, default None
Set data types for the columns in the JSON file.
Each element of the list has the format (column_name, column_dtype, list of child dtypes), where the list of child dtypes is an empty list if the child is not a nested type (list or struct dtype), and is of format (column_child_name, column_child_type, list of grandchild dtypes).
- compression: CompressionType, default CompressionType.AUTO
The compression format of the JSON source.
- keep_quotes: bool, default False
Whether the reader should keep quotes of string values.
- mixed_types_as_string: bool, default False
If True, mixed type columns are returned as string columns. If False, parsing mixed type columns will throw an error.
- prune_columns: bool, default False
Whether to only read columns specified in dtypes.
- recovery_mode: JSONRecoveryMode, default JSONRecoveryMode.FAIL
Whether to raise an error or set corresponding values to null when encountering an invalid JSON line.
- chunk_size: int, default 100_000_000 bytes
The number of bytes to read in each chunk. chunk_size should be set to at least row_size.
- Returns:
- tuple
A tuple of (columns, column_name, child_names)
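The nested dtypes format described above can be sketched in plain Python. The helper `dtype_spec` and the string placeholders such as `"INT64"` are hypothetical stand-ins used only to show the shape of the list; in real use each dtype entry would be a pylibcudf data type object rather than a string.

```python
# Hypothetical helper, not part of pylibcudf: builds one entry of the
# (column_name, column_dtype, list of child dtypes) format.
def dtype_spec(name, dtype, children=None):
    return (name, dtype, list(children or []))


# String placeholders stand in for real pylibcudf data type objects.
dtypes = [
    # A non-nested column has an empty list of child dtypes.
    dtype_spec("id", "INT64"),
    # A struct column lists its children in the same three-part format.
    dtype_spec(
        "point",
        "STRUCT",
        [dtype_spec("x", "FLOAT64"), dtype_spec("y", "FLOAT64")],
    ),
]
```

Each child entry repeats the same three-part shape, so deeper nesting is expressed by filling in the child's own list of grandchild dtypes.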
- pylibcudf.io.json.read_json(SourceInfo source_info, list dtypes=None, compression_type compression=compression_type.AUTO, bool lines=False, size_t byte_range_offset=0, size_t byte_range_size=0, bool keep_quotes=False, bool mixed_types_as_string=False, bool prune_columns=False, json_recovery_mode_t recovery_mode=json_recovery_mode_t.FAIL) → TableWithMetadata
Reads a JSON file into a TableWithMetadata.
- Parameters:
- source_info: SourceInfo
The SourceInfo object to read the JSON file from.
- dtypes: list, default None
Set data types for the columns in the JSON file.
Each element of the list has the format (column_name, column_dtype, list of child dtypes), where the list of child dtypes is an empty list if the child is not a nested type (list or struct dtype), and is of format (column_child_name, column_child_type, list of grandchild dtypes).
- compression: CompressionType, default CompressionType.AUTO
The compression format of the JSON source.
- byte_range_offset: size_t, default 0
Number of bytes to skip from source start.
- byte_range_size: size_t, default 0
Number of bytes to read. The default of 0 reads all bytes.
- keep_quotes: bool, default False
Whether the reader should keep quotes of string values.
- mixed_types_as_string: bool, default False
If True, mixed type columns are returned as string columns. If False, parsing mixed type columns will throw an error.
- prune_columns: bool, default False
Whether to only read columns specified in dtypes.
- recovery_mode: JSONRecoveryMode, default JSONRecoveryMode.FAIL
Whether to raise an error or set corresponding values to null when encountering an invalid JSON line.
- Returns:
- TableWithMetadata
The Table and its corresponding metadata (column names) that were read in.
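As an illustration of how byte_range_offset and byte_range_size fit together, the sketch below splits a source of known size into consecutive (offset, size) pairs. The helper `byte_ranges` is hypothetical and not part of pylibcudf; each pair is the kind of value one might pass to a separate read_json call. How the reader treats a row that straddles a range boundary is not covered here, so treat this only as arithmetic on the two parameters.

```python
# Hypothetical helper, not part of pylibcudf: partitions total_size bytes
# into consecutive (byte_range_offset, byte_range_size) pairs.
def byte_ranges(total_size, range_size):
    if range_size <= 0:
        raise ValueError("range_size must be positive")
    ranges = []
    offset = 0
    while offset < total_size:
        # The last range may be shorter than range_size.
        size = min(range_size, total_size - offset)
        ranges.append((offset, size))
        offset += size
    return ranges


# A 250-byte source split into ranges of at most 100 bytes.
ranges = byte_ranges(250, 100)
```

The loop never emits a size of 0, which matters because a byte_range_size of 0 is the default that reads all bytes.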
- pylibcudf.io.json.write_json(SinkInfo sink_info, TableWithMetadata table_w_meta, unicode na_rep=u'', bool include_nulls=False, bool lines=False, size_type rows_per_chunk=numeric_limits[size_type].max(), unicode true_value=u'true', unicode false_value=u'false') → void
Writes a Table to JSON format.
- Parameters:
- sink_info: SinkInfo
The SinkInfo object to write the JSON to.
- table_w_meta: TableWithMetadata
The TableWithMetadata object containing the Table to write
- na_rep: str, default “”
The string representation for null values.
- include_nulls: bool, default False
Enables/Disables output of nulls as ‘null’.
- lines: bool, default False
If True, write output in the JSON lines format.
- rows_per_chunk: size_type, defaults to length of the input table
The maximum number of rows to write at a time.
- true_value: str, default “true”
The string representation for values != 0 in INT8 types.
- false_value: str, default “false”
The string representation for values == 0 in INT8 types.
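To make the lines option concrete, the sketch below uses Python's standard json module (not pylibcudf) to show the general difference between a single JSON array of records and the JSON lines format, where each record sits on its own line. The sample records are made up for illustration, and the exact whitespace and null handling of write_json's own output are not specified here.

```python
import json

# Made-up sample records for illustration only.
records = [{"a": 1, "b": None}, {"a": 2, "b": True}]

# One JSON document: an array containing every record.
as_array = json.dumps(records)

# JSON lines: one self-contained JSON object per line.
as_lines = "\n".join(json.dumps(record) for record in records)

# Each line of the JSON lines text parses on its own.
parsed_back = [json.loads(line) for line in as_lines.splitlines()]
```

Because every line is a complete JSON value, JSON lines output can be split and processed line by line, whereas the array form has to be parsed as a whole.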