cudf.read_json(path_or_buf, engine='auto', dtype=True, lines=False, compression='infer', byte_range=None, keep_quotes=False, *args, **kwargs)

Load a JSON dataset into a DataFrame.

path_or_buf : list, str, path object, or file-like object

Either JSON data in a str, a path to a file (a str, pathlib.Path, or py._path.local.LocalPath), a URL (including http, ftp, and S3 locations), or any object with a read() method (such as a file handle returned by the built-in open() function, or a StringIO). Multiple inputs may be provided as a list. Entries in the list may be of different input types, as long as each is a valid input type and all inputs share the same JSON schema.

engine : {‘auto’, ‘cudf’, ‘cudf_experimental’, ‘pandas’}, default ‘auto’

Parser engine to use. If ‘auto’ is passed, the engine will be automatically selected based on the other parameters.


orient : string

Indication of expected JSON string format (pandas engine only). Compatible JSON strings can be produced by to_json() with a corresponding orient value. The set of possible orients is:

  • 'split' : dict like {index -> [index], columns -> [columns], data -> [values]}

  • 'records' : list like [{column -> value}, ... , {column -> value}]

  • 'index' : dict like {index -> {column -> value}}

  • 'columns' : dict like {column -> {index -> value}}

  • 'values' : just the values array

The allowed and default values depend on the value of the typ parameter.

  • when typ == 'series',

    • allowed orients are {'split','records','index'}

    • default is 'index'

    • The Series index must be unique for orient 'index'.

  • when typ == 'frame',

    • allowed orients are {'split','records','index', 'columns','values', 'table'}

    • default is 'columns'

    • The DataFrame index must be unique for orients 'index' and 'columns'.

    • The DataFrame columns must be unique for orients 'index', 'columns', and 'records'.
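To make the orient layouts above concrete, the following sketch builds each one by hand for a tiny table using only the standard library. It does not call cudf or pandas; the table contents and the `layouts` name are illustrative assumptions, and the 'table' orient is left out because its layout is not described here.

```python
import json

# Hypothetical two-row table used only for illustration.
index = ["r0", "r1"]
columns = ["a", "b"]
values = [[1, "x"], [2, "y"]]

# Hand-built equivalents of each orient layout described above.
layouts = {
    "split": {"index": index, "columns": columns, "data": values},
    "records": [dict(zip(columns, row)) for row in values],
    "index": {idx: dict(zip(columns, row)) for idx, row in zip(index, values)},
    "columns": {
        col: {idx: row[j] for idx, row in zip(index, values)}
        for j, col in enumerate(columns)
    },
    "values": values,
}

# Print each layout as the JSON text a reader would expect to receive.
for orient, payload in layouts.items():
    print(f"{orient:8s} {json.dumps(payload)}")
```

Because 'index' and 'columns' key the data by index label, duplicate labels would overwrite each other in these dicts, which is why those orients require a unique index.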

typ : type of object to recover (series or frame), default ‘frame’

With cudf engine, only frame output is supported.

dtype : boolean or dict, default True

If True, infer dtypes. If a dict mapping column names to dtypes, use those dtypes. If False, do not infer dtypes at all. Applies only to the data.

convert_axes : boolean, default True

Try to convert the axes to the proper dtypes (pandas engine only).

convert_dates : boolean, default True

List of columns to parse for dates (pandas engine only). If True, try to parse datelike columns. A column label is datelike if

  • it ends with '_at',

  • it ends with '_time',

  • it begins with 'timestamp',

  • it is 'modified', or

  • it is 'date'
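The datelike-label rule above can be sketched as a small predicate. This is only an illustration of the listed conditions: `is_datelike_label` is a hypothetical helper, not a cudf or pandas function, and the real parser may treat letter case or additional labels differently.

```python
def is_datelike_label(label: str) -> bool:
    """Return True if a column label matches the datelike rules listed above."""
    return (
        label.endswith("_at")
        or label.endswith("_time")
        or label.startswith("timestamp")
        or label == "modified"
        or label == "date"
    )


# Hypothetical column labels; only "price" fails every rule.
labels = ["created_at", "start_time", "timestamp_utc", "modified", "date", "price"]
datelike = [name for name in labels if is_datelike_label(name)]
print(datelike)
```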

keep_default_dates : boolean, default True

If parsing dates, parse the default datelike columns (pandas engine only)

numpy : boolean, default False

Direct decoding to numpy arrays (pandas engine only). Supports numeric data only, but non-numeric column and index labels are supported. Note also that the JSON ordering MUST be the same for each term if numpy=True.

precise_float : boolean, default False

Set to True to use the higher-precision (strtod) function when decoding strings to double values (pandas engine only). The default (False) uses fast but less precise built-in functionality.

date_unit : string, default None

The timestamp unit to detect if converting dates (pandas engine only). The default behavior is to try and detect the correct precision, but if this is not desired then pass one of ‘s’, ‘ms’, ‘us’ or ‘ns’ to force parsing only seconds, milliseconds, microseconds or nanoseconds.

encoding : str, default is ‘utf-8’

The encoding to use to decode py3 bytes. With cudf engine, only utf-8 is supported.

lines : boolean, default False

Read the file as one JSON object per line.
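As a rough illustration of the one-object-per-line layout, the sketch below parses a small JSON Lines string with the standard library and pivots it into columns. It does not use cudf; the sample data and variable names are assumptions chosen for illustration.

```python
import json

# Two records, one JSON object per line (the layout lines=True expects).
json_lines = '{"a": "hello", "b": "hello"}\n{"a": "rapids", "b": "worlds"}\n'

# Each non-empty line is parsed on its own and becomes one row.
rows = [json.loads(line) for line in json_lines.splitlines() if line.strip()]

# Pivot the rows into a column-oriented dict, similar to a DataFrame's layout.
columns = {key: [row[key] for row in rows] for key in rows[0]}
print(columns)
```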

chunksize : integer, default None

Return JsonReader object for iteration (pandas engine only). See the line-delimited json docs for more information on chunksize. This can only be passed if lines=True. If this is None, the file will be read into memory all at once.

compression : {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}, default ‘infer’

For on-the-fly decompression of on-disk data. If ‘infer’, then use gzip, bz2, zip or xz if path_or_buf is a string ending in ‘.gz’, ‘.bz2’, ‘.zip’, or ‘.xz’, respectively, and no decompression otherwise. If using ‘zip’, the ZIP file must contain only one data file to be read in. Set to None for no decompression.
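The ‘infer’ rule can be sketched as a lookup on the file extension. `infer_compression` below is a hypothetical helper written for illustration, not part of cudf, and matching the extension case-sensitively is an assumption.

```python
from typing import Optional

# Extension-to-codec table based on the description above.
_EXTENSION_TO_CODEC = {".gz": "gzip", ".bz2": "bz2", ".zip": "zip", ".xz": "xz"}


def infer_compression(path: str, compression: Optional[str] = "infer") -> Optional[str]:
    """Hypothetical helper mirroring the documented 'infer' rule."""
    if compression != "infer":
        return compression  # an explicit codec or None is used as given
    for extension, codec in _EXTENSION_TO_CODEC.items():
        if path.endswith(extension):
            return codec
    return None  # no known extension: no decompression


print(infer_compression("data.json.gz"))
print(infer_compression("data.json"))
```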

byte_range : list or tuple, default None

Byte range within the input file to be read (cudf engine only). The first number is the offset in bytes, the second number is the range size in bytes. Set the size to zero to read all data after the offset location. Reads the row that starts before or at the end of the range, even if it ends after the end of the range.
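One way to picture the byte-range behavior on JSON Lines data is the simplified model below, written with the standard library only. It assumes that a row belongs to a range when its first byte lies in the half-open interval [offset, offset + size), and that such a row is returned whole even if it extends past the end of the range. `rows_in_byte_range` is a hypothetical helper, and cudf's exact boundary handling may differ from this model.

```python
def rows_in_byte_range(data: bytes, offset: int, size: int) -> list:
    """Return the newline-delimited rows whose first byte falls in the range.

    Simplified model: a size of 0 means "to the end of the data", and a row
    that starts inside the range is returned whole even if it ends outside.
    """
    end = len(data) if size == 0 else offset + size
    rows = []
    start = 0  # byte position where the current row begins
    for line in data.splitlines(keepends=True):
        if offset <= start < end and line.strip():
            rows.append(line.rstrip(b"\n"))
        start += len(line)
    return rows


# Three 9-byte rows starting at byte positions 0, 9, and 18.
data = b'{"a": 1}\n{"a": 2}\n{"a": 3}\n'
first = rows_in_byte_range(data, 0, 10)  # second row starts at 9, ends past 10
rest = rows_in_byte_range(data, 10, 0)   # size 0: everything after the offset
print(first, rest)
```

Under this model, consecutive ranges that tile the file yield every row exactly once, which is what makes byte ranges useful for splitting a large file across workers.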

keep_quotes : bool, default False

This parameter is only supported by the cudf_experimental engine. If True, any string values are read literally (and wrapped in an additional set of quotes). If False, string values are parsed into Python strings.

result : Series or DataFrame, depending on the value of typ.


>>> import cudf
>>> df = cudf.DataFrame({'a': ["hello", "rapids"], 'b': ["hello", "worlds"]})
>>> df
        a       b
0   hello   hello
1  rapids  worlds
>>> json_str = df.to_json(orient='records', lines=True)
>>> json_str
'{"a":"hello","b":"hello"}\n{"a":"rapids","b":"worlds"}\n'
>>> cudf.read_json(json_str,  engine="cudf", lines=True)
        a       b
0   hello   hello
1  rapids  worlds

To read the strings with an additional set of quotes:

>>> cudf.read_json(json_str,  engine="cudf_experimental", lines=True,
...                keep_quotes=True)
          a         b
0   "hello"   "hello"
1  "rapids"  "worlds"

Reading a JSON string containing ordered lists and name/value pairs:

>>> json_str = '[{"list": [0,1,2], "struct": {"k":"v1"}}, {"list": [3,4,5], "struct": {"k":"v2"}}]'
>>> cudf.read_json(json_str, engine='cudf_experimental')
        list       struct
0  [0, 1, 2]  {'k': 'v1'}
1  [3, 4, 5]  {'k': 'v2'}

Reading JSON Lines data containing ordered lists and name/value pairs:

>>> json_str = '{"a": [{"k1": "v1"}]}\n{"a": [{"k1":"v2"}]}'
>>> cudf.read_json(json_str, engine='cudf_experimental', lines=True)
                a
0  [{'k1': 'v1'}]
1  [{'k1': 'v2'}]

Using the dtype argument to specify type casting:

>>> json_str = '{"k1": 1, "k2":[1.5]}'
>>> cudf.read_json(json_str, engine='cudf_experimental', lines=True, dtype={'k1':float, 'k2':cudf.ListDtype(int)})
    k1   k2
0  1.0  [1]