Zarr

Zarr is a binary file format for chunked, compressed, N-dimensional arrays. It is used throughout the PyData ecosystem, especially in climate and biological science applications.
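As a minimal sketch of what such an array looks like on disk (using plain Zarr-Python v2 and an illustrative /tmp path), the following writes a small chunked array and lists the resulting store directory:

import os

import numpy
import zarr

# Create a small chunked array on disk; Zarr-Python's default compressor
# (Blosc in v2) compresses each chunk independently.
z = zarr.open(
    store="/tmp/zarr-example",
    mode="w",
    shape=(20,),
    chunks=(5,),
    dtype="int64",
)
z[:] = numpy.arange(20)

# The store is a directory holding JSON metadata (".zarray") plus one
# compressed file per 5-element chunk ("0" through "3").
print(sorted(os.listdir("/tmp/zarr-example")))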

Zarr-Python is the official Python package for reading and writing Zarr arrays. Its main feature is a NumPy-like array that seamlessly translates array operations into file IO. KvikIO provides a GPU backend to Zarr-Python that enables GPUDirect Storage (GDS).
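Whether GDS is actually used depends on the system configuration. As a minimal, version-dependent sketch (assuming the kvikio.defaults.compat_mode() helper available in KvikIO 23.x), one can check whether KvikIO has fallen back to regular POSIX IO:

import kvikio.defaults

# If compatibility mode is enabled (e.g., when the cuFile library is not
# available), KvikIO performs regular POSIX IO instead of GDS.
if kvikio.defaults.compat_mode():
    print("KvikIO is running in compatibility mode (no GDS)")
else:
    print("KvikIO will use GPUDirect Storage")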

The following example shows how to use the convenience function kvikio.zarr.open_cupy_array() to create a new Zarr array and to open an existing one.

# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
# See file LICENSE for terms.

import cupy
import numpy
import zarr

import kvikio
import kvikio.zarr


def main(path):
    a = cupy.arange(20)

    # Let's use KvikIO's convenience function `open_cupy_array()` to create
    # a new Zarr file on disk. Its semantics are the same as `zarr.open_array()`
    # but uses a GDS file store, nvCOMP compression, and CuPy arrays.
    z = kvikio.zarr.open_cupy_array(store=path, mode="w", shape=(20,), chunks=(5,))

    # `z` is a regular Zarr Array that we can write to as usual
    z[0:10] = numpy.arange(0, 10)
    # but it also supports direct reads and writes of CuPy arrays
    z[10:20] = cupy.arange(10, 20)

    # Reading `z` returns a CuPy array
    assert isinstance(z[:], cupy.ndarray)
    assert (a == z[:]).all()

    # Normally, we cannot assume that GPU and CPU compressors are compatible.
    # E.g., `open_cupy_array()` uses nvCOMP's Snappy GPU compression by default,
    # which, as far as we know, isn’t compatible with any CPU compressor. Thus,
    # let's re-write our Zarr array using a CPU and GPU compatible compressor.
    #
    # Warning: it isn't possible to use `CompatCompressor` as a compressor argument
    #          in Zarr directly. It is only meant for `open_cupy_array()`. However,
    #          in an example further down, we show how to write using regular Zarr.
    z = kvikio.zarr.open_cupy_array(
        store=path,
        mode="w",
        shape=(20,),
        chunks=(5,),
        compressor=kvikio.zarr.CompatCompressor.lz4(),
    )
    z[:] = a

    # Because we are using a CompatCompressor, it is now possible to open the file
    # using Zarr's built-in LZ4 decompressor that uses the CPU.
    z = zarr.open_array(path)
    # `z` is now read as a regular NumPy array
    assert isinstance(z[:], numpy.ndarray)
    assert (a.get() == z[:]).all()
    # and we can write to it as usual
    z[:] = numpy.arange(20, 40)

    # And we can read the Zarr file back into a CuPy array.
    z = kvikio.zarr.open_cupy_array(store=path, mode="r")
    assert isinstance(z[:], cupy.ndarray)
    assert (cupy.arange(20, 40) == z[:]).all()

    # Similarly, we can also open a file written by regular Zarr.
    # Let's write the file without any compressor.
    ary = numpy.arange(10)
    z = zarr.open(store=path, mode="w", shape=ary.shape, compressor=None)
    z[:] = ary
    # As before, the file is read back as a CuPy array
    z = kvikio.zarr.open_cupy_array(store=path)
    assert isinstance(z[:], cupy.ndarray)
    assert (z[:] == cupy.asarray(ary)).all()

    # Using a compressor is a bit trickier since not all CPU compressors
    # are GPU compatible. To make sure we use a compatible compressor, we use
    # the CPU part of `CompatCompressor.lz4()`.
    ary = numpy.arange(10)
    z = zarr.open(
        store=path,
        mode="w",
        shape=ary.shape,
        compressor=kvikio.zarr.CompatCompressor.lz4().cpu,
    )
    z[:] = ary
    # As before, the file is read back as a CuPy array
    z = kvikio.zarr.open_cupy_array(store=path)
    assert isinstance(z[:], cupy.ndarray)
    assert (z[:] == cupy.asarray(ary)).all()


if __name__ == "__main__":
    main("/tmp/zarr-cupy-nvcomp")