Remote File

KvikIO provides direct access to remote files, including AWS S3, WebHDFS, and generic HTTP/HTTPS.

Example

# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
# See file LICENSE for terms.

import pathlib
import tempfile

import cupy
import numpy

import kvikio
from kvikio.utils import LocalHttpServer


def main(tmpdir: pathlib.Path):
    a = cupy.arange(100)
    a.tofile(tmpdir / "myfile")
    b = cupy.empty_like(a)

    # Start a local server that serves files in `tmpdir`
    with LocalHttpServer(root_path=tmpdir) as server:
        # Open the remote file from an HTTP URL
        with kvikio.RemoteFile.open_http(f"{server.url}/myfile") as f:
            # KvikIO fetches the file size from the server
            assert f.nbytes() == a.nbytes
            # Read the remote file into `b` as if it were a local file.
            f.read(b)
            assert (a == b).all()
            # We can also read into host memory seamlessly
            a = cupy.asnumpy(a)
            c = numpy.empty_like(a)
            f.read(c)
            assert (a == c).all()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        main(pathlib.Path(tmpdir))

AWS S3 object naming requirement

KvikIO imposes the following naming requirements, derived from the AWS object naming guidelines.

  • !, *, ', (, ), &, $, @, =, ;, :, +, ,: These special characters are automatically encoded by KvikIO, and are safe for use in key names.

  • -, _, .: These special characters are not automatically encoded by KvikIO, but are still safe for use in key names.

  • / is used as the path separator and must not appear in the object name itself.

  • The space character must be explicitly encoded (%20); otherwise the URL is malformed.

  • ? must be explicitly encoded (%3F) because it will otherwise cause ambiguity with the query string.

  • Control characters 0x00 ~ 0x1F hexadecimal (0~31 decimal) and 0x7F (127) are automatically encoded by KvikIO, and are safe for use in key names.

  • Other printable special characters must be avoided, such as \, {, ^, }, %, `, ], ", >, [, ~, <, #, |.

  • Non-ASCII characters 0x80 ~ 0xFF (128~255) must be avoided.
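
The rules above can be checked on the caller's side before a key is passed to KvikIO. The following is a minimal sketch in plain Python, not part of the KvikIO API: the helper names `encode_object_name` and `build_key` are hypothetical. It assumes the input is the raw, not yet encoded object name, encodes only the two characters the caller is responsible for (space and ?), rejects the characters that must be avoided, and leaves everything else untouched for KvikIO to handle.

```python
# Printable special characters that the list above says must be avoided.
_AVOID = set('\\{^}%`]">[~<#|')
# Characters the caller must percent-encode explicitly.
_MUST_ENCODE = {" ": "%20", "?": "%3F"}


def encode_object_name(name: str) -> str:
    """Hypothetical helper: validate one raw object name and encode space and '?'."""
    out = []
    for ch in name:
        if ch == "/":
            raise ValueError("'/' is the path separator and cannot be part of a name")
        if ord(ch) >= 0x80:
            raise ValueError(f"non-ASCII character {ch!r} must be avoided")
        if ch in _AVOID:
            raise ValueError(f"character {ch!r} must be avoided")
        # Everything else (including characters KvikIO encodes itself) passes through.
        out.append(_MUST_ENCODE.get(ch, ch))
    return "".join(out)


def build_key(components: list[str]) -> str:
    """Hypothetical helper: join validated name components with '/' as separator."""
    return "/".join(encode_object_name(c) for c in components)


print(build_key(["data", "run 1", "out?.bin"]))  # data/run%201/out%3F.bin
```

Because the helper rejects a raw % sign, it should be applied exactly once to unencoded names; running it on an already encoded key would raise an error instead of double-encoding it.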