Remote File
KvikIO provides direct access to remote files served over AWS S3, WebHDFS, or generic HTTP/HTTPS.
Example
# SPDX-FileCopyrightText: Copyright (c) 2024-2025, NVIDIA CORPORATION. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

import pathlib
import tempfile

import cupy
import numpy

import kvikio
from kvikio.utils import LocalHttpServer


def main(tmpdir: pathlib.Path):
    # Write a small array to disk so the local server has a file to serve
    a = cupy.arange(100)
    a.tofile(tmpdir / "myfile")
    b = cupy.empty_like(a)

    # Start a local server that serves the files in `tmpdir`
    with LocalHttpServer(root_path=tmpdir) as server:
        # Open the remote file from an HTTP URL
        with kvikio.RemoteFile.open_http(f"{server.url}/myfile") as f:
            # KvikIO fetches the file size
            assert f.nbytes() == a.nbytes
            # Read the remote file into `b` as if it were a local file
            f.read(b)
            assert all(a == b)
            # We can also read into host memory seamlessly
            a = cupy.asnumpy(a)
            c = numpy.empty_like(a)
            f.read(c)
            assert all(a == c)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        main(pathlib.Path(tmpdir))
AWS S3 object naming requirement
KvikIO imposes the following naming requirements, derived from the AWS object naming guidelines:
The special characters ! * ' ( ) & $ @ = ; : + and , are automatically encoded by KvikIO and are safe for use in key names.
The special characters - _ and . are not automatically encoded by KvikIO, but are still safe for use in key names.
/ is used as the path separator and must not appear in the object name itself.
The space character must be explicitly encoded (%20); otherwise the URL is malformed.
? must be explicitly encoded (%3F); otherwise it is ambiguous with the start of the query string.
Control characters 0x00~0x1F hexadecimal (0~31 decimal) and 0x7F (127) are automatically encoded by KvikIO and are safe for use in key names.
Other printable special characters must be avoided, such as \ { ^ } % ` ] " > [ ~ < # |
Non-ASCII characters 0x80~0xFF (128~255) must be avoided.
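
The rules above can be checked before a key is ever placed in a URL. The following is a minimal sketch using only the Python standard library. The helper name prepare_object_key and its error messages are hypothetical and are not part of KvikIO. It assumes the input is the raw, unencoded key, treats / purely as the path separator, rejects the characters that must be avoided, and explicitly encodes the space and ? characters. Everything else is passed through unchanged, on the assumption that KvikIO either encodes it automatically or needs no encoding.

```python
# Hypothetical helper, not part of KvikIO: applies the naming rules above
# to a raw (unencoded) S3 object key.

# Printable special characters that must be avoided in key names.
FORBIDDEN = set('\\{^}%`]">[~<#|')

# Characters the caller must encode explicitly.
EXPLICIT = {" ": "%20", "?": "%3F"}


def prepare_object_key(key: str) -> str:
    """Return `key` with space and `?` percent-encoded.

    Raises ValueError if the key contains a character that must be avoided.
    """
    out = []
    for ch in key:
        if ord(ch) >= 0x80:
            raise ValueError(f"non-ASCII character {ch!r} must be avoided")
        if ch in FORBIDDEN:
            raise ValueError(f"character {ch!r} must be avoided")
        # Space and `?` are encoded here; all other characters pass through.
        out.append(EXPLICIT.get(ch, ch))
    return "".join(out)


if __name__ == "__main__":
    print(prepare_object_key("data/my file?.bin"))
```

Because % itself is on the avoid list, the helper should be applied only once, to the raw key. Passing an already encoded key back in raises ValueError rather than double-encoding it.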