Remote File
KvikIO provides direct access to remote files, including AWS S3, WebHDFS, and generic HTTP/HTTPS.
Example
# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
# See file LICENSE for terms.
import pathlib
import tempfile
import cupy
import numpy
import kvikio
from kvikio.utils import LocalHttpServer

def main(tmpdir: pathlib.Path):
    a = cupy.arange(100)
    a.tofile(tmpdir / "myfile")
    b = cupy.empty_like(a)

    # Start a local server that serves files in `tmpdir`
    with LocalHttpServer(root_path=tmpdir) as server:
        # Open the remote file from an HTTP URL
        with kvikio.RemoteFile.open_http(f"{server.url}/myfile") as f:
            # KvikIO fetches the file size
            assert f.nbytes() == a.nbytes
            # Read the remote file into `b` as if it were a local file.
            f.read(b)
            assert all(a == b)
            # We can also read into host memory seamlessly
            a = cupy.asnumpy(a)
            c = numpy.empty_like(a)
            f.read(c)
            assert all(a == c)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        main(pathlib.Path(tmpdir))
AWS S3 object naming requirement
KvikIO imposes the following naming requirements, derived from the AWS object naming guidelines.

- `!`, `*`, `'`, `(`, `)`, `&`, `$`, `@`, `=`, `;`, `:`, `+`, `,`: These special characters are automatically encoded by KvikIO, and are safe for use in key names.
- `-`, `_`, `.`: These special characters are not automatically encoded by KvikIO, but are still safe for use in key names.
- `/` is used as the path separator and must not appear in the object name itself.
- The space character must be explicitly encoded (`%20`) because it will otherwise render the URL malformed.
- `?` must be explicitly encoded (`%3F`) because it will otherwise cause ambiguity with the query string.
- Control characters `0x00`~`0x1F` hexadecimal (0~31 decimal) and `0x7F` (127) are automatically encoded by KvikIO, and are safe for use in key names.
- Other printable special characters must be avoided, such as `\`, `{`, `^`, `}`, `%`, `` ` ``, `]`, `"`, `>`, `[`, `~`, `<`, `#`, `|`.
- Non-ASCII characters `0x80`~`0xFF` (128~255) must be avoided.
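
The rules above could be applied to a key name before it is placed in a URL. The following is a minimal sketch under stated assumptions: `prepare_s3_key` is a hypothetical helper and not part of the KvikIO API, and it checks only the characters listed explicitly above. It rejects characters that must be avoided, percent-encodes the space and `?` characters that the caller is responsible for, and leaves everything else (including `/` as the path separator) untouched.

```python
# Hypothetical helper for illustration only; not part of KvikIO.

# Printable special characters that the list above says to avoid.
_AVOID = set('\\{^}%`]">[~<#|')


def prepare_s3_key(key: str) -> str:
    """Validate a raw key name and encode the characters that the
    caller must encode explicitly (space and '?')."""
    for ch in key:
        if ord(ch) >= 0x80:
            raise ValueError(f"non-ASCII character {ch!r} must be avoided")
        if ch in _AVOID:
            raise ValueError(f"special character {ch!r} must be avoided")
    # Encode only what KvikIO is documented as not encoding automatically.
    return key.replace(" ", "%20").replace("?", "%3F")


print(prepare_s3_key("data/my file?.bin"))
```

Validation runs on the raw name before encoding, so the `%` introduced by `%20` and `%3F` is never mistaken for a forbidden character.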