Handle of remote file.
#include <remote_handle.hpp>
Definition at line 287 of file remote_handle.hpp.
◆ RemoteHandle() [1/2]
kvikio::RemoteHandle::RemoteHandle(std::unique_ptr<RemoteEndpoint> endpoint, std::size_t nbytes)
Create a new remote handle from an endpoint and a file size.
- Parameters
-
endpoint | Remote endpoint used for subsequent IO. |
nbytes | The size of the remote file (in bytes). |
◆ RemoteHandle() [2/2]
kvikio::RemoteHandle::RemoteHandle(std::unique_ptr<RemoteEndpoint> endpoint)
Create a new remote handle from an endpoint (infers the file size).
The file size is retrieved from the remote server using endpoint.
- Parameters
-
endpoint | Remote endpoint used for subsequent IO. |
◆ endpoint()
Get a const reference to the underlying remote endpoint.
- Returns
- The remote endpoint.
◆ nbytes()
std::size_t kvikio::RemoteHandle::nbytes() const noexcept
Get the file size.
Note: the file size is retrieved at construction, so this method is fast and requires no communication with the server.
- Returns
- The number of bytes.
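◆ open()
static RemoteHandle kvikio::RemoteHandle::open(std::string url, RemoteEndpointType remote_endpoint_type = RemoteEndpointType::AUTO, std::optional<std::vector<RemoteEndpointType>> allow_list = std::nullopt, std::optional<std::size_t> nbytes = std::nullopt)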
Create a remote file handle from a URL.
This function creates a RemoteHandle for reading data from various remote endpoints including HTTP/HTTPS servers, AWS S3 buckets, S3 presigned URLs, and WebHDFS. The endpoint type can be automatically detected from the URL or explicitly specified.
- Parameters
-
allow_list | Optional list of allowed endpoint types. If not provided, defaults to all supported types in this order: RemoteEndpointType::S3, RemoteEndpointType::S3_PRESIGNED_URL, RemoteEndpointType::WEBHDFS, and RemoteEndpointType::HTTP. |
nbytes | Optional file size in bytes. If not provided, the function sends an additional request to the server to query the file size. |
- Returns
- A RemoteHandle object that can be used to read data from the remote file.
- Exceptions
-
std::runtime_error | If:
- The URL is malformed or missing required components.
- RemoteEndpointType::AUTO mode is used and the URL doesn't match any supported endpoint type.
- The specified endpoint type is not in the allow_list.
- The URL is invalid for the specified endpoint type.
- The remote server cannot be reached or the file size cannot be determined (when nbytes is not provided). |
Example:
- Auto-detect endpoint type from URL
auto handle = kvikio::RemoteHandle::open(
  "https://bucket.s3.amazonaws.com/object?X-Amz-Algorithm=AWS4-HMAC-SHA256"
  "&X-Amz-Credential=...&X-Amz-Signature=..."
);
- Open S3 file with explicit endpoint type
auto handle = kvikio::RemoteHandle::open(
  "https://my-bucket.s3.us-east-1.amazonaws.com/data.bin",
  kvikio::RemoteEndpointType::S3
);
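- Restrict endpoint type candidates
std::vector<kvikio::RemoteEndpointType> allow_list = {
  // example entries: only these types are tried
  kvikio::RemoteEndpointType::HTTP,
  kvikio::RemoteEndpointType::S3_PRESIGNED_URL
};
auto handle = kvikio::RemoteHandle::open(
  user_provided_url,
  kvikio::RemoteEndpointType::AUTO,
  allow_list
);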
- Provide known file size to skip HEAD request
auto handle = kvikio::RemoteHandle::open(
  "https://example.com/large-file.bin",
  kvikio::RemoteEndpointType::HTTP,
  std::nullopt,
  1024 * 1024 * 100  // 100 MiB
);
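The selection rules described for open() (candidate order, allow_list, and an error when nothing matches) can be sketched as a standalone function. This is only an illustration under stated assumptions: the EndpointType enum is a stand-in for the names used on this page, resolve_endpoint_type is a hypothetical helper, and the matches() URL checks are crude placeholders, not KvikIO's real validation.

```cpp
#include <algorithm>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for the endpoint type names on this page; not the real KvikIO enum.
enum class EndpointType { AUTO, S3, S3_PRESIGNED_URL, WEBHDFS, HTTP };

// Placeholder URL checks (assumption): the real validation is stricter.
bool matches(EndpointType type, const std::string& url)
{
  auto starts_with = [&](const std::string& p) { return url.rfind(p, 0) == 0; };
  auto contains    = [&](const std::string& s) { return url.find(s) != std::string::npos; };
  switch (type) {
    case EndpointType::S3: return starts_with("s3://");
    case EndpointType::S3_PRESIGNED_URL: return contains("X-Amz-Signature=");
    case EndpointType::WEBHDFS: return contains("/webhdfs/v1/");
    case EndpointType::HTTP: return starts_with("http://") || starts_with("https://");
    default: return false;
  }
}

// Hypothetical helper: pick the endpoint type for a URL.
EndpointType resolve_endpoint_type(
  const std::string& url,
  EndpointType requested = EndpointType::AUTO,
  std::optional<std::vector<EndpointType>> allow_list = std::nullopt)
{
  // Without an allow_list, fall back to the documented default order.
  const std::vector<EndpointType> candidates = allow_list.value_or(std::vector<EndpointType>{
    EndpointType::S3, EndpointType::S3_PRESIGNED_URL, EndpointType::WEBHDFS, EndpointType::HTTP});

  if (requested == EndpointType::AUTO) {
    // AUTO mode: the first candidate whose check accepts the URL wins.
    for (EndpointType type : candidates) {
      if (matches(type, url)) { return type; }
    }
    throw std::runtime_error("URL does not match any allowed endpoint type");
  }
  // Explicit mode: the type must be allowed and the URL must fit it.
  if (std::find(candidates.begin(), candidates.end(), requested) == candidates.end()) {
    throw std::runtime_error("endpoint type is not in the allow_list");
  }
  if (!matches(requested, url)) {
    throw std::runtime_error("URL is invalid for the requested endpoint type");
  }
  return requested;
}
```

Because candidates are tried in order, a presigned S3 URL resolves to S3_PRESIGNED_URL before the more general HTTP check is reached, which is one plausible reason for the documented default order.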
◆ pread()
std::future<std::size_t> kvikio::RemoteHandle::pread(void* buf, std::size_t size, std::size_t file_offset = 0, std::size_t task_size = defaults::task_size())
Read from remote source into buffer (host or device memory) in parallel.
This API is a parallel, asynchronous version of read() that partitions the operation into tasks of size task_size for execution in the default thread pool.
- Parameters
-
buf | Pointer to host or device memory. |
size | Number of bytes to read. |
file_offset | File offset in bytes. |
task_size | Size of each task in bytes. |
- Returns
- Future that, on completion, returns the number of bytes read, which is always size.
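The partitioning that pread() describes can be illustrated without KvikIO. The sketch below uses a hypothetical helper, split_into_tasks (not a KvikIO function), to split a read of size bytes starting at file_offset into chunks of at most task_size bytes. It assumes the final task is simply shorter when size is not a multiple of task_size; how KvikIO schedules the tasks on its thread pool is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One sub-read: where it starts in the file, where it lands in the
// destination buffer, and how many bytes it covers.
struct Task {
  std::size_t file_offset;
  std::size_t buf_offset;
  std::size_t size;
};

// Hypothetical helper (not part of KvikIO): split the byte range
// [file_offset, file_offset + size) into consecutive tasks of at most
// task_size bytes each.
std::vector<Task> split_into_tasks(std::size_t size,
                                   std::size_t file_offset,
                                   std::size_t task_size)
{
  assert(task_size > 0);
  std::vector<Task> tasks;
  for (std::size_t done = 0; done < size; done += task_size) {
    tasks.push_back({file_offset + done, done, std::min(task_size, size - done)});
  }
  return tasks;
}
```

Each task could then be handed to a worker that performs an ordinary blocking read into buf + buf_offset. The sizes of all tasks sum to size, which matches the documented return value.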
◆ read()
std::size_t kvikio::RemoteHandle::read(void* buf, std::size_t size, std::size_t file_offset = 0)
Read from remote source into buffer (host or device memory).
When reading into device memory, a bounce buffer is used to avoid many small memory copies to the device. Use kvikio::defaults::bounce_buffer_size_reset() to set the size of this bounce buffer (default 16 MiB).
- Parameters
-
buf | Pointer to host or device memory. |
size | Number of bytes to read. |
file_offset | File offset in bytes. |
- Returns
- Number of bytes read, which is always size.
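The bounce-buffer idea can be sketched in plain host code. In this illustration, staged_read is a hypothetical function, fetch stands in for the remote read into host memory, and copy_to_device stands in for the host-to-device copy. None of these are KvikIO APIs, and how KvikIO fills its own bounce buffer is an assumption here. The point is only that data is staged in a fixed-size host buffer and forwarded in one copy per fill.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical sketch: move `size` bytes through a staging buffer of
// `bounce_size` bytes. Returns the number of device copies issued.
std::size_t staged_read(
  std::size_t size,
  std::size_t bounce_size,
  const std::function<void(char* dst, std::size_t n, std::size_t offset)>& fetch,
  const std::function<void(const char* src, std::size_t n, std::size_t offset)>& copy_to_device)
{
  if (bounce_size == 0) { throw std::invalid_argument("bounce_size must be positive"); }
  std::vector<char> bounce(bounce_size);
  std::size_t copies = 0;
  for (std::size_t done = 0; done < size; done += bounce_size) {
    const std::size_t n = std::min(bounce_size, size - done);
    fetch(bounce.data(), n, done);           // fill the host staging buffer
    copy_to_device(bounce.data(), n, done);  // one larger copy per fill
    ++copies;
  }
  return copies;
}
```

With a larger staging buffer the same read needs fewer device copies, which is the trade-off the bounce buffer size controls.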
◆ remote_endpoint_type()
Get the type of the remote file.
- Returns
- The type of the remote file.
The documentation for this class was generated from the following file: remote_handle.hpp