Handle of remote file.
#include <remote_handle.hpp>
Definition at line 297 of file remote_handle.hpp.
◆ RemoteHandle() [1/2]
kvikio::RemoteHandle::RemoteHandle(std::unique_ptr< RemoteEndpoint > endpoint, std::size_t nbytes)
Create a new remote handle from an endpoint and a file size.
- Parameters
-
| endpoint | Remote endpoint used for subsequent IO. |
| nbytes | The size of the remote file (in bytes). |
◆ RemoteHandle() [2/2]
kvikio::RemoteHandle::RemoteHandle(std::unique_ptr< RemoteEndpoint > endpoint)
Create a new remote handle from an endpoint (infers the file size).
The file size is received from the remote server using endpoint.
- Parameters
-
| endpoint | Remote endpoint used for subsequent IO. |
◆ endpoint()
Get a const reference to the underlying remote endpoint.
- Returns
- The remote endpoint.
◆ nbytes()
std::size_t kvikio::RemoteHandle::nbytes() const noexcept
Get the file size.
The file size is retrieved at construction, so this method returns immediately without communicating with the remote server.
- Returns
- The number of bytes.
◆ open()
Create a remote file handle from a URL.
This function creates a RemoteHandle for reading data from various remote endpoints including HTTP/HTTPS servers, AWS S3 buckets, S3 presigned URLs, and WebHDFS. The endpoint type can be automatically detected from the URL or explicitly specified.
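- Parameters
-
| url | URL of the remote file. |
| remote_endpoint_type | Type of the remote endpoint. Defaults to RemoteEndpointType::AUTO, which detects the type from the URL. |
| allow_list | Optional list of endpoint types to consider. If not provided, defaults to all supported types in this order: RemoteEndpointType::S3, RemoteEndpointType::S3_PRESIGNED_URL, RemoteEndpointType::WEBHDFS, and RemoteEndpointType::HTTP. |
| nbytes | Optional file size in bytes. If not provided, the function sends an additional request to the server to query the file size. |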
- Returns
- A RemoteHandle object that can be used to read data from the remote file.
- Exceptions
-
| std::runtime_error | If:
- The URL is malformed or missing required components.
- RemoteEndpointType::AUTO mode is used and the URL doesn't match any supported endpoint type.
- The specified endpoint type is not in the allow_list.
- The URL is invalid for the specified endpoint type.
- The remote server cannot be reached or the file size cannot be determined (when nbytes is not provided).
|
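The interaction between automatic detection, the allow_list order, and the std::runtime_error case can be illustrated with a minimal, self-contained sketch. It does not use KvikIO: EndpointType, looks_like() and detect() are hypothetical names, and the URL checks are simplified assumptions rather than the library's actual validation rules.

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for kvikio::RemoteEndpointType (AUTO is omitted).
enum class EndpointType { S3, S3_PRESIGNED_URL, WEBHDFS, HTTP };

inline bool starts_with(std::string const& s, std::string const& prefix)
{
  return s.compare(0, prefix.size(), prefix) == 0;
}

// Assumed, simplified URL checks; the real library may apply different rules.
inline bool looks_like(EndpointType type, std::string const& url)
{
  bool const http = starts_with(url, "http://") || starts_with(url, "https://");
  switch (type) {
    case EndpointType::S3: return starts_with(url, "s3://");
    case EndpointType::S3_PRESIGNED_URL:
      return http && url.find("X-Amz-Signature=") != std::string::npos;
    case EndpointType::WEBHDFS:
      return http && url.find("/webhdfs/v1/") != std::string::npos;
    case EndpointType::HTTP: return http;
  }
  return false;
}

// Try each candidate in order and return the first one that accepts the URL.
// Without an allow list, all types are tried in the documented default order.
inline EndpointType detect(std::string const& url,
                           std::optional<std::vector<EndpointType>> allow_list = std::nullopt)
{
  std::vector<EndpointType> const candidates = allow_list.value_or(
    std::vector<EndpointType>{EndpointType::S3,
                              EndpointType::S3_PRESIGNED_URL,
                              EndpointType::WEBHDFS,
                              EndpointType::HTTP});
  for (auto type : candidates) {
    if (looks_like(type, url)) { return type; }
  }
  throw std::runtime_error("URL does not match any allowed endpoint type: " + url);
}
```

Because the more specific types are tried before HTTP, a presigned URL is not misclassified as a plain HTTP endpoint in this sketch, and restricting the candidates turns an otherwise valid URL into an error.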
Signature:
static RemoteHandle open(std::string url, RemoteEndpointType remote_endpoint_type=RemoteEndpointType::AUTO, std::optional< std::vector< RemoteEndpointType >> allow_list=std::nullopt, std::optional< std::size_t > nbytes=std::nullopt)
Example:
- Auto-detect endpoint type from URL
    auto handle = kvikio::RemoteHandle::open(
        "https://bucket.s3.amazonaws.com/object?X-Amz-Algorithm=AWS4-HMAC-SHA256"
        "&X-Amz-Credential=...&X-Amz-Signature=..."
    );
- Open S3 file with explicit endpoint type
    auto handle = kvikio::RemoteHandle::open(
        "https://my-bucket.s3.us-east-1.amazonaws.com/data.bin",
        kvikio::RemoteEndpointType::S3
    );
- Restrict endpoint type candidates
    std::vector<kvikio::RemoteEndpointType> allow_list = {
        kvikio::RemoteEndpointType::HTTP,  // illustrative choice of allowed types
        kvikio::RemoteEndpointType::S3_PRESIGNED_URL
    };
    auto handle = kvikio::RemoteHandle::open(
        user_provided_url,
        kvikio::RemoteEndpointType::AUTO,
        allow_list
    );
- Provide known file size to skip HEAD request
    auto handle = kvikio::RemoteHandle::open(
        "https://example.com/large-file.bin",
        kvikio::RemoteEndpointType::HTTP,
        std::nullopt,
        1024 * 1024 * 100  // 100 MiB
    );
◆ pread()
Read from remote source into buffer (host or device memory) in parallel.
This API is a parallel, asynchronous version of read() that partitions the operation into tasks of size task_size for execution in a thread pool (the default thread pool unless thread_pool is given).
- Parameters
-
| buf | Pointer to host or device memory. |
| size | Number of bytes to read. |
| file_offset | File offset in bytes. |
| task_size | Size of each task in bytes. |
| thread_pool | Thread pool to use for parallel execution. Defaults to the global default thread pool. The caller is responsible for ensuring that the thread pool remains valid until the returned future is consumed (i.e., until get() or wait() is called on it). |
- Returns
- Future that, on completion, returns the number of bytes read, which is always size.
- Note
- The returned std::future object must not outlive either the RemoteHandle or the thread pool. Calling wait() or get() on the future after the RemoteHandle or thread pool has been destroyed results in undefined behavior.
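The partitioning described above can be sketched without KvikIO. The following is a minimal illustration, assuming an in-memory "file" and using std::async in place of a thread pool; read_range() and parallel_read() are hypothetical names and this is not the library's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for a blocking ranged read from a remote file.
inline std::size_t read_range(std::vector<char> const& file,
                              void* buf,
                              std::size_t size,
                              std::size_t file_offset)
{
  assert(file_offset + size <= file.size());
  std::memcpy(buf, file.data() + file_offset, size);
  return size;
}

// Split [file_offset, file_offset + size) into tasks of at most task_size
// bytes, run them concurrently, and return a future for the total byte count.
inline std::future<std::size_t> parallel_read(std::vector<char> const& file,
                                              void* buf,
                                              std::size_t size,
                                              std::size_t file_offset,
                                              std::size_t task_size)
{
  assert(task_size > 0);
  std::vector<std::future<std::size_t>> tasks;
  for (std::size_t done = 0; done < size; done += task_size) {
    std::size_t const n = std::min(task_size, size - done);
    char* dst           = static_cast<char*>(buf) + done;
    // Each task captures `file` by reference, so `file` must stay alive until
    // the returned future has been consumed.
    tasks.push_back(std::async(std::launch::async, [&file, dst, n, off = file_offset + done] {
      return read_range(file, dst, n, off);
    }));
  }
  // Deferred aggregation: the sum is computed when the caller calls get() or wait().
  return std::async(std::launch::deferred, [tasks = std::move(tasks)]() mutable {
    std::size_t total = 0;
    for (auto& t : tasks) { total += t.get(); }
    return total;
  });
}
```

The by-reference capture of the source mirrors the lifetime rule in the note: the object being read from has to outlive the future, otherwise consuming the future touches destroyed state.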
◆ read()
std::size_t kvikio::RemoteHandle::read(void * buf, std::size_t size, std::size_t file_offset = 0)
Read from remote source into buffer (host or device memory).
When reading into device memory, a bounce buffer is used to avoid many small memory copies to device. Use kvikio::default::bounce_buffer_size_reset() to set the size of this bounce buffer (default 16 MiB).
- Parameters
-
| buf | Pointer to host or device memory. |
| size | Number of bytes to read. |
| file_offset | File offset in bytes. |
- Returns
- Number of bytes read, which is always size.
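To see why a bounce buffer reduces the number of copies to the device, consider this host-only sketch. It assumes data arrives in small pieces and stages them in a fixed-size buffer that is flushed when full. FakeDevice and read_via_bounce_buffer() are hypothetical names, the "device copy" is simulated with std::memcpy, and this is not KvikIO's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a host-to-device copy; counts how often it is called.
struct FakeDevice {
  std::size_t copies = 0;
  void copy_from_host(void* dst, void const* src, std::size_t n)
  {
    std::memcpy(dst, src, n);
    ++copies;
  }
};

// Receive `size` bytes in pieces of at most `piece` bytes, stage them in a
// bounce buffer of `bounce_size` bytes, and flush to the device when the next
// piece would not fit.
inline std::size_t read_via_bounce_buffer(std::vector<char> const& source,
                                          void* device_dst,
                                          std::size_t size,
                                          std::size_t piece,
                                          std::size_t bounce_size,
                                          FakeDevice& dev)
{
  assert(piece > 0 && bounce_size >= piece && size <= source.size());
  std::vector<char> bounce(bounce_size);
  std::size_t filled  = 0;  // bytes currently staged in the bounce buffer
  std::size_t flushed = 0;  // bytes already copied to the device
  for (std::size_t off = 0; off < size; off += piece) {
    std::size_t const n = std::min(piece, size - off);
    if (filled + n > bounce.size()) {
      dev.copy_from_host(static_cast<char*>(device_dst) + flushed, bounce.data(), filled);
      flushed += filled;
      filled = 0;
    }
    std::memcpy(bounce.data() + filled, source.data() + off, n);
    filled += n;
  }
  if (filled > 0) {  // flush whatever remains staged
    dev.copy_from_host(static_cast<char*>(device_dst) + flushed, bounce.data(), filled);
    flushed += filled;
  }
  return flushed;
}
```

With 1000 bytes arriving in 10-byte pieces and a 256-byte bounce buffer, the sketch issues four device copies instead of one hundred; a larger bounce buffer lowers that count further at the cost of more host memory.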
◆ remote_endpoint_type()
Get the type of the remote file.
- Returns
- The type of the remote file.