kvikio::RemoteHandle Class Reference

Handle of a remote file.

#include <remote_handle.hpp>

Public Member Functions

 RemoteHandle (std::unique_ptr< RemoteEndpoint > endpoint, std::size_t nbytes)
 Create a new remote handle from an endpoint and a file size.

 RemoteHandle (std::unique_ptr< RemoteEndpoint > endpoint)
 Create a new remote handle from an endpoint (infers the file size).

 RemoteHandle (RemoteHandle &&o)=default

RemoteHandle & operator= (RemoteHandle &&o)=default

 RemoteHandle (RemoteHandle const &)=delete

RemoteHandle & operator= (RemoteHandle const &)=delete

RemoteEndpointType remote_endpoint_type () const noexcept
 Get the type of the remote file.

std::size_t nbytes () const noexcept
 Get the file size.

RemoteEndpoint const & endpoint () const noexcept
 Get a const reference to the underlying remote endpoint.

std::size_t read (void *buf, std::size_t size, std::size_t file_offset=0)
 Read from remote source into buffer (host or device memory).

std::future< std::size_t > pread (void *buf, std::size_t size, std::size_t file_offset=0, std::size_t task_size=defaults::task_size())
 Read from remote source into buffer (host or device memory) in parallel.
 

Static Public Member Functions

static RemoteHandle open (std::string url, RemoteEndpointType remote_endpoint_type=RemoteEndpointType::AUTO, std::optional< std::vector< RemoteEndpointType >> allow_list=std::nullopt, std::optional< std::size_t > nbytes=std::nullopt)
 Create a remote file handle from a URL.
 

Detailed Description

Handle of a remote file.

Definition at line 287 of file remote_handle.hpp.

Constructor & Destructor Documentation

◆ RemoteHandle() [1/2]

kvikio::RemoteHandle::RemoteHandle ( std::unique_ptr< RemoteEndpoint >  endpoint,
std::size_t  nbytes 
)

Create a new remote handle from an endpoint and a file size.

Parameters
endpoint  Remote endpoint used for subsequent IO.
nbytes    The size of the remote file (in bytes).

◆ RemoteHandle() [2/2]

kvikio::RemoteHandle::RemoteHandle ( std::unique_ptr< RemoteEndpoint >  endpoint )

Create a new remote handle from an endpoint (infers the file size).

The file size is received from the remote server using endpoint.

Parameters
endpoint  Remote endpoint used for subsequent IO.

Member Function Documentation

◆ endpoint()

RemoteEndpoint const& kvikio::RemoteHandle::endpoint ( ) const
noexcept

Get a const reference to the underlying remote endpoint.

Returns
The remote endpoint.

◆ nbytes()

std::size_t kvikio::RemoteHandle::nbytes ( ) const
noexcept

Get the file size.

Note: the file size is retrieved at construction, so this method is fast and requires no communication with the server.

Returns
The number of bytes.

◆ open()

static RemoteHandle kvikio::RemoteHandle::open ( std::string  url,
RemoteEndpointType  remote_endpoint_type = RemoteEndpointType::AUTO,
std::optional< std::vector< RemoteEndpointType >>  allow_list = std::nullopt,
std::optional< std::size_t >  nbytes = std::nullopt 
)
static

Create a remote file handle from a URL.

This function creates a RemoteHandle for reading data from various remote endpoints including HTTP/HTTPS servers, AWS S3 buckets, S3 presigned URLs, and WebHDFS. The endpoint type can be automatically detected from the URL or explicitly specified.

Parameters
url  The URL of the remote file. Supported formats include:
  • S3 with credentials
  • S3 presigned URL
  • WebHDFS
  • HTTP/HTTPS
remote_endpoint_type  The type of remote endpoint. The default, RemoteEndpointType::AUTO, automatically detects the endpoint type from the URL. Set it explicitly to RemoteEndpointType::S3, RemoteEndpointType::S3_PRESIGNED_URL, RemoteEndpointType::WEBHDFS, or RemoteEndpointType::HTTP to force a specific endpoint type.
allow_list  Optional list of allowed endpoint types. If provided:
  • In RemoteEndpointType::AUTO mode, types are tried in the exact order specified until a match is found.
  • In explicit mode, the specified type must be in this list; otherwise an exception is thrown.

If not provided, defaults to all supported types in this order: RemoteEndpointType::S3, RemoteEndpointType::S3_PRESIGNED_URL, RemoteEndpointType::WEBHDFS, and RemoteEndpointType::HTTP.
nbytes  Optional file size in bytes. If not provided, the function sends an additional request to the server to query the file size.
Returns
A RemoteHandle object that can be used to read data from the remote file.
Exceptions
std::runtime_error  If:
  • The URL is malformed or missing required components.
  • RemoteEndpointType::AUTO mode is used and the URL doesn't match any supported endpoint type.
  • The specified endpoint type is not in the allow_list.
  • The URL is invalid for the specified endpoint type.
  • The remote server cannot be reached or the file size cannot be determined (when nbytes is not provided).
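The interaction between remote_endpoint_type and allow_list described above can be illustrated with a minimal, standard-library-only sketch. It does not call kvikio and is not the library's implementation: the EndpointType enum, the resolve_endpoint_type function, and the matches predicate (which decides whether a URL is valid for a given type) are all hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for kvikio::RemoteEndpointType (not the library's definition).
enum class EndpointType { AUTO, S3, S3_PRESIGNED_URL, WEBHDFS, HTTP };

// Hypothetical predicate: returns true if `url` is valid for endpoint type `t`.
using Matcher = std::function<bool(EndpointType, std::string const&)>;

EndpointType resolve_endpoint_type(std::string const& url,
                                   EndpointType requested,
                                   std::optional<std::vector<EndpointType>> allow_list,
                                   Matcher const& matches)
{
  // Documented default order when no allow list is given.
  std::vector<EndpointType> const allowed = allow_list.value_or(
    std::vector<EndpointType>{EndpointType::S3,
                              EndpointType::S3_PRESIGNED_URL,
                              EndpointType::WEBHDFS,
                              EndpointType::HTTP});

  if (requested == EndpointType::AUTO) {
    // AUTO mode: try the allowed types in order and return the first match.
    for (EndpointType t : allowed) {
      if (matches(t, url)) { return t; }
    }
    throw std::runtime_error("URL does not match any allowed endpoint type");
  }

  // Explicit mode: the requested type must appear in the allow list...
  if (std::find(allowed.begin(), allowed.end(), requested) == allowed.end()) {
    throw std::runtime_error("requested endpoint type is not in the allow list");
  }
  // ...and the URL must be valid for that type.
  if (!matches(requested, url)) {
    throw std::runtime_error("URL is invalid for the requested endpoint type");
  }
  return requested;
}
```

Because the order of allow_list determines which type wins in AUTO mode, a caller who wants a URL treated as plain HTTP even when it would also match another type can place the HTTP type first, or list it alone.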

Example:

◆ pread()

std::future<std::size_t> kvikio::RemoteHandle::pread ( void *  buf,
std::size_t  size,
std::size_t  file_offset = 0,
std::size_t  task_size = defaults::task_size() 
)

Read from remote source into buffer (host or device memory) in parallel.

This API is a parallel async version of .read() that partitions the operation into tasks of size task_size for execution in the default thread pool.

Parameters
buf          Pointer to host or device memory.
size         Number of bytes to read.
file_offset  File offset in bytes.
task_size    Size of each task in bytes.
Returns
Future that, on completion, returns the number of bytes read, which is always size.
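The partitioning into tasks of task_size bytes can be sketched with the standard library alone. This is an illustration under stated assumptions, not kvikio's implementation: it uses std::async in place of kvikio's thread pool, an in-memory vector in place of a remote file, and the names read_chunk and parallel_read are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <functional>
#include <future>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a blocking remote read. Copies `size` bytes starting at
// `file_offset` from an in-memory "remote file" into `buf`. The caller must keep the
// requested range within `remote`.
std::size_t read_chunk(std::vector<char> const& remote,
                       void* buf,
                       std::size_t size,
                       std::size_t file_offset)
{
  std::memcpy(buf, remote.data() + file_offset, size);
  return size;
}

// Split the range [file_offset, file_offset + size) into tasks of at most `task_size`
// bytes, run each task asynchronously, and return a future holding the total bytes read.
// `remote` and `buf` must stay alive until the returned future has been waited on.
std::future<std::size_t> parallel_read(std::vector<char> const& remote,
                                       void* buf,
                                       std::size_t size,
                                       std::size_t file_offset,
                                       std::size_t task_size)
{
  if (task_size == 0) { throw std::invalid_argument("task_size must be positive"); }

  std::vector<std::future<std::size_t>> tasks;
  for (std::size_t done = 0; done < size; done += task_size) {
    std::size_t const n = std::min(task_size, size - done);  // last task may be shorter
    tasks.push_back(std::async(std::launch::async,
                               read_chunk,
                               std::cref(remote),
                               static_cast<char*>(buf) + done,
                               n,
                               file_offset + done));
  }

  // Deferred gather step: runs when the caller calls get() or wait() on the result.
  return std::async(std::launch::deferred, [tasks = std::move(tasks)]() mutable {
    std::size_t total = 0;
    for (auto& t : tasks) { total += t.get(); }
    return total;
  });
}
```

Each task writes to a disjoint slice of the destination buffer, so the tasks need no synchronization among themselves; the only join point is the final sum.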

◆ read()

std::size_t kvikio::RemoteHandle::read ( void *  buf,
std::size_t  size,
std::size_t  file_offset = 0 
)

Read from remote source into buffer (host or device memory).

When reading into device memory, a bounce buffer is used to avoid many small memory copies to the device. Use kvikio::defaults::bounce_buffer_size_reset() to set the size of this bounce buffer (default 16 MiB).

Parameters
buf          Pointer to host or device memory.
size         Number of bytes to read.
file_offset  File offset in bytes.
Returns
Number of bytes read, which is always size.
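The bounce-buffer idea can be sketched on the host alone. The following is a simplified illustration, not kvikio's implementation: incoming data arrives in small pieces, is staged in a fixed-size buffer, and is copied to the destination only when the buffer is full or the input is exhausted. The destination stands in for device memory, each flush stands in for a single host-to-device copy, and the names StagedCopyResult and staged_copy are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

struct StagedCopyResult {
  std::size_t bytes_copied;  // total bytes written to the destination
  std::size_t num_flushes;   // number of large copies issued to the destination
};

// Copy `size` bytes from `src` to `dst`, receiving the input in pieces of at most
// `piece_size` bytes and staging them in a bounce buffer of `bounce_size` bytes.
StagedCopyResult staged_copy(char const* src,
                             std::size_t size,
                             char* dst,
                             std::size_t bounce_size,
                             std::size_t piece_size)
{
  if (bounce_size == 0 || piece_size == 0) {
    throw std::invalid_argument("bounce_size and piece_size must be positive");
  }

  std::vector<char> bounce(bounce_size);
  std::size_t filled   = 0;  // bytes currently staged in the bounce buffer
  std::size_t consumed = 0;  // bytes taken from the source so far
  std::size_t flushed  = 0;  // bytes already written to the destination
  std::size_t flushes  = 0;

  while (consumed < size) {
    // Take the next small piece, clipped to the remaining input and free staging space.
    std::size_t const n = std::min({piece_size, size - consumed, bounce_size - filled});
    std::memcpy(bounce.data() + filled, src + consumed, n);
    filled += n;
    consumed += n;

    // Flush once the staging buffer is full or the input is exhausted.
    if (filled == bounce_size || consumed == size) {
      std::memcpy(dst + flushed, bounce.data(), filled);  // stands in for one device copy
      flushed += filled;
      filled = 0;
      ++flushes;
    }
  }
  return {flushed, flushes};
}
```

In this sketch the number of copies to the destination depends on the bounce buffer size rather than on how finely the input arrives, which is the trade-off the bounce buffer size setting controls.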

◆ remote_endpoint_type()

RemoteEndpointType kvikio::RemoteHandle::remote_endpoint_type ( ) const
noexcept

Get the type of the remote file.

Returns
The type of the remote file.

The documentation for this class was generated from the following file:
  • remote_handle.hpp