Handle of an open file registered with cufile.
#include <file_handle.hpp>
Public Member Functions | |
FileHandle (std::string const &file_path, std::string const &flags="r", mode_t mode=m644, CompatMode compat_mode=defaults::compat_mode()) | |
Construct a file handle from a file path. | |
FileHandle (FileHandle const &)=delete | |
FileHandle supports move semantics but isn't copyable. | |
FileHandle & | operator= (FileHandle const &)=delete |
FileHandle (FileHandle &&o) noexcept | |
FileHandle & | operator= (FileHandle &&o) noexcept |
bool | closed () const noexcept |
Whether the file is closed according to its initialization status. | |
void | close () noexcept |
Deregister the file and close the two file descriptors. | |
CUfileHandle_t | handle () |
Get the underlying cuFile file handle. | |
int | fd (bool o_direct=false) const noexcept |
Get one of the file descriptors. | |
int | fd_open_flags (bool o_direct=false) const |
Get the flags of one of the file descriptors (see open(2)). | |
std::size_t | nbytes () const |
Get the file size. | |
std::size_t | read (void *devPtr_base, std::size_t size, std::size_t file_offset, std::size_t devPtr_offset, bool sync_default_stream=true) |
Reads specified bytes from the file into the device memory. | |
std::size_t | write (void const *devPtr_base, std::size_t size, std::size_t file_offset, std::size_t devPtr_offset, bool sync_default_stream=true) |
Writes specified bytes from the device memory into the file. | |
std::future< std::size_t > | pread (void *buf, std::size_t size, std::size_t file_offset=0, std::size_t task_size=defaults::task_size(), std::size_t gds_threshold=defaults::gds_threshold(), bool sync_default_stream=true) |
Reads specified bytes from the file into the device or host memory in parallel. | |
std::future< std::size_t > | pwrite (void const *buf, std::size_t size, std::size_t file_offset=0, std::size_t task_size=defaults::task_size(), std::size_t gds_threshold=defaults::gds_threshold(), bool sync_default_stream=true) |
Writes specified bytes from device or host memory into the file in parallel. | |
void | read_async (void *devPtr_base, std::size_t *size_p, off_t *file_offset_p, off_t *devPtr_offset_p, ssize_t *bytes_read_p, CUstream stream) |
Reads specified bytes from the file into the device memory asynchronously. | |
StreamFuture | read_async (void *devPtr_base, std::size_t size, off_t file_offset=0, off_t devPtr_offset=0, CUstream stream=nullptr) |
Reads specified bytes from the file into the device memory asynchronously. | |
void | write_async (void *devPtr_base, std::size_t *size_p, off_t *file_offset_p, off_t *devPtr_offset_p, ssize_t *bytes_written_p, CUstream stream) |
Writes specified bytes from the device memory into the file asynchronously. | |
StreamFuture | write_async (void *devPtr_base, std::size_t size, off_t file_offset=0, off_t devPtr_offset=0, CUstream stream=nullptr) |
Writes specified bytes from the device memory into the file asynchronously. | |
const CompatModeManager & | get_compat_mode_manager () const noexcept |
Get the associated compatibility mode manager, which can be used to query the original requested compatibility mode or the expected compatibility modes for synchronous and asynchronous I/O. | |
Static Public Attributes | |
static constexpr mode_t | m644 = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH |
Friends | |
class | CompatModeManager |
Handle of an open file registered with cufile.
In order to utilize cufile and GDS, a file must be registered with cufile.
Definition at line 47 of file file_handle.hpp.
kvikio::FileHandle::FileHandle | ( | std::string const & | file_path, |
std::string const & | flags = "r" , |
||
mode_t | mode = m644 , |
||
CompatMode | compat_mode = defaults::compat_mode() |
||
) |
Construct a file handle from a file path.
FileHandle opens the file twice and maintains two file descriptors. One file is opened with the specified flags and the other file is opened with the flags plus the O_DIRECT flag.
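The two-descriptor arrangement described above can be illustrated with plain POSIX calls. The following is only a sketch under stated assumptions, not KvikIO's implementation: FdPair, open_twice, and close_both are hypothetical names, and the choice to drop O_CREAT, O_TRUNC, and O_EXCL from the second open and to tolerate a failing O_DIRECT open is the sketch's own.

```cpp
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cassert>
#include <string>

// Same permission bits as the m644 attribute listed above (rw-r--r--).
constexpr mode_t kMode644 = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;
static_assert(kMode644 == 0644, "m644 is octal 0644");

// Hypothetical type (not part of KvikIO): two descriptors for one file.
struct FdPair {
  int regular = -1;  // opened with the requested flags
  int direct  = -1;  // opened with the requested flags plus O_DIRECT, or -1
};

// Open `path` twice: once as requested and once more with O_DIRECT added.
FdPair open_twice(std::string const& path, int flags)
{
  FdPair fds;
  fds.regular = ::open(path.c_str(), flags, kMode644);
  if (fds.regular < 0) { return fds; }
#ifdef O_DIRECT
  // The file already exists at this point, so creation and truncation flags
  // are dropped. Some file systems reject O_DIRECT; in that case `direct`
  // simply stays at -1.
  int const second = (flags & ~(O_CREAT | O_TRUNC | O_EXCL)) | O_DIRECT;
  fds.direct       = ::open(path.c_str(), second, kMode644);
#endif
  return fds;
}

// Close whichever descriptors are open and reset the pair.
void close_both(FdPair& fds)
{
  if (fds.direct >= 0) { ::close(fds.direct); }
  if (fds.regular >= 0) { ::close(fds.regular); }
  fds = FdPair{};
}
```

A caller would pick `fds.direct` for I/O that bypasses the page cache and `fds.regular` otherwise, which mirrors the role of the o_direct argument of fd().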
file_path | File path to the file |
flags | Open flags (see also fopen(3) ): "r" -> "open for reading (default)" "w" -> "open for writing, truncating the file first" "a" -> "open for writing, appending to the end of file if it exists" "+" -> "open for updating (reading and writing)" |
mode | Access modes (see open(2) ). |
compat_mode | Set KvikIO's compatibility mode for this file. |
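To make the flag strings in the table above concrete, here is a minimal sketch that translates them into open(2) flags following the fopen(3) conventions. The helper name open_flags_from_string is hypothetical, and KvikIO's own parsing may differ (for instance in which strings it accepts).

```cpp
#include <fcntl.h>

#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: translate an fopen(3)-style string into open(2) flags.
int open_flags_from_string(std::string const& flags)
{
  if (flags.empty()) { throw std::invalid_argument("empty flags"); }
  // A '+' anywhere requests updating, i.e. both reading and writing.
  bool const update = flags.find('+') != std::string::npos;
  int const access  = update ? O_RDWR : (flags[0] == 'r' ? O_RDONLY : O_WRONLY);
  switch (flags[0]) {
    case 'r': return access;                        // file must already exist
    case 'w': return access | O_CREAT | O_TRUNC;    // create or truncate
    case 'a': return access | O_CREAT | O_APPEND;   // create or append
    default: throw std::invalid_argument("unknown flags: " + flags);
  }
}
```

With this mapping, "r+" yields O_RDWR without creating the file, while "w+" yields O_RDWR and also creates or truncates it.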
bool kvikio::FileHandle::closed | ( | ) | const noexcept |
Whether the file is closed according to its initialization status.
int kvikio::FileHandle::fd | ( | bool | o_direct = false | ) | const noexcept |
Get one of the file descriptors.
Note that FileHandle maintains two file descriptors: one opened with the O_DIRECT flag and one without.
o_direct | Whether to get the file descriptor opened with the O_DIRECT flag. |
int kvikio::FileHandle::fd_open_flags | ( | bool | o_direct = false | ) | const |
Get the flags of one of the file descriptors (see open(2))
Note that FileHandle maintains two file descriptors: one opened with the O_DIRECT flag and one without.
o_direct | Whether to get the flags of the file descriptor opened with the O_DIRECT flag. |
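On POSIX systems, the flags of an open descriptor can be queried with fcntl(2) and F_GETFL. The sketch below shows that mechanism only; open_flags_of and is_direct are hypothetical helpers, and it is an assumption that fd_open_flags() is built on the same call.

```cpp
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#include <cassert>
#include <stdexcept>

// Hypothetical helper: return the open(2) status flags of a descriptor.
int open_flags_of(int fd)
{
  int const flags = ::fcntl(fd, F_GETFL);
  if (flags == -1) { throw std::runtime_error("fcntl(F_GETFL) failed"); }
  return flags;
}

// True if the descriptor was opened with O_DIRECT.
// Always false on platforms that do not define the flag.
bool is_direct(int fd)
{
#ifdef O_DIRECT
  return (open_flags_of(fd) & O_DIRECT) != 0;
#else
  (void)fd;
  return false;
#endif
}
```

The access mode is extracted by masking with O_ACCMODE, for example `(open_flags_of(fd) & O_ACCMODE) == O_RDWR`.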
const CompatModeManager & kvikio::FileHandle::get_compat_mode_manager | ( | ) | const noexcept |
Get the associated compatibility mode manager, which can be used to query the original requested compatibility mode or the expected compatibility modes for synchronous and asynchronous I/O.
CUfileHandle_t kvikio::FileHandle::handle | ( | ) |
Get the underlying cuFile file handle.
The file handle must be open and not in compatibility mode, i.e. both closed() and is_compat_mode_preferred() must be false.
std::size_t kvikio::FileHandle::nbytes | ( | ) | const |
Get the file size.
The value is cached.
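A cached size can be sketched with fstat(2) as follows. SizedFd is a hypothetical wrapper, and the consequence shown here (a later change to the file is not reflected once the size has been cached) is a property of this sketch; whether FileHandle behaves identically is an assumption.

```cpp
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical wrapper: queries the file size once via fstat(2) and caches it.
class SizedFd {
 public:
  explicit SizedFd(int fd) : fd_{fd} {}

  std::size_t nbytes() const
  {
    if (!has_size_) {
      struct stat st {};
      if (::fstat(fd_, &st) != 0) { throw std::runtime_error("fstat failed"); }
      size_     = static_cast<std::size_t>(st.st_size);
      has_size_ = true;  // later calls return the stored value
    }
    return size_;
  }

 private:
  int fd_;
  mutable bool has_size_    = false;
  mutable std::size_t size_ = 0;
};
```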
std::future<std::size_t> kvikio::FileHandle::pread | ( | void * | buf, |
std::size_t | size, | ||
std::size_t | file_offset = 0 , |
||
std::size_t | task_size = defaults::task_size() , |
||
std::size_t | gds_threshold = defaults::gds_threshold() , |
||
bool | sync_default_stream = true |
||
) |
Reads specified bytes from the file into the device or host memory in parallel.
This API is a parallel async version of .read() that partitions the operation into tasks of size task_size for execution in the default thread pool.
In order to improve performance for small buffers, when size < gds_threshold a shortcut is used that circumvents the thread pool and uses the POSIX backend directly.
Note: the base address of the allocation that buf is part of is used. This means that when registering buffers, use the base address of the allocation. This is what memory_register and memory_deregister do automatically.
buf | Address to device or host memory. |
size | Size in bytes to read. |
file_offset | Offset in the file to read from. |
task_size | Size of each task in bytes. |
gds_threshold | Minimum buffer size to use GDS and the thread pool. |
sync_default_stream | Synchronize the CUDA default (null) stream prior to calling cuFile. Contrary to most of the non-async CUDA API, cuFile does not have the semantic of being ordered with respect to other non-cuFile work in the default stream. By enabling sync_default_stream , KvikIO will synchronize the default stream and order the operation with respect to other work in the null stream. When in KvikIO's compatibility mode or when accessing host memory, the operation is always default stream ordered like the rest of the non-async CUDA API. In this case, the value of sync_default_stream is ignored. |
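The partitioning scheme described for pread() can be sketched for host memory with the standard library alone. This is a simplified analogue under stated assumptions: parallel_pread, submit, and read_all are hypothetical names, std::async stands in for the thread pool, and the `threshold` argument plays the role of gds_threshold. No GDS or cuFile call is involved.

```cpp
#include <stdlib.h>
#include <unistd.h>

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <stdexcept>
#include <system_error>
#include <utility>
#include <vector>

// Read up to `size` bytes at `offset` with pread(2), retrying on short reads.
std::size_t read_all(int fd, char* buf, std::size_t size, std::size_t offset)
{
  std::size_t done = 0;
  while (done < size) {
    ssize_t const n = ::pread(fd, buf + done, size - done, static_cast<off_t>(offset + done));
    if (n < 0) { throw std::runtime_error("pread failed"); }
    if (n == 0) { break; }  // end of file
    done += static_cast<std::size_t>(n);
  }
  return done;
}

// Run one chunk on its own thread; fall back to lazy execution if no thread can be started.
std::future<std::size_t> submit(int fd, char* buf, std::size_t size, std::size_t offset)
{
  try {
    return std::async(std::launch::async, read_all, fd, buf, size, offset);
  } catch (std::system_error const&) {
    return std::async(std::launch::deferred, read_all, fd, buf, size, offset);
  }
}

// Hypothetical host-only analogue of a partitioned read.
std::future<std::size_t> parallel_pread(int fd, void* buf, std::size_t size,
                                        std::size_t file_offset, std::size_t task_size,
                                        std::size_t threshold)
{
  if (task_size == 0) { throw std::invalid_argument("task_size must be positive"); }
  char* dst = static_cast<char*>(buf);
  if (size < threshold) {
    // Small request: read on the calling thread and return a ready future.
    std::promise<std::size_t> ready;
    ready.set_value(read_all(fd, dst, size, file_offset));
    return ready.get_future();
  }
  // Large request: one task per chunk of at most task_size bytes.
  std::vector<std::future<std::size_t>> tasks;
  for (std::size_t pos = 0; pos < size; pos += task_size) {
    std::size_t const len = std::min(task_size, size - pos);
    tasks.push_back(submit(fd, dst + pos, len, file_offset + pos));
  }
  // The returned future sums the per-chunk results when get() or wait() is called.
  return std::async(std::launch::deferred, [ts = std::move(tasks)]() mutable {
    std::size_t total = 0;
    for (auto& t : ts) { total += t.get(); }
    return total;
  });
}
```

Because every chunk reads through the same descriptor, that descriptor must stay open until the returned future has been waited on, which parallels the lifetime requirement stated for the FileHandle object.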
Note: the returned std::future object's wait() or get() should not be called after the lifetime of the FileHandle object ends. Otherwise, the behavior is undefined.
std::future<std::size_t> kvikio::FileHandle::pwrite | ( | void const * | buf, |
std::size_t | size, | ||
std::size_t | file_offset = 0 , |
||
std::size_t | task_size = defaults::task_size() , |
||
std::size_t | gds_threshold = defaults::gds_threshold() , |
||
bool | sync_default_stream = true |
||
) |
Writes specified bytes from device or host memory into the file in parallel.
This API is a parallel async version of .write() that partitions the operation into tasks of size task_size for execution in the default thread pool.
In order to improve performance for small buffers, when size < gds_threshold a shortcut is used that circumvents the thread pool and uses the POSIX backend directly.
Note: the base address of the allocation that buf is part of is used. This means that when registering buffers, use the base address of the allocation. This is what memory_register and memory_deregister do automatically.
buf | Address to device or host memory. |
size | Size in bytes to write. |
file_offset | Offset in the file to write to. |
task_size | Size of each task in bytes. |
gds_threshold | Minimum buffer size to use GDS and the thread pool. |
sync_default_stream | Synchronize the CUDA default (null) stream prior to calling cuFile. Contrary to most of the non-async CUDA API, cuFile does not have the semantic of being ordered with respect to other non-cuFile work in the default stream. By enabling sync_default_stream , KvikIO will synchronize the default stream and order the operation with respect to other work in the null stream. When in KvikIO's compatibility mode or when accessing host memory, the operation is always default stream ordered like the rest of the non-async CUDA API. In this case, the value of sync_default_stream is ignored. |
Note: the returned std::future object's wait() or get() should not be called after the lifetime of the FileHandle object ends. Otherwise, the behavior is undefined.
std::size_t kvikio::FileHandle::read | ( | void * | devPtr_base, |
std::size_t | size, | ||
std::size_t | file_offset, | ||
std::size_t | devPtr_offset, | ||
bool | sync_default_stream = true |
||
) |
Reads specified bytes from the file into the device memory.
This API reads size bytes from the file at the specified offset into GPU memory by using GDS functionality. The API works correctly for unaligned offsets and data sizes, although the performance is not on par with aligned reads. This is a synchronous call and will block until the I/O is complete.
Note: regarding devPtr_offset, if data will be read starting exactly from the devPtr_base that is registered with buffer_register, devPtr_offset should be set to 0. To read starting from an offset in the registered buffer range, the relative offset should be specified in devPtr_offset, and devPtr_base must remain set to the base address that was used in the buffer_register call.
devPtr_base | Base address of buffer in device memory. For registered buffers, devPtr_base must remain set to the base address used in the buffer_register call. |
size | Size in bytes to read. |
file_offset | Offset in the file to read from. |
devPtr_offset | Offset relative to the devPtr_base pointer to read into. This parameter should be used only with registered buffers. |
sync_default_stream | Synchronize the CUDA default (null) stream prior to calling cuFile. Contrary to most of the non-async CUDA API, cuFile does not have the semantic of being ordered with respect to other non-cuFile work in the default stream. By enabling sync_default_stream , KvikIO will synchronize the default stream and order the operation with respect to other work in the null stream. When in KvikIO's compatibility mode or when accessing host memory, the operation is always default stream ordered like the rest of the non-async CUDA API. In this case, the value of sync_default_stream is ignored. |
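The base-plus-offset convention can be shown with a host-memory analogue. In this sketch, read_at is a hypothetical function that uses pread(2) on ordinary memory. It only illustrates how the base pointer stays fixed while the destination is selected through a separate offset; it performs no buffer registration and no GDS transfer.

```cpp
#include <stdlib.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical host analogue of read(devPtr_base, size, file_offset, devPtr_offset):
// the caller always passes the base of the buffer and selects the destination
// inside it through `buf_offset`.
std::size_t read_at(int fd, void* base, std::size_t size, std::size_t file_offset,
                    std::size_t buf_offset)
{
  char* dst       = static_cast<char*>(base) + buf_offset;
  ssize_t const n = ::pread(fd, dst, size, static_cast<off_t>(file_offset));
  if (n < 0) { throw std::runtime_error("pread failed"); }
  return static_cast<std::size_t>(n);
}
```

For a file containing "0123456789" and a ten-byte buffer filled with '.', the call `read_at(fd, buf, 4, 2, 3)` copies "2345" into positions 3 to 6 of the buffer and leaves the rest untouched.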
void kvikio::FileHandle::read_async | ( | void * | devPtr_base, |
std::size_t * | size_p, | ||
off_t * | file_offset_p, | ||
off_t * | devPtr_offset_p, | ||
ssize_t * | bytes_read_p, | ||
CUstream | stream | ||
) |
Reads specified bytes from the file into the device memory asynchronously.
This is an asynchronous version of .read(), which will be executed in sequence for the specified stream.
When running CUDA v12.1 or older, this function falls back to using .read() after stream has been synchronized.
The arguments have the same meaning as in .read(), but some of them are deferred. That is, the values pointed to by size_p, file_offset_p, and devPtr_offset_p will not be evaluated until execution time. Note that this behavior can be changed using cuFile's cuFileStreamRegister API.
devPtr_base | Base address of buffer in device memory. For registered buffers, devPtr_base must remain set to the base address used in the buffer_register call. |
size_p | Pointer to size in bytes to read. If the exact size is not known at the time of I/O submission, then you must set it to the maximum possible I/O size for that stream I/O. Later the actual size can be set prior to the stream I/O execution. |
file_offset_p | Pointer to offset in the file from which to read. Unless otherwise set using cuFileStreamRegister API, this value will not be evaluated until execution time. |
devPtr_offset_p | Pointer to the offset relative to devPtr_base to read into. Unless otherwise set using cuFileStreamRegister API, this value will not be evaluated until execution time. |
bytes_read_p | Pointer to the bytes read from file. This pointer should be a non-NULL value and *bytes_read_p set to 0. The bytes_read_p memory should be allocated with cuMemHostAlloc/malloc/mmap or registered with cuMemHostRegister. After successful execution of the operation in the stream, the value *bytes_read_p will contain either:
|
stream | CUDA stream in which to enqueue the operation. If NULL, make this operation synchronous. |
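The deferred evaluation of the pointer arguments can be illustrated without CUDA. The sketch below is an analogy only: FakeStream is a hypothetical in-order queue of host callbacks standing in for a CUDA stream, and read_async_sketch copies from an in-memory string rather than a file. What it shows is that the pointed-to values are read when the queued operation runs, not when it is submitted.

```cpp
#include <sys/types.h>

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CUDA stream: an in-order queue of host callbacks.
class FakeStream {
 public:
  void enqueue(std::function<void()> op) { ops_.push_back(std::move(op)); }
  // Run every queued operation in submission order.
  void synchronize()
  {
    for (auto& op : ops_) { op(); }
    ops_.clear();
  }

 private:
  std::vector<std::function<void()>> ops_;
};

// Enqueue a read from `src`. The size and both offsets are dereferenced only
// when the operation executes, so the caller may still change them after
// submission. All pointers must stay valid until the stream has run.
void read_async_sketch(std::string const& src, char* dst_base, std::size_t const* size_p,
                       off_t const* src_offset_p, off_t const* dst_offset_p,
                       ssize_t* bytes_read_p, FakeStream& stream)
{
  stream.enqueue([=, &src] {
    std::size_t const off = static_cast<std::size_t>(*src_offset_p);
    std::size_t const n   = off < src.size() ? std::min(*size_p, src.size() - off) : 0;
    std::memcpy(dst_base + *dst_offset_p, src.data() + off, n);
    *bytes_read_p = static_cast<ssize_t>(n);
  });
}
```

If a caller submits with a size of 8 and then lowers it to 4 before calling synchronize(), only 4 bytes are copied, which is the pattern the size_p description refers to when it asks for the maximum possible size at submission time.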
StreamFuture kvikio::FileHandle::read_async | ( | void * | devPtr_base, |
std::size_t | size, | ||
off_t | file_offset = 0 , |
||
off_t | devPtr_offset = 0 , |
||
CUstream | stream = nullptr |
||
) |
Reads specified bytes from the file into the device memory asynchronously.
This is an asynchronous version of .read(), which will be executed in sequence for the specified stream.
When running CUDA v12.1 or older, this function falls back to using .read() after stream has been synchronized.
The arguments have the same meaning as in .read(), but the function returns a StreamFuture object that the caller must keep alive until all data has been read from disk. One way to do this is by calling StreamFuture.check_bytes_done(), which will synchronize the associated stream and return the number of bytes read.
devPtr_base | Base address of buffer in device memory. For registered buffers, devPtr_base must remain set to the base address used in the buffer_register call. |
size | Size in bytes to read. |
file_offset | Offset in the file to read from. |
devPtr_offset | Offset relative to the devPtr_base pointer to read into. This parameter should be used only with registered buffers. |
stream | CUDA stream in which to enqueue the operation. If NULL, make this operation synchronous. |
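Returns a StreamFuture object that must be kept alive until all data has been read from disk, e.g. by synchronizing stream.
std::size_t kvikio::FileHandle::write | ( | void const * | devPtr_base, |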
std::size_t | size, | ||
std::size_t | file_offset, | ||
std::size_t | devPtr_offset, | ||
bool | sync_default_stream = true |
||
) |
Writes specified bytes from the device memory into the file.
This API writes size bytes from GPU memory to the file at the specified offset by using GDS functionality. The API works correctly for unaligned offsets and data sizes, although the performance is not on par with aligned writes. This is a synchronous call and will block until the I/O is complete.
fsync(2) call. If the file is opened with an O_SYNC flag, the metadata will be written to the disk before the call is complete. Refer to the note in read for more information about devPtr_offset.
devPtr_base | Base address of buffer in device memory. For registered buffers, devPtr_base must remain set to the base address used in the buffer_register call. |
size | Size in bytes to write. |
file_offset | Offset in the file to write to. |
devPtr_offset | Offset relative to the devPtr_base pointer to write from. This parameter should be used only with registered buffers. |
sync_default_stream | Synchronize the CUDA default (null) stream prior to calling cuFile. Contrary to most of the non-async CUDA API, cuFile does not have the semantic of being ordered with respect to other non-cuFile work in the default stream. By enabling sync_default_stream , KvikIO will synchronize the default stream and order the operation with respect to other work in the null stream. When in KvikIO's compatibility mode or when accessing host memory, the operation is always default stream ordered like the rest of the non-async CUDA API. In this case, the value of sync_default_stream is ignored. |
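The fsync(2) and O_SYNC remarks above refer to standard POSIX durability mechanisms. As a host-only sketch, assuming the usual POSIX meaning of those calls, a write followed by an explicit fsync(2) could look like this. The function write_durably is hypothetical and does not go through cuFile.

```cpp
#include <stdlib.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical helper: write `size` bytes at `offset`, then ask the kernel to
// flush the file's data and metadata to the storage device.
std::size_t write_durably(int fd, void const* buf, std::size_t size, std::size_t offset)
{
  char const* src  = static_cast<char const*>(buf);
  std::size_t done = 0;
  while (done < size) {
    ssize_t const n = ::pwrite(fd, src + done, size - done, static_cast<off_t>(offset + done));
    if (n <= 0) { throw std::runtime_error("pwrite failed"); }
    done += static_cast<std::size_t>(n);
  }
  if (::fsync(fd) != 0) { throw std::runtime_error("fsync failed"); }
  return done;
}
```

Opening the file with O_SYNC instead makes each write behave as if it were followed by such a flush, at the cost of slower individual writes.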
void kvikio::FileHandle::write_async | ( | void * | devPtr_base, |
std::size_t * | size_p, | ||
off_t * | file_offset_p, | ||
off_t * | devPtr_offset_p, | ||
ssize_t * | bytes_written_p, | ||
CUstream | stream | ||
) |
Writes specified bytes from the device memory into the file asynchronously.
This is an asynchronous version of .write(), which will be executed in sequence for the specified stream.
When running CUDA v12.1 or older, this function falls back to using .write() after stream has been synchronized.
The arguments have the same meaning as in .write(), but some of them are deferred. That is, the values pointed to by size_p, file_offset_p, and devPtr_offset_p will not be evaluated until execution time. Note that this behavior can be changed using cuFile's cuFileStreamRegister API.
devPtr_base | Base address of buffer in device memory. For registered buffers, devPtr_base must remain set to the base address used in the buffer_register call. |
size_p | Pointer to size in bytes to write. If the exact size is not known at the time of I/O submission, then you must set it to the maximum possible I/O size for that stream I/O. Later the actual size can be set prior to the stream I/O execution. |
file_offset_p | Pointer to offset in the file at which to write. Unless otherwise set using cuFileStreamRegister API, this value will not be evaluated until execution time. |
devPtr_offset_p | Pointer to the offset relative to devPtr_base from which to write. Unless otherwise set using cuFileStreamRegister API, this value will not be evaluated until execution time. |
bytes_written_p | Pointer to the bytes written to file. This pointer should be a non-NULL value and *bytes_written_p set to 0. The bytes_written_p memory should be allocated with cuMemHostAlloc/malloc/mmap or registered with cuMemHostRegister. After successful execution of the operation in the stream, the value *bytes_written_p will contain either:
|
stream | CUDA stream in which to enqueue the operation. If NULL, make this operation synchronous. |
StreamFuture kvikio::FileHandle::write_async | ( | void * | devPtr_base, |
std::size_t | size, | ||
off_t | file_offset = 0 , |
||
off_t | devPtr_offset = 0 , |
||
CUstream | stream = nullptr |
||
) |
Writes specified bytes from the device memory into the file asynchronously.
This is an asynchronous version of .write(), which will be executed in sequence for the specified stream.
When running CUDA v12.1 or older, this function falls back to using .write() after stream has been synchronized.
The arguments have the same meaning as in .write(), but the function returns a StreamFuture object that the caller must keep alive until all data has been written to disk. One way to do this is by calling StreamFuture.check_bytes_done(), which will synchronize the associated stream and return the number of bytes written.
devPtr_base | Base address of buffer in device memory. For registered buffers, devPtr_base must remain set to the base address used in the buffer_register call. |
size | Size in bytes to write. |
file_offset | Offset in the file to write to. |
devPtr_offset | Offset relative to the devPtr_base pointer to write from. This parameter should be used only with registered buffers. |
stream | CUDA stream in which to enqueue the operation. If NULL, make this operation synchronous. |
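Returns a StreamFuture object that must be kept alive until all data has been written to disk, e.g. by synchronizing stream.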