Package ai.rapids.cudf
Class Cuda
java.lang.Object
ai.rapids.cudf.Cuda
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static final class | |
static final class | Cuda.Stream | A class representing a CUDA stream.
-
Field Summary
Fields -
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
static void | asyncMemset(long dst, byte value, long count) | Sets count bytes, starting at the memory area pointed to by dst, to value (may return before the operation completes).
static void | autoSetDevice() | Set the device for this thread to the appropriate one.
static void | deviceSynchronize() | Synchronizes the whole device using cudaDeviceSynchronize.
static void | freeZero() | Calls cudaFree(0).
static int | getComputeCapabilityMajor() | Gets the major CUDA compute capability of the current device.
static int | getComputeCapabilityMinor() | Gets the minor CUDA compute capability of the current device.
static CudaComputeMode | getComputeMode() | Gets the CUDA compute mode of the current device.
static int | getDevice() | Get the id of the current device.
static int | getDeviceCount() | Get the device count.
static int | getDriverVersion() | Get the CUDA Driver version, which is the latest version of CUDA supported by the driver.
static int | getRuntimeVersion() | Get the CUDA Runtime version of the current CUDA Runtime instance.
static boolean | isEnvCompatibleForTesting() | This should only be used in tests, to enable or disable tests depending on whether the current environment is compatible with this version of the library.
static boolean | isPtdsEnabled() | Whether the per-thread default stream is enabled.
static CudaMemInfo | memGetInfo() | Mapping: cudaMemGetInfo(size_t *free, size_t *total)
static void | memset(long dst, byte value, long count) | Sets count bytes, starting at the memory area pointed to by dst, to value (completes before returning).
static void | multiBufferCopyAsync(long[] destAddrs, long[] srcAddrs, long[] copySizes, Cuda.Stream stream) | Copy data from multiple device buffer sources to multiple device buffer destinations.
static void | profilerStart() | Begins an Nsight profiling session, if a profiler is currently attached.
static void | profilerStop() | Stops an active Nsight profiling session.
static void | setDevice(int device) | Set the id of the current device.
-
Field Details
-
DEFAULT_STREAM
-
-
Constructor Details
-
Cuda
public Cuda()
-
-
Method Details
-
getComputeMode
Gets the CUDA compute mode of the current device.
- Returns:
- the enum value of CudaComputeMode
-
memGetInfo
Mapping: cudaMemGetInfo(size_t *free, size_t *total). Returns the free and total memory of the current device as a CudaMemInfo.
- Throws:
CudaException
-
memset
Sets count bytes, starting at the memory area pointed to by dst, to value. The operation has completed by the time this call returns, although while it runs it may overlap with operations on other streams.
- Parameters:
dst - Destination memory address
value - Byte value to set dst with
count - Size in bytes to set
- Throws:
CudaException
-
asyncMemset
Sets count bytes, starting at the memory area pointed to by dst, to value. The operation has not necessarily completed when this call returns, and it may overlap with operations on other streams.
- Parameters:
dst - Destination memory address
value - Byte value to set dst with
count - Size in bytes to set
- Throws:
CudaException
-
getDevice
Get the id of the current device.
- Returns:
- the id of the current device
- Throws:
CudaException
- on any error
-
getDeviceCount
Get the device count.
- Returns:
- the number of compute-capable devices
- Throws:
CudaException
- on any error
-
setDevice
Set the id of the current device.
Note that the id is relative to CUDA_VISIBLE_DEVICES: for example, if CUDA_VISIBLE_DEVICES=1,0 and you call setDevice(0), you will get device 1.
Note that if RMM has been initialized and the requested device ID does not match the device used to initialize RMM, this throws an error.
- Throws:
CudaException
- on any error
CudfException
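The remapping described above can be modeled in plain Java. The sketch below is a hypothetical helper (VisibleDevices and physicalDevice are illustrative names, not cudf APIs); it only shows how a logical id such as the 0 in setDevice(0) selects the entry at that position in a CUDA_VISIBLE_DEVICES-style list, and it handles only plain comma-separated integer ids.

```java
// Hypothetical helper (not part of cudf): models how a logical id passed to
// Cuda.setDevice is resolved through a CUDA_VISIBLE_DEVICES-style list.
final class VisibleDevices {
    private VisibleDevices() {}

    // Returns the physical device id listed at position `logicalId` of `visibleDevices`,
    // e.g. physicalDevice("1,0", 0) == 1. Only comma-separated integer ids are handled;
    // CUDA also accepts GPU UUIDs, which this sketch does not model.
    static int physicalDevice(String visibleDevices, int logicalId) {
        String[] ids = visibleDevices.split(",");
        if (logicalId < 0 || logicalId >= ids.length) {
            throw new IllegalArgumentException(
                "logical device " + logicalId + " is not in \"" + visibleDevices + "\"");
        }
        return Integer.parseInt(ids[logicalId].trim());
    }
}
```

With CUDA_VISIBLE_DEVICES=1,0, logical device 0 maps to physical device 1 and logical device 1 maps to physical device 0, matching the example in the note.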
-
autoSetDevice
Set the device for this thread to the appropriate one. Java loves threads, but CUDA requires each thread to have the device set explicitly; otherwise the thread falls back to the default device, which is the first device listed in CUDA_VISIBLE_DEVICES. Most JNI calls through the cudf API will do this for you, but if you are writing your own JNI calls that extend cudf, you might want to call this before calling into your JNI APIs to ensure that the device is set correctly.
- Throws:
CudaException
- on any error
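Because the current device is per-thread state, the setup call has to run on each worker thread, not on the thread that submits the work. The sketch below shows one way to enforce that with a thread pool. DeviceSetup, withSetup, and runAll are hypothetical names, not cudf APIs; in real code the setup hook would presumably be Cuda::autoSetDevice, assuming CudaException is unchecked so that the method reference fits Runnable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper (not part of cudf). In real code `setup` would likely be
// Cuda::autoSetDevice; here it is any Runnable so the sketch runs without a GPU.
final class DeviceSetup {
    private DeviceSetup() {}

    // Wraps `task` so that `setup` runs on the worker thread immediately before the task body.
    // The current CUDA device is per-thread state, so the setup must run on that same thread.
    static <T> Callable<T> withSetup(Runnable setup, Callable<T> task) {
        return () -> {
            setup.run();
            return task.call();
        };
    }

    // Runs all tasks on a fixed-size pool, each wrapped with the per-thread setup hook,
    // and returns their results in submission order.
    static <T> List<T> runAll(int threads, Runnable setup, List<Callable<T>> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (Callable<T> task : tasks) {
                futures.add(pool.submit(withSetup(setup, task)));
            }
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

To see the per-thread effect without a GPU, a ThreadLocal<Integer> initialized to -1 can stand in for the current device: tasks that read it return 0 when run with the hook () -> device.set(0), and -1 on a fresh pool without the hook.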
-
getDriverVersion
Get the CUDA Driver version, which is the latest version of CUDA supported by the driver. The version is returned as (1000 * major + 10 * minor). For example, CUDA 9.2 would be represented by 9020. If no driver is installed, then 0 is returned as the driver version.
- Returns:
- the CUDA driver version
- Throws:
CudaException
- on any error
-
getRuntimeVersion
Get the CUDA Runtime version of the current CUDA Runtime instance. The version is returned as (1000 * major + 10 * minor). For example, CUDA 9.2 would be represented by 9020.
- Returns:
- the CUDA Runtime version
- Throws:
CudaException
- on any error
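Both version getters use the same encoding, so decoding is plain integer arithmetic: major = v / 1000 and minor = (v % 1000) / 10. The sketch below is a hypothetical helper (CudaVersion is not a cudf class) that applies this to values such as those returned by Cuda.getDriverVersion() or Cuda.getRuntimeVersion().

```java
// Hypothetical helper (not part of cudf): decodes the integers returned by
// Cuda.getDriverVersion() and Cuda.getRuntimeVersion(), encoded as 1000 * major + 10 * minor.
final class CudaVersion {
    private CudaVersion() {}

    static int major(int encoded) {
        return encoded / 1000;
    }

    static int minor(int encoded) {
        return (encoded % 1000) / 10;
    }

    // Formats the encoded value as "major.minor"; 0 is the documented
    // getDriverVersion() result when no driver is installed.
    static String format(int encoded) {
        if (encoded == 0) {
            return "no driver";
        }
        return major(encoded) + "." + minor(encoded);
    }
}
```

For example, format(9020) yields "9.2", matching the example above, and format(12020) yields "12.2".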
-
getComputeCapabilityMajor
Gets the major CUDA compute capability of the current device. For reference: https://developer.nvidia.com/cuda-gpus
Hardware Generation | Compute Capability
Ampere | 8.x
Turing | 7.5
Volta | 7.0, 7.2
Pascal | 6.x
Maxwell | 5.x
Kepler | 3.x
Fermi | 2.x
- Returns:
- The major compute capability version number of the current CUDA device
- Throws:
CudaException
- on any error
-
getComputeCapabilityMinor
Gets the minor CUDA compute capability of the current device. For reference: https://developer.nvidia.com/cuda-gpus
Hardware Generation | Compute Capability
Ampere | 8.x
Turing | 7.5
Volta | 7.0, 7.2
Pascal | 6.x
Maxwell | 5.x
Kepler | 3.x
Fermi | 2.x
- Returns:
- The minor compute capability version number of the current CUDA device
- Throws:
CudaException
- on any error
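The major and minor values together identify a row in the reference table for getComputeCapabilityMajor() and getComputeCapabilityMinor(). The sketch below is a hypothetical helper (ComputeCapability is not a cudf class) that encodes exactly that table and returns "unknown" for anything the table does not list. The table predates newer generations: NVIDIA's list also includes, for example, Hopper at 9.0 and Ada Lovelace at 8.9, which this table would fold into Ampere 8.x, so treat the result as illustrative.

```java
// Hypothetical helper (not part of cudf): maps a compute capability, such as the values from
// Cuda.getComputeCapabilityMajor() and Cuda.getComputeCapabilityMinor(), to a hardware
// generation using only the reference table above. Unlisted values return "unknown".
final class ComputeCapability {
    private ComputeCapability() {}

    static String generation(int major, int minor) {
        switch (major) {
            case 8:
                return "Ampere";
            case 7:
                if (minor == 5) {
                    return "Turing";
                }
                if (minor == 0 || minor == 2) {
                    return "Volta";
                }
                return "unknown";
            case 6:
                return "Pascal";
            case 5:
                return "Maxwell";
            case 3:
                return "Kepler";
            case 2:
                return "Fermi";
            default:
                return "unknown";
        }
    }
}
```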
-
freeZero
Calls cudaFree(0). This can be used to initialize the CUDA context on the GPU after a setDevice() call.
- Throws:
CudaException
- on any error
-
isEnvCompatibleForTesting
public static boolean isEnvCompatibleForTesting()
This should only be used in tests, to enable or disable tests depending on whether the current environment is compatible with this version of the library. Currently it performs only some very basic checks, but these may be expanded in the future as needed.
- Returns:
- true if the environment is compatible, false otherwise.
-
isPtdsEnabled
public static boolean isPtdsEnabled()
Whether the per-thread default stream is enabled.
-
multiBufferCopyAsync
public static void multiBufferCopyAsync(long[] destAddrs, long[] srcAddrs, long[] copySizes, Cuda.Stream stream)
Copy data from multiple device buffer sources to multiple device buffer destinations. For each buffer to copy there is a corresponding entry in the destination address, source address, and copy size arrays, so entry i of each array describes the same copy.
- Parameters:
destAddrs - array of device destination addresses
srcAddrs - array of device source addresses
copySizes - array of copy sizes
stream - CUDA stream to use for the copy
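Because the three arrays must stay index-aligned, it can help to collect the copies as triples and only split them into parallel arrays at the call site. The sketch below is a hypothetical builder (CopyBatch and its methods are illustrative names, not cudf APIs); it uses plain long values in place of real device addresses so that it runs without a GPU.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of cudf): collects (destination, source, size) triples and
// exposes them as the three parallel arrays that multiBufferCopyAsync takes, so that entry i
// of each array always describes the same copy.
final class CopyBatch {
    private final List<long[]> copies = new ArrayList<>();

    CopyBatch add(long destAddr, long srcAddr, long size) {
        if (size < 0) {
            throw new IllegalArgumentException("negative copy size: " + size);
        }
        copies.add(new long[] {destAddr, srcAddr, size});
        return this;
    }

    long[] destAddrs() {
        return column(0);
    }

    long[] srcAddrs() {
        return column(1);
    }

    long[] copySizes() {
        return column(2);
    }

    // Sum of all copy sizes in the batch.
    long totalSize() {
        long total = 0;
        for (long[] copy : copies) {
            total += copy[2];
        }
        return total;
    }

    // Extracts one field of every triple, preserving insertion order.
    private long[] column(int index) {
        long[] out = new long[copies.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = copies.get(i)[index];
        }
        return out;
    }
}
```

With real device addresses, the call would then look like Cuda.multiBufferCopyAsync(batch.destAddrs(), batch.srcAddrs(), batch.copySizes(), stream). Since the method is asynchronous, the destinations should presumably not be read until the work queued on stream has completed.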
-
profilerStart
public static void profilerStart()
Begins an Nsight profiling session, if a profiler is currently attached.
-
profilerStop
public static void profilerStop()
Stops an active Nsight profiling session.
-
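profilerStart and profilerStop are naturally used as a pair around the region of interest. The sketch below is a hypothetical helper (ProfiledRegion is not a cudf class) that brackets a piece of work with a start and a stop hook, which in real code would presumably be Cuda::profilerStart and Cuda::profilerStop; putting the stop in a finally block ensures the session is stopped even if the profiled work throws.

```java
import java.util.function.Supplier;

// Hypothetical helper (not part of cudf): brackets a region of work with start/stop hooks,
// which in real code would likely be Cuda::profilerStart and Cuda::profilerStop.
final class ProfiledRegion {
    private ProfiledRegion() {}

    // Runs `start`, then `body`; `stop` runs in a finally block so it is
    // called even when `body` throws.
    static <T> T run(Runnable start, Runnable stop, Supplier<T> body) {
        start.run();
        try {
            return body.get();
        } finally {
            stop.run();
        }
    }
}
```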
deviceSynchronize
public static void deviceSynchronize()
Synchronizes the whole device using cudaDeviceSynchronize, which blocks until all previously issued work on the current device has completed.
-