Hardware Resources and Testing

Resource Allocation

The CTest resource allocation framework allows tests to specify which hardware resources they need, and allows projects to describe the specific resources available on the local machine. Together these ensure that each test is told which specific resources it should use, and that over-subscription cannot occur no matter the requested testing parallel level.

For tests to use CTest resource allocation, the following components are needed:

  • A JSON per-machine resource specification file

  • The CTEST_RESOURCE_SPEC_FILE variable must point to that JSON file

  • Each add_test() records what resources it requires via test properties

  • Each test reads the relevant environment variables to determine what specific resources it should use
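
Concretely, a resource specification file for a machine with two GPUs could look like the following minimal sketch. The `version` and `local` keys come from CTest's resource spec schema, and `gpus` is the resource type rapids-cmake uses; the slot counts shown here are an illustration:

```json
{
  "version": {"major": 1, "minor": 0},
  "local": [
    {
      "gpus": [
        {"id": "0", "slots": 100},
        {"id": "1", "slots": 100}
      ]
    }
  ]
}
```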

These are steep requirements that demand a significant amount of infrastructure setup from each project. In addition, the CTest resource allocation specification is deliberately generic, allowing it to represent arbitrary resources such as CPUs, GPUs, and ASICs.

rapids_test

To help RAPIDS projects utilize all GPUs on a machine when running tests, the rapids-cmake project offers a suite of commands to simplify the process. These commands simplify GPU detection, setting up resource specification files, specifying test requirements, and setting the active CUDA GPU.

Machine GPU Detection

The key component of CTest resource allocation is having an accurate representation of the hardware that exists on the developer’s machine. The rapids_test_init() function will do system introspection to determine the number of GPUs on the current machine and generate a resource allocation JSON file representing these GPUs.

include(${CMAKE_BINARY_DIR}/RAPIDS.cmake)

include(rapids-test)

enable_testing()
rapids_test_init()

The CTest resource allocation specification isn’t limited to representing GPUs as a single unit. Instead it allows the JSON file to specify the capacity (slots) that each GPU has. In the case of rapids-cmake we always represent each GPU as having 100 slots allowing projects to think in total percentages when calculating requirements.

Specifying Tests' GPU Requirements

As discussed above, each CMake test needs to specify the GPU resources it requires so that CTest can properly partition the GPUs given the requested parallel level. The easiest path for developers is to use the rapids_test_add() command, which wraps each test execution in a script that sets the CUDA visible devices, so tests only see the allocated device(s).

For example, below we have three tests: two that can run concurrently on the same GPU and one that requires a full GPU. This specification allows all three tests to run concurrently on a machine with two or more GPUs, with no modification of the tests!

include(rapids-test)

enable_testing()
rapids_test_init()

add_executable( cuda_test test.cu )
rapids_test_add(NAME test_small_alloc COMMAND cuda_test 50 GPUS 1 PERCENT 10)
rapids_test_add(NAME test_medium_alloc COMMAND cuda_test 100 GPUS 1 PERCENT 20)
rapids_test_add(NAME test_very_large_alloc COMMAND cuda_test 10000 GPUS 1)
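
With the requirements declared, the suite can be run at any parallel level and CTest will schedule tests so that GPU slots are never over-subscribed; for example (hypothetical invocation):

```shell
ctest --parallel 8
```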

Multi GPU Tests

The rapids_test_add() command also supports tests that require multiple GPU bindings. In that case, request two (or more) GPUs with a full allocation like this:

include(rapids-test)

enable_testing()
rapids_test_init()

add_executable( cuda_test test.cu )
rapids_test_add(NAME multi_gpu COMMAND cuda_test GPUS 3)

Due to how CTest performs allocations, if you need distinct GPUs you must request a percentage of 51% or higher. Otherwise there is a chance that multiple allocations will be placed on the same GPU.
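
For example, a test that must land on two physically distinct GPUs could be declared like this (hypothetical test name; the 51-percent request is what prevents both allocations from sharing one device, since two 51-slot requests cannot fit in a single GPU's 100 slots):

```cmake
add_executable( cuda_test test.cu )
rapids_test_add(NAME distinct_gpus COMMAND cuda_test GPUS 2 PERCENT 51)
```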

When the rapids-cmake test wrapper is insufficient

At times the wrapper-script approach is insufficient, usually because a project already uses its own test wrappers.

As discussed above, each CMake test still needs to specify the GPU resources it requires so that CTest can properly partition the GPUs given the requested parallel level. In these cases, though, the tests themselves must parse the CTest environment variables to determine which GPUs they should run on.

For the CMake side you can use rapids_test_gpu_requirements() to specify the requirements:

include(rapids-test)

enable_testing()
rapids_test_init()

add_executable( cuda_test test.cu )
target_link_libraries( cuda_test PRIVATE RAPIDS::test )

add_test(NAME test_small_alloc COMMAND cuda_test 50)
rapids_test_gpu_requirements(test_small_alloc GPUS 1 PERCENT 10)

Now, in the C++ code, you need to parse the relevant CTEST_RESOURCE_GROUP environment variables. To simplify the process, here is some helper C++ code (a header and its implementation) that will do the heavy lifting for you:

/*
 * Copyright (c) 2022-2023, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#pragma once

#include <cuda_runtime_api.h>
#include <vector>

namespace rapids_cmake {

/*
 * Represents a GPU Allocation provided by a CTest resource specification.
 *
 * The `device_id` maps to the CUDA gpu id required by `cudaSetDevice`.
 * The slots represent the percentage of the GPU that this test will use.
 * Primarily used by CTest to ensure proper load balancing of tests.
 */
struct GPUAllocation {
  int device_id;
  int slots;
};

/*
 * Returns true when a CTest resource specification has been specified.
 *
 * Since the vast majority of tests should execute without a CTest resource
 * spec (e.g. when executed manually by a developer), callers of `rapids_cmake`
 * should first ensure that a CTest resource spec file has been provided before
 * trying to query/bind to the allocation.
 *
 * ```cxx
 *   if (rapids_cmake::using_resources()) {
 *     rapids_cmake::bind_to_first_gpu();
 *   }
 * ```
 */
bool using_resources();

/*
 * Returns all GPUAllocations allocated for a test
 *
 * To support multi-GPU tests the CTest resource specification allows a
 * test to request multiple GPUs. As CUDA only allows binding to a
 * single GPU at any time, this API allows tests to know what CUDA
 * devices they should bind to.
 *
 * Note: The `device_id` of each allocation might not be unique.
 * If a test says it needs 50% of two GPUs, it could be allocated
 * the same physical GPU. If a test needs distinct / unique devices
 * it must request 51%+ of a device.
 *
 * Note: rapids_cmake does no caching, so this query should be cached
 * instead of called multiple times.
 */
std::vector<GPUAllocation> full_allocation();

/*
 * Have CUDA bind to a given GPUAllocation
 *
 * Have CUDA bind to the `device_id` specified in the CTest
 * GPU allocation
 *
 * Note: Return value is the cudaError_t of `cudaSetDevice`
 */
cudaError_t bind_to_gpu(GPUAllocation const& alloc);

/*
 * Convenience method to bind to the first GPU that CTest has allocated
 * Provided as most RAPIDS tests only require a single GPU
 *
 * Will return `false` if no GPUs have been allocated, or if setting
 * the CUDA device failed for any reason.
 */
bool bind_to_first_gpu();

}  // namespace rapids_cmake
/*
 * Copyright (c) 2022-2023, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include <rapids_cmake_ctest_allocation.hpp>

#include <cuda_runtime_api.h>

#include <algorithm>
#include <cstdlib>
#include <numeric>
#include <string>
#include <string_view>

namespace rapids_cmake {

namespace {
GPUAllocation noGPUAllocation() { return GPUAllocation{-1, -1}; }

GPUAllocation parseCTestAllocation(std::string_view env_variable)
{
  // Guard against the environment variable being unset or empty
  const char* env_value = std::getenv(env_variable.data());
  if (env_value == nullptr || *env_value == '\0') { return noGPUAllocation(); }
  std::string gpu_resources{env_value};

  // TODO: handle the variable not having all of the requested components

  // The string looks like "id:<number>,slots:<number>"
  auto id_start   = gpu_resources.find("id:") + 3;
  auto id_end     = gpu_resources.find(",");
  auto slot_start = gpu_resources.find("slots:") + 6;

  auto id    = gpu_resources.substr(id_start, id_end - id_start);
  auto slots = gpu_resources.substr(slot_start);

  return GPUAllocation{std::stoi(id), std::stoi(slots)};
}

std::vector<GPUAllocation> determineGPUAllocations()
{
  std::vector<GPUAllocation> allocations;
  const auto* resource_count = std::getenv("CTEST_RESOURCE_GROUP_COUNT");
  if (!resource_count) {
    allocations.emplace_back(noGPUAllocation());
    return allocations;
  }

  const auto resource_max = std::stoi(resource_count);
  for (int index = 0; index < resource_max; ++index) {
    std::string group_env = "CTEST_RESOURCE_GROUP_" + std::to_string(index);
    std::string resource_group{std::getenv(group_env.c_str())};
    std::transform(resource_group.begin(), resource_group.end(), resource_group.begin(), ::toupper);

    if (resource_group == "GPUS") {
      auto resource_env = group_env + "_" + resource_group;
      auto&& allocation = parseCTestAllocation(resource_env);
      allocations.emplace_back(allocation);
    }
  }

  return allocations;
}
}  // namespace

bool using_resources()
{
  const auto* resource_count = std::getenv("CTEST_RESOURCE_GROUP_COUNT");
  return resource_count != nullptr;
}

std::vector<GPUAllocation> full_allocation() { return determineGPUAllocations(); }

cudaError_t bind_to_gpu(GPUAllocation const& alloc) { return cudaSetDevice(alloc.device_id); }

bool bind_to_first_gpu()
{
  if (using_resources()) {
    std::vector<GPUAllocation> allocs = determineGPUAllocations();
    return (bind_to_gpu(allocs[0]) == cudaSuccess);
  }
  return false;
}

}  // namespace rapids_cmake