Unit tests in libcudf are written using Google Test.
Important: Instead of including gtest/gtest.h directly, use #include <cudf_test/cudf_gtest.hpp>.
Also, write test code in the global namespace. That is, do not write test code in the cudf or the cudf::test namespace or their sub-namespaces. Likewise, do not use using namespace cudf; or using namespace cudf::test; in the global namespace.
In general we should test to make sure all code paths are covered. This is not always easy or possible. But generally this means we test all supported combinations of algorithms and data types, and all operators supported by algorithms that support multiple operators (e.g. reductions, groupby). Here are some other guidelines.
Strings tests should include a mixture of non-ASCII UTF-8 characters like é in test data.
Test sliced columns as input (that is, columns that have a nonzero offset). This is an easy to forget case.
Decimal types are not included in the cudf::test::NumericTypes type list, but are included in cudf::test::FixedWidthTypes, so be careful that tests either include or exclude decimal types as appropriate.
The naming of unit test directories and source files should be consistent with the feature being tested. For example, the tests for APIs in copying.hpp should live in cudf/cpp/tests/copying. Each feature (or set of related features) should have its own test source file named <feature>_tests.cu/cpp. For example, cudf/cpp/src/copying/scatter.cu has tests in cudf/cpp/tests/copying/scatter_tests.cu.
In the interest of improving compile time, whenever possible, test source files should be .cpp files because nvcc is slower than gcc in compiling host code. Note that thrust::device_vector includes device code, and so must only be used in .cu files. rmm::device_uvector, rmm::device_buffer and the various column_wrapper types described later can be used in .cpp files, and are therefore preferred in test code over thrust::device_vector.
All libcudf unit tests should make use of a GTest "Test Fixture". Even if the fixture is empty, it should inherit from the base fixture cudf::test::BaseFixture found in include/cudf_test/base_fixture.hpp. This ensures that RMM is properly initialized and finalized. cudf::test::BaseFixture already inherits from testing::Test and therefore it is not necessary for your test fixtures to inherit from it.
Example:
class MyTestFixture : public cudf::test::BaseFixture {...};
In general, libcudf features must work across all of the supported types (there are exceptions e.g. not all binary operations are supported for all types). In order to automate the process of running the same tests across multiple types, we use GTest's Typed Tests. Typed tests allow you to write a test once and run it across a list of types.
For example:
To specify the list of types to use, instead of GTest's testing::Types<...>, libcudf provides cudf::test::Types<...> which is a custom, drop-in replacement for testing::Types. In this example, all tests using the TypedTestFixture fixture will run once for each type in the list defined in TestTypes (int, float, double).
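The "write once, run per type" mechanism can be illustrated without GTest. The sketch below is only a stand-in: Types here is a local placeholder rather than cudf::test::Types, and addition_round_trips and run_for_each_type are hypothetical names invented for illustration. In a real libcudf test, the GTest macros TYPED_TEST_SUITE and TYPED_TEST do this work, and the current type is available as TypeParam.

```cpp
// Local placeholder for a type list such as cudf::test::Types<...>.
// This is NOT the libcudf implementation.
template <typename... Ts>
struct Types {};

// The "test body", written once as a template.
// T plays the role that TypeParam plays inside a GTest TYPED_TEST.
template <typename T>
constexpr bool addition_round_trips()
{
  return T{2} + T{3} == T{5};
}

// Instantiates the test body once per type in the list (C++17 fold
// expression) and reports whether every instantiation passed.
template <typename... Ts>
constexpr bool run_for_each_type(Types<Ts...>)
{
  return (addition_round_trips<Ts>() && ...);
}

// Same type list as in the text: int, float, double.
using TestTypes = Types<int, float, double>;
static_assert(run_for_each_type(TestTypes{}), "every type in TestTypes must pass");
```

Each type in the list produces one more template instantiation, which is why the length of the type list directly affects compile time.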
The list of types that are used in tests should be consistent across all tests. To ensure consistency, several sets of common type lists are provided in include/cudf_test/type_lists.hpp. For example, cudf::test::NumericTypes is a type list of all numeric types, cudf::test::FixedWidthTypes is a list of all fixed-width element types, and cudf::test::AllTypes is a list of every element type that libcudf supports.
Whenever possible, use one of the type lists provided in include/cudf_test/type_lists.hpp rather than creating new custom lists.
Sometimes it is necessary to generate more advanced type lists than the simple lists of single types in the TypeList example above. libcudf provides a set of meta-programming utilities in include/cudf_test/type_list_utilities.hpp for generating and composing more advanced type lists.
For example, it may be useful to generate a nested type list where each element in the list is two types. In a nested type list, each element in the list is itself another list. In order to access the Nth type within the nested list, use GetType<NestedList, N>.
Imagine testing all possible two-type combinations of <int,float>. This could be done manually:
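A minimal sketch of such a hand-written nested list follows. So that it compiles without libcudf, Types and GetType are defined locally as simplified stand-ins; they are assumptions for illustration, not the definitions in type_list_utilities.hpp. In a libcudf test, the cudf::test versions would be used instead.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

// Local stand-in for cudf::test::Types (illustration only).
template <typename... Ts>
struct Types {};

// Local stand-in for GetType: selects the N-th type of a list.
template <typename List, std::size_t N>
struct GetTypeImpl;

template <std::size_t N, typename... Ts>
struct GetTypeImpl<Types<Ts...>, N> {
  using type = typename std::tuple_element<N, std::tuple<Ts...>>::type;
};

template <typename List, std::size_t N>
using GetType = typename GetTypeImpl<List, N>::type;

// All two-type combinations of int and float, written out by hand.
using TwoTypes = Types<Types<int, int>, Types<int, float>,
                       Types<float, int>, Types<float, float>>;

// Each element of TwoTypes is itself a list: first pick a pair,
// then pick a type within that pair.
using SecondPair = GetType<TwoTypes, 1>;
static_assert(std::is_same<SecondPair, Types<int, float>>::value, "pair 1 is <int, float>");
static_assert(std::is_same<GetType<SecondPair, 0>, int>::value, "first type is int");
static_assert(std::is_same<GetType<SecondPair, 1>, float>::value, "second type is float");
```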
The above example manually specifies all pairs composed of int and float. CrossProduct is a utility in type_list_utilities.hpp which materializes this cross product automatically.
CrossProduct can be used with an arbitrary number of type lists to generate nested type lists of two or more types. However, overuse of CrossProduct can dramatically inflate compile time. The cross product of two type lists of size n and m will result in a new list with n*m nested type lists. This means n*m templates will be instantiated; n and m need not be large before compile time becomes unreasonable.
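To make the n*m growth concrete, here is a self-contained sketch of a two-list cross product. It is not libcudf's CrossProduct: the names Types, Concat, PairWithEach, CrossProduct2 and Size are hypothetical stand-ins defined only for this illustration, and the sketch handles exactly two lists, whereas the libcudf utility accepts an arbitrary number.

```cpp
#include <cstddef>
#include <type_traits>

// Local stand-in for cudf::test::Types (illustration only).
template <typename... Ts>
struct Types {};

// Concatenates any number of type lists into one list.
template <typename... Lists>
struct Concat;
template <>
struct Concat<> { using type = Types<>; };
template <typename... Ts>
struct Concat<Types<Ts...>> { using type = Types<Ts...>; };
template <typename... As, typename... Bs, typename... Rest>
struct Concat<Types<As...>, Types<Bs...>, Rest...> {
  using type = typename Concat<Types<As..., Bs...>, Rest...>::type;
};

// Pairs a single type A with every type in the second list.
template <typename A, typename ListB>
struct PairWithEach;
template <typename A, typename... Bs>
struct PairWithEach<A, Types<Bs...>> { using type = Types<Types<A, Bs>...>; };

// Two-list cross product: every <A, B> pair, first list varying slowest.
template <typename ListA, typename ListB>
struct CrossProduct2Impl;
template <typename ListB, typename... As>
struct CrossProduct2Impl<Types<As...>, ListB> {
  using type = typename Concat<typename PairWithEach<As, ListB>::type...>::type;
};
template <typename ListA, typename ListB>
using CrossProduct2 = typename CrossProduct2Impl<ListA, ListB>::type;

// Number of elements in a type list.
template <typename List>
struct Size;
template <typename... Ts>
struct Size<Types<Ts...>> : std::integral_constant<std::size_t, sizeof...(Ts)> {};

// Reproduces the hand-written list of int/float pairs.
using Pairs = CrossProduct2<Types<int, float>, Types<int, float>>;
static_assert(std::is_same<Pairs, Types<Types<int, int>, Types<int, float>,
                                        Types<float, int>, Types<float, float>>>::value,
              "matches the manual list");
static_assert(Size<Pairs>::value == 2 * 2, "n*m nested lists");
```

Even in this small sketch, a 3-element list crossed with a 2-element list already yields 6 nested lists, and every typed test in the suite is instantiated once for each of them.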
There are a number of other utilities in type_list_utilities.hpp. For more details, see the documentation in that file and their associated tests in cudf/cpp/tests/utilities_tests/type_list_tests.cpp.
libcudf provides a number of utilities in include/cudf_test to make common testing operations more convenient. Before creating your own test utilities, look to see if one already exists that does what you need. If not, consider adding a new utility to do what you need. However, make sure that the utility is generic enough to be useful for other tests and is not overly tailored to your specific testing need.
In order to make generating input columns easier, libcudf provides the *_column_wrapper classes in include/cudf_test/column_wrapper.hpp. These classes wrap a cudf::column and provide constructors for initializing a cudf::column object usable with libcudf APIs. Any *_column_wrapper class is implicitly convertible to a column_view or mutable_column_view and therefore may be transparently passed to any API expecting a column_view or mutable_column_view argument.
The cudf::test::fixed_width_column_wrapper class should be used for constructing and initializing columns of any fixed-width element type, e.g., numeric types, timestamp types, Boolean, etc. cudf::test::fixed_width_column_wrapper provides constructors that accept an iterator range to generate each element in the column. For nullable columns, an additional iterator can be provided to indicate the validity of each element. There are also constructors that accept a std::initializer_list<T> for the column elements and optionally for the validity of each element.
Example:
The cudf::test::fixed_point_column_wrapper class should be used for constructing and initializing columns of any fixed-point element type (DECIMAL32 or DECIMAL64). cudf::test::fixed_point_column_wrapper provides constructors that accept an iterator range to generate each element in the column. For nullable columns, an additional iterator can be provided to indicate the validity of each element. Constructors also take the scale of the fixed-point values to create.
Example:
The cudf::test::dictionary_column_wrapper class should be used to create dictionary columns. cudf::test::dictionary_column_wrapper provides constructors that accept an iterator range to generate each element in the column. For nullable columns, an additional iterator can be provided to indicate the validity of each element. There are also constructors that accept a std::initializer_list<T> for the column elements and optionally for the validity of each element.
Example:
The cudf::test::strings_column_wrapper class should be used to create columns of strings. It provides constructors that accept an iterator range to generate each string in the column. For nullable columns, an additional iterator can be provided to indicate the validity of each string. There are also constructors that accept a std::initializer_list<std::string> for the column's strings and optionally for the validity of each element.
Example:
The cudf::test::lists_column_wrapper class should be used to create columns of lists. It provides constructors that accept an iterator range to generate each list in the column. For nullable columns, an additional iterator can be provided to indicate the validity of each list. There are also constructors that accept a std::initializer_list<T> for the column's lists and optionally for the validity of each element. A number of other constructors are available.
Example:
The cudf::test::structs_column_wrapper class should be used to create columns of structs. It provides constructors that accept a vector or initializer list of pre-constructed columns or column wrappers for child columns. For nullable columns, an additional iterator can be provided to indicate the validity of each struct.
Examples:
A common operation in testing is verifying that two columns are equal, or equivalent, or that they have the same metadata.
Verifies that two columns have the same type, size, and nullability. For nested types, recursively verifies the equality of type, size and nullability of all nested children.
Verifies that two columns have equivalent type and equal size, ignoring nullability. For nested types, recursively verifies the equivalence of type, and equality of size of all nested children, ignoring nullability.
Note "equivalent type". Most types are equivalent if and only if they are equal. fixed_point types are one exception. They are equivalent if the representation type is equal, even if they have different scales. Nested type columns can be equivalent in the case where they both have zero size, but one has children (also empty) and the other does not. For columns with nonzero size, both equals and equivalent expect equal number of children.
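The difference between equal and equivalent types for fixed_point can be modelled in a few lines. The sketch below is a deliberately simplified model, not libcudf code: TypeId, DataType, types_equal and types_equivalent are hypothetical names, and it assumes that two decimal columns with the same type id share the same representation type.

```cpp
#include <cstdint>

// Simplified, hypothetical model of a column's data type.
// The scale is only meaningful for the DECIMAL ids.
enum class TypeId { INT32, FLOAT64, DECIMAL32, DECIMAL64 };

struct DataType {
  TypeId id;
  std::int32_t scale;  // ignored unless id is a DECIMAL type
};

constexpr bool is_fixed_point(TypeId id)
{
  return id == TypeId::DECIMAL32 || id == TypeId::DECIMAL64;
}

// Equal: same id and, for fixed-point types, also the same scale.
constexpr bool types_equal(DataType a, DataType b)
{
  return a.id == b.id && (!is_fixed_point(a.id) || a.scale == b.scale);
}

// Equivalent: same id (hence same representation type); scale is not compared.
constexpr bool types_equivalent(DataType a, DataType b) { return a.id == b.id; }

// Two DECIMAL32 types with different scales: equivalent, but not equal.
static_assert(!types_equal({TypeId::DECIMAL32, -2}, {TypeId::DECIMAL32, -3}), "scales differ");
static_assert(types_equivalent({TypeId::DECIMAL32, -2}, {TypeId::DECIMAL32, -3}), "same representation");

// For a non-fixed-point type, equal and equivalent coincide.
static_assert(types_equal({TypeId::INT32, 0}, {TypeId::INT32, 0}), "equal");
static_assert(types_equivalent({TypeId::INT32, 0}, {TypeId::INT32, 0}), "equivalent");
```

A test that cares about the scale of a decimal result should therefore use the equality check, while a test that only cares about the stored representation can use the equivalence check.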
Verifies that two columns have equal properties and verifies elementwise equality of the column data. Null elements are treated as equal.
Verifies that two columns have equivalent properties and verifies elementwise equivalence of the column data. Null elements are treated as equivalent.
Verifies the bitwise equality of two device memory buffers.
Column comparison functions in the cudf::test::detail namespace should NOT be used directly.
The <cudf_test/debug_utilities.hpp> header defines various functions and overloads for printing columns (print), converting column data to string (to_string, to_strings), and copying data to the host (to_host). For example, to print the contents of a cudf::column_view or a column_wrapper instance to the console, use cudf::test::print():
Fixed-width and strings columns output as comma-separated entries including null rows. Nested columns are also supported and output includes the offsets and data children as well as the null mask bits.
libcudf employs a custom-built preload library to validate its internal stream usage (the code may be found here). This library wraps every asynchronous CUDA runtime API call that accepts a stream with a check to ensure that the passed CUDA stream is a valid one, immediately throwing an exception if an invalid stream is detected. Running tests with this library loaded immediately triggers errors if any test accidentally runs code on an invalid stream.
Stream validity is determined by overloading the definition of libcudf's default stream. Normally, in libcudf cudf::get_default_stream returns one of rmm's default stream values (depending on whether or not libcudf is compiled with per thread default stream enabled). In the preload library, this function is redefined to instead return a new user-created stream managed using a function-local static rmm::cuda_stream. An invalid stream in this situation is defined as any of CUDA's default stream values (cudaStreamLegacy, cudaStreamDefault, or cudaStreamPerThread), since any kernel that properly uses cudf::get_default_stream will now instead be using the custom stream created by the preload library.
The preload library supports two different modes, cudf mode and testing mode. The previous paragraph describes the behavior of cudf mode, where cudf::get_default_stream is overloaded. In cudf mode, the preload library ensures that all CUDA runtime APIs are being provided cudf's default stream. This will detect oversights where, for example, a Thrust call has no stream specified, or when one of CUDA's default stream values is explicitly specified to a kernel. However, it will not detect cases where a stream is not correctly forwarded down the call stack, for instance if some detail function that accepts a stream parameter fails to forward it along and erroneously calls cudf::get_default_stream instead.
In testing mode, the library instead overloads cudf::test::get_default_stream. This function, defined in the cudf::test namespace, enables a more stringent mode of testing. In testing mode, the preload library verifies that all CUDA runtime APIs are called using the test namespace's default stream. This distinction is important because cudf internals never use cudf::test::get_default_stream, so this stream value can only appear internally if it was provided to a public API and forwarded properly all the way down the call stack. While testing mode is more strict than cudf mode, it is also more intrusive. cudf mode can operate with no changes to the library or the tests because the preload library overwrites the relevant APIs in place. testing mode, however, can only be used to validate tests that are correctly passing cudf::test::get_default_stream to public libcudf APIs.
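The difference between the two modes can be summarized with a small model. Everything in the sketch below is hypothetical and exists only to illustrate the reasoning above: it uses no CUDA, rmm, or libcudf APIs, it returns a bool where the real preload library throws an exception, and it assumes that in cudf mode cudf::test::get_default_stream simply yields cudf's default stream.

```cpp
// Hypothetical model: a stream is either one of CUDA's default streams
// or the custom stream created by the preload library.
enum class Stream { cuda_default, preload_custom };
enum class Mode { cudf, testing };

// Models cudf::get_default_stream: replaced only in cudf mode.
constexpr Stream lib_default_stream(Mode m)
{
  return m == Mode::cudf ? Stream::preload_custom : Stream::cuda_default;
}

// Models cudf::test::get_default_stream: replaced in testing mode.
// Assumption: otherwise it falls back to the library default stream.
constexpr Stream test_default_stream(Mode m)
{
  return m == Mode::testing ? Stream::preload_custom : lib_default_stream(m);
}

// Models the check performed on every wrapped CUDA runtime call.
constexpr bool is_valid(Stream s) { return s == Stream::preload_custom; }

// Three kinds of internal code, each returning the stream that finally
// reaches the CUDA runtime.
constexpr Stream forwards_correctly(Mode, Stream passed) { return passed; }
constexpr Stream ignores_argument(Mode m, Stream) { return lib_default_stream(m); }
constexpr Stream omits_stream(Mode, Stream) { return Stream::cuda_default; }

// A test passes the test namespace's default stream to the public API.
constexpr bool passes(Mode m, Stream (*internal)(Mode, Stream))
{
  return is_valid(internal(m, test_default_stream(m)));
}

// cudf mode catches a missing stream but misses the forwarding bug.
static_assert(passes(Mode::cudf, forwards_correctly), "ok");
static_assert(passes(Mode::cudf, ignores_argument), "bug goes undetected");
static_assert(!passes(Mode::cudf, omits_stream), "detected");

// testing mode catches both mistakes.
static_assert(passes(Mode::testing, forwards_correctly), "ok");
static_assert(!passes(Mode::testing, ignores_argument), "detected");
static_assert(!passes(Mode::testing, omits_stream), "detected");
```

The only row that differs between the modes is the function that ignores its stream argument, which is exactly the forwarding mistake that testing mode is designed to expose.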
In addition to the preload library, the test suite also implements a custom memory resource that performs analogous stream verification when its do_allocate method is called. During testing, rmm's default memory resource is set to use this adaptor for additional stream validation.
When writing tests for a libcudf API, a special set of additional tests should be added to validate the API's stream usage. These tests should be placed in the cpp/tests/streams directory in a file corresponding to the header containing the tested APIs, e.g. cpp/tests/streams/copying_test.cpp for all APIs declared in cpp/include/cudf/copying.hpp. These tests should contain a minimal invocation of the tested API with no additional assertions since they are solely designed to check stream usage. When adding these tests to cpp/tests/CMakeLists.txt, the ConfigureTest CMake function should be provided the arguments STREAM_MODE testing. This change is sufficient for CTest to set up the test to automatically load the preload library compiled in testing mode when running the test.
The rest of the test suite is configured to run with the preload library in cudf mode. As a result, all test runs with ctest will always include stream validation. Since this configuration is managed via CMake and CTest, direct execution of the test executables will not use the preload library at all. Tests will still run and pass normally in this situation, with the exception of the test of the preload library itself.