Testing cuDF#

Tooling#

Tests in cuDF are written using pytest. Test coverage is measured using coverage.py, specifically the pytest-cov plugin. Code coverage reports are uploaded to Codecov. Each PR also indicates whether it increases or decreases test coverage.

Test organization#

How tests are organized depends on which of the following two groups they fall into:

  1. Free functions such as cudf.merge that operate on classes like DataFrame or Series.

  2. Methods of the above classes.

Tests of free functions should be grouped into files based on the API sections in the documentation. This places tests of similar functionality in the same module. Tests of class methods should be organized in the same way, except that this organization should be within a subdirectory corresponding to the class. For instance, tests of DataFrame indexing should be placed into dataframe/test_indexing.py. In cases where tests may be shared by multiple classes sharing a common parent (e.g. DataFrame and Series both require IndexedFrame tests), the tests may be placed in a directory corresponding to the parent class.

Test contents#

Writing tests#

In general, functionality must be tested for both standard and exceptional cases. Standard use cases may be covered using parametrization (using pytest.mark.parametrize). Tests of standard use cases should typically include some coverage of:

  • Different dtypes, including nested dtypes (especially strings)

  • Mixed objects, e.g. binary operations between DataFrame and Series

  • Operations on scalars

  • Verifying all combinations of parameters for complex APIs like cudf.merge.

Here are some of the most common exceptional cases to test:

  1. Series/DataFrame/Index with zero rows

  2. DataFrame with zero columns

  3. All null data

  4. For string or list APIs, empty strings/lists

  5. For list APIs, lists containing all null elements or empty strings

  6. For numeric data:

  7. All 0s.

  8. All 1s.

  9. Containing/all inf

  10. Containing/all nan

  11. INT${PRECISION}_MAX for a given precision (e.g. 2**32 for int32).

Most specific APIs will also include a range of other cases.

In general, it is preferable to write separate tests for different exceptional cases. Excessive parametrization and branching increases complexity and obfuscates the purpose of a test. Typically, exception cases require specific assertions or other special logic, so they are best kept separate. The main exception to this rule is tests based on comparison to pandas. Such tests may test exceptional cases alongside more typical cases since the logic is generally identical.

Parametrization: custom fixtures and pytest.mark.parametrize#

When it comes to parametrizing tests written with pytest, the two main options are fixtures and mark.parametrize. By virtue of being functions, fixtures are both more verbose and more self-documenting. Fixtures also have the significant benefit of being constructed lazily, whereas parametrizations are constructed at test collection time.

In general, these approaches are applicable to parametrizations of different complexity. For the purpose of this discussion, we define a parametrization as “simple” if it is composed of a list (possibly nested) of primitive objects. Examples include a list of integers or a list of list of strings. This does not include e.g. cuDF or pandas objects. In particular, developers should avoid performing GPU memory allocations during test collection.

With that in mind, here are some ground rules for how to parametrize.

Use pytest.mark.parametrize when:

  • One test must be run on many inputs and those inputs are simple to construct.

Use fixtures when:

  • One or more tests must be run on the same set of inputs, and all of those inputs can be constructed with simple parametrizations. In practice, that means that it is acceptable to use a fixture like this:

        @pytest.fixture(params=["a", "b"])
        def foo(request):
            if request.param == "a":
                # Some complex initialization
            elif request.param == "b":
                # Some other complex initialization
    

    In other words, the construction of the fixture may be complex, as long as the parametrization of that construction is simple.

  • One or more tests must be run on the same set of inputs, and at least one of those inputs requires complex parametrizations. In this case, the parametrization of a fixture should be decomposed by using fixtures that depend on other fixtures.

        @pytest.fixture(params=["a", "b"])
        def foo(request):
            if request.param == "a":
                # Some complex initialization
            elif request.param == "b":
                # Some other complex initialization
    
        @pytest.fixture
        def bar(foo):
           # do something with foo like initialize a cudf object.
    
        def test_some_property(bar):
            # will be run for each value of bar that results from each value of foo.
            assert some_property_of(bar)
    

Complex parametrizations#

The lists above document common use cases. However, more complex cases may arise. One of the most common alternatives is where, given a set of test cases, different tests need to run on different subsets with a nonempty intersection. Fixtures and parametrization are only capable of handling the Cartesian product of parameters, i.e. “run this test for all values of a and all values of b”.

There are multiple potential solutions to this problem. One possibility is to encapsulate common test logic in a helper function, then call it from multiple test_* functions that construct the necessary inputs. Another possibility is to use functions rather than fixtures to construct inputs, allowing for more flexible input construction:

def get_values(predicate):
    values = range(10)
    yield from filter(predicate, values)

def test_evens():
    for v in get_values(lambda x: x % 2 == 0):
        # Execute test

def test_odds():
    for v in get_values(lambda x: x % 2 == 1):
        # Execute test

Other approaches are also possible, and the best solution should be discussed on a case-by-case basis during PR review.

Tests with expected failures (xfails)#

In some circumstances it makes sense to mark a test as expected to fail, perhaps because the functionality is not yet implemented in cuDF. To do so use the pytest.mark.xfail fixture on the test.

If the test is parametrized and only a single parameter is expected to fail, rather than marking the entire test as xfailing, mark the single parameter by creating a pytest.param with appropriate marks.

@pytest.mark.parametrize(
    "value",
    [
        1,
        2,
        pytest.param(
            3, marks=pytest.mark.xfail(reason="code doesn't work for 3")
        ),
    ],
)
def test_value(value):
    assert value < 3

When marking an xfailing test, provide a descriptive reason. This should include a link to an issue describing the problem so that progress towards fixing the problem can be tracked. If no such issue exists already, create one!

Conditional xfails#

Sometimes, a parametrized test is only expected to fail for some combination of its parameters. Say, for example, division by zero but only if the datatype is bool. If all combinations with a given parameter are expected to fail, one can mark the parameter with pytest.mark.xfail, indicating a reason for the expected failure. If only some of the combinations are expected to fail, it can be tempting to mark the parameter as xfail, but this should be avoided. A test marked as xfail that passes is an “unexpected pass” or XPASS which is considered a failure by the test suite since we use the pytest option xfail_strict=true. Another option is to use the programmatic pytest.xfail function to fail in the test body to xfail the relevant combination of parameters. DO NOT USE THIS OPTION. Unlike the mark-based approach, pytest.xfail does not run the rest of the test body, so we will never know if the test starts to pass because the bug is fixed. Use of pytest.xfail is checked for, and forbidden, via a pre-commit hook.

Instead, to handle this (hopefully rare) case, we can programmatically mark a test as expected to fail under a combination of conditions by applying the pytest.mark.xfail mark to the current test request. To achieve this, the test function should take an extra parameter named request, on which we call applymarker:

@pytest.mark.parametrize("v1", [1, 2, 3])
@pytest.mark.parametrize("v2", [1, 2, 3])
def test_sum_lt_6(request, v1, v2):
    request.applymarker(
        pytest.mark.xfail(
            condition=(v1 == 3 and v2 == 3),
            reason="Add comment linking to relevant issue",
        )
    )
    assert v1 + v2 < 6

This way, when the bug is fixed, the test suite will fail at this point (and we will remember to update the test).

Testing code that throws warnings#

Some code may be expected to throw warnings. A common example is when a cudf API is deprecated for future removal, but many other possibilities exist as well. The cudf testing suite surfaces all warnings as errors. This includes warnings raised from non-cudf code, such as calls to pandas or pyarrow. This setting forces developers to proactively deal with deprecations from other libraries, as well as preventing the internal use of deprecated cudf APIs in other parts of the library. Just as importantly, it can help catch real errors like integer overflow or division by zero.

When testing code that is expected to throw a warnings, developers should use the pytest.warns context to catch the warning. For parametrized tests that raise warnings under specific conditions, use the testing._utils.expect_warning_if decorator instead of pytest.warns.

Warning

warnings.catch_warnings is a tempting alternative to pytest.warns. Do not use this context manager in tests. Unlike pytest.warns, which requires that the expected warning be raised, warnings.catch_warnings simply catches warnings that appear without requiring them. The cudf testing suite should avoid such ambiguities.

Testing utility functions#

The cudf.testing subpackage provides a handful of utilities for testing the equality of objects. The internal cudf.testing._utils module provides additional helper functions for use in tests. In particular:

  • testing._utils.assert_eq is the biggest hammer to reach for. It can be used to compare any pair of objects.

  • For comparing specific objects, use testing.testing.assert_[frame|series|index]_equal.

  • For verifying that the expected assertions are raised, use testing._utils.assert_exceptions_equal.

Version testing#

It is recommended to have cudf pytests only work on the latest supported pandas version i.e., PANDAS_CURRENT_SUPPORTED_VERSION. Any anticipated failures should be either skipped or xfailed.

For example:

@pytest.mark.skipif(PANDAS_VERSION < PANDAS_CURRENT_SUPPORTED_VERSION, reason="bug in older version of pandas")
def test_bug_from_older_pandas_versions(...):
    ...

@pytest.mark.xfail(PANDAS_VERSION >= PANDAS_CURRENT_SUPPORTED_VERSION, reason="bug in latest version of pandas")
def test_bug_in_current_and_maybe_future_versions(...):
    ...

If pandas makes a bugfix release and fixes this, then we’ll see it in CI immediately, patch it, and bump PANDAS_CURRENT_SUPPORTED_VERSION which also usually happens during pandas upgrades.