Contributing Guide
This document focuses on a high-level overview of best practices in cuDF.
Directory structure and file naming
cuDF generally presents the same importable modules and subpackages as pandas.
All Cython code is contained in python/cudf/cudf/_lib.
Code style
cuDF employs a number of linters through pre-commit to ensure consistent style across the code base.
All of these linting checks must pass when submitting a pull request.
The .pre-commit-config.yaml file at the root of the repo contains the configuration for every linting tool.
Linter configurations are primarily stored in a pyproject.toml that is shared among the other Python projects in the repo, and are extended with cudf-specific configurations in python/cudf/pyproject.toml.
For more information on how to use pre-commit hooks, see the code formatting section of the overall contributing guide.
Deprecating and removing code
cuDF follows the policy of deprecating code for one release prior to removal.
For example, if we decide to remove an API during the 22.08 release cycle,
it will be marked as deprecated in the 22.08 release and removed in the 22.10 release.
Note that if a comment explicitly sets a condition for removal (such as # Do not remove until..),
do not enforce the deprecation by removing the affected code until that condition is met.
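The one-release rule can be expressed as simple version arithmetic. The sketch below uses a hypothetical helper, removal_release (not part of cuDF), and assumes YY.MM release versions with a new release every two months, which is consistent with the 22.08 to 22.10 example above. Treat it as an illustration of the policy, not as a tool the project provides.

```python
def removal_release(deprecated_in: str) -> str:
    """Return the earliest release in which an API deprecated in `deprecated_in`
    may be removed.

    Hypothetical helper. Assumes "YY.MM" version strings and a two-month
    release cadence, matching the 22.08 -> 22.10 example.
    """
    year, month = (int(part) for part in deprecated_in.split("."))
    month += 2  # one release later under the assumed two-month cadence
    if month > 12:  # roll over into the next year
        year, month = year + 1, month - 12
    return f"{year:02d}.{month:02d}"


print(removal_release("22.08"))  # 22.10
print(removal_release("22.12"))  # 23.02
```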
When implementing a deprecation:
Remove and replace all internal usage of the deprecated APIs in cuDF
Update the documentation with a Sphinx deprecated directive (.. deprecated::) describing the deprecation, as in the docstring of the mock example below
Use warnings.warn with a FutureWarning and a message describing the deprecation. The deprecation message should:
    Consist of a single line with no newline characters
    Indicate the replacement API(s), if any
    NOT specify a future version in which the deprecation will be enforced (that is, when the API will be removed)
Add a unit test that validates that the warning is raised
A mock example of a deprecation:
import warnings


class Series:
    def foo(self):
        """
        Return a result from foo.

        .. deprecated:: 23.08
            `foo` is deprecated and will be removed in a future version of cudf.
        """
        # Single-line FutureWarning that names the replacement API but not
        # the release in which `foo` will be removed.
        warnings.warn(
            "`Series.foo` is deprecated and will be removed in a future version of cudf. "
            "Use `Series.new_foo` instead.",
            FutureWarning,
        )
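A unit test for the deprecation could check each requirement on the warning message. The sketch below is a minimal, stdlib-only version built on warnings.catch_warnings(record=True); the Series class and its placeholder return value are hypothetical stand-ins for the mock example. cuDF's test suite uses pytest, where the usual equivalent is a `with pytest.warns(FutureWarning, match=...)` block.

```python
import warnings


class Series:
    """Hypothetical stand-in for cudf.Series with the deprecated `foo`."""

    def foo(self):
        warnings.warn(
            "`Series.foo` is deprecated and will be removed in a future version of cudf. "
            "Use `Series.new_foo` instead.",
            FutureWarning,
        )
        return "foo result"  # placeholder return value for the sketch


def test_series_foo_deprecated():
    # Record every warning raised in the block; "always" ensures the warning
    # is not hidden by filters or by an earlier identical warning.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = Series().foo()

    future_warnings = [w for w in caught if issubclass(w.category, FutureWarning)]
    assert len(future_warnings) == 1
    message = str(future_warnings[0].message)
    assert "\n" not in message  # single line
    assert "Series.new_foo" in message  # names the replacement API
    assert not any(ch.isdigit() for ch in message)  # crude check: no version number
    # The deprecated API keeps working until the deprecation is enforced.
    assert result == "foo result"
```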
When enforcing a deprecation:
Remove the API implementation
Remove the associated tests in python/cudf/cudf/tests
Remove references in the documentation in docs/cudf
pandas compatibility
cuDF API signatures and behaviors should align with the pandas API. While cuDF may support a range of pandas versions, API signatures and behaviors should always align with the latest supported pandas version.
Occasionally, cuDF APIs may deviate from pandas behavior. Common reasons include:
Performance: Matching pandas behavior would incur exorbitant runtime or memory costs. Deviations due to performance should be agreed upon by cuDF developers.
Data type representations: cuDF does not support the full type system of pandas and vice versa; the differences are most commonly encountered with the object dtype or with nested types.
Exception messages: The exception type raised in cuDF should match pandas, but the error messages do not need to match exactly.
Warnings: cuDF should generally raise the same warnings as the pandas APIs it mirrors, but some warnings might not apply because of intentional differences between the two libraries.
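The rule for exceptions (same type, message free to differ) can be checked with a small comparison helper. The sketch below uses a hypothetical helper, assert_same_exception_type, which is not part of cuDF. In a real test one callable would invoke the pandas API and the other the cuDF API; plain built-ins stand in here so the sketch runs without either library.

```python
def assert_same_exception_type(lhs_func, rhs_func):
    """Hypothetical helper: both callables must raise the same exception
    type, while their messages are allowed to differ."""

    def raised_type(func):
        try:
            func()
        except Exception as exc:  # capture whatever the call raises
            return type(exc)
        return None

    lhs_type, rhs_type = raised_type(lhs_func), raised_type(rhs_func)
    assert lhs_type is not None, "lhs did not raise"
    assert lhs_type is rhs_type, f"{lhs_type} != {rhs_type}"


# Both raise ValueError with different messages, so this passes.
assert_same_exception_type(lambda: int("x"), lambda: float("y"))
```

Comparing only the exception type keeps such tests stable even when either library rewords its error messages.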
Intentional deviations should be documented in the pandas comparison.
If it is not possible to match a pandas API at all, whether an entire API or a specific component of one (for example, a particular parameter value), cuDF should raise a NotImplementedError.
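A minimal sketch of both cases, using a hypothetical Series class whose methods foo and bar and parameter how are made up for illustration: the signature mirrors the pandas API, unsupported parameter values raise NotImplementedError, and an API that cannot be matched at all raises unconditionally.

```python
class Series:
    """Hypothetical stand-in for a cuDF object; `foo`, `bar`, and `how` are made up."""

    # The subset of `how` values this sketch can honor.
    _SUPPORTED_HOW = ("left", "right")

    def foo(self, how="left"):
        # Keep the pandas signature so callers can pass the same arguments,
        # but reject values that cannot be supported instead of silently
        # returning a different result.
        if how not in self._SUPPORTED_HOW:
            raise NotImplementedError(f"how={how!r} is not supported")
        return how

    def bar(self, *args, **kwargs):
        # An API that cannot be matched at all raises unconditionally.
        raise NotImplementedError("Series.bar is not supported")
```

Raising early makes the gap explicit to users, rather than letting a partially compatible code path produce results that differ from pandas.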
Catching warnings from dependencies
If a cuDF API triggers a warning from one of its dependencies that cannot reasonably be addressed within the API, use warnings.catch_warnings to suppress the warning so that it does not reach users.
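A minimal sketch of this pattern follows. The functions _dependency_call and cudf_api, the warning text, and the DeprecationWarning category are all hypothetical. The filter is narrowed by message and category so that unrelated warnings still surface, and catch_warnings restores the previous warning filters when the block exits.

```python
import warnings


def _dependency_call():
    # Hypothetical stand-in for a dependency function that emits a warning
    # cuDF cannot reasonably address.
    warnings.warn("some internal detail changed", DeprecationWarning)
    return 42


def cudf_api():
    # Suppress only this specific warning, and only for the duration of the
    # dependency call; the previous filters are restored on exit.
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message="some internal detail changed",
            category=DeprecationWarning,
        )
        return _dependency_call()
```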