C++ API#

Model importer utilities#

auto nvforest::import_from_treelite_model(treelite::Model const &tl_model, tree_layout layout = preferred_tree_layout, index_type align_bytes = index_type{}, std::optional<bool> use_double_precision = std::nullopt, raft_proto::device_type dev_type = raft_proto::device_type::cpu, int device = 0, raft_proto::cuda_stream stream = raft_proto::cuda_stream{})#

Import a Treelite model to nvForest

Load a model from Treelite into an nvForest forest_model. The model will be inspected to determine the correct underlying decision_forest variant to use within the forest_model object.

Parameters:
  • tl_model – The Treelite Model to load

  • layout – The in-memory layout of nodes in the loaded forest

  • align_bytes – If non-zero, ensure that each tree is stored in a multiple of this many bytes by padding with empty nodes. This can be useful for increasing the likelihood that successive reads take place within a single cache line. On GPU, a value of 128 can be used for this purpose. On CPU, a value of either 0 or 64 typically produces optimal performance.

  • use_double_precision – Whether or not to use 64-bit floats for model evaluation and 64-bit integers for applicable indexing

  • dev_type – Which device type to use for inference (CPU or GPU)

  • device – For GPU execution, the device id for the device on which this model is to be loaded

  • stream – The CUDA stream to use for loading this model (can be omitted for CPU).
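As an illustration of the align_bytes padding described above, the following standalone sketch (not part of the nvForest API; the node size is a hypothetical stand-in for whatever the chosen decision_forest specialization uses) computes how many nodes a tree occupies after padding:

```cpp
#include <cstddef>

// Illustrative only (not a library function): number of nodes a tree
// occupies once padded so that its storage is a multiple of align_bytes.
// Assumes align_bytes, when non-zero, is a multiple of node_bytes
// (e.g. 64- or 128-byte alignment with 8- or 16-byte nodes).
inline std::size_t padded_node_count(std::size_t num_nodes,
                                     std::size_t node_bytes,
                                     std::size_t align_bytes) {
  if (align_bytes == 0) { return num_nodes; }  // no padding requested
  std::size_t tree_bytes = num_nodes * node_bytes;
  std::size_t remainder  = tree_bytes % align_bytes;
  if (remainder != 0) {
    tree_bytes += align_bytes - remainder;  // round up to the next multiple
  }
  return tree_bytes / node_bytes;
}
```

With 16-byte nodes and align_bytes = 128, for example, a 5-node tree is padded to 8 nodes (128 bytes), so the next tree begins on a fresh 128-byte boundary.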

auto nvforest::import_from_treelite_handle(TreeliteModelHandle tl_handle, tree_layout layout = preferred_tree_layout, index_type align_bytes = index_type{}, std::optional<bool> use_double_precision = std::nullopt, raft_proto::device_type dev_type = raft_proto::device_type::cpu, int device = 0, raft_proto::cuda_stream stream = raft_proto::cuda_stream{})#

Import a Treelite model handle to nvForest

Load a model from a Treelite model handle (a type-erased treelite::Model object) into an nvForest forest_model. The model will be inspected to determine the correct underlying decision_forest variant to use within the forest_model object.

Parameters:
  • tl_handle – The Treelite ModelHandle to load

  • layout – The in-memory layout of nodes in the loaded forest

  • align_bytes – If non-zero, ensure that each tree is stored in a multiple of this many bytes by padding with empty nodes. This can be useful for increasing the likelihood that successive reads take place within a single cache line. On GPU, a value of 128 can be used for this purpose. On CPU, a value of either 0 or 64 typically produces optimal performance.

  • use_double_precision – Whether or not to use 64-bit floats for model evaluation and 64-bit integers for applicable indexing

  • dev_type – Which device type to use for inference (CPU or GPU)

  • device – For GPU execution, the device id for the device on which this model is to be loaded

  • stream – The CUDA stream to use for loading this model (can be omitted for CPU).

template<tree_layout layout>
struct treelite_importer#

Struct used to import a model from Treelite to nvForest

Template Parameters:

layout – The in-memory layout for nodes to be loaded into nvForest

Public Functions

template<index_type variant_index>
inline auto import_to_specific_variant(index_type target_variant_index, treelite::Model const &tl_model, index_type num_class, index_type num_feature, index_type max_num_categories, std::vector<index_type> const &offsets, index_type align_bytes = index_type{}, raft_proto::device_type mem_type = raft_proto::device_type::cpu, int device = 0, raft_proto::cuda_stream stream = raft_proto::cuda_stream{})#

Assuming that the correct decision_forest variant has been identified, import to that variant

inline forest_model import(treelite::Model const &tl_model, index_type align_bytes = index_type{}, std::optional<bool> use_double_precision = std::nullopt, raft_proto::device_type dev_type = raft_proto::device_type::cpu, int device = 0, raft_proto::cuda_stream stream = raft_proto::cuda_stream{})#

Import a Treelite model to nvForest

Load a model from Treelite into an nvForest forest_model. The model will be inspected to determine the correct underlying decision_forest variant to use within the forest_model object.

Parameters:
  • tl_model – The Treelite Model to load

  • align_bytes – If non-zero, ensure that each tree is stored in a multiple of this many bytes by padding with empty nodes. This can be useful for increasing the likelihood that successive reads take place within a single cache line. On GPU, a value of 128 can be used for this purpose. On CPU, a value of either 0 or 64 typically produces optimal performance.

  • use_double_precision – Whether or not to use 64-bit floats for model evaluation and 64-bit integers for applicable indexing

  • dev_type – Which device type to use for inference (CPU or GPU)

  • device – For GPU execution, the device id for the device on which this model is to be loaded

  • stream – The CUDA stream to use for loading this model (can be omitted for CPU).

Forest classes#

struct forest_model#

A model used for performing inference with nvForest

This struct is a wrapper for all variants of decision_forest supported by a standard nvForest build.

Public Functions

inline forest_model(decision_forest_variant &&forest = decision_forest_variant{})#

Wrap a decision_forest in a full forest_model object

inline auto num_features()#

The number of features per row expected by the model

inline auto num_outputs()#

The number of outputs per row generated by the model

inline auto num_trees()#

The number of trees in the model

inline auto has_vector_leaves()#

Whether or not leaf nodes use vector outputs

inline auto row_postprocessing()#

The operation used for postprocessing all outputs for a single row

inline void set_row_postprocessing(row_op val)#

Setter for row_postprocessing()

inline auto elem_postprocessing()#

The operation used for postprocessing each element of the output for a single row

inline auto memory_type()#

The type of memory (device/host) where the model is stored

inline auto device_index()#

The ID of the device on which this model is loaded

inline auto is_double_precision()#

Whether or not model is loaded at double precision

template<typename io_t>
inline void predict(raft_proto::buffer<io_t> &output, raft_proto::buffer<io_t> const &input, raft_proto::cuda_stream stream = raft_proto::cuda_stream{}, infer_kind predict_type = infer_kind::default_kind, std::optional<index_type> specified_chunk_size = std::nullopt)#

Perform inference on given input

Parameters:
  • output[out] – The buffer where model output should be stored. This must be of size at least ROWS x num_outputs().

  • input[in] – The buffer containing input data.

  • stream[in] – A raft_proto::cuda_stream, which (on GPU-enabled builds) is a transparent wrapper for the cudaStream_t or (on CPU-only builds) a CUDA-free placeholder object.

  • predict_type[in] – The type of inference to perform. Defaults to summing the outputs of all trees to produce one output per row. If set to “per_tree”, the individual output of every tree is returned instead. If set to “leaf_id”, the integer ID of the leaf node reached in each tree is returned.

  • specified_chunk_size[in] – The mini-batch size for processing. This has different meanings on CPU and GPU, but on GPU it corresponds to the number of rows evaluated per inference iteration on a single block. It can take on any power of 2 from 1 to 32, and runtime performance is quite sensitive to the value chosen. In general, larger batches benefit from higher values, but the optimal value is hard to predict a priori. If omitted, a heuristic will be used to select a reasonable value. On CPU, this argument can generally just be omitted.
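The constraint on specified_chunk_size (any power of 2 from 1 to 32) can be expressed as a small standalone check; is_valid_chunk_size below is illustrative only, not part of the nvForest API:

```cpp
#include <cstdint>

// Illustrative check (not a library function): on GPU, valid values for
// specified_chunk_size are the powers of 2 from 1 through 32.
inline bool is_valid_chunk_size(std::uint32_t chunk) {
  bool is_power_of_two = chunk != 0 && (chunk & (chunk - 1)) == 0;
  return is_power_of_two && chunk <= 32;
}
```

Because the optimum is hard to predict a priori, a simple approach is to benchmark a realistic batch at each of 1, 2, 4, 8, 16, and 32 and keep the fastest value.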

template<typename io_t>
inline void predict(raft_proto::handle_t const &handle, raft_proto::buffer<io_t> &output, raft_proto::buffer<io_t> const &input, infer_kind predict_type = infer_kind::default_kind, std::optional<index_type> specified_chunk_size = std::nullopt)#

Perform inference on given input

Parameters:
  • handle[in] – The raft_proto::handle_t (a wrapper for raft::handle_t on GPU) which will be used to provide streams for evaluation.

  • output[out] – The buffer where model output should be stored. If this buffer is on host while the model is on device (or vice versa), work will be distributed across available streams to copy the data back to this output location. This must be of size at least ROWS x num_outputs().

  • input[in] – The buffer containing input data. If this buffer is on host while the model is on device (or vice versa), work will be distributed across available streams to copy the input data to the appropriate location and perform inference.

  • predict_type[in] – The type of inference to perform. Defaults to summing the outputs of all trees to produce one output per row. If set to “per_tree”, the individual output of every tree is returned instead. If set to “leaf_id”, the integer ID of the leaf node reached in each tree is returned.

  • specified_chunk_size[in] – The mini-batch size for processing. This has different meanings on CPU and GPU, but on GPU it corresponds to the number of rows evaluated per inference iteration on a single block. It can take on any power of 2 from 1 to 32, and runtime performance is quite sensitive to the value chosen. In general, larger batches benefit from higher values, but the optimal value is hard to predict a priori. If omitted, a heuristic will be used to select a reasonable value. On CPU, this argument can generally just be omitted.

template<typename io_t>
inline void predict(raft_proto::handle_t const &handle, io_t *output, io_t *input, std::size_t num_rows, raft_proto::device_type out_mem_type, raft_proto::device_type in_mem_type, infer_kind predict_type = infer_kind::default_kind, std::optional<index_type> specified_chunk_size = std::nullopt)#

Perform inference on given input

Parameters:
  • handle[in] – The raft_proto::handle_t (a wrapper for raft::handle_t on GPU) which will be used to provide streams for evaluation.

  • output[out] – Pointer to the memory location where the output should be stored

  • input[in] – Pointer to the input data

  • num_rows[in] – The number of rows in the input

  • out_mem_type[in] – The memory type (device/host) of the output buffer

  • in_mem_type[in] – The memory type (device/host) of the input buffer

  • predict_type[in] – The type of inference to perform. Defaults to summing the outputs of all trees to produce one output per row. If set to “per_tree”, the individual output of every tree is returned instead. If set to “leaf_id”, the integer ID of the leaf node reached in each tree is returned.

  • specified_chunk_size[in] – The mini-batch size for processing. This has different meanings on CPU and GPU, but on GPU it corresponds to the number of rows evaluated per inference iteration on a single block. It can take on any power of 2 from 1 to 32, and runtime performance is quite sensitive to the value chosen. In general, larger batches benefit from higher values, but the optimal value is hard to predict a priori. If omitted, a heuristic will be used to select a reasonable value. On CPU, this argument can generally just be omitted.

template<tree_layout layout_v, typename threshold_t, typename index_t, typename metadata_storage_t, typename offset_t>
struct decision_forest#

A general-purpose decision forest implementation

This template provides an optimized but generic implementation of a decision forest. Template parameters are used to specialize the implementation based on the size and characteristics of the forest. For instance, the smallest integer that can express the offset between a parent and child node within a tree is used in order to minimize the size of a node, increasing the number that can fit within the L2 or L1 cache.

Template Parameters:
  • layout_v – The in-memory layout of nodes in this forest

  • threshold_t – The floating-point type used for quantitative splits

  • index_t – The integer type used for storing many things within a forest, including the category value of categorical nodes and the index at which vector output for a leaf node is stored.

  • metadata_storage_t – The type used for storing node metadata. The first several bits will be used to store flags indicating various characteristics of the node, and the remaining bits provide the integer index of the feature for this node’s split

  • offset_t – An integer used to indicate the offset between a node and its most distant child. This type must be large enough to store the largest such offset in the entire forest.
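The size-minimization idea described for offset_t can be sketched as follows. The real library selects these types through its specialization_types machinery; min_offset_bytes is a hypothetical helper that merely reports which standard fixed-width type would suffice:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch (not a library function): report the narrowest
// unsigned integer width able to hold the largest parent-to-child offset
// in the forest, which in turn minimizes the size of each node.
inline std::size_t min_offset_bytes(std::uint64_t max_offset) {
  if (max_offset <= UINT8_MAX)  { return 1; }  // uint8_t suffices
  if (max_offset <= UINT16_MAX) { return 2; }  // uint16_t
  if (max_offset <= UINT32_MAX) { return 4; }  // uint32_t
  return 8;                                    // fall back to uint64_t
}
```

Smaller offset types mean smaller nodes, so more of the forest fits within the L1 or L2 cache, as noted above.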

Public Types

using forest_type = forest<layout, threshold_t, index_t, metadata_storage_t, offset_t>#

The type of the forest object which is actually passed to the CPU/GPU for inference

using node_type = typename forest_type::node_type#

The type of nodes within the forest

using io_type = typename forest_type::io_type#

The type used for input and output to the model

using threshold_type = threshold_t#

The type used for quantitative splits within the model

using postprocessor_type = postprocessor<io_type>#

The type used to indicate how leaf output should be post-processed

using categorical_storage_type = typename node_type::index_type#

The type used for storing data on categorical nodes

Public Functions

inline decision_forest()#

Construct an empty decision forest

inline decision_forest(raft_proto::buffer<node_type> &&nodes, raft_proto::buffer<index_type> &&root_node_indexes, raft_proto::buffer<index_type> &&node_id_mapping, raft_proto::buffer<io_type> &&bias, index_type num_features, index_type num_outputs = index_type{2}, bool has_categorical_nodes = false, std::optional<raft_proto::buffer<io_type>> &&vector_output = std::nullopt, std::optional<raft_proto::buffer<typename node_type::index_type>> &&categorical_storage = std::nullopt, index_type leaf_size = index_type{1}, row_op row_postproc = row_op::disable, element_op elem_postproc = element_op::disable, io_type average_factor = io_type{1}, io_type postproc_constant = io_type{1})#

Construct a decision forest with the indicated data

Parameters:
  • nodes – A buffer containing all nodes within the forest

  • root_node_indexes – A buffer containing the index of the root node of every tree in the forest

  • node_id_mapping – Mapping to use to convert nvForest’s internal node ID into Treelite’s node ID. Only relevant when predict_type == infer_kind::leaf_id

  • bias – The bias term that is added to the output as part of the postprocessing step. The bias term should have the same length as num_outputs.

  • num_features – The number of features per input sample for this model

  • num_outputs – The number of outputs per row from this model

  • has_categorical_nodes – Whether this forest contains any categorical nodes

  • vector_output – A buffer containing the output from all vector leaves for this model. Each leaf node will specify the offset within this buffer at which its vector output begins, and leaf_size will be used to determine how many subsequent entries from the buffer should be used to construct the vector output. A value of std::nullopt indicates that this is not a vector leaf model.

  • categorical_storage – For models whose categorical features have too many categories to be stored in the bits of an index_t, it may be necessary to store categorical information outside the node itself. This buffer provides the necessary storage for that information.

  • leaf_size – The number of output values per leaf (1 for non-vector leaves; >1 for vector leaves)

  • row_postproc – The post-processing operation to be applied to an entire row of the model output

  • elem_postproc – The per-element post-processing operation to be applied to the model output

  • average_factor – A factor which is used for output normalization

  • postproc_constant – A constant used by some post-processing operations, including sigmoid, exponential, and logarithm_one_plus_exp
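The relationship between vector_output, a leaf's stored offset, and leaf_size can be sketched as follows. leaf_vector is a hypothetical illustration of the lookup described above, not a library function:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration: a leaf node stores an offset into the shared
// vector_output buffer, and the leaf_size consecutive entries starting at
// that offset form the leaf's vector output.
inline std::vector<float> leaf_vector(std::vector<float> const& vector_output,
                                      std::size_t leaf_offset,
                                      std::size_t leaf_size) {
  auto first = vector_output.begin() + static_cast<std::ptrdiff_t>(leaf_offset);
  auto last  = first + static_cast<std::ptrdiff_t>(leaf_size);
  return std::vector<float>(first, last);
}
```

For example, with leaf_size = 2, a leaf storing offset 2 reads the third and fourth entries of the shared buffer as its output vector.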

inline auto num_features() const#

The number of features per row expected by the model

inline auto num_trees() const#

The number of trees in the model

inline auto has_vector_leaves() const#

Whether or not leaf nodes have vector outputs

inline auto num_outputs(infer_kind inference_kind = infer_kind::default_kind) const#

The number of outputs per row generated by the model for the given type of inference. Note: this will differ from the num_outputs argument passed to the constructor if inference_kind is not default_kind.

inline auto row_postprocessing() const#

The operation used for postprocessing all outputs for a single row

inline auto elem_postprocessing() const#

The operation used for postprocessing each element of the output for a single row

inline auto memory_type()#

The type of memory (device/host) where the model is stored

inline auto device_index()#

The ID of the device on which this model is loaded

inline void predict(raft_proto::buffer<typename forest_type::io_type> &output, raft_proto::buffer<typename forest_type::io_type> const &input, raft_proto::cuda_stream stream = raft_proto::cuda_stream{}, infer_kind predict_type = infer_kind::default_kind, std::optional<index_type> specified_rows_per_block_iter = std::nullopt)#

Perform inference with this model

Parameters:
  • output[out] – The buffer where the model output should be stored. This must be of size ROWS x num_outputs().

  • input[in] – The buffer containing the input data

  • stream[in] – For GPU execution, the CUDA stream. For CPU execution, this optional parameter can be safely omitted.

  • predict_type[in] – The type of inference to perform. Defaults to summing the outputs of all trees to produce one output per row. If set to “per_tree”, the individual output of every tree is returned instead. If set to “leaf_id”, the integer ID of the leaf node reached in each tree is returned.

  • specified_rows_per_block_iter[in] – If non-nullopt, this value determines how many rows are evaluated in each inference iteration within a CUDA block. Runtime performance is quite sensitive to this value, but it is difficult to predict a priori, so it is recommended to search over possible values with realistic batch sizes in order to determine the optimal value. Any power of 2 from 1 to 32 is a valid value, and in general larger batches benefit from larger values.

Public Static Attributes

static auto constexpr const layout = layout_v#

The in-memory layout of nodes in this forest

Enums and constants#

static auto constexpr const nvforest::preferred_tree_layout = tree_layout::breadth_first#

The default memory layout for nvForest trees if not otherwise specified

enum class nvforest::tree_layout : unsigned char#

Enum representing possible memory layout for nvForest trees

Values:

enumerator depth_first#
enumerator breadth_first#
enumerator layered_children_together#
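To illustrate how the first two layouts differ, the following standalone sketch (not part of the nvForest API) takes a complete binary tree stored in breadth-first (heap) order and lists the same nodes in depth-first pre-order:

```cpp
#include <vector>

// Illustrative only: given a complete binary tree stored in breadth-first
// (heap) order 0..n-1, append its nodes to `out` in depth-first pre-order.
// The same tree is stored; only the node ordering in memory changes.
inline void dfs_order(std::vector<int>& out, int node, int n) {
  if (node >= n) { return; }
  out.push_back(node);              // visit parent before children
  dfs_order(out, 2 * node + 1, n);  // left child in heap indexing
  dfs_order(out, 2 * node + 2, n);  // right child
}
```

For a 7-node complete tree, breadth-first order is simply 0, 1, 2, 3, 4, 5, 6, while depth-first order is 0, 1, 3, 4, 2, 5, 6.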

enum class nvforest::infer_kind : unsigned char#

Enum representing distinct prediction tasks

Values:

enumerator default_kind#
enumerator per_tree#
enumerator leaf_id#

enum class nvforest::row_op : unsigned char#

Enum representing possible row-wise operations on output

Values:

enumerator disable#
enumerator softmax#
enumerator max_index#

enum class nvforest::element_op : unsigned char#

Enum representing possible element-wise operations on output

Values:

enumerator disable#
enumerator signed_square#
enumerator hinge#
enumerator sigmoid#
enumerator exponential#
enumerator logarithm_one_plus_exp#
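For orientation, these enumerators name familiar element-wise functions. The sketches below give the plain textbook forms only; the library's actual implementations may additionally involve postproc_constant, and the reading of hinge here (a 0/1 step at zero) is an assumption rather than confirmed API behavior:

```cpp
#include <cmath>

// Textbook forms of the element-wise operations these enumerators suggest.
// Not the library's implementations; constants such as postproc_constant
// are deliberately omitted, and hinge_op's semantics are assumed.
inline double sigmoid_op(double x)       { return 1.0 / (1.0 + std::exp(-x)); }
inline double signed_square_op(double x) { return (x < 0.0 ? -1.0 : 1.0) * x * x; }
inline double hinge_op(double x)         { return x > 0.0 ? 1.0 : 0.0; }  // assumed 0/1 step
inline double log1p_exp_op(double x)     { return std::log1p(std::exp(x)); }  // logarithm_one_plus_exp
```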

Type aliases#

using nvforest::decision_forest_variant = std::variant<detail::preset_decision_forest<std::variant_alternative_t<0, detail::specialization_variant>::layout, std::variant_alternative_t<0, detail::specialization_variant>::is_double_precision, std::variant_alternative_t<0, detail::specialization_variant>::has_large_trees>, detail::preset_decision_forest<std::variant_alternative_t<1, detail::specialization_variant>::layout, std::variant_alternative_t<1, detail::specialization_variant>::is_double_precision, std::variant_alternative_t<1, detail::specialization_variant>::has_large_trees>, detail::preset_decision_forest<std::variant_alternative_t<2, detail::specialization_variant>::layout, std::variant_alternative_t<2, detail::specialization_variant>::is_double_precision, std::variant_alternative_t<2, detail::specialization_variant>::has_large_trees>, detail::preset_decision_forest<std::variant_alternative_t<3, detail::specialization_variant>::layout, std::variant_alternative_t<3, detail::specialization_variant>::is_double_precision, std::variant_alternative_t<3, detail::specialization_variant>::has_large_trees>, detail::preset_decision_forest<std::variant_alternative_t<4, detail::specialization_variant>::layout, std::variant_alternative_t<4, detail::specialization_variant>::is_double_precision, std::variant_alternative_t<4, detail::specialization_variant>::has_large_trees>, detail::preset_decision_forest<std::variant_alternative_t<5, detail::specialization_variant>::layout, std::variant_alternative_t<5, detail::specialization_variant>::is_double_precision, std::variant_alternative_t<5, detail::specialization_variant>::has_large_trees>, detail::preset_decision_forest<std::variant_alternative_t<6, detail::specialization_variant>::layout, std::variant_alternative_t<6, detail::specialization_variant>::is_double_precision, std::variant_alternative_t<6, detail::specialization_variant>::has_large_trees>, detail::preset_decision_forest<std::variant_alternative_t<7, 
detail::specialization_variant>::layout, std::variant_alternative_t<7, detail::specialization_variant>::is_double_precision, std::variant_alternative_t<7, detail::specialization_variant>::has_large_trees>, detail::preset_decision_forest<std::variant_alternative_t<8, detail::specialization_variant>::layout, std::variant_alternative_t<8, detail::specialization_variant>::is_double_precision, std::variant_alternative_t<8, detail::specialization_variant>::has_large_trees>, detail::preset_decision_forest<std::variant_alternative_t<9, detail::specialization_variant>::layout, std::variant_alternative_t<9, detail::specialization_variant>::is_double_precision, std::variant_alternative_t<9, detail::specialization_variant>::has_large_trees>, detail::preset_decision_forest<std::variant_alternative_t<10, detail::specialization_variant>::layout, std::variant_alternative_t<10, detail::specialization_variant>::is_double_precision, std::variant_alternative_t<10, detail::specialization_variant>::has_large_trees>, detail::preset_decision_forest<std::variant_alternative_t<11, detail::specialization_variant>::layout, std::variant_alternative_t<11, detail::specialization_variant>::is_double_precision, std::variant_alternative_t<11, detail::specialization_variant>::has_large_trees>>#

A variant containing all standard decision_forest instantiations

template<tree_layout layout, bool double_precision, bool large_trees>
using nvforest::detail::preset_decision_forest = decision_forest<layout, typename specialization_types<layout, double_precision, large_trees>::threshold_type, typename specialization_types<layout, double_precision, large_trees>::index_type, typename specialization_types<layout, double_precision, large_trees>::metadata_type, typename specialization_types<layout, double_precision, large_trees>::offset_type>#

A convenience wrapper to simplify template instantiation of decision_forest

This template takes the large range of available template parameters and reduces them to just three standard choices.

Template Parameters:
  • layout – The in-memory layout of nodes in this forest

  • double_precision – Whether this model should use double-precision for floating-point evaluation and 64-bit integers for indexes

large_trees – Whether this forest expects more than 2**(16 - 3) - 1 = 8191 features or contains nodes whose most distant child is offset more than 2**16 - 1 = 65535 nodes away.
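The thresholds quoted above follow from simple bit-budget arithmetic, sketched here with hypothetical helper names: a 16-bit metadata word whose first 3 bits hold flags leaves 13 bits for the feature index, and a 16-bit offset type caps the parent-to-child distance:

```cpp
#include <cstdint>

// Illustrative arithmetic behind the large_trees thresholds (helper names
// are hypothetical, not library API).
constexpr std::uint32_t max_feature_index(std::uint32_t metadata_bits,
                                          std::uint32_t flag_bits) {
  // Bits left over after flags bound the feature index: 2**(16 - 3) - 1.
  return (std::uint32_t{1} << (metadata_bits - flag_bits)) - 1u;
}
constexpr std::uint32_t max_node_offset(std::uint32_t offset_bits) {
  // Largest offset an unsigned type of this width can store: 2**16 - 1.
  return (std::uint32_t{1} << offset_bits) - 1u;
}
```

Exceeding either bound requires the wider specialization selected when large_trees is true.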