Classes
struct	param
	contains all the hyper-parameters for training More...

struct	node
	Represents a node in the syntax tree. More...

struct	program
	The main data structure to store the AST that represents a program in the current generation. More...

Typedefs
typedef program *	program_t

Enumerations
enum class	metric_t : uint32_t { mae , mse , rmse , pearson , spearman , logloss }

enum class	init_method_t : uint32_t { grow , full , half_and_half }

enum class	transformer_t : uint32_t { sigmoid }

enum class	mutation_t : uint32_t { none , crossover , subtree , hoist , point , reproduce }

Functions
std::string	stringify (const program &prog)
	Visualize an AST. More...

void	symFit (const raft::handle_t &handle, const float input, const float labels, const float *sample_weights, const int n_rows, const int n_cols, param ¶ms, program_t &final_progs, std::vector< std::vector< program >> &history)
	Fit either a regressor, classifier or a transformer to the given dataset. More...

void	symRegPredict (const raft::handle_t &handle, const float input, const int n_rows, const program_t &best_prog, float output)
	Make predictions for a symbolic regressor. More...

void	symClfPredictProbs (const raft::handle_t &handle, const float input, const int n_rows, const param ¶ms, const program_t &best_prog, float output)
	Probability prediction for a symbolic classifier. If a transformer(like sigmoid) is specified, then it is applied on the output before returning it. More...

void	symClfPredict (const raft::handle_t &handle, const float input, const int n_rows, const param ¶ms, const program_t &best_prog, float output)
	Return predictions for a binary classification program defining the decision boundary. More...

void	symTransform (const raft::handle_t &handle, const float input, const param ¶ms, const program_t &final_progs, const int n_rows, const int n_cols, float output)
	Transform the values in the input feature matrix according to the supplied programs. More...

void	execute (const raft::handle_t &h, const program_t &d_progs, const int n_rows, const int n_progs, const float data, float y_pred)
	Calls the execution kernel to evaluate all programs on the given dataset. More...

void	compute_metric (const raft::handle_t &h, int n_rows, int n_progs, const float y, const float y_pred, const float w, float score, const param ¶ms)
	Compute the loss based on the metric specified in the training hyperparameters. It performs a batched computation for all programs in one shot. More...

void	find_fitness (const raft::handle_t &h, program_t &d_prog, float score, const param ¶ms, const int n_rows, const float data, const float y, const float sample_weights)
	Computes the fitness scores for a sngle program on the given dataset. More...

void	find_batched_fitness (const raft::handle_t &h, int n_progs, program_t &d_progs, float score, const param ¶ms, const int n_rows, const float data, const float y, const float sample_weights)
	Computes the fitness scores for all programs on the given dataset. More...

void	set_fitness (const raft::handle_t &h, program_t &d_prog, program &h_prog, const param ¶ms, const int n_rows, const float data, const float y, const float *sample_weights)
	Computes and sets the fitness scores for a single program on the given dataset. More...

void	set_batched_fitness (const raft::handle_t &h, int n_progs, program_t &d_progs, std::vector< program > &h_progs, const param ¶ms, const int n_rows, const float data, const float y, const float *sample_weights)
	Computes and sets the fitness scores for all programs on the given dataset. More...

float	get_fitness (const program &prog, const param ¶ms)
	Returns precomputed fitness score of program on the host, after accounting for parsimony. More...

int	get_depth (const program &p_out)
	Evaluates and returns the depth of the current program. More...

void	build_program (program &p_out, const param ¶ms, std::mt19937 &rng)
	Build a random program with depth atmost 10. More...

void	point_mutation (const program &prog, program &p_out, const param ¶ms, std::mt19937 &rng)
	Perform a point mutation on the given program(AST) More...

void	crossover (const program &prog, const program &donor, program &p_out, const param ¶ms, std::mt19937 &rng)
	Perform a 'hoisted' crossover mutation using the parent and donor programs. The donor subtree selected is hoisted to ensure our constrains on total depth. More...

void	subtree_mutation (const program &prog, program &p_out, const param ¶ms, std::mt19937 &rng)
	Performs a crossover mutation with a randomly built new program. Since crossover is 'hoisted', this will ensure that depth constrains are not violated. More...

void	hoist_mutation (const program &prog, program &p_out, const param ¶ms, std::mt19937 &rng)
	Perform a hoist mutation on a random subtree of the given program (replace a subtree with a subtree of a subtree) More...

Variables
const int	GENE_TPB = 256

const int	MAX_STACK_SIZE = 20

Typedef Documentation

◆ program_t

typedef program* cuml::genetic::program_t

program_t is a shorthand for device programs

Enumeration Type Documentation

◆ init_method_t

enum cuml::genetic::init_method_t : uint32_t

strong

Type of initialization of the member programs in the population

Enumerator
grow	random nodes chosen, allowing shorter or asymmetrical trees
full	growing till a randomly chosen depth
half_and_half	50% of the population on `grow` and the rest with `full`

◆ metric_t

enum cuml::genetic::metric_t : uint32_t

strong

fitness metric types

Enumerator
mae	mean absolute error (regression-only)
mse	mean squared error (regression-only)
rmse	root mean squared error (regression-only)
pearson	pearson product-moment coefficient (regression and transformation)
spearman	spearman's rank-order coefficient (regression and transformation)
logloss	binary cross-entropy loss (classification-only)

◆ mutation_t

enum cuml::genetic::mutation_t : uint32_t

strong

Mutation types for a program

Enumerator
none	Placeholder for first generation programs
crossover	Crossover mutations
subtree	Subtree mutations
hoist	Hoise mutations
point	Point mutations
reproduce	Program reproduction

◆ transformer_t

enum cuml::genetic::transformer_t : uint32_t

strong

Enumerator
sigmoid	sigmoid function

Function Documentation

◆ build_program()

void cuml::genetic::build_program	(	program &	p_out,
		const param &	params,
		std::mt19937 &	rng
	)

Build a random program with depth atmost 10.

Parameters

p_out	The output program
params	Training hyperparameters
rng	RNG to decide nodes to add

◆ compute_metric()

void cuml::genetic::compute_metric	(	const raft::handle_t &	h,
		int	n_rows,
		int	n_progs,
		const float *	y,
		const float *	y_pred,
		const float *	w,
		float *	score,
		const param &	params
	)

Compute the loss based on the metric specified in the training hyperparameters. It performs a batched computation for all programs in one shot.

Parameters

h	cuML handle
n_rows	The number of labels/rows in the expected output
n_progs	The number of programs being batched
y	Device pointer to the expected output (SIZE = n_samples)
y_pred	Device pointer to the predicted output (SIZE = n_samples * n_progs)
w	Device pointer to sample weights (SIZE = n_samples)
score	Device pointer to final score (SIZE = n_progs)
params	Training hyperparameters

◆ crossover()

void cuml::genetic::crossover	(	const program &	prog,
		const program &	donor,
		program &	p_out,
		const param &	params,
		std::mt19937 &	rng
	)

Perform a 'hoisted' crossover mutation using the parent and donor programs. The donor subtree selected is hoisted to ensure our constrains on total depth.

Parameters

prog	The input program
donor	The donor program
p_out	The result program
params	Training hyperparameters
rng	RNG for subtree selection

◆ execute()

void cuml::genetic::execute	(	const raft::handle_t &	h,
		const program_t &	d_progs,
		const int	n_rows,
		const int	n_progs,
		const float *	data,
		float *	y_pred
	)

Calls the execution kernel to evaluate all programs on the given dataset.

Parameters

h	cuML handle
d_progs	Device pointer to programs
n_rows	Number of rows in the input dataset
n_progs	Total number of programs being evaluated
data	Device pointer to input dataset (in col-major format)
y_pred	Device pointer to output of program evaluation

◆ find_batched_fitness()

void cuml::genetic::find_batched_fitness	(	const raft::handle_t &	h,
		int	n_progs,
		program_t &	d_progs,
		float *	score,
		const param &	params,
		const int	n_rows,
		const float *	data,
		const float *	y,
		const float *	sample_weights
	)

Computes the fitness scores for all programs on the given dataset.

Parameters

h	cuML handle
n_progs	Batch size(Number of programs)
d_progs	Device pointer to list of programs
score	Device pointer to fitness vals computed for all programs
params	Training hyperparameters
n_rows	Number of rows in the input dataset
data	Device pointer to input dataset
y	Device pointer to input labels
sample_weights	Device pointer to sample weights

◆ find_fitness()

void cuml::genetic::find_fitness	(	const raft::handle_t &	h,
		program_t &	d_prog,
		float *	score,
		const param &	params,
		const int	n_rows,
		const float *	data,
		const float *	y,
		const float *	sample_weights
	)

Computes the fitness scores for a sngle program on the given dataset.

Parameters

h	cuML handle
d_prog	Device pointer to program
score	Device pointer to fitness vals
params	Training hyperparameters
n_rows	Number of rows in the input dataset
data	Device pointer to input dataset
y	Device pointer to input labels
sample_weights	Device pointer to sample weights

◆ get_depth()

int cuml::genetic::get_depth ( const program & p_out )

Evaluates and returns the depth of the current program.

Parameters

p_out The given program

Returns: The depth of the current program

◆ get_fitness()

float cuml::genetic::get_fitness	(	const program &	prog,
		const param &	params
	)

Returns precomputed fitness score of program on the host, after accounting for parsimony.

Parameters

prog	The host program
params	Training hyperparameters

Returns: Fitness score corresponding to trained program

◆ hoist_mutation()

void cuml::genetic::hoist_mutation	(	const program &	prog,
		program &	p_out,
		const param &	params,
		std::mt19937 &	rng
	)

Perform a hoist mutation on a random subtree of the given program (replace a subtree with a subtree of a subtree)

Parameters

prog	The input program
p_out	The output program
params	Training hyperparameters
rng	RNG to control subtree selection

◆ point_mutation()

void cuml::genetic::point_mutation	(	const program &	prog,
		program &	p_out,
		const param &	params,
		std::mt19937 &	rng
	)

Perform a point mutation on the given program(AST)

Parameters

prog	The input program
p_out	The result program
params	Training hyperparameters
rng	RNG to decide nodes to mutate

◆ set_batched_fitness()

void cuml::genetic::set_batched_fitness	(	const raft::handle_t &	h,
		int	n_progs,
		program_t &	d_progs,
		std::vector< program > &	h_progs,
		const param &	params,
		const int	n_rows,
		const float *	data,
		const float *	y,
		const float *	sample_weights
	)

Computes and sets the fitness scores for all programs on the given dataset.

Parameters

h	cuML handle
n_progs	Batch size
d_progs	Device pointer to list of programs
h_progs	Host vector of programs corresponding to d_progs
params	Training hyperparameters
n_rows	Number of rows in the input dataset
data	Device pointer to input dataset
y	Device pointer to input labels
sample_weights	Device pointer to sample weights

◆ set_fitness()

void cuml::genetic::set_fitness	(	const raft::handle_t &	h,
		program_t &	d_prog,
		program &	h_prog,
		const param &	params,
		const int	n_rows,
		const float *	data,
		const float *	y,
		const float *	sample_weights
	)

Computes and sets the fitness scores for a single program on the given dataset.

Parameters

h	cuML handle
d_prog	Device pointer to program
h_prog	Host program object
params	Training hyperparameters
n_rows	Number of rows in the input dataset
data	Device pointer to input dataset
y	Device pointer to input labels
sample_weights	Device pointer to sample weights

◆ stringify()

std::string cuml::genetic::stringify ( const program & prog )

Visualize an AST.

Parameters

prog	host object containing the AST

Returns: String representation of the AST

◆ subtree_mutation()

void cuml::genetic::subtree_mutation	(	const program &	prog,
		program &	p_out,
		const param &	params,
		std::mt19937 &	rng
	)

Performs a crossover mutation with a randomly built new program. Since crossover is 'hoisted', this will ensure that depth constrains are not violated.

Parameters

prog	The input program
p_out	The result mutated program
params	Training hyperparameters
rng	RNG to control subtree selection and temporary program addition

◆ symClfPredict()

void cuml::genetic::symClfPredict	(	const raft::handle_t &	handle,
		const float *	input,
		const int	n_rows,
		const param &	params,
		const program_t &	best_prog,
		float *	output
	)

Return predictions for a binary classification program defining the decision boundary.

Parameters

handle	cuML handle
input	device pointer to feature matrix
n_rows	number of rows of the feature matrix
params	host struct containing training hyperparameters
best_prog	Best program obtained after training
output	Device pointer to output predictions

◆ symClfPredictProbs()

void cuml::genetic::symClfPredictProbs	(	const raft::handle_t &	handle,
		const float *	input,
		const int	n_rows,
		const param &	params,
		const program_t &	best_prog,
		float *	output
	)

Probability prediction for a symbolic classifier. If a transformer(like sigmoid) is specified, then it is applied on the output before returning it.

Parameters

handle	cuML handle
input	device pointer to feature matrix
n_rows	number of rows of the feature matrix
params	host struct containing training hyperparameters
best_prog	The best program obtained during training. Inferences are made using this
output	device pointer to output probability(in col major format)

◆ symFit()

void cuml::genetic::symFit	(	const raft::handle_t &	handle,
		const float *	input,
		const float *	labels,
		const float *	sample_weights,
		const int	n_rows,
		const int	n_cols,
		param &	params,
		program_t &	final_progs,
		std::vector< std::vector< program >> &	history
	)

Fit either a regressor, classifier or a transformer to the given dataset.

Parameters

handle	cuML handle
input	device pointer to the feature matrix
labels	device pointer to the label vector of length n_rows
sample_weights	device pointer to the sample weights of length n_rows
n_rows	number of rows of the feature matrix
n_cols	number of columns of the feature matrix
params	host struct containing hyperparameters needed for training
final_progs	device pointer to the final generation of programs(sorted by decreasing fitness)
history	host vector containing the list of all programs in every generation (sorted by decreasing fitness)

Note: This module allocates extra device memory for the nodes of the last generation that is pointed by final_progs[i].nodes for each program i in final_progs. The amount of memory allocated is found at runtime, and is final_progs[i].len * sizeof(node) for each program i. The reason this isn't deallocated within the function is because the resulting memory is needed for executing predictions in symRegPredict, symClfPredict, symClfPredictProbs and symTransform functions. The above device memory is expected to be explicitly deallocated by the caller AFTER calling the predict function.

◆ symRegPredict()

void cuml::genetic::symRegPredict	(	const raft::handle_t &	handle,
		const float *	input,
		const int	n_rows,
		const program_t &	best_prog,
		float *	output
	)

Make predictions for a symbolic regressor.

Parameters

handle	cuML handle
input	device pointer to feature matrix
n_rows	number of rows of the feature matrix
best_prog	device pointer to best AST fit during training
output	device pointer to output values

◆ symTransform()

void cuml::genetic::symTransform	(	const raft::handle_t &	handle,
		const float *	input,
		const param &	params,
		const program_t &	final_progs,
		const int	n_rows,
		const int	n_cols,
		float *	output
	)

Transform the values in the input feature matrix according to the supplied programs.

Parameters

handle	cuML handle
input	device pointer to feature matrix
params	Hyperparameters used during training
final_progs	List of ASTs used for generating new features
n_rows	number of rows of the feature matrix
n_cols	number of columns of the feature matrix
output	device pointer to transformed input

Classes

Typedefs

Enumerations

Functions

Variables

Typedef Documentation

◆ program_t

Enumeration Type Documentation

◆ init_method_t

◆ metric_t

◆ mutation_t

◆ transformer_t

Function Documentation

◆ build_program()

◆ compute_metric()

◆ crossover()

◆ execute()

◆ find_batched_fitness()

◆ find_fitness()

◆ get_depth()

◆ get_fitness()

◆ hoist_mutation()

◆ point_mutation()

◆ set_batched_fitness()

◆ set_fitness()

◆ stringify()

◆ subtree_mutation()

◆ symClfPredict()

◆ symClfPredictProbs()

◆ symFit()

◆ symRegPredict()

◆ symTransform()

Variable Documentation

◆ GENE_TPB

◆ MAX_STACK_SIZE