cuml::genetic Namespace Reference

Classes

struct param
 Contains all the hyperparameters for training.

struct node
 Represents a node in the syntax tree.

struct program
 The main data structure to store the AST that represents a program in the current generation.
 

Typedefs

typedef program *program_t
 

Enumerations

enum class  metric_t : uint32_t {
  mae , mse , rmse , pearson ,
  spearman , logloss
}
 
enum class  init_method_t : uint32_t { grow , full , half_and_half }
 
enum class  transformer_t : uint32_t { sigmoid }
 
enum class  mutation_t : uint32_t {
  none , crossover , subtree , hoist ,
  point , reproduce
}
 

Functions

std::string stringify (const program &prog)
 Visualize an AST.

void symFit (const raft::handle_t &handle, const float *input, const float *labels, const float *sample_weights, const int n_rows, const int n_cols, param &params, program_t &final_progs, std::vector< std::vector< program >> &history)
 Fit either a regressor, classifier or a transformer to the given dataset.

void symRegPredict (const raft::handle_t &handle, const float *input, const int n_rows, const program_t &best_prog, float *output)
 Make predictions for a symbolic regressor.

void symClfPredictProbs (const raft::handle_t &handle, const float *input, const int n_rows, const param &params, const program_t &best_prog, float *output)
 Probability prediction for a symbolic classifier. If a transformer (like sigmoid) is specified, it is applied to the output before returning it.

void symClfPredict (const raft::handle_t &handle, const float *input, const int n_rows, const param &params, const program_t &best_prog, float *output)
 Return predictions for a binary classification program defining the decision boundary.

void symTransform (const raft::handle_t &handle, const float *input, const param &params, const program_t &final_progs, const int n_rows, const int n_cols, float *output)
 Transform the values in the input feature matrix according to the supplied programs.

void execute (const raft::handle_t &h, const program_t &d_progs, const int n_rows, const int n_progs, const float *data, float *y_pred)
 Calls the execution kernel to evaluate all programs on the given dataset.

void compute_metric (const raft::handle_t &h, int n_rows, int n_progs, const float *y, const float *y_pred, const float *w, float *score, const param &params)
 Compute the loss based on the metric specified in the training hyperparameters. It performs a batched computation for all programs in one shot.

void find_fitness (const raft::handle_t &h, program_t &d_prog, float *score, const param &params, const int n_rows, const float *data, const float *y, const float *sample_weights)
 Computes the fitness score for a single program on the given dataset.

void find_batched_fitness (const raft::handle_t &h, int n_progs, program_t &d_progs, float *score, const param &params, const int n_rows, const float *data, const float *y, const float *sample_weights)
 Computes the fitness scores for all programs on the given dataset.

void set_fitness (const raft::handle_t &h, program_t &d_prog, program &h_prog, const param &params, const int n_rows, const float *data, const float *y, const float *sample_weights)
 Computes and sets the fitness score for a single program on the given dataset.

void set_batched_fitness (const raft::handle_t &h, int n_progs, program_t &d_progs, std::vector< program > &h_progs, const param &params, const int n_rows, const float *data, const float *y, const float *sample_weights)
 Computes and sets the fitness scores for all programs on the given dataset.

float get_fitness (const program &prog, const param &params)
 Returns the precomputed fitness score of a program on the host, after accounting for parsimony.

int get_depth (const program &p_out)
 Evaluates and returns the depth of the current program.

void build_program (program &p_out, const param &params, std::mt19937 &rng)
 Build a random program with depth at most 10.

void point_mutation (const program &prog, program &p_out, const param &params, std::mt19937 &rng)
 Perform a point mutation on the given program (AST).

void crossover (const program &prog, const program &donor, program &p_out, const param &params, std::mt19937 &rng)
 Perform a 'hoisted' crossover mutation using the parent and donor programs. The selected donor subtree is hoisted to enforce the constraints on total depth.

void subtree_mutation (const program &prog, program &p_out, const param &params, std::mt19937 &rng)
 Performs a crossover mutation with a randomly built new program. Since crossover is 'hoisted', this ensures that depth constraints are not violated.

void hoist_mutation (const program &prog, program &p_out, const param &params, std::mt19937 &rng)
 Perform a hoist mutation on a random subtree of the given program (replace a subtree with one of its own subtrees).
 

Variables

const int GENE_TPB = 256
 
const int MAX_STACK_SIZE = 20
 

Typedef Documentation

◆ program_t

program_t is a shorthand for device programs

Enumeration Type Documentation

◆ init_method_t

enum class cuml::genetic::init_method_t : uint32_t

Type of initialization of the member programs in the population.

Enumerator
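grow

Nodes are chosen at random from functions and terminals, allowing shorter or asymmetrical trees.

full

Nodes are grown until a randomly chosen depth is reached, so every branch reaches that depth.

half_and_half

Half of the population is initialized with grow and the rest with full.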
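To make the three initialization methods concrete, here is a minimal host-side sketch. It is not cuML's build_program: SketchNode, InitMethod, build_subtree and init_program are hypothetical names, every function is assumed to be binary, grow picks a function or a terminal with a simple coin flip, and half_and_half is modelled by program-index parity.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical, simplified node (not cuML's `node` struct): every function
// is binary and every terminal refers to one feature column.
struct SketchNode {
  bool is_function;
  int id;  // operator id for functions, feature index for terminals
};

enum class InitMethod { grow, full, half_and_half };

// Appends a random subtree in prefix order. The root is at depth 0 and a
// terminal is forced once max_depth is reached.
void build_subtree(std::vector<SketchNode>& out, int depth, int max_depth,
                   bool full, int n_functions, int n_features,
                   std::mt19937& rng) {
  bool pick_function;
  if (depth == max_depth) {
    pick_function = false;  // depth limit reached: stop branching
  } else if (full) {
    pick_function = true;  // full: always branch until max_depth
  } else {
    pick_function = std::bernoulli_distribution(0.5)(rng);  // grow: coin flip
  }
  if (pick_function) {
    std::uniform_int_distribution<int> op(0, n_functions - 1);
    out.push_back({true, op(rng)});
    build_subtree(out, depth + 1, max_depth, full, n_functions, n_features, rng);
    build_subtree(out, depth + 1, max_depth, full, n_functions, n_features, rng);
  } else {
    std::uniform_int_distribution<int> feature(0, n_features - 1);
    out.push_back({false, feature(rng)});
  }
}

// half_and_half is modelled here as: even program indices use full and odd
// indices use grow, which splits the population 50/50.
std::vector<SketchNode> init_program(InitMethod method, int program_idx,
                                     int max_depth, int n_functions,
                                     int n_features, std::mt19937& rng) {
  assert(max_depth >= 0 && n_functions > 0 && n_features > 0);
  const bool full = method == InitMethod::full ||
                    (method == InitMethod::half_and_half && program_idx % 2 == 0);
  std::vector<SketchNode> prog;
  build_subtree(prog, 0, max_depth, full, n_functions, n_features, rng);
  return prog;
}
```

With max_depth = 3, full always yields 2^4 - 1 = 15 nodes, while grow yields an odd number of nodes no larger than that.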

◆ metric_t

enum class cuml::genetic::metric_t : uint32_t

Fitness metric types.

Enumerator
mae 

mean absolute error (regression-only)

mse 

mean squared error (regression-only)

rmse 

root mean squared error (regression-only)

pearson 

Pearson product-moment correlation coefficient (regression and transformation)

spearman 

Spearman's rank-order correlation coefficient (regression and transformation)

logloss 

binary cross-entropy loss (classification-only)
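The regression and classification metrics above can be sketched for a single program on the host as weighted averages over rows. This is an illustration under assumptions, not cuML's implementation: the function names are hypothetical, sample weights are assumed to normalize each average, and logloss clamps probabilities with an arbitrary epsilon. Spearman is omitted because it is the Pearson formula applied to the ranks of both vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

using Vec = std::vector<float>;

static double weight_sum(const Vec& w) {
  double s = 0.0;
  for (float wi : w) s += wi;
  return s;
}

// Weighted mean absolute error (lower is better).
double weighted_mae(const Vec& y, const Vec& yp, const Vec& w) {
  assert(y.size() == yp.size() && y.size() == w.size());
  double s = 0.0;
  for (std::size_t i = 0; i < y.size(); ++i) s += w[i] * std::fabs(y[i] - yp[i]);
  return s / weight_sum(w);
}

// Weighted mean squared error (lower is better).
double weighted_mse(const Vec& y, const Vec& yp, const Vec& w) {
  assert(y.size() == yp.size() && y.size() == w.size());
  double s = 0.0;
  for (std::size_t i = 0; i < y.size(); ++i) {
    double d = y[i] - yp[i];
    s += w[i] * d * d;
  }
  return s / weight_sum(w);
}

double weighted_rmse(const Vec& y, const Vec& yp, const Vec& w) {
  return std::sqrt(weighted_mse(y, yp, w));
}

// Weighted Pearson correlation between labels and predictions.
double weighted_pearson(const Vec& y, const Vec& yp, const Vec& w) {
  assert(y.size() == yp.size() && y.size() == w.size());
  double ws = weight_sum(w), my = 0.0, mp = 0.0;
  for (std::size_t i = 0; i < y.size(); ++i) {
    my += w[i] * y[i];
    mp += w[i] * yp[i];
  }
  my /= ws;
  mp /= ws;
  double cov = 0.0, vy = 0.0, vp = 0.0;
  for (std::size_t i = 0; i < y.size(); ++i) {
    double dy = y[i] - my, dp = yp[i] - mp;
    cov += w[i] * dy * dp;
    vy += w[i] * dy * dy;
    vp += w[i] * dp * dp;
  }
  return cov / std::sqrt(vy * vp);
}

// Weighted binary cross-entropy; yp holds probabilities of class 1.
double weighted_logloss(const Vec& y, const Vec& yp, const Vec& w) {
  assert(y.size() == yp.size() && y.size() == w.size());
  const double eps = 1e-7;  // keeps log() finite at p == 0 or p == 1
  double s = 0.0;
  for (std::size_t i = 0; i < y.size(); ++i) {
    double p = std::clamp(static_cast<double>(yp[i]), eps, 1.0 - eps);
    s -= w[i] * (y[i] * std::log(p) + (1.0 - y[i]) * std::log(1.0 - p));
  }
  return s / weight_sum(w);
}
```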

◆ mutation_t

enum class cuml::genetic::mutation_t : uint32_t

Mutation types for a program.

Enumerator
none 

Placeholder for first-generation programs

crossover 

Crossover mutations

subtree 

Subtree mutations

hoist 

Hoist mutations

point 

Point mutations

reproduce 

Program reproduction

◆ transformer_t

enum class cuml::genetic::transformer_t : uint32_t

Enumerator
sigmoid 

sigmoid function

Function Documentation

◆ build_program()

void cuml::genetic::build_program(program &p_out,
    const param &params,
    std::mt19937 &rng)

Build a random program with depth at most 10.

Parameters
p_out: The output program
params: Training hyperparameters
rng: RNG to decide nodes to add

◆ compute_metric()

void cuml::genetic::compute_metric(const raft::handle_t &h,
    int n_rows,
    int n_progs,
    const float *y,
    const float *y_pred,
    const float *w,
    float *score,
    const param &params)

Compute the loss based on the metric specified in the training hyperparameters. It performs a batched computation for all programs in one shot.

Parameters
h: cuML handle
n_rows: The number of labels/rows in the expected output
n_progs: The number of programs being batched
y: Device pointer to the expected output (SIZE = n_rows)
y_pred: Device pointer to the predicted output (SIZE = n_rows * n_progs)
w: Device pointer to sample weights (SIZE = n_rows)
score: Device pointer to final score (SIZE = n_progs)
params: Training hyperparameters
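compute_metric documents y_pred as holding n_rows * n_progs values. The sketch below assumes each program's predictions are stored contiguously (program p at y_pred[p * n_rows + i]) and uses weighted MSE as the metric. The function name is hypothetical, and the loops run on the host rather than on the GPU.

```cpp
#include <cassert>
#include <cstddef>

// Host sketch of a batched metric: one score per program, all programs
// scored against the same labels y and weights w.
void batched_weighted_mse(int n_rows, int n_progs, const float* y,
                          const float* y_pred, const float* w, float* score) {
  assert(n_rows > 0 && n_progs >= 0);
  double w_sum = 0.0;
  for (int i = 0; i < n_rows; ++i) w_sum += w[i];
  for (int p = 0; p < n_progs; ++p) {
    // Assumed layout: predictions of program p form one contiguous column.
    const float* pred = y_pred + static_cast<std::size_t>(p) * n_rows;
    double acc = 0.0;
    for (int i = 0; i < n_rows; ++i) {
      double d = y[i] - pred[i];
      acc += w[i] * d * d;
    }
    score[p] = static_cast<float>(acc / w_sum);
  }
}
```

For example, with y = {1, 2}, unit weights and two programs predicting {1, 2} and {2, 4}, the scores are 0 and 2.5.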

◆ crossover()

void cuml::genetic::crossover(const program &prog,
    const program &donor,
    program &p_out,
    const param &params,
    std::mt19937 &rng)

Perform a 'hoisted' crossover mutation using the parent and donor programs. The selected donor subtree is hoisted to enforce the constraints on total depth.

Parameters
prog: The input program
donor: The donor program
p_out: The result program
params: Training hyperparameters
rng: RNG for subtree selection
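When a program is stored in prefix (Polish) order, every subtree occupies a contiguous run of nodes, so crossover reduces to splicing one index range into another. The sketch assumes a hypothetical PNode type carrying a symbol and an arity, and it omits the hoisting step that keeps the result within the depth limit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical prefix-order node: a symbol and its arity (0 for terminals).
struct PNode {
  std::string sym;
  int arity;
};
using Prog = std::vector<PNode>;

// Index one past the end of the subtree rooted at `start`.
std::size_t subtree_end(const Prog& p, std::size_t start) {
  int pending = 1;  // nodes still needed to close the subtree
  std::size_t i = start;
  while (pending > 0) {
    assert(i < p.size());
    pending += p[i].arity - 1;
    ++i;
  }
  return i;
}

// Replaces the subtree of `parent` rooted at `at` with the subtree of
// `donor` rooted at `donor_at`. No depth check is performed here.
Prog crossover_splice(const Prog& parent, std::size_t at, const Prog& donor,
                      std::size_t donor_at) {
  const std::size_t end = subtree_end(parent, at);
  const std::size_t d_end = subtree_end(donor, donor_at);
  Prog out(parent.begin(), parent.begin() + at);                            // prefix
  out.insert(out.end(), donor.begin() + donor_at, donor.begin() + d_end);  // donor subtree
  out.insert(out.end(), parent.begin() + end, parent.end());               // suffix
  return out;
}
```

Splicing sub(X3, X4) into add(X0, mul(X1, X2)) at the mul node yields add(X0, sub(X3, X4)).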

◆ execute()

void cuml::genetic::execute(const raft::handle_t &h,
    const program_t &d_progs,
    const int n_rows,
    const int n_progs,
    const float *data,
    float *y_pred)

Calls the execution kernel to evaluate all programs on the given dataset.

Parameters
h: cuML handle
d_progs: Device pointer to programs
n_rows: Number of rows in the input dataset
n_progs: Total number of programs being evaluated
data: Device pointer to input dataset (in column-major format)
y_pred: Device pointer to output of program evaluation
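A rough host-side picture of what evaluating a program involves: for each row, walk the prefix-order nodes from back to front, push terminals onto a small fixed-size stack, and let each operator pop its operands. The node layout, the Op set and the 20-slot stack (chosen to match MAX_STACK_SIZE) are assumptions for illustration; the real kernel performs this work on the GPU.

```cpp
#include <cassert>
#include <vector>

constexpr int kMaxStack = 20;  // assumed to play the role of MAX_STACK_SIZE

// Hypothetical node kinds; cuML's real node type differs.
enum class Op { add, sub, mul, feature, constant };
struct ENode {
  Op op;
  int feature;  // used when op == Op::feature
  float value;  // used when op == Op::constant
};

// Evaluates one prefix-order program on every row of a column-major
// dataset, where element (row, col) lives at data[col * n_rows + row].
void evaluate(const std::vector<ENode>& prog, const float* data, int n_rows,
              float* y_pred) {
  for (int row = 0; row < n_rows; ++row) {
    float stack[kMaxStack];
    int top = 0;
    // Reading back to front puts operands on the stack before their operator.
    for (auto it = prog.rbegin(); it != prog.rend(); ++it) {
      if (it->op == Op::feature) {
        assert(top < kMaxStack);
        stack[top++] = data[it->feature * n_rows + row];
      } else if (it->op == Op::constant) {
        assert(top < kMaxStack);
        stack[top++] = it->value;
      } else {
        assert(top >= 2);
        float a = stack[--top];  // left child
        float b = stack[--top];  // right child
        float r = it->op == Op::add ? a + b : it->op == Op::sub ? a - b : a * b;
        stack[top++] = r;
      }
    }
    assert(top == 1);
    y_pred[row] = stack[0];
  }
}
```

For the program sub(X1, mul(X0, 3)) on rows (X0, X1) = (1, 10) and (2, 20), this produces 7 and 14.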

◆ find_batched_fitness()

void cuml::genetic::find_batched_fitness(const raft::handle_t &h,
    int n_progs,
    program_t &d_progs,
    float *score,
    const param &params,
    const int n_rows,
    const float *data,
    const float *y,
    const float *sample_weights)

Computes the fitness scores for all programs on the given dataset.

Parameters
h: cuML handle
n_progs: Batch size (number of programs)
d_progs: Device pointer to list of programs
score: Device pointer to fitness values computed for all programs
params: Training hyperparameters
n_rows: Number of rows in the input dataset
data: Device pointer to input dataset
y: Device pointer to input labels
sample_weights: Device pointer to sample weights

◆ find_fitness()

void cuml::genetic::find_fitness(const raft::handle_t &h,
    program_t &d_prog,
    float *score,
    const param &params,
    const int n_rows,
    const float *data,
    const float *y,
    const float *sample_weights)

Computes the fitness score for a single program on the given dataset.

Parameters
h: cuML handle
d_prog: Device pointer to program
score: Device pointer to fitness value
params: Training hyperparameters
n_rows: Number of rows in the input dataset
data: Device pointer to input dataset
y: Device pointer to input labels
sample_weights: Device pointer to sample weights

◆ get_depth()

int cuml::genetic::get_depth(const program &p_out)

Evaluates and returns the depth of the current program.

Parameters
p_out: The given program
Returns
The depth of the current program
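Depth can be recovered from the prefix-order arity sequence alone, by tracking how many children each open ancestor still expects. In this sketch the root is at depth 0, so a lone terminal has depth 0; whether get_depth uses the same convention is not stated here. prefix_depth is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Depth of a prefix-order tree given only node arities (0 = terminal).
int prefix_depth(const std::vector<int>& arity) {
  std::vector<int> remaining;  // children still expected by each open ancestor
  int max_depth = 0;
  for (int a : arity) {
    const int depth = static_cast<int>(remaining.size());
    max_depth = std::max(max_depth, depth);
    if (a > 0) {
      remaining.push_back(a);  // this node's children follow immediately
    } else {
      // A terminal completes a child; close every ancestor that is now full.
      while (!remaining.empty() && --remaining.back() == 0) remaining.pop_back();
    }
  }
  assert(remaining.empty() && "arities do not form a complete tree");
  return max_depth;
}
```

For add(X0, mul(X1, X2)), encoded as arities {2, 0, 2, 0, 0}, the result is 2.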

◆ get_fitness()

float cuml::genetic::get_fitness(const program &prog,
    const param &params)

Returns the precomputed fitness score of a program on the host, after accounting for parsimony.

Parameters
prog: The host program
params: Training hyperparameters
Returns
Fitness score corresponding to the trained program
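The documentation does not spell out the parsimony adjustment. One common convention, used by gplearn and assumed here only for illustration, charges a fixed coefficient per node and always moves the score in the "worse" direction: the penalty is added for lower-is-better metrics such as MSE and subtracted for higher-is-better ones. penalized_fitness and its parameters are hypothetical.

```cpp
#include <cassert>

// Parsimony-adjusted fitness: longer programs always score worse.
float penalized_fitness(float raw_fitness, int program_length,
                        float parsimony_coefficient, bool lower_is_better) {
  assert(program_length > 0 && parsimony_coefficient >= 0.0f);
  const float penalty = parsimony_coefficient * static_cast<float>(program_length);
  return lower_is_better ? raw_fitness + penalty : raw_fitness - penalty;
}
```

A 10-node program with raw MSE 1.0 and a coefficient of 0.01 would be ranked as if its error were about 1.1.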

◆ hoist_mutation()

void cuml::genetic::hoist_mutation(const program &prog,
    program &p_out,
    const param &params,
    std::mt19937 &rng)

Perform a hoist mutation on a random subtree of the given program (replace a subtree with one of its own subtrees).

Parameters
prog: The input program
p_out: The output program
params: Training hyperparameters
rng: RNG to control subtree selection
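Hoist mutation also maps onto index ranges in prefix order: the chosen subtree is replaced by one of its own subtrees, so the program can only shrink. The PNode type, hoist and random_hoist below are hypothetical names for a host-side sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical prefix-order node: a symbol and its arity (0 for terminals).
struct PNode {
  std::string sym;
  int arity;
};
using Prog = std::vector<PNode>;

// Index one past the end of the subtree rooted at `start`.
std::size_t subtree_end(const Prog& p, std::size_t start) {
  int pending = 1;  // nodes still needed to close the subtree
  std::size_t i = start;
  while (pending > 0) {
    assert(i < p.size());
    pending += p[i].arity - 1;
    ++i;
  }
  return i;
}

// Replaces the subtree rooted at `outer` with the subtree rooted at `inner`,
// which must lie inside it. The result is never deeper than the input.
Prog hoist(const Prog& prog, std::size_t outer, std::size_t inner) {
  const std::size_t outer_end = subtree_end(prog, outer);
  assert(outer <= inner && inner < outer_end);
  const std::size_t inner_end = subtree_end(prog, inner);
  Prog out(prog.begin(), prog.begin() + outer);
  out.insert(out.end(), prog.begin() + inner, prog.begin() + inner_end);
  out.insert(out.end(), prog.begin() + outer_end, prog.end());
  return out;
}

// Random variant: pick any subtree, then any node inside it.
Prog random_hoist(const Prog& prog, std::mt19937& rng) {
  assert(!prog.empty());
  std::uniform_int_distribution<std::size_t> pick_outer(0, prog.size() - 1);
  const std::size_t outer = pick_outer(rng);
  std::uniform_int_distribution<std::size_t> pick_inner(outer, subtree_end(prog, outer) - 1);
  return hoist(prog, outer, pick_inner(rng));
}
```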

◆ point_mutation()

void cuml::genetic::point_mutation(const program &prog,
    program &p_out,
    const param &params,
    std::mt19937 &rng)

Perform a point mutation on the given program (AST).

Parameters
prog: The input program
p_out: The result program
params: Training hyperparameters
rng: RNG to decide nodes to mutate
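A point mutation leaves the tree shape alone and swaps individual nodes for others of the same arity, so the result is always a valid program. In this sketch each node is replaced independently with probability p_replace; that probability, the symbol table and the PNode type are assumptions, not fields taken from the param struct.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical prefix-order node: a symbol and its arity (0 for terminals).
struct PNode {
  std::string sym;
  int arity;
};
using Prog = std::vector<PNode>;

// Replaces each node, with probability p_replace, by a random symbol of the
// same arity drawn from `symbols`; nodes with no same-arity match are kept.
Prog point_mutate(const Prog& prog, const std::vector<PNode>& symbols,
                  double p_replace, std::mt19937& rng) {
  assert(p_replace >= 0.0 && p_replace <= 1.0);
  std::bernoulli_distribution replace(p_replace);
  Prog out = prog;
  for (PNode& n : out) {
    if (!replace(rng)) continue;
    std::vector<const PNode*> same_arity;
    for (const PNode& s : symbols)
      if (s.arity == n.arity) same_arity.push_back(&s);
    if (same_arity.empty()) continue;
    std::uniform_int_distribution<std::size_t> pick(0, same_arity.size() - 1);
    n = *same_arity[pick(rng)];
  }
  return out;
}
```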

◆ set_batched_fitness()

void cuml::genetic::set_batched_fitness(const raft::handle_t &h,
    int n_progs,
    program_t &d_progs,
    std::vector< program > &h_progs,
    const param &params,
    const int n_rows,
    const float *data,
    const float *y,
    const float *sample_weights)

Computes and sets the fitness scores for all programs on the given dataset.

Parameters
h: cuML handle
n_progs: Batch size (number of programs)
d_progs: Device pointer to list of programs
h_progs: Host vector of programs corresponding to d_progs
params: Training hyperparameters
n_rows: Number of rows in the input dataset
data: Device pointer to input dataset
y: Device pointer to input labels
sample_weights: Device pointer to sample weights

◆ set_fitness()

void cuml::genetic::set_fitness(const raft::handle_t &h,
    program_t &d_prog,
    program &h_prog,
    const param &params,
    const int n_rows,
    const float *data,
    const float *y,
    const float *sample_weights)

Computes and sets the fitness score for a single program on the given dataset.

Parameters
h: cuML handle
d_prog: Device pointer to program
h_prog: Host program object
params: Training hyperparameters
n_rows: Number of rows in the input dataset
data: Device pointer to input dataset
y: Device pointer to input labels
sample_weights: Device pointer to sample weights

◆ stringify()

std::string cuml::genetic::stringify(const program &prog)

Visualize an AST.

Parameters
prog: Host object containing the AST
Returns
String representation of the AST
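One way to render a prefix-order program as text is a recursive walk that consumes exactly one subtree per call. The output format below (nested calls such as add(X0, mul(X1, X2))) is an assumption and may differ from what stringify actually prints; PNode and stringify_sketch are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical prefix-order node: a symbol and its arity (0 for terminals).
struct PNode {
  std::string sym;
  int arity;
};

// Renders the subtree starting at `pos` and advances `pos` past it.
std::string to_string_rec(const std::vector<PNode>& prog, std::size_t& pos) {
  assert(pos < prog.size());
  const PNode& n = prog[pos++];
  if (n.arity == 0) return n.sym;  // terminal: print the symbol itself
  std::string s = n.sym + "(";
  for (int c = 0; c < n.arity; ++c) {
    if (c > 0) s += ", ";
    s += to_string_rec(prog, pos);  // each child is the next full subtree
  }
  return s + ")";
}

std::string stringify_sketch(const std::vector<PNode>& prog) {
  std::size_t pos = 0;
  std::string s = to_string_rec(prog, pos);
  assert(pos == prog.size());  // the whole array must form a single tree
  return s;
}
```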

◆ subtree_mutation()

void cuml::genetic::subtree_mutation(const program &prog,
    program &p_out,
    const param &params,
    std::mt19937 &rng)

Performs a crossover mutation with a randomly built new program. Since crossover is 'hoisted', this ensures that depth constraints are not violated.

Parameters
prog: The input program
p_out: The resulting mutated program
params: Training hyperparameters
rng: RNG to control subtree selection and temporary program addition

◆ symClfPredict()

void cuml::genetic::symClfPredict(const raft::handle_t &handle,
    const float *input,
    const int n_rows,
    const param &params,
    const program_t &best_prog,
    float *output)

Return predictions for a binary classification program defining the decision boundary.

Parameters
handle: cuML handle
input: Device pointer to feature matrix
n_rows: Number of rows of the feature matrix
params: Host struct containing training hyperparameters
best_prog: Best program obtained after training
output: Device pointer to output predictions
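Turning probabilities into hard labels for a binary problem is typically a threshold test. The sketch assumes a 0.5 threshold on the positive-class probability and writes 0/1 as floats to match the float *output parameter; predict_labels is a hypothetical helper, not part of the cuML API.

```cpp
#include <cassert>

// probs1[i] is the (assumed) probability of class 1 for row i.
void predict_labels(const float* probs1, int n_rows, float* labels) {
  assert(n_rows >= 0);
  for (int i = 0; i < n_rows; ++i) labels[i] = probs1[i] > 0.5f ? 1.0f : 0.0f;
}
```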

◆ symClfPredictProbs()

void cuml::genetic::symClfPredictProbs(const raft::handle_t &handle,
    const float *input,
    const int n_rows,
    const param &params,
    const program_t &best_prog,
    float *output)

Probability prediction for a symbolic classifier. If a transformer (like sigmoid) is specified, it is applied to the output before returning it.

Parameters
handle: cuML handle
input: Device pointer to feature matrix
n_rows: Number of rows of the feature matrix
params: Host struct containing training hyperparameters
best_prog: The best program obtained during training; inferences are made using this program
output: Device pointer to output probabilities (in column-major format)
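The column-major note suggests an n_rows x 2 output for the binary case. Assuming column 0 holds P(class 0) and column 1 holds P(class 1) after the sigmoid transformer is applied to the raw program output, the layout can be sketched on the host as follows; probs_col_major is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Fills an n_rows x 2 column-major buffer from raw program outputs.
void probs_col_major(const float* raw, int n_rows, float* out) {
  assert(n_rows >= 0);
  for (int i = 0; i < n_rows; ++i) {
    const float p1 = 1.0f / (1.0f + std::exp(-raw[i]));  // sigmoid transformer
    out[i] = 1.0f - p1;    // column 0: P(class 0)
    out[n_rows + i] = p1;  // column 1: P(class 1)
  }
}
```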

◆ symFit()

void cuml::genetic::symFit(const raft::handle_t &handle,
    const float *input,
    const float *labels,
    const float *sample_weights,
    const int n_rows,
    const int n_cols,
    param &params,
    program_t &final_progs,
    std::vector< std::vector< program >> &history)

Fit either a regressor, classifier or a transformer to the given dataset.

Parameters
handle: cuML handle
input: Device pointer to the feature matrix
labels: Device pointer to the label vector of length n_rows
sample_weights: Device pointer to the sample weights of length n_rows
n_rows: Number of rows of the feature matrix
n_cols: Number of columns of the feature matrix
params: Host struct containing hyperparameters needed for training
final_progs: Device pointer to the final generation of programs (sorted by decreasing fitness)
history: Host vector containing the list of all programs in every generation (sorted by decreasing fitness)
Note
This function allocates extra device memory for the nodes of the last generation, pointed to by final_progs[i].nodes for each program i in final_progs. The amount is determined at runtime and equals final_progs[i].len * sizeof(node) for each program i. The memory is not freed inside the function because it is needed to run predictions with symRegPredict, symClfPredict, symClfPredictProbs and symTransform. The caller must explicitly free it AFTER calling the predict function.

◆ symRegPredict()

void cuml::genetic::symRegPredict(const raft::handle_t &handle,
    const float *input,
    const int n_rows,
    const program_t &best_prog,
    float *output)

Make predictions for a symbolic regressor.

Parameters
handle: cuML handle
input: Device pointer to feature matrix
n_rows: Number of rows of the feature matrix
best_prog: Device pointer to best AST fit during training
output: Device pointer to output values

◆ symTransform()

void cuml::genetic::symTransform(const raft::handle_t &handle,
    const float *input,
    const param &params,
    const program_t &final_progs,
    const int n_rows,
    const int n_cols,
    float *output)

Transform the values in the input feature matrix according to the supplied programs.

Parameters
handle: cuML handle
input: Device pointer to feature matrix
params: Hyperparameters used during training
final_progs: List of ASTs used for generating new features
n_rows: Number of rows of the feature matrix
n_cols: Number of columns of the feature matrix
output: Device pointer to transformed input

Variable Documentation

◆ GENE_TPB

const int cuml::genetic::GENE_TPB = 256
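GENE_TPB is listed without documentation. Judging by the name, it is presumably the number of CUDA threads per block used by the genetic kernels; that reading is an assumption. Under it, launching one thread per row requires a ceiling division to size the grid (blocks_for_rows is a hypothetical helper):

```cpp
#include <cassert>

constexpr int kGeneTpb = 256;  // value of GENE_TPB; its role as block size is assumed

// Number of thread blocks needed so that every row gets its own thread.
int blocks_for_rows(int n_rows, int threads_per_block = kGeneTpb) {
  assert(n_rows >= 0 && threads_per_block > 0);
  return (n_rows + threads_per_block - 1) / threads_per_block;  // ceiling division
}
```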

◆ MAX_STACK_SIZE

const int cuml::genetic::MAX_STACK_SIZE = 20
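MAX_STACK_SIZE is likewise undocumented. If, as assumed here, it bounds the per-thread operand stack used during evaluation, a program fits only if its peak stack usage, when its prefix-order nodes are read back to front, stays within 20 slots. The helpers below are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

constexpr int kMaxStackSize = 20;  // value of MAX_STACK_SIZE

// Peak operand-stack size when evaluating a prefix program back to front:
// a terminal pushes one value, an n-ary function pops n values and pushes 1.
int peak_stack_usage(const std::vector<int>& arity) {
  int size = 0, peak = 0;
  for (auto it = arity.rbegin(); it != arity.rend(); ++it) {
    assert(size >= *it && "malformed prefix program");
    size += 1 - *it;
    peak = std::max(peak, size);
  }
  return peak;
}

bool fits_eval_stack(const std::vector<int>& arity) {
  return peak_stack_usage(arity) <= kMaxStackSize;
}
```

Under back-to-front evaluation, a right-nested expression such as add(X0, add(X1, add(X2, X3))) keeps the peak at 2, while the left-nested add(add(add(X0, X1), X2), X3) needs 4.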