make_regression

cuml.datasets.make_regression(n_samples=100, n_features=2, n_informative=2, n_targets=1, bias=0.0, effective_rank=None, tail_strength=0.5, noise=0.0, shuffle=True, coef=False, random_state=None, dtype='single') → Union[Tuple[CumlArray, CumlArray], Tuple[CumlArray, CumlArray, CumlArray]]

Generate a random regression problem.

See https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html

Parameters:
n_samples : int, optional (default=100)

The number of samples.

n_features : int, optional (default=2)

The number of features.

n_informative : int, optional (default=2)

The number of informative features, i.e., the number of features used to build the linear model used to generate the output.

n_targets : int, optional (default=1)

The number of regression targets, i.e., the dimension of the y output vector associated with a sample. By default, the output is a scalar.

bias : float, optional (default=0.0)

The bias term in the underlying linear model.

effective_rank : int or None, optional (default=None)

If not None: the approximate number of singular vectors required to explain most of the input data by linear combinations. Using this kind of singular spectrum in the input allows the generator to reproduce the correlations often observed in practice.

If None: the input set is well conditioned, centered, and Gaussian with unit variance.

tail_strength : float between 0.0 and 1.0, optional (default=0.5)

The relative importance of the fat noisy tail of the singular values profile if effective_rank is not None.

noise : float, optional (default=0.0)

The standard deviation of the Gaussian noise applied to the output.

shuffle : bool, optional (default=True)

Shuffle the samples and the features.

coef : bool, optional (default=False)

If True, the coefficients of the underlying linear model are returned.

random_state : int, RandomState instance or None, optional (default=None)

Seed for the random number generator for dataset creation.

dtype : str or numpy dtype, optional (default='single')

Type of the data. Possible values: float32, float64, 'single', 'float' or 'double'.
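The interaction between effective_rank and tail_strength is easier to see numerically. The following NumPy sketch builds a low-rank input matrix using the singular value profile described for scikit-learn's make_low_rank_matrix; it is an assumption that cuML follows the same profile on the GPU. The helper names singular_profile and low_rank_inputs are hypothetical and not part of cuML.

```python
import numpy as np


def singular_profile(n, effective_rank, tail_strength=0.5):
    """Hypothetical helper: bell-shaped low-rank core plus a slowly decaying tail."""
    idx = np.arange(n, dtype=np.float64)
    # "Signal" part: decays quickly after roughly effective_rank singular values.
    low_rank = (1.0 - tail_strength) * np.exp(-((idx / effective_rank) ** 2))
    # "Noise" part: the fat tail, weighted by tail_strength.
    tail = tail_strength * np.exp(-0.1 * idx / effective_rank)
    return low_rank + tail


def low_rank_inputs(n_samples, n_features, effective_rank,
                    tail_strength=0.5, seed=0):
    """Hypothetical helper: inputs with the singular spectrum defined above."""
    rng = np.random.default_rng(seed)
    n = min(n_samples, n_features)
    # Orthonormal bases from QR decompositions of Gaussian matrices.
    u, _ = np.linalg.qr(rng.standard_normal((n_samples, n)))
    v, _ = np.linalg.qr(rng.standard_normal((n_features, n)))
    s = singular_profile(n, effective_rank, tail_strength)
    # Equivalent to u @ diag(s) @ v.T
    return (u * s) @ v.T


X = low_rank_inputs(200, 12, effective_rank=3, tail_strength=0.5)
sv = np.linalg.svd(X, compute_uv=False)
print(sv)
```

With tail_strength close to 0.0 the spectrum collapses to the first few singular values; with tail_strength close to 1.0 it decays slowly, so the trailing singular vectors carry more of the variance.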

Returns:
out : device array of shape [n_samples, n_features]

The input samples.

values : device array of shape [n_samples, n_targets]

The output values.

coef : device array of shape [n_features, n_targets], optional

The coefficients of the underlying linear model. Returned only if coef is True.
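The relationship between the three returned arrays can be illustrated with a CPU-only NumPy sketch of the generative model for the effective_rank=None case. It is not cuML's implementation: the function name sketch_make_regression is hypothetical, the 100 * uniform scaling of the informative coefficients is borrowed from scikit-learn's generator as an assumption, and shuffling is omitted.

```python
import numpy as np


def sketch_make_regression(n_samples=100, n_features=2, n_informative=2,
                           n_targets=1, bias=0.0, noise=0.0, seed=0):
    """Hypothetical CPU sketch of the linear model behind the generated data."""
    rng = np.random.default_rng(seed)
    n_informative = min(n_informative, n_features)

    # Well-conditioned inputs: i.i.d. standard normal entries.
    X = rng.standard_normal((n_samples, n_features))

    # Only the first n_informative features get non-zero coefficients
    # (assumed scaling: 100 * uniform(0, 1), as in scikit-learn).
    coef = np.zeros((n_features, n_targets))
    coef[:n_informative] = 100.0 * rng.uniform(size=(n_informative, n_targets))

    # Linear model plus bias, then Gaussian noise with standard deviation `noise`.
    y = X @ coef + bias
    if noise > 0.0:
        y += rng.normal(scale=noise, size=y.shape)
    return X, y, coef


X, y, coef = sketch_make_regression(n_samples=200, n_features=12,
                                    n_informative=7, bias=-4.2, noise=0.0)
print(X.shape, y.shape, coef.shape)
```

In this sketch the uninformative features have exactly zero coefficients, which is why a fitted linear model on such data tends to recover near-zero weights for them.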

Examples

>>> from cuml.datasets.regression import make_regression
>>> from cuml.linear_model import LinearRegression

>>> # Create regression problem
>>> data, values = make_regression(n_samples=200, n_features=12,
...                                n_informative=7, bias=-4.2,
...                                noise=0.3, random_state=10)

>>> # Perform a linear regression on this problem
>>> lr = LinearRegression()
>>> reg = lr.fit(data, values)
>>> print(reg.coef_)
[-2.6980877e-02  7.7027252e+01  1.1498465e+01  8.5468025e+00
  5.8548538e+01  6.0772545e+01  3.6876743e+01  4.0023815e+01
  4.3908358e-03 -2.0275116e-02  3.5066366e-02 -3.4512520e-02]