DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, ignore_index=False)

Return a random sample of items from an axis of the object.

If reproducible results are required, a random number generator may be provided via the random_state parameter. This function will always produce the same sample given an identical random_state.
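The reproducibility guarantee can be illustrated with Python's standard `random` module standing in for a numpy or cupy `RandomState`. The helpers `resolve_random_state` and `sample_positions` below are hypothetical and not part of cuDF. They mirror the documented `random_state` rules (None selects a shared default generator, an int seeds a fresh generator, a generator object is used directly) and show that an identical seed yields an identical sample.

```python
import random

# Hypothetical stand-in for cuDF's default cupy random state (illustration only).
_DEFAULT_RNG = random.Random()


def resolve_random_state(random_state=None):
    """Map a random_state argument to a generator, following the documented rules."""
    if random_state is None:
        # None: use the shared default generator (unseeded, so not reproducible).
        return _DEFAULT_RNG
    if isinstance(random_state, int):
        # int: seed a fresh generator, so the same int always gives the same draws.
        return random.Random(random_state)
    if isinstance(random_state, random.Random):
        # Generator object: draw from it directly; its internal state advances.
        return random_state
    raise TypeError(f"unsupported random_state: {random_state!r}")


def sample_positions(length, n, random_state=None):
    """Pick n distinct row positions out of range(length)."""
    rng = resolve_random_state(random_state)
    return rng.sample(range(length), n)


first = sample_positions(5, 3, random_state=42)
second = sample_positions(5, 3, random_state=42)
print(first == second)  # True: identical seed, identical sample
```

Passing a generator object instead of an int means repeated calls continue from its current state, so two calls with the same object generally return different samples.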

n : int, optional

Number of items to return from the axis. Cannot be used with frac. Defaults to 1 if frac is None.

frac : float, optional

Fraction of axis items to return. Cannot be used with n.

replace : bool, default False

Allow or disallow sampling of the same row more than once. replace=True is not supported for axis=1/'columns'. replace=False is not supported for axis=0/'index' when weights is specified and random_state is None or a cupy random state.

weights : ndarray-like, optional

Default None for a uniform probability distribution over the rows to sample from. If an ndarray is passed, its length must equal the number of rows to sample from, and the weights are normalized to sum to 1. Unlike pandas, index alignment is not currently performed.

random_state : int, numpy/cupy RandomState, or None, default None

If None, the default cupy random state is used. If int, it is used as the seed for the default cupy random state. If a RandomState, the rows to sample are drawn from that RandomState.

axis : {0 or 'index', 1 or 'columns', None}, default None

Axis to sample. Accepts an axis number or name. Defaults to the stat axis for the given data type (0 for Series and DataFrames). Series does not support axis=1.

ignore_index : bool, default False

If True, the resulting index will be labeled 0, 1, …, n - 1.
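The interplay of n, frac, replace, weights and ignore_index described above can be modelled in plain Python. This is a hypothetical sketch, not cuDF's implementation: `resolve_size`, `normalize_weights` and `sample_rows` are illustrative names, it works on Python lists rather than GPU columns, it uses the standard `random` module (int or None seeds only) in place of a numpy/cupy `RandomState`, it assumes frac is rounded to the nearest integer, and it does not reproduce the documented restriction on combining weights with replace=False.

```python
import random


def resolve_size(n, frac, length):
    """Turn the n/frac arguments into a row count, following the documented rules."""
    if n is not None and frac is not None:
        raise ValueError("cannot pass both n and frac")
    if n is None and frac is None:
        return 1  # documented default when neither is given
    if frac is not None:
        # Assumption: round to the nearest integer (Python's round, ties to even).
        return int(round(frac * length))
    return n


def normalize_weights(weights, length):
    """Validate weights and rescale them to sum to 1 (positional, no index alignment)."""
    if len(weights) != length:
        raise ValueError("weights must have one entry per row")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    total = float(sum(weights))
    return [w / total for w in weights]


def _weighted_without_replacement(rng, probs, k):
    """Draw k distinct positions one at a time, in proportion to the remaining weights."""
    if k > len(probs):
        raise ValueError("cannot take more rows than exist without replacement")
    pool = list(enumerate(probs))  # (position, weight) pairs still available
    picked = []
    for _ in range(k):
        i = rng.choices(range(len(pool)), weights=[w for _, w in pool])[0]
        picked.append(pool.pop(i)[0])
    return picked


def sample_rows(index, values, n=None, frac=None, replace=False,
                weights=None, random_state=None, ignore_index=False):
    """Return (index, values) for a random sample of rows from two parallel lists."""
    length = len(values)
    size = resolve_size(n, frac, length)
    rng = random.Random(random_state)  # int seed or None in this sketch
    if weights is not None:
        probs = normalize_weights(weights, length)
        if replace:
            picked = rng.choices(range(length), weights=probs, k=size)
        else:
            picked = _weighted_without_replacement(rng, probs, size)
    elif replace:
        picked = rng.choices(range(length), k=size)
    else:
        # random.sample raises ValueError if size > length.
        picked = rng.sample(range(length), size)
    out_index = list(range(size)) if ignore_index else [index[p] for p in picked]
    return out_index, [values[p] for p in picked]


idx, vals = sample_rows([0, 1, 2, 3, 4], [1, 2, 3, 4, 5], n=3, random_state=0)
print(idx, vals)
```

Drawing one row at a time in proportion to the weights that remain is one common way to do weighted sampling without replacement; it is used here for readability and is not a claim about how cuDF or pandas implement it.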

Series or DataFrame

A new object of the same type as the caller, containing n items randomly sampled from the caller object.


When sampling from axis=0/'index', random_state can be either a numpy random state (numpy.random.RandomState) or a cupy random state (cupy.random.RandomState). When a numpy random state is used, the output is guaranteed to match the output of the corresponding pandas method call, but generating the sample may be slow. If exact pandas equivalence is not required, a cupy random state gives better performance, especially when sampling a large number of items. For the weights array, use the ndarray type that matches the random state (a numpy array with a numpy random state, a cupy array with a cupy random state).


>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> df.sample(3)
   a
1  2
3  4
0  1
>>> sr = cudf.Series([1, 2, 3, 4, 5])
>>> sr.sample(10, replace=True)
1    2
3    4
2    3
0    1
0    1
4    5
4    5
0    1
0    1
3    4
dtype: int64
>>> df = cudf.DataFrame(
...     {"a": [1, 2], "b": [2, 3], "c": [3, 4], "d": [4, 5]}
... )
>>> df.sample(2, axis=1)
   a  c
0  1  3
1  2  4