cudf.core.accessors.string.StringMethods.contains

StringMethods.contains(pat: str | Sequence, case: bool = True, flags: int = 0, na=<no_default>, regex: bool = True) → Series | Index

Test if pattern or regex is contained within a string of a Series or Index.

Return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.

Parameters:
pat : str or list-like

Character sequence or regular expression. If pat is list-like then regular expressions are not accepted.

flags : int, default 0 (no flags)

Flags to pass through to the regex engine (e.g. re.MULTILINE).

na : scalar, optional

Fill value for missing values. The default depends on dtype of the array. For the "str" dtype, False is used. For object dtype, numpy.nan is used. For the nullable StringDtype, pandas.NA is used.

regex : bool, default True

If True, assumes the pattern is a regular expression. If False, treats the pattern as a literal string.

Returns:
Series/Index of bool dtype

A Series/Index of boolean dtype indicating whether the given pattern is contained within the string of each element of the Series/Index.

Examples

>>> import cudf
>>> s1 = cudf.Series(['Mouse', 'dog', 'house and parrot', '23', None])
>>> s1
0               Mouse
1                 dog
2    house and parrot
3                  23
4                None
dtype: object
>>> s1.str.contains('og', regex=False)
0    False
1     True
2    False
3    False
4     <NA>
dtype: bool

Returning an Index of booleans using only a literal pattern.

>>> import numpy as np
>>> data = ['Mouse', 'dog', 'house and parrot', '23.0', np.nan]
>>> idx = cudf.Index(data)
>>> idx
Index(['Mouse', 'dog', 'house and parrot', '23.0', <NA>], dtype='object')
>>> idx.str.contains('23', regex=False)
Index([False, False, False, True, <NA>], dtype='bool')

Returning True when either ‘house’ or ‘dog’ occurs in a string.

>>> s1.str.contains('house|dog', regex=True)
0    False
1     True
2     True
3    False
4     <NA>
dtype: bool

Matching any digit using a regular expression.

>>> s1.str.contains('\\d', regex=True)
0    False
1    False
2    False
3     True
4     <NA>
dtype: bool

Ensure pat is not a literal pattern when regex is set to True. Note that in the following example one might expect only s2[1] and s2[3] to return True. However, ‘.0’ as a regex matches any character followed by a 0.

>>> s2 = cudf.Series(['40', '40.0', '41', '41.0', '35'])
>>> s2.str.contains('.0', regex=True)
0     True
1     True
2    False
3     True
4    False
dtype: bool
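
The pitfall above can be reproduced without a GPU using Python's standard re module. This is only a sketch of the regex semantics, not cudf code: it assumes cudf's regex engine treats an unescaped "." and an escaped "\." the same way Python does for this simple pattern. The variable names are illustrative.

```python
import re

values = ["40", "40.0", "41", "41.0", "35"]

# Unescaped: "." matches any character, so "40" matches ("4" followed by "0").
as_regex = [re.search(".0", v) is not None for v in values]

# Escaped: re.escape turns ".0" into r"\.0", which matches only a literal ".0".
escaped = re.escape(".0")
as_literal = [re.search(escaped, v) is not None for v in values]

print(as_regex)    # [True, True, False, True, False]
print(as_literal)  # [False, True, False, True, False]
```

With contains itself, passing regex=False is the documented way to request literal matching, and is simpler than escaping the pattern by hand.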

pat may also be a sequence of strings, in which case each string is searched for in the corresponding row.

>>> s2 = cudf.Series(['house', 'dog', 'and', '', ''])
>>> s1.str.contains(s2)
0    False
1     True
2     True
3     True
4     <NA>
dtype: bool
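
The behaviors shown in the examples above can be summarized in a small plain-Python reference model. The helper contains_reference is hypothetical and is not how cudf implements the method. It uses Python's re module in place of the GPU regex engine, represents missing values as None, and ignores the case and na parameters.

```python
import re
from typing import List, Optional, Sequence, Union


def contains_reference(
    values: Sequence[Optional[str]],
    pat: Union[str, Sequence[str]],
    regex: bool = True,
    flags: int = 0,
) -> List[Optional[bool]]:
    """Plain-Python model of the documented semantics (hypothetical helper)."""
    if isinstance(pat, str):
        # Scalar pattern: the same pattern is applied to every row.
        pats = [pat] * len(values)
        use_regex = regex
    else:
        # List-like pat: one literal pattern per row; regex is not accepted.
        pats = list(pat)
        use_regex = False
        if len(pats) != len(values):
            raise ValueError("pat must have one entry per row")

    out: List[Optional[bool]] = []
    for value, p in zip(values, pats):
        if value is None:
            out.append(None)  # missing values stay missing (<NA>)
        elif use_regex:
            out.append(re.search(p, value, flags) is not None)
        else:
            out.append(p in value)  # literal substring test
    return out


s1 = ["Mouse", "dog", "house and parrot", "23", None]
print(contains_reference(s1, "og", regex=False))
print(contains_reference(s1, "house|dog"))
print(contains_reference(s1, r"\d"))
print(contains_reference(s1, ["house", "dog", "and", "", ""]))
```

Each call mirrors one of the cudf examples above, with None standing in for <NA>.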

Pandas Compatibility Note

pandas.Series.str.contains()

The parameter case is not yet supported and will raise a NotImplementedError if anything other than the default value is set. The flags parameter currently only supports re.DOTALL and re.MULTILINE.
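
To show what the two supported flags change, here is a short sketch using Python's standard re module. It assumes cudf's regex engine follows the same semantics as Python for re.MULTILINE and re.DOTALL. The sample text and variable names are illustrative only.

```python
import re

text = "first line\nsecond line"

# Without flags, "^" anchors only at the start of the whole string.
anchor_default = re.search(r"^second", text) is not None
# re.MULTILINE lets "^" and "$" also match at each line boundary.
anchor_multiline = re.search(r"^second", text, re.MULTILINE) is not None

# Without flags, "." does not match a newline character.
dot_default = re.search(r"line.second", text) is not None
# re.DOTALL lets "." match newline characters as well.
dot_dotall = re.search(r"line.second", text, re.DOTALL) is not None

print(anchor_default, anchor_multiline)  # False True
print(dot_default, dot_dotall)           # False True
```

With contains, the same constants are passed through the flags parameter, for example flags=re.MULTILINE.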