cudf.core.column.string.StringMethods.contains
- StringMethods.contains(pat: str | Sequence, case: bool = True, flags: int = 0, na=nan, regex: bool = True) -> SeriesOrIndex
Test if pattern or regex is contained within a string of a Series or Index.
Return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.
- Parameters:
- pat : str or list-like
Character sequence or regular expression. If pat is list-like, regular expressions are not accepted.
- flags : int, default 0 (no flags)
Flags to pass through to the regex engine (e.g. re.MULTILINE).
- regex : bool, default True
If True, assumes the pattern is a regular expression. If False, treats the pattern as a literal string.
- Returns:
- Series/Index of bool dtype
A Series/Index of boolean dtype indicating whether the given pattern is contained within the string of each element of the Series/Index.
Examples
>>> import cudf
>>> s1 = cudf.Series(['Mouse', 'dog', 'house and parrot', '23', None])
>>> s1
0               Mouse
1                 dog
2    house and parrot
3                  23
4                <NA>
dtype: object
>>> s1.str.contains('og', regex=False)
0    False
1     True
2    False
3    False
4     <NA>
dtype: bool
Returning an Index of booleans using only a literal pattern.
>>> import numpy as np
>>> data = ['Mouse', 'dog', 'house and parrot', '23.0', np.nan]
>>> idx = cudf.Index(data)
>>> idx
Index(['Mouse', 'dog', 'house and parrot', '23.0', None], dtype='object')
>>> idx.str.contains('23', regex=False)
Index([False, False, False, True, <NA>], dtype='bool')
Returning ‘house’ or ‘dog’ when either expression occurs in a string.
>>> s1.str.contains('house|dog', regex=True)
0    False
1     True
2     True
3    False
4     <NA>
dtype: bool
Matching any digit using a regular expression.
>>> s1.str.contains(r'\d', regex=True)
0    False
1    False
2    False
3     True
4     <NA>
dtype: bool
Ensure pat is not a literal pattern when regex is set to True. Note that in the following example one might expect only s2[1] and s2[3] to return True. However, ‘.0’ as a regex matches any character followed by a 0, so s2[0] ('40') matches as well.
>>> s2 = cudf.Series(['40', '40.0', '41', '41.0', '35'])
>>> s2.str.contains('.0', regex=True)
0     True
1     True
2    False
3     True
4    False
dtype: bool
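The wildcard pitfall above can be reproduced without a GPU. The following is a minimal sketch that uses Python's standard re module as a stand-in for the regex engine; the helper name contains is hypothetical and is not part of cudf, and the assumption is that both engines treat '.' the same way for a pattern this simple. It shows that escaping the pattern with re.escape restores the literal meaning.

```python
import re

values = ['40', '40.0', '41', '41.0', '35']


def contains(pattern, strings):
    """Hypothetical helper: True where the regex matches anywhere in the string."""
    compiled = re.compile(pattern)
    return [compiled.search(s) is not None for s in strings]


# Unescaped: '.' is a wildcard, so '40' matches ('4' followed by '0').
as_regex = contains('.0', values)

# Escaped: re.escape('.0') yields a pattern that only matches a literal dot.
as_literal = contains(re.escape('.0'), values)

print(as_regex)    # [True, True, False, True, False]
print(as_literal)  # [False, True, False, True, False]
```

When the pattern is known to be literal, passing regex=False, as in the earlier examples, avoids the escaping step altogether.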
pat may also be a sequence of strings, in which case each string is searched for in the corresponding row.
>>> s2 = cudf.Series(['house', 'dog', 'and', '', ''])
>>> s1.str.contains(s2)
0    False
1     True
2     True
3     True
4     <NA>
dtype: bool
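The row-wise behaviour can be pictured as a literal substring test applied pair by pair. The following pure-Python sketch is an illustration of those semantics, not cudf's implementation; the function name contains_rowwise is hypothetical, and treating a missing pattern as a null result is an assumption that the example above does not cover.

```python
def contains_rowwise(strings, patterns):
    """Hypothetical helper: literal containment test, one pattern per row."""
    if len(strings) != len(patterns):
        raise ValueError("strings and patterns must have the same length")
    result = []
    for text, pat in zip(strings, patterns):
        if text is None or pat is None:
            # Assumption: a null on either side yields a null result.
            result.append(None)
        else:
            # The empty pattern '' is contained in every string.
            result.append(pat in text)
    return result


s1 = ['Mouse', 'dog', 'house and parrot', '23', None]
s2 = ['house', 'dog', 'and', '', '']
rowwise = contains_rowwise(s1, s2)
print(rowwise)  # [False, True, True, True, None]
```

Row 0 is False because the comparison is case-sensitive ('Mouse' does not contain 'house'), and row 3 is True because the empty string is contained in any string.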
Pandas Compatibility Note
The parameters case and na are not yet supported and will raise a NotImplementedError if anything other than the default value is set. The flags parameter currently only supports re.DOTALL and re.MULTILINE.
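To make the two supported flags concrete, the sketch below shows what re.MULTILINE and re.DOTALL change, again using Python's standard re module as a stand-in. That cudf's regex engine follows the same semantics for these two flags is an assumption; the sample string and variable names are illustrative only.

```python
import re

text = 'cat\ndog'

# Without MULTILINE, '^' anchors only at the very start of the string.
plain = re.search('^dog', text) is not None
# With MULTILINE, '^' also anchors right after each newline.
multiline = re.search('^dog', text, flags=re.MULTILINE) is not None

# Without DOTALL, '.' does not match a newline character.
no_dotall = re.search('cat.dog', text) is not None
# With DOTALL, '.' matches any character, including a newline.
dotall = re.search('cat.dog', text, flags=re.DOTALL) is not None

# Flags are integers and can be combined with bitwise OR.
combined = re.MULTILINE | re.DOTALL

print(plain, multiline)   # False True
print(no_dotall, dotall)  # False True
```

The combined value is what would be passed as the flags argument when both behaviours are wanted.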