Strings Regex

group strings_regex

Enums

enum regex_flags

Regex flags.

These flags can be OR’d together to combine them. The values are chosen to leave room for future flags and to match the corresponding flag values in Python’s re module.

Values:

enumerator DEFAULT

default matching behavior (no flags set)

enumerator MULTILINE

the ‘^’ and ‘$’ anchors also match immediately after and before new-line characters, not only at the start and end of the string

enumerator DOTALL

the ‘.’ also matches new-line characters

enumerator ASCII

use only ASCII when matching built-in character classes such as \d, \w, and \s

enumerator EXT_NEWLINE

new-line matching also recognizes extended line-terminator characters, not only ‘\n’
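Because the flags are bit values, combining them is a bitwise OR. The sketch below uses a local stand-in for regex_flags rather than the libcudf header. The MULTILINE, DOTALL, and ASCII values are assumed from the statement that they match Python’s re flags (8, 16, and 256); the EXT_NEWLINE value is an assumption. Or-ing two unscoped enumerators yields an integer, so the result is cast back to the enum type.

```cpp
#include <cstdint>

// Hypothetical stand-in for the regex_flags enum (not the libcudf header).
// MULTILINE/DOTALL/ASCII mirror Python's re.MULTILINE/re.DOTALL/re.ASCII;
// the EXT_NEWLINE value is assumed for illustration only.
enum regex_flags : std::uint32_t {
  DEFAULT     = 0,
  MULTILINE   = 8,
  DOTALL      = 16,
  ASCII       = 256,
  EXT_NEWLINE = 512
};

// OR the underlying bits, then cast back so the result is a regex_flags value.
constexpr regex_flags combine(regex_flags a, regex_flags b)
{
  return static_cast<regex_flags>(static_cast<std::uint32_t>(a) |
                                  static_cast<std::uint32_t>(b));
}

// Example: line-aware anchors plus a '.' that also matches new-lines.
constexpr regex_flags ml_dotall = combine(MULTILINE, DOTALL);
```

Combining is idempotent: OR-ing a flag that is already present leaves the value unchanged.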

enum class capture_groups : uint32_t

Capture groups setting.

Controls how capture groups in a regex pattern are processed. When the groups do not need to be extracted, this setting can be used to optimize the generated regex instructions.

Values:

enumerator EXTRACT

Capture groups are processed normally so that they can be extracted.

enumerator NON_CAPTURE

Convert all capture groups to non-capture groups.
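To make NON_CAPTURE concrete: it has the same matching effect as writing each capturing group (...) as a non-capturing group (?:...). The hypothetical helper to_non_capture below performs that rewrite on the pattern text purely for illustration; libcudf is described as applying the setting when generating instructions, not by editing the pattern string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical illustration of NON_CAPTURE: rewrite each capturing "(" as "(?:".
// Escaped characters, bracket expressions, and groups that already start
// with "(?" are copied unchanged. Not how libcudf implements the setting.
std::string to_non_capture(std::string_view pattern)
{
  std::string out;
  bool in_class = false;  // true while inside a [...] bracket expression
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    char const c = pattern[i];
    if (c == '\\' && i + 1 < pattern.size()) {
      out += c;
      out += pattern[++i];  // copy the escaped character verbatim
      continue;
    }
    if (in_class) {
      if (c == ']') { in_class = false; }
      out += c;
      continue;
    }
    if (c == '[') {
      in_class = true;
      out += c;
      // A ']' directly after "[" or "[^" is a literal, not the closing bracket.
      if (i + 1 < pattern.size() && pattern[i + 1] == '^') { out += pattern[++i]; }
      if (i + 1 < pattern.size() && pattern[i + 1] == ']') { out += pattern[++i]; }
      continue;
    }
    if (c == '(' && !(i + 1 < pattern.size() && pattern[i + 1] == '?')) {
      out += "(?:";  // capturing group becomes non-capturing
    } else {
      out += c;
    }
  }
  return out;
}
```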

Functions

constexpr bool is_multiline(regex_flags const f)

Returns true if the given flags contain MULTILINE.

Parameters:

f – Regex flags to check

Returns:

true if f includes MULTILINE

constexpr bool is_dotall(regex_flags const f)

Returns true if the given flags contain DOTALL.

Parameters:

f – Regex flags to check

Returns:

true if f includes DOTALL

constexpr bool is_ascii(regex_flags const f)

Returns true if the given flags contain ASCII.

Parameters:

f – Regex flags to check

Returns:

true if f includes ASCII

constexpr bool is_ext_newline(regex_flags const f)

Returns true if the given flags contain EXT_NEWLINE.

Parameters:

f – Regex flags to check

Returns:

true if f includes EXT_NEWLINE
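Each of these checks amounts to testing a single bit. A minimal sketch of that behavior, assuming a local stand-in enum whose MULTILINE, DOTALL, and ASCII values mirror Python’s re flags and whose EXT_NEWLINE value is assumed, might look like this; it is not the libcudf source.

```cpp
#include <cstdint>

// Hypothetical stand-in enum; EXT_NEWLINE's value is an assumption.
enum regex_flags : std::uint32_t {
  DEFAULT = 0, MULTILINE = 8, DOTALL = 16, ASCII = 256, EXT_NEWLINE = 512
};

// True when every bit of `bit` is set in `f` (each flag here is one bit).
constexpr bool has_flag(regex_flags f, regex_flags bit)
{
  return (static_cast<std::uint32_t>(f) & static_cast<std::uint32_t>(bit)) != 0;
}

constexpr bool is_multiline(regex_flags const f) { return has_flag(f, MULTILINE); }
constexpr bool is_dotall(regex_flags const f) { return has_flag(f, DOTALL); }
constexpr bool is_ascii(regex_flags const f) { return has_flag(f, ASCII); }
constexpr bool is_ext_newline(regex_flags const f) { return has_flag(f, EXT_NEWLINE); }
```

Because the checks are constexpr, they can be evaluated at compile time when the flags are known constants.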

struct regex_program
#include <regex_program.hpp>

Regex program class.

Create an instance from a regex pattern and pass it to the strings APIs that accept a regex. The same instance can be reused across multiple calls.

See the Regex Features page for details on patterns and APIs that support regex.
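The compile-once, reuse-many pattern can be illustrated on the host with std::regex standing in for regex_program: the pattern is compiled a single time and the same object is applied to every input. The helper count_matching is hypothetical and does not correspond to a libcudf API.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Host-side stand-in: std::regex plays the role of regex_program here.
// count_matching is a hypothetical helper, not a libcudf function.
std::size_t count_matching(std::vector<std::string> const& rows, std::regex const& prog)
{
  std::size_t n = 0;
  for (auto const& row : rows) {
    // The same compiled program is reused for every row; nothing is recompiled.
    if (std::regex_search(row, prog)) { ++n; }
  }
  return n;
}
```

A single compiled object can then be passed to any number of calls, which is the usage the paragraph above describes for regex_program.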

Public Functions

regex_program(regex_program &&other)

Move constructor.

Parameters:

other – Object to move from

regex_program &operator=(regex_program &&other)

Move assignment operator.

Parameters:

other – Object to move from

Returns:

Reference to this object
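Since only move operations are listed, one plausible reading is that the compiled program is owned through a pointer and a move simply transfers that ownership. The following sketch assumes such a pointer-to-implementation layout; the class name program, the nested impl, and valid() are illustrative, and copying is disabled only in this sketch.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical move-only program type; names are illustrative, not libcudf's.
class program {
  struct impl {
    std::string pattern;  // compiled state would also live here
  };

 public:
  explicit program(std::string pattern)
    : impl_{std::make_unique<impl>(impl{std::move(pattern)})}
  {
  }
  program(program const&)            = delete;  // this sketch disallows copies
  program& operator=(program const&) = delete;
  program(program&&) noexcept            = default;  // transfers the pointer
  program& operator=(program&&) noexcept = default;
  ~program()                             = default;

  bool valid() const { return impl_ != nullptr; }  // false after being moved from
  std::string const& pattern() const { return impl_->pattern; }

 private:
  std::unique_ptr<impl> impl_;
};
```

Moving never recompiles anything, and a moved-from object is left empty (its std::unique_ptr is null).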

std::string pattern() const

Return the pattern used to create this instance.

Returns:

regex pattern as a string

regex_flags flags() const

Return the regex_flags used to create this instance.

Returns:

regex flags setting

capture_groups capture() const

Return the capture_groups used to create this instance.

Returns:

capture groups setting

int32_t instructions_count() const

Return the number of instructions in this instance.

Returns:

Number of instructions

int32_t groups_count() const

Return the number of capture groups in this instance.

Returns:

Number of groups
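For intuition about what groups_count() reports, the hypothetical helper below counts capturing groups in a pattern string: every unescaped ( outside a bracket expression that is not followed by ?. libcudf derives the count from its own parser, so patterns using syntax this sketch does not model may be counted differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper; not the libcudf implementation of groups_count().
std::int32_t count_capture_groups(std::string_view pattern)
{
  std::int32_t groups = 0;
  bool in_class       = false;  // true while inside a [...] bracket expression
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    char const c = pattern[i];
    if (c == '\\') {
      ++i;  // skip the escaped character
      continue;
    }
    if (in_class) {
      if (c == ']') { in_class = false; }
      continue;
    }
    if (c == '[') {
      in_class = true;
      // A ']' directly after "[" or "[^" is a literal, not the closing bracket.
      if (i + 1 < pattern.size() && pattern[i + 1] == '^') { ++i; }
      if (i + 1 < pattern.size() && pattern[i + 1] == ']') { ++i; }
      continue;
    }
    // "(" not followed by "?" opens a capturing group in this sketch.
    if (c == '(' && !(i + 1 < pattern.size() && pattern[i + 1] == '?')) { ++groups; }
  }
  return groups;
}
```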

std::size_t compute_working_memory_size(int32_t num_strings) const

Return the size of the working memory needed to execute this regex program on the given number of strings.

Parameters:

num_strings – Number of strings the regex will be executed on

Returns:

Size of the working memory in bytes
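One possible use of compute_working_memory_size() is sizing batches against a memory budget. The sketch below binary-searches for the largest string count that fits, with working_memory standing in for a call to the member function. It assumes the reported size never decreases as num_strings grows, which the documentation above does not state, and max_strings_within is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// working_memory stands in for prog.compute_working_memory_size(n).
// ASSUMPTION: the reported size is non-decreasing in n.
// Returns the largest n in [1, num_strings] with working_memory(n) <= budget,
// or 0 if no positive count fits.
std::int32_t max_strings_within(std::function<std::size_t(std::int32_t)> const& working_memory,
                                std::size_t budget,
                                std::int32_t num_strings)
{
  std::int32_t lo = 0;            // largest count known to fit (0 by convention)
  std::int32_t hi = num_strings;  // largest count that might still fit
  while (lo < hi) {
    // Upper midpoint, written to avoid int32_t overflow; guarantees progress.
    std::int32_t const mid = lo + (hi - lo - 1) / 2 + 1;
    if (working_memory(mid) <= budget) {
      lo = mid;
    } else {
      hi = mid - 1;
    }
  }
  return lo;
}
```

The input could then be processed in slices of at most that many strings; whether such batching is worthwhile depends on how the real size scales, which this sketch does not model.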

Public Static Functions

static std::unique_ptr<regex_program> create(std::string_view pattern, regex_flags flags = regex_flags::DEFAULT, capture_groups capture = capture_groups::EXTRACT)

Create a program from a pattern.

Throws:

cudf::logic_error – If pattern is invalid or contains unsupported features

Parameters:
  • pattern – Regex pattern

  • flags – Regex flags for interpreting special characters in the pattern

  • capture – Controls how capture groups in the pattern are used

Returns:

New regex_program instance
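The shape of the create() contract (defaulted flags and capture arguments, a std::unique_ptr result, and an exception for a bad pattern) can be mocked on the host. In this sketch std::regex stands in for libcudf’s parser and an invalid pattern is reported as std::logic_error; the real function throws cudf::logic_error and supports a different set of regex features. The type mock_program and its members are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <regex>
#include <stdexcept>
#include <string>
#include <string_view>

// Trimmed, hypothetical stand-ins for the enums documented above.
enum regex_flags : std::uint32_t { DEFAULT = 0 };
enum class capture_groups : std::uint32_t { EXTRACT, NON_CAPTURE };

// Mock of the create() contract only; not the libcudf implementation.
struct mock_program {
  std::string stored_pattern;
  regex_flags stored_flags;
  capture_groups stored_capture;

  static std::unique_ptr<mock_program> create(std::string_view pattern,
                                              regex_flags flags      = regex_flags::DEFAULT,
                                              capture_groups capture = capture_groups::EXTRACT)
  {
    std::string const text(pattern);
    try {
      // std::regex (ECMAScript grammar) is only a stand-in validator here.
      std::regex const compiled(text);
      (void)compiled;
    } catch (std::regex_error const& e) {
      throw std::logic_error(std::string("invalid regex pattern: ") + e.what());
    }
    return std::make_unique<mock_program>(mock_program{text, flags, capture});
  }
};
```

Callers that omit the optional arguments get DEFAULT flags and EXTRACT capture handling, matching the defaults in the signature above.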