normalize

pylibcudf.nvtext.normalize.normalize_characters(Column input, bool do_lower_case) → Column

Normalizes the characters of each string for tokenizing.

For details, see normalize_characters() in the libcudf C++ API.

Parameters:
input : Column

Input strings

do_lower_case : bool

If true, upper-case characters are converted to lower-case and accents are stripped from those characters. If false, accented and upper-case characters are not transformed.

Returns:
Column

Normalized strings column
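The effect of do_lower_case described above can be illustrated on the CPU with the standard library alone. This is a minimal sketch, not the pylibcudf implementation: the helper name lower_and_strip_accents is hypothetical, it works on plain Python strings rather than a Column, and it covers only the lower-casing and accent stripping described here. The underlying libcudf routine may apply further tokenizer-oriented normalization that this sketch deliberately omits.

```python
import unicodedata


def lower_and_strip_accents(text: str, do_lower_case: bool) -> str:
    """CPU-only approximation of the do_lower_case switch (hypothetical helper)."""
    if not do_lower_case:
        # Accented and upper-case characters pass through untouched.
        return text
    # Lower-case first, then decompose so that each accent becomes a separate
    # combining mark (Unicode category "Mn") that can be dropped.
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


rows = ["ĂĆĖÑÜ", "ACENU"]
lowered = [lower_and_strip_accents(r, True) for r in rows]
unchanged = [lower_and_strip_accents(r, False) for r in rows]
```

With do_lower_case set to True, both rows reduce to the same unaccented lower-case text; with False, they are returned as given.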

pylibcudf.nvtext.normalize.normalize_spaces(Column input) → Column

Returns a new strings column by normalizing the whitespace in each string in the input column.

For details, see normalize_spaces() in the libcudf C++ API.

Parameters:
input : Column

Input strings

Returns:
Column

New strings column of normalized strings.
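As a rough CPU-side reference for what whitespace normalization does to a single string, the sketch below uses only the standard library. It rests on assumptions that should be checked against the libcudf documentation: whitespace is taken to be any character with a code point at or below U+0020, each run of whitespace collapses to one space, leading and trailing whitespace is removed, and null rows (modeled as None) stay null. The helper name normalize_spaces_reference is hypothetical and is not part of pylibcudf.

```python
import re
from typing import Optional

# Assumption: "whitespace" means any character with code point <= U+0020.
_WHITESPACE_RUN = re.compile(r"[\x00-\x20]+")


def normalize_spaces_reference(text: Optional[str]) -> Optional[str]:
    """Collapse whitespace runs to one space and trim both ends (hypothetical helper)."""
    if text is None:
        # Assumption: a null row in the input stays null in the output.
        return None
    # Replace every run of whitespace with a single space, then drop the
    # single space that may remain at either end.
    return _WHITESPACE_RUN.sub(" ", text).strip(" ")


rows = ["  a\t\tb \n c  ", "no-change", "   ", None]
normalized = [normalize_spaces_reference(r) for r in rows]
```

A reference like this can be handy for spot-checking GPU results on small inputs, since it needs neither a GPU nor a Column to run.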