normalize

pylibcudf.nvtext.normalize.normalize_characters(Column input, bool do_lower_case) → Column

Normalizes the characters in each string of the input column for tokenizing.

Parameters:

input (Column): Input strings
do_lower_case (bool): If true, upper-case characters are converted to lower-case and accents are stripped from those characters. If false, accented and upper-case characters are not transformed.

Returns:

Column: New strings column with the normalized characters.
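To illustrate what do_lower_case=True means for case and accents, the sketch below is a pure-Python model of that one behavior for a single string. It is not the libcudf implementation and does not run on the GPU; the helper name lower_and_strip_accents is hypothetical, and the sketch assumes "stripping accents" means dropping Unicode combining marks after canonical decomposition. Any other normalization that normalize_characters performs is not modeled here.

```python
import unicodedata


def lower_and_strip_accents(text: str) -> str:
    """Hypothetical helper: lower-case `text` and remove accents.

    Models only the case/accent handling described for do_lower_case=True;
    it is an approximation, not the libcudf algorithm.
    """
    # Lower-case first, then decompose e.g. "é" into "e" + U+0301 (combining acute).
    decomposed = unicodedata.normalize("NFD", text.lower())
    # Drop combining marks (category "Mn"), keeping the base characters.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(lower_and_strip_accents("Éclair Café"))  # eclair cafe
```

With do_lower_case=False, the documentation says accented and upper-case characters are left as they are, so this part of the sketch would simply not be applied.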
pylibcudf.nvtext.normalize.normalize_spaces(Column input) → Column

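Returns a new strings column by normalizing the whitespace in each string in the input column. Each run of whitespace characters (code points less than or equal to the space character) is replaced with a single space, and leading and trailing whitespace is trimmed.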

Parameters:

input (Column): Input strings

Returns:

Column: New strings column with normalized whitespace.
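The whitespace rule can be modeled per row in plain Python. This is a minimal sketch, not the library's code: the function name normalize_spaces_reference is hypothetical, it assumes (as the libcudf documentation describes) that any character with a code point at or below U+0020 counts as whitespace, and it assumes null rows stay null, represented here as None.

```python
import re
from typing import Optional

# Assumption: any run of characters with code point <= U+0020 is whitespace.
_WHITESPACE_RUN = re.compile(r"[\x00-\x20]+")


def normalize_spaces_reference(value: Optional[str]) -> Optional[str]:
    """Hypothetical pure-Python model of normalize_spaces for one row."""
    if value is None:  # assumed: null rows pass through as null
        return None
    # Collapse each whitespace run to one space, then trim both ends.
    return _WHITESPACE_RUN.sub(" ", value).strip(" ")


rows = ["  a  b ", "one\t\ttwo\nthree", None]
print([normalize_spaces_reference(s) for s in rows])
# ['a b', 'one two three', None]
```

Using an explicit code-point range rather than str.split() keeps the model closer to the stated rule, since Python's str.split() uses Unicode's broader definition of whitespace.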