String handling

Series.str exposes the values of a series as strings and provides vectorized methods that operate on them. Each method or property is accessed as Series.str.<function/property>.
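As a minimal sketch of the accessor pattern described above: this page does not name its library, so the example assumes pandas as a stand-in and uses only `upper`, `len`, and `contains`, which appear in the listing below and also exist on the pandas `Series.str` accessor. With another library that offers the same accessor, swapping the import would be the expected change, though that is an assumption and not something this page confirms.

```python
import pandas as pd  # assumption: stand-in for the library this page documents

# A small series with one missing value
s = pd.Series(["apple pie", "Banana", None, "cherry"])

# Each call goes through the .str accessor and is applied element-wise
upper = s.str.upper()          # uppercase every string; the missing value stays missing
lengths = s.str.len()          # number of characters in each string
has_an = s.str.contains("an")  # True where the substring "an" occurs

print(upper.tolist())
print(lengths.tolist())
print(has_an.tolist())
```

The calls return new objects of the same length as `s`, so results can be assigned to new columns or used as boolean masks.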

Series.str

Vectorized string functions for Series and Index.

`byte_count`()	Computes the number of bytes of each string in the Series/Index.
`capitalize`()	Convert strings in the Series/Index to be capitalized.
`cat`()	Concatenate strings in the Series/Index with given separator.
`center`(width[, fillchar])	Filling left and right side of strings in the Series/Index with an additional character.
`character_ngrams`([n, as_list])	Generate the n-grams from characters in a column of strings.
`character_tokenize`()	Each string is split into individual characters.
`code_points`()	Returns an array by filling it with the UTF-8 code point values for each character of each string.
`contains`(pat[, case, flags, na, regex])	Test if pattern or regex is contained within a string of a Series or Index.
`count`(pat[, flags])	Count occurrences of pattern in each string of the Series/Index.
`detokenize`(indices[, separator])	Combines tokens into strings by concatenating them in the order in which they appear in the `indices` column.
`edit_distance`(targets)	The `targets` strings are measured against the strings in this instance using the Levenshtein edit distance algorithm.
`edit_distance_matrix`()	Computes the edit distance between strings in the series.
`endswith`(pat)	Test if the end of each string element matches a pattern.
`extract`(pat[, flags, expand])	Extract capture groups in the regex pat as columns in a DataFrame.
`filter_alphanum`([repl, keep])	Remove non-alphanumeric characters from strings in this column.
`filter_characters`(table[, keep, repl])	Remove characters from each string using the character ranges in the given mapping table.
`filter_tokens`(min_token_length[, ...])	Remove tokens from within each string in the series that are smaller than min_token_length and optionally replace them with the replacement string.
`find`(sub[, start, end])	Return the lowest index in each string in the Series/Index where the substring is fully contained between `[start:end]`.
`findall`(pat[, flags])	Find all occurrences of pattern or regular expression in the Series/Index.
`find_multiple`(patterns)	Find all first occurrences of patterns in the Series/Index.
`get`([i])	Extract element from each component at specified position.
`get_json_object`(json_path, *[, ...])	Applies a JSONPath string to an input strings column where each row in the column is a valid JSON string.
`hex_to_int`()	Returns integer value represented by each hex string.
`htoi`()	Returns integer value represented by each hex string.
`index`(sub[, start, end])	Return the lowest index in each string where the substring is fully contained between `[start:end]`.
`insert`([start, repl])	Insert the specified string into each string in the specified position.
`ip2int`()	Convert IP address strings to integers.
`ip_to_int`()	Convert IP address strings to integers.
`is_consonant`(position)	Return true for strings where the character at `position` is a consonant.
`is_vowel`(position)	Return true for strings where the character at `position` is a vowel -- not a consonant.
`isalnum`()	Check whether all characters in each string are alphanumeric.
`isalpha`()	Check whether all characters in each string are alphabetic.
`isdecimal`()	Check whether all characters in each string are decimal.
`isdigit`()	Check whether all characters in each string are digits.
`isempty`()	Check whether each string is an empty string.
`isfloat`()	Check whether all characters in each string form a floating-point value.
`ishex`()	Check whether all characters in each string form a hex integer.
`isinteger`()	Check whether all characters in each string form an integer.
`isipv4`()	Check whether all characters in each string form an IPv4 address.
`isspace`()	Check whether all characters in each string are whitespace.
`islower`()	Check whether all characters in each string are lowercase.
`isnumeric`()	Check whether all characters in each string are numeric.
`isupper`()	Check whether all characters in each string are uppercase.
`istimestamp`(format)	Check whether all characters in each string can be converted to a timestamp using the given format.
`istitle`()	Check whether each string is title formatted.
`jaccard_index`(input, width)	Compute the Jaccard index between this column and the given input strings column.
`join`([sep, string_na_rep, sep_na_rep])	Join lists contained as elements in the Series/Index with passed delimiter.
`len`()	Computes the length of each element in the Series/Index.
`like`(pat[, esc])	Test if a like pattern matches a string of a Series or Index.
`ljust`(width[, fillchar])	Filling right side of strings in the Series/Index with an additional character.
`lower`()	Converts all characters to lowercase.
`lstrip`([to_strip])	Remove leading characters.
`match`(pat[, case, flags])	Determine if each string matches a regular expression.
`minhash`(seed, a, b, width)	Compute the minhash of a strings column or a list strings column of terms.
`ngrams`([n, separator])	Generate the n-grams from a set of tokens; each record in the series is treated as a token.
`ngrams_tokenize`([n, delimiter, separator])	Generate the n-grams using tokens from each string.
`normalize_characters`([do_lower])	Normalizes strings characters for tokenizing.
`normalize_spaces`()	Remove extra whitespace between tokens and trim whitespace from the beginning and the end of each string.
`pad`(width[, side, fillchar])	Pad strings in the Series/Index up to width.
`partition`([sep, expand])	Split the string at the first occurrence of sep.
`porter_stemmer_measure`()	Compute the Porter Stemmer measure for each string.
`repeat`(repeats)	Duplicate each string in the Series or Index.
`removeprefix`(prefix)	Remove a prefix from an object series.
`removesuffix`(suffix)	Remove a suffix from an object series.
`replace`(pat, repl[, n, case, flags, regex])	Replace occurrences of pattern/regex in the Series/Index with some other string.
`replace_tokens`(targets, replacements[, ...])	The targets tokens are searched for within each string in the series and replaced with the corresponding replacements if found.
`replace_with_backrefs`(pat, repl)	Use the `repl` back-ref template to create a new string with the extracted elements found using the `pat` expression.
`rfind`(sub[, start, end])	Return the highest index in each string in the Series/Index where the substring is fully contained between `[start:end]`.
`rindex`(sub[, start, end])	Return the highest index in each string where the substring is fully contained between `[start:end]`.
`rjust`(width[, fillchar])	Filling left side of strings in the Series/Index with an additional character.
`rpartition`([sep, expand])	Split the string at the last occurrence of sep.
`rsplit`([pat, n, expand, regex])	Split strings around given separator/delimiter.
`rstrip`([to_strip])	Remove trailing characters.
`slice`([start, stop, step])	Slice substrings from each element in the Series or Index.
`slice_from`(starts, stops)	Return substring of each string using positions for each string.
`slice_replace`([start, stop, repl])	Replace the specified section of each string with a new string.
`split`([pat, n, expand, regex])	Split strings around given separator/delimiter.
`startswith`(pat)	Test if the start of each string element matches a pattern.
`strip`([to_strip])	Remove leading and trailing characters.
`swapcase`()	Change each lowercase character to uppercase and vice versa.
`title`()	Uppercase the first letter of each word and lowercase the rest.
`token_count`([delimiter])	Count the tokens in each string after splitting it using the provided delimiter.
`tokenize`([delimiter])	Each string is split into tokens using the provided delimiter(s).
`translate`(table)	Map all characters in the string through the given mapping table.
`upper`()	Convert each string to uppercase.
`url_decode`()	Returns a URL-decoded format of each string.
`url_encode`()	Returns a URL-encoded format of each string.
`wrap`(width, **kwargs)	Wrap long strings in the Series/Index to be formatted in paragraphs with length less than a given width.
`zfill`(width)	Pad strings in the Series/Index by prepending '0' characters.
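Several of the methods listed above can be chained, since each returns a new object with its own `.str` accessor. The following sketch again assumes pandas as a stand-in (the page does not name its library) and uses only `strip`, `lstrip`, `rstrip`, `lower`, `startswith`, `split`, and `zfill`. The sample data is made up for illustration.

```python
import pandas as pd  # assumption: stand-in for the library this page documents

# Hypothetical sample data with stray whitespace and mixed case
raw = pd.Series(["  Alice_SMITH ", "bob_jones", " Carol_White"])

# strip() trims both ends; lstrip() and rstrip() trim one side only
left_trimmed = raw.str.lstrip()
right_trimmed = raw.str.rstrip()
cleaned = raw.str.strip().str.lower()

# Boolean test on the start of each string
starts_with_a = cleaned.str.startswith("a")

# Split on "_" and expand the pieces into separate columns
parts = cleaned.str.split("_", expand=True)

# Left-pad numeric strings with "0" to a fixed width of 5
ids = pd.Series(["7", "42", "123"]).str.zfill(5)

print(cleaned.tolist())
print(starts_with_a.tolist())
print(parts)
print(ids.tolist())
```

Chaining keeps each step vectorized over the whole series, which avoids an explicit Python loop over the elements.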
