pyspark.pandas.Series.str.replace

str.replace(pat, repl, n=-1, case=None, flags=0, regex=False)

Replace occurrences of a pattern or regex in the Series with another string. Equivalent to str.replace() or re.sub(), depending on the value of regex.

Parameters
pat: str or compiled regex

String can be a character sequence or regular expression.

repl: str or callable

Replacement string or a callable. The callable is passed the regex match object and must return a replacement string to be used. See re.sub().

n: int, default -1 (all)

Number of replacements to make from start.

case: boolean, default None

If True, case sensitive (the default if pat is a string). Set to False for case insensitive. Cannot be set if pat is a compiled regex.

flags: int, default 0 (no flags)

re module flags, e.g. re.IGNORECASE. Cannot be set if pat is a compiled regex.

regex: boolean, default False

If True, assumes the passed-in pattern is a regular expression. If False, treats the pattern as a literal string. Cannot be set to False if pat is a compiled regex or repl is a callable.

Returns
Series of object

A copy of the Series with all matching occurrences of pat replaced by repl.

Examples

When pat is a string and regex is True, the given pat is compiled as a regex. When repl is a string, it replaces matching regex patterns as with re.sub(). NaN values in the Series are changed to None:

>>> ps.Series(['foo', 'fuz', np.nan]).str.replace('f.', 'ba', regex=True)
0     bao
1     baz
2    None
dtype: object

When pat is a string and regex is False, every occurrence of pat is replaced with repl as with str.replace():

>>> ps.Series(['f.o', 'fuz', np.nan]).str.replace('f.', 'ba', regex=False)
0     bao
1     fuz
2    None
dtype: object
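
The difference between the two modes can be reproduced per element with the standard library alone. The sketch below is plain Python rather than pandas-on-Spark, and the helper name replace_one is hypothetical; it only mirrors the re.sub() / str.replace() equivalence described above, with None standing in for missing values.

```python
import re

def replace_one(value, pat, repl, regex):
    """Hypothetical per-element helper; not part of pyspark.pandas."""
    if value is None:
        return None  # missing values pass through unchanged
    if regex:
        return re.sub(pat, repl, value)  # '.' matches any character
    return value.replace(pat, repl)  # '.' is treated as a literal dot

values = ['f.o', 'fuz', None]
as_regex = [replace_one(v, 'f.', 'ba', regex=True) for v in values]
as_literal = [replace_one(v, 'f.', 'ba', regex=False) for v in values]
print(as_regex)    # ['bao', 'baz', None]
print(as_literal)  # ['bao', 'fuz', None]
```

Only the regex mode rewrites 'fuz', because the literal two-character string 'f.' does not occur in it.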

When repl is a callable, it is called on every match of pat using re.sub(). The callable should expect one positional argument (a regex match object) and return a string.

Reverse every lowercase alphabetic word:

>>> repl = lambda m: m.group(0)[::-1]
>>> ps.Series(['foo 123', 'bar baz', np.nan]).str.replace('[a-z]+', repl, regex=True)
0    oof 123
1    rab zab
2       None
dtype: object

Using regex groups (extract second group and swap case):

>>> pat = r"(?P<one>\w+) (?P<two>\w+) (?P<three>\w+)"
>>> repl = lambda m: m.group('two').swapcase()
>>> ps.Series(['One Two Three', 'Foo Bar Baz']).str.replace(pat, repl, regex=True)
0    tWO
1    bAR
dtype: object
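
Because the callable receives an ordinary re match object, it can be developed and checked with re alone before being passed as repl. The following is a small sketch under that assumption; the function name swap_words and the sample string are illustrative only.

```python
import re

# Raw string so that \w reaches the regex engine unchanged
pat = re.compile(r"(?P<first>\w+) (?P<second>\w+)")

def swap_words(m):
    # m is an re.Match; named groups are read with m.group(name)
    return f"{m.group('second')} {m.group('first')}"

result = pat.sub(swap_words, "hello world")
print(result)  # world hello
```

A function written this way has the shape the repl parameter expects: one positional argument in, one string out.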

Using a compiled regex with flags:

>>> import re
>>> regex_pat = re.compile('FUZ', flags=re.IGNORECASE)
>>> ps.Series(['foo', 'fuz', np.nan]).str.replace(regex_pat, 'bar', regex=True)
0     foo
1     bar
2    None
dtype: object
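
The n and case parameters have no example above. As a rough guide, the sketch below shows the standard-library behavior they appear to correspond to; the mapping of n to a replacement limit and of case=False to re.IGNORECASE is an assumption based on the parameter descriptions, not a statement about the implementation.

```python
import re

text = 'Foo foo FOO'

# Assumed analogue of n=1 with a regex pattern: limit re.sub() to one replacement
first_only = re.sub('foo', 'bar', text, count=1)

# Assumed analogue of case=False: match case-insensitively
ignore_case = re.sub('foo', 'bar', text, flags=re.IGNORECASE)

# For a literal pattern, str.replace() takes the limit as its third argument
literal_first = text.replace('o', '0', 1)

print(first_only)     # Foo bar FOO
print(ignore_case)    # bar bar bar
print(literal_first)  # F0o foo FOO
```

Note that the "replace all" sentinel differs between the two functions: str.replace() uses -1, while re.sub() uses count=0.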