pyspark.sql.functions.regexp_replace

pyspark.sql.functions.regexp_replace(string, pattern, replacement)

Replace all substrings of the specified string value that match the regexp pattern with the replacement.

New in version 1.5.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
string : Column or str

column name or column containing the string value

pattern : Column or str

column object or str containing the regexp pattern

replacement : Column or str

column object or str containing the replacement

Returns
Column

string with all matching substrings replaced.

Examples

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame(
...      [("100-200", r"(\d+)", "--")],
...      ["str", "pattern", "replacement"]
... )

Example 1: Replaces all substrings in the str column that match the regex pattern (\d+) (one or more digits) with the replacement string "--".

>>> df.select('*', sf.regexp_replace('str', r'(\d+)', '--')).show()
+-------+-------+-----------+---------------------------------+
|    str|pattern|replacement|regexp_replace(str, (\d+), --, 1)|
+-------+-------+-----------+---------------------------------+
|100-200|  (\d+)|         --|                            -----|
+-------+-------+-----------+---------------------------------+

Example 2: Replaces all substrings in the str column that match the regex pattern in the pattern column with the string in the replacement column.

>>> df.select('*', \
...     sf.regexp_replace(sf.col("str"), sf.col("pattern"), sf.col("replacement")) \
... ).show()
+-------+-------+-----------+--------------------------------------------+
|    str|pattern|replacement|regexp_replace(str, pattern, replacement, 1)|
+-------+-------+-----------+--------------------------------------------+
|100-200|  (\d+)|         --|                                       -----|
+-------+-------+-----------+--------------------------------------------+
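Both examples return "-----" because each run of digits in "100-200" becomes "--" while the original hyphen between them is left in place. The sketch below reproduces that substitution with Python's standard re module, purely as an illustration. It does not call Spark, and it assumes that Python's re and the regex engine Spark uses agree for this simple pattern. The names value and result are hypothetical.

```python
import re

# Plain-Python illustration only: this does not use Spark.
# Assumption: for the simple pattern (\d+), Python's re module matches
# the same substrings as the regex engine behind regexp_replace.
value = "100-200"

# "100" -> "--", the literal "-" is untouched, "200" -> "--"
result = re.sub(r"(\d+)", "--", value)

print(result)  # -----
```

The output has five characters: two from each replaced digit run plus the hyphen that was already in the input.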