pyspark.sql.functions.regexp_replace
- pyspark.sql.functions.regexp_replace(string, pattern, replacement)
Replace all substrings of the specified string value that match regexp with replacement.
New in version 1.5.0.
Changed in version 3.4.0: Supports Spark Connect.
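- Parameters
- string : Column or str
column name or column containing the string value
- pattern : Column or str
column object or str containing the regexp pattern
- replacement : Column or str
column object or str containing the replacement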
- Returns
Column
string with all substrings replaced.
Examples
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame(
...     [("100-200", r"(\d+)", "--")],
...     ["str", "pattern", "replacement"]
... )
Example 1: Replaces all the substrings in the column named str that match the regex pattern (\d+) (one or more digits) with the replacement string "--".
>>> df.select('*', sf.regexp_replace('str', r'(\d+)', '--')).show()
+-------+-------+-----------+---------------------------------+
|    str|pattern|replacement|regexp_replace(str, (\d+), --, 1)|
+-------+-------+-----------+---------------------------------+
|100-200|  (\d+)|         --|                            -----|
+-------+-------+-----------+---------------------------------+
Example 2: Replaces all the substrings in the str column that match the regex pattern in the pattern column with the string in the replacement column.
>>> df.select('*', \
...     sf.regexp_replace(sf.col("str"), sf.col("pattern"), sf.col("replacement")) \
... ).show()
+-------+-------+-----------+--------------------------------------------+
|    str|pattern|replacement|regexp_replace(str, pattern, replacement, 1)|
+-------+-------+-----------+--------------------------------------------+
|100-200|  (\d+)|         --|                                       -----|
+-------+-------+-----------+--------------------------------------------+