pyspark.sql.functions.regexp_substr
pyspark.sql.functions.regexp_substr(str: ColumnOrName, regexp: ColumnOrName) → pyspark.sql.column.Column
Returns the first substring of the string str that matches the Java regex regexp. If the regular expression finds no match, the result is null.
New in version 3.5.0.
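- Parameters
str : Column or str
the string column to search.
regexp : Column or str
the Java regex pattern to match against str.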
- Returns
Column
the first substring of str that matches the Java regex, or null if there is no match.
Examples
>>> from pyspark.sql.functions import col, lit, regexp_substr
>>> df = spark.createDataFrame([("1a 2b 14m", r"\d+")], ["str", "regexp"])
>>> df.select(regexp_substr('str', lit(r'\d+')).alias('d')).collect()
[Row(d='1')]
>>> df.select(regexp_substr('str', lit(r'mmm')).alias('d')).collect()
[Row(d=None)]
>>> df.select(regexp_substr("str", col("regexp")).alias('d')).collect()
[Row(d='1')]
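
The first-match-or-null behavior shown in the examples can be approximated without a Spark session. The following is a minimal sketch, not the PySpark implementation: regexp_substr_py is a hypothetical helper name, it uses Python's re module rather than Java regex (the two dialects differ in some syntax), and it assumes that a null input produces a null result.

```python
import re
from typing import Optional


def regexp_substr_py(s: Optional[str], pattern: Optional[str]) -> Optional[str]:
    """Hypothetical plain-Python analogue of regexp_substr (not part of PySpark)."""
    # Assumption: a null (None) input yields a null result.
    if s is None or pattern is None:
        return None
    # re.search scans left to right and stops at the first match.
    match = re.search(pattern, s)
    # group(0) is the whole matched substring; no match maps to None (null).
    return match.group(0) if match else None


print(regexp_substr_py("1a 2b 14m", r"\d+"))  # 1
print(regexp_substr_py("1a 2b 14m", r"mmm"))  # None
```

Only the leftmost match is returned, which is why the result for "1a 2b 14m" is '1' and not '2' or '14'.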