pyspark.sql.functions.try_parse_url

pyspark.sql.functions.try_parse_url(url, partToExtract, key=None)

This is a special version of parse_url that performs the same operation, but returns a NULL value instead of raising an error if the parsing cannot be performed.

New in version 4.0.0.
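Returning NULL instead of raising is the usual "try" wrapper pattern: a strict parser that can fail, wrapped so that any parsing error becomes NULL. The sketch below is a hypothetical, pure-Python approximation of that pattern. parse_url_strict and parse_url_or_none are illustrative helpers, not PySpark APIs. It uses urllib.parse instead of Spark's own URL parser, covers only a few part names, ignores the key argument, and imitates an "invalid URL" error by rejecting whitespace, so it will differ from Spark on edge cases.

```python
from typing import Optional
from urllib.parse import urlsplit


def parse_url_strict(url: str, part: str) -> Optional[str]:
    """Hypothetical strict parser: raises ValueError when the URL is malformed."""
    # urllib.parse is far more lenient than a strict URI parser, so this
    # sketch rejects whitespace explicitly to imitate an "invalid URL" error.
    if any(ch.isspace() for ch in url):
        raise ValueError(f"invalid URL: {url!r}")
    parts = urlsplit(url)
    # Map a few partToExtract names onto urllib's split result.
    extracted = {
        "PROTOCOL": parts.scheme,
        "HOST": parts.hostname,
        "PATH": parts.path,
        "QUERY": parts.query,
        "REF": parts.fragment,
    }
    # Unknown part names and empty components are reported as None (NULL).
    return extracted.get(part) or None


def parse_url_or_none(url: str, part: str) -> Optional[str]:
    """Same as parse_url_strict, but returns None instead of raising."""
    try:
        return parse_url_strict(url, part)
    except ValueError:
        return None


print(parse_url_or_none("https://spark.apache.org/path?query=1", "HOST"))  # spark.apache.org
print(parse_url_or_none("inva lid://spark.apache.org/path?query=1", "QUERY"))  # None
```

The wrapper catches only ValueError, so genuine programming errors (for example, passing a non-string) still surface instead of being silently turned into None.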

Parameters

url (Column or str): A column of strings, each representing a URL.
partToExtract (Column or str): A column of strings, each naming the part to extract from the URL: "HOST", "PATH", "QUERY", "REF", "PROTOCOL", "FILE", "AUTHORITY", or "USERINFO".
key (Column or str, optional): A column of strings, each representing the key of a query parameter in the URL. Used when partToExtract is "QUERY".
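When key is supplied with partToExtract set to "QUERY", the result is the value of that query parameter rather than the whole query string (see Example 2). The helper below is a hypothetical pure-Python illustration of that lookup on an already-extracted query string; query_param_value is not a PySpark API. As assumptions of this sketch, it returns the first matching value without percent-decoding it, and returns None when the key is absent or appears without an "=".

```python
from typing import Optional


def query_param_value(query: Optional[str], key: str) -> Optional[str]:
    """Return the raw value of the first `key=value` pair in `query`, or None."""
    if query is None:
        # No query string at all: nothing to look up.
        return None
    for pair in query.split("&"):
        name, sep, value = pair.partition("=")
        # Only pairs that actually contain "=" count as a match.
        if name == key and sep:
            return value
    return None


print(query_param_value("query=1", "query"))  # 1
print(query_param_value("a=1&b=2&b=3", "b"))  # 2
print(query_param_value("a=1", "missing"))    # None
```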

Returns

Column: A new column of strings, each representing the value of the extracted part from the URL, or NULL when the URL cannot be parsed.

Examples

Example 1: Extracting the query part from a URL

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame(
...   [("https://spark.apache.org/path?query=1", "QUERY")],
...   ["url", "part"]
... )
>>> df.select(sf.try_parse_url(df.url, df.part)).show()
+------------------------+
|try_parse_url(url, part)|
+------------------------+
|                 query=1|
+------------------------+

Example 2: Extracting the value of a specific query parameter from a URL

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame(
...   [("https://spark.apache.org/path?query=1", "QUERY", "query")],
...   ["url", "part", "key"]
... )
>>> df.select(sf.try_parse_url(df.url, df.part, df.key)).show()
+-----------------------------+
|try_parse_url(url, part, key)|
+-----------------------------+
|                            1|
+-----------------------------+

Example 3: Extracting the protocol part from a URL

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame(
...   [("https://spark.apache.org/path?query=1", "PROTOCOL")],
...   ["url", "part"]
... )
>>> df.select(sf.try_parse_url(df.url, df.part)).show()
+------------------------+
|try_parse_url(url, part)|
+------------------------+
|                   https|
+------------------------+

Example 4: Extracting the host part from a URL

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame(
...   [("https://spark.apache.org/path?query=1", "HOST")],
...   ["url", "part"]
... )
>>> df.select(sf.try_parse_url(df.url, df.part)).show()
+------------------------+
|try_parse_url(url, part)|
+------------------------+
|        spark.apache.org|
+------------------------+

Example 5: Extracting the path part from a URL

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame(
...   [("https://spark.apache.org/path?query=1", "PATH")],
...   ["url", "part"]
... )
>>> df.select(sf.try_parse_url(df.url, df.part)).show()
+------------------------+
|try_parse_url(url, part)|
+------------------------+
|                   /path|
+------------------------+

Example 6: Returning NULL for an invalid URL

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame(
...   [("inva lid://spark.apache.org/path?query=1", "QUERY", "query")],
...   ["url", "part", "key"]
... )
>>> df.select(sf.try_parse_url(df.url, df.part, df.key)).show()
+-----------------------------+
|try_parse_url(url, part, key)|
+-----------------------------+
|                         NULL|
+-----------------------------+