pyspark.sql.DataFrameReader.parquet

DataFrameReader.parquet(*paths, **options)

Loads Parquet files, returning the result as a DataFrame.

New in version 1.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
paths : str

One or more file paths to read the Parquet files from.

Returns
DataFrame

A DataFrame containing the data from the Parquet files.

Other Parameters
**options

Extra options, passed as keyword arguments. For the supported options, refer to the Parquet "Data Source Option" section of the Spark SQL guide for the version you use.

Examples

Create sample dataframes.

>>> df = spark.createDataFrame(
...     [(10, "Alice"), (15, "Bob"), (20, "Tom")], schema=["age", "name"])
>>> df2 = spark.createDataFrame([(70, "Alice"), (80, "Bob")], schema=["height", "name"])

Write a DataFrame into a Parquet file and read it back.

>>> import tempfile
>>> with tempfile.TemporaryDirectory(prefix="parquet1") as d:
...     # Write a DataFrame into a Parquet file.
...     df.write.mode("overwrite").format("parquet").save(d)
...
...     # Read the Parquet file as a DataFrame.
...     spark.read.parquet(d).orderBy("name").show()
+---+-----+
|age| name|
+---+-----+
| 10|Alice|
| 15|  Bob|
| 20|  Tom|
+---+-----+

Read only a specific column from a Parquet file.

>>> with tempfile.TemporaryDirectory(prefix="parquet2") as d:
...     df.write.mode("overwrite").format("parquet").save(d)
...
...     # Read the Parquet file with only the 'name' column.
...     spark.read.schema("name string").parquet(d).orderBy("name").show()
+-----+
| name|
+-----+
|Alice|
|  Bob|
|  Tom|
+-----+

Read multiple Parquet files and merge their schemas.

>>> with tempfile.TemporaryDirectory(prefix="parquet3") as d1:
...     with tempfile.TemporaryDirectory(prefix="parquet4") as d2:
...         df.write.mode("overwrite").format("parquet").save(d1)
...         df2.write.mode("overwrite").format("parquet").save(d2)
...
...         spark.read.option(
...             "mergeSchema", "true"
...         ).parquet(d1, d2).select(
...             "name", "age", "height"
...         ).orderBy("name", "age").show()
+-----+----+------+
| name| age|height|
+-----+----+------+
|Alice|NULL|    70|
|Alice|  10|  NULL|
|  Bob|NULL|    80|
|  Bob|  15|  NULL|
|  Tom|  20|  NULL|
+-----+----+------+