pyspark.sql.DataFrameReader.parquet
- DataFrameReader.parquet(*paths, **options)
Loads Parquet files, returning the result as a DataFrame.
New in version 1.4.0.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters
- paths : str
One or more file paths to read the Parquet files from.
- Returns
DataFrame
A DataFrame containing the data from the Parquet files.
- Other Parameters
- **options
For the extra options, refer to Data Source Option for the version you use. The final example below shows passing an option as a keyword argument.
Examples
Create sample DataFrames.
>>> df = spark.createDataFrame(
...     [(10, "Alice"), (15, "Bob"), (20, "Tom")], schema=["age", "name"])
>>> df2 = spark.createDataFrame([(70, "Alice"), (80, "Bob")], schema=["height", "name"])
Write a DataFrame into a Parquet file and read it back.
>>> import tempfile
>>> with tempfile.TemporaryDirectory(prefix="parquet1") as d:
...     # Write a DataFrame into a Parquet file.
...     df.write.mode("overwrite").format("parquet").save(d)
...
...     # Read the Parquet file as a DataFrame.
...     spark.read.parquet(d).orderBy("name").show()
+---+-----+
|age| name|
+---+-----+
| 10|Alice|
| 15|  Bob|
| 20|  Tom|
+---+-----+
Read a Parquet file with a specific column.
>>> with tempfile.TemporaryDirectory(prefix="parquet2") as d: ... df.write.mode("overwrite").format("parquet").save(d) ... ... # Read the Parquet file with only the 'name' column. ... spark.read.schema("name string").parquet(d).orderBy("name").show() +-----+ | name| +-----+ |Alice| | Bob| | Tom| +-----+
Read multiple Parquet files and merge their schemas.
>>> with tempfile.TemporaryDirectory(prefix="parquet3") as d1: ... with tempfile.TemporaryDirectory(prefix="parquet4") as d2: ... df.write.mode("overwrite").format("parquet").save(d1) ... df2.write.mode("overwrite").format("parquet").save(d2) ... ... spark.read.option( ... "mergeSchema", "true" ... ).parquet(d1, d2).select( ... "name", "age", "height" ... ).orderBy("name", "age").show() +-----+----+------+ | name| age|height| +-----+----+------+ |Alice|NULL| 70| |Alice| 10| NULL| | Bob|NULL| 80| | Bob| 15| NULL| | Tom| 20| NULL| +-----+----+------+