pyspark.pandas.read_parquet
Load a parquet object from the file path, returning a DataFrame.
Parameters
path : str
    File path.
columns : list, default None
    If not None, only these columns will be read from the file.
index_col : str or list of str, optional
    Index column of table in Spark.
pandas_metadata : bool, default False
    If True, try to respect the metadata if the Parquet file is written from pandas.
options : dict
    All other options passed directly into Spark’s data source; see the final example below.
See also
DataFrame.to_parquet
DataFrame.read_table
DataFrame.read_delta
DataFrame.read_spark_io
Examples
>>> ps.range(1).to_parquet('%s/read_spark_io/data.parquet' % path)
>>> ps.read_parquet('%s/read_spark_io/data.parquet' % path, columns=['id'])
   id
0   0
You can preserve the index in the roundtrip as below.
>>> ps.range(1).to_parquet('%s/read_spark_io/data.parquet' % path, index_col="index")
>>> ps.read_parquet('%s/read_spark_io/data.parquet' % path, columns=['id'], index_col="index")
       id
index
0       0
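If the Parquet file was written by pandas itself, pandas_metadata=True asks pandas-on-Spark to respect the metadata that pandas embeds in the file. A minimal sketch, assuming plain pandas with a local Parquet engine (such as pyarrow) is installed; the file name pandas_data.parquet is illustrative.

>>> import pandas as pd
>>> # Write a small file with plain pandas, then read it back while
>>> # honoring the pandas metadata stored in the file.
>>> pd.DataFrame({'id': [0]}).to_parquet('%s/pandas_data.parquet' % path)
>>> psdf = ps.read_parquet('%s/pandas_data.parquet' % path, pandas_metadata=True)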
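Any other keyword arguments are forwarded to Spark’s data source as options. A sketch, assuming a Spark version whose Parquet source supports the mergeSchema option:

>>> # mergeSchema is not a named parameter of read_parquet, so it is passed
>>> # through to Spark's Parquet reader as a data source option.
>>> psdf = ps.read_parquet('%s/read_spark_io/data.parquet' % path, mergeSchema=True)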