pyspark.sql.DataFrameReader

class pyspark.sql.DataFrameReader(spark: SparkSession)

Interface used to load a DataFrame from external storage systems (e.g. file systems, key-value stores, etc.). Access it through SparkSession.read.

New in version 1.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Methods

csv(path[, schema, sep, encoding, quote, …])

Loads a CSV file and returns the result as a DataFrame.

format(source)

Specifies the input data source format.

jdbc(url, table[, column, lowerBound, …])

Constructs a DataFrame representing the database table named table, accessible via the JDBC URL url and connection properties.

json(path[, schema, primitivesAsString, …])

Loads JSON files and returns the results as a DataFrame.

load([path, format, schema])

Loads data from a data source and returns it as a DataFrame.

option(key, value)

Adds an input option for the underlying data source.

options(**options)

Adds input options for the underlying data source.

orc(path[, mergeSchema, pathGlobFilter, …])

Loads ORC files, returning the result as a DataFrame.

parquet(*paths, **options)

Loads Parquet files, returning the result as a DataFrame.

schema(schema)

Specifies the input schema.

table(tableName)

Returns the specified table as a DataFrame.

text(paths[, wholetext, lineSep, …])

Loads text files and returns a DataFrame whose schema starts with a string column named “value”, followed by partition columns if there are any.