pyspark.sql.Catalog

class pyspark.sql.Catalog(sparkSession)

User-facing catalog API, accessible through SparkSession.catalog.

This is a thin wrapper around its Scala implementation org.apache.spark.sql.catalog.Catalog.

Changed in version 3.4.0: Supports Spark Connect.
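As a minimal sketch of reaching this API through SparkSession.catalog, the helper below collects a few read-only facts about a session. The function name describe_session_catalog and the shape of the returned dict are hypothetical, chosen for illustration; the catalog calls are the methods listed on this page. It assumes an already-created SparkSession is passed in and that the Spark version is 3.4.0 or later, since currentCatalog() is not available earlier.

```python
def describe_session_catalog(spark):
    """Summarize the catalog state of an existing SparkSession.

    `spark` is assumed to be a live SparkSession; this helper and its
    return format are illustrative, not part of PySpark.
    """
    catalog = spark.catalog  # the pyspark.sql.Catalog instance for this session
    return {
        "current_catalog": catalog.currentCatalog(),
        "current_database": catalog.currentDatabase(),
        # listDatabases() and listTables() return records with a `name` field
        "databases": [db.name for db in catalog.listDatabases()],
        "tables": [t.name for t in catalog.listTables()],
    }
```

With a running session, calling describe_session_catalog(spark) returns a plain dict that can be logged or compared between environments.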

Methods

cacheTable(tableName[, storageLevel])

Caches the specified table in memory or with the given storage level.

clearCache()

Removes all cached tables from the in-memory cache.

createExternalTable(tableName[, path, ...])

Creates a table based on the dataset in a data source. Deprecated since version 2.2.0; use createTable() instead.

createTable(tableName[, path, source, ...])

Creates a table based on the dataset in a data source.

currentCatalog()

Returns the current default catalog in this session.

currentDatabase()

Returns the current default database in this session.

databaseExists(dbName)

Checks whether the database with the specified name exists.

dropGlobalTempView(viewName)

Drops the global temporary view with the given view name in the catalog.

dropTempView(viewName)

Drops the local temporary view with the given view name in the catalog.

functionExists(functionName[, dbName])

Checks whether the function with the specified name exists.

getDatabase(dbName)

Returns the database with the specified name.

getFunction(functionName)

Returns the function with the specified name.

getTable(tableName)

Returns the table or view with the specified name.

isCached(tableName)

Returns True if the table is currently cached in memory.

listCatalogs([pattern])

Returns a list of catalogs in this session.

listColumns(tableName[, dbName])

Returns a list of columns for the given table/view in the specified database.

listDatabases([pattern])

Returns a list of databases available across all sessions.

listFunctions([dbName, pattern])

Returns a list of functions registered in the specified database.

listTables([dbName, pattern])

Returns a list of tables/views in the specified database.

recoverPartitions(tableName)

Recovers all the partitions of the given table and updates the catalog.

refreshByPath(path)

Invalidates and refreshes all the cached data (and the associated metadata) for any DataFrame that contains the given data source path.

refreshTable(tableName)

Invalidates and refreshes all the cached data and metadata of the given table.

registerFunction(name, f[, returnType])

An alias for spark.udf.register().

setCurrentCatalog(catalogName)

Sets the current default catalog in this session.

setCurrentDatabase(dbName)

Sets the current default database in this session.

tableExists(tableName[, dbName])

Checks whether the table or view with the specified name exists.

uncacheTable(tableName)

Removes the specified table from the in-memory cache.
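The caching methods above (isCached, cacheTable, uncacheTable) are often used together. The sketch below wraps them in a context manager so a table is cached only for the duration of a block of work. The name cached_table is hypothetical and the pattern is one possible design, not something PySpark provides; it assumes a live SparkSession is passed in and that the table name resolves in the current database.

```python
from contextlib import contextmanager


@contextmanager
def cached_table(spark, table_name):
    """Cache `table_name` for the duration of a with-block.

    Illustrative helper, not part of PySpark. `spark` is assumed to be
    a live SparkSession.
    """
    catalog = spark.catalog
    # Remember the prior state so an existing cache entry is left alone
    was_cached = catalog.isCached(table_name)
    if not was_cached:
        catalog.cacheTable(table_name)
    try:
        yield catalog
    finally:
        # Only undo the caching that this helper itself performed
        if not was_cached:
            catalog.uncacheTable(table_name)
```

A typical use would be `with cached_table(spark, "events"): ...`, where "events" is a placeholder table name. Checking isCached first means the helper does not evict a table that another part of the application cached deliberately.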