pyspark.sql.DataFrameStatFunctions
class pyspark.sql.DataFrameStatFunctions(df: pyspark.sql.dataframe.DataFrame)
Functionality for statistical functions with a DataFrame. Instances are normally obtained through the DataFrame.stat property rather than constructed directly.
New in version 1.4.0.
Changed in version 3.4.0: Supports Spark Connect.
Methods
approxQuantile(col, probabilities, relativeError): Calculates the approximate quantiles of numerical columns of a DataFrame.
corr(col1, col2[, method]): Calculates the correlation of two columns of a DataFrame as a double value.
cov(col1, col2): Calculates the sample covariance for the given columns, specified by their names, as a double value.
crosstab(col1, col2): Computes a pair-wise frequency table of the given columns.
freqItems(cols[, support]): Finds frequent items for columns, possibly with false positives.
sampleBy(col, fractions[, seed]): Returns a stratified sample without replacement based on the fraction given for each stratum.
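approxQuantile trades accuracy for speed through relativeError. The Spark documentation describes the algorithm as a variation of Greenwald-Khanna and states its guarantee as floor((p - err) * N) <= rank(x) <= ceil((p + err) * N); a relativeError of 0 requests exact quantiles, which can be expensive on large data. The plain-Python sketch below (it does not use Spark) shows an exact nearest-rank quantile and the rank window a given relativeError permits. The helper names exact_quantile and allowed_rank_range are hypothetical, and treating rank as a 1-based position in sorted order is an assumption.

```python
import math


def exact_quantile(values, p):
    """Nearest-rank quantile: the value at 1-based rank ceil(p * N).

    Hypothetical helper; the nearest-rank rule is an assumption, not Spark's code.
    """
    data = sorted(values)
    if not data:
        raise ValueError("quantile of empty input")
    rank = max(1, math.ceil(p * len(data)))  # p = 0 maps to the minimum
    return data[rank - 1]


def allowed_rank_range(n, p, relative_error):
    """1-based rank window allowed by the documented approxQuantile bound,
    clamped to [1, n]. Hypothetical helper for illustration only."""
    lo = math.floor((p - relative_error) * n)
    hi = math.ceil((p + relative_error) * n)
    return max(lo, 1), min(hi, n)


values = list(range(1, 101))
print(exact_quantile(values, 0.0), exact_quantile(values, 0.5), exact_quantile(values, 1.0))
# 1 50 100
print(allowed_rank_range(1000, 0.5, 0.125))
# (375, 625)
```

With relativeError = 0.125 over 1000 rows, any value whose rank falls between 375 and 625 is an acceptable "median"; shrinking the error narrows that window toward the exact rank.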
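cov returns the sample covariance (dividing by N - 1), and corr returns the Pearson correlation coefficient; "pearson" is currently the only supported value of method. A minimal plain-Python sketch of both formulas follows. It is not Spark's distributed implementation, and the function names are hypothetical.

```python
import math


def sample_covariance(xs, ys):
    """Sample covariance: sum((x - mean_x) * (y - mean_y)) / (n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length columns with at least 2 rows")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def pearson_correlation(xs, ys):
    """Pearson correlation: cov(x, y) / (stddev(x) * stddev(y))."""
    # The (n - 1) denominators cancel, so using sample statistics is fine here.
    return sample_covariance(xs, ys) / math.sqrt(
        sample_covariance(xs, xs) * sample_covariance(ys, ys)
    )


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(sample_covariance(xs, ys))    # approximately 3.3333
print(pearson_correlation(xs, ys))  # approximately 1.0
```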
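crosstab counts how often each (col1, col2) value pair occurs. In Spark's result, each row holds one distinct col1 value, there is one further column per distinct col2 value, the first column is named col1_col2, and pairs that never occur get a count of zero. The sketch below reproduces that layout over plain Python rows. The function name and the dict-per-row representation are assumptions for illustration, and the sorting is only for readable output (Spark does not promise an ordering).

```python
from collections import Counter


def crosstab(rows, col1, col2):
    """Pair-wise frequency table of col1 vs. col2 over a list of dict rows.

    Returns (header, table): header is [f"{col1}_{col2}", *distinct col2 values],
    and each table row is [col1 value, count for each col2 value]. Missing pairs are 0.
    """
    pair_counts = Counter((row[col1], row[col2]) for row in rows)
    # Sorting is only for a stable, readable layout in this sketch.
    left_values = sorted({a for a, _ in pair_counts}, key=str)
    right_values = sorted({b for _, b in pair_counts}, key=str)
    header = [f"{col1}_{col2}", *right_values]
    table = [[a, *(pair_counts[(a, b)] for b in right_values)] for a in left_values]
    return header, table


rows = [
    {"dept": "eng", "level": "junior"},
    {"dept": "eng", "level": "senior"},
    {"dept": "eng", "level": "senior"},
    {"dept": "ops", "level": "junior"},
]
header, table = crosstab(rows, "dept", "level")
print(header)  # ['dept_level', 'junior', 'senior']
print(table)   # [['eng', 1, 2], ['ops', 1, 0]]
```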
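freqItems is approximate: the Spark documentation cites the frequent-element counting algorithm of Karp, Schenker, and Papadimitriou, which keeps a bounded number of counters and may report false positives but does not drop items that are frequent enough. The API's default support is 1%. The sketch below applies that counting idea to a single column in plain Python; it is a simplified illustration rather than Spark's code, and the function name is hypothetical.

```python
def frequent_items(items, support=0.01):
    """Candidate items occurring in more than support * N of the inputs.

    Keeps at most int(1 / support) counters. Any item whose count exceeds
    support * N is guaranteed to survive; some survivors may be false positives.
    """
    capacity = int(1 / support)
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
        if len(counts) > capacity:
            # Too many candidates: decrement every counter, drop those reaching zero.
            for key in list(counts):
                counts[key] -= 1
                if counts[key] == 0:
                    del counts[key]
    return set(counts)


column = ["a"] * 6 + ["b", "c", "d", "e"]
print(frequent_items(column, support=0.5))  # {'a'}
```

Each decrement pass removes capacity + 1 units of count in total, so there are at most N / (capacity + 1) passes; an item seen more often than that cannot be evicted, which is where the "no missed frequent items, possible false positives" behavior comes from.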
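sampleBy keeps each row independently with the probability given for its stratum (the row's value in col), so it samples without replacement and each stratum's sample size is only approximately fraction times the stratum size. Strata missing from fractions are treated as having fraction 0. The plain-Python sketch below illustrates this per-row Bernoulli sampling. Python's random module is not Spark's random generator, so the same seed will not reproduce a Spark sample, and the function name is hypothetical.

```python
import random


def sample_by(rows, key, fractions, seed=None):
    """Stratified sample without replacement via per-row Bernoulli trials.

    Each row is kept with probability fractions.get(stratum, 0.0), where the
    stratum is row[key]; strata missing from `fractions` are never sampled.
    """
    rng = random.Random(seed)
    sampled = []
    for row in rows:
        fraction = fractions.get(row[key], 0.0)
        if rng.random() < fraction:  # random() is in [0, 1), so 1.0 always keeps
            sampled.append(row)
    return sampled


rows = [{"label": i % 3, "id": i} for i in range(30)]
sample = sample_by(rows, "label", {0: 1.0, 1: 0.5}, seed=7)
# Every label-0 row is kept, roughly half of the label-1 rows, and no label-2 rows.
print(sum(1 for row in sample if row["label"] == 0))  # 10
```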