pyspark.sql.DataFrameStatFunctions
Statistical functions for use with a DataFrame.
New in version 1.4.
Methods
approxQuantile(col, probabilities, relativeError)
Calculates the approximate quantiles of numerical columns of a DataFrame.
corr(col1, col2[, method])
Calculates the correlation of two columns of a DataFrame as a double value.
cov(col1, col2)
Calculates the sample covariance of the given columns, specified by their names, as a double value.
crosstab(col1, col2)
Computes a pairwise frequency table of the given columns.
freqItems(cols[, support])
Finds frequent items for the given columns, possibly with false positives.
sampleBy(col, fractions[, seed])
Returns a stratified sample without replacement based on the fraction given for each stratum.
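Examples

In PySpark these methods are normally reached through a DataFrame's `stat` attribute, for example `df.stat.cov("a", "b")`. The sketch below uses only the Python standard library, with no Spark session, to illustrate the quantities that the `cov`, `corr`, `crosstab`, and `sampleBy` summaries describe. It is an illustration, not Spark's implementation. The helper names (`sample_cov`, `pearson_corr`, `crosstab`, `sample_by`) and the toy data are hypothetical. It assumes `corr` means the Pearson coefficient, and that strata missing from `fractions` are sampled with fraction zero.

```python
import random
from collections import Counter
from math import sqrt

# Hypothetical toy data standing in for two numeric DataFrame columns.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 9.0]


def sample_cov(a, b):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def pearson_corr(a, b):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return sample_cov(a, b) / sqrt(sample_cov(a, a) * sample_cov(b, b))


def crosstab(a, b):
    """Pairwise frequency table, stored as {(a_value, b_value): count}."""
    return Counter(zip(a, b))


def sample_by(rows, key, fractions, seed=None):
    """Stratified sample without replacement.

    Each row is kept independently with the probability given for its
    stratum; strata absent from `fractions` are assumed to have fraction 0.
    """
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < fractions.get(key(r), 0.0)]


if __name__ == "__main__":
    print("cov:", sample_cov(xs, ys))
    print("corr:", pearson_corr(xs, ys))
    print("crosstab:", dict(crosstab(["a", "a", "b"], [1, 1, 2])))
    rows = [("x", 1), ("x", 2), ("y", 3)]
    print("sample:", sample_by(rows, lambda r: r[0], {"x": 0.5, "y": 1.0}, seed=0))
```

Because each row is kept or dropped independently, the size of the sample from a stratum varies around `fraction * stratum_size` rather than matching it exactly.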