MLlib (DataFrame-based) for Spark Connect#
Warning
The namespace for this package may change in a future Spark version.
Pipeline APIs#
Abstract class for transformers that transform one dataset into another. |
|
Abstract class for estimators that fit models to data. |
|
|
Abstract class for models that are fitted by estimators. |
Base class for evaluators that compute metrics from predictions. |
|
|
A simple pipeline, which acts as an estimator. |
|
Represents a compiled pipeline with transformers and fitted models. |
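The relationship between these abstractions can be sketched in plain Python. This is a toy illustration only: the classes below are hypothetical stand-ins that operate on lists of floats, not the pyspark classes themselves, and `AddOne`, `Center`, and `CenterModel` are invented for the example. It shows the idea that fitting a pipeline replaces each estimator stage with its fitted model, leaving a pipeline model made of transformers and fitted models.

```python
from typing import List, Sequence, Union


class Transformer:
    """Maps one dataset (here, a list of floats) to another."""

    def transform(self, data: List[float]) -> List[float]:
        raise NotImplementedError


class Model(Transformer):
    """A transformer produced by fitting an estimator."""


class Estimator:
    """Learns from a dataset and returns a fitted Model."""

    def fit(self, data: List[float]) -> Model:
        raise NotImplementedError


class AddOne(Transformer):  # hypothetical stateless transformer
    def transform(self, data):
        return [x + 1.0 for x in data]


class CenterModel(Model):  # hypothetical fitted model: subtracts a learned mean
    def __init__(self, mean: float):
        self.mean = mean

    def transform(self, data):
        return [x - self.mean for x in data]


class Center(Estimator):  # hypothetical estimator: learns the mean
    def fit(self, data):
        return CenterModel(sum(data) / len(data))


class PipelineModel(Model):
    """Applies already-fitted stages in order."""

    def __init__(self, stages: Sequence[Transformer]):
        self.stages = list(stages)

    def transform(self, data):
        for stage in self.stages:
            data = stage.transform(data)
        return data


class Pipeline(Estimator):
    """Fits estimator stages in order, feeding each stage the output of the previous one."""

    def __init__(self, stages: Sequence[Union[Estimator, Transformer]]):
        self.stages = list(stages)

    def fit(self, data):
        fitted = []
        for stage in self.stages:
            if isinstance(stage, Estimator):
                stage = stage.fit(data)  # estimator becomes a fitted model
            fitted.append(stage)
            data = stage.transform(data)
        return PipelineModel(fitted)


pipeline_model = Pipeline([AddOne(), Center()]).fit([1.0, 2.0, 3.0])
result = pipeline_model.transform([1.0, 2.0, 3.0])
print(result)
```

In this sketch the `Center` stage sees the output of `AddOne` during fitting, which is why stage order matters.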
Feature#
|
Rescales each feature individually to the range [-1, 1] by dividing by the largest absolute value in that feature. |
|
Model fitted by MaxAbsScaler. |
|
Standardizes features by removing the mean and scaling to unit variance using column summary statistics on the samples in the training set. |
|
Model fitted by StandardScaler. |
|
A feature transformer that merges multiple input columns into an array type column. |
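The arithmetic behind the two scalers described above can be illustrated with a small standard-library sketch. It does not use the pyspark API; the helper names `max_abs_scale` and `standard_scale` are hypothetical, and the use of the sample standard deviation (dividing by n - 1) is an assumption made for this example rather than something stated here.

```python
import statistics
from typing import List

Rows = List[List[float]]


def max_abs_scale(rows: Rows) -> Rows:
    """Divide each feature by its largest absolute value, giving values in [-1, 1]."""
    n_features = len(rows[0])
    max_abs = [max(abs(row[j]) for row in rows) for j in range(n_features)]
    return [
        [row[j] / max_abs[j] if max_abs[j] != 0 else 0.0 for j in range(n_features)]
        for row in rows
    ]


def standard_scale(rows: Rows) -> Rows:
    """Subtract the column mean and divide by the column standard deviation.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    n_features = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_features)]
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.stdev(col) for col in columns]
    return [
        [(row[j] - means[j]) / stds[j] if stds[j] != 0 else 0.0 for j in range(n_features)]
        for row in rows
    ]


# Toy training set: three samples, two features.
rows = [[1.0, -4.0], [2.0, 2.0], [3.0, 0.0]]
max_abs_scaled = max_abs_scale(rows)
standardized = standard_scale(rows)
print(max_abs_scaled)
print(standardized)
```

In both cases the statistics (maximum absolute value, mean, standard deviation) come from the training data, which corresponds to the state a fitted scaler model would carry.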
Classification#
|
Logistic regression estimator. |
|
Model fitted by LogisticRegression. |
Functions#
|
Converts a column of arrays of numeric type into a column of pyspark.ml.linalg.DenseVector instances. |
|
Converts a column of MLlib sparse/dense vectors into a column of dense arrays. |
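As a rough illustration of what converting a sparse vector to a dense array involves, the following standard-library sketch expands a (size, indices, values) triple into a full list. The function name `sparse_to_dense` is hypothetical and this is not the pyspark implementation; it only shows the underlying idea.

```python
from typing import List


def sparse_to_dense(size: int, indices: List[int], values: List[float]) -> List[float]:
    """Expand a sparse representation into a dense list, filling gaps with 0.0."""
    dense = [0.0] * size
    for index, value in zip(indices, values):
        dense[index] = value
    return dense


dense = sparse_to_dense(4, [1, 3], [2.0, 5.0])
print(dense)
```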
Tuning#
|
K-fold cross validation performs model selection by splitting the dataset into a set of non-overlapping, randomly partitioned folds, which are used as separate training and test datasets. For example, with k=3 folds, K-fold cross validation generates 3 (training, test) dataset pairs, each of which uses 2/3 of the data for training and 1/3 for testing. |
|
CrossValidatorModel contains the model with the highest average cross-validation metric across folds and uses this model to transform input data. |
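The splitting and selection logic described above can be sketched with the standard library alone. This is an illustration of the technique, not the pyspark implementation: the helper `k_fold_splits`, the candidate names, and the per-fold metric values are all hypothetical, and the selection step assumes a metric where larger is better.

```python
import random
from typing import List, Tuple


def k_fold_splits(n_rows: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs over non-overlapping random folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# With 9 rows and k=3, each pair trains on 6 rows (2/3) and tests on 3 rows (1/3).
splits = k_fold_splits(n_rows=9, k=3)

# Hypothetical per-fold metrics for two candidate parameter settings.
fold_metrics = {
    "candidate_a": [0.80, 0.82, 0.78],
    "candidate_b": [0.85, 0.83, 0.84],
}
average = {name: sum(vals) / len(vals) for name, vals in fold_metrics.items()}
best = max(average, key=average.get)  # highest average metric across folds
print(best)
```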
Evaluation#
|
Evaluator for regression, which expects input columns prediction and label. |
|
Evaluator for binary classification, which expects input columns prediction and label. |
Evaluator for multiclass classification, which expects input columns prediction and label. |
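To make concrete what computing a metric from prediction and label columns means, here is a minimal standard-library sketch of two common metrics: root mean squared error for regression and accuracy for multiclass classification. The function names are hypothetical, and which metrics the evaluators above actually expose is not stated here, so treat these as representative examples only.

```python
import math
from typing import Sequence


def rmse(predictions: Sequence[float], labels: Sequence[float]) -> float:
    """Root mean squared error between predictions and labels."""
    squared_errors = [(p - y) ** 2 for p, y in zip(predictions, labels)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of rows where the predicted class equals the label."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


regression_score = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 6.0])
multiclass_score = accuracy([0, 1, 2, 1], [0, 1, 1, 1])
print(regression_score, multiclass_score)
```

Note that the two metrics point in opposite directions: a lower RMSE is better, while a higher accuracy is better, which matters when a metric is used for model selection.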
Utilities#
The base interface that Estimator, Transformer, Model, and Evaluator must inherit to support saving and loading. |
|
Meta-algorithms such as Pipeline and CrossValidator must implement this interface. |