ALS

class pyspark.mllib.recommendation.ALS

Alternating Least Squares matrix factorization

New in version 0.9.0.

Methods

`train`(ratings, rank[, iterations, lambda_, ...])	Train a matrix factorization model given an RDD of ratings by users for a subset of products.
`trainImplicit`(ratings, rank[, iterations, ...])	Train a matrix factorization model given an RDD of 'implicit preferences' of users for a subset of products.

Methods Documentation

classmethod train(ratings, rank, iterations=5, lambda_=0.01, blocks=-1, nonnegative=False, seed=None)

Train a matrix factorization model given an RDD of ratings by users for a subset of products. The ratings matrix is approximated as the product of two lower-rank matrices of a given rank (number of features). To solve for these features, ALS is run iteratively, with the level of parallelism set by the blocks parameter.

New in version 0.9.0.

Parameters

ratings (pyspark.RDD): RDD of Rating objects or (userID, productID, rating) tuples.
rank (int): Number of features to use (also referred to as the number of latent factors).
iterations (int, optional): Number of iterations of ALS. (default: 5)
lambda_ (float, optional): Regularization parameter. (default: 0.01)
blocks (int, optional): Number of blocks used to parallelize the computation. A value of -1 uses an auto-configured number of blocks. (default: -1)
nonnegative (bool, optional): If True, least-squares is solved with nonnegativity constraints. (default: False)
seed (int, optional): Random seed for the initial matrix factorization model. A value of None uses the system time as the seed. (default: None)
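The alternating idea behind train can be illustrated without Spark. The following is a minimal single-machine sketch, not Spark's distributed implementation: it holds the product factors fixed and solves a small regularized least-squares problem for every user, then swaps roles, repeating for `iterations` rounds. The helper names (`solve`, `update`, `als`, `predict`) are hypothetical, `blocks` and `nonnegative` are not modeled, and scaling `lambda_` by each row's rating count is an assumption modeled on the "ALS-WR" scheme described in the Spark MLlib collaborative-filtering guide.

```python
import random


def solve(a, b):
    """Solve the small dense system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def update(fixed, observed, rank, lambda_):
    """Re-solve every row of one factor matrix while the other (`fixed`) is held constant.

    observed maps a row id to its list of (column id, rating) pairs.
    """
    new = {}
    for i, pairs in observed.items():
        # Normal equations: (sum_j y_j y_j^T + lambda_ * n_i * I) x_i = sum_j r_ij y_j
        a = [[0.0] * rank for _ in range(rank)]
        b = [0.0] * rank
        for j, r in pairs:
            y = fixed[j]
            for p in range(rank):
                b[p] += r * y[p]
                for q in range(rank):
                    a[p][q] += y[p] * y[q]
        for p in range(rank):
            a[p][p] += lambda_ * len(pairs)  # penalty scaled by this row's rating count
        new[i] = solve(a, b)
    return new


def als(ratings, rank, iterations=5, lambda_=0.01, seed=None):
    """Factor (user, product, rating) triples into per-user and per-product vectors."""
    # random.Random(None) seeds from OS randomness or the system time.
    rng = random.Random(seed)
    by_user, by_product = {}, {}
    for user, product, rating in ratings:
        by_user.setdefault(user, []).append((product, rating))
        by_product.setdefault(product, []).append((user, rating))
    products = {p: [rng.random() for _ in range(rank)] for p in by_product}
    users = {}
    for _ in range(iterations):
        users = update(products, by_user, rank, lambda_)     # products fixed, solve users
        products = update(users, by_product, rank, lambda_)  # users fixed, solve products
    return users, products


def predict(users, products, user, product):
    """Approximate rating: dot product of the user and product factor vectors."""
    return sum(x * y for x, y in zip(users[user], products[product]))
```

Because each half-step exactly minimizes the regularized squared error over one factor matrix, the objective cannot increase from one half-step to the next. On a toy matrix of exact rank 1, such as the ratings `(u + 1) * (p + 1)` for `u, p in range(3)`, `als(ratings, rank=2, iterations=25, seed=0)` should reproduce every rating closely.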

classmethod trainImplicit(ratings, rank, iterations=5, lambda_=0.01, blocks=-1, alpha=0.01, nonnegative=False, seed=None)

Train a matrix factorization model given an RDD of ‘implicit preferences’ of users for a subset of products. The ratings matrix is approximated as the product of two lower-rank matrices of a given rank (number of features). To solve for these features, ALS is run iteratively, with the level of parallelism set by the blocks parameter.
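As a sketch of how the implicit variant differs from train, the textbook implicit-feedback objective of Hu, Koren and Volinsky (the formulation the Spark MLlib collaborative-filtering guide cites for implicit data) treats each value r_{up} as an interaction strength rather than a rating. It is split into a binary preference p_{up} and a confidence c_{up} controlled by alpha, and the sum runs over all user-product pairs, so unobserved pairs still count with p_{up} = 0 and c_{up} = 1. Spark's exact handling, for example how lambda_ is scaled, may differ from this form.

```latex
p_{up} = \begin{cases} 1 & r_{up} > 0 \\ 0 & \text{otherwise} \end{cases},
\qquad c_{up} = 1 + \alpha\, r_{up}

\min_{X, Y} \; \sum_{u,p} c_{up} \left( p_{up} - x_u^{\top} y_p \right)^2
  + \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_p \lVert y_p \rVert^2 \right)

% With Y fixed, each user vector has a closed form, where C^{u} = \operatorname{diag}(c_{u\cdot}):
x_u = \left( Y^{\top} C^{u} Y + \lambda I \right)^{-1} Y^{\top} C^{u} p(u)
```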

New in version 0.9.0.

Parameters

ratings (pyspark.RDD): RDD of Rating objects or (userID, productID, rating) tuples.
rank (int): Number of features to use (also referred to as the number of latent factors).
iterations (int, optional): Number of iterations of ALS. (default: 5)
lambda_ (float, optional): Regularization parameter. (default: 0.01)
blocks (int, optional): Number of blocks used to parallelize the computation. A value of -1 uses an auto-configured number of blocks. (default: -1)
alpha (float, optional): A constant used in computing the confidence placed on observed preferences; larger values give observed interactions more weight relative to unobserved ones. (default: 0.01)
nonnegative (bool, optional): If True, least-squares is solved with nonnegativity constraints. (default: False)
seed (int, optional): Random seed for the initial matrix factorization model. A value of None uses the system time as the seed. (default: None)
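To show what alpha does inside trainImplicit, here is a minimal single-machine sketch of one user update in the implicit-feedback formulation of Hu, Koren and Volinsky, not Spark's implementation. Each raw strength becomes a preference (1 if positive, else 0) and a confidence of 1 + alpha * |strength|, and the user's vector is solved against every product, including ones the user never touched. The function names are hypothetical, the unscaled `lambda_` penalty follows the paper's form, and using `abs()` for negative strengths is an assumption of this sketch.

```python
def solve(a, b):
    """Solve the small dense system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def preference_and_confidence(strength, alpha):
    """Split a raw implicit-feedback strength into (preference, confidence)."""
    preference = 1.0 if strength > 0 else 0.0
    # abs() is this sketch's assumption for how negative strengths are weighted.
    confidence = 1.0 + alpha * abs(strength)
    return preference, confidence


def implicit_user_update(product_factors, user_strengths, rank, lambda_, alpha):
    """Re-solve one user's factor vector with all product factors held fixed.

    product_factors: dict product -> list of `rank` floats.
    user_strengths: dict product -> raw strength, only for products the user touched.
    Every product the user did not touch counts as preference 0, confidence 1.
    """
    # Start from lambda_ * I, then accumulate Y^T C^u Y and Y^T C^u p(u).
    a = [[lambda_ if p == q else 0.0 for q in range(rank)] for p in range(rank)]
    b = [0.0] * rank
    for product, y in product_factors.items():
        pref, conf = preference_and_confidence(user_strengths.get(product, 0.0), alpha)
        for p in range(rank):
            b[p] += conf * pref * y[p]
            for q in range(rank):
                a[p][q] += conf * y[p] * y[q]
    return solve(a, b)
```

For example, with two products whose rank-1 factors are both `[1.0]`, a user with strength 2.0 on the first product, `alpha=0.5` and `lambda_=0.0`, the observed product (confidence 2) pulls the user's factor toward 1 and the unobserved one (confidence 1) pulls it toward 0, giving the weighted average 2/3. Looping over every product per user is the naive cost; the original paper avoids it by precomputing Y^T Y once and adding corrections only for the products each user actually touched.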