ALS

class pyspark.mllib.recommendation.ALS

Alternating Least Squares matrix factorization

New in version 0.9.0.

Methods

train(ratings, rank[, iterations, lambda_, ...])

Train a matrix factorization model given an RDD of ratings by users for a subset of products.

trainImplicit(ratings, rank[, iterations, ...])

Train a matrix factorization model given an RDD of 'implicit preferences' of users for a subset of products.

Methods Documentation

classmethod train(ratings, rank, iterations=5, lambda_=0.01, blocks=-1, nonnegative=False, seed=None)

Train a matrix factorization model given an RDD of ratings by users for a subset of products. The ratings matrix is approximated as the product of two lower-rank matrices of a given rank (number of features). To solve for these features, ALS is run iteratively with a configurable level of parallelism.
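As a rough illustration of the factorization described above, the following pure-Python sketch alternates between solving for user factors and product factors on a small list of (userID, productID, rating) tuples. It is not Spark's implementation: the names `solve`, `update`, `als_sketch`, and `predict` are hypothetical, it applies plain L2 regularization over observed entries only, and it ignores the `blocks` and `nonnegative` options. Spark's distributed solver may differ in these and other details.

```python
import random


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not mutated.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def update(fixed, observed, rank, lam):
    """Re-solve one side's factors while the other side (`fixed`) is held constant.

    `observed` maps an id on this side to a list of (other_id, rating) pairs.
    """
    out = {}
    for key, pairs in observed.items():
        # Normal equations: (sum f f^T + lam * I) x = sum rating * f
        a = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, rating in pairs:
            f = fixed[other]
            for i in range(rank):
                b[i] += rating * f[i]
                for j in range(rank):
                    a[i][j] += f[i] * f[j]
        out[key] = solve(a, b)
    return out


def als_sketch(ratings, rank, iterations=5, lam=0.01, seed=None):
    """Return (user_factors, product_factors) as dicts of id -> list of floats."""
    rng = random.Random(seed)
    by_user, by_product = {}, {}
    for u, p, r in ratings:
        by_user.setdefault(u, []).append((p, r))
        by_product.setdefault(p, []).append((u, r))
    # Random initial product factors; user factors are solved for first.
    products = {p: [rng.random() for _ in range(rank)] for p in by_product}
    users = {}
    for _ in range(iterations):
        users = update(products, by_user, rank, lam)
        products = update(users, by_product, rank, lam)
    return users, products


def predict(users, products, u, p):
    """Predicted rating: dot product of the user and product factor vectors."""
    return sum(x * y for x, y in zip(users[u], products[p]))


if __name__ == "__main__":
    toy = [(1, 1, 1.0), (1, 2, 2.0), (2, 1, 2.0)]
    users, products = als_sketch(toy, rank=1, iterations=20, seed=0)
    # Estimate for the one unobserved (user, product) pair.
    print(predict(users, products, 2, 2))
```

Each half-step is an ordinary regularized least-squares problem because one factor matrix is held fixed, which is what makes the per-user and per-product solves independent and therefore easy to parallelize.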

New in version 0.9.0.

Parameters
ratings : pyspark.RDD

RDD of Rating objects or (userID, productID, rating) tuples.

rank : int

Number of features to use (also referred to as the number of latent factors).

iterations : int, optional

Number of iterations of ALS. (default: 5)

lambda_ : float, optional

Regularization parameter. (default: 0.01)

blocks : int, optional

Number of blocks used to parallelize the computation. A value of -1 will use an auto-configured number of blocks. (default: -1)

nonnegative : bool, optional

A value of True will solve least-squares with nonnegativity constraints. (default: False)

seed : int, optional

Random seed for the initial matrix factorization model. A value of None will use system time as the seed. (default: None)
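A minimal usage sketch for the parameters above, assuming an already running SparkContext is passed in as `sc`. The toy data, the helper name `train_explicit`, and the chosen hyperparameter values are hypothetical and only illustrate how the arguments are passed.

```python
# Hypothetical toy data: (userID, productID, rating) tuples.
RATINGS = [(1, 1, 1.0), (1, 2, 2.0), (2, 1, 2.0)]


def train_explicit(sc, ratings=RATINGS):
    """Train an explicit-feedback model; `sc` is an existing SparkContext."""
    # Imported lazily so this module can be loaded without a Spark installation.
    from pyspark.mllib.recommendation import ALS, Rating

    rdd = sc.parallelize([Rating(u, p, r) for u, p, r in ratings])
    # blocks and nonnegative are left at their defaults (-1 and False);
    # seed is fixed only so that repeated runs are comparable.
    return ALS.train(rdd, rank=1, iterations=10, lambda_=0.01, seed=10)
```

The call returns a `MatrixFactorizationModel`; its `predict(user, product)` method can then be used to estimate a rating for a pair that was not in the training data.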

classmethod trainImplicit(ratings, rank, iterations=5, lambda_=0.01, blocks=-1, alpha=0.01, nonnegative=False, seed=None)

Train a matrix factorization model given an RDD of ‘implicit preferences’ of users for a subset of products. The ratings matrix is approximated as the product of two lower-rank matrices of a given rank (number of features). To solve for these features, ALS is run iteratively with a configurable level of parallelism.

New in version 0.9.0.

Parameters
ratings : pyspark.RDD

RDD of Rating objects or (userID, productID, rating) tuples.

rank : int

Number of features to use (also referred to as the number of latent factors).

iterations : int, optional

Number of iterations of ALS. (default: 5)

lambda_ : float, optional

Regularization parameter. (default: 0.01)

blocks : int, optional

Number of blocks used to parallelize the computation. A value of -1 will use an auto-configured number of blocks. (default: -1)

alpha : float, optional

A constant used in computing confidence. (default: 0.01)

nonnegative : bool, optional

A value of True will solve least-squares with nonnegativity constraints. (default: False)

seed : int, optional

Random seed for the initial matrix factorization model. A value of None will use system time as the seed. (default: None)
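The page only says that `alpha` is "a constant used in computing confidence". The sketch below assumes the formulation commonly used in the implicit-feedback literature, where a raw interaction strength r is mapped to a binary preference and a confidence of 1 + alpha * |r|; whether Spark uses exactly this mapping is an assumption here, not something the text above states. The helper names `implicit_targets` and `train_implicit` and the sample counts are hypothetical, and `sc` is assumed to be an existing SparkContext.

```python
def implicit_targets(raw, alpha=0.01):
    """Map a raw interaction strength to (preference, confidence).

    Assumed formulation (not confirmed by this page):
    preference = 1 if raw > 0 else 0, confidence = 1 + alpha * |raw|.
    """
    preference = 1.0 if raw > 0 else 0.0
    confidence = 1.0 + alpha * abs(raw)
    return preference, confidence


# Hypothetical interaction counts: (userID, productID, count) tuples.
COUNTS = [(1, 1, 12.0), (1, 2, 1.0), (2, 1, 3.0)]


def train_implicit(sc, counts=COUNTS, alpha=0.01):
    """Train an implicit-feedback model; `sc` is an existing SparkContext."""
    # Imported lazily so this module can be loaded without a Spark installation.
    from pyspark.mllib.recommendation import ALS

    # Plain (userID, productID, rating) tuples are accepted in place of Rating.
    rdd = sc.parallelize(counts)
    return ALS.trainImplicit(
        rdd, rank=1, iterations=10, lambda_=0.01, alpha=alpha, seed=10
    )
```

Under this assumed mapping, a larger `alpha` makes frequently observed interactions count for more relative to unobserved ones, so it is usually tuned together with `lambda_`.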