class pyspark.mllib.clustering.StreamingKMeans(k=2, decayFactor=1.0, timeUnit='batches')
Provides methods to set k, decayFactor, and timeUnit, which configure the KMeans algorithm for fitting and predicting on incoming dstreams. The StreamingKMeansModel documentation describes in more detail how the centroids are updated.
New in version 1.5.0.
Parameters
k
Number of clusters. (default: 2)
decayFactor
Forgetfulness of the previous centroids. (default: 1.0)
timeUnit
Can be “batches” or “points”. If “points”, the decay factor is raised to the power of the number of new points; if “batches”, the decay factor is used as is. (default: “batches”)
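The timeUnit rule can be sketched in plain Python. This is only an illustration of the rule as described here: effective_discount is a hypothetical helper, not part of pyspark, and treating the result as a multiplier on the weight of the previous centroids is an assumption.

```python
def effective_discount(decay_factor, time_unit, num_new_points):
    """Hypothetical helper: discount applied to the previous centroids."""
    if time_unit == "batches":
        # The decay factor is used as is, once per batch.
        return decay_factor
    if time_unit == "points":
        # The decay factor is raised to the power of the number of new points.
        return decay_factor ** num_new_points
    raise ValueError("time_unit must be 'batches' or 'points'")


print(effective_discount(0.5, "batches", 3))  # 0.5
print(effective_discount(0.5, "points", 3))   # 0.125
```

With the default decayFactor of 1.0 both time units give a discount of 1.0, so earlier data is not forgotten.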
Methods
latestModel()
Return the latest model
predictOn(dstream)
Make predictions on a dstream.
predictOnValues(dstream)
Make predictions on a keyed dstream.
setDecayFactor(decayFactor)
Set decay factor.
setHalfLife(halfLife, timeUnit)
Set the number of time units (batches or points, according to timeUnit) after which the centroids of a given batch carry half their original weight.
setInitialCenters(centers, weights)
Set initial centers.
setK(k)
Set number of clusters.
setRandomCenters(dim, weight, seed)
Set the initial centers to random samples from a Gaussian population, with constant weights.
trainOn(dstream)
Train the model on the incoming dstream.
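The link between setHalfLife and setDecayFactor can be sketched as follows. The sketch assumes the usual definition of a half-life, namely that applying the decay factor halfLife times leaves half the weight. decay_factor_from_half_life is a hypothetical helper, and whether pyspark performs exactly this conversion internally is an assumption.

```python
import math


def decay_factor_from_half_life(half_life):
    """Hypothetical helper: decay factor a such that a ** half_life == 0.5."""
    # Solve a ** half_life == 0.5 for a.
    return math.exp(math.log(0.5) / half_life)


a = decay_factor_from_half_life(5)
# After 5 time units the accumulated discount is one half.
print(round(a ** 5, 6))  # 0.5
```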
Methods Documentation
predictOn(dstream)
Make predictions on a dstream. Returns a transformed dstream object.
predictOnValues(dstream)
Make predictions on a keyed dstream. Returns a transformed dstream object.
setInitialCenters(centers, weights)
Set initial centers. Should be set before calling trainOn.
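To illustrate the shape of the centers and weights arguments, and the idea behind setRandomCenters(dim, weight, seed), here is a minimal plain-Python sketch. It assumes a standard normal distribution for the Gaussian population. random_initial_centers is a hypothetical helper, not pyspark API, and it will not reproduce the values pyspark generates for the same seed.

```python
import random


def random_initial_centers(k, dim, weight, seed):
    """Hypothetical helper: k Gaussian centers of dimension dim, equal weights."""
    rng = random.Random(seed)  # seeded so that the centers are reproducible
    # Assumption: each coordinate is drawn from a standard normal distribution.
    centers = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
    # Every center starts with the same constant weight.
    weights = [weight] * k
    return centers, weights


centers, weights = random_initial_centers(k=2, dim=3, weight=1.0, seed=42)
print(len(centers), len(centers[0]), weights)  # 2 3 [1.0, 1.0]
```

A pair shaped like this (one center per cluster, one weight per center) is what setInitialCenters(centers, weights) takes, and it should be supplied before trainOn is called.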