pyspark.mllib.clustering.BisectingKMeansModel
A clustering model derived from the bisecting k-means method.
New in version 2.0.0.
Examples
>>> from numpy import array
>>> from pyspark.mllib.clustering import BisectingKMeans
>>> data = array([0.0, 0.0, 1.0, 1.0, 9.0, 8.0, 8.0, 9.0]).reshape(4, 2)
>>> bskm = BisectingKMeans()
>>> model = bskm.train(sc.parallelize(data, 2), k=4)
>>> p = array([0.0, 0.0])
>>> model.predict(p)
0
>>> model.k
4
>>> model.computeCost(p)
0.0
Methods
call(name, *a)
Call method of java_model
computeCost(x)
Return the Bisecting K-means cost (sum of squared distances of points to their nearest center) for this model on the given data.
predict(x)
Find the cluster that each of the points belongs to in this model.
Attributes
clusterCenters
Get the cluster centers, represented as a list of NumPy arrays.
k
Get the number of clusters.
Methods Documentation
computeCost(x)
Return the bisecting k-means cost (the sum of squared distances of points to their nearest center) for this model on the given data. If given an RDD of points, return the sum of the costs over all points.
Parameters
x : pyspark.mllib.linalg.Vector or pyspark.RDD
A data point (or RDD of points) for which to compute the cost. A pyspark.mllib.linalg.Vector can be replaced with an equivalent object (list, tuple, or numpy.ndarray).
predict(x)
Find the cluster that each of the points belongs to in this model.
Parameters
x : pyspark.mllib.linalg.Vector or pyspark.RDD
A data point (or RDD of points) whose cluster index is to be determined. A pyspark.mllib.linalg.Vector can be replaced with an equivalent object (list, tuple, or numpy.ndarray).
Returns
The predicted cluster index, or an RDD of predicted cluster indices if the input is an RDD.
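The quantities described for computeCost and predict can be illustrated without Spark. The following is a minimal NumPy sketch, not the library's implementation: the helper names nearest_center and cost and the two cluster centers are hypothetical, chosen only to show nearest-center assignment and the sum of squared distances.

```python
import numpy as np


def nearest_center(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    diffs = np.asarray(centers, dtype=float) - np.asarray(point, dtype=float)
    return int(np.argmin(np.sum(diffs ** 2, axis=1)))


def cost(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    points = np.atleast_2d(np.asarray(points, dtype=float))
    centers = np.asarray(centers, dtype=float)
    # Pairwise squared distances, shape (n_points, n_centers).
    sq = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Keep each point's distance to its closest center, then add them up.
    return float(sq.min(axis=1).sum())


# Hypothetical centers for illustration; a trained model would expose
# its own centers through the clusterCenters attribute.
centers = np.array([[0.5, 0.5], [8.5, 8.5]])
data = np.array([[0.0, 0.0], [1.0, 1.0], [9.0, 8.0], [8.0, 9.0]])

print(nearest_center([0.0, 0.0], centers))  # cluster index for one point
print(cost(data, centers))                  # total cost over all points
```

With these assumed centers, every point lies at squared distance 0.5 from its nearest center, so the total cost over the four points is 2.0. A single point passed to cost is treated as a one-row dataset, mirroring how computeCost accepts either a point or a collection of points.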
Attributes Documentation