KernelDensity

class pyspark.mllib.stat.KernelDensity

Estimate the probability density at the requested points, given an RDD of samples drawn from the population. The estimate uses a Gaussian kernel.

Examples

>>> kd = KernelDensity()
>>> sample = sc.parallelize([0.0, 1.0])
>>> kd.setSample(sample)
>>> kd.setBandwidth(3.0)  # the output below corresponds to a bandwidth of 3.0, not the default 1.0
>>> kd.estimate([0.0, 1.0])
array([ 0.12938758,  0.12938758])

Methods

estimate(points)

Estimate the probability density at the given points.

setBandwidth(bandwidth)

Set the bandwidth of the kernel placed on each sample.

setSample(sample)

Set sample points from the population.

Methods Documentation

estimate(points: Iterable[float]) → numpy.ndarray

Estimate the probability density at each of the given points. Returns a NumPy array with one density per point, in the same order as the input.
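To make the computation concrete, here is a minimal local sketch of a Gaussian kernel density estimate in plain Python. It assumes the kernel is a normal density whose standard deviation equals the bandwidth; `gaussian_kde` is a hypothetical helper written for illustration and is not part of pyspark. Under that assumption, a bandwidth of 3.0 reproduces the value printed in the Examples section.

```python
import math
from typing import Iterable, List


def gaussian_kde(sample: Iterable[float], points: Iterable[float],
                 bandwidth: float = 1.0) -> List[float]:
    """Hypothetical local reference, not the pyspark implementation.

    Assumes a Gaussian kernel with standard deviation equal to `bandwidth`.
    """
    data = list(sample)
    if not data:
        raise ValueError("sample must not be empty")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")

    # Normalising constant of a normal density with std = bandwidth.
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))

    densities = []
    for x in points:
        # Sum the kernel contribution of every sample at point x.
        total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in data)
        # Average over the samples so the estimate integrates to 1.
        densities.append(norm * total / len(data))
    return densities


# Same sample and evaluation points as the Examples section.
print(gaussian_kde([0.0, 1.0], [0.0, 1.0], bandwidth=3.0))
```

This runs on the driver only and is useful for checking small cases by hand; the class itself works on a distributed RDD.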

setBandwidth(bandwidth: float) → None

Set the bandwidth (standard deviation) of the Gaussian kernel placed on each sample. Defaults to 1.0.
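The effect of the bandwidth can be seen with a small local sketch. It again assumes a Gaussian kernel whose standard deviation is the bandwidth; `density_at` is a hypothetical helper for illustration, not a pyspark function. A small bandwidth concentrates the density around the sample points, while a large one spreads it out.

```python
import math


def density_at(x, sample, bandwidth):
    """Hypothetical helper: Gaussian kernel density at a single point x."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)
    return norm * total / len(sample)


sample = [0.0, 1.0]
for h in (0.1, 1.0, 3.0):
    on_sample = density_at(0.0, sample, h)   # density at a sample point
    between = density_at(0.5, sample, h)     # density midway between samples
    print(f"bandwidth={h}: at 0.0 -> {on_sample:.5f}, at 0.5 -> {between:.5f}")
```

With the narrow bandwidth the midpoint receives almost no density, whereas with the wide bandwidth the two values are nearly equal. Choosing the bandwidth is therefore a trade-off between resolving detail and smoothing noise.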

setSample(sample: pyspark.rdd.RDD[float]) → None

Set the sample points drawn from the population. The sample should be an RDD of floats.