Normalizer

class pyspark.mllib.feature.Normalizer(p: float = 2.0)

Normalizes samples individually to unit L^p^ norm.

For any 1 <= p < float('inf'), normalizes samples using sum(abs(vector)^p^)^(1/p)^ as norm.

For p = float('inf'), max(abs(vector)) will be used as norm for normalization.
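The two norm definitions above can be checked by hand outside Spark. The following is a minimal sketch using NumPy, not the library's implementation; `lp_norm` is a hypothetical helper name introduced only for illustration.

```python
import numpy as np


def lp_norm(values, p=2.0):
    """Hypothetical helper: compute the norm described above with NumPy."""
    arr = np.abs(np.asarray(values, dtype=float))
    if p == float("inf"):
        # p = inf: the largest absolute component is the norm.
        return float(arr.max())
    # 1 <= p < inf: sum of |x_i|^p, raised to the power 1/p.
    return float((arr ** p).sum() ** (1.0 / p))


v = [0.0, 1.0, 2.0]
print(lp_norm(v, 1))             # 3.0
print(lp_norm(v, 2))             # sqrt(5), roughly 2.236
print(lp_norm(v, float("inf")))  # 2.0
```

Dividing each component of `v` by these values gives the unit-norm vectors; for p = 1 and p = inf this matches the results shown in the Examples section.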

New in version 1.2.0.

Parameters
p : float, optional

Normalization in L^p^ space, p = 2 by default.

Examples

>>> from pyspark.mllib.feature import Normalizer
>>> from pyspark.mllib.linalg import Vectors
>>> v = Vectors.dense(range(3))
>>> nor = Normalizer(1)
>>> nor.transform(v)
DenseVector([0.0, 0.3333, 0.6667])
>>> rdd = sc.parallelize([v])
>>> nor.transform(rdd).collect()
[DenseVector([0.0, 0.3333, 0.6667])]
>>> nor2 = Normalizer(float("inf"))
>>> nor2.transform(v)
DenseVector([0.0, 0.5, 1.0])

Methods

transform(vector)

Applies unit length normalization on a vector.

Methods Documentation

transform(vector: Union[VectorLike, pyspark.rdd.RDD[VectorLike]]) → Union[pyspark.mllib.linalg.Vector, pyspark.rdd.RDD[pyspark.mllib.linalg.Vector]]

Applies unit length normalization to a vector or to each vector in an RDD.

New in version 1.2.0.

Parameters
vector : pyspark.mllib.linalg.Vector or pyspark.RDD

Vector or RDD of vectors to be normalized.

Returns
pyspark.mllib.linalg.Vector or pyspark.RDD

Normalized vector(s). If the norm of the input is zero, the input vector is returned unchanged.
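The zero-norm rule can be illustrated without a Spark session. This is a rough NumPy sketch of the documented behavior, assuming a plain array stands in for a Vector; `normalize_sketch` is a hypothetical name and is not part of PySpark.

```python
import numpy as np


def normalize_sketch(values, p=2.0):
    """Hypothetical stand-in for transform() on a single dense vector."""
    arr = np.asarray(values, dtype=float)
    # For 1-D input, np.linalg.norm accepts any p >= 1, including inf.
    norm = np.linalg.norm(arr, ord=p)
    if norm == 0.0:
        # Zero norm: return the input as-is instead of dividing by zero.
        return arr
    return arr / norm


print(normalize_sketch([0.0, 1.0, 2.0], 1))
print(normalize_sketch([0.0, 0.0, 0.0]))
```

The explicit zero check avoids a division by zero, which would otherwise fill the result with NaN values.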