Normalizer
class pyspark.mllib.feature.Normalizer(p: float = 2.0)
Normalizes samples individually to unit L^p norm.
For any 1 <= p < float('inf'), normalizes samples using sum(abs(vector)^p)^(1/p) as the norm.
For p = float('inf'), max(abs(vector)) is used as the norm.
New in version 1.2.0.
- Parameters
  - p : float, optional
    Normalization in L^p space; p = 2 by default.
Examples
>>> from pyspark.mllib.feature import Normalizer
>>> from pyspark.mllib.linalg import Vectors
>>> v = Vectors.dense(range(3))
>>> nor = Normalizer(1)
>>> nor.transform(v)
DenseVector([0.0, 0.3333, 0.6667])

>>> rdd = sc.parallelize([v])
>>> nor.transform(rdd).collect()
[DenseVector([0.0, 0.3333, 0.6667])]

>>> nor2 = Normalizer(float("inf"))
>>> nor2.transform(v)
DenseVector([0.0, 0.5, 1.0])
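The norms behind these outputs can be checked by hand. The following is a minimal pure-Python sketch of the L^p norm described above; it is not Spark's implementation, and the helper name lp_norm is hypothetical, introduced only for illustration.

```python
import math

def lp_norm(values, p=2.0):
    """Return the Lp norm of a sequence of numbers, for 1 <= p <= inf.

    Illustrative helper only; not part of pyspark.
    """
    if p < 1:
        raise ValueError("p must satisfy p >= 1")
    magnitudes = [abs(x) for x in values]
    if math.isinf(p):
        # p = inf: the norm is the largest absolute component.
        return max(magnitudes, default=0.0)
    # Finite p: (sum of |x_i|^p)^(1/p).
    return sum(m ** p for m in magnitudes) ** (1.0 / p)

v = [0.0, 1.0, 2.0]  # same values as Vectors.dense(range(3))

# Dividing each component by the norm gives the normalized vector.
l1_scaled = [x / lp_norm(v, 1) for x in v]
linf_scaled = [x / lp_norm(v, float("inf")) for x in v]
```

With p = 1 the norm of v is 3.0, and with p = float("inf") it is 2.0, which matches the 0.3333/0.6667 and 0.5/1.0 components shown in the examples.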
Methods
transform(vector): Applies unit length normalization on a vector.
Methods Documentation
transform(vector: Union[VectorLike, pyspark.rdd.RDD[VectorLike]]) → Union[pyspark.mllib.linalg.Vector, pyspark.rdd.RDD[pyspark.mllib.linalg.Vector]]
Applies unit length normalization on a vector.
New in version 1.2.0.
- Parameters
  - vector : pyspark.mllib.linalg.Vector or pyspark.RDD
    Vector or RDD of vectors to be normalized.
- Returns
  - pyspark.mllib.linalg.Vector or pyspark.RDD
    Normalized vector(s). If the norm of the input is zero, the input vector is returned unchanged.
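The zero-norm rule and the per-element behaviour on a collection can be sketched without Spark. This is an illustration under stated assumptions, not the library's code: the helper unit_l2 is hypothetical, it handles only p = 2, and a plain Python list stands in for an RDD.

```python
import math

def unit_l2(values):
    """Scale values to unit L2 norm; return them unchanged if the norm is zero.

    Illustrative helper only; not part of pyspark.
    """
    norm = math.sqrt(sum(x * x for x in values))
    if norm == 0.0:
        # Mirrors the documented rule: a zero-norm input is returned as-is,
        # which also avoids a division by zero.
        return list(values)
    return [x / norm for x in values]

# A plain list stands in for an RDD; each vector is normalized independently.
batch = [[3.0, 4.0], [0.0, 0.0]]
normalized = [unit_l2(row) for row in batch]
```

Here the first vector has norm 5.0 and is scaled to unit length, while the all-zero vector passes through untouched.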