pyspark.mllib.linalg.Vectors
Factory methods for working with vectors.
Notes
Dense vectors are simply represented as NumPy array objects, so there is no need to convert them for use in MLlib. For sparse vectors, the factory methods in this class create an MLlib-compatible type, or users can pass in SciPy’s scipy.sparse column vectors.
Methods
dense(*elements)
Create a dense vector of 64-bit floats from a Python list or numbers.
fromML(vec)
Convert a vector from the new mllib-local representation.
norm(vector, p)
Find the norm of the given vector.
parse(s)
Parse a string representation back into a Vector.
sparse(size, *args)
Create a sparse vector, using either a dictionary, a list of (index, value) pairs, or two separate arrays of indices and values (sorted by index).
squared_distance(v1, v2)
Squared distance between two vectors.
stringify(vector)
Converts a vector into a string, which can be recognized by Vectors.parse().
zeros(size)
Methods Documentation
Examples
>>> Vectors.dense([1, 2, 3])
DenseVector([1.0, 2.0, 3.0])
>>> Vectors.dense(1.0, 2.0)
DenseVector([1.0, 2.0])
Convert a vector from the new mllib-local representation. This does NOT copy the data; it copies references.
New in version 2.0.0.
Parameters
vec: pyspark.ml.linalg.Vector
Returns
pyspark.mllib.linalg.Vector
>>> Vectors.parse('[2,1,2 ]')
DenseVector([2.0, 1.0, 2.0])
>>> Vectors.parse(' ( 100, [0], [2])')
SparseVector(100, {0: 2.0})
Parameters
size: Size of the vector.
args: Non-zero entries, as a dictionary, list of tuples, or two sorted lists containing indices and values.
>>> Vectors.sparse(4, {1: 1.0, 3: 5.5})
SparseVector(4, {1: 1.0, 3: 5.5})
>>> Vectors.sparse(4, [(1, 1.0), (3, 5.5)])
SparseVector(4, {1: 1.0, 3: 5.5})
>>> Vectors.sparse(4, [1, 3], [1.0, 5.5])
SparseVector(4, {1: 1.0, 3: 5.5})
Squared distance between two vectors. Both arguments can be of type SparseVector, DenseVector, np.ndarray or array.array.
>>> a = Vectors.sparse(4, [(0, 1), (3, 4)])
>>> b = Vectors.dense([2, 5, 4, 1])
>>> a.squared_distance(b)
51.0
>>> Vectors.stringify(Vectors.sparse(2, [1], [1.0]))
'(2,[1],[1.0])'
>>> Vectors.stringify(Vectors.dense([0.0, 1.0]))
'[0.0,1.0]'