NaiveBayes
class pyspark.mllib.classification.NaiveBayes
Train a Multinomial Naive Bayes model.
New in version 0.9.0.
Methods
train(data[, lambda_]): Train a Naive Bayes model given an RDD of (label, features) vectors.
Methods Documentation
classmethod train(data: pyspark.rdd.RDD[pyspark.mllib.regression.LabeledPoint], lambda_: float = 1.0) → pyspark.mllib.classification.NaiveBayesModel
Train a Naive Bayes model given an RDD of (label, features) vectors.
This is Multinomial Naive Bayes, which can handle all kinds of discrete data. For example, converting documents into TF-IDF vectors lets it be used for document classification. Making every feature vector a 0-1 vector lets it also be used as Bernoulli NB. Input feature values must be nonnegative.
New in version 0.9.0.
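To make the role of the smoothing parameter concrete, here is a minimal pure-Python sketch of multinomial Naive Bayes with textbook additive smoothing. It is not taken from Spark's source. The function names `fit_multinomial_nb` and `predict` are hypothetical, and the formulas are the standard ones, assumed here for illustration only.

```python
import math
from collections import defaultdict


def fit_multinomial_nb(rows, lambda_=1.0):
    """Estimate log priors and log feature likelihoods with additive smoothing.

    rows: list of (label, features) pairs; features are nonnegative numbers.
    Returns (labels, log_prior, log_theta), keyed by label.
    """
    num_features = len(rows[0][1])
    doc_counts = defaultdict(int)  # number of rows seen per label
    feature_sums = defaultdict(lambda: [0.0] * num_features)  # per-label feature totals
    for label, features in rows:
        if any(v < 0 for v in features):
            raise ValueError("feature values must be nonnegative")
        doc_counts[label] += 1
        sums = feature_sums[label]
        for j, v in enumerate(features):
            sums[j] += v

    labels = sorted(doc_counts)
    n = len(rows)
    log_prior = {}
    log_theta = {}
    for c in labels:
        # Smoothed class prior: (count_c + lambda) / (n + num_labels * lambda)
        log_prior[c] = math.log(doc_counts[c] + lambda_) - math.log(n + len(labels) * lambda_)
        # Smoothed feature likelihood: (sum_cj + lambda) / (sum_c + num_features * lambda)
        total = sum(feature_sums[c]) + num_features * lambda_
        log_theta[c] = [math.log(s + lambda_) - math.log(total) for s in feature_sums[c]]
    return labels, log_prior, log_theta


def predict(model, features):
    """Return the label maximizing log prior + sum_j x_j * log theta_cj."""
    labels, log_prior, log_theta = model

    def score(c):
        return log_prior[c] + sum(x * t for x, t in zip(features, log_theta[c]))

    return max(labels, key=score)


# Toy training set: two rows for label 0.0, one row for label 1.0.
rows = [(0.0, [0.0, 0.0]), (0.0, [0.0, 1.0]), (1.0, [1.0, 0.0])]
model = fit_multinomial_nb(rows, lambda_=1.0)
```

In this sketch, a positive `lambda_` keeps a feature that never occurs with a label from receiving zero probability. With `lambda_=0` the logarithm of such a count is undefined and `math.log` raises an error.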
Parameters
- data : pyspark.RDD
  The training data, an RDD of pyspark.mllib.regression.LabeledPoint.
- lambda_ : float, optional
  The smoothing parameter. (default: 1.0)
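The following sketch shows one way to call `train` on features reduced to 0-1 vectors, as described for the Bernoulli-style use. The helpers `to_binary`, `train_on_binary_features`, and `demo` are hypothetical names introduced for illustration. The `local[1]` master and the application name are assumptions, and `demo` needs a working local Spark installation.

```python
def to_binary(features):
    """Map nonnegative feature values to a 0-1 vector (1.0 where the value is positive)."""
    if any(v < 0 for v in features):
        raise ValueError("feature values must be nonnegative")
    return [1.0 if v > 0 else 0.0 for v in features]


def train_on_binary_features(sc, rows, lambda_=1.0):
    """Train on (label, features) pairs after binarizing the features.

    sc is an existing SparkContext. pyspark is imported inside the function
    so that to_binary stays usable where Spark is not installed.
    """
    from pyspark.mllib.classification import NaiveBayes
    from pyspark.mllib.regression import LabeledPoint

    points = [LabeledPoint(label, to_binary(feats)) for label, feats in rows]
    return NaiveBayes.train(sc.parallelize(points), lambda_)


def demo():
    """Hypothetical driver: trains on toy count data and predicts one label."""
    from pyspark import SparkContext
    from pyspark.mllib.linalg import Vectors

    sc = SparkContext("local[1]", "naive-bayes-sketch")  # assumed local master
    try:
        rows = [(0.0, [0, 0]), (0.0, [0, 3]), (1.0, [2, 0])]  # toy count data
        model = train_on_binary_features(sc, rows, lambda_=1.0)
        return model.predict(Vectors.dense(to_binary([0, 5])))
    finally:
        sc.stop()  # always release the context
```

Binarizing before building each `LabeledPoint` keeps the nonnegativity check in one place, so invalid input fails before any Spark job is submitted.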