pyspark.mllib.classification.NaiveBayes
Train a Multinomial Naive Bayes model.
New in version 0.9.0.
Methods
train(data[, lambda_])
Train a Naive Bayes model given an RDD of (label, features) vectors.
Methods Documentation
classmethod train(data, lambda_=1.0)
This is the multinomial naive Bayes (NB) variant, which can handle all kinds of discrete data. For example, once documents are converted into TF-IDF vectors, it can be used for document classification. If every vector is a 0-1 vector, it can also be used as Bernoulli NB. The input feature values must be nonnegative.
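To make the role of smoothing concrete, here is a minimal pure-Python sketch of the textbook multinomial NB estimate with additive smoothing. It is an illustration only and is not claimed to be Spark's implementation. The names train_multinomial_nb and predict are hypothetical helpers introduced for this example, and the three training points are made-up toy data.

```python
import math
from collections import defaultdict


def train_multinomial_nb(points, lambda_=1.0):
    """Fit log priors and log feature weights from (label, features) pairs.

    Hypothetical helper for illustration; feature values must be nonnegative.
    lambda_ must be > 0 here, otherwise an unseen feature gives log(0).
    """
    if not points:
        raise ValueError("need at least one training point")
    num_features = len(points[0][1])
    doc_counts = defaultdict(int)   # number of points per label
    feature_sums = {}               # per-label sum of each feature
    for label, features in points:
        if any(x < 0 for x in features):
            raise ValueError("feature values must be nonnegative")
        doc_counts[label] += 1
        sums = feature_sums.setdefault(label, [0.0] * num_features)
        for j, x in enumerate(features):
            sums[j] += x

    labels = sorted(doc_counts)
    n = len(points)
    log_prior, log_theta = {}, {}
    for c in labels:
        # Smoothed class prior: (N_c + lambda) / (N + lambda * num_classes)
        log_prior[c] = math.log(
            (doc_counts[c] + lambda_) / (n + lambda_ * len(labels))
        )
        # Smoothed feature weights: (sum_cj + lambda) / (sum_c + lambda * D)
        total = sum(feature_sums[c]) + lambda_ * num_features
        log_theta[c] = [math.log((s + lambda_) / total) for s in feature_sums[c]]
    return labels, log_prior, log_theta


def predict(model, features):
    """Return the label with the highest log posterior score."""
    labels, log_prior, log_theta = model

    def score(c):
        return log_prior[c] + sum(x * w for x, w in zip(features, log_theta[c]))

    return max(labels, key=score)


# Toy 0-1 vectors, i.e. the Bernoulli-style input mentioned above.
points = [(0.0, [0.0, 0.0]), (0.0, [0.0, 1.0]), (1.0, [1.0, 0.0])]
model = train_multinomial_nb(points, lambda_=1.0)
```

A larger lambda_ pulls the estimated priors and feature weights toward uniform, which is why a feature never seen with a class still receives a small nonzero probability.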
Parameters
data : pyspark.RDD
The training data, an RDD of pyspark.mllib.regression.LabeledPoint.
lambda_ : float, optional
The smoothing parameter. (default: 1.0)