BinaryClassificationMetrics

class pyspark.mllib.evaluation.BinaryClassificationMetrics(scoreAndLabels: pyspark.rdd.RDD[Tuple[float, float]])

Evaluator for binary classification.

New in version 1.4.0.

Parameters
scoreAndLabels : pyspark.RDD

An RDD of (score, label) tuples, or (score, label, weight) tuples when an optional weight is supplied.

Examples

>>> from pyspark.mllib.evaluation import BinaryClassificationMetrics
>>> scoreAndLabels = sc.parallelize([
...     (0.1, 0.0), (0.1, 1.0), (0.4, 0.0), (0.6, 0.0), (0.6, 1.0), (0.6, 1.0), (0.8, 1.0)], 2)
>>> metrics = BinaryClassificationMetrics(scoreAndLabels)
>>> metrics.areaUnderROC
0.70...
>>> metrics.areaUnderPR
0.83...
>>> metrics.unpersist()
>>> scoreAndLabelsWithOptWeight = sc.parallelize([
...     (0.1, 0.0, 1.0), (0.1, 1.0, 0.4), (0.4, 0.0, 0.2), (0.6, 0.0, 0.6), (0.6, 1.0, 0.9),
...     (0.6, 1.0, 0.5), (0.8, 1.0, 0.7)], 2)
>>> metrics = BinaryClassificationMetrics(scoreAndLabelsWithOptWeight)
>>> metrics.areaUnderROC
0.79...
>>> metrics.areaUnderPR
0.88...

Methods

call(name, *a)

Call the method name of java_model, passing a as its arguments.

unpersist()

Unpersists intermediate RDDs used in the computation.

Attributes

areaUnderPR

Computes the area under the precision-recall curve.

areaUnderROC

Computes the area under the receiver operating characteristic (ROC) curve.

Methods Documentation

call(name: str, *a: Any) → Any

Call the method name of java_model, passing a as its arguments.

unpersist() → None

Unpersists intermediate RDDs used in the computation.

New in version 1.4.0.
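
Because unpersist() has to be called explicitly, it can be convenient to tie it to a with block so that the intermediate RDDs are released even when reading a metric raises. The following is a minimal sketch, not part of PySpark: the helper name released is hypothetical, and it only assumes that the object passed in has an unpersist() method.

```python
from contextlib import contextmanager


@contextmanager
def released(metrics):
    """Yield `metrics`, then call its unpersist() even if the body raises.

    `released` is a hypothetical helper, not a PySpark API. It works with
    any object that exposes an unpersist() method.
    """
    try:
        yield metrics
    finally:
        metrics.unpersist()


# Intended usage (assumes an existing SparkContext and a scoreAndLabels RDD):
#
# with released(BinaryClassificationMetrics(scoreAndLabels)) as metrics:
#     auc = metrics.areaUnderROC
```

The try/finally form is used instead of contextlib.closing because closing calls close(), whereas this class exposes unpersist().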

Attributes Documentation

areaUnderPR

Computes the area under the precision-recall curve.

New in version 1.4.0.
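
To make the reported value easier to interpret, the sketch below recomputes the area in plain Python, without Spark. It is an illustration under stated assumptions, not Spark's implementation: labels above 0.5 are treated as positive, every distinct score is used as a threshold, a starting point (0, precision at the highest threshold) is prepended to the curve, and the area is taken with the trapezoidal rule. The function name area_under_pr is hypothetical. Under these assumptions it reproduces the 0.83... and 0.88... values from the Examples section.

```python
def area_under_pr(rows):
    """Trapezoidal area under a precision-recall curve.

    rows: iterable of (score, label) or (score, label, weight) tuples.
    Hypothetical helper for illustration; assumes positive weights and
    at least one positive label.
    """
    pos, neg = {}, {}
    for row in rows:
        score, label = row[0], row[1]
        weight = row[2] if len(row) > 2 else 1.0
        bucket = pos if label > 0.5 else neg  # assumption: > 0.5 is positive
        bucket[score] = bucket.get(score, 0.0) + weight

    total_pos = sum(pos.values())
    curve = []  # (recall, precision), from highest threshold to lowest
    tp = fp = 0.0
    for score in sorted(set(pos) | set(neg), reverse=True):
        tp += pos.get(score, 0.0)
        fp += neg.get(score, 0.0)
        curve.append((tp / total_pos, tp / (tp + fp)))

    # Assumed starting point: recall 0 at the first point's precision.
    curve.insert(0, (0.0, curve[0][1]))

    return sum(
        (r1 - r0) * (p0 + p1) / 2.0
        for (r0, p0), (r1, p1) in zip(curve, curve[1:])
    )


unweighted = [(0.1, 0.0), (0.1, 1.0), (0.4, 0.0), (0.6, 0.0),
              (0.6, 1.0), (0.6, 1.0), (0.8, 1.0)]
print(area_under_pr(unweighted))
```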

areaUnderROC

Computes the area under the receiver operating characteristic (ROC) curve.

New in version 1.4.0.
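
As a cross-check on what this attribute measures, the sketch below rebuilds the ROC curve in plain Python, without Spark, and integrates it with the trapezoidal rule. It is an illustration under stated assumptions, not Spark's implementation: labels above 0.5 are treated as positive, every distinct score is used as a threshold, and the curve is anchored at (0, 0) and (1, 1). The function names roc_points and area_under_roc are hypothetical. Under these assumptions it reproduces the 0.70... and 0.79... values from the Examples section.

```python
def roc_points(rows):
    """Return (false positive rate, true positive rate) points.

    rows: iterable of (score, label) or (score, label, weight) tuples.
    Hypothetical helper for illustration; assumes positive weights and
    at least one positive and one negative label.
    """
    pos, neg = {}, {}
    for row in rows:
        score, label = row[0], row[1]
        weight = row[2] if len(row) > 2 else 1.0
        bucket = pos if label > 0.5 else neg  # assumption: > 0.5 is positive
        bucket[score] = bucket.get(score, 0.0) + weight

    total_pos = sum(pos.values())
    total_neg = sum(neg.values())

    points = [(0.0, 0.0)]
    tp = fp = 0.0
    # Lower the threshold one distinct score at a time.
    for score in sorted(set(pos) | set(neg), reverse=True):
        tp += pos.get(score, 0.0)
        fp += neg.get(score, 0.0)
        points.append((fp / total_neg, tp / total_pos))
    points.append((1.0, 1.0))
    return points


def area_under_roc(rows):
    """Trapezoidal area under the points from roc_points()."""
    pts = roc_points(rows)
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(pts, pts[1:])
    )


weighted = [(0.1, 0.0, 1.0), (0.1, 1.0, 0.4), (0.4, 0.0, 0.2), (0.6, 0.0, 0.6),
            (0.6, 1.0, 0.9), (0.6, 1.0, 0.5), (0.8, 1.0, 0.7)]
print(area_under_roc(weighted))
```

Grouping rows by score before accumulating means tied scores move the curve diagonally in one step, which is why the pair (0.6, 0.0) and (0.6, 1.0) contributes half credit instead of depending on row order.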