pyspark.RDD.sumApprox
RDD.sumApprox(timeout: int, confidence: float = 0.95) → pyspark.rdd.BoundedFloat
Approximate operation that returns the sum once the timeout expires or the desired confidence is met, whichever comes first.
New in version 1.2.0.
Parameters
timeout : int
    maximum time to wait for the job, in milliseconds
confidence : float
    the desired statistical confidence in the result (default 0.95)
Returns
BoundedFloat
    a potentially incomplete result, with error bounds
Examples
>>> rdd = sc.parallelize(range(1000), 10)
>>> r = sum(range(1000))
>>> abs(rdd.sumApprox(1000) - r) / r < 0.05
True
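The returned BoundedFloat behaves like an ordinary float, which is why the example above can subtract from it directly; it also carries `low`, `high`, and `confidence` attributes describing the error bounds. The sketch below shows one way to inspect those bounds. So that it runs without a Spark cluster, it uses `StandInBoundedFloat`, a hypothetical stand-in modeled on the shape of `pyspark.rdd.BoundedFloat`, and `describe`, a hypothetical helper; the numbers are illustrative and not output from a real job.

```python
class StandInBoundedFloat(float):
    """Hypothetical stand-in mimicking the shape of pyspark.rdd.BoundedFloat."""

    def __new__(cls, mean, confidence, low, high):
        # The float value itself is the point estimate (the mean).
        obj = float.__new__(cls, mean)
        obj.confidence = confidence
        obj.low = low
        obj.high = high
        return obj


def describe(result):
    """Summarize an approximate sum: point estimate, bounds, interval width."""
    return {
        "estimate": float(result),
        "low": result.low,
        "high": result.high,
        "confidence": result.confidence,
        "width": result.high - result.low,
    }


# Illustrative numbers only; with Spark available, `approx` would instead
# be the value returned by rdd.sumApprox(1000).
approx = StandInBoundedFloat(499500.0, 0.95, 499000.0, 500000.0)
summary = describe(approx)

# Because the result is a float, ordinary arithmetic works on it directly.
relative_error = abs(approx - 499500) / 499500
```

A wide `high - low` interval suggests the job was cut off by the timeout before many partitions finished, so a caller might retry with a larger `timeout` when the width exceeds what the application can tolerate.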