pyspark.RDD.reduceByKeyLocally¶

RDD.reduceByKeyLocally(func: Callable[[V, V], V]) → Dict[K, V][source]¶

Merge the values for each key using an associative and commutative reduce function, but return the results immediately to the master as a dictionary.

This will also perform the merging locally on each mapper before sending results to a reducer, similarly to a “combiner” in MapReduce.

New in version 0.7.0.

Parameters

funcfunction: the reduce function

Returns

dict: a dict containing the keys and the aggregated result for each key

See also

RDD.reduceByKey()
RDD.aggregateByKey()

Examples

>>> from operator import add
>>> rdd = sc.parallelize([("a", 1), ("b", 1), ("a", 1)])
>>> sorted(rdd.reduceByKeyLocally(add).items())
[('a', 2), ('b', 1)]

pyspark.RDD.reduceByKey

pyspark.RDD.repartition