RDD.groupWith(other, *others)
Alias for cogroup but with support for multiple RDDs.
New in version 0.7.0.
Parameters
other : RDD
    another RDD
others : RDD
    other RDDs
Returns
RDD
    an RDD containing the keys and cogrouped values
See also
RDD.cogroup()
RDD.join()
Examples
>>> rdd1 = sc.parallelize([("a", 5), ("b", 6)])
>>> rdd2 = sc.parallelize([("a", 1), ("b", 4)])
>>> rdd3 = sc.parallelize([("a", 2)])
>>> rdd4 = sc.parallelize([("b", 42)])
>>> [(x, tuple(map(list, y))) for x, y in
...     sorted(list(rdd1.groupWith(rdd2, rdd3, rdd4).collect()))]
[('a', ([5], [1], [2], [])), ('b', ([6], [4], [], [42]))]
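The shape of the result can be illustrated without a Spark cluster. The following is a minimal pure-Python sketch of the grouping semantics only, not Spark's implementation: the helper name group_with is hypothetical, and plain lists of (key, value) pairs stand in for RDDs. Each key maps to a tuple with one list per input, and an input that lacks the key contributes an empty list.

```python
from collections import defaultdict


def group_with(*datasets):
    """Hypothetical local stand-in for rdd.groupWith(...).

    Each dataset is a list of (key, value) pairs. The result maps every key
    to a tuple holding one list of values per input dataset, in input order.
    """
    # One independent empty list per dataset is created for each new key.
    grouped = defaultdict(lambda: tuple([] for _ in datasets))
    for index, pairs in enumerate(datasets):
        for key, value in pairs:
            grouped[key][index].append(value)
    # Sorted by key so the output order is deterministic.
    return sorted(grouped.items())


result = group_with(
    [("a", 5), ("b", 6)],
    [("a", 1), ("b", 4)],
    [("a", 2)],
    [("b", 42)],
)
print(result)
```

With the same data as the example above, the sketch yields the same nested structure: key "a" has no value in the fourth input and key "b" has none in the third, so those positions hold empty lists.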