RDD.cogroup
For each key k in self or other, return a resulting RDD that contains a tuple (k, (values for k in self, values for k in other)). A key present in only one of the two RDDs gets an empty group of values for the other side.
New in version 0.7.0.
Parameters
other : RDD
    another RDD
Returns
RDD
    an RDD containing the keys and cogrouped values
See also
RDD.groupWith()
RDD.join()
Examples
>>> rdd1 = sc.parallelize([("a", 1), ("b", 4)])
>>> rdd2 = sc.parallelize([("a", 2)])
>>> [(x, tuple(map(list, y))) for x, y in sorted(list(rdd1.cogroup(rdd2).collect()))]
[('a', ([1], [2])), ('b', ([4], []))]
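The grouping rule described above can also be modeled without a Spark context. The following is a minimal pure-Python sketch of the semantics only, operating on plain lists of (key, value) pairs. It is not how Spark implements cogroup, and the helper name `cogroup_local` is hypothetical, introduced here for illustration.

```python
from collections import defaultdict


def cogroup_local(left, right):
    """Model cogroup semantics on two lists of (key, value) pairs.

    Hypothetical helper for illustration; not part of PySpark.
    """
    # Each key maps to a pair of lists: (values from left, values from right).
    grouped = defaultdict(lambda: ([], []))
    for key, value in left:
        grouped[key][0].append(value)
    for key, value in right:
        grouped[key][1].append(value)
    # Sort by key so the result is deterministic, as the doctest does with sorted().
    return sorted(grouped.items())


pairs1 = [("a", 1), ("b", 4)]
pairs2 = [("a", 2)]
result = cogroup_local(pairs1, pairs2)
print(result)  # [('a', ([1], [2])), ('b', ([4], []))]
```

Key "b" appears only in the first input, so its second list is empty, which matches the `('b', ([4], []))` entry in the doctest output.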