RDD.leftOuterJoin(other, numPartitions=None)
Perform a left outer join of self and other.
For each element (k, v) in self, the resulting RDD will contain either all pairs (k, (v, w)) for each element (k, w) in other, or the pair (k, (v, None)) if no element in other has key k.
Hash-partitions the resulting RDD into the given number of partitions.
New in version 0.7.0.
Parameters
other : RDD
another RDD
numPartitions : int, optional
the number of partitions in new RDD
Returns
RDD
an RDD containing all pairs of elements with matching keys
See also
RDD.join()
RDD.rightOuterJoin()
RDD.fullOuterJoin()
pyspark.sql.DataFrame.join()
Examples
>>> rdd1 = sc.parallelize([("a", 1), ("b", 4)])
>>> rdd2 = sc.parallelize([("a", 2)])
>>> sorted(rdd1.leftOuterJoin(rdd2).collect())
[('a', (1, 2)), ('b', (4, None))]
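The join semantics described above can also be sketched in plain Python, without a Spark context. This is only an illustration: the helper `left_outer_join_local` is a hypothetical name, not part of PySpark. It works on in-memory lists of pairs and ignores partitioning entirely. It shows the case the doctest does not cover, where a key in self matches several elements in other.

```python
from collections import defaultdict


def left_outer_join_local(left, right):
    """Illustrate left outer join semantics on lists of (key, value) pairs.

    Hypothetical helper for explanation only; not a PySpark API.
    """
    # Group the right-hand values by key so lookups are cheap.
    right_by_key = defaultdict(list)
    for k, w in right:
        right_by_key[k].append(w)

    result = []
    for k, v in left:
        matches = right_by_key.get(k)
        if matches:
            # One output pair per matching right-hand value.
            result.extend((k, (v, w)) for w in matches)
        else:
            # No match: keep the left element, padded with None.
            result.append((k, (v, None)))
    return result


left = [("a", 1), ("b", 4)]
right = [("a", 2), ("a", 3), ("c", 9)]
joined = sorted(left_outer_join_local(left, right))
print(joined)
# [('a', (1, 2)), ('a', (1, 3)), ('b', (4, None))]
```

Key "a" yields one pair per match, key "b" is kept with None, and key "c" is dropped because it appears only on the right-hand side.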