pyspark.RDD.intersection
RDD.intersection(other)
Return the intersection of this RDD and another one. The output will not contain any duplicate elements, even if the input RDDs did.
Notes
This method performs a shuffle internally.
Examples
>>> rdd1 = sc.parallelize([1, 10, 2, 3, 4, 5])
>>> rdd2 = sc.parallelize([1, 6, 2, 3, 7, 8])
>>> rdd1.intersection(rdd2).collect()
[1, 2, 3]
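The deduplication and the shuffle described above can be illustrated without a Spark cluster. The following is a minimal plain-Python sketch of the semantics only, not Spark's implementation. The helper name intersection_model is hypothetical, and the sketch assumes the elements are hashable.

```python
from collections import defaultdict


def intersection_model(left, right):
    """Plain-Python model of RDD.intersection semantics (not Spark's code)."""
    # Group by element value, recording which input(s) each value came from.
    # In Spark, bringing equal values together across partitions is the step
    # that requires a shuffle.
    sides = defaultdict(set)
    for value in left:
        sides[value].add("left")
    for value in right:
        sides[value].add("right")
    # Keep each value seen in both inputs exactly once, so duplicates
    # in either input do not appear in the result.
    return [value for value, seen in sides.items() if len(seen) == 2]


# Both inputs contain duplicates (2 on the left, 3 on the right).
result = intersection_model([1, 10, 2, 2, 3, 4, 5], [1, 6, 2, 3, 3, 7, 8])
print(sorted(result))  # [1, 2, 3]
```

The result is sorted before printing because the order of elements returned by collect() on a shuffled RDD generally depends on partitioning. Sorting before comparing is the safer habit in tests.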