pyspark.RDD.mapValues

RDD.mapValues(f: Callable[[V], U]) → pyspark.rdd.RDD[Tuple[K, U]]

Pass each value in the key-value pair RDD through a map function without changing the keys. Because the keys are unchanged, the original RDD’s partitioning is also retained.

New in version 0.7.0.

Parameters
f : function

a function to turn a V into a U

Returns
RDD

an RDD containing the keys and the mapped values

Examples

>>> rdd = sc.parallelize([("a", ["apple", "banana", "lemon"]), ("b", ["grapes"])])
>>> def f(x): return len(x)
>>> rdd.mapValues(f).collect()
[('a', 3), ('b', 1)]
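
The semantics can also be pictured without a Spark context. The following is a minimal plain-Python sketch, not PySpark's actual implementation: it assumes an RDD can be modelled as a list of partitions, each holding (key, value) pairs, and the helper name `map_values_model` is hypothetical. It illustrates why the partitioning can be retained: only the values change, so no pair ever needs to move to a different partition.

```python
from typing import Callable, Hashable, List, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")
U = TypeVar("U")


def map_values_model(
    partitions: List[List[Tuple[K, V]]], f: Callable[[V], U]
) -> List[List[Tuple[K, U]]]:
    """Hypothetical model of mapValues: apply f to each value, partition by partition."""
    # Keys are copied unchanged and every pair stays in its original partition.
    return [[(key, f(value)) for key, value in part] for part in partitions]


# Assumed layout: two partitions holding the same data as the example above.
parts = [[("a", ["apple", "banana", "lemon"])], [("b", ["grapes"])]]
result = map_values_model(parts, len)
print(result)  # [[('a', 3)], [('b', 1)]]
```

Flattening the partitions of `result` gives the same pairs as the `collect()` call in the example above.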