pyspark.sql.functions.width_bucket

pyspark.sql.functions.width_bucket(v, min, max, numBucket)

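Returns the bucket number into which the value of this expression falls in an equi-width histogram with numBucket buckets spanning the range between min and max. Values outside that range are assigned to bucket 0 or bucket numBucket + 1, and if min is greater than max, bucket numbers are assigned in reverse order. The method returns null if any argument is null, if numBucket is not greater than 0, or if min equals max.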
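The bucket rule can be written as a worked formula. This is a sketch inferred from the example output on this page and from the usual equi-width histogram semantics, not a formula quoted from the Spark source; $\ell$ and $u$ are names introduced here for the smaller and larger of the two bounds.

```latex
\ell = \min(\mathit{min}, \mathit{max}), \qquad u = \max(\mathit{min}, \mathit{max}), \qquad n = \mathit{numBucket}

b(v) =
\begin{cases}
0 & v < \ell \\
\left\lfloor n \cdot \dfrac{v - \ell}{u - \ell} \right\rfloor + 1 & \ell \le v < u \\
n + 1 & v \ge u
\end{cases}
\qquad
\text{result} =
\begin{cases}
b(v) & \mathit{min} < \mathit{max} \\
n + 1 - b(v) & \mathit{min} > \mathit{max}
\end{cases}

% First example row: v = 5.3, min = 0.2, max = 10.6, n = 5
\left\lfloor 5 \cdot \frac{5.3 - 0.2}{10.6 - 0.2} \right\rfloor + 1 = \lfloor 2.45\ldots \rfloor + 1 = 3
```

The last example row shows the reversal: with min = 5.2, max = 0.5, n = 2, the value -0.9 lies below $\ell = 0.5$, so $b(v) = 0$ and the result is $2 + 1 - 0 = 3$.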

New in version 3.5.0.

Parameters

v (str or Column): value for which to compute the bucket number in the histogram
min (str or Column): minimum value of the histogram range
max (str or Column): maximum value of the histogram range
numBucket (str, Column or int): number of buckets; must be greater than 0

Returns

Column: the bucket number into which the value falls

Examples

>>> from pyspark.sql.functions import width_bucket
>>> df = spark.createDataFrame([
...     (5.3, 0.2, 10.6, 5),
...     (-2.1, 1.3, 3.4, 3),
...     (8.1, 0.0, 5.7, 4),
...     (-0.9, 5.2, 0.5, 2)],
...     ['v', 'min', 'max', 'n'])
>>> df.select(width_bucket('v', 'min', 'max', 'n')).show()
+----------------------------+
|width_bucket(v, min, max, n)|
+----------------------------+
|                           3|
|                           0|
|                           5|
|                           3|
+----------------------------+
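To check the example output without a Spark session, here is a minimal pure-Python model of the bucket rule. `width_bucket_model` is a hypothetical helper written for illustration, not part of PySpark, and its null rules (null argument, non-positive bucket count, equal bounds, NaN value, non-finite bounds) are assumptions about Spark's behavior rather than something this page fully documents.

```python
import math
from typing import Optional


def width_bucket_model(v: Optional[float], lo: Optional[float],
                       hi: Optional[float], num_bucket: Optional[int]) -> Optional[int]:
    """Hypothetical pure-Python model of width_bucket (illustration only)."""
    # Null-producing inputs (assumed to mirror Spark): any null argument,
    # a non-positive bucket count, an empty range, a NaN value,
    # or non-finite bounds.
    if v is None or lo is None or hi is None or num_bucket is None:
        return None
    if num_bucket <= 0 or lo == hi or math.isnan(v):
        return None
    if not (math.isfinite(lo) and math.isfinite(hi)):
        return None

    lower, upper = min(lo, hi), max(lo, hi)
    if v < lower:
        bucket = 0                      # underflow bucket
    elif v >= upper:
        bucket = num_bucket + 1         # overflow bucket
    else:
        # Equi-width buckets numbered 1..num_bucket inside [lower, upper).
        bucket = int(num_bucket * (v - lower) / (upper - lower)) + 1

    # When min > max the histogram is descending, so mirror the bucket number.
    return num_bucket + 1 - bucket if lo > hi else bucket


rows = [
    (5.3, 0.2, 10.6, 5),
    (-2.1, 1.3, 3.4, 3),
    (8.1, 0.0, 5.7, 4),
    (-0.9, 5.2, 0.5, 2),
]
print([width_bucket_model(*r) for r in rows])  # [3, 0, 5, 3]
```

Fed the four rows of the DataFrame above, the model reproduces the `show()` output; for real workloads, use the `width_bucket` column function so the computation runs inside Spark.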