pyspark.pandas.range

pyspark.pandas.range(start, end=None, step=1, num_partitions=None)

Create a DataFrame with some range of numbers.

The resulting DataFrame has a single int64 column named id, containing the values from start (inclusive) to end (exclusive) in increments of step. If only the first parameter (start) is specified, it is treated as the end value and the start value defaults to 0.
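
The argument handling described above can be sketched in plain Python. This is only an illustration of the documented semantics, not the library's implementation; `expected_ids` is a hypothetical helper that does not exist in pyspark.pandas, and it returns a list rather than a DataFrame.

```python
from typing import List, Optional


def expected_ids(start: int, end: Optional[int] = None, step: int = 1) -> List[int]:
    """Return the values the ``id`` column is documented to contain.

    Hypothetical helper for illustration; it is not part of pyspark.pandas.
    """
    if end is None:
        # A single argument is treated as the end value; start defaults to 0.
        start, end = 0, start
    # Python's built-in range is also start-inclusive and end-exclusive.
    return list(range(start, end, step))


print(expected_ids(5))             # [0, 1, 2, 3, 4]
print(expected_ids(100, 200, 20))  # [100, 120, 140, 160, 180]
```

The two calls mirror the examples on this page, so the printed lists should match the id columns shown there.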

This is like the range function in SparkSession and is used primarily for testing.

Parameters

start : int
    The start value (inclusive).
end : int, optional
    The end value (exclusive).
step : int, optional, default 1
    The incremental step.
num_partitions : int, optional
    The number of partitions of the DataFrame.

Returns

DataFrame

Examples

When only the first parameter is specified, the range runs from 0 up to, but not including, that number.

>>> import pyspark.pandas as ps
>>> ps.range(5)
   id
0   0
1   1
2   2
3   3
4   4

When start, end, and step are specified:

>>> ps.range(start=100, end=200, step=20)
    id
0  100
1  120
2  140
3  160
4  180