pyspark.pandas.range

pyspark.pandas.range(start, end=None, step=1, num_partitions=None)

Create a DataFrame containing a range of numbers.

The resulting DataFrame has a single int64 column named id, containing the values from start (inclusive) to end (exclusive), spaced by step. If only the first parameter (start) is specified, it is treated as the end value and the start value defaults to 0.
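The argument handling described above can be modeled in plain Python. The helper `range_ids` below is hypothetical and is not part of pyspark.pandas; it only mirrors the documented rule that a lone first argument is treated as the end value, and uses Python's built-in `range` (also start-inclusive, end-exclusive) to list the values the id column would hold.

```python
def range_ids(start, end=None, step=1):
    """Hypothetical model of the values in the ``id`` column of ps.range.

    Not part of pyspark.pandas; it only mirrors the documented rules.
    """
    if end is None:
        # Only one argument given: treat it as the end value and start at 0.
        start, end = 0, start
    # Python's built-in range is also start-inclusive and end-exclusive.
    return list(range(start, end, step))


print(range_ids(5))             # [0, 1, 2, 3, 4]
print(range_ids(100, 200, 20))  # [100, 120, 140, 160, 180]
```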

This is similar to SparkSession.range and is used primarily for testing.

Parameters
start : int

the start value (inclusive)

end : int, optional

the end value (exclusive)

step : int, optional, default 1

the increment between consecutive values

num_partitions : int, optional

the number of partitions of the DataFrame
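num_partitions controls how the rows are distributed, not which values are generated. As a rough mental model only, the hypothetical helper below divides the values into contiguous, nearly equal chunks, one per partition. This is plain Python, not pyspark code, and it assumes an even contiguous split; Spark's exact partition boundaries may differ.

```python
def split_into_partitions(start, end, step, num_partitions):
    """Hypothetical illustration of spreading range values over partitions.

    Assumption: an even, contiguous split; not taken from Spark's source.
    """
    values = list(range(start, end, step))
    n = len(values)
    chunks = []
    for i in range(num_partitions):
        # Integer arithmetic keeps chunk sizes within one element of each other.
        lo = i * n // num_partitions
        hi = (i + 1) * n // num_partitions
        chunks.append(values[lo:hi])
    return chunks


print(split_into_partitions(0, 10, 1, 3))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
```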

Returns
DataFrame

Examples

When only the first parameter is specified, the values run from 0 up to, but not including, that number.

>>> ps.range(5)
   id
0   0
1   1
2   2
3   3
4   4

When start, end, and step are specified:

>>> ps.range(start=100, end=200, step=20)
    id
0  100
1  120
2  140
3  160
4  180
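Under the start-inclusive, end-exclusive rule, the number of rows for a nonzero step is max(0, ceil((end - start) / step)), which gives (200 - 100) / 20 = 5 for the example above. A small plain-Python check of that arithmetic follows; the helper name `expected_row_count` is hypothetical and not part of pyspark.pandas.

```python
def expected_row_count(start, end, step=1):
    """Hypothetical helper: count of values in [start, end) for a nonzero step."""
    if step == 0:
        raise ValueError("step must be nonzero")
    # -(-a // b) is integer ceiling division, valid for negative b as well.
    return max(0, -(-(end - start) // step))


print(expected_row_count(100, 200, 20))  # 5, matching the example above
print(expected_row_count(0, 5))          # 5, matching ps.range(5)
```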