pyspark.pandas.date_range

pyspark.pandas.date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, inclusive='both', **kwargs)

Return a fixed frequency DatetimeIndex.

Parameters
start : str or datetime-like, optional

Left bound for generating dates.

end : str or datetime-like, optional

Right bound for generating dates.

periods : int, optional

Number of periods to generate.

freq : str or DateOffset, default ‘D’

Frequency strings can have multiples, e.g. ‘5H’.

tz : str or tzinfo, optional

Time zone name for returning localized DatetimeIndex, for example ‘Asia/Hong_Kong’. By default, the resulting DatetimeIndex is time zone naive.

normalize : bool, default False

Normalize the start/end dates to midnight before generating the date range.

name : str, default None

Name of the resulting DatetimeIndex.

inclusive : {“both”, “neither”, “left”, “right”}, default “both”

Which boundaries to include; determines whether each bound is closed (included) or open (excluded).

New in version 4.0.0.

**kwargs

For compatibility. Has no effect on the result.

Returns
rng : DatetimeIndex

See also

DatetimeIndex

An immutable container for datetimes.

Notes

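Of the four parameters start, end, periods, and freq, exactly three must be specified. Because freq falls back to its default of ‘D’ when fewer than three of start, end, and periods are given, passing any two of those is enough. If start, end, and periods are all given and freq is omitted, the resulting DatetimeIndex will have periods linearly spaced elements between start and end (closed on both sides).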
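
The linear spacing used when start, end, and periods are all given can be illustrated with plain standard-library arithmetic. This is only a sketch of the idea, not the library's implementation; the helper name linspace_datetimes is hypothetical and the sketch assumes periods is at least 2.

```python
from datetime import datetime


def linspace_datetimes(start, end, periods):
    """Return `periods` evenly spaced datetimes from start to end, both included.

    Hypothetical helper for illustration only; assumes periods >= 2.
    """
    if periods < 2:
        raise ValueError("this sketch only handles periods >= 2")
    # Divide the whole span into (periods - 1) equal steps.
    step = (end - start) / (periods - 1)
    return [start + i * step for i in range(periods)]


points = linspace_datetimes(datetime(2018, 4, 24), datetime(2018, 4, 27), 3)
for p in points:
    print(p)
```

With a three-day span and three periods, the step is a day and a half, so the middle element falls at noon on 2018-04-25.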

To learn more about the frequency strings, see the offset aliases section of the pandas time series user guide.

Examples

Specifying the values

The next four examples vary the combination of start, end and periods. The first two generate the same DatetimeIndex.

Specify start and end, with the default daily frequency.

>>> ps.date_range(start='1/1/2018', end='1/08/2018')  
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
               '2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08'],
              dtype='datetime64[ns]', freq=None)

Specify start and periods, the number of periods (days).

>>> ps.date_range(start='1/1/2018', periods=8)  
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
               '2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08'],
              dtype='datetime64[ns]', freq=None)

Specify end and periods, the number of periods (days).

>>> ps.date_range(end='1/1/2018', periods=8)  
DatetimeIndex(['2017-12-25', '2017-12-26', '2017-12-27', '2017-12-28',
               '2017-12-29', '2017-12-30', '2017-12-31', '2018-01-01'],
              dtype='datetime64[ns]', freq=None)

Specify start, end, and periods; the frequency is generated automatically (linearly spaced).

>>> ps.date_range(
...     start='2018-04-24', end='2018-04-27', periods=3
... )  
DatetimeIndex(['2018-04-24 00:00:00', '2018-04-25 12:00:00',
               '2018-04-27 00:00:00'],
              dtype='datetime64[ns]', freq=None)

Other Parameters

Change the freq (frequency) to 'M' (month end frequency).

>>> ps.date_range(start='1/1/2018', periods=5, freq='M')  
DatetimeIndex(['2018-01-31', '2018-02-28', '2018-03-31', '2018-04-30',
               '2018-05-31'],
              dtype='datetime64[ns]', freq=None)

Multiples are allowed.

>>> ps.date_range(start='1/1/2018', periods=5, freq='3M')  
DatetimeIndex(['2018-01-31', '2018-04-30', '2018-07-31', '2018-10-31',
               '2019-01-31'],
              dtype='datetime64[ns]', freq=None)

freq can also be specified as an Offset object.

>>> ps.date_range(
...     start='1/1/2018', periods=5, freq=pd.offsets.MonthEnd(3)
... )  
DatetimeIndex(['2018-01-31', '2018-04-30', '2018-07-31', '2018-10-31',
               '2019-01-31'],
              dtype='datetime64[ns]', freq=None)
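
To make the month-end stepping concrete, the following standard-library sketch reproduces the dates shown for 'M' and '3M'. It is an illustration under assumptions rather than how the library computes offsets; month_ends and its step parameter are hypothetical names.

```python
import calendar
from datetime import date


def month_ends(start, periods, step=1):
    """Return `periods` month-end dates, beginning with the end of start's
    month and advancing `step` months each time (hypothetical helper)."""
    year, month = start.year, start.month
    out = []
    for _ in range(periods):
        # monthrange returns (weekday of first day, number of days in month).
        last_day = calendar.monthrange(year, month)[1]
        out.append(date(year, month, last_day))
        # Advance by `step` months, carrying overflow into the year.
        month0 = (month - 1) + step
        year += month0 // 12
        month = month0 % 12 + 1
    return out


monthly = month_ends(date(2018, 1, 1), periods=5)
quarterly = month_ends(date(2018, 1, 1), periods=5, step=3)
print(monthly)
print(quarterly)
```

Here step=3 plays the role of the multiple in '3M' or MonthEnd(3).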

inclusive controls whether to include start and end that are on the boundary. The default includes boundary points on either end.

>>> ps.date_range(
...     start='2017-01-01', end='2017-01-04', inclusive="both"
... )  
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04'],
              dtype='datetime64[ns]', freq=None)

Use inclusive='left' to exclude end if it falls on the boundary.

>>> ps.date_range(
...     start='2017-01-01', end='2017-01-04', inclusive='left'
... )  
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03'], dtype='datetime64[ns]', freq=None)

Use inclusive='right' to exclude start if it falls on the boundary.

>>> ps.date_range(
...     start='2017-01-01', end='2017-01-04', inclusive='right'
... )  
DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-04'], dtype='datetime64[ns]', freq=None)
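
The four inclusive options, including 'neither', which has no example above, can be summarised with a small standard-library sketch for daily dates. The function daily_range is a hypothetical helper written for illustration; it models the boundary rule only and is not the library's code.

```python
from datetime import date, timedelta


def daily_range(start, end, inclusive="both"):
    """Daily dates from start to end, trimming bounds per `inclusive`.

    Hypothetical helper that mirrors the documented boundary rule only.
    """
    if inclusive not in {"both", "neither", "left", "right"}:
        raise ValueError(f"unknown inclusive value: {inclusive!r}")
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    # 'right' and 'neither' leave the left bound open, so drop start.
    if inclusive in ("right", "neither") and days and days[0] == start:
        days = days[1:]
    # 'left' and 'neither' leave the right bound open, so drop end.
    if inclusive in ("left", "neither") and days and days[-1] == end:
        days = days[:-1]
    return days


start, end = date(2017, 1, 1), date(2017, 1, 4)
for option in ("both", "left", "right", "neither"):
    print(option, [d.isoformat() for d in daily_range(start, end, option)])
```

In this sketch, 'neither' keeps only the interior dates 2017-01-02 and 2017-01-03.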