pyspark.pandas.date_range

pyspark.pandas.date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, inclusive='both', **kwargs)

Return a fixed frequency DatetimeIndex.

Parameters

start (str or datetime-like, optional): Left bound for generating dates.
end (str or datetime-like, optional): Right bound for generating dates.
periods (int, optional): Number of periods to generate.
freq (str or DateOffset, default 'D'): Frequency strings can have multiples, e.g. '5H'.
tz (str or tzinfo, optional): Time zone name for returning a localized DatetimeIndex, for example 'Asia/Hong_Kong'. By default, the resulting DatetimeIndex is time zone naive.
normalize (bool, default False): Normalize start/end dates to midnight before generating the date range.
name (str, default None): Name of the resulting DatetimeIndex.
inclusive ({"both", "neither", "left", "right"}, default "both"): Include boundaries; whether to set each bound as closed or open. New in version 4.0.0.
**kwargs: For compatibility. Has no effect on the result.

Returns

rng (DatetimeIndex)

See also

DatetimeIndex: An immutable container for datetimes.

Notes

Of the four parameters start, end, periods, and freq, exactly three must be specified. If freq is omitted, the resulting DatetimeIndex will have periods linearly spaced elements between start and end (closed on both sides).

To learn more about the frequency strings, see the offset aliases section of the pandas time series user guide.

Examples

Specifying the values

The next four examples vary the combination of start, end and periods; the first two produce the same DatetimeIndex.

Specify start and end, with the default daily frequency.

>>> import pyspark.pandas as ps
>>> ps.date_range(start='1/1/2018', end='1/08/2018')
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
               '2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08'],
              dtype='datetime64[ns]', freq=None)

Specify start and periods, the number of periods (days).

>>> ps.date_range(start='1/1/2018', periods=8)  
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
               '2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08'],
              dtype='datetime64[ns]', freq=None)

Specify end and periods, the number of periods (days).

>>> ps.date_range(end='1/1/2018', periods=8)  
DatetimeIndex(['2017-12-25', '2017-12-26', '2017-12-27', '2017-12-28',
               '2017-12-29', '2017-12-30', '2017-12-31', '2018-01-01'],
              dtype='datetime64[ns]', freq=None)
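
The arithmetic behind the end-and-periods case can be sketched with the standard library alone. This is only an illustration of how the first date follows from end, periods and a fixed daily step; `range_from_end` is a hypothetical helper, not part of pyspark.pandas, and it is not meant to describe how the library computes the result.

```python
from datetime import datetime, timedelta


def range_from_end(end, periods, step=timedelta(days=1)):
    """Hypothetical helper: `periods` timestamps ending at `end`, spaced by `step`."""
    # The first element sits (periods - 1) steps before the end.
    first = end - (periods - 1) * step
    return [first + i * step for i in range(periods)]


dates = range_from_end(datetime(2018, 1, 1), periods=8)
print(dates[0].date(), dates[-1].date())  # 2017-12-25 2018-01-01
```

With eight daily periods, the range starts seven days before end, which matches the first entry in the output above.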

Specify start, end, and periods; the frequency is generated automatically (linearly spaced).

>>> ps.date_range(
...     start='2018-04-24', end='2018-04-27', periods=3
... )  
DatetimeIndex(['2018-04-24 00:00:00', '2018-04-25 12:00:00',
               '2018-04-27 00:00:00'],
              dtype='datetime64[ns]', freq=None)
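
The 12:00 timestamp in the middle of that result comes from dividing the span between start and end into equal steps. A minimal standard-library sketch of that spacing follows; `linspace_datetimes` is a hypothetical name introduced here for illustration, and the sketch assumes at least two periods.

```python
from datetime import datetime


def linspace_datetimes(start, end, periods):
    """Hypothetical helper: `periods` evenly spaced timestamps, both ends included."""
    if periods < 2:
        # Simplification for this sketch: a single point has no spacing to compute.
        raise ValueError("this sketch needs periods >= 2")
    # Dividing a timedelta by an int gives the step between neighbours.
    step = (end - start) / (periods - 1)
    return [start + i * step for i in range(periods)]


points = linspace_datetimes(datetime(2018, 4, 24), datetime(2018, 4, 27), periods=3)
for p in points:
    print(p)
# 2018-04-24 00:00:00
# 2018-04-25 12:00:00
# 2018-04-27 00:00:00
```

Three days split into two intervals gives a step of one and a half days, so the middle point falls at noon on 2018-04-25.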

Other Parameters

Change the freq (frequency) to 'M' (month end frequency).

>>> ps.date_range(start='1/1/2018', periods=5, freq='M')  
DatetimeIndex(['2018-01-31', '2018-02-28', '2018-03-31', '2018-04-30',
               '2018-05-31'],
              dtype='datetime64[ns]', freq=None)

Multiples are allowed.

>>> ps.date_range(start='1/1/2018', periods=5, freq='3M')  
DatetimeIndex(['2018-01-31', '2018-04-30', '2018-07-31', '2018-10-31',
               '2019-01-31'],
              dtype='datetime64[ns]', freq=None)
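
To make the month-end outputs above easier to follow, here is a rough standard-library sketch that reproduces them for a start date at midnight. `month_ends` and its `every` parameter are hypothetical names for this illustration; the sketch mirrors only the behaviour shown in these two examples and is not the library's offset logic.

```python
import calendar
from datetime import date


def month_ends(start, periods, every=1):
    """Hypothetical helper: month-end dates from start's month, every `every` months."""
    year, month = start.year, start.month
    result = []
    for _ in range(periods):
        # monthrange returns (weekday of the 1st, number of days in the month).
        last_day = calendar.monthrange(year, month)[1]
        result.append(date(year, month, last_day))
        # Advance by `every` months, carrying overflow into the year.
        index = (month - 1) + every
        year += index // 12
        month = index % 12 + 1
    return result


monthly = month_ends(date(2018, 1, 1), periods=5)
quarterly = month_ends(date(2018, 1, 1), periods=5, every=3)
print(quarterly[0], quarterly[-1])  # 2018-01-31 2019-01-31
```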

freq can also be specified as an Offset object.

>>> import pandas as pd
>>> ps.date_range(
...     start='1/1/2018', periods=5, freq=pd.offsets.MonthEnd(3)
... )
DatetimeIndex(['2018-01-31', '2018-04-30', '2018-07-31', '2018-10-31',
               '2019-01-31'],
              dtype='datetime64[ns]', freq=None)

inclusive controls whether to include start and end that are on the boundary. The default includes boundary points on either end.

>>> ps.date_range(
...     start='2017-01-01', end='2017-01-04', inclusive="both"
... )  
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04'],
              dtype='datetime64[ns]', freq=None)

Use inclusive='left' to exclude end if it falls on the boundary.

>>> ps.date_range(
...     start='2017-01-01', end='2017-01-04', inclusive='left'
... )  
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03'], dtype='datetime64[ns]', freq=None)

Use inclusive='right' to exclude start if it falls on the boundary.

>>> ps.date_range(
...     start='2017-01-01', end='2017-01-04', inclusive='right'
... )  
DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-04'], dtype='datetime64[ns]', freq=None)
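
The four inclusive settings can be summarized with a small standard-library sketch for a daily range. `daily_range` is a hypothetical helper written for this illustration, assuming a daily step and an end date on or after the start; it is not how pyspark.pandas implements the parameter.

```python
from datetime import date, timedelta

_ALLOWED = {"both", "neither", "left", "right"}


def daily_range(start, end, inclusive="both"):
    """Hypothetical helper: daily dates from start to end with open or closed bounds."""
    if inclusive not in _ALLOWED:
        raise ValueError(f"inclusive must be one of {sorted(_ALLOWED)}")
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    keep_start = inclusive in {"both", "left"}   # left bound closed
    keep_end = inclusive in {"both", "right"}    # right bound closed
    return [
        d for d in days
        if (keep_start or d != start) and (keep_end or d != end)
    ]


start, end = date(2017, 1, 1), date(2017, 1, 4)
for mode in ("both", "left", "right", "neither"):
    print(mode, [d.isoformat() for d in daily_range(start, end, mode)])
```

In this sketch, "neither" drops both boundary dates and leaves only 2017-01-02 and 2017-01-03.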