pyspark.pandas.DataFrame.plot.bar
plot.bar(x=None, y=None, **kwds)
Vertical bar plot.
- Parameters
- x : label or position, optional
Allows plotting of one column versus another. If not specified, the index of the DataFrame is used.
- y : label or position, optional
Allows plotting of one column versus another. If not specified, all numerical columns are used.
- **kwds : optional
Additional keyword arguments are documented in pyspark.pandas.Series.plot() or pyspark.pandas.DataFrame.plot().
- Returns
plotly.graph_objs.Figure
Returns a custom object when backend != plotly. Returns an ndarray when subplots=True (matplotlib only).
Examples
Basic plot.
For Series:
>>> import pyspark.pandas as ps
>>> s = ps.Series([1, 3, 2])
>>> s.plot.bar()
For DataFrame:
>>> df = ps.DataFrame({'lab': ['A', 'B', 'C'], 'val': [10, 30, 20]})
>>> df.plot.bar(x='lab', y='val')
Plot a whole dataframe to a bar plot. Each column is stacked with a distinct color along the horizontal axis.
>>> speed = [0.1, 17.5, 40, 48, 52, 69, 88]
>>> lifespan = [2, 8, 70, 1.5, 25, 12, 28]
>>> index = ['snail', 'pig', 'elephant',
...          'rabbit', 'giraffe', 'coyote', 'horse']
>>> df = ps.DataFrame({'speed': speed,
...                    'lifespan': lifespan}, index=index)
>>> df.plot.bar()
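As a rough mental model of the whole-DataFrame plot, the sketch below assumes the default plotly backend draws one bar series (a "trace") per column, with the index shared as the x values. It is plain Python with no Spark or plotly involved, and `column_traces` is a hypothetical helper written for illustration, not part of pyspark.

```python
# Plain-Python mental model only: not pyspark's or plotly's actual code.
index = ['snail', 'pig', 'elephant', 'rabbit', 'giraffe', 'coyote', 'horse']
columns = {
    'speed': [0.1, 17.5, 40, 48, 52, 69, 88],
    'lifespan': [2, 8, 70, 1.5, 25, 12, 28],
}


def column_traces(index, columns):
    """Return one bar series per column; every series uses the index as x."""
    return [
        {'name': name, 'x': list(index), 'y': list(values)}
        for name, values in columns.items()
    ]


traces = column_traces(index, columns)
for trace in traces:
    # Each trace pairs every index label with that column's value.
    print(trace['name'], dict(zip(trace['x'], trace['y'])))
```

Under this model, a plot of a single column holds exactly one trace, which is why code that works with the figure's `data` sequence can pick out that column's bars with `.data[0]`.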
Instead of stacking, the figure can be split by column with plotly APIs.
>>> from plotly.subplots import make_subplots
>>> speed = [0.1, 17.5, 40, 48, 52, 69, 88]
>>> lifespan = [2, 8, 70, 1.5, 25, 12, 28]
>>> index = ['snail', 'pig', 'elephant',
...          'rabbit', 'giraffe', 'coyote', 'horse']
>>> df = ps.DataFrame({'speed': speed,
...                    'lifespan': lifespan}, index=index)
>>> fig = (make_subplots(rows=2, cols=1)
...        .add_trace(df.plot.bar(y='speed').data[0], row=1, col=1)
...        .add_trace(df.plot.bar(y='lifespan').data[0], row=2, col=1))
>>> fig
Plot a single column.
>>> speed = [0.1, 17.5, 40, 48, 52, 69, 88]
>>> lifespan = [2, 8, 70, 1.5, 25, 12, 28]
>>> index = ['snail', 'pig', 'elephant',
...          'rabbit', 'giraffe', 'coyote', 'horse']
>>> df = ps.DataFrame({'speed': speed,
...                    'lifespan': lifespan}, index=index)
>>> df.plot.bar(y='speed')
Plot only selected categories for the DataFrame.
>>> speed = [0.1, 17.5, 40, 48, 52, 69, 88]
>>> lifespan = [2, 8, 70, 1.5, 25, 12, 28]
>>> index = ['snail', 'pig', 'elephant',
...          'rabbit', 'giraffe', 'coyote', 'horse']
>>> df = ps.DataFrame({'speed': speed,
...                    'lifespan': lifespan}, index=index)
>>> df.plot.bar(x='lifespan')
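The defaults described under Parameters (the index when x is omitted, all numerical columns when y is omitted) can be sketched in plain Python. This is a literal reading of those descriptions, not pyspark's implementation, and a given plotting backend may resolve the arguments differently. `resolve_bar_axes` is a hypothetical helper. Excluding the x column from the bars and treating only int and float values as numerical are assumptions. Integer positions for x and y are not handled.

```python
def resolve_bar_axes(columns, x=None, y=None):
    """Hypothetical helper: apply the documented defaults for x and y.

    columns maps a column name to its list of values.
    Returns (x_source, y_columns).
    """
    # x defaults to the DataFrame index when it is not given.
    x_source = 'index' if x is None else x

    if y is not None:
        return x_source, [y]

    def is_numeric(values):
        # Assumption: "numerical" means int or float, with bool excluded.
        return all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )

    # Assumption: the column used for x is not also drawn as bars.
    y_columns = [
        name for name, values in columns.items()
        if name != x and is_numeric(values)
    ]
    return x_source, y_columns


labelled = {'lab': ['A', 'B', 'C'], 'val': [10, 30, 20]}
animals = {
    'speed': [0.1, 17.5, 40, 48, 52, 69, 88],
    'lifespan': [2, 8, 70, 1.5, 25, 12, 28],
}

print(resolve_bar_axes(labelled))                    # ('index', ['val'])
print(resolve_bar_axes(labelled, x='lab', y='val'))  # ('lab', ['val'])
print(resolve_bar_axes(animals, x='lifespan'))       # ('lifespan', ['speed'])
```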