pyspark.pandas.Series
A pandas-on-Spark Series that corresponds logically to a pandas Series. It holds a Spark Column internally.
_internal – an internal immutable Frame to manage metadata.
_psdf – Parent’s pandas-on-Spark DataFrame
data – Contains data stored in Series. Note that if data is a pandas Series, other arguments should not be used.
index – Values must be hashable and have the same length as data. Non-unique index values are allowed. Will default to RangeIndex (0, 1, 2, …, n) if not provided. If both a dict and an index sequence are used, the index will override the keys found in the dict.
dtype – If None, dtype will be inferred.
copy – Copy input data.
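The index rules above can be sketched as follows. This is a minimal illustration that uses plain pandas as a stand-in, on the assumption that `pyspark.pandas.Series` accepts the same `data` and `index` arguments and produces the same labels; it has not been run against a Spark session here.

```python
import pandas as pd  # stand-in for pyspark.pandas (assumption, see note above)

# With no index, labels default to a RangeIndex starting at 0.
default_labels = pd.Series([10, 20, 30]).index.tolist()

# With both a dict and an index, the index decides which labels exist:
# "a" is dropped, and "d" has no matching key, so its value is NaN.
s = pd.Series({"a": 1, "b": 2, "c": 3}, index=["b", "c", "d"])
labels = s.index.tolist()
values = s.tolist()

print(default_labels)  # [0, 1, 2]
print(labels)          # ['b', 'c', 'd']
print(values)          # [2.0, 3.0, nan]
```

With Spark available, the same calls would presumably be written with `import pyspark.pandas as ps` and `ps.Series(...)`.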
Methods
abs()
abs
Return a Series/DataFrame with absolute numeric value of each element.
add(other)
add
Return Addition of series and other, element-wise (binary operator +).
add_prefix(prefix)
add_prefix
Prefix labels with string prefix.
add_suffix(suffix)
add_suffix
Suffix labels with string suffix.
agg(func)
agg
Aggregate using one or more operations over the specified axis.
aggregate(func)
aggregate
align(other[, join, axis, copy])
align
Align two objects on their axes with the specified join method.
all([axis, skipna])
all
Return whether all elements are True.
any([axis])
any
Return whether any element is True.
append(to_append[, ignore_index, …])
append
Concatenate two or more Series.
apply(func[, args])
apply
Invoke function on values of Series.
argmax([axis, skipna])
argmax
Return int position of the largest value in the Series.
argmin([axis, skipna])
argmin
Return int position of the smallest value in the Series.
argsort()
argsort
Return the integer indices that would sort the Series values.
asof(where)
asof
Return the last row(s) without any NaNs before where.
astype(dtype)
astype
Cast a pandas-on-Spark object to a specified dtype.
at_time(time[, asof, axis])
at_time
Select values at particular time of day (example: 9:30AM).
autocorr([lag])
autocorr
Compute the lag-N autocorrelation.
backfill([axis, inplace, limit])
backfill
Synonym for DataFrame.fillna() or Series.fillna() with method=`bfill`.
between(left, right[, inclusive])
between
Return boolean Series equivalent to left <= series <= right.
between_time(start_time, end_time[, …])
between_time
Select values between particular times of the day (example: 9:00-9:30 AM).
bfill([axis, inplace, limit])
bfill
bool()
bool
Return the bool of a single element in the current object.
clip([lower, upper, inplace])
clip
Trim values at input threshold(s).
combine_first(other)
combine_first
Combine Series values, choosing the calling Series’s values first.
compare(other[, keep_shape, keep_equal])
compare
Compare to another Series and show the differences.
copy([deep])
copy
Make a copy of this object’s indices and data.
corr(other[, method, min_periods])
corr
Compute correlation with other Series, excluding missing values.
count([axis, numeric_only])
count
Count non-NA cells for each column.
cov(other[, min_periods, ddof])
cov
Compute covariance with Series, excluding missing values.
cummax([skipna])
cummax
Return cumulative maximum over a DataFrame or Series axis.
cummin([skipna])
cummin
Return cumulative minimum over a DataFrame or Series axis.
cumprod([skipna])
cumprod
Return cumulative product over a DataFrame or Series axis.
cumsum([skipna])
cumsum
Return cumulative sum over a DataFrame or Series axis.
describe([percentiles])
describe
Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.
diff([periods])
diff
First discrete difference of element.
div(other)
div
Return Floating division of series and other, element-wise (binary operator /).
divide(other)
divide
divmod(other)
divmod
Return Integer division and modulo of series and other, element-wise (binary operator divmod).
dot(other)
dot
Compute the dot product between the Series and the columns of other.
drop([labels, index, columns, level, inplace])
drop
Return Series with specified index labels removed.
drop_duplicates([keep, inplace])
drop_duplicates
Return Series with duplicate values removed.
droplevel(level)
droplevel
Return Series with requested index level(s) removed.
dropna([axis, inplace])
dropna
Return a new Series with missing values removed.
duplicated([keep])
duplicated
Indicate duplicate Series values.
eq(other)
eq
Compare if the current value is equal to the other.
equals(other)
equals
ewm([com, span, halflife, alpha, …])
ewm
Provide exponentially weighted window transformations.
expanding([min_periods])
expanding
Provide expanding transformations.
explode()
explode
Transform each element of a list-like to a row.
factorize([sort, na_sentinel])
factorize
Encode the object as an enumerated type or categorical variable.
ffill([axis, inplace, limit])
ffill
Synonym for DataFrame.fillna() or Series.fillna() with method=`ffill`.
fillna([value, method, axis, inplace, limit])
fillna
Fill NA/NaN values.
filter([items, like, regex, axis])
filter
Subset rows or columns of dataframe according to labels in the specified index.
first(offset)
first
Select first periods of time series data based on a date offset.
first_valid_index()
first_valid_index
Retrieves the index of the first valid value.
floordiv(other)
floordiv
Return Integer division of series and other, element-wise (binary operator //).
ge(other)
ge
Compare if the current value is greater than or equal to the other.
get(key[, default])
get
Get item from object for given key (DataFrame column, Panel slice, etc.).
get_dtype_counts()
get_dtype_counts
Return counts of unique dtypes in this object.
groupby(by[, axis, as_index, dropna])
groupby
Group DataFrame or Series using one or more columns.
gt(other)
gt
Compare if the current value is greater than the other.
head([n])
head
Return the first n rows.
hist([bins])
hist
Draw one histogram of the DataFrame’s columns.
idxmax([skipna])
idxmax
Return the row label of the maximum value.
idxmin([skipna])
idxmin
Return the row label of the minimum value.
interpolate([method, limit, …])
interpolate
Fill NaN values using an interpolation method.
isin(values)
isin
Check whether values are contained in Series or Index.
isna()
isna
Detect missing values.
isnull()
isnull
item()
item
Return the first element of the underlying data as a Python scalar.
items()
items
Lazily iterate over (index, value) tuples.
iteritems()
iteritems
This is an alias of items.
keys()
keys
Return alias for index.
kurt([axis, skipna, numeric_only])
kurt
Return unbiased kurtosis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0).
kurtosis([axis, skipna, numeric_only])
kurtosis
last(offset)
last
Select final periods of time series data based on a date offset.
last_valid_index()
last_valid_index
Return index for last non-NA/null value.
le(other)
le
Compare if the current value is less than or equal to the other.
lt(other)
lt
Compare if the current value is less than the other.
mad()
mad
Return the mean absolute deviation of values.
map(arg[, na_action])
map
Map values of Series according to input correspondence.
mask(cond[, other])
mask
Replace values where the condition is True.
max([axis, skipna, numeric_only])
max
Return the maximum of the values.
mean([axis, skipna, numeric_only])
mean
Return the mean of the values.
median([axis, skipna, numeric_only, accuracy])
median
Return the median of the values for the requested axis.
min([axis, skipna, numeric_only])
min
Return the minimum of the values.
mod(other)
mod
Return Modulo of series and other, element-wise (binary operator %).
mode([dropna])
mode
Return the mode(s) of the dataset.
mul(other)
mul
Return Multiplication of series and other, element-wise (binary operator *).
multiply(other)
multiply
ne(other)
ne
Compare if the current value is not equal to the other.
nlargest([n])
nlargest
Return the largest n elements.
notna()
notna
Detect existing (non-missing) values.
notnull()
notnull
nsmallest([n])
nsmallest
Return the smallest n elements.
nunique([dropna, approx, rsd])
nunique
Return number of unique elements in the object.
pad([axis, inplace, limit])
pad
pct_change([periods])
pct_change
Percentage change between the current and a prior element.
pipe(func, *args, **kwargs)
pipe
Apply func(self, *args, **kwargs).
pop(item)
pop
Return item and drop from series.
pow(other)
pow
Return Exponential power of series and other, element-wise (binary operator **).
prod([axis, skipna, numeric_only, min_count])
prod
Return the product of the values.
product([axis, skipna, numeric_only, min_count])
product
quantile([q, accuracy])
quantile
Return value at the given quantile.
radd(other)
radd
Return Reverse Addition of series and other, element-wise (binary operator +).
rank([method, ascending, numeric_only])
rank
Compute numerical data ranks (1 through n) along axis.
rdiv(other)
rdiv
Return Reverse Floating division of series and other, element-wise (binary operator /).
rdivmod(other)
rdivmod
Return Integer division and modulo of series and other, element-wise (binary operator rdivmod).
reindex([index, fill_value])
reindex
Conform Series to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index.
reindex_like(other)
reindex_like
Return a Series with matching indices as other object.
rename([index])
rename
Alter Series index labels or name.
rename_axis([mapper, index, inplace])
rename_axis
Set the name of the axis for the index or columns.
repeat(repeats)
repeat
Repeat elements of a Series.
replace([to_replace, value, regex])
replace
Replace values given in to_replace with value.
resample(rule[, closed, label, on])
resample
Resample time-series data.
reset_index([level, drop, name, inplace])
reset_index
Generate a new DataFrame or Series with the index reset.
rfloordiv(other)
rfloordiv
Return Reverse Integer division of series and other, element-wise (binary operator //).
rmod(other)
rmod
Return Reverse Modulo of series and other, element-wise (binary operator %).
rmul(other)
rmul
Return Reverse Multiplication of series and other, element-wise (binary operator *).
rolling(window[, min_periods])
rolling
Provide rolling transformations.
round([decimals])
round
Round each value in a Series to the given number of decimals.
rpow(other)
rpow
Return Reverse Exponential power of series and other, element-wise (binary operator **).
rsub(other)
rsub
Return Reverse Subtraction of series and other, element-wise (binary operator -).
rtruediv(other)
rtruediv
sample([n, frac, replace, random_state, …])
sample
Return a random sample of items from an axis of object.
searchsorted(value[, side])
searchsorted
Find indices where elements should be inserted to maintain order.
sem([axis, skipna, ddof, numeric_only])
sem
Return unbiased standard error of the mean over requested axis.
shift([periods, fill_value])
shift
Shift Series/Index by desired number of periods.
skew([axis, skipna, numeric_only])
skew
Return unbiased skew normalized by N-1.
sort_index([axis, level, ascending, …])
sort_index
Sort object by labels (along an axis).
sort_values([ascending, inplace, …])
sort_values
Sort by the values.
squeeze([axis])
squeeze
Squeeze 1 dimensional axis objects into scalars.
std([axis, skipna, ddof, numeric_only])
std
Return sample standard deviation.
sub(other)
sub
Return Subtraction of series and other, element-wise (binary operator -).
subtract(other)
subtract
sum([axis, skipna, numeric_only, min_count])
sum
Return the sum of the values.
swapaxes(i, j[, copy])
swapaxes
Interchange axes and swap values axes appropriately.
swaplevel([i, j, copy])
swaplevel
Swap levels i and j in a MultiIndex.
tail([n])
tail
Return the last n rows.
take(indices)
take
Return the elements in the given positional indices along an axis.
to_clipboard([excel, sep])
to_clipboard
Copy object to the system clipboard.
to_csv([path, sep, na_rep, columns, header, …])
to_csv
Write object to a comma-separated values (csv) file.
to_dataframe([name])
to_dataframe
Convert Series to DataFrame.
to_dict([into])
to_dict
Convert Series to {label -> value} dict or dict-like object.
to_excel(excel_writer[, sheet_name, na_rep, …])
to_excel
Write object to an Excel sheet.
to_frame([name])
to_frame
to_json([path, compression, num_files, …])
to_json
Convert the object to a JSON string.
to_latex([buf, columns, col_space, header, …])
to_latex
Render an object to a LaTeX tabular environment table.
to_list()
to_list
Return a list of the values.
to_markdown([buf, mode])
to_markdown
Print Series or DataFrame in Markdown-friendly format.
to_numpy()
to_numpy
A NumPy ndarray representing the values in this DataFrame or Series.
to_pandas()
to_pandas
Return a pandas Series.
to_string([buf, na_rep, float_format, …])
to_string
Render a string representation of the Series.
tolist()
tolist
transform(func[, axis])
transform
Call func producing the same type as self with transformed values and that has the same axis length as input.
transpose(*args, **kwargs)
transpose
Return the transpose, which is self.
truediv(other)
truediv
truncate([before, after, axis, copy])
truncate
Truncate a Series or DataFrame before and after some index value.
unique()
unique
Return unique values of Series object.
unstack([level])
unstack
Unstack, a.k.a. pivot, Series with MultiIndex to produce DataFrame.
update(other)
update
Modify Series in place using non-NA values from passed Series.
value_counts([normalize, sort, ascending, …])
value_counts
Return a Series containing counts of unique values.
var([axis, ddof, numeric_only])
var
Return unbiased variance.
where(cond[, other])
where
Replace values where the condition is False.
xs(key[, level])
xs
Return cross-section from the Series.
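A few of the element-wise methods listed above are easier to tell apart side by side. The sketch below uses plain pandas as a stand-in, on the assumption that the pandas-on-Spark methods of the same name return the same values for these simple inputs; the variable names are illustrative only.

```python
import pandas as pd  # stand-in for pyspark.pandas (assumption, see note above)

s = pd.Series([1, 2, 3, 4, 5])

# between: element-wise test of left <= series <= right
in_range = s.between(2, 4)

# where replaces values where the condition is False;
# mask is the inverse and replaces values where the condition is True.
kept = s.where(s > 2, 0)
masked = s.mask(s > 2, 0)

# divmod returns a pair: (integer division, modulo)
quotient, remainder = s.divmod(2)

# cumsum: running total over the Series
running = s.cumsum()

# diff: first discrete difference; the first element has no predecessor
step = s.diff()

print(in_range.tolist())   # [False, True, True, True, False]
print(kept.tolist())       # [0, 0, 3, 4, 5]
print(masked.tolist())     # [1, 2, 0, 0, 0]
print(quotient.tolist())   # [0, 1, 1, 2, 2]
print(remainder.tolist())  # [1, 0, 1, 0, 1]
print(running.tolist())    # [1, 3, 6, 10, 15]
print(step.tolist())       # [nan, 1.0, 1.0, 1.0, 1.0]
```

The `where`/`mask` pair is the one most often confused: both take the same condition, and only the side that gets replaced differs.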
Attributes
T
at
Access a single value for a row/column label pair.
axes
Return a list of the row axis labels.
dtype
Return the dtype object of the underlying data.
dtypes
empty
Return True if the current object is empty.
hasnans
Return True if it has any missing values.
iat
Access a single value for a row/column pair by integer position.
iloc
Purely integer-location based indexing for selection by position.
index
The index (axis labels) Column of the Series.
is_monotonic
Return boolean if values in the object are monotonically increasing.
is_monotonic_decreasing
Return boolean if values in the object are monotonically decreasing.
is_monotonic_increasing
is_unique
Return boolean if values in the object are unique.
loc
Access a group of rows and columns by label(s) or a boolean Series.
name
Return name of the Series.
ndim
Return an int representing the number of array dimensions.
shape
Return a tuple of the shape of the underlying data.
size
Return an int representing the number of elements in this object.
values
Return a NumPy representation of the DataFrame or the Series.
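The label-based and position-based accessors above can be compared on a small Series. As a hedged sketch, plain pandas stands in for pandas-on-Spark here, assuming the attributes of the same name behave identically for this input; the Series contents and the name "score" are made up for illustration.

```python
import pandas as pd  # stand-in for pyspark.pandas (assumption, see note above)

s = pd.Series([10, 20, 30], index=["a", "b", "c"], name="score")

# Label-based access
by_label = s.loc["b"]
single_by_label = s.at["c"]

# Position-based access
by_position = s.iloc[0]
single_by_position = s.iat[1]

# Shape-related attributes of a one-dimensional object
shape, size, ndim = s.shape, s.size, s.ndim

print(by_label, single_by_label)        # 20 30
print(by_position, single_by_position)  # 10 20
print(shape, size, ndim)                # (3,) 3 1
print(s.name, s.is_unique, s.hasnans)   # score True False
print(s.is_monotonic_increasing)        # True
```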