Series.compare(other, keep_shape=False, keep_equal=False)

Compare to another Series and show the differences.
Note
This API behaves slightly differently from pandas when the indexes of the two Series are not identical and the 'compute.eager_check' option is False: pandas raises an exception, whereas pandas-on-Spark proceeds and performs the comparison while ignoring the index mismatches.
>>> psser1 = ps.Series([1, 2, 3, 4, 5], index=pd.Index([1, 2, 3, 4, 5]))
>>> psser2 = ps.Series([1, 2, 3, 4, 5], index=pd.Index([1, 2, 4, 3, 6]))
>>> psser1.compare(psser2)
Traceback (most recent call last):
...
ValueError: Can only compare identically-labeled Series objects
>>> with ps.option_context("compute.eager_check", False):
...     psser1.compare(psser2)
...
   self  other
3   3.0    4.0
4   4.0    3.0
5   5.0    NaN
6   NaN    5.0
Parameters

other : Series
    Object to compare with.
keep_shape : bool, default False
    If true, all rows and columns are kept. Otherwise, only the ones with different values are kept.
keep_equal : bool, default False
    If true, the result keeps values that are equal. Otherwise, equal values are shown as NaNs.
Notes
Matching NaNs will not appear as a difference.
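The NaN rule above can be checked with a short snippet. Since pandas-on-Spark mirrors the pandas `Series.compare` API, plain pandas is used here for illustration; the behavior should be the same for `ps.Series`:

```python
import numpy as np
import pandas as pd

# Both Series hold NaN at index 1; they differ only at index 2.
s1 = pd.Series([1.0, np.nan, 3.0])
s2 = pd.Series([1.0, np.nan, 9.0])

diff = s1.compare(s2)
# The matching NaNs at index 1 are treated as equal and are not
# reported; only index 2 appears in the result.
print(diff)
#    self  other
# 2   3.0    9.0
```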
Examples
>>> from pyspark.pandas.config import set_option, reset_option
>>> set_option("compute.ops_on_diff_frames", True)
>>> s1 = ps.Series(["a", "b", "c", "d", "e"])
>>> s2 = ps.Series(["a", "a", "c", "b", "e"])
Align the differences on columns:
>>> s1.compare(s2).sort_index()
  self other
1    b     a
3    d     b
Keep all original rows:
>>> s1.compare(s2, keep_shape=True).sort_index()
   self other
0  None  None
1     b     a
2  None  None
3     d     b
4  None  None
Keep all original rows and all original values:
>>> s1.compare(s2, keep_shape=True, keep_equal=True).sort_index()
  self other
0    a    a
1    b    a
2    c    c
3    d    b
4    e    e
>>> reset_option("compute.ops_on_diff_frames")