pyspark.pandas.DataFrame.insert
DataFrame.insert(loc: int, column: Union[Any, Tuple[Any, ...]], value: Union[int, float, bool, str, bytes, decimal.Decimal, datetime.date, datetime.datetime, None, Series, Iterable], allow_duplicates: bool = False) → None
Insert column into DataFrame at specified location.
Raises a ValueError if column is already contained in the DataFrame, unless allow_duplicates is set to True.
Parameters
- loc : int
  Insertion index. Must satisfy 0 <= loc <= len(columns).
- column : str, number, or hashable object
  Label of the inserted column.
- value : int, Series, or array-like
- allow_duplicates : bool, optional
Examples
>>> psdf = ps.DataFrame([1, 2, 3])
>>> psdf.sort_index()
   0
0  1
1  2
2  3
>>> psdf.insert(0, 'x', 4)
>>> psdf.sort_index()
   x  0
0  4  1
1  4  2
2  4  3

>>> from pyspark.pandas.config import set_option, reset_option
>>> set_option("compute.ops_on_diff_frames", True)

>>> psdf.insert(1, 'y', [5, 6, 7])
>>> psdf.sort_index()
   x  y  0
0  4  5  1
1  4  6  2
2  4  7  3

>>> psdf.insert(2, 'z', ps.Series([8, 9, 10]))
>>> psdf.sort_index()
   x  y   z  0
0  4  5   8  1
1  4  6   9  2
2  4  7  10  3

>>> reset_option("compute.ops_on_diff_frames")
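The two rules stated above (the bound on loc and the duplicate-label check) can be illustrated without Spark. The following is a minimal pure-Python sketch that models only how the column labels are rearranged; `insert_label` is a hypothetical helper written for this illustration, not part of pyspark, and the exception types and messages it uses are assumptions rather than the library's actual behavior.

```python
from typing import Hashable, List


def insert_label(
    columns: List[Hashable],
    loc: int,
    column: Hashable,
    allow_duplicates: bool = False,
) -> List[Hashable]:
    """Hypothetical helper: return a new label list with `column` placed at `loc`."""
    # Rule 1: loc may point at any existing position or one past the end.
    if not 0 <= loc <= len(columns):
        raise IndexError(f"loc must satisfy 0 <= loc <= {len(columns)}, got {loc}")
    # Rule 2: an existing label is rejected unless duplicates are allowed.
    if column in columns and not allow_duplicates:
        raise ValueError(f"cannot insert {column}, already exists")
    return columns[:loc] + [column] + columns[loc:]


# Mirrors the label order produced by the three insert calls in the examples.
labels = insert_label([0], 0, "x")     # ['x', 0]
labels = insert_label(labels, 1, "y")  # ['x', 'y', 0]
labels = insert_label(labels, 2, "z")  # ['x', 'y', 'z', 0]
```

Because loc may equal len(columns), passing the current column count appends the new label at the end.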