pyspark.pandas.to_numeric
Convert argument to a numeric type.
Parameters
arg
Argument to be converted.
errors
If ‘coerce’, then invalid parsing will be set as NaN.
If ‘raise’, then invalid parsing will raise an exception.
If ‘ignore’, then invalid parsing will return the input.
Note
‘ignore’ is not yet supported when arg is a pandas-on-Spark Series.
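The three error policies described above can be sketched for a single value in plain Python. This is only an illustration of the documented semantics, not the library's implementation: `parse_number` is a hypothetical helper, and using the built-in `float` as the parser is an assumption made for the sketch, so edge cases may differ from `to_numeric`.

```python
import math


def parse_number(value, errors="raise"):
    """Hypothetical helper (not part of pyspark.pandas) mimicking the
    documented 'raise' / 'coerce' / 'ignore' policies for one value."""
    if errors not in ("raise", "coerce", "ignore"):
        raise ValueError(f"invalid errors value: {errors!r}")
    try:
        # Assumption: the built-in float() stands in for the real parser.
        return float(value)
    except (TypeError, ValueError):
        if errors == "raise":
            raise              # invalid parsing raises an exception
        if errors == "coerce":
            return math.nan    # invalid parsing is set as NaN
        return value           # 'ignore': invalid parsing returns the input


print(parse_number("1.0"))                     # 1.0
print(parse_number("apple", errors="coerce"))  # nan
print(parse_number("apple", errors="ignore"))  # apple
```

With the default policy, `parse_number("apple")` raises a `ValueError`, which corresponds to the ‘raise’ behavior.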
See also
DataFrame.astype
Cast argument to a specified dtype.
to_datetime
Convert argument to datetime.
to_timedelta
Convert argument to timedelta.
numpy.ndarray.astype
Cast a numpy array to a specified type.
Examples
>>> psser = ps.Series(['1.0', '2', '-3'])
>>> psser
0    1.0
1      2
2     -3
dtype: object

>>> ps.to_numeric(psser)
0    1.0
1    2.0
2   -3.0
dtype: float32
If the given Series contains a value that cannot be cast to float, that value is converted to np.nan when errors is set to “coerce”.
>>> psser = ps.Series(['apple', '1.0', '2', '-3'])
>>> psser
0    apple
1      1.0
2        2
3       -3
dtype: object

>>> ps.to_numeric(psser, errors="coerce")
0    NaN
1    1.0
2    2.0
3   -3.0
dtype: float32
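The Series results above report dtype float32, which suggests the converted values are held in single precision. As a rough illustration of what that can mean for values that are not exactly representable, the following standard-library sketch round-trips Python floats through a 32-bit representation. The helper name `as_float32` is hypothetical, and the sketch does not call pandas-on-Spark at all.

```python
import struct


def as_float32(x):
    """Hypothetical helper: round a Python float (64-bit) to the nearest
    32-bit float and return it as a Python float again."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Values from the examples above are exactly representable in float32.
print(as_float32(1.0), as_float32(2.0), as_float32(-3.0))  # 1.0 2.0 -3.0

# A value such as 0.1 is not, so it changes slightly after the round trip.
print(as_float32(0.1) == 0.1)  # False
```

If exact double-precision values matter, it may be worth checking the resulting dtype and casting explicitly, for example with `astype`.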
Lists, tuples, NumPy arrays, and scalars are also supported.
>>> ps.to_numeric(['1.0', '2', '-3'])
array([ 1.,  2., -3.])

>>> ps.to_numeric(('1.0', '2', '-3'))
array([ 1.,  2., -3.])

>>> ps.to_numeric(np.array(['1.0', '2', '-3']))
array([ 1.,  2., -3.])

>>> ps.to_numeric('1.0')
1.0