Data Types
Spark SQL and DataFrames support the following data types:
- Numeric types
  - ByteType: Represents 1-byte signed integer numbers. The range of numbers is from -128 to 127.
  - ShortType: Represents 2-byte signed integer numbers. The range of numbers is from -32768 to 32767.
  - IntegerType: Represents 4-byte signed integer numbers. The range of numbers is from -2147483648 to 2147483647.
  - LongType: Represents 8-byte signed integer numbers. The range of numbers is from -9223372036854775808 to 9223372036854775807.
  - FloatType: Represents 4-byte single-precision floating point numbers.
  - DoubleType: Represents 8-byte double-precision floating point numbers.
  - DecimalType: Represents arbitrary-precision signed decimal numbers. Backed internally by java.math.BigDecimal. A BigDecimal consists of an arbitrary precision integer unscaled value and a 32-bit integer scale.
- String type
  - StringType: Represents character string values.
- Binary type
  - BinaryType: Represents byte sequence values.
- Boolean type
  - BooleanType: Represents boolean values.
- Datetime type
  - TimestampType: Represents values comprising values of fields year, month, day, hour, minute, and second, with the session local time-zone. The timestamp value represents an absolute point in time.
  - DateType: Represents values comprising values of fields year, month and day, without a time-zone.
- Complex types
  - ArrayType(elementType, containsNull): Represents values comprising a sequence of elements with the type of elementType. containsNull is used to indicate if elements in an ArrayType value can have null values.
  - MapType(keyType, valueType, valueContainsNull): Represents values comprising a set of key-value pairs. The data type of keys is described by keyType and the data type of values is described by valueType. For a MapType value, keys are not allowed to have null values. valueContainsNull is used to indicate if values of a MapType value can have null values.
  - StructType(fields): Represents values with the structure described by a sequence of StructFields (fields).
    - StructField(name, dataType, nullable): Represents a field in a StructType. The name of a field is indicated by name. The data type of a field is indicated by dataType. nullable is used to indicate if values of these fields can have null values.
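The integer ranges listed for the numeric types follow directly from their byte widths, and the BigDecimal description (unscaled value plus scale) can be mirrored with Python's standard decimal module. The sketch below uses only the Python standard library, not Spark itself; the helper names signed_range, fits, and unscaled_and_scale are hypothetical and exist only for this illustration.

```python
from decimal import Decimal

# Byte widths of the signed integer types listed above.
INTEGER_WIDTHS = {"ByteType": 1, "ShortType": 2, "IntegerType": 4, "LongType": 8}


def signed_range(num_bytes):
    """Return (min, max) of a two's-complement signed integer with num_bytes bytes."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fits(type_name, value):
    """Check whether value lies inside the range of the named integer type."""
    low, high = signed_range(INTEGER_WIDTHS[type_name])
    return low <= value <= high


def unscaled_and_scale(value):
    """Split a finite Decimal into (unscaled, scale) so that
    value == unscaled * 10 ** -scale, as in the BigDecimal description."""
    sign, digits, exponent = value.as_tuple()
    unscaled = int("".join(map(str, digits)))
    if sign:
        unscaled = -unscaled
    return unscaled, -exponent


for name, width in INTEGER_WIDTHS.items():
    print(name, signed_range(width))

print(unscaled_and_scale(Decimal("123.45")))
```

A check such as fits("ByteType", 128) can be useful before loading data, since 128 is one past the largest value a 1-byte signed integer can hold.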
All data types of Spark SQL are located in the package org.apache.spark.sql.types.
You can access them with the following import:

    import org.apache.spark.sql.types._

Find full example code at "examples/src/main/scala/org/apache/spark/examples/sql/SparkSQLExample.scala" in the Spark repo.
Data type | Value type in Scala | API to access or create a data type |
---|---|---|
ByteType | Byte | ByteType |
ShortType | Short | ShortType |
IntegerType | Int | IntegerType |
LongType | Long | LongType |
FloatType | Float | FloatType |
DoubleType | Double | DoubleType |
DecimalType | java.math.BigDecimal | DecimalType |
StringType | String | StringType |
BinaryType | Array[Byte] | BinaryType |
BooleanType | Boolean | BooleanType |
TimestampType | java.sql.Timestamp | TimestampType |
DateType | java.sql.Date | DateType |
ArrayType | scala.collection.Seq | ArrayType(elementType, [containsNull]) Note: The default value of containsNull is true. |
MapType | scala.collection.Map | MapType(keyType, valueType, [valueContainsNull]) Note: The default value of valueContainsNull is true. |
StructType | org.apache.spark.sql.Row | StructType(fields) Note: fields is a Seq of StructFields. Also, two fields with the same name are not allowed. |
StructField | The value type in Scala of the data type of this field (for example, Int for a StructField with the data type IntegerType) | StructField(name, dataType, [nullable]) Note: The default value of nullable is true. |
All data types of Spark SQL are located in the package org.apache.spark.sql.types. To access or create a data type, use the factory methods provided in org.apache.spark.sql.types.DataTypes.
Data type | Value type in Java | API to access or create a data type |
---|---|---|
ByteType | byte or Byte | DataTypes.ByteType |
ShortType | short or Short | DataTypes.ShortType |
IntegerType | int or Integer | DataTypes.IntegerType |
LongType | long or Long | DataTypes.LongType |
FloatType | float or Float | DataTypes.FloatType |
DoubleType | double or Double | DataTypes.DoubleType |
DecimalType | java.math.BigDecimal | DataTypes.createDecimalType() or DataTypes.createDecimalType(precision, scale) |
StringType | String | DataTypes.StringType |
BinaryType | byte[] | DataTypes.BinaryType |
BooleanType | boolean or Boolean | DataTypes.BooleanType |
TimestampType | java.sql.Timestamp | DataTypes.TimestampType |
DateType | java.sql.Date | DataTypes.DateType |
ArrayType | java.util.List | DataTypes.createArrayType(elementType) (Note: the value of containsNull will be true) or DataTypes.createArrayType(elementType, containsNull) |
MapType | java.util.Map | DataTypes.createMapType(keyType, valueType) (Note: the value of valueContainsNull will be true) or DataTypes.createMapType(keyType, valueType, valueContainsNull) |
StructType | org.apache.spark.sql.Row | DataTypes.createStructType(fields) Note: fields is a List or an array of StructFields. Also, two fields with the same name are not allowed. |
StructField | The value type in Java of the data type of this field (For example, int for a StructField with the data type IntegerType) | DataTypes.createStructField(name, dataType, nullable) |
All data types of Spark SQL are located in the package pyspark.sql.types.
You can access them with the following import:

    from pyspark.sql.types import *

Data type | Value type in Python | API to access or create a data type |
---|---|---|
ByteType | int or long Note: Numbers will be converted to 1-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -128 to 127. | ByteType() |
ShortType | int or long Note: Numbers will be converted to 2-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -32768 to 32767. | ShortType() |
IntegerType | int or long | IntegerType() |
LongType | long Note: Numbers will be converted to 8-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -9223372036854775808 to 9223372036854775807. Otherwise, please convert data to decimal.Decimal and use DecimalType. | LongType() |
FloatType | float Note: Numbers will be converted to 4-byte single-precision floating point numbers at runtime. | FloatType() |
DoubleType | float | DoubleType() |
DecimalType | decimal.Decimal | DecimalType() |
StringType | string | StringType() |
BinaryType | bytearray | BinaryType() |
BooleanType | bool | BooleanType() |
TimestampType | datetime.datetime | TimestampType() |
DateType | datetime.date | DateType() |
ArrayType | list, tuple, or array | ArrayType(elementType, [containsNull]) Note: The default value of containsNull is True. |
MapType | dict | MapType(keyType, valueType, [valueContainsNull]) Note: The default value of valueContainsNull is True. |
StructType | list or tuple | StructType(fields) Note: fields is a list of StructFields. Also, two fields with the same name are not allowed. |
StructField | The value type in Python of the data type of this field (for example, int for a StructField with the data type IntegerType) | StructField(name, dataType, [nullable]) Note: The default value of nullable is True. |
Data type | Value type in R | API to access or create a data type |
---|---|---|
ByteType | integer Note: Numbers will be converted to 1-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -128 to 127. | "byte" |
ShortType | integer Note: Numbers will be converted to 2-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -32768 to 32767. | "short" |
IntegerType | integer | "integer" |
LongType | integer Note: Numbers will be converted to 8-byte signed integer numbers at runtime. Please make sure that numbers are within the range of -9223372036854775808 to 9223372036854775807. Otherwise, please convert data to decimal.Decimal and use DecimalType. | "long" |
FloatType | numeric Note: Numbers will be converted to 4-byte single-precision floating point numbers at runtime. | "float" |
DoubleType | numeric | "double" |
DecimalType | Not supported | Not supported |
StringType | character | "string" |
BinaryType | raw | "binary" |
BooleanType | logical | "bool" |
TimestampType | POSIXct | "timestamp" |
DateType | Date | "date" |
ArrayType | vector or list | list(type="array", elementType=elementType, containsNull=[containsNull]) Note: The default value of containsNull is TRUE. |
MapType | environment | list(type="map", keyType=keyType, valueType=valueType, valueContainsNull=[valueContainsNull]) Note: The default value of valueContainsNull is TRUE. |
StructType | named list | list(type="struct", fields=fields) Note: fields is a list of StructFields. Also, two fields with the same name are not allowed. |
StructField | The value type in R of the data type of this field (for example, integer for a StructField with the data type IntegerType) | list(name=name, type=dataType, nullable=[nullable]) Note: The default value of nullable is TRUE. |