Data types

Applies to: Databricks SQL, Databricks Runtime

For rules governing how conflicts between data types are resolved, see SQL data type rules.

Supported data types

Databricks supports the following data types:

| Data type | Description |
| --- | --- |
| BIGINT | Represents 8-byte signed integer numbers. |
| BINARY | Represents byte sequence values. |
| BOOLEAN | Represents Boolean values. |
| DATE | Represents values comprising values of fields year, month and day, without a time-zone. |
| DECIMAL(p,s) | Represents numbers with maximum precision p and fixed scale s. |
| DOUBLE | Represents 8-byte double-precision floating point numbers. |
| FLOAT | Represents 4-byte single-precision floating point numbers. |
| INT | Represents 4-byte signed integer numbers. |
| INTERVAL intervalQualifier | Represents intervals of time either on a scale of seconds or months. |
| VOID | Represents the untyped NULL. |
| SMALLINT | Represents 2-byte signed integer numbers. |
| STRING | Represents character string values. |
| TIMESTAMP | Represents values comprising values of fields year, month, day, hour, minute, and second, with the session local timezone. |
| TIMESTAMP_NTZ | Represents values comprising values of fields year, month, day, hour, minute, and second. All operations are performed without taking any time zone into account. |
| TINYINT | Represents 1-byte signed integer numbers. |
| ARRAY < elementType > | Represents values comprising a sequence of elements with the type of elementType. |
| MAP < keyType,valueType > | Represents values comprising a set of key-value pairs. |
| STRUCT < [fieldName : fieldType [NOT NULL][COMMENT str][, …]] > | Represents values with the structure described by a sequence of fields. |
| VARIANT | Represents semi-structured data. |
| OBJECT | Represents values in a VARIANT with the structure described by a set of fields. |

Important: Delta Lake does not support the VOID type.

Data type classification

Data types are grouped into the following classes:

  • Binary floating point types use exponents and a binary representation to cover a large range of numbers: FLOAT and DOUBLE.

Language mappings

Applies to: Databricks Runtime

Spark SQL data types are defined in the package org.apache.spark.sql.types. You access them by importing the package:

Scala
import org.apache.spark.sql.types._

| SQL type | Data type | Value type | API to access or create data type |
| --- | --- | --- | --- |
| TINYINT | ByteType | Byte | ByteType |
| SMALLINT | ShortType | Short | ShortType |
| INT | IntegerType | Int | IntegerType |
| BIGINT | LongType | Long | LongType |
| FLOAT | FloatType | Float | FloatType |
| DOUBLE | DoubleType | Double | DoubleType |
| DECIMAL(p,s) | DecimalType | java.math.BigDecimal | DecimalType |
| STRING | StringType | String | StringType |
| BINARY | BinaryType | Array[Byte] | BinaryType |
| BOOLEAN | BooleanType | Boolean | BooleanType |
| TIMESTAMP | TimestampType | java.sql.Timestamp | TimestampType |
| TIMESTAMP_NTZ | TimestampNTZType | java.time.LocalDateTime | TimestampNTZType |
| DATE | DateType | java.sql.Date | DateType |
| year-month interval | YearMonthIntervalType | java.time.Period | YearMonthIntervalType (3) |
| day-time interval | DayTimeIntervalType | java.time.Duration | DayTimeIntervalType (3) |
| ARRAY | ArrayType | scala.collection.Seq | ArrayType(elementType [, containsNull]). (2) |
| MAP | MapType | scala.collection.Map | MapType(keyType, valueType [, valueContainsNull]). (2) |
| STRUCT | StructType | org.apache.spark.sql.Row | StructType(fields). fields is a Seq of StructField. (4) |
| | StructField | The value type of the data type of this field (for example, Int for a StructField with the data type IntegerType) | StructField(name, dataType [, nullable]). (4) |
| VARIANT | VariantType | org.apache.spark.unsafe.type.VariantVal | VariantType |
| OBJECT | Not supported | Not supported | Not supported |

(1) Numbers are converted to the domain at runtime. Make sure that numbers are within range.

(2) The optional value defaults to TRUE.

(3) Interval types

  • YearMonthIntervalType([startField,] endField): Represents a year-month interval made up of a contiguous subset of the fields YEAR and MONTH.

    startField is the leftmost field, and endField is the rightmost field of the type. Valid values of startField and endField are 0 (YEAR) and 1 (MONTH).

  • DayTimeIntervalType([startField,] endField): Represents a day-time interval made up of a contiguous subset of the fields DAY, HOUR, MINUTE, and SECOND.

    startField is the leftmost field, and endField is the rightmost field of the type. Valid values of startField and endField are 0 (DAY), 1 (HOUR), 2 (MINUTE), and 3 (SECOND).
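
As a rough illustration of the interval constructors described above, the following sketch builds a few interval types. It assumes a Databricks Scala notebook or spark-shell session, where Spark is already on the classpath and top-level statements are allowed. It refers to the fields through the named constants on the companion objects (YearMonthIntervalType.YEAR, DayTimeIntervalType.DAY, and so on) instead of their numeric values. The variable names are illustrative only.

```scala
import org.apache.spark.sql.types._

// Two-field form: startField is the leftmost field, endField the rightmost.
val yearToMonth = YearMonthIntervalType(YearMonthIntervalType.YEAR, YearMonthIntervalType.MONTH)
val dayToSecond = DayTimeIntervalType(DayTimeIntervalType.DAY, DayTimeIntervalType.SECOND)
val hourToMinute = DayTimeIntervalType(DayTimeIntervalType.HOUR, DayTimeIntervalType.MINUTE)

// Single-field form: the same field is used as both start and end.
val yearOnly = YearMonthIntervalType(YearMonthIntervalType.YEAR)
val secondOnly = DayTimeIntervalType(DayTimeIntervalType.SECOND)

// Print the type names Spark derives from the chosen fields.
Seq(yearToMonth, yearOnly, dayToSecond, hourToMinute, secondOnly)
  .foreach(t => println(t.typeName))
```

Using the named constants keeps the code readable and avoids depending on the numeric encoding of each field.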

(4) StructType

  • StructType(fields): Represents values with the structure described by a sequence, list, or array of StructFields (fields). Two fields with the same name are not allowed.
  • StructField(name, dataType, nullable): Represents a field in a StructType. name is the name of the field, and dataType is its data type. nullable indicates whether values of this field can be null; it defaults to true.
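
A minimal sketch of how these constructors fit together is shown below. It assumes a Databricks Scala notebook or spark-shell session with Spark on the classpath. The schema and its field names (order_id, customer, amount, tags, attributes) are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.types._

// Hypothetical schema; the field names are not taken from the documentation.
val orderSchema = StructType(Seq(
  StructField("order_id", LongType, nullable = false),   // explicitly NOT NULL
  StructField("customer", StringType),                    // nullable omitted: defaults to true
  StructField("amount", DecimalType(10, 2)),              // DECIMAL(p,s) with p = 10, s = 2
  StructField("tags", ArrayType(StringType)),             // containsNull omitted: defaults to true
  StructField("attributes",
    MapType(StringType, StringType, valueContainsNull = false))
))

// Print the schema as an indented tree for inspection.
orderSchema.printTreeString()

// Look up a field by name and check its nullability.
val customerIsNullable = orderSchema("customer").nullable
```

Spelling out nullable, containsNull, or valueContainsNull only where the value differs from the default keeps the schema definition short while making the non-default cases easy to spot.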