CREATE TABLE with Hive format
Applies to: Databricks Runtime
Defines a table using Hive format.
Syntax
CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier
[ ( col_name1[:] col_type1 [ COMMENT col_comment1 ], ... ) ]
[ COMMENT table_comment ]
[ PARTITIONED BY ( col_name2[:] col_type2 [ COMMENT col_comment2 ], ... )
| ( col_name1, col_name2, ... ) ]
[ ROW FORMAT row_format ]
[ STORED AS file_format ]
[ LOCATION path ]
[ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ]
[ AS select_statement ]
row_format:
: SERDE serde_class [ WITH SERDEPROPERTIES (k1=v1, k2=v2, ... ) ]
| DELIMITED [ FIELDS TERMINATED BY fields_terminated_char [ ESCAPED BY escaped_char ] ]
[ COLLECTION ITEMS TERMINATED BY collection_items_terminated_char ]
[ MAP KEYS TERMINATED BY map_key_terminated_char ]
[ LINES TERMINATED BY row_terminated_char ]
[ NULL DEFINED AS null_char ]
The clauses between the column definition clause and the AS SELECT
clause can appear in any order. For example, you can write COMMENT table_comment
after TBLPROPERTIES
.
You must specify either the STORED AS
or ROW FORMAT
clause. Otherwise, the SQL parser uses the CREATE TABLE [USING] syntax to parse it and creates a Delta table by default.
Parameters
-
table_identifier
A table name, optionally qualified with a schema name.
Syntax:
[schema_name.] table_name
-
EXTERNAL
Defines the table using the path provided in
LOCATION
. -
PARTITIONED BY
Partitions the table by the specified columns.
-
ROW FORMAT
Use the
SERDE
clause to specify a custom SerDe for one table. Otherwise, use theDELIMITED
clause to use the native SerDe and specify the delimiter, escape character, null character and so on. -
SERDE
Specifies a custom SerDe for one table.
-
serde_class
Specifies a fully-qualified class name of a custom SerDe.
-
SERDEPROPERTIES
A list of key-value pairs used to tag the SerDe definition.
-
DELIMITED
The
DELIMITED
clause can be used to specify the native SerDe and state the delimiter, escape character, null character and so on. -
FIELDS TERMINATED BY
Used to define a column separator.
-
COLLECTION ITEMS TERMINATED BY
Used to define a collection item separator.
-
MAP KEYS TERMINATED BY
Used to define a map key separator.
-
LINES TERMINATED BY
Used to define a row separator.
-
NULL DEFINED AS
Used to define the specific value for NULL.
-
ESCAPED BY
Define the escape mechanism.
-
COLLECTION ITEMS TERMINATED BY
Define a collection item separator.
-
MAP KEYS TERMINATED BY
Define a map key separator.
-
LINES TERMINATED BY
Define a row separator.
-
NULL DEFINED AS
Define the specific value for
NULL
. -
STORED AS
The file format for the table. Available formats include
TEXTFILE
,SEQUENCEFILE
,RCFILE
,ORC
,PARQUET
, andAVRO
. Alternatively, you can specify your own input and output formats throughINPUTFORMAT
andOUTPUTFORMAT
. Only formatsTEXTFILE
,SEQUENCEFILE
, andRCFILE
can be used withROW FORMAT SERDE
and onlyTEXTFILE
can be used withROW FORMAT DELIMITED
. -
LOCATION
Path to the directory where table data is stored, which could be a path on distributed storage.
-
COMMENT
A string literal to describe the table.
-
TBLPROPERTIES
A list of key-value pairs used to tag the table definition.
-
AS select_statement
Populates the table using the data from the select statement.
Examples
--Use hive format
CREATE TABLE student (id INT, name STRING, age INT) STORED AS ORC;
--Use data from another table
CREATE TABLE student_copy STORED AS ORC
AS SELECT * FROM student;
--Specify table comment and properties
CREATE TABLE student (id INT, name STRING, age INT)
COMMENT 'this is a comment'
STORED AS ORC
TBLPROPERTIES ('foo'='bar');
--Specify table comment and properties with different clauses order
CREATE TABLE student (id INT, name STRING, age INT)
STORED AS ORC
TBLPROPERTIES ('foo'='bar')
COMMENT 'this is a comment';
--Create partitioned table
CREATE TABLE student (id INT, name STRING)
PARTITIONED BY (age INT)
STORED AS ORC;
--Create partitioned table with different clauses order
CREATE TABLE student (id INT, name STRING)
STORED AS ORC
PARTITIONED BY (age INT);
--Use Row Format and file format
CREATE TABLE student (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
--Use complex datatype
CREATE EXTERNAL TABLE family(
name STRING,
friends ARRAY<STRING>,
children MAP<STRING, INT>,
address STRUCT<street: STRING, city: STRING>
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '\\'
COLLECTION ITEMS TERMINATED BY '_'
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n'
NULL DEFINED AS 'foonull'
STORED AS TEXTFILE
LOCATION '/tmp/family/';
--Use predefined custom SerDe
CREATE TABLE avroExample
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.literal'='{ "namespace": "org.apache.hive",
"name": "first_schema",
"type": "record",
"fields": [
{ "name":"string1", "type":"string" },
{ "name":"string2", "type":"string" }
] }');
--Use personalized custom SerDe(we may need to `ADD JAR xxx.jar` first to ensure we can find the serde_class,
--or you may run into `CLASSNOTFOUND` exception)
ADD JAR /tmp/hive_serde_example.jar;
CREATE EXTERNAL TABLE family (id INT, name STRING)
ROW FORMAT SERDE 'com.ly.spark.serde.SerDeExample'
STORED AS INPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleInputFormat'
OUTPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleOutputFormat'
LOCATION '/tmp/family/';