Importing Hive Tables

This topic shows how to import Hive tables from cloud storage into Spark, using an external table.

Tables on cloud storage must be mounted to Databricks File System - DBFS.

Step 1: Show the CREATE TABLE statement

Issue a SHOW CREATE TABLE <tablename> command on your Hive command line to see the statement that created the table.

hive> SHOW CREATE TABLE wikicc;
OK
CREATE  TABLE `wikicc`(
  `country` string,
  `count` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/user/hive/warehouse/wikicc'
TBLPROPERTIES (
  'totalSize'='2335',
  'numRows'='240',
  'rawDataSize'='2095',
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='1',
  'transient_lastDdlTime'='1418173653')
Time taken: 0.03 seconds, Fetched: 18 row(s)

Step 2: Issue a CREATE TABLE EXTERNAL statement

If the statement that is returned uses a simple CREATE TABLE command, copy the statement and replace CREATE TABLE with CREATE TABLE EXTERNAL.

  • EXTERNAL ensures that Spark SQL does not delete your data if you drop the table.
  • You can omit the TBLPROPERTIES field.
DROP TABLE wikicc
CREATE EXTERNAL TABLE `wikicc`(
  `country` string,
  `count` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/user/hive/warehouse/wikicc'

Step 3: Issue SQL commands on your data

SELECT * FROM wikicc