Accessing Data

This topic describes how to import data, load data using the Spark API, and edit and delete data using Databricks File System (DBFS) commands.

Import data

If you have a small file on your local machine that you want to analyze with Databricks, you must first upload it to the Databricks File System (DBFS). A simple way to do so is to follow the Upload File instructions in Create a table using the UI.

For production environments, however, we recommend that you load data into DBFS programmatically rather than through the Create Table UI. You can also use a wide variety of Data Sources to import data directly into your notebooks.
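For example, one programmatic way to get a file into DBFS is to copy it from the driver's local file system with the dbutils.fs utilities. A minimal sketch; the source and destination paths here are hypothetical:

Python
# Copy a file from the driver's local disk (file:/) into DBFS (dbfs:/)
dbutils.fs.cp("file:/tmp/state_income.csv", "dbfs:/FileStore/tables/state_income.csv")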

Load data

You can read raw data into Spark directly. For example, if you uploaded a CSV file, you can read it using one of the following examples.

Tip

For easier access, we recommend that you create a table. See Databases and Tables for more information.

Scala
val sparkDF = sqlContext.read.format("csv").load("/FileStore/tables/state_income-9f7c5.csv")
Python
sparkDF = sqlContext.read.format("csv").load("/FileStore/tables/state_income-9f7c5.csv")
R
sparkDF <- read.df(sqlContext, source = "csv", path = "/FileStore/tables/state_income-9f7c5.csv")
Scala RDD
val rdd = sc.textFile("/FileStore/tables/state_income-9f7c5.csv")
Python RDD
rdd = sc.textFile("/FileStore/tables/state_income-9f7c5.csv")
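
If your CSV file has a header row, you will usually want Spark to use it for column names and to infer column types, and, following the tip above, you can then save the result as a table. A minimal Python sketch, assuming a Spark version where the built-in csv source is available; the table name is hypothetical:

Python
# Read the CSV with header and schema inference enabled
sparkDF = (sqlContext.read.format("csv")
  .option("header", "true")       # use the first row as column names
  .option("inferSchema", "true")  # infer column types from the data
  .load("/FileStore/tables/state_income-9f7c5.csv"))

# Register the DataFrame as a table for easier access (hypothetical name)
sparkDF.write.saveAsTable("state_income")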

If the data volume is small enough, you can also load the data directly onto the driver node using standard single-machine libraries. Because these libraries read through the local file system, use the /dbfs mount point to reach DBFS paths. For example:

Python
import pandas as pd

# pandas reads through the local file system, so use the /dbfs mount point
pandas_df = pd.read_csv("/dbfs/FileStore/tables/state_income-9f7c5.csv", header=0)
R
# R also reads through the local file system, so use the /dbfs mount point
df <- read.csv("/dbfs/FileStore/tables/state_income-9f7c5.csv", header = TRUE)
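
If you later need to distribute a DataFrame that was loaded on the driver, you can hand it back to Spark. A minimal sketch, assuming the pandas_df from the example above:

Python
# Convert the driver-local pandas DataFrame into a distributed Spark DataFrame
sparkDF = sqlContext.createDataFrame(pandas_df)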

Edit data

You cannot edit data directly within Databricks, but you can overwrite a data file using Databricks File System (DBFS) commands.
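
For example, you can rewrite a file in place with dbutils.fs.put, whose third argument is the overwrite flag. A minimal sketch; the replacement contents here are hypothetical:

Python
# Overwrite the file with new contents (the third argument enables overwrite)
dbutils.fs.put("/FileStore/tables/state_income-9f7c5.csv", "state,income\nCA,61021\n", True)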

Delete data

To delete data, use the following Databricks Utilities (dbutils) command. The second argument is the recurse flag; setting it to true also deletes the contents of a directory:

dbutils.fs.rm("dbfs:/FileStore/tables/state_income-9f7c5.csv", true)

Warning

Deleted data cannot be recovered.
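
To confirm that the file is gone, you can list the directory. A minimal check using dbutils.fs and the Databricks notebook display function:

Python
# List what remains in the directory; the deleted file should no longer appear
display(dbutils.fs.ls("/FileStore/tables/"))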