Reading CSV Files

Reading CSV files in Apache Spark is simple. This example uses the diamonds dataset, which is available as a Databricks dataset. All we need to do is specify the file path and any reader options we want.
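As a minimal sketch of what "a path plus options" might look like in PySpark: the helper name read_csv, the constants, and the dataset path below are assumptions for illustration and do not come from the original text, so check the path in your own workspace. The header and inferSchema options are standard CSV reader options.

```python
# Assumed location of the diamonds CSV under /databricks-datasets;
# verify this path in your own workspace before relying on it.
DIAMONDS_PATH = "/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv"

# Treat the first line as column names and let Spark infer column types.
CSV_OPTIONS = {"header": "true", "inferSchema": "true"}


def read_csv(spark, path, options=None):
    """Return a DataFrame for the CSV data at `path`.

    `spark` is an existing SparkSession (Databricks notebooks provide one
    as the variable `spark`). Falls back to CSV_OPTIONS when no options
    are passed.
    """
    reader = spark.read.format("csv").options(**(options or CSV_OPTIONS))
    return reader.load(path)
```

In a notebook, calling read_csv(spark, DIAMONDS_PATH) should return a DataFrame, and calling printSchema() on the result is a quick way to check the inferred column types.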


Sometimes you will have many CSV files in one folder. Apache Spark can read the entire directory at once: specify the directory, rather than an individual file, as the file location.
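A directory read might look like the following sketch. The function name and the folder path in the usage note are hypothetical, and the sketch assumes every file in the folder has the same columns and a header row.

```python
def read_csv_directory(spark, directory):
    """Read every CSV file under `directory` into a single DataFrame.

    `spark` is an existing SparkSession. Assumes all files share the
    same column layout and each starts with a header line.
    """
    return (
        spark.read.format("csv")
        .option("header", "true")       # first line of each file holds column names
        .option("inferSchema", "true")  # let Spark infer column types
        .load(directory)                # a directory path, not a single file
    )
```

For example, read_csv_directory(spark, "/path/to/csv-folder") would load all CSV files in that (placeholder) folder as one DataFrame.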

Manipulating Data

Once you’ve read in your data, you’ll need to manipulate it. To better understand how to manipulate data in Apache Spark, see the Spark SQL Language Manual.
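As a small illustration of manipulating data with Spark SQL, the sketch below registers a DataFrame as a temporary view and runs a query against it. It assumes the diamonds data has color and price columns; the view name, query, and function name are illustrative choices rather than anything prescribed by the manual.

```python
# Average price per diamond color; assumes `color` and `price` columns exist.
AVG_PRICE_BY_COLOR = """
SELECT color, avg(price) AS avg_price
FROM diamonds
GROUP BY color
ORDER BY color
"""


def avg_price_by_color(spark, diamonds_df):
    """Register `diamonds_df` as the temp view `diamonds` and query it.

    `spark` is an existing SparkSession; the result is a new DataFrame.
    """
    diamonds_df.createOrReplaceTempView("diamonds")
    return spark.sql(AVG_PRICE_BY_COLOR)
```

Calling show() on the returned DataFrame prints the aggregated rows.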