Databricks Datasets

Databricks includes a variety of datasets mounted to the Databricks File System (DBFS) that you can use to learn Spark or to test algorithms. You’ll see these datasets used throughout the documentation pages.

To browse these files, you can use Databricks Utilities (dbutils). The following snippet lists all of the Databricks datasets.

display(dbutils.fs.ls("/databricks-datasets"))

For any dataset in that listing, you can then print its README to get more information about it. The following example prints the top-level README.

%python
# Read the top-level README through the /dbfs local file mount.
with open("/dbfs/databricks-datasets/README.md") as f:
    x = f.read()

print(x)
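
The same approach can be extended to individual datasets. The following is a minimal sketch, not an official utility: the helper names `read_dataset_readme` and `datasets_with_readme` are hypothetical, and the list of README file names is an assumption, since not every dataset folder necessarily ships a README or names it the same way. It also assumes the datasets are reachable through the `/dbfs` local mount, as in the example above.

```python
import os

# Assumed README file names; individual datasets may use something else.
README_NAMES = ("README.md", "README.txt", "README")


def read_dataset_readme(dataset_dir, names=README_NAMES):
    """Return the text of the first README found in dataset_dir, or None."""
    for name in names:
        path = os.path.join(dataset_dir, name)
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as f:
                return f.read()
    return None


def datasets_with_readme(root, names=README_NAMES):
    """Return the sorted names of subdirectories of root that contain a README."""
    found = []
    for entry in sorted(os.listdir(root)):
        dataset_dir = os.path.join(root, entry)
        if os.path.isdir(dataset_dir) and read_dataset_readme(dataset_dir, names) is not None:
            found.append(entry)
    return found


# Only runs where the /dbfs mount exists, for example on a Databricks cluster.
root = "/dbfs/databricks-datasets"
if os.path.isdir(root):
    print(datasets_with_readme(root))
```

To print the README of a single dataset, pass its folder to `read_dataset_readme`, using any folder name that appears in the listing, and check for `None` before printing.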