Databricks Datasets

Databricks includes a variety of datasets mounted to the Databricks File System that you can use to learn Spark or to test algorithms. These datasets appear in examples throughout the documentation pages.

To browse these files, you can use Databricks Utilities (dbutils). The following snippet lists all of the Databricks datasets.

display(dbutils.fs.ls("/databricks-datasets"))

You can then print the README for any dataset to get more information about it. The following example prints the top-level README for the datasets folder.

%python
# Read the top-level README through the /dbfs local file mount.
with open("/dbfs/databricks-datasets/README.md") as f:
    x = f.read()

print(x)
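
To look up the README for each individual dataset rather than only the top-level one, something like the following sketch may help. It walks the same `/dbfs/databricks-datasets` path used above with ordinary Python file APIs. The helper name `find_readmes` and the list of candidate README filenames are assumptions made for illustration; individual datasets may name their README differently or not ship one at all.

```python
from pathlib import Path

# Candidate README filenames. This list is an assumption for illustration,
# not a documented naming convention of the datasets.
README_NAMES = ("README.md", "README.txt", "README")


def find_readmes(root):
    """Map each dataset directory under `root` to its README path, or None."""
    readmes = {}
    for entry in sorted(Path(root).iterdir()):
        if not entry.is_dir():
            continue  # skip top-level files such as the folder's own README
        # Take the first candidate filename that exists in this directory.
        readmes[entry.name] = next(
            (entry / name for name in README_NAMES if (entry / name).is_file()),
            None,
        )
    return readmes


if __name__ == "__main__":
    root = Path("/dbfs/databricks-datasets")
    if root.is_dir():  # only true where the /dbfs mount is available
        for dataset, readme in find_readmes(root).items():
            print(dataset, "->", readme if readme else "no README found")
```

Once you have a path from the returned mapping, you can print that README the same way as the top-level one, for example with `print(path.read_text())`.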