Databricks Utilities

Databricks Utilities (DBUtils) make it easy to perform powerful combinations of tasks. You can use the utilities to work with blob storage efficiently, to chain and parameterize notebooks, and to work with secrets.

All dbutils utilities are available in Python and Scala notebooks. Only the widget utilities are available in R notebooks; however, you can use a language magic command to invoke other dbutils methods in R and SQL notebooks. For example, to list the Databricks Datasets DBFS folder in an R or SQL notebook, run the command:

%python
dbutils.fs.ls("/databricks-datasets")

This topic includes the following sections:

  • File system utilities
  • Notebook workflow utilities
  • Widget utilities
  • Secrets utilities
  • Library utilities
  • Databricks Utilities API library

File system utilities

The file system utilities access the Databricks File System (DBFS), making it easier to use Databricks as a file system. Learn more by running:

dbutils.fs.help()
cp(from: String, to: String, recurse: boolean = false): boolean -> Copies a file or directory, possibly across FileSystems
head(file: String, maxBytes: int = 65536): String -> Returns up to the first 'maxBytes' bytes of the given file as a String encoded in UTF-8
ls(dir: String): Seq -> Lists the contents of a directory
mkdirs(dir: String): boolean -> Creates the given directory if it does not exist, also creating any necessary parent directories
mv(from: String, to: String, recurse: boolean = false): boolean -> Moves a file or directory, possibly across FileSystems
put(file: String, contents: String, overwrite: boolean = false): boolean -> Writes the given String out to a file, encoded in UTF-8
rm(dir: String, recurse: boolean = false): boolean -> Removes a file or directory

mount(source: String, mountPoint: String, encryptionType: String = "", owner: String = null, extraConfigs: Map = Map.empty[String, String]): boolean -> Mounts the given source directory into DBFS at the given mount point
mounts: Seq -> Displays information about what is mounted within DBFS
refreshMounts: boolean -> Forces all machines in this cluster to refresh their mount cache, ensuring they receive the most recent information
unmount(mountPoint: String): boolean -> Deletes a DBFS mount point
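
For example, here is a short Python sketch (the paths are illustrative) that exercises a few of these methods:

# Write a small UTF-8 string to a DBFS file, overwriting any existing file
dbutils.fs.put("/tmp/dbutils-example.txt", "Hello, DBFS!", True)

# List the contents of the directory
display(dbutils.fs.ls("/tmp"))

# Return up to the first 64 KB of the file as a string
print(dbutils.fs.head("/tmp/dbutils-example.txt"))

# Show what is currently mounted within DBFS
display(dbutils.fs.mounts())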

Notebook workflow utilities

Notebook workflows allow you to chain together notebooks and act on their results. See Notebook Workflows. Learn more by running:

dbutils.notebook.help()
exit(value: String): void -> This method lets you exit a notebook with a value
run(path: String, timeoutSeconds: int, arguments: Map): String -> This method runs a notebook and returns its exit value.

Note

The maximum length of the string value returned from run is 5 MB. See Runs Get Output.
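
For example, a minimal Python sketch (the notebook path and argument name are hypothetical) that runs a child notebook and captures its exit value:

# Run the child notebook with a 60-second timeout, passing one argument;
# run blocks until the child notebook finishes, then returns its exit value
result = dbutils.notebook.run("/path/to/child-notebook", 60, {"date": "2021-01-01"})
print(result)

# In the child notebook, return a value to the caller:
dbutils.notebook.exit("OK")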

Widget utilities

Widgets allow you to parameterize notebooks. See Widgets. Learn more by running:

dbutils.widgets.help()
combobox(name: String, defaultValue: String, choices: Seq, label: String): void -> Creates a combobox input widget with a given name, default value and choices
dropdown(name: String, defaultValue: String, choices: Seq, label: String): void -> Creates a dropdown input widget with a given name, default value and choices
get(name: String): String -> Retrieves current value of an input widget
getArgument(name: String, optional: String): String -> (DEPRECATED) Equivalent to get
multiselect(name: String, defaultValue: String, choices: Seq, label: String): void -> Creates a multiselect input widget with a given name, default value and choices
remove(name: String): void -> Removes an input widget from the notebook
removeAll: void -> Removes all widgets in the notebook
text(name: String, defaultValue: String, label: String): void -> Creates a text input widget with a given name and default value
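
For example, a brief Python sketch (the widget name, default value, and label are illustrative):

# Create a text input widget, read its current value, then remove it
dbutils.widgets.text("database", "default", "Database")
current_db = dbutils.widgets.get("database")
print(current_db)
dbutils.widgets.remove("database")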

Secrets utilities

Secrets allow you to store and access sensitive credential information without making it visible in notebooks. See Secrets and Use the secrets in a notebook.

Note

Secrets utilities are available on clusters running Databricks Runtime 4.0 and above.

Learn more by running:

dbutils.secrets.help()
get(scope: String, key: String): String -> Gets the string representation of a secret value with scope and key
getBytes(scope: String, key: String): byte[] -> Gets the bytes representation of a secret value with scope and key
list(scope: String): Seq -> Lists secret metadata for secrets within a scope
listScopes: Seq -> Lists secret scopes
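
For example, in Python (the scope and key names are hypothetical):

# Fetch a secret value; Databricks redacts secret values printed to notebook output
password = dbutils.secrets.get(scope="my-scope", key="db-password")

# List the metadata of the secrets in a scope
for secret in dbutils.secrets.list("my-scope"):
    print(secret.key)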

Library utilities

Note

Library utilities are in Public Preview.

Library utilities allow you to install Python libraries and create an environment scoped to a notebook session. The libraries are available both on the driver and on the executors, so you can reference them in UDFs. This enables:

  • Library dependencies of a notebook to be organized within the notebook itself.
  • Notebook users with different library dependencies to share a cluster without interference.

Detaching a notebook destroys this environment; however, you can recreate it by re-running the library install API commands in the notebook. See the restartPython API for how you can reset your notebook state without losing your environment.

Library utilities are enabled by default on clusters running Databricks Runtime 5.1 and above. Therefore, by default the Python environment for each notebook is isolated, using a separate Python executable that is created when the notebook is attached to the cluster and that inherits the default Python environment on the cluster. Libraries installed through an init script into the Databricks Python environment are still available. You can disable this feature by setting spark.databricks.libraryIsolation.enabled to false.
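
For example, to disable this isolation you would set the following line in the cluster's Spark config (typically under the cluster's configuration settings):

spark.databricks.libraryIsolation.enabled false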

This API is designed to be the preferred way to install libraries. It is compatible with the existing cluster-wide library installation through the UI and REST API; however, libraries installed through this API take precedence over cluster-wide libraries.

dbutils.library.help()
install(path: String): boolean -> Install the library within the current notebook session
installPyPI(pypiPackage: String, version: String = "", repo: String = ""): boolean -> Install the PyPI library within the current notebook session
list: List -> List the isolated libraries added for the current notebook session via dbutils
restartPython: void -> Restart python process for the current notebook session

Examples

  • Install a .egg or .whl library in a notebook.

    The accepted library sources are dbfs and s3.

    dbutils.library.install("dbfs:/path/to/your/library.egg")
    
    dbutils.library.install("dbfs:/path/to/your/library.whl")
    
  • Install a PyPI library in a notebook. version and repo are optional.

    dbutils.library.installPyPI("PyPIpackage", "version", "repo")
    
  • Specify your library requirements in one notebook and install them through %run in another.

    • Define the libraries to install in a notebook called InstallDependencies.

      dbutils.library.installPyPI("torch")
      
    • Install them in the notebook that needs those dependencies.

      %run /path/to/InstallDependencies    # install the dependencies in the first cell
      
      import torch
      # do the actual work
      
  • List the libraries installed in a notebook.

    dbutils.library.list()
    
  • Reset the Python notebook state while maintaining the environment. This API is available only in Python notebooks. This can be used to:

    • Reload libraries that come preinstalled in Databricks with a different version. For example:

      dbutils.library.installPyPI("numpy","1.15.4")
      dbutils.library.restartPython()
      
      # Make sure you start using the library in another cell.
      import numpy
      
    • Install libraries like tensorflow that need to be loaded at process startup. For example:

      dbutils.library.installPyPI("tensorflow")
      dbutils.library.restartPython()
      
      # Use the library in another cell.
      import tensorflow
      

    Important

    The Python notebook state is reset after the notebook cell containing restartPython is run. After running that cell, the notebook loses all state held in the Python interpreter, including but not limited to local variables, imported libraries, and other ephemeral state. Therefore, we recommend that you install libraries and reset the notebook state in the first notebook cell.

Databricks Utilities API library

To accelerate application development, it can be helpful to compile, build, and test applications before you deploy them as production jobs. To enable you to compile against Databricks Utilities, Databricks provides the dbutils-api library. You can download the dbutils-api library or include the library by adding a dependency to your build file:

  • SBT

    libraryDependencies += "com.databricks" % "dbutils-api_2.11" % "0.0.3"
    
  • Maven

    <dependency>
        <groupId>com.databricks</groupId>
        <artifactId>dbutils-api_2.11</artifactId>
        <version>0.0.3</version>
    </dependency>
    
  • Gradle

    compile 'com.databricks:dbutils-api_2.11:0.0.3'
    

Once you build your application against this library, you can deploy the application on a cluster running Databricks Runtime 4.0 or above.

Important

The dbutils-api library allows you to locally compile an application that uses dbutils, but not to run it. To run the application, you must deploy it in Databricks.

Example projects

Here is an archive containing minimal example projects that show how to compile against the dbutils-api library with three common build tools:

  • sbt: sbt package
  • Maven: mvn install
  • Gradle: gradle build

These commands create output JARs in the following locations:

  • sbt: target/scala-2.11/dbutils-api-example_2.11-0.0.1-SNAPSHOT.jar
  • Maven: target/dbutils-api-example-0.0.1-SNAPSHOT.jar
  • Gradle: build/libs/dbutils-api-example-0.0.1-SNAPSHOT.jar

You can attach this JAR to your cluster as a library, restart the cluster (which must be running Databricks Runtime 4.0 or above), and then run:

example.Test()

This statement creates a text input widget with the label Hello: and the initial value World.

You can use all the other dbutils APIs the same way.

To test an application that uses the dbutils object outside Databricks, you can mock up the dbutils object by calling:

com.databricks.dbutils_v1.DBUtilsHolder.dbutils0.set(
  new com.databricks.dbutils_v1.DBUtilsV1{
    ...
  }
)

Substitute your own DBUtilsV1 instance in which you implement the interface methods however you like, for example providing a local filesystem mockup for dbutils.fs.