Libraries API

The Libraries API allows you to install and uninstall libraries and get the status of libraries on a cluster via the API.


All Cluster Statuses

Endpoint HTTP Method
2.0/libraries/all-cluster-statuses GET

Get the status of all libraries on all clusters. A status will be available for all libraries installed on clusters via the API or the libraries UI as well as libraries set to be installed on all clusters via the libraries UI. If a library has been set to be installed on all clusters, is_library_for_all_clusters will be true, even if the library was also installed on a specific cluster.

An example response:

{
  "statuses": [
    {
      "cluster_id": "11203-my-cluster",
      "library_statuses": [
        {
          "library": {
            "jar": "dbfs:/mnt/libraries/library.jar"
          },
          "status": "INSTALLING",
          "messages": [],
          "is_library_for_all_clusters": false
        }
      ]
    },
    {
      "cluster_id": "20131-my-other-cluster",
      "library_statuses": [
        {
          "library": {
            "egg": "dbfs:/mnt/libraries/library.egg"
          },
          "status": "FAILED",
          "messages": ["Could not download library"],
          "is_library_for_all_clusters": false
        }
      ]
    }
  ]
}

Response Structure

Field Name Type Description
statuses An array of ClusterLibraryStatuses A list of cluster statuses.
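
A minimal sketch of how a client might scan this response for failed installations. The function name failed_libraries and the sample dictionary are illustrative assumptions; only the field names and the FAILED status come from the structures documented here.

```python
from typing import Dict, List, Tuple


def failed_libraries(response: Dict) -> List[Tuple[str, Dict, List[str]]]:
    """Return (cluster_id, library, messages) for each library with status FAILED."""
    failures = []
    for cluster in response.get("statuses", []):
        for entry in cluster.get("library_statuses", []):
            if entry.get("status") == "FAILED":
                failures.append(
                    (cluster["cluster_id"], entry["library"], entry.get("messages", []))
                )
    return failures


# Hypothetical sample shaped like the all-cluster-statuses response.
sample_all_statuses = {
    "statuses": [
        {
            "cluster_id": "11203-my-cluster",
            "library_statuses": [
                {
                    "library": {"jar": "dbfs:/mnt/libraries/library.jar"},
                    "status": "INSTALLING",
                    "messages": [],
                    "is_library_for_all_clusters": False,
                }
            ],
        },
        {
            "cluster_id": "20131-my-other-cluster",
            "library_statuses": [
                {
                    "library": {"egg": "dbfs:/mnt/libraries/library.egg"},
                    "status": "FAILED",
                    "messages": ["Could not download library"],
                    "is_library_for_all_clusters": False,
                }
            ],
        },
    ]
}

for cluster_id, library, messages in failed_libraries(sample_all_statuses):
    print(cluster_id, library, messages)
```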

Cluster Status

Endpoint HTTP Method
2.0/libraries/cluster-status GET

Get the status of libraries on a cluster. A status will be available for all libraries installed on the cluster via the API or the libraries UI as well as libraries set to be installed on all clusters via the libraries UI. If a library has been set to be installed on all clusters, is_library_for_all_clusters will be true, even if the library was also installed on the cluster.

An example request:

/libraries/cluster-status?cluster_id=11203-my-cluster

And response:

{
  "cluster_id": "11203-my-cluster",
  "library_statuses": [
    {
      "library": {
        "jar": "dbfs:/mnt/libraries/library.jar"
      },
      "status": "INSTALLED",
      "messages": [],
      "is_library_for_all_clusters": false
    },
    {
      "library": {
        "pypi": {
          "package": "beautifulsoup4"
        }
      },
      "status": "INSTALLING",
      "messages": ["Successfully resolved package from PyPI"],
      "is_library_for_all_clusters": false
    },
    {
      "library": {
        "cran": {
          "package": "ada",
          "repo": "http://cran.us.r-project.org"
        }
      },
      "status": "FAILED",
      "messages": ["R package installation is not supported on this spark version.\nPlease upgrade to Runtime 3.2 or higher"],
      "is_library_for_all_clusters": false
    }
  ]
}

Request Structure

Field Name Type Description
cluster_id STRING Unique identifier of the cluster whose status should be retrieved. This field is required.

Response Structure

Field Name Type Description
cluster_id STRING Unique identifier for the cluster.
library_statuses An array of LibraryFullStatus Status of all libraries on the cluster.
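
The request and response for this endpoint can be sketched as follows. The host name and the /api/ path prefix are assumptions about a particular deployment, and both helper names are hypothetical. The code only builds the URL and summarizes a response; it does not send anything.

```python
from collections import Counter
from urllib.parse import urlencode


def cluster_status_url(host: str, cluster_id: str) -> str:
    """Build the GET URL for 2.0/libraries/cluster-status (the /api/ prefix is assumed)."""
    query = urlencode({"cluster_id": cluster_id})
    return f"{host.rstrip('/')}/api/2.0/libraries/cluster-status?{query}"


def count_by_status(status_response: dict) -> Counter:
    """Count how many libraries on the cluster are in each LibraryInstallStatus."""
    return Counter(
        entry["status"] for entry in status_response.get("library_statuses", [])
    )


# Hypothetical response shaped like the cluster-status response structure.
sample_cluster_status = {
    "cluster_id": "11203-my-cluster",
    "library_statuses": [
        {"library": {"jar": "dbfs:/mnt/libraries/library.jar"}, "status": "INSTALLED"},
        {"library": {"pypi": {"package": "beautifulsoup4"}}, "status": "INSTALLING"},
        {"library": {"cran": {"package": "ada"}}, "status": "FAILED"},
    ],
}

status_url = cluster_status_url("https://example.cloud.databricks.com/", "11203-my-cluster")
status_counts = count_by_status(sample_cluster_status)
print(status_url)
print(dict(status_counts))
```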

Install

Endpoint HTTP Method
2.0/libraries/install POST

Install libraries on a cluster. The installation is asynchronous: it happens in the background after this request completes. The actual set of libraries to be installed on a cluster is the union of the libraries specified via this method and the libraries set to be installed on all clusters via the libraries UI.

Installing a wheel library on clusters running Databricks Runtime 4.2 or higher is equivalent to running pip against the wheel file directly on the driver and the executors. All dependencies specified in the library's setup.py file are installed, and this requires the library name to satisfy the wheel file name convention. Installation on the executors happens only when a new task is launched, and the installation order is nondeterministic when a single task launch triggers the installation of multiple wheel files. To get a deterministic installation order, create a zip file with the suffix .wheelhouse.zip that includes all the wheel files.
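
One way to bundle several wheel files into a .wheelhouse.zip archive is sketched below using Python's zipfile module. The helper name build_wheelhouse, the default archive name, and the choice to store the wheels at the root of the archive are assumptions; the placeholder files in the demo are not real wheels and exist only to show the archive layout.

```python
import tempfile
import zipfile
from pathlib import Path


def build_wheelhouse(wheel_paths, output_dir, name="wheel-libraries"):
    """Zip the given .whl files into <name>.wheelhouse.zip and return the archive path."""
    output = Path(output_dir) / f"{name}.wheelhouse.zip"
    with zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as archive:
        for wheel in wheel_paths:
            wheel = Path(wheel)
            if wheel.suffix != ".whl":
                raise ValueError(f"not a wheel file: {wheel}")
            # Assumption: wheels are stored flat, under their own file names.
            archive.write(wheel, arcname=wheel.name)
    return output


# Demo with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    wheels = []
    for filename in ("pkg_a-1.0-py3-none-any.whl", "pkg_b-2.0-py3-none-any.whl"):
        path = Path(tmp) / filename
        path.write_bytes(b"placeholder")
        wheels.append(path)
    archive_path = build_wheelhouse(wheels, tmp)
    archive_file_name = archive_path.name
    with zipfile.ZipFile(archive_path) as archive:
        archive_names = archive.namelist()

print(archive_file_name, archive_names)
```

The resulting archive would then be uploaded to DBFS or S3 and referenced through the whl field of the install request.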

Installing a wheel library on clusters running Databricks Runtime lower than 4.2 adds the file to the PYTHONPATH variable, without installing the dependencies.

CRAN libraries can be installed only on clusters running Databricks Runtime 3.2 and above.

Important

A library installed using the API does not appear in the cluster UI.

An example request:

{
  "cluster_id": "10201-my-cluster",
  "libraries": [
    {
      "jar": "dbfs:/mnt/libraries/library.jar"
    },
    {
      "egg": "dbfs:/mnt/libraries/library.egg"
    },
    {
      "whl": "dbfs:/mnt/libraries/mlflow-0.0.1.dev0-py2-none-any.whl"
    },
    {
      "whl": "dbfs:/mnt/libraries/wheel-libraries.wheelhouse.zip"
    },
    {
      "maven": {
        "coordinates": "org.jsoup:jsoup:1.7.2",
        "exclusions": ["slf4j:slf4j"]
      }
    },
    {
      "pypi": {
        "package": "simplejson",
        "repo": "http://my-pypi-mirror.com"
      }
    },
    {
      "cran": {
        "package": "ada",
        "repo": "http://cran.us.r-project.org"
      }
    }
  ]
}

Request Structure

Field Name Type Description
cluster_id STRING Unique identifier for the cluster on which to install these libraries. This field is required.
libraries An array of Library The libraries to install.
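
A hedged sketch of constructing the install request with Python's standard library. The host, the /api/ path prefix, and bearer-token authentication are assumptions about the deployment, and build_install_request is a hypothetical helper. The request is only built here, not sent.

```python
import json
import urllib.request


def build_install_request(host: str, token: str, cluster_id: str, libraries: list):
    """Build (but do not send) a POST request for 2.0/libraries/install."""
    body = json.dumps({"cluster_id": cluster_id, "libraries": libraries}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/libraries/install",  # /api/ prefix is assumed
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # assumed authentication scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


install_request = build_install_request(
    host="https://example.cloud.databricks.com",  # hypothetical host
    token="my-token",  # hypothetical token
    cluster_id="10201-my-cluster",
    libraries=[
        {"jar": "dbfs:/mnt/libraries/library.jar"},
        {"pypi": {"package": "simplejson"}},
    ],
)
print(install_request.get_method(), install_request.full_url)
# To send the request: urllib.request.urlopen(install_request)
```

Because installation is asynchronous, a successful response to this request does not mean the libraries are ready; the cluster-status endpoint reports progress.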

Uninstall

Endpoint HTTP Method
2.0/libraries/uninstall POST

Set libraries to be uninstalled on a cluster. The libraries aren’t uninstalled until the cluster is restarted. Uninstalling libraries that are not installed on the cluster has no effect and is not an error.

An example request:

{
  "cluster_id": "10201-my-cluster",
  "libraries": [
    {
      "jar": "dbfs:/mnt/libraries/library.jar"
    },
    {
      "cran": {
        "package": "ada"
      }
    }
  ]
}

Request Structure

Field Name Type Description
cluster_id STRING Unique identifier for the cluster on which to uninstall these libraries. This field is required.
libraries An array of Library The libraries to uninstall.
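
Since uninstallation takes effect only after a restart, a client may want to check which libraries are still waiting for one. The sketch below does this against a cluster-status style response. The helper names and the sample data are illustrative assumptions.

```python
def libraries_pending_removal(status_response: dict) -> list:
    """Return the libraries that are marked UNINSTALL_ON_RESTART on the cluster."""
    return [
        entry["library"]
        for entry in status_response.get("library_statuses", [])
        if entry.get("status") == "UNINSTALL_ON_RESTART"
    ]


def needs_restart(status_response: dict) -> bool:
    """True if at least one library will only be removed by restarting the cluster."""
    return bool(libraries_pending_removal(status_response))


# Hypothetical status response after an uninstall request.
sample_after_uninstall = {
    "cluster_id": "10201-my-cluster",
    "library_statuses": [
        {"library": {"jar": "dbfs:/mnt/libraries/library.jar"}, "status": "UNINSTALL_ON_RESTART"},
        {"library": {"pypi": {"package": "simplejson"}}, "status": "INSTALLED"},
    ],
}

print(libraries_pending_removal(sample_after_uninstall))
print(needs_restart(sample_after_uninstall))
```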

Data Structures

ClusterLibraryStatuses

Field Name Type Description
cluster_id STRING Unique identifier for the cluster.
library_statuses An array of LibraryFullStatus Status of all libraries on the cluster.

Library

Field Name Type Description
jar OR egg OR whl OR pypi OR maven OR cran STRING OR STRING OR STRING OR PythonPyPiLibrary OR MavenLibrary OR RCranLibrary

If jar, URI of the jar to be installed. DBFS and S3 URIs are supported. For example: { "jar": "dbfs:/mnt/databricks/library.jar" } or { "jar": "s3://my-bucket/library.jar" }. If S3 is used, make sure the cluster has read access on the library. You may need to launch the cluster with an IAM role to access the S3 URI.

If egg, URI of the egg to be installed. DBFS and S3 URIs are supported. For example: { "egg": "dbfs:/my/egg" } or { "egg": "s3://my-bucket/egg" }. If S3 is used, make sure the cluster has read access on the library. You may need to launch the cluster with an IAM role to access the S3 URI.

If whl, URI of the wheel or zipped wheels to be installed. DBFS and S3 URIs are supported. For example: { "whl": "dbfs:/my/whl" } or { "whl": "s3://my-bucket/whl" }. If S3 is used, make sure the cluster has read access on the library. You may need to launch the cluster with an IAM role to access the S3 URI. Also the wheel file name needs to use the correct convention. If zipped wheels are to be installed, the file name suffix should be .wheelhouse.zip.

If pypi, specification of a PyPI library to be installed. For example: { "package": "simplejson" }

If maven, specification of a Maven library to be installed. For example: { "coordinates": "org.jsoup:jsoup:1.7.2" }

If cran, specification of a CRAN library to be installed.
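
The one-of rule for Library can be checked on the client side before sending a request. This is a minimal sketch, assuming that exactly one of the six keys may be present and that the required fields are those listed for MavenLibrary, PythonPyPiLibrary, and RCranLibrary. The function validate_library is hypothetical and not part of the API.

```python
LIBRARY_KEYS = ("jar", "egg", "whl", "pypi", "maven", "cran")
URI_KEYS = {"jar", "egg", "whl"}


def validate_library(library: dict) -> str:
    """Return the library kind, or raise ValueError if the spec is malformed."""
    present = [key for key in LIBRARY_KEYS if key in library]
    if len(present) != 1 or len(library) != 1:
        raise ValueError(f"expected exactly one of {LIBRARY_KEYS}, got {sorted(library)}")
    kind = present[0]
    value = library[kind]
    if kind in URI_KEYS:
        # jar, egg, and whl take a URI string.
        if not isinstance(value, str):
            raise ValueError(f"{kind} must be a URI string")
    elif kind == "maven":
        if not isinstance(value, dict) or "coordinates" not in value:
            raise ValueError("maven requires an object with 'coordinates'")
    else:
        # pypi and cran both require an object with a 'package' field.
        if not isinstance(value, dict) or "package" not in value:
            raise ValueError(f"{kind} requires an object with 'package'")
    return kind


print(validate_library({"jar": "dbfs:/mnt/databricks/library.jar"}))
print(validate_library({"maven": {"coordinates": "org.jsoup:jsoup:1.7.2"}}))
```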

LibraryFullStatus

The status of the library on a specific cluster.

Field Name Type Description
library Library Unique identifier for the library.
status LibraryInstallStatus Status of installing the library on the cluster.
messages An array of STRING All the info and warning messages that have occurred so far for this library.
is_library_for_all_clusters BOOL Whether the library was set to be installed on all clusters via the libraries UI.

MavenLibrary

Field Name Type Description
coordinates STRING Gradle-style Maven coordinates. For example: org.jsoup:jsoup:1.7.2. This field is required.
repo STRING Maven repo to install the Maven package from. If omitted, both Maven Central Repository and Spark Packages are searched.
exclusions An array of STRING List of dependencies to exclude. For example: ["slf4j:slf4j", "*:hadoop-client"]. Maven dependency exclusions: https://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html.

PythonPyPiLibrary

Field Name Type Description
package STRING The name of the PyPI package to install. An optional exact version specification is also supported. Examples: simplejson and simplejson==3.8.0. This field is required.
repo STRING The repository where the package can be found. If not specified, the default pip index is used.

RCranLibrary

Field Name Type Description
package STRING The name of the CRAN package to install. This field is required.
repo STRING The repository where the package can be found. If not specified, the default CRAN repo is used.

LibraryInstallStatus

The status of a library on a specific cluster.

Status Description
PENDING No action has yet been taken to install the library. This state should be very short lived.
RESOLVING Metadata necessary to install the library is being retrieved from the provided repository. For JAR and egg libraries, this step is a no-op.

INSTALLING The library is actively being installed, either by adding resources to Spark or executing system commands inside the Spark nodes.
INSTALLED The library has been successfully installed and can now be used.
FAILED Some step in installation failed. More information can be found in the messages field.
UNINSTALL_ON_RESTART The library has been marked for removal. Libraries can be removed only when clusters are restarted, so libraries that enter this state will remain until the cluster is restarted.
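
The statuses above can be modeled as an enumeration, which makes it easier to decide when to stop polling. Treating PENDING, RESOLVING, and INSTALLING as "in progress" and everything else as "settled" is an interpretation of the descriptions above, not something the API defines. The name is_settled is hypothetical.

```python
from enum import Enum


class LibraryInstallStatus(str, Enum):
    PENDING = "PENDING"
    RESOLVING = "RESOLVING"
    INSTALLING = "INSTALLING"
    INSTALLED = "INSTALLED"
    FAILED = "FAILED"
    UNINSTALL_ON_RESTART = "UNINSTALL_ON_RESTART"


# Assumption: these are the states in which installation work is still ongoing.
IN_PROGRESS = frozenset(
    {
        LibraryInstallStatus.PENDING,
        LibraryInstallStatus.RESOLVING,
        LibraryInstallStatus.INSTALLING,
    }
)


def is_settled(library_statuses: list) -> bool:
    """True when no library in the list is still being resolved or installed."""
    return all(
        LibraryInstallStatus(entry["status"]) not in IN_PROGRESS
        for entry in library_statuses
    )


print(is_settled([{"status": "INSTALLED"}, {"status": "FAILED"}]))
print(is_settled([{"status": "INSTALLED"}, {"status": "INSTALLING"}]))
```

An unknown status string raises ValueError when converted to the enumeration, so a client would notice a value outside this list rather than silently ignoring it.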