Databricks Runtime for Machine Learning (Databricks Runtime ML) provides a ready-to-go environment for machine learning and data science. It contains multiple popular libraries, including TensorFlow, Keras, and XGBoost. It also supports distributed TensorFlow training using Horovod.
Databricks Runtime ML lets you start a Databricks cluster with all of the libraries required for distributed TensorFlow training. It ensures that the libraries included on the cluster are compatible with one another (for example, TensorFlow with CUDA and cuDNN), and it substantially decreases cluster start-up time compared with installing the libraries through init scripts.
Databricks Runtime ML is built on Databricks Runtime. For example, Databricks Runtime 5.0 ML Beta is built on Databricks Runtime 5.0. The libraries included in the base Databricks Runtime are listed in the Databricks Runtime Release Notes.
Databricks Runtime ML includes additional libraries to support machine learning. See the following topics for an up-to-date list of libraries for the available runtimes:
Databricks Runtime ML includes high-performance distributed machine learning packages that use MPI (Message Passing Interface) and other low-level communication protocols. Because these protocols do not natively support encryption over the wire, these ML packages can potentially send unencrypted sensitive data across the network.
What are the risks?
Messages sent across the network by these ML packages are typically either ML model parameters or summary statistics about training data. In typical use, sensitive data such as protected health information is therefore not expected to be sent over the wire unencrypted. However, certain configurations or uses of these packages (such as specific model designs) could result in messages that contain such information being sent across the network.
Which libraries are affected?
When you create a cluster, select a Databricks Runtime ML version from the Databricks Runtime Version drop-down. Both CPU and GPU-enabled ML runtimes are available.
If you select a GPU-enabled ML runtime, you are prompted to select a compatible Driver Type and Worker Type. Incompatible instance types are grayed out in the drop-downs. GPU-enabled instance types are listed under the GPU-Accelerated label.
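The same choice of runtime and instance types can also be expressed programmatically. The following Python sketch filters a list of runtime versions for ML runtime keys and then builds a cluster specification. It makes several assumptions. The input is shaped like a Clusters API `spark-versions` response (`{"versions": [{"key": ..., "name": ...}]}`). ML runtime keys contain `-ml-`, and GPU-enabled ML keys contain `-gpu-ml-`. The sample keys and the `p2.xlarge` node type are hypothetical example values and do not describe what your workspace offers. No request is sent. The function only returns a dictionary that uses the `cluster_name`, `spark_version`, `node_type_id`, `driver_node_type_id`, and `num_workers` fields of a cluster create request.

```python
from typing import Dict, List, Optional

# Hypothetical sample shaped like a Clusters API spark-versions response.
SAMPLE_VERSIONS = {
    "versions": [
        {"key": "5.0.x-scala2.11", "name": "5.0"},
        {"key": "5.0.x-ml-scala2.11", "name": "5.0 ML Beta"},
        {"key": "5.0.x-gpu-ml-scala2.11", "name": "5.0 ML Beta (GPU)"},
    ]
}


def ml_runtime_keys(spark_versions: Dict, gpu: bool) -> List[str]:
    """Return ML runtime keys, either GPU-enabled or CPU-only.

    Assumes ML keys contain "-ml-" and GPU ML keys contain "-gpu-ml-".
    """
    keys = [v["key"] for v in spark_versions.get("versions", [])]
    if gpu:
        return sorted(k for k in keys if "-gpu-ml-" in k)
    # "-gpu-ml-" also contains "-ml-", so exclude GPU keys explicitly.
    return sorted(k for k in keys if "-ml-" in k and "-gpu-ml-" not in k)


def cluster_spec(
    name: str,
    spark_version: str,
    node_type_id: str,
    num_workers: int,
    driver_node_type_id: Optional[str] = None,
) -> Dict:
    """Build a cluster create payload as a plain dictionary."""
    spec = {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,  # worker instance type
        "num_workers": num_workers,
    }
    # Only include a driver type when a separate one is requested.
    if driver_node_type_id is not None:
        spec["driver_node_type_id"] = driver_node_type_id
    return spec


# Usage: pick the GPU-enabled ML runtime and a (hypothetical) GPU instance type.
gpu_version = ml_runtime_keys(SAMPLE_VERSIONS, gpu=True)[0]
spec = cluster_spec("ml-gpu-demo", gpu_version, "p2.xlarge", num_workers=2)
print(spec["spark_version"])  # 5.0.x-gpu-ml-scala2.11
```

For a GPU-enabled runtime, the node types you pass must be GPU instance types. This mirrors the UI, which grays out incompatible types.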
Libraries in your workspace that automatically attach to all clusters can conflict with the libraries included in Databricks Runtime ML. Before you create a cluster with Databricks Runtime ML, clear the Attach automatically to all clusters checkbox for conflicting libraries.
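One way to spot conflicting libraries before you create the cluster is to compare package names. The Python sketch below is a hypothetical helper, not a Databricks API. It assumes you already have two inputs: the names of the workspace libraries set to attach automatically, and the packages listed for the runtime in its release notes. It matches the two lists after PEP 503-style name normalization, which ignores case and treats `-`, `_`, and `.` alike. The package list here uses only the libraries named in this topic, and the workspace entries are made-up examples.

```python
import re
from typing import Iterable, List


def normalize(name: str) -> str:
    """Normalize a package name as PEP 503 does (case-insensitive; '-', '_', '.' equivalent)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def conflicting_libraries(
    auto_attached: Iterable[str], runtime_packages: Iterable[str]
) -> List[str]:
    """Return auto-attached entries whose package the runtime already includes.

    Entries may be bare names or pinned as "name==version".
    """
    shipped = {normalize(p) for p in runtime_packages}
    return [
        entry
        for entry in auto_attached
        if normalize(entry.split("==")[0].strip()) in shipped
    ]


# Hypothetical inputs: packages named in this topic, plus example workspace libraries.
RUNTIME_PACKAGES = ["tensorflow", "keras", "xgboost", "horovod"]
AUTO_ATTACHED = ["TensorFlow==1.10.0", "requests", "xgboost"]

print(conflicting_libraries(AUTO_ATTACHED, RUNTIME_PACKAGES))
# ['TensorFlow==1.10.0', 'xgboost']
```

The helper compares only top-level names. It does not detect conflicts introduced through transitive dependencies.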
By using this version of Databricks Runtime, you agree to the terms and conditions outlined in the NVIDIA End User License Agreement (EULA) with respect to the CUDA, cuDNN, and Tesla libraries, and the NVIDIA End User License Agreement (with NCCL Supplement) for the NCCL library.