HIPAA-Compliant Deployment

Databricks supports HIPAA-compliant deployments for processing PHI data. In this deployment mode, all PHI data is encrypted at rest and in transit. In addition to following the instructions in this guide, you should also sign a Business Associate Agreement (BAA) with AWS to maintain compliance with HIPAA regulations.

Warning

High-performance distributed machine learning (ML) packages such as Horovod use MPI (Message Passing Interface) and other low-level communication protocols. Because these protocols do not natively support encryption over the wire, these ML packages can potentially send unencrypted sensitive data across the network, which may not be HIPAA-compliant.

What are the risks?

Messages sent across the network by these ML packages are typically either ML model parameters or summary statistics about training data, so sensitive data such as protected health information is not normally sent over the wire unencrypted. However, certain configurations or uses of these packages (such as specific model designs) could result in messages containing such information being sent across the network.

Which libraries are affected?

Affected libraries include distributed ML packages that rely on MPI and similar unencrypted low-level protocols, such as Horovod. These libraries are included in Databricks Runtime for Machine Learning.

HIPAA-compliant deployment

To set up a HIPAA-compliant deployment, contact your account manager or email sales@databricks.com.

Create a HIPAA-compliant cluster

The HIPAA-compliant deployment ensures that all data flowing through Databricks services meets HIPAA regulations. When you process data with your own clusters, you must still make sure that the clusters are set up to comply with HIPAA regulations as well.

Follow these steps to create a HIPAA-compliant cluster for processing PHI data.

Step 1: Spark Version

You must choose Spark version 2.1.0-db3 or above, as only these versions support encryption on the wire, which HIPAA requires.

[Image: ../../_images/create-cluster.png]
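
To confirm what a running cluster actually reports, you can also check the Spark version from an attached notebook. A minimal sketch; spark.version is the standard Spark 2.x API, and the exact string returned may vary by deployment:

%scala
// Print the Spark version reported by the running cluster.
// On a HIPAA-compliant cluster this should be 2.1.0 or above.
println(spark.version)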

Step 2: EBS Volumes

You must provision EBS volumes for your cluster, because the EBS volumes Databricks provisions are encrypted, while the default local storage provided by Amazon is not.

[Image: ../../_images/ebs-volumes.png]

Step 3: Create Cluster

After setting any other desired parameters for your cluster, click the Create Cluster button.
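
If you prefer to automate this setup rather than use the UI, the same cluster can be created through the Databricks Clusters API (POST /api/2.0/clusters/create). The sketch below is illustrative only: the endpoint and the aws_attributes EBS fields are part of the Clusters API, but the workspace URL, token, spark_version key, and node_type_id are placeholder assumptions that you must replace with values valid for your deployment.

import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets

// Request body: note the Spark version from Step 1 and the encrypted
// EBS volumes from Step 2. The spark_version key and node_type_id below
// are assumptions; check the valid values for your workspace first.
val payload = """
{
  "cluster_name": "hipaa-phi-cluster",
  "spark_version": "2.1.0-db3-scala2.10",
  "node_type_id": "r3.xlarge",
  "num_workers": 2,
  "aws_attributes": {
    "ebs_volume_type": "GENERAL_PURPOSE_SSD",
    "ebs_volume_count": 1,
    "ebs_volume_size": 100
  }
}
"""

val token = "<personal-access-token>"  // placeholder: use your own API token
val conn = new URL("https://<databricks-instance>/api/2.0/clusters/create")
  .openConnection().asInstanceOf[HttpURLConnection]
conn.setRequestMethod("POST")
conn.setRequestProperty("Authorization", s"Bearer $token")
conn.setRequestProperty("Content-Type", "application/json")
conn.setDoOutput(true)
conn.getOutputStream.write(payload.getBytes(StandardCharsets.UTF_8))
println(s"Cluster create returned HTTP ${conn.getResponseCode}")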

Step 4: Verification

Create a notebook in the workspace and attach it to the cluster you created in the previous step.

Run the following command in the notebook:

%scala
spark.conf.get("spark.ssl.enabled")

If the returned value is true, you have successfully created a cluster with encryption turned on. If not, contact support@databricks.com.
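
If you prefer the verification to fail loudly instead of checking the printed value by eye, you can wrap the same conf lookup in an assertion. A minimal sketch using Spark's RuntimeConfig API (spark.conf.getOption), reading the same key as above:

%scala
// Read the SSL flag, defaulting to "false" when the key is unset.
val sslEnabled = spark.conf.getOption("spark.ssl.enabled").getOrElse("false")
// Fail the cell immediately if encryption on the wire is not enabled.
assert(sslEnabled == "true",
  "spark.ssl.enabled is not true: this cluster does not encrypt data on the wire")
println(s"spark.ssl.enabled = $sslEnabled")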