Genomics Guide

Databricks provides tools and frameworks that bridge computer science and bioinformatics, designed for end-to-end analysis of genomics data at terabyte to petabyte scales.

Beta

The Databricks genomics applications require Databricks Runtime HLS, which is in Beta. Sign up for access.

ETL

Once variants have been detected in individuals, append the variant data to Delta Lake tables. Then run aggregate analyses over those tables to calculate cohort-level quality control metrics and summary statistics.
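As an illustration of what a cohort-level aggregate looks like, the following minimal sketch computes a per-variant call rate and alternate allele frequency in plain Python. It is not the Databricks or Delta Lake API. The record layout, the sample data, and the `cohort_summary` function are hypothetical, and the calculation assumes diploid genotypes encoded as alternate allele counts. At scale, the same grouping would typically be expressed as an aggregation over a table rather than an in-memory loop.

```python
from collections import defaultdict

# Hypothetical per-sample records: (sample_id, variant_id, genotype).
# genotype is the number of alternate alleles (0, 1, or 2) for a diploid
# sample, or None when no call was made.
records = [
    ("s1", "chr1:1000:A:G", 0),
    ("s2", "chr1:1000:A:G", 1),
    ("s3", "chr1:1000:A:G", 2),
    ("s4", "chr1:1000:A:G", None),
    ("s1", "chr2:5000:C:T", 0),
    ("s2", "chr2:5000:C:T", 0),
    ("s3", "chr2:5000:C:T", 1),
    ("s4", "chr2:5000:C:T", 0),
]


def cohort_summary(records):
    """Aggregate per-sample genotypes into per-variant cohort statistics."""
    totals = defaultdict(lambda: {"n_samples": 0, "n_called": 0, "alt_alleles": 0})
    for _sample_id, variant_id, genotype in records:
        stats = totals[variant_id]
        stats["n_samples"] += 1
        if genotype is not None:
            stats["n_called"] += 1
            stats["alt_alleles"] += genotype

    summary = {}
    for variant_id, stats in totals.items():
        # Call rate: fraction of samples with a non-missing genotype.
        call_rate = stats["n_called"] / stats["n_samples"]
        # Alternate allele frequency over called diploid genotypes only.
        if stats["n_called"]:
            alt_af = stats["alt_alleles"] / (2 * stats["n_called"])
        else:
            alt_af = None
        summary[variant_id] = {
            "call_rate": call_rate,
            "alt_allele_frequency": alt_af,
        }
    return summary


summary = cohort_summary(records)
for variant_id, stats in summary.items():
    print(variant_id, stats)
```

Variants with a low call rate are common candidates for filtering during quality control; the threshold to apply depends on the study and is not specified here.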

Machine Learning

Databricks provides an environment for building, training, and deploying machine learning and deep learning models at scale. To learn more and find example notebooks, see the Machine Learning, Deep Learning, and MLflow guides.

This section provides examples of machine learning algorithms applied to genomics data.