Model training examples

This section includes examples showing how to train machine learning models on Databricks using many popular open-source libraries.

You can also use AutoML, which automatically prepares a dataset for model training and performs a set of trials using open-source libraries such as scikit-learn and XGBoost. AutoML creates a Python notebook with the source code for each trial run so you can review, reproduce, and modify the code.

Machine learning examples

Package | Notebook(s) | Features
scikit-learn | Machine learning tutorial | Unity Catalog, classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow
scikit-learn | End-to-end example | Unity Catalog, classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow, XGBoost
MLlib | MLlib examples | Binary classification, decision trees, GBT regression, Structured Streaming, custom transformer
xgboost | XGBoost examples | Python, PySpark, and Scala, single node workloads and distributed training

Hyperparameter tuning examples

For general information about hyperparameter tuning in Databricks, see Hyperparameter tuning.

Package | Notebook | Features
Optuna | Get started with Optuna | Optuna, distributed Optuna, scikit-learn, MLflow
Hyperopt | Distributed hyperopt | Distributed hyperopt, scikit-learn, MLflow
Hyperopt | Compare models | Use distributed hyperopt to search hyperparameter space for different model types simultaneously
Hyperopt | Distributed training algorithms and hyperopt | Hyperopt, MLlib
Hyperopt | Hyperopt best practices | Best practices for datasets of different sizes
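
For readers new to hyperparameter tuning, the core loop that the notebooks above automate can be sketched in plain Python. This is a minimal illustration only: it uses simple random search, not the Hyperopt or Optuna APIs, and the objective function, parameter names (`learning_rate`, `max_depth`), and ranges are hypothetical stand-ins for a real model's validation loss. The libraries listed above provide more sophisticated search strategies, and the notebooks cover distributed execution and MLflow tracking.

```python
import random


def objective(params):
    """Toy loss standing in for a model's validation loss (hypothetical).

    Its minimum is at learning_rate = 0.1 and max_depth = 5.
    """
    lr_term = (params["learning_rate"] - 0.1) ** 2
    depth_term = 0.01 * (params["max_depth"] - 5) ** 2
    return lr_term + depth_term


def sample_params(rng):
    """Draw one candidate from the (assumed) search space."""
    return {
        "learning_rate": rng.uniform(0.001, 0.5),
        "max_depth": rng.randint(2, 10),  # inclusive on both ends
    }


def random_search(objective_fn, n_trials, seed=0):
    """Evaluate n_trials random candidates and return the best one."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    trials = []
    for _ in range(n_trials):
        params = sample_params(rng)
        trials.append((objective_fn(params), params))
    # Compare on the loss only, so the parameter dicts are never compared.
    best_loss, best_params = min(trials, key=lambda t: t[0])
    return best_loss, best_params, trials


best_loss, best_params, trials = random_search(objective, n_trials=50)
print("best parameters:", best_params)
print("best loss:", best_loss)
```

In a real workflow, `objective` would train a model with the sampled parameters and return a validation metric, and each trial would typically be logged as a separate run so results can be compared afterward.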