Production job scheduling cheat sheet

This article provides clear, opinionated guidance for scheduling production jobs. Following these best practices can help reduce costs, improve performance, and tighten security. Brief configuration sketches illustrating several of the practices follow the table.

| Best practice | Impact | Docs |
| --- | --- | --- |
| Use job clusters for automated workflows | Cost: Job clusters are billed at lower rates than all-purpose (interactive) clusters. | Create a cluster; All-purpose and job clusters |
| Restart long-running clusters | Security: Restarting a cluster picks up the latest Databricks Runtime patches and bug fixes. | Restart a cluster to update it with the latest images |
| Use service principals instead of user accounts to run production jobs | Security: Jobs owned by individual users can stop running when those users leave the organization. | Manage service principals |
| Use Databricks Jobs for orchestration whenever possible | Cost: There’s no need for an external orchestration tool when all of your workloads run on Databricks. | Overview of orchestration on Databricks |
| Use the latest LTS version of Databricks Runtime | Performance and cost: Databricks continually improves Databricks Runtime for usability, performance, and security. | Compute; Databricks support lifecycles |
| Don’t store production data in the DBFS root | Security: Data stored in the DBFS root is accessible to all workspace users. | What is DBFS?; Recommendations for working with DBFS root |
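For example, the following sketch creates a job that provisions its own job cluster pinned to an LTS Databricks Runtime, calling the Jobs API 2.1 with the `requests` library. The workspace URL, token environment variable, notebook path, node type, and runtime version string are illustrative placeholders; check the Databricks Runtime release notes for the current LTS version.

```python
import os
import requests

# Placeholders: substitute your own workspace URL and token.
HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = os.environ["DATABRICKS_TOKEN"]

# A job that provisions its own job cluster (billed at job-compute rates)
# instead of attaching to an all-purpose cluster, pinned to an LTS runtime.
job_spec = {
    "name": "nightly-etl",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Repos/prod/etl/ingest"},
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",  # example LTS string; verify the current LTS
                "node_type_id": "i3.xlarge",          # example AWS node type
                "num_workers": 2,
            },
        }
    ],
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```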
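For all-purpose clusters that must stay up, a scheduled restart is enough to pick up the latest patched Databricks Runtime images. A minimal sketch against the Clusters API; the workspace URL and cluster ID are placeholders.

```python
import os
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = os.environ["DATABRICKS_TOKEN"]
CLUSTER_ID = "0601-182128-dcbte59m"  # placeholder cluster ID

# Restarting applies the latest patched images for the cluster's
# configured Databricks Runtime version.
resp = requests.post(
    f"{HOST}/api/2.0/clusters/restart",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"cluster_id": CLUSTER_ID},
)
resp.raise_for_status()
```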
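To move job ownership away from individual user accounts, you can set a job's run-as identity to a service principal's application ID. The sketch below assumes the service principal already exists, has access to the job's resources, and that the Jobs API update call accepts the run_as field; the job ID and application ID are placeholders.

```python
import os
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = os.environ["DATABRICKS_TOKEN"]

JOB_ID = 123456789                                          # placeholder job ID
SP_APPLICATION_ID = "6b29a702-0c6a-4f0e-89a1-000000000000"  # placeholder application ID

# Partially update the job so it runs as the service principal
# instead of the user who created it.
resp = requests.post(
    f"{HOST}/api/2.1/jobs/update",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "job_id": JOB_ID,
        "new_settings": {
            "run_as": {"service_principal_name": SP_APPLICATION_ID}
        },
    },
)
resp.raise_for_status()
```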
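Databricks Jobs can express task dependencies directly, so pipelines that run entirely on Databricks usually don't need an external orchestrator. The payload below sketches a two-task job in which `transform` waits for `ingest` to succeed; notebook paths and cluster settings are illustrative placeholders, and you would POST it to /api/2.1/jobs/create like any other job definition.

```python
# A two-task job: "transform" runs only after "ingest" succeeds.
orchestrated_job = {
    "name": "daily-pipeline",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Repos/prod/pipeline/ingest"},
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",  # example LTS string
                "node_type_id": "i3.xlarge",          # example node type
                "num_workers": 2,
            },
        },
        {
            "task_key": "transform",
            "depends_on": [{"task_key": "ingest"}],
            "notebook_task": {"notebook_path": "/Repos/prod/pipeline/transform"},
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
        },
    ],
}
```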
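Finally, production data belongs in governed storage, such as a Unity Catalog table or an external location, rather than under the DBFS root. A PySpark sketch of the pattern; the source table and the three-level target name are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.table("samples.nyctaxi.trips")  # placeholder source table

# Avoid: writing to the DBFS root, which every workspace user can read.
# df.write.format("delta").save("dbfs:/prod/trips")

# Prefer: a governed table (placeholder three-level Unity Catalog name)
# where access is controlled with grants.
df.write.mode("overwrite").saveAsTable("main.prod.trips")
```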