Platform administration cheat sheet

This article provides clear, opinionated guidance for account and workspace admins. Implement the following practices to help optimize cost, observability, data governance, and security in your Databricks account.

For in-depth security best practices, see this PDF: Databricks AWS Security Best Practices and Threat Model.

Each entry lists the best practice, its impact, and the related docs.

Enable Unity Catalog
Data governance: Unity Catalog provides centralized access control, auditing, lineage, and data discovery capabilities across Databricks workspaces.
- Set up and manage Unity Catalog

Apply usage tags
Observability: Maintain a discrete mapping of usage to relevant categories. Assign and enforce tags for your organization’s business units, specific projects, and any other users or groups.
- Usage dashboards

Use cluster policies
Cost: Control costs with auto-termination (for all-purpose clusters), maximum cluster sizes, and instance type restrictions.

Observability: Set custom_tags in your cluster policy to enforce tagging.

Security: Restrict the cluster access mode so that users can only create Unity Catalog-enabled clusters, which enforces data permissions.
- Create and manage cluster policies
- Monitor cluster usage with tags

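
The three impacts above can be combined in a single policy definition. The following is a minimal sketch, assuming the JSON policy definition format used by Databricks cluster policies; the tag key `team`, its value, the instance types, and the numeric limits are hypothetical placeholders, not recommendations.

```python
import json

# Hypothetical policy definition; every concrete value is a placeholder.
policy_definition = {
    # Cost: force auto-termination after 30 idle minutes and hide the field.
    "autotermination_minutes": {"type": "fixed", "value": 30, "hidden": True},
    # Cost: cap the autoscaling ceiling.
    "autoscale.max_workers": {"type": "range", "maxValue": 8},
    # Cost: restrict which instance types users may pick.
    "node_type_id": {"type": "allowlist", "values": ["m5.xlarge", "m5.2xlarge"]},
    # Observability: enforce a custom tag on every cluster created from the policy.
    "custom_tags.team": {"type": "fixed", "value": "data-platform"},
    # Security: allow only access modes that work with Unity Catalog.
    "data_security_mode": {
        "type": "allowlist",
        "values": ["SINGLE_USER", "USER_ISOLATION"],
    },
}

# Serialize to the JSON text expected in a policy definition field.
policy_json = json.dumps(policy_definition, indent=2)
print(policy_json)
```

The resulting JSON could be pasted into the policy editor or sent through the cluster policies API; check the attribute names against the current policy reference before relying on them.
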
Use service principals to connect to third-party software
Security: A service principal is a Databricks identity type that allows third-party services to authenticate directly to Databricks, not through an individual user’s credentials.

If something happens to an individual user’s credentials, the third-party service won’t be interrupted.
- Create and manage service principals

Set up SSO
Security: Instead of having users type their email to log into a workspace, set up Databricks SSO so users can authenticate via your identity provider.
- Set up SSO for your workspace

Set up SCIM integration
Security: Instead of adding users to Databricks manually, integrate with your identity provider to automate user provisioning and deprovisioning. When a user is removed from the identity provider, they are automatically removed from Databricks too.
- Sync users and groups from your identity provider

Manage access control with account-level groups
Data governance: Create account-level groups so you can bulk control access to workspaces, resources, and data. This saves you from having to grant all users access to everything or grant individual users specific permissions.

You can also sync groups from your identity provider to Databricks groups.
- Manage groups
- Control access to resources
- Sync groups from your IdP to Databricks
- Data governance guide

Set up IP access for IP whitelisting
Security: IP access lists prevent users from accessing Databricks resources from unsecured networks. Accessing a cloud service from an unsecured network can pose security risks to an enterprise, especially when the user may have authorized access to sensitive or personal data.

Make sure to set up IP access lists for your account console and workspaces.
- Create IP access lists for workspaces
- Create IP access lists for the account console

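
As an illustration of what an allow list might contain, the sketch below validates CIDR blocks locally and builds a request body. It assumes the workspace IP access lists REST API (`POST /api/2.0/ip-access-lists`) with `label`, `list_type`, and `ip_addresses` fields; the helper name `build_ip_access_list`, the label, and the addresses (taken from documentation-reserved ranges) are hypothetical.

```python
import ipaddress
import json


def build_ip_access_list(label, cidrs, list_type="ALLOW"):
    """Validate addresses and return a request body for an IP access list."""
    if list_type not in ("ALLOW", "BLOCK"):
        raise ValueError(f"unsupported list_type: {list_type}")
    # ip_network raises ValueError on malformed input, so bad entries fail early.
    normalized = [str(ipaddress.ip_network(c, strict=False)) for c in cidrs]
    return {"label": label, "list_type": list_type, "ip_addresses": normalized}


# Hypothetical office range plus a single VPN egress address.
payload = build_ip_access_list("corp-vpn", ["203.0.113.0/24", "198.51.100.7"])
print(json.dumps(payload, indent=2))
```

Validating before sending helps avoid locking users out with a mistyped range; the account console has its own access lists, which need to be configured separately.
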
Configure a customer-managed VPC with regional endpoints
Security: You can use a customer-managed VPC to exercise more control over your network configurations to comply with specific cloud security and governance standards your organization might require.

Cost: Regional VPC endpoints to AWS services provide more direct connections at reduced cost compared to AWS global endpoints.
- Customer-managed VPC

Use Databricks Secrets or a cloud provider secrets manager
Security: Using Databricks secrets allows you to securely store credentials for external data sources. Instead of entering credentials directly into a notebook, you can simply reference a secret to authenticate to a data source.
- Manage Databricks secrets

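
Referencing a secret from a notebook could look roughly like the sketch below. It relies on `dbutils.secrets.get`, which is available inside Databricks notebooks; the scope name `jdbc-creds`, the key names, and the helper `jdbc_options` are hypothetical and assume the secrets were created beforehand.

```python
def jdbc_options(dbutils, scope="jdbc-creds"):
    """Build connection options from secrets instead of hard-coded credentials.

    `dbutils` is the object Databricks injects into notebooks; it is passed in
    explicitly so that the function can also be exercised with a stand-in.
    """
    return {
        # Secret values are fetched at run time and never appear in the source.
        "user": dbutils.secrets.get(scope=scope, key="username"),
        "password": dbutils.secrets.get(scope=scope, key="password"),
    }
```

In a notebook, `jdbc_options(dbutils)` would return a dictionary that can be passed to a reader's options, so the notebook source contains only the scope and key names.
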
Set expiration dates on personal access tokens (PATs)
Security: Workspace admins can manage PATs for users, groups, and service principals. Setting expiration dates for PATs reduces the risk of lost tokens or long-lasting tokens that could lead to data exfiltration from the workspace.
- Manage personal access tokens

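
To illustrate an explicit expiration, the sketch below builds (but does not send) a token creation request. It assumes the Token API endpoint `POST /api/2.0/token/create` with its `lifetime_seconds` and `comment` fields; the helper name `build_token_request`, the host, the 30-day lifetime, and the comment are hypothetical.

```python
import json
import urllib.request


def build_token_request(host, token, lifetime_days=30, comment="ci-pipeline"):
    """Return a request that creates a PAT expiring after `lifetime_days`."""
    body = {
        # The lifetime is expressed in seconds.
        "lifetime_seconds": lifetime_days * 24 * 60 * 60,
        "comment": comment,
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/token/create",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder host and credential; nothing is sent over the network here.
req = build_token_request("https://example.cloud.databricks.com", "<existing-token>")
print(req.full_url, json.loads(req.data))
```

Passing `req` to `urllib.request.urlopen` would submit it. Workspace admins can additionally cap the maximum lifetime for new tokens so that users cannot opt out of expiration.
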
Use budget alerts to monitor usage
Observability: Monitor usage based on budgets important to your organization. Budget examples include: project, migration, BU, and quarterly or annual budgets.
- Create and monitor budgets

Use system tables to monitor account usage
Observability: System tables are a Databricks-hosted analytical store of your account’s operational data, including audit logs, data lineage, and billable usage. You can use system tables for observability across your account.
- Monitor usage with system tables
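
As one example of using system tables for observability, the sketch below builds a query that groups billable usage by a custom tag. It assumes the `system.billing.usage` table with its `usage_date`, `sku_name`, `usage_quantity`, and `custom_tags` columns; the helper name `usage_by_tag_query` and the tag key `team` are hypothetical and tie back to whatever tags your cluster policies enforce.

```python
def usage_by_tag_query(tag_key="team", days=30):
    """Return SQL that sums billable usage per day, SKU, and tag value."""
    # Restrict the tag key to simple characters so it is safe to interpolate.
    if not tag_key.replace("_", "").replace("-", "").isalnum():
        raise ValueError(f"unexpected tag key: {tag_key!r}")
    days = int(days)
    return f"""
SELECT
  usage_date,
  sku_name,
  custom_tags['{tag_key}'] AS tag_value,
  SUM(usage_quantity) AS total_usage
FROM system.billing.usage
WHERE usage_date >= date_sub(current_date(), {days})
GROUP BY usage_date, sku_name, custom_tags['{tag_key}']
ORDER BY usage_date
""".strip()


query = usage_by_tag_query()
print(query)
```

In a notebook with access to system tables, the string could be run with `spark.sql(query)`; rows with a null `tag_value` point to untagged usage worth following up on.
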