Tutorial: Create your first custom Databricks Asset Bundle template

In this tutorial, you’ll create a custom Databricks Asset Bundle template for creating bundles that run a job with a specific Python task on a cluster using a specific Docker container image.

Before you start

Install the Databricks CLI version 0.218.0 or above. If you’ve already installed it, confirm the version is 0.218.0 or higher by running databricks -v from the command line.

Define user prompt variables

The first step in building a bundle template is to define the databricks bundle init user prompt variables. From the command line:

  1. Create an empty directory named dab-container-template:

    sh
    mkdir dab-container-template
  2. In the directory’s root, create a file named databricks_template_schema.json:

    sh
    cd dab-container-template
    touch databricks_template_schema.json
  3. Add the following contents to databricks_template_schema.json and save the file. Each variable is translated to a user prompt during bundle creation.

    JSON
    {
      "properties": {
        "project_name": {
          "type": "string",
          "default": "project_name",
          "description": "Project name",
          "order": 1
        }
      }
    }
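
To see how the schema drives the prompts, the following Python sketch loads the same schema and lists each property in its order. It only illustrates the relationship between properties and prompts and is not how the Databricks CLI is implemented; the helper name prompts_in_order is made up for this example.

```python
import json

# The same schema as databricks_template_schema.json, inlined for illustration.
SCHEMA_TEXT = """
{
  "properties": {
    "project_name": {
      "type": "string",
      "default": "project_name",
      "description": "Project name",
      "order": 1
    }
  }
}
"""


def prompts_in_order(schema_text):
    """Return (variable, description, default) tuples sorted by 'order'."""
    properties = json.loads(schema_text)["properties"]
    ordered = sorted(properties.items(), key=lambda item: item[1].get("order", 0))
    return [(name, spec["description"], spec.get("default")) for name, spec in ordered]


# Hypothetical rendering of each prompt next to the template variable it fills.
for name, description, default in prompts_in_order(SCHEMA_TEXT):
    print(description + " [" + str(default) + "] fills {{." + name + "}}")
```

Adding a second property with "order": 2 would place it after project_name in this listing, which mirrors the order in which the prompts are meant to appear.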

Create the bundle folder structure

Next, create a directory named template in the root of the template project, with subdirectories named resources and src. The template folder contains the directory structure for your generated bundles. Subdirectory and file names that are derived from user values use Go package template syntax, for example {{.project_name}}.

sh
  mkdir -p "template/resources"
  mkdir -p "template/src"
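
As a rough illustration of how a templated name resolves to a concrete path, the sketch below substitutes {{.name}} placeholders using a regular expression. It mimics only the simplest case of Go template syntax and is a stand-in for explanation, not the renderer the CLI uses; render_path and the value my_project are hypothetical.

```python
import re

# Matches simple Go-template field references such as {{.project_name}}.
PLACEHOLDER = re.compile(r"\{\{\s*\.(\w+)\s*\}\}")


def render_path(template_path, values):
    """Replace each {{.name}} placeholder with values[name]."""
    return PLACEHOLDER.sub(lambda match: str(values[match.group(1)]), template_path)


# Example value a user might enter at the project_name prompt.
values = {"project_name": "my_project"}
print(render_path("src/{{.project_name}}/task.py", values))
print(render_path("resources/{{.project_name}}_job.yml", values))
```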

Add YAML configuration templates

In the template directory, create a file named databricks.yml.tmpl and add the following YAML:

sh
  touch template/databricks.yml.tmpl

YAML
  # This is a Databricks asset bundle definition for {{.project_name}}.
  # See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
  bundle:
    name: {{.project_name}}

  include:
    - resources/*.yml

  targets:
    # The 'dev' target, used for development purposes.
    # Whenever a developer deploys using 'dev', they get their own copy.
    dev:
      # We use 'mode: development' to make sure everything deployed to this target gets a prefix
      # like '[dev my_user_name]'. Setting this mode also disables any schedules and
      # automatic triggers for jobs and enables the 'development' mode for Delta Live Tables pipelines.
      mode: development
      default: true
      workspace:
        host: {{workspace_host}}

    # The 'prod' target, used for production deployment.
    prod:
      # For production deployments, we only have a single copy, so we override the
      # workspace.root_path default of
      # /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.target}/${bundle.name}
      # to a path that is not specific to the current user.
      #
      # By making use of 'mode: production' we enable strict checks
      # to make sure we have correctly configured this target.
      mode: production
      workspace:
        host: {{workspace_host}}
        root_path: /Shared/.bundle/prod/${bundle.name}
      {{- if not is_service_principal}}
      run_as:
        # This runs as {{user_name}} in production. Alternatively,
        # a service principal could be used here using service_principal_name
        # (see Databricks documentation).
        user_name: {{user_name}}
      {{end -}}
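
The template above mixes two kinds of placeholders: dotted references such as {{.project_name}}, which are filled from the prompt variables in databricks_template_schema.json, and undotted names such as {{workspace_host}} and {{user_name}}, which are helpers supplied by the CLI rather than by your schema. The sketch below is one possible pre-flight check, not part of the CLI, that reports dotted variables a template uses but the schema does not define; undefined_variables is a hypothetical helper.

```python
import re

# Dotted field references such as {{.project_name}} come from the schema.
# Undotted names such as {{workspace_host}} are intentionally not matched.
FIELD_REFERENCE = re.compile(r"\{\{-?\s*\.(\w+)\s*-?\}\}")


def undefined_variables(template_text, schema_properties):
    """Return dotted variables used in the template but absent from the schema."""
    used = set(FIELD_REFERENCE.findall(template_text))
    return sorted(used - set(schema_properties))


# A shortened excerpt of databricks.yml.tmpl, inlined for illustration.
TEMPLATE_TEXT = """
bundle:
  name: {{.project_name}}
targets:
  dev:
    workspace:
      host: {{workspace_host}}
"""

print(undefined_variables(TEMPLATE_TEXT, {"project_name"}))  # prints []
print(undefined_variables(TEMPLATE_TEXT, set()))  # prints ['project_name']
```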

Create another YAML file named {{.project_name}}_job.yml.tmpl and place it in the template/resources directory. This new YAML file splits the project job definitions from the rest of the bundle’s definition. Add the following YAML to this file to describe the template job, which contains a specific Python task to run on a job cluster using a specific Docker container image:

sh
  touch template/resources/{{.project_name}}_job.yml.tmpl

YAML
  # The main job for {{.project_name}}
  resources:
    jobs:
      {{.project_name}}_job:
        name: {{.project_name}}_job
        tasks:
          - task_key: python_task
            job_cluster_key: job_cluster
            spark_python_task:
              python_file: ../src/{{.project_name}}/task.py
        job_clusters:
          - job_cluster_key: job_cluster
            new_cluster:
              docker_image:
                url: databricksruntime/python:10.4-LTS
              node_type_id: i3.xlarge
              spark_version: 13.3.x-scala2.12

In this example, you use a default Databricks base Docker container image, but you can specify your own custom image instead.

Add files referenced in your configuration

Next, create a template/src/{{.project_name}} directory and create the Python task file referenced by the job in the template:

sh
  mkdir -p template/src/{{.project_name}}
  touch template/src/{{.project_name}}/task.py

Now, add the following to task.py:

Python
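  import pyspark
  from pyspark.sql import SparkSession

  # Reuse the active Spark session if one exists; otherwise create one.
  spark = SparkSession.builder.master('local[*]').appName('example').getOrCreate()

  # Print the Spark version to confirm the task ran.
  print(f'Spark version: {spark.version}')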

Verify the bundle template structure

Review the folder structure of your bundle template project. It should look like this:

sh
  .
  ├── databricks_template_schema.json
  └── template
      ├── databricks.yml.tmpl
      ├── resources
      │   └── {{.project_name}}_job.yml.tmpl
      └── src
          └── {{.project_name}}
              └── task.py
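
If you prefer to check the layout programmatically, a small script along these lines could be run from the dab-container-template directory. It is a convenience sketch rather than a Databricks tool; missing_files is a hypothetical helper, and the file list is copied from the tree in this section.

```python
from pathlib import Path

# Paths from the expected tree, relative to the template project root.
EXPECTED_FILES = [
    "databricks_template_schema.json",
    "template/databricks.yml.tmpl",
    "template/resources/{{.project_name}}_job.yml.tmpl",
    "template/src/{{.project_name}}/task.py",
]


def missing_files(root):
    """Return the expected files that do not exist under root."""
    root_path = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root_path / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected template files are present.")
```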

Test your template

Finally, test your bundle template. To generate a bundle based on your new custom template, use the databricks bundle init command, specifying the new template location. The commands below assume you run them from the folder that contains dab-container-template, so that the new bundle folder and the template folder are siblings:

sh
  mkdir my-new-container-bundle
  cd my-new-container-bundle
  databricks bundle init ../dab-container-template
