Monitor Delta Live Tables pipelines
This article describes using built-in monitoring and observability features for Delta Live Tables pipelines. These features support tasks such as:
- Observing the progress and status of pipeline updates. See What pipeline details are available in the UI?.
- Alerting on pipeline events such as the success or failure of pipeline updates. See Add email notifications for pipeline events.
- Viewing metrics for streaming sources like Apache Kafka and Auto Loader (Public Preview). See View streaming metrics.
- Extracting detailed information on pipeline updates such as data lineage, data quality metrics, and resource usage. See What is the Delta Live Tables event log?.
- Defining custom actions to take when specific events occur. See Define custom monitoring of Delta Live Tables pipelines with event hooks.
To inspect and diagnose query performance, see Access query history for Delta Live Tables pipelines. This feature is in Public Preview.
Add email notifications for pipeline events
You can configure one or more email addresses to receive notifications when the following occurs:
- A pipeline update completes successfully.
- A pipeline update fails, either with a retryable or a non-retryable error. Select this option to receive a notification for all pipeline failures.
- A pipeline update fails with a non-retryable (fatal) error. Select this option to receive a notification only when a non-retryable error occurs.
- A single data flow fails.
To configure email notifications when you create or edit a pipeline:
- Click Add notification.
- Enter one or more email addresses to receive notifications.
- Click the checkbox for each notification type to send to the configured email addresses.
- Click Add notification.
What pipeline details are available in the UI?
The pipeline graph appears as soon as an update to a pipeline has successfully started. Arrows represent dependencies between datasets in your pipeline. By default, the pipeline details page shows the most recent update for the table, but you can select older updates from a drop-down menu.
Details include the pipeline ID, source code, compute cost, product edition, and the channel configured for the pipeline.
To see a tabular view of datasets, click the List tab. The List view allows you to see all datasets in your pipeline represented as a row in a table and is useful when your pipeline DAG is too large to visualize in the Graph view. You can control the datasets displayed in the table using multiple filters such as dataset name, type, and status. To switch back to the DAG visualization, click Graph.
The Run as user is the pipeline owner, and pipeline updates run with this user's permissions. To change the run as user, click Permissions and change the pipeline owner.
How can you view dataset details?
Clicking on a dataset in the pipeline graph or dataset list shows details about the dataset. Details include the dataset schema, data quality metrics, and a link to the source code defining the dataset.
View update history
To view the history and status of pipeline updates, click the update history drop-down menu in the top bar.
Select the update in the drop-down menu to view the graph, details, and events for an update. To return to the latest update, click Show the latest update.
View streaming metrics
Streaming observability for Delta Live Tables is in Public Preview.
You can view streaming metrics from the data sources supported by Spark Structured Streaming, like Apache Kafka, Amazon Kinesis, Auto Loader, and Delta tables, for each streaming flow in your Delta Live Tables pipeline. Metrics are displayed as charts in the Delta Live Tables UI’s right pane and include backlog seconds, backlog bytes, backlog records, and backlog files. The charts display the maximum value aggregated by minute and a tooltip shows maximum values when you hover over the chart. The data is limited to the last 48 hours from the current time.
Tables in your pipeline with streaming metrics available display a chart icon when you view the pipeline DAG in the UI Graph view. To view the streaming metrics, click the icon to display the streaming metrics chart in the Flows tab in the right pane. You can also apply a filter to view only tables with streaming metrics by clicking List and then clicking Has streaming metrics.
Each streaming source supports only specific metrics. Metrics not supported by a streaming source are not available to view in the UI. The following table shows the metrics available for supported streaming sources:
Source | Backlog bytes | Backlog records | Backlog seconds | Backlog files |
---|---|---|---|---|
Kafka | ✓ | ✓ | | |
Kinesis | ✓ | ✓ | | |
Delta | ✓ | ✓ | | |
Auto Loader | ✓ | ✓ | | |
Google Pub/Sub | ✓ | ✓ | | |
What is the Delta Live Tables event log?
The Delta Live Tables event log contains all information related to a pipeline, including audit logs, data quality checks, pipeline progress, and data lineage. You can use the event log to track, understand, and monitor the state of your data pipelines.
You can view event log entries in the Delta Live Tables user interface, the Delta Live Tables API, or by directly querying the event log. This section focuses on querying the event log directly.
You can also define custom actions to run when events are logged, for example, sending alerts, with event hooks.
Do not delete the event log or the parent catalog or schema where the event log is published. Deleting the event log might result in your pipeline failing to update during future runs.
Event log schema
The following table describes the event log schema. Some of these fields contain JSON data that requires parsing to perform some queries, such as the details field. Databricks supports the : operator to parse JSON fields. See : (colon sign) operator. A minimal example follows the table.
Field | Description |
---|---|
id | A unique identifier for the event log record. |
sequence | A JSON document containing metadata to identify and order events. |
origin | A JSON document containing metadata for the origin of the event, for example, the cloud provider, the cloud provider region, user_id, pipeline_id, or pipeline_type to show where the pipeline was created, either DBSQL or WORKSPACE. |
timestamp | The time the event was recorded. |
message | A human-readable message describing the event. |
level | The event type, for example, INFO , WARN , ERROR , or METRICS . |
maturity_level | The stability of the event schema. The possible values are: - STABLE: The schema is stable and will not change. - NULL: The schema is stable and will not change. The value may be NULL if the record was created before the maturity_level field was added (release 2022.37). - EVOLVING: The schema is not stable and may change. - DEPRECATED: The schema is deprecated and the Delta Live Tables runtime may stop producing this event at any time. |
error | If an error occurred, details describing the error. |
details | A JSON document containing structured details of the event. This is the primary field used for analyzing events. |
event_type | The event type. |
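The following is a minimal, self-contained sketch of the : operator applied to a JSON string shaped like the details field. The inline VALUES row is made-up data used only to illustrate the syntax:
SELECT
  details:flow_definition.output_dataset AS output_dataset
FROM VALUES
  ('{"flow_definition": {"output_dataset": "sales_orders_cleaned"}}') AS t(details);
-- Returns: sales_orders_cleaned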
Query the event log
This section describes the default behavior and syntax for working with event logs for pipelines configured with Unity Catalog and the default publishing mode.
- For behavior for Unity Catalog pipelines that use legacy publishing mode, see Work with event log for Unity Catalog legacy pipelines.
- For behavior and syntax for Hive metastore pipelines, see Work with event log for Hive metastore pipelines.
By default, Delta Live Tables writes the event log to a hidden Delta table in the default catalog and schema configured for the pipeline. While hidden, the table can still be queried by all sufficiently privileged users. By default, only the owner of the pipeline can query the event log table.
By default, the name for the hidden event log is formatted as event_log_{pipeline_id}, where the pipeline ID is the system-assigned UUID with dashes replaced by underscores.
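For example, because the hidden event log is a queryable Delta table, a sufficiently privileged user could read it directly by name. The catalog, schema, and pipeline ID in this sketch are hypothetical placeholders; substitute the default catalog and schema configured for your pipeline and your pipeline's ID:
-- Hypothetical catalog `main`, schema `dlt_default`, and example pipeline ID
SELECT * FROM main.dlt_default.event_log_ec2a0ff4_d2a5_4c8c_bf1d_d9f12f10e749;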
You can publish the event log by editing the pipeline's JSON configuration. When you publish an event log, you specify the name for the event log table and can optionally specify a catalog and schema, as in the following example:
{
"id": "ec2a0ff4-d2a5-4c8c-bf1d-d9f12f10e749",
"name": "billing_pipeline",
"event_log": {
"catalog": "catalog_name",
"schema": "schema_name",
"name": "event_log_table_name"
}
}
The event log location also serves as the schema location for any Auto Loader queries in the pipeline. Databricks recommends creating a view over the event log table before modifying the privileges, as some compute settings might allow users to gain access to schema metadata if the event log table is shared directly. The following example syntax creates a view on an event log table:
CREATE VIEW event_log_view
AS SELECT * FROM catalog_name.schema_name.event_log_table_name;
In Unity Catalog, views support streaming queries. The following example uses Structured Streaming to query a view defined on top of an event log table:
df = spark.readStream.table("event_log_view")
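The example queries in the following sections reference two helper views, event_log_raw and latest_update: event_log_raw exposes the event log table, and latest_update holds the update ID of the most recent pipeline update. The following is a minimal sketch of how you might define them, assuming the catalog, schema, and table name from the publishing example above:
CREATE OR REPLACE TEMP VIEW event_log_raw
AS SELECT * FROM catalog_name.schema_name.event_log_table_name;

CREATE OR REPLACE TEMP VIEW latest_update
AS SELECT origin.update_id AS id
FROM event_log_raw
WHERE event_type = 'create_update'
ORDER BY timestamp DESC
LIMIT 1;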
Query lineage information from the event log
Events containing information about lineage have the event type flow_definition. The details:flow_definition object contains the output_dataset and input_datasets defining each relationship in the graph.
You can use the following query to extract the input and output datasets to see lineage information:
SELECT
details:flow_definition.output_dataset as output_dataset,
details:flow_definition.input_datasets as input_dataset
FROM
event_log_raw,
latest_update
WHERE
event_type = 'flow_definition'
AND
origin.update_id = latest_update.id
output_dataset | input_datasets |
---|---|
customers | null |
sales_orders_raw | null |
sales_orders_cleaned | ["customers", "sales_orders_raw"] |
sales_order_in_la | ["sales_orders_cleaned"] |
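To get one row for each edge in the lineage graph instead of an array of inputs, you can explode input_datasets. The following is a hedged variation on the query above; datasets with no inputs (such as raw source tables) are omitted from the result:
SELECT
  explode(
    from_json(details:flow_definition.input_datasets, "array<string>")
  ) as input_dataset,
  details:flow_definition.output_dataset as output_dataset
FROM
  event_log_raw,
  latest_update
WHERE
  event_type = 'flow_definition'
  AND origin.update_id = latest_update.id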
Query data quality from the event log
If you define expectations on datasets in your pipeline, the data quality metrics are stored in the details:flow_progress.data_quality.expectations object. Events containing information about data quality have the event type flow_progress. The following example queries the data quality metrics for the last pipeline update:
SELECT
row_expectations.dataset as dataset,
row_expectations.name as expectation,
SUM(row_expectations.passed_records) as passing_records,
SUM(row_expectations.failed_records) as failing_records
FROM
(
SELECT
explode(
from_json(
details :flow_progress :data_quality :expectations,
"array<struct<name: string, dataset: string, passed_records: int, failed_records: int>>"
)
) row_expectations
FROM
event_log_raw,
latest_update
WHERE
event_type = 'flow_progress'
AND origin.update_id = latest_update.id
)
GROUP BY
row_expectations.dataset,
row_expectations.name
dataset | expectation | passing_records | failing_records |
---|---|---|---|
sales_orders_cleaned | valid_order_number | 4083 | 0 |
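To report a failure rate instead of raw record counts, you could adapt the same query. The following sketch uses try_divide to avoid a division-by-zero error when an expectation has processed no records:
SELECT
  row_expectations.dataset as dataset,
  row_expectations.name as expectation,
  try_divide(
    SUM(row_expectations.failed_records),
    SUM(row_expectations.passed_records) + SUM(row_expectations.failed_records)
  ) as failure_rate
FROM
  (
    SELECT
      explode(
        from_json(
          details :flow_progress :data_quality :expectations,
          "array<struct<name: string, dataset: string, passed_records: int, failed_records: int>>"
        )
      ) row_expectations
    FROM
      event_log_raw,
      latest_update
    WHERE
      event_type = 'flow_progress'
      AND origin.update_id = latest_update.id
  )
GROUP BY
  row_expectations.dataset,
  row_expectations.name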
Monitor data backlog by querying the event log
Delta Live Tables tracks how much data is present in the backlog in the details:flow_progress.metrics.backlog_bytes object. Events containing backlog metrics have the event type flow_progress. The following example queries backlog metrics for the last pipeline update:
SELECT
timestamp,
Double(details :flow_progress.metrics.backlog_bytes) as backlog
FROM
event_log_raw,
latest_update
WHERE
event_type = 'flow_progress'
AND
origin.update_id = latest_update.id
The backlog metrics may not be available depending on the pipeline’s data source type and Databricks Runtime version.
Monitor enhanced autoscaling events from the event log for pipelines without serverless enabled
For DLT pipelines that do not use serverless compute, the event log captures cluster resizes when enhanced autoscaling is enabled in your pipelines. Events containing information about enhanced autoscaling have the event type autoscale. The cluster resizing request information is stored in the details:autoscale object. The following example queries the enhanced autoscaling cluster resize requests for the last pipeline update:
SELECT
timestamp,
Double(
case
when details :autoscale.status = 'RESIZING' then details :autoscale.requested_num_executors
else null
end
) as starting_num_executors,
Double(
case
when details :autoscale.status = 'SUCCEEDED' then details :autoscale.requested_num_executors
else null
end
) as succeeded_num_executors,
Double(
case
when details :autoscale.status = 'PARTIALLY_SUCCEEDED' then details :autoscale.requested_num_executors
else null
end
) as partially_succeeded_num_executors,
Double(
case
when details :autoscale.status = 'FAILED' then details :autoscale.requested_num_executors
else null
end
) as failed_num_executors
FROM
event_log_raw,
latest_update
WHERE
event_type = 'autoscale'
AND
origin.update_id = latest_update.id
Monitor compute resource utilization
cluster_resources events provide metrics on the number of task slots in the cluster, how much those task slots are utilized, and how many tasks are waiting to be scheduled.
When enhanced autoscaling is enabled, cluster_resources events also contain metrics for the autoscaling algorithm, including latest_requested_num_executors and optimal_num_executors. The events also show the status of the algorithm as different states, such as CLUSTER_AT_DESIRED_SIZE, SCALE_UP_IN_PROGRESS_WAITING_FOR_EXECUTORS, and BLOCKED_FROM_SCALING_DOWN_BY_CONFIGURATION.
This information can be viewed in conjunction with the autoscaling events to provide an overall picture of enhanced autoscaling.
The following example queries the task queue size history for the last pipeline update:
SELECT
timestamp,
Double(details :cluster_resources.avg_num_queued_tasks) as queue_size
FROM
event_log_raw,
latest_update
WHERE
event_type = 'cluster_resources'
AND
origin.update_id = latest_update.id
The following example queries the utilization history for the last pipeline update:
SELECT
timestamp,
Double(details :cluster_resources.avg_task_slot_utilization) as utilization
FROM
event_log_raw,
latest_update
WHERE
event_type = 'cluster_resources'
AND
origin.update_id = latest_update.id
The following example queries the executor count history, accompanied by metrics available only for enhanced autoscaling pipelines, including the number of executors requested by the algorithm in the latest request, the optimal number of executors recommended by the algorithm based on the most recent metrics, and the autoscaling algorithm state:
SELECT
timestamp,
Double(details :cluster_resources.num_executors) as current_executors,
Double(details :cluster_resources.latest_requested_num_executors) as latest_requested_num_executors,
Double(details :cluster_resources.optimal_num_executors) as optimal_num_executors,
details :cluster_resources.state as autoscaling_state
FROM
event_log_raw,
latest_update
WHERE
event_type = 'cluster_resources'
AND
origin.update_id = latest_update.id
Audit Delta Live Tables pipelines
You can use Delta Live Tables event log records and other Databricks audit logs to get a complete picture of how data is being updated in Delta Live Tables.
Delta Live Tables uses the credentials of the pipeline owner to run updates. You can change the credentials used by updating the pipeline owner. Delta Live Tables records the user for actions on the pipeline, including pipeline creation, edits to configuration, and triggering updates.
See Unity Catalog events for a reference of Unity Catalog audit events.
Query user actions in the event log
You can use the event log to audit events, for example, user actions. Events containing information about user actions have the event type user_action. Information about the action is stored in the user_action object in the details field. Use the following query to construct an audit log of user events. To create the event_log_raw view used in this query, see Query the event log.
SELECT timestamp, details:user_action:action, details:user_action:user_name FROM event_log_raw WHERE event_type = 'user_action'
timestamp | action | user_name |
---|---|---|
2021-05-20T19:36:03.517+0000 | START | user@company.com |
2021-05-20T19:35:59.913+0000 | CREATE | user@company.com |
2021-05-27T00:35:51.971+0000 | START | user@company.com |
Runtime information
You can view runtime information for a pipeline update, for example, the Databricks Runtime version for the update:
SELECT details:create_update:runtime_version:dbr_version FROM event_log_raw WHERE event_type = 'create_update'
dbr_version |
---|
11.0 |