Skip to main content

Compute system tables reference

Preview

This system table is in Public Preview. To access the table, the schema must be enabled in your system catalog. For more information, see Enable system table schemas.

This article provides you with a reference guide for the compute system tables. You can use these tables to monitor the activity and metrics of all-purpose and jobs compute in your account:

  • clusters: Records compute configurations in your account.
  • node_types: Includes a single record for each of the currently available node types, including hardware information.
  • node_timeline: Includes minute-by-minute records of your compute’s utilization metrics.

Cluster table schema

The cluster table is a slow-changing dimension table that contains the full history of compute configurations over time for all-purpose and jobs compute.

Table path: This system table is located at system.compute.clusters

Column nameData typeDescriptionExample
account_idstringID of the account where this cluster was created.23e22ba4-87b9-
4cc2-9770-d10b894b7118
workspace_idstringID of the workspace where this cluster was created.1234567890123456
cluster_idstringID of the cluster for which this record is associated.0000-123456-xxxxxxxx
cluster_namestringUser defined name for the cluster.My cluster
owned_bystringUsername of the cluster owner. Defaults to the cluster creator, but can be changed through the Clusters API.sample_user@email.com
create_timetimestampTimestamp of the change to this compute definition.2023-01-09 11:00:00.000
delete_timetimestampTimestamp of when the cluster was deleted. The value is null if the cluster is not deleted.2023-01-09 11:00:00.000
driver_node_typestringDriver node type name. This matches the instance type name from the cloud provider.i3.xlarge
worker_node_typestringWorker node type name. This matches the instance type name from the cloud provider.i3.xlarge
worker_countbigintNumber of workers. Defined for fixed-size clusters only.4
min_autoscale_workersbigintThe set minimum number of workers. This field is valid only for autoscaling clusters.1
max_autoscale_workersbigintThe set maximum number of workers. This field is valid only for autoscaling clusters.1
auto_termination_minutesbigintThe configured autotermination duration.120
enable_elastic_diskbooleanAutoscaling disk enablement status.true
tagsmapUser-defined tags for the cluster (does not include default tags).{"ResourceClass":"SingleNode"}
cluster_sourcestringIndicates the creator for the cluster: UI, API, JOB, etc.UI
init_scriptsarraySet of paths for init scripts."/Users/example@email.com
/files/scripts/install-python-pacakges.sh"
aws_attributesstructAWS specific settings.{
"ebs_volume_count": null,
"availability": "SPOT_WITH_FALLBACK",
"first_on_demand": "0",
"spot_bid_price_percent": "100"
}
azure_attributesstructAzure specific settings.null
gcp_attributesstructGCP specific settings. This field will be empty.null
driver_instance_pool_idstringInstance pool ID if the driver is configured on top of an instance pool.1107-555555-crhod16-pool-DIdnjazB
worker_instance_pool_idstringInstance Pool ID if the worker is configured on top of an instance pool.1107-555555-crhod16-pool-DIdnjazB
dbr_versionstringThe Databricks Runtime of the cluster.14.x-snapshot-scala2.12
change_timetimestampTimestamp of change to the compute definition.2023-01-09 11:00:00.000
change_datedateChange date. Used for retention.2023-01-09

Node types table schema

The node type table captures the currently available node types with their basic hardware information.

Table path: This system table is located at system.compute.node_types.

Column nameData typeDescriptionExample
account_idstringID of the account where this cluster was created.23e22ba4-87b9-4cc2-9770-d10b894b7118
node_typestringUnique identifier for node type.i3.xlarge
core_countdoubleNumber of vCPUs for the instance.48.0
memory_mblongTotal memory for the instance.393216
gpu_countlongNumber of GPUs for the instance.0

Node timeline table schema

The node timeline table captures node-level resource utilization data at minute granularity. Each record contains data for a given minute of time per instance.

Table path: This system table is located at system.compute.node_timeline.

Column nameData typeDescriptionExample
account_idstringID of the account where this compute resource is running.23e22ba4-87b9-4cc2-9770-d10b894b7118
workspace_idstringID of the workspace where this compute resource is running.1234567890123456
cluster_idstringID of the compute resource.0000-123456-crmpt124
instance_idstringID for the specific instance.i-1234a6c12a2681234
start_timetimestampStart time for the record in UTC.2024-07-16T12:00:00Z
end_timetimestampEnd time for the record in UTC.2024-07-16T13:00:00Z
driverbooleanWhether the instance is a driver or worker node.true
cpu_user_percentdoublePercentage of time the CPU spent in userland.34.76163817234407
cpu_system_percentdoublePercentage of time the CPU spent in the kernel.1.0895310279488264
cpu_wait_percentdoublePercentage of time the CPU spent waiting for I/O.0.03445157400629276
mem_used_percentdoublePercentage of the compute’s memory that was used during the time period (including memory used by background processes running on the compute).45.34858216779041
mem_swap_percentdoublePercentage of memory usage attributed to memory swap.0.014648443087939
network_sent_bytesbigintThe number of bytes sent out in network traffic.517376
network_received_bytesbigintThe number of received bytes from network traffic.179234
disk_free_bytes_per_mount_pointmapThe disk utilization grouped by mount point. This is ephemeral storage provisioned only while the compute is running.{"/var/lib/lxc":123455551234,"/":

123456789123,"/local_disk0":123412341234}
node_typestringThe name of the node type. This will match the instance type name from the cloud provider.i3.xlarge

Known limitations

  • Compute resources that were marked deleted before October 23, 2023 do not appear in the clusters table. This might result in joins from the system.billing.usage table not matching records in the clusters table. All active compute resources have been backfilled.
  • These tables only includes records for all-purpose and jobs compute. They do not contain records for serverless compute, Delta Live Tables compute, or SQL warehouses.
  • Nodes that ran for less than 10 minutes might not appear in the node_timeline table.

Sample queries

You can use the following sample queries to answer common questions:

note

Some of these examples join the cluster table with the system.billing.usage table. Since billing records are cross-regional and cluster records region-sepcific, billing records only match cluster records for the region in which you are querying. To see records from another region, please execute the query in that region.

Join cluster records with the most recent billing records

This query can help you understand spending over time. Once you update the usage_start_time to the most current billing period, it grabs the most recent updates to the billing records to join into clusters data.

Each record is associated with the cluster owner during that particular run. So, if the cluster owner changes, costs will roll up to the correct owner based on when the cluster was used.

SQL
SELECT
u.record_id,
c.cluster_id,
c.owned_by,
c.change_time,
u.usage_start_time,
u.usage_quantity
FROM
system.billing.usage u
JOIN system.compute.clusters c
JOIN (SELECT u.record_id, c.cluster_id, max(c.change_time) change_time
FROM system.billing.usage u
JOIN system.compute.clusters c
WHERE
u.usage_metadata.cluster_id is not null
and u.usage_start_time >= '2023-01-01'
and u.usage_metadata.cluster_id = c.cluster_id
and date_trunc('HOUR', c.change_time) <= date_trunc('HOUR', u.usage_start_time)
GROUP BY all) config
WHERE
u.usage_metadata.cluster_id is not null
and u.usage_start_time >= '2023-01-01'
and u.usage_metadata.cluster_id = c.cluster_id
and u.record_id = config.record_id
and c.cluster_id = config.cluster_id
and c.change_time = config.change_time
ORDER BY cluster_id, usage_start_time desc;

Identify the compute resources with the highest average utilization and peak utilization

Identify the all-purpose and jobs compute that have the highest average CPU utilization and the highest peak CPU utilization.

SQL
SELECT
distinct cluster_id,
driver,
avg(cpu_user_percent + cpu_system_percent) as `Avg CPU Utilization`,
max(cpu_user_percent + cpu_system_percent) as `Peak CPU Utilization`,
avg(cpu_wait_percent) as `Avg CPU Wait`,
max(cpu_wait_percent) as `Max CPU Wait`,
avg(mem_used_percent) as `Avg Memory Utilization`,
max(mem_used_percent) as `Max Memory Utilization`,
avg(network_received_bytes)/(1024^2) as `Avg Network MB Received per Minute`,
avg(network_sent_bytes)/(1024^2) as `Avg Network MB Sent per Minute`
FROM
node_timeline
WHERE
start_time >= date_add(now(), -1)
GROUP BY
cluster_id,
driver
ORDER BY
3 desc;