Databricks provides comprehensive end-to-end audit logs of activities performed by Databricks users, allowing your enterprise to monitor detailed Databricks usage patterns.
Audit logs are available only in the Enterprise Plan. Contact firstname.lastname@example.org for more information.
If your account is enabled for audit logging, the Databricks account owner configures where Databricks sends the logs. Admin users cannot configure audit log delivery.
Log in to the Account Console.
Click the Audit Logs tab.
Configure the S3 bucket and directory:
- S3 Bucket in <region name>: the S3 bucket where you want to store your audit logs. The bucket must exist.
- Path: the path to the directory in the S3 bucket where you want to store the audit logs, for example /databricks/auditlogs. To store the logs at the bucket root, enter /.
Databricks sends the audit logs to the specified S3 bucket and directory path, partitioned by date.
Configure Databricks access to your AWS S3 bucket using an access policy.
In the Databricks Account Console, on the Audit Logs tab:
- Click the Generate Policy button. This policy grants the Databricks AWS account write permission on the bucket and directory that you specified. The first section grants Databricks write permission only; Databricks does not have read, list, or delete permission. The second section ensures that you retain full control over everything that Databricks writes to your bucket.
- Copy the generated JSON policy to your clipboard.
- In the AWS console, go to the S3 service.
- Click the name of the bucket where you want to store the audit logs.
- Click the Permissions tab.
- Click the Bucket Policy button.
- Paste the policy that you copied from the Databricks Account Console.
- Click Save.
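The generated policy is specific to your bucket and should always be taken from the Generate Policy button, but the two sections described above have roughly the shape sketched below. This is an illustration only: the Sids, account IDs, bucket name, and path are placeholders, not real values.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DatabricksWriteOnly",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::<databricks-account-id>:root" },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::<your-bucket>/databricks/auditlogs/*"
    },
    {
      "Sid": "OwnerFullControl",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::<your-aws-account-id>:root" },
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::<your-bucket>/databricks/auditlogs/*"
    }
  ]
}
```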
In the Databricks Account Console, on the Audit Logs tab, click the Verify Access button.
If you see a check mark, audit logs are configured correctly. If verification fails:
- Check that you entered the bucket name correctly, and that the AWS region is correct.
- Check that you copied the generated policy correctly to AWS.
- Contact your AWS account admin.
Once logging is enabled for your account, Databricks automatically starts sending audit logs in human-readable format to your delivery location periodically. Logs are available within 72 hours of activation.
- Databricks encrypts audit logs using Amazon S3 server-side encryption.
- Databricks delivers audit logs in gzipped JSON format.
- Databricks delivers audit logs daily, partitioned by date.
- Databricks delivers logs within 72 hours after day close.
- Each audit log record is unique.
- In order to guarantee exactly-once delivery of your audit logs while accounting for late records, Databricks can overwrite the delivered log files in your bucket at any time during the three-day period after the log date. After three days, audit files become immutable. In other words, logs for 2018-01-06 are subject to overwrites through 2018-01-09, and you can safely archive them on 2018-01-10.
- Overwriting ensures exactly-once semantics without requiring read or delete access to your account.
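The delivery and overwrite rules above can be sketched in a few lines of Python. The date= partition layout and the base path used here are assumptions for illustration; the actual layout is determined by Databricks and your configured path.

```python
from datetime import date, timedelta

def delivery_prefix(log_date, base="databricks/auditlogs"):
    # Assumed partition layout: one directory per log date under the
    # configured delivery path (the exact layout is set by Databricks).
    return f"{base}/date={log_date.isoformat()}"

def first_safe_archive_date(log_date):
    # Delivered files may be overwritten for three days after the log
    # date to guarantee exactly-once delivery; they become immutable on
    # the fourth day.
    return log_date + timedelta(days=4)

# Logs for 2018-01-06 are subject to overwrites through 2018-01-09 and
# are safe to archive on 2018-01-10, matching the example above.
print(delivery_prefix(date(2018, 1, 6)))        # databricks/auditlogs/date=2018-01-06
print(first_safe_archive_date(date(2018, 1, 6)))  # 2018-01-10
```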
The schema of audit log records is as follows:
version: the schema version of the audit log format
timestamp: UTC timestamp of the action
sourceIPAddress: the IP address of the source request
userAgent: the browser or API client used to make the request
sessionId: session ID of the action
userIdentity: information about the user making the request
serviceName: the service that logged the request
actionName: the action, such as login, logout, read, or write
requestId: unique request ID
requestParams: parameter key-value pairs used in the audited event
response: response to the request
errorMessage: the error message if there was an error
result: the result of the request
statusCode: HTTP status code indicating whether the request succeeded
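As an illustration of the schema, a single record might look like the Python dict below. Every value is invented, and the nesting of errorMessage, result, and statusCode under response is an assumption of this sketch.

```python
# Hypothetical audit log record; all values are made up for illustration.
record = {
    "version": "2.0",                      # assumed schema version string
    "timestamp": "2018-01-06T08:30:00Z",   # UTC timestamp of the action
    "sourceIPAddress": "203.0.113.10",
    "userAgent": "Mozilla/5.0",
    "sessionId": "session-0001",
    "userIdentity": {"email": "someone@example.com"},
    "serviceName": "accounts",
    "actionName": "login",
    "requestId": "req-0001",
    "requestParams": {"user": "someone@example.com"},
    "response": {"statusCode": 200, "result": None, "errorMessage": None},
}

# Every top-level field named in the schema above is present.
expected = {"version", "timestamp", "sourceIPAddress", "userAgent",
            "sessionId", "userIdentity", "serviceName", "actionName",
            "requestId", "requestParams", "response"}
assert expected.issubset(record.keys())
```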
The actionName property identifies the audited event in an audit log record. The naming convention follows the Databricks REST API 2.0.
Databricks provides audit logs for the following services:
- ACLs, including SQL-only table permissions
If actions take a long time, the request and response are logged separately, but the request and response pair have the same requestId.
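The two halves of a long-running action can be stitched back together by grouping records on requestId, as in this sketch. The records below are invented for illustration.

```python
from collections import defaultdict

# Invented records for one long-running command: the request is logged
# first, and the response arrives later under the same requestId.
records = [
    {"requestId": "req-42", "actionName": "runCommand",
     "requestParams": {"commandId": "cmd-1"}},
    {"requestId": "req-42", "actionName": "runCommand",
     "response": {"statusCode": 200}},
]

by_request = defaultdict(list)
for r in records:
    by_request[r["requestId"]].append(r)

# Grouping reunites the request/response pair.
print(len(by_request["req-42"]))  # 2
```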
With the exception of mount-related operations, Databricks audit logs do not include DBFS-related operations. We recommend that you set up server access logging in S3, which can log object-level operations associated with an IAM role. If you map IAM roles to Databricks users, your Databricks users cannot share IAM roles.
Automated actions, such as resizing a cluster due to autoscaling or launching a job due to scheduling, are performed by the user System-User.
You can analyze audit logs using Databricks. The following example uses logs to report on Databricks access and Spark versions.
Load audit logs as a DataFrame and register the DataFrame as a temp table. See Amazon S3 for a detailed guide.
val df = sqlContext.read.json("s3a://bucketName/path/to/auditLogs")
df.createOrReplaceTempView("audit_logs")
List the users who accessed Databricks and from where.
%sql
SELECT DISTINCT userIdentity.email, sourceIPAddress
FROM audit_logs
WHERE serviceName = "accounts" AND actionName LIKE "%login%"
Check the Spark versions used.
%sql
SELECT requestParams.spark_version, COUNT(*)
FROM audit_logs
WHERE serviceName = "clusters" AND actionName = "create"
GROUP BY requestParams.spark_version