Audit Logs

Databricks provides comprehensive end-to-end audit logs of activities performed by Databricks users, allowing your enterprise to monitor detailed Databricks usage patterns.

Enable audit logs

Audit logs are available only in the Enterprise Plan. Contact sales@databricks.com for more information.

Configure audit log delivery

If your account is enabled for audit logging, the Databricks account owner configures where Databricks sends the logs. Admin users cannot configure audit log delivery.

  1. Log in to the Account Console.

  2. Click the Audit Logs tab.

  3. Configure the S3 bucket and directory:

    • S3 Bucket in <region name>: the S3 bucket where you want to store your audit logs. The bucket must exist.
    • Path: the path to the directory in the S3 bucket where you want to store the audit logs. For example, /databricks/auditlogs. If you want to store the logs at the bucket root, enter /.

    Databricks sends the audit logs to the specified S3 bucket and directory path, partitioned by date. For example, my-bucket/databricks/auditlogs/date=2018-01-15/part-0.json.gz.
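
Once delivery has started, you can list the files delivered for a single day from a Databricks notebook. The following is a minimal sketch; the bucket name, path, and date are placeholders matching the example above.

// List the delivered audit log files for one day (bucket, path, and date are placeholders)
display(dbutils.fs.ls("s3a://my-bucket/databricks/auditlogs/date=2018-01-15/"))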

Configure access policy

Configure Databricks access to your AWS S3 bucket using an access policy.

Step 1: Generate the access policy

In the Databricks Account Console, on the Audit Logs tab:

  1. Click the Generate Policy button. The generated policy ensures that the Databricks AWS account has write permission on the bucket and directory that you specified. The first section grants Databricks write permission; Databricks does not have read, list, or delete permission. The second section ensures that you retain full control over everything that Databricks writes to your bucket. (An illustrative sketch of such a policy appears after these steps.)
  2. Copy the generated JSON policy to your clipboard.
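
One way such a two-part policy can be expressed as an S3 bucket policy is sketched below. This is an illustration only, not the exact policy Databricks generates; the account ID, bucket name, and path are placeholders, so always use the policy generated in the Account Console.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GrantDatabricksWriteAccess",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::<databricks-account-id>:root" },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-bucket/databricks/auditlogs/*"
    },
    {
      "Sid": "EnsureBucketOwnerFullControl",
      "Effect": "Deny",
      "Principal": { "AWS": "arn:aws:iam::<databricks-account-id>:root" },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-bucket/databricks/auditlogs/*",
      "Condition": {
        "StringNotEquals": { "s3:x-amz-acl": "bucket-owner-full-control" }
      }
    }
  ]
}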

Step 2: Apply the policy to the AWS S3 bucket

  1. In the AWS console, go to the S3 service.
  2. Click the name of the bucket where you want to store the audit logs.
  3. Click the Permissions tab.
  4. Click the Bucket Policy button.
  5. Paste the policy string from Step 1.
  6. Click Save.
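
If you prefer to script this step instead of using the console, the same policy document can be applied with the AWS CLI. The bucket name and file name below are placeholders.

aws s3api put-bucket-policy --bucket my-bucket --policy file://databricks-audit-log-policy.json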

Step 3: Verify that the policy is applied correctly

In the Databricks Account Console, on the Audit Logs tab, click the Verify Access button.


If you see a check mark, audit logs are configured correctly. If verification fails:

  1. Check that you entered the bucket name correctly, and that the AWS region is correct.
  2. Check that you copied the generated policy correctly to AWS.
  3. Contact your AWS account admin.

Audit log delivery

Once logging is enabled for your account, Databricks automatically starts sending audit logs in human-readable format to your delivery location on a periodic basis. Logs are available within 72 hours of activation.

Encryption: Databricks encrypts audit logs using Amazon S3 server-side encryption.
Format: Databricks delivers audit logs in JSON format, compressed with gzip (for example, part-0.json.gz).
When: Databricks delivers audit logs daily and partitions the logs by date in yyyy-MM-dd format.
Guarantees:
  • Databricks delivers logs within 72 hours after the day closes.
  • Each audit log record is unique.

Note

  • In order to guarantee exactly-once delivery of your audit logs while accounting for late records, Databricks can overwrite the delivered log files in your bucket at any time during the three-day period after the log date. After three days, audit files become immutable. In other words, logs for 2018-01-06 are subject to overwrites through 2018-01-09, and you can safely archive them on 2018-01-10.
  • Overwriting ensures exactly-once semantics without requiring read or delete access to your account.
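
The archiving rule in the note is simple date arithmetic. The following is a minimal sketch in Scala; the helper is ours, not part of any Databricks API.

import java.time.LocalDate

// A day's logs are safe to archive once three full days have passed after the log date.
// Example from the note: logs for 2018-01-06 may be overwritten through 2018-01-09
// and can be archived on 2018-01-10.
def safeToArchive(logDate: LocalDate, today: LocalDate): Boolean =
  !today.isBefore(logDate.plusDays(4))

safeToArchive(LocalDate.parse("2018-01-06"), LocalDate.parse("2018-01-09"))  // false
safeToArchive(LocalDate.parse("2018-01-06"), LocalDate.parse("2018-01-10"))  // true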

Audit log schema

The schema of audit log records is as follows:

  • version: the schema version of the audit log format
  • timestamp: UTC timestamp of the action
  • sourceIPAddress: the IP address of the source request
  • userAgent: the browser or API client used to make the request
  • sessionId: session ID of the action
  • userIdentity: information about the user who made the request
    • email: user email address
  • serviceName: the service that logged the request
  • actionName: the action, such as login, logout, read, or write
  • requestId: unique request ID
  • requestParams: parameter key-value pairs used in the audited event
  • response: response to the request
    • errorMessage: the error message if there was an error
    • result: the result of the request
    • statusCode: HTTP status code that indicates whether the request succeeded
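
If you prefer to read the logs with an explicit schema instead of relying on schema inference, the field list above can be declared as a Spark schema. The following sketch assumes concrete types (for example, statusCode as an integer); it is an illustration, not an official schema definition.

import org.apache.spark.sql.types._

// Audit log schema, following the field list above; the concrete types are assumptions
val auditSchema = StructType(Seq(
  StructField("version", StringType),
  StructField("timestamp", StringType),
  StructField("sourceIPAddress", StringType),
  StructField("userAgent", StringType),
  StructField("sessionId", StringType),
  StructField("userIdentity", StructType(Seq(
    StructField("email", StringType)
  ))),
  StructField("serviceName", StringType),
  StructField("actionName", StringType),
  StructField("requestId", StringType),
  StructField("requestParams", MapType(StringType, StringType)),
  StructField("response", StructType(Seq(
    StructField("errorMessage", StringType),
    StructField("result", StringType),
    StructField("statusCode", IntegerType)
  )))
))

You can then pass the schema to the reader, for example sqlContext.read.schema(auditSchema).json(...).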

Audit events

The serviceName and actionName properties identify an audit event in an audit log record. The naming convention follows the Databricks REST API 2.0.

Databricks provides audit logs for the following services:

  • Accounts
  • Clusters
  • DBFS
  • Genie
  • Jobs
  • ACLs, including SQL-only table permissions
  • SSH
  • Tables

If an action takes a long time, the request and response are logged separately, but the request and response pair share the same requestId.
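
For example, using the audit_logs temporary view created in Analyze audit logs below, you can find request IDs that appear in more than one record:

%sql
SELECT requestId, COUNT(*) AS num_records
FROM audit_logs
GROUP BY requestId
HAVING COUNT(*) > 1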

With the exception of mount-related operations, Databricks audit logs do not include DBFS-related operations. To capture those, we recommend that you set up server access logging in S3, which can log object-level operations associated with an IAM role. Note that if you map IAM roles to individual Databricks users for this purpose, those users cannot share IAM roles.

Note

Automated actions, such as resizing a cluster due to autoscaling or launching a job due to scheduling, are performed by the user System-User.

Analyze audit logs

You can analyze audit logs using Databricks. The following examples use the logs to report on Databricks access and Spark versions.

Load the audit logs as a DataFrame and register the DataFrame as a temporary view. See Amazon S3 for a detailed guide.

// Load the delivered audit logs (gzipped JSON, partitioned by date) into a DataFrame
val df = sqlContext.read.json("s3a://bucketName/path/to/auditLogs")
// Register the DataFrame as a temporary view so that it can be queried with SQL
df.createOrReplaceTempView("audit_logs")

List the users who accessed Databricks and from where.

%sql
SELECT DISTINCT userIdentity.email, sourceIPAddress
FROM audit_logs
WHERE serviceName = "accounts" AND actionName LIKE "%login%"
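
The response field can be used in the same way, for example to look for login requests that did not succeed. Treating any non-200 status code as a failure is an assumption for the purposes of this sketch.

%sql
SELECT userIdentity.email, sourceIPAddress, response.statusCode
FROM audit_logs
WHERE serviceName = "accounts" AND actionName LIKE "%login%" AND response.statusCode != 200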

Check the Spark versions used.

%sql
SELECT requestParams.spark_version, COUNT(*)
FROM audit_logs
WHERE serviceName = "clusters" AND actionName = "create"
GROUP BY requestParams.spark_version
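
Get an overview of which audited events occur most often, using the same audit_logs view.

%sql
SELECT serviceName, actionName, COUNT(*) AS num_events
FROM audit_logs
GROUP BY serviceName, actionName
ORDER BY num_events DESC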