VACUUM Command on a Databricks Delta Table Stored in S3

This topic explains a scenario that you might encounter when running a VACUUM command on a Databricks Delta table stored in an S3 bucket.

Problem

You can store a Databricks Delta table in DBFS or S3. When saving a table to S3, provide the table location using the s3a scheme rather than s3. If a table was created with an s3 location instead of s3a, the VACUUM command fails with the following error:

Error in SQL statement: SparkException: Job aborted due to stage failure: Task 6 in stage 343.0 failed 4 times, most recent failure: Lost task 6.3 in stage 343.0 (TID 4130677, 10.25.122.176, executor 13): java.lang.AssertionError: assertion failed:
Shouldn't have any absolute paths for deletion here
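For illustration, the following sketch shows a table registered with an s3a location, which can be vacuumed as expected; the table name and bucket path are placeholders, not part of the original scenario.

  -- Hypothetical table and bucket names, shown only to illustrate the s3a scheme.
  CREATE TABLE events
  USING DELTA
  LOCATION 's3a://my-bucket/delta/events';

  -- VACUUM succeeds when the table location was registered with s3a.
  VACUUM events;

  -- The same definition registered with an s3:// location can later fail VACUUM
  -- with the "Shouldn't have any absolute paths for deletion here" assertion.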

Solution

The error no longer occurs in Databricks Runtime 5.0. If you are experiencing the error on a large number of tables, create a ticket with Databricks support to update the metadata. A quick workaround, sketched in SQL after the list, is to:

  1. Create a new table, providing the table location using s3a.
  2. Insert the data from the old table into the new table.
  3. Drop the old table.
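The following sketch shows the three steps in SQL. The table names, bucket path, and schema are placeholders; substitute the schema and location of your own table.

  -- 1. Create a new table, providing the location with the s3a scheme.
  --    (The schema here is a placeholder; match the old table's schema.)
  CREATE TABLE events_new (id BIGINT, ts TIMESTAMP, payload STRING)
  USING DELTA
  LOCATION 's3a://my-bucket/delta/events_new';

  -- 2. Copy the data from the old table into the new table.
  INSERT INTO events_new SELECT * FROM events;

  -- 3. Drop the old table.
  DROP TABLE events;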