Databricks Delta provides ACID transactional guarantees between reads and writes. This means that
- Multiple writers, even if they are across multiple clusters, can simultaneously modify a dataset and see a consistent snapshot view of the table and there will be a serial order for these writes
- Readers will continue to see the consistent snapshot view of the table that the spark job started with even when the table is modified during the job.
Delta uses Optimistic Concurrency Control to provide transactional guarantees between writes. Under this mechanism, writers operate in three stages.
- Read: A transactional write will start by reading (if needed) the latest available version of the table to identify which files need to be modified (that is, rewritten).
- Write: Then it will stage all the changes by writing new data files.
- Validate and commit: Finally, for committing the changes, in our optimistic concurrency protocol checks whether the proposed changes conflict with any other changes that may have been concurrently committed since the snapshot that was read. If there are no conflicts, then the all the staged changes will be committed as a new versioned snapshot, and the write operation succeeds. However, if there are conflicts, then the write operation will fail with a concurrent modification exception rather than corrupting the table as would happen with OSS Spark.
Together, Delta achieves the sweet spot of achieving high availability for reads while ensuring data consistency for all writers. Users can tune this balance by setting the isolation level of a Delta table.
The isolation level of a table defines the degree to which a transaction must be isolated from modifications made by concurrent transactions. Currently, we support two isolation levels.
Serializable: This is the strongest isolation level which ensures that committed write operations and all reads are Serializable.In other words, operations will be allowed as long as there exists a serial sequence of executing them one-at-a-time that generates the same outcome as that seen in the table. For the write operations, the serial sequence is exactly the same as seen in the table’s history.
WriteSerializable [DEFAULT]: This is a slightly weaker isolation level than Serializable because it ensures only that the write operations (i.e. not reads) are serializable. However, this is still stronger than Snapshot Isolation level. This is the default isolation level because it provides great balance of data consistency and availability for most common operations.
In this mode, the content of the table may be different from that which is expected from the sequence operations seen in the table’s history. This is because this mode allows certain pairs of concurrent writes (say, operations X and Y) to proceed such that the result would be as if Y was performed before X (i.e. serializable between them) even though the history would show that Y was committed after X. If you would like to disallow this reordering, you can set the table isolation level to be Serializable to cause these transactions to instead fail (see Setting the isolation level of a table).
The following matrix describes which pairs of write operations can conflict in each isolation level.
|INSERT||UPDATE / DELETE / MERGE||OPTIMIZE|
|UPDATE / DELETE / MERGE||Can conflict in Serializable, cannot conflict in WriteSerializable||Can conflict in Serializable & WriteSerializable|
|OPTIMIZE||Cannot conflict||Can conflict in Serializable & WriteSerializable||Can conflict in Serializable & WriteSerializable|
In all cases marked “can conflict”, whether the two operations will conflict depends on whether they touch the same set of files. The two sets of files can be made disjoint by partitioning the table by the same columns as those used in the conditions of the operations. For example, the two commands
UPDATE table WHERE date > '2010-01-01' ... and
DELETE table WHERE date < '2010-01-01' will conflict if the table is not partitioned by date as both can attempt to modify the same set of files. Partitioning the table by
date will avoid the conflict. Hence, partitioning a table according to the conditions commonly used on the command can reduce conflicts significantly. However, note that partitioning a table by a column that has high cardinality can lead to other performance issues due to large number of subdirectories.
You can set the isolation level using the ALTER TABLE command.
ALTER TABLE <table-id> SET TBLPROPERTIES ('delta.isolationLevel' = <level-name>) where <level-name> = 'Serializable' or 'WriteSerializable'
For example, to change the isolation level from the default Serializable to WriteSerializable, run the following:
ALTER TABLE tableName SET TBLPROPERTIES ('delta.isolationLevel' = 'WriteSerializable')