1.3 Delta Lake Fundamentals

Key Takeaways

  • Delta Lake is an open-source storage layer that brings ACID transactions and reliability to data lakes.
  • Every Delta table has a transaction log (_delta_log directory) that records every change as an ordered, atomic commit.
  • Delta Lake provides four key guarantees: atomicity, consistency, isolation, and durability (ACID).
  • Time travel allows querying any historical version of a Delta table using VERSION AS OF or TIMESTAMP AS OF syntax.
  • The DESCRIBE HISTORY command shows the full audit log of all operations performed on a Delta table.
Last updated: March 2026

Quick Answer: Delta Lake is an open-source storage layer that adds ACID transactions, schema enforcement, and time travel to data stored in Parquet format on cloud object storage. Every change is recorded in a transaction log (_delta_log), enabling reliable, auditable data management.

What Is Delta Lake?

Delta Lake is the default table format in Databricks. When you create a table without specifying a format, it is automatically a Delta table. Delta Lake adds a transaction log on top of Parquet files stored in cloud object storage.

Delta Lake = Parquet Files + Transaction Log

my_table/
├── _delta_log/          # Transaction log directory
│   ├── 00000000000000000000.json  # Commit 0
│   ├── 00000000000000000001.json  # Commit 1
│   ├── 00000000000000000002.json  # Commit 2
│   └── 00000000000000000010.checkpoint.parquet  # Checkpoint
├── part-00000-xxxx.parquet  # Data file
├── part-00001-xxxx.parquet  # Data file
└── part-00002-xxxx.parquet  # Data file

ACID Transactions

Delta Lake provides full ACID compliance:

  • Atomicity: transactions fully complete or fully fail, with no partial writes. Example: a multi-file write either adds all of its files or none.
  • Consistency: data always meets defined constraints after each transaction. Example: schema enforcement rejects invalid writes.
  • Isolation: concurrent operations do not interfere with each other. Example: readers see a consistent snapshot even during writes.
  • Durability: committed changes are permanent and survive failures. Example: committed data persists in cloud object storage.

How ACID Works in Delta Lake

  1. Write operations create new Parquet data files
  2. A new JSON commit is atomically added to the transaction log
  3. The commit references which files to add and which to remove
  4. Readers always see the latest committed version (snapshot isolation)
  5. Failed writes leave no trace — uncommitted files are eventually cleaned up
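
The steps above can be sketched in a few lines of Python. This is a local simulation, not Delta's actual implementation: real writers use an atomic put-if-absent on the commit file in object storage, while this sketch uses `os.rename` for the same all-or-nothing effect.

```python
import json
import os
import tempfile

def commit(log_dir: str, version: int, actions: list) -> None:
    """Atomically publish a commit: write actions to a temp file, then
    rename it into place. The rename either fully succeeds or fully
    fails, so readers never observe a half-written commit."""
    final = os.path.join(log_dir, f"{version:020d}.json")
    fd, tmp = tempfile.mkstemp(dir=log_dir)
    with os.fdopen(fd, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")
    os.rename(tmp, final)  # the atomic step: commit appears all at once

log_dir = tempfile.mkdtemp()
commit(log_dir, 0, [{"add": {"path": "part-00000.parquet"}}])
commit(log_dir, 1, [{"add": {"path": "part-00001.parquet"}},
                    {"remove": {"path": "part-00000.parquet"}}])
print(sorted(os.listdir(log_dir)))
# → ['00000000000000000000.json', '00000000000000000001.json']
```

If the process dies before the rename, the final commit file never appears, so readers keep seeing the previous version, which is exactly the exam answer below.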

On the Exam: A common question pattern asks what happens when a write fails midway. The answer is always: the transaction is atomic, so readers see either the fully completed write or the previous version. There is no corrupted or partial state.

The Transaction Log (_delta_log)

The transaction log is a directory called _delta_log within every Delta table. It is an ordered record of every transaction performed on the table.

Transaction Log Entries

Each commit is a JSON file containing:

  • Add file actions: New Parquet files to include in the table
  • Remove file actions: Old Parquet files to exclude (logically deleted)
  • Metadata changes: Schema updates, table properties changes
  • Protocol changes: Reader/writer protocol version updates

Checkpoints

By default, every 10 commits Delta Lake creates a checkpoint file (in Parquet format) that summarizes the state of the table at that point; the interval is configurable via the delta.checkpointInterval table property. Checkpoints speed up reads because a reader can load the latest checkpoint and replay only the commits after it, instead of replaying every individual commit file since version 0.
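
The effect on read cost can be sketched as follows. This is a simplification that assumes a checkpoint lands exactly at every multiple of the interval, which matches the default behavior but ignores edge cases:

```python
def commits_to_replay(current_version: int, checkpoint_interval: int = 10):
    """With a checkpoint written every `checkpoint_interval` commits, a
    reader loads the most recent checkpoint and replays only the commits
    that came after it."""
    latest_checkpoint = (current_version // checkpoint_interval) * checkpoint_interval
    if latest_checkpoint == 0:  # no checkpoint yet: replay from version 0
        return list(range(0, current_version + 1))
    return list(range(latest_checkpoint + 1, current_version + 1))

print(commits_to_replay(23))  # → [21, 22, 23]: checkpoint 20 covers the rest
```

So at version 23 a reader touches one checkpoint plus three JSON files, rather than 24 JSON files.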

Time Travel

Time travel lets you query historical versions of a Delta table:

-- Query by version number
SELECT * FROM my_table VERSION AS OF 5;

-- Query by timestamp
SELECT * FROM my_table TIMESTAMP AS OF '2026-03-01';

-- Using the @ shortcut syntax
SELECT * FROM my_table@v5;                   -- version 5
SELECT * FROM my_table@20260301000000000;    -- timestamp in yyyyMMddHHmmssSSS format
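
TIMESTAMP AS OF resolves to the latest commit at or before the given timestamp. A sketch of that resolution, using hypothetical commit times like those shown by DESCRIBE HISTORY:

```python
from datetime import datetime

# Hypothetical version -> commit timestamp mapping for a small table.
history = {
    0: datetime(2026, 2, 27, 9, 0),
    1: datetime(2026, 2, 28, 14, 30),
    2: datetime(2026, 3, 2, 8, 15),
}

def version_as_of(history, ts):
    """Pick the highest version whose commit time is <= the requested
    timestamp; error if the timestamp predates the table."""
    candidates = [v for v, t in history.items() if t <= ts]
    if not candidates:
        raise ValueError("timestamp is before the table's first commit")
    return max(candidates)

print(version_as_of(history, datetime(2026, 3, 1)))  # → 1
```

This is why querying '2026-03-01' above returns version 1: version 2 was committed on March 2, after the requested timestamp.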

Use Cases for Time Travel

  • Audit and compliance: See exactly what the data looked like at any point
  • Debugging: Compare current data to a previous version to find when a bug was introduced
  • Reproducibility: Rerun ML training on the exact same dataset version
  • Rollback: Restore a table to a previous version if a bad write occurs

Restoring a Previous Version

-- Restore table to version 5
RESTORE TABLE my_table TO VERSION AS OF 5;

-- Restore table to a timestamp
RESTORE TABLE my_table TO TIMESTAMP AS OF '2026-03-01';
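
RESTORE does not rewrite or erase history: it appends a new commit that re-adds the files of the target version and removes the newer ones. A simplified sketch of that behavior, using a hypothetical tuple-based log:

```python
# Simplified log: each commit is a list of (action, path) tuples.
log = [
    [("add", "a.parquet")],                           # version 0
    [("add", "b.parquet"), ("remove", "a.parquet")],  # version 1: bad write
]

def snapshot(log, version):
    """Replay commits 0..version to get the file set at that version."""
    files = set()
    for commit in log[: version + 1]:
        for action, path in commit:
            if action == "add":
                files.add(path)
            else:
                files.discard(path)
    return files

def restore(log, version):
    """RESTORE appends a new commit that brings back the old file set;
    earlier versions stay in the log and remain queryable."""
    target = snapshot(log, version)
    current = snapshot(log, len(log) - 1)
    log.append([("add", p) for p in sorted(target - current)] +
               [("remove", p) for p in sorted(current - target)])

restore(log, 0)
print(sorted(snapshot(log, len(log) - 1)))  # → ['a.parquet']
print(len(log))  # → 3: the restore added a commit; history is intact
```

Because the restore is itself a commit, it shows up in DESCRIBE HISTORY like any other operation, and you could even time travel back to the "bad" version 1 if needed.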

DESCRIBE HISTORY

The DESCRIBE HISTORY command shows the audit log of all operations:

DESCRIBE HISTORY my_table;

This returns a table with columns including:

  • version: The version number
  • timestamp: When the operation occurred
  • operation: The type of operation (WRITE, MERGE, DELETE, etc.)
  • operationParameters: Details of the operation
  • userIdentity: Who performed the operation
  • operationMetrics: Rows affected, files added/removed

Schema Enforcement and Evolution

Schema Enforcement (Schema on Write)

When writing to a Delta table, Delta Lake checks that the incoming data matches the table schema:

  • Column names must match
  • Data types must be compatible
  • Extra columns in the write data are rejected by default
  • Missing columns receive null values

-- This will fail if new_data has columns not in the table schema
INSERT INTO my_table SELECT * FROM new_data;
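
The enforcement rules above can be sketched as a validation step on write. This is illustrative only: real enforcement also checks data type compatibility and nullability, not just column names.

```python
def validate_write(table_schema: dict, incoming_row: dict) -> dict:
    """Schema-on-write sketch: reject extra columns outright, fill
    missing columns with None (null)."""
    extra = set(incoming_row) - set(table_schema)
    if extra:
        raise ValueError(f"columns not in table schema: {sorted(extra)}")
    return {col: incoming_row.get(col) for col in table_schema}

schema = {"id": "bigint", "name": "string", "email": "string"}
print(validate_write(schema, {"id": 1, "name": "Ada"}))
# → {'id': 1, 'name': 'Ada', 'email': None}
# A row with an extra column (e.g. {"id": 2, "age": 30}) raises ValueError.
```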

Schema Evolution

When source data legitimately has new columns, you can enable schema evolution:

-- Enable automatic schema evolution for the session
SET spark.databricks.delta.schema.autoMerge.enabled = true;

-- New columns in new_data will be added to the table schema
INSERT INTO my_table
SELECT * FROM new_data;

In the DataFrame API, the equivalent for a single write is the mergeSchema option:

df.write.option("mergeSchema", "true").mode("append").saveAsTable("my_table")
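
Conceptually, evolution replaces "reject extra columns" with "append them to the table schema". A minimal sketch of that merge, with dict-based schemas standing in for real Spark schemas:

```python
def evolve_schema(table_schema: dict, incoming_schema: dict) -> dict:
    """With schema evolution enabled, columns present only in the
    incoming data are appended to the table schema instead of causing
    the write to be rejected. Existing columns are left untouched."""
    merged = dict(table_schema)
    for col, dtype in incoming_schema.items():
        merged.setdefault(col, dtype)
    return merged

table = {"id": "bigint", "name": "string"}
incoming = {"id": "bigint", "name": "string", "signup_date": "date"}
print(evolve_schema(table, incoming))
# → {'id': 'bigint', 'name': 'string', 'signup_date': 'date'}
```

Note that evolution only adds columns; it does not change the type of an existing column, which would still be rejected by enforcement.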

On the Exam: Know the difference between schema enforcement (rejects mismatched writes) and schema evolution (automatically adds new columns when enabled). Schema enforcement is the default behavior; schema evolution must be explicitly enabled.

[Diagram: Delta Lake Transaction Log Flow]
Test Your Knowledge

  1. What happens when a write operation to a Delta table fails midway through execution?
  2. How does Delta Lake speed up reads after many transactions have occurred?
  3. Which SQL syntax correctly queries version 10 of a Delta table named "orders"?
  4. A data engineer writes data to a Delta table, but the incoming DataFrame has two extra columns not in the table schema. What is the default behavior?
  5. Which command shows the complete history of all operations performed on a Delta table, including who performed each operation?
  6. A data engineer accidentally updates the wrong rows in a Delta table. How can they undo this change?
A
B
C
D