2.10 Table Management and Properties
Key Takeaways
- Managed tables have both their metadata and data files governed by Databricks; dropping one deletes the underlying data after a retention period.
- External (unmanaged) tables are created with a LOCATION clause; dropping one removes only the metadata, leaving the data files intact.
- CREATE TABLE ... AS SELECT (CTAS) creates a table and populates it from a query in one statement.
- TBLPROPERTIES set key-value metadata such as delta.appendOnly or delta.deletedFileRetentionDuration; comments and tags aid governance.
- Unity Catalog uses three-level naming: catalog.schema.table.
Managed vs External Tables
The most-tested table concept is the distinction between managed and external (also called unmanaged) tables, which differ in who controls the data lifecycle:
| Aspect | Managed | External |
|---|---|---|
LOCATION clause | Not specified | Required |
| Data storage | Databricks-managed storage | Your specified path |
DROP TABLE | Deletes metadata and data files | Deletes metadata only |
| Lifecycle | Fully governed by Databricks | You manage the files |
-- Managed: no LOCATION
CREATE TABLE main.silver.orders (id INT, amt DOUBLE);
-- External: LOCATION points to your storage
CREATE TABLE main.silver.raw (id INT)
LOCATION 'abfss://data@acct.dfs.core.windows.net/raw';
The critical exam trap: dropping a managed table deletes its data (after a retention period in Unity Catalog), whereas dropping an external table leaves the underlying files untouched — only the catalog entry is removed. Choose external tables when other tools must own or share the files; choose managed tables for the simplest governed lifecycle.
CTAS and Three-Level Namespace
CREATE TABLE AS SELECT (CTAS) creates a table and fills it from a query in a single atomic statement, inferring the schema from the query:
CREATE TABLE main.gold.daily_sales AS
SELECT date, sum(amount) AS total
FROM main.silver.orders GROUP BY date;
Variants include CREATE OR REPLACE TABLE ... AS SELECT (replace the table) and CREATE TABLE IF NOT EXISTS. In Unity Catalog, every table is addressed with a three-level namespace: catalog.schema.table (schema is sometimes called database). Always qualifying names — main.silver.orders rather than bare orders — avoids ambiguity across catalogs and is best practice in governed environments. The legacy two-level hive_metastore namespace still exists but Unity Catalog's three-level model is the current standard.
Table Properties
TBLPROPERTIES attach arbitrary key-value metadata to a table, including Delta behaviors:
ALTER TABLE main.silver.orders SET TBLPROPERTIES (
'delta.appendOnly' = 'true',
'quality' = 'silver');
| Property | Effect |
|---|---|
delta.appendOnly | Forbid updates/deletes; only appends allowed |
delta.deletedFileRetentionDuration | How long removed files are kept for time travel/VACUUM |
delta.logRetentionDuration | How long the transaction-log history is retained |
custom keys (e.g. team, pii) | Governance tags / documentation |
Properties prefixed delta. configure Delta engine behavior; other keys are free-form documentation. You can also attach a COMMENT to a table or column for the data catalog, and use ALTER TABLE ... ADD COLUMN / SET TBLPROPERTIES to evolve a table without recreating it. Inspect the current properties and storage details with DESCRIBE TABLE EXTENDED or DESCRIBE DETAIL, which reveal whether a table is managed or external and where its files live.
Views, Temp Views, and Inspecting Tables
Beyond tables, Databricks offers several view types that store a query, not data:
| Object | Lifetime / scope |
|---|---|
| (Stored) VIEW | Persisted in the catalog; available to all sessions |
| TEMP VIEW | Session-scoped; gone when the notebook detaches |
| GLOBAL TEMP VIEW | Cluster-scoped via the global_temp schema |
CREATE VIEW main.gold.active AS SELECT * FROM main.silver.users WHERE active;
CREATE OR REPLACE TEMP VIEW recent AS SELECT * FROM events WHERE ts > current_date() - 7;
A view always re-runs its underlying query, so it reflects current data but adds no storage; use a table (or a materialized view) when you need precomputed results.
To introspect tables, the DESCRIBE family is essential: DESCRIBE TABLE shows columns and types, DESCRIBE TABLE EXTENDED adds the table Type (MANAGED vs EXTERNAL), location, provider, and properties, and DESCRIBE DETAIL returns Delta-specific metadata such as the number of files and size. DESCRIBE HISTORY lists every commit (the audit log enabling time travel), and SHOW TBLPROPERTIES lists the key-value settings. Recognizing that EXTENDED/DETAIL output is how you confirm whether a table is managed or external — and therefore what DROP will do to the data — is a frequently tested operational skill on the certification.
Altering Tables and Recovering Dropped Data
Tables evolve. ALTER TABLE covers most non-destructive changes without recreating the object:
| Operation | Statement |
|---|---|
| Add a column | ALTER TABLE t ADD COLUMN c STRING |
| Rename a column | ALTER TABLE t RENAME COLUMN a TO b |
| Set/unset properties | ALTER TABLE t SET TBLPROPERTIES (...) |
| Add a comment | ALTER TABLE t SET TBLPROPERTIES ('comment'='...') |
| Rename the table | ALTER TABLE t RENAME TO t2 |
Column renames and drops require column-mapping mode (a delta.minReaderVersion/minWriterVersion upgrade), which Databricks sets via TBLPROPERTIES when needed.
A Unity Catalog safety feature the exam highlights: because dropping a managed table deletes its data only after a retention window, you can recover from an accidental drop with UNDROP TABLE t within that window — a capability external tables do not need, since their files were never deleted. Likewise, RESTORE TABLE t TO VERSION AS OF n rolls a Delta table back to an earlier version using the transaction log, undoing a bad write without restoring from backup.
Together, managed-table UNDROP and Delta time-travel RESTORE give engineers strong recovery options, but they reinforce why the managed-vs-external distinction — and knowing exactly what DROP removes — is essential operational knowledge for the certification.
You run DROP TABLE on an EXTERNAL table. What happens to the underlying data files?
Which clause makes a CREATE TABLE statement define an EXTERNAL (unmanaged) table?
Which TBLPROPERTIES setting prevents any updates or deletes, allowing only appends to a Delta table?