4.1 Data Loading, Unloading, and Connectivity Overview
Key Takeaways
- This domain is 18% of the SnowPro Core COF-C03 exam and centers on COPY INTO, stages, Snowpipe, and unloading.
- Bulk loading uses COPY INTO <table> against a stage; continuous loading uses Snowpipe pipes with serverless compute.
- Snowflake supports six file formats for loading: CSV, JSON, Avro, ORC, Parquet, and XML.
- Stages come in four flavors: user (@~), table (@%table), named internal, and external (S3/Azure/GCS).
- COPY INTO maintains 64-day load metadata per table to prevent reloading the same files.
What This Domain Covers
Data Loading, Unloading, and Connectivity is worth 18% of the SnowPro Core (COF-C03) exam, making it one of the two heaviest domains alongside Account & Security. Every question here ties back to one of four pillars: bulk loading with COPY INTO <table>, continuous loading with Snowpipe, bulk unloading with COPY INTO <location>, and the connectors/drivers that applications use to reach Snowflake. You should be able to pick the right tool for an ingestion pattern, name the supported file formats, and explain how stages broker files between cloud storage and Snowflake tables.
Snowflake separates storage (where files sit), stages (the pointer/landing zone), and compute (the virtual warehouse that runs the load). A bulk COPY INTO consumes a user-specified warehouse; Snowpipe and external-table refresh use serverless compute that Snowflake manages and meters separately. Knowing which path uses your warehouse versus Snowflake-managed compute is a recurring exam distinction.
Supported file formats
Snowflake loads six formats. Memorize this list and which are structured versus semi-structured.
| Format | Category | Notes |
|---|---|---|
| CSV (delimited) | Structured | Default TYPE = CSV; field/record delimiters, skip header |
| JSON | Semi-structured | Loads into a VARIANT column |
| Avro | Semi-structured | Binary, schema-embedded → VARIANT |
| ORC | Semi-structured | Columnar binary → VARIANT |
| Parquet | Semi-structured | Columnar binary → VARIANT; common for data lakes |
| XML | Semi-structured | Preview-era support; loads to VARIANT |
Semi-structured formats land in a single VARIANT column unless you transform during load. Binary columnar formats (Parquet, ORC, Avro) are billed by Snowpipe on their observed size regardless of compression, while text formats (CSV, JSON, XML) are billed on uncompressed size.
Stages: The Four Types
A stage is a named location that references files for loading or unloading. There are two internal categories (Snowflake-managed storage) and one external category (your own cloud bucket).
- User stage
@~— one per user, automatically provisioned, cannot be altered or dropped, not shareable. Good for files a single user loads into many tables. - Table stage
@%mytable— one per table, automatically provisioned, references files destined for that one table. Cannot be altered/dropped independently; no transformation on load to a different table. - Named internal stage
@my_stage— explicitly created withCREATE STAGE, the most flexible internal option; supports file formats, directory tables, and grants to roles. - External stage
@my_ext_stage— points at Amazon S3, Azure Blob/ADLS Gen2, or Google Cloud Storage using a credential or, preferably, a storage integration.
You upload local files to an internal stage with the PUT command (via SnowSQL or a driver — PUT is not available in the Snowsight worksheet UI) and download from a stage with GET. LIST @stage shows staged files and metadata (name, size, MD5, last-modified); REMOVE deletes them.
Load metadata and idempotency
Snowflake records load metadata for each target table for 64 days. COPY INTO uses it to skip files already loaded successfully, making reloads idempotent. After 64 days, that history expires and a file could be reloaded. Use FORCE = TRUE to deliberately reload, or LOAD_UNCERTAIN_FILES for files whose status is ambiguous. This metadata is per-table, which is why the same file can be loaded into two different tables independently.
File Formats Objects and Directory Tables
Rather than repeating parsing options inside every COPY INTO, you can create a reusable file format object: CREATE FILE FORMAT my_csv TYPE = CSV FIELD_DELIMITER = '|' SKIP_HEADER = 1 NULL_IF = ('NULL',''). The format can then be referenced by name in COPY INTO, in stage definitions, or in external tables. CSV options you should recognize include FIELD_DELIMITER, RECORD_DELIMITER, SKIP_HEADER, FIELD_OPTIONALLY_ENCLOSED_BY, ESCAPE, DATE_FORMAT, ENCODING, and ERROR_ON_COLUMN_COUNT_MISMATCH.
For semi-structured formats, STRIP_OUTER_ARRAY (JSON) collapses a top-level array so each element becomes a row, and BINARY_FORMAT controls binary encoding.
A directory table is an implicit, queryable layer of file metadata over a stage (DIRECTORY = (ENABLE = TRUE)). It returns RELATIVE_PATH, SIZE, LAST_MODIFIED, MD5, and a scoped file URL, and underpins features like unstructured-file access and Streamlit data apps. Refresh it manually with ALTER STAGE ... REFRESH or automatically via cloud notifications.
Compression on load
Snowflake auto-detects common compression on staged files. By default COMPRESSION = AUTO, so a .gz, .bz2, .zstd, .deflate, or .snappy file is decompressed transparently during load. You only set COMPRESSION explicitly when auto-detection would guess wrong or when the file is uncompressed but named misleadingly. Because Snowflake stores all table data in its own compressed columnar micro-partitions, the staged file's compression affects only transfer and parse cost, not how data is ultimately stored.
Why this domain rewards a workflow mindset
Rather than memorizing options in isolation, picture the end-to-end path for a load: a file is produced upstream, lands in cloud storage or is pushed with PUT, a stage points at it, a file format describes how to parse it, a warehouse (or serverless Snowpipe) runs the COPY, load metadata records the result, and copy options decide what happens to errors and to the staged file afterward. Most exam stems describe one point on that path and ask for the next correct action. If you can locate the stem on this pipeline, the distractors that belong to a different stage of the workflow become easy to eliminate.
Build the pipeline picture first, then map each term — stage, file format, COPY, Snowpipe, external table, connector — onto it.
A developer needs to upload local CSV files from their laptop into a Snowflake internal stage. Which command should they use, and where can it run?
How long does Snowflake retain load metadata that prevents COPY INTO from reloading previously loaded files?
Which stage type is automatically provisioned, unique to a single user, and cannot be altered or dropped?
Which of the following file formats can Snowflake load into a table?