4.4 Bulk Unloading and Semi-Structured Data

Key Takeaways

  • Unloading uses COPY INTO <location> FROM <table-or-query> into a stage, then GET to download.
  • SINGLE = FALSE (default) splits output into multiple files; MAX_FILE_SIZE caps each file (default 16 MB, max 5 GB).
  • Unloaded files are GZIP-compressed by default; supported output formats are CSV, JSON, and Parquet.
  • Semi-structured data loads into a VARIANT column; FLATTEN explodes arrays/objects into rows.
  • Dot/bracket notation (col:path) and casts (::type) extract values from VARIANT.
Last updated: June 2026

Bulk Unloading with COPY INTO <location>

Unloading reverses the load: COPY INTO @stage FROM (SELECT ...) writes table or query results to files in an internal or external stage, after which you GET them to a local machine (for internal stages) or read them directly from the cloud bucket (external). The output format is set with FILE_FORMAT; supported unload formats are CSV (delimited), JSON, and Parquet — note Avro, ORC, and XML can be loaded but not unloaded.

Key unload options:

OptionDefaultEffect
SINGLEFALSEFALSE splits into multiple files (one per thread); TRUE writes one file
MAX_FILE_SIZE16 MBMax bytes per output file; up to 5 GB
COMPRESSIONGZIPOutput compression (GZIP/BROTLI/ZSTD/NONE etc.)
OVERWRITEFALSEReplace existing files at the path
HEADERFALSEWrite column headers (CSV)
INCLUDE_QUERY_IDFALSEAdd query ID to filenames to avoid collisions

With SINGLE = FALSE, Snowflake names files like data_0_0_0.csv.gz, splitting work across warehouse threads for parallel writes. To produce one output file, set SINGLE = TRUE, but be aware a single file is capped (you cannot exceed MAX_FILE_SIZE's 5 GB hard limit). Because unloaded files default to GZIP, downstream consumers must decompress unless you set COMPRESSION = NONE. Like loading, 100-250 MB compressed files are the efficiency sweet spot for re-loading the data elsewhere.

Semi-Structured Data: VARIANT and FLATTEN

Snowflake stores JSON, Avro, ORC, Parquet, and XML in a VARIANT column (max ~16 MB compressed per value). You can load an entire document into one VARIANT and query it later without a fixed schema — a major Snowflake selling point.

Accessing VARIANT data

  • Path notation: src:owner.name (dot) or src['owner']['name'] (bracket) navigates nested objects.
  • Array indexing: src:items[0] reads the first element.
  • Casting: append ::type to coerce, e.g. src:price::number or src:dt::date. Without a cast, values return as VARIANT.

FLATTEN

The FLATTEN table function explodes an array or object into one row per element, used with a LATERAL join:

SELECT f.value:sku::string AS sku
FROM orders,
     LATERAL FLATTEN(input => orders.line_items) f;

FLATTEN returns columns including SEQ, KEY, PATH, INDEX, VALUE, and THIS. Set OUTER => TRUE to keep rows whose array is empty/null (like an outer join), and RECURSIVE => TRUE to expand all nested levels. This pairing — load to VARIANT, then FLATTEN into relational rows — is the canonical semi-structured workflow and is frequently tested. You can also let COPY INTO transform during load (select specific columns, apply MATCH_BY_COLUMN_NAME, reorder, or cast) so semi-structured files land in a relational shape directly.

Unload Targets, Partitioning, and Functions

Where unloaded files go

COPY INTO <location> can target any of the four stage types. Unloading to an internal stage (named, table @%t, or user @~) keeps files inside Snowflake until you GET them; unloading to an external stage writes straight into your S3/Azure/GCS bucket, where downstream systems can pick them up without a Snowflake client. To organize output, the PARTITION BY clause writes files into subfolders by an expression (for example PARTITION BY (TO_VARCHAR(dt,'YYYY/MM'))), which is handy for date-partitioned exports consumed by a data lake.

Helpful semi-structured functions

Beyond FLATTEN, several functions appear in loading/unloading questions:

FunctionUse
PARSE_JSONConvert a string to a VARIANT
TO_JSON / TO_VARIANTSerialize back to a JSON string / cast to VARIANT
OBJECT_CONSTRUCTBuild a VARIANT object from key/value pairs (great for unloading to JSON)
ARRAY_AGGAggregate rows into a VARIANT array
GET / GET_PATHProgrammatic path access into a VARIANT
TYPEOFInspect the data type stored in a VARIANT

A common unload pattern

To export relational rows as JSON, combine OBJECT_CONSTRUCT(*) with a JSON file format: COPY INTO @stage FROM (SELECT OBJECT_CONSTRUCT(*) FROM orders) FILE_FORMAT = (TYPE = JSON) SINGLE = FALSE MAX_FILE_SIZE = 268435456. The result is GZIP-compressed JSON split into multiple ~256 MB files. Remember that unloaded JSON/CSV defaults to GZIP, so set COMPRESSION = NONE if the consumer cannot decompress, and use OVERWRITE = TRUE cautiously since it replaces existing files at the path.

Traps the exam likes to set

Several unloading and semi-structured details are recurring traps. First, you cannot unload to Avro, ORC, or XML even though you can load them — only CSV, JSON, and Parquet are unload targets. Second, SINGLE = FALSE is the default, so assuming one output file produces a wrong answer; the parallel split is intentional. Third, accessing a VARIANT value without a cast returns VARIANT, not the scalar type, which breaks downstream joins or comparisons — always append ::type.

Fourth, FLATTEN is a table function: it must appear in the FROM clause (typically with LATERAL) to expand an array into rows, so calling it like a scalar function in the SELECT list is a syntax error.

Fifth, a single VARIANT value is limited to about 16 MB compressed, so an enormous JSON document may need to be split before loading. Keeping these five facts straight handles most of the unloading and semi-structured questions on the exam.

Test Your Knowledge

When unloading data with COPY INTO <location> and the default SINGLE setting, what happens to the output?

A
B
C
D
Test Your Knowledge

A JSON column 'doc' contains an array under the key 'tags'. Which approach produces one row per tag value?

A
B
C
D
Test Your Knowledge

Which output file formats are supported when unloading data from Snowflake?

A
B
C
D
Test Your Knowledge

When unloaded data must be read by a downstream tool that cannot decompress files, what should you set in the COPY INTO <location> statement?

A
B
C
D