When unloading data with COPY INTO and the default SINGLE setting, what happens to the output?

The data is split across multiple files, one per processing thread. SINGLE defaults to FALSE, so COPY INTO splits output across multiple files (named like data_0_0_0.csv.gz), one per thread, enabling parallel writes. Setting SINGLE = TRUE forces a single file, subject to the MAX_FILE_SIZE cap.

4.4 Bulk Unloading and Semi-Structured Data

Key Takeaways

Unloading uses COPY INTO <location> FROM <table-or-query> into a stage, then GET to download.
SINGLE = FALSE (default) splits output into multiple files; MAX_FILE_SIZE caps each file (default 16 MB, max 5 GB).
Unloaded files are GZIP-compressed by default; supported output formats are CSV, JSON, and Parquet.
Semi-structured data loads into a VARIANT column; FLATTEN explodes arrays/objects into rows.
Dot/bracket notation (col:path) and casts (::type) extract values from VARIANT.

Last updated: June 2026

Bulk Unloading with COPY INTO <location>

Unloading reverses the load: COPY INTO @stage FROM (SELECT ...) writes table or query results to files in an internal or external stage, after which you GET them to a local machine (for internal stages) or read them directly from the cloud bucket (external). The output format is set with FILE_FORMAT; supported unload formats are CSV (delimited), JSON, and Parquet — note Avro, ORC, and XML can be loaded but not unloaded.

Key unload options:

Option	Default	Effect
SINGLE	FALSE	FALSE splits into multiple files (one per thread); TRUE writes one file
MAX_FILE_SIZE	16 MB	Max bytes per output file; up to 5 GB
COMPRESSION	GZIP	Output compression (GZIP/BROTLI/ZSTD/NONE etc.)
OVERWRITE	FALSE	Replace existing files at the path
HEADER	FALSE	Write column headers (CSV)
INCLUDE_QUERY_ID	FALSE	Add query ID to filenames to avoid collisions

With SINGLE = FALSE, Snowflake names files like data_0_0_0.csv.gz, splitting work across warehouse threads for parallel writes. To produce one output file, set SINGLE = TRUE, but be aware a single file is capped (you cannot exceed MAX_FILE_SIZE's 5 GB hard limit). Because unloaded files default to GZIP, downstream consumers must decompress unless you set COMPRESSION = NONE. Like loading, 100-250 MB compressed files are the efficiency sweet spot for re-loading the data elsewhere.

Semi-Structured Data: VARIANT and FLATTEN

Snowflake stores JSON, Avro, ORC, Parquet, and XML in a VARIANT column (max ~16 MB compressed per value). You can load an entire document into one VARIANT and query it later without a fixed schema — a major Snowflake selling point.

Accessing VARIANT data

Path notation: src:owner.name (dot) or src['owner']['name'] (bracket) navigates nested objects.
Array indexing: src:items[0] reads the first element.
Casting: append ::type to coerce, e.g. src:price::number or src:dt::date. Without a cast, values return as VARIANT.

FLATTEN

The FLATTEN table function explodes an array or object into one row per element, used with a LATERAL join:

SELECT f.value:sku::string AS sku
FROM orders,
     LATERAL FLATTEN(input => orders.line_items) f;

FLATTEN returns columns including SEQ, KEY, PATH, INDEX, VALUE, and THIS. Set OUTER => TRUE to keep rows whose array is empty/null (like an outer join), and RECURSIVE => TRUE to expand all nested levels. This pairing — load to VARIANT, then FLATTEN into relational rows — is the canonical semi-structured workflow and is frequently tested. You can also let COPY INTO transform during load (select specific columns, apply MATCH_BY_COLUMN_NAME, reorder, or cast) so semi-structured files land in a relational shape directly.

Unload Targets, Partitioning, and Functions

Where unloaded files go

COPY INTO <location> can target any of the four stage types. Unloading to an internal stage (named, table @%t, or user @~) keeps files inside Snowflake until you GET them; unloading to an external stage writes straight into your S3/Azure/GCS bucket, where downstream systems can pick them up without a Snowflake client. To organize output, the PARTITION BY clause writes files into subfolders by an expression (for example PARTITION BY (TO_VARCHAR(dt,'YYYY/MM'))), which is handy for date-partitioned exports consumed by a data lake.

Helpful semi-structured functions

Beyond FLATTEN, several functions appear in loading/unloading questions:

Function	Use
PARSE_JSON	Convert a string to a VARIANT
TO_JSON / TO_VARIANT	Serialize back to a JSON string / cast to VARIANT
OBJECT_CONSTRUCT	Build a VARIANT object from key/value pairs (great for unloading to JSON)
ARRAY_AGG	Aggregate rows into a VARIANT array
GET / GET_PATH	Programmatic path access into a VARIANT
TYPEOF	Inspect the data type stored in a VARIANT

A common unload pattern

To export relational rows as JSON, combine OBJECT_CONSTRUCT(*) with a JSON file format: COPY INTO @stage FROM (SELECT OBJECT_CONSTRUCT(*) FROM orders) FILE_FORMAT = (TYPE = JSON) SINGLE = FALSE MAX_FILE_SIZE = 268435456. The result is GZIP-compressed JSON split into multiple ~256 MB files. Remember that unloaded JSON/CSV defaults to GZIP, so set COMPRESSION = NONE if the consumer cannot decompress, and use OVERWRITE = TRUE cautiously since it replaces existing files at the path.

Traps the exam likes to set

Several unloading and semi-structured details are recurring traps. First, you cannot unload to Avro, ORC, or XML even though you can load them — only CSV, JSON, and Parquet are unload targets. Second, SINGLE = FALSE is the default, so assuming one output file produces a wrong answer; the parallel split is intentional. Third, accessing a VARIANT value without a cast returns VARIANT, not the scalar type, which breaks downstream joins or comparisons — always append ::type.

Fourth, FLATTEN is a table function: it must appear in the FROM clause (typically with LATERAL) to expand an array into rows, so calling it like a scalar function in the SELECT list is a syntax error.

Fifth, a single VARIANT value is limited to about 16 MB compressed, so an enormous JSON document may need to be split before loading. Keeping these five facts straight handles most of the unloading and semi-structured questions on the exam.

Test Your Knowledge

When unloading data with COPY INTO <location> and the default SINGLE setting, what happens to the output?

Snowflake writes exactly one file regardless of data volume

The data is split across multiple files, one per processing thread

Snowflake refuses to unload more than 16 MB total

Output is written only in Parquet format

Test Your Knowledge

A JSON column 'doc' contains an array under the key 'tags'. Which approach produces one row per tag value?

SELECT doc:tags FROM t

SELECT doc:tags[*] FROM t

SELECT f.value FROM t, LATERAL FLATTEN(input => t.doc:tags) f

SELECT UNNEST(doc.tags) FROM t

Test Your Knowledge

Which output file formats are supported when unloading data from Snowflake?

CSV, JSON, and Parquet

Only CSV

CSV, JSON, Avro, ORC, Parquet, and XML

Only Parquet and Avro

Test Your Knowledge

When unloaded data must be read by a downstream tool that cannot decompress files, what should you set in the COPY INTO <location> statement?

SINGLE = TRUE

OVERWRITE = TRUE

MAX_FILE_SIZE = 5000000000

COMPRESSION = NONE

Up Next

4.5 Connectors, Drivers, and Snowsight Loading

Continue learning

SnowPro Core Study Guide

SnowPro Core Certification

4.4 Bulk Unloading and Semi-Structured Data

Key Takeaways

Bulk Unloading with COPY INTO <location>

Semi-Structured Data: VARIANT and FLATTEN

Accessing VARIANT data

FLATTEN

Unload Targets, Partitioning, and Functions

Where unloaded files go

Helpful semi-structured functions

A common unload pattern

Traps the exam likes to set

SnowPro Core Study Guide

1Chapter 1: SnowPro Core Orientation and Exam Strategy

2Chapter 2: Snowflake AI Data Cloud Features and Architecture

3Chapter 3: Account Management and Data Governance

4Chapter 4: Data Loading, Unloading, and Connectivity

5Chapter 5: Performance Optimization, Querying, and Transformation

6Chapter 6: Data Collaboration

7Chapter 7: Final Review and Test Day

SnowPro Core Certification

4.4 Bulk Unloading and Semi-Structured Data

Key Takeaways

Bulk Unloading with COPY INTO <location>

Semi-Structured Data: VARIANT and FLATTEN

Accessing VARIANT data

FLATTEN

Unload Targets, Partitioning, and Functions

Where unloaded files go

Helpful semi-structured functions

A common unload pattern

Traps the exam likes to set