4.2 Core Workflows and Decision Points

Key Takeaways

COPY INTO <table> options include ON_ERROR, VALIDATION_MODE, PURGE, FORCE, MATCH_BY_COLUMN_NAME, and PATTERN.
ON_ERROR values are ABORT_STATEMENT (default for bulk), CONTINUE, SKIP_FILE, and SKIP_FILE_<n>/<n>%.
VALIDATION_MODE (RETURN_n_ROWS / RETURN_ERRORS / RETURN_ALL_ERRORS) checks files without loading them.
Snowpipe defaults ON_ERROR to SKIP_FILE and uses serverless compute billed per GB ingested.
Snowflake recommends data files of roughly 100-250 MB compressed for efficient parallel loading.

Last updated: June 2026

The COPY INTO Loading Workflow

Bulk loading follows a fixed sequence: create/reference a stage, optionally define a file format, then run COPY INTO <table> FROM @stage with copy options. A typical statement looks like:

COPY INTO sales
  FROM @my_stage/2026/
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
  PATTERN = '.*sales.*[.]csv'
  ON_ERROR = SKIP_FILE
  PURGE = TRUE;

Key copy options and what they control:

Option	Purpose
ON_ERROR	How to handle row-level errors (see below)
VALIDATION_MODE	Dry-run; inspect errors without loading
PURGE	Delete staged files after a successful load
FORCE	Reload files even if already loaded
PATTERN	Regex filter on file paths in the stage
FILES	Explicit list of specific files to load
MATCH_BY_COLUMN_NAME	Map columns by name for Parquet/JSON
SIZE_LIMIT	Stop after roughly N bytes loaded
TRUNCATECOLUMNS / ENFORCE_LENGTH	Truncate vs. error on over-length strings

ON_ERROR values

ON_ERROR is the single most tested copy option. For bulk COPY INTO, the default is ABORT_STATEMENT — any error aborts the whole load. CONTINUE loads good rows and skips bad ones. SKIP_FILE skips the entire file containing an error. SKIP_FILE_<n> skips a file once it hits n errors; SKIP_FILE_<n>% skips once the error percentage is reached. Importantly, Snowpipe defaults ON_ERROR to SKIP_FILE, not ABORT_STATEMENT — a classic exam trap.

Validation, File Sizing, and Compute

Validating before you load

Use VALIDATION_MODE to run COPY INTO as a dry run that returns errors without inserting any rows. The accepted values are RETURN_n_ROWS (return the first n rows if they parse), RETURN_ERRORS (return all errors across the files), and RETURN_ALL_ERRORS (include errors from previously partially-loaded files). After a real load, the VALIDATE table function returns errors for a prior COPY — but only for standard loads, not for COPY statements that perform transformations. The COPY_HISTORY view/function and Snowsight's Copy History page let you audit load outcomes.

File sizing best practice

Snowflake recommends producing data files of roughly 100-250 MB (or larger) compressed. This applies to both bulk loading and Snowpipe. The reasons:

Too many tiny files (under ~10 MB, e.g. one row per file) create per-file overhead that dominates cost and underutilizes the warehouse.
Too few huge files can't be split across compute threads, so parallelism suffers.
Each warehouse node loads files in parallel — an X-Small warehouse (1 node, 8 threads) loads up to 8 files concurrently, so splitting one giant file into ~8 right-sized files lets a small warehouse work fully in parallel.

Compute used

Bulk COPY INTO runs on the virtual warehouse you specify and is billed as normal warehouse credits. Snowpipe and external-table auto-refresh use Snowflake-managed serverless compute, billed separately. For Snowpipe, the current model bills a fixed credit amount per GB ingested (text files by uncompressed size; binary columnar files by observed size), having replaced the legacy per-1,000-files overhead charge.

Transformations During Load and PATTERN/FILES

A powerful feature is transforming data during the COPY: instead of COPY INTO t FROM @stage, you can supply a SELECT over the stage, for example COPY INTO t FROM (SELECT $1, $2::number, METADATA$FILENAME FROM @stage). Within that SELECT you can reorder columns, cast, omit columns, apply simple functions, and reference METADATA$FILENAME and METADATA$FILE_ROW_NUMBER pseudo-columns. Two limits to remember: the transform SELECT supports a restricted set of operations (no joins, no aggregates, no WHERE/ORDER BY/GROUP BY), and the VALIDATE function does not work on transforming COPY statements.

Targeting specific files

You control which staged files load three ways, and they interact:

Method	Behavior
Path/prefix	`FROM @stage/2026/jan/` loads everything under that folder
PATTERN	A regex over the full path, e.g. `PATTERN = '.*[.]parquet'`
FILES	An explicit list of up to 1,000 file names

FILES is the most precise and bypasses listing the whole stage, which matters for very large stages. MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE lets Parquet and JSON load by column name rather than position, so source column order no longer has to match the table.

Auditing loads

After loading, the COPY_HISTORY table function and the INFORMATION_SCHEMA.COPY_HISTORY / Account Usage views report rows loaded, rows parsed, errors, and file status. Snowsight surfaces the same data on each table's Copy History tab. For row-level errors, VALIDATE(table, JOB_ID => '_last') returns the offending rows from a recent standard load so you can correct the source files.

Putting the options together

A production load often combines several options at once: a FILE_FORMAT for parsing, MATCH_BY_COLUMN_NAME so Parquet maps by name, ON_ERROR = CONTINUE to tolerate a few bad rows, PURGE = TRUE to clean the stage afterward, and a regex PATTERN to pick only the files for the current run. Understanding the order of evaluation helps: Snowflake first lists files matching the path and PATTERN, then checks load metadata to skip already-loaded files (unless FORCE), then parses each file with the file format, applies the optional transform SELECT, and finally honors ON_ERROR for any record that fails.

Each copy option plugs into one of those stages, so a stem that mentions 'skip files already loaded' points at metadata/FORCE, while 'load good rows, drop bad rows' points squarely at ON_ERROR = CONTINUE.

Test Your Knowledge

A data engineer wants to confirm that a staged file will parse correctly before committing any rows to the target table. Which COPY INTO option accomplishes this?

PURGE = TRUE

FORCE = TRUE

VALIDATION_MODE = RETURN_ERRORS

ON_ERROR = CONTINUE

Test Your Knowledge

By default, how does a Snowpipe pipe handle a file that contains record-level errors during loading?

It aborts the entire pipe and stops ingestion (ABORT_STATEMENT)

It skips the entire file (SKIP_FILE)

It loads good rows and skips bad rows (CONTINUE)

It retries the file indefinitely

Test Your Knowledge

For efficient parallel loading, what compressed file size does Snowflake recommend producing?

Roughly 100-250 MB compressed

Exactly 1 MB per file

As large as possible, ideally over 5 GB per file

Under 1 MB so each file loads instantly

Test Your Knowledge

A COPY INTO must load Parquet files whose column order differs from the target table's column order. Which option lets the load succeed by mapping on column names?

ON_ERROR = SKIP_FILE

PURGE = TRUE

VALIDATION_MODE = RETURN_ERRORS

MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE

Up Next

4.3 Snowpipe, External Tables, and Storage Integrations

Continue learning

SnowPro Core Study Guide

SnowPro Core Certification

4.2 Core Workflows and Decision Points

Key Takeaways

The COPY INTO Loading Workflow

ON_ERROR values

Validation, File Sizing, and Compute

Validating before you load

File sizing best practice

Compute used

Transformations During Load and PATTERN/FILES

Targeting specific files

Auditing loads

Putting the options together

SnowPro Core Study Guide

1Chapter 1: SnowPro Core Orientation and Exam Strategy

2Chapter 2: Snowflake AI Data Cloud Features and Architecture

3Chapter 3: Account Management and Data Governance

4Chapter 4: Data Loading, Unloading, and Connectivity

5Chapter 5: Performance Optimization, Querying, and Transformation

6Chapter 6: Data Collaboration

7Chapter 7: Final Review and Test Day

SnowPro Core Certification

4.2 Core Workflows and Decision Points

Key Takeaways

The COPY INTO Loading Workflow

ON_ERROR values

Validation, File Sizing, and Compute

Validating before you load

File sizing best practice

Compute used

Transformations During Load and PATTERN/FILES

Targeting specific files

Auditing loads

Putting the options together