4.2 Core Workflows and Decision Points

Key Takeaways

  • COPY INTO <table> options include ON_ERROR, VALIDATION_MODE, PURGE, FORCE, MATCH_BY_COLUMN_NAME, and PATTERN.
  • ON_ERROR values are ABORT_STATEMENT (default for bulk), CONTINUE, SKIP_FILE, and SKIP_FILE_<n>/<n>%.
  • VALIDATION_MODE (RETURN_n_ROWS / RETURN_ERRORS / RETURN_ALL_ERRORS) checks files without loading them.
  • Snowpipe defaults ON_ERROR to SKIP_FILE and uses serverless compute billed per GB ingested.
  • Snowflake recommends data files of roughly 100-250 MB compressed for efficient parallel loading.
Last updated: June 2026

The COPY INTO Loading Workflow

Bulk loading follows a fixed sequence: create/reference a stage, optionally define a file format, then run COPY INTO <table> FROM @stage with copy options. A typical statement looks like:

COPY INTO sales
  FROM @my_stage/2026/
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
  PATTERN = '.*sales.*[.]csv'
  ON_ERROR = SKIP_FILE
  PURGE = TRUE;

Key copy options and what they control:

OptionPurpose
ON_ERRORHow to handle row-level errors (see below)
VALIDATION_MODEDry-run; inspect errors without loading
PURGEDelete staged files after a successful load
FORCEReload files even if already loaded
PATTERNRegex filter on file paths in the stage
FILESExplicit list of specific files to load
MATCH_BY_COLUMN_NAMEMap columns by name for Parquet/JSON
SIZE_LIMITStop after roughly N bytes loaded
TRUNCATECOLUMNS / ENFORCE_LENGTHTruncate vs. error on over-length strings

ON_ERROR values

ON_ERROR is the single most tested copy option. For bulk COPY INTO, the default is ABORT_STATEMENT — any error aborts the whole load. CONTINUE loads good rows and skips bad ones. SKIP_FILE skips the entire file containing an error. SKIP_FILE_<n> skips a file once it hits n errors; SKIP_FILE_<n>% skips once the error percentage is reached. Importantly, Snowpipe defaults ON_ERROR to SKIP_FILE, not ABORT_STATEMENT — a classic exam trap.

Validation, File Sizing, and Compute

Validating before you load

Use VALIDATION_MODE to run COPY INTO as a dry run that returns errors without inserting any rows. The accepted values are RETURN_n_ROWS (return the first n rows if they parse), RETURN_ERRORS (return all errors across the files), and RETURN_ALL_ERRORS (include errors from previously partially-loaded files). After a real load, the VALIDATE table function returns errors for a prior COPY — but only for standard loads, not for COPY statements that perform transformations. The COPY_HISTORY view/function and Snowsight's Copy History page let you audit load outcomes.

File sizing best practice

Snowflake recommends producing data files of roughly 100-250 MB (or larger) compressed. This applies to both bulk loading and Snowpipe. The reasons:

  • Too many tiny files (under ~10 MB, e.g. one row per file) create per-file overhead that dominates cost and underutilizes the warehouse.
  • Too few huge files can't be split across compute threads, so parallelism suffers.
  • Each warehouse node loads files in parallel — an X-Small warehouse (1 node, 8 threads) loads up to 8 files concurrently, so splitting one giant file into ~8 right-sized files lets a small warehouse work fully in parallel.

Compute used

Bulk COPY INTO runs on the virtual warehouse you specify and is billed as normal warehouse credits. Snowpipe and external-table auto-refresh use Snowflake-managed serverless compute, billed separately. For Snowpipe, the current model bills a fixed credit amount per GB ingested (text files by uncompressed size; binary columnar files by observed size), having replaced the legacy per-1,000-files overhead charge.

Transformations During Load and PATTERN/FILES

A powerful feature is transforming data during the COPY: instead of COPY INTO t FROM @stage, you can supply a SELECT over the stage, for example COPY INTO t FROM (SELECT $1, $2::number, METADATA$FILENAME FROM @stage). Within that SELECT you can reorder columns, cast, omit columns, apply simple functions, and reference METADATA$FILENAME and METADATA$FILE_ROW_NUMBER pseudo-columns. Two limits to remember: the transform SELECT supports a restricted set of operations (no joins, no aggregates, no WHERE/ORDER BY/GROUP BY), and the VALIDATE function does not work on transforming COPY statements.

Targeting specific files

You control which staged files load three ways, and they interact:

MethodBehavior
Path/prefixFROM @stage/2026/jan/ loads everything under that folder
PATTERNA regex over the full path, e.g. PATTERN = '.*[.]parquet'
FILESAn explicit list of up to 1,000 file names

FILES is the most precise and bypasses listing the whole stage, which matters for very large stages. MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE lets Parquet and JSON load by column name rather than position, so source column order no longer has to match the table.

Auditing loads

After loading, the COPY_HISTORY table function and the INFORMATION_SCHEMA.COPY_HISTORY / Account Usage views report rows loaded, rows parsed, errors, and file status. Snowsight surfaces the same data on each table's Copy History tab. For row-level errors, VALIDATE(table, JOB_ID => '_last') returns the offending rows from a recent standard load so you can correct the source files.

Putting the options together

A production load often combines several options at once: a FILE_FORMAT for parsing, MATCH_BY_COLUMN_NAME so Parquet maps by name, ON_ERROR = CONTINUE to tolerate a few bad rows, PURGE = TRUE to clean the stage afterward, and a regex PATTERN to pick only the files for the current run. Understanding the order of evaluation helps: Snowflake first lists files matching the path and PATTERN, then checks load metadata to skip already-loaded files (unless FORCE), then parses each file with the file format, applies the optional transform SELECT, and finally honors ON_ERROR for any record that fails.

Each copy option plugs into one of those stages, so a stem that mentions 'skip files already loaded' points at metadata/FORCE, while 'load good rows, drop bad rows' points squarely at ON_ERROR = CONTINUE.

Test Your Knowledge

A data engineer wants to confirm that a staged file will parse correctly before committing any rows to the target table. Which COPY INTO option accomplishes this?

A
B
C
D
Test Your Knowledge

By default, how does a Snowpipe pipe handle a file that contains record-level errors during loading?

A
B
C
D
Test Your Knowledge

For efficient parallel loading, what compressed file size does Snowflake recommend producing?

A
B
C
D
Test Your Knowledge

A COPY INTO must load Parquet files whose column order differs from the target table's column order. Which option lets the load succeed by mapping on column names?

A
B
C
D