4.2 Core Workflows and Decision Points
Key Takeaways
- COPY INTO <table> options include ON_ERROR, VALIDATION_MODE, PURGE, FORCE, MATCH_BY_COLUMN_NAME, and PATTERN.
- ON_ERROR values are ABORT_STATEMENT (default for bulk), CONTINUE, SKIP_FILE, and SKIP_FILE_<n>/<n>%.
- VALIDATION_MODE (RETURN_n_ROWS / RETURN_ERRORS / RETURN_ALL_ERRORS) checks files without loading them.
- Snowpipe defaults ON_ERROR to SKIP_FILE and uses serverless compute billed per GB ingested.
- Snowflake recommends data files of roughly 100-250 MB compressed for efficient parallel loading.
The COPY INTO Loading Workflow
Bulk loading follows a fixed sequence: create/reference a stage, optionally define a file format, then run COPY INTO <table> FROM @stage with copy options. A typical statement looks like:
COPY INTO sales
FROM @my_stage/2026/
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
PATTERN = '.*sales.*[.]csv'
ON_ERROR = SKIP_FILE
PURGE = TRUE;
Key copy options and what they control:
| Option | Purpose |
|---|---|
| ON_ERROR | How to handle row-level errors (see below) |
| VALIDATION_MODE | Dry-run; inspect errors without loading |
| PURGE | Delete staged files after a successful load |
| FORCE | Reload files even if already loaded |
| PATTERN | Regex filter on file paths in the stage |
| FILES | Explicit list of specific files to load |
| MATCH_BY_COLUMN_NAME | Map columns by name for Parquet/JSON |
| SIZE_LIMIT | Stop after roughly N bytes loaded |
| TRUNCATECOLUMNS / ENFORCE_LENGTH | Truncate vs. error on over-length strings |
ON_ERROR values
ON_ERROR is the single most tested copy option. For bulk COPY INTO, the default is ABORT_STATEMENT — any error aborts the whole load. CONTINUE loads good rows and skips bad ones. SKIP_FILE skips the entire file containing an error. SKIP_FILE_<n> skips a file once it hits n errors; SKIP_FILE_<n>% skips once the error percentage is reached. Importantly, Snowpipe defaults ON_ERROR to SKIP_FILE, not ABORT_STATEMENT — a classic exam trap.
Validation, File Sizing, and Compute
Validating before you load
Use VALIDATION_MODE to run COPY INTO as a dry run that returns errors without inserting any rows. The accepted values are RETURN_n_ROWS (return the first n rows if they parse), RETURN_ERRORS (return all errors across the files), and RETURN_ALL_ERRORS (include errors from previously partially-loaded files). After a real load, the VALIDATE table function returns errors for a prior COPY — but only for standard loads, not for COPY statements that perform transformations. The COPY_HISTORY view/function and Snowsight's Copy History page let you audit load outcomes.
File sizing best practice
Snowflake recommends producing data files of roughly 100-250 MB (or larger) compressed. This applies to both bulk loading and Snowpipe. The reasons:
- Too many tiny files (under ~10 MB, e.g. one row per file) create per-file overhead that dominates cost and underutilizes the warehouse.
- Too few huge files can't be split across compute threads, so parallelism suffers.
- Each warehouse node loads files in parallel — an X-Small warehouse (1 node, 8 threads) loads up to 8 files concurrently, so splitting one giant file into ~8 right-sized files lets a small warehouse work fully in parallel.
Compute used
Bulk COPY INTO runs on the virtual warehouse you specify and is billed as normal warehouse credits. Snowpipe and external-table auto-refresh use Snowflake-managed serverless compute, billed separately. For Snowpipe, the current model bills a fixed credit amount per GB ingested (text files by uncompressed size; binary columnar files by observed size), having replaced the legacy per-1,000-files overhead charge.
Transformations During Load and PATTERN/FILES
A powerful feature is transforming data during the COPY: instead of COPY INTO t FROM @stage, you can supply a SELECT over the stage, for example COPY INTO t FROM (SELECT $1, $2::number, METADATA$FILENAME FROM @stage). Within that SELECT you can reorder columns, cast, omit columns, apply simple functions, and reference METADATA$FILENAME and METADATA$FILE_ROW_NUMBER pseudo-columns. Two limits to remember: the transform SELECT supports a restricted set of operations (no joins, no aggregates, no WHERE/ORDER BY/GROUP BY), and the VALIDATE function does not work on transforming COPY statements.
Targeting specific files
You control which staged files load three ways, and they interact:
| Method | Behavior |
|---|---|
| Path/prefix | FROM @stage/2026/jan/ loads everything under that folder |
| PATTERN | A regex over the full path, e.g. PATTERN = '.*[.]parquet' |
| FILES | An explicit list of up to 1,000 file names |
FILES is the most precise and bypasses listing the whole stage, which matters for very large stages. MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE lets Parquet and JSON load by column name rather than position, so source column order no longer has to match the table.
Auditing loads
After loading, the COPY_HISTORY table function and the INFORMATION_SCHEMA.COPY_HISTORY / Account Usage views report rows loaded, rows parsed, errors, and file status. Snowsight surfaces the same data on each table's Copy History tab. For row-level errors, VALIDATE(table, JOB_ID => '_last') returns the offending rows from a recent standard load so you can correct the source files.
Putting the options together
A production load often combines several options at once: a FILE_FORMAT for parsing, MATCH_BY_COLUMN_NAME so Parquet maps by name, ON_ERROR = CONTINUE to tolerate a few bad rows, PURGE = TRUE to clean the stage afterward, and a regex PATTERN to pick only the files for the current run. Understanding the order of evaluation helps: Snowflake first lists files matching the path and PATTERN, then checks load metadata to skip already-loaded files (unless FORCE), then parses each file with the file format, applies the optional transform SELECT, and finally honors ON_ERROR for any record that fails.
Each copy option plugs into one of those stages, so a stem that mentions 'skip files already loaded' points at metadata/FORCE, while 'load good rows, drop bad rows' points squarely at ON_ERROR = CONTINUE.
A data engineer wants to confirm that a staged file will parse correctly before committing any rows to the target table. Which COPY INTO option accomplishes this?
By default, how does a Snowpipe pipe handle a file that contains record-level errors during loading?
For efficient parallel loading, what compressed file size does Snowflake recommend producing?
A COPY INTO must load Parquet files whose column order differs from the target table's column order. Which option lets the load succeed by mapping on column names?