4.3 Snowpipe, External Tables, and Storage Integrations
Key Takeaways
- Snowpipe auto-ingest triggers loads from cloud event notifications (S3 SQS/SNS, Azure Event Grid, GCP Pub/Sub).
- The Snowpipe REST API (insertFiles/insertReport/loadHistoryScan) drives loads programmatically with file-level dedup.
- External tables expose files in a stage as a read-only table over a VARIANT VALUE column plus a metadata pseudo-column.
- A storage integration stores an IAM identity so external stages need no inline keys.
- External tables and Snowpipe both refresh via serverless compute and can use directory tables for metadata.
Snowpipe Continuous Ingestion
Snowpipe loads files within minutes of their arrival in a stage, using serverless compute instead of a user warehouse. A pipe is a first-class object that wraps a COPY INTO statement: CREATE PIPE my_pipe AUTO_INGEST = TRUE AS COPY INTO t FROM @ext_stage. There are two ways to trigger a pipe:
- Auto-ingest — the cloud provider sends an event notification when a new file lands. On AWS S3 this is an SQS (or SNS) notification; on Azure it is Event Grid; on GCP it is Pub/Sub. The pipe's
notification_channel(fromSHOW PIPES) is the ARN you wire the bucket event to. - REST API — a client calls
insertFileswith a list of filenames;insertReportandloadHistoryScanreturn load status. This is how the Kafka connector and custom apps drive Snowpipe.
Snowpipe performs file-level deduplication: it tracks loaded file names for ~14 days and will not reload a file with the same path unless FORCE semantics apply. Because Snowpipe is event-driven and micro-batch, it suits frequent small batches; for sub-second row-level streaming, Snowpipe Streaming writes rows directly via the Snowflake Ingest SDK without staging files.
| Trait | Bulk COPY INTO | Snowpipe |
|---|---|---|
| Compute | Your warehouse | Serverless (Snowflake-managed) |
| Trigger | Manual / scheduled | Event notification or REST API |
| Default ON_ERROR | ABORT_STATEMENT | SKIP_FILE |
| Best for | Large scheduled batches | Continuous micro-batches |
| Billing | Warehouse credits | Per-GB serverless credits |
External Tables and Storage Integrations
External tables
An external table lets you query files in place in an external stage without loading them into Snowflake. The data exposes through a built-in VALUE column (a VARIANT of each row/record) plus a METADATA$FILENAME pseudo-column. You define virtual columns as expressions over VALUE (for example VALUE:c1::number) and can partition the table by parts of the file path to prune scans. External tables are read-only: no inserts, updates, indexes, or clustering keys, because the data lives outside Snowflake.
They are commonly paired with a materialized view to accelerate repeated queries and with directory tables for file metadata. Auto-refresh of external-table partitions uses the same cloud-notification mechanism as Snowpipe and runs on serverless compute.
Storage integrations
A storage integration is an account-level object that stores a generated cloud IAM identity plus optional allowed/blocked storage locations. It lets external stages and pipes reach S3/Azure/GCS without embedding access keys in SQL. The setup pattern is: CREATE STORAGE INTEGRATION (Snowflake generates an IAM user/principal), grant that principal access in the cloud console, then reference the integration in CREATE STAGE ... STORAGE_INTEGRATION = my_int. Benefits over inline credentials:
- No secrets in code — credentials never appear in stage definitions or query history.
- Centralized control — a security admin manages cloud trust once; many stages reuse it.
- Scoped access —
STORAGE_ALLOWED_LOCATIONSrestricts which buckets/paths are reachable.
Creating or altering a storage integration requires the ACCOUNTADMIN role or a role with the global CREATE INTEGRATION privilege.
Managing and Monitoring Pipes
Pipes have a lifecycle you can control with SQL. ALTER PIPE my_pipe SET PIPE_EXECUTION_PAUSED = TRUE pauses ingestion (useful during maintenance), and SHOW PIPES lists each pipe's pattern, notification channel, and pending-file count. To inspect Snowpipe activity, query the PIPE_USAGE_HISTORY table function for credits and bytes ingested over time, and COPY_HISTORY for per-file outcomes. If files were already in the bucket before a pipe was created, a one-time ALTER PIPE my_pipe REFRESH queues those existing files for load — auto-ingest only reacts to new notifications, so historical backfill needs this manual refresh.
Snowpipe vs. bulk vs. streaming
Choosing among ingestion paths is a frequent scenario. The rule of thumb:
- Bulk COPY INTO for large, scheduled batch jobs where you control the warehouse and want maximum throughput.
- Snowpipe (auto-ingest or REST) for files arriving continuously and unpredictably; latency is minutes and compute is serverless.
- Snowpipe Streaming for row-by-row, sub-second latency with no intermediate files, driven by the Snowflake Ingest SDK or the Kafka connector in streaming mode.
Cost awareness
Because Snowpipe and external-table refresh run on serverless compute, they do not require a running warehouse, but they are not free — they draw serverless credits metered per GB (Snowpipe) or per refresh. Loading many tiny files inflates Snowpipe cost because overhead is amortized across less data, which is exactly why the 100-250 MB compressed sizing guidance applies to continuous loading too. Right-sizing files before they hit the stage is the single biggest lever on ingestion cost.
Privileges and ownership
To create a pipe a role needs the CREATE PIPE privilege on the schema plus USAGE on the database, schema, and stage, and INSERT/SELECT on the target table. Snowpipe runs under the pipe owner's privileges, so if that role loses access to the stage or table, ingestion silently stops — a frequent troubleshooting scenario. For external tables, the role needs CREATE EXTERNAL TABLE plus USAGE on the stage and, when a storage integration is involved, USAGE on the integration.
Recognizing which privilege is missing is often the difference between two plausible answers: if files exist but nothing loads, suspect the pipe owner's stage/table grants; if a CREATE STAGE fails against S3, suspect the storage-integration USAGE grant or the cloud-side trust policy. Tie each connectivity object back to its required grants and the scenario questions resolve cleanly.
On AWS, which mechanism does a Snowpipe pipe configured with AUTO_INGEST = TRUE use to learn that a new file has arrived in an S3 bucket?
Which statement about Snowflake external tables is correct?
What is the primary security advantage of using a storage integration for an external stage?
Files were already sitting in an S3 bucket before a new auto-ingest pipe was created. What is required to load those pre-existing files?