4.2 Storage Cost Optimization — S3 Tiers, EBS, and Data Transfer
Key Takeaways
- S3 lifecycle policies automate transitions to colder classes; moving Standard to Glacier Deep Archive cuts storage cost by about 95%.
- S3 Intelligent-Tiering moves objects between tiers automatically for a small per-object monitoring fee — the answer when access patterns are unknown.
- gp3 is about 20% cheaper than gp2 and decouples IOPS/throughput from size; st1/sc1 HDD volumes are far cheaper for large sequential workloads.
- Internet ingress is free; cross-AZ traffic costs $0.01/GB in each direction, cross-Region about $0.02/GB, and internet egress starts at $0.09/GB.
- S3 and DynamoDB Gateway VPC Endpoints are free and bypass NAT Gateway data-processing charges of $0.045/GB.
S3 Storage Class Costs and Retrieval
S3 storage classes trade price per GB against retrieval latency, retrieval cost, and minimum durations. Approximate us-east-1 prices:
| Storage class | $/GB-month | Min duration | Retrieval |
|---|---|---|---|
| S3 Standard | 0.023 | none | instant, free |
| Intelligent-Tiering | 0.023 → 0.0036 | none | instant; small monitoring fee/object |
| Standard-IA | 0.0125 | 30 days | instant; per-GB retrieval fee |
| One Zone-IA | 0.01 | 30 days | instant; single-AZ durability |
| Glacier Instant Retrieval | 0.004 | 90 days | milliseconds |
| Glacier Flexible Retrieval | 0.0036 | 90 days | minutes to 12 hr |
| Glacier Deep Archive | 0.00099 | 180 days | 12-48 hr |
Moving Standard → Deep Archive saves about 95.7% per GB. Watch the minimum-duration trap: deleting an IA or Glacier object early still bills the full 30/90/180-day minimum, so short-lived data belongs in Standard.
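The 95.7% figure can be checked directly from the table. The following is a minimal sketch in Python; the dictionary keys are illustrative labels chosen here, and the prices are the approximate us-east-1 values from the table above, not a live price list.

```python
# Approximate us-east-1 prices in $/GB-month, copied from the table above.
# The keys are illustrative labels, not an official API enumeration.
PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "ONEZONE_IA": 0.01,
    "GLACIER_IR": 0.004,
    "GLACIER_FLEXIBLE": 0.0036,
    "DEEP_ARCHIVE": 0.00099,
}


def savings_vs_standard(storage_class: str) -> float:
    """Return the percentage saved per GB-month relative to S3 Standard."""
    standard = PRICE_PER_GB_MONTH["STANDARD"]
    return (standard - PRICE_PER_GB_MONTH[storage_class]) / standard * 100


for name in PRICE_PER_GB_MONTH:
    print(f"{name:18s} {savings_vs_standard(name):5.1f}% cheaper than Standard")
```

The calculation covers storage price only; retrieval fees and minimum-duration charges are ignored here.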
Choosing a Tiering Strategy
| Situation | Correct answer |
|---|---|
| Known, predictable age-out (e.g., logs) | Lifecycle policy with explicit transitions |
| Unknown/changing access pattern | Intelligent-Tiering (no retrieval fees) |
| Reproducible data, can tolerate single AZ | One Zone-IA |
| Compliance archive, retrieval rare and slow | Glacier Deep Archive |
Lifecycle rules can also expire noncurrent object versions and abort incomplete multipart uploads (commonly after 7 days); both accumulate unnoticed and are a frequent source of silent storage waste. Use S3 Storage Class Analysis to discover when to transition, and enable Requester Pays on shared datasets so that downloaders, not the bucket owner, pay the request and data-transfer charges.
EBS Cost Optimization
| Action | Savings or rationale |
|---|---|
| Migrate gp2 → gp3 | ~20% cheaper; set IOPS/throughput independently |
| Use st1 / sc1 HDD | Far cheaper for large sequential throughput (big-data, logs) |
| Delete orphaned volumes | 100% — unattached volumes still bill |
| Prune old snapshots with DLM | Amazon Data Lifecycle Manager automates creation/cleanup |
| Right-size over-allocated volumes | You pay for provisioned size, not used |
Data Transfer Charges
| Transfer | Cost |
|---|---|
| Internet into AWS (ingress) | Free |
| Same AZ via private IP | Free |
| Cross-AZ, same Region | $0.01/GB each way |
| Cross-Region | ~$0.02/GB |
| Internet egress | from $0.09/GB |
| Via CloudFront | ~$0.085/GB; caching also reduces origin fetches |
| S3/DynamoDB Gateway Endpoint | Free |
On the Exam: "Cut data-transfer cost from EC2 to S3" → S3 Gateway VPC Endpoint (free, bypasses NAT). "Reduce NAT Gateway bill" → route AWS-service traffic through VPC endpoints. "Lower egress to global users" → CloudFront.
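To see why the gateway endpoint is the expected answer, it helps to put numbers on the NAT data-processing charge. This is a rough sketch using the $0.045/GB rate quoted in this section; the 10,000 GB monthly volume is a hypothetical workload, and the NAT Gateway's separate hourly charge is deliberately left out.

```python
NAT_PROCESSING_PER_GB = 0.045    # NAT Gateway data-processing rate from this section
GATEWAY_ENDPOINT_PER_GB = 0.0    # S3/DynamoDB gateway endpoints have no per-GB charge


def monthly_processing_cost(gb_per_month: float, per_gb_rate: float) -> float:
    """Per-GB processing cost for one month of traffic."""
    return gb_per_month * per_gb_rate


gb = 10_000  # hypothetical: roughly 10 TB of S3 reads per month
via_nat = monthly_processing_cost(gb, NAT_PROCESSING_PER_GB)
via_endpoint = monthly_processing_cost(gb, GATEWAY_ENDPOINT_PER_GB)
print(f"via NAT: ${via_nat:,.2f}  via gateway endpoint: ${via_endpoint:,.2f}")
```

Under these assumptions the S3 traffic alone costs about $450 per month through the NAT Gateway and nothing through the gateway endpoint.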
Hybrid Storage with Storage Gateway
| Gateway type | Protocol → backend | Use case |
|---|---|---|
| S3 File Gateway | NFS/SMB → S3 | File shares, data-lake ingest |
| FSx File Gateway | SMB → FSx for Windows | Cached Windows shares |
| Volume Gateway | iSCSI → S3 + EBS snapshots | Block storage, DR |
| Tape Gateway | VTL → S3 Glacier | Replace physical backup tape |
Storage Gateway keeps a hot local cache while tiering the bulk of data to low-cost S3/Glacier — the standard answer for "reduce on-prem storage cost without a full migration."
Worked Example: Lifecycle vs. Intelligent-Tiering
A media company stores rendered video. Editors access a clip heavily for its first month, occasionally for a year, then almost never. Because the access pattern is known and time-based, a lifecycle policy is the cheaper answer: Standard for 30 days, Standard-IA at 30 days, Glacier Flexible at 365 days, and Deep Archive beyond two years. Intelligent-Tiering would also work but charges a small per-object monitoring fee that is pure waste when the decay curve is already predictable.
Flip the scenario — a data-science bucket where any object might suddenly become hot — and Intelligent-Tiering wins, because it moves objects between frequent and infrequent tiers automatically with no retrieval fees and no guessing.
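The media-company schedule above could be expressed as a lifecycle configuration along these lines. This is a sketch, not the only valid layout: the rule ID, the `renders/` prefix, and the bucket name in the comment are hypothetical, and the dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts. In that API, `GLACIER` is the storage-class value for Glacier Flexible Retrieval.

```python
import json


def media_lifecycle_configuration() -> dict:
    """Build a lifecycle configuration matching the rendered-video schedule."""
    return {
        "Rules": [
            {
                "ID": "rendered-video-age-out",        # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": "renders/"},      # hypothetical key prefix
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 365, "StorageClass": "GLACIER"},       # Glacier Flexible Retrieval
                    {"Days": 730, "StorageClass": "DEEP_ARCHIVE"},  # beyond two years
                ],
                # Clean up abandoned multipart uploads after 7 days.
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    }


config = media_lifecycle_configuration()
print(json.dumps(config, indent=2))

# To apply it (requires boto3 and AWS credentials; the bucket name is hypothetical):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-media-bucket", LifecycleConfiguration=config)
```

Building the dictionary separately from the API call keeps the schedule easy to review and test before it touches a real bucket.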
A second worked example exposes the minimum-duration trap. A team writes temporary export files and deletes them after 5 days. Storing these in Standard-IA looks cheaper per GB, but IA bills a 30-day minimum per object, so deleting at day 5 still incurs 30 days of charges plus a per-GB retrieval fee on each read. At the prices above, that is 30 days at $0.0125/GB-month against 5 days at $0.023/GB-month, more than three times the storage cost before any retrieval fee, so Standard is cheaper. Short-lived objects belong in Standard; only data that genuinely ages past the minimum duration benefits from colder classes.
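The arithmetic behind the trap can be sketched as follows. The storage prices come from the table earlier in this section; the $0.01/GB Standard-IA retrieval fee and the 30-day billing month are assumptions added for illustration, and the export is assumed to be read once.

```python
STANDARD_PER_GB_MONTH = 0.023
STANDARD_IA_PER_GB_MONTH = 0.0125
IA_MIN_DAYS = 30
IA_RETRIEVAL_PER_GB = 0.01  # assumed retrieval fee; not stated in this section


def storage_cost_per_gb(rate_per_gb_month, days_stored, min_days=0, days_per_month=30):
    """Cost of keeping 1 GB for days_stored, billed for at least min_days."""
    billed_days = max(days_stored, min_days)
    return rate_per_gb_month * billed_days / days_per_month


# 5-day export file, read once before deletion.
standard = storage_cost_per_gb(STANDARD_PER_GB_MONTH, days_stored=5)
ia = storage_cost_per_gb(STANDARD_IA_PER_GB_MONTH, 5, IA_MIN_DAYS) + IA_RETRIEVAL_PER_GB

print(f"Standard:    ${standard:.4f} per GB")
print(f"Standard-IA: ${ia:.4f} per GB (30-day minimum plus one retrieval)")
```

The `max(days_stored, min_days)` term is the whole trap: once the object lives past the minimum, the two arguments agree and the colder class wins on price.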
EBS and Snapshot Cost Detail
EBS volumes bill on provisioned capacity, not used bytes, so a 1 TB gp3 volume holding 100 GB still bills for 1 TB. Migrating gp2 → gp3 is almost always free money: gp3 starts cheaper per GB and includes a baseline 3,000 IOPS and 125 MB/s independent of size, so you stop over-provisioning size just to buy throughput. For large sequential workloads — log processing, big-data scans — st1 (throughput-optimized HDD) and sc1 (cold HDD) are dramatically cheaper than SSD because the access pattern does not need low-latency random IOPS.
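The effect of decoupling IOPS from size is easiest to see with numbers. The sketch below rests on several assumptions that are not stated in this section: approximate us-east-1 rates of $0.10/GB-month for gp2, $0.08/GB-month for gp3, and $0.005 per provisioned IOPS-month above the gp3 baseline, plus the gp2 baseline of 3 IOPS per GB. It ignores gp2 burst credits and gp3 throughput charges.

```python
import math

GP2_PER_GB = 0.10           # assumed $/GB-month
GP3_PER_GB = 0.08           # assumed $/GB-month (about 20% below gp2)
GP2_IOPS_PER_GB = 3         # gp2 baseline IOPS scale with volume size
GP3_BASE_IOPS = 3000        # gp3 baseline from this section
GP3_PER_EXTRA_IOPS = 0.005  # assumed $/IOPS-month above the baseline


def gp2_monthly_cost(needed_gb: int, needed_iops: int) -> float:
    """gp2 must be sized up until its size-based baseline covers the IOPS target."""
    size_for_iops = math.ceil(needed_iops / GP2_IOPS_PER_GB)
    provisioned_gb = max(needed_gb, size_for_iops)
    return provisioned_gb * GP2_PER_GB


def gp3_monthly_cost(needed_gb: int, needed_iops: int) -> float:
    """gp3 pays for capacity and for any IOPS above the baseline separately."""
    extra_iops = max(0, needed_iops - GP3_BASE_IOPS)
    return needed_gb * GP3_PER_GB + extra_iops * GP3_PER_EXTRA_IOPS


# Hypothetical workload: 500 GB of data that needs 6,000 sustained IOPS.
print(f"gp2: ${gp2_monthly_cost(500, 6000):.2f}/month")
print(f"gp3: ${gp3_monthly_cost(500, 6000):.2f}/month")
```

Under these assumptions the gp2 volume has to grow to 2,000 GB just to reach the IOPS target, while the gp3 volume stays at 500 GB and buys the extra 3,000 IOPS directly.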
Snapshots are incremental but accumulate silently. Use Amazon Data Lifecycle Manager (DLM) to schedule creation and automatic deletion against a retention policy rather than relying on manual cleanup. The exam's cost-optimization framing here is: find and delete orphaned volumes (unattached but still billing 100%), prune stale snapshots with DLM, and right-size over-allocated volumes.
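Finding orphaned volumes is straightforward to script. The sketch below filters a list of volume records shaped like the `Volumes` entries returned by boto3's EC2 `describe_volumes` call; the sample records and volume IDs are made up, and the function itself makes no AWS calls.

```python
def find_orphaned_volumes(volumes):
    """Return (VolumeId, Size in GiB) for volumes that are not attached to anything."""
    return [
        (v["VolumeId"], v["Size"])
        for v in volumes
        if v.get("State") == "available" and not v.get("Attachments")
    ]


# Hypothetical sample data in the shape of describe_volumes()["Volumes"].
sample_volumes = [
    {"VolumeId": "vol-0aaa", "State": "available", "Attachments": [], "Size": 500},
    {"VolumeId": "vol-0bbb", "State": "in-use",
     "Attachments": [{"InstanceId": "i-0123"}], "Size": 100},
    {"VolumeId": "vol-0ccc", "State": "available", "Attachments": [], "Size": 1000},
]

orphans = find_orphaned_volumes(sample_volumes)
wasted_gb = sum(size for _, size in orphans)
print(f"{len(orphans)} orphaned volumes, {wasted_gb} GiB still billing")

# Against a real account (requires boto3 and credentials), the input could come from:
#   import boto3
#   volumes = boto3.client("ec2").describe_volumes(
#       Filters=[{"Name": "status", "Values": ["available"]}])["Volumes"]
```

Review the list before deleting anything; taking a final snapshot of a volume first is a common precaution.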
On the Exam: "Cheapest EBS for large sequential big-data reads" → st1. "Lower EBS cost with no performance loss" → gp2 to gp3. "Automate snapshot retention" → Data Lifecycle Manager.
Storage-Cost Trap Table
| Tempting wrong answer | Why it fails |
|---|---|
| IA/Glacier for objects deleted within days | Minimum-duration billing exceeds Standard cost |
| Intelligent-Tiering with a known decay curve | Unneeded per-object monitoring fee |
| Transfer Acceleration to "cut transfer cost" | It adds a premium; it speeds long-distance uploads |
| Sizing an EBS volume up just for more IOPS | gp3 sets IOPS/throughput independently of size |
| Leaving detached volumes "just in case" | Unattached volumes still bill at 100% |
EC2 instances in a private subnet read large volumes of data from Amazon S3 through a NAT Gateway, producing high data-processing charges. How can the team eliminate the NAT Gateway charges for this S3 traffic?
A bucket holds 100 TB that is read frequently for 30 days, rarely for the next 11 months, and then must be retained untouched for 7 years. Which approach minimizes total storage cost?
Which AWS data transfer is always FREE?