6.5 AWS DataSync and Transfer Family
Key Takeaways
- AWS DataSync automates and accelerates online data transfer between on-premises storage and AWS (S3, EFS, FSx) with built-in encryption, scheduling, filtering, and post-transfer integrity validation.
- DataSync uses a purpose-built protocol that can run up to about 10x faster than open-source tools like rsync, and uses an on-premises agent for on-prem sources (no agent for AWS-to-AWS).
- AWS Transfer Family is a fully managed SFTP, FTPS, FTP, and AS2 service backed by S3 or EFS — it preserves existing partner file-exchange workflows without changing their clients.
- Use DataSync for migration and recurring synchronization; use Transfer Family when the requirement is a specific file-transfer protocol (especially SFTP) for external partners.
- DataSync pricing is per-gigabyte transferred, and it supports bandwidth throttling and include/exclude filters to control cost and scope.
AWS DataSync — Fast, Automated Online Transfer
Quick Answer: DataSync = fast, automated, online data transfer between on-premises and AWS storage (S3/EFS/FSx) with scheduling, filtering, and integrity checks. Transfer Family = managed SFTP/FTPS/FTP/AS2 backed by S3 or EFS so partners keep their existing clients. Exam shorthand: "bulk migrate or sync files over the network" → DataSync; "partners upload via SFTP" → Transfer Family.
DataSync moves data over the network (not by shipping disks) and is built for both one-time migrations and recurring synchronization. For on-premises sources, you deploy an agent as a VM (or EC2 instance) close to the NFS, SMB, HDFS, or object storage it reads from; AWS-to-AWS transfers need no agent.
| Feature | Detail |
|---|---|
| Throughput | Up to ~10 Gbps per task; up to about 10x faster than rsync/open-source tools |
| Sources | NFS, SMB, HDFS, self-managed object storage, S3, EFS, FSx |
| Destinations | S3, EFS, FSx for Windows, FSx for Lustre, FSx for OpenZFS, FSx for NetApp ONTAP |
| Encryption | TLS in transit; destination-side encryption at rest |
| Validation | Automatic data-integrity verification after each task |
| Scheduling | Hourly/daily/weekly tasks for ongoing sync |
| Bandwidth | Configurable throttling to protect production links |
| Filtering | Include/exclude patterns for selective transfer |
| Pricing | Per-GB transferred (no separate license) |
DataSync vs. scripted copies
| Capability | DataSync | S3 CLI / rsync |
|---|---|---|
| Speed | Purpose-built, up to ~10x faster | Depends on the tool and how you parallelize it |
| Scheduling | Built-in task scheduler | Needs cron/external tooling |
| Integrity check | Automatic | Manual |
| Encryption | TLS by default | Must configure |
| Operations | AWS-managed | Self-managed scripts |
Worked example: Migrate 50 TB from on-prem NFS to Amazon EFS, then keep the two in sync nightly. Deploy a DataSync agent on the NFS network, create an NFS→EFS task, and run the initial full transfer. Then schedule the task nightly: by default DataSync copies only data that has changed since the previous run, and include filters can narrow the task to specific datasets. Validation confirms integrity automatically after each run.
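To make the nightly task concrete, here is a minimal sketch of the request parameters such a task might use. It only builds and prints the arguments for the DataSync `CreateTask` API; the location ARNs, task name, include paths, and 02:00 UTC schedule are hypothetical placeholders, and the source and destination locations are assumed to exist already.

```python
import json


def nightly_task_params(source_arn, dest_arn, include_paths, max_bytes_per_sec=-1):
    """Build keyword arguments for a DataSync CreateTask request.

    max_bytes_per_sec=-1 means "no bandwidth limit" in the DataSync API.
    """
    return {
        "Name": "nfs-to-efs-nightly",  # hypothetical task name
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": dest_arn,
        "Options": {
            "TransferMode": "CHANGED",               # copy only new or changed data
            "VerifyMode": "ONLY_FILES_TRANSFERRED",  # verify what this run copied
            "BytesPerSecond": max_bytes_per_sec,     # bandwidth throttle
        },
        # Run every night at 02:00 UTC (assumed schedule).
        "Schedule": {"ScheduleExpression": "cron(0 2 * * ? *)"},
        # SIMPLE_PATTERN filters take a pipe-delimited list of paths.
        "Includes": [
            {"FilterType": "SIMPLE_PATTERN", "Value": "|".join(include_paths)}
        ],
    }


params = nightly_task_params(
    source_arn="arn:aws:datasync:us-east-1:111122223333:location/loc-EXAMPLE-nfs",
    dest_arn="arn:aws:datasync:us-east-1:111122223333:location/loc-EXAMPLE-efs",
    include_paths=["/finance", "/inventory"],  # hypothetical dataset paths
    max_bytes_per_sec=50 * 1024 * 1024,        # cap at roughly 50 MiB/s
)
print(json.dumps(params, indent=2))

# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("datasync").create_task(**params)
```

Keeping the parameters in a plain dictionary makes the scheduling, throttling, filtering, and validation settings easy to review before anything is created in an account.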
AWS Transfer Family — Managed File-Transfer Protocols
Many organizations have partners and vendors hard-wired to SFTP/FTPS/FTP scripts or B2B AS2 flows. Transfer Family gives those workflows a managed AWS endpoint while the files land directly in S3 or EFS — no protocol change for the partner.
| Feature | Detail |
|---|---|
| Protocols | SFTP, FTPS, FTP, AS2 (B2B EDI) |
| Backend storage | Amazon S3 or Amazon EFS |
| Authentication | Service-managed users, Active Directory, or a custom Lambda authorizer |
| Endpoint types | Public, VPC, or VPC with internet-facing Elastic IPs |
| Scaling/HA | Fully managed, auto-scaling, multi-AZ |
| Auditing | Integrates with CloudWatch and CloudTrail |
Choosing between the two
| Requirement | Service |
|---|---|
| Bulk migrate or nightly-sync file shares to AWS | DataSync |
| Partners must keep using their SFTP client | Transfer Family |
| Move data between two AWS file systems | DataSync (no agent) |
| Inbound B2B EDI document exchange (AS2) | Transfer Family |
Worked example: A retailer's 200 suppliers drop nightly inventory files via SFTP to an aging on-prem server. Stand up a Transfer Family SFTP server with an S3 backend, give each supplier a service-managed user mapped to its own S3 prefix, and update DNS so existing SFTP scripts connect unchanged. Files now arrive in S3 and trigger downstream Lambda processing.
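As a rough illustration of the per-supplier mapping, the sketch below builds the arguments for the Transfer Family `CreateUser` API using a logical home directory, so each supplier sees only its own prefix as `/`. The server ID, role ARN, bucket name, supplier names, and SSH keys are all hypothetical placeholders.

```python
def supplier_user_params(server_id, role_arn, bucket, supplier, ssh_public_key):
    """Build keyword arguments for a Transfer Family CreateUser request."""
    return {
        "ServerId": server_id,
        "UserName": supplier,
        "Role": role_arn,  # IAM role that grants access to the bucket
        "HomeDirectoryType": "LOGICAL",
        # The supplier's "/" maps to its own prefix, hiding the rest of the bucket.
        "HomeDirectoryMappings": [
            {"Entry": "/", "Target": f"/{bucket}/suppliers/{supplier}"}
        ],
        "SshPublicKeyBody": ssh_public_key,
    }


# Hypothetical suppliers and truncated example keys.
suppliers = {
    "acme": "ssh-ed25519 AAAAC3...example acme",
    "globex": "ssh-ed25519 AAAAC3...example globex",
}
requests = [
    supplier_user_params(
        server_id="s-0123456789abcdef0",
        role_arn="arn:aws:iam::111122223333:role/transfer-s3-access-example",
        bucket="retailer-inbound-example",
        supplier=name,
        ssh_public_key=key,
    )
    for name, key in suppliers.items()
]
for req in requests:
    print(req["UserName"], "->", req["HomeDirectoryMappings"][0]["Target"])

# With boto3 installed, each request could be sent as:
#   boto3.client("transfer").create_user(**req)
```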
Common trap: Choosing Transfer Family for a one-time 50 TB migration. Transfer Family is about protocol compatibility for ongoing exchange, not high-throughput bulk migration — that is DataSync (online) or Snow Family (offline). Conversely, do not pick DataSync when the explicit requirement is "partners must use SFTP/FTPS," because DataSync exposes no such protocol endpoint.
On the Exam: "Migrate terabytes from on-prem NFS to EFS over the network, with scheduling and validation" → DataSync. "External partners need SFTP access landing in S3 without changing their tooling" → AWS Transfer Family.
Where each fits among the transfer services
The SAA-C03 exam clusters several transfer options, and the differentiator is almost always the requirement's keyword. DataSync wins on "fast online migration" and "scheduled sync with validation." Transfer Family wins on a named protocol ("SFTP," "FTPS," "AS2") for external parties. Snow Family wins on "limited bandwidth" plus large volume. Storage Gateway wins on "hybrid access" where on-premises apps need ongoing low-latency reads of data cached locally while it is durably stored in AWS. S3 Replication wins only for bucket-to-bucket copies inside AWS.
Reading the requirement keyword first, before the answer choices, prevents the classic mistake of choosing DataSync for an SFTP scenario or Transfer Family for a bulk migration.
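The keyword-first habit can be sketched as a small lookup. This is only a study aid under simplifying assumptions: the keyword lists and their priority order are illustrative choices, not an official decision procedure, and real questions need the full context.

```python
# Rules are checked in order; protocol names and bandwidth limits are tested
# before the generic "migrate/sync" keywords because they are more specific.
RULES = [
    (("sftp", "ftps", "ftp", "as2"), "AWS Transfer Family"),
    (("limited bandwidth", "offline"), "AWS Snow Family"),
    (("hybrid access", "local cache", "low-latency"), "AWS Storage Gateway"),
    (("bucket-to-bucket", "replicat"), "S3 Replication"),
    (("migrate", "sync", "validation", "schedul"), "AWS DataSync"),
]


def pick_service(requirement):
    """Return the first service whose keywords appear in the requirement text."""
    text = requirement.lower()
    for keywords, service in RULES:
        if any(keyword in text for keyword in keywords):
            return service
    return "unclear: re-read the requirement"


print(pick_service("Migrate terabytes from on-prem NFS to EFS with scheduling and validation"))
print(pick_service("External partners need SFTP access landing in S3"))
print(pick_service("Move 500 TB to AWS over a limited bandwidth link"))
```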
Cost and operational considerations
DataSync charges per gigabyte transferred, so for very large one-time moves over a constrained link you should compare its cost and elapsed time against shipping a Snowball Edge device — past a certain volume-to-bandwidth ratio, offline shipping is both cheaper and faster. DataSync's bandwidth throttling lets you cap throughput during business hours so migration traffic does not starve production, and include/exclude filters restrict a task to specific paths.
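The volume-to-bandwidth comparison comes down to simple arithmetic. The sketch below estimates online transfer time for a given link and compares it with an offline turnaround; the 80% link utilization and the 7-day device turnaround are assumptions for illustration, not AWS figures.

```python
def transfer_days(data_tb, link_mbps, utilization=0.8):
    """Estimate days needed to move data_tb (decimal TB) over a link.

    utilization is the assumed fraction of the link the transfer can use.
    """
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * utilization)
    return seconds / 86400


def faster_option(data_tb, link_mbps, offline_days=7.0):
    """Compare online time against an assumed offline shipping turnaround."""
    online = transfer_days(data_tb, link_mbps)
    return "online (DataSync)" if online <= offline_days else "offline (Snowball Edge)"


for mbps in (1000, 100):
    days = transfer_days(50, mbps)
    print(f"50 TB over {mbps} Mbps: {days:.1f} days -> {faster_option(50, mbps)}")
# 50 TB over 1000 Mbps: 5.8 days -> online (DataSync)
# 50 TB over 100 Mbps: 57.9 days -> offline (Snowball Edge)
```

The same 50 TB that fits comfortably in a week on a 1 Gbps link takes roughly two months at 100 Mbps, which is where offline shipping starts to win on elapsed time.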
Transfer Family is billed per enabled protocol-hour plus data transferred, and because it is fully managed and multi-AZ, it removes the need to patch and scale your own SFTP servers. For partner onboarding, map each Transfer Family user to a distinct S3 prefix and a least-privilege session policy so vendors can only see their own drop folder — the same per-identity isolation principle used with Cognito Identity Pools earlier in this chapter.
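A least-privilege policy for one vendor's drop folder might look like the following sketch. It generates a standard IAM policy document scoped to a single prefix; the bucket name, prefix, and the exact set of allowed object actions are assumptions to adapt.

```python
import json


def drop_folder_policy(bucket, prefix):
    """Build an IAM policy limiting access to one prefix in one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListOwnPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                # Listing is allowed only under the vendor's own prefix.
                "Condition": {"StringLike": {"s3:prefix": [prefix, f"{prefix}/*"]}},
            },
            {
                "Sid": "ReadWriteOwnObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


# Hypothetical bucket and vendor prefix.
policy = drop_folder_policy("retailer-inbound-example", "suppliers/acme")
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The resulting JSON string is the kind of document that can be supplied as a user's session policy, so that even a broadly scoped IAM role is narrowed to that vendor's folder for the session.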
A company must migrate 50 TB from on-premises NFS storage to Amazon EFS over the network as quickly as possible, with scheduling and automatic integrity validation. Which service is best?
Business partners currently upload files via SFTP to an aging on-premises server. The company wants the workflow on AWS without partners changing their SFTP clients, storing files in S3. Which service should they use?
An architect needs recurring nightly synchronization between an on-premises NFS file system and Amazon S3, with automatic integrity validation and bandwidth throttling during business hours. Which service fits?