9.2 Encryption, Secrets, Networking, and Data Privacy
Key Takeaways
- AI data should be protected at rest and in transit, including prompts, source documents, model artifacts, logs, embeddings, and generated outputs, wherever they are stored.
- AWS KMS, TLS, IAM, Secrets Manager, VPC design, private connectivity, and storage policies help reduce exposure, but they must be matched to the data risk.
- Secrets such as API keys, database passwords, and service credentials should not be placed in prompts, code, notebooks, or plain environment variables without compensating controls.
- Data privacy for AI depends on minimization, classification, retention, residency, logging decisions, masking, and user consent or another policy basis where required.
- Practitioners should ask whether sensitive data is actually needed for the model task before approving its use.
Protecting the AI data path
AI security discussions often focus on the model, but the data path is usually far wider than the model itself. A generative AI application may read documents from Amazon S3, create embeddings, store vectors, send prompts to Amazon Bedrock, call Lambda functions, write logs to CloudWatch Logs, store transcripts, and return outputs to users. Each step can contain sensitive data. Encryption, secrets management, network controls, and privacy rules should be settled before the pilot becomes production.
Encryption at rest protects stored data. S3 objects, databases, logs, model artifacts, vector indexes, and backups should use encryption settings appropriate to the sensitivity of the data. AWS Key Management Service (AWS KMS) lets organizations manage cryptographic keys and key policies for many AWS services. A practitioner does not need to design key rotation logic, but should know that key ownership, who can use the keys, and how key use is logged all matter.
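For illustration, a minimal boto3 sketch of storing an object encrypted with a customer managed KMS key; the bucket name and key ARN below are placeholders, not references to real resources:

```python
import boto3

s3 = boto3.client("s3")

# Placeholder bucket and key ARN; substitute resources approved for this data class.
BUCKET = "example-ai-transcripts"
KMS_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"

# Store a transcript encrypted at rest with the customer managed key.
s3.put_object(
    Bucket=BUCKET,
    Key="transcripts/call-0001.txt",
    Body=b"(transcript text)",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId=KMS_KEY_ARN,
)
```

The same questions apply to any store on the path: which key encrypts it, who can use that key, and where key use is logged.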
Encryption in transit protects data moving between clients, applications, and services. TLS is the normal expectation for service APIs and web applications. If users upload documents to an AI assistant, those uploads should be protected in transit. If an application calls Bedrock, SageMaker AI, S3, OpenSearch Service, or any other service API, the network path should use secure protocols and avoid unnecessary public exposure.
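One common way to require secure transport at a storage boundary is an S3 bucket policy that denies any request made without TLS. A minimal sketch, reusing the placeholder bucket from above:

```python
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "example-ai-transcripts"  # placeholder

# Deny every S3 action on this bucket when the request is not made over TLS.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
            f"arn:aws:s3:::{BUCKET}",
            f"arn:aws:s3:::{BUCKET}/*",
        ],
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }],
}

s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```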
| Control area | What it protects | Practitioner question |
|---|---|---|
| Encryption at rest | Stored prompts, logs, documents, embeddings, outputs, and artifacts | Are storage locations encrypted with keys that match data sensitivity? |
| Encryption in transit | Requests and responses between users, apps, and AWS services | Are secure protocols used for uploads, model calls, and APIs? |
| Secrets management | Passwords, API keys, database credentials, and tokens | Are secrets stored in AWS Secrets Manager or another approved system instead of prompts or code? |
| Network boundaries | Traffic paths to services and data stores | Can private connectivity or VPC endpoints reduce public network exposure? |
| Privacy controls | Personal, regulated, customer, or confidential information | Is the data necessary, minimized, masked, retained properly, and logged intentionally? |
Secrets are a common failure point. A developer may paste a database password into a notebook, put a third-party API key in application code, or ask a model to remember a token during troubleshooting. None of these practices is acceptable. Secrets should be stored in a managed system such as AWS Secrets Manager and retrieved by authorized applications at runtime. IAM should determine which role can read a secret, and logs should never expose secret values.
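A minimal sketch of the intended pattern, assuming a JSON-formatted secret already exists under a placeholder name, with read access granted only to the application's runtime role:

```python
import json
import boto3

secrets = boto3.client("secretsmanager")

def get_db_credentials(secret_id: str) -> dict:
    """Fetch credentials at runtime; never hard-code or print the value."""
    response = secrets.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])

# Placeholder secret name; IAM decides which role may read it.
creds = get_db_credentials("prod/ai-assistant/db")
# Use creds["username"] and creds["password"] to open a connection,
# and keep them out of prompts, logs, and error messages.
```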
Network design depends on the workload. Some AI tools are internal and should not be reachable from the public internet. A web application may run in a VPC, use private subnets for backend services, and access AWS services through controlled paths. For some AWS services, VPC endpoints and AWS PrivateLink can keep traffic on the AWS network path instead of sending it over the public internet. The exam-level judgment is to recognize when private connectivity and network segmentation reduce exposure.
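As a sketch, creating an interface endpoint with boto3 might look like the following; the VPC, subnet, and security group IDs are placeholders, and the service name follows the documented com.amazonaws.&lt;region&gt;.&lt;service&gt; pattern for the Bedrock runtime:

```python
import boto3

ec2 = boto3.client("ec2")

# Keep Bedrock runtime traffic on the AWS network path via an interface endpoint.
ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",             # placeholder
    VpcEndpointType="Interface",
    ServiceName="com.amazonaws.us-east-1.bedrock-runtime",
    SubnetIds=["subnet-0123456789abcdef0"],    # placeholder
    SecurityGroupIds=["sg-0123456789abcdef0"], # placeholder
    PrivateDnsEnabled=True,
)
```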
Data privacy starts with minimization. If the model only needs a product description, do not include customer account numbers. If a summarizer only needs a transcript excerpt, do not include unrelated case history. If a RAG assistant needs approved policy documents, do not index personal employee records by default. The safest sensitive data is the data that never enters the AI workflow.
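A deliberately crude sketch of masking before a prompt is built; a production system would use a purpose-built PII detection capability, such as Amazon Comprehend's PII detection, rather than a single regular expression:

```python
import re

# Crude illustration only: mask long digit runs that may be account numbers.
ACCOUNT_RE = re.compile(r"\b\d{10,16}\b")

def minimize(text: str) -> str:
    """Mask probable account numbers before text enters the AI workflow."""
    return ACCOUNT_RE.sub("[REDACTED-ACCOUNT]", text)

prompt_input = minimize("Customer 4111111111111111 asked about the refund policy.")
# -> "Customer [REDACTED-ACCOUNT] asked about the refund policy."
```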
Logging decisions can create privacy risk. Model invocation logs, application logs, traces, and error messages may contain prompts, responses, retrieved passages, or user identifiers. Those logs are useful for troubleshooting and safety review, but they need access control, encryption, retention limits, and redaction where appropriate. A team should not enable verbose logging in production without knowing who can read the logs and how long they are kept.
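Retention limits are one concrete, low-effort control. A small boto3 sketch, with a placeholder log group name and a placeholder 30-day window:

```python
import boto3

logs = boto3.client("logs")

# Cap retention so prompt-bearing logs are not kept indefinitely.
logs.put_retention_policy(
    logGroupName="/aiapp/prod/invocations",  # placeholder
    retentionInDays=30,
)
```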
Data residency and retention are business requirements, not only technical settings. Some organizations restrict where regulated or customer data may be processed or stored. Others have rules for deleting chat transcripts, removing documents from search indexes, or keeping audit evidence. A practitioner should ask which Region is approved, which records must be retained, which records must be deleted, and whether downstream copies such as embeddings or logs follow the same rule.
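For stored transcripts, an S3 lifecycle rule can enforce deletion automatically. A sketch with a placeholder bucket, prefix, and 90-day window; embeddings, indexes, and logs derived from the same data need their own matching rules:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-ai-transcripts",  # placeholder
    LifecycleConfiguration={
        "Rules": [{
            "ID": "ExpireTranscripts",
            "Filter": {"Prefix": "transcripts/"},
            "Status": "Enabled",
            "Expiration": {"Days": 90},  # placeholder retention window
        }]
    },
)
```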
Privacy review workflow:
- Classify the data that will enter the AI workflow.
- Decide whether each data element is necessary for the task.
- Remove, mask, tokenize, or aggregate sensitive fields where possible.
- Confirm encryption at rest and in transit for each storage and service boundary.
- Store secrets in an approved secrets manager and grant access only to runtime roles that need them.
- Decide whether logs will contain prompts, responses, or retrieved context.
- Define retention, deletion, Region, and access-review requirements before launch.
Scenario: a financial services team wants to summarize customer calls. The model may need the call transcript, but it probably does not need full payment card details or authentication secrets. The design should redact or avoid sensitive fields, protect the transcript store, limit who can view summaries, and control logs. A human review step may also be required if summaries affect customer decisions.
Scenario: an engineering team wants a code assistant that can query internal systems. Secrets Manager can store API credentials, but the assistant should not reveal those credentials to users or include them in model prompts. The backend can call approved APIs using a role, while the model receives only the minimum result needed to answer the user. That separation is a basic privacy and security pattern.
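A minimal sketch of that separation, with a hypothetical internal API URL and secret name; the credential stays in the backend, and only the minimal result reaches the prompt:

```python
import boto3
import requests  # assumed available in the backend environment

secrets = boto3.client("secretsmanager")

def build_prompt(question: str) -> str:
    # The backend runtime role reads the credential; the model never sees it.
    api_key = secrets.get_secret_value(
        SecretId="prod/code-assistant/internal-api"  # placeholder name
    )["SecretString"]

    # Hypothetical internal API call made by the backend, not the model.
    result = requests.get(
        "https://internal.example.com/api/status",  # hypothetical URL
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    ).json()

    # Only the field the answer needs enters the prompt.
    return f"Question: {question}\nSystem status: {result.get('status')}"
```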
In AWS Skill Builder practice, pay attention to where data lands. A lab might show S3 encryption, KMS keys, Secrets Manager, VPC settings, or CloudWatch Logs. Do not memorize them as isolated services. Map them onto the AI path: source data, prompt, model call, retrieved context, output, logs, and action systems.
Practice questions:
- A developer pasted a database password into a prompt while testing an AI assistant. What is the best practitioner response?
- A RAG assistant needs to answer product policy questions. Which privacy principle should be applied first?
- A company wants to reduce public network exposure for an internal AI application that calls AWS services. Which control family is most relevant?