10.2 Document Intelligence and Compliance Review Lab

Key Takeaways

  • Document intelligence projects should separate capture, extraction, classification, summarization, human review, retention, and audit evidence.
  • Amazon Textract, Amazon Comprehend, Amazon Bedrock, Amazon A2I, S3, KMS, Macie, CloudTrail, and Audit Manager can each play a different role in a compliant workflow.
  • Generated summaries should assist reviewers, not replace accountable compliance or legal decisions in high-impact document workflows.
  • Failure modes include poor scan quality, wrong document type classification, missing tables, leaked PII, unsupported conclusions, and unclear reviewer authority.
  • A strong lab requires a decision log, exception queue, sample evidence package, and review prompts for privacy, accuracy, and operational ownership.

Last updated: May 2026

Lab scenario: contract and policy review intake

A financial services operations team receives vendor contracts, policy attestations, insurance certificates, and audit questionnaires. Reviewers manually open PDFs, copy fields into a case system, flag missing clauses, and write a short compliance summary. The team wants AI assistance to extract fields, identify document type, summarize key obligations, and route exceptions. The chief compliance officer says the AI may assist reviewers, but it may not make final legal or compliance decisions.

Start by splitting the workflow into stages. Document upload and storage are not the same as text extraction. Extraction is not the same as legal interpretation. Summarization is not the same as approval. A practitioner should map each stage to the service that fits the job and the risk control that keeps the workflow auditable. This prevents the common mistake of asking one foundation model to read everything, decide everything, and store everything with no review trail.

Workflow stage | AWS fit | Practitioner control question
Secure intake | Amazon S3 with KMS encryption and strict IAM | Who can upload, read, tag, delete, and restore documents?
Sensitive data discovery | Amazon Macie for S3 data discovery | Are PII and sensitive buckets known before documents enter AI workflows?
OCR and forms | Amazon Textract | Are forms, tables, signatures, and low-quality scans tested separately?
Classification and entities | Amazon Comprehend or Bedrock classification prompts | Are document type and field extraction accurate enough for routing?
Summary and clause review | Amazon Bedrock with a controlled prompt and citations to extracted text | Does the model state uncertainty and avoid unsupported legal conclusions?
Human review | Amazon A2I or an internal review queue | Which exceptions require a person, and what evidence do they see?
Audit evidence | CloudTrail, CloudWatch, AWS Config, Audit Manager, and ticket history | Can the organization prove who accessed and approved each document?

The pilot should use non-sensitive sample documents or approved redacted copies. Store them in a dedicated S3 bucket with default encryption using AWS KMS, bucket policies that deny public access, and least-privilege IAM roles. Enable object versioning if the retention plan requires it. Use tags such as document type, business unit, review status, data classification, and retention category. If Macie flags unexpected sensitive data in a location that should hold redacted samples only, stop and fix the intake process before expanding the lab.
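
A minimal boto3 sketch of those intake controls follows. The bucket name, KMS key ARN, and tag set are placeholders for values your organization would approve, not prescribed names.

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "doc-intel-intake-lab"            # hypothetical bucket name
    KMS_KEY_ARN = "arn:aws:kms:REGION:ACCOUNT:key/KEY-ID"  # placeholder ARN

    # Default encryption with the team's approved KMS key.
    s3.put_bucket_encryption(
        Bucket=BUCKET,
        ServerSideEncryptionConfiguration={
            "Rules": [{
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": KMS_KEY_ARN,
                }
            }]
        },
    )

    # Block every public access path.
    s3.put_public_access_block(
        Bucket=BUCKET,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )

    # Versioning, if the retention plan requires it.
    s3.put_bucket_versioning(
        Bucket=BUCKET, VersioningConfiguration={"Status": "Enabled"}
    )

    # Upload a redacted sample with the tags the workflow expects.
    with open("vendor-contract-001.pdf", "rb") as f:
        s3.put_object(
            Bucket=BUCKET,
            Key="samples/vendor-contract-001.pdf",
            Body=f,
            Tagging="document_type=contract&review_status=pending"
                    "&data_classification=redacted-sample",
        )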

Use Amazon Textract when the task is extracting printed or handwritten text, key-value pairs, and tables from documents. Test different formats: a clean PDF, a scanned image, a document with rotated pages, and a form with multiple tables. Expected observations include text blocks, detected forms, table cells, confidence values, and pages that need review. Failure modes include missing small print, merged table cells, low-confidence handwriting, and signatures being treated as ordinary marks. Do not hide confidence values from reviewers if they are important to the decision.
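
As a sketch, the synchronous AnalyzeDocument call below requests forms and tables and keeps confidence visible. The 90 percent threshold is a hypothetical starting point to tune against labeled samples, and multi-page scanned PDFs would need the asynchronous StartDocumentAnalysis API instead.

    import boto3

    textract = boto3.client("textract")

    # Synchronous analysis suits single-page images and PDFs; multi-page
    # scanned PDFs need the asynchronous StartDocumentAnalysis API.
    response = textract.analyze_document(
        Document={"S3Object": {"Bucket": "doc-intel-intake-lab",
                               "Name": "samples/vendor-contract-001.pdf"}},
        FeatureTypes=["FORMS", "TABLES"],
    )

    # Keep confidence visible: surface every block under a review threshold.
    REVIEW_THRESHOLD = 90.0  # hypothetical; tune on labeled samples
    for block in response["Blocks"]:
        confidence = block.get("Confidence")  # PAGE blocks carry no confidence
        if confidence is not None and confidence < REVIEW_THRESHOLD:
            print(block["BlockType"], block.get("Text", ""), round(confidence, 1))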

Add classification and summarization as separate steps. Amazon Comprehend can help with entity extraction and classification patterns for text, while Bedrock can generate a structured summary from extracted text. The prompt should ask for a concise summary, missing required fields, cited source snippets, and a list of uncertainties. It should not ask the model to decide whether a contract is legally acceptable. For high-impact reviews, route low-confidence extraction, missing clauses, conflicting clauses, or disallowed document types to human reviewers.
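
A hedged sketch of the two-step pattern might look like the following. The model ID is a placeholder for whatever model the organization has approved, and the prompt wording is illustrative rather than prescriptive.

    import boto3

    comprehend = boto3.client("comprehend")
    bedrock = boto3.client("bedrock-runtime")

    extracted_text = "..."  # text assembled from Textract blocks in the prior step

    # Entity extraction as a separate, inspectable step.
    entities = comprehend.detect_entities(
        Text=extracted_text[:4000],  # stay well under the service's size limit
        LanguageCode="en",
    )

    # The prompt asks for citations and uncertainty, never legal judgment.
    prompt = (
        "Summarize the key obligations in the contract text below. List any "
        "required fields that are missing, quote the exact source snippet that "
        "supports each claim, and flag anything you are uncertain about. Do not "
        "state whether the contract is legally acceptable.\n\n" + extracted_text
    )

    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder: use an approved model
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"temperature": 0.0},
    )
    summary = response["output"]["message"]["content"][0]["text"]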

Decision log example:

  • Intake bucket: dedicated S3 bucket with KMS encryption, blocked public access, and restricted roles.
  • Extraction service: Textract for OCR, forms, and tables because the source is scanned PDFs.
  • Summarization service: Bedrock with a prompt that requires citations to extracted text and uncertainty flags.
  • Human review trigger: missing signature, low extraction confidence, unsupported summary claim, sensitive data mismatch, or high-value vendor category (encoded as the routing sketch after this list).
  • Evidence owner: compliance operations owns review notes, while cloud operations owns access logs and storage controls.
  • Retention rule: documents and outputs follow approved records policy, not model developer preference.
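
The human review triggers above can live in explicit code rather than tribal knowledge. A minimal routing sketch follows; the field names, allowed document types, and confidence floor are assumptions to adapt, not a prescribed schema.

    from dataclasses import dataclass, field

    ALLOWED_TYPES = {"contract", "attestation", "insurance_certificate", "questionnaire"}
    CONFIDENCE_FLOOR = 90.0  # hypothetical threshold

    @dataclass
    class ExtractionResult:
        document_type: str
        has_signature: bool
        min_field_confidence: float
        unsupported_summary_claims: list = field(default_factory=list)
        sensitive_data_mismatch: bool = False
        vendor_category: str = "standard"

    def review_reasons(result: ExtractionResult) -> list:
        """Return escalation reasons; an empty list means fields may auto-populate."""
        reasons = []
        if result.document_type not in ALLOWED_TYPES:
            reasons.append("disallowed document type")
        if not result.has_signature:
            reasons.append("missing signature")
        if result.min_field_confidence < CONFIDENCE_FLOOR:
            reasons.append("low extraction confidence")
        if result.unsupported_summary_claims:
            reasons.append("summary claim without cited source text")
        if result.sensitive_data_mismatch:
            reasons.append("sensitive data mismatch")
        if result.vendor_category == "high_value":
            reasons.append("high-value vendor category")
        return reasons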

A responsible review queue shows the original document, extracted fields, confidence indicators, generated summary, citations, and reason for escalation. Reviewers should be able to approve, reject, correct fields, request more information, or mark the document out of scope. Their decision becomes the business record. The generated summary is supporting evidence, not the authority. If the team cannot explain how a reviewer changed or accepted AI output, the workflow is not ready for regulated use.
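
One way to make the reviewer decision a durable business record is a structured decision object that captures what the reviewer did with the AI output. The shape below is a hypothetical sketch, not a prescribed schema.

    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass(frozen=True)
    class ReviewDecision:
        document_id: str
        reviewer_id: str
        action: str              # approve, reject, correct_fields, request_info, out_of_scope
        escalation_reason: str   # why the document reached the queue
        corrected_fields: dict   # reviewer edits to AI-extracted values
        summary_accepted: bool   # the generated summary is evidence, not the authority
        decided_at: str = field(
            default_factory=lambda: datetime.now(timezone.utc).isoformat()
        )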

Compliance failure modes deserve explicit testing. Upload a document with a missing signature and confirm it routes to review. Upload a document with a table that Textract partially reads and confirm the low-confidence field is visible. Include a prompt injection inside a document, such as instructions telling the model to ignore policy, and verify that guardrails and prompt design keep the model from following it. Include a document with customer PII and confirm masking, access, logging, and retention behavior match the data classification.
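
The routing rules from the earlier sketch can be unit tested directly; the prompt injection and PII checks need end-to-end fixtures, but the pattern is the same. A pytest-style sketch, reusing the hypothetical ExtractionResult and review_reasons from above:

    def test_missing_signature_routes_to_review():
        result = ExtractionResult(document_type="contract",
                                  has_signature=False,
                                  min_field_confidence=97.0)
        assert "missing signature" in review_reasons(result)

    def test_low_confidence_field_routes_to_review():
        result = ExtractionResult(document_type="contract",
                                  has_signature=True,
                                  min_field_confidence=71.5)
        assert "low extraction confidence" in review_reasons(result)

    def test_sensitive_data_mismatch_routes_to_review():
        result = ExtractionResult(document_type="contract",
                                  has_signature=True,
                                  min_field_confidence=99.0,
                                  sensitive_data_mismatch=True)
        assert "sensitive data mismatch" in review_reasons(result)

    def test_clean_document_may_auto_populate():
        result = ExtractionResult(document_type="contract",
                                  has_signature=True,
                                  min_field_confidence=99.0)
        assert review_reasons(result) == []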

Review prompts before the quiz:

  • Which documents are safe sample inputs, and who approved their use?
  • Which extracted fields can be auto-populated, and which require reviewer confirmation?
  • What evidence would an auditor need to reconstruct a review decision?
  • How are low confidence, missing clauses, and sensitive data mismatches routed?
  • What would make this workflow inappropriate for AI assistance until governance improves?

Test Your Knowledge

A team needs to extract text, tables, and key-value pairs from scanned vendor forms before summarization. Which AWS service is the best fit for the extraction stage?

Test Your Knowledge

A generated contract summary says a required clause is present, but no extracted text supports the claim. What should the workflow do?

Test Your Knowledge

Which control best addresses unexpected sensitive data appearing in the document intake S3 bucket?
