7.3 Vision, Speech, Contact Center, and Personalization Services
Key Takeaways
- Amazon Rekognition, Transcribe, Polly, Lex, Connect-related AI features, and Personalize map to different modalities and customer experience workflows.
- Computer vision and speech services can automate high-volume review, transcription, and interaction patterns, but privacy, consent, and human review matter.
- Contact center AI decisions should separate transcription, bot interaction, sentiment or analytics, agent assist, and post-call summarization.
- Amazon Personalize fits recommendation use cases when the organization has interaction, item, and user data that can support useful personalization.
- Practitioners should choose by modality, user impact, risk of incorrect output, latency needs, and whether the workflow affects customers directly.
Matching modality to managed AI services
AWS managed AI services are often easiest to understand by modality. Images and videos point toward Amazon Rekognition. Audio that needs a text transcript points toward Amazon Transcribe. Text that should become speech points toward Amazon Polly. Conversational bot interactions point toward Amazon Lex. Contact center analytics can combine Amazon Connect features with services such as Transcribe and Comprehend, or with foundation model assistance. Recommendations point toward Amazon Personalize.
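The modality-first mapping above can be sketched as a simple lookup. The modality keys below are invented labels for illustration; real service selection also weighs the governance questions discussed throughout this section:

```python
# Hypothetical helper: map an input modality to the managed AWS AI
# service this section associates with it. Keys are illustrative labels.
MODALITY_TO_SERVICE = {
    "image_or_video": "Amazon Rekognition",
    "speech_to_text": "Amazon Transcribe",
    "text_to_speech": "Amazon Polly",
    "conversational_bot": "Amazon Lex",
    "recommendations": "Amazon Personalize",
}

def suggest_service(modality: str) -> str:
    """Return the service to evaluate first, or a prompt to decompose."""
    return MODALITY_TO_SERVICE.get(
        modality, "Decompose the workflow; it likely spans several services"
    )
```

The default branch matters as much as the lookup: a scenario like "improve our contact center" is not one modality, so the right first move is decomposition, not a single service name.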
Amazon Rekognition is the managed computer vision service most practitioners should recognize. It can analyze images and video for use cases such as label detection, content moderation, face-related workflows, and visual inspection patterns. The business question is never just whether the API can detect something. The question is whether the detection is reliable enough for the workflow, whether users gave appropriate consent, whether bias or privacy risk exists, and whether a human must review edge cases.
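The "is detection reliable enough, and when must a human review" question can be made concrete with a triage sketch. The response shape below mirrors Rekognition's DetectModerationLabels output (a `ModerationLabels` list with `Name` and `Confidence` fields); the threshold values are purely illustrative, not recommendations:

```python
def triage_moderation(response: dict, auto_threshold: float = 90.0,
                      review_threshold: float = 50.0) -> str:
    """Triage a Rekognition DetectModerationLabels-style response.

    High-confidence hits can be actioned automatically (but still
    monitored), mid-confidence hits go to human review, and the rest
    pass. Thresholds here are illustrative, not recommendations.
    """
    confidences = [
        label["Confidence"] for label in response.get("ModerationLabels", [])
    ]
    top = max(confidences, default=0.0)
    if top >= auto_threshold:
        return "block_and_log"   # still monitored for false positives
    if top >= review_threshold:
        return "human_review"    # edge cases go to a person
    return "allow"
```

The design point is that the service output is a signal, not a verdict: the business, not the API, owns the thresholds, the review queue, and the appeal process.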
Amazon Transcribe converts speech to text. It fits call recording transcripts, meeting notes, media captions, and searchable audio archives. A transcript can then feed downstream analytics, translation, summarization, or quality review. The practitioner should ask about audio quality, speaker separation needs, vocabulary, language, retention, and whether users were informed that audio is recorded and processed.
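Several of those questions (speaker separation, language) surface directly as request parameters. A minimal sketch of assembling StartTranscriptionJob parameters in boto3 keyword style, with diarization settings added only when actually needed:

```python
def build_transcription_job(job_name: str, media_uri: str,
                            speakers: int = 0) -> dict:
    """Assemble StartTranscriptionJob parameters (boto3 kwargs style).

    Parameter names follow the Transcribe API. Speaker-label settings
    are only included when diarization is actually required, since they
    add cost and complexity the workflow may not need.
    """
    params = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "LanguageCode": "en-US",
    }
    if speakers:
        params["Settings"] = {
            "ShowSpeakerLabels": True,
            "MaxSpeakerLabels": speakers,
        }
    # In a real workflow this dict would be passed to the service:
    # boto3.client("transcribe").start_transcription_job(**params)
    return params
```

Keeping request construction in one place makes retention and consent reviews easier: every recording that enters the pipeline passes through a single, auditable function.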
Amazon Polly turns text into lifelike speech. It can support voice prompts, accessibility features, training content, and interactive applications. Polly is not a speech recognition service; it produces audio from text. Amazon Lex supports conversational interfaces such as chatbots and voice bots. Lex can collect intents and slots, route users, and integrate with business workflows. It is not a general-purpose, open-ended chat model; it is built for structured, task-oriented conversations.
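Lex's intent-and-slot model can be illustrated with a small routing sketch. The intent name, slot name, and return strings below are all invented for illustration; a real bot defines intents and slots in Lex itself, with fulfillment logic typically in a Lambda function:

```python
# Hypothetical fulfillment logic for a structured bot. "CheckOrderStatus"
# and "OrderId" are invented intent/slot names for this sketch.
def fulfill(intent: str, slots: dict) -> str:
    """Route a recognized intent; unknown intents escalate to a human."""
    if intent == "CheckOrderStatus":
        order_id = slots.get("OrderId")
        if not order_id:
            return "elicit_slot:OrderId"   # bot re-prompts for the missing slot
        return f"lookup_order:{order_id}"  # hand off to a business workflow
    return "escalate_to_agent"             # the fallback path is explicit
```

Note how the escalation path is a first-class branch, not an afterthought: this is the "when must a bot hand off to a human" decision the table below asks about.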
Contact center scenarios usually combine services. A customer calls a support line. Lex may handle simple self-service intents. Transcribe may create a transcript. Contact center analytics may evaluate sentiment, topics, or agent performance. A foundation model may summarize the call or draft after-call notes. The practitioner should break the workflow into steps before naming one service as the complete answer.
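That decomposition can be sketched as a pipeline with one function per step. Every function here is a placeholder stub standing in for a different service call (Transcribe, an analytics or Comprehend step, a foundation model), which is exactly the point: each step has its own service, owner, and controls:

```python
# Placeholder stubs; in a real workflow each would call a different service.
def transcribe(uri: str) -> str:          # Amazon Transcribe
    return f"transcript-of:{uri}"

def analyze_sentiment(text: str) -> str:  # analytics / Comprehend step
    return "NEUTRAL"

def summarize(text: str) -> str:          # foundation model step
    return text[:40]

def post_call_pipeline(recording_uri: str) -> dict:
    """Decompose one call into separately owned processing steps."""
    transcript = transcribe(recording_uri)
    return {
        "transcript": transcript,
        "sentiment": analyze_sentiment(transcript),
        "summary": summarize(transcript),
    }
```

Naming the steps separately also clarifies latency: the bot interaction is live, while transcription, sentiment, and summarization can often run after the call ends.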
Amazon Personalize is designed for recommendation and personalization use cases. It can help recommend products, articles, media, or actions when the organization has useful interaction data. The service choice depends on whether the business has enough events, item metadata, and user context to create valuable recommendations. Personalization can improve experience, but it can also reinforce bias, create privacy concerns, or produce poor results if data is sparse.
| Scenario signal | AWS service to consider | Governance or boundary question |
|---|---|---|
| Analyze images or video | Amazon Rekognition | Are privacy, consent, bias, and human review addressed? |
| Convert speech to text | Amazon Transcribe | Are audio quality, language, retention, and transcript accuracy acceptable? |
| Convert text to speech | Amazon Polly | Is generated voice appropriate for the user experience and accessibility need? |
| Build a structured chatbot or voice bot | Amazon Lex | Are intents, fallback paths, and escalation to humans clear? |
| Improve contact center insights | Amazon Connect features plus AI services | Which step needs AI: bot, transcript, sentiment, summary, or agent assist? |
| Recommend content or products | Amazon Personalize | Is there enough interaction and item data to support useful recommendations? |
A common service-selection trap is confusing transcription with understanding. Transcribe can create text from a call, but it does not by itself decide why the customer is upset or which policy applies. That may require Comprehend, a contact center analytics feature, Bedrock summarization, a rules engine, or human review. Keep the workflow components separate so ownership and controls are clear.
Another trap is using computer vision for decisions that require a deterministic or legally defensible outcome without review. For example, a moderation workflow may use Rekognition to flag likely unsafe content, but a high-impact enforcement action may need human review, clear appeal processes, and monitoring for false positives. The service helps scale triage. It does not remove accountability.
Personalization also needs guardrails. A model can recommend popular items while ignoring new inventory, over-personalize based on sensitive behavior, or create a narrow user experience. Practitioners should ask what business metric matters, what data is collected, how consent is handled, whether users can opt out, and how recommendation quality is monitored. A simple top-sellers list may be enough for a small catalog.
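The sparse-data guardrail can be sketched as a fallback. The event-count cutoff below is illustrative rather than a Personalize requirement, and `get_personalized` is a placeholder standing in for a GetRecommendations call:

```python
def get_personalized(user_id: str) -> list:
    # Placeholder for a real call such as:
    # personalize_runtime.get_recommendations(campaignArn=..., userId=user_id)
    return [f"item-for-{user_id}"]

def recommend(user_id: str, event_count: int, top_sellers: list,
              min_events: int = 1000) -> list:
    """Fall back to a top-sellers list when interaction data is sparse.

    The min_events cutoff is an illustrative assumption; a real system
    would validate the threshold against offline recommendation quality.
    """
    if event_count < min_events:
        return top_sellers[:5]        # simple, explainable default
    return get_personalized(user_id)  # ML-driven recommendations
```

Making the fallback explicit keeps the business honest about when personalization is actually earning its complexity versus when a static list would do.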
A practical contact center review checklist:
- Identify whether the business needs deflection, transcript search, quality scoring, sentiment insight, summarization, or agent assistance.
- Separate live customer experience requirements from after-call analytics requirements.
- Define when a bot must escalate to a human.
- Confirm data retention, recording notice, consent, and access controls.
- Test with noisy audio, accents, domain terms, frustrated customers, and incomplete conversations.
- Monitor customer feedback, containment rate, error patterns, and cost per interaction.
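Of those monitoring metrics, containment rate is straightforward to compute from call records. The `escalated` field name below is an assumption of this sketch, not a Connect data schema:

```python
def containment_rate(calls: list) -> float:
    """Share of calls the bot resolved without escalating to a human.

    Each call record is assumed to carry an 'escalated' flag; the field
    name is invented for this sketch.
    """
    if not calls:
        return 0.0
    contained = sum(1 for call in calls if not call["escalated"])
    return contained / len(calls)
```

Tracked over time and alongside customer feedback, this single ratio can reveal whether bot changes are deflecting calls or merely frustrating customers into hanging up.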
In AWS Skill Builder or a sandbox, compare the service inputs and outputs. Uploading an image to a vision service, transcribing a short audio sample, and configuring a simple bot reveal very different operational questions. The study goal is not to become a contact center engineer. It is to identify which managed AI capability belongs in which part of the customer journey.
A media team wants searchable text transcripts from recorded interviews. Which managed AWS service is the best starting point?
An ecommerce company has product metadata and user interaction events and wants individualized product recommendations. Which service is most directly aligned?
A company wants to flag potentially unsafe user-uploaded images for review. Which service is the most relevant managed computer vision starting point?