4.5 Azure AI Translator Service
Key Takeaways
- Azure AI Translator supports text translation across 135 languages, document translation that preserves formatting, and Custom Translator for domain-specific models.
- Translator uses a dedicated global endpoint (api.cognitive.microsofttranslator.com) and requires the Ocp-Apim-Subscription-Region header in addition to the key.
- The /translate operation accepts a 'to' array for multi-target translation in one call and can auto-detect the source when 'from' is omitted.
- Transliteration converts between scripts of the SAME language; it does not translate meaning.
- Custom Translator trains on parallel sentences (≈10,000 minimum), is evaluated with BLEU score, and is invoked through a custom category ID.
Quick Answer: Azure AI Translator provides text translation across 135 languages, document translation that preserves formatting, and Custom Translator for domain models. It uses a unique global endpoint and, unlike most Azure AI services, requires a region header in addition to the key. The /translate operation takes a 'to' array for multi-language output in one request.
The Translator Endpoint Is Different
Most Azure AI services use https://<resource>.cognitiveservices.azure.com with only Ocp-Apim-Subscription-Key. Translator uses the global host https://api.cognitive.microsofttranslator.com and additionally requires Ocp-Apim-Subscription-Region whenever the resource is regional or multi-service (a global single-service Translator resource can omit it). Omitting a required region header is one of the most common Translator failures and returns 401.
import requests, uuid
endpoint = "https://api.cognitive.microsofttranslator.com"
params = {"api-version": "3.0", "from": "en", "to": ["fr", "de", "es"]}
headers = {
    "Ocp-Apim-Subscription-Key": "<key>",
    "Ocp-Apim-Subscription-Region": "eastus",  # REQUIRED
    "Content-Type": "application/json",
    "X-ClientTraceId": str(uuid.uuid4()),  # optional ID for tracing a request
}
body = [{"text": "Hello, how are you?"}]
r = requests.post(endpoint + "/translate", params=params, headers=headers, json=body)
r.raise_for_status()  # surface 400/401 errors before reading the JSON
for t in r.json()[0]["translations"]:
    print(t["to"], t["text"])  # fr ... / de ... / es ...
If you omit from, Translator auto-detects the source and returns a detectedLanguage object with a confidence score.
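To make the auto-detect behaviour concrete, here is a minimal sketch of reading a /translate response. The `summarize` helper and the `sample` payload are illustrative assumptions: the sample is hand-written in the shape the service is understood to return, not captured output.

```python
def summarize(response_items):
    """Flatten a /translate response into (language, score, {target: text}) tuples."""
    rows = []
    for item in response_items:
        # detectedLanguage is only expected when 'from' was omitted from the request.
        detected = item.get("detectedLanguage", {})
        translations = {t["to"]: t["text"] for t in item["translations"]}
        rows.append((detected.get("language"), detected.get("score"), translations))
    return rows


# Hand-written sample shaped like a /translate response (not real service output).
sample = [{
    "detectedLanguage": {"language": "en", "score": 1.0},
    "translations": [
        {"text": "Bonjour", "to": "fr"},
        {"text": "Hallo", "to": "de"},
    ],
}]

for language, score, translations in summarize(sample):
    print(language, score, translations)
```

When 'from' is supplied, the helper simply reports None for the detected language and score, which keeps calling code identical in both modes.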
Key API Operations
| Operation | Path | Purpose |
|---|---|---|
| Translate | /translate | text → one or more target languages |
| Detect | /detect | identify source language + score |
| Transliterate | /transliterate | convert script (same language) |
| Dictionary Lookup | /dictionary/lookup | alternative translations of a word |
| Dictionary Examples | /dictionary/examples | example sentences for a pair |
| Languages | /languages | list supported languages (no auth) |
Useful translate parameters
| Parameter | Effect |
|---|---|
| textType | plain (default) or html (skips tags) |
| profanityAction | NoAction, Marked, or Deleted |
| suggestedFrom | fallback source if detection fails |
| category | custom model category ID |
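The parameters in the table can be combined in one query string. The sketch below assembles them with a hypothetical helper, `build_translate_params`, which is not part of any SDK; the category value shown is a placeholder, not a real model ID.

```python
def build_translate_params(to, source=None, text_type="plain",
                           profanity_action="NoAction",
                           suggested_from=None, category=None):
    """Assemble query parameters for POST /translate, leaving unset options out."""
    if text_type not in ("plain", "html"):
        raise ValueError("textType must be 'plain' or 'html'")
    if profanity_action not in ("NoAction", "Marked", "Deleted"):
        raise ValueError("profanityAction must be NoAction, Marked, or Deleted")

    params = {
        "api-version": "3.0",
        "to": list(to),
        "textType": text_type,
        "profanityAction": profanity_action,
    }
    if source:
        params["from"] = source
    elif suggested_from:
        # Only meaningful when the source language is being auto-detected.
        params["suggestedFrom"] = suggested_from
    if category:
        params["category"] = category
    return params


params = build_translate_params(
    ["fr", "de"],
    text_type="html",
    profanity_action="Marked",
    suggested_from="en",
    category="<custom-category-id>",  # placeholder
)
print(params)
```

The resulting dictionary can be passed as `params=` to `requests.post`, which repeats the 'to' key once per target language.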
Transliteration vs. Translation
Transliteration rewrites text in a different script while keeping the same language, e.g. Japanese hiragana こんにちは → Latin konnichiha. It does not change meaning. Translation changes language; transliteration changes alphabet. Exam items frequently swap these definitions.
params = {"api-version": "3.0", "language": "ja",
          "fromScript": "jpan", "toScript": "latn"}
body = [{"text": "こんにちは"}] # -> "konnichiha"
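The difference between the two operations is visible in the request itself. As a rough sketch, the two hypothetical helpers below (`transliterate_request` and `translate_request`, both invented for illustration) build the path, query parameters, and body for the same input text without calling the service.

```python
def transliterate_request(text, language, from_script, to_script):
    """Same language, different script: no target language is involved."""
    params = {
        "api-version": "3.0",
        "language": language,
        "fromScript": from_script,
        "toScript": to_script,
    }
    return "/transliterate", params, [{"text": text}]


def translate_request(text, to, source=None):
    """Different language: a 'to' target is mandatory, 'from' is optional."""
    params = {"api-version": "3.0", "to": [to]}
    if source:
        params["from"] = source
    return "/translate", params, [{"text": text}]


text = "こんにちは"
print(transliterate_request(text, "ja", "jpan", "latn"))
print(translate_request(text, "en", source="ja"))
```

Only the translate request names a target language; the transliterate request names one language and two scripts, which is the distinction exam items tend to probe.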
Document Translation
Document translation preserves layout for .docx, .xlsx, .pptx, .pdf, .html, .txt/.csv, .rtf, .md and runs as an async batch:
- Upload source files to a Blob source container.
- Create a target container.
- Generate SAS tokens (or use managed identity) for both.
- POST a batch job specifying source/target containers + target language.
- Poll job status until Succeeded.
- Download translated files (formatting intact) from the target container.
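The submit-and-poll part of the steps above can be sketched as follows. This is an outline under assumptions rather than a definitive client: the body shape (`inputs`, `sourceUrl`, `targets`, `targetUrl`) and the status names reflect the Document Translation REST API as generally documented, the container URLs are placeholders, and `get_status` is a hypothetical callable that would fetch the job status from the service.

```python
import time

# Statuses after which polling should stop (assumed from the REST API's job states).
TERMINAL_STATUSES = {"Succeeded", "Failed", "Cancelled", "ValidationFailed"}


def build_batch_body(source_url, target_url, target_language):
    """Build the JSON body for one source container and one target container."""
    return {
        "inputs": [{
            "source": {"sourceUrl": source_url},
            "targets": [{"targetUrl": target_url, "language": target_language}],
        }]
    }


def wait_for_job(get_status, interval_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Call get_status() until it returns a terminal status or polls run out."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval_seconds)
    raise TimeoutError("translation job did not reach a terminal status")


body = build_batch_body(
    "https://<account>.blob.core.windows.net/source?<sas>",  # placeholder
    "https://<account>.blob.core.windows.net/target?<sas>",  # placeholder
    "fr",
)

# Simulated status sequence standing in for real polling responses.
fake_statuses = iter(["NotStarted", "Running", "Succeeded"])
final = wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final)
```

Injecting `get_status` and `sleep` keeps the polling logic testable offline; a real implementation would replace the lambda with an authenticated GET against the job URL returned when the batch is created.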
Custom Translator
When industry terminology must translate consistently, train a custom model on parallel text (aligned source-target sentence pairs).
| Aspect | Detail |
|---|---|
| Document types | training, tuning, testing, phrase dictionary, sentence dictionary |
| Minimum data | ~10,000 parallel sentences for meaningful gains |
| Quality metric | BLEU (Bilingual Evaluation Understudy) score |
| Invocation | pass category=<custom-category-id> on /translate |
A phrase dictionary forces exact replacements (brand names) without retraining; a sentence dictionary enforces whole-sentence translations. Both can supplement a trained model.
Common Trap: BLEU (not F1, not accuracy) evaluates translation quality. And remember the region header: exam scenarios describing a 401 from Translator while "the key is correct" point at the missing Ocp-Apim-Subscription-Region. Choosing Document Translation vs. text /translate hinges on whether formatting must be preserved across whole files.
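To give a feel for what BLEU measures, here is a deliberately simplified sentence-level sketch: clipped n-gram precision up to bigrams, combined by geometric mean and scaled by a brevity penalty. It is an illustration of the idea only. Custom Translator computes its own score over a test set, and `simple_bleu` is a hypothetical function that omits smoothing, multiple references, and proper tokenization.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified BLEU in [0, 1] for one candidate against one reference."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(clipped / total))

    # Penalize candidates shorter than the reference.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)


print(simple_bleu("the cat sat on mat", "the cat sat on the mat"))
print(simple_bleu("the cat sat", "the cat sat"))
```

A higher score means more n-gram overlap with the human reference, which is why BLEU, rather than a classification metric such as F1, is the measure reported for translation models.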
Versioning, Batching, and Limits
The Translator REST surface is versioned with the api-version=3.0 query parameter, and forgetting it returns a 400. Requests are batched: the body is a JSON array of text objects, and a /translate call accepts up to 1,000 array elements, with a combined limit of 50,000 characters across the whole request (other operations, such as /detect, allow fewer elements). Each translation response mirrors that array, so the first input maps to the first result. Designing around these batch limits is a real-world concern the exam reflects in capacity-planning style questions, where the correct answer chunks a large document set rather than sending one giant payload.
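Chunking a large set of strings into request-sized batches can be sketched as below. The helper name `chunk_texts` is hypothetical, and both limits are passed in explicitly so they can be set to whatever the current request limits are; the tiny values in the demo are chosen only to make the splitting visible. Note that `len()` counts Python code points, which is an approximation of how the service counts characters.

```python
def chunk_texts(texts, max_items, max_chars):
    """Split texts into request bodies that respect element and character limits."""
    batches, current, current_chars = [], [], 0
    for text in texts:
        if len(text) > max_chars:
            raise ValueError("a single text exceeds the per-request character limit")
        too_many_items = len(current) >= max_items
        too_many_chars = current_chars + len(text) > max_chars
        if current and (too_many_items or too_many_chars):
            batches.append(current)
            current, current_chars = [], 0
        current.append({"text": text})
        current_chars += len(text)
    if current:
        batches.append(current)
    return batches


# Demo with deliberately small limits: at most 2 items and 8 characters per batch.
batches = chunk_texts(["aaaa", "bbbb", "cc", "dddddd", "e"], max_items=2, max_chars=8)
for batch in batches:
    print([item["text"] for item in batch])
```

Because input order is preserved within and across batches, results can be stitched back together by concatenating the responses in the order the batches were sent.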
Language detection deserves a precise mental model because three different entry points, spread across two services, touch it. The Translator /detect operation returns the language plus a score and alternatives; Azure AI Language has its own detect_language feature; and the Translator /translate operation auto-detects when from is omitted. They overlap but are billed and surfaced differently, so a question that says "detect the language and translate it in a single call" is pointing at omitting from on /translate, not at calling /detect first.
Choosing Translation Approach
Decide among the three Translator capabilities by the artifact and the consistency need. Plain strings or chat messages use text /translate; whole files where layout, tables, and styles must survive use Document Translation through Blob containers; and domain content where terminology must render consistently uses a Custom Translator model invoked by category ID. Layer a phrase dictionary on top when only a handful of brand terms must never change, since that avoids the ten-thousand-sentence training threshold entirely.
These distinctions, plus the mandatory region header and the BLEU evaluation metric, are the highest-yield Translator facts to carry into the exam.
Beyond the subscription key, which header must accompany Azure AI Translator calls and is a frequent cause of 401 errors when missing?
Converting the Japanese text こんにちは into the Latin string "konnichiha" while keeping it Japanese is an example of what?
A team trains a Custom Translator model on aligned bilingual sentences. Which metric reports its translation quality?