5.2 AI Enrichment with Skillsets

Key Takeaways

  • A skillset is an ordered array of skills; each skill declares a context, inputs (source paths), and outputs (targetName) against the enrichment tree.
  • Built-in skills cover OCR, image analysis, V3 entity recognition, key phrases, language detection, V3 sentiment, PII detection, translation, and the Split skill used to chunk text for RAG.
  • The enrichment tree is an in-memory hierarchy rooted at /document; the context controls iteration scope, e.g. /document/normalized_images/* fans a skill out over every image.
  • Custom Web API skills must honor the values/recordId/data request-response contract and respect batchSize and a timeout up to PT230S.
  • Knowledge stores persist enrichment to table projections (Power BI), object projections (JSON in Blob), and file projections (normalized images); incremental enrichment caches outputs to avoid reprocessing.
Last updated: June 2026

Quick Answer: A skillset is an ordered array of skills. Each skill has a context (where it runs), inputs (source paths in the enrichment tree), and outputs (named results). Built-in skills handle OCR, NER, key phrases, sentiment, PII, translation, and text splitting; custom Web API skills call your own HTTP endpoint. Knowledge stores persist enriched data to Table/Blob storage. Incremental enrichment caches results so unchanged documents are not reprocessed.

Anatomy of a Skill

Every skill, built-in or custom, shares the same shape:

{
  "@odata.type": "#Microsoft.Skills.Text.V3.EntityRecognitionSkill",
  "context": "/document",
  "categories": ["Organization", "Person", "Location"],
  "inputs":  [{"name": "text", "source": "/document/content"}],
  "outputs": [{"name": "organizations", "targetName": "organizations"}]
}
  • context sets where the skill executes and where its outputs attach. /document runs once per document; /document/normalized_images/* runs once per image and writes a result on each image node.
  • inputs.source is the read path; outputs.targetName is the new node name written under the context.

A broken skillset on the exam is usually a path mismatch: a skill reads /document/text when the content node is /document/content, or an OCR skill uses context: /document instead of /document/normalized_images/*, so it never iterates the images.
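The path rules above can be made concrete with a toy model. The following is a minimal sketch, assuming the enrichment tree can be approximated by nested Python dicts and lists; the `resolve` helper and the sample tree are hypothetical illustrations, not part of any Azure SDK.

```python
from typing import Any, List


def resolve(tree: dict, path: str) -> List[Any]:
    """Return every node matched by a path; '*' fans out over a list node."""
    nodes = [tree]
    for part in path.strip("/").split("/"):
        matched = []
        for node in nodes:
            if part == "*" and isinstance(node, list):
                matched.extend(node)          # one match per list element
            elif isinstance(node, dict) and part in node:
                matched.append(node[part])
        nodes = matched
    return nodes


# Toy tree: one document with body text and two extracted images.
tree = {
    "document": {
        "content": "Contoso signed the lease.",
        "normalized_images": [{"data": "img-0"}, {"data": "img-1"}],
    }
}

print(len(resolve(tree, "/document")))                      # 1: runs once per document
print(len(resolve(tree, "/document/normalized_images/*")))  # 2: runs once per image
print(resolve(tree, "/document/text"))                      # []: wrong path, nothing to read
```

The empty list in the last call is the toy equivalent of the path mismatch described above: the skill has no input, so it produces no output.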

Built-in Skills You Must Know

Skill | @odata.type | Output
OCR | #Microsoft.Skills.Vision.OcrSkill | text (per image)
Image Analysis | #Microsoft.Skills.Vision.ImageAnalysisSkill | tags, description, captions
Entity Recognition (V3) | #Microsoft.Skills.Text.V3.EntityRecognitionSkill | organizations, persons, locations
Key Phrase Extraction | #Microsoft.Skills.Text.KeyPhraseExtractionSkill | keyPhrases
Language Detection | #Microsoft.Skills.Text.LanguageDetectionSkill | languageCode
Sentiment (V3) | #Microsoft.Skills.Text.V3.SentimentSkill | sentiment, confidenceScores
PII Detection | #Microsoft.Skills.Text.PIIDetectionSkill | piiEntities, maskedText
Translation | #Microsoft.Skills.Text.TranslationSkill | translatedText
Text Split | #Microsoft.Skills.Text.SplitSkill | textItems (chunks)
Merge | #Microsoft.Skills.Text.MergeSkill | mergedText
Shaper | #Microsoft.Skills.Util.ShaperSkill | reshaped object

Two RAG-critical skills: The Split skill chunks long text into passages (textSplitMode: pages, with maximumPageLength and pageOverlapLength) so each chunk fits an embedding model's token limit; any document longer than that limit must be chunked before vectorization. The Merge skill stitches OCR text from images back into the document body so downstream text skills see the full content.
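To see what maximumPageLength and pageOverlapLength control, here is a simplified sketch that cuts fixed-size character windows with overlap. It is an illustration only: the `split_pages` function is hypothetical, and the real Split skill is more careful about where it breaks text, so do not treat this as its algorithm.

```python
from typing import List


def split_pages(text: str, maximum_page_length: int, page_overlap_length: int = 0) -> List[str]:
    """Cut `text` into windows of at most `maximum_page_length` characters.

    Each window after the first starts `page_overlap_length` characters
    before the previous window ended, so neighbouring chunks share context.
    """
    if maximum_page_length <= 0:
        raise ValueError("maximum_page_length must be positive")
    if not 0 <= page_overlap_length < maximum_page_length:
        raise ValueError("page_overlap_length must be in [0, maximum_page_length)")

    step = maximum_page_length - page_overlap_length
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + maximum_page_length])
        if start + maximum_page_length >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


print(split_pages("abcdefghij", maximum_page_length=4, page_overlap_length=1))
# ['abcd', 'defg', 'ghij']
```

A larger overlap repeats more text across chunks, which costs more embedding calls but reduces the chance that a relevant sentence is cut in half at a chunk boundary.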

Most text skills impose a per-record character limit (about 50,000 characters for entity recognition, key phrases, and PII detection; 5,000 for V3 sentiment), which is another reason long documents must be split before enrichment. The newer AzureOpenAIEmbedding skill calls an Azure OpenAI deployment to generate embeddings inside the skillset itself. Pairing a Split skill (chunk) with an embedding skill (vectorize) is the canonical integrated vectorization pattern, which builds a RAG-ready index in one indexer run.
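The chunk-then-vectorize pairing might look like the sketch below, written as Python dicts that serialize to the skillset JSON. The property names reflect my understanding of the skill definitions and should be checked against the current skill reference; the resource URI, deployment name, page sizes, and target names are placeholders chosen for illustration.

```python
import json

split_skill = {
    "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
    "context": "/document",
    "textSplitMode": "pages",
    "maximumPageLength": 2000,   # illustrative value
    "pageOverlapLength": 500,    # illustrative value
    "inputs": [{"name": "text", "source": "/document/content"}],
    "outputs": [{"name": "textItems", "targetName": "pages"}],
}

embedding_skill = {
    "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
    "context": "/document/pages/*",                             # one call per chunk
    "resourceUri": "https://example-openai.openai.azure.com",   # placeholder
    "deploymentId": "example-embedding-deployment",             # placeholder
    "modelName": "text-embedding-3-small",
    "inputs": [{"name": "text", "source": "/document/pages/*"}],
    "outputs": [{"name": "embedding", "targetName": "text_vector"}],
}


def output_path(skill: dict, output_name: str) -> str:
    """Path where a skill writes the given output: context + '/' + targetName."""
    target = next(o["targetName"] for o in skill["outputs"] if o["name"] == output_name)
    return skill["context"].rstrip("/") + "/" + target


# The embedding skill must iterate over exactly the node the Split skill wrote.
chunk_path = output_path(split_skill, "textItems") + "/*"
print(chunk_path == embedding_skill["context"])   # True
print(json.dumps([split_skill, embedding_skill], indent=2)[:60])
```

The check at the end is the same path discipline as everywhere else in a skillset: the Split skill writes /document/pages, so the embedding skill's context and source both use /document/pages/*.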

Image Enrichment Prerequisite

For OCR to run, the indexer (not the skillset) must extract images. Set imageAction: "generateNormalizedImages" and dataToExtract: "contentAndMetadata" in the indexer's parameters.configuration. This populates /document/normalized_images/*, which the OCR skill then reads. Without generateNormalizedImages, embedded images are silently ignored and the OCR skill produces nothing.
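As a sketch of how the two halves fit together, the dicts below mirror the indexer configuration and an OCR skill, and a small hypothetical lint function flags the two mistakes this section warns about. The resource names are placeholders, and the assumption that a missing imageAction behaves like "none" should be verified against the indexer reference.

```python
from typing import List

indexer = {
    "name": "example-indexer",               # placeholder names
    "dataSourceName": "example-datasource",
    "targetIndexName": "example-index",
    "skillsetName": "example-skillset",
    "parameters": {
        "configuration": {
            "dataToExtract": "contentAndMetadata",
            "imageAction": "generateNormalizedImages",
        }
    },
}

ocr_skill = {
    "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
    "context": "/document/normalized_images/*",
    "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
    "outputs": [{"name": "text", "targetName": "text"}],
}


def ocr_problems(indexer: dict, skills: List[dict]) -> List[str]:
    """Return human-readable reasons why OCR would produce nothing."""
    problems = []
    config = indexer.get("parameters", {}).get("configuration", {})
    # Assumption: an absent imageAction means no image extraction.
    if config.get("imageAction", "none") == "none":
        problems.append("indexer does not extract images (imageAction is not set)")
    for skill in skills:
        is_ocr = skill.get("@odata.type") == "#Microsoft.Skills.Vision.OcrSkill"
        if is_ocr and not skill.get("context", "/document").endswith("/*"):
            problems.append("OCR skill context does not iterate over images")
    return problems


print(ocr_problems(indexer, [ocr_skill]))   # []
broken_skill = dict(ocr_skill, context="/document")
print(ocr_problems({"parameters": {}}, [broken_skill]))
```

The second call reports both problems at once, which matches how exam scenarios tend to hide the fault in either the indexer or the skill.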

Attaching a Billable Cognitive Services Resource

Built-in skills that call AI (OCR, NER, key phrases, image analysis, translation) are billed per transaction beyond a free quota. A skillset that uses these must reference an attached Azure AI multi-service (Cognitive Services) key. Without it, you are capped at 20 free enrichment documents per indexer per day, after which the indexer fails with a quota error — a frequent exam trap when a pipeline that worked in testing stops at scale.
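In the skillset definition, the attached resource is a top-level cognitiveServices section. The sketch below shows the shape as I understand it, with a hypothetical helper that checks whether a key is attached; the skillset name, description, and key value are placeholders.

```python
skillset = {
    "name": "example-skillset",   # placeholder
    "skills": [
        {"@odata.type": "#Microsoft.Skills.Vision.OcrSkill"},   # trimmed for brevity
    ],
    "cognitiveServices": {
        "@odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
        "description": "Multi-service resource used for billing",
        "key": "<your-multi-service-key>",                      # placeholder
    },
}


def has_billable_resource(skillset: dict) -> bool:
    """True when the skillset references a multi-service resource by key."""
    section = skillset.get("cognitiveServices") or {}
    by_key = section.get("@odata.type") == "#Microsoft.Azure.Search.CognitiveServicesByKey"
    return by_key and bool(section.get("key"))


print(has_billable_resource(skillset))                                    # True
print(has_billable_resource({"name": "no-key", "skills": []}))            # False
```

A skillset for which this returns False is the one that works on a handful of test documents and then stops once the free daily quota is used up.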

Custom Web API Skills

When no built-in skill fits — proprietary classification, a domain ontology, a third-party API — use a #Microsoft.Skills.Custom.WebApiSkill pointing at an Azure Function or any HTTPS endpoint.

{
  "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
  "uri": "https://my-func.azurewebsites.net/api/classify",
  "httpMethod": "POST",
  "timeout": "PT30S",
  "batchSize": 10,
  "context": "/document",
  "inputs":  [{"name": "text", "source": "/document/content"}],
  "outputs": [{"name": "category", "targetName": "documentCategory"}]
}

Your endpoint must honor this contract — the wrapper is values[], each item keyed by recordId, payload under data:

Side | Required keys
Request | values[].recordId, values[].data.<input>
Response | values[].recordId (echoed), values[].data.<output>, errors[], warnings[]

Limits / traps: timeout defaults to PT30S and can be raised to a maximum of PT230S; batchSize controls records per call (lower it if the function times out). If a recordId in the response does not match the request, the skill drops that record. For long-running models, the AML skill (Azure Machine Learning) is the variant designed for deployed AML endpoints.
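The request-response contract can be sketched as a plain function that takes the parsed request body and returns the response body. This is a framework-neutral illustration, assuming the skill definition shown earlier (input named text, output named category); `classify` is a hypothetical stand-in for your own logic, and wiring the function into an Azure Function or other HTTP host is left out.

```python
from typing import Callable


def classify(text: str) -> str:
    """Hypothetical stand-in for a real model or business rule."""
    return "invoice" if "invoice" in text.lower() else "other"


def handle_skill_request(body: dict, fn: Callable[[str], str] = classify) -> dict:
    """Build a response with one entry per incoming record, echoing recordId."""
    results = []
    for record in body.get("values", []):
        result = {
            "recordId": record["recordId"],   # must match the request exactly
            "data": {},
            "errors": [],
            "warnings": [],
        }
        try:
            result["data"]["category"] = fn(record["data"]["text"])
        except Exception as exc:  # report per record instead of failing the batch
            result["errors"].append({"message": f"Could not classify record: {exc!r}"})
        results.append(result)
    return {"values": results}


request_body = {
    "values": [
        {"recordId": "0", "data": {"text": "Invoice 1042 is overdue."}},
        {"recordId": "1", "data": {}},   # missing input triggers a per-record error
    ]
}
response_body = handle_skill_request(request_body)
print([v["recordId"] for v in response_body["values"]])   # ['0', '1']
print(response_body["values"][0]["data"])                 # {'category': 'invoice'}
```

Handling errors per record keeps one bad document from failing the whole batch, which matters more as batchSize grows.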

Knowledge Stores

A knowledge store persists enrichments to storage for analytics and reuse — independent of the search index.

Projection | Target | Best for
Table | Azure Table Storage | Power BI dashboards, relational analytics
Object | Azure Blob Storage | Enriched JSON for data science / ML training
File | Azure Blob Storage | Normalized images extracted from documents

To flatten the enrichment tree into the rows a table projection needs, you typically add a Shaper skill to build a single composite object, then project it. A scenario asking for "enriched data in Power BI" maps to table projections; "store the extracted images" maps to file projections.
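The Shaper-then-project pattern might be sketched as follows, again as Python dicts that serialize to the skillset JSON. The property names follow my understanding of the knowledgeStore definition and should be verified against the REST reference; the table name, container names, connection string, and the shaped node name tableprojection are placeholders.

```python
shaper_skill = {
    "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
    "context": "/document",
    "inputs": [
        {"name": "category", "source": "/document/documentCategory"},
        {"name": "organizations", "source": "/document/organizations"},
    ],
    # The Shaper skill emits one composite object under its context.
    "outputs": [{"name": "output", "targetName": "tableprojection"}],
}

knowledge_store = {
    "storageConnectionString": "<storage-connection-string>",   # placeholder
    "projections": [
        {
            "tables": [
                {
                    "tableName": "Documents",
                    "generatedKeyName": "DocumentId",
                    "source": "/document/tableprojection",
                }
            ],
            "objects": [
                {"storageContainer": "enriched-json", "source": "/document/tableprojection"}
            ],
            "files": [
                {"storageContainer": "extracted-images", "source": "/document/normalized_images/*"}
            ],
        }
    ],
}

# The table projection must read the node the Shaper skill wrote.
shaped_path = shaper_skill["context"] + "/" + shaper_skill["outputs"][0]["targetName"]
table_source = knowledge_store["projections"][0]["tables"][0]["source"]
print(shaped_path == table_source)   # True
```

As with skill inputs, a projection whose source path does not match an existing node produces nothing, so the same path check applies here.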

Incremental Enrichment and Debug Sessions

Incremental enrichment uses a cache (a Blob container) so that when you change one skill, only documents affected by that skill are reprocessed — not the entire corpus. This controls cost, since each Cognitive Services call is billable.
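The idea behind the cache can be illustrated with a toy memoization keyed on the document and the skill definition. This is a conceptual sketch only: the real service also tracks dependencies between skills, and `cache_key`, `run_skillset`, and `fake_execute` are hypothetical names. The `indexer_cache` dict shows the indexer property that, as I understand it, turns caching on; the connection string is a placeholder.

```python
import hashlib
import json
from typing import Callable, Dict, List, Tuple

# Assumed shape of the indexer setting that enables the enrichment cache.
indexer_cache = {
    "cache": {
        "storageConnectionString": "<storage-connection-string>",
        "enableReprocessing": True,
    }
}


def cache_key(doc_id: str, skill: dict) -> Tuple[str, str]:
    """Key on the document and a hash of the skill definition."""
    digest = hashlib.sha256(json.dumps(skill, sort_keys=True).encode("utf-8")).hexdigest()
    return doc_id, digest


def run_skillset(doc_ids: List[str], skills: List[dict],
                 cache: Dict[Tuple[str, str], str],
                 execute: Callable[[str, dict], str]) -> int:
    """Run every skill on every document, skipping cached results.

    Returns the number of (billable) skill executions actually performed.
    """
    calls = 0
    for doc_id in doc_ids:
        for skill in skills:
            key = cache_key(doc_id, skill)
            if key not in cache:
                cache[key] = execute(doc_id, skill)
                calls += 1
    return calls


def fake_execute(doc_id: str, skill: dict) -> str:
    return f"{skill['name']}({doc_id})"


docs = ["d1", "d2", "d3"]
skills = [{"name": "ocr", "lang": "en"}, {"name": "keyphrases", "max": 5}]
cache: Dict[Tuple[str, str], str] = {}

first_run = run_skillset(docs, skills, cache, fake_execute)
skills[1] = {"name": "keyphrases", "max": 10}   # edit one skill
second_run = run_skillset(docs, skills, cache, fake_execute)
third_run = run_skillset(docs, skills, cache, fake_execute)
print(first_run, second_run, third_run)   # 6 3 0
```

After the edit, only the changed skill runs again for each document; the OCR results come from the cache, which is where the cost saving comes from.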

Debug sessions in the Azure portal let you step through one document's enrichment tree, inspect each skill's input/output, and edit a skill definition live. On the exam, "how do you troubleshoot a skill producing empty output" points to a debug session, and "avoid reprocessing 1M documents after a small skillset edit" points to incremental enrichment / the enrichment cache.

On the Exam: Memorize the path/context rules and the Web API values/recordId/data contract. Questions frequently hand you a JSON skillset with one wrong source path or a missing generateNormalizedImages setting and ask for the fix.

Test Your Knowledge

An OCR skill in a skillset produces no text even though the source PDFs contain scanned images. The skill's context is set to /document. What is the most likely fix?

Test Your Knowledge

Before generating embeddings for a RAG index, which built-in skill should you use to break long documents into passages that fit the embedding model's token limit?

Test Your Knowledge

A custom Web API skill calls an Azure Function. For a record to be processed correctly, what must the function's HTTP response include?

Test Your Knowledge

You edited one skill in a skillset that enriches one million documents and want to reprocess only the documents affected by that change. Which feature accomplishes this?
