A media company wants each news article tagged with every topic it covers, such as Politics, Economy, and Technology simultaneously. Which project type fits?

Custom multi-label classification. Multi-label classification lets one document carry several non-exclusive classes at once, which matches an article spanning multiple topics. Single-label allows only one class per document, custom NER extracts spans rather than categorizing documents, and CLU classifies short utterances into intents.

A custom NER model shows precision 0.95 and recall 0.60 for CaseNumber. What does this indicate and how do you improve it?

When it predicts CaseNumber it is usually right but it misses many real ones; add more diverse labeled CaseNumber examples. High precision means predictions are usually correct, while low recall means roughly 40% of actual CaseNumbers go undetected. The remedy is more varied labeled positive examples so the model generalizes to formats it currently misses, not adding negatives or changing infrastructure.

Where must the training documents for a custom classification project physically reside?

In Azure Blob Storage as .txt files. Custom classification and custom NER read .txt documents from an Azure Blob Storage container, with a JSON labels file mapping documents to classes or spans. Language Studio only connects to that container via SAS or managed identity rather than storing the documents itself.

Custom Text Classification and Custom NER | Free Guide 2026

Quick Answer: Custom text classification assigns documents to your own classes; custom NER extracts your own entity types. Both need labeled .txt files in Azure Blob Storage, a JSON labels file, a train/test split, training in Language Studio or REST, and a named deployment before prediction. Evaluation reports precision, recall, and F1 per class or entity.

Single-Label vs. Multi-Label Classification

Type	Rule	Example	Project kind
Single-label	Exactly one class per document	Ticket = Billing OR Technical	`CustomSingleLabelClassification`
Multi-label	Zero or more classes per document	Article = Politics AND Economy	`CustomMultiLabelClassification`

Pick single-label when classes are mutually exclusive; pick multi-label when a document can legitimately belong to several. Choosing the wrong project kind is a common scenario-question trap: a news article tagged with several topics needs multi-label.

Data and Labeling Requirements

Requirement	Minimum	Recommended
Documents per class/entity	10	50+
Storage	Azure Blob (.txt, UTF-8)	dedicated container
Labels file	one JSON mapping documents	versioned
Auth to storage	SAS or managed identity	managed identity

The labels JSON declares metadata.projectKind, the list of classes (or entities), and per-document assignments. For custom NER, each labeled span carries an exact character offset and length. An offset that is off by even one character teaches the model the wrong boundary, which is a frequent cause of poor recall.

{
  "projectFileVersion": "2022-05-01",
  "metadata": {"projectKind": "CustomEntityRecognition",
               "projectName": "LegalExtractor", "language": "en"},
  "assets": {"projectKind": "CustomEntityRecognition",
    "entities": [{"category": "CaseNumber"}, {"category": "JudgeName"}],
    "documents": [{"location": "doc1.txt", "language": "en",
      "entities": [{"regionOffset": 45, "regionLength": 12,
        "labels": [{"category": "CaseNumber", "offset": 45, "length": 12}]}]}]}}
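To make the relationship between the offset and length fields and the document text concrete, here is a minimal sketch that builds one label entry by locating a span and then checks that the slice round-trips. The helper names `make_label` and `check_label` and the sample sentence are hypothetical. The sketch also assumes offsets are counted in Python string positions (Unicode code points); if the project is configured with a different string index type, the numbers can differ for text that contains characters outside the Basic Multilingual Plane.

```python
def make_label(text: str, span: str, category: str) -> dict:
    """Build a label entry for the first occurrence of `span` in `text`."""
    offset = text.find(span)
    if offset == -1:
        raise ValueError(f"span {span!r} not found in document")
    return {"category": category, "offset": offset, "length": len(span)}


def check_label(text: str, label: dict, expected: str) -> bool:
    """Return True when the labeled slice is exactly the expected span."""
    start = label["offset"]
    end = start + label["length"]
    return text[start:end] == expected


# Hypothetical document text, used only for illustration.
doc = "Filed on 3 May. Case number CV-2024-0187 was assigned to Judge Rivera."
label = make_label(doc, "CV-2024-0187", "CaseNumber")
print(label)
print(check_label(doc, label, "CV-2024-0187"))
```

Running a check like `check_label` over every entry before uploading a hand-edited labels file is one inexpensive way to catch shifted offsets before they reach training.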

Training, Splitting, and Calling

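import os
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential
# LANGUAGE_ENDPOINT and LANGUAGE_KEY are placeholder environment variable names
client = TextAnalyticsClient(os.environ["LANGUAGE_ENDPOINT"],
                             AzureKeyCredential(os.environ["LANGUAGE_KEY"]))
poller = client.begin_single_label_classify(
    documents=["I was charged twice for my renewal"],
    project_name="TicketClassifier", deployment_name="production")
for result in poller.result():  # blocks until the long-running job finishes
    for c in result.classifications:
        print(c.category, round(c.confidence_score, 2))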

Use begin_multi_label_classify for multi-label and begin_recognize_custom_entities for custom NER. All three methods start a long-running operation and return a poller, and all three require a deployment name; predicting against a model that has not been deployed fails.
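For the custom NER variant, a minimal sketch might look like the following. The function name `extract_custom_entities` is hypothetical, and `client` is assumed to be an already constructed `TextAnalyticsClient`. The result attributes used here (`is_error`, `entities`, `text`, `category`, `confidence_score`) are believed to match the azure-ai-textanalytics package and should be verified against the installed version.

```python
def extract_custom_entities(client, documents, project_name, deployment_name):
    """Run custom NER and return (text, category, confidence) tuples per document.

    `client` is assumed to be an azure.ai.textanalytics.TextAnalyticsClient.
    """
    poller = client.begin_recognize_custom_entities(
        documents,
        project_name=project_name,
        deployment_name=deployment_name,
    )
    extracted = []
    for result in poller.result():  # blocks until the job completes
        if result.is_error:
            # Per-document failures come back as error objects, not exceptions.
            extracted.append([])
            continue
        extracted.append(
            [(e.text, e.category, e.confidence_score) for e in result.entities]
        )
    return extracted
```

Passing the client in as a parameter keeps the function easy to exercise with a stub in unit tests, without calling the service.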

Split strategy	How	Best for
Automatic %	service randomly holds out e.g. 20%	most projects
Manual	you tag each doc Train or Test	reproducible/regulated test sets
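The manual strategy in the table above can be sketched as a deterministic assignment, so the same document always lands in the same set across retraining runs. This is only an illustration: the helper names are hypothetical, and the `"dataset"` field with the values `"Train"` and `"Test"` is an assumption about the labels file format that should be checked against the current schema.

```python
import hashlib


def assign_dataset(location: str, test_percent: int = 20) -> str:
    """Map a document name to "Train" or "Test" deterministically."""
    digest = hashlib.sha256(location.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return "Test" if bucket < test_percent else "Train"


def tag_documents(documents: list[dict], test_percent: int = 20) -> list[dict]:
    """Return copies of the document entries with a "dataset" field added."""
    return [
        {**doc, "dataset": assign_dataset(doc["location"], test_percent)}
        for doc in documents
    ]


# Hypothetical document entries, for illustration only.
docs = [{"location": f"doc{i}.txt", "language": "en"} for i in range(1, 11)]
for doc in tag_documents(docs):
    print(doc["location"], doc["dataset"])
```

Hashing the file name rather than drawing random numbers means the test set stays fixed when new documents are added, which is the property a reproducible or regulated test set needs. With a small corpus the realized test share can drift well away from the target percentage.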

Evaluation Metrics

Metric	Meaning	Formula
Precision	Of predicted positives, how many were right	TP / (TP + FP)
Recall	Of actual positives, how many were found	TP / (TP + FN)
F1	Harmonic mean balancing the two	2·P·R / (P + R)

Interpreting these is heavily tested. High precision + low recall = conservative model that misses real entities (fix: add more diverse positive examples). Low precision + high recall = over-eager model with many false positives (fix: add negative/boundary examples and clean mislabels).
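As a worked example of the formulas above, the sketch below computes all three metrics from raw counts. The counts are hypothetical, chosen so that precision is 0.95 and recall is 0.60, the high-precision, low-recall pattern described here.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 60 predictions of which 57 are correct,
# and 38 real entities that the model never predicted.
p, r, f1 = precision_recall_f1(tp=57, fp=3, fn=38)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# prints: precision=0.95 recall=0.60 f1=0.74
```

The F1 of about 0.74 sits closer to the weaker of the two scores, which is why a model with excellent precision can still post a mediocre F1 when recall is low.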

Best Practices and Traps

Label consistency: the same text type must always get the same label, or the model learns noise.
Negative examples: include documents that contain none of the target entities so the model learns when not to predict.
Class balance: roughly equal counts per class; a dominant class skews accuracy upward while small classes silently fail.
Boundary precision: trim trailing punctuation/whitespace from custom-NER spans.
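The boundary-precision item in the list above can be sketched as a small clean-up pass over labeled spans. The function name `trim_span`, the default set of stripped characters, and the sample sentence are hypothetical. The sketch also trims leading characters, which goes slightly beyond the trailing-only advice.

```python
def trim_span(text: str, offset: int, length: int,
              strip_chars: str = " \t\r\n.,;:!?") -> tuple[int, int]:
    """Shrink a span so it has no leading or trailing whitespace/punctuation."""
    start, end = offset, offset + length
    while start < end and text[start] in strip_chars:
        start += 1
    while end > start and text[end - 1] in strip_chars:
        end -= 1
    return start, end - start


# Hypothetical document text, for illustration only.
doc = "The matter CV-2024-0187, was heard on Monday."
offset = doc.find("CV-2024-0187")
# A sloppy label that also captured the trailing comma and space.
new_offset, new_length = trim_span(doc, offset, len("CV-2024-0187, "))
print(doc[new_offset:new_offset + new_length])
```

A blanket rule like this would also clip entities that legitimately end in punctuation (for example a company name ending in "Inc."), so the set of stripped characters may need tuning for each entity type.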

Common Trap: Custom features need data in Blob Storage, not uploaded directly in Language Studio — Studio merely connects to the container. And custom classification is distinct from CLU: CLU classifies short conversational utterances into intents, while custom text classification classifies whole documents into business categories.

Project Lifecycle and Storage Connection

The end-to-end lifecycle for both custom features is the same: connect a storage account, create the project and schema, label data, train with a chosen split, review metrics, deploy to a named slot, then call the runtime endpoint. The storage connection step trips up many candidates. Language Studio does not host your files; instead you grant the Language resource access to a blob container, preferably through a managed identity assigned the Storage Blob Data Contributor role, or via a shared access signature.

If that permission is missing, project creation succeeds but training fails to read documents, which surfaces as an empty or errored job rather than an obvious permission message.

A practical sizing rule is that the model is only as good as the diversity of its labeled spans and documents. Ten examples per class is a floor that produces a model you can demonstrate, not one you should ship; fifty or more per class is where accuracy stabilizes. For custom NER specifically, the quality of character offsets dominates everything else, so labeling in the Studio interface rather than hand-editing JSON is recommended because the tool computes offsets for you and respects the configured string index type.
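The sizing rule above lends itself to a quick pre-training check. The following sketch counts labeled spans per entity category in the `documents` array of a custom NER labels file and flags categories that fall under the 10 and 50 thresholds. The function names and status strings are hypothetical, and counting labeled spans rather than documents is a simplifying assumption.

```python
from collections import Counter

MINIMUM, RECOMMENDED = 10, 50  # thresholds taken from the guidance above


def label_counts(documents: list[dict]) -> Counter:
    """Count labeled spans per entity category across all documents."""
    counts = Counter()
    for doc in documents:
        for region in doc.get("entities", []):
            for label in region.get("labels", []):
                counts[label["category"]] += 1
    return counts


def readiness(counts: Counter, categories: list[str]) -> dict[str, str]:
    """Classify each category as 'below minimum', 'demo only', or 'ok'."""
    report = {}
    for category in categories:
        n = counts.get(category, 0)
        if n < MINIMUM:
            report[category] = "below minimum"
        elif n < RECOMMENDED:
            report[category] = "demo only"
        else:
            report[category] = "ok"
    return report


# One labeled document, mirroring the labels file structure shown earlier.
documents = [{"location": "doc1.txt", "language": "en",
              "entities": [{"regionOffset": 45, "regionLength": 12,
                            "labels": [{"category": "CaseNumber",
                                        "offset": 45, "length": 12}]}]}]
print(readiness(label_counts(documents), ["CaseNumber", "JudgeName"]))
```

Passing the declared category list separately means a category with no labels at all still shows up in the report instead of silently disappearing.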

When Custom Beats Pre-Built

The exam repeatedly contrasts custom features with pre-built ones and with generative alternatives. Choose custom NER when the entities are proprietary, such as internal part numbers, policy codes, or claim identifiers that no general model knows. Choose custom classification when categories reflect your own taxonomy, such as routing tickets into your specific support queues. If, instead, the entities are universal (people, places, dates) or the categories are generic sentiment, the pre-built models win because they need no data and no training cost. Knowing where that line sits lets you discard distractors quickly under time pressure.

Azure AI Engineer Associate

Azure AI-102

4.3 Custom Text Classification and Custom NER
