4.3 Custom Text Classification and Custom NER

Key Takeaways

  • Custom text classification assigns documents to your classes in single-label (one class) or multi-label (many classes) projects.
  • Custom NER extracts domain-specific entities the pre-built NER model does not cover, using labeled character spans.
  • Both features require .txt documents in Azure Blob Storage plus a JSON labels file; Language Studio connects to the container via managed identity or SAS.
  • Training supports automatic percentage split (e.g. 80/20) or manual train/test assignment; evaluation reports precision, recall, and F1 per class/entity.
  • Both are long-running operations: begin_single_label_classify / begin_multi_label_classify / begin_recognize_custom_entities return a poller and require a named deployment.
Last updated: June 2026

Quick Answer: Custom text classification assigns documents to your own classes; custom NER extracts your own entity types. Both need labeled .txt files in Azure Blob Storage, a JSON labels file, a train/test split, training in Language Studio or REST, and a named deployment before prediction. Evaluation reports precision, recall, and F1 per class or entity.

Single-Label vs. Multi-Label Classification

Type         | Rule                              | Example                        | Project kind
Single-label | Exactly one class per document    | Ticket = Billing OR Technical  | CustomSingleLabelClassification
Multi-label  | Zero or more classes per document | Article = Politics AND Economy | CustomMultiLabelClassification

Pick single-label when classes are mutually exclusive; pick multi-label when a document can legitimately belong to several. Choosing wrong is a common scenario-question trap — a news article tagged with several topics needs multi-label.
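
One way to sanity-check the choice before creating a project is to look at the labels you already have. The sketch below is illustrative only: `infer_project_kind` and its input shape (a mapping from document name to a list of classes) are hypothetical, not part of any Azure SDK; only the two project-kind strings come from the table above.

```python
from typing import Dict, List

def infer_project_kind(doc_classes: Dict[str, List[str]]) -> str:
    """Suggest a project kind from existing labels (hypothetical helper)."""
    # Any document carrying more than one distinct class rules out single-label.
    if any(len(set(classes)) > 1 for classes in doc_classes.values()):
        return "CustomMultiLabelClassification"
    return "CustomSingleLabelClassification"

# Invented sample data for illustration.
tickets = {"t1.txt": ["Billing"], "t2.txt": ["Technical"]}
articles = {"a1.txt": ["Politics", "Economy"], "a2.txt": ["Technology"]}
print(infer_project_kind(tickets))
print(infer_project_kind(articles))
```

A single document with two classes is enough to require a multi-label project, which is why the news-article scenario cannot be modeled as single-label.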

Data and Labeling Requirements

Requirement                | Minimum                    | Recommended
Documents per class/entity | 10                         | 50+
Storage                    | Azure Blob (.txt, UTF-8)   | dedicated container
Labels file                | one JSON mapping documents | versioned
Auth to storage            | SAS or managed identity    | managed identity

The labels JSON declares metadata.projectKind, the list of classes (or entities), and per-document assignments. For custom NER, each labeled span carries an exact offset and length; off-by-one offsets are a common cause of poor recall.

{
  "projectFileVersion": "2022-05-01",
  "metadata": {"projectKind": "CustomEntityRecognition",
               "projectName": "LegalExtractor", "language": "en"},
  "assets": {"projectKind": "CustomEntityRecognition",
    "entities": [{"category": "CaseNumber"}, {"category": "JudgeName"}],
    "documents": [{"location": "doc1.txt", "language": "en",
      "entities": [{"regionOffset": 45, "regionLength": 12,
        "labels": [{"category": "CaseNumber", "offset": 45, "length": 12}]}]}]}}

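Because span offsets are easy to get wrong by hand, it can help to compute and verify them programmatically. The following is a minimal sketch, not the service's own tooling: `utf16_len`, `label_span`, and `slice_utf16` are hypothetical helpers, the sample sentence is invented, and the sketch assumes the project counts offsets in UTF-16 code units (an assumption about the configured string index type; pass `utf16=False` to count Python code points instead).

```python
def utf16_len(s: str) -> int:
    """Number of UTF-16 code units in s (non-BMP characters count as two)."""
    return len(s.encode("utf-16-le")) // 2

def label_span(text: str, needle: str, category: str, utf16: bool = True) -> dict:
    """Build a label entry for the first occurrence of needle (hypothetical helper)."""
    start = text.find(needle)
    if start < 0:
        raise ValueError(f"{needle!r} not found in document")
    if utf16:
        offset, length = utf16_len(text[:start]), utf16_len(needle)
    else:
        offset, length = start, len(needle)
    return {"category": category, "offset": offset, "length": length}

def slice_utf16(text: str, offset: int, length: int) -> str:
    """Recover the labeled text from UTF-16 offsets, to verify a label."""
    data = text.encode("utf-16-le")
    return data[2 * offset: 2 * (offset + length)].decode("utf-16-le")

# Invented sample; the emoji lies outside the BMP, so the two counting schemes diverge.
text = "Filed \U0001F4C4 in district court under case no. 2024-CV-01234 on Monday."
label = label_span(text, "2024-CV-01234", "CaseNumber")

# Round-trip check: the offsets must slice back to exactly the labeled text.
assert slice_utf16(text, label["offset"], label["length"]) == "2024-CV-01234"
print(label)
```

The round-trip assertion is the useful part: running the same check over every span in a hand-edited labels file catches off-by-one errors before training does.
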
Training, Splitting, and Calling

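from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential
# Placeholders: substitute your own Language resource endpoint and key.
client = TextAnalyticsClient("https://<your-resource>.cognitiveservices.azure.com/",
                             AzureKeyCredential("<your-key>"))
poller = client.begin_single_label_classify(
    documents=["I was charged twice for my renewal"],
    project_name="TicketClassifier", deployment_name="production")
for result in poller.result():  # blocks until the long-running job finishes
    for c in result.classifications:
        print(c.category, round(c.confidence_score, 2))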

Use begin_multi_label_classify for multi-label and begin_recognize_custom_entities for custom NER. All three are long-running pollers and require a deployment name — predicting against an undeployed model fails.
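
The custom NER call follows the same poller pattern. The sketch below wraps it in a hypothetical helper, `extract_custom_entities`, that takes any client exposing `begin_recognize_custom_entities` (in practice a `TextAnalyticsClient` from the azure-ai-textanalytics package, assuming a version that includes the custom features). The helper name and the returned tuple shape are illustrative choices, not part of the SDK.

```python
from typing import List, Tuple

def extract_custom_entities(client, documents: List[str], project_name: str,
                            deployment_name: str) -> List[Tuple[int, str, str, float]]:
    """Return (doc_index, category, text, confidence) for each entity found."""
    poller = client.begin_recognize_custom_entities(
        documents, project_name=project_name, deployment_name=deployment_name)
    rows = []
    for idx, result in enumerate(poller.result()):  # blocks until the job completes
        if result.is_error:
            # Per-document failures arrive as error items in the result stream.
            print(f"doc {idx} failed: {result.error.code} {result.error.message}")
            continue
        for ent in result.entities:
            rows.append((idx, ent.category, ent.text, round(ent.confidence_score, 2)))
    return rows
```

With a real client this would be called as `extract_custom_entities(client, docs, "LegalExtractor", "production")`, where the deployment name must match a deployment that already exists for the project.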

Split strategy | How                                 | Best for
Automatic %    | service randomly holds out e.g. 20% | most projects
Manual         | you tag each doc Train or Test      | reproducible/regulated test sets
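
For a manual split, the assignment can be generated once and stored so that every retraining run evaluates against the same test set. The following is a rough sketch under stated assumptions: `assign_datasets` is a hypothetical helper, the document names are invented, and the idea that each document entry in the labels file carries a "dataset" field holding "Train" or "Test" is an assumption you should confirm against your own exported labels file.

```python
import random
from typing import Dict, List

def assign_datasets(doc_names: List[str], test_fraction: float = 0.2,
                    seed: int = 42) -> Dict[str, str]:
    """Map each document name to "Train" or "Test" reproducibly."""
    names = sorted(doc_names)           # fixed order so the seed fully determines the split
    random.Random(seed).shuffle(names)  # local RNG; does not disturb global random state
    n_test = max(1, round(len(names) * test_fraction))
    test = set(names[:n_test])
    return {name: ("Test" if name in test else "Train") for name in doc_names}

docs = [f"doc{i}.txt" for i in range(1, 11)]  # invented file names
split = assign_datasets(docs)
print(sum(v == "Test" for v in split.values()), "test /", len(docs), "total")
# prints: 2 test / 10 total
```

This split is not stratified, so with small or imbalanced classes it is worth checking that every class or entity still appears in both sets.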

Evaluation Metrics

Metric    | Meaning                                     | Formula
Precision | Of predicted positives, how many were right | TP / (TP + FP)
Recall    | Of actual positives, how many were found    | TP / (TP + FN)
F1        | Harmonic mean balancing the two             | 2·P·R / (P + R)

Interpreting these is heavily tested. High precision + low recall = conservative model that misses real entities (fix: add more diverse positive examples). Low precision + high recall = over-eager model with many false positives (fix: add negative/boundary examples and clean mislabels).
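
A short worked example makes the formulas concrete. The counts below are invented for illustration and chosen to give a high-precision, low-recall profile; `precision_recall_f1` is a plain helper written for this sketch, not a service API.

```python
from typing import Tuple

def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from raw counts, guarding against zero division."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# 57 correct extractions, 3 spurious ones, 38 real entities missed.
p, r, f1 = precision_recall_f1(tp=57, fp=3, fn=38)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# prints: precision=0.95 recall=0.60 f1=0.74
```

Because F1 is a harmonic mean, the weak recall pulls it well below the arithmetic average of the two scores, which is why a model can look precise and still score poorly overall.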

Best Practices and Traps

  • Label consistency: the same kind of text must always receive the same label; otherwise the model learns noise.
  • Negative examples: include documents that contain none of the target entities so the model learns when not to predict.
  • Class balance: roughly equal counts per class; a dominant class skews accuracy upward while small classes silently fail.
  • Boundary precision: trim trailing punctuation/whitespace from custom-NER spans.
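
Two of these checks are easy to automate before training. The sketch below is a minimal illustration with hypothetical helpers: `trim_span` tightens a span's boundaries and `imbalance_ratio` summarizes class balance. Offsets here are plain Python string indices, the set of trailing characters to strip is a conservative assumption, and the sample text and label counts are invented.

```python
from collections import Counter
from typing import Iterable, Tuple

TRAILING_JUNK = " \t\r\n.,;:!?"  # conservative set; adjust for your own data

def trim_span(text: str, offset: int, length: int) -> Tuple[int, int]:
    """Return (offset, length) with edge whitespace and trailing punctuation removed."""
    start, end = offset, offset + length
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1] in TRAILING_JUNK:
        end -= 1
    return start, end - start

def imbalance_ratio(labels: Iterable[str]) -> float:
    """Largest class count divided by smallest; 1.0 means perfectly balanced."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

text = "Case 2024-CV-01234, heard by Judge Alvarez."
print(trim_span(text, 5, 15))   # the span "2024-CV-01234, " shrinks to (5, 13)
print(imbalance_ratio(["Billing"] * 50 + ["Technical"] * 10))  # 5.0
```

Blindly stripping a trailing period would damage entities that legitimately end in one (for example abbreviations), so treat the output as a list of spans to review rather than an automatic fix.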

Common Trap: Custom features need data in Blob Storage, not uploaded directly in Language Studio — Studio merely connects to the container. And custom classification is distinct from CLU: CLU classifies short conversational utterances into intents, while custom text classification classifies whole documents into business categories.

Project Lifecycle and Storage Connection

The end-to-end lifecycle for both custom features is the same: connect a storage account, create the project and schema, label data, train with a chosen split, review metrics, deploy to a named slot, then call the runtime endpoint. The storage connection step trips up many candidates. Language Studio does not host your files; instead you grant the Language resource access to a blob container, preferably through a managed identity assigned the Storage Blob Data Contributor role, or via a shared access signature.

If that permission is missing, project creation succeeds but training fails to read documents, which surfaces as an empty or errored job rather than an obvious permission message.

A practical sizing rule is that the model is only as good as the diversity of its labeled spans and documents. Ten examples per class is a floor that produces a model you can demonstrate, not one you should ship; accuracy typically starts to stabilize at fifty or more per class. For custom NER specifically, the quality of character offsets matters more than almost anything else, so labeling in the Studio interface is recommended over hand-editing JSON: the tool computes offsets for you and respects the configured string index type.

When Custom Beats Pre-Built

The exam repeatedly contrasts custom features with pre-built ones and with generative alternatives. Choose custom NER when the entities are proprietary, such as internal part numbers, policy codes, or claim identifiers that no general model knows. Choose custom classification when categories reflect your own taxonomy, such as routing tickets into your specific support queues. If, instead, the entities are universal (people, places, dates) or the categories are generic sentiment, the pre-built models win because they need no data and no training cost. Knowing where that line sits lets you discard distractors quickly under time pressure.

Test Your Knowledge

A media company wants each news article tagged with every topic it covers, such as Politics, Economy, and Technology simultaneously. Which project type fits?

A custom NER model shows precision 0.95 and recall 0.60 for CaseNumber. What does this indicate and how do you improve it?

Where must the training documents for a custom classification project physically reside?
