4.3 Custom Text Classification and Custom NER

Key Takeaways

  • Custom text classification trains models to categorize documents into your domain-specific classes (single-label or multi-label).
  • Custom NER trains models to extract domain-specific entities that are not covered by the pre-built NER model.
  • Both features use labeled training data provided in Azure Blob Storage, with labels defined in Language Studio or via the REST API.
  • Training data format requires a JSON labels file that maps documents to their class labels (classification) or entity annotations (NER).
  • Model evaluation uses precision, recall, and F1 score metrics per class/entity, available after training completes.
Last updated: March 2026


Quick Answer: Custom text classification categorizes documents into your classes (single-label or multi-label). Custom NER extracts domain-specific entities. Both require labeled training data in Azure Blob Storage, training via Language Studio or REST API, and deployment to a prediction endpoint.

Custom Text Classification

Single-Label vs. Multi-Label

| Type | Description | Example |
|---|---|---|
| Single-label | Each document gets exactly one class | Support tickets: "Billing" OR "Technical" OR "Account" |
| Multi-label | Each document can have multiple classes | Movie reviews: "Action" AND "Comedy" AND "Sci-Fi" |

Training Data Requirements

| Requirement | Minimum | Recommended |
|---|---|---|
| Documents | 10 per class | 50+ per class |
| Classes | 2 | Depends on use case |
| Document format | .txt files in Azure Blob Storage | UTF-8 encoded |
| Labels file | JSON file mapping documents to classes | Consistent labeling |

Data Format

{
    "projectFileVersion": "2022-05-01",
    "stringIndexType": "Utf16CodeUnit",
    "metadata": {
        "projectKind": "CustomSingleLabelClassification",
        "projectName": "SupportTicketClassifier",
        "language": "en"
    },
    "assets": {
        "projectKind": "CustomSingleLabelClassification",
        "classes": [
            {"category": "Billing"},
            {"category": "Technical"},
            {"category": "Account"}
        ],
        "documents": [
            {
                "location": "ticket001.txt",
                "language": "en",
                "class": {"category": "Billing"}
            },
            {
                "location": "ticket002.txt",
                "language": "en",
                "class": {"category": "Technical"}
            }
        ]
    }
}
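Rather than hand-editing this JSON, the labels file can be assembled programmatically from your labeling pass. A minimal sketch using only the standard library (the file names, classes, and helper name are illustrative):

```python
import json

# Illustrative (document, class) pairs; in practice these come from your
# labeling pass over the .txt files stored in the blob container.
labeled_docs = [
    ("ticket001.txt", "Billing"),
    ("ticket002.txt", "Technical"),
]

def build_labels_file(project_name, labeled_docs, language="en"):
    """Assemble a CustomSingleLabelClassification labels JSON document."""
    classes = sorted({cls for _, cls in labeled_docs})
    return {
        "projectFileVersion": "2022-05-01",
        "stringIndexType": "Utf16CodeUnit",
        "metadata": {
            "projectKind": "CustomSingleLabelClassification",
            "projectName": project_name,
            "language": language,
        },
        "assets": {
            "projectKind": "CustomSingleLabelClassification",
            "classes": [{"category": c} for c in classes],
            "documents": [
                {"location": loc, "language": language,
                 "class": {"category": cls}}
                for loc, cls in labeled_docs
            ],
        },
    }

labels = build_labels_file("SupportTicketClassifier", labeled_docs)
print(json.dumps(labels, indent=2))
```

Generating the file this way keeps the `classes` list and the per-document labels consistent by construction.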

Calling the Custom Classification Endpoint

from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(
    endpoint="https://my-language.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>")
)

# Single-label classification
poller = client.begin_single_label_classify(
    documents=["I was charged twice for my subscription renewal"],
    project_name="SupportTicketClassifier",
    deployment_name="production"
)

for result in poller.result():
    if not result.is_error:
        for classification in result.classifications:
            print(f"Class: {classification.category}")
            print(f"Confidence: {classification.confidence_score:.2f}")

Custom Named Entity Recognition

Custom NER trains a model to extract entities specific to your domain:

Example Use Cases

| Domain | Custom Entities |
|---|---|
| Legal | Case numbers, judge names, statute references, legal terms |
| Healthcare | Drug names, dosages, symptoms, procedures |
| Finance | Account types, transaction codes, policy numbers |
| Manufacturing | Part numbers, defect types, machine identifiers |

Training Data Format for Custom NER

{
    "projectFileVersion": "2022-05-01",
    "metadata": {
        "projectKind": "CustomEntityRecognition",
        "projectName": "LegalEntityExtractor",
        "language": "en"
    },
    "assets": {
        "projectKind": "CustomEntityRecognition",
        "entities": [
            {"category": "CaseNumber"},
            {"category": "JudgeName"},
            {"category": "StatuteReference"}
        ],
        "documents": [
            {
                "location": "legal_doc_001.txt",
                "language": "en",
                "entities": [
                    {
                        "regionOffset": 45,
                        "regionLength": 12,
                        "labels": [
                            {
                                "category": "CaseNumber",
                                "offset": 45,
                                "length": 12
                            }
                        ]
                    }
                ]
            }
        ]
    }
}

Training and Evaluation

# Train a custom NER model via the authoring REST API
import requests

endpoint = "https://my-language.cognitiveservices.azure.com"
project_name = "LegalEntityExtractor"
headers = {
    "Ocp-Apim-Subscription-Key": "<your-key>",
    "Content-Type": "application/json"
}

train_url = f"{endpoint}/language/authoring/analyze-text/projects/{project_name}/:train?api-version=2023-04-01"

train_body = {
    "modelLabel": "v1",
    "trainingConfigVersion": "latest",
    "evaluationOptions": {
        "kind": "percentage",
        "trainingSplitPercentage": 80,
        "testingSplitPercentage": 20
    }
}

response = requests.post(train_url, headers=headers, json=train_body)
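The train request is asynchronous: the service accepts the job and returns an `operation-location` header that can be polled until the job reaches a terminal status. A minimal polling sketch (the header names follow the standard Azure async pattern; the key value is assumed):

```python
import time
import requests

def wait_for_training(job_url, key, interval=30):
    """Poll an authoring job URL until it reaches a terminal status."""
    headers = {"Ocp-Apim-Subscription-Key": key}
    while True:
        job = requests.get(job_url, headers=headers).json()
        status = job.get("status")
        if status in ("succeeded", "failed", "cancelled"):
            return job
        time.sleep(interval)

# Usage, continuing from the train request above:
# job_url = response.headers["operation-location"]
# job = wait_for_training(job_url, "<your-key>")
# print(job["status"])
```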

Model Evaluation Metrics

| Metric | Description | Formula |
|---|---|---|
| Precision | Correctness of positive predictions | TP / (TP + FP) |
| Recall | Completeness of positive predictions | TP / (TP + FN) |
| F1 Score | Harmonic mean of precision and recall | 2 * (P * R) / (P + R) |
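The formulas above can be checked with a small helper. For example, 8 true positives, 2 false positives, and 4 false negatives give precision 0.80 and recall of about 0.67:

```python
def evaluation_metrics(tp, fp, fn):
    """Precision, recall, and F1 from raw prediction counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f1 = evaluation_metrics(tp=8, fp=2, fn=4)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# precision=0.80 recall=0.67 f1=0.73
```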

On the Exam: Know how to interpret a confusion matrix. High precision but low recall means the model is conservative (misses some entities). Low precision but high recall means the model over-predicts (many false positives). F1 balances both.

Data Splitting Strategies

| Strategy | Description | Best For |
|---|---|---|
| Automatic (percentage) | Random split (e.g., 80/20) | Most projects |
| Manual | You define which documents are train vs. test | When you need specific test cases |

Best Practices for Training Data

  • Label consistency: Ensure the same type of text is always labeled the same way
  • Boundary precision: Entity labels must have exact character offsets
  • Negative examples: Include documents WITHOUT the target entities
  • Class balance: Roughly equal examples per entity type
  • Real-world data: Use data that represents actual production inputs
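Boundary precision deserves care because the labels file declares `stringIndexType: Utf16CodeUnit`: offsets count UTF-16 code units, which diverge from Python's character indices when the text contains characters outside the Basic Multilingual Plane (emoji, some symbols). A sketch of computing a label's offset and length (the helper names are illustrative):

```python
def utf16_len(s):
    """Number of UTF-16 code units in s (astral characters count as 2)."""
    return len(s.encode("utf-16-le")) // 2

def entity_label(text, entity, category):
    """Build an offset/length label for the first occurrence of entity."""
    char_index = text.index(entity)
    return {
        "category": category,
        "offset": utf16_len(text[:char_index]),
        "length": utf16_len(entity),
    }

doc = "Ruling filed under case no. 2023-CV-0042 today."
print(entity_label(doc, "2023-CV-0042", "CaseNumber"))
# → {'category': 'CaseNumber', 'offset': 28, 'length': 12}
```

For ASCII-only text the UTF-16 offsets match Python indices, but computing them through an explicit helper avoids subtle mislabeling on multilingual data.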
Test Your Knowledge

Where must custom text classification training documents be stored?

Test Your Knowledge

A custom NER model has high precision (0.95) but low recall (0.60) for the "CaseNumber" entity. What does this mean?

Test Your Knowledge

Which method in the TextAnalyticsClient performs custom single-label text classification?
