What is the correct order of components in the Azure AI Search indexing pipeline?

Data Source → Indexer → Skillset → Index. The data source is where the data lives, the indexer pulls it, the skillset enriches it with AI, and the index stores the searchable content. The indexer coordinates the whole process: it reads from the data source, applies the skillset, and writes to the index.

Which indexer parameter must be set to enable OCR processing of images embedded in documents?

imageAction: "generateNormalizedImages". Setting imageAction to "generateNormalizedImages" tells the indexer to extract and normalize images from documents, making them available for OCR processing by skills in the skillset. Without this setting, images in documents are ignored during indexing.

What does semantic ranking do in Azure AI Search?

It re-ranks search results using a language model to better understand query intent. Semantic ranking uses a deep learning language model to re-rank the top search results based on semantic understanding of the query intent. It also provides semantic captions (relevant passages) and semantic answers (direct answers extracted from results).

In a vector search query, what does the k_nearest_neighbors parameter specify?

The number of most similar results to return. A vector search finds the k vectors in the index that are closest to the query vector under the configured similarity metric (e.g., cosine similarity), and k_nearest_neighbors sets that k.

Azure AI Search — Fundamentals and Architecture

Quick Answer: Azure AI Search provides full-text search with AI enrichment. The pipeline is: Data Source → Indexer → Skillset (AI enrichment) → Index → Query. Indexers pull data from sources such as Azure Blob Storage, Azure SQL Database, and Azure Cosmos DB. Skillsets enrich data with OCR, named entity recognition (NER), key phrase extraction, and custom skills. Vector search enables retrieval-augmented generation (RAG) patterns.

Azure AI Search Architecture

[Data Sources]           [AI Enrichment]           [Search]
┌─────────────┐     ┌─────────────────┐     ┌──────────────┐
│ Blob Storage│     │   Skillset      │     │ Search Index │
│ SQL Database│ ──▶ │ ┌─────────────┐ │ ──▶ │ ┌──────────┐ │ ──▶ [Client]
│ Cosmos DB   │     │ │ OCR         │ │     │ │ Full-text│ │
│ Table Store │     │ │ NER         │ │     │ │ Vector   │ │
└─────────────┘     │ │ Key Phrases │ │     │ │ Semantic │ │
    [Indexer]       │ │ Custom Skill│ │     │ └──────────┘ │
                    │ └─────────────┘ │     └──────────────┘
                    └─────────────────┘
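
The flow in the diagram can be sketched as a toy, in-memory simulation. Nothing here calls Azure: the function names (read_data_source, run_skillset, run_indexer) and the naive "key phrase" skill are hypothetical and exist only to show that the indexer drives each stage in order.

```python
# Toy simulation of the pipeline order. This is NOT the Azure SDK.

def read_data_source():
    """Stand-in for a data source: returns raw documents."""
    return [
        {"id": "1", "content": "Contoso opens Seattle office"},
        {"id": "2", "content": "Fabrikam ships new product"},
    ]

def run_skillset(doc):
    """Stand-in for AI enrichment: adds a naive 'keyPhrases' field."""
    enriched = dict(doc)
    # Hypothetical skill: treat capitalised words as key phrases.
    enriched["keyPhrases"] = [w for w in doc["content"].split() if w[0].isupper()]
    return enriched

def run_indexer(index):
    """The indexer coordinates the run: pull, enrich, then write to the index."""
    for raw in read_data_source():
        enriched = run_skillset(raw)
        index[enriched["id"]] = enriched
    return index

search_index = run_indexer({})
print(search_index["1"]["keyPhrases"])  # ['Contoso', 'Seattle']
```

In the real service each stage is a separately defined resource, and the indexer definition is what ties them together by name.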

Core Components

1. Data Sources

Supported data sources for automatic indexing:

Data Source	Connector	Best For
Azure Blob Storage	Built-in	Documents, images, PDFs
Azure SQL Database	Built-in	Structured relational data
Azure Cosmos DB	Built-in	NoSQL document data
Azure Table Storage	Built-in	Key-value data
Azure Data Lake Storage Gen2	Built-in	Large-scale data lakes
SharePoint Online	Built-in (preview)	Enterprise documents
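
As a rough illustration, a blob data source is itself a small JSON definition that the indexer later refers to by name. The sketch below only builds that body as a Python dictionary. The container name "documents" and the helper name blob_data_source are assumptions made for the example, and the connection string is a placeholder.

```python
import json

def blob_data_source(name, container, connection_string):
    """Build the JSON body for an Azure Blob Storage data source definition."""
    return {
        "name": name,
        "type": "azureblob",  # data source type for Blob Storage
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }

body = blob_data_source(
    name="my-blob-datasource",                     # the name an indexer references
    container="documents",                         # hypothetical container name
    connection_string="<your-connection-string>",  # placeholder, never hard-code secrets
)
print(json.dumps(body, indent=4))
```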

2. Indexers

Indexers automate data ingestion:

{
    "name": "my-blob-indexer",
    "dataSourceName": "my-blob-datasource",
    "targetIndexName": "my-search-index",
    "skillsetName": "my-ai-skillset",
    "schedule": {
        "interval": "PT2H"
    },
    "parameters": {
        "configuration": {
            "dataToExtract": "contentAndMetadata",
            "imageAction": "generateNormalizedImages",
            "parsingMode": "default"
        }
    },
    "fieldMappings": [
        {
            "sourceFieldName": "metadata_storage_name",
            "targetFieldName": "documentName"
        }
    ],
    "outputFieldMappings": [
        {
            "sourceFieldName": "/document/organizations",
            "targetFieldName": "organizations"
        }
    ]
}
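
One way to submit a definition like this is a PUT request to the service's REST endpoint. The sketch below prepares such a request with the standard library but does not send it. The service name "my-search", the admin key placeholder, and the api-version value are assumptions; check which API version your service supports before using it.

```python
import json
import urllib.request

def build_create_indexer_request(service, api_key, indexer, api_version="2024-07-01"):
    """Prepare (but do not send) a PUT request that creates or updates an indexer."""
    url = (
        f"https://{service}.search.windows.net/indexers/"
        f"{indexer['name']}?api-version={api_version}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(indexer).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="PUT",
    )

# Trimmed version of the indexer definition shown above.
indexer = {
    "name": "my-blob-indexer",
    "dataSourceName": "my-blob-datasource",
    "targetIndexName": "my-search-index",
    "skillsetName": "my-ai-skillset",
    "schedule": {"interval": "PT2H"},
}

request = build_create_indexer_request("my-search", "<your-admin-key>", indexer)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

The data source, index, and skillset named in the body must already exist when the request is sent, since the indexer only references them.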

Key Indexer Settings

Setting	Description
schedule.interval	How often to run (e.g., PT2H = every 2 hours)
dataToExtract	"contentAndMetadata", "allMetadata", or "storageMetadata"
imageAction	"generateNormalizedImages" extracts embedded images so that OCR and other image skills can process them
parsingMode	"default", "text", "json", "jsonArray", "jsonLines", "delimitedText"
fieldMappings	Map source fields to index fields (before enrichment)
outputFieldMappings	Map enrichment outputs to index fields (after enrichment)
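
The schedule.interval value is an ISO 8601 duration. A minimal sketch of how such a value could be parsed and sanity-checked before deployment follows. The helper names are hypothetical, the parser handles only days, hours, and minutes, and the 5 minute to 1 day bounds reflect the range the service documents for indexer schedules.

```python
import re
from datetime import timedelta

# Handles only the simple forms used for schedules, e.g. PT2H, PT30M, P1D.
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?)?$")

def parse_interval(value):
    """Parse a simple ISO 8601 duration into a timedelta."""
    match = _DURATION.match(value)
    if not match or value in ("P", "PT"):
        raise ValueError(f"unsupported duration: {value!r}")
    days, hours, minutes = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes)

def check_schedule(value):
    """Check the interval against the documented 5 minute to 1 day range."""
    interval = parse_interval(value)
    if not timedelta(minutes=5) <= interval <= timedelta(days=1):
        raise ValueError(f"{value} is outside the 5 minute to 1 day range")
    return interval

print(check_schedule("PT2H"))  # 2:00:00
```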

3. Search Index

The search index defines the schema for searchable content:

{
    "name": "my-search-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": true, "filterable": true},
        {"name": "content", "type": "Edm.String", "searchable": true, "analyzer": "en.microsoft"},
        {"name": "title", "type": "Edm.String", "searchable": true, "filterable": true, "sortable": true},
        {"name": "organizations", "type": "Collection(Edm.String)", "searchable": true, "filterable": true, "facetable": true},
        {"name": "keyPhrases", "type": "Collection(Edm.String)", "searchable": true, "filterable": true},
        {"name": "language", "type": "Edm.String", "filterable": true},
        {"name": "contentVector", "type": "Collection(Edm.Single)", "searchable": true, "dimensions": 1536, "vectorSearchProfile": "my-vector-profile"}
    ]
}

Field Attributes

Attribute	Description	Use Case
searchable	Full-text searchable with analyzer	Text content for keyword search
filterable	Can be used in $filter expressions	Category filtering, date ranges
sortable	Can be used in $orderby expressions	Sort by date, rating, name
facetable	Can be used for faceted navigation	Filter sidebar (by category, author)
retrievable	Returned in search results	Fields to display to the user
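
To make the attributes concrete, the sketch below checks a field's attributes before building $filter and $orderby parameters. It is a simplification: the helper names are hypothetical, the schema is a subset of the index definition above, and an omitted attribute is treated as disabled, whereas the real defaults depend on the API or SDK used to create the index.

```python
# Schema subset taken from the index definition above.
FIELDS = [
    {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
    {"name": "content", "type": "Edm.String", "searchable": True},
    {"name": "title", "type": "Edm.String", "searchable": True,
     "filterable": True, "sortable": True},
    {"name": "language", "type": "Edm.String", "filterable": True},
]

def fields_with(attribute):
    """Names of fields with the attribute enabled (assumption: omitted means off)."""
    return [f["name"] for f in FIELDS if f.get(attribute, False)]

def build_query(search_text, filter_field, filter_value, order_by):
    """Build query parameters, rejecting fields that lack the needed attribute."""
    if filter_field not in fields_with("filterable"):
        raise ValueError(f"{filter_field} is not filterable")
    if order_by not in fields_with("sortable"):
        raise ValueError(f"{order_by} is not sortable")
    return {
        "search": search_text,
        "$filter": f"{filter_field} eq '{filter_value}'",
        "$orderby": f"{order_by} asc",
    }

params = build_query("machine learning", "language", "en", "title")
print(params["$filter"])  # language eq 'en'
```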

Vector Search

Vector search uses embeddings (numerical representations of text) to find semantically similar content.

Creating a Vector Search Index

{
    "name": "my-vector-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": true},
        {"name": "content", "type": "Edm.String", "searchable": true},
        {
            "name": "contentVector",
            "type": "Collection(Edm.Single)",
            "searchable": true,
            "dimensions": 1536,
            "vectorSearchProfile": "my-vector-profile"
        }
    ],
    "vectorSearch": {
        "algorithms": [
            {
                "name": "my-hnsw-algo",
                "kind": "hnsw",
                "hnswParameters": {
                    "metric": "cosine",
                    "m": 4,
                    "efConstruction": 400,
                    "efSearch": 500
                }
            }
        ],
        "profiles": [
            {
                "name": "my-vector-profile",
                "algorithm": "my-hnsw-algo"
            }
        ]
    }
}
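
Two mistakes are easy to make with a definition like this: a vector field that points at a profile that does not exist, and embeddings whose length differs from the field's dimensions. The following is a small local check for both, written under the assumption that the index definition is available as a Python dictionary. The function names are hypothetical and the example index is trimmed to the parts the check reads.

```python
def vector_fields(index):
    """Return {field name: field} for every field that declares dimensions."""
    return {f["name"]: f for f in index["fields"] if "dimensions" in f}

def check_profiles(index):
    """List vector fields whose vectorSearchProfile is not defined in the index."""
    defined = {p["name"] for p in index.get("vectorSearch", {}).get("profiles", [])}
    return [
        name for name, field in vector_fields(index).items()
        if field.get("vectorSearchProfile") not in defined
    ]

def check_embedding(index, field_name, embedding):
    """Raise if the embedding length differs from the field's dimensions."""
    expected = vector_fields(index)[field_name]["dimensions"]
    if len(embedding) != expected:
        raise ValueError(
            f"{field_name} expects {expected} dimensions, got {len(embedding)}"
        )

index = {
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "contentVector", "type": "Collection(Edm.Single)",
         "dimensions": 1536, "vectorSearchProfile": "my-vector-profile"},
    ],
    "vectorSearch": {"profiles": [{"name": "my-vector-profile"}]},
}

print(check_profiles(index))                            # []
check_embedding(index, "contentVector", [0.0] * 1536)   # passes silently
```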

Vector Query

from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.models import VectorizedQuery

search_client = SearchClient(
    endpoint="https://my-search.search.windows.net",
    index_name="my-vector-index",
    credential=AzureKeyCredential("<your-key>")
)

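# Generate an embedding for the query. get_embedding is a user-supplied helper
# (not shown) that must return a list of 1536 floats to match the index field,
# for example by calling an OpenAI embedding model.
query_embedding = get_embedding("What is machine learning?")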

# Vector search
results = search_client.search(
    search_text=None,  # No text search, vector only
    vector_queries=[
        VectorizedQuery(
            vector=query_embedding,
            k_nearest_neighbors=5,
            fields="contentVector"
        )
    ]
)

for result in results:
    print(f"Score: {result['@search.score']}")
    print(f"Content: {result['content'][:200]}")
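
To see what k_nearest_neighbors means independently of the service, here is a brute-force sketch in plain Python. It scores every document exhaustively with cosine similarity, whereas an HNSW index finds approximate neighbors without scanning everything. The three-dimensional vectors and document ids are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def k_nearest(query, documents, k):
    """Exhaustive search: score every document, return the k most similar ids."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in documents.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical toy embeddings (real ones have hundreds or thousands of dimensions).
documents = {
    "ml-intro": [0.9, 0.1, 0.0],
    "deep-learning": [0.8, 0.3, 0.1],
    "cooking": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]

print(k_nearest(query, documents, k=2))  # ['ml-intro', 'deep-learning']
```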

Semantic Ranking

Semantic ranking uses a language model to re-rank search results by understanding query intent:

Feature	Description
Semantic ranker	Re-ranks top results using deep learning for relevance
Semantic captions	Extracts the most relevant passage from each result
Semantic answers	Extracts a direct answer to the query from the top results

On the Exam: Know the difference between keyword search (BM25), vector search (embeddings), and semantic ranking (re-ranking). For RAG, hybrid search (keyword + vector) combined with semantic ranking generally gives the most relevant results.
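
Hybrid search has to merge a keyword ranking and a vector ranking into one list, and Azure AI Search documents Reciprocal Rank Fusion (RRF) as the method it uses for this. The sketch below shows the general idea only: the document ids are invented, the constant k=60 is a commonly cited value taken here as an assumption, and the service's exact scoring may differ. Semantic re-ranking would then be applied to the top fused results and is not modeled.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids; returns ids ordered by fused score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for the documents it contains.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

keyword_ranking = ["doc-a", "doc-b", "doc-c"]  # hypothetical BM25 order
vector_ranking = ["doc-c", "doc-a", "doc-d"]   # hypothetical vector order

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)  # ['doc-a', 'doc-c', 'doc-b', 'doc-d']
```

Documents that appear near the top of both lists (doc-a, doc-c) rise above documents that appear in only one, which is why hybrid retrieval tends to be more robust than either method alone.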

Azure AI Engineer Associate
