6.6 Azure OpenAI On Your Data and AI Agents

Key Takeaways

On Your Data delivers managed RAG by attaching a data source (Azure AI Search, Blob, Cosmos DB, uploaded files, URLs) to chat completions, returning grounded answers with citations and no custom retrieval code.
Key On Your Data parameters: query_type (simple/semantic/vector/vector_semantic_hybrid), in_scope (answer only from data), strictness 1-5, and top_n_documents 1-20.
AI agents extend LLMs with planning plus tool calling, code interpreter, file search/retrieval, and persistent memory to complete multi-step tasks autonomously.
Azure AI Foundry Agent Service is the managed platform: create agent (model + instructions + tools) → create thread → add message → run → handle requires_action tool calls → read result.
Pick the simplest approach that works: On Your Data for standard RAG, custom RAG for full control, agents for dynamic multi-step tool-driven workflows.

Last updated: June 2026

Quick Answer: On Your Data attaches a data source (Azure AI Search, Blob, Cosmos DB, files, URLs) to chat completions for managed RAG with citations and no retrieval code. Agents add planning + tool calling to do multi-step work. Azure AI Foundry Agent Service is the managed agent platform: create agent → thread → message → run → handle requires_action → read result.

Azure OpenAI On Your Data

Instead of writing the embed→search→ground pipeline yourself (section 6.3), you point chat completions at an index and Azure handles retrieval, injection, and citation.

Data source	How it's used
Azure AI Search	Existing full-text + vector index
Azure Blob Storage	Files auto-chunked and indexed
Azure Cosmos DB	NoSQL document content
Uploaded files	Drag-in files, auto-processed
URLs / web	Crawled and indexed

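import os
from openai import AzureOpenAI

# The environment variable names and API version are placeholders; use your own.
client = AzureOpenAI(
  azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
  api_key=os.environ["AZURE_OPENAI_API_KEY"],
  api_version="2024-02-01")

resp = client.chat.completions.create(
  model="gpt4o-chat",          # name of your chat model deployment
  messages=[{"role": "user", "content": "What is our return policy?"}],
  extra_body={"data_sources": [{
    "type": "azure_search",
    "parameters": {
      "endpoint": "https://my-search.search.windows.net",
      "index_name": "company-docs",
      "authentication": {"type": "system_assigned_managed_identity"},
      "query_type": "vector_semantic_hybrid",
      "embedding_dependency": {"type": "deployment_name",
                               "deployment_name": "embed-large"},
      "in_scope": True,        # answer ONLY from the data
      "strictness": 3,         # 1 permissive .. 5 strict
      "top_n_documents": 5}}]})
print(resp.choices[0].message.content)   # grounded answer text
# resp.choices[0].message.context["citations"] holds the grounded sources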
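
To make the citation comment concrete, the sketch below pairs the [docN] markers in an answer with entries in a citations list. It runs on a hand-written sample rather than a live response: the sample text, the citation fields shown (title, filepath, content), and the helper name cited_sources are assumptions for illustration, so check what your API version actually returns.

```python
import re

def cited_sources(answer: str, citations: list[dict]) -> list[dict]:
    """Return the citations referenced by [docN] markers, in first-use order."""
    seen = []
    for match in re.finditer(r"\[doc(\d+)\]", answer):
        index = int(match.group(1)) - 1          # [doc1] -> citations[0]
        if 0 <= index < len(citations) and index not in seen:
            seen.append(index)
    return [citations[i] for i in seen]

# Hand-written sample shaped like an On Your Data reply (assumed, not captured output).
sample_answer = ("Returns are accepted within 30 days [doc2]. "
                 "Refunds take 5 business days [doc1][doc2].")
sample_citations = [
    {"title": "Refund timing", "filepath": "refunds.md", "content": "Refunds take 5 business days."},
    {"title": "Return policy", "filepath": "returns.md", "content": "Returns are accepted within 30 days."},
    {"title": "Shipping", "filepath": "shipping.md", "content": "Orders ship in 2 days."},
]

used = cited_sources(sample_answer, sample_citations)
for citation in used:
    print(citation["title"], "-", citation["filepath"])
```

Only the sources the answer actually cites are listed, which is usually what you want to show a user next to a grounded reply.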

Parameter	Purpose	Values
query_type	Retrieval method	simple, semantic, vector, vector_semantic_hybrid
in_scope	Restrict to the data source	true / false
strictness	Relevance threshold	1 (loose) - 5 (strict)
top_n_documents	Docs retrieved	1 - 20
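
The ranges in the table are easy to get wrong in a hand-typed payload. As a minimal sketch, the helper below builds one data_sources entry and rejects out-of-range values before a request is sent. The function name build_search_data_source and its keyword arguments are hypothetical; the payload keys mirror the request shown earlier, and the rule that vector query types need an embedding deployment is an assumption based on that request.

```python
ALLOWED_QUERY_TYPES = {"simple", "semantic", "vector", "vector_semantic_hybrid"}

def build_search_data_source(endpoint, index_name, *, query_type="simple",
                             in_scope=True, strictness=3, top_n_documents=5,
                             embedding_deployment=None):
    """Build one data_sources entry, rejecting out-of-range values early."""
    if query_type not in ALLOWED_QUERY_TYPES:
        raise ValueError(f"unknown query_type: {query_type!r}")
    if not 1 <= strictness <= 5:
        raise ValueError("strictness must be between 1 and 5")
    if not 1 <= top_n_documents <= 20:
        raise ValueError("top_n_documents must be between 1 and 20")
    # Assumption: any query type that involves vectors needs an embedding model.
    if "vector" in query_type and embedding_deployment is None:
        raise ValueError("vector query types need an embedding deployment")

    parameters = {
        "endpoint": endpoint,
        "index_name": index_name,
        "authentication": {"type": "system_assigned_managed_identity"},
        "query_type": query_type,
        "in_scope": in_scope,
        "strictness": strictness,
        "top_n_documents": top_n_documents,
    }
    if embedding_deployment is not None:
        parameters["embedding_dependency"] = {
            "type": "deployment_name", "deployment_name": embedding_deployment}
    return {"type": "azure_search", "parameters": parameters}

source = build_search_data_source(
    "https://my-search.search.windows.net", "company-docs",
    query_type="vector_semantic_hybrid", embedding_deployment="embed-large")
print(source["parameters"]["query_type"])
```

The returned dictionary can be passed as the single element of the data_sources list in extra_body.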

On the Exam: in_scope=true forces answers to come only from the data (else it declines), and higher strictness drops marginally relevant chunks — set it high when precision matters more than recall. vector_semantic_hybrid is the most comprehensive query_type.
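
The precision-versus-recall trade-off can be seen in a toy model. The sketch below is not the service's actual algorithm, which this section does not describe: the relevance scores are invented, the function retrieve is hypothetical, and the mapping from strictness to a score threshold is an assumption chosen only to show the direction of the effect.

```python
# Assumed mapping for illustration only: higher strictness -> higher cut-off.
THRESHOLDS = {1: 0.1, 2: 0.3, 3: 0.5, 4: 0.7, 5: 0.9}

def retrieve(scored_chunks, strictness, top_n_documents):
    """Toy retrieval: drop chunks below a threshold, then keep the best N."""
    threshold = THRESHOLDS[strictness]
    kept = [(text, score) for text, score in scored_chunks if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_n_documents]

# Invented chunks and scores.
chunks = [("return window is 30 days", 0.92),
          ("refunds go to the original card", 0.61),
          ("store opening hours", 0.35),
          ("holiday shipping cutoffs", 0.15)]

loose = retrieve(chunks, strictness=1, top_n_documents=5)
strict = retrieve(chunks, strictness=5, top_n_documents=5)
print(len(loose), "chunks at strictness 1;", len(strict), "at strictness 5")
```

In this model, raising strictness removes the marginal chunks first, while top_n_documents only caps how many of the survivors reach the prompt.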

AI Agents (Agentic AI)

An agent wraps an LLM with the ability to plan, call tools, run code, retrieve documents, and remember state — turning a one-shot chat into an autonomous multi-step worker.

Capability	What it adds	Example
Tool / function calling	Invoke your APIs	Query a database, hit a weather API
Code interpreter	Write & execute code in a sandbox	Analyze a CSV, make a chart
File search	Built-in RAG over uploaded files	Cite a PDF spec
Multi-step reasoning	Decompose tasks	Research → analyze → summarize
Memory (threads)	Persist conversation state	Recall earlier user constraints

Azure AI Foundry Agent Service Lifecycle

import json
import time

from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Method names follow one preview release of the SDK; later releases rename or
# regroup some of them, so check the version you have installed.
client = AIProjectClient(credential=DefaultAzureCredential(),
                         endpoint="https://my-project.services.ai.azure.com/")

agent = client.agents.create_agent(            # 1. define agent
    model="gpt-4o", name="Product Assistant",
    instructions="Help customers find and compare products.",
    tools=[{"type": "function", "function": {"name": "search_products", ...}},
           {"type": "code_interpreter"}, {"type": "file_search"}])

thread = client.agents.create_thread()         # 2. conversation thread
client.agents.create_message(thread_id=thread.id, role="user",
    content="Find a laptop under $1000 with 16GB RAM")  # 3. add message
run = client.agents.create_run(thread_id=thread.id, assistant_id=agent.id)  # 4. run

while run.status in ("queued", "in_progress", "requires_action"):  # 5. tool loop
    time.sleep(1)                              # poll gently instead of spinning
    run = client.agents.get_run(thread_id=thread.id, run_id=run.id)
    if run.status == "requires_action":
        outputs = []
        for call in run.required_action.submit_tool_outputs.tool_calls:
            # run_tool is your own dispatcher from tool name to implementation
            result = run_tool(call.function.name, json.loads(call.function.arguments))
            outputs.append({"tool_call_id": call.id, "output": json.dumps(result)})
        client.agents.submit_tool_outputs(thread_id=thread.id, run_id=run.id,
                                          tool_outputs=outputs)

if run.status != "completed":                  # failed, cancelled, or expired
    raise RuntimeError(f"run ended with status {run.status}")
# 6. read final answer (the newest message is listed first)
print(client.agents.list_messages(thread_id=thread.id).data[0].content[0].text.value)

The requires_action status is the heart of the pattern: the agent has decided to call a tool and pauses until your code executes it and submits outputs back. The run then resumes; this can repeat several times before reaching completed.
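
The code that answers a requires_action pause can be exercised without calling the service. The sketch below shows one possible dispatcher: it looks up the requested function in a registry, runs it, and builds the tool_outputs list. The catalog, the search_products signature, and the fake tool call are all hypothetical stand-ins; only the shape (call.id, call.function.name, call.function.arguments) follows the lifecycle code above.

```python
import json
from types import SimpleNamespace

def search_products(max_price: float, min_ram_gb: int) -> list[dict]:
    """Hypothetical tool backed by an in-memory catalog."""
    catalog = [{"name": "Laptop A", "price": 899, "ram_gb": 16},
               {"name": "Laptop B", "price": 1299, "ram_gb": 32},
               {"name": "Laptop C", "price": 749, "ram_gb": 8}]
    return [p for p in catalog
            if p["price"] <= max_price and p["ram_gb"] >= min_ram_gb]

# Only functions registered here can be invoked on the agent's behalf.
TOOLS = {"search_products": search_products}

def run_tool(name: str, arguments: dict):
    """Dispatch one tool call; report failures as data instead of raising."""
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    try:
        return TOOLS[name](**arguments)
    except TypeError as exc:                 # wrong or missing arguments
        return {"error": str(exc)}

def build_tool_outputs(tool_calls) -> list[dict]:
    """Turn pending tool calls into the list submitted back to the run."""
    outputs = []
    for call in tool_calls:
        result = run_tool(call.function.name, json.loads(call.function.arguments))
        outputs.append({"tool_call_id": call.id, "output": json.dumps(result)})
    return outputs

# Stand-in for run.required_action.submit_tool_outputs.tool_calls.
fake_calls = [SimpleNamespace(id="call_1", function=SimpleNamespace(
    name="search_products",
    arguments='{"max_price": 1000, "min_ram_gb": 16}'))]

outputs = build_tool_outputs(fake_calls)
print(outputs)
```

Returning errors as tool output, rather than raising, lets the run resume so the model can retry or explain the failure. Whether that is the right policy depends on the tool.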

Choosing a RAG/Agent Approach

Approach	Effort	Control	Best for
On Your Data	Low	Limited	Standard managed RAG with the fastest setup
Custom RAG (code)	Medium	Full	Custom ranking/pre/post-processing
Agent + tools	High	Maximum	Dynamic multi-step, tool-driven tasks

On Your Data vs. building it yourself

On Your Data trades flexibility for speed. You cannot fully control the chunking, the ranking algorithm, or post-processing the way a hand-built pipeline (section 6.3) allows, but you also write almost no retrieval code and get citations for free. Choose it for standard "answer questions over my documents" requirements; choose custom RAG when a scenario demands bespoke ranking, filtering, or transformation of retrieved content before it reaches the model.
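
As an illustration of the bespoke ranking, filtering, or transformation that points toward custom RAG, the sketch below post-processes retrieved chunks before they would be placed in a prompt. Everything in it is hypothetical: the chunk fields (department, score, updated), the tie-breaking rule, and the character budget are assumptions chosen to show the kind of control a hand-built pipeline gives you.

```python
from datetime import date

def prepare_context(chunks, *, allowed_departments, min_score, max_chars):
    """Filter, re-rank, and trim retrieved chunks before building the prompt."""
    # 1. Filter: keep only chunks the caller may see and that score well enough.
    visible = [c for c in chunks
               if c["department"] in allowed_departments and c["score"] >= min_score]
    # 2. Re-rank: prefer the higher score, break ties with the newer document.
    visible.sort(key=lambda c: (c["score"], c["updated"]), reverse=True)
    # 3. Transform: number the sources and stop once the size budget is spent.
    lines, used = [], 0
    for i, chunk in enumerate(visible, start=1):
        entry = f"[{i}] {chunk['text']}"
        if used + len(entry) > max_chars:
            break
        lines.append(entry)
        used += len(entry)
    return "\n".join(lines)

# Invented retrieval results.
chunks = [
    {"text": "Returns accepted within 30 days.", "score": 0.9,
     "department": "support", "updated": date(2025, 3, 1)},
    {"text": "Returns accepted within 14 days.", "score": 0.9,
     "department": "support", "updated": date(2023, 1, 1)},
    {"text": "Internal margin targets.", "score": 0.95,
     "department": "finance", "updated": date(2025, 1, 1)},
    {"text": "Gift cards are non-refundable.", "score": 0.4,
     "department": "support", "updated": date(2024, 6, 1)},
]

context = prepare_context(chunks, allowed_departments={"support"},
                          min_score=0.5, max_chars=500)
print(context)
```

The finance chunk is excluded despite having the highest score, and the newer of two equally scored policies is listed first. If a requirement reads like either rule, the managed option is probably not enough.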

Why agents differ from plain RAG

The exam distinguishes the two by autonomy and by number of steps. A RAG call retrieves once and answers once. An agent can decide to search, then call an API, then run code, then answer, looping through requires_action as many times as the task needs and carrying state in its thread. That power costs complexity: more tokens, harder debugging, and the need to secure every tool the agent can invoke. The built-in tool types, code_interpreter for sandboxed code and file_search for managed RAG over uploaded files, let you add common capabilities without writing your own functions.
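
The contrast between one retrieval and a tool loop can be shown with a toy simulation. Nothing below calls a model or the Agent Service: scripted_planner stands in for the LLM's decisions, the two tools are stubs with invented return values, and the function names are hypothetical. The point is only the control flow, one pass versus repeated tool rounds with state kept in a thread.

```python
def single_pass_rag(question, retrieve, answer):
    """Plain RAG: one retrieval, one answer."""
    return answer(question, retrieve(question))

def agent_loop(question, plan_next, tools, max_steps=5):
    """Toy agent: keep choosing tools until the planner returns a final answer."""
    thread = [("user", question)]                  # state carried across steps
    for _ in range(max_steps):                     # guard against endless loops
        action = plan_next(thread)
        if action["type"] == "final":
            thread.append(("assistant", action["text"]))
            return thread
        # Analogous to a requires_action pause: run the tool, record its output.
        result = tools[action["tool"]](**action["args"])
        thread.append(("tool:" + action["tool"], result))
    raise RuntimeError("agent did not finish within max_steps")

def scripted_planner(thread):
    """Stand-in for the model: search first, then look up a price, then answer."""
    tool_results = [entry for entry in thread if entry[0].startswith("tool:")]
    if len(tool_results) == 0:
        return {"type": "tool", "tool": "search", "args": {"query": "laptop 16GB"}}
    if len(tool_results) == 1:
        return {"type": "tool", "tool": "price_api", "args": {"sku": tool_results[0][1]}}
    return {"type": "final",
            "text": f"{tool_results[0][1]} costs ${tool_results[1][1]}"}

# Stub tools with invented return values.
tools = {"search": lambda query: "LAPTOP-A",
         "price_api": lambda sku: 899}

thread = agent_loop("Find a laptop under $1000 with 16GB RAM", scripted_planner, tools)
print(thread[-1][1])
```

The max_steps guard is one simple way to bound the extra token cost and to keep a confused planner from looping forever.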

On the Exam: The 2026 AI-102 leans into agents. Memorize the lifecycle order (create agent → thread → message → run → handle requires_action → read result) and that requires_action = waiting for your tool output, not an error or completion. Prefer the simplest option that satisfies the requirement — don't reach for an agent when On Your Data or a single function call suffices.

Test Your Knowledge

What does the in_scope parameter do in Azure OpenAI On Your Data?

A. Restricts which Azure regions can serve the request

B. Limits the model to answer only from the connected data source, not general knowledge

C. Filters retrieved documents by date

D. Forces the response into a specific language

Test Your Knowledge

In the Azure AI Foundry Agent Service, a run reports status 'requires_action'. What does this mean?

A. The run failed with an unrecoverable error

B. The agent finished and the answer is ready

C. The agent wants to call one or more tools and is waiting for your code to submit their outputs

D. The thread is waiting for a new user message

Test Your Knowledge

Which On Your Data query_type returns the most comprehensive results?

A. simple

B. semantic

C. vector

D. vector_semantic_hybrid

Test Your Knowledge

A team needs standard question-answering over a single Azure AI Search index with citations and minimal code. Which approach is most appropriate?

A. Build a fully custom RAG pipeline from scratch

B. Use Azure OpenAI On Your Data with the index as the data source

C. Deploy a multi-tool autonomous agent

D. Fine-tune GPT-4o on the index contents

