6.6 Azure OpenAI On Your Data and AI Agents
Key Takeaways
- On Your Data delivers managed RAG by attaching a data source (Azure AI Search, Blob, Cosmos DB, uploaded files, URLs) to chat completions, returning grounded answers with citations and no custom retrieval code.
- Key On Your Data parameters: query_type (simple/semantic/vector/vector_semantic_hybrid), in_scope (answer only from data), strictness 1-5, and top_n_documents 1-20.
- AI agents extend LLMs with planning plus tool calling, code interpreter, file search/retrieval, and persistent memory to complete multi-step tasks autonomously.
- Azure AI Foundry Agent Service is the managed platform: create agent (model + instructions + tools) → create thread → add message → run → handle requires_action tool calls → read result.
- Pick the simplest approach that works: On Your Data for standard RAG, custom RAG for full control, agents for dynamic multi-step tool-driven workflows.
Quick Answer: On Your Data attaches a data source (Azure AI Search, Blob, Cosmos DB, files, URLs) to chat completions for managed RAG with citations and no retrieval code. Agents add planning + tool calling to do multi-step work. Azure AI Foundry Agent Service is the managed agent platform: create agent → thread → message → run → handle requires_action → read result.
Azure OpenAI On Your Data
Instead of writing the embed→search→ground pipeline yourself (section 6.3), you point chat completions at an index and Azure handles retrieval, injection, and citation.
| Data source | How it's used |
|---|---|
| Azure AI Search | Existing full-text + vector index |
| Azure Blob Storage | Files auto-chunked and indexed |
| Azure Cosmos DB | NoSQL document content |
| Uploaded files | Drag-in files, auto-processed |
| URLs / web | Crawled and indexed |
# Assumes `client` is an already-configured openai.AzureOpenAI instance.
resp = client.chat.completions.create(
    model="gpt4o-chat",  # name of your chat deployment
    messages=[{"role": "user", "content": "What is our return policy?"}],
    extra_body={"data_sources": [{
        "type": "azure_search",
        "parameters": {
            "endpoint": "https://my-search.search.windows.net",
            "index_name": "company-docs",
            "authentication": {"type": "system_assigned_managed_identity"},
            "query_type": "vector_semantic_hybrid",
            "embedding_dependency": {"type": "deployment_name",
                                     "deployment_name": "embed-large"},
            "in_scope": True,       # answer ONLY from the data
            "strictness": 3,        # 1 permissive .. 5 strict
            "top_n_documents": 5}}]})
# resp.choices[0].message.context["citations"] holds the grounded sources
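To make the citation payload concrete, here is a minimal sketch of matching an answer to its sources. It assumes the answer text marks sources as [doc1], [doc2], and so on, with [docN] pointing at the N-th entry of the citations list; the helper name `cited_sources` and the sample data are hypothetical and hand-written, not real service output.

```python
import re


def cited_sources(answer: str, citations: list[dict]) -> list[dict]:
    """Return the citations referenced by [docN] markers, in order of first use."""
    seen: list[int] = []
    for match in re.finditer(r"\[doc(\d+)\]", answer):
        index = int(match.group(1)) - 1  # assumption: [doc1] maps to citations[0]
        if 0 <= index < len(citations) and index not in seen:
            seen.append(index)
    return [citations[i] for i in seen]


# Hand-written sample shaped like message.context["citations"] (illustrative only).
sample_citations = [
    {"title": "Returns", "filepath": "returns.md", "content": "30-day returns."},
    {"title": "Shipping", "filepath": "shipping.md", "content": "Ships in 2 days."},
]
answer = "Items can be returned within 30 days [doc1]."

for source in cited_sources(answer, sample_citations):
    print(source["title"], source["filepath"])
```

Markers that point past the end of the list are ignored rather than raising, which keeps a UI from failing on a malformed answer.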
| Parameter | Purpose | Values |
|---|---|---|
| query_type | Retrieval method | simple, semantic, vector, vector_semantic_hybrid |
| in_scope | Restrict to the data source | true / false |
| strictness | Relevance threshold | 1 (loose) - 5 (strict) |
| top_n_documents | Docs retrieved | 1 - 20 |
On the Exam: in_scope=true forces answers to come only from the data (if the data does not contain the answer, the model declines), and higher strictness drops marginally relevant chunks, so set it high when precision matters more than recall. vector_semantic_hybrid is the most comprehensive query_type.
AI Agents (Agentic AI)
An agent wraps an LLM with the ability to plan, call tools, run code, retrieve documents, and remember state — turning a one-shot chat into an autonomous multi-step worker.
| Capability | What it adds | Example |
|---|---|---|
| Tool / function calling | Invoke your APIs | Query a database, hit a weather API |
| Code interpreter | Write & execute code in a sandbox | Analyze a CSV, make a chart |
| File search | Built-in RAG over uploaded files | Cite a PDF spec |
| Multi-step reasoning | Decompose tasks | Research → analyze → summarize |
| Memory (threads) | Persist conversation state | Recall earlier user constraints |
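For the tool / function calling row, a function tool is typically described to the model as a name, a description, and a JSON Schema for its parameters. The sketch below shows one plausible definition; the tool name matches the lifecycle example in this section, but the description, the parameter names, and the `missing_required` helper are illustrative assumptions.

```python
import json

# Hypothetical tool definition: description and parameters are assumptions.
search_products_tool = {
    "type": "function",
    "function": {
        "name": "search_products",
        "description": "Search the product catalog by category, price ceiling, and minimum RAM.",
        "parameters": {
            "type": "object",
            "properties": {
                "category": {"type": "string"},
                "max_price": {"type": "number"},
                "min_ram_gb": {"type": "integer"},
            },
            "required": ["category"],
        },
    },
}


def missing_required(tool: dict, arguments_json: str) -> list[str]:
    """Return required parameter names absent from the model's JSON arguments."""
    arguments = json.loads(arguments_json)
    required = tool["function"]["parameters"].get("required", [])
    return [name for name in required if name not in arguments]


print(missing_required(search_products_tool, '{"max_price": 1000}'))
```

Checking arguments before running a tool is a cheap guard, since the model produces them as a JSON string and may omit a field.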
Azure AI Foundry Agent Service Lifecycle
import json
import time

from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Method and parameter names vary between SDK releases; verify them against
# the version you have installed.
client = AIProjectClient(credential=DefaultAzureCredential(),
                         endpoint="https://my-project.services.ai.azure.com/")

agent = client.agents.create_agent(  # 1. define agent
    model="gpt-4o", name="Product Assistant",
    instructions="Help customers find and compare products.",
    tools=[{"type": "function", "function": {"name": "search_products", ...}},
           {"type": "code_interpreter"}, {"type": "file_search"}])
thread = client.agents.create_thread()  # 2. conversation thread
client.agents.create_message(thread_id=thread.id, role="user",
    content="Find a laptop under $1000 with 16GB RAM")  # 3. add message
run = client.agents.create_run(thread_id=thread.id, assistant_id=agent.id)  # 4. run

while run.status in ("queued", "in_progress", "requires_action"):  # 5. tool loop
    time.sleep(1)  # pause between polls instead of hammering the service
    run = client.agents.get_run(thread_id=thread.id, run_id=run.id)
    if run.status == "requires_action":
        outputs = []
        for call in run.required_action.submit_tool_outputs.tool_calls:
            # run_tool is your own dispatcher that executes the named function
            result = run_tool(call.function.name, json.loads(call.function.arguments))
            outputs.append({"tool_call_id": call.id, "output": json.dumps(result)})
        client.agents.submit_tool_outputs(thread_id=thread.id, run_id=run.id,
                                          tool_outputs=outputs)

# 6. read final answer (assumes messages are listed newest first)
print(client.agents.list_messages(thread_id=thread.id).data[0].content[0].text.value)
The requires_action status is the heart of the pattern: the agent has decided to call a tool and pauses until your code executes it and submits outputs back. The run then resumes; this can repeat several times before reaching completed.
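The lifecycle code calls a `run_tool` dispatcher without defining it. One way to write it is sketched below, with no service calls: the catalog, the `search_products` implementation, and the `build_tool_outputs` helper are all hypothetical, and the fake tool call only mimics the attribute shape (`id`, `function.name`, `function.arguments`) used in the loop above.

```python
import json
from types import SimpleNamespace


def search_products(max_price: float, min_ram_gb: int) -> list[dict]:
    """Hypothetical tool backed by a tiny in-memory catalog."""
    catalog = [
        {"name": "Laptop A", "price": 899, "ram_gb": 16},
        {"name": "Laptop B", "price": 1299, "ram_gb": 32},
        {"name": "Laptop C", "price": 749, "ram_gb": 8},
    ]
    return [p for p in catalog
            if p["price"] <= max_price and p["ram_gb"] >= min_ram_gb]


# Registry mapping tool names (as the model sees them) to Python callables.
TOOLS = {"search_products": search_products}


def run_tool(name: str, arguments: dict):
    """Execute a registered tool; return an error payload instead of raising."""
    handler = TOOLS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    try:
        return handler(**arguments)
    except Exception as exc:  # report the failure to the model, keep the run alive
        return {"error": str(exc)}


def build_tool_outputs(tool_calls) -> list[dict]:
    """Turn pending tool calls into the list submitted back to the run."""
    outputs = []
    for call in tool_calls:
        result = run_tool(call.function.name, json.loads(call.function.arguments))
        outputs.append({"tool_call_id": call.id, "output": json.dumps(result)})
    return outputs


# Stand-in for one entry of run.required_action.submit_tool_outputs.tool_calls.
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(
        name="search_products",
        arguments='{"max_price": 1000, "min_ram_gb": 16}',
    ),
)
print(build_tool_outputs([fake_call]))
```

Returning an error payload rather than raising lets the model see what went wrong and retry or explain, instead of leaving the run stuck in requires_action.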
Choosing a RAG/Agent Approach
| Approach | Effort | Control | Best for |
|---|---|---|---|
| On Your Data | Low | Limited | Standard managed RAG with the fastest setup |
| Custom RAG (code) | Medium | Full | Custom ranking/pre/post-processing |
| Agent + tools | High | Maximum | Dynamic multi-step, tool-driven tasks |
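The table can be read as a decision rule that starts from the simplest option and escalates only when a requirement forces it. The function below is a rough sketch of that reading; `choose_approach` and its two flags are hypothetical simplifications, not an official selection procedure.

```python
def choose_approach(*, needs_tools_or_multistep: bool,
                    needs_custom_retrieval: bool) -> str:
    """Pick the simplest approach from the table that meets the requirements."""
    if needs_tools_or_multistep:
        # Dynamic, multi-step, tool-driven work is what agents are for.
        return "Agent + tools"
    if needs_custom_retrieval:
        # Bespoke ranking, filtering, or pre/post-processing needs your own pipeline.
        return "Custom RAG (code)"
    # Standard question answering over documents with citations.
    return "On Your Data"


print(choose_approach(needs_tools_or_multistep=False, needs_custom_retrieval=False))
```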
On Your Data vs. building it yourself
On Your Data trades flexibility for speed. You cannot fully control the chunking, the ranking algorithm, or post-processing the way a hand-built pipeline (section 6.3) allows, but you also write almost no retrieval code and get citations for free. Choose it for standard "answer questions over my documents" requirements; choose custom RAG when a scenario demands bespoke ranking, filtering, or transformation of retrieved content before it reaches the model.
Why agents differ from plain RAG
The distinction the exam draws is autonomy and steps. A RAG call retrieves once and answers once. An agent can decide to search, then call an API, then run code, then answer — looping through requires_action as many times as the task needs, carrying state in its thread. That power costs complexity: more tokens, harder debugging, and the need to secure every tool the agent can invoke. Built-in tool types — code_interpreter for sandboxed code and file_search for managed RAG over uploaded files — let you add common capabilities without writing your own functions.
On the Exam: The 2026 AI-102 leans into agents. Memorize the lifecycle order (create agent → thread → message → run → handle requires_action → read result) and that requires_action means the run is waiting for your tool output; it is not an error or a completion. Prefer the simplest option that satisfies the requirement: don't reach for an agent when On Your Data or a single function call suffices.
What does the in_scope parameter do in Azure OpenAI On Your Data?
In the Azure AI Foundry Agent Service, a run reports status 'requires_action'. What does this mean?
Which On Your Data query_type returns the most comprehensive results?
A team needs standard question-answering over a single Azure AI Search index with citations and minimal code. Which approach is most appropriate?