Agent Tools, Functions, Memory, and Evaluation

Key Takeaways

  • A Foundry agent combines a model, instructions, tools, and managed runtime concepts so it can reason, retrieve, call APIs, and complete multi-step tasks.
  • Knowledge tools ground answers, action tools change systems, and function or OpenAPI schemas define the callable contract the model can request.
  • Threads or conversations store multi-turn state, but cross-session memory should be deliberately persisted, permissioned, summarized, and injected only when relevant.
  • Tool calls require validation, least privilege, trusted outputs, retry handling, and human approval for costly, sensitive, or irreversible actions.
  • Evaluation and tracing are production requirements: measure groundedness, relevance, safety, tool correctness, latency, token use, and failure patterns.
Last updated: June 2026

From Chatbot to Agent

A simple chatbot mostly generates text. An agent can reason over a goal, use tools, retrieve knowledge, call APIs, run code, and continue through multiple steps. Microsoft Foundry Agent Service provides managed agent runtime options so teams can define prompt agents quickly or deploy hosted agents with custom code while still using Foundry models, platform tools, identity, observability, and publishing controls.

Core Agent Pieces

Piece | Purpose | Design question
Model | Reasoning and language capability | Does the task need a large, small, reasoning, or multimodal model?
Instructions | Role, goals, constraints, and policies | What should the agent always do or refuse?
Tools | Knowledge access or external action | What can the agent read, calculate, or change?
Conversation state | Multi-turn context | What history is needed for this run?
Memory store | Durable user or task facts | What should persist across sessions, and who may see it?
Evaluation | Quality and safety evidence | Which metrics block release or trigger rollback?
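To make the table concrete, here is a minimal sketch that records each piece as an explicit field and answers the last design question with a release gate. It is plain Python, not the Foundry SDK. The names AgentDefinition, ToolRef, release_blockers, the deployment name, and the metric names (groundedness, tool_call_accuracy) and thresholds are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToolRef:
    name: str
    kind: str  # "knowledge" (adds context) or "action" (inspects or changes a system)


@dataclass
class AgentDefinition:
    model: str                       # hypothetical deployment name
    instructions: str                # role, goals, constraints, refusals
    tools: list[ToolRef] = field(default_factory=list)
    history_turns: int = 10          # how much conversation state one run receives
    memory_scope: str = "user"       # who may read persisted memory: "user", "tenant", or "none"
    # Hypothetical minimum evaluation scores (0.0 to 1.0) required before release.
    release_gates: dict[str, float] = field(default_factory=dict)


def release_blockers(agent: AgentDefinition, scores: dict[str, float]) -> list[str]:
    """Return every gated metric that is missing or below its threshold."""
    blockers = []
    for metric, minimum in agent.release_gates.items():
        score = scores.get(metric)
        if score is None or score < minimum:
            blockers.append(metric)
    return blockers


agent = AgentDefinition(
    model="my-model-deployment",
    instructions="Answer vendor invoice questions. Never submit orders without approval.",
    tools=[ToolRef("search_vendor_docs", "knowledge"),
           ToolRef("read_vendor_invoice_status", "action")],
    release_gates={"groundedness": 0.8, "tool_call_accuracy": 0.9},
)

print(release_blockers(agent, {"groundedness": 0.85, "tool_call_accuracy": 0.72}))
# ['tool_call_accuracy']
```

Treating a missing score as a blocker is a deliberate choice in this sketch: an evaluation that never ran should not silently pass the gate.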

AI-103 uses terms such as agents, conversations or threads, messages, runs or responses, tool schemas, memory, tracing, and evaluation. In the classic thread model, a thread stores the conversation's messages and a run invokes the agent against that thread. Newer Foundry agent patterns expose the Responses API and conversation objects instead, but the exam idea is the same: state is explicit, scoped, and inspectable.
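The classic thread-and-run shape can be sketched in a few lines. This is an in-memory illustration of the concept only, not the Foundry SDK: Thread, Run, create_thread, create_run, and echo_agent are hypothetical names, and the "agent" is a stub that just reports how much history it received.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable

_ids = itertools.count(1)


@dataclass
class Message:
    role: str      # "user", "assistant", or "tool"
    content: str


@dataclass
class Thread:
    """Conversation state for one session; nothing here persists beyond this object."""
    thread_id: str
    messages: list[Message] = field(default_factory=list)


@dataclass
class Run:
    run_id: str
    thread_id: str
    status: str
    output: str = ""


def create_thread() -> Thread:
    return Thread(thread_id=f"thread_{next(_ids)}")


def create_run(thread: Thread, agent: Callable[[list[Message]], str], max_turns: int = 10) -> Run:
    """Invoke the agent against the thread's recent history and append its reply."""
    history = thread.messages[-max_turns:]   # the state passed to the model is explicit and bounded
    reply = agent(history)
    thread.messages.append(Message("assistant", reply))
    return Run(run_id=f"run_{next(_ids)}", thread_id=thread.thread_id,
               status="completed", output=reply)


def echo_agent(history: list[Message]) -> str:
    # Stand-in for a model call: reports how many messages it was given.
    return f"I can see {len(history)} message(s)."


thread = create_thread()
thread.messages.append(Message("user", "What is the status of invoice 1042?"))
run = create_run(thread, echo_agent)
print(run.status, "-", run.output)   # completed - I can see 1 message(s).
```

Because the run receives an explicit slice of the thread, you can inspect exactly which history influenced each answer.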

Knowledge Tools and Action Tools

Knowledge tools add context. Examples include Azure AI Search, file search, web grounding, Fabric, or a knowledge base. They should return source metadata so the agent can cite evidence and so tracing can explain what was retrieved.
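A minimal sketch of why source metadata matters: each retrieved chunk keeps its source id, so the prompt can number citations and the trace can list exactly what was used. RetrievedChunk, build_grounding_block, the 0.5 cutoff, and the sample documents are hypothetical; real relevance scores use whatever scale the retrieval engine returns, so any threshold must be tuned per engine.

```python
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    text: str
    source_id: str   # document key or URL returned by the knowledge tool
    title: str
    score: float     # relevance score; its scale depends on the retrieval engine


def build_grounding_block(chunks: list[RetrievedChunk], min_score: float) -> tuple[str, list[str]]:
    """Keep relevant chunks, tag each with a citation number, and return the cited source ids."""
    kept = sorted((c for c in chunks if c.score >= min_score), key=lambda c: c.score, reverse=True)
    block = "\n".join(f"[{i}] {c.title} ({c.source_id}): {c.text}" for i, c in enumerate(kept, start=1))
    return block, [c.source_id for c in kept]


# Made-up sample results for illustration.
chunks = [
    RetrievedChunk("Invoices over 10,000 need CFO sign-off.", "policy-07", "Payment Policy", 0.91),
    RetrievedChunk("The cafeteria opens at 8 a.m.", "faq-12", "Office FAQ", 0.22),
    RetrievedChunk("Vendor V-17 is paid on net-30 terms.", "vendor-17", "Vendor Terms", 0.74),
]
block, cited = build_grounding_block(chunks, min_score=0.5)
print(block)
print(cited)   # ['policy-07', 'vendor-17']
```

Returning the cited ids alongside the text lets the host log them in the trace even if the model's final answer omits a citation.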

Action tools change or inspect external systems. Examples include function calling, Azure Functions, Logic Apps, OpenAPI tools, Model Context Protocol endpoints, code interpreter, and browser automation. The model may choose a tool, but the platform or host code must enforce policy. For a purchase order agent, reading vendor status might be automatic. Submitting an order, changing a bank account, or emailing a customer should require confirmation and least-privilege credentials.
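The point that host code, not the model, enforces policy can be sketched as a dispatcher that checks an approval callback before running any tool flagged as sensitive. Tool, dispatch, ApprovalRequired, and the two stub tools are hypothetical; a real approve callback might show a confirmation dialog or route to a reviewer, and least-privilege credentials would still be configured separately for each tool.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    func: Callable[..., Any]
    requires_approval: bool   # True for costly, sensitive, or irreversible actions


class ApprovalRequired(Exception):
    pass


def dispatch(tool: Tool, args: dict[str, Any],
             approve: Callable[[str, dict[str, Any]], bool]) -> Any:
    """Run a model-requested tool call; approval is enforced here, not in the prompt."""
    if tool.requires_approval and not approve(tool.name, args):
        raise ApprovalRequired(f"{tool.name} was not approved")
    return tool.func(**args)


def read_vendor_status(vendor_id: str) -> str:
    return f"{vendor_id}: active"                      # stub for a read-only vendor lookup


def submit_order(vendor_id: str, amount: float) -> str:
    return f"order submitted to {vendor_id} for {amount:.2f}"   # stub for a write action


def deny_all(name: str, args: dict[str, Any]) -> bool:
    return False                                       # e.g. the user clicked "Cancel"


tools = {
    "read_vendor_status": Tool("read_vendor_status", read_vendor_status, requires_approval=False),
    "submit_order": Tool("submit_order", submit_order, requires_approval=True),
}

print(dispatch(tools["read_vendor_status"], {"vendor_id": "V-17"}, deny_all))   # V-17: active
try:
    dispatch(tools["submit_order"], {"vendor_id": "V-17", "amount": 1200.0}, deny_all)
except ApprovalRequired as exc:
    print("blocked:", exc)   # blocked: submit_order was not approved
```

The read runs automatically while the order submission stops, regardless of what the model asked for.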

A tool schema is the contract the model sees: name, description, parameters, required fields, enum values, and expected result shape. Good schemas are narrow and obvious. Do not expose a generic run_sql tool when a read_vendor_invoice_status tool would limit risk. Validate arguments before execution, validate outputs before reusing them in prompts, and treat tool output as untrusted because retrieved or returned text can contain indirect prompt injection.
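A minimal sketch of a narrow schema plus the two checks described above: validate model-produced arguments before execution, and label tool output as data before it reaches the next prompt. The schema follows JSON Schema conventions, but the envelope around "parameters" differs between APIs, so treat this layout as illustrative. validate_args and wrap_tool_output are hypothetical helpers that cover only the keywords used here; labeling output does not by itself stop indirect prompt injection, it only keeps the trust boundary explicit.

```python
import json
from typing import Any

# A narrow, read-only contract (illustrative layout).
READ_INVOICE_STATUS = {
    "name": "read_vendor_invoice_status",
    "description": "Look up the payment status of one vendor invoice. Read-only.",
    "parameters": {
        "type": "object",
        "properties": {
            "vendor_id": {"type": "string"},
            "invoice_number": {"type": "string"},
            "fiscal_year": {"type": "integer"},
            "detail": {"type": "string", "enum": ["summary", "full"]},
        },
        "required": ["vendor_id", "invoice_number"],
        "additionalProperties": False,
    },
}

_JSON_TYPES = {"string": str, "integer": int, "boolean": bool, "number": (int, float)}


def validate_args(schema: dict[str, Any], raw_arguments: str) -> dict[str, Any]:
    """Parse model-produced arguments and reject anything outside the contract."""
    args = json.loads(raw_arguments)   # malformed JSON raises json.JSONDecodeError, a ValueError
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    params = schema["parameters"]
    missing = [k for k in params["required"] if k not in args]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    for key, value in args.items():
        spec = params["properties"].get(key)
        if spec is None:               # unknown fields always rejected, matching additionalProperties: False
            raise ValueError(f"unexpected field: {key}")
        # bool is a subclass of int in Python, so reject it explicitly for non-boolean fields.
        if (isinstance(value, bool) and spec["type"] != "boolean") or not isinstance(value, _JSON_TYPES[spec["type"]]):
            raise ValueError(f"{key} must be {spec['type']}")
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"{key} must be one of {spec['enum']}")
    return args


def wrap_tool_output(tool_name: str, output: str) -> str:
    """Label tool output as untrusted data rather than splicing it into instructions."""
    return json.dumps({"tool": tool_name, "untrusted_output": output})


args = validate_args(READ_INVOICE_STATUS, '{"vendor_id": "V-17", "invoice_number": "INV-1042"}')
print(args)
```

Business rules (for example, "the caller may only query their own vendors") would run after this structural check. A production host would more likely use a complete JSON Schema validator, such as the jsonschema package, for the structural part.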

Memory and Threads

Conversation memory is not the same as RAG. RAG retrieves source documents; memory preserves user or task context. A thread can remember prior turns for one conversation, but cross-session memory should be stored in a governed database or index keyed by user, tenant, or task. Store only useful facts, summarize when needed, expire stale memory, and avoid injecting sensitive or irrelevant history into every prompt.
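A minimal sketch of governed cross-session memory: facts are keyed by tenant and user, carry an expiry, and are recalled only when they match the current topic. MemoryStore, MemoryItem, and tag-overlap matching are hypothetical simplifications; a real store would live in a database or index with access control, might use embeddings for relevance, and would add summarization, which is not shown here.

```python
import time
from dataclasses import dataclass


@dataclass
class MemoryItem:
    tenant_id: str
    user_id: str
    fact: str
    tags: frozenset[str]
    expires_at: float     # Unix timestamp after which the fact is stale


class MemoryStore:
    """In-memory stand-in for a governed store keyed by tenant and user."""

    def __init__(self) -> None:
        self._items: list[MemoryItem] = []

    def remember(self, tenant_id: str, user_id: str, fact: str,
                 tags: set[str], ttl_seconds: float) -> None:
        self._items.append(MemoryItem(tenant_id, user_id, fact, frozenset(tags),
                                      time.time() + ttl_seconds))

    def recall(self, tenant_id: str, user_id: str, topic_tags: set[str], limit: int = 3) -> list[str]:
        """Return unexpired facts for this user only, and only if they match the current topic."""
        now = time.time()
        self._items = [m for m in self._items if m.expires_at > now]   # expire stale memory
        matches = [
            m.fact for m in self._items
            if m.tenant_id == tenant_id and m.user_id == user_id and m.tags & topic_tags
        ]
        return matches[:limit]


DAY = 24 * 3600
store = MemoryStore()
store.remember("contoso", "u1", "Prefers invoice totals in EUR.", {"invoices"}, ttl_seconds=30 * DAY)
store.remember("contoso", "u1", "Asked about a travel claim.", {"expenses"}, ttl_seconds=7 * DAY)
store.remember("contoso", "u2", "Manages vendor V-17.", {"invoices"}, ttl_seconds=30 * DAY)

print(store.recall("contoso", "u1", {"invoices"}))   # ['Prefers invoice totals in EUR.']
```

Only the one relevant fact for this user is injected; the other user's fact and the off-topic expense note stay out of the prompt.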

Evaluation, Tracing, and Operations

Foundry evaluations can test agents, models, or datasets before deployment and monitor production quality afterward. For RAG and agents, measure retrieval relevance, final-answer relevance, groundedness, response completeness, safety, tool success rate, refusal correctness, latency, and token cost. Tracing captures the evidence trail: prompt construction, retrieved chunks, model calls, tool invocations, outputs, errors, and timing. When an agent gives a bad answer, traces show whether the defect came from weak instructions, missing retrieval, a bad tool result, or an unsafe tool choice.
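A minimal sketch of the evidence trail: each step is recorded as a timed span with attributes and any error, and a few operational metrics are computed from the collected traces. Trace, Span, summarize, and the span names are hypothetical; a real deployment would more likely emit OpenTelemetry spans to a tracing backend rather than keep them in a list.

```python
import statistics
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Span:
    name: str                 # e.g. "retrieval", "model_call", "tool:read_vendor_invoice_status"
    attributes: dict[str, Any] = field(default_factory=dict)
    duration_ms: float = 0.0
    ok: bool = True
    error: str = ""


@dataclass
class Trace:
    spans: list[Span] = field(default_factory=list)

    @contextmanager
    def span(self, name: str, **attributes: Any):
        """Time one step, record any exception on the span, then re-raise it."""
        s = Span(name, attributes=dict(attributes))
        start = time.perf_counter()
        try:
            yield s
        except Exception as exc:
            s.ok, s.error = False, repr(exc)
            raise
        finally:
            s.duration_ms = (time.perf_counter() - start) * 1000
            self.spans.append(s)


def summarize(traces: list[Trace]) -> dict[str, Any]:
    spans = [s for t in traces for s in t.spans]
    tools = [s for s in spans if s.name.startswith("tool:")]
    return {
        "tool_success_rate": sum(s.ok for s in tools) / len(tools) if tools else None,
        "total_tokens": sum(s.attributes.get("tokens", 0) for s in spans if s.name == "model_call"),
        "median_trace_ms": statistics.median(sum(s.duration_ms for s in t.spans) for t in traces) if traces else None,
        "failures": [f"{s.name}: {s.error}" for s in spans if not s.ok],
    }


trace = Trace()
with trace.span("retrieval", chunks=3):
    pass
with trace.span("model_call", tokens=812):
    pass
try:
    with trace.span("tool:read_vendor_invoice_status", vendor_id="V-17"):
        raise TimeoutError("vendor API timed out")
except TimeoutError:
    pass

print(summarize([trace]))
```

Here the summary would point at the tool call, not the instructions or retrieval, as the source of the failure.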


Test Your Knowledge (Multi-Select)

Which THREE controls are most appropriate before allowing a Foundry agent to call an external API that updates customer records? (Select three)


Validate the tool arguments against the schema and business rules before execution
Grant the tool broad administrator permissions so the model never encounters access errors
Require user or human approval for sensitive or irreversible updates
Log traces that include the selected tool, arguments, result, latency, and errors
Treat all tool output as trusted system instructions for the next model call