Prompt Design, Model Selection, and Structured Outputs

Key Takeaways

AI-103 expects you to choose models by task requirements: reasoning depth, modality, latency, context size, cost, deployment type, and tool support.
A strong prompt defines role, task, context, constraints, output format, refusal behavior, and examples only when examples clarify the expected pattern.
Structured outputs are for reliable machine-readable JSON; function calling is for deciding whether an app should execute a tool or API.
For schema-bound outputs, require every field, set additionalProperties to false, validate the response, and handle schema or refusal failures.
Generation parameters such as temperature and max output tokens are operational controls, not substitutes for grounding, tool validation, or evaluations.

Last updated: June 2026

Why This Topic Matters

AI-103 (officially Developing AI Apps and Agents on Azure, leading to the Microsoft Certified: Azure AI Apps and Agents Developer Associate credential) is not a trivia test about model names. It passes at 700 out of 1000 (a scaled score, not a raw percentage), is delivered through Pearson VUE, and frames you as an Azure AI engineer who builds, manages, and deploys Microsoft Foundry (formerly Azure AI Foundry) apps and agents.

The blueprint weights Implement generative AI and agentic solutions at 30-35% and Plan and manage an Azure AI solution at 25-30%, so prompt design, model choice, and structured output are core scoring areas. Expect every model-choice question to hinge on task fit, not on which model is newest or largest.

Model Selection Decision Grid

Model selection means matching a deployment to seven dimensions: reasoning depth, modality (text, vision, audio, multimodal), latency tolerance, context-window size, token cost, deployment type (standard, provisioned/PTU, or fine-tuned), and tool support.

Requirement	Prefer	Exam clue
Deep planning, math, code review, multi-step analysis	Reasoning-capable large model	Hard problem, high quality, latency acceptable
Short summarization, routing, classification, simple extraction	Small or mini model	High volume, low cost, low latency
Images, screenshots, diagrams, audio, or mixed inputs	Multimodal model	User provides non-text evidence
Similarity search and RAG indexing	Embedding model	Need vectors, chunk retrieval, semantic match
Exact business schema	Structured outputs	Downstream code needs valid JSON
External action or live data	Tool or function calling	Need API, database, workflow, or search result

A safe exam answer usually chooses the smallest capability set that satisfies the task. Do not pick a reasoning model for a simple sentiment label if a mini model meets accuracy and latency targets. The exam rewards cost-awareness: a high-volume classification pipeline running a frontier model is a wrong answer when a small model passes the same accuracy bar.

RAG versus fine-tuning versus prompting

Memorize this decision ladder, because AI-103 tests it repeatedly:

Need current or private facts? Use retrieval-augmented generation (RAG) — do not fine-tune to inject facts, because facts go stale and retraining is expensive.
Need a durable behavior, tone, format, or domain style the model keeps getting wrong even with good prompts? Use fine-tuning.
Need a quick correction or constraint? Fix the prompt first; it is the cheapest lever.

Prompt Anatomy

A production prompt separates durable instructions from user input and retrieved content. Use system or developer messages for role, policy, and format; treat user text, tool output, documents, image optical character recognition (OCR), and retrieved chunks as untrusted input vulnerable to prompt injection. A practical prompt checklist:

Role: what the assistant is responsible for.
Task: what the current request requires.
Context: facts, retrieved chunks, or tool results to use.
Constraints: what not to do, what to refuse, what to ask about.
Format: table, bullets, JSON schema, or terse answer.
Quality rule: cite sources, ask a clarifying question, or say when evidence is missing.

Use few-shot examples when the output pattern is subtle, such as mapping support tickets to a custom severity taxonomy. Avoid long example libraries that inflate token cost and bury the real request.

Generation parameters

Temperature controls randomness (0 = near-deterministic, higher = more varied); top_p is nucleus sampling that limits the token pool by cumulative probability — tune one, not both. max output tokens caps response length and cost. For deterministic extraction, routing, or classification, set a low temperature and a tight token cap. For brainstorming or creative drafting, a higher temperature is acceptable, but business apps on the exam reward predictability. Parameters are operational controls; they never replace grounding, tool validation, or evaluations.

Structured JSON Outputs

Structured outputs constrain the model to a JSON Schema subset so application code can parse and validate the response without brittle string parsing. They suit extraction, routing, tool arguments, evaluation records, and UI state. Design the schema as an API contract:

Mark every field required (the schema-strict mode demands all properties be required).
Use enums for closed choices (e.g. priority: low, medium, high).
Use arrays or nested objects only where the app expects them.
Set additionalProperties to false so unexpected keys never leak into automation.
Always validate the returned JSON and handle a schema failure or a model refusal as a real code path, not a happy-path assumption.

Function calling is related but different. Structured output answers in a shape. Function calling lets the model request a tool call with arguments, while the host app executes the tool and returns the result. The model proposes the call; your code stays responsible for argument validation, authorization, retries, user confirmation for high-impact actions, and error handling. Worked example: an order app exposes a lookup_order_status(orderId) function. The model emits the call with an argument; your backend authenticates, runs it, and feeds the result back so the model can phrase the answer.

The model never touches the database directly.

Official Anchors

Microsoft Learn AI-103 study guide: https://learn.microsoft.com/en-us/credentials/certifications/resources/study-guides/ai-103
Azure OpenAI function calling: https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/function-calling
Azure OpenAI structured outputs: https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/structured-outputs

Test Your Knowledge

A claims intake app must extract claimType, policyNumber, dateOfLoss, and priority as valid JSON for a workflow engine. The answer must reject any extra fields. Which design best fits the requirement?

Use a high-temperature chat prompt that asks for a concise paragraph

Use structured outputs with a JSON schema, required fields, enums where appropriate, and additionalProperties set to false

Fine-tune a model so it memorizes the company's current claim records

Use an embedding model directly as the final response generator

Test Your Knowledge

A team runs a frontier reasoning model to label 2 million short support messages per day as positive, neutral, or negative. Accuracy is already met by a small model in testing, but cost and latency are too high. What is the best change?

Raise temperature to 1.0 so the reasoning model answers faster

Switch the high-volume classification step to a small or mini model that meets the accuracy bar

Fine-tune the reasoning model on the labels to reduce token cost

Add more few-shot examples to the reasoning prompt

Up Next

RAG, Grounding, Embeddings, and Azure AI Search

Continue learning

Microsoft Azure AI Apps and Agents Developer Associate

Microsoft Azure AI App and Agent Developer (AI-103)

Prompt Design, Model Selection, and Structured Outputs

Key Takeaways

Why This Topic Matters

Model Selection Decision Grid

RAG versus fine-tuning versus prompting

Prompt Anatomy

Generation parameters

Structured JSON Outputs

Official Anchors

Microsoft Azure AI Apps and Agents Developer Associate

1AI-103 Blueprint, Microsoft Foundry, and Solution Planning

2Generative AI, Agents, and Retrieval-Augmented Generation

3Vision, Language, Information Extraction, and Final Review

Microsoft Azure AI App and Agent Developer (AI-103)

Prompt Design, Model Selection, and Structured Outputs

Key Takeaways

Why This Topic Matters

Model Selection Decision Grid

RAG versus fine-tuning versus prompting

Prompt Anatomy

Generation parameters

Structured JSON Outputs

Official Anchors