5.1 Generative AI Fundamentals
Key Takeaways
- Generative AI creates NEW content (text, images, code, audio) rather than analyzing or classifying existing content.
- Large Language Models (LLMs) are deep neural networks trained on massive text datasets that can generate human-like text, answer questions, and follow instructions.
- The Transformer architecture (specifically the self-attention mechanism) is what makes modern LLMs like GPT possible — it allows models to understand context and relationships in text.
- Generative AI models are "pre-trained" on vast data (the "P" in GPT) and then can be fine-tuned or prompted for specific tasks.
- Key limitations of generative AI include hallucinations (generating false information), bias from training data, and the inability to access real-time information without grounding.
Quick Answer: Generative AI creates new content (text, images, code, audio) using large language models (LLMs) built on the Transformer architecture. LLMs like GPT are pre-trained on massive datasets and can generate human-like text, answer questions, summarize documents, and write code. Key limitations include hallucinations, bias, and lack of real-time knowledge.
What Is Generative AI?
Generative AI is a category of artificial intelligence that can create new, original content. Unlike traditional AI that analyzes or classifies existing data (is this email spam?), generative AI produces new outputs that did not exist before.
What Generative AI Can Create
| Content Type | Example | Model Type |
|---|---|---|
| Text | Articles, emails, summaries, code, poetry | GPT-4o, GPT-4 |
| Images | Photographs, illustrations, design concepts | GPT-Image, DALL-E (retired) |
| Code | Functions, classes, entire applications | GPT-4o, Codex |
| Audio | Speech, music, sound effects | Whisper (recognition), neural TTS |
| Conversations | Chatbot responses, virtual assistant dialogues | GPT-4o |
Generative AI vs. Traditional AI
| Aspect | Traditional AI | Generative AI |
|---|---|---|
| Task | Analyze and classify existing data | Create new content |
| Output | Labels, scores, predictions | Text, images, code, audio |
| Example | "This email is spam" (classification) | "Write a professional email about..." (generation) |
| Data flow | Input → Analysis → Label | Input (prompt) → Generation → New content |
Large Language Models (LLMs)
Large Language Models are the neural networks that power text-based generative AI. They are called "large" because they have billions of parameters (adjustable values that the model learns during training).
How LLMs Work (Conceptual)
1. Pre-training: The model is trained on massive amounts of text data (books, websites, articles, code). During training, it learns the statistical patterns of language — which words and phrases typically follow other words and phrases.
2. Token prediction: At its core, an LLM predicts the next token (word or word-part) in a sequence. Given "The capital of France is", the model predicts "Paris" as the most likely next token.
3. Context understanding: Through the Transformer's self-attention mechanism, the model understands the CONTEXT of words — "bank" means something different in "river bank" vs. "bank account."
4. Emergent capabilities: Large enough models exhibit capabilities that were not explicitly trained, such as reasoning, following complex instructions, and chain-of-thought problem solving.
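The pre-training and token-prediction steps above can be sketched with a toy bigram model. This is a deliberately tiny stand-in, not how GPT is actually trained: the corpus, counts, and `predict_next` helper are illustrative assumptions, but the core idea — learn which tokens statistically follow which — is the same.

```python
from collections import Counter, defaultdict

# Toy "pre-training" corpus standing in for massive text data (illustrative only).
corpus = "the capital of france is paris . the capital of italy is rome .".split()

# Count which token follows each token (a bigram model; real LLMs learn
# far richer, longer-range patterns with neural networks).
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(token: str) -> str:
    """Return the statistically most likely next token after `token`."""
    return bigrams[token].most_common(1)[0][0]

print(predict_next("capital"))  # "of" — the only token ever seen after "capital"
```

Generation is just this prediction applied repeatedly: predict a token, append it to the sequence, predict again.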
Key LLM Concepts
| Concept | Definition | Example |
|---|---|---|
| Token | A unit of text (word, subword, or character) processed by the model | "Hello world" = 2 tokens; "unbelievable" might = 3 tokens |
| Context window | Maximum number of tokens the model can process at once | GPT-4o: 128,000 tokens (~96,000 words) |
| Parameters | Internal values learned during training | GPT-4: estimated hundreds of billions of parameters |
| Temperature | Controls randomness of output (lower = more deterministic, higher = more varied and creative) | Temperature 0 for factual answers; 0.7 for creative writing |
| Inference | The process of generating output from the model | Sending a prompt and receiving a response |
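Temperature from the table above can be made concrete with a softmax sketch. The logits are hypothetical scores for three candidate tokens; real models produce one score per token in a vocabulary of tens of thousands.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw model scores (logits) into next-token probabilities.
    Lower temperature sharpens the distribution toward the top token
    (temperature 0 is the greedy limit: always pick the top token);
    higher temperature flattens it, making output more varied."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
low = softmax_with_temperature(logits, 0.2)   # top token dominates
high = softmax_with_temperature(logits, 1.5)  # probabilities closer to uniform
print(round(low[0], 3), round(high[0], 3))
```

At sampling time the model draws the next token from these probabilities, which is why the same prompt can yield different completions at nonzero temperature.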
The Transformer Architecture
The Transformer is the neural network architecture that makes modern generative AI possible. It was introduced in the 2017 paper "Attention Is All You Need."
Why Transformers Changed Everything
Before Transformers, language models processed text sequentially (one word at a time). Transformers introduced self-attention, which allows the model to consider ALL words in the input simultaneously:
| Old Approach (RNNs) | Transformer Approach |
|---|---|
| Process words one at a time (sequential) | Process all words simultaneously (parallel) |
| Struggle with long-distance relationships | Capture relationships across entire text |
| Slow to train | Fast to train (parallelizable on GPUs) |
| Limited context | Very long context windows |
Self-Attention Explained (Conceptual)
Self-attention allows each word to "attend to" (consider the relevance of) every other word in the input:
Example: "The cat sat on the mat because it was tired."
When processing "it," self-attention determines that "it" refers to "cat" (not "mat") by calculating the relevance of every other word. This is how the model understands context and references.
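A minimal sketch of the scaled dot-product scoring at the heart of self-attention follows. The 2-dimensional token vectors are invented for illustration (real models use learned embeddings with hundreds or thousands of dimensions, plus separate query/key/value projections); the point is only that similarity scores, normalized by softmax, let "it" weight "cat" more heavily than "mat".

```python
import math

def attention_weights(query, keys):
    """Softmax over scaled dot products: how much `query` attends to each key."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    peak = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 2-d embeddings; "it" is deliberately placed near "cat".
tokens = ["cat", "mat", "it"]
vectors = {"cat": [1.0, 0.2], "mat": [0.1, 1.0], "it": [0.9, 0.1]}

weights = attention_weights(vectors["it"], [vectors[t] for t in tokens])
print(weights[0] > weights[1])  # True: "it" attends more to "cat" than to "mat"
```

Because every query is scored against every key at once, this computation parallelizes across the whole sequence — the property the table above contrasts with sequential RNNs.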
Pre-Training and Fine-Tuning
Pre-Training
- The foundation step: train the model on massive, general-purpose text data
- The model learns language patterns, facts, and reasoning
- Requires enormous compute resources (thousands of GPUs for weeks)
- Creates a "foundation model" with broad capabilities
Fine-Tuning
- Adapt the pre-trained model for a specific task or domain
- Uses a smaller, task-specific dataset
- Much less compute required than pre-training
- Examples: fine-tune for medical Q&A, legal document analysis, customer service
Prompt Engineering (No Training)
- Use the pre-trained model AS-IS by crafting effective prompts
- No model modification needed
- The fastest way to customize model behavior
- This is how most people use generative AI
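Prompt engineering can be sketched as nothing more than assembling instructions and examples into the prompt itself. The `build_prompt` helper and its message format are illustrative assumptions (they follow the common system/user/assistant chat convention, with the actual API call omitted); no model weights are touched.

```python
def build_prompt(task, examples, user_input):
    """Assemble a chat-style prompt: system instruction, few-shot
    examples, then the real question. Customization happens entirely
    in this text -- no training or fine-tuning involved."""
    messages = [{"role": "system", "content": task}]
    for question, answer in examples:  # few-shot examples steer style and format
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": user_input})
    return messages

messages = build_prompt(
    task="You are a concise assistant. Answer in one sentence.",
    examples=[("What is an LLM?",
               "A neural network trained to predict the next token in text.")],
    user_input="What is a hallucination?",
)
print(len(messages))  # 4: system + one example pair + the new question
```

Changing the system instruction or swapping the examples changes the model's behavior immediately, which is why this is the cheapest and fastest customization approach.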
On the Exam: Know the progression: pre-training (expensive, builds the foundation) → fine-tuning (moderate, adapts for a domain) → prompt engineering (cheap, customizes via instructions). The AI-900 focuses mainly on prompt engineering since it does not require ML expertise.
Limitations of Generative AI
1. Hallucinations
The model generates information that sounds plausible but is factually incorrect. The model does not "know" facts — it predicts statistically likely text.
Example: "Albert Einstein won the Nobel Prize in Physics in 1922 for his work on relativity."
- Sounds plausible but is wrong — Einstein won in 1921, and it was for the photoelectric effect, not relativity.
2. Bias
Models inherit biases present in their training data. If the training data contains stereotypes or underrepresentation, the model may perpetuate these.
3. Knowledge Cutoff
Pre-trained models have a knowledge cutoff date — they do not know about events after their training data was collected. This is why grounding and RAG are important.
4. Inconsistency
The same prompt can produce different responses depending on temperature settings and random seed, making outputs non-deterministic.
5. Prompt Injection
Malicious users can craft prompts that trick the model into ignoring its instructions or producing harmful content.
On the Exam: Hallucination is the most commonly tested limitation. Know that generative AI can produce false information that sounds convincing, and that RAG (Retrieval-Augmented Generation) and grounding are key techniques to reduce hallucinations.
Review Questions
1. What is the key difference between generative AI and traditional AI?
2. What is a "hallucination" in the context of generative AI?
3. What does "GPT" stand for in models like GPT-4o?
4. Which feature of the Transformer architecture allows it to understand context by considering the relevance of all words in the input simultaneously?
5. Which approach to customizing generative AI is the fastest and least expensive?