5.1 Generative AI Fundamentals

Key Takeaways

  • Generative AI creates NEW content (text, images, code, audio) rather than analyzing or classifying existing content.
  • Large Language Models (LLMs) are deep neural networks trained on massive text datasets that can generate human-like text, answer questions, and follow instructions.
  • The Transformer architecture (specifically the self-attention mechanism) is what makes modern LLMs like GPT possible — it allows models to understand context and relationships in text.
  • Generative AI models are "pre-trained" on vast data (the "P" in GPT) and then can be fine-tuned or prompted for specific tasks.
  • Key limitations of generative AI include hallucinations (generating false information), bias from training data, and the inability to access real-time information without grounding.
Last updated: March 2026

Generative AI Fundamentals

Quick Answer: Generative AI creates new content (text, images, code, audio) using large language models (LLMs) built on the Transformer architecture. LLMs like GPT are pre-trained on massive datasets and can generate human-like text, answer questions, summarize documents, and write code. Key limitations include hallucinations, bias, and lack of real-time knowledge.

What Is Generative AI?

Generative AI is a category of artificial intelligence that can create new, original content. Unlike traditional AI that analyzes or classifies existing data (is this email spam?), generative AI produces new outputs that did not exist before.

What Generative AI Can Create

Content TypeExampleModel Type
TextArticles, emails, summaries, code, poetryGPT-4o, GPT-4
ImagesPhotographs, illustrations, design conceptsGPT-Image, DALL-E (retired)
CodeFunctions, classes, entire applicationsGPT-4o, Codex
AudioSpeech, music, sound effectsWhisper (recognition), neural TTS
ConversationsChatbot responses, virtual assistant dialoguesGPT-4o

Generative AI vs. Traditional AI

AspectTraditional AIGenerative AI
TaskAnalyze and classify existing dataCreate new content
OutputLabels, scores, predictionsText, images, code, audio
Example"This email is spam" (classification)"Write a professional email about..." (generation)
Data flowInput → Analysis → LabelInput (prompt) → Generation → New content

Large Language Models (LLMs)

Large Language Models are the neural networks that power text-based generative AI. They are called "large" because they have billions of parameters (adjustable values that the model learns during training).

How LLMs Work (Conceptual)

  1. Pre-training: The model is trained on massive amounts of text data (books, websites, articles, code). During training, it learns the statistical patterns of language — which words and phrases typically follow other words and phrases.

  2. Token prediction: At its core, an LLM predicts the next token (word or word-part) in a sequence. Given "The capital of France is", the model predicts "Paris" as the most likely next token.

  3. Context understanding: Through the Transformer's self-attention mechanism, the model understands the CONTEXT of words — "bank" means something different in "river bank" vs. "bank account."

  4. Emergent capabilities: Large enough models exhibit capabilities that were not explicitly trained, such as reasoning, following complex instructions, and chain-of-thought problem solving.

Key LLM Concepts

ConceptDefinitionExample
TokenA unit of text (word, subword, or character) processed by the model"Hello world" = 2 tokens; "unbelievable" might = 3 tokens
Context windowMaximum number of tokens the model can process at onceGPT-4o: 128,000 tokens (~96,000 words)
ParametersInternal values learned during trainingGPT-4: estimated hundreds of billions of parameters
TemperatureControls randomness of output (0 = deterministic, 1 = creative)Temperature 0 for factual answers; 0.7 for creative writing
InferenceThe process of generating output from the modelSending a prompt and receiving a response

The Transformer Architecture

The Transformer is the neural network architecture that makes modern generative AI possible. It was introduced in the 2017 paper "Attention Is All You Need."

Why Transformers Changed Everything

Before Transformers, language models processed text sequentially (one word at a time). Transformers introduced self-attention, which allows the model to consider ALL words in the input simultaneously:

Old Approach (RNNs)Transformer Approach
Process words one at a time (sequential)Process all words simultaneously (parallel)
Struggle with long-distance relationshipsCapture relationships across entire text
Slow to trainFast to train (parallelizable on GPUs)
Limited contextVery long context windows

Self-Attention Explained (Conceptual)

Self-attention allows each word to "attend to" (consider the relevance of) every other word in the input:

Example: "The cat sat on the mat because it was tired."

When processing "it," self-attention determines that "it" refers to "cat" (not "mat") by calculating the relevance of every other word. This is how the model understands context and references.

Pre-Training and Fine-Tuning

Pre-Training

  • The foundation step: train the model on massive, general-purpose text data
  • The model learns language patterns, facts, and reasoning
  • Requires enormous compute resources (thousands of GPUs for weeks)
  • Creates a "foundation model" with broad capabilities

Fine-Tuning

  • Adapt the pre-trained model for a specific task or domain
  • Uses a smaller, task-specific dataset
  • Much less compute required than pre-training
  • Examples: fine-tune for medical Q&A, legal document analysis, customer service

Prompt Engineering (No Training)

  • Use the pre-trained model AS-IS by crafting effective prompts
  • No model modification needed
  • The fastest way to customize model behavior
  • This is how most people use generative AI

On the Exam: Know the progression: pre-training (expensive, builds the foundation) → fine-tuning (moderate, adapts for a domain) → prompt engineering (cheap, customizes via instructions). The AI-900 focuses mainly on prompt engineering since it does not require ML expertise.

Limitations of Generative AI

1. Hallucinations

The model generates information that sounds plausible but is factually incorrect. The model does not "know" facts — it predicts statistically likely text.

Example: "Albert Einstein won the Nobel Prize in Physics in 1922 for his work on relativity."

  • Sounds plausible but is wrong — Einstein won in 1921, and it was for the photoelectric effect, not relativity.

2. Bias

Models inherit biases present in their training data. If the training data contains stereotypes or underrepresentation, the model may perpetuate these.

3. Knowledge Cutoff

Pre-trained models have a knowledge cutoff date — they do not know about events after their training data was collected. This is why grounding and RAG are important.

4. Inconsistency

The same prompt can produce different responses depending on temperature settings and random seed, making outputs non-deterministic.

5. Prompt Injection

Malicious users can craft prompts that trick the model into ignoring its instructions or producing harmful content.

On the Exam: Hallucination is the most commonly tested limitation. Know that generative AI can produce false information that sounds convincing, and that RAG (Retrieval-Augmented Generation) and grounding are key techniques to reduce hallucinations.

Test Your Knowledge

What is the key difference between generative AI and traditional AI?

A
B
C
D
Test Your Knowledge

What is a "hallucination" in the context of generative AI?

A
B
C
D
Test Your Knowledge

What does "GPT" stand for in models like GPT-4o?

A
B
C
D
Test Your Knowledge

Which feature of the Transformer architecture allows it to understand context by considering the relevance of all words in the input simultaneously?

A
B
C
D
Test Your Knowledge

Which approach to customizing generative AI is the fastest and least expensive?

A
B
C
D