5.1 Generative AI Fundamentals
Key Takeaways
- Generative AI creates NEW content (text, images, code, audio) rather than analyzing or classifying existing content.
- Large Language Models (LLMs) are deep neural networks trained on massive text datasets that can generate human-like text, answer questions, and follow instructions.
- The Transformer architecture (specifically the self-attention mechanism) is what makes modern LLMs like GPT possible — it allows models to understand context and relationships in text.
- Generative AI models are "pre-trained" on vast data (the "P" in GPT) and then can be fine-tuned or prompted for specific tasks.
- Key limitations of generative AI include hallucinations (generating false information), bias from training data, and the inability to access real-time information without grounding.
Quick Answer: Generative AI creates new content (text, images, code, audio) using large language models (LLMs) built on the Transformer architecture. LLMs like GPT are pre-trained on massive datasets and can generate human-like text, answer questions, summarize documents, and write code. Key limitations include hallucinations, bias, and lack of real-time knowledge.
What Is Generative AI?
Generative AI is a category of artificial intelligence that can create new, original content. Unlike traditional AI that analyzes or classifies existing data (is this email spam?), generative AI produces new outputs that did not exist before.
What Generative AI Can Create
| Content Type | Example | Model Type |
|---|---|---|
| Text | Articles, emails, summaries, code, poetry | GPT-4o, GPT-4 |
| Images | Photographs, illustrations, design concepts | GPT-Image, DALL-E (retired) |
| Code | Functions, classes, entire applications | GPT-4o, Codex |
| Audio | Speech, music, sound effects | Whisper (recognition), neural TTS |
| Conversations | Chatbot responses, virtual assistant dialogues | GPT-4o |
Generative AI vs. Traditional AI
| Aspect | Traditional AI | Generative AI |
|---|---|---|
| Task | Analyze and classify existing data | Create new content |
| Output | Labels, scores, predictions | Text, images, code, audio |
| Example | "This email is spam" (classification) | "Write a professional email about..." (generation) |
| Data flow | Input → Analysis → Label | Input (prompt) → Generation → New content |
Large Language Models (LLMs)
Large Language Models are the neural networks that power text-based generative AI. They are called "large" because they have billions of parameters (adjustable values that the model learns during training).
How LLMs Work (Conceptual)
1. Pre-training: The model is trained on massive amounts of text data (books, websites, articles, code). During training, it learns the statistical patterns of language — which words and phrases typically follow other words and phrases.
2. Token prediction: At its core, an LLM predicts the next token (word or word-part) in a sequence. Given "The capital of France is", the model predicts "Paris" as the most likely next token.
3. Context understanding: Through the Transformer's self-attention mechanism, the model understands the CONTEXT of words — "bank" means something different in "river bank" vs. "bank account."
4. Emergent capabilities: Large enough models exhibit capabilities that were not explicitly trained, such as reasoning, following complex instructions, and chain-of-thought problem solving.
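The pre-training and token-prediction steps above can be sketched with a toy bigram model. This is a deliberately tiny stand-in, not how GPT is actually trained: the corpus, counts, and `predict_next` helper are illustrative assumptions, but the core idea — learn which tokens statistically follow which — is the same.

```python
from collections import Counter, defaultdict

# Toy "pre-training" corpus standing in for massive text data (illustrative only).
corpus = "the capital of france is paris . the capital of italy is rome .".split()

# Count which token follows each token (a bigram model; real LLMs learn
# far richer, longer-range patterns with neural networks).
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(token: str) -> str:
    """Return the statistically most likely next token after `token`."""
    return bigrams[token].most_common(1)[0][0]

print(predict_next("capital"))  # "of" — the only token ever seen after "capital"
```

Generation is just this prediction applied repeatedly: predict a token, append it to the sequence, predict again.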
Key LLM Concepts
| Concept | Definition | Example |
|---|---|---|
| Token | A unit of text (word, subword, or character) processed by the model | "Hello world" = 2 tokens; "unbelievable" might = 3 tokens |
| Context window | Maximum number of tokens the model can process at once | GPT-4o: 128,000 tokens (~96,000 words) |
| Parameters | Internal values learned during training | GPT-4: estimated hundreds of billions of parameters |
| Temperature | Controls randomness of output (lower = more deterministic, higher = more varied and creative) | Temperature 0 for factual answers; 0.7 for creative writing |
| Inference | The process of generating output from the model | Sending a prompt and receiving a response |
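Temperature from the table above can be made concrete with a softmax sketch. The logits are hypothetical scores for three candidate tokens; real models produce one score per token in a vocabulary of tens of thousands.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw model scores (logits) into next-token probabilities.
    Lower temperature sharpens the distribution toward the top token
    (temperature 0 is the greedy limit: always pick the top token);
    higher temperature flattens it, making output more varied."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
low = softmax_with_temperature(logits, 0.2)   # top token dominates
high = softmax_with_temperature(logits, 1.5)  # probabilities closer to uniform
print(round(low[0], 3), round(high[0], 3))
```

At sampling time the model draws the next token from these probabilities, which is why the same prompt can yield different completions at nonzero temperature.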
The Transformer Architecture
The Transformer is the neural network architecture that makes modern generative AI possible. It was introduced in the 2017 paper "Attention Is All You Need."
Why Transformers Changed Everything
Before Transformers, language models processed text sequentially (one word at a time). Transformers introduced self-attention, which allows the model to consider ALL words in the input simultaneously:
| Old Approach (RNNs) | Transformer Approach |
|---|---|
| Process words one at a time (sequential) | Process all words simultaneously (parallel) |
| Struggle with long-distance relationships | Capture relationships across entire text |
| Slow to train | Fast to train (parallelizable on GPUs) |
| Limited context | Very long context windows |
Self-Attention Explained (Conceptual)
Self-attention allows each word to "attend to" (consider the relevance of) every other word in the input:
Example: "The cat sat on the mat because it was tired."
When processing "it," self-attention determines that "it" refers to "cat" (not "mat") by calculating the relevance of every other word. This is how the model understands context and references.
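A minimal sketch of the scaled dot-product scoring at the heart of self-attention follows. The 2-dimensional token vectors are invented for illustration (real models use learned embeddings with hundreds or thousands of dimensions, plus separate query/key/value projections); the point is only that similarity scores, normalized by softmax, let "it" weight "cat" more heavily than "mat".

```python
import math

def attention_weights(query, keys):
    """Softmax over scaled dot products: how much `query` attends to each key."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    peak = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 2-d embeddings; "it" is deliberately placed near "cat".
tokens = ["cat", "mat", "it"]
vectors = {"cat": [1.0, 0.2], "mat": [0.1, 1.0], "it": [0.9, 0.1]}

weights = attention_weights(vectors["it"], [vectors[t] for t in tokens])
print(weights[0] > weights[1])  # True: "it" attends more to "cat" than to "mat"
```

Because every query is scored against every key at once, this computation parallelizes across the whole sequence — the property the table above contrasts with sequential RNNs.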
Pre-Training and Fine-Tuning
Pre-Training
- The foundation step: train the model on massive, general-purpose text data
- The model learns language patterns, facts, and reasoning
- Requires enormous compute resources (thousands of GPUs for weeks)
- Creates a "foundation model" with broad capabilities
Fine-Tuning
- Adapt the pre-trained model for a specific task or domain
- Uses a smaller, task-specific dataset
- Much less compute required than pre-training
- Examples: fine-tune for medical Q&A, legal document analysis, customer service
Prompt Engineering (No Training)
- Use the pre-trained model AS-IS by crafting effective prompts
- No model modification needed
- The fastest way to customize model behavior
- This is how most people use generative AI
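Prompt engineering can be sketched as nothing more than assembling instructions and examples into the prompt itself. The `build_prompt` helper and its message format are illustrative assumptions (they follow the common system/user/assistant chat convention, with the actual API call omitted); no model weights are touched.

```python
def build_prompt(task, examples, user_input):
    """Assemble a chat-style prompt: system instruction, few-shot
    examples, then the real question. Customization happens entirely
    in this text -- no training or fine-tuning involved."""
    messages = [{"role": "system", "content": task}]
    for question, answer in examples:  # few-shot examples steer style and format
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": user_input})
    return messages

messages = build_prompt(
    task="You are a concise assistant. Answer in one sentence.",
    examples=[("What is an LLM?",
               "A neural network trained to predict the next token in text.")],
    user_input="What is a hallucination?",
)
print(len(messages))  # 4: system + one example pair + the new question
```

Changing the system instruction or swapping the examples changes the model's behavior immediately, which is why this is the cheapest and fastest customization approach.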
On the Exam: Know the progression: pre-training (expensive, builds the foundation) → fine-tuning (moderate, adapts for a domain) → prompt engineering (cheap, customizes via instructions). The AI-900 focuses mainly on prompt engineering since it does not require ML expertise.
Limitations of Generative AI
1. Hallucinations
The model generates information that sounds plausible but is factually incorrect. The model does not "know" facts — it predicts statistically likely text.
Example: "Albert Einstein won the Nobel Prize in Physics in 1922 for his work on relativity."
- Sounds plausible but is wrong — Einstein won in 1921, and it was for the photoelectric effect, not relativity.
2. Bias
Models inherit biases present in their training data. If the training data contains stereotypes or underrepresentation, the model may perpetuate these.
3. Knowledge Cutoff
Pre-trained models have a knowledge cutoff date — they do not know about events after their training data was collected. This is why grounding and RAG are important.
4. Inconsistency
The same prompt can produce different responses depending on temperature settings and random seed, making outputs non-deterministic.
5. Prompt Injection
Malicious users can craft prompts that trick the model into ignoring its instructions or producing harmful content.
On the Exam: Hallucination is the most commonly tested limitation. Know that generative AI can produce false information that sounds convincing, and that RAG (Retrieval-Augmented Generation) and grounding are key techniques to reduce hallucinations.
Review Questions
1. What is the key difference between generative AI and traditional AI?
2. What is a "hallucination" in the context of generative AI?
3. What does "GPT" stand for in models like GPT-4o?
4. Which feature of the Transformer architecture allows it to understand context by considering the relevance of all words in the input simultaneously?
5. Which approach to customizing generative AI is the fastest and least expensive?