2.5 Deep Learning and Neural Networks
Key Takeaways
- Deep learning is a subset of machine learning that uses artificial neural networks with multiple layers to learn complex patterns from data.
- A neural network has an input layer (features), one or more hidden layers (processing), and an output layer (prediction).
- Deep learning excels at complex tasks like image recognition, speech processing, and language understanding where traditional ML struggles.
- The Transformer architecture (introduced in 2017) is the foundation of modern large language models (GPT, BERT) and revolutionized NLP and generative AI.
- The AI-900 tests conceptual understanding of neural networks — you do not need to know the mathematics or how to build them.
Deep Learning and Neural Networks
Quick Answer: Deep learning uses neural networks with multiple layers to learn complex patterns. Neural networks have input layers (features), hidden layers (processing), and output layers (predictions). The Transformer architecture powers modern large language models like GPT. Deep learning excels at image recognition, speech, and language tasks.
What Is Deep Learning?
Deep learning is a subset of machine learning that uses artificial neural networks with multiple layers (hence "deep") to learn complex patterns from large amounts of data. While traditional ML algorithms work well for structured data with clear features, deep learning shines on unstructured data like images, audio, and text.
When to Use Deep Learning vs. Traditional ML
| Characteristic | Traditional ML | Deep Learning |
|---|---|---|
| Data type | Structured (tables, spreadsheets) | Unstructured (images, text, audio) |
| Data volume | Works with smaller datasets | Requires large datasets |
| Feature engineering | You define the features | Model discovers features automatically |
| Compute requirements | Moderate | High (GPUs often required) |
| Interpretability | Generally more interpretable | Often a "black box" |
| Best for | Tabular data, simple patterns | Images, speech, language, complex patterns |
How Neural Networks Work (Conceptual)
A neural network is inspired by the human brain — it consists of interconnected nodes (neurons) organized in layers:
Network Structure
```
Input Layer        Hidden Layers          Output Layer
 (features)         (processing)          (prediction)

  [x₁] ─────┐
            ├──→ [h₁] ──→ [h₄] ──┐
  [x₂] ─────┤                    ├──→ [y]
            ├──→ [h₂] ──→ [h₅] ──┤
  [x₃] ─────┤                    │
            └──→ [h₃] ──→ [h₆] ──┘
```
The Three Layer Types
| Layer | Role | Example |
|---|---|---|
| Input Layer | Receives raw data (features) | Pixel values of an image (28x28 = 784 inputs) |
| Hidden Layers | Process and transform data | Each layer learns increasingly complex features |
| Output Layer | Produces the final prediction | Class probabilities (90% cat, 8% dog, 2% bird) |
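The three layer types can be sketched with nothing more than array shapes. This is a minimal, hypothetical NumPy illustration using the sizes from the table (a 28x28 image flattened to 784 inputs, three output classes); the hidden-layer width of 128 and the random weights are illustrative assumptions, not a real trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Input layer: one value per pixel of a 28x28 image (784 features)
image = rng.random((28, 28))
inputs = image.reshape(784)

# Hidden layer: 128 neurons (width chosen arbitrarily for illustration)
W_hidden = rng.normal(size=(784, 128))
hidden = np.maximum(0, inputs @ W_hidden)   # ReLU activation

# Output layer: 3 class scores turned into probabilities (cat/dog/bird)
W_out = rng.normal(size=(128, 3))
logits = hidden @ W_out
probs = np.exp(logits - logits.max()) / np.exp(logits - logits.max()).sum()

print(probs.shape, round(probs.sum(), 6))   # 3 probabilities summing to 1
```

The softmax at the end is what produces output like "90% cat, 8% dog, 2% bird": raw scores become probabilities that sum to 1.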
How Learning Happens
- Forward pass: Input data flows through the network, each neuron applies a mathematical transformation
- Loss calculation: The network's prediction is compared to the actual answer — the difference is the "loss"
- Backpropagation: The network adjusts its internal weights (connections between neurons) to reduce the loss
- Repeat: This process repeats thousands or millions of times with different training examples
- Convergence: Eventually, the loss stops decreasing meaningfully and the network's predictions become accurate
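The five steps above can be sketched as a tiny training loop. This is an illustrative NumPy implementation, not something the exam requires: the XOR dataset, network size, and learning rate are all assumptions chosen to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny example task: learn XOR (2 features -> 1 output)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer with 4 neurons (sizes chosen for illustration)
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

lr, losses = 0.5, []
for _ in range(3000):
    # 1. Forward pass: data flows through the layers
    h = sigmoid(X @ W1 + b1)
    pred = sigmoid(h @ W2 + b2)

    # 2. Loss calculation: compare prediction to the actual answer
    loss = np.mean((pred - y) ** 2)
    losses.append(loss)

    # 3. Backpropagation: compute gradients and adjust the weights
    d_pred = 2 * (pred - y) / y.size * pred * (1 - pred)
    d_h = d_pred @ W2.T * h * (1 - h)
    W2 -= lr * (h.T @ d_pred); b2 -= lr * d_pred.sum(axis=0)
    W1 -= lr * (X.T @ d_h);    b1 -= lr * d_h.sum(axis=0)
    # 4. Repeat: the loop runs over the data many times

# 5. Convergence: the loss should have fallen over training
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Running the loop shows the loss shrinking as the weights adjust, which is the whole idea of "learning from errors."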
On the Exam: You do NOT need to understand the mathematics of neural networks. Know that deep learning uses layers of neurons that process data, learn from errors (backpropagation), and improve over time with more data and training.
What Makes Deep Learning "Deep"
The "deep" in deep learning refers to having multiple hidden layers. Each layer learns different levels of abstraction:
Example: Image recognition of a face
- Layer 1: Detects edges and simple shapes
- Layer 2: Combines edges into facial features (eyes, nose, mouth)
- Layer 3: Combines features into facial structures
- Layer 4: Recognizes specific faces
More layers allow the network to learn more complex patterns — this is why deep learning outperforms traditional ML on complex tasks like image and speech recognition.
The Transformer Architecture
The Transformer is a neural network architecture introduced in 2017 that revolutionized NLP and is the foundation of modern AI:
Key Features of Transformers
| Feature | Description | Why It Matters |
|---|---|---|
| Self-attention | The model can focus on different parts of the input simultaneously | Understands context and relationships between words |
| Parallel processing | Processes all input positions at once (not sequentially) | Much faster training than previous architectures |
| Scalability | Can be scaled to billions of parameters | Enables large language models like GPT-4o |
| Transfer learning | Pre-trained on massive datasets, then fine-tuned for specific tasks | Reduces the data needed for specific applications |
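The self-attention row in the table can be made concrete with a small sketch. This is a simplified, single-head illustration with random values; real Transformers use learned projection matrices, many attention heads, and much larger dimensions, none of which the exam requires.

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, d = 4, 8                       # 4 tokens, 8-dim embeddings (illustrative)
x = rng.normal(size=(seq_len, d))       # stand-in token embeddings

# Query, key, and value projections (randomly initialized here;
# in a real model these are learned during training)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv

# Every token scores every other token at once (parallel, not sequential)
scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)

# Each output is a weighted mix of all positions: that mix is "attention"
out = weights @ V
print(weights.sum(axis=-1))             # each row of weights sums to 1
```

The key takeaway matches the table: every position attends to every other position simultaneously, which is what lets the model capture context and train in parallel.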
Transformers in Modern AI
| Model | Type | Creator | Application |
|---|---|---|---|
| GPT-4o | Generative Pre-trained Transformer | OpenAI | Text and image generation |
| BERT | Bidirectional Encoder Representations from Transformers | Google | Text understanding, search |
| DALL-E | Image generation Transformer | OpenAI | Image creation from text prompts |
| Whisper | Audio Transformer | OpenAI | Speech recognition |
| Copilot | AI assistant built on GPT models | Microsoft | Code generation, productivity |
On the Exam: You need to know that Transformers are the architecture behind modern language models (GPT, BERT) and that they use self-attention to understand context. You do NOT need to understand the mathematical details of attention mechanisms.
Deep Learning Applications on Azure
| Application | Azure Service | Deep Learning Role |
|---|---|---|
| Image classification | Azure AI Vision | Convolutional neural networks (CNNs) |
| Object detection | Azure AI Vision, Custom Vision | CNNs with region detection |
| Speech recognition | Azure AI Speech | Recurrent and Transformer networks |
| Text analysis | Azure AI Language | Transformer-based models |
| Text generation | Azure OpenAI Service | GPT Transformer models |
| Image generation | Azure OpenAI Service | Diffusion and Transformer models |
Review Questions
- What makes deep learning "deep"?
- Which neural network architecture is the foundation of modern large language models like GPT-4o?
- Which type of data is deep learning MOST effective at processing compared to traditional machine learning?
- In a neural network, which layer produces the final prediction?