2.5 Deep Learning and Neural Networks

Key Takeaways

  • Deep learning is a subset of machine learning that uses artificial neural networks with multiple layers to learn complex patterns from data.
  • A neural network has an input layer (features), one or more hidden layers (processing), and an output layer (prediction).
  • Deep learning excels at complex tasks like image recognition, speech processing, and language understanding where traditional ML struggles.
  • The Transformer architecture (introduced in 2017) is the foundation of modern large language models (GPT, BERT) and revolutionized NLP and generative AI.
  • The AI-900 tests conceptual understanding of neural networks — you do not need to know the mathematics or how to build them.
Last updated: March 2026

Quick Answer: Deep learning uses neural networks with multiple layers to learn complex patterns. Neural networks have input layers (features), hidden layers (processing), and output layers (predictions). The Transformer architecture powers modern large language models like GPT. Deep learning excels at image recognition, speech, and language tasks.

What Is Deep Learning?

Deep learning is a subset of machine learning that uses artificial neural networks with multiple layers (hence "deep") to learn complex patterns from large amounts of data. While traditional ML algorithms work well for structured data with clear features, deep learning shines on unstructured data like images, audio, and text.

When to Use Deep Learning vs. Traditional ML

| Characteristic | Traditional ML | Deep Learning |
|---|---|---|
| Data type | Structured (tables, spreadsheets) | Unstructured (images, text, audio) |
| Data volume | Works with smaller datasets | Requires large datasets |
| Feature engineering | You define the features | Model discovers features automatically |
| Compute requirements | Moderate | High (GPUs often required) |
| Interpretability | Generally more interpretable | Often a "black box" |
| Best for | Tabular data, simple patterns | Images, speech, language, complex patterns |

How Neural Networks Work (Conceptual)

A neural network is inspired by the human brain — it consists of interconnected nodes (neurons) organized in layers:

Network Structure

Input Layer          Hidden Layers          Output Layer
(features)           (processing)           (prediction)
   [x₁] ─────┐
             ├──→ [h₁] ──→ [h₄] ──┐
   [x₂] ─────┤                    ├──→ [y]
             ├──→ [h₂] ──→ [h₅] ──┤
   [x₃] ─────┤                    │
             └──→ [h₃] ──→ [h₆] ──┘

The Three Layer Types

| Layer | Role | Example |
|---|---|---|
| Input Layer | Receives raw data (features) | Pixel values of an image (28×28 = 784 inputs) |
| Hidden Layers | Process and transform data | Each layer learns increasingly complex features |
| Output Layer | Produces the final prediction | Class probabilities (90% cat, 8% dog, 2% bird) |
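The three layer types can be sketched in a few lines of numpy. This is a minimal illustration, not a real framework: the weights are random rather than learned, and the sizes (784 pixel inputs, 16 hidden units, 3 classes) are made-up examples matching the table above.

```python
import numpy as np

# Minimal forward pass through a 3-layer network (illustration only):
# 784 pixel inputs -> 16 hidden units -> 3 class scores.
rng = np.random.default_rng(0)

x = rng.random(784)                 # input layer: a flattened 28x28 image
W1 = rng.normal(0, 0.1, (16, 784))  # weights into the hidden layer
W2 = rng.normal(0, 0.1, (3, 16))    # weights into the output layer

hidden = np.maximum(0, W1 @ x)      # hidden layer: ReLU activation
scores = W2 @ hidden                # output layer: one raw score per class

# Softmax turns raw scores into class probabilities (e.g. cat/dog/bird)
probs = np.exp(scores) / np.exp(scores).sum()
print(probs)                        # three probabilities that sum to 1
```

With trained (rather than random) weights, the largest probability would correspond to the predicted class.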

How Learning Happens

  1. Forward pass: Input data flows through the network, each neuron applies a mathematical transformation
  2. Loss calculation: The network's prediction is compared to the actual answer — the difference is the "loss"
  3. Backpropagation: The network adjusts its internal weights (connections between neurons) to reduce the loss
  4. Repeat: This process repeats thousands or millions of times with different training examples
  5. Convergence: Eventually, the network's predictions become accurate
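The five steps above can be seen in a toy training loop. This sketch trains a tiny one-hidden-layer network on the classic XOR problem with hand-written backpropagation; all sizes and the learning rate are arbitrary choices for illustration, not exam material.

```python
import numpy as np

# Toy training loop: forward pass, loss, backpropagation, repeat.
rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR labels

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)  # hidden layer weights
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)  # output layer weights
sigmoid = lambda z: 1 / (1 + np.exp(-z))

losses, lr = [], 0.5
for step in range(5000):
    # 1. Forward pass: data flows input -> hidden -> output
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # 2. Loss: how far predictions are from the true answers
    losses.append(float(np.mean((p - y) ** 2)))
    # 3. Backpropagation: gradients of the loss w.r.t. each weight
    dz2 = 2 * (p - y) / len(X) * p * (1 - p)
    dW2, db2 = h.T @ dz2, dz2.sum(0)
    dh = dz2 @ W2.T * (1 - h ** 2)
    dW1, db1 = X.T @ dh, dh.sum(0)
    # 4. Adjust weights to reduce the loss, then repeat
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(losses[0], losses[-1])  # loss shrinks as training progresses
```

Step 5 (convergence) shows up as the loss settling near its minimum; real networks run the same loop at vastly larger scale.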

On the Exam: You do NOT need to understand the mathematics of neural networks. Know that deep learning uses layers of neurons that process data, learn from errors (backpropagation), and improve over time with more data and training.

What Makes Deep Learning "Deep"

The "deep" in deep learning refers to having multiple hidden layers. Each layer learns different levels of abstraction:

Example: Image recognition of a face

  • Layer 1: Detects edges and simple shapes
  • Layer 2: Combines edges into facial features (eyes, nose, mouth)
  • Layer 3: Combines features into facial structures
  • Layer 4: Recognizes specific faces

More layers allow the network to learn more complex patterns — this is why deep learning outperforms traditional ML on complex tasks like image and speech recognition.

The Transformer Architecture

The Transformer is a neural network architecture introduced in 2017 that revolutionized NLP and is the foundation of modern AI:

Key Features of Transformers

| Feature | Description | Why It Matters |
|---|---|---|
| Self-attention | The model can focus on different parts of the input simultaneously | Understands context and relationships between words |
| Parallel processing | Processes all input positions at once (not sequentially) | Much faster training than previous architectures |
| Scalability | Can be scaled to billions of parameters | Enables large language models like GPT-4o |
| Transfer learning | Pre-trained on massive datasets, then fine-tuned for specific tasks | Reduces the data needed for specific applications |
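Self-attention is simpler than it sounds. The sketch below shows scaled dot-product attention in numpy; in a real Transformer the Q/K/V projection matrices are learned, while here they are random placeholders just to show the mechanics (and the sequence length and embedding size are made up).

```python
import numpy as np

# Minimal sketch of scaled dot-product self-attention:
# every position attends to every other position at once.
rng = np.random.default_rng(0)
seq_len, d = 4, 8                   # e.g. 4 words, 8-dim embeddings
X = rng.normal(size=(seq_len, d))   # token embeddings (made-up values)

# Real Transformers learn these projections; random here for illustration.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

scores = Q @ K.T / np.sqrt(d)       # how strongly each word relates to each other word
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
out = weights @ V                   # each position becomes a weighted mix of all values

print(weights.shape, out.shape)     # (4, 4) attention map, (4, 8) output
```

The (4, 4) weight matrix is the "attention": row *i* says how much word *i* looks at every word in the sequence, which is how the model captures context.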

Transformers in Modern AI

| Model | Type | Creator | Application |
|---|---|---|---|
| GPT-4o | Generative Pre-trained Transformer | OpenAI | Text and image generation |
| BERT | Bidirectional Encoder Representations from Transformers | Google | Text understanding, search |
| DALL-E | Text-to-image model | OpenAI | Image creation from text prompts |
| Whisper | Audio Transformer | OpenAI | Speech recognition |
| Copilot | AI assistant built on GPT models | Microsoft | Code generation, productivity |

On the Exam: You need to know that Transformers are the architecture behind modern language models (GPT, BERT) and that they use self-attention to understand context. You do NOT need to understand the mathematical details of attention mechanisms.

Deep Learning Applications on Azure

| Application | Azure Service | Deep Learning Role |
|---|---|---|
| Image classification | Azure AI Vision | Convolutional neural networks (CNNs) |
| Object detection | Azure AI Vision, Custom Vision | CNNs with region detection |
| Speech recognition | Azure AI Speech | Recurrent and Transformer networks |
| Text analysis | Azure AI Language | Transformer-based models |
| Text generation | Azure OpenAI Service | GPT Transformer models |
| Image generation | Azure OpenAI Service | Diffusion and Transformer models |

Test Your Knowledge

  1. What makes deep learning "deep"?
  2. Which neural network architecture is the foundation of modern large language models like GPT-4o?
  3. Which type of data is deep learning MOST effective at processing compared to traditional machine learning?
  4. In a neural network, which layer produces the final prediction?