2.2 Regression Models

Key Takeaways

  • Regression models predict continuous numerical values — house prices, temperatures, sales volumes, stock prices.
  • Linear regression finds the straight-line relationship between features and the label; it is the simplest regression algorithm.
  • Key evaluation metrics for regression: MAE (average error magnitude), RMSE (penalizes large errors), R-squared (proportion of variance explained, where 1.0 = perfect).
  • R-squared (coefficient of determination) typically ranges from 0 to 1 — higher values indicate better model fit. An R-squared of 0.85 means the model explains 85% of the variance in the data.
  • The AI-900 tests conceptual understanding of regression — you will not be asked to calculate metrics or write regression code.
Last updated: March 2026

Regression Models

Quick Answer: Regression models predict continuous numerical values like prices, temperatures, and quantities. Linear regression finds the straight-line relationship between features and the label. Key evaluation metrics are MAE (Mean Absolute Error), RMSE (Root Mean Squared Error), and R-squared (proportion of variance explained). Higher R-squared values indicate better model performance.

What Is Regression?

Regression is a supervised machine learning technique that predicts a continuous numerical value based on input features. The output is a number on a continuous scale, not a category.

How to Identify a Regression Problem

Ask yourself: Is the predicted output a number on a continuous scale?

  • Predicting a house price ($450,000) → Regression (continuous number)
  • Predicting if an email is spam → Classification (category)
  • Predicting tomorrow's temperature (72.5°F) → Regression (continuous number)
  • Grouping customers into segments with no predefined labels → Clustering (group)

Linear Regression

Linear regression is the simplest and most fundamental regression algorithm. It finds the straight-line relationship between features and the label.

Simple Linear Regression (One Feature)

With one feature, linear regression fits a straight line through the data:

Formula: y = mx + b

  • y = predicted value (label)
  • x = input value (feature)
  • m = slope (how much y changes when x changes by 1)
  • b = y-intercept (value of y when x = 0)

Example: Predicting house price (y) based on square footage (x)

  • If m = 200 and b = 50,000
  • A 1,500 sq ft house: y = 200(1,500) + 50,000 = $350,000
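
The worked example above can be sketched in a few lines of Python. The slope and intercept here are the illustrative values from the example (m = 200, b = 50,000), not coefficients learned from real data:

```python
# Simple linear regression prediction: y = m*x + b
# m and b are the assumed example values, not fitted parameters.

def predict_price(sqft: float, m: float = 200.0, b: float = 50_000.0) -> float:
    """Predict house price from square footage using y = m*x + b."""
    return m * sqft + b

print(predict_price(1_500))  # 200 * 1500 + 50000 = 350000.0
```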

Multiple Linear Regression (Multiple Features)

With multiple features, the model accounts for several input variables:

Formula: y = m₁x₁ + m₂x₂ + m₃x₃ + ... + b

Example: Predicting house price based on square footage (x₁), bedrooms (x₂), and age (x₃)
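
A minimal sketch of that formula, using made-up coefficients for illustration (these are not learned from data — a real model would fit them during training):

```python
# Multiple linear regression: y = m1*x1 + m2*x2 + m3*x3 + b
# Coefficients below are hypothetical, chosen only to illustrate the formula.

def predict_price(features, coefs, intercept):
    """Weighted sum of features plus intercept."""
    return sum(m * x for m, x in zip(coefs, features)) + intercept

coefs = [150.0, 10_000.0, -1_000.0]  # $/sq ft, $/bedroom, $/year of age
intercept = 40_000.0

# A 1,500 sq ft house with 3 bedrooms, 10 years old:
price = predict_price([1_500, 3, 10], coefs, intercept)
print(price)  # 150*1500 + 10000*3 - 1000*10 + 40000 = 285000.0
```

Note the negative coefficient on age: each additional year reduces the predicted price, which linear regression expresses naturally as a negative slope.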

Regression Evaluation Metrics

After training a regression model, you evaluate its performance using these metrics:

Mean Absolute Error (MAE)

The average absolute difference between predicted and actual values.

  • Interpretation: On average, how far off are the predictions?
  • Example: MAE of $15,000 means predictions are off by $15,000 on average
  • Lower is better

Root Mean Squared Error (RMSE)

The square root of the average squared differences between predicted and actual values.

  • Interpretation: Similar to MAE but penalizes larger errors more heavily
  • Example: an RMSE of $20,000 alongside an MAE of $15,000 for the same model — RMSE is always at least as large as MAE, and the gap widens when a few big errors dominate
  • Lower is better

R-squared (Coefficient of Determination)

The proportion of variance in the label that the model explains.

  • Range: 0 to 1 (can be negative for very poor models)
  • Interpretation: How much of the variation in the output does the model explain?
  • Example: R² = 0.85 means the model explains 85% of the variation in house prices
  • Higher is better (1.0 = perfect, 0 = no better than predicting the average)
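
All three metrics can be computed by hand, which makes their definitions concrete. The sketch below uses a toy set of four house-price predictions (invented values, for illustration only):

```python
import math

# Toy data: actual vs. predicted house prices (illustrative values)
actual    = [300_000, 450_000, 250_000, 500_000]
predicted = [320_000, 430_000, 260_000, 470_000]

n = len(actual)
errors = [p - a for p, a in zip(predicted, actual)]

# MAE: average of absolute errors
mae = sum(abs(e) for e in errors) / n

# RMSE: square root of the average squared error
rmse = math.sqrt(sum(e * e for e in errors) / n)

# R-squared: 1 - (residual sum of squares / total sum of squares)
mean_actual = sum(actual) / n
ss_res = sum(e * e for e in errors)
ss_tot = sum((a - mean_actual) ** 2 for a in actual)
r2 = 1 - ss_res / ss_tot

print(mae)   # 20000.0
print(rmse)  # ~21213 — larger than MAE because the $30,000 miss is squared
print(r2)    # ~0.958 — the model explains about 96% of the variance
```

Notice that RMSE comes out larger than MAE on the same data, exactly the "penalizes large errors" behavior described above.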

| Metric | What It Measures | Good Values | Direction |
| --- | --- | --- | --- |
| MAE | Average prediction error | Close to 0 | Lower is better |
| RMSE | Average error (penalizes large errors) | Close to 0 | Lower is better |
| R-squared | Proportion of variance explained | Close to 1.0 | Higher is better |

On the Exam: You will NOT be asked to calculate these metrics. You need to know what each metric means, how to interpret it, and which direction is "better." Common question: "An R-squared of 0.92 indicates that..." → the model explains 92% of the variance in the data.

Common Regression Use Cases

| Use Case | Features | Label |
| --- | --- | --- |
| House price prediction | Size, bedrooms, location | Price ($) |
| Sales forecasting | Month, promotions, weather | Sales volume |
| Temperature prediction | Date, location, humidity | Temperature (°F) |
| Stock price prediction | Volume, market index, news | Stock price ($) |
| Delivery time estimation | Distance, traffic, weather | Delivery time (minutes) |
| Insurance premium pricing | Age, health history, coverage | Premium ($/month) |
| Energy consumption | Time, weather, building size | Energy (kWh) |

Regression vs. Classification: The Key Difference

| Aspect | Regression | Classification |
| --- | --- | --- |
| Output type | Continuous number | Discrete category |
| Examples | $450,000, 72.5°F, 3.7 hours | Spam/Not Spam, Cat/Dog, Yes/No |
| Question answered | "How much?" or "How many?" | "Which category?" or "Is it X?" |
| Evaluation | MAE, RMSE, R-squared | Accuracy, Precision, Recall, F1 |

Test Your Knowledge

  • What does an R-squared value of 0.92 indicate about a regression model?
  • Which machine learning technique would you use to predict the temperature in a city tomorrow?
  • Which regression evaluation metric penalizes larger prediction errors more heavily?
  • A model predicts house prices. The MAE is $25,000 and the R-squared is 0.78. What does this tell you?