
100+ Free SRE Practitioner (SREP) Practice Questions

Pass your Site Reliability Engineering (SRE) Practitioner Certification (SREP) exam on the first try — instant access, no signup required.

✓ No registration  ✓ No credit card  ✓ No hidden fees  ✓ Start practicing immediately

2026 Statistics

Key Facts: SRE Practitioner (SREP) Exam

  • Passing Score: 26/40 (65%), per PeopleCert
  • Exam Duration: 90 min, per PeopleCert
  • Prerequisite: SRE Foundation (PeopleCert / DevOps Institute)
  • Certification Validity: 3 years (PeopleCert renewal model)
  • Typical Study Time: 30-60 hrs (practical estimate)
  • Difficulty Level: Practitioner (DevOps Institute, above Foundation)
  • Exam Format: 40 MCQ, per PeopleCert

SREP requires 65% on 40 multiple-choice questions in 90 minutes. SRE Foundation is a mandatory prerequisite. Topics skew applied and scenario-based: error budget burn rate math, tiered release policies, toil automation ROI, full-stack observability design, incident command, blameless postmortems, chaos engineering, AIOps, platform engineering, and SRE team scaling.

Sample SRE Practitioner (SREP) Practice Questions

Try these sample questions to test your SRE Practitioner (SREP) exam readiness. Each question includes a detailed explanation. Start the interactive quiz above for the full 100+ question experience with AI tutoring.

1. Your team's payment service has a 99.9% monthly availability SLO. The error budget is nearly exhausted with 10 days remaining. The product team wants to ship a major database migration. What is the SRE-aligned action?
A. Approve the migration since the SLO is still technically being met
B. Freeze non-emergency deployments until the budget resets at month end
C. Negotiate a temporary SLO relaxation with stakeholders and proceed
D. Deploy during the maintenance window and ignore the error budget
Explanation: When an error budget is nearly exhausted, the SRE model mandates halting risky changes to protect reliability. The error budget is the governance mechanism that gives both dev and ops a shared, objective signal: budget gone means reliability work takes priority over feature velocity. Deploying anyway would violate the agreed-upon error budget policy.
2. An SRE team identifies that restarting a specific microservice every morning takes 25 minutes of manual effort and is entirely scriptable. By which SRE definition is this work classified?
A. Overhead
B. Toil
C. Technical debt
D. Incident response
Explanation: Toil is manual, repetitive, automatable, tactical work devoid of enduring value that scales linearly with service growth. Daily microservice restarts fit all six characteristics. Google's SRE guidance recommends keeping toil below 50% of an SRE's time and actively automating it away.
3. A service's SLI is request latency at the 99th percentile. The SLO is p99 < 300ms. Over the past week p99 was 320ms. What has occurred?
A. The SLA has been violated
B. The error budget has been fully consumed
C. The SLO has been missed, burning error budget
D. The SLI measurement is invalid and must be recalibrated
Explanation: An SLO miss means the service performed below the agreed objective, which burns into the error budget for that period. An SLA violation only occurs if the SLO miss triggers a contractual obligation (usually customer-facing). Missing the SLO for one week does not necessarily mean the budget is fully consumed — it depends on the budget window and allowed downtime.
4. Your organization wants to set SLOs that align with actual user expectations rather than internal technical metrics. Which approach BEST achieves this?
A. Set SLOs based on what the infrastructure can reliably deliver today
B. Derive SLOs from user journey critical paths and acceptable degradation thresholds
C. Copy SLOs from industry benchmarks like 99.9% availability
D. Let the ops team define SLOs independently from product owners
Explanation: Business-aligned SLOs start from user journeys: identify the user actions that matter most (checkout, login, data export), measure what good looks like from their perspective, and set the SLO at the threshold where users notice degradation. Infrastructure-derived SLOs often miss what users actually experience.
5. During an error budget review, the team notices that 80% of budget consumption came from a single weekly batch job that errors during processing. The SLO window is 30 days. What is the MOST effective remediation?
A. Widen the SLO window to 90 days to absorb the spikes
B. Exclude batch job errors from the SLI measurement entirely
C. Fix or make the batch job more reliable; also consider a separate SLO for batch vs. interactive
D. Reduce the SLO target from 99.9% to 99.5% to accommodate the batch failures
Explanation: The correct SRE response is to fix the root cause (batch job reliability) and optionally separate SLOs for different workload types since interactive and batch traffic often have different user expectations and risk tolerances. Widening the window or lowering the target simply hides the problem.
6. A team measures toil at 60% of weekly engineering hours. The SRE handbook recommends keeping toil below what threshold?
A. 25%
B. 50%
C. 70%
D. There is no recommended threshold
Explanation: Google's SRE book establishes 50% as the threshold: if an SRE spends more than 50% of time on toil, the team is not able to invest enough in engineering work to reduce future toil. At 60%, the team has exceeded this threshold and must escalate toil reduction as a priority.
7. Which observability pillar provides the BEST ability to reconstruct the path of a single user request across multiple microservices?
A. Metrics
B. Logs
C. Distributed traces
D. Dashboards
Explanation: Distributed tracing propagates a unique trace ID through every service a request touches, creating a timeline of spans that shows exactly where latency or errors occurred in the call chain. Metrics provide aggregate signals and logs provide per-service event records, but neither reconstructs end-to-end request paths on its own.
8. An observability-driven development (ODD) team mandates that every new service must ship with SLIs, dashboards, and runbooks BEFORE going to production. Which principle does this exemplify?
A. Shift-left reliability
B. Defense in depth
C. Zero-trust security
D. Toil elimination
Explanation: Shift-left reliability means building observability and reliability practices into the development lifecycle rather than retrofitting them after deployment. ODD embeds SLIs, instrumentation, dashboards, and runbooks as development acceptance criteria, catching reliability gaps before they reach production.
9. A distributed system experiences a spike in p99 latency. Metrics show CPU and memory are normal. Which observability action is MOST likely to identify the root cause?
A. Check error rate dashboards from the past 30 days
B. Review distributed traces for the high-latency requests to find the slow span
C. Redeploy all services to clear any transient state
D. Reduce the SLO target temporarily to mask the issue
Explanation: When aggregate infrastructure metrics are normal but latency is high, distributed tracing is the most targeted tool: it isolates which specific service, database call, or external API is the slow span causing the p99 spike. This is the core use case for distributed tracing in microservice architectures.
10. Which combination of telemetry types forms the three 'pillars of observability'?
A. Metrics, logs, and traces
B. Dashboards, alerts, and runbooks
C. Synthetic tests, RUM, and APM
D. CPU, memory, and disk I/O
Explanation: The three pillars of observability — metrics (aggregated numeric signals), logs (discrete event records), and traces (distributed request paths) — together provide comprehensive visibility into system behavior. Each pillar addresses a different dimension of system understanding that the others cannot fully replace.
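
The error budget scenarios in the sample questions depend on how much unreliability a 99.9% monthly SLO actually permits. The sketch below is a minimal illustration, assuming a 30-day window and a budget counted as minutes of full unavailability. The function names are hypothetical and do not come from PeopleCert material.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of full unavailability allowed by an availability SLO.

    slo_target is a fraction, e.g. 0.999 for 99.9%.
    Assumption: the budget is measured as whole-service downtime minutes.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def budget_remaining(slo_target: float, window_days: int, bad_minutes: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return 1.0 - bad_minutes / budget


budget = error_budget_minutes(0.999, 30)
print(round(budget, 1))  # 43.2 minutes for 99.9% over 30 days

# With 40 bad minutes already recorded, about 7% of the budget is left,
# which matches the "nearly exhausted" situation in question 1.
print(round(budget_remaining(0.999, 30, 40.0), 3))  # 0.074
```

A budget this small is why a risky migration with 10 days left in the window is deferred: a single incident of a few minutes would push the service past its SLO.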

About the SRE Practitioner (SREP) Exam

The SRE Practitioner (SREP) certification is the advanced follow-on to SRE Foundation, requiring Foundation as a prerequisite. It tests applied, scenario-based mastery of SRE practices: designing SLOs aligned to business outcomes, governing release velocity through error budget policies, eliminating toil systematically, implementing full-stack observability, running incident command, building resilient systems by design, and scaling SRE organizations. SREP also covers current practices including platform engineering, AIOps, Generative AI for SRE, and progressive delivery.

  • Questions: 40 scored questions
  • Time Limit: 90 minutes
  • Passing Score: 65% (26/40)
  • Exam Fee: Check peoplecert.org for current pricing (PeopleCert / DevOps Institute)

SRE Practitioner (SREP) Exam Content Outline

  • SLOs, Error Budgets, and Business Alignment (20%): Applied SLO design from user journeys, error budget policy tiers, burn rate calculation, multi-window alerting (fast + slow burn), and SLO calibration over time.
  • Toil Identification and Elimination (15%): Toil measurement and prioritization (frequency × duration), automation-ROI cases, self-healing runbooks, auto-remediation, and toil normalization / treadmill anti-patterns.
  • Full-Stack Observability (20%): Three pillars (metrics/logs/traces), distributed tracing for microservices, synthetic vs RUM, observability-driven development, full-stack instrumentation requirements.
  • Monitoring and Alerting Strategy (15%): Symptom-based vs cause-based alerts, SLO burn rate alert design, alert rationalization, predictive trend alerts, alert fatigue reduction, on-call health metrics.
  • Incident Management (15%): Incident command framework, severity classification with error budget context, blameless postmortems, contributing factor analysis, MTTR optimization, runbook design.
  • Platform Engineering and AIOps (10%): Internal developer platforms, golden paths, progressive delivery (canary, feature flags), policy-as-code gates, value stream management, AIOps, Generative AI for SRE.
  • SRE Team Organization and Scaling (5%): Centralized, embedded, and consulting SRE models; SRE engagement and offboarding model; production readiness reviews; on-call sustainability and interrupt budgets.
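
The toil domain in the outline mentions prioritization by frequency × duration and automation-ROI cases. One way to make that concrete is sketched below. The `ToilTask` class, its field names, and the sample figures (including the 25-hour automation estimate) are illustrative assumptions, not an official scoring method.

```python
from dataclasses import dataclass


@dataclass
class ToilTask:
    name: str
    runs_per_month: float
    minutes_per_run: float
    automation_hours: float  # assumed one-off engineering cost to automate

    @property
    def monthly_toil_hours(self) -> float:
        # frequency x duration, converted to hours
        return self.runs_per_month * self.minutes_per_run / 60.0

    @property
    def payback_months(self) -> float:
        # months until the automation effort is repaid by saved toil
        if self.monthly_toil_hours == 0:
            return float("inf")
        return self.automation_hours / self.monthly_toil_hours


def prioritize(tasks: list[ToilTask]) -> list[ToilTask]:
    """Rank tasks by monthly toil, largest first."""
    return sorted(tasks, key=lambda t: t.monthly_toil_hours, reverse=True)


tasks = [
    ToilTask("cert-rotation", runs_per_month=1, minutes_per_run=120, automation_hours=16),
    # A 25-minute manual restart performed every day, as in sample question 2
    ToilTask("daily-restart", runs_per_month=30, minutes_per_run=25, automation_hours=25),
]

for task in prioritize(tasks):
    print(f"{task.name}: {task.monthly_toil_hours:.1f} h/month, "
          f"payback {task.payback_months:.1f} months")
# daily-restart: 12.5 h/month, payback 2.0 months
# cert-rotation: 2.0 h/month, payback 8.0 months
```

Ranking by monthly hours alone ignores risk and interrupt cost, so a real backlog would likely weigh those as well.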

How to Pass the SRE Practitioner (SREP) Exam

What You Need to Know

  • Passing score: 65% (26/40)
  • Exam length: 40 questions
  • Time limit: 90 minutes
  • Exam fee: Check peoplecert.org for current pricing

Keys to Passing

  • Complete 500+ practice questions
  • Score 80%+ consistently before scheduling
  • Focus on highest-weighted sections
  • Use our AI tutor for tough concepts

SRE Practitioner (SREP) Study Tips from Top Performers

1. Practice burn rate math until it is automatic: 14x burn rate exhausts a 30-day budget in ~2 days; 1x means exactly on track for the window
2. Memorize the six toil characteristics (manual, repetitive, automatable, tactical, no enduring value, scales linearly); exam scenarios use these to classify work
3. Distinguish MTTD, MTTR, MTBF, and MTTF precisely; SREP scenarios often hinge on which metric is being measured and which lever improves it
4. Know all three pillars of observability and when each is the right tool: metrics for aggregates, logs for events, traces for request paths across services
5. Error budget policy tiers are heavily tested: >50% remaining = free deployment, <50% = SRE review, 0% = emergency only
6. For platform engineering questions, anchor on the primary benefit: eliminating fragmentation and inconsistency across teams, not just 'making things faster'
7. AIOps in SREP is always augmentation, never replacement; human judgment remains required for novel incidents and consequential decisions
8. Study the SRE engagement model: PRR for onboarding, offboarding/handback for mature services, consulting model for scale without embedding
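
The burn rate and policy tier tips above reduce to a few lines of arithmetic. The sketch below is a hedged illustration. The function names are hypothetical, and the treatment of exactly 50% remaining (counted here as free deployment) is an assumption, since the tiers as stated leave that boundary open.

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows."""
    return error_rate / (1.0 - slo_target)


def days_to_exhaustion(rate: float, window_days: float = 30.0) -> float:
    """Days until a full budget is spent at a constant burn rate."""
    if rate <= 0:
        return float("inf")
    return window_days / rate


def policy_tier(budget_remaining: float) -> str:
    """Map the remaining budget fraction to the tiers listed in tip 5."""
    if budget_remaining <= 0.0:
        return "emergency only"
    if budget_remaining < 0.5:
        return "SRE review"
    return "free deployment"  # assumption: exactly 50% falls in this tier


# A 1.4% error rate against a 99.9% SLO burns budget about 14 times too fast.
rate = burn_rate(0.014, 0.999)
print(round(rate, 1))                      # 14.0
print(round(days_to_exhaustion(rate), 1))  # 2.1
print(days_to_exhaustion(1.0))             # 30.0

print(policy_tier(0.8))   # free deployment
print(policy_tier(0.3))   # SRE review
print(policy_tier(0.0))   # emergency only
```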

Frequently Asked Questions

What is the SRE Practitioner (SREP) prerequisite?

SRE Foundation certification is a mandatory prerequisite for SREP. Candidates must hold the SRE Foundation credential from PeopleCert / DevOps Institute before they can sit the Practitioner exam. There is no formal exception; Foundation provides the conceptual base that Practitioner-level scenarios build on.

How is SREP different from SRE Foundation?

SRE Foundation introduces core SRE concepts: reliability principles, SLIs/SLOs, toil, monitoring basics, and organizational impact. SRE Practitioner tests applied mastery: designing SLOs from user journeys, writing tiered error budget policies, implementing full-stack observability architecture, running incident command, automating toil with validation, using chaos engineering for resilience, applying AIOps, and scaling SRE across dozens of teams. Practitioner questions are scenario-based, not definitional.

How many questions are on the SREP exam?

The SRE Practitioner exam has 40 multiple-choice questions delivered in 90 minutes with a 65% passing score (26/40). This is the same question count as SRE Foundation but with more time (90 vs 60 minutes), reflecting the more complex applied scenarios.

What topics get the most questions on SREP?

The heaviest SREP topic areas are full-stack observability and SLO/error budget application (approximately 20% each), followed by monitoring/alerting strategy, toil management, and incident management (approximately 15% each). Platform engineering, AIOps, and organizational topics make up the remainder. Focus your preparation on applied scenarios rather than pure definitions.

How long should I study for SREP?

Candidates with SRE Foundation and active production operations experience typically need 30-60 study hours over 4-8 weeks. Focus on: burn rate math (practice the calculations), error budget policy design, blameless postmortem facilitation, distributed tracing use cases, chaos engineering methodology, platform engineering concepts (golden paths, IDPs), and AIOps/Generative AI integration patterns. Aim for 80%+ on practice tests before booking.

Is SREP an open-book exam?

Check current PeopleCert exam delivery guidelines at peoplecert.org; exam format details can change. SRE Foundation is described as open book, and SREP candidates should verify whether the same applies to Practitioner. Regardless, practitioner-level scenario questions test judgment and reasoning, not memorization, so open-book access provides limited advantage without deep understanding.