11.5 Systems Practice, Program Evaluation, and Quality Improvement

Key Takeaways

  • Systems practice asks how policies, workflows, incentives, culture, and resources shape client outcomes, not just symptoms.
  • Program evaluation pairs a narrow, answerable question with appropriate indicators, data sources, stakeholder input, and ethical reporting.
  • Quality improvement uses iterative Plan-Do-Study-Act cycles with feasible, measurable, culturally responsive changes tied to client welfare.
  • Part 2 scenarios may require system-level thinking even when the presenting problem looks like an individual case.

Last updated: June 2026

Seeing the System Around the Client

Systems practice recognizes that client outcomes are shaped by more than symptoms or technique. Referral pathways, staffing, language access, transportation, insurance rules, waitlists, documentation systems, school discipline practices, and community trust can all determine whether services work. Part 2-Skills may frame this as consultation, supervision, or program evaluation, but the tested skill is the same: understand the system before recommending change.

First, define the level of intervention. Some problems are clinical (a client needs a different plan), some supervisory (trainees assess risk inconsistently), some organizational (a clinic cannot track no-shows), and some community-level (a program never reaches a linguistic minority). A good answer targets the level the facts support.

Systems question | Data to consider | Risk if ignored
Who is not being served? | Demographics, referral sources, waitlists, language needs | Inequitable access and unmet need
Are services effective? | Symptom measures, functioning, retention, client goals | Continuing ineffective practice
Are procedures followed? | Chart audits, supervision notes, incident reports | Safety failures and poor continuity
Do stakeholders trust it? | Client feedback, staff input, community advisory data | Low engagement, weak implementation
What constrains change? | Staffing, training, budget, technology, policy | Recommendations that cannot be sustained

Program Evaluation Starts With an Answerable Question

Instead of asking whether a program is good, ask whether a new intake process reduced time to first appointment, whether a trauma group improved functioning among completers, or whether a supervision change improved documentation of risk. A narrow question drives the choice of indicators and prevents overclaiming. Also distinguish three purposes: formative evaluation improves a program in progress, summative evaluation judges overall outcomes or worth, and needs assessment identifies gaps before services are designed.
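To make the idea of a narrow question concrete, here is a minimal Python sketch of the intake example: it compares median days to first appointment before and after a change. The function name `median_wait_days` and all wait times are hypothetical illustration values, not data from any real clinic, and the comparison is descriptive only.

```python
from statistics import median

def median_wait_days(waits):
    """Median days from referral to first appointment; None if there is no data."""
    return median(waits) if waits else None

# Hypothetical illustration data (days waited per client), not real results.
before_waits = [21, 35, 28, 40, 30]
after_waits = [14, 20, 18, 25, 16]

before_median = median_wait_days(before_waits)
after_median = median_wait_days(after_waits)
change_days = after_median - before_median

print(f"Median wait before: {before_median} days, after: {after_median} days")
# A before/after difference is an association; it does not show the new
# intake process caused the change.
print(f"Change in median wait: {change_days} days")
```

The indicator (median wait) follows directly from the question asked, which is the point of starting with a narrow question rather than "is the program good?".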

Ethical program evaluation attends to consent, privacy, data security, conflict of interest, and fair interpretation. If data were collected for operations, clients must not be misled into thinking they received individualized assessment results. If findings could harm a group or staff member, report accurately while protecting confidentiality and avoiding blame the data do not support.

Quality Improvement and the PDSA Cycle

Quality improvement (QI) typically uses iterative Plan-Do-Study-Act (PDSA) cycles: plan a small change, implement it, study the results against a measure, then act to adopt, adapt, or abandon. QI recommendations must be practical. A clinic with no bilingual staff may need interpreter protocols, hiring goals, translated materials, referral partnerships, and training. A hospital service with inconsistent discharge communication may need a role-specific checklist, an electronic-health-record prompt, and audit-and-feedback.
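The "study" and "act" steps can be sketched as a simple decision rule. This is one possible illustration, assuming a single measure with a baseline and a target; the function `pdsa_decision` and its thresholds are hypothetical, and a real QI team would also weigh feasibility, staff input, and client feedback.

```python
def pdsa_decision(baseline, observed, target, higher_is_better=True):
    """Suggest 'adopt', 'adapt', or 'abandon' after one PDSA cycle.

    Hypothetical rule for illustration:
      - adopt   if the observed measure meets the target
      - adapt   if it improved on baseline but missed the target
      - abandon if it did not improve on baseline
    """
    if not higher_is_better:
        # Flip signs so that "larger is better" logic applies to both cases.
        baseline, observed, target = -baseline, -observed, -target
    if observed >= target:
        return "adopt"
    if observed > baseline:
        return "adapt"
    return "abandon"

# Hypothetical checklist completion rates (higher is better).
print(pdsa_decision(baseline=0.55, observed=0.72, target=0.90))
# Hypothetical no-show rates (lower is better).
print(pdsa_decision(baseline=0.30, observed=0.18, target=0.20, higher_is_better=False))
```

Keeping the rule this small mirrors the PDSA emphasis on small, measurable tests of change that are repeated rather than settled in one pass.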

An Evaluation Workflow for Part 2

  1. Define the system problem and the decision the evaluation should support.
  2. Identify stakeholders, including the clients or communities affected.
  3. Select feasible indicators that match the question.
  4. Protect confidentiality, consent, and data security.
  5. Analyze patterns without overstating causation.
  6. Report findings in accessible language with stated limitations.
  7. Recommend measurable changes and a follow-up plan.

Systems issues hide inside individual cases. A trainee who misses risk documentation may have a competence gap, but the clinic's templates and policy may also be unclear. A client who misses sessions may be ambivalent, but may also face transportation, disability, or scheduling barriers. ASPPB's Part 2 orientation is applied decision-making: the defensible move is rarely to blame one person quickly. It is to define the problem, gather relevant data, include stakeholders, protect ethics, and recommend changes proportionate to the evidence.

Distinguishing Evaluation, Research, and Operations

A boundary the EPPP tests is the line between program evaluation, human-subjects research, and routine quality monitoring, because each carries different consent and oversight rules. Quality monitoring and internal program evaluation conducted to improve services usually do not require Institutional Review Board (IRB) approval. Once a project is designed to produce generalizable knowledge for publication or external use, however, it becomes research and triggers informed consent, IRB review, and the full research-ethics framework.

Misclassifying a generalizable study as mere operations, and thereby skipping consent and review, is a recurring wrong answer. When a stem hints that findings will be presented as research, the safer choice routes the project through IRB and informed-consent procedures.

Reporting Without Overclaiming

The last common failure is interpretive overreach. Program data are usually observational, so a psychologist should describe associations rather than claim causation, and should report effect sizes and limitations honestly instead of spotlighting only flattering numbers. If administrators want a clean success story, the ethical evaluator still reports dropout, missing data, and subgroup gaps. The strongest Part 2 answers in this area pair an accurate, plainly written summary with concrete, feasible recommendations and a plan to re-measure, rather than a confident verdict the data cannot support.
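A short Python sketch shows why completer results should be reported alongside dropout by subgroup. The records, group labels, and function names below are hypothetical illustration values; the sketch only summarizes data and makes no causal claim.

```python
from collections import defaultdict

# Hypothetical records: (language_group, completed_treatment, symptom_change).
# Negative symptom_change means symptoms decreased; dropouts have no score.
records = [
    ("English", True, -6), ("English", True, -4), ("English", False, None),
    ("English", True, -5), ("Spanish", True, -5), ("Spanish", False, None),
    ("Spanish", False, None), ("Spanish", False, None),
]

def dropout_by_group(rows):
    """Return the proportion of clients in each group who did not complete."""
    counts = defaultdict(lambda: [0, 0])  # group -> [dropouts, total]
    for group, completed, _ in rows:
        counts[group][1] += 1
        if not completed:
            counts[group][0] += 1
    return {group: d / n for group, (d, n) in counts.items()}

def mean_change_completers(rows):
    """Mean symptom change among completers only; None if there are none."""
    changes = [change for _, completed, change in rows if completed]
    return sum(changes) / len(changes) if changes else None

rates = dropout_by_group(records)
completer_change = mean_change_completers(records)

print(f"Mean change among completers: {completer_change}")
for group, rate in rates.items():
    print(f"Dropout in {group} group: {rate:.0%}")
```

Reported alone, the completer average looks like a success; next to the subgroup dropout rates, it becomes clear the result describes only those who stayed and that one group may face an access barrier worth investigating.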

Logic Models and Choosing the Right Indicator

A practical tool that underlies many systems items is the logic model, which maps inputs, activities, outputs, and outcomes so that evaluators measure the right thing at the right level. Outputs (sessions delivered, clients seen) describe activity; outcomes (reduced symptoms, improved functioning, return to work) describe impact. A frequent error is celebrating outputs as if they were outcomes, for example reporting that a clinic ran more groups without checking whether participants actually improved.
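The output-versus-outcome distinction can be sketched as a small data structure. This is an illustrative sketch only: the `LogicModel` class, the helper `reports_only_outputs`, and the example entries for inputs and activities are hypothetical, while the output and outcome examples come from the paragraph above.

```python
from dataclasses import dataclass, field

@dataclass
class LogicModel:
    """Maps a program's inputs, activities, outputs, and outcomes."""
    inputs: list = field(default_factory=list)
    activities: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    outcomes: list = field(default_factory=list)

    def stage_of(self, indicator):
        """Return the stage an indicator belongs to, or None if unlisted."""
        for stage in ("inputs", "activities", "outputs", "outcomes"):
            if indicator in getattr(self, stage):
                return stage
        return None

def reports_only_outputs(model, reported):
    """True when a report lists outputs but no outcome indicator."""
    stages = {model.stage_of(indicator) for indicator in reported}
    return "outputs" in stages and "outcomes" not in stages

model = LogicModel(
    inputs=["clinician hours"],            # hypothetical example
    activities=["group sessions"],         # hypothetical example
    outputs=["sessions delivered", "clients seen"],
    outcomes=["reduced symptoms", "improved functioning", "return to work"],
)

print(model.stage_of("clients seen"))
print(reports_only_outputs(model, ["sessions delivered", "clients seen"]))
```

A report that trips `reports_only_outputs` is describing activity, not impact, which is the error the paragraph warns against.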

When a stem offers competing indicators, prefer the one that maps to the decision the program actually faces, is feasible to collect reliably, and reflects client benefit rather than mere volume.

Consider scope and unintended consequences too. A change that improves one metric can degrade another, such as a faster-intake target that shortens assessments and raises the risk of misdiagnosis. Competent systems practice anticipates these trade-offs, includes a balancing measure, and gathers frontline staff and client input before scaling. On the exam, the answer that monitors for unintended effects and includes the people who must live with the change usually outperforms the answer that optimizes a single number in isolation.
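A balancing measure can be checked alongside the primary metric with a few lines of Python. The sketch below assumes both measures are "lower is better" (for example, wait days and a rate of assessments later revised); the function `review_change`, the tolerance, and the numbers are hypothetical, and the returned wording is a prompt for discussion, not a verdict.

```python
def review_change(primary_before, primary_after,
                  balancing_before, balancing_after, tolerance=0.0):
    """Compare a primary metric with a balancing measure (lower is better).

    Returns a dict describing whether the primary metric improved, whether
    the balancing measure worsened beyond `tolerance`, and a suggested step.
    """
    primary_improved = primary_after < primary_before
    balancing_worsened = balancing_after > balancing_before + tolerance
    if primary_improved and not balancing_worsened:
        suggestion = "consider scaling, with staff and client input"
    elif primary_improved and balancing_worsened:
        suggestion = "pause and review the trade-off before scaling"
    else:
        suggestion = "revise the change"
    return {
        "primary_improved": primary_improved,
        "balancing_worsened": balancing_worsened,
        "suggestion": suggestion,
    }

# Hypothetical values: intake wait fell, but revised assessments rose.
result = review_change(primary_before=30, primary_after=18,
                       balancing_before=0.05, balancing_after=0.12,
                       tolerance=0.02)
print(result["suggestion"])
```

Pairing the two measures in one check keeps a faster-intake target from being declared a success while a downstream harm goes unnoticed.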

Test Your Knowledge

A clinic asks whether its new intake process improved access. Which evaluation question is most useful?


A psychologist tests a new discharge checklist on one unit, measures completion for two weeks, and then refines it before spreading it. This iterative approach best reflects:


A program evaluation finds improvement among clients who completed treatment but high dropout among one language group. What is the best interpretation?
