"Time-to-first-token" (TTFT) is a sub-metric of which quality attribute?

Latency. The section lists time-to-first-token as one of the structural components of latency (alongside inter-token latency, end-to-end latency, and throughput). It measures how quickly the system starts responding, which is a latency concern, not robustness, privacy, or cost.

Which statement about cost as a GenAI quality attribute is correct?

Cost-per-request is tokens (input + output) multiplied by the per-token price, and input and output are often priced separately. The section explains that inference is billed per token, that input and output tokens are usually priced separately (output typically costing more), and that cost-per-request is computed as tokens times the per-token price. Cost is therefore a genuine, testable operational constraint.

A model returns the correct answer, but simply paraphrasing the question or adding a typo flips it to an incorrect answer. Which attribute is weak, and how is it best tested?

Robustness, tested by applying meaning-preserving perturbations and checking whether the output stays stable. As the section states, robustness is the stability of output quality under meaning-preserving perturbations (typos, paraphrases). It is tested with invariance/metamorphic testing that changes the input and checks whether the output holds. The other options measure unrelated attributes.

Robustness, privacy, latency & cost — Free Study Guide 2026

Non-functional quality attributes

Robustness, privacy, latency, and cost describe how a GenAI system behaves as an operational component rather than the meaning of any single answer. They are non-functional attributes: they constrain how the system responds, not what it says. Testers treat them as first-class requirements with explicit thresholds, because a correct answer that is slow, expensive, leaks personal data, or collapses under a typo still fails in production. Each attribute is verified against a target — for example a p95 latency budget or a maximum cost-per-session — just like a performance or security requirement in classic testing.

Robustness

Robustness is the stability of output quality when the input is perturbed. Small, meaning-preserving changes — typos, extra whitespace, a paraphrase, reordered sentences, or a different but equivalent prompt template — should not swing the answer. Robustness also covers adversarial input: prompt injection, distractor text, and out-of-distribution or malformed inputs.

How to measure: apply controlled perturbations to a fixed test set and measure the change in output or metric (invariance testing). Useful signals are the rate at which a correct answer flips to incorrect under perturbation, the variance of outputs across paraphrases, and the success rate of adversarial or injection attacks. Metamorphic testing is the standard technique: define a relation such as "paraphrasing the question must not change the answer," then flag every violation of that relation. Robustness testing must also account for the model's own non-determinism — the same prompt can yield different outputs on repeat calls — so a tester distinguishes acceptable natural variation from a genuine instability triggered by the perturbation, often by sampling several runs per input.
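The flip-rate signal and the repeated-sampling idea above can be sketched as follows. This is a minimal illustration, not a reference implementation: `ask` is a hypothetical callable that wraps whatever model is under test, answers are compared by normalized exact match (an assumption; real suites often need a semantic comparison), and `fake_model` is a stand-in used only to make the example runnable.

```python
from collections import Counter
from typing import Callable, Iterable


def majority_answer(ask: Callable[[str], str], prompt: str, runs: int = 5) -> str:
    """Call the model several times and return its most common normalized answer.

    Sampling several runs helps separate ordinary non-determinism from a
    genuine instability caused by the perturbation.
    """
    answers = [ask(prompt).strip().lower() for _ in range(runs)]
    return Counter(answers).most_common(1)[0][0]


def flip_rate(
    ask: Callable[[str], str],
    cases: Iterable[tuple[str, str, str]],
    runs: int = 5,
) -> float:
    """Share of originally correct cases that become incorrect after perturbation.

    Each case is (original_prompt, perturbed_prompt, expected_answer).
    """
    eligible = 0
    flips = 0
    for original, perturbed, expected in cases:
        expected = expected.strip().lower()
        if majority_answer(ask, original, runs) != expected:
            continue  # only an answer that started out correct can flip
        eligible += 1
        if majority_answer(ask, perturbed, runs) != expected:
            flips += 1  # metamorphic relation violated
    return flips / eligible if eligible else 0.0


def fake_model(prompt: str) -> str:
    """Stand-in model that breaks on a typo; for illustration only."""
    return "Paris" if "capital of France" in prompt else "Lyon"


cases = [
    ("What is the capital of France?", "Which city is the capital of France?", "Paris"),
    ("What is the capital of France?", "What is the captial of France?", "Paris"),
]
rate = flip_rate(fake_model, cases, runs=3)
```

Here the paraphrase keeps the answer stable and the typo flips it, so one of the two eligible cases violates the relation.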

Privacy

Privacy covers protection of personal and sensitive data across the system. Two GenAI-specific concerns dominate: (1) PII leakage — the model emitting names, emails, identifiers, or health data in its output, whether echoed from the prompt or produced from memory; and (2) training-data memorization / extraction — an attacker prompting the model to regurgitate verbatim secrets or copyrighted text it saw in training. Data-protection obligations such as GDPR also govern how prompts and logs are stored and retained.

How to measure: scan inputs and outputs with PII detectors or named-entity recognition and count leaked entities; run extraction attacks that prompt for memorized data and measure verbatim reproduction; and verify that redaction, anonymization, and data-retention controls actually work. A canary approach, which inserts a unique marker into the training data and tests whether it can later be extracted, quantifies memorization directly. Membership-inference testing is a related technique that checks whether the model reveals that a specific record was part of its training set.
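As a rough sketch of the output scan and the canary check, the snippet below counts PII-like matches and tests for a planted marker. The two regular expressions are deliberately crude assumptions standing in for a real PII detector or NER model, and the canary string is a made-up example.

```python
import re

# Deliberately simple patterns; a production PII detector covers far more
# entity types and formats than these two illustrative expressions.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def count_pii(text: str) -> dict[str, int]:
    """Count matches per PII category in a model input or output."""
    return {label: len(pattern.findall(text)) for label, pattern in PII_PATTERNS.items()}


def canary_leaked(output: str, canary: str) -> bool:
    """True if the unique marker planted in training data appears verbatim."""
    return canary in output


sample_output = "Contact jane.doe@example.com or call +1 415 555 0100."
leaks = count_pii(sample_output)
```

A test would typically assert that every count is zero for outputs that should contain no personal data.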

Latency

Latency is how quickly the system responds — a direct driver of user experience. For streaming LLMs it has internal structure: time-to-first-token (TTFT), the delay before the first token appears; inter-token latency / tokens-per-second, the streaming speed once generation starts; and total (end-to-end) latency for the full response. Throughput is the system-level view: requests or tokens served per second under concurrency.

How to measure: benchmark under representative load and report percentiles (p50, p95, p99) rather than averages, because tail latency drives perceived slowness. Measure TTFT and total latency separately, and test across different prompt and response lengths and concurrency levels, since latency grows with token count and with contention.
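The following sketch shows one way to record TTFT and total latency for a streamed response and to summarize samples as percentiles. It assumes the response is exposed as any Python iterable of tokens (the real client API will differ), and it uses the nearest-rank percentile definition, which is one of several common conventions.

```python
import math
import time
from typing import Iterable


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def time_stream(stream: Iterable[str]) -> tuple[float, float, int]:
    """Return (ttft_seconds, total_seconds, token_count) for one streamed response."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in stream:
        if ttft is None:
            ttft = time.perf_counter() - start  # delay until the first token
        count += 1
    total = time.perf_counter() - start
    return (ttft if ttft is not None else total), total, count


# Made-up end-to-end latencies in seconds, with one slow outlier.
latencies = [0.8, 1.1, 0.9, 4.2, 1.0, 1.2, 0.7, 1.3, 0.95, 1.05]
p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
mean = sum(latencies) / len(latencies)
```

With these sample values the median is 1.0 s and the mean about 1.32 s, while p95 is 4.2 s: the single slow request is invisible in the average but dominates the tail.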

Cost

Cost is a genuine test and operational constraint for GenAI because inference is billed per token. Most APIs price input and output tokens separately (output usually costs more), so long prompts, large retrieved contexts, and verbose answers all raise the bill. Cost-per-request and cost-per-user-session therefore become non-functional requirements sitting alongside latency.

How to measure: track input and output tokens per request, multiply each count by its own per-token price, and sum the two to get cost-per-request; aggregate to cost-per-session and to a projected monthly cost at expected volume. Then compare configurations (smaller model, shorter context, prompt caching, tighter output limits) against quality to find the quality-per-dollar trade-off.
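The arithmetic can be sketched as below. All numbers are placeholders chosen for illustration: the prices, the eight requests per session, and the monthly session volume are assumptions, not figures from any real provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices in currency units per one million tokens."""
    input_per_million: float
    output_per_million: float


def cost_per_request(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Price input and output tokens separately, then sum."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


def projected_monthly_cost(cost_per_session: float, sessions_per_month: int) -> float:
    """Scale a per-session cost to an expected monthly volume."""
    return cost_per_session * sessions_per_month


pricing = Pricing(input_per_million=1.00, output_per_million=3.00)  # placeholder prices
request_cost = cost_per_request(1_200, 300, pricing)  # 0.0012 + 0.0009
session_cost = request_cost * 8  # assumes 8 similar requests per session
monthly_cost = projected_monthly_cost(session_cost, 50_000)
```

Rerunning the same calculation with a different `Pricing` or shorter prompts gives the cost side of a quality-per-dollar comparison.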

Non-functional attribute table

Attribute	Definition	Key metrics / how to measure
Robustness	Stable quality under input perturbation / adversarial input	Flip rate under perturbation, output variance across paraphrases, injection-attack success; metamorphic/invariance tests
Privacy	No PII leakage or training-data extraction; data protected	PII-detector counts, extraction/canary attacks, redaction & retention checks
Latency	Speed of response	TTFT, inter-token / tokens-per-sec, end-to-end latency, throughput; report p50/p95/p99 under load
Cost	Per-token inference expense as a constraint	(Input tokens × input price) + (output tokens × output price) = cost/request; cost/session; quality-per-dollar comparison

Trade-offs and SLAs

These four attributes rarely move independently. A larger model may improve factuality but worsen latency and cost; aggressive output truncation cuts cost but can hurt coherence; heavy input sanitization improves privacy but adds latency. Testers therefore express them as an SLA-style budget — for example "p95 latency < 3 s, cost-per-session < a set ceiling, zero PII in output" — and check that a change which improves one attribute does not silently break another.
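An SLA-style budget of this kind can be expressed as a small check that runs after every change. The sketch below uses the 3 s p95 threshold and the zero-PII rule from the example above; the cost ceiling of 0.05 and the metric names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    p95_latency_s: float = 3.0          # p95 latency must stay below this
    max_cost_per_session: float = 0.05  # hypothetical ceiling
    max_pii_entities: int = 0           # zero PII allowed in output


def violations(measured: dict[str, float], budget: Budget) -> list[str]:
    """Return the names of all metrics that break the budget."""
    broken = []
    if measured["p95_latency_s"] >= budget.p95_latency_s:
        broken.append("p95_latency_s")
    if measured["cost_per_session"] >= budget.max_cost_per_session:
        broken.append("cost_per_session")
    if measured["pii_entities"] > budget.max_pii_entities:
        broken.append("pii_entities")
    return broken


# A change that made responses faster but more expensive.
after_change = {"p95_latency_s": 2.4, "cost_per_session": 0.07, "pii_entities": 0}
broken = violations(after_change, Budget())
```

Checking all attributes together is what catches the case where an improvement in one silently breaks another, as with the cost regression in this example.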

Exam tips: report latency as percentiles under load, never as a single average; remember that input and output tokens are priced separately; and note that robustness is about stability, so it is tested by changing the input and checking whether the output holds.

ISTQB Certified Tester — Testing with Generative AI

ISTQB Generative AI Testing Specialist (CT-GenAI)

2.3 Robustness, privacy, latency & cost


