3.5 Practitioner MLOps: Repeatability, Monitoring, and Retraining
Key Takeaways
- MLOps is the discipline of making ML work repeatable, governed, monitored, and improvable after launch.
- Practitioners should understand artifacts, versions, approvals, monitoring, rollback, and retraining triggers without needing to build pipelines.
- Repeatability protects the business: teams must be able to show which data, code, parameters, model version, and approval produced the model running in production.
- Retraining should be triggered by evidence such as drift, new labels, changed business rules, incidents, or scheduled refresh needs.
MLOps without overbuilding
MLOps combines ML development practices with operations, governance, automation, monitoring, and continuous improvement. For the AWS Certified AI Practitioner, this is a recognition topic. You should know why MLOps matters and what good controls look like, but you are not expected to build SageMaker Pipelines, implement CI/CD infrastructure, or tune training jobs in detail.
A model is not just a file. A production ML system includes data sources, preprocessing steps, feature definitions, training code or service configuration, model artifacts, evaluation results, approvals, deployment configuration, monitoring rules, logs, user feedback, and retraining decisions. If the team cannot explain which version is live and why it was approved, the organization has an operational risk.
Repeatability means the team can trace how a model was created and reproduce the process when needed. This does not always mean every training run gives bit-for-bit identical results, because some algorithms and distributed systems have randomness. It means the organization tracks enough information to understand the lineage: dataset version, code version, configuration, parameters, training environment, evaluation report, approver, deployment time, and rollback target.
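A minimal sketch of what such a lineage record might capture, written as a plain Python structure; the field names and values are illustrative assumptions, not a SageMaker or registry schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class ModelLineageRecord:
    """Illustrative lineage record; fields are assumptions, not an AWS schema."""
    model_version: str          # e.g. a model package version in a registry
    dataset_version: str        # e.g. an S3 prefix, object version, or snapshot date
    code_commit: str            # source control revision that produced the training code
    training_config: dict       # hyperparameters and environment details
    evaluation_report_uri: str  # where the evaluation results are stored
    approver: str               # who accepted the model for production
    approved_at: datetime
    rollback_target: str        # previous model version to fall back to

record = ModelLineageRecord(
    model_version="fraud-scoring-v7",
    dataset_version="s3://example-bucket/training/2024-05-01/",
    code_commit="9f3c2ab",
    training_config={"max_depth": 6, "eta": 0.2},
    evaluation_report_uri="s3://example-bucket/reports/v7/eval.json",
    approver="risk-review-board",
    approved_at=datetime(2024, 5, 10, tzinfo=timezone.utc),
    rollback_target="fraud-scoring-v6",
)
```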
| MLOps concern | Practitioner question | AWS context to recognize |
|---|---|---|
| Versioning | Which model, data, and code version is live? | SageMaker Model Registry, S3 versioning, source control |
| Approval | Who accepted the model for production? | Model registry approval states, change management |
| Deployment | How is the model served or consumed? | SageMaker endpoints, batch transform, managed AI API, application integration |
| Monitoring | What alerts if behavior changes? | CloudWatch, SageMaker Model Monitor, application logs |
| Governance | Is access least privilege and auditable? | IAM, CloudTrail, KMS, tagging, data governance |
| Retraining | What evidence triggers refresh? | Drift, new labels, scheduled review, incidents |
| Rollback | What happens if the model fails? | Previous model version, rules fallback, manual process |
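The Versioning and Approval rows map to concrete calls a team can make. A minimal sketch, assuming a hypothetical model package group named "fraud-scoring" already exists in the SageMaker Model Registry, of asking which versions carry an Approved status:

```python
import boto3

# Ask the SageMaker Model Registry which versions in a group are approved.
# "fraud-scoring" is a placeholder model package group name.
sm = boto3.client("sagemaker")

response = sm.list_model_packages(
    ModelPackageGroupName="fraud-scoring",
    ModelApprovalStatus="Approved",
    SortBy="CreationTime",
    SortOrder="Descending",
)

for pkg in response["ModelPackageSummaryList"]:
    print(pkg["ModelPackageArn"], pkg["ModelApprovalStatus"], pkg["CreationTime"])
```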
Monitoring as an operating agreement
Monitoring should be defined before launch, not after complaints arrive. Technical teams may monitor latency, error rates, endpoint CPU or GPU utilization, failed invocations, and request volume. Data teams may monitor missing values, schema changes, feature distributions, and prediction drift. Business teams may monitor conversion, fraud loss, review volume, customer satisfaction, or false-positive complaints.
The practitioner should insist on owners for each metric. An alarm that no one reviews is theater. A dashboard with no threshold is weak. A model that collects feedback but never uses it is not improving. The operating agreement should say what is measured, where it is measured, who responds, how quickly they respond, and what action they can take.
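As a sketch of what a threshold with an owner can look like in practice, the following uses CloudWatch to alarm on endpoint invocation errors and notify an SNS topic; the endpoint name, thresholds, and topic ARN are placeholders, and the real values belong in the operating agreement.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm on sustained invocation errors for a placeholder endpoint;
# the SNS topic stands in for the named owner who responds.
cloudwatch.put_metric_alarm(
    AlarmName="fraud-endpoint-5xx-errors",
    Namespace="AWS/SageMaker",
    MetricName="Invocation5XXErrors",
    Dimensions=[
        {"Name": "EndpointName", "Value": "fraud-scoring-endpoint"},
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    Statistic="Sum",
    Period=300,                    # five-minute windows
    EvaluationPeriods=3,           # three consecutive windows in breach
    Threshold=10.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ml-oncall"],
)
```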
Monitoring also includes responsible AI concerns. If outputs affect people differently across segments, the team needs a way to measure and address that. If a generative AI application produces unsafe or unsupported responses, the team needs logging, user feedback, guardrails, and escalation. If sensitive data appears in prompts or logs, security and privacy teams need retention and access controls.
Retraining triggers
Retraining is the process of updating a model with new or improved data. It should not happen randomly or only because a calendar says so. A calendar may be useful for review, but retraining should be tied to evidence and business need. Common triggers include performance degradation, detected drift, new labeled data, changed product catalogs, changed fraud patterns, policy changes, new regions, new customer segments, or a major incident.
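Detected drift is often quantified with a simple distribution comparison. The sketch below uses the population stability index, a common drift measure; the samples, bin count, and 0.25 cutoff are illustrative assumptions rather than recommended settings.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Compare two samples of one feature; a higher PSI suggests more drift."""
    cuts = np.quantile(expected, np.linspace(0, 1, bins + 1))
    expected_frac = np.histogram(expected, bins=cuts)[0] / len(expected)
    actual_frac = np.histogram(np.clip(actual, cuts[0], cuts[-1]), bins=cuts)[0] / len(actual)
    expected_frac = np.clip(expected_frac, 1e-6, None)  # avoid log(0) and division by zero
    actual_frac = np.clip(actual_frac, 1e-6, None)
    return float(np.sum((actual_frac - expected_frac) * np.log(actual_frac / expected_frac)))

# Stand-in data: training-time feature values versus recent production traffic.
rng = np.random.default_rng(0)
training_sample = rng.normal(0.0, 1.0, 5000)
production_sample = rng.normal(0.4, 1.2, 5000)

psi = population_stability_index(training_sample, production_sample)
if psi > 0.25:  # illustrative cutoff; the real one belongs in the operating agreement
    print("Large shift detected: investigate before considering retraining.")
```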
Retraining can also create risk. A new model can perform worse than the old one, break segment fairness, change user behavior, increase cost, or fail compliance review. The team should compare the candidate model with the current production model, review business metrics, approve the change, deploy in a controlled way, and keep a rollback option. For high-risk workflows, human review and phased rollout are often appropriate.
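A minimal sketch of such a comparison gate, with hypothetical metric names, segments, and tolerances; the real gate should reflect the metrics and segments named in the operating agreement.

```python
def candidate_passes_gate(candidate_metrics, production_metrics,
                          min_gain=0.01, max_segment_drop=0.02):
    """Illustrative gate: the candidate must improve overall quality without
    degrading any tracked segment beyond an agreed tolerance."""
    overall_gain = candidate_metrics["auc"] - production_metrics["auc"]
    if overall_gain < min_gain:
        return False
    for segment, prod_auc in production_metrics["segment_auc"].items():
        cand_auc = candidate_metrics["segment_auc"].get(segment, 0.0)
        if prod_auc - cand_auc > max_segment_drop:
            return False
    return True

production = {"auc": 0.91, "segment_auc": {"new_customers": 0.88, "returning": 0.93}}
candidate = {"auc": 0.93, "segment_auc": {"new_customers": 0.90, "returning": 0.92}}

if candidate_passes_gate(candidate, production):
    print("Candidate can go to human approval; keep the current version as rollback.")
else:
    print("Keep the production model; document why the candidate was rejected.")
```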
Retraining decision workflow
1. Detect drift, performance change, new labels, or business rule change.
2. Confirm the issue is not a data pipeline or application bug.
3. Prepare an updated dataset under approved governance controls.
4. Train or update the candidate model.
5. Evaluate against the current production model and segment requirements.
6. Approve, deploy, monitor, and keep rollback available.
Governance and security in the MLOps loop
MLOps is not only a data science practice. It intersects with security, compliance, and cost management. IAM should restrict who can access training data, invoke endpoints, approve models, update pipelines, and read logs. AWS KMS can protect data and artifacts with encryption keys. CloudTrail can help audit API activity. CloudWatch can collect logs and metrics. Tags can support ownership and cost reporting. Data governance tools such as Lake Formation can help control access to shared datasets.
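As one example of least privilege, a policy can allow only invocation of a single endpoint. A minimal boto3 sketch, where the account ID, region, endpoint name, and policy name are placeholders:

```python
import json
import boto3

iam = boto3.client("iam")

# Allow invocation of one named endpoint and nothing else.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sagemaker:InvokeEndpoint",
            "Resource": "arn:aws:sagemaker:us-east-1:123456789012:endpoint/fraud-scoring-endpoint",
        }
    ],
}

iam.create_policy(
    PolicyName="invoke-fraud-scoring-endpoint-only",
    PolicyDocument=json.dumps(policy_document),
)
```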
Cost matters because ML workloads can scale quickly. Training jobs can use expensive compute. Real-time endpoints can run continuously. Foundation model calls can grow with token volume. Batch jobs can process more records than expected. A practitioner should ask for budget alarms, usage forecasts, and right-sized deployment patterns.
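A budget alarm can be expressed through AWS Budgets. A minimal sketch, assuming placeholder values for the account ID, spend limit, and notification address:

```python
import boto3

budgets = boto3.client("budgets")

# Alert when forecasted monthly ML spend passes 80% of a fixed limit.
budgets.create_budget(
    AccountId="123456789012",
    Budget={
        "BudgetName": "ml-workloads-monthly",
        "BudgetLimit": {"Amount": "5000", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "FORECASTED",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "ml-owners@example.com"}
            ],
        }
    ],
)
```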
What good practitioner MLOps looks like
A healthy MLOps plan is understandable to business and technical stakeholders. It names the production model version, the data lineage, the approval path, the monitoring metrics, the responsible owner, the incident response path, and the retraining criteria. It also explains when the model should not be used.
A weak plan says the model is done after launch. It has no baseline, no owner, no drift monitoring, no feedback loop, no rollback plan, and no cost controls. In that situation, the practitioner should not approve expansion. The right next step is to define operating controls, narrow the use case, or keep the model in pilot until the team can run it responsibly.
Review questions
1. Which statement best describes practitioner-level MLOps?
2. A model has been in production for six months and input distributions have shifted. What is the best next step?
3. Which artifact is most useful for proving which model version was approved for production?