Continuous evaluation, always signed.
Assurance does not end at deployment. Watch monitors production AI for drift, degradation, and policy breach — and turns every signal into signed, append-only evidence.
Every AI deployment degrades over time. What matters is whether the degradation becomes evidence, or whether it becomes a surprise during an examiner visit.
What Watch does
Watch reads the signed decision stream from the Attest plane and continuously monitors three things: distribution drift in the inputs and outputs, scheduled evaluations against customer-defined test batteries, and policy-enforcement deviations, where the system behaved differently from its declared governance posture. Every detection becomes a signed, immutable record — not a log line that could be rewritten.
This matters because regulators increasingly ask the same question after an AI-related incident: when did you first know? Watch produces a cryptographic answer. The first drift record, signed and logged at the moment it was detected, becomes the timestamped evidence of when the operator had notice.
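The append-only property behind that answer can be illustrated with a hash-chained record stream. This is a deliberately simplified stand-in for the transparency log (which in practice is a Merkle-tree structure, per the inclusion proofs described below); all class and field names here are illustrative, not Watch's actual API:

```python
import hashlib
import json
import time

class EvidenceLog:
    """Append-only, hash-chained record stream. Each entry commits to the
    previous head, so rewriting any earlier record breaks verification.
    A simplified illustration, not the production log structure."""

    def __init__(self):
        self.entries = []
        self.head = "0" * 64  # genesis head

    def append(self, record):
        entry = {"prev": self.head, "ts": time.time(), "record": record}
        # The new head commits to this entry and, transitively, to all history.
        self.head = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()).hexdigest()
        self.entries.append(entry)
        return self.head

    def verify(self):
        """Recompute the chain from genesis; any tampering surfaces here."""
        head = "0" * 64
        for entry in self.entries:
            if entry["prev"] != head:
                return False
            head = hashlib.sha256(
                json.dumps(entry, sort_keys=True).encode()).hexdigest()
        return head == self.head
```

The first drift record appended to such a chain carries a timestamp that later entries commit to, which is what makes "when did you first know" answerable after the fact.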
The three signal sources
Drift detection
Watch runs statistical drift tests against your live decision stream. Population Stability Index for categorical features. Kolmogorov-Smirnov for numerical features. Configurable thresholds per feature, per system, per jurisdiction. A breach produces a signed drift record that references the specific decisions that triggered it — not a bare hash of the decisions, but the actual inclusion proofs in the transparency log, so the drift finding is forensically tied to the underlying events.
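The PSI test named above can be sketched in a few lines. This is the standard PSI formula, not Watch's implementation; the bin floor and the 0.25 rule of thumb are common conventions, and the real thresholds are configured per feature:

```python
import math

def psi(expected_counts, actual_counts):
    """Population Stability Index between a baseline (expected) and a live
    (actual) categorical distribution, given per-bin counts."""
    total_e = sum(expected_counts)
    total_a = sum(actual_counts)
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Floor empty bins so the log term stays defined.
        pe = max(e / total_e, 1e-6)
        pa = max(a / total_a, 1e-6)
        score += (pa - pe) * math.log(pa / pe)
    return score

# Identical distributions score ~0; a common rule of thumb treats
# PSI > 0.25 as significant drift worth a signed record.
baseline = [500, 300, 200]
live = [350, 300, 350]
drifted = psi(baseline, live) > 0.25
```

The Kolmogorov-Smirnov test for numerical features is analogous: compare the two empirical CDFs and flag when the maximum gap exceeds the configured threshold.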
Continuous evaluation
Customer-defined test batteries run on a schedule you set: hourly, daily, weekly, or on events. Each eval result is a signed record. A battery might include fair-lending tests, adverse-impact checks against protected groups, red-team prompts for jailbreak detection, or domain-specific correctness tests written by your model risk team. Evaluation failures become eval-failure records signed with the same key as decision records — the evidence chain is unbroken.
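A battery run can be sketched as a loop that executes each test and signs the result. Everything here is hypothetical: the field names, the HMAC stand-in for the production signature scheme, and the demo key (the real records are signed with the same key as decision records):

```python
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"demo-key"  # illustrative only; not how production keys work

def run_battery(name, tests, inputs):
    """Run a hypothetical eval battery and emit one signed record per test.
    `tests` maps test names to predicate functions over `inputs`."""
    records = []
    for test_name, predicate in tests.items():
        passed = predicate(inputs)
        payload = json.dumps({
            "battery": name,
            "test": test_name,
            "passed": passed,
            "ts": time.time(),
        }, sort_keys=True).encode()
        # Sign the canonicalized payload so the result cannot be reworded later.
        sig = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
        records.append({"payload": payload, "sig": sig})
    return records

# Example: a one-test fair-lending battery over a window of approve/deny outcomes.
recs = run_battery(
    "fair-lending",
    {"approval_rate_floor": lambda xs: sum(xs) / len(xs) > 0.4},
    [1, 0, 1, 1, 0],
)
```

A failing predicate produces a record with `"passed": false`, signed exactly like a passing one: the evidence chain does not depend on the outcome.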
Incident records
When a production AI incident occurs — policy breach, customer complaint, regulatory inquiry, internal escalation — Watch generates an incident record that links every affected decision, every drift signal, every eval failure in the time window. The resulting incident record is signed before the postmortem is even written. The postmortem itself is signed when complete. Neither can be retroactively altered.
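The linking step can be sketched as a window query over the evidence stream, with the linked set bound under one digest so it cannot be trimmed afterward. Field names are hypothetical; the real record references transparency-log inclusion proofs rather than bare IDs:

```python
import hashlib
import json

def incident_record(window_start, window_end, events):
    """Assemble a hypothetical incident record: every decision, drift
    signal, and eval failure in the window, bound under one digest."""
    linked = [e for e in events if window_start <= e["ts"] <= window_end]
    body = {
        "window": [window_start, window_end],
        # Sorted so the digest is independent of event arrival order.
        "linked": sorted(e["id"] for e in linked),
    }
    digest = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {"body": body, "digest": digest}
```

Because the digest covers the full linked set, quietly dropping an embarrassing eval failure from the incident after the fact would change the digest and break the signature over it.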
What it catches in practice
- Covariate drift — your customer population shifts, but your underwriting model does not. Watch flags the feature-level drift before the adverse-impact numbers do.
- Label drift — ground truth is moving and your model is not retraining fast enough. The eval battery catches degradation in specific segments first.
- Policy erosion — a policy update was rolled back for one region, and a specific system is now enforcing an older rule set than your governance tier claims. Watch flags the enforcement/declaration mismatch.
- Silent model swap — a model version was changed upstream without the governance process. Watch detects the model-hash change and requires an acknowledgment before signing resumes.
- Prompt injection at scale — a pattern of inputs is successfully extracting information outside the declared policy. Eval batteries and drift detection together catch it within minutes.
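The silent-model-swap check from the list above reduces to comparing the hash of the served model against the declared one and pausing signing until the change is acknowledged. A minimal sketch, with illustrative names (the real acknowledgment flow goes through the governance process, not a method call):

```python
import hashlib

class ModelGate:
    """Pause signing when the served model's hash no longer matches the
    declared one, until an operator acknowledges the change."""

    def __init__(self, declared_hash):
        self.declared_hash = declared_hash
        self.paused = False

    def observe(self, model_bytes):
        """Check the served model; returns True if signing may proceed."""
        seen = hashlib.sha256(model_bytes).hexdigest()
        if seen != self.declared_hash:
            self.paused = True  # undeclared swap: halt signing
        return not self.paused

    def acknowledge(self, new_hash):
        """Operator accepts the new model version; signing resumes."""
        self.declared_hash = new_hash
        self.paused = False
```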
Alerting and routing
Detection is only half the job; routing is the other half. Watch integrates with PagerDuty, Opsgenie, Slack, and Microsoft Teams for primary alerting. Beyond routing, every breach triggers a signed notification record — so if an alert was sent and no one responded, that fact is itself evidenced. The lack of acknowledgment is as discoverable as the alert itself.
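The "lack of acknowledgment is itself evidence" idea can be sketched as a periodic sweep that signs a record for every alert past its ack SLA. All names, the SLA parameter, and the HMAC stand-in for the production signature scheme are illustrative:

```python
import hashlib
import hmac
import json

KEY = b"demo-key"  # illustrative; the real records share the decision-record key

def sign(payload: dict) -> dict:
    body = json.dumps(payload, sort_keys=True).encode()
    return {"payload": body,
            "sig": hmac.new(KEY, body, hashlib.sha256).hexdigest()}

def sweep_unacknowledged(sent, acks, now, sla_seconds):
    """For every alert past its ack SLA with no response, emit a signed
    record of the non-response. `sent` maps alert IDs to send times,
    `acks` maps alert IDs to ack times."""
    records = []
    for alert_id, sent_at in sent.items():
        if alert_id not in acks and now - sent_at > sla_seconds:
            records.append(sign({
                "alert": alert_id,
                "sent_at": sent_at,
                "checked_at": now,
                "status": "unacknowledged",
            }))
    return records
```

The sweep itself runs on a schedule, so the non-response is evidenced at a known time rather than reconstructed from memory during discovery.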
Regulatory fit
Watch produces evidence that maps directly to specific obligations across the major frameworks:
- EU AI Act Article 72 (post-market monitoring) — the signed drift stream and incident records are exactly the artifact the article requires.
- NIST AI RMF MEASURE-2.7 and MANAGE-4 — continuous monitoring and incident response are mapped directly to Watch outputs.
- SR 11-7 ongoing monitoring — bank examiners expect evidence that the model is being watched, not just that a policy says it should be. Watch produces the evidence.
- ISO 42001 performance evaluation clauses — Watch outputs slot directly into AIMS evidence requirements.