The First 90 Days: Why AI Surveillance Deployments Fail After Go-Live, and What to Do About It | The Vigilant

The first 90 days after go-live are the critical failure window for AI surveillance. 74% of companies struggle to achieve AI value post-deployment — not because of algorithms, but because of operations, governance, and feedback loops that were never designed.

Last week we covered the pilot trap: the structural conditions that make AI surveillance proof-of-concepts succeed on paper while predicting almost nothing about production performance. This week we move forward in the timeline — into the deployment gap itself — to the phase that determines whether a deployment becomes infrastructure or shelfware.

The first 90 days after go-live are the critical failure window for AI surveillance in enterprise security environments. Not because the technology is inherently unstable — but because this is the period in which model assumptions meet a real environment, alert volumes meet real operators, and integration requirements meet real infrastructure, all at the same time, often without the vendor support intensity that existed during the pilot.

The research is clear on the aggregate outcome: BCG's October 2024 analysis found that 74 percent of companies struggle to achieve and scale AI value after deployment, and the problems that drive that figure are not algorithmic. They are operational, organisational, and governance failures that compound fastest in the first three months: before processes have stabilised, before operators have built trust in the system, and before anyone has had time to establish who actually owns the AI layer in production.

This edition covers why that window is so unstable, what the evidence says about the relationship between early post-deployment management and long-term programme performance, and what high-performing organisations do differently from week one.

The First 90 Days: Why AI Surveillance Deployments Are Won or Lost Before the Fourth Month

There is a consistent pattern in enterprise AI deployments that the physical security sector is now experiencing at scale. A system performs well during evaluation. It goes live. And then, over the following weeks and months, something happens that the pilot never predicted — or rather, many things happen simultaneously, and no one is structured to respond to all of them at once.

By month four, the outcome is usually determined. Not by a catastrophic failure, but by a slow accumulation of unresolved friction: operators who have learned to discount alerts, rules that were never tuned to the real environment, integrations that were partially configured and never finished, and a vendor relationship that has transitioned from an engaged A-team to a standard ticketing queue.

Understanding why requires looking at what actually happens in those first ninety days.

Days 0–30: When the environment fights back

The first month of a live AI surveillance deployment is dominated by a single dynamic: the model meets the real world, and the real world is messier than the pilot assumed.

Most AI video analytics models are trained on data that overrepresents clean conditions — well-lit scenes, representative incident types, cameras selected for their quality rather than their typicality. The pilot ran on a subset of the estate chosen for similar reasons. Production runs on everything: the backlit cameras, the low-bitrate streams from legacy hardware, the loading dock that floods with delivery vehicles at shift change, the atrium that turns into a mirror when the afternoon sun hits the glass wall.

In the first weeks, alert volume typically does not fall — it rises. Systems shipping with generic sensitivity and zone parameters generate noise levels that experienced operators do not expect and are not prepared to manage. Practitioners consistently describe the first month as a period of "recalibration shock": the gap between pilot performance and live performance is visible immediately, and operators form strong early impressions of the system's reliability that are difficult to revise later.

The alert fatigue research is specific about what happens next. SOC data from 2024 puts daily alert volumes at around 3,800 per team, with 62 percent routinely ignored. In one consolidation of survey data across security operations environments, 83 percent of alerts were classified as false positives, and 55 percent of analysts reported missing critical alerts regularly because of volume. These figures describe mature environments. In the first weeks of a new AI deployment, before any tuning has occurred, the dynamics are often worse.

When operators encounter high false positive rates from a new system and do not see rapid improvement, they adapt. They learn to assign lower priority to alerts from that source. They develop informal rules about which alert types to act on and which to disregard. Those adaptations become habitual. The system is still running, still generating alerts — but the organisational response to those alerts has quietly degraded. By the time the pattern shows up in any formal review, it is usually entrenched.

The integration failures are less visible but equally consequential. AI video analytics is rarely a standalone layer. It needs to route events into the VMS, pull context from access control, connect to incident management systems, and feed into whatever reporting infrastructure the security team uses. Gaps in those integrations — which exist in almost every deployment — mean that every ambiguous alert requires manual correlation. Operators are spending time on plumbing that should be invisible, which compounds the fatigue and delays any trust-building with the AI layer itself.

The compounding problem

What makes the first 90 days particularly dangerous is not that any one of these dynamics is fatal on its own. It is that they compound, and they compound faster than most teams anticipate.

Alert fatigue reduces operator engagement. Reduced engagement means less feedback reaching the people responsible for tuning the system. Less feedback means the model continues to fire on the same conditions that generated the false positives in the first place. The model does not improve. Operators conclude the system is not improving. Trust erodes further. The cycle accelerates.

At the same time, governance typically lags deployment. Most organisations deploy first and design oversight structures later. In the absence of defined ownership — a named person accountable for the AI layer's performance, with authority to tune, escalate, and make decisions — problems accumulate without anyone being structurally required to respond. By the time underperformance becomes visible in formal reporting, the behaviours that caused it are already normalised.

NIST's 2026 analysis of challenges in monitoring deployed AI systems — one of the few authoritative documents to address post-deployment AI operations in detail — identifies exactly this pattern. Monitoring obligations are underspecified, roles and responsibilities for post-deployment performance are vague, and the organisational incentives that drove the deployment decision do not extend to the sustained, unglamorous work of keeping the system calibrated and compliant once it is live.

What high-performing organisations do differently

The organisations that successfully navigate the first 90 days share a set of practices that are visible before go-live. They are not primarily about model configuration. They are about operating model design.

The most consistent differentiator is scope discipline. High-performing deployments do not go live across the full camera estate on day one. They start with a defined subset — typically 20 to 30 priority cameras covering three to five high-impact zones — and run in shadow mode for two to three weeks before activating live dispatch. Shadow mode serves two purposes: it baselines the false positive rate against the real environment, and it gives operators time to develop familiarity with alert types before they are required to act on them. Systems that go live broad and fast consistently generate the alert volume and trust dynamics described above. Systems that start narrow and prove performance before expanding avoid them.
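As a rough illustration of the baselining step, the sketch below computes a per-camera false positive rate from a shadow-mode review log and lists the cameras that clear a threshold. The record fields, the camera names, and the 30 percent threshold are all assumptions made for the example; nothing here reflects a particular vendor's export format or a recommended cut-off.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ShadowAlert:
    """One alert raised in shadow mode (hypothetical record shape)."""
    camera_id: str
    alert_type: str
    confirmed: bool  # operator verdict: True if the alert was a genuine event


def false_positive_rates(alerts):
    """Return {camera_id: share of reviewed alerts judged false positives}."""
    totals = defaultdict(int)
    false_positives = defaultdict(int)
    for alert in alerts:
        totals[alert.camera_id] += 1
        if not alert.confirmed:
            false_positives[alert.camera_id] += 1
    return {cam: false_positives[cam] / totals[cam] for cam in totals}


def ready_for_live_dispatch(rates, max_fp_rate=0.30):
    """Cameras at or below the threshold. The 0.30 default is an assumption."""
    return sorted(cam for cam, rate in rates.items() if rate <= max_fp_rate)


# Illustrative data only: two cameras, four reviewed alerts each.
shadow_log = [
    ShadowAlert("dock-01", "loitering", False),
    ShadowAlert("dock-01", "loitering", False),
    ShadowAlert("dock-01", "intrusion", True),
    ShadowAlert("dock-01", "loitering", False),
    ShadowAlert("lobby-02", "intrusion", True),
    ShadowAlert("lobby-02", "intrusion", True),
    ShadowAlert("lobby-02", "tailgating", True),
    ShadowAlert("lobby-02", "loitering", False),
]

rates = false_positive_rates(shadow_log)
live_cameras = ready_for_live_dispatch(rates)
print(rates)
print(live_cameras)
```

In this made-up log the loading dock camera stays in shadow mode while the lobby camera qualifies for live dispatch. The point of the exercise is that the decision rests on a measured baseline rather than on pilot results.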

The second differentiator is a structured feedback loop. In deployments that stabilise, every false positive is captured, categorised, and converted into a parameter adjustment or rule change — not absorbed as informal operator grumbling. Missed incidents, once identified through post-incident audit, are fed back into the training pipeline. The feedback loop is explicit, owned, and has a defined cadence: typically weekly in the first month, tapering to bi-weekly and then monthly as performance stabilises. Where this loop does not exist, the model and the environment diverge silently. Where it does, each early failure becomes a data point that improves subsequent performance.
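One way to picture the loop is as two small pieces: a review cadence that tapers over time, and a tally that turns categorised false positives into a ranked tuning backlog. The sketch below assumes the cadence switches at day 30 and day 90, and that a cause needs at least two occurrences before it earns a backlog entry. Both cut-offs, and the category labels, are hypothetical.

```python
from collections import Counter


def review_interval_days(days_since_go_live):
    """Days between feedback reviews. The day-30 and day-90 cut-offs are assumptions."""
    if days_since_go_live <= 30:
        return 7    # weekly in the first month
    if days_since_go_live <= 90:
        return 14   # bi-weekly while performance stabilises
    return 30       # roughly monthly afterwards


def tuning_backlog(reviews, min_count=2):
    """Rank (alert_type, cause) pairs by how often operators flagged them.

    reviews: iterable of (alert_type, cause) tuples, one per confirmed
    false positive. Pairs seen fewer than min_count times are left out.
    """
    counts = Counter(reviews)
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]


# Illustrative review log; the categories are invented for the example.
reviews = [
    ("loitering", "shift-change crowd"),
    ("intrusion", "glare on glass wall"),
    ("loitering", "shift-change crowd"),
    ("loitering", "shift-change crowd"),
    ("intrusion", "glare on glass wall"),
    ("tailgating", "low-bitrate stream"),
]

backlog = tuning_backlog(reviews)
for (alert_type, cause), count in backlog:
    print(f"{alert_type}: {cause} ({count} reports)")
```

The tally is deliberately crude. What matters is that every flagged alert lands in a structure someone owns and reviews on a known schedule, so that recurring causes surface as tuning work instead of disappearing into the shift log.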

Third, high-performing organisations define operator roles as decision-makers, not alert-processors. The research on SOC adoption is direct: analysts who understand why a system fires, know how to calibrate it, and see their feedback reflected in system behaviour develop operational trust. Analysts who are asked to click through a queue without context develop workarounds. In the first 90 days, the quality of operator training — not the technical training on how to use the interface, but the conceptual training on what the model is doing and why — determines whether the feedback loop generates useful signal or noise.

Fourth, escalation thresholds and autonomy boundaries are defined before go-live, not after the first incident. High-performing deployments specify, in writing and in the runbook, which alert types trigger immediate dispatch, which require human verification, and which are logged without action. In European contexts, where EU AI Act high-risk obligations require human oversight measures with genuine override capability, this is also a compliance requirement — but the organisations that handle it well treat it as an operational requirement first. The boundary between AI-assisted prioritisation and human-approved action is set conservatively in the first 90 days and relaxed only as performance evidence accumulates.
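Written down, those boundaries can be as simple as a lookup table. The following sketch maps alert types to one of the three handling modes described above and falls back to human verification for anything the runbook does not name, which is one conservative default among several possible ones. The alert type names and the mapping itself are hypothetical.

```python
from enum import Enum


class Action(Enum):
    DISPATCH = "immediate dispatch"
    VERIFY = "human verification"
    LOG_ONLY = "log without action"


# Hypothetical runbook entries; a real runbook would list the alert
# types the deployed system actually produces.
RUNBOOK = {
    "perimeter_breach": Action.DISPATCH,
    "loitering": Action.VERIFY,
    "vehicle_in_loading_bay": Action.LOG_ONLY,
}


def route(alert_type, runbook=RUNBOOK):
    """Look up the handling mode for an alert type.

    Unlisted types fall back to human verification (an assumed default),
    so nothing is dispatched or silently logged without an explicit rule.
    """
    return runbook.get(alert_type, Action.VERIFY)


for alert_type in ("perimeter_breach", "loitering", "unattended_bag"):
    print(alert_type, "->", route(alert_type).value)
```

Relaxing a boundary then becomes a visible change to the table, which also leaves a record of when and why a given alert type moved from verification to automatic dispatch.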

Finally, vendor engagement is structured, not assumed. High performers maintain a weekly joint review cadence with their vendor or integrator in the first month — a shared session covering KPI delta, tuning backlog, and integration status — with explicit ownership of action items on both sides. This is distinct from standard support ticketing, which is reactive, episodic, and typically involves different personnel than the implementation team. The transition from high-touch engagement to standard support is one of the most consequential moments in any deployment; high-performing organisations negotiate the timing and terms of that transition explicitly, rather than discovering it has happened when response times lengthen.

What failing deployments look like by day 90

The profile of a deteriorating deployment at the end of the first quarter is consistent across contexts. The system is technically live. Alerts are being generated. But the operational reality has diverged significantly from the business case.

Operators have developed informal triage rules that filter out alert types they have learned to associate with noise. The false positive rate has not improved materially since go-live because no structured tuning process exists. The integration gaps that were present on day one are still present, because no one owns the remediation backlog. The vendor's A-team has moved on. The internal champion who drove the procurement is occupied with other priorities. The named owner of the AI layer's performance — if one was ever defined — has not been given the authority or the tools to intervene effectively.

The system has not failed. It is present and running. But its actual influence on security outcomes is minimal, and the gap between what was promised during the pilot and what is being delivered in production is wide enough to have become a source of internal friction rather than organisational confidence.

This is what "live but not delivering" looks like. It is not a catastrophic outcome. It is a gradual one, and it is the most common outcome in enterprise AI surveillance — precisely because it is invisible until the contract renewal conversation forces someone to account for what the system has actually done.

Industry Signal

The Data Behind the Window

The quantitative picture for post-deployment AI performance in enterprise environments is not encouraging, and the specific timing pattern — failure concentrated in the first weeks and months — is increasingly well-documented.

BCG's October 2024 analysis found that 74 percent of companies struggle to achieve and scale AI value after deployment. The primary causes were not model accuracy: they were data readiness gaps, missing operating model changes, and weak post-deployment governance — the same factors that determine what happens in the first 90 days.

McKinsey's 2024 State of AI survey, covering organisations across sectors, found that 44 percent reported at least one negative consequence from a generative AI deployment in production. The most common issues were accuracy problems, followed by security and explainability failures — categories that map directly onto the alert calibration and operator trust dynamics that dominate early post-deployment experience.

The alert fatigue data is specific. A 2024 meta-analysis of SOC operations found that teams manage an average of 3,832 alerts per day, ignoring approximately 62 percent. Forty-three percent of analysts reported occasionally or frequently turning off alerts entirely. Fifty-five percent reported missing critical alerts regularly due to volume. The Vectra 2024 State of Threat Detection survey found that 54 percent of SOC practitioners felt overwhelmed by alert volume, and explicitly linked this pattern to new tool deployments that increase volume before tuning has caught up — a dynamic that typically peaks in the first 30 to 90 days.
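To put the headline figures in absolute terms, the short calculation below applies the 62 percent ignore rate to the 3,832 daily alerts quoted above. It uses only those two numbers and treats the percentage as exact, which is a simplification.

```python
# Figures quoted in the text above.
ALERTS_PER_DAY = 3832
IGNORED_PERCENT = 62

# Treating the quoted percentage as exact is a simplifying assumption.
ignored_per_day = round(ALERTS_PER_DAY * IGNORED_PERCENT / 100)
reviewed_per_day = ALERTS_PER_DAY - ignored_per_day

print(f"ignored per day:  {ignored_per_day}")
print(f"reviewed per day: {reviewed_per_day}")
```

That works out to roughly 2,376 alerts ignored and 1,456 reviewed per team per day.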

NIST's 2026 analysis of deployed AI monitoring challenges — drawing on 2025 research and workshop data — frames the post-deployment problem as a governance and standards gap rather than a tooling gap. Organisations lack shared frameworks for monitoring roles, drift detection, human-AI feedback loops, and post-deployment incident classification. The result is that systems enter production with vague ownership structures and no defined operating rhythm, which is exactly the condition that allows the 90-day failure window to open.

For European enterprise contexts specifically, ENISA's 2024 threat landscape and NIS investments reports identify misconfiguration, inadequate maintenance, and specialised skills shortages as the primary drivers of deployed system vulnerability — not initial design failures. Seventy-six percent of cybersecurity staff in the ENISA NIS investments data lacked formal certifications, which translates directly into immature monitoring and slow incident response in the months after rollout. The State of Physical Security 2025 report, covering European end-user data, found that 57 percent of respondents cited aging and outdated infrastructure as a top challenge — meaning that many AI surveillance deployments are being integrated into estates that amplify rather than absorb the early instability.

From the Field

What we keep coming back to after conversations this week.

The question I always ask clients in the first month after go-live is: who reviews the false positives? Not who receives them, not who closes the ticket — who actually looks at what fired, decides whether the system should have fired, and turns that decision into a change.

In the deployments that are working well, there is always a clear answer. In the ones that are struggling, the answer is usually something like "the operators flag them" or "we send them to the vendor." Which is not an answer. It is a description of how signal disappears.

The feedback loop from operator observation to model change is the most operational thing about an AI surveillance deployment, and it is almost never defined in the contract, the implementation plan, or the handover documentation. It gets assumed. And because everyone assumes someone else owns it, it belongs to no one.

What separates the deployments I have seen stabilise from those that quietly deteriorate is almost always this: in the successful ones, someone decided that reviewing false positives was their job, and they had the authority and the access to do something about what they found. In the struggling ones, that decision was never made, and the system ran — and slowly ran down — on its own.

The first 90 days are not a technical challenge. They are a governance challenge. And the governance question you need to answer before go-live is not who owns the system. It is who owns the feedback loop.

One to Watch

The EU AI Act's post-market monitoring requirements for high-risk AI systems — which include most meaningful AI surveillance applications — are not future obligations. They are obligations that apply to systems being deployed today, with formal requirements coming into effect from August 2026.

What that means practically for any AI surveillance deployment going live now: the post-deployment monitoring, logging, and human oversight structures that the first 90 days are supposed to establish are not just good operational practice. They are the compliance infrastructure that the Act requires. Organisations that build those structures during the stabilisation phase will have the audit trail, the model version history, and the human override documentation that regulators will ask for. Those that treat go-live as a handover and monitoring as a future consideration will be retrofitting that infrastructure at renewal time, under regulatory pressure, without the data that would have made the case straightforward.

The NIST AI 800-4 framework — published in 2026 based on 2025 workshop and research findings — provides the most detailed publicly available structure for thinking about post-deployment AI monitoring across six categories: functionality, operational performance, human factors, security, compliance, and societal impact. It is not surveillance-specific, but it is the closest thing to an authoritative framework for what a defensible post-deployment monitoring programme looks like. For security teams designing their first 90-day operating plans, it is worth reading alongside the EU AI Act's high-risk system requirements.

The organisations that will be best positioned when the regulatory clock completes its first cycle are the ones that treat the first 90 days not as a stabilisation problem but as a compliance foundation-building exercise. They are building the same infrastructure. The difference is whether it is built intentionally or recovered from under duress.

Published: 2026-04-08 · Updated: 2026-04-08
