Editor's note: This article was inspired by a framework originally articulated by Karan Shah, Founder and CEO of SoluteLabs, in his Brew. Build. Breakthrough. newsletter, published on 27 March 2026. We have expanded and reframed his spectrum concept specifically for enterprise buyers evaluating AI agent development partners, with independent research and data added throughout. Karan's original perspective is available on his LinkedIn profile.
The Gap Between Adoption and Production
Here is the most important statistic in enterprise AI for 2026: 88% of AI agent pilots never reach production. Almost four in five enterprises have adopted AI agents in some form — yet only one in nine runs them in production. This is the largest deployment backlog in enterprise technology history, and the organisations that close it fastest will capture disproportionate competitive advantage.
The 12% who do reach production share four consistent attributes: they invested in infrastructure before deployment, documented governance before pilots, captured baseline metrics before development, and assigned dedicated business ownership with accountability for post-deployment performance. None of these are technology decisions. They are organisational decisions.
The technology itself is not the differentiator. The level of the spectrum your organisation is operating at is.
In March 2026, Karan Shah, Founder of SoluteLabs, articulated a five-level spectrum of agentic engineering maturity that maps this gap with unusual clarity. His framework was written from a software engineering perspective: how development teams use AI tools. In this article, we reframe that same spectrum from the enterprise buyer's perspective, mapping what each level means for a CTO, VP of Engineering, or procurement leader deciding how to invest in AI agents.
Why the AI Agent Debate Keeps Going in Circles
Almost every enterprise leadership team contains two camps when it comes to AI agents. Camp A has seen genuinely impressive results — prototypes built in days, workflows automated in weeks, productivity gains that feel transformative. They believe the technology is ready and the organisation should move fast.
Camp B has tried it and been disappointed. They rolled out AI tools, expected step-change productivity gains, and got marginal improvement. They are sceptical of the hype and point to internal pilots that never made it to production as evidence that the technology is not as ready as vendors claim.
Both camps are right. And both camps are only seeing part of the picture. The problem is not that one camp has better evidence than the other. The problem is that they are operating at different levels of the spectrum — and drawing universal conclusions from localised experience.
As Karan Shah observed: Camp A is right that the barrier to building software has dropped to nearly zero. But there is a difference between something that works and something that works at scale — with security, data integrity, and architecture that does not collapse when real users hit edge cases. Camp B is right that tools alone do not deliver transformation. But what Camp B has not seen is what happens when real engineering discipline is built around those tools. A small number of teams have done that, and they are shipping production systems at five to ten times the pace — not because the AI is better than Camp B thinks, but because those teams built the methodology that makes the AI reliable.
The Five-Level Agentic Engineering Spectrum
The following framework maps five levels of agentic engineering maturity. Each level is characterised by different capabilities, different failure modes, and different requirements for reaching the next level. Most enterprise debate happens between Level 1 and Level 2. The organisations generating measurable, sustained ROI are operating at Level 4 or above.
Level 1 — Vibe Coding and Prototype Automation
At Level 1, AI tools are used for rapid experimentation — prompting, accepting output, iterating without deep review. For enterprises this typically looks like teams using ChatGPT or Copilot to draft content, summarise documents, or generate basic scripts. Individual contributors experimenting with AI on their own workflows. Proof-of-concept demos built quickly to show stakeholders what is possible.
Level 1 is genuinely useful for exploration and validation. It is fast, cheap, and demonstrates the art of the possible. The failure mode is treating Level 1 results as evidence that production deployment is straightforward. It is not. The prototype works. The production system requires architecture, governance, and data infrastructure that the prototype entirely ignores.
Enterprise signal of Level 1: AI champions in individual teams with visible quick wins. No centralised AI strategy. No governance framework. No production deployments.
Level 2 — AI-Assisted Workflows
At Level 2, AI is embedded at specific steps in existing workflows, typically for generation, classification, or summarisation tasks. The workflow itself remains human-controlled and deterministic; AI handles individual steps. This is where most enterprise AI initiatives currently sit. According to Deloitte, worker access to AI tools rose 50% in 2025, and most of that access is Level 2: AI embedded in productivity tools rather than deployed as autonomous agents.
Level 2 delivers real but bounded value. Developers report 10 to 20 percent productivity gains. Customer service teams see meaningful deflection of routine queries. Finance teams process documents faster. These are genuine improvements — but they are not the transformative ROI that Level 4 and 5 deployments deliver. Camp B's scepticism largely reflects Level 2 experience extrapolated to the whole spectrum.
Enterprise signal of Level 2: Copilot or similar tools rolled out across teams. Measurable but incremental productivity gains. Leadership asking why the gains are not larger. The gap between expectation and outcome creates the scepticism that drives Camp B.
Level 3 — Supervised Agent Deployment
At Level 3, genuine AI agents are deployed — but with close human supervision at every significant decision point. Agents make multi-step decisions and take actions, but humans review and approve outputs before they affect real systems. The human-in-the-loop rate is high. According to G2's enterprise AI report, 47% of verified agent buyers operate at what they describe as autonomy-with-guardrails — this is Level 3.
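To make the supervision pattern concrete, here is a minimal sketch of a Level 3 approval gate. The names (ProposedAction, execute_with_supervision, the crm.update_record tool) are hypothetical, not drawn from any specific framework; the point is the shape: the agent proposes, a human disposes, and nothing touches a live system without sign-off.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProposedAction:
    """An action the agent wants to take, held for human review."""
    tool: str          # e.g. "crm.update_record" (hypothetical tool name)
    arguments: dict
    rationale: str     # the agent's stated reason for proposing the action
    proposed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def execute_with_supervision(action: ProposedAction, reviewer_approves) -> str:
    """Level 3 pattern: every significant action waits for human sign-off.

    reviewer_approves is any callable that routes the proposal to a human
    (a review queue, a chat message, a ticket) and returns True or False.
    """
    if reviewer_approves(action):
        # Only an approved action is allowed to touch production systems.
        return f"EXECUTED {action.tool} with {action.arguments}"
    return f"REJECTED {action.tool}: returned to the agent with feedback"

# Illustrative use: a console prompt stands in for the real review workflow.
proposal = ProposedAction(
    tool="crm.update_record",
    arguments={"account_id": "A-1042", "status": "churn_risk"},
    rationale="Support ticket volume tripled in the last 30 days.",
)
print(execute_with_supervision(
    proposal,
    reviewer_approves=lambda a: input(f"Approve {a.tool}? [y/N] ").lower() == "y",
))
```

The scalability limitation discussed in the next paragraph is visible in the code itself: throughput can never exceed the rate at which reviewer_approves returns.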
Level 3 is where most enterprises that have moved beyond pilots currently operate. It is a legitimate and valuable deployment tier — particularly for regulated industries where human accountability is non-negotiable. The limitation is scalability. If every agent action requires human review before it executes, the throughput gains are bounded by the speed of human review.
Enterprise signal of Level 3: At least one AI agent running in production. Workflows redesigned around agent capability. Governance frameworks in place — but human sign-off required on agent outputs. The business case is validated. The scale is not yet there.
Level 4 — Spec-Driven Agentic Systems
Level 4 is where the economics change structurally. At this level, AI agents operate against defined specifications: clear success criteria, constrained action spaces, and verification layers that automatically catch what agents get wrong. Humans design the system and review outcomes. Agents execute the work. Karan Shah describes this as the phase where the human designs the system, the agent builds it, and the hooks verify it.
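What a defined specification looks like in code varies by framework, but the shape is consistent. Below is a hedged, framework-agnostic sketch; AgentSpec, run_agent, and the refund tools are illustrative names, not a real API. The human authors the spec, the agent produces the work, and the verification hook runs automatically.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AgentSpec:
    """A Level 4 task definition: authored by a human, enforced by the system."""
    goal: str
    allowed_tools: set[str]           # the constrained action space
    verify: Callable[[dict], bool]    # the automated verification hook

def run_agent(spec: AgentSpec, agent_step: Callable[[str], dict]) -> dict:
    """Enforce the spec instead of trusting the model.

    agent_step stands in for a model call returning {"tool": ..., "output": ...};
    a real implementation would loop, retry, and escalate on failure.
    """
    result = agent_step(spec.goal)
    if result["tool"] not in spec.allowed_tools:
        raise PermissionError(f"{result['tool']} is outside the allowed action space")
    if not spec.verify(result["output"]):
        raise ValueError("output failed verification; escalate to a human or retry")
    return result

# Illustrative spec: the agent may issue refunds, but only within policy.
refund_spec = AgentSpec(
    goal="Resolve refund request #8841 within policy",
    allowed_tools={"orders.lookup", "payments.refund"},
    verify=lambda out: out.get("amount", 0) <= 500 and out.get("reason") is not None,
)
```

The design point: autonomy is granted by the spec, not by the model. Widening allowed_tools or loosening verify becomes an explicit, reviewable governance decision.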
The ROI data at Level 4 is dramatically different from Level 2. Enterprise deployments of agentic AI return an average 171% ROI — 192% for US enterprises — exceeding traditional automation ROI by a factor of three according to Deloitte's 2026 research. Salesforce's production deployment achieved 84% AI-driven resolution. JPMorgan pilots achieved 83% faster research through agentic workflows. These are Level 4 outcomes, not Level 2 outcomes.
What makes Level 4 different from Level 3 is not the AI model; it is the surrounding system. Audit trails that enterprise procurement and legal teams accept. Observability layers that capture not just what the agent did but why. Governance frameworks embedded into the architecture rather than bolted on after deployment. The median time-to-value for Level 4 deployments is 5.1 months according to BCG and Forrester 2026 data.
Enterprise signal of Level 4: Multiple agents in production across different workflows. Agents operating with meaningful autonomy within defined boundaries. Governance frameworks mature enough that compliance teams approve agent outputs without reviewing every action. ROI measurable in concrete business terms — not just productivity surveys.
Level 5 — Multi-Agent Orchestration
Level 5 is the frontier — multiple specialised agents coordinating in parallel, with an orchestrator agent routing work to specialist agents and verification agents checking output quality. Humans define the system architecture and the success criteria. Agents handle execution across multiple domains simultaneously. According to OutSystems's 2026 State of AI Development report, 97% of enterprises are exploring system-wide agentic strategies — but only a fraction have actually achieved Level 5 deployment.
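As a sketch of the shape, not any vendor's implementation: the hypothetical orchestrate function below fans work out to specialist agents in parallel and passes every output through a verification step before accepting it. Real Level 5 systems add planning, shared state, and retries.

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder specialists; in production each would be its own agent
# with its own model, tools, and spec.
SPECIALISTS = {
    "research": lambda task: f"research notes on: {task}",
    "drafting": lambda task: f"draft document for: {task}",
    "compliance": lambda task: f"compliance review of: {task}",
}

def verify(domain: str, output: str) -> bool:
    """A verification agent would sit here; we only check for non-empty output."""
    return bool(output)

def orchestrate(tasks: list[tuple[str, str]]) -> list[str]:
    """Route each (domain, task) pair to a specialist in parallel, then verify."""
    with ThreadPoolExecutor() as pool:
        futures = [(d, t, pool.submit(SPECIALISTS[d], t)) for d, t in tasks]
        results = []
        for domain, task, future in futures:
            output = future.result()
            if not verify(domain, output):
                raise RuntimeError(f"{domain} failed verification on: {task}")
            results.append(output)
    return results

print(orchestrate([
    ("research", "competitor pricing, Q3"),
    ("drafting", "pricing change memo"),
    ("compliance", "pricing change memo"),
]))
```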
Real Level 5 examples in 2026 include EY's Canvas platform, which coordinates 1.4 trillion lines of audit data annually across 160,000 engagements in 150 countries, and Microsoft and Schneider Electric's manufacturing deployment, which achieved a 50% reduction in engineering design cycle time. These are not typical enterprise deployments; they represent the leading edge of what is currently possible with multi-agent orchestration at scale.
Enterprise signal of Level 5: Agents coordinating with other agents automatically. Humans operating as system architects rather than task supervisors. Outcomes measured at business unit or company level rather than workflow level. Gartner predicts that 40% of enterprise applications will include task-specific AI agents by the end of 2026; Level 5 is where that trajectory leads over the next 12 to 24 months.
Where Most Enterprises Actually Are in 2026
The data from multiple 2026 enterprise surveys tells a consistent story. According to Mayfield's CXO survey of 266 enterprise technology leaders, 42% of enterprises already have at least one AI agent in production, meaning they have reached at least Level 3. But pilot and production deployments together account for 72% of respondents, suggesting that most production deployments are early-stage Level 3 rather than mature Level 4.
The production gap is stark. According to research aggregating 150+ enterprise data points, 88% of AI agent pilots never reach production — not because the technology failed, but because of workflow redesign challenges, missing governance frameworks, inadequate data infrastructure, and unclear business ownership. These are all organisational failures, not technology failures. The 12% who succeed invest in these foundations before development begins.
The most common enterprise progression failure is attempting to jump from Level 2 to Level 4 without building the Level 3 infrastructure — governance frameworks, observability layers, and human-in-the-loop escalation paths — that makes Level 4 trustworthy enough to scale.
What Blocks Progression Between Levels
Level 1 to Level 2 — The Structure Gap
Moving from individual AI experimentation to embedded AI workflows requires workflow documentation and process standardisation. AI cannot improve a process that has not been defined. The blocker is organisational, not technical — getting teams to document and standardise processes so AI can be reliably embedded at specific steps.
Level 2 to Level 3 — The Data and Infrastructure Gap
Genuine AI agents require live access to enterprise systems — CRM, ERP, databases, ticketing — not static document exports. Gartner predicts 60% of agentic AI projects will fail in 2026 due to lack of AI-ready data. The investment required at this transition is primarily in data pipeline architecture and enterprise system integration — not in the AI model itself.
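The difference between a static export and live access is easiest to see in a tool definition. The sketch below is hypothetical: the endpoint, token handling, and ticketing API are placeholders, and a real integration would go through your API gateway with scoped, audited credentials. The point is that the agent queries the system of record at decision time.

```python
import json
import urllib.request

TICKETING_BASE = "https://ticketing.example.internal/api"  # placeholder endpoint

def lookup_ticket(ticket_id: str, token: str) -> dict:
    """Live tool call: fetch the current ticket state when the agent needs it,
    rather than reasoning over a document export that was stale on arrival."""
    req = urllib.request.Request(
        f"{TICKETING_BASE}/tickets/{ticket_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```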
Level 3 to Level 4 — The Governance and Observability Gap
This is the most commonly underestimated transition. Moving from supervised agents to autonomous production agents requires observability infrastructure — logs that capture not just what the agent did but why, reasoning traces, tool call logs, and human escalation paths. According to Deloitte's 2026 research, only one in five companies has a mature governance model for AI agents. Without it, organisations cannot responsibly reduce human-in-the-loop rates, and Level 4's economic benefits remain out of reach.
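A minimal sketch of what such a trace record might contain is below. The field names are illustrative, and a production stack would ship these records to a log pipeline rather than a local file; what matters is that the reasoning, the tool call, and the escalation status are structured, queryable data rather than free text buried in application logs.

```python
import json
import uuid
from datetime import datetime, timezone

def log_agent_step(agent_id: str, tool: str, arguments: dict,
                   reasoning: str, outcome: str, escalated: bool = False) -> dict:
    """Record one agent step: not just what the agent did, but why."""
    record = {
        "trace_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "tool": tool,
        "arguments": arguments,
        "reasoning": reasoning,            # the "why" that audit and debugging need
        "outcome": outcome,
        "escalated_to_human": escalated,   # feeds the escalation-path metrics
    }
    with open("agent_trace.jsonl", "a") as f:  # placeholder for a log pipeline
        f.write(json.dumps(record) + "\n")
    return record
```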
Level 4 to Level 5 — The Architecture Gap
Multi-agent orchestration requires a fundamentally different system architecture — orchestrator agents, specialist agents, verification agents, and the coordination protocols between them. This is engineering complexity at a different order of magnitude from a single-agent deployment. Most enterprises will operate at Level 4 for an extended period before the use case complexity and organisational maturity justify Level 5 investment.
What This Means for Enterprise Buyers Choosing an AI Agent Partner
The spectrum framework has a direct implication for how enterprise buyers should evaluate AI agent development partners. Most vendors will tell you they can build AI agents. The question is what level of the spectrum they have actually delivered in production — and whether that level matches what you need.
Ask any prospective partner these three questions. First: can you show us a production deployment — not a demo — where your agent has run autonomously for at least three months? Second: what does your observability and monitoring stack look like, and can you show us reasoning traces from a live agent? Third: how do you handle agent failures, and can you describe a specific incident and how it was resolved?
Partners operating at Level 3 can answer the first question with confidence. Partners operating at Level 4 can answer all three. Partners operating primarily at Level 1 or 2 — regardless of how sophisticated their marketing materials are — will struggle with the second and third questions.
The question Karan Shah leaves his readers with is equally valuable for enterprise buyers: next time someone tells you AI agents are transformative, or that AI agents are overhyped — just ask them what level they are operating at. The answer will tell you everything about whether their experience applies to your situation.
Frequently Asked Questions
Why do 88% of AI agent pilots fail to reach production?
Forrester's root-cause analysis of failed agent deployments attributes 41% of failures to unclear success criteria, 33% to insufficient tool or data access, and 26% to drift in evaluation coverage. None of these are model-quality problems. They are scoping and governance problems — which means they are solvable with the right partner and the right pre-deployment investment in infrastructure and governance frameworks.
What ROI should we expect from AI agent deployments?
The median payback period for enterprise AI agent deployments is 5.1 months, with an average ROI of 171% — 192% for US enterprises — according to BCG and Forrester 2026 data. These figures reflect Level 4 deployments. Level 2 deployments deliver real but more modest returns — 10 to 20% productivity improvement rather than 171% ROI. The difference is almost entirely explained by the level of the spectrum, not the quality of the underlying AI model.
How do we know what level our enterprise is currently at?
Three diagnostic questions help locate your current position. One: do you have an AI agent running autonomously in a live production environment — not a pilot, not a demo? If no, you are at Level 1 or 2. Two: does your organisation have a documented governance framework for AI agents — specifying who is accountable for agent decisions and how failures are handled? If no, you are not yet at Level 4 regardless of your deployment status. Three: do you measure agent performance with business-level metrics — cost per transaction, cycle time reduction, resolution rate — rather than usage statistics? If no, you have not yet validated the ROI that justifies scaling.
Should we build AI agents in-house or work with a development partner?
Most enterprises attempting to reach Level 4 for the first time benefit significantly from working with a partner who has already navigated the Level 3 to Level 4 transition on previous projects. The infrastructure, governance, and observability requirements at Level 4 are non-trivial engineering challenges. Building them correctly the first time, rather than retrofitting them after a pilot fails to scale, is where experienced implementation partners deliver their most significant value. According to Mayfield's survey, 91% of CXOs plan to increase their agentic AI budgets in 2026. The question is not whether to invest; it is whether to invest in building internal capability from scratch or accelerating through partnership with teams operating at Level 4.
The Level You Are At Is the Strategy
The agentic engineering spectrum makes something important visible that is often obscured by vendor marketing and media coverage: there is no single experience of AI agents. There are five materially different levels of deployment, each with different economics, different failure modes, and different requirements. The 88% failure rate and the 171% ROI are both true — they describe different levels of the same spectrum.
The strategic question for enterprise leaders in 2026 is not whether to deploy AI agents. That decision has been made by market momentum. The strategic question is what level you are currently at, what is blocking you from the next level, and whether you are investing in the right things — governance, data infrastructure, observability, and implementation expertise — to close the gap before your competitors do.
The teams generating the best results are not the ones with the best AI tools. They are the ones with the best methodology around those tools. The spectrum is wide. The gap between Level 2 and Level 4 is not a tool upgrade. It is a discipline shift.
Find an AI Agent Partner Operating at Level 4
Mintonn's directory profiles verified AI agent development companies evaluated on their production deployment history, technical framework expertise, and enterprise governance capability. Browse the directory at mintonn.com/directory to see which companies have demonstrated Level 4 capability — or compare enterprise implementation partners directly at mintonn.com/compare/enterprise-ai-agent-partners.