Many vendors and teams today describe their AI systems as agentic. The word covers both an FAQ chatbot that answers a single question and an AI agent that books flights, moves money, and coordinates with multiple other agents to finish a multi-step task.
But agency in artificial intelligence isn’t binary. It runs along a spectrum, from low-autonomy automation that follows fixed rules to fully autonomous agents that plan, reason, and act with little human input.
Where a system falls on that spectrum determines how you should govern it.
Autonomous AI agent versus automation
Before splitting agentic AI systems by autonomy level, we must first separate real agency from scripted automation.
A large share of what vendors market as agentic AI systems is merely automation: legacy workflow tools running on predefined rules rather than reasoning toward goals.
Such automation might even have a conversational interface—think shopping site chatbots that route you through a predefined path to resolution.
AI governance built for autonomous agents doesn’t map well onto scripted automation, and vice versa.
Three markers differentiate intelligent systems from mere automation:
- Reasoning under uncertainty. An agent interprets ambiguous input and chooses between options instead of matching input to a fixed rule.
- Goal persistence. An autonomous agent keeps pursuing an objective across multiple steps, adjusting as conditions change.
- Responsiveness to its operating environment. An agent senses new information mid-task and folds it into its behaviour.
A system that can’t do at least two of these is automation and should be governed as such. Conflate the two and you end up with controls that either hobble capable agents or give brittle automation too much latitude.
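The two-of-three rule above is simple enough to write down as a check. Here is a minimal Python sketch, assuming each marker has already been judged yes or no by a reviewer; the `SystemAssessment` type and its field names are hypothetical, not taken from any framework.

```python
from dataclasses import dataclass


@dataclass
class SystemAssessment:
    """Hypothetical yes/no judgements for the three markers of agency."""
    reasons_under_uncertainty: bool  # chooses between options on ambiguous input
    persists_toward_goal: bool       # keeps pursuing an objective across steps
    responds_to_environment: bool    # folds new mid-task information into behaviour


def classify(assessment: SystemAssessment) -> str:
    """Return 'agent' if at least two markers are present, else 'automation'."""
    markers = [
        assessment.reasons_under_uncertainty,
        assessment.persists_toward_goal,
        assessment.responds_to_environment,
    ]
    # sum() counts each True as 1 and each False as 0
    return "agent" if sum(markers) >= 2 else "automation"


# A scripted shopping-site chatbot: fixed path, no goal pursuit, no adaptation
print(classify(SystemAssessment(False, False, False)))
# A system that reasons and adapts, even though it does not persist across steps
print(classify(SystemAssessment(True, False, True)))
```

The judgement calls stay with people; the code only makes the threshold explicit so two reviewers apply it the same way.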
The 5-level AI autonomy spectrum
In its March 2026 foresight paper, the UK’s Digital Regulation Cooperation Forum (the CMA, FCA, ICO, and Ofcom acting together) set out a five-level autonomy spectrum. It gives practitioners and agentic AI system developers a shared vocabulary.
| Level | Label | Agent behaviour | Governance posture |
|---|---|---|---|
| L1 | Tool | Responds to direct queries, shows no initiative | Output quality controls, content review, basic logging |
| L2 | AI assistant | Makes limited decisions in a narrow scope, a human signs off | Human oversight, usage logging, action traceability |
| L3 | Operator | Plans and executes within guardrails, minimal oversight | Action limits, decision traceability, escalation paths, audit trails |
| L4 | Collaborator | Coordinates with other agents, delegates sub-tasks | Intent verification, sandboxing, continuous audits, red-teaming |
| L5 | Autonomous actor | Sets goals, learns from outcomes, runs for long periods | Full controls, plus a hard question about whether to deploy at all |
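The table can double as a lookup for spotting where governance posture lags autonomy. The sketch below is a minimal Python version. It assumes, as the L5 row’s “full controls” suggests, that each level inherits the controls of the levels below it. The `AutonomyLevel` names and control strings are illustrative labels paraphrased from the table, not DRCF terminology, and L5 is reduced to a hypothetical “deployment go/no-go review” entry.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    TOOL = 1
    ASSISTANT = 2
    OPERATOR = 3
    COLLABORATOR = 4
    AUTONOMOUS_ACTOR = 5


# Controls introduced at each level, paraphrased from the table
CONTROLS_ADDED_AT = {
    AutonomyLevel.TOOL: {"output quality controls", "content review", "basic logging"},
    AutonomyLevel.ASSISTANT: {"human oversight", "usage logging", "action traceability"},
    AutonomyLevel.OPERATOR: {"action limits", "decision traceability",
                             "escalation paths", "audit trails"},
    AutonomyLevel.COLLABORATOR: {"intent verification", "sandboxing",
                                 "continuous audits", "red-teaming"},
    AutonomyLevel.AUTONOMOUS_ACTOR: {"deployment go/no-go review"},  # hypothetical label
}


def required_controls(level: AutonomyLevel) -> set[str]:
    """Union of controls for this level and every level below it (assumed cumulative)."""
    controls: set[str] = set()
    for lower in AutonomyLevel:
        if lower <= level:
            controls |= CONTROLS_ADDED_AT[lower]
    return controls


def governance_gaps(level: AutonomyLevel, in_place: set[str]) -> set[str]:
    """Controls the level calls for that are not yet in place."""
    return required_controls(level) - in_place


# An L3 operator that so far only has logging and sign-off in place
in_place = {"basic logging", "usage logging", "human oversight"}
for control in sorted(governance_gaps(AutonomyLevel.OPERATOR, in_place)):
    print("missing:", control)
```

Using `IntEnum` keeps levels comparable with `<=`, which is what makes the cumulative lookup a one-line condition.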
L1 and L2 agents offer a familiar user experience: think of an FAQ chatbot or a generative AI writing tool.
At L3, a system plans and executes a workflow within its operating environment, calling APIs and handling exceptions on its own. OpenAI’s Operator, which carries out tasks by browsing the web, is an example of this kind of agentic workflow.
L4 and L5 agents typically run within a multi-agent system with bounded autonomy. Few enterprises run such systems in production today.
An L5 agent is what people picture when they think of artificial general intelligence: autonomous AI agents with capabilities far beyond human comprehension or control, and well beyond the generative AI tools we use today.
The DRCF specifies that an organisation’s responsibility for legal compliance stays the same regardless of how autonomously its agent acts. Escalating agent autonomy doesn’t transfer accountability. Put differently, “my agent did it” won’t fly in court.

Deployment modes for agentic systems
Most production deployments in 2026 sit at Level 1 or Level 2 on the agentic spectrum, but marketing from agent developers often implies Level 3 or Level 4.
That gap produces two failure modes:
- Over-governing L1 and L2 tools creates compliance theatre: heavy documentation and review effort spent on what are essentially sophisticated chatbots, slowing low-risk work without reducing risk.
- Under-governing L3 and higher systems is the costlier error. A scheduling or financial agent gets treated as an assistant even though it can commit resources and initiate transactions on its own. The consequences can be severe: imagine a bank’s agentic workflow mistakenly resetting your balance to zero.
Misclassification contributes to today’s high failure rate: Gartner predicts that over 40% of agentic AI projects will be cancelled by the end of 2027, with unclear business value and weak risk controls among the leading reasons.
Context moves an AI system along the spectrum
An AI system’s level isn’t a fixed property of the AI model. It’s a property of the deployment. Three variables move the same system along the autonomy spectrum.
- Scope of action: A read-only system that surfaces recommendations is an L2 assistant. Give it write access to a CRM and a procurement system and it becomes an L3 operator, with no change to the underlying AI model.
- Organisational reach: A customer-facing agent serving thousands of users daily carries a heavier risk profile than an internal tool used by three analysts. Scale amplifies the consequence of every decision.
- Domain criticality: Healthcare, finance, and legal deployments run at higher stakes than internal productivity tools. A single agentic deployment can trigger obligations across data protection, financial regulation, online safety, and competition at once.
Classify the deployment, not the AI model. Two organisations running the same vendor’s AI agent may need different governance structures, and those differences compound once multiple agents are in the mix.
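One way to make “classify the deployment, not the model” concrete is to key classification on deployment attributes. The Python sketch below is illustrative only: it lets scope of action set the level (read-only to L2, write access to L3, per the example above) and treats reach and domain as risk amplifiers that raise the governance bar. That split, the `Deployment` fields, the model name `vendor-agent-x`, and the 1,000-daily-user cut-off are all hypothetical assumptions to adapt, not DRCF criteria.

```python
from dataclasses import dataclass

# Domains the article singles out as higher-stakes
HIGH_STAKES_DOMAINS = {"healthcare", "finance", "legal"}
# Hypothetical cut-off for "customer-facing at scale"; tune to your context
HIGH_REACH_DAILY_USERS = 1_000


@dataclass(frozen=True)
class Deployment:
    """Hypothetical record of how one organisation deploys a given model."""
    model: str
    has_write_access: bool  # e.g. can update a CRM or raise purchase orders
    daily_users: int
    domain: str


def classify_deployment(d: Deployment) -> tuple[int, list[str]]:
    """Return (autonomy level, risk amplifiers) for a deployment, not a model."""
    # Scope of action: read-only recommendations -> L2, write access -> L3
    level = 3 if d.has_write_access else 2
    amplifiers = []
    if d.daily_users >= HIGH_REACH_DAILY_USERS:
        amplifiers.append("organisational reach")
    if d.domain in HIGH_STAKES_DOMAINS:
        amplifiers.append("domain criticality")
    return level, amplifiers


# Same vendor model, two very different deployments
internal = Deployment("vendor-agent-x", False, 3, "internal productivity")
bank = Deployment("vendor-agent-x", True, 50_000, "finance")
print(classify_deployment(internal))  # (2, [])
print(classify_deployment(bank))      # (3, ['organisational reach', 'domain criticality'])
```

Note that `model` never enters the calculation: two records with the same model string can land at different levels with different amplifiers.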
Compliance drift: when the autonomy level changes on its own
One overlooked risk is level migration without a governance update. A system launches as an L2 assistant or L3 operator, teams come to trust it, its scope widens, and months later it’s making autonomous decisions its original mandate never anticipated.
Three things can drive this drift:
- Scope creep by permission, as teams grant new API access without reassessing autonomy.
- Trust accumulation, as human oversight steps get removed once the system performs well.
- Model updates, where an improved model handles more complex situations than the system was evaluated for.
The Cloud Security Alliance’s January 2026 framework notes that most organisations lack technical enforcement of autonomy boundaries, so nothing stops an L2 system from behaving like an L3 one once access expands.
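Technical enforcement can start as a gate that checks each requested action against the agent’s registered level rather than against whatever API access it happens to hold, so widening access alone does not widen autonomy. Here is a minimal Python sketch. The action names, the minimum levels, and the rule that a human sign-off can bridge exactly one level (mirroring L2’s sign-off) are hypothetical choices, not part of the CSA framework.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    BLOCK = "block"


# Hypothetical: the lowest registered level at which each action may run unattended
MIN_LEVEL_FOR_UNATTENDED = {
    "read_record": 1,
    "draft_reply": 1,
    "update_crm": 3,
    "issue_payment": 3,
    "delegate_to_agent": 4,
}


def enforce_boundary(registered_level: int, action: str,
                     human_approved: bool = False) -> Decision:
    """Check a requested action against the agent's registered autonomy level."""
    required = MIN_LEVEL_FOR_UNATTENDED.get(action)
    if required is None:
        return Decision.BLOCK  # unknown actions are denied by default
    if registered_level >= required:
        return Decision.ALLOW
    if registered_level == required - 1:
        # One level short: a human sign-off can cover the gap
        return Decision.ALLOW if human_approved else Decision.REQUIRE_APPROVAL
    return Decision.BLOCK  # two or more levels short: no override


# An L2 assistant that has quietly been granted CRM write access
print(enforce_boundary(2, "update_crm"))
print(enforce_boundary(2, "update_crm", human_approved=True))
print(enforce_boundary(2, "delegate_to_agent", human_approved=True))
```

Denying unknown actions by default matters here: a newly granted API that nobody has mapped to a level stays blocked until someone classifies it.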
The capability curve makes drift more likely. Anthropic reported that the time its Claude Code agent works before stopping nearly doubled, from under 25 minutes to over 45 minutes, between October 2025 and January 2026.
More agents, each working longer without oversight, expand both the risk surface and the blast radius of any single error.
The fix isn’t more paperwork at launch. It’s defined migration triggers: specific conditions, such as new API access or a removed approval step, that automatically prompt a fresh governance review of AI capabilities.
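Migration triggers lend themselves to automation: snapshot what the agent can do at each review, then diff the live configuration against that snapshot. The Python sketch below checks the three drift drivers listed above (new API scopes, removed approval steps, a changed model version). The `Snapshot` type and scope strings are hypothetical; in practice the data would come from your own access-management and deployment tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical point-in-time view of what a deployed agent can do."""
    api_scopes: frozenset[str]
    approval_steps: frozenset[str]
    model_version: str


def migration_triggers(baseline: Snapshot, current: Snapshot) -> list[str]:
    """List changes since the last review that should prompt a fresh one."""
    triggers = []
    # Scope creep by permission: any API scope not present at the last review
    for scope in sorted(current.api_scopes - baseline.api_scopes):
        triggers.append(f"new API access: {scope}")
    # Trust accumulation: any approval step that has since been removed
    for step in sorted(baseline.approval_steps - current.approval_steps):
        triggers.append(f"approval step removed: {step}")
    # Model updates: the system may now handle cases it was never evaluated on
    if current.model_version != baseline.model_version:
        triggers.append(f"model updated: {baseline.model_version} -> {current.model_version}")
    return triggers


baseline = Snapshot(frozenset({"crm:read"}), frozenset({"manager_signoff"}), "v1")
current = Snapshot(frozenset({"crm:read", "crm:write"}), frozenset(), "v2")
for trigger in migration_triggers(baseline, current):
    print(trigger)
```

An empty list means nothing has drifted; any non-empty result is the signal to open a governance review.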
Governing the agentic spectrum in practice
Spectrum-aware governance needs three things, none requiring new tooling to begin.
A classification register records each agent’s level across scope, reach, and domain criticality before it goes live (think AI autonomy certificates). This is the baseline regulators expect.
Today only 21% of organisations have a mature governance model for autonomous agents, and just 12% use a centralised platform to maintain control over multiple agents.
Defined migration triggers tie each escalating change to a named reviewer—a person, not a team.
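A classification register needs little more than a structured record per agent and a rule for who reviews what. The Python sketch below assumes a hypothetical `RegisterEntry` schema mirroring the scope, reach, and domain axes, plus a hypothetical `route_review` helper that hands any fired trigger to the entry’s named reviewer. Code cannot verify that a reviewer is a person rather than a team, so that remains a convention the register’s owner enforces. The agent ID and reviewer name are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegisterEntry:
    """Hypothetical classification-register row, completed before go-live."""
    agent_id: str
    level: int     # 1-5 on the autonomy spectrum
    scope: str     # e.g. "read-only" or "write: CRM, procurement"
    reach: str     # e.g. "internal, 3 analysts"
    domain: str    # e.g. "finance"
    reviewer: str  # by convention a named person, not a team alias

    def __post_init__(self) -> None:
        if not 1 <= self.level <= 5:
            raise ValueError(f"level must be 1-5, got {self.level}")
        if not self.reviewer.strip():
            raise ValueError("every entry needs a named reviewer")


def route_review(entry: RegisterEntry, triggers: list[str]) -> Optional[str]:
    """Assign a governance review to the named reviewer when any trigger fires."""
    if not triggers:
        return None
    return (f"[{entry.agent_id}, L{entry.level}] review assigned to "
            f"{entry.reviewer}: " + "; ".join(triggers))


entry = RegisterEntry("invoice-agent", 3, "write: procurement",
                      "internal, finance team", "finance", "Jane Doe")
print(route_review(entry, ["new API access: payments:write"]))
```

Validating in `__post_init__` means an entry without a level or reviewer cannot be created at all, which is cheaper than catching the omission at audit time.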
Continuous monitoring then replaces point-in-time assessment, because governance set at deployment and never revisited is governance for a system that no longer exists.
Human oversight should scale with autonomy: mandatory sign-off at L2, then a shift toward boundary-setting and checkpoint intervention as autonomy rises. A well-designed human-in-the-loop checkpoint catches errors before they compound.
Find out where your agents fall on the spectrum
If you’re deploying AI agents and aren’t sure whether you’re governing them at the right level, that uncertainty is a risk.
I run a free AI audit that maps your current and planned agents against the five-level autonomy spectrum, flags where governance posture doesn’t match autonomy, and gives you a prioritised list of controls to put in place.
No tooling required to start, and you keep the output whether or not we work together.