Unbounded Agent Execution can result in Denial-of-Service Attacks

Agents often appear structured at the planning level, but at runtime their execution becomes increasingly non-deterministic once tools, retries, partial failures, and replanning are introduced.

Most agent frameworks follow some version of the following loop.

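while not done:
    action = llm.plan(context)      # planner picks the next step
    result = tool.execute(action)
    if invalid(result):
        context += result.error     # failure output is appended to the context
        continue                    # retry, with no cap on attempts
    context += result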


This loop looks reasonable, but at execution time it can easily turn into an economic denial of service (EDoS) attack. Each retry appends error logs, stack traces, and planner instructions to the context. As the signal-to-noise ratio of that context drops, the model becomes less likely to converge on the original task and often produces speculative fixes that extend the loop further.

This is not an infinite loop in the strict sense. It is a resource exhaustion attack, intentional or accidental, caused by unbounded execution.

A sample failure scenario may look like the following.

Tool Failure: A tool returns a 504 or malformed JSON.
Context Inflation: The agent appends the 2KB error log to the context.
Planner Amplification: The planner retries with "more detail," increasing the prompt size.
Cost Compounding: Each subsequent call is more expensive than the last.
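
The compounding in the last step can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a 4,000-token starting context, about 500 tokens of error output appended per retry (roughly the 2KB log above), and a hypothetical price of $3 per million input tokens. None of these figures come from a real provider, and `retry_cost` is an illustrative name.

```python
def retry_cost(base_tokens: int, error_tokens: int, retries: int,
               usd_per_million_tokens: float) -> list[float]:
    """Return the input cost in USD of each call in a retry loop.

    Every call resends the whole context, so call k pays for the base
    context plus k accumulated error logs.
    """
    costs = []
    context = base_tokens
    for _ in range(retries + 1):  # the first attempt plus each retry
        costs.append(context * usd_per_million_tokens / 1_000_000)
        context += error_tokens   # failure output is appended before retrying
    return costs


# Assumed figures, for illustration only.
costs = retry_cost(base_tokens=4_000, error_tokens=500, retries=10,
                   usd_per_million_tokens=3.0)
print(f"first call: ${costs[0]:.4f}")
print(f"last call:  ${costs[-1]:.4f}")
print(f"total:      ${sum(costs):.4f}")
```

Under these assumptions the last call costs 2.25 times the first, and because every call resends everything that came before, the total grows quadratically with the number of retries rather than linearly.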

Practically, the system terminates only when it hits a hard timeout or burns through its "budget". By that point, the original intent is lost. This mirrors the classic distributed systems challenge where missing backpressure and positive feedback loops can lead to resource exhaustion. Tool failures act as feedback signals, but without backpressure the planner interprets them as a reason to increase effort rather than slow execution.

Consider a web scraper agent. If a tool returns a payload that the agent interprets as an instruction to "analyze this error in extreme detail using 10 retries", the result is a depth-amplification loop running under a trusted identity.

Another challenge is cross-account resource exhaustion, also known as the noisy neighbor problem. In a multi-tenant or multi-team environment, resource starvation becomes a real risk. Because most LLM providers enforce rate limits at the account or API key level, a single agent stuck in a loop can consume the entire quota. Without task-level priority queuing or isolated quotas, one buggy agent becomes a bottleneck that creates a denial of service for every other AI-dependent service in your stack.
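
One way to approximate isolated quotas is a per-tenant token bucket placed in front of the shared provider key. The following is a minimal sketch, not a production limiter: `TenantQuota` and `try_consume` are hypothetical names, the per-minute limit is an assumed figure, and the state is held in memory for a single process.

```python
import time
from collections import defaultdict
from typing import Optional


class TenantQuota:
    """Per-tenant token bucket layered under a shared provider rate limit."""

    def __init__(self, tokens_per_minute: int):
        self.capacity = float(tokens_per_minute)
        self.refill_per_sec = tokens_per_minute / 60.0
        # Each tenant starts with a full bucket.
        self.level = defaultdict(lambda: self.capacity)
        self.last_seen = {}

    def try_consume(self, tenant: str, tokens: int,
                    now: Optional[float] = None) -> bool:
        """Return True and deduct tokens if the tenant has quota left."""
        now = time.monotonic() if now is None else now
        elapsed = now - self.last_seen.get(tenant, now)
        self.last_seen[tenant] = now
        # Refill for the time elapsed, capped at the bucket capacity.
        self.level[tenant] = min(
            self.capacity, self.level[tenant] + elapsed * self.refill_per_sec
        )
        if tokens > self.level[tenant]:
            return False  # this tenant is throttled; other tenants are unaffected
        self.level[tenant] -= tokens
        return True


quota = TenantQuota(tokens_per_minute=1_000)
looping = [quota.try_consume("team-a", 800, now=0.0) for _ in range(2)]
healthy = quota.try_consume("team-b", 800, now=0.0)
print(looping, healthy)
```

In this sketch the second request from `team-a` is rejected locally, before it reaches the provider, while `team-b` still has its full allowance.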

Mitigations

Traditional cloud governance controls do not address execution-level amplification. Rate limits constrain request frequency, not execution depth. Account budgets are enforced after the fact, once the cost has already been incurred.

To run agents at scale, we must treat the execution graph as a first-class security object. We need middleware that intercepts execution between planning, tool invocation, and context mutation. This middleware can enforce task-scoped budgets, token-delta monitoring, semantic circuit breakers, and inline enforcement.

task-scoped budgets. Hard-stop execution at a granular level (e.g., $0.50 per Task ID).
token-delta monitoring. Terminate execution when context growth between steps exceeds a fixed delta or percentage (e.g., context_growth > 25%).
semantic circuit breakers. If an agent calls the same tool with the same arguments three times and receives the same error, the circuit must break.
inline enforcement. Controls must execute before the next LLM call. This is the only way to prevent the cost from being incurred.
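
The first two controls can be sketched as a small per-task monitor. This is a minimal illustration under stated assumptions: `TaskLimits` and `TaskMonitor` are hypothetical names, the $0.50 and 25% thresholds are the example values from the list above, and the caller is assumed to supply token counts and per-call costs from its own accounting.

```python
from dataclasses import dataclass
from typing import Optional


class SecurityPolicyViolation(Exception):
    """Raised when a task breaches an execution policy."""


@dataclass
class TaskLimits:
    max_cost_usd: float = 0.50      # hard stop per task ID
    max_growth_ratio: float = 0.25  # allowed context growth between steps


class TaskMonitor:
    """Tracks spend and context growth for a single task."""

    def __init__(self, limits: TaskLimits):
        self.limits = limits
        self.spent_usd = 0.0
        self.last_context_tokens: Optional[int] = None

    def record_cost(self, cost_usd: float) -> None:
        """Add the cost of a completed LLM call to the task total."""
        self.spent_usd += cost_usd

    def check(self, context_tokens: int) -> None:
        """Call before the next LLM request; raises if a limit is breached."""
        if self.spent_usd >= self.limits.max_cost_usd:
            raise SecurityPolicyViolation("Task budget depleted")
        prev = self.last_context_tokens
        if prev:
            growth = (context_tokens - prev) / prev
            if growth > self.limits.max_growth_ratio:
                raise SecurityPolicyViolation(
                    f"Context grew {growth:.0%} in one step"
                )
        self.last_context_tokens = context_tokens


monitor = TaskMonitor(TaskLimits())
monitor.check(1_000)  # first step, nothing to compare against
monitor.check(1_200)  # 20% growth, within the limit
try:
    monitor.check(1_600)  # about 33% growth, over the limit
except SecurityPolicyViolation as exc:
    print("blocked:", exc)
```

Because `check` runs before the request is sent, a violation stops the task without paying for the call that would have breached the limit.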

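class SecurityPolicyViolation(Exception):
    """Raised when a task breaches an execution policy."""

# budget_exceeded, is_livelocked, and proceed_to_llm are placeholders
# for your own implementations.
def execution_guard(task_id, current_context, next_action):
    # Runs inline, before the next LLM call, so no further cost is incurred.
    if budget_exceeded(task_id):
        raise SecurityPolicyViolation("Task budget depleted")

    if is_livelocked(task_id, next_action):
        raise SecurityPolicyViolation("Semantic loop detected")

    return proceed_to_llm(next_action)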
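
The guard leaves is_livelocked undefined. One possible sketch of the underlying check follows the circuit breaker rule above: count how often the same tool call with the same arguments produces the same error. `LoopBreaker` and `record_failure` are hypothetical names, and matching here is exact; a real system would likely need to normalize errors first (for example, stripping timestamps or request IDs), which this sketch does not attempt.

```python
import hashlib
import json
from collections import defaultdict


class LoopBreaker:
    """Trips when the same tool call keeps producing the same error."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.counts = defaultdict(int)  # (task_id, fingerprint) -> repeat count

    @staticmethod
    def _fingerprint(tool: str, args: dict, error: str) -> str:
        # Canonical JSON so that argument order does not change the hash.
        payload = json.dumps([tool, args, error], sort_keys=True, default=str)
        return hashlib.sha256(payload.encode()).hexdigest()

    def record_failure(self, task_id: str, tool: str, args: dict,
                       error: str) -> bool:
        """Record a failed call; return True once the circuit should break."""
        key = (task_id, self._fingerprint(tool, args, error))
        self.counts[key] += 1
        return self.counts[key] >= self.threshold


breaker = LoopBreaker(threshold=3)
trips = [
    breaker.record_failure("task-1", "fetch", {"url": "https://example.com"},
                           "504 Gateway Timeout")
    for _ in range(3)
]
print(trips)
```

Counts are scoped per task ID, so one task tripping its breaker does not affect another task that happens to hit the same error.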

If you are looking at the bottom line, we need to track more than just total spend. Metrics like cost per completed task, the rate at which token usage accelerates per retry, and how often one agent's retry loop triggers rate limiting for other users are critical.
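
The first two metrics can be computed from per-task records. The following is a minimal sketch, assuming each task logs its total cost, whether it completed, and the prompt size of each attempt; `TaskRecord` and both function names are hypothetical, and the sample figures are made up for illustration. The third metric depends on attributing provider rate-limit responses to the agent that caused them, which is not covered here.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    cost_usd: float
    completed: bool
    tokens_per_attempt: list  # prompt tokens for the first attempt and each retry


def cost_per_completed_task(records) -> float:
    """Total spend, including failed tasks, divided by completed tasks."""
    completed = sum(1 for r in records if r.completed)
    total = sum(r.cost_usd for r in records)
    return total / completed if completed else float("inf")


def token_acceleration(tokens_per_attempt) -> float:
    """Average change in per-retry token growth (the second difference)."""
    deltas = [b - a for a, b in zip(tokens_per_attempt, tokens_per_attempt[1:])]
    accel = [b - a for a, b in zip(deltas, deltas[1:])]
    return sum(accel) / len(accel) if accel else 0.0


# Made-up sample data: one healthy task and one runaway task.
records = [
    TaskRecord(cost_usd=0.10, completed=True, tokens_per_attempt=[1000, 1500]),
    TaskRecord(cost_usd=0.50, completed=False,
               tokens_per_attempt=[1000, 1500, 2500, 4500]),
]
print(cost_per_completed_task(records))
print([token_acceleration(r.tokens_per_attempt) for r in records])
```

A positive acceleration means each retry adds more tokens than the one before it, which is the amplification pattern described earlier; a value near zero indicates steady, linear growth.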