
Jan 2, 2026

Agent Architectures: Why We Built an Agent Environment

Comparing tool-calling, RAG, workflows, and execution environments—and why we converged on Agent Environments

Geng Sng, CTO

The Architecture Question

When you decide to build an LLM-powered agent, the first question isn't "which model?" It's "what architecture pattern should we use?"

This question matters more than most teams realize. The architecture you choose determines:

  • What your agent can do (capabilities)

  • How safe it is (blast radius)

  • How expensive it is to run (cost structure)

  • How hard it is to debug (observability)

Over the past year at Cogent, we've evolved through multiple agent architectures. We started with simple tool-calling, added RAG, experimented with workflows, and ultimately converged on what we call an Agent Environment—a pattern that combines tools, code execution, and sandboxed isolation.

This post explains why we made these architectural choices, compares the tradeoffs between patterns, and shares lessons about when each approach makes sense.

Five Agent Architecture Patterns

Let's start by defining the landscape. Most production agents fall into one of five architectural patterns:

1. Tool-Calling Agents

Pattern: LLM chooses from predefined functions, you execute them.

# Simplified example
tools = [
    {"name": "query_database", "parameters": {...}},
    {"name": "create_ticket", "parameters": {...}},
]

response = llm.chat(
    messages=[...],
    tools=tools,  # LLM decides which to call
)

if response.tool_calls:
    for tool_call in response.tool_calls:
        result = execute_tool(tool_call.name, tool_call.arguments)

Characteristics:

  • ✅ Simple to reason about (fixed set of tools)

  • ✅ Easy to control (you define what's possible)

  • ✅ Fast (low latency per call)

  • ❌ Limited flexibility (only predefined functions)

  • ❌ Coordination complexity (need orchestration layer for multi-step)

Best for: Structured tasks with known operations (customer support, data extraction)

2. RAG Agents

Pattern: LLM queries knowledge base, generates answer from context.

# Simplified example
def rag_agent(query):
    # 1. Retrieve relevant context
    docs = vector_store.search(query, top_k=5)

    # 2. Generate answer from context
    response = llm.chat(
        messages=[
            {"role": "system", "content": "Answer using the provided context."},
            {"role": "user", "content": f"Context: {docs}\n Question: {query}"}
        ]
    )
    return response

Characteristics:

  • ✅ Grounded in facts (retrieves actual data)

  • ✅ Scalable knowledge (can index millions of docs)

  • ✅ Explainable (cite sources)

  • ❌ No actions (read-only, can't change state)

  • ❌ Retrieval quality matters (GIGO: garbage in, garbage out)

  • ❌ Context window limits (can't fit all relevant docs)

Best for: Q&A over large knowledge bases (documentation, support, research)

3. ReAct Agents

Pattern: LLM iteratively reasons about the next step, acts by calling tools, then updates based on observations until it reaches a final answer.

# Simplified example
def react_agent(question):
    messages = [
        {"role": "system", "content": "You can use tools. Think step-by-step and call tools when helpful."},
        {"role": "user", "content": question},
    ]

    while True:
        # 1) Model decides next step (either call a tool or answer)
        step = llm.chat(messages=messages)

        if step.get("tool_call"):
            tool_name = step["tool_call"]["name"]
            tool_args = step["tool_call"]["args"]

            # 2) Execute tool and capture observation
            observation = tools[tool_name](**tool_args)

            # 3) Feed observation back to the model
            messages.append(step["message"])
            messages.append({"role": "tool", "name": tool_name, "content": str(observation)})
        else:
            # 4) Final response
            return step["message"]["content"]


Characteristics:

  • ✅ Clear control loop (act → observe → repeat)

  • ✅ Adaptive (chooses next action based on intermediate results)

  • ✅ General-purpose (works across many tasks as long as tools exist)

  • ❌ Not sufficient by itself: assumes tools are safe, reliable, and well-defined

  • ❌ Brittle without good interfaces (tool schemas, error handling, rate limits, retries)

  • ❌ Hard to control and evaluate without guardrails (step limits, budgets, logging, stopping criteria; see the sketch below)

Best for: Tool-using assistants where the right next step depends on intermediate observations (investigations, troubleshooting, research).
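
To make those guardrails concrete, here is a minimal sketch of how a hard step limit, a rough token budget, and per-step logging can be layered onto the simplified loop above. It reuses the illustrative llm.chat() response shape from the example; the "usage" field is an added assumption, not a real API.

# Minimal sketch: guardrails around the simplified ReAct loop above.
import logging

MAX_STEPS = 8
TOKEN_BUDGET = 100_000

def bounded_react_agent(question, llm, tools):
    messages = [
        {"role": "system", "content": "You can use tools. Call them when helpful."},
        {"role": "user", "content": question},
    ]
    tokens_used = 0

    for step_num in range(MAX_STEPS):
        step = llm.chat(messages=messages)
        tokens_used += step.get("usage", {}).get("total_tokens", 0)  # assumed usage field
        if tokens_used > TOKEN_BUDGET:
            logging.warning("Token budget exceeded at step %d", step_num)
            return "Stopped: token budget exceeded."

        if not step.get("tool_call"):
            return step["message"]["content"]  # final answer

        name, args = step["tool_call"]["name"], step["tool_call"]["args"]
        logging.info("step=%d tool=%s args=%s", step_num, name, args)
        try:
            observation = tools[name](**args)
        except Exception as exc:  # surface tool failures to the model instead of crashing the loop
            observation = f"Tool error: {exc}"

        messages.append(step["message"])
        messages.append({"role": "tool", "name": name, "content": str(observation)})

    return "Stopped: step limit reached."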

4. Agentic Workflows

Pattern: Chain multiple agents, each with specialized roles.

# Simplified example (LangGraph-style)
workflow = StateGraph(AgentState)  # AgentState: shared state schema (e.g., a TypedDict)

workflow.add_node("researcher", research_agent)
workflow.add_node("analyzer", analysis_agent)
workflow.add_node("writer", writing_agent)

workflow.set_entry_point("researcher")
workflow.add_edge("researcher", "analyzer")
workflow.add_edge("analyzer", "writer")

result = workflow.compile().invoke({"task": "Analyze vulnerability trends"})

Characteristics:

  • ✅ Compositional (break complex tasks into steps)

  • ✅ Specialized agents (each does one thing well)

  • ✅ Observable (can track state transitions)

  • ❌ Coordination overhead (agents must agree on state)

  • ❌ Failure cascades (one agent failure breaks the chain)

  • ❌ Prompt sprawl (each agent needs its own prompt)

Best for: Multi-step tasks with clear phases (research → analyze → report)

5. Agent Environments

Pattern: LLM has tools + code execution in isolated sandbox.

# Simplified example
class AgentEnvironment:
    def __init__(self):
        self.sandbox = create_isolated_sandbox()
        self.tools = load_mcp_tools()  # Predefined tools

    async def execute(self, agent_request):
        # Agent can use tools OR write code
        if agent_request.type == "tool_call":
            return await self.tools.execute(agent_request)
        elif agent_request.type == "code":
            return await self.sandbox.run(agent_request.code)

Characteristics:

  • ✅ Maximum flexibility (tools + arbitrary code)

  • ✅ Self-correcting (agent can debug its own code)

  • ✅ Exploratory (can try multiple approaches)

  • ❌ Harder to control (code execution is unbounded)

  • ❌ More expensive (sandboxes have overhead)

  • ❌ Requires isolation (code could be malicious)

Best for: Complex, open-ended tasks requiring exploration and verification (data analysis, investigations)

Treat Your Agent Stack Like Layers

An agent system is a stack, and most “patterns” live at different layers of that stack.

The Agent Stack

  1. Interface layer (What the model can call)

    • Tools / function calling / MCP servers / APIs

    • Also includes permissions (which tools, with what scopes)

  2. Knowledge layer (What the model can look up)

    • RAG, search, memory, KBs, embeddings, citations/provenance

  3. Control layer (How decisions unfold over time)

    • Single-shot tool call vs ReAct loop vs planner/critic loops

    • Budgets, step limits, retries, stopping criteria

  4. Execution layer (Where work runs, safely)

    • Host process vs sandboxed execution environment

    • Network policy, credentials, resource limits, filesystem isolation

    • Producing artifacts (tables, charts), capturing stdout/errors

Most production agent systems are built from a small set of building blocks. They differ along four axes: interface, knowledge, control, and execution. The patterns people talk about (tool calling, ReAct, RAG, workflows, environments) are really choices along these axes.
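
As a rough illustration (the names here are hypothetical, not a real framework), you can read an agent system as an explicit set of choices along those four axes:

# Illustrative only: the four axes written out as configuration. Names are made up.
from dataclasses import dataclass, field

@dataclass
class AgentStackConfig:
    # Interface layer: what the model can call, and with what scopes
    tools: list = field(default_factory=lambda: ["query_vuln_db", "create_jira_ticket"])
    tool_scopes: dict = field(default_factory=lambda: {"query_vuln_db": "read-only"})

    # Knowledge layer: what the model can look up
    retrieval: str = "vector_search"          # or "none", "sql", "hybrid"

    # Control layer: how decisions unfold over time
    control_loop: str = "react"               # or "single_shot", "planner_critic"
    max_steps: int = 10
    token_budget: int = 200_000

    # Execution layer: where work runs, and how it is contained
    execution: str = "sandbox"                # or "host_process"
    network_policy: str = "deny_all_except_approved"
    timeout_seconds: int = 300

config = AgentStackConfig()  # tool-calling + RAG + ReAct loop + sandboxed execution

Swapping a value on one axis (say, execution from host_process to sandbox) changes the safety and cost profile without touching the others.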

Why We Converged on Agent Environments

Cogent's initial use case was complex vulnerability analysis, but that quickly grew into many use cases across vulnerability management and enterprise security workflows:

  • Data is messy: Multiple scanners, different schemas, inconsistent naming

  • Questions are open-ended: "Why did CVE-2024-1234 spike in priority?" requires investigation

  • Context is large: 1,000,000+ vulnerabilities, 50,000+ assets, millions of database rows

  • Actions are varied: Query databases, generate charts, create tickets, run remediations

Given this reality, we converged on a clear design choice: keep the agent loop simple, and put the power in the environment.

We made this bet because the underlying trajectory was obvious:

  1. Models were getting smarter → less need for heavyweight orchestration to “force” good reasoning

  2. Context windows were getting longer → more of the investigation can stay in one coherent loop

  3. Tool use was improving → agents could reliably drive real workflows via tools

That made workflow-heavy or multi-agent designs tempting—but also risky. Their hidden complexity (handoffs, state synchronization, brittleness, debugging overhead) tends to erase the gains you’re trying to achieve.

So we built around Agent Environments: a governed execution layer with well-scoped tools (and optional code execution) that lets a straightforward agentic loop handle messy data, large context, and real actions—without drowning in orchestration.

Agent Architecture Evolution

Let's walk through how we evolved architecturally:

Stage 1: Tool-Calling Agent

What we built:

tools = [
    "query_vuln_db",  # SQL query against knowledge base
    "get_cve_details",  # Fetch CVE metadata
    "create_jira_ticket",  # Create remediation ticket
]

What worked:

  • Fast responses (single LLM call + tool execution)

  • Easy to control (predefined queries)

  • Simple to debug (clear tool execution logs)

What broke:

  • Exploratory analysis was impossible: "Show me trends in RCE vulnerabilities over 6 months" required custom SQL that we hadn't pre-defined

  • Rigid schema: Every new analysis needed a new tool

  • Can't self-correct: If query returned empty, agent couldn't adjust and retry

Example failure:

User: "Why did our CVSS score distribution change in January?"

Agent: [Calls query_vuln_db with hardcoded date range]

User: "Why did our CVSS score distribution change in January?"

Agent: [Calls query_vuln_db with hardcoded date range]

User: "Why did our CVSS score distribution change in January?"

Agent: [Calls query_vuln_db with hardcoded date range]

User: "Why did our CVSS score distribution change in January?"

Agent: [Calls query_vuln_db with hardcoded date range]

Stage 2: ReAct Tool-Calling + Improved Retrieval with RAG

What we added:

tools = [
    # ... existing tools
    "search_documentation",  # RAG over security docs
    "search_advisories",  # RAG over vendor advisories
]

What worked:

  • Grounded answers in actual documentation

  • Could cite sources ("According to Qualys advisory...")

  • Reduced hallucinations about CVE details

What still broke:

  • Still couldn't explore: RAG retrieves context but doesn't execute analysis

  • Context window pressure: 10,000 vulnerabilities × 500 tokens each is roughly 5 million tokens, far more than fits in any context window

  • No visualization: Agent could describe trends but couldn't generate charts

Example limitation:

User: "Show me a chart of critical vulnerabilities by asset type"

Agent: [Searches docs, finds chart examples]

User: "Show me a chart of critical vulnerabilities by asset type"

Agent: [Searches docs, finds chart examples]

User: "Show me a chart of critical vulnerabilities by asset type"

Agent: [Searches docs, finds chart examples]

User: "Show me a chart of critical vulnerabilities by asset type"

Agent: [Searches docs, finds chart examples]

Stage 3: ReAct Tool-Calling + RAG + Code Execution

The breakthrough: Let the agent write Python code.

tools = [
    # ... existing tools
    "execute_code",  # NEW: Run Python in sandbox
]

# Agent can now do this:
code = """
import pyathena
import pandas as pd
import matplotlib.pyplot as plt

# Query raw data from Athena
conn = pyathena.connect(...)
df = pd.read_sql("SELECT asset_type, severity, count(*) FROM vulns GROUP BY ...", conn)

# Generate chart
df.plot(kind='bar')
plt.savefig('chart.png')
"""

What this unlocked:

  • Exploratory analysis: Agent can query → inspect → adjust → re-query

  • Self-correction: If query fails, agent reads error and fixes it

  • Visualization: Agent generates charts with matplotlib/plotly

  • Complex transformations: Multi-step pandas operations, not limited to SQL

Example success:

User: "Show me a chart of critical vulnerabilities by asset type"

Agent: [Writes code to query Athena, group by asset_type, plot bar chart]
Code executes successfully
Agent: "Here's the chart [chart.png]. EC2 instances have the most critical vulns."

User: "Can you break that down by environment?"

Agent: [Modifies code to add environment faceting]

User: "Show me a chart of critical vulnerabilities by asset type"

Agent: [Writes code to query Athena, group by asset_type, plot bar chart]
Code executes successfully
Agent: "Here's the chart [chart.png]. EC2 instances have the most critical vulns."

User: "Can you break that down by environment?"

Agent: [Modifies code to add environment faceting]

User: "Show me a chart of critical vulnerabilities by asset type"

Agent: [Writes code to query Athena, group by asset_type, plot bar chart]
Code executes successfully
Agent: "Here's the chart [chart.png]. EC2 instances have the most critical vulns."

User: "Can you break that down by environment?"

Agent: [Modifies code to add environment faceting]

User: "Show me a chart of critical vulnerabilities by asset type"

Agent: [Writes code to query Athena, group by asset_type, plot bar chart]
Code executes successfully
Agent: "Here's the chart [chart.png]. EC2 instances have the most critical vulns."

User: "Can you break that down by environment?"

Agent: [Modifies code to add environment faceting]

Stage 4: Agent Environment

What we added: Sandboxed isolation + explicit boundaries.

The breakthrough in Stage 3 introduced new risks:

  • Code could query any Athena table (not just allowed ones)

  • Code could make arbitrary network requests

  • Code could leak credentials via print statements

  • Code could consume unbounded resources (memory, CPU)

We needed infrastructure:

class AgentEnvironment:
    def __init__(self, tenant_id):
        self.sandbox = E2BSandbox(
            network_policy=DENY_ALL_EXCEPT_APPROVED,  # Network isolation
            env_vars=inject_credentials(tenant_id),  # Credential injection
            allowed_tables=get_tenant_tables(tenant_id),  # Data scoping
            timeout=300,  # 5-minute execution limit
        )
        self.tools = load_mcp_tools()  # Predefined tools still available

    async def execute(self, request):
        if request.type == "tool_call":
            return await self.tools.execute(request)
        elif request.type == "code":
            return await self.sandbox.run(request.code)

What this provides:

  • Flexibility of code execution (Stage 3)

  • Safety of predefined tools (Stage 1)

  • Isolation boundaries (network, credentials, data)

  • Failure containment (sandbox crashes don't affect session)

Architecture Comparison: When to Use Each Pattern

| Pattern | Best For | Avoid When |
| --- | --- | --- |
| Tool-Calling | Structured tasks, known operations, customer support | Exploratory analysis, data science, research |
| RAG | Q&A over docs, grounded answers, fact lookup | Need to take actions, modify state, generate visualizations |
| ReAct | Adaptive, flexible multi-step tool use | Need deterministic, auditable execution (strict runbooks / compliance) |
| Agentic Workflows | Multi-step tasks, clear phases, specialized roles | Single-step tasks, tight latency requirements |
| Agent Environment | Complex analysis, exploration, self-correction that require safe code execution | Simple structured tasks, maximum control needed |

Decision Tree





The Agent Environment Pattern: Deep Dive

Let's break down what makes an "Agent Environment" distinct:

1. Dual Execution Modes

Tools (structured) + Code (unstructured)

# Agent decides which mode to use:

# Mode 1: Predefined tool (fast, safe)
response = agent.call_tool("query_vuln_db", {"tenant": "acme", "severity": "critical"})

# Mode 2: Code execution (flexible, exploratory)
response = agent.execute_code("""
import pyathena
conn = pyathena.connect(...)
# Custom analysis logic
""")

Why both?

  • Tools for common operations (fast path)

  • Code for edge cases and exploration (slow path)

  • Agent chooses based on task complexity

2. Sandboxed Runtime Isolation

Network, filesystem, and credentials are isolated by default. The agent runs inside a locked-down runtime where access is explicit, scoped, and auditable.





An “agent sandbox” is more than just a place to run code. In practice, the environment becomes the execution substrate that lets agents do real work safely and repeatably:

  1. A reliable filesystem abstraction

    Agents can read/write intermediate artifacts (queries, CSVs, charts) without leaking data or depending on local machines.

  2. Long-running background jobs

    Kick off heavier tasks (large queries, joins, report generation) and stream progress/results back to the agent.

  3. Packaging artifacts as reusable skills

    Turn common operations into versioned, testable building blocks (e.g., “CVE spike analysis”, “asset exposure report”).

  4. Scalable access to raw data lakes

    Query Athena/S3 directly with tenant-scoped permissions, instead of copying data into prompts.

  5. Local compute primitives for agent workflows

    Some systems add in-sandbox analytics (e.g., DuckDB), embedding computation, or lightweight indices to accelerate agentic reasoning (not always needed, but the environment makes it possible; see the sketch below).

A core principle is that runtime isolation is what makes agent execution bounded, reproducible, and production-grade.
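
As one concrete illustration of point 5 above, here is a hedged sketch of in-sandbox analytics with DuckDB over an artifact the agent wrote earlier; the file path and column names are made up.

# Runs inside the sandbox: summarize an intermediate CSV artifact the agent produced earlier.
# The path and columns are illustrative.
import duckdb

con = duckdb.connect()  # in-memory database, local to the sandbox

summary = con.execute("""
    SELECT asset_type, count(*) AS critical_vulns
    FROM read_csv_auto('/workspace/artifacts/critical_vulns.csv')
    GROUP BY asset_type
    ORDER BY critical_vulns DESC
""").df()

summary.to_csv('/workspace/artifacts/critical_by_asset_type.csv', index=False)
print(summary.head())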

3. Context Scoping

Agent sees only what it needs:

# Scope resolution example

agent_scope_id = "ab9d8xc"   # scope granted by RBAC
query_tenant_id = "acme"     # tenant selected in UI

env = AgentEnvironment(
    agent_scope_id=agent_scope_id,
    query_tenant=query_tenant_id,
)

# --- Scope resolution ensures calls respect permissions ---
scope = env.resolve_scope()

# Tools never take tenant/schema params; they use the resolved scope.
rows = env.tools.sql.query("SELECT * FROM vulnerabilities LIMIT 10")
# executed as: SELECT * FROM acme.vulnerabilities LIMIT 10

# If RBAC doesn't allow this tenant, resolve_scope fails (or env creation fails).
# e.g., "ab9d8xc" can't access "tools.sql.query" -> PermissionError

4. Progressive Disclosure

Start simple, add complexity as needed:

# Step 1: Try predefined tool
response = agent.call_tool("query_vuln_db", {...})

if response.empty:
    # Step 2: Fall back to code execution
    response = agent.execute_code("""
    # Custom query with adjusted date format
    """)

if response.still_wrong:
    # Step 3: Inspect raw data and adjust
    response = agent.execute_code("""
    # Debug: Print schema, adjust query
    """)

Agent self-corrects without human intervention.

Pros and Cons: The Honest Assessment

Advantages of Agent Environments

1. Maximum Flexibility

Agents aren't limited to predefined operations. If a new analysis pattern emerges, the agent adapts—no need to deploy new tools.

Example: User asks "Show me vulnerabilities that appeared after a scanner update."

  • Tool-calling agent: ❌ No predefined tool for this

  • Agent environment: ✅ Writes SQL to compare pre/post scanner runs
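
For illustration, a sketch of the kind of query the agent might write for this request; the table and column names are hypothetical.

# Hedged sketch: vulnerabilities first seen between the pre- and post-update scanner runs.
import pandas as pd
import pyathena

conn = pyathena.connect(region_name="us-east-1", s3_staging_dir="s3://example-athena-results/")

query = """
SELECT v.cve_id, v.severity, v.first_seen
FROM vulns v
WHERE v.first_seen >  (SELECT max(completed_at) FROM scanner_runs WHERE label = 'pre-update')
  AND v.first_seen <= (SELECT max(completed_at) FROM scanner_runs WHERE label = 'post-update')
"""
new_after_update = pd.read_sql(query, conn)
print(f"{len(new_after_update)} vulnerabilities first appeared after the scanner update")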

2. Self-Correction

When code fails, agent reads the error and fixes it.

Example traceback:

# Agent's first attempt:
df = pd.read_sql("SELECT * FROM vulns WHERE date > '2024-01-01'")
# Error: column "date" does not exist

# Agent's second attempt (self-corrected):
df = pd.read_sql("SELECT * FROM vulns WHERE created_at > '2024-01-01'")
# Success ✅

3. Composability

Tools + code means agents can:

  • Use tools for common operations (fast)

  • Write code for edge cases (flexible)

  • Combine tools in novel ways

4. Observability

Every code execution is logged with:

  • Input (code)

  • Output (result)

  • Error (traceback)

  • Duration (latency)

Easier to debug than nested tool calls.
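
A sketch of what one execution record might look like; the field names are illustrative, not our actual schema.

# Illustrative shape of a single execution log record.
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class ExecutionRecord:
    session_id: str
    kind: str                 # "tool_call" or "code"
    input: str                # tool arguments or source code
    output: str               # result or captured stdout
    error: Optional[str]      # traceback if the execution failed
    duration_ms: int

record = ExecutionRecord(
    session_id="sess_123",
    kind="code",
    input="df = pd.read_sql(...)",
    output="42 rows",
    error=None,
    duration_ms=1840,
)
print(json.dumps(asdict(record)))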

Disadvantages of Agent Environments

1. Harder to Control

Code execution is unbounded. Agent could:

  • Write an infinite loop (resource exhaustion)

  • Query tables it shouldn't (data leak)

  • Make network requests to arbitrary domains (exfiltration)

Mitigation: Sandboxing (network isolation, quotas, allowed tables)

2. Higher Latency

Spinning up a sandbox takes time:

  • Tool call: ~200ms

  • Code execution: ~2-5 seconds (sandbox startup + execution)

Mitigation: Keep sandboxes warm, batch operations
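
A rough sketch of a warm pool, reusing the hypothetical create_isolated_sandbox() factory from the earlier example; the reset() call is an assumption about clearing state between sessions.

# Rough sketch: pay sandbox startup cost ahead of time, hand out a ready sandbox per session.
import queue
import threading

class SandboxPool:
    def __init__(self, size=4):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(create_isolated_sandbox())  # hypothetical factory, as in the earlier example

    def acquire(self, timeout=10):
        return self._pool.get(timeout=timeout)  # blocks briefly if every sandbox is busy

    def release(self, sandbox):
        sandbox.reset()  # assumed: clears filesystem and state between sessions
        self._pool.put(sandbox)

    def replace_async(self):
        # Refill after a sandbox crashes or is retired, off the hot path.
        threading.Thread(target=lambda: self._pool.put(create_isolated_sandbox()), daemon=True).start()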

3. More Expensive

Sandboxes cost more than function calls:

  • E2B sandbox: ~$0.002 per minute

  • Tool call: ~$0 (just LLM + function execution)

Mitigation: Use tools for common paths, code for exploration
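
A back-of-envelope comparison using the E2B figure above; the session length and daily volume are assumptions, and real costs vary.

# Back-of-envelope sandbox cost; only the $/minute figure comes from above, the rest are assumptions.
SANDBOX_COST_PER_MINUTE = 0.002

per_execution = SANDBOX_COST_PER_MINUTE * (5 / 60)   # one 5-second code execution: ~$0.00017
per_session = SANDBOX_COST_PER_MINUTE * 15           # sandbox kept warm for a 15-minute investigation: $0.03
per_day = per_session * 5_000                        # 5,000 such sessions per day: ~$150

print(per_execution, per_session, per_day)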

4. Requires Isolation Expertise

You need to understand:

  • Network policies (allow-lists, deny-lists)

  • Credential management (injection, rotation)

  • Resource limits (memory, CPU, timeout)

  • Failure modes (sandbox crashes, timeouts)

Mitigation: Use managed sandboxes (E2B, Modal, Runpod)

When Agent Environments Make Sense

Use Agent Environments when:

  1. Tasks are exploratory: You don't know all queries in advance

  2. Data is complex: Multiple schemas, inconsistent naming, needs transformation

  3. Users expect flexibility: "Show me X" where X varies widely

  4. Self-correction is valuable: Agent can debug and retry

  5. Actions are varied: Query databases, generate charts, create tickets, run analysis

Don't use Agent Environments when:

  1. Tasks are structured: All operations known in advance → Tool-calling is simpler

  2. Latency is critical: < 500ms response time → Tools are faster

  3. Control is paramount: You can't risk any deviation → Tools are safer

  4. Budget is tight: Sandboxes are expensive → RAG or tools are cheaper

Architectural Evolution: What We'd Do Differently

What Worked

1. Starting simple: Tool-calling → RAG → Code execution was the right progression. We didn't jump to sandboxes on day 1.

2. Isolation from the start: Even in Stage 3, we used E2B sandboxes. Didn't try to run untrusted code in the main process.

3. Multi-tenancy as storage isolation: We never considered logical multi-tenancy (tenant_id column). Physical isolation (dedicated DBs) was the right call.

What We'd Change

1. Invest in observability earlier: We should have built tracing + logging infrastructure in Stage 1, not Stage 3.

2. Explicit tool access control: We should have defined "which agent can call which tools" from day 1. Now we're retrofitting policy.

3. Sandbox performance: We didn't anticipate latency would be a concern. Should have tested cold-start times earlier.

4. Cost modeling: We underestimated sandbox costs. Should have projected $/query early.

The Future: Secure Agent Harnesses

We're evolving toward secure agent harnesses: a runtime control plane that decides whether a tool call is allowed, executes it in an isolated sandbox, and produces auditable evidence of the decision.
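
As a rough sketch of the shape this takes (the policy format and names are hypothetical, not the actual harness):

# Hypothetical sketch: check policy before execution, record auditable evidence of the decision.
import json
import time

POLICY = {
    "triage_agent": {"query_vuln_db": "allow", "create_jira_ticket": "require_approval"},
}

def harnessed_call(agent_id, tool_name, args, tools, audit_log):
    decision = POLICY.get(agent_id, {}).get(tool_name, "deny")
    record = {"ts": time.time(), "agent": agent_id, "tool": tool_name,
              "args": args, "decision": decision}

    result = None
    if decision == "allow":
        result = tools[tool_name](**args)  # would run inside the isolated sandbox
        record["outcome"] = "executed"
    elif decision == "require_approval":
        record["outcome"] = "pending_approval"  # queued for a human reviewer
    else:
        record["outcome"] = "blocked"

    audit_log.append(json.dumps(record))  # auditable evidence of the decision
    return result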

More to come on this topic in a subsequent blog post.

Key Takeaways

1. Architecture Matters More Than Model

The difference between tool-calling and agent environments is more impactful than the difference between GPT-4 and Claude.

Architecture determines:

  • What's possible (capabilities)

  • What's safe (blast radius)

  • What's affordable (cost structure)

2. No Single Architecture Fits All

Each pattern has strengths:

  • Tool-Calling: Fast, safe, structured

  • RAG: Grounded, scalable, explainable

  • Workflows: Compositional, observable, specialized

  • Environments: Flexible, self-correcting, exploratory

Choose based on your use case, not hype.

3. Evolution Is Natural

You'll likely progress through architectures:

  1. Start with tool-calling (simple, safe)

  2. Add RAG if you need grounding (docs, facts)

  3. Add code execution if you need flexibility (exploration)

  4. Add sandboxing if you need safety (isolation)

Don't try to build the final architecture on day 1.

4. Isolation Is Infrastructure, Not Prompts

If your agent can execute code, you need:

  • Network isolation (allow-lists)

  • Credential management (injection, rotation)

  • Resource limits (timeouts, memory)

  • Failure containment (sandbox crashes)

Prompts can't provide these guarantees. Infrastructure can.

Conclusion

After a year of building agents for enterprise security, we converged on Agent Environments: tools + code execution in isolated sandboxes.

This pattern gives us:

  • Flexibility to explore complex data

  • Safety through sandboxed isolation

  • Self-correction when things fail

  • Composability of tools and code

It's not the right pattern for every use case. But for complex, exploratory tasks with high-stakes data, it's the only architecture that scales.

If you're building agents and wrestling with these tradeoffs, we hope this helps.

Join Us

If you're excited about building secure autonomous AI agents that can operate in mission-critical environments with provable, controlled autonomy—we're hiring.

We're looking for engineers who:

  • Care about both performance and correctness

  • Embrace incremental improvement over perfect designs

  • Want to work at the intersection of cybersecurity and agentic systems

Check out our careers page or reach out directly.
