Jan 23, 2026
What We Learned Building Safe Agents for Enterprise Security
Real lessons from running AI agents in production enterprise environments
Geng Sng, CTO
The Problem: Agents in High-Stakes Environments
An agent analyzes 11,470,000 vulnerabilities across your infrastructure and decides which to prioritize. It queries production databases, generates remediation plans, and creates tickets for your engineering teams.
This is high-stakes automation. A single mistake, such as querying the wrong tenant's data, leaking credentials into a ticket, or hallucinating a critical severity, has real consequences.
Over the past year at Cogent, we've learned that building safe agents isn't about better prompts. It's about infrastructure: isolation boundaries, credential management, and failure containment.
This post shares what we learned—the patterns that work, the ones that don't, and what we're still figuring out.
What We Actually Built
Before diving into lessons, here's what we're running in production:
The Agent Environment
Our agents run in an isolated execution environment (E2B sandboxes) with explicit boundaries:
Network isolation: Default-deny with allow-list for approved domains
Credential injection: AWS STS tokens via environment variables, never in prompts
Context scoping: Each agent gets only the data it needs for its task
Multi-tenant isolation: Storage-level separation (dedicated PostgreSQL, Redis per tenant)
Failure containment: Sandbox crashes don't affect the parent session
What This Enables
Code execution: Agents can run Python notebooks with direct Athena/S3 access
Tool calling: 15+ MCP servers (knowledge base, CVE data, Splunk, Slack, AWS CLI)
Cross-tenant queries: Internal users can query any customer's data while maintaining audit trails
Safe exploration: Agents can experiment without risking production systems
Lesson 1: Network Isolation Is Non-Negotiable
The Failure Mode
Early on, we tried prompt-based safety: "Only query approved endpoints. Don't send data to external services."
This failed spectacularly. Agents would:
Attempt to call unapproved logging services
Try to fetch data from arbitrary URLs in tool responses
Follow redirects to domains outside our control
What We Built Instead
Default-deny network policy with explicit allow-list:
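Something like this sketch, with illustrative domain names and logging hook (the real policy is enforced at the sandbox's network layer, not in application Python):

```python
import logging
from urllib.parse import urlparse

logger = logging.getLogger("egress")

# Illustrative entries only; every domain here must be justified and approved.
ALLOWED_DOMAINS = {
    "athena.us-east-1.amazonaws.com",
    "s3.us-east-1.amazonaws.com",
    "api.slack.com",
}

def egress_allowed(url: str) -> bool:
    """Default-deny: a request passes only if its host is on the allow-list."""
    host = urlparse(url).hostname or ""
    if host not in ALLOWED_DOMAINS:
        # Blocked attempts are logged, which is what gives us the audit trail.
        logger.warning("blocked egress attempt to %r", host)
        return False
    return True
```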
Why This Works
Can't be bypassed: No prompt injection can override network policy
Explicit boundaries: Every domain must be justified and approved
Audit trail: Network attempts to blocked domains are logged
Easy to reason about: List of 10 domains vs. "don't do bad things"
The Tradeoff
Adding domains is friction. Every new integration requires:
Security review of the domain
Update to allow-list
Deployment of updated sandbox config
This is intentional. Friction forces us to think critically about each external dependency.
Lesson 2: Never Put Credentials in Prompts
The Failure Mode
The tempting shortcut, and the one we were careful to avoid, is putting AWS credentials directly in the system prompt.
Problems:
Credentials end up in LLM logs and traces
Agents can echo credentials back in their responses
No automatic rotation: credentials live in prompts indefinitely
What We Built Instead
Inject credentials via environment variables:
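In sketch form, with a hypothetical account ID and role-naming pattern:

```python
import boto3

def mint_tenant_env(tenant_id: str) -> dict[str, str]:
    """Assume a tenant-scoped role and return env vars for the sandbox.

    The role ARN pattern here is hypothetical; the point is that each
    tenant gets its own role-assumed, short-lived credentials.
    """
    sts = boto3.client("sts")
    resp = sts.assume_role(
        RoleArn=f"arn:aws:iam::123456789012:role/tenant-{tenant_id}-agent",
        RoleSessionName=f"agent-{tenant_id}",
        DurationSeconds=3600,  # expires in 1 hour; we refresh at 30 minutes
    )
    creds = resp["Credentials"]
    # Injected into the sandbox environment, never interpolated into a
    # prompt, so they cannot show up in LLM logs or traces.
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }
```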
Key properties:
✅ Never logged: Credentials don't appear in LLM traces
✅ Short-lived: STS tokens expire in 1 hour, refreshed at 30 minutes
✅ Tenant-scoped: Each tenant gets their own role-assumed credentials
✅ Automatic rotation: Token refresh happens without agent involvement
Why This Works
Credentials are infrastructure, not context. They belong in the execution environment, not the prompt.
Lesson 3: Context Delegation Over Full Access
The Failure Mode
Another tempting default is to give agents access to all customer data.
Problems:
Attention dilution: Agents get distracted by irrelevant data
Context window exhaustion: Can't fit all data in prompts
Accidental oversharing: Agents mix data from multiple tenants
Blast radius: One compromised tool sees everything
What We Built Instead
Scoped context views with explicit boundaries.
In practice:
Agents only see data for one tenant at a time
Athena queries are scoped to allowed tables (e.g., tenant_acme.assets)
Environment variables enforce boundaries at runtime
The Pattern
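A minimal sketch of it; the class and field names are illustrative, not our production schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScopedContext:
    """Everything an agent may touch for one task: one tenant, named tables."""
    tenant_id: str
    allowed_tables: frozenset[str]
    time_window_days: int = 30

    def check_table(self, table: str) -> str:
        # Enforced at runtime, so a prompt can't talk the agent past it.
        if table not in self.allowed_tables:
            raise PermissionError(f"{table} is outside this agent's scope")
        return table

# Usage: the query tool resolves every table through the scope first.
ctx = ScopedContext("acme", frozenset({"tenant_acme.assets", "tenant_acme.vulns"}))
ctx.check_table("tenant_acme.assets")      # ok
# ctx.check_table("tenant_globex.assets")  # raises PermissionError
```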
Why This Works
Reduced context: Agents focus on relevant data only
Prevents cross-tenant leakage: Can't accidentally query wrong tenant
Audit trail: Every query includes tenant_id for compliance
Failure containment: Compromise affects one tenant, not all
The Tradeoff
Cross-tenant analysis requires special handling. For example:
"Compare vulnerability trends across all customers" requires aggregated views
We use pre-computed metrics in ClickHouse (historical data warehouse)
Agents query aggregates, not raw tenant data
Lesson 4: Failure Isolation Saves You
The Failure Mode
Early sandboxes shared state across tool calls.
Problems:
One bad tool call poisoned the sandbox for all subsequent calls
Memory leaks accumulated across calls
Credential expiry killed the entire session
What We Built Instead
Isolated execution per tool invocation with failure containment:
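Schematically, it looks like this; a simplified sketch in which the persistence layer is a stand-in and the real execution happens inside an E2B sandbox:

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable

@dataclass
class ToolResult:
    ok: bool
    value: Any

execution_history: list[ToolResult] = []  # stand-in for durable storage

async def persist(result: ToolResult) -> None:
    """Stand-in for the durable write of execution history."""
    await asyncio.sleep(0)
    execution_history.append(result)

async def run_tool(tool: Callable[[], Awaitable[Any]]) -> ToolResult:
    """One isolated invocation; its failure never poisons later calls."""
    try:
        value = await asyncio.wait_for(tool(), timeout=30)  # 30-second tool timeout
        result = ToolResult(ok=True, value=value)
    except Exception as exc:  # timeout, network error, malformed data, ...
        result = ToolResult(ok=False, value=repr(exc))      # graceful degradation
    # asyncio.shield: even if the caller is cancelled mid-await, the
    # history write completes, so failed calls are still logged.
    await asyncio.shield(persist(result))
    return result
```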
Key patterns:
Shielded persistence: asyncio.shield() prevents cancellation from interrupting writes
Timeout enforcement: 5-minute sandbox timeout, 30-second tool timeout
Graceful degradation: One tool failure doesn't break the conversation
Execution history preserved: All tool calls logged even if they fail
Why This Works
Production environments are hostile:
Network can fail mid-request
Credentials can expire
Tools can return malformed data
Users can cancel requests
Isolation means failures are local, not catastrophic.
Lesson 5: Multi-Tenancy Is Storage-Level Isolation
The Failure Mode
Logical multi-tenancy: one database with a tenant_id column.
Problems:
One mistake = data breach: Forgot WHERE clause? You just leaked all tenants' data.
Performance interference: Heavy query from one tenant slows all tenants
Compliance complexity: Hard to prove data isolation to auditors
What We Built Instead
Physical multi-tenancy: Each tenant gets their own infrastructure
| Resource | Isolation Level |
|---|---|
| PostgreSQL | Dedicated Aurora cluster per tenant |
| Redis | Dedicated ElastiCache cluster per tenant |
| S3 buckets | Tenant-prefixed buckets (…) |
| IAM roles | Tenant-specific role assumption |
How agents access tenant data:
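A sketch of the idea; the host pattern and driver choice are illustrative:

```python
import psycopg  # psycopg 3; any driver works, this one is illustrative

def tenant_dsn(tenant_id: str) -> str:
    """Each tenant has a dedicated cluster; this host pattern is hypothetical."""
    return f"postgresql://agent@pg-{tenant_id}.internal:5432/app"

def connect_for(tenant_id: str) -> psycopg.Connection:
    # A forgotten WHERE clause can't leak other tenants' data: their
    # clusters are simply not reachable from this connection.
    return psycopg.connect(tenant_dsn(tenant_id))
```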
Why This Works
Impossible to leak cross-tenant data: Can't query another tenant's database
Performance isolation: One tenant's heavy query doesn't affect others
Compliance proof: Auditors can verify physical isolation
Blast radius containment: Incident affects one tenant, not all
The Tradeoff
Operational complexity increases:
Deploying schema changes requires per-tenant migrations
Monitoring requires per-tenant dashboards
Costs are higher (can't share infrastructure)
But for enterprise security data, this is the only acceptable architecture.
Lesson 6: Observability Must Be Tenant-Aware
The Failure Mode
The pattern we were careful to avoid: LLM tracing that ships every span, customer data included, to shared observability services.
Problems:
Customer data leaks to third-party services (Braintrust, LangSmith)
No tenant isolation in traces
Hard to debug tenant-specific issues
What We Built Instead
Tenant-scoped tracing with PII filtering:
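The shape of the filter, as a sketch; the two regexes are examples, and the production list of credential patterns is much longer:

```python
import json
import re

# Example patterns only; the production list is much longer.
REDACTIONS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                 # AWS access key IDs
    re.compile(r"(?i)aws_secret_access_key\W+\S+"),  # env-style secrets
]

def redact(text: str) -> str:
    """Strip credential-shaped strings before anything leaves our boundary."""
    for pattern in REDACTIONS:
        text = pattern.sub("[REDACTED]", text)
    return text

def export_span(span: dict, tenant_id: str, opted_in: bool) -> dict:
    """Internal audit keeps full fidelity; external export is redacted, opt-in."""
    tagged = {**span, "tenant_id": tenant_id}  # every trace is tenant-scoped
    external = None
    if opted_in:                               # e.g. LangSmith or Braintrust
        external = json.loads(redact(json.dumps(tagged)))
    return {"internal_audit": tagged, "external": external}
```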
Key patterns:
✅ Credential redaction before any external logging
✅ Tenant-scoped trace storage
✅ Opt-in for external services (LangSmith, Braintrust)
✅ Internal audit log always captures full context
Why This Works
You can't debug what you can't observe. But observation can't compromise security.
The balance: Internal audit logs (full fidelity) + External traces (redacted) + Tenant control (opt-in/out).
What We're Still Figuring Out
1. Approval UX for High-Risk Actions
The problem: Some agent actions require human approval:
Changing production firewall rules
Deleting customer data
Modifying IAM policies
Current approach: Agents pause and ask for approval via Slack.
Open questions:
How do we avoid approval fatigue?
Should approvals be sync (blocking) or async?
Who should approve?
2. Context Window Management
The problem: Agents work on long-running tasks (e.g., analyzing 10,000 vulnerabilities).
Current approach:
Compaction: summarization at context window limits (see the sketch after this list)
Chunking large datasets
Delegation to sub-agents with scoped context
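Here is the compaction sketch referenced above; the character budget and summarizer are placeholders (the real system budgets in tokens):

```python
from typing import Callable

def compact(
    messages: list[str],
    summarize: Callable[[list[str]], str],  # placeholder for an LLM summarizer
    max_chars: int = 50_000,                # real budgets are in tokens
) -> list[str]:
    """When history outgrows the budget, fold the oldest half into a summary."""
    if sum(len(m) for m in messages) <= max_chars:
        return messages
    half = len(messages) // 2
    summary = summarize(messages[:half])
    return [f"[summary of earlier work] {summary}", *messages[half:]]
```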
Open questions:
When should we summarize vs. paginate?
How do we preserve important details in summaries?
What's the right granularity for sub-agent delegation?
3. Tool Access Control
The problem: Different agents need different tool access:
Read-only analyst agent (knowledge base queries only)
Action agent (can create tickets, run remediations)
Admin agent (full system access)
Current approach: Agents declare tools in config, auth middleware validates.
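Roughly like this sketch, with an illustrative config shape:

```python
# Illustrative config shape: each agent declares its tools up front.
AGENT_TOOLS = {
    "analyst": {"kb_search", "cve_lookup"},  # read-only
    "action": {"kb_search", "cve_lookup", "create_ticket"},
}

def authorize_tool_call(agent: str, tool: str) -> None:
    """Auth middleware: reject any call to a tool the agent never declared."""
    if tool not in AGENT_TOOLS.get(agent, set()):
        raise PermissionError(f"agent {agent!r} may not call {tool!r}")

authorize_tool_call("analyst", "kb_search")       # ok
# authorize_tool_call("analyst", "create_ticket") # raises PermissionError
```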
Future work:
Policy-as-code for tool access decisions
Explicit allow/deny rules per tool
Audit trail for tool access attempts
This is where we're heading: a PDP/PEP (policy decision point / policy enforcement point) gateway pattern, where agents have a code of conduct that is enforceable at runtime.
4. Testing Agent Safety
The problem: How do you test that isolation actually works?
Current approach:
Adversarial test cases (try to leak credentials, query wrong tenant)
Boundary tests (try to access blocked domains)
Regression tests (verify redaction on known PII patterns; a sketch follows this list)
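The redaction regression tests look roughly like this pytest-style sketch, where redact() is a stand-in for the real filter:

```python
import re

AWS_KEY = re.compile(r"AKIA[0-9A-Z]{16}")

def redact(text: str) -> str:
    """Stand-in for the production redaction filter."""
    return AWS_KEY.sub("[REDACTED]", text)

def test_known_credential_patterns_are_redacted():
    leaked = "found key AKIAABCDEFGHIJKLMNOP in tool output"
    assert "AKIA" not in redact(leaked)

def test_redaction_preserves_surrounding_context():
    assert redact("before AKIAABCDEFGHIJKLMNOP after") == "before [REDACTED] after"
```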
Open questions:
How do we continuously test isolation boundaries?
Can we use LLMs to generate adversarial inputs?
What's the right balance between security and agent capabilities?
Key Takeaways
After a year of running agents in production, here's what we know for certain:
1. Infrastructure > Prompts
Safety must be enforced outside the agent:
Network boundaries (allow-lists)
Credential injection (env vars, not prompts)
Multi-tenancy (storage-level isolation)
Prompts can't provide hard guarantees. Infrastructure can.
2. Failure Isolation Is Critical
Production is hostile. Things will fail:
Tool timeouts
Network errors
Credential expiry
Malformed data
Isolation means failures are local, not catastrophic.
3. Observability ≠ Surveillance
You need to observe agents to debug them. But observation can't compromise security:
Redact credentials before logging
Tenant-scoped traces
Opt-in for external services
4. Multi-Tenancy Is Non-Negotiable
For enterprise security data, logical isolation (tenant_id column) isn't enough. You need physical isolation:
Dedicated databases per tenant
Tenant-specific credentials
Storage-level separation
One SQL mistake can't leak all customers' data.
5. Context Scoping Reduces Risk
Agents with access to everything are dangerous. Give them:
Only the tenant they need
Only the tables they need
Only the time window they need
Least-context access reduces blast radius.
What's Next
We're building toward more explicit tool access control:
The vision: every tool call is evaluated against policy, not just authentication. Policy-as-code, sketched below, determines:
Is this tool allowed for this agent?
Does this query match the tenant scope?
Should this action require approval?
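A minimal sketch of that evaluation, with hypothetical rule and request shapes; a real policy engine would be more expressive:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolRequest:
    agent: str
    tool: str
    tenant_id: str

@dataclass(frozen=True)
class Decision:
    allowed: bool
    needs_approval: bool
    reason: str

# Hypothetical policy-as-code: explicit allow rules plus approval triggers.
ALLOW = {("analyst", "kb_search"), ("action", "create_ticket")}
REQUIRES_APPROVAL = {"modify_firewall", "delete_data", "modify_iam"}

def evaluate(req: ToolRequest, session_tenant: str) -> Decision:
    """PDP: every call is checked for permission, scope, and approval needs."""
    if (req.agent, req.tool) not in ALLOW:
        return Decision(False, False, "tool not allowed for this agent")
    if req.tenant_id != session_tenant:
        return Decision(False, False, "request escapes the tenant scope")
    if req.tool in REQUIRES_APPROVAL:
        return Decision(True, True, "high-risk action: human approval required")
    return Decision(True, False, "allowed")
```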
This is hard. Policy can become unmanageable ("policy sprawl"). But the alternative—vibes-based access control—doesn't scale for enterprise security.
We'll share more as we build it.
Conclusion
Building safe agents for high-stakes environments isn't about prompt engineering. It's about infrastructure:
Network isolation (deny by default)
Credential management (env vars, not prompts)
Failure containment (sandbox crashes don't propagate)
Multi-tenancy (storage-level isolation)
Context scoping (least-context access)
These patterns work. They're running in production today at Cogent, handling millions of vulnerabilities across dozens of customers.
If you're building agents for enterprise environments, start here.
Join Us
If you're excited about building secure autonomous systems that can operate in mission-critical environments with provable, controlled autonomy—we're hiring.
We're looking for engineers who:
Care about both performance and correctness
Embrace incremental improvement over perfect designs
Want to work at the intersection of cybersecurity and agentic systems
Check out our careers page or reach out directly.


