MCP Security Layer
A security layer between an AI agent and its tools · checks every tool call at the intent level, blocks or approves, logs.
Before an AI agent calls a tool (send_email, execute_sql, transfer_funds), MCP Security intercepts. A secondary AI model classifies the intent of the call, matches it against a policy, allows or blocks · logs everything for audit. Drop-in in front of any MCP-compatible agent stack.
ListenAn AI agent goes to move real money. Built an MCP intent-guard for AI agents. Before the agent transfers money, the system checks intent. Like a colleague asking 'are you sure?'.
MCP Security sits in front of any MCP-compatible agent stack and validates every tool call at the intent level: a secondary classifier model reads the call's purpose, matches it against a YAML policy, allows / blocks, and logs the decision for audit. The studio shipped the intent classifier, the policy engine, and the alerting + audit-log layer.
Our AI agents were already touching live systems, and we had no good answer for the auditor. The studio added a layer that checks every action before it happens, like a colleague asking 'are you sure?' before the button is pressed. The audit log and the rule file passed the regulator's review on the first try.
What's on screen
Frame breakdown
- 01User surface
The whole experience the user sees
This frame shows the live product: a security layer between an ai agent and its tools · checks every tool call at the intent level, blocks or approves, logs. Every component is ours · scope, design, code, deploy.
- 02Stack behind the screen
What's powering it: Python, FastAPI, OpenAI
5 stack components run behind this frame · Python, FastAPI, OpenAI drive the visible UI; the rest sit in the data layer. All studio-owned.
- 03What we shipped
Intent analyser · separate model decides what the call is trying to do
One layer that protects every tool · no per-tool auth logic needed
- 04Status
Private deploy · under NDA.
Per the client's request the URL stays private · the build, architecture, and lessons can be shared in a scoping call.
How it shipped
Timeline- 01 · BRIEF
What does 'safe' mean for an agent calling tools?
Workshop with the customer's security + AI team to draft the threat model: prompt injection, lateral compromise across tools, exfiltration via legitimate-looking calls. YAML policy DSL fell out of that conversation.
- 02 · ARCHITECTURE
Stack decisions before any code.
Decision doc captured the data flow, Python, FastAPI, OpenAI, Anthropic role split, and the failure modes we'd handle in v1 vs defer. Cross-service boundaries (where AI ends and the web app begins) were drawn here so neither side leaked into the other later.
- 02 · BUILD
Intent classifier + policy engine + audit log.
FastAPI middleware fronts the MCP server, calls the secondary model to classify intent, evaluates the YAML policy, returns allow / deny / quarantine. Every decision logged with rationale · replayable for audit.
- 04 · POLISH
Performance, accessibility, and observability.
PSI / a11y / coverage budgets enforced as launch gates. Logging + metrics wired before cut-over · the team can answer 'is it working?' from a dashboard, not a Slack thread. Threat-model checklist signed off before traffic hits the box.
- 03 · SHIP
Live on customer agent stacks · alerts wired.
Drop-in slot took under an hour per stack · Slack + PagerDuty alerts on suspicious-pattern matches, weekly digest emailed to the security lead.
What shipped
04- 01Intent
Secondary classifier model
Classifies what each tool call is trying to do · the policy engine never sees raw prompts, only the inferred intent.
- 02Policy
YAML allow / deny / quarantine
Versioned, code-reviewable policy file · changes go through the same PR review as application code.
- 03Audit
Replayable decision log
Every call + decision + rationale stored · auditor can replay any agent run end-to-end.
- 04Alerts
Slack + PagerDuty + weekly digest
Real-time alert on the suspicious patterns plus a weekly digest of trends · security lead reads one email per week.
From the video
Frame by frame
01FrameAuditor scoreboard · 4 scenarios laid bare
Top counters (events / tool calls / findings / blocked) + four side-by-side scenarios that demonstrate the value: 'attack succeeds (policy=audit)' next to 'attack blocked (policy=enforce)'. Same prompt, different policy, different outcome.
02FrameSet-up · the request that triggers the attack
Innocent-looking 'fetch and summarize' request goes into the conversation pane · the audit stream shows the model + backend (openrouter / openai gpt-4o-mini) about to handle it. The dangerous follow-up call is hidden inside the page content.
03FrameAttack visible · prompt injection asks for the SSH key
Right pane shows the malicious 'MCP AGENT DIRECTIVE' embedded in the wiki page asking the agent to execute `cat ~/.ssh/id_rsa`. Left audit stream logs `BLOCKED execute_command` · the auditor stops the dangerous call cold.
04FrameSafe reply · agent escalates instead of executing
Agent's final reply refuses politely: 'I'm unable to access the content of the internal wiki admin portal due to authentication restrictions.' Tool args + results are redacted on the wire using the proxy's secret-detection patterns.
THE PROBLEM
- −An agent with tools executes whatever the prompt says · prompt injection becomes expensive fast
- −Writing custom auth per tool is engineering-days per tool
- −No central log of what the AI tried · audit isn't reproducible
- −If one tool gets compromised, the others inherit the blast radius
WHAT THE CLIENT GOT
- One layer that protects every tool · no per-tool auth logic needed
- Audit trail ready · fits MNB / NAIH / EU AI Act reporting
- Safe under prompt injection · the intent layer stops malicious calls
- Drop-in · 1 hour to slot into an existing MCP stack
WHAT WE DELIVERED
- +Intent analyser · separate model decides what the call is trying to do
- +Policy engine · YAML-based allow/deny rules
- +Full audit log · every call, decision, rationale preserved
- +Real-time alerts · Slack, PagerDuty, email on suspicious patterns
- +Drop-in MCP-compatible · OpenAI Assistants API, Anthropic, LangChain
STACK
- Python
- FastAPI
- OpenAI
- Anthropic
- LangChain
RELATED READING
- AI solutions · Websites, web apps & online shops · Cybersecurity · Custom software · everything elseQ3 2026 roundup: what shifted, what we shipped, what brokeThree months in. SZEP 2.0 live, NAV v3 cutover, AI Act enforcement, OWASP LLM Top 10 v2. Hard numbers, one strong opinion on the consulting tier.
- AI solutions · Websites, web apps & online shops · Custom software · everything elseQ2 2026 roundup: what shifted, what we shipped, what brokeFour months in. Eleven shipped projects, real before/after numbers, one strong opinion on what the consulting tier got wrong this quarter.
- Custom software · everything else · AI solutionsn8n vs Make vs custom code: 2026 automation stackNo-code automation is brilliant until it isn't. Here's the line where n8n / Make stop saving money and custom code starts - and how to tell which side you're on.
- AI solutionsAI agent pricing 2026: what an autonomous agent costsAn AI agent is not a chatbot with extra steps - it takes actions, and that changes the bill. Here are the real 2026 ranges and what drives them.