AI Penetration Testing for LLM, MCP, Agentic, Vibe-Coded Apps | Swarm
AI Pentest|Sign in
10×10
OWASP LLM TOP 10  /  OWASP AGENTIC TOP 10Receipts for both

AI Penetration Testing

Prompt injection.Tool rug-pull.Memory poisoning.

Swarm runs an audit-grade pen test against your LLM apps, MCP servers, managed agents, and vibe-coded apps shipped on Lovable, v0, Bolt.new, Replit Agent, Cursor, or Claude Artifacts. Dedicated specialists for prompt injection, the EchoLeak (CVE-2025-32711) class, the mcp-remote OAuth RCE (CVE-2025-6514) class, vector and memory poisoning. Receipts on every finding.

Engagement 1f7c2 · liverecording
00114:02:11[recon]http_request GET /.well-known/mcp.json200
00214:02:14[mcp]baseline_tools 14 tools cachedok
00314:02:32[prompt-inject]inject_indirect via /docs/onboarding200
00414:02:48[prompt-inject]submit_finding tool exec via doc retrievalhigh
00514:03:02[mcp]diff_tools list_records description changedrug-pull
00614:03:48[vibe-coded]submit_finding OPENAI_API_KEY in /assets/main.jscritical
00714:04:21[output-handling]submit_finding LLM05 unsanitised render → XSShigh
00814:05:30[chain]submit_finding CHAIN-2 prompt_injection → tool exfilcritical
00914:06:14[report]compose_report attaching audit traildone
audit trail · streamingspecialists 30/30
SWARMSEC.AI · BUILT FOR AI PRODUCT TEAMSOWASP LLM · OWASP AGENTIC · MCP
30+
Specialists
<1hr
To first finding
$4,995
Flat. One number.
30d
Free retest

The engagement

One swarm. Four phases.

01

Recon

Map every endpoint, every framework, every footgun. Manual scanners run a fixed signature set. The swarm runs against your actual surface.

02

Triage

Specialists own classes of attack. Auth flaws. Access control. Injection. Logic. Each agent probes its vector and cites the request that proved it.

03

Exploit

Verified PoC for every Critical and High. Multi-step chains are first-class. The chain analyst composes findings into one exploit path.

04

Report

Markdown narrative. Full audit trail. JSON for tooling. Your auditor reads the action that matches the verdict.

The differentiator

Receipts on every finding.

Every tool call. Every request. Every grep. Every submit. Every verify. Streams to the dashboard live and ships with the report. Your SOC 2 reviewer doesn't have to take our word for it. They open action 1,847 and read what we did.

audit trail · engagement 0a9b3 · actions 142–1501,847 actions · 312KB
014214:11:08[recon]http_request GET /api/internal/health200
014314:11:09[recon]http_request GET /api/internal/users?role=admin200
014414:11:10[auth]submit_finding token-leak in /api/internal/usershigh
014514:11:32[broken-access]source_grep authorize\(.*role7 hits
014614:11:48[broken-access]http_request POST /api/role/upgrade403
014714:12:14[broken-access]http_request POST /api/role/upgrade -H X-Forwarded-User: admin200
014814:12:15[broken-access]submit_finding privilege bypass via X-Forwarded-Usercritical
014914:12:32[chain]submit_finding CHAIN-2 IDOR + role bypass = full takeovercritical
015014:13:08[reviewer]verify CHAIN-2 reproducible against live targetsealed
Continued through engagement completionSealed and signed
200Successful response or benign result
highVerified high-severity finding
criticalVerified critical finding or chain

The price

One number. Read the receipts.

No per-target pricing. No per-finding pricing. No "starts from". One engagement, one fee, one audit trail.

$4,995
Flat per engagement
01
30+ specialists
chain_analyst · idor · prompt_injection · broken_access · +26 more
02
Verified PoC
Every Critical and High, reproducible
03
Audit trail
Every action logged, evidence-grade
04
Signed report
Cryptographically attested. Auditor-deliverable. Prospect-ready.
05
30-day retest
Free verification once you fix
06
SOC 2 evidence
Auditor-ready, no extra prep
Start engagementFree preview before you pay anything.

Questions

What buyers ask. Receipts attached.

The questions every engineering and security lead asks before they fund an engagement. Read the answers here, before the kickoff call.

01What is AI penetration testing?

AI penetration testing covers attack surfaces specific to AI, LLM, and agentic applications: prompt injection (direct, indirect, tool-mediated, browser-mediated), MCP servers, managed agent platforms, vector and embedding stores, and vibe-coded apps shipped on Lovable, v0, Bolt.new, Replit Agent, Cursor, or Claude Artifacts. Swarm covers the full OWASP LLM Top 10 (2025) and OWASP Top 10 for Agentic Applications (2026) with dedicated specialists per category. The deliverable is identical to the SaaS engagement: structured report, audit-trail CSV, validated proof-of-concept for every Critical and High, and a free retest within 30 days.

02Does Swarm test MCP servers?

Yes. mcp_specialist baselines tool descriptions on first contact, diffs them across the engagement, and tests for tool-description rug-pull (the Invariant Labs Supabase Cursor disclosure pattern). The CVE-2025-6514 mcp-remote OAuth RCE class is in the daily-updated CVE library and consulted at runtime. Tool and resource authorization, schema injection, and cross-tool prompt-injection chains all sit under the same specialist.

03Does Swarm cover the OWASP LLM Top 10?

Full coverage on 9 of 10 categories. Prompt Injection (LLM01), Sensitive Information Disclosure (LLM02), Supply Chain (LLM03), Data and Model Poisoning (LLM04), Improper Output Handling (LLM05, includes EchoLeak / CVE-2025-32711 class), Excessive Agency (LLM06), System Prompt Leakage (LLM07), Vector and Embedding Weakness (LLM08), Unbounded Consumption (LLM10). Misinformation (LLM09) is explicitly out of scope: that is content quality, not a security category.

04What is a vibe-coded app pentest?

A focused audit for apps shipped fast on Lovable, v0, Bolt.new, Replit Agent, Cursor, or Claude Artifacts. The patterns are repeatable: OpenAI / Anthropic / Stripe / Supabase keys in client bundles, Supabase tables with permissive RLS, admin pages gated by client-side guards only, server-side auth missing on key endpoints, and SQLi in routes the model wrote without parameterizing. Swarm dispatches a vibe-coded-app specialist on top of the standard recon and access-control specialists, and the report ships with the same receipts and full audit trail.

05Does Swarm cover the OWASP Top 10 for Agentic Applications?

Full coverage on 9 of 10 categories: Agent Goal Hijack (ASI01), Tool Misuse and Exploitation (ASI02), Identity and Privilege Abuse (ASI03), Agentic Supply Chain (ASI04, includes CVE-2025-6514), Unexpected Code Execution (ASI05), Memory and Context Poisoning (ASI06), Inter-Agent Communication (ASI07), Cascading Failures (ASI08), Human-Agent Trust Exploitation (ASI09). Rogue Agents (ASI10) is partial: detecting unsanctioned agent fleets is in scope where they are reachable from the customer-authorised target, not where they live entirely outside it.

06What is prompt injection and how is it tested?

Prompt injection redirects an LLM's behavior through input it ingests. Swarm tests four classes with dedicated specialists: direct (user message), indirect (retrieved content like a document or webpage), tool-mediated (tool output that the model treats as instruction), and browser-mediated (page content the agent navigates to). Every successful injection lands in the audit trail with the exact request that triggered it; chains where one injection enables a tool call live as their own finding under the chain analyst.

07Does Swarm test vector stores and RAG pipelines?

Yes. Vector-DB authorisation, embedding-collision attacks, RAG-ingest paths, persistent-memory injection. OWASP LLM08 (Vector and Embedding Weakness) and ASI06 (Memory and Context Poisoning) are both fully covered. The poisoning vectors that activate weeks later (an attacker plants a payload in a document the swarm later retrieves) are covered by the indirect-injection specialist.

08What is the EchoLeak (CVE-2025-32711) class of vulnerability?

EchoLeak is an Improper Output Handling vulnerability: LLM output is rendered without sanitisation, leading to XSS or SQL injection downstream of the model. It maps to OWASP LLM05. Swarm dispatches a dedicated output-handling specialist that probes every LLM-rendered surface for execution sinks, attaches the evidence row that proved it, and references the CVE class in the finding.

09How does Swarm differ from a manual AI red team?

A manual AI red team is a small group of humans, two-to-four-week timeline, $30,000 to $80,000 typical, deliverable is a PDF whose methodology lives in the consultants' heads. Swarm runs the same OWASP LLM and Agentic coverage in roughly two hours for $4,995 flat, with every specialist action receipted in the audit trail your auditor reads alongside the report. Bespoke red team work (sophisticated social engineering, multi-month adversarial scenarios) still belongs with a senior firm; the standard OWASP-aligned annual AI security audit belongs with Swarm.

Read the receipts.
ENTER YOUR DOMAIN. SWARM MAPS YOUR ATTACK SURFACE IN JUST A FEW MINUTES.No card. Free preview.