Recon
Map every endpoint, every framework, every footgun. Manual scanners run a fixed signature set. The swarm runs against your actual surface.
Swarm runs an audit-grade pen test against your LLM apps, MCP servers, managed agents, and vibe-coded apps shipped on Lovable, v0, Bolt.new, Replit Agent, Cursor, or Claude Artifacts. Dedicated specialists for prompt injection, the EchoLeak (CVE-2025-32711) class, the mcp-remote OAuth RCE (CVE-2025-6514) class, vector and memory poisoning. Receipts on every finding.
The engagement
Map every endpoint, every framework, every footgun. Manual scanners run a fixed signature set. The swarm runs against your actual surface.
Specialists own classes of attack. Auth flaws. Access control. Injection. Logic. Each agent probes its vector and cites the request that proved it.
Verified PoC for every Critical and High. Multi-step chains are first-class. The chain analyst composes findings into one exploit path.
Markdown narrative. Full audit trail. JSON for tooling. Your auditor reads the action that matches the verdict.
The differentiator
Every tool call. Every request. Every grep. Every submit. Every verify. Streams to the dashboard live and ships with the report. Your SOC 2 reviewer doesn't have to take our word for it. They open action 1,847 and read what we did.
The price
No per-target pricing. No per-finding pricing. No "starts from". One engagement, one fee, one audit trail.
Questions
The questions every engineering and security lead asks before they fund an engagement. Read the answers here, before the kickoff call.
AI penetration testing covers attack surfaces specific to AI, LLM, and agentic applications: prompt injection (direct, indirect, tool-mediated, browser-mediated), MCP servers, managed agent platforms, vector and embedding stores, and vibe-coded apps shipped on Lovable, v0, Bolt.new, Replit Agent, Cursor, or Claude Artifacts. Swarm covers the full OWASP LLM Top 10 (2025) and OWASP Top 10 for Agentic Applications (2026) with dedicated specialists per category. The deliverable is identical to the SaaS engagement: structured report, audit-trail CSV, validated proof-of-concept for every Critical and High, and a free retest within 30 days.
Yes. mcp_specialist baselines tool descriptions on first contact, diffs them across the engagement, and tests for tool-description rug-pull (the Invariant Labs Supabase Cursor disclosure pattern). The CVE-2025-6514 mcp-remote OAuth RCE class is in the daily-updated CVE library and consulted at runtime. Tool and resource authorization, schema injection, and cross-tool prompt-injection chains all sit under the same specialist.
Full coverage on 9 of 10 categories. Prompt Injection (LLM01), Sensitive Information Disclosure (LLM02), Supply Chain (LLM03), Data and Model Poisoning (LLM04), Improper Output Handling (LLM05, includes EchoLeak / CVE-2025-32711 class), Excessive Agency (LLM06), System Prompt Leakage (LLM07), Vector and Embedding Weakness (LLM08), Unbounded Consumption (LLM10). Misinformation (LLM09) is explicitly out of scope: that is content quality, not a security category.
A focused audit for apps shipped fast on Lovable, v0, Bolt.new, Replit Agent, Cursor, or Claude Artifacts. The patterns are repeatable: OpenAI / Anthropic / Stripe / Supabase keys in client bundles, Supabase tables with permissive RLS, admin pages gated by client-side guards only, server-side auth missing on key endpoints, and SQLi in routes the model wrote without parameterizing. Swarm dispatches a vibe-coded-app specialist on top of the standard recon and access-control specialists, and the report ships with the same receipts and full audit trail.
Full coverage on 9 of 10 categories: Agent Goal Hijack (ASI01), Tool Misuse and Exploitation (ASI02), Identity and Privilege Abuse (ASI03), Agentic Supply Chain (ASI04, includes CVE-2025-6514), Unexpected Code Execution (ASI05), Memory and Context Poisoning (ASI06), Inter-Agent Communication (ASI07), Cascading Failures (ASI08), Human-Agent Trust Exploitation (ASI09). Rogue Agents (ASI10) is partial: detecting unsanctioned agent fleets is in scope where they are reachable from the customer-authorised target, not where they live entirely outside it.
Prompt injection redirects an LLM's behavior through input it ingests. Swarm tests four classes with dedicated specialists: direct (user message), indirect (retrieved content like a document or webpage), tool-mediated (tool output that the model treats as instruction), and browser-mediated (page content the agent navigates to). Every successful injection lands in the audit trail with the exact request that triggered it; chains where one injection enables a tool call live as their own finding under the chain analyst.
Yes. Vector-DB authorisation, embedding-collision attacks, RAG-ingest paths, persistent-memory injection. OWASP LLM08 (Vector and Embedding Weakness) and ASI06 (Memory and Context Poisoning) are both fully covered. The poisoning vectors that activate weeks later (an attacker plants a payload in a document the swarm later retrieves) are covered by the indirect-injection specialist.
EchoLeak is an Improper Output Handling vulnerability: LLM output is rendered without sanitisation, leading to XSS or SQL injection downstream of the model. It maps to OWASP LLM05. Swarm dispatches a dedicated output-handling specialist that probes every LLM-rendered surface for execution sinks, attaches the evidence row that proved it, and references the CVE class in the finding.
A manual AI red team is a small group of humans, two-to-four-week timeline, $30,000 to $80,000 typical, deliverable is a PDF whose methodology lives in the consultants' heads. Swarm runs the same OWASP LLM and Agentic coverage in roughly two hours for $4,995 flat, with every specialist action receipted in the audit trail your auditor reads alongside the report. Bespoke red team work (sophisticated social engineering, multi-month adversarial scenarios) still belongs with a senior firm; the standard OWASP-aligned annual AI security audit belongs with Swarm.