Most pen tests run a fixed playbook against a 2015 attack surface. Swarm dispatches 50 specialists with framework-aware decisioning across OWASP Web, API, LLM, and Agentic. Validated PoC on every finding. An audit trail your prospect's security team replays end-to-end. Two hours, not weeks.
Pillar 4 · Coverage of the modern threat surface
The traditional pen test scope: web app, network, maybe an API. The shape of what gets built has changed: AI features, LLM endpoints, MCP servers, autonomous workflows. Swarm covers all four OWASP standards that govern the modern surface, not only last decade's.
Web
Injection, broken access control, authentication failures, server-side request forgery. The canonical web attack surface.
API
BOLA, broken authentication, mass assignment, unrestricted resource consumption. Multi-tenant boundaries probed on every parameter.
LLM
Prompt injection, training data poisoning, sensitive information disclosure, model denial-of-service. For teams shipping AI features.
Agentic
MCP server abuse, tool misuse, memory poisoning, autonomous-action escalation. For teams shipping agent workflows.
The engagement
Map every endpoint, every framework, every footgun. Automated scanners run a fixed signature set. The swarm runs against your actual surface.
Specialists own classes of attack. Auth flaws. Access control. Injection. Logic. Each agent probes its vector and cites the request that proved it.
Verified PoC for every finding. Multi-step chains are first-class. The chain analyst composes findings into one exploit path.
Markdown narrative. Full audit trail. JSON for tooling. Your auditor reads the action that matches the verdict.
Pillar 2 · AI-native, end-to-end
A human-firm pen test ends with a PDF, a spreadsheet, and a Zoom call. Engineers translate findings into tickets manually, lose the request that produced each one, and the report rots in a folder. Swarm closes that loop: findings open in the editor your engineers already use, every finding ships with a validated PoC, and the exact request that produced it is one click away.
Mint a per-engagement MCP token. Open findings, chains, and the exact request that produced each finding directly in Claude Code, Cursor, or Codex. The engineers fixing the bug work from the proof, not from a PDF.
Six knowledge bases via Anthropic Dreaming sit behind the orchestrator: stack-detection signals, persona lessons, dispatch heuristics, CVE curation, compromise patterns, false-positive refinement. Abstracted lessons from every Swarm engagement improve dispatch decisions across the platform with structural cross-tenant isolation.
Every finding ships with a reproducible exploit and the exact request that produced it. Severity demonstrated, not asserted. Not just critical and high: every finding.
Pillar 3 · Receipts
Every tool call. Every request. Every grep. Every submit. Every verify. Streams to the dashboard live and ships with the report. Two readers care: your auditor (signs off once a year) and your prospect's security team (scrutinizes the report every RFP). Both replay the audit trail end-to-end and verify the methodology without taking anyone's word for it.
The price
No per-target pricing. No per-finding pricing. No "starts from". One engagement, one fee, one audit trail.
Questions
The questions every engineering and security lead asks before they fund an engagement. Read the answers here, before the kickoff call.
For most SaaS engagements driven by SOC 2 Type 2 readiness, yes. That is exactly the wedge. As a human pen test alternative and ethical hacking service, Swarm replaces the standard annual engagement for the majority of SaaS security programs. A human pen test firm typically charges $15,000 to $50,000 per engagement, takes two to four weeks, and delivers a PDF whose methodology lives in the consultant's head. Swarm runs in roughly two hours for $4,995 flat and ships a structured report plus the full audit trail of every specialist action: receipted, filterable, traceable from any finding back to the request that surfaced it.
Swarm is a per-engagement product, not a subscription. Customers typically run it annually for SOC 2 Type 2 or ISO 27001 audit prep, and re-run as needed for post-incident validation, new-feature security review, or security-questionnaire responses. The 30-day free retest after each engagement is the close-the-loop validation that human firms charge separately for.
What Swarm replaces well: standard SaaS pen test engagements, especially the recurring annual or semi-annual ones, and especially when an external auditor is the deal-closing reviewer. The combination of an evidence-driven orchestrator dispatching 50+ specialists, the live activity feed, and the full forensic audit trail typically gives auditors more methodology transparency than a human-firm PDF.
What Swarm does not replace: bespoke red team assessment engagements with sophisticated social engineering, on-premise hardware testing, or multi-month engagements scoped to a specific advanced-persistent-threat hypothesis. For those, hire a senior firm. For the SOC 2 pen test you run every year, run Swarm and put the savings into remediation.
No. Automated scanners match known signatures against a checklist. Swarm specialists reason. They build a model of how your application works, form hypotheses, and test them adaptively. The result is findings scanners cannot produce: logic flaws, chained exploits, and authentication bypasses that do not appear in any CVE database. The CVE library augments this; specialists consult it for known issues. But the core engine is reasoning, not signature matching.
Yes. After every engagement, the swarm reviews what just happened and rewrites six knowledge bases that feed dispatch decisions. The mechanism is Anthropic Dreaming (beta), a research capability that lets agents reflect on completed work and update their own context. Swarm runs it against six surfaces: environment signals (stack detection patterns the orchestrator uses to choose specialists), per-specialist lessons learned, orchestrator dispatch heuristics, the CVE curation that decides which disclosures matter for offensive work, the compromise-pattern catalogue refined against new incident reports, and a false-positive refinement loop that updates the environment model whenever a finding gets rejected on review.
The practical effect lands at the platform level: across all engagements, the orchestrator routes specialists faster, the reviewer has fewer false positives to reject, and the chain analyst recognizes exploit-chain shapes it has seen before. None of this requires a release on our side; the knowledge bases compound passively between runs.
Dreaming runs only on completed engagements. Abstracted lessons (CVE relevance, exploit-chain shapes, dispatch heuristics) inform the platform; per-customer signals stay scoped to your organization at the same data-model layer that enforces engagement ownership, so cross-tenant leakage is structurally impossible.
Yes. The deliverable is designed for SOC 2 Type 2 review and accepted as a compliance pen test deliverable by SOC 2 auditors. The SOC 2 Type 2 pen test report includes an executive summary, individual findings with CVSS scores, exploit chain analysis, and a validated proof-of-concept for every finding. Every finding is mapped to its OWASP category (the OWASP Top 10 plus the OWASP API, LLM, and Agentic Applications Top 10 lists), so your security questionnaire answers write themselves. The full audit trail (every specialist action receipted, filterable by specialist, traceable from any finding back to the request that surfaced it) gives your external auditor forensic-level transparency into methodology. A dedicated read-only Auditor role lets your compliance professional access the dashboard, report, and full audit trail directly.
Yes. Specialists operate only within a scope the customer approves before testing begins. No destructive operation is performed without explicit per-action approval. Rate limits are enforced. Every request is logged and exported in the audit trail. Out-of-scope hosts are rejected at the tool layer before any HTTP call leaves the orchestrator.
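To make the "rejected at the tool layer" idea concrete, here is a minimal sketch of a scope guard that checks a URL's host before anything is sent. It is an illustration under stated assumptions, not Swarm's implementation: the names OutOfScopeError, assert_in_scope, and guarded_request are hypothetical, and the approved scope is assumed to be a plain set of lowercase hostnames.

```python
from urllib.parse import urlsplit


class OutOfScopeError(Exception):
    """Raised when a request targets a host outside the approved scope."""


def assert_in_scope(url: str, approved_hosts: set) -> str:
    """Return the lowercase hostname if it is approved, otherwise raise."""
    # urlsplit().hostname is None for URLs without a scheme/netloc;
    # treat that as an empty host, which is never in scope.
    host = (urlsplit(url).hostname or "").lower()
    if host not in approved_hosts:
        raise OutOfScopeError(f"refusing request to out-of-scope host: {host!r}")
    return host


def guarded_request(url: str, approved_hosts: set, send):
    """Check scope first; only then hand the URL to the real sender.

    `send` is a hypothetical callable standing in for whatever actually
    performs the HTTP call. It is never invoked for out-of-scope hosts.
    """
    assert_in_scope(url, approved_hosts)
    return send(url)
```

Putting the check in front of the sender, rather than inside each specialist, means a single code path decides what may leave the process.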
A traditional pen test delivers a PDF and a verbal debrief; the methodology lives in the consultant's head. Swarm logs every move every specialist makes (every HTTP request, every source grep, every file read, every finding submission, every exploit chain composition) and streams it to your dashboard as the engagement runs. Hand the full record to your SOC 2 auditor afterward. They filter by specialist, pivot the dataset, and trace any finding in the report back to the exact tool call that surfaced it. Methodology that proves itself, not a summary that asks to be trusted.
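The filter-and-trace workflow described above can be sketched in a few lines. The event shape here is an assumption made for illustration only: AuditEvent, its fields, and the action names are hypothetical and do not describe Swarm's actual export schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AuditEvent:
    """One receipted action in a hypothetical audit-trail export."""
    event_id: int                      # monotonically increasing record order
    specialist: str                    # which specialist performed the action
    action: str                        # e.g. "http_request", "source_grep"
    detail: str                        # free-text description of the action
    finding_id: Optional[str] = None   # set when the event supports a finding


def by_specialist(events: List[AuditEvent], specialist: str) -> List[AuditEvent]:
    """Keep only the events performed by one specialist."""
    return [e for e in events if e.specialist == specialist]


def trace_finding(events: List[AuditEvent], finding_id: str) -> List[AuditEvent]:
    """Return the events linked to one finding, in recorded order."""
    linked = [e for e in events if e.finding_id == finding_id]
    return sorted(linked, key=lambda e: e.event_id)
```

With a structure like this, an auditor's question ("which request produced finding F-1?") becomes a lookup rather than a conversation.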
Swarm specialists work against any modern web stack: Node, Python, Go, Ruby, Elixir, JVM, .NET, PHP. Coverage extends across every major identity provider too: Clerk, Auth0, Okta, Stytch, Cognito, Firebase, Supabase, and custom IDPs. The orchestrator fingerprints your stack during recon and dispatches the appropriate specialists automatically. AI / LLM and MCP server testing kicks in when those surfaces are detected, so you do not configure specialist-by-specialist; the swarm reads the application and routes work accordingly.
Yes. Mint a per-engagement Model Context Protocol token from the dashboard, plug it into Claude Code, Cursor, or any MCP-compatible client, and your team's editor surfaces Swarm findings, the source files the specialists already pulled, and a finding-status update tool in one place. Seven curated tools cover read access to findings and repositories plus the single write path of marking a finding remediated. Tokens are scoped to a single engagement and revoked with one click; nothing in the token can touch another engagement.
The intended workflow: an engagement closes, your engineers open the report inside Claude Code, fetch each finding's full evidence inline, write the fix against the source the specialists already read, and mark the finding remediated from the editor. The 30-day free retest then validates the fix without a separate purchase or scoping call.
The service token is stamped with a developer role: reads plus finding-status updates only. It cannot run engagements, edit scope, change billing, or reach another organization's data.
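As a rough illustration of what "reads plus finding-status updates only" and single-engagement scoping could look like, the sketch below checks an action against a token. Every name in it (ServiceToken, DEVELOPER_ACTIONS, the action strings, is_allowed) is hypothetical; the source does not list the seven tools or describe the real token format.

```python
from dataclasses import dataclass

# Hypothetical action names: read access plus the single write path.
DEVELOPER_ACTIONS = frozenset(
    {"read_findings", "read_repository", "update_finding_status"}
)


@dataclass(frozen=True)
class ServiceToken:
    """A per-engagement token as assumed for this sketch."""
    engagement_id: str
    role: str = "developer"
    revoked: bool = False


def is_allowed(token: ServiceToken, action: str, engagement_id: str) -> bool:
    """Allow an action only for a live token, on its own engagement,
    and only if the action is in the developer role's allow-list."""
    if token.revoked:
        return False
    if token.engagement_id != engagement_id:
        return False
    return token.role == "developer" and action in DEVELOPER_ACTIONS
```

An allow-list denies by default: anything not named (running engagements, editing scope, billing) is rejected without needing its own rule.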
A Swarm engagement costs $4,995 flat: one price per engagement, no hourly billing, no scope negotiation. Human pen test firms typically charge $15,000 to $50,000 per engagement and take two to four weeks. The full deliverable (structured report, audit trail of every specialist action, validated proof-of-concept for every finding, and a free retest within 30 days of remediation) is included. An annual tier is available at $49,995 per year for organizations running multiple engagements per year (audit prep, post-incident validation, new-feature security review, multi-product testing).
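The figures above can be worked through directly. The sketch below uses only the prices quoted in this answer, with one loudly labeled assumption: it treats the annual tier as covering every engagement in the year at no additional charge, which the text does not state. If the tier has a cap or other terms, the break-even point changes.

```python
PER_ENGAGEMENT = 4_995               # flat fee per engagement (from the text)
ANNUAL_TIER = 49_995                 # annual tier price (from the text)
HUMAN_FIRM_RANGE = (15_000, 50_000)  # typical human-firm range (from the text)

# Difference per engagement versus the quoted human-firm range.
savings_low = HUMAN_FIRM_RANGE[0] - PER_ENGAGEMENT
savings_high = HUMAN_FIRM_RANGE[1] - PER_ENGAGEMENT


def cheaper_option(engagements_per_year: int) -> str:
    """Compare the two pricing options for a given yearly volume.

    ASSUMPTION: the annual tier covers all engagements in the year with
    no per-engagement charge. This is not confirmed by the source.
    """
    per_engagement_total = engagements_per_year * PER_ENGAGEMENT
    return "annual" if ANNUAL_TIER < per_engagement_total else "per-engagement"
```

Under that assumption, ten engagements at the flat fee total $49,950, still below the annual tier, so the annual tier becomes the cheaper option from the eleventh engagement in a year.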