How AI-assisted pentests work in practice

The question CTDefense gets most often from CISOs and IT directors right now is some version of the same one: the board has read the May 2026 coverage of AI-assisted intrusion, and they want to know whether the pentest CTDefense delivers is keeping pace with how attackers are operating. This is the honest answer. AI-assisted penetration testing, on a CTDefense engagement, means human testers keeping AI models in the loop for the high-volume, deterministic parts of the work, while the senior tester spends a larger share of engagement hours on the parts that need judgement. No replacement, no autonomy, and no claim that the model is doing the engagement on its own.

The shift is operational, not philosophical. The scope, the report format, and the human accountability stay exactly as the buyer is used to. What changes is the share of the engagement window the senior tester gets to spend on chaining, business-logic abuse, and other judgement-heavy work.

Why attacker-side AI changes the threat model

Two recent disclosures have moved AI threat actor capability from a forecast to a current observation. In November 2025, Anthropic reported it had disrupted GTG-1002, a cyber espionage campaign in which Claude Code carried out 80 to 90 percent of the attack chain with only 4 to 6 human decision points. Anthropic’s writeup says the model “identified and tested security vulnerabilities in the target organizations’ systems by researching and writing its own exploit code.” That is vulnerability discovery and exploit synthesis attributed to a frontier model inside a real, named campaign, not a lab demo.

In early May 2026, Dragos published its analysis of the first publicly reported AI-assisted ICS attack, on a US water utility. SecurityWeek covered the IT-side angle of the same incident, noting that Claude “independently identified a vNode SCADA and IIoT management interface running on an internal server.” Dragos’s broader observation is the one that matters for the threat model on the pentesting side: “AI as an intrusion aid evidently compresses that window, meaning defenders have less time between enterprise-level compromise and attempts to breach industrial and operational assets.”

This answers the question the buyer is already asking: how fast an AI-assisted attacker can move through a network. The dwell-time-to-pivot interval is shrinking because reconnaissance and target classification, which used to be human-paced, are now model-paced. A pentest that still budgets human-only hours for those phases gives the defender a less accurate picture of how fast a real intrusion now moves. The whole point of an authorised engagement is to mirror the threat actor’s tempo, not to fall behind it.

What the model handles in a live engagement

On a CTDefense engagement, penetration testing with AI models is bounded and observable. The senior tester scopes the work, sets the rules of engagement, and stays in the driver’s seat. The model handles the work that is high in volume, low in judgement, and easy to verify after the fact.

In practice, this means the model takes on:

External-surface enumeration. Subdomain discovery, exposed-service inventory, TLS and header configuration sweeps across the perimeter. CTDefense already covers this in its external network security audit; the model now runs the same enumeration faster and more exhaustively, and the human reviews the shortlist.
Dependency CVE triage. Cross-referencing component versions against advisory feeds, filtering out the noise (unreachable code paths, mitigations already in place), and flagging the small set that actually matters for the in-scope app.
Configuration sweeps. TLS profiles, security headers, cookie attributes, output-encoding inventory across a web app, IAM trust-policy review across a cloud account. Deterministic checks that benefit from being run wider rather than deeper.
Structured internal-estate recon. Once a foothold exists, classifying hosts, surfacing service accounts, mapping trust paths, and producing a target shortlist for the human to chain.
Hypothesis generation. Suggesting candidate attack paths for the tester to validate or discard, based on what the model has already enumerated.
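
As an illustration of the configuration-sweep item above, here is a minimal sketch of a security-header check run across many hosts. It is not CTDefense's or PentX's tooling: the `EXPECTED_HEADERS` baseline, the function names, and the input shape (a mapping of host to already-collected response headers) are all assumptions made for the example.

```python
# Hypothetical baseline for illustration; a real engagement would take the
# expected header set from the agreed scope and the client's own standards.
EXPECTED_HEADERS = (
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Content-Type-Options",
    "Referrer-Policy",
)


def missing_security_headers(response_headers):
    """Return the expected headers absent from one HTTP response.

    Header names are compared case-insensitively, as HTTP requires.
    """
    present = {name.lower() for name in response_headers}
    return [h for h in EXPECTED_HEADERS if h.lower() not in present]


def sweep(responses):
    """Map each host to its missing headers, keeping only hosts with gaps.

    `responses` is assumed to be {host: {header_name: value}} collected
    earlier by whatever in-scope tooling fetched the pages.
    """
    report = {}
    for host, headers in responses.items():
        gaps = missing_security_headers(headers)
        if gaps:
            report[host] = gaps
    return report
```

The check is deliberately shallow and deterministic: it only says a header is absent, not whether the absence matters for that deployment. That second question is the part a human tester answers when reviewing the shortlist.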

The model’s output is not the deliverable. Every finding the model surfaces is verified by a human tester before it lands in the report. A misidentified asset, a CVE that does not actually apply, or a header check that misreads the deployment never reaches the client. The verification step is what separates an AI pentest engagement run by a security team from a scanner running on autopilot.
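
The verification step can be pictured as a simple gate between what the model surfaces and what reaches the report. The sketch below is hypothetical: the `Finding` fields and the `verify` and `reportable` helpers are invented for illustration and do not describe how CTDefense actually tracks findings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Finding:
    title: str
    source: str                        # "model" or "human" (illustrative labels)
    verified_by: Optional[str] = None  # tester who confirmed it, if any


def verify(finding: Finding, tester: str) -> Finding:
    """Record that a named human tester has confirmed the finding."""
    finding.verified_by = tester
    return finding


def reportable(findings: List[Finding]) -> List[Finding]:
    """Keep only findings a human has signed off on, whatever their source."""
    return [f for f in findings if f.verified_by is not None]
```

Under this assumption, a model-surfaced finding that nobody has confirmed is simply never eligible for the report, which is the property the paragraph above describes.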

Where human judgement still owns the work

The honest counter to the “AI replaces pentesters” framing is direct: it does not, and a serious provider will not pretend otherwise. Does AI replace human pentesters on a CTDefense engagement? No. Model-paced recon does not substitute for the work that defines a pentest’s value to the buyer.

The senior tester still owns:

Exploit chaining. Stitching three findings into a single attack path that proves real risk, rather than reporting them as three isolated medium-severity issues.
Business-logic abuse. Discovering that a discount code can be applied twice, that a workflow allows a state transition the designers did not intend, or that a permission boundary depends on an assumption that does not hold. None of that is in a CVE feed.
Scoping calls. Deciding when to stop enumerating and start exploiting, when a finding is in scope, and when something interesting is out of scope and needs a separate engagement.
Remediation prioritisation. Translating findings into business language, ranking them by exploitability and blast radius, and writing the executive summary the CISO actually shows the board.
Client-facing judgement. The conversation with the application owner about why a particular fix matters, and the back-and-forth during the retest.

These are the parts of the work that need a human who has seen 200 engagements and can tell a real risk from a noisy one. The model accelerates the supporting cast around that judgement; it does not replace it.
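
To make "ranking them by exploitability and blast radius" concrete, here is one possible sketch. The 1-to-5 scales, the multiplication, and the names `RankedFinding` and `prioritise` are assumptions for illustration only; the source does not specify a scoring formula, and the scores themselves are judgement calls the tester assigns, not something the code derives.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RankedFinding:
    title: str
    exploitability: int  # 1 (hard to exploit) to 5 (trivial), set by the tester
    blast_radius: int    # 1 (single record) to 5 (whole estate), set by the tester

    @property
    def priority(self) -> int:
        # Hypothetical combination rule: a simple product of the two scores.
        return self.exploitability * self.blast_radius


def prioritise(findings: List[RankedFinding]) -> List[RankedFinding]:
    """Order findings from highest to lowest priority, ties broken by title."""
    return sorted(findings, key=lambda f: (-f.priority, f.title))
```

The arithmetic is trivial on purpose. The value sits in the two inputs, which depend on knowing the client's environment well enough to say how far a compromise would spread.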

What changes for the buyer

What does an AI-assisted pentest actually do differently for the buyer inside the same engagement window? Three things, mostly.

First, faster delivery on configuration-tier findings. The TLS sweeps, header reviews, and dependency triage that used to consume the first week of an engagement are turned around in the first day or two. The human tester gets that time back.

Second, deeper coverage on the business-logic tier. Because the configuration tier is no longer eating the schedule, the senior tester spends more hours on chaining, business-logic abuse, and the targeted recon that only makes sense once the easy work is out of the way. Same engagement window, more time on the work that actually surfaces high-impact findings.

Third, the same report format the buyer is used to. A human-led AI-powered pentest still produces an executive summary, a finding-by-finding writeup with reproduction steps, a remediation plan, and a debrief. Nothing in the deliverable looks unfamiliar to the procurement team or to the auditor. The only thing that changed is what happened upstream of the report.

Between annual deep engagements, continuous security validation closes the gap. Drift in the perimeter, new exposed services after a deploy, a fresh certificate misconfiguration, a dependency that picked up a CVE last week: none of that waits for the next scheduled pentest. Continuous validation catches it inside the cadence the threat model now demands.
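
One way to picture perimeter drift detection is as a diff between two snapshots of the exposed-service inventory. The sketch below assumes each snapshot is a collection of `(host, port)` pairs; that representation and the `exposure_drift` name are hypothetical and are not taken from PentX.

```python
def exposure_drift(baseline, current):
    """Compare two exposed-service snapshots.

    Each snapshot is assumed to be an iterable of (host, port) tuples.
    Returns services that appeared and services that disappeared since
    the baseline, each as a sorted list.
    """
    baseline, current = set(baseline), set(current)
    return {
        "new": sorted(current - baseline),
        "gone": sorted(baseline - current),
    }
```

For example, if a deploy opens port 8080 on a host that previously exposed only 443, that pair shows up under `"new"` on the next comparison instead of waiting for the annual engagement.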

The honest case for hybrid testing

Vendor blogs in this space tend to land in one of two camps. One camp says the human is the bottleneck and should be replaced. The other camp says AI is a toy and only humans can pentest. Both are wrong for a CISO sitting across from a board that has read the May 2026 coverage. The defensible position is the one that is already happening on real engagements: a senior tester runs the engagement, models accelerate the work that scales well, and the human owns the judgement calls and the report.

CTDefense delivers this through its in-house pentest team and through PentX, an AI pentest platform built on the methodology of senior offensive engineers. PentX serves as the in-the-loop tooling on live engagements and as the continuous-validation layer between them. The April post covered what AI in the loop looks like for ongoing human-led testing; this one focuses on the operational shift inside a single engagement.

CTDefense continues to support mid-to-large enterprises, MSSP partners, and in-house security teams who need a defensible answer when the board asks whether their offensive testing is keeping pace with how attackers are working now. Similar organisations are encouraged to take a close look at how their current provider handles the configuration tier, the business-logic tier, and the gap between annual engagements. Those three questions surface the difference between a 2024-shaped engagement and one built for a 2026 threat model.
