CYPFER Offensive Practice | When AI Reconnaissance Loses Context during real attacks

AI attack patterns

Over the past months, reconnaissance traffic has remained largely predictable: opportunistic scanners, mass internet crawling, WordPress probes, exposed .git directories, the usual noise. Recently, however, while reviewing access logs on offsec.cypfer.com, a new pattern started to emerge. At first glance it still looked like reconnaissance. On closer inspection, it was something subtly but fundamentally different.

Below is a non-exhaustive sample of URLs that were logged over a short period:

/phpinfo/blog/blog/blog/blog/Free-cloud-security-assessment
/app/blog/blog/blog/blog/Mythos-your-environment
/storage/logs/blog/blog/blog/blog/Detection-capability-assessment
//blog///robots.txt
//blog//wp-json/wp/v2/users/
//blog//wp-json/oembed/1.0/embed?url=https://offsec.cypfer.com//blog/
//blog//xmlrpc.php
/blog/blog/.git/config
...
/.git/refs/heads/blog/Malware-campaign-red-team
/.svn/blog/Cobalt-Strike-redirector-aws-azure
/.aws/blog/Cobalt-Strike-redirector-aws-azure
/.hg/store/blog/Advanced-red-teaming-tactics
/.circleci/blog/Malware-campaign-red-team
/.ssh/blog/Cobalt-Strike-redirector-aws-azure
/node_modules/.bin/blog/blog/Mythos-your-environment
/.ssh/blog/Detection-capability-assessment
/node_modules/.bin/blog/blog/Cobalt-Strike-redirector-aws-azure
/.github/workflows/blog/blog/Mythos-your-environment
/blog//blog/.env
/wp-Blogs.php
/.git/logs/blog/Malware-campaign-red-team
/wp-blogs.php
/blog/services-cta
/wp-blog-header.php

This was clearly reconnaissance. But it was reconnaissance without understanding.

Why These URLs Are Impossible

For anyone with even minimal knowledge of web application architecture, most of these paths are structurally invalid.

A few obvious examples:

blog/blog/blog/blog repeated blindly without regard for routing or framework logic.
Mixing application level paths with content slugs like Mythos-your-environment.
Combining version control and CI directories like .git, .svn, .hg, .circleci, .github with blog post names.
Appending human readable article titles to binary paths such as node_modules/.bin.
Treating .env, phpinfo, xmlrpc.php, wp-json and arbitrary blog content as interchangeable primitives.

No modern framework would ever expose endpoints constructed this way. Even misconfigurations or legacy applications have structural constraints. These URLs violate those constraints completely.

This is not creative path guessing. It is syntactically valid HTTP, but semantically impossible.
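The layer mixing described above can be made concrete with a minimal sketch. It labels each path segment by the layer it appears to belong to and flags paths where a probe token and a content slug share one URL. The token sets and the slug heuristic are assumptions taken only from the sample above; the function names are hypothetical and this is not the tooling used for the original analysis.

```python
# Hypothetical vocabularies, drawn only from the sample paths in this post.
PROBE_TOKENS = {".git", ".svn", ".hg", ".aws", ".ssh", ".circleci", ".github",
                ".env", "node_modules", "phpinfo", "xmlrpc.php", "wp-json"}
ROUTE_TOKENS = {"blog", "app", "storage"}


def label_segments(path: str) -> list[tuple[str, str]]:
    """Label each path segment with the layer it appears to belong to."""
    labels = []
    # Drop the query string, then skip the empty string before the first "/".
    for seg in path.split("?", 1)[0].split("/")[1:]:
        if seg == "":
            kind = "empty"   # produced by "//" in the path
        elif seg in PROBE_TOKENS:
            kind = "probe"   # file system or tooling artefact
        elif seg in ROUTE_TOKENS:
            kind = "route"   # application level route
        elif "-" in seg and seg[0].isalpha() and "." not in seg:
            kind = "slug"    # assumed heuristic for a human readable title
        else:
            kind = "other"
        labels.append((seg, kind))
    return labels


def mixes_layers(path: str) -> bool:
    """True when a probe token and a content slug appear in the same path."""
    kinds = {kind for _, kind in label_segments(path)}
    return {"probe", "slug"} <= kinds
```

Under these assumptions, /.ssh/blog/Detection-capability-assessment is labelled probe, route, slug and is flagged, while /blog/services-cta is labelled route, slug and is not.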

The Trial and Error Signature of an AI

Classic scanners operate with curated wordlists and known attack primitives. Even bad scanners are consistent. What we see here is different.

The pattern suggests:

Token recombination rather than intent-driven probing
No understanding of execution context
No discrimination between file system paths, routes, APIs and content
No coherent feedback loop: failed requests do not appear to shape later ones

In practice, it looks like an AI model was given high-level concepts such as WordPress, Git exposure, PHP RCE, CI pipelines and blog content, then asked to generate attack paths. The model complied, but without any grounding in how web applications actually work.

This is trial and error in the truest sense. Enumerate. Retry. Concatenate. Hope.

Why AI-Assisted Recon Can Perform Worse Than Traditional Tools

Ironically, this type of attack is less effective than basic tool-driven reconnaissance.

A typical attacker using off-the-shelf scanners will:

Hit valid, well-known paths
Follow framework conventions
Trigger meaningful application behaviour
Receive actionable feedback

Here, the AI wastes requests on endpoints that can never exist. There is no signal. No recon value. No learning opportunity.

From an operational standpoint, this approach is also more expensive. Each failed request consumes infrastructure, time, and yes, tokens. In this case, those tokens were burned generating paths that any junior pentester would immediately discard as nonsense.

Detection and Defensive Value

From a defensive perspective, this pattern is useful.

These requests stand out clearly in logs:

Excessive path repetition
Context collapse between layers
Semantic incoherence
High-entropy URL construction with low structural validity

This is an emerging fingerprint. As AI-assisted attacks scale, defenders can begin distinguishing between human-guided tooling and autonomous generation gone wrong.

Ironically, the lack of understanding becomes a detection signal.
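Two of the signals listed above, excessive path repetition and stray empty segments from doubled slashes, are cheap to compute from an access log. The sketch below shows one way to do it. The function names and thresholds are hypothetical and would need tuning against real traffic, since some legitimate applications do repeat a segment.

```python
from itertools import groupby


def path_signals(request_path: str) -> dict[str, int]:
    """Count repetition and empty-segment signals for one request path."""
    # Split by hand: urllib would read a leading "//" as a network location.
    path = request_path.split("?", 1)[0]
    segments = path.split("/")[1:]
    # A single trailing slash is normal, so it is not counted as empty.
    if segments and segments[-1] == "":
        segments.pop()
    non_empty = [s for s in segments if s]
    # Longest run of the same segment back to back, ignoring empty segments.
    longest_run = max((len(list(g)) for _, g in groupby(non_empty)), default=0)
    return {
        "longest_run": longest_run,
        "empty_segments": len(segments) - len(non_empty),
    }


def is_suspicious(request_path: str, max_run: int = 1, max_empty: int = 0) -> bool:
    """Flag a path that exceeds the (assumed) repetition or emptiness limits."""
    signals = path_signals(request_path)
    return signals["longest_run"] > max_run or signals["empty_segments"] > max_empty
```

With the default thresholds, /phpinfo/blog/blog/blog/blog/Free-cloud-security-assessment is flagged for a run of four identical segments and //blog///robots.txt for three empty segments, while /blog/services-cta passes.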

AI-assisted reconnaissance is real, but it is far from intelligent by default.

The traffic observed against offsec.cypfer.com shows that throwing language models at attack generation without domain grounding produces worse results than traditional scanners. These attacks were noisy, ineffective, and almost certainly cost the operator more than they gained.

For now, humans with boring tools still outperform AI that does not understand context. Until models can reason about application structure instead of remixing tokens, this trend will remain more interesting than dangerous.

And yes, it probably burned a few tokens along the way.
