What a real penetration test delivers — and how to spot a cheap one
Most teams buy a penetration test the way they'd buy insurance — because a customer's security questionnaire, an investor, or a NIS2 obligation said they need one. That's a fine reason to start, but it means the buyer often can't tell a real test from an automated scan with a nicer cover page. The two cost wildly different amounts and deliver wildly different value. This guide is the buyer's-side breakdown: what a real penetration test actually covers, what the deliverable should look like, and the red flags that tell you the quote in front of you is the cheap kind.
What a penetration test is — and isn't
A penetration test is a controlled, authorised attack on your system, performed by a human, to find the weaknesses a real attacker would exploit and to prove how far they'd get. The phrase that does the work is "performed by a human". Automated tooling is part of any modern pentest — it covers ground fast — but the tooling finds candidates, not conclusions. A scanner flags a thousand "potential" issues; a tester works out which three actually chain into a breach, and proves it.
What a pentest is not: it is not a vulnerability scan (that's automated, continuous, and cheap), and it is not a compliance audit (that's a paperwork exercise against a control framework). All three are useful. They are not interchangeable, and a studio that blurs them in the quote is either confused or hoping you are.
What's actually in scope
A real engagement is scoped before it starts, in writing. "Test our security" is not a scope. A proper scope names the targets, the boundaries, and the rules of engagement.
Targets · which systems are in — the web app, the public API, the cloud configuration, the internal network, the mobile app, and increasingly the LLM / AI layer if you ship one.
Out of scope · what the tester must not touch — third-party services you don't own, production data deletion, denial-of-service. Stated explicitly so nothing is ambiguous.
Rules of engagement · test windows, who to contact if something breaks, whether it's black-box (no internal knowledge), grey-box (some), or white-box (full source access).
Standards · what the test is measured against — the OWASP Top 10 for web, the OWASP API and LLM lists where relevant, and a recognised methodology so coverage isn't ad hoc.
Grey-box or white-box testing usually finds more for the same money — handing over source access lets the tester spend hours on real exploitation instead of on reconnaissance. Black-box only makes sense when the goal is specifically to simulate an outside attacker with zero knowledge.
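To make that concrete, here is a minimal sketch of the elements a written scope pins down. The targets, window and contact are hypothetical placeholders; a real engagement records all of this in a signed rules-of-engagement document, not in code.

```python
from dataclasses import dataclass

@dataclass
class EngagementScope:
    """The elements a written pentest scope should pin down (illustrative only)."""
    targets: list[str]        # systems the tester is authorised to attack
    out_of_scope: list[str]   # systems and actions the tester must not touch
    test_window: str          # when testing is allowed to run
    emergency_contact: str    # who to reach if something breaks
    knowledge_level: str      # "black-box", "grey-box" or "white-box"
    standards: list[str]      # what coverage is measured against

# Hypothetical example values, not a recommendation:
scope = EngagementScope(
    targets=["app.example.com", "api.example.com", "staging cloud account"],
    out_of_scope=["third-party payment provider", "denial-of-service",
                  "production data deletion"],
    test_window="weekdays 09:00-18:00 CET, two weeks",
    emergency_contact="security@example.com",
    knowledge_level="grey-box",
    standards=["OWASP Top 10", "OWASP API Top 10", "OWASP LLM Top 10"],
)
```

If the proposal in front of you can't fill in every field of that record, the scoping conversation isn't finished.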
The methodology, in plain terms
A structured pentest moves through recognisable phases. You don't need to run them — but you should recognise them in a proposal, because a quote that can't describe its own method is a quote for something else.
Reconnaissance · mapping the attack surface — what's exposed, what technologies are in use, where the entry points are.
Scanning · automated and manual probing to enumerate candidate weaknesses across the scoped surface.
Exploitation · the actual test — proving which candidates are real by exploiting them in a controlled way, and showing the impact.
Post-exploitation · how far a foothold goes — privilege escalation, lateral movement, what data becomes reachable. This is what separates "a bug" from "a breach".
Reporting · every confirmed finding written up so you can reproduce it, severity-rated (typically with CVSS), with a concrete remediation.
Retest · after you fix, the tester re-runs the findings and verifies the fixes actually held. A test without a retest is half a test.
The deliverable: what you should receive
The report is where cheap and real diverge most visibly. A real deliverable is built to be acted on, not filed.
Reproducible findings · each one with the exact steps to trigger it. If your engineer can't reproduce it from the report, it can't be fixed with confidence.
Severity that means something · rated consistently (CVSS or an equivalent), so you fix the critical path first and don't drown in low-risk noise. A worked scoring example follows this list.
Remediation you can act on · not "improve input validation" but a specific fix — ideally a pull request opened against your actual repository.
An executive summary · one page a non-technical stakeholder can read: what was tested, what was found, what the real risk is.
A verified retest · documented confirmation that the fixes closed the findings. This is the artefact a customer's security team or an auditor actually wants.
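To show what a consistent rating actually carries, here is a worked example of the CVSS v3.1 base-score arithmetic for a textbook critical finding: network-reachable, no authentication required, no user interaction, full impact on confidentiality, integrity and availability. The metric weights come from the published CVSS v3.1 specification; the round-up is a simplified stand-in for the spec's exact Roundup function.

```python
import math

# Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
# Weights taken from the CVSS v3.1 specification.
AV, AC, PR, UI = 0.85, 0.77, 0.85, 0.85   # exploitability metrics
C = I = A = 0.56                          # impact metrics, all rated High

# Impact sub-score: how much of confidentiality/integrity/availability is lost.
iss = 1 - (1 - C) * (1 - I) * (1 - A)
impact = 6.42 * iss                       # formula for scope unchanged

# Exploitability sub-score: how easy the weakness is to trigger.
exploitability = 8.22 * AV * AC * PR * UI

# Base score: sum of the two, capped at 10, rounded up to one decimal place.
base = math.ceil(min(impact + exploitability, 10) * 10) / 10
print(base)  # 9.8, i.e. Critical
```

The point of the arithmetic isn't the decimals; it's that two findings rated by the same formula can be ordered, so "fix the critical path first" is a plan rather than a feeling.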
An 80-page report is not a sign of thoroughness — it's usually a sign of an automated scanner's raw output pasted in. A real report is shorter, because every finding in it is confirmed. Length is a vanity metric; reproducibility is the real one.
Red flags: how to spot a cheap test
If a quote shows several of these, you are buying an automated scan dressed up as a penetration test.
A price well under a few thousand euros · real manual testing is senior engineering time; a serious web-app test is dozens of hours.
No named tester · you should know who is doing the work and see their background. "Our team" with no names means a tool ran it.
No exploitation · findings phrased as "this appears vulnerable" rather than "we exploited this, here's the proof". "Appears" is scanner language.
No retest included · if verifying your fixes costs extra, the engagement was scoped to end before the value does.
An 80-page PDF as the headline selling point · volume substituting for confirmation.
No scoping conversation · a real test starts with a discussion about what's in scope and why; a flat order form does not.
Pentest, scan, audit — which do you actually need
You likely need more than one. A vulnerability scan should run continuously in your pipeline. A penetration test belongs before a major launch, after a significant architecture change, and on a regular cadence — annually is a common floor. A compliance audit happens when a framework or a customer demands it. If a customer questionnaire says "penetration test" and you hand them a scan report, you fail the question — and vice versa.
How DField Solutions runs a test
We run penetration testing the way the rest of our work runs: scoped up front with a fixed price, manual testing against the OWASP web, API and LLM lists, and a deliverable built to be acted on. Findings come as fix-PRs opened against your repository — not an 80-page PDF — each one reproducible, severity-rated, and with the impact proven. A retest after you merge the fixes is part of the engagement, not an upsell. The standard engagement runs about two weeks, with the report and the fix-PRs landing at the end.
If you're scoping a test — for a launch, a customer's security review, or a NIS2 obligation — the cybersecurity service page covers how we work, and a 30-minute discovery call is the fastest way to define scope. If you want the pricing bands first, the pricing page has them.