How to measure penetration testing team performance

Utilisation alone does not show whether a penetration testing team is healthy. Leaders also need to measure quality, rework, client friction and senior dependency.

By Simon Chapman


Many penetration testing teams measure what is easy to count.

Utilisation. Revenue. Days delivered. Reports issued. Projects completed.

Those measures matter, but they do not show the full picture.

A team can be highly utilised and still be struggling. Reports may be late. QA may be overloaded. Senior testers may be rescuing too many projects. Clients may be asking too many clarification questions. Juniors may not be progressing.

Good measurement should show whether the team is delivering well, not just whether it is busy.

Utilisation is useful but limited

Utilisation is one of the most common measures in a professional services business.

It tells leaders whether people are booked and whether capacity is being converted into revenue.

That is useful.

But utilisation can also hide problems.

A tester may be fully utilised but producing reports that need heavy QA. A senior consultant may be busy, but much of their time may be spent rescuing other people’s work. A team may hit its utilisation target while creating burnout, rework and client frustration.

High utilisation does not always mean healthy performance.

It needs to be read alongside quality, margin and delivery friction.

Report turnaround shows delivery drag

The time between testing finishing and the report being issued is a useful signal.

If reports routinely take too long to leave the business, something is causing drag.

That might be:

  • poor note-taking during testing
  • weak report writing
  • delayed QA
  • unclear findings
  • missing evidence
  • reviewer bottlenecks
  • testers moving to the next job too quickly
  • late severity debates

Report turnaround should not be used to pressure testers into rushing.

It should be used to understand where time is being lost.

A report that needs three review cycles is not just late. It is consuming effort that could have been used elsewhere.

This is where good QA in a penetration testing team matters. QA should expose whether delays are caused by weak evidence, unclear severity logic, poor recommendations or repeated report rework.
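One way to make turnaround measurable is to log, for each engagement, when testing finished, when the report was issued and how many QA passes it needed. The sketch below assumes a hypothetical record with those three fields. The five-day target is an illustrative assumption, not a recommended standard, and the three-cycle threshold simply mirrors the example above. It also uses calendar days; a team may prefer working days.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class Engagement:
    # Hypothetical record; field names are illustrative, not from any real tool.
    ref: str
    testing_finished: date
    report_issued: date
    review_cycles: int  # number of QA passes before release


def turnaround_days(e: Engagement) -> int:
    """Calendar days between the end of testing and the report being issued."""
    return (e.report_issued - e.testing_finished).days


def summarise(engagements: list[Engagement], target_days: int = 5) -> dict:
    """Median turnaround, late reports, and late reports that also needed heavy QA."""
    late = [e for e in engagements if turnaround_days(e) > target_days]
    return {
        "median_days": median(turnaround_days(e) for e in engagements),
        "late": [e.ref for e in late],
        # Late and reviewed three or more times: the delay is probably QA
        # rework rather than scheduling alone.
        "late_and_reworked": [e.ref for e in late if e.review_cycles >= 3],
    }
```

Read this way, the figures show where to look. They are not a target to push testers against.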

QA cycles reveal quality patterns

QA review cycles are one of the clearest indicators of delivery quality.

A report that passes with minor comments is very different from a report that needs repeated technical correction, rewritten findings or severity changes.

Leaders should look at the pattern, not only the individual report.

Useful questions include:

  • which findings are repeatedly rewritten?
  • which testers need the most QA support?
  • which service lines produce the most review comments?
  • are QA issues technical, structural or communication-related?
  • are the same problems appearing across multiple reports?

This helps separate one-off mistakes from system issues.

If the same defects appear repeatedly, the answer is not more pressure at the end of the process. The answer is better development earlier.
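To separate one-off mistakes from system issues, QA comments need to be tagged when they are raised. The sketch below assumes a hypothetical tagging scheme (report, tester, service line and one of the three categories listed above). It counts comments along any of those dimensions and flags categories that recur across several reports. The two-report threshold is an arbitrary assumption.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class QAComment(NamedTuple):
    # Hypothetical structure: one comment raised during report review.
    report: str
    tester: str
    service_line: str
    category: str  # "technical", "structural" or "communication"


def comments_by(comments: list[QAComment], field: str) -> Counter:
    """Count comments by "tester", "service_line" or "category"."""
    return Counter(getattr(c, field) for c in comments)


def recurring_categories(comments: list[QAComment], min_reports: int = 2) -> list[str]:
    """Categories raised in at least `min_reports` different reports.

    A category that recurs across reports is more likely a system issue
    than a one-off mistake.
    """
    reports_by_category: dict[str, set[str]] = defaultdict(set)
    for c in comments:
        reports_by_category[c.category].add(c.report)
    return sorted(cat for cat, reports in reports_by_category.items()
                  if len(reports) >= min_reports)
```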

Unbilled time shows margin leakage

Unbilled time matters most on fixed-price testing.

If a project needs more effort than expected and the business cannot charge for it, margin is lost.

Unbilled time may come from technical complexity, but it often comes from operational issues.

Common causes include:

  • unclear scope
  • poor client readiness
  • missing credentials
  • extra clarification calls
  • report rework
  • client disputes
  • unplanned senior support
  • retest confusion
  • weak recommendations

Leaders should not treat all unbilled time as failure.

Some additional effort is part of professional delivery. But repeated patterns of unbilled time show where pricing, scoping or delivery habits need attention.

This is also why poor client communication erodes pentest margin. The cost often appears as small amounts of rework, delay and senior support that nobody priced.
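The arithmetic behind margin leakage is simple, which is why small overruns add up quickly. The sketch below assumes a flat internal cost per delivery day and records each overrun against one of the causes listed above. All figures and names are hypothetical. On a fixed-price job, every extra day comes straight out of margin; summing overruns by cause across engagements shows which habits leak the most.

```python
from collections import Counter


def engagement_margin(price: float, daily_cost: float, planned_days: float,
                      overruns: dict[str, float]) -> dict:
    """Margin on one fixed-price engagement.

    `overruns` maps a cause (e.g. "report rework") to the extra days it took.
    None of those days can be billed, so each one reduces margin.
    """
    unbilled_days = sum(overruns.values())
    planned_margin = price - planned_days * daily_cost
    actual_margin = planned_margin - unbilled_days * daily_cost
    return {
        "unbilled_days": unbilled_days,
        "planned_margin_pct": round(100 * planned_margin / price, 1),
        "actual_margin_pct": round(100 * actual_margin / price, 1),
    }


def leakage_by_cause(all_overruns: list[dict[str, float]]) -> list[tuple[str, float]]:
    """Total unbilled days per cause across engagements, largest first."""
    totals: Counter = Counter()
    for overruns in all_overruns:
        totals.update(overruns)  # adds day counts per cause
    return totals.most_common()
```

With a price of 10,000, a cost of 600 per day and a 10-day plan, three unbilled days (rework, missing credentials, extra calls) cut the margin from 40% to 22%.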

Client clarification volume matters

Client questions after report delivery are not always bad.

Some are expected. A good report can still generate sensible remediation questions.

But repeated clarification requests can signal that the report was not clear enough.

Leaders should look for patterns such as:

  • clients asking what a finding means
  • remediation teams asking what to change
  • risk owners asking why severity was assigned
  • repeated questions about affected assets
  • confusion about retest requirements
  • uncertainty about whether a finding is exploitable

These questions create post-delivery cost.

They also affect client confidence.

A client that has to work too hard to understand the report may not feel well served, even if the technical testing was sound.

Severity disputes should be tracked

Severity disputes are part of penetration testing.

They should not be treated as unusual. But they should be measured.

Useful points to track include:

  • how often severity is challenged
  • which findings are most commonly disputed
  • whether ratings change after review
  • whether new client context was provided
  • which testers generate the most disputes
  • whether disputes are resolved at tester level or escalated

This helps leaders understand whether the problem is client pressure, weak rating guidance, poor evidence or inconsistent QA.

A high number of disputes does not automatically mean the team is wrong.

It does mean the team should understand why disputes are happening.
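A simple dispute log is enough to answer most of the questions above. The sketch below uses a hypothetical record per challenged rating. One assumed heuristic is worth drawing out: a rating that changes after the client supplied new context is normal, but a rating that changes without new context suggests the original rating, its evidence or the QA behind it was weak.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Dispute:
    # Hypothetical record of one challenged severity rating.
    finding: str
    tester: str
    original: str      # rating as issued, e.g. "High"
    final: str         # rating after review
    new_context: bool  # did the client supply information the tester lacked?
    escalated: bool    # resolved above tester level?


def dispute_profile(disputes: list[Dispute], findings_issued: int) -> dict:
    """Summarise how often ratings are challenged, changed and escalated."""
    changed = [d for d in disputes if d.final != d.original]
    return {
        "dispute_rate": len(disputes) / findings_issued,
        "changed": len(changed),
        # Assumed heuristic: a change without new client context points at
        # rating guidance, evidence or QA rather than client pressure.
        "changed_without_context": sum(1 for d in changed if not d.new_context),
        "escalated": sum(1 for d in disputes if d.escalated),
        "by_tester": Counter(d.tester for d in disputes),
    }
```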

Senior involvement should be visible

Unplanned senior involvement is one of the most important measures and one of the least visible.

Senior people often support projects informally. They review difficult findings, join calls, fix reports, advise on scope and handle escalations.

That work can be valuable, but it has a cost.

Leaders should understand how often senior people are being pulled into routine delivery problems.

Useful questions include:

  • which projects needed senior rescue?
  • why was senior support needed?
  • was the issue technical, commercial or communication-related?
  • could the problem have been avoided earlier?
  • is the same senior person becoming a bottleneck?

This does not mean senior support should be discouraged.

It means the business should know when it is relying on senior people to absorb preventable friction.

When this happens repeatedly, senior pentesters become the delivery bottleneck. The team still delivers, but too much judgement and quality control sits with too few people.
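Senior involvement only becomes visible if it is logged. The sketch below assumes a hypothetical log entry for each instance of unplanned senior support, including the reason and a reviewer's judgement on whether it could have been avoided earlier. It reports the share of avoidable hours and flags any senior carrying at least half of the total. The 50% threshold is an illustrative assumption.

```python
from collections import Counter
from typing import NamedTuple


class SeniorAssist(NamedTuple):
    # Hypothetical log entry: one instance of unplanned senior support.
    project: str
    senior: str
    hours: float
    reason: str      # "technical", "commercial" or "communication"
    avoidable: bool  # reviewer's judgement: could this have been caught earlier?


def senior_load(log: list[SeniorAssist], bottleneck_share: float = 0.5) -> dict:
    """Where unplanned senior hours go, and who is absorbing them."""
    total = sum(a.hours for a in log)
    by_senior: Counter = Counter()
    by_reason: Counter = Counter()
    for a in log:
        by_senior[a.senior] += a.hours
        by_reason[a.reason] += a.hours
    avoidable = sum(a.hours for a in log if a.avoidable)
    return {
        "total_hours": total,
        "avoidable_share": round(avoidable / total, 2) if total else 0.0,
        "by_reason": by_reason,
        # Anyone absorbing at least `bottleneck_share` of all senior hours.
        "possible_bottlenecks": sorted(
            s for s, h in by_senior.items() if total and h / total >= bottleneck_share
        ),
    }
```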

Junior progression needs practical measures

A pentest team should measure whether junior testers are becoming more capable.

Certifications and training completion can help, but they are not enough.

Useful development indicators include:

  • reduced QA comments over time
  • better evidence quality
  • clearer findings
  • more confident client communication
  • ability to explain severity
  • better scoping awareness
  • fewer repeated mistakes
  • increased ownership of debriefs or client calls

These measures are more useful than general impressions that someone seems more confident.

They show whether juniors are moving towards independent consulting capability.
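The first indicator above, reduced QA comments over time, can be tracked per tester with very little effort. The sketch below assumes QA comment counts per report are grouped into periods (for example quarters). It treats a drop of at least 20% from the first period to the latest as improvement; that threshold is an assumption. It should be read alongside the other indicators, since report complexity also changes over time.

```python
from statistics import mean


def qa_comments_trend(history: dict[str, list[int]]) -> dict[str, float]:
    """Average QA comments per report for each period, in the order given."""
    return {period: mean(counts) for period, counts in history.items()}


def is_improving(history: dict[str, list[int]], min_drop: float = 0.2) -> bool:
    """True if the latest period's average is at least `min_drop` below the first."""
    averages = list(qa_comments_trend(history).values())
    first, last = averages[0], averages[-1]
    return first > 0 and (first - last) / first >= min_drop
```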

Client confidence should be part of performance

Technical delivery is only part of the client experience.

Clients also judge how clearly the team communicates, whether the report is useful, whether the debrief is controlled and whether the provider understands their environment.

Feedback should therefore go beyond simple satisfaction.

Useful questions include:

  • did the report help the client prioritise action?
  • were findings explained clearly?
  • was the tester able to answer questions?
  • did the engagement feel well managed?
  • were there surprises that should have been handled earlier?
  • would the client trust the team with more complex work?

These questions reveal whether delivery quality is translating into trust.

That matters for renewal and account growth.

Measurement should lead to action

Metrics are only useful if they change decisions.

If QA cycles are high, the team may need report coaching. If unbilled time is increasing, scoping may need attention. If severity disputes are common, rating guidance may need review. If senior involvement is high, team development may be too informal.

The aim is not to create a dashboard for its own sake.

The aim is to make hidden problems visible enough to fix.

A small set of useful measures is better than a large set nobody uses.

What offensive security leaders should measure

A practical starting set would include:

  • utilisation
  • report turnaround
  • QA review cycles
  • unbilled time
  • client clarification volume
  • severity disputes
  • unplanned senior involvement
  • retest friction
  • junior progression
  • client confidence

Together, these give a fuller view of team performance.

They show not only whether the team is busy but also whether the work is clear, defensible, profitable and scalable.

Scoping quality should also be part of the picture. Poor scoping damages pentest delivery through missed assumptions, unclear objectives, client readiness issues and avoidable margin loss.

What this means for pentest team performance

Penetration testing teams need better measures than utilisation alone.

Being busy is not the same as being effective.

Leaders need to know where quality is strong, where rework is happening, where senior people are being overused and where clients are experiencing friction.

That does not require a heavy management system.

It requires a practical view of how work actually moves through the team.

The right measures help leaders see the real constraints on performance and act before those constraints become normal.

If this article describes a real delivery pressure, turn it into a next step.

Conversec helps offensive security teams improve consulting maturity, leadership capacity, and delivery clarity.