Correlation and Deduplication Engine

Decide once, for every scanner result, whether a vulnerability is new, recurring, or returning, before a single finding is written, so re-scans never multiply the backlog.

The Correlation and Deduplication Engine answers one question for every scanner result that enters PMAP: does this vulnerability already exist on this asset, or is it new? Before a finding is created, updated, or reopened, the engine resolves the inbound result against what the platform already knows.

Correlation is a pure Go library package under internal/correlation with no HTTP routes of its own. It is never mounted in the server router and never visible to a user directly. Its only trace is the outcome it returns, so the finding layer downstream receives governed, deduplicated records rather than raw vendor noise.

Fingerprints are only stable if their inputs are, so every value is normalized first.

The hard problem at scan scale is not parsing results. It is deciding, before the write, whether each result is new, recurring, or returning. Because that decision lives in one engine, every connector gets the same answer.

At a glance

Backend package: internal/correlation (Go library, no HTTP surface)
Dedup pipeline: four ordered cases (scanner_ref, fingerprint, reopen, create)
Fingerprinting: SHA-1 over normalized title, asset, and endpoint, with optional ID (V1 and V2)
Lookup order: scanner_ref takes absolute priority over the fingerprint
Wave accounting: RecordScanOccurrence runs on every create, update, and reopen
Rule hook: EvaluateAndApply fires at ingest on create and reopen only
Consumers: Nessus, Qualys, Rapid7, Acunetix, Invicti, Nuclei, SonarQube, Tenable.sc, generic
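As an illustration of the fingerprinting row above, here is a minimal sketch of hashing normalized inputs with SHA-1. The normalization rules (lowercasing, whitespace collapsing, dropping a trailing slash), the field separator, and the function names are assumptions made for the example, not PMAP's actual rules, and the sketch does not try to reproduce the V1 and V2 variants.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"strings"
)

// normalize is a hypothetical normalizer: it lowercases the value and
// collapses all runs of whitespace (including leading/trailing) to one space.
func normalize(s string) string {
	return strings.Join(strings.Fields(strings.ToLower(s)), " ")
}

// normalizeEndpoint additionally drops trailing slashes (an assumed rule).
func normalizeEndpoint(s string) string {
	return strings.TrimRight(normalize(s), "/")
}

// Fingerprint hashes the normalized title, asset, and endpoint with SHA-1.
// The optional id is included only when it is non-empty.
func Fingerprint(title, asset, endpoint, id string) string {
	parts := []string{normalize(title), normalize(asset), normalizeEndpoint(endpoint)}
	if id = normalize(id); id != "" {
		parts = append(parts, id)
	}
	// A unit-separator byte keeps adjacent fields from running together.
	sum := sha1.Sum([]byte(strings.Join(parts, "\x1f")))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := Fingerprint("SQL  Injection", "APP-01", "https://app.example/login/", "")
	b := Fingerprint("sql injection ", "app-01", "https://app.example/login", "")
	fmt.Println(a == b) // true: both inputs normalize to the same values
	fmt.Println(len(a)) // 40: a SHA-1 digest is 20 bytes, 40 hex characters
}
```

The point of the sketch is that the hash itself is trivial. Stability comes entirely from normalizing before hashing.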

How it works

One engine, one deduplication policy, before any write. A scanner_ref lookup, then a fingerprint lookup, decides whether to update, reopen, or create, so a recurring vulnerability never becomes a duplicate and a returning one never loses its history.

CorrelateFinding implements an ordered four-case pipeline, and the order is the contract. Each case is only reached when the earlier ones did not resolve the result. The engine tries scanner_ref first, falls back to the fingerprint, branches on whether the matched finding is closed, and creates a new finding only when nothing matched.
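The ordering described above can be sketched as a pure decision function. This is not the real CorrelateFinding: the Store type, its in-memory maps, the Decide method, and the Outcome names are hypothetical, and the sketch assumes the closed-finding branch applies whichever lookup produced the match.

```go
package main

import "fmt"

// Finding is a pared-down stand-in for a stored finding.
type Finding struct {
	ID     string
	Closed bool
}

// Store is a hypothetical in-memory index used only for this sketch.
type Store struct {
	byScannerRef  map[string]*Finding
	byFingerprint map[string]*Finding
}

type Outcome string

const (
	OutcomeUpdated  Outcome = "updated"
	OutcomeReopened Outcome = "reopened"
	OutcomeCreated  Outcome = "created"
)

// Decide applies the ordered cases: scanner_ref, fingerprint, reopen, create.
func (s *Store) Decide(scannerRef, fingerprint string) (Outcome, *Finding) {
	var match *Finding
	if scannerRef != "" {
		match = s.byScannerRef[scannerRef] // case 1: scanner_ref has priority
	}
	if match == nil {
		match = s.byFingerprint[fingerprint] // case 2: fingerprint fallback
	}
	switch {
	case match == nil:
		return OutcomeCreated, nil // case 4: nothing matched
	case match.Closed:
		return OutcomeReopened, match // case 3: matched a closed finding
	default:
		return OutcomeUpdated, match
	}
}

func main() {
	open := &Finding{ID: "F-1"}
	closed := &Finding{ID: "F-2", Closed: true}
	s := &Store{
		byScannerRef:  map[string]*Finding{"ref-1": open},
		byFingerprint: map[string]*Finding{"fp-1": closed, "fp-2": closed},
	}

	o1, _ := s.Decide("ref-1", "fp-2") // scanner_ref wins over the fingerprint
	o2, _ := s.Decide("ref-9", "fp-2") // falls back to a closed finding
	o3, _ := s.Decide("", "fp-9")      // no match on either key
	fmt.Println(o1, o2, o3)            // updated reopened created
}
```

In the first call the fingerprint would have matched a closed finding, but the scanner_ref match is taken first and the later cases are never reached.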

Once a match is resolved, the engine does more than flip a flag. Each of the three outcomes runs a precise sequence of repository calls so that history is preserved, recurrence is visible, and downstream consumers see consistent aggregates across every branch of the pipeline.
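To make the per-outcome sequences concrete, the sketch below records which repository calls each outcome would issue, using the call names this datasheet mentions. The recording type, the CreateFinding name, and the string outcome labels are assumptions for illustration. The rule hook is left out here to keep the focus on repository calls.

```go
package main

import (
	"fmt"
	"strings"
)

// callLog records call names in order; it stands in for a real repository.
type callLog []string

func (c *callLog) add(name string) { *c = append(*c, name) }

// apply issues the calls for one outcome: "update", "reopen", or "create".
func apply(outcome, scanID string, log *callLog) {
	switch outcome {
	case "update":
		log.add("UpdateFromScanner")
	case "reopen":
		log.add("UpdateFromScanner") // refresh fields first
		log.add("ReopenFinding")     // then reopen with attribution
	case "create":
		log.add("CreateFinding") // hypothetical name for the insert call
	}
	// Wave accounting runs after any outcome, but only when a scan ID exists.
	if scanID != "" {
		log.add("RecordScanOccurrence")
	}
}

// sequence returns the recorded calls for one outcome as a readable string.
func sequence(outcome, scanID string) string {
	var log callLog
	apply(outcome, scanID, &log)
	return strings.Join(log, " -> ")
}

func main() {
	for _, o := range []string{"update", "reopen", "create"} {
		fmt.Printf("%-6s %s\n", o, sequence(o, "scan-42"))
	}
	// update UpdateFromScanner -> RecordScanOccurrence
	// reopen UpdateFromScanner -> ReopenFinding -> RecordScanOccurrence
	// create CreateFinding -> RecordScanOccurrence
}
```

Because the wave record is written on every branch, the scan-count and last-seen aggregates stay consistent regardless of which outcome a result produced.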

Key capabilities

Reopen preserves history. When the matched finding is closed, the engine refreshes its fields through UpdateFromScanner, then issues ReopenFinding with the importing actor as attribution. The finding keeps its identity and its prior remediation record while reflecting that the vulnerability has returned.
Wave-visibility accounting. After any outcome, when a scan ID is present, RecordScanOccurrence registers that this finding was seen in this wave. That maintains the cross-scan aggregates the finding layer carries: how many scans observed the vulnerability and which wave most recently saw it.
Post-correlate rules. A small RuleEvaluator interface is declared inside correlation to break an import cycle, then wired in after construction through SetRuleEngine. EvaluateAndApply runs only on create and reopen, because a plain update of an already-open finding changes nothing about its lifecycle position.
Lossless source passthrough. CorrelateRequest carries the full SAST, DAST, and SCA field set: code location, taint flow, dependency path, license, and CI context. When a new finding is created, every field is persisted, so nothing the importer parsed is discarded.
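The rule hook described above follows a common Go pattern for breaking an import cycle: the consumer declares the small interface it needs and receives the implementation after construction. The sketch below uses the RuleEvaluator, SetRuleEngine, and EvaluateAndApply names from this datasheet, but the method signature, the Engine type, the afterCorrelate helper, and the no-op behavior when no rule engine is wired are assumptions.

```go
package main

import "fmt"

// RuleEvaluator is declared on the consumer side, so this package never has
// to import the package that implements the rules. Signature is assumed.
type RuleEvaluator interface {
	EvaluateAndApply(findingID string) error
}

// Engine is a stand-in for the correlation engine.
type Engine struct {
	rules RuleEvaluator // nil until SetRuleEngine is called
}

// SetRuleEngine wires the evaluator in after construction.
func (e *Engine) SetRuleEngine(r RuleEvaluator) { e.rules = r }

// afterCorrelate (hypothetical helper) fires rules on create and reopen only.
func (e *Engine) afterCorrelate(outcome, findingID string) error {
	if e.rules == nil || outcome == "update" {
		return nil // nothing wired, or lifecycle position unchanged
	}
	return e.rules.EvaluateAndApply(findingID)
}

// recordingRules is a test double that remembers which findings it saw.
type recordingRules struct{ calls []string }

func (r *recordingRules) EvaluateAndApply(id string) error {
	r.calls = append(r.calls, id)
	return nil
}

func main() {
	e := &Engine{}
	_ = e.afterCorrelate("create", "F-0") // no rule engine yet: no-op

	r := &recordingRules{}
	e.SetRuleEngine(r)
	for _, o := range []string{"create", "update", "reopen"} {
		_ = e.afterCorrelate(o, "F-"+o)
	}
	fmt.Println(r.calls) // [F-create F-reopen]
}
```

Any type with a matching EvaluateAndApply method satisfies the interface implicitly, which is what lets the rules package depend on correlation without the reverse import.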

Use cases

Re-ingest a weekly scan. A vulnerability manager re-imports the same Tenable scan a week later. The engine resolves each result by scanner_ref to the finding it already produced, updates fields in place, and records the new wave. Because re-ingestion is idempotent, the backlog does not grow.
Catch a vulnerability that came back. A SOC lead reviews a host remediated last month. A fresh scan re-detects the issue, the fingerprint matches the now-closed finding, and the engine reopens it with attribution, so the returning vulnerability keeps its full remediation history.
Consolidate two scanners. A CISO runs both a network scanner and a web scanner against the same application. Where they describe the same vulnerability on the same endpoint, normalization collapses their differing titles and URLs to one fingerprint, so the tools converge on a single governed finding.

One decision per result, before any write, so re-scans never multiply the backlog.

