Datasheet

Correlation and Deduplication Engine

About this datasheet

Decide once, for every scanner result, whether a vulnerability is new, recurring, or returning, before a single finding is written, so re-scans never multiply the backlog.

The Correlation and Deduplication Engine answers one question for every scanner result that enters PMAP: does this vulnerability already exist on this asset, or is it new? Before a finding is created, updated, or reopened, the engine resolves the inbound result against what the platform already knows.

Correlation is a pure Go library package under internal/correlation with no HTTP routes of its own. It is never mounted in the server router and never visible to a user directly. Its only trace is the outcome it returns, so the finding layer downstream receives governed, deduplicated records rather than raw vendor noise.

Fingerprints are only stable if their inputs are, so every value is normalized first.
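A minimal sketch of that idea, assuming illustrative normalization rules (lowercasing, whitespace collapsing, and trimming a trailing slash from endpoints). The function names, the separator byte, and the rules themselves are hypothetical and are not taken from internal/correlation. The sketch also does not attempt to reproduce the V1 and V2 fingerprint variants; it only shows an optional ID being mixed into the hash.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"strings"
)

// normalize lowercases a value and collapses runs of whitespace.
// These rules are assumptions made for illustration only.
func normalize(s string) string {
	return strings.Join(strings.Fields(strings.ToLower(s)), " ")
}

// normalizeEndpoint additionally drops trailing slashes (hypothetical rule).
func normalizeEndpoint(s string) string {
	return strings.TrimRight(normalize(s), "/")
}

// Fingerprint hashes the normalized title, asset, and endpoint with SHA-1.
// A non-empty id is appended as a fourth component.
func Fingerprint(title, asset, endpoint, id string) string {
	parts := []string{normalize(title), normalize(asset), normalizeEndpoint(endpoint)}
	if id != "" {
		parts = append(parts, normalize(id))
	}
	// The unit separator is unlikely to appear in scanner text,
	// so adjacent fields do not run together.
	sum := sha1.Sum([]byte(strings.Join(parts, "\x1f")))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := Fingerprint("SQL  Injection", "APP-01", "https://app.example.com/login/", "")
	b := Fingerprint("sql injection", "app-01", "https://app.example.com/login", "")
	fmt.Println(a == b) // prints true: both spellings normalize to the same input
}
```

Hashing only after normalization is what makes the comparison stable: two results that differ in casing or spacing produce identical bytes before SHA-1 ever runs.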

The hard problem at scan scale is not parsing results. It is deciding, before the write, whether each result is new, recurring, or returning. Because that decision does not depend on the vendor, the same engine serves every connector.

At a glance

  • Backend package: internal/correlation (Go library, no HTTP surface)
  • Dedup pipeline: four ordered cases (scanner_ref, fingerprint, reopen, create)
  • Fingerprinting: SHA-1 over normalized title, asset, and endpoint, with optional ID (V1 and V2)
  • Lookup order: scanner_ref takes absolute priority over the fingerprint
  • Wave accounting: RecordScanOccurrence runs on every create, update, and reopen
  • Rule hook: EvaluateAndApply fires at ingest on create and reopen only
  • Consumers: Nessus, Qualys, Rapid7, Acunetix, Invicti, Nuclei, SonarQube, Tenable.sc, generic

How it works

One engine, one deduplication policy, before any write. The scanner_ref lookup runs first and the fingerprint second; together they decide whether to update, reopen, or create, so a recurring vulnerability never becomes a duplicate and a returning one never loses its history.

CorrelateFinding implements an ordered four-case pipeline, and the order is the contract. Each case is only reached when the earlier ones did not resolve the result. The engine tries scanner_ref first, falls back to the fingerprint, branches on whether the matched finding is closed, and creates a new finding only when nothing matched.
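The ordering described above can be sketched as follows. This is an illustration under stated assumptions, not the actual CorrelateFinding: the Finding and Store types, the in-memory lookups, the Correlate method name, and the sample reference values are all hypothetical.

```go
package main

import "fmt"

type Outcome string

const (
	Updated  Outcome = "updated"
	Reopened Outcome = "reopened"
	Created  Outcome = "created"
)

// Finding and Store are simplified, hypothetical stand-ins for the
// platform's finding record and repository.
type Finding struct {
	ID          int
	ScannerRef  string
	Fingerprint string
	Closed      bool
}

type Store struct {
	findings []*Finding
}

func (s *Store) byScannerRef(ref string) *Finding {
	if ref == "" {
		return nil
	}
	for _, f := range s.findings {
		if f.ScannerRef == ref {
			return f
		}
	}
	return nil
}

func (s *Store) byFingerprint(fp string) *Finding {
	for _, f := range s.findings {
		if f.Fingerprint == fp {
			return f
		}
	}
	return nil
}

// Correlate tries scanner_ref, then the fingerprint, then branches on
// whether the match is closed, and creates only when nothing matched.
func (s *Store) Correlate(scannerRef, fingerprint string) (*Finding, Outcome) {
	match := s.byScannerRef(scannerRef) // scanner_ref takes priority
	if match == nil {
		match = s.byFingerprint(fingerprint) // fall back to the fingerprint
	}
	if match != nil && match.Closed {
		match.Closed = false // a closed match is reopened, not duplicated
		return match, Reopened
	}
	if match != nil {
		return match, Updated
	}
	created := &Finding{ID: len(s.findings) + 1, ScannerRef: scannerRef, Fingerprint: fingerprint}
	s.findings = append(s.findings, created)
	return created, Created
}

func main() {
	s := &Store{}
	_, first := s.Correlate("ref-123", "fp-a")
	f, second := s.Correlate("ref-123", "fp-a")
	f.Closed = true // simulate remediation between scans
	_, third := s.Correlate("", "fp-a")
	fmt.Println(first, second, third, len(s.findings)) // prints: created updated reopened 1
}
```

Three ingests of the same vulnerability leave exactly one finding in the store, which is the property the ordered pipeline is meant to guarantee.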

Once a match is resolved, the engine does more than flip a flag. Each of the three outcomes runs a precise sequence of repository calls so that history is preserved, recurrence is visible, and downstream consumers see consistent aggregates across every branch of the pipeline.
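For the reopen outcome, this datasheet describes refreshing the finding's fields through UpdateFromScanner and then calling ReopenFinding with the importing actor. A minimal sketch of that two-step sequence is shown here. The method names come from the datasheet, but the parameter lists, the Repository interface, the in-memory memRepo, and the sample values are assumptions made for illustration.

```go
package main

import "fmt"

// Repository lists only the two calls used on the reopen path.
// The parameter lists are assumptions, not the real signatures.
type Repository interface {
	UpdateFromScanner(findingID int, severity string) error
	ReopenFinding(findingID int, actor string) error
}

// memRepo is a hypothetical in-memory repository that logs each call.
type memRepo struct {
	log []string
}

func (m *memRepo) UpdateFromScanner(id int, severity string) error {
	m.log = append(m.log, fmt.Sprintf("update %d severity=%s", id, severity))
	return nil
}

func (m *memRepo) ReopenFinding(id int, actor string) error {
	m.log = append(m.log, fmt.Sprintf("reopen %d by %s", id, actor))
	return nil
}

// reopen refreshes the finding's fields first and only then reopens it,
// so the reopened record already reflects the latest scan data.
func reopen(r Repository, id int, severity, actor string) error {
	if err := r.UpdateFromScanner(id, severity); err != nil {
		return fmt.Errorf("refresh finding %d: %w", id, err)
	}
	if err := r.ReopenFinding(id, actor); err != nil {
		return fmt.Errorf("reopen finding %d: %w", id, err)
	}
	return nil
}

func main() {
	repo := &memRepo{}
	if err := reopen(repo, 42, "high", "importer@example.com"); err != nil {
		fmt.Println(err)
		return
	}
	for _, line := range repo.log {
		fmt.Println(line)
	}
}
```

Because the same finding ID is passed to both calls, the record keeps its identity; nothing is deleted and recreated.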

Key capabilities

  • Reopen preserves history. When the matched finding is closed, the engine refreshes its fields through UpdateFromScanner, then issues ReopenFinding with the importing actor as attribution. The finding keeps its identity and its prior remediation record while reflecting that the vulnerability has returned.
  • Wave-visibility accounting. After any outcome, when a scan ID is present, RecordScanOccurrence registers that this finding was seen in this wave. That maintains the cross-scan aggregates the finding layer carries: how many scans observed the vulnerability and which wave most recently saw it.
  • Post-correlate rules. A small RuleEvaluator interface is declared inside correlation to break an import cycle, then wired in after construction through SetRuleEngine. EvaluateAndApply runs only on create and reopen, because a plain update of an already-open finding changes nothing about its lifecycle position.
  • Lossless source passthrough. CorrelateRequest carries the full SAST, DAST, and SCA field set: code location, taint flow, dependency path, license, and CI context. When a new finding is created, every field is persisted, so nothing the importer parsed is discarded.
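The wave accounting and rule hook described in this list can be sketched together. The names RuleEvaluator, SetRuleEngine, EvaluateAndApply, and RecordScanOccurrence come from this datasheet; their signatures, the Engine struct, the afterCorrelate helper, the countingRules stub, and the order in which the two steps run are assumptions made for illustration.

```go
package main

import "fmt"

type Outcome string

const (
	Updated  Outcome = "updated"
	Reopened Outcome = "reopened"
	Created  Outcome = "created"
)

// RuleEvaluator is declared on the correlation side, so this package
// never has to import the rule engine (which avoids an import cycle).
type RuleEvaluator interface {
	EvaluateAndApply(findingID int) error
}

// Engine is a hypothetical stand-in for the correlation engine.
type Engine struct {
	rules       RuleEvaluator
	occurrences map[int][]string // finding ID -> scan IDs that observed it
}

func NewEngine() *Engine {
	return &Engine{occurrences: map[int][]string{}}
}

// SetRuleEngine wires the evaluator in after construction.
func (e *Engine) SetRuleEngine(r RuleEvaluator) { e.rules = r }

// RecordScanOccurrence registers that a finding was seen in a scan wave.
func (e *Engine) RecordScanOccurrence(findingID int, scanID string) {
	e.occurrences[findingID] = append(e.occurrences[findingID], scanID)
}

// afterCorrelate runs the post-outcome steps: wave accounting for every
// outcome when a scan ID is present, rules only on create and reopen.
func (e *Engine) afterCorrelate(findingID int, scanID string, o Outcome) error {
	if scanID != "" {
		e.RecordScanOccurrence(findingID, scanID)
	}
	if e.rules != nil && (o == Created || o == Reopened) {
		return e.rules.EvaluateAndApply(findingID)
	}
	return nil
}

// countingRules is a stub evaluator that counts how often it is invoked.
type countingRules struct{ calls int }

func (c *countingRules) EvaluateAndApply(int) error {
	c.calls++
	return nil
}

func main() {
	e := NewEngine()
	rules := &countingRules{}
	e.SetRuleEngine(rules)
	for i, o := range []Outcome{Created, Updated, Reopened} {
		_ = e.afterCorrelate(1, fmt.Sprintf("scan-%d", i+1), o)
	}
	fmt.Println(len(e.occurrences[1]), rules.calls) // prints: 3 2
}
```

All three waves are recorded, but the rule evaluator fires only twice, because the plain update leaves the finding's lifecycle position unchanged.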

Use cases

  • Re-ingest a weekly scan. A vulnerability manager re-imports the same Tenable scan a week later. The engine resolves each result by scanner_ref to the finding it already produced, updates fields in place, and records the new wave. Because re-ingestion is idempotent, the backlog does not grow.
  • Catch a vulnerability that came back. A SOC lead reviews a host remediated last month. A fresh scan re-detects the issue, the fingerprint matches the now-closed finding, and the engine reopens it with attribution, so the returning vulnerability keeps its full remediation history.
  • Consolidate two scanners. A CISO runs both a network scanner and a web scanner against the same application. Where they describe the same vulnerability on the same endpoint, normalization collapses their differing titles and URLs to one fingerprint, so the tools converge on a single governed finding.

One decision per result, before any write, so re-scans never multiply the backlog.
