Smart Match: Auto-Linking Findings to Templates

A scanner finding arrives as a raw record. It might carry a title, maybe a CVE identifier, maybe a vendor plugin number, and not much else that a triage analyst can act on. Multiply that by the volume an enterprise scanning programme produces across Tenable, Qualys, Rapid7, and a mix of DAST and SAST tools, and the manual work of looking up context becomes the bottleneck. Every analyst ends up doing the same lookups, copying the same remediation text, and tagging the same taxonomy, finding by finding.

PMAP closes that gap with a platform-level vulnerability template library and a 4-stage Smart Match engine that links each incoming finding to the best matching template, then backfills the enrichment automatically. This article explains how vulnerability template matching works in PMAP, what each match stage does, how confidence scoring decides the best match, and what gets written onto a finding when a template is applied. It is grounded in the platform behaviour rather than aspirational claims, so every rule described here reflects how the VulnDB domain and Smart Match pipeline actually run.

If you want the wider context first, our pillar on vulnerability intelligence enrichment covers how PMAP turns raw scanner output into triage-ready findings across the whole programme. This piece zooms into one mechanism inside that story.

Why Raw Scanner Findings Are Not Triage-Ready

A finding straight from a scanner is a detection, not a decision. It tells you that a tool saw something on an asset. It does not tell you the weakness class behind the detection, which adversary techniques the weakness enables, how it is scored under the standard severity model, or what the canonical remediation looks like for your environment. Those are the inputs a triage analyst actually needs, and they rarely arrive complete from the scanner.

The traditional answer is manual enrichment. An analyst opens the finding, copies the CVE into a browser, reads the NVD entry, notes the CWE weakness class, checks the CVSS vector against the FIRST CVSS specification, writes a remediation note, and tags the taxonomy. Done once, this is fine. Done across thousands of findings per assessment wave, it does not scale. Worse, it produces inconsistency. Two analysts enriching the same vulnerability class on different days will phrase remediation differently, tag taxonomy differently, and sometimes disagree on severity. That inconsistency then flows downstream into analytics and reports, where it quietly distorts every roll-up.

The core problem is that enrichment knowledge is being re-derived per finding instead of being authored once and reused. A known vulnerability class does not change between findings. The CVE behind it, its weakness mapping, its scoring, and its remediation are stable facts. They belong in a knowledge base that every finding can draw from, not in an analyst’s head or a one-off note buried on a single finding record.

What a Vuln Template Holds

A vuln template is PMAP’s reusable, scanner-agnostic definition of a known vulnerability class. Authored once by a security content author or platform admin, it carries everything a finding needs to inherit. Because it is scanner-agnostic, the same template enriches a Nessus finding, a Qualys finding, and a manual pentest finding for the same weakness, with no per-scanner duplication.

Each template carries a structured set of fields. On the identifier side, it holds cve_ids[] and cwe_ids[] arrays, a cvss_vector string, a cvss_score float, and a mitre_technique_ids[] array that maps the class to the MITRE ATT&CK framework. CVE identifiers come from the MITRE CVE program, CWE identifiers from the MITRE weakness catalogue, and CVSS scoring from the FIRST specification. Storing these identifiers together means a matched finding inherits identifier-grade context, not just a label.

On the content side, a template stores name, description, and remediation in three variants each. There is a base value, a Turkish variant suffixed _tr, and an English variant suffixed _en. This multi-language structure lets a single library serve teams working in either language without forking the knowledge base. The template detail drawer exposes a TR/EN toggle so an author or analyst can review guidance in their preferred locale.

On the classification and taxonomy side, the template carries severity, vuln_type, and four taxonomy arrays aligned to PMAP’s canonical taxonomy: effects[], root_causes[], remediation_techniques[], and tags[]. These are the labels that make analytics meaningful, because every finding linked to the template inherits the same canonical set rather than an analyst’s ad-hoc wording. A references[] array holds external links. Together with the cvss_vector and cvss_score pair on the identifier side, these fields provide the severity-grade context that flows to matched findings.

One more field exists purely for matching. The external_match_keys[] array stores scanner-specific plugin identifiers such as Nessus plugin IDs and Qualys QIDs. This is what makes deterministic key matching possible, and it is central to the second stage of the pipeline described below.
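To make the field layout concrete, here is a minimal sketch of a template record as a Python dataclass. The field names follow the ones listed above; the types, the defaults, and the "nessus:12345" key format are assumptions for illustration, not PMAP's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VulnTemplate:
    # Content: base value plus Turkish (_tr) and English (_en) variants.
    name: str
    name_tr: Optional[str] = None
    name_en: Optional[str] = None
    description: Optional[str] = None
    description_tr: Optional[str] = None
    description_en: Optional[str] = None
    remediation: Optional[str] = None
    remediation_tr: Optional[str] = None
    remediation_en: Optional[str] = None
    # Identifiers.
    cve_ids: list[str] = field(default_factory=list)
    cwe_ids: list[str] = field(default_factory=list)
    cvss_vector: Optional[str] = None
    cvss_score: Optional[float] = None
    mitre_technique_ids: list[str] = field(default_factory=list)
    # Classification and taxonomy.
    severity: Optional[str] = None
    vuln_type: Optional[str] = None
    effects: list[str] = field(default_factory=list)
    root_causes: list[str] = field(default_factory=list)
    remediation_techniques: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)
    references: list[str] = field(default_factory=list)
    # Matching only: scanner plugin identifiers (key format is hypothetical).
    external_match_keys: list[str] = field(default_factory=list)
    # Soft-disable flag; inactive templates are excluded from matching.
    is_active: bool = True


log4shell = VulnTemplate(
    name="Apache Log4j Remote Code Execution (Log4Shell)",
    cve_ids=["CVE-2021-44228"],
    cwe_ids=["CWE-502"],
    cvss_vector="CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H",
    cvss_score=10.0,
    mitre_technique_ids=["T1190"],
    severity="critical",
    external_match_keys=["nessus:12345", "qualys:67890"],  # placeholder IDs
)
```

The plugin identifiers in the example are placeholders, not the real Nessus or Qualys checks for this CVE.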

The 4-Stage Smart Match Pipeline

Smart Match is triggered with a single POST /match call. The request needs at least one of three inputs: a title, a cve_ids array, or a plugin_keys array. If all three are absent, the API returns a validation error rather than a misleading empty result. Given valid input, the engine runs a prioritised four-stage pipeline and returns ranked candidates, each carrying a confidence score and a match_reason.
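The input rule can be sketched as a small validation step. This is an illustration only: the MatchRequest and ValidationError names are hypothetical, and treating a whitespace-only title as absent is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass, field
from typing import Optional


class ValidationError(ValueError):
    """Raised when a match request carries none of the three inputs."""


@dataclass
class MatchRequest:
    title: Optional[str] = None
    cve_ids: list[str] = field(default_factory=list)
    plugin_keys: list[str] = field(default_factory=list)

    def validate(self) -> None:
        # Assumption: a blank or whitespace-only title counts as missing.
        has_title = bool(self.title and self.title.strip())
        if not (has_title or self.cve_ids or self.plugin_keys):
            raise ValidationError(
                "provide at least one of title, cve_ids, or plugin_keys"
            )


# A title alone is enough to run the pipeline.
MatchRequest(title="OpenSSH weak key exchange").validate()
```

Failing fast here is what distinguishes "nothing to match on" from "matched nothing", which the caller would otherwise be unable to tell apart.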

The four stages run in strict priority order:

1. CVE exact: template.cve_ids overlaps req.cve_ids (array overlap); confidence = 1.0
2. Plugin key: template.external_match_keys overlaps req.plugin_keys; confidence = 0.9
3. Title fuzzy: pg_trgm similarity(name, title) > 0.2; confidence = similarity score
4. Manual: most-recently-updated active templates; confidence = 0.1

Stage 1 is the strongest signal. When the finding carries a CVE that overlaps a template’s cve_ids[] array, that template matches with a confidence of 1.0 and a match_reason of cve_exact. A CVE identifier is a globally unique reference to a specific vulnerability, so an exact overlap is as close to certainty as matching gets. This is the ideal case, and it is why feeding CVE data into the pipeline matters so much.

Stage 2 handles the very common case where a scanner did not emit a CVE but did emit a vendor plugin identifier. If the finding’s plugin_keys overlap a template’s external_match_keys[], the template matches at confidence 0.9 with a deterministic, vendor-specific signal. This stage is covered in more detail below, because plugin key matching is what rescues the large fraction of scanner output that lacks clean CVE data.

Stage 3 is the fuzzy fallback. When neither CVE nor plugin key produces a hit, the engine compares the finding title against template names using PostgreSQL pg_trgm trigram similarity. Any template whose similarity to the title exceeds 0.2 becomes a candidate, and its confidence equals the similarity score itself. This means a near-identical title produces a higher-confidence candidate than a loose one, and the ranking reflects that gradient directly.
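For intuition about what the similarity score measures, the following sketch approximates trigram similarity in plain Python. It mirrors the general pg_trgm approach (lowercase the text, split it into words, pad each word with two leading spaces and one trailing space, then divide shared trigrams by all distinct trigrams), but it only handles ASCII letters and digits, so it should be read as an approximation rather than a drop-in replacement for the database function.

```python
import re


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm trigram extraction for plain ASCII text."""
    grams: set[str] = set()
    # Lowercase, then treat any run of non-alphanumeric characters as a word break.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams across both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# "word" and "two words" share 4 of 11 distinct trigrams.
score = similarity("word", "two words")
is_candidate = score > 0.2  # the stage 3 threshold described above
```

Because the score is a ratio over trigram sets, word order and small spelling differences lower it gradually instead of breaking the match outright, which is why it suits noisy scanner titles.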

Stage 4 is the safety net. If nothing matched in the first three stages, the engine returns the most-recently-updated active templates at a deliberately low confidence of 0.1. This is not a real match. It is a starting point so an analyst always has something to react to rather than an empty modal, and the low confidence signals clearly that human judgement is required.

Two behaviours keep the ranked list honest. First, stage deduplication: a template that already matched in stage 1 is not re-added in stages 2 or 3, so the same template never appears twice. Second, only active templates participate. The match pipeline and the fuzzy title search both filter is_active = true, so a soft-disabled template is excluded from every stage. Result sizing is bounded too: the default is 5 candidates and the maximum is 20, which keeps the response small and the latency predictable across the up-to-four sequential queries.
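The staged behaviour described above can be sketched in memory as follows. This is a simplified illustration, not the production queries: the Template and Candidate shapes are hypothetical, the similarity function is passed in as a parameter, every match_reason value other than cve_exact is an assumed label, and clamping an oversized limit to 20 (rather than rejecting it) is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

DEFAULT_LIMIT, MAX_LIMIT = 5, 20
TITLE_THRESHOLD = 0.2


@dataclass
class Template:
    id: int
    name: str
    cve_ids: list[str] = field(default_factory=list)
    external_match_keys: list[str] = field(default_factory=list)
    is_active: bool = True
    updated_at: int = 0  # larger means more recently updated


@dataclass
class Candidate:
    template: Template
    confidence: float
    match_reason: str


def smart_match(
    templates: list[Template],
    similarity: Callable[[str, str], float],
    title: Optional[str] = None,
    cve_ids: tuple[str, ...] = (),
    plugin_keys: tuple[str, ...] = (),
    limit: int = DEFAULT_LIMIT,
) -> list[Candidate]:
    limit = max(1, min(limit, MAX_LIMIT))
    active = [t for t in templates if t.is_active]  # soft-disabled never match
    out: list[Candidate] = []
    seen: set[int] = set()

    def add(t: Template, confidence: float, reason: str) -> None:
        # Deduplicate across stages and respect the result cap.
        if t.id not in seen and len(out) < limit:
            seen.add(t.id)
            out.append(Candidate(t, confidence, reason))

    # Stage 1: CVE array overlap.
    for t in active:
        if set(t.cve_ids) & set(cve_ids):
            add(t, 1.0, "cve_exact")
    # Stage 2: scanner plugin key overlap.
    for t in active:
        if set(t.external_match_keys) & set(plugin_keys):
            add(t, 0.9, "plugin_key")
    # Stage 3: fuzzy title match, best similarity first.
    if title:
        scored = [(similarity(t.name, title), t) for t in active]
        for score, t in sorted(scored, key=lambda p: p[0], reverse=True):
            if score > TITLE_THRESHOLD:
                add(t, score, "title_fuzzy")
    # Stage 4: fallback only when nothing matched at all.
    if not out:
        for t in sorted(active, key=lambda t: t.updated_at, reverse=True):
            add(t, 0.1, "manual")
    return out
```

Appending in stage order means the first element of the returned list is always the winner of the highest-priority stage that produced a hit, without a separate sort on confidence.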

Deterministic Matching With Scanner Plugin Keys

The CVE-exact stage is ideal, but real scanner output is messier than the ideal. Many detections, particularly compliance checks, configuration audits, and vendor-specific signatures, never carry a CVE at all. They carry a plugin identifier instead. A Nessus detection has a plugin ID. A Qualys detection has a QID. These are stable, vendor-owned references to a specific check, and they are exactly as deterministic as a CVE within their own ecosystem.

PMAP captures that determinism in the external_match_keys[] field. When a content author knows that a particular vulnerability class is detected by Nessus plugin ID 12345 and Qualys QID 67890, they record both keys on the template. From then on, any finding that arrives carrying one of those plugin keys matches the template at confidence 0.9 in stage 2, with no CVE required and no fuzzy guessing involved.

This stage is what makes Smart Match practical rather than theoretical. CVE-rich findings are the easy case. Plugin-key matching is what extends high-confidence automatic linking to the broad swathe of findings that scanners emit without clean CVE data, and it does so deterministically rather than by similarity guesswork.

Confidence Scoring and Best Match

Every candidate the engine returns carries a numeric confidence between 0.0 and 1.0, and the scoring scheme is deliberate. A CVE-exact match scores 1.0, a plugin-key match scores 0.9, a fuzzy title match scores its actual trigram similarity, and a manual fallback scores 0.1. The score is not decoration. It tells an analyst how much to trust a candidate before accepting it, and it drives the ranking.

The best_match returned in the response is always candidates[0], the winner of the highest-priority stage that produced a result. Because stages run in priority order, the best match is the strongest available signal by construction. If a CVE matched, the best match is that 1.0 candidate. If no CVE matched but a plugin key did, the best match is the 0.9 candidate. The ranking respects stage priority first and then similarity within the fuzzy stage.

One subtlety is worth noting. A stage-1 winner at confidence 1.0 does not suppress lower-confidence candidates from appearing later in the list. Only exact duplicate templates are removed across stages. So an analyst still sees a ranked set of alternatives even when a perfect match exists, which is useful when the top candidate needs a sanity check before it is applied. For deeper background on the CVSS, CWE, and CVE identifiers that templates carry, see our dedicated explainer on CVSS, CWE and CVE in vulnerability management.

Auto-Fill: What Gets Backfilled Onto a Finding

Matching a template to a finding is only useful if the match actually enriches the finding. That is the job of auto-fill. When an analyst selects a matched template, PMAP calls GET /{id}/autofill, which returns an AutoFillFields bundle, the ready-to-apply subset of the template that the finding domain injects into the finding on link, apply, or bulk re-match.

When the bundle is applied, a defined set of finding fields is populated from the template: cwe_ids[], mitre_technique_ids[], effects[], root_causes[], remediation_techniques[], the description, and the remediation guidance. In addition, two provenance fields are recorded directly on the finding: match_confidence and match_method. These two fields are what make the enrichment auditable. Anyone reviewing the finding later can see not just what was filled in, but how confident the match was and which method produced it, whether a CVE exact match, a plugin key match, a fuzzy title match, or a manual selection.
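A rough sketch of the apply step is shown below. The AutoFillFields name and the field list come from the description above; the Finding shape and the apply_autofill helper are hypothetical, and this version simply overwrites whatever the finding already holds, which may or may not match how PMAP treats analyst-entered values.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AutoFillFields:
    cwe_ids: list[str] = field(default_factory=list)
    mitre_technique_ids: list[str] = field(default_factory=list)
    effects: list[str] = field(default_factory=list)
    root_causes: list[str] = field(default_factory=list)
    remediation_techniques: list[str] = field(default_factory=list)
    description: Optional[str] = None
    remediation: Optional[str] = None


@dataclass
class Finding(AutoFillFields):
    title: str = ""
    # Provenance recorded alongside the enrichment.
    match_confidence: Optional[float] = None
    match_method: Optional[str] = None


FILLED = (
    "cwe_ids", "mitre_technique_ids", "effects", "root_causes",
    "remediation_techniques", "description", "remediation",
)


def apply_autofill(
    finding: Finding, bundle: AutoFillFields, confidence: float, method: str
) -> Finding:
    for name in FILLED:
        value = getattr(bundle, name)
        # Copy lists so the finding does not share state with the bundle.
        setattr(finding, name, list(value) if isinstance(value, list) else value)
    finding.match_confidence = confidence
    finding.match_method = method
    return finding


enriched = apply_autofill(
    Finding(title="SQL injection in login form"),
    AutoFillFields(cwe_ids=["CWE-89"], remediation="Use parameterised queries."),
    confidence=0.9,
    method="plugin_key",  # assumed label for a stage 2 match
)
```

Writing the two provenance fields in the same step as the content keeps them from drifting apart: a finding never carries template content without a record of how it got there.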

Auto-fill also respects the multi-language design through a language cascade. The AutoFillFromTemplate logic applies a firstNonNilStr fallback for each language variant: if a language-specific field such as name_en is absent, it falls back to the base name value rather than returning null. Critically, all three variants, base, TR, and EN, are returned in the bundle, so the UI can choose the appropriate locale at apply time rather than being locked into one. A template authored only in Turkish still enriches an English-locale finding sensibly, because the cascade guarantees a non-null value.
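The cascade can be illustrated with a few lines of Python. The first_non_nil_str helper is modelled on the firstNonNilStr name mentioned above; the dictionary-based template and the localized_variants function are hypothetical, and treating only a missing value (not an empty string) as absent is an assumption.

```python
from typing import Optional


def first_non_nil_str(*values: Optional[str]) -> Optional[str]:
    """Return the first value that is not None, or None if all are missing."""
    for value in values:
        if value is not None:
            return value
    return None


def localized_variants(template: dict, field_name: str) -> dict:
    """Return base, TR, and EN values, with each variant falling back to base."""
    base = template.get(field_name)
    return {
        field_name: base,
        f"{field_name}_tr": first_non_nil_str(template.get(f"{field_name}_tr"), base),
        f"{field_name}_en": first_non_nil_str(template.get(f"{field_name}_en"), base),
    }


# Only the base and Turkish names were authored; name_en falls back to base.
variants = localized_variants(
    {"name": "SQL Injection", "name_tr": "SQL Enjeksiyonu"}, "name"
)
```

Returning all three keys, rather than resolving a single locale on the server, is what lets the UI switch language at apply time without another request.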

The practical effect is that a high-confidence match can enrich a finding with almost no analyst effort. The taxonomy, weakness mapping, ATT&CK techniques, and remediation arrive pre-filled and consistent, and the provenance is recorded automatically. The analyst’s job shifts from data entry to judgement, which is where their time should go.

Bulk Re-Match After the Library Updates

A knowledge base is never finished. New CVEs are published, new plugin keys are discovered, remediation guidance improves, and templates are refined. The problem this creates is that findings linked before an update do not automatically benefit from it. Re-enriching them one finding at a time would defeat the purpose of having a library in the first place.

PMAP solves this with bulk re-match. From the findings selection bar, an analyst selects a set of findings and calls POST /bulk-rematch. The operation re-runs Smart Match across every selected finding and backfills the taxonomy, CWE, and MITRE data from the new best match for each one. When the library improves, the improvement propagates to existing findings in a single action rather than a manual sweep.

The response from a bulk re-match reports per-item success or failure, so an analyst knows exactly which findings were updated and which were not, rather than getting an opaque all-or-nothing result. On the notification side, the operation emits a single bulk-updated event for fan-out rather than one event per finding, which keeps a large re-match from flooding the notification pipeline. The design assumes bulk re-match is a routine maintenance action, run whenever the library moves forward, not a rare one-off.
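The per-item reporting and single-event behaviour might look like the sketch below. The function names, the result shape, and the "findings.bulk_updated" event type are hypothetical, and skipping the event when no finding was updated is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class ItemResult:
    finding_id: str
    ok: bool
    error: Optional[str] = None


def bulk_rematch(
    finding_ids: Iterable[str],
    rematch_one: Callable[[str], None],
    publish: Callable[[dict], None],
) -> list[ItemResult]:
    results: list[ItemResult] = []
    for finding_id in finding_ids:
        try:
            rematch_one(finding_id)  # re-run Smart Match and backfill one finding
            results.append(ItemResult(finding_id, ok=True))
        except Exception as exc:
            # One failure is recorded and does not abort the rest of the batch.
            results.append(ItemResult(finding_id, ok=False, error=str(exc)))
    updated = [r.finding_id for r in results if r.ok]
    if updated:
        # A single event covers the whole batch instead of one per finding.
        publish({"type": "findings.bulk_updated", "finding_ids": updated})
    return results
```

A caller would pass in whatever actually performs the re-match and whatever publishes events, which also makes the batching behaviour easy to check in isolation.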

Keeping the Library Clean: Unique Names and Soft-Disable

A shared knowledge base is only trustworthy if it stays clean, and two rules protect it. The first is the duplicate-name guard. When a content author creates a template, Create checks for an existing active template with the same name via ExistsByName, scoped to is_active = true. If one exists, the request is rejected with ErrDuplicateName and an HTTP 409, and no template is created. This prevents two competing definitions of the same vulnerability class from quietly diverging in the library, which would split enrichment and confuse matching.

The second rule is soft-disable. Templates are never silently destroyed when they fall out of use. Instead, a content author sets is_active = false. A soft-disabled template disappears from match results and from default lists, so it stops participating in new enrichment, but it retains all its historical finding links. Existing findings are not de-linked when their template is disabled. This matters for audit and reporting continuity: a finding enriched last quarter keeps its provenance even after the underlying template is retired. You can deprecate a template without rewriting history.

Together these two rules let the library evolve without losing integrity. New definitions cannot collide with existing ones, and old definitions can be retired without erasing the trail they left behind.
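Both rules can be sketched with a small in-memory library. The class and method names are illustrative stand-ins (DuplicateNameError plays the role of ErrDuplicateName), and exact, case-sensitive name comparison is an assumption.

```python
class DuplicateNameError(Exception):
    """Stand-in for ErrDuplicateName; an HTTP layer would map it to 409."""
    status_code = 409


class TemplateLibrary:
    def __init__(self) -> None:
        self._templates: dict[int, dict] = {}
        self._links: dict[int, list[str]] = {}
        self._next_id = 1

    def exists_by_name(self, name: str) -> bool:
        # Scoped to active templates only, as described above.
        return any(
            t["is_active"] and t["name"] == name for t in self._templates.values()
        )

    def create(self, name: str) -> int:
        if self.exists_by_name(name):
            raise DuplicateNameError(name)
        template_id = self._next_id
        self._next_id += 1
        self._templates[template_id] = {"name": name, "is_active": True}
        self._links[template_id] = []
        return template_id

    def link(self, template_id: int, finding_id: str) -> None:
        self._links[template_id].append(finding_id)

    def soft_disable(self, template_id: int) -> None:
        # Only the flag changes; historical finding links are left untouched.
        self._templates[template_id]["is_active"] = False

    def matchable(self) -> list[int]:
        return [tid for tid, t in self._templates.items() if t["is_active"]]

    def links(self, template_id: int) -> list[str]:
        return list(self._links[template_id])
```

Because the guard is scoped to active templates, this sketch allows a retired name to be reused by a new template. Whether PMAP enforces anything stricter at the database level is not covered here.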

A Platform-Level, Multi-Language Knowledge Base

One architectural decision shapes how the whole feature behaves: vuln templates are not tenant-scoped. They live at the platform level, visible to all authenticated users who have access to the relevant navigation section. This is deliberate. A known vulnerability class is the same vulnerability class for every tenant, so authoring it once and sharing it across the platform avoids the duplicated effort that per-tenant libraries would create. Create, edit, and delete actions are still gated by RBAC, so authoring is controlled even though the resulting knowledge is shared.

The platform-level design pairs naturally with the multi-language content model. Because the library is shared, investing in well-authored bilingual content pays off across every tenant and every team. Name, description, and remediation each exist in base, Turkish, and English variants, and the auto-fill cascade ensures a finding always receives a usable value regardless of which variants an author filled in. A team in one locale and a team in another draw from the same authoritative source, each seeing it in their own language.

This is also where Smart Match connects to the rest of PMAP’s knowledge layer. A template can link to a remediation record, which is how the curated fix guidance in our bilingual remediation knowledge base reaches matched findings. And the mitre_technique_ids[] a template carries are exactly what feed the ATT&CK tagging described in mapping findings to MITRE ATT&CK, where a finding’s technique array is backfilled from its matched template. Smart Match is the entry point that pulls all three together onto a single finding.

How PMAP Enriches Before Analysts Triage

Step back from the individual stages and the shape of the value is clear. Smart Match moves enrichment to the front of the workflow, before an analyst ever opens a finding for triage. Instead of an analyst pausing to look up a CVE, read a weakness mapping, and write remediation, that context is already attached, drawn from a library that was authored once and reused everywhere.

The four stages give graceful degradation. A clean CVE produces a certain match. A plugin key produces a deterministic vendor match. A title produces a similarity-ranked match. And when nothing fits, a low-confidence fallback still gives the analyst a starting point rather than a blank screen. At every level the confidence score tells the analyst exactly how much to trust the result, and the recorded match_method keeps the whole thing auditable.

The result is consistency at scale. Every finding linked to a given template inherits the same canonical taxonomy, the same CWE and ATT&CK mapping, and the same remediation guidance, which is precisely what makes downstream analytics and reports trustworthy. Bulk re-match keeps that consistency current as the library improves. Smart Match is not a shortcut that trades accuracy for speed. It is the mechanism that lets enterprise-scale enrichment stay both fast and consistent.

For a step-by-step walkthrough of authoring templates and running matches in the product, see our practitioner guide on Smart Match and VulnDB template linking. To see the broader enrichment picture this fits into, start from the vulnerability intelligence enrichment pillar.

> See the VulnDB Smart Match datasheet and auto-enrich your findings before triage in PMAP.

Frequently Asked Questions

How does Smart Match decide which template fits a finding?

Smart Match runs a prioritised four-stage pipeline on a single POST /match call. Stage 1 looks for an exact CVE overlap between the finding and a template, scoring 1.0. Stage 2 looks for a scanner plugin key overlap, scoring 0.9. Stage 3 compares the finding title against template names using pg_trgm trigram similarity above a 0.2 threshold, scoring the similarity value itself. Stage 4 returns the most-recently-updated active templates at 0.1 as a fallback. The best match is always the highest-priority stage winner, returned as the first candidate in the ranked list.

What counts as a high-confidence match?

The two strongest signals produce the highest confidence. A CVE exact match scores 1.0, because a CVE identifier uniquely references a specific vulnerability and an exact overlap is effectively certain. A scanner plugin key match scores 0.9, because plugin identifiers such as Nessus plugin IDs and Qualys QIDs are deterministic vendor-owned references. Fuzzy title matches score their actual trigram similarity, which is typically lower and varies with how closely the title matches a template name. A manual fallback scores 0.1 and is not a real match, only a starting point.

What finding fields get filled in automatically?

When a matched template is applied, auto-fill backfills cwe_ids[], mitre_technique_ids[], effects[], root_causes[], remediation_techniques[], the description, and the remediation guidance onto the finding. It also records two provenance fields, match_confidence and match_method, so the enrichment is auditable. Content fields respect a language cascade, falling back to the base value when a language-specific variant is absent, and all three variants are returned so the UI can pick the right locale.

Can I re-match many findings after updating the library?

Yes. The POST /bulk-rematch operation, available from the findings selection bar, re-runs Smart Match across a selected set of findings and backfills taxonomy, CWE, and MITRE data from the new best match for each one. It reports per-item success or failure so you can see exactly which findings updated, and it emits a single bulk-updated event rather than one per finding. This is how library improvements propagate to findings that were linked before the update.

Why does a soft-disabled template stop matching but keep its history?

Soft-disable sets is_active = false on a template. The match pipeline and fuzzy title search both filter to active templates only, so a disabled template is excluded from every match stage and from default lists. However, it retains all its historical finding links, and existing findings are never automatically de-linked when their template is disabled. This lets you retire an outdated definition without erasing the provenance of findings that were already enriched from it, which preserves audit and reporting continuity.

Are vuln templates shared across tenants or kept separate?

Templates are platform-level and not tenant-scoped. They are visible to all authenticated users with access to the relevant navigation section, because a known vulnerability class is the same class for every tenant. Authoring it once and sharing it avoids the duplicated effort of per-tenant libraries. Create, edit, and delete actions remain gated by RBAC, so authoring stays controlled even though the resulting knowledge base is shared.

What happens if a finding has no CVE and no plugin key?

The pipeline still runs as long as the request carries at least one of title, CVE IDs, or plugin keys. With no CVE and no plugin key, stages 1 and 2 produce nothing, and the engine falls through to stage 3, which compares the finding title against template names by trigram similarity. If even that finds nothing above the 0.2 threshold, stage 4 returns the most-recently-updated active templates at 0.1 confidence as a starting point. If the request carries none of title, CVE IDs, or plugin keys, the API returns a validation error rather than an empty result.

PMAP Security Team
