Automating Vulnerability Operations: Rules and Runbooks

Most vulnerability programs do not fail because they cannot find vulnerabilities. They fail because the work that comes after the scan does not scale. A scanner returns five thousand findings on a Tuesday, and a team of three analysts is supposed to triage, deduplicate, assign, prioritize, ticket, notify, chase, and re-test every one of them. The math never works. The backlog grows faster than the team can clear it, and the most repetitive decisions, the ones a senior analyst makes the same way every single time, consume the hours that should go to the genuinely hard calls.

Automation is the answer, but automation in a vulnerability program is not one thing. There is the work of deciding what a finding is the moment it arrives, and there is the work of reacting to things that happen across the platform over time. PMAP separates these into two distinct, complementary layers. The finding rule engine encodes triage policy that runs at intake. Runbooks encode event-driven response that runs whenever something meaningful occurs. Together they let a program author its operating logic once and have the platform enforce it consistently, with dry-run safety, approval gates, and full audit trails behind every automated action.

This article walks through both layers in depth. It covers how the rule engine matches findings against criteria trees and applies one of eight action types, how runbooks fire across sixteen triggers and dispatch twenty-two actions, how the reliability controls keep automation from becoming a liability, and how durable workflows handle the long-lived waits that real response playbooks require. If you are mapping automation onto your broader vulnerability management lifecycle, this is the layer that makes every other stage move on its own.

Where Manual Vulnerability Work Eats Your Team

Walk the day of a typical vulnerability analyst and you will find the same patterns repeating. A batch of findings lands from a scanner. The analyst opens each one, reads the title, checks the asset it sits on, decides whether the scanner-assigned severity is trustworthy, downgrades the ones that have no known exploit, escalates the ones on internet-facing assets, assigns the AppSec ones to the AppSec team, tags the PCI-scope ones for the auditors, and opens tickets for the criticals. None of these decisions is hard. Every one of them is a rule the analyst already carries in their head. The cost is not difficulty. The cost is volume and repetition.

The same pattern repeats at the event level. A scan finishes, so someone needs to notify the owning team and check whether the critical count crossed a threshold. An SLA breaches, so someone needs to escalate and open a ticket. A finding gets reassigned, so the new owner needs to know. A report finishes rendering, so the requester needs the link. These are reactions, and they are predictable reactions. Yet without automation each one waits on a human noticing that something happened, remembering the correct response, and executing it consistently. People do not notice consistently. They forget, they get pulled into incidents, they go on leave, and the reaction that was supposed to happen in minutes happens in days or never.

The deeper problem is consistency. When triage decisions live only in analysts’ heads, two analysts triage the same finding differently, and the same analyst triages the same finding differently on a bad day. Auditors cannot verify a policy that exists only as habit. Managers cannot prove that critical internet-facing findings always escalate, because sometimes they do not. Automation is not only about speed. It is about making the program’s policy explicit, enforceable, and provable. That is the lens PMAP applies to both layers of automation, and it is why the practitioner ebook on automating vulnerability operations treats consistency as the primary win, not headcount savings.

Two Layers of Automation: Rules vs Runbooks

PMAP draws a clear line between two kinds of automation, and understanding the line is the key to using both well.

The finding rule engine is a policy-driven mutation engine. A rule is a named, prioritized policy that matches active findings against a declarative criteria tree and applies an action whenever the criteria are satisfied. Rules run at finding creation time and during on-demand bulk re-evaluation sweeps. Their job is to make the platform self-governing at intake. Instead of an analyst triaging every imported finding by hand, an operator encodes the triage decision once as a rule, and the engine applies it automatically the moment a finding arrives. Rules answer the question, “what should this finding become the instant we receive it?”

Runbooks are an event-triggered automation engine. A runbook is a named automation composed of one or more trigger rules and an ordered action list. When any trigger rule matches an incoming platform event, the action list executes in sequence. Runbooks subscribe to events such as a finding being created, an SLA breaching, or a scan finishing, and they can also fire on a cron schedule. Their job is reaction and orchestration over time. Runbooks answer the question, “when something happens, what sequence of steps should follow?”

The distinction matters because the two layers have different shapes. A rule applies a single mutation to a single finding based purely on that finding’s attributes. A runbook runs a multi-step sequence that can notify across channels, open tickets, trigger scans, mutate assets, wait, and branch. A rule fires inline and fast on every finding. A runbook fires in response to a discrete event and can run a long-lived playbook that pauses for days. You reach for a rule when the answer is a deterministic property of the finding itself. You reach for a runbook when the answer is a workflow that touches other systems and unfolds over time. Many programs run both, with rules cleaning and classifying findings at the door and runbooks handling everything that happens afterward.

The Finding Rule Engine: AND/OR Criteria

The heart of the rule engine is the criteria tree. A rule’s matching logic is expressed as a nested CriteriaGroup tree of unlimited depth, where individual conditions are linked by AND or OR logic and sub-groups can themselves mix both. This is what lets a rule capture policy that is genuinely conditional rather than a flat list of equality checks. You can express “severity is critical AND (the asset is internet-exposed OR the finding carries a pci tag)” directly, with the parentheses meaning exactly what they mean in the policy you wrote on a whiteboard.

Each condition in the tree pairs a field, an operator, and a value. The engine exposes a rich field set of more than twenty-five matchable attributes spanning severity, CVSS score, known-exploit status, SLA-breach status, vulnerability type, scanner source and category and reference, asset type and class and group membership, tags, taxonomy codes such as effects and root causes and remediation techniques, project, company, title, CVE ID, and endpoint. The field surface is wide enough that almost any triage decision a human makes by reading a finding can be expressed as a condition on data the platform already holds.

The operator library carries seventeen operators: eq, neq, contains, starts_with, matches_regex, gt, lt, gte, lte, lte_ord, gte_ord, in, not_in, all_in, older_than, is_empty, and is_not_empty. The ordinal comparison operators lte_ord and gte_ord are worth a closer look because severity is not a number. They understand the real ordering of info < low < medium < high < critical < urgent, so a rule can match "severity at or above high" without the operator treating the values as alphabetical strings. The matches_regex operator lets a rule pattern-match titles for scanner-specific naming, all_in checks that a finding carries every tag in a set, and older_than matches on age for findings that have lingered past a threshold.
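To make the ordinal point concrete, the sketch below compares an alphabetical comparison with a rank-based one. It is illustrative Python, not PMAP's implementation. The function names mirror the operator names, and severity_rank and SEVERITY_ORDER are hypothetical helpers that simply restate the ordering given above.

```python
# Severity ordering as stated in the text; the list index is the rank.
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical", "urgent"]

def severity_rank(value: str) -> int:
    """Return the ordinal position of a severity label (ValueError if unknown)."""
    return SEVERITY_ORDER.index(value.lower())

def gte_ord(actual: str, threshold: str) -> bool:
    """Ordinal 'at or above' comparison."""
    return severity_rank(actual) >= severity_rank(threshold)

def lte_ord(actual: str, threshold: str) -> bool:
    """Ordinal 'at or below' comparison."""
    return severity_rank(actual) <= severity_rank(threshold)

# Alphabetical comparison gets it wrong: "critical" sorts before "high".
naive = "critical" >= "high"
ordinal = gte_ord("critical", "high")
```

Here naive is False while ordinal is True, which is the gap the ordinal operators close.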

There is a deliberate safety property in how empty criteria are handled. An empty criteria tree is rejected at create and update time, and the evaluator also returns false for empty trees as a fail-closed default. The combination means a misconfigured or blank rule can never accidentally match every finding and mass-mutate the database. The system refuses the empty rule on the way in, and even if one somehow slipped through, it would match nothing rather than everything. For a feature that mutates production findings automatically, fail-closed is the correct default, and the datasheet for the finding rule engine calls this out as a core guardrail.
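A recursive evaluator with a fail-closed empty case might look like the following. This is a minimal sketch under stated assumptions: the dictionary shape (logic, conditions, groups), the function names, and the field names internet_exposed and tags are all hypothetical, and only three of the operators are shown.

```python
def eval_condition(cond: dict, finding: dict) -> bool:
    """Evaluate one field/operator/value condition (only a few operators shown)."""
    actual = finding.get(cond["field"])
    op, expected = cond["op"], cond["value"]
    if op == "eq":
        return actual == expected
    if op == "contains":
        return expected in (actual or [])
    if op == "gt":
        return actual is not None and actual > expected
    return False  # unknown operators never match

def eval_group(group: dict, finding: dict) -> bool:
    """Recursively evaluate a nested group; an empty group fails closed."""
    conditions = group.get("conditions", [])
    subgroups = group.get("groups", [])
    if not conditions and not subgroups:
        return False  # without this check, all([]) would be True and match everything
    results = [eval_condition(c, finding) for c in conditions]
    results += [eval_group(g, finding) for g in subgroups]
    return all(results) if group.get("logic", "and") == "and" else any(results)

# severity is critical AND (asset is internet-exposed OR finding carries a pci tag)
tree = {
    "logic": "and",
    "conditions": [{"field": "severity", "op": "eq", "value": "critical"}],
    "groups": [{
        "logic": "or",
        "conditions": [
            {"field": "internet_exposed", "op": "eq", "value": True},
            {"field": "tags", "op": "contains", "value": "pci"},
        ],
    }],
}
finding = {"severity": "critical", "internet_exposed": False, "tags": ["pci"]}
matched = eval_group(tree, finding)
empty_matched = eval_group({}, finding)
```

The explicit empty check matters because a naive all() over zero results returns True, which is exactly the match-everything behavior the guardrail exists to prevent.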

Eight Action Types and Priority-Ordered Evaluation

When a rule's criteria match, it applies one of eight action types. override_severity sets the finding's severity, which is the most common triage action because external scanner severity is frequently wrong for your environment. accepted_risk moves the finding into accepted-risk status for findings that match a documented exception. assign is the unified multi-target assignment that can route a finding to multiple users and multiple teams at once, while the legacy auto_assign and auto_assign_team actions remain for single-user and single-team back-compatibility. append_tag adds tags, set_vuln_type classifies the vulnerability type, set_remediation writes remediation text, and normalize_title trims and title-cases the finding title to clean up inconsistent scanner output.

The append_tag action carries an important detail. It is idempotent. Tags already present on a finding are filtered out before the mutation commits, which means re-running a tagging rule never produces duplicate tags and never generates the audit noise that duplicate mutations would create. Small idempotency choices like this are what keep an automated system from polluting its own audit trail over time.
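The filtering step can be sketched in a few lines. The function name and return shape below are assumptions made for illustration; the point is only that an empty "added" list means there is nothing to commit and nothing to audit.

```python
def append_tags(existing: list, requested: list) -> tuple:
    """Return (new_tag_list, actually_added); tags already present are filtered out."""
    present = set(existing)
    added = []
    for tag in requested:
        if tag not in present:
            present.add(tag)
            added.append(tag)
    return existing + added, added

# First run adds only the missing tag; a second run is a no-op.
tags, added = append_tags(["pci"], ["pci", "internet-facing"])
tags_again, added_again = append_tags(tags, ["pci", "internet-facing"])
```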

Rules do not run in isolation. The engine evaluates all active, non-expired rules for a company in ascending priority order, so the rule with the lowest priority number runs first, and multiple rules can match the same finding in sequence. This sequencing is intentional and powerful. You can author a title-normalizing rule with a low priority number, a severity-override rule with a middle number, and an assignment rule with a higher number that acts on the now-corrected severity, and they compose into a small pipeline that transforms a raw finding into a fully triaged one. The priority field is how you control the order of that pipeline. Rules also support an optional expiry, so a rule authored for a specific campaign or window stops applying after its expires_at without anyone needing to remember to delete it.
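The pipeline behavior can be sketched as a sort followed by a loop in which each rule sees the output of the rules before it. Everything here is hypothetical scaffolding: rules are plain dictionaries with callable matches and apply entries, and the field names exploit_known and assignee are invented for the example.

```python
from datetime import datetime, timezone

def active_rules(rules, now):
    """Keep active, non-expired rules and sort by ascending priority number."""
    live = [r for r in rules
            if r["active"] and (r.get("expires_at") is None or r["expires_at"] > now)]
    return sorted(live, key=lambda r: r["priority"])

def apply_rules(finding, rules, now):
    """Run each matching rule in order; later rules see earlier rules' changes."""
    result = dict(finding)
    for rule in active_rules(rules, now):
        if rule["matches"](result):
            result = rule["apply"](result)
    return result

rules = [
    {"priority": 30, "active": True,
     "matches": lambda f: f["severity"] == "high",
     "apply": lambda f: {**f, "assignee": "appsec"}},
    {"priority": 10, "active": True,
     "matches": lambda f: True,
     "apply": lambda f: {**f, "title": f["title"].strip().title()}},
    {"priority": 20, "active": True,
     "matches": lambda f: f["severity"] == "critical" and not f["exploit_known"],
     "apply": lambda f: {**f, "severity": "high"}},
    # Expired campaign rule: filtered out before evaluation.
    {"priority": 5, "active": True,
     "expires_at": datetime(2024, 1, 1, tzinfo=timezone.utc),
     "matches": lambda f: True,
     "apply": lambda f: {**f, "severity": "info"}},
]

now = datetime(2025, 1, 1, tzinfo=timezone.utc)
raw = {"title": "  sql injection in login ", "severity": "critical", "exploit_known": False}
triaged = apply_rules(raw, rules, now)
```

The assignment rule only matches because the severity override ran before it, which is the composition the priority field is there to control.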

Safe Mutations: Dry-Run, Preview and Revoke

Automation that mutates production data needs to be reversible and previewable, and the rule engine treats this as a first-class concern rather than an afterthought.

Before a rule is saved, an operator can dry-run it. The preview endpoint runs the rule against up to five hundred active findings for a company with no database writes and returns the before and after severity and status for each matched finding. This is the difference between guessing what a rule will do and seeing exactly what it will do. A live affected-findings view goes further, evaluating the criteria tree against up to five thousand of the most recent active findings and returning the matching page with a change summary on every row, so an operator can audit the blast radius of a rule against real data at any time. There is also an in-context test that evaluates a rule against a caller-supplied finding payload and returns matched or unmatched plus the resolved mutation, which is invaluable for debugging a rule's logic without needing a real finding that fits.
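The essence of a dry run is evaluating against copies and reporting the difference. The sketch below assumes an in-memory list of findings and a hypothetical preview_rule helper; the five-hundred default comes from the limit described above, and nothing here reflects the real endpoint's request or response format.

```python
def preview_rule(findings, matches, mutate, limit=500):
    """Evaluate a rule against at most `limit` findings without modifying them."""
    rows = []
    for finding in findings[:limit]:
        if matches(finding):
            after = mutate(dict(finding))  # work on a copy; the original is untouched
            rows.append({
                "id": finding["id"],
                "before": {"severity": finding["severity"], "status": finding["status"]},
                "after": {"severity": after["severity"], "status": after["status"]},
            })
    return rows

def downgrade(finding):
    """Example mutation: override severity to high."""
    finding["severity"] = "high"
    return finding

findings = [
    {"id": 1, "severity": "critical", "status": "open", "exploit_known": False},
    {"id": 2, "severity": "critical", "status": "open", "exploit_known": True},
]
rows = preview_rule(
    findings,
    lambda f: f["severity"] == "critical" and not f["exploit_known"],
    downgrade,
)
```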

After a rule has run, its effects can be undone. Every successful rule application writes an audit-log row capturing the action snapshot and the before and after summaries. The revoke operation walks that audit log and restores each finding to its recorded prior state. Revoke supports two modes. Safe mode, the default, restores only findings whose current state still matches the snapshot the rule produced, which protects against clobbering manual edits that an analyst made after the rule ran. Force mode restores unconditionally. For large blast radii the revoke runs as an asynchronous job that the frontend can poll for progress, while smaller revokes run inline. The on-finding override message, a human-readable string such as "Rule 'No Exploit Critical to High': severity critical to high," appears as a badge on the finding so anyone reviewing it can see at a glance that an automated rule shaped this record and exactly what it changed.
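The difference between safe and force mode comes down to one comparison per finding. The following sketch assumes audit rows shaped as dictionaries with finding_id, before, and after keys, which is a hypothetical layout chosen for readability rather than the platform's schema.

```python
def revoke(findings, audit_rows, force=False):
    """Restore findings from audit rows; safe mode skips findings edited since the rule ran."""
    restored, skipped = [], []
    for row in audit_rows:
        finding = findings[row["finding_id"]]
        current = {key: finding[key] for key in row["after"]}
        if force or current == row["after"]:
            finding.update(row["before"])
            restored.append(row["finding_id"])
        else:
            skipped.append(row["finding_id"])  # an analyst changed it after the rule ran
    return restored, skipped

findings = {
    1: {"severity": "high", "status": "open"},
    2: {"severity": "medium", "status": "open"},  # manually edited after the rule
}
audit_rows = [
    {"finding_id": 1, "before": {"severity": "critical"}, "after": {"severity": "high"}},
    {"finding_id": 2, "before": {"severity": "critical"}, "after": {"severity": "high"}},
]
restored, skipped = revoke(findings, audit_rows)
```

In safe mode finding 2 is left alone because its current severity no longer matches what the rule wrote. Calling revoke with force=True would restore it regardless.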

Sensitive rules can also require approval before they ever run. A rule can opt into a mandatory four-eyes gate, moving through a draft to pending to approved state machine where the approver must differ from the creator. Until a gated rule is approved, the evaluator skips it entirely. This means a rule that mass-accepts risk or mass-downgrades severity cannot take effect on one person's say-so. The combination of dry-run before, four-eyes during, and revoke after gives a program three independent safety nets around automated mutation.
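A four-eyes gate reduces to a small state machine plus one identity check. This sketch is an assumption about how such a gate could be modeled: the function names, the requires_approval flag, and the dictionary shape are hypothetical, and only the forward transitions named above are implemented.

```python
class ApprovalError(Exception):
    """Raised when a transition violates the approval workflow."""

def submit(rule):
    """Move a rule from draft to pending."""
    if rule["status"] != "draft":
        raise ApprovalError("only draft rules can be submitted")
    rule["status"] = "pending"

def approve(rule, approver):
    """Move a rule from pending to approved; the approver must not be the creator."""
    if rule["status"] != "pending":
        raise ApprovalError("only pending rules can be approved")
    if approver == rule["created_by"]:
        raise ApprovalError("approver must differ from creator")
    rule["status"] = "approved"

def should_evaluate(rule):
    """The evaluator skips gated rules until they are approved."""
    return not rule["requires_approval"] or rule["status"] == "approved"

rule = {"created_by": "alice", "requires_approval": True, "status": "draft"}
submit(rule)
evaluable_while_pending = should_evaluate(rule)
```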

Event-Triggered Runbooks

Where rules act on findings at intake, runbooks act on events across the whole platform. A runbook subscribes to platform events and, when a trigger matches, runs its ordered action list. The subscription model is event-driven and efficient. When an event is dispatched, the engine queries only the runbooks subscribed to that event type, then evaluates each one's conditions in-process, so a busy event does not wake every runbook in the system.

A runbook's trigger logic has two levels. A runbook fires when any of its trigger rules matches, which is an OR across rules. Within a single rule, the event type must match and an optional condition tree must evaluate true, which is AND between the event match and the conditions, with AND/OR logic available inside the condition tree itself. Conditions are evaluated against enriched event payloads using dot-path fields such as finding.severity or asset.criticality, and the condition layer offers twelve operators including eq, neq, the numeric and ordinal comparisons, contains, in, not_in, starts_with, older_than, and within_last. This is how a runbook narrows from "any finding created" down to "any critical finding created on an internet-exposed asset," which is usually the population you actually want to react to.
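The two-level logic, OR across trigger rules and AND within one, can be sketched as follows. To keep it short, the condition layer here is simplified to a flat list of equality checks rather than a full AND/OR tree, and the dictionary layout and the asset.internet_exposed field are hypothetical.

```python
def get_path(payload, path):
    """Resolve a dot-path such as 'finding.severity'; None if any segment is missing."""
    current = payload
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

def rule_matches(rule, event_type, payload):
    """Event type must match AND every condition must hold (no conditions means event-only)."""
    if rule["event"] != event_type:
        return False
    return all(get_path(payload, c["field"]) == c["value"]
               for c in rule.get("conditions", []))

def runbook_fires(runbook, event_type, payload):
    """A runbook fires when ANY of its trigger rules matches."""
    return any(rule_matches(r, event_type, payload) for r in runbook["triggers"])

runbook = {"triggers": [
    {"event": "finding_created", "conditions": [
        {"field": "finding.severity", "value": "critical"},
        {"field": "asset.internet_exposed", "value": True},
    ]},
    {"event": "sla_breached"},
]}
payload = {"finding": {"severity": "critical"}, "asset": {"internet_exposed": True}}
fires = runbook_fires(runbook, "finding_created", payload)
```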

Payload enrichment is handled once per event rather than once per runbook. When an event arrives, the engine enriches and normalizes the payload a single time, performing the asset and finding database lookups up front, then shares the enriched payload across every matched runbook. On a busy event that matches many runbooks, this avoids repeating the same database queries for each one. Template variables resolve against this same enriched payload, so a {{ finding.title }} placeholder in an action config is substituted with the real value before the action executes, letting notifications and tickets carry the specific details of the event that triggered them.
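Placeholder substitution against an enriched payload can be sketched with a single regular expression. The render function and the choice to leave unknown placeholders untouched are assumptions for illustration; the source does not say how the platform handles a missing field.

```python
import re

# Matches {{ finding.title }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")

def render(template, payload):
    """Substitute dot-path placeholders with values from the payload."""
    def lookup(match):
        value = payload
        for key in match.group(1).split("."):
            if not isinstance(value, dict) or key not in value:
                return match.group(0)  # assumption: unknown placeholders are left as-is
            value = value[key]
        return str(value)
    return PLACEHOLDER.sub(lookup, template)

payload = {"finding": {"title": "SQL injection", "severity": "critical"}}
message = render("New finding: {{ finding.title }} ({{finding.severity}})", payload)
```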

Sixteen Triggers and Twenty-Two Actions

The expressive range of the runbook engine comes from its catalog of triggers and actions. There are sixteen supported trigger events. Eight of them cover the finding lifecycle: finding_created, finding_updated, finding_status_changed, finding_assigned, finding_retested, finding_note_added, finding_rule_applied, and finding_bulk_updated, which means a runbook can react to almost any change a finding undergoes, including changes made by the rule engine itself through finding_rule_applied. The remaining eight cover scan completion through remote_scan_completed and scan_finished, report generation through report_generated, SLA events through sla_breached and sla_escalated, asset lifecycle through asset_created and asset_updated, and time-based firing through the synthetic schedule trigger.

The scan_finished trigger deserves a note because its payload carries the per-severity counts: critical, high, medium, low, and total. This lets a runbook condition on the shape of a scan result, firing only when, for example, a scan returns more than zero criticals, so the response runbook stays quiet on clean scans and acts only when there is something worth acting on.

On the action side there are twenty-two action types, organized into clear categories. The notification actions are send_webhook, send_email, send_slack_message, send_teams_message, and a generic http_request for full control over method, headers, and body. The finding-mutation actions are change_finding_status, change_finding_severity, add_finding_tags, remove_finding_tags, extend_sla, reset_sla, assign_finding, and append_comment, which together let a runbook do to a finding most of what an analyst would do by hand. The ITSM actions create and update and comment on tickets through the integration layer, covering Jira, ServiceNow, and generic ticket creation with multi-team routing. The remaining actions span scanning through trigger_scan, reporting through generate_report, integration sync through integration_sync, asset mutation through update_asset and manage_team_assets, owner resolution through resolve_owner, and flow control through sleep, await_signal, and when.

Put the catalog together and a single runbook can express a complete response playbook. Consider a runbook triggered on sla_breached for critical findings: it can extend the SLA window, create a Jira ticket routed to the owning team, post a Slack message to that team's channel, append a comment recording the escalation, and assign the finding to the team lead, all in one ordered sequence. What used to be five manual steps scattered across five tools becomes one automation that runs the same way every time. This is the kind of orchestration the runbook automation platform datasheet details action by action.

Cron and Schedule Triggers

Not every reaction is triggered by an event. Some work needs to run on a clock. The schedule trigger covers this. A runbook with a schedule trigger registers an individual cron entry, and a background scheduler reloads active schedule triggers every sixty seconds to pick up changes made through the API, so a newly scheduled runbook becomes active within a minute of being saved without a server restart.

Schedule triggers turn runbooks into a general-purpose maintenance and reporting engine. A runbook can run every morning to generate and distribute a daily report, sweep weekly for stale findings using the same condition language as event triggers, or run a periodic integration sync that pulls fresh findings from a connector on a cadence. The synthetic schedule payload carries the trigger time, the cron expression, and the company ID, so even a time-based runbook still has context to work with, and its actions can be templated the same way an event-driven runbook's actions are. Cron and event triggers share one engine and one action catalog, so the team that learns to build event runbooks already knows how to build scheduled ones.

Reliability: Circuit Breaker, Retry and Throttle

Automation that can mutate findings, open tickets, and call external systems needs guardrails, because a runbook that misfires can do real damage at machine speed. PMAP builds three reliability controls into the runbook engine.

The circuit breaker protects against runaway failure. After ten consecutive failures, a runbook is automatically deactivated, with its active flag set to false and the trip time stamped. This means a runbook whose downstream system is broken, whose webhook URL has gone stale, or whose configuration has drifted does not keep firing into the void hundreds of times. It trips, stops, and surfaces a badge in the UI with a reset control that an operator uses to re-enable it once the underlying problem is fixed. Skipped runs do not count toward the threshold, because a skip from a pre-flight check or a false when condition is normal operation, not a health signal, and counting it would trip healthy runbooks for doing exactly what they were designed to do.
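The breaker's bookkeeping is small enough to sketch in full. The class and attribute names below are hypothetical, and one detail is an assumption: the sketch treats a skipped run as leaving the failure counter unchanged, since the text says only that skips do not count toward the threshold.

```python
class RunbookBreaker:
    """Deactivate a runbook after a run of consecutive failures."""

    def __init__(self, threshold=10):
        self.threshold = threshold
        self.consecutive_failures = 0
        self.active = True
        self.tripped_at = None

    def record(self, outcome, now):
        """outcome is 'success', 'failure', or 'skipped'."""
        if outcome == "skipped":
            return  # skips are normal operation and leave the counter alone
        if outcome == "success":
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold:
            self.active = False
            self.tripped_at = now

    def reset(self):
        """Operator action: re-enable the runbook once the cause is fixed."""
        self.consecutive_failures = 0
        self.active = True
        self.tripped_at = None

breaker = RunbookBreaker()
for tick in range(9):
    breaker.record("failure", now=tick)
breaker.record("skipped", now=9)
still_active = breaker.active
breaker.record("failure", now=10)
```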

Per-action retry handles transient failure. Each action carries its own retry policy supporting up to twenty attempts with exponential backoff and configurable initial backoff, maximum backoff, and growth factor. A momentary network blip when posting to Slack or a brief ITSM timeout does not fail the whole run. The action retries on its own schedule and succeeds on the second or third attempt, which is exactly the behavior you want for actions that cross a network boundary into systems you do not control.
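With an initial backoff, a growth factor, and a cap, the wait before retry n is min(maximum, initial * factor^(n-1)). The helper below computes that schedule. The function and parameter names are illustrative stand-ins for the three configurable values named above, not the platform's configuration keys.

```python
def backoff_delays(max_attempts, initial, maximum, factor):
    """Delays in seconds to wait before each retry; the first attempt has no delay."""
    delays = []
    delay = initial
    for _ in range(max_attempts - 1):
        delays.append(min(delay, maximum))
        delay *= factor
    return delays

# Six attempts, starting at 1s, doubling each time, capped at 10s.
schedule = backoff_delays(max_attempts=6, initial=1.0, maximum=10.0, factor=2.0)
```

For these inputs the schedule is 1, 2, 4, 8, and then 10 seconds, with the last delay clipped by the cap.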

The throttle and concurrency gate prevents storms. A throttle_seconds setting stops a runbook from re-firing within a window, which matters when an event source is chatty, and a template-resolved concurrency_key caps the number of simultaneous instances of a runbook. Together they keep a runbook that reacts to a high-frequency event from spawning thousands of overlapping executions, which protects both the platform and any downstream system the runbook calls.
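Both gates are simple counters keyed by identity. The sketch below is a single-threaded illustration with hypothetical class names; the concurrency limit is passed in as a parameter because the source does not state its value, and a real implementation would need locking or a shared store.

```python
class Throttle:
    """Refuse to re-fire a runbook within throttle_seconds of its last firing."""

    def __init__(self, throttle_seconds):
        self.throttle_seconds = throttle_seconds
        self.last_fired = {}

    def allow(self, runbook_id, now):
        last = self.last_fired.get(runbook_id)
        if last is not None and now - last < self.throttle_seconds:
            return False
        self.last_fired[runbook_id] = now
        return True

class ConcurrencyGate:
    """Cap simultaneous executions per resolved concurrency key."""

    def __init__(self, limit):
        self.limit = limit
        self.running = {}

    def try_acquire(self, key):
        if self.running.get(key, 0) >= self.limit:
            return False
        self.running[key] = self.running.get(key, 0) + 1
        return True

    def release(self, key):
        self.running[key] = max(0, self.running.get(key, 0) - 1)

throttle = Throttle(throttle_seconds=60)
decisions = [throttle.allow("rb-1", now=t) for t in (0, 30, 60)]
```

The three decisions come out as allowed, refused, allowed: the firing at 30 seconds falls inside the window, and the one at 60 seconds is the first outside it.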

There is one more piece of operational hygiene worth naming. Pre-flight checks turn unconfigured dependencies into clean skips rather than failures. If an email action runs on an installation where SMTP is not configured, or a scan action runs without a scan service wired, the action is recorded as a structured skip with a reason rather than a hard failure. This keeps the audit log clean on fresh installs and during partial configuration, and it keeps the circuit breaker from tripping on conditions that are configuration gaps rather than genuine errors.

Durable Workflows for Long-Lived Waits

Most runbooks run inline. Their actions execute synchronously in the same goroutine as the event handler, with no separate queue, which is fast and simple and correct for the great majority of automations that complete in milliseconds. But some response playbooks need to wait, and waiting is the hard part of any automation system because a process can restart while a wait is in flight.

Two action types embody this need. The sleep action pauses execution for a duration, expressed naturally as 30s, 5m, 2h, or 7d. The await_signal action waits for a named external signal with a configurable timeout that defaults to seventy-two hours, which is the building block for human-in-the-loop gates where a runbook pauses until a person approves a step or an external system reports back. A real remediation playbook might trigger a scan, then await_signal for the scan-completed signal, then react to the result, a sequence that inherently spans the gap between two separate events that could be minutes or hours apart.
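Parsing the duration shorthand is a one-regex job. The parser below is a sketch that accepts only the four suffixes shown above (s, m, h, d) with a single integer amount; whether the platform accepts compound values such as 1h30m is not stated, so the sketch rejects them.

```python
import re
from datetime import timedelta

# Map each suffix to the matching timedelta keyword argument.
UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}

def parse_duration(text):
    """Parse strings such as '30s', '5m', '2h', or '7d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if not match:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(**{UNITS[unit]: amount})

week = parse_duration("7d")
half_minute = parse_duration("30s")
```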

For these long-lived waits, PMAP offers a durable workflow engine. PMAP's runbook engine supports three execution modes selected by configuration. The default inline mode runs actions synchronously and skips durable-wait primitives. The workflow mode dispatches each runbook execution as a durable go-workflows instance backed by a dedicated PostgreSQL backend, where actions run as registered activities and timers and signals are persisted so they survive worker restarts. The shadow mode is a hybrid used for production validation, running the inline path for real while also starting a workflow instance in parallel without committing its side effects, so a team can verify the durable path before cutting over to it.

The durability guarantee is what makes long waits trustworthy. When a runbook in workflow mode hits a sleep for seven days or an await_signal for a scan result, the timer and the signal channel are stored in the workflow engine's history and pending-event tables. If the worker process restarts during that week, the engine replays the workflow's recorded history to reconstruct its exact state and resumes the wait where it left off. The execution row is finalized from running to its terminal status only when the workflow genuinely completes, and a stale-running watchdog marks any workflow-driven execution as failed if it has been stuck for more than forty-eight hours, catching the rare case where a worker crashed before it could finalize. The infrastructure that provides this, a correctly bootstrapped and migration-safe durable backend, is isolated cleanly enough that if it cannot initialize at startup the platform falls back to inline mode and logs a warning rather than crashing, so durable workflows are an enhancement to the runbook engine rather than a dependency the whole platform rests on.

Template Library and No-Code Building

Powerful automation is only useful if a team can actually build it, and both layers are designed so an operator builds automation through guided, schema-driven interfaces rather than by writing code.

The rule engine serves a form-builder schema that returns declarative metadata describing every criteria field, its type, its enum values, and its allowed operators. The no-code rule builder UI is driven entirely by this schema, which means the builder always offers exactly the fields and operators the engine actually supports, with no risk of the frontend drifting out of sync with the backend. An operator picks a field, the UI shows only the operators valid for that field, and the resulting rule is guaranteed to be one the engine can evaluate. There is also a seeded rule library of platform-suggested rules that an operator can surface in a "Suggested" tab and enable in one click, so a new program does not start from a blank page but from a set of sensible starting policies.

Runbooks have an equivalent building experience. A trigger schema registry serves a per-event field catalog, including each field's type, operators, enum values, and a sample payload, and this catalog powers the wizard's condition builder and its variable picker, so an operator building a condition on finding_created sees exactly the fields that event carries and can autocomplete template variables from the same source. Runbooks also ship a template library. Library runbooks are marked as templates and excluded from event dispatch, and platform-shipped seed templates are marked as built-in with a clone-to-edit flow, so an operator starts from a proven template, instantiates an editable tenant-scoped copy, adjusts it, and activates it. A dry-run test panel lets an operator evaluate a runbook's trigger conditions and simulate its actions against a payload with no side effects, and a recent-events feed seeds that test payload with real past data, so the test reflects what the runbook will actually see in production.

Both builders share a design philosophy with the broader platform. The schema is the single source of truth, the UI is generated from it, and the operator is guided toward valid configurations rather than left to discover them by trial and error. This is what makes the difference between automation that only a specialist engineer can touch and automation that a security program manager can author and own.

How PMAP Automates the Whole Operation

Step back from the individual features and the architecture of PMAP's automation comes into focus. Two layers, one substrate.

The substrate is the internal event bus. Every state change in the platform, a finding created, a scan finished, an SLA breached, a rule applied, an asset updated, is published to an in-process publish/subscribe bus that carries a catalog of named event types covering the finding lifecycle, scan, SLA, asset, company, project, assessment run, approval, notification, and CI events. Dispatch is asynchronous and isolated. Each subscriber runs in its own goroutine, a panicking handler never crashes its siblings or the emitter, and handlers receive a background context so their follow-on database work survives past the originating HTTP request. The runbook engine is a subscriber to this bus, which is precisely how runbooks become event-triggered without any domain needing to know that runbooks exist. The finding service emits a finding_created event and moves on. The bus delivers it. The runbook engine decides what to do with it. The producers and the automation are fully decoupled.
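The isolation property can be illustrated with a toy bus. This sketch uses Python threads as a stand-in for goroutines, and the EventBus class, its methods, and the errors list are hypothetical. It shows only the shape of the idea: each handler runs on its own thread, and a handler that raises affects neither its siblings nor the publisher.

```python
import threading

class EventBus:
    """Toy in-process publish/subscribe bus with per-handler isolation."""

    def __init__(self):
        self._subscribers = {}
        self.errors = []

    def subscribe(self, event_type, handler):
        self._subscribers.setdefault(event_type, []).append(handler)

    def publish(self, event_type, payload):
        """Start one thread per subscriber and return immediately with the threads."""
        threads = []
        for handler in self._subscribers.get(event_type, []):
            thread = threading.Thread(target=self._safe_call, args=(handler, payload))
            thread.start()
            threads.append(thread)
        return threads

    def _safe_call(self, handler, payload):
        try:
            handler(payload)
        except Exception as exc:  # contain the failure inside this handler's thread
            self.errors.append(exc)

def broken_handler(payload):
    raise RuntimeError("boom")

bus = EventBus()
received = []
bus.subscribe("finding_created", broken_handler)
bus.subscribe("finding_created", received.append)
for thread in bus.publish("finding_created", {"id": 1}):
    thread.join()  # joined here only so the example is deterministic
```

After the publish, received holds the event even though the first handler raised, and the publisher never sees the exception.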

On top of that substrate sit the two layers. The rule engine runs at intake, invoked by the finding service after every create or import, transforming raw findings into triaged ones through priority-ordered policy. When a rule applies, it emits a finding_rule_applied event back onto the bus, which means a runbook can react to the rule engine's own decisions, chaining the two layers into a longer automation than either could express alone. The runbook engine runs in response to events and on schedules, orchestrating multi-step playbooks across notifications, tickets, scans, assets, and external systems, with reliability controls and durable workflows holding the long-running cases together.

The result is a vulnerability operation that runs itself for the predictable cases and reserves human attention for the genuinely hard ones. Triage policy that used to live in analysts' heads becomes explicit, dry-run-tested, four-eyes-approved, and audit-logged rules. Reactions that used to wait on a human noticing become event-driven runbooks that fire in seconds, retry transient failures, trip a breaker when something is genuinely broken, and survive a worker restart mid-wait. The program's operating logic stops being tribal knowledge and becomes enforced, provable platform behavior. That is what automating vulnerability operations actually means, and it is the foundation that lets every other stage of the vulnerability management lifecycle scale.

For practitioners building this out, the rule engine deep-dive and the runbook design guide cover authoring patterns in detail, the durable workflow explainer covers the long-wait mechanics, and the event bus overview covers the substrate underneath both. To see how reaction logic depends on the underlying intelligence, the work of enriching findings before they ever reach a rule is covered in the platform's vulnerability intelligence layer.

External standards reinforce the model. The structured, repeatable response playbooks PMAP runbooks encode align with the playbook concept in NIST SP 800-61 on incident handling, and mapping automated responses to attacker behavior draws on the MITRE ATT&CK framework. The goal is the same one those standards push toward: response that is consistent, documented, and fast enough to matter.

PMAP Security Team
