A vulnerability platform is only as fast as the work that happens after a finding lands. A critical issue gets imported, then someone has to notice it, open a ticket, assign an owner, extend an SLA, and ping a channel. None of that is hard. All of it is repetitive, and at enterprise scan volume the repetition becomes the bottleneck. This is the gap PMAP runbooks close. A runbook is event-triggered automation that watches the platform, matches the conditions you care about, and runs an ordered list of actions in seconds. No code, no external scheduler, no cron job sitting on a server somewhere that nobody remembers to maintain.
This article is the deep dive on how runbooks are designed in PMAP. It covers what a runbook actually is, the 16 trigger event types that start one, the catalog of 22 actions it can run, and the reliability machinery that keeps a misbehaving playbook from quietly poisoning your environment. If you want the conceptual map of how rules and runbooks fit into the broader automation story, start with the pillar on vulnerability management automation. If you want a step-by-step walkthrough of the builder, the guide on designing a runbook with triggers and actions is the companion piece. Here we focus on the why and the design decisions behind each capability.
One distinction up front. PMAP has two automation layers, and they are easy to conflate. The finding rule engine applies inline mutations to findings as they are imported. It normalizes titles, overrides severity, and assigns owners at ingest time. A runbook is different. It is an event-triggered playbook that reacts after the fact, when a finding is created, when an SLA breaches, when a scan finishes, or on a cron schedule. Rules shape the data as it arrives. Runbooks orchestrate what happens once it is in the system. Both matter, and most mature programs use both.
Where Manual Notification and Ticket Work Eats Your Team
Walk through what a security engineer does in a typical week and you will find the same shapes repeating. A critical finding appears, so they open Jira and file a ticket. An SLA timer trips, so they message the remediation owner on Slack. A nightly assessment lands, so they generate a report and email it to the consulting lead. An asset gets onboarded, so they kick off a scan against it. Each of these is a small, well-defined reaction to a platform event. Each is also a manual handoff that depends on a human being awake, available, and paying attention.
The cost is not just the minutes spent clicking. It is the variance. One engineer files the ticket in ten minutes. Another files it the next morning. A third forgets, and the finding sits untriaged until an SLA breach surfaces it. Inconsistency in response is itself a risk, because the speed and quality of your reaction now depend on who happens to be on shift. Auditors notice this too. When you cannot show that every critical finding got a ticket within a defined window, you cannot claim the control is operating.
Runbook automation removes the human from the predictable middle of these flows. The engineer designs the playbook once. After that, PMAP responds to the event the same way every time, in seconds, with a full execution record. The personas who benefit are concrete. Security engineers build and tune the playbooks. SOC analysts monitor automation health and reset anything that has tripped. Platform administrators publish shared templates and enable durable execution. Remediation owners get automated assignment, SLA extensions, and tickets without lifting a finger.
A Runbook Is Triggers Plus an Ordered Action List
Strip away the interface and a runbook is two things. It is a set of trigger rules that decide when the runbook fires, and it is an ordered list of actions that run when it does. That is the whole model. Everything else is configuration around those two halves.
A runbook fires when any of its trigger rules matches an incoming event. Within a single rule, the event type has to match and an optional condition tree has to evaluate true. So the matching logic is OR across rules and AND within a rule, between the event type and its conditions. This gives you room to express both broad and narrow intent. One runbook might fire on three different finding events, each with its own conditions, all funneling into the same action list.
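The matching logic can be sketched in a few lines. This is an illustration only, not PMAP's implementation: TriggerRule, runbook_fires, and the use of a plain predicate in place of a condition tree are all hypothetical names and simplifications.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TriggerRule:
    event_type: str
    # Hypothetical stand-in for a condition tree; None means "no conditions".
    condition: Optional[Callable[[dict], bool]] = None

    def matches(self, event_type: str, payload: dict) -> bool:
        # AND within a rule: the event type must match and the condition must hold.
        if event_type != self.event_type:
            return False
        return self.condition is None or self.condition(payload)


def runbook_fires(rules: list, event_type: str, payload: dict) -> bool:
    # OR across rules: one matching rule is enough to fire the runbook.
    return any(rule.matches(event_type, payload) for rule in rules)


rules = [
    TriggerRule("finding_created", lambda p: p["finding"]["severity"] == "critical"),
    TriggerRule("sla_breached"),  # no condition: every sla_breached event matches
]

critical = {"finding": {"severity": "critical"}}
low = {"finding": {"severity": "low"}}
```

Here a critical finding_created event and any sla_breached event fire the runbook, while a low-severity finding_created event or an unrelated scan_finished event does not.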
When a runbook fires, its actions run sequentially. Action one completes, then action two begins, and so on down the list. This sequential model matters because actions can depend on each other. A resolve_owner action figures out which team owns an affected asset, and the output of that lookup is merged back into the execution payload so a later assign_finding or create_ticket action can route to the right team. PMAP calls this payload chaining. Each activity’s output is folded into the shared workflow state, so state mutations propagate forward to every subsequent step.
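A minimal sketch of payload chaining, assuming each action is a function that reads the shared state and returns a dictionary of outputs. The owner lookup table, the field names owner_team and assigned_to, and the shallow merge are assumptions made for illustration, not PMAP's actual schema.

```python
def resolve_owner(state: dict) -> dict:
    # Hypothetical lookup table standing in for the real ownership lookup.
    owners = {"web-01": "team-web"}
    hostname = state["asset"]["hostname"]
    return {"owner_team": owners.get(hostname, "unassigned")}


def assign_finding(state: dict) -> dict:
    # Reads the value that resolve_owner wrote earlier in the chain.
    return {"assigned_to": state["owner_team"]}


def run_chain(actions: list, payload: dict) -> dict:
    state = dict(payload)  # copy so the caller's payload is left untouched
    for action in actions:
        output = action(state)
        state.update(output)  # shallow merge: later steps see earlier outputs
    return state


final_state = run_chain(
    [resolve_owner, assign_finding],
    {"asset": {"hostname": "web-01"}},
)
```

Swapping the order of the two actions would raise a KeyError in assign_finding, which is the sense in which ordering matters when one step depends on another's output.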
PMAP renders this structure on the detail page as a ReactFlow canvas. The trigger sits as a node, the actions chain out from it, and each node carries the status of its most recent execution overlaid directly on the graph. You are not reading a config file to understand a playbook. You are looking at the actual flow, with the last run’s outcome painted onto each step. That visual model is the difference between automation you trust and automation you are afraid to touch.
Sixteen Trigger Event Types Including Cron
A runbook can subscribe to 16 platform event types. Fifteen of them are real events that PMAP emits as work happens, and one is a synthetic cron trigger. The real events span the finding lifecycle, scanning, reporting, SLA, and assets.
On the finding side you have finding_created, finding_updated, finding_status_changed, finding_assigned, finding_retested, finding_note_added, finding_rule_applied, and finding_bulk_updated. That coverage is deliberate. It means a runbook can react to the moment a finding is born, the moment its status moves, the moment a rule mutates it, or the moment someone adds a note. Each event carries context. finding_status_changed brings the old and new status. finding_assigned brings the assignee user and team identifiers. You build conditions against that context.
On the scanning and reporting side you have remote_scan_completed, scan_finished, and report_generated. A scan_finished event arrives with severity counts already attached, so a runbook can branch on whether a scan turned up any critical findings before deciding what to do. SLA contributes sla_breached and sla_escalated, which let you wire breach handling directly into your routing. Assets contribute asset_created and asset_updated, so onboarding a new host can automatically kick off a baseline scan.
The sixteenth trigger is schedule. Instead of waiting for a platform event, a schedule trigger fires on a cron expression. There is no real event behind it. PMAP synthesizes a payload with the trigger time, the cron expression, and the company scope. This is how you run nightly reports, weekly hygiene sweeps, or any time-based automation without standing up a separate scheduler. A background scheduler owns these cron entries and reloads the active set every 60 seconds, so a new or edited schedule runbook becomes live within a minute of being saved.
Conditional Branching and Template Variables
A trigger that fires on every event of a type is rarely what you want. You want the runbook to fire only for the cases that matter. PMAP gives you two tools for that precision. The first is the condition tree. The second is template variables.
A condition tree is a nested AND/OR structure evaluated against the enriched event payload. You build conditions against dot-path fields like finding.severity or asset.criticality, and you combine them into groups with AND or OR logic. The engine supports 12 operators. You get the obvious comparisons in eq, neq, gte, lte, gt, and lt. You get membership with in and not_in. You get string matching with contains and starts_with. And you get two time-aware operators, older_than and within_last, which is how you express conditions like a finding that has gone unaddressed for longer than a threshold. Because payloads are enriched once per dispatch with asset and finding lookups, your conditions can reach into related data the raw event did not originally carry.
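To make the condition tree concrete, here is a small evaluator sketch covering the 12 operators named above. The tree shape (all and any keys), the use of seconds for the two time-aware operators, ISO-8601 timestamps, and the rule that a missing field never matches are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def get_path(payload, path):
    """Follow a dot path such as 'finding.severity'; None if any part is missing."""
    current = payload
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def _age(timestamp, now):
    # Assumes ISO-8601 strings that carry an explicit UTC offset.
    return now - datetime.fromisoformat(timestamp)


OPERATORS = {
    "eq": lambda a, b, now: a == b,
    "neq": lambda a, b, now: a != b,
    "gt": lambda a, b, now: a > b,
    "gte": lambda a, b, now: a >= b,
    "lt": lambda a, b, now: a < b,
    "lte": lambda a, b, now: a <= b,
    "in": lambda a, b, now: a in b,
    "not_in": lambda a, b, now: a not in b,
    "contains": lambda a, b, now: b in a,
    "starts_with": lambda a, b, now: a.startswith(b),
    "older_than": lambda a, b, now: _age(a, now) > timedelta(seconds=b),
    "within_last": lambda a, b, now: _age(a, now) <= timedelta(seconds=b),
}


def evaluate(node, payload, now):
    if "all" in node:  # AND group
        return all(evaluate(child, payload, now) for child in node["all"])
    if "any" in node:  # OR group
        return any(evaluate(child, payload, now) for child in node["any"])
    actual = get_path(payload, node["field"])
    if actual is None:  # assumption: a missing field never matches
        return False
    return OPERATORS[node["op"]](actual, node["value"], now)


NOW = datetime(2024, 1, 10, tzinfo=timezone.utc)
payload = {
    "finding": {"severity": "critical", "created_at": "2024-01-01T00:00:00+00:00"},
    "asset": {"criticality": 4},
}
tree = {"all": [
    {"field": "finding.severity", "op": "in", "value": ["critical", "high"]},
    {"any": [
        {"field": "asset.criticality", "op": "gte", "value": 5},
        {"field": "finding.created_at", "op": "older_than", "value": 7 * 86400},
    ]},
]}
result = evaluate(tree, payload, NOW)
```

The example tree matches because the severity is in the allowed set and, although the asset criticality is below 5, the finding is nine days old against a seven-day threshold.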
Template variables handle the other half. Action configs accept {{ field }} placeholders that resolve against the enriched payload before the action runs. A Slack message body can include the finding title and severity. A ticket can be filled with the asset hostname. The same playbook adapts to each event because the values are pulled from context rather than hardcoded. Individual actions also accept a when expression. When that expression evaluates false, the action is skipped rather than failed, and the rest of the chain continues. The dedicated when action type goes further with explicit then and else sub-lists, so you can branch the action chain itself based on a runtime expression.
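Placeholder resolution and the skip-on-false behavior can be sketched as follows. The regular expression, the choice to render a missing field as an empty string, and the representation of a when expression as a Python predicate are assumptions made for this example, not a description of PMAP's template engine.

```python
import re

# Matches {{ field }} or {{field.sub}} with optional inner whitespace.
_PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def lookup(payload, path):
    current = payload
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def render(template: str, payload: dict) -> str:
    def replace(match):
        value = lookup(payload, match.group(1))
        # Assumption: unresolved placeholders render as an empty string.
        return "" if value is None else str(value)
    return _PLACEHOLDER.sub(replace, template)


def run_action(action: dict, payload: dict) -> dict:
    when = action.get("when")
    if when is not None and not when(payload):
        # A false 'when' skips the action; it is not treated as a failure.
        return {"status": "skipped"}
    return {"status": "success", "text": render(action["text"], payload)}


payload = {"finding": {"title": "SQL injection", "severity": "critical"}}
message = render("{{ finding.title }} ({{finding.severity}})", payload)
skipped = run_action(
    {"text": "{{ finding.title }}", "when": lambda p: p["finding"]["severity"] == "low"},
    payload,
)
```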
The 22-Action Execution Catalog
Triggers decide when. Actions decide what. PMAP ships a catalog of 22 action types, grouped by what they touch. Understanding the catalog is understanding the reach of the platform, because these actions are how a runbook actually changes the world.
Five actions handle notification. send_webhook posts JSON to a URL and auto-wraps the runbook and event metadata in the body. send_email goes out through the platform SMTP configuration with template-resolved recipient, subject, and body. send_slack_message and send_teams_message post to incoming webhooks for those platforms. And http_request is the escape hatch, a generic HTTP client with full control over method, URL, headers, body, and accepted status codes for any integration the named actions do not cover.
Eight actions mutate findings, which is where runbooks meet the triage and remediation parts of the vulnerability management automation workflow. change_finding_status and change_finding_severity set those fields directly. add_finding_tags and remove_finding_tags manage the tag array, with appends deduplicated so they stay idempotent. extend_sla pushes a deadline forward, and reset_sla sets a fresh deadline and clears the pause and notified state. assign_finding adds an assignee, promotes an open finding to assigned, and is idempotent through an ON CONFLICT clause so re-running it does no harm. append_comment writes a comment into the finding’s status history, attributed to the runbook so the audit trail shows automation as the author.
The ITSM group connects runbooks to your ticketing systems. create_jira_issue, create_servicenow_ticket, and the generic create_ticket create tickets through the integration service, support multi-team routing, and link the ticket reference back to the finding. update_ticket_status and comment_ticket keep that linked ticket in sync, and they tolerate a stale reference gracefully rather than failing the whole run.
The remaining actions cover the rest of the platform. trigger_scan creates a new scan through the scan service and auto-selects a scanner for the asset class when you do not name an integration. generate_report creates and async-renders a report, with a per_company fan-out mode that produces a separate report for every active company. integration_sync triggers a pull-mode sync on a named integration, such as a Tenable findings pull. update_asset and manage_team_assets mutate asset metadata and team membership, both audit-logged. And four flow-control actions shape execution itself. sleep pauses for a duration, await_signal waits for an external signal, when branches the chain, and resolve_owner looks up an asset’s owning team and writes the result into the payload for downstream actions to use.
Reliability: Retry, Throttle and Concurrency
Automation that runs unattended has to be reliable, or it becomes a liability. A runbook that hammers a flaky webhook, or fires a thousand times in a burst, or stacks up parallel instances against the same finding, will cause more damage than the manual process it replaced. PMAP builds three guardrails into every runbook for exactly this reason.
The first is per-action retry. Each action carries its own retry policy. You set a maximum attempt count between 1 and 20, an initial backoff, a maximum backoff, and an exponential factor. So a transient network blip on a create_jira_issue action does not fail the whole playbook. It retries with growing delays until it succeeds or exhausts its attempts. Retry is configured per action, not per runbook, because different actions have different failure profiles. A webhook to an internal service might warrant aggressive retries while a notification might warrant none.
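The four retry settings combine into a delay schedule that is easy to reason about once written down. The sketch below is one plausible reading, assuming the delay before retry n is the initial backoff multiplied by the factor n times and capped at the maximum backoff. The class and field names are hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    max_attempts: int = 3
    initial_backoff: float = 1.0  # seconds
    max_backoff: float = 60.0     # seconds
    factor: float = 2.0

    def __post_init__(self):
        if not 1 <= self.max_attempts <= 20:
            raise ValueError("max_attempts must be between 1 and 20")

    def delays(self) -> list:
        # One delay between each pair of attempts, so max_attempts - 1 entries.
        return [
            min(self.initial_backoff * self.factor ** i, self.max_backoff)
            for i in range(self.max_attempts - 1)
        ]


def run_with_retry(action, policy: RetryPolicy, sleep=time.sleep):
    delays = policy.delays()
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == policy.max_attempts:
                raise  # attempts exhausted: surface the last error
            sleep(delays[attempt - 1])


policy = RetryPolicy(max_attempts=5, initial_backoff=1.0, max_backoff=5.0, factor=2.0)
schedule = policy.delays()

calls = {"count": 0}

def flaky_action():
    # Fails twice, then succeeds, to mimic a transient network blip.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

slept = []
outcome = run_with_retry(flaky_action, policy, sleep=slept.append)
```

With these values the schedule is 1, 2, 4, then 5 seconds, because the fourth delay would be 8 seconds and is capped by the maximum backoff.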
The second guardrail is throttling. A throttle_seconds setting prevents the runbook from re-firing within a window. If a flood of finding events arrives, the runbook fires once and then suppresses repeats for the configured duration. A suppressed fire is recorded as a skipped execution with a gate-skip reason, and critically it does not count against the circuit breaker. Throttling is a deliberate behavior, not a failure, so it does not poison the health signal.
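A throttle gate of this kind reduces to a timestamp comparison. The sketch below assumes the window is measured from the last fire that was actually allowed through. ThrottleGate and try_fire are hypothetical names.

```python
class ThrottleGate:
    def __init__(self, throttle_seconds: float):
        self.throttle_seconds = throttle_seconds
        self.last_fired = None  # time of the last fire that was let through

    def try_fire(self, now: float) -> str:
        inside_window = (
            self.last_fired is not None
            and now - self.last_fired < self.throttle_seconds
        )
        if inside_window:
            # Suppressed fires are recorded as skipped, not failed.
            return "skipped"
        self.last_fired = now
        return "fired"


gate = ThrottleGate(throttle_seconds=60)
outcomes = [gate.try_fire(t) for t in (0, 10, 59, 60, 61)]
```

In this run the events at 0 and 60 seconds fire, and the ones at 10, 59, and 61 seconds are suppressed because each falls inside a 60-second window opened by the previous fire.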
The third guardrail is concurrency. A concurrency_key, which can itself be resolved from a template variable, caps how many instances of the runbook run simultaneously for a given key. This matters when actions touch shared state. If a runbook mutates a finding, you do not want two copies racing each other on the same record. The concurrency gate serializes them.
The Circuit Breaker That Auto-Deactivates
Retries handle transient failures. The circuit breaker handles systemic ones. Some failures are not blips. A webhook URL gets decommissioned, an integration credential expires, a downstream service goes permanently dark. In those cases retrying forever is worse than stopping, because every fire produces a failed execution and a noisy alert. PMAP watches for this pattern.
After 10 consecutive failed executions, the runbook trips its circuit breaker. The platform sets is_active to false, stamps breaker_tripped_at with the current time, and surfaces a circuit-breaker badge in the interface. The runbook stops firing. This is the automated equivalent of an electrical breaker tripping before a fault burns the house down. Importantly, skipped executions do not count toward the threshold, so a throttled or pre-flight-skipped run never trips the breaker. Only genuine consecutive failures do.
Recovery is a single action. Once an operator has fixed the root cause, they click Reset Breaker. PMAP clears the consecutive failure count, nulls out breaker_tripped_at, and sets is_active back to true. The runbook resumes. The breaker is not a punishment. It is a backstop that buys you time to fix the underlying problem without drowning in alert noise, and it hands control back the moment you are ready.
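The breaker's trip and reset behavior fits in a small state machine. The field names is_active and breaker_tripped_at come from the description above. Everything else is an assumption for illustration, including the class name and the choice to have any non-failed, non-skipped run reset the streak.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

FAILURE_THRESHOLD = 10  # consecutive failed executions before the breaker trips


@dataclass
class RunbookHealth:
    is_active: bool = True
    consecutive_failures: int = 0
    breaker_tripped_at: Optional[datetime] = None

    def record(self, status: str, now: datetime) -> None:
        if status == "skipped":
            return  # throttled or pre-flight-skipped runs never affect the streak
        if status == "failed":
            self.consecutive_failures += 1
            if self.consecutive_failures >= FAILURE_THRESHOLD:
                self.is_active = False
                self.breaker_tripped_at = now
        else:
            # Assumption: any other outcome breaks the run of consecutive failures.
            self.consecutive_failures = 0

    def reset_breaker(self) -> None:
        self.consecutive_failures = 0
        self.breaker_tripped_at = None
        self.is_active = True


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
health = RunbookHealth()
for _ in range(9):
    health.record("failed", now)
health.record("skipped", now)        # does not count toward the threshold
active_after_nine = health.is_active
health.record("failed", now)         # tenth consecutive failure trips the breaker
tripped = (health.is_active, health.breaker_tripped_at)
health.reset_breaker()
```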
Dry-Run Testing Against Real Payloads
Nobody should activate a playbook by guessing. A runbook that fires on a bad condition, or runs an action against the wrong target, can do real damage at scale. PMAP gives you a dry-run path so you validate a runbook against reality before it ever touches production.
The test modal lets you choose an event and supply a payload. You can pick a real recent event, seeded from a feed of recent platform events, or you can hand-author a custom JSON payload. The runbook then evaluates its trigger conditions against that payload and simulates every action in the chain. Crucially, the dry-run produces no side effects. No finding is mutated, no ITSM ticket is created, no notification is sent. What you get back is a per-action result showing what each step would have done. You see whether the condition matched, and you see the simulated outcome of every action, all without any real-world consequence.
This closes the gap between designing a playbook and trusting it. You build the runbook, you point it at a real event from last week, and you watch exactly how it would behave. If a condition is too broad or an action targets the wrong field, you catch it in the test rather than in an incident review. Testing against real payloads, not invented ones, is what makes the validation meaningful.
Durable Sleep and Await-Signal for Long Waits
Most runbooks finish in seconds. Some need to wait. A remediation gate might pause for a human to approve a change. A verification step might sleep for three days before re-checking whether a finding was actually fixed. These long-lived waits are where naive automation breaks, because if the process restarts while a runbook is sleeping, the wait is lost.
PMAP solves this with an optional durable execution engine. By default, runbooks use an inline engine. Actions run synchronously in the same goroutine that handled the event, which is fast and simple and correct for the vast majority of playbooks. Under the inline engine, the two durable-wait actions, sleep and await_signal, are skipped rather than failed. The chain continues, and the skip is recorded as a structured outcome so you can see exactly what happened.
When you need real durable waits, you set PMAP_RUNBOOK_ENGINE=workflow. Now each runbook execution is dispatched as a durable workflow instance backed by a PostgreSQL store. A sleep becomes a persisted timer. An await_signal becomes a persisted signal channel with a configurable timeout that defaults to 72 hours. These survive worker restarts. If the process goes down mid-wait and comes back up, the engine replays the workflow history and resumes from exactly where it paused. This is the same engineering pattern that powers durable workflow systems generally, and it is what makes multi-day human-in-the-loop gates reliable instead of fragile. For the internals of how that engine is provisioned and how state survives restarts, the durable workflow engine deep dive carries the full mechanics; here it is enough to know that the capability is one environment flag away.
There is also a shadow mode that runs both paths in parallel, the inline path committing real side effects and the workflow path running alongside without committing, so platform engineers can validate the durable engine against production traffic before cutting over. And if the durable backend fails to initialize at startup, PMAP logs a warning and falls back to inline execution rather than crashing. The rest of the platform stays operational. Durability is an upgrade, never a single point of failure.
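The startup fallback can be pictured as a guarded initialization. This sketch covers only the inline and workflow paths. The function names, the idea that an unset variable means inline, and the callable used to initialize the backend are assumptions, and shadow mode is left out because its configuration is not described here.

```python
import logging

log = logging.getLogger("runbook.engine")


def select_engine(env: dict, init_workflow_backend) -> str:
    # Assumption: an unset or different value means the default inline engine.
    if env.get("PMAP_RUNBOOK_ENGINE") != "workflow":
        return "inline"
    try:
        init_workflow_backend()
    except Exception as exc:
        # Durable backend unavailable: warn and keep the platform running inline.
        log.warning("durable engine failed to initialize, using inline: %s", exc)
        return "inline"
    return "workflow"


def healthy_backend():
    return None  # stands in for a successful store connection


def broken_backend():
    raise RuntimeError("store unreachable")


default_choice = select_engine({}, healthy_backend)
durable_choice = select_engine({"PMAP_RUNBOOK_ENGINE": "workflow"}, healthy_backend)
fallback_choice = select_engine({"PMAP_RUNBOOK_ENGINE": "workflow"}, broken_backend)
```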
Template Library and One-Click Instantiation
Building a runbook from scratch is fine for bespoke automation. For the common patterns, starting from a blank canvas is wasted effort. PMAP ships a template library of seed runbooks that encode proven playbooks, so you instantiate rather than reinvent.
Platform administrators publish these seed templates, and they can be global, meaning available across every tenant. A template runbook is excluded from event dispatch. It does not fire on its own. It sits in the Templates tab as a starting point. When an operator wants to use one, they click Use Template, and a wizard opens. The wizard detects the integration slots the template needs, such as a Jira connection or a Slack webhook, and presents a picker for each one. Once the slots are wired, the wizard calls the instantiate endpoint, which creates a tenant-scoped editable copy in an inactive state. The operator reviews it, activates it, and the playbook is live.
This instantiation flow matters for two reasons. It removes the blank-page problem, and it removes the wiring errors that come from copying a config by hand. The seed template knows it needs an ITSM connection. The wizard makes you supply one before the runbook can run. Built-in templates carry a lock icon and enforce a clone-to-edit flow, so the canonical version stays pristine while your copy is yours to modify. Templates cannot be deleted, only instantiated, which keeps the library stable for everyone who relies on it.
Execution History and Per-Step Drill-Down
Automation you cannot observe is automation you cannot trust. Every time a runbook fires, whether from an event, a manual trigger, or a cron schedule, PMAP records an execution row. That row captures the trigger payload, the result of each action, the overall status, and the timing. This history is the audit trail and the debugging surface in one.
The Run History page presents a cross-runbook execution log. You filter it by runbook, by status, by trigger event, and by date range, so you can answer questions like which playbooks failed in the last day, or how often a given runbook fired this week. Each execution expands into a four-tab detail panel. The Steps tab shows each action and its outcome. The Timing tab shows a proportional bar chart of where the time went. The Context tab shows the payload the runbook acted on. The Telemetry tab carries retry counts, skip reasons, and error messages. When something goes wrong, you diagnose it from this panel rather than digging through server logs.
Execution status itself is precise. A run is running while it executes, then resolves to success, failed, partial, or skipped. A partial status means at least one action succeeded and at least one failed, which is exactly the signal you want when a playbook is mostly working but one step is broken. A skipped status means every action was skipped, whether by a when condition or a pre-flight check. The KPI strip on the definitions list rolls these up into live counters: total runbooks, active, inactive, executions in the last 24 hours, and failures in the last 24 hours, all scoped to what the caller is allowed to see.
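The roll-up from per-action outcomes to an overall status can be sketched as a short function. The partial and skipped rules follow the definitions above. The handling of mixed success-and-skipped runs, failed-and-skipped runs, and an empty action list is an assumption.

```python
def overall_status(action_statuses: list) -> str:
    succeeded = "success" in action_statuses
    failed = "failed" in action_statuses
    if succeeded and failed:
        return "partial"   # at least one success and at least one failure
    if failed:
        return "failed"    # assumption: failures plus skips still count as failed
    if succeeded:
        return "success"   # assumption: successes plus skips count as success
    return "skipped"       # every action skipped (or, by assumption, no actions)


examples = {
    "mixed": overall_status(["success", "failed", "skipped"]),
    "all_skipped": overall_status(["skipped", "skipped"]),
    "clean": overall_status(["success", "skipped"]),
    "broken": overall_status(["failed"]),
}
```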
How PMAP Responds to Events in Seconds
Step back from the individual capabilities and the design intent is clear. PMAP runbooks take the predictable, repetitive work that sits between a platform event and a human response, and they automate it without sacrificing control. The matching model is expressive enough to fire only on the cases you care about. The action catalog is broad enough to touch findings, assets, scans, reports, tickets, and any HTTP endpoint. The reliability machinery, retry and throttle and concurrency and the circuit breaker, keeps automation from doing harm when something downstream breaks. The durable engine handles the long waits that naive automation cannot. And the execution history makes every fire observable and auditable.
That combination is what separates a real automation platform from a collection of scripts. Scripts run until they fail silently. Runbooks run with guardrails, leave a record, and stop themselves before they cause damage. The work that used to depend on whoever was awake now happens the same way every time, in seconds, with a full trail behind it. If you want to see how this fits alongside inline triage automation, read the companion piece on the finding rule engine, and for the full program-level picture, the vulnerability management automation pillar ties the whole automation story together.
Repetitive event-driven work does not have to own your team’s attention. Read the runbook automation datasheet and put it on autopilot.
Frequently Asked Questions
What is the difference between a finding rule and a runbook in PMAP?
A finding rule applies inline mutations to findings as they are imported, shaping the data at ingest time by normalizing titles, overriding severity, or assigning owners. A runbook is event-triggered automation that reacts after the fact, firing when a finding is created, when an SLA breaches, when a scan finishes, or on a cron schedule, then running an ordered list of actions. Rules shape incoming data. Runbooks orchestrate what happens once it is in the system. Most mature programs use both, and the finding rule engine article covers the rule side in depth.
How many trigger types and actions does a PMAP runbook support?
A runbook can subscribe to 16 trigger event types and run actions from a catalog of 22 types. The triggers span the finding lifecycle, scanning, reporting, SLA, and asset events, plus a synthetic schedule cron trigger. The 22 actions cover notifications, finding mutations, ITSM ticketing, scanning, report generation, integration sync, asset mutation, and flow control.
What happens when a runbook keeps failing?
After 10 consecutive failed executions, the runbook trips its circuit breaker. PMAP sets the runbook inactive, stamps the trip time, and shows a circuit-breaker badge so the runbook stops firing instead of producing endless failed runs. Throttled and skipped executions do not count toward this threshold. Once an operator fixes the root cause, a single Reset Breaker action clears the failure count and reactivates the runbook.
Can a runbook wait for days without losing its state?
Yes, when the durable workflow engine is enabled with PMAP_RUNBOOK_ENGINE=workflow. In that mode, sleep and await_signal actions become PostgreSQL-persisted timers and signal channels that survive worker restarts, with await_signal defaulting to a 72-hour timeout. If the process restarts mid-wait, the engine replays history and resumes from where it paused. Under the default inline engine, these durable-wait actions are skipped rather than failed.
Can I test a runbook before activating it?
Yes. The dry-run test lets you evaluate a runbook against a real recent event payload or a custom JSON payload. It checks the trigger conditions and simulates every action in the chain with no side effects, so no finding is mutated, no ticket is created, and no notification is sent. You get back a per-action result showing exactly what the runbook would have done.
How do I start without building every runbook from scratch?
PMAP ships a template library of seed runbooks. Platform administrators publish these templates, and operators click Use Template to instantiate one. The wizard detects the integration slots the template needs, prompts for each connection, and creates a tenant-scoped editable copy in an inactive state ready for review. The step-by-step process is covered in the runbook design guide.
Can runbooks run on a schedule instead of an event?
Yes. The schedule trigger type fires on a cron expression rather than a platform event, which is how you run nightly reports, weekly hygiene sweeps, or any time-based automation. A background scheduler reloads the active schedule triggers every 60 seconds, so a new or edited cron runbook becomes active within a minute of being saved.