An Internal Event Bus That Powers Automation and Alerts

By PMAP Security Team

When a scanner finishes importing results, several things have to happen at once. An analyst should see a notification. A runbook might need to auto-assign the new findings. The browser dashboard should update without a page refresh. The audit trail needs a record. None of these belong inside the scan importer itself, and wiring them in directly would turn one clean importer into a tangle of dependencies on the notification service, the automation engine, the realtime layer and the audit logger.

An internal event bus is the layer that prevents exactly that tangle. The scan importer announces that a scan finished and then moves on. It does not know or care who is listening. Somewhere else, four independent subscribers each react to that announcement in their own way. The producer and the consumers never reference each other. This pattern, where state changes are published as events and interested parties subscribe to them, is what makes a platform event-driven.

This article explains how an internal event bus works in a security platform and why it matters for automation and alerting. It uses PMAP’s event bus as the worked example, because its behavior is concrete and documented rather than abstract. By the end you should understand how producers and consumers stay decoupled, how asynchronous dispatch keeps callers fast, how a typed event catalog keeps the system disciplined, and where the deliberate limits of an in-process bus actually lie. If you are evaluating how automation and alerts hang together under the hood, this is the layer that connects them.

For the broader picture of how automation, rules and orchestration turn raw findings into action, see the pillar guide on vulnerability management automation. This article zooms into the messaging backbone that sits underneath all of it.

What an Internal Event Bus Does in a Security Platform

It helps to define the bus narrowly, because the term means different things in different systems.

An internal event bus is an in-process publish/subscribe backbone. Producers publish events when they change state. Consumers subscribe to the event types they care about. The bus is the matchmaker in the middle. It has no HTTP surface of its own, no user interface and no database. It is infrastructure that other domains use, not a feature that end-users touch directly. In PMAP the bus lives in a single internal/events package and exposes a small surface: producers call Emit or EmitPayload, and consumers call Subscribe.

The value of this arrangement is separation. Without a bus, the scan domain would have to import the notification service, the runbook service, the realtime hub and the audit logger, then call each one in sequence after every scan. Every new reaction to a scan would mean editing the scan domain. With a bus, the scan domain emits one event and remains entirely unaware of who reacts. New behavior is added by registering a new subscriber, not by editing the producer. The producer stays small and focused on its own job, which is the whole point.

The bus also gives the platform a single, observable seam where state changes flow. Because every meaningful change passes through one dispatcher, the set of things that can happen in the system is enumerable. You can read the event catalog and know, at a glance, what the platform is capable of announcing. That property is hard to get when reactions are scattered as direct method calls across dozens of files.

It is worth being clear about what the bus is not. It is not a scanner, a queue product or a network service. In PMAP it depends on nothing else inside the application code. Its only imports are standard library packages and a UUID helper. Everything else in the platform depends on the bus, and the bus depends on almost nothing. That direction of dependency is deliberate, and it is what keeps the bus stable while the domains around it evolve.

Producers and Consumers, Fully Decoupled

The decoupling is not a happy accident. It comes from how the dispatcher is constructed and shared.

A single Dispatcher instance is created once when the server starts. That one instance is then passed by reference into every service that needs to publish or subscribe. The scan service holds a reference to the dispatcher. So does the finding service, the asset service and the rest. They all point at the same object, but none of them point at each other. A producer calls Emit on its reference and the dispatcher takes over from there. The producer has no list of subscribers and no way to address one. It cannot accidentally couple itself to a consumer, because it has no consumer to couple to.

On the other side, a consumer registers a handler for an event type by calling Subscribe. The dispatcher keeps a map from event type to a slice of handler functions. When a producer emits an event, the dispatcher looks up the handlers registered for that event type and invokes each one. Multiple consumers can subscribe to the same event independently. Both the notification service and the runbook service subscribe to the event for a created finding, and neither knows the other exists. The dispatcher fans the single event out to both. Adding a third subscriber later changes nothing for the existing two.
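The registration and fan-out described above can be sketched in a few lines of Go. This is a minimal illustration, not PMAP's actual code: the Event fields, the Handler signature and the "finding_created" string are assumptions, and the handlers run inline here only to keep the lookup logic easy to read.

```go
package main

import (
	"fmt"
	"sync"
)

// EventType, Event and Handler are illustrative stand-ins (assumed shapes).
type EventType string

type Event struct {
	Type      EventType
	CompanyID string
	Payload   map[string]any
}

type Handler func(Event)

// Dispatcher keeps a map from event type to a slice of handler functions.
type Dispatcher struct {
	mu       sync.RWMutex
	handlers map[EventType][]Handler
}

func NewDispatcher() *Dispatcher {
	return &Dispatcher{handlers: make(map[EventType][]Handler)}
}

// Subscribe appends a handler to the slice kept for one event type.
func (d *Dispatcher) Subscribe(t EventType, h Handler) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.handlers[t] = append(d.handlers[t], h)
}

// Emit looks up the handlers for the event's type and invokes each one.
// It returns how many handlers were invoked (a convenience for this sketch).
func (d *Dispatcher) Emit(e Event) int {
	d.mu.RLock()
	hs := append([]Handler(nil), d.handlers[e.Type]...) // copy under the lock
	d.mu.RUnlock()
	for _, h := range hs {
		h(e)
	}
	return len(hs)
}

func main() {
	d := NewDispatcher()
	// Two independent consumers; neither references the other or the producer.
	d.Subscribe("finding_created", func(e Event) { fmt.Println("notify:", e.Payload["id"]) })
	d.Subscribe("finding_created", func(e Event) { fmt.Println("runbook:", e.Payload["id"]) })

	n := d.Emit(Event{Type: "finding_created", CompanyID: "acme", Payload: map[string]any{"id": "F-1"}})
	fmt.Println("handlers invoked:", n)
}
```

The producer side of this sketch only ever touches Emit. Adding a third Subscribe call changes nothing in the emitting code.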

To make this concrete, PMAP wires four subscriber bridges into the bus at startup. The runbook service subscribes to a broad set of finding and scan events to trigger automation workflows. The notification service subscribes to finding, scan, SLA, report and approval events to create in-app notifications. A realtime bridge subscribes to a subset of events and forwards them to connected browsers. An audit bridge subscribes to a smaller set and writes audit-trail rows. On the producing side, domains such as finding, scan, asset, company, project, report and the integration connectors all emit into the same bus. None of these producers imports any of those consumers. The bus is the only thing they share.

Why No External Message Broker Is Needed

A reasonable question at this point is whether this should be a real message broker. Many event-driven systems reach for Kafka, RabbitMQ or a cloud queue as the publish/subscribe layer.

PMAP deliberately does not. The bus is a single in-process dispatcher. Producers and consumers run in the same process, share the same memory and are wired together at startup. There is no network hop, no broker to deploy, no separate cluster to operate and no serialization across a wire. An event is a plain in-memory value handed from emitter to handler. This is the right trade-off for a platform where producers and consumers are co-located by design and the events never need to cross a process boundary.

The reason this works cleanly is that the bus carries zero application dependencies. It imports only context, sync, log, a UUID library and time. It knows nothing about findings, scans or notifications as concepts. It only knows event types and handler functions. Because it sits at the bottom of the dependency graph and imports none of the domains that use it, there is no circular dependency risk and no reason to push it out of process. The simplicity of an in-process broker is a feature, not a limitation, for this shape of system. The classic publish-subscribe channel described in the Enterprise Integration Patterns catalog is exactly this idea, applied within a single process rather than across a network.

Async Dispatch Without Blocking the Caller

A bus that decouples code but blocks the caller would solve one problem and create another. PMAP’s dispatcher is asynchronous by design.

When a producer calls Emit, the dispatcher does not run the handlers inline. It launches each handler in its own goroutine and then returns immediately. The emitter does not wait for any handler to finish. From the producer’s point of view, emitting an event is close to free. The scan importer emits the scan-finished event and continues, even though four subscribers are about to do real work in response, including database writes. This is fire-and-forget dispatch, and it is what keeps the producer fast no matter how heavy the consumers are.
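A rough sketch of fire-and-forget dispatch follows. The types are assumed for illustration, and the WaitGroup and Wait method exist only so the demo can observe the background handler before exiting; nothing in the article suggests PMAP's dispatcher exposes such a method.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type Event struct{ Type string }
type Handler func(Event)

type Dispatcher struct {
	mu       sync.RWMutex
	handlers map[string][]Handler
	wg       sync.WaitGroup // demo-only: lets the caller wait for background work
}

func NewDispatcher() *Dispatcher {
	return &Dispatcher{handlers: map[string][]Handler{}}
}

func (d *Dispatcher) Subscribe(t string, h Handler) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.handlers[t] = append(d.handlers[t], h)
}

// Emit starts one goroutine per subscribed handler and returns without waiting.
func (d *Dispatcher) Emit(e Event) {
	d.mu.RLock()
	hs := append([]Handler(nil), d.handlers[e.Type]...)
	d.mu.RUnlock()
	for _, h := range hs {
		d.wg.Add(1)
		go func(h Handler) {
			defer d.wg.Done()
			h(e)
		}(h)
	}
}

// Wait blocks until every launched handler has finished (demo-only helper).
func (d *Dispatcher) Wait() { d.wg.Wait() }

// emitWithSlowHandler emits one event to a handler that sleeps for delayMs.
// It reports how long Emit itself took and whether the handler eventually ran.
func emitWithSlowHandler(delayMs int) (emitMs int64, ran bool) {
	d := NewDispatcher()
	d.Subscribe("scan_finished", func(Event) {
		time.Sleep(time.Duration(delayMs) * time.Millisecond)
		ran = true
	})
	start := time.Now()
	d.Emit(Event{Type: "scan_finished"})
	emitMs = time.Since(start).Milliseconds()
	d.Wait() // the producer would normally never do this
	return emitMs, ran
}

func main() {
	ms, ran := emitWithSlowHandler(100)
	fmt.Printf("Emit returned after %dms; handler ran: %v\n", ms, ran)
}
```

The time spent inside Emit is the cost of starting goroutines, not the cost of the handlers, which is why the producer's latency does not grow with the weight of its consumers.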

This matters most on the request path. Imagine a finding being created during an HTTP request. The handler that creates the finding emits a created event and returns its response to the user quickly. Meanwhile, in the background, the notification service writes an in-app notification, the runbook service evaluates automation rules, the realtime bridge pushes an update to the browser and the audit bridge records the change. If all of that ran inline before the HTTP response, the user would wait for the slowest consumer. With async dispatch, the user gets a fast response and the side effects happen on their own goroutines.

There is a trade-off that comes with this model, and it is worth naming plainly. Because handlers run concurrently, there is no ordering guarantee between handlers of the same event. If two consumers subscribe to the same event type, they run at the same time, and which one finishes first is non-deterministic. This is acceptable here precisely because the consumers are independent. The notification a user sees does not depend on whether the audit row was written first. When two reactions genuinely must be ordered, the right place to express that order is inside a single handler, not across two subscribers to the same event.

Panic Isolation and Background Context

Asynchronous fan-out raises two reliability questions. What happens when a handler crashes, and what happens to the work a handler started after the original request is gone? The dispatcher answers both deliberately.

The first answer is panic isolation. Each handler runs inside a goroutine that wraps the call in a recovery block. If a handler panics, the dispatcher catches the panic, logs it with the event type and the panic value, and moves on. The panic does not propagate to the emitter, and it does not affect any sibling handler running for the same event. A bug in the notification handler cannot take down the runbook handler or crash the request that emitted the event. Each subscriber is isolated inside its own goroutine boundary, which means one bad consumer degrades exactly one reaction rather than the whole system. For a security platform, where a single missed event might matter but a crashed process certainly does, this is the correct failure mode.
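The recovery wrapper can be sketched as follows. The function names and the log format are assumptions made for illustration; the technique itself, a deferred recover inside each handler's goroutine, is standard Go.

```go
package main

import (
	"fmt"
	"log"
	"sync"
)

type Handler func(eventType string)

// safeGo runs h in its own goroutine and recovers any panic, so neither the
// emitter nor sibling handlers are affected by a crashing handler.
func safeGo(wg *sync.WaitGroup, eventType string, h Handler) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		defer func() {
			if r := recover(); r != nil {
				// Log the event type and the panic value, then move on.
				log.Printf("event handler panic: type=%s panic=%v", eventType, r)
			}
		}()
		h(eventType)
	}()
}

// fanOut dispatches one event to every handler and reports how many of them
// completed without panicking.
func fanOut(eventType string, hs ...Handler) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	completed := 0
	for _, h := range hs {
		h := h
		safeGo(&wg, eventType, func(t string) {
			h(t)
			mu.Lock()
			completed++ // reached only if h returned normally
			mu.Unlock()
		})
	}
	wg.Wait()
	return completed
}

func main() {
	ok := func(string) {}
	bad := func(string) { panic("bug in notification handler") }
	fmt.Println("handlers completed:", fanOut("finding_created", ok, bad, ok))
}
```

In this sketch the panicking handler is logged and skipped while the two healthy handlers still complete, which is the "one bad consumer degrades exactly one reaction" behavior.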

The second answer is background context. When a handler runs, it does not receive the originating HTTP request’s context. It receives a fresh background context instead. This is intentional and easy to get wrong. In Go’s net/http server, a request’s context is cancelled when the client connection closes or when the request handler returns, which in practice means it is dead almost as soon as the response has gone out. If an event handler inherited that context and then tried to run a database write a few milliseconds later, the write could be cancelled out from under it because the request had already completed. By handing handlers a background context, the dispatcher lets their follow-on work survive past the HTTP lifecycle. The notification write, the audit insert and the runbook evaluation all run to completion regardless of when the original response went out. The Go context package documentation explains why request-scoped contexts get cancelled and why detaching from them is the right call for work that must outlive the request.
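The difference between the two contexts is easy to demonstrate. In this sketch, writeAudit is a hypothetical stand-in for a database call that honors cancellation, and the request context is simulated with context.WithCancel rather than a real HTTP server.

```go
package main

import (
	"context"
	"fmt"
)

// writeAudit stands in for a database write that checks its context first,
// as most database drivers do.
func writeAudit(ctx context.Context) error {
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("audit write aborted: %w", err)
	}
	return nil
}

// simulate models a request whose context is already cancelled by the time
// the follow-on write runs. It returns the result of the write when the
// request context is inherited and when a fresh background context is used.
func simulate() (inherited, detached error) {
	reqCtx, cancel := context.WithCancel(context.Background())
	cancel() // the request has completed; its context is now cancelled

	inherited = writeAudit(reqCtx)
	detached = writeAudit(context.Background())
	return inherited, detached
}

func main() {
	inherited, detached := simulate()
	fmt.Println("with request context:   ", inherited)
	fmt.Println("with background context:", detached)
}
```

The cost of detaching is that the handler also loses any deadline and request-scoped values the original context carried, so a handler that needs a timeout has to set its own.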

These two choices work together. Panic isolation contains failures so a bad handler is a local problem, and background context ensures that the work a handler legitimately needs to do is not torn down prematurely. Together they make asynchronous dispatch safe enough to rely on for real side effects.

A Typed Event Catalog of 34 Kinds

A bus is only as disciplined as the vocabulary it carries. PMAP defines that vocabulary as a fixed catalog of named event types.

There are 34 event type constants. Each is a named Go constant whose string value is lower-snake-case, such as the value for a created finding. Producers do not emit free-form strings. They reference these constants, which means a misspelled or renamed constant fails at build time rather than surfacing as a silent mismatch at runtime. The catalog is the contract between producers and consumers. A consumer subscribing to an event type and a producer emitting it agree through the same named constant.

This typed approach has a practical consequence that is easy to overlook. The string values are not just internal labels. Some of them are referenced verbatim in PostgreSQL CHECK constraints on the runbooks table, where a runbook’s trigger event type must be one of the known values. Renaming such a constant in Go is therefore a database migration concern, not a free refactor. The platform pins specific event-type strings in a unit test precisely so that an accidental rename fails the test suite before it can break the migration alignment. The catalog is treated as a stable interface, because parts of the system outside Go depend on its exact spelling.
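A pinning check of this kind might look like the sketch below. The constant names and string values are assumptions chosen to match the lower-snake-case convention, not PMAP's actual catalog, and in a real codebase checkPinned would live in a _test.go file rather than in main.

```go
package main

import "fmt"

type EventType string

// Hypothetical constants; the real catalog has 34 of these.
const (
	EventFindingCreated EventType = "finding_created"
	EventScanFinished   EventType = "scan_finished"
	EventSLABreached    EventType = "sla_breached"
)

// pinned records the exact spelling that artifacts outside Go rely on,
// for example a CHECK constraint on a runbook's trigger event type.
var pinned = []struct {
	constant EventType
	want     string
}{
	{EventFindingCreated, "finding_created"},
	{EventScanFinished, "scan_finished"},
	{EventSLABreached, "sla_breached"},
}

// checkPinned fails if a constant's value has drifted from its pinned string.
func checkPinned() error {
	for _, p := range pinned {
		if string(p.constant) != p.want {
			return fmt.Errorf("event type renamed: got %q, pinned %q", p.constant, p.want)
		}
	}
	return nil
}

// isKnownTrigger mimics what the database constraint enforces: a trigger
// value must match one of the pinned strings exactly.
func isKnownTrigger(s string) bool {
	for _, p := range pinned {
		if p.want == s {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("pinned check error:", checkPinned())
	fmt.Println(isKnownTrigger("finding_created"), isKnownTrigger("finding-created"))
}
```

The literal strings are deliberately duplicated in the table. If someone edits a constant's value, the table no longer matches and the check fails before the mismatch can reach the database.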

Finding, Scan, SLA, Asset and Approval Event Groups

The 34 constants are not a flat list. They fall into natural groups that mirror the platform’s domains, which makes the catalog easy to reason about.

The largest group covers the finding lifecycle. Events announce that a finding was created, updated, had its status changed, was assigned, was retested, gained a note, requested a rescan, had a rule applied or was bulk-updated. A second group covers the approval workflow, with events for a finding approval being requested, approved, rejected, cancelled, expired or having an exception expire. The scan group announces that a scan finished or that a remote scan completed. The SLA group announces a breach and an escalation. The asset group covers creation, update and deletion, and there is a parallel company group for the tenant lifecycle. Smaller groups cover project updates, assessment-run creation and completion, a maintenance event for archived findings, a reporting event for a generated report, notification-lifecycle events and a set of CI and version-control events for pushes, pull requests, pipeline completions and SAST dispatch.

Grouping the catalog this way is more than tidy bookkeeping. It lets a reader scan the list and understand the platform’s behavioral surface by domain. If you want to know everything the finding domain can announce, the finding group is the answer. Consumers tend to subscribe along these groupings too. The notification service cares about a finding and approval slice of the catalog, the audit bridge cares about a different slice and the runbook engine spans several groups. The catalog is the shared map that lets each consumer pick exactly the slice it needs.

Thin Envelope vs Versioned Rich Payload

Events have to carry data, and PMAP carries it in two shapes that coexist on purpose. Understanding both explains how the bus evolves without breaking older code.

The first shape is the legacy thin envelope. It is a small struct with three fields: the event type, a company identifier and a flat map of payload data. This shape keeps call sites simple. A producer can construct one quickly, and every legacy subscriber reads the same three things, the type, the company and the flat map. For a lot of internal wiring this is all that is needed, and its simplicity is the reason it remains the lingua franca that every handler understands.

The second shape is the versioned rich payload. It carries a schema version, a timestamp, a tenant identifier, a correlation identifier, a typed subject describing the entity the event is about, an optional typed actor describing who or what triggered the event, and a data map. The schema version is the key addition. It currently sits at version one, and the rule is precise: incrementing it signals a breaking semantic change to a field, while adding a new optional field to the data map is backward-compatible and does not require a bump. This gives the platform a disciplined way to evolve event content over time without guessing whether a change is safe. The typed subject and actor exist so that automation can match conditions against structured fields rather than digging through an untyped map, and so that future webhook delivery has a stable, self-describing payload to serialize.

The two shapes are bridged rather than forked. When a producer emits a rich payload, the dispatcher flattens it into the legacy envelope before delivering it. The flattening copies the data-map keys, the schema version, the correlation identifier and the subject and actor identifiers down into the flat payload map. The result is that a producer can adopt the richer payload at its own pace while every existing subscriber, which still reads the thin envelope, keeps working untouched. New emit sites get structure and versioning. Old subscribers get continuity. Neither side has to change in lockstep with the other, which is exactly the property you want when a shared contract has many participants.
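The two shapes and the bridge between them can be sketched as below. Every field name, every flat-map key such as "correlation_id", and the mapping of tenant identifier to company identifier are assumptions for illustration; the article describes what is copied, not how the keys are spelled.

```go
package main

import (
	"fmt"
	"time"
)

type EventType string

// Event is the thin envelope: type, company identifier, flat payload map.
type Event struct {
	Type      EventType
	CompanyID string
	Payload   map[string]any
}

// Ref is a hypothetical typed reference used for both subject and actor.
type Ref struct {
	Kind string
	ID   string
}

// RichPayload is the versioned shape described in the text.
type RichPayload struct {
	SchemaVersion int
	OccurredAt    time.Time
	TenantID      string
	CorrelationID string
	Subject       Ref
	Actor         *Ref // optional
	Data          map[string]any
}

// Flatten converts a rich payload into the thin envelope so that legacy
// subscribers keep working. Reserved keys are written after the data-map
// keys, so they win on a name collision (a choice made for this sketch).
func Flatten(t EventType, p RichPayload) Event {
	flat := make(map[string]any, len(p.Data)+5)
	for k, v := range p.Data {
		flat[k] = v
	}
	flat["schema_version"] = p.SchemaVersion
	flat["correlation_id"] = p.CorrelationID
	flat["subject_id"] = p.Subject.ID
	if p.Actor != nil {
		flat["actor_id"] = p.Actor.ID
	}
	return Event{Type: t, CompanyID: p.TenantID, Payload: flat}
}

func main() {
	p := RichPayload{
		SchemaVersion: 1,
		OccurredAt:    time.Now(),
		TenantID:      "acme",
		CorrelationID: "trace-123",
		Subject:       Ref{Kind: "finding", ID: "F-1"},
		Data:          map[string]any{"severity": "high"},
	}
	e := Flatten("finding_created", p)
	fmt.Println(e.Type, e.CompanyID, e.Payload["severity"], e.Payload["correlation_id"])
}
```

A legacy subscriber in this sketch still reads only Type, CompanyID and Payload, and it never needs to know whether the producer built a rich payload or a thin envelope.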

Correlation Chaining for End-to-End Tracing

One field in the rich payload deserves its own attention, because it solves a problem that asynchronous fan-out makes harder: following a single logical action across many independent reactions.

The rich payload carries a correlation identifier. When a payload is constructed, it is seeded with a random correlation ID automatically. A caller can override it through a fluent builder method to thread an existing ID through a chain of events. The intended use is to give one logical flow a shared trace ID. A request triggers an event, that event triggers a runbook, and the runbook triggers a sub-action, and all of them can carry the same correlation ID so the whole chain is stitchable in logs and audit. The override is nil-safe by design. Passing a nil identifier is a no-op that leaves the auto-generated ID in place, so a caller who has nothing to chain does not accidentally blank out the trace.
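A nil-safe fluent override can be sketched like this. The names NewPayload and WithCorrelation are hypothetical, and the random hex identifier from crypto/rand stands in for the UUID helper the article mentions.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newID returns a random 128-bit identifier as 32 hex characters.
// It is a stand-in for a UUID library.
func newID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

type Payload struct {
	CorrelationID string
	Data          map[string]any
}

// NewPayload seeds the correlation ID automatically.
func NewPayload() *Payload {
	return &Payload{CorrelationID: newID(), Data: map[string]any{}}
}

// WithCorrelation threads an existing ID through a chain of events.
// A nil pointer is a no-op, so the auto-generated ID is never blanked out.
func (p *Payload) WithCorrelation(id *string) *Payload {
	if id != nil {
		p.CorrelationID = *id
	}
	return p
}

func main() {
	// The first event in a flow keeps its generated ID.
	parent := NewPayload().WithCorrelation(nil)

	// A follow-on event reuses the parent's ID so the chain is stitchable.
	child := NewPayload().WithCorrelation(&parent.CorrelationID)

	fmt.Println(parent.CorrelationID == child.CorrelationID)
}
```

Taking a pointer rather than a plain string lets a caller pass through whatever it has, including nothing, without a conditional at every emit site.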

This is what makes an asynchronous, fan-out system observable. Without a correlation ID, a single user action that fans out into four concurrent handlers produces four unrelated-looking log lines. With one, those four lines share an identifier and can be reassembled into a coherent story after the fact. For a security platform, where you frequently need to explain exactly what happened and in what order during an investigation, that traceability is not a nicety. It is part of being auditable.

Who Subscribes: Runbooks, Notifications, SSE and Audit

The bus only matters because of what reacts to it. PMAP registers four subscriber bridges, and each one turns raw events into a different kind of value.

The runbook bridge is the automation consumer. It subscribes to the broadest set of events. On the finding side that covers created, updated, status-changed, assigned, retested, rescan-requested, note-added, rule-applied and bulk-updated findings. It also listens for scan-finished and remote-scan-completed, for report-generated, for the SLA breach and escalation events, and for asset creation and update. Each delivered event becomes an opportunity to run a runbook automation workflow. This is the bridge that turns an event-driven backbone into event-driven automation. How those workflows are authored is a topic in its own right, covered in the guide on designing runbooks.

The notification bridge is the alerting consumer. It subscribes to finding creation, assignment, status change and retest, scan completion, SLA breach and escalation, report generation, note addition and the full set of approval events. Each delivered event can produce an in-app notification for the right users. This is the path that turns a state change deep in the platform into something a human actually sees.

The realtime bridge is the live-update consumer. It subscribes to a focused subset, including finding lifecycle events, scan completion, SLA breach, note addition, report generation, asset and project changes and the notification-lifecycle events, and forwards them to connected browsers so the UI updates without polling. The bus stops at the bridge; the browser transport beyond it is a separate concern owned by the realtime layer rather than the bus itself.

The audit bridge is the record-keeping consumer. It subscribes to a smaller, security-relevant set, including finding retest, scan completion, SLA breach and escalation, report generation, asset changes and project updates, and writes a row into the activity log for each. This is how a system-wide audit trail is assembled out of the same events that drive everything else, without the producing domains having to know an audit trail exists.

Four bridges, one bus, and four entirely different outcomes from the same stream of events. That is the leverage an internal event bus provides. Each new way of reacting to platform activity is a new subscriber, not a change to the code that produces the activity.

The Limits: No Persistence, No Replay, No Backpressure

An honest description of an event bus has to include what it deliberately does not do. PMAP’s bus makes specific trade-offs, and understanding them is part of using it correctly.

The first limit is that there is no persistence and no replay. Events live entirely in memory. There is no queue, no dead-letter store and no mechanism to replay past events. If the process restarts, any in-flight events are simply dropped. This is a direct consequence of choosing an in-process bus over a durable broker. The implication is important for consumers: durability for any critical side effect is the consumer’s responsibility, not the bus’s. If an audit record absolutely must survive, the audit handler has to make that guarantee itself, because the bus offers no at-least-once delivery promise.

The second limit is in error handling. The dispatcher recovers panics, but a non-panic error returned from inside a handler is silently swallowed unless the handler logs it. There is no retry. A consumer that fails quietly will fail quietly. This pushes responsibility onto handlers to log and, where it matters, to persist their own outcomes. The bus guarantees that a failing handler will not crash the system, not that a failing handler will be noticed automatically.
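One way a consumer can take on that responsibility is to wrap its fallible logic in an adapter that always logs. This is a consumer-side pattern sketched under assumptions, not something the article attributes to PMAP; the Logged helper, its signature and the errDBDown value are all hypothetical.

```go
package main

import (
	"errors"
	"log"
)

type Event struct{ Type string }

// Handler matches a bus that ignores return values.
type Handler func(Event)

// errDBDown is a sample failure used by the demo.
var errDBDown = errors.New("database unavailable")

// Logged adapts an error-returning function into a Handler. The bus never
// sees the error, so the adapter reports it through logf instead of letting
// it vanish. There is still no retry; that would be the consumer's job too.
func Logged(name string, fn func(Event) error, logf func(format string, args ...any)) Handler {
	return func(e Event) {
		if err := fn(e); err != nil {
			logf("handler %s failed for %s: %v", name, e.Type, err)
		}
	}
}

func main() {
	audit := Logged("audit", func(Event) error { return errDBDown }, log.Printf)
	audit(Event{Type: "scan_finished"}) // the failure is logged, not swallowed
}
```

Injecting the log function keeps the adapter testable and lets each bridge route failures to whatever sink it already uses.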

The third limit is the absence of backpressure. Slow consumers do not throttle producers. Every Emit spawns goroutines unconditionally, regardless of how far behind the consumers are. Under a sudden flood of events with slow handlers, for example handlers making many database calls, goroutines can pile up. The bus optimizes for never blocking the producer, and the cost of that choice is that it does not protect itself from being overwhelmed. In practice this is bounded by the realistic event volume of the platform, but it is a real property worth knowing.

None of these are defects. They are the consequences of choosing a lightweight in-process bus, and each one is a coherent trade. The bus trades durability and flow control for simplicity, zero operational overhead and near-zero producer latency. Knowing where those edges are is what lets you build reliable consumers on top of it. Martin Fowler’s essay What do you mean by “Event-Driven”? is a useful companion here, because it separates the several distinct patterns the term covers and helps locate exactly which one an in-process notification bus is.

How PMAP Wires the Bus at Startup

The last piece is when all of this gets connected, and the answer is short: once, at boot.

Every subscription happens during server startup. The dispatcher is constructed, and then the runbook, notification, realtime and audit bridges are each registered against the event types they care about, all in the server’s main startup path. There is no dynamic subscription or unsubscription at runtime. The set of who listens to what is fixed the moment the server is running. This is a deliberate simplification. Because the wiring is static and lives in one place, the entire event graph of the system, who emits what and who reacts to it, can be read in a single file rather than discovered by tracing calls across the codebase.
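Static wiring of this kind can be expressed as a small table that is applied once at boot. The sketch below is illustrative only: the bridge struct, the wire function and the event strings are assumptions, and each bridge lists just a few of the event types the article attributes to it.

```go
package main

import (
	"fmt"
	"sort"
)

type EventType string
type Handler func(EventType)

// Dispatcher needs no locking in this sketch because all subscriptions
// happen on one goroutine before any event is emitted.
type Dispatcher struct{ handlers map[EventType][]Handler }

func NewDispatcher() *Dispatcher {
	return &Dispatcher{handlers: map[EventType][]Handler{}}
}

func (d *Dispatcher) Subscribe(t EventType, h Handler) {
	d.handlers[t] = append(d.handlers[t], h)
}

// bridge names a consumer and the event types it listens to.
type bridge struct {
	name   string
	events []EventType
	handle Handler
}

// defaultBridges lists a small, illustrative subset of each bridge's events.
func defaultBridges() []bridge {
	noop := func(EventType) {}
	return []bridge{
		{"runbook", []EventType{"finding_created", "scan_finished", "sla_escalated"}, noop},
		{"notification", []EventType{"finding_created", "scan_finished", "sla_escalated"}, noop},
		{"realtime", []EventType{"finding_created", "scan_finished"}, noop},
		{"audit", []EventType{"scan_finished", "sla_escalated"}, noop},
	}
}

// wire registers every bridge and returns the resulting event graph:
// which consumers react to which event type.
func wire(d *Dispatcher, bridges []bridge) map[EventType][]string {
	graph := map[EventType][]string{}
	for _, b := range bridges {
		for _, t := range b.events {
			d.Subscribe(t, b.handle)
			graph[t] = append(graph[t], b.name)
		}
	}
	return graph
}

func main() {
	graph := wire(NewDispatcher(), defaultBridges())
	types := make([]string, 0, len(graph))
	for t := range graph {
		types = append(types, string(t))
	}
	sort.Strings(types)
	for _, t := range types {
		fmt.Println(t, "->", graph[EventType(t)])
	}
}
```

Because the table is the only place subscriptions are made, printing it is enough to see the whole event graph.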

Boot-time registration also reinforces the decoupling described earlier. The producing domains are constructed with a reference to the same dispatcher, and the consumers are subscribed to it, and that is the full extent of their relationship. A domain added later that needs to react to existing events adds a Subscribe call at startup and nothing else. A domain that needs to announce something new emits a new event type and lets whoever cares subscribe. The startup wiring is the seam where the whole event-driven design is made explicit and kept honest.

That is the shape of an internal event bus in a security platform. It is a single in-process dispatcher, with producers that emit and forget and consumers that subscribe independently. It provides asynchronous fan-out with panic isolation and background context, a typed catalog of 34 event kinds, a thin envelope bridged to a versioned rich payload and correlation IDs for tracing. Four bridges turn one stream into automation, alerts, live updates and audit, and a clear set of limits follows from keeping it all in-process. It is not glamorous infrastructure. It is the quiet layer that lets everything above it stay decoupled, fast and observable.

To see how these bus events become real alerts and automation across channels, read the notifications and real-time operations datasheet. It shows the event-to-channel path in production detail.

Frequently Asked Questions

What is an internal event bus in a security platform?

It is an in-process publish/subscribe backbone that decouples the code that changes state from the code that reacts to those changes. Producers emit named events when something happens, such as a finding being created or a scan finishing, and consumers subscribe to the event types they care about. The bus matches the two without either side referencing the other. In PMAP this is a single dispatcher with no HTTP surface, no UI and no database, used by every domain as either a producer or a consumer.

Why use an in-process event bus instead of Kafka or RabbitMQ?

Because the producers and consumers run in the same process and the events never need to cross a process boundary. An in-process dispatcher means no network hop, no broker to deploy, no serialization and no separate cluster to operate. An event is a plain in-memory value handed from emitter to handler. The bus carries zero application dependencies, which removes any circular-dependency risk and any reason to push it out of process. A durable external broker would add operational weight that this shape of system does not need.

How does asynchronous event dispatch avoid blocking the caller?

When a producer emits an event, the dispatcher launches each subscribed handler in its own goroutine and returns immediately without waiting for any of them. This is fire-and-forget dispatch. The producer, often an HTTP request handler, finishes quickly while the side effects, such as notifications, runbook evaluation, live updates and audit writes, run in the background. The trade-off is that handlers for the same event run concurrently, so there is no guaranteed ordering between them.

What happens if an event handler crashes?

Each handler runs inside a goroutine that wraps the call in a recovery block. If a handler panics, the dispatcher catches the panic, logs it with the event type and the panic value, and continues. The panic does not propagate to the producer and does not affect any other handler subscribed to the same event. This panic isolation means one buggy consumer degrades exactly one reaction rather than crashing the process or breaking the request that emitted the event.

What does the correlation ID do?

It gives one logical action a shared trace identifier that survives across many independent reactions. A rich event payload is seeded with a random correlation ID, and a caller can override it to thread an existing ID through a chain, for example from a request to an event to a runbook to a sub-action. Because asynchronous fan-out turns one action into several concurrent handlers, the correlation ID is what lets those handlers be reassembled into a single coherent story in logs and audit. The override is nil-safe, so passing nothing leaves the auto-generated ID in place.

What are the limits of an in-process event bus?

It has no persistence and no replay, so events live only in memory and are dropped if the process restarts. It has no retry, so a non-panic error inside a handler is swallowed unless the handler logs it. It has no backpressure, so slow consumers do not throttle producers and goroutines can pile up under a flood of events. These are deliberate trade-offs for simplicity and near-zero producer latency. Durability for any critical side effect is the consumer’s responsibility, not the bus’s.

How many event types does PMAP define?

PMAP defines 34 named event type constants. They cover the finding lifecycle, the approval workflow, scans, SLA breaches and escalations, asset and company lifecycles, project updates, assessment runs, a maintenance event, reporting, notification lifecycle and a set of CI and version-control events. Producers reference these constants rather than free-form strings, so the compiler enforces valid event types. Some string values are also pinned in database CHECK constraints and a unit test, which makes a rename a deliberate migration concern rather than a casual refactor.
