Four-Channel Communication

AuthNexus server nodes communicate with the Control Plane (CP) through four channels. Each channel has a distinct transport, purpose, and failure mode. The long-lived SSE stream, the short request/response calls, and OCSP traffic run on separate thread pools, so a stall or failure in one channel does not block the others.

Overview

Channel	Transport	Default Port	Direction	Purpose
1 -- Command	mTLS HTTP	9091	Server -> CP	Node checkin, command/config pull, result reporting
2 -- Event SSE	mTLS HTTP (SSE)	9091 (shared)	CP -> Server	Real-time push hints
3 -- DB Delta	mTLS HTTP poll (over Channel 1)	9091 (via Channel 1)	Server -> CP	Incremental security data sync
4 -- OCSP	Plain HTTP	9092	Server -> CP	OCSP stapling responses

Channel 1: Command

The primary request-response channel. The server node initiates all calls to the Control Plane over mTLS HTTP.

Endpoints

Method	Path	Description
`POST`	`/cp/v2/nodes/:id/checkin`	Heartbeat with application version snapshot and metrics
`POST`	`/cp/v2/nodes/:id/commands/pull?since=N`	Pull new commands since sequence N
`POST`	`/cp/v2/nodes/:id/commands/:cmd_id/result`	Report per-command execution result
`POST`	`/cp/v2/nodes/:id/configs/pull`	Pull new configuration snapshots and cloud function manifests
`POST`	`/cp/v2/nodes/:id/configs/acks`	Batch ACK applied configurations (single-transaction batch upsert)
`POST`	`/cp/v2/nodes/:id/configs/errors`	Batch report configuration apply errors
`GET`	`/cp/v2/objects/cloud-functions/:name?app_id=`	Download cloud function script body

Implementation

On the server side, ControlPlaneAgent runs on a background strand and offloads each blocking httplib call to the cp_io thread pool via co_await asio::post(cp_io). The blocking I/O therefore occupies a cp_io worker thread instead of the background runtime, which stays responsive while HTTP calls are in flight.
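The offloading idea can be illustrated without asio. The sketch below is a hypothetical, std-only stand-in: `WorkerPool` plays the role of cp_io and `blocking_checkin()` stands in for a blocking httplib call. One difference from the real design: here the caller holds a `std::future` and blocks only if it calls `get()`, whereas `co_await asio::post(cp_io)` suspends the coroutine so the strand itself never blocks.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical fixed-size pool standing in for cp_io (the real agent uses asio).
class WorkerPool {
public:
    explicit WorkerPool(std::size_t threads) {
        for (std::size_t i = 0; i < threads; ++i) {
            workers_.emplace_back([this] { run(); });
        }
    }
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();  // drains queued jobs, then exits
    }

    // Queue a (possibly blocking) callable; the caller receives a future
    // and can keep doing other work until it needs the result.
    template <typename F>
    auto submit(F fn) -> std::future<decltype(fn())> {
        using R = decltype(fn());
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(fn));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mu_);
            jobs_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (stopping_ && jobs_.empty()) return;
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // the blocking work happens here, on a pool thread
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};

// Hypothetical stand-in for a blocking POST .../checkin; returns an HTTP status.
int blocking_checkin() {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return 200;
}
```

Usage: `auto status = cp_io.submit(blocking_checkin);` returns immediately, and the submitting thread retrieves the status later with `status.get()`.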

Each endpoint follows a single-responsibility pattern. Configs are pulled, applied, and acknowledged as separate operations, so partial failures can be retried without replaying the entire sync.

Checkin Payload

The checkin request includes:

Application version snapshot -- the set of applications and their current config versions known to the node.
Metrics -- connection counts, request rates, error rates, resource usage.
Node metadata -- hostname, OS, uptime, software version.

The CP uses this data to power the admin dashboard's node overview, online user presence, and health monitoring.
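The document does not specify the wire format of the checkin body. Assuming JSON, a hypothetical shape covering the three groups above could look like the sketch below; every field name (`app_versions`, `connections`, `requests_per_sec`, and so on) is illustrative, not the real schema.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical checkin body; field names are illustrative only.
struct CheckinPayload {
    // Application version snapshot: app_id -> config version known to the node.
    std::map<std::string, std::uint64_t> app_versions;
    // Metrics.
    std::uint64_t connections = 0;
    double requests_per_sec = 0.0;
    double error_rate = 0.0;
    // Node metadata.
    std::string hostname;
    std::string os;
    std::uint64_t uptime_sec = 0;
    std::string software_version;
};

// Escapes quotes and backslashes only (control characters are not handled).
std::string escape(const std::string& s) {
    std::string r;
    for (char c : s) {
        if (c == '"' || c == '\\') r += '\\';
        r += c;
    }
    return r;
}

std::string to_json(const CheckinPayload& p) {
    std::ostringstream out;
    out << "{\"apps\":{";
    bool first = true;
    for (const auto& [app_id, version] : p.app_versions) {
        if (!first) out << ',';
        first = false;
        out << '"' << escape(app_id) << "\":" << version;
    }
    out << "},\"metrics\":{\"connections\":" << p.connections
        << ",\"rps\":" << p.requests_per_sec
        << ",\"error_rate\":" << p.error_rate
        << "},\"node\":{\"hostname\":\"" << escape(p.hostname)
        << "\",\"os\":\"" << escape(p.os)
        << "\",\"uptime_sec\":" << p.uptime_sec
        << ",\"version\":\"" << escape(p.software_version) << "\"}}";
    return out.str();
}
```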

Channel 2: Event SSE

A Server-Sent Events (SSE) long connection from the Control Plane to each server node, sharing the same mTLS port (9091).

Behavior

One connection per node -- when a new SSE connection arrives, the old one is evicted.
Keep-alive -- ping frames sent every 30 seconds; a health probe monitors connectivity.
Queue overflow -- if the event queue fills, the oldest events are dropped (events are hints, not data).
Events are hints only -- they carry no payload. Detailed data is always pulled via Channel 1.
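A minimal sketch of the CP-side bookkeeping these rules imply, assuming one bounded queue per node. `NodeEventHub`, `Subscription`, and `drain()` are hypothetical names, not the actual `cp::event::EventBus` API. A new connection marks the previous subscription as evicted, and a full queue drops its oldest hint. Dropping is safe only because hints carry no data: the node's periodic pulls still catch up on anything a lost hint would have triggered.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// One SSE stream's outbound queue of event names (hints carry no payload).
struct Subscription {
    explicit Subscription(std::size_t cap) : capacity(cap) {}
    std::size_t capacity;
    std::deque<std::string> pending;
    bool evicted = false;
    std::size_t dropped = 0;
};

class NodeEventHub {
public:
    explicit NodeEventHub(std::size_t queue_capacity)
        : capacity_(queue_capacity == 0 ? 1 : queue_capacity) {}

    // A new SSE connection for node_id replaces (evicts) any existing one.
    std::shared_ptr<Subscription> connect(const std::string& node_id) {
        std::lock_guard<std::mutex> lock(mu_);
        auto sub = std::make_shared<Subscription>(capacity_);
        auto& slot = subs_[node_id];
        if (slot) slot->evicted = true;  // the old stream should close itself
        slot = sub;
        return sub;
    }

    // Enqueue a hint; when the queue is full, drop the oldest hint first.
    void publish(const std::string& node_id, const std::string& event) {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = subs_.find(node_id);
        if (it == subs_.end()) return;  // no stream: the node will poll anyway
        Subscription& sub = *it->second;
        if (sub.pending.size() == sub.capacity) {
            sub.pending.pop_front();
            ++sub.dropped;
        }
        sub.pending.push_back(event);
    }

    // Called by the streaming side; returns false once the stream is evicted.
    bool drain(const std::shared_ptr<Subscription>& sub, std::deque<std::string>& out) {
        std::lock_guard<std::mutex> lock(mu_);
        if (sub->evicted) return false;
        out.clear();
        out.swap(sub->pending);
        return true;
    }

private:
    std::size_t capacity_;
    std::mutex mu_;
    std::map<std::string, std::shared_ptr<Subscription>> subs_;
};
```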

Event Types

Event	Meaning
`command.pending`	New commands available; trigger a Channel 1 command pull
`config.pending`	New configuration available; trigger a Channel 1 config pull
`blacklist.changed`	Server blacklist updated; trigger a Channel 3 delta pull
`epoch.bumped`	Auth epoch advanced; trigger a Channel 3 delta pull

Implementation

The CP side uses cp::event::EventBus, an in-memory pub/sub system, with httplib chunked content providers for SSE streaming.

The server side runs CpEventSubscriber as an IBackgroundService. It parses the SSE stream and dispatches callbacks to the background runtime. The subscriber runs on a dedicated sse_pool (fixed at 1 thread), physically isolated from the cp_io pool. This prevents SSE's long-lived blocking httplib::Get from starving short-burst CP HTTP calls.
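A sketch of the subscriber's parse-and-dispatch step, assuming the standard `text/event-stream` framing (an `event:` field names the event, a blank line terminates a frame, lines starting with `:` are comments) and assuming keep-alive pings arrive as comment lines. `SseParser` and the `Action` enum are hypothetical. The parser accepts arbitrary chunk boundaries because a chunked HTTP body can split a line anywhere.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Incremental text/event-stream parser: feed raw chunks as they arrive;
// it reports the event name of each complete frame that has one.
class SseParser {
public:
    explicit SseParser(std::function<void(const std::string&)> on_event)
        : on_event_(std::move(on_event)) {}

    void feed(const std::string& chunk) {
        buffer_ += chunk;
        std::size_t pos;
        while ((pos = buffer_.find('\n')) != std::string::npos) {
            std::string line = buffer_.substr(0, pos);
            buffer_.erase(0, pos + 1);
            if (!line.empty() && line.back() == '\r') line.pop_back();  // CRLF
            handle_line(line);
        }
    }

private:
    void handle_line(const std::string& line) {
        if (line.empty()) {  // blank line ends the frame
            if (!event_.empty()) on_event_(event_);
            event_.clear();
            return;
        }
        if (line[0] == ':') return;  // comment, e.g. a keep-alive ping
        const auto colon = line.find(':');
        std::string field = line.substr(0, colon);
        std::string value = colon == std::string::npos ? "" : line.substr(colon + 1);
        if (!value.empty() && value[0] == ' ') value.erase(0, 1);
        if (field == "event") event_ = value;
        // "data", "id", and "retry" are ignored: these hints carry no payload.
    }

    std::function<void(const std::string&)> on_event_;
    std::string buffer_;
    std::string event_;
};

// Which pull each hint schedules, following the Event Types table.
enum class Action { PullCommands, PullConfigs, PullDelta };

const std::map<std::string, Action> kEventActions = {
    {"command.pending", Action::PullCommands},
    {"config.pending", Action::PullConfigs},
    {"blacklist.changed", Action::PullDelta},
    {"epoch.bumped", Action::PullDelta},
};
```

Unknown event names simply find no entry in `kEventActions` and are ignored, which keeps older nodes tolerant of new hint types.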

Degraded Mode

If the SSE connection drops, the server falls back to aggressive polling on Channels 1 and 3 until SSE recovers. For example, the Channel 3 delta pulls tighten from 5 s / 30 s (epoch / blacklist) to 800 ms / 5 s.

Channel 3: DB Delta

Incremental synchronization of security-critical data from the Control Plane to the server's local Runtime DB and in-memory caches.

Synchronized Tables

Table	Content
`auth_epoch_changes`	Per-user auth epoch bumps (session invalidation)
`server_blacklist_changes`	IP/device/user blacklist entries

Implementation

RuntimeSecurityDeltaPuller continuously pulls deltas via Channel 1 endpoints. Timing adapts to SSE health:

SSE Status	Delta Pull Interval
Healthy	5 s (epoch) / 30 s (blacklist)
Degraded	800 ms (epoch) / 5 s (blacklist)
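A sketch of how a puller could combine these intervals with SSE hints, assuming a hint (`epoch.bumped` or `blacklist.changed`) makes the matching table due immediately. `DeltaSchedule` and its methods are hypothetical; only the interval values come from the table.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;
using std::chrono::milliseconds;

// Interval pairs from the table: healthy vs degraded SSE.
milliseconds epoch_interval(bool sse_healthy) {
    return sse_healthy ? milliseconds(5000) : milliseconds(800);
}
milliseconds blacklist_interval(bool sse_healthy) {
    return sse_healthy ? milliseconds(30000) : milliseconds(5000);
}

// Tracks when each table was last pulled and whether a hint arrived since.
class DeltaSchedule {
public:
    explicit DeltaSchedule(Clock::time_point start)
        : last_epoch_(start), last_blacklist_(start) {}

    void hint_epoch() { epoch_hinted_ = true; }
    void hint_blacklist() { blacklist_hinted_ = true; }

    bool epoch_due(Clock::time_point now, bool sse_healthy) const {
        return epoch_hinted_ || now - last_epoch_ >= epoch_interval(sse_healthy);
    }
    bool blacklist_due(Clock::time_point now, bool sse_healthy) const {
        return blacklist_hinted_ ||
               now - last_blacklist_ >= blacklist_interval(sse_healthy);
    }

    // Call after a successful pull: restart the timer and clear the hint.
    void mark_epoch_pulled(Clock::time_point now) {
        last_epoch_ = now;
        epoch_hinted_ = false;
    }
    void mark_blacklist_pulled(Clock::time_point now) {
        last_blacklist_ = now;
        blacklist_hinted_ = false;
    }

private:
    Clock::time_point last_epoch_;
    Clock::time_point last_blacklist_;
    bool epoch_hinted_ = false;
    bool blacklist_hinted_ = false;
};
```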

The server's local Runtime DB is the single source of truth for blacklist enforcement. There is no full-snapshot push from the CP -- only incremental deltas are applied.

Why Incremental Only?

Full snapshots would be expensive for large blacklists and would create thundering-herd problems when many nodes reconnect simultaneously. Incremental deltas are small, idempotent, and can be applied in any order. The since cursor ensures no changes are missed even if the node was offline for an extended period.
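One way to make deltas idempotent and order-independent, shown as a hypothetical sketch: keep the maximum epoch per user, and let the higher sequence number win per blacklist key. The row shapes, the per-table cursors, and both merge rules are assumptions. The cursor logic also assumes the CP returns every change with a sequence number above `since`, with no gaps.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row shapes; the real column names are not documented here.
struct EpochChange { std::uint64_t seq; std::string user_id; std::uint64_t epoch; };
struct BlacklistChange { std::uint64_t seq; std::string key; bool blocked; };

struct LocalSecurityState {
    std::uint64_t epoch_cursor = 0;      // sent as ?since= for auth_epoch_changes
    std::uint64_t blacklist_cursor = 0;  // sent as ?since= for server_blacklist_changes
    std::map<std::string, std::uint64_t> user_epoch;
    // key -> (seq of the last applied write, blocked?)
    std::map<std::string, std::pair<std::uint64_t, bool>> blacklist;

    void apply(const EpochChange& c) {
        auto& e = user_epoch[c.user_id];
        e = std::max(e, c.epoch);  // epochs only move forward
        epoch_cursor = std::max(epoch_cursor, c.seq);
    }

    void apply(const BlacklistChange& c) {
        auto it = blacklist.find(c.key);
        if (it == blacklist.end() || it->second.first < c.seq) {
            blacklist[c.key] = {c.seq, c.blocked};  // newer write wins
        }
        blacklist_cursor = std::max(blacklist_cursor, c.seq);
    }

    bool is_blocked(const std::string& key) const {
        auto it = blacklist.find(key);
        return it != blacklist.end() && it->second.second;
    }
};
```

Replaying a batch, or receiving its rows out of order, leaves the state unchanged because both merge rules are maximums.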

Channel 4: OCSP

A plain HTTP endpoint for OCSP stapling responses, running on a separate port and thread pool.

Why Plain HTTP?

OCSP responses are signed by the responder (RFC 6960; here with RSA-SHA256), so clients can verify their integrity and origin regardless of transport. TLS transport is therefore unnecessary and would only add mTLS handshake overhead. A dedicated port also avoids resource contention with the long-lived SSE connection on Channel 2.

Endpoint

Method	Path	Port	Description
`POST`	`/ocsp/`	9092	Fetch OCSP response for node's server certificate

Behavior

Server-side (OcspStaplingManager): periodically fetches OCSP responses (~15 minutes, self-adaptive based on nextUpdate/2). Responses are cached locally and stapled into TLS handshakes with SDK clients.
CP-side (OcspResponder): signs responses on demand; does not cache.
Revocation handling: when a revoked response is received, the node continues stapling the revoked response to SDK clients. SDK clients with must-staple certificates will reject the handshake (fail-closed on the business side). However, the node does not self-shutdown -- management channels (mTLS /cp/v2/*, SSE, delta) remain alive so the admin can re-enable the node and it recovers automatically.
SDK verification: the SDK uses ocsp_signer_ca.pem (the tcp_server_ca trust anchor) to verify the OCSP signature, checks the time window (maxsec=1800s), and validates cert_status. Certificates with the must-staple extension reject handshakes when no staple is present.
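A sketch of both timing rules. It assumes "nextUpdate/2" means half of the remaining validity window, and that the node caps its refresh at the ~15 minute default so a staple never ages past the SDK's 1800 s `maxsec`. The time checks in `sdk_accepts()` are modeled on OpenSSL's `OCSP_check_validity()`. The 300 s clock-skew tolerance, the 30 s floor, the `Staple` struct, and soft-fail for certificates without must-staple are hypothetical; signature verification against ocsp_signer_ca.pem is represented by a precomputed flag.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <optional>

using Seconds = std::chrono::seconds;
using TimePoint = std::chrono::system_clock::time_point;

// Node side (hypothetical): refresh after half the remaining validity,
// clamped to [30 s, 15 min]; 15 min if the response has no nextUpdate.
Seconds next_ocsp_refresh(TimePoint now, std::optional<TimePoint> next_update) {
    constexpr Seconds kDefault{15 * 60};
    constexpr Seconds kFloor{30};
    if (!next_update) return kDefault;
    auto remaining = std::chrono::duration_cast<Seconds>(*next_update - now);
    return std::clamp(remaining / 2, kFloor, kDefault);
}

enum class CertStatus { Good, Revoked, Unknown };

struct Staple {
    TimePoint this_update;
    std::optional<TimePoint> next_update;
    bool signature_ok;  // stands in for verification against ocsp_signer_ca.pem
    CertStatus status;
};

// SDK side: accept the handshake only if the staple passes every check.
bool sdk_accepts(const std::optional<Staple>& staple, bool must_staple, TimePoint now,
                 Seconds skew = Seconds(300), Seconds maxsec = Seconds(1800)) {
    if (!staple) return !must_staple;  // must-staple: no staple means reject
    if (!staple->signature_ok) return false;
    if (staple->this_update > now + skew) return false;   // issued in the future
    if (now - staple->this_update > maxsec) return false;  // older than maxsec
    if (staple->next_update) {
        if (*staple->next_update < now - skew) return false;  // expired
        if (*staple->next_update < staple->this_update) return false;
    }
    return staple->status == CertStatus::Good;  // revoked/unknown: reject
}
```

Because a revoked staple fails only the final status check, this matches the behavior above: the node keeps stapling it, and must-staple clients refuse the handshake.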

Thread Isolation

Channel 4 runs on an independent httplib::Server with its own thread pool (ocsp_thread_pool_count), completely isolated from Channels 1 and 2.

Failure Modes and Recovery

Each channel is designed to degrade independently:

Channel	Failure Mode	Recovery Behavior
Ch1 (Command)	HTTP timeout / network error	Retry with exponential backoff; node continues with cached data
Ch2 (SSE)	Connection dropped	Auto-reconnect; system falls back to aggressive polling via Ch1/Ch3
Ch3 (Delta)	Stale cursor / missed events	`since` cursor guarantees catch-up on next successful pull
Ch4 (OCSP)	Fetch failure	Cached OCSP response keeps being stapled until its `nextUpdate` time passes; SDK clients also reject staples older than maxsec (1800s)
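For the Ch1 row, a minimal backoff sketch. The base delay, the cap, and the "full jitter" randomization are assumptions (the document only says "exponential backoff"); jitter keeps nodes that lost the CP at the same moment from retrying in lockstep. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>

using std::chrono::milliseconds;

// Upper bound for retry number `attempt` (0-based): base * 2^attempt, capped.
milliseconds backoff_ceiling(unsigned attempt, milliseconds base, milliseconds cap) {
    const unsigned shift = std::min(attempt, 30u);  // keep the shift from overflowing
    const std::int64_t raw = static_cast<std::int64_t>(base.count()) << shift;
    return milliseconds(std::min<std::int64_t>(raw, cap.count()));
}

// Full jitter: pick a uniform delay in [0, ceiling].
milliseconds backoff_delay(unsigned attempt, milliseconds base, milliseconds cap,
                           std::mt19937_64& rng) {
    std::uniform_int_distribution<std::int64_t> dist(
        0, backoff_ceiling(attempt, base, cap).count());
    return milliseconds(dist(rng));
}
```

A caller would reset `attempt` to 0 after any successful call, so a single transient error does not leave the node on a long delay.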

The key design principle: no channel failure prevents the server from serving authenticated clients, as long as the cached OCSP staple is still within the window SDK clients accept. The worst case is delayed security policy updates (epoch bumps, blacklist changes), which are bounded by the degraded polling intervals.

Channel Interaction Summary

Server Node                              Control Plane
    │                                          │
    │──── Ch1: POST /cp/v2/.../checkin ──────>│  (periodic heartbeat)
    │                                          │
    │<──── Ch2: SSE event: command.pending ───│  (real-time hint)
    │                                          │
    │──── Ch1: POST /cp/v2/.../commands/pull ─>│  (pull actual data)
    │                                          │
    │──── Ch3: delta pull (via Ch1) ─────────>│  (security sync)
    │                                          │
    │──── Ch4: POST /ocsp/ ──────────────────>│  (cert revocation check)
    │                                          │

The design ensures:

No single point of failure -- each channel degrades independently.
No head-of-line blocking -- SSE, short HTTP, and OCSP use separate thread pools.
Minimal data duplication -- SSE events are hints; actual data is always pulled, ensuring consistency.

Configuration Delivery via Channel 1

Beyond security deltas, Channel 1 is the primary vehicle for delivering all configuration changes to server nodes. The configuration pull endpoint returns manifests for all six registered config types:

Config Type	Content
`server_runtime_settings`	Server runtime parameters (timeouts, limits)
`app_policy`	Application-level policies
`app_variables`	Cloud variables readable by SDK clients
`app_client_ca_bundle`	Client CA trust bundle for mTLS
`app_mtls_trust_bundle`	mTLS trust bundle published to nodes
`cp_agent_runtime`	CP agent runtime parameters

The delivery lifecycle:

An admin updates a configuration via the admin API.
The CP sends a `config.pending` SSE hint to the affected node(s).
The node pulls the manifest via /cp/v2/nodes/:id/configs/pull.
The node applies the configuration locally.
The node sends ACKs via /cp/v2/nodes/:id/configs/acks.
If applying a configuration fails, the node reports the error via /cp/v2/nodes/:id/configs/errors.
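The node-side part of this lifecycle (apply, then ACK or report errors) as a hypothetical sketch: each pulled manifest is applied independently, successes go into one ACK batch and failures into one error batch. The struct and function names are illustrative. The idea that only failed entries are retried assumes the CP keeps offering any version the node has not ACKed.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical shapes for one manifest entry and the two report batches.
struct ConfigManifest {
    std::string config_type;  // e.g. "app_policy"
    std::string app_id;
    std::uint64_t version;
    std::string body;
};
struct ConfigAck { std::string config_type; std::string app_id; std::uint64_t version; };
struct ConfigError {
    std::string config_type;
    std::string app_id;
    std::uint64_t version;
    std::string message;
};

struct SyncReport {
    std::vector<ConfigAck> acks;      // sent in one POST .../configs/acks
    std::vector<ConfigError> errors;  // sent in one POST .../configs/errors
};

// Apply every manifest independently; one failure does not stop the rest.
SyncReport apply_manifests(
    const std::vector<ConfigManifest>& pulled,
    const std::function<bool(const ConfigManifest&, std::string&)>& apply) {
    SyncReport report;
    for (const auto& m : pulled) {
        std::string err;
        if (apply(m, err)) {
            report.acks.push_back({m.config_type, m.app_id, m.version});
        } else {
            report.errors.push_back({m.config_type, m.app_id, m.version, err});
        }
    }
    return report;
}
```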

Next Steps

TLS & PKI -- certificate infrastructure that secures these channels
Security Model -- fail-closed semantics and blacklist enforcement
Operations Manual -- monitoring channel health
