Four-Channel Communication
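AuthNexus server nodes communicate with the Control Plane (CP) through four separate channels. Each channel has a distinct transport, purpose, and failure mode, ensuring that no single bottleneck can disrupt the entire system. Channels 1 and 2 share a port but run on separate thread pools, Channel 3 is carried over Channel 1 requests, and Channel 4 has its own port and thread pool.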
Overview
| Channel | Transport | Default Port | Direction | Purpose |
|---|---|---|---|---|
| 1 -- Command | mTLS HTTP | 9091 | Server -> CP | Node checkin, command/config pull, result reporting |
| 2 -- Event SSE | mTLS HTTP (SSE) | 9091 (shared) | CP -> Server | Real-time push hints |
| 3 -- DB Delta | Internal poll | -- | Server -> CP (via Channel 1) | Incremental security data sync |
| 4 -- OCSP | Plain HTTP | 9092 | Server -> CP | OCSP stapling responses |
Channel 1: Command
The primary request-response channel. The server node initiates all calls to the Control Plane over mTLS HTTP.
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /cp/v2/nodes/:id/checkin | Heartbeat with application version snapshot and metrics |
| POST | /cp/v2/nodes/:id/commands/pull?since=N | Pull new commands since sequence N |
| POST | /cp/v2/nodes/:id/commands/:cmd_id/result | Report per-command execution result |
| POST | /cp/v2/nodes/:id/configs/pull | Pull new configuration snapshots and cloud function manifests |
| POST | /cp/v2/nodes/:id/configs/acks | Batch ACK applied configurations (single-transaction batch upsert) |
| POST | /cp/v2/nodes/:id/configs/errors | Batch report configuration apply errors |
| GET | /cp/v2/objects/cloud-functions/:name?app_id= | Download cloud function script body |
Implementation
On the server side, ControlPlaneAgent (a background strand) dispatches blocking httplib calls onto the cp_io thread pool via co_await asio::post(cp_io). This keeps the background runtime responsive while HTTP calls block.
Each endpoint follows a single-responsibility pattern. Configs are pulled, applied, and acknowledged as separate operations, so partial failures can be retried without replaying the entire sync.
Checkin Payload
The checkin request includes:
- Application version snapshot -- the set of applications and their current config versions known to the node.
- Metrics -- connection counts, request rates, error rates, resource usage.
- Node metadata -- hostname, OS, uptime, software version.
The CP uses this data to power the admin dashboard's node overview, online user presence, and health monitoring.
Channel 2: Event SSE
A long-lived Server-Sent Events (SSE) stream over which the Control Plane pushes hints to each server node. The node opens the stream, which shares the mTLS port (9091) with Channel 1.
Behavior
- One connection per node -- when a new SSE connection arrives, the old one is evicted.
- Keep-alive -- ping frames sent every 30 seconds; a health probe monitors connectivity.
- Queue overflow -- if the event queue fills, the oldest events are dropped (events are hints, not data).
- Events are hints only -- they carry no payload. Detailed data is always pulled via Channel 1.
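The drop-oldest overflow policy described above can be sketched as a small bounded queue. This is a minimal illustration, not the actual EventBus code: HintQueue, its methods, and the capacity are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical bounded queue for SSE hints. When the queue is full, the
// oldest hint is discarded so that the newest one always gets in. Losing a
// hint is acceptable because hints carry no data of their own.
class HintQueue {
public:
    // A capacity of 0 is treated as 1 so that push() always has room to work.
    explicit HintQueue(std::size_t capacity)
        : capacity_(capacity == 0 ? 1 : capacity) {}

    // Enqueues a hint. Returns true if an older hint was dropped to make room.
    bool push(std::string hint) {
        bool dropped = false;
        if (queue_.size() >= capacity_) {
            queue_.pop_front();
            dropped = true;
        }
        queue_.push_back(std::move(hint));
        return dropped;
    }

    // Dequeues the oldest remaining hint. Returns false if the queue is empty.
    bool pop(std::string& out) {
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

    std::size_t size() const { return queue_.size(); }

private:
    std::size_t capacity_;
    std::deque<std::string> queue_;
};
```

Because the periodic polling on Channels 1 and 3 still runs, a dropped hint only delays the corresponding pull; it does not lose data.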
Event Types
| Event | Meaning |
|---|---|
| command.pending | New commands available; trigger a Channel 1 command pull |
| config.pending | New configuration available; trigger a Channel 1 config pull |
| blacklist.changed | Server blacklist updated; trigger a Channel 3 delta pull |
| epoch.bumped | Auth epoch advanced; trigger a Channel 3 delta pull |
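The mapping in this table is small enough to express as a single lookup. The sketch below assumes hypothetical names (PullAction, action_for_event), and it assumes that unknown event names are simply ignored, which the table does not specify.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical set of follow-up actions a node can take after a hint.
enum class PullAction { CommandPull, ConfigPull, DeltaPull, None };

// Maps an SSE event name to the pull it should trigger.
// Assumption: unrecognized events map to None and are ignored.
inline PullAction action_for_event(std::string_view event) {
    if (event == "command.pending") return PullAction::CommandPull;
    if (event == "config.pending") return PullAction::ConfigPull;
    if (event == "blacklist.changed") return PullAction::DeltaPull;
    if (event == "epoch.bumped") return PullAction::DeltaPull;
    return PullAction::None;
}
```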
Implementation
The CP side uses cp::event::EventBus, an in-memory pub/sub system, with httplib chunked content providers for SSE streaming.
The server side runs CpEventSubscriber as an IBackgroundService. It parses the SSE stream and dispatches callbacks to the background runtime. The subscriber runs on a dedicated sse_pool (fixed at 1 thread), physically isolated from the cp_io pool. This prevents SSE's long-lived blocking httplib::Get from starving short-burst CP HTTP calls.
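To illustrate the parsing step, here is a minimal incremental parser for the SSE wire format. It is a sketch under assumptions, not the CpEventSubscriber code: the class name SseEventParser is hypothetical, it reports an event as soon as a frame with an event: field ends (regardless of any data: field, since the hints carry no payload), and it treats lines starting with a colon as comments, which is how the SSE format marks them and how keep-alive pings are commonly sent.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical incremental SSE parser that extracts event names only.
class SseEventParser {
public:
    // Feeds one network chunk. Chunks may split lines at arbitrary points.
    // Returns the names of all events completed by this chunk.
    std::vector<std::string> feed(std::string_view chunk) {
        std::vector<std::string> completed;
        buffer_.append(chunk.data(), chunk.size());

        std::size_t newline;
        while ((newline = buffer_.find('\n')) != std::string::npos) {
            std::string line = buffer_.substr(0, newline);
            buffer_.erase(0, newline + 1);
            if (!line.empty() && line.back() == '\r') line.pop_back();

            if (line.empty()) {
                // A blank line terminates the current frame.
                if (!event_.empty()) completed.push_back(event_);
                event_.clear();
            } else if (line[0] == ':') {
                // Comment line: nothing to dispatch.
            } else if (line.rfind("event:", 0) == 0) {
                std::string_view value(line);
                value.remove_prefix(6);  // strip "event:"
                if (!value.empty() && value.front() == ' ') value.remove_prefix(1);
                event_ = std::string(value);
            }
            // Other fields (data:, id:, retry:) are ignored in this sketch.
        }
        return completed;
    }

private:
    std::string buffer_;  // bytes received but not yet terminated by '\n'
    std::string event_;   // event name of the frame currently being read
};
```

Each returned name can then be handed to the background runtime as a callback, keeping the blocking read loop on the dedicated SSE thread.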
Degraded Mode
If the SSE connection drops, the server falls back to aggressive polling on Channels 1 and 3. Polling intervals tighten (for example, epoch deltas every 800ms instead of 5s, and blacklist deltas every 5000ms instead of 30s) to maintain responsiveness until SSE recovers.
Channel 3: DB Delta
Incremental synchronization of security-critical data from the Control Plane to the server's local Runtime DB and in-memory caches.
Synchronized Tables
| Table | Content |
|---|---|
| auth_epoch_changes | Per-user auth epoch bumps (session invalidation) |
| server_blacklist_changes | IP/device/user blacklist entries |
Implementation
RuntimeSecurityDeltaPuller continuously pulls deltas via Channel 1 endpoints. Timing adapts to SSE health:
| SSE Status | Delta Pull Interval |
|---|---|
| Healthy | 5s (epoch) / 30s (blacklist) |
| Degraded | 800ms (epoch) / 5000ms (blacklist) |
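The interval table can be read as a function of delta kind and SSE health. The following sketch uses the values from the table; the names DeltaKind and delta_pull_interval are hypothetical.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical tag for the two synchronized delta streams.
enum class DeltaKind { Epoch, Blacklist };

// Returns the pull interval for a delta stream given current SSE health.
// The values mirror the table above.
inline std::chrono::milliseconds delta_pull_interval(DeltaKind kind, bool sse_healthy) {
    using std::chrono::milliseconds;
    if (sse_healthy) {
        return kind == DeltaKind::Epoch ? milliseconds(5000) : milliseconds(30000);
    }
    return kind == DeltaKind::Epoch ? milliseconds(800) : milliseconds(5000);
}
```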
The server's local Runtime DB is the single source of truth for blacklist enforcement. There is no full-snapshot push from the CP -- only incremental deltas are applied.
Why Incremental Only?
Full snapshots would be expensive for large blacklists and would create thundering-herd problems when many nodes reconnect simultaneously. Incremental deltas are small, idempotent, and can be applied in any order. The since cursor ensures no changes are missed even if the node was offline for an extended period.
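One way to get both properties (idempotent and order-independent) is to keep the highest sequence number seen per entry and let the newest one win. The sketch below is an illustration under that assumption, not the RuntimeSecurityDeltaPuller implementation; BlacklistDelta, BlacklistCache, and the field names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical delta record: one change to one blacklist key.
struct BlacklistDelta {
    std::uint64_t seq;  // monotonically increasing change sequence
    std::string key;    // e.g. "ip:10.0.0.1" or "user:42"
    bool blocked;       // state after this change
};

// Hypothetical in-memory cache fed by delta pulls.
class BlacklistCache {
public:
    // Applies a batch. For each key the delta with the highest seq wins, so
    // replaying a batch or receiving deltas out of order has no effect on
    // the final state.
    void apply(const std::vector<BlacklistDelta>& batch) {
        for (const auto& delta : batch) {
            auto it = entries_.find(delta.key);
            if (it == entries_.end() || delta.seq > it->second.seq) {
                entries_[delta.key] = Entry{delta.seq, delta.blocked};
            }
            if (delta.seq > cursor_) cursor_ = delta.seq;
        }
    }

    // Value to send as the since cursor on the next pull.
    std::uint64_t cursor() const { return cursor_; }

    bool is_blocked(const std::string& key) const {
        auto it = entries_.find(key);
        return it != entries_.end() && it->second.blocked;
    }

private:
    struct Entry {
        std::uint64_t seq;
        bool blocked;
    };
    std::unordered_map<std::string, Entry> entries_;
    std::uint64_t cursor_ = 0;
};
```

A node that was offline simply resumes from its stored cursor; any overlap between the replayed range and what it already applied is harmless.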
Channel 4: OCSP
A plain HTTP endpoint for OCSP stapling responses, running on a separate port and thread pool.
Why Plain HTTP?
OCSP responses are signed by the responder (RFC 6960; here with RSA-SHA256), so their integrity does not depend on the transport. TLS is therefore unnecessary and would only add mTLS handshake overhead. A dedicated port also avoids resource contention with the SSE long connection on Channel 2.
Endpoint
| Method | Path | Port | Description |
|---|---|---|---|
| POST | /ocsp/ | 9092 | Fetch OCSP response for node's server certificate |
Behavior
- Server-side (OcspStaplingManager): periodically fetches OCSP responses (~15 minutes, self-adaptive based on nextUpdate/2). Responses are cached locally and stapled into TLS handshakes with SDK clients.
- CP-side (OcspResponder): signs responses on demand; does not cache.
- Revocation handling: when a revoked response is received, the node continues stapling the revoked response to SDK clients. SDK clients with must-staple certificates will reject the handshake (fail-closed on the business side). However, the node does not self-shutdown -- management channels (mTLS /cp/v2/*, SSE, delta) remain alive so the admin can re-enable the node and it recovers automatically.
- SDK verification: the SDK uses ocsp_signer_ca.pem (the tcp_server_ca trust anchor) to verify the OCSP signature, checks the time window (maxsec=1800s), and validates cert_status. Certificates with the must-staple extension reject handshakes when no staple is present.
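The timing rules above can be sketched with plain time arithmetic. Several points here are assumptions rather than documented behavior: nextUpdate/2 is read as "refresh after half of the remaining validity", maxsec is read as the maximum allowed age of thisUpdate (the meaning the parameter of the same name has in OpenSSL's OCSP_check_validity), and the 60-second floor is invented for the example. All type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using Seconds = std::chrono::seconds;

// Hypothetical view of the two timestamps in an OCSP response, expressed as
// seconds since an arbitrary epoch to keep the example self-contained.
struct StapleTimes {
    Seconds this_update;
    Seconds next_update;
};

// Node side: wait half of the remaining validity before fetching again.
// Assumption: a floor prevents a tight refetch loop close to expiry.
inline Seconds next_refresh_delay(const StapleTimes& t, Seconds now,
                                  Seconds floor = Seconds(60)) {
    const Seconds remaining = t.next_update - now;
    const Seconds half = std::chrono::duration_cast<Seconds>(remaining / 2);
    return std::max(half, floor);
}

// SDK side: accept a staple only inside its validity window and only if
// thisUpdate is at most max_age old (assumed meaning of maxsec=1800s).
inline bool staple_is_fresh(const StapleTimes& t, Seconds now,
                            Seconds max_age = Seconds(1800)) {
    return t.this_update <= now &&
           now <= t.next_update &&
           (now - t.this_update) <= max_age;
}
```

With a 30-minute validity window, the first function yields the roughly 15-minute refresh cadence mentioned above.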
Thread Isolation
Channel 4 runs on an independent httplib::Server with its own thread pool (ocsp_thread_pool_count), completely isolated from Channels 1 and 2.
Failure Modes and Recovery
Each channel is designed to degrade independently:
| Channel | Failure Mode | Recovery Behavior |
|---|---|---|
| Ch1 (Command) | HTTP timeout / network error | Retry with exponential backoff; node continues with cached data |
| Ch2 (SSE) | Connection dropped | Auto-reconnect; system falls back to aggressive polling via Ch1/Ch3 |
| Ch3 (Delta) | Stale cursor / missed events | since cursor guarantees catch-up on next successful pull |
| Ch4 (OCSP) | Fetch failure | Cached OCSP response remains usable until its nextUpdate time passes |
The key design principle: no channel failure prevents the server from serving authenticated clients. The worst case is delayed security policy updates (epoch bumps, blacklist changes), which are bounded by the degraded polling intervals.
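As an illustration of the Channel 1 recovery behavior, a capped exponential backoff can be computed as follows. The base delay (500ms) and cap (30s) are assumptions for the example, not documented AuthNexus defaults, and backoff_delay is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using Ms = std::chrono::milliseconds;

// Returns the delay before retry number `attempt` (0-based): the base delay
// doubled once per previous attempt, never exceeding the cap.
// Assumption: base and cap values are illustrative only.
inline Ms backoff_delay(unsigned attempt, Ms base = Ms(500), Ms cap = Ms(30000)) {
    Ms delay = base;
    // Stop doubling once the cap is reached, which also avoids overflow for
    // very large attempt counts.
    for (unsigned i = 0; i < attempt && delay < cap; ++i) {
        delay *= 2;
    }
    return std::min(delay, cap);
}
```

Production retry loops often add random jitter to each delay so that many nodes do not retry in lockstep; that is omitted here for brevity.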
Channel Interaction Summary
Server Node                              Control Plane
│                                              │
│──── Ch1: POST /cp/v2/.../checkin ───────────>│ (periodic heartbeat)
│                                              │
│<─── Ch2: SSE event: command.pending ─────────│ (real-time hint)
│                                              │
│──── Ch1: POST /cp/v2/.../commands/pull ─────>│ (pull actual data)
│                                              │
│──── Ch3: delta pull (via Ch1) ──────────────>│ (security sync)
│                                              │
│──── Ch4: POST /ocsp/ ───────────────────────>│ (cert revocation check)
│                                              │
The design ensures:
- No single point of failure -- each channel degrades independently.
- No head-of-line blocking -- SSE, short HTTP, and OCSP use separate thread pools.
- Minimal data duplication -- SSE events are hints; actual data is always pulled, ensuring consistency.
Configuration Delivery via Channel 1
Beyond security deltas, Channel 1 is the primary vehicle for delivering all configuration changes to server nodes. The configuration pull endpoint returns manifests for all six registered config types:
| Config Type | Content |
|---|---|
| server_runtime_settings | Server runtime parameters (timeouts, limits) |
| app_policy | Application-level policies |
| app_variables | Cloud variables readable by SDK clients |
| app_client_ca_bundle | Client CA trust bundle for mTLS |
| app_mtls_trust_bundle | mTLS trust bundle published to nodes |
| cp_agent_runtime | CP agent runtime parameters |
The delivery lifecycle:
- Admin updates a configuration via the admin API.
- CP sends a config.pending SSE hint to the affected node(s).
- Node pulls the manifest via /cp/v2/nodes/:id/configs/pull.
- Node applies the configuration locally.
- Node sends ACKs via /cp/v2/nodes/:id/configs/acks.
- If application fails, node reports errors via /cp/v2/nodes/:id/configs/errors.
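The apply step of this lifecycle naturally produces two batches: one for the acks endpoint and one for the errors endpoint. The sketch below shows that partitioning only; the HTTP calls are left out, and every type and function name (ConfigManifest, ApplyReport, apply_all, and so on) is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical shapes for pulled manifests and the two report batches.
struct ConfigManifest { std::string type; std::uint64_t version; };
struct ConfigAck      { std::string type; std::uint64_t version; };
struct ConfigError    { std::string type; std::uint64_t version; std::string message; };

struct ApplyReport {
    std::vector<ConfigAck> acks;      // body for .../configs/acks
    std::vector<ConfigError> errors;  // body for .../configs/errors
};

// Applies one manifest locally. Returns an empty string on success,
// otherwise a human-readable error message.
using ApplyFn = std::function<std::string(const ConfigManifest&)>;

// Applies every pulled manifest and sorts the outcomes into the two batches.
// A failure in one config does not stop the others from being applied.
inline ApplyReport apply_all(const std::vector<ConfigManifest>& pulled,
                             const ApplyFn& apply) {
    ApplyReport report;
    for (const auto& manifest : pulled) {
        std::string error = apply(manifest);
        if (error.empty()) {
            report.acks.push_back({manifest.type, manifest.version});
        } else {
            report.errors.push_back({manifest.type, manifest.version, std::move(error)});
        }
    }
    return report;
}
```

Keeping the two batches separate matches the single-responsibility endpoints: a failed ACK upload can be retried without re-applying configs or re-sending errors.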
Next Steps
- TLS & PKI -- certificate infrastructure that secures these channels
- Security Model -- fail-closed semantics and blacklist enforcement
- Operations Manual -- monitoring channel health