Four-Channel Communication

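AuthNexus server nodes communicate with the Control Plane through four channels. Each channel has its own purpose and failure mode, and the blocking paths (short HTTP calls, the SSE stream, and OCSP) run on separate thread pools, so no single bottleneck can disrupt the entire system. Channels 1 and 2 share the mTLS port, and Channel 3 is carried over Channel 1.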

Overview

Channel | Transport | Default Port | Direction | Purpose
1 -- Command | mTLS HTTP | 9091 | Server -> CP | Node checkin, command/config pull, result reporting
2 -- Event SSE | mTLS HTTP (SSE) | 9091 (shared) | CP -> Server | Real-time push hints
3 -- DB Delta | Internal poll | -- | Server -> CP (via Channel 1) | Incremental security data sync
4 -- OCSP | Plain HTTP | 9092 | Server -> CP | OCSP stapling responses

Channel 1: Command

The primary request-response channel. The server node initiates all calls to the Control Plane over mTLS HTTP.

Endpoints

Method | Path | Description
POST | /cp/v2/nodes/:id/checkin | Heartbeat with application version snapshot and metrics
POST | /cp/v2/nodes/:id/commands/pull?since=N | Pull new commands since sequence N
POST | /cp/v2/nodes/:id/commands/:cmd_id/result | Report per-command execution result
POST | /cp/v2/nodes/:id/configs/pull | Pull new configuration snapshots and cloud function manifests
POST | /cp/v2/nodes/:id/configs/acks | Batch ACK applied configurations (single-transaction batch upsert)
POST | /cp/v2/nodes/:id/configs/errors | Batch report configuration apply errors
GET | /cp/v2/objects/cloud-functions/:name?app_id= | Download cloud function script body
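
As a rough illustration of how a node might assemble these request paths, the sketch below builds three of them from the table. The helper names are hypothetical, and it assumes node and command IDs are already URL-safe, so no escaping is done.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helpers: build Channel 1 request paths from the endpoint table.
// IDs are assumed to be URL-safe already; no escaping is performed here.
inline std::string node_base(const std::string& node_id) {
    assert(!node_id.empty());
    return "/cp/v2/nodes/" + node_id;
}

inline std::string checkin_path(const std::string& node_id) {
    return node_base(node_id) + "/checkin";
}

// `since` is the last command sequence number the node has already seen.
inline std::string command_pull_path(const std::string& node_id, std::uint64_t since) {
    return node_base(node_id) + "/commands/pull?since=" + std::to_string(since);
}

inline std::string command_result_path(const std::string& node_id,
                                       const std::string& cmd_id) {
    return node_base(node_id) + "/commands/" + cmd_id + "/result";
}
```

For example, command_pull_path("n1", 42) yields /cp/v2/nodes/n1/commands/pull?since=42.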

Implementation

On the server side, ControlPlaneAgent (a background strand) dispatches blocking httplib calls onto the cp_io thread pool via co_await asio::post(cp_io). This keeps the background runtime responsive while HTTP calls block.

Each endpoint follows a single-responsibility pattern. Configs are pulled, applied, and acknowledged as separate operations, so partial failures can be retried without replaying the entire sync.

Checkin Payload

The checkin request includes:

  • Application version snapshot -- the set of applications and their current config versions known to the node.
  • Metrics -- connection counts, request rates, error rates, resource usage.
  • Node metadata -- hostname, OS, uptime, software version.

The CP uses this data to power the admin dashboard's node overview, online user presence, and health monitoring.
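
The three parts of the payload could be modeled in memory roughly as follows. Every field name here is an assumption made for illustration; the actual wire format is not described in this document.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical in-memory shape of a checkin request. Field names are
// illustrative only and do not reflect the real wire format.
struct NodeMetadata {
    std::string hostname;
    std::string os;
    std::uint64_t uptime_seconds = 0;
    std::string software_version;
};

struct CheckinPayload {
    // Application ID -> config version currently known to the node.
    std::map<std::string, std::uint64_t> app_versions;
    // Metric name -> value (connection counts, request rates, error rates, ...).
    std::map<std::string, double> metrics;
    NodeMetadata metadata;
};
```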

Channel 2: Event SSE

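A long-lived Server-Sent Events (SSE) connection over which the Control Plane pushes hints to each server node. The node opens the connection (a blocking httplib::Get) on the same mTLS port as Channel 1 (9091); events then flow from the CP to the node.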

Behavior

  • One connection per node -- when a new SSE connection arrives, the old one is evicted.
  • Keep-alive -- ping frames sent every 30 seconds; a health probe monitors connectivity.
  • Queue overflow -- if the event queue fills, the oldest events are dropped (events are hints, not data).
  • Events are hints only -- they carry no payload. Detailed data is always pulled via Channel 1.
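
The drop-oldest overflow policy can be sketched as a small bounded queue. This is a minimal single-threaded illustration: the class name and capacity are hypothetical, and a real queue shared between threads would need synchronization.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <utility>

// Bounded FIFO that discards the oldest entry when full. Losing a hint is
// tolerable because the next pull fetches the authoritative state anyway.
class HintQueue {
public:
    explicit HintQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    // Returns true if an older hint had to be dropped to make room.
    bool push(std::string hint) {
        bool dropped = false;
        if (queue_.size() == capacity_) {
            queue_.pop_front();
            dropped = true;
        }
        queue_.push_back(std::move(hint));
        return dropped;
    }

    // Returns the oldest remaining hint, or nullopt if the queue is empty.
    std::optional<std::string> pop() {
        if (queue_.empty()) return std::nullopt;
        std::string front = std::move(queue_.front());
        queue_.pop_front();
        return front;
    }

    std::size_t size() const { return queue_.size(); }

private:
    std::size_t capacity_;
    std::deque<std::string> queue_;
};
```

With a capacity of 2, pushing command.pending, config.pending, and epoch.bumped in that order leaves only the last two in the queue.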

Event Types

Event | Meaning
command.pending | New commands available; trigger a Channel 1 command pull
config.pending | New configuration available; trigger a Channel 1 config pull
blacklist.changed | Server blacklist updated; trigger a Channel 3 delta pull
epoch.bumped | Auth epoch advanced; trigger a Channel 3 delta pull
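
Because every event only names which pull to run, dispatch reduces to a lookup from event name to action. The enum and function below are hypothetical names, and ignoring unknown events is an assumption rather than documented behavior.

```cpp
#include <cassert>
#include <string_view>

// Which pull a hint should trigger on the server node.
enum class PullAction { CommandPull, ConfigPull, DeltaPull, None };

inline PullAction action_for_event(std::string_view event) {
    if (event == "command.pending") return PullAction::CommandPull;
    if (event == "config.pending") return PullAction::ConfigPull;
    if (event == "blacklist.changed" || event == "epoch.bumped") {
        return PullAction::DeltaPull;
    }
    return PullAction::None;  // assumption: unknown hints are ignored
}
```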

Implementation

The CP side uses cp::event::EventBus, an in-memory pub/sub system, with httplib chunked content providers for SSE streaming.

The server side runs CpEventSubscriber as an IBackgroundService. It parses the SSE stream and dispatches callbacks to the background runtime. The subscriber runs on a dedicated sse_pool (fixed at 1 thread), physically isolated from the cp_io pool. This prevents SSE's long-lived blocking httplib::Get from starving short-burst CP HTTP calls.
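
To make the stream parsing concrete, here is a simplified sketch that extracts event names from buffered SSE text. It is not the CpEventSubscriber implementation. It assumes each hint arrives as an `event:` line terminated by a blank line and that keep-alive pings are SSE comment lines starting with `:`. A real subscriber would parse incrementally as chunks arrive.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Extracts event names from buffered SSE text. Simplified: only the "event"
// field is interpreted, since hints carry no payload.
inline std::vector<std::string> parse_sse_event_names(const std::string& stream) {
    std::vector<std::string> events;
    std::string current;  // value of the most recent "event:" field
    std::istringstream in(stream);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty()) {
            // A blank line terminates one event block.
            if (!current.empty()) events.push_back(current);
            current.clear();
        } else if (line[0] == ':') {
            continue;  // comment line, assumed to be the keep-alive ping
        } else if (line.rfind("event:", 0) == 0) {
            std::string value = line.substr(6);
            if (!value.empty() && value[0] == ' ') value.erase(0, 1);  // one optional space
            current = value;
        }
        // Other fields (data, id, retry) are ignored in this sketch.
    }
    return events;  // a trailing block without a blank line is not dispatched
}
```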

Degraded Mode

If the SSE connection drops, the server falls back to aggressive polling on Channels 1 and 3. Polling intervals tighten (e.g., delta pulls run every 800ms for epoch changes and every 5000ms for blacklist changes, instead of every 5s and 30s) to maintain responsiveness until SSE recovers.

Channel 3: DB Delta

Incremental synchronization of security-critical data from the Control Plane to the server's local Runtime DB and in-memory caches.

Synchronized Tables

Table | Content
auth_epoch_changes | Per-user auth epoch bumps (session invalidation)
server_blacklist_changes | IP/device/user blacklist entries

Implementation

RuntimeSecurityDeltaPuller continuously pulls deltas via Channel 1 endpoints. Timing adapts to SSE health:

SSE Status | Delta Pull Interval
Healthy | 5s (epoch) / 30s (blacklist)
Degraded | 800ms (epoch) / 5000ms (blacklist)
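
The interval selection in the table can be written as a single function of SSE health. The struct and function names are hypothetical; the values are the ones listed above.

```cpp
#include <cassert>
#include <chrono>

struct DeltaPullIntervals {
    std::chrono::milliseconds epoch;      // auth_epoch_changes pull period
    std::chrono::milliseconds blacklist;  // server_blacklist_changes pull period
};

// Picks the delta pull intervals from the table based on SSE health.
inline DeltaPullIntervals delta_pull_intervals(bool sse_healthy) {
    using namespace std::chrono_literals;
    if (sse_healthy) return {5s, 30s};
    return {800ms, 5000ms};
}
```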

The server's local Runtime DB is the single source of truth for blacklist enforcement. There is no full-snapshot push from the CP -- only incremental deltas are applied.

Why Incremental Only?

Full snapshots would be expensive for large blacklists and would create thundering-herd problems when many nodes reconnect simultaneously. Incremental deltas are small, idempotent, and can be applied in any order. The since cursor ensures no changes are missed even if the node was offline for an extended period.
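
One way to obtain the idempotent, order-independent behavior described here is to keep the maximum epoch per user and the maximum sequence number as the cursor. The sketch below shows that approach for epoch deltas. The row layout and names are assumptions, and it presumes epochs only ever increase.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical row of auth_epoch_changes; field names are illustrative.
struct EpochDelta {
    std::uint64_t seq;    // change sequence number, used as the `since` cursor
    std::string user_id;
    std::uint64_t epoch;  // new auth epoch for the user
};

struct EpochState {
    std::uint64_t cursor = 0;                     // highest seq applied so far
    std::map<std::string, std::uint64_t> epochs;  // user -> current epoch
};

// Applies a batch and advances the cursor. Keeping the max epoch per user makes
// re-applying a batch, or applying it out of order, produce the same state.
inline void apply_epoch_deltas(EpochState& state, const std::vector<EpochDelta>& batch) {
    for (const EpochDelta& d : batch) {
        std::uint64_t& current = state.epochs[d.user_id];
        current = std::max(current, d.epoch);
        state.cursor = std::max(state.cursor, d.seq);
    }
}
```

After a successful pull, the node would send state.cursor as the next since value, so a node that was offline simply resumes from its last applied sequence.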

Channel 4: OCSP

A plain HTTP endpoint for OCSP stapling responses, running on a separate port and thread pool.

Why Plain HTTP?

OCSP responses are cryptographically self-protecting (RFC 6960, RSA-SHA256 signature). TLS transport is unnecessary and would add mTLS handshake overhead. A dedicated port avoids resource contention with the SSE long connection on Channel 2.

Endpoint

Method | Path | Port | Description
POST | /ocsp/ | 9092 | Fetch OCSP response for node's server certificate

Behavior

  • Server-side (OcspStaplingManager): periodically fetches OCSP responses (~15 minutes, self-adaptive based on nextUpdate/2). Responses are cached locally and stapled into TLS handshakes with SDK clients.
  • CP-side (OcspResponder): signs responses on demand; does not cache.
  • Revocation handling: when a revoked response is received, the node continues stapling the revoked response to SDK clients. SDK clients with must-staple certificates will reject the handshake (fail-closed on the business side). However, the node does not self-shutdown -- management channels (mTLS /cp/v2/*, SSE, delta) remain alive so the admin can re-enable the node and it recovers automatically.
  • SDK verification: the SDK uses ocsp_signer_ca.pem (the tcp_server_ca trust anchor) to verify the OCSP signature, checks the time window (maxsec=1800s), and validates cert_status. Certificates with the must-staple extension reject handshakes when no staple is present.
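
The time-window and status part of the SDK check might look like the sketch below. The parameter name maxsec suggests an OpenSSL-style validity check, but the source does not say which library the SDK uses, so this is written against plain std::chrono. Signature verification is deliberately omitted, and the skew parameter is an assumption.

```cpp
#include <cassert>
#include <chrono>

enum class CertStatus { Good, Revoked, Unknown };

// Hypothetical, already-parsed view of a stapled OCSP response.
struct OcspStaple {
    CertStatus status;
    std::chrono::system_clock::time_point this_update;
    std::chrono::system_clock::time_point next_update;
};

// Status and freshness check only; the OCSP signature must be verified
// separately against the trust anchor before this result means anything.
inline bool staple_acceptable(
    const OcspStaple& s,
    std::chrono::system_clock::time_point now,
    std::chrono::seconds max_age = std::chrono::seconds(1800),  // maxsec
    std::chrono::seconds skew = std::chrono::seconds(0)) {      // assumed tolerance
    if (s.status != CertStatus::Good) return false;
    if (s.this_update > now + skew) return false;     // not yet valid
    if (s.next_update + skew < now) return false;     // past nextUpdate
    if (now - s.this_update > max_age) return false;  // older than maxsec
    return true;
}
```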

Thread Isolation

Channel 4 runs on an independent httplib::Server with its own thread pool (ocsp_thread_pool_count), completely isolated from Channels 1 and 2.

Failure Modes and Recovery

Each channel is designed to degrade independently:

Channel | Failure Mode | Recovery Behavior
Ch1 (Command) | HTTP timeout / network error | Retry with exponential backoff; node continues with cached data
Ch2 (SSE) | Connection dropped | Auto-reconnect; system falls back to aggressive polling via Ch1/Ch3
Ch3 (Delta) | Stale cursor / missed events | since cursor guarantees catch-up on next successful pull
Ch4 (OCSP) | Fetch failure | Cached OCSP response remains valid until nextUpdate expires
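
The Channel 1 retry policy is exponential backoff. A minimal capped version is sketched below. The base delay and cap are assumed values, not documented ones, and jitter is left out for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Delay before retry number `attempt` (0-based): base, 2*base, 4*base, ...
// capped at `cap`. Default values are illustrative assumptions.
inline std::chrono::milliseconds backoff_delay(
    unsigned attempt,
    std::chrono::milliseconds base = std::chrono::milliseconds(500),
    std::chrono::milliseconds cap = std::chrono::milliseconds(30000)) {
    assert(base.count() > 0 && cap >= base);
    std::chrono::milliseconds delay = base;
    // Stop doubling once the cap is reached so the value cannot overflow.
    for (unsigned i = 0; i < attempt && delay < cap; ++i) {
        delay *= 2;
    }
    return std::min(delay, cap);
}
```

With these defaults the delays run 500ms, 1s, 2s, 4s, and so on, then hold at 30s from the seventh retry onward.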

The key design principle: no channel failure prevents the server from serving authenticated clients. The worst case is delayed security policy updates (epoch bumps, blacklist changes), which are bounded by the degraded polling intervals.

Channel Interaction Summary

Server Node                              Control Plane
    │                                          │
    │──── Ch1: POST /cp/v2/.../checkin ──────>│  (periodic heartbeat)
    │                                          │
    │<──── Ch2: SSE event: command.pending ───│  (real-time hint)
    │                                          │
    │──── Ch1: POST /cp/v2/.../commands/pull ─>│  (pull actual data)
    │                                          │
    │──── Ch3: delta pull (via Ch1) ─────────>│  (security sync)
    │                                          │
    │──── Ch4: POST /ocsp/ ──────────────────>│  (cert revocation check)
    │                                          │

The design ensures:

  • No single point of failure -- each channel degrades independently.
  • No head-of-line blocking -- SSE, short HTTP, and OCSP use separate thread pools.
  • Minimal data duplication -- SSE events are hints; actual data is always pulled, ensuring consistency.

Configuration Delivery via Channel 1

Beyond security deltas, Channel 1 is the primary vehicle for delivering all configuration changes to server nodes. The configuration pull endpoint returns manifests for all six registered config types:

Config Type | Content
server_runtime_settings | Server runtime parameters (timeouts, limits)
app_policy | Application-level policies
app_variables | Cloud variables readable by SDK clients
app_client_ca_bundle | Client CA trust bundle for mTLS
app_mtls_trust_bundle | mTLS trust bundle published to nodes
cp_agent_runtime | CP agent runtime parameters

The delivery lifecycle:

  1. Admin updates a configuration via the admin API.
  2. CP sends config.pending SSE hint to the affected node(s).
  3. Node pulls the manifest via /cp/v2/nodes/:id/configs/pull.
  4. Node applies the configuration locally.
  5. Node sends ACKs via /cp/v2/nodes/:id/configs/acks.
  6. If applying a configuration fails, the node instead reports the error via /cp/v2/nodes/:id/configs/errors.
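
Steps 3 to 6 can be sketched as a function that applies each pulled manifest and sorts the outcomes into one ACK batch and one error batch. The types and the callback convention (empty string means success) are hypothetical. The HTTP calls themselves are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical shapes; the real manifest and ACK formats are not shown here.
struct ConfigManifest { std::string type; std::uint64_t version; };
struct ConfigAck      { std::string type; std::uint64_t version; };
struct ConfigError    { std::string type; std::uint64_t version; std::string message; };

struct ApplyOutcome {
    std::vector<ConfigAck> acks;      // body for .../configs/acks
    std::vector<ConfigError> errors;  // body for .../configs/errors
};

// `apply` returns an empty string on success or an error message on failure.
inline ApplyOutcome apply_manifests(
    const std::vector<ConfigManifest>& pulled,
    const std::function<std::string(const ConfigManifest&)>& apply) {
    ApplyOutcome out;
    for (const ConfigManifest& m : pulled) {
        std::string err = apply(m);
        if (err.empty()) {
            out.acks.push_back({m.type, m.version});
        } else {
            out.errors.push_back({m.type, m.version, err});
        }
    }
    return out;
}
```

Keeping the two batches separate mirrors the separate endpoints: a failed apply for one config type does not hold back the ACKs for the others.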
