FAQ
Why C++23?
Q: Why is AuthNexus written in C++23 instead of Go, Rust, or Java?
AuthNexus targets high-throughput, low-latency authentication workloads where every microsecond on the hot path matters. C++23 provides:
- Zero-cost abstractions -- no garbage collector pauses, no runtime overhead.
- Asio coroutines -- native co_await integration with the Asio networking library for scalable async I/O without callback spaghetti.
- Direct hardware control -- thread domain isolation, memory layout control, and CPU-cache-friendly data structures.
- Mature ecosystem -- OpenSSL, Lua, SQLite, and libpq all have first-class C/C++ bindings.
The thread domain architecture (IO, Logic, DB, Crypto, CloudFunction pools all physically isolated) would be difficult to express cleanly in languages with a managed runtime.
Why a Custom Binary Protocol?
Q: Why not use HTTP/REST or gRPC for SDK-to-server communication?
The business data path between SDK clients and server nodes uses a custom binary protocol over TLS 1.3 for several reasons:
- Minimal overhead -- no HTTP header parsing, no JSON serialization on the hot path. Packets are compact fixed-layout structures.
- Bidirectional push -- the server can push notifications (announcements, force logout, kill process) to connected clients without polling or WebSocket upgrade negotiation.
- Session-oriented -- a single persistent TCP connection carries authentication, heartbeats, queries, cloud function calls, and push notifications. No connection-per-request overhead.
- mTLS-native -- mutual TLS is part of the connection lifecycle, not an afterthought. The four-layer validation (CA chain, certificate semantics, CP binding, business handshake) is deeply integrated.
HTTP is still used for admin-to-CP and node-to-CP communication where request-response semantics and human-debuggability are more valuable than raw throughput.
Why HMAC-SHA256 Instead of Argon2 / bcrypt?
Q: Isn't a fast hash insecure for passwords?
AuthNexus uses the $authnexus-fast-hmac-sha256$v=1$ scheme as a deliberate design trade-off:
| Property | Slow Hash (Argon2/bcrypt) | AuthNexus HMAC-SHA256 |
|---|---|---|
| Offline brute-force resistance | High | Relies on HMAC key secrecy |
| Online throughput | 100--1000 hashes/sec/core | 1,000,000+ hashes/sec/core |
| Latency per login | 50--500ms | Sub-millisecond |
| Requirement for crypto thread pool | Large pool needed | Minimal pool sufficient |
The HMAC key is a server-held secret. An attacker who obtains only the database (without the key) cannot perform offline attacks. This is comparable to the "pepper" technique used alongside slow hashes, except the entire hash is based on the secret key.
This trade-off is appropriate for:
- High-concurrency authentication servers handling thousands of logins per second.
- Environments where the HMAC key is stored in hardware security modules (HSMs) or secure enclaves.
- Systems where network-level brute-force is already mitigated by rate limiting and blacklisting.
If your threat model requires resistance to full server compromise (attacker obtains both the database and the key), consider adding an application-level slow hash in the SDK before transmission.
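A minimal sketch of the keyed-hash idea, using only the Python standard library. The prefix is taken from the text above; everything after it (a per-user salt, hex encoding, the `$` separator) and the function names are assumptions made for illustration, not the actual AuthNexus encoding.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Prefix quoted in the FAQ; the fields that follow it are assumed here.
PREFIX = "$authnexus-fast-hmac-sha256$v=1$"


def hash_password(password: str, server_key: bytes, salt: Optional[bytes] = None) -> str:
    """Return PREFIX + salt_hex + '$' + HMAC-SHA256(server_key, salt || password)."""
    salt = salt if salt is not None else secrets.token_bytes(16)
    mac = hmac.new(server_key, salt + password.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{PREFIX}{salt.hex()}${mac}"


def verify_password(password: str, server_key: bytes, encoded: str) -> bool:
    """Recompute the HMAC with the server-held key and compare in constant time."""
    if not encoded.startswith(PREFIX):
        return False
    try:
        salt_hex, expected = encoded[len(PREFIX):].split("$")
        salt = bytes.fromhex(salt_hex)
    except ValueError:
        return False  # malformed record
    mac = hmac.new(server_key, salt + password.encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(mac, expected)
```

Without `server_key`, a stored record cannot be checked against password guesses, which is the property the table summarizes as "relies on HMAC key secrecy".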
How Do I Migrate from Another System?
Q: Can I migrate existing users and data to AuthNexus?
AuthNexus includes a migration framework (src/migrator/) implemented in Python that supports:
- Dual-backend migration -- import into either SQLite or PostgreSQL.
- CShield migration -- tested with real CShield database dumps (a 10-database schema).
- Automatic certificate provisioning -- migrated applications automatically receive certificates via the PKI job system (zero C++ changes required).
The migration process:
- Export data from the source system.
- Run the Python migrator against the AuthNexus Control DB.
- Start the Control Plane -- the PKI job poller automatically provisions certificates for imported applications and nodes.
- Verify data integrity through the admin dashboard.
Legacy password hashes (plain SHA256, Argon2id) are not auto-migrated. Users must reset their passwords after migration.
Is mTLS Required?
Q: Can I disable mTLS for development or testing?
No. mTLS is a fundamental security invariant in AuthNexus, not an optional feature:
- SDK-to-server: TLS 1.3 with mutual authentication is mandatory. The client certificate embeds the app_id via URI SAN, which is validated against the business handshake.
- Node-to-CP: mTLS over HTTP is required for all /cp/v2/* endpoints. The node's identity is proven by its cp_node_client_ca-signed certificate.
For development, the PKI setup wizard generates all necessary CAs and certificates. The sdk_demo binary and the admin frontend demo mode work with the generated certificates without manual PKI setup.
The only exception is Channel 4 (OCSP), which uses plain HTTP because OCSP responses are themselves cryptographically signed and can be verified without a secure channel.
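To illustrate what "TLS 1.3 with mutual authentication" means at the socket level, here is a sketch using Python's standard `ssl` module. It is not AuthNexus code: the function names and file parameters are hypothetical, and the format of the URI SAN that carries the app_id is not specified in this FAQ, so the helper only extracts URI entries for the caller to compare.

```python
import ssl
from typing import List, Optional


def make_server_context(cert_file: Optional[str] = None,
                        key_file: Optional[str] = None,
                        client_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a server-side context that refuses anything below TLS 1.3
    and requires the peer to present a certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    ctx.verify_mode = ssl.CERT_REQUIRED  # mutual TLS: client cert is mandatory
    if client_ca_file:
        ctx.load_verify_locations(cafile=client_ca_file)  # CA that signs client certs
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def uri_sans(peer_cert: dict) -> List[str]:
    """Extract URI subjectAltName entries from the dict returned by
    SSLSocket.getpeercert()."""
    return [value for kind, value in peer_cert.get("subjectAltName", ()) if kind == "URI"]
```

A server would compare the extracted URI against the app_id claimed in the business handshake and drop the connection on mismatch.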
How Does AuthNexus Scale?
Q: What are the scaling characteristics?
AuthNexus scales vertically within a single node and horizontally across multiple nodes:
Vertical Scaling
A single server_app instance on an 8-core machine can handle thousands of concurrent SDK connections. The --auto thread configuration scales all thread pools proportionally to CPU cores.
Key vertical scaling dimensions:
- IO threads -- network connection capacity.
- Logic threads -- request processing throughput.
- DB threads -- query concurrency (PostgreSQL scales better than SQLite here).
- Cloud function threads -- Lua execution parallelism.
Horizontal Scaling
Deploy multiple server_app nodes, each connecting to the same control_plane_app. The Control Plane distributes configuration, commands, and security policy to all nodes.
SDK Clients ──> server_app (Node 1) ──> control_plane_app
SDK Clients ──> server_app (Node 2) ──> (shared)
SDK Clients ──> server_app (Node 3) ──>
Each node operates independently with its own Runtime DB. Client routing to nodes is handled externally (DNS, load balancer, or application-level selection).
Control Plane Scaling
The Control Plane is currently a single process. For most deployments, a single CP instance is sufficient because:
- Admin API traffic is low-volume (human operators).
- Node checkins are infrequent (minutes, not seconds).
- SSE connections are one-per-node (dozens, not thousands).
What Happens When a Node Loses CP Connectivity?
Q: Can server nodes operate independently?
Yes, with graceful degradation:
| Capability | CP Connected | CP Disconnected |
|---|---|---|
| User authentication | Full | Full (using cached data) |
| Heartbeat processing | Full | Full |
| Cloud function execution | Full | Functions cached locally continue working |
| Session invalidation (epoch) | Real-time (seconds) | Delayed (polling at 800ms--5s) |
| Blacklist updates | Real-time | Delayed (polling at 800ms--5s) |
| New configurations | Delivered via Channel 1 | Queued until reconnection |
| OCSP stapling | Fresh responses | Cached response until nextUpdate expires |
The server node caches all critical data locally in its Runtime DB. Authentication, heartbeats, and cloud functions continue without interruption. Security updates (epoch bumps, blacklist changes) may be delayed but will catch up when connectivity is restored.
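The table gives a polling window of 800ms to 5s for the disconnected case but does not say how the interval moves within it. The sketch below assumes exponential doubling clamped to that window; the growth factor and the function name are illustrative only.

```python
from typing import Iterator

MIN_INTERVAL_S = 0.8  # lower bound from the table (800 ms)
MAX_INTERVAL_S = 5.0  # upper bound from the table (5 s)


def poll_intervals(factor: float = 2.0) -> Iterator[float]:
    """Yield successive poll delays, starting at 0.8 s and capped at 5 s.

    The doubling schedule is an assumption; only the bounds come from the FAQ.
    """
    delay = MIN_INTERVAL_S
    while True:
        yield delay
        delay = min(delay * factor, MAX_INTERVAL_S)
```

A node would sleep for each yielded delay between attempts and restart the generator once the connection is restored.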
The --blacklist-fail-closed flag controls behavior when the blacklist cache itself is unavailable (not the CP connection): if set, all requests are denied until the cache is rebuilt.
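The fail-closed decision can be sketched as a single check. The names here are hypothetical, and the behavior when the flag is not set (requests allowed while the cache is unavailable) is an assumption, since the text only describes the case where it is set.

```python
from typing import Optional, Set


def is_allowed(client_id: str, blacklist: Optional[Set[str]], fail_closed: bool) -> bool:
    """Decide whether a request may proceed.

    blacklist is None while the local blacklist cache is unavailable.
    """
    if blacklist is None:
        # Cache unavailable: deny everything if fail-closed,
        # otherwise (assumed) let requests through.
        return not fail_closed
    return client_id not in blacklist
```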
Why Four Separate CAs?
Q: Wouldn't a single CA be simpler?
The four-CA model provides cryptographic isolation between trust domains:
- Compromise containment -- if one CA's key is compromised, only that trust domain is affected. A compromised app_client_ca does not grant access to the CP management plane.
- Independent rotation -- each CA can be rotated on its own schedule without disrupting other domains.
- Least privilege -- server nodes hold tcp_server_ca and cp_node_client_ca certificates, but never app_client_ca signing keys. SDKs hold app_client_ca certificates but cannot impersonate nodes.
- Revocation isolation -- revoking a client CA does not affect node-to-CP or CP server certificates.
Can I Use AuthNexus Without the Admin Frontend?
Q: Can I manage everything through the API?
Yes. The admin frontend is a pure SPA that communicates exclusively through the /admin/v1/* REST API. Every operation available in the UI is available via direct API calls. You can build your own management interface, use curl, or integrate with existing admin tools.
What Is the Database Lock on SQLite?
Q: I see SQLITE_BUSY errors under load. What should I do?
SQLite uses file-level locking. Under high write concurrency, lock contention can cause SQLITE_BUSY errors. Solutions:
- Increase busy timeout -- --sqlite-busy-timeout 10000 (10 seconds).
- WAL mode -- enabled by default, allows concurrent reads during writes.
- Migrate to PostgreSQL -- for deployments with multiple nodes or high write throughput, PostgreSQL eliminates this bottleneck entirely.
SQLite is best suited for single-node deployments with moderate traffic. PostgreSQL is recommended for anything beyond that.
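For reference, the two SQLite-level settings mentioned above look like this when applied directly through Python's `sqlite3` module. This shows the underlying SQLite behavior only; how server_app applies them internally is not covered here, and `open_db` is a name made up for the example.

```python
import sqlite3


def open_db(path: str, busy_timeout_ms: int = 10_000) -> sqlite3.Connection:
    """Open a database with WAL journaling and a busy timeout.

    With a busy timeout, a writer that hits a lock retries for up to
    busy_timeout_ms before SQLite reports SQLITE_BUSY.
    """
    conn = sqlite3.connect(path, timeout=busy_timeout_ms / 1000)
    conn.execute("PRAGMA journal_mode=WAL")  # readers no longer block on a writer
    conn.execute(f"PRAGMA busy_timeout={int(busy_timeout_ms)}")
    return conn
```

WAL mode requires a file-backed database; an in-memory database keeps its own journal mode.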
How Are Cloud Functions Delivered to Nodes?
Q: How do Lua scripts get from the admin dashboard to the server node?
Cloud function delivery follows the standard configuration pipeline:
- Admin creates or updates a function via POST /admin/v1/cloud-functions.
- The function metadata and script body are stored in the Control DB.
- The CP sends a config.pending SSE hint (Channel 2) to relevant nodes.
- Each node pulls the updated manifest via POST /cp/v2/nodes/:id/configs/pull (Channel 1).
- The node fetches the script body via GET /cp/v2/objects/cloud-functions/:name?app_id= (Channel 1).
- The script is compiled into Lua bytecode and cached in the node's cloud function runtime.
If SSE is unavailable, nodes discover new functions during their periodic config poll.
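The node-side half of this pipeline (manifest pull, then script fetch) can be sketched as follows. The two endpoint paths come from the steps above; the manifest shape (a `cloud_functions` list with `name` and `app_id` fields), the cache key, and the injected transport callables are all assumptions for illustration. The Lua compilation step is omitted.

```python
from typing import Callable, Dict, List
from urllib.parse import quote


def sync_cloud_functions(
    node_id: str,
    post_json: Callable[[str], dict],   # hypothetical mTLS POST helper returning parsed JSON
    get_bytes: Callable[[str], bytes],  # hypothetical mTLS GET helper returning the raw body
    cache: Dict[str, bytes],
) -> List[str]:
    """Pull the manifest, fetch each listed script body, and cache it locally."""
    manifest = post_json(f"/cp/v2/nodes/{quote(node_id, safe='')}/configs/pull")
    updated = []
    for entry in manifest.get("cloud_functions", []):  # assumed manifest shape
        name, app_id = entry["name"], str(entry["app_id"])
        path = (f"/cp/v2/objects/cloud-functions/{quote(name, safe='')}"
                f"?app_id={quote(app_id, safe='')}")
        cache[f"{app_id}/{name}"] = get_bytes(path)
        updated.append(name)
    return updated
```

The same function could be called from either trigger: on receipt of a config.pending hint, or from the periodic config poll when SSE is unavailable.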
What Logging Level Should I Use in Production?
Q: What are the trade-offs between log levels?
| Level | Use Case | Volume |
|---|---|---|
| info | Production default. Startup, shutdown, key events, errors | Low |
| warn | Included in info. Degraded states, retries, near-limit conditions | Low |
| debug | Local troubleshooting only. Per-request tracing, state transitions | High (tens of MB) |
| trace | Extreme diagnostics. Packet-level, per-field logging | Very high |
Using debug or trace in production will generate tens of megabytes of logs quickly, pollute benchmark samples, and may impact performance under load. Always use info for production deployments and switch to debug only for targeted troubleshooting sessions.
Next Steps
- Getting Started -- build and run AuthNexus
- System Architecture -- detailed process and thread model
- Deployment Guide -- production configuration