Engineering Leadership: Why Ambiguity Is More Dangerous Than Complexity
Introduction
In my experience scaling systems to 3M+ concurrent users, I’ve learned that the most difficult challenges aren't found in the code—they are found in the Ambiguity of the requirements.
Most Senior Engineering Managers (EMs) are experts at managing Complexity. We know how to handle distributed deadlocks, gRPC migrations, and database partitioning. Complexity is a known quantity; it follows the laws of logic.
But Ambiguity is a "twisted" game. It’s where the requirements shift mid-stream, and if you don't catch the pivot, you end up building a perfectly engineered solution for the wrong problem.
The Challenge: A "Simple" Notification Dashboard
I recently participated in a design session for a Head of Engineering role. The prompt seemed straightforward: “We have 30 microservices sending notifications with zero visibility. Build a centralized dashboard to visualize the flow.”
My immediate technical response was to solve for Observability:
Ingestion Layer: A centralized service to collect events.
Storage: A high-throughput, analytical store (OLAP).
Visualization: A dashboard to monitor system health and latencies.
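To make that first design concrete, here is a minimal Python sketch of the kind of aggregate the internal dashboard would show. The `NotificationEvent` fields and the `summarize` helper are hypothetical names chosen for illustration, and the sketch aggregates in memory, whereas a real deployment would run the equivalent query against the OLAP store.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import quantiles


@dataclass(frozen=True)
class NotificationEvent:
    # Hypothetical event shape; the real payload would depend on the services.
    service: str       # which microservice emitted the event
    delivered: bool    # whether the notification reached its destination
    latency_ms: float  # time taken by the delivery attempt


def summarize(events):
    """Aggregate per-service volume, delivery rate, and p95 latency."""
    by_service = defaultdict(list)
    for event in events:
        by_service[event.service].append(event)

    summary = {}
    for service, evs in by_service.items():
        latencies = [e.latency_ms for e in evs]
        # quantiles() needs at least two points; with one, use that value.
        if len(latencies) > 1:
            p95 = quantiles(latencies, n=20, method="inclusive")[-1]
        else:
            p95 = latencies[0]
        summary[service] = {
            "count": len(evs),
            "delivery_rate": sum(e.delivered for e in evs) / len(evs),
            "p95_latency_ms": p95,
        }
    return summary
```

Note what this view throws away: once events are rolled up into a rate and a percentile, the fate of any individual message is no longer visible.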
The Pivot: The Persona Shift
Then came the requirement change: "Let’s share this dashboard with the client so they can track the lifecycle of their specific notifications."
In one sentence, the "Software Contract" changed.
Because I didn’t pause to interrogate this new constraint, I stayed on the analytical path. But "Sharing with a Client" isn't a Monitoring problem; it’s an Auditing and Business State problem.
Internal Teams need an Analytical View (OLAP): "Is the system 99.9% healthy?"
External Clients need a Transactional Truth (OLTP): "What is the exact state of my specific message?"
I was building a tool for Trends, but the client needed a tool for Evidence. For an engineer, one missed log line is a small error in a metric; for a client, one missed notification is a broken Service Level Agreement (SLA).
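The difference between the two views shows up in the questions you ask of the same data. The sketch below uses an in-memory SQLite database as a stand-in; the `notification_events` table, its columns, and the sample rows are all assumptions made for illustration. One function answers the internal question with an aggregate, the other answers the client's question with the ordered history of a single message, scoped to that client.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table of status transitions (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE notification_events (
               message_id TEXT NOT NULL,
               client_id  TEXT NOT NULL,
               status     TEXT NOT NULL,
               seq        INTEGER NOT NULL,
               PRIMARY KEY (message_id, seq)
           )"""
    )
    rows = [
        ("m-1", "acme", "queued", 1),
        ("m-1", "acme", "sent", 2),
        ("m-1", "acme", "delivered", 3),
        ("m-2", "acme", "queued", 1),
        ("m-2", "acme", "failed", 2),
        ("m-3", "globex", "queued", 1),
        ("m-3", "globex", "sent", 2),
    ]
    conn.executemany("INSERT INTO notification_events VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def delivery_rate(conn):
    """Internal, analytical question: what share of messages ended up delivered?"""
    row = conn.execute(
        """SELECT AVG(status = 'delivered')
           FROM notification_events AS e
           WHERE seq = (SELECT MAX(seq) FROM notification_events
                        WHERE message_id = e.message_id)"""
    ).fetchone()
    return row[0]


def message_lifecycle(conn, client_id, message_id):
    """Client-facing question: what exactly happened to my message, in order?"""
    rows = conn.execute(
        """SELECT status FROM notification_events
           WHERE client_id = ? AND message_id = ?
           ORDER BY seq""",
        (client_id, message_id),
    ).fetchall()
    return [status for (status,) in rows]
```

The aggregate tolerates a dropped row; the lifecycle query does not, because a missing transition is a gap in the evidence the client is looking at. The `client_id` filter also hints at a concern the internal tool never had: one client must not be able to read another client's messages.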
The "ARC" Framework: De-Risking the Ambiguity
To navigate these shifts, I’ve refined a three-point framework called ARC. It’s how I now "box in" a requirement before my team writes a single line of code:
A – Audience (The Persona Pivot): If the user shifts from an Internal SRE to an External Client, the stakes change. You are no longer "showing a graph"; you are "providing a guarantee."
R – Reliability (The Consistency Tax): Does the system need to be Fast or Indisputable? An analytical view (OLAP) is fast for trends; a transactional view (OLTP) is indisputable for state. For client-facing auditability, you need the latter.
C – Consequences (The Impact Test): What happens if the system fails for 5 seconds? If the result is an "Internal Alert," keep it lightweight. If the result is a "Loss of Trust" or a "Legal Dispute," you need a robust design like the Transactional Outbox Pattern, which writes a state change and its outgoing event in the same database transaction so that neither can be recorded without the other.
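For readers who haven't used it, here is a minimal sketch of the Transactional Outbox Pattern in Python with SQLite. The table names, `record_status`, and `relay_once` are hypothetical, and the `publish` callback stands in for whatever message broker or client-facing store you actually use. Treat it as an illustration of the idea under those assumptions, not a production design.

```python
import json
import sqlite3


def init_db():
    """Create the business table and the outbox table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE notifications (
            message_id TEXT PRIMARY KEY,
            status     TEXT NOT NULL
        );
        CREATE TABLE outbox (
            id        INTEGER PRIMARY KEY AUTOINCREMENT,
            payload   TEXT NOT NULL,
            published INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    return conn


def record_status(conn, message_id, status):
    """Write the state change and its outbox event in one transaction."""
    with conn:  # commits on success, rolls back if either statement raises
        conn.execute(
            "INSERT OR REPLACE INTO notifications (message_id, status) VALUES (?, ?)",
            (message_id, status),
        )
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"message_id": message_id, "status": status}),),
        )


def relay_once(conn, publish):
    """Publish pending outbox rows in order; return how many were published."""
    pending = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    published = 0
    for row_id, payload in pending:
        # If publish() raises, the row stays pending and is retried next run.
        publish(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
        published += 1
    return published
```

A relay like this gives at-least-once delivery: if the process dies after `publish` but before the row is marked, the event goes out again on the next run, so the consumer has to be idempotent. That is the price of never losing a transition the client can later ask about.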
