Posts

The Physics of Databases (Part 3): The Specialized Engines of the Final 10%

Introduction: In Part-1 and Part-2, we mastered the transactional heavyweights. We learned how B-Trees and LSM-Trees fit into the "Two-Layer Problem" of disk and network. But what happens when your data isn't just a row, but a relationship, a search term, or a high-dimensional concept? When general-purpose tools become your biggest bottleneck, you must enter the world of Specialized Physics. 1. The Inverted Index: The Physics of Search (Elasticsearch) Traditional databases are "Forward Indexes" ( $Key \rightarrow Row$ ). If you want to find every log entry containing the word CRITICAL, a B-Tree index keyed on the row ID cannot help, so the database must fall back to a Full Table Scan, reading every byte of every row ( $O(N)$ ). The Mechanic: The Inverted Index. During ingestion, the engine (Lucene) tokenizes text into "terms." It builds a sorted map where the "Key" is the term and the "Value" is a Posting List (a compressed list of the document IDs in which that term appears). Practical Example: Searc...
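The mechanic described in the excerpt above (tokenize, then map each term to a posting list) can be sketched in a few lines of Python. This is a toy illustration, not how Lucene is implemented: the `tokenize`, `build_inverted_index`, and `search` names and the sample log lines are all hypothetical, and the posting lists here are plain sorted lists rather than compressed structures.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase and split on anything that is not a letter or digit.
    # Real analyzers (stemming, stop words, etc.) are far more involved.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    # Map each term to the set of document IDs that contain it.
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    # Sort the terms and each posting list, mirroring the "sorted map" idea.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def search(index, *terms):
    # AND query: intersect the posting lists of every requested term.
    lists = [index.get(term.lower(), []) for term in terms]
    if not lists:
        return []
    result = set(lists[0])
    for posting_list in lists[1:]:
        result &= set(posting_list)
    return sorted(result)


# Hypothetical log lines keyed by document ID.
logs = {
    1: "CRITICAL disk failure on node-7",
    2: "INFO heartbeat ok",
    3: "CRITICAL: replica lag on node-7",
    4: "WARN disk usage high",
}

index = build_inverted_index(logs)
print(search(index, "CRITICAL"))          # documents containing "critical"
print(search(index, "critical", "disk"))  # documents containing both terms
```

The lookup for a term touches only that term's posting list, so the cost depends on how many documents match rather than on the total number of rows, which is the contrast with the full scan described in the excerpt.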

The Physics of Databases (Part 2): The "Two-Layer" Secret to Navigating the CAP Theorem

Introduction: In Part-1, we explored how the physical storage engine (B-Trees vs. LSM-Trees) dictates your primary key strategy and single-node performance. But when you scale a database across multiple machines or global regions, the physical disk is only half the battle. One of the biggest mistakes engineers make is confusing the storage engine with the distributed protocol. If both Apache Cassandra and Google Cloud Spanner use LSM-Trees underneath, why is Cassandra eventually consistent while Spanner is strongly consistent? To choose the right database, you must evaluate the Two-Layer Problem. 1. The Two-Layer Database Architecture A distributed database is actually built of two completely separate architectural layers. Layer 1: The Local Storage Engine (The Disk) The Goal: Write bytes to a specific SSD as fast as the hardware allows. The Tech: B-Trees (PostgreSQL, MySQL, DynamoDB) or LSM-Trees (Cassandra, Spanner). This layer has absolutely no concept of "Consistency...