About Me

Hi, I’m Sarthak Makhija, Principal Architect at Caizin. I write long-form essays on refactoring, storage engines, databases, and engineering trade-offs.
Prior to joining Caizin, I was with Thoughtworks, where I led a team that developed a strongly consistent, distributed key/value storage engine in Go. The system was built with a focus on high availability and strict correctness, featuring:
- Core Storage & Coordination: Badger as the underlying local key/value engine, with etcd managing cluster metadata.
- Distribution & Sharding: Hash partitioning for data distribution across the cluster, using consistent hashing for the assignment of partitions/shards.
- Consistency & Consensus: Raft/Multi-Raft for consensus, paired with two-phase commit to provide a serializable isolation level.
- Networking: Persistent TCP connections for efficient, low-latency node-to-node communication.
I also enjoy sharing my knowledge and contributing to the broader engineering community:
- Authoring: I contributed to the validation of distributed system patterns in the book Patterns of Distributed Systems by Unmesh Joshi. I authored articles on persistent memory for Marcin Moskala.
- Workshops: I design and facilitate hands-on, deep-dive workshops focused on mastering software craftsmanship and storage internals.
Additionally, I spend time building educational systems from scratch to demystify how databases and distributed systems work under the hood.
Currently exploring
- Finished building Relop, a query engine in Rust, and launched a 7-part series on its internals
- Designing and building Nilo, my own programming language (private repository for now)
- Writing technical essays on tech-lessons.in
Talks
Questioning database claims: Design patterns of storage engines
I gave a talk on “Questioning database claims: Design patterns of storage engines” at GoConIndia24 on 2nd December. Link to the talk.
The talk walks through common patterns in key-value storage engines: persistence (WAL, fsync), efficient retrieval (B+tree, bloom filters, data layouts), and efficient ingestion (sequential I/O, LSM, WiscKey). It then examines database claims such as durability, read optimization, and write optimization, to help pick the right database(s) for a given use case.
Some Projects
🔹 Relop
Relop is a minimal, in-memory implementation of relational operators built to explore query processing. It covers the pipeline from lexical analysis and parsing to logical planning, optimization, and execution.
I have documented the building of Relop in a 7-part series that explains its internal architecture.
Key Features
- SQL Support: Supports basic selection, filtering (WHERE), ordering, and joins.
- Educational Focus: Built with a focus on understanding the internals of a query engine, inspired by Crafting Interpreters and Database Design and Implementation.
- End-to-End Pipeline: Implements the full query processing flow, including tokenization, AST generation, logical planning, optimization, and physical execution via iterators.
🔹 Go-LSM
An LSM-based key-value store in Go, built for educational purposes and inspired by LSM in a Week. It is a rewrite of the existing workshop code.
Exploring LSM with go-lsm
- Learn LSM from the ground up: Dive deep into the core concepts of Log-Structured Merge-Trees (LSM) through a practical, well-documented implementation.
- Benefit from clean code: Analyze a meticulously crafted codebase that prioritizes simplicity and readability.
- Gain confidence with robust tests: Verify the correctness and reliability of the storage engine through comprehensive tests.
- Experiment and extend: Customize the code to explore different LSM variations or integrate it into your own projects.
🔹 clearcheck
Write expressive and elegant assertions with ease! clearcheck is designed to make assertion statements in Rust as clear and concise as possible. It allows chaining multiple assertions together for a fluent and intuitive syntax, leading to more self-documenting test cases.
let pass_phrase = "P@@sw0rd1 zebra alpha";
pass_phrase.should_not_be_empty()
    .should_have_at_least_length(10)
    .should_contain_all_characters(vec!['@', ' '])
    .should_contain_a_digit()
    .should_not_contain_ignoring_case("pass")
    .should_not_contain_ignoring_case("word");