Sarthak Makhija
Sarthak Makhija Databases & Storage Systems

About Me

Sarthak Makhija

Hi, I’m Sarthak Makhija, Principal Architect at Caizin. I write long-form essays on refactoring, storage engines, databases, and engineering trade-offs.

Prior to joining Caizin, I was with Thoughtworks, where I led a team that developed a strongly consistent, distributed key/value storage engine in Go. The system was built with a focus on high availability and strict correctness, and featured:

  • Core Storage & Coordination: Badger as the underlying local key/value engine, with etcd managing cluster metadata.
  • Distribution & Sharding: Hash partitioning for data distribution across the cluster, using consistent hashing for the assignment of partitions/shards.
  • Consistency & Consensus: Raft/Multi-Raft for consensus, paired with two-phase commit to provide a serializable isolation level.
  • Networking: Persistent TCP connections for efficient, low-latency node-to-node communication.
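The partition-assignment idea in the list above can be illustrated with a small consistent-hashing ring. This is a minimal sketch in Rust (the language used for the other examples on this page), not code from the Go storage engine: the `HashRing` type, the `virtual_nodes` parameter and the use of the standard library's `DefaultHasher` are assumptions made for illustration. Each node is hashed onto the ring several times, and a partition belongs to the first node found clockwise from the partition's own hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hashes any hashable value to a u64 position on the ring.
/// Assumption: DefaultHasher is deterministic within one build, but its
/// algorithm is not guaranteed to stay the same across Rust releases.
fn ring_position<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// A consistent-hash ring that assigns partition ids to nodes (hypothetical type).
/// Each node is placed on the ring `virtual_nodes` times to even out the load.
pub struct HashRing {
    ring: BTreeMap<u64, String>,
    virtual_nodes: u32,
}

impl HashRing {
    pub fn new(virtual_nodes: u32) -> Self {
        HashRing { ring: BTreeMap::new(), virtual_nodes }
    }

    pub fn add_node(&mut self, node: &str) {
        for replica in 0..self.virtual_nodes {
            let position = ring_position(&format!("{node}#{replica}"));
            self.ring.insert(position, node.to_string());
        }
    }

    /// Removes every virtual position owned by `node`; other positions are untouched.
    pub fn remove_node(&mut self, node: &str) {
        self.ring.retain(|_, owner| owner != node);
    }

    /// Walks clockwise from the partition's position to the first node,
    /// wrapping around to the start of the ring if necessary.
    pub fn node_for_partition(&self, partition_id: u32) -> Option<&str> {
        let position = ring_position(&partition_id);
        self.ring
            .range(position..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, node)| node.as_str())
    }
}
```

Because removing a node only deletes that node's ring positions, only the partitions it owned move; every other partition keeps its owner. A production system would pick a hash function whose output is stable across releases and versions.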

I also enjoy sharing my knowledge and contributing to the broader engineering community:

  • Authoring: I contributed to the validation of distributed system patterns in the book Patterns of Distributed Systems by Unmesh Joshi. I authored articles on persistent memory for Marcin Moskala.
  • Workshops: I design and facilitate hands-on, deep-dive workshops focused on mastering software craftsmanship and storage internals.

Additionally, I spend time building educational systems from scratch to demystify how databases and distributed systems work under the hood.


Currently exploring

  • Finished building a query engine (Relop) in Rust and launched a 7-part series on its internals
  • Designing and building Nilo, my own programming language (private repository for now)
  • Writing technical essays on tech-lessons.in

Talks

Questioning database claims: Design patterns of storage engines

I gave a talk on “Questioning database claims: Design patterns of storage engines” at GoConIndia24 on 2nd December. Link to the talk.

The talk walked through common design patterns of storage engines (key-value storage engines in particular): persistence (WAL, fsync), efficient retrieval (B+trees, Bloom filters, data layouts) and efficient ingestion (sequential I/O, LSM trees, WiscKey). It then used these patterns to question database claims about durability, read optimization and write optimization, and to pick the right database(s) for a given use case.
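The durability pattern is the easiest one to question concretely: a write is only as durable as the last successful fsync. Here is a hypothetical, minimal write-ahead log in Rust; `WriteAheadLog`, its tab-separated line format and the `replay` helper are illustrative assumptions, not code from the talk. `File::sync_all` asks the OS to flush the file's data and metadata to the storage device before `append` returns, so a write is acknowledged only after it has been synced.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, BufRead, BufReader, Write};
use std::path::Path;

/// A minimal write-ahead log (hypothetical): one record per line, appended to a file.
pub struct WriteAheadLog {
    file: File,
}

impl WriteAheadLog {
    pub fn open(path: &Path) -> io::Result<Self> {
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(WriteAheadLog { file })
    }

    /// Appends a record and returns only after `sync_all` (fsync on Unix) succeeds.
    pub fn append(&mut self, key: &str, value: &str) -> io::Result<()> {
        // Assumption for the sketch: keys and values contain no tabs or newlines.
        writeln!(self.file, "{key}\t{value}")?;
        // Without this call the bytes may still sit in the OS page cache.
        self.file.sync_all()
    }

    /// Replays the log on startup to rebuild in-memory state, oldest record first.
    pub fn replay(path: &Path) -> io::Result<Vec<(String, String)>> {
        let reader = BufReader::new(File::open(path)?);
        let mut records = Vec::new();
        for line in reader.lines() {
            let line = line?;
            // Lines without a tab (e.g. a torn final write) are skipped.
            if let Some((key, value)) = line.split_once('\t') {
                records.push((key.to_string(), value.to_string()));
            }
        }
        Ok(records)
    }
}
```

The sketch leaves out several things real engines need: per-record checksums to detect torn writes, an fsync of the parent directory after the log file is first created, and batching many writes into one fsync (group commit) to amortize its cost.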


Some Projects

🔹 Relop

Relop is a minimal, in-memory implementation of relational operators built to explore query processing. It covers the pipeline from lexical analysis and parsing to logical planning, optimization and execution.

I have documented the building of Relop in a 7-part series that explains its internal architecture.
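As a rough illustration of the first stage of that pipeline, lexical analysis, here is a hypothetical tokenizer for a tiny SQL subset. The names `Token`, `tokenize` and the keyword list are assumptions, and this is not Relop's actual lexer: it turns a query string into keywords, identifiers, integer and string literals, and single-character symbols, which a parser would then assemble into an AST.

```rust
/// Token kinds for a tiny SQL subset (hypothetical; not Relop's token set).
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Keyword(String),       // SELECT, FROM, WHERE, ... (normalised to upper case)
    Identifier(String),    // table and column names, case preserved
    Number(i64),           // unsigned integer literals
    StringLiteral(String), // 'single-quoted' text without escape sequences
    Symbol(char),          // , * ( ) = < > ; .
}

const KEYWORDS: [&str; 8] = ["SELECT", "FROM", "WHERE", "ORDER", "BY", "JOIN", "ON", "AND"];

/// Scans the input left to right, emitting one token per lexeme.
pub fn tokenize(input: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = input.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c.is_ascii_digit() {
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            let digits: String = chars[start..i].iter().collect();
            let value = digits.parse::<i64>().map_err(|e| e.to_string())?;
            tokens.push(Token::Number(value));
        } else if c.is_alphabetic() || c == '_' {
            let start = i;
            while i < chars.len() && (chars[i].is_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            let word: String = chars[start..i].iter().collect();
            let upper = word.to_uppercase();
            if KEYWORDS.contains(&upper.as_str()) {
                tokens.push(Token::Keyword(upper));
            } else {
                tokens.push(Token::Identifier(word));
            }
        } else if c == '\'' {
            let start = i + 1;
            i = start;
            while i < chars.len() && chars[i] != '\'' {
                i += 1;
            }
            if i == chars.len() {
                return Err(format!("unterminated string literal starting at position {}", start - 1));
            }
            tokens.push(Token::StringLiteral(chars[start..i].iter().collect()));
            i += 1; // skip the closing quote
        } else if ",*()=<>;.".contains(c) {
            tokens.push(Token::Symbol(c));
            i += 1;
        } else {
            return Err(format!("unexpected character '{}' at position {}", c, i));
        }
    }
    Ok(tokens)
}
```

Keywords are matched case-insensitively, so `select` and `SELECT` produce the same token, while identifiers keep their original spelling.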

Key Features

  • SQL Support: Supports basic selection, filtering (WHERE), ordering, and joins.
  • Educational Focus: Built with a focus on understanding the internals of a query engine, inspired by Crafting Interpreters and Database Design and Implementation.
  • End-to-End Pipeline: Implements the full query-processing flow, including tokenization, AST generation, logical plans, optimizations and physical execution via iterators.
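Iterator-based physical execution is usually associated with the Volcano (pull-based) model: every operator is an iterator that pulls rows from its child. Below is a minimal Rust sketch of that idea with hypothetical operator names (`scan`, `filter`, `project`, `order_by`, `nested_loop_join`) and a simplified `Value` type; it is not Relop's API. `order_by` is a blocking operator that drains its input before yielding anything, while the other operators stream row by row.

```rust
/// A column value; deriving `Ord` lets ORDER BY compare values directly.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum Value {
    Int(i64),
    Text(String),
}

/// A row is a list of column values addressed by position.
pub type Row = Vec<Value>;

/// Every physical operator is a boxed iterator over rows (pull-based execution).
pub type RowIter = Box<dyn Iterator<Item = Row>>;

/// Table scan: yields every row of an in-memory table.
pub fn scan(table: Vec<Row>) -> RowIter {
    Box::new(table.into_iter())
}

/// WHERE: passes through only the rows that satisfy the predicate.
pub fn filter(input: RowIter, predicate: impl Fn(&Row) -> bool + 'static) -> RowIter {
    Box::new(input.filter(move |row| predicate(row)))
}

/// SELECT col, ...: keeps the listed column positions, in the listed order.
pub fn project(input: RowIter, columns: Vec<usize>) -> RowIter {
    Box::new(input.map(move |row: Row| {
        columns.iter().map(|&i| row[i].clone()).collect::<Row>()
    }))
}

/// ORDER BY col: a blocking operator that must read all input before emitting.
pub fn order_by(input: RowIter, column: usize) -> RowIter {
    let mut rows: Vec<Row> = input.collect();
    rows.sort_by(|a, b| a[column].cmp(&b[column]));
    Box::new(rows.into_iter())
}

/// JOIN ... ON left[left_col] = right[right_col] as a nested-loop join.
/// The right side is materialised; each output row is left columns then right columns.
pub fn nested_loop_join(left: RowIter, right: Vec<Row>, left_col: usize, right_col: usize) -> RowIter {
    Box::new(left.flat_map(move |l: Row| {
        right
            .iter()
            .filter(|r| r[right_col] == l[left_col])
            .map(|r| l.iter().chain(r.iter()).cloned().collect::<Row>())
            .collect::<Vec<Row>>()
    }))
}
```

For example, assuming an `employees` table with columns `[id, name, age]`, `SELECT name FROM employees WHERE age > 26 ORDER BY name` could be executed as `project(order_by(filter(scan(employees), |row| row[2] > Value::Int(26)), 1), vec![1])`. Nothing runs until the outermost iterator is consumed.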

🔹 Go-LSM

An LSM-based key-value store in Go, built for educational purposes and inspired by LSM in a Week. It is a rewrite of the existing workshop code.

Exploring LSM with go-lsm

  • Learn LSM from the ground up: Dive deep into the core concepts of Log-Structured Merge-Trees (LSM) through a practical, well-documented implementation.
  • Benefit from clean code: Study a codebase that prioritizes simplicity and readability.
  • Gain confidence with robust tests: Verify the correctness and reliability of the storage engine through comprehensive tests.
  • Experiment and extend: Customize the code to explore different LSM variations or integrate it into your own projects.
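The core read and write paths of an LSM tree fit in a few dozen lines. The following is a hypothetical toy in Rust, not an excerpt from go-lsm (which is written in Go): writes land in a sorted in-memory memtable, a full memtable is frozen into an immutable sorted run that stands in for an on-disk SSTable, and reads check the memtable first and then the runs from newest to oldest. Deletes write a tombstone (`None`) so they shadow older values. The names `TinyLsm` and `memtable_limit` are assumptions; the WAL, SSTable files, Bloom filters and compaction are deliberately left out.

```rust
use std::collections::BTreeMap;

/// `None` marks a tombstone (a deleted key) that shadows older values.
type Entry = Option<String>;

/// A toy LSM store (hypothetical): sorted memtable plus immutable sorted runs.
pub struct TinyLsm {
    memtable: BTreeMap<String, Entry>,
    sorted_runs: Vec<Vec<(String, Entry)>>, // oldest run first
    memtable_limit: usize,
}

impl TinyLsm {
    pub fn new(memtable_limit: usize) -> Self {
        TinyLsm { memtable: BTreeMap::new(), sorted_runs: Vec::new(), memtable_limit }
    }

    pub fn put(&mut self, key: &str, value: &str) {
        self.write(key, Some(value.to_string()));
    }

    pub fn delete(&mut self, key: &str) {
        self.write(key, None);
    }

    fn write(&mut self, key: &str, entry: Entry) {
        self.memtable.insert(key.to_string(), entry);
        if self.memtable.len() >= self.memtable_limit {
            self.flush();
        }
    }

    /// Freezes the memtable into a new sorted run (BTreeMap iterates in key order).
    fn flush(&mut self) {
        let run: Vec<(String, Entry)> = std::mem::take(&mut self.memtable).into_iter().collect();
        self.sorted_runs.push(run);
    }

    pub fn get(&self, key: &str) -> Option<String> {
        if let Some(entry) = self.memtable.get(key) {
            return entry.clone();
        }
        // Search runs from newest to oldest; the first hit wins, even a tombstone.
        for run in self.sorted_runs.iter().rev() {
            if let Ok(index) = run.binary_search_by(|(k, _)| k.as_str().cmp(key)) {
                return run[index].1.clone();
            }
        }
        None
    }

    pub fn run_count(&self) -> usize {
        self.sorted_runs.len()
    }
}
```

Without compaction the number of runs grows with every flush, so reads get slower over time; merging runs and dropping shadowed entries is exactly what compaction adds.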

🔹 clearcheck

Write expressive and elegant assertions with ease! clearcheck is designed to make assertion statements in Rust as clear and concise as possible. It allows chaining multiple assertions together for a fluent, intuitive syntax, which leads to more self-documenting test cases.

let pass_phrase = "P@@sw0rd1 zebra alpha";
pass_phrase.should_not_be_empty()
    .should_have_at_least_length(10)
    .should_contain_all_characters(vec!['@', ' '])
    .should_contain_a_digit()
    .should_not_contain_ignoring_case("pass")
    .should_not_contain_ignoring_case("word");
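The chaining in the example works because each assertion returns a reference to the value it checked. Below is a hypothetical extension trait that mirrors the method names used above to show the mechanism; it is an illustrative sketch, not clearcheck's actual implementation, trait layout or signatures. Each method panics with a descriptive message on failure and otherwise returns `&Self`, so the next assertion can be called on the result.

```rust
/// Hypothetical extension trait; method names mirror the example above.
pub trait StrAssertions {
    fn should_not_be_empty(&self) -> &Self;
    fn should_have_at_least_length(&self, length: usize) -> &Self;
    fn should_contain_all_characters(&self, expected: Vec<char>) -> &Self;
    fn should_contain_a_digit(&self) -> &Self;
    fn should_not_contain_ignoring_case(&self, needle: &str) -> &Self;
}

/// Implemented for `str`, so it works on `&str` and, via deref, on `String`.
impl StrAssertions for str {
    fn should_not_be_empty(&self) -> &Self {
        assert!(!self.is_empty(), "expected a non-empty string");
        self
    }

    fn should_have_at_least_length(&self, length: usize) -> &Self {
        assert!(
            self.chars().count() >= length,
            "expected {:?} to have at least {} characters", self, length
        );
        self
    }

    fn should_contain_all_characters(&self, expected: Vec<char>) -> &Self {
        for c in expected {
            assert!(self.contains(c), "expected {:?} to contain {:?}", self, c);
        }
        self
    }

    fn should_contain_a_digit(&self) -> &Self {
        assert!(self.chars().any(|c| c.is_ascii_digit()), "expected {:?} to contain a digit", self);
        self
    }

    fn should_not_contain_ignoring_case(&self, needle: &str) -> &Self {
        assert!(
            !self.to_lowercase().contains(&needle.to_lowercase()),
            "expected {:?} not to contain {:?} (ignoring case)", self, needle
        );
        self
    }
}
```

Returning `&Self` instead of `bool` is the key design choice: a failed check stops the test at the exact expectation that broke, and a passing check hands the same value to the next link in the chain.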


Resume

Download my Resume (PDF)