About Me

Hi, I’m Sarthak Makhija, Principal Architect at Caizin.
I build storage engines, search systems, query engines, and distributed databases, working primarily in Go and Rust. My interests lie in database internals, search infrastructure, query processing, and distributed state management.
Beyond building software, I document low-level implementation details on my blog, validate core patterns for industry literature such as Unmesh Joshi’s Patterns of Distributed Systems, and run hands-on systems training sessions.
Core Systems & Architecture
Principal Architect | Caizin (2024 - Present): Architecting and building a production-grade transpiler that automatically migrates legacy UniBasic codebases into the Java ecosystem. The implementation covers the entire compiler pipeline: custom lexical analysis, parsing, semantic analysis, AST transformation, and code generation.
Lead Consultant | Thoughtworks (2016 - 2024): Led the engineering team that built a distributed, strongly consistent key/value database from scratch in Go. Designed to guarantee strict correctness and serializable isolation across failure domains, the system sustained 10,000 transactions per second across 32 partitions with a replication factor of 3.
- Storage & Coordination: Badger engine instances for local storage; etcd for cluster metadata management.
- Sharding & Topology: Hash partitioning via consistent hashing, backed by multi-AZ replication to survive zone-level outages.
- Consensus & State: Multi-Raft consensus paired with a two-phase commit (2PC) engine.
- Networking: Custom, persistent TCP connections for low-latency node-to-node RPCs.
Production Systems
Modifying Distributed Search Internals: Built a petabyte-scale search engine experiment by modifying Quickwit’s core architecture. To work around limitations in the underlying library, we overhauled the engine’s internal delete mechanics to implement a custom tombstone store and engineered a specialized multi-valued update engine on top of Tantivy.
Query Engine Processing Layers: I authored the 7-part technical series “Inside a Query Engine” on this blog, breaking down the mechanics of physical execution and logical optimization. This deep-dive series currently ranks #1 globally on Google Search for query engine internals.
Open-Source Projects
🔹 infer (WIP): A type-inference compiler implementing constraint-based Hindley-Milner type inference for a custom programming language.
🔹 Relop: A minimal, in-memory relational query engine written in Rust. It handles everything from tokenization and custom AST parsing to rule-based logical plan optimization and iterator-driven execution.
🔹 Go-LSM: An educational log-structured merge-tree (LSM) key-value store in Go. It serves as a clean reference codebase for my storage internals workshops, demonstrating WAL boundaries, memtable flushes, and SSTable compaction.
🔹 clearcheck: A fluent, expressive assertion library for Rust that lets you chain multiple assertions together, making test cases highly self-documenting.