Inside a Query Engine: Introduction
Image by Gemini
Table of contents
What actually happens when you write:
SELECT name FROM users WHERE age > 25;
How does that string transform into rows flowing through relational operators?
This series dissects that journey.
We will follow a query from raw text to tokens, from tokens to syntax trees, from syntax trees to relational operators, to optimization, and finally to execution. Along the way, we will peel back the abstractions that make databases feel like magic.
To make this exploration concrete, we will use Relop: a minimal, handwritten relational operator engine in Rust designed for learning and architectural transparency.
While Relop simulates core database concepts like storage, catalog, and schema, it is strictly an in-memory engine. It provides the architectural “scaffolding” of a real database without the overhead of disk I/O, allowing us to focus entirely on the query processing logic.
What is this series?
This series is not about using a database.
It is about understanding how a database transforms a declarative language into a procedural execution plan.
SQL tells the system what to compute. The query engine decides how to compute it.
Instead of relying on industrial-grade libraries like sqlparser-rs, we will build core components by hand. The goal is not SQL completeness, but clarity. By focusing on a handwritten parser and a simplified execution engine, we can touch on the core algorithms and data structures that drive systems like Postgres, without getting lost in the thousands of lines of code required for production-grade SQL compliance.
A Brief on Query Processing
Query processing is a multi-stage pipeline that transforms a declarative SQL string into a result set. In a full-fledged database system, this pipeline may include optimization and physical planning phases.
In this series and in Relop, we focus on the following structural stages:
- Lexical Analysis: Breaking the raw string into tokens (e.g.,
SELECT,WHERE). - Syntactic Analysis (Parsing): Validating tokens against a grammar and building an Abstract Syntax Tree (AST).
- Logical Planning: Transforming the AST into a tree of relational operators (Scan, Filter, Join).
- Optimization: Applying logical rules to transform the naive plan into a more efficient one (e.g. Predicate Pushdown).
- Execution (The Volcano Model): Turning the optimized plan into executable operators that pull rows from storage.
(Note: Relop does not implement a physical planning phase.)
Conceptually, the flow looks like this:
SQL String ───▶ Tokens ───▶ AST ───▶ Logical Plan ───▶ Optimizer ───▶ Optimized ───▶ Executable ───▶ Rows
(Text) (Lexer) (Parser) (Relational Logical Plan Operators (Result)
Algebra) (Executor)
Typical Challenges in Query Processing
Building a robust query engine involves solving several non-trivial problems:
- Ambiguity and Precedence: Without a formal grammar, a query like
WHERE age > 30 AND city = 'Berlin' OR salary > 5000is ambiguous. Should the engine evaluate theANDfirst or theOR? A handwritten parser forces us to think explicitly about these rules. - Recursive Structures: Handling joins where a table source could be another joined table, or expressions where an operand could be another expression.
- Performance vs Generality: A parser must be expressive enough to support the language we design, yet simple enough to remain understandable and efficient.
- Bridging the Gap: A syntax tree represents intent. An execution engine produces results. Bridging that gap, turning declarative structure into procedural computation is the core intellectual leap in query processing.
The Series Roadmap
Here is what is planned for the upcoming essays:
- Lexical Analysis: From raw strings to a stream of tokens.
- Thinking in Grammar: Designing the data structures that hold a query’s intent. (Coming Soon)
- Handwritten Parser: Mapping the Grammar to Recursive Descent Code. (Coming Soon)
- Expressions and Precedence: Solving the logic puzzle of
AND,OR, and parentheses. (Coming Soon) - The Logical Plan: Converting the AST to a tree of relational operators. (Coming Soon)
- Query Optimization: Applying heuristics to transform a naive logical plan into an efficient one. (Coming Soon)
- Execution: Turning the Optimized Logical Plan into executable code. (Coming Soon)
Let’s start with Part 1: Lexical Analysis.