Inside a Query Engine (Part 5): From Syntax to Algebra
Table of contents
In the previous parts of this series, we’ve focused on the syntax of a query, how to break a string into tokens and how to validate those tokens against a grammar to build an Abstract Syntax Tree (AST).
But a database doesn’t “execute” an AST. An AST is a linguistic structure; it represents how the query was written. To run the query, we need to understand what it means algebraically.
This part covers the transition: From Syntax to Algebra.
The Transition: AST vs. Logical Plan
The AST is syntactic. It captures the structure of the SQL statement, keeping track of things like keywords, commas, and parentheses. It is a faithful representation of the user’s input.
The Logical Plan is algebraic. It represents the query as a tree of relational operators. It moves us away from SQL syntax and into relational transformations over data sets.
| Feature | Abstract Syntax Tree (AST) | Logical Plan |
|---|---|---|
| Focus | How the query is written (Syntax) | Algebraic interpretation of the query |
| Form | Hierarchical representation of tokens | Tree of relational operators |
Why the AST Is Not Enough
Consider a simple query:
SELECT id FROM users WHERE age > 30;
The AST for this query is a tree that says: “There is a Select node. It has a projection list with id, a source users, and a filter condition age > 30.”
However, an execution engine needs a structured representation of the relational transformations involved:
- Scan the
userstable. - Filter the resulting rows where
age > 30. - Project only the
idcolumn.
The AST captures the structure of the query, but the Logical Plan describes the sequence of relational transformations required to evaluate it.
Introducing Relational Algebra
The Logical Plan is built using the building blocks of Relational Algebra. Don’t let the name intimidate you; in the context of a query engine, these are simply operations that take one or more relations (tables/data sets) as input and produce a new relation as output.
In Relop, we focus on a core set of operators:
- Scan: The base operator that reads rows from a table.
- Selection: Also known as Filter. It picks rows that satisfy a predicate (e.g.,
age > 30). - Projection: Selects specific columns from a relation (e.g., just the
id). - Join: Combines two relations based on a condition.
- Sort / Limit: Auxiliary operators for ordering and truncating the result set.
The Logical Plan Tree
In code, we represent LogicalPlan as a recursive enum. Each node in the tree holds a reference to its child (the base_plan), creating a “chain” of operations:
pub(crate) enum LogicalPlan {
/// Plan to scan a table.
Scan {
table_name: String,
alias: Option<String>,
},
/// Plan to perform a join between two tables.
Join {
left: Box<LogicalPlan>,
right: Box<LogicalPlan>,
on: Option<Predicate>,
},
/// Plan to project specific columns from a base plan.
Projection {
base_plan: Box<LogicalPlan>,
columns: Vec<String>,
},
/// Plan to filter results based on a predicate.
Filter {
base_plan: Box<LogicalPlan>,
predicate: Predicate,
},
/// Plan to limit results.
Limit {
base_plan: Box<LogicalPlan>,
count: usize,
},
/// Plan to order the results.
Sort {
base_plan: Box<LogicalPlan>,
ordering_keys: Vec<OrderingKey>,
},
// ... ShowTables, DescribeTable
}
Unary operators (Filter, Projection, Limit, Sort) have a single child, while binary operators (Join) have two.
For the query SELECT id FROM users WHERE age > 30, the logical tree looks like this:
Projection(id)
└── Filter(age > 30)
└── Scan(users)
Conceptually, data flows from the leaves (Scan) upward toward the root. The Scan produces rows (via Iterator), which flows into the Filter, which then flows into the Projection. Although a logical plan resembles a sequence of steps, it is still declarative. It describes data flow dependencies, not physical execution order.
Mapping AST ──▶ Logical Plan
The component responsible for this transformation is the Planner. It traverses the AST and “lowers” it into the algebraic structure.
In Relop, this is handled by LogicalPlanner:
pub(crate) struct LogicalPlanner;
impl LogicalPlanner {
pub(crate) fn plan(ast: Ast) -> Result<LogicalPlan, PlanningError> {
match ast {
Ast::Select {
source,
projection,
where_clause,
limit,
order_by,
} => {
// 1. Start with the source (Scan or Join)
let base_plan = Self::plan_for_source(source)?;
// 2. Wrap it in a Filter if WHERE exists
let base_plan = Self::plan_for_filter(where_clause, base_plan)?;
// 3. Wrap it in a Projection
let base_plan = Self::plan_for_projection(projection, base_plan);
// 4. Layer on Sort and Limit
let base_plan = Self::plan_for_sort(order_by, base_plan);
Ok(Self::plan_for_limit(limit, base_plan))
}
// ... ShowTables, DescribeTable mapping
}
}
fn plan_for_source(source: TableSource) -> Result<LogicalPlan, PlanningError> {
match source {
TableSource::Table { name, alias } => {
Ok(LogicalPlan::Scan { table_name: name, alias })
}
TableSource::Join { left, right, on } => {
Ok(LogicalPlan::Join {
left: Self::plan_for_source(*left)?.boxed(),
right: Self::plan_for_source(*right)?.boxed(),
on: on.map(Predicate::try_from).transpose()?,
})
}
}
}
fn plan_for_filter(
where_clause: Option<WhereClause>,
base_plan: LogicalPlan,
) -> Result<LogicalPlan, PlanningError> {
if let Some(clause) = where_clause {
return Ok(LogicalPlan::Filter {
base_plan: base_plan.boxed(),
predicate: Predicate::try_from(clause)?,
});
}
Ok(base_plan)
}
// ... plan_for_projection, etc.
}
How the Planner Works
- Bottom-Up Construction: Although a SQL query begins with
SELECT, the planner starts by constructing the data source. It first plans theFROMclause (producing aScanorJoin), and then wraps that source with additional operators likeFilter,Projection,Sort, andLimit. - Conversion: It converts syntactic expressions (from the AST) into logical Predicates. While the AST’s
Expressionenum is about nested logic, aPredicateis about something that can be evaluated against a row. - Strict Hierarchy: The planner enforces a standard order (Projection → Filter → Scan; top-to-bottom). Each stage builds on the one before it. The
Scanprovides raw rows,Filterreduces them,Projectionnarrows the columns, and so on. This ensures that later operators have access to the full context they require.
The LogicalPlanner in Relop is available here.
AST vs. Logical Plan: A Visual Comparison
Let’s look at how the same query is represented in both worlds:
QUERY: SELECT name FROM users WHERE id = 1
AST (Syntax) Logical Plan (Algebra)
──────────────────────────── ─────────────────────────────────
Select { Projection(name)
source: "users", └── Filter(id = 1)
projection: ["name"], └── Scan(users)
where: Comparison(id = 1)
}
The AST is a data structure defining what the user said. The Logical Plan is a tree defining how relations are transformed and combined.
Logical Plan as an Intermediate Representation (IR)
This layer is often referred to as an Intermediate Representation (IR). By decoupling the parser from the executor, we gain a massive advantage: Optimization.
Since the Logical Plan is an algebraic tree, we can perform “Tree Rewrites” before execution. These rewrites preserve logical equivalence while improving performance. This is where most of the “smarts” of a database live:
- Predicate Pushdown: If we have a filter, can we move it closer to the
Scanto read fewer rows? - Projection Pruning: If we only need the
name, can we tell theScanto ignore all other columns early? - Join Reordering: If we are joining three tables, which order will involve the smallest intermediate results?
Even in a minimal engine like Relop, having this IR is the foundation for turning a static query into a high-performance execution.
Conclusion
The transition from a syntactic AST to an algebraic Logical Plan is the moment a query engine stops thinking about “strings” and starts thinking about “data.”
(Note: Relop does not implement a physical planning phase.).
We’ve crossed the bridge from language into the world of relational operators. With a Logical Plan in hand, we have a clear recipe for what we want. But before we actually execute it, we can make that recipe cheaper. In the next part of this series, we will look at Query Optimization (Coming Soon) to see how Relop transforms this naive plan into an efficient one.
References
- Database design and implementation
- Chapter 10