Predicate pushdown, explained simply

How moving filters closer to storage turned a 90-second query into an 8-second one — without buying a single extra server.

9 min read · #Databases #Performance #Rust
The cheapest data to process is the data you never read. Predicate pushdown is the discipline of deciding what to skip as early as physically possible.

Where the time actually goes

In a columnar warehouse, most query time is spent moving bytes off disk and across the network. If a filter can eliminate a row group before it is ever read, that is pure saved cost.

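Parquet stores min/max statistics for each column at the row group level and, where the writer emits them, at the page level. A query for orders over $1,000 can skip any row group or page whose maximum order value is at or below that threshold. When the data is sorted or clustered on that column, that is often most of the file.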
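To make the skip rule concrete, here is a minimal sketch in Rust. It does not use a real Parquet reader: `ColumnStats`, `GreaterThan`, and `row_groups_to_read` are hypothetical names invented for this example, and amounts are assumed to be whole dollars stored as `i64`.

```rust
/// Min/max statistics for one column in one row group.
/// Hypothetical type standing in for what a Parquet footer records.
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    min: i64, // kept for completeness; a `<` predicate would test this bound
    max: i64,
}

/// A filter of the form `column > threshold`.
#[derive(Debug, Clone, Copy)]
struct GreaterThan {
    threshold: i64,
}

impl GreaterThan {
    /// True when no row in the group can possibly match, so the group
    /// can be skipped without reading any of its data.
    /// The comparison is `<=` because the predicate is a strict `>`.
    fn can_skip(&self, stats: &ColumnStats) -> bool {
        stats.max <= self.threshold
    }
}

/// Returns the indices of the row groups that still have to be read.
fn row_groups_to_read(stats: &[ColumnStats], pred: GreaterThan) -> Vec<usize> {
    stats
        .iter()
        .enumerate()
        .filter(|(_, s)| !pred.can_skip(s))
        .map(|(i, _)| i)
        .collect()
}
```

Given four row groups with maxima 900, 1000, 4200, and 9000 and a threshold of 1000, `row_groups_to_read` returns indices 2 and 3. The group whose maximum is exactly 1000 is skipped too, because no value in it is strictly greater than the threshold. Groups that survive can still contain non-matching rows, so the filter has to be applied again to whatever is actually read.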

Pushing the filter down the stack

The trick is teaching every layer (planner, scanner, storage) to apply the predicate itself instead of leaving all the filtering to the top of the plan. Each layer that drops rows early shrinks the work for everything above it.

  • Planner: rewrite the query so filters bind to scans.
  • Scanner: use statistics to skip row groups and pages.
  • Execution: evaluate remaining filters vectorized, in batches.
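The three layers above can be sketched end to end in a few dozen lines of Rust. This is a toy model under stated assumptions, not how any particular engine is built: `Pred`, `Plan`, `RowGroup`, `push_down`, `scan`, and `filter_batch` are all hypothetical names, the only predicate supported is `amount > threshold`, and "vectorized" here just means filtering one whole batch in a tight loop rather than using real SIMD kernels.

```rust
/// Hypothetical predicate: `amount > threshold`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Pred {
    threshold: i64,
}

/// A toy logical plan with only the two nodes needed here.
#[derive(Debug, PartialEq)]
enum Plan {
    Scan { pushed: Option<Pred> },
    Filter { pred: Pred, input: Box<Plan> },
}

/// Planner: bind a filter that sits directly on a scan to the scan itself.
fn push_down(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { pred, input } => match *input {
            Plan::Scan { pushed: None } => Plan::Scan { pushed: Some(pred) },
            other => Plan::Filter { pred, input: Box::new(push_down(other)) },
        },
        scan => scan,
    }
}

/// One row group: its column values plus the max recorded in its statistics.
struct RowGroup {
    max: i64,
    values: Vec<i64>,
}

/// Execution: evaluate the filter over a whole batch at once.
fn filter_batch(batch: &[i64], pred: Pred) -> Vec<i64> {
    batch.iter().copied().filter(|v| *v > pred.threshold).collect()
}

/// Scanner: skip row groups the statistics rule out, then filter the rest.
/// Returns the matching values and the number of row groups actually read.
fn scan(groups: &[RowGroup], pushed: Option<Pred>) -> (Vec<i64>, usize) {
    let mut rows = Vec::new();
    let mut groups_read = 0;
    for group in groups {
        if let Some(pred) = pushed {
            if group.max <= pred.threshold {
                continue; // no row in this group can match
            }
        }
        groups_read += 1;
        match pushed {
            Some(pred) => rows.extend(filter_batch(&group.values, pred)),
            None => rows.extend(group.values.iter().copied()),
        }
    }
    (rows, groups_read)
}
```

Calling `push_down` on a `Filter` over a bare `Scan` yields a `Scan` carrying the predicate. Passing that predicate to `scan` over two row groups with maxima 900 and 2500 and a threshold of 1000 reads only the second group. The design point is that the statistics check is conservative: it can only prove that a group has no matches, so `filter_batch` still runs on every group that survives.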

The payoff

On a 4 PB warehouse, pushdown took a representative dashboard query from 90 seconds to under 8, a speedup of more than 11x on exactly the same hardware.