A predicate-pushdown query engine that rewrites execution plans across a 64-node cluster, reading Parquet at the SSD level to cut latency by an order of magnitude.
Analysts were waiting minutes for dashboards that should have rendered in seconds. The existing engine scanned far more data than each query needed, and the planner had no awareness of the physical layout on disk.
With a 4 PB warehouse and 600+ daily users, every wasted scan multiplied into real cost and real frustration.
I rebuilt the execution layer around predicate pushdown — filters are evaluated as close to the storage as possible, skipping entire row groups before they ever leave the SSD.
A cost-based planner rewrites queries across the cluster, balancing shuffle cost against parallelism so the 64 nodes stay saturated without thrashing the network.
Queries that took 90 seconds now return in under 8. The team stopped pre-aggregating data just to make dashboards usable.
P99 latency dropped 12×, and the engine now serves the entire analytics org from a single cluster. The cost-based planner removed an entire class of hand-tuned materialized views.