Unpacking Redshift: A System I Misjudged
How I Misjudged Redshift and What Its Architecture Taught Me About System Design at Scale
Table of Contents
A Personal Reckoning
Redshift’s High-Level Anatomy
From SQL to Native Execution
The Compiler-as-a-Service Revolution
Storage: Beyond the Local Horizon
Concurrency and Snapshot Isolation
Compute and Parallelism
Integration and Elasticity
Closing Reflection
A Personal Reckoning
When I first left Redshift behind for BigQuery, it felt a bit like stepping into the future. After a year of wrestling with Redshift’s quirks, BigQuery’s serverless, usage-based, practically-no-ops interface was a breath of fresh, well-indexed air.
I remember thinking something like: “Yeah, this is at least three times better.” That impression solidified into dogma. For years, I saw Redshift as a relic: serviceable, maybe, but nowhere near the cutting edge of large-scale OLAP systems.
But then my perspective started to shift. I had become increasingly curious not just about what BigQuery could do, but about what it couldn’t do.
I wanted to understand its architectural boundaries, for example, how it handled latency, compilation, and parallelism at scale. That’s when I stumbled back across Redshift.
What began as a passing glance turned into something far more consuming. I found myself pulled back into Redshift not out of nostalgia, but out of genuine curiosity.
The deeper I went into understanding the limitations of BigQuery, especially around compilation latency and execution flexibility, the more I realized how little I understood about Redshift’s architectural model.
That curiosity became a thread I couldn’t stop pulling.
For over two weeks, I spent long nights dissecting every piece of material I could get my hands on. I read and reread Amazon Redshift Re-invented (2022), scanned countless pages of documentation, dove into user blogs, internal diagrams, AWS re:Invent talks, even forum threads from engineers in the trenches.
I diagrammed execution paths. I traced lineage back to ParAccel. I tried to reconstruct Redshift’s design decisions not just from the outside, but from the inside out, by asking what trade-offs they made, why they made them, and what those choices reveal about the system’s intent.
And still, it didn’t feel like enough. There was too much richness to leave undocumented, too much I needed to crystallize in order to really own the understanding I was forming. So I did what I often do when trying to lock in hard-earned insight: I decided to write.
This post is the result of that process: not a summary, not a tutorial, but a personal architecture walk-through. A reconstruction of Redshift as I came to understand it.
Writing this was both a technical exercise and a mental one: the kind of learning loop that only closes when you try to explain it to someone else.
If you’ve ever dismissed Redshift as a second-tier cloud warehouse, or assumed it’s simply “Postgres in the cloud”, I hope this gives you a reason to take a second look. I know I did.
It was exhausting, but deeply rewarding.
This isn’t a comparison piece. It’s a narrative reconstruction: think of it as a reflection on what Redshift has quietly become under the hood. As with many well-designed systems, the elegance only reveals itself when you stop benchmarking and start understanding.
Redshift’s High-Level Anatomy
At first glance, Redshift follows a familiar MPP (massively parallel processing) pattern: a single leader node orchestrates the work, while multiple compute nodes execute it in parallel.
But what looks like a classic shared-nothing topology on the surface conceals something far more dynamic underneath: a system engineered to reconcile two opposing forces, the speed and locality of SSD-backed compute and the elasticity and scale of remote object storage.
Over the years, Redshift’s architecture has evolved into a tiered and disaggregated model, where the boundaries between storage and compute have been deliberately weakened, not to blur responsibility, but to allow flexibility and scale.
The current system, especially with the introduction of Redshift Managed Storage (RMS) and RA3 instance types, is best described as a three-tiered architecture:
1. Leader Node
This is the brain of the operation. It doesn’t store data or execute queries directly but is responsible for parsing SQL statements, optimizing execution plans, managing metadata, and orchestrating execution across the cluster.
It performs query rewriting, selects appropriate join strategies, and takes data distribution into account to minimize data shuffling.
The leader is what we call “topology-aware”: it plans queries not just based on logical cost but on the physical arrangement of compute resources and the partitioning of data. Think of it as both the compiler and scheduler in a distributed pipeline.
2. Compute Nodes
These are the muscle. Each node is divided into slices, which act as virtual execution units capable of running segments of a query in parallel.
Data is stored locally on high-speed SSDs in a compressed, columnar format, optimized for analytical workloads. These nodes receive compiled C++ code segments from the leader and execute them natively, leveraging vectorized execution paths, SIMD instructions, and code specialization for each query.
3. Redshift Managed Storage (RMS)
RMS is Redshift’s disaggregated storage layer, introduced with RA3 nodes and available in Redshift Serverless. It replaces the old model where data was stored directly on compute nodes’ disks.
Instead, it places the durable source of truth in Amazon S3, giving Redshift unlimited storage capacity with high availability and strong durability guarantees.
But RMS is not "just S3." It is augmented with:
Tiered caching: hot blocks live in fast NVMe SSDs on compute nodes.
In-memory cache: for frequently accessed blocks and temp data.
Prefetching engines: that analyze access patterns and hydrate blocks before they’re needed.
Zone maps and bloom filters: to minimize unnecessary I/O.
This hybrid design allows Redshift to scale compute and storage independently.
Want more compute during business hours? Add more RA3 nodes. Want to shrink compute at night? Scale it down; your data remains in S3, and only the metadata needs to be redistributed.
Why This Matters
This evolution mirrors what we’ve seen across the modern data stack: a shift toward disaggregation, not just for flexibility, but for operational stability.
By separating compute from storage, Redshift can now support features like elastic resize, concurrency scaling, cross-cluster snapshot restore, and even serverless execution, without forcing users to manually juggle volumes or redistribute data.
Where early Redshift clusters behaved more like monolithic Postgres-on-steroids, today’s Redshift is architected like a cloud-native, multi-tenant, distributed database, with just enough control for users who want it, and just enough abstraction for those who don’t.
From SQL to Native Execution
Every Redshift query begins its life in the leader node, but its path from SQL text to executed machine code is anything but ordinary.
This is not a classic interpreted query engine that passes plans to generic operators at runtime. Redshift’s execution model is compiled, distributed, and data-aware, a departure from most traditional data warehouse architectures.
Parsing, Rewriting, and Planning
The leader node accepts the SQL statement and begins the frontend pipeline:
Parsing: SQL is tokenized, and syntax trees are constructed.
Rewriting: Redshift applies normalization and heuristic rewrites — pushing down predicates, rewriting subqueries, flattening joins.
Optimization: Here's where it gets interesting. Redshift’s planner isn’t just logical; it’s deeply topology-aware. It factors in:
Data distribution (hash vs. round-robin).
Node slices (virtualized compute units).
Data statistics, zone maps, and bloom filters.
Expected I/O costs based on block temperature (hot/cold).
Most importantly, it avoids data motion, the silent killer of performance in distributed systems.
If a join or aggregation would require redistributing data across nodes, the planner will either push down filters, suggest reordering, or even reject the plan if it becomes infeasible.
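To make the data-motion concern concrete, here is a minimal, hypothetical sketch (in Python, not Redshift’s actual planner logic) of how a distribution-aware planner might decide whether a join can run co-located or needs a redistribution step:

```python
# Toy sketch of distribution-aware join planning. Illustrative only; the names
# and rules here are invented and are not Redshift's real planner.

from dataclasses import dataclass
from typing import Optional

@dataclass
class TableDist:
    name: str
    style: str                 # "key" (hash-distributed) or "even" (round-robin)
    dist_key: Optional[str]    # column used for hash distribution, if any

def plan_join(left: TableDist, right: TableDist, join_key: str) -> str:
    """Pick the join strategy that minimizes data motion between nodes."""
    left_local = left.style == "key" and left.dist_key == join_key
    right_local = right.style == "key" and right.dist_key == join_key

    if left_local and right_local:
        return "co-located join: no data motion"
    if left_local or right_local:
        # Only the mismatched side needs to be redistributed on the join key.
        moved = right.name if left_local else left.name
        return f"redistribute {moved} on {join_key}, then join"
    # Worst case: both sides move (or the smaller table is broadcast).
    return "redistribute both sides (or broadcast the smaller table)"

print(plan_join(TableDist("orders", "key", "customer_id"),
                TableDist("customers", "key", "customer_id"),
                "customer_id"))
# -> co-located join: no data motion
```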
Code Generation and Specialization
Unlike engines that rely on a pool of interpreted physical operators, Redshift generates C++ code tailored to the query structure and data types involved.
Each operator in the physical query plan becomes a section of native code, compiled just-in-time into a segment. This segment is:
Monomorphic: Hardcoded for the actual data types involved (no runtime dispatch).
Simplified: Removes all conditional branching and type checking.
Pipelined: Designed to run in tight loops with minimal CPU stalls.
This level of specialization drastically reduces instruction path length, improves CPU predictability, and enables more aggressive use of compiler optimizations (loop unrolling, vectorization hints, etc.).
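Redshift emits specialized C++, but the core idea translates to any language that can generate and compile code at runtime. Here is a toy Python analogy, purely illustrative: the filter operator is generated for one concrete column, comparison, and literal, so the hot loop carries no per-row dispatch.

```python
# Toy analogy for query-time code specialization (Redshift generates C++;
# Python's compile/exec is used here only to illustrate the idea).

def generate_filter(column: str, op: str, literal):
    """Emit source specialized for one predicate, then compile it.

    The generated loop is "monomorphic": the column name, comparison
    operator, and literal are baked in, so there is no per-row dispatch.
    """
    src = f"""
def specialized_filter(rows):
    out = []
    append = out.append            # avoid attribute lookups in the hot loop
    for row in rows:
        if row[{column!r}] {op} {literal!r}:
            append(row)
    return out
"""
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["specialized_filter"]

rows = [{"qty": 3}, {"qty": 12}, {"qty": 7}]
flt = generate_filter("qty", ">", 5)   # "compile once"
print(flt(rows))                       # "execute many": [{'qty': 12}, {'qty': 7}]
```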
Segment Graph and Execution Units
A single query may result in dozens of compiled segments, each representing a portion of the plan: scan, filter, join, aggregate, shuffle.
These segments form a directed acyclic graph (DAG), where outputs of one segment feed into the next. Dependencies are explicit. Execution units are isolated and scheduled for parallel execution across node slices.
The segments are shipped over the network to compute nodes, where they are loaded into memory and executed natively.
If the required data blocks are already cached on SSD, the query begins immediately.
If not, Redshift initiates a hydration process from Redshift Managed Storage (RMS), pulling blocks from S3 into local storage before execution.
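The mechanics of that DAG are easier to see in a small sketch. This is illustrative only (Redshift’s actual scheduler dispatches ready segments in parallel across slices); here, Python’s graphlib simply surfaces which segments become runnable as their inputs complete.

```python
# Minimal sketch of DAG-ordered segment execution (illustrative only).

from graphlib import TopologicalSorter

# Hypothetical segment graph for one query: scans feed a filter, a join,
# and a final aggregation. Each entry maps a segment to its dependencies.
segments = {
    "scan_orders":    set(),
    "scan_customers": set(),
    "filter_orders":  {"scan_orders"},
    "join":           {"filter_orders", "scan_customers"},
    "aggregate":      {"join"},
}

ts = TopologicalSorter(segments)
ts.prepare()
while ts.is_active():
    ready = list(ts.get_ready())         # segments whose inputs are complete
    print("dispatch to slices:", ready)  # in Redshift these run in parallel
    ts.done(*ready)
```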
SIMD and Vectorized Scanning
Despite using full query compilation, Redshift makes a key optimization choice for data scanning: it doesn’t generate that portion of the code on the fly.
Instead, Redshift uses a precompiled SIMD vectorized scan layer, optimized ahead-of-time for all supported data types. These scan functions:
Read compressed columnar data.
Leverage AVX/SSE instructions for wide, parallel comparisons.
Apply zone maps and bloom filters to skip irrelevant blocks.
This tradeoff (precompiled scan paths with runtime-generated operator code) strikes a smart balance:
It avoids bloating the compilation surface with low-level I/O details.
It keeps compilation fast and avoids cold-start penalty for every query.
It keeps scan performance consistently high, regardless of query complexity (a rough sketch of the block-skipping idea follows this list).
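To make the scan side of this concrete, here is a rough sketch in which NumPy’s array comparisons stand in for the SIMD kernels and per-block min/max metadata plays the role of a zone map. Block sizes and values are invented for illustration.

```python
# Rough sketch of zone-map block skipping plus a vectorized comparison.
# NumPy stands in for the AVX/SSE scan kernels; the block layout and
# min/max metadata here are purely illustrative.

import numpy as np

rng = np.random.default_rng(0)
blocks = [rng.integers(low, low + 100, size=1_000) for low in (0, 500, 1000)]
zone_maps = [(b.min(), b.max()) for b in blocks]   # per-block min/max metadata

def scan(predicate_lo: int) -> np.ndarray:
    """Return all values > predicate_lo, skipping blocks the zone map rules out."""
    hits = []
    for block, (lo, hi) in zip(blocks, zone_maps):
        if hi <= predicate_lo:          # whole block can't match: skip its I/O
            continue
        hits.append(block[block > predicate_lo])   # vectorized, SIMD-friendly
    return np.concatenate(hits) if hits else np.array([])

print(scan(900).size)   # only the last block (values around 1000-1100) is scanned
```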
Execution Characteristics
At this point, execution is fully distributed and parallel:
Each node slice executes segments independently.
Intermediate results are pipelined between segments in memory.
Output flows back to the leader node for final collation or client return.
And because the execution is native code:
There’s no interpretation overhead.
CPU cache locality is improved.
Latency spikes from runtime dispatch are eliminated.
This execution model resembles something closer to a compiler back-end for data processing than a traditional database query engine.
It's a model built for scale, where CPU efficiency, parallelism, and locality are treated as first-class citizens.
The Compiler-as-a-Service Revolution
One of Redshift’s most impactful architectural evolutions arrived quietly around 2020, with the introduction of what Amazon calls Compilation-as-a-Service.
This feature fundamentally changed how Redshift handles query compilation: decoupling it from the compute cluster and turning it into a distributed service in its own right.
The Problem
Before this shift, Redshift’s query compilation was performed inline, directly on the leader or compute nodes within the cluster. That meant every query that required new code generation would:
Burn CPU cycles on production hardware.
Compete with active workloads.
Introduce latency spikes, especially for complex queries.
If the compiled segments weren’t already cached locally in the cluster’s code cache, execution would stall while the system generated and built native C++ code from the query plan.
This created a clear tension: specialization made execution fast, but code generation itself became a bottleneck, especially in workloads with frequent query variations or unpredictable patterns (e.g., BI tools, ad hoc dashboards, or parameterized SQL with changing shapes).
The Architectural Shift
With Compilation Service, Redshift externalized this process entirely:
When a query requires compilation, the leader node exports the query plan to an external compiler fleet: a pool of AWS-managed nodes dedicated solely to code generation.
These external compilers transform the query plan into a graph of native C++ segments, compile them, and return the resulting binary object files to Redshift.
The compiled segments are stored in a shared external cache (outside the cluster), indexed by a hash of the query plan and schema.
The cluster never compiles the code itself. As sketched after this list, it either:
Finds a match in its local cache (lowest latency).
Downloads the object from the external shared cache (moderate latency, but no compilation).
Requests compilation from the service, if no cached version exists (highest latency, but now offloaded).
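Here is a minimal sketch of that cascade, with hypothetical names and an in-process dict standing in for each cache tier; the real protocol, keying, and storage are internal to Redshift.

```python
# Hedged sketch of the three-level lookup: local cache -> shared external
# cache -> external compilation service. All names here are hypothetical.

import hashlib

local_cache: dict = {}      # per-cluster object cache
external_cache: dict = {}   # shared, fleet-wide object cache

def plan_key(query_plan: str, schema_version: str) -> str:
    """Cache key derived from the plan shape and schema, not the raw SQL."""
    return hashlib.sha256(f"{query_plan}|{schema_version}".encode()).hexdigest()

def compile_remotely(query_plan: str) -> bytes:
    # Stand-in for the external compiler fleet (the highest-latency path).
    return f"object-code-for:{query_plan}".encode()

def get_segments(query_plan: str, schema_version: str) -> bytes:
    key = plan_key(query_plan, schema_version)
    if key in local_cache:                      # lowest latency
        return local_cache[key]
    if key in external_cache:                   # no compile, one download
        local_cache[key] = external_cache[key]
        return local_cache[key]
    obj = compile_remotely(query_plan)          # offloaded cold compile
    external_cache[key] = local_cache[key] = obj
    return obj
```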
Measurable Impact
The impact of this shift is non-trivial:
Fleet-wide cache hit rate improved from 99.60% to 99.95%.
87% of the time, when a query plan wasn’t in the cluster’s local cache, Redshift found the object in the external cache, avoiding a cold compile.
This significantly reduces cold-start latency, especially in multi-tenant or bursty workloads.
More importantly, it removes compiler load from the production cluster entirely, freeing up CPU and memory for actual query execution.
Even when compilation is needed, the parallelism and scalability of the external service allows Redshift to handle complex or high-throughput scenarios without slowdown. Compilation latency is amortized across a larger pool of shared infrastructure.
Subtle but Strategic
This might seem like a small detail (caching compiled queries), but it reflects a deeper maturity in system design. Redshift is no longer just optimizing for execution speed. It’s optimizing for the entire lifecycle of a query:
From how quickly it starts.
To how efficiently it's compiled.
To how predictably it runs across clusters.
For serverless architectures and multi-cluster deployments, this is foundational. It means compiled object files can be reused across clusters, not just within one, enabling faster warm starts and more consistent latencies in autoscaled or ephemeral environments.
Storage: Beyond the Local Horizon
In its early days, Redshift followed a more rigid model: data was physically tied to compute nodes. If you wanted to scale up, you had to reshard or redistribute your data across the new hardware.
Scaling was a logistical operation, not a fluid one, and for a cloud-native system, that tight coupling was a long-term liability.
Enter Redshift Managed Storage (RMS).
RMS broke the binding between compute and storage by moving the durable layer to Amazon S3, transforming Redshift into a disaggregated system where compute nodes can scale independently of the data. This also enabled Redshift Serverless and laid the foundation for multi-cluster and elastic workloads.
But storing everything in S3 isn’t a free lunch. Object storage is incredibly durable, available, and versioned, but it’s also slower than local disk, especially when queries need high-throughput, low-latency access to columnar blocks. Redshift bridges this gap with a tiered caching strategy, where different layers trade off capacity and speed.
The Caching Layers
Tiered SSD Cache
Located on the compute nodes, this cache stores recently accessed data blocks, especially hot blocks that are repeatedly queried. It’s SSD-fast and optimized for high-throughput scans.
In-Memory Disk Cache
This layer sits above the SSD, holding the hottest blocks, including frequently used metadata and intermediate results, as well as temporary blocks generated during query execution. It dynamically expands and contracts based on system memory pressure.
Prefetching Engine
More than just passive caching, Redshift uses access pattern recognition to predict which blocks are likely to be needed soon. These blocks are pulled from S3 before they’re queried, reducing query latency and keeping SSD cache warm.
Intelligent Promotion and Eviction
The system uses reference counting to move blocks up and down this cache hierarchy:
Frequently accessed blocks are promoted toward memory and SSD.
Cold or stale blocks are evicted to make space.
On eviction, reference counts are decremented, and if the count drops to zero, the block is moved down or removed entirely.
This model feels familiar: it’s conceptually similar to Python’s memory management, where objects are kept alive as long as their reference count is non-zero, and to generational garbage collection, where “young” (hot) objects are treated differently from older (colder) ones.
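To make the promotion and eviction mechanics concrete, here is a toy two-tier cache driven by per-block access counts. The thresholds, capacities, and structure are invented for illustration and are not Redshift’s actual policy.

```python
# Toy two-tier (memory over SSD) cache with count-based promotion/demotion.
# Thresholds, sizes, and structure are invented for illustration only.

from collections import Counter

class TieredCache:
    def __init__(self, mem_capacity: int = 2, promote_after: int = 3):
        self.memory = {}              # hottest blocks
        self.ssd = {}                 # warm blocks
        self.refs = Counter()         # access counts per block id
        self.mem_capacity = mem_capacity
        self.promote_after = promote_after

    def get(self, block_id: str, fetch_from_s3) -> bytes:
        self.refs[block_id] += 1
        if block_id in self.memory:
            return self.memory[block_id]
        if block_id not in self.ssd:                   # cold: hydrate from S3
            self.ssd[block_id] = fetch_from_s3(block_id)
        block = self.ssd[block_id]
        if self.refs[block_id] >= self.promote_after:  # hot: promote to memory
            if len(self.memory) >= self.mem_capacity:
                self._evict_coldest_from_memory()
            self.memory[block_id] = block
        return block

    def _evict_coldest_from_memory(self) -> None:
        coldest = min(self.memory, key=lambda b: self.refs[b])
        self.ssd[coldest] = self.memory.pop(coldest)   # demote, don't discard
        self.refs[coldest] -= 1

cache = TieredCache()
fetch = lambda block_id: f"data:{block_id}".encode()   # stand-in for S3
for _ in range(3):
    cache.get("blk42", fetch)        # third access promotes blk42 to memory
print("blk42" in cache.memory)       # True
```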
Post-Resize Intelligence
One of the more elegant touches of RMS is its behavior after a cluster resize or restore. Since data no longer lives permanently on compute nodes, Redshift needs to rebuild the local cache from S3.
But instead of blindly pulling data, it uses historical access patterns (recorded during normal operation) to intelligently refill SSDs with the blocks that are most likely to be accessed again. This minimizes cold-start penalties and helps the system “remember” its working set, even across dynamic topology changes.
This kind of intelligent caching isn’t just a performance booster: it’s what makes elastic compute practical.
Without it, every resize would mean a full cache miss and a sharp performance drop. With it, Redshift can scale nodes up or down, knowing that the cache will rehydrate itself in a way that’s both efficient and workload-aware.
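A hedged sketch of the idea: rank blocks by how often the recorded history touched them, and hydrate the hottest ones first after a resize. The access-log format and the budget here are invented for illustration.

```python
# Workload-aware rehydration after a resize: refill the SSD cache with the
# blocks the history says were hottest, rather than purely on demand.
# The access log and budget are invented for illustration.

from collections import Counter

access_log = ["blk7", "blk3", "blk7", "blk9", "blk7", "blk3", "blk1"]

def rehydration_plan(history, ssd_budget_blocks: int):
    """Choose which blocks to pull from S3 first after a topology change."""
    hotness = Counter(history)
    return [blk for blk, _ in hotness.most_common(ssd_budget_blocks)]

print(rehydration_plan(access_log, ssd_budget_blocks=2))   # ['blk7', 'blk3']
```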
Concurrency and Snapshot Isolation
Like most modern databases, Redshift uses multi-version concurrency control (MVCC). But it layers on an additional mechanism called the Serial Safety Net (SSN) to ensure serializability, even under high concurrency.
SSN uses summary metadata from prior commits to make lightweight certifier decisions. That strikes a rare balance: transactional safety without a heavy memory footprint.
What I found particularly elegant is Redshift’s log-based commit protocol. Instead of writing random deltas to disk, it appends mutations to a commit log and treats the in-memory structures (superblocks) as buffers. The result? Up to 40% faster commits, with full snapshot isolation.
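As a purely conceptual sketch (Redshift’s commit protocol and superblock format are internal), the combination of an append-only commit log and snapshot reads boils down to something like this:

```python
# Toy sketch of snapshot reads over an append-only commit log. Conceptual only.

commit_log = []        # entries of (commit_id, key, value)
next_commit_id = 1

def commit(writes: dict) -> int:
    """Append all mutations for a transaction as one sequential log write."""
    global next_commit_id
    cid = next_commit_id
    next_commit_id += 1
    for key, value in writes.items():
        commit_log.append((cid, key, value))
    return cid

def snapshot_read(key: str, snapshot_id: int):
    """See the latest version committed at or before the reader's snapshot."""
    value = None
    for cid, k, v in commit_log:
        if k == key and cid <= snapshot_id:
            value = v
    return value

commit({"row:1": "v1"})              # commit 1
snap = next_commit_id - 1            # a reader takes its snapshot here
commit({"row:1": "v2"})              # commit 2, invisible to that snapshot
print(snapshot_read("row:1", snap))  # -> 'v1'
```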
Compute and Parallelism
Redshift’s compute layer is where some of the most subtle engineering lives.
When you resize a cluster, Redshift doesn’t shuffle actual data. It reassigns partitions to nodes and hydrates the local SSDs accordingly. But this introduces a mismatch: now each node has a different set of partitions than before, which can skew performance.
The solution? Redshift decouples compute parallelism (threads, slices, etc.) from the data layout. Work units are assigned dynamically, and partitions can be shared or split between threads based on availability.
The system introduces an abstraction called slices: virtual subdivisions of a node that independently handle parallel execution. These aren’t the same as the data slices used in RMS, but the two are conceptually adjacent.
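To see why the decoupling helps, here is a minimal sketch in which partitions sit in a shared queue and slices (worker threads) pull work dynamically instead of owning a fixed set; all names and numbers are invented.

```python
# Minimal sketch of decoupling parallelism from data layout: a shared work
# queue lets slices pull partitions dynamically, so a skewed assignment
# doesn't leave some workers idle. Illustrative only.

import queue
import threading

work = queue.Queue()
for partition_id in range(12):       # hypothetical data partitions
    work.put(partition_id)

def slice_worker(slice_id: int, done: list):
    while True:
        try:
            part = work.get_nowait()   # pull-based: no static ownership
        except queue.Empty:
            return
        done.append((slice_id, part))  # "execute" the work unit

completed = []
threads = [threading.Thread(target=slice_worker, args=(s, completed))
           for s in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(f"{len(completed)} work units processed by {len(threads)} slices")
```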
It’s one of those design choices that only becomes obvious after years of operational feedback. And it pays off: Elastic Resize operations are used over 15,000 times per month with a failure rate under 0.0001%.
Integration and Elasticity
Redshift isn’t a monolith anymore. It’s grown into a modular system with tightly integrated extensions: not bolt-ons, but real architectural limbs, each solving a scale-specific challenge.
Spectrum lets you query open-format data (Parquet, ORC, CSV) directly from S3, skipping ingestion entirely. It's ideal for data lake querying and long-term storage.
AQUA (Advanced Query Accelerator) offloads scans, filters, and aggregations to custom hardware (FPGAs and Nitro ASICs) near storage. Think of it as a distributed co-processor for I/O-heavy queries.
Materialized Views with Incremental Maintenance, powered by a custom Query Rewriting Framework (QRF), allow Redshift to maintain precomputed results by only processing new data. It's efficient, automatic, and optimizer-aware.
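To illustrate just the incremental-maintenance idea (not Redshift’s QRF machinery), here is a toy sketch in which only newly ingested rows are folded into a precomputed aggregate:

```python
# Toy sketch of incremental materialized-view maintenance: the precomputed
# aggregate is updated from the delta (new rows) only, never recomputed from
# the full base table. Illustrative only.

from collections import defaultdict

mv_sales_by_region = defaultdict(float)   # the "materialized view"

def apply_delta(new_rows):
    """Incremental maintenance: fold only the newly ingested rows into the MV."""
    for region, amount in new_rows:
        mv_sales_by_region[region] += amount

def full_refresh(base_rows):
    mv_sales_by_region.clear()
    apply_delta(base_rows)

full_refresh([("eu", 10.0), ("us", 5.0)])
apply_delta([("eu", 2.5)])             # only the delta is processed
print(dict(mv_sales_by_region))        # {'eu': 12.5, 'us': 5.0}
```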
These aren’t side features. They’re core to Redshift’s modern design, built to push processing closer to the data, reduce movement, and extend the platform without bloating it.
Closing Reflection
If there’s one lasting lesson I’ve taken from revisiting Redshift, it’s this: system architecture evolves quietly, but relentlessly. From the outside, it’s easy to dismiss a system as dated, and to assume that innovation lives only in the new and shiny. But what looks frozen at the surface may in fact be reinventing itself underground, brick by brick, bit by bit.
Redshift isn’t trying to be Snowflake. It’s not trying to copy BigQuery either. It’s a system shaped by different decisions, different constraints, and a different moment in history. But over time (through countless small, strategic changes) it has become something deeply modern.
Sophisticated. Modular. Efficient. A system with rough edges, sure, but also one that’s battle-tested at cloud scale, and much more flexible than I ever gave it credit for.
For me, this was more than just a technical deep dive. It was a reminder of why I care about distributed systems in the first place. The elegance of solving hard problems with layered abstractions. The satisfaction of understanding how a query turns into compiled code.
The joy of discovering that behind every slow query, there’s usually a beautiful machine at work, if you’re willing to look close enough.
Redshift may not be the trendiest system out there. But it’s earned my respect. And if you care about the craft of systems design, I think it deserves yours too.



