Apache Paimon: The Streaming-Native Table Store Powering Real-Time Lakehouses
How to unify streaming ingestion, ACID transactions, and batch analytics with modern data architecture
Introduction
Over the past decade, data engineering has become a bit like juggling flaming swords while riding a unicycle: real-time streams, batch analytics, schema evolution, deduplication… all demanding simultaneous attention.
When I first started diving into modern data architectures, I remember staring at my own pipelines and thinking: there has to be a better way. I spent countless late nights trying to balance streaming ingestion from Kafka with batch queries in Spark, wrestling with schema evolution nightmares and manual compaction jobs that never seemed to end.
I knew there had to be a more elegant solution: something that didn’t make me choose between freshness, consistency, or operational sanity.
That’s when I stumbled upon Apache Paimon (formerly Flink Table Store).
At first, it sounded like just another table format, but the deeper I dug, the clearer it became that this was something different.
Streaming-native, ACID-compliant, snapshot-isolated: it promised to unify the worlds of high-throughput streaming ingestion and analytics-friendly batch queries.
And the more I experimented, the more I realized: this wasn’t just a tool; it was a lens into how modern data engineering could finally be practical and elegant at scale.
So here I am, after weeks of study, tinkering, and building small test pipelines, ready to share the lessons I learned.
If you’re building a real-time analytics pipeline, supporting CDC workloads, or architecting a next-generation lakehouse, understanding Paimon’s inner workings is more than a nice-to-have: it’s mission-critical.
But what makes it truly special? Why not just stick with Delta Lake or Iceberg? And how does it handle millions of events per second without spontaneously combusting?
Let’s find out together.
Why Do We Even Need Paimon?
Traditional data lake stacks (HDFS or S3 storage paired with table formats like Iceberg and Delta) were designed with batch-first analytics in mind. Large-scale scans? Sure, they can handle that. Continuous, multi-source streams hitting the same table while you also want point-in-time queries? Not so much.
Typical pain points include:
Read-after-write consistency is weak. Ever tried querying a table while two streaming jobs were writing to it? Good luck getting the latest data reliably.
Schema evolution is a headache. Multiple producers, evolving fields, downstream consumers… suddenly your “intuitive” schema evolution plan looks like a Jackson Pollock painting.
Manual compaction and deduplication. Small updates or deletes force expensive ETL gymnastics just to keep the table sane.
Paimon solves these elegantly by providing a true ACID-compliant table layer on object storage that handles streaming ingestion and batch analytics without sacrificing latency or consistency.
The Core Philosophy
What makes Paimon more than just another file format? Why not just Parquet-and-call-it-a-day? Let’s talk tables, streaming-first design, snapshots, and primary keys.
1. Table-Centric Design
A Paimon table is not a messy pile of Parquet files. It’s a managed logical entity with:
Primary Key Support: Updates and deletes happen efficiently, without rewriting entire partitions.
Partitioning: Queries skip unnecessary data; writes remain lightning fast.
Snapshots: Immutable layers tracking table states over time. Ever wanted a time machine for your analytics? Snapshots are basically that.
Files live in S3 (or HDFS), but you never have to manage them manually. Paimon handles partitioning, primary keys, consistency, and durability under the hood.
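To make this concrete, here is a minimal sketch of a primary-keyed, partitioned Paimon table in Flink SQL. The table name, columns, and partition column are hypothetical, and a Paimon catalog is assumed to be registered already:
-- Hypothetical Paimon table: primary key, partitioning, and explicit bucketing
CREATE TABLE orders (
    order_id    STRING,
    customer_id STRING,
    order_date  STRING,
    amount      DECIMAL(10, 2),
    -- Paimon requires partition columns to be part of the primary key
    PRIMARY KEY (order_date, order_id) NOT ENFORCED
) PARTITIONED BY (order_date) WITH (
    'bucket' = '4'  -- buckets per partition; more buckets means more write parallelism
);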
2. Streaming First, Batch Friendly
Flink is Paimon’s native playground. Producers (Kafka, Debezium, whatever is streaming into your world) can write continuously to a Paimon table with exactly-once guarantees. Meanwhile, readers (Spark, Trino, Flink batch) see a consistent snapshot, even as new events are arriving.
// Flink streaming job writing CDC events to a Paimon table
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

// Register a Paimon catalog backed by the S3 warehouse
tEnv.executeSql(
    "CREATE CATALOG paimon WITH ("
    + " 'type' = 'paimon',"
    + " 'warehouse' = 's3://my-bucket/paimon-warehouse/'"
    + ")"
);
tEnv.executeSql("USE CATALOG paimon");

// Primary-keyed table: Paimon applies inserts, updates, and deletes as upserts
tEnv.executeSql(
    "CREATE TABLE user_events ("
    + " user_id STRING,"
    + " event_time TIMESTAMP(3),"
    + " event_type STRING,"
    + " PRIMARY KEY (user_id, event_time) NOT ENFORCED"
    + ")"
);

// Continuously stream from the Kafka source (registered in Flink's default catalog)
tEnv.executeSql(
    "INSERT INTO user_events "
    + "SELECT user_id, event_time, event_type "
    + "FROM default_catalog.default_database.kafka_source_table"
);
Simple, right? But under the hood, exactly-once semantics, partitioned writes, and incremental merges are all happening without blocking readers. That’s the magic, and the part most people don’t realize until they try it themselves.
How Paimon Keeps Your Data Alive and Healthy
By now, we know why Apache Paimon actually exists and why it’s a table-centric, streaming-first marvel.
But let’s get our hands a little bit dirty and look under the hood.
How does Paimon manage to handle millions of events per second without turning your S3 bucket into a chaotic junkyard of tiny Parquet files?
How does it let you query yesterday’s state while new events keep flooding in?
And why doesn’t the whole thing spontaneously explode under high throughput?
Let’s break it down.
Write-Ahead Logging (WAL)
Durability is the eternal headache of streaming systems. You want low-latency writes, but if a node crashes, you don’t want to lose a single event. WAL is Paimon’s secret sauce here.
Here’s how it works:
Data hits memory buffers – Flink tasks push incoming events into Paimon’s in-memory buffers.
Buffers are persisted to WAL storage – Think FSx, EBS, or NFS. This is your safety net.
Acknowledgment happens after persistence – Only when the write is durable does Paimon tell the producer, “Yep, got it!” Exactly-once semantics, check.
Delta files asynchronously flush to Parquet – The data eventually lands in S3 or HDFS, but your pipeline keeps flowing.
// WAL durability configuration via table options in Flink SQL
tEnv.executeSql(
    "CREATE TABLE user_events ("
    + " user_id STRING,"
    + " event_time TIMESTAMP(3),"
    + " event_type STRING,"
    + " PRIMARY KEY (user_id, event_time) NOT ENFORCED"
    + ") WITH ("
    + " 'log.system' = 'aws_fsx',"
    + " 'log.flush-interval-ms' = '500'"
    + ")"
);
The beauty of this design? You can ingest millions of events per second, keep latency low, and avoid choking your object storage with millions of tiny writes. Essentially, WAL lets you have your cake and eat it too.
The Time Machine for Your Tables
Here’s a question: ever wished you could peek into the past of your data lake without triggering a full ETL? Snapshots are Paimon’s answer.
Each snapshot is lightweight, immutable, and transactional. It stores:
Partition and bucket layouts
File paths for base and delta files
Transaction metadata to guarantee atomic operations
Why should you care? Because snapshots give you:
Time-travel queries – “What did the inventory look like yesterday at 10 AM?”
Consistent reads during streaming ingestion – Your analytics jobs never see partially committed data.
Incremental scans – Only fetch what changed since the last snapshot.
-- Time-travel query example
SELECT *
FROM user_events /*+ OPTIONS('scan.timestamp-millis' = '1763805600000') */ -- 2025-11-22T10:00:00 UTC
WHERE event_type = 'purchase';
It’s like having a DeLorean for your data; all without the flux capacitor or the need for plutonium.
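Snapshots also make incremental reads natural: instead of rescanning the whole table, you can ask for only the rows that changed between two snapshots. A small sketch using Paimon’s incremental-between scan option (the snapshot IDs 5 and 8 are hypothetical):
-- Incremental scan: only what changed between snapshot 5 and snapshot 8
SELECT *
FROM user_events /*+ OPTIONS('incremental-between' = '5,8') */
WHERE event_type = 'purchase';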
Merge-on-Read and Compaction
Now, let’s talk files. Streaming ingestion produces tons of small delta files. Left unchecked, queries crawl, storage explodes, and your S3 costs go through the roof.
Paimon’s solution: merge-on-read + background compaction.
Delta files store:
Inserts
Updates (upserts)
Deletes (tombstones)
Example layout for a bucket:
s3://paimon-warehouse/user_events/date=2025-11-22/bucket=000/
├─ bucket-000-delta-0001.parquet
├─ bucket-000-delta-0002.parquet
└─ bucket-000-delta-0003.parquet
Compaction strategies:
Minor compaction: Merge a handful of delta files. Cheap, frequent, keeps reads efficient.
Major compaction: Merge everything into one base file per bucket. Expensive, but worth it for long-term performance.
-- Triggering compaction in Flink SQL
ALTER TABLE user_events COMPACT;
Everything here is fully transactional: new files are written atomically, snapshots update only after successful compaction, and no query ever sees half-baked results.
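Compaction behavior is also tunable per table. As a rough sketch (the values below are illustrative, not recommendations), two options worth knowing are the sorted-run trigger for minor compactions and the delta-commit interval that forces a full compaction:
-- Illustrative compaction tuning via table options
ALTER TABLE user_events SET (
    'num-sorted-run.compaction-trigger' = '5',  -- start compacting once 5 sorted runs accumulate
    'full-compaction.delta-commits' = '20'      -- force a full (major) compaction every 20 commits
);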
Caching Layers
Object storage is reliable, sure… but “fast”? Not so much. Fetching small pieces of Parquet from S3 can feel like sending postcards by pigeon: safe, but painfully slow. Paimon solves this with two layers of cache that keep your real-time queries snappy.
Log Cache – Think of it as a hot staging area for streaming writes. Before events hit S3, they sit in memory, letting Flink pipelines flow at full speed while still guaranteeing exactly-once semantics.
Block Cache – Keeps frequently-read columnar blocks in memory, so repeated queries don’t hammer storage.
Analysts checking the same few metrics repeatedly? No problem at all: cached blocks make it nearly instantaneous.
table.cache:
  log_cache_size_mb: 1024    # memory for hot streaming writes
  block_cache_size_mb: 8192  # memory for frequently-read blocks
The result? Even under Kafka storms or high-velocity CDC bursts, writes stay fast, reads stay responsive, and S3 quietly does the heavy lifting.
Caching turns a sluggish, IO-bound nightmare into a near in-memory experience: without sacrificing durability.
Handling Updates, Deletes, and CDC
Upserts and deletes are tricky in streaming. How do you modify a record that’s already in the lake without rewriting huge partitions?
Paimon handles this elegantly with delta records + tombstones, merged during compaction. No ETL gymnastics required.
-- CDC-style upsert (MERGE INTO, e.g. via Spark SQL)
MERGE INTO user_events AS t
USING staging_events AS s
ON t.user_id = s.user_id AND t.event_time = s.event_time
WHEN MATCHED THEN
  UPDATE SET t.event_type = s.event_type
WHEN NOT MATCHED THEN
  INSERT *;
This lets you mix real-time ingestion and historical queries without breaking a sweat.
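How those merges behave is configurable per table. A hedged sketch (the table name is hypothetical and the values shown are just one sensible combination): a CDC-fed table typically pairs a merge engine, which decides how rows with the same primary key are combined, with a changelog producer, so downstream streaming readers get a clean change stream.
-- Illustrative CDC-oriented table options
CREATE TABLE user_events_cdc (
    user_id    STRING,
    event_time TIMESTAMP(3),
    event_type STRING,
    PRIMARY KEY (user_id, event_time) NOT ENFORCED
) WITH (
    'merge-engine' = 'deduplicate',   -- keep only the latest row per primary key
    'changelog-producer' = 'lookup'   -- emit a complete changelog for downstream consumers
);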
Scaling and Parallelism
Paimon’s internal architecture naturally scales:
More buckets → higher write parallelism
Partitions → efficient batch query pruning
Compaction → balances read efficiency vs. write throughput
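When a single table needs more write throughput, the bucket count is the main knob. A rough sketch, assuming a Flink batch session and a table small enough to rewrite (the bucket count 8 is arbitrary): changing the option only affects newly written data, so existing data is typically redistributed with an INSERT OVERWRITE.
-- Sketch: increase write parallelism by raising the bucket count
ALTER TABLE user_events SET ('bucket' = '8');

-- Rewrite existing data so it is redistributed across the new buckets (batch mode)
INSERT OVERWRITE user_events
SELECT user_id, event_time, event_type FROM user_events;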
Pro tip: treat minor compactions like your daily vitamins: cheap, frequent, and necessary.
Major compactions? They’re the full-body workout, less frequent but crucial for long-term performance.
Why Paimon Beats the Alternatives
Iceberg? Delta Lake? Hudi? Sure, they all have their merits.
ACID transactions? Check. Snapshot isolation? Check. Even Hudi can handle streaming in certain contexts.
But here’s the thing: when your pipeline is streaming-heavy, Flink-native, and CDC-driven, the story changes.
Paimon was designed from the ground up for exactly this scenario. It doesn’t just support streaming: it expects it. Merge-on-read? Built in. CDC support? Full-fledged, first-class, not some afterthought.
Snapshots and time-travel queries? Efficient and fully transactional. Native Flink integration? Absolutely: no glue code or hacks required.
The difference is subtle until you run millions of events per second through your lakehouse and notice how Delta or Iceberg pipelines start to creak under pressure. Small updates turn into rewrite nightmares, merge logic becomes convoluted, and latency starts creeping up.
Meanwhile, Paimon keeps chugging along, handling writes, compactions, and reads with a calm, almost Zen-like efficiency.
It’s the kind of system that makes you think: why settle for a table format that treats streaming as an afterthought, when Paimon treats it as the main act?
Closing Thoughts
Apache Paimon isn’t just another table format to tick off your checklist. It’s a streaming-native, transactional, object-storage-first table system that truly unifies streaming ingestion, incremental updates, and batch analytics.
From its WAL ensuring durability without blocking ingestion, to snapshots enabling consistent reads and time travel, to merge-on-read compactions and intelligent caching, Paimon addresses the core headaches of modern data engineering.
It’s built to scale, built to perform, and built to make engineers’ lives a little less… chaotic.
If you’re building next-generation lakehouses or real-time data platforms, learning Paimon’s internals, and tuning them effectively, can unlock analytics at a scale that traditional systems can only dream of.
So the next time someone casually asks, “Why not just use Delta?”, you can lean back, sip your coffee, and say with a smirk: “Have you met Paimon yet?”