What alternatives get wrong blog#1254
Signed-off-by: Chris Cranford <chris@hibernate.org>
21d9c14 to c5af8cb
| **You need a fully managed service with no operational responsibility**. | ||
| Debezium is self-hosted. | ||
| You run it, you monitor it, you upgrade it. | ||
| If you want to hand all that to a vendor, Debezium may not be the right fit, not because it's complex, but because that is not what it is. |
Not exactly: there are vendors that give you CDC-as-a-Service. In fact, this is not something that Debezium should provide. It would be like saying, yes, the criticism is right, the Quarkus project does not give you a PaaS environment.
And also Debezium Platform could then simplify this point
I mean, our own company offers this, via Confluent Cloud :) Ofc. this article isn't meant to be a sales ad, but pointing out the relationship between Debezium as an upstream OSS project (which can be self-run) and managed downstream services like Confluent's would be fair.
What do you guys feel about
You need a fully managed CDC-as-a-Service offering.
Debezium is an open-source project, not a hosted service.
If your requirement is that someone else operate the CDC infrastructure for you, handling provisioning, upgrades, support, and SLAs, that's a different product category.

To be transparent, several managed services use upstream Debezium connectors in their offerings, including Confluent Cloud.
The relationship between Debezium, an upstream open-source project, and the managed downstream services built with it is a feature of the ecosystem.
Choose the self-hosted project if you want to run it yourself, or the managed service if you want someone else to run it.

Both are valid, and the criticism that "Debezium isn't a managed service" is really just a question of which layer of the stack you're shopping for.
| --- | ||
|
|
||
| If you've searched for Change Data Capture (CDC) solutions recently, you've almost certainly landed on an article with a title like "_Top Debezium Alternatives in 2026_" or "_Why We Moved Away from Debezium_". | ||
| These articles follow a familiar pattern: they list the same handful of criticisms, present their alternative solution, and move on. |
I wouldn't say "present their alternative solution"; that sounds like settling a score.
I have two alternatives. What are your opinions?
These articles follow a familiar pattern: they list the same handful of criticisms and recommend a different tool.
These articles follow a familiar pattern: the same handful of criticisms appear again and again, often without any examination of whether they still hold today.
| From there, Debezium participates in the standard Quarkus application lifecycle. | ||
| Configuration lives in `application.properties`, alongside the rest of your application's configuration. | ||
| Change events are consumed as CDI events. | ||
| The entire Quarkus developer experience, including dev services that spin up databases automatically for local development, is available. |
For the purpose of the message of this article it makes sense to point to https://docs.spring.io/spring-integration/reference/debezium.html
Also Debezium Engine without any framework support should be mentioned.
Here I'd mention the possibility of using Debezium even in a Python stack via the PyDebezium engine (rather than in the part about operating a JVM stack; see my comment below).
OMG, TIL about PyDebezium, whattt?!
But yes, I think the sequencing needs to be different:
- available as a library for in-app usage (e.g. cache invalidation is a great one)
- plain, Quarkus, Spring, etc.
- PyDebezium
|
|
||
| The alternative articles that cite this as a criticism are, in effect, saying "Debezium doesn't try to replace a stream processing framework." | ||
| That is correct, and it's not meant to. | ||
| It's meant to complement those solutions. |
With in-process Debezium you can do anything, since you have the whole programming language available.
Thoughts on this
Any alternative article that cites this as a criticism is, in effect, saying "Debezium doesn't try to replace a stream processing framework."
That is correct, and it's not meant to.
It's meant to complement those solutions or be embedded within one.

The last point is worth emphasizing.
When you run Debezium in-process, either with the Quarkus extensions or the Embedded engine, transformation isn't a separate concern at all.
Change events arrive as objects in your application, and there you have the entire JVM ecosystem available: use any library, framework, or custom logic you need.

The "limited transformations" framing assumes a pipeline architecture in which Debezium emits events and something downstream processes them.
That's one valid deployment model, but it's not the only one.
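
To make the in-process point a bit more concrete, here is a minimal sketch in Python (the kind of code a PyDebezium-style consumer might run). It assumes the change event has already been deserialized into a plain dict shaped like Debezium's standard envelope (`before`, `after`, `source`, `op`) from a relational connector, where `source.table` is present. The function name `transform`, the table name, and the masked column are hypothetical, and the code does not call any Debezium API; it only illustrates filtering and masking done with ordinary application code.

```python
from typing import Optional

# Hypothetical settings, for illustration only.
ALLOWED_TABLES = {"customers"}
ALLOWED_OPS = {"c", "u"}      # Debezium op codes: c=create, u=update
MASKED_COLUMNS = {"email"}


def transform(event: dict) -> Optional[dict]:
    """Filter by table and operation, then mask sensitive columns in `after`."""
    source = event.get("source") or {}
    if source.get("table") not in ALLOWED_TABLES:
        return None  # drop events from tables we do not care about
    if event.get("op") not in ALLOWED_OPS:
        return None  # drop deletes, snapshot reads, and so on
    after = event.get("after") or {}
    masked = {k: ("****" if k in MASKED_COLUMNS else v) for k, v in after.items()}
    # Return a new dict so the original event is left untouched.
    return {**event, "after": masked}


sample = {
    "before": None,
    "after": {"id": 1, "email": "a@example.com"},
    "source": {"table": "customers"},
    "op": "c",
}
result = transform(sample)
```

The same shape of logic could be written in Java against the event objects an embedded engine hands to the application; the point is only that no separate transformation layer is required.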
|
|
||
| **Your team has no existing JVM or distributed systems operational experience**. | ||
| Debezium runs on the Java Virtual Machine (JVM). | ||
| If your operations team has no familiarity with that ecosystem, there will be a learning curve. |
Can we refer to the PyDebezium engine as an example of not needing to work with Java directly?
I'm afraid Chris' point about JVM knowledge still applies here, because you still need to run a JVM and eventually do the troubleshooting (with the Python integration, which is not used as frequently, I'd guess you would need to debug or at least report bugs more often than when you just run, e.g., Debezium Server).
Maybe I'd consider mentioning here that Debezium can be compiled into a native executable using GraalVM, which avoids installing and using a JVM.
I think we can work both of those in, as I think they both have merit.
Does this fit what you two had in mind?
Your team has no existing JVM or distributed systems operational experience.::
Debezium runs on the Java Virtual Machine (JVM).
https://github.com/memiiso/pydbzengine[PyDebezium] makes it possible to consume change events from Python applications, and https://debezium.io/documentation/reference/stable/integrations/quarkus-debezium-engine-extension.html[GraalVM native compilation] removes the need to install a JVM at runtime. +
+
Even so, the underlying engine is still a JVM-based system.
When something goes wrong in production, and in any distributed system something eventually will, troubleshooting and bug reporting will benefit from at least some familiarity with that ecosystem.
If your operations team has none, there will be a learning curve.
|
|
||
| Let's look at what a basic Debezium Server setup actually involves. | ||
|
|
||
| A minimal `docker-compose.yml` for capturing changes from PostgreSQL and emitting them to a sink of your choice looks like this: |
With a Docker Compose file containing only a single container, you can also just use the docker command directly. Could that be another example, using Google Pub/Sub for instance?
I'd still prefer the docker-compose way. It'll hopefully be closer to real deployment scenarios.
Would there be any harm in showing both?
| Many tutorials and articles were written during a period long before these new options were introduced, and are still indexed and widely shared. | ||
| However, it is not an accurate description of Debezium today. | ||
|
|
||
| If you elect to choose an alternative tool specifically to avoid a Kafka dependency, it's worth asking whether you evaluated Debezium Server and Platform first. |
I can't say what it is, but I feel like this sentence should have another tone too. Maybe more passive wording? wdyt?
If you're referring to line 44, I can certainly see if there is a lighter way to make the point.
I think we need to be generally careful to not support any "Kafka is overhead" narrative, consciously or subconsciously. The Debezium Server angle should rather be that "Debezium also is available to you when using other streaming platforms".
|
|
||
| The source is open. | ||
| The community is active. | ||
| And the architecture is significantly more flexible than what you may have realized. |
Maybe pointing to the picture that other articles claim about Debezium is better here than pointing to the realizations of the reader? Or just summarizing that the presented options for running Debezium make a more flexible solution?
I believe the latter is the better approach. I'd rather focus on giving the reader information, "hey, in case you aren't aware" mentality. At the end of the day, it's their decision, but we'd want them to be as informed as possible before making it.
vjuranek left a comment
Left a few comments, but really nice blog post!
|
|
||
| What _does exist_ is comprehensive metrics exposure. | ||
| Debezium exposes a rich set of JMX metrics covering connector status, transaction log position, event counts, processing rates, and lag. | ||
| When paired with **Prometheus JMX Exporter** and **Grafana**, you get production-grade observability with dashboards the community has built and shared. |
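
As a rough illustration of the kind of lag figure such dashboards chart: each Debezium change event carries a top-level `ts_ms` (when the connector processed the event) and a `source.ts_ms` (when the change was made in the database), and the difference between the two approximates how far the connector trails the source. The sketch below assumes an event already deserialized into a dict; the helper name `lag_ms` and the sample values are hypothetical, and this is not how the JMX metrics themselves are computed.

```python
from typing import Optional


def lag_ms(event: dict) -> Optional[int]:
    """Approximate connector lag in milliseconds for a single change event."""
    processed = event.get("ts_ms")
    committed = (event.get("source") or {}).get("ts_ms")
    if processed is None or committed is None:
        return None  # one of the timestamps is missing; no lag can be derived
    return processed - committed


# Hypothetical event with only the fields this sketch needs.
sample = {"ts_ms": 1_700_000_000_250, "source": {"ts_ms": 1_700_000_000_000}}
lag = lag_ms(sample)
```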
|
|
||
| * Filtering events by table or operation type | ||
| * Masking or replacing sensitive column values | ||
| * Converting data types | ||
| * Adding metadata fields to events |
Could be worth mentioning AI SMTs?
| **This claim hasn't been true for years**. | ||
|
|
||
| Debezium is a Change Data Capture (CDC) platform. | ||
| Apache Kafka Connect is _one way to deploy Debezium_, and for teams that already run Kafka, it remains an excellent choice. |
Worth a note on the rich ecosystem of Kafka Connect (sink) connectors you can tap into that way, as well as HA, history storage, etc. I.e., don't feed into the "Kafka is a burden" narrative, but rather lay out the advantages.
| One process. | ||
| No Zookeeper. | ||
| No Kafka brokers. | ||
| No Kafka Connect cluster setup. |
Punchy framing, but I think it needs rework. Zookeeper hasn't been a thing for many Kafka users for a while now, so that's a bit of a strawman. As for Kafka and KC, you also lose something when not using them (see above). The way I've always thought about DBZ Server is that it gives you connectivity with other streaming solutions (which ofc. will have their own operational toil).
| A minimal `docker-compose.yml` for capturing changes from PostgreSQL and emitting them to a sink of your choice looks like this: | ||
| [source,yaml] | ||
| ---- | ||
| services: |
To be fair, this is glossing over history storage (for certain connectors) and offset storage?
| The complexity argument conflates the inherent requirements of CDC (which any tool must address) with the operational overhead of Debezium specifically (which has been substantially reduced). | ||
| Those are different things. | ||
|
|
||
| There is also a deployment path that alternative articles almost never mention at all. |
Is this meant as a lead-in into the next section? If so, needs some clarification like "let's take a look at this next". Right now, it feels very disconnected.
|
|
||
| #### For Java developers: the Debezium Quarkus Extensions | ||
|
|
||
| If you are already building Java applications, there is a fourth option that sidesteps the infrastructure question entirely: the **Debezium Quarkus Extensions**. |
"Fourth"? So far we discussed Kafka and DBZ Server, what's the third one?
| These let you embed Debezium directly inside any Quarkus-based application, running CDC as part of your existing service rather than as a separate piece of infrastructure. |
Needs a link and sentence about what Quarkus is.
|
|
||
| For Java developers, this is arguably the most natural entry point into Debezium that exists. |
I don't know about this; it really depends on the use case, doesn't it? Like, if I want to stream events from Postgres to my data lake, going through a Java-based service probably isn't the natural choice.
| The setup burden looks very different when Debezium is just another dependency in a project your team already knows how to build, test, and deploy. | ||
| There is no separate process to operate, no separate configuration format to learn, and no context switch between your application and your CDC pipeline. | ||
|
|
||
| For teams building microservices in Java, especially those implementing the outbox pattern for reliable event publishing, the Quarkus extensions deserve serious consideration before reaching for a separate CDC tool. |
Yes, that's a great point!
|
|
||
| For teams that need heavier in-flight processing, such as complex joins, aggregations, conditional routing across multiple topic streams, the right answer is integrating Debezium with a stream processor like **Apache Flink** or **Kafka Streams**. |
Suggested change:
| For teams that need heavier in-flight processing, such as complex joins, aggregations, conditional routing across multiple change event streams, the right answer is integrating Debezium with a stream processor like **Apache Flink** or **Kafka Streams**. |
| Debezium Platform is gaining native metrics and monitoring support, built in, not bolted on, so that operators will have first-class observability with the ability for Debezium to provide the entire stack for you. | ||
| That work is in progress, and it reflects the project taking this feedback seriously. | ||
|
|
||
| "Flying blind" is not an accurate description of what users have available today. |
This sounds a tad too defensive IMO.
gunnarmorling left a comment
Good stuff, @Naros! A few comments inline. Another common topic is "Debezium is slow, and we're much faster". Also worth mentioning, although I'm not quite sure about the right angle; ideally, we'd have some throughput numbers to drive home that point.
What about framing this around the new chunk-based initial snapshot feature? As Jiri pointed out in chat, we need to clarify that Debezium is not designed to be, nor will it ever be, as efficient as a vendor-specific data-dumping tool for mass replication. However, if you want to use CDC tooling, Debezium has invested in the new parallel, chunk-based table snapshot feature. The nice part about this angle is that we aren't talking about database performance, so we sidestep concerns about publishing numbers with any database vendor, because we're strictly focusing on how we re-engineered Debezium for higher throughput and comparing old and new throughput. The only downside is that this isn't focused on the streaming side of the house, but I think, from a terms-of-use PoV, we're a bit limited in publishing benchmarks with many database vendors. @gunnarmorling wdyt?
| - DEBEZIUM_SINK_KINESIS_REGION=us-east-1 | ||
| ---- | ||
|
|
||
| That is the entire infrastructure. |
That is the entire infrastructure.
One container.
Configure your source connection and your sink destination, and you have a complete CDC pipeline.
That is the entire infrastructure: configure your source connection and your sink destination, and you have a complete CDC pipeline with one unit of deployment.
