Jan 19, 2021 | Read Time: 4 min

Better to Be Wrong Than Vague: Apache Kafka and Software Architecture Predictions for 2021

Written By

Tim Berglund, VP of Developer Relations

On a recent episode of Streaming Audio, Gwen Shapira, Michael Noll, and Ben Stopford joined me to hold forth about the near future of Apache Kafka® and software architecture in general. House rules were that predictions could cover any topic, but they had to be “precise” in the spirit of Bob Metcalfe, who, back in the 90s, famously predicted a particular day the dot-com bubble would pop, under the theory that it was better to be wrong than vague. At least we all felt pretty sure that our predictions couldn’t possibly be worse than the ones being made at the same time in 2019.

10 million partitions in a single production Kafka cluster

We began, fairly enough, with Kafka itself. Gwen started us off by predicting that by the end of the year it will be possible to run a Kafka cluster with 10 million partitions, facilitated by some consequential architectural changes: KIP-405 (Tiered Storage) and KIP-500 (ZooKeeper removal). These KIPs enable growth in the number of partitions by moving data out of the cluster proper and enabling metadata to be managed in a more scalable and robust way.

Double the size of a Kafka cluster in seconds

Ben predicted being able to double the size of a Kafka cluster in seconds, a task specifically enabled by Tiered Storage. Tiered Storage is often understood simply as a cost-savings and storage play, which is fair enough. People using Kafka as a system of record tend to want longer retention periods, and Tiered Storage is an obvious enough improvement to the economics of that architecture, but it doesn’t stop there. You also get quick autoscaling. Because so much state gets offloaded to your friendly neighborhood cloud object store, when you go to scale brokers, there is significantly less data to move around.
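To make the "less data to move" point concrete, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function name, the broker counts, the per-broker sizes, and the 1 GB/s aggregate copy rate are hypothetical, data is assumed to be spread evenly, and replication and throttling are ignored. It only estimates how much locally stored data has to be copied onto new brokers when a cluster grows.

```python
def data_to_move_gb(local_gb_per_broker: float, old_brokers: int, new_brokers: int) -> float:
    """Estimate the data copied to newly added brokers during a rebalance.

    Assumes local data is evenly spread before and after scaling. The new
    brokers start empty, so their final share is exactly what must be moved.
    """
    if new_brokers <= old_brokers:
        raise ValueError("new_brokers must be larger than old_brokers")
    total_local_gb = local_gb_per_broker * old_brokers
    added_brokers = new_brokers - old_brokers
    return total_local_gb * added_brokers / new_brokers


if __name__ == "__main__":
    # Hypothetical figures: doubling a 6-broker cluster to 12 brokers.
    copy_rate_gb_per_s = 1.0  # assumed aggregate copy throughput

    # Without tiering: all retained data (say 10 TB per broker) lives on local disk.
    all_local = data_to_move_gb(10_000, old_brokers=6, new_brokers=12)

    # With tiering: only a small hot set (say 50 GB per broker) stays local;
    # the rest sits in the object store and does not move.
    hot_only = data_to_move_gb(50, old_brokers=6, new_brokers=12)

    print(f"all local: {all_local:.0f} GB, ~{all_local / copy_rate_gb_per_s:.0f} s to copy")
    print(f"hot set only: {hot_only:.0f} GB, ~{hot_only / copy_rate_gb_per_s:.0f} s to copy")
```

The absolute numbers mean little; the point is the ratio. The amount moved scales with what is stored locally, so shrinking the local footprint shrinks the rebalance by the same factor.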

Another architectural benefit is a potential performance boost: data in the object store tier is accessed over the network, with the presumption that it is accessed less frequently than data still on disk. If one plays one’s cards right, that local hotset can fit entirely into the broker’s page cache, making I/O on the hotset a vastly faster proposition. Remember, when it comes to data access patterns, the power law works for you; you don’t work for it.
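The power-law remark can be illustrated with a small sketch. It assumes, purely for illustration, that reads across log segments follow a Zipf distribution, with the most popular segment at rank 1. Real Kafka read patterns are driven mostly by recency, so treat the model, the function name, and the numbers as hypothetical. The function computes the share of reads that land on the hottest segments, meaning the ones you would hope to keep in page cache.

```python
def hot_set_hit_ratio(num_segments: int, hot_segments: int, exponent: float = 1.0) -> float:
    """Fraction of reads served by the `hot_segments` most popular segments.

    Assumes read popularity follows a Zipf law: the segment at rank r is read
    with weight 1 / r**exponent.
    """
    if not 0 < hot_segments <= num_segments:
        raise ValueError("hot_segments must be between 1 and num_segments")
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_segments + 1)]
    return sum(weights[:hot_segments]) / sum(weights)


if __name__ == "__main__":
    # Hypothetical: 1,000 segments, of which the hottest 100 fit in page cache.
    ratio = hot_set_hit_ratio(num_segments=1_000, hot_segments=100)
    print(f"share of reads served from the hot set: {ratio:.1%}")
```

Under these assumptions, keeping the hottest 10 percent of segments local serves roughly 69 percent of reads, and a steeper exponent pushes that share higher.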

Streaming everywhere

Michael’s prediction, for which there was consensus (see what I did there, KIP-595?), was the continued growth of Kafka-like streaming features in products across the data landscape: from relational stalwarts like Oracle, to Redis, to traditional messaging systems like RabbitMQ. Users have increasingly come to expect features that will let them work with real-time, unbounded datasets, and vendors tend to notice things that users expect.

Given that this transition to understand systems “events first” is well underway and already looks rooted in the emerging software architecture consensus, I will see Michael’s prediction for the year and raise him another couple of decades: I predict event streaming will be seen as the dominant paradigm of this generation’s software architectures.

Multi-paradigm products

So it’s clear that event streaming is happening, but another question is how it can best be added to existing database products, since many existing tools came to life before this paradigm was yet a thing. To begin with, companies each have their own idea for how streaming should even be defined, as Michael has seen firsthand with his work on the committee writing the SQL standard’s streaming extension. And it can be hard to retrofit an existing product built under batch- or state-oriented assumptions, particularly when one wants to operate it at scale. As Gwen pointed out, it may even require completely new data structures to make a truly successful multi-paradigm solution.

As a side note, the broader Kafka ecosystem is making its own claim on multi-paradigm status, since it began with streaming and later added ksqlDB, which brings database concepts and SQL itself into an event-driven system.

Conclusion

So our money is on the table. I must say that I would be surprised if, when I start rolling into my Christmas playlist in October of 2021, streaming isn’t even more on the minds of those in the industry than it is now. I don’t know that it will be completely mainstream, but if you’re not doing it already by then, or at least thinking seriously about it, you might start to feel a bit behind the zeitgeist. It would also be surprising if by that same time the effects of KIP-500 and the rest of the gang haven’t started to make their mark in the community’s collective imagination, as we continue to think about what we might build with Kafka next.

Interested in more?

If you want to hear the episode for yourself, have a listen to Streaming Audio and make sure to subscribe through Apple Podcasts or wherever fine podcasts are sold.

Tim serves as the VP of Developer Relations at Confluent, where he and his team work to make streaming data and its emerging toolset accessible to all developers. He is a regular speaker at conferences and a presence on YouTube explaining complex technology topics in an accessible way. He lives with his wife and stepdaughter in Mountain View, CA, USA. He has three grown children, three step-children, and four grandchildren.
