Kafka in the Cloud: Why it’s 10x better with Confluent | Find out more
At Confluent, every now and then we like to take a day away from our normal sprint tasks to hack. There are a ton of benefits to hack days, including:
On our most recent hack day, the projects ranged from practical to eccentric. Let’s take a look at some of the highlights.
With many customers around the world and nearly 20 global offices, our solutions engineers often need to answer inquiries and questions in multiple languages (human languages, not coding languages).
Confluent uses Loopio as a knowledge base for tracking technical questions about our product. It works great as long as you query in English and expect the results to be in English. Since questions and queries are submitted in various languages from around the globe, enabling users to query/read in local languages is a big win. To that end, Confluent solutions engineers Brice Leporini, Sergio Duran, Ramón Marquez, and Remi Forest stood up a system that can continuously pipe Loopio’s corpus through Confluent Cloud, apply accurate translations, and use the Elasticsearch connector to store the translated corpus in Elastic Cloud. Then, they wired up a UI to translate queries and return the results in the desired language.
As users move to the cloud, there is an increasing need for clients to validate and verify that their data is handled without intentional or unintentional modification. When a client produces a message into an Apache Kafka® cluster, they may want proof that it is identical to the message they produced onto the cluster. This led Confluent security engineers Nitesh Mor and Mathew Crocker to ask, “Can we build a Kafka client that will let consumers cryptographically prove the integrity of our streamed data as generated by a producer?”
To solve the problem, they built a modified Python producer that creates cryptographic hashes and digital signatures and stores them in message headers. Hash chains are heavily used by a number of cryptographic systems, such as blockchains. When a producer writes a record, it includes cryptographic hashes for each message and one signature per message set. This additional information is stored in the message headers. On the other end, the consumer uses these hashes and signatures to validate that the records it is consuming arrive untampered, exactly as the producer sent them. When the team produces unexpected messages or uses a modified Kafka broker that tampers with the records in the topic, the consumer throws a VerificationError.
Compression works best when redundancies can be eliminated over and over again. Kafka producers can be configured to linger and accumulate a larger batch of data before compressing and sending it. This gives more opportunity for compression within each batch but at the cost of increased latency. Even linger-batching may not provide gains when you have a large fleet of producers, each producing small batches.
Kafka has built-in support for several compression codecs, including Zstandard, but it does not take advantage of Zstandard’s major feature: shared dictionaries. For data streams composed of mostly similar data, it is wasteful to have Zstandard continuously rediscover and encode redundant information for every single batch.
Confluent software engineer Chris Johnson found a way to automatically generate, save, and reuse Zstandard’s shared dictionaries in a serializer/deserializer library by backing the shared dictionaries to durable storage on a Kafka topic, resulting in better compression ratios while using less CPU. A test run using typical audit log events delivered a 2x improvement over the built-in gzip codec.
At Confluent, some of our teammates were tired of seeing (and paying for) cloud resources that were no longer in use. While there are numerous valid use cases for long-running clusters in pre-prod environments (soak testing, performance testing, etc.), the current method for tracking which clusters could be decommissioned was manual, inefficient to run, and infrequently run.
Project Roomba was a hack to automate the cleanup of our unused clusters. Software engineers Harini Rajendran, Joshua Buss, Krishnan Chandra, and Vivek Sharma built a reaper job in our mothership control plane orchestration to find old, unused resources, and delete them. Their strategy included multiple parts:
The project was successful and is now running in pre-prod environments!
Slack does not retain messages forever, and it’s hard to search. But a lot of us at Confluent would like to save useful threads for future reference. That’s why quality engineers Scott Hendricks and Srini Dandu and software engineer Atharva Pendse built Conflack, a tool that’s part Slack bot and part Confluence wiki.
Do you want to save a thread? Simply react to the thread with the Conflack emoji, and the whole thread is instantly saved to the team’s Confluence page. The thread can then be edited and reformatted at the user’s convenience.
One of the main constraints around this problem is delivering a service with low operational overhead, and cheap to run. The usage pattern would be lower volume of hits, and perhaps a lot of idle time throughout the night. Given these constraints, the team chose to implement their backends using a serverless architecture powered by Google Functions. They integrated the Slack API to trigger a Google Function call, which could then write to Confluence using the Atlassian API.
At Confluent, we’re building the world’s leading event streaming platform. Sometimes, that means taking some time to explore our technology and collaborate across team functions. Our hack days come to life when the enthusiasm for our products comes pouring out of our normal sprint work—they’re a purely bottom-up initiative around what we can discover and build when we collaborate across our normal team boundaries. If this sounds like the kind of place where you’d want to work and solve problems, check out our Careers page!
From joining Confluent as one of the first engineers on the security team to now managing a team of four, Tejal has had incredible opportunities to learn and grow during her six years at the company.
Let’s learn more about Tejal and how Confluent fosters an environment of constant learning—all...
A year in at Confluent, Product Manager Surabhi Singh has learned a lot about data streaming—and even more about herself. In this fast-paced environment, Surabhi is highly motivated and committed to her work strategically planning, coordinating, and delivering product improvements for customers...