Kafka in the Cloud: Why it’s 10x better with Confluent | Find out more
This a summary of Podium’s technological journey and an example of how our engineering team is tooling ourselves to scale well into the years to come. Here at Podium, we’re keenly focused on providing messaging tools to help local businesses modernize the way that they communicate with their customer base. What’s now become a full-blown end-to-end product suite servicing industries ranging anywhere from healthcare to home services started out much simpler, with business reviews management focused on automotive dealers.
In the early days, fresh out of Y Combinator, Podium frequently looked to the essays of Paul Graham as an instruction manual for how to thrive as a startup. Do Things That Don’t Scale was a fan favourite within the engineering org. That mantra evolved over time into an internally sourced one of “Ship It and See What Happens,” but the point remained the same: don’t over-engineer.
Our CTO likes to use a car analogy to describe Podium’s requisite evolution as we scale. The tooling has to change to fit the circumstances.
In the early days of the company, the tech stack was akin to the composition of a Toyota Camry. The engine, transmission, axles, and brakes of a Camry are all designed to handle the low-grade stress expected to be placed upon them via the daily commute or a quick errand. Podium’s engineering “Camry” was a monolithic Rails app, single Postgres database, and outsourced DevOps.
Podium’s infrastructure has evolved over the years to include a microservices architecture that runs within Kubernetes and distributed data tools like Apache Airflow and those found within the Apache Kafka® ecosystem. These tools allow us to handle the system load consisting of millions of conversations happening simultaneously on our platform. We started sprinkling in Kafka about three years ago, initially for simple queuing use cases, but it has now become a foundational piece of the infrastructure that we are using to build the engineering equivalent of a Ferrari.
Podium now uses the Kafka ecosystem everywhere within our infrastructure. Examples include:
Podium has done a great job over the years of delaying refactors until the indicators say that the performance gains from a refactoring will be worth it to our customers. We recently hit a critical mass of indicators signaling that our conversation search needed to speed up, so we put a brief hold on the existing roadmap and prioritized the refactoring at the top of the list.
The existing search infrastructure still used a Postgres read replica to query and then surface results. The result set was slow to return and oftentimes surfaced unhelpful search results for the customer. For these reasons, we wanted to move to a solution that used Elasticsearch. However, any latency in a conversation’s availability in Elasticsearch was unacceptable, so a streaming solution seemed requisite—specifically, the combination of Kafka, Kafka Connect, ksqlDB, and Kafka Streams.
We first evaluated the “bookends” of the pipeline, defining what we wanted the records to look like when they were placed into Elasticsearch at the bottom of the pipeline. Then, we evaluated the contents of the database tables at the top of the pipeline to see which tables would need to be used.
The resulting records in Elasticsearch needed to look like this:
{ "user_uid": "ff3b56b0-264d-4346-a27f-1cf1e3b76c79", "conversation_uid": "72259851-6a25-4f7a-b8c7-8cdbfc0f1b26", "step": "needs_review_invite" }
The tables that mattered (users, conversations, and conversation_items) were sent to Kafka via change data capture using the Debezium PostgreSQL CDC Connector (io.debezium.connector.postgresql.PostgresConnector).
A conversation_item can be one of three unique types:
One of Podium’s features helps brick-and-mortar businesses collect reviews from their customers to place on platforms like Google reviews.In the example provided, Jane is a customer of Exeter Insurance and just left their office. Jane will receive a text message from the team at Exeter thanking her for her time and asking for a review. Jane leaves a five-star review about Exeter Insurance on Google. Exeter’s team will be notified of the new review, and they can respond to the review within Google or to Jane herself, all from within the Podium platform.
When Exeter’s employees are using the search functionality of Podium, they can filter conversations based on whether they “need a review invite” or “need review response.” That is where the “Calculate step” comes into play in the data pipeline. The messages flow into the “Calculate step” transformer looking like this:
{
"uid": "656071ea-a214-4794-8047-5da1f769e779",
"conversation_uid": "72259851-6a25-4f7a-b8c7-8cdbfc0f1b26",
"type": "message"
}
{
"uid": "1b9020ec-7eb8-4173-a990-27335a025f87",
"conversation_uid": "72259851-6a25-4f7a-b8c7-8cdbfc0f1b26",
"type": "review_invite"
}
{
"uid": "6702a109-34a7-41a8-a576-a930cf76293a",
"conversation_uid": "72259851-6a25-4f7a-b8c7-8cdbfc0f1b26",
"type": "review"
}
We aggregate all the messages for a given conversation_uid to ultimately output messages that look like the following:
{
"conversation_uid": "72259851-6a25-4f7a-b8c7-8cdbfc0f1b26",
"step": "needs_review_invite"
}
{
"conversation_uid": "72259851-6a25-4f7a-b8c7-8cdbfc0f1b26",
"step": "needs_review"
}
{
"conversation_uid": "72259851-6a25-4f7a-b8c7-8cdbfc0f1b26",
"step": "needs_review_response"
}
{
"conversation_uid": "72259851-6a25-4f7a-b8c7-8cdbfc0f1b26",
"step": "finished"
}
We joined the records in the users topic to the records in the conversations topic via the user_uid foreign key reference existent on each conversations topic record. We then joined the records in the users.joined.conversations topic to the records in the formatted.conversation_items topic via the converation_uid foreign key reference existent on each formatted.conversation_items topic record.
Lastly, we only want “open” conversations available for search. You can think of a “closed” conversation similar to an archived email. This is where the closed_at on the conversations topic’s records comes into play.
In the “Maybe tombstone” transformer, the records flowing in will look like this:
{
"user_uid": "ff3b56b0-264d-4346-a27f-1cf1e3b76c79",
"conversation_uid": "72259851-6a25-4f7a-b8c7-8cdbfc0f1b26",
"conversation_closed_at": null,
"step": "finished"
}
{
"user_uid": "ff3b56b0-264d-4346-a27f-1cf1e3b76c79",
"conversation_uid": "72259851-6a25-4f7a-b8c7-8cdbfc0f1b26",
"conversation_closed_at": "2020-07-19 10:23:54",
"step": "finished"
}
If a record’s converation_closed_at timestamp is not null, then instead of passing through the record, we will want to pass through a tombstone (i.e., a record with the same key as the record that came in but a null value). This will tell the ElasticsearchSinkConnector that it needs to send a DELETE to the Elasticsearch instance and remove the record for that key, making that conversation unsearchable from the customer’s point of view.
Kafka and its ecosystem have helped Podium decouple our infrastructure and upgrade system performance when signals coming from our customer base reveal an issue within the existing architecture. Kafka has helped Podium gradually upgrade our metaphorical auto parts from a Camry to Ferrari, while still optimizing for feature speed to market. This approach has made it a joy to “Ship It and See What Happens.”
If you would like to learn more, check out Confluent Developer, the largest collection of resources for getting started with Kafka, including end-to-end Kafka tutorials, videos, demos, meetups, and podcasts.
Skai completely revamped its interactive, ad-campaign dashboard by adding Apache Kafka and an in-memory database—eventually moving the solution to Confluent Cloud. Once on the Cloud, they devised an ingenious architecture for reducing the number of topics they needed.
Turning events into outcomes at scale is not easy! It starts with knowing what events are actually meaningful to your business or customer’s journey and capturing them. At Confluent, we have a good sense of what these critical events or moments are.