From locomotive analytics companies tracking predictive maintenance data, to digital-native companies following semi-trailer trucks across America's highways, to municipalities running hundreds of buses to move people around their cities, Confluent has seen it all when it comes to collecting IoT (Internet of Things) data. IoT-centric companies struggle to collect IoT data in order to optimize fleet management, reduce operating costs, improve safety and compliance, and enhance the customer experience.
One thing remains the same: all of these companies rely on fleets of IoT devices (commonly thousands to hundreds of thousands) to collect high volumes of data into a central cloud repository for processing, warehousing, and delivery back to their platform applications. The hunger for consuming and delivering this real-time data has reached an all-time high. In the trucking industry, I typically see IoT devices such as GPS trackers, ELDs (electronic logging devices), tire pressure monitoring systems, and maintenance devices.
This was especially true for one customer I worked with in the logistics industry, whose entire business relied on delivering customer analytics back to its clients. The organization focused on ingesting data from its clients' semi fleets, roughly 30,000 vehicles in total. These semi businesses were required to monitor their drivers for compliance reasons, but also saw this as a good opportunity to analyze their fleets for predictive maintenance and route optimization.
Collecting real-time IoT data can present several technical challenges, some of which we will outline below:
Data volume: IoT devices typically generate a large amount of data across many different devices. Scaling to thousands or hundreds of thousands of devices makes this problem dramatically more difficult. The sheer volume of data generated by IoT devices can be overwhelming, making it challenging to store, process, and analyze within your platform's SLAs.
Data velocity: IoT devices generate data in real time, which calls for real-time processing and analysis. This organization regularly received batches of IoT data that took, on average, 120 minutes to process, which hindered its ability to react to the data.
Data variety: IoT devices generate data in various formats, such as text, audio, and video. This can make it challenging to collect, process, and analyze the data effectively.
Data quality: IoT devices can generate noisy and incomplete data, which can affect the accuracy of analysis. It is essential to ensure that the data collected is of high quality and follows an agreed-upon format.
The organization we were working with saw significant consequences in two areas: operational disruption and loss of trust. Incomplete, missing, or significantly delayed IoT data can erode users' trust in your platform and lead to revenue losses from compliance fines. When the organization received an influx of data, RabbitMQ required them to scale horizontally by manually adding VMs. If they did not, their environment would, at worst, break and, at best, take hours to process the backlog.
This led to the decision to consider other technology solutions, specifically data streaming and Confluent. Data streaming was an easy choice for developers and operations teams across the organization because it would reduce the operational disruption they experienced with RabbitMQ. The ability to horizontally scale prevents unnecessary downtime and ensures their customers never see a data platform interruption.
This was accompanied by Schema Registry, which helped ensure data quality in Confluent by enforcing schema compatibility between different components of their systems. Schema Registry also provided the ability to validate schemas, track their evolution over time, and enforce data governance. Data streaming also opened the organization up to multiple stream processing technologies such as Kafka Streams, Flink, and Spark, which help organizations, regardless of industry, process data in flight while reducing latency, a win-win.
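To make that concrete, here is a minimal sketch of how a producer might enforce an agreed-upon format with Schema Registry, using the confluent-kafka Python client. The endpoint placeholders, the `truck-telemetry` topic, and the record fields are illustrative assumptions, not this organization's actual schema:

```python
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

# Avro schema agreed on by producers and consumers (illustrative fields).
GPS_SCHEMA = """
{
  "type": "record",
  "name": "GpsReading",
  "fields": [
    {"name": "truck_id", "type": "string"},
    {"name": "latitude", "type": "double"},
    {"name": "longitude", "type": "double"},
    {"name": "odometer_miles", "type": "double"},
    {"name": "ts_epoch_ms", "type": "long"}
  ]
}
"""

# Placeholder endpoints and credentials -- substitute your own.
sr_client = SchemaRegistryClient({
    "url": "https://<sr-endpoint>",
    "basic.auth.user.info": "<key>:<secret>",
})
avro_serializer = AvroSerializer(sr_client, GPS_SCHEMA)

producer = Producer({"bootstrap.servers": "<broker>:9092"})

reading = {"truck_id": "truck-0042", "latitude": 39.74, "longitude": -104.99,
           "odometer_miles": 81234.5, "ts_epoch_ms": 1700000000000}

# Serialization fails fast if the record doesn't match the registered
# schema, keeping malformed readings out of the topic.
producer.produce(
    topic="truck-telemetry",
    value=avro_serializer(reading,
                          SerializationContext("truck-telemetry",
                                               MessageField.VALUE)),
)
producer.flush()
```

Because serialization validates each record against the registered schema, malformed readings are rejected at the producer rather than discovered downstream.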
Confluent’s streaming platform can’t solve these problems alone. MQTT and Kafka are two popular technologies used for IoT data collection and processing. MQTT is a lightweight messaging protocol that is widely used for IoT data collection. It is designed to work with low-power devices and unreliable networks. Here's how you can use them together:
Use MQTT to collect data: You can use an MQTT client to collect GPS and ELD data from IoT devices and send it to an MQTT broker (see the publisher sketch after this list).
Integrate MQTT and Confluent: Confluent offers a fully managed MQTT Source connector that makes it easy to integrate your MQTT broker with a Confluent Cloud cluster (see the connector sketch after this list).
Process with Confluent: Stream processing is a core piece of distributed systems and pivotal for handling this IoT data in flight rather than in batches. As an example, you can easily filter out trucks that have accumulated thousands of miles and could require maintenance soon (see the filtering sketch after this list).
Downstream delivery: Once the data is in Confluent, you can use various tools to move it to the right place for the right job. For example, you can use a connector to Amazon S3 for long-term storage or to Snowflake for data warehousing.
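For the collection step, a device-side publisher can be as small as the following sketch, which uses the paho-mqtt Python client (2.x API). The broker address, topic layout, and payload fields are assumptions for illustration:

```python
import json
import time

import paho.mqtt.client as mqtt

# Placeholder broker and topic layout -- substitute your own.
MQTT_BROKER = "mqtt.example.com"
MQTT_TOPIC_TEMPLATE = "trucks/{truck_id}/gps"

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)  # paho-mqtt 2.x API
client.connect(MQTT_BROKER, 1883)
client.loop_start()  # handle network I/O on a background thread

def publish_gps(truck_id: str, lat: float, lon: float,
                odometer_miles: float) -> None:
    """Publish one GPS reading; QoS 1 means the broker must acknowledge it."""
    payload = json.dumps({
        "truck_id": truck_id,
        "latitude": lat,
        "longitude": lon,
        "odometer_miles": odometer_miles,
        "ts_epoch_ms": int(time.time() * 1000),
    })
    client.publish(MQTT_TOPIC_TEMPLATE.format(truck_id=truck_id),
                   payload, qos=1)

publish_gps("truck-0042", 39.74, -104.99, 81234.5)
```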
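For the integration step, the fully managed MQTT Source connector is configured with a handful of properties. The sketch below builds that configuration as a Python dict; the property names follow the connector's documentation, but the credentials, broker URI, and topic names are placeholders, so verify against the current docs before use:

```python
import json

# Illustrative config for Confluent Cloud's fully managed MQTT Source
# connector. The `+` wildcard subscribes to the GPS topic of every truck.
mqtt_source_config = {
    "name": "TruckGpsMqttSource",
    "config": {
        "connector.class": "MqttSource",
        "kafka.auth.mode": "KAFKA_API_KEY",
        "kafka.api.key": "<api-key>",
        "kafka.api.secret": "<api-secret>",
        "mqtt.server.uri": "tcp://mqtt.example.com:1883",
        "mqtt.topics": "trucks/+/gps",
        "kafka.topic": "truck-telemetry",
        "tasks.max": "1",
    },
}

# Write the config to a file that can be submitted through the Confluent
# Cloud UI or CLI (e.g., `confluent connect cluster create --config-file`).
with open("mqtt-source.json", "w") as f:
    json.dump(mqtt_source_config, f, indent=2)
```

The same pattern applies on the delivery side: the Amazon S3 and Snowflake sink connectors mentioned above are configured the same way, just with different connector classes and properties.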
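For the processing step, the organization could use Kafka Streams, Flink, or Spark as discussed earlier; as a language-neutral illustration, here is the maintenance filter expressed with the confluent-kafka Python client, assuming JSON-encoded values and an illustrative `maintenance-candidates` output topic and mileage cutoff:

```python
import json

from confluent_kafka import Consumer, Producer

MAINTENANCE_THRESHOLD_MILES = 80_000  # illustrative cutoff, not a real policy

consumer = Consumer({
    "bootstrap.servers": "<broker>:9092",  # placeholder bootstrap server
    "group.id": "maintenance-filter",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "<broker>:9092"})

consumer.subscribe(["truck-telemetry"])

# Route high-mileage trucks to a maintenance topic as events arrive,
# instead of waiting for a nightly batch job.
while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    reading = json.loads(msg.value())
    if reading["odometer_miles"] >= MAINTENANCE_THRESHOLD_MILES:
        producer.produce("maintenance-candidates", msg.value())
        producer.poll(0)  # serve delivery callbacks
```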
Overall, using MQTT and Confluent together can help you collect and process IoT data efficiently and effectively. By using the right tools for collecting and processing IoT data, the organization made its platform more reliable for its customer base by reducing the downtime associated with RabbitMQ. This stability improved customer sentiment and reduced the number of support tickets opened for operational disruptions, which meant their technical teams could focus on delivering platform enhancements instead of running continuous downtime fire drills.
More importantly, this change in technology allowed them to capture new data streams for new revenue streams. A great example is the organization's new route optimization module, which allows their customers to plan multi-route deliveries to save fuel costs and optimize driver hours. The example below shows how this organization set up its architecture to capture data with MQTT, process and store it with Confluent, and then deliver it back to the data platform for their clients.
Confluent Cloud is the perfect agnostic platform for handling IoT data because it is built on Apache Kafka, a highly scalable and reliable streaming technology. Confluent is designed to handle high volumes of data in real time, making it well suited for IoT use cases where large volumes of data are generated by a variety of devices and sensors.
Confluent Cloud offers a wide range of supported client libraries and pre-built connectors that make it easy to integrate with other systems and applications. Once data is produced to Confluent, it doesn't end there: stream processing and data governance tools are pivotal to making this ecosystem production ready and future-proofed for any scale. This allows organizations to build integrated solutions that can handle complex data processing and analytics requirements.
Confluent Cloud also provides a range of security features, including encryption at rest and in transit, access control, and audit logging. This helps ensure that data is protected from unauthorized access and that compliance requirements are met. These security features were pivotal for the organization, which needed to demonstrate security best practices before it could take this data pipeline into production. While IoT truck data is not the most sensitive data in the world, the organization still put security at the forefront of its design philosophy.
I asked the logistics organization why Confluent Cloud was their first choice when evaluating options to solve their data problem. They tested many options, including RabbitMQ, AWS pub/sub services, and AWS Kinesis, but none of them could handle data at the required scale in an efficient way. The organization's throughput could swing from 0.5 MBps to 50 MBps in a matter of minutes, leaving little time for manual intervention. To them, Confluent was the only option that offered a holistic portfolio of features demonstrating a mature platform: 24/7 support, a 99.99% uptime SLA, a deep portfolio of pre-built connectors, and governance tools such as Stream Governance. They saw no better balance between price and future-proofing their architecture for scale.
If your organization is handling any amount of IoT traffic and you're looking for a better approach to collecting, processing, and storing data in real time, I encourage you to learn more about what Confluent is doing in this area.
Visit the IoT solutions page for related use cases.
Check out our webpage on this particular use case for more details.
Kai Waehner’s blog about Confluent Cloud and Waterstream is a great place to start and includes many publicly referenceable companies using Kafka and IoT. Check that out here.
Review our developer site for anything and everything Confluent related. You will find documentation, workshops, courses, and more. This is the perfect space to start for anyone looking to get hands-on experience with Confluent.