Kafka in the Cloud: Why it’s 10x better with Confluent | Find out more
AWS Lambda is a serverless, event-driven compute service that lets you run code for virtually any type of application or backend service. Lambda functions and Kafka topics can be combined to build scalable event-driven architectures that can fit many use cases across almost any industry. There are two common ways customers can use AWS Lambda for processing messages on a Kafka topic. Confluent (a validated AWS Lambda Ready partner) natively supports using Lambda as a sink connector from topics. AWS Lambda also supports Kafka as an event source through its service integration. In this blog, we will look at some of the high-level concepts of both integration patterns and explain when to use each, with best practices and a reference use case that is best suited for each pattern.
AWS Lambda can consume Kafka messages using two patterns. The first one is native to the Confluent Kafka ecosystem and entails invoking the AWS Lambda function using the Lambda invocation APIs. The second one is native to AWS, which configures Kafka as a trigger to the Lambda function by configuring an event source mapping.
Both patterns work well for most use cases, though there are some considerations when making your decision. The Lambda Sink Connector allows you to invoke Lambda functions synchronously or asynchronously, which helps prevent stalled partitions. If you have a consistent workload and wish to invoke asynchronously or want a dead-letter queue, choose the Lambda Sink Connector. Event source mapping can enable the automatic invocation of the Lambda function synchronously when events occur. And Lambda monitors the load through the consumer offset lag and automatically scales the number of consumers based on your workload. So in use cases like auto-scaling, use event source mapping.
Let’s review each of the patterns.
This pattern is based on the Kafka Connect architecture and provides an easy way to send messages from a Kafka topic to different downstream systems. Confluent natively supports this connector and customers can provision it from the Confluent Cloud console. They can also use Confluent CLI/API to provision the resources. Documentation on Kafka Connect has additional details on the architecture.
The sink connector supports two invocation modes for the Lambda function—synchronous and asynchronous. For synchronous invocations, you can invoke parallel tasks using the task parameter. Asynchronous innovations allow batching and invoking multiple Lambda functions in parallel. The key characteristic to note is that if you need to process messages in sequence across different partitions, then use the synchronous connector setting with the tasks parameter. If the goal is to process messages in a fire-and-forget manner, then the asynchronous setting gives the highest throughput. By using the asynchronous mode, the connector triggers Lambda functions without waiting for a response, resulting in delivery guarantees of at-most-once. Any errors can be caught in the function and sent to a dead letter queue (topic) in Confluent. A real-life example of this would be in Healthcare, imagine multiple providers using a unified system for billing, we need to ensure that events (e.g. debits and credits) happen in the right order.
In addition to asynchronous invocation and the dead letter queue, there is a host of Connect Usability features such as SMTs (Single Message Transforms) for in-flight transformations, Data Preview, Logs, Notifications, and Observability metrics around Connector health, which are all applicable for the Lambda Sink Connector.
In this pattern, you’ll configure Apache Kafka as a trigger for the Lambda function. To configure the trigger from the AWS Lambda Console, select the self-managed Kafka option as the source. Documentation on the Lambda event source mapping (ESM) provides details on the configuration settings. The key aspect of this integration is that Lambda event source mapping does the polling of the Kafka topic based on the offsetLag metric and invokes the target Lambda function. Pollers in the ESM can invoke concurrent Lambda functions if the offsetLag continues to grow because of the inbound flow of messages. Each Lambda instance processes messages across different partitions. (There’s no additional cost for the polling infrastructure.) Both the synchronous Lambda sink connector and AWS Lambda ESM support the processing of messages in order.
The AWS Cloud Development Kit (AWS CDK) provides an AWS Lambda event source module for Kafka. As an example, the following code sets up a self-managed Kafka cluster as an event source. Username and password-based authentication will need to be set up as described in Managing access and permissions.
The healthcare industry has several use cases that use messaging as part of its operations. One such use case is in hospitals or healthcare facilities, which use a messaging backbone to integrate their electronic health record (EHR) system with various clinical systems such as labs, radiology, dispensing, billing, transcription, and various patient health monitoring systems. Traditionally, this data uses the health level seven (HL7) V2 format for the ADT message that’s triggered upon patient admission. These require event ordering to be preserved for most scenarios, such as admission, discharge, and transfer-type operations. For example, a lab system must not process a patient discharge message before the patient admission message. Apache Kafka is well-suited for sending such messages. At a high level, this is the solution architecture:
In addition to the EHR system, each department in a hospital may have its own specialized system, like a lab system, a drug dispensing system, or a radiology system. The ADT message that is generated from the EHR needs to be sent to all such systems so they have a record of the admitted patient. A medical record number(also known as a MRN) is used to connect all the systems. Each such system also has a specific implementation guide for exchanging data, meaning that the ADT messages need to be mapped specifically and with specific code sets.
Messages need to be processed in order for a specific patient and some messages can be in the order of a few MBs. As new clinical systems are added, there may be a need to build the state of a patient record into those systems as well. Each of these requirements makes Apache Kafka a good fit for a message broker for exchanging such messages. The pub-sub capability of a Kafka topic can broadcast messages to all downstream systems that need to be notified.
However, we still need a computing service that can process each of the individual messages. The compute service needs to transform the message based on the target system and deliver it to a topic assigned for the target system. HL7 messages are typically sent or received using a protocol known as the minimal lower layer protocol (MLLP). The compute service needs to deliver the message to the target system using the MLLP. The stateless nature of the processing AWS Lambda is a good fit for this processing.
In the supported integrations with Lambda, business requirements would drive synchronous processing of the messages. It should also preserve the order for a specific patient, though you can process messages across patients in parallel. A native AWS integration using a Lambda ESM would be the easiest to adopt, and you only need to pay for the message processing.
For any organization, an email marketing strategy is an integral part of its marketing services. Traditionally, setting up solutions for mass mailing requires hardware expenditure, license costs, and technical expertise.
With AWS Lambda and the Amazon Simple Email Service (SES), users can build a cost-effective and in-house serverless email platform. Along with using S3 (where the mailing list will be stored) users can quickly send HTML or text-based emails to a large number of recipients. Whenever a user uploads a CSV file, it triggers an S3 event. This event triggers another Lambda function which imports the file into the database and will start sending emails to all the addresses. Assuming you have a list of emails in the Kafka topic you can invoke the Lambda function, which will further use SES to send the emails as desired.
Confluent’s AWS Lambda Sink Connector now supports multiple Lambda functions invocation. It means users can invoke multiple functions using a single instance of the connector, thus reducing the overall cost to run these connectors.
Confluent Cloud offers elastic scaling and pricing that charges only for what you stream. To learn more about Confluent Cloud, sign up for an account and receive $400 USD to spend during your first 30 days.
This blog explores how cloud service providers (CSPs) and managed service providers (MSPs) increasingly recognize the advantages of leveraging Confluent to deliver fully managed Kafka services to their clients. Confluent enables these service providers to deliver higher value offerings to wider...
With Confluent sitting at the core of their data infrastructure, Atomic Tessellator provides a powerful platform for molecular research backed by computational methods, focusing on catalyst discovery. Read on to learn how data streaming plays a central role in their technology.