Kafka in the Cloud: Why it’s 10x better with Confluent | Find out more
Today, as part of our expanded partnership with Elastic, we are announcing an update to the fully managed Elasticsearch Sink Connector in Confluent Cloud. This update allows you to take advantage of a new indexing strategy from Elasticsearch known as data streams. An Elastic data stream lets you store append-only time series data across multiple indices while giving you a single named resource for requests. It is an abstraction on top of regular indices, which stores data in hidden indices. With this new update, Confluent’s Elasticsearch Sink Connector can read data from Apache Kafka® topics and write them to Elasticsearch data streams.
Confluent’s Elasticsearch Sink Connector offers the following benefits:
To better understand the data streams supported in Elasticsearch Sink Connector, let’s use an example of a fictitious company called Health Corporation, which is in the business of hosting and managing electronic health records for various hospitals. They host various types of web servers, software load balancers, firewalls, and storage solutions as a part of their cloud infrastructure. They also host a lot of custom applications across various lines of business in the cloud.
Considering that they are hosting extremely sensitive data related to patients and their medical history, they have a huge responsibility to keep the data secure. Hence, they’ve decided to implement a SIEM solution.
While implementing a SIEM solution involves multiple challenges, we are going to focus on the following challenges faced by Health Corporation:
The diagram shown below provides a high-level overview of the solution. In this scenario, time series based events data from web servers and network firewalls is captured in two different topics in a Kafka cluster within Confluent Cloud. Confluent’s fully managed Elasticsearch Sink Connector then reads this data and securely writes it to Elasticsearch Cloud as two different Elastic data streams based on the type, dataset, and namespace configured within the connector.
Let’s do a walkthrough of some of the key steps followed by Health Corporation to deploy a successful SIEM solution across Confluent Cloud and Elasticsearch Cloud.
There are multiple ways in which Health Corporation can capture event-based time series data into Kafka topics within Confluent Cloud. They can use various connectors to put the data at rest into data in motion by pushing it into Kafka topics. Data from those Kafka topics can then be sent to Elasticsearch for search, analysis, anomaly detection, and dashboarding. If the data is stored in AWS CloudWatch, they can use Confluent’s self-managed AWS CloudWatch Logs Connector to read data from AWS CloudWatch and write it to Kafka topics within Confluent Cloud.
For this demo, we will be using Confluent’s fully managed Datagen Source Connector to populate events-based time series sample data into the topics.
Log in to Confluent Cloud and create two topics in your Kafka cluster, apachewebserver and barracudafirewall, to capture log data from Apache HTTP Server and Barracuda CloudGen firewall.
After the two topics apachewebserver and barracudafirewall are created, you can see them on your “Topics” page:
Next, we’ll use the fully managed Datagen connector to populate data in the apachewebserver topic, then follow the same steps to populate the barracudafirewall topic.
Once the data is in Kafka topics, Health Corporation can configure and launch Confluent’s fully managed Elasticsearch Sink Connector in Confluent Cloud. The Elasticsearch Sink connector reads data from multiple topics and writes them as different data streams within Elasticsearch Cloud based on the type of data and data set. Let’s walk through how Confluent’s fully managed Elasticsearch Sink Connector can be configured to write to data streams.
After logging into Confluent Cloud, click on Connectors on the left hand side and search for “Elastic.” Select the Elasticsearch Service Sink Connector, which is marked as “Fully Managed.”
Next, configure the Elasticsearch Sink Connector so that the two topics created in step 1 can be written to Elasticsearch as two different data streams. Below is a quick walkthrough of the steps.
Fill in all of the configurations as detailed below before launching the connector. You can also look at the configurations listed in the code snippet section.
Once you enter all the configurations and click on Next button, the code configurations will be displayed as shown below:
{ "name": "siem-monitoring", "config": { "topics": "apachewebserver, barracudafirewall", "input.data.format": "JSON", "connector.class": "ElasticsearchSink", "name": "siem-monitoring", "connection.url": "https://<host>:<port>", "connection.username": "<username>", "schema.ignore": "true", "tasks.max": "1", "data.stream.type": "logs", "data.stream.dataset": "infra", "data.stream.timestamp.field": "viewtime" } }
Once all of the configurations are entered properly, click Next and then click Launch to launch the connector. The connector first goes into the “Provisioning” state, and then once the connector is running successfully, it will display the “Running” state.
After the connector starts reading data from the topics, it will display the number of processed messages on the connector page. Click on the connector name shown in the picture above to navigate to the detailed connector page.
This completes all of the steps that need to be performed within Confluent Cloud. In the next step, we’ll validate that the data streams have been created in Elasticsearch Cloud by the Elasticsearch Sink Connector.
Developers and operators can log in to Elasticsearch Cloud and launch Kibana. Navigate to Home > Stack Management > Index Management. On the “Index Management” page, click on the Data Streams tab as shown in the picture below. It should display the two data streams created by the Elasticsearch Sink connector in step 2.
The Elasticsearch Data Streams functionality uses the following naming convention:
Format: type-dataset-namespace
Example 1: logs-infra-apachewebserver
Example 2: logs-infra-barracudafirewall
The flexibility to configure the type of data, dataset and namespace allows you to easily create different data streams instead of storing all the data in one Elasticsearch Index.
After ensuring the data is getting populated into Elasticsearch data streams, the Health Corporation team decides to build a dashboard using Kibana to get real-time visibility into threat detection and anomalies across all of their applications and infrastructure.
Building a dashboard in Kibana is a three-step process:
The Health Corporation team creates an index pattern, which lets them query and filter all the indices based on a prefix. Since this scenario is talking about Elasticsearch Data Streams, which are stored as hidden indices, you have to ensure that you enable the option to view hidden indices by checking the option to Include system and hidden indices.
In this example, the index pattern name is set to .ds-logs-infra-*, which matches the two data streams related to Apache HTTP Server logs and Barracuda CloudGen Firewall logs.
Once the index pattern is created, the Health Corporation team does a quick test by using the Discover functionality to ensure the index pattern is returning the right type of data from the underlying indices of the data streams.
In this step, the Health Corporation team builds a visualization based on the index pattern created in the previous step.
Navigate to Home > Visualize > Create Visualization, and create your visualization as shown below. Ideally, this visualization provides real-time visibility into any anomalies detected during the user logins or user interactions, but because we are using sample data in this example, we will just be displaying user information in a pie chart.
In this step, the visualization is added to a dashboard.
This concludes the final step of the demo. Health Corporation now has a SIEM system up and running across Confluent Cloud and Elasticsearch Cloud.
In this example, we saw how Health Corporation accomplished the following:
If you’d like to do the same as well, you can get started by signing up for a free trial of Confluent Cloud and start using the Elasticsearch Sink Connector or any other fully managed connector. You can use the promo code CL60BLOG for an additional $60 of free usage.*
We covered so much at Current 2024, from the 138 breakout sessions, lightning talks, and meetups on the expo floor to what happened on the main stage. If you heard any snippets or saw quotes from the Day 2 keynote, then you already know what I told the room: We are all data streaming engineers now.
We’re excited to announce Early Access for Confluent for VS Code. This Visual Studio integration streamlines workflows, accelerates development, and enhances real-time data processing, all in a unified environment. This post shows how to get started, and also lists opportunities to get involved.