
Introducing Confluent Cloud for Apache Flink


In the first three parts of our Inside Flink blog series, we discussed the benefits of stream processing, explored why developers are choosing Apache Flink® for a variety of stream processing use cases, and took a deep dive into Flink's SQL API. In this post, we'll focus on how we’ve re-architected Flink as a cloud-native service on Confluent Cloud. However, before we get into the specifics, there is exciting news to share.

As of today, Confluent’s fully managed Flink service is available for preview in select regions on AWS. We will be continuing to build out the offering and make it available to more regions and cloud providers during this preview phase. Check out the Flink quick start to see how you can try the industry's only cloud-native, serverless Flink service today. It is an exciting time to be a part of the Kafka and Flink communities, and we hope everyone takes advantage of this opportunity to try out the service.

Now let’s turn our attention to Confluent Cloud for Apache Flink. What is it? How is it different? Why should you care? 

What is Confluent Cloud for Apache Flink?

Simply put, Confluent Cloud for Apache Flink is Flink re-imagined as a truly cloud-native service.

Confluent's fully managed Flink service allows you to:

  • Effortlessly filter, join, and enrich your data streams with Flink, the de facto standard for stream processing

  • Enable high-performance and efficient stream processing at any scale, without the complexities of infrastructure management

  • Experience Apache Kafka® and Flink as a unified platform, with fully integrated monitoring, security, and governance 

Flink serves as the streaming compute layer for Kafka

When bringing Flink to Confluent Cloud, our goal was to provide a uniquely serverless experience beyond just "cloud-hosted" Flink. Kafka on Confluent Cloud goes beyond Apache Kafka through the Kora engine, which showcases Confluent's engineering expertise in building cloud-native data systems. Our goal is to deliver the same simplicity, security, and scalability for Flink as our customers expect for Kafka.

Let’s double-click into each of the key benefits mentioned above.

Filter, join, and enrich your data streams

As we mentioned in the first blog post of this series, stream processing plays a critical role within the data streaming stack. Flink serves as the streaming compute layer to your Kafka storage layer. It empowers developers to query and inspect data streaming into Kafka, and to enrich, curate, and transform those streams for improved usability, portability, and compliance. One of Flink's great benefits is that Flink SQL is based on the ANSI SQL standard: if you know SQL, then you know Flink SQL.

Our Flink service takes Flink’s SQL API further by integrating the operational catalog with the rest of the Kafka ecosystem on Confluent Cloud. Customers with Kafka topics and schemas in Schema Registry will already have tables to browse and query in Flink SQL without having to wrangle with tedious CREATE TABLE statements or data type mappings that so often trip people up. By eliminating the need to duplicate operational metadata and retaining one holistic view of your data, your first query on Confluent Cloud can simply be a SELECT statement, lowering the barrier to exploration and making it easier to understand and build upon existing data streams.
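To make the catalog idea concrete, here is a minimal Python sketch of how existing topics and Schema Registry subjects could be paired into queryable tables. It assumes the default TopicNameStrategy naming convention, where a topic's subjects are `<topic>-key` and `<topic>-value`. The `TableEntry` type and the `discover_tables` function are hypothetical and do not reflect Confluent's actual catalog implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TableEntry:
    """A table surfaced from an existing topic (hypothetical catalog model)."""
    name: str
    key_subject: Optional[str]    # "<topic>-key" if registered, else None
    value_subject: Optional[str]  # "<topic>-value" if registered, else None


def discover_tables(topics: list[str], subjects: set[str]) -> list[TableEntry]:
    """Expose every topic as a table, attaching key/value subjects when present.

    Assumes the TopicNameStrategy convention: subjects are named
    "<topic>-key" and "<topic>-value".
    """
    tables = []
    for topic in sorted(topics):
        key, value = f"{topic}-key", f"{topic}-value"
        tables.append(
            TableEntry(
                name=topic,
                key_subject=key if key in subjects else None,
                value_subject=value if value in subjects else None,
            )
        )
    return tables


# Usage: "clicks" has only a value schema, "orders" has both.
for table in discover_tables(["orders", "clicks"], {"orders-key", "orders-value", "clicks-value"}):
    print(table)
```

The point of the design is that no table definition is written by hand. The table list is derived entirely from metadata that already exists in Kafka and Schema Registry.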

That said, because of its operational complexity, submitting Flink SQL queries to open source Apache Flink is neither straightforward nor accessible to the broad community of users who know SQL. Furthermore, CLI access isn't for everyone. The goal at Confluent is to make Flink accessible for everyone!

Our Flink service also includes a new SQL editor and workspace experience that’s fully integrated with Confluent Cloud and our upcoming Data Portal for data discovery. You can seamlessly move from browsing topics to writing queries in seconds. It’s really that simple. 

Rich SQL editor experience in the Confluent Cloud UI

We'll cover the SQL editor and workspace experience in detail in a future blog post. Stay tuned!

Enable stream processing at any scale

To deliver a cloud-native experience at launch, we’ve focused on a few core principles:

  • Flink must be serverless 

  • Flink must be simple to use

  • Flink must be independently scalable from Kafka

This approach enabled us to offer a cloud-native service that is highly scalable, easy to use, and optimized for efficient resource utilization.

Serverless

Apache Flink has a cluster-based architecture that provides building blocks for elastic scalability, offers a consistent suite of polyglot APIs, and is supported by a vibrant developer community. However, as with open source Kafka, it's not all smooth sailing. Apache Flink can be very challenging to operate and manage on your own.

Developers must first evaluate the upfront cost of deploying and managing the framework before creating their first application. Next, clusters need to be maintained and the applications that run on top need to stay up-to-date with the framework, making upgrades to Flink painful. This had to change with our Flink offering—enter serverless Flink.

The term "serverless" itself can have many connotations. To us, it has three primary dimensions:

  • Elastic autoscaling with scale-to-zero

  • Evergreen runtime and APIs

  • Usage-based billing

Elastic autoscaling with scale-to-zero

On Confluent Cloud, Flink workloads scale automatically without the need for user intervention. Our autoscaler takes care of everything, from managing the scale-out to determining parallelism, load balancing, and more. There is no need to pre-size your workload or take into account operational peaks with capacity planning. 
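Confluent has not published its autoscaler's algorithm, so the following Python sketch only illustrates the general idea of backlog-driven scaling with scale-to-zero. The function `target_parallelism`, its parameters, and the formula are hypothetical assumptions. The sketch computes enough tasks to keep up with incoming records and to drain any backlog within a target time, and it returns zero when there is nothing to do.

```python
import math


def target_parallelism(
    input_rate: float,
    backlog: int,
    per_task_rate: float,
    drain_seconds: float = 60.0,
    max_parallelism: int = 64,
) -> int:
    """Return a task count that keeps up with input and drains the backlog.

    input_rate: records/second arriving; backlog: records waiting (consumer lag);
    per_task_rate: records/second one task can process. Returns 0 when there is
    nothing to do (scale-to-zero). Hypothetical policy, for illustration only.
    """
    if input_rate <= 0 and backlog <= 0:
        return 0  # Idle: release all compute.
    # Rate needed to absorb new input and clear the backlog within drain_seconds.
    required_rate = max(input_rate, 0) + max(backlog, 0) / drain_seconds
    tasks = math.ceil(required_rate / per_task_rate)
    return max(1, min(tasks, max_parallelism))


print(target_parallelism(input_rate=2500, backlog=0, per_task_rate=1000))
print(target_parallelism(input_rate=0, backlog=0, per_task_rate=1000))
```

In a managed service, a loop like this would run continuously against live metrics. That is why users do not need to pre-size workloads or plan capacity for peaks.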

Maximize resource utilization and avoid over-provisioning infrastructure

Evergreen runtime & APIs

The Flink runtime must always be up-to-date, providing you with the latest functionality. There must also be strong backward compatibility guarantees so that existing applications continue to function as the runtime is upgraded. As a result, the runtime is not versioned and its upgrades are fully automated, so what you experience is a fully managed service.

The APIs we expose should also be declarative. You state your intended outcome, and Flink determines how to achieve it, enabling you to focus on building business logic, not managing infrastructure.

Usage-based billing

Finally, you should pay only for what you use, not what you provision. Flink compute is ephemeral in Confluent Cloud. Once you stop using the compute resources, they are deallocated, and you no longer pay for them. Combined with the elasticity of scale-to-zero, this lets you benefit from unbounded scalability while maintaining cost efficiency.
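A quick back-of-the-envelope comparison shows why paying for usage matters for spiky workloads. The Python sketch below compares capacity provisioned for the peak hour with billing only for the units actually consumed each hour. The hourly profile and the price per unit-hour are made-up numbers, not Confluent pricing.

```python
def provisioned_cost(hourly_units: list[int], price_per_unit_hour: float) -> float:
    """Capacity sized for the peak hour and paid for every hour, busy or idle."""
    peak = max(hourly_units, default=0)
    return peak * len(hourly_units) * price_per_unit_hour


def usage_based_cost(hourly_units: list[int], price_per_unit_hour: float) -> float:
    """Only the units actually consumed each hour are billed; idle hours cost nothing."""
    return sum(hourly_units) * price_per_unit_hour


# Hypothetical 24-hour profile in compute units: idle overnight, a midday peak.
profile = [0] * 8 + [2, 4, 8, 8, 4, 2] + [1] * 6 + [0] * 4
PRICE = 0.20  # hypothetical price per unit-hour, not Confluent pricing

print(f"provisioned for peak: {provisioned_cost(profile, PRICE):.2f}")
print(f"usage-based:          {usage_based_cost(profile, PRICE):.2f}")
```

For this profile, sizing for the peak of 8 units bills 8 × 24 = 192 unit-hours. Usage-based billing charges only the 34 unit-hours actually consumed, and the twelve idle hours cost nothing.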

Tying it all together: Flink Compute Pools

To tie our serverless principles together, our Flink service exposes a new concept known as Flink compute pools. Compute pools provide users with seamless access to the elastic compute resources of Flink. Developers simply deploy apps without having to pre-determine the resources needed to operate them, and pools automatically expand and contract based on the resources required, improving developer productivity and cost efficiency.

A compute pool can support multiple apps and statements running in parallel, taking advantage of the peaks and troughs each app experiences. Resource usage is aggregated to the compute pool level, eliminating app sizing and replacing it with a simple, user-defined budgetary cap for each pool being used. Operators can easily manage access to the compute resources of Flink, segment workloads by pool, and separate billing line items for each pool.
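The pooling model can be pictured as a shared budget that several statements draw from. In the minimal Python sketch below, each statement reports how many compute units it currently wants, and the pool grants them up to its user-defined cap. The `ComputePool` class, the unit numbers, and the proportional scale-down policy are hypothetical. They illustrate usage aggregated at the pool level, not how Confluent Cloud actually allocates resources.

```python
from dataclasses import dataclass, field


@dataclass
class ComputePool:
    """Hypothetical compute pool: statements share one user-defined cap (max_units)."""
    name: str
    max_units: int
    demand: dict[str, int] = field(default_factory=dict)

    def set_demand(self, statement: str, units: int) -> None:
        """Record how many units a statement currently wants; 0 removes it."""
        if units <= 0:
            self.demand.pop(statement, None)
        else:
            self.demand[statement] = units

    def allocations(self) -> dict[str, int]:
        """Grant full demand under the cap; otherwise scale every statement down proportionally."""
        total = sum(self.demand.values())
        if total <= self.max_units:
            return dict(self.demand)
        # Over budget: floor of each proportional share, so the sum never exceeds the cap.
        return {s: u * self.max_units // total for s, u in self.demand.items()}

    def used_units(self) -> int:
        """Usage aggregated at the pool level, which is what the cap applies to."""
        return sum(self.allocations().values())


pool = ComputePool("analytics", max_units=10)
pool.set_demand("enrich-orders", 4)
pool.set_demand("filter-clicks", 3)
print(pool.allocations(), pool.used_units())
```

Because peaks in one statement can coincide with troughs in another, a single cap on the pool can replace per-application sizing.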

Flink compute pools provide elastic compute resources

To create a pool, simply pick a cloud provider and region, give it a name, and set its budget. The options provided in the interface automatically detect the regions where you have stored your data streams, helping to ensure the pool is co-located with your data (i.e., Kafka cluster). Everything else is taken care of by us.

Simple

Managing apps on Apache Flink can also be challenging. Each application has to be independently sized and subsequently managed on an ongoing basis. Developers need to remain actively involved as the workload fluctuates, determining application scale and the required parallelism.

You don’t size apps when using Flink on Confluent Cloud. You create Flink compute pools and Confluent takes care of the rest—managing resource assignments, parallelism, and the scale of the pool. Our implementation also takes care of advanced deployment challenges, such as high availability, resource management, security, and auditing. Our goal is for developers to focus on app development, not complex infrastructure-related tasks.

Simplicity isn't just reserved for scaling and managing applications; it permeates our whole approach to unifying Flink with Kafka. For example, open source Apache Flink has two different connectors for creating tables against Apache Kafka: the "kafka" connector for regular reads and writes, and the "upsert-kafka" connector for upserts. This means maintaining two different table definitions that must be kept in sync as schemas evolve. Flink SQL in Confluent Cloud has only one unified Kafka connector, so you need only one table to perform any operation.
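The difference between append and upsert handling comes down to how each Kafka record is interpreted. With upsert semantics, as used by open source Flink's upsert-kafka connector, each record is the latest value for its key, and a record with a null value (a tombstone) deletes that key. The Python sketch below folds a keyed changelog into table state that way. It illustrates only the semantics a single connector must cover when a table is read in upsert mode. `materialize` and the sample records are hypothetical, and this is not Confluent's connector code.

```python
from typing import Optional


def materialize(records: list[tuple[str, Optional[dict]]]) -> dict[str, dict]:
    """Fold a keyed changelog into the current table state using upsert semantics.

    A non-null value inserts or replaces the row for its key; a None value is a
    tombstone and deletes the key. Later records win over earlier ones.
    """
    table: dict[str, dict] = {}
    for key, value in records:
        if value is None:
            table.pop(key, None)  # Tombstone: delete the row.
        else:
            table[key] = value    # Insert or overwrite the row for this key.
    return table


# Made-up (key, value) records: "a" is updated, "b" is deleted by a tombstone.
log = [("a", {"qty": 1}), ("b", {"qty": 2}), ("a", {"qty": 3}), ("b", None)]
print(materialize(log))
```

Read with append semantics instead, every record in the same log would be a separate row. Supporting both interpretations behind one table definition removes the need to keep two definitions in sync.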

Scalable

The scalability of a cloud-native service is closely tied to the separation of compute and storage. By separating these two components, you can scale each one independently, allowing for more efficient use of resources and better overall scalability. Although Flink is closely integrated with Kafka and Schema Registry metadata, it applies separation semantics to all data sources, including Kafka. For example, many stream processing applications need to join streams together, but not all streams are stored in the same Kafka cluster. Many companies organize their storage by line of business, geography, or domain. This works well until a business unit has cross-cutting questions and needs to join data from different clusters and potentially write back to another Kafka cluster. 

We wanted Flink to take an unbounded approach when reading and writing data, reaching across environments and clusters and enriching data streams wherever they exist. Flink's separation of compute from Kafka storage means that joining across domains is a completely seamless experience. What's more, this allows you to better align how you organize your data streams to your business needs, empowering you to have smaller, domain-specific clusters rather than forcing all data into one monolithic cluster.
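As an illustration of enriching a stream with data stored on a different cluster, here is a minimal Python sketch of a keyed stream-table enrichment. The interleaved events stand in for records from two hypothetical Kafka clusters: a "sales" cluster holding orders and a "crm" cluster holding customers. In Flink SQL, this would be a join between two tables whose topics live in different clusters. The sketch processes records in arrival order and does not model Flink's state backends or event-time semantics.

```python
from typing import Iterable, Iterator


def enrich(events: Iterable[tuple[str, dict]]) -> Iterator[dict]:
    """Enrich orders with the latest known customer row, in arrival order.

    `events` interleaves records from both clusters as (source, record) pairs,
    where source is "crm.customers" or "sales.orders" (hypothetical names).
    Orders whose customer is not yet known get region=None, like a LEFT JOIN.
    """
    customers: dict[str, dict] = {}  # State: latest row per customer_id.
    for source, record in events:
        if source == "crm.customers":
            customers[record["customer_id"]] = record
        elif source == "sales.orders":
            customer = customers.get(record["customer_id"])
            yield {**record, "region": customer["region"] if customer else None}


events = [
    ("crm.customers", {"customer_id": "c1", "region": "EMEA"}),
    ("sales.orders", {"order_id": 1, "customer_id": "c1"}),
    ("sales.orders", {"order_id": 2, "customer_id": "c2"}),
]
for row in enrich(events):
    print(row)
```

Since the compute layer only needs read access to both topics, neither dataset has to be copied into a shared cluster first.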

This separation also allows us to offer our service with more affordable pricing. Flink and Kafka services are co-located on Confluent Cloud, meaning Flink only reads from and writes to Kafka clusters in the same cloud region, which saves you those hidden networking costs. Furthermore, Flink in Confluent Cloud will be designed to read from Kafka replicas in the same availability zone whenever possible. This capability is known as "Fetch From Follower" and effectively eliminates expensive cross-availability-zone network traffic charges.
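Fetch From Follower builds on Apache Kafka's follower fetching (KIP-392). A consumer advertises its location through the `client.rack` setting, and the broker's `replica.selector.class` (for example, `org.apache.kafka.common.replica.RackAwareReplicaSelector`) can direct it to a replica in the same rack or availability zone instead of the partition leader. The Python sketch below roughly mirrors that selection rule. The `Replica` type, the `choose_replica` function, and the zone names are hypothetical, and the real logic runs inside the broker.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Replica:
    broker_id: int
    rack: str            # Availability zone of the broker (its broker.rack).
    log_end_offset: int  # How caught up this replica is.


def choose_replica(replicas: list[Replica], leader_id: int, client_rack: Optional[str]) -> Replica:
    """Prefer a replica in the client's rack, falling back to the leader.

    The leader wins if it is already in the client's rack; otherwise the most
    caught-up same-rack replica is chosen.
    """
    leader = next(r for r in replicas if r.broker_id == leader_id)
    if not client_rack:
        return leader
    same_rack = [r for r in replicas if r.rack == client_rack]
    if not same_rack or leader in same_rack:
        return leader
    return max(same_rack, key=lambda r: r.log_end_offset)


replicas = [Replica(1, "az-1", 100), Replica(2, "az-2", 98), Replica(3, "az-3", 99)]
print(choose_replica(replicas, leader_id=1, client_rack="az-2"))
```

Reading from a same-zone follower trades a little freshness for staying inside one availability zone. Cross-zone transfer is what cloud providers typically bill for.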

Experience Kafka & Flink as a unified platform

Integrating Flink has enabled Confluent to double down on our capabilities for stream processing, providing a generalized layer of streaming compute over streaming data movement and storage powered by the Kora engine. 

However, Confluent is much more than Kafka and Flink. Customers benefit from the fact that Confluent is a complete data streaming platform. Just like with Kafka, Flink is fully integrated with our tooling for security, governance, and observability.

Security

Open source Apache Flink does not have a security model built into its framework. This is a critical dimension that organizations have to address when deploying Flink. By contrast, Confluent Cloud offers a robust and secure suite of capabilities to control, manage, and govern access to data.

To enable secure stream processing, Flink inherits the same Identity and Access Management providers available on Confluent Cloud. In addition, our role-based access control (RBAC) has been extended to include Flink, creating new roles for scalable management of permissions. Developers can access a compute pool using Flink RBAC roles defined at the environment or compute pool level.
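Granting a role at the environment or compute pool level can be pictured as a containment check. A binding on an environment covers every pool inside it, while a binding on a pool covers only that pool. The Python sketch below models that check. The role names, the scope strings, and the `can_use_pool` function are illustrative assumptions, not Confluent Cloud's RBAC API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleBinding:
    principal: str
    role: str   # Hypothetical role name.
    scope: str  # "env-a" for an environment, "env-a/pool-x" for a compute pool.


# Assumption: roles that grant use of a compute pool, named for illustration only.
POOL_ROLES = {"FlinkDeveloper", "FlinkAdmin"}


def can_use_pool(bindings: list[RoleBinding], principal: str, env: str, pool: str) -> bool:
    """True if the principal holds a pool-granting role on the pool or its environment.

    A binding on the environment covers every pool inside it; a binding on a
    pool covers only that pool.
    """
    allowed_scopes = {env, f"{env}/{pool}"}
    return any(
        b.principal == principal and b.role in POOL_ROLES and b.scope in allowed_scopes
        for b in bindings
    )


bindings = [
    RoleBinding("alice", "FlinkDeveloper", "env-a"),
    RoleBinding("bob", "FlinkDeveloper", "env-a/pool-x"),
]
print(can_use_pool(bindings, "bob", "env-a", "pool-y"))
```

Environment-level bindings keep administration simple for broad teams. Pool-level bindings allow tighter segmentation of workloads.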

RBAC provides platform-wide security with granular access to critical resources

Your data is already secured at the data level in Confluent Cloud, and Flink is fully integrated with our security model to enforce those controls. Flink complies with our Trust and Security policies, so auditing is always on. Finally, Flink uses managed service accounts for the execution of continuous statements, improving manageability.

Governance

Flink can read from and write to any Confluent Cloud Kafka cluster in the same region. However, we do not allow you to query across regions by design. Not only do we want to help you avoid expensive data transfer charges, but we also want to protect data locality and sovereignty by keeping reads and writes in-region.
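The in-region rule can be expressed as a validation step that runs before a statement is accepted. In the minimal Python sketch below, every Kafka cluster a statement reads from or writes to must be in the same cloud provider and region as the compute pool. The `Location` type, the `CrossRegionError` exception, the `validate_statement` function, and the cluster names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    cloud: str   # e.g. "aws"
    region: str  # e.g. "us-east-1"


class CrossRegionError(ValueError):
    """Raised when a statement would read or write outside the pool's region."""


def validate_statement(pool: Location, clusters: dict[str, Location], used: list[str]) -> None:
    """Reject the statement if any referenced cluster is outside the pool's region."""
    for name in used:
        where = clusters[name]
        if where != pool:
            raise CrossRegionError(
                f"cluster {name!r} is in {where.cloud}/{where.region}, "
                f"but the pool runs in {pool.cloud}/{pool.region}"
            )


pool = Location("aws", "us-east-1")
clusters = {
    "sales": Location("aws", "us-east-1"),
    "eu-sales": Location("aws", "eu-west-1"),
}
validate_statement(pool, clusters, ["sales"])  # Accepted: same region.
```

Checking before execution means a cross-region read fails fast with a clear error. Without that check, the read could silently incur transfer costs or move data out of its jurisdiction.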

When deploying open source Apache Flink, you must first integrate it with a metadata management repository. This enables Flink to persist technical metadata, making it durable across sessions. So far, so good. However, what if you already have topics in Kafka and schemas in Schema Registry? Sadly, you must create any and all table definitions yourself and maintain them over time. This duplication of metadata is hard to keep in sync and makes schema evolution significantly more challenging.

Confluent Cloud provides a unified approach to metadata management. There is one object definition, and Flink integrates directly with that definition. In doing so, Flink avoids any unnecessary duplication of metadata and makes all topics immediately queryable via Flink SQL. Furthermore, any existing schemas in Schema Registry are used to surface fully defined entities in Confluent Cloud. If you’re already on Confluent Cloud, you will automatically see tables ready to query using Flink, simplifying data discovery and exploration.

There is no need to maintain two disparate sets of technical metadata—instead, the experience looks like this:

CREATE TABLE catalog.database.T1 (
  C1 INT,
  C2 INT,
  PRIMARY KEY (C1) NOT ENFORCED
);

When table T1 is created, three objects are created under the hood. The first is a topic called T1. The second and third are the schema subjects T1-key and T1-value, which capture the definition of the table and are mapped to T1. Both the topic and the schemas are consumable by Flink and Kafka applications, enabling interoperability. You can see the schema subjects created for T1 below. Similarly, had I started with topic T2 and schema subjects T2-key and T2-value, Flink in Confluent Cloud would have automatically created table T2 for me, making topic T2 immediately queryable.

T1-key

{
  "fields": [
    {
      "name": "C1",
      "type": "int"
    }
  ],
  "name": "record",
  "namespace": "org.apache.flink.avro.generated",
  "type": "record"
}

T1-value

{
  "fields": [
    {
      "default": null,
      "name": "C2",
      "type": [
        "null",
        "int"
      ]
    }
  ],
  "name": "record",
  "namespace": "org.apache.flink.avro.generated",
  "type": "record"
}
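The mapping from the CREATE TABLE statement to these two subjects can be reproduced with a short Python sketch. Primary key columns go to the key schema as plain Avro types, because Flink treats primary key columns as NOT NULL. The remaining columns go to the value schema as `["null", type]` unions with a null default. The sketch assumes every non-key column is nullable, as C2 is here, and maps only a few SQL types. `derive_subjects` is a hypothetical helper, not Confluent's serializer.

```python
import json

# Assumption: only a few SQL types are mapped here, for illustration.
SQL_TO_AVRO = {"INT": "int", "BIGINT": "long", "STRING": "string"}


def _record(fields: list[dict]) -> dict:
    """Wrap fields in the record envelope used by the T1 subjects."""
    return {
        "fields": fields,
        "name": "record",
        "namespace": "org.apache.flink.avro.generated",
        "type": "record",
    }


def derive_subjects(table: str, columns: list[tuple[str, str]], primary_key: list[str]) -> dict[str, dict]:
    """Split a table's columns into "<table>-key" and "<table>-value" Avro schemas."""
    key_fields = [
        {"name": name, "type": SQL_TO_AVRO[sql_type]}  # Key columns: non-null.
        for name, sql_type in columns
        if name in primary_key
    ]
    value_fields = [
        {"default": None, "name": name, "type": ["null", SQL_TO_AVRO[sql_type]]}
        for name, sql_type in columns
        if name not in primary_key  # Value columns: nullable, default null.
    ]
    return {f"{table}-key": _record(key_fields), f"{table}-value": _record(value_fields)}


subjects = derive_subjects("T1", [("C1", "INT"), ("C2", "INT")], primary_key=["C1"])
for subject, schema in subjects.items():
    print(subject)
    print(json.dumps(schema, indent=2))
```

Run as-is, this should print the same two schemas shown for T1-key and T1-value. The key column C1 appears only in the key subject, not in the value subject.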

Observability

Apache Flink provides a plethora of metrics to choose from, but the onus is on the DevOps team to select the right ones.

Confluent Cloud already simplifies this process by providing a curated set of metrics for its services through Confluent's existing Metrics API. An opinionated set of Flink metrics will soon be exposed the same way, providing a consistent approach to metrics and monitoring across all services in Confluent Cloud. For customers with established observability platforms in place, Confluent Cloud provides first-class integrations with New Relic, Datadog, Grafana Cloud, and Dynatrace.

You can also monitor workloads directly within the Confluent Cloud UI. Clicking into a compute pool gives you insight into the health and performance of your applications, in addition to the resource consumption of your compute pool.

Monitor compute pool utilization and metrics

What’s next and getting started

And that’s a wrap (for now)! We hope you've enjoyed our “Inside Flink” blog series, where we’ve covered a lot of the questions we’re hearing around Flink. We are thrilled to have concluded with the introduction of the industry's only cloud-native, serverless Flink service.

We have an exciting journey ahead, and this is only the beginning! We look forward to expanding our Flink service to additional clouds and regions. We'll also be adding new features, such as more automation for developer workflows, integration with OpenAI, support for programmatic APIs in Java and Python, and a host of other capabilities.

Are you ready to get started? If you haven't already, sign up for a free trial of Confluent Cloud and create your first Flink SQL application within a matter of minutes using the Flink quick start. Use promo code CL60BLOG to get an additional $60 of free Confluent Cloud usage.*

Interested in learning more? Be sure to register for the upcoming Flink webinar to get hands-on with a technical demo that showcases the full capabilities of Flink SQL on Confluent Cloud.

  • James Rowland-Jones (JRJ) is a director of product management at Confluent, where he leads the Stream Processing and Analytics team. JRJ has over 20 years of experience in the technology industry, specializing in distributed cloud computing and analytics. Prior to joining Confluent, JRJ held senior leadership positions at Microsoft, leading stealth investments for Azure Synapse Analytics (now Microsoft Fabric), Azure SQL, and SQL Server. James is a recognized thought leader and community contributor, having served for many years on the organizing bodies of www.SQLBits.com and the PASS organization. JRJ has spoken at numerous conferences and events around the world and is the author of several books and publications.
