Oct 11, 2022 | Read Time: 6 min

The 7 Practices of a Highly Effective Data Mesh

Written By

Travis Hoffman, Senior Executive Advisor, Confluent

First, what is a data mesh?

“Data mesh” is a hot topic these days when IT infrastructure comes up. Data mesh, in our view, is a concept that involves people, processes, and technology. Like Agile, it’s a way of working and thinking that can push your organization toward agility, simplicity, and flexibility.

Why a data mesh architecture?

A well-architected data mesh cuts complexity and aligns data products with their owners, so that every team has the data access and self-serve capability it needs to excel.

As you’re thinking about adopting a data mesh architecture, you may begin with a vision, a roadmap, and maybe a few initial use cases. How do you know whether you’re on the right track? How do you maximize the benefits of data mesh?

From working with customers across countless industries, we’ve determined a set of practices essential to successful data meshes.

Here’s how to ensure you’re implementing data mesh the right way:

1. Everything is evolvable.

My First Law of Architecture is: “Whatever architecture you design is wrong; either now because you didn’t understand the requirements fully, or eventually because something changes.”

It’s the first law because we’ve all been there: the one constant is change, particularly in modern businesses. The data architecture has to realign constantly with business requirements, which in turn should align with the goals or KPIs shared between the tech and business teams. To make sure teams can adjust as needed, every aspect of the data mesh must be designed for evolvability. In successful data mesh organizations, this includes:

Governance that emphasizes long-term evolvability, because part of the challenge is that you’ll always have to balance centralized governance and federated/distributed governance. That balance is never done; it has to adapt and change to maximize velocity and manage architectural complexity, which will always change (see the First Law of Architecture)
Self-determinism, so that great power—and its accompanying great responsibility—lead to well-run teams which plan for evolution at every stage
Sustained agility at scale, where there’s clear ownership of domains and data products. This also includes localized application changes by domain, with minimal cross-domain synchronization that could slow velocity
Faster ROI from a simple, flexible, and consistent interface for data interchange across domains, with a compositional approach for blending data from multiple domains

2. You’ve defined a domain hierarchy.

A hierarchical domain tree structure lets every team easily keep a local copy of data from other domains, and it provides a mechanism to coordinate data at the domains' common ancestor. Having a hierarchy in place will also let you manage complexity more easily (see practice number 3).

With this in place, it’s possible to think more deeply about time. Data products have a longer lifecycle, and operational realities will require “fixing” data in some way. Concurrency issues are more complicated. You should design for eventual consistency from upstream to downstream, across AI, transactional, operational, and time series data. Along with this, define a default way to handle replayability or replacement of streams.

3. Complexity is continuously managed.

My Second Law of Architecture: “Every architectural change increases complexity, unless specifically designed not to.” This practice aims to keep complexity from growing by applying deliberate design patterns. Accomplishing this includes a few approaches:

By default, isolate integrations to a “domain” and define a “mesh” model to isolate the vendor as much as possible.
Take the perspective that “vendors integrate with the data mesh, and not the other way around.”
Design for resilience, which involves defining guidance for topics like the “too many copies problem”: When should multiple copies be merged into one shared copy? How many use cases should one copy support? When should you use a derived source rather than its upstream one?

4. Ownership is more refined.

A well-architected data mesh has a well-defined RACI type of ownership model in place. It is rarely sufficient to just define an “owner”; it is very valuable to break down ownership into more granular terms.

This matters as ownership plays out from top to bottom. In most practical solutions, we prefer to follow the single-writer principle, in which only one service–the data owner–may write to the topic or database. This simplifies system implementation and puts all the code to handle conflicts and synchronization in one place: the hands of the development team that owns the data.
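As a rough illustration of the single-writer principle, here is a minimal in-memory sketch. The `Topic` class, the `NotOwnerError` exception, and the service names are hypothetical stand-ins invented for this example; a real deployment would more likely enforce the rule through access controls on the topic or database rather than in application code.

```python
from dataclasses import dataclass, field


class NotOwnerError(PermissionError):
    """Raised when a service other than the owner tries to write."""


@dataclass
class Topic:
    """In-memory stand-in for a topic or table that has exactly one writer."""
    name: str
    owner: str  # the only service allowed to write (hypothetical naming)
    records: list = field(default_factory=list)

    def write(self, service: str, record: dict) -> None:
        # Single-writer rule: reject writes from anyone but the data owner.
        if service != self.owner:
            raise NotOwnerError(
                f"{service} may not write to {self.name}; owner is {self.owner}"
            )
        self.records.append(record)

    def read(self) -> list:
        # Any service may read; return a copy so readers cannot mutate state.
        return list(self.records)


orders = Topic(name="orders", owner="order-service")
orders.write("order-service", {"id": 1})
```

Because every write passes through the owner, any conflict-handling logic has exactly one home, which is the point the principle is making.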

5. Default implementations are defined.

Along with clear team or people ownership, the data mesh should provide a happy path, following the single-writer principle. This means that, by default:

All data consumers should be tolerant of multiple delivery (business logic level)
All data consumers should be tolerant of out-of-order delivery for up to N seconds
All data producers should produce idempotent records
Event sourcing is usually more complex than necessary. Start by publishing “fat objects” for your records tenants and come back to event sourcing if you really need it.
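One way these defaults can fit together is sketched below, under stated assumptions: each record is a “fat object” carrying the full current state, and the single writer stamps it with a per-key `version` that only increases. The `Record` and `TolerantConsumer` names and the `version` field are hypothetical, not part of any Kafka or Confluent API. Note that this sketch handles late records by comparing versions rather than by buffering for N seconds, which is a simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    key: str
    version: int  # assumed to increase per key; set by the single writer
    state: dict   # "fat object": the full current state, not a delta


class TolerantConsumer:
    """Applies records idempotently: duplicates and stale records are no-ops."""

    def __init__(self) -> None:
        self._latest: dict = {}  # key -> most recent Record seen

    def handle(self, record: Record) -> bool:
        """Return True if the record changed local state, False if ignored."""
        current = self._latest.get(record.key)
        if current is not None and record.version <= current.version:
            return False  # duplicate delivery or out-of-order (older) record
        self._latest[record.key] = record
        return True

    def state_of(self, key: str) -> Optional[dict]:
        record = self._latest.get(key)
        return record.state if record is not None else None


consumer = TolerantConsumer()
consumer.handle(Record("sku-1", 2, {"qty": 5}))
consumer.handle(Record("sku-1", 1, {"qty": 3}))  # arrives late, ignored
consumer.handle(Record("sku-1", 2, {"qty": 5}))  # redelivered, ignored
```

Publishing full state is what makes this simple: the consumer never has to replay a sequence of deltas in order, it only has to keep the newest version it has seen.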

The goal of a data mesh is to put decision-making responsibility as close to the data as possible – in the hands of the implementation team. The idea is to make it easy for that team to do the right thing and follow the happy path for 80% of the use cases. There will be times that teams have to do something off the happy path—that’s OK, as that 20% can be the source of the next innovation. But most of the system should align to best practices naturally through influence rather than by mandate.

6. Think more deeply about time.

The shift to data as a product means that we have to think much more deeply about time, and how our data customers will be impacted when the data changes. And it will change, whether because a vendor changes, or your algorithms change, or maybe your service changes. Even your fact data can change over time.

Data products mean your data now has a longer lifecycle and thus your data customers will either need data to never change or, as a team, we’ll need to have good answers to the following questions:

Data Product Lifecycle – Version 2 is inevitable, what is the migration path?
How will change in data affect the data customer experience?
How will operational changes affect the data customer experience? Our data products are no longer publish-and-forget. As data product owners we are responsible for the care, for the feeding, and for things like SLAs.
How are data products discontinued?
Can we relax transactionality, or are our data customers okay with eventual consistency?
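To make the migration-path question concrete, here is one possible approach, sketched with an entirely hypothetical customer schema: the owner publishes version 2 and, for a deprecation window, also offers a translated version 1 view so existing data customers keep working. The field names and the `schema_version` marker are assumptions made for illustration only.

```python
def to_v1(record_v2: dict) -> dict:
    """Translate a hypothetical v2 customer record into the older v1 shape."""
    if record_v2.get("schema_version") != 2:
        raise ValueError("expected a schema_version 2 record")
    name = record_v2["name"]
    return {
        "schema_version": 1,
        "customer_id": record_v2["customer_id"],
        # v1 carried a single string; v2 split the name into parts.
        "full_name": f'{name["given"]} {name["family"]}',
    }


v2_record = {
    "schema_version": 2,
    "customer_id": "c-1",
    "name": {"given": "Ada", "family": "Lovelace"},
}
v1_view = to_v1(v2_record)
```

The translation only works while version 2 still contains everything version 1 exposed; once that stops being true, the change is breaking and needs an announced end date for the old version.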

7. Institute Feedback Loops to Continuously Improve.

One of the biggest challenges to internal changes like adopting a data mesh is that value isn’t always clear: much of the work may not result in new features or generate new revenue. This isn’t a problem; it’s an opportunity! We need to develop new ways to track and assess not only business value, but technical value.

Develop new KPIs: How much each data product is being used, data product fanout, payback models, timeliness, mean time to repair, mean time to value, and error rates—all these need to become measures of the data product.
Build a Technology Adoption Lifecycle: In a distributed, hierarchical domain structure, teams at the edges can be the source of innovation. Find ways to identify, generalize, and integrate those back into our best practices and paved path solutions.
Refine your Data Product Lifecycle: Most data products will change over time. We need to think about the migration path for our customers, because breaking changes are rarely OK.
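Two of the KPIs named above, data product fanout and mean time to repair, are straightforward to compute once the raw events are captured. The sketch below assumes you can export subscriptions as (data product, consuming team) pairs and incidents as (detected, resolved) timestamps; those input shapes and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def fanout(subscriptions: list) -> dict:
    """Count distinct consuming teams per data product."""
    consumers: dict = {}
    for product, team in subscriptions:
        consumers.setdefault(product, set()).add(team)
    return {product: len(teams) for product, teams in consumers.items()}


def mean_time_to_repair(incidents: list) -> timedelta:
    """Average of (resolved - detected) across incidents."""
    if not incidents:
        raise ValueError("no incidents to average")
    seconds = mean(
        (resolved - detected).total_seconds() for detected, resolved in incidents
    )
    return timedelta(seconds=seconds)


subscriptions = [
    ("orders", "billing"),
    ("orders", "analytics"),
    ("orders", "billing"),  # repeat subscription, counted once
    ("inventory", "analytics"),
]
print(fanout(subscriptions))
```

Tracking these per data product, rather than per platform, keeps the measure attached to the team that owns the product.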

The key thing to remember about a data mesh is that it isn’t a piece of technology that you can unbox, plug in, and put on the kitchen counter next to your coffee maker. You don’t just turn it on and wait for a light to go on. A data mesh is a new process, a new practice, and a new approach for how to best share your data around your organization. For this to succeed, you need to begin with best practices that focus on the end goal of interoperable data that doesn’t rely on centralized authority or controls, which is sometimes a radical departure from what we’re used to in data management. You’ll want to implement these best practices and keep discovering more from your teams and systems to fit your organization’s unique needs and situation.

Travis Hoffman is a Senior Executive Advisor with Confluent. Travis takes a holistic approach to strategy development. He works with customers to clarify their business objectives, developing them into strategies that factor in the people, the processes, and the technologies that make up each customer's distinct landscape. From that strategy, he coaches his customers on how to best implement the tactical, technical solutions required to make their streaming implementation a reality. Travis lives in Portland, Oregon with his partner Andrea and their son, Denali.
