Data Mesh vs Data Fabric: The Reincarnation of an Old Debate?

Introduction

In the current IT landscape, two emerging philosophies are shaping the debate in the Data/Big Data/Analytics space: Data Mesh, or decentralized agility, and Data Fabric, or integrated connectivity.

Data Mesh, championed by Zhamak Dehghani, among others, emphasizes decentralization and domain-oriented data ownership. It envisions treating data as a product and promotes a federated approach where individual domain teams are responsible for the end-to-end data lifecycle within their domains.

Data Fabric, on the other hand, focuses on creating a unified and integrated data architecture. It provides a centralized layer that connects disparate data sources, ensuring seamless accessibility and consistency across the entire data landscape.

Data Mesh tenets:

  1. Domain Ownership: Data is treated as a product owned by specific business domains, fostering accountability.
  2. Decentralized Architecture: The architecture is designed to distribute data processing responsibilities among domain teams, promoting agility.
  3. Data Products: Data is transformed into self-serve, high-quality data products, enhancing usability and reducing dependencies.
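The "data as a product" tenet can be sketched in a few lines of Python. This is purely illustrative (the class, field, and check names are invented, not part of any Data Mesh specification): the owning domain publishes its data together with metadata and its own quality checks, and other domains consume it self-serve.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a Data Mesh "data product": the owning business
# domain is accountable for the data it exposes, including its quality.
@dataclass
class DataProduct:
    name: str
    owner_domain: str                                    # accountable domain
    rows: list = field(default_factory=list)
    quality_checks: list = field(default_factory=list)   # row-level predicates

    def publish(self, rows):
        # The domain team validates its own data before exposing it.
        for check in self.quality_checks:
            rows = [r for r in rows if check(r)]
        self.rows = rows

    def read(self):
        # Self-serve access for consumers in other domains.
        return list(self.rows)

orders = DataProduct(
    name="orders",
    owner_domain="sales",
    quality_checks=[lambda r: r.get("amount", 0) > 0],
)
orders.publish([{"id": 1, "amount": 120.0}, {"id": 2, "amount": -5.0}])
print(len(orders.read()))  # the invalid row is filtered out at the source
```

The point of the sketch is the ownership boundary: quality enforcement lives with the producing domain, not with a central team downstream.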

Data Fabric tenets:

  1. Unified Architecture: A centralized data layer that integrates and connects various data sources across the organization.
  2. Data Accessibility: Promotes easy and secure access to data, enabling organizations to derive insights from a unified view.
  3. Interoperability: Ensures data interoperability by creating a fabric that connects different systems and platforms.
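By contrast, the Data Fabric tenets center on one access layer in front of many systems. A minimal sketch, with invented names and toy connectors standing in for real source systems, might look like this:

```python
# Hypothetical sketch of a Data Fabric access layer: a single catalog
# routes reads to heterogeneous underlying sources via connectors.
class DataFabric:
    def __init__(self):
        self._sources = {}  # dataset name -> callable returning rows

    def register(self, name, reader):
        # Each system plugs into the fabric through a connector function.
        self._sources[name] = reader

    def catalog(self):
        # One unified, discoverable inventory of all datasets.
        return list(self._sources)

    def read(self, name):
        # Consumers get one uniform entry point, regardless of source.
        return self._sources[name]()

fabric = DataFabric()
fabric.register("crm.customers", lambda: [{"id": 1, "name": "Acme"}])
fabric.register("erp.invoices",  lambda: [{"id": 7, "total": 99.0}])

print(sorted(fabric.catalog()))      # unified view across systems
print(fabric.read("crm.customers"))  # same API for every source
```

Here the integration burden sits in the central layer, which is exactly where the consistency benefits, and the bottleneck risks discussed below, come from.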

Data Mesh undoubtedly provides certain tangible benefits:

  • Scalability: Enables scalable and agile data processing, catering to the specific needs of each business domain.
  • Autonomy: Encourages domain teams to be self-sufficient, leading to faster innovation and problem-solving.
  • Reduced Bottlenecks: Minimizes centralized bottlenecks, fostering a more distributed and responsive data ecosystem.

But at what cost?

  • Coordination Challenges: Data Mesh requires robust coordination mechanisms to ensure harmonious collaboration between different domain teams.
  • Transition Complexity: Implementing Data Mesh may require a significant shift in organizational culture and processes.

On the other hand, Data Fabric ensures:

  • Consistency: Data Fabric provides a consistent view of data across the organization, reducing data silos and inconsistencies.
  • Ease of Access: Simplifies data access and discovery, fostering collaboration and knowledge sharing.
  • Interconnected Ecosystem: Promotes a well-connected data ecosystem, enhancing the overall efficiency of data processes.

At the same time, Data Fabric suffers from some easily identifiable drawbacks:

  • Dependency on Centralization: Some argue that the centralized nature of Data Fabric can lead to potential bottlenecks and hinder agility.
  • Implementation Challenges: Implementing a unified data fabric can be complex, especially in organizations with diverse and legacy systems.

The trade-off highlighted above is nothing new in the history of IT: tactical decentralization fosters smaller projects with a higher chance of success, while centralized integration is admittedly more desirable but carries a higher risk, should the large, company-wide effort fail to deliver the expected results.

I remember working on Wall Street twenty years ago, implementing Data Warehousing solutions for the major merchant banks headquartered in New York. Back then, there were two competing approaches to Data Warehousing implementations:

  1. The Kimball Approach
  2. The Inmon Approach

The Kimball approach is similar to Data Mesh. Ralph Kimball advised his customers to focus on dimensional modeling, emphasizing both star and snowflake schemas. He encouraged them to build Data Marts first – Data Marts being smaller subsets of data warehouses tailored to specific business areas. He also favored iterative and incremental development: Data Marts could eventually evolve into full-blown Data Warehouses.
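A Kimball-style star schema is easy to show concretely. The sketch below (table and column names are illustrative, not from any real warehouse) uses Python's built-in sqlite3: one fact table of measurements surrounded by denormalized dimension tables, queried with the joins typical of dimensional reporting.

```python
import sqlite3

# Minimal star schema: a central fact table keyed to flat dimensions.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales  (date_key INTEGER, product_key INTEGER, amount REAL);

INSERT INTO dim_date    VALUES (20240101, '2024-01-01', 2024);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware');
INSERT INTO fact_sales  VALUES (20240101, 1, 99.5), (20240101, 1, 0.5);
""")

# A typical analytical query: join the fact to its dimensions, then aggregate.
total, = con.execute("""
    SELECT SUM(f.amount)
    FROM fact_sales f
    JOIN dim_date d    ON f.date_key    = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    WHERE d.year = 2024 AND p.category = 'Hardware'
""").fetchone()
print(total)  # 100.0
```

The dimensions deliberately tolerate redundancy (a product's category is repeated on every row of dim_product) in exchange for simple, fast analytical joins, which is the heart of Kimball's pitch.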

The Inmon approach is similar to Data Fabric. Bill Inmon advised his customers to build a centralized Enterprise Data Warehouse (EDW) as the “single version of the truth.” He advocated a normalized data model to reduce redundancy and improve consistency. Inmon’s approach focuses on integrating data from various sources before presenting it to users. Followers of this approach would put a strong emphasis on data quality and consistency at the source level.
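The contrast with Inmon's normalized integration layer can be sketched the same way (again with invented, illustrative tables): each entity lives exactly once, redundancy is factored out into reference tables, and consumers re-join the entities downstream.

```python
import sqlite3

# Minimal Inmon-style normalized (3NF) layer: one row per real-world fact,
# with shared attributes factored into reference tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE country  (country_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT,
                       country_id INTEGER REFERENCES country(country_id));
CREATE TABLE account  (account_id INTEGER PRIMARY KEY,
                       customer_id INTEGER REFERENCES customer(customer_id),
                       balance REAL);

INSERT INTO country  VALUES (1, 'USA');
INSERT INTO customer VALUES (10, 'Acme Corp', 1);
INSERT INTO account  VALUES (100, 10, 2500.0), (101, 10, 500.0);
""")

# Downstream marts or views reassemble the normalized entities for users.
name, total = con.execute("""
    SELECT c.name, SUM(a.balance)
    FROM account a JOIN customer c ON a.customer_id = c.customer_id
    GROUP BY c.name
""").fetchone()
print(name, total)  # Acme Corp 3000.0
```

Here consistency is enforced once, centrally, at the integration layer; the cost is that every consumer query has to navigate more joins than in the star schema above.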

“Ideological purity” aside, who won the debate, if we consider the many Data Warehousing implementations of the last twenty years?

Success stories and case studies exist for both methodologies, showcasing instances where each approach has been effective in meeting the organization’s goals and requirements. The choice between Inmon and Kimball often depended on factors such as the organization’s data complexity, reporting needs, existing infrastructure, and the skill set of the team.

It’s important to note that, over time, hybrid approaches borrowing elements from both the Inmon and Kimball methodologies became increasingly common, reflecting the evolving nature of data warehousing practices. Most organizations ended up tailoring both methodologies to their specific needs, based on their unique circumstances and requirements.

Drawing a parallel between Data Warehousing and Data Mesh/Data Fabric, and considering the Data Warehousing experience of the last twenty years, I feel I can make a bold prediction: the same will happen in the current debate.

Organizations will adopt hybrid solutions. I can foresee scenarios where an organization striving to adopt a Data Fabric solution will have to settle for less-than-ideal outcomes, with the initial centralized and homogeneous data architecture coexisting with more tactical initiatives, launched by those departments that could not wait for the company-wide data architecture “nirvana” to be completed.

And I can also foresee scenarios where tactical and compartmentalized initiatives will have to be merged into a “bigger picture” architecture further down the track, to avoid the impedance mismatch of similar, but not identical, data residing in different company “silos.”

Conclusion

In the end, the two approaches will coexist and will contribute in equal measure to the success of companies on their journeys through Data/Big Data and Analytics.