Equancy – consulting in digital and data transformation

Published on 17 Jun 2022

What is a data mesh?

The data mesh, literally a "net of data", is an architecture for processing big data whose defining principle is decentralisation.

Conceptualised in 2019 by Zhamak Dehghani of Thoughtworks, the data mesh proposes an architecture in which each data domain of the company (e.g. customers or products) is managed independently by the team responsible for it (a domain-oriented approach). Each domain offers its data as self-service via APIs, as if it were a ready-to-use product. This should save time (reuse), improve agility (use as a service) and save space (no duplication) when the data is processed and analysed. It can be summarised as "a change in self-service data architecture, treating data as a product".

Image source: Towards Data Science

What is the difference between a data lake, a data warehouse and a data mesh?

The data lake is a tool for storing data and making it available in a form that is as fresh and relevant as possible. It is based on the "schema-on-read" principle: the data is first loaded as it is, and interpreted only when it is read, according to the use that will be made of it.

The data warehouse is also a storage tool, but it is based on the "schema-on-write" principle: the data is structured and organised as soon as it is loaded, according to need (a process known as ETL: Extract, Transform, Load).
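The contrast between interpreting data when it is read (data lake) and structuring it when it is loaded (data warehouse) can be illustrated with a minimal sketch. The sample events, the field names and the two functions `read_with_schema` and `load_into_warehouse` are hypothetical and chosen only for illustration; real lakes and warehouses rely on dedicated storage and query engines.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (data lake style).
RAW_EVENTS = [
    '{"customer_id": "42", "amount": "19.90", "country": "FR"}',
    '{"customer_id": "7", "amount": "5.00"}',
]


def read_with_schema(raw_lines, schema):
    """Interpret raw data at read time: each consumer brings its own schema."""
    for line in raw_lines:
        record = json.loads(line)
        # Fields absent from a record become None instead of blocking the read.
        yield {name: cast(record[name]) if name in record else None
               for name, cast in schema.items()}


def load_into_warehouse(raw_lines, schema):
    """Structure data at load time: records that do not fit are rejected."""
    table = []
    for line in raw_lines:
        record = json.loads(line)
        missing = [name for name in schema if name not in record]
        if missing:
            raise ValueError(f"rejected at load time, missing fields: {missing}")
        table.append({name: cast(record[name]) for name, cast in schema.items()})
    return table


# Two consumers interpret the same raw data differently when they read it.
billing_view = list(read_with_schema(RAW_EVENTS, {"customer_id": int, "amount": float}))
geo_view = list(read_with_schema(RAW_EVENTS, {"customer_id": int, "country": str}))

# The warehouse fixes one structure up front, so every record must comply.
sales_table = load_into_warehouse(RAW_EVENTS, {"customer_id": int, "amount": float})
```

In this sketch the lake accepts the second event even though it has no country, and the gap only appears when a consumer asks for that field. Loading the same events into a warehouse table that requires `country` would fail immediately.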

The data mesh is an architecture rather than a storage tool: it allows bridges to be created between the various data stores. The engineering teams move and transform the data centralised in those stores to obtain the desired result. It simplifies collaboration and self-service, thus complementing the data lake and the data warehouse.

What are the advantages of a data mesh?

The data mesh is supposed to offer a company several advantages when exploiting its data:

Data as a product: the data mesh consists of a meshed infrastructure of services, each of which consumes data as input and returns it cleaned, structured and offered as a ready-to-use product. These products can in turn be consumed by other services in the mesh.

Decentralised domain ownership: each division of the company owns its data because it is the one most familiar with it. It is responsible for collecting the input data and carrying out the necessary transformations to turn it into a product.

Self-serve data platform: the data is made easily and quickly accessible to everyone who needs it. This is achieved by sharing a common self-service infrastructure (the only centralised point in a data mesh).

Federated computational governance: two levels of governance are defined. The first is intra-domain governance (by domain or business unit), which covers data quality, provenance, security and compliance. The second is inter-domain governance, which covers global compliance risks (GDPR), data format standardisation, system security and the data life cycle.
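The four points above can be sketched together in a few lines of code. This is a minimal illustration under stated assumptions, not a reference implementation: the `DataProduct` and `Platform` classes, the team names, the sample records and the rule forbidding an `email` field are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class DataProduct:
    """A dataset published by one domain team, with its owner and contract."""
    name: str
    owner: str                           # domain team accountable for the data
    fields: List[str]                    # published columns (the contract)
    produce: Callable[[], List[Record]]  # how the domain builds the product


@dataclass
class Platform:
    """Shared self-service catalogue: the only centralised piece."""
    forbidden_fields: set = field(default_factory=lambda: {"email"})  # assumed global rule
    catalogue: Dict[str, DataProduct] = field(default_factory=dict)

    def publish(self, product: DataProduct) -> None:
        # Inter-domain governance: one global rule checked for every product.
        leaked = self.forbidden_fields & set(product.fields)
        if leaked:
            raise ValueError(f"{product.name} exposes restricted fields: {sorted(leaked)}")
        self.catalogue[product.name] = product

    def read(self, name: str) -> List[Record]:
        product = self.catalogue[name]
        # Serve only the fields declared in the product's contract.
        return [{f: row[f] for f in product.fields} for row in product.produce()]


platform = Platform()


def customers() -> List[Record]:
    # The customers domain cleans its own raw data before publishing it.
    raw = [{"id": 1, "name": " Ada ", "email": "ada@example.com"}]
    return [{"id": r["id"], "name": r["name"].strip()} for r in raw]


def orders() -> List[Record]:
    return [{"customer_id": 1, "total": 30.0}, {"customer_id": 1, "total": 12.5}]


platform.publish(DataProduct("customers", "crm-team", ["id", "name"], customers))
platform.publish(DataProduct("orders", "sales-team", ["customer_id", "total"], orders))


def revenue_per_customer() -> List[Record]:
    # A new product built by consuming two existing products via the platform.
    names = {c["id"]: c["name"] for c in platform.read("customers")}
    totals: Dict[object, float] = {}
    for o in platform.read("orders"):
        totals[o["customer_id"]] = totals.get(o["customer_id"], 0.0) + o["total"]
    return [{"name": names[cid], "revenue": total} for cid, total in totals.items()]


platform.publish(DataProduct("revenue_per_customer", "finance-team",
                             ["name", "revenue"], revenue_per_customer))
```

Each domain keeps control of how its product is built, while the shared platform handles discovery and applies the global rule at publication time. A product that declared `email` among its fields would be refused by `publish`.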

The data mesh aims to offer maximum flexibility in the use of data: data that is validated by its functional owner, organised and ready to use, and easily combined into new products spanning several domains, which in turn can be easily distributed.
This vision of data architecture is still largely conceptual, and it remains to be seen whether it holds up in the technical implementations that companies with a strong data culture have undertaken.

So, is the data mesh a fad or a truly revolutionary concept?