As the corporate world digitizes, companies collect ever larger amounts of data. They often rely on centralized solutions for this, which can hurt both the agility and the quality of the data. The data mesh concept offers an alternative. Every device and system, from the online shop to social media to IoT devices, generates new data that needs to be evaluated, analyzed, and used. Even within the modern, networked factory, the wide variety of data and information levels makes transparency and automated processes increasingly important for reducing costs and increasing productivity. Only those who can react quickly to new developments will be able to avoid significant disruptions and hold their own in the market in the long term.
Data warehouse and data lake technologies have long been considered the solution for storing a company’s data and making it available to all areas. Many companies implemented corresponding centralized platforms, hoping to avoid data silos and get the greatest possible value from their available data. However, this hope often turned out to be misplaced. The cause is usually the central data team, which stands between the many data sources on the one hand and the constantly growing number of data consumers on the other. Although this team is meant to integrate all data streams into a central repository, it often lacks sufficient knowledge of the domain-specific content and the required structure of the data.
The data mesh concept offers a solution. It takes proven software engineering principles and decentralized approaches from modern software development, namely domain-driven design (DDD), product-centric thinking, self-service infrastructure platforms, and federated governance, and applies them to data. One of the key benefits of the data mesh approach is its specialized, domain-centric teams.
These teams know the data and data sources in their respective domains from both a technical and a business perspective, so they can integrate new data or structural changes more quickly and make them available across the company. Since the individual teams act independently, the typical bottlenecks of a central data organization are avoided. As a result, data agility in the company increases, and data users are supplied with relevant data more quickly. This, in turn, increases business agility and shortens the time to value.
Zhamak Dehghani’s definition of data mesh comprises four principles: domain ownership, data as a product, the self-service data platform, and federated governance.
Domain ownership means that data management is divided along business domains, each of which has a dedicated data team responsible for it. This way, the team can respond better to requirements and integrate new data sources more quickly. Since the team has the necessary domain expertise, it can also react more quickly to changes, for example when external data is purchased. Following the data-as-a-product principle, domain teams create data products that other teams can easily access.
To do this, the data must be well documented, easy to find, easy to access, of high quality, and based on user requirements. In short: The team develops and maintains its data product like a real product. The self-service platform is the underlying IT platform, which should provide uniform, domain-independent tools with which the domain teams can independently create, maintain, and offer their data products. These tools should be easy to use and require little or no highly specialized knowledge. This is the only way to reduce the costs and complexity of the processes used to create the data products.
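To make the data-as-a-product requirements more concrete (documented, easy to find, easy to access), the following is a minimal Python sketch of a data product descriptor and a toy catalog. All names here (`DataProduct`, `Catalog`, `publish`, `search`, the example URL and owner address) are hypothetical and chosen only for illustration; a real self-service platform would provide its own equivalents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor a domain team publishes for one data product."""
    name: str
    domain: str               # owning business domain
    owner: str                # contact for the responsible domain team
    description: str          # documentation for consumers
    access_url: str           # where consumers can read the data
    schema: dict[str, str]    # column name -> type
    tags: tuple[str, ...] = ()


class Catalog:
    """Toy stand-in for the discovery part of a self-service platform."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        # Products are addressed as "<domain>.<name>".
        self._products[f"{product.domain}.{product.name}"] = product

    def search(self, term: str) -> list[str]:
        # Match the term against the description text or the exact tags.
        term = term.lower()
        return sorted(
            key for key, p in self._products.items()
            if term in p.description.lower() or term in p.tags
        )


catalog = Catalog()
catalog.publish(DataProduct(
    name="orders",
    domain="sales",
    owner="sales-data-team@example.com",
    description="Confirmed customer orders, one row per order.",
    access_url="https://data.example.com/sales/orders",
    schema={"order_id": "string", "amount_eur": "decimal"},
    tags=("orders", "revenue"),
))
found = catalog.search("revenue")
```

The point of the sketch is that ownership, documentation, and access information travel with the data product itself, so a consuming team can find and use it without asking a central data team.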
Federated governance, finally, must guarantee the quality, interchangeability, and integrability, i.e., the interoperability, of the data products. It ensures that all domains comply with uniform quality standards and that their data products can be used across the board. The four principles must complement each other. Data-as-a-product and federated governance are crucial to preventing domains from becoming data silos. Federated governance also ensures the quality of the data products and defines a uniform set of rules within which the domains can act autonomously but not uncontrolled. The self-service data platform, in turn, must be functionally powerful and easy to use so that the domains can produce their data products independently and cost-effectively.
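One way to picture a uniform set of rules within which domains act autonomously is a shared list of automated checks that every data product must pass, regardless of which domain built it. The Python sketch below assumes this interpretation; the rule names, the thresholds, and the `DataProduct` fields are hypothetical examples, not part of the data mesh definition.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DataProduct:
    """Hypothetical, simplified data product metadata."""
    name: str
    domain: str
    owner: str
    description: str
    schema: dict[str, str]  # column name -> type


Rule = Callable[[DataProduct], bool]

# Company-wide rules agreed on jointly by the domains (illustrative only).
GLOBAL_RULES: dict[str, Rule] = {
    "has_owner": lambda p: bool(p.owner.strip()),
    "is_documented": lambda p: len(p.description.strip()) >= 20,
    "lowercase_columns": lambda p: all(
        col == col.lower() and " " not in col for col in p.schema
    ),
}


def violations(product: DataProduct) -> list[str]:
    """Return the names of all global rules the product does not satisfy."""
    return [name for name, rule in GLOBAL_RULES.items() if not rule(product)]


compliant = DataProduct(
    name="orders",
    domain="sales",
    owner="sales-data-team",
    description="Confirmed customer orders, one row per order.",
    schema={"order_id": "string", "amount_eur": "decimal"},
)
non_compliant = DataProduct(
    name="stock",
    domain="logistics",
    owner="",
    description="Stock.",
    schema={"Item ID": "string"},
)

ok_result = violations(compliant)
bad_result = violations(non_compliant)
```

Each domain remains free to decide how it builds its product; the shared rules only constrain the outcome, which is what keeps independently produced data products interoperable.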