6. Data Enrichment
Data enrichment enhances existing data by adding information from external or internal sources. It can be a data consolidation task, which adds further data to an existing consolidated data set instead of building a completely new one from scratch, or a data blending task (see the next section).
Its benefit relies on the assumption that a new data set provides different information than the existing one. That information might be additional fields for existing entries or new entries. In any case, some form of data mapping and reconciliation must be applied to ensure that the enriched data set remains consistent and valid.
An example use case for data enrichment is a company that already has a database of its suppliers and its interactions with them but wants to add further information about the suppliers, such as their revenue, market capitalization, or relations to other companies.
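The supplier use case above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the field names (`supplier_id`, `revenue_musd`, `market_cap_musd`), the sample records, and the name-based matching rule are all assumptions made for this example. Real data usually needs a more robust mapping key than a normalized company name.

```python
from typing import Dict, List


def normalize_name(name: str) -> str:
    """Build a comparison key: lowercase, surrounding and repeated spaces removed."""
    return " ".join(name.lower().split())


def enrich_suppliers(suppliers: List[Dict], external: List[Dict]) -> List[Dict]:
    """Add financial fields from an external source to existing supplier records."""
    # Data mapping: index the external records by the normalized company name.
    by_key = {normalize_name(rec["name"]): rec for rec in external}

    enriched = []
    for supplier in suppliers:
        match = by_key.get(normalize_name(supplier["name"]))
        row = dict(supplier)  # copy, so the original record stays unchanged
        # Reconciliation: keep unmatched suppliers and mark the gap explicitly
        # with None instead of dropping the entry.
        row["revenue_musd"] = match["revenue_musd"] if match else None
        row["market_cap_musd"] = match["market_cap_musd"] if match else None
        enriched.append(row)
    return enriched


# Hypothetical existing supplier database (made-up sample data).
suppliers = [
    {"supplier_id": 1, "name": "Acme Corp"},
    {"supplier_id": 2, "name": "Globex  GmbH"},
    {"supplier_id": 3, "name": "Initech"},
]

# Hypothetical external source with financial information (made-up sample data).
external = [
    {"name": "acme corp", "revenue_musd": 120.0, "market_cap_musd": 900.0},
    {"name": "Globex GmbH", "revenue_musd": 45.5, "market_cap_musd": 310.0},
]

enriched = enrich_suppliers(suppliers, external)
for row in enriched:
    print(row)
```

Every existing supplier is kept, and a supplier without a match in the external source receives empty values. That keeps the enriched data set consistent with the original one and makes missing information visible.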
7. Data Blending
In contrast to data consolidation, data blending is more of an “on the fly” way of integrating data, meaning the combined data is not necessarily consolidated into a single database but used for ad hoc analysis. Not storing the merged data set in a database makes sense for exploratory analysis or for analysis that happens so infrequently that storing a consolidated view costs more than blending the data at the time of use. It also gives the analyst more flexibility in deciding which data to use and how, which can be an advantage for exploration but also a source of errors.
A typical example would be blending data from an Excel sheet with data from a SQL database in an analysis tool. This example illustrates the enrichment of an existing consolidated data source (SQL database) with additional information (Excel sheet), a common use case for data blending.
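As a rough sketch of this example, the snippet below blends rows from a SQL database with rows from a spreadsheet at analysis time, without writing the combined result back to any database. To stay self-contained, it uses an in-memory SQLite database and CSV text as a stand-in for the Excel sheet. The table, the columns, and the values are assumptions made purely for illustration.

```python
import csv
import io
import sqlite3

# Stand-in for the consolidated SQL database (in-memory, made-up sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (supplier_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 100.0), (1, 250.0), (2, 80.0)],
)

# Stand-in for the Excel sheet: the same kind of rows, provided as CSV text.
sheet = io.StringIO("supplier_id,region\n1,EU\n2,US\n")
region_by_supplier = {
    int(row["supplier_id"]): row["region"] for row in csv.DictReader(sheet)
}

# Blend on the fly: look up each order's region while aggregating.
# The blended view exists only in memory for this one analysis.
totals_by_region = {}
for supplier_id, amount in conn.execute("SELECT supplier_id, amount FROM orders"):
    region = region_by_supplier.get(supplier_id, "unknown")
    totals_by_region[region] = totals_by_region.get(region, 0.0) + amount

conn.close()
print(totals_by_region)
```

The fallback to `"unknown"` for suppliers missing from the sheet is one possible choice. It shows where the flexibility of blending can introduce errors, because nothing enforces that the two sources match.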
Conclusion
While this article disentangles the different processes and terminologies involved in data integration, most of these processes overlap and partially depend on one another. For example, data reconciliation cannot happen without data mapping and merging, and the terms are often used interchangeably. Enriching a data set is impossible without mapping, merging, and reconciliation. The aim of all these processes is usually to create a consolidated data set. Thus, data consolidation is often used as a single term to capture the whole process. Data consolidation and merging are both about combining data into a single data set, but they differ in focus: consolidation is about shared data storage, whereas merging is about combining individual data entries. Data cleansing can happen separately or as part of the other processes. Data blending can be based on consolidated or separate data sets. The seven steps provided in this article serve as a rule of thumb for which processes need to be applied for data integration and in which order.
The need for data reconciliation, consolidation (or whatever other term applies to a specific use case) appears to be clear and commonly accepted. A more significant challenge is how to get there. The data architecture or pipeline may vary; there might even be a ready-made software product that handles those questions. However, it is essential to note that combining data ultimately remains a data problem that requires customized solutions based on the actual data and business use case. CID can support you in finding a customized solution for your data problem.
Author © 2024: Lilli Kaufhold – www.linkedin.com/in/lilli-kaufhold-467659110/