Service Oriented Integration is the preferred option for integration because it supports natural interoperability of business capabilities. However, some scenarios, such as bulk data movement, call for batch-mode integration techniques like ETL. For those scenarios, I recommend considering the following fundamental, industry-standard data integration principles while you architect, design, and implement the solution.
- Data Quality First
- As a first step in a data integration initiative, analyze the data quality of the source system to understand its quality level. This covers aspects such as completeness, consistency, accuracy, precision, accessibility, and timeliness of the data. It helps in two ways:
- It supports devising approaches for improving the data quality of the source systems.
- It aids in defining a data integration architecture that takes the inherent data quality issues into account.
- Write Once-Read Many
- There may be scenarios where data is extracted from a source system and then fed to multiple target systems after validation and transformation. If several target systems depend on the same source system, each may request its own extraction component at the source. That increases complexity at the source system and escalates development and maintenance costs. To avoid this trap, adopt the architecture principle of Write Once-Read Many: build only one extraction component per source type.
- Extract Everything
- There is a tendency to grab only the data required for current needs. However, consumers of the source system evolve, and so do their data needs. It is strongly recommended to analyze the data entities managed by the source system and consider both the current and future needs of current and potential consuming systems. Based on that analysis, extract the entire data set from the source system that may be required to meet those needs.
- Target Based Load Process
- This principle promotes designing the load process by considering the target system first and then the subject areas within that target system.
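To make the Data Quality First principle a little more concrete, here is a minimal Python sketch of a source-side profiling step. It only looks at two of the aspects mentioned above, completeness and consistency of a key, and everything in it (the function names, the `customer_id`, `email`, and `country` fields, and the sample rows) is a hypothetical illustration rather than part of any particular tool.

```python
from collections import Counter


def profile_completeness(rows, required_fields):
    """Return, per field, the fraction of rows holding a non-empty value."""
    total = len(rows)
    result = {}
    for field in required_fields:
        filled = sum(1 for row in rows if row.get(field) not in (None, ""))
        result[field] = filled / total if total else 0.0
    return result


def find_duplicate_keys(rows, key_field):
    """Return key values that occur more than once (a simple consistency check)."""
    counts = Counter(row.get(key_field) for row in rows)
    return sorted(k for k, n in counts.items() if n > 1 and k is not None)


# Hypothetical sample extracted from a source system.
sample_rows = [
    {"customer_id": 1, "email": "a@example.com", "country": "IN"},
    {"customer_id": 2, "email": "", "country": "US"},
    {"customer_id": 2, "email": "c@example.com", "country": None},
    {"customer_id": 3, "email": "d@example.com", "country": "DE"},
]

completeness = profile_completeness(sample_rows, ["customer_id", "email", "country"])
duplicates = find_duplicate_keys(sample_rows, "customer_id")
print(completeness)
print(duplicates)
```

A report like this, run before any design work, gives a rough picture of which quality issues the integration architecture will have to absorb and which should be fixed at the source.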
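The Write Once-Read Many, Extract Everything, and Target Based Load principles can be sketched together in a few lines of Python. This is only an illustration under assumptions: the staging area is an in-memory CSV string, and the names `extract_once`, `load_billing`, and `load_shipping`, along with the order fields, are made up for the example.

```python
import csv
import io


def extract_once(source_rows):
    """Single extraction component: stage every field of every source row as CSV."""
    fields = sorted({key for row in source_rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(source_rows)
    return buffer.getvalue()


def read_staged(staged):
    """Parse the staged CSV text back into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(staged)))


def load_billing(staged):
    """Load process designed around a hypothetical billing target."""
    return [
        {"order_id": r["order_id"], "amount": float(r["amount"])}
        for r in read_staged(staged)
    ]


def load_shipping(staged):
    """Load process designed around a hypothetical shipping target."""
    return [
        {"order_id": r["order_id"], "city": r["city"]}
        for r in read_staged(staged)
    ]


# Hypothetical source data; "channel" has no consumer yet but is still extracted.
source_rows = [
    {"order_id": "A1", "amount": "19.99", "city": "Pune", "channel": "web"},
    {"order_id": "A2", "amount": "5.00", "city": "Oslo", "channel": "store"},
]

staged = extract_once(source_rows)   # written once
billing = load_billing(staged)       # read by target 1
shipping = load_shipping(staged)     # read by target 2
print(billing)
print(shipping)
```

The source system is touched by one extraction component only, each target gets its own load process shaped by its needs, and a future consumer of `channel` could be added without going back to the source.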
One recommended reading in this space: Data Integration Blueprint and Modeling: Techniques for a Scalable and Sustainable Architecture.