The client is a multinational dairy cooperative headquartered in the Netherlands. It is the largest dairy cooperative globally and one of the top five dairy companies worldwide. Its operations span 33 countries, with a workforce of over 21,000 employees.
The client aimed to integrate specialized nutrition data from SAP into its Azure Data & Analytics (D&A) platform without disrupting existing systems. The goal was to enable centralized access, self-service analytics, and faster insights across business domains such as finance, supply chain, and quality control.
The team conducted high-level data exploration on SAP ECC objects to assess the availability and quality of specialized nutrition data. Based on this assessment, extractors based on ODP (Operational Data Provisioning), SLT (SAP Landscape Transformation), and SAP Data Intelligence were built to provision the data into Azure Data Lake Storage (ADLS).
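As an illustration, the sketch below shows how one extractor output file could be landed in the ADLS raw zone using the Azure SDK for Python. The storage account URL, container name, and path convention are placeholders, not the client's actual layout.

```python
# Minimal sketch: landing an extracted SAP ECC table in the ADLS raw zone.
# Account URL, container, and folder convention are illustrative only.
from datetime import date

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

ACCOUNT_URL = "https://<storage-account>.dfs.core.windows.net"  # placeholder
CONTAINER = "raw"                                               # hypothetical zone name

def land_extract(local_path: str, sap_object: str) -> str:
    """Upload one extractor output file into a dated raw-zone folder."""
    service = DataLakeServiceClient(ACCOUNT_URL, credential=DefaultAzureCredential())
    fs = service.get_file_system_client(CONTAINER)
    target = f"sap_ecc/{sap_object}/load_date={date.today():%Y-%m-%d}/{sap_object}.parquet"
    file_client = fs.get_file_client(target)
    with open(local_path, "rb") as data:
        file_client.upload_data(data, overwrite=True)  # idempotent re-runs
    return target
```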
Historical data from SAP was migrated into the data lake, and recurring batch schedules were set up with load-balancing considerations. A logging mechanism was then implemented to monitor extractions daily and manage dependencies with other batches.
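A minimal sketch of such a logging mechanism follows, assuming a simple JSON-lines run log; the file name, record fields, and status values are hypothetical.

```python
# Minimal sketch: a daily extraction run log plus a dependency gate.
import json
from datetime import datetime, timezone
from pathlib import Path

LOG_PATH = Path("extraction_runs.jsonl")  # hypothetical log location

def log_run(sap_object: str, status: str, rows: int = 0) -> None:
    """Append one extraction run record for daily monitoring."""
    entry = {
        "object": sap_object,
        "status": status,  # e.g. "success" / "failed"
        "rows": rows,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }
    with LOG_PATH.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def upstream_succeeded(sap_object: str, run_date: str) -> bool:
    """Gate a dependent batch on that day's successful extraction."""
    if not LOG_PATH.exists():
        return False
    for line in LOG_PATH.read_text().splitlines():
        entry = json.loads(line)
        if (entry["object"] == sap_object
                and entry["status"] == "success"
                and entry["logged_at"].startswith(run_date)):
            return True
    return False
```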
Existing Power BI datasets and reports were reviewed to understand their sources and structure. The data model was updated to include a source system dimension that could distinguish between generalized and specialized nutrition data. Source system IDs were added to all fact tables, and relationships were created with the dimension layer.
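The sketch below illustrates the idea in PySpark: a small source-system dimension plus a source_system_id stamped onto each fact row so reports can slice by data origin. The dimension values, table names, and columns are illustrative, not the actual model.

```python
# Minimal sketch: source-system dimension and fact-table stamping.
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.getOrCreate()

# Hypothetical dimension distinguishing the two data origins.
dim_source_system = spark.createDataFrame(
    [(1, "Generalized Nutrition"), (2, "Specialized Nutrition")],
    ["source_system_id", "source_system_name"],
)

# Stamp each fact row with the ID of the system it came from.
fact_sales = spark.createDataFrame(
    [("MAT-001", 120.0), ("MAT-002", 75.5)],
    ["material", "net_value"],
).withColumn("source_system_id", lit(2))  # 2 = specialized nutrition feed

# The relationship to the dimension layer then resolves the origin name.
fact_sales.join(dim_source_system, "source_system_id").show()
```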
The Power Query logic was also modified to include the new data and prevent duplication. Post-integration, report features and data accuracy were validated, and recurring refresh schedules were put in place.
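The duplication guard can be illustrated as follows, expressed in PySpark rather than the Power Query M in which the actual change was made; the business key and sample rows are hypothetical. Rows already present in the generalized feed are excluded from the specialized feed before the two are combined.

```python
# Minimal sketch: anti-join dedup before combining the two feeds.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

generalized = spark.createDataFrame(
    [("1000", "DOC-1"), ("1000", "DOC-2")], ["company_code", "document_id"])
specialized = spark.createDataFrame(
    [("1000", "DOC-2"), ("2000", "DOC-9")], ["company_code", "document_id"])

# Anti-join removes specialized rows whose business key already exists.
new_only = specialized.join(
    generalized, ["company_code", "document_id"], "left_anti")
combined = generalized.unionByName(new_only)
combined.show()  # DOC-2 appears once, not twice
```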
The team aligned with the existing platform architecture and future-state design. Metadata for the generic ingestion components was configured to ingest data into both the Raw and Curated zones. Source-to-target (STTM) mappings were defined to map source SAP objects to the business data hub. Source columns and key attributes were modified to implement intelligent deduplication for cloned company codes.
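The cloned-company-code deduplication might look roughly like the PySpark sketch below, assuming a mapping table from clone codes to their canonical code; all codes, keys, and values are made up for illustration.

```python
# Minimal sketch: normalize cloned company codes, then dedupe on the key.
from pyspark.sql import SparkSession
from pyspark.sql.functions import coalesce, col

spark = SparkSession.builder.getOrCreate()

records = spark.createDataFrame(
    [("1000", "MAT-1", 10.0), ("1000X", "MAT-1", 10.0), ("2000", "MAT-2", 5.0)],
    ["company_code", "material", "quantity"],
)
clone_map = spark.createDataFrame(
    [("1000X", "1000")], ["clone_code", "canonical_code"])

# Map clones to their canonical company code, then drop duplicates on the
# business key so cloned rows collapse to a single record.
deduped = (
    records.join(clone_map, records.company_code == clone_map.clone_code, "left")
    .withColumn("company_code", coalesce(col("canonical_code"), col("company_code")))
    .drop("clone_code", "canonical_code")
    .dropDuplicates(["company_code", "material"])
)
deduped.show()  # the 1000X clone of MAT-1 is gone
```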
Pipelines were configured and updated to load data from the curated layer through to the analytical data layer. Finally, jobs were scheduled to manage recurring loads and handle interdependencies across layers.
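To illustrate how such interdependent recurring loads can be ordered, here is a small Python sketch using a topological sort; the job names and dependency graph are hypothetical.

```python
# Minimal sketch: dependency-aware ordering of recurring load jobs.
from graphlib import TopologicalSorter  # Python 3.9+

# Each job lists the jobs it depends on: curated feeds complete before
# the analytical-layer build, which in turn precedes report refreshes.
jobs = {
    "load_curated_sales": set(),
    "load_curated_quality": set(),
    "build_analytical_model": {"load_curated_sales", "load_curated_quality"},
    "refresh_reports": {"build_analytical_model"},
}

for job in TopologicalSorter(jobs).static_order():
    print(f"running {job}")  # placeholder for the real pipeline trigger
```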