Our client is a leading Asian media house that operates more than 60 channels in eight languages and is part of a major global media conglomerate.
The client wanted a comprehensive understanding of user viewing behavior so it could make smarter programming and advertising decisions. Our objective was to help the client set up a data lake on AWS, using Amazon S3 for storage and Amazon EMR for processing. The data lake would ingest internal and external third-party data from structured and unstructured sources and serve as the foundation for advanced analytics solutions.
Key challenges included:
- Establishing governance processes, data vetting, and legal compliance for the data
- Handling data inconsistencies across multiple sources and accommodating new data types
- Building a flexible system that allows new data sources to be added in the future
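The case study does not say how source inconsistencies and future sources were handled. One common pattern is a registry of per-source normalizers that map each feed onto a shared schema, so onboarding a new source means registering one function. The minimal Python sketch below assumes that pattern. The source names (`set_top_box`, `ott_app`), their field names, and the `ViewingEvent` schema are hypothetical and do not come from the engagement.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


# Hypothetical common schema that every source is normalized into.
@dataclass(frozen=True)
class ViewingEvent:
    user_id: str
    channel_id: str
    start_utc: datetime
    duration_sec: int


# Registry of per-source normalizers (hypothetical source names).
# Onboarding a new feed means registering one function; nothing downstream changes.
NORMALIZERS: dict[str, Callable[[dict], ViewingEvent]] = {}


def register(source: str):
    """Decorator that records a normalizer under the given source name."""
    def decorator(fn: Callable[[dict], ViewingEvent]) -> Callable[[dict], ViewingEvent]:
        NORMALIZERS[source] = fn
        return fn
    return decorator


@register("set_top_box")  # assumed format: epoch-second timestamps, duration in seconds
def _from_set_top_box(raw: dict) -> ViewingEvent:
    return ViewingEvent(
        user_id=str(raw["subscriber"]),
        channel_id=raw["ch"].strip().upper(),
        start_utc=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
        duration_sec=int(raw["secs"]),
    )


@register("ott_app")  # assumed format: ISO-8601 timestamps with offset, duration in minutes
def _from_ott_app(raw: dict) -> ViewingEvent:
    return ViewingEvent(
        user_id=str(raw["userId"]),
        channel_id=raw["channel"].strip().upper(),
        # Convert the local offset to UTC so all sources share one time basis.
        start_utc=datetime.fromisoformat(raw["startTime"]).astimezone(timezone.utc),
        duration_sec=round(float(raw["minutes"]) * 60),
    )


def normalize(source: str, raw: dict) -> ViewingEvent:
    """Route a raw record to its source's normalizer; reject unknown sources."""
    try:
        normalizer = NORMALIZERS[source]
    except KeyError:
        raise ValueError(f"unregistered source: {source!r}") from None
    return normalizer(raw)


# The same viewing session, reported differently by two feeds.
a = normalize("set_top_box", {"subscriber": 42, "ch": "news1", "ts": 1704103200, "secs": 1800})
b = normalize("ott_app", {"userId": "42", "channel": "NEWS1 ",
                          "startTime": "2024-01-01T15:30:00+05:30", "minutes": "30"})
print(a == b)  # True
```

Because unknown sources raise an error instead of passing through silently, a new feed cannot enter the lake until someone has written and reviewed its mapping. That behavior also supports the governance and vetting goals listed above.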
Key elements of our solution:
- Integrated data arriving at varying frequencies from multiple sources, landing it in Amazon S3 and processing it on Amazon EMR. The datasets included user viewership data and advertisement data
- Built a Hadoop-based central data repository with defined parameters for data extraction, management, and usage
- Established tools and processes to assess the quality of ingested data
- Delivered a data processing platform that supports analysis for several business use cases
- Integrated the data lake with BI tools and third-party data marts and created metadata for a holistic data view
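The case study does not specify which quality checks were applied. The minimal sketch below shows one possible batch-level quality gate. It checks each batch for completeness, value ranges, and duplicates, then decides whether to publish or quarantine the batch. It assumes ingested records arrive as Python dictionaries with the hypothetical fields used in the code. The rules, the one-day duration cap, and the 99% pass-rate threshold are illustrative assumptions. At the volumes described here, logic like this would more plausibly run as a distributed job on the EMR cluster. Plain Python keeps the sketch self-contained.

```python
from dataclasses import dataclass, field

# Hypothetical quality rules; real rules would come from the governance process.
REQUIRED_FIELDS = ("user_id", "channel_id", "start_utc", "duration_sec")
MAX_DURATION_SEC = 24 * 60 * 60  # assumption: no single viewing event exceeds one day
MIN_PASS_RATE = 0.99             # assumption: batches below this are quarantined


@dataclass
class QualityReport:
    total: int = 0
    missing_fields: int = 0
    out_of_range: int = 0
    duplicates: int = 0
    rejected_rows: list[int] = field(default_factory=list)

    @property
    def pass_rate(self) -> float:
        if self.total == 0:
            return 1.0
        return 1 - len(self.rejected_rows) / self.total


def assess_batch(records: list[dict]) -> QualityReport:
    """Run completeness, range, and duplicate checks over one ingested batch."""
    report = QualityReport(total=len(records))
    seen_keys = set()
    for i, rec in enumerate(records):
        rejected = False
        if any(rec.get(name) in (None, "") for name in REQUIRED_FIELDS):
            report.missing_fields += 1
            rejected = True
        else:
            duration = rec["duration_sec"]
            if not isinstance(duration, int) or not 0 < duration <= MAX_DURATION_SEC:
                report.out_of_range += 1
                rejected = True
            # Treat the same user, channel, and start time as one event.
            key = (rec["user_id"], rec["channel_id"], rec["start_utc"])
            if key in seen_keys:
                report.duplicates += 1
                rejected = True
            seen_keys.add(key)
        if rejected:
            report.rejected_rows.append(i)
    return report


def accept_batch(report: QualityReport) -> bool:
    """Gate: publish the batch only if enough rows pass."""
    return report.pass_rate >= MIN_PASS_RATE


batch = [
    {"user_id": "u1", "channel_id": "NEWS1", "start_utc": "2024-01-01T10:00:00Z", "duration_sec": 1800},
    {"user_id": "u1", "channel_id": "NEWS1", "start_utc": "2024-01-01T10:00:00Z", "duration_sec": 1800},  # duplicate
    {"user_id": "u2", "channel_id": "", "start_utc": "2024-01-01T11:00:00Z", "duration_sec": 600},        # missing channel
    {"user_id": "u3", "channel_id": "MOVIES", "start_utc": "2024-01-01T12:00:00Z", "duration_sec": -5},  # bad duration
]
report = assess_batch(batch)
print(report.rejected_rows, accept_batch(report))  # [1, 2, 3] False
```

The gate keeps per-rule counters as well as a pass/fail decision. A rising count for one rule, such as duplicates from a single feed, points to the source that needs attention.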
Key outcomes:
- Designed a cost-effective data lake on Amazon S3 and Amazon EMR that processes 100 GB of data daily from feeds refreshed at frequencies ranging from twice daily to monthly
- Executed data lake analytics projects across multiple themes, delivering insights to a large number of users across the organization