Case Study

Data Lake using AWS S3 and AWS EMR for Cost-Effective Storage and 360° User Behavior Insights for a Large Media House

Business Objective

Our client is a leading Asian media house that operates more than 60 channels in 8 languages and is part of a leading global media conglomerate.

The client wanted a comprehensive, 360° view of user behavior to support smarter programming and advertising decisions. Our objective was to help the client set up a data lake on AWS services such as AWS S3 and AWS EMR, process internal and external third-party data from structured and unstructured sources, and deliver advanced analytics solutions on top of the data lake.

Challenges


  • Establishing governance processes, data vetting, and legal compliance for the data
  • Handling data inconsistencies across multiple sources and accommodating new data types
  • Building a flexible system that allows new data sources to be added in the future

Solution Methodology 

  • Integrated data arriving at varying frequencies from multiple sources, landing it in AWS S3 and then processing it on AWS EMR. The dataset included user viewership data and advertisement data
  • Built a Hadoop-based central data repository with parameters defined for data extraction, management, and usage
  • Established tools and processes to assess the quality of the data ingested
  • Built a data processing platform that supports analysis of the data for several business use cases
  • Integrated the data lake with BI tools and third-party data marts and created metadata for a holistic data view
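
As an illustration of the data quality step above, here is a minimal Python sketch of the kind of check that could be run on ingested records. It is not the client's actual tooling: the schema, the field names (`user_id`, `channel`, `watch_seconds`), and the `assess_quality` helper are all hypothetical and chosen only for the example. It counts records with missing fields, wrong types, or exact duplicates.

```python
from dataclasses import dataclass, field

# Hypothetical schema for viewership records: field name -> expected type.
# These names are assumptions for illustration, not the client's data model.
VIEWERSHIP_SCHEMA = {
    "user_id": str,
    "channel": str,
    "watch_seconds": int,
}


@dataclass
class QualityReport:
    total: int = 0
    valid: int = 0
    issues: dict = field(default_factory=dict)

    def add_issue(self, kind):
        # Tally one occurrence of the given issue kind.
        self.issues[kind] = self.issues.get(kind, 0) + 1


def assess_quality(records, schema=VIEWERSHIP_SCHEMA):
    """Count missing fields, wrong types, and exact duplicate records."""
    report = QualityReport()
    seen = set()
    for record in records:
        report.total += 1
        problems = []
        for name, expected in schema.items():
            value = record.get(name)
            if value is None:
                problems.append(f"missing:{name}")
            elif not isinstance(value, expected):
                problems.append(f"wrong_type:{name}")
        if not problems:
            # Only well-formed records are compared for duplication.
            key = tuple(record[name] for name in schema)
            if key in seen:
                problems.append("duplicate")
            else:
                seen.add(key)
        if problems:
            for problem in problems:
                report.add_issue(problem)
        else:
            report.valid += 1
    return report
```

Calling `assess_quality(batch)` on a list of dictionaries returns a `QualityReport` whose `issues` counts could feed a dashboard or decide whether a batch is promoted further into the lake. In a real deployment the same checks would more likely run as a distributed job on the cluster rather than in a single Python process.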

Business Impact


  • Designed a cost-effective data lake using AWS S3 and AWS EMR that processes 100 GB of data daily, with sources arriving at varying frequencies (twice daily to monthly)
  • Executed data lake analytics projects across multiple themes, providing valuable insights to a large number of users across the organization

©2023 Tiger Analytics. All rights reserved.
