Our client is a US-based Fortune 500 networking & telecom major. Their objective was to identify and proactively invest in upcoming and popular technologies. We proposed an innovative approach to address this need by analyzing public data from open source projects to guide their investment decisions.
- More than 20 million user accounts, 30 million open source projects, and 150,000 articles from popular tech blogs and websites needed to be analyzed.
- Data had to be extracted and aggregated from multiple public sources such as GitHub, Stack Overflow, and various tech publications and blogs through customized APIs and RSS readers.
- Structured and unstructured data had to be fused together.
- Employed RSS-, API-, and BigQuery-based crawling for data extraction from the different sources.
- The raw data included user attributes, project attributes, article feeds, public opinions about technologies, commits, forks, bug reports, fixes, etc.
- Developed a predictive model to identify emerging technologies from multiple random samples. We identified 250 technologies as emerging, using NLP/text analytics to correlate the model results with views expressed online by key opinion leaders.
- Developed an interactive web dashboard to showcase emerging technologies and the relationships among them.
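
To illustrate the idea of fusing structured project activity with unstructured article text, here is a minimal sketch in Python. It is not the model built for the client: the `RepoStats` fields, the growth formula, the `mention_weight` value, and the technology names are all hypothetical and chosen only to show how the two kinds of signal might be combined into one ranking.

```python
import re
from dataclasses import dataclass


@dataclass
class RepoStats:
    """Hypothetical structured signal for one technology."""
    technology: str
    commits_prev: int  # commits in an earlier period
    commits_curr: int  # commits in a later period


def growth_rate(prev: int, curr: int) -> float:
    """Relative commit growth; the +1 avoids division by zero."""
    return (curr - prev) / (prev + 1)


def count_mentions(articles: list[str], technology: str) -> int:
    """Count whole-word, case-insensitive mentions in article text.

    Assumes plain alphanumeric names; names such as "C++" would need
    a different matching rule than word boundaries.
    """
    pattern = re.compile(r"\b" + re.escape(technology) + r"\b", re.IGNORECASE)
    return sum(len(pattern.findall(text)) for text in articles)


def rank_technologies(
    stats: list[RepoStats],
    articles: list[str],
    mention_weight: float = 0.1,  # assumed weight, not a tuned value
) -> list[tuple[str, float]]:
    """Score = commit growth + weighted mention count, highest first."""
    scores = {
        s.technology: growth_rate(s.commits_prev, s.commits_curr)
        + mention_weight * count_mentions(articles, s.technology)
        for s in stats
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample_stats = [
        RepoStats("alpha", commits_prev=99, commits_curr=199),
        RepoStats("beta", commits_prev=99, commits_curr=99),
    ]
    sample_articles = [
        "Alpha is gaining traction. Many teams now adopt alpha.",
        "beta remains stable.",
    ]
    for name, score in rank_technologies(sample_stats, sample_articles):
        print(f"{name}: {score:.2f}")
```

A real pipeline would replace the simple growth formula with a trained predictive model and the keyword count with proper NLP, but the overall shape (one score per technology built from both structured and unstructured inputs) stays the same.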