A Decade in Data & Analytics: What Changed and What Remains Unchanged

Author: Santhanakrishnan Ramabadran

Having been associated with the analytics industry for nearly two decades, I consider myself extremely lucky. It’s like being on one long, endlessly evolving wave that a surfer takes great pleasure and pride in riding. Developments over the past decade, coinciding with Tiger Analytics’ 10 years in business, further that feeling and indicate an even more exciting future. I’ve made an attempt to share a slice of my excitement here, capturing a few key changes as well as chronicling elements that have remained constant.

The Variables

1. Advanced Analytics initiatives move on from being tentative experiments:

Even a decade ago, except at a handful of global corporations and odd(ball) baseball teams who’d “compete on analytics”, predictive analytics efforts were gingerly attempts called PoCs. That has changed significantly. Analytics, in its evolved DS/ML/AI avatar, is one of the key pillars of the now ubiquitous digital transformation programs.

Formal structures – in the form of a Chief Analytics Office, a CDAO (Data & Analytics), or a sufficiently influential arm within Strategy and/or Digital offices – have been established by many Fortune 500s & beyond to keep analytics in the line-of-sight of senior executives. This has helped elevate analytics from experimental efforts to build models (which, if successful, get handed over to IT for production deployment and use by businesses, or are silently retired if not) to high-stakes business transformation initiatives.

Significant accountability is vested with the data & analytics leaders, along with access to budgets and resources, to bring together:

– The right sources of data into the lakes, and the right blend of data out of them,
– Robust analytic methods/models, and
– Applications (webapp/mobile app) for easy consumption of analytics by business users.

I usually refer to these as upstream, midstream, and downstream capabilities.

The ripple effect of our client organizations increasingly thinking of analytics as “end-to-end” drives how we shape our organization – Tiger Analytics.

It has given us the conviction to double-down on investments we’d made years ago in critical mass end-to-end capabilities (specifically in data engineering, application engineering & ML engineering areas).

2. Data transformation is now better aligned with analytics

Enterprise Data Warehouses (EDWs) of the past – boxes located within data centers, whirring to the flow of organizational data streams – used to be a very costly and rationed resource. Therefore, how much of the data from source systems (ERPs etc.) would ultimately make it to the EDW sinks – via the staging layers and through complex ETL processes that were a burden on IT-Data Warehousing teams – was a key determinant of the extent of analytics possible. In such constrained situations, data required for operational reporting took priority for all but the few organizations that could afford to go beyond it. Scaling enterprise data warehouse capacities also had significant lead times.

Even incremental investments in the form of analytic data marts had limitations. They derived the bulk of their data from EDWs, had minimal external feeds, and held department-specific data with limited/no hand-shake across departments. They faced similar resource constraints too: relational database management software license costs (funded from even more tightly rationed discretionary budgets, mostly falling outside core IT spends), people, and processes that could ensure additional high-quality data.

In the first half of the 2010-20 decade, big data platforms started showing up on the horizon but were yet to make an impact across the industry. While they looked promising, the landscape was rapidly (and in some areas chaotically) evolving, and component releases across the “Hadoop ecosystem” (HDFS, MapReduce-based algorithms, Mahout ML engines, HBase vs Hive, and the early days of Spark) were dependent on open-source community support.

A lot of this was happening behind the scenes for many analytics teams, and still within the scope of IT/CIO offices. In the later part of the decade, specifically over the last 4-5 years, a few significant changes occurred, all in the right direction.

– Cloud-based storage and compute-driven big data platforms/massively parallel modern data warehouse solutions, suited for different needs, were made readily available by trusted providers such as Microsoft, Amazon (AWS), and Google Cloud, who made it easier for a wide range of businesses to adopt enterprise-ready, scalable data solutions.

– Data processing paradigms also shifted to the ELT (extract, load, transform) model. Data lakes, in principle, play the role of the erstwhile staging layers, holding data as close as possible to the source systems (vs only transformed data in the EDW). “Hydration” of the data lake at this level is orchestrated by central IT teams, with incremental processing (transformations) pushed out to the analytics teams, along with guidelines and very structured areas within a highly connected cloud infrastructure.

– With transformation well within the scope of data & analytics teams, they could either perform individual use-case specific transformations (lower level of maturity and could result in proliferating versions of truth) or think about feature stores and data foundations with a broader set of business requirements (higher level of maturity).

– While there is significant use of technology, this cannot be thrown over for the IT teams to deal with. Data & analytics teams, in close engagement with the business teams, are defining much more robust data foundations to drive “end-to-end” analytics solutions.
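To make the ELT idea in the points above concrete, here is a minimal sketch, not a description of any particular platform. It uses Python's built-in sqlite3 as a stand-in for a cloud data store; the table names (raw_orders, customer_features), the sample rows, and the function names are all hypothetical. Source rows are landed untransformed first, and the transformation into a shared, reusable feature table happens afterwards, inside the store.

```python
import sqlite3

# Hypothetical raw order records, as they might arrive from a source system.
RAW_ORDERS = [
    ("o1", "c1", "2021-01-05", 120.0),
    ("o2", "c1", "2021-02-11", 80.0),
    ("o3", "c2", "2021-01-20", 200.0),
]


def load_raw(conn, rows):
    """Extract + Load: land source rows untransformed in a raw table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(order_id TEXT, customer_id TEXT, order_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", rows)


def build_customer_features(conn):
    """Transform: derive reusable customer-level features inside the store."""
    conn.execute("DROP TABLE IF EXISTS customer_features")
    conn.execute(
        "CREATE TABLE customer_features AS "
        "SELECT customer_id, "
        "       COUNT(*) AS order_count, "
        "       SUM(amount) AS total_spend, "
        "       MAX(order_date) AS last_order_date "
        "FROM raw_orders GROUP BY customer_id"
    )


def run_pipeline():
    """Run load then transform on an in-memory database; return the features."""
    conn = sqlite3.connect(":memory:")
    load_raw(conn, RAW_ORDERS)
    build_customer_features(conn)
    rows = conn.execute(
        "SELECT customer_id, order_count, total_spend, last_order_date "
        "FROM customer_features ORDER BY customer_id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_pipeline():
        print(row)
```

Because the raw table is kept as loaded, several use cases can share one feature table instead of each re-deriving its own version of the truth, which is the higher-maturity option described above.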

All of this resulted in a sudden surge of demand for high-quality data engineering capabilities. So much so that, in the short run, this demand could be disproportionately higher than the demand for talent with DS/ML/AI skills. Pure-play analytics providers have had to develop this new capability.

At Tiger Analytics, we have seen this trigger a two-fold change:

– A significant share of our data scientists are erasing the boundaries between engineering & science by cross-training themselves across both areas
– Data Engineers with a flair for ML & AI choose to come on board adding to this melting pot of talent.

3. WYSIWYG approach to democratizing Data & Analytics

Reading through an early draft of this blog, my colleague Sunder pointed out how democratized analytics has become, in the way it is accessible to and practiced by more than just a few in many organizations. Indeed, we are seeing this in many ways.

– Visualization: Reports, from being largely staid & static add-ons to enterprise applications even a decade ago, have become a lot more interactive, intuitive, powerful, insight-laden, and, importantly, self-serviceable too. This was brought to the fore very well by Tableau & Qlik in the past decade, with Power BI pitching in to take it far & wide.

– Workflow approach to advanced analytics: More than a decade ago, solutions such as SAS EG, SPSS Clementine (now IBM-SPSS Modeler), and FICO Model Builder enabled the construction of visually intuitive advanced analytics workflows. Projects like KNIME (and Orange for Python – give it a try) were also building visual workflows for analytics. The significant growth in the adoption of Alteryx by a broader set of analytics users in organizations – both for advanced analytics workflows (via integration with R), as well as for upstream data processing (like a visual ETL engine) is a trend in a similar direction.

Analytics being doable by more than just a few specialists is a big change, and one that addresses the talent demand-supply gap too. At Tiger, many of our data scientists & consultants use at least a couple of the above tools – either for visualization of reports or for analysis & modeling.

4. The Advanced Analytics toolkit is light & sleek

This one is close to my heart. In 2001, while I was in a quant-MBA program and data science was not yet a (cool) phrase, our professor would assign us to work out and submit all our statistical data analysis in R. With 100-odd packages at that time, R was cool but tough to master in comparison with other enticing menu-driven software (often available for free or at discounted pricing for students). “You’d understand the power of open-source software in analytics a decade or two down the line”, our Prof would remark.

How true this has come out to be! Today, Python appears to be creeping up on the feet of R (pun intended), but the fact is, the toolkit of data scientists is now mostly open-source software vs proprietary commercial ware with limited adoption.

This is not just from a cost perspective. The way these programming tools could make analytic algorithms run directly on top of large-scale data (along with memory-resident data structures) is much more efficient in comparison to so many others (also because of the sheer community brainpower that goes into optimizing these vs others).

At Tiger, our technical teams are embracing & driving the adoption of open source, but with a low/no incremental code twist. Code templates have been developed to address specific business problems – combining highly optimized run-times with problem-specific analytic steps for high reliability & repeatability. These, in addition to general-purpose automated data exploration & model development modules, help achieve 50-66% acceleration vs writing ground-up analytics code for every problem. This approach also lowers the barrier to doing analytics, making it more democratic.

The Constants

While so far we have explored areas that have changed, some things have remained constant.

1. The superperson problem and the solution

Making a significant business impact through analytics requires skills across multiple dimensions. These include data processing, analysis & modeling, visualization, engaging with business teams, storytelling for insights, plus a reasonable handle on operational, business and financial metrics relevant to the industries served.

I have seen organizations – core data products companies, captive data & analytics teams, and consulting organizations with analytics as the mainline or as one of multiple service lines – take many routes to address this challenge.

1. Build entire teams with people who are experts across all skills.
2. Build a majority of the team with people trained/reskilled on a broader subset of the skills mentioned above, with a small proportion of specialists in each area.
3. Set up a team with a more balanced mix of engineering, science, business translator skills (in different people) as a starting point, and enable/encourage teams to heavily collaborate and blur the boundaries as they grow.

Each option represents a significant trade-off across many levers: the ability to scale, the ability to make a real impact vs ‘we also do analytics’, the ability to retain talent through relevant work assignments for all, ensuring collaboration across skill areas, etc. This is a challenge that I don’t see going away. Leadership teams have to constantly grapple with it as the data & analytics space evolves at an even more breathtaking pace.

At Tiger Analytics, we have broadly defined two capability pillars – the technical capability pillar (consisting of data science, data engineering, application engineering, ML engineering technical skills), and the consulting & program delivery capability pillar (teams that have a significant focus on business engagement, program management, industry knowledge, but also with reasonable data science techniques and/or technology exposure). Collaboration between these capability areas, as well as with the in-market teams is ensured through a well-articulated “Tiger Way”. The mix across and within these pillars is continuously monitored to stay in tune with the industry needs.

2. Focus on value

In the early years of Tiger, when we were trying to help our teams appreciate the value of what they deliver, we’d do reviews with the CEO during the quarterly visits. During one such visit, someone from the team called out with conviction: “Given the client’s current model performance baseline, even if we were to predict with close to 100% accuracy and scale the model across the entire volume of decisions it would make, the incremental value we’d deliver to the client would be just about the same as the project cost”.

We knew two things right then:

– The culture of measuring value had started setting in, well beyond just the leadership, and
– We had better quickly tell the client we were wasting each other’s time and move on (which happened rather quickly, with the client appreciating our honesty in not dragging on and commissioning us to work on other initiatives)
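The team member's reasoning in that review can be written down as a back-of-the-envelope calculation. The sketch below is only an illustration: the formula (accuracy uplift × decision volume × value per correct decision), the function names, and every number in it are assumptions made for this example, not figures from the actual engagement.

```python
def max_incremental_value(baseline_accuracy, decisions_per_year,
                          value_per_correct_decision, ceiling_accuracy=1.0):
    """Upper bound on yearly incremental value from improving a model.

    Assumes (hypothetically) that value scales linearly with the number of
    additional correct decisions over the client's current baseline.
    """
    uplift = max(ceiling_accuracy - baseline_accuracy, 0.0)
    return uplift * decisions_per_year * value_per_correct_decision


def value_to_cost_ratio(incremental_value, project_cost):
    """Ratio above 1 means the best case returns more than the project costs."""
    if project_cost <= 0:
        raise ValueError("project_cost must be positive")
    return incremental_value / project_cost


if __name__ == "__main__":
    # Illustrative numbers only: a strong 75% baseline leaves limited headroom.
    best_case = max_incremental_value(
        baseline_accuracy=0.75,
        decisions_per_year=100_000,
        value_per_correct_decision=4.0,
    )
    ratio = value_to_cost_ratio(best_case, project_cost=100_000.0)
    print(f"best-case value: {best_case:.0f}, value/cost ratio: {ratio:.2f}")
```

With these made-up inputs, even a perfect model only pays back the project cost once, which is the kind of result that prompts the conversation described above.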

To me, this also carried a bigger message. Business value (vs solution build complexity) when used as a measure to prioritize analytics initiatives keeps it objective. Of course, the future-looking initiatives will need to work with assumption-ridden calculations, but reasonably well-debated consensus ROI estimates vs gut-instinct-based leaps work well on a larger base.

The potential value from analytics and data-driven decision making is high, and to a large extent, an imperative too. The need to keep value vs complexity (including, but not limited to, the direct cost aspect) in view remains yet another constant.

At Tiger Analytics, we have adapted this by developing Business Value Estimation & Articulation frameworks for 4 stages of delivering data & analytics programs: Discovery, Design, Development, and Deployment (4D). In the discovery & design phases, these frameworks help lay out initiatives (identified in discussions with business & analytics teams) on a Value vs Complexity map and pick the right priorities to work out a detailed business case for a funding commitment. This is tough, but still relatively easier when compared to sustaining value-vs-complexity in the development & deployment stages of a program. In a closer hand-shake, teams from our technical and consulting & program delivery capability pillars are collaborating to define agile execution frameworks that constantly prioritize value vs complexity of incremental features to take on/park/discard, with less heartburn and greater speed to value of programs.
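A Value vs Complexity map can be sketched in a few lines. The example below is a hypothetical illustration, not the actual Tiger Analytics framework: the 0-10 scoring scale, the threshold of 5, the rule mapping quadrants to take on/park/discard, and the initiative names are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Initiative:
    name: str
    value: float       # estimated business value score, 0-10 (assumed scale)
    complexity: float  # estimated build complexity score, 0-10 (assumed scale)


def decide(item, threshold=5.0):
    """Map an initiative to take on / park / discard (an assumed rule)."""
    high_value = item.value >= threshold
    high_complexity = item.complexity >= threshold
    if high_value and not high_complexity:
        return "take on"
    if not high_value and high_complexity:
        return "discard"
    return "park"  # mixed cases wait for a more detailed business case


def prioritize(items):
    """Order initiatives by value net of complexity, best first."""
    return sorted(items, key=lambda i: i.value - i.complexity, reverse=True)


if __name__ == "__main__":
    backlog = [
        Initiative("churn early-warning", value=8.0, complexity=3.0),
        Initiative("real-time pricing", value=9.0, complexity=8.0),
        Initiative("dashboard restyling", value=2.0, complexity=6.0),
    ]
    for item in prioritize(backlog):
        print(f"{item.name}: {decide(item)}")
```

Re-scoring the backlog at each iteration, rather than once at the start, is one simple way to keep the value-vs-complexity view alive through the development & deployment stages.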

Written this way, this blog largely reflects the top-of-the-mind thoughts from the experience that I, as the author, have gone through, rather than a well-researched perspective that reflects broader experience. With that caveat, I put this out here for you, the reader, to read, reflect, agree/disagree and share your perspective.

Thank you for your time.

 


©2017 Tiger Analytics. All rights reserved.
