Accelerating Digitization of Policy Documents for Better Mortality Prediction

Business Objective

Our client is a Fortune 100 life insurance provider. The client had ten years of application data available in digitized form for modeling mortality risk. It wanted to digitize an additional 15 years of applications (~1 million forms) to improve survival prediction. To accomplish this, the client needed a robust, automated process to assess data quality and ensure that digitized applications (post-OCR) met the standard required for modeling.

The client wanted us to:

  • Create a scalable solution for automated validation of digitized applications
  • Deliver over 90% accurate data for integration with modeling datasets

Challenges
  • The underwriting process had evolved significantly over the past 25 years – forms with varying templates, changes in questions (and language), incorrect applicant tagging, etc.
  • Significant digitizing errors such as incorrect values, page duplication, wrong order of pages, etc.

Solution Methodology
  • Identified variations in application questions and designed custom workflows to create a single source of truth (SSOT) by:
    • Drawing a smaller, stratified sample of policies and manually transcribing it into digital data, free of digitizing errors
    • Analyzing different trends to define benchmarks for missing percentages, outliers, and invalid entries
  • Built an automated process to validate digitized data through sanity checks (missing values, outliers), variable distribution analysis, and comparison with the SSOT
  • Designed reports for underwriters to track data quality metrics at a variable and policy level
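
The validation steps above can be illustrated with a minimal sketch. This is not the client solution: the field names, the benchmark values, the dictionary layout, and the `validate_batch` helper are all hypothetical, and the 90% agreement threshold simply mirrors the accuracy target stated in the objective. The sketch computes, per variable, the missing rate, the count of out-of-range entries, and agreement with the manually transcribed SSOT sample.

```python
def is_valid(value, lo, hi):
    """A present value is valid if it is numeric and inside the benchmark range."""
    return isinstance(value, (int, float)) and lo <= value <= hi


def validate_batch(digitized, ssot, benchmarks, min_accuracy=0.90):
    """Variable-level quality report for a batch of digitized policies.

    digitized, ssot: dict of policy_id -> dict of field -> value (hypothetical layout)
    benchmarks: dict of field -> {"max_missing": float, "valid_range": (lo, hi)}
    min_accuracy: assumed minimum agreement with the SSOT sample
    """
    report = {}
    total = len(digitized)
    # Only policies that were also manually transcribed can be compared.
    shared_ids = [pid for pid in digitized if pid in ssot]

    for field, rule in benchmarks.items():
        values = [record.get(field) for record in digitized.values()]
        present = [v for v in values if v not in (None, "")]
        missing_rate = (total - len(present)) / total if total else 0.0

        lo, hi = rule["valid_range"]
        invalid_count = sum(1 for v in present if not is_valid(v, lo, hi))

        matches = sum(
            1 for pid in shared_ids
            if digitized[pid].get(field) == ssot[pid].get(field)
        )
        accuracy = matches / len(shared_ids) if shared_ids else None

        report[field] = {
            "missing_rate": missing_rate,
            "invalid_count": invalid_count,
            "ssot_accuracy": accuracy,
            "passes": (
                missing_rate <= rule["max_missing"]
                and invalid_count == 0
                and (accuracy is None or accuracy >= min_accuracy)
            ),
        }
    return report


# Illustrative data only; none of these values come from the engagement.
digitized = {
    "P1": {"age": 34, "bmi": 22.5},
    "P2": {"age": 340, "bmi": 27.1},   # simulated OCR error in age
    "P3": {"age": None, "bmi": 24.0},  # simulated missing value
    "P4": {"age": 51, "bmi": 30.2},
}
ssot = {
    "P1": {"age": 34, "bmi": 22.5},
    "P2": {"age": 34, "bmi": 27.1},
}
benchmarks = {
    "age": {"max_missing": 0.10, "valid_range": (18, 100)},
    "bmi": {"max_missing": 0.10, "valid_range": (10, 70)},
}

report = validate_batch(digitized, ssot, benchmarks)
for field, metrics in report.items():
    print(field, metrics)
```

A policy-level report could be built the same way by iterating over policies instead of fields and counting how many fields fail their checks.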
Business Impact
  • The initial prototype informed the go/no-go decision on scaling the OCR process and helped justify millions of dollars of investment

  • The final solution can process and validate 50,000 digitized policies in under 2 hours

Copyright © 2023 Tiger Analytics | All Rights Reserved