Business Objective
Our client is one of the largest life insurance companies in the United States. The client wanted to improve short- and long-term revenue projections by determining how likely a policyholder was to have his or her policy suspended due to unpaid premiums.
The specific objectives were:
- Predict the probability of policy suspension, defined as 24 months of unpaid premiums
- Determine the factors that drive policyholders’ decisions to suspend premium payments
Challenges
- Multiple data sources (including external and longitudinal data)
- The data had limited metadata, sparse data dictionaries, and ambiguous business context for certain elements
Solution Methodology
- Developed hypotheses about relationships between the data and policy suspension, using:
  - Macroeconomic data
  - Characteristics and behavior of financial professionals
  - Policyholder demographics and socio-economic data
  - Policy data such as issue dates, coverage amounts, loans, and riders
  - Transaction data such as counts of premiums, premium values, and dates paid
  - Customer relationship interaction data from the client’s call center
- Experimented with various under-sampling and over-sampling techniques to account for the imbalanced response variable: only 6% of policies were suspended during the study period
- Experimented with logistic regression, random forests, and other ensemble methods before settling on random forests for the final models
- Gained substantial performance improvements from hyperparameter tuning
- Achieved an overall accuracy of 98%, with precision and recall both above 0.8 two quarters out and above 0.65 six quarters out
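The class imbalance and the evaluation metrics mentioned above can be illustrated with a small sketch. It is not the project's actual pipeline, and the specific resampling techniques used are not stated in this case study. The function names `undersample` and `classification_metrics`, the `suspended` label, and the toy data of 1,000 policies are all hypothetical, chosen only to mirror the 6% suspension rate. The sketch shows random under-sampling of the majority class, and why accuracy alone can mislead when suspensions are rare.

```python
import random


def undersample(rows, label_key="suspended", ratio=1.0, seed=0):
    """Randomly drop majority-class rows (label 0) so that the number kept
    is about `ratio` times the number of minority-class rows (label 1)."""
    rng = random.Random(seed)
    pos = [r for r in rows if r[label_key] == 1]
    neg = [r for r in rows if r[label_key] == 0]
    keep = min(len(neg), int(round(ratio * len(pos))))
    sampled = pos + rng.sample(neg, keep)
    rng.shuffle(sampled)
    return sampled


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall for binary labels (1 = suspended)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical toy data: 1,000 policies, 6% of them suspended.
rows = [{"policy_id": i, "suspended": 1 if i < 60 else 0} for i in range(1000)]

# Under-sample to a 1:1 class ratio (60 suspended, 60 not suspended).
balanced = undersample(rows)

# A trivial baseline that never predicts suspension still scores 94% accuracy
# on the original data, but its recall is zero.
y_true = [r["suspended"] for r in rows]
baseline = classification_metrics(y_true, [0] * len(y_true))
print(len(balanced), baseline)
```

In a setup like this, resampling would normally be applied to the training split only, so that evaluation metrics reflect the true suspension rate. The baseline result is the reason precision and recall are reported alongside overall accuracy.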
Business Impact
- The final models provided additional information for actuarial models used for revenue projections
- The model correctly identified nearly 85% of premium suspensions