Boosting FWA Detection with ML and Automated Pipelines: A Health Tech Success Story

Industry

Health Tech | Insurance

Business Function

Claims Analytics & Servicing

Capability

Data Engineering | Outlier Detection | ML Model Development

Tech Stack

ADLS | Azure Synapse | Databricks | Power BI | PostgreSQL | Azure DevOps

Key Highlights: What This Case Study Covers

  • Robust data store and automated pipeline built to manage millions of daily healthcare claim records at scale.
  • End-to-end data ingestion, transformation, and quality monitoring with alert-based orchestration.
  • Real-time and batch-based feature generation across structured and unstructured data sources.
  • Outlier detection using 8 ML regression models spanning claim header and claim line variables.
  • Model-ready architecture enabling scalable deployment and continuous model retraining.
  • ~85% reduction in turnaround time, compressing a weeklong manual workflow into a streamlined one-day pipeline.

Client Overview

A high-growth health tech subsidiary of a major APAC insurer, the client enables payers and providers with digital health solutions, data science capabilities, and tech-enabled transformation across claims and healthcare operations.

The Ask

To support scalable FWA detection, the client aimed to build a high-performance data store capable of ingesting and processing millions of records per day, automating workflows, reducing manual overhead, and significantly improving operational efficiency in claims servicing.

Challenges

  • Manual audits: Traditional FWA detection relied on human audits, increasing delays and reducing coverage.
  • Ingestion complexity: The absence of a unified system for managing multi-source file ingestion introduced operational bottlenecks.
  • Growing data volume: The exponential rise in healthcare claims necessitated a shift to scalable, automated detection frameworks.
  • Provider diversity: High data complexity across providers required advanced algorithms for effective detection.

Our Solution: Automated Data Ingestion and Pipeline Processing

Automated Data Ingestion

  • Automate daily ingestion of structured and nested JSON files from ADLS into Azure Synapse tables.
  • Archive ingested files and generate daily ingestion summary reports for tracking.
  • Trigger alerts and failure-handling mechanisms when loads do not complete.
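The ingestion step above can be sketched as follows. This is a minimal, illustrative simulation in plain Python: the production pipeline read from ADLS into Azure Synapse, whereas here `ingest_daily_files`, the file names, and the dict-based summary are all hypothetical stand-ins for the load, archive, and alerting steps.

```python
from datetime import date

def ingest_daily_files(files, archive):
    """Simulate daily ingestion: record per-file status and archive successes.

    `files` maps file name -> parsed record count (None = parse/load failure).
    Returns a summary dict standing in for the daily ingestion report.
    """
    summary = {"date": date.today().isoformat(), "loaded": 0, "rows": 0, "failed": []}
    for name, rows in files.items():
        if rows is None:                 # failed load -> routed to the alert path
            summary["failed"].append(name)
            continue
        archive.append(name)             # move the ingested file to the archive area
        summary["loaded"] += 1
        summary["rows"] += rows

    # Alerting hook: a non-empty failure list would trigger a notification.
    summary["alert"] = bool(summary["failed"])
    return summary

archive = []
report = ingest_daily_files({"claims_a.json": 1200, "claims_b.json": None}, archive)
print(report["loaded"], report["failed"], report["alert"])
```

In production the same pattern runs inside a scheduled Synapse pipeline, with the summary emailed as the daily ingestion report.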

Quality Checking and Monitoring

  • Perform data quality checks, such as fill rates and table grain, on daily ingestion files.
  • Automatically reject files that fail to meet data quality thresholds and standards.
  • Create a data quality summary report and share it with data leads via email.
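A fill-rate gate of the kind described above can be sketched in a few lines. The column names, thresholds, and `quality_gate` helper are illustrative, not the client's actual rules:

```python
def fill_rate(rows, column):
    """Fraction of rows with a non-null value in `column`."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is not None) / len(rows)

def quality_gate(rows, thresholds):
    """Reject the file if any column's fill rate is below its threshold."""
    report = {col: fill_rate(rows, col) for col in thresholds}
    passed = all(report[col] >= thr for col, thr in thresholds.items())
    return passed, report

rows = [
    {"claim_id": "C1", "provider": "P1", "amount": 100.0},
    {"claim_id": "C2", "provider": None, "amount": 250.0},
    {"claim_id": "C3", "provider": "P2", "amount": None},
]
# `provider` is filled in only 2 of 3 rows (~0.67), below its 0.9 threshold,
# so this file would be rejected and the summary emailed to data leads.
passed, report = quality_gate(rows, {"claim_id": 1.0, "provider": 0.9, "amount": 0.6})
print(passed, report)
```

Table-grain checks follow the same pattern, asserting that the declared key columns uniquely identify each row.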

Data Transformation

  • Apply business rules to the ingested data to create gold-layer tables for downstream use (dashboarding, modelling, etc.).
  • Clear documentation, such as data flow diagrams and source-to-target mappings, improved the reliability and usability of the transformed data.
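To make the gold-layer idea concrete, here is a minimal sketch of applying business rules to ingested claim records. The specific rules (keep finalized claims only, derive a total cost from line items) are invented for illustration and are not the client's actual transformation logic:

```python
def to_gold(claims):
    """Apply illustrative business rules to produce gold-layer records:
    keep only finalized claims and derive a total cost per claim from its
    line items."""
    gold = []
    for c in claims:
        if c["status"] != "FINALIZED":                           # rule 1: finalized only
            continue
        total = round(sum(l["amount"] for l in c["lines"]), 2)   # rule 2: derived total
        gold.append({"claim_id": c["claim_id"], "total_cost": total})
    return gold

claims = [
    {"claim_id": "C1", "status": "FINALIZED",
     "lines": [{"amount": 120.5}, {"amount": 30.0}]},
    {"claim_id": "C2", "status": "PENDING", "lines": [{"amount": 99.0}]},
]
gold = to_gold(claims)
print(gold)
```

In the actual pipeline, each such rule is traceable back to the source-to-target mapping documents mentioned above.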

Feature Engineering

  • Created an exhaustive feature store generating both batch and real-time features.
  • OCR data extracted from invoice PDFs was used to create target variables (e.g., radiology cost, pathology cost, doctor cost).

Model Development

  • The data was categorized into training, validation, and test sets.
  • Various ML regression algorithms were tested to identify the most effective models at both the claim header and claim line levels.
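The split-and-compare workflow above can be sketched in pure Python. This uses synthetic data and only two candidate "models" (a mean baseline and a one-variable least-squares fit); the real work compared multiple regression algorithms on engineered claim features, so every name and number here is illustrative:

```python
import random

def fit_linear(xs, ys):
    """Ordinary least squares for a single predictor: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return b, my - b * mx

def mae(pred, ys):
    """Mean absolute error of predictions against actuals."""
    return sum(abs(p - y) for p, y in zip(pred, ys)) / len(ys)

random.seed(0)
xs = [random.uniform(1, 10) for _ in range(300)]    # e.g. a claim-level feature
ys = [50 * x + random.gauss(0, 20) for x in xs]     # e.g. expected claim cost

# Categorize the data into training, validation, and test sets (60/20/20).
train, val, test = slice(0, 180), slice(180, 240), slice(240, 300)

models = {"mean_baseline": lambda x, m=sum(ys[train]) / 180: m}
b, a = fit_linear(xs[train], ys[train])
models["linear"] = lambda x: b * x + a

# Pick the candidate with the lowest validation error, then score it once
# on the held-out test set.
scores = {name: mae([f(x) for x in xs[val]], ys[val]) for name, f in models.items()}
best = min(scores, key=scores.get)
test_mae = mae([models[best](x) for x in xs[test]], ys[test])
print(best, round(scores[best], 2), round(test_mae, 2))
```

The same selection loop, run per target variable at both the claim header and claim line levels, yields the family of models described below.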

Scheduling and Monitoring

  • End-to-end automation: scheduling was managed through Azure Synapse and Azure DevOps.
  • Alerts and notifications ensured reliable operation with minimal manual intervention across the ingestion, transformation, and modeling workflows.

Outlier Detection Models

  • Developed a series of models, primarily at the claim header level, with additional models at the claim line level to capture detailed insights. Outlier classification was based on residual-percentage thresholds computed from historical distributions.
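The residual-percentage thresholding described above can be sketched as follows. The percentile choice (95th), the nearest-rank percentile helper, and all data are hypothetical; the production thresholds came from the client's historical residual distributions:

```python
def residual_pct(actual, predicted):
    """Residual as a percentage of the model's predicted claim amount."""
    return 100.0 * (actual - predicted) / predicted

def percentile(values, q):
    """Nearest-rank percentile of `values`, with q in [0, 100]."""
    s = sorted(values)
    idx = min(len(s) - 1, max(0, round(q / 100.0 * (len(s) - 1))))
    return s[idx]

def flag_outliers(claims, history, q=95.0):
    """Flag claims whose residual %% exceeds a threshold derived from the
    historical residual distribution (here its 95th percentile)."""
    threshold = percentile(history, q)
    return [c["claim_id"] for c in claims
            if residual_pct(c["actual"], c["predicted"]) > threshold]

# Synthetic history: past residuals spanning -20% to +20%.
history = [residual_pct(100 + d, 100) for d in range(-20, 21)]
claims = [
    {"claim_id": "C1", "actual": 118.0, "predicted": 100.0},  # +18% residual
    {"claim_id": "C2", "actual": 160.0, "predicted": 100.0},  # +60% residual
]
flagged = flag_outliers(claims, history)
print(flagged)
```

Only the claim whose residual clears the historical threshold is routed to investigators, which is what drives the risk prioritization noted under impact.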

Impact Delivered

  • ~80%+ reduction in turnaround time by replacing a weeklong manual process with an automated one-day pipeline.
  • Optimized claims processing with low-latency pipelines, freeing up resources for higher-value activities.
  • Enhanced data quality enabled the development of more robust FWA models and improved straight-through processing (STP).
  • Intelligent risk prioritization sharpened investigative focus and reduced associated costs.
  • Boosted detection accuracy while strengthening compliance and data security standards.

Achieve Faster Claims Decisions, Sharper Risk Signals, and Scalable Fraud Detection

Operationalize AI and automation to keep pace with claim volumes, reduce investigative overhead, and stay ahead in a high-stakes, high-complexity environment.

Copyright © 2025 Tiger Analytics | All Rights Reserved