Applied ML / Insurance Analytics / Human-in-the-Loop AI
Claims Risk Scoring & Triage Model
A claims triage model that moves from a useful-but-not-ready baseline to an enriched, explainable, controlled-pilot-ready review workflow.
Built a claims risk scoring workflow that first identifies when an intake-only model is not ready for deployment, then improves it through enriched operational signals, threshold optimization, explainability, watchlist design and controlled pilot governance.
The business was not blocked by model accuracy alone. It was blocked by whether the review queue would be useful, manageable and safe enough for human reviewers.
Final operating point
Controlled pilot ready, not automated decision-making.
The optimized Gradient Boosting model meets the target operating criteria for a controlled human-in-the-loop rollout. The model prioritizes review; it does not approve, reject or settle claims.
Before vs after enrichment
The project became valuable when the first model was treated as a readiness signal, not a success story.
The first model could capture risk, but only by creating a broad and inefficient review queue. The enriched version improved both risk capture and operational efficiency.
Model readiness journey
The important part is not just the final model. It is the decision process.
The first model was not forced into production. The project treats that result as a readiness decision, then simulates the next realistic step: shadow-mode learning and a stronger data signal.
Intake-only model
The first model used only claim intake data. It had useful signal, but the review queue was too broad and not clean enough to justify operational deployment.
Enriched optimized model
The second iteration enriched the feature layer with LLM-style text risk signals, document quality, provider history and early lifecycle indicators.
Do not force weak models into production. Enrich data, re-train, optimize thresholds, and deploy only as controlled decision support.
Business problem
The claims team cannot review every claim with the same intensity.
Simple claims should move through the standard process quickly. Higher-risk claims should be identified earlier, before they become delayed, disputed, escalated or operationally expensive.
The practical question is not whether a model can classify claims. It is whether it can create a useful review queue without overwhelming human reviewers.
Deployment boundary
The model supports triage. It does not make final claim decisions.
The final system is approved only as a review-support layer for a governed human-in-the-loop pilot. Human reviewers retain final decision authority.
- No automated approval.
- No automated rejection.
- No automatic payout decision.
- No legal or fraud determination without review.
Final triage policy
A three-level operating model, not a binary prediction.
The final policy separates the main priority review queue from a small watchlist layer for near-threshold claims with strong business signals.
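The three-level routing can be sketched as a single function. The 0.29 and 0.25 thresholds come from the final operating point; the strong-driver flag is an illustrative stand-in for the business-signal check.

```python
# Sketch of the three-level triage policy. Thresholds mirror the final
# operating point; has_strong_drivers is an illustrative placeholder for
# the near-threshold business-signal check.

PRIORITY_THRESHOLD = 0.29
WATCHLIST_FLOOR = 0.25

def triage_tier(risk_score, has_strong_drivers):
    """Route a claim to one of the three operational layers."""
    if risk_score >= PRIORITY_THRESHOLD:
        return "priority_review"          # full-intensity human review
    if WATCHLIST_FLOOR <= risk_score < PRIORITY_THRESHOLD and has_strong_drivers:
        return "watchlist_light_review"   # near-threshold + strong business signal
    return "standard_processing"          # normal claims workflow

print(triage_tier(0.42, False))  # priority_review
print(triage_tier(0.27, True))   # watchlist_light_review
print(triage_tier(0.27, False))  # standard_processing
```

The watchlist layer only activates when both conditions hold, which keeps it small by construction.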
Priority human review
risk_score ≥ 0.29
850 claims · 75.1% precision

Watchlist / light review
0.25 ≤ risk_score < 0.29 + strong business drivers
50 claims · 9 additional high-risk claims captured

Standard processing
below watchlist criteria
2,045 claims · standard workflow

Portfolio visuals
The visual evidence of the project journey.
The hero image already shows the readiness journey. These visuals explain how the enriched model works, how the operating policy is structured and how a reviewer can understand a case.
Enriched feature architecture
LLM-style text signals, document quality, provider history and early lifecycle indicators improve risk separation.
Final triage operating model
Priority review, watchlist/light review and standard processing are separated into clear operational layers.
Case-level explainability
Reviewer-friendly explanations translate model scores into understandable business drivers.
Technical workflow
From raw claims to pilot governance.
The project is structured as a reproducible ML pipeline, not a one-off notebook. Each step answers a business readiness question before moving closer to pilot deployment.
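The "each step answers a readiness question" structure can be sketched as an ordered pipeline where every stage is paired with the question it must answer before the next stage runs. Stage names here are illustrative, not the repository's actual script names.

```python
# Sketch of the pipeline structure: each stage is paired with the business
# readiness question it answers. Stage names are illustrative.

PIPELINE = [
    ("load_claims",        "Is the raw intake data complete and consistent?"),
    ("build_features",     "Which signals are available without leakage?"),
    ("train_baseline",     "Does an intake-only model carry usable signal?"),
    ("enrich_features",    "Do richer operational signals improve separation?"),
    ("optimize_threshold", "Can the queue stay clean at a workable size?"),
    ("explain_cases",      "Can a reviewer understand each recommendation?"),
    ("define_governance",  "Are stop conditions and boundaries explicit?"),
]

def run(pipeline):
    """Iterate stages in order, returning the names that were executed."""
    executed = []
    for name, question in pipeline:
        print(f"[{name}] {question}")
        executed.append(name)   # a real stage would transform working state here
    return executed

executed = run(PIPELINE)
```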
Core stack
Applied ML with business-facing controls.
The technical stack is intentionally practical: strong enough to show applied Data Science capability, but still realistic for a business-facing ML workflow.
What this project proves
The value is the full applied Data Science decision journey.
This is not positioned as a perfect model or an autonomous AI system. It is positioned as a realistic ML workflow where model performance, operational capacity and governance all matter.
- Built an end-to-end ML triage workflow, not an isolated notebook.
- Controlled for leakage by separating intake data from downstream claim outcomes.
- Translated model scores into an operational review policy with capacity constraints.
- Improved model readiness through richer business signals rather than model complexity alone.
- Added case-level explanations so reviewers can inspect and challenge recommendations.
- Defined monitoring, pause criteria and governance boundaries before pilot rollout.
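The leakage control mentioned above can be sketched as tagging every field with the lifecycle stage at which it becomes available, then training only on fields known before the outcome. Field names here are hypothetical examples, not the project's actual schema.

```python
# Sketch of stage-based leakage control: each field is tagged with when it
# becomes available, and training keeps only pre-outcome fields.
# Field names are illustrative, not the project's actual schema.

AVAILABLE_AT = {
    "claim_amount":         "intake",
    "incident_description": "intake",
    "document_quality":     "enrichment",
    "provider_history":     "enrichment",
    "days_to_settlement":   "outcome",   # downstream: would leak the target
    "dispute_flag":         "outcome",   # downstream: would leak the target
}

ALLOWED_STAGES = {"intake", "enrichment"}

def training_features(record):
    """Keep only fields known before the claim outcome is observed."""
    return {k: v for k, v in record.items()
            if AVAILABLE_AT.get(k) in ALLOWED_STAGES}

claim = {"claim_amount": 1200, "document_quality": 0.8,
         "days_to_settlement": 45, "dispute_flag": 1}
print(training_features(claim))  # outcome-stage fields are dropped
```

Making availability explicit in one table keeps the rule auditable: a new feature cannot enter training without first being assigned a stage.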
Explainability
Reviewers get reasons, not just scores.
The case-level explanation layer translates model outputs into reviewer-friendly business drivers: urgency, contradictions, legal language, missing documentation, third-party involvement, document quality and early lifecycle risk signals.
The explanations are not causal proof. They are operational context for a human reviewer during a controlled rollout.
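A minimal sketch of such a layer: per-case driver contributions (e.g. SHAP-style attributions) are ranked and mapped to the business language above. The driver names, labels and values here are illustrative assumptions, not the project's actual attribution output.

```python
# Sketch of a reviewer-facing explanation layer: ranked positive driver
# contributions are translated into plain-language business reasons.
# Driver names, labels and values are illustrative.

DRIVER_LABELS = {
    "text_urgency":       "Urgent or pressured language in claim text",
    "text_contradiction": "Contradictory statements across documents",
    "legal_language":     "Legal or dispute-oriented wording",
    "missing_docs":       "Required documentation missing",
    "third_party":        "Third-party involvement",
    "doc_quality":        "Low document quality score",
    "early_lifecycle":    "Early lifecycle risk indicators",
}

def top_reasons(contributions, n=3):
    """Return the n strongest positive drivers as plain-language reasons."""
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return [DRIVER_LABELS[name] for name, value in ranked[:n]
            if value > 0 and name in DRIVER_LABELS]

case = {"text_urgency": 0.12, "missing_docs": 0.30,
        "doc_quality": 0.05, "legal_language": -0.02}
print(top_reasons(case))
```

Negative or unknown drivers are simply omitted, so a reviewer only sees reasons that pushed the score upward.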
Monitoring
The pilot has explicit stop conditions.
Weekly monitoring tracks review queue volume, precision, recall, false negatives, score distribution, data quality, enrichment health and reviewer feedback.
The pilot should pause if precision falls below 55%, recall falls below 70%, enrichment failures exceed 5%, or the priority review queue exceeds 40% of claim volume for two consecutive weeks.
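The stop conditions are concrete enough to express as an executable check. This is a sketch; the thresholds mirror the text, but the weekly-metrics format is an illustrative assumption.

```python
# Sketch of the pilot's explicit stop conditions as one weekly check.
# Thresholds mirror the pilot plan; the metrics dict format is illustrative.

def should_pause(week, prev_queue_share=None):
    """True if any stop condition from the pilot plan is hit this week."""
    if week["precision"] < 0.55:
        return True
    if week["recall"] < 0.70:
        return True
    if week["enrichment_failure_rate"] > 0.05:
        return True
    # Priority queue above 40% of claim volume for two consecutive weeks
    if (week["priority_queue_share"] > 0.40
            and prev_queue_share is not None and prev_queue_share > 0.40):
        return True
    return False

healthy = {"precision": 0.75, "recall": 0.82,
           "enrichment_failure_rate": 0.01, "priority_queue_share": 0.29}
print(should_pause(healthy))                      # False
print(should_pause({**healthy, "recall": 0.60}))  # True
print(should_pause({**healthy, "priority_queue_share": 0.45},
                   prev_queue_share=0.44))        # True
```

Note that the queue-size condition deliberately requires two consecutive breaches, so a single noisy week does not halt the pilot.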
Repository evidence
Three files that document the decision.
The full repository includes datasets, scripts and reports. These are the three most important evidence files for understanding the final recommendation.
- reports/final_model_readiness_report.md
- reports/gradient_boosting_optimization_report.md
- reports/case_level_explainability_report.md

Business conclusion
The strongest signal is not the model's performance alone; it is the full readiness journey.
This project demonstrates how to move from a weak-but-promising model to a governed review-support rollout: identify the limitation, enrich the data, re-train, optimize thresholds, add explanations, define monitoring and keep humans in control.
Next step
Want to see how this fits into the full portfolio?
Explore the rest of the applied analytics portfolio or reach out directly if you want to discuss the project, the modelling decisions, or the business reasoning behind the controlled pilot recommendation.