ML scores are now integrated into the Unified Risk Watchlist
ML fraud similarity scores now feed a single unified risk system alongside our 13 statistical tests. Each provider's statistical flags and ML score together determine its placement in one of four tiers: Critical, High, Elevated, or ML Flag. View the Risk Watchlist →
ML Methodology
How our random forest model works: it is trained on 514 confirmed-excluded providers from the OIG LEIE database and scores 594K active Medicaid providers for fraud similarity.
Model: Random Forest (ensemble classifier)
AUC Score: 0.7762 (5-fold cross-validation)
Providers Scored: 594K active Medicaid providers
Training Labels: 514 OIG-excluded providers
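For the curious, here is a minimal sketch of this training-and-validation setup in scikit-learn. It runs on synthetic stand-in data; the variable names, class-weighting choice, and hyperparameters are illustrative assumptions, not our production pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 14))            # stand-in for the 14 billing features
y = (rng.random(5000) < 0.01).astype(int)  # rare positives, like 514 in 594K

clf = RandomForestClassifier(
    n_estimators=300,
    class_weight="balanced",  # an assumption; one common way to handle imbalance
    random_state=42,
)

# 5-fold cross-validated AUC, the validation metric quoted above
aucs = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(f"Mean AUC: {aucs.mean():.4f}")

# Score every provider on a 0-100% fraud-similarity scale
clf.fit(X, y)
scores = clf.predict_proba(X)[:, 1] * 100
```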
Two Complementary Approaches to Fraud Detection
Statistical Tests
13 rule-based tests that flag specific, explainable anomalies in billing behavior.
- Identifies exact codes, ratios, and dollar amounts
- Human-readable explanations for every flag
- Code-specific benchmarks (9,578 codes)
- Catches billing swings, outlier pricing, and new entrants
ML Model
Pattern matching against 514 confirmed fraud cases from the OIG exclusion list.
- Learns complex multi-feature fraud signatures
- Catches patterns humans might miss
- Scores every provider on a 0–100% scale
- Validated by 5-fold cross-validation and a full-dataset benchmark
Why both matter: Statistical tests are precise and explainable — they tell you exactly what's unusual. ML captures subtler patterns across multiple features simultaneously. A provider flagged by both methods is significantly more likely to warrant investigation. The unified Risk Watchlist combines both signals into a single ranked view.
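As an illustration only, combining the two signals into tiers might look like the sketch below. The thresholds and rules are invented for demonstration; the actual Risk Watchlist logic differs.

```python
from typing import Optional

def risk_tier(n_stat_flags: int, ml_score: float) -> Optional[str]:
    """Assign a unified tier from a statistical-flag count and an ML score in [0, 1]."""
    if n_stat_flags >= 3 and ml_score >= 0.90:
        return "Critical"
    if n_stat_flags >= 2 and ml_score >= 0.75:
        return "High"
    if n_stat_flags >= 1 or ml_score >= 0.75:
        return "Elevated"
    if ml_score >= 0.50:
        return "ML Flag"  # ML signal alone, no statistical flags
    return None           # below every threshold: not on the watchlist

print(risk_tier(4, 0.95))  # Critical
print(risk_tier(0, 0.60))  # ML Flag
```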
Feature Importance
The model uses 14 billing features to identify patterns similar to confirmed fraud cases.
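Reading per-feature importances out of a fitted random forest is straightforward in scikit-learn. The sketch below uses placeholder feature names and synthetic data rather than our real 14 billing features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 14))
y = (rng.random(2000) < 0.05).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)
names = [f"feature_{i:02d}" for i in range(14)]  # placeholders for the real names
ranked = sorted(zip(names, clf.feature_importances_),
                key=lambda t: t[1], reverse=True)
for name, importance in ranked:
    print(f"{name}  {importance:.3f}")
```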
Score Distribution
Most providers score very low. Only the top percentiles show patterns consistent with known fraud.
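A quick way to see this shape is to summarize score percentiles. The sketch below uses a synthetic right-skewed distribution as a stand-in for the real scores, which would come from predict_proba over all 594K providers.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-in: heavily right-skewed, like the real score distribution
scores = rng.beta(0.5, 20.0, size=594_000) * 100

for p in (50, 90, 99, 99.9):
    print(f"p{p}: {np.percentile(scores, p):.2f}%")
```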
Cross-Validation: Full-Dataset Training
To validate our approach, we trained three models on the full 594,235-provider dataset using Google Colab (12GB RAM). The results confirm our subsampled model as the strongest performer.
Random Forest (Full): AUC 0.7656 (594K training samples)
Gradient Boosting: AUC 0.6815 (594K training samples)
Logistic Regression: AUC 0.6812 (594K training samples)
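For reference, this three-way comparison takes only a few lines of scikit-learn. The sketch below runs on synthetic stand-in data with near-default hyperparameters, which may not match our Colab configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 14))
y = (rng.random(5000) < 0.01).astype(int)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: AUC {auc:.4f}")
```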
Key finding: Our production model (subsampled, AUC 0.7762) outperforms the full-dataset Random Forest (AUC 0.7656). This is because strategic subsampling — using 10K negative samples instead of 593K — reduces noise from the massive legitimate-provider class, allowing the model to better learn fraud patterns. The top-ranked providers are nearly identical across both models, confirming the robustness of our scoring.
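The subsampling step itself is simple: keep every positive label and draw roughly 10K negatives at random. The sketch below shows the idea; the function and variable names are illustrative.

```python
import numpy as np

def subsample_negatives(X, y, n_neg=10_000, seed=42):
    """Keep every positive label; draw n_neg negatives at random."""
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = rng.choice(np.flatnonzero(y == 0), size=n_neg, replace=False)
    idx = np.concatenate([pos_idx, neg_idx])
    rng.shuffle(idx)
    return X[idx], y[idx]

# Tiny demo on synthetic data
X = np.random.default_rng(0).normal(size=(50_000, 14))
y = np.zeros(50_000, dtype=int)
y[:100] = 1
X_sub, y_sub = subsample_negatives(X, y)
print(X_sub.shape, y_sub.sum())  # (10100, 14) 100
```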
Important Disclaimer
ML scores identify statistical patterns similar to known fraud cases. A high score is not evidence of fraud. Many legitimate providers may score highly due to unusual but lawful billing patterns (e.g., specialized practices, government entities, high-volume home care). These scores should be used as one input among many in any investigation.