Statistical flags indicate unusual patterns — not proof of fraud or wrongdoing. Read our methodology

ML scores are now integrated into the Unified Risk Watchlist

ML fraud similarity scores have been combined with our 9 statistical tests into a single unified risk system. Providers are ranked by a combination of statistical flags and ML scores into unified tiers: Critical, High, Elevated, and ML Flag. View the Risk Watchlist →

ML Methodology

How our random forest model works: it is trained on 514 confirmed-excluded providers from the OIG LEIE database, and it scores 594K active Medicaid providers for fraud similarity.

Model: Random Forest (ensemble classifier)
AUC Score: 0.7762 (5-fold cross-validation)
Providers Scored: 594K (active Medicaid providers)
Training Labels: 514 (OIG-excluded providers)
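The setup described above (514 positive labels, 5-fold cross-validated AUC) can be sketched as follows. The feature matrix here is random placeholder data, and the scikit-learn hyperparameters (`n_estimators`, `class_weight`) are illustrative assumptions, not the production configuration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Placeholder feature matrix: rows = providers, columns = billing features
# (payments per month, total payments, etc.). Real values come from claims data.
n_pos, n_neg, n_features = 514, 10_000, 14
X = rng.normal(size=(n_pos + n_neg, n_features))
y = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])  # 1 = OIG-excluded

clf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                             random_state=0)
# 5-fold cross-validated AUC, the metric reported above
auc_scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(f"mean CV AUC: {auc_scores.mean():.4f}")
```

On the real features this procedure yields the 0.7762 reported above; on random placeholder data it will hover near 0.5.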

Two Complementary Approaches to Fraud Detection

Statistical Tests

9 rule-based tests that flag specific, explainable anomalies in billing behavior.

  • Identifies exact codes, ratios, and dollar amounts
  • Human-readable explanations for every flag
  • Code-specific benchmarks (9,578 codes)
  • Catches billing swings, outlier pricing, new entrants

ML Model

Pattern matching against 514 confirmed fraud cases from the OIG exclusion list.

  • Learns complex multi-feature fraud signatures
  • Catches patterns humans might miss
  • Scores every provider on a 0–100% scale
  • Validated via full-dataset cross-validation

Why both matter: Statistical tests are precise and explainable — they tell you exactly what's unusual. ML captures subtler patterns across multiple features simultaneously. A provider flagged by both methods is significantly more likely to warrant investigation. The unified Risk Watchlist combines both signals into a single ranked view.
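One way to picture the combination is a simple tiering rule over the two signals. Everything below (the function name, flag counts, and score cutoffs) is a hypothetical sketch, not the published watchlist logic; the cutoffs loosely echo the score percentiles reported elsewhere on this page:

```python
def unified_tier(stat_flags: int, ml_score: float) -> str:
    """Map a statistical flag count and an ML score (0-1) to a risk tier.
    All thresholds are hypothetical placeholders."""
    if stat_flags >= 3 and ml_score >= 0.69:   # strong agreement between methods
        return "Critical"
    if stat_flags >= 2 or ml_score >= 0.69:
        return "High"
    if stat_flags >= 1 or ml_score >= 0.50:
        return "Elevated"
    if ml_score >= 0.42:                       # ML signal only
        return "ML Flag"
    return "None"

print(unified_tier(3, 0.75))  # prints "Critical"
```

The key property this illustrates: a provider flagged strongly by both methods outranks one flagged by either alone.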

Feature Importance

How much each feature contributes to the model's fraud-similarity predictions. Importance values are derived from the trained random forest's Gini impurity decrease across all decision trees.
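A minimal sketch of how these numbers are read off a fitted scikit-learn forest; the toy data and feature names below are invented for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
feature_names = ["payments_per_month", "total_payments", "claims_per_month"]
X = rng.normal(size=(500, len(feature_names)))
# Make the first feature genuinely predictive so it earns most of the importance.
y = (X[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)

# feature_importances_ is the mean decrease in Gini impurity attributed to
# each feature across all trees, normalized to sum to 1.
for name, imp in sorted(zip(feature_names, rf.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.1%}")
```

Because the importances are normalized, the percentages in the table below sum to 100%.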

Payments Per Month: 14.2%
Total Payments: 12.8%
Claims Per Month: 11.2%
Total Claims: 9.8%
Cost Per Claim: 9.1%
Cost Per Beneficiary: 8.2%
Total Beneficiaries: 7.1%
Claims Per Beneficiary: 6.3%
Top Code Concentration: 5.6%
Active Months: 4.8%
Unique Procedure Codes: 3.9%
Self-Billing Ratio: 3.1%
Short Burst Billing: 2.2%
Low Codes / High Spend: 1.7%

What the Top Features Mean

Understanding why these features matter for fraud detection:

Payments Per Month

How much a provider bills per active month. Fraudulent providers often bill at extremely high monthly rates because they're trying to extract maximum money before detection.

Total Payments

The total amount of Medicaid money received. While large legitimate organizations bill high amounts, an outsized total combined with other red flags is a strong signal.

Claims Per Month

The volume of claims filed each month. Unusually high claim velocity — especially combined with few unique codes — suggests automated or fabricated billing.

Cost Per Claim

The average charge per individual claim. Legitimate providers cluster around their specialty's median. Far above that suggests upcoding or inflated billing.

Cost Per Beneficiary

How much is billed per individual patient. Fraud schemes often bill enormous amounts per patient — sometimes for patients who never received services.

Top Code Concentration

What fraction of billing goes to a single procedure code. Legitimate practices bill diverse codes; 'fraud mills' repeatedly bill one lucrative code.

Score Distribution

Most providers score very low. Only the top percentiles show patterns consistent with known fraud.

Median (p50): 10% (typical provider)
p90: 42% (top 10%)
p95: 50% (top 5%)
p99: 69% (top 1%)
p99.9: 82% (top 0.1%)
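These cut points are just order statistics of the score vector. A sketch with synthetic right-skewed scores standing in for the real model output:

```python
import numpy as np

rng = np.random.default_rng(2)
# Placeholder: a right-skewed distribution in [0, 1], mimicking the shape
# described above (most providers low, a thin high-scoring tail).
scores = rng.beta(1.5, 10, size=100_000)

for p in (50, 90, 95, 99, 99.9):
    print(f"p{p}: {np.percentile(scores, p):.0%}")
```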

Cross-Validation: Full-Dataset Training

To validate our approach, we trained three models on Google Colab (12GB RAM) using all 594,235 providers that met the minimum billing thresholds for ML scoring. The results confirm our subsampled model as the strongest performer.

Random Forest (Full): 0.7656 AUC (594K training samples)
Gradient Boosting: 0.6815 AUC (594K training samples)
Logistic Regression: 0.6812 AUC (594K training samples)

Key finding: Our production model (subsampled, AUC 0.7762) outperforms the full-dataset Random Forest (AUC 0.7656). This is because strategic subsampling — using 10K negative samples instead of 593K — reduces noise from the massive legitimate-provider class, allowing the model to better learn fraud patterns. The top-ranked providers are nearly identical across both models, confirming the robustness of our scoring.
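The subsampling step itself is straightforward: keep every positive label and draw a random subset of negatives. The counts below match the text; the data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(3)
y_full = np.zeros(594_235, dtype=int)
y_full[:514] = 1  # OIG-excluded labels

pos_idx = np.flatnonzero(y_full == 1)                 # keep all 514 positives
neg_idx = rng.choice(np.flatnonzero(y_full == 0),
                     size=10_000, replace=False)      # 10K of ~593K negatives
train_idx = np.concatenate([pos_idx, neg_idx])

print(len(train_idx))            # 10514 training rows
print(y_full[train_idx].mean())  # positive rate ~4.9%, vs ~0.09% in the full data
```

This rebalancing raises the positive rate roughly fifty-fold, letting the trees spend their splits on fraud signal rather than on modeling the bulk of legitimate providers.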

Model Performance & Limitations

AUC of 0.7762 indicates moderate discriminative ability. The model is better than random chance but should be considered a screening tool, not definitive evidence. We are working to improve model performance through additional features and refined training data.

At a 0.5 classification threshold, the model favors recall over precision — it casts a wide net to avoid missing potentially anomalous providers, at the cost of more false positives. In practice, this means many flagged providers will be legitimate upon closer review.
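The precision/recall trade-off can be made concrete on synthetic scores; the prevalence and score distribution here are invented for illustration:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(4)
y_true = rng.random(5_000) < 0.01   # ~1% true positives
# Scores loosely correlated with the label, standing in for model output.
scores = np.clip(0.4 * y_true + rng.beta(2, 6, size=5_000), 0, 1)

y_pred = scores >= 0.5  # the 0.5 classification threshold discussed above
print(f"recall:    {recall_score(y_true, y_pred):.2f}")
print(f"precision: {precision_score(y_true, y_pred):.2f}")
```

At this cutoff most true positives are caught (high recall) while many flagged cases turn out to be false positives (low precision), mirroring the behavior described above.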

Key Limitations

  • Training labels are based on OIG exclusions, which include non-fraud reasons (e.g., student loan default, license revocation) — this introduces label noise.
  • No temporal validation yet — the model has not been tested on held-out future time periods to confirm it generalizes beyond the training window.
  • Feature set is limited to billing aggregates; clinical context and audit outcomes are not yet incorporated.

Important Disclaimer

ML scores identify statistical patterns similar to known fraud cases. A high score is not evidence of fraud. Many legitimate providers may score highly due to unusual but lawful billing patterns (e.g., specialized practices, government entities, high-volume home care). These scores should be used as one input among many in any investigation.