Statistical flags indicate unusual patterns, not proof of fraud or wrongdoing. See our methodology page.

Download Data

All datasets are derived from HHS Open Data (227 million Medicaid billing records, 2018–2024). Files are in JSON format and can be opened with a text editor, Python, R, or any other data analysis tool.
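Since the layout of each file is not described here, a reasonable first step in Python is to load a file and look at its top-level shape. The sketch below uses only the standard library. The filename in the comment, the helper name `describe_json`, and the inline sample record are illustrative assumptions, not part of the published data.

```python
import json

def describe_json(data, max_keys=10):
    """Summarize the top-level shape of parsed JSON data."""
    if isinstance(data, list):
        first = data[0] if data else None
        keys = sorted(first)[:max_keys] if isinstance(first, dict) else []
        return {"type": "list", "length": len(data), "first_record_keys": keys}
    if isinstance(data, dict):
        return {"type": "object", "length": len(data), "keys": sorted(data)[:max_keys]}
    return {"type": type(data).__name__}

# For a downloaded file (the filename here is hypothetical):
#     with open("top_providers.json", encoding="utf-8") as f:
#         data = json.load(f)
# A small made-up stand-in keeps this example self-contained:
sample = json.loads('[{"npi": "1000000001", "state": "XX"}]')
summary = describe_json(sample)
print(summary)
```

Knowing whether a file is a top-level array of records or an object keyed by some identifier determines how the records should be iterated in later steps.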

Unified Risk Watchlist

Combine the statistical watchlist and the ML scores file below to build a unified view of flagged providers. The statistical watchlist contains 880 providers flagged by code-specific billing tests; the ML scores file contains fraud-similarity scores from a model trained on 514 OIG-excluded providers. Join the two on the npi field to see each provider's statistical flags alongside its ML score.
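The join on npi can be sketched in plain Python as an outer join, so that providers appearing in only one file are kept. This assumes each file parses to a list of records that each carry an `npi` field; the other field names (`flags`, `score`) and the sample values are made up for illustration.

```python
def index_by_npi(records):
    """Map each record's NPI (normalized to a string) to the record."""
    return {str(r["npi"]): r for r in records}

def merge_on_npi(stat_records, ml_records):
    """Outer-join two lists of provider records on the npi field."""
    stat = index_by_npi(stat_records)
    ml = index_by_npi(ml_records)
    merged = []
    for npi in sorted(stat.keys() | ml.keys()):
        merged.append({
            "npi": npi,
            "statistical": stat.get(npi),  # None if not on the statistical watchlist
            "ml": ml.get(npi),             # None if not scored by the model
        })
    return merged

# Made-up records standing in for the two downloads (not real providers)
stat_sample = [{"npi": "1000000001", "flags": ["example_flag"]}]
ml_sample = [
    {"npi": "1000000001", "score": 0.5},
    {"npi": 1000000002, "score": 0.1},  # numeric NPI, normalized to a string
]
merged = merge_on_npi(stat_sample, ml_sample)
for row in merged:
    print(row["npi"], row["statistical"] is not None, row["ml"] is not None)
```

Normalizing the NPI to a string guards against one file storing it as a number and the other as text. An outer join is used because the two files cover different provider sets (880 flagged, 700 scored), so an inner join would drop providers present in only one of them.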

Risk Watchlist (Statistical)

880 providers flagged by 4 code-specific fraud detection tests. Includes flag types, flag details with specific codes and ratios, provider demographics, and total spending.

682 KB · 880 providers · JSON

Risk Watchlist (Legacy)

788 providers flagged by 9 legacy fraud detection tests including outlier spending, explosive growth, beneficiary stuffing, and billing consistency anomalies.

356 KB · 788 providers · JSON

ML Fraud Scores

Machine learning fraud-similarity scores for 700 top providers. The Random Forest model was trained on 514 OIG-excluded providers. Includes feature values such as cost per claim, code concentration, and self-billing ratio.

235 KB · 700 scored providers · JSON

Top 1,000 Providers

The 1,000 highest-spending Medicaid providers ranked by total payments. Includes NPI, name, specialty, city, state, total paid, claims, beneficiaries, and flag counts.

265 KB · 1,000 providers · JSON

State Summaries

Aggregated Medicaid spending data for all 50 states. Includes total payments, claims, beneficiaries, provider counts, and top procedures by state.

6 KB · 50 states · JSON

Procedure Codes

All 10,881 HCPCS procedure codes billed to Medicaid with total payments, claim counts, provider counts, and average cost per claim.

1.1 MB · 10,881 codes · JSON

Code Benchmarks

National cost-per-claim benchmarks for 9,578 procedure codes. Includes average, median, standard deviation, and percentile distributions (p10 through p99).

2.7 MB · 9,578 codes · JSON
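One way to use the percentile benchmarks is to check where a provider's cost per claim falls in the national distribution for a code. The sketch below assumes each benchmark entry stores its percentiles under keys such as `p10`, `p50`, `p90`, and `p99`; the exact key names and the set of percentiles in the file are not confirmed here, and the sample entry is made up.

```python
def highest_percentile_exceeded(cost_per_claim, benchmark):
    """Return the largest percentile key (e.g. "p90") whose benchmark value
    is at or below cost_per_claim, or None if the cost is below all of them."""
    # Collect keys shaped like "p10" or "p99" and order them numerically
    pct_keys = sorted(
        (k for k in benchmark if k.startswith("p") and k[1:].isdigit()),
        key=lambda k: int(k[1:]),
    )
    exceeded = None
    for key in pct_keys:
        if cost_per_claim >= benchmark[key]:
            exceeded = key
    return exceeded

# Made-up benchmark entry for a hypothetical code (not real data)
example_benchmark = {
    "code": "X0000", "median": 40.0,
    "p10": 12.0, "p50": 40.0, "p90": 95.0, "p99": 210.0,
}
result = highest_percentile_exceeded(120.0, example_benchmark)
print(result)
```

With this sample entry, a cost per claim of 120.0 is above the p90 value but below p99, so the function returns "p90". A high percentile is only a signal of unusual billing for that code, consistent with the caveats on this page.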

Yearly Trends

Annual Medicaid spending totals from 2018 to 2024. Includes total payments, claims, beneficiaries, and provider counts per year.

1 KB · 7 years · JSON

Data Usage & Citation

This data is derived from publicly available U.S. Department of Health & Human Services Medicaid provider spending records. The underlying data is in the public domain. Our analysis, risk scores, and derived datasets may be freely used with attribution.

Suggested citation: OpenMedicaid by TheDataProject.ai. Analysis of HHS Medicaid Provider Spending data (2018–2024). Available at openmedicaid.org.

Important caveats: Statistical flags and ML scores indicate unusual billing patterns worth investigating — they are not proof of fraud or wrongdoing. Government entities, home care programs, hospitals, and large care organizations may legitimately bill at high rates. See our methodology page for details on how flags are calculated.