White paper

HHS’s Medicaid provider spending data: Use cases and considerations for state agencies and managed care organizations

By John Belanger, Alex Scharpenberg, and Adam Hearn

2 April 2026

In February 2026, the Department of Health and Human Services (HHS) published a large-scale, open-source dataset of Medicaid payments,¹ enabling nationwide provider- and service code‑level analysis by month of service from January 2018 through December 2024.²

The dataset is limited to outpatient and professional claims, and includes claims covered under Medicaid fee for service, Medicaid managed care, and the Children’s Health Insurance Program (CHIP). Aggregated from Transformed Medicaid Statistical Information System (T-MSIS) data,³ the dataset includes the following information.

Billing and servicing provider: national provider identifier (NPI) for both the billing and servicing providers associated with a claim
Service type: Healthcare Common Procedure Coding System (HCPCS) code, without modifier detail
Claim service month: the month in which the service occurred (e.g., January 2018 represents Medicaid-paid expenditures for services delivered in January 2018)
Utilization, expenditures, and recipient counts: the distinct count of beneficiaries served, the total claim count, and total Medicaid-paid expenditures for a given provider/HCPCS/incurred month combination⁴
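For readers planning to load the file, the fields listed above can be sketched as a simple record type. This is an illustrative sketch only: every field name below (`billing_npi`, `servicing_npi`, `hcpcs_code`, `service_month`, `recipient_count`, `claim_count`, `paid_amount`) is an assumption made for illustration and may not match the column names in the published file, and the values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpendingRow:
    """One aggregated row: provider pair, HCPCS code, and service month.

    All field names are hypothetical; check the published data dictionary.
    """
    billing_npi: str       # NPI of the billing provider
    servicing_npi: str     # NPI of the servicing provider
    hcpcs_code: str        # HCPCS code, no modifier detail
    service_month: str     # month of service, e.g. "2018-01"
    recipient_count: int   # distinct beneficiaries served
    claim_count: int       # total claims
    paid_amount: float     # total Medicaid-paid expenditures


def paid_per_claim(row: SpendingRow) -> float:
    """Average Medicaid-paid amount per claim for a single row."""
    if row.claim_count == 0:
        raise ValueError("claim_count is zero")
    return row.paid_amount / row.claim_count


# Made-up values for illustration only.
example = SpendingRow("1111111111", "2222222222", "99213", "2018-01", 40, 50, 3000.0)
print(paid_per_claim(example))  # 3000.0 / 50 = 60.0
```

Because each row is already aggregated, any measure built from it is an average over the underlying claims rather than a claim-level value.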

As we discuss later in further detail, one potential use of this dataset is flagging provider anomalies as part of fraud, waste, and abuse (FWA) initiatives. Although that use has received substantial public attention, state Medicaid agencies and managed care organizations (MCOs) may be able to leverage the open-source Medicaid payments data in several additional ways to better understand and manage their Medicaid populations.

In the following sections, we outline four potential uses of the HHS data for state Medicaid agencies and MCOs, including provider anomaly detection. While each use case includes context-specific considerations, many are relevant across all four.

1. Support Medicaid contract negotiations with healthcare providers.

While state Medicaid agencies have access to all payment data within their state, MCOs do not. MCOs may consider leveraging the open-source Medicaid payments data to better understand service code-level volumes and reimbursement of Medicaid services for each provider. This information can be especially useful when informing provider network contracting decisions. Questions that this type of analysis could help inform include:

What is the average service code-level cost per unit for a given provider, and how does it compare to my organization’s contracted rate?
How much of the provider’s Medicaid revenue from outpatient and professional services does my organization account for, even at the service code level?
How has each of the items above changed over time?
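As a rough illustration of the first two questions, the sketch below computes an average paid amount per claim for each provider and service code, compares it to a contracted rate, and estimates the MCO's share of the provider's Medicaid-paid dollars. It assumes the public data supplies claim counts rather than billed units, so the per-claim average is only a proxy for cost per unit. All names, NPIs, rates, and dollar amounts are hypothetical.

```python
from collections import defaultdict

# Hypothetical public rows:
# (billing_npi, hcpcs_code, service_month, claim_count, paid_amount)
public_rows = [
    ("1111111111", "99213", "2024-01", 100, 6000.0),
    ("1111111111", "99213", "2024-02", 120, 7500.0),
    ("1111111111", "99214", "2024-01", 40, 3600.0),
]

# Hypothetical internal MCO figures keyed by (npi, hcpcs).
mco_contracted_rate = {("1111111111", "99213"): 58.0, ("1111111111", "99214"): 95.0}
mco_paid = {("1111111111", "99213"): 4050.0, ("1111111111", "99214"): 900.0}


def summarize(rows):
    """Total claims and paid dollars per (provider, HCPCS) across all months."""
    totals = defaultdict(lambda: [0, 0.0])
    for npi, hcpcs, _month, claims, paid in rows:
        totals[(npi, hcpcs)][0] += claims
        totals[(npi, hcpcs)][1] += paid
    return totals


def compare(rows, contracted, own_paid):
    """Average paid per claim, its ratio to the contracted rate, and the MCO's share."""
    out = {}
    for key, (claims, paid) in summarize(rows).items():
        if claims == 0 or paid == 0:
            continue  # skip rows where the ratios are undefined
        avg = paid / claims
        out[key] = {
            "avg_paid_per_claim": avg,
            "ratio_to_contracted": avg / contracted[key],
            "mco_share_of_paid": own_paid[key] / paid,
        }
    return out


results = compare(public_rows, mco_contracted_rate, mco_paid)
for key, metrics in results.items():
    print(key, {name: round(value, 3) for name, value in metrics.items()})
```

Running the same summary separately by year or month would address the third question, how these measures change over time.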

As MCOs look to answer these questions, there are several important caveats and considerations when using the open-source Medicaid payments data. While not exhaustive, key considerations include the following.

The presence of value-based payments: Many providers are enrolled in value-based payment programs with payers, which allow payers to negotiate lower contracted rates while paying performance bonuses. These performance bonuses are part of payers’ total cost of care but, in many cases, are omitted from encounter data.
The presence of supplemental and directed payments: Supplemental and directed payments, such as upper payment-limit supplements and other state-directed payments, are often disbursed outside of the claims process and are unlikely to be reflected as expenditures in the open-source Medicaid payments dataset. This is particularly relevant for academic medical centers and other safety-net providers, where supplemental payments can represent a material share of total Medicaid reimbursement.
The presence of sub-capitation arrangements: Some providers cover a subset of services under a sub-capitation arrangement with an MCO. These payer-to-provider sub-capitation payments are generally included within a payer’s encounter data. At times, the sub-capitation payment is billed under a single service code and is intended to cover the cost of a range of services, while the covered services themselves are billed as “penny” claims (further described below). The cost and/or utilization for the service code that carries the sub-capitation payment may therefore appear artificially high for the given provider compared to its peers. Likewise, the services encompassed by the sub-capitation payment may appear to have artificially low expenditures.

In some cases, providers submit “penny” claims for services encompassed by the sub-capitation arrangement, which may include units with seemingly no expenditures. While the Centers for Medicare & Medicaid Services (CMS) provided guidance for the T-MSIS data submissions underlying the open-source Medicaid payments data, the open-source data does not provide the level of granularity needed to identify claims impacted by these types of arrangements. This consideration may also be of particular interest when determining cost metrics, as these discrepancies can create biases.

Coordination of benefits and third-party liability (TPL): The expenditure figures in the open-source Medicaid payments dataset represent Medicaid-paid amounts derived from T-MSIS, as opposed to allowed amounts or total cost of service. Because Medicaid is the payer of last resort, the Medicaid-paid amount for beneficiaries also covered by Medicare or other non-Medicaid coverage reflects only the residual portion after the primary payer. This can materially depress apparent spending for affected service codes. The open-source Medicaid payments dataset does not include an indicator for dual eligibility or other TPL, so users cannot isolate these claims or quantify the degree to which coordination of benefits compresses the reported expenditure figures. As a result, spending comparisons across providers or states will reflect differences in both negotiated rates and payer mix, rather than the underlying cost of service delivery alone.
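The payer-mix effect described in the last consideration can be shown with simple arithmetic. The sketch below uses made-up figures: a hypothetical service for which Medicaid pays $60 per claim as the primary payer but only a $12 residual when another payer is primary. Neither amount comes from the dataset.

```python
# Hypothetical Medicaid-paid amounts per claim for the same service.
PAID_MEDICAID_PRIMARY = 60.0    # Medicaid is the only payer
PAID_MEDICAID_RESIDUAL = 12.0   # Medicaid pays what remains after a primary payer


def average_medicaid_paid(primary_claims, residual_claims):
    """Average Medicaid-paid amount per claim for a given payer mix."""
    total_paid = (primary_claims * PAID_MEDICAID_PRIMARY
                  + residual_claims * PAID_MEDICAID_RESIDUAL)
    return total_paid / (primary_claims + residual_claims)


# Two hypothetical providers paid under identical rates but with different payer mix.
provider_x = average_medicaid_paid(primary_claims=1000, residual_claims=0)
provider_y = average_medicaid_paid(primary_claims=700, residual_claims=300)

print(provider_x)  # 60.0
print(provider_y)  # (700*60 + 300*12) / 1000 = 45.6
```

Both providers are paid identically for identical services, yet the second appears roughly 24% cheaper per claim. Without a dual-eligibility or TPL indicator, the public data cannot separate this effect from a true rate difference.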

2. Compare Medicaid expenditures and utilization levels across states.

State Medicaid agencies could compare trends in expenditures or utilization levels for certain services across states. This type of analysis could be used as a cost and utilization benchmarking tool. However, state-to-state comparisons are inherently challenging and require careful consideration of several factors, a subset of which are outlined below. Once key differences between the states being compared are identified, understood, and accounted for, users may be able to use the resulting analysis as a starting point for launching more targeted initiatives.

The underlying T-MSIS data have known data quality and completeness challenges. CMS assesses these issues at the state level through validation and monitoring processes, and they can affect the usability of certain data elements.⁵
Geographic differences may cause variations in cost and utilization. For instance, cost of living can impact provider unit costs, while factors such as population density and geographic isolation can affect access to services and in turn utilization.
HCPCS codes are not applied uniformly across state Medicaid programs or plans. Service definitions and billing configurations can vary state to state and even between MCOs within a state. As a result, direct comparisons at the individual code level may reflect administrative differences rather than true variations in service delivery, unit cost, utilization, or policy.

Coding practices also evolve over time. New codes are introduced for historically provided services, legacy codes are retired, and adoption of changes can vary across programs. These shifts can create spikes or dips in service utilization that are due to administrative changes rather than actual changes in service delivery.
In general, Medicaid services that are also covered by Medicare tend to have more consistent coding configurations across states and payers, while Medicaid-eligible services that are not Medicare-reimbursable (such as behavioral health) tend to have less consistent coding configurations.

Modifiers often provide additional details about how services are delivered (e.g., in an office setting, in-home, via telehealth, daytime versus overnight, etc.) as well as what unit types represent (e.g., 15 minutes, 1 hour, per diem, etc.). Without this level of detail provided in the open-source Medicaid payments dataset, users should consider how the modifier mix underlying the service code might bias results.

For example, certain services may have higher Medicaid rates when provided overnight as opposed to during the daytime. If overnight services are delineated through the use of a modifier code, without this detail, providers administering more frequent overnight services may appear to have a higher unit cost for what appears to be the same service based solely on the HCPCS code.
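The overnight example above can be made concrete with a short calculation. The rates and unit volumes are hypothetical and chosen only to show how modifier mix alone can shift the apparent unit cost of a single HCPCS code.

```python
# Hypothetical per-unit rates for one HCPCS code, distinguished only by a modifier.
RATES = {"daytime": 20.0, "overnight": 30.0}


def blended_unit_cost(units_by_setting):
    """Average paid per unit once the modifier detail is collapsed away."""
    total_units = sum(units_by_setting.values())
    total_paid = sum(RATES[setting] * units for setting, units in units_by_setting.items())
    return total_paid / total_units


provider_a = {"daytime": 900, "overnight": 100}   # mostly daytime services
provider_b = {"daytime": 400, "overnight": 600}   # mostly overnight services

cost_a = blended_unit_cost(provider_a)  # (900*20 + 100*30) / 1000 = 21.0
cost_b = blended_unit_cost(provider_b)  # (400*20 + 600*30) / 1000 = 26.0
print(cost_a, cost_b)
```

Both providers are paid the same rates, but the second appears about 24% more expensive on a code-level view because the modifier that explains the difference is not in the data.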

Medicaid program design can vary materially state to state, impacting covered services as well as the composition and size of enrolled populations.

For example, several states have adopted Medicaid expansion since 2018, which could materially distort cost and utilization trends in these states. Among expansion states, the share of Medicaid beneficiaries enrolled through the expansion population varies widely, ranging from about 13% to 53% of total Medicaid enrollment depending on the state.⁶
Given differences in covered services between state programs, comparisons of HCPCS-level costs or utilization may be distorted. A service or benefit available in one state may not be offered or may be effectively substituted by a service under a different code in another state. In addition, state policies related to state minimum fee schedules, clinical coverage policies, and restrictions on prior authorizations also result in material utilization and expenditure variances.

3. Analyze Medicaid recipient measures.

Given that the open-source Medicaid payments dataset provides claim expenditures and recipient counts, cost or utilization per-recipient per-month measures can be particularly useful when analyzing drivers of utilization trends over time. These measures help users understand when increases are driven by growth in the number of recipients or by increases in the average number of units a recipient receives per month. For example, an MCO could use these measures to assess whether its members are receiving materially higher utilization per recipient per month than the broader Medicaid population for a given service. Similarly, a state agency could identify services where its utilization per recipient per month is materially higher than that of a benchmarking state and begin to investigate the factors driving the discrepancy.
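One way to separate the two drivers is to use the identity claims = recipients × (claims per recipient) and compare two periods on a log scale, where the two effects add up exactly to the total change. The sketch below is one possible decomposition, not a method prescribed by the dataset documentation. It uses claim counts as the utilization measure, and the input figures are hypothetical.

```python
import math


def decompose_growth(recipients_0, claims_0, recipients_1, claims_1):
    """Split claim growth into a recipient effect and an intensity effect.

    Uses log changes, so the two effects sum exactly to the total change.
    """
    intensity_0 = claims_0 / recipients_0   # claims per recipient, period 0
    intensity_1 = claims_1 / recipients_1   # claims per recipient, period 1
    return {
        "total": math.log(claims_1 / claims_0),
        "recipients": math.log(recipients_1 / recipients_0),
        "intensity": math.log(intensity_1 / intensity_0),
    }


# Hypothetical figures for one provider/HCPCS combination in two months.
effects = decompose_growth(recipients_0=1000, claims_0=2000,
                           recipients_1=1100, claims_1=2640)
for name, value in effects.items():
    print(f"{name}: {value:.4f}")
```

Here claims grow 32%, of which a 10% increase comes from more recipients and a 20% increase from more claims per recipient (1.10 × 1.20 = 1.32). The decomposition is cleanest at the level of a single row, where the recipient count is a true distinct count.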

When using the open-source Medicaid payments data to estimate cost and utilization rates, users should consider several limitations.

Recipient counts may be duplicated when aggregating certain splits of the data. For example, if an individual receives two primary care services in a given month, but the services are captured through different HCPCS codes or are provided by different NPIs, the summarized data may capture this individual recipient twice.
While the open-source Medicaid payments data offers recipient counts, which reflect the distinct number of individuals receiving a service, enrollment counts are typically another key metric in healthcare analytics. As opposed to the number of individuals receiving services, enrollment counts reflect the number of individuals who were eligible for Medicaid during a given time and could receive services (but did not necessarily do so). Often, per-enrollee measures provide additional insights beyond per-recipient measures, such as the percentage of Medicaid enrollees receiving a given service or the program cost per enrolled individual. However, Medicaid enrollment context is missing from the dataset. It may be possible to approximate enrollment in the geographical area of the servicing NPI using publicly available sources such as the American Community Survey or county-level Medicaid enrollment data published by a state; however, individuals can incur services outside of their geographical catchment area, often crossing county and state lines. Without additional details regarding the recipient’s county or state of residence, estimates of per-enrollee measures will be constrained by this limitation.
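The duplication issue described in the first limitation above can be demonstrated with a toy example. The member-level records below are hypothetical and represent detail that the public file does not expose. They are used only to show why summing per-row distinct counts overstates the number of people served.

```python
# Hypothetical member-level claims, all in the same service month:
# (member_id, servicing_npi, hcpcs_code)
claims = [
    ("M1", "1111111111", "99213"),
    ("M1", "2222222222", "99214"),  # same person, different provider and code
    ("M2", "1111111111", "99213"),
]

# What an aggregated file would report: distinct members per (NPI, HCPCS) row.
row_members = {}
for member, npi, hcpcs in claims:
    row_members.setdefault((npi, hcpcs), set()).add(member)
row_counts = {key: len(members) for key, members in row_members.items()}

# Summing the per-row distinct counts counts member M1 twice.
summed_recipients = sum(row_counts.values())
# The true number of distinct people requires the member-level detail.
true_recipients = len({member for member, _npi, _hcpcs in claims})

print(summed_recipients, true_recipients)  # 3 2
```

The summed figure is an upper bound on distinct recipients, so per-recipient measures built from aggregated rows are biased downward.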

4. Use the open-source Medicaid payments dataset to detect anomalies and assess program integrity.

Given the public attention that the open-source Medicaid payments dataset has received as a tool for identifying potential fraud, waste, and abuse, it is worth examining in some detail what anomaly detection approaches the data can and cannot support. Payers and state Medicaid agencies can leverage techniques of varying levels of complexity to flag variances in data and statistical anomalies as part of fraud, waste, and abuse investigations.

An example of a more straightforward approach might include comparing provider billing patterns of evaluation and management codes that are usually provided in an office setting. Since there are distinct evaluation and management codes for each severity level, users could analyze whether certain providers appear to have an irregularly high distribution of high-severity visits. While there can be valid reasons for why a provider bills higher severity levels than its peers, this type of analysis can be a good starting point for better understanding the utilization and population within a Medicaid program.
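A minimal sketch of this kind of screen follows. It assumes the analysis is restricted to the established-patient office visit codes 99211 through 99215, treats 99214 and 99215 as the high-severity levels, and flags providers whose high-severity share exceeds 1.5 times the peer median. The provider labels, claim counts, and threshold are all hypothetical choices for illustration, not recommended parameters.

```python
from statistics import median

EM_CODES = {"99211", "99212", "99213", "99214", "99215"}
HIGH_LEVEL = {"99214", "99215"}   # assumed definition of "high severity"
THRESHOLD = 1.5                   # hypothetical multiple of the peer median

# Hypothetical claim counts by provider and code.
claims_by_provider = {
    "A": {"99212": 50, "99213": 300, "99214": 130, "99215": 20},
    "B": {"99212": 40, "99213": 280, "99214": 160, "99215": 20},
    "C": {"99213": 60, "99214": 200, "99215": 140},
    "D": {"99211": 10, "99213": 330, "99214": 140, "99215": 20},
}


def high_level_share(counts):
    """Share of a provider's E&M claims billed at the high-severity levels."""
    total = sum(n for code, n in counts.items() if code in EM_CODES)
    high = sum(n for code, n in counts.items() if code in HIGH_LEVEL)
    return high / total if total else 0.0


shares = {npi: high_level_share(counts) for npi, counts in claims_by_provider.items()}
peer_median = median(shares.values())
flagged = sorted(npi for npi, share in shares.items() if share > THRESHOLD * peer_median)

print(shares)
print(peer_median, flagged)
```

With these made-up counts, only provider C is flagged. A flag of this kind is a prompt for review rather than a finding, since a provider's specialty or patient mix may legitimately explain a higher share.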

More robust anomaly detection can be realized through the application of artificial intelligence (AI) or machine learning (ML) techniques capable of accounting for the clinical and administrative nuances that simpler approaches cannot address. In December 2025, CMS hosted the Crushing Fraud Chili Cook-Off competition, a market-based research challenge that asked participants to develop explainable AI models capable of detecting anomalies in Medicare claims data.⁷ Milliman’s winning submission demonstrated an actuarially grounded framework that adjusts for patient acuity before comparing providers to dynamically constructed peer groups, an approach designed to distinguish legitimate clinical complexity from genuinely anomalous providers.⁸ By first risk-adjusting claims using demographic and clinical factors and then scoring provider behavior relative to peer benchmarks, the model reduces the false-positive problem that plagues simpler threshold-based approaches. The model also incorporates network analytics derived from provider-to-provider patient sharing patterns, enabling detection of coordinated billing activity that single-provider methods would miss.

While the recently published HHS dataset contains provider, service code, and temporal dimensions that can support basic screening for potential FWA, it lacks several of the data elements that would enable more complex analysis. For example:

The dataset does not include diagnosis codes or modifiers, limiting the ability to risk-adjust for patient acuity or distinguish between clinically appropriate and anomalous billing patterns.
Member-level detail is also absent, which precludes the construction of provider-to-provider patient sharing networks used to detect potentially coordinated billing activity.

As a result, analyses based solely on the open-source Medicaid payments data may produce elevated false-positive rates, flagging providers whose billing patterns reflect legitimate differences in patient complexity rather than genuine anomalies.

The open-source Medicaid payments dataset is best viewed as a screening layer: useful for identifying providers or service codes that warrant further investigation, but not a substitute for the claims-level detail required to support defensible findings. Entities seeking to move beyond screening toward actionable program integrity outcomes may wish to consider approaches that incorporate the richer data elements (e.g., diagnosis codes, member-level detail, and provider network relationships) that require more complex analysis.

Conclusion: The open-source Medicaid payments dataset represents a starting point for state agencies and MCOs

The HHS Medicaid provider spending dataset represents a meaningful expansion of publicly available Medicaid data, and the four use cases outlined illustrate the breadth of analytical applications it can support. However, each use case carries substantial nuance. Differences in coding practices, program design, data quality, and payment arrangements across states and payers mean that surface-level analyses of the data may produce misleading results if key contextual factors are not identified and accounted for.

For this reason, the open-source Medicaid payments dataset is best understood as a starting point rather than an end point. It can reveal questions, highlight areas of apparent variation, and provide directional context. However, translating those observations into actionable findings typically requires supplementation with additional data sources, domain-specific expertise, and methodological rigor tailored to the particular use case. State Medicaid agencies and MCOs that invest in understanding the dataset’s structure and limitations, as well as the considerations outlined in this paper, will be better positioned to use it effectively and avoid the analytical pitfalls that can accompany newly available large-scale public data.

For organizations seeking to move beyond the publicly available open-source Medicaid payments data, access to the underlying T-MSIS research identifiable files (RIF) requires a formal application through the CMS Research Data Assistance Center (ResDAC), the purchase of a virtual seat at the CMS Virtual Research Data Center (VRDC), and a review and approval process that CMS estimates at approximately two to four months from application to delivery. Annual seat costs typically range from $15,000 to $22,000 per seat.⁹

Milliman maintains active T-MSIS seat access and has experience working within these constraints, supplementing the research files with additional data sources and actuarial methods to produce the claims-level insights that neither the public dataset nor the RIF data can fully support on its own. Organizations that require timely, analytically rigorous insights from Medicaid data but lack the infrastructure or timeline to establish their own T-MSIS access may wish to consider engaging Milliman for hands-on analytic support.

Limitations

The analysis and use cases presented in this paper are based on Milliman’s review of the publicly available HHS Medicaid provider spending dataset and the supplementary documentation describing its structure and scope. Milliman has not independently verified the completeness or accuracy of this source or of the underlying T-MSIS data, which are submitted by individual state Medicaid agencies and are subject to known data quality limitations that vary by state, data element, and reporting period. The use cases discussed are illustrative and are not intended to represent an exhaustive set of applications or to serve as a substitute for the additional data sourcing, domain-specific expertise, and methodological design required to produce actionable analytical findings. Organizations considering use of the dataset should evaluate it in the context of their own data environment, programmatic needs, and quality assurance standards.

Qualifications

Guidelines issued by the American Academy of Actuaries require actuaries to include their professional qualifications in all actuarial communications. John Belanger and Alex Scharpenberg are members of the American Academy of Actuaries and meet the qualification standards for performing the analyses in this correspondence.

¹ While the HHS website, https://opendata.hhs.gov/datasets/medicaid-provider-spending/, was down for maintenance at the time this paper was published, the dataset and accompanying documentation are widely available on third-party websites.

² Expenditures reported for November and December 2024 decline sharply relative to prior months, a pattern consistent with lagged or incomplete state T-MSIS submissions. See https://www.mcdermottplus.com/blog/regs-eggs/over-easy-or-under-done-first-look-at-doges-medicaid-data/.

³ Medicaid.gov. (n.d.). T-MSIS data guide. Retrieved March 31, 2026, from https://www.medicaid.gov/tmsis/dataguide/v3/.

⁴ The inclusion of managed-care expenditure data is particularly notable, as T-MSIS has historically redacted plan-to-provider payment amounts on managed-care encounters. This data has been historically difficult to access outside of individual state or plan reporting.

⁵ Medicaid.gov. (n.d.). Transformed Medicaid Statistical Information System (T-MSIS). Retrieved March 31, 2026, from https://www.medicaid.gov/medicaid/data-systems/medicaid-and-chip-business-information-solution/transformed-medicaid-statistical-information-system-t-msis.

⁶ Tolbert, J., Bell, C., & Rudowitz, R. (November 27, 2024). KFF. Retrieved April 1, 2026, from https://www.kff.org/medicaid/medicaid-expansion-is-a-red-and-blue-state-issue/.

⁷ CMS.gov. (n.d.). Crushing Fraud Chili Cook-Off Competition. Retrieved April 1, 2026, from https://www.cms.gov/priorities/crushing-fraud-waste-abuse/overview/crushing-fraud-chili-cook-competition.

⁸ Milliman wins CMS “Crushing Fraud Chili Cook-Off” competition. (December 23, 2025). Business Wire. Retrieved April 1, 2026, from https://www.businesswire.com/news/home/20251223398078/en/Milliman-Wins-CMS-Crushing-Fraud-Chili-Cook-Off-Competition.

⁹ Centers for Medicare & Medicaid Services Research Data Assistance Center (April 23, 2024). Chronic Conditions Warehouse (CCW) Virtual Research Data Center (VRDC) Fees Frequently Asked Questions (FAQ). Retrieved April 1, 2026, from https://resdac.org/sites/datadocumentation.resdac.org/files/2024-04/CMS%20Fee%20List%20for%20CCW%20VRDC%20Cloud%20Environment.pdf.
