HTAC Pipeline · Step 6 of 7
Stratification and Suppression
Each site's cohort is stratified across 9 demographic and social dimensions and 4 geographic levels. Any cell with fewer than 11 individuals is suppressed to protect small populations from re-identification, as required by the consortium's Master Data Use Agreement.
Why stratify?
Statewide averages hide disparities. A condition may be twice as prevalent among Black covered lives or patients experiencing homelessness as it is statewide. Stratification makes these inequities visible and actionable for public health programs.
Why suppress at n < 11?
Small-cell suppression reduces the risk of re-identification: if only 3 incarcerated Hmong adults in a county have diabetes, publishing that count could identify specific individuals. With a threshold of 11 (not 10), a count of exactly 10 is also withheld. This threshold is common in public-health DUAs and is consistent with CDC and CMS suppression standards.
Denominator definition
The denominator is persons with at least one VisitOccurrence during the study period — not all persons in the database. This matches typical federated methodology: inactive patients who appear in historical records but had no encounters during the study window are excluded from the denominator.
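As a minimal sketch of this denominator rule, the snippet below counts distinct persons with at least one visit inside the study window. The function name, the (person_id, visit_date) record shape, and the inclusive window boundaries are assumptions made for illustration, not the pipeline's actual schema or code.

```python
from datetime import date


def active_person_ids(visits, study_start, study_end):
    """Return the set of person IDs with at least one visit in the study window.

    `visits` is an iterable of (person_id, visit_date) pairs. The record
    shape and the inclusive boundaries are assumptions for illustration.
    """
    return {
        person_id
        for person_id, visit_date in visits
        if study_start <= visit_date <= study_end
    }


# Hypothetical records: person 3 has only a historical visit and is excluded.
visits = [
    (1, date(2023, 2, 14)),
    (1, date(2023, 9, 1)),   # a second visit does not count the person twice
    (2, date(2023, 6, 30)),
    (3, date(2019, 4, 2)),
]
denominator_ids = active_person_ids(visits, date(2023, 1, 1), date(2023, 12, 31))
denominator = len(denominator_ids)
print(denominator)  # 2
```

Using a set means a person with many encounters still contributes one to the denominator.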
Prevalence Rate Formula
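Prevalence is the number of persons with the condition divided by the denominator defined above. For cells that remain publishable, the rate is rounded half-up to four decimal places. For any cell below the agreed threshold, suppression clears the numerator, the denominator, and the published rate.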
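The rounding rule can be sketched as follows, assuming the rate is expressed as a plain proportion (numerator divided by denominator) rather than per 100 or per 1,000; that scaling, and the function name, are assumptions. The sketch uses Python's decimal module because the built-in round() rounds ties to the nearest even digit, which is not half-up.

```python
from decimal import Decimal, ROUND_HALF_UP

FOUR_PLACES = Decimal("0.0001")


def prevalence_rate(numerator, denominator):
    """Return numerator / denominator rounded half-up to four decimal places.

    Assumption: the rate is a proportion between 0 and 1.
    """
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    rate = Decimal(numerator) / Decimal(denominator)
    return rate.quantize(FOUR_PLACES, rounding=ROUND_HALF_UP)


# 1 / 32 = 0.03125 exactly, so the fifth decimal is a tie.
print(prevalence_rate(1, 32))  # 0.0313 (half-up rounds the tie upward)
```

With half-even rounding the same tie would come out as 0.0312, so the choice of rounding mode is visible in published rates.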
Suppression in Practice
Example: Diabetes by race, County X
Even the denominator is suppressed: publishing the denominator alone could reveal the group size.
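A minimal sketch of the suppression rule described above, using hypothetical counts. It assumes the threshold is tested against the numerator (the case count); because the numerator can never exceed the denominator, this also catches every cell whose denominator is below the threshold. The dictionary keys and group labels are illustrative only.

```python
SUPPRESSION_THRESHOLD = 11  # cells with fewer than 11 individuals are suppressed


def suppress_cell(cell, threshold=SUPPRESSION_THRESHOLD):
    """Return a copy of `cell`, with counts and rate cleared if it is too small.

    Assumption: the threshold is applied to the numerator (case count).
    """
    if cell["numerator"] < threshold:
        # Clear the numerator, the denominator, and the rate together.
        return {**cell, "numerator": None, "denominator": None,
                "rate": None, "suppressed": True}
    return {**cell, "suppressed": False}


# Hypothetical counts, for illustration only.
cells = [
    {"race": "Group A", "numerator": 120, "denominator": 1500, "rate": 0.08},
    {"race": "Group B", "numerator": 3, "denominator": 40, "rate": 0.075},
]
published = [suppress_cell(c) for c in cells]
for row in published:
    print(row)
```

Group B keeps its label but loses all three values, so a reader can see that the cell exists without learning its size.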
9 Stratification Dimensions
Race
OMOP race concept — Asian, Black/African American, White, Am. Indian, NHOPI, Other, Unknown
Ethnicity
Hispanic or Latino / Not Hispanic or Latino / Unknown
Language
Preferred language field — English, Spanish, Somali, Hmong, Vietnamese, Other
Sex
OMOP gender concept — Male / Female / Unknown
Age Group
Computed from year_of_birth: 0–17, 18–34, 35–49, 50–64, 65+
Homeless Status
Derived from homeless_flag on DeduplicatedRoster (Step 4)
Incarceration Status
Derived from jail_flag OR prison_flag on DeduplicatedRoster (Step 4)
Medicaid Status
Derived from medicaid_flag on DeduplicatedRoster (Step 4)
Total
Overall count across the entire site population — no breakdown
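Two of the derived dimensions above can be sketched as small functions. The age bands and the jail_flag OR prison_flag rule come from the list; the function names and the reference_year parameter are assumptions, and because only year_of_birth is available the computed age is approximate.

```python
# Age bands from the list above; anything older falls into "65+".
AGE_BANDS = [(0, 17, "0–17"), (18, 34, "18–34"), (35, 49, "35–49"), (50, 64, "50–64")]


def age_group(year_of_birth, reference_year):
    """Map a birth year to an age band.

    Assumption: age is reference_year minus year_of_birth, where
    reference_year is a hypothetical parameter (for example, the study year).
    """
    age = reference_year - year_of_birth
    if age < 0:
        raise ValueError("year_of_birth is after reference_year")
    for low, high, label in AGE_BANDS:
        if low <= age <= high:
            return label
    return "65+"


def incarceration_status(jail_flag, prison_flag):
    """True if either flag is set, mirroring jail_flag OR prison_flag."""
    return bool(jail_flag or prison_flag)


print(age_group(1990, 2024))               # 18–34 (age 34)
print(age_group(1959, 2024))               # 65+ (age 65)
print(incarceration_status(False, True))   # True
```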
4 Geographic Levels
State
Single statewide estimate — highest population, lowest suppression rate
County
County level — county FIPS from Person.county_fips
ZIP Code
5-digit ZIP code from Person.zip_code
Census Tract
11-char FIPS census tract — finest resolution, highest suppression rate
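To illustrate how the tract code nests within the coarser levels, the sketch below splits an 11-character census tract FIPS code into its state and county prefixes (2 state digits, 3 county digits, 6 tract digits). This is only a consistency check under assumptions: the pipeline reads county from Person.county_fips, the function name is hypothetical, and the sample code is an illustrative value. ZIP codes do not nest in this hierarchy and are not derived here.

```python
def split_tract_fips(tract_fips):
    """Split an 11-character tract FIPS code into nested geographic codes.

    Layout: 2-digit state + 3-digit county + 6-digit tract.
    """
    if len(tract_fips) != 11 or not tract_fips.isdigit():
        raise ValueError("expected an 11-digit census tract FIPS code")
    return {
        "state_fips": tract_fips[:2],    # first 2 digits
        "county_fips": tract_fips[:5],   # state + county, 5 digits
        "tract_fips": tract_fips,        # full 11 digits
    }


# Illustrative value only. Codes are kept as strings to preserve leading zeros.
print(split_tract_fips("01001020100"))
```

Keeping the codes as strings matters: storing them as integers would drop leading zeros and break the prefix relationship.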
Estimate Database Status
Total cells
840
Published
145
Suppressed
695 (82.7%)