HTAC Pipeline · Step 6 of 7

Stratification and Suppression


Each site's cohort is stratified across 9 demographic and social dimensions and 4 geographic levels. Any cell with fewer than 11 individuals is suppressed to protect small populations from re-identification — a requirement of the consortium’s Master Data Use Agreement.

Why stratify?

Statewide averages hide disparities. A condition may be twice as prevalent among Black covered lives or patients experiencing homelessness. Stratification makes these inequities visible and actionable for public health programs.

Why suppress at n < 11?

Small-cell suppression prevents re-identification: if only 3 incarcerated Hmong adults in a county have diabetes, publishing that count could identify specific individuals. Setting the threshold at 11 rather than 10 means counts of 1 through 10 are never published. This threshold is common in public-health DUAs and is consistent with CDC and CMS suppression standards.

Denominator definition

The denominator is persons with at least one VisitOccurrence during the study period — not all persons in the database. This matches typical federated methodology: inactive patients who appear in historical records but had no encounters during the study window are excluded from the denominator.
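As a minimal sketch of the denominator rule described above, the following counts distinct persons with at least one visit inside the study window. The `(person_id, visit_date)` tuple shape, the function name, and the inclusive window bounds are assumptions made for illustration; the pipeline's actual VisitOccurrence schema and query are not shown on this page.

```python
from datetime import date

def active_person_ids(visits, study_start, study_end):
    """Return the set of person_ids with at least one visit in the study window.

    `visits` is an iterable of (person_id, visit_date) pairs (hypothetical shape).
    The window is treated as inclusive on both ends, which is an assumption.
    """
    return {
        person_id
        for person_id, visit_date in visits
        if study_start <= visit_date <= study_end
    }

# Illustrative records only, not pipeline data.
visits = [
    (1, date(2025, 3, 1)),
    (1, date(2025, 7, 9)),    # a second visit for the same person counts once
    (2, date(2019, 5, 20)),   # historical record only: excluded as inactive
    (3, date(2025, 12, 31)),  # last day of the window: included
]

active = active_person_ids(visits, date(2025, 1, 1), date(2025, 12, 31))
denominator = len(active)
```

Using a set makes the count person-based rather than visit-based, so a patient with many encounters still contributes one to the denominator.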

Prevalence Rate Formula

Prevalence Rate (per 10,000 active patients)

prevalence_rate = (numerator / denominator) × 10,000

For any cell below the agreed threshold, suppression clears the numerator, the denominator, and the published rate. Rates for cells that remain publishable are rounded half-up to four decimal places.
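The formula and rounding rule above can be sketched with Python's `decimal` module, which supports explicit half-up rounding (the built-in `round` uses banker's rounding on ties). The function name is hypothetical, and this is one possible way to implement the stated rule, not necessarily how the pipeline does it.

```python
from decimal import Decimal, ROUND_HALF_UP

def prevalence_rate(numerator: int, denominator: int) -> Decimal:
    """Rate per 10,000 active patients, rounded half-up to four decimal places."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    rate = Decimal(numerator) / Decimal(denominator) * Decimal(10_000)
    return rate.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)

print(prevalence_rate(87, 312))  # 2788.4615
print(prevalence_rate(14, 58))   # 2413.7931
```

The worked example on this page displays the same rates at one decimal place (2,788.5 and 2,413.8), which suggests the four-decimal value is what is stored and a shorter form is what is shown.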

Suppression in Practice

Example: Diabetes by race, County X

White: 87 cases, denominator 312 → rate = 2,788.5 per 10,000 (✓ Published)
Hispanic: 14 cases, denominator 58 → rate = 2,413.8 per 10,000 (✓ Published)
Am. Indian: 8 cases, n < 11 → numerator = null, denominator = null, rate = null (✗ Suppressed)

Even the denominator is suppressed, because publishing the denominator alone could reveal the group size.
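The County X example can be reproduced with a small masking function. This sketch assumes the threshold is tested against the case count, as the example does; whether the pipeline also tests the denominator is not stated. The dictionary layout and the `suppress` name are hypothetical.

```python
SUPPRESSION_THRESHOLD = 11  # cells with fewer than 11 individuals are masked

def suppress(cell: dict) -> dict:
    """Return a copy of the cell with all three values nulled if it is too small."""
    if cell["numerator"] < SUPPRESSION_THRESHOLD:
        return {
            **cell,
            "numerator": None,
            "denominator": None,  # masked too, so the group size is not revealed
            "rate": None,
            "suppressed": True,
        }
    return {**cell, "suppressed": False}

cells = [
    {"stratum": "White", "numerator": 87, "denominator": 312, "rate": 2788.5},
    {"stratum": "Hispanic", "numerator": 14, "denominator": 58, "rate": 2413.8},
    # The denominator and rate on this row are hypothetical placeholders;
    # the page does not publish them.
    {"stratum": "Am. Indian", "numerator": 8, "denominator": 40, "rate": 2000.0},
]

published = [suppress(cell) for cell in cells]
```

Returning a copy instead of mutating the input keeps the unsuppressed values available for site-internal checks while ensuring only the masked version leaves the function.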

9 Stratification Dimensions

Race

OMOP race concept — Asian, Black/African American, White, Am. Indian, NHOPI, Other, Unknown

Ethnicity

Hispanic or Latino / Not Hispanic or Latino / Unknown

Language

Preferred language field — English, Spanish, Somali, Hmong, Vietnamese, Other

Sex

OMOP gender concept — Male / Female / Unknown

Age Group

Computed from year_of_birth: 0–17, 18–34, 35–49, 50–64, 65+

Homeless Status

Derived from homeless_flag on DeduplicatedRoster (Step 4)

Incarceration Status

Derived from jail_flag OR prison_flag on DeduplicatedRoster (Step 4)

Medicaid Status

Derived from medicaid_flag on DeduplicatedRoster (Step 4)

Total

Overall count across the entire site population — no breakdown
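Two of the dimensions above are derived rather than read directly: age group comes from year_of_birth, and incarceration status combines two flags. A minimal sketch of both follows. It assumes age is approximated as a reference year minus year_of_birth (only the birth year is available), and the `reference_year` parameter and both function names are hypothetical.

```python
def age_group(year_of_birth: int, reference_year: int) -> str:
    """Map a birth year to one of the five age bands listed above."""
    age = reference_year - year_of_birth  # approximate: month and day are unknown
    if age < 0:
        raise ValueError("year_of_birth is after reference_year")
    if age <= 17:
        return "0–17"
    if age <= 34:
        return "18–34"
    if age <= 49:
        return "35–49"
    if age <= 64:
        return "50–64"
    return "65+"

def incarceration_status(jail_flag: bool, prison_flag: bool) -> bool:
    """True if either flag on the roster row is set (jail_flag OR prison_flag)."""
    return bool(jail_flag) or bool(prison_flag)
```

Because only the year is known, a person near a band boundary may be placed one band early or late depending on the reference year chosen.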

4 Geographic Levels

State

Single statewide estimate — highest population, lowest suppression rate

County

County level — county FIPS from Person.county_fips

ZIP Code

5-digit ZIP code from Person.zip_code

Census Tract

11-char FIPS census tract — finest resolution, highest suppression rate
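The geographic levels nest: in the standard Census GEOID layout, an 11-character tract code is a 2-digit state code, a 3-digit county code, and a 6-digit tract code. The sketch below assumes the pipeline's tract identifier follows that layout and uses it to check a county code against the tract prefix. The function name and the sample value are illustrative only.

```python
def split_tract_fips(tract_fips: str) -> dict:
    """Split an 11-character tract code into its nested state and county prefixes."""
    if len(tract_fips) != 11 or not tract_fips.isdigit():
        raise ValueError("expected an 11-digit census tract code")
    return {
        "state_fips": tract_fips[:2],   # 2-digit state code
        "county_fips": tract_fips[:5],  # state code plus 3-digit county code
        "tract_fips": tract_fips,       # full 11-digit code
    }

# Illustrative value, not taken from the pipeline data.
keys = split_tract_fips("27053000100")
county_matches = keys["county_fips"] == "27053"
```

ZIP codes do not nest inside counties or tracts, so a ZIP-level stratum has to be keyed on Person.zip_code directly rather than derived from the tract code.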

Estimate Database Status

Total cells: 840
Published: 145
Suppressed: 695 (82.7%)

Latest pipeline demonstration (completed May 14, 2026 20:05). Counts on this page reflect synthetic federated data from that run.