Federated analytics · Privacy-preserving record linkage

Health Trends Across Communities (HTAC)

HTAC describes a privacy-preserving, multi-site approach to estimating health condition prevalence: clinical data remain with each contributor in a common clinical model (OMOP CDM), linkage uses one-way tokens, and only suppressed aggregate statistics are shared with a coordinating center. The pages here walk through that workflow in a working demonstration environment. This is not an official reporting system, and the record counts shown reflect a sandbox instance with no synthetic dataset loaded.

By the numbers

  • 11 MNEHRC member health systems
  • 5.4M+ patients in the statewide dataset
  • 94% of Minnesota population covered
  • 70+ health conditions tracked

What is HTAC?

Routine surveillance still leans on registries, surveys, and payer fragments, each with blind spots. HTAC-style programs add federated queries against a shared clinical model, token-based record linkage, and small-cell suppression rules so that agencies and collaboratives can estimate how common a condition is across the communities they serve without pooling identifiable clinical rows in one warehouse.

Contributors map electronic health data to the same model, execute approved analytic packages on their own infrastructure, and transmit aggregate counts only. Where law and policy allow, administrative and service-system files add coverage, housing, justice, and mortality context so disparity dimensions are explicit in the analytic plan—not implied after the fact.

The same design principles appear in large research networks and statewide data collaboratives: published vocabularies, versioned cohort definitions, documented data agreements, and review before new uses ship.

How collaboratives like this stay accountable

Sustainable federated analytics depend less on any one vendor tool than on shared governance: a steering body with balanced representation, a scientific or ethics review path for new uses, documented data-use agreements, and technical assistance that keeps OMOP mappings and quality checks current.

Governance

Voting or consensus models across participating sites; conflict-of-interest policies; public benefit framing for new analyses.

Data infrastructure

Local OMOP instances, versioned concept sets, automated conformance checks, and packaged analytic definitions that can be re-run on a schedule.

Analysis

Condition definitions as published concept lists; study periods and denominators disclosed with results; change logs for definitions and code sets.

Equity focus

Stratifiers for race, ethnicity, language, geography, age, sex, and social drivers where policy allows, surfacing disparities that aggregate dashboards hide.

Illustrative scale: In published statewide federated designs, large integrated networks jointly cover a majority of residents in their region; deduplication rates across sites are often substantial. Your deployment’s numbers will depend on participation and legal scope.

What is OMOP?

OMOP (Observational Medical Outcomes Partnership) is an open, internationally adopted common data model (CDM) maintained by the OHDSI collaborative that standardizes how clinical data from different EHR systems are structured and coded.

Without a common model, comparing two large vendors’ extracts means bespoke translation for every pair of systems. OMOP addresses this by giving each participating site a standard schema — the same table names, column names, and vocabulary codes everywhere. Multi-site collaboratives typically invest sustained effort to harmonize OMOP before federated queries can run consistently.

Technical reference

OMOP domain | What it captures | Example HTAC use
--- | --- | ---
ConditionOccurrence | Diagnoses (ICD-10 mapped to SNOMED concept IDs) | Identify patients with Type 2 Diabetes or Opioid Use Disorder
Measurement | Lab results and vitals (LOINC codes) | HbA1c > 6.5% to refine a diabetes cohort
DrugExposure | Prescriptions and administrations (RxNorm) | Buprenorphine prescriptions for OUD treatment cohort
Observation | Clinical facts outside other domains (SNOMED) | Tobacco use, pregnancy status, social determinants
VisitOccurrence | Encounters: inpatient, outpatient, ED, telehealth | Restrict cohort to patients with ≥1 qualifying visit in study period
Person | Demographics: birth year, sex, race, ethnicity, ZIP, census tract | Stratify prevalence estimates by race/ethnicity, county, or census tract

Codesets (lists of OMOP concept_ids) are versioned in the HTAC Condition and ConceptCode models so analysts can build and audit condition definitions without ad hoc table-level queries.
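
As a rough illustration of why versioned codesets help with auditing, the sketch below models a concept set as a named, versioned list of concept_ids and counts distinct patients against it. The `ConceptSet` class, the helper functions, and the concept_id values are hypothetical placeholders and are not the HTAC Condition or ConceptCode models; only the column names (`person_id`, `condition_concept_id`, `condition_start_date`) follow the OMOP condition_occurrence table.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ConceptSet:
    """A named, versioned list of OMOP concept_ids (hypothetical structure)."""
    name: str
    version: str
    concept_ids: frozenset


# Placeholder IDs for illustration only; real definitions use OMOP standard concept_ids.
T2D_V1 = ConceptSet("type_2_diabetes", "1.0.0", frozenset({1001, 1002}))
T2D_V2 = ConceptSet("type_2_diabetes", "1.1.0", frozenset({1001, 1002, 1003}))


def count_patients(condition_rows, concept_set, start, end):
    """Count distinct persons with a qualifying condition record in [start, end]."""
    persons = {
        row["person_id"]
        for row in condition_rows
        if row["condition_concept_id"] in concept_set.concept_ids
        and start <= row["condition_start_date"] <= end
    }
    return len(persons)


def diff_versions(old, new):
    """List the concept_ids a new version added or removed, e.g. for a change log."""
    return {
        "added": sorted(new.concept_ids - old.concept_ids),
        "removed": sorted(old.concept_ids - new.concept_ids),
    }


# Toy rows shaped like condition_occurrence records at one site.
rows = [
    {"person_id": 1, "condition_concept_id": 1001, "condition_start_date": date(2023, 3, 1)},
    {"person_id": 1, "condition_concept_id": 1002, "condition_start_date": date(2023, 6, 1)},
    {"person_id": 2, "condition_concept_id": 1003, "condition_start_date": date(2023, 7, 15)},
    {"person_id": 3, "condition_concept_id": 1001, "condition_start_date": date(2021, 1, 1)},
]
study_start, study_end = date(2023, 1, 1), date(2023, 12, 31)

print(count_patients(rows, T2D_V1, study_start, study_end))  # 1
print(count_patients(rows, T2D_V2, study_start, study_end))  # 2
print(diff_versions(T2D_V1, T2D_V2))
```

Because the count changes when the codeset version changes, reporting the version alongside each estimate makes results reproducible and differences between releases explainable.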

How the HTAC Pipeline Works

Each step below notes the data involved, why the step exists, and how it works.

  1. Clinical data in OMOP CDM: Each site's EHR system maps patient records to the OMOP Common Data Model, a shared schema and vocabulary covering six domains (Person, Visit, Condition, Drug, Measurement, Observation). Query scripts run against each site's local OMOP database; no patient records leave the site.
  2. Tokenization at the site: Each health system hashes six PII fields (first name, last name, date of birth, sex, phone number, and ZIP code) into a one-way HMAC-SHA256 token using a shared secret key (the "salt"). No raw PII is transmitted. The algorithm has been validated at 97% precision and 75% recall for public health surveillance purposes.
  3. Deduplication into a statewide roster: The Data Coordinating Center (DCC) receives tokens from all sites. Because identical PII hashed with the same key produces the same token, patients seen at multiple sites generate identical tokens. The deduplication routine merges these into a single DeduplicatedRoster row. In many multi-site deployments, a large share of patients generate records at more than one organization, so deduplication materially changes denominators.
  4. Enrichment with administrative data: Authorized public-sector and community-agency data stewards provide periodic extracts aligned to the linkage model. The roster is joined by token hash to coverage, housing, justice, immunization, and vital events where agreements allow, adding social-risk context without pulling raw clinical rows into a central warehouse.
  5. Federated condition cohort queries: Analysts define a health condition using OMOP concept codes. Query scripts run at each site against that site's local OMOP database, searching ConditionOccurrence, Measurement, DrugExposure, and Observation tables during a defined study period. Only aggregate counts, not patient records, leave each site.
  6. Stratification and suppression: Counts are stratified by geography (state, county, ZIP code, census tract), race/ethnicity, language, age group, sex, and social-risk flags (homelessness, incarceration, Medicaid). Any stratum with fewer than 11 individuals is suppressed before results are stored, per the consortium’s Master Data Use Agreement and local policy. Prevalence rates are calculated per 10,000 enrolled patients.
  7. Reporting: Suppressed, stratified estimates are published through agreed channels: narrative summaries, governed file extracts, a password-protected operations console for authorized staff, delimited downloads, and a read-only programmatic interface (for example /api/htac/v1/) where an API is part of the release. Cadence follows the governance calendar.
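
The tokenization and deduplication steps above can be sketched as follows. This is a minimal illustration, not the consortium's validated algorithm: the normalization rules, the field separator, the `SHARED_KEY` placeholder, and the function names are all assumptions. Only the use of HMAC-SHA256 over six PII fields and the exact-match merge of identical tokens come from the description above.

```python
import hashlib
import hmac
import unicodedata

# Assumption: the shared secret is distributed to sites out of band.
# This value is a placeholder for illustration.
SHARED_KEY = b"example-shared-secret"


def normalize(value):
    """Strip accents, uppercase, and keep only letters and digits (assumed rules)."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.upper() if ch.isalnum())


def make_token(first, last, dob, sex, phone, zip_code, key=SHARED_KEY):
    """Build one HMAC-SHA256 token from the six normalized PII fields."""
    fields = [normalize(f) for f in (first, last, dob, sex, phone, zip_code)]
    message = "|".join(fields).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def deduplicate(site_tokens):
    """Merge (site, token) pairs into one roster entry per distinct token."""
    roster = {}
    for site, token in site_tokens:
        roster.setdefault(token, set()).add(site)
    return roster


# The same person recorded with different formatting at two sites.
token_a = make_token("José", "Rivera", "1980-02-03", "M", "(612) 555-0100", "55401")
token_b = make_token("jose ", "RIVERA", "1980-02-03", "m", "612-555-0100", "55401")
# A different person seen at one site.
token_c = make_token("Ana", "Lee", "1975-11-30", "F", "651-555-0199", "55101")

roster = deduplicate([("site_a", token_a), ("site_b", token_b), ("site_a", token_c)])
print(token_a == token_b)  # True: formatting differences are normalized away
print(len(roster))         # 2 distinct patients across 3 site records
```

Exact matching on a single token only links records whose normalized fields agree completely, so a changed phone number or a misspelled name yields a different token. That is one plausible reason a scheme like this can show high precision but lower recall.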

Known Data Gaps

Tribal and Indigenous health sovereignty — Tribal nations operate sovereign health systems under federal self-governance frameworks. Data from those systems belong in nation-to-nation agreements, not silent inclusion in regional aggregates. Prevalence products that omit authorized Tribal participation can undercount American Indian / Alaska Native communities; governance plans should address Tribal data sovereignty explicitly, not only in disclaimers.

Community health centers and independent clinics — Federally qualified health centers and smaller independent clinics often sit outside the first wave of enterprise EHR integrations. Because they disproportionately care for uninsured, immigrant, and low-English-proficiency populations, their absence can skew equity stratifiers unless inclusion plans and trust-building investments are explicit.

Privacy & Governance

  • PPRL tokens are one-way keyed hashes (HMAC-SHA256): they cannot be inverted to recover PII, and without the shared secret an outside party cannot recompute tokens from guessed identifiers.
  • Small-cell suppression (n < 11) is applied to all strata before estimates are stored or published, when required by policy.
  • Patient-level data stay at participating organizations unless a separate legal mechanism authorizes transfer.
  • Data use is governed by master and project-specific data agreements, Tribal consultation where applicable, and institutional review as required.
  • HTAC deployments must follow all applicable federal, state, Tribal, and local health-data authorities—not a single static statute.
  • Ethics and oversight pathways depend on use case and jurisdiction—coordinate with counsel and IRBs as applicable.
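
The small-cell rule in the list above (n < 11), combined with rates expressed per 10,000 enrolled patients, can be sketched as below. The function name and output fields are hypothetical. The sketch assumes the threshold applies to the case count in each stratum and that zero counts are suppressed too; a given deployment's policy may differ on both points.

```python
SUPPRESSION_THRESHOLD = 11  # strata with fewer than 11 individuals are suppressed
RATE_BASE = 10_000          # rates are reported per 10,000 enrolled patients


def summarize_stratum(cases, enrolled):
    """Return a publishable record for one stratum, suppressing small cells."""
    if cases < 0 or cases > enrolled:
        raise ValueError("cases must be between 0 and the enrolled count")
    if cases < SUPPRESSION_THRESHOLD:
        # Withhold both the count and the rate so neither reveals the small cell.
        return {"cases": None, "rate_per_10k": None, "suppressed": True}
    rate = round(cases / enrolled * RATE_BASE, 1)
    return {"cases": cases, "rate_per_10k": rate, "suppressed": False}


print(summarize_stratum(250, 40_000))  # rate_per_10k is 62.5
print(summarize_stratum(10, 5_000))    # suppressed: fewer than 11 cases
```

This sketch covers primary suppression only. It does not handle complementary suppression, where a second cell is withheld so that a suppressed value cannot be recovered by subtracting published cells from a published total.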

Resources

Peer-reviewed context

Illustrative publications on statewide federated surveillance and equity-focused prevalence work (titles reflect original study settings).

  • Johnson SG, Chamberlain AM, Drawz P, et al. Design and Implementation of a State-Wide Network for Near Real-Time Public Health Surveillance and Research: The Minnesota Electronic Health Record Consortium Experience. Learning Health Systems. 2026. doi:10.1002/lrh2.70070
  • Zellmer L, Van Siclen R, Bodurtha P, et al. Estimating Health Condition Prevalence Among a Statewide Cohort with Recent Homelessness or Incarceration. J Gen Intern Med. 2025;40(15):3733–42. doi:10.1007/s11606-025-09814-x
  • Shearer RD, Vickery KD, Bodurtha P, et al. COVID-19 Vaccination of People Experiencing Homelessness and Incarceration in Minnesota. Health Affairs. 2022;41(6):846–852.
  • Winkelman T, Chamberlain A, Margolis K, et al. Health Trends Across Communities: A Healthcare System–Public Health Collaboration to Advance Health Equity Across Minnesota. Annals of Family Medicine. 2024;22:6757.