HTAC Pipeline · Step 7 of 7

Prevalence Results


The final output: suppressed, stratified prevalence estimates stored as PrevalenceEstimate rows within a StudyRun. Those rows feed an internal operations console, optional maps and charts, governed file extracts, and—where included in scope—a read-only programmatic interface.

What a result represents

Each PrevalenceEstimate row is one cell in a multi-dimensional cube: one condition × one health system × one geographic level × one geographic value × one stratifier × one stratifier value. A single study run for 22 conditions across 11 sites, 4 geo levels, and 9 stratifiers can produce tens of thousands of cells.
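The arithmetic behind that estimate can be sketched as follows. The four dimension sizes come from the sentence above; the average number of geographic values and strata per combination are hypothetical placeholders, since the real figures depend on the roster and the geography in scope.

```python
from math import prod

# Dimension sizes quoted in the text above.
dimensions = {"conditions": 22, "sites": 11, "geo_levels": 4, "stratifiers": 9}

# Combinations of condition x site x geo level x stratifier, before
# multiplying by the values inside each geo level and each stratifier.
base_combinations = prod(dimensions.values())


def estimated_cells(avg_geo_values: float, avg_strata: float) -> float:
    """Rough cell count, assuming every combination is populated.

    Both averages are illustrative assumptions, not values from the pipeline.
    """
    return base_combinations * avg_geo_values * avg_strata


print(base_combinations)
print(estimated_cells(avg_geo_values=3, avg_strata=2))
```

Even modest averages push the count well past the base number of combinations, which is why suppression and filtering matter when browsing results.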

StudyRun organizes everything

A StudyRun groups all estimates from one execution of the pipeline: it records the run date, roster version, study period, which conditions were queried, and the run status (Pending → Running → Complete / Failed). Multiple study runs allow year-over-year trend comparisons.
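One way to picture the status lifecycle is a small transition table. This is a minimal sketch, not the pipeline's own code: the names RunStatus, ALLOWED, and advance are hypothetical, and treating Complete and Failed as terminal states is an assumption (the text does not say whether a failed run can be retried).

```python
from enum import Enum


class RunStatus(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    COMPLETE = "Complete"
    FAILED = "Failed"


# Allowed forward moves. Complete and Failed are assumed to be terminal.
ALLOWED = {
    RunStatus.PENDING: {RunStatus.RUNNING},
    RunStatus.RUNNING: {RunStatus.COMPLETE, RunStatus.FAILED},
    RunStatus.COMPLETE: set(),
    RunStatus.FAILED: set(),
}


def advance(current: RunStatus, new: RunStatus) -> RunStatus:
    """Return the new status, or raise if the move is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new


status = advance(RunStatus.PENDING, RunStatus.RUNNING)
status = advance(status, RunStatus.COMPLETE)
print(status.value)
```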

Data quality reports

Each run can also generate DataQualityReport rows — one metric per site per run date. Metrics include completeness rates, missingness in key fields, and concept mapping coverage. DQ flags (Pass / Warn / Fail) help analysts identify sites with data issues before using their estimates.
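As an illustration of how a metric might map to a flag, the sketch below classifies a rate-style metric where higher is better, such as completeness or concept mapping coverage. The function name dq_flag and both cutoffs (0.95 and 0.80) are hypothetical; the thresholds actually used are not stated here.

```python
def dq_flag(value: float, warn_below: float = 0.95, fail_below: float = 0.80) -> str:
    """Classify a metric on a 0.0 to 1.0 scale where higher is better.

    The default cutoffs are illustrative assumptions only.
    """
    if value < fail_below:
        return "Fail"
    if value < warn_below:
        return "Warn"
    return "Pass"


# A missingness rate (lower is better) can be converted first.
missingness = 0.12
print(dq_flag(1.0 - missingness))
```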

Output: PrevalenceEstimate Record

Field | Role | Description
study_run | Link | The run this estimate belongs to
condition | Link | The HTAC health condition (e.g., Diabetes)
health_system | Link (optional) | Empty when the row is a network-wide aggregate across sites
geo_level | Category | state / county / zip / census_tract
geo_value | Text (optional) | FIPS code, ZIP, or census tract; empty for state-level rows
stratifier | Category | One of nine stratification dimensions (race, language, age group, etc.)
stratifier_value | Text | The stratum label (e.g., Black or African American, Somali)
numerator | Count (optional) | People with the condition in this stratum; empty when suppressed
denominator | Count (optional) | Active people in the study window for this stratum; empty when suppressed
prevalence_rate | Rate (optional) | Cases per 10,000 active people; empty when suppressed
is_suppressed | Flag | Set when the numerator is below the published threshold; detail fields are cleared
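The relationship between the last four fields can be sketched as below. This is a simplified illustration rather than the pipeline's implementation: Cell and build_cell are hypothetical names, and the threshold of 11 is an assumption taken from the "n < 11" note in the sample output on this page.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: threshold inferred from the "n < 11" note in the sample output.
SUPPRESSION_THRESHOLD = 11


@dataclass
class Cell:
    numerator: Optional[int]
    denominator: Optional[int]
    prevalence_rate: Optional[float]  # cases per 10,000 active people
    is_suppressed: bool


def build_cell(numerator: int, denominator: int) -> Cell:
    """Compute the rate, or clear the detail fields when the count is small."""
    if numerator < SUPPRESSION_THRESHOLD:
        return Cell(None, None, None, True)
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    rate = round(numerator / denominator * 10_000, 1)
    return Cell(numerator, denominator, rate, False)


print(build_cell(14, 42))
print(build_cell(7, 300))
```

Clearing the numerator, denominator, and rate together keeps a suppressed count from being recovered by back-calculation from the other two fields.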

Sample Output — What a Published Row Looks Like

Condition | Site | Geo | Stratifier | Stratum | Numerator | Denominator | Rate (per 10k) | Suppressed?
Hypertension | Statewide | State | Total | All | 163 | 500 | 3,260.0 | No
Diabetes | Regional Health A | County | Race | Black or African American | 14 | 42 | 3,333.3 | No
HIV | Essentia Health | County | Race | Am. Indian / Alaska Native |  |  |  | Yes (n < 11)

Figures above are illustrative. After the roster and condition definitions are loaded, run the condition-query job for the active study period to materialize estimates in this environment.

How Results Are Accessed

Operations

Staff console

Authorized analysts review, filter, and export estimate tables through the password-protected console (this deployment: /admin/). Access is limited to defined roles.

Integration

Read-only service interface

Downstream systems can pull stratified cells over HTTPS at /api/htac/v1/ with filters for condition, geography, stratifier, and suppression status, subject to the same publication rules as the console.
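A client request might be assembled along these lines. Only the /api/htac/v1/ path comes from the text; the host is a placeholder, and the query parameter names (condition, geo_level, stratifier, is_suppressed) are assumptions borrowed from the record's field names, since the interface's actual resource paths and parameters are not listed here.

```python
from urllib.parse import urlencode

BASE_URL = "https://example.org"  # placeholder host, not a real deployment
API_ROOT = "/api/htac/v1/"        # path given in the text


def estimates_url(**filters: str) -> str:
    """Build a filtered query URL. Parameter names are hypothetical."""
    query = urlencode(sorted(filters.items()))
    return f"{BASE_URL}{API_ROOT}?{query}" if query else f"{BASE_URL}{API_ROOT}"


url = estimates_url(
    condition="Diabetes",
    geo_level="county",
    stratifier="race",
    is_suppressed="false",
)
print(url)
```

The sketch only builds the URL; authentication and the response format would depend on how the interface is scoped for a given deployment.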

Reporting

Maps and dashboards

Maps and trend views are usually deployed in a separate reporting layer that reads the same published tables or service endpoints, so public traffic does not sit on operational databases.

Disclosure

Public-use files

Full public-use extracts are released only through the organization’s data-request process, with DUAs and redistribution terms attached.

Operational snapshot

Total estimate cells: 840
Published: 145
Suppressed: 695

Top conditions by cell count

Condition | Estimate cells
Depression | 168
Asthma | 168
Hypertension | 168
Diabetes | 168
Opioid Use Disorder | 168

Study Runs

Pipeline demonstration run 3

Run date: May 15, 2026  |  Roster version: May 15, 2026  |  Complete

Synthetic federated pipeline demonstration output.

HTAC 2022 Baseline

Run date: March 1, 2023  |  Roster version: Jan. 15, 2023  |  Pending

Baseline prevalence study covering the 2022 calendar year.

Pipeline Complete

From raw EHR records in OMOP CDM to suppressed, stratified prevalence estimates — spanning the conditions, sites, stratifiers, and geography levels configured in your study run.


From Demo to Deployment

This demonstration environment illustrates the full federated pipeline in a sandbox context. A production deployment for a health system consortium or public health agency would layer in: a governance structure with a steering committee and scientific review process, data use agreements with each contributing health system and administrative data provider, a privacy-preserving record linkage process using tokens generated at each contributing site, validated condition codesets reviewed by clinical and epidemiological staff, a data quality monitoring process running on a quarterly or annual cadence, and a publication process that governs who sees what results and under what conditions.

Building that production layer is the work this practice supports.

Latest pipeline demonstration (completed May 14, 2026 20:05). Counts on this page reflect synthetic federated data from that run.