HTAC Pipeline · Step 4 of 7

State Data Enrichment

Clinical Data · Deduplication · 4 Enrichment · 5 Cohort Queries · 6 Suppression · 7 Results

EHR data alone cannot capture whether a patient is experiencing homelessness, is currently incarcerated, or is enrolled in public coverage. Authorized agency extracts (Medicaid, housing services, corrections, immunization registries, vital events) are staged and matched to the DeduplicatedRoster by PPRL token hash.

Why enrichment matters

The HTAC mission is to understand health disparities across communities. Medicaid status, housing instability, and incarceration are among the strongest predictors of chronic disease burden, but none of them appears in EHR records. Enrichment makes these dimensions available as stratifiers for the cohort queries in Step 5.

How enrichment works

  • Approved extracts are loaded into staging tables keyed on PPRL token hash.
  • Each enrichment function queries its staging table for tokens in the roster.
  • Matched roster rows get their flag and date fields set; unmatched rows get False / None.
  • Two DB round-trips per source regardless of roster size (efficient at scale).
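The steps above can be sketched as follows. This is a minimal illustration built on Python's standard sqlite3 module, not the project's actual models or functions. The table names (roster, medicaid_staging), the start_date column, and the helper sketch_enrich_medicaid are assumptions made for the example. It also assumes the two round-trips are one aggregated read and one batched write, which the page does not spell out; executemany stands in for whatever bulk-update call the real data layer provides.

```python
import sqlite3


def sketch_enrich_medicaid(conn):
    """Hypothetical sketch: refresh Medicaid fields for every roster row."""
    # Round-trip 1: one read that returns a row per roster token, with the
    # number of staging matches and the earliest start date (NULL if none).
    rows = conn.execute(
        """
        SELECT r.token_hash, COUNT(s.token_hash), MIN(s.start_date)
        FROM roster AS r
        LEFT JOIN medicaid_staging AS s ON s.token_hash = r.token_hash
        GROUP BY r.token_hash
        """
    ).fetchall()

    # Round-trip 2: one batched write. Unmatched rows get False / None.
    conn.executemany(
        "UPDATE roster SET medicaid_flag = ?, medicaid_effective_date = ? "
        "WHERE token_hash = ?",
        [(n > 0, earliest, token) for token, n, earliest in rows],
    )
    conn.commit()
    return len(rows)


# Tiny in-memory demo with made-up tokens and dates (ISO strings sort by date).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE roster (
        token_hash TEXT PRIMARY KEY,
        medicaid_flag INTEGER,
        medicaid_effective_date TEXT
    );
    CREATE TABLE medicaid_staging (token_hash TEXT, start_date TEXT);
    INSERT INTO roster (token_hash) VALUES ('aaa'), ('bbb');
    INSERT INTO medicaid_staging VALUES
        ('aaa', '2021-03-01'), ('aaa', '2019-07-01');
    """
)
updated = sketch_enrich_medicaid(conn)
result = conn.execute(
    "SELECT token_hash, medicaid_flag, medicaid_effective_date "
    "FROM roster ORDER BY token_hash"
).fetchall()
```

The LEFT JOIN returns unmatched roster tokens too, so the same write step can reset their fields instead of leaving old values in place.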

Idempotency

All five enrichment functions perform a complete refresh — every roster row is updated on each run. Re-running with updated staging data is safe and replaces stale flags rather than accumulating duplicates.
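A small sketch of what a complete refresh implies, using plain dictionaries in place of roster rows. The helper refresh_flag and the sample tokens are hypothetical; the point is only that every row is overwritten on each run, so a second run changes nothing and a stale flag is cleared once its token leaves staging.

```python
def refresh_flag(roster, staged_tokens, flag_name):
    """Hypothetical complete refresh: every row is overwritten, matched or not."""
    for row in roster:
        row[flag_name] = row["token_hash"] in staged_tokens
    return roster


roster = [
    {"token_hash": "aaa", "homeless_flag": True},   # stale flag from an older extract
    {"token_hash": "bbb", "homeless_flag": False},
]
staged_tokens = {"bbb"}  # the updated staging data no longer contains "aaa"

# Snapshot the roster after each run so the two runs can be compared.
first_run = [dict(r) for r in refresh_flag(roster, staged_tokens, "homeless_flag")]
second_run = [dict(r) for r in refresh_flag(roster, staged_tokens, "homeless_flag")]
```

Because the function assigns a value to every row rather than only to matched rows, running it again with the same staging data leaves the roster unchanged.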

The Five Enrichment Sources

Medicaid enrollment

Staging model: MedicaidEnrollment
Function: enrich_from_medicaid(roster_qs)
Frequency: monthly eligibility extract (example)

Matches on token_hash. Sets medicaid_effective_date to the start of the earliest enrollment period across all records for that token. This captures both current and historical Medicaid-style coverage, as your jurisdiction defines it.

Fields set: medicaid_flag, medicaid_effective_date

HMIS — homelessness services

Staging model: HMISRecord
Function: enrich_from_hmis(roster_qs)
Frequency: quarterly extract (example)

The Homeless Management Information System (HMIS) tracks service touchpoints: street outreach, emergency shelter, transitional housing, and supportive housing. Sets homeless_first_service_date to the earliest entry date across all service types.

Fields set: homeless_flag, homeless_first_service_date

Corrections — jail & prison records

Staging model: DOCRecord
Function: enrich_from_doc(roster_qs)
Frequency: quarterly extract

Jail (county) and prison (state) records are handled in two separate queries so a person with both types of incarceration history gets both flags set independently. Admission dates are the earliest admission across all matching records.

Fields set: jail_flag, jail_admission_date, prison_flag, prison_admission_date
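The two-pass handling of jail and prison records might look like the sketch below. It works on plain dictionaries rather than the DOCRecord model, and the facility_type field and both helper names are assumptions for illustration. Each facility type gets its own pass, so one set of flags never overwrites the other.

```python
from datetime import date


def earliest_admissions(records, facility_type):
    """Earliest admission date per token for one facility type (hypothetical helper)."""
    earliest = {}
    for rec in records:
        if rec["facility_type"] != facility_type:
            continue
        token = rec["token_hash"]
        if token not in earliest or rec["admission_date"] < earliest[token]:
            earliest[token] = rec["admission_date"]
    return earliest


def sketch_enrich_corrections(roster, records):
    # Two independent passes, one per facility type.
    for prefix in ("jail", "prison"):
        matches = earliest_admissions(records, prefix)
        for row in roster:
            admitted = matches.get(row["token_hash"])
            row[f"{prefix}_flag"] = admitted is not None
            row[f"{prefix}_admission_date"] = admitted
    return roster


# Made-up records: token "aaa" has both jail and prison history, "bbb" has none.
records = [
    {"token_hash": "aaa", "facility_type": "jail", "admission_date": date(2020, 5, 1)},
    {"token_hash": "aaa", "facility_type": "jail", "admission_date": date(2018, 2, 10)},
    {"token_hash": "aaa", "facility_type": "prison", "admission_date": date(2021, 1, 15)},
]
corrections_roster = sketch_enrich_corrections(
    [{"token_hash": "aaa"}, {"token_hash": "bbb"}], records
)
```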

Immunization information system (IIS)

Staging model: MIICRecord
Function: enrich_from_miic(roster_qs)
Frequency: quarterly extract (example)

COVID-19 and influenza vaccinations are tracked separately. For COVID-19, the earliest vaccination date is stored; for influenza, only a boolean is stored (any influenza vaccination on record). Many jurisdictions operate a statewide or regional IIS with high population coverage.

Fields set: covid_vaccinated_flag, covid_vaccine_date, influenza_vaccinated_flag
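The asymmetry between the two vaccine types can be sketched for a single patient's records as shown below. The vaccine_type and vaccination_date field names and the helper summarize_vaccinations are assumptions, not the MIICRecord schema.

```python
from datetime import date


def summarize_vaccinations(records):
    """Return (covid_vaccinated_flag, covid_vaccine_date, influenza_vaccinated_flag)."""
    covid_dates = [
        r["vaccination_date"] for r in records if r["vaccine_type"] == "covid"
    ]
    # Influenza keeps no date: any record at all sets the flag.
    has_flu = any(r["vaccine_type"] == "influenza" for r in records)
    return bool(covid_dates), min(covid_dates, default=None), has_flu


# Made-up records for one token.
patient_records = [
    {"vaccine_type": "covid", "vaccination_date": date(2021, 4, 2)},
    {"vaccine_type": "covid", "vaccination_date": date(2021, 3, 5)},
    {"vaccine_type": "influenza", "vaccination_date": date(2022, 10, 11)},
]
with_history = summarize_vaccinations(patient_records)
no_history = summarize_vaccinations([])
```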

Vital statistics (death records)

Staging model: VitalStatisticsRecord
Function: enrich_from_vitals(roster_qs)
Frequency: annual extract with quarterly updates (example)

Civil vital events are matched to the roster. When multiple records exist for the same token (a data quality edge case), the latest death date is used. Deceased patients can be excluded from active-patient denominators or analyzed separately.

Fields set: deceased_flag, death_date
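A sketch of the duplicate-record rule and of one way to exclude deceased patients from an active-patient denominator. Both helpers and the sample data are hypothetical and operate on plain dictionaries rather than the VitalStatisticsRecord model.

```python
from datetime import date


def latest_death_dates(records):
    """Latest death date per token when a token has more than one record."""
    latest = {}
    for rec in records:
        token = rec["token_hash"]
        if token not in latest or rec["death_date"] > latest[token]:
            latest[token] = rec["death_date"]
    return latest


def active_denominator(roster):
    """Roster rows that are not flagged as deceased."""
    return [row for row in roster if not row["deceased_flag"]]


# Made-up data: token "aaa" appears twice with conflicting dates.
death_records = [
    {"token_hash": "aaa", "death_date": date(2023, 1, 9)},
    {"token_hash": "aaa", "death_date": date(2023, 1, 12)},
]
deaths = latest_death_dates(death_records)
vitals_roster = [
    {
        "token_hash": t,
        "deceased_flag": t in deaths,
        "death_date": deaths.get(t),
    }
    for t in ("aaa", "bbb")
]
active = active_denominator(vitals_roster)
```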

Enrichment Status — Current Roster

Staging tables are empty in the simulation

The five staging tables (MedicaidEnrollment, HMISRecord, DOCRecord, MIICRecord, VitalStatisticsRecord) contain 0 rows total. In a real deployment, data stewards would load extracts from these five sources before running the roster enrichment job, which would populate the flags below.

219 Medicaid enrolled
2 Homeless / unstably housed
13 Jailed
9 Incarcerated (prison)
0 COVID vaccinated
0 Influenza vaccinated
0 Deceased

Staging Table Row Counts

Staging table            Rows loaded
MedicaidEnrollment       0
HMISRecord               0
DOCRecord                0
MIICRecord               0
VitalStatisticsRecord    0

After staging files are loaded, run the roster enrichment job to refresh flags across the 1,000-row roster.

Latest pipeline demonstration (completed May 14, 2026 20:05). Counts on this page reflect synthetic federated data from that run.