HTAC Pipeline · Step 4 of 7
State Data Enrichment
EHR data alone cannot capture whether a patient is experiencing homelessness, is currently incarcerated, or is enrolled in public coverage. Authorized agency extracts (Medicaid, housing services, corrections, immunization registries, vital events) are staged and matched to the DeduplicatedRoster by privacy-preserving record linkage (PPRL) token hash.
Why enrichment matters
The HTAC mission is to understand health disparities across communities. Medicaid status, housing instability, and incarceration are among the strongest predictors of chronic disease burden — but none of these appear in EHR records. Enrichment makes these dimensions available as stratifiers in Step 6.
How enrichment works
- Approved extracts are loaded into staging tables keyed on PPRL token hash.
- Each enrichment function queries its staging table for tokens in the roster.
- Matched roster rows get their flag and date fields set; unmatched rows get False/None.
- Each source takes two database round-trips regardless of roster size, so the job stays efficient at scale.
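The steps above can be sketched for a single source. This is a minimal illustration using Python's built-in sqlite3 module, not the pipeline's actual code: the roster and staging table layouts, the column names, and the function name enrich_from_staging are all hypothetical. The first call reads every roster token together with whether it matched and its earliest staged date. The second call is one batched write that sets matched rows and resets unmatched ones.

```python
import sqlite3


def enrich_from_staging(conn, staging_table, date_column, flag_field, date_field):
    """Refresh one flag/date pair on every roster row from one staging table.

    Hypothetical schema: roster(token_hash, <flag_field>, <date_field>) and
    <staging_table>(token_hash, <date_column>). Identifiers are interpolated
    into the SQL, so they must be trusted constants, never user input.
    Returns the number of roster tokens that matched.
    """
    # Call 1 (read): every roster token, whether it matched, and its earliest
    # staged date. The LEFT JOIN keeps unmatched tokens (count 0, date NULL).
    rows = conn.execute(
        f"SELECT r.token_hash, COUNT(s.token_hash) > 0, MIN(s.{date_column}) "
        f"FROM roster AS r LEFT JOIN {staging_table} AS s "
        "ON s.token_hash = r.token_hash "
        "GROUP BY r.token_hash"
    ).fetchall()

    # Call 2 (batched write): matched rows get True/date, unmatched get False/None.
    conn.executemany(
        f"UPDATE roster SET {flag_field} = ?, {date_field} = ? WHERE token_hash = ?",
        [(bool(matched), first_date, token) for token, matched, first_date in rows],
    )
    conn.commit()
    return sum(1 for _, matched, _ in rows if matched)


# Usage with toy data (token a1 has two staged records; zz is not on the roster).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE roster (token_hash TEXT PRIMARY KEY, example_flag INTEGER, example_date TEXT);
    CREATE TABLE example_staging (token_hash TEXT, start_date TEXT);
    INSERT INTO roster (token_hash) VALUES ('a1'), ('b2'), ('c3');
    INSERT INTO example_staging VALUES ('a1', '2021-03-01'), ('a1', '2019-07-15'), ('zz', '2020-01-01');
""")
print(enrich_from_staging(conn, "example_staging", "start_date", "example_flag", "example_date"))  # 1
print(conn.execute("SELECT * FROM roster ORDER BY token_hash").fetchall())
# [('a1', 1, '2019-07-15'), ('b2', 0, None), ('c3', 0, None)]
```

Because the write covers every roster row rather than only the matches, unmatched rows are explicitly reset on each run instead of keeping whatever value they had before.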
Idempotency
All five enrichment functions perform a complete refresh — every roster row is updated on each run. Re-running with updated staging data is safe and replaces stale flags rather than accumulating duplicates.
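The difference between a complete refresh and an update that touches only matched rows is easiest to see when a corrected extract drops a record. The sketch below assumes an in-memory roster keyed by token and a single hypothetical field named flag; both functions are illustrative, not part of the pipeline.

```python
import copy


def full_refresh(roster, staged_tokens):
    """Complete refresh: overwrite the flag on EVERY roster row, matched or not."""
    for token, row in roster.items():
        row["flag"] = token in staged_tokens


def matched_only_update(roster, staged_tokens):
    """Anti-pattern for contrast: touches only matched rows, so old True flags linger."""
    for token, row in roster.items():
        if token in staged_tokens:
            row["flag"] = True


roster_a = {t: {"flag": False} for t in ("t1", "t2", "t3")}
roster_b = copy.deepcopy(roster_a)

# The first extract matches t1 and t2; a corrected extract later drops t2.
for extract in ({"t1", "t2"}, {"t1"}):
    full_refresh(roster_a, extract)
    matched_only_update(roster_b, extract)

print({t: r["flag"] for t, r in roster_a.items()})  # {'t1': True, 't2': False, 't3': False}
print({t: r["flag"] for t, r in roster_b.items()})  # {'t1': True, 't2': True, 't3': False}  (t2 is stale)
```

Running full_refresh again with the same extract leaves the roster unchanged, which is what makes re-runs safe.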
The Five Enrichment Sources
Medicaid enrollment
Matches on token_hash. Sets medicaid_effective_date to the start of the earliest enrollment period across all records for that token. Captures current and historical Medicaid-style coverage, as defined by your jurisdiction.
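A minimal sketch of the Medicaid date rule, assuming each staged record is one enrollment period with a start date and an optional end date. The record layout and the function name medicaid_effective_dates are hypothetical. Tokens not on the roster are ignored, and roster tokens with no enrollment get None.

```python
from datetime import date

# Hypothetical staged Medicaid rows: (token_hash, period_start, period_end or None if ongoing).
staged = [
    ("tok_a", date(2021, 1, 1), None),
    ("tok_a", date(2018, 6, 1), date(2019, 5, 31)),
    ("tok_b", date(2022, 3, 1), date(2022, 9, 30)),
    ("tok_x", date(2017, 1, 1), None),  # not on the roster, so ignored
]
roster_tokens = {"tok_a", "tok_b", "tok_c"}


def medicaid_effective_dates(staged, roster_tokens):
    """Earliest enrollment-period start per roster token (sketch; schema is assumed)."""
    earliest = {}
    for token, start, _end in staged:  # the end date is not needed for this field
        if token in roster_tokens and (token not in earliest or start < earliest[token]):
            earliest[token] = start
    # Complete refresh: every roster token gets a value, None when unmatched.
    return {token: earliest.get(token) for token in sorted(roster_tokens)}


print(medicaid_effective_dates(staged, roster_tokens))
# {'tok_a': datetime.date(2018, 6, 1), 'tok_b': datetime.date(2022, 3, 1), 'tok_c': None}
```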
HMIS — homelessness services
The Homeless Management Information System (HMIS) tracks service touchpoints: street outreach, emergency shelter, transitional housing, and supportive housing. Sets homeless_first_service_date to the earliest entry date across all service types.
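To illustrate "earliest entry date across all service types", the query below deliberately applies no filter on service type. The table name, column names, and service-type labels are assumptions. Dates are stored as ISO-8601 text, which sorts chronologically, so MIN() returns the earliest date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical HMIS staging schema; column names are assumptions.
    CREATE TABLE hmis_staging (token_hash TEXT, service_type TEXT, entry_date TEXT);
    INSERT INTO hmis_staging VALUES
        ('tok_a', 'emergency_shelter',    '2022-11-04'),
        ('tok_a', 'street_outreach',      '2022-08-19'),
        ('tok_a', 'transitional_housing', '2023-02-01'),
        ('tok_b', 'supportive_housing',   '2021-05-10');
""")

# No WHERE clause on service_type: the earliest entry of ANY type wins.
first_service = dict(conn.execute(
    "SELECT token_hash, MIN(entry_date) FROM hmis_staging "
    "GROUP BY token_hash ORDER BY token_hash"
).fetchall())
print(first_service)  # {'tok_a': '2022-08-19', 'tok_b': '2021-05-10'}
```

For tok_a, the street outreach contact predates the shelter stay, so it becomes the first service date even though shelter is a different service type.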
Corrections — jail & prison records
Jail (county) and prison (state) records are handled in two separate queries, so a person with both types of incarceration history gets both flags set independently. Each admission date is the earliest admission of that type across all matching records.
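A sketch of the two-query approach, assuming a hypothetical staging table with a facility_type column that distinguishes jail from prison records. Because each type is queried and stored separately, a match in one never overwrites or suppresses the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical corrections staging schema; 'facility_type' values are assumptions.
    CREATE TABLE doc_staging (token_hash TEXT, facility_type TEXT, admission_date TEXT);
    INSERT INTO doc_staging VALUES
        ('tok_a', 'jail',   '2019-04-02'),
        ('tok_a', 'prison', '2020-01-15'),
        ('tok_a', 'jail',   '2017-09-30'),
        ('tok_b', 'prison', '2016-12-01');
""")

QUERY = ("SELECT token_hash, MIN(admission_date) FROM doc_staging "
         "WHERE facility_type = ? GROUP BY token_hash ORDER BY token_hash")

# Two independent queries: matching one type never affects the other.
jail_first = dict(conn.execute(QUERY, ("jail",)).fetchall())
prison_first = dict(conn.execute(QUERY, ("prison",)).fetchall())

for token in ["tok_a", "tok_b", "tok_c"]:
    print(token,
          token in jail_first, jail_first.get(token),
          token in prison_first, prison_first.get(token))
# tok_a True 2017-09-30 True 2020-01-15
# tok_b False None True 2016-12-01
# tok_c False None False None
```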
Immunization information system (IIS)
COVID-19 and influenza vaccinations are tracked separately. For COVID-19, the earliest vaccination date is stored; for influenza, only a boolean is stored (whether any influenza vaccination is on record). Many jurisdictions operate a statewide or regional IIS with high population coverage.
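The asymmetry between the two vaccine groups can be sketched as follows. The vaccine_group labels, the row layout, the function name enrich_iis, and the output field names covid_first_date and influenza_any are hypothetical stand-ins for whatever the IIS extract and roster actually use.

```python
from datetime import date

# Hypothetical IIS staging rows: (token_hash, vaccine_group, administered_on).
iis_rows = [
    ("tok_a", "COVID-19", date(2021, 4, 12)),
    ("tok_a", "COVID-19", date(2021, 3, 22)),
    ("tok_a", "influenza", date(2023, 10, 5)),
    ("tok_b", "influenza", date(2022, 9, 30)),
]


def enrich_iis(rows, roster_tokens):
    """COVID-19 keeps the earliest date; influenza keeps only a yes/no flag."""
    covid_first = {}
    flu_any = set()
    for token, group, given in rows:
        if token not in roster_tokens:
            continue
        if group == "COVID-19":
            if token not in covid_first or given < covid_first[token]:
                covid_first[token] = given
        elif group == "influenza":
            flu_any.add(token)  # the date is discarded; only presence matters
    return {
        t: {"covid_first_date": covid_first.get(t), "influenza_any": t in flu_any}
        for t in sorted(roster_tokens)
    }


print(enrich_iis(iis_rows, {"tok_a", "tok_b", "tok_c"})["tok_a"])
# {'covid_first_date': datetime.date(2021, 3, 22), 'influenza_any': True}
```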
Vital statistics (death records)
Death records from civil vital statistics are matched to the roster. When multiple records exist for the same token (a data quality edge case), the latest death date is used. Deceased patients can then be excluded from active-patient denominators or analyzed separately.
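A minimal sketch of the "latest date wins" rule for duplicate death records, followed by one possible use of the result. The row layout and variable names are assumptions, not the pipeline's schema.

```python
from datetime import date

# Hypothetical death-record staging rows: (token_hash, death_date).
death_rows = [
    ("tok_a", date(2023, 6, 1)),
    ("tok_a", date(2023, 6, 3)),  # duplicate record for the same token
    ("tok_b", date(2022, 1, 20)),
]
roster_tokens = ["tok_a", "tok_b", "tok_c", "tok_d"]
on_roster = set(roster_tokens)

# When a token has several death records, keep the LATEST date.
death_date = {}
for token, died in death_rows:
    if token in on_roster:
        death_date[token] = max(died, death_date.get(token, died))

# One possible use: drop deceased patients from an active-patient denominator.
active = [t for t in roster_tokens if t not in death_date]
print(death_date["tok_a"], len(active))  # 2023-06-03 2
```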
Enrichment Status — Current Roster
Staging tables are empty in the simulation
The five staging tables (MedicaidEnrollment, HMISRecord, DOCRecord, MIICRecord, VitalStatisticsRecord) contain no rows. In a real deployment, data stewards would load extracts from these five sources before running the roster enrichment job, which would then populate the roster's enrichment flags.
Staging Table Row Counts
| Staging Table | Rows loaded |
|---|---|
| MedicaidEnrollment | 0 |
| HMISRecord | 0 |
| DOCRecord | 0 |
| MIICRecord | 0 |
| VitalStatisticsRecord | 0 |
After staging files are loaded, run the roster enrichment job to refresh flags across the 1,000-row roster.
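Before running the enrichment job, a preflight check on staging row counts can catch a missed load. Because every enrichment function does a complete refresh, running against an empty staging table would reset that source's flags to False/None on every roster row. The sketch assumes the staging model names on this page are also the database table names, which may not hold in a real deployment (ORMs often prefix table names); the function name staging_row_counts is hypothetical.

```python
import sqlite3

# Staging model names from this page; using them directly as table names is an assumption.
STAGING_TABLES = ["MedicaidEnrollment", "HMISRecord", "DOCRecord",
                  "MIICRecord", "VitalStatisticsRecord"]


def staging_row_counts(conn):
    """Return {table: row count} for each staging table."""
    return {t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
            for t in STAGING_TABLES}


# Toy setup: only the Medicaid extract has been loaded.
conn = sqlite3.connect(":memory:")
for t in STAGING_TABLES:
    conn.execute(f'CREATE TABLE "{t}" (token_hash TEXT)')
conn.execute('INSERT INTO "MedicaidEnrollment" VALUES (?)', ("tok_a",))

counts = staging_row_counts(conn)
empty = [t for t, n in counts.items() if n == 0]
if empty:
    print("Empty staging tables:", ", ".join(empty))
# Empty staging tables: HMISRecord, DOCRecord, MIICRecord, VitalStatisticsRecord
```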