HTAC Pipeline · Step 2 of 7
PPRL Token Generation
Before any patient can be matched across health systems, their identity is converted into a one-way cryptographic token using HMAC-SHA256. The six PII fields used to generate the token are immediately discarded — only the 64-character hex digest is retained.
Why this step?
HTAC needs to know when the same patient appears at multiple health systems so that the patient is not counted twice in prevalence denominators. But health systems cannot legally share patient names or identifiers across organizational boundaries. Privacy-preserving record linkage (PPRL) solves this: two sites that hold the same PII compute the same token without exchanging the PII itself.
Why HMAC instead of plain SHA-256?
Plain SHA-256 is deterministic globally: anyone who holds or guesses a patient's PII can compute the same hash and compare it against a token, which enables dictionary-style re-identification by outsiders. HMAC-SHA256 keyed with a shared secret salt makes tokens reproducible only by sites in the coordinated group that hold the salt. Tokens generated under a different salt (for example, by another network) cannot be linked to HTAC tokens.
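A minimal sketch of the difference, using Python's standard `hashlib` and `hmac` modules. The preimage string and both salt values are hypothetical placeholders, not real HTAC values or the actual HTAC preimage layout.

```python
import hashlib
import hmac

# Hypothetical normalized preimage (field order and separator are assumptions).
PREIMAGE = "jane|doe|19800101|f|6125551234|55401"

def plain_sha256(value: str) -> str:
    # Unkeyed: anyone who can guess the PII can recompute this digest.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def keyed_token(salt: str, value: str) -> str:
    # HMAC-SHA256 keyed with a secret salt: only salt holders can reproduce it.
    return hmac.new(salt.encode("utf-8"), value.encode("utf-8"), hashlib.sha256).hexdigest()

# An outsider who guesses the identity reproduces the plain digest exactly.
print(plain_sha256(PREIMAGE) == plain_sha256("jane|doe|19800101|f|6125551234|55401"))  # True

# Two sites in the same group share one (hypothetical) salt and get the same token.
site_a = keyed_token("group-salt-example", PREIMAGE)
site_b = keyed_token("group-salt-example", PREIMAGE)
print(site_a == site_b)  # True

# A network using a different salt produces an unrelated, unlinkable token.
print(site_a == keyed_token("other-network-salt", PREIMAGE))  # False
```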
Validation
This PPRL approach has been validated in multi-site research networks (e.g., OneFlorida), producing 97% precision (few false links) and 75% recall (roughly three of every four true matches found), thresholds considered acceptable for population-level surveillance.
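For reference, the two metrics follow the standard record-linkage definitions, where TP counts correct links, FP counts links that join records of different patients, and FN counts true matches that were not linked:

```latex
\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}
```

Read this way, 97% precision means about 3 in 100 proposed links are false, and 75% recall means about one in four true cross-site matches is missed, so some patients seen at several sites may still be counted more than once.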
The Six PII Input Fields
Each field is normalized before concatenation. Normalization ensures minor formatting differences (e.g., "612-555-1234" vs "6125551234") produce identical tokens.
| Field | Normalization |
|---|---|
| First Name | Lowercase folding |
| Last Name | Lowercase folding |
| Date of Birth | YYYYMMDD (separators stripped) |
| Sex | First character only, lowercased (m / f) |
| Phone Number | Digits only (all non-numeric characters stripped) |
| ZIP Code | First 5 characters only |

How the Hash Is Computed
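Before hashing, the six fields are normalized and concatenated into a single preimage. The following is a minimal sketch of the normalization rules listed above, assuming string inputs and a date of birth already in year-month-day order. The helper names, the field order, and the `|` separator are hypothetical; the source does not specify them.

```python
import re

def normalize_name(value: str) -> str:
    # Lowercase folding: "Jane" and "JANE" both become "jane".
    return value.lower()

def normalize_dob(value: str) -> str:
    # YYYYMMDD: strip separators, so "1980-01-01" and "1980/01/01" become "19800101".
    digits = re.sub(r"\D", "", value)
    if len(digits) != 8:
        raise ValueError(f"expected an 8-digit date, got {value!r}")
    return digits

def normalize_sex(value: str) -> str:
    # First character only, lowercased: "Female" -> "f", "M" -> "m".
    return value[:1].lower()

def normalize_phone(value: str) -> str:
    # Digits only: "612-555-1234" and "6125551234" both become "6125551234".
    return re.sub(r"\D", "", value)

def normalize_zip(value: str) -> str:
    # First five characters only: "55401-1234" -> "55401".
    return value[:5]

def build_preimage(first: str, last: str, dob: str, sex: str,
                   phone: str, zip_code: str, sep: str = "|") -> str:
    # Hypothetical field order and separator for the concatenated preimage.
    parts = [normalize_name(first), normalize_name(last), normalize_dob(dob),
             normalize_sex(sex), normalize_phone(phone), normalize_zip(zip_code)]
    return sep.join(parts)

print(build_preimage("Jane", "DOE", "1980-01-01", "Female", "612-555-1234", "55401-1234"))
# jane|doe|19800101|f|6125551234|55401
```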
Site-group salt (HTAC_PPRL_SALT): htac-prod-salt-[secret]

Tokens are computed as HMAC-SHA256 over the normalized preimage, keyed with the site-group salt; when no salt is supplied with the input, it is read from deployment configuration. Personal identifiers are supplied only transiently through controlled intake paths and are not persisted in the clinical store.
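A minimal sketch of token generation with the salt fallback described above, using the standard `hmac` module. It assumes the deployment configuration is exposed as an environment variable named `HTAC_PPRL_SALT` (the actual configuration mechanism is not specified), and `generate_token` is a hypothetical helper name.

```python
import hashlib
import hmac
import os
from typing import Optional

def generate_token(preimage: str, salt: Optional[str] = None) -> str:
    """Return the 64-character lowercase hex HMAC-SHA256 digest of a normalized preimage."""
    if salt is None:
        # Fall back to deployment configuration (assumed here to be an env var).
        salt = os.environ.get("HTAC_PPRL_SALT")
    if not salt:
        raise RuntimeError("no site-group salt supplied or configured")
    digest = hmac.new(salt.encode("utf-8"), preimage.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()  # hexdigest() is already lowercase hex

# Example with an explicit, non-production salt; only the digest would be kept.
token = generate_token("jane|doe|19800101|f|6125551234|55401", salt="example-salt")
print(len(token))  # 64
```

Only `token` would be stored; the preimage and the raw fields go out of scope after the call.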
Output: HashToken Record
One HashToken row is written per patient per health system.
A patient seen at two sites generates two HashToken rows with identical token values; this is how cross-site matches are detected in Step 3.
| Field | Type | Description |
|---|---|---|
| token | str(64) | Lowercase hex HMAC-SHA256 digest, used as the privacy-preserving patient identifier |
| person | FK → Person | The Person record this token represents at this site |
| health_system | FK → HealthSystem | The site that generated this token |
| generated_at | datetime | Timestamp of token generation (auto-set) |
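A minimal sketch of the HashToken record as a plain Python dataclass, plus a grouping step that surfaces tokens shared across sites. The field names follow the table above; modeling `person` and `health_system` as string IDs instead of foreign keys, the sample rows, and the `find_cross_site_tokens` helper are hypothetical, and the actual Step 3 logic may differ.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class HashToken:
    token: str          # 64-char lowercase hex HMAC-SHA256 digest
    person: str         # stand-in for the site-local FK to Person
    health_system: str  # stand-in for the FK to HealthSystem
    generated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def find_cross_site_tokens(rows):
    """Map each token value seen at 2+ health systems to the set of those systems."""
    sites_by_token = defaultdict(set)
    for row in rows:
        sites_by_token[row.token].add(row.health_system)
    return {tok: sites for tok, sites in sites_by_token.items() if len(sites) >= 2}

# Hypothetical rows: one patient with different Person IDs at two sites.
rows = [
    HashToken("a" * 64, person="p1", health_system="Mayo Clinic"),
    HashToken("a" * 64, person="p7", health_system="Sanford Health"),
    HashToken("b" * 64, person="p2", health_system="Mayo Clinic"),
]
matches = find_cross_site_tokens(rows)
print({tok[:8]: sorted(sites) for tok, sites in matches.items()})
# {'aaaaaaaa': ['Mayo Clinic', 'Sanford Health']}
```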
Simulation Token Counts
Total tokens: 1,320
Cross-site shared tokens: 320

These represent patients appearing at 2+ sites; they will be deduplicated in Step 3.
| Health System | HashToken rows |
|---|---|
| Allina Health | 176 |
| CentraCare | 86 |
| Children's Regional Health | 77 |
| Essentia Health | 135 |
| HealthPartners | 170 |
| Hennepin Healthcare | 89 |
| M Health Fairview | 169 |
| Mayo Clinic | 150 |
| Minneapolis VA | 90 |
| North Memorial Health | 72 |
| Sanford Health | 106 |
| Total | 1,320 |
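As a quick arithmetic check, the eleven per-site counts sum to the reported total of 1,320 HashToken rows. Because a patient seen at several sites contributes one row per site, this row total is larger than the number of distinct token values whenever tokens are shared across sites.

```python
# Per-site HashToken row counts copied from the table above.
rows_per_site = {
    "Allina Health": 176,
    "CentraCare": 86,
    "Children's Regional Health": 77,
    "Essentia Health": 135,
    "HealthPartners": 170,
    "Hennepin Healthcare": 89,
    "M Health Fairview": 169,
    "Mayo Clinic": 150,
    "Minneapolis VA": 90,
    "North Memorial Health": 72,
    "Sanford Health": 106,
}

total_rows = sum(rows_per_site.values())
print(f"{total_rows:,}")  # 1,320
```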