HTAC Pipeline · Step 2 of 7

PPRL Token Generation

Pipeline steps: 1 Clinical Data · 2 PPRL Tokens · 3 Deduplication · 4 Enrichment · 5 Cohort Queries · 6 Suppression · 7 Results

Before any patient can be matched across health systems, their identity is converted into a one-way cryptographic token using HMAC-SHA256. This is the core of privacy-preserving record linkage (PPRL). The six PII fields used to generate the token are discarded immediately; only the 64-character hex digest is retained.

Why this step?

HTAC needs to know when the same patient appears at multiple health systems so they are not counted twice in prevalence denominators. But health systems cannot legally share patient names or identifiers across organizational boundaries. PPRL solves this: two sites compute the same hash from the same PII — without exchanging the PII itself.

Why HMAC instead of plain SHA-256?

Plain SHA-256 is deterministic for everyone: anyone who holds, or can guess, the same PII computes the same hash, which enables external re-identification (dictionary) attacks. HMAC-SHA256 keyed with a shared secret salt makes tokens meaningful only within the coordinated site group. Without the salt, an outsider cannot recompute a token, and tokens from one network cannot be linked to tokens from another network that uses a different salt.
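The difference can be illustrated with a minimal sketch using Python's standard hashlib and hmac modules. The preimage is the example string shown later on this page; the two key values and the keyed_token helper are made-up placeholders for illustration, not HTAC secrets or HTAC code.

```python
import hashlib
import hmac

# Example preimage from this page (normalized PII fields, concatenated).
preimage = b"johnsmith19820315m612555123455401"

# Plain SHA-256: anyone who can guess the PII can recompute this value.
plain_digest = hashlib.sha256(preimage).hexdigest()


def keyed_token(key: bytes, msg: bytes) -> str:
    """HMAC-SHA256 digest as lowercase hex (hypothetical helper)."""
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


# Placeholder keys standing in for two different site-group salts.
token_group_a = keyed_token(b"example-salt-group-a", preimage)
token_group_b = keyed_token(b"example-salt-group-b", preimage)

print(plain_digest == hashlib.sha256(preimage).hexdigest())  # True: reproducible by anyone
print(token_group_a == token_group_b)  # False: different keys give unlinkable tokens
```

The same PII yields the same token only for parties that hold the same key, which is the property the site group relies on.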

Validation

This PPRL approach has been validated in multi-site research networks (e.g., OneFlorida), producing 97% precision (few false links) and 75% recall (roughly three of every four true matches found), levels treated as acceptable for population-level surveillance.
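To make the two figures concrete, here is a minimal sketch of how precision and recall are computed from link counts. The counts below are hypothetical, chosen only so that the ratios reproduce the quoted percentages; they are not data from OneFlorida or from HTAC.

```python
def precision(true_links: int, false_links: int) -> float:
    """Share of reported links that are correct."""
    return true_links / (true_links + false_links)


def recall(true_links: int, missed_links: int) -> float:
    """Share of real matches that were found."""
    return true_links / (true_links + missed_links)


# Hypothetical counts for illustration only.
tp, fp, fn = 291, 9, 97

print(round(precision(tp, fp), 2))  # 0.97
print(round(recall(tp, fn), 2))     # 0.75
```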

The Six PII Input Fields

Each field is normalized before concatenation. Normalization ensures minor formatting differences (e.g., "612-555-1234" vs "6125551234") produce identical tokens.

Field 1: First Name. Normalization: lowercase folding
Field 2: Last Name. Normalization: lowercase folding
Field 3: Date of Birth. Normalization: YYYYMMDD (strips separators)
Field 4: Sex. Normalization: first character only, lowercase (m / f)
Field 5: Phone Number. Normalization: digits only, strip all non-numeric characters
Field 6: ZIP Code. Normalization: first 5 characters only
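The six rules above can be sketched as small Python functions. This is an illustrative reading of the rules, not the HTAC implementation: the function names are hypothetical, trimming surrounding whitespace is an added assumption, and the date rule assumes the input is already in year-month-day order.

```python
import re


def normalize_name(value: str) -> str:
    # Lowercase folding; whitespace trimming is an assumption.
    return value.strip().lower()


def normalize_dob(value: str) -> str:
    # Strip separators; assumes input is already year-month-day, e.g. "1982-03-15".
    return re.sub(r"[^0-9]", "", value)


def normalize_sex(value: str) -> str:
    # First character only, lowercase.
    return value.strip()[:1].lower()


def normalize_phone(value: str) -> str:
    # Digits only.
    return re.sub(r"[^0-9]", "", value)


def normalize_zip(value: str) -> str:
    # First 5 characters only.
    return value.strip()[:5]


def build_preimage(first, last, dob, sex, phone, zip_code) -> str:
    # Concatenate in field order with no delimiter, as in this page's example.
    return "".join([
        normalize_name(first),
        normalize_name(last),
        normalize_dob(dob),
        normalize_sex(sex),
        normalize_phone(phone),
        normalize_zip(zip_code),
    ])


print(build_preimage("John", "Smith", "1982-03-15", "Male", "612-555-1234", "55401-1234"))
# johnsmith19820315m612555123455401
```

Two differently formatted copies of the same record, such as one with "612-555-1234" and one with "6125551234", produce the same preimage and therefore the same token.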

How the Hash Is Computed

Preimage (normalized fields, concatenated): johnsmith19820315m612555123455401
Shared site-group salt (from deployment configuration HTAC_PPRL_SALT): htac-prod-salt-[secret]
Algorithm: HMAC-SHA256(key=salt, msg=preimage) → 3a7f8c2d19e04b5a… (64 hex chars)

Token generation follows HMAC-SHA256 over a normalized preimage; the site-group salt is taken from deployment configuration when not supplied on input. Personal identifiers are supplied only transiently from controlled intake paths and are not persisted in the clinical store.
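A minimal sketch of this step, under stated assumptions: the function name generate_token is hypothetical, the deployment configuration is assumed to be exposed as an environment variable named HTAC_PPRL_SALT, and both salt and preimage are assumed to be UTF-8 encoded. The salt value in the usage line is a placeholder, not a real secret.

```python
import hashlib
import hmac
import os
from typing import Optional


def generate_token(preimage: str, salt: Optional[str] = None) -> str:
    """Return HMAC-SHA256(key=salt, msg=preimage) as 64 lowercase hex chars."""
    if salt is None:
        # Fall back to deployment configuration (assumed to be an env var).
        salt = os.environ.get("HTAC_PPRL_SALT")
    if not salt:
        raise ValueError("no salt supplied and HTAC_PPRL_SALT is not set")
    digest = hmac.new(salt.encode("utf-8"), preimage.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


token = generate_token(
    "johnsmith19820315m612555123455401",
    salt="example-salt-not-a-real-secret",  # placeholder value
)
print(len(token))  # 64
```

Only the returned digest would be stored; the preimage and the PII behind it exist only for the duration of the call.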

Output: HashToken Record

One HashToken row is written per patient per health system. A patient seen at two sites generates two HashToken rows with identical token values — this is how cross-site matches are detected in Step 3.

Field | Type | Description
token | str(64) | Lowercase hex HMAC-SHA256 digest; the privacy-preserving patient identifier
person | FK → Person | The Person record this token represents at this site
health_system | FK → HealthSystem | The site that generated this token
generated_at | datetime | Timestamp of token generation (auto-set)
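The record shape can be sketched as a plain Python dataclass. This is an illustration only: the actual storage layer is not shown on this page, the foreign keys are represented here as integer ids (an assumption), and the validation in __post_init__ is added for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

HEX_CHARS = "0123456789abcdef"


@dataclass(frozen=True)
class HashToken:
    token: str              # 64-char lowercase hex HMAC-SHA256 digest
    person_id: int          # stands in for FK -> Person (assumed integer id)
    health_system_id: int   # stands in for FK -> HealthSystem (assumed integer id)
    # Auto-set at creation time; UTC is an assumption.
    generated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        if len(self.token) != 64 or any(c not in HEX_CHARS for c in self.token):
            raise ValueError("token must be 64 lowercase hex characters")


row = HashToken(token="ab" * 32, person_id=1, health_system_id=2)
print(len(row.token))  # 64
```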

Simulation Token Counts

Total tokens: 1,320

Cross-site shared tokens: 320

These represent patients appearing at 2+ sites; they will be deduplicated in Step 3.
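One way to find tokens that appear at two or more sites is to group rows by token value. The sketch below is an assumption about how such a check could look, not the Step 3 implementation; the function name is hypothetical and the token values are shortened stand-ins for real 64-character digests.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def cross_site_tokens(rows: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each token seen at 2+ health systems to the set of those systems."""
    sites_by_token: Dict[str, Set[str]] = defaultdict(set)
    for token, health_system in rows:
        sites_by_token[token].add(health_system)
    return {t: sites for t, sites in sites_by_token.items() if len(sites) >= 2}


# (token, health_system) pairs; tokens shortened for readability.
rows = [
    ("tok-a", "Allina Health"),
    ("tok-a", "Mayo Clinic"),
    ("tok-b", "CentraCare"),
    ("tok-c", "Sanford Health"),
]

shared = cross_site_tokens(rows)
print(sorted(shared))  # ['tok-a']
```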

Health System | HashToken rows
Allina Health | 176
CentraCare | 86
Children's Regional Health | 77
Essentia Health | 135
HealthPartners | 170
Hennepin Healthcare | 89
M Health Fairview | 169
Mayo Clinic | 150
Minneapolis VA | 90
North Memorial Health | 72
Sanford Health | 106
Total | 1,320
Latest pipeline demonstration (completed May 14, 2026 20:05). Counts on this page reflect synthetic federated data from that run.