HTAC Pipeline · Step 2 of 7

PPRL Token Generation

Pipeline steps: Clinical Data → PPRL Tokens → Deduplication → Enrichment → Cohort Queries → Suppression → Results

Before a patient can be matched across health systems, their identity is converted into a one-way cryptographic token using HMAC-SHA256, the basis of this privacy-preserving record linkage (PPRL) step. The six PII fields used to generate the token are discarded immediately; only the 64-character hex digest is retained.

Why this step?

HTAC needs to know when the same patient appears at multiple health systems so they are not counted twice in prevalence denominators. But health systems cannot legally share patient names or identifiers across organizational boundaries. PPRL solves this: two sites compute the same hash from the same PII — without exchanging the PII itself.

Why HMAC instead of plain SHA-256?

Plain SHA-256 is globally deterministic: anyone who knows or can guess a patient's PII can compute the same hash, which enables external re-identification attacks. HMAC-SHA256 keyed with a shared secret salt makes tokens meaningful only within the coordinated site group. An outsider without the salt cannot recompute a token from known PII, and tokens from networks that use different salts cannot be linked to one another.
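The difference can be illustrated with a minimal sketch using Python's standard `hashlib` and `hmac` modules. The two salts and the helper names below are hypothetical and exist only for this illustration; they are not HTAC's actual values or code.

```python
import hashlib
import hmac

# Hypothetical preimage and salts, for illustration only.
preimage = b"johnsmith19820315m612555123455401"
network_a_salt = b"network-a-secret"
network_b_salt = b"network-b-secret"


def plain_sha256(msg: bytes) -> str:
    """Unkeyed hash: reproducible by anyone who knows the PII."""
    return hashlib.sha256(msg).hexdigest()


def keyed_token(salt: bytes, msg: bytes) -> str:
    """Keyed hash: reproducible only by parties holding the salt."""
    return hmac.new(salt, msg, hashlib.sha256).hexdigest()


# An outsider with the same PII gets the same plain hash.
outsider_hash = plain_sha256(preimage)

# The same PII under two different salts yields unrelated tokens.
token_a = keyed_token(network_a_salt, preimage)
token_b = keyed_token(network_b_salt, preimage)

print("plain hash reproducible:", outsider_hash == plain_sha256(preimage))
print("tokens linkable across networks:", token_a == token_b)
```

Two sites that hold the same salt would both compute `token_a` for this patient, which is what allows matching inside the site group.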

Validation

This PPRL approach has been validated in multi-site research networks (e.g., OneFlorida), producing 97% precision (few false links) and 75% recall (about three in four true matches found). These levels are considered acceptable for population-level surveillance.
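For readers less familiar with these metrics, the arithmetic can be sketched as follows. The link counts are hypothetical, chosen only so that the ratios come out to the quoted 97% and 75%; they are not figures from OneFlorida or from HTAC.

```python
def precision(true_links: int, false_links: int) -> float:
    """Share of reported links that are real matches."""
    return true_links / (true_links + false_links)


def recall(true_links: int, missed_matches: int) -> float:
    """Share of real matches that were actually linked."""
    return true_links / (true_links + missed_matches)


# Hypothetical counts, for illustration only.
true_links = 291      # token matches that are the same patient
false_links = 9       # token matches that are different patients
missed_matches = 97   # same patient, but tokens differ (e.g., a typo in the PII)

p = precision(true_links, false_links)
r = recall(true_links, missed_matches)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.97 recall=0.75
```

Because tokens are exact-match hashes, any difference in a normalized field at one site produces a different token, which is one plausible reason recall is lower than precision.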

The Six PII Input Fields

Each field is normalized before concatenation. Normalization ensures minor formatting differences (e.g., "612-555-1234" vs "6125551234") produce identical tokens.

Field	Name	Normalization
1	First Name	Lowercase folding
2	Last Name	Lowercase folding
3	Date of Birth	YYYYMMDD (strips separators)
4	Sex	First character only, lowercase (m / f)
5	Phone Number	Digits only; all non-numeric characters stripped
6	ZIP Code	First 5 characters only
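The six normalization rules can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes the date of birth arrives in year-month-day order with optional separators; the real intake code may handle more input formats.

```python
import re


def normalize_name(value: str) -> str:
    # Fields 1 and 2: lowercase folding.
    return value.lower()


def normalize_dob(value: str) -> str:
    # Field 3: strip separators to get YYYYMMDD.
    # Assumes the input is already in year-month-day order.
    return re.sub(r"\D", "", value)


def normalize_sex(value: str) -> str:
    # Field 4: first character only, lowercase.
    return value[:1].lower()


def normalize_phone(value: str) -> str:
    # Field 5: keep digits only.
    return re.sub(r"\D", "", value)


def normalize_zip(value: str) -> str:
    # Field 6: first 5 characters only.
    return value[:5]


def build_preimage(first, last, dob, sex, phone, zip_code) -> str:
    # Concatenate the normalized fields in field order, with no delimiter.
    return (
        normalize_name(first)
        + normalize_name(last)
        + normalize_dob(dob)
        + normalize_sex(sex)
        + normalize_phone(phone)
        + normalize_zip(zip_code)
    )


print(build_preimage("John", "Smith", "1982-03-15", "Male", "612-555-1234", "55401-1234"))
# johnsmith19820315m612555123455401
```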

How the Hash Is Computed

Normalized fields concatenated: johnsmith19820315m612555123455401

Shared site-group salt (from deployment configuration HTAC_PPRL_SALT): htac-prod-salt-[secret]

Algorithm: HMAC-SHA256(key=salt, msg=preimage) → 3a7f8c2d19e04b5a… (64 hex chars)

Token generation follows HMAC-SHA256 over a normalized preimage; the site-group salt is taken from deployment configuration when not supplied on input. Personal identifiers are supplied only transiently from controlled intake paths and are not persisted in the clinical store.
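A minimal sketch of this computation is shown below. It assumes the deployment configuration is exposed as an environment variable named `HTAC_PPRL_SALT` and that strings are UTF-8 encoded before hashing; the function name `generate_token` is hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional

SALT_ENV_VAR = "HTAC_PPRL_SALT"  # assumed to be an environment variable


def generate_token(preimage: str, salt: Optional[str] = None) -> str:
    """Return the 64-character lowercase hex HMAC-SHA256 digest of the preimage."""
    if salt is None:
        # Fall back to deployment configuration when no salt is supplied.
        salt = os.environ.get(SALT_ENV_VAR)
    if not salt:
        raise ValueError(f"no salt supplied and {SALT_ENV_VAR} is not set")
    return hmac.new(
        salt.encode("utf-8"),      # key = site-group salt
        preimage.encode("utf-8"),  # msg = normalized, concatenated PII
        hashlib.sha256,
    ).hexdigest()


# Demo with a placeholder salt; a real deployment would use the shared secret.
demo_token = generate_token("johnsmith19820315m612555123455401", salt="demo-salt-not-a-secret")
print(len(demo_token))  # 64
```

Only the returned digest would be stored; the preimage string is a local value that goes out of scope once the function returns.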

Output: HashToken Record

One HashToken row is written per patient per health system. A patient seen at two sites generates two HashToken rows with identical token values — this is how cross-site matches are detected in Step 3.

Field	Type	Description
`token`	str(64)	Lowercase hex HMAC-SHA256 digest — the privacy-preserving patient identifier
`person`	FK → Person	The Person record this token represents at this site
`health_system`	FK → HealthSystem	The site that generated this token
`generated_at`	datetime	Timestamp of token generation (auto-set)
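The record shape and the cross-site match it enables can be sketched with a plain dataclass. This is an illustration under stated assumptions, not the project's actual model: the foreign keys are represented as integer IDs, and `cross_site_tokens` is a hypothetical helper.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class HashToken:
    token: str             # 64-char lowercase hex HMAC-SHA256 digest
    person_id: int         # stands in for FK -> Person
    health_system_id: int  # stands in for FK -> HealthSystem
    generated_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)  # auto-set
    )


def cross_site_tokens(rows):
    """Map each token seen at 2+ health systems to the set of those systems."""
    sites_by_token = defaultdict(set)
    for row in rows:
        sites_by_token[row.token].add(row.health_system_id)
    return {tok: sites for tok, sites in sites_by_token.items() if len(sites) >= 2}


# Placeholder tokens: one patient seen at sites 1 and 2, another only at site 3.
rows = [
    HashToken(token="a" * 64, person_id=101, health_system_id=1),
    HashToken(token="a" * 64, person_id=202, health_system_id=2),
    HashToken(token="b" * 64, person_id=303, health_system_id=3),
]
shared = cross_site_tokens(rows)
print(len(shared))  # 1
```

Note that the two matching rows carry different `person_id` values: each site has its own Person record, and only the identical token reveals that they refer to the same patient.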

Simulation Token Counts

Total tokens: 1,320

Cross-site shared tokens: 320

These represent patients appearing at 2+ sites; they will be deduplicated in Step 3.

Health System	HashToken rows
Allina Health	176
CentraCare	86
Children's Regional Health	77
Essentia Health	135
HealthPartners	170
Hennepin Healthcare	89
M Health Fairview	169
Mayo Clinic	150
Minneapolis VA	90
North Memorial Health	72
Sanford Health	106
Total	1,320
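As a quick consistency check, the per-system row counts in the table can be summed and compared with the reported total. The dictionary below simply restates the table; the variable names are illustrative.

```python
# HashToken rows per health system, copied from the table above.
rows_by_system = {
    "Allina Health": 176,
    "CentraCare": 86,
    "Children's Regional Health": 77,
    "Essentia Health": 135,
    "HealthPartners": 170,
    "Hennepin Healthcare": 89,
    "M Health Fairview": 169,
    "Mayo Clinic": 150,
    "Minneapolis VA": 90,
    "North Memorial Health": 72,
    "Sanford Health": 106,
}

total_rows = sum(rows_by_system.values())
print(len(rows_by_system), total_rows)  # 11 1320
```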

Latest pipeline demonstration (completed May 14, 2026 20:05). Counts on this page reflect synthetic federated data from that run.
