Progressive Person Matching
Problem Statement
The Registration Portal allows principals (logged-in users) to link other persons to their account for registration purposes - such as family members, team members, or dependents. This creates a sensitive security surface: we must enable legitimate linking while preventing attackers from using the person lookup feature to enumerate personal data.
Security Risks
| Risk | Description |
|---|---|
| Data Enumeration | An attacker could probe the system with partial data (e.g., common surnames) to discover if specific individuals exist in the database, violating POPIA principles. |
| Brute Force Matching | Without rate limiting and minimum proof-of-knowledge requirements, attackers could systematically guess personal details to link unauthorized persons. |
| Primary Key Exposure | Exposing internal |
| Information Disclosure | Returning full personal details for partial matches reveals protected information to unauthorized parties. |
POPIA Compliance Requirements
The Protection of Personal Information Act (POPIA) requires:
- Purpose Limitation - Personal data is used only for the stated purpose (registration linking)
- Security Safeguards - Technical measures to prevent unauthorized access
- Data Minimisation - Only reveal necessary information
- Accountability - Audit trail of all access attempts
As-Is: Search-Then-Link Flow
The current implementation searches first and links second, in four steps:
1. User enters search criteria (name, ID, etc.)
2. System returns list of matching persons
3. User selects from results
4. System creates LinkedPerson record
Current Security Gaps
| Gap | Risk | Impact |
|---|---|---|
| List-based results | Returns multiple matches with personal data | Enables enumeration of who exists in system |
| No minimum criteria | Can search with minimal information | Too easy to probe with common data |
| User.id exposure | Primary key returned in responses | Enables targeted attacks |
| No progressive disclosure | Full details shown immediately | Violates data minimisation principle |
| Rate limiting gaps | Per-endpoint only, not per-session | Determined attacker can work around |
To-Be: Progressive Matching with Weighted Scoring
Design Overview
The new design replaces "search-then-link" with "type → auto-match → confirm":
1. User types fields progressively
2. Frontend calculates local score (gate before API call)
3. Backend scores candidates, returns ONLY if unique match
4. Masked suggestion shown after delay
5. User confirms → LinkedPerson created with opaque token
Key Security Features
| Feature | Implementation |
|---|---|
| Frontend Pre-Scoring Gate | 20-point minimum before calling backend. Prevents trivial probing. |
| Weighted Field Scoring | Different fields contribute different points based on uniqueness value. |
| Uniqueness Requirement | Backend only returns a match if exactly one candidate meets threshold. Multiple matches return "AMBIGUOUS" requiring more fields. |
| Delayed Reveal | 3-second delay at threshold 50; immediate reveal only at 80+. Gives user time to add more fields for better access rights. |
| Masked Data Only | Suggestions show masked name (J* S*), age range, gender - never full details until linked. |
| Opaque Match Tokens | |
| Access Level by Score | Score 50-79 → |
| Cross-Account Trust | +20 boost if same physical User has verified links in another account (prevents re-verification friction) |
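The "Masked Data Only" row could be implemented along the following lines. This is a minimal sketch only: the function names `mask_name` and `age_range`, the length-preserving mask, and the 5-year bucket width are assumptions for illustration, not the portal's actual implementation.

```python
from datetime import date


def mask_name(name: str) -> str:
    """Keep the first character of each word and replace the rest with '*'."""
    return " ".join(word[0] + "*" * (len(word) - 1) for word in name.split())


def age_range(date_of_birth: date, today: date, bucket: int = 5) -> str:
    """Return a coarse age bucket such as '35-40' instead of the exact age."""
    # Subtract one year if this year's birthday has not happened yet
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    age = today.year - date_of_birth.year - (0 if had_birthday else 1)
    low = (age // bucket) * bucket
    return f"{low}-{low + bucket}"


print(mask_name("Johan Smith"))
print(age_range(date(1985, 1, 1), date(2022, 6, 1)))
```

A length-preserving mask still reveals how long each name is; a fixed-width mask (for example always `J* S*`) would disclose less, at the cost of being less recognisable to a legitimate user.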
Frontend Pre-Scoring (API Gate)
The frontend calculates a local score before calling the backend:
| Field | Points | Rules |
|---|---|---|
| First Name | 4-7 | Sliding scale: 2 chars=4, 3=5, 4=6, 5+=7. Minimum 2 chars. |
| Last Name | 4-7 | Sliding scale: 2 chars=4, 3=5, 4=6, 5+=7. Minimum 2 chars. |
| Date of Birth | 5 | Full date required. Ignored if full ID provided. |
| Gender | 4 | Single selection. Ignored if full ID provided. |
| ID Number (full) | 15 | Must be 13 chars AND pass Luhn checksum. Excludes DOB/Gender (no double-dipping). |
| ID Number (partial) | 5 | 6 digits matching valid YYMMDD (no 13th month). Same value as DOB. |
| Membership Number | 10 | Organisation-specific identifier |
| Email | 10 | Valid email format |
| Phone Number | 10 | Valid phone format |
Threshold: 20 points required to call backend or submit form.
Name Sliding Scale Formula
points = min(7, max(4, length + 2))
// 2 chars → 4 pts
// 3 chars → 5 pts
// 4 chars → 6 pts
// 5+ chars → 7 pts
ID Number Validation
Full ID (13 characters):
- Exactly 13 digits
- Positions 1-6: Valid date (YYMMDD) - no invalid months/days
- Position 13: Luhn checksum digit must validate
Double-Dip Prevention: When a full, valid ID is provided, the DOB and Gender fields contribute 0 points (that information is already encoded in the ID).
Partial ID (6 digits): Must match valid YYMMDD pattern to score 5 points.
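The pre-scoring rules above can be sketched as follows. This is an illustration under stated assumptions, not the portal's frontend code: all function names are hypothetical, email, phone and membership validation are reduced to boolean flags, a two-digit year is accepted if the date is valid in either the 1900s or the 2000s, and a partial ID and a DOB are assumed to count once rather than stack.

```python
from datetime import date


def is_valid_yymmdd(digits: str) -> bool:
    """True if the 6 digits form a real calendar date in either century."""
    if len(digits) != 6 or not (digits.isascii() and digits.isdigit()):
        return False
    yy, mm, dd = int(digits[:2]), int(digits[2:4]), int(digits[4:])
    for century in (1900, 2000):
        try:
            date(century + yy, mm, dd)
            return True
        except ValueError:
            continue
    return False


def luhn_valid(number: str) -> bool:
    """Standard Luhn check: double every second digit from the right."""
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:
            digit = digit * 2 - 9 if digit > 4 else digit * 2
        total += digit
    return total % 10 == 0


def is_valid_full_id(id_number: str) -> bool:
    """13 digits, a valid YYMMDD prefix, and a passing Luhn checksum."""
    return (
        len(id_number) == 13
        and id_number.isascii()
        and id_number.isdigit()
        and is_valid_yymmdd(id_number[:6])
        and luhn_valid(id_number)
    )


def name_points(value: str) -> int:
    """Sliding scale from the table: 2 chars=4 ... 5+ chars=7, else 0."""
    length = len(value.strip())
    return min(7, max(4, length + 2)) if length >= 2 else 0


def pre_score(first_name="", last_name="", id_number="", has_dob=False,
              has_gender=False, has_membership=False, has_email=False,
              has_phone=False) -> int:
    score = name_points(first_name) + name_points(last_name)
    if is_valid_full_id(id_number):
        score += 15  # DOB and gender contribute 0: already encoded in the ID
    else:
        # Assumption: a partial ID and a DOB carry the same date, counted once
        if has_dob or is_valid_yymmdd(id_number):
            score += 5
        if has_gender:
            score += 4
    score += 10 * (has_membership + has_email + has_phone)
    return score


THRESHOLD = 20
print(pre_score(first_name="Johan", id_number="8001015009087", has_gender=True) >= THRESHOLD)
print(pre_score(first_name="Jo", last_name="Sm", has_dob=True, has_gender=True) >= THRESHOLD)
```

In the first call the gender selection adds nothing because the full ID already passed validation (7 + 15 = 22). The second call reaches only 17 points, so the backend would not be called.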
Backend Scoring Weights
person-matching:
  weights:
    # ID Number (authoritative)
    sa-id-number-exact: 60        # Full 13 chars + Luhn; excludes DOB/Gender
    sa-id-number-partial: 25      # First 6 digits (YYMMDD)
    # Date of Birth (ignored if full ID provided)
    date-of-birth-exact: 25
    # Gender (ignored if full ID provided)
    gender-exact: 8
    # Names (sliding scale)
    surname-exact-min: 8          # 2 chars
    surname-exact-max: 15         # 5+ chars
    first-name-exact-min: 8       # 2 chars
    first-name-exact-max: 15      # 5+ chars
    # Contact info
    email-exact: 20
    mobile-exact: 20
    membership-number-exact: 40
  thresholds:
    minimum-to-suggest: 50        # Show masked suggestions
    confident-match: 80           # Highlight as likely match
    cross-account-trust-boost: 20 # Same User verified in other account
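A rough sketch of how these weights might combine for a single candidate is shown below. It is illustrative only: `score_candidate` and `name_weight` are hypothetical names, the configuration gives only the name weights for 2 and 5+ characters so the linear interpolation in between is an assumption, and a partial ID and a DOB are assumed to count once.

```python
WEIGHTS = {
    "sa-id-number-exact": 60,
    "sa-id-number-partial": 25,
    "date-of-birth-exact": 25,
    "gender-exact": 8,
    "email-exact": 20,
    "mobile-exact": 20,
    "membership-number-exact": 40,
}
NAME_MIN, NAME_MAX = 8, 15  # configured for 2 chars and 5+ chars
MINIMUM_TO_SUGGEST = 50
CONFIDENT_MATCH = 80


def name_weight(matched_length: int) -> int:
    """Assumed linear scale between the configured minimum and maximum."""
    if matched_length < 2:
        return 0
    steps = min(matched_length, 5) - 2  # 0..3
    return round(NAME_MIN + (NAME_MAX - NAME_MIN) * steps / 3)


def score_candidate(matched: set, surname_length: int = 0,
                    first_name_length: int = 0) -> int:
    """Sum the weights for one candidate.

    `matched` holds the weight keys whose values matched exactly.
    """
    score = name_weight(surname_length) + name_weight(first_name_length)
    if "sa-id-number-exact" in matched:
        # DOB and gender are already encoded in the ID, so they are skipped
        score += WEIGHTS["sa-id-number-exact"]
    else:
        # Assumption: partial ID and DOB carry the same date, counted once
        if matched & {"sa-id-number-partial", "date-of-birth-exact"}:
            score += WEIGHTS["date-of-birth-exact"]
        if "gender-exact" in matched:
            score += WEIGHTS["gender-exact"]
    for key in ("email-exact", "mobile-exact", "membership-number-exact"):
        if key in matched:
            score += WEIGHTS[key]
    return score


score = score_candidate({"date-of-birth-exact"}, surname_length=5, first_name_length=5)
print(score, score >= MINIMUM_TO_SUGGEST, score >= CONFIDENT_MATCH)
```

Under these assumptions, two full names plus a date of birth reach 55 points: enough to suggest a match, but short of the confident-match threshold.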
Response States
| Status | Meaning | UI Action |
|---|---|---|
| | No candidates meet threshold | Continue to new person creation |
| AMBIGUOUS | Multiple candidates qualify | Prompt for more fields (show suggested fields) |
| UNIQUE_MATCH | Exactly one candidate | Show masked suggestion |
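The mapping from candidate scores to a response state might look like the sketch below. `resolve_status` is a hypothetical helper, and `NO_MATCH` is an assumed placeholder: the document only names the AMBIGUOUS and UNIQUE_MATCH states explicitly.

```python
def resolve_status(candidate_scores: list, minimum_to_suggest: int = 50) -> str:
    """Return a status based on how many candidates reach the threshold."""
    qualifying = [score for score in candidate_scores if score >= minimum_to_suggest]
    if not qualifying:
        return "NO_MATCH"  # assumed name; not confirmed by the source
    if len(qualifying) > 1:
        return "AMBIGUOUS"  # never return the candidates themselves
    return "UNIQUE_MATCH"


print(resolve_status([75, 30, 12]))
print(resolve_status([75, 55]))
```

Only the count of qualifying candidates decides the state; no candidate data leaves the function unless exactly one candidate qualifies.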
Cross-Account Trust Model
When a user logs in via different authentication methods (e.g., Facebook vs username/password), they create separate OrgUser accounts but are the same physical User.
Trust Resolution:
- If the same User has a verified link (accessLevel=READ_WRITE) to a person in another account, +20 boost applied
- Alternatively, if the link has been active for >30 days, trust is assumed
- After linking 2+ people that match another account's verified links, existing READ links are upgraded to READ_WRITE
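The first two trust rules can be sketched as follows. The `ExistingLink` record and its field names are hypothetical stand-ins for whatever the LinkedPerson entity actually exposes, and the sketch does not cover the READ to READ_WRITE upgrade.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ExistingLink:
    user_id: int        # the physical User behind the account
    account_id: int     # the OrgUser account holding the link
    person_id: int      # the linked person
    access_level: str   # "READ" or "READ_WRITE"
    linked_at: datetime


def trust_boost(links, user_id, current_account_id, person_id, now,
                boost=20, min_age=timedelta(days=30)) -> int:
    """Return the boost if the same User already has a trusted link elsewhere."""
    for link in links:
        if link.user_id != user_id or link.person_id != person_id:
            continue
        if link.account_id == current_account_id:
            continue  # only links held in *another* account count
        verified = link.access_level == "READ_WRITE"
        long_standing = now - link.linked_at > min_age
        if verified or long_standing:
            return boost
    return 0


links = [ExistingLink(1, 10, 99, "READ_WRITE", datetime(2024, 1, 1))]
print(trust_boost(links, user_id=1, current_account_id=20, person_id=99,
                  now=datetime(2024, 1, 10)))
```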
API Endpoints
Progressive Match
POST /api/people/progressive-match
Content-Type: application/json
Request:
{
  "input": {
    "idNumber": "8501015009086",
    "firstName": "Johan",
    "surname": null,
    "dateOfBirth": null,
    "gender": "MALE"
  },
  "organisationId": 1
}
Response (UNIQUE_MATCH):
{
  "status": "UNIQUE_MATCH",
  "confidenceScore": 65,
  "suggestion": {
    "matchToken": "abc123...", // Opaque, time-limited
    "maskedName": "J**** S****",
    "gender": "Male",
    "ageRange": "35-40",
    "matchedFields": ["ID_NUMBER_EXACT", "GENDER"]
  },
  "suggestedAccessLevel": "READ"
}
Response (AMBIGUOUS):
{
  "status": "AMBIGUOUS",
  "candidateCount": 3,
  "message": "Multiple potential matches. Please provide more details.",
  "suggestedFields": ["surname", "dateOfBirth"]
}
Rationale
Why Weighted Scoring?
Different fields have different uniqueness value:
- ID Number is highly unique - strong proof of knowledge
- Common names like "John" provide less assurance than "Bartholomew"
- Email and phone are good identifiers but can be socially engineered
Weighted scoring reflects real-world identification confidence.
Why Uniqueness Requirement?
Returning multiple matches would:
- Reveal that multiple people with those characteristics exist (enumeration)
- Force UI to display a list (requiring personal data disclosure)
- Enable attackers to narrow down targets
By requiring exactly one match, we force the user to provide enough information to unambiguously identify the person - proving genuine knowledge.
Why Delayed Reveal?
The 3-second delay at threshold 50:
- Encourages users to add more fields (potentially reaching 80+ for better access)
- Prevents rapid probing (rate limiting enhancement)
- Gives legitimate users time to complete the form naturally
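The reveal timing can be expressed as a small decision function. This is a sketch with a hypothetical name, `reveal_delay_seconds`, using the thresholds stated in this document (50 to suggest, 80 for immediate reveal).

```python
from typing import Optional


def reveal_delay_seconds(score: int, minimum_to_suggest: int = 50,
                         confident_match: int = 80,
                         delay: float = 3.0) -> Optional[float]:
    """Return how long to wait before showing the masked suggestion.

    None means the score is too low to show any suggestion.
    """
    if score < minimum_to_suggest:
        return None
    if score >= confident_match:
        return 0.0
    return delay


for score in (49, 50, 79, 80):
    print(score, reveal_delay_seconds(score))
```

A delay enforced only in the UI does not slow a client that calls the API directly, so the same delay (or an equivalent rate limit) would also need to hold on the server side.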
Why Opaque Tokens?
matchToken is:
- Time-limited (expires after 5 minutes)
- Single-use (invalidated after redemption)
- Cryptographically secure (cannot be guessed)
- Opaque (reveals nothing about the underlying data)
This ensures the client never learns the User.id, preventing:
- IDOR (Insecure Direct Object Reference) attacks
- Targeted enumeration based on sequential IDs
- Correlation between sessions
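The token properties listed above can be illustrated with a small in-memory store. This is a sketch under assumptions, not the actual implementation: `MatchTokenStore` is a hypothetical class, a real deployment would likely use a shared, thread-safe store, and the clock is injectable only so that expiry can be demonstrated.

```python
import secrets
import time


class MatchTokenStore:
    """Maps opaque tokens to internal person ids, server-side only."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._ttl = ttl_seconds          # 5 minutes, as stated above
        self._clock = clock
        self._tokens = {}                # token -> (person_id, expires_at)

    def issue(self, person_id: int) -> str:
        # Random token: carries no information about the underlying record
        token = secrets.token_urlsafe(32)
        self._tokens[token] = (person_id, self._clock() + self._ttl)
        return token

    def redeem(self, token: str):
        """Return the person id once, or None if unknown, used, or expired."""
        entry = self._tokens.pop(token, None)  # pop makes the token single-use
        if entry is None:
            return None
        person_id, expires_at = entry
        if self._clock() >= expires_at:
            return None
        return person_id


store = MatchTokenStore()
token = store.issue(person_id=42)
print(store.redeem(token))  # first redemption succeeds
print(store.redeem(token))  # second redemption is rejected
```

Because the internal id is looked up server-side at redemption time, the client only ever handles the random token.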
Why Cross-Account Trust?
Real users may:
- Create accounts via different auth methods over time
- Forget they already linked family members in another login
Cross-account trust:
- Reduces friction for legitimate users
- Only applies to verified relationships
- Includes time constraint (30 days) to prevent abuse
Prerequisites
Deduplication must complete before Progressive Matching goes live.
Existing duplicate records would cause every match against a duplicated person to return "AMBIGUOUS", frustrating legitimate users. A separate Deduplication Epic should:
- Identify duplicate candidates (same ID number, similar names + DOB)
- Build admin merge UI for review
- Implement merge logic preserving relationships and audit trail
- Complete before Progressive Matching deployment
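The first step, identifying duplicate candidates by ID number, could start from a simple grouping like the one below. This is a sketch with a hypothetical function name and input shape (pairs of internal id and ID number). The "similar names + DOB" case needs fuzzy matching and is not covered here.

```python
from collections import defaultdict


def duplicate_groups_by_id(persons):
    """Group person ids that share the same ID number.

    `persons` is an iterable of (person_id, id_number) pairs;
    records without an ID number are ignored.
    """
    groups = defaultdict(list)
    for person_id, id_number in persons:
        if id_number and id_number.strip():
            groups[id_number.strip()].append(person_id)
    # Keep only ID numbers held by more than one record
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


records = [(1, "8001015009087"), (2, None), (3, " 8001015009087"), (4, "9202204720082")]
print(duplicate_groups_by_id(records))
```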
Security Controls Summary
| Control | Implementation |
|---|---|
| Enumeration Prevention | Uniqueness requirement - only returns if exactly one match |
| Proof of Knowledge | 20-point frontend gate + 50-point backend threshold |
| Rate Limiting | Per-session limits on progressive match calls |
| Data Minimisation | Masked suggestions only; full data after verified link |
| PK Protection | User.id never exposed; matchToken and LinkedPerson.id only |
| Timing Attack Prevention | Consistent response times regardless of match status |
| Audit Trail | All progressive match attempts and token redemptions logged |
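As an illustration of the per-session rate limiting control, a sliding-window limiter might look like the sketch below. The class name, the limit of 10 calls per 60 seconds, and the in-memory storage are all assumptions; the document does not specify the actual limits or mechanism.

```python
import time
from collections import defaultdict, deque


class SessionRateLimiter:
    """Allows at most `max_calls` per session within a sliding time window."""

    def __init__(self, max_calls: int = 10, window_seconds: float = 60.0,
                 clock=time.monotonic):
        self._max_calls = max_calls
        self._window = window_seconds
        self._clock = clock
        self._calls = defaultdict(deque)  # session id -> call timestamps

    def allow(self, session_id: str) -> bool:
        now = self._clock()
        calls = self._calls[session_id]
        # Drop timestamps that have fallen out of the window
        while calls and now - calls[0] >= self._window:
            calls.popleft()
        if len(calls) >= self._max_calls:
            return False
        calls.append(now)
        return True


limiter = SessionRateLimiter(max_calls=2)
print([limiter.allow("session-a") for _ in range(3)])
```

Keying the limiter on the session rather than the endpoint addresses the "per-endpoint only" gap listed in the As-Is section.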
Related Documentation
- Entity Classification - Security types for LinkedPerson
- Security Entities - LinkedPerson entity details
- Service Layer - Service method security patterns
- REST Controllers - Endpoint security implementation
Implementation Checklist
- PersonMatchScorer - Weighted scoring with configurable weights
- PersonMatchConfig - YAML configuration for weights and thresholds
- ProgressiveMatchResponse - Response DTOs for match states
- PersonResource.progressiveMatch() - New endpoint
- LinkedPersonResourceEx.linkByToken() - Token redemption endpoint
- Frontend pre-scoring component
- Delayed reveal UI with timer
- Rate limiting aspect adaptation
- Cross-account trust resolution
- Progressive access level upgrade