Person Deduplication & Merge
Overview
The Person Deduplication & Merge feature combines duplicate Person records in the system into a single authoritative record. It re-points all foreign key references to the surviving record, merges metadata field by field (preferring valid, non-null, and most recently edited values), and records an audit trail of every merge operation.
This feature builds on the existing Progressive Person Matching scoring infrastructure (PersonMatchConfig) and is a prerequisite for deploying progressive matching in production — duplicate records cause legitimate matches to return AMBIGUOUS.
Terminology
| Term | Definition |
|---|---|
| Source Person | The duplicate Person record that will be absorbed and soft-deleted after merge. |
| Target Person | The surviving Person record that receives all merged data and FK references. |
| PersonWrapper | Wraps |
| PersonMergeLog | Audit record capturing merge timestamp, source/target IDs, field provenance, trigger type, and operator. |
| PersonMergeCandidate | A detected potential duplicate pair awaiting review or auto-merge, with match score and status. |
Target Selection
The target is the Person record that will survive the merge. Selection criteria, in priority order:
- The person with the most recently edited data (last_edited meta)
- If equal, the person with richer relationship data (more LinkedPerson, EventParticipant, Membership records)
- For manual merges via API, the caller explicitly specifies source and target
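The automatic selection rules above can be sketched as follows. This is a minimal Python illustration, not the actual service code: the `PersonSummary` type, its field names, and the final tie-break on the lower ID are assumptions introduced here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PersonSummary:
    """Hypothetical snapshot of the data needed to pick a merge target."""
    person_id: int
    last_edited: datetime      # value of the last_edited meta
    relationship_count: int    # LinkedPerson + EventParticipant + Membership rows


def choose_target(a: PersonSummary, b: PersonSummary) -> tuple[PersonSummary, PersonSummary]:
    """Return (target, source) for an automatic merge."""
    # Rule 1: the most recently edited record survives.
    if a.last_edited != b.last_edited:
        return (a, b) if a.last_edited > b.last_edited else (b, a)
    # Rule 2: on a tie, the record with more relationship rows survives.
    if a.relationship_count != b.relationship_count:
        return (a, b) if a.relationship_count > b.relationship_count else (b, a)
    # Assumption: a full tie falls back to the lower ID so the result is deterministic.
    return (a, b) if a.person_id < b.person_id else (b, a)
```

Manual merges bypass this function entirely, since the caller names source and target explicitly.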
Merging Person Properties (UserMeta)
The merge uses PersonWrapper-aware logic rather than blind metadata key stacking. For each field:
- Prefer valid over invalid: e.g., a Luhn-valid SA ID over an invalid one, a real email over a fake email (@member.wpcycling.com)
- Prefer non-null over null: fill gaps from the source where the target has no value
- Prefer most recently edited: when both have valid, non-null values of equal quality, keep the most recent
- Record provenance: for each field, record which person (source or target) contributed the surviving value
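One way to express these per-field rules is to rank each value by quality and then by recency. The sketch below is illustrative only: `FieldValue`, `merge_field`, and the choice to let the target win a complete tie are assumptions, and the validity check shown is the standard Luhn checksum applied to a 13-digit ID.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass
class FieldValue:
    """Hypothetical view of one PersonWrapper field on one person."""
    value: Optional[str]
    edited_at: datetime


def luhn_valid(number: str) -> bool:
    """Standard Luhn checksum over a 13-digit ID number."""
    if len(number) != 13 or not (number.isascii() and number.isdigit()):
        return False
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def merge_field(source: FieldValue, target: FieldValue,
                is_valid: Callable[[str], bool]) -> tuple[Optional[str], str]:
    """Return (surviving value, provenance), provenance being 'source' or 'target'."""
    def rank(f: FieldValue) -> tuple[int, datetime]:
        # 0 = null, 1 = present but invalid, 2 = valid; ties broken by recency.
        quality = 0 if f.value is None else (2 if is_valid(f.value) else 1)
        return (quality, f.edited_at)

    # Assumption: on a complete tie the target keeps its own value.
    if rank(source) > rank(target):
        return source.value, "source"
    return target.value, "target"
```

The provenance string returned for each field is what would be collected into the per-field audit record.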
PersonWrapper Fields Subject to Merge
Identity
first_name, last_name, person_email, title, language, date_of_birth, id_number, id_type, id_country, nationality_country, gender, contact_number
Foreign Key Migration
All database foreign keys referencing wp_users.ID must be updated from source to target. The merge is transactional — all updates succeed or all roll back.
| Table | Column | Records (prod) | Notes |
|---|---|---|---|
|  |  | 24,073 | Conflict resolution needed if both source and target have participation in the same event (see Conflict Resolution) |
|  |  | 38,543 | Conflict resolution for same MembershipType |
|  |  | 41,624 |  |
|  |  | 2,028 | Deprecated: migrate but mark code |
|  |  | 30,343 |  |
|  |  | 188 |  |
|  |  | 14,235 |  |
|  |  | 0 |  |
|  |  | 12,904 |  |
|  |  | 18,367 |  |
|  |  | 6,406 | Join table: update or deduplicate |
|  |  | 10,823 | Deduplicate if target already linked to same principal |
|  |  | 21,702 |  |
|  |  | 3 | 1:1 extension: migrate or merge row |
|  |  | 808 | Application-level reference (no FK constraint) |
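The all-or-nothing behaviour can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module purely for demonstration; the production database, the table and column names in `FK_REFERENCES`, and the function name are all assumptions, and conflict handling is left out.

```python
import sqlite3

# Hypothetical (table, column) pairs; the real list is the table above.
FK_REFERENCES = [("event_participant", "person_id"), ("membership", "person_id")]


def migrate_foreign_keys(conn: sqlite3.Connection, source_id: int, target_id: int) -> dict[str, int]:
    """Re-point every reference from source to target in a single transaction.

    Returns the number of rows updated per table.
    """
    counts: dict[str, int] = {}
    with conn:  # commits on success, rolls back if any statement raises
        for table, column in FK_REFERENCES:
            cur = conn.execute(
                f"UPDATE {table} SET {column} = ? WHERE {column} = ?",
                (target_id, source_id),
            )
            counts[table] = cur.rowcount
    return counts
```

The per-table counts returned here are the kind of summary that could feed the FK section of the merge audit record.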
Conflict Resolution
EventParticipant
When both source and target have an EventParticipant for the same Event:
- Same EventCategory: Keep target’s record, delete source’s. Migrate any associated ProcessData, OrderLineItem, RaceResult, and StartGroupParticipant from source’s record to target’s if they don’t conflict.
- Different EventCategory: Flag for manual review; both participations may be legitimate.
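The decision for each of the source person's participations might look like the sketch below. The `Participation` type, the `Resolution` names, and the function are hypothetical, and migrating the child records (ProcessData and so on) is not shown.

```python
from dataclasses import dataclass
from enum import Enum


class Resolution(Enum):
    REPOINT = "repoint"              # no conflict: move source's record to target
    KEEP_TARGET = "keep_target"      # same event and category: drop source's record
    MANUAL_REVIEW = "manual_review"  # same event, different category


@dataclass(frozen=True)
class Participation:
    """Hypothetical minimal view of an EventParticipant row."""
    event_id: int
    category_id: int


def resolve_participation(source: Participation,
                          target_rows: list[Participation]) -> Resolution:
    """Decide what to do with one of the source person's participations."""
    same_event = [t for t in target_rows if t.event_id == source.event_id]
    if not same_event:
        return Resolution.REPOINT
    if any(t.category_id == source.category_id for t in same_event):
        return Resolution.KEEP_TARGET
    return Resolution.MANUAL_REVIEW
```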
Membership
A person may have multiple Membership records for the same MembershipType across different years (annual renewals). The conflict key is MembershipType + MembershipPeriod — a person should appear in each period at most once.
When both source and target have a Membership for the same MembershipType and MembershipPeriod:
- Keep the record with the most recent renewal or richer financial history.
- Delete the other after migrating any linked financial records.
When they have the same MembershipType but different MembershipPeriod values, both records are legitimate (different renewal years) — simply re-point the source’s record to the target.
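A rough sketch of the membership rules, keyed on MembershipType + MembershipPeriod, is shown below. `MembershipRow` and its fields are assumptions, only the "most recent renewal" criterion is modelled (financial history is not), and migrating linked financial records before deletion is omitted.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class MembershipRow:
    """Hypothetical minimal view of a Membership row."""
    membership_id: int
    type_id: int
    period_id: int
    renewed_on: date


def plan_membership_merge(source_rows: list[MembershipRow],
                          target_rows: list[MembershipRow]) -> tuple[list[int], list[int]]:
    """Return (ids to re-point to the target, ids to delete)."""
    by_key = {(t.type_id, t.period_id): t for t in target_rows}
    repoint: list[int] = []
    delete: list[int] = []
    for s in source_rows:
        t = by_key.get((s.type_id, s.period_id))
        if t is None:
            repoint.append(s.membership_id)   # different type or period: both legitimate
        elif s.renewed_on > t.renewed_on:
            repoint.append(s.membership_id)   # source's record wins the conflict
            delete.append(t.membership_id)
        else:
            delete.append(s.membership_id)    # target's record wins the conflict
    return repoint, delete
```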
OrgUser Principal Merge
If the source Person is OrgUser.person (i.e., someone’s account principal):
- Re-point org_user.person_id from source to target.
- Multiple OrgUser records pointing to the same Person are allowed: different session identifiers (e.g., anonymous UUID sessions, OIDC logins) may independently resolve to the same deduplicated person. No manual review is needed for this case.
- Deduplicate LinkedPerson records if target is now both a principal’s person and a linked person under the same principal.
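The steps above could be sketched as follows. The data shapes are assumptions made for illustration: OrgUser is reduced to a mapping from OrgUser ID to person ID, and LinkedPerson to (OrgUser ID, linked person ID) pairs. Treating a link from a principal to its own person as redundant is one interpretation of the deduplication rule, not a confirmed behaviour.

```python
def merge_org_user_links(
    org_user_person: dict[int, int],
    linked: list[tuple[int, int]],
    source_id: int,
    target_id: int,
) -> tuple[dict[int, int], list[tuple[int, int]]]:
    """Re-point OrgUser principals and LinkedPerson rows from source to target."""
    # Step 1: re-point principals. Several OrgUsers may end up on the same person.
    principals = {
        ou: (target_id if person == source_id else person)
        for ou, person in org_user_person.items()
    }
    # Step 2: re-point links, then drop self-links and exact duplicates.
    seen: set[tuple[int, int]] = set()
    result: list[tuple[int, int]] = []
    for ou, person in linked:
        person = target_id if person == source_id else person
        if principals.get(ou) == person:
            continue  # the person is already this principal's own person
        if (ou, person) in seen:
            continue  # target was already linked under the same principal
        seen.add((ou, person))
        result.append((ou, person))
    return principals, result
```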
Relationship Model Migration
managed_by Meta Retirement
The legacy managed_by UserMeta is replaced by LinkedPerson + OrgUser:
| Aspect | Legacy (managed_by) | Current (LinkedPerson + OrgUser) |
|---|---|---|
| Storage |  |  |
| Principal identification | Implicit (user with login) | Explicit ( |
| Access control | None |  |
| Relationship type | None |  |
| Temporal validity | None |  |
| Security integration | None |  |
The new merge service operates entirely on LinkedPerson and OrgUser. It does not read or write managed_by meta. After merge migration is complete, managed_by will be deprecated and removed.
Merge Audit Trail
Source Person Metadata
The source (merged-away) person receives:
- merge_date_time: ISO timestamp of the merge
- PersonWrapper.setPersonDeleted(true): soft-delete via User.userStatus bitmask (PERSON_DELETE_STATUS_MASK = 0x01)
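The soft-delete is a single bit in the status field, so other status bits are preserved. The sketch below shows the idea in Python; the function names and the dictionary standing in for UserMeta are hypothetical, and only the mask value comes from the text above.

```python
from datetime import datetime, timezone

PERSON_DELETE_STATUS_MASK = 0x01  # value taken from the text above


def mark_merged(user_status: int, meta: dict[str, str], now: datetime) -> int:
    """Stamp merge_date_time on the meta and return the status with the delete bit set.

    `now` is expected to be timezone-aware.
    """
    meta["merge_date_time"] = now.astimezone(timezone.utc).isoformat()
    return user_status | PERSON_DELETE_STATUS_MASK


def is_deleted(user_status: int) -> bool:
    """True if the soft-delete bit is set, regardless of other status bits."""
    return bool(user_status & PERSON_DELETE_STATUS_MASK)
```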
PersonMergeLog Table
A dedicated audit table captures the full merge history:
| Column | Type | Description |
|---|---|---|
|  | BIGINT PK | Auto-generated |
|  | DATETIME | When the merge was executed |
|  | BIGINT | The absorbed (now deleted) person |
|  | BIGINT | The surviving person |
|  | VARCHAR |  |
|  | VARCHAR | Admin user ID, or |
|  | JSON | Per-field record of which person contributed each surviving value |
|  | JSON | Summary of FK tables and record counts updated |
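As an illustration of how one audit row might be assembled, the sketch below serialises the two JSON columns from plain dictionaries. Every attribute name on `MergeLogEntry` is an assumption, as are the example trigger and operator values; the real column names are not reproduced here.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class MergeLogEntry:
    """Hypothetical in-memory form of one PersonMergeLog row (names are assumed)."""
    merged_at: str
    source_person_id: int
    target_person_id: int
    trigger_type: str
    operator: str
    field_provenance: str  # JSON text: field name -> "source" or "target"
    fk_summary: str        # JSON text: table name -> rows updated


def build_log_entry(source_id: int, target_id: int, trigger_type: str, operator: str,
                    provenance: dict[str, str], fk_counts: dict[str, int],
                    merged_at: datetime) -> MergeLogEntry:
    """Build the audit row; `merged_at` is expected to be timezone-aware."""
    return MergeLogEntry(
        merged_at=merged_at.astimezone(timezone.utc).isoformat(),
        source_person_id=source_id,
        target_person_id=target_id,
        trigger_type=trigger_type,
        operator=operator,
        field_provenance=json.dumps(provenance, sort_keys=True),
        fk_summary=json.dumps(fk_counts, sort_keys=True),
    )
```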
Three-Tier Match Classification
Reuses the existing PersonMatchConfig scoring infrastructure:
| Tier | Score | Action |
|---|---|---|
| Auto-merge | >= 80 ( | Merged automatically by scheduled job. Both persons must share an exact, Luhn-valid SA ID. No manual review needed. |
| Flagged for review | >= 50 ( |  |
| No match | < 50 | No action. |
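The tiering can be expressed as a small function over the score and the SA ID condition. In this sketch the constant and enum names are made up, and sending a pair that scores 80 or more without a shared valid SA ID to review is an assumption; the table does not state what happens in that case.

```python
from enum import Enum

AUTO_MERGE_THRESHOLD = 80  # thresholds taken from the table above
REVIEW_THRESHOLD = 50


class Tier(Enum):
    AUTO_MERGE = "auto_merge"
    REVIEW = "review"
    NO_MATCH = "no_match"


def classify(score: int, shared_valid_sa_id: bool) -> Tier:
    """Map a match score to a tier."""
    if score >= AUTO_MERGE_THRESHOLD and shared_valid_sa_id:
        return Tier.AUTO_MERGE
    # Assumption: a high score without a shared valid SA ID still needs a human.
    if score >= REVIEW_THRESHOLD:
        return Tier.REVIEW
    return Tier.NO_MATCH
```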
API
Manual Merge
POST /api/admin/person-merge
Content-Type: application/json
Request:
{
  "sourcePersonId": 12345,
  "targetPersonId": 67890
}
Response:
{
  "mergeLogId": 101,
  "targetPersonId": 67890,
  "fieldsUpdated": 14,
  "fkTablesUpdated": 12,
  "totalRecordsMigrated": 47
}
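A client could build the manual merge request as in the sketch below, which uses only the Python standard library. The base URL is a placeholder, authentication is not shown because the document does not describe it, and the guard against merging a person into itself is an assumption about sensible client-side validation.

```python
import json
import urllib.request


def build_merge_request(base_url: str, source_id: int, target_id: int) -> urllib.request.Request:
    """Build (but do not send) the POST request for a manual merge."""
    if source_id == target_id:
        raise ValueError("source and target must be different persons")
    body = json.dumps({"sourcePersonId": source_id, "targetPersonId": target_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/admin/person-merge",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the response body could then be parsed with `json.loads`.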
Merge Candidates
GET /api/admin/person-merge-candidates?status=PENDING&page=0&size=20&sort=score,desc
Response:
{
  "content": [
    {
      "id": 1,
      "sourcePersonId": 12345,
      "sourcePersonName": "John Smith",
      "targetPersonId": 67890,
      "targetPersonName": "J. Smith",
      "score": 72,
      "detectionReason": "Same DOB+gender, surname exact match, first name partial",
      "status": "PENDING",
      "detectedAt": "2026-03-30T10:00:00Z"
    }
  ]
}
POST /api/admin/person-merge-candidates/{id}/approve
POST /api/admin/person-merge-candidates/{id}/reject
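For completeness, here is a small sketch of how a client might assemble the candidate-queue URLs. The helper names and the base URL are hypothetical; the paths and query parameters are the ones listed above.

```python
from urllib.parse import urlencode


def candidates_url(base_url: str, status: str = "PENDING", page: int = 0,
                   size: int = 20, sort: str = "score,desc") -> str:
    """URL for one page of merge candidates."""
    # safe="," keeps the comma in the sort parameter unescaped, as in the example above.
    query = urlencode({"status": status, "page": page, "size": size, "sort": sort}, safe=",")
    return f"{base_url}/api/admin/person-merge-candidates?{query}"


def decision_url(base_url: str, candidate_id: int, approve: bool) -> str:
    """URL to POST to when approving or rejecting a candidate."""
    action = "approve" if approve else "reject"
    return f"{base_url}/api/admin/person-merge-candidates/{candidate_id}/{action}"
```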
Admin UI Requirements
Duplicate Review Queue
Paginated list of PersonMergeCandidate records with status PENDING. Columns: source name, target name, match score, detection reason, date detected. Sortable by score (highest first). Filterable by status.
Side-by-Side Merge Preview
Selecting a candidate shows both persons side-by-side. Each PersonWrapper field is displayed with its source value, its target value, and which of the two will survive. The admin can override individual field choices before confirming the merge.
Related Documentation
- Progressive Person Matching: scoring infrastructure reused for deduplication
- Registration Entities: EventParticipant, Membership entity relationships