Person Deduplication & Merge

Overview

The Person Deduplication & Merge feature combines duplicate Person records in the system into a single authoritative record. It updates all foreign key references, merges metadata field by field (preferring valid, non-null, and most recently edited values), and records an audit trail of every merge operation.

This feature builds on the existing Progressive Person Matching scoring infrastructure (PersonMatchConfig). It is a prerequisite for deploying progressive matching in production, because duplicate records cause legitimate matches to return AMBIGUOUS.

Terminology

Term                  Definition
--------------------  ----------
Source Person         The duplicate Person record that will be absorbed and soft-deleted after merge.
Target Person         The surviving Person record that receives all merged data and FK references.
PersonWrapper         Wraps User (WordPress wp_users) with typed metadata accessors for person fields. The canonical interface for reading/writing person data.
PersonMergeLog        Audit record capturing merge timestamp, source/target IDs, field provenance, trigger type, and operator.
PersonMergeCandidate  A detected potential duplicate pair awaiting review or auto-merge, with match score and status.

Target Selection

The target is the Person record that survives the merge. For automated merges, it is selected by the following criteria, in priority order:

  1. The person with the most recently edited data (last_edited meta)

  2. If equal, the person with richer relationship data (more LinkedPerson, EventParticipant, Membership records)

For manual merges via the API, the caller specifies source and target explicitly, so these criteria do not apply.
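The selection order above can be sketched as a small comparison routine. This is only an illustration under stated assumptions: PersonSummary and its fields are hypothetical stand-ins for values that would be read through PersonWrapper (last_edited) and counted from the relationship tables, and non-null timestamps are assumed.

```java
import java.time.Instant;

// Hypothetical summary of a Person used only for target selection; not an existing class.
record PersonSummary(long id, Instant lastEdited, int relationshipCount) {}

final class TargetSelector {
    private TargetSelector() {}

    // Returns the record that should survive; the other one becomes the source.
    // Assumes lastEdited is non-null on both records.
    static PersonSummary chooseTarget(PersonSummary a, PersonSummary b) {
        // Criterion 1: the most recently edited record wins.
        int byRecency = a.lastEdited().compareTo(b.lastEdited());
        if (byRecency != 0) {
            return byRecency > 0 ? a : b;
        }
        // Criterion 2: on equal timestamps, the record with more relationship rows wins.
        // A full tie falls back to the first argument (an arbitrary choice in this sketch).
        return a.relationshipCount() >= b.relationshipCount() ? a : b;
    }
}
```

A manual merge would skip this routine entirely, since the caller names source and target.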

Merging Person Properties (UserMeta)

The merge uses PersonWrapper-aware logic rather than blind metadata key stacking. For each field:

  1. Prefer valid over invalid — e.g., a Luhn-valid SA ID over an invalid one, a real email over a fake email (@member.wpcycling.com)

  2. Prefer non-null over null — fill gaps from the source where the target has no value

  3. Prefer most recently edited — when both have valid, non-null values of equal quality, keep the most recent

  4. Record provenance — for each field, record which person (source or target) contributed the surviving value
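The per-field rules above could be expressed roughly as follows. The types here (MergeOrigin, FieldValue, MergedField, FieldMerger) are hypothetical and exist only for this sketch; the validity check is passed in as a predicate because it differs per field (Luhn check for SA IDs, placeholder-domain check for emails, and so on). Keeping the target's value when both values are invalid, or on an exact timestamp tie, is an assumption, since the rules do not cover those cases.

```java
import java.time.Instant;
import java.util.function.Predicate;

// Hypothetical types for illustration; they are not part of PersonWrapper.
enum MergeOrigin { SOURCE, TARGET }

record FieldValue(String value, Instant lastEdited) {}

record MergedField(String value, MergeOrigin origin) {}

final class FieldMerger {
    private FieldMerger() {}

    static MergedField merge(FieldValue source, FieldValue target, Predicate<String> isValid) {
        boolean sourcePresent = source.value() != null;
        boolean targetPresent = target.value() != null;
        boolean sourceValid = sourcePresent && isValid.test(source.value());
        boolean targetValid = targetPresent && isValid.test(target.value());

        // Rule 1: prefer valid over invalid.
        if (sourceValid != targetValid) {
            return sourceValid ? fromSource(source) : fromTarget(target);
        }
        // Rule 2: prefer non-null over null.
        if (sourcePresent != targetPresent) {
            return sourcePresent ? fromSource(source) : fromTarget(target);
        }
        // Rule 3: both valid, so the most recently edited value wins.
        if (sourceValid && source.lastEdited().isAfter(target.lastEdited())) {
            return fromSource(source);
        }
        // Assumption: ties, both-invalid, and both-null cases keep the target's value.
        return fromTarget(target);
    }

    // Rule 4: the returned MergedField carries the provenance of the surviving value.
    private static MergedField fromSource(FieldValue v) {
        return new MergedField(v.value(), MergeOrigin.SOURCE);
    }

    private static MergedField fromTarget(FieldValue v) {
        return new MergedField(v.value(), MergeOrigin.TARGET);
    }
}
```

The origin of each MergedField is the kind of value that would feed the field-level provenance recorded in the audit log.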

PersonWrapper Fields Subject to Merge

Identity

first_name, last_name, person_email, title, language, date_of_birth, id_number, id_type, id_country, nationality_country, gender, contact_number

Emergency / Medical

ice_name, ice_number, iceRelationship, doctor_name, doctor_number, scheme_name, scheme_principle, scheme_policy_number, medication, medical_conditions, medical_allergies

Guardian

parent_name, parent_email, parent_number, parent_relationship

Other

school, club_name, other_number, external_ref, main_member_id

Billing

billing_address_1, billing_city, billing_country, billing_postcode

Fields NOT Merged (System-Managed)

user_key, uuid, managed_by (deprecated), last_edited (reset on merge), created_on, merge_date_time (set on source), _wpca_sync_on, _wpca_sync_by

Foreign Key Migration

All database foreign keys referencing wp_users.ID must be updated from source to target. The merge is transactional — all updates succeed or all roll back.

Table                     Column            Records (prod)  Notes
------------------------  ----------------  --------------  -----
event_participant         person_id         24,073          Conflict resolution needed if both source and target have participation in the same event (see Conflict Resolution)
membership                person_id         38,543          Conflict resolution for same MembershipType
tag                       person_id         41,624
tag_assignment            person_id         2,028           Deprecated: migrate but mark code @Deprecated
race_number               person_id         30,343
race_pack_barcode         person_id         188
race_result               person_id         14,235
race_number_assignment    person_id         0
order_line_item           person_id         12,904
process_data              person_id         18,367
process_instance__person  person_id         6,406           Join table: update or deduplicate
linked_person             linked_person_id  10,823          Deduplicate if target already linked to same principal
org_user                  person_id         21,702          See OrgUser Principal Merge
wp_users_ext              ID                3               1:1 extension: migrate or merge row
match_token               user_id           808             Application-level reference (no FK constraint)
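To illustrate the all-or-nothing requirement, the following sketch re-points the tables that carry no conflict notes inside a single transaction. Plain JDBC is used here only as a stand-in: the class name ForeignKeyMigrator is hypothetical, the real service may well use JPA or a framework-managed transaction instead, and tables that need conflict handling or deduplication are deliberately left out.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ForeignKeyMigrator {
    private ForeignKeyMigrator() {}

    // Table -> column for the rows listed without notes; these only need re-pointing.
    static final Map<String, String> SIMPLE_REFERENCES = new LinkedHashMap<>();
    static {
        for (String table : List.of("tag", "race_number", "race_pack_barcode", "race_result",
                "race_number_assignment", "order_line_item", "process_data")) {
            SIMPLE_REFERENCES.put(table, "person_id");
        }
    }

    static String updateSql(String table, String column) {
        return "UPDATE " + table + " SET " + column + " = ? WHERE " + column + " = ?";
    }

    // Re-points every simple reference from source to target and returns rows updated per table.
    // Any failure rolls back all updates made so far.
    static Map<String, Integer> migrate(Connection conn, long sourceId, long targetId) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        Map<String, Integer> counts = new LinkedHashMap<>();
        try {
            for (Map.Entry<String, String> ref : SIMPLE_REFERENCES.entrySet()) {
                try (PreparedStatement ps = conn.prepareStatement(updateSql(ref.getKey(), ref.getValue()))) {
                    ps.setLong(1, targetId);
                    ps.setLong(2, sourceId);
                    counts.put(ref.getKey(), ps.executeUpdate());
                }
            }
            conn.commit();
            return counts;
        } catch (SQLException | RuntimeException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

The returned per-table counts are the kind of summary that could be stored in the fk_updates column of the audit log.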

Conflict Resolution

EventParticipant

When both source and target have an EventParticipant for the same Event:

  • Same EventCategory: Keep target’s record, delete source’s. Migrate any associated ProcessData, OrderLineItem, RaceResult, and StartGroupParticipant from source’s record to target’s if they don’t conflict.

  • Different EventCategory: Flag for manual review — both participations may be legitimate.
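The decision for a single source participation might look like the sketch below. Participation, ParticipantResolution, and EventParticipantConflicts are hypothetical names, and the REPOINT_TO_TARGET outcome simply represents the ordinary FK update when the target has no participation in that event.

```java
import java.util.List;

// Hypothetical outcome codes for one source EventParticipant row.
enum ParticipantResolution { REPOINT_TO_TARGET, KEEP_TARGET_DELETE_SOURCE, MANUAL_REVIEW }

// Hypothetical minimal view of an EventParticipant: which event and which category.
record Participation(long eventId, long eventCategoryId) {}

final class EventParticipantConflicts {
    private EventParticipantConflicts() {}

    static ParticipantResolution resolve(Participation source, List<Participation> targetParticipations) {
        boolean targetInSameEvent = false;
        for (Participation t : targetParticipations) {
            if (t.eventId() != source.eventId()) {
                continue;
            }
            if (t.eventCategoryId() == source.eventCategoryId()) {
                // Same event and same category: the target's record survives.
                return ParticipantResolution.KEEP_TARGET_DELETE_SOURCE;
            }
            targetInSameEvent = true;
        }
        // Same event but a different category needs a human decision;
        // no overlap at all means the row can simply be re-pointed.
        return targetInSameEvent
                ? ParticipantResolution.MANUAL_REVIEW
                : ParticipantResolution.REPOINT_TO_TARGET;
    }
}
```

Migrating the dependent ProcessData, OrderLineItem, RaceResult, and StartGroupParticipant rows before deleting the source's record is not shown here.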

Membership

A person may have multiple Membership records for the same MembershipType across different years (annual renewals). The conflict key is MembershipType + MembershipPeriod — a person should appear in each period at most once.

When both source and target have a Membership for the same MembershipType and MembershipPeriod:

  • Keep the record with the most recent renewal or richer financial history.

  • Delete the other after migrating any linked financial records.

When they have the same MembershipType but different MembershipPeriod values, both records are legitimate (different renewal years) — simply re-point the source’s record to the target.
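A minimal sketch of detecting conflicts on the MembershipType + MembershipPeriod key follows. MembershipRef, MembershipKey, and MembershipConflicts are hypothetical names; the sketch only separates conflicting source records from those that can be re-pointed, and does not decide which side of a conflict to keep.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Conflict key as described above: type plus period.
record MembershipKey(long membershipTypeId, long membershipPeriodId) {}

// Hypothetical minimal view of a Membership row.
record MembershipRef(long id, long membershipTypeId, long membershipPeriodId) {
    MembershipKey key() {
        return new MembershipKey(membershipTypeId, membershipPeriodId);
    }
}

final class MembershipConflicts {
    private MembershipConflicts() {}

    // Returns the source memberships whose type and period both match a target membership.
    // Every other source membership can simply be re-pointed to the target.
    static List<MembershipRef> conflicting(List<MembershipRef> source, List<MembershipRef> target) {
        Set<MembershipKey> targetKeys = new HashSet<>();
        for (MembershipRef t : target) {
            targetKeys.add(t.key());
        }
        List<MembershipRef> conflicts = new ArrayList<>();
        for (MembershipRef s : source) {
            if (targetKeys.contains(s.key())) {
                conflicts.add(s);
            }
        }
        return conflicts;
    }
}
```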

LinkedPerson

When both source and target are linked to the same OrgUser principal:

  • Keep the LinkedPerson record pointing to target (it already exists).

  • Delete the LinkedPerson record pointing to source.
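As a rough illustration of this rule, the sketch below finds the source's LinkedPerson rows that would duplicate an existing link to the target under the same principal. The Link record and LinkedPersonDedup class are hypothetical, and the orgUserId field is an assumed way of identifying the principal.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical minimal view of a linked_person row.
record Link(long id, long orgUserId, long linkedPersonId) {}

final class LinkedPersonDedup {
    private LinkedPersonDedup() {}

    // Returns the ids of source links to delete because the same principal already links to the target.
    // Source links not returned here can be re-pointed to the target instead.
    static List<Long> linksToDelete(List<Link> links, long sourceId, long targetId) {
        Set<Long> principalsLinkedToTarget = new HashSet<>();
        for (Link link : links) {
            if (link.linkedPersonId() == targetId) {
                principalsLinkedToTarget.add(link.orgUserId());
            }
        }
        List<Long> toDelete = new ArrayList<>();
        for (Link link : links) {
            if (link.linkedPersonId() == sourceId && principalsLinkedToTarget.contains(link.orgUserId())) {
                toDelete.add(link.id());
            }
        }
        return toDelete;
    }
}
```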

OrgUser Principal Merge

If the source Person is OrgUser.person (i.e., someone’s account principal):

  1. Re-point org_user.person_id from source to target.

  2. Multiple OrgUser records pointing to the same Person are allowed, because different session identifiers (e.g., anonymous UUID sessions, OIDC logins) may independently resolve to the same deduplicated person. No manual review is needed for this case.

  3. Deduplicate LinkedPerson records if target is now both a principal’s person and a linked person under the same principal.

Relationship Model Migration

managed_by Meta Retirement

The legacy managed_by UserMeta is replaced by LinkedPerson + OrgUser:

Aspect                    Legacy (managed_by)         Current (LinkedPerson + OrgUser)
------------------------  --------------------------  --------------------------------
Storage                   wp_usermeta multi-value     linked_person JPA entity
Principal identification  Implicit (user with login)  Explicit (OrgUser.person)
Access control            None                        LinkedPerson.accessLevel (READ, READ_WRITE)
Relationship type         None                        LinkedPerson.linkType (FAMILY, GUARDIAN, COACH, etc.)
Temporal validity         None                        LinkedPerson.validFrom / validTo
Security integration      None                        SecurityDimensionService dynamic resolution

The new merge service operates entirely on LinkedPerson and OrgUser. It does not read or write managed_by meta. After merge migration is complete, managed_by will be deprecated and removed.

Merge Audit Trail

Source Person Metadata

The source (merged-away) person receives:

  • merge_date_time — ISO timestamp of the merge

  • PersonWrapper.setPersonDeleted(true) — soft-delete via User.userStatus bitmask (PERSON_DELETE_STATUS_MASK = 0x01)
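The bitmask soft-delete can be pictured with the sketch below. Only the mask value 0x01 comes from this document; the PersonStatus helper and its method names are hypothetical and do not claim to mirror how PersonWrapper.setPersonDeleted is actually implemented.

```java
final class PersonStatus {
    private PersonStatus() {}

    // Mask value as stated above; bit 0 of User.userStatus marks a deleted person.
    static final int PERSON_DELETE_STATUS_MASK = 0x01;

    // Sets or clears the deleted bit while leaving all other status bits untouched.
    static int withDeleted(int userStatus, boolean deleted) {
        return deleted
                ? (userStatus | PERSON_DELETE_STATUS_MASK)
                : (userStatus & ~PERSON_DELETE_STATUS_MASK);
    }

    static boolean isDeleted(int userStatus) {
        return (userStatus & PERSON_DELETE_STATUS_MASK) != 0;
    }
}
```

Because only one bit changes, any other flags stored in userStatus survive the soft-delete.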

PersonMergeLog Table

A dedicated audit table captures the full merge history:

Column            Type       Description
----------------  ---------  -----------
id                BIGINT PK  Auto-generated
merge_date_time   DATETIME   When the merge was executed
source_person_id  BIGINT     The absorbed (now deleted) person
target_person_id  BIGINT     The surviving person
trigger_type      VARCHAR    ADMIN_MANUAL, AUTO_EXACT_ID, AUTO_CONFIDENT_MATCH
operator          VARCHAR    Admin user ID, or SYSTEM for automated merges
field_provenance  JSON       Per-field record of which person contributed each surviving value
fk_updates        JSON       Summary of FK tables and record counts updated
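For orientation, one row of this table could be represented in memory roughly as below. The record and its field types are assumptions made for this sketch (the actual entity mapping is not shown in this document); the trigger values and the SYSTEM operator convention are taken from the table.

```java
import java.time.Instant;
import java.util.Map;

enum TriggerType { ADMIN_MANUAL, AUTO_EXACT_ID, AUTO_CONFIDENT_MATCH }

// Hypothetical in-memory shape of one person_merge_log row (id omitted: it is auto-generated).
record PersonMergeLogEntry(
        Instant mergeDateTime,
        long sourcePersonId,
        long targetPersonId,
        TriggerType triggerType,
        String operator,
        Map<String, String> fieldProvenance,   // field name -> "SOURCE" or "TARGET"
        Map<String, Integer> fkUpdates) {      // table name -> number of rows updated

    // Automated merges are attributed to SYSTEM; manual merges to the admin's user ID.
    static String operatorFor(TriggerType trigger, long adminUserId) {
        return trigger == TriggerType.ADMIN_MANUAL ? Long.toString(adminUserId) : "SYSTEM";
    }
}
```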

Three-Tier Match Classification

Reuses the existing PersonMatchConfig scoring infrastructure:

Tier                Score                                              Action
------------------  -------------------------------------------------  ------
Auto-merge          >= 80 (confidentMatch) + exact SA ID number match  Merged automatically by scheduled job. Both persons must share an exact, Luhn-valid SA ID. No manual review needed.
Flagged for review  >= 50 (minimumToSuggest)                           PersonMergeCandidate record created for admin review. Includes: valid ID vs. invalid ID with same DOB+gender, passport/other ID type matches, strong name+DOB+gender combinations.
No match            < 50                                               No action.
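The tiering can be sketched as a small classifier. The thresholds 80 and 50 come from the table above; MatchTier and MatchClassifier are hypothetical names, the score is assumed to be produced elsewhere by the PersonMatchConfig scoring, and the 13-digit length check reflects the standard SA ID number format. The Luhn check shown is the standard checksum algorithm.

```java
enum MatchTier { AUTO_MERGE, FLAG_FOR_REVIEW, NO_MATCH }

final class MatchClassifier {
    private MatchClassifier() {}

    static final int CONFIDENT_MATCH = 80;      // confidentMatch threshold from the table
    static final int MINIMUM_TO_SUGGEST = 50;   // minimumToSuggest threshold from the table

    // Standard Luhn checksum: double every second digit from the right, sum, and test mod 10.
    static boolean isLuhnValid(String digits) {
        if (digits == null || digits.isEmpty()) {
            return false;
        }
        int sum = 0;
        boolean doubleIt = false;
        for (int i = digits.length() - 1; i >= 0; i--) {
            char c = digits.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
            int d = c - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9;
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    static MatchTier classify(int score, String saIdA, String saIdB) {
        // Auto-merge needs an exact match on a 13-digit, Luhn-valid SA ID in addition to the score.
        boolean sharedValidId = saIdA != null && saIdA.equals(saIdB)
                && saIdA.length() == 13 && isLuhnValid(saIdA);
        if (score >= CONFIDENT_MATCH && sharedValidId) {
            return MatchTier.AUTO_MERGE;
        }
        if (score >= MINIMUM_TO_SUGGEST) {
            return MatchTier.FLAG_FOR_REVIEW;
        }
        return MatchTier.NO_MATCH;
    }
}
```

Note that a score of 80 or more without a shared valid SA ID still falls into the review tier rather than being merged automatically.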

API

Manual Merge

POST /api/admin/person-merge
Content-Type: application/json

Request:
{
  "sourcePersonId": 12345,
  "targetPersonId": 67890
}

Response:
{
  "mergeLogId": 101,
  "targetPersonId": 67890,
  "fieldsUpdated": 14,
  "fkTablesUpdated": 12,
  "totalRecordsMigrated": 47
}
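
A client call to this endpoint might be built as in the following sketch, using the JDK's java.net.http API. The PersonMergeClient class and the base URL are hypothetical, and authentication is omitted because this document does not describe how admin endpoints are secured.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class PersonMergeClient {
    private PersonMergeClient() {}

    // Builds the JSON request body shown above.
    static String requestBody(long sourcePersonId, long targetPersonId) {
        return "{\"sourcePersonId\": " + sourcePersonId
                + ", \"targetPersonId\": " + targetPersonId + "}";
    }

    // Builds (but does not send) the manual merge request against a caller-supplied base URL.
    static HttpRequest mergeRequest(String baseUrl, long sourcePersonId, long targetPersonId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/admin/person-merge"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(requestBody(sourcePersonId, targetPersonId)))
                .build();
    }
}
```

The request can then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), and the response body parsed with whichever JSON library the client already uses.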

Merge Candidates

GET /api/admin/person-merge-candidates?status=PENDING&page=0&size=20&sort=score,desc

Response:
{
  "content": [
    {
      "id": 1,
      "sourcePersonId": 12345,
      "sourcePersonName": "John Smith",
      "targetPersonId": 67890,
      "targetPersonName": "J. Smith",
      "score": 72,
      "detectionReason": "Same DOB+gender, surname exact match, first name partial",
      "status": "PENDING",
      "detectedAt": "2026-03-30T10:00:00Z"
    }
  ]
}

POST /api/admin/person-merge-candidates/{id}/approve
POST /api/admin/person-merge-candidates/{id}/reject

Admin UI Requirements

Duplicate Review Queue

Paginated list of PersonMergeCandidate records with status PENDING. Columns: source name, target name, match score, detection reason, date detected. Sortable by score (highest first). Filterable by status.

Side-by-Side Merge Preview

Selecting a candidate shows both persons side by side. Each PersonWrapper field is displayed with its source value, its target value, and an indication of which will survive. The admin can override individual field choices before confirming the merge.

Manual Merge Trigger

Admin can search for any two persons by name or ID and initiate a merge directly, bypassing the candidate queue. Uses the same side-by-side preview screen.

Merge History / Audit Log

Paginated list of completed merges from person_merge_log. Columns: merge date, source person (with ID), target person, trigger type, operator. Expandable row shows field-level provenance.