Async Import Architecture

1. Overview

The Async Import architecture provides a framework for processing large spreadsheet imports asynchronously, with support for interactive column and cell mapping when automatic resolution fails. This enables imports that would otherwise timeout to complete successfully with user assistance.

2. Problem Statement

2.1. Current Limitations

The synchronous import approach has several limitations:

Limitation	Impact
HTTP timeout	Large files fail before processing completes
No progress visibility	Users don’t know import status
All-or-nothing mapping	Any unmapped column/cell fails the entire import
No resume capability	Failed imports must restart from scratch
Resource consumption	Long-running requests tie up server threads

Limitation

Impact

HTTP timeout

Large files fail before processing completes

No progress visibility

Users don’t know import status

All-or-nothing mapping

Any unmapped column/cell fails the entire import

No resume capability

Failed imports must restart from scratch

Resource consumption

Long-running requests tie up server threads

2.2. Requirements

The async import system must:

Accept large files with immediate acknowledgment
Process rows asynchronously in the background
Pause for user input when mappings cannot be resolved
Allow resume after user provides missing mappings
Track progress and provide status updates
Clean up completed/abandoned imports automatically

3. Architecture Overview

4. State Machine

4.1. States

4.2. State Descriptions

State	Description	Next States
UPLOADED	File received and stored as BLOB. Awaiting column detection.	COLUMN_MAPPING
COLUMN_MAPPING	Headers analyzed. Required fields not all matched. Waiting for user to provide mappings.	CELL_MAPPING, CANCELLED
CELL_MAPPING	Columns mapped. Foreign key values cannot be resolved. Waiting for user to select valid values.	PROCESSING, CANCELLED
PROCESSING	Actively processing rows. Progress tracked.	CELL_MAPPING (new value), COMPLETED, FAILED, CANCELLED
COMPLETED	All rows processed. Results available.	(terminal)
FAILED	Fatal error occurred during processing.	(terminal)
CANCELLED	User cancelled the import.	(terminal)

State

Description

Next States

UPLOADED

File received and stored as BLOB. Awaiting column detection.

COLUMN_MAPPING

Headers analyzed. Required fields not all matched. Waiting for user to provide mappings.

CELL_MAPPING, CANCELLED

CELL_MAPPING

Columns mapped. Foreign key values cannot be resolved. Waiting for user to select valid values.

PROCESSING, CANCELLED

PROCESSING

Actively processing rows. Progress tracked.

CELL_MAPPING (new value), COMPLETED, FAILED, CANCELLED

COMPLETED

All rows processed. Results available.

(terminal)

FAILED

Fatal error occurred during processing.

(terminal)

CANCELLED

User cancelled the import.

(terminal)

5. BLOB Storage

5.1. Attachment Entity

Import files are stored using the existing Attachment entity:

@Entity
@Inheritance(strategy = InheritanceType.JOINED)
public class Attachment {
    @Id @GeneratedValue
    private Long id;

    @Column(unique = true, nullable = false)
    private String uuid;

    @Lob @Basic(fetch = FetchType.EAGER)
    private byte[] data;

    @ManyToOne(optional = false)
    private Organisation organisation;

    @NotNull
    private String mediaType;

    private Instant expiryDate;

    @NotNull
    private String name;
}

5.2. File Storage Flow

Client uploads file via multipart form
Service validates file type and size
File bytes stored in Attachment.data
Attachment.uuid returned to client as reference
ImportJob links to Attachment for processing

5.3. Supported Media Types

private static final Set<String> ALLOWED_TYPES = Set.of(
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",  // xlsx
    "application/vnd.ms-excel",                                           // xls
    "text/csv",
    "text/plain"  // csv with wrong mime type
);

6. Database Schema

6.1. Entity Relationship Diagram

6.2. Enumerations

public enum ImportType {
    EVENT_PARTICIPANT,
    MEMBERSHIP,
    RESULT
}

public enum ImportJobStatus {
    UPLOADED,
    COLUMN_MAPPING,
    CELL_MAPPING,
    PROCESSING,
    COMPLETED,
    FAILED,
    CANCELLED
}

public enum MappingStatus {
    AUTO_MATCHED,     // System matched with high confidence
    MANUAL_MATCHED,   // User confirmed or corrected
    UNMATCHED,        // Cannot match, needs user input
    IGNORED           // User chose to skip
}

public enum ImportOutcome {
    CREATED,
    UPDATED,
    SKIPPED,
    ERROR
}

7. API Design

7.1. Endpoint Summary

Method Endpoint Description

Method	Endpoint	Description
POST	`/api/imports`	Upload file and create import job
GET	`/api/imports`	List import jobs (paginated)
GET	`/api/imports/{uuid}`	Get import job status
DELETE	`/api/imports/{uuid}`	Cancel import job
GET	`/api/imports/{uuid}/column-mappings`	Get column mappings
PUT	`/api/imports/{uuid}/column-mappings`	Update column mappings
POST	`/api/imports/{uuid}/column-mappings/confirm`	Confirm all and proceed to cell mapping
GET	`/api/imports/{uuid}/cell-mappings`	Get cell mappings
GET	`/api/imports/{uuid}/cell-mappings/candidates`	Get candidate entities for FK resolution
PUT	`/api/imports/{uuid}/cell-mappings`	Update cell mappings
POST	`/api/imports/{uuid}/cell-mappings/confirm`	Confirm all and start processing
POST	`/api/imports/{uuid}/start`	Skip cell mapping and start processing
GET	`/api/imports/{uuid}/results`	Get row results (paginated)
GET	`/api/imports/{uuid}/results/summary`	Get results summary by outcome

POST

/api/imports

Upload file and create import job

GET

/api/imports

List import jobs (paginated)

GET

/api/imports/{uuid}

Get import job status

DELETE

/api/imports/{uuid}

Cancel import job

GET

/api/imports/{uuid}/column-mappings

Get column mappings

PUT

/api/imports/{uuid}/column-mappings

Update column mappings

POST

/api/imports/{uuid}/column-mappings/confirm

Confirm all and proceed to cell mapping

GET

/api/imports/{uuid}/cell-mappings

Get cell mappings

GET

/api/imports/{uuid}/cell-mappings/candidates

Get candidate entities for FK resolution

PUT

/api/imports/{uuid}/cell-mappings

Update cell mappings

POST

/api/imports/{uuid}/cell-mappings/confirm

Confirm all and start processing

POST

/api/imports/{uuid}/start

Skip cell mapping and start processing

GET

/api/imports/{uuid}/results

Get row results (paginated)

GET

/api/imports/{uuid}/results/summary

Get results summary by outcome

7.2. Endpoint Details

7.2.1. Create Import Job

POST /api/imports
Content-Type: multipart/form-data

Parameters:
- file: MultipartFile (required) - The spreadsheet file
- importType: String (required) - EVENT_PARTICIPANT, MEMBERSHIP, RESULT
- contextId: Long (optional) - Event ID, MembershipPeriod ID, or Race ID
- organisationId: Long (optional) - Organisation ID (defaults to current user's org)

Response: 201 Created
Location: /api/imports/{uuid}

{
    "identifier": "abc-123-def-456",
    "importType": "EVENT_PARTICIPANT",
    "status": "COLUMN_MAPPING",
    "originalFilename": "registrations.xlsx",
    "totalRows": 150,
    "processedRows": 0,
    "successCount": 0,
    "errorCount": 0,
    "createdAt": "2026-01-03T10:00:00Z",
    "columnMappings": [
        {"id": 1, "columnIndex": 0, "sourceHeader": "Name", "targetField": "FIRST_NAME", "status": "AUTO_MATCHED", "confidenceScore": 1.0},
        {"id": 2, "columnIndex": 1, "sourceHeader": "Unknown Col", "targetField": null, "status": "UNRESOLVED", "confidenceScore": 0.0}
    ]
}

7.2.2. Get Import Job Status

GET /api/imports/{uuid}?includeMappings=true

Response: 200 OK
{
    "identifier": "abc-123-def-456",
    "importType": "EVENT_PARTICIPANT",
    "status": "PROCESSING",
    "totalRows": 150,
    "processedRows": 75,
    "successCount": 72,
    "errorCount": 3,
    "progressPercent": 50,
    "createdAt": "2026-01-03T10:00:00Z"
}

7.2.3. List Import Jobs

GET /api/imports?organisationId=1&status=PROCESSING&page=0&size=20

Response: 200 OK
X-Total-Count: 5

[
    {"identifier": "abc-123", "status": "PROCESSING", "progressPercent": 50, ...},
    {"identifier": "def-456", "status": "COMPLETED", "progressPercent": 100, ...}
]

7.2.4. Update Column Mappings

PUT /api/imports/{uuid}/column-mappings
Content-Type: application/json

[
    {"id": 1, "confirm": true},
    {"id": 2, "targetField": "EVENT_CATEGORY_NAME"},
    {"id": 3, "ignore": true}
]

Response: 202 Accepted (all required mapped)
Response: 406 Not Acceptable (required fields still unresolved)

7.2.5. Confirm Column Mappings

POST /api/imports/{uuid}/column-mappings/confirm

Response: 202 Accepted
{
    "identifier": "abc-123-def-456",
    "status": "CELL_MAPPING",
    "cellMappings": [
        {"id": 1, "targetField": "EVENT_CATEGORY_NAME", "sourceValue": "Junior", "status": "AUTO_MATCHED", "targetEntityId": 42},
        {"id": 2, "targetField": "EVENT_CATEGORY_NAME", "sourceValue": "Unknown Cat", "status": "UNRESOLVED", "targetEntityId": null}
    ]
}

7.2.6. Get Cell Mapping Candidates

GET /api/imports/{uuid}/cell-mappings/candidates?targetField=EVENT_CATEGORY_NAME

Response: 200 OK
[
    {"id": 42, "displayName": "Junior (U18)"},
    {"id": 43, "displayName": "Senior (18+)"},
    {"id": 44, "displayName": "Masters (40+)"}
]

7.2.7. Update Cell Mappings

PUT /api/imports/{uuid}/cell-mappings
Content-Type: application/json

[
    {"id": 1, "confirm": true},
    {"id": 2, "targetEntityId": 43},
    {"id": 3, "ignore": true}
]

Response: 202 Accepted (all resolved)
Response: 406 Not Acceptable (unresolved mappings remain)

7.2.8. Start Processing

POST /api/imports/{uuid}/start

Description: Confirms all auto-matched mappings and starts processing.
             Can be called from COLUMN_MAPPING (skips cell mapping) or CELL_MAPPING status.

Response: 202 Accepted
{
    "identifier": "abc-123-def-456",
    "status": "PROCESSING",
    "totalRows": 150,
    "processedRows": 0
}

7.2.9. Get Row Results

GET /api/imports/{uuid}/results?outcome=ERROR&page=0&size=50

Response: 200 OK
X-Total-Count: 5

[
    {"id": 1, "rowNumber": 15, "outcome": "ERROR", "message": "Duplicate email: [email protected]"},
    {"id": 2, "rowNumber": 42, "outcome": "ERROR", "message": "Invalid date format"}
]

7.2.10. Get Results Summary

GET /api/imports/{uuid}/results/summary

Response: 200 OK
{
    "created": 120,
    "updated": 25,
    "skipped": 2,
    "error": 3,
    "validationError": 0
}

7.2.11. Cancel Import

DELETE /api/imports/{uuid}

Response: 204 No Content (cancelled successfully)
Response: 404 Not Found (job doesn't exist)
Response: 409 Conflict (already completed/failed/cancelled)

8. Column Mapping

8.1. Auto-Matching Process

Extract headers from first row
Normalize each header (remove spaces, special chars, uppercase)
Look up in header dictionary
Calculate confidence score for fuzzy matches
Mark required fields that couldn’t match

8.2. Confidence Scoring

Match Type	Example	Confidence
Exact match	"FirstName" → FIRST_NAME	1.0
Alias match	"FN" → FIRST_NAME	1.0
Normalized match	"First Name" → FIRST_NAME	1.0
Fuzzy match (>80%)	"First_Nm" → FIRST_NAME	0.8-0.99
No match	"Unknown" → null	0.0

Match Type

Example

Confidence

Exact match

"FirstName" → FIRST_NAME

1.0

Alias match

"FN" → FIRST_NAME

1.0

Normalized match

"First Name" → FIRST_NAME

1.0

Fuzzy match (>80%)

"First_Nm" → FIRST_NAME

0.8-0.99

No match

"Unknown" → null

0.0

8.3. User Resolution

When unmatched columns exist:

Return list of unmatched columns with isRequired flag
Return list of available (unmapped) target fields
User selects target for each or marks as IGNORED
System validates all required fields are mapped
Return 202 if valid, 406 if still missing

9. Cell Mapping

9.1. Foreign Key Resolution

For fields that reference other entities, values must be resolved to IDs:

Field Resolution

Field	Resolution
ID_COUNTRY	Lookup `Country` by code or name
EVENT_CATEGORY_NAME	Lookup `EventCategory` by name within event
CUSTOM_LIST_*	Lookup `CustomListValue` by value within list
MEMBERSHIP_TYPE_ID	Lookup `MembershipType` by ID or name

ID_COUNTRY

Lookup Country by code or name

EVENT_CATEGORY_NAME

Lookup EventCategory by name within event

CUSTOM_LIST_*

Lookup CustomListValue by value within list

MEMBERSHIP_TYPE_ID

Lookup MembershipType by ID or name

9.2. Resolution Strategy

public interface CellValueResolver {
    /**
     * Attempt to resolve a source value to an entity ID.
     */
    CellResolutionResult resolve(String sourceValue, ImportContext context);

    /**
     * Get all valid target values for UI dropdown.
     */
    List<TargetOptionDTO> getAvailableTargets(ImportContext context);
}

9.3. User Resolution Options

When a value cannot be resolved:

Map to existing: Select from dropdown of valid entities
Create new: Create new entity (if permitted for this type)
Ignore: Skip rows with this value
Use default: Apply a default for all unmatched

10. Async Processing

10.1. Spring Async Configuration

@Configuration
@EnableAsync
public class AsyncConfig {

    @Bean(name = "importTaskExecutor")
    public Executor importTaskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(2);
        executor.setMaxPoolSize(5);
        executor.setQueueCapacity(25);
        executor.setThreadNamePrefix("import-");
        executor.initialize();
        return executor;
    }
}

10.2. Processing Service

@Service
public class ImportProcessingService {

    @Async("importTaskExecutor")
    public CompletableFuture<Void> processImport(UUID jobId) {
        ImportJob job = findByUuid(jobId);

        try {
            job.setStatus(PROCESSING);
            save(job);

            SpreadsheetReader reader = createReader(job);
            int rowIndex = 1;  // Skip header

            while (rowIndex < reader.getRowCount()) {
                // Check cancellation
                if (isCancelled(jobId)) return complete();

                try {
                    processRow(reader.getRow(rowIndex), job);
                    job.incrementSuccess();
                } catch (UnmappedCellException e) {
                    // Pause for user input
                    savePendingCellMapping(job, e);
                    job.setStatus(CELL_MAPPING);
                    save(job);
                    return complete();
                } catch (Exception e) {
                    saveError(job, rowIndex, e);
                    job.incrementError();
                }

                job.setProcessedRows(rowIndex);
                if (rowIndex % 100 == 0) save(job);
                rowIndex++;
            }

            job.setStatus(COMPLETED);
            job.setCompletedAt(Instant.now());
            save(job);

        } catch (Exception e) {
            job.setStatus(FAILED);
            save(job);
        }

        return complete();
    }
}

11. Purging Strategy

11.1. Retention Rules

Status	Retention	Rationale
COMPLETED	7 days	Allow review and re-download
FAILED	7 days	Allow debugging
CANCELLED	7 days	Same as completed
UPLOADED, COLUMN_MAPPING, CELL_MAPPING	90 days	User may return to complete
PROCESSING	No auto-purge	Requires manual intervention

Status

Retention

Rationale

COMPLETED

7 days

Allow review and re-download

FAILED

7 days

Allow debugging

CANCELLED

7 days

Same as completed

UPLOADED, COLUMN_MAPPING, CELL_MAPPING

90 days

User may return to complete

PROCESSING

No auto-purge

Requires manual intervention

11.2. Scheduled Cleanup

@Service
public class ImportCleanupService {

    @Scheduled(cron = "0 0 2 * * ?")  // Daily at 2 AM
    @Transactional
    public void cleanupExpiredImports() {
        Instant now = Instant.now();

        // Completed/Failed/Cancelled: 7 days
        deleteByStatusesOlderThan(
            Set.of(COMPLETED, FAILED, CANCELLED),
            now.minus(7, ChronoUnit.DAYS)
        );

        // Abandoned: 90 days
        deleteByStatusesOlderThan(
            Set.of(UPLOADED, COLUMN_MAPPING, CELL_MAPPING),
            now.minus(90, ChronoUnit.DAYS)
        );
    }

    private void deleteByStatusesOlderThan(Set<Status> statuses, Instant cutoff) {
        List<ImportJob> jobs = repository.findByStatusInAndCreatedAtBefore(statuses, cutoff);
        for (ImportJob job : jobs) {
            attachmentRepository.delete(job.getSourceFile());
            repository.delete(job);
        }
    }
}

12. UI Workflow

12.1. Upload Flow

12.2. Polling Flow

13. Related Documentation

File Import - XLSX/CSV parsing and column mapping
Data Synchronisation - Related synchronisation patterns