Async Import Architecture
1. Overview
The Async Import architecture provides a framework for processing large spreadsheet imports asynchronously, with support for interactive column and cell mapping when automatic resolution fails. This enables imports that would otherwise timeout to complete successfully with user assistance.
2. Problem Statement
2.1. Current Limitations
The synchronous import approach has several limitations:
| Limitation | Impact |
|---|---|
HTTP timeout |
Large files fail before processing completes |
No progress visibility |
Users don’t know import status |
All-or-nothing mapping |
Any unmapped column/cell fails the entire import |
No resume capability |
Failed imports must restart from scratch |
Resource consumption |
Long-running requests tie up server threads |
2.2. Requirements
The async import system must:
-
Accept large files with immediate acknowledgment
-
Process rows asynchronously in the background
-
Pause for user input when mappings cannot be resolved
-
Allow resume after user provides missing mappings
-
Track progress and provide status updates
-
Clean up completed/abandoned imports automatically
4. State Machine
4.2. State Descriptions
| State | Description | Next States |
|---|---|---|
UPLOADED |
File received and stored as BLOB. Awaiting column detection. |
COLUMN_MAPPING |
COLUMN_MAPPING |
Headers analyzed. Required fields not all matched. Waiting for user to provide mappings. |
CELL_MAPPING, CANCELLED |
CELL_MAPPING |
Columns mapped. Foreign key values cannot be resolved. Waiting for user to select valid values. |
PROCESSING, CANCELLED |
PROCESSING |
Actively processing rows. Progress tracked. |
CELL_MAPPING (new value), COMPLETED, FAILED, CANCELLED |
COMPLETED |
All rows processed. Results available. |
(terminal) |
FAILED |
Fatal error occurred during processing. |
(terminal) |
CANCELLED |
User cancelled the import. |
(terminal) |
5. BLOB Storage
5.1. Attachment Entity
Import files are stored using the existing Attachment entity:
@Entity
@Inheritance(strategy = InheritanceType.JOINED)
public class Attachment {
@Id @GeneratedValue
private Long id;
@Column(unique = true, nullable = false)
private String uuid;
@Lob @Basic(fetch = FetchType.EAGER)
private byte[] data;
@ManyToOne(optional = false)
private Organisation organisation;
@NotNull
private String mediaType;
private Instant expiryDate;
@NotNull
private String name;
}
6. Database Schema
6.2. Enumerations
public enum ImportType {
EVENT_PARTICIPANT,
MEMBERSHIP,
RESULT
}
public enum ImportJobStatus {
UPLOADED,
COLUMN_MAPPING,
CELL_MAPPING,
PROCESSING,
COMPLETED,
FAILED,
CANCELLED
}
public enum MappingStatus {
AUTO_MATCHED, // System matched with high confidence
MANUAL_MATCHED, // User confirmed or corrected
UNMATCHED, // Cannot match, needs user input
IGNORED // User chose to skip
}
public enum ImportOutcome {
CREATED,
UPDATED,
SKIPPED,
ERROR
}
7. API Design
7.1. Endpoint Summary
| Method | Endpoint | Description |
|---|---|---|
POST |
|
Upload file and create import job |
GET |
|
List import jobs (paginated) |
GET |
|
Get import job status |
DELETE |
|
Cancel import job |
GET |
|
Get column mappings |
PUT |
|
Update column mappings |
POST |
|
Confirm all and proceed to cell mapping |
GET |
|
Get cell mappings |
GET |
|
Get candidate entities for FK resolution |
PUT |
|
Update cell mappings |
POST |
|
Confirm all and start processing |
POST |
|
Skip cell mapping and start processing |
GET |
|
Get row results (paginated) |
GET |
|
Get results summary by outcome |
7.2. Endpoint Details
7.2.1. Create Import Job
POST /api/imports
Content-Type: multipart/form-data
Parameters:
- file: MultipartFile (required) - The spreadsheet file
- importType: String (required) - EVENT_PARTICIPANT, MEMBERSHIP, RESULT
- contextId: Long (optional) - Event ID, MembershipPeriod ID, or Race ID
- organisationId: Long (optional) - Organisation ID (defaults to current user's org)
Response: 201 Created
Location: /api/imports/{uuid}
{
"identifier": "abc-123-def-456",
"importType": "EVENT_PARTICIPANT",
"status": "COLUMN_MAPPING",
"originalFilename": "registrations.xlsx",
"totalRows": 150,
"processedRows": 0,
"successCount": 0,
"errorCount": 0,
"createdAt": "2026-01-03T10:00:00Z",
"columnMappings": [
{"id": 1, "columnIndex": 0, "sourceHeader": "Name", "targetField": "FIRST_NAME", "status": "AUTO_MATCHED", "confidenceScore": 1.0},
{"id": 2, "columnIndex": 1, "sourceHeader": "Unknown Col", "targetField": null, "status": "UNRESOLVED", "confidenceScore": 0.0}
]
}
7.2.2. Get Import Job Status
GET /api/imports/{uuid}?includeMappings=true
Response: 200 OK
{
"identifier": "abc-123-def-456",
"importType": "EVENT_PARTICIPANT",
"status": "PROCESSING",
"totalRows": 150,
"processedRows": 75,
"successCount": 72,
"errorCount": 3,
"progressPercent": 50,
"createdAt": "2026-01-03T10:00:00Z"
}
7.2.3. List Import Jobs
GET /api/imports?organisationId=1&status=PROCESSING&page=0&size=20
Response: 200 OK
X-Total-Count: 5
[
{"identifier": "abc-123", "status": "PROCESSING", "progressPercent": 50, ...},
{"identifier": "def-456", "status": "COMPLETED", "progressPercent": 100, ...}
]
7.2.4. Update Column Mappings
PUT /api/imports/{uuid}/column-mappings
Content-Type: application/json
[
{"id": 1, "confirm": true},
{"id": 2, "targetField": "EVENT_CATEGORY_NAME"},
{"id": 3, "ignore": true}
]
Response: 202 Accepted (all required mapped)
Response: 406 Not Acceptable (required fields still unresolved)
7.2.5. Confirm Column Mappings
POST /api/imports/{uuid}/column-mappings/confirm
Response: 202 Accepted
{
"identifier": "abc-123-def-456",
"status": "CELL_MAPPING",
"cellMappings": [
{"id": 1, "targetField": "EVENT_CATEGORY_NAME", "sourceValue": "Junior", "status": "AUTO_MATCHED", "targetEntityId": 42},
{"id": 2, "targetField": "EVENT_CATEGORY_NAME", "sourceValue": "Unknown Cat", "status": "UNRESOLVED", "targetEntityId": null}
]
}
7.2.6. Get Cell Mapping Candidates
GET /api/imports/{uuid}/cell-mappings/candidates?targetField=EVENT_CATEGORY_NAME
Response: 200 OK
[
{"id": 42, "displayName": "Junior (U18)"},
{"id": 43, "displayName": "Senior (18+)"},
{"id": 44, "displayName": "Masters (40+)"}
]
7.2.7. Update Cell Mappings
PUT /api/imports/{uuid}/cell-mappings
Content-Type: application/json
[
{"id": 1, "confirm": true},
{"id": 2, "targetEntityId": 43},
{"id": 3, "ignore": true}
]
Response: 202 Accepted (all resolved)
Response: 406 Not Acceptable (unresolved mappings remain)
7.2.8. Start Processing
POST /api/imports/{uuid}/start
Description: Confirms all auto-matched mappings and starts processing.
Can be called from COLUMN_MAPPING (skips cell mapping) or CELL_MAPPING status.
Response: 202 Accepted
{
"identifier": "abc-123-def-456",
"status": "PROCESSING",
"totalRows": 150,
"processedRows": 0
}
7.2.9. Get Row Results
GET /api/imports/{uuid}/results?outcome=ERROR&page=0&size=50
Response: 200 OK
X-Total-Count: 5
[
{"id": 1, "rowNumber": 15, "outcome": "ERROR", "message": "Duplicate email: [email protected]"},
{"id": 2, "rowNumber": 42, "outcome": "ERROR", "message": "Invalid date format"}
]
8. Column Mapping
8.1. Auto-Matching Process
-
Extract headers from first row
-
Normalize each header (remove spaces, special chars, uppercase)
-
Look up in header dictionary
-
Calculate confidence score for fuzzy matches
-
Mark required fields that couldn’t match
9. Cell Mapping
9.1. Foreign Key Resolution
For fields that reference other entities, values must be resolved to IDs:
| Field | Resolution |
|---|---|
ID_COUNTRY |
Lookup |
EVENT_CATEGORY_NAME |
Lookup |
CUSTOM_LIST_* |
Lookup |
MEMBERSHIP_TYPE_ID |
Lookup |
9.2. Resolution Strategy
public interface CellValueResolver {
/**
* Attempt to resolve a source value to an entity ID.
*/
CellResolutionResult resolve(String sourceValue, ImportContext context);
/**
* Get all valid target values for UI dropdown.
*/
List<TargetOptionDTO> getAvailableTargets(ImportContext context);
}
10. Async Processing
10.1. Spring Async Configuration
@Configuration
@EnableAsync
public class AsyncConfig {
@Bean(name = "importTaskExecutor")
public Executor importTaskExecutor() {
ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
executor.setCorePoolSize(2);
executor.setMaxPoolSize(5);
executor.setQueueCapacity(25);
executor.setThreadNamePrefix("import-");
executor.initialize();
return executor;
}
}
10.2. Processing Service
@Service
public class ImportProcessingService {
@Async("importTaskExecutor")
public CompletableFuture<Void> processImport(UUID jobId) {
ImportJob job = findByUuid(jobId);
try {
job.setStatus(PROCESSING);
save(job);
SpreadsheetReader reader = createReader(job);
int rowIndex = 1; // Skip header
while (rowIndex < reader.getRowCount()) {
// Check cancellation
if (isCancelled(jobId)) return complete();
try {
processRow(reader.getRow(rowIndex), job);
job.incrementSuccess();
} catch (UnmappedCellException e) {
// Pause for user input
savePendingCellMapping(job, e);
job.setStatus(CELL_MAPPING);
save(job);
return complete();
} catch (Exception e) {
saveError(job, rowIndex, e);
job.incrementError();
}
job.setProcessedRows(rowIndex);
if (rowIndex % 100 == 0) save(job);
rowIndex++;
}
job.setStatus(COMPLETED);
job.setCompletedAt(Instant.now());
save(job);
} catch (Exception e) {
job.setStatus(FAILED);
save(job);
}
return complete();
}
}
11. Purging Strategy
11.1. Retention Rules
| Status | Retention | Rationale |
|---|---|---|
COMPLETED |
7 days |
Allow review and re-download |
FAILED |
7 days |
Allow debugging |
CANCELLED |
7 days |
Same as completed |
UPLOADED, COLUMN_MAPPING, CELL_MAPPING |
90 days |
User may return to complete |
PROCESSING |
No auto-purge |
Requires manual intervention |
11.2. Scheduled Cleanup
@Service
public class ImportCleanupService {
@Scheduled(cron = "0 0 2 * * ?") // Daily at 2 AM
@Transactional
public void cleanupExpiredImports() {
Instant now = Instant.now();
// Completed/Failed/Cancelled: 7 days
deleteByStatusesOlderThan(
Set.of(COMPLETED, FAILED, CANCELLED),
now.minus(7, ChronoUnit.DAYS)
);
// Abandoned: 90 days
deleteByStatusesOlderThan(
Set.of(UPLOADED, COLUMN_MAPPING, CELL_MAPPING),
now.minus(90, ChronoUnit.DAYS)
);
}
private void deleteByStatusesOlderThan(Set<Status> statuses, Instant cutoff) {
List<ImportJob> jobs = repository.findByStatusInAndCreatedAtBefore(statuses, cutoff);
for (ImportJob job : jobs) {
attachmentRepository.delete(job.getSourceFile());
repository.delete(job);
}
}
}
13. Related Documentation
-
File Import - XLSX/CSV parsing and column mapping
-
Data Synchronisation - Related synchronisation patterns