File Import Architecture

1. Overview

The File Import architecture provides a flexible framework for importing data from spreadsheet files (XLSX and CSV) into the Event and Membership System. The core feature is a dictionary-based auto column mapping technique that intelligently matches spreadsheet headers to domain entity fields.

2. Supported File Formats

2.1. XLSX (Excel 2007+)

Aspect Details

Aspect	Details
Library	Apache POI 5.2.0
Class	`org.apache.poi.xssf.usermodel.XSSFWorkbook`
Cell Types	STRING, NUMERIC (including dates), BOOLEAN, FORMULA, BLANK
Multi-sheet	Supported via `sheetIndex` parameter
Memory	Full workbook loaded into memory

Library

Apache POI 5.2.0

Class

org.apache.poi.xssf.usermodel.XSSFWorkbook

Cell Types

STRING, NUMERIC (including dates), BOOLEAN, FORMULA, BLANK

Multi-sheet

Supported via sheetIndex parameter

Memory

Full workbook loaded into memory

2.2. CSV (Comma-Separated Values)

Aspect	Details
Implementation	Custom parser or Apache Commons CSV
Delimiters	Comma (default), semicolon, tab
Encoding	UTF-8 (default)
Quoting	Double-quote with escape support
Headers	First row assumed to be headers

Aspect

Details

Implementation

Custom parser or Apache Commons CSV

Delimiters

Comma (default), semicolon, tab

Encoding

UTF-8 (default)

Quoting

Double-quote with escape support

Headers

First row assumed to be headers

3. Class Hierarchy

za.co.idealogic.event.admin.service/
├── ImportXLS (abstract base class)
│   ├── EventParticipantImportXLS
│   ├── MembershipImportXLS
│   └── ResultImportXLS
│
├── dto/
│   ├── ImportAddResponseDTO (base)
│   │   ├── EventParticipantAddResponseDTO
│   │   ├── MembershipAddResponseDTO
│   │   └── ResultAddResponseDTO
│
└── MembershipImportService (orchestration)

4. ImportXLS Base Class

The ImportXLS abstract class provides the foundation for all spreadsheet imports.

4.1. Location

admin-service/src/main/java/za/co/idealogic/event/admin/service/ImportXLS.java

4.2. Core Components

4.2.1. Header Dictionary

A Map<String, Column> that maps normalized header strings to column enum values:

private Map<String, Column> headerDictionary = new HashMap<>();

protected void addHeaderDictionary(String key, Column column) {
    headerDictionary.put(key.toUpperCase().replace(" ", ""), column);
}

4.2.2. Column Interface

Each import type defines its columns as an enum implementing this interface:

public interface Column {
    public boolean isRequired();
    public ColumnType getType();
}

public enum ColumnType {
    STRING, DATE, TIME, DATE_TIME, INTEGER, FLOAT, DECIMAL, BOOLEAN
}

4.2.3. Default Values

Support for fallback values when columns are missing or blank:

private Map<Column, String> defaultValues = new HashMap<>();

5. Dictionary-Based Auto Column Mapping

5.1. Concept

When a spreadsheet is uploaded, the system automatically attempts to match each column header to a known entity field. This is achieved through a normalization and lookup process.

5.2. Header Normalization Algorithm

Headers are normalized before lookup to handle common variations:

protected List<Column> detectColumns(Row headerRow) {
    final List<Column> fields = new ArrayList<>();
    for (Cell cell : headerRow) {
        String value = cell.getStringCellValue();
        // Remove common separators
        value = value.replace("/", "");
        value = value.replace("-", "");
        value = value.replace("_", "");
        value = value.replace(" ", "");
        // Case insensitive
        value = value.toUpperCase();

        if (headerDictionary.containsKey(value)) {
            fields.add(headerDictionary.get(value));
        } else {
            fields.add(null);  // Unmatched
        }
    }
    return fields;
}

5.3. Multi-Alias Support

Each target field can have multiple aliases registered. This allows matching common variations:

// First name variations
addHeaderDictionary("FirstName", Column.FIRST_NAME);
addHeaderDictionary("Name", Column.FIRST_NAME);
addHeaderDictionary("FN", Column.FIRST_NAME);

// Last name variations
addHeaderDictionary("LastName", Column.LAST_NAME);
addHeaderDictionary("Surname", Column.LAST_NAME);
addHeaderDictionary("SN", Column.LAST_NAME);
addHeaderDictionary("LN", Column.LAST_NAME);

// Contact number variations
addHeaderDictionary("ContactNumber", Column.CONTACT_NUMBER);
addHeaderDictionary("Cellphone", Column.CONTACT_NUMBER);
addHeaderDictionary("CellphoneNumber", Column.CONTACT_NUMBER);
addHeaderDictionary("CellNumber", Column.CONTACT_NUMBER);
addHeaderDictionary("Telephone", Column.CONTACT_NUMBER);
addHeaderDictionary("TelephoneNumber", Column.CONTACT_NUMBER);
addHeaderDictionary("Mobile", Column.CONTACT_NUMBER);

5.4. Normalization Examples

Original Header	Normalized	Matches
"First Name"	FIRSTNAME	FIRST_NAME
"first_name"	FIRSTNAME	FIRST_NAME
"Date of Birth"	DATEOFBIRTH	DOB
"ID/Passport"	IDPASSPORT	ID_NUMBER
"Cell-phone Number"	CELLPHONENUMBER	CONTACT_NUMBER

Original Header

Normalized

Matches

"First Name"

FIRSTNAME

FIRST_NAME

"first_name"

FIRSTNAME

FIRST_NAME

"Date of Birth"

DATEOFBIRTH

DOB

"ID/Passport"

IDPASSPORT

ID_NUMBER

"Cell-phone Number"

CELLPHONENUMBER

CONTACT_NUMBER

6. Column Lookup Map

After detecting columns, a reverse lookup map is created for efficient cell access:

protected Map<Column, Integer> columnLookupMap(List<Column> columns) {
    Map<Column, Integer> map = new HashMap<>();
    int i = 0;
    for (Column column : columns) {
        if (column != null && !map.containsKey(column)) {
            map.put(column, i);  // First occurrence wins
        }
        i++;
    }
    return map;
}

Important: If the same column appears multiple times in the spreadsheet (e.g., two "First Name" columns), only the first occurrence is used.

7. Type-Safe Cell Parsing

The base class provides type-safe parsing methods for common data types:

7.1. String Parsing

Handles both numeric and string cell types:

protected String parseAsString(Row row, Map<Column, Integer> map, Column column) {
    Cell cell = extractCell(row, map, column);
    if (cell != null) {
        if (cell.getCellType() == CellType.NUMERIC) {
            Double value = cell.getNumericCellValue();
            return doubleToStringFormatter.format(value);  // No decimals
        } else if (cell.getCellType() == CellType.STRING) {
            String value = cell.getRichStringCellValue().toString().trim();
            if (!value.isEmpty()) return value;
        }
    }
    return defaultValues.get(column);  // Fallback to default
}

7.2. Numeric Parsing

// BigDecimal for monetary amounts
protected BigDecimal parseAsBigDecimal(Row row, Map<Column, Integer> map, Column column);

// Long for IDs
protected Long parseAsLong(Row row, Map<Column, Integer> map, Column column);

// Integer for counts
protected Integer parseAsInt(Row row, Map<Column, Integer> map, Column column);

7.3. Date/Time Parsing

Handles both Excel date cells and string representations:

protected LocalDate parseAsLocalDate(Row row, Map<Column, Integer> map, Column column) {
    Cell cell = extractCell(row, map, column);
    if (cell != null) {
        if (cell.getCellType() == CellType.NUMERIC) {
            // Excel stores dates as numbers
            LocalDateTime value = cell.getLocalDateTimeCellValue();
            return value.toLocalDate();
        } else if (cell.getCellType() == CellType.STRING) {
            String value = cell.getStringCellValue().trim();
            value = value.replace("/", "-");  // Normalize separators
            return LocalDate.parse(value, DateTimeFormatter.ISO_DATE);
        }
    }
    return null;
}

7.4. Boolean Parsing

Flexible interpretation of boolean values:

protected boolean parseAsBoolean(Row row, Map<Column, Integer> map, Column column) {
    Cell cell = extractCell(row, map, column);
    if (cell != null) {
        if (cell.getCellType() == CellType.NUMERIC) {
            return cell.getNumericCellValue() == 1;  // 1 = true
        } else if (cell.getCellType() == CellType.STRING) {
            String value = cell.getStringCellValue().trim().toUpperCase();
            return value.startsWith("Y") || value.startsWith("T");  // Yes/True
        }
    }
    return false;
}

8. Response Model

8.1. ImportAddResponseDTO

Base class for import row results:

public class ImportAddResponseDTO {
    public enum Outcome {
        CREATED,   // Successfully imported
        WARNING,   // Imported with warnings
        ERROR,     // Failed to import
        SKIP       // Row skipped (blank)
    }

    private String message;
    private Outcome outcome;
    private Integer lineNumber;
}

8.2. Specialized Response DTOs

Each import type extends the base:

EventParticipantAddResponseDTO - adds participant-specific fields
MembershipAddResponseDTO - adds membership-specific fields
ResultAddResponseDTO - adds result-specific fields

9. Extending the Framework

9.1. Creating a New Import Type

Define Column Enum:

private enum Column implements ImportXLS.Column {
    FIELD_ONE(ColumnType.STRING, true),   // Required
    FIELD_TWO(ColumnType.DATE),           // Optional
    FIELD_THREE;                          // Optional string

    private final boolean required;
    private final ColumnType type;

    Column() {
        this.type = ColumnType.STRING;
        this.required = false;
    }

    Column(ColumnType type) {
        this.type = type;
        this.required = false;
    }

    Column(ColumnType type, boolean required) {
        this.type = type;
        this.required = required;
    }

    @Override
    public boolean isRequired() { return required; }

    @Override
    public ColumnType getType() { return type; }
}

Register Header Aliases:

public MyImportXLS() {
    super();
    addHeaderDictionary("FieldOne", Column.FIELD_ONE);
    addHeaderDictionary("Field1", Column.FIELD_ONE);
    addHeaderDictionary("F1", Column.FIELD_ONE);
    // ... more aliases
}

Implement Row Extraction:

private MyResponseDTO extractRow(Row row, Map<Column, Integer> map) {
    String fieldOne = parseAsString(row, map, Column.FIELD_ONE);
    LocalDate fieldTwo = parseAsLocalDate(row, map, Column.FIELD_TWO);
    // ... process fields
    return response;
}

Create Response DTO:

public class MyResponseDTO extends ImportAddResponseDTO {
    private Long createdId;
    // ... additional fields
}

10. Import Type: EventParticipant

10.1. Column Definitions

The EventParticipant import supports 32 columns:

Column	Type	Required
FIRST_NAME	STRING	Yes
LAST_NAME	STRING	Yes
TITLE	STRING	No
ID_NUMBER	STRING	No
ID_TYPE	STRING	No
ID_COUNTRY	STRING (FK)	No
GENDER	STRING	No
DOB	DATE	No
EMAIL	STRING	No
CONTACT_NUMBER	STRING	No
EVENT_CATEGORY_NAME	STRING (FK)	No
NUMBER	STRING	No
TAG_NUMBER	STRING	No
ORDER_ID	INTEGER	No
ORDER_STATUS	STRING	No
CUSTOM_1	STRING	No
CUSTOM_2	STRING	No
CUSTOM_3	STRING	No
CUSTOM_LIST_1	STRING (FK)	No
CUSTOM_LIST_2	STRING (FK)	No
CUSTOM_LIST_3	STRING (FK)	No
… and more

Column

Type

Required

FIRST_NAME

STRING

Yes

LAST_NAME

STRING

Yes

TITLE

STRING

ID_NUMBER

STRING

ID_TYPE

STRING

ID_COUNTRY

STRING (FK)

GENDER

STRING

DOB

DATE

STRING

CONTACT_NUMBER

STRING

EVENT_CATEGORY_NAME

STRING (FK)

NUMBER

STRING

TAG_NUMBER

STRING

ORDER_ID

INTEGER

ORDER_STATUS

STRING

CUSTOM_1

STRING

CUSTOM_2

STRING

CUSTOM_3

STRING

CUSTOM_LIST_1

STRING (FK)

CUSTOM_LIST_2

STRING (FK)

CUSTOM_LIST_3

STRING (FK)

… and more

10.2. Foreign Key Fields

Fields marked (FK) require resolution to entity IDs:

ID_COUNTRY → Country.id
EVENT_CATEGORY_NAME → EventCategory.id
CUSTOM_LIST_* → CustomListValue.id

11. Import Type: Membership

11.1. Column Definitions

The Membership import supports 30 columns with similar patterns to EventParticipant, plus membership-specific fields:

MEMBERSHIP_NUMBER
MEMBERSHIP_STATUS
VALID_FROM
VALID_UNTIL
MEMBERSHIP_TYPE_ID

12. Import Type: Result

12.1. Column Definitions

The Result import supports 12 columns focused on race results:

EVENT_PARTICIPANT_ID
FINISH_TIME
POSITION
POINTS
DISQUALIFIED

13. Error Handling

13.1. Row-Level Errors

Each row is processed independently. Errors on one row don’t prevent processing of subsequent rows:

try {
    // Process row
    return createSuccessResponse(id);
} catch (Exception e) {
    log.error("Error processing row {}: {}", rowNum, e.getMessage());
    return createErrorResponse(e.getMessage());
}

13.2. Common Error Scenarios

Scenario	Handling
Required field missing	Row skipped, error response
Date format invalid	Logged, field set to null
FK value not found	Error response (async: queued for mapping)
Duplicate record	Depends on implementation (update or skip)

Scenario

Handling

Required field missing

Row skipped, error response

Date format invalid

Logged, field set to null

FK value not found

Error response (async: queued for mapping)

Duplicate record

Depends on implementation (update or skip)

14. Performance Considerations

14.1. Memory Usage

Full workbook loaded for XLSX
Consider streaming API (SXSSFWorkbook) for very large files
CSV is more memory-efficient for large datasets

14.2. Database Queries

Current implementation has N+1 query patterns:

Organisation lookup per row
Person lookup per row
FK resolution per FK field per row

Optimization opportunities:

Cache FK lookups at import start
Batch database operations
Pre-load reference data

15. Related Documentation

Async Import Architecture - Asynchronous processing pattern
Data Synchronisation - Related synchronisation patterns