File Import Architecture

1. Overview

The File Import architecture provides a flexible framework for importing data from spreadsheet files (XLSX and CSV) into the Event and Membership System. The core feature is a dictionary-based auto column mapping technique that intelligently matches spreadsheet headers to domain entity fields.

2. Supported File Formats

2.1. XLSX (Excel 2007+)

Aspect Details

Library

Apache POI 5.2.0

Class

org.apache.poi.xssf.usermodel.XSSFWorkbook

Cell Types

STRING, NUMERIC (including dates), BOOLEAN, FORMULA, BLANK

Multi-sheet

Supported via sheetIndex parameter

Memory

Full workbook loaded into memory

2.2. CSV (Comma-Separated Values)

Aspect Details

Implementation

Custom parser or Apache Commons CSV

Delimiters

Comma (default), semicolon, tab

Encoding

UTF-8 (default)

Quoting

Double-quote with escape support

Headers

First row assumed to be headers

3. Class Hierarchy

za.co.idealogic.event.admin.service/
├── ImportXLS (abstract base class)
│   ├── EventParticipantImportXLS
│   ├── MembershipImportXLS
│   └── ResultImportXLS
│
├── dto/
│   ├── ImportAddResponseDTO (base)
│   │   ├── EventParticipantAddResponseDTO
│   │   ├── MembershipAddResponseDTO
│   │   └── ResultAddResponseDTO
│
└── MembershipImportService (orchestration)

4. ImportXLS Base Class

The ImportXLS abstract class provides the foundation for all spreadsheet imports.

4.1. Location

admin-service/src/main/java/za/co/idealogic/event/admin/service/ImportXLS.java

4.2. Core Components

4.2.1. Header Dictionary

A Map<String, Column> that maps normalized header strings to column enum values:

private Map<String, Column> headerDictionary = new HashMap<>();

protected void addHeaderDictionary(String key, Column column) {
    headerDictionary.put(key.toUpperCase().replace(" ", ""), column);
}

4.2.2. Column Interface

Each import type defines its columns as an enum implementing this interface:

public interface Column {
    public boolean isRequired();
    public ColumnType getType();
}

public enum ColumnType {
    STRING, DATE, TIME, DATE_TIME, INTEGER, FLOAT, DECIMAL, BOOLEAN
}

4.2.3. Default Values

Support for fallback values when columns are missing or blank:

private Map<Column, String> defaultValues = new HashMap<>();

5. Dictionary-Based Auto Column Mapping

5.1. Concept

When a spreadsheet is uploaded, the system automatically attempts to match each column header to a known entity field. This is achieved through a normalization and lookup process.

5.2. Header Normalization Algorithm

Headers are normalized before lookup to handle common variations:

protected List<Column> detectColumns(Row headerRow) {
    final List<Column> fields = new ArrayList<>();
    for (Cell cell : headerRow) {
        String value = cell.getStringCellValue();
        // Remove common separators
        value = value.replace("/", "");
        value = value.replace("-", "");
        value = value.replace("_", "");
        value = value.replace(" ", "");
        // Case insensitive
        value = value.toUpperCase();

        if (headerDictionary.containsKey(value)) {
            fields.add(headerDictionary.get(value));
        } else {
            fields.add(null);  // Unmatched
        }
    }
    return fields;
}

5.3. Multi-Alias Support

Each target field can have multiple aliases registered. This allows matching common variations:

// First name variations
addHeaderDictionary("FirstName", Column.FIRST_NAME);
addHeaderDictionary("Name", Column.FIRST_NAME);
addHeaderDictionary("FN", Column.FIRST_NAME);

// Last name variations
addHeaderDictionary("LastName", Column.LAST_NAME);
addHeaderDictionary("Surname", Column.LAST_NAME);
addHeaderDictionary("SN", Column.LAST_NAME);
addHeaderDictionary("LN", Column.LAST_NAME);

// Contact number variations
addHeaderDictionary("ContactNumber", Column.CONTACT_NUMBER);
addHeaderDictionary("Cellphone", Column.CONTACT_NUMBER);
addHeaderDictionary("CellphoneNumber", Column.CONTACT_NUMBER);
addHeaderDictionary("CellNumber", Column.CONTACT_NUMBER);
addHeaderDictionary("Telephone", Column.CONTACT_NUMBER);
addHeaderDictionary("TelephoneNumber", Column.CONTACT_NUMBER);
addHeaderDictionary("Mobile", Column.CONTACT_NUMBER);

5.4. Normalization Examples

Original Header Normalized Matches

"First Name"

FIRSTNAME

FIRST_NAME

"first_name"

FIRSTNAME

FIRST_NAME

"Date of Birth"

DATEOFBIRTH

DOB

"ID/Passport"

IDPASSPORT

ID_NUMBER

"Cell-phone Number"

CELLPHONENUMBER

CONTACT_NUMBER

6. Column Lookup Map

After detecting columns, a reverse lookup map is created for efficient cell access:

protected Map<Column, Integer> columnLookupMap(List<Column> columns) {
    Map<Column, Integer> map = new HashMap<>();
    int i = 0;
    for (Column column : columns) {
        if (column != null && !map.containsKey(column)) {
            map.put(column, i);  // First occurrence wins
        }
        i++;
    }
    return map;
}

Important: If the same column appears multiple times in the spreadsheet (e.g., two "First Name" columns), only the first occurrence is used.

7. Type-Safe Cell Parsing

The base class provides type-safe parsing methods for common data types:

7.1. String Parsing

Handles both numeric and string cell types:

protected String parseAsString(Row row, Map<Column, Integer> map, Column column) {
    Cell cell = extractCell(row, map, column);
    if (cell != null) {
        if (cell.getCellType() == CellType.NUMERIC) {
            Double value = cell.getNumericCellValue();
            return doubleToStringFormatter.format(value);  // No decimals
        } else if (cell.getCellType() == CellType.STRING) {
            String value = cell.getRichStringCellValue().toString().trim();
            if (!value.isEmpty()) return value;
        }
    }
    return defaultValues.get(column);  // Fallback to default
}

7.2. Numeric Parsing

// BigDecimal for monetary amounts
protected BigDecimal parseAsBigDecimal(Row row, Map<Column, Integer> map, Column column);

// Long for IDs
protected Long parseAsLong(Row row, Map<Column, Integer> map, Column column);

// Integer for counts
protected Integer parseAsInt(Row row, Map<Column, Integer> map, Column column);

7.3. Date/Time Parsing

Handles both Excel date cells and string representations:

protected LocalDate parseAsLocalDate(Row row, Map<Column, Integer> map, Column column) {
    Cell cell = extractCell(row, map, column);
    if (cell != null) {
        if (cell.getCellType() == CellType.NUMERIC) {
            // Excel stores dates as numbers
            LocalDateTime value = cell.getLocalDateTimeCellValue();
            return value.toLocalDate();
        } else if (cell.getCellType() == CellType.STRING) {
            String value = cell.getStringCellValue().trim();
            value = value.replace("/", "-");  // Normalize separators
            return LocalDate.parse(value, DateTimeFormatter.ISO_DATE);
        }
    }
    return null;
}

7.4. Boolean Parsing

Flexible interpretation of boolean values:

protected boolean parseAsBoolean(Row row, Map<Column, Integer> map, Column column) {
    Cell cell = extractCell(row, map, column);
    if (cell != null) {
        if (cell.getCellType() == CellType.NUMERIC) {
            return cell.getNumericCellValue() == 1;  // 1 = true
        } else if (cell.getCellType() == CellType.STRING) {
            String value = cell.getStringCellValue().trim().toUpperCase();
            return value.startsWith("Y") || value.startsWith("T");  // Yes/True
        }
    }
    return false;
}

8. Response Model

8.1. ImportAddResponseDTO

Base class for import row results:

public class ImportAddResponseDTO {
    public enum Outcome {
        CREATED,   // Successfully imported
        WARNING,   // Imported with warnings
        ERROR,     // Failed to import
        SKIP       // Row skipped (blank)
    }

    private String message;
    private Outcome outcome;
    private Integer lineNumber;
}

8.2. Specialized Response DTOs

Each import type extends the base:

  • EventParticipantAddResponseDTO - adds participant-specific fields

  • MembershipAddResponseDTO - adds membership-specific fields

  • ResultAddResponseDTO - adds result-specific fields

9. Extending the Framework

9.1. Creating a New Import Type

  1. Define Column Enum:

private enum Column implements ImportXLS.Column {
    FIELD_ONE(ColumnType.STRING, true),   // Required
    FIELD_TWO(ColumnType.DATE),           // Optional
    FIELD_THREE;                          // Optional string

    private final boolean required;
    private final ColumnType type;

    Column() {
        this.type = ColumnType.STRING;
        this.required = false;
    }

    Column(ColumnType type) {
        this.type = type;
        this.required = false;
    }

    Column(ColumnType type, boolean required) {
        this.type = type;
        this.required = required;
    }

    @Override
    public boolean isRequired() { return required; }

    @Override
    public ColumnType getType() { return type; }
}
  1. Register Header Aliases:

public MyImportXLS() {
    super();
    addHeaderDictionary("FieldOne", Column.FIELD_ONE);
    addHeaderDictionary("Field1", Column.FIELD_ONE);
    addHeaderDictionary("F1", Column.FIELD_ONE);
    // ... more aliases
}
  1. Implement Row Extraction:

private MyResponseDTO extractRow(Row row, Map<Column, Integer> map) {
    String fieldOne = parseAsString(row, map, Column.FIELD_ONE);
    LocalDate fieldTwo = parseAsLocalDate(row, map, Column.FIELD_TWO);
    // ... process fields
    return response;
}
  1. Create Response DTO:

public class MyResponseDTO extends ImportAddResponseDTO {
    private Long createdId;
    // ... additional fields
}

10. Import Type: EventParticipant

10.1. Column Definitions

The EventParticipant import supports 32 columns:

Column Type Required

FIRST_NAME

STRING

Yes

LAST_NAME

STRING

Yes

TITLE

STRING

No

ID_NUMBER

STRING

No

ID_TYPE

STRING

No

ID_COUNTRY

STRING (FK)

No

GENDER

STRING

No

DOB

DATE

No

EMAIL

STRING

No

CONTACT_NUMBER

STRING

No

EVENT_CATEGORY_NAME

STRING (FK)

No

NUMBER

STRING

No

TAG_NUMBER

STRING

No

ORDER_ID

INTEGER

No

ORDER_STATUS

STRING

No

CUSTOM_1

STRING

No

CUSTOM_2

STRING

No

CUSTOM_3

STRING

No

CUSTOM_LIST_1

STRING (FK)

No

CUSTOM_LIST_2

STRING (FK)

No

CUSTOM_LIST_3

STRING (FK)

No

…​ and more

10.2. Foreign Key Fields

Fields marked (FK) require resolution to entity IDs:

  • ID_COUNTRYCountry.id

  • EVENT_CATEGORY_NAMEEventCategory.id

  • CUSTOM_LIST_*CustomListValue.id

11. Import Type: Membership

11.1. Column Definitions

The Membership import supports 30 columns with similar patterns to EventParticipant, plus membership-specific fields:

  • MEMBERSHIP_NUMBER

  • MEMBERSHIP_STATUS

  • VALID_FROM

  • VALID_UNTIL

  • MEMBERSHIP_TYPE_ID

12. Import Type: Result

12.1. Column Definitions

The Result import supports 12 columns focused on race results:

  • EVENT_PARTICIPANT_ID

  • FINISH_TIME

  • POSITION

  • POINTS

  • DISQUALIFIED

13. Error Handling

13.1. Row-Level Errors

Each row is processed independently. Errors on one row don’t prevent processing of subsequent rows:

try {
    // Process row
    return createSuccessResponse(id);
} catch (Exception e) {
    log.error("Error processing row {}: {}", rowNum, e.getMessage());
    return createErrorResponse(e.getMessage());
}

13.2. Common Error Scenarios

Scenario Handling

Required field missing

Row skipped, error response

Date format invalid

Logged, field set to null

FK value not found

Error response (async: queued for mapping)

Duplicate record

Depends on implementation (update or skip)

14. Performance Considerations

14.1. Memory Usage

  • Full workbook loaded for XLSX

  • Consider streaming API (SXSSFWorkbook) for very large files

  • CSV is more memory-efficient for large datasets

14.2. Database Queries

Current implementation has N+1 query patterns:

  • Organisation lookup per row

  • Person lookup per row

  • FK resolution per FK field per row

Optimization opportunities:

  • Cache FK lookups at import start

  • Batch database operations

  • Pre-load reference data