Objective
Establish a standardized folder structure and file naming convention for new data ingest processes, ensuring compatibility with the latest release schema and efficient storage/validation practices.
Requirements
- Create a new ingest folder in the repository.
- Within the ingest folder, create a subfolder for each data provider.
- All ingests must support the latest release schema.
- When a provider's total data size exceeds ~25 MB, split the data across multiple files of at most ~25 MB each.
- Do not split records between files: each file must contain only complete records so that validation can be performed independently.
- All data files are to be formatted as JSON lists (enclosed in brackets). Consider https://jsonlines.org/ as an alternative approach if more appropriate for downstream usage.
- File naming convention:
<data provider>_<5-digit zero-padded sequence number>.json (e.g., emsl_00001.json).
- Future: explore the JSON Lines format.
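
As a rough illustration of the splitting and naming requirements above, here is a minimal Python sketch. It is not an agreed implementation: the function names (`file_name`, `chunk_records`, `write_ingest`), the `ingest` root folder name, and the reading of "~25 MB" as 25 * 1024 * 1024 bytes are all assumptions made for the example. It groups whole records into chunks, writes each chunk as a JSON list, and names the files `<data provider>_<5-digit number>.json` inside a per-provider subfolder.

```python
import json
from pathlib import Path

# Assumption: "~25 MB" is read as 25 MiB; adjust to whatever the team agrees on.
MAX_BYTES = 25 * 1024 * 1024


def file_name(provider: str, index: int) -> str:
    """Build a name such as emsl_00001.json (hypothetical helper)."""
    return f"{provider}_{index:05d}.json"


def chunk_records(records, max_bytes=MAX_BYTES):
    """Yield lists of complete records, each sized to fit max_bytes as a JSON list.

    A record is never split. A single record larger than max_bytes still
    gets its own chunk, so that file will exceed the target size.
    """
    chunk, size = [], 2  # 2 bytes for the enclosing "[" and "]"
    for record in records:
        encoded = len(json.dumps(record).encode("utf-8"))
        extra = encoded + (2 if chunk else 0)  # ", " separator between items
        if chunk and size + extra > max_bytes:
            yield chunk
            chunk, size = [], 2
            extra = encoded
        chunk.append(record)
        size += extra
    if chunk:
        yield chunk


def write_ingest(records, provider, ingest_root="ingest", max_bytes=MAX_BYTES):
    """Write records to <ingest_root>/<provider>/<provider>_NNNNN.json files."""
    out_dir = Path(ingest_root) / provider
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunk_records(records, max_bytes), start=1):
        path = out_dir / file_name(provider, i)
        with path.open("w", encoding="utf-8") as fh:
            json.dump(chunk, fh)  # one JSON list per file
        paths.append(path)
    return paths
```

Because every file is a self-contained JSON list of complete records, each one can be loaded and validated against the release schema on its own. The schema validation step itself is left out of the sketch.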
Acceptance Criteria