Skip to content

replaced lib submodule with actual source files#4

Open
Kwanddwo wants to merge 1 commit into
csvtoolkit:mainfrom
Kwanddwo:main
Open

replaced lib submodule with actual source files#4
Kwanddwo wants to merge 1 commit into
csvtoolkit:mainfrom
Kwanddwo:main

Conversation

@Kwanddwo

@Kwanddwo Kwanddwo commented Jun 19, 2026

Copy link
Copy Markdown

replaced the /lib submodule which points to FastCSV-C with the current source files from the directory.

This should solve this issue: #3

Summary by CodeRabbit

Release Notes

  • New Features

    • Added FastCSV-C library with CSV reading and writing capabilities, including support for multiple encodings, custom delimiters, and validation modes.
    • Added memory arena allocator for efficient memory management.
    • Added comprehensive CSV configuration system.
  • Documentation

    • Added library documentation, contributor guidelines, and license.
  • Tests

    • Added test suite covering all library components.
  • Chores

    • Added GitHub Actions workflows for continuous integration and automated releases.

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Removes the lib git submodule reference and inlines the complete FastCSV-C library source directly into the repository. The addition includes a bump-pointer arena allocator, CSV configuration/utilities/parser/reader/writer modules with full public C APIs, assert-based test suites for each module with a fork/exec runner, GitHub Actions CI and release workflows, and project documentation (README.md, CONTRIBUTING.md, LICENSE, .gitignore).

Changes

FastCSV-C Library Introduction

Layer / File(s) Summary
Git submodule removal and project scaffolding
.gitmodules, lib, lib/.gitignore, lib/LICENSE, lib/CONTRIBUTING.md, lib/README.md
Removes the lib submodule entry and subproject commit reference, populates .gitignore with build/test artifact patterns, adds MIT license, contributor guide, and full library README with API docs, usage examples, architecture diagram, and testing/performance notes.
Arena allocator contract and implementation
lib/arena.h, lib/arena.c
Declares Arena, ArenaResult, and ArenaRegion types with lifecycle/allocation/region-checkpoint API; implements bump-pointer allocation with 8-byte alignment, strdup, realloc, capacity queries, region rollback, and error string mapping.
CSV config and utils contracts and implementation
lib/csv_config.h, lib/csv_config.c, lib/csv_utils.h, lib/csv_utils.c
Defines CSVEncoding enum and CSVConfig struct with full getter/setter API backed by arena allocation; adds CSVUtilsResult enum and utility functions for whitespace trimming, CSV character validation, escaping detection, and error strings.
CSV parser contract and implementation
lib/csv_parser.h, lib/csv_parser.c
Declares ParseState, CSVParserResult, FieldArray, ParseContext, CSVParseResult, and CSVParser types; implements arena-backed field array management, a delimiter/enclosure state machine in csv_parse_line_inplace, and read_full_record for multi-line quoted record buffering.
CSV reader contract and implementation
lib/csv_reader.h, lib/csv_reader.c
Declares CSVRecord and CSVReader types; implements arena-owned and caller-arena constructors with optional header caching, next_record, rewind, seek, has_next, get_record_count, get_position, set_config, and free.
CSV writer contract and implementation
lib/csv_writer.h, lib/csv_writer.c
Declares CSVWriterResult, CSVWriter, and FieldWriteOptions types; implements BOM emission for UTF-8/16/32, path-based and existing-FILE* constructors with header writing, ordered and mapped record writing with per-field quoting/escaping, numeric detection, auto-flush, and cleanup.
Test suites and runner
lib/tests/README.md, lib/tests/run_all_tests.c, lib/tests/test_arena.c, lib/tests/test_csv_config.c, lib/tests/test_csv_parser.c, lib/tests/test_csv_reader.c, lib/tests/test_csv_utils.c, lib/tests/test_csv_writer.c
Adds a fork/exec test runner and six assert-based test programs covering arena lifecycle/alignment/regions, config defaults/setters/null-safety, parser state machine/escaping/multi-line records, reader iteration/seek/rewind/count, utility trimming/validation/escaping, and writer initialization/quoting/BOM/encoding/line endings.
CI and release workflows
lib/.github/workflows/ci.yml, lib/.github/workflows/release.yml
Adds a CI workflow with matrix (Ubuntu/macOS) test, memory-safety Valgrind, ARM cross-compile, and release-test jobs; adds a tag-triggered release workflow that builds, tests, packages source and binary distributions, and publishes a GitHub Release.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant csv_reader_init_standalone
  participant Arena as Arena (persistent + temp)
  participant csv_parser
  participant FILE

  Caller->>csv_reader_init_standalone: CSVConfig* (path, hasHeader, encoding...)
  csv_reader_init_standalone->>Arena: arena_create x2 (persistent, temp)
  csv_reader_init_standalone->>FILE: fopen(config->path)
  csv_reader_init_standalone->>csv_parser: read_full_record → csv_parse_line_inplace (cache headers)
  csv_reader_init_standalone-->>Caller: CSVReader*

  loop while csv_reader_has_next
    Caller->>csv_parser: csv_reader_next_record
    csv_parser->>Arena: arena_reset(temp_arena)
    csv_parser->>FILE: read_full_record
    csv_parser->>csv_parser: csv_parse_line_inplace (state machine)
    csv_parser->>Arena: alloc CSVRecord in temp_arena
    csv_parser-->>Caller: CSVRecord* (fields[])
  end

  Caller->>csv_reader_init_standalone: csv_reader_free
  csv_reader_init_standalone->>FILE: fclose
  csv_reader_init_standalone->>Arena: arena_destroy x2
Loading
sequenceDiagram
  participant Caller
  participant csv_writer_init
  participant Arena
  participant FILE

  Caller->>csv_writer_init: CSVConfig* (path, encoding, writeBOM, headers)
  csv_writer_init->>Arena: alloc CSVWriter + copy headers
  csv_writer_init->>FILE: fopen(config->path, "wb")
  csv_writer_init->>FILE: write_bom (if writeBOM)
  csv_writer_init->>FILE: write_headers (headers + newline)
  csv_writer_init-->>Caller: CSVWriter*

  Caller->>csv_writer_init: csv_writer_write_record(fields[])
  csv_writer_init->>FILE: write_field per field (quote if needed)
  csv_writer_init->>FILE: fwrite newline
  csv_writer_init->>FILE: fflush (if autoFlush)

  Caller->>csv_writer_init: csv_writer_free
  csv_writer_init->>FILE: fflush + fclose (if owns_file)
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • csvtoolkit/FastCSV-ext#1: Updates config.m4 to include lib/arena.c, lib/csv_parser.c, lib/csv_utils.c, and related sources as extension build inputs — directly referencing the same source files introduced by this PR.

Poem

🐇 A submodule once hid in the deep,
Now arena and parser awake from their sleep!
With bump-pointer magic and RFC 4180 in hand,
The rabbit inlines the whole CSV land.
read_full_record, next_record, fields galore —
No quoted newline shall trouble us more! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: removing the lib submodule entry and replacing it with actual source files from the FastCSV-C library.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🟠 Major comments (22)
lib/csv_utils.c-35-43 (1)

35-43: ⚠️ Potential issue | 🟠 Major

Replace strlen(start) with a bounded scan to respect max_len.

At line 35, strlen(start) reads unbounded and can scan past max_len if the input is not NUL-terminated within bounds. The overflow check at line 41 occurs too late to prevent the over-read. Use strnlen or equivalent to limit the scan to remaining buffer capacity.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/csv_utils.c` around lines 35 - 43, The unbounded strlen call at line 35
can read past the max_len boundary if the input is not NUL-terminated within
bounds, creating a buffer over-read vulnerability before the overflow check at
line 41 can prevent it. Replace the strlen(start) call with strnlen(start,
max_len) or equivalent bounded string length function to limit the scan to the
remaining buffer capacity, ensuring the length calculation respects the max_len
constraint before computing trimmed_len.
lib/csv_config.h-32-32 (1)

32-32: ⚠️ Potential issue | 🟠 Major

Fix limit field type to match API contract (int).

At Line 32, limit is declared as char while getter/setter signatures use int. The setter implicitly truncates values outside the char range (-128 to 127), and the getter performs sign extension back to int, causing silent data loss for any limit value > 127 or < -128.

Suggested fix
-    char limit;
+    int limit;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/csv_config.h` at line 32, The `limit` field declared in the csv_config.h
file is declared as `char` type but the corresponding getter and setter methods
use `int` type, creating a type mismatch that causes silent data loss when limit
values fall outside the char range. Change the `limit` field declaration from
`char` to `int` to match the API contract and prevent truncation and sign
extension issues when values exceed the char range limits of -128 to 127.
lib/csv_utils.c-74-79 (1)

74-79: ⚠️ Potential issue | 🟠 Major

Add NULL guard in trim_whitespace to prevent segfault.

The legacy trim_whitespace function at lines 74-92 dereferences str unconditionally at line 77 without checking for NULL. Passing NULL will cause a crash. The modern counterpart csv_utils_trim_whitespace already implements this guard at line 20. Add the same protection to the legacy function.

Suggested fix
 char* trim_whitespace(char *str) {
+    if (!str) return NULL;
     char *end;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/csv_utils.c` around lines 74 - 79, The trim_whitespace function
dereferences the str parameter without first checking if it is NULL, which will
cause a segmentation fault if NULL is passed. Add a NULL guard check at the
beginning of the trim_whitespace function that returns early (similar to the
protection already implemented in the modern csv_utils_trim_whitespace function)
before any pointer dereference operations occur.
lib/arena.c-155-166 (1)

155-166: ⚠️ Potential issue | 🟠 Major

Add validation for used_at_checkpoint consistency before restoring allocator state.

The checkpoint pointer is validated for bounds, but used_at_checkpoint is restored without verification. If an ArenaRegion struct is corrupted or mutated, an inconsistent used_at_checkpoint value will corrupt the allocator's bookkeeping without detection.

Suggested fix
 ArenaResult arena_restore_region(ArenaRegion *region) {
     if (!region || !region->arena) return ARENA_ERROR_NULL_POINTER;
     if (!region->checkpoint) return ARENA_ERROR_INVALID_SIZE;
     
     if (region->checkpoint < region->arena->memory || 
         region->checkpoint > region->arena->end) {
         return ARENA_ERROR_INVALID_SIZE;
     }
+    
+    size_t checkpoint_used = (size_t)(region->checkpoint - region->arena->memory);
+    if (checkpoint_used > region->arena->total_size) return ARENA_ERROR_INVALID_SIZE;
+    if (region->used_at_checkpoint != checkpoint_used) return ARENA_ERROR_INVALID_SIZE;
     
     region->arena->current = region->checkpoint;
     region->arena->used_size = region->used_at_checkpoint;
     return ARENA_OK;
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/arena.c` around lines 155 - 166, In the arena_restore_region function,
add validation for the used_at_checkpoint field before restoring the allocator
state. Similar to how region->checkpoint is validated to ensure it falls within
the valid memory bounds (between arena->memory and arena->end), you need to
validate that region->used_at_checkpoint is within a reasonable range to prevent
corruption of the allocator's bookkeeping. Add a bounds check after the
checkpoint validation that ensures used_at_checkpoint does not exceed the total
arena size or is otherwise inconsistent, and return an appropriate error code if
the validation fails.
lib/arena.c-68-73 (1)

68-73: ⚠️ Potential issue | 🟠 Major

Prevent size_t overflow in alignment computation at lines 68 and 134.

The alignment calculations (size + 7) & ~7 can overflow for sizes near SIZE_MAX. For example, if size = SIZE_MAX - 5, the addition wraps around and produces an incorrect aligned size of 0, allowing the OOM check to pass when it should fail. This bypasses capacity validation and can lead to undersized allocations or memory corruption.

Add overflow detection before alignment:

Suggested fix
+static bool align8_checked(size_t size, size_t *out) {
+    if (!out) return false;
+    if (size > SIZE_MAX - 7) return false;
+    *out = (size + 7u) & ~(size_t)7u;
+    return true;
+}
+
 ArenaResult arena_alloc(Arena *arena, size_t size, void **ptr) {
@@
-    size_t aligned_size = (size + 7) & ~7;
+    size_t aligned_size;
+    if (!align8_checked(size, &aligned_size)) {
+        *ptr = NULL;
+        return ARENA_ERROR_INVALID_SIZE;
+    }
@@
 bool arena_can_allocate(const Arena *arena, size_t size) {
@@
-    size_t aligned_size = (size + 7) & ~7;
+    size_t aligned_size;
+    if (!align8_checked(size, &aligned_size)) return false;
     return (arena->current + aligned_size <= arena->end);
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/arena.c` around lines 68 - 73, Add overflow detection before the
alignment calculation. Before computing aligned_size with the formula (size + 7)
& ~7, check if size exceeds SIZE_MAX - 7, which would cause the addition to
overflow and produce an incorrect wrapped-around result. If overflow would
occur, set ptr to NULL and return ARENA_ERROR_OUT_OF_MEMORY immediately without
proceeding to the bounds check against arena->current and arena->end. Apply this
same overflow detection at both locations where alignment occurs (the diff shows
line 68 and the comment mentions line 134) to prevent undersized allocations
from bypassing capacity validation.
lib/csv_parser.h-62-62 (1)

62-62: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

read_full_record cannot honor configurable enclosures.

Line 62 fixes quote handling to an API with no CSVConfig, but parsing supports configurable config->enclosure. This will mis-handle multiline records when enclosure is not ".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/csv_parser.h` at line 62, The read_full_record function signature does
not accept a CSVConfig parameter, which means it cannot access the configurable
enclosure character from config->enclosure needed for proper multiline record
parsing. Add a CSVConfig parameter to the read_full_record function signature so
the implementation can use the actual enclosure character instead of assuming it
is always a double quote. Update all callers of read_full_record to pass the
appropriate CSVConfig argument.
lib/csv_reader.c-117-133 (1)

117-133: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

skipEmptyLines is ignored by record iteration.

csv_reader_get_record_count honors config->skipEmptyLines (Lines 242-253), but csv_reader_next_record returns empty records directly. This makes iteration behavior inconsistent with counting and seek semantics.

Also applies to: 242-253

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/csv_reader.c` around lines 117 - 133, The csv_reader_next_record function
does not respect the skipEmptyLines configuration option, while
csv_reader_get_record_count does, creating inconsistent behavior between
iteration and counting. After parsing the line with csv_parse_line_inplace,
check if the parsed result is an empty record and if
reader->config->skipEmptyLines is true. If both conditions are met, continue
reading the next record instead of returning the empty one, ensuring the
iteration respects the same empty line skipping behavior as the record counting
logic.
lib/csv_parser.h-62-71 (1)

62-71: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Public declarations and implementation are out of sync.

This header exposes multiple parser APIs (parse_csv_line, parse_headers, csv_parser_count_fields_in_line, csv_parser_split_line_generic, csv_parser_init, csv_parser_free) that are not implemented in lib/csv_parser.c in this cohort, creating a broken cross-file contract for consumers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/csv_parser.h` around lines 62 - 71, Remove the function declarations from
the header file that do not have corresponding implementations in
lib/csv_parser.c. Specifically, remove declarations for parse_csv_line,
parse_headers, csv_parser_count_fields_in_line, csv_parser_split_line_generic,
csv_parser_init, and csv_parser_free if they are not implemented in the .c file.
This will ensure the public API contract matches the actual implementation and
prevents linker errors for consumers of this header.
lib/csv_reader.c-8-18 (1)

8-18: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Constructors need null/path validation before dereferencing.

csv_reader_init_with_config and csv_reader_init_standalone dereference config->path without guarding config->path (and the first also assumes non-null arenas). This can crash instead of failing gracefully.

Suggested fix
 CSVReader* csv_reader_init_with_config(Arena *persistent_arena, Arena *temp_arena, CSVConfig *config) {
+    if (!persistent_arena || !temp_arena || !config || !config->path) {
+        return NULL;
+    }
     void *ptr;
 CSVReader* csv_reader_init_standalone(CSVConfig *config) {
-    if (!config) {
+    if (!config || !config->path) {
         return NULL;
     }

Also applies to: 47-50, 81-82

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/csv_reader.c` around lines 8 - 18, Add null parameter validation at the
start of csv_reader_init_with_config before any dereferencing: check that
persistent_arena, temp_arena, config, and config->path are all non-null,
returning NULL if any validation fails. Apply the same validation pattern to
csv_reader_init_standalone function (which has similar issues at lines 47-50 and
81-82) to ensure config and config->path are validated before use.
lib/csv_reader.c-229-237 (1)

229-237: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Record counting leaks persistent arena capacity.

csv_reader_get_record_count repeatedly allocates lines in reader->persistent_arena during scanning and never reclaims that memory. Repeated calls can exhaust arena memory and break later reads.

Also applies to: 258-260

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/csv_reader.c` around lines 229 - 237, The csv_reader_get_record_count
function repeatedly allocates lines in reader->persistent_arena via
read_full_record without releasing the memory, exhausting arena capacity and
breaking subsequent reads. To fix this, use a temporary arena (separate from
reader->persistent_arena) when calling read_full_record during the record
counting loop in csv_reader_get_record_count. This ensures the persistent arena
capacity is preserved for actual record reads that occur after counting.
lib/csv_parser.c-204-218 (1)

204-218: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Final empty quoted field is lost.

When input ends in "" (or ,...,""), state is FIELD_END and field_len == 0, but the current guard skips appending the field. This drops valid empty quoted fields.

Suggested fix
-    if (field_len > 0 || state == FIELD_START) {
-        if (state == FIELD_END) {
-            if (!add_quoted_field(&result.fields, field_start, field_len, arena, config->enclosure)) {
-                result.success = false;
-                result.error = "Memory allocation failed";
-                return result;
-            }
-        } else {
-            if (!add_field(&result.fields, field_start, field_len, arena)) {
-                result.success = false;
-                result.error = "Memory allocation failed";
-                return result;
-            }
-        }
-    }
+    if (state == FIELD_END) {
+        if (!add_quoted_field(&result.fields, field_start, field_len, arena, config->enclosure)) {
+            result.success = false;
+            result.error = "Memory allocation failed";
+            return result;
+        }
+    } else if (field_len > 0 || state == FIELD_START) {
+        if (!add_field(&result.fields, field_start, field_len, arena)) {
+            result.success = false;
+            result.error = "Memory allocation failed";
+            return result;
+        }
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/csv_parser.c` around lines 204 - 218, The guard condition `if (field_len
> 0 || state == FIELD_START)` skips adding empty quoted fields at the end of
input because when state is FIELD_END and field_len equals 0, neither condition
is true. Modify the condition to also check for state == FIELD_END, allowing
empty quoted fields that occur at the end of input (following the enclosure
delimiter) to be properly appended to the results.
lib/tests/test_csv_writer.c-25-30 (1)

25-30: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not treat failed assertions as log-only outcomes.

test_csv_writer_init and test_csv_writer_write_record print failure messages but do not fail the test. This can let broken behavior pass CI.

Also applies to: 106-119

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/tests/test_csv_writer.c` around lines 25 - 30, The test functions
test_csv_writer_init and test_csv_writer_write_record print failure messages but
do not actually fail the test when assertions fail. Replace the printf
statements in the else blocks with proper test failure mechanisms such as
returning a failure status code from the test function or using a standard
assertion macro that terminates the test on failure. This should be applied to
both functions to ensure that failed conditions in csv_writer_init and
csv_writer_write_record operations actually cause the test to fail rather than
just logging a message.
lib/tests/test_csv_writer.c-71-87 (1)

71-87: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard tmpfile() and writer init results before dereferencing/IO.

Several tests use tmpfile() and csv_writer_init_with_file() results without validation, then call rewind/fread/fclose or dereference writer. If setup fails, this can null-deref or use uninitialized pointers.

Also applies to: 134-157, 170-195, 208-233, 245-269, 300-323, 352-378, 428-439, 454-479

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/tests/test_csv_writer.c` around lines 71 - 87, The test code assumes that
tmpfile() and csv_writer_init_with_file() will always succeed without validating
their results before use. Add a NULL check immediately after calling tmpfile()
to ensure the file pointer is valid before passing it to
csv_writer_init_with_file(). Additionally, check that
csv_writer_init_with_file() succeeds (result equals CSV_WRITER_OK) and that the
writer pointer is not NULL before dereferencing writer->file, writer->owns_file,
writer->header_count, or calling any subsequent functions on the writer. Apply
the same guards to all affected test cases referenced in the comment (lines
134-157, 170-195, 208-233, 245-269, 300-323, 352-378, 428-439, 454-479).

Source: Linters/SAST tools

lib/tests/run_all_tests.c-31-34 (1)

31-34: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Check waitpid result before inspecting status.

If waitpid fails (or is interrupted), status is undefined and WIFEXITED/WEXITSTATUS becomes invalid. Handle waitpid < 0 as a failed suite first.

Suggested fix
-        int status;
-        waitpid(pid, &status, 0);
+        int status;
+        if (waitpid(pid, &status, 0) < 0) {
+            perror("waitpid failed");
+            printf("❌ %s FAILED\n", suite->name);
+            return 1;
+        }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/tests/run_all_tests.c` around lines 31 - 34, The code checks the status
flags using WIFEXITED and WEXITSTATUS without first verifying that the waitpid
call succeeded. When waitpid returns -1 (indicating failure or interruption),
the status variable is undefined and these macros produce invalid results. Add a
check after the waitpid call to verify that it returned a non-negative value
before attempting to inspect the status with WIFEXITED and WEXITSTATUS. If
waitpid fails (returns -1), treat the test suite as failed instead of proceeding
to check the status flags.
lib/.github/workflows/release.yml-16-30 (1)

16-30: ⚠️ Potential issue | 🟠 Major

Release build/package commands must be scoped to the lib/ directory.

This workflow resides at lib/.github/workflows/release.yml but contains no working-directory directive. GitHub Actions defaults to the repository root, so commands like make clean, make test, and cp *.h *.c Makefile ... will fail—they will attempt to find files in the root (where fastcsv.c belongs to the PHP extension) instead of in lib/ (where the standalone C library files reside).

Add working-directory: lib to the job or prefix each step's paths with lib/.

This applies to lines 16–30 and 32–36.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/.github/workflows/release.yml` around lines 16 - 30, The release workflow
in lib/.github/workflows/release.yml lacks proper directory scoping and will
execute commands from the repository root instead of the lib/ directory. Add a
working-directory directive set to lib at the job level (before the steps), or
alternatively prefix the paths in each step (lines 16-36) such as the make
commands in the "Build release artifacts" and "Run tests" steps, and the mkdir,
cp, and tar commands in the "Create source distribution" step to reference lib/
appropriately so that they operate on the correct library files and directories.
lib/.github/workflows/ci.yml-1-3 (1)

1-3: ⚠️ Potential issue | 🟠 Major

Move workflow to repository root .github/workflows/ for automatic execution.

The workflow at lib/.github/workflows/ci.yml will not be automatically discovered or executed by GitHub Actions—they only search the repository-root .github/workflows/ directory. Move this workflow to .github/workflows/ci.yml so its comprehensive C library tests (build, valgrind, cross-compilation, memory safety) actually run on push and pull requests.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/.github/workflows/ci.yml` around lines 1 - 3, The CI workflow is
currently located in lib/.github/workflows/ci.yml, but GitHub Actions only
discovers and executes workflows from the repository-root .github/workflows/
directory. Move the entire ci.yml workflow file from its current location in the
lib subdirectory to the root-level .github/workflows/ci.yml so that GitHub
Actions can automatically detect and run the workflow on push and pull request
events.
lib/.github/workflows/ci.yml-40-44 (1)

40-44: ⚠️ Potential issue | 🟠 Major

Workflow steps will fail without setting the correct working directory.

The workflow at lib/.github/workflows/ci.yml executes make and other commands from the repository root, but the Makefile, source files, and generated artifacts are in the lib/ subdirectory. Add defaults.run.working-directory: lib to each job or prefix all paths with lib/.

Suggested fix
jobs:
  test:
    name: Test on ${{ matrix.os }}
    runs-on: ${{ matrix.os }}
+   defaults:
+     run:
+       working-directory: lib
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest]

  memory-safety:
    name: Memory Safety Tests
    runs-on: ubuntu-latest
+   defaults:
+     run:
+       working-directory: lib

  cross-compile:
    name: Cross Compilation Test
    runs-on: ubuntu-latest
+   defaults:
+     run:
+       working-directory: lib

  release-test:
    name: Release Build Test
    runs-on: ubuntu-latest
+   defaults:
+     run:
+       working-directory: lib
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/.github/workflows/ci.yml` around lines 40 - 44, The "Build library" step
executes make commands from the repository root, but the Makefile is located in
the lib/ subdirectory. Fix this by adding defaults.run.working-directory: lib at
the job level in the CI workflow, which will ensure all run steps in that job
execute from the lib/ directory. Alternatively, you can prefix the make commands
in the run step with cd lib && before executing the make clean and make
commands.
lib/.github/workflows/release.yml-8-11 (1)

8-11: ⚠️ Potential issue | 🟠 Major

Add explicit token permissions block for release creation.

The softprops/action-gh-release@v1 action requires contents: write permission. Without an explicit permissions block, the workflow relies on repository/organization default token settings and will fail if the repo restricts default token permissions to read-only.

Suggested fix
 name: Release
 
 on:
   push:
     tags:
       - 'v*'
+
+permissions:
+  contents: write
 
 jobs:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/.github/workflows/release.yml` around lines 8 - 11, Add an explicit
permissions block to the create-release job that grants write access to
repository contents. After the `runs-on: ubuntu-latest` line in the
create-release job, add a `permissions` section that explicitly sets `contents:
write` to allow the softprops/action-gh-release@v1 action to create releases
successfully, preventing failures due to restrictive default token permissions.
lib/csv_writer.c-245-262 (1)

245-262: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

escape configuration is ignored during quote escaping.

write_field always doubles the enclosure and never uses options->escape, so configured escape behavior is not honored.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/csv_writer.c` around lines 245 - 262, The write_field function always
doubles the enclosure character when escaping, ignoring the configured
options->escape value. Modify the enclosure escaping logic within the character
iteration loop to check if options->escape is configured, and if so, write the
escape character followed by the enclosure character instead of always doubling
the enclosure; otherwise, fall back to doubling the enclosure as the default
behavior.
lib/csv_writer.c-26-63 (1)

26-63: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Non-UTF8 encodings currently write invalid payload bytes.

For UTF-16/UTF-32, only BOM is written; record data is still emitted as single-byte chars. That produces malformed files under those encodings.

Suggested safety fix (until transcoding is implemented)
     switch (encoding) {
         case CSV_ENCODING_UTF8:
             bom = UTF8_BOM;
             bom_size = sizeof(UTF8_BOM);
             break;
-        case CSV_ENCODING_UTF16LE:
-            bom = UTF16LE_BOM;
-            bom_size = sizeof(UTF16LE_BOM);
-            break;
-        case CSV_ENCODING_UTF16BE:
-            bom = UTF16BE_BOM;
-            bom_size = sizeof(UTF16BE_BOM);
-            break;
-        case CSV_ENCODING_UTF32LE:
-            bom = UTF32LE_BOM;
-            bom_size = sizeof(UTF32LE_BOM);
-            break;
-        case CSV_ENCODING_UTF32BE:
-            bom = UTF32BE_BOM;
-            bom_size = sizeof(UTF32BE_BOM);
-            break;
+        case CSV_ENCODING_UTF16LE:
+        case CSV_ENCODING_UTF16BE:
+        case CSV_ENCODING_UTF32LE:
+        case CSV_ENCODING_UTF32BE:
+            return CSV_WRITER_ERROR_ENCODING;

Also applies to: 245-267

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/csv_writer.c` around lines 26 - 63, The write_bom function correctly
writes the BOM for non-UTF8 encodings, but record data is still being written as
single-byte characters elsewhere in the code, creating malformed files. Locate
where record data is being written (around the lines referenced in the "Also
applies to" section, approximately lines 245-267) and add a check that returns
an error or prevents writing when using UTF-16 or UTF-32 encodings until full
transcoding support is implemented. This safety fix should ensure the function
fails gracefully rather than producing invalid output.
lib/csv_writer.c-307-340 (1)

307-340: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate record width when headers are present.

csv_writer_write_record should reject rows whose field_count differs from writer->header_count; otherwise header-aligned CSV contracts are silently broken.

Suggested fix
 CSVWriterResult csv_writer_write_record(CSVWriter *writer, char **fields, int field_count) {
     if (!writer || !fields || field_count <= 0) {
         return CSV_WRITER_ERROR_NULL_POINTER;
     }
+    if (writer->header_count > 0 && field_count != writer->header_count) {
+        return CSV_WRITER_ERROR_INVALID_FIELD_COUNT;
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/csv_writer.c` around lines 307 - 340, The csv_writer_write_record
function does not validate that the number of fields being written matches the
number of headers defined in the writer. Add a validation check after the
existing null pointer checks to compare field_count against
writer->header_count, and return an appropriate error code if they do not match
(such as CSV_WRITER_ERROR_FIELD_COUNT_MISMATCH or similar). This validation
should only apply when headers have been configured, so check if
writer->header_count is greater than zero before enforcing the constraint.
lib/csv_writer.c-342-367 (1)

342-367: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing mapped headers are silently emitted as empty fields.

When a header name is not found in field_names, the function writes an empty field and returns success. That hides mapping errors despite CSV_WRITER_ERROR_FIELD_NOT_FOUND existing.

Suggested fix
-    char *ordered_fields[MAX_FIELDS];
+    char *ordered_fields[MAX_FIELDS];
+    bool matched[MAX_FIELDS];

     for (int i = 0; i < writer->header_count; i++) {
         ordered_fields[i] = NULL;
+        matched[i] = false;
     }

     for (int i = 0; i < field_count; i++) {
         if (!field_names[i]) continue;

         for (int j = 0; j < writer->header_count; j++) {
             if (writer->headers[j] && strcmp(field_names[i], writer->headers[j]) == 0) {
                 ordered_fields[j] = field_values[i];
+                matched[j] = true;
                 break;
             }
         }
     }
+
+    for (int j = 0; j < writer->header_count; j++) {
+        if (!matched[j]) return CSV_WRITER_ERROR_FIELD_NOT_FOUND;
+    }

     return csv_writer_write_record(writer, ordered_fields, writer->header_count);
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/csv_writer.c` around lines 342 - 367, In the csv_writer_write_record_map
function, headers that aren't found in the provided field_names are silently
written as empty fields instead of being detected as errors. After the nested
loops that populate the ordered_fields array, add validation to check if any
required header position contains NULL (meaning it wasn't found in the input
field_names). If any header is missing, return CSV_WRITER_ERROR_FIELD_NOT_FOUND
instead of proceeding to call csv_writer_write_record, ensuring mapping errors
are properly reported rather than silently skipped.
🟡 Minor comments (8)
lib/README.md-9-9 (1)

9-9: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Resolve language version discrepancy between README and CONTRIBUTING.

The README claims C99 compatibility (badge on line 9 and guidance on line 587), but CONTRIBUTING.md explicitly specifies "C89/ANSI C" (line 44). These should be consistent. If the library is truly C89/ANSI compatible for maximum portability, update the README badges and guidance accordingly.

🔧 Suggested fix (if actual target is C89/ANSI C)

README lines 9 and 587 should be updated to reference C89/ANSI C instead of C99:

-[![C99](https://img.shields.io/badge/C-99-blue.svg)](https://en.wikipedia.org/wiki/C99)
+[![C89](https://img.shields.io/badge/C-89%2FANSI-blue.svg)](https://en.wikipedia.org/wiki/C89)
 ### Code Style
 
-- Follow C99 standard
+- Follow C89/ANSI C standard for maximum compatibility

Also applies to: 587-587

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/README.md` at line 9, Update the README.md file to align the C language
version with the CONTRIBUTING.md specification. Replace the C99 references in
two locations with C89/ANSI C for consistency. First, update the badge on line 9
that currently displays the C99 version indicator to show C89/ANSI C instead.
Second, update the C language guidance text on line 587 to reference C89/ANSI C
rather than C99. This ensures the documentation is consistent across both the
README and CONTRIBUTING files regarding the minimum supported C language
standard.
lib/csv_config.c-80-82 (1)

80-82: ⚠️ Potential issue | 🟡 Minor

Align NULL fallback with configured default for strict mode.

At line 81, csv_config_get_strict_mode returns true when config is NULL, but csv_config_create initializes strictMode to false (line 17). This inconsistency causes NULL config and valid config with default settings to return different values.

Suggested fix
 bool csv_config_get_strict_mode(const CSVConfig *config) {
-    return config ? config->strictMode : true;
+    return config ? config->strictMode : false;
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/csv_config.c` around lines 80 - 82, The function
csv_config_get_strict_mode returns true as the default fallback value when
config is NULL, but this contradicts the initialization in csv_config_create
which sets strictMode to false as the default. Update the return statement in
csv_config_get_strict_mode to return false instead of true in the ternary
operator so that the NULL fallback behavior matches the default initialization
behavior established in csv_config_create.
lib/csv_utils.c-58-60 (1)

58-60: ⚠️ Potential issue | 🟡 Minor

Add NUL validation for escape character in CSV char validation.

The csv_utils_validate_csv_chars function rejects delimiter and enclosure when equal to '\0', but does not validate escape. For consistency and functional correctness, the escape character should also be rejected as NUL since it would be invalid in the escaping contract.

Suggested fix
-    if (delimiter == '\0' || enclosure == '\0') {
+    if (delimiter == '\0' || enclosure == '\0' || escape == '\0') {
         return CSV_UTILS_ERROR_INVALID_INPUT;
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/csv_utils.c` around lines 58 - 60, The csv_utils_validate_csv_chars
function currently validates that delimiter and enclosure parameters are not
equal to the NUL character ('\0'), but it does not perform the same validation
for the escape parameter. Add a check to the conditional statement to also
reject the escape character when it equals '\0', ensuring consistency with the
validation of the other character parameters and preventing invalid escape
character usage in the CSV handling contract.
lib/tests/test_csv_parser.c-166-169 (1)

166-169: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fail the test when tmpfile() cannot be created.

Returning early here marks the suite as passing while skipping assertions. Convert this to an assertion (or explicit non-zero exit path).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/tests/test_csv_parser.c` around lines 166 - 169, The test is currently
returning early without failing when tmpfile() cannot be created, which causes
the test suite to pass while skipping all subsequent assertions. Replace the
early return statement in the test_file creation failure case with a proper test
failure mechanism such as an assertion (for example, using assert or a similar
testing framework macro) or an explicit non-zero exit path, ensuring that the
test correctly fails when tmpfile() returns NULL instead of silently passing.
lib/tests/README.md-113-147 (1)

113-147: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers to fenced code blocks.

The code fences at Line 113 and Line 137 are missing a language tag (text is fine), which trips markdownlint (MD040).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/tests/README.md` around lines 113 - 147, The fenced code blocks in the
README.md file are missing language identifiers which causes markdownlint to
fail (MD040 rule). Add a language tag after the opening triple backticks for
both code fences - one at the beginning of the "CSV Library Test Suite Runner"
output block and another at the beginning of the "Failed Test Example" output
block. Using ```text as the language identifier is appropriate for these example
outputs.

Source: Linters/SAST tools

lib/tests/test_csv_reader.c-9-15 (1)

9-15: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make file-creation failures explicit in test helper.

create_test_csv_file currently no-ops when fopen fails, so tests continue with invalid setup. Return status (or assert) so each test fails at setup time.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/tests/test_csv_reader.c` around lines 9 - 15, The create_test_csv_file
function silently ignores file creation failures when fopen returns NULL,
allowing tests to proceed with invalid setup. Modify the function to explicitly
handle the failure case by either returning a boolean/status code to indicate
success or failure (so callers can assert on setup), or by asserting/failing the
test directly within the function when file creation fails. This ensures tests
fail at setup time rather than producing confusing subsequent failures.
lib/tests/README.md-7-13 (1)

7-13: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align documented test counts across sections.

The per-file counts in “Test Files” don’t match the counts in “Test Coverage” (for example CSV config/utils/parser/reader), so the docs currently conflict.

Also applies to: 65-100

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/tests/README.md` around lines 7 - 13, The test function counts documented
in the "Test Files" section do not match the counts listed in the "Test
Coverage" section (lines 65-100), creating conflicting information in the
README. Verify the actual number of test functions in each test file
(test_arena.c, test_csv_config.c, test_csv_utils.c, test_csv_parser.c,
test_csv_writer.c, and test_csv_reader.c) and update the function counts in both
the "Test Files" section and the "Test Coverage" section to match the actual
counts across all referenced sections of the document.
lib/tests/run_all_tests.c-27-29 (1)

27-29: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use _exit(1) in the child on execl failure.

After fork(), calling exit() in the child can flush parent-inherited stdio buffers; _exit() avoids that.

Suggested fix
-        exit(1);
+        _exit(1);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/tests/run_all_tests.c` around lines 27 - 29, In the child process error
handling after execl failure, replace the call to exit(1) with _exit(1). The
perror call for execl failure occurs in the child process branch after fork(),
and using _exit() instead of exit() prevents unwanted flushing of stdio buffers
that were inherited from the parent process. Locate the line with exit(1)
following the perror("execl failed") statement and change it to _exit(1).
🧹 Nitpick comments (3)
lib/README.md (1)

381-381: ⚡ Quick win

Add language specifications to fenced code blocks.

Three code blocks lack language identifiers, which affects readability and syntax highlighting:

  • Line 381: Test results summary (should be bash)
  • Line 425: Valgrind output (should be text)
  • Line 540: Architecture diagram (should be text)
🔧 Proposed fix

Line 381 - wrap test results in bash code block:

 ### Test Results Summary
 
-```
+```bash
 ✅ Arena Tests: 13/13 passed

Line 425 - wrap Valgrind output in text code block:

 **Validated with Valgrind:**
 
-```
+```text
 ✅ Zero memory leaks

Line 540 - wrap architecture diagram in text code block:

 ## 🏗️ Architecture
 
-```
+```text
 ┌─────────────────┐    ┌─────────────────┐

Also applies to: 425-425, 540-540

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/README.md` at line 381, Add language identifiers to three fenced code
blocks in the README.md file to enable proper syntax highlighting. At line 381,
add bash after the opening triple backticks for the test results summary. At
line 425, add text after the opening triple backticks for the Valgrind output
section. At line 540, add text after the opening triple backticks for the
architecture diagram. This improves readability and enables the appropriate
syntax highlighting for each section.

Source: Linters/SAST tools

lib/CONTRIBUTING.md (1)

8-8: 💤 Low value

Consider using more formal language in commit message example.

The example commit message 'Add amazing feature' uses informal language. For a contributing guide, a more professional phrasing like 'Add feature description' would better model expected commit message conventions.

✏️ Suggested revision
-2. Create a feature branch (`git checkout -b feature/amazing-feature`)
+2. Create a feature branch (`git checkout -b feature/your-feature-name`)
 3. Make your changes
 4. Run tests
-5. Commit your changes (`git commit -m 'Add amazing feature'`)
+5. Commit your changes (`git commit -m 'Add feature description'`)
 6. Push to the branch (`git push origin feature/amazing-feature`)

Also applies to: 11-11

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/CONTRIBUTING.md` at line 8, Update the commit message examples in the
CONTRIBUTING.md file to use more formal and professional language. Replace
informal phrasing like "Add amazing feature" with more professional alternatives
such as "Add feature description" that better model expected commit message
conventions. Ensure all commit message examples throughout the contributing
guide follow this more formal tone to set appropriate expectations for
contributors.
lib/.github/workflows/release.yml (1)

14-15: ⚡ Quick win

Pin GitHub Actions to immutable SHAs.

Using floating tags (@v4, @v1) is a supply-chain risk. Pin actions to commit SHAs for reproducible and safer releases.

Also applies to: 39-40

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/.github/workflows/release.yml` around lines 14 - 15, Replace the floating
version tags in GitHub Actions with immutable commit SHAs. For the
actions/checkout@v4 usage on line 14, and any other Actions using floating tags
(also applies to lines 39-40), replace the `@v4`, `@v1`, etc. tags with their
corresponding full commit SHAs to ensure reproducible and secure builds. This
prevents unauthorized updates to actions that could introduce supply chain
vulnerabilities.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: aa590c5c-b93d-42aa-9cf9-5dfb0b7d1c08

📥 Commits

Reviewing files that changed from the base of the PR and between f0b046f and 5d88902.

📒 Files selected for processing (28)
  • .gitmodules
  • lib
  • lib/.github/workflows/ci.yml
  • lib/.github/workflows/release.yml
  • lib/.gitignore
  • lib/CONTRIBUTING.md
  • lib/LICENSE
  • lib/README.md
  • lib/arena.c
  • lib/arena.h
  • lib/csv_config.c
  • lib/csv_config.h
  • lib/csv_parser.c
  • lib/csv_parser.h
  • lib/csv_reader.c
  • lib/csv_reader.h
  • lib/csv_utils.c
  • lib/csv_utils.h
  • lib/csv_writer.c
  • lib/csv_writer.h
  • lib/tests/README.md
  • lib/tests/run_all_tests.c
  • lib/tests/test_arena.c
  • lib/tests/test_csv_config.c
  • lib/tests/test_csv_parser.c
  • lib/tests/test_csv_reader.c
  • lib/tests/test_csv_utils.c
  • lib/tests/test_csv_writer.c
💤 Files with no reviewable changes (2)
  • .gitmodules
  • lib

Comment thread lib/csv_parser.c
Comment on lines +241 to +252
if (record_len >= record_capacity - 1) {
size_t new_capacity = record_capacity * 2;
void *new_ptr;
ArenaResult grow_result = arena_alloc(arena, new_capacity, &new_ptr);
if (grow_result != ARENA_OK) {
return NULL;
}
char *new_record = (char*)new_ptr;
memcpy(new_record, record, record_len);
record = new_record;
record_capacity = new_capacity;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Escaped-quote branch can write past buffer end.

Line 241 only ensures space for one byte, but the next_c == '"' path appends two bytes (Lines 258-259). Near capacity, this can overflow and corrupt memory.

Suggested fix
                 if (next_c == '"') {
+                    if (record_len >= record_capacity - 2) {
+                        size_t new_capacity = record_capacity * 2;
+                        void *new_ptr;
+                        ArenaResult grow_result = arena_alloc(arena, new_capacity, &new_ptr);
+                        if (grow_result != ARENA_OK) {
+                            return NULL;
+                        }
+                        char *new_record = (char*)new_ptr;
+                        memcpy(new_record, record, record_len);
+                        record = new_record;
+                        record_capacity = new_capacity;
+                    }
                     record[record_len++] = '"';
                     record[record_len++] = '"';

Also applies to: 256-260

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/csv_parser.c` around lines 241 - 252, The buffer capacity check at the
beginning of the growth condition only reserves space for one additional byte,
but the escaped-quote handling path (referenced around lines 258-259) writes two
bytes to the record buffer. When the buffer is near capacity, this can cause a
buffer overflow. Update the capacity check from `record_len >= record_capacity -
1` to `record_len >= record_capacity - 2` to ensure sufficient space is reserved
for both the single-byte case and the two-byte escaped-quote case before any
writes occur.

Comment thread lib/csv_writer.c
Comment on lines +87 to +107
static CSVWriterResult copy_headers_to_arena(CSVWriter *writer, char **headers, int header_count) {
if (header_count <= 0) {
writer->headers = NULL;
writer->header_count = 0;
return CSV_WRITER_OK;
}

void *ptr;
ArenaResult result = arena_alloc(writer->arena, header_count * sizeof(char*), &ptr);
if (result != ARENA_OK) return CSV_WRITER_ERROR_MEMORY_ALLOCATION;

writer->headers = (char**)ptr;

for (int i = 0; i < header_count; i++) {
if (!headers[i]) {
writer->headers[i] = arena_strdup(writer->arena, "");
} else {
writer->headers[i] = arena_strdup(writer->arena, headers[i]);
}
if (!writer->headers[i]) return CSV_WRITER_ERROR_MEMORY_ALLOCATION;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Guard headers before dereferencing in header copy.

With header_count > 0 and headers == NULL, Line 101 dereferences a null pointer and crashes.

Suggested fix
 static CSVWriterResult copy_headers_to_arena(CSVWriter *writer, char **headers, int header_count) {
     if (header_count <= 0) {
         writer->headers = NULL;
         writer->header_count = 0;
         return CSV_WRITER_OK;
     }
+    if (!headers) {
+        return CSV_WRITER_ERROR_NULL_POINTER;
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/csv_writer.c` around lines 87 - 107, In the copy_headers_to_arena
function, add a null pointer guard for the `headers` parameter before the for
loop that iterates through and dereferences `headers[i]`. The function currently
only checks if `header_count <= 0` but does not validate that the `headers`
pointer itself is not null. If `header_count > 0` and `headers` is null, the
loop will attempt to dereference a null pointer causing a crash. Add a check
after the header_count validation and before accessing the headers array to
return an error code if `headers` is null when header_count is greater than
zero.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant