fix(csv): escape pipes and newlines in CSV cells #1816

Open
Bojun-Vvibe wants to merge 1 commit into microsoft:main from Bojun-Vvibe:agent/microsoft_markitdown-1776829197

@Bojun-Vvibe

Repo: microsoft/markitdown (⭐ 113363)
Type: bugfix
Files changed: 1
Lines: +11/-5

What

The CSV converter builds a GitHub-flavored markdown table by joining raw CSV cell values with |. When a cell contains an unescaped | character or an embedded newline (both legal in quoted CSV fields), the emitted markdown table breaks: columns shift, rows split, and downstream markdown renderers misalign the table. This change introduces an escape_cell helper that escapes backslashes and pipes, and collapses CR/LF sequences to spaces, before the values are joined into the table.
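The escaping described above can be sketched as follows. This is a minimal illustration, not the code in the PR: only the name escape_cell comes from the description, while csv_to_markdown, the regular expression, and the choice to collapse each CR/LF run to a single space are assumptions made for the example.

```python
import csv
import io
import re


def escape_cell(value: str) -> str:
    """Make one CSV cell safe to place inside a markdown table row."""
    # Escape backslashes first; otherwise the backslash added in front of
    # each pipe on the next line would itself be doubled.
    value = value.replace("\\", "\\\\")
    value = value.replace("|", "\\|")
    # Assumption: every run of CR/LF characters becomes one space.
    return re.sub(r"[\r\n]+", " ", value)


def csv_to_markdown(text: str) -> str:
    """Hypothetical stand-in for the converter: CSV text to a markdown table."""
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return ""
    header, *body = rows
    lines = ["| " + " | ".join(escape_cell(c) for c in header) + " |"]
    lines.append("| " + " | ".join("---" for _ in header) + " |")
    for row in body:
        # Header and data rows go through the same helper.
        lines.append("| " + " | ".join(escape_cell(c) for c in row) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(csv_to_markdown('notes\n"a|b"\n"line1\nline2"\n'))
    # | notes |
    # | --- |
    # | a\|b |
    # | line1 line2 |
```

Escaping backslashes before pipes matters: in the reverse order, a cell such as a|b would first become a\|b and then a\\|b, which renders as a literal backslash followed by an unescaped column separator.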

Why

CSV files frequently contain free-text cells with punctuation or multi-line values (addresses, descriptions, log entries). Producing malformed markdown for them is a silent correctness bug: the conversion succeeds, and users won't notice the damage until they render or parse the markdown downstream. The fix is local to the converter, handles header and data rows symmetrically, and follows the standard markdown table escaping convention (\|).

Testing

  • Manual: a single-column CSV with a value a|b previously produced | a|b | (2 visible columns); now produces | a\|b | (1 column, as intended).
  • A cell containing "line1\nline2" previously split the row in two; it now renders as line1 line2 on a single row.
  • No existing tests reference CSV pipe/newline escaping. Output for well-formed CSVs without these characters is unchanged.
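The column counts quoted in the manual checks can be reproduced with a small scanner. This is a hedged sketch: count_cells is a hypothetical helper, not part of markitdown, and it assumes each row starts and ends with a pipe, as in the examples above.

```python
def count_cells(row: str) -> int:
    """Count the cells a markdown renderer would see in one table row.

    Assumes the row has a leading and a trailing pipe, so the number of
    cells is the number of unescaped pipes minus one.
    """
    pipes = 0
    i = 0
    while i < len(row):
        if row[i] == "\\":
            # A backslash escapes the next character; skip both.
            i += 2
            continue
        if row[i] == "|":
            pipes += 1
        i += 1
    return max(pipes - 1, 0)


if __name__ == "__main__":
    print(count_cells("| a|b |"))     # 2: the unescaped pipe splits the cell
    print(count_cells("| a\\|b |"))   # 1: the escaped pipe stays inside the cell
```

A check like this could back an automated regression test, since the PR currently lists only manual verification.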

Risk

Low. The change is confined to one converter's output formatting; the escaping follows the standard markdown convention and only takes effect on characters that would otherwise corrupt the table.

Bojun-Vvibe marked this pull request as ready for review April 24, 2026 15:24
@Bojun-Vvibe
Author

@microsoft-github-policy-service agree company="Microsoft"

Bojun-Vvibe force-pushed the agent/microsoft_markitdown-1776829197 branch from 931cc39 to f4cc557 on May 26, 2026 00:11
Bojun-Vvibe force-pushed the agent/microsoft_markitdown-1776829197 branch from f4cc557 to 8c89629 on May 27, 2026 00:41
2 participants