fix(zip): skip directory entries in ZipConverter by nileshpatil6 · Pull Request #1915 · microsoft/markitdown

nileshpatil6 · 2026-05-27T17:31:29Z

What

When a ZIP archive contains explicit directory entries (e.g. created with zipfile.ZipFile.mkdir() or tools like Info-ZIP), zipObj.namelist() returns those entries alongside actual files.

Before this fix, ZipConverter passed every name to zipObj.read(), which returns an empty bytes object for a directory entry. That empty stream was then handed to convert_stream like any other file, so the output contained a spurious ## File: subdir/ heading with no content under it.
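The behaviour described above can be checked with the standard library alone. This is a minimal sketch, not code from the PR: the helper name make_zip_with_directory_entry is hypothetical, and the directory entry is written with writestr('subdir/', '') so the snippet also runs on Python versions older than 3.11, where ZipFile.mkdir() is not available.

```python
import io
import zipfile


def make_zip_with_directory_entry() -> bytes:
    """Build an in-memory archive holding one directory entry and one file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("subdir/", "")               # explicit directory entry
        zf.writestr("subdir/note.txt", "hello")  # regular file
    return buf.getvalue()


with zipfile.ZipFile(io.BytesIO(make_zip_with_directory_entry())) as zf:
    names = zf.namelist()            # directory entries are listed too
    dir_payload = zf.read("subdir/")  # reading a directory entry yields no data
    dir_flags = {name: zf.getinfo(name).is_dir() for name in names}

print(names)
print(dir_payload)
print(dir_flags)
```

The directory entry shows up in namelist() and reads back as empty bytes, which is the input that led to the empty heading.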

Fix

Add an is_dir() guard before reading and converting each entry:

if zipObj.getinfo(name).is_dir():
    continue
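To show the guard in context, here is a standalone sketch of a loop that applies it. It is an illustration under assumptions, not the actual ZipConverter code: iter_file_entries is a hypothetical helper, and the real converter passes each entry to a conversion step rather than yielding raw bytes.

```python
import io
import zipfile
from typing import Iterator, Tuple


def iter_file_entries(zf: zipfile.ZipFile) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, content) for every entry that is not a directory."""
    for name in zf.namelist():
        if zf.getinfo(name).is_dir():
            continue  # same guard as in the fix: skip directory entries
        yield name, zf.read(name)


# Sample archive: a directory entry, a normal file, and a zero-byte file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docs/", "")
    zf.writestr("docs/readme.txt", "hi")
    zf.writestr("empty.txt", "")
buf.seek(0)

with zipfile.ZipFile(buf) as zf:
    entries = list(iter_file_entries(zf))

print(entries)
```

Checking is_dir() instead of testing for empty content means a genuine zero-byte file such as empty.txt is still processed, and only directory entries are dropped.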

Reproduction

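import io
import zipfile

from markitdown import MarkItDown, StreamInfo

# Build an in-memory archive that contains an explicit directory entry.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as z:
    z.mkdir('subdir')  # ZipFile.mkdir() requires Python 3.11+
    z.writestr('subdir/note.txt', 'hello')
buf.seek(0)

md = MarkItDown()
# Hint the format through stream_info so the stream is treated as a ZIP archive.
result = md.convert_stream(buf, stream_info=StreamInfo(extension='.zip'))
print(result.markdown)
# Before: includes empty '## File: subdir/' section
# After:  only '## File: subdir/note.txt' with content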

zipObj.namelist() includes directory entries (e.g. 'docs/') that have no file content. Previously the converter tried to convert these zero-byte streams, producing empty '## File: docs/' headings in the output. Skip them with is_dir() before reading.

nileshpatil6 wants to merge 1 commit into microsoft:main from nileshpatil6:fix/zip-converter-skip-directory-entries