
perf: lazy page rendering in convert_pdf_to_image #473

Closed
KRRT7 wants to merge 4 commits into Unstructured-IO:main from KRRT7:lazy-page-rendering

KRRT7 (Collaborator) commented Feb 26, 2026

Summary

  • Render and save each PDF page inside the page loop instead of accumulating all images in memory first, reducing peak memory usage
  • When path_only=True, images are no longer retained after saving to disk

With path_only=True, peak memory held by rendered page images drops from O(N pages) to O(1 page).
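A minimal sketch of the before/after loop shapes. It does not reproduce the actual convert_pdf_to_image code: `render_page` is a hypothetical stand-in that returns fake image bytes, `save_eager`/`save_lazy`/`peak_traced_bytes` are illustrative names, and peaks are measured with the standard-library `tracemalloc`, which only sees Python-heap allocations (not native buffers a real rasterizer may use).

```python
import os
import tempfile
import tracemalloc
from typing import Callable, List

# Hypothetical size of one rendered page image; real pages are far larger.
PAGE_BYTES = 1_000_000


def render_page(page_index: int) -> bytes:
    """Hypothetical stand-in for a PDF rasterizer: returns fake image bytes."""
    return bytes([page_index % 256]) * PAGE_BYTES


def _write(out_dir: str, page_index: int, image: bytes) -> str:
    """Save one page image and return its path."""
    path = os.path.join(out_dir, f"page_{page_index + 1}.bin")
    with open(path, "wb") as f:
        f.write(image)
    return path


def save_eager(num_pages: int, out_dir: str) -> List[str]:
    # Before: render every page up front, then save. All N images are alive
    # at the same time, so peak memory grows with the page count.
    images = [render_page(i) for i in range(num_pages)]
    return [_write(out_dir, i, img) for i, img in enumerate(images)]


def save_lazy(num_pages: int, out_dir: str) -> List[str]:
    # After: render and save inside the same loop, then drop the image before
    # rendering the next page, so at most one page image is alive at a time.
    paths = []
    for i in range(num_pages):
        image = render_page(i)
        paths.append(_write(out_dir, i, image))
        del image
    return paths


def peak_traced_bytes(save: Callable[[int, str], List[str]], num_pages: int) -> int:
    """Peak Python-heap allocation (per tracemalloc) while running `save`."""
    with tempfile.TemporaryDirectory() as out_dir:
        tracemalloc.start()
        try:
            save(num_pages, out_dir)
            _, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()
    return peak


if __name__ == "__main__":
    for save in (save_eager, save_lazy):
        print(save.__name__, peak_traced_bytes(save, 10))
```

With 10 pages, the eager variant's peak is at least 10 × PAGE_BYTES, while the lazy variant stays near a single page, which is the O(N) to O(1) difference described above.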

Benchmark

10 PDFs (2,905 total pages), hi_res strategy:

| Metric | Before | After |
|---|---|---|
| Avg memory | 7.5 GB | 4.6 GB |
| Steady-state memory | ~7.9 GB (never released) | ~4.8 GB |
[Image: memory_comparison_1]
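For quick reference, the relative reductions implied by the benchmark figures above can be computed directly; `reduction_pct` is just an illustrative helper, and the inputs are the reported GB values.

```python
# Reported benchmark figures, in GB.
before = {"avg": 7.5, "steady_state": 7.9}
after = {"avg": 4.6, "steady_state": 4.8}


def reduction_pct(old: float, new: float) -> float:
    """Relative reduction from old to new, as a percentage."""
    return (old - new) / old * 100


for key in before:
    print(f"{key}: {reduction_pct(before[key], after[key]):.1f}% lower")
```

This prints roughly a 38.7% reduction in average memory and 39.2% in steady-state memory.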

Test plan

  • Added tests for convert_pdf_to_image covering the no-output-folder and output-folder-with-images return paths
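A minimal, pytest-compatible sketch of what tests for those return paths might look like. Everything here is hypothetical: `convert_pages` is a toy modeled loosely on the PR description (it is not the real convert_pdf_to_image signature), `render_page` fakes rasterization, and raising ValueError for path_only without a folder is this toy's own choice.

```python
import os
import tempfile
from typing import List, Optional, Union


def render_page(page_index: int) -> bytes:
    """Hypothetical stand-in for rasterizing one PDF page."""
    return f"page-{page_index}".encode()


def convert_pages(
    num_pages: int,
    output_folder: Optional[str] = None,
    path_only: bool = False,
) -> Union[List[bytes], List[str]]:
    """Toy model of the return paths described in the PR (hypothetical signature)."""
    if path_only and output_folder is None:
        # This toy's own choice: path_only is meaningless without a folder.
        raise ValueError("path_only=True requires output_folder")
    images: List[bytes] = []
    paths: List[str] = []
    for i in range(num_pages):
        image = render_page(i)  # rendered lazily, one page per iteration
        if output_folder is not None:
            path = os.path.join(output_folder, f"page_{i + 1}.bin")
            with open(path, "wb") as f:
                f.write(image)
            paths.append(path)
        if not path_only:
            images.append(image)  # retained only when the caller wants images
    return paths if path_only else images


def test_no_output_folder_returns_images() -> None:
    assert convert_pages(3) == [b"page-0", b"page-1", b"page-2"]


def test_output_folder_returns_images_and_saves_files() -> None:
    with tempfile.TemporaryDirectory() as folder:
        images = convert_pages(2, output_folder=folder)
        assert images == [b"page-0", b"page-1"]
        assert sorted(os.listdir(folder)) == ["page_1.bin", "page_2.bin"]


def test_path_only_returns_paths_without_images() -> None:
    with tempfile.TemporaryDirectory() as folder:
        paths = convert_pages(2, output_folder=folder, path_only=True)
        assert [os.path.basename(p) for p in paths] == ["page_1.bin", "page_2.bin"]
        assert all(os.path.exists(p) for p in paths)


if __name__ == "__main__":
    test_no_output_folder_returns_images()
    test_output_folder_returns_images_and_saves_files()
    test_path_only_returns_paths_without_images()
    print("all tests passed")
```

Keeping rendering and saving in one loop means the path_only branch never appends to `images`, so no page image outlives its iteration.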

KRRT7 commented Mar 16, 2026

@badGarnet @qued could you review this when you get a chance?

KRRT7 closed this Mar 27, 2026