Skip to content

just remove figures if they don't attach to any content#45

Merged
srkirkland merged 3 commits intomainfrom
srk/alt-issue2
Feb 25, 2026
Merged

just remove figures if they don't attach to any content#45
srkirkland merged 3 commits intomainfrom
srk/alt-issue2

Conversation

@srkirkland
Copy link
Member

@srkirkland srkirkland commented Feb 24, 2026

instead of making contentless spans

Summary by CodeRabbit

  • Bug Fixes

    • PDF remediation: contentless figures and links are now removed from the document structure when possible; previous demotion behavior is retained as a fallback to preserve structure.
    • Logging/reporting now includes counts for removed contentless figures and links so remediation summaries more accurately reflect changes.
  • Tests

    • Tests updated to assert that document structure remains present after remediation rather than checking for a specific demotion role.

instead of making contentless spans
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 24, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between be5e6fa and 67dc27b.

📒 Files selected for processing (1)
  • server.core/Remediate/PdfRemediationProcessor.cs
🚧 Files skipped from review as they are similar to previous changes (1)
  • server.core/Remediate/PdfRemediationProcessor.cs

📝 Walkthrough

Walkthrough

Attempt removal of contentless Figure and Link structural elements from their parent before falling back to demotion; add removal-tracking counters and identity helpers; update logs and tests to stop asserting a specific /Span role, instead ensuring structure remains non-empty.

Changes

Cohort / File(s) Summary
Core remediation logic
server.core/Remediate/PdfRemediationProcessor.cs
Added TryRemoveStructElemFromParent(PdfDictionary) and IsSameStructElem(...); change Figures/Links remediation to try removal first then fallback to role demotion; introduced contentlessFiguresRemoved and contentlessLinksRemoved counters; updated summary logging to report removed vs demoted counts.
Remediation tests
tests/server.tests/Integration/Remediate/PdfRemediationProcessorAltTextAssociationTests.cs
Removed private RoleSpan field and replaced assertions that checked for a specific /Span demotion with checks that the document structure remains present (non-empty) after remediation.

Sequence Diagram(s)

(Skipped — changes are internal to remediation logic and tests, not multi-component control flow.)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰 I hopped through structure, sniffed each hollow leaf,
Pulled out empty things with a careful brief.
If stubborn pieces clung and would not roam,
I softly marked them neutral, then sent them home.
Tiny cleanups, carrot-sweet relief. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'just remove figures if they don't attach to any content' directly and accurately summarizes the main change: replacing demotion-via-span with removal for contentless figures and links.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch srk/alt-issue2

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/server.tests/Integration/Remediate/PdfRemediationProcessorAltTextAssociationTests.cs (1)

14-14: ⚠️ Potential issue | 🟡 Minor

Stale test method names and assertion reason strings still describe demotion, not removal

  • Method ProcessAsync_ContentlessFigureWithoutAlt_DemotesRole (Line 14) and ProcessAsync_ContentlessFigureWithPlaceholderAlt_RemovesAltAndDemotesRole (Line 58) still refer to "DemotesRole" even though the new primary behavior is removal from the parent.
  • The because strings on Lines 45 and 93 ("contentless /Figure nodes should be demoted") contradict the PR intent.
🛠️ Suggested rename
-    public async Task ProcessAsync_ContentlessFigureWithoutAlt_DemotesRole()
+    public async Task ProcessAsync_ContentlessFigureWithoutAlt_RemovesFigureFromStructTree()

-    ListStructElementsByRole(outputPdf, PdfName.Figure).Should().BeEmpty("contentless /Figure nodes should be demoted");
+    ListStructElementsByRole(outputPdf, PdfName.Figure).Should().BeEmpty("contentless /Figure nodes should be removed");

-    public async Task ProcessAsync_ContentlessFigureWithPlaceholderAlt_RemovesAltAndDemotesRole()
+    public async Task ProcessAsync_ContentlessFigureWithPlaceholderAlt_RemovesAltAndFigureFromStructTree()

-    ListStructElementsByRole(outputPdf, PdfName.Figure).Should().BeEmpty("contentless /Figure nodes should be demoted");
+    ListStructElementsByRole(outputPdf, PdfName.Figure).Should().BeEmpty("contentless /Figure nodes should be removed");

Also applies to: 45-45, 58-58, 93-93

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/server.tests/Integration/Remediate/PdfRemediationProcessorAltTextAssociationTests.cs`
at line 14, Rename the stale test method names and update assertion reason
strings to reflect removal (not demotion): change
ProcessAsync_ContentlessFigureWithoutAlt_DemotesRole and
ProcessAsync_ContentlessFigureWithPlaceholderAlt_RemovesAltAndDemotesRole to
names that mention "RemovesFromParent" (or similar) and update every assertion
`because` string that says "contentless /Figure nodes should be demoted" to
"contentless /Figure nodes should be removed from their parent" (or equivalent).
Ensure you update all references to these test method names within the test
class and any test runners/mappings so names remain consistent with the PR
behavior.
🧹 Nitpick comments (2)
tests/server.tests/Integration/Remediate/PdfRemediationProcessorAltTextAssociationTests.cs (1)

45-46: Tests do not cover the Span-demotion fallback path

ListStructElementsByRole(outputPdf, PdfName.Figure).Should().BeEmpty() passes whether the figure was removed or demoted to Span (since a Span-demoted element no longer has role /Figure). The new ListStructElements(...).Should().NotBeEmpty() check (Lines 46, 94) only verifies something remains in the structure tree, not specifically that a Span demotion fallback element does or does not exist.

Consider adding a dedicated test where the parent struct element makes removal impossible (e.g., single direct/non-indirect child dict), asserting that the Span fallback is used and the element is still present with role Span. This would give meaningful regression coverage for the two distinct code paths in TryRemoveStructElemFromParent.

Would you like me to draft a test case for the Span-demotion fallback path?

Also applies to: 93-94

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/server.tests/Integration/Remediate/PdfRemediationProcessorAltTextAssociationTests.cs`
around lines 45 - 46, The tests currently don't verify the Span-demotion
fallback path in TryRemoveStructElemFromParent: add a new test in
PdfRemediationProcessorAltTextAssociationTests that sets up a parent struct
element that cannot be removed (e.g., a parent with a single direct/non-indirect
child dict), run the remediation, then assert that
ListStructElementsByRole(outputPdf, PdfName.Span) contains the demoted element
(and that ListStructElementsByRole(outputPdf, PdfName.Figure) no longer contains
it), so the Span fallback is explicitly covered; use the existing helpers
ListStructElements, ListStructElementsByRole and reference
PdfName.Figure/PdfName.Span and TryRemoveStructElemFromParent behavior when
constructing the scenario.
server.core/Remediate/PdfRemediationProcessor.cs (1)

1058-1073: Empty K array left on parent after removing the last child

When kids is a PdfArray containing only the target struct elem, after array.Remove(i) the parent retains K: []. The scalar-child path (Lines 1076-1079) correctly removes the K entry entirely, but the array path does not clean up an empty result. While not technically invalid per the PDF spec, an empty K array can confuse accessibility checkers and diverges from the cleaner scalar-removal behavior.

🛠️ Proposed fix: remove empty K array from parent
  if (kids is PdfArray array)
  {
      var removedAny = false;
      for (var i = array.Size() - 1; i >= 0; i--)
      {
          var item = array.Get(i);
          if (!IsSameStructElem(item, structElem, targetRef))
          {
              continue;
          }

          array.Remove(i);
          removedAny = true;
      }

+     if (removedAny && array.Size() == 0)
+     {
+         parent.Remove(PdfName.K);
+     }

      return removedAny;
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server.core/Remediate/PdfRemediationProcessor.cs` around lines 1058 - 1073,
The array-branch logic in PdfRemediationProcessor that iterates a PdfArray of
kids and removes matching items (uses PdfArray, IsSameStructElem, structElem,
targetRef and array.Remove(i)) can leave an empty K: [] on the parent; after the
removal loop, if removedAny is true and array.Size() == 0, remove the "K" entry
from the parent (the same way the scalar-child path does in the nearby code) so
the parent dictionary no longer contains an empty K array — locate the parent
object used by this method and delete its K key when the array becomes empty.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server.core/Remediate/PdfRemediationProcessor.cs`:
- Around line 580-591: The counters contentlessFiguresRemoved and
contentlessLinksRemoved are incremented regardless of whether elements were
actually removed or only demoted; update the logic in the loop that iterates
PdfStructTreeIndex.ListStructElementsByRole so you increment separate counters
for removals vs demotions (e.g., contentlessFiguresRemoved vs
contentlessFiguresDemoted and contentlessLinksRemoved vs
contentlessLinksDemoted) based on the boolean result of
TryRemoveStructElemFromParent(figure) / TryRemoveStructElemFromParent(link), and
then change the telemetry/log messages to report removed and demoted counts
distinctly instead of claiming all were "removed".

---

Outside diff comments:
In
`@tests/server.tests/Integration/Remediate/PdfRemediationProcessorAltTextAssociationTests.cs`:
- Line 14: Rename the stale test method names and update assertion reason
strings to reflect removal (not demotion): change
ProcessAsync_ContentlessFigureWithoutAlt_DemotesRole and
ProcessAsync_ContentlessFigureWithPlaceholderAlt_RemovesAltAndDemotesRole to
names that mention "RemovesFromParent" (or similar) and update every assertion
`because` string that says "contentless /Figure nodes should be demoted" to
"contentless /Figure nodes should be removed from their parent" (or equivalent).
Ensure you update all references to these test method names within the test
class and any test runners/mappings so names remain consistent with the PR
behavior.

---

Nitpick comments:
In `@server.core/Remediate/PdfRemediationProcessor.cs`:
- Around line 1058-1073: The array-branch logic in PdfRemediationProcessor that
iterates a PdfArray of kids and removes matching items (uses PdfArray,
IsSameStructElem, structElem, targetRef and array.Remove(i)) can leave an empty
K: [] on the parent; after the removal loop, if removedAny is true and
array.Size() == 0, remove the "K" entry from the parent (the same way the
scalar-child path does in the nearby code) so the parent dictionary no longer
contains an empty K array — locate the parent object used by this method and
delete its K key when the array becomes empty.

In
`@tests/server.tests/Integration/Remediate/PdfRemediationProcessorAltTextAssociationTests.cs`:
- Around line 45-46: The tests currently don't verify the Span-demotion fallback
path in TryRemoveStructElemFromParent: add a new test in
PdfRemediationProcessorAltTextAssociationTests that sets up a parent struct
element that cannot be removed (e.g., a parent with a single direct/non-indirect
child dict), run the remediation, then assert that
ListStructElementsByRole(outputPdf, PdfName.Span) contains the demoted element
(and that ListStructElementsByRole(outputPdf, PdfName.Figure) no longer contains
it), so the Span fallback is explicitly covered; use the existing helpers
ListStructElements, ListStructElementsByRole and reference
PdfName.Figure/PdfName.Span and TryRemoveStructElemFromParent behavior when
constructing the scenario.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08e401a and 940d663.

📒 Files selected for processing (2)
  • server.core/Remediate/PdfRemediationProcessor.cs
  • tests/server.tests/Integration/Remediate/PdfRemediationProcessorAltTextAssociationTests.cs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
server.core/Remediate/PdfRemediationProcessor.cs (1)

580-595: Counter semantics now correctly track removed vs. demoted — LGTM.

The previous review's concern about unconditional counter increments has been addressed: contentlessFiguresRemoved is only incremented on a successful TryRemoveStructElemFromParent return, and contentlessFiguresDemoted only on the fallback path. Same pattern is applied consistently at Lines 610–625 for links.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server.core/Remediate/PdfRemediationProcessor.cs` around lines 580 - 595,
Counters for contentless figures were fixed to increment only on the branch
taken; apply the same semantics to the link-handling block: in the loop over
PdfStructTreeIndex.ListStructElementsByRole(pdf, PdfName.Link) ensure you call
TryRemoveStructElemFromParent(link) and only increment contentlessLinksRemoved
when it returns true, otherwise set link.Put(PdfName.S, RoleSpan) and increment
contentlessLinksDemoted; reference the existing methods
StructElemHasAssociatedContent and TryRemoveStructElemFromParent to locate and
mirror the corrected pattern used for figures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server.core/Remediate/PdfRemediationProcessor.cs`:
- Around line 1070-1086: In TryRemoveStructElemFromParent, when handling the
PdfArray case (variable array/kids), after removing matching items set
removedAny as now and if removedAny is true and array.Size() == 0 remove the
parent's /K entry (call parent.Remove(PdfName.K)) before returning; ensure the
method still returns true when removals occurred. This mirrors the single-child
branch and prevents leaving an empty /K array on the parent.

---

Duplicate comments:
In `@server.core/Remediate/PdfRemediationProcessor.cs`:
- Around line 580-595: Counters for contentless figures were fixed to increment
only on the branch taken; apply the same semantics to the link-handling block:
in the loop over PdfStructTreeIndex.ListStructElementsByRole(pdf, PdfName.Link)
ensure you call TryRemoveStructElemFromParent(link) and only increment
contentlessLinksRemoved when it returns true, otherwise set link.Put(PdfName.S,
RoleSpan) and increment contentlessLinksDemoted; reference the existing methods
StructElemHasAssociatedContent and TryRemoveStructElemFromParent to locate and
mirror the corrected pattern used for figures.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 940d663 and be5e6fa.

📒 Files selected for processing (1)
  • server.core/Remediate/PdfRemediationProcessor.cs

@srkirkland srkirkland merged commit bd3222f into main Feb 25, 2026
3 checks passed
@srkirkland srkirkland deleted the srk/alt-issue2 branch February 25, 2026 06:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant