Skip to content

Demotes struct elements lacking associated content#44

Merged
srkirkland merged 1 commit intomainfrom
srk/alt-no-content
Feb 24, 2026
Merged

Demotes struct elements lacking associated content#44
srkirkland merged 1 commit intomainfrom
srk/alt-no-content

Conversation

@srkirkland
Copy link
Member

@srkirkland srkirkland commented Feb 24, 2026

don't try to add alt text to fake figures

Summary by CodeRabbit

  • New Features

    • Enhanced PDF remediation to identify and demote contentless structure elements by removing associated Alt text and converting their role to a neutral structure type, improving document accessibility and compliance.
  • Tests

    • Added integration test suite for contentless element handling in PDF remediation.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 24, 2026

📝 Walkthrough

Walkthrough

The changes add logic to detect and demote contentless structure elements in PDF remediation. Helper methods determine when structure elements lack associated content; a fallback pass then removes placeholder Alt text and demotes these elements to neutral Span roles. Integration tests validate the demotion behavior on tagged PDFs with contentless figures.

Changes

Cohort / File(s) Summary
Core Remediation Logic
server.core/Remediate/PdfRemediationProcessor.cs
Introduces RoleSpan constant and three helper methods (StructElemHasAssociatedContent, StructElemKidsContainAssociatedContent, DereferenceStructTreeNode) to detect contentless elements. Adds conditional demotion logic in the fallback pass for figures and links, removing Alt text and setting role to Span for contentless nodes. Extends logging to track contentlessFiguresDemoted and contentlessLinksDemoted counts.
Integration Test Suite
tests/server.tests/Integration/Remediate/PdfRemediationProcessorAltTextAssociationTests.cs
Adds comprehensive integration test class with two test cases validating contentless element demotion: one for contentless figures without Alt, another for contentless figures with placeholder Alt. Includes utilities (CreateTaggedPdfWithContentlessFigure, struct-tree traversal helpers, GetAlt) and fake service implementations (FakeAltTextService, FakePdfTitleService) for deterministic testing.

Sequence Diagram

sequenceDiagram
    participant Proc as PdfRemediationProcessor
    participant StructTree as Structure Tree
    participant AltSvc as Alt Text Service
    participant Logger as Logger

    rect rgba(70, 130, 180, 0.5)
    Note over Proc: Remediation Process (Fallback Pass)
    end

    loop For each Figure/Link element
        Proc->>StructTree: Check StructElemHasAssociatedContent()
        StructTree-->>Proc: has content? (true/false)
        
        alt No Associated Content
            Proc->>StructTree: Remove Alt text
            Proc->>StructTree: Set Role to Span
            Proc->>Proc: Increment demotion counter
            Proc->>Logger: Log demotion event
        else Has Associated Content
            Note over Proc: Skip remediation
        end
    end

    Proc->>Logger: Log final summary with demotion counts
    Logger-->>Proc: Summary emitted
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~28 minutes

Possibly related PRs

Poem

🐰 Hop along, dear PDF friends!
Contentless spans now meet their ends,
We demote without a fuss,
Placeholder text? No bother to us!
Neutral roles for all who lack—
Remediation's got your back!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Demotes struct elements lacking associated content' directly describes the main technical change: handling contentless structure elements by demoting their role to Span, which prevents unnecessary alt text assignment to fake figures.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch srk/alt-no-content

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
server.core/Remediate/PdfRemediationProcessor.cs (1)

1082-1090: DereferenceStructTreeNode duplicates PdfStructTreeIndex.Dereference.

This method (lines 1082-1090) is byte-for-byte identical to the private Dereference method inside the nested PdfStructTreeIndex record (lines 2102-2110). Consider extracting a single shared static helper at the PdfRemediationProcessor class level to avoid the duplication.

♻️ Suggested consolidation
-    private static PdfObject DereferenceStructTreeNode(PdfObject obj)
-    {
-        if (obj is PdfIndirectReference reference)
-        {
-            return reference.GetRefersTo(true);
-        }
-
-        return obj;
-    }

Then update StructElemKidsContainAssociatedContent (line 1046) and PdfStructTreeIndex.Dereference callers to use a single shared helper, e.g.:

+    private static PdfObject Dereference(PdfObject obj)
+        => obj is PdfIndirectReference reference ? reference.GetRefersTo(true) : obj;

(rename the existing PdfStructTreeIndex.Dereference to call through, or inline)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server.core/Remediate/PdfRemediationProcessor.cs` around lines 1082 - 1090,
Extract the duplicated dereference logic into a single private static helper on
PdfRemediationProcessor (e.g., a method that takes PdfObject and returns
PdfIndirectReference.GetRefersTo(true) when the input is a
PdfIndirectReference), then update the existing
PdfRemediationProcessor.DereferenceStructTreeNode and the nested
PdfStructTreeIndex.Dereference to delegate to that shared helper (or remove the
nested duplicate and call the new helper directly from
StructElemKidsContainAssociatedContent and any PdfStructTreeIndex callers);
reference the existing methods DereferenceStructTreeNode,
PdfStructTreeIndex.Dereference, StructElemKidsContainAssociatedContent and the
PdfStructTreeIndex record when making the change.
tests/server.tests/Integration/Remediate/PdfRemediationProcessorAltTextAssociationTests.cs (1)

223-252: Note: GetFallbackAltTextForImage() return value is coupled to the processor's PlaceholderImageAltText constant.

FakeAltTextService.GetFallbackAltTextForImage() returns "alt text for image", which must match the processor's PlaceholderImageAltText constant for HasPlaceholderAlt to trigger. If that constant ever changes, this test will silently pass without actually testing the placeholder path. Consider adding a brief comment noting the coupling, or extracting the value to a shared constant.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/server.tests/Integration/Remediate/PdfRemediationProcessorAltTextAssociationTests.cs`
around lines 223 - 252, The test's
FakeAltTextService.GetFallbackAltTextForImage() is hard-coded to "alt text for
image" and is coupled to the processor's PlaceholderImageAltText constant which
makes HasPlaceholderAlt assertions fragile; either reference a shared constant
for the fallback value (extract the processor's PlaceholderImageAltText into a
test-visible constant and use that in
FakeAltTextService.GetFallbackAltTextForImage and/or the test assertions) or add
a clear comment above FakeAltTextService.GetFallbackAltTextForImage noting the
coupling to PlaceholderImageAltText so future changes don't silently break the
test; update references to use the shared symbol (PlaceholderImageAltText or
newly extracted constant) and adjust HasPlaceholderAlt assertions accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@server.core/Remediate/PdfRemediationProcessor.cs`:
- Around line 1082-1090: Extract the duplicated dereference logic into a single
private static helper on PdfRemediationProcessor (e.g., a method that takes
PdfObject and returns PdfIndirectReference.GetRefersTo(true) when the input is a
PdfIndirectReference), then update the existing
PdfRemediationProcessor.DereferenceStructTreeNode and the nested
PdfStructTreeIndex.Dereference to delegate to that shared helper (or remove the
nested duplicate and call the new helper directly from
StructElemKidsContainAssociatedContent and any PdfStructTreeIndex callers);
reference the existing methods DereferenceStructTreeNode,
PdfStructTreeIndex.Dereference, StructElemKidsContainAssociatedContent and the
PdfStructTreeIndex record when making the change.

In
`@tests/server.tests/Integration/Remediate/PdfRemediationProcessorAltTextAssociationTests.cs`:
- Around line 223-252: The test's
FakeAltTextService.GetFallbackAltTextForImage() is hard-coded to "alt text for
image" and is coupled to the processor's PlaceholderImageAltText constant which
makes HasPlaceholderAlt assertions fragile; either reference a shared constant
for the fallback value (extract the processor's PlaceholderImageAltText into a
test-visible constant and use that in
FakeAltTextService.GetFallbackAltTextForImage and/or the test assertions) or add
a clear comment above FakeAltTextService.GetFallbackAltTextForImage noting the
coupling to PlaceholderImageAltText so future changes don't silently break the
test; update references to use the shared symbol (PlaceholderImageAltText or
newly extracted constant) and adjust HasPlaceholderAlt assertions accordingly.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bb458e8 and b699fec.

📒 Files selected for processing (2)
  • server.core/Remediate/PdfRemediationProcessor.cs
  • tests/server.tests/Integration/Remediate/PdfRemediationProcessorAltTextAssociationTests.cs

@srkirkland srkirkland merged commit 08e401a into main Feb 24, 2026
3 checks passed
@srkirkland srkirkland deleted the srk/alt-no-content branch February 24, 2026 22:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant