Skip to content

handle removing bad annotations#30

Merged
srkirkland merged 2 commits intomainfrom
srk/annotations
Feb 3, 2026
Merged

handle removing bad annotations#30
srkirkland merged 2 commits intomainfrom
srk/annotations

Conversation

@srkirkland
Copy link
Member

@srkirkland srkirkland commented Jan 31, 2026

Summary by CodeRabbit

  • New Features

    • PDF remediation now removes untagged or structurally orphaned annotations during processing.
    • Remediation records and logs the number of annotations removed for visibility.
  • Tests

    • Added an integration test suite validating that untagged annotations are removed while preserving tagged annotations.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 31, 2026

📝 Walkthrough

Walkthrough

Adds a PdfAnnotationRemediator that prunes untagged or structurally orphaned annotations from tagged PDFs, integrates it into PdfRemediationProcessor with logging, and includes an integration test suite validating removal while preserving correctly-tagged annotations.

Changes

Cohort / File(s) Summary
Core Annotation Remediation
server.core/Remediate/PdfAnnotationRemediator.cs
New internal static class with RemoveUntaggedAnnotations(PdfDocument, CancellationToken) that traverses pages/annotations, checks StructParent entries against the document ParentTree/NumberTree, removes annotations missing a valid mapping, and supports cancellation. Includes helpers: TryGetParentTree, NumberTreeContainsKey (recursive with visited set), and Dereference.
Processor Integration
server.core/Remediate/PdfRemediationProcessor.cs
Calls PdfAnnotationRemediator.RemoveUntaggedAnnotations during remediation; captures removal count and logs an info message when > 0.
Integration Tests & Helpers
tests/server.tests/Integration/Remediate/PdfRemediationProcessorTaggedAnnotationsTests.cs
New integration test verifying untagged annotations are removed and tagged annotations preserved. Adds test helpers for reading annotation stats, parent-tree/number-tree traversal (TryGetParentTree, NumberTreeContainsKeyRecursive, Dereference), fake services (FakeAltTextService, FakePdfTitleService), and repo-root discovery.

Sequence Diagram

sequenceDiagram
    participant Processor as PdfRemediationProcessor
    participant Remediator as PdfAnnotationRemediator
    participant PDF as PDF Document
    participant Tree as ParentTree/NumberTree

    Processor->>Remediator: RemoveUntaggedAnnotations(pdf, cancellationToken)
    Remediator->>PDF: Inspect Catalog / StructTreeRoot
    Remediator->>Tree: TryGetParentTree()
    Tree-->>Remediator: Return ParentTree (NumberTree root)
    loop For each Page
        Remediator->>PDF: Get / enumerate Annotations
        loop For each Annotation
            Remediator->>PDF: Dereference annotation, read StructParent
            alt StructParent missing
                Remediator-->>Remediator: Mark annotation for removal
            else StructParent present
                Remediator->>Tree: NumberTreeContainsKey(structParent) (recursive)
                alt Key not found
                    Remediator-->>Remediator: Mark annotation for removal
                end
            end
        end
        Remediator->>PDF: Remove marked annotations from Page
    end
    Remediator-->>Processor: Return totalRemoved
    Processor->>Processor: Log removal count (if > 0)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A trim for PDFs, neat and bright,
I hopped through trees to set things right.
Orphaned notes I gently cleared,
Tagged friends kept close, none were feared.
Hooray — clean pages take flight! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 4.55% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'handle removing bad annotations' is vague and uses non-descriptive language that doesn't clearly convey what specific functionality was added or what constitutes 'bad annotations'. Consider a more specific title such as 'Add PdfAnnotationRemediator to remove untagged annotations' or 'Remove untagged and orphaned annotations from PDFs during remediation' to better describe the actual changes.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch srk/annotations

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@server.core/Remediate/PdfAnnotationRemediator.cs`:
- Around line 19-50: The code currently skips validation when parentTree (from
TryGetParentTree) is null, leaving annotations with a StructParent
(dict.GetAsNumber(StructParentKey)) orphaned; update the loop in
PdfAnnotationRemediator.cs where annotations are iterated (the block that
computes structParent and checks parentTree) so that if parentTree is null and
structParent is present you treat that annotation as invalid and add it to
toRemove (same handling as when NumberTreeContainsKey returns false); reference
TryGetParentTree, parentTree, StructParentKey, NumberTreeContainsKey, toRemove
and annotation to locate and change the logic.

@srkirkland srkirkland merged commit 03c7748 into main Feb 3, 2026
2 of 3 checks passed
@srkirkland srkirkland deleted the srk/annotations branch February 3, 2026 06:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant