feat: add content freshness detection for external links (#35) #44
Implements a content freshness detection system that flags potentially stale
external documentation, even when the link itself returns 200 OK.
## Key Features
- **Last-Modified Detection**: Compares the HTTP `Last-Modified` response header
against a configurable staleness threshold (default: 2 years)
- **Content Pattern Detection**: Scans page content for deprecation notices,
moved-page notices, legacy-documentation markers, and other staleness indicators
- **Content Change Detection**: Tracks content changes between validations by
comparing SHA-256 hashes of normalized page content
- **Domain-Specific Thresholds**: Customizable staleness periods for different
domains (e.g., 6 months for GitHub/Firebase vs 2 years for general sites)
- **Smart Caching**: Caches validation results with a TTL and invalidates them
when the content hash changes, to improve performance
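A minimal sketch of how the Last-Modified, pattern, and domain-threshold checks above could fit together. The function names (`thresholdDaysFor`, `isStaleByLastModified`, `detectStalenessPatterns`), the domain table, and the regular expressions are hypothetical illustrations, not code from this PR; only the 730-day default and the roughly 6-month GitHub/Firebase override come from the description.

```typescript
// Hypothetical sketch: names, domains, and patterns are illustrative assumptions.
const DAY_MS = 24 * 60 * 60 * 1000;
const DEFAULT_THRESHOLD_DAYS = 730; // 2 years, the documented default

// Assumed overrides reflecting the "6 months for GitHub/Firebase" example.
const DOMAIN_THRESHOLDS_DAYS: Record<string, number> = {
  "github.com": 180,
  "firebase.google.com": 180,
};

// Assumed indicator list: [label reported to the user, pattern searched for].
const STALENESS_PATTERNS: Array<[string, RegExp]> = [
  ["deprecated", /\bdeprecated\b/i],
  ["no longer supported", /\bno longer (?:supported|maintained)\b/i],
  ["moved", /\b(?:page|content|documentation) has (?:been )?moved\b/i],
  ["legacy documentation", /\blegacy (?:docs|documentation)\b/i],
];

// A matching domain (or subdomain) override wins; otherwise use the default,
// which a --freshness-threshold value could replace (an assumption).
function thresholdDaysFor(url: string, defaultDays: number = DEFAULT_THRESHOLD_DAYS): number {
  const host = new URL(url).hostname;
  for (const [domain, days] of Object.entries(DOMAIN_THRESHOLDS_DAYS)) {
    if (host === domain || host.endsWith(`.${domain}`)) return days;
  }
  return defaultDays;
}

// Compare an HTTP-date such as "Wed, 21 Oct 2015 07:28:00 GMT" with the threshold.
// A missing or unparseable header is treated as inconclusive, not stale.
function isStaleByLastModified(
  url: string,
  lastModified: string | null,
  now: Date = new Date(),
  defaultDays: number = DEFAULT_THRESHOLD_DAYS,
): boolean {
  if (!lastModified) return false;
  const modifiedAt = Date.parse(lastModified);
  if (Number.isNaN(modifiedAt)) return false;
  const ageDays = (now.getTime() - modifiedAt) / DAY_MS;
  return ageDays > thresholdDaysFor(url, defaultDays);
}

// Return the label of every staleness indicator found in the page text.
function detectStalenessPatterns(content: string): string[] {
  return STALENESS_PATTERNS.filter(([, pattern]) => pattern.test(content)).map(([label]) => label);
}
```

With `fetch`, the header value would come from `response.headers.get("last-modified")`, which returns `null` when the server omits it. Treating that case as inconclusive rather than stale is a design choice of this sketch, since many sites never send the header.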
## CLI Integration
- `--check-content-freshness`: Enable staleness detection
- `--freshness-threshold <days>`: Configure staleness threshold (default: 730)
- Summary output reports fresh and stale external link counts, with detailed warnings
- Stale links are marked with a `[STALE]` indicator and include a suggestion
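The two flags could be parsed along these lines. `parseFreshnessFlags` and `FreshnessOptions` are hypothetical names, and a real CLI would more likely rely on an argument-parsing library; only the flag spellings and the 730-day default come from the list above.

```typescript
// Hypothetical options object for the two freshness flags.
interface FreshnessOptions {
  checkContentFreshness: boolean;
  freshnessThresholdDays: number;
}

const DEFAULT_FRESHNESS_THRESHOLD_DAYS = 730; // documented default (2 years)

// Scan argv for the freshness flags; unrelated arguments are ignored here.
function parseFreshnessFlags(argv: string[]): FreshnessOptions {
  const options: FreshnessOptions = {
    checkContentFreshness: false,
    freshnessThresholdDays: DEFAULT_FRESHNESS_THRESHOLD_DAYS,
  };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--check-content-freshness") {
      options.checkContentFreshness = true;
    } else if (arg === "--freshness-threshold") {
      const raw: string | undefined = argv[i + 1];
      const days = Number(raw);
      // Reject a missing value, a non-integer, or a non-positive number of days.
      if (raw === undefined || !Number.isInteger(days) || days <= 0) {
        throw new Error(`--freshness-threshold expects a positive whole number of days, got "${raw}"`);
      }
      options.freshnessThresholdDays = days;
      i++; // skip the value just consumed
    }
  }
  return options;
}
```

For example, `parseFreshnessFlags(["--check-content-freshness", "--freshness-threshold", "365"])` yields `{ checkContentFreshness: true, freshnessThresholdDays: 365 }`. Whether passing a threshold alone should also enable the check is not specified in the PR; this sketch leaves the two flags independent.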
## Implementation
- `ContentFreshnessDetector` class with configurable thresholds and patterns
- `LinkValidator` extended to issue GET requests so page content can be analyzed
- `BrokenLink` type extended with freshness information
- Unit, integration, and CLI tests
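The content hashing and caching that `ContentFreshnessDetector` relies on could look roughly like this. The normalization rules (drop scripts and styles, strip tags, collapse whitespace, lowercase) and the `FreshnessCache` class are assumptions for illustration; the PR only states that hashes are SHA-256 over normalized content and that cached results are invalidated by TTL or by content changes. The sketch uses Node's built-in `crypto` module.

```typescript
import { createHash } from "node:crypto";

// Assumed normalization: drop scripts/styles, strip tags, collapse whitespace, lowercase.
function normalizeContent(html: string): string {
  return html
    .replace(/<(script|style)\b[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim()
    .toLowerCase();
}

// SHA-256 hex digest of the normalized content.
function contentHash(html: string): string {
  return createHash("sha256").update(normalizeContent(html), "utf8").digest("hex");
}

interface CacheEntry<T> {
  value: T;
  hash: string;
  storedAt: number; // ms timestamp when the entry was written
}

// Hypothetical cache: entries expire after ttlMs or when the content hash changes.
class FreshnessCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Return a cached result if it is within the TTL and, when the caller supplies
  // the current hash, the content is unchanged; otherwise evict the entry.
  get(url: string, currentHash?: string): T | undefined {
    const entry = this.entries.get(url);
    if (entry === undefined) return undefined;
    const expired = this.now() - entry.storedAt > this.ttlMs;
    const changed = currentHash !== undefined && currentHash !== entry.hash;
    if (expired || changed) {
      this.entries.delete(url);
      return undefined;
    }
    return entry.value;
  }

  set(url: string, value: T, hash: string): void {
    this.entries.set(url, { value, hash, storedAt: this.now() });
  }
}
```

Normalizing before hashing means whitespace or markup-only edits do not register as content changes; the trade-off is that purely structural changes to a page are missed.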
## Output Example
```
📊 Validation Summary
Files processed: 3
Total links found: 15
Broken links: 2
Fresh external links: 8
Stale external links: 2
🔗 Broken Links Found:
📄 docs/api.md (1 broken):
❌ [external] https://api.example.com/deprecated (line 42) [STALE]
Warning: Content contains staleness indicators
Suggestion: Review content for updates or alternatives
Detected patterns: deprecated, no longer supported
```
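One possible shape for the extended `BrokenLink` type and for the rendering in the output example. `FreshnessInfo`, its fields, the `"internal"` link type, `formatLink`, and `countFreshness` are hypothetical; they simply reproduce the `[STALE]` marker, the Warning/Suggestion/Detected patterns lines, and the fresh/stale counts shown above.

```typescript
// Hypothetical freshness fields attached to a link result.
interface FreshnessInfo {
  isStale: boolean;
  warning?: string;
  suggestion?: string;
  detectedPatterns: string[];
}

// Assumed shape of the extended BrokenLink type; the real field names may differ.
interface BrokenLink {
  url: string;
  line: number;
  type: "external" | "internal";
  freshness?: FreshnessInfo;
}

// Render one link in the style of the output example.
function formatLink(link: BrokenLink): string[] {
  const f = link.freshness;
  const marker = f?.isStale ? " [STALE]" : "";
  const lines = [`❌ [${link.type}] ${link.url} (line ${link.line})${marker}`];
  if (f?.isStale) {
    if (f.warning) lines.push(`Warning: ${f.warning}`);
    if (f.suggestion) lines.push(`Suggestion: ${f.suggestion}`);
    if (f.detectedPatterns.length > 0) {
      lines.push(`Detected patterns: ${f.detectedPatterns.join(", ")}`);
    }
  }
  return lines;
}

// Tally fresh vs stale among links that went through the freshness check.
function countFreshness(links: Array<{ freshness?: FreshnessInfo }>): { fresh: number; stale: number } {
  let fresh = 0;
  let stale = 0;
  for (const { freshness } of links) {
    if (freshness === undefined) continue; // not checked, e.g. an internal link
    if (freshness.isStale) stale++;
    else fresh++;
  }
  return { fresh, stale };
}
```

Passing a stale link for `https://api.example.com/deprecated` on line 42, with the warning, suggestion, and two detected patterns from the example, produces the four lines listed under `docs/api.md`.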
Resolves #35
## Summary
Implements content freshness detection for external links, flagging potentially stale documentation even when the link returns a 200 OK status. This targets real-world cases such as outdated Firebase docs, deprecated GitHub Actions syntax, and moved API documentation.