[Bug]: md-fit strips meaningful content metadata (usernames, attribution) not just boilerplate #1900

@mattheworiordan

Description

crawl4ai version

0.8.6

Expected Behavior

md-fit should strip page chrome (navigation, footers, sidebars, cookie banners) but preserve meaningful content metadata like author names and attribution on user-generated content. Comment attribution (who said what) is semantic content, not boilerplate.

Current Behavior

md-fit strips comment author usernames and profile links from GitHub pages while keeping the comment body and date. The output shows `commented Apr 6, 2026` with no indication of who wrote it.

With `-o md`, you get:

**[mattheworiordan](https://gist.github.com/mattheworiordan)** commented Apr 6, 2026

With `-o md-fit`, you get:

commented Apr 6, 2026

The username and profile link are gone. On pages with multiple commenters, it's impossible to tell who said what.

Is this reproducible?

Yes

Inputs Causing the Bug

- URL: https://gist.github.com/mattheworiordan/99ba717f7a8ca5a7f838913722cfe7ac
- Settings: `-o md-fit` (default fit mode, no other flags)

Steps to Reproduce

# 1. Crawl the gist with full markdown - username preserved
crwl crawl https://gist.github.com/mattheworiordan/99ba717f7a8ca5a7f838913722cfe7ac -o md

# Look for: **[mattheworiordan](...)** commented Apr 6, 2026

# 2. Same URL with md-fit - username stripped
crwl crawl https://gist.github.com/mattheworiordan/99ba717f7a8ca5a7f838913722cfe7ac -o md-fit

# Look for: commented Apr 6, 2026  (no username)
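
For pages with many comments, eyeballing the two outputs gets tedious. Below is a minimal sketch of a check that compares the two markdown outputs and lists attributions that exist in the full output but not in the fit output. It is not part of crawl4ai: `missing_attributions` and the `ATTRIBUTION` pattern are hypothetical names, the pattern assumes the `**[user](url)** commented <date>` line shape shown above, and the sample strings are stand-ins with a placeholder comment body rather than real crawl output.

```python
import re

# Assumed line shape, taken from the -o md output quoted in this issue:
#   **[user](https://...)** commented Apr 6, 2026
ATTRIBUTION = re.compile(
    r"\*\*\[(?P<user>[^\]]+)\]\((?P<url>[^)]+)\)\*\*\s+commented\s+(?P<date>.+)"
)


def missing_attributions(raw_md: str, fit_md: str) -> list[tuple[str, str]]:
    """Return (user, date) pairs attributed in raw_md but not in fit_md."""
    fit_attributed = {
        (m["user"], m["date"].strip()) for m in ATTRIBUTION.finditer(fit_md)
    }
    missing = []
    for m in ATTRIBUTION.finditer(raw_md):
        key = (m["user"], m["date"].strip())
        if key not in fit_attributed:
            missing.append(key)
    return missing


# Stand-in strings; in practice these would be the saved outputs of steps 1 and 2.
raw = (
    "**[mattheworiordan](https://gist.github.com/mattheworiordan)** "
    "commented Apr 6, 2026\n\nPlaceholder comment body."
)
fit = "commented Apr 6, 2026\n\nPlaceholder comment body."

print(missing_attributions(raw, fit))
```

An empty list would mean every attribution line found in the full output also appears in the fit output.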

Code snippets

# Not a code bug - this is CLI / markdown generation behaviour.
# Reproduces purely via crwl CLI as shown above.

OS

macOS (Darwin 25.2.0)

Python version

3.14.3

Browser

(default Chromium managed by crawl4ai)

Browser version

(crawl4ai default)

Error logs & Screenshots (if applicable)

n/a

Metadata

Assignees

No one assigned

Labels

- ⚙️ In-progress: Issues, Features requests that are in Progress
- 🐞 Bug: Something isn't working
