Ddilibe/webprobe

WebProbe (Python port of open-webSearch)

This project replicates the core search and fetch capabilities of Aas-ee/open-webSearch using Python. It exposes a CLI that can:

  • Search multiple engines (Bing, DuckDuckGo, Baidu, Brave, Exa, Startpage, CSDN, Juejin, Linux.do)
  • Fetch full-length articles from CSDN, Linux.do, and Juejin
  • Download GitHub README.* files without hitting the API
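One way to read a README without the GitHub API is to request it from raw.githubusercontent.com. The sketch below only builds candidate URLs; the helper name `readme_candidates`, the branch names, and the file names it tries are assumptions for illustration, not necessarily what WebProbe does internally.

```python
from urllib.parse import urlparse

# Assumed candidates; the project may probe other file names or branches.
README_NAMES = ("README.md", "README.rst", "README.txt", "README")
BRANCHES = ("main", "master")


def readme_candidates(repo_url: str) -> list[str]:
    """Return raw.githubusercontent.com URLs to try for a repository's README."""
    # Keep only the non-empty path segments: /owner/repo[/...]
    parts = [p for p in urlparse(repo_url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a repository URL: {repo_url!r}")
    owner, repo = parts[0], parts[1].removesuffix(".git")
    return [
        f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{name}"
        for branch in BRANCHES
        for name in README_NAMES
    ]
```

A caller would request each candidate in order and stop at the first one that returns HTTP 200.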

Installation

python -m pip install --upgrade pip
pip install -e .

Package usage

Install the project locally and consume the webprobe package directly:

from webprobe import WebProbeServer, search, fetch_csdn

print(search("visible web", limit=5))
print(fetch_csdn("https://blog.csdn.net/example/article/details/xxxxx"))

# Start the bundled HTTP server (serves /search and /fetch?kind=csdn)
server = WebProbeServer(host="0.0.0.0", port=3210)
try:
    server.serve_forever()
finally:
    server.shutdown()

The HTTP server exposes /search?query=...&limit=...&engines=... and /fetch?kind=<csdn|linuxdo|juejin|github>&url=....
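The two endpoints can be called from any HTTP client. The sketch below builds the request URLs with the standard library. The helper names `search_url` and `fetch_url` are hypothetical, the base URL assumes the server is running locally on port 3210 as in the snippet above, and passing `engines` as a comma-separated list is inferred from the CLI's `--engines` flag rather than confirmed for the HTTP API.

```python
from urllib.parse import urlencode

# Assumes a local server started with port=3210.
BASE_URL = "http://127.0.0.1:3210"
FETCH_KINDS = {"csdn", "linuxdo", "juejin", "github"}


def search_url(query, limit=10, engines=None, base=BASE_URL):
    """Build a /search URL; engines is an optional list of engine names."""
    params = {"query": query, "limit": limit}
    if engines:
        # Assumption: comma-separated, mirroring the CLI's --engines flag.
        params["engines"] = ",".join(engines)
    return f"{base}/search?{urlencode(params)}"


def fetch_url(kind, url, base=BASE_URL):
    """Build a /fetch URL for one of the documented fetcher kinds."""
    if kind not in FETCH_KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    return f"{base}/fetch?{urlencode({'kind': kind, 'url': url})}"
```

The resulting URL can be passed to `urllib.request.urlopen`. The response format is not documented here, so inspect the body before parsing it.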

CLI

Run python main.py --help to see available commands. Key subcommands:

search

python main.py search "open websearch" --limit 12 --engines bing,duckduckgo

Article fetchers

Each fetcher prints JSON or plain text:

  • python main.py fetch-csdn <url>
  • python main.py fetch-linuxdo <url>
  • python main.py fetch-juejin <url>
  • python main.py fetch-github <repo-url>

Configuration

Environment variables mirror the TypeScript version:

| Variable | Default | Description |
| --- | --- | --- |
| DEFAULT_SEARCH_ENGINE | bing | Default search engine |
| ALLOWED_SEARCH_ENGINES | (empty) | Comma-separated whitelist |
| USE_PROXY / PROXY_URL | false / http://127.0.0.1:7890 | HTTP proxy for requests |

Set USE_PROXY=true to route all HTTP traffic through PROXY_URL.
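As an illustration of how these variables might be consumed, the sketch below reads them into a small settings object using the defaults from the table. `Settings`, `load_settings`, and `proxies` are hypothetical names, and treating only the literal string `true` (case-insensitive) as enabling the proxy is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    default_engine: str
    allowed_engines: tuple  # empty tuple means no whitelist is set
    use_proxy: bool
    proxy_url: str


def load_settings(env=None):
    """Read the documented variables from env (defaults to os.environ)."""
    env = os.environ if env is None else env
    raw_allowed = env.get("ALLOWED_SEARCH_ENGINES", "")
    allowed = tuple(e.strip().lower() for e in raw_allowed.split(",") if e.strip())
    return Settings(
        default_engine=env.get("DEFAULT_SEARCH_ENGINE", "bing"),
        allowed_engines=allowed,
        # Assumption: only "true" (any case) turns the proxy on.
        use_proxy=env.get("USE_PROXY", "false").strip().lower() == "true",
        proxy_url=env.get("PROXY_URL", "http://127.0.0.1:7890"),
    )


def proxies(settings):
    """Return a requests-style proxies mapping, or None when the proxy is off."""
    if not settings.use_proxy:
        return None
    return {"http": settings.proxy_url, "https": settings.proxy_url}
```

Passing a plain dict as `env` keeps the function easy to test without touching the real environment.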

Architecture

  • src/engine/search_service.py orchestrates multi-engine searches with distribution logic.
  • src/engines/* implement individual search/fetch adapters for each provider.
  • src/utils/ contains HTTP helpers, Playwright bridges for future browser fallbacks, and shared fetch logic for CSDN articles.
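The distribution logic itself is not described above. One plausible approach, shown here only as a sketch, is to split the requested result limit as evenly as possible across the selected engines. `distribute_limit` is a hypothetical helper, not a function from the repository.

```python
def distribute_limit(limit, engines):
    """Split limit across engines; earlier engines absorb any remainder."""
    if not engines:
        raise ValueError("at least one engine is required")
    base, extra = divmod(limit, len(engines))
    # The first `extra` engines each get one additional result.
    return {
        engine: base + (1 if index < extra else 0)
        for index, engine in enumerate(engines)
    }
```

With this scheme, a limit of 12 over bing and duckduckgo asks each engine for 6 results; when the limit does not divide evenly, the engines listed first receive the extra slots.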

About

Python implementation of an MCP web browser tool.
