This project replicates the core search and fetch capabilities of Aas-ee/open-webSearch using Python. It exposes a CLI that can:
- Search multiple engines (Bing, DuckDuckGo, Baidu, Brave, Exa, Startpage, CSDN, Juejin, Linux.do)
- Fetch full-length articles from CSDN, Linux.do, and Juejin
- Download GitHub `README.*` files without hitting the API

```bash
python -m pip install --upgrade pip
pip install -e .
```

Install the project locally and consume the `webprobe` package directly:

```python
from webprobe import WebProbeServer, search, fetch_csdn

print(search("visible web", limit=5))
print(fetch_csdn("https://blog.csdn.net/example/article/details/xxxxx"))

# Start the bundled HTTP server (serves /search and /fetch?kind=csdn)
server = WebProbeServer(host="0.0.0.0", port=3210)
try:
    server.serve_forever()
finally:
    server.shutdown()
```

The HTTP server exposes `/search?query=...&limit=...&engines=...` and `/fetch?kind=<csdn|linuxdo|juejin|github>&url=...`.
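
A client needs to percent-encode the query and the target URL before calling these endpoints. The following is a minimal sketch of building such URLs with the standard library. The helper names `build_search_url` and `build_fetch_url` are hypothetical, the base URL assumes a server running locally on port 3210, and passing `engines` as a comma-separated string is an assumption carried over from the CLI's `--engines` flag.

```python
from urllib.parse import urlencode

# Assumption: the server from the example above is reachable locally on port 3210.
BASE_URL = "http://127.0.0.1:3210"

# Fetch kinds listed in the endpoint description.
FETCH_KINDS = {"csdn", "linuxdo", "juejin", "github"}


def build_search_url(query, limit=10, engines=None, base_url=BASE_URL):
    """Return a /search URL with percent-encoded parameters (hypothetical helper)."""
    params = {"query": query, "limit": limit}
    if engines:
        # Assumption: engines are comma-separated, as with the CLI flag.
        params["engines"] = ",".join(engines)
    return f"{base_url}/search?{urlencode(params)}"


def build_fetch_url(kind, url, base_url=BASE_URL):
    """Return a /fetch URL for one of the documented kinds (hypothetical helper)."""
    if kind not in FETCH_KINDS:
        raise ValueError(f"unsupported kind: {kind!r}")
    return f"{base_url}/fetch?{urlencode({'kind': kind, 'url': url})}"


if __name__ == "__main__":
    print(build_search_url("open websearch", limit=12, engines=["bing", "duckduckgo"]))
    print(build_fetch_url("csdn", "https://blog.csdn.net/example/article/details/xxxxx"))
```

The resulting strings can be passed to any HTTP client. The response format is not specified here, so the sketch stops at URL construction.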
Run `python main.py --help` to see available commands. Key subcommands:

```bash
python main.py search "open websearch" --limit 12 --engines bing,duckduckgo
```

Each fetcher prints JSON or plain text:

```bash
python main.py fetch-csdn <url>
python main.py fetch-linuxdo <url>
python main.py fetch-juejin <url>
python main.py fetch-github <repo-url>
```
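
Because a fetcher may print either JSON or plain text, a script that shells out to the CLI has to handle both. The sketch below shows one way to do that. `run_fetcher` and `parse_cli_output` are hypothetical helpers, and the sketch assumes that `main.py` is in the working directory and that JSON output is always an object or an array.

```python
import json
import subprocess
import sys


def parse_cli_output(raw):
    """Return parsed JSON for object/array output, otherwise the stripped text."""
    text = raw.strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return text
    # Treat bare scalars (e.g. "42") as plain text rather than JSON payloads.
    return parsed if isinstance(parsed, (dict, list)) else text


def run_fetcher(subcommand, url):
    """Run a fetch subcommand such as "fetch-csdn" and parse whatever it prints."""
    completed = subprocess.run(
        [sys.executable, "main.py", subcommand, url],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return parse_cli_output(completed.stdout)
```

Calling `run_fetcher("fetch-juejin", url)` then yields either a `dict`/`list` or a `str`, depending on what the subcommand printed.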
Environment variables mirror the TypeScript version:
| Variable | Default | Description |
|---|---|---|
| `DEFAULT_SEARCH_ENGINE` | `bing` | Default search engine |
| `ALLOWED_SEARCH_ENGINES` | (empty) | Comma-separated whitelist |
| `USE_PROXY` / `PROXY_URL` | `false` / `http://127.0.0.1:7890` | HTTP proxy for requests |

Set `USE_PROXY=true` to route all HTTP traffic through `PROXY_URL`.
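
To illustrate how these variables could be interpreted, here is a minimal sketch that reads them with the defaults from the table. The `Settings` class and the `load_settings` and `proxies_for` helpers are hypothetical and are not taken from the project's source. The sketch also assumes that only the literal value `true` (case-insensitive) enables the proxy.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping, Optional, Tuple


@dataclass(frozen=True)
class Settings:
    default_engine: str
    allowed_engines: Tuple[str, ...]  # empty tuple means "no whitelist"
    use_proxy: bool
    proxy_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the documented variables, falling back to the table's defaults."""
    env = os.environ if env is None else env
    raw_allowed = env.get("ALLOWED_SEARCH_ENGINES", "")
    allowed = tuple(e.strip().lower() for e in raw_allowed.split(",") if e.strip())
    return Settings(
        default_engine=env.get("DEFAULT_SEARCH_ENGINE", "bing"),
        allowed_engines=allowed,
        # Assumption: only "true" turns the proxy on.
        use_proxy=env.get("USE_PROXY", "false").strip().lower() == "true",
        proxy_url=env.get("PROXY_URL", "http://127.0.0.1:7890"),
    )


def proxies_for(settings: Settings) -> Optional[Dict[str, str]]:
    """Return a scheme-to-proxy mapping, or None when the proxy is disabled."""
    if not settings.use_proxy:
        return None
    return {"http": settings.proxy_url, "https": settings.proxy_url}
```

The mapping returned by `proxies_for` has the shape that HTTP libraries such as `requests` accept for their `proxies` argument.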

- `src/engine/search_service.py` orchestrates multi-engine searches with distribution logic.
- `src/engines/*` implement individual search/fetch adapters for each provider.
- `src/utils/` contains HTTP helpers, Playwright bridges for future browser fallbacks, and shared fetch logic for CSDN articles.
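
The distribution logic itself is not described here. As one possible reading, the sketch below splits a requested result limit as evenly as possible across the selected engines. `distribute_limit` is a hypothetical helper, not the function in `search_service.py`, and the even-split policy is an assumption.

```python
from typing import Dict, Iterable


def distribute_limit(limit: int, engines: Iterable[str]) -> Dict[str, int]:
    """Split `limit` across engines; earlier engines absorb any remainder."""
    unique = list(dict.fromkeys(engines))  # drop duplicates, keep order
    if not unique:
        raise ValueError("at least one engine is required")
    if limit < 0:
        raise ValueError("limit must be non-negative")
    base, extra = divmod(limit, len(unique))
    return {
        name: base + (1 if index < extra else 0)
        for index, name in enumerate(unique)
    }


if __name__ == "__main__":
    print(distribute_limit(12, ["bing", "duckduckgo"]))
    print(distribute_limit(5, ["bing", "duckduckgo", "brave"]))
```

With this policy the per-engine quotas always sum to the requested limit, so a caller can merge the per-engine results without trimming.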