┌──────────────────────────────────────────────────────────────────────┐
│  Render Workflow                                                     │
│                                                                      │
│  sf-pulse-python-daily (cron) ───▶ sf-pulse-python-workflow          │
│  (triggers via Render SDK)         (runs Python tasks)               │
│                                                                      │
│  daily_refresh orchestrator                                          │
│  ├── fetch_eater_sf     ──▶ list[RawArticle]                         │
│  ├── fetch_sfist        ──▶ list[NewRestaurant] (regex)              │
│  ├── fetch_michelin     ──▶ list[NewRestaurant] (regex)              │
│  ├── search_restaurants ──▶ list[RawArticle]    (DDG)                │
│  ├── fetch_funcheap     ──▶ list[NewEvent]                           │
│  ├── fetch_famsf        ──▶ list[NewEvent]                           │
│  ├── fetch_cal_academy  ──▶ list[NewEvent]                           │
│  ├── search_events      ──▶ list[RawArticle]                         │
│  │                                                                   │
│  ▼                                                                   │
│  LLM extraction (OpenAI/Anthropic): articles → structured items      │
│  ▼                                                                   │
│  apply_discovered_items                                              │
│  ├── deduplicate                                                     │
│  ├── upsert (Postgres ON CONFLICT)                                   │
│  ├── broadcast SSE event ─────────┐                                  │
│  └── push to subscribers ──┐      │                                  │
└────────────────────────────┼──────┼──────────────────────────────────┘
                             │      │
                             ▼      ▼
                  ┌────────────────────────────┐
                  │      sf-pulse-python       │
                  │       (FastAPI web)        │
                  │                            │
                  │  / (Jinja2)                │
                  │  /api/* (JSON)             │
                  │  /api/events-stream (SSE)  │
                  │  /diagram/* (Vite/React)   │
                  └────────────────────────────┘
                     │                     │
                     ▼                     ▼
             ┌───────────────┐   ┌───────────────────┐
             │   Postgres    │   │       Redis       │
             │  (sf-...-db)  │   │ (sf-...-realtime) │
             └───────────────┘   └───────────────────┘
Each scraper in `app.sources` produces either `list[NewRestaurant]` / `list[NewEvent]` directly (regex sources: SFist, Michelin, FunCheap, FAMSF, Cal Academy) or `list[RawArticle]` for the LLM pipeline to extract from (Eater SF, DuckDuckGo).
`app.llm.pipeline.extract_restaurants_from_articles` and `extract_events_from_articles` batch articles into ~12K-character chunks, send each batch to the configured provider (OpenAI via `chat.completions.parse` with a Pydantic `response_format`, or Anthropic via tool use), and merge the results.
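The batching step can be pictured with a minimal sketch. The `RawArticle` fields, the `batch_articles` helper, and the greedy grouping strategy are assumptions for illustration; only the ~12K-character budget comes from the description above.

```python
from dataclasses import dataclass

MAX_BATCH_CHARS = 12_000  # mirrors the "~12K-character" budget described above


@dataclass
class RawArticle:
    # Hypothetical shape; the project's RawArticle may carry different fields.
    url: str
    title: str
    text: str


def batch_articles(articles, max_chars=MAX_BATCH_CHARS):
    """Greedily group articles so each batch stays within max_chars of text.

    A single article larger than max_chars still gets a batch of its own.
    """
    batches, current, size = [], [], 0
    for article in articles:
        length = len(article.title) + len(article.text)
        # Close the current batch when this article would push it over budget.
        if current and size + length > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(article)
        size += length
    if current:
        batches.append(current)
    return batches
```

Each batch would then go to the provider as one request, and the per-batch results would be concatenated before deduplication.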
The provider is auto-detected from the `LLM_API_KEY` prefix (`sk-ant-` → Anthropic, else OpenAI) unless `LLM_PROVIDER` is explicitly set.
If `LLM_API_KEY` is not configured, the factory returns `None` and the pipeline emits an empty list; callers continue with regex-only sources.
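A rough sketch of the detection rule follows. The function names `detect_provider` and `extract_or_empty` are hypothetical, and the precedence shown (a missing key wins over an explicit `LLM_PROVIDER`) is an assumption rather than confirmed behavior.

```python
import os


def detect_provider(env=None):
    """Return "anthropic", "openai", or None when no API key is configured."""
    env = os.environ if env is None else env
    api_key = env.get("LLM_API_KEY", "").strip()
    if not api_key:
        # Assumption: without a key there is no provider, even if LLM_PROVIDER is set.
        return None
    explicit = env.get("LLM_PROVIDER", "").strip().lower()
    if explicit:
        return explicit
    return "anthropic" if api_key.startswith("sk-ant-") else "openai"


def extract_or_empty(articles, extract, env=None):
    """Run `extract(provider, articles)` or return [] when no provider is available."""
    provider = detect_provider(env)
    if provider is None:
        return []  # callers fall back to regex-only sources
    return extract(provider, articles)
```

Passing `env` as a plain dict keeps the rule easy to exercise in tests without touching the process environment.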
- Restaurants: `identity_key = lower(name) | (lower(address) || lower(neighborhood))`. `ON CONFLICT (identity_key)` updates fields.
- Events: `dedupe_key = lower(title) | lower(location) | lower(normalized_date_text)`. Events with the same dedupe key are treated as the same event.

`app.refresh` also has fuzzier matching strategies for "near-miss" duplicates (e.g. address normalization, source-URL match) before falling back to identity-key match.
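The two keys could be computed along these lines. This is a sketch under assumptions: `||` is read as "address, or neighborhood when the address is empty", the extra `strip()` is not in the formulas above, and the helper names are hypothetical. The fuzzier near-miss strategies are not shown.

```python
def _norm(value):
    """Lowercase and trim a field; None becomes an empty string."""
    return (value or "").strip().lower()


def restaurant_identity_key(name, address=None, neighborhood=None):
    # Assumption: fall back to the neighborhood only when the address is empty.
    location = _norm(address) or _norm(neighborhood)
    return f"{_norm(name)}|{location}"


def event_dedupe_key(title, location, normalized_date_text):
    return "|".join(_norm(part) for part in (title, location, normalized_date_text))


def deduplicate(items, key_fn):
    """Keep the first item seen for each key, preserving input order."""
    seen = {}
    for item in items:
        seen.setdefault(key_fn(item), item)
    return list(seen.values())
```

For example, `restaurant_identity_key("Nopa", "560 Divisadero St", "NoPa")` yields `nopa|560 divisadero st`, and the same restaurant without an address yields `nopa|nopa`.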
- `app.sse.broadcast(event, data)` publishes to Redis (`sf-pulse:realtime` channel) when `REDIS_URL` is set, falling back to in-process fan-out otherwise.
- The `/api/events-stream` endpoint creates a per-client async queue and sends a heartbeat every 25 seconds.
- The browser receives `restaurants` and `events` events with `{version, upserted, deleted, summary}` payloads. The current `static/home.js` does a soft reload after a brief debounce; a future enhancement could splice rows in place.
- VAPID keys live in `VAPID_PUBLIC_KEY` / `VAPID_PRIVATE_KEY`. If unset, the push fan-out is silently skipped.
- After `apply_discovered_items` finishes, only subscribers whose preferences match the new items receive a push (`restaurant_matches_push_preferences` / `event_matches_push_preferences`).
- Push provider endpoints are restricted to a trusted hostname allowlist (`is_trusted_push_endpoint`).
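The three push rules above might combine roughly as follows. The allowlist entries, the subscription dict shape, and `select_push_targets` are assumptions for illustration; `matches` stands in for `restaurant_matches_push_preferences` / `event_matches_push_preferences`, whose real signatures are not shown here.

```python
from urllib.parse import urlparse

# Illustrative allowlist of well-known push services; the project's list may differ.
TRUSTED_PUSH_HOSTS = (
    "fcm.googleapis.com",
    "updates.push.services.mozilla.com",
    "push.apple.com",
    "notify.windows.com",
)


def is_trusted_push_endpoint(endpoint):
    """Accept only https URLs whose host is an allowed host or one of its subdomains."""
    parsed = urlparse(endpoint)
    host = (parsed.hostname or "").lower()
    if parsed.scheme != "https" or not host:
        return False
    return any(host == h or host.endswith("." + h) for h in TRUSTED_PUSH_HOSTS)


def select_push_targets(subscriptions, items, matches, env):
    """Return the subscriptions that should receive a push for `items`."""
    if not (env.get("VAPID_PUBLIC_KEY") and env.get("VAPID_PRIVATE_KEY")):
        return []  # push fan-out is skipped when VAPID keys are unset
    return [
        sub
        for sub in subscriptions
        if is_trusted_push_endpoint(sub["endpoint"])
        and any(matches(item, sub.get("preferences", {})) for item in items)
    ]
```

Matching on the parsed hostname, rather than on a substring of the URL, avoids accepting endpoints such as `https://evil.example.com/fcm.googleapis.com`.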