feat(incidents): Create postmortem doc at incident declaration#222
feat(incidents): Create postmortem doc at incident declaration#222rgibert wants to merge 7 commits into
Conversation
Move Notion postmortem page creation from the dumpslack flow to on_incident_created so the doc exists from the start. The page is added as a Slack bookmark without posting a message. When dumpslack runs later (on status change to mitigated/done/postmortem), it finds the existing page and populates it with channel content. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/4jYVXERt_otp5j1rF_YAEgnh2RY8c8mk3U9J19IuwUI
…th declaration When _create_postmortem_doc commits a placeholder row (url="") and is still calling the Notion API, _trigger_slack_dump could see the empty URL and fall through to create a second page. Poll for the URL to appear before proceeding, avoiding orphaned Notion pages. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/zceqUSIMWMxN-_wwsojxMiKzfo3l8cdsSTarMkl9IOM
…ing loop When _create_postmortem_doc fails and cleans up the placeholder row, refresh_from_db() raises DoesNotExist. When the process crashes without cleanup, the empty placeholder blocks all retries indefinitely. Now catch DoesNotExist during polling and re-acquire the row on either deletion or timeout, taking ownership of orphaned placeholders instead of telling the user to retry manually. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/gaiK6_e-BFE9nt5GSu3LgLCnqJ_a_AIUTpycqehKAUs
…rning empty When dumpslack loses the re-lock race (another process saved a Notion URL first), use the winner's page URL and continue with apply_template to populate it with Slack history. Previously the loser returned immediately, leaving the winning page empty. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/cl-NMNVbu5jWyVJ1ATQwcvv_niIuI5oowy27M4fC3KY
| else: | ||
| notion_link.url = page_url | ||
| notion_link.save(update_fields=["url"]) | ||
| notion_page_created = True |
There was a problem hiding this comment.
Bookmark failure never retried later
Medium Severity
Postmortem Slack bookmarks are now added in _create_postmortem_doc, while dumpslack still only calls add_bookmark when it created the page (notion_page_created). If early creation saves the Notion link but add_bookmark fails, a later dumpslack run uses the existing link and never retries the bookmark, so the channel can lack a postmortem bookmark permanently.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 6c93a66. Configure here.
There was a problem hiding this comment.
False positive for this PR. The bookmark retry gap is pre-existing -- before this PR, if dumpslack's own add_bookmark call failed, there was no retry mechanism either.
| "Race condition: concurrent call already created postmortem doc for %s", | ||
| incident.incident_number, | ||
| ) | ||
| return |
There was a problem hiding this comment.
Race loser drops orphan Notion page
Medium Severity
When _create_postmortem_doc re-locks and finds link.url already set, it logs and returns without using the stored URL. If dumpslack (or another racer) saved first, this path leaves the Notion page it just created unused.
Reviewed by Cursor Bugbot for commit 84673d7. Configure here.
…ing tests _trigger_slack_dump left an empty-URL ExternalLink placeholder when the Notion API call failed, causing a 15-second polling delay on retry and blocking _create_postmortem_doc from creating the page. Add the same cleanup that _create_postmortem_doc already performs. Also add tests for the pre-existing URL branch (dumpslack finds an already-populated Notion URL and calls apply_template with update_slack=True) and for the placeholder cleanup on failure. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/3OqlwFB6o1EuMYtSikhAnJGwfLF5tR9hl2uTJbVDUTQ
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 4 total unresolved issues (including 3 from previous reviews).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 48ea6dc. Configure here.
| if not page_id: | ||
| return |
There was a problem hiding this comment.
Bug: In a race condition, if the winner's Notion URL is unparseable, the function returns silently without notifying the user, unlike other similar error paths.
Severity: MEDIUM
Suggested Fix
Add error handling to the race condition loser path. When _extract_notion_page_id returns None, post an error message to the Slack channel, similar to the handling on lines 99-113. This will inform the user about the failure to parse the Notion page URL.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: src/firetower/slack_app/handlers/dumpslack.py#L145-L146
Potential issue: In `_trigger_slack_dump`, when a race condition loser attempts to adopt
the winner's Notion page, it calls `_extract_notion_page_id` on the winner's URL. If
this URL is malformed and a page ID cannot be extracted, the function silently returns.
This leaves the user with an empty, unpopulated postmortem page and no notification
explaining the failure. This behavior is inconsistent with a similar check for existing
URLs earlier in the function, which correctly posts an error message to the Slack
channel. While a malformed URL is an edge case, it could occur due to database
corruption or an unexpected API response.
| from firetower.auth.models import ExternalProfile, ExternalProfileType | ||
| from firetower.incidents.hooks import ( | ||
| DEFAULT_STATUSPAGE_WARNING_BUFFER_MINUTES, | ||
| _create_postmortem_doc, |
There was a problem hiding this comment.
Postmortem template markdown silently skipped for incidents using the new early-creation flow
_create_postmortem_doc creates the Notion postmortem page via create_postmortem_page and saves the URL, but unlike create_troubleshooting_page (which applies troubleshooting_template_markdown), it never renders the configured postmortem template_markdown. When dumpslack later runs on a status change, _trigger_slack_dump finds the pre-existing URL (db_record_created=False, existing_url set) and unconditionally sets update_slack=True, so apply_template's if not update_slack and self.template_markdown: branch is skipped. The result is that the postmortem template is never populated for any incident that goes through the normal flow — a behavioral regression, since previously dumpslack created the page itself with update_slack=False and applied the template.
Evidence
_create_postmortem_doc(hooks.py:802) callsnotion.create_postmortem_page(...)and savespage['url'], but never callsapply_template/_send_markdown;create_postmortem_page(notion.py:162) only sets page properties and does not rendertemplate_markdown.- Contrast
create_troubleshooting_page(notion.py), which sendstroubleshooting_template_markdownafter page creation — the postmortem path has no equivalent. - In dumpslack
_trigger_slack_dump, when the row already exists with a URL, line ~98 entersif not db_record_created and existing_url:and setsupdate_slack = True(dumpslack.py:113). apply_template(notion.py:256) gates the template onif not update_slack and self.template_markdown:, so withupdate_slack=Truethe postmortemtemplate_markdownis never applied.
Identified by Warden code-review · KG8-CZB
| "Race condition: concurrent call already created postmortem doc for %s", | ||
| incident.incident_number, | ||
| ) | ||
| notion.archive_page(page["id"]) |
There was a problem hiding this comment.
notion.archive_page() called while holding SELECT FOR UPDATE lock inside transaction
Move notion.archive_page(page["id"]) outside the transaction.atomic() block — the _create_troubleshooting_doc sibling explicitly documents this rule: "Notion API calls happen outside the transaction to avoid holding the SELECT FOR UPDATE lock while making slow external requests."
Evidence
_create_troubleshooting_doc(hooks.py ~l.726) has an explicit comment: "Notion API calls happen outside the transaction to avoid holding the SELECT FOR UPDATE lock while making slow external requests" and never calls Notion insidetransaction.atomic().- In
_create_postmortem_doc, the secondwith transaction.atomic():block issuesExternalLink.objects.select_for_update().get(...)then, on the race-condition branch, callsnotion.archive_page(page["id"])at line 833 before the transaction closes — theSELECT FOR UPDATErow lock is held for the full duration of the HTTP call to Notion. NotionService.archive_pagecallsself.client.pages.update(...)(notion.py:158), a synchronous network request; if slow or hanging, it blocks the lock for all other writers on thisExternalLinkrow.- The
_create_troubleshooting_docrace-condition branch (hooks.py ~l.744) simply logs and returns without any Notion call inside the transaction, confirming the intended pattern.
Also found at 1 additional location
src/firetower/slack_app/handlers/dumpslack.py:142-142
Identified by Warden find-bugs · 8LG-TC8


Move Notion postmortem page creation from the dumpslack flow (triggered on status change to mitigated/done/postmortem) to
on_incident_createdso the doc exists from the start of the incident.The page is added as a Slack channel bookmark without posting a message to the channel. When dumpslack runs later on status change, it finds the existing page via the
ExternalLink(type=NOTION)record and populates it with Slack channel content and AI timeline -- no duplicate page creation occurs.The new
_create_postmortem_docfunction follows the same DB-dedup pattern as_create_troubleshooting_doc: SELECT FOR UPDATE to claim the ExternalLink row, Notion API call outside the transaction, race-condition guard on re-lock, and placeholder cleanup on failure.Resolves RELENG-32
Agent transcript: https://claudescope.sentry.dev/share/UOIi3zUcU_qU3uMtvatWsPs__r3nGeTIqoLl5MUiFnI