Skip to content

feat(incidents): Create postmortem doc at incident declaration#222

Open
rgibert wants to merge 7 commits into
mainfrom
rgibert/postmortem-doc-at-declaration
Open

feat(incidents): Create postmortem doc at incident declaration#222
rgibert wants to merge 7 commits into
mainfrom
rgibert/postmortem-doc-at-declaration

Conversation

@rgibert
Copy link
Copy Markdown
Member

@rgibert rgibert commented May 28, 2026

Move Notion postmortem page creation from the dumpslack flow (triggered on status change to mitigated/done/postmortem) to on_incident_created so the doc exists from the start of the incident.

The page is added as a Slack channel bookmark without posting a message to the channel. When dumpslack runs later on status change, it finds the existing page via the ExternalLink(type=NOTION) record and populates it with Slack channel content and AI timeline -- no duplicate page creation occurs.

The new _create_postmortem_doc function follows the same DB-dedup pattern as _create_troubleshooting_doc: SELECT FOR UPDATE to claim the ExternalLink row, Notion API call outside the transaction, race-condition guard on re-lock, and placeholder cleanup on failure.

Resolves RELENG-32

Agent transcript: https://claudescope.sentry.dev/share/UOIi3zUcU_qU3uMtvatWsPs__r3nGeTIqoLl5MUiFnI

Move Notion postmortem page creation from the dumpslack flow to
on_incident_created so the doc exists from the start. The page is
added as a Slack bookmark without posting a message. When dumpslack
runs later (on status change to mitigated/done/postmortem), it finds
the existing page and populates it with channel content.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/4jYVXERt_otp5j1rF_YAEgnh2RY8c8mk3U9J19IuwUI
Comment thread src/firetower/incidents/hooks.py
@linear-code
Copy link
Copy Markdown

linear-code Bot commented May 31, 2026

RELENG-32

…th declaration

When _create_postmortem_doc commits a placeholder row (url="") and is
still calling the Notion API, _trigger_slack_dump could see the empty
URL and fall through to create a second page. Poll for the URL to
appear before proceeding, avoiding orphaned Notion pages.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/zceqUSIMWMxN-_wwsojxMiKzfo3l8cdsSTarMkl9IOM
@rgibert rgibert marked this pull request as ready for review June 1, 2026 14:36
@rgibert rgibert requested a review from a team as a code owner June 1, 2026 14:36
@rgibert rgibert self-assigned this Jun 1, 2026
Comment thread src/firetower/slack_app/handlers/dumpslack.py Outdated
Comment thread src/firetower/incidents/tests/test_hooks.py
…ing loop

When _create_postmortem_doc fails and cleans up the placeholder row,
refresh_from_db() raises DoesNotExist. When the process crashes without
cleanup, the empty placeholder blocks all retries indefinitely.

Now catch DoesNotExist during polling and re-acquire the row on either
deletion or timeout, taking ownership of orphaned placeholders instead
of telling the user to retry manually.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/gaiK6_e-BFE9nt5GSu3LgLCnqJ_a_AIUTpycqehKAUs
Comment thread src/firetower/slack_app/handlers/dumpslack.py
Comment thread src/firetower/slack_app/handlers/dumpslack.py
…rning empty

When dumpslack loses the re-lock race (another process saved a Notion
URL first), use the winner's page URL and continue with apply_template
to populate it with Slack history. Previously the loser returned
immediately, leaving the winning page empty.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/cl-NMNVbu5jWyVJ1ATQwcvv_niIuI5oowy27M4fC3KY
Comment thread src/firetower/slack_app/handlers/dumpslack.py
else:
notion_link.url = page_url
notion_link.save(update_fields=["url"])
notion_page_created = True
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bookmark failure never retried later

Medium Severity

Postmortem Slack bookmarks are now added in _create_postmortem_doc, while dumpslack still only calls add_bookmark when it created the page (notion_page_created). If early creation saves the Notion link but add_bookmark fails, a later dumpslack run uses the existing link and never retries the bookmark, so the channel can lack a postmortem bookmark permanently.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 6c93a66. Configure here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

False positive for this PR. The bookmark retry gap is pre-existing -- before this PR, if dumpslack's own add_bookmark call failed, there was no retry mechanism either.

Comment thread src/firetower/incidents/hooks.py
Comment thread src/firetower/incidents/hooks.py Outdated
Comment thread src/firetower/slack_app/handlers/dumpslack.py
Comment thread src/firetower/slack_app/tests/handlers/test_dumpslack.py
Comment thread src/firetower/incidents/hooks.py Outdated
"Race condition: concurrent call already created postmortem doc for %s",
incident.incident_number,
)
return
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Race loser drops orphan Notion page

Medium Severity

When _create_postmortem_doc re-locks and finds link.url already set, it logs and returns without using the stored URL. If dumpslack (or another racer) saved first, this path leaves the Notion page it just created unused.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 84673d7. Configure here.

…ing tests

_trigger_slack_dump left an empty-URL ExternalLink placeholder when the
Notion API call failed, causing a 15-second polling delay on retry and
blocking _create_postmortem_doc from creating the page. Add the same
cleanup that _create_postmortem_doc already performs.

Also add tests for the pre-existing URL branch (dumpslack finds an
already-populated Notion URL and calls apply_template with
update_slack=True) and for the placeholder cleanup on failure.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/3OqlwFB6o1EuMYtSikhAnJGwfLF5tR9hl2uTJbVDUTQ
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 48ea6dc. Configure here.

Comment thread src/firetower/slack_app/handlers/dumpslack.py
Comment thread src/firetower/slack_app/handlers/dumpslack.py
Comment on lines +145 to +146
if not page_id:
return
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: In a race condition, if the winner's Notion URL is unparseable, the function returns silently without notifying the user, unlike other similar error paths.
Severity: MEDIUM

Suggested Fix

Add error handling to the race condition loser path. When _extract_notion_page_id returns None, post an error message to the Slack channel, similar to the handling on lines 99-113. This will inform the user about the failure to parse the Notion page URL.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: src/firetower/slack_app/handlers/dumpslack.py#L145-L146

Potential issue: In `_trigger_slack_dump`, when a race condition loser attempts to adopt
the winner's Notion page, it calls `_extract_notion_page_id` on the winner's URL. If
this URL is malformed and a page ID cannot be extracted, the function silently returns.
This leaves the user with an empty, unpopulated postmortem page and no notification
explaining the failure. This behavior is inconsistent with a similar check for existing
URLs earlier in the function, which correctly posts an error message to the Slack
channel. While a malformed URL is an edge case, it could occur due to database
corruption or an unexpected API response.

from firetower.auth.models import ExternalProfile, ExternalProfileType
from firetower.incidents.hooks import (
DEFAULT_STATUSPAGE_WARNING_BUFFER_MINUTES,
_create_postmortem_doc,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postmortem template markdown silently skipped for incidents using the new early-creation flow

_create_postmortem_doc creates the Notion postmortem page via create_postmortem_page and saves the URL, but unlike create_troubleshooting_page (which applies troubleshooting_template_markdown), it never renders the configured postmortem template_markdown. When dumpslack later runs on a status change, _trigger_slack_dump finds the pre-existing URL (db_record_created=False, existing_url set) and unconditionally sets update_slack=True, so apply_template's if not update_slack and self.template_markdown: branch is skipped. The result is that the postmortem template is never populated for any incident that goes through the normal flow — a behavioral regression, since previously dumpslack created the page itself with update_slack=False and applied the template.

Evidence
  • _create_postmortem_doc (hooks.py:802) calls notion.create_postmortem_page(...) and saves page['url'], but never calls apply_template/_send_markdown; create_postmortem_page (notion.py:162) only sets page properties and does not render template_markdown.
  • Contrast create_troubleshooting_page (notion.py), which sends troubleshooting_template_markdown after page creation — the postmortem path has no equivalent.
  • In dumpslack _trigger_slack_dump, when the row already exists with a URL, line ~98 enters if not db_record_created and existing_url: and sets update_slack = True (dumpslack.py:113).
  • apply_template (notion.py:256) gates the template on if not update_slack and self.template_markdown:, so with update_slack=True the postmortem template_markdown is never applied.

Identified by Warden code-review · KG8-CZB

"Race condition: concurrent call already created postmortem doc for %s",
incident.incident_number,
)
notion.archive_page(page["id"])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

notion.archive_page() called while holding SELECT FOR UPDATE lock inside transaction

Move notion.archive_page(page["id"]) outside the transaction.atomic() block — the _create_troubleshooting_doc sibling explicitly documents this rule: "Notion API calls happen outside the transaction to avoid holding the SELECT FOR UPDATE lock while making slow external requests."

Evidence
  • _create_troubleshooting_doc (hooks.py ~l.726) has an explicit comment: "Notion API calls happen outside the transaction to avoid holding the SELECT FOR UPDATE lock while making slow external requests" and never calls Notion inside transaction.atomic().
  • In _create_postmortem_doc, the second with transaction.atomic(): block issues ExternalLink.objects.select_for_update().get(...) then, on the race-condition branch, calls notion.archive_page(page["id"]) at line 833 before the transaction closes — the SELECT FOR UPDATE row lock is held for the full duration of the HTTP call to Notion.
  • NotionService.archive_page calls self.client.pages.update(...) (notion.py:158), a synchronous network request; if slow or hanging, it blocks the lock for all other writers on this ExternalLink row.
  • The _create_troubleshooting_doc race-condition branch (hooks.py ~l.744) simply logs and returns without any Notion call inside the transaction, confirming the intended pattern.
Also found at 1 additional location
  • src/firetower/slack_app/handlers/dumpslack.py:142-142

Identified by Warden find-bugs · 8LG-TC8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant