Skip to content

chore: Synth Daemon ADR#1579

Draft
megha-narayanan wants to merge 6 commits into
aws:feat/cdk-lsp-explorerfrom
megha-narayanan:feat/explorer-daemon-adr
Draft

chore: Synth Daemon ADR#1579
megha-narayanan wants to merge 6 commits into
aws:feat/cdk-lsp-explorerfrom
megha-narayanan:feat/explorer-daemon-adr

Conversation

@megha-narayanan
Copy link
Copy Markdown

Architecture Decision Record for the per-project synth daemon

Covers:

  • Why a daemon vs. no coordination vs. directory locking (flock)
  • Daemon lifecycle (start, discovery, shutdown, singleton enforcement)
  • CDK app stdout/stderr handling (via IIoHost)
  • TTY behavior
  • Failure modes and recovery
  • Assumptions that would invalidate this decision

Checklist

  • This change contains a major version upgrade for a dependency and I confirm all breaking changes are addressed
    • Release notes for the new version:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

Addresses Rico's review comment on PR aws#1561 requesting architectural
justification. Covers: why a daemon (vs flock), lifecycle, stdout/TTY
handling, failure modes, and assumptions that would invalidate it.
and a web server open gets 2 synths per save. With rapid saves
during a refactor, both processes spend 100% of their time
synthesizing.
- No client knows whether its `cdk.out/` is fresh. Each must either
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we solve this with additional bookkeeping though?

synthesizing.
- No client knows whether its `cdk.out/` is fresh. Each must either
poll the filesystem or re-synth speculatively.
- Concurrent writes to `cdk.out/` from two simultaneous synths could
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't happen. There is already locking on the cdk.out directory.

- Concurrent writes to `cdk.out/` from two simultaneous synths could
produce inconsistent output.

#### Option B: Directory locking (flock on `cdk.out/`)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't necessarily need to use flock(), there is already a locking system. It may not be great, but it's a small delta from what we currently use. Could potentially be improved by using flock(), who knows.

Comment on lines +52 to +53
- *If losers synth unconditionally after acquiring the lock
("optimistic"):* This is wasted compute — each client re-synths
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this says exactly. "If losers retry until they acquire the lock, nothing changes in between, but the loser still triggers a resynth, then that would be wasted effort".

True.

But:

  1. That's not what your diagram shows. Your diagram shows an actual change in between clients A and B (*), so I think the sequence of what happens in optimistic-flock.png is correct. First we have version A, and then we have version C. We can display both, as soon as we have it.
  2. It's also not necessary to unconditionally resynth. With some additional bookkeeping we can avoid it.

(*) Diagramming tip: if you call your clients A and B, name your versions something else (1, 2, 3, or X, Y, Z). I would have liked to just refer to "A" but that is now ambiguous whether I mean a client or a version.

Comment on lines +55 to +56
With 3 clients and frequent saves, the system may spend near 100%
of wall-clock time synthesizing. Since there is no method to
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but watch out. Ask yourself what the essence is of what is happening.

If there are frequent saves (as in, saves come in faster than the time it takes to synthesize), how does a daemon prevent 100% wall clock time spent in synth?

I think you might be getting at a different fundamental trade-off here, CPU usage vs staleness in the face of frequent changes:

image

But note that this trade-off is exactly the same with or without a daemon!

Comment on lines +124 to +125
- Started by: first client that calls `acquireDaemon(projectDir)`
(attempts connection, spawns if absent)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What client? Only LSP and cdk explore, is that right?

Any plans of getting cdk watch in there? cdk synth perhaps?

Comment on lines +135 to +137
The daemon implements `IIoHost` from `@aws-cdk/toolkit-lib`. All
CDK app output during synthesis arrives as structured `IoMessage`
objects via `notify()`. Nothing is silently lost.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure today just attach the subprocess to the parent process' terminal, so this is not true.

The question is to engage with what will happen if we change that assumption. It may be hard for you to gauge the effect of this, as it's quite esoteric, I understand. Search the bug tracker for what happened last time we disconnected a user app from the terminal 😉

**Terminal attachment (TTY):**

Not a concern. The CDK app is always spawned with piped stdio by
`execInChildProcess` in toolkit-lib — `process.stdout.isTTY` is
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps in toolkit-lib, but not in aws-cdk.

| ------------------------------ | ------------------------------------------- |
| Daemon crashes mid-synth | Client detects socket close, respawns |
| Concurrent spawn race | Exclusive lock file; loser re-checks socket |
| Synth hangs | 5-min timeout → `synthFailed` broadcast |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synth sometimes takes 40 minutes for some people.

Comment on lines +178 to +179
The file-write phase is fast (milliseconds), so the window is tiny
in practice.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not true, it can takes dozens of seconds. Where do the sources for this statement come from?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants