feat(upgrade-signal): wire upgrade-signal support into consensus nodes#3726
Conversation
🟡 Heimdall Review Status
|
d39a50b to
1715ab8
Compare
1715ab8 to
a1ea274
Compare
a1ea274 to
79bbd92
Compare
refcell
left a comment
There was a problem hiding this comment.
Looking very good. Left a few comments that I think are worth addressing before we merge.
|
Review Error for refcell @ 2026-06-24 11:38:25 UTC |
| /// Hardfork IDs that are allowed to mutate the local schedule. | ||
| /// Upgrade IDs that are allowed to mutate the local schedule. | ||
| /// | ||
| /// If omitted, every read hardfork ID is eligible for application when the selected mode | ||
| /// If omitted, every read upgrade ID is eligible for application when the selected mode | ||
| /// permits schedule mutation. | ||
| #[arg( | ||
| long = "upgrade-signal.apply-hardfork-id", | ||
| env = "BASE_NODE_UPGRADE_SIGNAL_APPLY_HARDFORK_ID", | ||
| long = "upgrade-signal.apply-upgrade-id", | ||
| env = "BASE_NODE_UPGRADE_SIGNAL_APPLY_UPGRADE_ID", | ||
| value_delimiter = ',' | ||
| )] | ||
| pub apply_hardfork_ids: Vec<String>, | ||
| pub apply_upgrade_ids: Vec<String>, |
There was a problem hiding this comment.
I think there is a subtle runtime issue here when upgrade-signal.apply-upgrade-id is configured to a subset of upgrades. Let me walk through what I'm seeing happens.
The runtime refresh path reads the L1 schedule for upgrade_ids, then filters it down to apply_upgrade_ids via application_schedule(). The filtered schedule is then passed into UpgradeSignalRuntimeApplier::apply_schedule(), which creates a fresh RuntimeRegistrySink.
When apply_schedule_to_sink() finishes, it calls staged_sink.finalize(), and RuntimeRegistrySink::finalize() commits by calling RuntimeUpgradeRegistry::replace_overrides(chain_id, updates) or clear_chain() if the filtered updates are empty. So the filtered application schedule is treated as the complete replacement set for the whole chain.
This means a runtime refresh can unintentionally drop existing runtime overrides for upgrades that were not included in apply_upgrade_ids. For example, if the registry currently has {Azul: 100, Cobalt: 300}, and the node is configured to read [Azul, Beryl, Cobalt] but apply only [Beryl], then the L1 read may correctly return all three signals, but application_schedule() reduces it to only {Beryl: 200}. finalize() then replaces the entire chain registry with just {Beryl: 200}, removing Azul and Cobalt even though L1 did not clear them and the user only scoped which upgrade should be applied.
I'm not sure we should allow this behavior.
One option is to just remove apply_upgrade_ids and use upgrade_ids, requiring the applied schedule to be complete. Alternatively, we could support a subset by making the runtime path merge touched IDs instead of replacing the whole registry, e.g. update only the IDs present in the filtered schedule and avoid calling clear_chain() for an empty scoped schedule. If both full replacement and scoped updates are desired, the code probably needs to carry that distinction explicitly so finalize() knows whether it is committing a complete schedule or only a selected subset.
There was a problem hiding this comment.
I went for the full schedule apply to keep it simple for now, as we don't gain too much from individual upgrade reads. We may want to add a read full schedule to the smart contract to further simplify it.
refcell
left a comment
There was a problem hiding this comment.
Took another look and I think there's a couple issues with how we configure applying upgrades as well as blocking node startup on failure.
79bbd92 to
6156bad
Compare
Adds the upgrade signal metrics actor, the admin_refreshUpgradeSignal RPC, startup schedule application to RollupConfig, and the three rollout modes (metrics-only -> startup-apply -> runtime-admin) in the consensus CLI/service. Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
The base rpc command reads one pinned startup schedule and applies it to both the execution and consensus configs before launching embedded services. Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019ef0a8-18c1-74ae-b89f-13064d7d20b3 Co-authored-by: Amp <amp@ampcode.com>
Default the execution upgrade-signal reader to the consensus L1 RPC in integrated commands, honor the upgrade-signal L1 RPC override during consensus startup reads, and fix the rebased protocol test call sites for the updated L1BlockInfoTx::try_new signature. Amp-Thread-ID: https://ampcode.com/threads/T-019ef559-ecab-707e-bbb4-f9222db8a5a3 Co-authored-by: Amp <amp@ampcode.com>
6156bad to
c276f0d
Compare
Review SummaryThis PR refactors upgrade signal handling across execution and consensus layers, introducing unified startup application ( No new critical issues found beyond what previous inline comments have raised. The prior findings remain relevant: Still-relevant prior findings
Resolved prior findings
Architecture notes
|
✅ base-std fork tests: all 616 passedbase/base is fully in sync with the base-std spec.
|
| UpgradeSignalRuntimeValidation::with_activation_admin_address( | ||
| execution_chain.activation_admin_address, | ||
| ); |
There was a problem hiding this comment.
I think we should be able to remove this now. In cobalt, the activation admin address will be in L2 state so it won't be a ChainConfig owned address. As such, I think we should be able to simplify this. Can be done in a follow on, maybe just track it with a linear ticket please so it doesn't get lost
refcell
left a comment
There was a problem hiding this comment.
Happy to land this as is and clean up in follow-ons
Summary
layerandupgradelabelsbase-upgrade-signalexportsTesting
cargo fmt --checkgit diff --check origin/main..HEADcargo check -p base-upgrade-signal -p base-consensus-node -p base-execution-cli -p base-node-core -p basecargo test -p base-upgrade-signal --libcargo test -p base-consensus-cli -p base-execution-cli -p base-node-core --lib upgrade_signalcargo test -p base --bin base upgrade_signal