refactor(consensus): Acknowledge Sequencer Inserts#2513
refactor(consensus): Acknowledge Sequencer Inserts#2513
Conversation
🟡 Heimdall Review Status
|
|
The latest updates on your projects. Learn more about Vercel for GitHub. |
| async fn execute(&self, state: &mut EngineState) -> Result<(), InsertTaskError> { | ||
| let result = self.insert_payload(state).await; | ||
| if self.result_tx.is_some() { | ||
| self.send_channel_result(result).await; | ||
| Ok(()) | ||
| } else { | ||
| result.map(|_| ()) | ||
| } | ||
| } |
There was a problem hiding this comment.
When result_tx is Some and insert_payload fails, this method sends the error through the channel and returns Ok(()). This means the outer EngineTask::execute retry loop (in task.rs:248-283) sees Ok(()) from execute_inner and does not retry the task — the Temporary retry logic is completely bypassed for sequencer inserts.
This is an intentional design choice (the sequencer handles the error via the channel response), but it has a subtle consequence: errors that would normally be retried by the engine (e.g. InsertFailed, UnexpectedPayloadStatus, ForkchoiceUpdateDidNotAdvance — all classified as Temporary) are instead surfaced as a one-shot failure to the sequencer. The sequencer's SealStepError::Insert path will retry on the next seal step tick, but this loses the tight yield_now retry loop the engine normally provides for transient RPC errors.
Worth confirming this is the intended trade-off. If transient engine RPC errors should still be retried at the engine level before responding to the sequencer, consider only returning Ok(()) on the success path and propagating errors normally (letting the engine retry), then sending the success result through the channel.
| let mut unsafe_head_rx = self.unsafe_head_rx.clone(); | ||
| loop { | ||
| let unsafe_head = *unsafe_head_rx.borrow_and_update(); | ||
| if unsafe_head == inserted_head { | ||
| trace!( | ||
| target: "sequencer", | ||
| block_number = inserted_head.block_info.number, | ||
| block_hash = %inserted_head.block_info.hash, | ||
| "Insert unsafe payload acknowledged" | ||
| ); | ||
| return Ok(inserted_head); | ||
| } | ||
|
|
||
| unsafe_head_rx.changed().await.map_err(|_| { | ||
| EngineClientError::ResponseError("unsafe head channel closed.".to_string()) | ||
| })?; | ||
| } |
There was a problem hiding this comment.
This loop waits for unsafe_head_rx to match exactly inserted_head. If a reorg, reset, or concurrent external unsafe insert causes the watch channel to advance past inserted_head (to a different block at the same or higher number), this loop will hang indefinitely — it can never match and the changed() call will keep succeeding as long as the sender is alive.
Consider adding a timeout, or comparing block numbers (e.g. unsafe_head.block_info.number >= inserted_head.block_info.number) rather than requiring exact equality, with appropriate handling for the case where the head advanced to a different block than expected.
| pub fn unsafe_payload_with_result( | ||
| client: Arc<EngineClient_>, | ||
| rollup_config: Arc<RollupConfig>, | ||
| envelope: BaseExecutionPayloadEnvelope, | ||
| result_tx: Option<mpsc::Sender<InsertTaskResult>>, | ||
| ) -> Self { | ||
| Self { | ||
| client, | ||
| rollup_config, | ||
| envelope, | ||
| payload_safety: InsertPayloadSafety::Unsafe, | ||
| result_tx, | ||
| } |
There was a problem hiding this comment.
When result_tx is None, this constructor is functionally identical to unsafe_payload. Consider whether this constructor should require a mpsc::Sender (non-optional) since its sole purpose is the "with result" variant, and use unsafe_payload when no channel is needed. This would make the API self-documenting and avoid the Option ambiguity at the call site.
At the enqueue_unsafe_payload_insert call site in engine_request_processor.rs, the external path already passes None — it could just call InsertTask::unsafe_payload directly instead.
|
Note for the next cleanup step: the acknowledged insert result is intentionally using an mpsc sender in this PR, even though the result is logically one-shot. The current task queue still requires InsertTask / EngineTask to be cloneable, and tokio oneshot senders are not Clone. Once the engine path is flattened and direct methods replace the task wrappers, we should convert this and the other request/response channels to oneshot where each request has exactly one response. |
| .await?; | ||
|
|
||
| if self.result_tx.is_some() && state.sync_state.unsafe_head() != new_block_ref { | ||
| return Err(InsertTaskError::ForkchoiceUpdateDidNotAdvance); |
There was a problem hiding this comment.
This guard only fires when result_tx.is_some(), meaning non-sequencer (external/derivation) inserts silently succeed even if the FCU didn't actually advance the unsafe head. If the FCU returning a mismatched head is a real concern (e.g., the EL rejected the forkchoice), it seems like it should be checked unconditionally regardless of the caller. If it's only relevant for the sequencer's watch-channel flow, the check might belong in send_channel_result or in insert_unsafe_payload on the service side rather than here, so the engine-level task doesn't have caller-specific branching.
Make local sequencer payload insertion return the inserted head after engine processing and unsafe-head watch advancement. Remove the sequencer parent handoff workaround and unused external SealRequest actor route. Co-authored-by: Codex <noreply@openai.com>
Update the action harness SequencerEngineClient implementation for acknowledged insert results and restore the insert task constructor as const for clippy. Co-authored-by: Codex <noreply@openai.com>
65ea1f1 to
0c0bc7b
Compare
Use the acknowledged inserted head as the next sequencer build parent instead of waiting for the unsafe-head watch channel to publish the same value. Co-authored-by: Codex <noreply@openai.com>
| async fn execute(&self, state: &mut EngineState) -> Result<(), InsertTaskError> { | ||
| let result = self.insert_payload(state).await; |
There was a problem hiding this comment.
When insert_payload fails and result_tx.is_some(), the error is sent to the sequencer via the channel and execute returns Ok(()). Since mpsc::Sender::send takes T by value, the error is moved into the channel — it's consumed. If the channel send itself fails (receiver dropped), the error is silently lost: send_channel_result logs a warning but has no way to propagate the original InsertTaskError back to the engine's retry loop.
This means a sequencer insert can fail with a Temporary error (e.g., InsertFailed, transient RPC timeout) AND the channel can be closed (sequencer task cancelled), and the engine treats it as success — no retry, no error propagation, no record of the failure beyond the warn! log.
| next_payload_to_seal = self.builder.build_on(inserted_head).await?; | ||
| if next_payload_to_seal.is_none() { | ||
| build_ticker.reset_immediately(); |
There was a problem hiding this comment.
When build_on(inserted_head) returns None (parent not yet indexed), reset_immediately() fires and the next tick falls through to the else branch at line 398 which calls self.builder.build(). build() reads the unsafe head from the watch channel. Since the engine has acknowledged the insert, the watch channel should reflect the correct head — but there's a race window between the engine updating its internal sync state and the watch channel sender publishing the new value. If the watch hasn't been updated yet, build() will use the previous (grandparent) head and create a block on the wrong parent.
This was the exact scenario the removed next_build_parent stash protected against. Worth confirming that the engine's watch channel publish is guaranteed to happen before insert_unsafe_payload responds on the result channel.
Review SummaryThis PR replaces the fire-and-forget sequencer insert with an acknowledged insert (engine responds via an FindingsError loss on channel failure ( Race window on Stale README reference: Missing unit test coverage: The |
Summary
This changes local sequencer payload insertion from enqueue acknowledgement to engine acknowledgement by returning the inserted unsafe head after engine processing and unsafe-head watch advancement. It removes the sequencer parent handoff workaround and deletes the unused external SealRequest actor path. The remaining external unsafe insert path stays fire-and-forget for network and delegated callers.