Skip to content

[ZEPPELIN-6419] Fix clone paragraph content loss caused by shortCircuit seq filtering#5254

Merged
tbonelee merged 4 commits into
apache:masterfrom
kevinjmh:ZEPPELIN-6419
Jun 3, 2026
Merged

[ZEPPELIN-6419] Fix clone paragraph content loss caused by shortCircuit seq filtering#5254
tbonelee merged 4 commits into
apache:masterfrom
kevinjmh:ZEPPELIN-6419

Conversation

@kevinjmh
Copy link
Copy Markdown
Member

@kevinjmh kevinjmh commented May 21, 2026

What is this PR for?

Problem
When cloning a paragraph in zeppelin-web-angular, the cloned paragraph only contains the interpreter binding (e.g., %mysql) but not the actual editor content (e.g., select 1 a, 2 a). The old zeppelin-web handles this correctly.

Root Cause Analysis
The backend copyParagraph is a two-step operation:

1. insertParagraph() → broadcasts PARAGRAPH_ADDED (empty new paragraph, text="%mysql\n")
2. updateParagraph() → broadcasts PARAGRAPH        (full text="%mysql\nselect 1 a, 2 a")

Step 2's PARAGRAPH response gets silently discarded by the frontend shortCircuit mechanism in message.ts. The original filter logic compares the currently-sent message sequence number against the received response's sequence number:

// OLD logic — overly aggressive
if (this.lastMsgIdSeqSent > msgIdSeqReceived) {
  // "message is already updated by shortcircuit" → discard!
  return false;
}

The problem: between sending COPY_PARAGRAPH (seq=49) and receiving its PARAGRAPH response, other unrelated messages like EDITOR_SETTING (seq=50) may be sent. This makes lastMsgIdSeqSent (50) > msgIdSeqReceived (49), causing the legitimate PARAGRAPH response for the cloned paragraph to be incorrectly filtered out.

image

Solution
Replace the implicit sequence-number comparison with an explicit tracking set (shortCircuitedParagraphMsgIds). Only messages that were explicitly passed to shortCircuit() get filtered — not all messages where lastMsgIdSeqSent > receivedSeq.

Updated Solution
Dropping the PARAGRAPH msgId compare filter branch when received msg

What type of PR is it?

Bug Fix

Todos

  • - Task

What is the Jira issue?

How should this be tested?

  • Strongly recommended: add automated unit tests for any new or changed behavior
  • Outline any manual steps to test the PR here.

Screenshots (if appropriate)

Questions:

  • Does the license files need to update?
  • Is there breaking changes for older versions?
  • Does this needs documentation?

@pan3793 pan3793 requested a review from tbonelee May 23, 2026 12:53
Copy link
Copy Markdown
Contributor

@tbonelee tbonelee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tracking this down. I've been going through both the client and server side and wanted to share a few observations worth weighing before landing. I might be missing context on some of these, so please correct me where I'm wrong.

1. The new tracking Set may not be populated under current code paths

The only caller of shortCircuit() I could find is runParagraph() (message.ts:432), which invokes it with op: OP.PARAGRAPH_STATUS and no msgId. The guard in the new shortCircuit():

if (message.op === OP.PARAGRAPH && message.msgId) { ... add to set ... }

doesn't appear to match in that path, so shortCircuitedParagraphMsgIds would stay empty during normal usage. If that's right, the receive-side filter would always pass through, and the runtime effect of this PR becomes "PARAGRAPH filter is bypassed." That does fix clone loss, but the Set / cap / eviction logic could end up being dead in practice. Worth checking whether there's a future caller you have in mind.

2. The 2020 heuristic may have been targeting something different from what it actually dropped

The server emits OP.PARAGRAPH in two regimes:

With echoed msgId (visible to the filter):

  • COMMIT_PARAGRAPH: NotebookServer.java:1095
  • PARAGRAPH_CLEAR_OUTPUT: NotebookServer.java:1241
  • COPY_PARAGRAPH: NotebookServer.java:1464 calls updateParagraph which broadcasts at :1095. This seems to be the clone-loss path.
  • Personalized-mode RUN_PARAGRAPH unicast: NotebookServer.java:1549

Without msgId (bypassed by if (!message.msgId) return true):

  • Paragraph execution lifecycle (PENDING / RUNNING / FINISHED): NotebookServer.java:2008 uses MSG_ID_NOT_DEFINED
  • Background output clears: :1756
  • runAllParagraphs fallback: :1499

Non-personalized runParagraph (NotebookServer.java:1528-1565) doesn't seem to echo OP.PARAGRAPH with msgId from onSuccess. The status transitions arrive via the lifecycle path without msgId and bypass the filter anyway. The server path may have looked different at the time ZEPPELIN-4985 was introduced, so the heuristic might have been more accurate originally.

3. Suggestion: keep shortCircuit, but drop the dedup branch

The instant-PENDING feedback from shortCircuit({op: OP.PARAGRAPH_STATUS, ...}) is routed to onParagraphStatus (paragraph-base.ts:81), so it's independent of the PARAGRAPH filter. And paragraph-base.ts:218 isUpdateRequired() looks like it would absorb at least some of the no-op updates from redundant echoes. Given those two, dropping the PARAGRAPH filter branch entirely might be a simpler fix than rebuilding the dedup on top of the Set.

Roughly -21 / +0 in message.ts:

receive<K extends keyof MessageReceiveDataTypeMap>(op: K): Observable<...> {
  return this.received$.pipe(
    filter(message => message.op === op),
    map(message => message.data)
  ) as Observable<...>;
}

shortCircuit(message: WebSocketMessage<MessageReceiveDataTypeMap>) {
  this.received$.next(this.interceptReceived(message));
}

Plus removing the MAX_SHORT_CIRCUIT_SIZE constant and shortCircuitedParagraphMsgIds field.

4. Local check

With an artificial Thread.sleep on the server's paragraph run path:

  • Clone preserves text correctly
  • PENDING / RUNNING / FINISHED transitions remain instant through the existing shortCircuit(PARAGRAPH_STATUS) path

I might be missing scenarios the original code was trying to handle, so feel free to push back if anything here doesn't add up.

@tbonelee
Copy link
Copy Markdown
Contributor

@jongyoul @voidmatcha Could you take a look at this when you get a chance? I only verified the clone preservation and PENDING / RUNNING / FINISHED transition scenarios locally, and there might be other cases the original filter was trying to handle that I didn't think to check.

@kevinjmh
Copy link
Copy Markdown
Member Author

@tbonelee thanks for your opinion, that's more clear

@kevinjmh kevinjmh changed the title [ZEPPELIN-6419] Fix clone paragraph content loss caused by overly aggressive shortCircuit seq filtering [ZEPPELIN-6419] Fix clone paragraph content loss caused by shortCircuit seq filtering May 26, 2026
Comment thread zeppelin-web-angular/projects/zeppelin-sdk/src/message.ts
Co-authored-by: ChanHo Lee <chanho0325@gmail.com>
Copy link
Copy Markdown
Contributor

@voidmatcha voidmatcha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more correct than my component-side workaround in #5044, so I'll close that once this merges.

On the "other cases the filter handled": it only touches OP.PARAGRAPH with a msgId (clone / personalized run / COMMIT_PARAGRAPH self-echo / PARAGRAPH_CLEAR_OUTPUT). I verified clone and personalized. The rest fall through to isUpdateRequired(), and the PENDING/RUNNING/FINISHED lifecycle never hit this filter. Looks good.

0_COMPARISON_clone_master-asis-tobe.mp4

Cursor focus on clone is out of scope here, but can be applied on top by cherry-picking voidmatcha/zeppelin@74e01edbf.

@tbonelee tbonelee merged commit 1bcd871 into apache:master Jun 3, 2026
18 of 19 checks passed
tbonelee pushed a commit that referenced this pull request Jun 3, 2026
…it seq filtering

### What is this PR for?
**Problem**
When cloning a paragraph in zeppelin-web-angular, the cloned paragraph only contains the interpreter binding (e.g., %mysql) but not the actual editor content (e.g., select 1 a, 2 a). The old zeppelin-web handles this correctly.

**Root Cause Analysis**
The backend copyParagraph is a two-step operation:

```
1. insertParagraph() → broadcasts PARAGRAPH_ADDED (empty new paragraph, text="%mysql\n")
2. updateParagraph() → broadcasts PARAGRAPH        (full text="%mysql\nselect 1 a, 2 a")
```
Step 2's PARAGRAPH response gets silently discarded by the frontend shortCircuit mechanism in message.ts. The original filter logic compares the currently-sent message sequence number against the received response's sequence number:

```ts
// OLD logic — overly aggressive
if (this.lastMsgIdSeqSent > msgIdSeqReceived) {
  // "message is already updated by shortcircuit" → discard!
  return false;
}
```
The problem: between sending COPY_PARAGRAPH (seq=49) and receiving its PARAGRAPH response, other unrelated messages like EDITOR_SETTING (seq=50) may be sent. This makes lastMsgIdSeqSent (50) > msgIdSeqReceived (49), causing the legitimate PARAGRAPH response for the cloned paragraph to be incorrectly filtered out.

<img width="807" height="951" alt="image" src="https://github.com/user-attachments/assets/3395e3eb-352c-45f7-96ca-68d3d3f5a983" />

~~**Solution**
Replace the implicit sequence-number comparison with an explicit tracking set (shortCircuitedParagraphMsgIds). Only messages that were explicitly passed to shortCircuit() get filtered — not all messages where lastMsgIdSeqSent > receivedSeq.~~

**Updated Solution**
Dropping the PARAGRAPH msgId compare filter branch when received msg

### What type of PR is it?
Bug Fix

### Todos
* [ ] - Task

### What is the Jira issue?
* Open an issue on Jira https://issues.apache.org/jira/browse/ZEPPELIN/
* Put link here, and add [ZEPPELIN-*Jira number*] in PR title, eg. [ZEPPELIN-533]

### How should this be tested?
* Strongly recommended: add automated unit tests for any new or changed behavior
* Outline any manual steps to test the PR here.

### Screenshots (if appropriate)

### Questions:
* Does the license files need to update?
* Is there breaking changes for older versions?
* Does this needs documentation?

Closes #5254 from kevinjmh/ZEPPELIN-6419.

Signed-off-by: ChanHo Lee <chanholee@apache.org>
(cherry picked from commit 1bcd871)
Signed-off-by: ChanHo Lee <chanholee@apache.org>
@tbonelee
Copy link
Copy Markdown
Contributor

tbonelee commented Jun 3, 2026

Thanks for the fix. I've Merged this into master and branch-0.12

@tbonelee
Copy link
Copy Markdown
Contributor

tbonelee commented Jun 3, 2026

@voidmatcha Thanks for the improvement suggestion. Could you open a separate PR which handles that improvement?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants