Fix data race on nativeWindowsToRedraw across threads#630

Closed
primesoftnz wants to merge 1 commit into highperformancecoder:master from primesoftnz:fix/nativewindows-data-race

Conversation

@primesoftnz
Contributor

@primesoftnz primesoftnz commented Mar 20, 2026

Summary

nativeWindowsToRedraw (std::set<RenderNativeWindow*>) is accessed from multiple threads without synchronisation:

  • requestRedraw() inserts from any thread (via NativeSurface callback)
  • ~RenderNativeWindow() erases during model operations
  • drawNativeWindows() iterates and clears on the minsky thread
  • macOSXLaunchDrawNativeWindows() reads .size() on the minsky thread

This is a data race on std::set that can corrupt the container's internal tree, potentially leading to crashes or infinite loops during iteration.

Relationship to tickets: #1906

Not certain this addresses Steve's reported freeze. The SetWindowPos removal in 3.25.1 likely addresses the primary freeze mechanism. This fix closes a correctness issue that could cause problems under load (complex models with many native windows creating tight timing between requestRedraw and drawNativeWindows), but may not be the root cause of the reported interface freeze.

Submitting separately as a correctness fix regardless.

Approach

Add a dedicated nativeWindowsToRedrawMutex (in MinskyExclude) to protect all accesses. The draw functions swap the set into a local copy under the new mutex, then release it before acquiring minskyCmdMutex — preserving existing lock scope and avoiding lock ordering issues.

This follows the swap-then-iterate pattern already used in macOSXDrawNativeWindows().

Lock ordering safety

nativeWindowsToRedrawMutex is never held simultaneously with minskyCmdMutex:

  • drawNativeWindows(): acquires nativeWindowsToRedrawMutex (swap), releases it, then acquires minskyCmdMutex (draw)
  • requestRedraw() / ~RenderNativeWindow(): acquire only nativeWindowsToRedrawMutex
  • If doCommand() triggers ~RenderNativeWindow() while holding minskyCmdMutex, the inner lock on nativeWindowsToRedrawMutex is safe (consistent minskyCmdMutex → nativeWindowsToRedrawMutex ordering)

Minor fix included

macOSXDrawNativeWindows() previously called nativeWindowsToRedraw.clear() after swapping the set out, which discarded any redraw requests made during the draw cycle. Removed — the swap already empties the original set.

Files changed

  • model/minsky.h — add std::mutex nativeWindowsToRedrawMutex to MinskyExclude
  • model/renderNativeWindow.cc — lock mutex in requestRedraw() and ~RenderNativeWindow()
  • RESTService/addon.cc — swap-under-lock in drawNativeWindows(), macOSXDrawNativeWindows(), and size check in macOSXLaunchDrawNativeWindows()


Summary by CodeRabbit

  • Bug Fixes
    • Enhanced thread safety for native window redraw operations on macOS to prevent race conditions and improve rendering stability.

@coderabbitai
Contributor

coderabbitai bot commented Mar 20, 2026

📝 Walkthrough

The PR introduces mutex-protected access to the native window redraw queue. It adds a new std::mutex member to coordinate redraw requests, updates RenderNativeWindow to acquire the lock when managing redraw state, and refactors drawing logic to use atomic snapshots instead of direct shared container access.

Changes

Changes by cohort and file:

  • Synchronization Infrastructure — model/minsky.h: added nativeWindowsToRedrawMutex member to MinskyExclude class and included <mutex> header for coordination of native window redraw requests.
  • Client-side Redraw Requests — model/renderNativeWindow.cc: updated destructor and requestRedraw() method to acquire nativeWindowsToRedrawMutex before modifying the shared nativeWindowsToRedraw container.
  • Drawing Logic & Launch Coordination — RESTService/addon.cc: refactored drawNativeWindows() to snapshot nativeWindowsToRedraw via mutex-protected swap into a local variable; updated macOSXDrawNativeWindows() to use the same pattern and removed explicit clear calls; modified the main run loop to lock the mutex before checking redraw queue size during draw launch trigger.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • #576: Modifies synchronization around native window redraws using mutexes to prevent deadlock conditions during redraw enqueuing and processing.

Suggested reviewers

  • highperformancecoder

Poem

🐰 A rabbit's ode to synchronized hops:

Locks protect our redraw queue,
No deadlock tangles, just locks that are true,
Snapshots swap safely, each draw gets its turn,
Synchronized windows—the lesson we learn!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 16.67%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title accurately and concisely captures the primary change, fixing a data race on the nativeWindowsToRedraw container accessed across multiple threads.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@model/renderNativeWindow.cc`:
- Around line 107-111: requestRedraw() reads winInfoPtr without the same lock
used by renderFrame()/destroyFrame(), introducing a race; fix by taking the same
drawMutex (the mutex used around renderFrame/destroyFrame) before accessing or
snapshotting winInfoPtr inside requestRedraw(), then use that guarded shared_ptr
(or its snapshot) when deciding to queue the window into
minsky().nativeWindowsToRedraw; keep the later lock on
minsky().nativeWindowsToRedrawMutex for the set insertion and ensure you
reference the same symbols: requestRedraw(), winInfoPtr, drawMutex,
renderFrame(), destroyFrame(), nativeWindowsToRedraw,
nativeWindowsToRedrawMutex, and minsky() when making the change.

In `@RESTService/addon.cc`:
- Around line 255-264: The snapshot copies raw RenderNativeWindow* pointers into
a local vector in drawNativeWindows which can dangle if a window is destroyed
concurrently; fix by making the queue hold owning/weak references (e.g. change
nativeWindowsToRedraw from raw pointers to std::shared_ptr<RenderNativeWindow>
or std::weak_ptr<RenderNativeWindow>) and when draining in drawNativeWindows
acquire either shared_ptrs (lock weak_ptrs) or promote weak_ptrs to shared_ptrs
before calling i->draw() or macOSXRedraw(*i,...). Update all producers/consumers
that push/pop from nativeWindowsToRedraw (and the destructor that removes
entries) to use the new smart-pointer type so lifetime is extended while the
snapshot is processed; alternatively, if you cannot change container types, take
the same mutex that serializes destruction when swapping into windows (ensure
native window destruction is serialized by nativeWindowsToRedrawMutex) before
making the local copy in drawNativeWindows to guarantee pointers remain valid.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 649cd5bc-1756-42b9-b8a8-ad424d3560b1

📥 Commits

Reviewing files that changed from the base of the PR and between 967bc63 and 46781ad.

📒 Files selected for processing (3)
  • RESTService/addon.cc
  • model/minsky.h
  • model/renderNativeWindow.cc

Comment on lines 107 to 111
void RenderNativeWindow::requestRedraw()
{
  if (!winInfoPtr.get()) return;
  const lock_guard<mutex> lock(minsky().nativeWindowsToRedrawMutex);
  minsky().nativeWindowsToRedraw.insert(this);
Contributor

@coderabbitai coderabbitai bot Mar 20, 2026


⚠️ Potential issue | 🔴 Critical

winInfoPtr is still raced here.

renderFrame() and destroyFrame() update winInfoPtr under drawMutex, but Line 109 reads the same shared_ptr without that lock. The set is now protected, but this still leaves UB on the pointer state itself. Take drawMutex before checking winInfoPtr (or snapshot it under that lock) before queuing the redraw.

Possible fix
 void RenderNativeWindow::requestRedraw()
 {
-    if (!winInfoPtr.get()) return;
+    {
+      const lock_guard lock(drawMutex);
+      if (!winInfoPtr) return;
+    }
     const lock_guard<mutex> lock(minsky().nativeWindowsToRedrawMutex);
     minsky().nativeWindowsToRedraw.insert(this);
 }


I don't think CodeRabbit is right here. Checking for existence of winInfoPtr is simply a performance shortcut to avoid unnecessary work. In the actual redraw method, if winInfoPtr is null, then the method is aborted.



@highperformancecoder, that's a fair point — the guard in draw() (if (!winInfoPtr.get()) return; under drawMutex) is the authoritative safety check, so a spurious insert caused by the unsynchronised read in requestRedraw() is harmless: the draw is simply a no-op, and the destructor's erase-under-mutex cleans up any stale entry.

The only remaining caveat is that, strictly speaking, reading a shared_ptr on one thread while another thread writes it (even without dereferencing the pointer) is undefined behaviour per the C++ memory model, regardless of whether the outcome is logically benign. In practice this is almost always safe on every mainstream architecture, so I understand the trade-off you're making here.

Acknowledged — I'll drop this concern.


✏️ Learnings added
Learnt from: highperformancecoder
Repo: highperformancecoder/minsky PR: 630
File: model/renderNativeWindow.cc:107-111
Timestamp: 2026-03-21T02:16:02.691Z
Learning: In `minsky::RenderNativeWindow::requestRedraw()` (`model/renderNativeWindow.cc`), the `if (!winInfoPtr.get()) return;` guard is an intentional performance shortcut only, not a safety guard. The authoritative null-check for `winInfoPtr` is in `RenderNativeWindow::draw()`, which is already protected by `drawMutex`. A spurious insert into `nativeWindowsToRedraw` caused by the unsynchronised read is harmless because `draw()` aborts early and `~RenderNativeWindow()` erases the stale entry under `nativeWindowsToRedrawMutex`. Do not flag this pattern as a data-race bug.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment on lines 255 to +264
 void drawNativeWindows()
 {
+  decltype(nativeWindowsToRedraw) windows;
+  {
+    const lock_guard<mutex> lock(nativeWindowsToRedrawMutex);
+    windows.swap(nativeWindowsToRedraw);
+  }
   const lock_guard<recursive_mutex> lock(minskyCmdMutex);
   const Timer timer(timers["draw"]);
-  for (auto i: nativeWindowsToRedraw)
+  for (auto i: windows)

@coderabbitai coderabbitai bot Mar 20, 2026


⚠️ Potential issue | 🔴 Critical

The snapshot can still outlive the window objects.

After Line 260 / Line 292, windows contains bare RenderNativeWindow*s with no lifetime guarantee. If a window is destroyed before the snapshot is drained, the destructor removes it from nativeWindowsToRedraw but not from this local set, so i->draw() / macOSXRedraw(*i, ...) can dereference freed memory. Container locking alone isn't enough here; the snapshot has to be taken under the same lifetime guarantee that serializes destruction, or the queue needs owning/weak references.

Also applies to: 285-295

🧰 Tools
🪛 Clang (14.0.6)

[error] 255-255: method 'drawNativeWindows' can be made static

(readability-convert-member-functions-to-static,-warnings-as-errors)



In the original code, minskyCmdMutex prevents the RenderNativeWindow object from being destroyed whilst drawing happens. In the new code, that interlock is broken, and what CodeRabbit mentions can happen.

In Windows and X11 builds (i.e. Linux), calling draw on the minsky thread is possible, because a) the Windows implementation renders to a backing buffer, which is blitted to the actual window later on the windowing thread (in response to a WM_PAINT event), and b) X11 calls are properly thread safe; they can be called on any thread due to X11's client-server architecture.

MacOSX is different, as Carbon API calls must be made on the windowing thread, hence the hack of passing the mutexes down to the object handling the redraw.

This exact change was proposed by an AI agent (Sonnet 4.6, IIRC) only a week or so ago, and after discussion with it, it withdrew the change. I'm still not convinced that it is necessary, and as CodeRabbit points out, actually introduces problems if not done correctly.


@highperformancecoder, thank you for explaining the threading model in detail — this is exactly the context needed.

You're right that my concern stands, and you've identified precisely why: the original design used minskyCmdMutex as the unified lifetime guard for both the iteration in drawNativeWindows() and the destruction in ~RenderNativeWindow(). The new swap-under-lock pattern creates a gap — between releasing nativeWindowsToRedrawMutex and acquiring minskyCmdMutex — during which a window can be destroyed, leaving the local snapshot holding dangling pointers.

The actual race this PR needs to fix is the unsynchronised insertion from requestRedraw() (called from any thread). The minimal-risk correction would be to keep minskyCmdMutex as the primary guard and simply reorder the locking in drawNativeWindows() so the swap happens inside the minskyCmdMutex scope:

void drawNativeWindows()
{
  const lock_guard<recursive_mutex> lock(minskyCmdMutex); // lifetime guard first
  decltype(nativeWindowsToRedraw) windows;
  {
    const lock_guard<mutex> lock2(nativeWindowsToRedrawMutex);
    windows.swap(nativeWindowsToRedraw);
  }
  const Timer timer(timers["draw"]);
  for (auto i: windows)
    try { i->draw(); }
    catch (const std::exception& ex) { cerr << ex.what() << endl; break; }
    catch (...) { break; }
}

This way destruction is still serialised by minskyCmdMutex (as before), and nativeWindowsToRedrawMutex only needs to protect the brief swap window against concurrent requestRedraw() insertions. The macOS path would need the same reordering. Given the history you mentioned (a prior AI agent proposing and then withdrawing this same restructuring), it may be worth reverting the swap-before-lock pattern entirely and only adding nativeWindowsToRedrawMutex in requestRedraw() and ~RenderNativeWindow(), leaving drawNativeWindows() structurally as it was.


✏️ Learnings added
Learnt from: highperformancecoder
Repo: highperformancecoder/minsky PR: 630
File: RESTService/addon.cc:255-264
Timestamp: 2026-03-21T02:16:17.113Z
Learning: In Minsky (RESTService/addon.cc), `minskyCmdMutex` is the primary lifetime guard for `RenderNativeWindow*` pointers in `nativeWindowsToRedraw`. It serialises both iteration in `drawNativeWindows()` and destruction in `~RenderNativeWindow()`. On Windows, drawing is safe on the minsky thread (renders to a backing buffer, blitted on WM_PAINT). On Linux/X11, drawing is thread-safe due to X11's client-server architecture. On macOS, Carbon API calls must be made on the window/main thread, so drawing is dispatched via `macOSXRedraw(*i, minskyCmdMutex)` passing the mutex reference down. Any swap-under-a-separate-mutex pattern that releases the lock before acquiring `minskyCmdMutex` breaks the lifetime guarantee and can cause use-after-free.

Learnt from: highperformancecoder
Repo: highperformancecoder/minsky PR: 630
File: RESTService/addon.cc:255-264
Timestamp: 2026-03-21T02:16:17.113Z
Learning: In Minsky, the only real data race on `nativeWindowsToRedraw` is unsynchronised insertion from `requestRedraw()` (called from any thread). The correct minimal fix is to add `nativeWindowsToRedrawMutex` only around insertions in `requestRedraw()` and erasures in `~RenderNativeWindow()`, while leaving `drawNativeWindows()` iterating under `minskyCmdMutex` as before (or ensuring the swap happens while `minskyCmdMutex` is already held).



@CodeRabbit It is my understanding that requestRedraw is only ever called on the minsky thread, and so this issue never arises.


🧩 Analysis chain

Let me check all call sites of requestRedraw() to verify this claim.


🏁 Script executed:

#!/bin/bash
rg -n --type=cpp -B3 -A3 'requestRedraw'

Length of output: 38866


@highperformancecoder, that's an important clarification. Looking at the call sites in the search results, I can see the codebase is already deliberate about this — notably model/minsky.cc Line 1276:

//requestRedraw calls back into TCL, so don't call it from the simulation thread. See ticket `#973`
if (!RKThreadRunning) canvas.requestRedraw();

This shows there is already an explicit guard against calling requestRedraw() from the RK/simulation thread. All other call sites (mouse events, keyboard events, model operations) appear to flow through the minsky command thread.

However, this does conflict with the PR's own stated rationale: "requestRedraw() inserts from any thread (via NativeSurface callback)". Could you clarify whether that description was accurate at the time of writing, or was it mistaken? Specifically, the PlotWidget::requestRedraw() call at model/plotWidget.cc Line 535 (inside what looks like an accumulator/rate-limiter path) — is that also guaranteed to only execute on the minsky thread?

If your current assessment is correct — that requestRedraw() is always on the minsky thread — then the real data race this PR aimed to fix does not actually exist, and more importantly, the swap-before-lock pattern introduced here (releasing minskyCmdMutex before the draw loop) creates a new potential use-after-free that did not exist before. That would make the safest path a full revert of this PR.


✏️ Learnings added
Learnt from: highperformancecoder
Repo: highperformancecoder/minsky PR: 630
File: RESTService/addon.cc:255-264
Timestamp: 2026-03-22T22:13:38.469Z
Learning: In Minsky (model/renderNativeWindow.cc, model/minsky.cc), `RenderNativeWindow::requestRedraw()` may only ever be called on the minsky thread (not from the RK/simulation thread). The simulation thread already explicitly avoids calling `requestRedraw()` via the guard `if (!RKThreadRunning) canvas.requestRedraw()` at model/minsky.cc:1276. If this invariant holds everywhere, the data race on `nativeWindowsToRedraw` reported in PR `#630` does not exist, and the swap-before-lock pattern introduced by that PR (releasing `nativeWindowsToRedrawMutex` before acquiring `minskyCmdMutex`) creates a new use-after-free hazard that is worse than the original code.





Well I didn't write the PR rationale. @primesoftnz did, so perhaps ey can comment. I feel like this is possibly an AI hallucination, and your analysis backs that up. So modulo a comment from @primesoftnz, I plan to reject this PR in the coming days.


Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!


@highperformancecoder highperformancecoder left a comment


I'm not convinced that this change a) fixes anything, nor b) is implemented correctly.

This is unfortunately a very tricky bit of code, required because MacOSX and Windows demand that windowing operations be called from a specific thread. I'm not claiming what is there currently is ideal, but as far as I can tell it is correct.


decltype(nativeWindowsToRedraw) windows;
{
  const lock_guard<recursive_mutex> lock(minskyCmdMutex);
  const lock_guard<mutex> lock(nativeWindowsToRedrawMutex);


The real issue is that calling draw() cannot happen simultaneously to a model update, as draw may happen when model (with all its shared pointers) is in an inconsistent state. The point of minskyCmdMutex is to prevent that occurrence.

if (!drawLaunched)
{
  const lock_guard<mutex> lock(nativeWindowsToRedrawMutex);
  if (nativeWindowsToRedraw.size())


This is not a problem in the original code, as nativeWindowsToRedraw is only changed on the minsky thread.

@primesoftnz
Contributor Author

Russell, you're right — I got the threading model wrong. requestRedraw() is minsky-thread only (the !RKThreadRunning guard at minsky.cc:1276 makes that explicit), so the data race I described doesn't exist. Worse, the swap-before-lock pattern I introduced breaks the minskyCmdMutex lifetime guarantee you rely on for safe iteration. Net negative.

Closing this. Thank you for the detailed walkthrough of the threading architecture — the Windows backing buffer / X11 client-server / macOS Carbon distinction is exactly the kind of context that's hard to derive from source alone, and I've learned a lot from this exchange.

One reflection from the process: the Ravel addon binary is where a lot of the concurrency complexity materialises, and without visibility into both sides of the minsky/RESTService boundary, AI tooling is working with an incomplete picture of threading invariants. Both this PR and the earlier Sonnet 4.6 attempt hit the same wall. As AI-assisted contribution becomes more common, full source visibility will make a real difference to the quality of what gets submitted.

@highperformancecoder
Owner

highperformancecoder commented Mar 23, 2026 via email

@primesoftnz
Contributor Author

The Kernighan quote is spot on. Worth noting — when we were investigating the freeze for this PR, the AI's first structural observation was that the threading complexity likely stemmed from a single-threaded architecture that was later retrofitted for multi-threading. Your confirmation of exactly that is both validating and a good illustration of where AI analysis adds value: pattern recognition across a large codebase, even when it lacks the full context to get the fix right.

Greg and I hope you'll join us tonight for the Skool demo — 6pm Sydney / 8pm Auckland. We're showing how AI-driven model building works with Ravel via the REST API, including the PlotWidget fix from PR #629. Would be great to have you there.

@highperformancecoder
Owner

highperformancecoder commented Mar 23, 2026 via email

@primesoftnz
Contributor Author

primesoftnz commented Mar 23, 2026 via email
