Skip to content

Wasm funclet RA#126445

Draft
AndyAyersMS wants to merge 21 commits intodotnet:mainfrom
AndyAyersMS:WasmFuncletRA
Draft

Wasm funclet RA#126445
AndyAyersMS wants to merge 21 commits intodotnet:mainfrom
AndyAyersMS:WasmFuncletRA

Conversation

@AndyAyersMS
Copy link
Copy Markdown
Member

Implement register allocation for funclets, along with some funclet codegen.

Copilot AI review requested due to automatic review settings April 2, 2026 02:03
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 2, 2026
@AndyAyersMS
Copy link
Copy Markdown
Member Author

FYI @dotnet/jit-contrib

Still a work in progress, but figured I'd ask for some feedback before getting too much further. Things I know of that are still to do:

  • We may need to handle case where an allocatable local is not live across a funclet boundary and ends up in different registers in different regions. Currently for STORE_LCL_VAR we always take the reg from the lcl var.
  • ABI classifier type support for funclet params?
  • Check if typeless catch won't actually get an ex obj from runtime (I suppose we can pass nullptr if necessary)
  • Codegen is still off -- there is a bogus block before funclet start, and no return
  • This won't validate (I'm pretty sure) until the jit host extracts the funclets into separate functions.

@SingleAccretion this may be more complex than what you had envisioned.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the WASM JIT backend to support funclets by introducing per-funclet (region) register allocation state, per-funclet locals declarations, and basic funclet prolog codegen so EH regions can be code-generated with their own Wasm-local layouts.

Changes:

  • Add per-funclet RA state (SP/FP/EX + reference tracking) and resolve/register publish logic per region.
  • Teach codegen/emitter to use per-funclet SP/FP and emit per-funclet local declarations / funclet prologs.
  • Update internal-register tracking to be per-funclet on targets without a fixed register set (WASM).

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
src/coreclr/jit/regallocwasm.h Introduces per-region RA state (SP/FP/EX + ref tracking) and new catch-arg collection hook.
src/coreclr/jit/regallocwasm.cpp Implements per-funclet allocation, catch-arg rewriting, per-funclet internal-reg tables, and per-funclet locals decl generation/publishing.
src/coreclr/jit/lclvars.cpp Adjusts frame-location dumping to use region-0 SP/FP on non-fixed-reg targets.
src/coreclr/jit/emitwasm.cpp Extends wasm arg-count computation to support funclet signatures.
src/coreclr/jit/codegenwasm.cpp Uses per-funclet SP/FP accessors and implements funclet prolog local signature emission.
src/coreclr/jit/codegeninterface.h Makes SP/FP storage and internal-reg tracking per-funclet for non-fixed-reg targets; makes locals decls per-funclet.
src/coreclr/jit/codegencommon.cpp Implements non-fixed-reg per-funclet NodeInternalRegisters tables and initializes per-funclet SP/FP/locals storage.

Comment on lines +97 to +105
@@ -91,6 +98,11 @@ void WasmRegAlloc::IdentifyCandidates()
{
AllocateFramePointer();
}

if (m_compiler->compFuncInfoCount > 1)
{
AllocateExceptionPointer();
}
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IdentifyCandidates only allocates the frame pointer when there are frame locals or localloc. However, ResolveReferences’ funclet path unconditionally reserves two i32 parameters (sp/fp) and will underflow DeclaredCount if FP wasn’t allocated (e.g., EH with no frame locals). Ensure AllocateFramePointer runs whenever compFuncInfoCount > 1 (or otherwise make the funclet arg bookkeeping conditional on actually allocated FP state).

Copilot uses AI. Check for mistakes.
Comment on lines +216 to +230
//------------------------------------------------------------------------
// GetTable: get the internal register table for nodes in this funclet region
// or create it if it does not yet exist.
//
// Parameters:
// funcletIndex - Index of the funclet
//
// Returns:
// Pointer to the internal register table.
//
NodeInternalRegistersTable* NodeInternalRegisters::GetTable(unsigned funcletIndex)
{
assert(funcletIndex < m_tables.size());
NodeInternalRegistersTable* table = m_tables[funcletIndex];
return table;
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment on GetTable says it will 'or create it if it does not yet exist', but GetTable currently does not create and may return nullptr. Update the comment (or rename to clarify the contract) to avoid misleading future callers.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

@SingleAccretion SingleAccretion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks a lot like what I was thinking about too.

Left some high-level feedback.

// Arguments:
// catchArg - The CATCH_ARG node
//
void WasmRegAlloc::CollectReferencesForCatchArg(GenTree* catchArg)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem necessary. We could just hardcode the CATCH_ARG to be local.get 3 in codegen. It would also remove the need for the virtual reg, we would just need to set the indexBase correctly in allocation (which is already done).

Comment on lines +939 to +940
// TODO-WASM: this may no longer hold with funclets. If a local has disjoint lifetimes that do not
// cross funclet boundaries, it may need to be allocated to different registers in different funclets.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is now definitely a concern. Iterate through the functions backwards, so we can detect this case, reset lvRegister, and still end up with the "first" def in SetRegNum?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea.

{
regNumber reg = AllocateTemporaryRegister(type);
m_codeGen->internalRegisters.Add(node, reg);
m_codeGen->internalRegisters.Add(m_currentFunclet, node, reg);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't be leaking RA impl details into the codegen map like this.

I think we shouldn't be trying to partition the internal regs to be per-funclet while the virtual regs are still global.

For only rewriting the internal regs belonging to a funclet, we can add the node m_virtualRegRefs and work off of that instead of the codegen map directly.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting

  • call CollectReferences on nodes with internal registers
  • in the VirtualRegReferences processing loop during resolution, if a node has internal registers, reassign those to physical regs

or something else?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this what you're suggesting?

Yep.

if (varDsc->lvIsParam || varDsc->lvIsParamRegTarget)
auto allocPhysReg = [&](regNumber virtReg, LclVarDsc* varDsc) {
regNumber physReg;
if ((varDsc != nullptr) && varDsc->lvIsRegArg && !varDsc->lvIsStructField)
Copy link
Copy Markdown
Contributor

@SingleAccretion SingleAccretion Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently this will try to assign the ABI registers to candidates live inside a funclet (but not across funclets).

We can make the logic more easily correct by construction by doing something like:

physReg = REG_NA;
if (<main function>) {
   // Current logic with varDsc->lvIsRegArg etc
} else {
   if (virtReg == spVirtReg) assign 0
   if (virtReg == fpVirtReg) assign 1
}
if (physReg == REG_NA)
   <assign from the stack>

and sharing the allocPhysReg part below between the main function and funclets.

jitstd::vector<WasmLocalsDecl> WasmLocalsDecls;
// Per-funclet vectors of local declarations
//
jitstd::vector<jitstd::vector<WasmLocalsDecl>*> WasmLocalsDecls;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More traditionally this kind of data would hang off of FuncInfoDsc (here and for sp/fp).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had it there initially, don't remember why I moved it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would require moving WasmLocalsDecl to an un-nested class, but that's fine...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it.

@AndyAyersMS
Copy link
Copy Markdown
Member Author

@SingleAccretion 90066eb should cover most of your feedback except internal regs.

@am11 am11 added the arch-wasm WebAssembly architecture label Apr 2, 2026
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to 'arch-wasm': @lewing, @pavelsavara
See info in area-owners.md if you want to be subscribed.

Copy link
Copy Markdown
Contributor

@SingleAccretion SingleAccretion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly small comments left from me. But we also need to be a bit more careful with lvRegNum mutations.

Copilot AI review requested due to automatic review settings April 2, 2026 20:53
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

@AndyAyersMS
Copy link
Copy Markdown
Member Author

Hopefully closer now...

@SingleAccretion in the future maybe just point out one instance of a recurring pattern? Otherwise stuff may get overlooked.

@AndyAyersMS
Copy link
Copy Markdown
Member Author

AndyAyersMS commented Apr 2, 2026

There's still an issue where a node can be collected twice, eg if it both needs a register and also has internal registers, and double processing a collected node is not good.

We could have the collector mark nodes perhaps, use this to avoid a double collection, and unmark them when we process them later.

Also RequestInternalRegister can be called multiple times for the same node.

Copilot AI review requested due to automatic review settings April 2, 2026 23:15
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Copilot AI review requested due to automatic review settings April 3, 2026 14:36
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/coreclr/jit/codegenwasm.cpp:2210

  • genCodeForLclVar uses lvIsRegCandidate() to decide between shadow-stack load vs local_get of the enregistered Wasm local, and in the candidate case it reads the register from varDsc->GetRegNum(). With per-funclet RA, a single global regNum on LclVarDsc is insufficient (funclets have independent local index spaces), so this will pick up the root region’s assignment after RA finishes. To avoid mis-codegen, use a per-funclet mapping (indexed by funCurrentFuncIdx) or rewrite LCL_VAR nodes to carry the physical local to use (e.g., via GT_PHYSREG/gtSrcReg) during RA.
    // targets, hence "WasmProduceReg" is only called for non-candidates.
    if (!varDsc->lvIsRegCandidate())
    {
        var_types type = varDsc->GetRegisterType(tree);
        GetEmitter()->emitIns_I(INS_local_get, EA_PTRSIZE, GetFramePointerRegIndex());
        GetEmitter()->emitIns_S(ins_Load(type), emitTypeSize(type), tree->GetLclNum(), 0);
        WasmProduceReg(tree);
    }
    else
    {
        assert(genIsValidReg(varDsc->GetRegNum()));
        unsigned wasmLclIndex = WasmRegToIndex(varDsc->GetRegNum());
        GetEmitter()->emitIns_I(INS_local_get, emitTypeSize(tree), wasmLclIndex);

return;
}

node->gtLIRFlags |= LIR::Flags::VirtualRefsCollected;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a new LIR flag just for this one place is a heavy measure.

Since we collect the nodes in order, just checking if the last node is being added again I think would be enough.

// TODO-WASM: this will lead to incorrect debug info in funclets. We may need to track per funclet
// assignments somewhere.
//
varDsc->SetRegNum(physReg);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To expand on this a bit, my idea would be to have a vector local to this method where we store the physical regs, and do this if ((varDsc != nullptr) && varDsc->lvIsRegCandidate()) block at the end of the method where we iterate over this vector (they would be main method regs at that point).

@AndyAyersMS
Copy link
Copy Markdown
Member Author

GH doesn't seem to be showing one of my inline replies, so let me paste it here too...

https://github.com/dotnet/runtime/pull/126445/changes#r3033433857

Basically if we have divergent allocations for a local we should not be setting the reg num on the local var, and codegen should not be looking at the local var for reg nums in most (all) cases.

Copilot AI review requested due to automatic review settings April 3, 2026 22:22
@AndyAyersMS
Copy link
Copy Markdown
Member Author

I implemented recordVarLocationsAtStartOfBB. Just full tracked-sized vectors for now since we assign regs to all the reg candidate locals whether they appear or not in the funclet. Resetting to the virt reg nums at the end of the resolution loop.

Also realized we were being too aggressive with handler-crossing locals.

I haven't addressed the extra LIR flag.

I suppose we can go back to resolving things in forward funclet order, since now none of the LclVarDsc state carries over to codegen. It all gets updated when codegen does its own walk.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Comment on lines +843 to +868
for (unsigned argLclNum = 0; argLclNum < m_compiler->info.compArgsCount; argLclNum++)
{
const ABIPassingInformation& abiInfo = m_compiler->lvaGetParameterABIInfo(argLclNum);
for (const ABIPassingSegment& segment : abiInfo.Segments())
{
if (segment.IsPassedInRegister())
{
WasmValueType argType;
regNumber argReg = segment.GetRegister();
unsigned argIndex = UnpackWasmReg(argReg, &argType);
indexBase = max(indexBase, argIndex + 1);

LclVarDsc* argVarDsc = m_compiler->lvaGetDesc(argLclNum);
if ((argVarDsc->GetRegNum() == argReg) || (data->m_spReg == argReg))
{
assert(abiInfo.HasExactlyOneRegisterSegment());
virtToPhysRegMap[static_cast<unsigned>(argType)].DeclaredCount--;
}

const ParameterRegisterLocalMapping* mapping =
m_compiler->FindParameterRegisterLocalMappingByRegister(argReg);
if ((mapping != nullptr) &&
(m_compiler->lvaGetDesc(mapping->LclNum)->GetRegNum() == argReg))
{
virtToPhysRegMap[static_cast<unsigned>(argType)].DeclaredCount--;
}
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the root FuncKind case, DeclaredCount is decremented only when argVarDsc->GetRegNum() == argReg (and similarly for param-reg-target mappings). At this point GetRegNum() is still the virtual reg assigned by IdentifyCandidates, while argReg is the ABI regNumber allocated in targetwasm.cpp using a global m_localIndex++. These generally won’t compare equal, so DeclaredCount won’t be reduced for register args/param targets, causing extra (unused) locals to be declared and shifting the per-type IndexBase calculations unnecessarily. Consider decrementing based on the local’s properties (e.g., lvIsRegCandidate()+lvIsRegArg / lvIsParamRegTarget) rather than equality of virtual reg numbers to ABI reg numbers.

Copilot uses AI. Check for mistakes.
for (unsigned varIndex = 0; varIndex < m_compiler->lvaTrackedCount; varIndex++)
{
LclVarDsc* varDsc = m_compiler->lvaGetDescByTrackedIndex(varIndex);
varDsc->SetRegNum(m_virtualRegAssignments[varIndex]);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to do more work than necessary. We could just leave the virtual regs on the LclVarDscs and assign them all in one go at the end based on the physical regs from ROOT_FUNC_IDX.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will work, but we'll need to tweak the GT_STORE_LCL_VAR case in virtual ref processing to look at the current map instead of the local var dsc.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost works. genGeneratePrologsAndEpilogs rewinds things back to the main method state.

}

#if defined(TARGET_WASM)
// Wasm RA currently does not support EH write-thru, so any local live in or out
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: no need for ifdefs if we move this condition into IdentifyCandidates.

// Register assignments only change at funclet boundaries
//
bool const isFuncletEntry = m_compiler->bbIsFuncletBeg(bb);
bool const isFuncEntry = m_compiler->fgFirstBB == bb;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need isFuncEntry? The 'initial' lvRegNum assignment is already based on the root function.

siUpdateVariableLiveRange looks to be missing.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels a bit odd to leave this ambient state laying around without being able to double-check where it came from, but I suppose it's not that bad.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arch-wasm WebAssembly architecture area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants