Conversation
|
FYI @dotnet/jit-contrib Still a work in progress, but figured I'd ask for some feedback before getting too much further. Things I know of that are still to do:
@SingleAccretion this may be more complex than what you had envisioned. |
There was a problem hiding this comment.
Pull request overview
This PR extends the WASM JIT backend to support funclets by introducing per-funclet (region) register allocation state, per-funclet locals declarations, and basic funclet prolog codegen so EH regions can be code-generated with their own Wasm-local layouts.
Changes:
- Add per-funclet RA state (SP/FP/EX + reference tracking) and resolve/register publish logic per region.
- Teach codegen/emitter to use per-funclet SP/FP and emit per-funclet local declarations / funclet prologs.
- Update internal-register tracking to be per-funclet on targets without a fixed register set (WASM).
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/jit/regallocwasm.h | Introduces per-region RA state (SP/FP/EX + ref tracking) and new catch-arg collection hook. |
| src/coreclr/jit/regallocwasm.cpp | Implements per-funclet allocation, catch-arg rewriting, per-funclet internal-reg tables, and per-funclet locals decl generation/publishing. |
| src/coreclr/jit/lclvars.cpp | Adjusts frame-location dumping to use region-0 SP/FP on non-fixed-reg targets. |
| src/coreclr/jit/emitwasm.cpp | Extends wasm arg-count computation to support funclet signatures. |
| src/coreclr/jit/codegenwasm.cpp | Uses per-funclet SP/FP accessors and implements funclet prolog local signature emission. |
| src/coreclr/jit/codegeninterface.h | Makes SP/FP storage and internal-reg tracking per-funclet for non-fixed-reg targets; makes locals decls per-funclet. |
| src/coreclr/jit/codegencommon.cpp | Implements non-fixed-reg per-funclet NodeInternalRegisters tables and initializes per-funclet SP/FP/locals storage. |
src/coreclr/jit/regallocwasm.cpp
Outdated
| @@ -91,6 +98,11 @@ void WasmRegAlloc::IdentifyCandidates() | |||
| { | |||
| AllocateFramePointer(); | |||
| } | |||
|
|
|||
| if (m_compiler->compFuncInfoCount > 1) | |||
| { | |||
| AllocateExceptionPointer(); | |||
| } | |||
There was a problem hiding this comment.
IdentifyCandidates only allocates the frame pointer when there are frame locals or localloc. However, ResolveReferences’ funclet path unconditionally reserves two i32 parameters (sp/fp) and will underflow DeclaredCount if FP wasn’t allocated (e.g., EH with no frame locals). Ensure AllocateFramePointer runs whenever compFuncInfoCount > 1 (or otherwise make the funclet arg bookkeeping conditional on actually allocated FP state).
src/coreclr/jit/codegencommon.cpp
Outdated
| //------------------------------------------------------------------------ | ||
| // GetTable: get the internal register table for nodes in this funclet region | ||
| // or create it if it does not yet exist. | ||
| // | ||
| // Parameters: | ||
| // funcletIndex - Index of the funclet | ||
| // | ||
| // Returns: | ||
| // Pointer to the internal register table. | ||
| // | ||
| NodeInternalRegistersTable* NodeInternalRegisters::GetTable(unsigned funcletIndex) | ||
| { | ||
| assert(funcletIndex < m_tables.size()); | ||
| NodeInternalRegistersTable* table = m_tables[funcletIndex]; | ||
| return table; |
There was a problem hiding this comment.
The comment on GetTable says it will 'or create it if it does not yet exist', but GetTable currently does not create and may return nullptr. Update the comment (or rename to clarify the contract) to avoid misleading future callers.
SingleAccretion
left a comment
There was a problem hiding this comment.
Looks a lot like what I was thinking about too.
Left some high-level feedback.
src/coreclr/jit/regallocwasm.cpp
Outdated
| // Arguments: | ||
| // catchArg - The CATCH_ARG node | ||
| // | ||
| void WasmRegAlloc::CollectReferencesForCatchArg(GenTree* catchArg) |
There was a problem hiding this comment.
This doesn't seem necessary. We could just hardcode the CATCH_ARG to be local.get 3 in codegen. It would also remove the need for the virtual reg, we would just need to set the indexBase correctly in allocation (which is already done).
src/coreclr/jit/regallocwasm.cpp
Outdated
| // TODO-WASM: this may no longer hold with funclets. If a local has disjoint lifetimes that do not | ||
| // cross funclet boundaries, it may need to be allocated to different registers in different funclets. |
There was a problem hiding this comment.
That is now definitely a concern. Iterate through the functions backwards, so we can detect this case, reset lvRegister, and still end up with the "first" def in SetRegNum?
src/coreclr/jit/regallocwasm.cpp
Outdated
| { | ||
| regNumber reg = AllocateTemporaryRegister(type); | ||
| m_codeGen->internalRegisters.Add(node, reg); | ||
| m_codeGen->internalRegisters.Add(m_currentFunclet, node, reg); |
There was a problem hiding this comment.
We shouldn't be leaking RA impl details into the codegen map like this.
I think we shouldn't be trying to partition the internal regs to be per-funclet while the virtual regs are still global.
For only rewriting the internal regs belonging to a funclet, we can add the node m_virtualRegRefs and work off of that instead of the codegen map directly.
There was a problem hiding this comment.
Are you suggesting
- call
CollectReferenceson nodes with internal registers - in the
VirtualRegReferencesprocessing loop during resolution, if a node has internal registers, reassign those to physical regs
or something else?
There was a problem hiding this comment.
is this what you're suggesting?
Yep.
src/coreclr/jit/regallocwasm.cpp
Outdated
| if (varDsc->lvIsParam || varDsc->lvIsParamRegTarget) | ||
| auto allocPhysReg = [&](regNumber virtReg, LclVarDsc* varDsc) { | ||
| regNumber physReg; | ||
| if ((varDsc != nullptr) && varDsc->lvIsRegArg && !varDsc->lvIsStructField) |
There was a problem hiding this comment.
Currently this will try to assign the ABI registers to candidates live inside a funclet (but not across funclets).
We can make the logic more easily correct by construction by doing something like:
physReg = REG_NA;
if (<main function>) {
// Current logic with varDsc->lvIsRegArg etc
} else {
if (virtReg == spVirtReg) assign 0
if (virtReg == fpVirtReg) assign 1
}
if (physReg == REG_NA)
<assign from the stack>
and sharing the allocPhysReg part below between the main function and funclets.
src/coreclr/jit/codegeninterface.h
Outdated
| jitstd::vector<WasmLocalsDecl> WasmLocalsDecls; | ||
| // Per-funclet vectors of local declarations | ||
| // | ||
| jitstd::vector<jitstd::vector<WasmLocalsDecl>*> WasmLocalsDecls; |
There was a problem hiding this comment.
More traditionally this kind of data would hang off of FuncInfoDsc (here and for sp/fp).
There was a problem hiding this comment.
I had it there initially, don't remember why I moved it.
There was a problem hiding this comment.
It would require moving WasmLocalsDecl to an un-nested class, but that's fine...
|
@SingleAccretion 90066eb should cover most of your feedback except internal regs. |
|
Tagging subscribers to 'arch-wasm': @lewing, @pavelsavara |
SingleAccretion
left a comment
There was a problem hiding this comment.
Mostly small comments left from me. But we also need to be a bit more careful with lvRegNum mutations.
|
Hopefully closer now... @SingleAccretion in the future maybe just point out one instance of a recurring pattern? Otherwise stuff may get overlooked. |
|
There's still an issue where a node can be collected twice, eg if it both needs a register and also has internal registers, and double processing a collected node is not good. We could have the collector mark nodes perhaps, use this to avoid a double collection, and unmark them when we process them later. Also |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/coreclr/jit/codegenwasm.cpp:2210
- genCodeForLclVar uses lvIsRegCandidate() to decide between shadow-stack load vs local_get of the enregistered Wasm local, and in the candidate case it reads the register from varDsc->GetRegNum(). With per-funclet RA, a single global regNum on LclVarDsc is insufficient (funclets have independent local index spaces), so this will pick up the root region’s assignment after RA finishes. To avoid mis-codegen, use a per-funclet mapping (indexed by funCurrentFuncIdx) or rewrite LCL_VAR nodes to carry the physical local to use (e.g., via GT_PHYSREG/gtSrcReg) during RA.
// targets, hence "WasmProduceReg" is only called for non-candidates.
if (!varDsc->lvIsRegCandidate())
{
var_types type = varDsc->GetRegisterType(tree);
GetEmitter()->emitIns_I(INS_local_get, EA_PTRSIZE, GetFramePointerRegIndex());
GetEmitter()->emitIns_S(ins_Load(type), emitTypeSize(type), tree->GetLclNum(), 0);
WasmProduceReg(tree);
}
else
{
assert(genIsValidReg(varDsc->GetRegNum()));
unsigned wasmLclIndex = WasmRegToIndex(varDsc->GetRegNum());
GetEmitter()->emitIns_I(INS_local_get, emitTypeSize(tree), wasmLclIndex);
| return; | ||
| } | ||
|
|
||
| node->gtLIRFlags |= LIR::Flags::VirtualRefsCollected; |
There was a problem hiding this comment.
Adding a new LIR flag just for this one place is a heavy measure.
Since we collect the nodes in order, just checking if the last node is being added again I think would be enough.
| // TODO-WASM: this will lead to incorrect debug info in funclets. We may need to track per funclet | ||
| // assignments somewhere. | ||
| // | ||
| varDsc->SetRegNum(physReg); |
There was a problem hiding this comment.
To expand on this a bit, my idea would be to have a vector local to this method where we store the physical regs, and do this if ((varDsc != nullptr) && varDsc->lvIsRegCandidate()) block at the end of the method where we iterate over this vector (they would be main method regs at that point).
|
GH doesn't seem to be showing one of my inline replies, so let me paste it here too... https://github.com/dotnet/runtime/pull/126445/changes#r3033433857 Basically if we have divergent allocations for a local we should not be setting the reg num on the local var, and codegen should not be looking at the local var for reg nums in most (all) cases. |
|
I implemented Also realized we were being too aggressive with handler-crossing locals. I haven't addressed the extra LIR flag. I suppose we can go back to resolving things in forward funclet order, since now none of the LclVarDsc state carries over to codegen. It all gets updated when codegen does its own walk. |
| for (unsigned argLclNum = 0; argLclNum < m_compiler->info.compArgsCount; argLclNum++) | ||
| { | ||
| const ABIPassingInformation& abiInfo = m_compiler->lvaGetParameterABIInfo(argLclNum); | ||
| for (const ABIPassingSegment& segment : abiInfo.Segments()) | ||
| { | ||
| if (segment.IsPassedInRegister()) | ||
| { | ||
| WasmValueType argType; | ||
| regNumber argReg = segment.GetRegister(); | ||
| unsigned argIndex = UnpackWasmReg(argReg, &argType); | ||
| indexBase = max(indexBase, argIndex + 1); | ||
|
|
||
| LclVarDsc* argVarDsc = m_compiler->lvaGetDesc(argLclNum); | ||
| if ((argVarDsc->GetRegNum() == argReg) || (data->m_spReg == argReg)) | ||
| { | ||
| assert(abiInfo.HasExactlyOneRegisterSegment()); | ||
| virtToPhysRegMap[static_cast<unsigned>(argType)].DeclaredCount--; | ||
| } | ||
|
|
||
| const ParameterRegisterLocalMapping* mapping = | ||
| m_compiler->FindParameterRegisterLocalMappingByRegister(argReg); | ||
| if ((mapping != nullptr) && | ||
| (m_compiler->lvaGetDesc(mapping->LclNum)->GetRegNum() == argReg)) | ||
| { | ||
| virtToPhysRegMap[static_cast<unsigned>(argType)].DeclaredCount--; | ||
| } |
There was a problem hiding this comment.
In the root FuncKind case, DeclaredCount is decremented only when argVarDsc->GetRegNum() == argReg (and similarly for param-reg-target mappings). At this point GetRegNum() is still the virtual reg assigned by IdentifyCandidates, while argReg is the ABI regNumber allocated in targetwasm.cpp using a global m_localIndex++. These generally won’t compare equal, so DeclaredCount won’t be reduced for register args/param targets, causing extra (unused) locals to be declared and shifting the per-type IndexBase calculations unnecessarily. Consider decrementing based on the local’s properties (e.g., lvIsRegCandidate()+lvIsRegArg / lvIsParamRegTarget) rather than equality of virtual reg numbers to ABI reg numbers.
| for (unsigned varIndex = 0; varIndex < m_compiler->lvaTrackedCount; varIndex++) | ||
| { | ||
| LclVarDsc* varDsc = m_compiler->lvaGetDescByTrackedIndex(varIndex); | ||
| varDsc->SetRegNum(m_virtualRegAssignments[varIndex]); |
There was a problem hiding this comment.
This looks to do more work than necessary. We could just leave the virtual regs on the LclVarDscs and assign them all in one go at the end based on the physical regs from ROOT_FUNC_IDX.
There was a problem hiding this comment.
I think this will work, but we'll need to tweak the GT_STORE_LCL_VAR case in virtual ref processing to look at the current map instead of the local var dsc.
There was a problem hiding this comment.
Almost works. genGeneratePrologsAndEpilogs rewinds things back to the main method state.
| } | ||
|
|
||
| #if defined(TARGET_WASM) | ||
| // Wasm RA currently does not support EH write-thru, so any local live in or out |
There was a problem hiding this comment.
Nit: no need for ifdefs if we move this condition into IdentifyCandidates.
| // Register assignments only change at funclet boundaries | ||
| // | ||
| bool const isFuncletEntry = m_compiler->bbIsFuncletBeg(bb); | ||
| bool const isFuncEntry = m_compiler->fgFirstBB == bb; |
There was a problem hiding this comment.
Do we need isFuncEntry? The 'initial' lvRegNum assignment is already based on the root function.
siUpdateVariableLiveRange looks to be missing.
There was a problem hiding this comment.
Feels a bit odd to leave this ambient state laying around without being able to double-check where it came from, but I suppose it's not that bad.
Implement register allocation for funclets, along with some funclet codegen.