LLVM will try to find a scratch register even after RA to resolve the
long jump issue, so maybe we could consider a similar approach?

I guess the most complicated part would be the case where no scratch
register is found and a spill/reload is required after RA.

Jeff Law via Gcc-patches wrote on Mon, Jun 26, 2023 at 22:31:

> On 6/25/23 12:45, Stefan O'Rear wrote:
>
> > To clarify: are you proposing to make ra (or t1 in the hypothetical) a fixed
> > register for all functions, or only those heuristically identified as potentially
> > larger than 1MiB? And would this extend to forcing the creation of stack frames
> > for all functions, including very small functions? I am concerned this would
> > result in a substantial performance regression.
>
> For the case Yanzhang is discussing (firmware and such), yes. And
> that's simply the cost they're going to have to pay for wanting
> consistent backtraces without utilizing dwarf unwind info, sframe or orc.
>
> Normal builds won't be using those options and thus won't suffer from
> those performance penalties.
>
> > Without seeing the patch I can't know if I'm missing something obvious but I
> > would say t1 has three advantages:
> >
> > 1. Consistency with tail, possibly simpler implementation.
>
> And as I've already stated, this sequence is defined by the assembler.
> While I do want to revisit a compiler only solution, it's way down on my
> list of things to improve if I do a cost/benefit analysis. If someone
> wants to take a stab at it, I'm all for it. But it's not a simple
> problem due to the phase ordering issues.
>
> > 2. Very few functions use all seven t-registers. qemu linux-user in 2016 had an
> > off-by-one bug that corrupted t6 in sigreturn and it took months for anyone to
> > notice. By contrast, ra has live data in every non-_Noreturn function.
>
> That's a terrible way to evaluate the impact. The right way is to use
> real benchmarks. Not synthetic benchmarks. Not indirect observations
> that require triggering a bug in a sigreturn path. Build and run a real
> benchmark.
>
> > 3. Any jalr instruction which has rs1=ra has a hint effect on the return address
> > stack (call, return, or coroutine swap); a jalr which is intended to be treated
> > as a plain jump must have rs1!=ra, rs1!=t0.
>
> I'm well aware of these concerns. We support disambiguating various
> jump forms to facilitate different branch predictors.
>
> jeff
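
For reference, the return-address-stack rule behind point 3 can be written down as a small classifier. This is only a sketch of the hint table in the RISC-V unprivileged ISA manual; the enum and function names are made up for illustration and are not GCC or LLVM identifiers.

```c
#include <stdbool.h>

/* RAS action implied by a JALR's choice of rd and rs1.
   Names are hypothetical; the mapping follows the ISA manual's hint table. */
enum ras_hint { RAS_NONE, RAS_POP, RAS_PUSH, RAS_POP_THEN_PUSH };

/* x1 (ra) and x5 (t0) are the two link registers. */
static bool is_link_reg(unsigned reg)
{
    return reg == 1 || reg == 5;
}

static enum ras_hint jalr_ras_hint(unsigned rd, unsigned rs1)
{
    bool rd_link = is_link_reg(rd);
    bool rs1_link = is_link_reg(rs1);

    if (!rd_link)
        return rs1_link ? RAS_POP : RAS_NONE;  /* return, or plain jump */
    if (!rs1_link || rd == rs1)
        return RAS_PUSH;                       /* call */
    return RAS_POP_THEN_PUSH;                  /* coroutine swap */
}
```

With this mapping, a long jump through t1 (x6) with rd=x0 gives RAS_NONE, while the same jump through ra (x1) gives RAS_POP and would be predicted as a return.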