Hi Richard, You're 100% right. It’s possible to significantly clean-up this code, replacing the body of the conditional with a call to force_reg and simplifying the conditions under which it is called. These improvements are implemented in the patch below, which has been tested on x86_64-pc-linux-gnu, with a bootstrap and make -k check, both with and without -m32, as usual. Interestingly, the CONCAT clause afterwards is still required (I've learned something new), as calling force_reg (or gen_reg_rtx) with HCmode, actually returns a CONCAT instead of a REG, so although the code looks dead, it's required to build libgcc during a bootstrap. But the remaining clean-up is good, reducing the number of source lines and making the logic easier to understand. Ok for mainline? 2023-07-27 Roger Sayle Richard Biener gcc/ChangeLog PR middle-end/28071 PR rtl-optimization/110587 * expr.cc (emit_group_load_1): Simplify logic for calling force_reg on ORIG_SRC, to avoid making a copy if the source is already in a pseudo register. Roger -- > -----Original Message----- > From: Richard Biener > Sent: 25 July 2023 12:50 > > On Tue, Jul 25, 2023 at 1:31 PM Roger Sayle > wrote: > > > > This patch is the third in series of fixes for PR > > rtl-optimization/110587, a compile-time regression with -O0, that > > attempts to address the underlying cause. As noted previously, the > > pathological test case pr28071.c contains a large number of useless > > register-to-register moves that can produce quadratic behaviour (in > > LRA). These move are generated during RTL expansion in > > emit_group_load_1, where the middle-end attempts to simplify the > > source before calling extract_bit_field. This is reasonable if the > > source is a complex expression (from before the tree-ssa optimizers), > > or a SUBREG, or a hard register, but it's not particularly useful to > > copy a pseudo register into a new pseudo register. This patch eliminates that > redundancy. > > > > The -fdump-tree-expand for pr28071.c compiled with -O0 currently > > contains 777K lines, with this patch it contains 717K lines, i.e. > > saving about 60K lines (admittedly of debugging text output, but it makes the > point). > > > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > and make -k check, both with and without --target_board=unix{-m32} > > with no new failures. Ok for mainline? > > > > As always, I'm happy to revert this change quickly if there's a > > problem, and investigate why this additional copy might (still) be > > needed on other > > non-x86 targets. > > @@ -2622,6 +2622,7 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src, > tree type, > be loaded directly into the destination. */ > src = orig_src; > if (!MEM_P (orig_src) > + && (!REG_P (orig_src) || HARD_REGISTER_P (orig_src)) > && (!CONSTANT_P (orig_src) > || (GET_MODE (orig_src) != mode > && GET_MODE (orig_src) != VOIDmode))) > > so that means the code guarded by the conditional could instead be transformed > to > > src = force_reg (mode, orig_src); > > ? Btw, the || (GET_MODE (orig_src) != mode && GET_MODE (orig_src) != > VOIDmode) case looks odd as in that case we'd use GET_MODE (orig_src) for the > move ... that might also mean we have to use force_reg (GET_MODE (orig_src) == > VOIDmode ? mode : GET_MODE (orig_src), orig_src)) > > Otherwise I think this is OK, as said, using force_reg somehow would improve > readability here I think. > > I also wonder how the > > else if (GET_CODE (src) == CONCAT) > > case will ever trigger with the current code. > > Richard. > > > > > 2023-07-25 Roger Sayle > > > > gcc/ChangeLog > > PR middle-end/28071 > > PR rtl-optimization/110587 > > * expr.cc (emit_group_load_1): Avoid copying a pseudo register into > > a new pseudo register, i.e. only copy hard regs into a new pseudo. > > > >