public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] PR rtl-optimization/110587: Reduce useless moves in compile-time hog.
@ 2023-07-25 11:31 Roger Sayle
  2023-07-25 11:50 ` Richard Biener
  0 siblings, 1 reply; 4+ messages in thread
From: Roger Sayle @ 2023-07-25 11:31 UTC (permalink / raw)
  To: gcc-patches; +Cc: 'Richard Biener'

[-- Attachment #1: Type: text/plain, Size: 1618 bytes --]


This patch is the third in series of fixes for PR rtl-optimization/110587,
a compile-time regression with -O0, that attempts to address the underlying
cause.  As noted previously, the pathological test case pr28071.c contains
a large number of useless register-to-register moves that can produce
quadratic behaviour (in LRA).  These move are generated during RTL
expansion in emit_group_load_1, where the middle-end attempts to simplify
the source before calling extract_bit_field.  This is reasonable if the
source is a complex expression (from before the tree-ssa optimizers), or
a SUBREG, or a hard register, but it's not particularly useful to copy
a pseudo register into a new pseudo register.  This patch eliminates that
redundancy.

The -fdump-tree-expand for pr28071.c compiled with -O0 currently contains
777K lines, with this patch it contains 717K lines, i.e. saving about 60K
lines (admittedly of debugging text output, but it makes the point).


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?

As always, I'm happy to revert this change quickly if there's a problem,
and investigate why this additional copy might (still) be needed on other
non-x86 targets.


2023-07-25  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        PR middle-end/28071
        PR rtl-optimization/110587
        * expr.cc (emit_group_load_1): Avoid copying a pseudo register into
        a new pseudo register, i.e. only copy hard regs into a new pseudo.


Thanks in advance,
Roger
--


[-- Attachment #2: patchhg3.txt --]
[-- Type: text/plain, Size: 474 bytes --]

diff --git a/gcc/expr.cc b/gcc/expr.cc
index fff09dc..11d041b 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -2622,6 +2622,7 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src, tree type,
 	 be loaded directly into the destination.  */
       src = orig_src;
       if (!MEM_P (orig_src)
+	  && (!REG_P (orig_src) || HARD_REGISTER_P (orig_src))
 	  && (!CONSTANT_P (orig_src)
 	      || (GET_MODE (orig_src) != mode
 		  && GET_MODE (orig_src) != VOIDmode)))

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] PR rtl-optimization/110587: Reduce useless moves in compile-time hog.
  2023-07-25 11:31 [PATCH] PR rtl-optimization/110587: Reduce useless moves in compile-time hog Roger Sayle
@ 2023-07-25 11:50 ` Richard Biener
  2023-07-27 17:12   ` Roger Sayle
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Biener @ 2023-07-25 11:50 UTC (permalink / raw)
  To: Roger Sayle; +Cc: gcc-patches

On Tue, Jul 25, 2023 at 1:31 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch is the third in series of fixes for PR rtl-optimization/110587,
> a compile-time regression with -O0, that attempts to address the underlying
> cause.  As noted previously, the pathological test case pr28071.c contains
> a large number of useless register-to-register moves that can produce
> quadratic behaviour (in LRA).  These move are generated during RTL
> expansion in emit_group_load_1, where the middle-end attempts to simplify
> the source before calling extract_bit_field.  This is reasonable if the
> source is a complex expression (from before the tree-ssa optimizers), or
> a SUBREG, or a hard register, but it's not particularly useful to copy
> a pseudo register into a new pseudo register.  This patch eliminates that
> redundancy.
>
> The -fdump-tree-expand for pr28071.c compiled with -O0 currently contains
> 777K lines, with this patch it contains 717K lines, i.e. saving about 60K
> lines (admittedly of debugging text output, but it makes the point).
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
> As always, I'm happy to revert this change quickly if there's a problem,
> and investigate why this additional copy might (still) be needed on other
> non-x86 targets.

@@ -2622,6 +2622,7 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx
orig_src, tree type,
         be loaded directly into the destination.  */
       src = orig_src;
       if (!MEM_P (orig_src)
+         && (!REG_P (orig_src) || HARD_REGISTER_P (orig_src))
          && (!CONSTANT_P (orig_src)
              || (GET_MODE (orig_src) != mode
                  && GET_MODE (orig_src) != VOIDmode)))

so that means the code guarded by the conditional could instead
be transformed to

   src = force_reg (mode, orig_src);

?  Btw, the || (GET_MODE (orig_src) != mode && GET_MODE (orig_src) != VOIDmode)
case looks odd as in that case we'd use GET_MODE (orig_src) for the
move ... that
might also mean we have to use force_reg (GET_MODE (orig_src) ==
VOIDmode ? mode : GET_MODE (orig_src), orig_src))

Otherwise I think this is OK, as said, using
force_reg somehow would improve readability here I think.

I also wonder how the

      else if (GET_CODE (src) == CONCAT)

case will ever trigger with the current code.

Richard.

>
> 2023-07-25  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR middle-end/28071
>         PR rtl-optimization/110587
>         * expr.cc (emit_group_load_1): Avoid copying a pseudo register into
>         a new pseudo register, i.e. only copy hard regs into a new pseudo.
>
>
> Thanks in advance,
> Roger
> --
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: [PATCH] PR rtl-optimization/110587: Reduce useless moves in compile-time hog.
  2023-07-25 11:50 ` Richard Biener
@ 2023-07-27 17:12   ` Roger Sayle
  2023-07-27 19:05     ` Richard Biener
  0 siblings, 1 reply; 4+ messages in thread
From: Roger Sayle @ 2023-07-27 17:12 UTC (permalink / raw)
  To: 'Richard Biener'; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4273 bytes --]


Hi Richard,

You're 100% right.  It’s possible to significantly clean-up this code, replacing
the body of the conditional with a call to force_reg and simplifying the conditions
under which it is called.  These improvements are implemented in the patch
below, which has been tested on x86_64-pc-linux-gnu, with a bootstrap and
make -k check, both with and without -m32, as usual.

Interestingly, the CONCAT clause afterwards is still required (I've learned something
new),  as calling force_reg (or gen_reg_rtx) with HCmode, actually returns a CONCAT
instead of a REG, so although the code looks dead, it's required to build libgcc during
a bootstrap.  But the remaining clean-up is good, reducing the number of source lines
and making the logic easier to understand.

Ok for mainline?

2023-07-27  Roger Sayle  <roger@nextmovesoftware.com>
            Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
        PR middle-end/28071
        PR rtl-optimization/110587
        * expr.cc (emit_group_load_1): Simplify logic for calling
        force_reg on ORIG_SRC, to avoid making a copy if the source
        is already in a pseudo register.

Roger
--

> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: 25 July 2023 12:50
> 
> On Tue, Jul 25, 2023 at 1:31 PM Roger Sayle <roger@nextmovesoftware.com>
> wrote:
> >
> > This patch is the third in series of fixes for PR
> > rtl-optimization/110587, a compile-time regression with -O0, that
> > attempts to address the underlying cause.  As noted previously, the
> > pathological test case pr28071.c contains a large number of useless
> > register-to-register moves that can produce quadratic behaviour (in
> > LRA).  These move are generated during RTL expansion in
> > emit_group_load_1, where the middle-end attempts to simplify the
> > source before calling extract_bit_field.  This is reasonable if the
> > source is a complex expression (from before the tree-ssa optimizers),
> > or a SUBREG, or a hard register, but it's not particularly useful to
> > copy a pseudo register into a new pseudo register.  This patch eliminates that
> redundancy.
> >
> > The -fdump-tree-expand for pr28071.c compiled with -O0 currently
> > contains 777K lines, with this patch it contains 717K lines, i.e.
> > saving about 60K lines (admittedly of debugging text output, but it makes the
> point).
> >
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> > As always, I'm happy to revert this change quickly if there's a
> > problem, and investigate why this additional copy might (still) be
> > needed on other
> > non-x86 targets.
> 
> @@ -2622,6 +2622,7 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src,
> tree type,
>          be loaded directly into the destination.  */
>        src = orig_src;
>        if (!MEM_P (orig_src)
> +         && (!REG_P (orig_src) || HARD_REGISTER_P (orig_src))
>           && (!CONSTANT_P (orig_src)
>               || (GET_MODE (orig_src) != mode
>                   && GET_MODE (orig_src) != VOIDmode)))
> 
> so that means the code guarded by the conditional could instead be transformed
> to
> 
>    src = force_reg (mode, orig_src);
> 
> ?  Btw, the || (GET_MODE (orig_src) != mode && GET_MODE (orig_src) !=
> VOIDmode) case looks odd as in that case we'd use GET_MODE (orig_src) for the
> move ... that might also mean we have to use force_reg (GET_MODE (orig_src) ==
> VOIDmode ? mode : GET_MODE (orig_src), orig_src))
> 
> Otherwise I think this is OK, as said, using force_reg somehow would improve
> readability here I think.
> 
> I also wonder how the
> 
>       else if (GET_CODE (src) == CONCAT)
> 
> case will ever trigger with the current code.
> 
> Richard.
> 
> >
> > 2023-07-25  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         PR middle-end/28071
> >         PR rtl-optimization/110587
> >         * expr.cc (emit_group_load_1): Avoid copying a pseudo register into
> >         a new pseudo register, i.e. only copy hard regs into a new pseudo.
> >
> >


[-- Attachment #2: patchhg4.txt --]
[-- Type: text/plain, Size: 846 bytes --]

diff --git a/gcc/expr.cc b/gcc/expr.cc
index fff09dc..174f8ac 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -2622,16 +2622,11 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src, tree type,
 	 be loaded directly into the destination.  */
       src = orig_src;
       if (!MEM_P (orig_src)
-	  && (!CONSTANT_P (orig_src)
-	      || (GET_MODE (orig_src) != mode
-		  && GET_MODE (orig_src) != VOIDmode)))
+	  && (!REG_P (orig_src) || HARD_REGISTER_P (orig_src))
+	  && !CONSTANT_P (orig_src))
 	{
-	  if (GET_MODE (orig_src) == VOIDmode)
-	    src = gen_reg_rtx (mode);
-	  else
-	    src = gen_reg_rtx (GET_MODE (orig_src));
-
-	  emit_move_insn (src, orig_src);
+	  gcc_assert (GET_MODE (orig_src) != VOIDmode);
+	  src = force_reg (GET_MODE (orig_src), orig_src);
 	}
 
       /* Optimize the access just a bit.  */

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] PR rtl-optimization/110587: Reduce useless moves in compile-time hog.
  2023-07-27 17:12   ` Roger Sayle
@ 2023-07-27 19:05     ` Richard Biener
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Biener @ 2023-07-27 19:05 UTC (permalink / raw)
  To: Roger Sayle; +Cc: gcc-patches



> Am 27.07.2023 um 19:12 schrieb Roger Sayle <roger@nextmovesoftware.com>:
> 
> 
> Hi Richard,
> 
> You're 100% right.  It’s possible to significantly clean-up this code, replacing
> the body of the conditional with a call to force_reg and simplifying the conditions
> under which it is called.  These improvements are implemented in the patch
> below, which has been tested on x86_64-pc-linux-gnu, with a bootstrap and
> make -k check, both with and without -m32, as usual.
> 
> Interestingly, the CONCAT clause afterwards is still required (I've learned something
> new),  as calling force_reg (or gen_reg_rtx) with HCmode, actually returns a CONCAT
> instead of a REG,

Heh, interesting.

> so although the code looks dead, it's required to build libgcc during
> a bootstrap.  But the remaining clean-up is good, reducing the number of source lines
> and making the logic easier to understand.
> 
> Ok for mainline?

Ok.

Thanks,
Richard 

> 2023-07-27  Roger Sayle  <roger@nextmovesoftware.com>
>            Richard Biener  <rguenther@suse.de>
> 
> gcc/ChangeLog
>        PR middle-end/28071
>        PR rtl-optimization/110587
>        * expr.cc (emit_group_load_1): Simplify logic for calling
>        force_reg on ORIG_SRC, to avoid making a copy if the source
>        is already in a pseudo register.
> 
> Roger
> --
> 
>> -----Original Message-----
>> From: Richard Biener <richard.guenther@gmail.com>
>> Sent: 25 July 2023 12:50
>> 
>>> On Tue, Jul 25, 2023 at 1:31 PM Roger Sayle <roger@nextmovesoftware.com>
>>> wrote:
>>> 
>>> This patch is the third in series of fixes for PR
>>> rtl-optimization/110587, a compile-time regression with -O0, that
>>> attempts to address the underlying cause.  As noted previously, the
>>> pathological test case pr28071.c contains a large number of useless
>>> register-to-register moves that can produce quadratic behaviour (in
>>> LRA).  These move are generated during RTL expansion in
>>> emit_group_load_1, where the middle-end attempts to simplify the
>>> source before calling extract_bit_field.  This is reasonable if the
>>> source is a complex expression (from before the tree-ssa optimizers),
>>> or a SUBREG, or a hard register, but it's not particularly useful to
>>> copy a pseudo register into a new pseudo register.  This patch eliminates that
>> redundancy.
>>> 
>>> The -fdump-tree-expand for pr28071.c compiled with -O0 currently
>>> contains 777K lines, with this patch it contains 717K lines, i.e.
>>> saving about 60K lines (admittedly of debugging text output, but it makes the
>> point).
>>> 
>>> 
>>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>>> and make -k check, both with and without --target_board=unix{-m32}
>>> with no new failures.  Ok for mainline?
>>> 
>>> As always, I'm happy to revert this change quickly if there's a
>>> problem, and investigate why this additional copy might (still) be
>>> needed on other
>>> non-x86 targets.
>> 
>> @@ -2622,6 +2622,7 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src,
>> tree type,
>>         be loaded directly into the destination.  */
>>       src = orig_src;
>>       if (!MEM_P (orig_src)
>> +         && (!REG_P (orig_src) || HARD_REGISTER_P (orig_src))
>>          && (!CONSTANT_P (orig_src)
>>              || (GET_MODE (orig_src) != mode
>>                  && GET_MODE (orig_src) != VOIDmode)))
>> 
>> so that means the code guarded by the conditional could instead be transformed
>> to
>> 
>>   src = force_reg (mode, orig_src);
>> 
>> ?  Btw, the || (GET_MODE (orig_src) != mode && GET_MODE (orig_src) !=
>> VOIDmode) case looks odd as in that case we'd use GET_MODE (orig_src) for the
>> move ... that might also mean we have to use force_reg (GET_MODE (orig_src) ==
>> VOIDmode ? mode : GET_MODE (orig_src), orig_src))
>> 
>> Otherwise I think this is OK, as said, using force_reg somehow would improve
>> readability here I think.
>> 
>> I also wonder how the
>> 
>>      else if (GET_CODE (src) == CONCAT)
>> 
>> case will ever trigger with the current code.
>> 
>> Richard.
>> 
>>> 
>>> 2023-07-25  Roger Sayle  <roger@nextmovesoftware.com>
>>> 
>>> gcc/ChangeLog
>>>        PR middle-end/28071
>>>        PR rtl-optimization/110587
>>>        * expr.cc (emit_group_load_1): Avoid copying a pseudo register into
>>>        a new pseudo register, i.e. only copy hard regs into a new pseudo.
>>> 
>>> 
> 
> <patchhg4.txt>

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-07-27 19:05 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-25 11:31 [PATCH] PR rtl-optimization/110587: Reduce useless moves in compile-time hog Roger Sayle
2023-07-25 11:50 ` Richard Biener
2023-07-27 17:12   ` Roger Sayle
2023-07-27 19:05     ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).