public inbox for gcc-patches@gcc.gnu.org
* [PATCH] Don't call emit_clobber in lower-subreg.cc's resolve_simple_move.
@ 2023-05-06 12:57 Roger Sayle
  2023-05-06 18:46 ` Jeff Law
  0 siblings, 1 reply; 4+ messages in thread
From: Roger Sayle @ 2023-05-06 12:57 UTC
  To: 'GCC Patches'

[-- Attachment #1: Type: text/plain, Size: 1591 bytes --]


Following up on posts/reviews by Segher and Uros, there's some question
over why the middle-end's lower subreg pass emits a clobber (of a
multi-word register) into the instruction stream before emitting the
sequence of moves of the word-sized parts.  This clobber interferes
with (LRA) register allocation, preventing the multi-word pseudo from
remaining in the same hard registers.  This patch eliminates this
(presumably superfluous) clobber and thereby improves register allocation.

A concrete example of the observed improvement is PR target/43644.
For the test case:
__int128 foo(__int128 x, __int128 y) { return x+y; }

on x86_64-pc-linux-gnu, gcc -O2 currently generates:

foo:    movq    %rsi, %rax
        movq    %rdi, %r8
        movq    %rax, %rdi
        movq    %rdx, %rax
        movq    %rcx, %rdx
        addq    %r8, %rax
        adcq    %rdi, %rdx
        ret

with this patch, we now generate the much improved:

foo:    movq    %rdx, %rax
        movq    %rcx, %rdx
        addq    %rdi, %rax
        adcq    %rsi, %rdx
        ret

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32} with
no new failures.  OK for mainline?


2023-05-06  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        PR target/43644
        * lower-subreg.cc (resolve_simple_move): Don't emit a clobber
        immediately before moving a multi-word register by parts.

gcc/testsuite/ChangeLog
        PR target/43644
        * gcc.target/i386/pr43644.c: New test case.


Thanks in advance,
Roger
--


[-- Attachment #2: patchcl.txt --]
[-- Type: text/plain, Size: 958 bytes --]

diff --git a/gcc/lower-subreg.cc b/gcc/lower-subreg.cc
index 81fc5380..7c9cc3c 100644
--- a/gcc/lower-subreg.cc
+++ b/gcc/lower-subreg.cc
@@ -1086,9 +1086,6 @@ resolve_simple_move (rtx set, rtx_insn *insn)
     {
       unsigned int i;
 
-      if (REG_P (dest) && !HARD_REGISTER_NUM_P (REGNO (dest)))
-	emit_clobber (dest);
-
       for (i = 0; i < words; ++i)
 	{
 	  rtx t = simplify_gen_subreg_concatn (word_mode, dest,
diff --git a/gcc/testsuite/gcc.target/i386/pr43644.c b/gcc/testsuite/gcc.target/i386/pr43644.c
new file mode 100644
index 0000000..ffdf31c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr43644.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+__int128 foo(__int128 x, __int128 y)
+{
+  return x+y;
+}
+
+/* { dg-final { scan-assembler-times "movq" 2 } } */
+/* { dg-final { scan-assembler-not "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */


* Re: [PATCH] Don't call emit_clobber in lower-subreg.cc's resolve_simple_move.
From: Jeff Law @ 2023-05-06 18:46 UTC
  To: Roger Sayle, 'GCC Patches'



On 5/6/23 06:57, Roger Sayle wrote:
> 
> Following up on posts/reviews by Segher and Uros, there's some question
> over why the middle-end's lower subreg pass emits a clobber (of a
> multi-word register) into the instruction stream before emitting the
> sequence of moves of the word-sized parts.  This clobber interferes
> with (LRA) register allocation, preventing the multi-word pseudo from
> remaining in the same hard registers.  This patch eliminates this
> (presumably superfluous) clobber and thereby improves register allocation.
Those clobbers used to help dataflow analysis know that a multi-word
register was fully assigned by a subsequent sequence.  I suspect they 
haven't been terribly useful in quite a while.


> 
> A concrete example of the observed improvement is PR target/43644.
> For the test case:
> __int128 foo(__int128 x, __int128 y) { return x+y; }
> 
> on x86_64-pc-linux-gnu, gcc -O2 currently generates:
> 
> foo:    movq    %rsi, %rax
>          movq    %rdi, %r8
>          movq    %rax, %rdi
>          movq    %rdx, %rax
>          movq    %rcx, %rdx
>          addq    %r8, %rax
>          adcq    %rdi, %rdx
>          ret
> 
> with this patch, we now generate the much improved:
> 
> foo:    movq    %rdx, %rax
>          movq    %rcx, %rdx
>          addq    %rdi, %rax
>          adcq    %rsi, %rdx
>          ret
> 
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32} with
> no new failures.  OK for mainline?
> 
> 
> 2023-05-06  Roger Sayle  <roger@nextmovesoftware.com>
> 
> gcc/ChangeLog
>          PR target/43644
>          * lower-subreg.cc (resolve_simple_move): Don't emit a clobber
>          immediately before moving a multi-word register by parts.
> 
> gcc/testsuite/ChangeLog
>          PR target/43644
>          * gcc.target/i386/pr43644.c: New test case.
OK for the trunk.  I won't be at all surprised to see fallout in the 
various target tests.  We can fault in fixes as needed.  More 
importantly, I think we want as much soak time for this change as we can
get, in case there are unexpected consequences.

jeff


* Re: [PATCH] Don't call emit_clobber in lower-subreg.cc's resolve_simple_move.
From: Richard Biener @ 2023-05-08  6:43 UTC
  To: Jeff Law; +Cc: Roger Sayle, GCC Patches

On Sat, May 6, 2023 at 8:46 PM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 5/6/23 06:57, Roger Sayle wrote:
> >
> > Following up on posts/reviews by Segher and Uros, there's some question
> > over why the middle-end's lower subreg pass emits a clobber (of a
> > multi-word register) into the instruction stream before emitting the
> > sequence of moves of the word-sized parts.  This clobber interferes
> > with (LRA) register allocation, preventing the multi-word pseudo from
> > remaining in the same hard registers.  This patch eliminates this
> > (presumably superfluous) clobber and thereby improves register allocation.
> Those clobbers used to help dataflow analysis know that a multi-word
> register was fully assigned by a subsequent sequence.  I suspect they
> haven't been terribly useful in quite a while.

Likely, but maybe they still make a difference for some targets.
It might be interesting to see whether combining the clobber with the
first set, or making the set a multi-set with a parallel, would be any
better.



* Re: [PATCH] Don't call emit_clobber in lower-subreg.cc's resolve_simple_move.
From: Jeff Law @ 2023-05-08 22:02 UTC
  To: Richard Biener; +Cc: Roger Sayle, GCC Patches



On 5/8/23 00:43, Richard Biener wrote:
> On Sat, May 6, 2023 at 8:46 PM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>>
>>
>> On 5/6/23 06:57, Roger Sayle wrote:
>>>
>>> Following up on posts/reviews by Segher and Uros, there's some question
>>> over why the middle-end's lower subreg pass emits a clobber (of a
>>> multi-word register) into the instruction stream before emitting the
>>> sequence of moves of the word-sized parts.  This clobber interferes
>>> with (LRA) register allocation, preventing the multi-word pseudo from
>>> remaining in the same hard registers.  This patch eliminates this
>>> (presumably superfluous) clobber and thereby improves register allocation.
>> Those clobbers used to help dataflow analysis know that a multi-word
>> register was fully assigned by a subsequent sequence.  I suspect they
>> haven't been terribly useful in quite a while.
> 
> Likely, but maybe they still make a difference for some targets.
> It might be interesting to see whether combining the clobber with the
> first set, or making the set a multi-set with a parallel, would be any
> better.
Wrapping them inside a PARALLEL might be better, but probably isn't 
worth the effort.  I think all this stuff dates back to the era where we 
had flow.c to provide the register lifetimes used by local-alloc.  We 
also had things like REG_NO_CONFLICT to indicate that the sub-object 
assignments didn't conflict.  All in all, it was rather hackish.

Jeff

