public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/32629] missing CSE for constant in registers / inefficient memset
       [not found] <bug-32629-4@http.gcc.gnu.org/bugzilla/>
@ 2012-06-06 10:06 ` rguenth at gcc dot gnu.org
  2012-06-09 22:17 ` hubicka at ucw dot cz
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-06-06 10:06 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32629

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-06-06
     Ever Confirmed|0                           |1

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-06-06 10:05:57 UTC ---
Confirmed with -Os on trunk (4.8).  With -O2 we unroll completely to

f:
.LFB0:
        .cfi_startproc
        movq    $0, (%rdi)
        movq    $0, 8(%rdi)
        movq    $0, 16(%rdi)
        movq    $0, 24(%rdi)
        movq    $0, 32(%rdi)
        movq    $0, 40(%rdi)
        movq    $0, 48(%rdi)
        movq    $0, 56(%rdi)
        movq    $0, 64(%rdi)
        movq    $0, 72(%rdi)
        ret

which lacks the size optimization to use a zeroed %rax.  Likewise
for -Os which now looks like

   0:   48 8d 57 30             lea    0x30(%rdi),%rdx
   4:   48 c7 07 00 00 00 00    movq   $0x0,(%rdi)
   b:   48 c7 47 08 00 00 00    movq   $0x0,0x8(%rdi)
  12:   00 
  13:   48 c7 47 10 00 00 00    movq   $0x0,0x10(%rdi)
  1a:   00 
  1b:   48 c7 47 18 00 00 00    movq   $0x0,0x18(%rdi)
  22:   00 
  23:   b9 08 00 00 00          mov    $0x8,%ecx
  28:   48 c7 47 20 00 00 00    movq   $0x0,0x20(%rdi)
  2f:   00 
  30:   48 c7 47 28 00 00 00    movq   $0x0,0x28(%rdi)
  37:   00 
  38:   31 c0                   xor    %eax,%eax
  3a:   48 89 d7                mov    %rdx,%rdi
  3d:   f3 ab                   rep stos %eax,%es:(%rdi)
  3f:   c3                      retq   

I suppose with -Os we use rep stosl because that's one byte smaller ...(?)

I suppose doing the $0x0 optimization should be done post-reload.
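To put a rough number on the size optimization, here is a minimal C sketch that adds up encoding lengths for the ten-store sequence. The immediate-store lengths (7 bytes without displacement, 8 with an 8-bit displacement) and the 2-byte xor are read off the disassembly above; the register-store lengths (3 and 4 bytes) are my own reading of the x86-64 encoding and are not shown in this thread. The function names are hypothetical.

```c
#include <stddef.h>

/* Bytes for `movq $0, off(%rdi)`: REX.W + opcode + ModRM + imm32,
   plus one displacement byte when off is non-zero (0 < off < 128). */
size_t store_imm0_size(int off)
{
    return 7 + (off != 0 ? 1 : 0);
}

/* Bytes for `movq %rax, off(%rdi)`: REX.W + opcode + ModRM,
   plus one displacement byte when off is non-zero (0 < off < 128).
   Assumption: these lengths are not taken from the thread. */
size_t store_reg_size(int off)
{
    return 3 + (off != 0 ? 1 : 0);
}

/* Total code size of `nstores` consecutive 8-byte zero stores starting
   at (%rdi).  With use_register set, the sequence first pays 2 bytes
   for `xor %eax, %eax` and then stores from the register. */
size_t sequence_size(int nstores, int use_register)
{
    size_t total = use_register ? 2 : 0;
    for (int i = 0; i < nstores; i++) {
        int off = 8 * i;  /* only valid while off fits in 8 bits */
        total += use_register ? store_reg_size(off) : store_imm0_size(off);
    }
    return total;
}
```

Under these assumptions sequence_size(10, 0) is 79 bytes and sequence_size(10, 1) is 41 bytes, not counting the final ret.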



* [Bug rtl-optimization/32629] missing CSE for constant in registers / inefficient memset
From: hubicka at ucw dot cz @ 2012-06-09 22:17 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32629

--- Comment #2 from Jan Hubicka <hubicka at ucw dot cz> 2012-06-09 22:17:07 UTC ---
> I suppose doing the $0x0 optimization should be done post-reload.
I have been wondering for some years how to implement this nicely. I don't
see how this can be done without a specialized pass, really, and the interface
is probably going to be a bit weird, since it is a very weird property of the
x86 instruction set that there are no stores with a short immediate...

Honza



* [Bug rtl-optimization/32629] missing CSE for constant in registers / inefficient memset
From: rguenther at suse dot de @ 2012-06-11  8:39 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32629

--- Comment #3 from rguenther at suse dot de <rguenther at suse dot de> 2012-06-11 08:39:45 UTC ---
On Sat, 9 Jun 2012, hubicka at ucw dot cz wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32629
> 
> --- Comment #2 from Jan Hubicka <hubicka at ucw dot cz> 2012-06-09 22:17:07 UTC ---
> > I suppose doing the $0x0 optimization should be done post-reload.
> I have been wondering for some years how to implement this nicely. I don't
> see how this can be done without a specialized pass, really, and the interface
> is probably going to be a bit weird, since it is a very weird property of the
> x86 instruction set that there are no stores with a short immediate...

I wonder if we can use a peephole2 and DF information (and update it
on-the-fly).  Thus, when seeing

  mov $0, ...
  mov $0, ...

transform it incrementally to

  xor %eax, %eax
  mov %eax, ...
  mov $0, ...

and then have a second peephole2 with higher priority that looks for
a register with zero content (eh ... that's the interesting part ;))
and produces

  xor %eax, %eax
  mov %eax, ...
  mov %eax, ...

I suppose we have the first peephole already - though we'd end up
with

  xor %eax, %eax
  mov %eax, ...
  xor %eax, %eax
  mov %eax, ...

and would have to rely on postreload-cse to clean that up (which isn't run
after the peephole2 in the postreload queue ...)
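The intended end result of the two rewrites can be sketched as a toy pass in C. This is only a model of the idea, not GCC's peephole2 or DF machinery: the instruction kinds and the function name are hypothetical, and it simply assumes that %eax is free wherever a zero store appears and that clobbering the flags with xor is harmless, which is exactly the part a real pass would have to prove.

```c
#include <stddef.h>

/* Toy instruction kinds; hypothetical names, not GCC RTL. */
enum insn_kind {
    STORE_IMM0,   /* mov $0, mem                     */
    ZERO_REG,     /* xor %eax, %eax                  */
    STORE_REG,    /* mov %eax, mem                   */
    CLOBBER_REG   /* any insn that overwrites %eax   */
};

/* Rewrite in[0..n) into out, which must have room for 2*n entries.
   Returns the number of instructions written.
   Assumption: %eax is dead at every STORE_IMM0, so it may be used
   as the zero register without saving it. */
size_t reuse_zero_register(const enum insn_kind *in, size_t n,
                           enum insn_kind *out)
{
    size_t m = 0;
    int reg_is_zero = 0;  /* does %eax currently hold zero? */

    for (size_t i = 0; i < n; i++) {
        switch (in[i]) {
        case STORE_IMM0:
            /* Materialize zero once, then store from the register. */
            if (!reg_is_zero) {
                out[m++] = ZERO_REG;
                reg_is_zero = 1;
            }
            out[m++] = STORE_REG;
            break;
        case ZERO_REG:
            /* Drop a redundant xor when the register is already zero. */
            if (!reg_is_zero) {
                out[m++] = ZERO_REG;
                reg_is_zero = 1;
            }
            break;
        case CLOBBER_REG:
            out[m++] = CLOBBER_REG;
            reg_is_zero = 0;
            break;
        case STORE_REG:
            out[m++] = STORE_REG;
            break;
        }
    }
    return m;
}
```

Feeding it three STORE_IMM0 entries yields one ZERO_REG followed by three STORE_REG, and feeding it the xor/mov/xor/mov sequence above yields xor/mov/mov, which is the cleanup otherwise expected from postreload-cse.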

Richard.



* [Bug rtl-optimization/32629] missing CSE for constant in registers / inefficient memset
From: hubicka at gcc dot gnu.org @ 2014-09-27  0:15 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32629

--- Comment #4 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Yep, the problem of dealing with arbitrarily long sequences is something we
need to solve.  Also, memcpy/memset ought to use vector moves by themselves in
these cases.
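As a rough illustration of the vector-move idea, here is a C sketch that zeroes an 80-byte block with unaligned 16-byte stores using SSE2 intrinsics. The 80-byte size is inferred from the ten 8-byte stores in comment #1, the function name is hypothetical, and the fallback to memset on targets without SSE2 is my own addition. It shows the shape of the code one would hope the compiler emits by itself, not what GCC does internally.

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Zero exactly 80 bytes starting at p (hypothetical helper). */
void zero80_vector(void *p)
{
#if defined(__SSE2__)
    /* One zeroed vector register, reused for every store. */
    const __m128i zero = _mm_setzero_si128();
    char *bytes = p;
    for (size_t off = 0; off < 80; off += 16)
        _mm_storeu_si128((__m128i *)(bytes + off), zero);
#else
    /* No SSE2 available: fall back to the library routine. */
    memset(p, 0, 80);
#endif
}
```

The zero vector is created once and reused for all five stores, which is the register-reuse property this bug asks for in the scalar case.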



* [Bug rtl-optimization/32629] missing CSE for constant in registers / inefficient memset
From: rguenth at gcc dot gnu.org @ 2014-09-29 10:24 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32629

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
We could also make $0 not a legitimate constant on x86... (and undo that late
with a peephole2)



* [Bug rtl-optimization/32629] missing CSE for constant in registers / inefficient memset
From: hubicka at ucw dot cz @ 2014-09-29 16:50 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32629

--- Comment #6 from Jan Hubicka <hubicka at ucw dot cz> ---
> We could also make $0 not a legitimate constant on x86... (and undo that late
> with a peephole2)

I tried that in the 90s. At that time it increased register pressure and was
not a win...

Honza



* [Bug rtl-optimization/32629] missing CSE for constant in registers / inefficient memset
From: pinskia at gcc dot gnu.org @ 2021-07-26 22:20 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32629

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|NEW                         |RESOLVED

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
On the trunk at -O2 we have:
        pxor    %xmm0, %xmm0
        movups  %xmm0, (%rdi)
        movups  %xmm0, 16(%rdi)
        movups  %xmm0, 32(%rdi)
        pxor    %xmm0, %xmm0
        movups  %xmm0, 48(%rdi)
        movups  %xmm0, 64(%rdi)
        ret

With -mno-sse we get:
        movq    $0, (%rdi)
        movq    $0, 8(%rdi)
        movq    $0, 16(%rdi)
        movq    $0, 24(%rdi)
        movq    $0, 32(%rdi)
        movq    $0, 40(%rdi)
        movq    $0, 48(%rdi)
        movq    $0, 56(%rdi)
        movq    $0, 64(%rdi)
        movq    $0, 72(%rdi)
        ret

At -Os we get:
        xorl    %eax, %eax
        leaq    48(%rdi), %rdx
        movl    $8, %ecx
        movq    %rax, (%rdi)
        movq    %rax, 8(%rdi)
        movq    %rax, 16(%rdi)
        movq    %rax, 24(%rdi)
        movq    %rax, 32(%rdi)
        movq    %rax, 40(%rdi)
        xorl    %eax, %eax
        movq    %rdx, %rdi

Which was implemented by PR 11877.
So I am going to close this as a dup of bug 11877.

*** This bug has been marked as a duplicate of bug 11877 ***
