public inbox for gcc-bugs@sourceware.org
* [Bug c++/32735]  New: i686 sse2 generates more movdqa than necessary
@ 2007-07-12  1:30 mec at google dot com
  2007-07-12  1:30 ` [Bug c++/32735] " mec at google dot com
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: mec at google dot com @ 2007-07-12  1:30 UTC (permalink / raw)
  To: gcc-bugs

Test program attached.

Command line:

mec@hollerith:~/exp-sum-delta$ /home/mec/gcc-4.3-20070707/install/bin/g++ -v -S -O2 -msse2 sum-delta.cc
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: /home/mec/gcc-4.3-20070707/src/configure --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu --target=i686-pc-linux-gnu --prefix=/home/mec/gcc-4.3-20070707/install --enable-languages=c,c++,objc,obj-c++,treelang --with-gmp=/home/mec/gmp-4.2.1/install --with-mpfr=/home/mec/mpfr-2.2.1/install
Thread model: posix
gcc version 4.3.0 20070707 (experimental)
 /home/mec/gcc-4.3-20070707/install/libexec/gcc/i686-pc-linux-gnu/4.3.0/cc1plus -quiet -v -D_GNU_SOURCE sum-delta.cc -quiet -dumpbase sum-delta.cc -msse2 -mtune=generic -auxbase sum-delta -O2 -version -o sum-delta.s
ignoring nonexistent directory "/home/mec/gcc-4.3-20070707/install/lib/gcc/i686-pc-linux-gnu/4.3.0/../../../../i686-pc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /home/mec/gcc-4.3-20070707/install/lib/gcc/i686-pc-linux-gnu/4.3.0/../../../../include/c++/4.3.0
 /home/mec/gcc-4.3-20070707/install/lib/gcc/i686-pc-linux-gnu/4.3.0/../../../../include/c++/4.3.0/i686-pc-linux-gnu
 /home/mec/gcc-4.3-20070707/install/lib/gcc/i686-pc-linux-gnu/4.3.0/../../../../include/c++/4.3.0/backward
 /usr/local/include
 /home/mec/gcc-4.3-20070707/install/include
 /home/mec/gcc-4.3-20070707/install/lib/gcc/i686-pc-linux-gnu/4.3.0/include
 /home/mec/gcc-4.3-20070707/install/lib/gcc/i686-pc-linux-gnu/4.3.0/include-fixed
 /usr/include
End of search list.
GNU C++ version 4.3.0 20070707 (experimental) (i686-pc-linux-gnu)
        compiled by GNU C version 4.3.0 20070707 (experimental), GMP version 4.2.1, MPFR version 2.2.1.
warning: GMP header version 4.2.1 differs from library version 4.1.4.
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: 1338ea4083517ffee92283f96caf8872

===

The loop for CallSumDeltas2 compiles to:

.L7:
        movdqa  %xmm1, %xmm0
        pslldq  $4, %xmm0
        addl    $1, %eax
        paddd   %xmm1, %xmm0
        cmpl    $100000000, %eax
        movdqa  %xmm0, %xmm1
        pslldq  $8, %xmm1
        paddd   %xmm1, %xmm0
        movdqa  %xmm0, %xmm1
        movdqa  %xmm0, foo1
        jne     .L7

===

This is two more movdqa than the hand-written code in CallSumDeltas3.


-- 
           Summary: i686 sse2 generates more movdqa than necessary
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: mec at google dot com
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32735



* [Bug c++/32735] i686 sse2 generates more movdqa than necessary
  2007-07-12  1:30 [Bug c++/32735] New: i686 sse2 generates more movdqa than necessary mec at google dot com
@ 2007-07-12  1:30 ` mec at google dot com
  2007-07-12  1:31 ` mec at google dot com
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: mec at google dot com @ 2007-07-12  1:30 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from mec at google dot com  2007-07-12 01:30 -------
Created an attachment (id=13894)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13894&action=view)
Test program



* [Bug c++/32735] i686 sse2 generates more movdqa than necessary
  2007-07-12  1:30 [Bug c++/32735] New: i686 sse2 generates more movdqa than necessary mec at google dot com
  2007-07-12  1:30 ` [Bug c++/32735] " mec at google dot com
@ 2007-07-12  1:31 ` mec at google dot com
  2007-07-12  1:33 ` mec at google dot com
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: mec at google dot com @ 2007-07-12  1:31 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from mec at google dot com  2007-07-12 01:31 -------
Created an attachment (id=13895)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13895&action=view)
Generated code



* [Bug c++/32735] i686 sse2 generates more movdqa than necessary
  2007-07-12  1:30 [Bug c++/32735] New: i686 sse2 generates more movdqa than necessary mec at google dot com
  2007-07-12  1:30 ` [Bug c++/32735] " mec at google dot com
  2007-07-12  1:31 ` mec at google dot com
@ 2007-07-12  1:33 ` mec at google dot com
  2007-07-12  1:33 ` mec at google dot com
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: mec at google dot com @ 2007-07-12  1:33 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from mec at google dot com  2007-07-12 01:33 -------
Created an attachment (id=13896)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13896&action=view)
Assembly code



* [Bug c++/32735] i686 sse2 generates more movdqa than necessary
  2007-07-12  1:30 [Bug c++/32735] New: i686 sse2 generates more movdqa than necessary mec at google dot com
                   ` (2 preceding siblings ...)
  2007-07-12  1:33 ` mec at google dot com
@ 2007-07-12  1:33 ` mec at google dot com
  2007-07-12  8:22 ` [Bug target/32735] " ubizjak at gmail dot com
  2007-07-14 14:04 ` ubizjak at gmail dot com
  5 siblings, 0 replies; 7+ messages in thread
From: mec at google dot com @ 2007-07-12  1:33 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from mec at google dot com  2007-07-12 01:33 -------
Created an attachment (id=13897)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13897&action=view)
Assembly code



* [Bug target/32735] i686 sse2 generates more movdqa than necessary
  2007-07-12  1:30 [Bug c++/32735] New: i686 sse2 generates more movdqa than necessary mec at google dot com
                   ` (3 preceding siblings ...)
  2007-07-12  1:33 ` mec at google dot com
@ 2007-07-12  8:22 ` ubizjak at gmail dot com
  2007-07-14 14:04 ` ubizjak at gmail dot com
  5 siblings, 0 replies; 7+ messages in thread
From: ubizjak at gmail dot com @ 2007-07-12  8:22 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from ubizjak at gmail dot com  2007-07-12 08:22 -------
(In reply to comment #0)

> The loop for CallSumDeltas2 compiles to:
> 
> .L7:
>         movdqa  %xmm1, %xmm0
>         pslldq  $4, %xmm0
>         addl    $1, %eax
>         paddd   %xmm1, %xmm0
>         cmpl    $100000000, %eax
>         movdqa  %xmm0, %xmm1
>         pslldq  $8, %xmm1
>         paddd   %xmm1, %xmm0
>         movdqa  %xmm0, %xmm1
>         movdqa  %xmm0, foo1
>         jne     .L7
> 
> ===
> 
> This is two more movdqa then the hand-written code in CallSumDeltas3.

         paddd   %xmm1, %xmm0       (2)
         movdqa  %xmm0, %xmm1       (2)
         movdqa  %xmm0, foo1        (1)
         jne     .L7

(1) is assignment to a global variable. I'm not sure that it can be pushed out
of the loop, but this can be solved by adding a local temporary in
CallSumDeltas2().
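A minimal sketch of the local-temporary workaround follows. The attached test
program is not reproduced in this thread, so everything here is an assumption
reconstructed from the assembly: the helper PrefixSum4 and the two loop
functions are hypothetical names, and only foo1 matches a symbol seen above.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Global destination; the name mirrors the `foo1` symbol in the assembly.
// The real declaration in the attachment may differ (assumption).
__m128i foo1;

// Hypothetical helper: in-register prefix sum of four 32-bit lanes,
// (a, b, c, d) -> (a, a+b, a+b+c, a+b+c+d).
// Each step is a byte shift (pslldq) followed by an add (paddd).
static inline __m128i PrefixSum4(__m128i v) {
  v = _mm_add_epi32(v, _mm_slli_si128(v, 4));
  v = _mm_add_epi32(v, _mm_slli_si128(v, 8));
  return v;
}

// Writes the global on every iteration, so the compiler may keep a
// store to foo1 inside the loop body.
void LoopStoringGlobal(int n) {
  for (int i = 0; i < n; ++i) {
    foo1 = PrefixSum4(foo1);
  }
}

// Same computation, but accumulates in a local and stores once after
// the loop, which takes the store out of the loop body by construction.
void LoopWithLocal(int n) {
  __m128i acc = foo1;
  for (int i = 0; i < n; ++i) {
    acc = PrefixSum4(acc);
  }
  foo1 = acc;
}
```

Both functions leave the same value in foo1; the second only changes where the
store happens, so it addresses (1) and not the register-copy issue in (2).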

(2) is probably regmove, failing to optimize:

(set (reg:V4SI 21 xmm0 [72])
     (plus:V4SI (reg:V4SI 21 xmm0 [69])
                (reg:V4SI 22 xmm1 [71]))) 843 {*addv4si3} (nil))

(set (reg:V2DI 22 xmm1 [orig:73 foo1 ] [73])
     (reg:V2DI 21 xmm0 [72])) 698 {*movv2di_internal} (nil))

into

(set (reg:V4SI 21 xmm1 [72])
     (plus:V4SI (reg:V4SI 21 xmm1 [69])
                (reg:V4SI 22 xmm0 [71]))) 843 {*addv4si3} (nil))



* [Bug target/32735] i686 sse2 generates more movdqa than necessary
  2007-07-12  1:30 [Bug c++/32735] New: i686 sse2 generates more movdqa than necessary mec at google dot com
                   ` (4 preceding siblings ...)
  2007-07-12  8:22 ` [Bug target/32735] " ubizjak at gmail dot com
@ 2007-07-14 14:04 ` ubizjak at gmail dot com
  5 siblings, 0 replies; 7+ messages in thread
From: ubizjak at gmail dot com @ 2007-07-14 14:04 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from ubizjak at gmail dot com  2007-07-14 14:04 -------
(In reply to comment #5)

> > This is two more movdqa then the hand-written code in CallSumDeltas3.
> 
>          paddd   %xmm1, %xmm0       (2)
>          movdqa  %xmm0, %xmm1       (2)
>          movdqa  %xmm0, foo1        (1)
>          jne     .L7

(1) is fixed by http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01330.html

(2) It looks like the register allocator should be enhanced to match the insn
_output_ to whichever input produces fewer moves. Operand 1 uses the "%0"
constraint (commutative, and tied to operand 0):

  [(set (match_operand:SSEMODEI 0 "register_operand" "=x")
        (plus:SSEMODEI
          (match_operand:SSEMODEI 1 "nonimmediate_operand" "%0")
          (match_operand:SSEMODEI 2 "nonimmediate_operand" "xm")))]

So there is no reason why the RA shouldn't match the output with the more
favorable _input_, producing a sequence that is one insn shorter:

        ...
        cmpl    $100000000, %eax
        movdqa  %xmm0, %xmm1
        pslldq  $8, %xmm1
        paddd   %xmm0, %xmm1        # paddd   %xmm1, %xmm0
                                    # movdqa  %xmm0, %xmm1
        jne     .L7
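The equivalence of the two loop tails can be checked with a small model,
sketched here under stated assumptions: the Regs struct and the functions
TailCurrent, TailProposed and Same are hypothetical names, with x0 and x1
standing in for %xmm0 and %xmm1. It relies only on paddd being commutative.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical model of the two vector registers used in the loop tail.
struct Regs {
  __m128i x0;  // stands in for %xmm0
  __m128i x1;  // stands in for %xmm1
};

// Tail as currently generated: the add targets %xmm0, then a copy
// moves the result into %xmm1.
inline Regs TailCurrent(Regs r) {
  r.x1 = _mm_slli_si128(r.x0, 8);    // movdqa %xmm0,%xmm1 ; pslldq $8,%xmm1
  r.x0 = _mm_add_epi32(r.x0, r.x1);  // paddd  %xmm1,%xmm0
  r.x1 = r.x0;                       // movdqa %xmm0,%xmm1
  return r;
}

// Proposed tail: the add targets %xmm1 directly, so no copy is needed.
inline Regs TailProposed(Regs r) {
  r.x1 = _mm_slli_si128(r.x0, 8);    // movdqa %xmm0,%xmm1 ; pslldq $8,%xmm1
  r.x1 = _mm_add_epi32(r.x1, r.x0);  // paddd  %xmm0,%xmm1
  return r;
}

// True when all 128 bits of a and b are equal.
inline bool Same(__m128i a, __m128i b) {
  return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}
```

After either tail, x1 holds the same value; only x0 differs, and the top of
the loop overwrites %xmm0 from %xmm1 before reading it.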


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2007-07-14 14:04:19
               date|                            |

