public inbox for gcc-bugs@sourceware.org
* [Bug c++/32735] New: i686 sse2 generates more movdqa than necessary
@ 2007-07-12 1:30 mec at google dot com
2007-07-12 1:30 ` [Bug c++/32735] " mec at google dot com
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: mec at google dot com @ 2007-07-12 1:30 UTC (permalink / raw)
To: gcc-bugs
Test program attached.
Command line:
mec@hollerith:~/exp-sum-delta$ /home/mec/gcc-4.3-20070707/install/bin/g++ -v -S -O2 -msse2 sum-delta.cc
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: /home/mec/gcc-4.3-20070707/src/configure --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu --target=i686-pc-linux-gnu --prefix=/home/mec/gcc-4.3-20070707/install --enable-languages=c,c++,objc,obj-c++,treelang --with-gmp=/home/mec/gmp-4.2.1/install --with-mpfr=/home/mec/mpfr-2.2.1/install
Thread model: posix
gcc version 4.3.0 20070707 (experimental)
/home/mec/gcc-4.3-20070707/install/libexec/gcc/i686-pc-linux-gnu/4.3.0/cc1plus -quiet -v -D_GNU_SOURCE sum-delta.cc -quiet -dumpbase sum-delta.cc -msse2 -mtune=generic -auxbase sum-delta -O2 -version -o sum-delta.s
ignoring nonexistent directory "/home/mec/gcc-4.3-20070707/install/lib/gcc/i686-pc-linux-gnu/4.3.0/../../../../i686-pc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
/home/mec/gcc-4.3-20070707/install/lib/gcc/i686-pc-linux-gnu/4.3.0/../../../../include/c++/4.3.0
/home/mec/gcc-4.3-20070707/install/lib/gcc/i686-pc-linux-gnu/4.3.0/../../../../include/c++/4.3.0/i686-pc-linux-gnu
/home/mec/gcc-4.3-20070707/install/lib/gcc/i686-pc-linux-gnu/4.3.0/../../../../include/c++/4.3.0/backward
/usr/local/include
/home/mec/gcc-4.3-20070707/install/include
/home/mec/gcc-4.3-20070707/install/lib/gcc/i686-pc-linux-gnu/4.3.0/include
/home/mec/gcc-4.3-20070707/install/lib/gcc/i686-pc-linux-gnu/4.3.0/include-fixed
/usr/include
End of search list.
GNU C++ version 4.3.0 20070707 (experimental) (i686-pc-linux-gnu)
compiled by GNU C version 4.3.0 20070707 (experimental), GMP version 4.2.1, MPFR version 2.2.1.
warning: GMP header version 4.2.1 differs from library version 4.1.4.
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: 1338ea4083517ffee92283f96caf8872
===
The loop for CallSumDeltas2 compiles to:
.L7:
movdqa %xmm1, %xmm0
pslldq $4, %xmm0
addl $1, %eax
paddd %xmm1, %xmm0
cmpl $100000000, %eax
movdqa %xmm0, %xmm1
pslldq $8, %xmm1
paddd %xmm1, %xmm0
movdqa %xmm0, %xmm1
movdqa %xmm0, foo1
jne .L7
===
This is two more movdqa than the hand-written code in CallSumDeltas3.
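For readers without the attachment, the loop above corresponds to roughly the following source shape. This is only a sketch inferred from the assembly, not the attached sum-delta.cc: the function names SumDeltas and CallSumDeltasSketch and the iterations parameter are assumptions, and only foo1 is taken from the report. It needs -msse2 on i686.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Global destination; the name matches the symbol stored to in the
// reported assembly. Everything else here is hypothetical.
__m128i foo1;

// Inclusive prefix sum over four 32-bit lanes: shift left by one lane
// and add, then shift left by two lanes and add. This mirrors the
// pslldq $4 / paddd / pslldq $8 / paddd sequence in the loop.
static inline __m128i SumDeltas(__m128i x) {
  x = _mm_add_epi32(x, _mm_slli_si128(x, 4));
  x = _mm_add_epi32(x, _mm_slli_si128(x, 8));
  return x;
}

// Assumed loop shape: the global is read and written on every
// iteration, which is what forces "movdqa %xmm0, foo1" into the loop.
void CallSumDeltasSketch(int iterations) {
  for (int i = 0; i < iterations; ++i) {
    foo1 = SumDeltas(foo1);
  }
}
```

Each iteration needs one register copy per shift-and-add step, because pslldq overwrites its operand; any movdqa beyond those two is overhead.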
--
Summary: i686 sse2 generates more movdqa than necessary
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: mec at google dot com
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32735
* [Bug c++/32735] i686 sse2 generates more movdqa than necessary
From: mec at google dot com @ 2007-07-12 1:30 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from mec at google dot com 2007-07-12 01:30 -------
Created an attachment (id=13894)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13894&action=view)
Test program
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32735
* [Bug c++/32735] i686 sse2 generates more movdqa than necessary
From: mec at google dot com @ 2007-07-12 1:31 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from mec at google dot com 2007-07-12 01:31 -------
Created an attachment (id=13895)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13895&action=view)
Generated code
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32735
* [Bug c++/32735] i686 sse2 generates more movdqa than necessary
From: mec at google dot com @ 2007-07-12 1:33 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from mec at google dot com 2007-07-12 01:33 -------
Created an attachment (id=13896)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13896&action=view)
Assembly code
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32735
* [Bug c++/32735] i686 sse2 generates more movdqa than necessary
From: mec at google dot com @ 2007-07-12 1:33 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from mec at google dot com 2007-07-12 01:33 -------
Created an attachment (id=13897)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13897&action=view)
Assembly code
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32735
* [Bug target/32735] i686 sse2 generates more movdqa than necessary
From: ubizjak at gmail dot com @ 2007-07-12 8:22 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from ubizjak at gmail dot com 2007-07-12 08:22 -------
(In reply to comment #0)
> The loop for CallSumDeltas2 compiles to:
>
> .L7:
> movdqa %xmm1, %xmm0
> pslldq $4, %xmm0
> addl $1, %eax
> paddd %xmm1, %xmm0
> cmpl $100000000, %eax
> movdqa %xmm0, %xmm1
> pslldq $8, %xmm1
> paddd %xmm1, %xmm0
> movdqa %xmm0, %xmm1
> movdqa %xmm0, foo1
> jne .L7
>
> ===
>
> This is two more movdqa then the hand-written code in CallSumDeltas3.
paddd %xmm1, %xmm0 (2)
movdqa %xmm0, %xmm1 (2)
movdqa %xmm0, foo1 (1)
jne .L7
(1) is assignment to a global variable. I'm not sure that it can be pushed out
of the loop, but this can be solved by adding a local temporary in
CallSumDeltas2().
(2) is probably regmove, failing to optimize:
(set (reg:V4SI 21 xmm0 [72])
(plus:V4SI (reg:V4SI 21 xmm0 [69])
(reg:V4SI 22 xmm1 [71]))) 843 {*addv4si3} (nil))
(set (reg:V2DI 22 xmm1 [orig:73 foo1 ] [73])
(reg:V2DI 21 xmm0 [72])) 698 {*movv2di_internal} (nil))
into
(set (reg:V4SI 22 xmm1 [72])
(plus:V4SI (reg:V4SI 22 xmm1 [71])
(reg:V4SI 21 xmm0 [69]))) 843 {*addv4si3} (nil))
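The local-temporary workaround mentioned under (1) could look something like the sketch below. It is not taken from the attached test program: the function name CallSumDeltasLocal and the iterations parameter are assumptions, and only foo1 comes from the report. The idea is to keep the running value in a local so the store to the global happens once, after the loop.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Global named as in the reported assembly; the rest is hypothetical.
__m128i foo1;

void CallSumDeltasLocal(int iterations) {
  __m128i acc = foo1;  // read the global once, before the loop
  for (int i = 0; i < iterations; ++i) {
    // Same two shift-and-add steps as the reported loop body.
    acc = _mm_add_epi32(acc, _mm_slli_si128(acc, 4));
    acc = _mm_add_epi32(acc, _mm_slli_si128(acc, 8));
  }
  foo1 = acc;  // single store, after the loop
}
```

With the accumulator in a local, the compiler no longer has to prove that the per-iteration store to foo1 can be sunk out of the loop.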
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32735
* [Bug target/32735] i686 sse2 generates more movdqa than necessary
From: ubizjak at gmail dot com @ 2007-07-14 14:04 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from ubizjak at gmail dot com 2007-07-14 14:04 -------
(In reply to comment #5)
> > This is two more movdqa then the hand-written code in CallSumDeltas3.
>
> paddd %xmm1, %xmm0 (2)
> movdqa %xmm0, %xmm1 (2)
> movdqa %xmm0, foo1 (1)
> jne .L7
(1) is fixed by http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01330.html
(2) it looks like the register allocator should be enhanced to match the insn
_output_ to whichever input produces fewer moves. We are dealing with the "%0"
constraint (operand 1 is commutative with operand 2 and tied to operand 0):
[(set (match_operand:SSEMODEI 0 "register_operand" "=x")
(plus:SSEMODEI
(match_operand:SSEMODEI 1 "nonimmediate_operand" "%0")
(match_operand:SSEMODEI 2 "nonimmediate_operand" "xm")))]
So there is no reason why the RA shouldn't match the output with the most
suitable _input_, producing a sequence that is one insn shorter:
...
cmpl $100000000, %eax
movdqa %xmm0, %xmm1
pslldq $8, %xmm1
paddd %xmm0, %xmm1 # paddd %xmm1, %xmm0
# movdqa %xmm0, %xmm1
jne .L7
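The choice described in (2) can be illustrated with a toy cost model. This is not GCC code and does not reflect how the register allocator is implemented; the struct, the function names, and the simplifying assumption that both inputs are dead after the add are all hypothetical.

```cpp
#include <string>

// Toy description of a commutative two-address add: result = lhs + rhs,
// where the hardware overwrites one of the two input registers.
struct TwoAddressAdd {
  std::string lhs_reg;     // register holding the first input
  std::string rhs_reg;     // register holding the second input
  std::string wanted_reg;  // register the result must end up in
};

// Extra register copies needed if the output is tied to `tied_reg`:
// none if the result already lands where it is wanted, otherwise one.
// Assumes both inputs are dead after the instruction.
int ExtraMoves(const std::string& tied_reg, const std::string& wanted_reg) {
  return tied_reg == wanted_reg ? 0 : 1;
}

// Because the add is commutative, either input may be tied to the
// output; pick the one that needs fewer copies (lhs on a tie).
std::string ChooseTiedInput(const TwoAddressAdd& insn) {
  int lhs_cost = ExtraMoves(insn.lhs_reg, insn.wanted_reg);
  int rhs_cost = ExtraMoves(insn.rhs_reg, insn.wanted_reg);
  return rhs_cost < lhs_cost ? insn.rhs_reg : insn.lhs_reg;
}
```

In the reported loop the inputs sit in xmm0 and xmm1 and the loop-carried value is read from xmm1 at the top of the loop, so tying the output to xmm1 costs no copy, while tying it to xmm0 costs the trailing movdqa.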
--
ubizjak at gmail dot com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2007-07-14 14:04:19
               date|                            |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32735