* [Bug tree-optimization/18562] New: SSE constant vector initialization produces dead constant values on stack
@ 2004-11-19 8:49 uros at gcc dot gnu dot org
2004-11-19 14:44 ` [Bug target/18562] " pinskia at gcc dot gnu dot org
` (10 more replies)
0 siblings, 11 replies; 12+ messages in thread
From: uros at gcc dot gnu dot org @ 2004-11-19 8:49 UTC (permalink / raw)
To: gcc-bugs
Compiling this testcase with '-O2 -msse' produces suboptimal code. 'val1'
is merged into the vector at compile time, but it is still stored to the stack.
Gcc does not detect that the 'val1' value on the stack is never used.
#include <xmmintrin.h>
#include <stdio.h>
int main(void) {
    float val1 = 1.3f;
    float result[4];
    __m128 A;
    A = _mm_load1_ps(&val1);
    _mm_storeu_ps(result, A);
    printf("%f %f %f %f\n", result[0], result[1], result[2], result[3]);
    return 0;
}
This code is produced:
...
.LC2: <- merged vector
.long 1067869798
.long 0
.long 0
.long 0
.text
.p2align 4,,15
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
subl $72, %esp
movss .LC2, %xmm0 <- vector is loaded into %xmm0
andl $-16, %esp
subl $16, %esp
movl $0x3fa66666, -4(%ebp) <- 'val1' is put on stack here
shufps $0, %xmm0, %xmm0
movl $.LC1, (%esp)
movups %xmm0, -20(%ebp)
flds -8(%ebp)
fstpl 28(%esp)
flds -12(%ebp)
fstpl 20(%esp)
flds -16(%ebp)
fstpl 12(%esp)
flds -20(%ebp)
fstpl 4(%esp)
call printf
xorl %eax, %eax
leave
ret
An even worse situation arises with:
int main(void) {
    float val1[4] = {1.3f, 1.4f, 1.5f, 1.6f};
    float result[4];
    __m128 A;
    A = _mm_loadu_ps(val1);
    _mm_storeu_ps(result, A);
    printf("%f %f %f %f\n", result[0], result[1], result[2], result[3]);
    return 0;
}
The following asm code is produced:
main:
pushl %ebp
movl %esp, %ebp
subl $72, %esp
andl $-16, %esp
movl $0x3fa66666, -16(%ebp)
subl $16, %esp
movl $0x3fb33333, -12(%ebp)
movl $0x3fc00000, -8(%ebp)
movl $0x3fcccccd, -4(%ebp)
movups -16(%ebp), %xmm0
movups %xmm0, -32(%ebp)
flds -20(%ebp)
fstpl 28(%esp)
flds -24(%ebp)
fstpl 20(%esp)
flds -28(%ebp)
fstpl 12(%esp)
flds -32(%ebp)
fstpl 4(%esp)
movl $.LC4, (%esp)
call printf
xorl %eax, %eax
leave
ret
The constant values are not merged into a vector at compile time. Instead, the
vector is built on the stack and then loaded into an %xmm register. The value on
the stack is again left unused after vector initialization.
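The integer constants in the listings above are the raw bit patterns of the float initializers. The following is a minimal sketch for checking that correspondence, assuming the target uses IEEE-754 single precision for `float`; the helper name `float_bits` is hypothetical and not part of the report.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the bit pattern of a float.
   Assumes float is a 32-bit IEEE-754 single-precision value.
   memcpy is used instead of a pointer cast to avoid aliasing problems. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Under that assumption, `float_bits(1.3f)` is 0x3fa66666, which is 1067869798 in decimal: the first `.long` of .LC2 and the immediate stored at -4(%ebp) are the same value.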
Uros.
--
Summary: SSE constant vector initialization produces dead
constant values on stack
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: uros at gcc dot gnu dot org
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562
* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
From: pinskia at gcc dot gnu dot org @ 2004-11-19 14:44 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2004-11-19 14:44 -------
Confirmed. This is either a target bug or an RTL optimization problem. The reason I say that is
because the builtins are not expanded before reaching RTL.
--
What                  |Removed              |Added
----------------------------------------------------------------------------
Severity              |normal               |enhancement
Status                |UNCONFIRMED          |NEW
Component             |tree-optimization    |target
Ever Confirmed        |                     |1
Keywords              |                     |missed-optimization
Last reconfirmed date |0000-00-00 00:00:00  |2004-11-19 14:44:09
* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
From: uros at gcc dot gnu dot org @ 2004-11-19 15:41 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From uros at gcc dot gnu dot org 2004-11-19 15:41 -------
If val1 is moved out of the main() function, the produced code is OK:
--cut here--
#include <xmmintrin.h>
#include <stdio.h>
float val1[4] = {1.3f, 1.4f, 1.5f, 1.6f};
int main(void) {
    float result[4];
    __m128 A;
    ...
--cut here--
val1: <- vector is merged this time
.long 1067869798
.long 1068708659
.long 1069547520
.long 1070386381
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "%f %f %f %f\n"
.text
.p2align 4,,15
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
subl $56, %esp
andl $-16, %esp
movups val1, %xmm0 <- loaded directly into xmm reg
movups %xmm0, -16(%ebp)
subl $16, %esp
flds -4(%ebp)
...
* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
From: rth at gcc dot gnu dot org @ 2005-01-11 23:58 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From rth at gcc dot gnu dot org 2005-01-11 23:58 -------
Your first test case is fixed by the patch for PR13366. We now get
.align 16
.LC0:
.long 1067869798
.long 1067869798
.long 1067869798
.long 1067869798
...
movaps .LC0, %xmm0
movups %xmm0, 56(%esp)
I really don't know what you expected out of your second test case. Perhaps
the problem is that we don't expose what movups means to the compiler. I can
see that perhaps we could reuse MISALIGNED_INDIRECT_REF to represent this
(which as a side benefit would result in movhps+movlps instead of one movups,
which runs faster). But I doubt that we have the machinery at the tree level
to copy-propagate the aggregate initialization from val1[4] -> result[4]. I'm
pretty sure that we have open enhancement requests for this already.
So shall we mark this fixed?
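As a rough illustration of the movlps+movhps alternative mentioned above, the sketch below loads four floats in two 64-bit halves and produces the same vector as a single unaligned load. The helper name `copy4_split` is hypothetical, the SSE path is only compiled when `__SSE__` is defined, and which instructions the compiler actually emits (and whether they are faster) depends on the compiler version and the target CPU.

```c
#include <string.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: copy four floats from src to dst.
   Neither pointer needs to be 16-byte aligned. */
void copy4_split(float *dst, const float *src)
{
#if defined(__SSE__)
    __m128 v = _mm_setzero_ps();
    /* Low half: lanes 0 and 1 (typically a movlps). */
    v = _mm_loadl_pi(v, (const __m64 *)src);
    /* High half: lanes 2 and 3 (typically a movhps). */
    v = _mm_loadh_pi(v, (const __m64 *)(src + 2));
    _mm_storeu_ps(dst, v);
#else
    /* Plain fallback for targets without SSE. */
    memcpy(dst, src, 4 * sizeof *dst);
#endif
}
```

The result in `dst` should match what `_mm_loadu_ps(src)` followed by `_mm_storeu_ps(dst, ...)` would give.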
--
What                  |Removed              |Added
----------------------------------------------------------------------------
Status                |NEW                  |WAITING
* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
From: uros at kss-loka dot si @ 2005-01-12 6:53 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From uros at kss-loka dot si 2005-01-12 06:53 -------
(In reply to comment #3)
In the second testcase, the compiler should figure out that the whole val1[]
array is initialized with constants. In this case, the .LCx constant vector can
be built at compile time, and the load of the constants to the stack can be
implemented by a "movaps .LCx, %xmm0 / movups %xmm0, -32(%ebp)" insn pair.
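At the source level, the same four constants can be written as a vector constant directly rather than through a local array. The sketch below is one way to do that, assuming SSE is available (the scalar branch is only a fallback); the helper name `store_consts` is hypothetical, and whether a given GCC version then emits a single constant-pool load is not guaranteed.

```c
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: write the four constants of the second
   test case to out[0..3]. */
void store_consts(float out[4])
{
#if defined(__SSE__)
    /* _mm_setr_ps takes its arguments in memory order, so lane 0 is 1.3f.
       This matches _mm_loadu_ps on {1.3f, 1.4f, 1.5f, 1.6f}. */
    __m128 v = _mm_setr_ps(1.3f, 1.4f, 1.5f, 1.6f);
    _mm_storeu_ps(out, v);
#else
    /* Scalar fallback for targets without SSE. */
    out[0] = 1.3f;
    out[1] = 1.4f;
    out[2] = 1.5f;
    out[3] = 1.6f;
#endif
}
```

Because no addressable local array is involved, the compiler does not have to prove that the array's stack slots are dead.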
* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
From: steven at gcc dot gnu dot org @ 2005-01-12 9:24 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From steven at gcc dot gnu dot org 2005-01-12 09:24 -------
If this is closed, let's be sure to find that enhancement request and close
this as a dup.
* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
From: steven at gcc dot gnu dot org @ 2005-01-12 9:37 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From steven at gcc dot gnu dot org 2005-01-12 09:37 -------
AMD64 has this at -O2:
Test case 1:
main:
.LFB499:
subq $24, %rsp
.LCFI0:
movl $.LC1, %edi
movl $4, %eax
movaps .LC0(%rip), %xmm0
movups %xmm0, (%rsp)
cvtss2sd 12(%rsp), %xmm3
cvtss2sd 8(%rsp), %xmm2
cvtss2sd 4(%rsp), %xmm1
cvtss2sd (%rsp), %xmm0
call printf
xorl %eax, %eax
addq $24, %rsp
ret
.LFE499:
Test case 2:
main:
.LFB499:
subq $40, %rsp
.LCFI0:
movl $.LC4, %edi
movl $4, %eax
movl $0x3fa66666, 16(%rsp)
movl $0x3fb33333, 20(%rsp)
movl $0x3fc00000, 24(%rsp)
movl $0x3fcccccd, 28(%rsp)
movups 16(%rsp), %xmm0
movups %xmm0, (%rsp)
cvtss2sd 12(%rsp), %xmm3
cvtss2sd 8(%rsp), %xmm2
cvtss2sd 4(%rsp), %xmm1
cvtss2sd (%rsp), %xmm0
call printf
xorl %eax, %eax
addq $40, %rsp
ret
With -m32 -march=pentium4 -mtune=prescott I get this:
Test case 1:
main:
pushl %ebp
movl %esp, %ebp
subl $56, %esp
andl $-16, %esp
subl $16, %esp
movaps .LC0, %xmm0
movups %xmm0, -16(%ebp)
flds -4(%ebp)
fstpl 28(%esp)
flds -8(%ebp)
fstpl 20(%esp)
flds -12(%ebp)
fstpl 12(%esp)
flds -16(%ebp)
fstpl 4(%esp)
movl $.LC1, (%esp)
call printf
xorl %eax, %eax
leave
ret
Test case 2 (which is basically the same as Uros' code):
main:
pushl %ebp
movl %esp, %ebp
subl $72, %esp
andl $-16, %esp
subl $16, %esp
movl $0x3fa66666, -16(%ebp)
movl $0x3fb33333, -12(%ebp)
movl $0x3fc00000, -8(%ebp)
movl $0x3fcccccd, -4(%ebp)
movups -16(%ebp), %xmm0
movups %xmm0, -32(%ebp)
flds -20(%ebp)
fstpl 28(%esp)
flds -24(%ebp)
fstpl 20(%esp)
flds -28(%ebp)
fstpl 12(%esp)
flds -32(%ebp)
fstpl 4(%esp)
movl $.LC4, (%esp)
call printf
xorl %eax, %eax
leave
ret
* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
From: uros at kss-loka dot si @ 2005-01-12 10:54 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From uros at kss-loka dot si 2005-01-12 10:54 -------
Another testcase that I think should be optimized:
#include <xmmintrin.h>
__m128 test() {
float val1[4] = {0.0f, 0.0f, 0.0f, 0.0f};
return _mm_loadu_ps(val1);
}
This is currently compiled to:
test:
pushl %ebp
movl $0x00000000, %eax
movl %esp, %ebp
subl $16, %esp
movl %eax, -16(%ebp)
movl %eax, -12(%ebp)
movl %eax, -8(%ebp)
movl %eax, -4(%ebp)
movups -16(%ebp), %xmm0
leave
ret
But I think gcc should produce something like:
test:
pushl %ebp
xorps %xmm0, %xmm0
movl %esp, %ebp
subl $16, %esp
(*) movups %xmm0, -16(%ebp)
leave
ret
Perhaps even the store to the stack, marked (*), is not necessary in this particular case.
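For comparison, an all-zero vector can be requested directly with `_mm_setzero_ps`, which compilers typically materialize with an xorps and no stack traffic. A minimal sketch, assuming SSE is available; the helper name `store_zero` is hypothetical, and it writes through an output array only so that the example also builds on targets without SSE.

```c
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: write four zeros to out[0..3]. */
void store_zero(float out[4])
{
#if defined(__SSE__)
    /* No local array is involved, so there is nothing to spill. */
    _mm_storeu_ps(out, _mm_setzero_ps());
#else
    /* Scalar fallback for targets without SSE. */
    out[0] = out[1] = out[2] = out[3] = 0.0f;
#endif
}
```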
* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
From: rth at gcc dot gnu dot org @ 2005-01-18 9:49 UTC (permalink / raw)
To: gcc-bugs
--
What                  |Removed              |Added
----------------------------------------------------------------------------
BugsThisDependsOn     |                     |14295
Status                |WAITING              |NEW
* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
From: rth at gcc dot gnu dot org @ 2005-01-18 9:52 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From rth at gcc dot gnu dot org 2005-01-18 09:50 -------
Found the tree-ssa aggregate copy-propagation pr. Made this pr depend on it,
as this has a different sort of test case.
* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
From: pinskia at gcc dot gnu dot org @ 2005-09-14 6:51 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2005-09-14 06:51 -------
Hmm, but vectors are not considered aggregates.
At the tree level, right before optimization, we have:
A_6 = {1.2999999523162841796875e+0, 1.2999999523162841796875e+0,
1.2999999523162841796875e+0, 1.2999999523162841796875e+0};
# result_26 = V_MAY_DEF <result_14>;
__builtin_ia32_storeups (&result, A_6);
I would have thought that VECTOR_CST would be considered a constant and propagated into
__builtin_ia32_storeups, and that we would have folded __builtin_ia32_storeups at the tree level.
So I think there are two issues now. The first is that we don't constant-propagate VECTOR_CST (if this
is truly a VECTOR_CST in store_ccp):
A_6 = {1.2999999523162841796875e+0, 1.2999999523162841796875e+0,
1.2999999523162841796875e+0, 1.2999999523162841796875e+0};
Lattice value changed to VARYING. Adding SSA edges to worklist.
And then we need a fold specific to x86 for __builtin_ia32_storeups; after that it should just work.
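The "Lattice value changed to VARYING" line in the dump refers to the lattice that constant propagation tracks for each SSA name. The sketch below shows a conventional three-level lattice and its meet operation; it is illustrative only, and the type and function names are hypothetical and do not mirror GCC's internal data structures.

```c
/* Illustrative three-level constant-propagation lattice.
   The names are hypothetical, not GCC's. */
enum lat_kind { LAT_UNDEFINED, LAT_CONSTANT, LAT_VARYING };

struct lat_val {
    enum lat_kind kind;
    double value;   /* meaningful only when kind == LAT_CONSTANT */
};

/* Meet of two lattice values, as used when facts merge at a join point.
   UNDEFINED is the identity; two equal constants stay constant;
   everything else drops to VARYING. */
struct lat_val lat_meet(struct lat_val a, struct lat_val b)
{
    struct lat_val varying = { LAT_VARYING, 0.0 };

    if (a.kind == LAT_UNDEFINED)
        return b;
    if (b.kind == LAT_UNDEFINED)
        return a;
    if (a.kind == LAT_CONSTANT && b.kind == LAT_CONSTANT && a.value == b.value)
        return a;
    return varying;
}
```

Once a name reaches VARYING it never becomes constant again, so a right-hand side that the pass does not recognize as a constant blocks further propagation through that name.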
--
What                  |Removed              |Added
----------------------------------------------------------------------------
BugsThisDependsOn     |14295                |
* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
From: pinskia at gcc dot gnu dot org @ 2005-09-14 7:09 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2005-09-14 07:09 -------
Actually the issue is that we don't change the constructor into a VECTOR_CST:
<constructor 0x41ec8720
type <vector_type 0x41ebb9a0 __v4sf type <real_type 0x41e0fd90 float>
sizes-gimplified BLK size <integer_cst 0x41e11a60 128> unit size <integer_cst
0x41e11a80 16>
align 128 symtab 0 alias set -1 nunits 4>
>
which we should in this case.
And this happens in DOM or in CCP when replacing _mm_load1_ps with _mm_set1_ps (which really are
implemented the same way).
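For the first test case, a broadcast load from a scalar and `_mm_set1_ps` on the same scalar are expected to give the same vector. A small sketch that checks this lane by lane, assuming SSE is available; the helper name `broadcasts_agree` is hypothetical.

```c
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: return 1 if broadcasting x via _mm_load1_ps and
   via _mm_set1_ps gives identical lanes, all equal to x; 0 otherwise.
   Not meaningful for NaN inputs, which never compare equal. */
int broadcasts_agree(float x)
{
    float a[4], b[4];
    int i;

#if defined(__SSE__)
    _mm_storeu_ps(a, _mm_load1_ps(&x));
    _mm_storeu_ps(b, _mm_set1_ps(x));
#else
    /* Scalar fallback for targets without SSE. */
    for (i = 0; i < 4; i++) {
        a[i] = x;
        b[i] = x;
    }
#endif

    for (i = 0; i < 4; i++) {
        if (a[i] != b[i] || a[i] != x)
            return 0;
    }
    return 1;
}
```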