* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
2004-11-19 8:49 [Bug tree-optimization/18562] New: SSE constant vector initialization produces dead constant values on stack uros at gcc dot gnu dot org
@ 2004-11-19 14:44 ` pinskia at gcc dot gnu dot org
2004-11-19 15:41 ` uros at gcc dot gnu dot org
` (9 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-11-19 14:44 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2004-11-19 14:44 -------
Confirmed. This is either a target bug or an RTL optimization problem. The reason why I say that is
because the builtins are not expandded before reaching RTL.
--
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
Status|UNCONFIRMED |NEW
Component|tree-optimization |target
Ever Confirmed| |1
Keywords| |missed-optimization
Last reconfirmed|0000-00-00 00:00:00 |2004-11-19 14:44:09
date| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
2004-11-19 8:49 [Bug tree-optimization/18562] New: SSE constant vector initialization produces dead constant values on stack uros at gcc dot gnu dot org
2004-11-19 14:44 ` [Bug target/18562] " pinskia at gcc dot gnu dot org
@ 2004-11-19 15:41 ` uros at gcc dot gnu dot org
2005-01-11 23:58 ` rth at gcc dot gnu dot org
` (8 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: uros at gcc dot gnu dot org @ 2004-11-19 15:41 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From uros at gcc dot gnu dot org 2004-11-19 15:41 -------
If val1 is moved out of the main() function, produced code is OK:
--cut here--
#include <xmmintrin.h>
#include <stdio.h>
float val1[4] = {1.3f, 1.4f, 1.5f, 1.6f};
int main(void) {
float result[4];
__m128 A;
...
--cut here--
val1: <- vector is merged this time
.long 1067869798
.long 1068708659
.long 1069547520
.long 1070386381
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "%f %f %f %f\n"
.text
.p2align 4,,15
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
subl $56, %esp
andl $-16, %esp
movups val1, %xmm0 <- loaded directy into xmm reg
movups %xmm0, -16(%ebp)
subl $16, %esp
flds -4(%ebp)
...
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
2004-11-19 8:49 [Bug tree-optimization/18562] New: SSE constant vector initialization produces dead constant values on stack uros at gcc dot gnu dot org
2004-11-19 14:44 ` [Bug target/18562] " pinskia at gcc dot gnu dot org
2004-11-19 15:41 ` uros at gcc dot gnu dot org
@ 2005-01-11 23:58 ` rth at gcc dot gnu dot org
2005-01-12 6:53 ` uros at kss-loka dot si
` (7 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: rth at gcc dot gnu dot org @ 2005-01-11 23:58 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From rth at gcc dot gnu dot org 2005-01-11 23:58 -------
Your first test case is fixed by the patch for PR13366. We now get
.align 16
.LC0:
.long 1067869798
.long 1067869798
.long 1067869798
.long 1067869798
...
movaps .LC0, %xmm0
movups %xmm0, 56(%esp)
I really don't know what you expected out of your second test case. Perhaps
the problem is that we don't expose what movups means to the compiler. I can
see that perhaps we could reuse MISALIGNED_INDIRECT_REF to represent this
(which as a side benefit would result in movhps+movlps instead of one movups,
which runs faster). But I doubt that we have the machinery at the tree level
to copy-propagate the aggregate initialization from val1[4] -> result[4]. I'm
pretty sure that we have open enhancement requests for this already.
So shall we mark this fixed?
--
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |WAITING
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
2004-11-19 8:49 [Bug tree-optimization/18562] New: SSE constant vector initialization produces dead constant values on stack uros at gcc dot gnu dot org
` (2 preceding siblings ...)
2005-01-11 23:58 ` rth at gcc dot gnu dot org
@ 2005-01-12 6:53 ` uros at kss-loka dot si
2005-01-12 9:24 ` steven at gcc dot gnu dot org
` (6 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: uros at kss-loka dot si @ 2005-01-12 6:53 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From uros at kss-loka dot si 2005-01-12 06:53 -------
(In reply to comment #3)
In the second testcase, compiler should figure out that the whole val1[] array
is initialized with constants. In this case, .LCx constant vector can be built
in compile time and a load of constants to stack can be implemented by "movaps
.LCx, %xmm0 / movups -32(%ebs), %xmm" insn pair.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
2004-11-19 8:49 [Bug tree-optimization/18562] New: SSE constant vector initialization produces dead constant values on stack uros at gcc dot gnu dot org
` (3 preceding siblings ...)
2005-01-12 6:53 ` uros at kss-loka dot si
@ 2005-01-12 9:24 ` steven at gcc dot gnu dot org
2005-01-12 9:37 ` steven at gcc dot gnu dot org
` (5 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: steven at gcc dot gnu dot org @ 2005-01-12 9:24 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From steven at gcc dot gnu dot org 2005-01-12 09:24 -------
If this is closed, let's be sure to find that enhancement request and close
this as a dup.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
2004-11-19 8:49 [Bug tree-optimization/18562] New: SSE constant vector initialization produces dead constant values on stack uros at gcc dot gnu dot org
` (4 preceding siblings ...)
2005-01-12 9:24 ` steven at gcc dot gnu dot org
@ 2005-01-12 9:37 ` steven at gcc dot gnu dot org
2005-01-12 10:54 ` uros at kss-loka dot si
` (4 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: steven at gcc dot gnu dot org @ 2005-01-12 9:37 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From steven at gcc dot gnu dot org 2005-01-12 09:37 -------
AMD64 has this at -O2:
Test case 1:
main:
.LFB499:
subq $24, %rsp
.LCFI0:
movl $.LC1, %edi
movl $4, %eax
movaps .LC0(%rip), %xmm0
movups %xmm0, (%rsp)
cvtss2sd 12(%rsp), %xmm3
cvtss2sd 8(%rsp), %xmm2
cvtss2sd 4(%rsp), %xmm1
cvtss2sd (%rsp), %xmm0
call printf
xorl %eax, %eax
addq $24, %rsp
ret
.LFE499:
Test case 2:
main:
.LFB499:
subq $40, %rsp
.LCFI0:
movl $.LC4, %edi
movl $4, %eax
movl $0x3fa66666, 16(%rsp)
movl $0x3fb33333, 20(%rsp)
movl $0x3fc00000, 24(%rsp)
movl $0x3fcccccd, 28(%rsp)
movups 16(%rsp), %xmm0
movups %xmm0, (%rsp)
cvtss2sd 12(%rsp), %xmm3
cvtss2sd 8(%rsp), %xmm2
cvtss2sd 4(%rsp), %xmm1
cvtss2sd (%rsp), %xmm0
call printf
xorl %eax, %eax
addq $40, %rsp
ret
With -m32 -march=pentium4 -mtune=prescott I get this:
Test case 1:
main:
pushl %ebp
movl %esp, %ebp
subl $56, %esp
andl $-16, %esp
subl $16, %esp
movaps .LC0, %xmm0
movups %xmm0, -16(%ebp)
flds -4(%ebp)
fstpl 28(%esp)
flds -8(%ebp)
fstpl 20(%esp)
flds -12(%ebp)
fstpl 12(%esp)
flds -16(%ebp)
fstpl 4(%esp)
movl $.LC1, (%esp)
call printf
xorl %eax, %eax
leave
ret
Test case 2 (which is basically the same as Uros' code):
main:
pushl %ebp
movl %esp, %ebp
subl $72, %esp
andl $-16, %esp
subl $16, %esp
movl $0x3fa66666, -16(%ebp)
movl $0x3fb33333, -12(%ebp)
movl $0x3fc00000, -8(%ebp)
movl $0x3fcccccd, -4(%ebp)
movups -16(%ebp), %xmm0
movups %xmm0, -32(%ebp)
flds -20(%ebp)
fstpl 28(%esp)
flds -24(%ebp)
fstpl 20(%esp)
flds -28(%ebp)
fstpl 12(%esp)
flds -32(%ebp)
fstpl 4(%esp)
movl $.LC4, (%esp)
call printf
xorl %eax, %eax
leave
ret
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
2004-11-19 8:49 [Bug tree-optimization/18562] New: SSE constant vector initialization produces dead constant values on stack uros at gcc dot gnu dot org
` (5 preceding siblings ...)
2005-01-12 9:37 ` steven at gcc dot gnu dot org
@ 2005-01-12 10:54 ` uros at kss-loka dot si
2005-01-18 9:49 ` rth at gcc dot gnu dot org
` (3 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: uros at kss-loka dot si @ 2005-01-12 10:54 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From uros at kss-loka dot si 2005-01-12 10:54 -------
Another testcase that I think should be optimized:
#include <xmmintrin.h>
__m128 test() {
float val1[4] = {0.0f, 0.0f, 0.0f, 0.0f};
return _mm_loadu_ps(val1);
}
This is currently compiled to:
test:
pushl %ebp
movl $0x00000000, %eax
movl %esp, %ebp
subl $16, %esp
movl %eax, -16(%ebp)
movl %eax, -12(%ebp)
movl %eax, -8(%ebp)
movl %eax, -4(%ebp)
movups -16(%ebp), %xmm0
leave
ret
But I think gcc it should produce something like:
test:
pushl %ebp
xorps %xmm0, %xmm0
movl %esp, %ebp
subl $16, %esp
(*) movups %xmm0, -16(%ebp)
leave
ret
Perhaps even the store to stack is not necessary in this particular case.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
2004-11-19 8:49 [Bug tree-optimization/18562] New: SSE constant vector initialization produces dead constant values on stack uros at gcc dot gnu dot org
` (6 preceding siblings ...)
2005-01-12 10:54 ` uros at kss-loka dot si
@ 2005-01-18 9:49 ` rth at gcc dot gnu dot org
2005-01-18 9:52 ` rth at gcc dot gnu dot org
` (2 subsequent siblings)
10 siblings, 0 replies; 14+ messages in thread
From: rth at gcc dot gnu dot org @ 2005-01-18 9:49 UTC (permalink / raw)
To: gcc-bugs
--
What |Removed |Added
----------------------------------------------------------------------------
BugsThisDependsOn| |14295
Status|WAITING |NEW
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
2004-11-19 8:49 [Bug tree-optimization/18562] New: SSE constant vector initialization produces dead constant values on stack uros at gcc dot gnu dot org
` (7 preceding siblings ...)
2005-01-18 9:49 ` rth at gcc dot gnu dot org
@ 2005-01-18 9:52 ` rth at gcc dot gnu dot org
2005-09-14 6:51 ` pinskia at gcc dot gnu dot org
2005-09-14 7:09 ` pinskia at gcc dot gnu dot org
10 siblings, 0 replies; 14+ messages in thread
From: rth at gcc dot gnu dot org @ 2005-01-18 9:52 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From rth at gcc dot gnu dot org 2005-01-18 09:50 -------
Found the tree-ssa aggregate copy-propagation pr. Made this pr depend on it,
as this has a different sort of test case.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
2004-11-19 8:49 [Bug tree-optimization/18562] New: SSE constant vector initialization produces dead constant values on stack uros at gcc dot gnu dot org
` (8 preceding siblings ...)
2005-01-18 9:52 ` rth at gcc dot gnu dot org
@ 2005-09-14 6:51 ` pinskia at gcc dot gnu dot org
2005-09-14 7:09 ` pinskia at gcc dot gnu dot org
10 siblings, 0 replies; 14+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-09-14 6:51 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2005-09-14 06:51 -------
Hmm, but vectors are not consider as aggregates.
at the tree level right before optimization, we have:
A_6 = {1.2999999523162841796875e+0, 1.2999999523162841796875e+0,
1.2999999523162841796875e+0, 1.2999999523162841796875e+0};
# result_26 = V_MAY_DEF <result_14>;
__builtin_ia32_storeups (&result, A_6);
I would have thought that VECTOR_CST would be considered a constant and propgrated into
__builtin_ia32_storeups and that we would have folded __builtin_ia32_storeups at the tree level.
So I think there are two issues now, the first is that we don't constant prop VECTOR_CST (if this is truely
a VECTOR_CST in store_ccp):
A_6 = {1.2999999523162841796875e+0, 1.2999999523162841796875e+0,
1.2999999523162841796875e+0, 1.2999999523162841796875e+0};
Lattice value changed to VARYING. Adding SSA edges to worklist.
And then we need a fold specific to x86 for __builtin_ia32_storeups, after that it should just work.
--
What |Removed |Added
----------------------------------------------------------------------------
BugsThisDependsOn|14295 |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
2004-11-19 8:49 [Bug tree-optimization/18562] New: SSE constant vector initialization produces dead constant values on stack uros at gcc dot gnu dot org
` (9 preceding siblings ...)
2005-09-14 6:51 ` pinskia at gcc dot gnu dot org
@ 2005-09-14 7:09 ` pinskia at gcc dot gnu dot org
10 siblings, 0 replies; 14+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-09-14 7:09 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2005-09-14 07:09 -------
Actually the issue is that we don't change the constructor into a VECTOR_CST:
<constructor 0x41ec8720
type <vector_type 0x41ebb9a0 __v4sf type <real_type 0x41e0fd90 float>
sizes-gimplified BLK size <integer_cst 0x41e11a60 128> unit size <integer_cst
0x41e11a80 16>
align 128 symtab 0 alias set -1 nunits 4>
>
which we should in this case.
And this happens in DOM or in CCP with replacing _mm_set1_ps with _mm_set1_ps (which really are
implemented the same way).
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562
^ permalink raw reply [flat|nested] 14+ messages in thread