* [Bug tree-optimization/18562] New: SSE constant vector initialization produces dead constant values on stack
@ 2004-11-19  8:49 uros at gcc dot gnu dot org
  2004-11-19 14:44 ` [Bug target/18562] " pinskia at gcc dot gnu dot org
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: uros at gcc dot gnu dot org @ 2004-11-19  8:49 UTC (permalink / raw)
  To: gcc-bugs

Compiling this testcase with '-O2 -msse' produces suboptimal code. 'val1'
is merged into the vector at compile time, but it is still loaded onto the stack.
GCC does not detect that the 'val1' value on the stack is sitting there unused.

#include <xmmintrin.h>
#include <stdio.h>

int main(void) {
	float val1 = 1.3f;
	float result[4];
	__m128 A;

	A = _mm_load1_ps(&val1);
	_mm_storeu_ps(result, A);

	printf("%f %f %f %f\n", result[0], result[1], result[2], result[3]);
	return 0;
}

This code is produced:
...
.LC2:                       <- merged vector
        .long   1067869798
        .long   0
        .long   0
        .long   0
        .text
        .p2align 4,,15
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $72, %esp
        movss   .LC2, %xmm0           <- vector is loaded into %xmm0
        andl    $-16, %esp
        subl    $16, %esp
        movl    $0x3fa66666, -4(%ebp) <- 'val1' is put on stack here
        shufps  $0, %xmm0, %xmm0
        movl    $.LC1, (%esp)
        movups  %xmm0, -20(%ebp)
        flds    -8(%ebp)
        fstpl   28(%esp)
        flds    -12(%ebp)
        fstpl   20(%esp)
        flds    -16(%ebp)
        fstpl   12(%esp)
        flds    -20(%ebp)
        fstpl   4(%esp)
        call    printf
        xorl    %eax, %eax
        leave
        ret

An even worse situation arises with:

int main(void) {
	float val1[4] = {1.3f, 1.4f, 1.5f, 1.6f};
	float result[4];
	__m128 A;

	A = _mm_loadu_ps(val1);
	_mm_storeu_ps(result, A);

	printf("%f %f %f %f\n", result[0], result[1], result[2], result[3]);
	return 0;
}

The following asm code is produced:
main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $72, %esp
        andl    $-16, %esp
        movl    $0x3fa66666, -16(%ebp)
        subl    $16, %esp
        movl    $0x3fb33333, -12(%ebp)
        movl    $0x3fc00000, -8(%ebp)
        movl    $0x3fcccccd, -4(%ebp)
        movups  -16(%ebp), %xmm0
        movups  %xmm0, -32(%ebp)
        flds    -20(%ebp)
        fstpl   28(%esp)
        flds    -24(%ebp)
        fstpl   20(%esp)
        flds    -28(%ebp)
        fstpl   12(%esp)
        flds    -32(%ebp)
        fstpl   4(%esp)
        movl    $.LC4, (%esp)
        call    printf
        xorl    %eax, %eax
        leave
        ret

The constant values are not merged into a vector at compile time. Instead, the
vector is built on the stack and then loaded into an %xmm register. The value on
the stack is again left unused after vector initialization.

Uros.

-- 
           Summary: SSE constant vector initialization produces dead
                    constant values on stack
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: uros at gcc dot gnu dot org
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562



* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
  2004-11-19  8:49 [Bug tree-optimization/18562] New: SSE constant vector initialization produces dead constant values on stack uros at gcc dot gnu dot org
@ 2004-11-19 14:44 ` pinskia at gcc dot gnu dot org
  2004-11-19 15:41 ` uros at gcc dot gnu dot org
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-11-19 14:44 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-19 14:44 -------
Confirmed.  This is either a target bug or an RTL optimization problem.  The reason why I say that is
because the builtins are not expanded before reaching RTL.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
          Component|tree-optimization           |target
     Ever Confirmed|                            |1
           Keywords|                            |missed-optimization
   Last reconfirmed|0000-00-00 00:00:00         |2004-11-19 14:44:09
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562



* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
  2004-11-19  8:49 [Bug tree-optimization/18562] New: SSE constant vector initialization produces dead constant values on stack uros at gcc dot gnu dot org
  2004-11-19 14:44 ` [Bug target/18562] " pinskia at gcc dot gnu dot org
@ 2004-11-19 15:41 ` uros at gcc dot gnu dot org
  2005-01-11 23:58 ` rth at gcc dot gnu dot org
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: uros at gcc dot gnu dot org @ 2004-11-19 15:41 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From uros at gcc dot gnu dot org  2004-11-19 15:41 -------
If val1 is moved out of the main() function, the produced code is OK:

--cut here--
#include <xmmintrin.h>
#include <stdio.h>

float val1[4] = {1.3f, 1.4f, 1.5f, 1.6f};

int main(void) {
	float result[4];
	__m128 A;
...
--cut here--

val1:                          <- vector is merged this time
        .long   1067869798
        .long   1068708659
        .long   1069547520
        .long   1070386381
        .section        .rodata.str1.1,"aMS",@progbits,1
.LC0:
        .string "%f %f %f %f\n"
        .text
        .p2align 4,,15
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $56, %esp
        andl    $-16, %esp
        movups  val1, %xmm0      <- loaded directly into xmm reg
        movups  %xmm0, -16(%ebp)
        subl    $16, %esp
        flds    -4(%ebp)
        ...

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562



* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
  2004-11-19  8:49 [Bug tree-optimization/18562] New: SSE constant vector initialization produces dead constant values on stack uros at gcc dot gnu dot org
  2004-11-19 14:44 ` [Bug target/18562] " pinskia at gcc dot gnu dot org
  2004-11-19 15:41 ` uros at gcc dot gnu dot org
@ 2005-01-11 23:58 ` rth at gcc dot gnu dot org
  2005-01-12  6:53 ` uros at kss-loka dot si
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rth at gcc dot gnu dot org @ 2005-01-11 23:58 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From rth at gcc dot gnu dot org  2005-01-11 23:58 -------
Your first test case is fixed by the patch for PR13366.  We now get

        .align 16
.LC0:
        .long   1067869798
        .long   1067869798
        .long   1067869798
        .long   1067869798
...
        movaps  .LC0, %xmm0
        movups  %xmm0, 56(%esp)

I really don't know what you expected out of your second test case.  Perhaps
the problem is that we don't expose what movups means to the compiler.  I can
see that perhaps we could reuse MISALIGNED_INDIRECT_REF to represent this
(which as a side benefit would result in movhps+movlps instead of one movups,
which runs faster).  But I doubt that we have the machinery at the tree level
to copy-propagate the aggregate initialization from val1[4] -> result[4].  I'm
pretty sure that we have open enhancement requests for this already.

So shall we mark this fixed?

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |WAITING


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562



* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
  2004-11-19  8:49 [Bug tree-optimization/18562] New: SSE constant vector initialization produces dead constant values on stack uros at gcc dot gnu dot org
                   ` (2 preceding siblings ...)
  2005-01-11 23:58 ` rth at gcc dot gnu dot org
@ 2005-01-12  6:53 ` uros at kss-loka dot si
  2005-01-12  9:24 ` steven at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: uros at kss-loka dot si @ 2005-01-12  6:53 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From uros at kss-loka dot si  2005-01-12 06:53 -------
(In reply to comment #3)

In the second testcase, the compiler should figure out that the whole val1[]
array is initialized with constants. In this case, the .LCx constant vector can
be built at compile time and the load of the constants onto the stack can be
implemented by a "movaps .LCx, %xmm0 / movups %xmm0, -32(%ebp)" insn pair.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562



* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
  2004-11-19  8:49 [Bug tree-optimization/18562] New: SSE constant vector initialization produces dead constant values on stack uros at gcc dot gnu dot org
                   ` (3 preceding siblings ...)
  2005-01-12  6:53 ` uros at kss-loka dot si
@ 2005-01-12  9:24 ` steven at gcc dot gnu dot org
  2005-01-12  9:37 ` steven at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: steven at gcc dot gnu dot org @ 2005-01-12  9:24 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From steven at gcc dot gnu dot org  2005-01-12 09:24 -------
If this is closed, let's be sure to find that enhancement request and close 
this as a dup. 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562



* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
  2004-11-19  8:49 [Bug tree-optimization/18562] New: SSE constant vector initialization produces dead constant values on stack uros at gcc dot gnu dot org
                   ` (4 preceding siblings ...)
  2005-01-12  9:24 ` steven at gcc dot gnu dot org
@ 2005-01-12  9:37 ` steven at gcc dot gnu dot org
  2005-01-12 10:54 ` uros at kss-loka dot si
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: steven at gcc dot gnu dot org @ 2005-01-12  9:37 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From steven at gcc dot gnu dot org  2005-01-12 09:37 -------
AMD64 has this at -O2: 
 
Test case 1: 
main: 
.LFB499: 
        subq    $24, %rsp 
.LCFI0: 
        movl    $.LC1, %edi 
        movl    $4, %eax 
        movaps  .LC0(%rip), %xmm0 
        movups  %xmm0, (%rsp) 
        cvtss2sd        12(%rsp), %xmm3 
        cvtss2sd        8(%rsp), %xmm2 
        cvtss2sd        4(%rsp), %xmm1 
        cvtss2sd        (%rsp), %xmm0 
        call    printf 
        xorl    %eax, %eax 
        addq    $24, %rsp 
        ret 
.LFE499: 
 
 
Test case 2: 
main: 
.LFB499: 
        subq    $40, %rsp 
.LCFI0: 
        movl    $.LC4, %edi 
        movl    $4, %eax 
        movl    $0x3fa66666, 16(%rsp) 
        movl    $0x3fb33333, 20(%rsp) 
        movl    $0x3fc00000, 24(%rsp) 
        movl    $0x3fcccccd, 28(%rsp) 
        movups  16(%rsp), %xmm0 
        movups  %xmm0, (%rsp) 
        cvtss2sd        12(%rsp), %xmm3 
        cvtss2sd        8(%rsp), %xmm2 
        cvtss2sd        4(%rsp), %xmm1 
        cvtss2sd        (%rsp), %xmm0 
        call    printf 
        xorl    %eax, %eax 
        addq    $40, %rsp 
        ret 
 
 
 
With -m32 -march=pentium4 -mtune=prescott I get this: 
 
Test case 1: 
main: 
        pushl   %ebp 
        movl    %esp, %ebp 
        subl    $56, %esp 
        andl    $-16, %esp 
        subl    $16, %esp 
        movaps  .LC0, %xmm0 
        movups  %xmm0, -16(%ebp) 
        flds    -4(%ebp) 
        fstpl   28(%esp) 
        flds    -8(%ebp) 
        fstpl   20(%esp) 
        flds    -12(%ebp) 
        fstpl   12(%esp) 
        flds    -16(%ebp) 
        fstpl   4(%esp) 
        movl    $.LC1, (%esp) 
        call    printf 
        xorl    %eax, %eax 
        leave 
        ret 
 
Test case 2 (which is basically the same as Uros' code): 
main: 
        pushl   %ebp 
        movl    %esp, %ebp 
        subl    $72, %esp 
        andl    $-16, %esp 
        subl    $16, %esp 
        movl    $0x3fa66666, -16(%ebp) 
        movl    $0x3fb33333, -12(%ebp) 
        movl    $0x3fc00000, -8(%ebp) 
        movl    $0x3fcccccd, -4(%ebp) 
        movups  -16(%ebp), %xmm0 
        movups  %xmm0, -32(%ebp) 
        flds    -20(%ebp) 
        fstpl   28(%esp) 
        flds    -24(%ebp) 
        fstpl   20(%esp) 
        flds    -28(%ebp) 
        fstpl   12(%esp) 
        flds    -32(%ebp) 
        fstpl   4(%esp) 
        movl    $.LC4, (%esp) 
        call    printf 
        xorl    %eax, %eax 
        leave 
        ret 
 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562



* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
  2004-11-19  8:49 [Bug tree-optimization/18562] New: SSE constant vector initialization produces dead constant values on stack uros at gcc dot gnu dot org
                   ` (5 preceding siblings ...)
  2005-01-12  9:37 ` steven at gcc dot gnu dot org
@ 2005-01-12 10:54 ` uros at kss-loka dot si
  2005-01-18  9:49 ` rth at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: uros at kss-loka dot si @ 2005-01-12 10:54 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From uros at kss-loka dot si  2005-01-12 10:54 -------
Another testcase that I think should be optimized:
#include <xmmintrin.h>

__m128 test() {
	float val1[4] = {0.0f, 0.0f, 0.0f, 0.0f};

	return _mm_loadu_ps(val1);
}

This is currently compiled to:
test:
        pushl  %ebp
        movl   $0x00000000, %eax
        movl   %esp, %ebp
        subl   $16, %esp
        movl   %eax, -16(%ebp)
        movl   %eax, -12(%ebp)
        movl   %eax, -8(%ebp)
        movl   %eax, -4(%ebp)
        movups -16(%ebp), %xmm0
        leave
        ret

But I think gcc should produce something like:
test:
        pushl  %ebp
        xorps  %xmm0, %xmm0
        movl   %esp, %ebp
        subl   $16, %esp
(*)     movups %xmm0, -16(%ebp)
        leave
        ret

Perhaps even the store to stack is not necessary in this particular case.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562



* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
  2004-11-19  8:49 [Bug tree-optimization/18562] New: SSE constant vector initialization produces dead constant values on stack uros at gcc dot gnu dot org
                   ` (6 preceding siblings ...)
  2005-01-12 10:54 ` uros at kss-loka dot si
@ 2005-01-18  9:49 ` rth at gcc dot gnu dot org
  2005-01-18  9:52 ` rth at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rth at gcc dot gnu dot org @ 2005-01-18  9:49 UTC (permalink / raw)
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |14295
             Status|WAITING                     |NEW


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562



* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
  2004-11-19  8:49 [Bug tree-optimization/18562] New: SSE constant vector initialization produces dead constant values on stack uros at gcc dot gnu dot org
                   ` (7 preceding siblings ...)
  2005-01-18  9:49 ` rth at gcc dot gnu dot org
@ 2005-01-18  9:52 ` rth at gcc dot gnu dot org
  2005-09-14  6:51 ` pinskia at gcc dot gnu dot org
  2005-09-14  7:09 ` pinskia at gcc dot gnu dot org
  10 siblings, 0 replies; 12+ messages in thread
From: rth at gcc dot gnu dot org @ 2005-01-18  9:52 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From rth at gcc dot gnu dot org  2005-01-18 09:50 -------
Found the tree-ssa aggregate copy-propagation pr.  Made this pr depend on it,
as this has a different sort of test case.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562



* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
  2004-11-19  8:49 [Bug tree-optimization/18562] New: SSE constant vector initialization produces dead constant values on stack uros at gcc dot gnu dot org
                   ` (8 preceding siblings ...)
  2005-01-18  9:52 ` rth at gcc dot gnu dot org
@ 2005-09-14  6:51 ` pinskia at gcc dot gnu dot org
  2005-09-14  7:09 ` pinskia at gcc dot gnu dot org
  10 siblings, 0 replies; 12+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-09-14  6:51 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-09-14 06:51 -------
Hmm, but vectors are not considered aggregates.
At the tree level right before optimization, we have:
  A_6 = {1.2999999523162841796875e+0, 1.2999999523162841796875e+0, 
1.2999999523162841796875e+0, 1.2999999523162841796875e+0};
  #   result_26 = V_MAY_DEF <result_14>;
  __builtin_ia32_storeups (&result, A_6);

I would have thought that VECTOR_CST would be considered a constant and propagated into
__builtin_ia32_storeups and that we would have folded __builtin_ia32_storeups at the tree level.

So I think there are two issues now. The first is that we don't constant-propagate VECTOR_CST (if this
truly is a VECTOR_CST in store_ccp):
A_6 = {1.2999999523162841796875e+0, 1.2999999523162841796875e+0, 
1.2999999523162841796875e+0, 1.2999999523162841796875e+0};

Lattice value changed to VARYING.  Adding SSA edges to worklist.


And then we need a fold specific to x86 for __builtin_ia32_storeups; after that it should just work.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|14295                       |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562



* [Bug target/18562] SSE constant vector initialization produces dead constant values on stack
  2004-11-19  8:49 [Bug tree-optimization/18562] New: SSE constant vector initialization produces dead constant values on stack uros at gcc dot gnu dot org
                   ` (9 preceding siblings ...)
  2005-09-14  6:51 ` pinskia at gcc dot gnu dot org
@ 2005-09-14  7:09 ` pinskia at gcc dot gnu dot org
  10 siblings, 0 replies; 12+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-09-14  7:09 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2005-09-14 07:09 -------
Actually the issue is that we don't change the constructor into a VECTOR_CST:
<constructor 0x41ec8720
        type <vector_type 0x41ebb9a0 __v4sf type <real_type 0x41e0fd90 float>
            sizes-gimplified BLK size <integer_cst 0x41e11a60 128> unit size <integer_cst 
0x41e11a80 16>
            align 128 symtab 0 alias set -1 nunits 4>
       >

which we should in this case.

And this happens in DOM or in CCP with replacing _mm_set1_ps with _mm_set1_ps (which really are 
implemented the same way).

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18562

