public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/40122]  New: missed optimization when using union of __m128i and int[4]
@ 2009-05-12 13:52 kretz at kde dot org
  2009-05-12 15:01 ` [Bug middle-end/40122] " rguenth at gcc dot gnu dot org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: kretz at kde dot org @ 2009-05-12 13:52 UTC (permalink / raw)
  To: gcc-bugs

The following testcase

#include <emmintrin.h>

typedef union {
    __m128i v;
    int m[4];
} VectorUnion;

VectorUnion one()
{
    VectorUnion r = { _mm_set1_epi32(1) };
    return r;
}

int main()
{
    VectorUnion x = one();
    if (0xffff == _mm_movemask_epi8(_mm_cmpeq_epi32(x.v, x.v))) {
        return 0;
    }
    return 1;
}

compiles (-Wall -Wextra -O2 -mssse3) to

00000000004004d0 <main>:
  4004d0:       66 0f 6f 05 38 01 00 00    movdqa 0x138(%rip),%xmm0
  4004d8:       66 0f 7f 44 24 d8          movdqa %xmm0,-0x28(%rsp)
  4004de:       48 8b 44 24 d8             mov    -0x28(%rsp),%rax
  4004e3:       48 89 44 24 e8             mov    %rax,-0x18(%rsp)
  4004e8:       48 8b 44 24 e0             mov    -0x20(%rsp),%rax
  4004ed:       48 89 44 24 f0             mov    %rax,-0x10(%rsp)
  4004f2:       66 0f 6f 44 24 e8          movdqa -0x18(%rsp),%xmm0
  4004f8:       66 0f 76 c0                pcmpeqd %xmm0,%xmm0
  4004fc:       66 0f d7 c0                pmovmskb %xmm0,%eax

As can be seen, xmm0 is stored to the stack, copied to a second stack slot
via two 64-bit moves, and then loaded back into xmm0 from there. The values
on the stack are not used afterwards.

I expected GCC to notice that these moves are redundant and produce code like

movdqa 0x138(%rip),%xmm0
pcmpeqd %xmm0,%xmm0
pmovmskb %xmm0,%eax
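
One way to experiment with the testcase above is to keep the value in a plain
__m128i and extract lanes only where needed, so that no union object is copied
between functions. The following is a minimal sketch under that assumption;
one_vec, lane and all_lanes_equal are hypothetical names that do not appear in
the report, and nothing here establishes what code a particular GCC version
emits for this variant.

```c
#include <emmintrin.h>
#include <string.h>

/* Hypothetical variant of one(): return the vector itself, not a union. */
static __m128i one_vec(void)
{
    return _mm_set1_epi32(1);
}

/* Hypothetical helper: read 32-bit lane i (0..3) by copying the vector's bytes. */
static int lane(__m128i v, int i)
{
    int m[4];
    memcpy(m, &v, sizeof m);
    return m[i & 3];
}

/* Same check as in main() of the testcase: every byte of (v == v) is 0xff. */
static int all_lanes_equal(__m128i v)
{
    return 0xffff == _mm_movemask_epi8(_mm_cmpeq_epi32(v, v));
}
```

Comparing the disassembly of a caller of all_lanes_equal(one_vec()) with the
listing above would show whether the extra stack traffic is tied to the union
copy.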


-- 
           Summary: missed optimization when using union of __m128i and
                    int[4]
           Product: gcc
           Version: 4.3.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: kretz at kde dot org
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40122



* [Bug middle-end/40122] missed optimization when using union of __m128i and int[4]
  2009-05-12 13:52 [Bug middle-end/40122] New: missed optimization when using union of __m128i and int[4] kretz at kde dot org
@ 2009-05-12 15:01 ` rguenth at gcc dot gnu dot org
  2009-05-12 15:24 ` pinskia at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-05-12 15:01 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2009-05-12 15:00 -------
The union copy confuses GCC:

  r.v = VIEW_CONVERT_EXPR<vector long long int>({1, 1, 1, 1});
  D.6990 = r;
  x = D.6990;
  D.6997 = VIEW_CONVERT_EXPR<vector int>(x.v);
  D.6994 = __builtin_ia32_pcmpeqd128 (D.6997, D.6997);
  D.7000 = __builtin_ia32_pmovmskb128 (VIEW_CONVERT_EXPR<vector char>(VIEW_CONVERT_EXPR<vector long long int>(D.6994)));
  return D.7000 != 65535;

This will likely be fixed with the new SRA, or it is a duplicate of PR36327.

Martin, can you check this (and maybe add a testcase)?


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu dot
                   |                            |org, mjambor at suse dot cz
           Severity|normal                      |enhancement
           Keywords|                            |missed-optimization



* [Bug middle-end/40122] missed optimization when using union of __m128i and int[4]
  2009-05-12 13:52 [Bug middle-end/40122] New: missed optimization when using union of __m128i and int[4] kretz at kde dot org
  2009-05-12 15:01 ` [Bug middle-end/40122] " rguenth at gcc dot gnu dot org
@ 2009-05-12 15:24 ` pinskia at gcc dot gnu dot org
  2009-05-21 16:02 ` jamborm at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2009-05-12 15:24 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from pinskia at gcc dot gnu dot org  2009-05-12 15:24 -------
This is a dup of bug 36327.

*** This bug has been marked as a duplicate of 36327 ***


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |DUPLICATE



* [Bug middle-end/40122] missed optimization when using union of __m128i and int[4]
  2009-05-12 13:52 [Bug middle-end/40122] New: missed optimization when using union of __m128i and int[4] kretz at kde dot org
  2009-05-12 15:01 ` [Bug middle-end/40122] " rguenth at gcc dot gnu dot org
  2009-05-12 15:24 ` pinskia at gcc dot gnu dot org
@ 2009-05-21 16:02 ` jamborm at gcc dot gnu dot org
  2009-05-25 15:20 ` jamborm at gcc dot gnu dot org
  2009-05-25 16:00 ` rguenth at gcc dot gnu dot org
  4 siblings, 0 replies; 6+ messages in thread
From: jamborm at gcc dot gnu dot org @ 2009-05-21 16:02 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from jamborm at gcc dot gnu dot org  2009-05-21 16:02 -------
With the new SRA, the optimized dump looks like:

  D.6886_10 = {1, 1, 1, 1};
  D.6887_11 = VIEW_CONVERT_EXPR<vector long long int>(D.6886_10);
  D.6893_12 = VIEW_CONVERT_EXPR<vector int>(D.6887_11);
  D.6891_14 = __builtin_ia32_pcmpeqd128 (D.6893_12, D.6893_12);
  D.6890_15 = VIEW_CONVERT_EXPR<vector long long int>(D.6891_14);
  D.6897_16 = VIEW_CONVERT_EXPR<vector char>(D.6890_15);
  D.6896_17 = __builtin_ia32_pmovmskb128 (D.6897_16);
  D.6933_21 = D.6896_17 != 65535;
  return D.6933_21;


x is completely gone.

The (relevant) assembly output is 

main:
        movdqa  .LC0, %xmm0
        pcmpeqd %xmm0, %xmm0
        pmovmskb        %xmm0, %eax
        cmpl    $65535, %eax
        pushl   %ebp
        setne   %al
        movl    %esp, %ebp
        movzbl  %al, %eax
        popl    %ebp
        ret

So even though I don't really understand the SSE instructions, I
believe the new SRA does indeed help. I'll add a testcase checking
that x vanishes to the patch series, as I am finalizing the patch
set now.



* [Bug middle-end/40122] missed optimization when using union of __m128i and int[4]
  2009-05-12 13:52 [Bug middle-end/40122] New: missed optimization when using union of __m128i and int[4] kretz at kde dot org
                   ` (2 preceding siblings ...)
  2009-05-21 16:02 ` jamborm at gcc dot gnu dot org
@ 2009-05-25 15:20 ` jamborm at gcc dot gnu dot org
  2009-05-25 16:00 ` rguenth at gcc dot gnu dot org
  4 siblings, 0 replies; 6+ messages in thread
From: jamborm at gcc dot gnu dot org @ 2009-05-25 15:20 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from jamborm at gcc dot gnu dot org  2009-05-25 15:20 -------
...hm, when I wanted to make such a testcase I realized that the SSE
code is not very portable. So I changed my mind and won't use it.
I'll be adding different union scalarization checks, though.
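
For illustration only, a portable check of the same shape (a union returned by
value, copied, and then read through one member) could look like the sketch
below. This is an assumption about what such a check might resemble, not the
testcase that was actually added; IntUnion, make_one and check are hypothetical
names.

```c
/* Hypothetical union: an int overlaid with its bytes; no SSE required. */
typedef union {
    int v;
    unsigned char m[sizeof(int)];
} IntUnion;

/* Same shape as one() in the report: build the union and return it by value. */
static IntUnion make_one(void)
{
    IntUnion r = { 1 };   /* initializes the first member, v */
    return r;
}

/* Same shape as main() in the report: copy the union, then read x.v. */
static int check(void)
{
    IntUnion x = make_one();
    return x.v == 1 ? 0 : 1;
}
```

Compiling such a file with -O2 -fdump-tree-optimized and looking for x in the
dump is one way to see whether the union copy was scalarized away; the thread
does not say which checks were used in the end.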



* [Bug middle-end/40122] missed optimization when using union of __m128i and int[4]
  2009-05-12 13:52 [Bug middle-end/40122] New: missed optimization when using union of __m128i and int[4] kretz at kde dot org
                   ` (3 preceding siblings ...)
  2009-05-25 15:20 ` jamborm at gcc dot gnu dot org
@ 2009-05-25 16:00 ` rguenth at gcc dot gnu dot org
  4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-05-25 16:00 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from rguenth at gcc dot gnu dot org  2009-05-25 15:59 -------
I have some CCP / fold_stmt patches that produce

        movdqa  .LC1(%rip), %xmm0
        pcmpeqd %xmm0, %xmm0
        pmovmskb        %xmm0, %eax
        cmpl    $65535, %eax
        setne   %al
        movzbl  %al, %eax
        ret

as well.  The issue is that the CONSTRUCTOR from _mm_set1_epi32(1) is neither
marked TREE_CONSTANT nor folded to VECTOR_CST.
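
At the source level, the two spellings in the sketch below describe the same
128-bit value; the comment above concerns whether the compiler's internal
representation of the first one is recognized as a constant. This is a minimal
sketch, assuming a compiler that supports GCC-style vector extensions;
v4si_sketch and same_bits are hypothetical names, and the code only checks that
the two values agree, not how either one is represented inside GCC.

```c
#include <emmintrin.h>
#include <string.h>

/* Hypothetical typedef: four 32-bit ints in one 16-byte vector. */
typedef int v4si_sketch __attribute__((vector_size(16)));

/* Returns 1 if the intrinsic and the vector literal produce identical bytes. */
static int same_bits(void)
{
    __m128i a = _mm_set1_epi32(1);        /* built through the intrinsic */
    v4si_sketch b = { 1, 1, 1, 1 };       /* written as a vector literal */
    return memcmp(&a, &b, sizeof a) == 0;
}
```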

