[Bug rtl-optimization/43147] SSE shuffle merge

public inbox for gcc-bugs@sourceware.org

* [Bug rtl-optimization/43147] SSE shuffle merge
       [not found] <bug-43147-4@http.gcc.gnu.org/bugzilla/>
@ 2011-10-23  8:40 ` marc.glisse at normalesup dot org
  2012-05-07 15:00 ` glisse at gcc dot gnu.org
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: marc.glisse at normalesup dot org @ 2011-10-23  8:40 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147

Marc Glisse <marc.glisse at normalesup dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marc.glisse at normalesup
                   |                            |dot org

--- Comment #4 from Marc Glisse <marc.glisse at normalesup dot org> 2011-10-23 08:40:04 UTC ---
Apart from combining 2 shuffles, I would expect the set and the shuffle to be
combined in Comment 1. I was going to report the following, but it already
appears in this bug:
__m128d f(double d){
    __m128d x=_mm_setr_pd(-d,d);
    return _mm_shuffle_pd(x,x,1);
}

    movsd    .LC0(%rip), %xmm1
    xorpd    %xmm0, %xmm1
    movapd    %xmm1, %xmm2
    unpcklpd    %xmm0, %xmm2
    movapd    %xmm2, %xmm0
    shufpd    $1, %xmm2, %xmm0

There are some extra moves, as usual, and a shuffle that could be combined
with the unpack.
(Obviously we don't write such code; it only looks that way after inlining.)
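
To make the expected folding concrete, here is a minimal scalar model in C of the two intrinsics involved. It is only a sketch: `vec2d`, `model_setr_pd`, `model_shufpd`, `f_as_written` and `f_folded` are names invented for this illustration, not GCC or Intel APIs. It shows that the set followed by the shuffle with immediate 1 yields the same lanes as a single set with the arguments swapped.

```c
#include <assert.h>

/* Two-lane double vector; lane 0 is the low element. */
typedef struct { double lane[2]; } vec2d;

/* Model of _mm_setr_pd: arguments are given in lane order (low first). */
static vec2d model_setr_pd(double lo, double hi)
{
    vec2d v = { { lo, hi } };
    return v;
}

/* Model of shufpd: bit 0 of imm picks the lane of a that goes to result
   lane 0, bit 1 picks the lane of b that goes to result lane 1. */
static vec2d model_shufpd(vec2d a, vec2d b, unsigned imm)
{
    assert(imm < 4);
    vec2d r = { { a.lane[imm & 1], b.lane[(imm >> 1) & 1] } };
    return r;
}

/* The function from the comment, written against the model. */
static vec2d f_as_written(double d)
{
    vec2d x = model_setr_pd(-d, d);
    return model_shufpd(x, x, 1);
}

/* What a single "set" would produce if the shuffle were folded into it. */
static vec2d f_folded(double d)
{
    return model_setr_pd(d, -d);
}
```

Calling `f_as_written(d)` and `f_folded(d)` with the same `d` gives identical lanes, which is why one would hope for a negate plus a single unpack and no `shufpd`.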



* [Bug rtl-optimization/43147] SSE shuffle merge
       [not found] <bug-43147-4@http.gcc.gnu.org/bugzilla/>
  2011-10-23  8:40 ` [Bug rtl-optimization/43147] SSE shuffle merge marc.glisse at normalesup dot org
@ 2012-05-07 15:00 ` glisse at gcc dot gnu.org
  2021-08-21 22:44 ` [Bug target/43147] " pinskia at gcc dot gnu.org
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-05-07 15:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147

Marc Glisse <glisse at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |glisse at gcc dot gnu.org

--- Comment #5 from Marc Glisse <glisse at gcc dot gnu.org> 2012-05-07 14:52:46 UTC ---
Actually, why isn't constant propagation happening for the example in comment
#1? simplify-rtx.c contains code to that effect; it might just need a little
tweaking...

(yes, I know the original report is probably interested in non-constant
operands)



* [Bug target/43147] SSE shuffle merge
       [not found] <bug-43147-4@http.gcc.gnu.org/bugzilla/>
  2011-10-23  8:40 ` [Bug rtl-optimization/43147] SSE shuffle merge marc.glisse at normalesup dot org
  2012-05-07 15:00 ` glisse at gcc dot gnu.org
@ 2021-08-21 22:44 ` pinskia at gcc dot gnu.org
  2021-08-21 23:45 ` hjl.tools at gmail dot com
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-21 22:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|rtl-optimization            |target

--- Comment #11 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
We produce:
Trying 5, 7 -> 11:
    5: r86:V4SF=[`*.LC0']
      REG_EQUAL const_vector
    7: r85:V4SF=vec_select(vec_concat(r86:V4SF,r86:V4SF),parallel)
      REG_DEAD r86:V4SF
      REG_EQUAL const_vector
   11: r88:V4SF=vec_select(vec_concat(r85:V4SF,r85:V4SF),parallel)
      REG_DEAD r85:V4SF
      REG_EQUAL const_vector
Failed to match this instruction:
(set (reg:V4SF 88)
    (const_vector:V4SF [
            (const_double:SF 2.0e+0 [0x0.8p+2])
            (const_double:SF 1.0e+0 [0x0.8p+1])
            (const_double:SF 4.0e+0 [0x0.8p+3])
            (const_double:SF 3.0e+0 [0x0.cp+2])
        ]))

Which means the vec_selects are merging at the RTL level just fine.

Anyway, if the target expanded __builtin_ia32_shufps to VEC_PERM_EXPR, this
would have been optimized at the GIMPLE level.  So this is a target issue.
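
Folding the builtin into a generic permutation means turning the 8-bit shufps immediate into an explicit list of element indices. The following is a hedged sketch of that mapping in plain C, not the code GCC uses; `shufps_imm_to_indices` is a hypothetical helper. Indices 0..3 name lanes of the first operand and 4..7 name lanes of the second, matching the `parallel` lists in the RTL dumps in this thread.

```c
#include <assert.h>

/* Expand an 8-bit shufps immediate into the four element indices of a
   two-input permutation, where 0..3 name lanes of the first operand and
   4..7 name lanes of the second operand. */
static void shufps_imm_to_indices(unsigned imm, unsigned idx[4])
{
    assert(imm <= 0xFF);
    idx[0] = imm & 3;               /* result lane 0: from first operand  */
    idx[1] = (imm >> 2) & 3;        /* result lane 1: from first operand  */
    idx[2] = ((imm >> 4) & 3) + 4;  /* result lane 2: from second operand */
    idx[3] = ((imm >> 6) & 3) + 4;  /* result lane 3: from second operand */
}
```

For the two immediates in the testcase, 0xC9 expands to 1, 2, 4, 7 and 0x2D expands to 1, 3, 6, 4.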


* [Bug target/43147] SSE shuffle merge
       [not found] <bug-43147-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2021-08-21 22:44 ` [Bug target/43147] " pinskia at gcc dot gnu.org
@ 2021-08-21 23:45 ` hjl.tools at gmail dot com
  2021-08-22 12:54 ` hjl.tools at gmail dot com
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: hjl.tools at gmail dot com @ 2021-08-21 23:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |hjl.tools at gmail dot com

--- Comment #12 from H.J. Lu <hjl.tools at gmail dot com> ---
Created attachment 51345
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51345&action=edit
A patch


* [Bug target/43147] SSE shuffle merge
       [not found] <bug-43147-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2021-08-21 23:45 ` hjl.tools at gmail dot com
@ 2021-08-22 12:54 ` hjl.tools at gmail dot com
  2021-08-24  1:57 ` [Bug rtl-optimization/43147] " hjl.tools at gmail dot com
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: hjl.tools at gmail dot com @ 2021-08-22 12:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |patch
                URL|                            |https://gcc.gnu.org/piperma
                   |                            |il/gcc-patches/2021-August/
                   |                            |577884.html

--- Comment #13 from H.J. Lu <hjl.tools at gmail dot com> ---
A patch is posted at

https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577884.html


* [Bug rtl-optimization/43147] SSE shuffle merge
       [not found] <bug-43147-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2021-08-22 12:54 ` hjl.tools at gmail dot com
@ 2021-08-24  1:57 ` hjl.tools at gmail dot com
  2021-08-25  8:42 ` crazylht at gmail dot com
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: hjl.tools at gmail dot com @ 2021-08-24  1:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|hjl.tools at gmail dot com         |unassigned at gcc dot gnu.org
          Component|target                      |rtl-optimization
            Version|4.4.1                       |12.0

--- Comment #14 from H.J. Lu <hjl.tools at gmail dot com> ---
From

https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577991.html

Trying 5 -> 7:
    5: r85:V4SF=[`*.LC0']
      REG_EQUAL const_vector
    7: r84:V4SF=vec_select(vec_concat(r85:V4SF,r85:V4SF),parallel)
      REG_DEAD r85:V4SF
      REG_EQUAL const_vector
Failed to match this instruction:
(set (reg:V4SF 84)
    (const_vector:V4SF [
            (const_double:SF 3.0e+0 [0x0.cp+2])
            (const_double:SF 2.0e+0 [0x0.8p+2])
            (const_double:SF 4.0e+0 [0x0.8p+3])
            (const_double:SF 1.0e+0 [0x0.8p+1])
        ]))

(insn 5 2 7 2 (set (reg:V4SF 85)
        (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S16 A128]))
     "/export/users/liuhongt/install/git_trunk_master_native/lib/gcc/x86_64-pc-linux-gnu/12.0.0/include/xmmintrin.h":746:19 1600 {movv4sf_internal}
     (expr_list:REG_EQUAL (const_vector:V4SF [
                (const_double:SF 4.0e+0 [0x0.8p+3])
                (const_double:SF 3.0e+0 [0x0.cp+2])
                (const_double:SF 2.0e+0 [0x0.8p+2])
                (const_double:SF 1.0e+0 [0x0.8p+1])
            ])
        (nil)))
(insn 7 5 11 2 (set (reg:V4SF 84)
        (vec_select:V4SF (vec_concat:V8SF (reg:V4SF 85)
                (reg:V4SF 85))
            (parallel [
                    (const_int 1 [0x1])
                    (const_int 2 [0x2])
                    (const_int 4 [0x4])
                    (const_int 7 [0x7])
                ])))
     "/export/users/liuhongt/install/git_trunk_master_native/lib/gcc/x86_64-pc-linux-gnu/12.0.0/include/xmmintrin.h":746:19 3015 {sse_shufps_v4sf}
     (expr_list:REG_DEAD (reg:V4SF 85)
        (expr_list:REG_EQUAL (const_vector:V4SF [
                    (const_double:SF 3.0e+0 [0x0.cp+2])
                    (const_double:SF 2.0e+0 [0x0.8p+2])
                    (const_double:SF 4.0e+0 [0x0.8p+3])
                    (const_double:SF 1.0e+0 [0x0.8p+1])
                ])
            (nil))))

I think pass_combine should be extended to force an illegitimate constant
into the constant pool and recog the load insn again. It looks like a general
optimization that is better not done in the backend.
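
The constant that combine computes in the dump above can be reproduced with a small scalar model of `(vec_select (vec_concat a b) (parallel [...]))`. This is only an illustrative sketch; `select_from_concat` is a made-up name and the model ignores everything except lane selection.

```c
#include <assert.h>

/* Model of (vec_select (vec_concat a b) (parallel [i0 i1 i2 i3])) for
   four-lane float vectors: indices 0..3 read a, indices 4..7 read b. */
static void select_from_concat(const float a[4], const float b[4],
                               const unsigned idx[4], float out[4])
{
    for (int i = 0; i < 4; i++) {
        assert(idx[i] < 8);
        out[i] = idx[i] < 4 ? a[idx[i]] : b[idx[i] - 4];
    }
}
```

Feeding it the REG_EQUAL value of insn 5, {4.0, 3.0, 2.0, 1.0}, for both operands with indices {1, 2, 4, 7} gives {3.0, 2.0, 4.0, 1.0}, the const_vector that failed to match.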


* [Bug rtl-optimization/43147] SSE shuffle merge
       [not found] <bug-43147-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2021-08-24  1:57 ` [Bug rtl-optimization/43147] " hjl.tools at gmail dot com
@ 2021-08-25  8:42 ` crazylht at gmail dot com
  2021-08-25  8:52 ` pinskia at gcc dot gnu.org
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: crazylht at gmail dot com @ 2021-08-25  8:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147

--- Comment #15 from Hongtao.liu <crazylht at gmail dot com> ---

> I think pass_combine should be extended to force illegitimate constant
> to constant pool and recog load insn again, It looks like a general
> optimization that better not do it in the backend.

The issue can also be solved by folding __builtin_ia32_shufps to a GIMPLE
VEC_PERM_EXPR, i.e. the testcase below doesn't have the problem:

typedef int v4si __attribute__((vector_size (16)));

v4si
foo ()
{
    v4si a = __extension__ (v4si) {4, 3, 2, 1};
    v4si b = __extension__ (v4si) {5, 6, 7, 8};
    v4si c = __builtin_shufflevector (a, b, 1, 4, 2, 7);
    v4si d = __builtin_shuffle (c, __extension__ (v4si) { 3, 2, 0, 1 });
    return d;
}

foo():
  movdqa .LC0(%rip), %xmm0
  ret
.LC0:
  .long 8
  .long 2
  .long 3
  .long 5
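
The folded constant can be checked by hand with a scalar replay of the testcase. This sketch assumes the documented semantics of the two builtins (two-input indices 0..7 for `__builtin_shufflevector`, mask elements taken modulo the lane count for two-argument `__builtin_shuffle`); `perm2`, `perm1` and `foo_model` are hypothetical names used only here.

```c
#include <assert.h>

/* Two-input permutation in the style of __builtin_shufflevector:
   indices 0..3 select from a, indices 4..7 select from b. */
static void perm2(const int a[4], const int b[4],
                  const unsigned idx[4], int out[4])
{
    for (int i = 0; i < 4; i++) {
        assert(idx[i] < 8);
        out[i] = idx[i] < 4 ? a[idx[i]] : b[idx[i] - 4];
    }
}

/* One-input permutation in the style of two-argument __builtin_shuffle:
   each mask element is taken modulo the number of lanes. */
static void perm1(const int v[4], const unsigned mask[4], int out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = v[mask[i] % 4];
}

/* Scalar replay of foo() from the testcase. */
static void foo_model(int d[4])
{
    const int a[4] = {4, 3, 2, 1};
    const int b[4] = {5, 6, 7, 8};
    const unsigned first[4] = {1, 4, 2, 7};
    const unsigned second[4] = {3, 2, 0, 1};
    int c[4];

    perm2(a, b, first, c);   /* c = {3, 5, 2, 8} */
    perm1(c, second, d);     /* d = {8, 2, 3, 5} */
}
```

The result {8, 2, 3, 5} matches the `.long` values in `.LC0` above.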


* [Bug rtl-optimization/43147] SSE shuffle merge
       [not found] <bug-43147-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2021-08-25  8:42 ` crazylht at gmail dot com
@ 2021-08-25  8:52 ` pinskia at gcc dot gnu.org
  2021-08-25  8:54 ` glisse at gcc dot gnu.org
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-25  8:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147

--- Comment #16 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #15)
> > I think pass_combine should be extended to force illegitimate constant
> > to constant pool and recog load insn again, It looks like a general
> > optimization that better not do it in the backend.
> 
> The issue can also be solved by folding __builtin_ia32_shufps to gimple
> VEC_PERM_EXPR, .i.e the below testcase doesn't have the problem
> 
> typedef int v4si __attribute__((vector_size (16)));
> 
> v4si
> foo ()
> {
>     v4si a = __extension__ (v4si) {4, 3, 2, 1};
>     v4si b = __extension__ (v4si) {5, 6, 7, 8};
>     v4si c = __builtin_shufflevector (a, b, 1, 4, 2, 7);
>     v4si d = __builtin_shuffle (c, __extension__ (v4si) { 3, 2, 0, 1 });
>     return d;
> }

But that is because we constant-fold PERMs at the GIMPLE level.
Combining VEC_PERM_EXPRs at the GIMPLE level is PR 54346; note I found this
while looking at other issues too :).


* [Bug rtl-optimization/43147] SSE shuffle merge
       [not found] <bug-43147-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2021-08-25  8:52 ` pinskia at gcc dot gnu.org
@ 2021-08-25  8:54 ` glisse at gcc dot gnu.org
  2021-08-25  9:20 ` crazylht at gmail dot com
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: glisse at gcc dot gnu.org @ 2021-08-25  8:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147

--- Comment #17 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #15)
> The issue can also be solved by folding __builtin_ia32_shufps to gimple
> VEC_PERM_EXPR,

Didn't you post a patch to do that last year? What happened to it?


* [Bug rtl-optimization/43147] SSE shuffle merge
       [not found] <bug-43147-4@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2021-08-25  8:54 ` glisse at gcc dot gnu.org
@ 2021-08-25  9:20 ` crazylht at gmail dot com
  2021-08-27  0:51 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: crazylht at gmail dot com @ 2021-08-25  9:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147

--- Comment #18 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Marc Glisse from comment #17)
> (In reply to Hongtao.liu from comment #15)
> > The issue can also be solved by folding __builtin_ia32_shufps to gimple
> > VEC_PERM_EXPR,
> 
> Didn't you post a patch to do that last year? What happened to it?

I almost forgot about it. Let me retest my patch; it's at
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562029.html


* [Bug rtl-optimization/43147] SSE shuffle merge
       [not found] <bug-43147-4@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2021-08-25  9:20 ` crazylht at gmail dot com
@ 2021-08-27  0:51 ` cvs-commit at gcc dot gnu.org
  2021-08-27  1:00 ` crazylht at gmail dot com
  2023-08-22  4:32 ` pinskia at gcc dot gnu.org
  12 siblings, 0 replies; 16+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-08-27  0:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147

--- Comment #19 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:0fa4787bf34b173ce6f198e99b6f6dd8a3f98014

commit r12-3177-g0fa4787bf34b173ce6f198e99b6f6dd8a3f98014
Author: liuhongt <hongtao.liu@intel.com>
Date:   Fri Dec 11 19:02:43 2020 +0800

    Fold more shuffle builtins to VEC_PERM_EXPR.

    A follow-up to
https://gcc.gnu.org/pipermail/gcc-patches/2019-May/521983.html

    gcc/
            PR target/98167
            PR target/43147
            * config/i386/i386.c (ix86_gimple_fold_builtin): Fold
            IX86_BUILTIN_SHUFPD512, IX86_BUILTIN_SHUFPS512,
            IX86_BUILTIN_SHUFPD256, IX86_BUILTIN_SHUFPS,
            IX86_BUILTIN_SHUFPS256.
            (ix86_masked_all_ones): New function.

    gcc/testsuite/
            * gcc.target/i386/avx512f-vshufpd-1.c: Adjust testcase.
            * gcc.target/i386/avx512f-vshufps-1.c: Adjust testcase.
            * gcc.target/i386/pr43147.c: New test.


* [Bug rtl-optimization/43147] SSE shuffle merge
       [not found] <bug-43147-4@http.gcc.gnu.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2021-08-27  0:51 ` cvs-commit at gcc dot gnu.org
@ 2021-08-27  1:00 ` crazylht at gmail dot com
  2023-08-22  4:32 ` pinskia at gcc dot gnu.org
  12 siblings, 0 replies; 16+ messages in thread
From: crazylht at gmail dot com @ 2021-08-27  1:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147

--- Comment #20 from Hongtao.liu <crazylht at gmail dot com> ---
Fixed in GCC 12; GCC now generates optimal code:

main:
.LFB532:
        .cfi_startproc
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        movaps  .LC0(%rip), %xmm0
        call    printv
        xorl    %eax, %eax
        addq    $8, %rsp
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
.LFE532:
        .size   main, .-main
        .section        .rodata.cst16,"aM",@progbits,16
        .align 16
.LC0:
        .long   1073741824
        .long   1065353216
        .long   1082130432
        .long   1077936128
        .ident  "GCC: (GNU) 12.0.0 20210825 (experimental)"
        .section        .note.GNU-stack,"",@progbits
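
The four `.long` values in `.LC0` are the IEEE-754 single-precision bit patterns of the folded vector. A minimal way to decode them in C, assuming a 32-bit `float` (true on x86-64); `bits_to_float` is a helper name invented for this sketch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as an IEEE-754 single-precision float.
   Assumes float is 32 bits wide, which holds on x86-64. */
static float bits_to_float(uint32_t bits)
{
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Decoding 1073741824, 1065353216, 1082130432 and 1077936128 in that order gives 2.0, 1.0, 4.0 and 3.0, i.e. the pair-swapped form of the {4.0, 3.0, 2.0, 1.0} vector built by `_mm_set_ps(1.0f, 2.0f, 3.0f, 4.0f)`.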


* [Bug rtl-optimization/43147] SSE shuffle merge
       [not found] <bug-43147-4@http.gcc.gnu.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2021-08-27  1:00 ` crazylht at gmail dot com
@ 2023-08-22  4:32 ` pinskia at gcc dot gnu.org
  12 siblings, 0 replies; 16+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-08-22  4:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
   Target Milestone|---                         |13.0
         Resolution|---                         |FIXED

--- Comment #21 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The constant folding part was fixed in GCC 12, but combining shuffles was
fixed in GCC 13.

That is for:
```
__m128 m;
int main()
{
        m = _mm_shuffle_ps(m, m, 0xC9); // Those two shuffles together swap pairs
        m = _mm_shuffle_ps(m, m, 0x2D); // And could be optimized to 0x4E
        printv(m);

        return 0;
}
```
GCC 13+ produces:
```
        movaps  m(%rip), %xmm0
        shufps  $78, %xmm0, %xmm0
        movaps  %xmm0, m(%rip)
        call    _Z6printvDv4_f
```
instead of what was there in GCC 12:
```
        movaps  m(%rip), %xmm0
        shufps  $201, %xmm0, %xmm0
        shufps  $45, %xmm0, %xmm0
        movaps  %xmm0, m(%rip)
```

So closing as fixed in GCC 13.
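
For reference, the merged immediate can be derived mechanically. The sketch below composes two shufps immediates under the assumption that each shuffle uses the same register for both operands, as in the testcase; `compose_shufps_same_operand` is a hypothetical helper, not something taken from GCC.

```c
#include <assert.h>

/* Compose two shufps immediates for the case where every shuffle uses the
   same register for both operands: applying `first` and then `second` is
   equivalent to one shuffle with the returned immediate. */
static unsigned compose_shufps_same_operand(unsigned first, unsigned second)
{
    assert(first <= 0xFF && second <= 0xFF);
    unsigned combined = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        /* Which lane of the intermediate vector feeds this result lane. */
        unsigned from_mid = (second >> (2 * lane)) & 3;
        /* Which lane of the original vector fed that intermediate lane. */
        unsigned from_src = (first >> (2 * from_mid)) & 3;
        combined |= from_src << (2 * lane);
    }
    return combined;
}
```

`compose_shufps_same_operand(0xC9, 0x2D)` returns 0x4E, which is the decimal 78 seen in the `shufps $78` emitted by GCC 13.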


* [Bug rtl-optimization/43147] SSE shuffle merge
  2010-02-23  1:27 [Bug rtl-optimization/43147] New: " liranuna at gmail dot com
  2010-02-23  1:37 ` [Bug rtl-optimization/43147] " liranuna at gmail dot com
@ 2010-02-23  1:42 ` pinskia at gcc dot gnu dot org
  2010-02-23  1:42 ` pinskia at gcc dot gnu dot org
  2 siblings, 0 replies; 16+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2010-02-23  1:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from pinskia at gcc dot gnu dot org  2010-02-23 01:42 -------
Confirmed.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2010-02-23 01:42:16
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147



* [Bug rtl-optimization/43147] SSE shuffle merge
  2010-02-23  1:27 [Bug rtl-optimization/43147] New: " liranuna at gmail dot com
  2010-02-23  1:37 ` [Bug rtl-optimization/43147] " liranuna at gmail dot com
  2010-02-23  1:42 ` pinskia at gcc dot gnu dot org
@ 2010-02-23  1:42 ` pinskia at gcc dot gnu dot org
  2 siblings, 0 replies; 16+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2010-02-23  1:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from pinskia at gcc dot gnu dot org  2010-02-23 01:42 -------
I think that is because nothing simplifies:
    (vec_select:V4SF (vec_concat:V8SF (vec_select:V4SF (vec_concat:V8SF (reg:V4SF 62)
                    (reg:V4SF 62))
                (parallel [
                        (const_int 1 [0x1])
                        (const_int 2 [0x2])
                        (const_int 4 [0x4])
                        (const_int 7 [0x7])
                    ]))
            (vec_select:V4SF (vec_concat:V8SF (reg:V4SF 62)
                    (reg:V4SF 62))
                (parallel [
                        (const_int 1 [0x1])
                        (const_int 2 [0x2])
                        (const_int 4 [0x4])
                        (const_int 7 [0x7])
                    ])))
        (parallel [
                (const_int 1 [0x1])
                (const_int 3 [0x3])
                (const_int 6 [0x6])
                (const_int 4 [0x4])
            ]))
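
The simplification being asked for amounts to composing the two index lists. Here is a rough sketch in C, assuming (as in this RTL) that both halves of every vec_concat are the same vector, so indices k and k+4 name the same lane; `compose_selects` is a name made up for the illustration and this is not simplify-rtx code.

```c
#include <assert.h>

/* Compose the index lists of
     (vec_select (vec_concat t t) outer)
   where t = (vec_select (vec_concat r r) inner),
   into one index list over (vec_concat r r). Because both halves of each
   vec_concat are the same vector, indices are reduced modulo 4 before use. */
static void compose_selects(const unsigned inner[4], const unsigned outer[4],
                            unsigned out[4])
{
    for (int i = 0; i < 4; i++)
        assert(inner[i] < 8 && outer[i] < 8);

    for (int i = 0; i < 4; i++) {
        unsigned lane_of_r = inner[outer[i] % 4] % 4;
        /* Keep the shufps shape: result lanes 0-1 index the first
           operand (0..3), result lanes 2-3 index the second (4..7). */
        out[i] = i < 2 ? lane_of_r : lane_of_r + 4;
    }
}
```

With inner = {1, 2, 4, 7} and outer = {1, 3, 6, 4} the composed list is {2, 3, 4, 5}, which corresponds to a single shufps with immediate 0x4E.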


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147



* [Bug rtl-optimization/43147] SSE shuffle merge
  2010-02-23  1:27 [Bug rtl-optimization/43147] New: " liranuna at gmail dot com
@ 2010-02-23  1:37 ` liranuna at gmail dot com
  2010-02-23  1:42 ` pinskia at gcc dot gnu dot org
  2010-02-23  1:42 ` pinskia at gcc dot gnu dot org
  2 siblings, 0 replies; 16+ messages in thread
From: liranuna at gmail dot com @ 2010-02-23  1:37 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from liranuna at gmail dot com  2010-02-23 01:37 -------
It appears I am missing a line in the code I posted:

#include <xmmintrin.h>

extern void printv(__m128 m);

int main()
{
        __m128 m = _mm_set_ps(1.0f, 2.0f, 3.0f, 4.0f);
        m = _mm_shuffle_ps(m, m, 0xC9); // Those two shuffles together swap pairs
        m = _mm_shuffle_ps(m, m, 0x2D); // And could be optimized to 0x4E
        printv(m);

        return 0;
}


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147


