[Bug tree-optimization/94908] New: Failure to optimally optimize certain shuffle patterns

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/94908] New: Failure to optimally optimize certain shuffle patterns
@ 2020-05-01 19:02 gabravier at gmail dot com
  2020-05-01 21:24 ` [Bug tree-optimization/94908] " glisse at gcc dot gnu.org
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: gabravier at gmail dot com @ 2020-05-01 19:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908

            Bug ID: 94908
           Summary: Failure to optimally optimize certain shuffle patterns
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

typedef float v4sf __attribute__((vector_size(16)));

v4sf g();

v4sf f(v4sf a, v4sf b)
{
    return (v4sf){g()[1], a[1], a[2], a[3]};
}

With -O3, LLVM outputs this :

f(float __vector(4), float __vector(4)): # @f(float __vector(4), float __vector(4))
  sub rsp, 24
  movaps xmmword ptr [rsp], xmm0 # 16-byte Spill
  call g()
  movaps xmm1, xmmword ptr [rsp] # 16-byte Reload
  shufps xmm0, xmm1, 17 # xmm0 = xmm0[1,0],xmm1[1,0]
  shufps xmm0, xmm1, 232 # xmm0 = xmm0[0,2],xmm1[2,3]
  add rsp, 24
  ret

GCC outputs this : 

f(float __vector(4), float __vector(4)):
  sub rsp, 24
  movaps XMMWORD PTR [rsp], xmm0
  call g()
  movaps xmm1, XMMWORD PTR [rsp]
  add rsp, 24
  shufps xmm0, xmm0, 85
  movaps xmm2, xmm1
  shufps xmm2, xmm1, 85
  movaps xmm3, xmm2
  movaps xmm2, xmm1
  unpckhps xmm2, xmm1
  unpcklps xmm0, xmm3
  shufps xmm1, xmm1, 255
  unpcklps xmm2, xmm1
  movlhps xmm0, xmm2
  ret

This also seems to occur on powerpc64le, so I haven't marked it as
target-specific.
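
The two shufps immediates in the LLVM output above can be checked with a small
scalar model. This is only a sketch: shufps_model and llvm_sequence are
hypothetical helper names invented for illustration, not part of GCC or LLVM.
The model assumes the usual shufps encoding, where the two low 2-bit fields of
the immediate pick lanes from the destination and the two high fields pick
lanes from the source.

```c
#include <stdint.h>

/* Scalar model of SSE "shufps dst, src, imm8" (hypothetical helper).
   Result lanes 0-1 are picked from dst, lanes 2-3 from src; each 2-bit
   field of imm8 is a lane index.  */
void shufps_model(float dst[4], const float src[4], uint8_t imm)
{
    float r[4];
    r[0] = dst[imm & 3];
    r[1] = dst[(imm >> 2) & 3];
    r[2] = src[(imm >> 4) & 3];
    r[3] = src[(imm >> 6) & 3];
    for (int i = 0; i < 4; i++)
        dst[i] = r[i];
}

/* Replays the LLVM sequence: xmm0 holds the result of g(), xmm1 holds a.  */
void llvm_sequence(float xmm0[4], const float xmm1[4])
{
    shufps_model(xmm0, xmm1, 17);   /* xmm0 = xmm0[1,0],xmm1[1,0] */
    shufps_model(xmm0, xmm1, 232);  /* xmm0 = xmm0[0,2],xmm1[2,3] */
}
```

After both steps xmm0 holds {g()[1], a[1], a[2], a[3]}, which is the vector
the source function builds.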


* [Bug tree-optimization/94908] Failure to optimally optimize certain shuffle patterns
  2020-05-01 19:02 [Bug tree-optimization/94908] New: Failure to optimally optimize certain shuffle patterns gabravier at gmail dot com
@ 2020-05-01 21:24 ` glisse at gcc dot gnu.org
  2020-05-04  6:30 ` rguenth at gcc dot gnu.org
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: glisse at gcc dot gnu.org @ 2020-05-01 21:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908

--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> ---
Even if we write __builtin_shuffle, the vector lowering pass turns it into the
same code (constructor of BIT_FIELD_REFs), which seems to indicate that the
target does not handle this pattern.
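
For reference, the __builtin_shuffle spelling of the testcase might look like
the sketch below. The body of g() is a hypothetical stand-in (the report only
declares it), and f_shuffle is an illustrative name. The Clang branch is an
assumption on my part that Clang lacks __builtin_shuffle and needs
__builtin_shufflevector instead.

```c
typedef float v4sf __attribute__((vector_size(16)));
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical stand-in for the external g() in the report.  */
v4sf g(void)
{
    return (v4sf){10.0f, 11.0f, 12.0f, 13.0f};
}

/* Same permutation as the constructor form in the report.  With two
   inputs, mask indices 0-3 select from the first vector and 4-7 from
   the second.  */
v4sf f_shuffle(v4sf a)
{
#if defined(__clang__)
    /* Assumption: Clang has no __builtin_shuffle, so use its equivalent.  */
    return __builtin_shufflevector(g(), a, 1, 5, 6, 7);
#else
    return __builtin_shuffle(g(), a, (v4si){1, 5, 6, 7});
#endif
}
```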


* [Bug tree-optimization/94908] Failure to optimally optimize certain shuffle patterns
  2020-05-01 19:02 [Bug tree-optimization/94908] New: Failure to optimally optimize certain shuffle patterns gabravier at gmail dot com
  2020-05-01 21:24 ` [Bug tree-optimization/94908] " glisse at gcc dot gnu.org
@ 2020-05-04  6:30 ` rguenth at gcc dot gnu.org
  2023-02-17 20:49 ` gabravier at gmail dot com
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-05-04  6:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-05-04
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, ideally it would be extract g()[1], insert at a[0].  But yes, we're not
trying to split an unhandled shuffle into two, but leave that for targets
to sort out ... (x86 has code for many 3-insn shuffles, for example).
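
At the source level, the "extract g()[1], insert at a[0]" form is equivalent
to the constructor in the report. A minimal sketch follows; the body of g() is
a hypothetical stand-in, f_ctor and f_insert are illustrative names, and the
unused parameter b is dropped.

```c
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical stand-in for the external g() in the report.  */
v4sf g(void)
{
    return (v4sf){10.0f, 11.0f, 12.0f, 13.0f};
}

/* Constructor form, as in the report.  */
v4sf f_ctor(v4sf a)
{
    v4sf t = g();
    return (v4sf){t[1], a[1], a[2], a[3]};
}

/* Extract-and-insert form: one lane read, one lane write.  */
v4sf f_insert(v4sf a)
{
    v4sf t = g();
    a[0] = t[1];
    return a;
}
```

Both functions return the same vector; the second form just makes the single
lane move explicit.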


* [Bug tree-optimization/94908] Failure to optimally optimize certain shuffle patterns
  2020-05-01 19:02 [Bug tree-optimization/94908] New: Failure to optimally optimize certain shuffle patterns gabravier at gmail dot com
  2020-05-01 21:24 ` [Bug tree-optimization/94908] " glisse at gcc dot gnu.org
  2020-05-04  6:30 ` rguenth at gcc dot gnu.org
@ 2023-02-17 20:49 ` gabravier at gmail dot com
  2023-02-17 21:05 ` [Bug target/94908] " pinskia at gcc dot gnu.org
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: gabravier at gmail dot com @ 2023-02-17 20:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908

--- Comment #3 from Gabriel Ravier <gabravier at gmail dot com> ---
Looks like this gives much better output now.


* [Bug target/94908] Failure to optimally optimize certain shuffle patterns
  2020-05-01 19:02 [Bug tree-optimization/94908] New: Failure to optimally optimize certain shuffle patterns gabravier at gmail dot com
                   ` (2 preceding siblings ...)
  2023-02-17 20:49 ` gabravier at gmail dot com
@ 2023-02-17 21:05 ` pinskia at gcc dot gnu.org
  2023-02-18  9:35 ` ubizjak at gmail dot com
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-02-17 21:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=53346,
                   |                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=93720
          Component|tree-optimization           |target

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I think this was a target issue and maybe should be split into a couple
different bugs.

For GCC 8, aarch64 produces:
        dup     v0.4s, v0.s[1]
        ldr     q1, [sp, 16]
        ldp     x29, x30, [sp], 32
        ins     v0.s[1], v1.s[1]
        ins     v0.s[2], v1.s[2]
        ins     v0.s[3], v1.s[3]


GCC 9/10 produced the following (which is OK, though it could be improved, as
happened in GCC 11):
        adrp    x0, .LC0
        ldr     q1, [sp, 16]
        ldr     q2, [x0, #:lo12:.LC0]
        ldp     x29, x30, [sp], 32
        tbl     v0.16b, {v0.16b - v1.16b}, v2.16b
For GCC 11+, aarch64 produces:
        ldr     q1, [sp, 16]
        ins     v1.s[0], v0.s[1]
        mov     v0.16b, v1.16b


Which means for aarch64, this was changed in GCC 10 and fixed fully for GCC 11
(by r11-2192-gc9c87e6f9c795b aka PR 93720 which was my patch in fact).

For x86_64, the trunk produces:

        movaps  (%rsp), %xmm1
        addq    $24, %rsp
        shufps  $85, %xmm1, %xmm0
        shufps  $232, %xmm1, %xmm0

While GCC 12 produces:

        movaps  (%rsp), %xmm1
        addq    $24, %rsp
        shufps  $85, %xmm0, %xmm0
        movaps  %xmm1, %xmm2
        shufps  $85, %xmm1, %xmm2
        movaps  %xmm2, %xmm3
        movaps  %xmm1, %xmm2
        unpckhps        %xmm1, %xmm2
        unpcklps        %xmm3, %xmm0
        shufps  $255, %xmm1, %xmm1
        unpcklps        %xmm1, %xmm2
        movlhps %xmm2, %xmm0

This was changed with r13-2843-g3db8e9c2422d92 (aka PR 53346).

For powerpc64le, it looks ok for GCC 11:
        addis 9,2,.LC0@toc@ha
        addi 1,1,48
        addi 9,9,.LC0@toc@l
        li 0,-16
        lvx 0,0,9
        vperm 2,31,2,0

Both the x86_64 and the PowerPC PERM implementations could be improved to
support insertion, as the aarch64 backend does.
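
The GCC 9/10 aarch64 output above loads a permute constant from .LC0 and feeds
it to tbl. The constant itself is not shown in this thread, so the sketch
below only illustrates how a byte-index table for the lane selection
{1, 5, 6, 7} could be derived. It assumes a little-endian lane layout and a
32-byte table made of the first input followed by the second. Both function
names are hypothetical, and the model follows tbl only; vperm numbers bytes
differently, which is not modeled here.

```c
#include <stdint.h>

/* Expands four 32-bit lane selectors (0-7) into sixteen byte indices.  */
void lanes_to_byte_indices(const int lane_sel[4], uint8_t out[16])
{
    for (int lane = 0; lane < 4; lane++)
        for (int b = 0; b < 4; b++)
            out[4 * lane + b] = (uint8_t)(4 * lane_sel[lane] + b);
}

/* Scalar model of a two-register byte table lookup.  As with aarch64
   tbl, an out-of-range index yields a zero byte.  */
void tbl2_model(const uint8_t table[32], const uint8_t idx[16],
                uint8_t out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = idx[i] < 32 ? table[idx[i]] : 0;
}
```

For {1, 5, 6, 7} the indices come out as bytes 4-7 of the first register
followed by bytes 20-31 of the concatenated table.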


* [Bug target/94908] Failure to optimally optimize certain shuffle patterns
  2020-05-01 19:02 [Bug tree-optimization/94908] New: Failure to optimally optimize certain shuffle patterns gabravier at gmail dot com
                   ` (3 preceding siblings ...)
  2023-02-17 21:05 ` [Bug target/94908] " pinskia at gcc dot gnu.org
@ 2023-02-18  9:35 ` ubizjak at gmail dot com
  2023-02-20  3:32 ` crazylht at gmail dot com
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: ubizjak at gmail dot com @ 2023-02-18  9:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #5 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Andrew Pinski from comment #4)
> Both the x86_64 and the PowerPC PERM implementations could be improved to
> support insertion, as the aarch64 backend does.

Cc Hongtao for x86 part.


* [Bug target/94908] Failure to optimally optimize certain shuffle patterns
  2020-05-01 19:02 [Bug tree-optimization/94908] New: Failure to optimally optimize certain shuffle patterns gabravier at gmail dot com
                   ` (4 preceding siblings ...)
  2023-02-18  9:35 ` ubizjak at gmail dot com
@ 2023-02-20  3:32 ` crazylht at gmail dot com
  2023-03-08 13:19 ` ubizjak at gmail dot com
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: crazylht at gmail dot com @ 2023-02-20  3:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908

--- Comment #6 from Hongtao.liu <crazylht at gmail dot com> ---
Yes, insertps can select any element from src and insert it into any place of
the dest. Under SSE4.1, x86 can generate
  vinsertps       xmm0, xmm1, xmm0, 64  # xmm0 = xmm0[1],xmm1[1,2,3]
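
To see why the immediate is 64, here is a scalar sketch of the register-source
form of insertps. insertps_model is a hypothetical helper name, not a GCC
function. In this form, bits 7:6 of the immediate select the source lane, bits
5:4 select the destination lane, and bits 3:0 are a mask of result lanes to
zero.

```c
#include <stdint.h>

/* Scalar model of SSE4.1 insertps with a register source (hypothetical
   helper).  dst is updated in place.  */
void insertps_model(float dst[4], const float src[4], uint8_t imm)
{
    unsigned count_s = (imm >> 6) & 3;  /* source lane */
    unsigned count_d = (imm >> 4) & 3;  /* destination lane */
    unsigned zmask = imm & 15;          /* lanes to clear afterwards */

    dst[count_d] = src[count_s];
    for (unsigned i = 0; i < 4; i++)
        if (zmask & (1u << i))
            dst[i] = 0.0f;
}
```

With imm = 64 (binary 01 00 0000) this copies source lane 1 into destination
lane 0 and zeroes nothing, which matches the permutation in the report.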


* [Bug target/94908] Failure to optimally optimize certain shuffle patterns
  2020-05-01 19:02 [Bug tree-optimization/94908] New: Failure to optimally optimize certain shuffle patterns gabravier at gmail dot com
                   ` (5 preceding siblings ...)
  2023-02-20  3:32 ` crazylht at gmail dot com
@ 2023-03-08 13:19 ` ubizjak at gmail dot com
  2023-03-09  4:22 ` crazylht at gmail dot com
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: ubizjak at gmail dot com @ 2023-03-08 13:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908

--- Comment #7 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 54607
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54607&action=edit
Proposed patch

Patch in testing.

Attached patch produces (-O2 -msse4.1):

f:
        subq    $24, %rsp
        xorl    %eax, %eax
        vmovaps %xmm0, (%rsp)
        call    g
        vmovaps (%rsp), %xmm1
        addq    $24, %rsp
        vinsertps       $64, %xmm0, %xmm1, %xmm0
        ret


* [Bug target/94908] Failure to optimally optimize certain shuffle patterns
  2020-05-01 19:02 [Bug tree-optimization/94908] New: Failure to optimally optimize certain shuffle patterns gabravier at gmail dot com
                   ` (6 preceding siblings ...)
  2023-03-08 13:19 ` ubizjak at gmail dot com
@ 2023-03-09  4:22 ` crazylht at gmail dot com
  2023-03-09 14:27 ` ubizjak at gmail dot com
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: crazylht at gmail dot com @ 2023-03-09  4:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908

--- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #7)
> Created attachment 54607 [details]
> Proposed patch
> 
> Patch in testing.
> 
> Attached patch produces (-O2 -msse4.1):
> 
> f:
>         subq    $24, %rsp
>         xorl    %eax, %eax
>         vmovaps %xmm0, (%rsp)
>         call    g
>         vmovaps (%rsp), %xmm1
>         addq    $24, %rsp
>         vinsertps       $64, %xmm0, %xmm1, %xmm0
>         ret

I'm thinking of something like below so it can be matched both by
expand_vselect_vconcat in ix86_expand_vec_perm_const_1 and patterns created by
pass_combine(theoretically).

+(define_insn_and_split "*sse4_1_insertps_1"
+  [(set (match_operand:VI4F_128 0 "register_operand")
+       (vec_select:VI4F_128
+         (vec_concat:<ssedoublevecmode>
+           (match_operand:VI4F_128 1 "register_operand")
+           (match_operand:VI4F_128 2 "register_operand"))
+         (match_parallel 3 "insertps_parallel"
+           [(match_operand 4 "const_int_operand")])))]
+  "TARGET_SSE4_1 && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"


* [Bug target/94908] Failure to optimally optimize certain shuffle patterns
  2020-05-01 19:02 [Bug tree-optimization/94908] New: Failure to optimally optimize certain shuffle patterns gabravier at gmail dot com
                   ` (7 preceding siblings ...)
  2023-03-09  4:22 ` crazylht at gmail dot com
@ 2023-03-09 14:27 ` ubizjak at gmail dot com
  2023-03-09 14:32 ` ubizjak at gmail dot com
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: ubizjak at gmail dot com @ 2023-03-09 14:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908

--- Comment #9 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #8)

> I'm thinking of something like below so it can be matched both by
> expand_vselect_vconcat in ix86_expand_vec_perm_const_1 and patterns created
> by pass_combine(theoretically).
> 
> +(define_insn_and_split "*sse4_1_insertps_1"
> +  [(set (match_operand:VI4F_128 0 "register_operand")
> +       (vec_select:VI4F_128
> +         (vec_concat:<ssedoublevecmode>
> +           (match_operand:VI4F_128 1 "register_operand")
> +           (match_operand:VI4F_128 2 "register_operand"))
> +         (match_parallel 3 "insertps_parallel"
> +           [(match_operand 4 "const_int_operand")])))]
> +  "TARGET_SSE4_1 && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"

If you want to go that way, then the resulting pattern should look like
combination of:

(define_insn "*vec_setv4sf_sse4_1"
  [(set (match_operand:V4SF 0 "register_operand" "=Yr,*x,v")
        (vec_merge:V4SF
          (vec_duplicate:V4SF
            (match_operand:SF 2 "nonimmediate_operand" "Yrm,*xm,vm"))
          (match_operand:V4SF 1 "register_operand" "0,0,v")
          (match_operand:SI 3 "const_0_to_3_operand")))]
  "TARGET_SSE4_1
   && ((unsigned) exact_log2 (INTVAL (operands[3]))
       < GET_MODE_NUNITS (V4SFmode))"

(define_insn_and_split "*sse4_1_extractps"
  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,rm,Yv,Yv")
        (vec_select:SF
          (match_operand:V4SF 1 "register_operand" "Yr,*x,v,0,v")
          (parallel [(match_operand:SI 2 "const_0_to_3_operand")])))]
  "TARGET_SSE4_1"

where the latter pattern propagates into the former in place of operand 2. This
combination is created only for scalar insert of an extracted value, so I doubt
it is ever created...
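
For readers less familiar with the RTL involved, the combination described
above can be modeled in scalar C. This is only a sketch of the semantics, not
GCC code; vec_merge_model and insert_extracted are hypothetical names. It
relies on vec_merge taking lane i from its first operand when bit i of the
mask is set and from the second operand otherwise, with a single-bit mask, as
the exact_log2 condition in the pattern requires.

```c
/* Scalar model of RTL (vec_merge x y mask) on four lanes.  */
void vec_merge_model(const float x[4], const float y[4], unsigned mask,
                     float out[4])
{
    for (unsigned i = 0; i < 4; i++)
        out[i] = (mask & (1u << i)) ? x[i] : y[i];
}

/* vec_select extracts one lane, vec_duplicate broadcasts it, and
   vec_merge with a one-bit mask inserts it into lane "to" of dst.  */
void insert_extracted(const float src[4], unsigned from,
                      const float dst[4], unsigned to, float out[4])
{
    float dup[4];
    float s = src[from];              /* vec_select */
    for (unsigned i = 0; i < 4; i++)  /* vec_duplicate */
        dup[i] = s;
    vec_merge_model(dup, dst, 1u << to, out);
}
```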


* [Bug target/94908] Failure to optimally optimize certain shuffle patterns
  2020-05-01 19:02 [Bug tree-optimization/94908] New: Failure to optimally optimize certain shuffle patterns gabravier at gmail dot com
                   ` (8 preceding siblings ...)
  2023-03-09 14:27 ` ubizjak at gmail dot com
@ 2023-03-09 14:32 ` ubizjak at gmail dot com
  2023-04-18 16:59 ` cvs-commit at gcc dot gnu.org
  2023-04-18 17:01 ` ubizjak at gmail dot com
  11 siblings, 0 replies; 13+ messages in thread
From: ubizjak at gmail dot com @ 2023-03-09 14:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #54607|0                           |1
        is obsolete|                            |

--- Comment #10 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 54624
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54624&action=edit
Proposed patch v2

New version with some code shamelessly stolen from aarch64.


* [Bug target/94908] Failure to optimally optimize certain shuffle patterns
  2020-05-01 19:02 [Bug tree-optimization/94908] New: Failure to optimally optimize certain shuffle patterns gabravier at gmail dot com
                   ` (9 preceding siblings ...)
  2023-03-09 14:32 ` ubizjak at gmail dot com
@ 2023-04-18 16:59 ` cvs-commit at gcc dot gnu.org
  2023-04-18 17:01 ` ubizjak at gmail dot com
  11 siblings, 0 replies; 13+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-04-18 16:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908

--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:95b99e47f4f2df2d0c5680f45e3ec0a3170218ad

commit r14-47-g95b99e47f4f2df2d0c5680f45e3ec0a3170218ad
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Tue Apr 18 17:50:37 2023 +0200

    i386: Improve permutations with INSERTPS instruction [PR94908]

    INSERTPS can select any element from src and insert into any place
    of the dest.  For SSE4.1 targets, compiler can generate e.g.

            insertps $64, %xmm0, %xmm1

    to insert element 1 of %xmm0 into element 0 of %xmm1.

    gcc/ChangeLog:

            PR target/94908
            * config/i386/i386-builtin.def (__builtin_ia32_insertps128):
            Use CODE_FOR_sse4_1_insertps_v4sf.
            * config/i386/i386-expand.cc (expand_vec_perm_insertps): New.
            (expand_vec_perm_1): Call expand_vec_perm_insertps.
            * config/i386/i386.md ("unspec"): Declare UNSPEC_INSERTPS here.
            * config/i386/mmx.md (mmxscalarmode): New mode attribute.
            (@sse4_1_insertps_<mode>): New insn pattern.
            * config/i386/sse.md (@sse4_1_insertps_<mode>): Macroize insn
            pattern from sse4_1_insertps using VI4F_128 mode iterator.

    gcc/testsuite/ChangeLog:

            PR target/94908
            * gcc.target/i386/pr94908.c: New test.
            * gcc.target/i386/sse4_1-insertps-5.c: New test.
            * gcc.target/i386/vperm-v4sf-2-sse4.c: New test.


* [Bug target/94908] Failure to optimally optimize certain shuffle patterns
  2020-05-01 19:02 [Bug tree-optimization/94908] New: Failure to optimally optimize certain shuffle patterns gabravier at gmail dot com
                   ` (10 preceding siblings ...)
  2023-04-18 16:59 ` cvs-commit at gcc dot gnu.org
@ 2023-04-18 17:01 ` ubizjak at gmail dot com
  11 siblings, 0 replies; 13+ messages in thread
From: ubizjak at gmail dot com @ 2023-04-18 17:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908

--- Comment #12 from Uroš Bizjak <ubizjak at gmail dot com> ---
Implemented also for x86.


Thread overview: 13+ messages
2020-05-01 19:02 [Bug tree-optimization/94908] New: Failure to optimally optimize certain shuffle patterns gabravier at gmail dot com
2020-05-01 21:24 ` [Bug tree-optimization/94908] " glisse at gcc dot gnu.org
2020-05-04  6:30 ` rguenth at gcc dot gnu.org
2023-02-17 20:49 ` gabravier at gmail dot com
2023-02-17 21:05 ` [Bug target/94908] " pinskia at gcc dot gnu.org
2023-02-18  9:35 ` ubizjak at gmail dot com
2023-02-20  3:32 ` crazylht at gmail dot com
2023-03-08 13:19 ` ubizjak at gmail dot com
2023-03-09  4:22 ` crazylht at gmail dot com
2023-03-09 14:27 ` ubizjak at gmail dot com
2023-03-09 14:32 ` ubizjak at gmail dot com
2023-04-18 16:59 ` cvs-commit at gcc dot gnu.org
2023-04-18 17:01 ` ubizjak at gmail dot com