[Bug target/94866] New: Failure to optimize pinsrq of 0 with index 1 into movq

* [Bug target/94866] New: Failure to optimize pinsrq of 0 with index 1 into movq
@ 2020-04-30  0:21 gabravier at gmail dot com
  2020-04-30  7:13 ` [Bug target/94866] " rguenth at gcc dot gnu.org
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: gabravier at gmail dot com @ 2020-04-30  0:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

            Bug ID: 94866
           Summary: Failure to optimize pinsrq of 0 with index 1 into movq
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <cstdint>

typedef int64_t v2di __attribute__((vector_size(16)));
typedef int32_t v2si __attribute__((vector_size(8)));

v2di _mm_move_epi64(v2di a)
{
    return v2di{a[0], 0LL};
}

LLVM with `-O3 -msse4.1` compiles this to:

_mm_move_epi64(long __vector(2)): # @_mm_move_epi64(long __vector(2))
  movq xmm0, xmm0 # xmm0 = xmm0[0],zero
  ret

GCC gives:

_mm_move_epi64(long __vector(2)):
  xor eax, eax
  pinsrq xmm0, rax, 1
  ret

GCC's output looks like it should be slower: it needs an xor to zero a
general-purpose register plus a pinsrq, where LLVM uses a single movq. Unless
there is something seriously messed up with x86 chips that I've missed, LLVM's
version should be faster.

* [Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
  2020-04-30  0:21 [Bug target/94866] New: Failure to optimize pinsrq of 0 with index 1 into movq gabravier at gmail dot com
@ 2020-04-30  7:13 ` rguenth at gcc dot gnu.org
  2023-08-22  9:35 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-04-30  7:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-04-30
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
         Depends on|                            |94864

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
We're expanding from

  _3 = BIT_INSERT_EXPR <a_1(D), 0, 64 (64 bits)>;
  return _3;

which ends up using

(insn 8 7 9 (set (reg:V2DI 85)
        (vec_merge:V2DI (vec_duplicate:V2DI (reg:DI 86))
            (reg:V2DI 85)
            (const_int 2 [0x2]))) "y.c":6:28 -1

so likely the vec_merge "issue" again.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864
[Bug 94864] Failure to combine vunpckhpd+movsd into single vunpckhpd

* [Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
  2020-04-30  0:21 [Bug target/94866] New: Failure to optimize pinsrq of 0 with index 1 into movq gabravier at gmail dot com
  2020-04-30  7:13 ` [Bug target/94866] " rguenth at gcc dot gnu.org
@ 2023-08-22  9:35 ` rguenth at gcc dot gnu.org
  2023-08-22 12:43 ` ubizjak at gmail dot com
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-08-22  9:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866
Bug 94866 depends on bug 94864, which changed state.

Bug 94864 Summary: Failure to combine vunpckhpd+movsd into single vunpckhpd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

* [Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
  2020-04-30  0:21 [Bug target/94866] New: Failure to optimize pinsrq of 0 with index 1 into movq gabravier at gmail dot com
  2020-04-30  7:13 ` [Bug target/94866] " rguenth at gcc dot gnu.org
  2023-08-22  9:35 ` rguenth at gcc dot gnu.org
@ 2023-08-22 12:43 ` ubizjak at gmail dot com
  2023-08-23  1:12 ` crazylht at gmail dot com
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2023-08-22 12:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |ubizjak at gmail dot com

--- Comment #2 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 55776
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55776&action=edit
Proposed patch

Patch that introduces alternative MOVQ RTX definition.

* [Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
  2020-04-30  0:21 [Bug target/94866] New: Failure to optimize pinsrq of 0 with index 1 into movq gabravier at gmail dot com
                   ` (2 preceding siblings ...)
  2023-08-22 12:43 ` ubizjak at gmail dot com
@ 2023-08-23  1:12 ` crazylht at gmail dot com
  2023-08-23 11:21 ` ubizjak at gmail dot com
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: crazylht at gmail dot com @ 2023-08-23  1:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
In the x86 backend's expand_vec_perm_1, we always try vec_merge first for
!one_operand_p; expand_vselect_vconcat is only tried when vec_merge fails,
which means we had better use vec_merge instead of vec_select:vec_concat when
available in our backend pattern match.

Also, from the point of view of AVX-512 kmask instructions, using vec_merge
will help constant propagation.

  /* Try the SSE4.1 blend variable merge instructions.  */
  if (expand_vec_perm_blend (d))
    return true;

  /* Try movss/movsd instructions.  */
  if (expand_vec_perm_movs (d))
    return true;

* [Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
  2020-04-30  0:21 [Bug target/94866] New: Failure to optimize pinsrq of 0 with index 1 into movq gabravier at gmail dot com
                   ` (3 preceding siblings ...)
  2023-08-23  1:12 ` crazylht at gmail dot com
@ 2023-08-23 11:21 ` ubizjak at gmail dot com
  2023-08-23 11:23 ` ubizjak at gmail dot com
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2023-08-23 11:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

--- Comment #4 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #3)
> In the x86 backend's expand_vec_perm_1, we always try vec_merge first for
> !one_operand_p; expand_vselect_vconcat is only tried when vec_merge fails,
> which means we had better use vec_merge instead of vec_select:vec_concat
> when available in our backend pattern match.

In fact, I tried to convert existing sse2_movq128 patterns to vec_merge, but
the patch regressed:

-FAIL: gcc.target/i386/sse2-pr94680-2.c scan-assembler movq
-FAIL: gcc.target/i386/sse2-pr94680-2.c scan-assembler-not pxor
-FAIL: gcc.target/i386/sse2-pr94680.c scan-assembler-not pxor
-FAIL: gcc.target/i386/sse2-pr94680.c scan-assembler-times
(?n)(?:mov|psrldq).*%xmm[0-9] 12

So, the compiler still expects vec_concat/vec_select patterns to be present.

* [Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
  2020-04-30  0:21 [Bug target/94866] New: Failure to optimize pinsrq of 0 with index 1 into movq gabravier at gmail dot com
                   ` (4 preceding siblings ...)
  2023-08-23 11:21 ` ubizjak at gmail dot com
@ 2023-08-23 11:23 ` ubizjak at gmail dot com
  2023-08-23 13:39 ` crazylht at gmail dot com
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2023-08-23 11:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

--- Comment #5 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 55778
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55778&action=edit
Failing patch,  for reference

Patch that converts vec_concat/vec_select sse2_movq128 patterns to vec_merge.

* [Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
  2020-04-30  0:21 [Bug target/94866] New: Failure to optimize pinsrq of 0 with index 1 into movq gabravier at gmail dot com
                   ` (5 preceding siblings ...)
  2023-08-23 11:23 ` ubizjak at gmail dot com
@ 2023-08-23 13:39 ` crazylht at gmail dot com
  2023-08-23 14:54 ` ubizjak at gmail dot com
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: crazylht at gmail dot com @ 2023-08-23 13:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

--- Comment #6 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #4)
> (In reply to Hongtao.liu from comment #3)
> > In the x86 backend's expand_vec_perm_1, we always try vec_merge first for
> > !one_operand_p; expand_vselect_vconcat is only tried when vec_merge fails,
> > which means we had better use vec_merge instead of vec_select:vec_concat
> > when available in our backend pattern match.
> 
> In fact, I tried to convert existing sse2_movq128 patterns to vec_merge, but
> the patch regressed:
> 
> -FAIL: gcc.target/i386/sse2-pr94680-2.c scan-assembler movq
> -FAIL: gcc.target/i386/sse2-pr94680-2.c scan-assembler-not pxor
> -FAIL: gcc.target/i386/sse2-pr94680.c scan-assembler-not pxor
> -FAIL: gcc.target/i386/sse2-pr94680.c scan-assembler-times
> (?n)(?:mov|psrldq).*%xmm[0-9] 12
> 
> So, the compiler still expects vec_concat/vec_select patterns to be present.


v2df foo_v2df (v2df x)
 {
   return __builtin_shuffle (x, (v2df) { 0, 0 }, (v2di) { 0, 2 });
 }

The testcase is not a typical vec_merge case; for vec_merge, the shuffle index
should be {0, 3}. Here it happens to be a vec_merge because the second vector
is all zero. And yes, for this case we still need the vec_concat:vec_select
pattern.

* [Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
  2020-04-30  0:21 [Bug target/94866] New: Failure to optimize pinsrq of 0 with index 1 into movq gabravier at gmail dot com
                   ` (6 preceding siblings ...)
  2023-08-23 13:39 ` crazylht at gmail dot com
@ 2023-08-23 14:54 ` ubizjak at gmail dot com
  2023-08-24  1:12 ` crazylht at gmail dot com
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2023-08-23 14:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

--- Comment #7 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #6) 
> > So, the compiler still expects vec_concat/vec_select patterns to be present.
> 
> v2df foo_v2df (v2df x)
>  {
>    return __builtin_shuffle (x, (v2df) { 0, 0 }, (v2di) { 0, 2 });
>  }
> 
> The testcase is not a typical vec_merge case; for vec_merge, the shuffle
> index should be {0, 3}. Here it happens to be a vec_merge because the
> second vector is all zero. And yes, for this case we still need the
> vec_concat:vec_select pattern.

I guess the original patch is the way to go then.

* [Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
  2020-04-30  0:21 [Bug target/94866] New: Failure to optimize pinsrq of 0 with index 1 into movq gabravier at gmail dot com
                   ` (7 preceding siblings ...)
  2023-08-23 14:54 ` ubizjak at gmail dot com
@ 2023-08-24  1:12 ` crazylht at gmail dot com
  2023-08-24 20:25 ` cvs-commit at gcc dot gnu.org
  2023-08-24 20:27 ` ubizjak at gmail dot com
  10 siblings, 0 replies; 12+ messages in thread
From: crazylht at gmail dot com @ 2023-08-24  1:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

--- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #7)
> (In reply to Hongtao.liu from comment #6) 
> > > So, the compiler still expects vec_concat/vec_select patterns to be present.
> > 
> > v2df foo_v2df (v2df x)
> >  {
> >    return __builtin_shuffle (x, (v2df) { 0, 0 }, (v2di) { 0, 2 });
> >  }
> > 
> > The testcase is not a typical vec_merge case; for vec_merge, the shuffle
> > index should be {0, 3}. Here it happens to be a vec_merge because the
> > second vector is all zero. And yes, for this case we still need the
> > vec_concat:vec_select pattern.
> 
> I guess the original patch is the way to go then.

Yes.

* [Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
  2020-04-30  0:21 [Bug target/94866] New: Failure to optimize pinsrq of 0 with index 1 into movq gabravier at gmail dot com
                   ` (8 preceding siblings ...)
  2023-08-24  1:12 ` crazylht at gmail dot com
@ 2023-08-24 20:25 ` cvs-commit at gcc dot gnu.org
  2023-08-24 20:27 ` ubizjak at gmail dot com
  10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-08-24 20:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:6dd73f0f00f454a05552b008a1d56560bd3f1d4a

commit r14-3471-g6dd73f0f00f454a05552b008a1d56560bd3f1d4a
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Thu Aug 24 22:23:52 2023 +0200

    i386: Optimize pinsrq of 0 with index 1 into movq [PR94866]

    Add new pattern involving vec_merge RTX that is produced by combine from
    the combination of sse4_1_pinsrq and *movdi_internal:

        7: r86:DI=0
        8: r85:V2DI=vec_merge(vec_duplicate(r86:DI),r87:V2DI,0x2)
          REG_DEAD r87:V2DI
          REG_DEAD r86:DI
    Successfully matched this instruction:
    (set (reg:V2DI 85 [ a ])
        (vec_merge:V2DI (reg:V2DI 87)
            (const_vector:V2DI [
                    (const_int 0 [0]) repeated x2
                ])
            (const_int 1 [0x1])))

            PR target/94866

    gcc/ChangeLog:

            * config/i386/sse.md (*sse2_movq128_<mode>_1): New insn pattern.

    gcc/testsuite/ChangeLog:

            * g++.target/i386/pr94866.C: New test.

* [Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
  2020-04-30  0:21 [Bug target/94866] New: Failure to optimize pinsrq of 0 with index 1 into movq gabravier at gmail dot com
                   ` (9 preceding siblings ...)
  2023-08-24 20:25 ` cvs-commit at gcc dot gnu.org
@ 2023-08-24 20:27 ` ubizjak at gmail dot com
  10 siblings, 0 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2023-08-24 20:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #10 from Uroš Bizjak <ubizjak at gmail dot com> ---
Implemented for gcc-14.
