[Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily

public inbox for gcc-bugs@sourceware.org

* [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily
@ 2021-07-24  8:17 glisse at gcc dot gnu.org
  2021-07-24 14:45 ` [Bug target/101611] " hjl.tools at gmail dot com
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: glisse at gcc dot gnu.org @ 2021-07-24  8:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611

            Bug ID: 101611
           Summary: AVX2 vector arithmetic shift lowered to scalar
                    unnecessarily
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---
            Target: x86_64-*-*

Stealing the example from PR 56873

#define SIZE 32
typedef long long veci __attribute__((vector_size(SIZE)));

veci f(veci a, veci b){
  return a>>b;
}

but compiling with -O3 -mavx2 this time, gcc produces scalar code

        vmovq   %xmm1, %rcx
        vmovq   %xmm0, %rax
        vpextrq $1, %xmm0, %rsi
        sarq    %cl, %rax
        vextracti128    $0x1, %ymm0, %xmm0
        vpextrq $1, %xmm1, %rcx
        vextracti128    $0x1, %ymm1, %xmm1
        movq    %rax, %rdx
        sarq    %cl, %rsi
        vmovq   %xmm0, %rax
        vmovq   %xmm1, %rcx
        vmovq   %rdx, %xmm5
        sarq    %cl, %rax
        vpextrq $1, %xmm1, %rcx
        movq    %rax, %rdi
        vpextrq $1, %xmm0, %rax
        vpinsrq $1, %rsi, %xmm5, %xmm0
        sarq    %cl, %rax
        vmovq   %rdi, %xmm4
        vpinsrq $1, %rax, %xmm4, %xmm1
        vinserti128     $0x1, %xmm1, %ymm0, %ymm0
        ret

while clang outputs much shorter vector code

        vpbroadcastq    .LCPI0_0(%rip), %ymm2   # ymm2 = [9223372036854775808,9223372036854775808,9223372036854775808,9223372036854775808]
        vpsrlvq %ymm1, %ymm2, %ymm2
        vpsrlvq %ymm1, %ymm0, %ymm0
        vpxor   %ymm2, %ymm0, %ymm0
        vpsubq  %ymm2, %ymm0, %ymm0
        retq
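
For illustration, the clang sequence above can be written in the same GNU C
vector-extension style as the test case. This is only a sketch, not code taken
from GCC or clang: the names vecu and sar_via_logical are made up here, and
every lane of b is assumed to hold a shift count in the range 0..63.

```c
#define SIZE 32
typedef long long veci __attribute__((vector_size(SIZE)));
typedef unsigned long long vecu __attribute__((vector_size(SIZE)));

/* Arithmetic right shift built only from logical shifts, xor and
   subtraction.  Hypothetical helper; assumes 0 <= b[i] <= 63.  */
static inline veci sar_via_logical(veci a, veci b)
{
  const vecu sign = { 1ULL << 63, 1ULL << 63, 1ULL << 63, 1ULL << 63 };
  vecu cnt = (vecu) b;          /* reinterpret the counts as unsigned */
  vecu m = sign >> cnt;         /* sign mask shifted per lane (vpsrlvq) */
  vecu t = (vecu) a >> cnt;     /* logical shift of the input (vpsrlvq) */
  return (veci) ((t ^ m) - m);  /* xor, then subtract (vpxor, vpsubq) */
}
```

Whether a given compiler turns this source into the five-instruction sequence
depends on its version and flags; without -mavx2 the 32-byte vectors are split
into narrower operations, and GCC may warn about the vector ABI.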

* [Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily
  2021-07-24  8:17 [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily glisse at gcc dot gnu.org
@ 2021-07-24 14:45 ` hjl.tools at gmail dot com
  2021-07-24 15:43 ` jakub at gcc dot gnu.org
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: hjl.tools at gmail dot com @ 2021-07-24 14:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-07-24
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #1 from H.J. Lu <hjl.tools at gmail dot com> ---
This may be related to PR 101579.

* [Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily
  2021-07-24  8:17 [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily glisse at gcc dot gnu.org
  2021-07-24 14:45 ` [Bug target/101611] " hjl.tools at gmail dot com
@ 2021-07-24 15:43 ` jakub at gcc dot gnu.org
  2021-07-24 16:25 ` jakub at gcc dot gnu.org
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-07-24 15:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
For arithmetic V[24]DImode >> scalar we have PR98856 in GCC 12, but indeed, for
arithmetic V[24]DImode >> V[24]DImode the sequence
((x >> y) ^ (0x8000000000000000ULL >> y)) - (0x8000000000000000ULL >> y)
can be used, where every >> is a logical shift.
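
As a sketch of what that expression does for a single 64-bit element, here is
a scalar version in C. The function name sar64_via_logical is hypothetical,
the count is assumed to be in 0..63, and the final conversion back to a signed
type relies on the usual two's complement wrap-around (which GCC documents).

```c
#include <stdint.h>

/* Arithmetic right shift of a 64-bit value using only logical
   (unsigned) shifts, xor and subtraction.  Assumes 0 <= y <= 63.  */
static inline int64_t sar64_via_logical(int64_t x, unsigned y)
{
  const uint64_t sign = 0x8000000000000000ULL;
  uint64_t m = sign >> y;           /* where the sign bit lands after the shift */
  uint64_t t = (uint64_t) x >> y;   /* logical shift: zeros enter at the top */
  return (int64_t) ((t ^ m) - m);   /* sign-extend from the bit selected by m */
}
```

For example, with x = -8 and y = 1 the logical shift gives
0x7ffffffffffffffc, the xor clears bit 62, and subtracting
0x4000000000000000 wraps around to 0xfffffffffffffffc, which is -4.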

* [Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily
  2021-07-24  8:17 [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily glisse at gcc dot gnu.org
  2021-07-24 14:45 ` [Bug target/101611] " hjl.tools at gmail dot com
  2021-07-24 15:43 ` jakub at gcc dot gnu.org
@ 2021-07-24 16:25 ` jakub at gcc dot gnu.org
  2021-07-26 12:53 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-07-24 16:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
--- gcc/config/i386/sse.md.jj   2021-07-22 12:37:20.439532859 +0200
+++ gcc/config/i386/sse.md      2021-07-24 18:03:07.328126900 +0200
@@ -20499,13 +20499,34 @@ (define_expand "vlshr<mode>3"
          (match_operand:VI48_256 2 "nonimmediate_operand")))]
   "TARGET_AVX2")

-(define_expand "vashr<mode>3"
-  [(set (match_operand:VI8_256_512 0 "register_operand")
-       (ashiftrt:VI8_256_512
-         (match_operand:VI8_256_512 1 "register_operand")
-         (match_operand:VI8_256_512 2 "nonimmediate_operand")))]
+(define_expand "vashrv8di3"
+  [(set (match_operand:V8DI 0 "register_operand")
+       (ashiftrt:V8DI
+         (match_operand:V8DI 1 "register_operand")
+         (match_operand:V8DI 2 "nonimmediate_operand")))]
   "TARGET_AVX512F")

+(define_expand "vashrv4di3"
+  [(set (match_operand:V4DI 0 "register_operand")
+       (ashiftrt:V4DI
+         (match_operand:V4DI 1 "register_operand")
+         (match_operand:V4DI 2 "nonimmediate_operand")))]
+  "TARGET_AVX2"
+{
+  if (!TARGET_AVX512VL)
+    {
+      rtx mask = ix86_build_signbit_mask (V4DImode, 1, 0);
+      rtx t1 = gen_reg_rtx (V4DImode);
+      rtx t2 = gen_reg_rtx (V4DImode);
+      rtx t3 = gen_reg_rtx (V4DImode);
+      emit_insn (gen_vlshrv4di3 (t1, operands[1], operands[2]));
+      emit_insn (gen_vlshrv4di3 (t2, mask, operands[2]));
+      emit_insn (gen_xorv4di3 (t3, t1, t2));
+      emit_insn (gen_subv4di3 (operands[0], t3, t2));
+      DONE;
+    }
+})
+
 (define_expand "vashr<mode>3"
   [(set (match_operand:VI12_128 0 "register_operand")
        (ashiftrt:VI12_128
@@ -20527,12 +20548,12 @@ (define_expand "vashr<mode>3"
     }
 })

-(define_expand "vashrv2di3<mask_name>"
+(define_expand "vashrv2di3"
   [(set (match_operand:V2DI 0 "register_operand")
        (ashiftrt:V2DI
          (match_operand:V2DI 1 "register_operand")
          (match_operand:V2DI 2 "nonimmediate_operand")))]
-  "TARGET_XOP || TARGET_AVX512VL"
+  "TARGET_XOP || TARGET_AVX2"
 {
   if (TARGET_XOP)
     {
@@ -20541,6 +20562,18 @@ (define_expand "vashrv2di3<mask_name>"
       emit_insn (gen_xop_shav2di3 (operands[0], operands[1], neg));
       DONE;
     }
+  if (!TARGET_AVX512VL)
+    {
+      rtx mask = ix86_build_signbit_mask (V2DImode, 1, 0);
+      rtx t1 = gen_reg_rtx (V2DImode);
+      rtx t2 = gen_reg_rtx (V2DImode);
+      rtx t3 = gen_reg_rtx (V2DImode);
+      emit_insn (gen_vlshrv2di3 (t1, operands[1], operands[2]));
+      emit_insn (gen_vlshrv2di3 (t2, mask, operands[2]));
+      emit_insn (gen_xorv2di3 (t3, t1, t2));
+      emit_insn (gen_subv2di3 (operands[0], t3, t2));
+      DONE;
+    }
 })

 (define_expand "vashrv4si3"

* [Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily
  2021-07-24  8:17 [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily glisse at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2021-07-24 16:25 ` jakub at gcc dot gnu.org
@ 2021-07-26 12:53 ` jakub at gcc dot gnu.org
  2021-07-26 13:10 ` glisse at gcc dot gnu.org
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-07-26 12:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 51205
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51205&action=edit
gcc12-pr101611.patch

Full untested fix.

* [Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily
  2021-07-24  8:17 [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily glisse at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2021-07-26 12:53 ` jakub at gcc dot gnu.org
@ 2021-07-26 13:10 ` glisse at gcc dot gnu.org
  2021-07-26 13:26 ` jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: glisse at gcc dot gnu.org @ 2021-07-26 13:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611

--- Comment #5 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #2)
> for arithmetic V[24]DImode >> V[24]DImode
> logical ((x >> y) ^ (0x8000000000000000ULL >> y)) - (0x8000000000000000ULL
> >> y)
> can be used.

I guess it would be complicated to try and implement this fallback strategy in
a generic way so other modes/targets could benefit.

* [Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily
  2021-07-24  8:17 [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily glisse at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2021-07-26 13:10 ` glisse at gcc dot gnu.org
@ 2021-07-26 13:26 ` jakub at gcc dot gnu.org
  2021-07-26 14:07 ` glisse at gcc dot gnu.org
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-07-26 13:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I think that, except for x86, it is very unusual for a target to support
logical but not arithmetic vector right shifts; are you aware of any other
target that suffers from this?  Even vector by vector shifts are rare: if my
grep doesn't miss anything, only aarch64, arm, x86, mips, rs6000 and s390 have
them.  I've grepped tmp-mddump.md for each of them, and except for x86, where
we have the known issues, I only see some weird vlshrti3 pattern that doesn't
have a vashrti3 counterpart, but the vlshr<mode>3 and vashr<mode>3 optabs AFAIK
should be used solely for vector modes and nothing else.

* [Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily
  2021-07-24  8:17 [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily glisse at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2021-07-26 13:26 ` jakub at gcc dot gnu.org
@ 2021-07-26 14:07 ` glisse at gcc dot gnu.org
  2021-07-26 14:33 ` jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: glisse at gcc dot gnu.org @ 2021-07-26 14:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611

--- Comment #7 from Marc Glisse <glisse at gcc dot gnu.org> ---
The same strategy to implement arithmetic shift in terms of logical shift works
not just for vector>>vector but also vector>>scalar and scalar>>scalar. But it
is probably not worth the trouble indeed, especially since your target patch is
ready :-)
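
A sketch of the vector >> scalar case mentioned above, again in GNU C vector
extensions. The names vecu and sar_by_scalar are hypothetical and the count n
is assumed to be in 0..63. Because every lane shares the same count, the
shifted sign mask can be computed once as a scalar and then broadcast.

```c
typedef long long veci __attribute__((vector_size(32)));
typedef unsigned long long vecu __attribute__((vector_size(32)));

/* Arithmetic right shift of all lanes by the same count n, using only
   logical shifts, xor and subtraction.  Assumes 0 <= n <= 63.  */
static inline veci sar_by_scalar(veci a, unsigned n)
{
  unsigned long long m1 = 0x8000000000000000ULL >> n;  /* scalar mask */
  vecu m = { m1, m1, m1, m1 };                         /* broadcast mask */
  vecu cnt = { n, n, n, n };                           /* broadcast count */
  vecu t = (vecu) a >> cnt;                            /* logical shift */
  return (veci) ((t ^ m) - m);
}
```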

* [Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily
  2021-07-24  8:17 [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily glisse at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2021-07-26 14:07 ` glisse at gcc dot gnu.org
@ 2021-07-26 14:33 ` jakub at gcc dot gnu.org
  2021-07-28  8:53 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-07-26 14:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
That is true, but I think even for vector >> scalar and scalar >> scalar shifts
it will be quite rare to support logical and not support arithmetic shifts.
And on x86, as can be seen in the PR98856 changes, yes, this way of expressing
it is possible, but not always the shortest.

* [Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily
  2021-07-24  8:17 [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily glisse at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2021-07-26 14:33 ` jakub at gcc dot gnu.org
@ 2021-07-28  8:53 ` cvs-commit at gcc dot gnu.org
  2021-07-28  8:54 ` jakub at gcc dot gnu.org
  2021-08-04 22:01 ` pinskia at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-07-28  8:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611

--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:88d0f70a326eeb42b479aa537f8a81bf5a199346

commit r12-2557-g88d0f70a326eeb42b479aa537f8a81bf5a199346
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Jul 28 10:52:51 2021 +0200

    i386: Improve AVX2 expansion of vector >> vector DImode arithm. shifts [PR101611]

    AVX2 introduced vector >> vector shifts, but unfortunately for V{2,4}DImode
    it only supports logical and not arithmetic shifts; only AVX512F for
    V8DImode or AVX512VL for V{2,4}DImode fixed that omission.
    Earlier in the GCC 12 cycle I've committed vector >> scalar arithmetic
    shift emulation using various sequences; this patch handles the
    vector >> vector case.  No need to adjust costs, the previous cost
    adjustment actually covers even the vector by vector shifts.
    The patch emits the arithmetic right V{2,4}DImode shifts using 2 logical
    right V{2,4}DImode shifts (one of the original operand, one of the sign
    mask constant, both by the vector shift count), xor and subtraction.  On
    each element, (long long) x >> y is done as
    (((unsigned long long) x >> y) ^ (0x8000000000000000ULL >> y))
    - (0x8000000000000000ULL >> y)
    i.e. if x doesn't have the MSB set in some element, the result is just the
    logical shift; if it does, then the xor and subtraction cause all higher
    bits to be set as well.

    2021-07-28  Jakub Jelinek  <jakub@redhat.com>

            PR target/101611
            * config/i386/sse.md (vashr<mode>3): Split into vashrv8di3 expander
            and vashrv4di3 expander, where the latter requires just TARGET_AVX2
            and has special !TARGET_AVX512VL expansion.
            (vashrv2di3<mask_name>): Rename to ...
            (vashrv2di3): ... this.  Change condition to
            TARGET_XOP || TARGET_AVX2 and add special
            !TARGET_XOP && !TARGET_AVX512VL expansion.

            * gcc.target/i386/avx2-pr101611-1.c: New test.
            * gcc.target/i386/avx2-pr101611-2.c: New test.
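
The reasoning in the commit message above can be checked exhaustively on a
narrower type. The sketch below is not one of the tests added by the commit;
it is a hypothetical 8-bit analogue (sign mask 0x80, counts 0..7) that
compares the emulation against floor division by 2^y, which is what an
arithmetic right shift computes.

```c
#include <stdint.h>

/* Returns the number of (x, y) pairs for which the logical-shift
   emulation disagrees with floor(x / 2^y).  Hypothetical helper.  */
static int count_mismatches_8bit(void)
{
  int bad = 0;
  for (int x = -128; x <= 127; x++)
    for (unsigned y = 0; y < 8; y++)
      {
        uint8_t m = (uint8_t) (0x80u >> y);        /* shifted sign mask */
        uint8_t t = (uint8_t) ((uint8_t) x >> y);  /* logical shift */
        int8_t got = (int8_t) (uint8_t) ((t ^ m) - m);

        /* Reference value: division rounding toward minus infinity.  */
        int d = 1 << y;
        int ref = (x >= 0) ? x / d : -((-x + d - 1) / d);

        if (got != ref)
          bad++;
      }
  return bad;
}
```

The same argument carries over to 64-bit elements with the mask
0x8000000000000000ULL and counts 0..63.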

* [Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily
  2021-07-24  8:17 [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily glisse at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2021-07-28  8:53 ` cvs-commit at gcc dot gnu.org
@ 2021-07-28  8:54 ` jakub at gcc dot gnu.org
  2021-08-04 22:01 ` pinskia at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-07-28  8:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed.

* [Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily
  2021-07-24  8:17 [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily glisse at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2021-07-28  8:54 ` jakub at gcc dot gnu.org
@ 2021-08-04 22:01 ` pinskia at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-04 22:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.0
