public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/96906] New: Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand
@ 2020-09-02 20:22 gabravier at gmail dot com
  2020-09-03  3:06 ` [Bug target/96906] " crazylht at gmail dot com
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: gabravier at gmail dot com @ 2020-09-02 20:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906

            Bug ID: 96906
           Summary: Failure to optimize __builtin_ia32_psubusw128 compared
                    to 0 to __builtin_ia32_pminuw128 compared to operand
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

typedef int16_t v8i16 __attribute__((vector_size(16)));

v8i16 cmple_epu16(v8i16 x, v8i16 y)
{
        return __builtin_ia32_psubusw128(x, y) == 0;
}

With -msse4.1, this can be optimized to `return __builtin_ia32_pminuw128(x, y)
== x;`. This transformation is done by LLVM, but not by GCC. 

PS: I'm not 100% sure this is faster but it logically should be, since the
`pminuw` version doesn't have to handle zeroing an SSE register.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand
  2020-09-02 20:22 [Bug target/96906] New: Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand gabravier at gmail dot com
@ 2020-09-03  3:06 ` crazylht at gmail dot com
  2020-09-03  3:29 ` crazylht at gmail dot com
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2020-09-03  3:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906

Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #1 from Hongtao.liu <crazylht at gmail dot com> ---
For vector compare to integer mask, gcc use UNSPEC, that's why it's not handled
by general part, maybe peephole2 should be added for this.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand
  2020-09-02 20:22 [Bug target/96906] New: Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand gabravier at gmail dot com
  2020-09-03  3:06 ` [Bug target/96906] " crazylht at gmail dot com
@ 2020-09-03  3:29 ` crazylht at gmail dot com
  2020-09-03  3:54 ` crazylht at gmail dot com
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2020-09-03  3:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906

--- Comment #2 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #1)
> For vector compare to integer mask, gcc use UNSPEC, that's why it's not
> handled by general part, maybe peephole2 should be added for this.

Or in ix86_expand_int_vcond

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand
  2020-09-02 20:22 [Bug target/96906] New: Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand gabravier at gmail dot com
  2020-09-03  3:06 ` [Bug target/96906] " crazylht at gmail dot com
  2020-09-03  3:29 ` crazylht at gmail dot com
@ 2020-09-03  3:54 ` crazylht at gmail dot com
  2020-11-24 17:20 ` jakub at gcc dot gnu.org
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2020-09-03  3:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
(gdb) f 6 
#6  0x0000000000f04a33 in expand_vect_cond_optab_fn (stmt=0x7fffea1193f0,
optab=vcond_optab) at ../../../gcc/gnu-toolchain/master/gcc/internal-fn.c:2612
2612      expand_insn (icode, 6, ops);
(gdb) p icode
$22 = CODE_FOR_vcondv8hiv8hi

Shouldn't CODE_FOR_vconduv8hiv8hi be used? then sign info could be handled by
extra param in ix86_expand_int_vcond.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand
  2020-09-02 20:22 [Bug target/96906] New: Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand gabravier at gmail dot com
                   ` (2 preceding siblings ...)
  2020-09-03  3:54 ` crazylht at gmail dot com
@ 2020-11-24 17:20 ` jakub at gcc dot gnu.org
  2020-11-26  7:48 ` cvs-commit at gcc dot gnu.org
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-11-24 17:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 49621
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49621&action=edit
gcc11-pr96906.patch

Looking over Agner Fog's table, pminus[bw] and psubus[bw] seems to have the
same timing.  This untested patch does the optimization in the combiner for
SSE2/SSE4.1/AVX2, but handling AVX512BW and AVX512BW+AVX512VL will need further
define_insn_and_split patterns I don't have cycles for right now (match the
unspec comparisons in there).

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand
  2020-09-02 20:22 [Bug target/96906] New: Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand gabravier at gmail dot com
                   ` (3 preceding siblings ...)
  2020-11-24 17:20 ` jakub at gcc dot gnu.org
@ 2020-11-26  7:48 ` cvs-commit at gcc dot gnu.org
  2020-11-26  8:58 ` jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-11-26  7:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:32b0abb24b8702ec9954448739682ace6fa5ccf5

commit r11-5398-g32b0abb24b8702ec9954448739682ace6fa5ccf5
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Nov 26 08:44:15 2020 +0100

    i386: Optimize psubusw compared to 0 into pminuw compared to op0 [PR96906]

    The following patch renames VI12_AVX2 iterator to VI12_AVX2_AVX512BW
    for consistency with some other iterators, as I need VI12_AVX2 without
    AVX512BW for this change.
    The real meat is a combiner split which combine
    can use to optimize psubusw compared to 0 into pminuw compared to op0
    (and similarly for psubusb compared to 0 into pminub compared to op0).
    According to Agner Fog's tables, psubus[bw] and pminu[bw] timings
    are the same, but the advantage of pminu[bw] is that the comparison
    doesn't need a zero operand, so e.g. for -msse4.1 it causes changes like
    -       psubusw %xmm1, %xmm0
    -       pxor    %xmm1, %xmm1
    +       pminuw  %xmm0, %xmm1
            pcmpeqw %xmm1, %xmm0
    and similarly for avx2:
    -       vpsubusb        %ymm1, %ymm0, %ymm0
    -       vpxor   %xmm1, %xmm1, %xmm1
    -       vpcmpeqb        %ymm1, %ymm0, %ymm0
    +       vpminub %ymm1, %ymm0, %ymm1
    +       vpcmpeqb        %ymm0, %ymm1, %ymm0

    I haven't done the AVX512{BW,VL} define_split(s), they'll need
    to match the UNSPEC_PCMP which are used for avx512 comparisons.

    2020-11-26  Jakub Jelinek  <jakub@redhat.com>

            PR target/96906
            * config/i386/sse.md (VI12_AVX2): Remove V64QI/V32HI modes.
            (VI12_AVX2_AVX512BW): New mode iterator.
            (<sse2_avx2>_<plusminus_insn><mode>3<mask_name>,
            uavg<mode>3_ceil, <sse2_avx2>_uavg<mode>3<mask_name>): Use
            VI12_AVX2_AVX512BW iterator instead of VI12_AVX2.
            (*<sse2_avx2>_<plusminus_insn><mode>3<mask_name>): Likewise.
            (*<sse2_avx2>_uavg<mode>3<mask_name>): Likewise.
            (*<sse2_avx2>_<plusminus_insn><mode>3<mask_name>): Add a new
            define_split after this insn.

            * gcc.target/i386/pr96906-1.c: New test.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand
  2020-09-02 20:22 [Bug target/96906] New: Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand gabravier at gmail dot com
                   ` (4 preceding siblings ...)
  2020-11-26  7:48 ` cvs-commit at gcc dot gnu.org
@ 2020-11-26  8:58 ` jakub at gcc dot gnu.org
  2020-11-26  9:03 ` crazylht at gmail dot com
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-11-26  8:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Implemented now for non-AVX512*.  Hongtao, do you think you could have a look
at the avx512{bw,vl}/avx512bw splitter(s)?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand
  2020-09-02 20:22 [Bug target/96906] New: Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand gabravier at gmail dot com
                   ` (5 preceding siblings ...)
  2020-11-26  8:58 ` jakub at gcc dot gnu.org
@ 2020-11-26  9:03 ` crazylht at gmail dot com
  2020-12-03  5:45 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2020-11-26  9:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906

--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Jakub Jelinek from comment #6)
> Implemented now for non-AVX512*.  Hongtao, do you think you could have a
> look at the avx512{bw,vl}/avx512bw splitter(s)?

Yes, i'll do it. Thanks for the patch.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand
  2020-09-02 20:22 [Bug target/96906] New: Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand gabravier at gmail dot com
                   ` (6 preceding siblings ...)
  2020-11-26  9:03 ` crazylht at gmail dot com
@ 2020-12-03  5:45 ` cvs-commit at gcc dot gnu.org
  2020-12-03  5:46 ` crazylht at gmail dot com
  2021-01-12 15:21 ` jakub at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-12-03  5:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906

--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:70310982492071f98eacdac0747521769b0f0328

commit r11-5697-g70310982492071f98eacdac0747521769b0f0328
Author: liuhongt <hongtao.liu@intel.com>
Date:   Mon Nov 30 13:27:16 2020 +0800

    Optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnleuw [PR96906]

    For signed comparisons, it handles cases that are eq or neq to 0.
    For unsigned comparisons, it additionaly handles cases that are le or
    gt to 0(equivilent to eq or neq to 0). Transform case eq to leu,
    case neq to gtu.

    .i.e. for -mavx512bw -mavx512vl transform eq case code from

            vpsubusw        %xmm1, %xmm0, %xmm0
            vpxor   %xmm1, %xmm1, %xmm1
            vpcmpeqw  %xmm1, %xmm0, %k0
    to
            vpcmpleuw       %xmm1, %xmm0, %k0

    .i.e. for -mavx512bw -mavx512vl transform neq case code from

            vpsubusw        %xmm1, %xmm0, %xmm0
            vpxor   %xmm1, %xmm1, %xmm1
            vpcmpneqw  %xmm1, %xmm0, %k0
    to
            vpcmpnleuw       %xmm1, %xmm0, %k0

    gcc/ChangeLog
            PR target/96906
            * config/i386/sse.md
            (<avx512>_ucmp<mode>3<mask_scalar_merge_name>): Add a new
            define_split after this insn.

    gcc/testsuite/ChangeLog

            * gcc.target/i386/avx512bw-pr96906-1.c: New test.
            * gcc.target/i386/pr96906-1.c: Add -mno-avx512f.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand
  2020-09-02 20:22 [Bug target/96906] New: Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand gabravier at gmail dot com
                   ` (7 preceding siblings ...)
  2020-12-03  5:45 ` cvs-commit at gcc dot gnu.org
@ 2020-12-03  5:46 ` crazylht at gmail dot com
  2021-01-12 15:21 ` jakub at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2020-12-03  5:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906

--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
Fixed in GCC11.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand
  2020-09-02 20:22 [Bug target/96906] New: Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand gabravier at gmail dot com
                   ` (8 preceding siblings ...)
  2020-12-03  5:46 ` crazylht at gmail dot com
@ 2021-01-12 15:21 ` jakub at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-01-12 15:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-01-12 15:21 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-02 20:22 [Bug target/96906] New: Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand gabravier at gmail dot com
2020-09-03  3:06 ` [Bug target/96906] " crazylht at gmail dot com
2020-09-03  3:29 ` crazylht at gmail dot com
2020-09-03  3:54 ` crazylht at gmail dot com
2020-11-24 17:20 ` jakub at gcc dot gnu.org
2020-11-26  7:48 ` cvs-commit at gcc dot gnu.org
2020-11-26  8:58 ` jakub at gcc dot gnu.org
2020-11-26  9:03 ` crazylht at gmail dot com
2020-12-03  5:45 ` cvs-commit at gcc dot gnu.org
2020-12-03  5:46 ` crazylht at gmail dot com
2021-01-12 15:21 ` jakub at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).