* [Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand
2020-09-02 20:22 [Bug target/96906] New: Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand gabravier at gmail dot com
@ 2020-09-03 3:06 ` crazylht at gmail dot com
2020-09-03 3:29 ` crazylht at gmail dot com
` (8 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2020-09-03 3:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906
Hongtao.liu <crazylht at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |crazylht at gmail dot com
--- Comment #1 from Hongtao.liu <crazylht at gmail dot com> ---
For vector compare to integer mask, gcc use UNSPEC, that's why it's not handled
by general part, maybe peephole2 should be added for this.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand
2020-09-02 20:22 [Bug target/96906] New: Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand gabravier at gmail dot com
2020-09-03 3:06 ` [Bug target/96906] " crazylht at gmail dot com
@ 2020-09-03 3:29 ` crazylht at gmail dot com
2020-09-03 3:54 ` crazylht at gmail dot com
` (7 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2020-09-03 3:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906
--- Comment #2 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #1)
> For vector compare to integer mask, gcc use UNSPEC, that's why it's not
> handled by general part, maybe peephole2 should be added for this.
Or in ix86_expand_int_vcond
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand
2020-09-02 20:22 [Bug target/96906] New: Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand gabravier at gmail dot com
2020-09-03 3:06 ` [Bug target/96906] " crazylht at gmail dot com
2020-09-03 3:29 ` crazylht at gmail dot com
@ 2020-09-03 3:54 ` crazylht at gmail dot com
2020-11-24 17:20 ` jakub at gcc dot gnu.org
` (6 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2020-09-03 3:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
(gdb) f 6
#6 0x0000000000f04a33 in expand_vect_cond_optab_fn (stmt=0x7fffea1193f0,
optab=vcond_optab) at ../../../gcc/gnu-toolchain/master/gcc/internal-fn.c:2612
2612 expand_insn (icode, 6, ops);
(gdb) p icode
$22 = CODE_FOR_vcondv8hiv8hi
Shouldn't CODE_FOR_vconduv8hiv8hi be used? then sign info could be handled by
extra param in ix86_expand_int_vcond.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand
2020-09-02 20:22 [Bug target/96906] New: Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand gabravier at gmail dot com
` (2 preceding siblings ...)
2020-09-03 3:54 ` crazylht at gmail dot com
@ 2020-11-24 17:20 ` jakub at gcc dot gnu.org
2020-11-26 7:48 ` cvs-commit at gcc dot gnu.org
` (5 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-11-24 17:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 49621
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49621&action=edit
gcc11-pr96906.patch
Looking over Agner Fog's table, pminus[bw] and psubus[bw] seems to have the
same timing. This untested patch does the optimization in the combiner for
SSE2/SSE4.1/AVX2, but handling AVX512BW and AVX512BW+AVX512VL will need further
define_insn_and_split patterns I don't have cycles for right now (match the
unspec comparisons in there).
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand
2020-09-02 20:22 [Bug target/96906] New: Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand gabravier at gmail dot com
` (3 preceding siblings ...)
2020-11-24 17:20 ` jakub at gcc dot gnu.org
@ 2020-11-26 7:48 ` cvs-commit at gcc dot gnu.org
2020-11-26 8:58 ` jakub at gcc dot gnu.org
` (4 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-11-26 7:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906
--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:32b0abb24b8702ec9954448739682ace6fa5ccf5
commit r11-5398-g32b0abb24b8702ec9954448739682ace6fa5ccf5
Author: Jakub Jelinek <jakub@redhat.com>
Date: Thu Nov 26 08:44:15 2020 +0100
i386: Optimize psubusw compared to 0 into pminuw compared to op0 [PR96906]
The following patch renames VI12_AVX2 iterator to VI12_AVX2_AVX512BW
for consistency with some other iterators, as I need VI12_AVX2 without
AVX512BW for this change.
The real meat is a combiner split which combine
can use to optimize psubusw compared to 0 into pminuw compared to op0
(and similarly for psubusb compared to 0 into pminub compared to op0).
According to Agner Fog's tables, psubus[bw] and pminu[bw] timings
are the same, but the advantage of pminu[bw] is that the comparison
doesn't need a zero operand, so e.g. for -msse4.1 it causes changes like
- psubusw %xmm1, %xmm0
- pxor %xmm1, %xmm1
+ pminuw %xmm0, %xmm1
pcmpeqw %xmm1, %xmm0
and similarly for avx2:
- vpsubusb %ymm1, %ymm0, %ymm0
- vpxor %xmm1, %xmm1, %xmm1
- vpcmpeqb %ymm1, %ymm0, %ymm0
+ vpminub %ymm1, %ymm0, %ymm1
+ vpcmpeqb %ymm0, %ymm1, %ymm0
I haven't done the AVX512{BW,VL} define_split(s), they'll need
to match the UNSPEC_PCMP which are used for avx512 comparisons.
2020-11-26 Jakub Jelinek <jakub@redhat.com>
PR target/96906
* config/i386/sse.md (VI12_AVX2): Remove V64QI/V32HI modes.
(VI12_AVX2_AVX512BW): New mode iterator.
(<sse2_avx2>_<plusminus_insn><mode>3<mask_name>,
uavg<mode>3_ceil, <sse2_avx2>_uavg<mode>3<mask_name>): Use
VI12_AVX2_AVX512BW iterator instead of VI12_AVX2.
(*<sse2_avx2>_<plusminus_insn><mode>3<mask_name>): Likewise.
(*<sse2_avx2>_uavg<mode>3<mask_name>): Likewise.
(*<sse2_avx2>_<plusminus_insn><mode>3<mask_name>): Add a new
define_split after this insn.
* gcc.target/i386/pr96906-1.c: New test.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand
2020-09-02 20:22 [Bug target/96906] New: Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand gabravier at gmail dot com
` (4 preceding siblings ...)
2020-11-26 7:48 ` cvs-commit at gcc dot gnu.org
@ 2020-11-26 8:58 ` jakub at gcc dot gnu.org
2020-11-26 9:03 ` crazylht at gmail dot com
` (3 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-11-26 8:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906
--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Implemented now for non-AVX512*. Hongtao, do you think you could have a look
at the avx512{bw,vl}/avx512bw splitter(s)?
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand
2020-09-02 20:22 [Bug target/96906] New: Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand gabravier at gmail dot com
` (5 preceding siblings ...)
2020-11-26 8:58 ` jakub at gcc dot gnu.org
@ 2020-11-26 9:03 ` crazylht at gmail dot com
2020-12-03 5:45 ` cvs-commit at gcc dot gnu.org
` (2 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2020-11-26 9:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906
--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Jakub Jelinek from comment #6)
> Implemented now for non-AVX512*. Hongtao, do you think you could have a
> look at the avx512{bw,vl}/avx512bw splitter(s)?
Yes, i'll do it. Thanks for the patch.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand
2020-09-02 20:22 [Bug target/96906] New: Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand gabravier at gmail dot com
` (6 preceding siblings ...)
2020-11-26 9:03 ` crazylht at gmail dot com
@ 2020-12-03 5:45 ` cvs-commit at gcc dot gnu.org
2020-12-03 5:46 ` crazylht at gmail dot com
2021-01-12 15:21 ` jakub at gcc dot gnu.org
9 siblings, 0 replies; 11+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-12-03 5:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906
--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
https://gcc.gnu.org/g:70310982492071f98eacdac0747521769b0f0328
commit r11-5697-g70310982492071f98eacdac0747521769b0f0328
Author: liuhongt <hongtao.liu@intel.com>
Date: Mon Nov 30 13:27:16 2020 +0800
Optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnleuw [PR96906]
For signed comparisons, it handles cases that are eq or neq to 0.
For unsigned comparisons, it additionaly handles cases that are le or
gt to 0(equivilent to eq or neq to 0). Transform case eq to leu,
case neq to gtu.
.i.e. for -mavx512bw -mavx512vl transform eq case code from
vpsubusw %xmm1, %xmm0, %xmm0
vpxor %xmm1, %xmm1, %xmm1
vpcmpeqw %xmm1, %xmm0, %k0
to
vpcmpleuw %xmm1, %xmm0, %k0
.i.e. for -mavx512bw -mavx512vl transform neq case code from
vpsubusw %xmm1, %xmm0, %xmm0
vpxor %xmm1, %xmm1, %xmm1
vpcmpneqw %xmm1, %xmm0, %k0
to
vpcmpnleuw %xmm1, %xmm0, %k0
gcc/ChangeLog
PR target/96906
* config/i386/sse.md
(<avx512>_ucmp<mode>3<mask_scalar_merge_name>): Add a new
define_split after this insn.
gcc/testsuite/ChangeLog
* gcc.target/i386/avx512bw-pr96906-1.c: New test.
* gcc.target/i386/pr96906-1.c: Add -mno-avx512f.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand
2020-09-02 20:22 [Bug target/96906] New: Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand gabravier at gmail dot com
` (7 preceding siblings ...)
2020-12-03 5:45 ` cvs-commit at gcc dot gnu.org
@ 2020-12-03 5:46 ` crazylht at gmail dot com
2021-01-12 15:21 ` jakub at gcc dot gnu.org
9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2020-12-03 5:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906
--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
Fixed in GCC11.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/96906] Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand
2020-09-02 20:22 [Bug target/96906] New: Failure to optimize __builtin_ia32_psubusw128 compared to 0 to __builtin_ia32_pminuw128 compared to operand gabravier at gmail dot com
` (8 preceding siblings ...)
2020-12-03 5:46 ` crazylht at gmail dot com
@ 2021-01-12 15:21 ` jakub at gcc dot gnu.org
9 siblings, 0 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-01-12 15:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96906
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|UNCONFIRMED |RESOLVED
--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
.
^ permalink raw reply [flat|nested] 11+ messages in thread