public inbox for gcc-bugs@sourceware.org
* [Bug target/104371] New: [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
From: gabravier at gmail dot com @ 2022-02-03 18:08 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371
Bug ID: 104371
Summary: [x86] Failure to use optimize
pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
#include <immintrin.h>

bool is_zero(__m128i x)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(x, _mm_setzero_si128())) == 0xffff;
}
This can be optimized to `return _mm_testz_si128(x, x);` (ptest, which requires
SSE4.1). LLVM performs this optimization, but GCC does not.
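For comparison, here is a minimal C sketch of the SSE2 idiom next to the SSE4.1 ptest form. It assumes an x86-64 target built with GCC or Clang; the names is_zero_movemask, is_zero_ptest and self_check_128 are hypothetical, and the target("sse4.1") attribute is used only so the sketch compiles without -msse4.1.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdbool.h>
#include <stdint.h>

/* SSE2 idiom from the report: pxor + pcmpeqb + pmovmskb + cmp 0xFFFF. */
static bool is_zero_movemask(__m128i x)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(x, _mm_setzero_si128())) == 0xffff;
}

/* SSE4.1 form: ptest x, x sets ZF iff (x & x) == 0, i.e. iff x is all zeros.
   The target attribute lets this file build without -msse4.1; the CPU must
   still support SSE4.1 at run time. */
__attribute__((target("sse4.1")))
static bool is_zero_ptest(__m128i x)
{
    return _mm_testz_si128(x, x);
}

/* Checks both forms on the zero vector and on all 128 single-bit vectors.
   Returns false without checking anything if the CPU lacks SSE4.1. */
static bool self_check_128(void)
{
    if (!__builtin_cpu_supports("sse4.1"))
        return false;
    uint8_t bytes[16] = {0};
    assert(is_zero_movemask(_mm_setzero_si128()));
    assert(is_zero_ptest(_mm_setzero_si128()));
    for (int bit = 0; bit < 128; ++bit) {
        bytes[bit / 8] = (uint8_t)(1u << (bit % 8));
        __m128i x = _mm_loadu_si128((const __m128i *)bytes);
        assert(!is_zero_movemask(x));
        assert(!is_zero_ptest(x));
        bytes[bit / 8] = 0;   /* restore the all-zero buffer */
    }
    return true;
}
```

The single-bit inputs are the hardest nonzero cases for both forms, since each differs from zero in exactly one place.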
* [Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
From: pinskia at gcc dot gnu.org @ 2022-02-03 23:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
* [Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
From: rguenth at gcc dot gnu.org @ 2022-02-04 8:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
<bb 2> [local count: 1073741824]:
_2 = VIEW_CONVERT_EXPR<__v16qi>(x_3(D));
_6 = _2 == { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
_7 = VIEW_CONVERT_EXPR<vector(16) signed char>(_6);
_4 = __builtin_ia32_pmovmskb128 (_7);
_5 = _4 == 65535;
return _5;
So one likely reason is that the movemask operation is a builtin, and later an
UNSPEC.
combine does try the following, though:
Trying 8, 11, 13 -> 14:
    8: r92:V16QI=r89:V16QI==r96:V2DI#0
      REG_DEAD r96:V2DI
      REG_DEAD r89:V16QI
   11: r88:SI=unspec[r92:V16QI] 44
      REG_DEAD r92:V16QI
   13: flags:CCZ=cmp(r88:SI,0xffff)
      REG_DEAD r88:SI
   14: r95:QI=flags:CCZ==0
      REG_DEAD flags:CCZ
Failed to match this instruction:
(set (reg:QI 95)
    (eq:QI (unspec:SI [
                (eq:V16QI (reg:V16QI 89)
                    (subreg:V16QI (reg:V2DI 96) 0))
            ] UNSPEC_MOVMSK)
        (const_int 65535 [0xffff])))
Of course, I have my doubts that this pattern is worth optimizing.
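Why the fold is valid can be shown independently of how combine represents it, by modelling the three steps in plain C on a 16-byte array. This is a hypothetical scalar model (the names v16qi, pcmpeqb_zero, pmovmskb and ptest_zf are made up for illustration), not GCC code: the mask is 0xFFFF exactly when every byte compares equal to zero, which is the same condition under which ptest x, x sets ZF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } v16qi;   /* stand-in for a 128-bit register */

/* pcmpeqb against zero: each lane becomes 0xFF if the byte is 0, else 0x00. */
static v16qi pcmpeqb_zero(v16qi x)
{
    v16qi r;
    for (int i = 0; i < 16; ++i)
        r.b[i] = x.b[i] == 0 ? 0xFF : 0x00;
    return r;
}

/* pmovmskb: gather the top bit of each byte into bits 0..15 of an int. */
static int pmovmskb(v16qi x)
{
    int m = 0;
    for (int i = 0; i < 16; ++i)
        m |= (x.b[i] >> 7) << i;
    return m;
}

/* ptest x, x: ZF is set iff (x & x) == 0, i.e. iff every byte is zero. */
static bool ptest_zf(v16qi x)
{
    uint8_t acc = 0;
    for (int i = 0; i < 16; ++i)
        acc |= x.b[i] & x.b[i];
    return acc == 0;
}

/* Checks the zero vector and every vector with exactly one nonzero byte
   (all 255 nonzero values in each of the 16 positions). */
static void check_equivalence(void)
{
    v16qi x;
    memset(&x, 0, sizeof x);
    assert((pmovmskb(pcmpeqb_zero(x)) == 0xffff) == ptest_zf(x));
    for (int pos = 0; pos < 16; ++pos)
        for (int v = 1; v < 256; ++v) {
            memset(&x, 0, sizeof x);
            x.b[pos] = (uint8_t)v;
            assert((pmovmskb(pcmpeqb_zero(x)) == 0xffff) == ptest_zf(x));
            assert(!ptest_zf(x));
        }
}
```

Any nonzero byte clears exactly one mask bit, so the compare against 0xFFFF and the ZF result can never disagree.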
* [Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
From: gabravier at gmail dot com @ 2022-02-04 19:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371
--- Comment #2 from Gabriel Ravier <gabravier at gmail dot com> ---
Although I agree the pattern doesn't seem that useful at first, I've seen it
crop up in several places, such as:
- in pixman: https://github.com/servo/pixman/blob/master/pixman/pixman-sse2.c
on line 181
- in a SIMD Mandelbrot implementation:
https://github.com/huonw/mandel-simd/blob/master/mandel_sse2.c on line 47
- in this article:
http://0x80.pl/notesen/2021-02-02-all-bytes-in-reg-are-equal.html
- in boost::uuid (although this one detects whether it is compiling for a
platform with SSE4.1):
https://github.com/boostorg/uuid/blob/develop/include/boost/uuid/detail/uuid_x86.ipp
- in this other article:
https://mischasan.wordpress.com/2011/11/09/the-generic-sse2-loop/
- in a research paper's accompanying github repo:
https://github.com/GameTechDev/MaskedOcclusionCulling/blob/master/MaskedOcclusionCulling.cpp
on line 333
- in ClickHouse:
https://clickhouse.com/codebrowser/html_report/ClickHouse/src/Common/memcmpSmall.h.html
on line 241
And this is just what I found in a few minutes, so I expect there are many more
occurrences of this pattern.
* [Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
From: crazylht at gmail dot com @ 2022-02-07 5:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
The same applies to the 256-bit variant:
#include <immintrin.h>
bool is_zero256(__m256i x)
{
    return _mm256_movemask_epi8(_mm256_cmpeq_epi8(x, _mm256_setzero_si256()))
           == 0xffffffff;
}
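A hedged sketch of the corresponding 256-bit rewrite, assuming GCC or Clang on x86-64: vpcmpeqb/vpmovmskb need AVX2, while the vptest form (_mm256_testz_si256) needs only AVX. The function names are hypothetical, and the target attributes stand in for -mavx2 so the file builds without extra flags.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdbool.h>
#include <stdint.h>

/* AVX2 idiom from comment #3: vpcmpeqb + vpmovmskb + cmp 0xFFFFFFFF.
   The int result is compared as unsigned to avoid relying on -1. */
__attribute__((target("avx2")))
static bool is_zero256_movemask(__m256i x)
{
    __m256i eq = _mm256_cmpeq_epi8(x, _mm256_setzero_si256());
    return (unsigned)_mm256_movemask_epi8(eq) == 0xffffffffu;
}

/* AVX form: vptest x, x sets ZF iff x is all zeros. */
__attribute__((target("avx")))
static bool is_zero256_ptest(__m256i x)
{
    return _mm256_testz_si256(x, x);
}

/* Compares both forms on zero and on all 256 single-bit vectors.
   Only called when the CPU reports AVX2. */
__attribute__((target("avx2")))
static void self_check_256_impl(void)
{
    uint8_t bytes[32] = {0};
    __m256i z = _mm256_setzero_si256();
    assert(is_zero256_movemask(z) && is_zero256_ptest(z));
    for (int bit = 0; bit < 256; ++bit) {
        bytes[bit / 8] = (uint8_t)(1u << (bit % 8));
        __m256i x = _mm256_loadu_si256((const __m256i *)bytes);
        assert(!is_zero256_movemask(x) && !is_zero256_ptest(x));
        bytes[bit / 8] = 0;   /* restore the all-zero buffer */
    }
}

/* Returns false (and checks nothing) when the CPU lacks AVX2. */
static bool self_check_256(void)
{
    if (!__builtin_cpu_supports("avx2"))
        return false;
    self_check_256_impl();
    return true;
}
```

The runtime check is kept in a function without a target attribute, so no 256-bit vector ever crosses into code compiled without AVX.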
* [Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
From: crazylht at gmail dot com @ 2022-02-07 5:16 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371
--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
Failed to match this instruction:
(set (reg:CCZ 17 flags)
    (compare:CCZ (unspec:SI [
                (eq:V16QI (subreg:V16QI (reg:V2DI 94) 0)
                    (const_vector:V16QI [
                            (const_int 0 [0]) repeated x16
                        ]))
            ] UNSPEC_MOVMSK)
        (const_int 65535 [0xffff])))
This can be optimized to ptest as long as only CCZ (the zero flag) is consumed.
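The CCZ restriction can be made concrete with a scalar model of the two instructions' flags. This is a hypothetical illustration (the flags struct and helper names are made up, and only ZF and CF are modelled): cmp mask, 0xffff and ptest x, x agree on ZF for every input, but ptest x, x always sets CF because (~x & x) == 0, whereas cmp clears CF when the mask equals 0xffff. So the rewrite is only safe when the consumer reads ZF alone, as == and != do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool zf, cf; } flags;   /* only the two flags of interest */

/* cmp a, b: ZF = (a == b); CF = unsigned borrow, i.e. a < b. */
static flags cmp_flags(uint32_t a, uint32_t b)
{
    flags f = { a == b, a < b };
    return f;
}

/* ptest a, b on 128-bit values held as two 64-bit halves:
   ZF = ((a & b) == 0), CF = ((~a & b) == 0). */
static flags ptest_flags(const uint64_t a[2], const uint64_t b[2])
{
    flags f;
    f.zf = ((a[0] & b[0]) | (a[1] & b[1])) == 0;
    f.cf = ((~a[0] & b[0]) | (~a[1] & b[1])) == 0;
    return f;
}

/* pcmpeqb against zero followed by pmovmskb, computed directly. */
static uint32_t zero_byte_mask(const uint64_t x[2])
{
    uint32_t m = 0;
    for (int i = 0; i < 16; ++i) {
        uint8_t byte = (uint8_t)(x[i / 8] >> (8 * (i % 8)));
        if (byte == 0)
            m |= 1u << i;
    }
    return m;
}

/* ZF always agrees; CF disagrees only for x == 0 (cmp clears it). */
static void check_flags(void)
{
    for (int i = -1; i < 128; ++i) {          /* i == -1 means x == 0 */
        uint64_t x[2] = {0, 0};
        if (i >= 0)
            x[i / 64] = 1ull << (i % 64);
        flags c = cmp_flags(zero_byte_mask(x), 0xffff);
        flags p = ptest_flags(x, x);
        assert(c.zf == p.zf);
        assert(p.cf);
        assert(c.cf == (i >= 0));
    }
}
```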
* [Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
From: crazylht at gmail dot com @ 2022-02-07 5:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371
--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #1)
> <bb 2> [local count: 1073741824]:
> _2 = VIEW_CONVERT_EXPR<__v16qi>(x_3(D));
> _6 = _2 == { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
> _7 = VIEW_CONVERT_EXPR<vector(16) signed char>(_6);
> _4 = __builtin_ia32_pmovmskb128 (_7);
> _5 = _4 == 65535;
> return _5;
>
> so likely one reason is the builtin and later UNSPEC for the movemask
> operation.
>
Under AVX512BW & AVX512VL we could fold __builtin_ia32_pmovmskb128 into
vector(16) <signed-boolean:1> temp_1 = _7 < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }
temp_2 = VIEW_CONVERT_EXPR<unsigned short> temp_1; ----- ????
_4 = zero_extend temp_2;
but I'm not sure whether VIEW_CONVERT_EXPR can be used between vector and
integer types.
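The mask-register form sketched in GIMPLE above has a direct intrinsic counterpart: with AVX512BW and AVX512VL, _mm_cmplt_epi8_mask(v, zero) collects the sign bits of the 16 bytes into a __mmask16, which is the same value pmovmskb produces. A hedged sketch, assuming GCC or Clang on x86-64 and running the comparison only on CPUs that report both features; the function names are hypothetical, and this illustrates the semantics only, not how GCC would implement the fold.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdbool.h>
#include <stdint.h>

/* SSE2: sign bit of each byte -> bits 0..15 of an int. */
static unsigned movemask_sse2(__m128i v)
{
    return (unsigned)_mm_movemask_epi8(v);
}

/* AVX512BW+VL: per-lane signed compare "v < 0" into a 16-bit mask register,
   mirroring the "_7 < { 0, ... }" form above. */
__attribute__((target("avx512bw,avx512vl")))
static unsigned movemask_avx512(__m128i v)
{
    __mmask16 k = _mm_cmplt_epi8_mask(v, _mm_setzero_si128());
    return (unsigned)k;   /* zero-extend the 16-bit mask, like "_4 = zero_extend temp_2" */
}

/* Compares both on 256 pseudo-random byte patterns.
   Returns false (and checks nothing) if the CPU lacks the features. */
static bool self_check_avx512(void)
{
    if (!__builtin_cpu_supports("avx512bw") || !__builtin_cpu_supports("avx512vl"))
        return false;
    for (int seed = 0; seed < 256; ++seed) {
        uint8_t bytes[16];
        for (int i = 0; i < 16; ++i)
            bytes[i] = (uint8_t)(seed * 37 + i * 101);
        __m128i v = _mm_loadu_si128((const __m128i *)bytes);
        assert(movemask_sse2(v) == movemask_avx512(v));
    }
    return true;
}
```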
* [Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
From: haochen.jiang at intel dot com @ 2022-03-31 6:13 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371
Haochen Jiang <haochen.jiang at intel dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |haochen.jiang at intel dot com
--- Comment #6 from Haochen Jiang <haochen.jiang at intel dot com> ---
Created attachment 52723
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52723&action=edit
This patch aims to optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
I fixed this with the attached patch, regtested on x86_64-pc-linux-gnu.
It is currently on hold for GCC 13 Stage 1.
If it is OK, could you help me mark this bug as blocking PR105073? Thanks.
* [Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
From: cvs-commit at gcc dot gnu.org @ 2022-05-12 9:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371
--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hongyu Wang <hongyuw@gcc.gnu.org>:
https://gcc.gnu.org/g:3c9364f29e7e47eb9de33f2d8843d5b00284ceca
commit r13-338-g3c9364f29e7e47eb9de33f2d8843d5b00284ceca
Author: Haochen Jiang <haochen.jiang@intel.com>
Date: Tue Feb 8 10:51:26 2022 +0800
i386: Add combine splitter to transform pxor/pcmpeqb/pmovmskb/cmp 0xffff to
ptest.
gcc/ChangeLog:
PR target/104371
* config/i386/sse.md (vi1avx2const): New define_mode_attr.
(pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest splitter):
New define_split pattern.
gcc/testsuite/ChangeLog:
PR target/104371
* gcc.target/i386/pr104371-1.c: New test.
* gcc.target/i386/pr104371-2.c: Ditto.
* [Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
From: haochen.jiang at intel dot com @ 2022-05-12 9:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371
--- Comment #8 from Haochen Jiang <haochen.jiang at intel dot com> ---
Fixed for GCC 13.
* [Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
From: hjl.tools at gmail dot com @ 2022-07-28 18:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution|--- |FIXED
Target Milestone|--- |13.0
--- Comment #9 from H.J. Lu <hjl.tools at gmail dot com> ---
Fixed for GCC 13.