public inbox for gcc-bugs@sourceware.org
* [Bug target/104371] New: [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
@ 2022-02-03 18:08 gabravier at gmail dot com
  2022-02-03 23:07 ` [Bug target/104371] " pinskia at gcc dot gnu.org
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: gabravier at gmail dot com @ 2022-02-03 18:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

            Bug ID: 104371
           Summary: [x86] Failure to use optimize
                    pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <immintrin.h>
#include <stdbool.h>

bool is_zero(__m128i x)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(x, _mm_setzero_si128())) == 0xffff;
}

When SSE4.1 is available, this can be optimized to
`return _mm_testz_si128(x, x);`. LLVM performs this optimization, but GCC does
not.
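
As a rough illustration of the equivalence described above, the sketch below
puts the two forms side by side. The names is_zero_movemask, is_zero_ptest and
check_forms_agree are made up for this example. The target("sse4.1") attribute
(a GCC/Clang extension) is only there so that the file builds without
-msse4.1; the sketch assumes an x86 CPU with SSE4.1 at run time.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdbool.h>

/* SSE2 form from the report: compare every byte with zero, gather the 16
   comparison results into a bitmask, and require all 16 bits to be set. */
static bool is_zero_movemask(__m128i x)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(x, _mm_setzero_si128())) == 0xffff;
}

/* SSE4.1 form: ptest sets ZF when (x & x) == 0, i.e. when x is all zero.
   The target attribute is an assumption of this sketch, not part of the
   report; it avoids needing -msse4.1 on the command line. */
__attribute__((target("sse4.1")))
static bool is_zero_ptest(__m128i x)
{
    return _mm_testz_si128(x, x);
}

/* Hypothetical self-check: both forms must agree on the all-zero vector
   and on every vector that has exactly one nonzero byte. */
static void check_forms_agree(void)
{
    unsigned char buf[16] = {0};
    __m128i x = _mm_loadu_si128((const __m128i *)buf);

    assert(is_zero_movemask(x) && is_zero_ptest(x));
    for (int i = 0; i < 16; i++) {
        buf[i] = 1;
        x = _mm_loadu_si128((const __m128i *)buf);
        assert(!is_zero_movemask(x) && !is_zero_ptest(x));
        buf[i] = 0;
    }
}
```

The movemask form needs a zeroed register, a byte compare, a mask extraction
and an integer compare, whereas the ptest form sets the zero flag directly
from x.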


* [Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
  2022-02-03 18:08 [Bug target/104371] New: [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest gabravier at gmail dot com
@ 2022-02-03 23:07 ` pinskia at gcc dot gnu.org
  2022-02-04  8:03 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-02-03 23:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement


* [Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
  2022-02-03 18:08 [Bug target/104371] New: [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest gabravier at gmail dot com
  2022-02-03 23:07 ` [Bug target/104371] " pinskia at gcc dot gnu.org
@ 2022-02-04  8:03 ` rguenth at gcc dot gnu.org
  2022-02-04 19:21 ` gabravier at gmail dot com
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-02-04  8:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
  <bb 2> [local count: 1073741824]:
  _2 = VIEW_CONVERT_EXPR<__v16qi>(x_3(D));
  _6 = _2 == { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _7 = VIEW_CONVERT_EXPR<vector(16) signed char>(_6);
  _4 = __builtin_ia32_pmovmskb128 (_7);
  _5 = _4 == 65535;
  return _5;

So one likely reason is that the movemask operation is represented as a
builtin and, later, as an UNSPEC.

combine does try the following, though:

Trying 8, 11, 13 -> 14:
    8: r92:V16QI=r89:V16QI==r96:V2DI#0
      REG_DEAD r96:V2DI
      REG_DEAD r89:V16QI
   11: r88:SI=unspec[r92:V16QI] 44
      REG_DEAD r92:V16QI
   13: flags:CCZ=cmp(r88:SI,0xffff)
      REG_DEAD r88:SI
   14: r95:QI=flags:CCZ==0
      REG_DEAD flags:CCZ
Failed to match this instruction:
(set (reg:QI 95)
    (eq:QI (unspec:SI [
                (eq:V16QI (reg:V16QI 89)
                    (subreg:V16QI (reg:V2DI 96) 0))
            ] UNSPEC_MOVMSK)
        (const_int 65535 [0xffff])))

Of course, I have my doubts that the pattern is a useful one to optimize.


* [Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
  2022-02-03 18:08 [Bug target/104371] New: [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest gabravier at gmail dot com
  2022-02-03 23:07 ` [Bug target/104371] " pinskia at gcc dot gnu.org
  2022-02-04  8:03 ` rguenth at gcc dot gnu.org
@ 2022-02-04 19:21 ` gabravier at gmail dot com
  2022-02-07  5:12 ` crazylht at gmail dot com
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: gabravier at gmail dot com @ 2022-02-04 19:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

--- Comment #2 from Gabriel Ravier <gabravier at gmail dot com> ---
Although I agree the pattern doesn't seem that useful at first, I've seen it
crop up in several places, such as:

- in pixman: https://github.com/servo/pixman/blob/master/pixman/pixman-sse2.c
on line 181
- in a SIMD Mandelbrot implementation:
https://github.com/huonw/mandel-simd/blob/master/mandel_sse2.c on line 47
- in this article:
http://0x80.pl/notesen/2021-02-02-all-bytes-in-reg-are-equal.html
- in boost::uuid (although this one will detect if compiling on a platform with
SSE4.1):
https://github.com/boostorg/uuid/blob/develop/include/boost/uuid/detail/uuid_x86.ipp
- in this other article:
https://mischasan.wordpress.com/2011/11/09/the-generic-sse2-loop/
- in a research paper's accompanying GitHub repo:
https://github.com/GameTechDev/MaskedOcclusionCulling/blob/master/MaskedOcclusionCulling.cpp
on line 333
- in ClickHouse:
https://clickhouse.com/codebrowser/html_report/ClickHouse/src/Common/memcmpSmall.h.html
on line 241

And this is just what I found in a few minutes, so I would personally expect
there to be many more occurrences of that pattern.


* [Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
  2022-02-03 18:08 [Bug target/104371] New: [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest gabravier at gmail dot com
                   ` (2 preceding siblings ...)
  2022-02-04 19:21 ` gabravier at gmail dot com
@ 2022-02-07  5:12 ` crazylht at gmail dot com
  2022-02-07  5:16 ` crazylht at gmail dot com
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2022-02-07  5:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
Similarly for:

#include <immintrin.h>
bool is_zero256(__m256i x)
{
    return _mm256_movemask_epi8(_mm256_cmpeq_epi8(x, _mm256_setzero_si256()))
           == 0xffffffff;
}
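
For comparison, here is a hedged sketch of the 256-bit case next to the vptest
form it could become, by analogy with the 128-bit report. The names
is_zero256_movemask, is_zero256_ptest and check_forms_agree256 are
hypothetical. The operand is loaded from memory, and the target("avx2")
attribute and __builtin_cpu_supports (both GCC/Clang extensions) are used,
only so that the sketch builds without -mavx2 and skips itself on CPUs
without AVX2.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdbool.h>

/* AVX2 form from the comment: 32 byte compares, a 32-bit mask, all bits set.
   The operand is loaded inside the function so that no 256-bit value
   crosses a call boundary compiled without AVX. */
__attribute__((target("avx2")))
static bool is_zero256_movemask(const void *p)
{
    __m256i x = _mm256_loadu_si256((const __m256i *)p);
    return _mm256_movemask_epi8(_mm256_cmpeq_epi8(x, _mm256_setzero_si256()))
           == 0xffffffff;
}

/* vptest form: ZF is set when (x & x) == 0, i.e. when x is all zero. */
__attribute__((target("avx2")))
static bool is_zero256_ptest(const void *p)
{
    __m256i x = _mm256_loadu_si256((const __m256i *)p);
    return _mm256_testz_si256(x, x);
}

/* Hypothetical self-check. Returns false without checking anything when
   the CPU lacks AVX2, true after both forms agreed on all inputs tried. */
static bool check_forms_agree256(void)
{
    unsigned char buf[32] = {0};

    if (!__builtin_cpu_supports("avx2"))
        return false;

    assert(is_zero256_movemask(buf) && is_zero256_ptest(buf));
    for (int i = 0; i < 32; i++) {
        buf[i] = 0x80;
        assert(!is_zero256_movemask(buf) && !is_zero256_ptest(buf));
        buf[i] = 0;
    }
    return true;
}
```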


* [Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
  2022-02-03 18:08 [Bug target/104371] New: [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest gabravier at gmail dot com
                   ` (3 preceding siblings ...)
  2022-02-07  5:12 ` crazylht at gmail dot com
@ 2022-02-07  5:16 ` crazylht at gmail dot com
  2022-02-07  5:41 ` crazylht at gmail dot com
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2022-02-07  5:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
Failed to match this instruction:
(set (reg:CCZ 17 flags)
    (compare:CCZ (unspec:SI [
                (eq:V16QI (subreg:V16QI (reg:V2DI 94) 0)
                    (const_vector:V16QI [
                            (const_int 0 [0]) repeated x16
                        ]))
            ] UNSPEC_MOVMSK)
        (const_int 65535 [0xffff])))

This can be optimized to ptest as long as only the zero flag (CCZ mode) is
needed.


* [Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
  2022-02-03 18:08 [Bug target/104371] New: [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest gabravier at gmail dot com
                   ` (4 preceding siblings ...)
  2022-02-07  5:16 ` crazylht at gmail dot com
@ 2022-02-07  5:41 ` crazylht at gmail dot com
  2022-03-31  6:13 ` haochen.jiang at intel dot com
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2022-02-07  5:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #1)
>   <bb 2> [local count: 1073741824]:
>   _2 = VIEW_CONVERT_EXPR<__v16qi>(x_3(D));
>   _6 = _2 == { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
>   _7 = VIEW_CONVERT_EXPR<vector(16) signed char>(_6);
>   _4 = __builtin_ia32_pmovmskb128 (_7);
>   _5 = _4 == 65535;
>   return _5;
> 
> so likely one reason is the builtin and later UNSPEC for the movemask
> operation.
> 

Under AVX512BW & AVX512VL we can fold __builtin_ia32_pmovmskb128 to 

vector(16) <signed-boolean:1> temp_1 = _7 < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0 }
temp_2 = VIEW_CONVERT_EXPR<unsigned short> temp_1; ----- ????
_4 = zero_extend temp_2;

but I'm not sure whether VIEW_CONVERT_EXPR can be used between vector and
integer types.
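
To make the proposed fold easier to follow, here is a scalar C model of it
(not GCC internals). movemask_model mimics pmovmskb on 16 bytes, and
movemask_via_compare follows the three steps sketched above: a per-lane
"less than zero" test, packing the 16 one-bit results into an unsigned short,
and a zero extension. Both function names are hypothetical, and the mapping
of lane i to bit i is an assumption of the model.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of pmovmskb on 16 bytes: bit i of the result is the sign
   bit of byte i. */
static uint32_t movemask_model(const int8_t v[16])
{
    uint32_t mask = 0;
    for (int i = 0; i < 16; i++)
        mask |= (uint32_t)((uint8_t)v[i] >> 7) << i;
    return mask;
}

/* The same value built the way the comment proposes. */
static uint32_t movemask_via_compare(const int8_t v[16])
{
    uint16_t packed = 0;   /* models temp_1 viewed as unsigned short (temp_2) */
    for (int i = 0; i < 16; i++)
        if (v[i] < 0)      /* lane-wise _7 < 0 */
            packed |= (uint16_t)(1u << i);
    return (uint32_t)packed;   /* models the zero_extend into _4 */
}
```

In this model the two functions return the same value for every input, since a
signed byte is negative exactly when its top bit is set.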


* [Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
  2022-02-03 18:08 [Bug target/104371] New: [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest gabravier at gmail dot com
                   ` (5 preceding siblings ...)
  2022-02-07  5:41 ` crazylht at gmail dot com
@ 2022-03-31  6:13 ` haochen.jiang at intel dot com
  2022-05-12  9:37 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: haochen.jiang at intel dot com @ 2022-03-31  6:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

Haochen Jiang <haochen.jiang at intel dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |haochen.jiang at intel dot com

--- Comment #6 from Haochen Jiang <haochen.jiang at intel dot com> ---
Created attachment 52723
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52723&action=edit
This patch aims to optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest

I fixed this with the attached patch. Regtested on x86_64-pc-linux-gnu.

It is currently on hold for Stage 1 of GCC 13.

If this is OK, could you help me mark this bug as blocking PR105073? Thanks.


* [Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
  2022-02-03 18:08 [Bug target/104371] New: [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest gabravier at gmail dot com
                   ` (6 preceding siblings ...)
  2022-03-31  6:13 ` haochen.jiang at intel dot com
@ 2022-05-12  9:37 ` cvs-commit at gcc dot gnu.org
  2022-05-12  9:45 ` haochen.jiang at intel dot com
  2022-07-28 18:54 ` hjl.tools at gmail dot com
  9 siblings, 0 replies; 11+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-05-12  9:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hongyu Wang <hongyuw@gcc.gnu.org>:

https://gcc.gnu.org/g:3c9364f29e7e47eb9de33f2d8843d5b00284ceca

commit r13-338-g3c9364f29e7e47eb9de33f2d8843d5b00284ceca
Author: Haochen Jiang <haochen.jiang@intel.com>
Date:   Tue Feb 8 10:51:26 2022 +0800

    i386: Add combine splitter to transform pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest.

    gcc/ChangeLog:

            PR target/104371
            * config/i386/sse.md (vi1avx2const): New define_mode_attr.
            (pxor/pcmpeqb/pmovmskb/cmp 0xffff to ptest splitter):
            New define_split pattern.

    gcc/testsuite/ChangeLog:

            PR target/104371
            * gcc.target/i386/pr104371-1.c: New test.
            * gcc.target/i386/pr104371-2.c: Ditto.


* [Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
  2022-02-03 18:08 [Bug target/104371] New: [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest gabravier at gmail dot com
                   ` (7 preceding siblings ...)
  2022-05-12  9:37 ` cvs-commit at gcc dot gnu.org
@ 2022-05-12  9:45 ` haochen.jiang at intel dot com
  2022-07-28 18:54 ` hjl.tools at gmail dot com
  9 siblings, 0 replies; 11+ messages in thread
From: haochen.jiang at intel dot com @ 2022-05-12  9:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

--- Comment #8 from Haochen Jiang <haochen.jiang at intel dot com> ---
Fixed for GCC 13.


* [Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest
  2022-02-03 18:08 [Bug target/104371] New: [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest gabravier at gmail dot com
                   ` (8 preceding siblings ...)
  2022-05-12  9:45 ` haochen.jiang at intel dot com
@ 2022-07-28 18:54 ` hjl.tools at gmail dot com
  9 siblings, 0 replies; 11+ messages in thread
From: hjl.tools at gmail dot com @ 2022-07-28 18:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |13.0

--- Comment #9 from H.J. Lu <hjl.tools at gmail dot com> ---
Fixed for GCC 13.


