public inbox for gcc-patches@gcc.gnu.org
* [PATCH] Optimize comparison between result of us_minus and 0.
From: Hongtao Liu @ 2020-09-03  9:39 UTC
  To: GCC Patches, Kirill Yukhin

Hi:
  Add define_peephole2 patterns to perform optimizations like the ones below:

+/* Optimize for TARGET_AVX512F
+  vpsubusw op1, op2, dst1;
+  vxorps xmm, xmm, dst2; ---->   vpcmpleuw op1, op2, dst3
+  vpcmpeqw dst1, dst2, dst3  */

and

+/* Optimize for target above TARGET_SSE4_1
+  vpsubusw op1, op2, dst1;      vpminuw op1, op2, dst1
+  vpxor xmm, xmm, dst2; ---->   vpcmpeqw op1, dst1, dst3
+  vpcmpeqw dst1, dst2, dst3  */
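
Both rewrites rest on the same per-lane identity for unsigned values: a saturating subtract of op2 from op1 yields 0 exactly when op1 <= op2, which is also exactly when min(op1, op2) == op1. The following is a minimal scalar sketch of that identity, not code from the patch. The helper names (us_minus_u8, min_u8, count_mismatches, verify_all) are hypothetical, and 8-bit lanes are used only so that every input pair can be checked exhaustively.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane of an unsigned saturating subtract
   (the us_minus RTL code): a - b, clamped at 0.  */
static uint8_t us_minus_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Scalar model of one lane of an unsigned minimum.  */
static uint8_t min_u8(uint8_t a, uint8_t b)
{
    return a < b ? a : b;
}

/* Count the (a, b) pairs for which either rewrite disagrees with the
   original "us_minus (a, b) == 0" test.  */
static unsigned count_mismatches(void)
{
    unsigned bad = 0;
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++) {
            int original = us_minus_u8((uint8_t)a, (uint8_t)b) == 0;
            int le_form  = a <= b;                                  /* unsigned compare-LE form */
            int min_form = min_u8((uint8_t)a, (uint8_t)b) == a;     /* min + compare-EQ form */
            if (original != le_form || original != min_form)
                bad++;
        }
    return bad;
}

/* Abort if any pair disagrees.  */
static void verify_all(void)
{
    assert(count_mismatches() == 0);
}
```

Calling verify_all() walks all 65536 byte pairs; the same argument applies unchanged to 16-bit lanes.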

Bootstrapped and regression tested on the i386/x86-64 backend.
Ok for trunk?

gcc/ChangeLog:
        PR target/96906
        * config/i386/sse.md (VI12_128_256): New mode iterator.
        (define_peephole2): Optimize comparison between result of
        us_minus and 0; it can be optimized to "vpcmpleu" for
        AVX512 or "pminu + cmpeq" for target above TARGET_SSE4_1.

gcc/testsuite/ChangeLog:

        * gcc.target/i386/avx2-pr96906-1.c: New test.
        * gcc.target/i386/avx512f-pr96906-1.c: New test.
        * gcc.target/i386/sse2-pr96906.c: New test.
        * gcc.target/i386/sse4_1-pr96906-1.c: New test.
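
For readers without the attachment, the kind of source that produces the three-instruction sequence can be sketched with SSE2 intrinsics. This is an illustrative guess at the shape of the input, not the contents of the tests listed above; the function name us_minus_is_zero_epu16 is hypothetical, and the code builds only for x86 targets.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; x86 targets only */

/* Hypothetical example: return all-ones in each 16-bit lane where the
   unsigned saturating difference x - y is 0, and all-zeros elsewhere.  */
__m128i us_minus_is_zero_epu16(__m128i x, __m128i y)
{
    __m128i diff = _mm_subs_epu16(x, y);          /* psubusw */
    __m128i zero = _mm_setzero_si128();           /* pxor */
    return _mm_cmpeq_epi16(diff, zero);           /* pcmpeqw */
}
```

Without the peephole, a compiler would be expected to emit the subtract, zeroing, and compare shown in the comments; the patch aims to replace that sequence with the shorter forms described earlier in this message.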

--
BR,
Hongtao

[-- Attachment #2: 0001-Optimize-__builtin_ia32_psubusw128-compared-to-0-to-.patch --]
[-- Type: application/x-patch, Size: 7850 bytes --]


* Re: [PATCH] Optimize comparison between result of us_minus and 0.
From: Jeff Law @ 2020-11-17  0:08 UTC
  To: Hongtao Liu, GCC Patches, Kirill Yukhin


On 9/3/20 3:39 AM, Hongtao Liu via Gcc-patches wrote:
> Hi:
>   Add define_peephole2 patterns to perform optimizations like the ones below:
>
> +/* Optimize for TARGET_AVX512F
> +  vpsubusw op1, op2, dst1;
> +  vxorps xmm, xmm, dst2; ---->   vpcmpleuw op1, op2, dst3
> +  vpcmpeqw dst1, dst2, dst3  */
>
> and
>
> +/* Optimize for target above TARGET_SSE4_1
> +  vpsubusw op1, op2, dst1;      vpminuw op1, op2, dst1
> +  vpxor xmm, xmm, dst2; ---->   vpcmpeqw op1, dst1, dst3
> +  vpcmpeqw dst1, dst2, dst3  */
>
> Bootstrapped and regression tested on the i386/x86-64 backend.
> Ok for trunk?
>
> gcc/ChangeLog:
>         PR target/96906
>         * config/i386/sse.md (VI12_128_256): New mode iterator.
>         (define_peephole2): Optimize comparison between result of
>         us_minus and 0; it can be optimized to "vpcmpleu" for
>         AVX512 or "pminu + cmpeq" for target above TARGET_SSE4_1.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/avx2-pr96906-1.c: New test.
>         * gcc.target/i386/avx512f-pr96906-1.c: New test.
>         * gcc.target/i386/sse2-pr96906.c: New test.
>         * gcc.target/i386/sse4_1-pr96906-1.c: New test.

I'd look to see if a combiner pattern could help with these too rather
than using a peep2.

jeff

