* [PATCH] Optimize comparison between result of us_minus and 0.
@ 2020-09-03 9:39 Hongtao Liu
2020-11-17 0:08 ` Jeff Law
0 siblings, 1 reply; 2+ messages in thread
From: Hongtao Liu @ 2020-09-03 9:39 UTC
To: GCC Patches, Kirill Yukhin
[-- Attachment #1: Type: text/plain, Size: 1056 bytes --]
Hi:
Add a define_peephole2 to perform optimizations like the ones below:
+/* Optimize for TARGET_AVX512F
+ vpsubusw op1, op2, dst1;
+ vxorps xmm, xmm, dst2; ----> vpcmpleuw op1, op2, dst3
+ vpcmpeqw dst1, dst2, dst3 */
and
+/* Optimize for target above TARGET_SSE4_1
+ vpsubusw op1, op2, dst1; vpminuw op1, op2, dst1
+ vpxor xmm, xmm, dst2; ----> vpcmpeqw op1, dst1, dst3
+ vpcmpeqw dst1, dst2, dst3 */
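Both rewrites rest on the same per-lane identity: an unsigned saturating subtraction yields 0 exactly when the minuend is not greater than the subtrahend, which is also exactly when the unsigned minimum of the two equals the minuend. A minimal scalar sketch of that identity follows. It is not part of the patch; the helper names (us_minus_u8, umin_u8, count_mismatches) are hypothetical, and it models a single 8-bit lane rather than the vector instructions or their operand order.

```c
#include <stdint.h>

/* Scalar model of one lane of an unsigned saturating subtract
   (hypothetical helper, not taken from the patch). */
static uint8_t us_minus_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Scalar model of one lane of an unsigned minimum. */
static uint8_t umin_u8(uint8_t a, uint8_t b)
{
    return a < b ? a : b;
}

/* Exhaustively compare the three formulations over all 8-bit pairs.
   Returns the number of pairs on which they disagree. */
static unsigned count_mismatches(void)
{
    unsigned bad = 0;
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            int original = us_minus_u8((uint8_t)a, (uint8_t)b) == 0; /* subtract, then compare with 0 */
            int as_cmple = a <= b;                                   /* single unsigned <= compare */
            int as_min   = umin_u8((uint8_t)a, (uint8_t)b) == a;     /* minimum, then compare equal */
            if (original != as_cmple || original != as_min)
                bad++;
        }
    }
    return bad;
}
```

Calling count_mismatches() from a small driver shows whether the three forms agree on every 8-bit input pair; the same argument carries over to 16-bit lanes, where an exhaustive loop would simply take longer.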
Bootstrapped and regression tested OK on the i386/x86-64 backend.
Ok for trunk?
gcc/ChangeLog:
PR target/96906
* config/i386/sse.md (VI12_128_256): New mode iterator.
(define_peephole2): Optimize comparison between result of
us_minus and 0; it can be optimized to "vpcmplequ" for
AVX512 or "pminu + cmpeq" for targets above TARGET_SSE4_1.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx2-pr96906-1.c: New test.
* gcc.target/i386/avx512f-pr96906-1.c: New test.
* gcc.target/i386/sse2-pr96906.c: New test.
* gcc.target/i386/sse4_1-pr96906-1.c: New test.
--
BR,
Hongtao
[-- Attachment #2: 0001-Optimize-__builtin_ia32_psubusw128-compared-to-0-to-.patch --]
[-- Type: application/x-patch, Size: 7850 bytes --]
* Re: [PATCH] Optimize comparison between result of us_minus and 0.
2020-09-03 9:39 [PATCH] Optimize comparison between result of us_minus and 0 Hongtao Liu
@ 2020-11-17 0:08 ` Jeff Law
0 siblings, 0 replies; 2+ messages in thread
From: Jeff Law @ 2020-11-17 0:08 UTC
To: Hongtao Liu, GCC Patches, Kirill Yukhin
On 9/3/20 3:39 AM, Hongtao Liu via Gcc-patches wrote:
> Hi:
> Add a define_peephole2 to perform optimizations like the ones below:
>
> +/* Optimize for TARGET_AVX512F
> + vpsubusw op1, op2, dst1;
> + vxorps xmm, xmm, dst2; ----> vpcmpleuw op1, op2, dst3
> + vpcmpeqw dst1, dst2, dst3 */
>
> and
>
> +/* Optimize for target above TARGET_SSE4_1
> + vpsubusw op1, op2, dst1; vpminuw op1, op2, dst1
> + vpxor xmm, xmm, dst2; ----> vpcmpeqw op1, dst1, dst3
> + vpcmpeqw dst1, dst2, dst3 */
>
> Bootstrapped and regression tested OK on the i386/x86-64 backend.
> Ok for trunk?
>
> gcc/ChangeLog:
> PR target/96906
> * config/i386/sse.md (VI12_128_256): New mode iterator.
> (define_peephole2): Optimize comparison between result of
> us_minus and 0; it can be optimized to "vpcmplequ" for
> AVX512 or "pminu + cmpeq" for targets above TARGET_SSE4_1.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx2-pr96906-1.c: New test.
> * gcc.target/i386/avx512f-pr96906-1.c: New test.
> * gcc.target/i386/sse2-pr96906.c: New test.
> * gcc.target/i386/sse4_1-pr96906-1.c: New test.
I'd look to see if a combiner pattern could help with these too rather
than using a peep2.
jeff