public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/104360] New: Failure to optimize abs pattern on vector types
From: gabravier at gmail dot com @ 2022-02-03  2:29 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104360

            Bug ID: 104360
           Summary: Failure to optimize abs pattern on vector types
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>

typedef int16_t v8i16 __attribute__((vector_size(16)));

v8i16 abs_i16(v8i16 x)
{
    auto isN = x < v8i16{};

    x ^= isN;

    return x - isN;
}

This (although I think v8i16 could be replaced with any integer vector type and
it would still work) can be optimized to use an abs instruction where one is
available (such as `pabsw` on x86-64, or `abs` on aarch64).

PS: This doesn't even necessarily require an abs instruction. On standard
x86-64 with -O3, GCC manages just this:

abs_i16(short __vector(8)):
  pxor xmm1, xmm1
  pcmpgtw xmm1, xmm0
  pxor xmm0, xmm1
  psubw xmm0, xmm1
  ret

whereas LLVM outputs this:

abs_i16(short __vector(8)):
  pxor xmm1, xmm1
  psubw xmm1, xmm0
  pmaxsw xmm0, xmm1
  ret

which I'm pretty sure is better.
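As a sanity check on the report above, here is a minimal per-lane sketch in C++ of the two instruction sequences. It is not taken from GCC or LLVM. The function names are hypothetical, and the model assumes each 16-bit lane wraps modulo 2^16 (as `psubw` does) and that the vector comparison produces all-ones in lanes where it is true. The narrowing casts back to `int16_t` assume the usual two's-complement wraparound, which is guaranteed only from C++20 onward.

```cpp
#include <cassert>
#include <cstdint>

// One lane of the reported idiom: (x ^ m) - m, where m is all-ones when x < 0.
// Unsigned arithmetic models the wraparound of a 16-bit lane.
inline int16_t lane_abs_xor_sub(int16_t x) {
    uint16_t m = x < 0 ? 0xFFFFu : 0u;               // pcmpgtw result for this lane
    uint16_t ux = static_cast<uint16_t>(x);
    uint16_t r = static_cast<uint16_t>((ux ^ m) - m); // pxor, then psubw
    return static_cast<int16_t>(r);
}

// One lane of the LLVM sequence: max(x, 0 - x) with a wrapping subtraction.
inline int16_t lane_abs_sub_max(int16_t x) {
    uint16_t neg_u = static_cast<uint16_t>(0u - static_cast<uint16_t>(x)); // psubw
    int16_t neg = static_cast<int16_t>(neg_u);
    return x > neg ? x : neg;                         // pmaxsw (signed max)
}

// Counts inputs where either form disagrees with the expected absolute value.
// INT16_MIN has no positive counterpart in 16 bits, so it maps to itself.
inline int count_mismatches() {
    int bad = 0;
    for (int v = INT16_MIN; v <= INT16_MAX; ++v) {
        int16_t x = static_cast<int16_t>(v);
        int expected = (v == INT16_MIN) ? INT16_MIN : (v < 0 ? -v : v);
        if (lane_abs_xor_sub(x) != expected || lane_abs_sub_max(x) != expected) {
            ++bad;
        }
    }
    return bad;
}
```

Calling `count_mismatches()` from a small `main` walks all 65536 lane values. Under the assumptions above, both forms agree with each other on every input, including the wrapping case at `INT16_MIN`.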
* [Bug tree-optimization/104360] Failure to optimize abs pattern (x^(x<0?-1:0)) - (x<0?-1:0)
From: pinskia at gcc dot gnu.org @ 2022-02-03  2:45 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104360

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement
            Summary|Failure to optimize abs     |Failure to optimize abs
                   |pattern on vector types     |pattern (x^(x<0?-1:0)) -
                   |                            |(x<0?-1:0)
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2022-02-03
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Even the scalar version is not optimized:

typedef short i16;

i16 abs_i16(i16 x)
{
    auto isN = -(x < 0);

    x ^= isN;

    return x - isN;
}

Shouldn't be too hard to optimize for both.
What is funny is clang/LLVM does not catch the scalar version either unless you
do:
(x < i16{}) ? -1 : 0
* [Bug tree-optimization/104360] Failure to optimize abs pattern (x^(x<0?-1:0)) - (x<0?-1:0)
From: pinskia at gcc dot gnu.org @ 2022-02-03  2:54 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104360

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note that the vector version of this is easier to detect:
  isN_3 = x_2(D) < { 0, 0, 0, 0, 0, 0, 0, 0 };
  x_4 = x_2(D) ^ isN_3;
  _5 = x_4 - isN_3;

Pattern here:
(minus (bit_xor:c @0 (lt@1 @0 vector_zero_p)) @1)

than the scalar version:
  _10 = x_6(D) < 0;
  _11 = (int) _10;
  _12 = -_11;
  _1 = (short int) _12;
  x_7 = _1 ^ x_6(D);
  x.1_2 = (unsigned short) x_7;
  _3 = (unsigned short) _12;
  _4 = x.1_2 - _3;
  _8 = (i16) _4;

The extra casts are there because of the overflow and such. If we use -fwrapv
we get:
  _7 = x_3(D) < 0;
  _8 = (int) _7;
  _9 = -_8;
  _1 = (short int) _9;
  x_4 = _1 ^ x_3(D);
  _5 = x_4 - _1;

Where we could reduce _1 to just:
  t = (short int) _7;
  _1 = -t;

And then it is just pattern matching.

For int we get:
  _6 = x_2(D) < 0;
  _7 = (int) _6;
  _8 = -_7;
  x_3 = x_2(D) ^ _8;
  _4 = x_3 + _7;

Which should be easy to pattern match:
(plus:c (bit_xor:c @0 (neg (convert@1 (lt @0 zero_p)))) @1)
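The `int` dump above uses a plus rather than a minus: the mask is `-(int)(x < 0)`, but the value added back is the un-negated 0/1 result of the comparison. A minimal C++ sketch of that shape, with hypothetical function names, is below. It does the arithmetic in `unsigned` so that the `INT_MIN` case wraps rather than invoking signed-overflow undefined behavior, and it assumes two's-complement conversion back to `int` (guaranteed only from C++20 onward).

```cpp
#include <cassert>
#include <climits>

// Mirrors the int GIMPLE: _7 = (int)(x < 0); _8 = -_7; x_3 = x ^ _8; _4 = x_3 + _7.
inline int abs_xor_plus(int x) {
    unsigned b = x < 0 ? 1u : 0u;                     // _7: 0 or 1
    unsigned m = 0u - b;                              // _8: 0 or all-ones
    unsigned r = (static_cast<unsigned>(x) ^ m) + b;  // x_3 + _7, wrapping
    return static_cast<int>(r);
}

// Reference absolute value with the same wrapping behavior at INT_MIN.
inline int abs_reference(int x) {
    return x < 0 ? static_cast<int>(0u - static_cast<unsigned>(x)) : x;
}

// Compares the two on edge values and on a small range around zero.
inline bool agrees_on_samples() {
    const int edge[] = {INT_MIN, INT_MIN + 1, -1, 0, 1, INT_MAX - 1, INT_MAX};
    for (int x : edge) {
        if (abs_xor_plus(x) != abs_reference(x)) return false;
    }
    for (int x = -100000; x <= 100000; ++x) {
        if (abs_xor_plus(x) != abs_reference(x)) return false;
    }
    return true;
}
```

This only illustrates why the plus form computes the same value as the minus form (subtracting a mask of -1 equals adding 1). It says nothing about how a match.pd rule for it would actually need to be written.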
end of thread, other threads:[~2022-02-03  2:54 UTC]

Thread overview: 3+ messages
2022-02-03  2:29 [Bug tree-optimization/104360] New: Failure to optimize abs pattern on vector types gabravier at gmail dot com
2022-02-03  2:45 ` [Bug tree-optimization/104360] Failure to optimize abs pattern (x^(x<0?-1:0)) - (x<0?-1:0) pinskia at gcc dot gnu.org
2022-02-03  2:54 ` pinskia at gcc dot gnu.org