public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/104360] New: Failure to optimize abs pattern on vector types
@ 2022-02-03 2:29 gabravier at gmail dot com
2022-02-03 2:45 ` [Bug tree-optimization/104360] Failure to optimize abs pattern (x^(x<0?-1:0)) - (x<0?-1:0) pinskia at gcc dot gnu.org
2022-02-03 2:54 ` pinskia at gcc dot gnu.org
0 siblings, 2 replies; 3+ messages in thread
From: gabravier at gmail dot com @ 2022-02-03 2:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104360
Bug ID: 104360
Summary: Failure to optimize abs pattern on vector types
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
#include <stdint.h>

typedef int16_t v8i16 __attribute__((vector_size(16)));

v8i16 abs_i16(v8i16 x)
{
    auto isN = x < v8i16{};
    x ^= isN;
    return x - isN;
}
This can be optimized to use an abs instruction where one is available
(such as `pabsw` on x86-64, or `abs` on aarch64). I think v8i16 could be
replaced with any integer vector type and it would still work.
PS: this doesn't even necessarily require an abs instruction. On standard
x86-64 with -O3, GCC manages just this:
abs_i16(short __vector(8)):
        pxor    xmm1, xmm1
        pcmpgtw xmm1, xmm0
        pxor    xmm0, xmm1
        psubw   xmm0, xmm1
        ret
whereas LLVM outputs this:
abs_i16(short __vector(8)):
        pxor    xmm1, xmm1
        psubw   xmm1, xmm0
        pmaxsw  xmm0, xmm1
        ret
which I'm pretty sure is better.
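As a rough illustration of why the two instruction sequences above compute the same thing, here is a minimal scalar sketch of one 16-bit lane. The function names (`abs_xor_sub`, `abs_max_neg`, `forms_agree`) are made up for this example and do not come from GCC or LLVM. It assumes a two's-complement target where converting an out-of-range value to `int16_t` wraps, which C++20 guarantees and GCC and Clang have always done.

```cpp
#include <cassert>
#include <cstdint>

// The idiom from the report, applied to a single 16-bit lane.
// mask is all-ones for negative x and 0 otherwise, which is what the
// vector comparison `x < v8i16{}` produces per element.
static std::int16_t abs_xor_sub(std::int16_t x) {
    std::uint16_t mask = x < 0 ? 0xFFFFu : 0u;
    std::uint16_t ux = static_cast<std::uint16_t>(x);
    // Truncating back to 16 bits gives the wrapping behaviour of pxor/psubw.
    std::uint16_t r = static_cast<std::uint16_t>((ux ^ mask) - mask);
    return static_cast<std::int16_t>(r);
}

// The shape of the LLVM output: max(x, 0 - x) with a wrapping subtraction
// (psubw followed by pmaxsw).
static std::int16_t abs_max_neg(std::int16_t x) {
    std::uint16_t un = static_cast<std::uint16_t>(0u - static_cast<std::uint16_t>(x));
    std::int16_t neg = static_cast<std::int16_t>(un);
    return x > neg ? x : neg;
}

// Exhaustively compares both forms over all 65536 inputs.
static bool forms_agree() {
    for (int v = INT16_MIN; v <= INT16_MAX; ++v) {
        std::int16_t x = static_cast<std::int16_t>(v);
        if (abs_xor_sub(x) != abs_max_neg(x)) return false;
    }
    return true;
}
```

Both forms leave INT16_MIN unchanged, since its negation wraps back to itself in 16 bits.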
* [Bug tree-optimization/104360] Failure to optimize abs pattern (x^(x<0?-1:0)) - (x<0?-1:0)
From: pinskia at gcc dot gnu.org @ 2022-02-03 2:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104360
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement
            Summary|Failure to optimize abs     |Failure to optimize abs
                   |pattern on vector types     |pattern (x^(x<0?-1:0)) -
                   |                            |(x<0?-1:0)
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2022-02-03
             Status|UNCONFIRMED                 |NEW
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Even the scalar version is not optimized:
typedef short i16;

i16 abs_i16(i16 x)
{
    auto isN = -(x < 0);
    x ^= isN;
    return x - isN;
}
Shouldn't be too hard to optimize for both.
Funnily enough, clang/LLVM does not catch the scalar version either, unless
you write the mask as:
(x < i16{}) ? -1 : 0
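For reference, the two spellings of the mask produce the same value for every input, so the difference is only in what the optimizers recognise. A small sketch, with helper names (`mask_neg`, `mask_select`, `masks_agree`) invented for this example:

```cpp
#include <cassert>

typedef short i16;

// Mask written as a negated comparison, as in the scalar test case above.
static i16 mask_neg(i16 x) { return static_cast<i16>(-(x < 0)); }

// Mask written as a select, the spelling clang/LLVM is reported to handle.
static i16 mask_select(i16 x) { return (x < i16{}) ? i16(-1) : i16(0); }

// Compares the two spellings over the guaranteed range of short.
static bool masks_agree() {
    for (int v = -32768; v <= 32767; ++v) {
        i16 x = static_cast<i16>(v);
        if (mask_neg(x) != mask_select(x)) return false;
    }
    return true;
}
```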
* [Bug tree-optimization/104360] Failure to optimize abs pattern (x^(x<0?-1:0)) - (x<0?-1:0)
From: pinskia at gcc dot gnu.org @ 2022-02-03 2:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104360
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note it is easier to detect the vector version of this though:
isN_3 = x_2(D) < { 0, 0, 0, 0, 0, 0, 0, 0 };
x_4 = x_2(D) ^ isN_3;
_5 = x_4 - isN_3;
Pattern here:
(minus (bit_xor:c @0 (lt@1 @0 vector_zero_p)) @1)
than the scalar version:
_10 = x_6(D) < 0;
_11 = (int) _10;
_12 = -_11;
_1 = (short int) _12;
x_7 = _1 ^ x_6(D);
x.1_2 = (unsigned short) x_7;
_3 = (unsigned short) _12;
_4 = x.1_2 - _3;
_8 = (i16) _4;
That is because of the overflow handling: the subtraction is done in unsigned
short and the result is converted back to i16.
If we use -fwrapv, we get:
_7 = x_3(D) < 0;
_8 = (int) _7;
_9 = -_8;
_1 = (short int) _9;
x_4 = _1 ^ x_3(D);
_5 = x_4 - _1;
Where we could reduce _1 to just:
t = (short int) _7;
_1 = -t;
And then it is just pattern matching.
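The reduction of _1 suggested above relies on the mask being the same whether the negation happens before or after the narrowing. A minimal sketch of that equivalence, with hypothetical helper names (`mask_wide`, `mask_narrow`); note that C++ still promotes `-t` to int, whereas the GIMPLE form would negate directly in short:

```cpp
#include <cassert>

// Mask as currently computed: negate in int, then truncate to short.
static short mask_wide(bool is_neg) {
    int widened = is_neg;                // _8 = (int) _7
    int negated = -widened;              // _9 = -_8
    return static_cast<short>(negated);  // _1 = (short int) _9
}

// Reduced form: convert to short first, then negate.
static short mask_narrow(bool is_neg) {
    short t = is_neg;                    // t = (short int) _7
    return static_cast<short>(-t);       // _1 = -t
}
```

Since the comparison result is only ever 0 or 1, checking both values covers every case.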
For int we get:
_6 = x_2(D) < 0;
_7 = (int) _6;
_8 = -_7;
x_3 = x_2(D) ^ _8;
_4 = x_3 + _7;
Which should be easy to pattern match.
(plus:c (bit_xor:c @0 (neg (convert@1 (lt @0 zero_p)))) @1)
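To make the int case concrete, here is a small sketch that mirrors the GIMPLE above and compares it with `std::abs` on a few sample values. The names `abs_xor_plus` and `matches_std_abs` are invented for this example. INT_MIN is left out on purpose, because both `std::abs(INT_MIN)` and the final addition would overflow without -fwrapv.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>

// Mirrors the GIMPLE for int: _7 = (int)(x < 0); _8 = -_7; (x ^ _8) + _7.
// Adding the 0/1 value _7 is the same as subtracting the 0/-1 mask _8.
static int abs_xor_plus(int x) {
    int b = x < 0;     // _7: 0 or 1
    int mask = -b;     // _8: 0 or -1
    return (x ^ mask) + b;
}

// Spot-checks the idiom against std::abs, avoiding INT_MIN.
static bool matches_std_abs() {
    const int samples[] = {0, 1, -1, 2, -2, 12345, -12345,
                           INT_MAX, -INT_MAX};
    for (int x : samples) {
        if (abs_xor_plus(x) != std::abs(x)) return false;
    }
    return true;
}
```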