public inbox for gcc-bugs@sourceware.org
* [Bug c++/100929] New: gcc fails to optimize less to min for SIMD code
From: denis.yaroshevskij at gmail dot com @ 2021-06-06 9:39 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929
Bug ID: 100929
Summary: gcc fails to optimize less to min for SIMD code
Product: gcc
Version: og10 (devel/omp/gcc-10)
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: denis.yaroshevskij at gmail dot com
Target Milestone: ---
Standalone float example (x86):
https://godbolt.org/z/vr3cjvY5G
Using a library (x86 float and int, AArch64): https://godbolt.org/z/zPP48vzrq
A less-than or greater-than comparison followed by a blend should become a single min or max instruction.
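A minimal scalar sketch of the requested fold, assuming each SIMD register is modeled as a `std::array` of eight `int32_t` lanes (the type and function names are hypothetical, for illustration only). Lane by lane, a comparison followed by a blend picks exactly the same value as `std::min` or `std::max`, so for integers the pair can be replaced by one min/max instruction.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of one 256-bit register holding eight signed 32-bit lanes.
using Lanes = std::array<std::int32_t, 8>;

// "less + blend": mask = (y < x) in each lane, then take y where the mask is set.
Lanes less_then_blend(const Lanes& x, const Lanes& y) {
    Lanes r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        const bool mask = y[i] < x[i];
        r[i] = mask ? y[i] : x[i];
    }
    return r;
}

// "greater + blend": mask = (y > x) in each lane, then take y where the mask is set.
Lanes greater_then_blend(const Lanes& x, const Lanes& y) {
    Lanes r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        const bool mask = y[i] > x[i];
        r[i] = mask ? y[i] : x[i];
    }
    return r;
}

// What a single min instruction computes per lane.
Lanes lane_min(const Lanes& x, const Lanes& y) {
    Lanes r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = std::min(x[i], y[i]);
    return r;
}

// What a single max instruction computes per lane.
Lanes lane_max(const Lanes& x, const Lanes& y) {
    Lanes r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = std::max(x[i], y[i]);
    return r;
}
```

For floating-point lanes the same identity needs extra care with NaNs and signed zeros.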
* [Bug c++/100929] gcc fails to optimize less to min for SIMD code
From: glisse at gcc dot gnu.org @ 2021-06-06 14:29 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929
--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> ---
Please attach your testcases to the bug report. godbolt links are nice
complements, but not considered sufficient here.
We don't lower the comparison or the blend in GIMPLE (yet). I think Hongtao Liu
is working on blends right now. I don't know if there would be issues for
comparisons (with -ftrapping-math, for instance?).
If you write (x<y)?x:y with everything of type __m128, we do generate minps.
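A sketch of the form described in comment #1, written with GCC's generic vector extension in C++ (GCC accepts a vector condition in `?:` only in C++). The `v4sf` typedef is an assumption chosen to match the layout of `__m128` (four floats in 16 bytes); whether `minps` is actually emitted depends on the GCC version and flags, so check the generated assembly.

```cpp
#include <cassert>

// Four floats in 16 bytes: the same layout as __m128 (hypothetical typedef name).
typedef float v4sf __attribute__((vector_size(16)));

// Per-lane select written as a vector ternary: the comparison yields a mask
// vector, and ?: picks x where x < y holds and y otherwise.
v4sf select_min(v4sf x, v4sf y) {
    return x < y ? x : y;
}
```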
* [Bug target/100929] gcc fails to optimize less to min for SIMD code
From: glisse at gcc dot gnu.org @ 2021-06-06 14:31 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929
Marc Glisse <glisse at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Version|og10 (devel/omp/gcc-10) |11.1.0
Keywords| |missed-optimization
Component|c++ |target
Severity|normal |enhancement
Target| |x86_64-*-*
* [Bug target/100929] gcc fails to optimize less to min for SIMD code
From: pinskia at gcc dot gnu.org @ 2021-06-06 20:18 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Original x86_64 testcase:
```
#include <immintrin.h>
__m256 if_else(__m256 x, __m256 y) {
    __m256 mask = _mm256_cmp_ps(y, x, _CMP_LT_OQ);
    return _mm256_blendv_ps(x, y, mask);
}
__m256 min(__m256 x, __m256 y) {
    return _mm256_min_ps(x, y);
}
```
Note that the other testcase uses eve, and I have no idea where that comes from.
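To see why `if_else` computes a minimum, here is a scalar model of the two intrinsics, assuming their documented behavior: `_mm256_cmp_ps` with `_CMP_LT_OQ` produces an all-ones lane where the ordered comparison holds and zero otherwise, and `_mm256_blendv_ps` takes the second source wherever the mask lane's most significant bit is set. Masks are kept as `uint32_t` bit patterns; all names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using F8 = std::array<float, 8>;            // eight float lanes of a __m256
using Mask8 = std::array<std::uint32_t, 8>; // the same lanes viewed as bit patterns

// Model of _mm256_cmp_ps(a, b, _CMP_LT_OQ): all ones where a < b (false if
// either lane is NaN), zero otherwise.
Mask8 cmp_lt_oq(const F8& a, const F8& b) {
    Mask8 m{};
    for (std::size_t i = 0; i < m.size(); ++i) m[i] = (a[i] < b[i]) ? 0xFFFFFFFFu : 0u;
    return m;
}

// Model of _mm256_blendv_ps(a, b, mask): b where the mask lane's top bit is set, a otherwise.
F8 blendv_ps(const F8& a, const F8& b, const Mask8& mask) {
    F8 r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = (mask[i] >> 31) ? b[i] : a[i];
    return r;
}

// The testcase's if_else: y where y < x, x otherwise, i.e. the smaller value.
F8 if_else(const F8& x, const F8& y) {
    return blendv_ps(x, y, cmp_lt_oq(y, x));
}
```

Testing the mask's top bit is also why the GIMPLE shown later in this thread reinterprets the mask as integers and compares it with `< { 0, ... }`.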
* [Bug target/100929] gcc fails to optimize less to min for SIMD code
From: denis.yaroshevskij at gmail dot com @ 2021-06-06 22:29 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929
--- Comment #3 from Denis Yaroshevskiy <denis.yaroshevskij at gmail dot com> ---
> Please attach your testcases to the bug report.
Is what @Andrew Pinski copied enough? I can attach the same code as a file.
> I don't know if there would be issues for comparisons (with -ftrapping-math for instance?).
-ftrapping-math causes clang to stop doing this optimisation.
Without this flag clang does it, so I assume NaNs are OK. For ints it is
certainly OK.
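Whether NaNs are OK also depends on operand order. Assuming the per-lane MINPS semantics documented by Intel (the result is `a < b ? a : b`, so the second operand is returned when either input is NaN or both are zeros), the scalar sketch below shows that `if_else(x, y)` from the testcase above matches `_mm256_min_ps(y, x)` rather than `_mm256_min_ps(x, y)`. It does not model floating-point exception flags, which are what -ftrapping-math is about. Function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// One lane of MINPS with operands (a, b), as documented: a < b ? a : b.
// If either input is NaN, or both are zeros of any sign, b is returned.
float minps_lane(float a, float b) { return a < b ? a : b; }

// One lane of if_else: cmp(y, x, _CMP_LT_OQ) followed by blendv(x, y, mask).
// This is the same expression as minps_lane(y, x), for every input.
float if_else_lane(float x, float y) { return y < x ? y : x; }
```

With a NaN or a pair of opposite-signed zeros, swapping the operands of the min changes the result, so a fold has to pick the operand order that keeps the blend's behavior.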
> Note the other testcase is using eve which I have no idea what it is coming from.
Using eve was just much easier than writing this with intrinsics.
The point was:
```
vpcmpgtd ymm2, ymm0, ymm1
vpblendvb ymm0, ymm0, ymm1, ymm2
```
should become
```
vpminsd ymm0, ymm1, ymm0
```
And on AArch64:
```
cmgt v2.4s, v0.4s, v1.4s
bit v0.16b, v1.16b, v2.16b
```
should become
```
smin v0.4s, v1.4s, v0.4s
```
And
```
fcmgt v2.4s, v0.4s, v1.4s
bit v0.16b, v1.16b, v2.16b
```
should become
```
fmin v0.4s, v1.4s, v0.4s
```
I don't really know how this is done in `gcc`, but all these examples look like
the same issue. If it would help to write all of them with intrinsics, I
can.
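The AArch64 integer sequence can be checked the same way. A scalar sketch, assuming the architectural definitions: `cmgt` sets a lane to all ones when the first source is greater (signed) than the second, and `bit` (bitwise insert if true) copies each bit of its first source into the destination where the corresponding bit of its second source is 1, leaving the other destination bits unchanged. Names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4 = std::array<std::int32_t, 4>;  // a .4s vector register
using M4 = std::array<std::uint32_t, 4>; // the same lanes as raw bits

// cmgt vd.4s, vn.4s, vm.4s: all ones where vn > vm (signed), zero otherwise.
M4 cmgt(const V4& n, const V4& m) {
    M4 r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = (n[i] > m[i]) ? 0xFFFFFFFFu : 0u;
    return r;
}

// bit vd, vn, vm: insert bits of vn into vd where vm has a 1 bit, keep vd elsewhere.
V4 bit(const V4& d, const V4& n, const M4& m) {
    V4 r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        const std::uint32_t bits = (static_cast<std::uint32_t>(d[i]) & ~m[i]) |
                                   (static_cast<std::uint32_t>(n[i]) & m[i]);
        r[i] = static_cast<std::int32_t>(bits);
    }
    return r;
}

// cmgt v2, v0, v1 ; bit v0, v1, v2  (the pair quoted in the comment)
V4 cmgt_then_bit(const V4& v0, const V4& v1) { return bit(v0, v1, cmgt(v0, v1)); }

// smin: lane-wise signed minimum.
V4 smin(const V4& a, const V4& b) {
    V4 r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] < b[i] ? a[i] : b[i];
    return r;
}
```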
* [Bug target/100929] gcc fails to optimize less to min for SIMD code
From: glisse at gcc dot gnu.org @ 2021-06-07 6:02 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929
--- Comment #4 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Denis Yaroshevskiy from comment #3)
> Is what @Andrew Pinski copied enough?
I think so (it is missing the command line), although one example with an
integer type could also help in case floats turn out to have a different issue.
> -ftrapping-math causes clang to stop doing this optimisation.
Note that -ftrapping-math is on by default with gcc (PR 54192), but
-fno-trapping-math wouldn't solve your problem; we are missing other things.
* [Bug target/100929] gcc fails to optimize less to min for SIMD code
From: denis.yaroshevskij at gmail dot com @ 2021-06-07 8:00 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929
--- Comment #5 from Denis Yaroshevskiy <denis.yaroshevskij at gmail dot com> ---
x86 (https://godbolt.org/z/zPWbnqfPY)
Options: -O3 -mavx2
```
#include <immintrin.h>
__m256 if_else_float(__m256 x, __m256 y) {
    __m256 mask = _mm256_cmp_ps(y, x, _CMP_LT_OQ);
    return _mm256_blendv_ps(x, y, mask);
}
__m256 min_float(__m256 x, __m256 y) {
    return _mm256_min_ps(x, y);
}
__m256i if_else_int(__m256i x, __m256i y) {
    __m256i mask = _mm256_cmpgt_epi32(x, y);
    return _mm256_blendv_epi8(x, y, mask);
}
__m256i min_int(__m256i x, __m256i y) {
    return _mm256_min_epi32(x, y);
}
```
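In the integer version, `_mm256_blendv_epi8` selects per byte while `_mm256_cmpgt_epi32` produces per-32-bit masks. A scalar sketch (hypothetical names, assuming the documented semantics: cmpgt gives -1 or 0 per 32-bit lane, and blendv_epi8 takes the byte of the second source when the mask byte's top bit is set) shows that the byte-wise blend still amounts to a 32-bit lane select, because each mask lane is all ones or all zeros.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using I8 = std::array<std::int32_t, 8>;     // eight int lanes of a __m256i
using Bytes = std::array<std::uint8_t, 32>; // the same 32 bytes

// Model of _mm256_cmpgt_epi32(a, b): -1 (all bits set) where a > b, 0 otherwise.
I8 cmpgt_epi32(const I8& a, const I8& b) {
    I8 r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = (a[i] > b[i]) ? -1 : 0;
    return r;
}

// Model of _mm256_blendv_epi8(a, b, mask): per byte, b if the mask byte's top bit is set.
I8 blendv_epi8(const I8& a, const I8& b, const I8& mask) {
    Bytes ba{}, bb{}, bm{}, br{};
    std::memcpy(ba.data(), a.data(), ba.size());
    std::memcpy(bb.data(), b.data(), bb.size());
    std::memcpy(bm.data(), mask.data(), bm.size());
    for (std::size_t i = 0; i < br.size(); ++i) br[i] = (bm[i] & 0x80u) ? bb[i] : ba[i];
    I8 r{};
    std::memcpy(r.data(), br.data(), br.size());
    return r;
}

// if_else_int from the testcase: y where x > y, x otherwise.
I8 if_else_int(const I8& x, const I8& y) {
    return blendv_epi8(x, y, cmpgt_epi32(x, y));
}
```

This int/byte mixture is what later shows up as `vector(8) int` and `vector(32) char` in the GIMPLE dump.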
* [Bug target/100929] gcc fails to optimize less to min for SIMD code
From: glisse at gcc dot gnu.org @ 2022-04-03 12:56 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929
--- Comment #6 from Marc Glisse <glisse at gcc dot gnu.org> ---
(Blend is now lowered in GIMPLE.)
For the integer case, the mix of vector(int) and vector(char) obfuscates things
a bit. We have:
```
__m256i if_else_int (__m256i x, __m256i y)
{
  vector(32) char _4;
  vector(32) char _5;
  vector(32) char _6;
  vector(32) <signed-boolean:8> _7;
  vector(32) char _8;
  vector(4) long long int _9;
  vector(8) int _10;
  vector(8) int _11;
  vector(8) <signed-boolean:32> _12;
  vector(8) int _13;

  <bb 2> [local count: 1073741824]:
  _10 = VIEW_CONVERT_EXPR<vector(8) int>(x_2(D));
  _11 = VIEW_CONVERT_EXPR<vector(8) int>(y_3(D));
  _12 = _10 > _11;
  _13 = VEC_COND_EXPR <_12, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }>;
  _5 = VIEW_CONVERT_EXPR<vector(32) char>(_13);
  _4 = VIEW_CONVERT_EXPR<vector(32) char>(y_3(D));
  _6 = VIEW_CONVERT_EXPR<vector(32) char>(x_2(D));
  _7 = _5 < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _8 = VEC_COND_EXPR <_7, _4, _6>;
  _9 = VIEW_CONVERT_EXPR<__m256i>(_8);
  return _9;
}
```
A first step would be to teach gcc that it can do a VEC_COND_EXPR<_12, _11,
_10> with fewer VIEW_CONVERT_EXPRs (maybe by following the definition chain of
the condition through trivial ops like `< 0`, view_convert or `? -1 : 0` until
we find a real comparison _10 > _11, to determine the right size?).
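A sketch of the shape that first step aims for, using GCC's vector extension in C++: one comparison on the int lanes feeding one select, with no detour through `vector(32) char`. In the dump's names this is VEC_COND_EXPR <_10 > _11, _11, _10>, i.e. `x > y ? y : x`, which is the pattern min/max recognition would need to turn into MIN_EXPR. Four lanes (16 bytes) are used instead of eight only to keep the sketch portable; the typedef and function names are assumptions.

```cpp
#include <cassert>

// Four signed ints in 16 bytes (hypothetical typedef; the dump uses vector(8) int).
typedef int v4si __attribute__((vector_size(16)));

// x > y ? y : x per lane: a single compare feeding a single select on int lanes.
v4si select_min_int(v4si x, v4si y) {
    return x > y ? y : x;
}
```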
Other steps:
* Move (or at least partially copy) fold_cond_expr_with_comparison to match.pd
  so we can recognize min/max.
* Lower __builtin_ia32_cmpps256 (y_2(D), x_3(D), 17) to GIMPLE for the float
  case, if that's a valid thing to do (NaN, etc.).
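On the NaN question for the float case: predicate 17 (0x11) is `_CMP_LT_OQ`, an ordered, quiet less-than. Its per-lane truth table matches C++'s `std::isless`, which, unlike `<`, is specified not to raise the invalid exception when the operands are unordered (quiet NaNs). A scalar sketch, with a hypothetical function name:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// One lane of _CMP_LT_OQ: true only if neither input is NaN and a < b.
// std::isless gives the same result without raising FE_INVALID for quiet NaNs.
bool lt_oq_lane(float a, float b) { return std::isless(a, b); }
```

Being quiet is what makes the lowering delicate under -ftrapping-math, which comment #4 notes is on by default in GCC.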
* [Bug target/100929] gcc fails to optimize less to min for SIMD code
From: pinskia at gcc dot gnu.org @ 2024-07-19 21:36 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2024-07-19
Ever confirmed|0 |1
--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
if_else_int is optimized starting in GCC 14:
```
_4 = MIN_EXPR <_6, _7>;
```
if_else_float is still not:
```
_7 = __builtin_ia32_cmpps256 (y_2(D), x_3(D), 17);
_5 = VIEW_CONVERT_EXPR<vector(8) int>(_7);
_4 = _5 < { 0, 0, 0, 0, 0, 0, 0, 0 };
_6 = .VCOND_MASK (_4, y_2(D), x_3(D));
```
But that is a target issue.