Public inbox for gcc-bugs@sourceware.org
[Bug c++/100929] New: gcc fails to optimize less to min for SIMD code
From: denis.yaroshevskij at gmail dot com @ 2021-06-06 9:39 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

Bug ID: 100929
Summary: gcc fails to optimize less to min for SIMD code
Product: gcc
Version: og10 (devel/omp/gcc-10)
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: denis.yaroshevskij at gmail dot com
Target Milestone: ---

Stand-alone float example (x86): https://godbolt.org/z/vr3cjvY5G
Using a library (x86 float and int, AArch64): https://godbolt.org/z/zPP48vzrq

A less-than compare + blend (or a greater-than compare + blend) should become a single min (or max) instruction.
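A minimal per-lane sketch of the equivalence claimed in the report, assuming signed 32-bit integer lanes, where there are no NaN or signed-zero corner cases. The names `less_blend`, `greater_blend` and `check_min_max` are illustrative only; each blend builds the comparison mask and then takes `y` where it is set, as a compare + blend pair does.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Lanes = std::array<std::int32_t, 8>;  // e.g. one 256-bit register as 8 x int32

// less + blend: mask = (y < x), then take y where the mask is set, else x.
Lanes less_blend(const Lanes& x, const Lanes& y) {
  Lanes r{};
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = (y[i] < x[i]) ? y[i] : x[i];
  return r;
}

// greater + blend: mask = (y > x), then take y where the mask is set, else x.
Lanes greater_blend(const Lanes& x, const Lanes& y) {
  Lanes r{};
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = (y[i] > x[i]) ? y[i] : x[i];
  return r;
}

// For integers both pairs collapse to a lane-wise min or max.
void check_min_max(const Lanes& x, const Lanes& y) {
  for (std::size_t i = 0; i < x.size(); ++i) {
    assert(less_blend(x, y)[i] == std::min(x[i], y[i]));
    assert(greater_blend(x, y)[i] == std::max(x[i], y[i]));
  }
}
```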

[Bug c++/100929] gcc fails to optimize less to min for SIMD code
From: glisse at gcc dot gnu.org @ 2021-06-06 14:29 UTC

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> ---
Please attach your testcases to the bug report. Godbolt links are nice complements, but are not considered sufficient here.

We don't lower the comparison or the blend in GIMPLE (yet). I think Hongtao Liu is working on blends right now. I don't know whether there would be issues with lowering the comparisons (with -ftrapping-math, for instance?).

If you write (x<y)?x:y with everything of type __m128, we do generate minps.
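The last suggestion can be tried without immintrin.h by using GCC's generic vector extensions. This is a sketch under stated assumptions: the `v4sf` typedef is illustrative (xmmintrin.h defines `__m128` as a 16-byte float vector in a similar way, additionally marked `may_alias`), and GCC accepts `?:` on vector operands in C++ only. Per the comment, on x86 this form is expected to compile to minps; the test below only checks lane values, not the generated instruction.

```cpp
#include <cassert>

// Four floats in 16 bytes, the same layout as __m128 but portable to other GCC targets.
typedef float v4sf __attribute__((vector_size(16)));

// x < y yields an integer vector mask (-1 or 0 per lane); ?: then selects lane by lane.
v4sf select_min(v4sf x, v4sf y) {
  return x < y ? x : y;
}

// Hypothetical helper used only to check the lanes in this sketch.
void expect_lanes(v4sf v, float a, float b, float c, float d) {
  assert(v[0] == a && v[1] == b && v[2] == c && v[3] == d);
}
```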

[Bug target/100929] gcc fails to optimize less to min for SIMD code
From: glisse at gcc dot gnu.org @ 2021-06-06 14:31 UTC

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

Marc Glisse <glisse at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|og10 (devel/omp/gcc-10)     |11.1.0
           Keywords|                            |missed-optimization
          Component|c++                         |target
           Severity|normal                      |enhancement
             Target|                            |x86_64-*-*

[Bug target/100929] gcc fails to optimize less to min for SIMD code
From: pinskia at gcc dot gnu.org @ 2021-06-06 20:18 UTC

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Original x86_64 testcase:

```
#include <immintrin.h>

__m256 if_else(__m256 x, __m256 y)
{
  __m256 mask = _mm256_cmp_ps(y, x, _CMP_LT_OQ);
  return _mm256_blendv_ps(x, y, mask);
}

__m256 min(__m256 x, __m256 y)
{
  return _mm256_min_ps(x, y);
}
```

Note that the other testcase uses eve, and I have no idea where that comes from.
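For floats, operand order matters. A scalar one-lane sketch, assuming Intel's documented MINPS rule ("if src1 < src2 then src1 else src2", so the second operand wins when either input is NaN or both are zeros of either sign): the blend in `if_else` computes `(y < x) ? y : x`, which matches `_mm256_min_ps(y, x)` rather than the `_mm256_min_ps(x, y)` used in `min`. Both are a single vminps; they differ only for NaN and signed zeros. The model ignores floating-point exception flags, which is where -ftrapping-math could still matter. All function names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstring>
#include <limits>

// One lane of MINPS/VMINPS as documented: if (src1 < src2) src1 else src2.
float minps_lane(float src1, float src2) { return src1 < src2 ? src1 : src2; }

// One lane of if_else(): _CMP_LT_OQ gives "y < x" (false when either is NaN),
// and _mm256_blendv_ps takes y where the mask is set, otherwise x.
float if_else_lane(float x, float y) {
  bool mask = y < x;
  return mask ? y : x;
}

// Compare bit patterns so that NaN and the sign of zero are checked too.
bool same_bits(float a, float b) { return std::memcmp(&a, &b, sizeof a) == 0; }

// The blend should match min with swapped operands, i.e. _mm256_min_ps(y, x).
void check_lane(float x, float y) { assert(same_bits(if_else_lane(x, y), minps_lane(y, x))); }

void check_edge_cases() {
  const float nan = std::numeric_limits<float>::quiet_NaN();
  check_lane(1.0f, 2.0f);
  check_lane(nan, 1.0f);
  check_lane(1.0f, nan);
  check_lane(0.0f, -0.0f);
  check_lane(-0.0f, 0.0f);
  // The testcase's min(x, y), modelled as minps_lane(x, y), differs on these inputs.
  assert(std::isnan(if_else_lane(nan, 1.0f)) && minps_lane(nan, 1.0f) == 1.0f);
  assert(!std::signbit(if_else_lane(0.0f, -0.0f)) && std::signbit(minps_lane(0.0f, -0.0f)));
}
```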

[Bug target/100929] gcc fails to optimize less to min for SIMD code
From: denis.yaroshevskij at gmail dot com @ 2021-06-06 22:29 UTC

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #3 from Denis Yaroshevskiy <denis.yaroshevskij at gmail dot com> ---
> Please attach your testcases to the bug report.

Is what @Andrew Pinski copied enough? I can attach the same code as a file.

> I don't know if there would be issues for comparisons (with -ftrapping-math for instance?).

-ftrapping-math causes clang to stop doing this optimisation. I can see that clang does it without the flag, so I assume NaNs are OK without it. For ints this is definitely OK.

> Note the other testcase is using eve which I have no idea what it is coming from.

Using eve was just much easier than writing this with intrinsics. The point was that on x86:

```
vpcmpgtd  ymm2, ymm0, ymm1
vpblendvb ymm0, ymm0, ymm1, ymm2
```

should become

```
vpminsd ymm0, ymm1, ymm0
```

On Arm:

```
cmgt v2.4s, v0.4s, v1.4s
bit  v0.16b, v1.16b, v2.16b
```

should become

```
smin v0.4s, v1.4s, v0.4s
```

and

```
fcmgt v2.4s, v0.4s, v1.4s
bit   v0.16b, v1.16b, v2.16b
```

should become

```
fmin v0.4s, v1.4s, v0.4s
```

I don't really know how this is done in gcc, but all these examples look like the same issue. If it would be very helpful to write all of them with intrinsics, I can.
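Read per 32-bit lane, the Arm integer pair works like this: cmgt sets every bit of a lane in v2 when v0 > v1, and bit (bitwise insert if true) copies bits of v1 into v0 wherever v2 has a 1 bit. Because the mask is all ones or all zeros per lane, inserting bits is the same as selecting the whole lane, so the result is `x > y ? y : x`, which is what smin computes. A scalar one-lane sketch with illustrative helper names; only the integer pair is modelled, since the fcmgt/fmin pair would additionally have to account for NaN and signed zeros.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// One 32-bit lane of CMGT (signed greater than): all ones if n > m, else zero.
std::uint32_t cmgt_lane(std::int32_t n, std::int32_t m) {
  return n > m ? 0xFFFFFFFFu : 0u;
}

// BIT on one 32-bit slice: where a mask bit is 1 the destination bit is
// replaced by the source bit, otherwise it is kept.
std::uint32_t bit_lane(std::uint32_t dst, std::uint32_t src, std::uint32_t mask) {
  return (dst & ~mask) | (src & mask);
}

// cmgt v2.4s, v0.4s, v1.4s ; bit v0.16b, v1.16b, v2.16b  with v0 = x, v1 = y.
std::int32_t cmgt_bit_lane(std::int32_t x, std::int32_t y) {
  const std::uint32_t mask = cmgt_lane(x, y);
  const std::uint32_t r = bit_lane(static_cast<std::uint32_t>(x),
                                   static_cast<std::uint32_t>(y), mask);
  return static_cast<std::int32_t>(r);  // modular conversion (GCC; guaranteed in C++20)
}

// smin v0.4s, v1.4s, v0.4s computes the signed minimum of each lane.
void check_smin(std::int32_t x, std::int32_t y) {
  assert(cmgt_bit_lane(x, y) == std::min(x, y));
}
```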

[Bug target/100929] gcc fails to optimize less to min for SIMD code
From: glisse at gcc dot gnu.org @ 2021-06-07 6:02 UTC

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #4 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Denis Yaroshevskiy from comment #3)
> Is what @Andrew Pinski copied enough?

I think so (it is missing the command line), although one example with an integer type could also help in case floats turn out to have a different issue.

> -ftrapping-math causes clang to stop doing this optimisation.

Note that -ftrapping-math is on by default with gcc (PR 54192), but -fno-trapping-math wouldn't solve your problem; we are missing other things.

[Bug target/100929] gcc fails to optimize less to min for SIMD code
From: denis.yaroshevskij at gmail dot com @ 2021-06-07 8:00 UTC

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #5 from Denis Yaroshevskiy <denis.yaroshevskij at gmail dot com> ---
x86 (https://godbolt.org/z/zPWbnqfPY)
Options: -O3 -mavx2

```
#include <immintrin.h>

__m256 if_else_float(__m256 x, __m256 y) {
  __m256 mask = _mm256_cmp_ps(y, x, _CMP_LT_OQ);
  return _mm256_blendv_ps(x, y, mask);
}

__m256 min_float(__m256 x, __m256 y) {
  return _mm256_min_ps(x, y);
}

__m256i if_else_int(__m256i x, __m256i y) {
  __m256i mask = _mm256_cmpgt_epi32(x, y);
  return _mm256_blendv_epi8(x, y, mask);
}

__m256i min_int(__m256i x, __m256i y) {
  return _mm256_min_epi32(x, y);
}
```
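In the integer testcase the compare and the blend work at different granularities: `_mm256_cmpgt_epi32` produces an all-ones or all-zeros 32-bit lane, while `_mm256_blendv_epi8` picks each byte from its second operand when the top bit of the corresponding mask byte is set. A scalar sketch, using `std::array` stand-ins for `__m256i` and illustrative `*_model` names, shows that every byte of a lane gets the same choice, so `if_else_int` computes exactly what `min_int` does.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using I32x8 = std::array<std::int32_t, 8>;  // stand-in for __m256i as 8 x int32
using U8x32 = std::array<std::uint8_t, 32>;  // the same 256 bits as 32 bytes

U8x32 as_bytes(const I32x8& v) {
  U8x32 b;
  std::memcpy(b.data(), v.data(), sizeof b);
  return b;
}

I32x8 as_i32(const U8x32& b) {
  I32x8 v;
  std::memcpy(v.data(), b.data(), sizeof v);
  return v;
}

// _mm256_cmpgt_epi32: a lane of all ones (-1) where a > b, else zero.
I32x8 cmpgt_epi32_model(const I32x8& a, const I32x8& b) {
  I32x8 r{};
  for (int i = 0; i < 8; ++i) r[i] = a[i] > b[i] ? -1 : 0;
  return r;
}

// _mm256_blendv_epi8: per byte, take b if the mask byte's top bit is set, else a.
I32x8 blendv_epi8_model(const I32x8& a, const I32x8& b, const I32x8& mask) {
  const U8x32 ab = as_bytes(a), bb = as_bytes(b), mb = as_bytes(mask);
  U8x32 r{};
  for (int i = 0; i < 32; ++i) r[i] = (mb[i] & 0x80) ? bb[i] : ab[i];
  return as_i32(r);
}

// if_else_int(): compare as int32 lanes, blend as bytes.
I32x8 if_else_int_model(const I32x8& x, const I32x8& y) {
  return blendv_epi8_model(x, y, cmpgt_epi32_model(x, y));
}

// min_int(): _mm256_min_epi32 is a lane-wise signed 32-bit minimum.
I32x8 min_int_model(const I32x8& x, const I32x8& y) {
  I32x8 r{};
  for (int i = 0; i < 8; ++i) r[i] = std::min(x[i], y[i]);
  return r;
}

void check_same(const I32x8& x, const I32x8& y) {
  assert(if_else_int_model(x, y) == min_int_model(x, y));
}
```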

[Bug target/100929] gcc fails to optimize less to min for SIMD code
From: glisse at gcc dot gnu.org @ 2022-04-03 12:56 UTC

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #6 from Marc Glisse <glisse at gcc dot gnu.org> ---
(blend is now lowered in GIMPLE)

For the integer case, the mix of vector(int) and vector(char) obfuscates things a bit. We have:

```
__m256i if_else_int (__m256i x, __m256i y)
{
  vector(32) char _4;
  vector(32) char _5;
  vector(32) char _6;
  vector(32) <signed-boolean:8> _7;
  vector(32) char _8;
  vector(4) long long int _9;
  vector(8) int _10;
  vector(8) int _11;
  vector(8) <signed-boolean:32> _12;
  vector(8) int _13;

  <bb 2> [local count: 1073741824]:
  _10 = VIEW_CONVERT_EXPR<vector(8) int>(x_2(D));
  _11 = VIEW_CONVERT_EXPR<vector(8) int>(y_3(D));
  _12 = _10 > _11;
  _13 = VEC_COND_EXPR <_12, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }>;
  _5 = VIEW_CONVERT_EXPR<vector(32) char>(_13);
  _4 = VIEW_CONVERT_EXPR<vector(32) char>(y_3(D));
  _6 = VIEW_CONVERT_EXPR<vector(32) char>(x_2(D));
  _7 = _5 < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _8 = VEC_COND_EXPR <_7, _4, _6>;
  _9 = VIEW_CONVERT_EXPR<__m256i>(_8);
  return _9;
}
```

A first step would be to teach gcc that it can do a VEC_COND_EXPR<_12, _11, _10> with fewer VIEW_CONVERT_EXPRs (maybe follow the definition chain of the condition through trivial ops like <0, view_convert or ?-1:0 until we find a real comparison _10 > _11, to determine the right size?).

Other steps:
* Move (or at least partially copy) fold_cond_expr_with_comparison to match.pd so we can recognize min/max.
* Lower __builtin_ia32_cmpps256 (y_2(D), x_3(D), 17) to GIMPLE for the float case, if that's a valid thing to do (NaN, etc).
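The suggested first step (follow the condition through view_convert, < 0 and ?-1:0 until a real comparison shows up) and the min recognition can be pictured on a toy expression tree. This is a hypothetical sketch, not GCC's GIMPLE API or match.pd syntax: the `Op` kinds and the helpers `strip`, `find_comparison` and `fold_to_min` are invented for illustration. It ignores element widths (treating a byte-wise < 0 on an int-lane mask as the mask itself is only valid because each lane's bytes agree), and it assumes integer semantics; a floating-point version would also need NaN and signed-zero checks.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical node kinds, loosely named after the GIMPLE in the dump.
enum class Op {
  Leaf,          // a parameter such as x_2(D)
  ViewConvert,   // VIEW_CONVERT_EXPR: bit reinterpretation
  Gt, Lt,        // a > b, a < b
  LtZero,        // mask < { 0, ... }: sign bit of each element
  MaskFromBool,  // VEC_COND_EXPR <c, { -1, ... }, { 0, ... }>
  VecCond,       // VEC_COND_EXPR <c, t, f>
  Min            // MIN_EXPR <a, b>
};

struct Node {
  Op op;
  std::string name;  // only meaningful for Leaf
  std::shared_ptr<Node> a, b, c;
};
using Ptr = std::shared_ptr<Node>;

Ptr leaf(const std::string& n) {
  return std::make_shared<Node>(Node{Op::Leaf, n, nullptr, nullptr, nullptr});
}
Ptr mk(Op op, Ptr a, Ptr b = nullptr, Ptr c = nullptr) {
  return std::make_shared<Node>(Node{op, "", a, b, c});
}

// Look through bit reinterpretations (element widths are not checked here).
Ptr strip(Ptr n) {
  while (n && n->op == Op::ViewConvert) n = n->a;
  return n;
}

// Walk the condition's definition chain through mask-preserving ops.
Ptr find_comparison(Ptr cond) {
  for (Ptr n = cond; n; n = n->a) {
    if (n->op == Op::Gt || n->op == Op::Lt) return n;
    if (n->op != Op::ViewConvert && n->op != Op::LtZero && n->op != Op::MaskFromBool)
      return nullptr;
  }
  return nullptr;
}

bool same_value(Ptr p, Ptr q) {
  p = strip(p);
  q = strip(q);
  return p == q || (p && q && p->op == Op::Leaf && q->op == Op::Leaf && p->name == q->name);
}

// Recognize VEC_COND <p > q, q, p> and VEC_COND <p < q, p, q> as MIN <p, q>.
Ptr fold_to_min(Ptr vc) {
  if (!vc || vc->op != Op::VecCond) return nullptr;
  Ptr cmp = find_comparison(vc->a);
  if (!cmp) return nullptr;
  Ptr p = cmp->a, q = cmp->b;
  bool is_min = (cmp->op == Op::Gt && same_value(vc->b, q) && same_value(vc->c, p)) ||
                (cmp->op == Op::Lt && same_value(vc->b, p) && same_value(vc->c, q));
  return is_min ? mk(Op::Min, strip(p), strip(q)) : nullptr;
}

// Rebuilds the chain _10 ... _8 from the if_else_int dump and checks the fold.
void check_dump_chain() {
  Ptr x = leaf("x_2(D)"), y = leaf("y_3(D)");
  Ptr gt = mk(Op::Gt, mk(Op::ViewConvert, x), mk(Op::ViewConvert, y));  // _12
  Ptr mask = mk(Op::MaskFromBool, gt);                                   // _13
  Ptr cond = mk(Op::LtZero, mk(Op::ViewConvert, mask));                  // _7
  Ptr sel = mk(Op::VecCond, cond, mk(Op::ViewConvert, y), mk(Op::ViewConvert, x));  // _8
  Ptr m = fold_to_min(sel);
  assert(m && m->op == Op::Min && m->a == x && m->b == y);
  // Swapped arms give x > y ? x : y, a max, which this sketch does not fold.
  assert(!fold_to_min(mk(Op::VecCond, cond, mk(Op::ViewConvert, x), mk(Op::ViewConvert, y))));
}
```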