public inbox for gcc-bugs@sourceware.org
* [Bug c++/100929] New: gcc fails to optimize less to min for SIMD code
@ 2021-06-06  9:39 denis.yaroshevskij at gmail dot com
  2021-06-06 14:29 ` [Bug c++/100929] " glisse at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: denis.yaroshevskij at gmail dot com @ 2021-06-06  9:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

            Bug ID: 100929
           Summary: gcc fails to optimize less to min for SIMD code
           Product: gcc
           Version: og10 (devel/omp/gcc-10)
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: denis.yaroshevskij at gmail dot com
  Target Milestone: ---

Standalone float example for x86:
https://godbolt.org/z/vr3cjvY5G

The same using a library (x86 float and int, aarch64): https://godbolt.org/z/zPP48vzrq

A less-than compare followed by a blend (or a greater-than compare followed by
a blend) should become min/max.


* [Bug c++/100929] gcc fails to optimize less to min for SIMD code
  2021-06-06  9:39 [Bug c++/100929] New: gcc fails to optimize less to min for SIMD code denis.yaroshevskij at gmail dot com
@ 2021-06-06 14:29 ` glisse at gcc dot gnu.org
  2021-06-06 14:31 ` [Bug target/100929] " glisse at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: glisse at gcc dot gnu.org @ 2021-06-06 14:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> ---
Please attach your testcases to the bug report. godbolt links are nice
complements, but not considered sufficient here.

We don't lower the comparison or the blend in GIMPLE (yet). I think Hongtao Liu
is doing blends right now. I don't know if there would be issues for
comparisons (with -ftrapping-math for instance?).

If you write (x<y)?x:y with everything of type __m128, we do generate minps.
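
For illustration, the form mentioned above could be sketched with GCC's generic
vector extensions. The type name `v4sf` and the function names are assumptions
made for this sketch (the comment itself only mentions `__m128`), and which
instruction is actually emitted depends on the compiler version and flags.

```cpp
#include <cassert>

// Assumption: a 16-byte float vector declared with GCC's vector_size
// attribute stands in for __m128, so the sketch needs no x86 headers.
typedef float v4sf __attribute__((vector_size(16)));

// Lane-wise (x < y) ? x : y, written with the vector conditional
// operator that GCC accepts in C++.
v4sf select_min(v4sf x, v4sf y) {
  return (x < y) ? x : y;
}

// Small self-check on ordinary (non-NaN) inputs.
void check_select_min() {
  const v4sf a = {1.0f, 5.0f, -2.0f, 0.5f};
  const v4sf b = {2.0f, 3.0f, -2.5f, 0.5f};
  const v4sf r = select_min(a, b);
  assert(r[0] == 1.0f);
  assert(r[1] == 3.0f);
  assert(r[2] == -2.5f);
  assert(r[3] == 0.5f);
}
```

Compiling `select_min` with something like `-O2` and inspecting the assembly is
one way to see whether a single min instruction is produced.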


* [Bug target/100929] gcc fails to optimize less to min for SIMD code
  2021-06-06  9:39 [Bug c++/100929] New: gcc fails to optimize less to min for SIMD code denis.yaroshevskij at gmail dot com
  2021-06-06 14:29 ` [Bug c++/100929] " glisse at gcc dot gnu.org
@ 2021-06-06 14:31 ` glisse at gcc dot gnu.org
  2021-06-06 20:18 ` pinskia at gcc dot gnu.org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: glisse at gcc dot gnu.org @ 2021-06-06 14:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

Marc Glisse <glisse at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|og10 (devel/omp/gcc-10)     |11.1.0
           Keywords|                            |missed-optimization
          Component|c++                         |target
           Severity|normal                      |enhancement
             Target|                            |x86_64-*-*


* [Bug target/100929] gcc fails to optimize less to min for SIMD code
  2021-06-06  9:39 [Bug c++/100929] New: gcc fails to optimize less to min for SIMD code denis.yaroshevskij at gmail dot com
  2021-06-06 14:29 ` [Bug c++/100929] " glisse at gcc dot gnu.org
  2021-06-06 14:31 ` [Bug target/100929] " glisse at gcc dot gnu.org
@ 2021-06-06 20:18 ` pinskia at gcc dot gnu.org
  2021-06-06 22:29 ` denis.yaroshevskij at gmail dot com
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-06 20:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Original x86_64 testcase:

#include <immintrin.h>

__m256 if_else(__m256 x, __m256 y) {
  __m256 mask = _mm256_cmp_ps(y, x, _CMP_LT_OQ);
  return _mm256_blendv_ps(x, y, mask);
}

__m256 min(__m256 x, __m256 y) {
  return _mm256_min_ps(x, y);
}

---- CUT -----
Note that the other testcase uses eve; I have no idea where that comes from.


* [Bug target/100929] gcc fails to optimize less to min for SIMD code
  2021-06-06  9:39 [Bug c++/100929] New: gcc fails to optimize less to min for SIMD code denis.yaroshevskij at gmail dot com
                   ` (2 preceding siblings ...)
  2021-06-06 20:18 ` pinskia at gcc dot gnu.org
@ 2021-06-06 22:29 ` denis.yaroshevskij at gmail dot com
  2021-06-07  6:02 ` glisse at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: denis.yaroshevskij at gmail dot com @ 2021-06-06 22:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #3 from Denis Yaroshevskiy <denis.yaroshevskij at gmail dot com> ---
> Please attach your testcases to the bug report.

Is what @Andrew Pinski copied enough? I can attach the same code as a file.

> I don't know if there would be issues for comparisons (with -ftrapping-math for instance?).

-ftrapping-math causes clang to stop doing this optimisation.

I can see that clang does it, so I assume NaNs are OK without this flag. For
ints this is for sure OK.
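
As a rough sketch of why NaNs matter here, the two forms can be modeled one
lane at a time in scalar code. The function names are hypothetical, and the
model assumes the documented behaviour of MINPS (the second source operand is
returned when either input is NaN). Under that assumption the compare + blend
matches min with the operands swapped, min(y, x), rather than min(x, y).
Floating-point exception flags are not modeled.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// One lane of _mm256_cmp_ps(y, x, _CMP_LT_OQ) followed by
// _mm256_blendv_ps(x, y, mask): take y where y < x (ordered), else x.
float cmp_blend_lane(float x, float y) {
  return (y < x) ? y : x;
}

// One lane of MINPS with sources (a, b), assuming the documented rule
// "a < b ? a : b", which yields b whenever a or b is NaN.
float minps_lane(float a, float b) {
  return (a < b) ? a : b;
}

// True when both values are NaN or both compare equal.
bool same_value(float p, float q) {
  return (std::isnan(p) && std::isnan(q)) || p == q;
}
```

With `x = NaN, y = 1.0f`, `cmp_blend_lane(x, y)` is NaN while
`minps_lane(x, y)` is `1.0f`; `minps_lane(y, x)` agrees with the compare +
blend in this model.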

> Note the other testcase is using eve which I have no idea what it is coming from.

Using eve was just much easier than writing this with intrinsics.

The point was:

        vpcmpgtd        ymm2, ymm0, ymm1
        vpblendvb       ymm0, ymm0, ymm1, ymm2

should become

        vpminsd ymm0, ymm1, ymm0

And on arm:

        cmgt    v2.4s, v0.4s, v1.4s
        bit     v0.16b, v1.16b, v2.16b

should become
        smin    v0.4s, v1.4s, v0.4s

And
        fcmgt   v2.4s, v0.4s, v1.4s
        bit     v0.16b, v1.16b, v2.16b

should become
        fmin    v0.4s, v1.4s, v0.4s


I don't really know how this is done in `gcc`, but all these examples look like
the same issue. If it would help to have all of them written as intrinsics, I
can do that.
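
The integer claim can be checked with a small lane-by-lane model. This is only
a sketch: `Lanes`, `cmp_blend` and `lane_min` are names invented here, and the
arrays stand in for the eight 32-bit lanes of a ymm register.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Eight signed 32-bit lanes, standing in for one 256-bit register.
using Lanes = std::array<std::int32_t, 8>;

// Model of vpcmpgtd + vpblendvb: where x > y take y, otherwise keep x.
Lanes cmp_blend(const Lanes& x, const Lanes& y) {
  Lanes r{};
  for (std::size_t i = 0; i < r.size(); ++i) {
    const bool mask = x[i] > y[i];
    r[i] = mask ? y[i] : x[i];
  }
  return r;
}

// Model of vpminsd: signed minimum per lane.
Lanes lane_min(const Lanes& x, const Lanes& y) {
  Lanes r{};
  for (std::size_t i = 0; i < r.size(); ++i) {
    r[i] = std::min(x[i], y[i]);
  }
  return r;
}
```

For integers there are no NaNs or signed zeros, so the two functions agree on
every input, which is why the rewrite is unconditionally safe in this case.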


* [Bug target/100929] gcc fails to optimize less to min for SIMD code
  2021-06-06  9:39 [Bug c++/100929] New: gcc fails to optimize less to min for SIMD code denis.yaroshevskij at gmail dot com
                   ` (3 preceding siblings ...)
  2021-06-06 22:29 ` denis.yaroshevskij at gmail dot com
@ 2021-06-07  6:02 ` glisse at gcc dot gnu.org
  2021-06-07  8:00 ` denis.yaroshevskij at gmail dot com
  2022-04-03 12:56 ` glisse at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: glisse at gcc dot gnu.org @ 2021-06-07  6:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #4 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Denis Yaroshevskiy from comment #3)
> Is what @Andrew Pinski copied enough?

I think so (it is missing the command line), although one example with an
integer type could also help in case floats turn out to have a different issue.

> -ftrapping-math causes clang to stop doing this optimisation.

Note that -ftrapping-math is on by default with gcc (PR 54192), but
-fno-trapping-math wouldn't solve your problem; we are missing other things.


* [Bug target/100929] gcc fails to optimize less to min for SIMD code
  2021-06-06  9:39 [Bug c++/100929] New: gcc fails to optimize less to min for SIMD code denis.yaroshevskij at gmail dot com
                   ` (4 preceding siblings ...)
  2021-06-07  6:02 ` glisse at gcc dot gnu.org
@ 2021-06-07  8:00 ` denis.yaroshevskij at gmail dot com
  2022-04-03 12:56 ` glisse at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: denis.yaroshevskij at gmail dot com @ 2021-06-07  8:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #5 from Denis Yaroshevskiy <denis.yaroshevskij at gmail dot com> ---
x86  (https://godbolt.org/z/zPWbnqfPY)

Options: -O3 -mavx2
```
#include <immintrin.h>

__m256 if_else_float(__m256 x, __m256 y) {
  __m256 mask = _mm256_cmp_ps(y, x, _CMP_LT_OQ);
  return _mm256_blendv_ps(x, y, mask);
}

__m256 min_float(__m256 x, __m256 y) {
  return _mm256_min_ps(x, y);
}

__m256i if_else_int(__m256i x, __m256i y) {
  __m256i mask = _mm256_cmpgt_epi32(x, y);
  return _mm256_blendv_epi8(x, y, mask);
}

__m256i min_int(__m256i x, __m256i y) {
  return _mm256_min_epi32(x, y);
}
```


* [Bug target/100929] gcc fails to optimize less to min for SIMD code
  2021-06-06  9:39 [Bug c++/100929] New: gcc fails to optimize less to min for SIMD code denis.yaroshevskij at gmail dot com
                   ` (5 preceding siblings ...)
  2021-06-07  8:00 ` denis.yaroshevskij at gmail dot com
@ 2022-04-03 12:56 ` glisse at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: glisse at gcc dot gnu.org @ 2022-04-03 12:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100929

--- Comment #6 from Marc Glisse <glisse at gcc dot gnu.org> ---
(blend is now lowered in gimple)

For the integer case, the mix of vector(int) and vector(char) obfuscates things
a bit. We have

__m256i if_else_int (__m256i x, __m256i y)
{
  vector(32) char _4;
  vector(32) char _5;
  vector(32) char _6;
  vector(32) <signed-boolean:8> _7;
  vector(32) char _8; 
  vector(4) long long int _9;
  vector(8) int _10;
  vector(8) int _11;
  vector(8) <signed-boolean:32> _12;
  vector(8) int _13;

  <bb 2> [local count: 1073741824]: 
  _10 = VIEW_CONVERT_EXPR<vector(8) int>(x_2(D));
  _11 = VIEW_CONVERT_EXPR<vector(8) int>(y_3(D));
  _12 = _10 > _11;
  _13 = VEC_COND_EXPR <_12, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }>;
  _5 = VIEW_CONVERT_EXPR<vector(32) char>(_13);
  _4 = VIEW_CONVERT_EXPR<vector(32) char>(y_3(D));
  _6 = VIEW_CONVERT_EXPR<vector(32) char>(x_2(D));
  _7 = _5 < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _8 = VEC_COND_EXPR <_7, _4, _6>;
  _9 = VIEW_CONVERT_EXPR<__m256i>(_8);
  return _9;
}

A first step would be to teach gcc that it can do a VEC_COND_EXPR<_12, _11,
_10> with fewer VIEW_CONVERT_EXPR (maybe follow the definition chain of the
condition through trivial ops like <0, view_convert or ?-1:0 until we find a
real comparison _10 > _11, to determine the right size?).

Other steps:

* Move (or at least partially copy) fold_cond_expr_with_comparison to match.pd
so we can recognize min/max.

* Lower __builtin_ia32_cmpps256 (y_2(D), x_3(D), 17) to GIMPLE for the float
case, if that's a valid thing to do (NaN, etc).
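
To illustrate why the byte-wise blend in the dump above can be treated as a
32-bit select, here is a scalar sketch of a single lane. The function names are
hypothetical, and `memcpy` is used as a stand-in for VIEW_CONVERT_EXPR; this is
not how GCC implements the fold, only a model of the equivalence it would rely
on.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// One 32-bit lane of the GIMPLE above, followed step by step.
std::int32_t blend_via_bytes(std::int32_t x, std::int32_t y) {
  // _12 = _10 > _11;  _13 = VEC_COND_EXPR <_12, -1, 0>
  const std::int32_t mask32 = (x > y) ? -1 : 0;

  // VIEW_CONVERT_EXPR to vector(32) char: reinterpret as bytes.
  signed char m[4], xb[4], yb[4], rb[4];
  std::memcpy(m, &mask32, 4);
  std::memcpy(xb, &x, 4);
  std::memcpy(yb, &y, 4);

  // _7 = _5 < 0;  _8 = VEC_COND_EXPR <_7, _4, _6>, applied per byte.
  for (int i = 0; i < 4; ++i) {
    rb[i] = (m[i] < 0) ? yb[i] : xb[i];
  }

  // VIEW_CONVERT_EXPR back to the wider element type.
  std::int32_t r;
  std::memcpy(&r, rb, 4);
  return r;
}

// What VEC_COND_EXPR <_12, _11, _10> would compute for the same lane.
std::int32_t blend_direct(std::int32_t x, std::int32_t y) {
  return (x > y) ? y : x;
}
```

Because the mask is all ones or all zeros across the four bytes of a lane,
every byte makes the same choice, so both functions return the same value and
that value is the signed minimum of `x` and `y`.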

