public inbox for gcc-bugs@sourceware.org
* [Bug c++/103932] New: x86: strange unoptimized code generated (multiple negations of _mm_testz_si128 result)
@ 2022-01-06 19:11 nekotekina at gmail dot com
  2022-01-06 21:23 ` [Bug target/103932] " pinskia at gcc dot gnu.org
  2022-01-06 21:25 ` pinskia at gcc dot gnu.org
  0 siblings, 2 replies; 3+ messages in thread
From: nekotekina at gmail dot com @ 2022-01-06 19:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103932

            Bug ID: 103932
           Summary: x86: strange unoptimized code generated (multiple
                    negations of _mm_testz_si128 result)
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: nekotekina at gmail dot com
  Target Milestone: ---

GCC generates a seemingly unoptimized sequence of instructions in certain cases
(I can't tell exactly what triggers it; example code is below):

        xor     eax, eax
        vptest  xmm0, xmm0
        sete    al
        test    eax, eax
        sete    al
        movzx   eax, al

This should be something like:

        xor     eax, eax
        vptest  xmm0, xmm0
        setne   al
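
To make the redundancy concrete, here is a minimal sketch that models both
instruction sequences in plain C++. The function names and the `zf` parameter
(standing for the zero flag set by vptest) are hypothetical and exist only for
this illustration; the model ignores everything except the value left in eax.

```cpp
#include <cassert>
#include <cstdint>

// Model of the sequence GCC currently emits.
// zf is the zero flag produced by vptest (true when the vector is all zero).
constexpr std::uint32_t emitted_sequence(bool zf)
{
    std::uint32_t eax = 0;                        // xor   eax, eax
    eax = (eax & ~0xffu) | (zf ? 1u : 0u);        // sete  al
    const bool zf2 = (eax == 0);                  // test  eax, eax
    eax = (eax & ~0xffu) | (zf2 ? 1u : 0u);       // sete  al
    eax = eax & 0xffu;                            // movzx eax, al
    return eax;
}

// Model of the shorter sequence suggested in the report.
constexpr std::uint32_t expected_sequence(bool zf)
{
    std::uint32_t eax = 0;                        // xor   eax, eax
    eax = (eax & ~0xffu) | (zf ? 0u : 1u);        // setne al
    return eax;
}

static_assert(emitted_sequence(true) == expected_sequence(true),
              "both sequences agree when ZF is set");
static_assert(emitted_sequence(false) == expected_sequence(false),
              "both sequences agree when ZF is clear");
```

Both models return the same value for either flag state, which is why the
second sete/test/movzx group looks like a double negation that was not folded.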


https://godbolt.org/z/sTaG65Ksc
Code (-O3 -std=c++20 -march=skylake):

#include <emmintrin.h>
#include <immintrin.h>
#include <bit>
#include <cstdint>

template <typename T>
concept Vector128 = (sizeof(T) == 16);

using u64 = std::uint64_t;
using u32 = std::uint32_t;

union alignas(16) v128
{
        u64 _u64[2];

        v128() = default;

        constexpr v128(const v128&) noexcept = default;

        template <Vector128 T>
        constexpr v128(const T& rhs) noexcept
                : v128(std::bit_cast<v128>(rhs))
        {
        }

        constexpr v128& operator=(const v128&) noexcept = default;

        template <Vector128 T>
        constexpr operator T() const noexcept
        {
                return std::bit_cast<T>(*this);
        }
};

// Test if vector is zero
inline bool gv_testz(const v128& arg)
{
#if defined(__SSE4_1__)
        return _mm_testz_si128(arg, arg);
#else
        return !(arg._u64[0] | arg._u64[1]);
#endif
}

struct alignas(16) context_t
{
        v128 vec[32];
        v128 sat;
};

void test1(context_t& ctx, u32 n)
{
        const u64 bit = !gv_testz(ctx.sat);
        v128 r;
        r._u64[0] = 0;
        r._u64[1] = bit;
        ctx.vec[n] = r;
}

void test2(context_t& ctx, u32 n)
{
        ctx.vec[n]._u64[1] = !gv_testz(ctx.sat);
}


* [Bug target/103932] x86: strange unoptimized code generated (multiple negations of _mm_testz_si128 result)
From: pinskia at gcc dot gnu.org @ 2022-01-06 21:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103932

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2022-01-06
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The gimple level is decent:
  _7 = __builtin_ia32_ptestz128 (_4, _4);
  _1 = _7 == 0;
  bit_5 = (const u64) _1;

However, _7 is already 0 or 1, so the "== 0" is just a logical negation of it.
The problem is that the target does not declare that the range of
__builtin_ia32_ptestz128 is [0, 1].
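
As an illustration of why the range matters, the sketch below compares the
compare-with-zero form from the GIMPLE with an xor form that is only valid
when the input is known to be 0 or 1. The helper names are hypothetical, and
this models the arithmetic only; it is not how GCC represents or folds the
expression internally.

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

// What the GIMPLE computes: compare the builtin result with zero, then widen.
constexpr u64 not_via_compare(int r)
{
    return static_cast<u64>(r == 0);
}

// Cheaper equivalent, valid only if r is known to be in the range [0, 1].
constexpr u64 not_via_xor(int r)
{
    return static_cast<u64>(r ^ 1);
}

static_assert(not_via_compare(0) == not_via_xor(0), "forms agree for 0");
static_assert(not_via_compare(1) == not_via_xor(1), "forms agree for 1");
// Without range information the rewrite is not safe: the forms differ for 2.
static_assert(not_via_compare(2) != not_via_xor(2), "forms differ outside [0, 1]");
```

The last assertion shows why the optimizer cannot pick the simpler form on its
own: without a declared range it has to assume the builtin may return any int.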


* [Bug target/103932] x86: strange unoptimized code generated (multiple negations of _mm_testz_si128 result)
From: pinskia at gcc dot gnu.org @ 2022-01-06 21:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103932

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The reason the == 0 does not combine (at the RTL level) with the builtin is
that the shift clobbers CC:
(insn 13 12 14 2 (parallel [
            (set (reg:DI 90)
                (ashift:DI (reg:DI 89 [ n ])
                    (const_int 4 [0x4])))
            (clobber (reg:CC 17 flags))
        ]) "/app/example.cpp":58:13 710 {*ashldi3_1}
     (expr_list:REG_DEAD (reg:DI 89 [ n ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
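
The shift by 4 in that insn appears to be the scaling of the index n by the
16-byte element size for ctx.vec[n]; that reading is an assumption based on
the operands shown, not something stated in the dump. A minimal sketch of the
address arithmetic, using a hypothetical stand-in type `elem16` in place of
v128 and hypothetical helper names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for v128: 16 bytes, 16-byte aligned.
struct alignas(16) elem16
{
    std::uint64_t u64[2];
};

static_assert(sizeof(elem16) == 16, "element size must be 16 bytes");

// Byte offset of element n computed with a shift, as in (ashift:DI n 4).
constexpr std::size_t offset_by_shift(std::uint32_t n)
{
    return static_cast<std::size_t>(n) << 4;
}

// Byte offset of element n computed through ordinary array indexing.
inline std::size_t offset_by_index(const elem16* base, std::uint32_t n)
{
    const char* first = reinterpret_cast<const char*>(base);
    const char* nth = reinterpret_cast<const char*>(&base[n]);
    return static_cast<std::size_t>(nth - first);
}
```

On x86 the plain shift instruction writes the flags register, which is what
the (clobber (reg:CC 17 flags)) in the insn records.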

