public inbox for gcc-bugs@sourceware.org
* [Bug c++/103932] New: x86: strange unoptimized code generated (multiple negations of _mm_testz_si128 result)
@ 2022-01-06 19:11 nekotekina at gmail dot com
2022-01-06 21:23 ` [Bug target/103932] " pinskia at gcc dot gnu.org
2022-01-06 21:25 ` pinskia at gcc dot gnu.org
0 siblings, 2 replies; 3+ messages in thread
From: nekotekina at gmail dot com @ 2022-01-06 19:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103932
            Bug ID: 103932
           Summary: x86: strange unoptimized code generated (multiple
                    negations of _mm_testz_si128 result)
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: nekotekina at gmail dot com
  Target Milestone: ---
GCC generates a seemingly unoptimized sequence of instructions in certain cases
(I can't tell exactly what triggers it; example code is below):
        xor     eax, eax
        vptest  xmm0, xmm0
        sete    al
        test    eax, eax
        sete    al
        movzx   eax, al
This should be something like this:
        xor     eax, eax
        vptest  xmm0, xmm0
        setne   al
https://godbolt.org/z/sTaG65Ksc
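As a rough check that the two listings above compute the same value, here is a minimal scalar model of both sequences. The function names and the `zf` parameter (standing for the zero flag that vptest sets) are hypothetical and used only for illustration; this models the flag logic, not the actual compiler output.

```cpp
#include <cstdint>

// Models the sequence GCC currently emits. `zf` is the zero flag after
// "vptest xmm0, xmm0", i.e. true when the vector is all zero.
constexpr std::uint32_t emitted_sequence(bool zf)
{
    std::uint32_t eax = 0;             // xor   eax, eax
    eax = zf ? 1u : 0u;                // sete  al
    const bool zf2 = (eax == 0);       // test  eax, eax
    eax = zf2 ? 1u : 0u;               // sete  al
    return eax & 0xffu;                // movzx eax, al
}

// Models the shorter sequence proposed in the report.
constexpr std::uint32_t proposed_sequence(bool zf)
{
    std::uint32_t eax = 0;             // xor   eax, eax
    eax = zf ? 0u : 1u;                // setne al
    return eax;
}

static_assert(emitted_sequence(true) == proposed_sequence(true),
              "both sequences agree for an all-zero vector");
static_assert(emitted_sequence(false) == proposed_sequence(false),
              "both sequences agree for a non-zero vector");
```

Under this model the second sete/movzx pair only undoes and redoes the negation, which is why a single setne should be enough.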
Code (-O3 -std=c++20 -march=skylake):
#include <emmintrin.h>
#include <immintrin.h>
#include <bit>
#include <cstdint>

template <typename T>
concept Vector128 = (sizeof(T) == 16);

using u64 = std::uint64_t;
using u32 = std::uint32_t;

union alignas(16) v128
{
    u64 _u64[2];

    v128() = default;

    constexpr v128(const v128&) noexcept = default;

    template <Vector128 T>
    constexpr v128(const T& rhs) noexcept
        : v128(std::bit_cast<v128>(rhs))
    {
    }

    constexpr v128& operator=(const v128&) noexcept = default;

    template <Vector128 T>
    constexpr operator T() const noexcept
    {
        return std::bit_cast<T>(*this);
    }
};

// Test if vector is zero
inline bool gv_testz(const v128& arg)
{
#if defined(__SSE4_1__)
    return _mm_testz_si128(arg, arg);
#else
    return !(arg._u64[0] | arg._u64[1]);
#endif
}

struct alignas(16) context_t
{
    v128 vec[32];
    v128 sat;
};

void test1(context_t& ctx, u32 n)
{
    const u64 bit = !gv_testz(ctx.sat);
    v128 r;
    r._u64[0] = 0;
    r._u64[1] = bit;
    ctx.vec[n] = r;
}

void test2(context_t& ctx, u32 n)
{
    ctx.vec[n]._u64[1] = !gv_testz(ctx.sat);
}
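The two branches of gv_testz are meant to compute the same predicate. A small sketch of that equivalence, written so it also builds without SSE4.1, might look like the following. The helper names testz_scalar and testz_simd are hypothetical and not part of the reported test case, and this is not claimed to reproduce the missed optimization.

```cpp
#include <cstdint>
#include <cstring>
#if defined(__SSE4_1__)
#include <immintrin.h>
#endif

// Hypothetical helper: scalar definition of "all 128 bits are zero".
inline bool testz_scalar(const std::uint64_t (&v)[2])
{
    return !(v[0] | v[1]);
}

// Hypothetical helper: the same predicate through _mm_testz_si128 when
// SSE4.1 is enabled; otherwise it falls back to the scalar form.
inline bool testz_simd(const std::uint64_t (&v)[2])
{
#if defined(__SSE4_1__)
    __m128i x;
    std::memcpy(&x, v, sizeof(x));     // load the 16 bytes without aliasing issues
    return _mm_testz_si128(x, x) != 0; // returns 1 when (x & x) is all zero
#else
    return testz_scalar(v);
#endif
}
```

Both helpers should return true only for an all-zero input, so the `!gv_testz(...)` in test1 and test2 is 1 exactly when some bit of ctx.sat is set.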
* [Bug target/103932] x86: strange unoptimized code generated (multiple negations of _mm_testz_si128 result)
From: pinskia at gcc dot gnu.org @ 2022-01-06 21:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103932
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2022-01-06
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The gimple level is decent:
  _7 = __builtin_ia32_ptestz128 (_4, _4);
  _1 = _7 == 0;
  bit_5 = (const u64) _1;
However, _7 is already 0 or 1. The problem is that the target does not say that
the range of __builtin_ia32_ptestz128 is [0, 1].
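To illustrate why the range matters, here is a minimal sketch of one simplification that is only valid when the value is known to be 0 or 1. The function names are hypothetical, and this is not a claim about which transformation GCC would actually apply once the range is known.

```cpp
// Negation as the GIMPLE expresses it: compare against zero.
// Correct for any int input.
constexpr int negate_by_compare(int v)
{
    return v == 0;
}

// Negation by flipping the low bit. Equivalent to the compare only
// when v is known to be 0 or 1.
constexpr int negate_by_xor(int v)
{
    return v ^ 1;
}

static_assert(negate_by_compare(0) == negate_by_xor(0), "agree for 0");
static_assert(negate_by_compare(1) == negate_by_xor(1), "agree for 1");
static_assert(negate_by_compare(2) != negate_by_xor(2),
              "differ outside [0, 1], so the range must be known");
```

Without the [0, 1] range the optimizer has to treat _7 as an arbitrary int, and shortcuts of this kind are not available to it.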
* [Bug target/103932] x86: strange unoptimized code generated (multiple negations of _mm_testz_si128 result)
From: pinskia at gcc dot gnu.org @ 2022-01-06 21:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103932
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The reason the == 0 does not combine (at the RTL level) with the builtin is
that the shift clobbers CC:
(insn 13 12 14 2 (parallel [
            (set (reg:DI 90)
                (ashift:DI (reg:DI 89 [ n ])
                    (const_int 4 [0x4])))
            (clobber (reg:CC 17 flags))
        ]) "/app/example.cpp":58:13 710 {*ashldi3_1}
     (expr_list:REG_DEAD (reg:DI 89 [ n ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
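Reading the dump, the ashift of n by 4 looks like the scaling of the index by the 16-byte element size for ctx.vec[n]. That is an inference from the RTL and the test case, not something stated in the comment. A minimal sketch of the arithmetic, with elem128 as a hypothetical stand-in for v128:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for v128: a 16-byte, 16-byte-aligned element.
struct alignas(16) elem128
{
    std::uint64_t u64[2];
};

// Byte offset of element n, written as the multiplication the source implies.
constexpr std::size_t offset_by_multiply(std::uint32_t n)
{
    return static_cast<std::size_t>(n) * sizeof(elem128);
}

// The same offset written as the shift seen in the RTL (ashift by 4).
constexpr std::size_t offset_by_shift(std::uint32_t n)
{
    return static_cast<std::size_t>(n) << 4;
}

static_assert(sizeof(elem128) == 16, "element size is 16 bytes");
static_assert(offset_by_multiply(5) == offset_by_shift(5),
              "multiplying by 16 equals shifting left by 4");
```

On x86 the shift instruction writes the flags register, which is what the (clobber (reg:CC 17 flags)) in the insn records.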