public inbox for gcc-bugs@sourceware.org
* [Bug c++/99228] New: blend/shuffle
@ 2021-02-23 20:43 g.peterhoff@t-online.de
2021-02-23 21:43 ` [Bug target/99228] blend/shuffle pinskia at gcc dot gnu.org
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: g.peterhoff@t-online.de @ 2021-02-23 20:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99228
Bug ID: 99228
Summary: blend/shuffle
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: g.peterhoff@t-online.de
Target Milestone: ---
Hello gcc team,
the compiler generates very inefficient code for the sgn functions (scalar and
complex arguments)
https://godbolt.org/z/zvE3Mf
scalar
- float32/64: 2 conditional jumps instead of blend/shuffle
- float80: no fcmov
- integer: only cmov instead of blend/shuffle
complex
- float32/64: 4 conditional jumps instead of blend/shuffle
- float80: no fcmov
- integer: only cmov instead of blend/shuffle
For testing I have three versions of each:
v1: total disaster
v2: better, only half as many jumps in each case, but clang can't really handle
it
v3: like v2, but clang seems to handle it too. If you remove [[likely]] from
conditional_move, the result is like v1.
regards
Gero
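For comparison, here is a minimal sketch of a signum with no data-dependent branch in the source. The name sgn_branchless is hypothetical and this is not one of the v1/v2/v3 variants from the report. It also treats NaN differently: the copysign-based form returns +/-1 for NaN, while this one returns 0. Whether a compiler turns it into compare/blend instructions still depends on the target and flags.

```cpp
#include <cassert>

// Hypothetical helper, not taken from the report: signum written as the
// difference of two comparisons. Each comparison yields 0 or 1, so the
// result is -1, 0 or +1 without an if/else or ?: in the source.
template <typename T>
T sgn_branchless(T x)
{
    return static_cast<T>((T{0} < x) - (x < T{0}));
}
```

Usage: sgn_branchless(2.5) gives 1.0, sgn_branchless(-7) gives -1, and any zero input gives 0.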
* [Bug target/99228] blend/shuffle
From: pinskia at gcc dot gnu.org @ 2021-02-23 21:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99228
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
URL|https://godbolt.org/z/zvE3Mf |
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Can you please attach the preprocessed source instead of using godbolt link as
a lot of us don't have boost installed?
* [Bug target/99228] blend/shuffle
From: g.peterhoff@t-online.de @ 2021-02-23 21:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99228
--- Comment #2 from g.peterhoff@t-online.de ---
I only use the types of boost here. You can remove boost and use:
using float80_t = long double;
using float64_t = double;
using float32_t = float;
* [Bug target/99228] blend/shuffle
From: crazylht at gmail dot com @ 2021-02-24 5:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99228
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
1. To eliminate the branch instructions, -ffast-math needs to be added.
2. When complex_sgn is not inlined, gcc also generates blend/shuffle with these options:
-std=gnu++20 -Ofast -march=znver2 -mno-vzeroupper
#include <math.h>
#include <complex>
#include <cstdlib>
#include <iostream>
#include <string>
#define TYPE double
TYPE
sgn(const TYPE &arg)
{
// https://de.wikipedia.org/wiki/Vorzeichenfunktion
const TYPE s{copysign (TYPE{1}, arg)};
// v1
return (arg != 0) ? s : 0;
// v2
//if (arg != 0) [[likely]] return s;
//else return 0;
// v3
//return std::conditional_move(arg != 0, s, TYPE{0});
}
TYPE
complex_sgn(const std::complex<TYPE> &arg)
{
// https://en.wikipedia.org/wiki/Sign_function#Complex_signum
const TYPE sr{sgn(arg.real())};
const TYPE si{sgn(arg.imag())};
// v1
return (arg.real() != 0) ? sr : si;
// v2
//if (arg.real() != 0) [[likely]] return sr;
//else return si;
// v3
//return std::conditional_move(arg.real() != 0, sr, si);
}
int main(const int argc, const char** args)
{
using value_type = TYPE;
using complex_type = std::complex<TYPE>;
if (argc == 4)
{
const value_type
a{value_type(std::stod(args[1]))};
const complex_type
b{value_type(std::stod(args[2])), value_type(std::stod(args[3]))};
std::cout << a << std::endl;
std::cout << b << std::endl;
std::cout << sgn(a) << std::endl;
std::cout << complex_sgn(b) << std::endl;
}
return EXIT_SUCCESS;
}
Assembly code:
sgn(double const&):
vmovsd xmm0, QWORD PTR [rdi]
vcomisd xmm0, QWORD PTR .LC0[rip]
je .L8
vandpd xmm0, xmm0, XMMWORD PTR .LC1[rip]
vorpd xmm0, xmm0, XMMWORD PTR .LC2[rip]
.L8:
ret
complex_sgn(std::complex<double> const&):
vmovsd xmm0, QWORD PTR [rdi+8]
vmovq xmm4, QWORD PTR .LC1[rip]
vxorpd xmm2, xmm2, xmm2
vmovq xmm3, QWORD PTR .LC2[rip]
vmovsd xmm1, QWORD PTR [rdi]
vmovsd xmm5, xmm0, xmm0
vcmpeq_ussd xmm6, xmm0, xmm2
vandpd xmm5, xmm5, xmm4
vorpd xmm5, xmm5, xmm3
vblendvpd xmm0, xmm5, xmm0, xmm6
vmovsd xmm5, xmm1, xmm1
vandpd xmm5, xmm5, xmm4
vcmpneq_oqsd xmm1, xmm1, xmm2
vorpd xmm5, xmm5, xmm3
vblendvpd xmm0, xmm0, xmm5, xmm1
ret
https://godbolt.org/z/cosh93
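For readers less familiar with the compare and blend pattern in the listing above, here is a scalar sketch of the idea: a comparison produces an all-ones or all-zeros mask, and the mask picks one of two values bit by bit. The helper names (bits_of, double_of, select_by_mask, sgn_masked) are hypothetical. This models a full-width and/andnot/or select; the real vblendvpd looks only at the sign bit of each mask element, which gives the same result when the mask comes from a compare instruction.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helpers: reinterpret a double as its 64-bit pattern and back.
inline std::uint64_t bits_of(double v)
{
    std::uint64_t b;
    std::memcpy(&b, &v, sizeof b);
    return b;
}

inline double double_of(std::uint64_t b)
{
    double v;
    std::memcpy(&v, &b, sizeof v);
    return v;
}

// Select if_set where the mask bits are 1 and if_clear where they are 0.
inline double select_by_mask(std::uint64_t mask, double if_set, double if_clear)
{
    return double_of((bits_of(if_set) & mask) | (bits_of(if_clear) & ~mask));
}

// sgn expressed as "compare to a mask, then select".
inline double sgn_masked(double x)
{
    // 0 - 1 wraps to all ones when x != 0; the mask stays 0 otherwise.
    const std::uint64_t mask =
        std::uint64_t{0} - static_cast<std::uint64_t>(x != 0.0);
    return select_by_mask(mask, std::copysign(1.0, x), 0.0);
}
```

Both candidate values are computed unconditionally, which is the trade-off a blend makes compared with a branch.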
* [Bug target/99228] blend/shuffle
From: rguenth at gcc dot gnu.org @ 2021-02-24 8:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99228
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Version|unknown |11.0
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Note that when everything is inlined into main(), it tends to be predicted cold
(because we know main is invoked exactly once), and thus many optimizations do
not apply.
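One way to look at a function's code generation separately from main() is to keep it out of line. The sketch below is an assumption about how one might set up such a test, not something prescribed in this thread; the name sgn_out_of_line is hypothetical, and [[gnu::noinline]] is a GCC/Clang attribute. Putting the function in its own translation unit would be another option.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical test helper: the attribute asks the compiler not to inline
// this function into its callers, so its body is compiled on its own
// rather than as part of main().
[[gnu::noinline]] double sgn_out_of_line(const double& arg)
{
    const double s{std::copysign(1.0, arg)};
    return (arg != 0) ? s : 0.0;
}
```

Inspecting the assembly for sgn_out_of_line, rather than for main, then shows what the compiler does with the function by itself.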
* [Bug target/99228] blend/shuffle
From: g.peterhoff@t-online.de @ 2021-03-02 11:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99228
--- Comment #5 from g.peterhoff@t-online.de ---
Here is a better test case. https://godbolt.org/z/3Gq783
I've found:
sgn_complex
- always inefficient code; TYPE and SIZE do not matter, even with -Ofast or
-ffast-math
for TYPE=double
SIZE=1
- abs/mul/div/pow2_complex ok
- zero_complex not vectorized, even with -Ofast or -ffast-math
SIZE=2
- abs/mul/div/pow2/zero_complex only with scalar operations, never vectorized
SIZE=4 and larger
- abs/mul/div/pow2/zero_complex ok
for TYPE=float
SIZE=1
- abs/mul/pow2_complex ok
- div/zero_complex not vectorized, even with -Ofast or -ffast-math
SIZE=2
- abs/mul/div/pow2/zero_complex only with scalar operations, never vectorized
SIZE=4
- abs/pow2/zero_complex ok
- mul_complex inefficient, xmm instead of ymm, even with -Ofast or -ffast-math
- div_complex ok with -O3, but with -Ofast/-ffast-math only xmm instead of ymm
SIZE=8 and larger
- abs/mul/div/pow2_complex ok
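The linked test case is not reproduced in this message, so the following is only a guess at the shape of one of its kernels: an element-wise product over a fixed-size array of std::complex values, with the element type and the length as template parameters playing the role of TYPE and SIZE. The alias complex_array and the body of mul_complex are assumptions, not the reporter's code.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

// Hypothetical container: N complex numbers with element type T.
template <typename T, std::size_t N>
using complex_array = std::array<std::complex<T>, N>;

// Hypothetical kernel: element-wise complex multiplication. A fixed-trip
// loop like this is the kind of code a vectorizer may or may not turn into
// packed xmm/ymm operations, depending on T, N and the flags.
template <typename T, std::size_t N>
complex_array<T, N> mul_complex(const complex_array<T, N>& a,
                                const complex_array<T, N>& b)
{
    complex_array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i)
        out[i] = a[i] * b[i];
    return out;
}
```

Instantiating this for float and double with N = 1, 2, 4 and 8 and comparing the assembly would mirror the TYPE/SIZE matrix listed above.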
* [Bug target/99228] blend/shuffle
From: pinskia at gcc dot gnu.org @ 2021-12-23 21:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99228
--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Hmm, trunk no longer does the if-conversion:
complex_sgn(std::complex<double> const&):
.LFB2678:
.cfi_startproc
vmovsd xmm0, QWORD PTR [rdi]
vxorpd xmm1, xmm1, xmm1
vcomisd xmm0, xmm1
jne .L8
vmovsd xmm0, QWORD PTR [rdi+8]
vcomisd xmm0, xmm1
je .L13
.L8:
vandpd xmm0, xmm0, XMMWORD PTR .LC1[rip]
vorpd xmm0, xmm0, XMMWORD PTR .LC2[rip]
ret