public inbox for gcc-bugs@sourceware.org
* [Bug c++/99228] New: blend/shuffle
From: g.peterhoff@t-online.de @ 2021-02-23 20:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99228

            Bug ID: 99228
           Summary: blend/shuffle
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: g.peterhoff@t-online.de
  Target Milestone: ---

Hello gcc team,
the compiler generates very inefficient code for the sgn functions (scalar and
complex arguments):
https://godbolt.org/z/zvE3Mf

scalar
- float32/64: 2 conditional jumps instead of blend/shuffle
- float80: no fcmov
- integer: only cmov instead of blend/shuffle

complex
- float32/64: 4 conditional jumps instead of blend/shuffle
- float80: no fcmov
- integer: only cmov instead of blend/shuffle
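To make the request concrete, here is a minimal sketch of what a blend-style (branch-free) scalar sgn computes: the comparison becomes an all-ones or all-zeros mask, which is ANDed with copysign(1, x). The name sgn_masked is made up for illustration, and this is only one portable way to express the selection, not necessarily the code GCC would or should emit.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t), "sketch assumes 64-bit double");

// Hypothetical name: branch-free sgn for double, mimicking compare + blend.
inline double sgn_masked(double x)
{
    const double s = std::copysign(1.0, x);   // +1.0 or -1.0

    // All bits set when x != 0, all bits clear otherwise (like a vector compare).
    const std::uint64_t mask = std::uint64_t{0} - static_cast<std::uint64_t>(x != 0);

    std::uint64_t bits;
    std::memcpy(&bits, &s, sizeof bits);      // view s as raw bits
    bits &= mask;                             // keeps s, or yields +0.0

    double result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}
```

The AND with the mask plays the role of the blend: no jump depends on the value of x.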

For testing I have 3 versions of each:
v1: total disaster
v2: better, only half of the jumps in each case, but clang can't really handle
that
v3: like v2, but clang seems to handle it too. If you remove [[likely]] from
conditional_move, the result is like v1.
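For readers without access to the Compiler Explorer link, the three variants can be sketched roughly as follows. The conditional_move helper shown here is a guess based only on the remark that it carries [[likely]]; the reporter's actual helper is not reproduced in this thread, and the names sgn_v1/sgn_v2/sgn_v3 are made up. The [[likely]] attribute needs -std=c++20 or later.

```cpp
#include <cmath>

// Hypothetical stand-in for the reporter's conditional_move helper.
template <typename T>
inline T conditional_move(const bool condition, const T& if_true, const T& if_false)
{
    if (condition) [[likely]] return if_true;
    return if_false;
}

// v1: plain conditional expression.
template <typename T>
T sgn_v1(const T& arg)
{
    const T s = std::copysign(T{1}, arg);
    return (arg != 0) ? s : T{0};
}

// v2: if/else with a [[likely]] hint on the non-zero path.
template <typename T>
T sgn_v2(const T& arg)
{
    const T s = std::copysign(T{1}, arg);
    if (arg != 0) [[likely]] return s;
    else                     return T{0};
}

// v3: selection routed through the helper.
template <typename T>
T sgn_v3(const T& arg)
{
    const T s = std::copysign(T{1}, arg);
    return conditional_move(arg != 0, s, T{0});
}
```

All three return the same values for floating-point T; they differ only in how the selection is spelled, which is what the generated code is being compared on.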

regards
Gero

* [Bug target/99228] blend/shuffle
From: pinskia at gcc dot gnu.org @ 2021-02-23 21:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99228

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|https://godbolt.org/z/zvE3M |
                   |f                           |

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Can you please attach the preprocessed source instead of using a godbolt link,
as a lot of us don't have boost installed?

* [Bug target/99228] blend/shuffle
From: g.peterhoff@t-online.de @ 2021-02-23 21:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99228

--- Comment #2 from g.peterhoff@t-online.de ---
I only use the types of boost here. You can remove boost and use:
using float80_t = long double;
using float64_t = double;
using float32_t = float;

* [Bug target/99228] blend/shuffle
From: crazylht at gmail dot com @ 2021-02-24  5:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99228

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
1. To eliminate the branch instructions, -ffast-math needs to be added.
2. When complex sgn is not inlined, gcc also generates blend/shuffle

-std=gnu++20 -Ofast -march=znver2 -mno-vzeroupper

#include <math.h>
#include <complex>
#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>
#include <string>    // std::stod
#define TYPE double


TYPE
sgn(const TYPE &arg)
{
    //  https://de.wikipedia.org/wiki/Vorzeichenfunktion
    const TYPE s{copysign (TYPE{1}, arg)};

    //  v1
    return (arg != 0) ? s : 0;

    //  v2
    //if (arg != 0)   [[likely]]  return s;
    //else                        return 0;

    //  v3
    //return std::conditional_move(arg != 0, s, TYPE{0});
}

TYPE
complex_sgn(const std::complex<TYPE> &arg)
{
    //  https://en.wikipedia.org/wiki/Sign_function#Complex_signum
    const TYPE sr{sgn(arg.real())};
    const TYPE si{sgn(arg.imag())};

    //  v1
    return (arg.real() != 0) ? sr : si;

    //  v2
    //if (arg.real() != 0)  [[likely]]  return sr;
    //else                              return si;

    //  v3
    //return std::conditional_move(arg.real() != 0, sr, si);
}

int main(const int argc, const char** args)
{
    using value_type = TYPE;
    using complex_type = std::complex<TYPE>;

    if (argc == 4)
    {
        const value_type
            a{value_type(std::stod(args[1]))};
        const complex_type
            b{value_type(std::stod(args[2])), value_type(std::stod(args[3]))};

        std::cout << a << std::endl;
        std::cout << b << std::endl;
        std::cout << sgn(a) << std::endl;
        std::cout << complex_sgn(b) << std::endl;
    }
    return EXIT_SUCCESS;
}

assembly code

sgn(double const&):
        vmovsd  xmm0, QWORD PTR [rdi]
        vcomisd xmm0, QWORD PTR .LC0[rip]
        je      .L8
        vandpd  xmm0, xmm0, XMMWORD PTR .LC1[rip]
        vorpd   xmm0, xmm0, XMMWORD PTR .LC2[rip]
.L8:
        ret
complex_sgn(std::complex<double> const&):
        vmovsd  xmm0, QWORD PTR [rdi+8]
        vmovq   xmm4, QWORD PTR .LC1[rip]
        vxorpd  xmm2, xmm2, xmm2
        vmovq   xmm3, QWORD PTR .LC2[rip]
        vmovsd  xmm1, QWORD PTR [rdi]
        vmovsd  xmm5, xmm0, xmm0
        vcmpeq_ussd     xmm6, xmm0, xmm2
        vandpd  xmm5, xmm5, xmm4
        vorpd   xmm5, xmm5, xmm3
        vblendvpd       xmm0, xmm5, xmm0, xmm6
        vmovsd  xmm5, xmm1, xmm1
        vandpd  xmm5, xmm5, xmm4
        vcmpneq_oqsd    xmm1, xmm1, xmm2
        vorpd   xmm5, xmm5, xmm3
        vblendvpd       xmm0, xmm0, xmm5, xmm1
        ret

https://godbolt.org/z/cosh93
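On point 1: -Ofast enables -ffast-math, which in turn implies -ffinite-math-only and -fno-signed-zeros, among others. The sketch below only illustrates which special inputs remain observable for sgn under the default floating-point semantics. Whether these cases are what actually prevents the branch-free form in GCC is an assumption here, not something established in this thread; the helper name sgn_edge_cases_hold is made up.

```cpp
#include <cmath>
#include <limits>

// Same shape as v1 above, specialised to double for brevity.
inline double sgn_v1(const double& arg)
{
    const double s = std::copysign(1.0, arg);
    return (arg != 0) ? s : 0.0;
}

// Edge cases under default (non-fast-math) semantics:
//   sgn_v1(-0.0) is +0.0, because -0.0 compares equal to zero;
//   sgn_v1(NaN)  is +1.0 or -1.0, because NaN != 0 is true and
//                copysign only reads the NaN's sign bit.
inline bool sgn_edge_cases_hold()
{
    const double nan = std::numeric_limits<double>::quiet_NaN();
    const double from_neg_zero = sgn_v1(-0.0);
    const double from_nan = sgn_v1(nan);
    return from_neg_zero == 0.0
        && !std::signbit(from_neg_zero)
        && std::fabs(from_nan) == 1.0;
}
```

With -ffast-math the compiler is allowed to ignore both cases, so this check is only meaningful when built without it.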

* [Bug target/99228] blend/shuffle
From: rguenth at gcc dot gnu.org @ 2021-02-24  8:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99228

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|unknown                     |11.0

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Note that when everything is inlined into main(), the code tends to be
predicted cold (because we know main is invoked exactly once), and thus many
optimizations do not apply.
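One way to act on this when inspecting code generation is to keep the function under test out of line, so that it is compiled on its own rather than as part of main(). A minimal sketch follows, assuming GCC or Clang (the [[gnu::noinline]] attribute is specific to them); the names sgn_kernel and run_once are made up. Whether this alone changes the outcome for this bug is not shown in the thread.

```cpp
#include <cmath>

// Kept out of line so its code is not merged into a caller that the
// compiler may consider cold.
[[gnu::noinline]] double sgn_kernel(const double& arg)
{
    const double s = std::copysign(1.0, arg);
    return (arg != 0) ? s : 0.0;
}

// Hypothetical driver: a real test would call this from main() with
// values that are not known at compile time.
double run_once(const double value)
{
    return sgn_kernel(value);
}
```

The assembly to look at is then the standalone body of sgn_kernel, not whatever ends up inside main().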

* [Bug target/99228] blend/shuffle
From: g.peterhoff@t-online.de @ 2021-03-02 11:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99228

--- Comment #5 from g.peterhoff@t-online.de ---
Here is a better test case. https://godbolt.org/z/3Gq783
I've found:

sgn_complex
- always inefficient code, TYPE and SIZE do not matter, even with -Ofast or
-ffast-math

for TYPE=double
SIZE=1
- abs/mul/div/pow2_complex ok
- zero_complex not vectorized, also with -Ofast or -ffast-math

SIZE=2
- abs/mul/div/pow2/zero_complex only with scalar operations, never vectorized

SIZE=4 and larger
- abs/mul/div/pow2/zero_complex ok


for TYPE=float
SIZE=1
- abs/mul/pow2_complex ok
- div/zero_complex not vectorized, also with -Ofast or -ffast-math

SIZE=2
- abs/mul/div/pow2/zero_complex only with scalar operations, never vectorized

SIZE=4
- abs/pow2/zero_complex ok
- mul_complex inefficient, xmm instead of ymm, also with -Ofast or -ffast-math
- div_complex ok with -O3, but with -Ofast/-ffast-math only xmm instead of ymm

SIZE=8 and larger
- abs/mul/div/pow2_complex ok
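The test case itself is only available through the Compiler Explorer link. As a rough idea of the kind of kernel being described, here is a hypothetical reconstruction of a mul_complex over a fixed-size array, parameterised on TYPE and SIZE. The signature, the complex_array alias and the use of std::array are all assumptions; the reporter's code may be organised differently.

```cpp
#include <array>
#include <complex>
#include <cstddef>

// Assumed container: SIZE complex values of element type TYPE.
template <typename T, std::size_t N>
using complex_array = std::array<std::complex<T>, N>;

// Hypothetical "mul_complex": element-wise complex product.
template <typename T, std::size_t N>
complex_array<T, N> mul_complex(const complex_array<T, N>& a,
                                const complex_array<T, N>& b)
{
    complex_array<T, N> result{};
    for (std::size_t i = 0; i < N; ++i)
        result[i] = a[i] * b[i];
    return result;
}

// Explicit instantiations, so that each (TYPE, SIZE) pair of interest
// shows up as its own function in the assembly output.
template complex_array<double, 2>
mul_complex<double, 2>(const complex_array<double, 2>&, const complex_array<double, 2>&);
template complex_array<float, 4>
mul_complex<float, 4>(const complex_array<float, 4>&, const complex_array<float, 4>&);
```

Instantiating one function per TYPE/SIZE combination makes it straightforward to compare, for example, whether the float SIZE=4 version uses xmm or ymm registers.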

* [Bug target/99228] blend/shuffle
From: pinskia at gcc dot gnu.org @ 2021-12-23 21:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99228

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Hmm, trunk no longer does the if-conversion:

complex_sgn(std::complex<double> const&):
.LFB2678:
        .cfi_startproc
        vmovsd  xmm0, QWORD PTR [rdi]
        vxorpd  xmm1, xmm1, xmm1
        vcomisd xmm0, xmm1
        jne     .L8
        vmovsd  xmm0, QWORD PTR [rdi+8]
        vcomisd xmm0, xmm1
        je      .L13
.L8:
        vandpd  xmm0, xmm0, XMMWORD PTR .LC1[rip]
        vorpd   xmm0, xmm0, XMMWORD PTR .LC2[rip]
        ret
