public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c++/100171] New: autovectorizer
@ 2021-04-21  3:36 g.peterhoff@t-online.de
  2021-04-21  5:09 ` [Bug tree-optimization/100171] autovectorizer pinskia at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: g.peterhoff@t-online.de @ 2021-04-21  3:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100171

            Bug ID: 100171
           Summary: autovectorizer
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: g.peterhoff@t-online.de
  Target Milestone: ---

Hello gcc team,
I once wrote a small test case to show the problems with the autovectorizer
https://godbolt.org/z/xs35P45MM . In particular, the += operator is not
vectorized. The + operator works in the same context. I do not understand that.
If you decrement the arraysize in foo from 2 to 1 it doesn't work at all
anymore - scalar operations are always generated for ARR_2x.
In general, I made the experience that the autovectorizer starts much too late.
It should always do this from 2 values, even if these are much smaller than a
simd register. This also saves a lot of memory accesses - especially when the
data is linear in the memory (as in the example). Usually, however,
vectorization is only carried out when the data is at least as large as a simd
register, but often only when it is twice or even four times as large.
I think you should urgently update/optimize the autovectorizer.

thx & regards
Gero

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug tree-optimization/100171] autovectorizer
  2021-04-21  3:36 [Bug c++/100171] New: autovectorizer g.peterhoff@t-online.de
@ 2021-04-21  5:09 ` pinskia at gcc dot gnu.org
  2021-04-21  6:58 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-04-21  5:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100171

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|unknown                     |11.0
           Severity|normal                      |enhancement
           Keywords|                            |alias
          Component|c++                         |tree-optimization

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
There is an aliasing issue with the += case.
I Noticed that even clang does not auto-vectorizes the exe_self_* cases either.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug tree-optimization/100171] autovectorizer
  2021-04-21  3:36 [Bug c++/100171] New: autovectorizer g.peterhoff@t-online.de
  2021-04-21  5:09 ` [Bug tree-optimization/100171] autovectorizer pinskia at gcc dot gnu.org
@ 2021-04-21  6:58 ` rguenth at gcc dot gnu.org
  2021-04-21  8:25 ` rguenth at gcc dot gnu.org
  2021-08-17  5:40 ` pinskia at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-04-21  6:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100171

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Well, the issue is that we end up with (for the simplest case):

  <bb 2> [local count: 357878152]:
  _15 = MEM <const double[2]> [(const value_type &)arg_3(D)][0];
  _16 = MEM <const double[2]> [(value_type &)out_2(D)][0];
  _17 = _15 + _16;
  MEM <const double[2]> [(value_type &)out_2(D)][0] = _17;
  _22 = MEM <const double[2]> [(const value_type &)arg_3(D)][1];
  _23 = MEM <const double[2]> [(value_type &)out_2(D)][1];
  _24 = _22 + _23;
  MEM <const double[2]> [(value_type &)out_2(D)][1] = _24;
  return;

and the first store into out[0] can end up writing to arg[1].  I don't see
what we can easily do here.  Path based disambiguation could maybe argue
that partial overlaps of value_type are not allowed.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug tree-optimization/100171] autovectorizer
  2021-04-21  3:36 [Bug c++/100171] New: autovectorizer g.peterhoff@t-online.de
  2021-04-21  5:09 ` [Bug tree-optimization/100171] autovectorizer pinskia at gcc dot gnu.org
  2021-04-21  6:58 ` rguenth at gcc dot gnu.org
@ 2021-04-21  8:25 ` rguenth at gcc dot gnu.org
  2021-08-17  5:40 ` pinskia at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-04-21  8:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100171

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Compared to the non-self case where we see

  <bb 2> [local count: 357878152]:
  _19 = MEM <const double[2]> [(const value_type &)arg1_3(D)][0];
  _20 = MEM <const double[2]> [(const value_type &)arg2_4(D)][0];
  _21 = _19 + _20;
  _26 = MEM <const double[2]> [(const value_type &)arg1_3(D)][1];
  _27 = MEM <const double[2]> [(const value_type &)arg2_4(D)][1];
  _28 = _26 + _27;
  res ={v} {CLOBBER};
  MEM[(struct value_type *)out_2(D)][0].value._M_elems[0] = _21;
  MEM[(struct value_type *)out_2(D)][0].value._M_elems[1] = _28;
  return;

here intermediate optimizations have elided 'res'.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug tree-optimization/100171] autovectorizer
  2021-04-21  3:36 [Bug c++/100171] New: autovectorizer g.peterhoff@t-online.de
                   ` (2 preceding siblings ...)
  2021-04-21  8:25 ` rguenth at gcc dot gnu.org
@ 2021-08-17  5:40 ` pinskia at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-17  5:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100171

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
testcase:
#include <array>
#include <cmath>

template <typename Type> class foo
{
public:
    using array_type = std::array<Type, 2>;

    array_type
        value;

    inline constexpr foo&    operator+=(const foo& arg)    noexcept
    {
        for (size_t i=0; i<value.size(); ++i)
            value[i] += arg.value[i];
        return *this;
    }

    inline constexpr foo    operator+(const foo& arg)    const noexcept
    {
        foo
            res;
        for (size_t i=0; i<res.value.size(); ++i)
            res.value[i] = value[i] + arg.value[i];
        return res;
    }
};

//  operator-calls
inline constexpr void exe_self(auto& out, const auto& arg)    noexcept
{
    for (size_t i=0; i<out.size(); ++i)
        out[i] += arg[i];
}
inline constexpr void exe(auto& out, const auto& arg1, const auto& arg2)   
noexcept
{
    for (size_t i=0; i<out.size(); ++i)
        out[i] = arg1[i] + arg2[i];
}

//  test-cases
//  float64
using ARR_1D = std::array<foo<double>, 1>;
void exe_self_1d(ARR_1D& out, const ARR_1D& arg)                 noexcept {
exe_self(out, arg); }
void exe_1d(ARR_1D& out, const ARR_1D& arg1, const ARR_1D& arg2) noexcept {
exe(out, arg1, arg2); }

using ARR_2D = std::array<foo<double>, 2>;
void exe_self_2d(ARR_2D& out, const ARR_2D& arg)                 noexcept {
exe_self(out, arg); }
void exe_2d(ARR_2D& out, const ARR_2D& arg1, const ARR_2D& arg2) noexcept {
exe(out, arg1, arg2); }

using ARR_4D = std::array<foo<double>, 4>;
void exe_self_4d(ARR_4D& out, const ARR_4D& arg)                 noexcept {
exe_self(out, arg); }
void exe_4d(ARR_4D& out, const ARR_4D& arg1, const ARR_4D& arg2) noexcept {
exe(out, arg1, arg2); }


//  float32
using ARR_1F = std::array<foo<float>, 1>;
void exe_self_1f(ARR_1F& out, const ARR_1F& arg)                  noexcept {
exe_self(out, arg);  }
void exe_1f(ARR_1F& out, const ARR_1F& arg1, const ARR_1F& arg2)  noexcept {
exe(out, arg1, arg2); }

using ARR_2F = std::array<foo<float>, 2>;
void exe_self_2f(ARR_2F& out, const ARR_2F& arg)                 noexcept {
exe_self(out, arg); }
void exe_2f(ARR_2F& out, const ARR_2F& arg1, const ARR_2F& arg2) noexcept {
exe(out, arg1, arg2); }

using ARR_4F = std::array<foo<float>, 4>;
void exe_self_4f(ARR_4F& out, const ARR_4F& arg)                 noexcept {
exe_self(out, arg); }
void exe_4f(ARR_4F& out, const ARR_4F& arg1, const ARR_4F& arg2) noexcept {
exe(out, arg1, arg2); }


//  int64
using ARR_1i64 = std::array<foo<int64_t>, 1>;
void exe_self_1i64(ARR_1i64& out, const ARR_1i64& arg)                  
noexcept { exe_self(out, arg);  }
void exe_1i64(ARR_1i64& out, const ARR_1i64& arg1, const ARR_1i64& arg2)
noexcept { exe(out, arg1, arg2); }

using ARR_2i64 = std::array<foo<int64_t>, 2>;
void exe_self_2i64(ARR_2i64& out, const ARR_2i64& arg)                  
noexcept { exe_self(out, arg); }
void exe_2i64(ARR_2i64& out, const ARR_2i64& arg1, const ARR_2i64& arg2)
noexcept { exe(out, arg1, arg2); }

using ARR_4i64 = std::array<foo<int64_t>, 4>;
void exe_self_4i64(ARR_4i64& out, const ARR_4i64& arg)                  
noexcept { exe_self(out, arg); }
void exe_4i64(ARR_4i64& out, const ARR_4i64& arg1, const ARR_4i64& arg2)
noexcept { exe(out, arg1, arg2); }


//  int32
using ARR_1i32 = std::array<foo<int32_t>, 1>;
void exe_self_1i32(ARR_1i32& out, const ARR_1i32& arg)                  
noexcept { exe_self(out, arg);  }
void exe_1i32(ARR_1i32& out, const ARR_1i32& arg1, const ARR_1i32& arg2)
noexcept { exe(out, arg1, arg2); }

using ARR_2i32 = std::array<foo<int32_t>, 2>;
void exe_self_2i32(ARR_2i32& out, const ARR_2i32& arg)                  
noexcept { exe_self(out, arg); }
void exe_2i32(ARR_2i32& out, const ARR_2i32& arg1, const ARR_2i32& arg2)
noexcept { exe(out, arg1, arg2); }

using ARR_4i32 = std::array<foo<int32_t>, 4>;
void exe_self_4i32(ARR_4i32& out, const ARR_4i32& arg)                  
noexcept { exe_self(out, arg); }
void exe_4i32(ARR_4i32& out, const ARR_4i32& arg1, const ARR_4i32& arg2)
noexcept { exe(out, arg1, arg2); }


//  int16
using ARR_1i16 = std::array<foo<int16_t>, 1>;
void exe_self_1i16(ARR_1i16& out, const ARR_1i16& arg)                  
noexcept { exe_self(out, arg);  }
void exe_1i16(ARR_1i16& out, const ARR_1i16& arg1, const ARR_1i16& arg2)
noexcept { exe(out, arg1, arg2); }

using ARR_2i16 = std::array<foo<int16_t>, 2>;
void exe_self_2i16(ARR_2i16& out, const ARR_2i16& arg)                  
noexcept { exe_self(out, arg); }
void exe_2i16(ARR_2i16& out, const ARR_2i16& arg1, const ARR_2i16& arg2)
noexcept { exe(out, arg1, arg2); }

using ARR_4i16 = std::array<foo<int16_t>, 4>;
void exe_self_4i16(ARR_4i16& out, const ARR_4i16& arg)                  
noexcept { exe_self(out, arg); }
void exe_4i16(ARR_4i16& out, const ARR_4i16& arg1, const ARR_4i16& arg2)
noexcept { exe(out, arg1, arg2); }


//  int8
using ARR_1i8 = std::array<foo<int8_t>, 1>;
void exe_self_1i8(ARR_1i8& out, const ARR_1i8& arg)                  noexcept {
exe_self(out, arg);  }
void exe_1i8(ARR_1i8& out, const ARR_1i8& arg1, const ARR_1i8& arg2) noexcept {
exe(out, arg1, arg2); }

using ARR_2i8 = std::array<foo<int8_t>, 2>;
void exe_self_2i8(ARR_2i8& out, const ARR_2i8& arg)                  noexcept {
exe_self(out, arg); }
void exe_2i8(ARR_2i8& out, const ARR_2i8& arg1, const ARR_2i8& arg2) noexcept {
exe(out, arg1, arg2); }

using ARR_4i8 = std::array<foo<int8_t>, 4>;
void exe_self_4i8(ARR_4i8& out, const ARR_4i8& arg)                  noexcept {
exe_self(out, arg); }
void exe_4i8(ARR_4i8& out, const ARR_4i8& arg1, const ARR_4i8& arg2) noexcept {
exe(out, arg1, arg2); }

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-08-17  5:40 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-21  3:36 [Bug c++/100171] New: autovectorizer g.peterhoff@t-online.de
2021-04-21  5:09 ` [Bug tree-optimization/100171] autovectorizer pinskia at gcc dot gnu.org
2021-04-21  6:58 ` rguenth at gcc dot gnu.org
2021-04-21  8:25 ` rguenth at gcc dot gnu.org
2021-08-17  5:40 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).