[Bug tree-optimization/63271] Should commute arithmetic with vector load

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/63271] Should commute arithmetic with vector load
       [not found] <bug-63271-4@http.gcc.gnu.org/bugzilla/>
@ 2014-09-15 20:04 ` glisse at gcc dot gnu.org
  2014-09-16  9:06 ` rguenth at gcc dot gnu.org
  2021-08-15 22:34 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 3+ messages in thread
From: glisse at gcc dot gnu.org @ 2014-09-15 20:04 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63271

--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> ---
The closest we currently handle (with -O3) is:

typedef int vec __attribute__((vector_size(4*sizeof(int))));

void f(vec*r, int i){
  (*r)[0]=3*i;
  (*r)[1]=4*i;
  (*r)[2]=7*i;
  (*r)[3]=9*i;
}

(None of the constants should be 0, 1 or -1: those values hide the
multiplication, and we don't see through that.)
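
To make the restriction concrete, here is a small sketch (the function names
are made up for illustration and are not part of the bug report). In f_hidden
the lanes with multipliers 1, 0 and -1 no longer contain a visible
multiplication, although the function computes the same thing as the
hand-vectorized f_hidden_by_hand:

```c
typedef int vec __attribute__((vector_size(4 * sizeof(int))));

/* Every lane is an explicit multiplication of i by a constant. */
void f_explicit(vec *r, int i)
{
  (*r)[0] = 3 * i;
  (*r)[1] = 4 * i;
  (*r)[2] = 7 * i;
  (*r)[3] = 9 * i;
}

/* Multipliers 1, 0 and -1 leave no multiplication in the source,
   so the four lanes no longer look alike. */
void f_hidden(vec *r, int i)
{
  (*r)[0] = i;      /* 1 * i  */
  (*r)[1] = 4 * i;
  (*r)[2] = 0;      /* 0 * i  */
  (*r)[3] = -i;     /* -1 * i */
}

/* The same computation as f_hidden, written as one vector multiply
   (GNU C allows a vector-by-scalar binary operation). */
void f_hidden_by_hand(vec *r, int i)
{
  vec k = {1, 4, 0, -1};
  *r = k * i;
}
```

Comparing the output of gcc -O3 -S for the three functions should show
whether a given compiler version sees through the hidden multiplications.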

I did have in mind recognizing, with a forwprop-like pattern matching, a
constructor { sqrt(x1), ..., sqrt(xn) } since we don't have any generic syntax
to call sqrt on a vector. Binary operations are a bit more work. I don't know
if it would be possible / a good idea to tell SLP that a constructor is
essentially the same as several data refs, to avoid duplicating too much code.



* [Bug tree-optimization/63271] Should commute arithmetic with vector load
From: rguenth at gcc dot gnu.org @ 2014-09-16  9:06 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63271

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-09-16
             Blocks|                            |53947
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Basic-block SLP misses treating a vector constructor as a source from which
to search for an SLP opportunity.  And yes, it has issues with non-matching
SLP lanes where adds of zero / multiplications by one could be inserted to
make the SLP match.

Both are missed optimization opportunities there.

The SLP vectorizer also requires loads to end the SLP chain which isn't
necessary either.

We may have duplicate bug reports about each of the three issues (and they
should be addressed separately with separate testcases).
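
As a rough illustration of the first two points (a sketch only; the function
names are hypothetical and these are not the testcases from any related bug
report): from_constructor gathers its lanes through a vector constructor
instead of a group of stores, and mismatched has lanes whose operations only
line up once a multiplication by one or an addition of zero is inserted, as
spelled out in padded.

```c
typedef int v4si __attribute__((vector_size(4 * sizeof(int))));

/* Point 1: the lanes are collected by a vector constructor,
   not by adjacent scalar stores. */
v4si from_constructor(int i)
{
  return (v4si){3 * i, 4 * i, 7 * i, 9 * i};
}

/* Point 2: the lanes do not share one operation shape. */
void mismatched(int *r, const int *a)
{
  r[0] = a[0] + 5;
  r[1] = a[1] * 3;
  r[2] = a[2] * 2 + 1;
  r[3] = a[3];
}

/* The same computation with "* 1" and "+ 0" inserted, so that
   every lane has the shape a[n] * m + c. */
void padded(int *r, const int *a)
{
  r[0] = a[0] * 1 + 5;
  r[1] = a[1] * 3 + 0;
  r[2] = a[2] * 2 + 1;
  r[3] = a[3] * 1 + 0;
}
```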



* [Bug tree-optimization/63271] Should commute arithmetic with vector load
From: pinskia at gcc dot gnu.org @ 2021-08-15 22:34 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63271

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
So the two functions are not the same (because __m128i is a vector of 2 long
long [at least now]).
Here is a better testcase:

#define vector __attribute__((vector_size(16)))
typedef vector  char __m128i ;

static inline __m128i _mm_set_epi8(char a, char b, char c, char d,
                     char e, char f, char g, char h,
                     char i, char j, char k, char l,
                     char m, char n, char o, char p)
{
  return (__m128i){a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p};
}


    __m128i foo(char C)
    {
      return _mm_set_epi8(   0,    C,  2*C,  3*C,
                           4*C,  5*C,  6*C,  7*C,
                           8*C,  9*C, 10*C, 11*C,
                          12*C, 13*C, 14*C, 15*C);
    }

    __m128i bar(char C)
    {
      __m128i v = _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7,
                               8, 9,10,11,12,13,14,15);
      vector unsigned char d = (vector unsigned char)v;
      d *= C;
      return (__m128i)d;
    }
-------------------------------------CUT ------------------------

So take the above: on aarch64, SLP does not do it because it does not
recognize 0 and C as being able to be SLPed.  If I change both of them to
2*C, then SLP will do the right thing.
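
For reference, the modified variant described above might look like the
following sketch. The names foo_uniform and bar_uniform and the two typedefs
are made up here, and the constructor is written inline rather than through
_mm_set_epi8; only the replacement of 0 and C by 2*C comes from the comment.

```c
typedef char v16qi __attribute__((vector_size(16)));
typedef unsigned char v16qu __attribute__((vector_size(16)));

/* foo with lanes 0 and 1 changed from 0 and C to 2*C, so every
   lane is a multiplication of C by a constant. */
v16qi foo_uniform(char C)
{
  return (v16qi){ 2*C,  2*C,  2*C,  3*C,
                  4*C,  5*C,  6*C,  7*C,
                  8*C,  9*C, 10*C, 11*C,
                 12*C, 13*C, 14*C, 15*C};
}

/* The matching hand-vectorized form: a constant vector times C.
   The multiply is done on unsigned lanes, as in bar above. */
v16qi bar_uniform(char C)
{
  v16qu k = {2, 2, 2, 3, 4, 5, 6, 7,
             8, 9, 10, 11, 12, 13, 14, 15};
  unsigned char uc = (unsigned char)C;
  k = k * uc;   /* vector-by-scalar multiply, lane by lane */
  return (v16qi)k;
}
```

Comparing the assembly generated for foo_uniform and bar_uniform is then a
direct way to check whether SLP turns the former into the latter.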

