public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/63271] Should commute arithmetic with vector load
From: glisse at gcc dot gnu.org @ 2014-09-15 20:04 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63271

--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> ---
The closest we currently handle (with -O3) is:

    typedef int vec __attribute__((vector_size(4*sizeof(int))));
    void f(vec*r, int i){
      (*r)[0]=3*i;
      (*r)[1]=4*i;
      (*r)[2]=7*i;
      (*r)[3]=9*i;
    }

(none of the constants should be 0, 1 or -1, those hide the multiplication and
we don't see through that)

I did have in mind recognizing, with a forwprop-like pattern matching, a
constructor { sqrt(x1), ..., sqrt(xn) } since we don't have any generic syntax
to call sqrt on a vector. Binary operations are a bit more work. I don't know
if it would be possible / a good idea to tell SLP that a constructor is
essentially the same as several data refs, to avoid duplicating too much code.
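For readers following the thread, here is a minimal sketch of the rewrite that comment #1 is talking about, written by hand with GNU C vector extensions. The names f_scalar, f_vector and check_equivalent are made up for illustration and do not appear in the bug report; the sketch only shows that the two source forms compute the same lanes and says nothing about the code GCC actually emits.

```c
#include <assert.h>

typedef int vec __attribute__((vector_size(4 * sizeof(int))));

/* Element-wise form from comment #1: four scalar multiplies, four lane stores. */
void f_scalar(vec *r, int i)
{
    (*r)[0] = 3 * i;
    (*r)[1] = 4 * i;
    (*r)[2] = 7 * i;
    (*r)[3] = 9 * i;
}

/* Hand-written vector form: one constant vector times the scalar i.
   GNU C converts the scalar operand to a vector with i in every lane. */
void f_vector(vec *r, int i)
{
    const vec k = { 3, 4, 7, 9 };
    *r = k * i;
}

/* Illustrative helper: asserts that both forms agree lane by lane, returns 1. */
int check_equivalent(int i)
{
    vec a, b;
    f_scalar(&a, i);
    f_vector(&b, i);
    for (int n = 0; n < 4; n++)
        assert(a[n] == b[n]);
    return 1;
}
```

Calling check_equivalent(5) exercises both forms; keep i small enough that 9*i does not overflow int.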
* [Bug tree-optimization/63271] Should commute arithmetic with vector load
From: rguenth at gcc dot gnu.org @ 2014-09-16 9:06 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63271

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-09-16
             Blocks|                            |53947
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Basic-block SLP misses treating a vector constructor as a source to search for
an SLP opportunity. And yes, it has issues with non-matching SLPs where adds of
zero / multiplications by one could be inserted to make the SLP match. Both
are missed optimization opportunities there. The SLP vectorizer also requires
loads to end the SLP chain, which isn't necessary either.

We may have duplicate bug reports about each of the three issues (and they
should be addressed separately, with separate testcases).
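To make the "multiplications of one" remark in comment #2 concrete, here is a small sketch in GNU C. The function names are hypothetical and the padding is done by hand in the source; it only illustrates why inserting an explicit "* 1" gives every lane the same operation, and is not a description of how the SLP vectorizer would implement it.

```c
#include <assert.h>

typedef int vec __attribute__((vector_size(4 * sizeof(int))));

/* Lane 0 is plain i with no multiplication, so the four lanes
   do not all perform the same operation. */
vec lanes_as_written(int i)
{
    vec r = { i, 4 * i, 7 * i, 9 * i };
    return r;
}

/* Same values with the hidden "* 1" made explicit: every lane is now
   k[n] * i, which is a single vector multiply. */
vec lanes_padded(int i)
{
    const vec k = { 1, 4, 7, 9 };
    return k * i;
}

/* Illustrative helper: asserts that both forms agree lane by lane, returns 1. */
int check_padding(int i)
{
    vec a = lanes_as_written(i);
    vec b = lanes_padded(i);
    for (int n = 0; n < 4; n++)
        assert(a[n] == b[n]);
    return 1;
}
```

The same idea applies to additions, where a missing operand would be padded with "+ 0".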
* [Bug tree-optimization/63271] Should commute arithmetic with vector load
From: pinskia at gcc dot gnu.org @ 2021-08-15 22:34 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63271

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
So the two functions are not the same (because __m128i is a vector of 2 long
long [at least now]).
Here is a better testcase:

    #define vector __attribute__((vector_size(16)))
    typedef vector char __m128i ;
    static inline __m128i _mm_set_epi8(char a, char b, char c, char d, char e,
                                       char f, char g, char h, char i, char j,
                                       char k, char l, char m, char n, char o,
                                       char p)
    {
      return (__m128i){a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p};
    }
    __m128i foo(char C)
    {
      return _mm_set_epi8( 0, C, 2*C, 3*C, 4*C, 5*C, 6*C, 7*C,
                           8*C, 9*C, 10*C, 11*C, 12*C, 13*C, 14*C, 15*C);
    }
    __m128i bar(char C)
    {
      __m128i v = _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15);
      vector unsigned char d = (vector unsigned char)v;
      d *= C;
      return (__m128i)d;
    }

-------------------------------------CUT ------------------------
So take the above: on aarch64, SLP does not do it because it does not
recognize 0 and C as being able to be SLPed. If I change both of them to 2*C,
then SLP will do the right thing.
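A small self-check, sketched under stated assumptions, can confirm that the two forms in comment #3 produce the same bytes. The type names v16c and v16uc stand in for the post's __m128i, the _mm_set_epi8 wrapper is folded into a plain constructor, the cast of C to unsigned char is made explicit, and foo_bar_agree is a hypothetical helper that is not part of the bug report.

```c
#include <assert.h>
#include <string.h>

#define vector __attribute__((vector_size(16)))
typedef vector char v16c;            /* stands in for the post's __m128i */
typedef vector unsigned char v16uc;

/* Scalar form, as in foo(): every lane is computed separately. */
static v16c foo(char C)
{
    return (v16c){ 0, C, 2*C, 3*C, 4*C, 5*C, 6*C, 7*C,
                   8*C, 9*C, 10*C, 11*C, 12*C, 13*C, 14*C, 15*C };
}

/* Vector form, as in bar(): one constant vector multiplied lane-wise. */
static v16c bar(char C)
{
    v16uc d = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
    d *= (unsigned char)C;   /* explicit cast; the post leaves it implicit */
    return (v16c)d;
}

/* Illustrative helper: returns 1 when both forms are bit-identical for C. */
int foo_bar_agree(char C)
{
    v16c a = foo(C);
    v16c b = bar(C);
    assert(sizeof a == 16 && sizeof b == 16);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

The check is meant for small values of C, where 15*C still fits in a signed char, so it does not depend on how out-of-range values are converted to char.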