public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/49849] New: loop optimization prevents vectorization
@ 2011-07-26  7:46 vincenzo.innocente at cern dot ch
  2011-07-26  8:31 ` [Bug tree-optimization/49849] " vincenzo.innocente at cern dot ch
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2011-07-26  7:46 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49849

           Summary: loop optimization prevents vectorization
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: vincenzo.innocente@cern.ch


In the following example I suspect that some sort of loop merging at O3 prevents
the vectorization of the second inner loop in "bar".
Compare:
c++ -Wall -O2 -ftree-vectorize -ftree-vectorizer-verbose=7 -c vectHist.cpp -ffast-math
c++ -Wall -O3 -ftree-vectorize -ftree-vectorizer-verbose=7 -c vectHist.cpp -ffast-math



What I do not understand is that if (following the man page) I compare O2 and O3
with
gcc -c -Q -O3 --help=optimizers > /tmp/O3-opts
gcc -c -Q -O2 --help=optimizers > /tmp/O2-opts
diff /tmp/O2-opts /tmp/O3-opts | grep enabled
>   -fgcse-after-reload         		[enabled]
>   -finline-functions          		[enabled]
>   -fipa-cp-clone              		[enabled]
>   -fpredictive-commoning      		[enabled]
>   -ftree-loop-distribute-patterns 	[enabled]
>   -ftree-vectorize            		[enabled]
>   -funswitch-loops            		[enabled]

and add those flags to the O2 command line by hand, I still get
c++ -std=gnu++0x -DNDEBUG -Wall -O2 -ftree-vectorize -msse4
-fvisibility-inlines-hidden -ftree-vectorizer-verbose=2 --param
vect-max-version-for-alias-checks=30 -funsafe-loop-optimizations
-ftree-loop-distribution -ftree-loop-if-convert-stores -fipa-pta
-Wunsafe-loop-optimizations -fgcse-sm -fgcse-las -c vectHist.cpp -ffast-math
-funswitch-loops -ftree-loop-distribute-patterns -fpredictive-commoning
-finline-functions -fipa-cp-clone -fgcse-after-reload

vectHist.cpp:17: note: not vectorized: data ref analysis failed x_5 =
co[D.4986_4];

vectHist.cpp:16: note: vectorized 0 loops in function.

vectHist.cpp:35: note: not vectorized: data ref analysis failed D.4977_30 =
hist[D.4976_29];

vectHist.cpp:33: note: LOOP VECTORIZED.
vectHist.cpp:31: note: not vectorized: data ref analysis failed D.4957_13 =
co[D.4956_12];

vectHist.cpp:25: note: vectorized 1 loops in function.

while just changing O2 to O3 (which at this point should have no real effect,
as I added all the options by hand) does not vectorize:
c++ -std=gnu++0x -DNDEBUG -Wall -O3 -mavx -ftree-vectorize -msse4
-fvisibility-inlines-hidden -ftree-vectorizer-verbose=2 --param
vect-max-version-for-alias-checks=30 -funsafe-loop-optimizations
-ftree-loop-distribution -ftree-loop-if-convert-stores -fipa-pta
-Wunsafe-loop-optimizations -fgcse-sm -fgcse-las -c vectHist.cpp -ffast-math
-funswitch-loops -ftree-loop-distribute-patterns -fpredictive-commoning
-finline-functions -fipa-cp-clone -fgcse-after-reload 
vectHist.cpp:17: note: not vectorized: data ref analysis failed x_5 =
co[D.5125_4];

vectHist.cpp:17: note: not vectorized: data ref analysis failed x_5 =
co[D.5125_4];

vectHist.cpp:16: note: vectorized 0 loops in function.

vectHist.cpp:30: note: not vectorized: data ref analysis failed D.5096_55 =
co[D.5095_54];

vectHist.cpp:30: note: not vectorized: data ref analysis failed D.5096_55 =
co[D.5095_54];

vectHist.cpp:25: note: vectorized 0 loops in function.

Note how it does not report anything about the loops at lines 31, 33 and 35.

---------------------------
// a classroom example
#include<cmath>

const int N=1024;

float __attribute__ ((aligned(16))) a[N];
float __attribute__ ((aligned(16))) b[N];
float __attribute__ ((aligned(16))) c[N];
float __attribute__ ((aligned(16))) d[N];
int __attribute__ ((aligned(16)))   k[N];



float __attribute__ ((aligned(16))) co[12];
float __attribute__ ((aligned(16))) hist[100];


// do not expect GCC to vectorize (yet)
void foo() {
  for (int i=0; i!=N; ++i) {
    float x = co[k[i]];
    float y = a[i]/std::sqrt(x*b[i]);
    ++hist[int(y)];
  } 
}


// let's give it a hand: split the loop so that the "heavy duty" one vectorizes
void bar() {
  const int S=8;
  int loops = N/S;
  float x[S];
  float y[S];
  for (int j=0; j!=loops; ++j) {
    for (int i=0; i!=S; ++i)
      x[i] = co[k[j+i]];
    for (int i=0; i!=S; ++i) // this should vectorize
      y[i] = a[j+i]/std::sqrt(x[i]*b[j+i]);
    for (int i=0; i!=S; ++i)
      ++hist[int(y[i])];
  } 
}



* [Bug tree-optimization/49849] loop optimization prevents vectorization
From: vincenzo.innocente at cern dot ch @ 2011-07-26  8:31 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49849

--- Comment #1 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2011-07-26 08:30:45 UTC ---
It may be a duplicate of my own PR49730,
since

void bar2(int jj) {
  const int S=8;
  float x[S];
  float y[S];
  int j = jj*S;
  for (int i=0; i!=S; ++i)
    x[i] = co[k[j+i]];
  for (int i=0; i!=S; ++i) // this should vectorize
    y[i] = a[j+i]/std::sqrt(x[i]*b[j+i]);
  for (int i=0; i!=S; ++i)
    ++hist[int(y[i])];
} 

vectorizes at O3.

(Of course, in the example I submitted previously, the outer loop should read

  for (int jj=0; jj!=loops; ++jj) {
    int j = jj*S;

)
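
With that correction applied, the complete function might look like the sketch
below. The name bar_fixed is hypothetical (it does not appear in the report),
the arrays are redeclared only to keep the snippet self-contained, and N is
assumed to be a multiple of S, as in the testcase.

```cpp
#include <cmath>

const int N = 1024;

float __attribute__((aligned(16))) a[N];
float __attribute__((aligned(16))) b[N];
int   __attribute__((aligned(16))) k[N];
float __attribute__((aligned(16))) co[12];
float __attribute__((aligned(16))) hist[100];

// Hypothetical name: bar() from the report with the corrected outer loop.
void bar_fixed() {
  const int S = 8;
  const int loops = N / S;  // assumes N % S == 0
  float x[S];
  float y[S];
  for (int jj = 0; jj != loops; ++jj) {
    const int j = jj * S;  // index of the first element of this block
    for (int i = 0; i != S; ++i)  // indirect load through k
      x[i] = co[k[j + i]];
    for (int i = 0; i != S; ++i)  // the loop expected to vectorize
      y[i] = a[j + i] / std::sqrt(x[i] * b[j + i]);
    for (int i = 0; i != S; ++i)  // indirect update of the histogram
      ++hist[int(y[i])];
  }
}
```

Each block now covers the elements j .. j+S-1 exactly once, so the histogram
receives N entries in total, as it does in foo().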



* [Bug tree-optimization/49849] loop optimization prevents vectorization
From: rguenth at gcc dot gnu.org @ 2011-07-26  9:21 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49849

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2011.07.26 09:21:16
     Ever Confirmed|0                           |1

--- Comment #2 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-07-26 09:21:16 UTC ---
The loop is likely completely unrolled; you can disable that with
--param max-completely-peel-times=1.

I think scalar-code vectorization does not handle this right now because
the temporary arrays that would help it have store-motion applied (and
should be later optimized away, but are not).
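
As a per-loop alternative to the global --param, newer GCC releases (8 and
later, so not the 4.7 discussed here) accept "#pragma GCC unroll 1", which asks
the compiler not to unroll the loop that follows. The sketch below applies it
to the "heavy duty" loop only. heavy_block is a hypothetical helper name, and
whether the pragma keeps the loop vectorizable in the same way as the --param
is an assumption that this thread does not confirm.

```cpp
#include <cmath>

// Hypothetical helper: the "heavy duty" loop of bar() for one block of
// S = 8 elements. The caller passes pointers to the start of the block.
void heavy_block(const float* a, const float* b, const float* x, float* y) {
  const int S = 8;
  // Assumption: requires GCC 8 or later; other compilers may ignore it.
#pragma GCC unroll 1
  for (int i = 0; i != S; ++i)
    y[i] = a[i] / std::sqrt(x[i] * b[i]);
}
```

The vectorizer report for the compiler in use remains the only reliable way to
check whether the loop was actually vectorized.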



* [Bug tree-optimization/49849] loop optimization prevents vectorization
From: vincenzo.innocente at cern dot ch @ 2011-07-26  9:38 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49849

--- Comment #3 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2011-07-26 09:38:13 UTC ---
Thanks Richard,
--param max-completely-peel-times=1
does the trick and, in my real-life example, has no adverse effect
elsewhere,
while it speeds up the loop as expected.
More generally,
do you think that GCC will ever be able to transform things like foo into bar
by itself?



* [Bug tree-optimization/49849] loop optimization prevents vectorization
From: rguenth at gcc dot gnu.org @ 2011-07-26  9:46 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49849

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |spop at gcc dot gnu.org

--- Comment #4 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-07-26 09:45:55 UTC ---
(In reply to comment #3)
> Thanks Richard,
> --param max-completely-peel-times=1
> does the trick and, in my real-life example, has no adverse effect
> elsewhere,
> while it speeds up the loop as expected.
> More generally,
> do you think that GCC will ever be able to transform things like foo into bar
> by itself?

I hope so ;)  The Graphite framework is supposed to provide us with
this kind of feature.
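
Written by hand, the foo-to-bar transformation amounts to strip-mining the loop
into blocks of S elements and then distributing the body into three loops, with
x and y expanded into small temporary arrays. A minimal sketch for an arbitrary
length n follows. hist_fill and its pointer parameters are hypothetical (the
report uses global arrays), and the sketch assumes that the arrays do not
overlap and that every y is a valid index into hist.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical generalization of foo() -> bar(): strip-mine by S, then
// distribute the loop body. Assumes non-overlapping arrays.
void hist_fill(const float* a, const float* b, const int* k,
               const float* co, float* hist, int n) {
  const int S = 8;
  float x[S];
  float y[S];
  for (int j = 0; j < n; j += S) {
    const int len = std::min(S, n - j);  // the last block may be shorter
    for (int i = 0; i != len; ++i)       // indirect load through k
      x[i] = co[k[j + i]];
    for (int i = 0; i != len; ++i)       // arithmetic only, unit stride
      y[i] = a[j + i] / std::sqrt(x[i] * b[j + i]);
    for (int i = 0; i != len; ++i)       // indirect update of the histogram
      ++hist[int(y[i])];
  }
}
```

The elements are still processed in their original order, so the histogram
updates happen in the same sequence as in the single-loop version.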

