[Bug tree-optimization/57223] Auto-vectorization fails for nested multiple loops depending on type of array

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/57223] Auto-vectorization fails for nested multiple loops depending on type of array
       [not found] <bug-57223-4@http.gcc.gnu.org/bugzilla/>
@ 2013-05-09  5:06 ` snagavallis at outlook dot com
  2013-05-09  5:11 ` snagavallis at outlook dot com
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 7+ messages in thread
From: snagavallis at outlook dot com @ 2013-05-09  5:06 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57223

--- Comment #1 from Sasanka Nagavalli <snagavallis at outlook dot com> 2013-05-09 05:06:32 UTC ---
Created attachment 30069
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30069
Test case for issue 57223

Adding a test case that demonstrates the issue for several types.


From: snagavallis at outlook dot com @ 2013-05-09  5:11 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57223

--- Comment #2 from Sasanka Nagavalli <snagavallis at outlook dot com> 2013-05-09 05:11:28 UTC ---
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.7/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro
4.7.3-1ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs
--enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-4.7 --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--with-gxx-include-dir=/usr/include/c++/4.7 --libdir=/usr/lib --enable-nls
--with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin
--with-system-zlib --enable-objc-gc --with-cloog --enable-cloog-backend=ppl
--disable-cloog-version-check --disable-ppl-version-check --enable-multiarch
--disable-werror --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --with-tune=generic --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.7.3 (Ubuntu/Linaro 4.7.3-1ubuntu1)

$ gcc -march=native -Q --help=target
The following options are target specific:
  -m128bit-long-double                [disabled]
  -m32                                [disabled]
  -m3dnow                             [disabled]
  -m3dnowa                            [disabled]
  -m64                                [enabled]
  -m80387                             [enabled]
  -m8bit-idiv                         [disabled]
  -m96bit-long-double                 [enabled]
  -mabi=                              sysv
  -mabm                               [disabled]
  -maccumulate-outgoing-args          [disabled]
  -maddress-mode=                     short
  -maes                               [disabled]
  -malign-double                      [disabled]
  -malign-functions=                  0
  -malign-jumps=                      0
  -malign-loops=                      0
  -malign-stringops                   [enabled]
  -mandroid                           [disabled]
  -march=                             corei7
  -masm=                              att
  -mavx                               [disabled]
  -mavx2                              [disabled]
  -mavx256-split-unaligned-load     [disabled]
  -mavx256-split-unaligned-store     [disabled]
  -mbionic                            [disabled]
  -mbmi                               [disabled]
  -mbmi2                              [disabled]
  -mbranch-cost=                      0
  -mcld                               [disabled]
  -mcmodel=                           32
  -mcpu=                              
  -mcrc32                             [disabled]
  -mcx16                              [enabled]
  -mdispatch-scheduler                [disabled]
  -mf16c                              [disabled]
  -mfancy-math-387                    [enabled]
  -mfentry                            [enabled]
  -mfma                               [disabled]
  -mfma4                              [disabled]
  -mforce-drap                        [disabled]
  -mfp-ret-in-387                     [enabled]
  -mfpmath=                           387
  -mfsgsbase                          [disabled]
  -mfused-madd                        
  -mglibc                             [enabled]
  -mhard-float                        [enabled]
  -mieee-fp                           [enabled]
  -mincoming-stack-boundary=          0
  -minline-all-stringops              [disabled]
  -minline-stringops-dynamically     [disabled]
  -mintel-syntax                      
  -mlarge-data-threshold=             0x10000
  -mlwp                               [disabled]
  -mlzcnt                             [disabled]
  -mmmx                               [disabled]
  -mmovbe                             [disabled]
  -mms-bitfields                      [disabled]
  -mno-align-stringops                [disabled]
  -mno-fancy-math-387                 [disabled]
  -mno-push-args                      [disabled]
  -mno-red-zone                       [disabled]
  -mno-sse4                           [disabled]
  -momit-leaf-frame-pointer           [disabled]
  -mpc32                              [disabled]
  -mpc64                              [disabled]
  -mpc80                              [disabled]
  -mpclmul                            [disabled]
  -mpopcnt                            [enabled]
  -mprefer-avx128                     [disabled]
  -mpreferred-stack-boundary=         0
  -mpush-args                         [enabled]
  -mrdrnd                             [disabled]
  -mrecip                             [disabled]
  -mrecip=                            
  -mred-zone                          [enabled]
  -mregparm=                          0
  -mrtd                               [disabled]
  -msahf                              [enabled]
  -msoft-float                        [disabled]
  -msse                               [enabled]
  -msse2                              [enabled]
  -msse2avx                           [disabled]
  -msse3                              [enabled]
  -msse4                              [enabled]
  -msse4.1                            [enabled]
  -msse4.2                            [enabled]
  -msse4a                             [disabled]
  -msse5                              
  -msseregparm                        [disabled]
  -mssse3                             [enabled]
  -mstack-arg-probe                   [disabled]
  -mstackrealign                      [enabled]
  -mstringop-strategy=                [default]
  -mtbm                               [disabled]
  -mtls-dialect=                      gnu
  -mtls-direct-seg-refs               [enabled]
  -mtune=                             corei7
  -muclibc                            [disabled]
  -mveclibabi=                        [default]
  -mvect8-ret-in-mem                  [disabled]
  -mvzeroupper                        [disabled]
  -mx32                               [disabled]
  -mxop                               [disabled]

  Known assembler dialects (for use with the -masm-dialect= option):
    att intel

  Known ABIs (for use with the -mabi= option):
    ms sysv

  Known code models (for use with the -mcmodel= option):
    32 kernel large medium small

  Valid arguments to -mfpmath=:
    387 387+sse 387,sse both sse sse+387 sse,387

  Known vectorization library ABIs (for use with the -mveclibabi= option):
    acml svml

  Known address mode (for use with the -maddress-mode= option):
    long short

  Valid arguments to -mstringop-strategy=:
    byte_loop libcall loop rep_4byte rep_8byte rep_byte unrolled_loop

  Known TLS dialects (for use with the -mtls-dialect= option):
    gnu gnu2


From: y.usishchev at samsung dot com @ 2013-09-26  8:45 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57223

Usishchev Yury <y.usishchev at samsung dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |y.usishchev at samsung dot com

--- Comment #3 from Usishchev Yury <y.usishchev at samsung dot com> ---
I'm testing it on current trunk, and the second loop is not vectorized with
either floating point or integer types.
For floating point types it is not vectorized due to control flow in the loop:

 <bb 15>:
// ...
if (t_56 > _61)
  goto <bb 16>;
else
  goto <bb 17>;
 <bb 16>:
 <bb 17>:
# iftmp.2_7 = PHI <_61(16), t_56(15)>

This could be optimized to a MIN_EXPR in the phiopt pass, but it is not,
because of NaNs:

tree-ssa-phiopt.c:876:
  /* The optimization may be unsafe due to NaNs.  */
  if (HONOR_NANS (TYPE_MODE (type)))
    return false;
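
The NaN concern can be seen at source level. The compare-and-select in the
dump is not symmetric once one operand is a NaN, while (as far as I
understand it) a MIN_EXPR does not promise which operand it returns in that
case. A minimal sketch; the function names are made up for illustration and
are not from the attached testcase:

```c
#include <math.h>

/* Same shape as the PHI above: pick x when t > x, otherwise t. */
float select_min(float x, float t)
{
    return (t > x) ? x : t;
}

/* Returns 1 when swapping the operands changes whether the result is NaN. */
int nan_order_matters(void)
{
    float nan_val = NAN;
    float a = select_min(nan_val, 1.0f); /* compare is false, result is t = 1.0 */
    float b = select_min(1.0f, nan_val); /* compare is false, result is t = NaN */
    return !isnan(a) && isnan(b);
}
```

With ordinary finite inputs the two operand orders agree, which is why
-ffinite-math-only lets the transformation go ahead.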

If compiled with -ffinite-math-only, the second loop is still not vectorized:

not_always_good.c:16:7: note: not vectorized: latch block not empty.

(The same occurs with integer types.) The latch block in that case is:

 <bb 14>:
  pretmp_176 = *prephitmp_173;
  goto <bb 13>;

The statement here is generated in the PRE pass.

For vectorization to work, we can either not generate it in PRE or move it
to the head of the loop in the vectorizer.

Right now I'm trying to find how to prevent PRE from generating statements in
empty latch blocks.


From: rguenth at gcc dot gnu.org @ 2013-09-26  9:23 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57223

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Usishchev Yury from comment #3)
> I'm testing it on current trunk, and second loop is not vectorized both with
> floating point and integer types.
> For floating point types it is not vectorized due to control flow in loop:
> 
>  <bb 15>:
> // ...
> if (t_56 > _61)
>   goto <bb 16>;
> else
>   goto <bb 17>;
>  <bb 16>:
>  <bb 17>:
> # iftmp.2_7 = PHI <_61(16), t_56(15)>
> 
> This can be optimized to MIN_EXPR in phiopt pass, but is not because of NaNs:
> 
> tree-ssa-phiopt.c:876:
>   /* The optimization may be unsafe due to NaNs.  */
>   if (HONOR_NANS (TYPE_MODE (type)))
>     return false;
> 
> If compiled with -ffinite-math-only second loop still is not vectorised:
> 
> not_always_good.c:16:7: note: not vectorized: latch block not empty.
> 
> (same occurs with integer types). Latch block it that case is:
> 
>  <bb 14>:
>   pretmp_176 = *prephitmp_173;
>   goto <bb 13>;
> 
> the statement here is generated in pre pass.
> 
> For vectorization to work we can either not generate it in pre or move it
> into head of the loop in vectorizer.
> 
> Right now i'm trying to find how to prevent pre from generating statements
> in empty latch blocks.

It already has code to avoid this in some situations.


From: rguenth at gcc dot gnu.org @ 2013-09-26 14:47 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57223

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-09-26
     Ever confirmed|0                           |1

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Marc Glisse from comment #5)
> (In reply to Usishchev Yury from comment #3)
> > I'm testing it on current trunk, and second loop is not vectorized both with
> > floating point and integer types.
> > For floating point types it is not vectorized due to control flow in loop:
> > 
> >  <bb 15>:
> > // ...
> > if (t_56 > _61)
> >   goto <bb 16>;
> > else
> >   goto <bb 17>;
> >  <bb 16>:
> >  <bb 17>:
> > # iftmp.2_7 = PHI <_61(16), t_56(15)>
> > 
> > This can be optimized to MIN_EXPR in phiopt pass, but is not because of NaNs:
> 
> Even without MIN_EXPR, ifcvt should turn this into a COND_EXPR that can be
> vectorized, no?

I think it is.  Both loops are vectorized with -O3 -fno-tree-pre and test_t ==
float on x86_64.

Confirmed for the PRE issue.
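
To illustrate why the COND_EXPR form is vectorizable without turning it into
a MIN_EXPR: a select keeps the exact result of the original branch, NaNs
included, but has no control flow. The sketch below emulates the usual
compare-then-blend lowering with a bit mask in scalar C. It is only an
illustration of the idea, not what the vectorizer emits, and the function
names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Branch-free equivalent of (x < t) ? x : t for 32-bit floats. */
float select_lt(float x, float t)
{
    uint32_t xb, tb, rb, mask;
    float r;

    memcpy(&xb, &x, sizeof xb);
    memcpy(&tb, &t, sizeof tb);
    mask = (uint32_t)0 - (uint32_t)(x < t); /* all ones when x < t, else zero */
    rb = (xb & mask) | (tb & ~mask);        /* blend the two bit patterns */
    memcpy(&r, &rb, sizeof r);
    return r;
}

/* Element-wise d[i] = (d[i] < t[i]) ? d[i] : t[i] with no branch in the body. */
void min_select(float *d, const float *t, int n)
{
    int i;
    for (i = 0; i < n; ++i)
        d[i] = select_lt(d[i], t[i]);
}
```

Because the comparison itself is unchanged, a NaN in either operand gives the
same answer as the branchy code, so no -ffinite-math-only is needed for this
form.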


From: rguenth at gcc dot gnu.org @ 2013-09-26 14:53 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57223

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Testcase for the PRE issue:

typedef float test_t;
void foo(test_t * d, int n)
{
  int i, j, k;
  for (k=0; k<n; ++k) {
      for (i=0; i<n; ++i) {
          test_t t;
          j = k;
          t = d[i*n+k] + d[k*n+j];
          d[i*n+j] = (d[i*n+j] < t) ? d[i*n+j] : t;
          for (j=k+1; j<n; ++j) {
              t = d[i*n+k] + d[k*n+j];
              d[i*n+j] = (d[i*n+j] < t) ? d[i*n+j] : t;
          }
      }
  }
}

for which I think we have (simpler) duplicates.  We do not inhibit
PRE from removing partial memory redundancies just because we vectorize.
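
A source-level sketch of the loop shape that results, under the assumption
(not confirmed by the dumps quoted in this thread) that the partially
redundant load is d[i*n+k]: its value is known on loop entry from the store
just before the inner loop, so PRE only needs to reload it on the back edge,
which is what leaves a statement in the latch. foo_pre_shape is a
hypothetical hand-written equivalent, not compiler output:

```c
typedef float test_t;

/* Loop nest as in the testcase above. */
void foo(test_t *d, int n)
{
    int i, j, k;
    for (k = 0; k < n; ++k) {
        for (i = 0; i < n; ++i) {
            test_t t;
            j = k;
            t = d[i*n+k] + d[k*n+j];
            d[i*n+j] = (d[i*n+j] < t) ? d[i*n+j] : t;
            for (j = k + 1; j < n; ++j) {
                t = d[i*n+k] + d[k*n+j];
                d[i*n+j] = (d[i*n+j] < t) ? d[i*n+j] : t;
            }
        }
    }
}

/* Same computation with the load of d[i*n+k] carried in a temporary. */
void foo_pre_shape(test_t *d, int n)
{
    int i, j, k;
    for (k = 0; k < n; ++k) {
        for (i = 0; i < n; ++i) {
            test_t t = d[i*n+k] + d[k*n+k];
            test_t dik = (d[i*n+k] < t) ? d[i*n+k] : t;
            d[i*n+k] = dik;              /* value of d[i*n+k] is known here */
            j = k + 1;
            if (j < n) {
                for (;;) {
                    t = dik + d[k*n+j];
                    d[i*n+j] = (d[i*n+j] < t) ? d[i*n+j] : t;
                    ++j;
                    if (j >= n)
                        break;
                    dik = d[i*n+k];      /* "latch": reload on the back edge only */
                }
            }
        }
    }
}
```

Both functions compute the same result; the second just makes visible the
extra statement between the exit test and the loop header that the
vectorizer reports as "latch block not empty".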


From: rguenth at gcc dot gnu.org @ 2013-09-26 14:57 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57223

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
Dup.

*** This bug has been marked as a duplicate of bug 35229 ***

