public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/37150]  New: vectorizer issue
@ 2008-08-18 15:34 jv244 at cam dot ac dot uk
  2008-08-18 15:35 ` [Bug middle-end/37150] " jv244 at cam dot ac dot uk
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-08-18 15:34 UTC (permalink / raw)
  To: gcc-bugs

As pointed out here:

http://gcc.gnu.org/ml/gcc/2008-08/msg00290.html

The attached testcase yields (on a core2 duo, gcc trunk):

    gfortran -O3 -ftree-vectorize -ffast-math -march=native test.f90
    time ./a.out

real 0m3.414s

    ifort -xT -O3  test.f90
    time ./a.out

real 0m1.556s

The assembly contains:

        ifort   gfortran
mulpd     140          0
mulsd       0        280


so the reason seems to be that ifort vectorizes the attached testcase while
gfortran does not.
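
The counts fit that picture: a packed mulpd multiplies two doubles per
instruction and a scalar mulsd only one, and 280 = 2 x 140. As a rough
illustration only (this is not the attached testcase; the function names are
made up, and it assumes an x86 target with SSE2), the two forms look like this
in C:

```c
#include <emmintrin.h>  /* SSE2 intrinsics, x86/x86_64 only */
#include <stddef.h>

/* Scalar form as written: one multiply per element (mulsd-style). */
void mul_scalar(double *out, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}

/* Packed form: each _mm_mul_pd multiplies two doubles at once (mulpd). */
void mul_packed(double *out, const double *a, const double *b, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);   /* unaligned load of 2 doubles */
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_mul_pd(va, vb));
    }
    for (; i < n; i++)                      /* leftover element when n is odd */
        out[i] = a[i] * b[i];
}
```

Both functions compute the same products; the packed one needs half as many
multiply instructions for the paired part of the array.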


-- 
           Summary: vectorizer issue
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jv244 at cam dot ac dot uk
  GCC host triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37150



* [Bug middle-end/37150] vectorizer issue
  2008-08-18 15:34 [Bug middle-end/37150] New: vectorizer issue jv244 at cam dot ac dot uk
@ 2008-08-18 15:35 ` jv244 at cam dot ac dot uk
  2008-08-18 15:56 ` rguenth at gcc dot gnu dot org
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-08-18 15:35 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from jv244 at cam dot ac dot uk  2008-08-18 15:33 -------
Created an attachment (id=16082)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16082&action=view)
testcase



* [Bug middle-end/37150] vectorizer issue
  2008-08-18 15:34 [Bug middle-end/37150] New: vectorizer issue jv244 at cam dot ac dot uk
  2008-08-18 15:35 ` [Bug middle-end/37150] " jv244 at cam dot ac dot uk
@ 2008-08-18 15:56 ` rguenth at gcc dot gnu dot org
  2008-08-18 16:10 ` burnus at gcc dot gnu dot org
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-08-18 15:56 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from rguenth at gcc dot gnu dot org  2008-08-18 15:55 -------
Note that there is no loop left on the trunk for the testcase, but after
the vectorizer it is all unrolled completely (unvectorized, of course).
Again this looks like missing vectorization of scalar code.

Note that the first complete unrolling pass unrolls loops that result in
smaller code.  This interferes with vectorization in your case, so can
you try

Index: tree-ssa-loop-ivcanon.c
===================================================================
*** tree-ssa-loop-ivcanon.c     (revision 139200)
--- tree-ssa-loop-ivcanon.c     (working copy)
*************** tree_unroll_loops_completely (bool may_i
*** 359,374 ****

        FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
        {
!         if (may_increase_size && maybe_hot_bb_p (loop->header)
!             /* Unroll outermost loops only if asked to do so or they do
!                not cause code growth.  */
!             && (unroll_outer
!                 || loop_outer (loop_outer (loop))))
            ul = UL_ALL;
          else
            ul = UL_NO_GROWTH;
!         changed |= canonicalize_loop_induction_variables
!                      (loop, false, ul, !flag_tree_loop_ivcanon);
        }

        if (changed)
--- 359,378 ----

        FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
        {
!         /* Unroll outermost loops only if asked to do so.  */
!         if (!unroll_outer
!             && !loop_outer (loop_outer (loop)))
!           ul = UL_SINGLE_ITER;
!         else if (may_increase_size && maybe_hot_bb_p (loop->header))
            ul = UL_ALL;
          else
            ul = UL_NO_GROWTH;
!         if (canonicalize_loop_induction_variables
!               (loop, false, ul, !flag_tree_loop_ivcanon))
!           {
!             statistics_counter_event (cfun, "Loops completely unrolled", 1);
!             changed = true;
!           }
        }

        if (changed)


?
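
To make the interference concrete, here is a small C sketch (not taken from
the testcase; the function names and the trip count of 4 are assumptions
chosen for illustration). Once a short loop has been completely unrolled, the
loop vectorizer has no loop left to work on, only a run of scalar statements:

```c
/* Loop form: a loop vectorizer can analyse this directly. */
void axpy4_loop(double *y, const double *x, double alpha)
{
    for (int i = 0; i < 4; i++)
        y[i] += alpha * x[i];
}

/* After complete unrolling: same arithmetic, but no loop remains.
   Vectorizing this needs straight-line (scalar code) vectorization. */
void axpy4_unrolled(double *y, const double *x, double alpha)
{
    y[0] += alpha * x[0];
    y[1] += alpha * x[1];
    y[2] += alpha * x[2];
    y[3] += alpha * x[3];
}
```

The two functions produce identical results; they differ only in which
vectorization strategy can apply to them.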


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu dot
                   |                            |org



* [Bug middle-end/37150] vectorizer issue
  2008-08-18 15:34 [Bug middle-end/37150] New: vectorizer issue jv244 at cam dot ac dot uk
  2008-08-18 15:35 ` [Bug middle-end/37150] " jv244 at cam dot ac dot uk
  2008-08-18 15:56 ` rguenth at gcc dot gnu dot org
@ 2008-08-18 16:10 ` burnus at gcc dot gnu dot org
  2008-08-19 10:55 ` jv244 at cam dot ac dot uk
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: burnus at gcc dot gnu dot org @ 2008-08-18 16:10 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from burnus at gcc dot gnu dot org  2008-08-18 16:09 -------
Same trend with "ifort -O3" (ifort 11beta) and "gfortran -O3 -ffast-math
-march=native" on AMD Athlon64 X2 4800+ / openSUSE 11. [same mulsd/mulpd
numbers]
ifort 2.452s, gfortran 3.848s -> 57% slower.

With Richard's patch: 3.040s (and mulsd = 0; mulpd = 140)


-- 

burnus at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |burnus at gcc dot gnu dot
                   |                            |org



* [Bug middle-end/37150] vectorizer issue
  2008-08-18 15:34 [Bug middle-end/37150] New: vectorizer issue jv244 at cam dot ac dot uk
                   ` (2 preceding siblings ...)
  2008-08-18 16:10 ` burnus at gcc dot gnu dot org
@ 2008-08-19 10:55 ` jv244 at cam dot ac dot uk
  2008-08-19 11:37 ` jv244 at cam dot ac dot uk
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-08-19 10:55 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from jv244 at cam dot ac dot uk  2008-08-19 10:53 -------
(In reply to comment #2)

> Note that the first complete unrolling pass unrolls loops that result in
> smaller code.  This interferes with vectorization in your case, so can
> you try

Unfortunately, the patch from comment #2 no longer applies to trunk. I applied
it by hand and get improvements similar to the ones observed by Tobias.
Ifort                : 1.54s
Gfortran (unpatched) : 3.30s
Gfortran (patched)   : 1.94s



* [Bug middle-end/37150] vectorizer issue
  2008-08-18 15:34 [Bug middle-end/37150] New: vectorizer issue jv244 at cam dot ac dot uk
                   ` (3 preceding siblings ...)
  2008-08-19 10:55 ` jv244 at cam dot ac dot uk
@ 2008-08-19 11:37 ` jv244 at cam dot ac dot uk
  2008-08-19 13:51 ` jv244 at cam dot ac dot uk
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-08-19 11:37 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from jv244 at cam dot ac dot uk  2008-08-19 11:36 -------
Created an attachment (id=16098)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16098&action=view)
ifort asm

Added the ifort asm. The remaining difference seems to be related to how data
is loaded into the registers.



* [Bug middle-end/37150] vectorizer issue
  2008-08-18 15:34 [Bug middle-end/37150] New: vectorizer issue jv244 at cam dot ac dot uk
                   ` (4 preceding siblings ...)
  2008-08-19 11:37 ` jv244 at cam dot ac dot uk
@ 2008-08-19 13:51 ` jv244 at cam dot ac dot uk
  2008-08-24 22:50 ` [Bug middle-end/37150] vectorizer misses some loops pinskia at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jv244 at cam dot ac dot uk @ 2008-08-19 13:51 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from jv244 at cam dot ac dot uk  2008-08-19 13:50 -------
Created an attachment (id=16099)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16099&action=view)
non-reduced testcase

Unfortunately, on the non-reduced testcase (attached as collocate_fast_2.f90)
the vectorization does not trigger :-(

I guess that this is due to the more complex loop structure? It would be
really great if this could be made to work.



* [Bug middle-end/37150] vectorizer misses some loops
  2008-08-18 15:34 [Bug middle-end/37150] New: vectorizer issue jv244 at cam dot ac dot uk
                   ` (5 preceding siblings ...)
  2008-08-19 13:51 ` jv244 at cam dot ac dot uk
@ 2008-08-24 22:50 ` pinskia at gcc dot gnu dot org
  2008-12-27  6:33 ` pinskia at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2008-08-24 22:50 UTC (permalink / raw)
  To: gcc-bugs



-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
            Summary|vectorizer issue            |vectorizer misses some loops



* [Bug middle-end/37150] vectorizer misses some loops
  2008-08-18 15:34 [Bug middle-end/37150] New: vectorizer issue jv244 at cam dot ac dot uk
                   ` (6 preceding siblings ...)
  2008-08-24 22:50 ` [Bug middle-end/37150] vectorizer misses some loops pinskia at gcc dot gnu dot org
@ 2008-12-27  6:33 ` pinskia at gcc dot gnu dot org
  2009-08-06  7:55 ` jv244 at cam dot ac dot uk
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2008-12-27  6:33 UTC (permalink / raw)
  To: gcc-bugs



-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2008-12-27 06:31:06
               date|                            |



* [Bug middle-end/37150] vectorizer misses some loops
  2008-08-18 15:34 [Bug middle-end/37150] New: vectorizer issue jv244 at cam dot ac dot uk
                   ` (7 preceding siblings ...)
  2008-12-27  6:33 ` pinskia at gcc dot gnu dot org
@ 2009-08-06  7:55 ` jv244 at cam dot ac dot uk
  2009-08-06  9:40 ` rguenth at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jv244 at cam dot ac dot uk @ 2009-08-06  7:55 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from jv244 at cam dot ac dot uk  2009-08-06 07:54 -------
Just verified that current trunk is not yet able to vectorize the test.f90
code. It would be cool if this could be fixed (maybe along the lines of
Richard's previous patch?), as this is similar to CP2K's kernel routines:

> gfortran -O3 -march=native -ffast-math test.f90 &> /dev/null
> time ./a.out

real    0m2.306s
user    0m2.304s
sys     0m0.000s
> ifort -O3 -xT test.f90 &> /dev/null
> time ./a.out

real    0m1.812s
user    0m1.808s
sys     0m0.004s


-- 

jv244 at cam dot ac dot uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2008-12-27 06:31:06         |2009-08-06 07:54:57
               date|                            |



* [Bug middle-end/37150] vectorizer misses some loops
  2008-08-18 15:34 [Bug middle-end/37150] New: vectorizer issue jv244 at cam dot ac dot uk
                   ` (8 preceding siblings ...)
  2009-08-06  7:55 ` jv244 at cam dot ac dot uk
@ 2009-08-06  9:40 ` rguenth at gcc dot gnu dot org
  2009-08-06 10:24 ` jv244 at cam dot ac dot uk
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-08-06  9:40 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from rguenth at gcc dot gnu dot org  2009-08-06 09:40 -------
I think that scalar code vectorization should instead catch this.



* [Bug middle-end/37150] vectorizer misses some loops
  2008-08-18 15:34 [Bug middle-end/37150] New: vectorizer issue jv244 at cam dot ac dot uk
                   ` (9 preceding siblings ...)
  2009-08-06  9:40 ` rguenth at gcc dot gnu dot org
@ 2009-08-06 10:24 ` jv244 at cam dot ac dot uk
  2009-08-06 10:49 ` irar at il dot ibm dot com
  2009-08-06 11:11 ` jv244 at cam dot ac dot uk
  12 siblings, 0 replies; 14+ messages in thread
From: jv244 at cam dot ac dot uk @ 2009-08-06 10:24 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from jv244 at cam dot ac dot uk  2009-08-06 10:24 -------
(In reply to comment #8)
> I think that scalar code vectorization should instead catch this.

Is this 'scalar code vectorization' the same as the SLP that has already been
added?


-- 

jv244 at cam dot ac dot uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |irar at il dot ibm dot com



* [Bug middle-end/37150] vectorizer misses some loops
  2008-08-18 15:34 [Bug middle-end/37150] New: vectorizer issue jv244 at cam dot ac dot uk
                   ` (10 preceding siblings ...)
  2009-08-06 10:24 ` jv244 at cam dot ac dot uk
@ 2009-08-06 10:49 ` irar at il dot ibm dot com
  2009-08-06 11:11 ` jv244 at cam dot ac dot uk
  12 siblings, 0 replies; 14+ messages in thread
From: irar at il dot ibm dot com @ 2009-08-06 10:49 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from irar at il dot ibm dot com  2009-08-06 10:49 -------
Yes. The problem is that only a basic implementation was added. To vectorize
this code, several improvements are needed: support stmt group sizes greater
than vector size, allow loads and stores to the same location, initiate SLP
analysis from groups of loads, support misaligned access, etc.

Finding a benchmark could really help to push these items to the top of
vectorizer's todo list.

Ira
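
A hand-written C sketch of what some of these items amount to, under loudly
stated assumptions: SSE2 with two doubles per vector, made-up function names,
and code that is not derived from the testcase or from the vectorizer's
implementation. Four isomorphic statements form a group larger than the vector
size, each one loads and stores the same y[k], and nothing is known about
alignment:

```c
#include <emmintrin.h>  /* SSE2 intrinsics, x86/x86_64 only */

/* Straight-line scalar code: a group of 4 statements, each reading and
   writing the same location y[k]. */
void update4_scalar(double *y, const double *x, double alpha)
{
    y[0] = y[0] + alpha * x[0];
    y[1] = y[1] + alpha * x[1];
    y[2] = y[2] + alpha * x[2];
    y[3] = y[3] + alpha * x[3];
}

/* The same group packed by hand: 4 statements become two 2-wide vector
   operations, using unaligned loads and stores. */
void update4_sse2(double *y, const double *x, double alpha)
{
    __m128d va = _mm_set1_pd(alpha);
    __m128d lo = _mm_add_pd(_mm_loadu_pd(y),
                            _mm_mul_pd(va, _mm_loadu_pd(x)));
    __m128d hi = _mm_add_pd(_mm_loadu_pd(y + 2),
                            _mm_mul_pd(va, _mm_loadu_pd(x + 2)));
    _mm_storeu_pd(y, lo);
    _mm_storeu_pd(y + 2, hi);
}
```

The second function is only meant to show the shape of the result one would
hope for; how the vectorizer would actually get there is not covered here.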



* [Bug middle-end/37150] vectorizer misses some loops
  2008-08-18 15:34 [Bug middle-end/37150] New: vectorizer issue jv244 at cam dot ac dot uk
                   ` (11 preceding siblings ...)
  2009-08-06 10:49 ` irar at il dot ibm dot com
@ 2009-08-06 11:11 ` jv244 at cam dot ac dot uk
  12 siblings, 0 replies; 14+ messages in thread
From: jv244 at cam dot ac dot uk @ 2009-08-06 11:11 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from jv244 at cam dot ac dot uk  2009-08-06 11:11 -------
(In reply to comment #10)
> Finding a benchmark could really help to push these items to the top of
> vectorizer's todo list.

we're lucky here ;-)

http://gcc.gnu.org/benchmarks/

has a link to

http://cp2k.berlios.de/gfortran/

the code discussed (in particular the above collocate_fast_2.f90) is (in a
slightly older but equivalent variant, see
ftp://ftp.berlios.de/pub/cp2k/gfortran/gcc_bench.tgz) a significant part of the
bench01 benchmark. Getting the same performance as ifort on this kernel would
speed up this benchmark by ~10%, which would be highly significant.


