public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/37150] New: vectorizer issue
From: jv244 at cam dot ac dot uk @ 2008-08-18 15:34 UTC (permalink / raw)
To: gcc-bugs
As pointed out in:
http://gcc.gnu.org/ml/gcc/2008-08/msg00290.html
The attached testcase yields (on a core2 duo, gcc trunk):
gfortran -O3 -ftree-vectorize -ffast-math -march=native test.f90
time ./a.out
real 0m3.414s
ifort -xT -O3 test.f90
time ./a.out
real 0m1.556s
The assembly contains:
          ifort   gfortran
mulpd       140          0
mulsd         0        280
so the reason seems to be that ifort vectorizes the attached testcase, while
gfortran does not.
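The instruction counts in this report can be reproduced with any tool that counts mnemonics in the compilers' assembly output (for example the file produced by `gfortran -S`). A minimal C sketch follows; the function name `count_mnemonic` and its whole-token matching rule are assumptions made for illustration and are not part of the original report.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Count how often `mnemonic` appears as a whole token in `asm_text`.
   A token is delimited by characters that are neither alphanumeric
   nor '_', so "mulpd" does not match inside "vmulpd".  */
size_t
count_mnemonic (const char *asm_text, const char *mnemonic)
{
  assert (asm_text != NULL && mnemonic != NULL && mnemonic[0] != '\0');

  size_t len = strlen (mnemonic);
  size_t count = 0;

  for (const char *p = strstr (asm_text, mnemonic); p != NULL;
       p = strstr (p + 1, mnemonic))
    {
      /* Check the character before and after the match.  */
      int starts_token = (p == asm_text)
                         || !(isalnum ((unsigned char) p[-1]) || p[-1] == '_');
      int ends_token = !(isalnum ((unsigned char) p[len]) || p[len] == '_');
      if (starts_token && ends_token)
        count++;
    }
  return count;
}
```

Reading the assembly file into a NUL-terminated buffer and calling `count_mnemonic (buf, "mulpd")` and `count_mnemonic (buf, "mulsd")` gives the two rows of the comparison for one compiler.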
--
Summary: vectorizer issue
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: jv244 at cam dot ac dot uk
GCC host triplet: x86_64-unknown-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37150
* [Bug middle-end/37150] vectorizer issue
From: jv244 at cam dot ac dot uk @ 2008-08-18 15:35 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from jv244 at cam dot ac dot uk 2008-08-18 15:33 -------
Created an attachment (id=16082)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16082&action=view)
testcase
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37150
* [Bug middle-end/37150] vectorizer issue
From: rguenth at gcc dot gnu dot org @ 2008-08-18 15:56 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from rguenth at gcc dot gnu dot org 2008-08-18 15:55 -------
Note that there is no loop left on the trunk for the testcase, but after
the vectorizer it is all unrolled completely (unvectorized, of course).
Again this looks like missing vectorization of scalar code.
Note that the first complete unrolling pass unrolls loops that result in
smaller code. This interferes with vectorization in your case, so can
you try
Index: tree-ssa-loop-ivcanon.c
===================================================================
*** tree-ssa-loop-ivcanon.c (revision 139200)
--- tree-ssa-loop-ivcanon.c (working copy)
*************** tree_unroll_loops_completely (bool may_i
*** 359,374 ****
    FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
      {
!       if (may_increase_size && maybe_hot_bb_p (loop->header)
!           /* Unroll outermost loops only if asked to do so or they do
!              not cause code growth. */
!           && (unroll_outer
!               || loop_outer (loop_outer (loop))))
          ul = UL_ALL;
        else
          ul = UL_NO_GROWTH;
!       changed |= canonicalize_loop_induction_variables
!                  (loop, false, ul, !flag_tree_loop_ivcanon);
      }
    if (changed)
--- 359,378 ----
    FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
      {
!       /* Unroll outermost loops only if asked to do so. */
!       if (!unroll_outer
!           && !loop_outer (loop_outer (loop)))
!         ul = UL_SINGLE_ITER;
!       else if (may_increase_size && maybe_hot_bb_p (loop->header))
          ul = UL_ALL;
        else
          ul = UL_NO_GROWTH;
!       if (canonicalize_loop_induction_variables
!           (loop, false, ul, !flag_tree_loop_ivcanon))
!         {
!           statistics_counter_event (cfun, "Loops completely unrolled", 1);
!           changed = true;
!         }
      }
    if (changed)
?
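Read as a decision rule, the patch above changes which unroll limit an outermost loop receives. The following standalone C sketch restates both versions of the condition so they can be compared. The enum is a local stand-in for the `UL_*` values named in the patch, and the two helper functions and their boolean parameters are hypothetical names introduced here: `is_outermost` stands for `!loop_outer (loop_outer (loop))` and `header_maybe_hot` for `maybe_hot_bb_p (loop->header)`.

```c
#include <stdbool.h>

/* Local stand-ins for the unroll limits named in the patch.  */
enum unroll_level
{
  UL_SINGLE_ITER,  /* Only loops that exit in their first iteration.  */
  UL_NO_GROWTH,    /* Only loops whose unrolling does not grow the code.  */
  UL_ALL           /* All suitable loops.  */
};

/* Condition before the patch (hypothetical helper name).  */
enum unroll_level
unroll_level_before (bool may_increase_size, bool unroll_outer,
                     bool is_outermost, bool header_maybe_hot)
{
  if (may_increase_size && header_maybe_hot
      && (unroll_outer || !is_outermost))
    return UL_ALL;
  return UL_NO_GROWTH;
}

/* Condition after the patch (hypothetical helper name).  */
enum unroll_level
unroll_level_after (bool may_increase_size, bool unroll_outer,
                    bool is_outermost, bool header_maybe_hot)
{
  /* Outermost loops are left alone unless unrolling them was requested.  */
  if (!unroll_outer && is_outermost)
    return UL_SINGLE_ITER;
  if (may_increase_size && header_maybe_hot)
    return UL_ALL;
  return UL_NO_GROWTH;
}
```

For an outermost loop with `unroll_outer` false, the first function returns `UL_NO_GROWTH`, so a loop whose unrolled form is smaller is still unrolled before the vectorizer sees it. The second returns `UL_SINGLE_ITER`, which keeps such a loop intact at that point.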
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rguenth at gcc dot gnu dot
| |org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37150
* [Bug middle-end/37150] vectorizer issue
From: burnus at gcc dot gnu dot org @ 2008-08-18 16:10 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from burnus at gcc dot gnu dot org 2008-08-18 16:09 -------
Same trend with "ifort -O3" (ifort 11beta) and "gfortran -O3 -ffast-math
-march=native" on AMD Athlon64 X2 4800+ / openSUSE 11 (same mulsd/mulpd
numbers).
ifort 2.452s, gfortran 3.848s -> 57% slower.
With Richard's patch: 3.040s (and mulsd = 0; mulpd = 140)
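For reference, the percentage quoted in this comment is the relative slowdown against the ifort time. A small C sketch of that arithmetic follows; the function name `percent_slower` is an assumption introduced here.

```c
/* Relative slowdown of run time `t` against reference time `t_ref`,
   expressed in percent.  */
double
percent_slower (double t, double t_ref)
{
  return (t - t_ref) / t_ref * 100.0;
}
```

With the timings from this comment, `percent_slower (3.848, 2.452)` is about 56.9 (the quoted 57%), and `percent_slower (3.040, 2.452)` is about 24.0 for the patched compiler.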
--
burnus at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |burnus at gcc dot gnu dot
| |org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37150
* [Bug middle-end/37150] vectorizer issue
From: jv244 at cam dot ac dot uk @ 2008-08-19 10:55 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from jv244 at cam dot ac dot uk 2008-08-19 10:53 -------
(In reply to comment #2)
> Note that the first complete unrolling pass unrolls loops that result in
> smaller code. This interferes with vectorization in your case, so can
> you try
Unfortunately, the patch from comment #2 no longer applies to trunk. I applied
it by hand and get an improvement similar to the one observed by Tobias.
Ifort : 1.54s
Gfortran (unpatched) : 3.30s
Gfortran (patched) : 1.94s
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37150
* [Bug middle-end/37150] vectorizer issue
From: jv244 at cam dot ac dot uk @ 2008-08-19 11:37 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from jv244 at cam dot ac dot uk 2008-08-19 11:36 -------
Created an attachment (id=16098)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16098&action=view)
ifort asm
Added the ifort asm. The remaining difference seems to be related to how data
is loaded into the registers.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37150
* [Bug middle-end/37150] vectorizer issue
From: jv244 at cam dot ac dot uk @ 2008-08-19 13:51 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from jv244 at cam dot ac dot uk 2008-08-19 13:50 -------
Created an attachment (id=16099)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16099&action=view)
non-reduced testcase
Unfortunately, on the non-reduced testcase (attached as collocate_fast_2.f90)
the vectorization does not trigger :-(
I guess that this is due to the more complex loop structure? It would be
really great if this could be made to work.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37150
* [Bug middle-end/37150] vectorizer misses some loops
From: pinskia at gcc dot gnu dot org @ 2008-08-24 22:50 UTC (permalink / raw)
To: gcc-bugs
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
Summary|vectorizer issue |vectorizer misses some loops
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37150
* [Bug middle-end/37150] vectorizer misses some loops
From: pinskia at gcc dot gnu dot org @ 2008-12-27 6:33 UTC (permalink / raw)
To: gcc-bugs
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever Confirmed|0 |1
Last reconfirmed|0000-00-00 00:00:00 |2008-12-27 06:31:06
date| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37150
* [Bug middle-end/37150] vectorizer misses some loops
From: jv244 at cam dot ac dot uk @ 2009-08-06 7:55 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from jv244 at cam dot ac dot uk 2009-08-06 07:54 -------
Just verified that current trunk is not yet able to vectorize the test.f90
code. It would be cool if this could be fixed (maybe along the lines of
Richard's previous patch?), as this is similar to CP2K's kernel routines:
> gfortran -O3 -march=native -ffast-math test.f90 &> /dev/null
> time ./a.out
real 0m2.306s
user 0m2.304s
sys 0m0.000s
> ifort -O3 -xT test.f90 &> /dev/null
> time ./a.out
real 0m1.812s
user 0m1.808s
sys 0m0.004s
--
jv244 at cam dot ac dot uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed|2008-12-27 06:31:06 |2009-08-06 07:54:57
date| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37150
* [Bug middle-end/37150] vectorizer misses some loops
From: rguenth at gcc dot gnu dot org @ 2009-08-06 9:40 UTC (permalink / raw)
To: gcc-bugs
------- Comment #8 from rguenth at gcc dot gnu dot org 2009-08-06 09:40 -------
I think that scalar code vectorization should instead catch this.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37150
* [Bug middle-end/37150] vectorizer misses some loops
From: jv244 at cam dot ac dot uk @ 2009-08-06 10:24 UTC (permalink / raw)
To: gcc-bugs
------- Comment #9 from jv244 at cam dot ac dot uk 2009-08-06 10:24 -------
(In reply to comment #8)
> I think that scalar code vectorization should instead catch this.
Is this 'scalar code vectorization' the same as the SLP that has already been
added?
--
jv244 at cam dot ac dot uk changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |irar at il dot ibm dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37150
* [Bug middle-end/37150] vectorizer misses some loops
From: irar at il dot ibm dot com @ 2009-08-06 10:49 UTC (permalink / raw)
To: gcc-bugs
------- Comment #10 from irar at il dot ibm dot com 2009-08-06 10:49 -------
Yes. The problem is that only a basic implementation was added. To vectorize
this code, several improvements must be made: support stmt group sizes greater
than the vector size, allow loads and stores to the same location, initiate SLP
analysis from groups of loads, support misaligned accesses, etc.
Finding a benchmark could really help to push these items to the top of the
vectorizer's todo list.
Ira
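To make the first item in the list above concrete, here is a small C illustration of straight-line scalar code whose statement group is larger than one vector. It is only a sketch: the function names are invented for this note, the 2-element vector width assumes double precision with SSE2, and nothing here claims that a particular GCC release does or does not vectorize either form.

```c
/* Straight-line scalar form, of the kind left behind by complete
   unrolling: four independent multiplies on adjacent elements.  With
   2-element double vectors this group of 4 statements is larger than
   one vector.  */
void
scale4_scalar (double *restrict out, const double *restrict in, double s)
{
  out[0] = in[0] * s;
  out[1] = in[1] * s;
  out[2] = in[2] * s;
  out[3] = in[3] * s;
}

/* The same computation regrouped by hand into two pairs, which is the
   shape a 2-wide packed multiply would cover.  */
void
scale4_paired (double *restrict out, const double *restrict in, double s)
{
  for (int pair = 0; pair < 2; pair++)
    {
      out[2 * pair]     = in[2 * pair]     * s;  /* lane 0 */
      out[2 * pair + 1] = in[2 * pair + 1] * s;  /* lane 1 */
    }
}
```

Both functions compute the same result. The point is that an SLP pass limited to groups no larger than one vector cannot treat the four statements of the first form as a single group, so it has to be able to split them into vector-sized pieces.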
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37150
* [Bug middle-end/37150] vectorizer misses some loops
From: jv244 at cam dot ac dot uk @ 2009-08-06 11:11 UTC (permalink / raw)
To: gcc-bugs
------- Comment #11 from jv244 at cam dot ac dot uk 2009-08-06 11:11 -------
(In reply to comment #10)
> Finding a benchmark could really help to push these items to the top of
> vectorizer's todo list.
we're lucky here ;-)
http://gcc.gnu.org/benchmarks/
has a link to
http://cp2k.berlios.de/gfortran/
The code discussed (in particular the above collocate_fast_2.f90) is, in a
slightly older but equivalent variant (see
ftp://ftp.berlios.de/pub/cp2k/gfortran/gcc_bench.tgz), a significant part of
the bench01 benchmark. Getting the same performance as ifort on this kernel
would speed up this benchmark by about 10%, which would be highly significant.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37150