public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/44576]  New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling
@ 2010-06-18  7:51 borntraeger at de dot ibm dot com
  2010-06-18  7:59 ` [Bug middle-end/44576] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling borntraeger at de dot ibm dot com
                   ` (22 more replies)
  0 siblings, 23 replies; 24+ messages in thread
From: borntraeger at de dot ibm dot com @ 2010-06-18  7:51 UTC (permalink / raw)
  To: gcc-bugs

testsuite/gfortran.dg/zero_sized_1.f90 takes almost 11 minutes to compile on my
notebook when combining aggressive loop prefetching with loop peeling:

$ time gfortran-4.5 -O3 -march=core2  zero_sized_1.f90 -S
-fprefetch-loop-arrays  -funroll-loops --param max-completely-peeled-insns=2000

real    10m54.594s
user    10m48.638s
sys     0m0.393s

The compiler spends almost all of that time in compute_miss_rate, which was
introduced by
http://gcc.gnu.org/ml/gcc-patches/2009-08/msg00641.html

The problem is triggered by several things:
- loop peeling creates thousands of references (with only a small delta)  and
each reference is compared with every other reference
- for each comparison we iterate over all alignments
- for each alignment we iterate over all distinct iterators

As you can see, this causes the complexity to explode.
Furthermore, since the cache line size is passed in via an external variable,
the compiler cannot optimize the integer division into shifts.

I think the right solution would be to reduce the complexity of
compute_miss_rate, but I have not found a good solution yet.
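
The nesting described above can be illustrated with a toy model. This is a sketch in Python, not GCC's C source: the function bodies follow only the description in this report, and the 8-byte spacing between references, the default parameter values, and the helper name pairwise_cost are assumptions made for illustration. All arguments are assumed to be positive.

```python
def compute_miss_rate(cache_line_size, step, delta, distinct_iters, align_unit):
    """Return (miss rate per mille, positions examined) for two references
    that are `delta` bytes apart and advance by `step` bytes per iteration."""
    total = 0
    misses = 0
    # Every possible alignment of the first reference within its cache line.
    for align in range(0, cache_line_size, align_unit):
        # Every distinct iteration of the loop.
        for it in range(distinct_iters):
            addr1 = align + step * it
            addr2 = addr1 + delta
            total += 1
            # A miss: the two addresses fall on different cache lines.
            if addr1 // cache_line_size != addr2 // cache_line_size:
                misses += 1
    return 1000 * misses // total, total


def pairwise_cost(num_refs, cache_line_size=64, step=8, distinct_iters=4,
                  align_unit=8):
    """Count inner-loop steps when every reference is paired with every other."""
    steps = 0
    for i in range(num_refs):
        for j in range(i + 1, num_refs):
            # Hypothetical layout: neighbouring references are 8 bytes apart.
            delta = (j - i) * 8
            _, examined = compute_miss_rate(cache_line_size, step, delta,
                                            distinct_iters, align_unit)
            steps += examined
    return steps


# Work grows with the square of the reference count.
for n in (10, 100, 200):
    print(n, pairwise_cost(n))
```

In this model the step count is (number of pairs) x (alignments) x (iterations), so multiplying the number of references by ten multiplies the work by roughly one hundred.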


-- 
           Summary: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile
                    time on prefetching+peeling
           Product: gcc
           Version: 4.5.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: borntraeger at de dot ibm dot com
  GCC host triplet: several, i486, s390..


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44576



* [Bug middle-end/44576] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
  2010-06-18  7:51 [Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling borntraeger at de dot ibm dot com
@ 2010-06-18  7:59 ` borntraeger at de dot ibm dot com
  2010-06-18 10:52 ` [Bug middle-end/44576] [4.5/4.6 Regression] " rguenth at gcc dot gnu dot org
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: borntraeger at de dot ibm dot com @ 2010-06-18  7:59 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from borntraeger at de dot ibm dot com  2010-06-18 07:59 -------
4.6 (trunk) is also affected


-- 

borntraeger at de dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|testsuite/gfortran.dg/zero_s|testsuite/gfortran.dg/zero_s
                   |ized_1.f90 with huge compile|ized_1.f90 with huge compile
                   |time on prefetching+peeling |time on prefetching +
                   |                            |peeling



* [Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
  2010-06-18  7:51 [Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling borntraeger at de dot ibm dot com
  2010-06-18  7:59 ` [Bug middle-end/44576] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling borntraeger at de dot ibm dot com
@ 2010-06-18 10:52 ` rguenth at gcc dot gnu dot org
  2010-06-24 21:44 ` rguenth at gcc dot gnu dot org
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-06-18 10:52 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from rguenth at gcc dot gnu dot org  2010-06-18 10:52 -------
Confirmed.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
           Keywords|                            |compile-time-hog
   Last reconfirmed|0000-00-00 00:00:00         |2010-06-18 10:52:06
               date|                            |
            Summary|testsuite/gfortran.dg/zero_s|[4.5/4.6 Regression]
                   |ized_1.f90 with huge compile|testsuite/gfortran.dg/zero_s
                   |time on prefetching +       |ized_1.f90 with huge compile
                   |peeling                     |time on prefetching +
                   |                            |peeling
   Target Milestone|---                         |4.5.1



* [Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
  2010-06-18  7:51 [Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling borntraeger at de dot ibm dot com
  2010-06-18  7:59 ` [Bug middle-end/44576] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling borntraeger at de dot ibm dot com
  2010-06-18 10:52 ` [Bug middle-end/44576] [4.5/4.6 Regression] " rguenth at gcc dot gnu dot org
@ 2010-06-24 21:44 ` rguenth at gcc dot gnu dot org
  2010-06-25  9:03 ` borntraeger at de dot ibm dot com
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-06-24 21:44 UTC (permalink / raw)
  To: gcc-bugs



-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2



* [Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
  2010-06-18  7:51 [Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling borntraeger at de dot ibm dot com
                   ` (2 preceding siblings ...)
  2010-06-24 21:44 ` rguenth at gcc dot gnu dot org
@ 2010-06-25  9:03 ` borntraeger at de dot ibm dot com
  2010-06-25 17:08 ` changpeng dot fang at amd dot com
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: borntraeger at de dot ibm dot com @ 2010-06-25  9:03 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from borntraeger at de dot ibm dot com  2010-06-25 09:02 -------
Created an attachment (id=21001)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21001&action=view)
Potential fix for compile time regression

Here is a potential fix. We just limit prefetching to loops with a "low" amount
of memory references and bail out if the amount of references is too large.
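
A minimal sketch of the bail-out idea, in Python rather than GCC's C. The cap of 200 and both function names are assumptions chosen for illustration; the attached patch may use a different limit and structure.

```python
# Hypothetical cap on memory references per loop (illustrative value only).
MAX_MEM_REFS = 200


def pairwise_comparisons(num_refs):
    """Number of reference pairs a reuse analysis would have to look at."""
    return num_refs * (num_refs - 1) // 2


def should_analyze(num_refs, max_refs=MAX_MEM_REFS):
    """Skip prefetch analysis entirely when the loop has too many references."""
    return num_refs <= max_refs


for refs in (50, 200, 2000):
    verdict = "analyze" if should_analyze(refs) else "bail out"
    print(refs, pairwise_comparisons(refs), verdict)
```

The check costs a single comparison, while the work it avoids grows quadratically with the reference count.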



* [Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
  2010-06-18  7:51 [Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling borntraeger at de dot ibm dot com
                   ` (3 preceding siblings ...)
  2010-06-25  9:03 ` borntraeger at de dot ibm dot com
@ 2010-06-25 17:08 ` changpeng dot fang at amd dot com
  2010-06-26 14:25 ` rguenth at gcc dot gnu dot org
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: changpeng dot fang at amd dot com @ 2010-06-25 17:08 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from changpeng dot fang at amd dot com  2010-06-25 17:08 -------
(In reply to comment #3)
> Created an attachment (id=21001)
>  --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21001&action=view)
> Potential fix for compile time regression
> 
> Here is a potential fix. We just limit prefetching to loops with a "low" amount
> of memory references and bail out if the amount of references is too large.
> 

This should be a good fix for now, but the complexity of computing group reuse
and the miss rate is still a concern. I don't think we need to compute the miss
rate "exactly" here.



* [Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
  2010-06-18  7:51 [Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling borntraeger at de dot ibm dot com
                   ` (4 preceding siblings ...)
  2010-06-25 17:08 ` changpeng dot fang at amd dot com
@ 2010-06-26 14:25 ` rguenth at gcc dot gnu dot org
  2010-06-26 20:30 ` borntraeger at de dot ibm dot com
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-06-26 14:25 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from rguenth at gcc dot gnu dot org  2010-06-26 14:25 -------
I now see that compile time on Polyhedron has skyrocketed since prefetching was
enabled at -O3 by default.  See

http://gcc.opensuse.org/c++bench/polyhedron/polyhedron-summary.txt-1-0.html

This isn't acceptable - please work on this.



* [Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
  2010-06-18  7:51 [Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling borntraeger at de dot ibm dot com
                   ` (5 preceding siblings ...)
  2010-06-26 14:25 ` rguenth at gcc dot gnu dot org
@ 2010-06-26 20:30 ` borntraeger at de dot ibm dot com
  2010-06-27  9:16 ` rguenth at gcc dot gnu dot org
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: borntraeger at de dot ibm dot com @ 2010-06-26 20:30 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from borntraeger at de dot ibm dot com  2010-06-26 20:30 -------
Richard,

Can you check whether compute_miss_rate is the problem?
Does the attached patch help?

thanks

Christian



* [Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
  2010-06-18  7:51 [Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling borntraeger at de dot ibm dot com
                   ` (6 preceding siblings ...)
  2010-06-26 20:30 ` borntraeger at de dot ibm dot com
@ 2010-06-27  9:16 ` rguenth at gcc dot gnu dot org
  2010-06-27  9:17 ` rguenth at gcc dot gnu dot org
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-06-27  9:16 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from rguenth at gcc dot gnu dot org  2010-06-27 09:16 -------
The patch didn't help.  Btw, reuse analysis looks worse than quadratic.
Polyhedron test_fpu.f90 is just a single file so you should be able to
check yourself easily.  The last jump doesn't reproduce on the 4.5 branch.

SPEC CPU 2006 gamess compile-time also increased by 25%.  leslie3d compile-time
doubled.
http://gcc.opensuse.org/SPEC/CFP/sb-barbella.suse.de-head-64-2006/times.html.

A good chunk of time seems to be spent in the RTL loop unroller, triggered
by array prefetching (testing with -O3 -funroll-loops).  Otherwise it might
as well be just excessive code growth caused by prefetching.



* [Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
  2010-06-18  7:51 [Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling borntraeger at de dot ibm dot com
                   ` (7 preceding siblings ...)
  2010-06-27  9:16 ` rguenth at gcc dot gnu dot org
@ 2010-06-27  9:17 ` rguenth at gcc dot gnu dot org
  2010-06-27 11:09 ` borntraeger at de dot ibm dot com
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-06-27  9:17 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from rguenth at gcc dot gnu dot org  2010-06-27 09:17 -------
Size of the binaries explodes as well...

http://gcc.opensuse.org/SPEC/CFP/sb-barbella.suse.de-head-64-2006/size.html



* [Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
  2010-06-18  7:51 [Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling borntraeger at de dot ibm dot com
                   ` (8 preceding siblings ...)
  2010-06-27  9:17 ` rguenth at gcc dot gnu dot org
@ 2010-06-27 11:09 ` borntraeger at de dot ibm dot com
  2010-06-27 11:20 ` rguenth at gcc dot gnu dot org
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: borntraeger at de dot ibm dot com @ 2010-06-27 11:09 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from borntraeger at de dot ibm dot com  2010-06-27 11:09 -------
So there seem to be at least two problems:
1. exploding complexity in compute_miss_rate (the start for this bugzilla)
2. effects due to prefetching seen in other passes

I think the attached patch cures 1. What shall we do about 2? Should we open
another bug report or track those problems here?



* [Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
  2010-06-18  7:51 [Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling borntraeger at de dot ibm dot com
                   ` (9 preceding siblings ...)
  2010-06-27 11:09 ` borntraeger at de dot ibm dot com
@ 2010-06-27 11:20 ` rguenth at gcc dot gnu dot org
  2010-06-29  0:07 ` changpeng dot fang at amd dot com
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-06-27 11:20 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from rguenth at gcc dot gnu dot org  2010-06-27 11:20 -------
I have opened PR44688



* [Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
  2010-06-18  7:51 [Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling borntraeger at de dot ibm dot com
                   ` (10 preceding siblings ...)
  2010-06-27 11:20 ` rguenth at gcc dot gnu dot org
@ 2010-06-29  0:07 ` changpeng dot fang at amd dot com
  2010-06-29  0:49 ` changpeng dot fang at amd dot com
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: changpeng dot fang at amd dot com @ 2010-06-29  0:07 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from changpeng dot fang at amd dot com  2010-06-29 00:07 -------
I have a patch that partially fixes the problem:
http://gcc.gnu.org/ml/gcc-patches/2010-06/msg02956.html

Note that for this test case, the compile time doubled even though
I don't compute the miss rate at all.



* [Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
  2010-06-18  7:51 [Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling borntraeger at de dot ibm dot com
                   ` (11 preceding siblings ...)
  2010-06-29  0:07 ` changpeng dot fang at amd dot com
@ 2010-06-29  0:49 ` changpeng dot fang at amd dot com
  2010-06-30  0:23 ` changpeng dot fang at amd dot com
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: changpeng dot fang at amd dot com @ 2010-06-29  0:49 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from changpeng dot fang at amd dot com  2010-06-29 00:49 -------
Created an attachment (id=21034)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21034&action=view)
Early return in miss rate computation

The attached patch improves the computation of the miss rate. We can stop
computing as soon as the total misses already exceed the given acceptable
threshold.



* [Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
  2010-06-18  7:51 [Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling borntraeger at de dot ibm dot com
                   ` (12 preceding siblings ...)
  2010-06-29  0:49 ` changpeng dot fang at amd dot com
@ 2010-06-30  0:23 ` changpeng dot fang at amd dot com
  2010-06-30  0:36 ` changpeng dot fang at amd dot com
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: changpeng dot fang at amd dot com @ 2010-06-30  0:23 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from changpeng dot fang at amd dot com  2010-06-30 00:23 -------
Here is the current status of this work:
patch1: http://gcc.gnu.org/ml/gcc-patches/2010-06/msg02956.html
patch2: http://gcc.gnu.org/ml/gcc-patches/2010-06/msg03049.html
On my system with -O3 zero_sized_1.f90 -fprefetch-loop-arrays 
-fno-unroll-loops --param max-completely-peeled-insns=2000:

original timing:      5m30s
with patch1:          1m20s
with patch1 + patch2: 1m03s
without prefetch:     0m30s

The timing with prefetch-loop-arrays is still doubled after the two patches,
compared to no-prefetch-loop-arrays. The extra 33s is mostly spent in
dependence computation for loops. For this test case, prefetching is the
only optimization that invokes "compute_all_dependences".

I am not sure whether we should tolerate this timing increase with aggressive
peeling and prefetching, or we should work on the cost reduction of dependence
computation.
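
The arithmetic behind "still doubled" and "the extra 33s" can be checked directly from the four timings quoted above. The helper name slowdown is introduced here for illustration only.

```python
# Wall-clock timings quoted in the comment above, converted to seconds.
timings = {
    "original": 5 * 60 + 30,
    "patch1": 1 * 60 + 20,
    "patch1 + patch2": 1 * 60 + 3,
    "without prefetch": 30,
}
baseline = timings["without prefetch"]


def slowdown(seconds, reference=baseline):
    """Ratio of a timing to the no-prefetch baseline."""
    return seconds / reference


for label, seconds in timings.items():
    extra = seconds - baseline
    print(f"{label:<18} {seconds:4d}s  {slowdown(seconds):5.2f}x  +{extra}s")
```

With both patches the compile takes 63s against a 30s baseline, a factor of 2.1, down from a factor of 11 before the patches.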



* [Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
  2010-06-18  7:51 [Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling borntraeger at de dot ibm dot com
                   ` (13 preceding siblings ...)
  2010-06-30  0:23 ` changpeng dot fang at amd dot com
@ 2010-06-30  0:36 ` changpeng dot fang at amd dot com
  2010-07-01  0:34 ` changpeng dot fang at amd dot com
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: changpeng dot fang at amd dot com @ 2010-06-30  0:36 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from changpeng dot fang at amd dot com  2010-06-30 00:36 -------

(In reply to comment #7)
> A good chunk of time seems to be spent in the RTL loop unroller, triggered
> by array prefetching (testing with -O3 -funroll-loops).  Otherwise it might
> as well be just excessive code growth caused by prefetching.

Yes, for test_fpu.f90, more than half of the time is spent in the RTL loop
unroller, and if I manually set unroll_factor to 1 (don't unroll), the timing
increase caused by array prefetching is negligible.

With -O3 -funroll-loops, I don't expect code size or compilation time increase
from the RTL loop unroller, triggered by array prefetching.



* [Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
  2010-06-18  7:51 [Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling borntraeger at de dot ibm dot com
                   ` (14 preceding siblings ...)
  2010-06-30  0:36 ` changpeng dot fang at amd dot com
@ 2010-07-01  0:34 ` changpeng dot fang at amd dot com
  2010-07-02 16:34 ` spop at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: changpeng dot fang at amd dot com @ 2010-07-01  0:34 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #15 from changpeng dot fang at amd dot com  2010-07-01 00:34 -------
Unrolling of the peeled loop is partially the reason for the test_fpu.f90
compilation time and code size increase. Vectorization peels a few iterations
of the loop; the prefetching and unrolling passes do not recognize that a loop
is a peeled version and still unroll it.

 MODULE kinds
   INTEGER, PARAMETER :: RK8 = SELECTED_REAL_KIND(15, 300)
END MODULE kinds
! --------------------------------------------------------------------
PROGRAM TEST_FPU  ! A number-crunching benchmark using matrix inversion.
USE kinds         ! Implemented by:    David Frank  Dave_Frank@hotmail.com
IMPLICIT NONE     ! Gauss  routine by: Tim Prince   N8TM@aol.com
                  ! Crout  routine by: James Van Buskirk  torsop@ix.netcom.com
                  ! Lapack routine by: Jos Bergervoet bergervo@IAEhv.nl

REAL(RK8) :: pool(101, 101,1000), a(101, 101)
INTEGER :: i

      DO i = 1,1000
         a = pool(:,:,i)         ! get next matrix to invert
      END DO

END PROGRAM TEST_FPU


In this example, prefetching will unroll three versions of the innermost loop.
If we turn off the vectorizer, it unrolls the only loop.

In addition, -fprefetch-loop-arrays and -funroll-loops (turned on at the same
time) will unroll the same loop. This is over-unrolling and  -funroll-loops
should recognize that the loop has already been unrolled by prefetching.



* [Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
  2010-06-18  7:51 [Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling borntraeger at de dot ibm dot com
                   ` (15 preceding siblings ...)
  2010-07-01  0:34 ` changpeng dot fang at amd dot com
@ 2010-07-02 16:34 ` spop at gcc dot gnu dot org
  2010-07-02 23:58 ` changpeng dot fang at amd dot com
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: spop at gcc dot gnu dot org @ 2010-07-02 16:34 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #16 from spop at gcc dot gnu dot org  2010-07-02 16:34 -------
Subject: Bug 44576

Author: spop
Date: Fri Jul  2 16:34:29 2010
New Revision: 161727

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=161727
Log:
PR 44576: miss rate computation improvement for prefetching loop arrays.

2010-07-02  Changpeng Fang  <changpeng.fang@amd.com>

        PR middle-end/44576
        * tree-ssa-loop-prefetch.c (compute_miss_rate): Return 1000 (out
        of 1000) for miss rate if the address difference is greater than or
        equal to the cache line size (the two references will never hit the
        same cache line).

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-loop-prefetch.c
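
The shortcut named in the log entry can be checked against a brute-force enumeration. The following is a Python sketch based on the ChangeLog wording, not the committed C code; both function names are hypothetical, and delta is assumed to be non-negative.

```python
def brute_force_miss_rate(cache_line_size, step, delta, distinct_iters,
                          align_unit):
    """Enumerate every alignment and iteration; return misses per mille."""
    total = 0
    misses = 0
    for align in range(0, cache_line_size, align_unit):
        for it in range(distinct_iters):
            addr1 = align + step * it
            addr2 = addr1 + delta
            total += 1
            if addr1 // cache_line_size != addr2 // cache_line_size:
                misses += 1
    return 1000 * misses // total


def miss_rate_with_shortcut(cache_line_size, step, delta, distinct_iters,
                            align_unit):
    """Same result, but skip the enumeration when a miss is certain."""
    # Two addresses at least one cache line apart can never share a line.
    if delta >= cache_line_size:
        return 1000
    return brute_force_miss_rate(cache_line_size, step, delta,
                                 distinct_iters, align_unit)


# The shortcut agrees with the enumeration on both sides of the boundary.
for delta in (8, 32, 64, 200):
    args = (64, 8, delta, 4, 8)
    assert miss_rate_with_shortcut(*args) == brute_force_miss_rate(*args)
```

The shortcut is exact rather than approximate: adding at least one full cache line to an address always moves it to a later line, whatever the alignment.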



* [Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
  2010-06-18  7:51 [Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling borntraeger at de dot ibm dot com
                   ` (16 preceding siblings ...)
  2010-07-02 16:34 ` spop at gcc dot gnu dot org
@ 2010-07-02 23:58 ` changpeng dot fang at amd dot com
  2010-07-07 18:15 ` spop at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: changpeng dot fang at amd dot com @ 2010-07-02 23:58 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #17 from changpeng dot fang at amd dot com  2010-07-02 23:58 -------
(In reply to comment #15)
I have opened PR44794 for the issue of unrolling the pre- and post-loops.



* [Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
  2010-06-18  7:51 [Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling borntraeger at de dot ibm dot com
                   ` (17 preceding siblings ...)
  2010-07-02 23:58 ` changpeng dot fang at amd dot com
@ 2010-07-07 18:15 ` spop at gcc dot gnu dot org
  2010-07-07 19:00 ` changpeng dot fang at amd dot com
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: spop at gcc dot gnu dot org @ 2010-07-07 18:15 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #18 from spop at gcc dot gnu dot org  2010-07-07 18:14 -------
Changpeng, should this PR be closed now?



* [Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
  2010-06-18  7:51 [Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling borntraeger at de dot ibm dot com
                   ` (18 preceding siblings ...)
  2010-07-07 18:15 ` spop at gcc dot gnu dot org
@ 2010-07-07 19:00 ` changpeng dot fang at amd dot com
  2010-07-09  1:59 ` changpeng dot fang at amd dot com
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: changpeng dot fang at amd dot com @ 2010-07-07 19:00 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #19 from changpeng dot fang at amd dot com  2010-07-07 19:00 -------
(In reply to comment #18)
> Changpeng, should this PR be closed now?
> 

No. I am still looking at the dependence computation cost. I just found that
most of the time is spent in memory allocation and freeing of the data
dependence relation structure.



* [Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
  2010-06-18  7:51 [Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling borntraeger at de dot ibm dot com
                   ` (19 preceding siblings ...)
  2010-07-07 19:00 ` changpeng dot fang at amd dot com
@ 2010-07-09  1:59 ` changpeng dot fang at amd dot com
  2010-07-09 23:09 ` spop at gcc dot gnu dot org
  2010-07-09 23:12 ` spop at gcc dot gnu dot org
  22 siblings, 0 replies; 24+ messages in thread
From: changpeng dot fang at amd dot com @ 2010-07-09  1:59 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #20 from changpeng dot fang at amd dot com  2010-07-09 01:59 -------
I submitted a patch for review to completely fix the problem. The patch is an
extension to Christian's speedup.patch. It splits the cost analysis into
three small functions and quits further prefetching analysis as soon as we
know prefetching is not going to be beneficial to the loop.

Here is the gcc-patches@ link:
http://gcc.gnu.org/ml/gcc-patches/2010-07/msg00734.html



* [Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
  2010-06-18  7:51 [Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling borntraeger at de dot ibm dot com
                   ` (20 preceding siblings ...)
  2010-07-09  1:59 ` changpeng dot fang at amd dot com
@ 2010-07-09 23:09 ` spop at gcc dot gnu dot org
  2010-07-09 23:12 ` spop at gcc dot gnu dot org
  22 siblings, 0 replies; 24+ messages in thread
From: spop at gcc dot gnu dot org @ 2010-07-09 23:09 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #21 from spop at gcc dot gnu dot org  2010-07-09 23:09 -------
Subject: Bug 44576

Author: spop
Date: Fri Jul  9 23:08:55 2010
New Revision: 162023

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162023
Log:
pr44576 Avoid un-necessary prefetch analysis by distributing the cost models

2010-07-09  Changpeng Fang  <changpeng.fang@amd.com>

        PR tree-optimization/44576
        * tree-ssa-loop-prefetch.c (trip_count_to_ahead_ratio_too_small_p):
        New.  Pull out from is_loop_prefetching_profitable to implement
        the trip count to ahead ratio heuristic.
        (mem_ref_count_reasonable_p): New.  Pull out from
        is_loop_prefetching_profitable to implement the instruction to
        memory reference ratio heuristic.  Also consider not reasonable if
        the memory reference count is above a threshold (to avoid
        explosive compilation time).
        (insn_to_prefetch_ratio_too_small_p): New.  Pull out from
        is_loop_prefetching_profitable to implement the instruction to
        prefetch ratio heuristic.
        (is_loop_prefetching_profitable): Removed.
        (loop_prefetch_arrays): Distribute the cost analysis across the
        function to allow early exit of the prefetch analysis.
        is_loop_prefetching_profitable is split into three functions,
        with each one called as early as possible.
        (PREFETCH_MAX_MEM_REFS_PER_LOOP): New.  Threshold above which the
        number of memory references in a loop is considered too many.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-loop-prefetch.c



* [Bug middle-end/44576] [4.5/4.6 Regression] testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching + peeling
  2010-06-18  7:51 [Bug middle-end/44576] New: testsuite/gfortran.dg/zero_sized_1.f90 with huge compile time on prefetching+peeling borntraeger at de dot ibm dot com
                   ` (21 preceding siblings ...)
  2010-07-09 23:09 ` spop at gcc dot gnu dot org
@ 2010-07-09 23:12 ` spop at gcc dot gnu dot org
  22 siblings, 0 replies; 24+ messages in thread
From: spop at gcc dot gnu dot org @ 2010-07-09 23:12 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #22 from spop at gcc dot gnu dot org  2010-07-09 23:12 -------
Fixed.


-- 

spop at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


