public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/31040] New: unroll/peel loops not aggressive enough
From: jv244 at cam dot ac dot uk @ 2007-03-05 9:11 UTC
To: gcc-bugs

Looking at the asm for the program below, there are plenty of loops left
after compiling with

> gfortran -S -march=native -O3 -funroll-loops -funroll-all-loops -fpeel-loops test.f90

or any combination of these options. Full unrolling (in which case the
function would simply return the value 3) would be possible and much faster.

> cat test.f90
INTEGER FUNCTION lxy()
  lxy=0
  DO lxa=0,1
  DO lxb=0,0
  DO lya=0,1-lxa
  DO lyb=0,0-lxb
     lxy=lxy+1
  ENDDO
  ENDDO
  ENDDO
  ENDDO
END FUNCTION

write(6,*) lxy()
END

-- 
           Summary: unroll/peel loops not aggressive enough
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jv244 at cam dot ac dot uk

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31040
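
To make the expected result concrete, here is a minimal sketch of why the nest should fold to the constant 3. The module and function names (`lxy_demo`, `lxy_loops`, `lxy_unrolled`) are hypothetical and not part of the report; the first function is the nest from test.f90 with explicit declarations, and the second writes out each iteration by hand, which is roughly what a complete unroll of all four loops would leave behind.

```fortran
module lxy_demo
  implicit none
contains

  ! The loop nest from test.f90, with explicit declarations
  ! instead of implicit typing.
  integer function lxy_loops() result(n)
    integer :: lxa, lxb, lya, lyb
    n = 0
    do lxa = 0, 1
      do lxb = 0, 0
        do lya = 0, 1 - lxa
          do lyb = 0, 0 - lxb
            n = n + 1
          end do
        end do
      end do
    end do
  end function lxy_loops

  ! The same nest with every iteration written out by hand.
  ! Only three index combinations satisfy all four loop bounds.
  integer function lxy_unrolled() result(n)
    n = 0
    n = n + 1   ! lxa=0, lxb=0, lya=0, lyb=0
    n = n + 1   ! lxa=0, lxb=0, lya=1, lyb=0
    n = n + 1   ! lxa=1, lxb=0, lya=0, lyb=0
  end function lxy_unrolled

end module lxy_demo
```

Both functions should return the same value when called from a small driver program that uses the module. Once the loops are gone, constant folding can reduce the remaining additions to a single constant.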
* [Bug tree-optimization/31040] unroll/peel loops not aggressive enough
From: rguenth at gcc dot gnu dot org @ 2007-03-05 10:18 UTC
To: gcc-bugs

------- Comment #1 from rguenth at gcc dot gnu dot org  2007-03-05 10:18 -------
We don't unroll non-innermost loops at the moment. I don't know if sccp can
be taught to handle this case (and if it's worth it).

-- 
rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu dot
                   |                            |org, rakdver at gcc dot gnu
                   |                            |dot org
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
           Keywords|                            |missed-optimization
   Last reconfirmed|0000-00-00 00:00:00         |2007-03-05 10:18:14
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31040
* [Bug tree-optimization/31040] unroll/peel loops not aggressive enough
From: jv244 at cam dot ac dot uk @ 2007-03-05 11:47 UTC
To: gcc-bugs

------- Comment #2 from jv244 at cam dot ac dot uk  2007-03-05 11:47 -------
(In reply to comment #1)
> We don't unroll non-innermost loops at the moment. I don't know if sccp can
> be taught to handle this case (and if it's worth it).

Such small loops are quite typical for some quantum chemistry integral
routines. I'm currently experimenting with rewriting the kernel mentioned in
PR 31021. If I do this unrolling by hand I get quite a speedup on the full
kernel:

hand unrolled:  # best time 5.260329
loops:          # best time 6.616413

which is quite impressive because these loops take at most 30% of the
kernel's total time.

The actual code in question is:

   coef(:,:)=0.0_wp
   lxy=0 ; lx=0
   DO lxa=0,1
   DO lxb=0,1
      lx = lx + 1
      g1=0.0_wp
      g2=0.0_wp
      g1k=0.0_wp
      g2k=0.0_wp
      DO lya=0,1-lxa
      DO lyb=0,1-lxb
         lxy=lxy+1
         g1=g1+pyx(1,lxy)*dpy(lyb,lya,jg)
         g2=g2+pyx(1,lxy)*dpy(lyb,lya,jg2)
         g1k=g1k+pyx(2,lxy)*dpy(lyb,lya,jg)
         g2k=g2k+pyx(2,lxy)*dpy(lyb,lya,jg2)
      ENDDO
      ENDDO
      DO icoef=1,3
         coef(icoef,1)=coef(icoef,1)+alpha(icoef,lx)*g1
         coef(icoef,2)=coef(icoef,2)+alpha(icoef,lx)*g2
         coef(icoef,3)=coef(icoef,3)+alpha(icoef,lx)*g1k
         coef(icoef,4)=coef(icoef,4)+alpha(icoef,lx)*g2k
      ENDDO
   ENDDO
   ENDDO

and the hand-unrolling just explicitly expands all loops to the loop-free
version of exactly the same statements:

   coef(:,:)=0.0_wp

   g1=0.0_wp
   g2=0.0_wp
   g1k=0.0_wp
   g2k=0.0_wp
   g1=g1+pyx(1,1)*dpy(0,0,jg)
   g2=g2+pyx(1,1)*dpy(0,0,jg2)
   g1k=g1k+pyx(2,1)*dpy(0,0,jg)
   g2k=g2k+pyx(2,1)*dpy(0,0,jg2)
   g1=g1+pyx(1,2)*dpy(1,0,jg)
   g2=g2+pyx(1,2)*dpy(1,0,jg2)
   g1k=g1k+pyx(2,2)*dpy(1,0,jg)
   g2k=g2k+pyx(2,2)*dpy(1,0,jg2)
   g1=g1+pyx(1,3)*dpy(0,1,jg)
   g2=g2+pyx(1,3)*dpy(0,1,jg2)
   g1k=g1k+pyx(2,3)*dpy(0,1,jg)
   g2k=g2k+pyx(2,3)*dpy(0,1,jg2)
   g1=g1+pyx(1,4)*dpy(1,1,jg)
   g2=g2+pyx(1,4)*dpy(1,1,jg2)
   g1k=g1k+pyx(2,4)*dpy(1,1,jg)
   g2k=g2k+pyx(2,4)*dpy(1,1,jg2)
   coef(01,01)=coef(01,01)+alpha(1,1)*g1
   coef(01,02)=coef(01,02)+alpha(1,1)*g2
   coef(01,03)=coef(01,03)+alpha(1,1)*g1k
   coef(01,04)=coef(01,04)+alpha(1,1)*g2k
   coef(02,01)=coef(02,01)+alpha(2,1)*g1
   coef(02,02)=coef(02,02)+alpha(2,1)*g2
   coef(02,03)=coef(02,03)+alpha(2,1)*g1k
   coef(02,04)=coef(02,04)+alpha(2,1)*g2k
   coef(03,01)=coef(03,01)+alpha(3,1)*g1
   coef(03,02)=coef(03,02)+alpha(3,1)*g2
   coef(03,03)=coef(03,03)+alpha(3,1)*g1k
   coef(03,04)=coef(03,04)+alpha(3,1)*g2k

   g1=0.0_wp
   g2=0.0_wp
   g1k=0.0_wp
   g2k=0.0_wp
   g1=g1+pyx(1,5)*dpy(0,0,jg)
   g2=g2+pyx(1,5)*dpy(0,0,jg2)
   g1k=g1k+pyx(2,5)*dpy(0,0,jg)
   g2k=g2k+pyx(2,5)*dpy(0,0,jg2)
   g1=g1+pyx(1,6)*dpy(0,1,jg)
   g2=g2+pyx(1,6)*dpy(0,1,jg2)
   g1k=g1k+pyx(2,6)*dpy(0,1,jg)
   g2k=g2k+pyx(2,6)*dpy(0,1,jg2)
   coef(01,01)=coef(01,01)+alpha(1,2)*g1
   coef(01,02)=coef(01,02)+alpha(1,2)*g2
   coef(01,03)=coef(01,03)+alpha(1,2)*g1k
   coef(01,04)=coef(01,04)+alpha(1,2)*g2k
   coef(02,01)=coef(02,01)+alpha(2,2)*g1
   coef(02,02)=coef(02,02)+alpha(2,2)*g2
   coef(02,03)=coef(02,03)+alpha(2,2)*g1k
   coef(02,04)=coef(02,04)+alpha(2,2)*g2k
   coef(03,01)=coef(03,01)+alpha(3,2)*g1
   coef(03,02)=coef(03,02)+alpha(3,2)*g2
   coef(03,03)=coef(03,03)+alpha(3,2)*g1k
   coef(03,04)=coef(03,04)+alpha(3,2)*g2k

   g1=0.0_wp
   g2=0.0_wp
   g1k=0.0_wp
   g2k=0.0_wp
   g1=g1+pyx(1,7)*dpy(0,0,jg)
   g2=g2+pyx(1,7)*dpy(0,0,jg2)
   g1k=g1k+pyx(2,7)*dpy(0,0,jg)
   g2k=g2k+pyx(2,7)*dpy(0,0,jg2)
   g1=g1+pyx(1,8)*dpy(1,0,jg)
   g2=g2+pyx(1,8)*dpy(1,0,jg2)
   g1k=g1k+pyx(2,8)*dpy(1,0,jg)
   g2k=g2k+pyx(2,8)*dpy(1,0,jg2)
   coef(01,01)=coef(01,01)+alpha(1,3)*g1
   coef(01,02)=coef(01,02)+alpha(1,3)*g2
   coef(01,03)=coef(01,03)+alpha(1,3)*g1k
   coef(01,04)=coef(01,04)+alpha(1,3)*g2k
   coef(02,01)=coef(02,01)+alpha(2,3)*g1
   coef(02,02)=coef(02,02)+alpha(2,3)*g2
   coef(02,03)=coef(02,03)+alpha(2,3)*g1k
   coef(02,04)=coef(02,04)+alpha(2,3)*g2k
   coef(03,01)=coef(03,01)+alpha(3,3)*g1
   coef(03,02)=coef(03,02)+alpha(3,3)*g2
   coef(03,03)=coef(03,03)+alpha(3,3)*g1k
   coef(03,04)=coef(03,04)+alpha(3,3)*g2k

   g1=0.0_wp
   g2=0.0_wp
   g1k=0.0_wp
   g2k=0.0_wp
   g1=g1+pyx(1,9)*dpy(0,0,jg)
   g2=g2+pyx(1,9)*dpy(0,0,jg2)
   g1k=g1k+pyx(2,9)*dpy(0,0,jg)
   g2k=g2k+pyx(2,9)*dpy(0,0,jg2)
   coef(01,01)=coef(01,01)+alpha(1,4)*g1
   coef(01,02)=coef(01,02)+alpha(1,4)*g2
   coef(01,03)=coef(01,03)+alpha(1,4)*g1k
   coef(01,04)=coef(01,04)+alpha(1,4)*g2k
   coef(02,01)=coef(02,01)+alpha(2,4)*g1
   coef(02,02)=coef(02,02)+alpha(2,4)*g2
   coef(02,03)=coef(02,03)+alpha(2,4)*g1k
   coef(02,04)=coef(02,04)+alpha(2,4)*g2k
   coef(03,01)=coef(03,01)+alpha(3,4)*g1
   coef(03,02)=coef(03,02)+alpha(3,4)*g2
   coef(03,03)=coef(03,03)+alpha(3,4)*g1k
   coef(03,04)=coef(03,04)+alpha(3,4)*g2k

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31040
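
The index pattern in the hand-unrolled version can be cross-checked mechanically. The following is a minimal sketch, assuming only the loop bounds shown in the kernel; the module `unroll_terms`, the subroutine `enumerate_terms` and the parameter `max_terms` are hypothetical names introduced for illustration. It walks the same nest and records, for each value of `lxy`, which `lx`, `lyb` and `lya` were in effect, which is the information needed to write out each `pyx(.,lxy)*dpy(lyb,lya,.)` term and each `alpha(.,lx)` factor by hand.

```fortran
module unroll_terms
  implicit none
  ! The nest runs 4 + 2 + 2 + 1 = 9 inner iterations in total.
  integer, parameter :: max_terms = 9
contains

  ! Walk the lxa/lxb/lya/lyb nest of the kernel and record, for each
  ! inner iteration lxy, the values of lx, lyb and lya in effect.
  subroutine enumerate_terms(nterms, lx_of, lyb_of, lya_of)
    integer, intent(out) :: nterms
    integer, intent(out) :: lx_of(max_terms)
    integer, intent(out) :: lyb_of(max_terms)
    integer, intent(out) :: lya_of(max_terms)
    integer :: lxa, lxb, lya, lyb, lx

    nterms = 0
    lx = 0
    lx_of = 0
    lyb_of = 0
    lya_of = 0
    do lxa = 0, 1
      do lxb = 0, 1
        lx = lx + 1
        do lya = 0, 1 - lxa
          do lyb = 0, 1 - lxb
            nterms = nterms + 1        ! plays the role of lxy
            lx_of(nterms) = lx
            lyb_of(nterms) = lyb
            lya_of(nterms) = lya
          end do
        end do
      end do
    end do
  end subroutine enumerate_terms

end module unroll_terms
```

Calling `enumerate_terms` from a driver and printing the three arrays side by side gives one row per `lxy`, which can be compared against the `pyx`/`dpy` index pairs in the listing above.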
* [Bug tree-optimization/31040] unroll/peel loops not aggressive enough
From: rakdver at atrey dot karlin dot mff dot cuni dot cz @ 2007-03-05 11:50 UTC
To: gcc-bugs

------- Comment #3 from rakdver at atrey dot karlin dot mff dot cuni dot cz  2007-03-05 11:49 -------
Subject: Re:  unroll/peel loops not aggressive enough

> We don't unroll non-innermost loops at the moment. I don't know if sccp can
> be taught to handle this case (and if it's worth it).

It is fairly easy to make gcc completely unroll non-innermost loops,
I am working on that.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31040
* [Bug tree-optimization/31040] unroll/peel loops not aggressive enough
From: rguenth at gcc dot gnu dot org @ 2007-03-05 12:22 UTC
To: gcc-bugs

------- Comment #4 from rguenth at gcc dot gnu dot org  2007-03-05 12:22 -------
Note that in addition to unrolling the outermost loop you can experiment with
adjusting --param max-completely-peeled-insns.

Also, I wonder if DO lxb=0,0 is really common (if so, the frontend might
want to lower this differently).

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31040
* [Bug tree-optimization/31040] unroll/peel loops not aggressive enough
From: jv244 at cam dot ac dot uk @ 2007-07-03 18:21 UTC
To: gcc-bugs

------- Comment #5 from jv244 at cam dot ac dot uk  2007-07-03 18:21 -------
The optimization asked for in this PR is now being performed:

> gfortran -O3 -funroll-loops -S test.f90

yields

.globl lxy_
        .type   lxy_, @function
lxy_:
.LFB2:
        movl    $3, %eax
        ret
.LFE2:
        .size   lxy_, .-lxy_
        .section        .eh_frame,"a",@progbits
.Lframe1:

-- 
jv244 at cam dot ac dot uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31040
* [Bug tree-optimization/31040] unroll/peel loops not aggressive enough
From: pinskia at gcc dot gnu dot org @ 2007-07-21 8:59 UTC
To: gcc-bugs

-- 
pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pinskia at gcc dot gnu dot
                   |                            |org
   Target Milestone|---                         |4.3.0

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31040