public inbox for gcc-bugs@sourceware.org help / color / mirror / Atom feed
From: "dominiq at lps dot ens dot fr" <gcc-bugzilla@gcc.gnu.org> To: gcc-bugs@gcc.gnu.org Subject: [Bug tree-optimization/34265] New: Missed optimizations Date: Wed, 28 Nov 2007 15:27:00 -0000 [thread overview] Message-ID: <bug-34265-12313@http.gcc.gnu.org/bugzilla/> (raw) I have had a closer look to the optimization of the polyhedron test induct.f90. 3/4 of the runtime is spent in the subroutine 'mutual_ind_quad_cir_coil' and 1/4 in 'mutual_ind_quad_rec_coil'. The two subroutines contain two main loops with the following structure: do i = 1, 2*m ... do j = 1, 9 ... do k = 1, 9 q_vector(1) = 0.5_longreal * a * (x2gauss(k) + 1.0_longreal) q_vector(2) = 0.5_longreal * b1 * (y2gauss(k) - 1.0_longreal) q_vector(3) = 0.0_longreal ! ! rotate quad vector into the global coordinate system ! rot_q_vector(1) = dot_product(rotate_quad(1,:),q_vector(:)) rot_q_vector(2) = dot_product(rotate_quad(2,:),q_vector(:)) rot_q_vector(3) = dot_product(rotate_quad(3,:),q_vector(:)) ! ! compute and add in quadrature term ! numerator = w1gauss(j) * w2gauss(k) * & dot_product(coil_current_vec,current_vector) denominator = sqrt(dot_product(rot_c_vector-rot_q_vector, & rot_c_vector-rot_q_vector)) l12_lower = l12_lower + numerator/denominator end do end do end do where the six first lines of code in the k loop do not depend on i nor j and can be computed in a 'do k = 1, 9' loop outside the main loop by replacing the length-three vector 'rot_q_vector' by a nine by three array. The original code induct.f90 gives the following timing (all with -O3 -ffast-math -funroll-loops): 93.227u 0.094s 1:33.32 99.9% 0+0k 0+0io 0pf+0w and output: ... Maximum wand/quad abs rel mutual inductance = 5.95379428444656467E-002 Minimum resq coil/quad abs rel mutual inductance = 0.0000000000000000 Maximum resq coil/quad abs rel mutual inductance = 9.63995250242061230E-002 ... Unrolling bye hand 'numerator' and 'denominator' gives (see http://gcc.gnu.org/ml/fortran/2007-11/msg00231.html): 65.563u 0.092s 1:05.66 99.9% 0+0k 0+0io 0pf+0w Looking at the assembly I can see that for the original code the inner loops in k are not unrolled, as guessed by Paul Thomas (only the implied vector loops being unrolled). QUESTION 1: Should the frontend do the unrolling for small vectors itself? or should the middleend be more aggressive for nested loops with small known iterations? Moving the invariants on i and j in the k loops outside the main loops gives: 80.313u 0.074s 1:20.39 99.9% 0+0k 0+1io 0pf+0w Combining the two hand optimizations gives: 35.925u 0.040s 0:35.97 99.9% 0+0k 0+0io 0pf+0w (without -ffast-math the timing is 59.263u 0.067s 0:59.33 99.9% 0+0k 0+1io 0pf+0w) but the results change to: Maximum wand/quad abs rel mutual inductance = 5.95379428444656675E-002 Minimum resq coil/quad abs rel mutual inductance = 0.0000000000000000 Maximum resq coil/quad abs rel mutual inductance = 9.63995250242059842E-002 ( Maximum wand/quad abs rel mutual inductance = 5.95379428444659520E-002 Minimum resq coil/quad abs rel mutual inductance = 0.0000000000000000 Maximum resq coil/quad abs rel mutual inductance = 9.63995250242060675E-002 without -ffast-math). The attached file gives the differences between the original code and the three variants. This is to be compared to further optimizations (indu.v2.f90) leading to: 35.376u 0.062s 0:35.44 99.9% 0+0k 0+1io 0pf+0w or after merging the two loops (indu.v3.f90) to: 34.452u 0.041s 0:34.49 100.0% 0+0k 0+0io 0pf+0w I have counted the number of sqrt() in the assembly code and found 9 of them in the slow codes while I only found 5 (10 for indu.v3.f90) of them for the fast codes. I checked that this was not due to common subexpressions I may have missed. Looking more closely to the assemby I saw that he slow codes used 'sqrtsd' while the fast ones used 'sqrtpd' along with several other packed operations. Now 65.66*5/9=36.48 explaining most of the speedup. Note that 5=9/2+1, i.e., four packed computations followed by a scalar one. Owing to the same structure in the two loops, the two scalar computations could be merged in a packed one, but it is missed by the optimizer. I have tried without success to trigger this "vectorization" by making some variables vectors in k. QUESTION 2: Why the optimizer is able to vectorize in some cases and not in others? Can the frontend help to vectorize? -- Summary: Missed optimizations Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dominiq at lps dot ens dot fr GCC build triplet: i686-apple-darwin9 GCC host triplet: i686-apple-darwin9 GCC target triplet: i686-apple-darwin9 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34265
next reply other threads:[~2007-11-28 15:27 UTC|newest] Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top 2007-11-28 15:27 dominiq at lps dot ens dot fr [this message] 2007-11-28 15:30 ` [Bug tree-optimization/34265] " dominiq at lps dot ens dot fr 2007-11-28 16:06 ` rguenth at gcc dot gnu dot org 2007-11-28 16:14 ` dominiq at lps dot ens dot fr 2007-11-28 16:18 ` rguenth at gcc dot gnu dot org 2007-11-28 16:34 ` rguenth at gcc dot gnu dot org 2007-11-28 18:18 ` dominiq at lps dot ens dot fr 2007-11-28 18:49 ` dominiq at lps dot ens dot fr 2007-11-28 20:48 ` jb at gcc dot gnu dot org 2007-11-28 21:27 ` burnus at gcc dot gnu dot org 2007-11-28 22:05 ` rguenth at gcc dot gnu dot org 2007-11-28 22:36 ` dominiq at lps dot ens dot fr 2007-11-28 22:49 ` steven at gcc dot gnu dot org 2007-11-28 23:07 ` kargl at gcc dot gnu dot org 2007-11-28 23:18 ` steven at gcc dot gnu dot org 2007-11-28 23:57 ` dominiq at lps dot ens dot fr 2007-11-29 8:06 ` dominiq at lps dot ens dot fr 2007-11-29 10:12 ` rguenth at gcc dot gnu dot org 2007-11-29 10:22 ` dominiq at lps dot ens dot fr 2007-11-29 10:40 ` dominiq at lps dot ens dot fr 2007-11-29 11:01 ` dominiq at lps dot ens dot fr 2007-11-29 11:13 ` rguenther at suse dot de 2007-11-29 11:16 ` dominiq at lps dot ens dot fr 2007-11-29 12:25 ` dominiq at lps dot ens dot fr 2007-11-29 15:49 ` dominiq at lps dot ens dot fr 2007-11-30 21:38 ` ubizjak at gmail dot com 2007-12-03 14:08 ` dominiq at lps dot ens dot fr 2007-12-03 14:32 ` dominiq at lps dot ens dot fr 2007-12-03 14:34 ` dominiq at lps dot ens dot fr 2007-12-03 14:34 ` dominiq at lps dot ens dot fr 2007-12-03 16:30 ` ubizjak at gmail dot com 2007-12-03 18:59 ` dominiq at lps dot ens dot fr 2007-12-04 6:57 ` irar at il dot ibm dot com 2008-04-23 21:27 ` dominiq at lps dot ens dot fr
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=bug-34265-12313@http.gcc.gnu.org/bugzilla/ \ --to=gcc-bugzilla@gcc.gnu.org \ --cc=gcc-bugs@gcc.gnu.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).