* [Bug rtl-optimization/50904] New: Induct benchmark of polyhedron slows down when -fno-protect-parens is enabled by -Ofast.
@ 2011-10-28 17:19 venkataramanan.kumar.gnu at gmail dot com
From: venkataramanan.kumar.gnu at gmail dot com @ 2011-10-28 17:19 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

             Bug #: 50904
           Summary: Induct benchmark of polyhedron slows down when
                    -fno-protect-parens is enabled by -Ofast.
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: venkataramanan.kumar.gnu@gmail.com


Configurations:
GCC 4.7 trunk revision: 180364
Machine: AMD64 

Commandline:
gfortran -Ofast induct2.f90

Description:
We observed a slowdown in the induct benchmark at -Ofast after -fprotect-parens
was disabled for -Ofast (in gcc trunk rev 173385 on 2011-05-04, for
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48864).

With the avx ISA enabled, we observed a slowdown of ~2%.

While analyzing the slowdown, we found a difference in the code generated for
one of induct's hot loop nests between -Ofast with -fprotect-parens and -Ofast
without it, irrespective of ISA (avx, fma4) and tuning. The difference appears
to come from an interaction between the reassociation of an expression in
gimple and, on the RTL generated for that expression, code hoisting in PRE and
other loop optimizations.

Details:

The following snippet shows the hot loop in subroutine
"mutual_ind_quad_cir_coil".

(-----Snip-----)
      do i = 1, 2*m
          theta = pi*real(i,longreal)/real(m,longreal)
          c_vector(1) = r_coil * cos(theta)
          c_vector(2) = r_coil * sin(theta)
!
!       compute current vector for the coil in the global coordinate system
!
          coil_tmp_vector(1) = -sin(theta)
          coil_tmp_vector(2) = cos(theta)
          coil_tmp_vector(3) = 0.0_longreal
          coil_current_vec(1) = dot_product(rotate_coil(1,:),coil_tmp_vector(:))
          coil_current_vec(2) = dot_product(rotate_coil(2,:),coil_tmp_vector(:))
          coil_current_vec(3) = dot_product(rotate_coil(3,:),coil_tmp_vector(:))
!
          do j = 1, 9
              c_vector(3) = 0.5 * h_coil * z1gauss(j)
!
!       rotate coil vector into the global coordinate system and translate it
!
              rot_c_vector(1) = dot_product(rotate_coil(1,:),c_vector(:)) + dx
              rot_c_vector(2) = dot_product(rotate_coil(2,:),c_vector(:)) + dy
              rot_c_vector(3) = dot_product(rotate_coil(3,:),c_vector(:)) + dz
!
              do k = 1, 9
                  q_vector(1) = 0.5_longreal * a * (x2gauss(k) + 1.0_longreal)
                  q_vector(2) = 0.5_longreal * b1 * (y2gauss(k) - 1.0_longreal)
                  q_vector(3) = 0.0_longreal
!
!       rotate quad vector into the global coordinate system
!
                  rot_q_vector(1) = dot_product(rotate_quad(1,:),q_vector(:))
                  rot_q_vector(2) = dot_product(rotate_quad(2,:),q_vector(:))
                  rot_q_vector(3) = dot_product(rotate_quad(3,:),q_vector(:))
!
!       compute and add in quadrature term
!
                  numerator = w1gauss(j) * w2gauss(k) *                         &
                              dot_product(coil_current_vec,current_vector)
                  denominator = sqrt(dot_product(rot_c_vector-rot_q_vector,     &
                                                 rot_c_vector-rot_q_vector))
                  l12_lower = l12_lower + numerator/denominator
              end do
          end do
      end do
(-----Snip-----)

At -Ofast, the k loop is unrolled and vectorized.

When -fprotect-parens is enabled at -Ofast, "q_vector(2) = 0.5_longreal * b1 *
(y2gauss(k) - 1.0_longreal)" and part of the expression "rot_q_vector(1) =
dot_product(rotate_quad(1,:),q_vector(:))" are hoisted out of the j loop.

When -fprotect-parens is disabled, these expressions are not hoisted out of
the loop.
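
To make the hoisting concrete, here is a minimal hand-written sketch of the
transformation we expect from the compiler. It is not taken from induct2.f90:
the program name, the weights w, the temporary q2 and the input values are
made up for illustration, and the definition of longreal is an assumption.
The point is only that the q_vector(2) expression depends on k but not on j,
so it can be computed once before the j loop.

```fortran
program hoist_sketch
  implicit none
  ! Assumption: a double precision kind standing in for the benchmark's longreal.
  integer, parameter :: longreal = selected_real_kind(15, 90)
  integer :: j, k
  real(longreal) :: b1, y2gauss(9), w(9), q2(9), s_inner, s_hoisted

  ! Made-up input data, for illustration only.
  b1 = 3.0_longreal
  do k = 1, 9
     y2gauss(k) = real(k, longreal) / 10.0_longreal
     w(k) = 1.0_longreal / real(k, longreal)
  end do

  ! Not hoisted: the j-invariant expression is recomputed for every j.
  s_inner = 0.0_longreal
  do j = 1, 9
     do k = 1, 9
        s_inner = s_inner + w(j) * (0.5_longreal * b1 * (y2gauss(k) - 1.0_longreal))
     end do
  end do

  ! Hoisted by hand: compute the expression once per k, before the j loop.
  do k = 1, 9
     q2(k) = 0.5_longreal * b1 * (y2gauss(k) - 1.0_longreal)
  end do
  s_hoisted = 0.0_longreal
  do j = 1, 9
     do k = 1, 9
        s_hoisted = s_hoisted + w(j) * q2(k)
     end do
  end do

  print *, s_inner, s_hoisted
end program hoist_sketch
```

In the benchmark the k loop is unrolled and vectorized, so the hoisted values
would live in vector registers rather than in an explicit array like q2.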

Observations:

1) In gimple, when -fprotect-parens is disabled, the product containing
(y2gauss(k) - 1.0_longreal) is reassociated as shown below.

   induct2.f90.080t.dse1
   (-----Snip-----)
   D.8701_385 = y2gauss[D.8696_378];
   D.8702_386 = D.8701_385 - 1.0e+0;
   D.8703_387 = b1_148 * D.8702_386;
   D.8704_388 = D.8703_387 * 5.0e-1;
   (-----Snip-----)

   induct2.f90.081t.reassoc1
   (-----Snip-----)
   D.8701_385 = y2gauss[D.8696_378];
   D.8702_386 = D.8701_385 + -1.0e+0;
   D.8703_387 = b1_148 * 5.0e-1;
   D.8704_388 = D.8703_387 * D.8702_386;
  (-----Snip-----)

However, when -fprotect-parens is enabled, the parenthesized subexpression is
kept as a separate statement (the "((...))" line below):

  induct2.f90.081t.reassoc1
  (-----Snip-----)
  D.8814_395 = y2gauss[D.8808_387];
  D.8815_396 = D.8814_395 - 1.0e+0;
  D.8816_397 = ((D.8815_396));
  D.8817_398 = b1_154 * 5.0e-1;
  D.8818_399 = D.8817_398 * D.8816_397;
  (-----Snip-----)
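
For readers who prefer source form, the two association orders in the dumps
for the -fno-protect-parens case can be written back as Fortran roughly as
follows. This is only a sketch: the program and variable names and the input
values are invented, and the parentheses are placed by hand to mirror the
gimple statements before and after reassoc1.

```fortran
program assoc_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: b1, y, t_before, t_after

  ! Made-up inputs standing in for b1 and y2gauss(k).
  b1 = 3.0_dp
  y = 0.1_dp

  ! Order seen in the dse1 dump: (b1 * (y - 1.0)) * 0.5
  t_before = (b1 * (y - 1.0_dp)) * 0.5_dp

  ! Order seen in the reassoc1 dump: (b1 * 0.5) * (y + (-1.0))
  t_after = (b1 * 0.5_dp) * (y + (-1.0_dp))

  print *, t_before, t_after, t_before - t_after
end program assoc_sketch
```

Since scaling by 0.5 is exact in binary floating point (barring underflow),
the two orders should produce the same value for this expression. What
changes is the shape of the code seen by the later RTL passes, not the
result.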

2) Because of the reassociation that happens when -fprotect-parens is disabled,
the RTL generated for the expression "0.5_longreal * b1 * (y2gauss(k) -
1.0_longreal)" also changes.

For example, for the first two elements of the y2gauss array, the RTL is
generated as follows.

(-----Snip-----)
(insn 525 523 526 14 (set (reg:V2DF 1124)
        (mem/u/c/i:V2DF (symbol_ref/u:DI ("*.LC82") [flags 0x2]) [8 S16 A128]))
../induct2.f90:1662 1102 {*movv2df_internal}
     (expr_list:REG_EQUAL (const_vector:V2DF [
                (const_double:DF -1.0e+0 [-0x0.8p+1])
                (const_double:DF -1.0e+0 [-0x0.8p+1])
            ])
        (nil)))

(insn 526 525 527 14 (set (reg:V2DF 1123)
        (plus:V2DF (reg:V2DF 1124)
            (mem/c:V2DF (symbol_ref:DI ("y2gauss.2335") [flags 0x2]  <var_decl
0x2aaaabb09dc0 y2gauss>) [8 MEM[(real(kind=8)[9] *)&y2gauss]+0 S16 A256])))
../induct2.f90:1662 1130 {*addv2df3}
     (expr_list:REG_EQUAL (plus:V2DF (mem/c:V2DF (symbol_ref:DI
("y2gauss.2335") [flags 0x2]  <var_decl 0x2aaaabb09dc0 y2gauss>) [8
MEM[(real(kind=8)[9] *)&y2gauss]+0 S16 A256])
            (const_vector:V2DF [
                    (const_double:DF -1.0e+0 [-0x0.8p+1])
                    (const_double:DF -1.0e+0 [-0x0.8p+1])
                ]))
        (nil)))

(insn 527 526 528 14 (set (reg:V2DF 216 [ vect_var_.1769 ])
        (mult:V2DF (reg:V2DF 1123)
            (reg:V2DF 1108))) ../induct2.f90:1662 1139 {*mulv2df3}
     (expr_list:REG_DEAD (reg:V2DF 1123)
        (nil)))
(-----Snip-----)

These RTL expressions are computed inside the j loop and are not hoisted out.

When -fprotect-parens is enabled at -Ofast, however, the RTL is as follows.

induct2.f90.157r.cprop1
(-----Snip-----)
(insn 536 533 537 14 (set (reg:V2DF 1172 [ MEM[(real(kind=8)[9] *)&y2gauss] ])
        (mem/c:V2DF (symbol_ref:DI ("y2gauss.2335") [flags 0x2]  <var_decl
0x2aaaabb09dc0 y2gauss>) [8 MEM[(real(kind=8)[9] *)&y2gauss]+0 S16 A256]))
induct2.f90:1662 1102 {*movv2df_internal}
     (nil))

(insn 537 536 538 14 (set (reg:V2DF 1170)
        (minus:V2DF (reg:V2DF 1172 [ MEM[(real(kind=8)[9] *)&y2gauss] ])
            (reg:V2DF 1168))) induct2.f90:1662 1131 {*subv2df3}
     (expr_list:REG_DEAD (reg:V2DF 1172 [ MEM[(real(kind=8)[9] *)&y2gauss] ])
        (expr_list:REG_EQUAL (minus:V2DF (mem/c:V2DF (symbol_ref:DI
("y2gauss.2335") [flags 0x2]  <var_decl 0x2aaaabb09dc0 y2gauss>) [8
MEM[(real(kind=8)[9] *)&y2gauss]+0 S16 A256])
                (const_vector:V2DF [
                        (const_double:DF 1.0e+0 [0x0.8p+1])
                        (const_double:DF 1.0e+0 [0x0.8p+1])
                    ]))
            (nil))))                                                            

(insn 538 537 539 14 (set (reg:V2DF 236 [ vect_var_.1777 ])
        (mult:V2DF (reg:V2DF 1170)
            (reg:V2DF 1155))) induct2.f90:1662 1139 {*mulv2df3}
     (expr_list:REG_DEAD (reg:V2DF 1170)
        (nil)))
(-----Snip-----)

Note that these instructions are hoisted out of the j loop.

In PRE (dump induct2.f90.158r.pre), the first instruction, "insn 536", gets
hoisted. The other two instructions, insn 537 and insn 538, are hoisted in
induct2.f90.168r.loop2_unswitch.

This hoisting difference is responsible for the 2% degradation in the induct
benchmark for the avx and fma4 cases. A slowdown is not expected at -Ofast,
hence we are raising this as a bug.

Please provide your suggestions.

