public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/53395] New: [4.8 Regression] The LAPACK functions i(d|s)amax are more than two times slower after revision 187183
@ 2012-05-18 11:45 dominiq at lps dot ens.fr
  2012-05-18 11:48 ` [Bug middle-end/53395] " rguenth at gcc dot gnu.org
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: dominiq at lps dot ens.fr @ 2012-05-18 11:45 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53395

             Bug #: 53395
           Summary: [4.8 Regression] The LAPACK functions i(d|s)amax are
                    more than two times slower after revision 187183
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: dominiq@lps.ens.fr
                CC: pinskia@gcc.gnu.org


As noted in pr53346, the LAPACK functions i(d|s)amax compiled with '-O3
-ffast-math -funroll-loops' are more than twice as slow after revision
187183 on x86_64-apple-darwin10, as shown by the following results for a
reduced version of idamax (unit increment only):

[macbook] test/dbg_rnflow% cat idamax_red.f90
      integer function idamax(n,dx)
!
      double precision dx(*),dmax
      integer i,n
!
      idamax = 1
   20 dmax = dabs(dx(1))
      do 30 i = 2,n
         if(dabs(dx(i)).le.dmax) go to 30
         idamax = i
         dmax = dabs(dx(i))
   30 continue
      return
      end
[macbook] test/dbg_rnflow% cat tst_idamax_red.f90
implicit none
integer, parameter :: n = 40000
integer :: i, j, res(n+1)
integer :: idamax
external idamax
real(8) :: x, dx, a(n+1)

dx = 2.0/real(n, kind=8)
do i = 0, n
    x = dx*real(i, kind=8) - 1.0
    a(i+1) = 1-2.0*(1-2.0*x**2)**2-0.1_8*x
end do

res = 0

do i = 0, n
    j = idamax(n+1, a)
    res(i+1) = j
    a(i+1) = a(i+1) + 0.1_8
end do
print *, sum(res)
end

[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187182/bin/gfortran -c -O3
-ffast-math -funroll-loops idamax_red.f90
[macbook] test/dbg_rnflow% gfc tst_idamax_red.f90 idamax_red.o
[macbook] test/dbg_rnflow% time a.out
   386062110
2.474u 0.002s 0:02.47 100.0%    0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187183/bin/gfortran -c -O3
-ffast-math -funroll-loops idamax_red.f90
[macbook] test/dbg_rnflow% gfc tst_idamax_red.f90 idamax_red.o
[macbook] test/dbg_rnflow% time a.out
   386062110
5.561u 0.004s 0:05.56 100.0%    0+0k 0+0io 0pf+0w
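
For readers who do not read fixed-form Fortran, the reduced kernel can be sketched in C as below. This is only an illustrative port, not part of the original report: the name idamax_red is hypothetical, and it assumes n >= 1 just as the Fortran version does.

```c
#include <math.h>

/* Hypothetical C rendering of the reduced Fortran idamax above.
   Returns the 1-based index of the first element with the largest
   absolute value.  Assumes n >= 1, as the Fortran version does. */
int idamax_red(int n, const double *dx)
{
    int imax = 1;
    double dmax = fabs(dx[0]);

    for (int i = 2; i <= n; i++) {
        if (fabs(dx[i - 1]) <= dmax)
            continue;               /* corresponds to "go to 30" */
        imax = i;
        dmax = fabs(dx[i - 1]);
    }
    return imax;
}
```

The loop body is the part that if-conversion turns into straight-line code: one conditional update of the running maximum and one of the index.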



* [Bug middle-end/53395] [4.8 Regression] The LAPACK functions i(d|s)amax are more than two times slower after revision 187183
From: rguenth at gcc dot gnu.org @ 2012-05-18 11:48 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53395

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |middle-end
   Target Milestone|---                         |4.8.0



* [Bug tree-optimization/53395] [4.8 Regression] The LAPACK functions i(d|s)amax are more than two times slower after revision 187183
From: dominiq at lps dot ens.fr @ 2012-05-18 11:54 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53395

--- Comment #1 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-05-18 11:24:06 UTC ---
The assembly code for -O3 is almost the same for revisions 187182 and 187183.
However, with '-O3 -ffast-math', revision 187182 gives the following for the loop:

L12:
    movapd    %xmm2, %xmm1
L9:
    movsd    8(%rsi), %xmm0
    andpd    %xmm3, %xmm0
    comisd    %xmm0, %xmm1
    movapd    %xmm0, %xmm2
    maxsd    %xmm1, %xmm2
    cmovb    %edx, %eax
    addl    $1, %edx
    addq    $8, %rsi
    cmpl    %ecx, %edx
    jne    L12

while revision 187183 gives

L6:
    movapd    %xmm2, %xmm1
L3:
    movsd    8(%rsi), %xmm0
    movapd    %xmm1, %xmm3
    andpd    %xmm4, %xmm0
    comisd    %xmm0, %xmm1
    movapd    %xmm0, %xmm2
    cmplesd    %xmm1, %xmm2
    cmovb    %edx, %eax
    addl    $1, %edx
    addq    $8, %rsi
    cmpl    %ecx, %edx
    andpd    %xmm2, %xmm3
    andnpd    %xmm0, %xmm2
    orpd    %xmm3, %xmm2
    jne    L6

(for the latter, -ffast-math only changes ucomisd to comisd).
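
To make the difference between the two sequences concrete, here is a rough C model of what each one computes for the running maximum. It is only a sketch: the function names are hypothetical, the bit manipulation stands in for the SSE instructions named in the comments, and NaN-free inputs are assumed (as -ffast-math allows).

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its 64-bit pattern and back. */
static uint64_t bits_of(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

static double double_of(uint64_t u)
{
    double x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Models the revision 187183 sequence: a compare that yields a mask,
   followed by an and/andn/or blend of the two candidates. */
double select_by_mask(double old_max, double cand)
{
    uint64_t mask = (cand <= old_max) ? UINT64_MAX : 0;  /* cmplesd */
    uint64_t keep = bits_of(old_max) & mask;             /* andpd   */
    uint64_t take = bits_of(cand) & ~mask;               /* andnpd  */
    return double_of(keep | take);                       /* orpd    */
}

/* Models the single maxsd of revision 187182 (NaN-free inputs assumed). */
double select_by_max(double old_max, double cand)
{
    return cand > old_max ? cand : old_max;
}
```

Both functions return the same value for ordinary inputs; the first simply needs four instructions plus an extra register copy where the second needs one.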



* [Bug tree-optimization/53395] [4.8 Regression] The LAPACK functions i(d|s)amax are more than two times slower after revision 187183
From: pinskia at gcc dot gnu.org @ 2012-05-18 16:02 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53395

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
          Component|middle-end                  |tree-optimization

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-05-18 15:25:13 UTC ---
  dmax_12 = ABS_EXPR <D.1877_11>;
  dmax_2 = dmax_1 >= dmax_12 ? dmax_1 : dmax_12;
  __result_idamax_21 = dmax_1 >= dmax_12 ? __result_idamax_22 : i_3;

Hmm, dmax_2 should have been a MAX_EXPR.
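
As a rough C analogue of the GIMPLE above, one if-converted iteration can be written with two selects on the same comparison, or with the value select replaced by a maximum. The type and function names here are illustrative only, and max_d is a local stand-in for MAX_EXPR; the equivalence assumes no NaNs, which is what -ffast-math permits.

```c
#include <math.h>

/* Running state of the reduction: current maximum and its 1-based index. */
typedef struct {
    double dmax;
    int idx;
} amax_state;

/* Local stand-in for MAX_EXPR (NaN-free inputs assumed). */
static double max_d(double a, double b)
{
    return a > b ? a : b;
}

/* COND_EXPR form: both results are selected on the same comparison. */
amax_state step_cond(amax_state s, double x, int i)
{
    double a = fabs(x);              /* dmax_12 = ABS_EXPR <...> */
    int keep = s.dmax >= a;
    amax_state r;
    r.dmax = keep ? s.dmax : a;      /* dmax_2                   */
    r.idx = keep ? s.idx : i;        /* __result_idamax_21       */
    return r;
}

/* MAX_EXPR form: only the index still needs a select. */
amax_state step_max(amax_state s, double x, int i)
{
    double a = fabs(x);
    amax_state r;
    r.dmax = max_d(s.dmax, a);
    r.idx = (s.dmax >= a) ? s.idx : i;
    return r;
}
```

The index update cannot become a MAX_EXPR because its selected values are not the compared operands; only the dmax update has that shape.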



* [Bug tree-optimization/53395] [4.8 Regression] The LAPACK functions i(d|s)amax are more than two times slower after revision 187183
From: pinskia at gcc dot gnu.org @ 2012-05-18 17:34 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53395

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|unassigned at gcc dot       |pinskia at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-05-18 16:13:59 UTC ---
I have a patch.



* [Bug tree-optimization/53395] [4.8 Regression] The LAPACK functions i(d|s)amax are more than two times slower after revision 187183
From: pinskia at gcc dot gnu.org @ 2012-05-18 17:42 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53395

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-05-18 16:03:46 UTC ---
This should fix tree-if-conv.c:
Index: tree-if-conv.c
===================================================================
--- tree-if-conv.c    (revision 187647)
+++ tree-if-conv.c    (working copy)
@@ -1313,8 +1313,8 @@ predicate_scalar_phi (gimple phi, tree c
                || bb_postdominates_preds (bb));

       /* Build new RHS using selected condition and arguments.  */
-      rhs = build3 (COND_EXPR, TREE_TYPE (res),
-            unshare_expr (cond), arg_0, arg_1);
+      rhs = fold_build3 (COND_EXPR, TREE_TYPE (res),
+                 unshare_expr (cond), arg_0, arg_1);
     }

   new_stmt = gimple_build_assign (res, rhs);



* [Bug tree-optimization/53395] [4.8 Regression] The LAPACK functions i(d|s)amax are more than two times slower after revision 187183
From: dominiq at lps dot ens.fr @ 2012-05-18 17:46 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53395

--- Comment #6 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-05-18 17:41:21 UTC ---
> This should fix tree-if-conv.c:

It does. Thanks.



* [Bug tree-optimization/53395] [4.8 Regression] The LAPACK functions i(d|s)amax are more than two times slower after revision 187183
From: pinskia at gcc dot gnu.org @ 2012-05-18 17:51 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53395

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-05-18
     Ever Confirmed|0                           |1

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-05-18 15:32:04 UTC ---
This was mentioned on http://gcc.gnu.org/ml/gcc/2011-10/msg00422.html .  So
there are two ways of fixing this bug.
Way #1: Fix ifcvt on the tree level to produce MAX_EXPR instead of the
COND_EXPR.

Way #2: Simplify COND_EXPR to MAX_EXPR during expansion or at some other time.

I want to say way #1 is the correct fix.
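
A toy illustration of way #1, i.e. recognizing the max pattern at the point where the select is built. This is deliberately NOT GCC's tree API: the node type, the codes and toy_build_select are all made up for the sketch, operands are compared by name, and only the "a >= b ? a : b" shape is handled. For floating point such a rewrite is only valid when NaNs and signed zeros can be ignored, as with -ffast-math.

```c
#include <string.h>

/* Toy expression codes; these are NOT GCC's tree codes or API. */
enum toy_code { TOY_COND_GE, TOY_MAX };

struct toy_expr {
    enum toy_code code;
    const char *cmp_lhs, *cmp_rhs;  /* compared operands (NULL for TOY_MAX) */
    const char *arg0, *arg1;        /* selected values, or the MAX operands */
};

/* Build "cmp_lhs >= cmp_rhs ? then_v : else_v", folding it to a MAX node
   when the selected values are exactly the compared operands. */
struct toy_expr toy_build_select(const char *cmp_lhs, const char *cmp_rhs,
                                 const char *then_v, const char *else_v)
{
    struct toy_expr e = { TOY_COND_GE, cmp_lhs, cmp_rhs, then_v, else_v };

    if (strcmp(cmp_lhs, then_v) == 0 && strcmp(cmp_rhs, else_v) == 0) {
        e.code = TOY_MAX;
        e.cmp_lhs = e.cmp_rhs = NULL;
    }
    return e;
}
```

With the operands from comment #2, the dmax_2 select would come out as a MAX node while the __result_idamax_21 select would stay a conditional.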



* [Bug tree-optimization/53395] [4.8 Regression] The LAPACK functions i(d|s)amax are more than two times slower after revision 187183
From: rguenth at gcc dot gnu.org @ 2012-05-21 10:11 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53395

--- Comment #7 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-05-21 09:39:51 UTC ---
Note that if-conversion deliberately does not fold, so that it does not
destroy a valid gimple RHS and does not canonicalize the condition.  Producing
a MAX_EXPR is certainly fine, of course ... (I'm to blame for not adding
testcases for some of the if-conversion improvements I've done in the last
few months ...)

But I suppose that with the simple patch you at least need to gimplify the
result.



* [Bug tree-optimization/53395] [4.8 Regression] The LAPACK functions i(d|s)amax are more than two times slower after revision 187183
From: pinskia at gcc dot gnu.org @ 2012-08-28  1:18 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53395

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-08-28 01:18:17 UTC ---
Created attachment 28091
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28091
New patch based on Richard's comments

Testing a new fix which includes Richard's comments.



* [Bug tree-optimization/53395] [4.8 Regression] The LAPACK functions i(d|s)amax are more than two times slower after revision 187183
From: pinskia at gcc dot gnu.org @ 2012-08-28  7:05 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53395

--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-08-28 07:04:52 UTC ---
While working on this, I noticed that sometimes we don't produce what the x86
back-end calls IEEE MIN/MAX either, but that is a different issue altogether
and I have a fix for it (I ran into it while improving the last phi-opt pass,
which also converts those PHIs into COND_EXPR like ifcvt does).



* [Bug tree-optimization/53395] [4.8 Regression] The LAPACK functions i(d|s)amax are more than two times slower after revision 187183
From: pinskia at gcc dot gnu.org @ 2012-09-03 20:32 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53395

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #11 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-09-03 20:32:33 UTC ---
Fixed.



* [Bug tree-optimization/53395] [4.8 Regression] The LAPACK functions i(d|s)amax are more than two times slower after revision 187183
From: pinskia at gcc dot gnu.org @ 2012-09-03 20:32 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53395

--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-09-03 20:31:55 UTC ---
Author: pinskia
Date: Mon Sep  3 20:31:52 2012
New Revision: 190904

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190904
Log:
2012-09-03  Andrew Pinski  <apinski@cavium.com>

    PR tree-opt/53395
    * tree-if-conv.c (constant_or_ssa_name): New function.
    (fold_build_cond_expr): New function.
    (predicate_scalar_phi): Use fold_build_cond_expr instead of build3.
    (predicate_mem_writes): Likewise.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-if-conv.c


