public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/40106]  New: Time increase with inlining for the Polyhedron test air.f90
@ 2009-05-11 18:04 dominiq at lps dot ens dot fr
  2009-05-12 11:52 ` [Bug middle-end/40106] " hubicka at gcc dot gnu dot org
                   ` (62 more replies)
  0 siblings, 63 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-05-11 18:04 UTC (permalink / raw)
  To: gcc-bugs

air.f90 from the Polyhedron test suite takes ~15% longer to run when
compiled with -fwhole-file than without the option.  I have checked that the
subroutines DERIV(X|Y) are inlined with -finline-limit=100, but not with
-finline-limit=50 (for the latter I recover the timing without -fwhole-file).
What I find very odd is that if I manually inline only a single call (see
below) I get the same timing as with all of them (2*14) inlined. This is the
case for trunk and gfortran 4.4.0, but not for 4.3.3, which gives a slower
executable.


I have inlined 

      SUBROUTINE DERIVX(D,U,Ux,Al,Np,Nd,M)
      IMPLICIT REAL*8(A-H,O-Z)
      PARAMETER (NX=150,NY=150)
      DIMENSION D(NX,33) , U(NX,NY) , Ux(NX,NY) , Al(30) , Np(30)
      DO jm = 1 , M
         jmax = 0
         jmin = 1
         DO i = 1 , Nd
            jmax = jmax + Np(i) + 1
            DO j = jmin , jmax
               uxt = 0.
               DO k = 0 , Np(i)
                  uxt = uxt + D(j,k+1)*U(jmin+k,jm)
               ENDDO
               Ux(j,jm) = uxt*Al(i)
            ENDDO
!
            jmin = jmin + Np(i) + 1
         ENDDO
      ENDDO
      CONTINUE
      END

at line 793 as

!       CALL DERIVX(DX,f4,f4x,ALX,NPX,NDX,MXPy)
      DO jm = 1 , MXPy
         jmax = 0
         jmin = 1
         DO i = 1 , NDX
            jmax = jmax + NPX(i) + 1
            DO j = jmin , jmax
               uxt = 0.
               DO k = 0 , NPX(i)
                  uxt = uxt + DX(j,k+1)*f4(jmin+k,jm)
               ENDDO
               f4x(j,jm) = uxt*ALX(i)
            ENDDO
            jmin = jmin + NPX(i) + 1
         ENDDO
      ENDDO


-- 
           Summary: Time increase with inlining for the Polyhedron test
                    air.f90
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dominiq at lps dot ens dot fr
 GCC build triplet: i686-apple-darwin9
  GCC host triplet: i686-apple-darwin9
GCC target triplet: i686-apple-darwin9


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
@ 2009-05-12 11:52 ` hubicka at gcc dot gnu dot org
  2009-05-12 13:23 ` dominiq at lps dot ens dot fr
                   ` (61 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2009-05-12 11:52 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from hubicka at gcc dot gnu dot org  2009-05-12 11:52 -------
Hmm, the inlined function has a loop depth of 4, which makes it predicted to
iterate quite a few times. My guess would be that inlining increases the loop
depth, which in turn makes GCC conclude that one of the loops that are in fact
internal hot loops is cold. Decreasing --param hot-bb-frequency-fraction might
help in this case.

I've seen this in the past, just hope it is quite rare.
If we find enough testcases like this, it might make sense to alter the
predicate deciding on hot-bb to always consider innermost loops hot no matter
their relative frequency.  Would need to have a flag on the BB or the loop
structure always available, though.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
  2009-05-12 11:52 ` [Bug middle-end/40106] " hubicka at gcc dot gnu dot org
@ 2009-05-12 13:23 ` dominiq at lps dot ens dot fr
  2009-05-12 14:47 ` rguenther at suse dot de
                   ` (60 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-05-12 13:23 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from dominiq at lps dot ens dot fr  2009-05-12 13:23 -------
> decreasing --param hot-bb-frequency-fraction might help in this case.

I have tried --param hot-bb-frequency-fraction=1 (which seems to be the
smallest possible value, see pr40119), but it did not change anything.

What I find very surprising is that the ~15% slow-down appears as soon as one
call is inlined, with no further slow-down as more calls are inlined (I have
tested 4, and -fwhole-file inlines 28 of them). If the block were misoptimized
I would expect the slow-down to increase with the number of inlined calls.
Could the problem be related to cache management instead (L1, since L2 is 4MB
on my Core 2 Duo)?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
  2009-05-12 11:52 ` [Bug middle-end/40106] " hubicka at gcc dot gnu dot org
  2009-05-12 13:23 ` dominiq at lps dot ens dot fr
@ 2009-05-12 14:47 ` rguenther at suse dot de
  2009-05-12 16:18 ` dominiq at lps dot ens dot fr
                   ` (59 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: rguenther at suse dot de @ 2009-05-12 14:47 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from rguenther at suse dot de  2009-05-12 14:47 -------
Subject: Re:  Time increase with inlining for the
 Polyhedron test air.f90

On Tue, 12 May 2009, dominiq at lps dot ens dot fr wrote:

> ------- Comment #2 from dominiq at lps dot ens dot fr  2009-05-12 13:23 -------
> > decreasing --param hot-bb-frequency-fraction might help in this case.
> 
> I have tried --param hot-bb-frequency-fraction=1 (which seems to be the
> smallest possible value, see pr40119), but it did not change anything.
> 
> What I find very surprising is that the ~15% slow-down appears as soon as one
> call is inlined, with no further slow-down as more calls are inlined (I have
> tested 4, and -fwhole-file inlines 28 of them). If the block were misoptimized
> I would expect the slow-down to increase with the number of inlined calls.
> Could the problem be related to cache management instead (L1, since L2 is 4MB
> on my Core 2 Duo)?

You may be hitting some analysis limit, either for maximum loop depth
or something similar.  There is no other way than to analyze what the
difference in the optimizations produced is.

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (2 preceding siblings ...)
  2009-05-12 14:47 ` rguenther at suse dot de
@ 2009-05-12 16:18 ` dominiq at lps dot ens dot fr
  2009-05-22 20:39 ` dominiq at lps dot ens dot fr
                   ` (58 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-05-12 16:18 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from dominiq at lps dot ens dot fr  2009-05-12 16:18 -------
Assembly code for the inlined inner loop:

L123:
        movsd   (%rdx), %xmm15
        movsd   8(%rdx), %xmm6
        mulsd   (%rax), %xmm15
        mulsd   1200(%rax), %xmm6
        movsd   16(%rdx), %xmm4
        movsd   24(%rdx), %xmm3
        mulsd   2400(%rax), %xmm4
        mulsd   3600(%rax), %xmm3
        addsd   %xmm15, %xmm0
        movsd   32(%rdx), %xmm9
        movsd   40(%rdx), %xmm1
        mulsd   4800(%rax), %xmm9
        mulsd   6000(%rax), %xmm1
        addsd   %xmm6, %xmm0
        movsd   48(%rdx), %xmm7
        movsd   56(%rdx), %xmm2
        addq    $64, %rdx
        mulsd   7200(%rax), %xmm7
        mulsd   8400(%rax), %xmm2
        addq    $9600, %rax
        addsd   %xmm4, %xmm0
        cmpq    %rax, %rcx
        addsd   %xmm3, %xmm0
        addsd   %xmm9, %xmm0
        addsd   %xmm1, %xmm0
        addsd   %xmm7, %xmm0
        addsd   %xmm2, %xmm0
        jne     L123

and in the subroutine DERIVX:

L953:
        movsd   (%rax), %xmm9
        addl    $8, %ebx
        movsd   8(%rax), %xmm8
        mulsd   (%rcx), %xmm9
        mulsd   1200(%rcx), %xmm8
        movsd   16(%rax), %xmm7
        movsd   24(%rax), %xmm6
        mulsd   2400(%rcx), %xmm7
        mulsd   3600(%rcx), %xmm6
        addsd   %xmm9, %xmm0
        movsd   32(%rax), %xmm5
        movsd   40(%rax), %xmm4
        mulsd   4800(%rcx), %xmm5
        mulsd   6000(%rcx), %xmm4
        addsd   %xmm8, %xmm0
        movsd   48(%rax), %xmm3
        movsd   56(%rax), %xmm1
        addq    $64, %rax
        mulsd   7200(%rcx), %xmm3
        mulsd   8400(%rcx), %xmm1
        addq    $9600, %rcx
        cmpl    %edi, %ebx
        addsd   %xmm7, %xmm0
        addsd   %xmm6, %xmm0
        addsd   %xmm5, %xmm0
        addsd   %xmm4, %xmm0
        addsd   %xmm3, %xmm0
        addsd   %xmm1, %xmm0
        jne     L953

The structure of the outer loops seems quite comparable in both cases.
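
The displacements in both listings can be cross-checked against the array
shapes. The sketch below is only an illustration: it assumes 8-byte REAL*8
elements, column-major storage, and NX = 150 as in the PARAMETER statement of
DERIVX. The helper name unrolled_offsets is made up here.

```python
# Sketch: relate the byte offsets in the listings to the Fortran array shapes.
# Assumptions (not stated in the listings themselves): REAL*8 is 8 bytes,
# arrays are column-major, NX = 150. unrolled_offsets is a hypothetical helper.

REAL8_BYTES = 8
NX = 150
UNROLL = 8

def unrolled_offsets(stride_elems, unroll=UNROLL, elem_bytes=REAL8_BYTES):
    """Byte offsets touched by one pass of a loop unrolled `unroll` times."""
    return [k * stride_elems * elem_bytes for k in range(unroll)]

# D(j,k+1): k walks the second index, so consecutive k are NX elements apart.
d_offsets = unrolled_offsets(NX)
# U(jmin+k,jm): k walks the first index, so consecutive k are adjacent.
u_offsets = unrolled_offsets(1)

print(d_offsets)                    # [0, 1200, 2400, 3600, 4800, 6000, 7200, 8400]
print(u_offsets)                    # [0, 8, 16, 24, 32, 40, 48, 56]
print(UNROLL * NX * REAL8_BYTES)    # 9600, the larger addq per pass
print(UNROLL * REAL8_BYTES)         # 64, the smaller addq per pass
```

Both loops appear to walk the same access pattern; the visible differences are
the register allocation and the exit test (cmpq on a pointer in the inlined
copy, addl/cmpl on a counter in DERIVX).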


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (3 preceding siblings ...)
  2009-05-12 16:18 ` dominiq at lps dot ens dot fr
@ 2009-05-22 20:39 ` dominiq at lps dot ens dot fr
  2009-05-22 20:41 ` dominiq at lps dot ens dot fr
                   ` (57 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-05-22 20:39 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from dominiq at lps dot ens dot fr  2009-05-22 20:39 -------
Created an attachment (id=17903)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17903&action=view)
air.s file for i686-apple-darwin9 compiled with -m64 -O3 -ffast-math
-funroll-loops


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (4 preceding siblings ...)
  2009-05-22 20:39 ` dominiq at lps dot ens dot fr
@ 2009-05-22 20:41 ` dominiq at lps dot ens dot fr
  2009-05-22 20:52 ` dominiq at lps dot ens dot fr
                   ` (56 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-05-22 20:41 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from dominiq at lps dot ens dot fr  2009-05-22 20:41 -------
Created an attachment (id=17904)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17904&action=view)
air.s file for i686-apple-darwin9 compiled with -m64 -O3 -ffast-math
-funroll-loops -fwhole-file


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (5 preceding siblings ...)
  2009-05-22 20:41 ` dominiq at lps dot ens dot fr
@ 2009-05-22 20:52 ` dominiq at lps dot ens dot fr
  2009-07-13 15:29 ` burnus at gcc dot gnu dot org
                   ` (55 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-05-22 20:52 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from dominiq at lps dot ens dot fr  2009-05-22 20:52 -------
I had a closer look at the code and found that the inner loop

               DO k = 0 , Np(i)
                  uxt = uxt + D(j,k+1)*U(jmin+k,jm)
               ENDDO

is unrolled 8 times, but Np(i) is always equal to 4, so the relevant part of
the assembly is

...
        je      L951
        testl   %esi, %esi
        je      L915
        cmpl    $1, %esi
        je      L945
        cmpl    $2, %esi
        .p2align 4,,5
        je      L946
        cmpl    $3, %esi
        .p2align 4,,5
        je      L947
        cmpl    $4, %esi
        .p2align 4,,5
        je      L948
        cmpl    $5, %esi
        .p2align 4,,5
        je      L949
        cmpl    $6, %esi
        .p2align 4,,5
        je      L950
...

where the jump for $5 is the relevant one (this does look an optimal way to
handle the preamble).
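
A minimal model of that dispatch, assuming the unroller peels the leftover
iterations (trip count modulo the unroll factor) into the preamble and runs
the rest in passes of 8. split_trip_count is a hypothetical helper, not
anything taken from GCC.

```python
# Sketch: how a trip count splits between the unrolled body and the preamble.
# Assumption: leftover iterations = trip count modulo the unroll factor.

def split_trip_count(n_iter, unroll):
    """Return (full_passes, leftover) for a loop unrolled `unroll` times."""
    return divmod(n_iter, unroll)

np_i = 4                  # value of Np(i) reported above
n_iter = np_i - 0 + 1     # DO k = 0 , Np(i) runs k = 0..4
full_passes, leftover = split_trip_count(n_iter, 8)
print(full_passes, leftover)   # 0 5
```

Under this model, with Np(i) = 4 all five iterations are leftover iterations,
so the 8-way unrolled body would not be entered at all.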

I have also done some profiling and found that 'pow$fenv_access_off' in
libSystem.B.dylib  (PowerInner for ppc) takes a significant amount of time for
the executable compiled with -fwhole-file.

Any idea why? Note that derivx and derivy are inlined with -fwhole-file and
looking at the *.s files attached in comments #5 and #6, everything looks
normal at this point.

                        i686-apple-darwin9

[ibook-dhum] lin/test% gfc -m64 -O3 -ffast-math -funroll-loops air.f90
[ibook-dhum] lin/test% rm -f tmp ; time a.out > tmp
8.451u 0.116s 0:08.61 99.4%     0+0k 0+6io 0pf+0w

+ 99.5%, start, a.out
| + 99.5%, main, a.out
| | + 99.4%, MAIN__, a.out
| | |   12.8%, derivy_, a.out
| | |   11.3%, derivx_, a.out
| | |   5.1%, fvsplty2_, a.out
| | |   4.1%, state_, a.out
| | |   3.1%, fvspltx2_, a.out
| | | - 2.8%, _gfortrani_list_formatted_write, libgfortran.3.dylib
| | | + 0.6%, botwall_, a.out
| | | |   0.2%, pow$fenv_access_off, libSystem.B.dylib
| | | |   0.0%, exp, libSystem.B.dylib
| | | |   0.0%, dyld_stub_exp, a.out
| | | + 0.6%, topwall_, a.out
| | | |   0.4%, pow$fenv_access_off, libSystem.B.dylib
| | | |   0.1%, exp, libSystem.B.dylib
| | | |   0.0%, dyld_stub_pow, a.out
| | | + 0.3%, aexit_, a.out
| | | |   0.1%, exp, libSystem.B.dylib
| | | + 0.2%, inlet_, a.out
| | | |   0.1%, exp, libSystem.B.dylib
| | | |   0.0%, log$fenv_access_off, libSystem.B.dylib
| | |   0.2%, log$fenv_access_off, libSystem.B.dylib
| | | - 0.1%, _gfortran_st_write_done, libgfortran.3.dylib
| | | - 0.1%, data_transfer_init, libgfortran.3.dylib
| | | - 0.1%, formatted_transfer, libgfortran.3.dylib
| | |   0.0%, _gfortran_transfer_real, libgfortran.3.dylib
| |   0.0%, _gfortran_st_write, libgfortran.3.dylib


[ibook-dhum] lin/test% gfc -m64 -O3 -ffast-math -funroll-loops -fwhole-file air.f90
[ibook-dhum] lin/test% rm -f tmp ; time a.out > tmp
9.752u 0.096s 0:09.90 99.3%     0+0k 0+6io 0pf+0w

+ 99.5%, start, a.out
| + 99.5%, main, a.out
| | + 99.5%, MAIN__, a.out
| | | + 15.0%, pow$fenv_access_off, libSystem.B.dylib             <==== Why?
| | | |   0.4%, floorl$fenv_access_off, libSystem.B.dylib
| | | |   0.2%, dyld_stub_fabs, libSystem.B.dylib
| | | |   0.1%, dyld_stub_floorl, libSystem.B.dylib
| | | |   0.1%, fabs$fenv_access_off, libSystem.B.dylib
| | |   4.6%, fvsplty2_, a.out
| | |   3.5%, state_.clone.2, a.out
| | | - 2.9%, _gfortrani_list_formatted_write, libgfortran.3.dylib
| | |   2.8%, fvspltx2_, a.out
| | | + 0.4%, topwall_, a.out
| | | |   0.2%, pow$fenv_access_off, libSystem.B.dylib
| | | |   0.1%, exp, libSystem.B.dylib
| | | + 0.4%, botwall_.clone.3, a.out
| | | |   0.2%, pow$fenv_access_off, libSystem.B.dylib
| | | |   0.0%, exp, libSystem.B.dylib
| | | + 0.3%, aexit_.clone.4, a.out
| | | |   0.1%, exp, libSystem.B.dylib
| | | |   0.0%, log$fenv_access_off, libSystem.B.dylib
| | |   0.3%, dyld_stub_pow, a.out
| | | + 0.2%, inlet_, a.out
| | | |   0.1%, exp, libSystem.B.dylib
| | | |   0.0%, dyld_stub_log, a.out
| | | - 0.2%, _gfortran_st_write_done, libgfortran.3.dylib
| | | - 0.1%, formatted_transfer, libgfortran.3.dylib
| | | - 0.1%, data_transfer_init, libgfortran.3.dylib
| | |   0.1%, log$fenv_access_off, libSystem.B.dylib
| | |   0.0%, _gfortrani_flush_if_preconnected, libgfortran.3.dylib
| |   0.0%, pow$fenv_access_off, libSystem.B.dylib
| |   0.0%, _gfortrani_free_internal_unit, libgfortran.3.dylib


                        powerpc-apple-darwin9

gfc -m64 -O3 -ffast-math -funroll-loops air.f90

- 75.5%, MAIN__, a.out
- 5.9%, derivy_, a.out
- 5.4%, derivx_, a.out
- 4.7%, fvsplty2_, a.out
- 4.2%, fvspltx2_, a.out
- 2.1%, state_, a.out
- 0.6%, dyld_stub_sqrt, a.out
- 0.5%, ml_set_interrupts_enabled, mach_kernel
- 0.2%, sqrt, libSystem.B.dylib
- 0.2%, exp, libSystem.B.dylib
- 0.2%, log, libSystem.B.dylib
- 0.1%, PowerInner, libSystem.B.dylib
- 0.1%, inlet_, a.out
- 0.0%, aexit_, a.out
- 0.0%, dyld_stub_pow, a.out
- 0.0%, botwall_, a.out
- 0.0%, topwall_, a.out
- 0.0%, pow, libSystem.B.dylib
- 0.0%, dyld_stub_log, a.out
- 0.0%, __dtoa, libSystem.B.dylib
- 0.0%, next_format0, libgfortran.3.dylib
- 0.0%, log10, libSystem.B.dylib
- 0.0%, dyld_stub_memset, libSystem.B.dylib
- 0.0%, dyld_stub_memcpy, libgfortran.3.dylib
- 0.0%, dyld_stub_exp, a.out
- 0.0%, dyld_stub___sfvwrite, libSystem.B.dylib
- 0.0%, __vfprintf, libSystem.B.dylib
- 0.0%, __quorem_D2A, libSystem.B.dylib
- 0.0%, __Bfree_D2A, libSystem.B.dylib

gfc -m64 -O3 -ffast-math -funroll-loops -fwhole-file air.f90

- 82.6%, MAIN__, a.out
- 5.3%, PowerInner, libSystem.B.dylib                             <==== Why?
- 4.3%, fvsplty2_, a.out
- 3.2%, fvspltx2_, a.out
- 1.9%, state_.clone.2, a.out
- 1.3%, pow, libSystem.B.dylib
- 0.4%, ml_set_interrupts_enabled, mach_kernel
- 0.4%, dyld_stub_sqrt, a.out
- 0.1%, log, libSystem.B.dylib
- 0.1%, dyld_stub_pow, a.out
- 0.1%, sqrt, libSystem.B.dylib
- 0.1%, exp, libSystem.B.dylib
- 0.0%, inlet_, a.out
- 0.0%, botwall_.clone.3, a.out
- 0.0%, topwall_, a.out
- 0.0%, aexit_.clone.4, a.out
- 0.0%, dyld_stub_log, a.out
- 0.0%, dyld_stub_localeconv_l, libSystem.B.dylib
- 0.0%, dyld_stub_exp, a.out
- 0.0%, dyld_stub___pow5mult_D2A, libSystem.B.dylib
- 0.0%, data_transfer_init, libgfortran.3.dylib
- 0.0%, __umodti3, libgfortran.3.dylib
- 0.0%, __dtoa, libSystem.B.dylib
- 0.0%, __Bfree_D2A, libSystem.B.dylib
- 0.0%, __Balloc_D2A, libSystem.B.dylib


-- 

dominiq at lps dot ens dot fr changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jh at suse dot cz


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (6 preceding siblings ...)
  2009-05-22 20:52 ` dominiq at lps dot ens dot fr
@ 2009-07-13 15:29 ` burnus at gcc dot gnu dot org
  2009-08-25 11:56 ` dominiq at lps dot ens dot fr
                   ` (54 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: burnus at gcc dot gnu dot org @ 2009-07-13 15:29 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from burnus at gcc dot gnu dot org  2009-07-13 15:29 -------
(Not restricted to Darwin, happens also on x86-64-linux.)


-- 

burnus at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |burnus at gcc dot gnu dot
                   |                            |org
  GCC build triplet|i686-apple-darwin9          |
   GCC host triplet|i686-apple-darwin9          |
 GCC target triplet|i686-apple-darwin9          |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (7 preceding siblings ...)
  2009-07-13 15:29 ` burnus at gcc dot gnu dot org
@ 2009-08-25 11:56 ` dominiq at lps dot ens dot fr
  2009-08-25 12:01 ` [Bug middle-end/40106] Time increase " dominiq at lps dot ens dot fr
                   ` (53 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-08-25 11:56 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from dominiq at lps dot ens dot fr  2009-08-25 11:55 -------
I see a similar slowdown with the patch in
http://gcc.gnu.org/ml/fortran/2009-08/msg00361.html (see
http://gcc.gnu.org/ml/fortran/2009-08/msg00377.html). I suspect it is related
to pr41098, but I don't know how to show it.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (8 preceding siblings ...)
  2009-08-25 11:56 ` dominiq at lps dot ens dot fr
@ 2009-08-25 12:01 ` dominiq at lps dot ens dot fr
  2009-08-25 12:22 ` [Bug middle-end/40106] Time increase with inlining " rguenth at gcc dot gnu dot org
                   ` (52 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-08-25 12:01 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from dominiq at lps dot ens dot fr  2009-08-25 12:01 -------
> I see a similar slowdown with the patch in ...

I have again forgotten to say that I saw the slowdown without the -fwhole-file
option.
I have changed the summary to reflect that.


-- 

dominiq at lps dot ens dot fr changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Time increase with inlining |Time increase for the
                   |for the Polyhedron test     |Polyhedron test air.f90
                   |air.f90                     |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (9 preceding siblings ...)
  2009-08-25 12:01 ` [Bug middle-end/40106] Time increase " dominiq at lps dot ens dot fr
@ 2009-08-25 12:22 ` rguenth at gcc dot gnu dot org
  2009-08-25 12:30 ` dominiq at lps dot ens dot fr
                   ` (51 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-08-25 12:22 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from rguenth at gcc dot gnu dot org  2009-08-25 12:22 -------
We clone quite a few functions with -fwhole-file but apparently we fail to
apply constant propagation for &CONST_DECL arguments, which is a pity.  In fact
we seem to clone them without any change.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mjambor at suse dot cz
            Summary|Time increase for the       |Time increase with inlining
                   |Polyhedron test air.f90     |for the Polyhedron test
                   |                            |air.f90


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (10 preceding siblings ...)
  2009-08-25 12:22 ` [Bug middle-end/40106] Time increase with inlining " rguenth at gcc dot gnu dot org
@ 2009-08-25 12:30 ` dominiq at lps dot ens dot fr
  2009-08-25 12:40 ` rguenther at suse dot de
                   ` (50 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-08-25 12:30 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from dominiq at lps dot ens dot fr  2009-08-25 12:30 -------
From comment #9, I think inlining is just exposing a latent missed optimization
related to the way the middle end handles pow(). This is why I changed the
summary.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (11 preceding siblings ...)
  2009-08-25 12:30 ` dominiq at lps dot ens dot fr
@ 2009-08-25 12:40 ` rguenther at suse dot de
  2009-08-25 12:51 ` dominiq at lps dot ens dot fr
                   ` (49 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: rguenther at suse dot de @ 2009-08-25 12:40 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from rguenther at suse dot de  2009-08-25 12:40 -------
Subject: Re:  Time increase with inlining for the
 Polyhedron test air.f90

On Tue, 25 Aug 2009, dominiq at lps dot ens dot fr wrote:

> ------- Comment #12 from dominiq at lps dot ens dot fr  2009-08-25 12:30 -------
> From comment #9, I think inlining is just exposing a latent missed optimization
> related to the way the middle end handles pow(). This is why I changed the
> summary.

I don't think the issue is pow expansion.  Does -fno-ipa-cp fix the
regression?

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (12 preceding siblings ...)
  2009-08-25 12:40 ` rguenther at suse dot de
@ 2009-08-25 12:51 ` dominiq at lps dot ens dot fr
  2009-08-25 15:31 ` dominiq at lps dot ens dot fr
                   ` (48 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-08-25 12:51 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from dominiq at lps dot ens dot fr  2009-08-25 12:51 -------
> I don't think the issue is pow expansion.  

What I see, by several different means, is that the number of calls to pow()
increases from 63,907,869 to 1,953,139,629. Since pow() is not exactly cheap, I
think this could be sufficient to explain the 1.8s difference I see. Note that
the code has plenty of x**2 and x**a where a is real.
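
To put the two call counts side by side, here is plain arithmetic on the
numbers quoted above (nothing is measured here, and the variable names are
only for illustration):

```python
# Call counts to pow() as quoted in this comment.
calls_before = 63_907_869       # count in the faster build
calls_after = 1_953_139_629     # count in the slower build

extra_calls = calls_after - calls_before
ratio = calls_after / calls_before

print(extra_calls)        # 1889231760
print(f"{ratio:.1f}x")    # 30.6x
```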

> Does -fno-ipa-cp fix the regression?

No.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (13 preceding siblings ...)
  2009-08-25 12:51 ` dominiq at lps dot ens dot fr
@ 2009-08-25 15:31 ` dominiq at lps dot ens dot fr
  2009-08-25 21:25 ` [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization dominiq at lps dot ens dot fr
                   ` (47 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-08-25 15:31 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #15 from dominiq at lps dot ens dot fr  2009-08-25 15:30 -------
I think I have made some progress in understanding the problem:

(1) The 1,953,139,629 or so calls to pow() are the non-optimized baseline.

(2) For working situations this number is reduced to 63,907,869 or so when
using the -funsafe-math-optimizations option:

[ibook-dhum] lin/test% time a.out > /dev/null
11.348u 0.049s 0:11.41 99.7%    0+0k 0+7io 0pf+0w
[ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
8.464u 0.046s 0:08.52 99.7%     0+0k 0+8io 0pf+0w
[ibook-dhum] lin/test% gfc -fwhole-file -m64 -O2 -funsafe-math-optimizations air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
8.471u 0.047s 0:08.53 99.7%     0+0k 0+7io 0pf+0w

so with -O2 -funsafe-math-optimizations the optimization is still there with
-fwhole-file.

(3) The critical option with -fwhole-file is -finline-functions:

[ibook-dhum] lin/test% gfc -m64 -O2 -finline-functions -funsafe-math-optimizations air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
8.464u 0.045s 0:08.52 99.7%     0+0k 0+8io 0pf+0w
[ibook-dhum] lin/test% gfc -fwhole-file -m64 -O2 -finline-functions -funsafe-math-optimizations air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
10.053u 0.046s 0:10.11 99.8%    0+0k 0+8io 0pf+0w

Note that the patch in http://gcc.gnu.org/ml/fortran/2009-08/msg00361.html
seems to prevent the optimization coming from -funsafe-math-optimizations (see
http://gcc.gnu.org/ml/fortran/2009-08/msg00390.html ).
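
For reference, the relative cost of -fwhole-file in (2) and (3) can be
computed from the user times above. This is a small sketch on the quoted
numbers only; slowdown_percent and the constant names are hypothetical.

```python
# Sketch: relative slow-down between two timings, using the user times
# (seconds) copied from the runs in (2) and (3).

def slowdown_percent(baseline_s, measured_s):
    """Relative increase of measured_s over baseline_s, in percent."""
    return 100.0 * (measured_s - baseline_s) / baseline_s

O2_UNSAFE = 8.464                # -O2 -funsafe-math-optimizations
O2_UNSAFE_WHOLE = 8.471          # same, plus -fwhole-file
O2_INLINE_UNSAFE = 8.464         # -O2 -finline-functions -funsafe-math-optimizations
O2_INLINE_UNSAFE_WHOLE = 10.053  # same, plus -fwhole-file

print(f"{slowdown_percent(O2_UNSAFE, O2_UNSAFE_WHOLE):.1f}%")                # 0.1%
print(f"{slowdown_percent(O2_INLINE_UNSAFE, O2_INLINE_UNSAFE_WHOLE):.1f}%")  # 18.8%
```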


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (14 preceding siblings ...)
  2009-08-25 15:31 ` dominiq at lps dot ens dot fr
@ 2009-08-25 21:25 ` dominiq at lps dot ens dot fr
  2009-08-27 21:59 ` dominiq at lps dot ens dot fr
                   ` (46 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-08-25 21:25 UTC (permalink / raw)
  To: gcc-bugs




------- Comment #16 from dominiq at lps dot ens dot fr  2009-08-25 21:25 -------
After some discussion on IRC with Tobias Schlüter, it seems that the problem
comes from bad optimizations that are broken by chance with the original code.
Commenting out line 139:

         WRITE (6,*) i , spx(i) , epx(i) , NPX(i)

is enough to go from ~8.5s to ~10.2s, and this has nothing to do with
-fwhole-file or Tobias' patch.


-- 

dominiq at lps dot ens dot fr changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Time increase with inlining |Time increase for the
                   |for the Polyhedron test     |Polyhedron test air.f90 due
                   |air.f90                     |to bad optimization


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (15 preceding siblings ...)
  2009-08-25 21:25 ` [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization dominiq at lps dot ens dot fr
@ 2009-08-27 21:59 ` dominiq at lps dot ens dot fr
  2009-08-28  1:09 ` howarth at nitro dot med dot uc dot edu
                   ` (45 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-08-27 21:59 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #17 from dominiq at lps dot ens dot fr  2009-08-27 21:59 -------
Created an attachment (id=18439)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18439&action=view)
reduced test without any subroutine

I have attached a reduced test without any subroutine. It requires the same
input as air.f90, but do not expect meaningful results. As such I get:

[ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations air_db.f90
[ibook-dhum] lin/test% time a.out > /dev/null
4.306u 0.015s 0:04.32 99.7%     0+0k 0+1io 0pf+0w

If I comment out line 94

         WRITE (6,*) i , spx(i) , epx(i) , NPX(i)

I get

[ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations air_db.f90
[ibook-dhum] lin/test% time a.out > /dev/null
6.464u 0.020s 0:06.49 99.8%     0+0k 0+2io 0pf+0w

Among the oddities of this PR, if I also comment out line 502

      WRITE (7,*) MXPx , MXPy

I get

[ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations air_db.f90
[ibook-dhum] lin/test% time a.out > /dev/null
4.273u 0.014s 0:04.29 99.7%     0+0k 0+0io 0pf+0w



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (16 preceding siblings ...)
  2009-08-27 21:59 ` dominiq at lps dot ens dot fr
@ 2009-08-28  1:09 ` howarth at nitro dot med dot uc dot edu
  2009-08-28  5:39 ` dominiq at lps dot ens dot fr
                   ` (44 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: howarth at nitro dot med dot uc dot edu @ 2009-08-28  1:09 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #18 from howarth at nitro dot med dot uc dot edu  2009-08-28 01:09 -------
Why don't you go back to the original test case and see which component of
-funsafe-math-optimizations...

-fno-signed-zeros -fno-trapping-math -fassociative-math -freciprocal-math

is actually causing the problem.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (17 preceding siblings ...)
  2009-08-28  1:09 ` howarth at nitro dot med dot uc dot edu
@ 2009-08-28  5:39 ` dominiq at lps dot ens dot fr
  2009-08-28  7:19 ` dominiq at lps dot ens dot fr
                   ` (43 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-08-28  5:39 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #19 from dominiq at lps dot ens dot fr  2009-08-28 05:39 -------
> Why don't you go back to the original test case and see which component of
> -funsafe-math-optimizations...
>
> -fno-signed-zeros -fno-trapping-math -fassociative-math -freciprocal-math
>
> is actually causing the problem.

See http://gcc.gnu.org/ml/fortran/2009-08/msg00390.html :

I have dug the problem a little bit more and found that the key
option is -funsafe-math-optimizations. I tried to refine that, but as
usual this option is not the sum of -fassociative-math -fno-signed-zeros
-fno-trapping-math -freciprocal-math as said in the manual!-(


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (18 preceding siblings ...)
  2009-08-28  5:39 ` dominiq at lps dot ens dot fr
@ 2009-08-28  7:19 ` dominiq at lps dot ens dot fr
  2009-08-28 12:01 ` dominiq at lps dot ens dot fr
                   ` (42 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-08-28  7:19 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #20 from dominiq at lps dot ens dot fr  2009-08-28 07:19 -------
If it helps, for the reduced test with line 94 present I get:

[ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations
-fno-signed-zeros -fno-trapping-math -fno-associative-math -fno-reciprocal-math
air_db.f90
[ibook-dhum] lin/test% time a.out > /dev/null
4.555u 0.016s 0:04.57 99.7%       0+0k 0+2io 0pf+0w

without it

[ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations
-fno-signed-zeros -fno-trapping-math -fno-associative-math -fno-reciprocal-math
air_db.f90
[ibook-dhum] lin/test% time a.out > /dev/null
6.632u 0.020s 0:06.66 99.8%       0+0k 0+0io 0pf+0w


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (19 preceding siblings ...)
  2009-08-28  7:19 ` dominiq at lps dot ens dot fr
@ 2009-08-28 12:01 ` dominiq at lps dot ens dot fr
  2009-08-28 12:23 ` dominiq at lps dot ens dot fr
                   ` (41 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-08-28 12:01 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #21 from dominiq at lps dot ens dot fr  2009-08-28 12:01 -------
And finally the winner is -fstrict-overflow!

[ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations air_db.f90
[ibook-dhum] lin/test% time a.out > /dev/null
6.472u 0.020s 0:06.50 99.8%     0+0k 0+2io 0pf+0w               <=== bad

[ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations
-fno-strict-overflow air_db.f90
[ibook-dhum] lin/test% time a.out > /dev/null
4.307u 0.016s 0:04.33 99.5%     0+0k 0+0io 0pf+0w              <=== good

[ibook-dhum] lin/test% gfc -m64 -O1 -funsafe-math-optimizations air_db.f90
[ibook-dhum] lin/test% time a.out > /dev/null
4.347u 0.016s 0:04.37 99.5%     0+0k 0+1io 0pf+0w              <=== good

[ibook-dhum] lin/test% gfc -m64 -O1 -funsafe-math-optimizations
-fstrict-overflow air_db.f90
[ibook-dhum] lin/test% time a.out > /dev/null
5.962u 0.019s 0:05.99 99.6%     0+0k 0+2io 0pf+0w              <=== bad


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (20 preceding siblings ...)
  2009-08-28 12:01 ` dominiq at lps dot ens dot fr
@ 2009-08-28 12:23 ` dominiq at lps dot ens dot fr
  2009-08-28 13:36 ` howarth at nitro dot med dot uc dot edu
                   ` (40 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-08-28 12:23 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #22 from dominiq at lps dot ens dot fr  2009-08-28 12:23 -------
For the original air.f90 I get:

[ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations
-fno-strict-overflow air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
9.572u 0.055s 0:09.66 99.5%     0+0k 0+9io 1pf+0w
[ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
8.446u 0.046s 0:08.50 99.7%     0+0k 0+8io 0pf+0w

Commenting out the write in line 139, it becomes

[ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
10.083u 0.052s 0:10.15 99.8%    0+0k 0+7io 0pf+0w
[ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations
-fno-strict-overflow air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
9.531u 0.045s 0:09.58 99.8%     0+0k 0+7io 0pf+0w


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (21 preceding siblings ...)
  2009-08-28 12:23 ` dominiq at lps dot ens dot fr
@ 2009-08-28 13:36 ` howarth at nitro dot med dot uc dot edu
  2009-08-31 13:06 ` dominiq at lps dot ens dot fr
                   ` (39 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: howarth at nitro dot med dot uc dot edu @ 2009-08-28 13:36 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #23 from howarth at nitro dot med dot uc dot edu  2009-08-28 13:36 -------
(In reply to comment #20)
> If it helps, I get for the reduced test with the line 94:
> 
> [ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations
> -fno-signed-zeros -fno-trapping-math -fno-associative-math -fno-reciprocal-math
> air_db.f90
> [ibook-dhum] lin/test% time a.out > /dev/null
> 4.555u 0.016s 0:04.57 99.7%       0+0k 0+2io 0pf+0w
> 
> without it
> 
> [ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations
> -fno-signed-zeros -fno-trapping-math -fno-associative-math -fno-reciprocal-math
> air_db.f90
> [ibook-dhum] lin/test% time a.out > /dev/null
> 6.632u 0.020s 0:06.66 99.8%       0+0k 0+0io 0pf+0w
> 

Aren't these compile lines identical? Also, why are you passing
-funsafe-math-optimizations? I meant that you should use...

-fno-signed-zeros -fno-trapping-math -fassociative-math -freciprocal-math

instead and work through all of the possible combinations with the inverse
forms -fsigned-zeros, -ftrapping-math, -fno-associative-math and
-fno-reciprocal-math which is 16 combinations.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (22 preceding siblings ...)
  2009-08-28 13:36 ` howarth at nitro dot med dot uc dot edu
@ 2009-08-31 13:06 ` dominiq at lps dot ens dot fr
  2009-08-31 15:04 ` dominiq at lps dot ens dot fr
                   ` (38 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-08-31 13:06 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #24 from dominiq at lps dot ens dot fr  2009-08-31 13:06 -------
(In reply to comment #23)
> Aren't these compile lines identical?
Apparently not: -funsafe-math-optimizations turns on optimization(s) that cannot
be undone by

-fno-signed-zeros -fno-trapping-math -fno-associative-math -fno-reciprocal-math

> I meant that you should use...
>
> -fno-signed-zeros -fno-trapping-math -fassociative-math -freciprocal-math
>

with the write commented out:

[ibook-dhum] lin/test% gfc -m64 -O2 -fno-signed-zeros -fno-trapping-math
-fassociative-math -freciprocal-math air_db.f90
[ibook-dhum] lin/test% time a.out > /dev/null
6.194u 0.017s 0:06.21 99.8%     0+0k 0+1io 0pf+0w

with write:

[ibook-dhum] lin/test% gfc -m64 -O2 -fsigned-zeros -ftrapping-math
-fassociative-math -freciprocal-math air_db.f90
f951: warning: -fassociative-math disabled; other options take precedence
[ibook-dhum] lin/test% time a.out > /dev/null
6.306u 0.018s 0:06.33 99.6%     0+0k 0+2io 0pf+0w

> instead and work through all of the possible combinations with the inverse
> forms -fsigned-zeros, -ftrapping-math, -fno-associative-math and
> -fno-reciprocal-math which is 16 combinations.

I had no intention of trying the 16 combinations, as they are ineffective: the
key optimization is hidden behind -funsafe-math-optimizations, even with all the
explicit sub-options disabled.  As said in comment #21, the other key option is
-fstrict-overflow.

I know that all these facts do not make sense, but if you have doubts you can
redo the tests yourself.
As a side comment, it would be nice for debugging purposes if options that
combine sub-options did not carry hidden optimizations (yes, I know there is
a sentence about that in the manual).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (23 preceding siblings ...)
  2009-08-31 13:06 ` dominiq at lps dot ens dot fr
@ 2009-08-31 15:04 ` dominiq at lps dot ens dot fr
  2009-08-31 15:21 ` jv244 at cam dot ac dot uk
                   ` (37 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-08-31 15:04 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #25 from dominiq at lps dot ens dot fr  2009-08-31 15:04 -------
If I compare the results of -fdump-tree-original for the first 2 cases of
comment #21 I get:

[ibook-dhum] test/dbg_air% gfc -m64 -O2 -funsafe-math-optimizations
-fdump-tree-original air_db.f90
[ibook-dhum] test/dbg_air% mv air_db.f90.003t.original
air_db.f90.003t.original-no
[ibook-dhum] test/dbg_air% gfc -m64 -O2 -funsafe-math-optimizations
-fno-strict-overflow -fdump-tree-original air_db.f90
[ibook-dhum] test/dbg_air% diff -u air_db.f90.003t.original
air_db.f90.003t.original-no
--- air_db.f90.003t.original    2009-08-31 17:01:34.000000000 +0200
+++ air_db.f90.003t.original-no 2009-08-31 17:00:39.000000000 +0200
@@ -548,7 +548,7 @@
                                       logical(kind=4) D.1668;

                                       ict = (integer(kind=4)) (ict + 1);
-                                      if (npx[(integer(kind=8)) i + -1] + 1 > j)
+                                      if (NON_LVALUE_EXPR <npx[(integer(kind=8)) i + -1]> >= j)
                                         {
                                           ddx[((integer(kind=8)) ict + (integer(kind=8)) k * 150) + -151] = xp1[((integer(kind=8)) (ict + 1) + (integer(kind=8)) k * 150) + -151] - xp1[((integer(kind=8)) ict + (integer(kind=8)) k * 150) + -151];
                                         }
@@ -621,7 +621,7 @@
                                       logical(kind=4) D.1680;

                                       ict = (integer(kind=4)) (ict + 1);
-                                      if (npy[(integer(kind=8)) i + -1] + 1 > j)
+                                      if (NON_LVALUE_EXPR <npy[(integer(kind=8)) i + -1]> >= j)
                                         {
                                           ddy[((integer(kind=8)) k + (integer(kind=8)) ict * 150) + -151] = yp1[((integer(kind=8)) k + ((integer(kind=8)) ict + 1) * 150) + -151] - yp1[((integer(kind=8)) k + (integer(kind=8)) ict * 150) + -151];
                                         }

where NON_LVALUE_EXPR appears when the test is compiled without
-fno-strict-overflow.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (24 preceding siblings ...)
  2009-08-31 15:04 ` dominiq at lps dot ens dot fr
@ 2009-08-31 15:21 ` jv244 at cam dot ac dot uk
  2009-08-31 15:23 ` rguenther at suse dot de
                   ` (36 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: jv244 at cam dot ac dot uk @ 2009-08-31 15:21 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #26 from jv244 at cam dot ac dot uk  2009-08-31 15:20 -------
(In reply to comment #25)
> -                                      if (npx[(integer(kind=8)) i + -1] + 1 > j)
> +                                      if (NON_LVALUE_EXPR <npx[(integer(kind=8)) i + -1]> >= j)
>
> where NON_LVALUE_EXPR appears when the test is compiled without
> -fno-strict-overflow.

I wonder if this is a case where the optimizers would benefit from exploiting
the fact that in Fortran integers can never overflow in a valid program?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (25 preceding siblings ...)
  2009-08-31 15:21 ` jv244 at cam dot ac dot uk
@ 2009-08-31 15:23 ` rguenther at suse dot de
  2009-08-31 23:59 ` dominiq at lps dot ens dot fr
                   ` (35 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: rguenther at suse dot de @ 2009-08-31 15:23 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #27 from rguenther at suse dot de  2009-08-31 15:23 -------
Subject: Re:  Time increase for the Polyhedron test
 air.f90 due to bad optimization

On Mon, 31 Aug 2009, jv244 at cam dot ac dot uk wrote:

> ------- Comment #26 from jv244 at cam dot ac dot uk  2009-08-31 15:20 -------
> (In reply to comment #25)
> > -                                      if (npx[(integer(kind=8)) i + -1] + 1 > j)
> > +                                      if (NON_LVALUE_EXPR <npx[(integer(kind=8)) i + -1]> >= j)
> >
> > where NON_LVALUE_EXPR appears when the test is compiled without
> > -fno-strict-overflow.
> 
> I wonder if this is a case where the optimizers would benefit from exploiting
> the fact that in Fortran integers can never overflow in a valid program ?

In fact it does with -fstrict-overflow.

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (26 preceding siblings ...)
  2009-08-31 15:23 ` rguenther at suse dot de
@ 2009-08-31 23:59 ` dominiq at lps dot ens dot fr
  2009-09-01  9:37 ` dominiq at lps dot ens dot fr
                   ` (34 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-08-31 23:59 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #28 from dominiq at lps dot ens dot fr  2009-08-31 23:59 -------
Following Richard Guenther's suggestion on IRC, I have tested the following
patch:

--- ../_gcc_clean/gcc/builtins.c        2009-08-31 15:07:18.000000000 +0200
+++ gcc/builtins.c      2009-09-01 01:28:09.000000000 +0200
@@ -3012,7 +3012,7 @@
       real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
       if (real_identical (&c2, &cint)
          && ((flag_unsafe_math_optimizations
-              && optimize_insn_for_speed_p ()
+              /* && optimize_insn_for_speed_p () */
               && powi_cost (n/2) <= POWI_MAX_MULTS)
              || n == 1))
        {

With it I get:

[ibook-dhum] test/dbg_air% gfc -m64 -O2 -funsafe-math-optimizations air_db.f90
[ibook-dhum] test/dbg_air% time a.out > /dev/null
4.490u 0.018s 0:04.51 99.7%     0+0k 0+3io 0pf+0w

compared to

[ibook-dhum] test/dbg_air% gfc -m64 -O2 -funsafe-math-optimizations
-fno-strict-overflow air_db.f90
[ibook-dhum] test/dbg_air% time a.out > /dev/null
4.320u 0.015s 0:04.34 99.7%     0+0k 0+0io 0pf+0w

and there is no call to pow in the assembly. I think the difference is
significant; so it seems that optimize_insn_for_speed_p () is playing some role
elsewhere in the code. Note that if I replace lines 322 and 427

            mu = mu0*(T(i,j)/t02)**1.5*(t02+110.56)/(T(i,j)+110.56)

with

            mu = mu0*sqrt((T(i,j)/t02)**3)*(t02+110.56)/(T(i,j)+110.56)

or

            mu = mu0*sqrt((T(i,j)/t02))*(T(i,j)/t02)*(t02+110.56)/(T(i,j)+110.56)

there is no call to pow and the code is slightly faster with
-fno-strict-overflow

[ibook-dhum] test/dbg_air% gfc -m64 -O2 -funsafe-math-optimizations
-fno-strict-overflow air_db_1.f90
[ibook-dhum] test/dbg_air% time a.out > /dev/null
4.323u 0.015s 0:04.34 99.7%     0+0k 0+0io 0pf+0w
[ibook-dhum] test/dbg_air% gfc -m64 -O2 -funsafe-math-optimizations
air_db_1.f90
[ibook-dhum] test/dbg_air% time a.out > /dev/null
4.527u 0.016s 0:04.55 99.5%     0+0k 0+0io 0pf+0w

The original air.f90 compiled with -fwhole-file gives

[ibook-dhum] lin/test% gfc -m64 -O3 -ffast-math -funroll-loops
-ftree-loop-linear -fomit-frame-pointer -finline-limit=600 --param
min-vect-loop-bound=2 -fwhole-file air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
8.358u 0.049s 0:08.42 99.6%     0+0k 0+8io 0pf+0w

compared to

[ibook-dhum] lin/test% gfc -m64 -O3 -ffast-math -funroll-loops
-ftree-loop-linear -fomit-frame-pointer -finline-limit=600 --param
min-vect-loop-bound=2 air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
8.273u 0.046s 0:08.32 99.8%     0+0k 0+0io 0pf+0w


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (27 preceding siblings ...)
  2009-08-31 23:59 ` dominiq at lps dot ens dot fr
@ 2009-09-01  9:37 ` dominiq at lps dot ens dot fr
  2009-09-03  7:10 ` [Bug middle-end/40106] [4.4/4.5 Regression] " dominiq at lps dot ens dot fr
                   ` (33 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-09-01  9:37 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #29 from dominiq at lps dot ens dot fr  2009-09-01 09:37 -------
Does anyone understand why commenting out a write can change
crtl->maybe_hot_insn_p from 1 to 0?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] [4.4/4.5 Regression] Time increase for the Polyhedron test air.f90 due to bad optimization
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (28 preceding siblings ...)
  2009-09-01  9:37 ` dominiq at lps dot ens dot fr
@ 2009-09-03  7:10 ` dominiq at lps dot ens dot fr
  2009-09-03 11:20 ` dominiq at lps dot ens dot fr
                   ` (32 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-09-03  7:10 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #30 from dominiq at lps dot ens dot fr  2009-09-03 07:09 -------
This is a regression from gcc 4.3.4 (gfc=trunk r151295, gfc44=4.4.1,
gfc43=4.3.4):

[ibook-dhum] test/dbg_air% gfc -S -m64 -O2 -funsafe-math-optimizations
air_db.f90
[ibook-dhum] test/dbg_air% grep pow air_db.s
        call    _pow
        call    _pow
[ibook-dhum] test/dbg_air% gfc44 -S -m64 -O2 -funsafe-math-optimizations
air_db.f90
[ibook-dhum] test/dbg_air% grep pow air_db.s
        call    _pow
        call    _pow
[ibook-dhum] test/dbg_air% gfc43 -S -m64 -O2 -funsafe-math-optimizations
air_db.f90
[ibook-dhum] test/dbg_air% grep pow air_db.s
[ibook-dhum] test/dbg_air% 


-- 

dominiq at lps dot ens dot fr changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Time increase for the       |[4.4/4.5 Regression] Time
                   |Polyhedron test air.f90 due |increase for the Polyhedron
                   |to bad optimization         |test air.f90 due to bad
                   |                            |optimization


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] [4.4/4.5 Regression] Time increase for the Polyhedron test air.f90 due to bad optimization
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (29 preceding siblings ...)
  2009-09-03  7:10 ` [Bug middle-end/40106] [4.4/4.5 Regression] " dominiq at lps dot ens dot fr
@ 2009-09-03 11:20 ` dominiq at lps dot ens dot fr
  2009-09-06 22:15 ` rguenth at gcc dot gnu dot org
                   ` (31 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-09-03 11:20 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #31 from dominiq at lps dot ens dot fr  2009-09-03 11:20 -------
A further reduced, nonfunctional (invalid) test that shows the problem:

      IMPLICIT REAL*8(a-H,O-Z)
      PARAMETER (NX=150,NY=150)
      DIMENSION NPX(30), FV2(NX,NY), T(NX,NY), dtt(NX,NY)

      do it = 1, 2000

         DO i = 1 , MXPx
            DO j = 1 , MXPy
               FV2(i,j) = T(i,j)**1.5
            ENDDO
         ENDDO

         DO ix = 1 , NDX
            maxx = maxx + NPX(ix) + 1
            DO iy = 1 , NDY
               DO i = minx , maxx
                  DO j = miny , maxy
                     dtt(i,j) = dtd
                  ENDDO
               ENDDO
               miny = miny + NPX(iy) + 1
            ENDDO
         ENDDO

      end do

      WRITE (7,*) MXPx , MXPy
      END

[ibook-dhum] test/dbg_air% gfc -S -m64 -O2 -funsafe-math-optimizations
air_red.f90
[ibook-dhum] test/dbg_air% grep pow air_red.s
        call    _pow
[ibook-dhum] test/dbg_air% gfc -S -m64 -O2 -funsafe-math-optimizations
-fno-strict-overflow air_red.f90
[ibook-dhum] test/dbg_air% grep pow air_red.s
[ibook-dhum] test/dbg_air% 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] [4.4/4.5 Regression] Time increase for the Polyhedron test air.f90 due to bad optimization
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (30 preceding siblings ...)
  2009-09-03 11:20 ` dominiq at lps dot ens dot fr
@ 2009-09-06 22:15 ` rguenth at gcc dot gnu dot org
  2009-09-18  8:58 ` rguenth at gcc dot gnu dot org
                   ` (30 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-09-06 22:15 UTC (permalink / raw)
  To: gcc-bugs



-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.4.2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] [4.4/4.5 Regression] Time increase for the Polyhedron test air.f90 due to bad optimization
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (31 preceding siblings ...)
  2009-09-06 22:15 ` rguenth at gcc dot gnu dot org
@ 2009-09-18  8:58 ` rguenth at gcc dot gnu dot org
  2009-10-15 12:49 ` jakub at gcc dot gnu dot org
                   ` (29 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-09-18  8:58 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #32 from rguenth at gcc dot gnu dot org  2009-09-18 08:58 -------
Honza, this is yours.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |hubicka at gcc dot gnu dot
                   |dot org                     |org
           Priority|P3                          |P1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] [4.4/4.5 Regression] Time increase for the Polyhedron test air.f90 due to bad optimization
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (32 preceding siblings ...)
  2009-09-18  8:58 ` rguenth at gcc dot gnu dot org
@ 2009-10-15 12:49 ` jakub at gcc dot gnu dot org
  2009-10-18 13:22 ` rguenth at gcc dot gnu dot org
                   ` (28 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: jakub at gcc dot gnu dot org @ 2009-10-15 12:49 UTC (permalink / raw)
  To: gcc-bugs



-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.2                       |4.4.3


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] [4.4/4.5 Regression] Time increase for the Polyhedron test air.f90 due to bad optimization
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (33 preceding siblings ...)
  2009-10-15 12:49 ` jakub at gcc dot gnu dot org
@ 2009-10-18 13:22 ` rguenth at gcc dot gnu dot org
  2009-12-15 16:40 ` rguenth at gcc dot gnu dot org
                   ` (27 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-10-18 13:22 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #33 from rguenth at gcc dot gnu dot org  2009-10-18 13:22 -------
It looks like basic-block frequencies are completely off.  The BB in question
is

  # BLOCK 7 freq:3
  # PRED: 6 [100.0%]  (fallthru,exec) 7 [99.0%]  (false,exec)
  # ivtmp.65_38 = PHI <ivtmp.65_113(6), ivtmp.65_129(7)>
  # ivtmp.68_147 = PHI <ivtmp.68_151(6), ivtmp.68_148(7)>
  D.1360_26 = MEM[index: ivtmp.65_38];
  D.1404_30 = pow (D.1360_26, 1.5e+0);
  MEM[index: ivtmp.68_147] = D.1404_30;
  ivtmp.65_129 = ivtmp.65_38 + 1200;
  ivtmp.68_148 = ivtmp.68_147 + 1200;
  if (ivtmp.77_32 == ivtmp.65_129)
    goto <bb 8>;
  else
    goto <bb 7>;
  # SUCC: 8 [1.0%]  (true,exec) 7 [99.0%]  (false,exec)

And 3 is lower than 11, the minimum frequency at which a BB is considered not cold.

Predictions for bb 7
  DS theory heuristics (ignored): 0.1%
  first match heuristics: 1.0%
  combined heuristics: 1.0%
  opcode values nonequal (on trees) heuristics (ignored): 28.0%
  loop branch heuristics (ignored): 14.0%
  guessed loop iterations heuristics: 1.0%


but I see most blocks do not have a frequency at all and I also see

  # BLOCK 17 freq:10000
  # PRED: 16 [100.0%]  (fallthru,exec) 17 [99.0%]  (false,exec)
  # ivtmp.16_116 = PHI <ivtmp.16_125(16), ivtmp.16_115(17)>
  MEM[index: ivtmp.16_116] = dtd_56(D);
  ivtmp.16_115 = ivtmp.16_116 + 1200;
  if (ivtmp.27_12 == ivtmp.16_115)
    goto <bb 18>;
  else
    goto <bb 17>;
  # SUCC: 18 [1.0%]  (true,exec) 17 [99.0%]  (false,exec)

which is the block with the highest frequency (the innermost loop of the
2nd nest).

I can imagine that a lot of inlining, which exposes very deeply nested
loops alongside very hot but not-so-deep loops, can cause the latter to
become artificially cold.

Interestingly the outermost loop blocks do not have any frequency
assigned (that probably means zero).
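
The effect described above can be illustrated with a toy model. This is only a
sketch under stated assumptions, not GCC's profile estimator: it assumes that
each loop level multiplies the estimated execution count by about 100 (read off
the 99% back-edge probability in the dumps), that frequencies are rescaled so
the hottest block gets 10000, and that 11 is the "not cold" threshold quoted
above. All names in it are made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in GCC.  */
#define FREQ_MAX        10000LL /* frequency of the hottest block in the dump */
#define NOT_COLD_MIN    11LL    /* threshold quoted in the comment */
#define ITERS_PER_LEVEL 100LL   /* assumption: a 99% back edge ~ 100 iterations */

/* Estimated execution count of the innermost block of a nest of DEPTH loops.  */
static long long
raw_count (int depth)
{
  long long count = 1;
  assert (depth >= 0 && depth <= 6);   /* keeps the products within long long */
  for (int i = 0; i < depth; i++)
    count *= ITERS_PER_LEVEL;
  return count;
}

/* Frequency of that block once the deepest nest is scaled to FREQ_MAX.  */
static long long
scaled_frequency (int depth, int deepest)
{
  assert (depth <= deepest);
  return raw_count (depth) * FREQ_MAX / raw_count (deepest);
}

/* True if a block with this frequency falls below the threshold.  */
static bool
considered_cold (long long freq)
{
  return freq < NOT_COLD_MIN;
}
```

In this model a single loop (depth 1) sitting next to a nest of depth 3 gets
scaled_frequency (1, 3) == 1 and is considered cold, while the same loop next
to a nest of depth 2 gets 100 and is not.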


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2009-10-18 13:22:22
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] [4.4/4.5 Regression] Time increase for the Polyhedron test air.f90 due to bad optimization
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (34 preceding siblings ...)
  2009-10-18 13:22 ` rguenth at gcc dot gnu dot org
@ 2009-12-15 16:40 ` rguenth at gcc dot gnu dot org
  2010-01-21 13:16 ` jakub at gcc dot gnu dot org
                   ` (26 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2009-12-15 16:40 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #34 from rguenth at gcc dot gnu dot org  2009-12-15 16:40 -------
4.4 is also slow and we know what causes it, so this can't be P1.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P1                          |P2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] [4.4/4.5 Regression] Time increase for the Polyhedron test air.f90 due to bad optimization
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (35 preceding siblings ...)
  2009-12-15 16:40 ` rguenth at gcc dot gnu dot org
@ 2010-01-21 13:16 ` jakub at gcc dot gnu dot org
  2010-02-25 17:20 ` [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations dominiq at lps dot ens dot fr
                   ` (25 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: jakub at gcc dot gnu dot org @ 2010-01-21 13:16 UTC (permalink / raw)
  To: gcc-bugs



-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.3                       |4.4.4


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (36 preceding siblings ...)
  2010-01-21 13:16 ` jakub at gcc dot gnu dot org
@ 2010-02-25 17:20 ` dominiq at lps dot ens dot fr
  2010-03-16 15:07 ` dominiq at lps dot ens dot fr
                   ` (24 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-02-25 17:20 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #35 from dominiq at lps dot ens dot fr  2010-02-25 17:20 -------
I changed the summary to reflect the status of this pr (see comment #31). I
think the following questions should be answered:

(a) why is optimize_insn_for_speed_p changed by the options
-funsafe-math-optimizations and -fno-strict-overflow?

(b) why has !optimize_size (lines 2929 and 2953 of
http://gcc.gnu.org/viewcvs/branches/gcc-4_3-branch/gcc/builtins.c?revision=151052&view=markup&sortby=file
last 4.3 revision) been replaced with optimize_insn_for_speed_p () (lines
2961 and 2985 of
http://gcc.gnu.org/viewcvs/branches/gcc-4_4-branch/gcc/builtins.c?revision=145122&view=markup&sortby=file
 first 4.4 revision)?

Side question, is there anybody really convinced that replacing pow(a,b) with
powi(a,n) when b==n is not always a win even for -Os?

Note that the replacement for x**(n/3) * cbrt(x)**(n%3) does not seem to be
guarded by any optimisation flag.


-- 

dominiq at lps dot ens dot fr changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[4.4/4.5 Regression] Time   |[4.4/4.5 Regression] Weird
                   |increase for the Polyhedron |interaction between
                   |test air.f90 due to bad     |optimize_insn_for_speed_p
                   |optimization                |and -funsafe-math-
                   |                            |optimizations


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (37 preceding siblings ...)
  2010-02-25 17:20 ` [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations dominiq at lps dot ens dot fr
@ 2010-03-16 15:07 ` dominiq at lps dot ens dot fr
  2010-03-16 15:11 ` rguenther at suse dot de
                   ` (23 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-03-16 15:07 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #36 from dominiq at lps dot ens dot fr  2010-03-16 15:06 -------
> Note that the replacement for x**(n/3) * cbrt(x)**(n%3) does not seem to be
> guarded by any optimisation flag.

The condition is implemented further down in the code and I missed it:

      if (real_identical (&c2, &c)
          && ((optimize_insn_for_speed_p ()
               && powi_cost (n/3) <= POWI_MAX_MULTS)
              || n == 1))

Why is the condition optimize_insn_for_speed_p () not part of

  if (fn != NULL_TREE
      && flag_unsafe_math_optimizations
      && (tree_expr_nonnegative_p (arg0)
          || !HONOR_NANS (mode)))

?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (38 preceding siblings ...)
  2010-03-16 15:07 ` dominiq at lps dot ens dot fr
@ 2010-03-16 15:11 ` rguenther at suse dot de
  2010-03-16 15:26 ` rguenth at gcc dot gnu dot org
                   ` (22 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: rguenther at suse dot de @ 2010-03-16 15:11 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #37 from rguenther at suse dot de  2010-03-16 15:11 -------
Subject: Re:  [4.4/4.5 Regression] Weird interaction
 between optimize_insn_for_speed_p and -funsafe-math-optimizations

On Tue, 16 Mar 2010, dominiq at lps dot ens dot fr wrote:

> 
> 
> ------- Comment #36 from dominiq at lps dot ens dot fr  2010-03-16 15:06 -------
> > Note that the replacement for x**(n/3) * cbrt(x)**(n%3) does not seem to be
> > guarded by any optimisation flag.
> 
> The condition is implemented further down in the code and I missed it:
> 
>       if (real_identical (&c2, &c)
>           && ((optimize_insn_for_speed_p ()
>                && powi_cost (n/3) <= POWI_MAX_MULTS)
>               || n == 1))
> 
> Why is the condition optimize_insn_for_speed_p () not part of
> 
>   if (fn != NULL_TREE
>       && flag_unsafe_math_optimizations
>       && (tree_expr_nonnegative_p (arg0)
>           || !HONOR_NANS (mode)))
> 
> ?

Because we unconditionally want to turn pow (x, 1/3) to
cbrt (x) as it is smaller.

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (39 preceding siblings ...)
  2010-03-16 15:11 ` rguenther at suse dot de
@ 2010-03-16 15:26 ` rguenth at gcc dot gnu dot org
  2010-03-16 15:50 ` dominiq at lps dot ens dot fr
                   ` (21 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-03-16 15:26 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #38 from rguenth at gcc dot gnu dot org  2010-03-16 15:26 -------
Btw, the testcase has

  D.1610_34 = __builtin_pow (D.1564_28, 1.5e+0);

which would expand to

  D.1564_28 * sqrt (D.1564_28)

which is estimated as being larger than the call to pow.  Now this isn't
exactly true if the target has a sqrt insn, but we do not implement such a
sophisticated size check.

Especially on embedded targets with soft-float the multiplication would
add a significant code size penalty.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (40 preceding siblings ...)
  2010-03-16 15:26 ` rguenth at gcc dot gnu dot org
@ 2010-03-16 15:50 ` dominiq at lps dot ens dot fr
  2010-03-16 15:52 ` rguenther at suse dot de
                   ` (20 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-03-16 15:50 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #39 from dominiq at lps dot ens dot fr  2010-03-16 15:49 -------
> Especially on embedded targets with soft-float the multiplication would
> add a significant code size penalty.

Even in this case this would strongly of the code. It may be true if other
pieces require log and exp. If not I seriously doubt that replacing the code
for multiplies and square roots will be larger than the code for log and exp.

My (very limited) understanding of this issue is that at some point x*sqrt(x)
is replaced with pow(x,1.5) (so that pow(x,a)*pow(x,b) is optimized as
pow(x,a+b)). So even if the programmer writes x*sqrt(x) (s)he can end up with
pow(x,1.5), resulting in poor performance in terms of both speed and size (not
to mention accuracy).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (41 preceding siblings ...)
  2010-03-16 15:50 ` dominiq at lps dot ens dot fr
@ 2010-03-16 15:52 ` rguenther at suse dot de
  2010-03-16 16:04 ` dominiq at lps dot ens dot fr
                   ` (19 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: rguenther at suse dot de @ 2010-03-16 15:52 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #40 from rguenther at suse dot de  2010-03-16 15:52 -------
Subject: Re:  [4.4/4.5 Regression] Weird interaction
 between optimize_insn_for_speed_p and -funsafe-math-optimizations

On Tue, 16 Mar 2010, dominiq at lps dot ens dot fr wrote:

> 
> 
> ------- Comment #39 from dominiq at lps dot ens dot fr  2010-03-16 15:49 -------
> > Especially on embedded targets with soft-float the multiplication would
> > add a significant code size penalty.
> 
> Even in this case this would strongly of the code. It may be true if other
> pieces require log and exp. If not I seriously doubt that replacing the code
> for multiplies and square roots will be larger than the code for log and exp.

Parse error.

> My (very limited) understanding of this issue is that at some point x*sqrt(x)
> is replaced with pow(x,1.5) (so that pow(x,a)*pow(x,b) is optimized as
> pow(x,a+b)). So even if the programmer writes x*sqrt(x) (s)he can end up with
> pow(x,1.5), resulting in poor performance in terms of both speed and size (not
> to mention accuracy).

Yes, that's true.  This is what you'd expect when optimizing for size - 
turn x*sqrt(x) to pow(x,1.5).

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (42 preceding siblings ...)
  2010-03-16 15:52 ` rguenther at suse dot de
@ 2010-03-16 16:04 ` dominiq at lps dot ens dot fr
  2010-03-16 16:07 ` rguenther at suse dot de
                   ` (18 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-03-16 16:04 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #41 from dominiq at lps dot ens dot fr  2010-03-16 16:04 -------
> > > Especially on embedded targets with soft-float the multiplication would
> > > add a significant code size penalty.
> > 
> > Even in this case this would strongly of the code. It may be true if other
> > pieces require log and exp. If not I seriously doubt that replacing the code
> > for multiplies and square roots will be larger than the code for log and exp.
> 
> Parse error.

Sorry, is "strongly depend on the code" and  "If not, I seriously doubt that
replacing the code for multiplies and square roots will be larger than the code
for log and exp." better? 

pow(a,b) == exp(b*log(a)), so if 'a' is not a constant, you need the code for
log and exp to evaluate x*sqrt(x) as pow(x,1.5), instead of the code for
multiply and sqrt (note that I cannot see how the code for log and exp could
not require the code for multiply). If log or exp codes are not needed by other
parts of the whole program, x*sqrt(x) will almost certainly give more
compact code than pow(x,1.5).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (43 preceding siblings ...)
  2010-03-16 16:04 ` dominiq at lps dot ens dot fr
@ 2010-03-16 16:07 ` rguenther at suse dot de
  2010-03-16 16:39 ` dominiq at lps dot ens dot fr
                   ` (17 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: rguenther at suse dot de @ 2010-03-16 16:07 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #42 from rguenther at suse dot de  2010-03-16 16:07 -------
Subject: Re:  [4.4/4.5 Regression] Weird interaction
 between optimize_insn_for_speed_p and -funsafe-math-optimizations

On Tue, 16 Mar 2010, dominiq at lps dot ens dot fr wrote:

> 
> 
> ------- Comment #41 from dominiq at lps dot ens dot fr  2010-03-16 16:04 -------
> > > > Especially on embedded targets with soft-float the multiplication would
> > > > add a significant code size penalty.
> > > 
> > > Even in this case this would strongly of the code. It may be true if other
> > > pieces require log and exp. If not I seriously doubt that replacing the code
> > > for multiplies and square roots will be larger than the code for log and exp.
> > 
> > Parse error.
> 
> Sorry, is "strongly depend on the code" and  "If not, I seriously doubt that
> replacing the code for multiplies and square roots will be larger than the code
> for log and exp." better? 
> 
> pow(a,b) == exp(b*log(a)), so if 'a' is not a constant, you need the code for
> log and exp to evaluate x*sqrt(x) as pow(x,1.5), instead of the code for
> multiply and sqrt (note that I cannot see how the code for log and exp could
> not require the code for multiply). If log or exp codes are not needed by other
> parts of the whole program, x*sqrt(x) will almost certainly give more
> compact code than pow(x,1.5).

log, exp?  What code are you looking at now?

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (44 preceding siblings ...)
  2010-03-16 16:07 ` rguenther at suse dot de
@ 2010-03-16 16:39 ` dominiq at lps dot ens dot fr
  2010-03-16 16:59 ` jakub at gcc dot gnu dot org
                   ` (16 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-03-16 16:39 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #43 from dominiq at lps dot ens dot fr  2010-03-16 16:38 -------
> log, exp?  What code are you looking at now?

AFAIK every pow(a,b) boils down to exp(b*log(a)), unless special values of
'b' (n, n/2.0, n/3.0, ...) are handled in a different way.

So from what I know about coding, replacing pow(a,b) with multiplications,
sqrt, and cbrt is almost always a win for speed (as shown by this pr, although
you can probably write corner cases for which it may not be true). For size,
the matter is more complicated and may depend on whether exp and log are used
in other parts of the program (how compact they are, whether they are static or
not, ...), thus I doubt that replacing x*sqrt(x) with pow(x,1.5) is always a win
for size, even for embedded systems with soft-floats.

It seems to me that controlling the constant exponents through a maximum
integer (instead of POWI_MAX_MULTS) depending on the kind of optimization and
the target would be a better solution
than n==1||optimize_insn_for_speed_p ().


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (45 preceding siblings ...)
  2010-03-16 16:39 ` dominiq at lps dot ens dot fr
@ 2010-03-16 16:59 ` jakub at gcc dot gnu dot org
  2010-03-16 17:14 ` dominiq at lps dot ens dot fr
                   ` (15 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: jakub at gcc dot gnu dot org @ 2010-03-16 16:59 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #44 from jakub at gcc dot gnu dot org  2010-03-16 16:58 -------
-Os optimizes the current translation unit for size; it doesn't (nor easily
can) guess whether or not you are linking libm.a or libm.so and whether in the
former case using a call would be the only place that calls some routine (when
linking against a shared library of course this doesn't make any sense, you
always get it).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (46 preceding siblings ...)
  2010-03-16 16:59 ` jakub at gcc dot gnu dot org
@ 2010-03-16 17:14 ` dominiq at lps dot ens dot fr
  2010-03-18 18:30 ` dominiq at lps dot ens dot fr
                   ` (14 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-03-16 17:14 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #45 from dominiq at lps dot ens dot fr  2010-03-16 17:13 -------
> -Os optimizes the current translation unit for size; it doesn't (nor easily
> can) guess whether or not you are linking libm.a or libm.so and whether in the
> former case using a call would be the only place that calls some routine (when
> linking against a shared library of course this doesn't make any sense, you
> always get it).

Yes, indeed! However I am pretty sure that expanding pow(x,n) as an optimal
sequence of multiplies will always be a win (speed, size and accuracy) for at
least 8<n<15, i.e., a few multiplies on targets with hard-floats, and so on for
n/2.0 and n/3.0.

Now one of my concerns related to this pr is that I don't know how to use
generic optimization and at the same time keep x*sqrt(x) instead of pow(x,1.5)
if I know that for my target this is the right thing to do.
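
To make "a few multiplies" concrete, here is a minimal sketch of pow(x,n) for a
positive integer n using binary exponentiation, with a counter for the
multiplications. It is not GCC's expander: the helper name is hypothetical, and
the binary method is a stand-in that is not always optimal.

```c
#include <assert.h>

/* Hypothetical helper, not GCC code: computes x**n for n >= 1 by binary
   exponentiation and stores the number of multiplications in *MULTS.  */
static double
powi_by_squaring (double x, unsigned n, unsigned *mults)
{
  double result = 1.0;
  int have_result = 0;   /* avoids counting a multiply by the initial 1.0 */

  assert (n >= 1);
  *mults = 0;
  for (;;)
    {
      if (n & 1)
        {
          if (have_result)
            {
              result *= x;
              ++*mults;
            }
          else
            {
              result = x;
              have_result = 1;
            }
        }
      n >>= 1;
      if (n == 0)
        break;
      x *= x;            /* square for the next bit of the exponent */
      ++*mults;
    }
  return result;
}
```

With this method n = 10 takes 4 multiplies and n = 15 takes 6, whereas an
optimal addition chain for 15 (1, 2, 3, 6, 12, 15) needs only 5.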

Now I think it is important to answer my questions when and why in
http://gcc.gnu.org/ml/gcc/2010-03/msg00179.html .


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (47 preceding siblings ...)
  2010-03-16 17:14 ` dominiq at lps dot ens dot fr
@ 2010-03-18 18:30 ` dominiq at lps dot ens dot fr
  2010-03-19 10:26 ` rguenth at gcc dot gnu dot org
                   ` (13 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-03-18 18:30 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #46 from dominiq at lps dot ens dot fr  2010-03-18 18:29 -------
The answer to the question (b) in comment #35:

> (b) why !optimize_size has been replaced with optimize_insn_for_speed_p ()?

seems to be

> this patch replace some of optimize_size tests by
> optimize_insn_for_speed_p predicate so we can make decisions on per-BB
> granuality.

from http://gcc.gnu.org/ml/gcc-patches/2008-08/msg00121.html  (revision 138565
by hubicka, Sun Aug 3 12:04:49 2008 UTC).

Why is there any need to expand pow(x,n) "on per-BB granularity"? Is
!optimize_size not enough for this case?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (48 preceding siblings ...)
  2010-03-18 18:30 ` dominiq at lps dot ens dot fr
@ 2010-03-19 10:26 ` rguenth at gcc dot gnu dot org
  2010-03-19 10:35 ` rguenth at gcc dot gnu dot org
                   ` (12 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-03-19 10:26 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #47 from rguenth at gcc dot gnu dot org  2010-03-19 10:26 -------
(In reply to comment #46)
> The answer to the question (b) in comment #35:
> 
> > (b) why !optimize_size has been replaced with optimize_insn_for_speed_p ()?
> 
> seems to be
> 
> > this patch replace some of optimize_size tests by
> > optimize_insn_for_speed_p predicate so we can make decisions on per-BB
> > granuality.
> 
> from http://gcc.gnu.org/ml/gcc-patches/2008-08/msg00121.html  (revision 138565
> by hubicka, Sun Aug 3 12:04:49 2008 UTC).
> 
> Why is there any need to expand pow(x,n) "on per-BB granularity"? Is
> !optimize_size not enough for this case?

optimize_insn_for_speed_p is more precise in that it allows hot functions
to be optimized for speed even with -Os.  This is quite important for
embedded targets where you generally want to optimize for size but want
performance sensitive parts to be optimized for speed.

I think there are two good solutions to this PR.

 1) re-work how the profile is computed for deep loop nests

 2) improve the code-size estimate of these expanders (a simple convincing
 heuristic is that if the target has an optab for sqrt then x * sqrt (x)
 is not going to be larger than pow(x, 1.5)).

2) would fix the air case but not really the underlying problem which is
1).  2) would be easy to implement and appropriate for 4.5 - I can't see
how to address 1) with a reasonably sized patch.

Note that this PR isn't too serious as -fwhole-file isn't the default
for Fortran so we do not run into this unfortunate interaction of
profile estimation and inlining.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (49 preceding siblings ...)
  2010-03-19 10:26 ` rguenth at gcc dot gnu dot org
@ 2010-03-19 10:35 ` rguenth at gcc dot gnu dot org
  2010-03-19 15:40 ` dominiq at lps dot ens dot fr
                   ` (11 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-03-19 10:35 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #48 from rguenth at gcc dot gnu dot org  2010-03-19 10:35 -------
Untested patch doing 2):

Index: builtins.c
===================================================================
--- builtins.c  (revision 157561)
+++ builtins.c  (working copy)
@@ -2980,10 +2980,16 @@ expand_builtin_pow (tree exp, rtx target
          && ((flag_unsafe_math_optimizations
               && optimize_insn_for_speed_p ()
               && powi_cost (n/2) <= POWI_MAX_MULTS)
-             /* Even the c==0.5 case cannot be done unconditionally
+             /* Even the c == 0.5 case cannot be done unconditionally
                 when we need to preserve signed zeros, as
                 pow (-0, 0.5) is +0, while sqrt(-0) is -0.  */
-             || (!HONOR_SIGNED_ZEROS (mode) && n == 1)))
+             || (!HONOR_SIGNED_ZEROS (mode) && n == 1)
+             /* For c == 1.5 we can assume that x * sqrt (x) is always
+                smaller than pow (x, 1.5) if sqrt will not be expanded
+                as a call.  */
+             || (n == 2
+                 && (optab_handler (sqrt_optab, mode)->insn_code
+                     != CODE_FOR_nothing))))
        {
          tree call_expr = build_call_nofold (fn, 1, narg0);
          /* Use expand_expr in case the newly built call expression


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (50 preceding siblings ...)
  2010-03-19 10:35 ` rguenth at gcc dot gnu dot org
@ 2010-03-19 15:40 ` dominiq at lps dot ens dot fr
  2010-03-20 13:03 ` dominiq at lps dot ens dot fr
                   ` (10 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-03-19 15:40 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #49 from dominiq at lps dot ens dot fr  2010-03-19 15:40 -------
A few remarks about comments #47 and #48

> Note that this PR isn't too serious as -fwhole-file isn't the default
> for Fortran so we do not run into this unfortunate interaction of
> profile estimation and inlining.

The test in comment #31 shows that you don't need -fwhole-file nor inlining to
trigger this PR.

> +             || (n == 2

Isn't it n==3?

I have done some tests of replacing 1.5 in the test in comment #31 with some
other values (up to 15.5, but not in a systematic way). On
x86_64-apple-darwin10, the multiplications are always a win for size (based on
the size of a.out) over the call to pow. Is my metric flawed? If yes, what
should I use? If not, could embedded-system experts have a look at this kind of
optimization?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106



* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (51 preceding siblings ...)
  2010-03-19 15:40 ` dominiq at lps dot ens dot fr
@ 2010-03-20 13:03 ` dominiq at lps dot ens dot fr
  2010-03-20 13:21 ` dominiq at lps dot ens dot fr
                   ` (9 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-03-20 13:03 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #50 from dominiq at lps dot ens dot fr  2010-03-20 13:02 -------
> optimize_insn_for_speed_p is more precise in that it allows hot functions
> to be optimized for speed even with -Os.  This is quite important for
> embedded targets where you generally want to optimize for size but want
> performance sensitive parts to be optimized for speed.

If so, should not 

  return optimize_function_for_size_p (cfun) || !crtl->maybe_hot_insn_p;

be

  return optimize_function_for_size_p (cfun) && !crtl->maybe_hot_insn_p;

i.e., true only if optimize_function_for_size_p is true AND
crtl->maybe_hot_insn_p is false?

Along the same lines, should not

bool
optimize_function_for_size_p (struct function *fun)
{
  return (optimize_size
          || (fun && (fun->function_frequency
                      == FUNCTION_FREQUENCY_UNLIKELY_EXECUTED)));
}

be

bool
optimize_function_for_size_p (struct function *fun)
{
  return (optimize_size
          && (fun && (fun->function_frequency
                      == FUNCTION_FREQUENCY_UNLIKELY_EXECUTED)));
}

?
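
To make the first question concrete, here is a minimal sketch that isolates
the two boolean forms of optimize_insn_for_size_p. The function names and the
plain bool parameters are hypothetical stand-ins for
optimize_function_for_size_p (cfun) and crtl->maybe_hot_insn_p; nothing here
is GCC code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins: FUNCTION_FOR_SIZE plays the role of
   optimize_function_for_size_p (cfun), MAYBE_HOT_INSN the role of
   crtl->maybe_hot_insn_p.  */

/* The form currently in predict.c.  */
bool
insn_for_size_or (bool function_for_size, bool maybe_hot_insn)
{
  return function_for_size || !maybe_hot_insn;
}

/* The form asked about in this comment.  */
bool
insn_for_size_and (bool function_for_size, bool maybe_hot_insn)
{
  return function_for_size && !maybe_hot_insn;
}
```

The two forms agree when exactly one input is true and differ in the other
two cases: with ||, an insn that is not "maybe hot" is treated as a size
candidate even when the function is not optimized for size; with &&, a
"maybe hot" insn escapes size optimization even when the function is
optimized for size.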


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (52 preceding siblings ...)
  2010-03-20 13:03 ` dominiq at lps dot ens dot fr
@ 2010-03-20 13:21 ` dominiq at lps dot ens dot fr
  2010-03-20 14:19 ` rguenther at suse dot de
                   ` (8 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-03-20 13:21 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #51 from dominiq at lps dot ens dot fr  2010-03-20 13:21 -------
The following patch fixes this pr:

--- ../_clean/gcc/predict.c     2009-11-25 18:20:33.000000000 +0100
+++ gcc/predict.c       2010-03-20 14:03:33.000000000 +0100
@@ -251,7 +251,7 @@ optimize_edge_for_speed_p (edge e)
 bool
 optimize_insn_for_size_p (void)
 {
-  return optimize_function_for_size_p (cfun) || !crtl->maybe_hot_insn_p;
+  return optimize_function_for_size_p (cfun) && !crtl->maybe_hot_insn_p;
 }

 /* Return TRUE when BB should be optimized for speed.  */

If the optimize_*_p functions are intended to allow optimization for speed with
-Os for "hot" parts of the code, it seems that the logic of the implementation
should be checked carefully.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (53 preceding siblings ...)
  2010-03-20 13:21 ` dominiq at lps dot ens dot fr
@ 2010-03-20 14:19 ` rguenther at suse dot de
  2010-03-20 14:40 ` dominiq at lps dot ens dot fr
                   ` (7 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: rguenther at suse dot de @ 2010-03-20 14:19 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #52 from rguenther at suse dot de  2010-03-20 14:19 -------
Subject: Re:  [4.4/4.5 Regression] Weird interaction
 between optimize_insn_for_speed_p and -funsafe-math-optimizations

On Sat, 20 Mar 2010, dominiq at lps dot ens dot fr wrote:

> ------- Comment #51 from dominiq at lps dot ens dot fr  2010-03-20 13:21 -------
> The following patch fixes this pr:
> 
> --- ../_clean/gcc/predict.c     2009-11-25 18:20:33.000000000 +0100
> +++ gcc/predict.c       2010-03-20 14:03:33.000000000 +0100
> @@ -251,7 +251,7 @@ optimize_edge_for_speed_p (edge e)
>  bool
>  optimize_insn_for_size_p (void)
>  {
> -  return optimize_function_for_size_p (cfun) || !crtl->maybe_hot_insn_p;
> +  return optimize_function_for_size_p (cfun) && !crtl->maybe_hot_insn_p;
>  }
> 
>  /* Return TRUE when BB should be optimized for speed.  */
> 
> If the optimize_*_p procs are intended to allow optimization for speed with -Os
> and "hot" part of codes, it seems that the logic of the implementation should
> be checked carefully.

optimize_function_for_size_p (cfun) is true if attribute(cold) is set
on it or we are optimizing for size.

The only issue that exists with the predicates is that they are
implemented symmetrically (optimize_*_for_speed_p is 
!optimize_*_for_size_p) but the low-level implementations check
for extremes like FUNCTION_FREQUENCY_UNLIKELY_EXECUTED where
negation would be FUNCTION_FREQUENCY_HOT, not
FUNCTION_FREQUENCY_HOT || FUNCTION_FREQUENCY_NORMAL.

Thus, for example optimize_function_for_size_p would better read

  if (fun && fun->function_frequency == FUNCTION_FREQUENCY_UNLIKELY_EXECUTED)
    return true;
  else if (fun && fun->function_frequency == FUNCTION_FREQUENCY_HOT)
    return false;
  return optimize_size;

thus optimize_size should be the default that applies when the
(guessed) profile doesn't give a strong hint.

Likewise optimize_bb_for_size_p needs to disregard the case where
optimize_function_for_size_p returns optimize_size and only then
ask maybe_hot_bb_p.  Thus there should be low-level fns that
return a tri-state, true, false and "default".

But that's all too much change for 4.5.  You could possibly
play with adjusting just optimize_function_for_size_p as indicated
above.
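
One possible reading of the tri-state idea, as a sketch only: the enum names
and helper functions below are hypothetical, and the order in which the hints
are consulted (function first, then basic block, then optimize_size) is an
assumption drawn from the paragraph above rather than from any GCC source.

```c
#include <stdbool.h>

/* Hypothetical tri-state answer of a low-level profile query.  */
enum size_hint { SIZE_HINT_NO, SIZE_HINT_YES, SIZE_HINT_DEFAULT };

/* Hypothetical frequency classes for a function or a basic block.  */
enum freq_class { FREQ_COLD, FREQ_NORMAL, FREQ_HOT };

/* Low-level query: report only what the (guessed) profile says, and
   say "default" when it gives no strong hint.  */
enum size_hint
size_hint_from_profile (enum freq_class c)
{
  switch (c)
    {
    case FREQ_COLD:
      return SIZE_HINT_YES;
    case FREQ_HOT:
      return SIZE_HINT_NO;
    default:
      return SIZE_HINT_DEFAULT;
    }
}

/* Block-level predicate: a strong function-level hint decides;
   otherwise the block's own hint decides; only when both are
   "default" does optimize_size apply.  */
bool
bb_for_size (bool optimize_size, enum freq_class fn, enum freq_class bb)
{
  enum size_hint h = size_hint_from_profile (fn);
  if (h == SIZE_HINT_DEFAULT)
    h = size_hint_from_profile (bb);
  if (h == SIZE_HINT_DEFAULT)
    return optimize_size;
  return h == SIZE_HINT_YES;
}
```

With this shape, a hot block in a normal function is not optimized for size
even under -Os, and a block with no strong hint simply follows the
command-line setting.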

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (54 preceding siblings ...)
  2010-03-20 14:19 ` rguenther at suse dot de
@ 2010-03-20 14:40 ` dominiq at lps dot ens dot fr
  2010-03-20 15:00 ` rguenth at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-03-20 14:40 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #53 from dominiq at lps dot ens dot fr  2010-03-20 14:40 -------
> optimize_function_for_size_p (cfun) is true if attribute(cold) is set
> on it or we are optimizing for size.

That is what is presently implemented. As a consequence (illustrated by this PR),
a request to optimize for speed is not obeyed if attribute(cold) is set. I don't
see the point of that: if I ask for optimization for speed, I just want it.

From comment #47, I got the impression that the intended behavior is the
following:
if optimizing for size is on (-Os), it is overridden if the block is marked
as "hot" (it is not clear to me that this is the same as !attribute(cold)). From
this impression, the truth table I expect for optimize_function_for_size_p is
the following:

"hot"      0    1
-Os        1    0
-O[1-3]    0    0

and not

"cold"     0    1
-Os        1    1
-O[1-3]    0    1

as presently implemented.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (55 preceding siblings ...)
  2010-03-20 14:40 ` dominiq at lps dot ens dot fr
@ 2010-03-20 15:00 ` rguenth at gcc dot gnu dot org
  2010-03-20 15:12 ` rguenth at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-03-20 15:00 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #54 from rguenth at gcc dot gnu dot org  2010-03-20 14:59 -------
(In reply to comment #53)
> > optimize_function_for_size_p (cfun) is true if attribute(cold) is set
> > on it or we are optimizing for size.
> 
> It is what is presently implemented. As a consequence (illustrated by this pr),
> optimize for speed is not obeyed if attribute(cold) is set on. I don't see the
> interest of that: If I want optimization for speed, I just want it.
> 
> From comment #47, I got the impression that the intended behavior is the
> following:
> if optimized for size is on (-Os) then it is overridden if the block is marked
> as "hot" (it is not clear for me that it is !attribute(cold)). From this
> impression the truth table I expect is the following for
> optimize_function_for_size_p:
> 
> "hot"        0      1
> -Os         1      0
> -O[1-3]  0     0
> 
> and not
> 
> "cold"       0    1
> -Os          1    1
> -O[1-3]  0    1
> 
> as presently implemented.

The intent is

           "hot" "cold" nothing
  -Os        0      1     1
  -O[1-3]    0      1     0

implemented is as far as I see

           "hot" "cold" nothing
  -Os        1      1     1
  -O[1-3]    0      1     0

thus optimize_function_for_{size,speed}_p are fully correct for -O[1-3].
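
The two tables can be written down as a pair of small functions. This is a
sketch under stated assumptions: the enum and both function names are
hypothetical, the "implemented" version encodes only what the table above
says (optimize_size or cold), and the "intended" version encodes the first
table.

```c
#include <stdbool.h>

/* Hypothetical three-way hint matching the "hot", "cold" and
   "nothing" columns of the tables.  */
enum profile_hint { HINT_HOT, HINT_COLD, HINT_NONE };

/* The "implemented" table: true when optimizing for size or when
   the function is cold.  */
bool
function_for_size_implemented (bool optimize_size, enum profile_hint hint)
{
  return optimize_size || hint == HINT_COLD;
}

/* The "intent" table: a strong hint decides, otherwise fall back
   to optimize_size.  */
bool
function_for_size_intended (bool optimize_size, enum profile_hint hint)
{
  if (hint == HINT_COLD)
    return true;
  if (hint == HINT_HOT)
    return false;
  return optimize_size;
}
```

The only cell where the two disagree is -Os combined with "hot"; with
optimize_size false they return the same value for every hint.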


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (56 preceding siblings ...)
  2010-03-20 15:00 ` rguenth at gcc dot gnu dot org
@ 2010-03-20 15:12 ` rguenth at gcc dot gnu dot org
  2010-03-22 10:36 ` rguenth at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-03-20 15:12 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #55 from rguenth at gcc dot gnu dot org  2010-03-20 15:12 -------
(In reply to comment #54)
> (In reply to comment #53)
> > > optimize_function_for_size_p (cfun) is true if attribute(cold) is set
> > > on it or we are optimizing for size.
> > 
> > It is what is presently implemented. As a consequence (illustrated by this pr),
> > optimize for speed is not obeyed if attribute(cold) is set on. I don't see the
> > interest of that: If I want optimization for speed, I just want it.
> > 
> > From comment #47, I got the impression that the intended behavior is the
> > following:
> > if optimized for size is on (-Os) then it is overridden if the block is marked
> > as "hot" (it is not clear for me that it is !attribute(cold)). From this
> > impression the truth table I expect is the following for
> > optimize_function_for_size_p:
> > 
> > "hot"        0      1
> > -Os         1      0
> > -O[1-3]  0     0
> > 
> > and not
> > 
> > "cold"       0    1
> > -Os          1    1
> > -O[1-3]  0    1
> > 
> > as presently implemented.
> 
> The intent is
> 
>            "hot" "cold" nothing
>   -Os        0      1     1
>   -O[1-3]    0      1     0
> 
> implemented is as far as I see
> 
>            "hot" "cold" nothing
>   -Os        1      1     1
>   -O[1-3]    0      1     0
> 
> thus optimize_function_for_{size,speed}_p fully correct for -O[1-3].

The issue is the || !crtl->maybe_hot_insn_p in optimize_insn_for_size_p
which boils down to !maybe_hot_frequency_p (bb->freq) which has at the
end

  if (freq < BB_FREQ_MAX / PARAM_VALUE (HOT_BB_FREQUENCY_FRACTION))
    return false;
  return true;

thus it really only tells whether a frequency is hot or not; its negation
doesn't automatically mean the frequency is cold.

Thus, maybe_hot_bb_p should properly honor [!]optimize_size for the
default case where a bb is neither hot nor cold.

In the end this won't save us from the underlying issue in this PR
where frequency scaling makes blocks appear as cold when they are not,
simply due to the loop depth predictors (they should maybe be limited
to a loop depth of 3 or so).  And this is really Honza's area of
expertise (well, at least it's all his code).
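
A rough numeric illustration of the scaling problem, as a sketch only. The
two constants are assumptions stated from memory (BB_FREQ_MAX as 10000 and a
default of 1000 for the hot-bb-frequency-fraction parameter) and should be
checked against the tree in use. The factor of 10 per loop level is purely
illustrative and is not GCC's actual predictor; the helper names are
hypothetical.

```c
#include <stdbool.h>

/* Assumed values, see the note above; verify against your tree.  */
#define BB_FREQ_MAX 10000
#define HOT_BB_FREQUENCY_FRACTION 1000

/* Illustrative scaling factor per loop level, not GCC's predictor.  */
#define ITERATIONS_PER_LOOP 10

/* Mirrors the tail of maybe_hot_frequency_p quoted above.  */
bool
maybe_hot_frequency (int freq)
{
  if (freq < BB_FREQ_MAX / HOT_BB_FREQUENCY_FRACTION)
    return false;
  return true;
}

/* Hypothetical model: the innermost block of a loop nest gets
   BB_FREQ_MAX, and a block that sits outside ENCLOSED_DEPTH of those
   loops is scaled down once per level.  */
int
estimated_freq_outside_loops (int enclosed_depth)
{
  int freq = BB_FREQ_MAX;
  for (int i = 0; i < enclosed_depth; i++)
    freq /= ITERATIONS_PER_LOOP;
  return freq;
}
```

Under these assumptions the threshold is 10, so a block outside three loop
levels (estimated frequency 10) still counts as "maybe hot", while a block
outside four levels (estimated frequency 1) does not, and
|| !crtl->maybe_hot_insn_p then treats it as a size candidate even though
nothing says it is cold.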


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (57 preceding siblings ...)
  2010-03-20 15:12 ` rguenth at gcc dot gnu dot org
@ 2010-03-22 10:36 ` rguenth at gcc dot gnu dot org
  2010-03-22 12:38 ` rguenth at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-03-22 10:36 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #56 from rguenth at gcc dot gnu dot org  2010-03-22 10:36 -------
I'm testing a fixed version of comment #48.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|hubicka at gcc dot gnu dot  |rguenth at gcc dot gnu dot
                   |org                         |org
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2009-10-18 13:22:22         |2010-03-22 10:36:35
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (58 preceding siblings ...)
  2010-03-22 10:36 ` rguenth at gcc dot gnu dot org
@ 2010-03-22 12:38 ` rguenth at gcc dot gnu dot org
  2010-03-22 12:39 ` [Bug middle-end/40106] [4.4 " rguenth at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  62 siblings, 0 replies; 64+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-03-22 12:38 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #57 from rguenth at gcc dot gnu dot org  2010-03-22 12:38 -------
Subject: Bug 40106

Author: rguenth
Date: Mon Mar 22 12:38:02 2010
New Revision: 157623

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=157623
Log:
2010-03-22  Richard Guenther  <rguenther@suse.de>

        PR middle-end/40106
        * builtins.c (expand_builtin_pow): Expand pow (x, 1.5) as
        x * sqrt (x) even when optimizing for size if the target
        has native support for sqrt.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/builtins.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [Bug middle-end/40106] [4.4 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (59 preceding siblings ...)
  2010-03-22 12:38 ` rguenth at gcc dot gnu dot org
@ 2010-03-22 12:39 ` rguenth at gcc dot gnu dot org
  2010-03-25 17:38 ` hubicka at gcc dot gnu dot org
  2010-04-30  9:01 ` jakub at gcc dot gnu dot org
  62 siblings, 0 replies; 64+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-03-22 12:39 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #58 from rguenth at gcc dot gnu dot org  2010-03-22 12:39 -------
Fixed for 4.5.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|rguenth at gcc dot gnu dot  |unassigned at gcc dot gnu
                   |org                         |dot org
             Status|ASSIGNED                    |NEW
      Known to work|                            |4.5.0
            Summary|[4.4/4.5 Regression] Weird  |[4.4 Regression] Weird
                   |interaction between         |interaction between
                   |optimize_insn_for_speed_p   |optimize_insn_for_speed_p
                   |and -funsafe-math-          |and -funsafe-math-
                   |optimizations               |optimizations


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [Bug middle-end/40106] [4.4 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (60 preceding siblings ...)
  2010-03-22 12:39 ` [Bug middle-end/40106] [4.4 " rguenth at gcc dot gnu dot org
@ 2010-03-25 17:38 ` hubicka at gcc dot gnu dot org
  2010-04-30  9:01 ` jakub at gcc dot gnu dot org
  62 siblings, 0 replies; 64+ messages in thread
From: hubicka at gcc dot gnu dot org @ 2010-03-25 17:38 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #59 from hubicka at gcc dot gnu dot org  2010-03-25 17:37 -------
Hi,
concerning the optimize_*_for_size and maybe_hot_*_p predicates, the idea is
that maybe_hot/probably_cold care about the profile alone.  So when optimizing
for size, parts of the program can still be considered hot, and this can be used
by optimizers if doing so does not increase code size (i.e. one can trade a copy
in a hot block for a copy in a cold block even at -Os).

optimize_*_for_size should be aware of the defaults: with -Os everything is by
default optimized for size unless the user asks otherwise, and with other levels
only probably cold stuff (that is, the negation of maybe_hot) is optimized for
size.

Let me check if there are some problems, but I guess this is just a problem with
too many nested loops leading to too large frequency differences.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [Bug middle-end/40106] [4.4 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
  2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
                   ` (61 preceding siblings ...)
  2010-03-25 17:38 ` hubicka at gcc dot gnu dot org
@ 2010-04-30  9:01 ` jakub at gcc dot gnu dot org
  62 siblings, 0 replies; 64+ messages in thread
From: jakub at gcc dot gnu dot org @ 2010-04-30  9:01 UTC (permalink / raw)
  To: gcc-bugs



-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.4                       |4.4.5


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106


^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, other threads:[~2010-04-30  8:55 UTC | newest]

Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
2009-05-12 11:52 ` [Bug middle-end/40106] " hubicka at gcc dot gnu dot org
2009-05-12 13:23 ` dominiq at lps dot ens dot fr
2009-05-12 14:47 ` rguenther at suse dot de
2009-05-12 16:18 ` dominiq at lps dot ens dot fr
2009-05-22 20:39 ` dominiq at lps dot ens dot fr
2009-05-22 20:41 ` dominiq at lps dot ens dot fr
2009-05-22 20:52 ` dominiq at lps dot ens dot fr
2009-07-13 15:29 ` burnus at gcc dot gnu dot org
2009-08-25 11:56 ` dominiq at lps dot ens dot fr
2009-08-25 12:01 ` [Bug middle-end/40106] Time increase " dominiq at lps dot ens dot fr
2009-08-25 12:22 ` [Bug middle-end/40106] Time increase with inlining " rguenth at gcc dot gnu dot org
2009-08-25 12:30 ` dominiq at lps dot ens dot fr
2009-08-25 12:40 ` rguenther at suse dot de
2009-08-25 12:51 ` dominiq at lps dot ens dot fr
2009-08-25 15:31 ` dominiq at lps dot ens dot fr
2009-08-25 21:25 ` [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization dominiq at lps dot ens dot fr
2009-08-27 21:59 ` dominiq at lps dot ens dot fr
2009-08-28  1:09 ` howarth at nitro dot med dot uc dot edu
2009-08-28  5:39 ` dominiq at lps dot ens dot fr
2009-08-28  7:19 ` dominiq at lps dot ens dot fr
2009-08-28 12:01 ` dominiq at lps dot ens dot fr
2009-08-28 12:23 ` dominiq at lps dot ens dot fr
2009-08-28 13:36 ` howarth at nitro dot med dot uc dot edu
2009-08-31 13:06 ` dominiq at lps dot ens dot fr
2009-08-31 15:04 ` dominiq at lps dot ens dot fr
2009-08-31 15:21 ` jv244 at cam dot ac dot uk
2009-08-31 15:23 ` rguenther at suse dot de
2009-08-31 23:59 ` dominiq at lps dot ens dot fr
2009-09-01  9:37 ` dominiq at lps dot ens dot fr
2009-09-03  7:10 ` [Bug middle-end/40106] [4.4/4.5 Regression] " dominiq at lps dot ens dot fr
2009-09-03 11:20 ` dominiq at lps dot ens dot fr
2009-09-06 22:15 ` rguenth at gcc dot gnu dot org
2009-09-18  8:58 ` rguenth at gcc dot gnu dot org
2009-10-15 12:49 ` jakub at gcc dot gnu dot org
2009-10-18 13:22 ` rguenth at gcc dot gnu dot org
2009-12-15 16:40 ` rguenth at gcc dot gnu dot org
2010-01-21 13:16 ` jakub at gcc dot gnu dot org
2010-02-25 17:20 ` [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations dominiq at lps dot ens dot fr
2010-03-16 15:07 ` dominiq at lps dot ens dot fr
2010-03-16 15:11 ` rguenther at suse dot de
2010-03-16 15:26 ` rguenth at gcc dot gnu dot org
2010-03-16 15:50 ` dominiq at lps dot ens dot fr
2010-03-16 15:52 ` rguenther at suse dot de
2010-03-16 16:04 ` dominiq at lps dot ens dot fr
2010-03-16 16:07 ` rguenther at suse dot de
2010-03-16 16:39 ` dominiq at lps dot ens dot fr
2010-03-16 16:59 ` jakub at gcc dot gnu dot org
2010-03-16 17:14 ` dominiq at lps dot ens dot fr
2010-03-18 18:30 ` dominiq at lps dot ens dot fr
2010-03-19 10:26 ` rguenth at gcc dot gnu dot org
2010-03-19 10:35 ` rguenth at gcc dot gnu dot org
2010-03-19 15:40 ` dominiq at lps dot ens dot fr
2010-03-20 13:03 ` dominiq at lps dot ens dot fr
2010-03-20 13:21 ` dominiq at lps dot ens dot fr
2010-03-20 14:19 ` rguenther at suse dot de
2010-03-20 14:40 ` dominiq at lps dot ens dot fr
2010-03-20 15:00 ` rguenth at gcc dot gnu dot org
2010-03-20 15:12 ` rguenth at gcc dot gnu dot org
2010-03-22 10:36 ` rguenth at gcc dot gnu dot org
2010-03-22 12:38 ` rguenth at gcc dot gnu dot org
2010-03-22 12:39 ` [Bug middle-end/40106] [4.4 " rguenth at gcc dot gnu dot org
2010-03-25 17:38 ` hubicka at gcc dot gnu dot org
2010-04-30  9:01 ` jakub at gcc dot gnu dot org
