public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90
From: dominiq at lps dot ens dot fr @ 2009-05-11 18:04 UTC (permalink / raw)
To: gcc-bugs
The air.f90 test of the Polyhedron test suite takes ~15% longer to run when
compiled with -fwhole-file than without the option. I have checked that the
subroutines DERIV(X|Y) are inlined with -finline-limit=100, but not with
-finline-limit=50 (for the latter I recover the timing without -fwhole-file).
What I have found very odd is that if I manually inline only a single call (see
below) I get the same timing as with all of them (2*14) inlined. This is the
case for trunk and gfortran 4.4.0, but not for 4.3.3 which gives a slower
executable.
I have inlined
SUBROUTINE DERIVX(D,U,Ux,Al,Np,Nd,M)
  IMPLICIT REAL*8(A-H,O-Z)
  PARAMETER (NX=150,NY=150)
  DIMENSION D(NX,33) , U(NX,NY) , Ux(NX,NY) , Al(30) , Np(30)
  DO jm = 1 , M
    jmax = 0
    jmin = 1
    DO i = 1 , Nd
      jmax = jmax + Np(i) + 1
      DO j = jmin , jmax
        uxt = 0.
        DO k = 0 , Np(i)
          uxt = uxt + D(j,k+1)*U(jmin+k,jm)
        ENDDO
        Ux(j,jm) = uxt*Al(i)
      ENDDO
!
      jmin = jmin + Np(i) + 1
    ENDDO
  ENDDO
  CONTINUE
END
at line 793 as
! CALL DERIVX(DX,f4,f4x,ALX,NPX,NDX,MXPy)
DO jm = 1 , MXPy
  jmax = 0
  jmin = 1
  DO i = 1 , NDX
    jmax = jmax + NPX(i) + 1
    DO j = jmin , jmax
      uxt = 0.
      DO k = 0 , NPX(i)
        uxt = uxt + DX(j,k+1)*f4(jmin+k,jm)
      ENDDO
      f4x(j,jm) = uxt*ALX(i)
    ENDDO
    jmin = jmin + NPX(i) + 1
  ENDDO
ENDDO
--
Summary: Time increase with inlining for the Polyhedron test
air.f90
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dominiq at lps dot ens dot fr
GCC build triplet: i686-apple-darwin9
GCC host triplet: i686-apple-darwin9
GCC target triplet: i686-apple-darwin9
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
From: hubicka at gcc dot gnu dot org @ 2009-05-12 11:52 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from hubicka at gcc dot gnu dot org 2009-05-12 11:52 -------
Hmm, the inlined function has a loop depth of 4, which makes it predicted to
iterate quite a few times. My guess would be that inlining increases the loop
depth, which in turn makes GCC conclude that one of the loops that is in fact
an inner hot loop is cold. Decreasing --param hot-bb-frequency-fraction might
help in this case.
I've seen this in the past, I just hope it is quite rare.
If we find enough testcases like this, it might make sense to alter the
predicate deciding on hot-bb to always consider innermost loops hot regardless
of their relative frequency. That would need a flag on the BB or loop structure
to be always available, though.
Honza
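As a rough illustration of the effect described above, here is a toy model in Python. It is only a sketch and not GCC's code: the guessed trip count of 10 per loop level, the cutoff of 1/1000 and the names `estimated_frequency` and `looks_cold` are all assumptions made for this example.

```python
# Toy model of static block-frequency estimation. NOT GCC's implementation.
ASSUMED_TRIPS_PER_LOOP = 10   # assumption: guessed iterations per loop level
ASSUMED_COLD_CUTOFF = 1000    # assumption: cold if < 1/1000 of the hottest block


def estimated_frequency(loop_depth):
    """Relative execution frequency of a block nested loop_depth loops deep."""
    return ASSUMED_TRIPS_PER_LOOP ** loop_depth


def looks_cold(block_depth, deepest_depth):
    """True if the block falls below the cutoff relative to the deepest nest."""
    hottest = estimated_frequency(deepest_depth)
    return estimated_frequency(block_depth) * ASSUMED_COLD_CUTOFF < hottest


# The same depth-2 loop, in a caller whose deepest nest grows as loop nests
# are inlined into it.
for deepest in (2, 4, 6):
    print(deepest, looks_cold(2, deepest))
```

In this model the depth-2 loop itself never changes; only the depth of the deepest nest in the function does, and that alone is enough to push the loop under the cutoff.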
--
* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
From: dominiq at lps dot ens dot fr @ 2009-05-12 13:23 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from dominiq at lps dot ens dot fr 2009-05-12 13:23 -------
> decreasing --param hot-bb-frequency-fraction might help in this case.
I have tried --param hot-bb-frequency-fraction=1 (which seems to be the smallest
possible value, see pr40119), but it did not change anything.
What I find very surprising is that the ~15% slow-down appears as soon as one
call is inlined, with no further slow-down as more calls are inlined (I have
tested 4, and -fwhole-file inlines 28 of them). If the block were misoptimized I
would expect the slow-down to increase with the number of inlined calls. Could
the problem be related to cache management instead (L1, since L2 is 4MB on my
core2Duo)?
--
* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
From: rguenther at suse dot de @ 2009-05-12 14:47 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from rguenther at suse dot de 2009-05-12 14:47 -------
Subject: Re: Time increase with inlining for the
Polyhedron test air.f90
On Tue, 12 May 2009, dominiq at lps dot ens dot fr wrote:
> ------- Comment #2 from dominiq at lps dot ens dot fr 2009-05-12 13:23 -------
> > decreasing --param hot-bb-frequency-fraction might help in this case.
>
> I have tried --param hot-bb-frequency-fraction=1 (which seems the smallest
> possible value, see pr40119), but it did not changed anything.
>
> What I find very surprising is that the ~15% slow-down appears as soon as one
> call is inlined, but without further slow-down with more inlining (I have
> tested 4 and -fwhole-file inline 28 of them). If the block was misoptimized I
> would expect a slow-down increasing with the number of inlined calls. Could the
> problem be related to cache management instead (L1, since L2 is 4Mb on my
> core2Duo)?
You may be hitting some analysis limits, either for maximum loop depth
or something similar. There is no other way than to analyze how the
optimizations produced differ.
Richard.
--
* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
From: dominiq at lps dot ens dot fr @ 2009-05-12 16:18 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from dominiq at lps dot ens dot fr 2009-05-12 16:18 -------
Assembly code for the inlined inner loop:
L123:
movsd (%rdx), %xmm15
movsd 8(%rdx), %xmm6
mulsd (%rax), %xmm15
mulsd 1200(%rax), %xmm6
movsd 16(%rdx), %xmm4
movsd 24(%rdx), %xmm3
mulsd 2400(%rax), %xmm4
mulsd 3600(%rax), %xmm3
addsd %xmm15, %xmm0
movsd 32(%rdx), %xmm9
movsd 40(%rdx), %xmm1
mulsd 4800(%rax), %xmm9
mulsd 6000(%rax), %xmm1
addsd %xmm6, %xmm0
movsd 48(%rdx), %xmm7
movsd 56(%rdx), %xmm2
addq $64, %rdx
mulsd 7200(%rax), %xmm7
mulsd 8400(%rax), %xmm2
addq $9600, %rax
addsd %xmm4, %xmm0
cmpq %rax, %rcx
addsd %xmm3, %xmm0
addsd %xmm9, %xmm0
addsd %xmm1, %xmm0
addsd %xmm7, %xmm0
addsd %xmm2, %xmm0
jne L123
and in the subroutine DERIVX:
L953:
movsd (%rax), %xmm9
addl $8, %ebx
movsd 8(%rax), %xmm8
mulsd (%rcx), %xmm9
mulsd 1200(%rcx), %xmm8
movsd 16(%rax), %xmm7
movsd 24(%rax), %xmm6
mulsd 2400(%rcx), %xmm7
mulsd 3600(%rcx), %xmm6
addsd %xmm9, %xmm0
movsd 32(%rax), %xmm5
movsd 40(%rax), %xmm4
mulsd 4800(%rcx), %xmm5
mulsd 6000(%rcx), %xmm4
addsd %xmm8, %xmm0
movsd 48(%rax), %xmm3
movsd 56(%rax), %xmm1
addq $64, %rax
mulsd 7200(%rcx), %xmm3
mulsd 8400(%rcx), %xmm1
addq $9600, %rcx
cmpl %edi, %ebx
addsd %xmm7, %xmm0
addsd %xmm6, %xmm0
addsd %xmm5, %xmm0
addsd %xmm4, %xmm0
addsd %xmm3, %xmm0
addsd %xmm1, %xmm0
jne L953
The structure of the outer loops seems quite comparable in both cases.
--
* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
From: dominiq at lps dot ens dot fr @ 2009-05-22 20:39 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from dominiq at lps dot ens dot fr 2009-05-22 20:39 -------
Created an attachment (id=17903)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17903&action=view)
air.s file for i686-apple-darwin9 compiled with -m64 -O3 -ffast-math
-funroll-loops
--
* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
From: dominiq at lps dot ens dot fr @ 2009-05-22 20:41 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from dominiq at lps dot ens dot fr 2009-05-22 20:41 -------
Created an attachment (id=17904)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17904&action=view)
air.s file for i686-apple-darwin9 compiled with -m64 -O3 -ffast-math
-funroll-loops -fwhole-file
--
* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
From: dominiq at lps dot ens dot fr @ 2009-05-22 20:52 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from dominiq at lps dot ens dot fr 2009-05-22 20:52 -------
I had a closer look at the code and found that the inner loop
DO k = 0 , Np(i)
uxt = uxt + D(j,k+1)*U(jmin+k,jm)
ENDDO
is unrolled 8 times, but Np(i) is always equal to 4, so the relevant part of
the assembly is
...
je L951
testl %esi, %esi
je L915
cmpl $1, %esi
je L945
cmpl $2, %esi
.p2align 4,,5
je L946
cmpl $3, %esi
.p2align 4,,5
je L947
cmpl $4, %esi
.p2align 4,,5
je L948
cmpl $5, %esi
.p2align 4,,5
je L949
cmpl $6, %esi
.p2align 4,,5
je L950
...
where the jump for $5 is the relevant one (this does look an optimal way to
handle the preamble).
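To make the shape of that code concrete, here is a small Python sketch of a dot product unrolled 8 times with the leftover iterations handled first. It is an illustration under assumptions, not a transcription of the assembly: `dot_unrolled` is a hypothetical helper, and the compiler's actual preamble is the compare-and-jump chain shown above.

```python
UNROLL = 8  # unroll factor reported in the comment above


def dot_unrolled(d, u):
    """Sum of d[k] * u[k], shaped like a loop unrolled UNROLL times.

    Returns (result, remainder_iterations, unrolled_passes).
    """
    n = len(d)
    remainder = n % UNROLL
    total = 0.0
    # Leftover iterations first; the compiled code picks its entry point
    # through a chain of compare-and-jump instructions instead.
    for k in range(remainder):
        total += d[k] * u[k]
    # Main body: UNROLL multiply-adds per pass.
    passes = 0
    for base in range(remainder, n, UNROLL):
        for k in range(base, base + UNROLL):
            total += d[k] * u[k]
        passes += 1
    return total, remainder, passes


np_i = 4                   # value of Np(i) reported above
trip_count = np_i + 1      # DO k = 0 , Np(i) runs Np(i) + 1 times
d = [1.0] * trip_count
u = [2.0] * trip_count
print(dot_unrolled(d, u))  # (10.0, 5, 0)
```

With a trip count of 5 the whole sum is computed in the leftover path and the 8-way unrolled body is never entered, which is why only the preamble matters here.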
I have also done some profiling and found that 'pow$fenv_access_off' in
libSystem.B.dylib (PowerInner for ppc) takes a significant amount of time for
the executable compiled with -fwhole-file.
Any idea why? Note that derivx and derivy are inlined with -fwhole-file and
looking at the *.s files attached in comments #5 and #6, everything looks normal
at this point.
i686-apple-darwin9
[ibook-dhum] lin/test% gfc -m64 -O3 -ffast-math -funroll-loops air.f90
[ibook-dhum] lin/test% rm -f tmp ; time a.out > tmp
8.451u 0.116s 0:08.61 99.4% 0+0k 0+6io 0pf+0w
+ 99.5%, start, a.out
| + 99.5%, main, a.out
| | + 99.4%, MAIN__, a.out
| | | 12.8%, derivy_, a.out
| | | 11.3%, derivx_, a.out
| | | 5.1%, fvsplty2_, a.out
| | | 4.1%, state_, a.out
| | | 3.1%, fvspltx2_, a.out
| | | - 2.8%, _gfortrani_list_formatted_write, libgfortran.3.dylib
| | | + 0.6%, botwall_, a.out
| | | | 0.2%, pow$fenv_access_off, libSystem.B.dylib
| | | | 0.0%, exp, libSystem.B.dylib
| | | | 0.0%, dyld_stub_exp, a.out
| | | + 0.6%, topwall_, a.out
| | | | 0.4%, pow$fenv_access_off, libSystem.B.dylib
| | | | 0.1%, exp, libSystem.B.dylib
| | | | 0.0%, dyld_stub_pow, a.out
| | | + 0.3%, aexit_, a.out
| | | | 0.1%, exp, libSystem.B.dylib
| | | + 0.2%, inlet_, a.out
| | | | 0.1%, exp, libSystem.B.dylib
| | | | 0.0%, log$fenv_access_off, libSystem.B.dylib
| | | 0.2%, log$fenv_access_off, libSystem.B.dylib
| | | - 0.1%, _gfortran_st_write_done, libgfortran.3.dylib
| | | - 0.1%, data_transfer_init, libgfortran.3.dylib
| | | - 0.1%, formatted_transfer, libgfortran.3.dylib
| | | 0.0%, _gfortran_transfer_real, libgfortran.3.dylib
| | 0.0%, _gfortran_st_write, libgfortran.3.dylib
[ibook-dhum] lin/test% gfc -m64 -O3 -ffast-math -funroll-loops -fwhole-file air.f90
[ibook-dhum] lin/test% rm -f tmp ; time a.out > tmp
9.752u 0.096s 0:09.90 99.3% 0+0k 0+6io 0pf+0w
+ 99.5%, start, a.out
| + 99.5%, main, a.out
| | + 99.5%, MAIN__, a.out
| | | + 15.0%, pow$fenv_access_off, libSystem.B.dylib <==== Why?
| | | | 0.4%, floorl$fenv_access_off, libSystem.B.dylib
| | | | 0.2%, dyld_stub_fabs, libSystem.B.dylib
| | | | 0.1%, dyld_stub_floorl, libSystem.B.dylib
| | | | 0.1%, fabs$fenv_access_off, libSystem.B.dylib
| | | 4.6%, fvsplty2_, a.out
| | | 3.5%, state_.clone.2, a.out
| | | - 2.9%, _gfortrani_list_formatted_write, libgfortran.3.dylib
| | | 2.8%, fvspltx2_, a.out
| | | + 0.4%, topwall_, a.out
| | | | 0.2%, pow$fenv_access_off, libSystem.B.dylib
| | | | 0.1%, exp, libSystem.B.dylib
| | | + 0.4%, botwall_.clone.3, a.out
| | | | 0.2%, pow$fenv_access_off, libSystem.B.dylib
| | | | 0.0%, exp, libSystem.B.dylib
| | | + 0.3%, aexit_.clone.4, a.out
| | | | 0.1%, exp, libSystem.B.dylib
| | | | 0.0%, log$fenv_access_off, libSystem.B.dylib
| | | 0.3%, dyld_stub_pow, a.out
| | | + 0.2%, inlet_, a.out
| | | | 0.1%, exp, libSystem.B.dylib
| | | | 0.0%, dyld_stub_log, a.out
| | | - 0.2%, _gfortran_st_write_done, libgfortran.3.dylib
| | | - 0.1%, formatted_transfer, libgfortran.3.dylib
| | | - 0.1%, data_transfer_init, libgfortran.3.dylib
| | | 0.1%, log$fenv_access_off, libSystem.B.dylib
| | | 0.0%, _gfortrani_flush_if_preconnected, libgfortran.3.dylib
| | 0.0%, pow$fenv_access_off, libSystem.B.dylib
| | 0.0%, _gfortrani_free_internal_unit, libgfortran.3.dylib
powerpc-apple-darwin9
gfc -m64 -O3 -ffast-math -funroll-loops air.f90
- 75.5%, MAIN__, a.out
- 5.9%, derivy_, a.out
- 5.4%, derivx_, a.out
- 4.7%, fvsplty2_, a.out
- 4.2%, fvspltx2_, a.out
- 2.1%, state_, a.out
- 0.6%, dyld_stub_sqrt, a.out
- 0.5%, ml_set_interrupts_enabled, mach_kernel
- 0.2%, sqrt, libSystem.B.dylib
- 0.2%, exp, libSystem.B.dylib
- 0.2%, log, libSystem.B.dylib
- 0.1%, PowerInner, libSystem.B.dylib
- 0.1%, inlet_, a.out
- 0.0%, aexit_, a.out
- 0.0%, dyld_stub_pow, a.out
- 0.0%, botwall_, a.out
- 0.0%, topwall_, a.out
- 0.0%, pow, libSystem.B.dylib
- 0.0%, dyld_stub_log, a.out
- 0.0%, __dtoa, libSystem.B.dylib
- 0.0%, next_format0, libgfortran.3.dylib
- 0.0%, log10, libSystem.B.dylib
- 0.0%, dyld_stub_memset, libSystem.B.dylib
- 0.0%, dyld_stub_memcpy, libgfortran.3.dylib
- 0.0%, dyld_stub_exp, a.out
- 0.0%, dyld_stub___sfvwrite, libSystem.B.dylib
- 0.0%, __vfprintf, libSystem.B.dylib
- 0.0%, __quorem_D2A, libSystem.B.dylib
- 0.0%, __Bfree_D2A, libSystem.B.dylib
gfc -m64 -O3 -ffast-math -funroll-loops -fwhole-file air.f90
- 82.6%, MAIN__, a.out
- 5.3%, PowerInner, libSystem.B.dylib <==== Why?
- 4.3%, fvsplty2_, a.out
- 3.2%, fvspltx2_, a.out
- 1.9%, state_.clone.2, a.out
- 1.3%, pow, libSystem.B.dylib
- 0.4%, ml_set_interrupts_enabled, mach_kernel
- 0.4%, dyld_stub_sqrt, a.out
- 0.1%, log, libSystem.B.dylib
- 0.1%, dyld_stub_pow, a.out
- 0.1%, sqrt, libSystem.B.dylib
- 0.1%, exp, libSystem.B.dylib
- 0.0%, inlet_, a.out
- 0.0%, botwall_.clone.3, a.out
- 0.0%, topwall_, a.out
- 0.0%, aexit_.clone.4, a.out
- 0.0%, dyld_stub_log, a.out
- 0.0%, dyld_stub_localeconv_l, libSystem.B.dylib
- 0.0%, dyld_stub_exp, a.out
- 0.0%, dyld_stub___pow5mult_D2A, libSystem.B.dylib
- 0.0%, data_transfer_init, libgfortran.3.dylib
- 0.0%, __umodti3, libgfortran.3.dylib
- 0.0%, __dtoa, libSystem.B.dylib
- 0.0%, __Bfree_D2A, libSystem.B.dylib
- 0.0%, __Balloc_D2A, libSystem.B.dylib
--
dominiq at lps dot ens dot fr changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jh at suse dot cz
* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
From: burnus at gcc dot gnu dot org @ 2009-07-13 15:29 UTC (permalink / raw)
To: gcc-bugs
------- Comment #8 from burnus at gcc dot gnu dot org 2009-07-13 15:29 -------
(Not restricted to Darwin, happens also on x86-64-linux.)
--
burnus at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |burnus at gcc dot gnu dot
| |org
GCC build triplet|i686-apple-darwin9 |
GCC host triplet|i686-apple-darwin9 |
GCC target triplet|i686-apple-darwin9 |
* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
From: dominiq at lps dot ens dot fr @ 2009-08-25 11:56 UTC (permalink / raw)
To: gcc-bugs
------- Comment #9 from dominiq at lps dot ens dot fr 2009-08-25 11:55 -------
I see a similar slowdown with the patch in
http://gcc.gnu.org/ml/fortran/2009-08/msg00361.html (see
http://gcc.gnu.org/ml/fortran/2009-08/msg00377.html). I suspect it is related
to pr41098, but I don't know how to show it.
--
* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90
From: dominiq at lps dot ens dot fr @ 2009-08-25 12:01 UTC (permalink / raw)
To: gcc-bugs
------- Comment #10 from dominiq at lps dot ens dot fr 2009-08-25 12:01 -------
> I see a similar slowdown with the patch in ...
I have again forgotten to say that I saw the slowdown without the -fwhole-file
option.
I have changed the summary to reflect that.
--
dominiq at lps dot ens dot fr changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Time increase with inlining |Time increase for the
|for the Polyhedron test |Polyhedron test air.f90
|air.f90 |
* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
From: rguenth at gcc dot gnu dot org @ 2009-08-25 12:22 UTC (permalink / raw)
To: gcc-bugs
------- Comment #11 from rguenth at gcc dot gnu dot org 2009-08-25 12:22 -------
We clone quite a few functions with -fwhole-file, but apparently we fail to
apply constant propagation for &CONST_DECL arguments, which is a pity. In fact
we seem to clone them without any change.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mjambor at suse dot cz
Summary|Time increase for the |Time increase with inlining
|Polyhedron test air.f90 |for the Polyhedron test
| |air.f90
* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
From: dominiq at lps dot ens dot fr @ 2009-08-25 12:30 UTC (permalink / raw)
To: gcc-bugs
------- Comment #12 from dominiq at lps dot ens dot fr 2009-08-25 12:30 -------
From comment #9, I think inlining is just exposing a latent missed optimization
related to the way the middle end handles pow(). This is why I changed the
summary.
--
* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
From: rguenther at suse dot de @ 2009-08-25 12:40 UTC (permalink / raw)
To: gcc-bugs
------- Comment #13 from rguenther at suse dot de 2009-08-25 12:40 -------
Subject: Re: Time increase with inlining for the
Polyhedron test air.f90
On Tue, 25 Aug 2009, dominiq at lps dot ens dot fr wrote:
> ------- Comment #12 from dominiq at lps dot ens dot fr 2009-08-25 12:30 -------
> From comment #9, I think inlining is just exposing a latent missed optimization
> related to the way the middle end handle pow(). This is why I changed the
> summary.
I don't think the issue is pow expansion. Does -fno-ipa-cp fix the
regression?
Richard.
--
* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
From: dominiq at lps dot ens dot fr @ 2009-08-25 12:51 UTC (permalink / raw)
To: gcc-bugs
------- Comment #14 from dominiq at lps dot ens dot fr 2009-08-25 12:51 -------
> I don't think the issue is pow expansion.
What I do see from different means is that the number of calls to pow()
increases from 63,907,869 to 1,953,139,629. Since pow() is not exactly cheap, I
think this could be sufficient to explain the 1.8s difference I see. Note that
the code has plenty of x**2 and x**a where a is real.
> Does -fno-ipa-cp fix the regression?
No.
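The call counts above can be put in perspective with a few lines of Python. This is only a back-of-the-envelope sketch: the inputs are the two call counts and the 1.8 s difference quoted above, the name `pow_call_stats` is hypothetical, and attributing the whole difference to the extra calls is an assumption.

```python
def pow_call_stats(calls_few, calls_many, extra_seconds):
    """Return (ratio of call counts, extra calls, implied ns per extra call)."""
    extra_calls = calls_many - calls_few
    ratio = calls_many / calls_few
    ns_per_call = extra_seconds / extra_calls * 1e9
    return ratio, extra_calls, ns_per_call


ratio, extra_calls, ns_per_call = pow_call_stats(63_907_869, 1_953_139_629, 1.8)
print(round(ratio, 1), extra_calls, round(ns_per_call, 2))  # 30.6 1889231760 0.95
```

So the slower executable makes roughly 30 times as many calls, and the 1.8 s would correspond to a little under 1 ns per additional call on average.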
--
* [Bug middle-end/40106] Time increase with inlining for the Polyhedron test air.f90
From: dominiq at lps dot ens dot fr @ 2009-08-25 15:31 UTC (permalink / raw)
To: gcc-bugs
------- Comment #15 from dominiq at lps dot ens dot fr 2009-08-25 15:30 -------
I think I have made some progress in understanding the problem:
(1) The 1,953,139,629 or so calls to pow() are the non-optimized baseline.
(2) In the working cases this number is reduced to 63,907,869 or so when
using the -funsafe-math-optimizations option:
[ibook-dhum] lin/test% time a.out > /dev/null
11.348u 0.049s 0:11.41 99.7% 0+0k 0+7io 0pf+0w
[ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
8.464u 0.046s 0:08.52 99.7% 0+0k 0+8io 0pf+0w
[ibook-dhum] lin/test% gfc -fwhole-file -m64 -O2 -funsafe-math-optimizations air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
8.471u 0.047s 0:08.53 99.7% 0+0k 0+7io 0pf+0w
so with -O2 -funsafe-math-optimizations the optimization is still there with
-fwhole-file.
(3) The critical option with -fwhole-file is -finline-functions:
[ibook-dhum] lin/test% gfc -m64 -O2 -finline-functions -funsafe-math-optimizations air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
8.464u 0.045s 0:08.52 99.7% 0+0k 0+8io 0pf+0w
[ibook-dhum] lin/test% gfc -fwhole-file -m64 -O2 -finline-functions -funsafe-math-optimizations air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
10.053u 0.046s 0:10.11 99.8% 0+0k 0+8io 0pf+0w
Note that the patch in http://gcc.gnu.org/ml/fortran/2009-08/msg00361.html
seems to prevent the optimization coming from -funsafe-math-optimizations (see
http://gcc.gnu.org/ml/fortran/2009-08/msg00390.html ).
--
* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
From: dominiq at lps dot ens dot fr @ 2009-08-25 21:25 UTC (permalink / raw)
To: gcc-bugs
------- Comment #16 from dominiq at lps dot ens dot fr 2009-08-25 21:25 -------
After some discussion on IRC with Tobias Schlüter, it seems that the problem
comes from a bad optimization that is avoided only by chance in the original
code. Commenting out line 139:
WRITE (6,*) i , spx(i) , epx(i) , NPX(i)
is enough to go from ~8.5s to ~10.2s, and this has nothing to do with
-fwhole-file or Tobias' patch.
--
dominiq at lps dot ens dot fr changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Time increase with inlining |Time increase for the
|for the Polyhedron test |Polyhedron test air.f90 due
|air.f90 |to bad optimization
* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
From: dominiq at lps dot ens dot fr @ 2009-08-27 21:59 UTC (permalink / raw)
To: gcc-bugs
------- Comment #17 from dominiq at lps dot ens dot fr 2009-08-27 21:59 -------
Created an attachment (id=18439)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18439&action=view)
reduced test without any subroutine
I have attached a reduced test without any subroutine. It requires the same
input as air.f90, but do not expect meaningful results. As such I get:
[ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations air_db.f90
[ibook-dhum] lin/test% time a.out > /dev/null
4.306u 0.015s 0:04.32 99.7% 0+0k 0+1io 0pf+0w
If I comment out line 94
WRITE (6,*) i , spx(i) , epx(i) , NPX(i)
I get
[ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations air_db.f90
[ibook-dhum] lin/test% time a.out > /dev/null
6.464u 0.020s 0:06.49 99.8% 0+0k 0+2io 0pf+0w
Among the oddities of this PR: if I also comment out line 502
WRITE (7,*) MXPx , MXPy
I get
[ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations air_db.f90
[ibook-dhum] lin/test% time a.out > /dev/null
4.273u 0.014s 0:04.29 99.7% 0+0k 0+0io 0pf+0w
--
* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
From: howarth at nitro dot med dot uc dot edu @ 2009-08-28 1:09 UTC (permalink / raw)
To: gcc-bugs
------- Comment #18 from howarth at nitro dot med dot uc dot edu 2009-08-28 01:09 -------
Why don't you go back to the original test case and see which component of
-funsafe-math-optimizations...
-fno-signed-zeros -fno-trapping-math -fassociative-math -freciprocal-math
is actually causing the problem.
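One way to organise that search is to generate every combination of the four flags listed above and time each build. The Python sketch below only builds the command lines and does not run them; the compiler name `gfortran`, the base flags `-m64 -O2` taken from the earlier timings and the helper `candidate_commands` are assumptions for this example.

```python
from itertools import combinations

# The four component flags named in the comment above.
SUB_FLAGS = [
    "-fno-signed-zeros",
    "-fno-trapping-math",
    "-fassociative-math",
    "-freciprocal-math",
]


def candidate_commands(source, compiler="gfortran", base=("-m64", "-O2")):
    """Build one compile command per non-empty subset of SUB_FLAGS,
    plus one using -funsafe-math-optimizations itself for comparison."""
    commands = []
    for size in range(1, len(SUB_FLAGS) + 1):
        for subset in combinations(SUB_FLAGS, size):
            commands.append([compiler, *base, *subset, source])
    commands.append([compiler, *base, "-funsafe-math-optimizations", source])
    return commands


for command in candidate_commands("air.f90"):
    print(" ".join(command))
```

This yields 15 subset builds plus the reference build; each resulting executable can then be timed in the same way as in the earlier comments.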
--
* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
From: dominiq at lps dot ens dot fr @ 2009-08-28 5:39 UTC (permalink / raw)
To: gcc-bugs
------- Comment #19 from dominiq at lps dot ens dot fr 2009-08-28 05:39 -------
> Why don't you go back to the original test case and see which component of
> -funsafe-math-optimizations...
>
> -fno-signed-zeros -fno-trapping-math -fassociative-math -freciprocal-math
>
> is actually causing the problem.
See http://gcc.gnu.org/ml/fortran/2009-08/msg00390.html :
I have dug into the problem a little more and found that the key
option is -funsafe-math-optimizations. I tried to refine that, but as
usual this option is not the sum of -fassociative-math -fno-signed-zeros
-fno-trapping-math -freciprocal-math as stated in the manual!-(
--
* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
From: dominiq at lps dot ens dot fr @ 2009-08-28 7:19 UTC (permalink / raw)
To: gcc-bugs
------- Comment #20 from dominiq at lps dot ens dot fr 2009-08-28 07:19 -------
If it helps, I get for the reduced test with line 94:
[ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations
-fno-signed-zeros -fno-trapping-math -fno-associative-math -fno-reciprocal-math
air_db.f90
[ibook-dhum] lin/test% time a.out > /dev/null
4.555u 0.016s 0:04.57 99.7% 0+0k 0+2io 0pf+0w
without it
[ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations
-fno-signed-zeros -fno-trapping-math -fno-associative-math -fno-reciprocal-math
air_db.f90
[ibook-dhum] lin/test% time a.out > /dev/null
6.632u 0.020s 0:06.66 99.8% 0+0k 0+0io 0pf+0w
--
* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
From: dominiq at lps dot ens dot fr @ 2009-08-28 12:01 UTC (permalink / raw)
To: gcc-bugs
------- Comment #21 from dominiq at lps dot ens dot fr 2009-08-28 12:01 -------
And finally the winner is -fstrict-overflow!
[ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations air_db.f90
[ibook-dhum] lin/test% time a.out > /dev/null
6.472u 0.020s 0:06.50 99.8% 0+0k 0+2io 0pf+0w <=== bad
[ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations
-fno-strict-overflow air_db.f90
[ibook-dhum] lin/test% time a.out > /dev/null
4.307u 0.016s 0:04.33 99.5% 0+0k 0+0io 0pf+0w <=== good
[ibook-dhum] lin/test% gfc -m64 -O1 -funsafe-math-optimizations air_db.f90
[ibook-dhum] lin/test% time a.out > /dev/null
4.347u 0.016s 0:04.37 99.5% 0+0k 0+1io 0pf+0w <=== good
[ibook-dhum] lin/test% gfc -m64 -O1 -funsafe-math-optimizations
-fstrict-overflow air_db.f90
[ibook-dhum] lin/test% time a.out > /dev/null
5.962u 0.019s 0:05.99 99.6% 0+0k 0+2io 0pf+0w <=== bad
--
* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
From: dominiq at lps dot ens dot fr @ 2009-08-28 12:23 UTC (permalink / raw)
To: gcc-bugs
------- Comment #22 from dominiq at lps dot ens dot fr 2009-08-28 12:23 -------
For the original air.f90 I get:
[ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations
-fno-strict-overflow air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
9.572u 0.055s 0:09.66 99.5% 0+0k 0+9io 1pf+0w
[ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
8.446u 0.046s 0:08.50 99.7% 0+0k 0+8io 0pf+0w
Commenting out the write at line 139, it becomes
[ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
10.083u 0.052s 0:10.15 99.8% 0+0k 0+7io 0pf+0w
[ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations
-fno-strict-overflow air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
9.531u 0.045s 0:09.58 99.8% 0+0k 0+7io 0pf+0w
--
* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
From: howarth at nitro dot med dot uc dot edu @ 2009-08-28 13:36 UTC (permalink / raw)
To: gcc-bugs
------- Comment #23 from howarth at nitro dot med dot uc dot edu 2009-08-28 13:36 -------
(In reply to comment #20)
> If it helps, I get for the reduced test with line 94:
>
> [ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations
> -fno-signed-zeros -fno-trapping-math -fno-associative-math -fno-reciprocal-math
> air_db.f90
> [ibook-dhum] lin/test% time a.out > /dev/null
> 4.555u 0.016s 0:04.57 99.7% 0+0k 0+2io 0pf+0w
>
> without it
>
> [ibook-dhum] lin/test% gfc -m64 -O2 -funsafe-math-optimizations
> -fno-signed-zeros -fno-trapping-math -fno-associative-math -fno-reciprocal-math
> air_db.f90
> [ibook-dhum] lin/test% time a.out > /dev/null
> 6.632u 0.020s 0:06.66 99.8% 0+0k 0+0io 0pf+0w
>
Aren't these compile lines identical? Also, why are you passing
-funsafe-math-optimizations? I meant that you should use...
-fno-signed-zeros -fno-trapping-math -fassociative-math -freciprocal-math
instead and work through all of the possible combinations with the inverse
forms -fsigned-zeros, -ftrapping-math, -fno-associative-math and
-fno-reciprocal-math which is 16 combinations.
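The 16 combinations suggested above can be enumerated mechanically rather than typed by hand. A minimal sketch in C, assuming only the four flag spellings quoted in this message; the helper name format_combo is hypothetical, and the code only builds the flag strings, it does not run the compiler:

```c
#include <assert.h>
#include <string.h>

/* The four sub-options as listed for -funsafe-math-optimizations. */
static const char *const flag_on[4] = {
    "-fno-signed-zeros", "-fno-trapping-math",
    "-fassociative-math", "-freciprocal-math"
};
/* Their inverse forms. */
static const char *const flag_off[4] = {
    "-fsigned-zeros", "-ftrapping-math",
    "-fno-associative-math", "-fno-reciprocal-math"
};

/* Hypothetical helper: write combination number `mask` (0..15) into buf.
   Bit i set selects flag_on[i], bit i clear selects flag_off[i]. */
static void format_combo(unsigned mask, char *buf, size_t len)
{
    assert(mask < 16u);
    assert(len > 0);
    buf[0] = '\0';
    for (int i = 0; i < 4; i++) {
        const char *f = ((mask >> i) & 1u) ? flag_on[i] : flag_off[i];
        if (i > 0)
            strncat(buf, " ", len - strlen(buf) - 1);
        strncat(buf, f, len - strlen(buf) - 1);
    }
}
```

Calling format_combo for mask 0 through 15 yields one flag string per build to time; mask 15 reproduces the list given in the manual.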
--
* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
From: dominiq at lps dot ens dot fr @ 2009-08-31 13:06 UTC (permalink / raw)
To: gcc-bugs
------- Comment #24 from dominiq at lps dot ens dot fr 2009-08-31 13:06 -------
(In reply to comment #23)
> Aren't these compile lines identical?
Apparently not: -funsafe-math-optimizations turns on optimization(s) that cannot
be undone by
-fno-signed-zeros -fno-trapping-math -fno-associative-math -fno-reciprocal-math
> I meant that you should use...
>
> -fno-signed-zeros -fno-trapping-math -fassociative-math -freciprocal-math
>
with commented write:
[ibook-dhum] lin/test% gfc -m64 -O2 -fno-signed-zeros -fno-trapping-math
-fassociative-math -freciprocal-math air_db.f90
[ibook-dhum] lin/test% time a.out > /dev/null
6.194u 0.017s 0:06.21 99.8% 0+0k 0+1io 0pf+0w
with write:
[ibook-dhum] lin/test% gfc -m64 -O2 -fsigned-zeros -ftrapping-math
-fassociative-math -freciprocal-math air_db.f90
f951: warning: -fassociative-math disabled; other options take precedence
[ibook-dhum] lin/test% time a.out > /dev/null
6.306u 0.018s 0:06.33 99.6% 0+0k 0+2io 0pf+0w
> instead and work through all of the possible combinations with the inverse
> forms -fsigned-zeros, -ftrapping-math, -fno-associative-math and
> -fno-reciprocal-math which is 16 combinations.
I had no intention of trying the 16 combinations, as they are ineffective: the key
optimization is hidden behind -funsafe-math-optimizations and stays active with all
the explicit sub-options disabled. As said in comment #21, the other key option is
-fstrict-overflow.
I know that these facts do not seem to make sense, but if you have doubts you can
redo the tests yourself.
As a side comment, it would be nice for debugging purposes if options that are
combinations of sub-options did not carry hidden optimizations (yes, I know there
is a sentence about that in the manual).
--
* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
From: dominiq at lps dot ens dot fr @ 2009-08-31 15:04 UTC (permalink / raw)
To: gcc-bugs
------- Comment #25 from dominiq at lps dot ens dot fr 2009-08-31 15:04 -------
If I compare the results of -fdump-tree-original for the first 2 cases of
comment #21 I get:
[ibook-dhum] test/dbg_air% gfc -m64 -O2 -funsafe-math-optimizations
-fdump-tree-original air_db.f90
[ibook-dhum] test/dbg_air% mv air_db.f90.003t.original
air_db.f90.003t.original-no
[ibook-dhum] test/dbg_air% gfc -m64 -O2 -funsafe-math-optimizations
-fno-strict-overflow -fdump-tree-original air_db.f90
[ibook-dhum] test/dbg_air% diff -u air_db.f90.003t.original
air_db.f90.003t.original-no
--- air_db.f90.003t.original 2009-08-31 17:01:34.000000000 +0200
+++ air_db.f90.003t.original-no 2009-08-31 17:00:39.000000000 +0200
@@ -548,7 +548,7 @@
logical(kind=4) D.1668;
ict = (integer(kind=4)) (ict + 1);
- if (npx[(integer(kind=8)) i + -1] + 1 > j)
+ if (NON_LVALUE_EXPR <npx[(integer(kind=8)) i + -1]> >= j)
{
ddx[((integer(kind=8)) ict +
(integer(kind=8)) k * 150) + -151] = xp1[((integer(kind=8)) (ict + 1) +
(integer(kind=8)) k * 150) + -151] - xp1[((integer(kind=8)) ict +
(integer(kind=8)) k * 150) + -151];
}
@@ -621,7 +621,7 @@
logical(kind=4) D.1680;
ict = (integer(kind=4)) (ict + 1);
- if (npy[(integer(kind=8)) i + -1] + 1 > j)
+ if (NON_LVALUE_EXPR <npy[(integer(kind=8)) i + -1]> >= j)
{
ddy[((integer(kind=8)) k +
(integer(kind=8)) ict * 150) + -151] = yp1[((integer(kind=8)) k +
((integer(kind=8)) ict + 1) * 150) + -151] - yp1[((integer(kind=8)) k +
(integer(kind=8)) ict * 150) + -151];
}
where NON_LVALUE_EXPR appears when the test is compiled without
-fno-strict-overflow.
--
* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
From: jv244 at cam dot ac dot uk @ 2009-08-31 15:21 UTC (permalink / raw)
To: gcc-bugs
------- Comment #26 from jv244 at cam dot ac dot uk 2009-08-31 15:20 -------
(In reply to comment #25)
> - if (npx[(integer(kind=8)) i + -1] + 1 > j)
> + if (NON_LVALUE_EXPR <npx[(integer(kind=8)) i + -1]> >= j)
>
> where NON_LVALUE_EXPR appears when the test is compiled without
> -fno-strict-overflow.
I wonder if this is a case where the optimizers would benefit from exploiting
the fact that in Fortran integers can never overflow in a valid program ?
--
* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
From: rguenther at suse dot de @ 2009-08-31 15:23 UTC (permalink / raw)
To: gcc-bugs
------- Comment #27 from rguenther at suse dot de 2009-08-31 15:23 -------
Subject: Re: Time increase for the Polyhedron test
air.f90 due to bad optimization
On Mon, 31 Aug 2009, jv244 at cam dot ac dot uk wrote:
> ------- Comment #26 from jv244 at cam dot ac dot uk 2009-08-31 15:20 -------
> (In reply to comment #25)
> > - if (npx[(integer(kind=8)) i + -1] + 1 > j)
> > + if (NON_LVALUE_EXPR <npx[(integer(kind=8)) i + -1]> >= j)
> >
> > where NON_LVALUE_EXPR appears when the test is compiled without
> > -fno-strict-overflow.
>
> I wonder if this is a case where the optimizers would benefit from exploiting
> the fact that in Fortran integers can never overflow in a valid program ?
In fact it does with -fstrict-overflow.
Richard.
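The fold visible in the dump of comment #25 rewrites `a + 1 > j` as `a >= j`. The two forms agree whenever `a + 1` does not overflow, which is what -fstrict-overflow lets the compiler assume. A minimal sketch in C, assuming wrapping arithmetic as the alternative semantics; the function names are hypothetical and the wrapping case is spelled out by hand so that the sketch itself never executes an overflowing signed addition:

```c
#include <limits.h>
#include <stdbool.h>

/* a + 1 > j evaluated with wrapping semantics: INT_MAX + 1 is modelled
   explicitly as INT_MIN instead of being computed. */
static bool cmp_wrapping(int a, int j)
{
    int inc = (a == INT_MAX) ? INT_MIN : a + 1;
    return inc > j;
}

/* The folded form from the dump: a >= j. */
static bool cmp_folded(int a, int j)
{
    return a >= j;
}
```

The two functions differ only for a == INT_MAX, a case that cannot occur in a program free of integer overflow.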
--
* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
From: dominiq at lps dot ens dot fr @ 2009-08-31 23:59 UTC (permalink / raw)
To: gcc-bugs
------- Comment #28 from dominiq at lps dot ens dot fr 2009-08-31 23:59 -------
Following Richard Guenther's suggestion on IRC, I have tested the following
patch:
--- ../_gcc_clean/gcc/builtins.c 2009-08-31 15:07:18.000000000 +0200
+++ gcc/builtins.c 2009-09-01 01:28:09.000000000 +0200
@@ -3012,7 +3012,7 @@
real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0);
if (real_identical (&c2, &cint)
&& ((flag_unsafe_math_optimizations
- && optimize_insn_for_speed_p ()
+ /* && optimize_insn_for_speed_p () */
&& powi_cost (n/2) <= POWI_MAX_MULTS)
|| n == 1))
{
With it I get:
[ibook-dhum] test/dbg_air% gfc -m64 -O2 -funsafe-math-optimizations air_db.f90
[ibook-dhum] test/dbg_air% time a.out > /dev/null
4.490u 0.018s 0:04.51 99.7% 0+0k 0+3io 0pf+0w
compared to
[ibook-dhum] test/dbg_air% gfc -m64 -O2 -funsafe-math-optimizations
-fno-strict-overflow air_db.f90
[ibook-dhum] test/dbg_air% time a.out > /dev/null
4.320u 0.015s 0:04.34 99.7% 0+0k 0+0io 0pf+0w
and there is no call to pow in the assembly. I think the difference is
significant; so it seems that optimize_insn_for_speed_p () is playing some role
elsewhere in the code. Note that if I replace lines 322 and 427
mu = mu0*(T(i,j)/t02)**1.5*(t02+110.56)/(T(i,j)+110.56)
with
mu = mu0*sqrt((T(i,j)/t02)**3)*(t02+110.56)/(T(i,j)+110.56)
or
mu = mu0*sqrt((T(i,j)/t02))*(T(i,j)/t02)*(t02+110.56)/(T(i,j)+110.56)
there is no call to pow and the code is slightly faster with
-fno-strict-overflow
[ibook-dhum] test/dbg_air% gfc -m64 -O2 -funsafe-math-optimizations
-fno-strict-overflow air_db_1.f90
[ibook-dhum] test/dbg_air% time a.out > /dev/null
4.323u 0.015s 0:04.34 99.7% 0+0k 0+0io 0pf+0w
[ibook-dhum] test/dbg_air% gfc -m64 -O2 -funsafe-math-optimizations
air_db_1.f90
[ibook-dhum] test/dbg_air% time a.out > /dev/null
4.527u 0.016s 0:04.55 99.5% 0+0k 0+0io 0pf+0w
The original air.f90 compiled with -fwhole-file gives
[ibook-dhum] lin/test% gfc -m64 -O3 -ffast-math -funroll-loops
-ftree-loop-linear -fomit-frame-pointer -finline-limit=600 --param
min-vect-loop-bound=2 -fwhole-file air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
8.358u 0.049s 0:08.42 99.6% 0+0k 0+8io 0pf+0w
compared to
[ibook-dhum] lin/test% gfc -m64 -O3 -ffast-math -funroll-loops
-ftree-loop-linear -fomit-frame-pointer -finline-limit=600 --param
min-vect-loop-bound=2 air.f90
[ibook-dhum] lin/test% time a.out > /dev/null
8.273u 0.046s 0:08.32 99.8% 0+0k 0+0io 0pf+0w
--
* [Bug middle-end/40106] Time increase for the Polyhedron test air.f90 due to bad optimization
From: dominiq at lps dot ens dot fr @ 2009-09-01 9:37 UTC (permalink / raw)
To: gcc-bugs
------- Comment #29 from dominiq at lps dot ens dot fr 2009-09-01 09:37 -------
Does anyone understand why commenting a write can change crtl->maybe_hot_insn_p
from 1 to 0?
--
* [Bug middle-end/40106] [4.4/4.5 Regression] Time increase for the Polyhedron test air.f90 due to bad optimization
From: dominiq at lps dot ens dot fr @ 2009-09-03 7:10 UTC (permalink / raw)
To: gcc-bugs
------- Comment #30 from dominiq at lps dot ens dot fr 2009-09-03 07:09 -------
This is a regression from gcc 4.3.4 (gfc=trunk r151295, gfc44=4.4.1,
gfc43=4.3.4):
[ibook-dhum] test/dbg_air% gfc -S -m64 -O2 -funsafe-math-optimizations
air_db.f90
[ibook-dhum] test/dbg_air% grep pow air_db.s
call _pow
call _pow
[ibook-dhum] test/dbg_air% gfc44 -S -m64 -O2 -funsafe-math-optimizations
air_db.f90
[ibook-dhum] test/dbg_air% grep pow air_db.s
call _pow
call _pow
[ibook-dhum] test/dbg_air% gfc43 -S -m64 -O2 -funsafe-math-optimizations
air_db.f90
[ibook-dhum] test/dbg_air% grep pow air_db.s
[ibook-dhum] test/dbg_air%
--
dominiq at lps dot ens dot fr changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Time increase for the |[4.4/4.5 Regression] Time
|Polyhedron test air.f90 due |increase for the Polyhedron
|to bad optimization |test air.f90 due to bad
| |optimization
* [Bug middle-end/40106] [4.4/4.5 Regression] Time increase for the Polyhedron test air.f90 due to bad optimization
From: dominiq at lps dot ens dot fr @ 2009-09-03 11:20 UTC (permalink / raw)
To: gcc-bugs
------- Comment #31 from dominiq at lps dot ens dot fr 2009-09-03 11:20 -------
A further reduced, nonfunctional (invalid) test that shows the problem:
IMPLICIT REAL*8(a-H,O-Z)
PARAMETER (NX=150,NY=150)
DIMENSION NPX(30), FV2(NX,NY), T(NX,NY), dtt(NX,NY)
do it = 1, 2000
   DO i = 1 , MXPx
      DO j = 1 , MXPy
         FV2(i,j) = T(i,j)**1.5
      ENDDO
   ENDDO
   DO ix = 1 , NDX
      maxx = maxx + NPX(ix) + 1
      DO iy = 1 , NDY
         DO i = minx , maxx
            DO j = miny , maxy
               dtt(i,j) = dtd
            ENDDO
         ENDDO
         miny = miny + NPX(iy) + 1
      ENDDO
   ENDDO
end do
WRITE (7,*) MXPx , MXPy
END
[ibook-dhum] test/dbg_air% gfc -S -m64 -O2 -funsafe-math-optimizations
air_red.f90
[ibook-dhum] test/dbg_air% grep pow air_red.s
call _pow
[ibook-dhum] test/dbg_air% gfc -S -m64 -O2 -funsafe-math-optimizations
-fno-strict-overflow air_red.f90
[ibook-dhum] test/dbg_air% grep pow air_red.s
[ibook-dhum] test/dbg_air%
--
* [Bug middle-end/40106] [4.4/4.5 Regression] Time increase for the Polyhedron test air.f90 due to bad optimization
From: rguenth at gcc dot gnu dot org @ 2009-09-06 22:15 UTC (permalink / raw)
To: gcc-bugs
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |4.4.2
* [Bug middle-end/40106] [4.4/4.5 Regression] Time increase for the Polyhedron test air.f90 due to bad optimization
From: rguenth at gcc dot gnu dot org @ 2009-09-18 8:58 UTC (permalink / raw)
To: gcc-bugs
------- Comment #32 from rguenth at gcc dot gnu dot org 2009-09-18 08:58 -------
Honza, this is yours.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
AssignedTo|unassigned at gcc dot gnu |hubicka at gcc dot gnu dot
|dot org |org
Priority|P3 |P1
* [Bug middle-end/40106] [4.4/4.5 Regression] Time increase for the Polyhedron test air.f90 due to bad optimization
From: jakub at gcc dot gnu dot org @ 2009-10-15 12:49 UTC (permalink / raw)
To: gcc-bugs
--
jakub at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.4.2 |4.4.3
* [Bug middle-end/40106] [4.4/4.5 Regression] Time increase for the Polyhedron test air.f90 due to bad optimization
From: rguenth at gcc dot gnu dot org @ 2009-10-18 13:22 UTC (permalink / raw)
To: gcc-bugs
------- Comment #33 from rguenth at gcc dot gnu dot org 2009-10-18 13:22 -------
It looks like basic-block frequencies are completely off. The BB in question
is
# BLOCK 7 freq:3
# PRED: 6 [100.0%] (fallthru,exec) 7 [99.0%] (false,exec)
# ivtmp.65_38 = PHI <ivtmp.65_113(6), ivtmp.65_129(7)>
# ivtmp.68_147 = PHI <ivtmp.68_151(6), ivtmp.68_148(7)>
D.1360_26 = MEM[index: ivtmp.65_38];
D.1404_30 = pow (D.1360_26, 1.5e+0);
MEM[index: ivtmp.68_147] = D.1404_30;
ivtmp.65_129 = ivtmp.65_38 + 1200;
ivtmp.68_148 = ivtmp.68_147 + 1200;
if (ivtmp.77_32 == ivtmp.65_129)
goto <bb 8>;
else
goto <bb 7>;
# SUCC: 8 [1.0%] (true,exec) 7 [99.0%] (false,exec)
And 3 is lower than 11, the minimum frequency at which a BB is considered not cold.
Predictions for bb 7
DS theory heuristics (ignored): 0.1%
first match heuristics: 1.0%
combined heuristics: 1.0%
opcode values nonequal (on trees) heuristics (ignored): 28.0%
loop branch heuristics (ignored): 14.0%
guessed loop iterations heuristics: 1.0%
but I see most blocks do not have a frequency at all and I also see
# BLOCK 17 freq:10000
# PRED: 16 [100.0%] (fallthru,exec) 17 [99.0%] (false,exec)
# ivtmp.16_116 = PHI <ivtmp.16_125(16), ivtmp.16_115(17)>
MEM[index: ivtmp.16_116] = dtd_56(D);
ivtmp.16_115 = ivtmp.16_116 + 1200;
if (ivtmp.27_12 == ivtmp.16_115)
goto <bb 18>;
else
goto <bb 17>;
# SUCC: 18 [1.0%] (true,exec) 17 [99.0%] (false,exec)
which is the block with the highest frequency (the innermost loop of the
2nd nest).
I can imagine that a lot of inlining, which exposes very deeply nested
loops alongside very hot but shallower loops, can cause the shallower
loops to become artificially cold.
Interestingly the outermost loop blocks do not have any frequency
assigned (that probably means zero).
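The decision described above reduces to a threshold test on the estimated block frequency. A minimal sketch in C, assuming the threshold of 11 quoted in this comment; the names ASSUMED_MIN_HOT_FREQ, bb_maybe_hot and would_expand_pow are hypothetical, and the real predicates in GCC take more inputs than a bare frequency:

```c
#include <stdbool.h>

/* Threshold taken from the comment above: a block whose estimated
   frequency is below 11 is treated as cold.  The name is hypothetical. */
enum { ASSUMED_MIN_HOT_FREQ = 11 };

/* Simplified stand-in for the hot/cold decision on a basic block. */
static bool bb_maybe_hot(int freq)
{
    return freq >= ASSUMED_MIN_HOT_FREQ;
}

/* Simplified stand-in for the guard on the pow expansion: expand only
   when unsafe math is allowed and the block is optimized for speed.
   The cost check on the number of multiplications is left out. */
static bool would_expand_pow(bool unsafe_math, int freq)
{
    return unsafe_math && bb_maybe_hot(freq);
}
```

With the frequencies shown in the dumps, block 7 (freq 3) fails the test and keeps its call to pow, while block 17 (freq 10000) would pass it.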
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever Confirmed|0 |1
Last reconfirmed|0000-00-00 00:00:00 |2009-10-18 13:22:22
date| |
* [Bug middle-end/40106] [4.4/4.5 Regression] Time increase for the Polyhedron test air.f90 due to bad optimization
From: rguenth at gcc dot gnu dot org @ 2009-12-15 16:40 UTC (permalink / raw)
To: gcc-bugs
------- Comment #34 from rguenth at gcc dot gnu dot org 2009-12-15 16:40 -------
4.4 is also slow and we know what causes it, so this can't be P1.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P1 |P2
* [Bug middle-end/40106] [4.4/4.5 Regression] Time increase for the Polyhedron test air.f90 due to bad optimization
From: jakub at gcc dot gnu dot org @ 2010-01-21 13:16 UTC (permalink / raw)
To: gcc-bugs
--
jakub at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.4.3 |4.4.4
* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
From: dominiq at lps dot ens dot fr @ 2010-02-25 17:20 UTC (permalink / raw)
To: gcc-bugs
------- Comment #35 from dominiq at lps dot ens dot fr 2010-02-25 17:20 -------
I changed the summary to reflect the status of this pr (see comment #31). I
think the following questions should be answered:
(a) why is optimize_insn_for_speed_p changed by the options
-funsafe-math-optimizations and -fno-strict-overflow?
(b) why has !optimize_size (lines 2929 and 2953 of
http://gcc.gnu.org/viewcvs/branches/gcc-4_3-branch/gcc/builtins.c?revision=151052&view=markup&sortby=file
last 4.3 revision) been replaced with optimize_insn_for_speed_p () (lines
2961 and 2985 of
http://gcc.gnu.org/viewcvs/branches/gcc-4_4-branch/gcc/builtins.c?revision=145122&view=markup&sortby=file
first 4.4 revision)?
Side question: is anybody really convinced that replacing pow(a,b) with
powi(a,n) when b==n is not always a win, even for -Os?
Note that the replacement for x**(n/3) * cbrt(x)**(n%3) does not seem to be guarded
by any optimisation flag.
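To make the size side of the question concrete, an integer power can be expanded into a short run of multiplications. A minimal sketch in C using plain binary exponentiation; the helper name powi_sketch is hypothetical, and the count it reports is that of this simple scheme (including the first multiplication by 1.0), which is only an assumption about how the cost could be estimated and may differ from what powi_cost in builtins.c computes:

```c
/* Hypothetical helper: compute x**n for n >= 0 by binary
   exponentiation and store the number of multiplications performed
   in *mults. */
static double powi_sketch(double x, unsigned n, unsigned *mults)
{
    double result = 1.0;
    double base = x;
    unsigned count = 0;

    while (n != 0) {
        if (n & 1u) {
            result *= base;   /* fold in the current power of x */
            count++;
        }
        n >>= 1;
        if (n != 0) {
            base *= base;     /* square for the next bit */
            count++;
        }
    }
    *mults = count;
    return result;
}
```

Whether such a run is smaller than a call to pow depends on n, which is presumably why the guard compares the cost against POWI_MAX_MULTS.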
--
dominiq at lps dot ens dot fr changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|[4.4/4.5 Regression] Time |[4.4/4.5 Regression] Weird
|increase for the Polyhedron |interaction between
|test air.f90 due to bad |optimize_insn_for_speed_p
|optimization |and -funsafe-math-
| |optimizations
* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
From: dominiq at lps dot ens dot fr @ 2010-03-16 15:07 UTC (permalink / raw)
To: gcc-bugs
------- Comment #36 from dominiq at lps dot ens dot fr 2010-03-16 15:06 -------
> Note that the replacement for x**(n/3) * cbrt(x)**(n%3) does not seem to be guarded
> by any optimisation flag.
The condition is implemented further down in the code and I missed it:
if (real_identical (&c2, &c)
&& ((optimize_insn_for_speed_p ()
&& powi_cost (n/3) <= POWI_MAX_MULTS)
|| n == 1))
Why the condition optimize_insn_for_speed_p () is not part of
if (fn != NULL_TREE
&& flag_unsafe_math_optimizations
&& (tree_expr_nonnegative_p (arg0)
|| !HONOR_NANS (mode)))
?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
` (38 preceding siblings ...)
2010-03-16 15:07 ` dominiq at lps dot ens dot fr
@ 2010-03-16 15:11 ` rguenther at suse dot de
2010-03-16 15:26 ` rguenth at gcc dot gnu dot org
` (22 subsequent siblings)
62 siblings, 0 replies; 64+ messages in thread
From: rguenther at suse dot de @ 2010-03-16 15:11 UTC (permalink / raw)
To: gcc-bugs
------- Comment #37 from rguenther at suse dot de 2010-03-16 15:11 -------
Subject: Re: [4.4/4.5 Regression] Weird interaction
between optimize_insn_for_speed_p and -funsafe-math-optimizations
On Tue, 16 Mar 2010, dominiq at lps dot ens dot fr wrote:
>
>
> ------- Comment #36 from dominiq at lps dot ens dot fr 2010-03-16 15:06 -------
> > Note that the replacement for x**(n/3) * cbrt(x)**(n%3) does not seem to be
> > guarded by any optimisation flag.
>
> The condition is implemented further down in the code and I missed it:
>
>   if (real_identical (&c2, &c)
>       && ((optimize_insn_for_speed_p ()
>            && powi_cost (n/3) <= POWI_MAX_MULTS)
>           || n == 1))
>
> Why is the condition optimize_insn_for_speed_p () not part of
>
>   if (fn != NULL_TREE
>       && flag_unsafe_math_optimizations
>       && (tree_expr_nonnegative_p (arg0)
>           || !HONOR_NANS (mode)))
>
> ?
Because we unconditionally want to turn pow (x, 1/3) to
cbrt (x) as it is smaller.
Richard.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
` (39 preceding siblings ...)
2010-03-16 15:11 ` rguenther at suse dot de
@ 2010-03-16 15:26 ` rguenth at gcc dot gnu dot org
2010-03-16 15:50 ` dominiq at lps dot ens dot fr
` (21 subsequent siblings)
62 siblings, 0 replies; 64+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-03-16 15:26 UTC (permalink / raw)
To: gcc-bugs
------- Comment #38 from rguenth at gcc dot gnu dot org 2010-03-16 15:26 -------
Btw, the testcase has
D.1610_34 = __builtin_pow (D.1564_28, 1.5e+0);
which would expand to
D.1564_28 * sqrt (D.1564_28)
which is estimated as being larger than the call to pow. Now this isn't exactly
true if the target has a sqrt insn, but we do not implement such a sophisticated
size check.
Especially on embedded targets with soft-float the multiplication would
add a significant code size penalty.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
` (40 preceding siblings ...)
2010-03-16 15:26 ` rguenth at gcc dot gnu dot org
@ 2010-03-16 15:50 ` dominiq at lps dot ens dot fr
2010-03-16 15:52 ` rguenther at suse dot de
` (20 subsequent siblings)
62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-03-16 15:50 UTC (permalink / raw)
To: gcc-bugs
------- Comment #39 from dominiq at lps dot ens dot fr 2010-03-16 15:49 -------
> Especially on embedded targets with soft-float the multiplication would
> add a significant code size penalty.
Even in this case this would strongly of the code. It may be true if other
pieces require log and exp. If not I seriously doubt that replacing the code
for multiplies and square roots will be larger than the code for log and exp.
My (very limited) understanding of this issue is that at some point x*sqrt(x)
is replaced with pow(x,1.5) (so that pow(x,a)*pow(x,b) is optimized as
pow(x,a+b)). So even if the programmer writes x*sqrt(x), (s)he can end up with
pow(x,1.5), resulting in poor performance in terms of both speed and size (not
to mention accuracy).
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
` (41 preceding siblings ...)
2010-03-16 15:50 ` dominiq at lps dot ens dot fr
@ 2010-03-16 15:52 ` rguenther at suse dot de
2010-03-16 16:04 ` dominiq at lps dot ens dot fr
` (19 subsequent siblings)
62 siblings, 0 replies; 64+ messages in thread
From: rguenther at suse dot de @ 2010-03-16 15:52 UTC (permalink / raw)
To: gcc-bugs
------- Comment #40 from rguenther at suse dot de 2010-03-16 15:52 -------
Subject: Re: [4.4/4.5 Regression] Weird interaction
between optimize_insn_for_speed_p and -funsafe-math-optimizations
On Tue, 16 Mar 2010, dominiq at lps dot ens dot fr wrote:
>
>
> ------- Comment #39 from dominiq at lps dot ens dot fr 2010-03-16 15:49 -------
> > Especially on embedded targets with soft-float the multiplication would
> > add a significant code size penalty.
>
> Even in this case this would strongly of the code. It may be true if other
> pieces require log and exp. If not I seriously doubt that replacing the code
> for multiplies and square roots will be larger than the code for log and exp.
Parse error.
> My (very limited) understanding of this issue is that at some point x*sqrt(x)
> is replaced with pow(x,1.5) (so that pow(x,a)*pow(x,b) is optimized as
> pow(x,a+b)). So even if the programmer writes x*sqrt(x), (s)he can end up with
> pow(x,1.5), resulting in poor performance in terms of both speed and size (not
> to mention accuracy).
Yes, that's true. This is what you'd expect when optimizing for size -
turn x*sqrt(x) to pow(x,1.5).
Richard.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
` (42 preceding siblings ...)
2010-03-16 15:52 ` rguenther at suse dot de
@ 2010-03-16 16:04 ` dominiq at lps dot ens dot fr
2010-03-16 16:07 ` rguenther at suse dot de
` (18 subsequent siblings)
62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-03-16 16:04 UTC (permalink / raw)
To: gcc-bugs
------- Comment #41 from dominiq at lps dot ens dot fr 2010-03-16 16:04 -------
> > > Especially on embedded targets with soft-float the multiplication would
> > > add a significant code size penalty.
> >
> > Even in this case this would strongly of the code. It may be true if other
> > pieces require log and exp. If not I seriously doubt that replacing the code
> > for multiplies and square roots will be larger than the code for log and exp.
>
> Parse error.
Sorry, is "strongly depend on the code" and "If not, I seriously doubt that
replacing the code for multiplies and square roots will be larger than the code
for log and exp." better?
pow(a,b) == exp(b*log(a)), so if 'a' is not a constant, you need the code for
log and exp to evaluate x*sqrt(x) as pow(x,1.5), instead of the code for
multiply and sqrt (note that I cannot see how the code for log and exp could
not require the code for multiply). If the log or exp code is not needed by
other parts of the whole program, x*sqrt(x) will almost certainly give more
compact code than pow(x,1.5).
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
` (43 preceding siblings ...)
2010-03-16 16:04 ` dominiq at lps dot ens dot fr
@ 2010-03-16 16:07 ` rguenther at suse dot de
2010-03-16 16:39 ` dominiq at lps dot ens dot fr
` (17 subsequent siblings)
62 siblings, 0 replies; 64+ messages in thread
From: rguenther at suse dot de @ 2010-03-16 16:07 UTC (permalink / raw)
To: gcc-bugs
------- Comment #42 from rguenther at suse dot de 2010-03-16 16:07 -------
Subject: Re: [4.4/4.5 Regression] Weird interaction
between optimize_insn_for_speed_p and -funsafe-math-optimizations
On Tue, 16 Mar 2010, dominiq at lps dot ens dot fr wrote:
>
>
> ------- Comment #41 from dominiq at lps dot ens dot fr 2010-03-16 16:04 -------
> > > > Especially on embedded targets with soft-float the multiplication would
> > > > add a significant code size penalty.
> > >
> > > Even in this case this would strongly of the code. It may be true if other
> > > pieces require log and exp. If not I seriously doubt that replacing the code
> > > for multiplies and square roots will be larger than the code for log and exp.
> >
> > Parse error.
>
> Sorry, is "strongly depend on the code" and "If not, I seriously doubt that
> replacing the code for multiplies and square roots will be larger than the code
> for log and exp." better?
>
> pow(a,b) == exp(b*log(a)), so if 'a' is not a constant, you need the code for
> log and exp to evaluate x*sqrt(x) as pow(x,1.5), instead of the code for
> multiply and sqrt (note that I cannot see how the code for log and exp could
> not require the code for multiply). If the log or exp code is not needed by
> other parts of the whole program, x*sqrt(x) will almost certainly give more
> compact code than pow(x,1.5).
log, exp? What code are you looking at now?
Richard.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
` (44 preceding siblings ...)
2010-03-16 16:07 ` rguenther at suse dot de
@ 2010-03-16 16:39 ` dominiq at lps dot ens dot fr
2010-03-16 16:59 ` jakub at gcc dot gnu dot org
` (16 subsequent siblings)
62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-03-16 16:39 UTC (permalink / raw)
To: gcc-bugs
------- Comment #43 from dominiq at lps dot ens dot fr 2010-03-16 16:38 -------
> log, exp? What code are you looking at now?
AFAIK all pow(a,b) boils down to exp(b*log(a)), unless special values of 'b'
(n, n/2.0, n/3.0, ...) are handled in a different way.
So from what I know about coding, replacing pow(a,b) with multiplications,
sqrt, and cbrt is almost always a win for speed (as shown by this PR, although
you can probably write corner cases for which it may not be true). For size,
the matter is more complicated and may depend on whether exp and log are used
in other parts of the program (how compact they are, whether they are static or
not, ...), thus I doubt that replacing x*sqrt(x) with pow(x,1.5) is always a win
for size, even for embedded systems with soft-floats.
It seems to me that controlling the constant exponents through a maximum
integer (instead of POWI_MAX_MULTS) depending on the kind of optimization and
the target would be a better solution
than n==1||optimize_insn_for_speed_p ().
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
` (45 preceding siblings ...)
2010-03-16 16:39 ` dominiq at lps dot ens dot fr
@ 2010-03-16 16:59 ` jakub at gcc dot gnu dot org
2010-03-16 17:14 ` dominiq at lps dot ens dot fr
` (15 subsequent siblings)
62 siblings, 0 replies; 64+ messages in thread
From: jakub at gcc dot gnu dot org @ 2010-03-16 16:59 UTC (permalink / raw)
To: gcc-bugs
------- Comment #44 from jakub at gcc dot gnu dot org 2010-03-16 16:58 -------
-Os optimizes the current translation unit for size; it doesn't (nor easily
can) guess whether you are linking against libm.a or libm.so, or whether, in
the former case, the call would be the only use of some routine (when linking
against a shared library this of course doesn't make any sense: you always get
it).
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
` (46 preceding siblings ...)
2010-03-16 16:59 ` jakub at gcc dot gnu dot org
@ 2010-03-16 17:14 ` dominiq at lps dot ens dot fr
2010-03-18 18:30 ` dominiq at lps dot ens dot fr
` (14 subsequent siblings)
62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-03-16 17:14 UTC (permalink / raw)
To: gcc-bugs
------- Comment #45 from dominiq at lps dot ens dot fr 2010-03-16 17:13 -------
> -Os optimizes the current translation unit for size; it doesn't (nor easily
> can) guess whether you are linking against libm.a or libm.so, or whether, in
> the former case, the call would be the only use of some routine (when linking
> against a shared library this of course doesn't make any sense: you always get
> it).
Yes, indeed! However, I am pretty sure that expanding pow(x,n) as an optimal
sequence of multiplies will always be a win (speed, size and accuracy) for at
least 8<n<15, i.e., a few multiplies on targets with hard-floats, and so on for
n/2.0 and n/3.0.
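As a rough way to see how few multiplications such expansions need, the sketch below counts the multiplications used by the plain square-and-multiply method. This is an assumption-laden illustration: binary_powi_mults is a made-up name, and the count is only an upper bound on an optimal sequence, not the value GCC's powi_cost would return.

```c
/* Number of multiplications the plain square-and-multiply method needs
   to compute x**n.  Returns 0 for n == 0 and n == 1.  This is an upper
   bound on the length of an optimal multiplication sequence; it is not
   GCC's powi_cost.  */
unsigned
binary_powi_mults (unsigned n)
{
  unsigned squarings = 0;
  unsigned extra = 0;

  if (n == 0)
    return 0;
  while (n > 1)
    {
      extra += n & 1;   /* one extra multiply for each set low bit */
      squarings++;      /* one squaring per halving step */
      n >>= 1;
    }
  return squarings + extra;
}
```

With this method every exponent in the range 8 < n < 15 costs at most five multiplications.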
One of my concerns related to this PR is that I don't know how to use generic
optimization and at the same time keep x*sqrt(x) instead of pow(x,1.5) if I
know that this is the right thing to do for my target.
I also think it is important to answer my "when" and "why" questions in
http://gcc.gnu.org/ml/gcc/2010-03/msg00179.html .
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
` (47 preceding siblings ...)
2010-03-16 17:14 ` dominiq at lps dot ens dot fr
@ 2010-03-18 18:30 ` dominiq at lps dot ens dot fr
2010-03-19 10:26 ` rguenth at gcc dot gnu dot org
` (13 subsequent siblings)
62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-03-18 18:30 UTC (permalink / raw)
To: gcc-bugs
------- Comment #46 from dominiq at lps dot ens dot fr 2010-03-18 18:29 -------
The answer to the question (b) in comment #35:
> (b) why !optimize_size has been replaced with optimize_insn_for_speed_p ()?
seems to be
> this patch replace some of optimize_size tests by
> optimize_insn_for_speed_p predicate so we can make decisions on per-BB
> granuality.
from http://gcc.gnu.org/ml/gcc-patches/2008-08/msg00121.html (revision 138565
by hubicka, Sun Aug 3 12:04:49 2008 UTC).
Why is there any need to expand pow(x,n) "on per-BB granularity"? Is
!optimize_size not enough for this case?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
` (48 preceding siblings ...)
2010-03-18 18:30 ` dominiq at lps dot ens dot fr
@ 2010-03-19 10:26 ` rguenth at gcc dot gnu dot org
2010-03-19 10:35 ` rguenth at gcc dot gnu dot org
` (12 subsequent siblings)
62 siblings, 0 replies; 64+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-03-19 10:26 UTC (permalink / raw)
To: gcc-bugs
------- Comment #47 from rguenth at gcc dot gnu dot org 2010-03-19 10:26 -------
(In reply to comment #46)
> The answer to the question (b) in comment #35:
>
> > (b) why !optimize_size has been replaced with optimize_insn_for_speed_p ()?
>
> seems to be
>
> > this patch replace some of optimize_size tests by
> > optimize_insn_for_speed_p predicate so we can make decisions on per-BB
> > granuality.
>
> from http://gcc.gnu.org/ml/gcc-patches/2008-08/msg00121.html (revision 138565
> by hubicka, Sun Aug 3 12:04:49 2008 UTC).
>
> Why is there any need to expand pow(x,n) "on per-BB granularity"? Is
> !optimize_size not enough for this case?
optimize_insn_for_speed_p is more precise in that it allows hot functions
to be optimized for speed even with -Os. This is quite important for
embedded targets where you generally want to optimize for size but want
performance sensitive parts to be optimized for speed.
I think there are two good solutions to this PR.
1) re-work how the profile is computed for deep loop nests
2) improve the code-size estimate of these expanders (a simple convincing
heuristic is that if the target has an optab for sqrt then x * sqrt (x)
is not going to be larger than pow(x, 1.5)).
2) would fix the air case but not really the underlying problem which is
1). 2) would be easy to implement and appropriate for 4.5 - I can't see
how to address 1) with a reasonably sized patch.
Note that this PR isn't too serious as -fwhole-file isn't the default
for Fortran so we do not run into this unfortunate interaction of
profile estimation and inlining.
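The heuristic in 2) can be sketched as a standalone decision function. This is only a model under stated assumptions: struct expand_ctx_sketch and its fields are hypothetical stand-ins for the compiler state (flag_unsafe_math_optimizations, optimize_insn_for_speed_p (), and "the target has an optab for sqrt"), and the powi_cost check is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the state the real expander consults.  */
struct expand_ctx_sketch
{
  bool unsafe_math;          /* models flag_unsafe_math_optimizations */
  bool insn_for_speed;       /* models optimize_insn_for_speed_p () */
  bool target_has_sqrt_insn; /* models "sqrt will not be a libcall" */
};

/* Decide whether pow (x, 1.5) should be expanded as x * sqrt (x).  */
bool
expand_pow_1_5_sketch (const struct expand_ctx_sketch *ctx)
{
  /* Speed path: the existing guard (powi_cost check omitted here).  */
  if (ctx->unsafe_math && ctx->insn_for_speed)
    return true;
  /* Size path from 2): with a sqrt instruction, x * sqrt (x) is assumed
     to be no larger than a call to pow.  */
  return ctx->target_has_sqrt_insn;
}
```

Under this model a block that is considered cold still gets the expansion on targets with a hardware sqrt, while soft-float targets keep the call to pow.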
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
` (49 preceding siblings ...)
2010-03-19 10:26 ` rguenth at gcc dot gnu dot org
@ 2010-03-19 10:35 ` rguenth at gcc dot gnu dot org
2010-03-19 15:40 ` dominiq at lps dot ens dot fr
` (11 subsequent siblings)
62 siblings, 0 replies; 64+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-03-19 10:35 UTC (permalink / raw)
To: gcc-bugs
------- Comment #48 from rguenth at gcc dot gnu dot org 2010-03-19 10:35 -------
Untested patch doing 2):
Index: builtins.c
===================================================================
--- builtins.c (revision 157561)
+++ builtins.c (working copy)
@@ -2980,10 +2980,16 @@ expand_builtin_pow (tree exp, rtx target
&& ((flag_unsafe_math_optimizations
&& optimize_insn_for_speed_p ()
&& powi_cost (n/2) <= POWI_MAX_MULTS)
- /* Even the c==0.5 case cannot be done unconditionally
+ /* Even the c == 0.5 case cannot be done unconditionally
when we need to preserve signed zeros, as
pow (-0, 0.5) is +0, while sqrt(-0) is -0. */
- || (!HONOR_SIGNED_ZEROS (mode) && n == 1)))
+ || (!HONOR_SIGNED_ZEROS (mode) && n == 1)
+ /* For c == 1.5 we can assume that x * sqrt (x) is always
+ smaller than pow (x, 1.5) if sqrt will not be expanded
+ as a call. */
+ || (n == 2
+ && (optab_handler (sqrt_optab, mode)->insn_code
+ != CODE_FOR_nothing))))
{
tree call_expr = build_call_nofold (fn, 1, narg0);
/* Use expand_expr in case the newly built call expression
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
` (50 preceding siblings ...)
2010-03-19 10:35 ` rguenth at gcc dot gnu dot org
@ 2010-03-19 15:40 ` dominiq at lps dot ens dot fr
2010-03-20 13:03 ` dominiq at lps dot ens dot fr
` (10 subsequent siblings)
62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-03-19 15:40 UTC (permalink / raw)
To: gcc-bugs
------- Comment #49 from dominiq at lps dot ens dot fr 2010-03-19 15:40 -------
A few remarks about comments #47 and #48
> Note that this PR isn't too serious as -fwhole-file isn't the default
> for Fortran so we do not run into this unfortunate interaction of
> profile estimation and inlining.
The test in comment #31 shows that you don't need -fwhole-file nor inlining to
trigger this PR.
> + || (n == 2
Isn't it n==3?
I have done some tests replacing 1.5 in the test in comment #31 with some
other values (up to 15.5, but not in a systematic way). On
x86_64-apple-darwin10, the multiplications are always a win for size (based on
the size of a.out) over the call to pow. Is my metric flawed? If yes, what
should I use? If not, could embedded-system experts have a look at this kind of
optimization?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
` (51 preceding siblings ...)
2010-03-19 15:40 ` dominiq at lps dot ens dot fr
@ 2010-03-20 13:03 ` dominiq at lps dot ens dot fr
2010-03-20 13:21 ` dominiq at lps dot ens dot fr
` (9 subsequent siblings)
62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-03-20 13:03 UTC (permalink / raw)
To: gcc-bugs
------- Comment #50 from dominiq at lps dot ens dot fr 2010-03-20 13:02 -------
> optimize_insn_for_speed_p is more precise in that it allows hot functions
> to be optimized for speed even with -Os. This is quite important for
> embedded targets where you generally want to optimize for size but want
> performance sensitive parts to be optimized for speed.
If so, should not
return optimize_function_for_size_p (cfun) || !crtl->maybe_hot_insn_p;
be
return optimize_function_for_size_p (cfun) && !crtl->maybe_hot_insn_p;
i.e., true only if optimize_function_for_size_p is true AND
crtl->maybe_hot_insn_p false?
In the same line, should not
bool
optimize_function_for_size_p (struct function *fun)
{
  return (optimize_size
          || (fun && (fun->function_frequency
                      == FUNCTION_FREQUENCY_UNLIKELY_EXECUTED)));
}
be
bool
optimize_function_for_size_p (struct function *fun)
{
  return (optimize_size
          && (fun && (fun->function_frequency
                      == FUNCTION_FREQUENCY_UNLIKELY_EXECUTED)));
}
?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
` (52 preceding siblings ...)
2010-03-20 13:03 ` dominiq at lps dot ens dot fr
@ 2010-03-20 13:21 ` dominiq at lps dot ens dot fr
2010-03-20 14:19 ` rguenther at suse dot de
` (8 subsequent siblings)
62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-03-20 13:21 UTC (permalink / raw)
To: gcc-bugs
------- Comment #51 from dominiq at lps dot ens dot fr 2010-03-20 13:21 -------
The following patch fixes this pr:
--- ../_clean/gcc/predict.c 2009-11-25 18:20:33.000000000 +0100
+++ gcc/predict.c 2010-03-20 14:03:33.000000000 +0100
@@ -251,7 +251,7 @@ optimize_edge_for_speed_p (edge e)
bool
optimize_insn_for_size_p (void)
{
- return optimize_function_for_size_p (cfun) || !crtl->maybe_hot_insn_p;
+ return optimize_function_for_size_p (cfun) && !crtl->maybe_hot_insn_p;
}
/* Return TRUE when BB should be optimized for speed. */
If the optimize_*_p procs are intended to allow optimization for speed with -Os
in "hot" parts of the code, it seems that the logic of the implementation should
be checked carefully.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
` (53 preceding siblings ...)
2010-03-20 13:21 ` dominiq at lps dot ens dot fr
@ 2010-03-20 14:19 ` rguenther at suse dot de
2010-03-20 14:40 ` dominiq at lps dot ens dot fr
` (7 subsequent siblings)
62 siblings, 0 replies; 64+ messages in thread
From: rguenther at suse dot de @ 2010-03-20 14:19 UTC (permalink / raw)
To: gcc-bugs
------- Comment #52 from rguenther at suse dot de 2010-03-20 14:19 -------
Subject: Re: [4.4/4.5 Regression] Weird interaction
between optimize_insn_for_speed_p and -funsafe-math-optimizations
On Sat, 20 Mar 2010, dominiq at lps dot ens dot fr wrote:
> ------- Comment #51 from dominiq at lps dot ens dot fr 2010-03-20 13:21 -------
> The following patch fixes this pr:
>
> --- ../_clean/gcc/predict.c 2009-11-25 18:20:33.000000000 +0100
> +++ gcc/predict.c 2010-03-20 14:03:33.000000000 +0100
> @@ -251,7 +251,7 @@ optimize_edge_for_speed_p (edge e)
> bool
> optimize_insn_for_size_p (void)
> {
> - return optimize_function_for_size_p (cfun) || !crtl->maybe_hot_insn_p;
> + return optimize_function_for_size_p (cfun) && !crtl->maybe_hot_insn_p;
> }
>
> /* Return TRUE when BB should be optimized for speed. */
>
> If the optimize_*_p procs are intended to allow optimization for speed with -Os
> in "hot" parts of the code, it seems that the logic of the implementation should
> be checked carefully.
optimize_function_for_size_p (cfun) is true if attribute(cold) is set
on it or we are optimizing for size.
The only issue that exists with the predicates is that they are
implemented symmetrically (optimize_*_for_speed_p is
!optimize_*_for_size_p) but the low-level implementations check
for extremes like FUNCTION_FREQUENCY_UNLIKELY_EXECUTED where
negation would be FUNCTION_FREQUENCY_HOT, not
FUNCTION_FREQUENCY_HOT || FUNCTION_FREQUENCY_NORMAL.
Thus, for example optimize_function_for_size_p would better read
  if (fun && fun->function_frequency
             == FUNCTION_FREQUENCY_UNLIKELY_EXECUTED)
    return true;
  else if (fun && fun->function_frequency == FUNCTION_FREQUENCY_HOT)
    return false;
  return optimize_size;
thus optimize_size should be the default that applies when the
(guessed) profile doesn't give a strong hint.
Likewise optimize_bb_for_size_p needs to disregard the case where
optimize_function_for_size_p returns optimize_size and only then
ask maybe_hot_bb_p. Thus there should be low-level fns that
return a tri-state, true, false and "default".
But that's all too much change for 4.5. You could possibly
experiment with adjusting just optimize_function_for_size_p as indicated
above.
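A self-contained sketch of the tri-state idea, for illustration only: the enums and function names below are hypothetical and merely model the function_frequency values, they are not the predict.c interfaces.

```c
#include <stdbool.h>

/* Hypothetical model of the three function frequency classes.  */
enum freq_sketch { FREQ_SKETCH_UNLIKELY, FREQ_SKETCH_NORMAL, FREQ_SKETCH_HOT };

/* Tri-state answer: the profile says "no", "yes", or gives no strong hint.  */
enum size_hint { SIZE_HINT_NO, SIZE_HINT_YES, SIZE_HINT_DEFAULT };

/* Low-level query: only the extremes produce a definite answer.  */
enum size_hint
function_size_hint (enum freq_sketch freq)
{
  switch (freq)
    {
    case FREQ_SKETCH_UNLIKELY:
      return SIZE_HINT_YES;
    case FREQ_SKETCH_HOT:
      return SIZE_HINT_NO;
    default:
      return SIZE_HINT_DEFAULT;
    }
}

/* High-level predicate: fall back to optimize_size when the (guessed)
   profile gives no strong hint.  */
bool
function_for_size_sketch (bool optimize_size, enum freq_sketch freq)
{
  enum size_hint hint = function_size_hint (freq);
  if (hint == SIZE_HINT_DEFAULT)
    return optimize_size;
  return hint == SIZE_HINT_YES;
}
```

With this shape the speed predicate can remain the negation of the size predicate, because the "default" case is resolved before negating.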
Richard.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
` (54 preceding siblings ...)
2010-03-20 14:19 ` rguenther at suse dot de
@ 2010-03-20 14:40 ` dominiq at lps dot ens dot fr
2010-03-20 15:00 ` rguenth at gcc dot gnu dot org
` (6 subsequent siblings)
62 siblings, 0 replies; 64+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-03-20 14:40 UTC (permalink / raw)
To: gcc-bugs
------- Comment #53 from dominiq at lps dot ens dot fr 2010-03-20 14:40 -------
> optimize_function_for_size_p (cfun) is true if attribute(cold) is set
> on it or we are optimizing for size.
It is what is presently implemented. As a consequence (illustrated by this pr),
optimize for speed is not obeyed if attribute(cold) is set on. I don't see the
interest of that: If I want optimization for speed, I just want it.
From comment #47, I got the impression that the intended behavior is the
following:
if optimized for size is on (-Os) then it is overridden if the block is marked
as "hot" (it is not clear for me that it is !attribute(cold)). From this
impression the truth table I expect is the following for
optimize_function_for_size_p:
"hot"      0  1
-Os        1  0
-O[1-3]    0  0
and not
"cold"     0  1
-Os        1  1
-O[1-3]    0  1
as presently implemented.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
` (55 preceding siblings ...)
2010-03-20 14:40 ` dominiq at lps dot ens dot fr
@ 2010-03-20 15:00 ` rguenth at gcc dot gnu dot org
2010-03-20 15:12 ` rguenth at gcc dot gnu dot org
` (5 subsequent siblings)
62 siblings, 0 replies; 64+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-03-20 15:00 UTC (permalink / raw)
To: gcc-bugs
------- Comment #54 from rguenth at gcc dot gnu dot org 2010-03-20 14:59 -------
(In reply to comment #53)
> > optimize_function_for_size_p (cfun) is true if attribute(cold) is set
> > on it or we are optimizing for size.
>
> It is what is presently implemented. As a consequence (illustrated by this pr),
> optimize for speed is not obeyed if attribute(cold) is set on. I don't see the
> interest of that: If I want optimization for speed, I just want it.
>
> From comment #47, I got the impression that the intended behavior is the
> following:
> if optimized for size is on (-Os) then it is overridden if the block is marked
> as "hot" (it is not clear for me that it is !attribute(cold)). From this
> impression the truth table I expect is the following for
> optimize_function_for_size_p:
>
> "hot" 0 1
> -Os 1 0
> -O[1-3] 0 0
>
> and not
>
> "cold" 0 1
> -Os 1 1
> -O[1-3] 0 1
>
> as presently implemented.
The intent is
          "hot"  "cold"  nothing
-Os         0      1        1
-O[1-3]     0      1        0
implemented is as far as I see
          "hot"  "cold"  nothing
-Os         1      1        1
-O[1-3]     0      1        0
thus optimize_function_for_{size,speed}_p fully correct for -O[1-3].
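The two tables can be compared mechanically. The sketch below encodes the intended table as data and the implemented behavior as the expression quoted earlier in the thread (optimize_size || unlikely-executed); all names are made up for this illustration.

```c
#include <stdbool.h>

/* Column order of the tables: "hot", "cold", nothing.  */
enum hint_sketch { HINT_HOT, HINT_COLD, HINT_NONE };

/* Implemented behavior of the size predicate as quoted in this thread:
   true with -Os, or when the function is marked cold.  */
bool
implemented_for_size (bool optimize_size, enum hint_sketch hint)
{
  return optimize_size || hint == HINT_COLD;
}

/* Intended table: row 0 is -O[1-3], row 1 is -Os.  */
const bool intended_for_size[2][3] = {
  { false, true, false },
  { false, true, true },
};

/* Count the table cells where implemented and intended disagree.  */
int
count_mismatches (void)
{
  int mismatches = 0;
  for (int os = 0; os < 2; os++)
    for (int hint = HINT_HOT; hint <= HINT_NONE; hint++)
      if (implemented_for_size (os != 0, (enum hint_sketch) hint)
          != intended_for_size[os][hint])
        mismatches++;
  return mismatches;
}
```

The only disagreement is the -Os / "hot" cell, which matches the statement that the -O[1-3] row is already correct.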
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
2009-05-11 18:04 [Bug middle-end/40106] New: Time increase with inlining for the Polyhedron test air.f90 dominiq at lps dot ens dot fr
` (56 preceding siblings ...)
2010-03-20 15:00 ` rguenth at gcc dot gnu dot org
@ 2010-03-20 15:12 ` rguenth at gcc dot gnu dot org
2010-03-22 10:36 ` rguenth at gcc dot gnu dot org
` (4 subsequent siblings)
62 siblings, 0 replies; 64+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-03-20 15:12 UTC (permalink / raw)
To: gcc-bugs
------- Comment #55 from rguenth at gcc dot gnu dot org 2010-03-20 15:12 -------
(In reply to comment #54)
> (In reply to comment #53)
> > > optimize_function_for_size_p (cfun) is true if attribute(cold) is set
> > > on it or we are optimizing for size.
> >
> > It is what is presently implemented. As a consequence (illustrated by this pr),
> > optimize for speed is not obeyed if attribute(cold) is set on. I don't see the
> > interest of that: If I want optimization for speed, I just want it.
> >
> > From comment #47, I got the impression that the intended behavior is the
> > following:
> > if optimized for size is on (-Os) then it is overridden if the block is marked
> > as "hot" (it is not clear for me that it is !attribute(cold)). From this
> > impression the truth table I expect is the following for
> > optimize_function_for_size_p:
> >
> > "hot"      0  1
> > -Os        1  0
> > -O[1-3]    0  0
> >
> > and not
> >
> > "cold"     0  1
> > -Os        1  1
> > -O[1-3]    0  1
> >
> > as presently implemented.
>
> The intent is
>
>           "hot"  "cold"  nothing
> -Os         0      1        1
> -O[1-3]     0      1        0
>
> implemented is as far as I see
>
>           "hot"  "cold"  nothing
> -Os         1      1        1
> -O[1-3]     0      1        0
>
> thus optimize_function_for_{size,speed}_p fully correct for -O[1-3].
The issue is the || !crtl->maybe_hot_insn_p in optimize_insn_for_size_p,
which boils down to !maybe_hot_frequency_p (bb->freq), which ends with

  if (freq < BB_FREQ_MAX / PARAM_VALUE (HOT_BB_FREQUENCY_FRACTION))
    return false;
  return true;

Thus it really only tells whether a frequency is hot or not; its negation
doesn't automatically mean that the frequency is cold.

Thus, maybe_hot_bb_p should properly honor [!]optimize_size for the
default case where a bb is neither hot nor cold.

In the end this won't save us from the underlying issue in this PR,
where frequency scaling makes blocks appear cold when they are not,
simply due to the loop depth predictors (they should maybe be limited
to a loop depth of 3 or so). And this is really Honza's area of
expertise (well, at least it's all his code).
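
To make the "not hot is not the same as cold" point concrete, here is a
minimal C sketch. It is not GCC source. The values of BB_FREQ_MAX and
HOT_BB_FREQUENCY_FRACTION are assumed for illustration, and
probably_cold_frequency_p, naive_for_size_p and three_way_for_size_p are
hypothetical names. The three-way variant is one possible reading of the
suggestion above, not the fix that was committed.

```c
#include <stdbool.h>

/* Assumed values, for illustration only.  */
#define BB_FREQ_MAX 10000
#define HOT_BB_FREQUENCY_FRACTION 1000

/* Same test as the tail of maybe_hot_frequency_p quoted above.  */
static bool
maybe_hot_frequency_p (int freq)
{
  if (freq < BB_FREQ_MAX / HOT_BB_FREQUENCY_FRACTION)
    return false;
  return true;
}

/* Hypothetical "known cold" test: only a zero estimate counts as cold.  */
static bool
probably_cold_frequency_p (int freq)
{
  return freq == 0;
}

/* Two-way decision: everything that is not hot is treated as cold.  */
static bool
naive_for_size_p (bool optimize_size, int freq)
{
  return optimize_size || !maybe_hot_frequency_p (freq);
}

/* Three-way decision: blocks that are neither hot nor cold fall back
   to the command-line setting.  */
static bool
three_way_for_size_p (bool optimize_size, int freq)
{
  if (maybe_hot_frequency_p (freq))
    return false;
  if (probably_cold_frequency_p (freq))
    return true;
  return optimize_size;
}
```

With these assumed values the hot threshold is 10, so a block with frequency 5
is optimized for size by the two-way test even at -O[1-3], while the three-way
test leaves it optimized for speed.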
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
From: rguenth at gcc dot gnu dot org @ 2010-03-22 10:36 UTC (permalink / raw)
To: gcc-bugs
------- Comment #56 from rguenth at gcc dot gnu dot org 2010-03-22 10:36 -------
I'm testing a fixed version of comment #48.
--
rguenth at gcc dot gnu dot org changed:
What                  |Removed                         |Added
----------------------------------------------------------------------------
AssignedTo            |hubicka at gcc dot gnu dot org  |rguenth at gcc dot gnu dot org
Status                |NEW                             |ASSIGNED
Last reconfirmed date |2009-10-18 13:22:22             |2010-03-22 10:36:35
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] [4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
From: rguenth at gcc dot gnu dot org @ 2010-03-22 12:38 UTC (permalink / raw)
To: gcc-bugs
------- Comment #57 from rguenth at gcc dot gnu dot org 2010-03-22 12:38 -------
Subject: Bug 40106
Author: rguenth
Date: Mon Mar 22 12:38:02 2010
New Revision: 157623
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=157623
Log:
2010-03-22  Richard Guenther  <rguenther@suse.de>

        PR middle-end/40106
        * builtins.c (expand_builtin_pow): Expand pow (x, 1.5) as
        x * sqrt (x) even when optimizing for size if the target
        has native support for sqrt.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/builtins.c
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] [4.4 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
From: rguenth at gcc dot gnu dot org @ 2010-03-22 12:39 UTC (permalink / raw)
To: gcc-bugs
------- Comment #58 from rguenth at gcc dot gnu dot org 2010-03-22 12:39 -------
Fixed for 4.5.
--
rguenth at gcc dot gnu dot org changed:
What          |Removed                         |Added
----------------------------------------------------------------------------
AssignedTo    |rguenth at gcc dot gnu dot org  |unassigned at gcc dot gnu dot org
Status        |ASSIGNED                        |NEW
Known to work |                                |4.5.0
Summary       |[4.4/4.5 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations |[4.4 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] [4.4 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
From: hubicka at gcc dot gnu dot org @ 2010-03-25 17:38 UTC (permalink / raw)
To: gcc-bugs
------- Comment #59 from hubicka at gcc dot gnu dot org 2010-03-25 17:37 -------
Hi,
concerning the optimize_*_for_size and maybe_hot_*_p predicates, the idea is
that maybe_hot/probably_cold care about the profile alone. So when optimizing
for size, parts of the program can still be considered hot, and optimizers can
use this if doing so does not increase code size (e.g. one can trade a copy in
a hot block for a copy in a cold block even at -Os).
optimize_*_for_size should be aware of the defaults: with -Os everything is by
default optimized for size unless the user asks otherwise, and with other levels
only probably-cold stuff (that is, the negation of maybe_hot) is optimized for
size.
Let me check if there are some problems, but I guess this is just a problem with
too many nested loops leading to too large frequency differences.
Honza
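
The nested-loop effect mentioned above can be illustrated with a rough C
sketch. It is not GCC's predictor. It assumes that every loop level is
estimated to iterate 10 times, that the hottest block is normalised to
BB_FREQ_MAX, and that the two macro values below match the defaults. All
three are assumptions, and both function names are hypothetical.

```c
#include <stdbool.h>

/* Assumed values, for illustration only.  */
#define BB_FREQ_MAX 10000
#define HOT_BB_FREQUENCY_FRACTION 1000
#define ASSUMED_ITERATIONS_PER_LOOP 10

/* Estimated frequency of a block at loop depth DEPTH when the deepest
   nest in the function has depth MAX_DEPTH and its innermost block is
   normalised to BB_FREQ_MAX.  */
static int
scaled_frequency (int depth, int max_depth)
{
  int freq = BB_FREQ_MAX;
  for (int d = depth; d < max_depth; d++)
    freq /= ASSUMED_ITERATIONS_PER_LOOP;
  return freq;
}

/* Same threshold comparison as maybe_hot_frequency_p.  */
static bool
looks_hot_p (int depth, int max_depth)
{
  return scaled_frequency (depth, max_depth)
         >= BB_FREQ_MAX / HOT_BB_FREQUENCY_FRACTION;
}
```

Under these assumptions, code at depth 0 next to a 3-deep nest still passes
the threshold (frequency 10), but next to a 4-deep nest it drops to frequency
1 and is treated as not hot, although nothing about the block itself changed.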
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106
* [Bug middle-end/40106] [4.4 Regression] Weird interaction between optimize_insn_for_speed_p and -funsafe-math-optimizations
From: jakub at gcc dot gnu dot org @ 2010-04-30 9:01 UTC (permalink / raw)
To: gcc-bugs
--
jakub at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.4.4 |4.4.5
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40106