* [Bug lto/57290] New: [4.9 Regression] After r198333 the aermod runtime is ~10% slower when compiled with -fprotect-parens and -flto
@ 2013-05-15 14:02 dominiq at lps dot ens.fr
  2013-05-15 14:08 ` [Bug lto/57290] " rguenther at suse dot de
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: dominiq at lps dot ens.fr @ 2013-05-15 14:02 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57290

            Bug ID: 57290
           Summary: [4.9 Regression] After r198333 the aermod runtime is
                    ~10% slower when compiled with -fprotect-parens and
                    -flto
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: lto
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dominiq at lps dot ens.fr
                CC: rguenth at gcc dot gnu.org

After r198333, the aermod runtime is more than 10% slower when compiled with
-fprotect-parens and -flto:

(1) -Ofast -funroll-loops
(2) -fprotect-parens -Ofast -funroll-loops
(3) -Ofast -funroll-loops -fwhole-program -flto
(4) -fprotect-parens -Ofast -funroll-loops -fwhole-program
(5) -fprotect-parens -Ofast -funroll-loops -fwhole-program -flto

revision:   198332  198333

(1)          18.11   17.74
(2)          17.70   17.61
(3)          17.66   18.34
(4)          18.47   18.49
(5)          17.80   20.70
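
For reference, the relative change per configuration can be worked out from the table above. A minimal sketch in Python; the dictionary layout and function name are illustrative and not part of the report:

```python
# Timings in seconds copied from the table above: (r198332, r198333).
timings = {
    1: (18.11, 17.74),
    2: (17.70, 17.61),
    3: (17.66, 18.34),
    4: (18.47, 18.49),
    5: (17.80, 20.70),
}


def percent_change(before, after):
    """Relative change in percent; a positive value means slower."""
    return (after - before) / before * 100.0


# Map each configuration number to its change between the two revisions.
changes = {cfg: percent_change(b, a) for cfg, (b, a) in timings.items()}

for cfg, pct in changes.items():
    print(f"({cfg}) {pct:+.1f}%")
```

With these numbers, (5) is roughly 16% slower at r198333 than at r198332, while (2) and (4) change by well under 1%.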



* [Bug lto/57290] [4.9 Regression] After r198333 the aermod runtime is ~10% slower when compiled with -fprotect-parens and -flto
From: rguenther at suse dot de @ 2013-05-15 14:08 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57290

--- Comment #1 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 15 May 2013, dominiq at lps dot ens.fr wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57290
> 
>             Bug ID: 57290
>            Summary: [4.9 Regression] After r198333 the aermod runtime is
>                     ~10% slower when compiled with -fprotect-parens and
>                     -flto
>            Product: gcc
>            Version: 4.9.0
>             Status: UNCONFIRMED
>           Severity: normal
>           Priority: P3
>          Component: lto
>           Assignee: unassigned at gcc dot gnu.org
>           Reporter: dominiq at lps dot ens.fr
>                 CC: rguenth at gcc dot gnu.org
> 
> After r198333, the aermod runtime is more than 10% slower when compiled with
> -fprotect-parens and -flto:
> 
> (1) -Ofast -funroll-loops
> (2) -fprotect-parens -Ofast -funroll-loops
> (3) -Ofast -funroll-loops -fwhole-program -flto
> (4) -fprotect-parens -Ofast -funroll-loops -fwhole-program
> (5) -fprotect-parens -Ofast -funroll-loops -fwhole-program -flto
> 
> revision:   198332  198333
> 
> (1)          18.11   17.74
> (2)          17.70   17.61
> (3)          17.66   18.34
> (4)          18.47   18.49
> (5)          17.80   20.70

There is a lot of noise in these numbers(?).  The patch, apart from

+       * passes.c (init_optimization_passes): Schedule a copy-propagation
+       pass before complete unrolling of inner loops.

should have had no effect on performance (well, in theory, that is).
Can you check whether reverting the above part changes the results?

Also, what's the variance of the numbers?  Are (1) to (4) effectively
the same performance r198332 vs. r198333?  (make sure to disable
address-space randomization for benchmarking)
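
One way to answer the variance question is to time several runs of each executable and report the mean and the sample standard deviation. A minimal sketch in Python, assuming the benchmark is started by a plain command line; the helper names and the `./a.out` path in the comment are placeholders, not something from this report:

```python
import statistics
import subprocess
import time


def time_runs(cmd, repeats=5):
    """Run cmd `repeats` times and return the wall-clock times in seconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard the program's output so printing does not dominate the timing.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return times


def summarize(times):
    """Return (mean, sample standard deviation) of a list of timings."""
    return statistics.mean(times), statistics.stdev(times)


# Hypothetical usage:
#   mean, spread = summarize(time_runs(["./a.out"], repeats=5))
```

This only measures wall-clock time; it does nothing about address-space randomization, which has to be handled by the platform's own means.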



* [Bug lto/57290] [4.9 Regression] After r198333 the aermod runtime is ~10% slower when compiled with -fprotect-parens and -flto
From: rguenth at gcc dot gnu.org @ 2013-05-15 14:26 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57290

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.9.0



* [Bug lto/57290] [4.9 Regression] After r198333 the aermod runtime is ~10% slower when compiled with -fprotect-parens and -flto
From: dominiq at lps dot ens.fr @ 2013-05-15 17:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57290

--- Comment #2 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
> There is a lot of noise in these numbers(?)

Well, AFAICT aermod.f90 has a "non-monotonic" behavior for the different
optimizations: when playing with --param max-inline-insns-auto=xx, the
execution time was not decreasing for increasing xx, but went up or down
depending on which routine was inlined.

> the patch, apart from
>
> +       * passes.c (init_optimization_passes): Schedule a copy-propagation
> +       pass before complete unrolling of inner loops.
>
> should have had no effect on performance (well, in theory, that is).
> Can you check whether reverting the above part changes the results?

Nope

> Also, what's the variance of the numbers? 

Below 0.1s. 

> Are (1) to (4) effectively
> the same performance r198332 vs. r198333?  

Yes for (2) and (4). For (1) and (3), I think the performance differs
slightly. What triggered this PR is (5) versus (3), i.e., with versus without
-fprotect-parens under -fwhole-program -flto (can you reproduce it?).

> (make sure to disable
> address-space randomization for benchmarking)

I don't really know what you are talking about (I am using Darwin).

Profiling the executable obtained with -fprotect-parens -Ofast -funroll-loops
-ftree-loop-linear -fomit-frame-pointer -fwhole-program -flto gives

- 21.8%, iblval_.lto_priv.516, a.out
- 12.7%, sigz_.lto_priv.419, a.out
- 12.7%, powf$fenv_access_off, libSystem.B.dylib
  12.4%, anyavg_.constprop.50, a.out
- 5.6%, plumef_.lto_priv.580, a.out

and with -Ofast -funroll-loops -ftree-loop-linear -fomit-frame-pointer
-fwhole-program -flto:

- 14.7%, powf$fenv_access_off, libSystem.B.dylib
- 14.5%, iblval_.lto_priv.284, a.out
- 13.8%, sigz_.lto_priv.290, a.out
  13.7%, anyavg_.constprop.50, a.out
- 4.8%, refl_ht_.lto_priv.281, a.out
- 4.7%, rmssig_.lto_priv.298, a.out
  3.1%, _gfortran_compare_string, libgfortran.3.dylib

The iblval subroutine takes ~4.5s for the first set of options and ~2.6s for
the second one.
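
The two figures can be roughly cross-checked from the profile shares of iblval. A small sketch in Python, under the assumption that the total runtimes are close to the 20.70s and 18.34s measured for (5) and (3) at r198333 (the profiled builds add -ftree-loop-linear and -fomit-frame-pointer, so the real totals may differ); the function name is illustrative:

```python
def seconds_from_share(share_percent, total_seconds):
    """Convert a profile share in percent into seconds of runtime."""
    return share_percent / 100.0 * total_seconds


# Assumed totals: timings of (5) and (3) at r198333 from the first table.
with_parens = seconds_from_share(21.8, 20.70)     # iblval_ with -fprotect-parens
without_parens = seconds_from_share(14.5, 18.34)  # iblval_ without it

print(f"with -fprotect-parens:    {with_parens:.2f}s")
print(f"without -fprotect-parens: {without_parens:.2f}s")
```

Under that assumption the shares give about 4.5s and a bit under 2.7s, in line with the times quoted above.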



* [Bug lto/57290] [4.9 Regression] After r198333 the aermod runtime is ~10% slower when compiled with -fprotect-parens and -flto
From: rguenth at gcc dot gnu.org @ 2013-05-16 11:19 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57290

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
I'm trying to reproduce it.  Can you on your side verify whether dropping
-ftree-loop-linear changes anything with respect to the regression?
Also, what do the numbers look like for the following?

(6) -Ofast -funroll-loops -fwhole-program

If you factor in LTO, then you should compare
against a revision that includes

2013-04-26  Richard Biener  <rguenther@suse.de>

        * Makefile.in (lto-streamer-in.o): Add $(CFGLOOP_H) dependency.
        (lto-streamer-out.o): Likewise.
        * cfgloop.c (init_loops_structure): Export, add struct function
        argument and adjust.
        (flow_loops_find): Adjust.
        * cfgloop.h (enum loop_estimation): Add EST_LAST.
        (init_loops_structure): Declare.
        * lto-streamer-in.c: Include cfgloop.h.
        (input_cfg): Input the loop tree.
        * lto-streamer-out.c: Include cfgloop.h.
        (output_cfg): Output the loop tree.
        (output_struct_function_base): Do not drop PROP_loops.

I see

(1) -Ofast -funroll-loops -fomit-frame-pointer -fwhole-program -flto
(2) -Ofast -funroll-loops -fomit-frame-pointer -fwhole-program -flto
-fprotect-parens

revision:    198332     198333
(1)          15.5+-.3   15.6+-.2
(2)          16.1+-.1   15.9+-.2

Note that the PAREN_EXPR handling is what made me point at the extra copyprop
pass.  So there is a difference between -fprotect-parens and
-fno-protect-parens, but I cannot see a regression between the two revisions.
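
A rough way to read the table is to treat each "mean+-spread" entry as a simple interval and check which intervals overlap. A minimal sketch in Python; treating the spread as a hard interval is an assumption made for illustration, not a statistical test, and the names are hypothetical:

```python
def interval(mean, spread):
    """Turn a 'mean+-spread' entry into a (low, high) pair."""
    return (mean - spread, mean + spread)


def overlaps(a, b):
    """True if the two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Entries copied from the table above, keyed by (configuration, revision).
runs = {
    (1, 198332): interval(15.5, 0.3),
    (1, 198333): interval(15.6, 0.2),
    (2, 198332): interval(16.1, 0.1),
    (2, 198333): interval(15.9, 0.2),
}

for cfg in (1, 2):
    same = overlaps(runs[(cfg, 198332)], runs[(cfg, 198333)])
    print(f"({cfg}) r198332 vs r198333 overlap: {same}")
```

Under this reading each configuration overlaps with itself across the two revisions, while at r198332 the intervals for (1) and (2) do not overlap.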

Are you testing 64bit or 32bit executables?  On Intel or PPC?

As you noted the non-monotonic behavior wrt inlining decisions, it would be
interesting to know whether those decisions differ for you for (5) between
rev. 198332 and 198333.  Add -fdump-ipa-inline to the command line and inspect
the aermod.f90.wpa.047i.inline dump file, grepping for 'Inlined into'.  I only
see changes in estimated time/size but no real code changes.  I do see code
layout changes, though, and changes in LTRANS due to the extra copyprop pass.
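
The comparison of the two dump files can be sketched as follows in Python. It relies only on the 'Inlined into' substring mentioned above; nothing else about the dump format is assumed, and the function names and the file names in the comment are hypothetical:

```python
def inlined_lines(dump_text):
    """Return the set of stripped lines that mention 'Inlined into'."""
    return {
        line.strip()
        for line in dump_text.splitlines()
        if "Inlined into" in line
    }


def inline_diff(old_dump, new_dump):
    """Return (only_in_old, only_in_new) as sorted lists of lines."""
    old, new = inlined_lines(old_dump), inlined_lines(new_dump)
    return sorted(old - new), sorted(new - old)


# Hypothetical usage, with one dump file saved per revision:
#   with open("r198332.inline") as f_old, open("r198333.inline") as f_new:
#       removed, added = inline_diff(f_old.read(), f_new.read())
```

Because whole lines are compared, a line whose estimated time or size changed will show up as a difference even when the inlining decision itself is the same.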

Note that if -flto makes things worse compared to just -fwhole-program
(which it slightly does for me) then this is probably due to partitioning.
So you may also want to check -flto -flto-partition=none (slightly easier
to debug in the end - but without LTO it would be easiest).



* [Bug lto/57290] [4.9 Regression] After r198333 the aermod runtime is ~10% slower when compiled with -fprotect-parens and -flto
From: dominiq at lps dot ens.fr @ 2013-06-22 10:46 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57290

Dominique d'Humieres <dominiq at lps dot ens.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #4 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
At revision 200321 I get
(3) 17.83
(5) 17.82

Closing as FIXED.

