[Bug gcov-profile/113646] New: PGO hurts run-time of 538.imagick

public inbox for gcc-bugs@sourceware.org

* [Bug gcov-profile/113646] New: PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native
@ 2024-01-28 21:31 jamborm at gcc dot gnu.org
  2024-01-29  8:27 ` [Bug gcov-profile/113646] " rguenth at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: jamborm at gcc dot gnu.org @ 2024-01-28 21:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

            Bug ID: 113646
           Summary: PGO hurts run-time of 538.imagick_r as much as 68% at
                    -Ofast -march=native
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: gcov-profile
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
                CC: hubicka at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux, aarch64-linux
            Target: x86_64-linux, aarch64-linux

Using profile guided optimization is very detrimental when compiling SPEC 2017
FPrate benchmark 538.imagick_r at -Ofast -march=native (with or without LTO) on
all machines where I have tried.

On Zen4, using PGO results in a binary that is 68% slower than a non-PGO one
without LTO and 65% slower with LTO:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=970.507.0&plot.1=966.507.0&plot.2=959.507.0&plot.3=958.507.0&

On Zen3, using PGO slows the binary down by 22% when not using LTO and by 30%
with LTO:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=471.507.0&plot.1=473.507.0&plot.2=475.507.0&plot.3=477.507.0&

On Zen2, PGO regresses by 16% without LTO and by 28% with it:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=295.507.0&plot.1=293.507.0&plot.2=287.507.0&plot.3=286.507.0&

On our Altra CPU, the slowdowns are 26% and 45%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=584.507.0&plot.1=583.507.0&plot.2=587.507.0&plot.3=589.507.0&

On an Intel CascadeLake machine, they are 24% and 41%. (Our LNT Intel machine
is temporarily offline, unfortunately).

It is of course possible that the training workload does not match the
reference one very well.  However, this was not a problem in the past
(apparently the problem is that our non-PGO results improved but our PGO ones
did not).  Also, other compilers such as LLVM achieve better run-times with PGO
than without.
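
For reference, the percentages above can be read as the relative run-time
increase of the PGO binary over the non-PGO one. A minimal sketch of that
arithmetic follows, assuming this is how the LNT figures are derived. The run
times in the usage line are made up for illustration and are not measurements
from this report.

```python
def slowdown_percent(base_seconds: float, pgo_seconds: float) -> float:
    """Return how much slower the PGO run is than the baseline, in percent.

    A positive result means PGO made the binary slower; a negative one
    means PGO made it faster.
    """
    if base_seconds <= 0:
        raise ValueError("baseline run time must be positive")
    return (pgo_seconds / base_seconds - 1.0) * 100.0


# Hypothetical run times in seconds (not taken from the report).
print(f"{slowdown_percent(100.0, 168.0):.0f}% slower")
```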


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

* [Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native
  2024-01-28 21:31 [Bug gcov-profile/113646] New: PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native jamborm at gcc dot gnu.org
@ 2024-01-29  8:27 ` rguenth at gcc dot gnu.org
  2024-01-29 15:33 ` hubicka at ucw dot cz
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-01-29  8:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Did you try with -fprofile-partial-training (is that default on?  it probably
should ...).  Can you please try training with the rate data instead of train
to rule out a mismatch?
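
For readers unfamiliar with the build variants being compared, the sketch below
assembles the usual two-step GCC PGO command lines (-fprofile-generate, a
training run, then -fprofile-use), optionally with -flto and
-fprofile-partial-training. It only builds and prints argument lists. The
source file and binary names are hypothetical, and SPEC drives these steps
through its own config files, so this is not the reporter's actual setup.

```python
from typing import List


def pgo_build_commands(
    sources: List[str],
    output: str,
    lto: bool = False,
    partial_training: bool = False,
) -> List[List[str]]:
    """Return [instrumented build, training run, optimized build] as argv lists."""
    base = ["gcc", "-Ofast", "-march=native"]
    if lto:
        base.append("-flto")

    # Step 1: build an instrumented binary that records an execution profile.
    instrumented = base + ["-fprofile-generate", *sources, "-o", output]

    # Step 2: run it on the training workload (workload arguments omitted here).
    training_run = [f"./{output}"]

    # Step 3: rebuild using the recorded profile.
    use_flags = ["-fprofile-use"]
    if partial_training:
        use_flags.append("-fprofile-partial-training")
    optimized = base + use_flags + [*sources, "-o", output]

    return [instrumented, training_run, optimized]


# Hypothetical file names, for illustration only.
for argv in pgo_build_commands(["hot_loop.c"], "bench", lto=True, partial_training=True):
    print(" ".join(argv))
```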

* [Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native
  2024-01-28 21:31 [Bug gcov-profile/113646] New: PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native jamborm at gcc dot gnu.org
  2024-01-29  8:27 ` [Bug gcov-profile/113646] " rguenth at gcc dot gnu.org
@ 2024-01-29 15:33 ` hubicka at ucw dot cz
  2024-01-31 14:45 ` jamborm at gcc dot gnu.org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: hubicka at ucw dot cz @ 2024-01-29 15:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

--- Comment #2 from Jan Hubicka <hubicka at ucw dot cz> ---
> Did you try with -fprofile-partial-training (is that default on?  it probably
> should ...).  Can you please try training with the rate data instead of train
> to rule out a mismatch?

It is not on by default.  The problem with partial training is that it
mostly nullifies any code size benefits from profile-use, and that is a
relatively noticeable aspect of it in real-world situations (for example,
for GCC itself or Firefox the overall size of the binary matters).

I need to work on this more, but we now have two-state optimize_size
predicates, and with level 1 we can turn off those -Os optimizations that
make large tradeoffs of performance for size.

Honza

* [Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native
  2024-01-28 21:31 [Bug gcov-profile/113646] New: PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native jamborm at gcc dot gnu.org
  2024-01-29  8:27 ` [Bug gcov-profile/113646] " rguenth at gcc dot gnu.org
  2024-01-29 15:33 ` hubicka at ucw dot cz
@ 2024-01-31 14:45 ` jamborm at gcc dot gnu.org
  2024-02-01 11:25 ` hubicka at ucw dot cz
  2024-06-14 14:39 ` jamborm at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: jamborm at gcc dot gnu.org @ 2024-01-31 14:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

--- Comment #3 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> Did you try with -fprofile-partial-training (is that default on?  it
> probably should ...).  Can you please try training with the rate data
> instead of train
> to rule out a mismatch?

With -fprofile-partial-training the znver4 LTO vs LTOPGO regression (on a newer
master) goes down from 66% to 54%.  

So far I did not find a way to easily train with the reference run (when I add
"train_with = refrate" to the config, I always get "ERROR: The workload
specified by train_with MUST be a training workload!")

* [Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native
  2024-01-28 21:31 [Bug gcov-profile/113646] New: PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native jamborm at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2024-01-31 14:45 ` jamborm at gcc dot gnu.org
@ 2024-02-01 11:25 ` hubicka at ucw dot cz
  2024-06-14 14:39 ` jamborm at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: hubicka at ucw dot cz @ 2024-02-01 11:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

--- Comment #4 from Jan Hubicka <hubicka at ucw dot cz> ---
> 
> With -fprofile-partial-training the znver4 LTO vs LTOPGO regression (on a newer
> master) goes down from 66% to 54%.  
> 
> So far I did not find a way to easily train with the reference run (when I add
> "train_with = refrate" to the config, I always get "ERROR: The workload
> specified by train_with MUST be a training workload!")

I do that with a crude hack of simply overwriting the training data files with
the reference versions directly in the SPEC tree.  I believe the problem here
must be that with PGO we somehow confuse the vectorizer.

I did not know there is a train_with option.  Perhaps hacking the SPEC
driver to not output the error is easy enough.
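
The crude hack mentioned above amounts to putting the reference workload's
files where the training workload's files normally live. A minimal sketch
under assumptions: both directory paths are placeholders supplied by the
caller, the thread does not say exactly which files were replaced, and the
function name is hypothetical. The original training data is kept as a backup
so the swap can be undone.

```python
import shutil
from pathlib import Path


def overwrite_training_data(train_dir: Path, ref_dir: Path) -> Path:
    """Replace the contents of train_dir with a copy of ref_dir.

    The original training data is moved to a sibling '<name>.orig'
    directory, whose path is returned so the swap can be reverted.
    """
    backup = train_dir.with_name(train_dir.name + ".orig")
    if backup.exists():
        raise FileExistsError(f"backup already exists: {backup}")
    train_dir.rename(backup)             # keep the original training inputs
    shutil.copytree(ref_dir, train_dir)  # reference inputs now sit where train was
    return backup
```

Reverting is a matter of deleting the replaced directory and renaming the
backup to its original name.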

* [Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native
  2024-01-28 21:31 [Bug gcov-profile/113646] New: PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native jamborm at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2024-02-01 11:25 ` hubicka at ucw dot cz
@ 2024-06-14 14:39 ` jamborm at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: jamborm at gcc dot gnu.org @ 2024-06-14 14:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

Martin Jambor <jamborm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2024-06-14
     Ever confirmed|0                           |1

--- Comment #5 from Martin Jambor <jamborm at gcc dot gnu.org> ---
Re-confirmed with the released GCC 14.1.
