public inbox for gcc-bugs@sourceware.org
* [Bug gcov-profile/113646] New: PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native
From: jamborm at gcc dot gnu.org @ 2024-01-28 21:31 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

            Bug ID: 113646
           Summary: PGO hurts run-time of 538.imagick_r as much as 68% at
                    -Ofast -march=native
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: gcov-profile
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
                CC: hubicka at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux, aarch64-linux
            Target: x86_64-linux, aarch64-linux

Using profile guided optimization is very detrimental when compiling SPEC 2017
FPrate benchmark 538.imagick_r at -Ofast -march=native (with or without LTO)
on all machines I have tried.

On Zen4, using PGO results in a binary that is 68% slower than the non-PGO one
without LTO and 65% slower with LTO:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=970.507.0&plot.1=966.507.0&plot.2=959.507.0&plot.3=958.507.0&

On Zen3, using PGO slows the binary down by 22% when not using LTO and by 30%
with LTO:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=471.507.0&plot.1=473.507.0&plot.2=475.507.0&plot.3=477.507.0&

On Zen2, PGO regresses by 16% without LTO and by 28% with it:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=295.507.0&plot.1=293.507.0&plot.2=287.507.0&plot.3=286.507.0&

On our Altra CPU, the slowdowns are 26% and 45%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=584.507.0&plot.1=583.507.0&plot.2=587.507.0&plot.3=589.507.0&

On an Intel CascadeLake machine, they are 24% and 41%. (Our LNT Intel machine
is temporarily offline, unfortunately.)

It is of course possible that the training workload does not match the
reference one very well. However, this was not a problem in the past
(apparently the problem is that our non-PGO results improved but our PGO ones
did not). Also, other compilers such as LLVM achieve better run-times with PGO
than without.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
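
[Editorial sketch] For readers unfamiliar with how a relative slowdown such as "68% slower" is conventionally derived from two run times, the arithmetic can be sketched as follows. The function name and the timings are hypothetical and chosen only for illustration; they are not measurements from this report, and the report does not state exactly how its percentages were computed.

```python
def slowdown_percent(baseline_seconds: float, pgo_seconds: float) -> float:
    """Return how much slower (in percent) the PGO run is than the baseline.

    A positive result means PGO made the binary slower; a negative result
    means PGO made it faster.
    """
    if baseline_seconds <= 0:
        raise ValueError("baseline run time must be positive")
    return (pgo_seconds / baseline_seconds - 1.0) * 100.0


# Hypothetical run times in seconds, used only to show the arithmetic.
print(round(slowdown_percent(100.0, 168.0), 1))  # 68.0
```

With this definition, a PGO run that takes 168 s against a 100 s baseline is reported as a 68% slowdown.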
* [Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native
From: rguenth at gcc dot gnu.org @ 2024-01-29 8:27 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Did you try with -fprofile-partial-training (is that default on? it probably
should ...). Can you please try training with the rate data instead of train
to rule out a mismatch?
* [Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native
From: hubicka at ucw dot cz @ 2024-01-29 15:33 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

--- Comment #2 from Jan Hubicka <hubicka at ucw dot cz> ---
> Did you try with -fprofile-partial-training (is that default on? it probably
> should ...). Can you please try training with the rate data instead of train

It is not on by default. The problem with partial training is that it mostly
nullifies any code size benefits from profile-use, and that is a relatively
noticeable aspect of it in real-world situations (for GCC itself or Firefox,
the overall size of the binary matters). I need to work on this more, but now
we have two-state optimize_size predicates, and with level 1 we can turn off
those -Os optimizations that make large tradeoffs of performance for size.

Honza
> to rule out a mismatch?
* [Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native
From: jamborm at gcc dot gnu.org @ 2024-01-31 14:45 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

--- Comment #3 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> Did you try with -fprofile-partial-training (is that default on? it
> probably should ...). Can you please try training with the rate data
> instead of train
> to rule out a mismatch?

With -fprofile-partial-training the znver4 LTO vs LTOPGO regression (on a newer
master) goes down from 66% to 54%.

So far I did not find a way to easily train with the reference run (when I add
"train_with = refrate" to the config, I always get "ERROR: The workload
specified by train_with MUST be a training workload!")
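
[Editorial sketch] The configurations compared above (PGO with or without LTO, with or without -fprofile-partial-training) correspond to GCC's usual two-step PGO build: an instrumented build with -fprofile-generate, a training run, and a rebuild with -fprofile-use. A minimal sketch that only assembles the two compiler command lines is shown below. The helper name, the source file name, and the output name are hypothetical; the thread itself builds through the SPEC harness, whose exact command lines are not shown in the report.

```python
import shlex

# Optimization flags named in the bug summary.
BASE_FLAGS = ["-Ofast", "-march=native"]


def pgo_commands(sources, output, lto=False, partial_training=False, cc="gcc"):
    """Return (instrumented_build, optimized_build) as argument lists.

    The instrumented binary must be run on the training workload between
    the two builds so that profile data exists for -fprofile-use.
    """
    common = [cc, *BASE_FLAGS]
    if lto:
        common.append("-flto")

    instrument = [*common, "-fprofile-generate", *sources, "-o", output]

    use = [*common, "-fprofile-use"]
    if partial_training:
        # Only meaningful together with -fprofile-use.
        use.append("-fprofile-partial-training")
    use += [*sources, "-o", output]
    return instrument, use


# Hypothetical file names; the commands are printed, not executed.
for command in pgo_commands(["demo.c"], "demo", lto=True, partial_training=True):
    print(shlex.join(command))
```

Printing rather than running the commands keeps the sketch independent of any particular toolchain installation.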
* [Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native
From: hubicka at ucw dot cz @ 2024-02-01 11:25 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

--- Comment #4 from Jan Hubicka <hubicka at ucw dot cz> ---
>
> With -fprofile-partial-training the znver4 LTO vs LTOPGO regression (on a newer
> master) goes down from 66% to 54%.
>
> So far I did not find a way to easily train with the reference run (when I add
> "train_with = refrate" to the config, I always get "ERROR: The workload
> specified by train_with MUST be a training workload!")

I do that with a crude hack of simply overwriting the training data files with
the reference versions directly in SPEC. I believe the problem here must be
that with PGO we somehow confuse the vectorizer.

I did not know there is a train_with option. Perhaps hacking the SPEC driver
not to output the error is easy enough.
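
[Editorial sketch] The "crude hack" described above, overwriting training inputs with reference inputs, might look roughly like the following. This is an assumption-laden illustration, not the procedure actually used in the thread: the function name, the directory arguments, and the file-name mapping are all hypothetical, and a real SPEC tree may also need its workload command lines adjusted, which this sketch does not attempt.

```python
import shutil
import tempfile
from pathlib import Path


def overlay_training_inputs(train_dir, ref_dir, mapping, backup_suffix=".orig"):
    """Overwrite training inputs with reference inputs.

    mapping maps a file name in train_dir to the file name in ref_dir whose
    contents should replace it. Each original training file is saved once
    under its name plus backup_suffix so the change can be undone.
    """
    train_dir, ref_dir = Path(train_dir), Path(ref_dir)
    replaced = []
    for train_name, ref_name in mapping.items():
        source = ref_dir / ref_name
        target = train_dir / train_name
        if not source.is_file():
            raise FileNotFoundError(source)
        backup = target.with_name(target.name + backup_suffix)
        if target.exists() and not backup.exists():
            shutil.copy2(target, backup)  # keep the pristine training input
        shutil.copy2(source, target)
        replaced.append(train_name)
    return replaced


if __name__ == "__main__":
    # Demo on throwaway directories; all file names here are made up.
    with tempfile.TemporaryDirectory() as tmp:
        train, ref = Path(tmp, "train"), Path(tmp, "ref")
        train.mkdir()
        ref.mkdir()
        (train / "train_input.dat").write_text("small")
        (ref / "ref_input.dat").write_text("large reference data")
        overlay_training_inputs(train, ref, {"train_input.dat": "ref_input.dat"})
        print((train / "train_input.dat").read_text())
```

Keeping a one-time backup of each training file makes it possible to restore the unmodified workload before publishing any results.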
* [Bug gcov-profile/113646] PGO hurts run-time of 538.imagick_r as much as 68% at -Ofast -march=native
From: jamborm at gcc dot gnu.org @ 2024-06-14 14:39 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113646

Martin Jambor <jamborm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2024-06-14
     Ever confirmed|0                           |1

--- Comment #5 from Martin Jambor <jamborm at gcc dot gnu.org> ---
Re-confirmed with the released GCC 14.1.