public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/103592] New: fatigue2 benchmarks on zen runs 43% faster with -fno-tree-vectorize -fno-tree-slp-vectorize
From: hubicka at gcc dot gnu.org @ 2021-12-06 21:02 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103592

            Bug ID: 103592
           Summary: fatigue2 benchmarks on zen runs 43% faster with
                    -fno-tree-vectorize -fno-tree-slp-vectorize
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

While looking into the -fno-inline-functions-called-once difference I noticed
that on zen hardware I get:
 - 0m33s runtime for the fatigue2 benchmark (from phoronix) when built with
   -Ofast -march=native -fno-tree-slp-vectorize -fno-tree-vectorize
 - 0m57s for the -Ofast -march=native binary
* [Bug tree-optimization/103592] fatigue2 benchmarks on zen runs 43% faster with -fno-tree-vectorize -fno-tree-slp-vectorize
From: hubicka at kam dot mff.cuni.cz @ 2021-12-06 21:48 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103592

--- Comment #1 from hubicka at kam dot mff.cuni.cz ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
> [Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

Note that fatigue2 is from Polyhedron, not SPEC...
* [Bug tree-optimization/103592] fatigue2 benchmarks on zen runs 43% faster with -fno-tree-vectorize -fno-tree-slp-vectorize
From: marxin at gcc dot gnu.org @ 2021-12-07 8:23 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103592

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marxin at gcc dot gnu.org
   Last reconfirmed|                            |2021-12-07
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
* [Bug tree-optimization/103592] fatigue2 benchmarks on zen runs 43% faster with -fno-tree-vectorize -fno-tree-slp-vectorize
From: rguenth at gcc dot gnu.org @ 2021-12-07 9:38 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103592

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
    23.13%  44783  a.out.vect    a.out.vect    [.] __perdida_m_MOD_generalized_hookes_law.constprop.0.isra.0
     2.40%   4641  a.out.vect    a.out.vect    [.] __perdida_m_MOD_generalized_hookes_law.constprop.1.isra.0
     2.37%   4613  a.out.novect  a.out.novect  [.] __perdida_m_MOD_generalized_hookes_law.constprop.0.isra.0
     1.23%   2383  a.out.vect    libc-2.31.so  [.] __memset_avx2_unaligned_erms
     0.35%    676  a.out.vect    libc-2.31.so  [.] __memset_avx2_unaligned
     0.20%    394  a.out.novect  a.out.novect  [.] __perdida_m_MOD_generalized_hookes_law.constprop.1.isra.0

We end up doing loop vectorization with a lot of invariants built up from
scalars, but with only a single known vector iteration. We also have a local
array that is elided only after vectorization, so the final stores require
vector extracts.

I think this is the usual case of vectorization constraining OOO execution
in the face of the code being limited by load & store.

We also fail to elide generalized_constitutive_tensor - FRE can do this in
principle - there's a duplicate PR for this, and the situation is like

    generalized_constitutive_tensor = {};
    ...
    generalized_constitutive_tensor[0] = _19;
    generalized_constitutive_tensor[1] = ISRA.833_76(D);
    generalized_constitutive_tensor[2] = ISRA.833_76(D);
    ...
    vect__14.843_125 = MEM <vector(4) real(kind=8)> [(real(kind=8) *)&generalized_constitutive_tensor];

where FRE could create a { _19, ISRA.833_76(D), ISRA.833_76(D), 0. } vector
CTOR, but that's only profitable if the stores go away. I have a patch to do
that (w/o the costing).

Note that in the non-vectorized case we are able to elide
generalized_constitutive_tensor and also CSE a lot of the computations,
because the tensor only has 4 distinct values (and some are even zero). So
it's really a very special case ...