public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/94364] New: 505.mcf_r is 8% faster when compiled with -mprefer-vector-width=128
From: jamborm at gcc dot gnu.org @ 2020-03-27 18:06 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94364

            Bug ID: 94364
           Summary: 505.mcf_r is 8% faster when compiled with
                    -mprefer-vector-width=128
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux
            Target: x86_64-linux

SPEC 2017 INTrate benchmark 505.mcf_r, when compiled with options
-Ofast -march=native -mtune=native, is 8% slower than when we also use
option -mprefer-vector-width=128.  I have observed it on both AMD Zen2
and Intel Cascade Lake Server CPUs (using master revision 26b3e568a60).
Better vector width selection would therefore bring about noticeable
speed-up.

Symbol profiles (collected on AMD Rome):

-Ofast -march=native -mtune=native:

  Overhead       Samples  Shared Object    Symbol
  ........  ............  ...............  ................................
    28.64%        462302  mcf_r_peak.mine  spec_qsort
    21.58%        348703  mcf_r_peak.mine  cost_compare
    15.81%        255029  mcf_r_peak.mine  primal_bea_mpp
    15.58%        251176  mcf_r_peak.mine  replace_weaker_arc
     7.37%        118646  mcf_r_peak.mine  arc_compare
     6.53%        105337  mcf_r_peak.mine  price_out_impl
     1.38%         22276  mcf_r_peak.mine  update_tree

-Ofast -march=native -mtune=native -mprefer-vector-width=128:

  Overhead       Samples  Shared Object    Symbol
  ........  ............  ...............  ................................
    23.57%        354536  mcf_r_peak.mine  spec_qsort
    23.51%        353767  mcf_r_peak.mine  cost_compare
    16.98%        255104  mcf_r_peak.mine  primal_bea_mpp
    16.65%        249891  mcf_r_peak.mine  replace_weaker_arc
     7.29%        109267  mcf_r_peak.mine  arc_compare
     7.09%        106380  mcf_r_peak.mine  price_out_impl
     1.53%         22968  mcf_r_peak.mine  update_tree

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
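The two symbol profiles in the report can be compared per symbol to see where the samples went. The following is only a sketch of that arithmetic: the sample counts are copied from the report, while the names `syms`, `total_samples` and `print_deltas` are made up for illustration. It also assumes both profiles were collected at the same sampling rate, so that raw sample counts are comparable across the two runs.

```c
#include <stdio.h>

/* Sample counts copied from the two profiles in the report. */
struct sym {
    const char *name;
    long base;   /* -Ofast -march=native -mtune=native */
    long w128;   /* same options plus -mprefer-vector-width=128 */
};

const struct sym syms[] = {
    { "spec_qsort",         462302, 354536 },
    { "cost_compare",       348703, 353767 },
    { "primal_bea_mpp",     255029, 255104 },
    { "replace_weaker_arc", 251176, 249891 },
    { "arc_compare",        118646, 109267 },
    { "price_out_impl",     105337, 106380 },
    { "update_tree",         22276,  22968 },
};
enum { NSYMS = sizeof syms / sizeof syms[0] };

/* Sum of the listed samples for one configuration
   (use_w128 == 0: default width, otherwise the 128-bit run). */
long total_samples(int use_w128)
{
    long sum = 0;
    for (int i = 0; i < NSYMS; i++)
        sum += use_w128 ? syms[i].w128 : syms[i].base;
    return sum;
}

/* Print how many samples each symbol lost (positive) or gained
   (negative) when -mprefer-vector-width=128 was added. */
void print_deltas(void)
{
    for (int i = 0; i < NSYMS; i++)
        printf("%-20s %8ld\n", syms[i].name, syms[i].base - syms[i].w128);
    printf("%-20s %8ld\n", "listed total",
           total_samples(0) - total_samples(1));
}
```

Calling `print_deltas()` from a `main` shows that the drop in `spec_qsort` samples accounts for nearly all of the drop in the listed total, which fits the guess in the first reply that the copying done in spec_qsort is responsible.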
* [Bug target/94364] 505.mcf_r is 8% faster when compiled with -mprefer-vector-width=128
From: rguenth at gcc dot gnu.org @ 2020-03-30 7:51 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94364

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Huh, looks like this is the (patched by us) memory copying done in
spec_qsort?

I wonder if you can re-measure with our patching undone but then with
-fno-strict-aliasing (though I think that only was required with LTO).

How large are the objects sorted in mcf?
* [Bug target/94364] 505.mcf_r is 8% faster when compiled with -mprefer-vector-width=128
From: jamborm at gcc dot gnu.org @ 2020-04-01 19:14 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94364

--- Comment #2 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> Huh, looks like this is the (patched by us) memory copying done in
> spec_qsort?

Yes

> I wonder if you can re-measure with our patching undone but then with
> -fno-strict-aliasing (though I think that only was required with LTO).
>

The difference indeed goes away :-/  The current code we're
benchmarking (when not using LTO) is slower in both cases :-/

> How large are the objects sorted in mcf?

It's always pointers, 8 bytes.
* [Bug target/94364] 505.mcf_r is 8% faster when compiled with -mprefer-vector-width=128
From: rguenth at gcc dot gnu.org @ 2020-04-02 7:30 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94364

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Martin Jambor from comment #2)
> (In reply to Richard Biener from comment #1)
> > Huh, looks like this is the (patched by us) memory copying done in
> > spec_qsort?
>
> Yes
>
> > I wonder if you can re-measure with our patching undone but then with
> > -fno-strict-aliasing (though I think that only was required with LTO).
> >
>
> The difference indeed goes away :-/  The current code we're
> benchmarking (when not using LTO) is slower in both cases :-/

:/

What is the diff we are using?  IIRC spec_qsort contains special casing
for standard integer type sizes and my original patch simply removed
all that premature optimization and instead always uses the char
copying loop (which seems to be vectorized then).

Maybe we can resort to apply -fno-strict-aliasing just to the qsort CU?
It wasn't intended to introduce big differences compared to official
runs...

> > How large are the objects sorted in mcf?
>
> It's always pointers, 8 bytes.

OK, that would explain it then.
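Comment #3 contrasts two ways a generic qsort can swap elements: special-cased copies through a standard integer type, and a plain char copying loop. The following is a minimal sketch of that contrast, not the SPEC source and not the patch discussed in this thread; the function names `swap_bytes` and `swap_words` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-wise swap: valid for elements of any type, because character
   types may alias every object.  The element size es is only known at
   run time, so the compiler sees a generic loop; per comment #3, this
   kind of loop seems to get vectorized. */
void swap_bytes(void *a, void *b, size_t es)
{
    unsigned char *pa = a, *pb = b;
    for (size_t i = 0; i < es; i++) {
        unsigned char t = pa[i];
        pa[i] = pb[i];
        pb[i] = t;
    }
}

/* Word-wise swap: moves one long per iteration.  Both pointers must be
   suitably aligned and es must be a multiple of sizeof(long).  Under
   strict aliasing this is only valid when the elements really are long
   objects; using it on an array of pointers (the mcf case, per comment
   #2) accesses pointer objects through a long lvalue, which is where
   -fno-strict-aliasing comes in. */
void swap_words(void *a, void *b, size_t es)
{
    long *pa = a, *pb = b;
    assert(es % sizeof(long) == 0);
    for (size_t i = 0; i < es / sizeof(long); i++) {
        long t = pa[i];
        pa[i] = pb[i];
        pb[i] = t;
    }
}
```

With 8-byte elements the word-wise variant does a single iteration on x86_64-linux, whereas the byte-wise loop has to handle an arbitrary es, so a wide vectorized version of it buys little for such short copies. That is one plausible reading of "that would explain it then", not a statement made in the thread.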
* [Bug target/94364] 505.mcf_r is 8% faster when compiled with -mprefer-vector-width=128
From: marxin at gcc dot gnu.org @ 2020-04-02 11:18 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94364

--- Comment #4 from Martin Liška <marxin at gcc dot gnu.org> ---
Created attachment 48169
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48169&action=edit
qsort patch

I'm sending spec_qsort patch we use. I'm going to prepare a patch that will
revert this and add -fno-strict-aliasing attribute to the function.
* [Bug target/94364] 505.mcf_r is 8% faster when compiled with -mprefer-vector-width=128
From: marxin at gcc dot gnu.org @ 2020-04-02 12:00 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94364

--- Comment #5 from Martin Liška <marxin at gcc dot gnu.org> ---
With something like:

diff --git a/benchspec/CPU/505.mcf_r/src/spec_qsort/spec_qsort.c b/benchspec/CPU/505.mcf_r/src/spec_qsort/spec_qsort.c
index 05cad501..ad79ddae 100755
--- a/benchspec/CPU/505.mcf_r/src/spec_qsort/spec_qsort.c
+++ b/benchspec/CPU/505.mcf_r/src/spec_qsort/spec_qsort.c
@@ -112,6 +112,7 @@ med3(char *a, char *b, char *c, cmp_t *cmp)
 }
 
 void
+__attribute__((optimize("-fno-strict-aliasing")))
 spec_qsort(void *a, size_t n, size_t es, cmp_t *cmp)
 {
         char *pa, *pb, *pc, *pd, *pl, *pm, *pn;
diff --git a/benchspec/CPU/505.mcf_r/src/spec_qsort/spec_qsort.h b/benchspec/CPU/505.mcf_r/src/spec_qsort/spec_qsort.h
index 0519f867..c25a1159 100755
--- a/benchspec/CPU/505.mcf_r/src/spec_qsort/spec_qsort.h
+++ b/benchspec/CPU/505.mcf_r/src/spec_qsort/spec_qsort.h
@@ -6,5 +6,7 @@
 #ifdef __cplusplus
 extern "C"
 #endif
-void spec_qsort(void *array, size_t nitems, size_t size, int (*cmp)(const void*,const void*));
+void
+__attribute__((optimize("-fno-strict-aliasing")))
+spec_qsort(void *array, size_t nitems, size_t size, int (*cmp)(const void*,const void*));
 #endif

and -Ofast -march=znver2 I get:

    21.95%  mcf_r_peak.gcc7  mcf_r_peak.gcc7-m64  [.] cost_compare
    19.95%  mcf_r_peak.gcc7  mcf_r_peak.gcc7-m64  [.] spec_qsort
    19.63%  mcf_r_peak.gcc7  mcf_r_peak.gcc7-m64  [.] primal_bea_mpp
    14.20%  mcf_r_peak.gcc7  mcf_r_peak.gcc7-m64  [.] replace_weaker_arc
     9.17%  mcf_r_peak.gcc7  mcf_r_peak.gcc7-m64  [.] arc_compare
     8.47%  mcf_r_peak.gcc7  mcf_r_peak.gcc7-m64  [.] price_out_impl
     1.37%  mcf_r_peak.gcc7  mcf_r_peak.gcc7-m64  [.] update_tree
     0.97%  mcf_r_peak.gcc7  mcf_r_peak.gcc7-m64  [.] switch_arcs.constprop.0
     0.83%  mcf_r_peak.gcc7  mcf_r_peak.gcc7-m64  [.] suspend_impl
     0.69%  mcf_r_peak.gcc7  mcf_r_peak.gcc7-m64  [.] primal_iminus
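The diff in comment #5 disables strict aliasing for one function through GCC's `optimize` function attribute, rather than for the whole translation unit. A stand-alone sketch of the same mechanism follows. The macro `NO_STRICT_ALIASING` and the function `swap_long_chunks` are hypothetical and are not part of spec_qsort.c; the attribute string is the one used in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* The optimize attribute is GCC-specific.  Other compilers get an
   empty macro here and therefore keep their default aliasing rules. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_STRICT_ALIASING __attribute__((optimize("-fno-strict-aliasing")))
#else
#define NO_STRICT_ALIASING
#endif

/* Hypothetical helper: swaps two es-byte elements one long at a time.
   With the attribute in effect, GCC compiles this function as if
   -fno-strict-aliasing had been given, so it does not assume that the
   long accesses cannot touch objects of other types. */
NO_STRICT_ALIASING
void swap_long_chunks(void *a, void *b, size_t es)
{
    long *pa = a, *pb = b;
    assert(es % sizeof(long) == 0);
    for (size_t i = 0; i < es / sizeof(long); i++) {
        long t = pa[i];
        pa[i] = pb[i];
        pb[i] = t;
    }
}
```

The GCC manual describes the `optimize` attribute as intended for debugging rather than production code. Compiling only the qsort translation unit with -fno-strict-aliasing, as raised in comment #3, would be the command-line alternative.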
* [Bug target/94364] 505.mcf_r is 8% faster when compiled with -mprefer-vector-width=128
From: jamborm at gcc dot gnu.org @ 2020-04-02 14:37 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94364

Martin Jambor <jamborm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #6 from Martin Jambor <jamborm at gcc dot gnu.org> ---
OK, I'm going to close this given that this problem is specific to our
mcf patch which we decided to change and the issue cannot easily be
avoided in the compiler.