public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/94364] New: 505.mcf_r is 8% faster when compiled with -mprefer-vector-width=128
From: jamborm at gcc dot gnu.org @ 2020-03-27 18:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94364
Bug ID: 94364
Summary: 505.mcf_r is 8% faster when compiled with
-mprefer-vector-width=128
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jamborm at gcc dot gnu.org
Blocks: 26163
Target Milestone: ---
Host: x86_64-linux
Target: x86_64-linux
SPEC 2017 INTrate benchmark 505.mcf_r, when compiled with options
-Ofast -march=native -mtune=native, is 8% slower than when we also use
the option -mprefer-vector-width=128. I have observed it on both AMD Zen2
and Intel Cascade Lake Server CPUs (using master revision 26b3e568a60).
Better vector width selection would therefore bring a noticeable
speed-up.
Symbol profiles (collected on AMD Rome):
-Ofast -march=native -mtune=native:
Overhead Samples Shared Object Symbol
........ ............ ............... ................................
28.64% 462302 mcf_r_peak.mine spec_qsort
21.58% 348703 mcf_r_peak.mine cost_compare
15.81% 255029 mcf_r_peak.mine primal_bea_mpp
15.58% 251176 mcf_r_peak.mine replace_weaker_arc
7.37% 118646 mcf_r_peak.mine arc_compare
6.53% 105337 mcf_r_peak.mine price_out_impl
1.38% 22276 mcf_r_peak.mine update_tree
-Ofast -march=native -mtune=native -mprefer-vector-width=128:
Overhead Samples Shared Object Symbol
........ ............ ............... ................................
23.57% 354536 mcf_r_peak.mine spec_qsort
23.51% 353767 mcf_r_peak.mine cost_compare
16.98% 255104 mcf_r_peak.mine primal_bea_mpp
16.65% 249891 mcf_r_peak.mine replace_weaker_arc
7.29% 109267 mcf_r_peak.mine arc_compare
7.09% 106380 mcf_r_peak.mine price_out_impl
1.53% 22968 mcf_r_peak.mine update_tree
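Because both profiles list raw sample counts, the two runs can be compared symbol by symbol. The C sketch below only does that arithmetic on the numbers above. It assumes perf sampled at the same rate in both runs, so that sample counts are proportional to time spent; the struct and function names are illustrative and not part of perf or SPEC.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of a perf symbol profile, as printed above. */
struct profile_row {
    const char *symbol;
    double overhead_pct; /* "Overhead" column, in percent */
    long samples;        /* "Samples" column */
};

/* -Ofast -march=native -mtune=native */
static const struct profile_row default_width[] = {
    {"spec_qsort", 28.64, 462302},     {"cost_compare", 21.58, 348703},
    {"primal_bea_mpp", 15.81, 255029}, {"replace_weaker_arc", 15.58, 251176},
    {"arc_compare", 7.37, 118646},     {"price_out_impl", 6.53, 105337},
    {"update_tree", 1.38, 22276},
};

/* Same options plus -mprefer-vector-width=128 */
static const struct profile_row width_128[] = {
    {"spec_qsort", 23.57, 354536},     {"cost_compare", 23.51, 353767},
    {"primal_bea_mpp", 16.98, 255104}, {"replace_weaker_arc", 16.65, 249891},
    {"arc_compare", 7.29, 109267},     {"price_out_impl", 7.09, 106380},
    {"update_tree", 1.53, 22968},
};

/* Both profiles list the same seven symbols. */
#define NROWS (sizeof default_width / sizeof default_width[0])

/* Looks a symbol up in one profile; returns -1 if it is not listed. */
static long samples_of(const struct profile_row *rows, const char *symbol) {
    for (size_t i = 0; i < NROWS; i++)
        if (strcmp(rows[i].symbol, symbol) == 0)
            return rows[i].samples;
    return -1;
}

/* Sample change for one symbol; negative means fewer samples with 128-bit vectors. */
static long sample_delta(const char *symbol) {
    long before = samples_of(default_width, symbol);
    long after = samples_of(width_128, symbol);
    assert(before >= 0 && after >= 0); /* symbol must appear in both profiles */
    return after - before;
}

/* Whole-run sample count implied by one row: samples / (overhead / 100).
   Overheads are rounded to two decimals, so the result is approximate. */
static double estimated_total(const struct profile_row *row) {
    return (double)row->samples / (row->overhead_pct / 100.0);
}
```

On these numbers, sample_delta("spec_qsort") is -107766, while no other listed symbol moves by more than 9379 samples (arc_compare), and estimated_total puts the two runs at roughly 1.61 and 1.50 million samples. Under the equal-sampling-rate assumption, nearly all of the difference therefore sits in spec_qsort.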
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
* [Bug target/94364] 505.mcf_r is 8% faster when compiled with -mprefer-vector-width=128
From: rguenth at gcc dot gnu.org @ 2020-03-30 7:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94364
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Huh, looks like this is the (patched by us) memory copying done in spec_qsort?
I wonder if you can re-measure with our patching undone but then with
-fno-strict-aliasing (though I think that only was required with LTO).
How large are the objects sorted in mcf?
* [Bug target/94364] 505.mcf_r is 8% faster when compiled with -mprefer-vector-width=128
From: jamborm at gcc dot gnu.org @ 2020-04-01 19:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94364
--- Comment #2 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> Huh, looks like this is the (patched by us) memory copying done in
> spec_qsort?
Yes
> I wonder if you can re-measure with our patching undone but then with
> -fno-strict-aliasing (though I think that only was required with LTO).
>
The difference indeed goes away :-/ The current code we're
benchmarking (when not using LTO) is slower in both cases :-/
> How large are the objects sorted in mcf?
It's always pointers, 8 bytes.
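Since every element is a pointer, each comparison and swap inside the sort deals with 8-byte elements on x86_64. A minimal sketch of sorting such an array through the standard qsort interface follows; record_t and compare_by_cost are hypothetical stand-ins, not mcf's actual data structures or its cost_compare.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the records being sorted; only a cost is needed. */
typedef struct {
    long cost;
} record_t;

/* qsort passes pointers to the array elements. Each element is itself a
   record_t *, so the arguments are really pointers to record_t pointers. */
static int compare_by_cost(const void *pa, const void *pb) {
    const record_t *a = *(record_t *const *)pa;
    const record_t *b = *(record_t *const *)pb;
    return (a->cost > b->cost) - (a->cost < b->cost);
}

/* Sorts an array of pointers; the element size handed to qsort is
   sizeof(record_t *), which is 8 bytes on x86_64. */
static void sort_records(record_t **items, size_t n) {
    assert(items != NULL || n == 0);
    qsort(items, n, sizeof items[0], compare_by_cost);
}
```

Only the pointers move during the sort; the records themselves stay where they are.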
* [Bug target/94364] 505.mcf_r is 8% faster when compiled with -mprefer-vector-width=128
From: rguenth at gcc dot gnu.org @ 2020-04-02 7:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94364
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Martin Jambor from comment #2)
> (In reply to Richard Biener from comment #1)
> > Huh, looks like this is the (patched by us) memory copying done in
> > spec_qsort?
>
> Yes
>
> > I wonder if you can re-measure with our patching undone but then with
> > -fno-strict-aliasing (though I think that only was required with LTO).
> >
>
> The difference indeed goes away :-/ The current code we're
> benchmarking (when not using LTO) is slower in both cases :-/
:/
What is the diff we are using? IIRC spec_qsort contains special casing
for standard integer type sizes and my original patch simply removed all
that premature optimization and instead always uses the char copying loop
(which seems to be vectorized then). Maybe we can resort to applying
-fno-strict-aliasing just to the qsort CU? It wasn't intended to introduce
big differences compared to official runs...
> > How large are the objects sorted in mcf?
>
> It's always pointers, 8 bytes.
OK, that would explain it then.
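The two approaches Richard contrasts, code special-cased for standard integer sizes versus one char copying loop for every size, can be illustrated as below. This is a sketch only: swap_bytes and swap_elems are hypothetical names, not code from spec_qsort or from the patch discussed here. The pointer-sized fast path copies through memcpy, which is well-defined whatever the elements' declared type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic swap: one byte at a time, for any element size. This mirrors the
   "char copying loop" idea; with 8-byte elements it runs eight iterations. */
static void swap_bytes(void *a, void *b, size_t es) {
    unsigned char *pa = a, *pb = b;
    for (size_t i = 0; i < es; i++) {
        unsigned char t = pa[i];
        pa[i] = pb[i];
        pb[i] = t;
    }
}

/* Swap with a special case for pointer-sized elements. The fixed-size
   memcpy calls need no -fno-strict-aliasing, and GCC and Clang typically
   inline them as plain 8-byte moves when optimizing. */
static void swap_elems(void *a, void *b, size_t es) {
    assert(a != NULL && b != NULL);
    if (a == b)
        return; /* memcpy must not be given overlapping buffers */
    if (es == sizeof(void *)) {
        unsigned char tmp[sizeof(void *)];
        memcpy(tmp, a, sizeof tmp);
        memcpy(a, b, sizeof tmp);
        memcpy(b, tmp, sizeof tmp);
    } else {
        swap_bytes(a, b, es);
    }
}
```

For the 8-byte pointer elements mcf sorts, swap_elems never reaches the byte loop, so the code the vectorizer sees differs between the two designs.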
* [Bug target/94364] 505.mcf_r is 8% faster when compiled with -mprefer-vector-width=128
From: marxin at gcc dot gnu.org @ 2020-04-02 11:18 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94364
--- Comment #4 from Martin Liška <marxin at gcc dot gnu.org> ---
Created attachment 48169
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48169&action=edit
qsort patch
I'm attaching the spec_qsort patch we use. I'm going to prepare a patch that
reverts it and adds an -fno-strict-aliasing attribute to the function.
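Applying the flag through a function attribute limits it to one function. The sketch below shows the kind of access that makes the flag necessary, assuming a BSD-style qsort that swaps pointer-sized elements through long * (an assumption about the unpatched spec_qsort, whose code is not quoted in this thread). swap_word is a hypothetical name, and the code only makes sense where long and pointers have the same size, as on x86_64 Linux.

```c
#include <assert.h>

/* Swaps two pointer-sized elements by reading and writing them as long.
   If the elements are really pointers, these are accesses through an
   incompatible type, which -fstrict-aliasing (on by default at -O2 and
   above) lets GCC assume never happen. The attribute turns that assumption
   off for this function only; compilers that do not know GCC's optimize
   attribute typically ignore it with a warning. */
static void
__attribute__((optimize("-fno-strict-aliasing")))
swap_word(void *a, void *b)
{
    assert(sizeof(long) == sizeof(void *)); /* holds on x86_64 Linux (LP64) */
    long t = *(long *)a;
    *(long *)a = *(long *)b;
    *(long *)b = t;
}
```

Callers are still compiled with strict aliasing; only the body of swap_word is treated conservatively.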
* [Bug target/94364] 505.mcf_r is 8% faster when compiled with -mprefer-vector-width=128
From: marxin at gcc dot gnu.org @ 2020-04-02 12:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94364
--- Comment #5 from Martin Liška <marxin at gcc dot gnu.org> ---
With something like:
diff --git a/benchspec/CPU/505.mcf_r/src/spec_qsort/spec_qsort.c b/benchspec/CPU/505.mcf_r/src/spec_qsort/spec_qsort.c
index 05cad501..ad79ddae 100755
--- a/benchspec/CPU/505.mcf_r/src/spec_qsort/spec_qsort.c
+++ b/benchspec/CPU/505.mcf_r/src/spec_qsort/spec_qsort.c
@@ -112,6 +112,7 @@ med3(char *a, char *b, char *c, cmp_t *cmp)
 }
 
 void
+__attribute__((optimize("-fno-strict-aliasing")))
 spec_qsort(void *a, size_t n, size_t es, cmp_t *cmp)
 {
 char *pa, *pb, *pc, *pd, *pl, *pm, *pn;
diff --git a/benchspec/CPU/505.mcf_r/src/spec_qsort/spec_qsort.h b/benchspec/CPU/505.mcf_r/src/spec_qsort/spec_qsort.h
index 0519f867..c25a1159 100755
--- a/benchspec/CPU/505.mcf_r/src/spec_qsort/spec_qsort.h
+++ b/benchspec/CPU/505.mcf_r/src/spec_qsort/spec_qsort.h
@@ -6,5 +6,7 @@
 #ifdef __cplusplus
 extern "C"
 #endif
-void spec_qsort(void *array, size_t nitems, size_t size, int (*cmp)(const void*,const void*));
+void
+__attribute__((optimize("-fno-strict-aliasing")))
+spec_qsort(void *array, size_t nitems, size_t size, int (*cmp)(const void*,const void*));
 #endif
and -Ofast -march=znver2 I get:
21.95% mcf_r_peak.gcc7 mcf_r_peak.gcc7-m64 [.] cost_compare
19.95% mcf_r_peak.gcc7 mcf_r_peak.gcc7-m64 [.] spec_qsort
19.63% mcf_r_peak.gcc7 mcf_r_peak.gcc7-m64 [.] primal_bea_mpp
14.20% mcf_r_peak.gcc7 mcf_r_peak.gcc7-m64 [.] replace_weaker_arc
9.17% mcf_r_peak.gcc7 mcf_r_peak.gcc7-m64 [.] arc_compare
8.47% mcf_r_peak.gcc7 mcf_r_peak.gcc7-m64 [.] price_out_impl
1.37% mcf_r_peak.gcc7 mcf_r_peak.gcc7-m64 [.] update_tree
0.97% mcf_r_peak.gcc7 mcf_r_peak.gcc7-m64 [.] switch_arcs.constprop.0
0.83% mcf_r_peak.gcc7 mcf_r_peak.gcc7-m64 [.] suspend_impl
0.69% mcf_r_peak.gcc7 mcf_r_peak.gcc7-m64 [.] primal_iminus
* [Bug target/94364] 505.mcf_r is 8% faster when compiled with -mprefer-vector-width=128
From: jamborm at gcc dot gnu.org @ 2020-04-02 14:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94364
Martin Jambor <jamborm at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WONTFIX
--- Comment #6 from Martin Jambor <jamborm at gcc dot gnu.org> ---
OK, I'm going to close this, given that the problem is specific to our
mcf patch, which we have decided to change, and the issue cannot easily be
avoided in the compiler.