public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/107905] New: 2x slowdown versus CLANG and ICL
@ 2022-11-29 2:36 sanmayce at sanmayce dot com
2022-11-29 3:22 ` [Bug middle-end/107905] " pinskia at gcc dot gnu.org
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: sanmayce at sanmayce dot com @ 2022-11-29 2:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107905
Bug ID: 107905
Summary: 2x slowdown versus CLANG and ICL
Product: gcc
Version: 11.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: sanmayce at sanmayce dot com
Target Milestone: ---
Hi,
I have encountered performance problem with MinGW GCC 11.3.0, nearly 2x
slowdown with a simple C function (being the 3rd fastest wildcard matching
function).
```
int Tcheburaschka_Wildcard_Iterative_Kaze_CaseSensitive (const char* mask,
const char* name) {
const char* maskSTACK;
const char* nameSTACK;
#pragma nounroll
for (name, mask; *name; ++name, ++mask) {
if (*mask == '*') {
goto Backtrack;
//} else if ((*mask != '?') && (*name != *mask)) {
/*
} else if ((*mask - '?') * (*name - *mask)) {
return 0;
}
*/
}
if ((*mask - '?') * (*name - *mask)) {
return 0;
}
}
Backtrack:
#pragma nounroll
for (nameSTACK = name, maskSTACK = mask; *nameSTACK; ++nameSTACK,
++maskSTACK) {
if (*maskSTACK == '*') {
mask = maskSTACK+1;
if (!*mask) return 1;
name = nameSTACK;
goto Backtrack;
//} else if ((*maskSTACK != '?') && (*nameSTACK != *maskSTACK))
{
/*
} else if ((*maskSTACK - '?') * (*nameSTACK - *maskSTACK)) {
name++;
goto Backtrack;
}
*/
}
if ((*maskSTACK - '?') * (*nameSTACK - *maskSTACK)) {
name++;
goto Backtrack;
}
}
while (*maskSTACK == '*') ++maskSTACK;
return (!*maskSTACK);
}
```
```
[CPU: AMD Zen2 Ryzen7 4800H, @2.9GHz, Max. Boost Clock Up to 4.2GHz]
+-------------------------------------------------------------------------------+-------------------+-------------------------+-----------------------+
| Function \ Compiler
| CLANG 14.0.1, -O3 | Intel's ICL 19.0.0, /O3 | MinGW gcc 11.3.0, -O3 |
+-------------------------------------------------------------------------------+-------------------+-------------------------+-----------------------+
| Dogan Kurt's 'Antimalware', 2016, Iterative (wild_iterative)
| 70.605000 s | 102.610000 s | 83.398000 s |
| Dogan Kurt's 'Antimalware', 2016, Iterative Optimised (wild_iterative_opt)
| 61.322000 s | 74.243000 s | 66.538000 s |
| Tcheburaschka_r3, 2022, (Tcheburaschka_Wildcard_Iterative_Kaze_CaseSensitive)
| 72.990000 s | 76.161000 s | 127.717000 s |
| JackHandy_Iterative, 2005, (IterativeWildcards)
| 80.053000 s | 90.872000 s | 70.156000 s |
| Kirk J. Krauss, 2014, DrDobbs (FastWildCompare)
| 44.113000 s | 48.109000 s | 51.018000 s |
| Alessandro Cantatore, 2003, (szWildMatch7)
| 98.729000 s | 85.986000 s | 121.965000 s |
| Nondeterministic Finite Automaton (wild_nfa)
| 162.561000 s | 200.022000 s | 176.440000 s |
+-------------------------------------------------------------------------------+-------------------+-------------------------+-----------------------+
[Note1: All functions returned 1,075,000,000 Matches - that is TRUEs, kinda
means they passed the quality test, no, really, I printed all the 1's and 0's
after each run - the sequences matched.]
[Note2: It is well-known that Maximum Turbo Modes are maintained for some 15-30
seconds, so it is good that each function takes 30+ seconds, to emulate some 8
billion real-world searches.]
```
For more info:
https://github.com/kirkjkrauss/MatchingWildcards/issues/1#issue-1467311771
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug middle-end/107905] 2x slowdown versus CLANG and ICL
2022-11-29 2:36 [Bug c/107905] New: 2x slowdown versus CLANG and ICL sanmayce at sanmayce dot com
@ 2022-11-29 3:22 ` pinskia at gcc dot gnu.org
2022-11-29 5:14 ` sanmayce at sanmayce dot com
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-11-29 3:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107905
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization, ra
Component|target |middle-end
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I looked at the code generation and there is not much difference between GCC
and LLVM really.
The only difference I see some register allocation difference when it comes to
mask and name. GCC uses two extra registers to store those.
I don't see that causing a 2x slow down though.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug middle-end/107905] 2x slowdown versus CLANG and ICL
2022-11-29 2:36 [Bug c/107905] New: 2x slowdown versus CLANG and ICL sanmayce at sanmayce dot com
2022-11-29 3:22 ` [Bug middle-end/107905] " pinskia at gcc dot gnu.org
@ 2022-11-29 5:14 ` sanmayce at sanmayce dot com
2022-11-29 19:19 ` amonakov at gcc dot gnu.org
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: sanmayce at sanmayce dot com @ 2022-11-29 5:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107905
--- Comment #2 from Georgi <sanmayce at sanmayce dot com> ---
> I don't see that causing a 2x slow down though.
Same here, yet, the results prove that not only on Zen2, on top of that on
i5-7200U @2.5GHz and Windows 10, this function takes 203s, cannot figure it
out, could you run the wildtest_GCC executable (or compile to it) and see it is
so?
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug middle-end/107905] 2x slowdown versus CLANG and ICL
2022-11-29 2:36 [Bug c/107905] New: 2x slowdown versus CLANG and ICL sanmayce at sanmayce dot com
2022-11-29 3:22 ` [Bug middle-end/107905] " pinskia at gcc dot gnu.org
2022-11-29 5:14 ` sanmayce at sanmayce dot com
@ 2022-11-29 19:19 ` amonakov at gcc dot gnu.org
2022-11-29 19:31 ` pinskia at gcc dot gnu.org
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: amonakov at gcc dot gnu.org @ 2022-11-29 19:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107905
Alexander Monakov <amonakov at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords|ra |
CC| |amonakov at gcc dot gnu.org
--- Comment #3 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
LLVM does a better job at code layout, and massively wins on the amount of
executed branches (in particular unconditional jumps). With -fdisable-rtl-bbro
gcc achieves a similar performance.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug middle-end/107905] 2x slowdown versus CLANG and ICL
2022-11-29 2:36 [Bug c/107905] New: 2x slowdown versus CLANG and ICL sanmayce at sanmayce dot com
` (2 preceding siblings ...)
2022-11-29 19:19 ` amonakov at gcc dot gnu.org
@ 2022-11-29 19:31 ` pinskia at gcc dot gnu.org
2022-11-30 13:57 ` amonakov at gcc dot gnu.org
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-11-29 19:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107905
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #3)
> LLVM does a better job at code layout, and massively wins on the amount of
> executed branches (in particular unconditional jumps). With
> -fdisable-rtl-bbro gcc achieves a similar performance.
-freorder-blocks-algorithm=simple seems to improve it there too.
This does sound like all accidential really.
Throwing some [[unlikely]] around seems to get "better" layout.
I am I suspecting it all depends on the inputs and might not actually be a
performance difference if the inputs are better "weighted" ....
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug middle-end/107905] 2x slowdown versus CLANG and ICL
2022-11-29 2:36 [Bug c/107905] New: 2x slowdown versus CLANG and ICL sanmayce at sanmayce dot com
` (3 preceding siblings ...)
2022-11-29 19:31 ` pinskia at gcc dot gnu.org
@ 2022-11-30 13:57 ` amonakov at gcc dot gnu.org
2022-11-30 14:36 ` amonakov at gcc dot gnu.org
2022-11-30 15:01 ` sanmayce at sanmayce dot com
6 siblings, 0 replies; 8+ messages in thread
From: amonakov at gcc dot gnu.org @ 2022-11-30 13:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107905
--- Comment #5 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Not sure what you don't like about the inputs, they appear quite reasonable.
Perhaps GCC's estimation of bb frequencies is off (with profile feedback we
achieve good performance).
Georgi: you'll likely see better results with profile-guided optimization. You
can first compile the benchmark with -O2 -fprofile-generate, run the output (it
will generate *.gcda files), then compile again with -O2 -fprofile-use. For
Clang the options are spelled -fprofile-instr-generate and -fprofile-instr-use,
respectively.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug middle-end/107905] 2x slowdown versus CLANG and ICL
2022-11-29 2:36 [Bug c/107905] New: 2x slowdown versus CLANG and ICL sanmayce at sanmayce dot com
` (4 preceding siblings ...)
2022-11-30 13:57 ` amonakov at gcc dot gnu.org
@ 2022-11-30 14:36 ` amonakov at gcc dot gnu.org
2022-11-30 15:01 ` sanmayce at sanmayce dot com
6 siblings, 0 replies; 8+ messages in thread
From: amonakov at gcc dot gnu.org @ 2022-11-30 14:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107905
--- Comment #6 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Let me add that Clang supports GCC's -fprofile-{generate,use} flags for
compatibility as well.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug middle-end/107905] 2x slowdown versus CLANG and ICL
2022-11-29 2:36 [Bug c/107905] New: 2x slowdown versus CLANG and ICL sanmayce at sanmayce dot com
` (5 preceding siblings ...)
2022-11-30 14:36 ` amonakov at gcc dot gnu.org
@ 2022-11-30 15:01 ` sanmayce at sanmayce dot com
6 siblings, 0 replies; 8+ messages in thread
From: sanmayce at sanmayce dot com @ 2022-11-30 15:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107905
--- Comment #7 from Georgi <sanmayce at sanmayce dot com> ---
(In reply to Alexander Monakov from comment #5)
> Not sure what you don't like about the inputs, they appear quite reasonable.
> Perhaps GCC's estimation of bb frequencies is off (with profile feedback we
> achieve good performance).
>
> Georgi: you'll likely see better results with profile-guided optimization.
> You can first compile the benchmark with -O2 -fprofile-generate, run the
> output (it will generate *.gcda files), then compile again with -O2
> -fprofile-use. For Clang the options are spelled -fprofile-instr-generate
> and -fprofile-instr-use, respectively.
Thank you Alexander, your help is much appreciated.
For the first time I see such problem, no idea what is the core-problem.
Regarding inputs, I agree, they are not only okay, they form one DEFINITIVE
benchmark, anyway, glad seeing the problem resolved when using your
-fdisable-rtl-bbro:
Rerunning, on Zen2 4800H:
D:\Wildtest_2022-Nov-29>3
D:\Wildtest_2022-Nov-29>wildtest_CLANG_14.0.1.exe
WILDTEST, wildcard benchmark written by Dogan Kurt (dogan.kurt@dodobyte.com),
modified by Kaze (twitter.com/Sanmayce), revision 2022-Nov-29.
Dogan Kurt's 'Antimalware', 2016, Iterative (wild_iterative):
70.441000 s
Dogan Kurt's 'Antimalware', 2016, Iterative Optimised (wild_iterative_opt):
61.099000 s
Tcheburaschka_r3, 2022, (Tcheburaschka_Wildcard_Iterative_Kaze_CaseSensitive):
72.853000 s
JackHandy_Iterative, 2005, (IterativeWildcards):
80.415000 s
Kirk J. Krauss, 2014, DrDobbs (FastWildCompare):
42.279000 s
Alessandro Cantatore, 2003, (szWildMatch7):
97.681000 s
Nondeterministic Finite Automaton (wild_nfa):
162.023000 s
D:\Wildtest_2022-Nov-29>wildtest_Intel_19.0.exe
WILDTEST, wildcard benchmark written by Dogan Kurt (dogan.kurt@dodobyte.com),
modified by Kaze (twitter.com/Sanmayce), revision 2022-Nov-29.
Dogan Kurt's 'Antimalware', 2016, Iterative (wild_iterative):
98.859000 s
Dogan Kurt's 'Antimalware', 2016, Iterative Optimised (wild_iterative_opt):
72.175000 s
Tcheburaschka_r3, 2022, (Tcheburaschka_Wildcard_Iterative_Kaze_CaseSensitive):
70.278000 s
JackHandy_Iterative, 2005, (IterativeWildcards):
107.693000 s
Kirk J. Krauss, 2014, DrDobbs (FastWildCompare):
45.988000 s
Alessandro Cantatore, 2003, (szWildMatch7):
79.309000 s
Nondeterministic Finite Automaton (wild_nfa):
170.198000 s
D:\Wildtest_2022-Nov-29>wildtest_GCC_11.3.0.exe
WILDTEST, wildcard benchmark written by Dogan Kurt (dogan.kurt@dodobyte.com),
modified by Kaze (twitter.com/Sanmayce), revision 2022-Nov-29.
Dogan Kurt's 'Antimalware', 2016, Iterative (wild_iterative):
113.664000 s
Dogan Kurt's 'Antimalware', 2016, Iterative Optimised (wild_iterative_opt):
72.522000 s
Tcheburaschka_r3, 2022, (Tcheburaschka_Wildcard_Iterative_Kaze_CaseSensitive):
78.138000 s
JackHandy_Iterative, 2005, (IterativeWildcards):
83.266000 s
Kirk J. Krauss, 2014, DrDobbs (FastWildCompare):
56.878000 s
Alessandro Cantatore, 2003, (szWildMatch7):
119.861000 s
Nondeterministic Finite Automaton (wild_nfa):
176.290000 s
Hope future GCCs to fix this, the Tcheburaschka function is good in my view.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2022-11-30 15:01 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-29 2:36 [Bug c/107905] New: 2x slowdown versus CLANG and ICL sanmayce at sanmayce dot com
2022-11-29 3:22 ` [Bug middle-end/107905] " pinskia at gcc dot gnu.org
2022-11-29 5:14 ` sanmayce at sanmayce dot com
2022-11-29 19:19 ` amonakov at gcc dot gnu.org
2022-11-29 19:31 ` pinskia at gcc dot gnu.org
2022-11-30 13:57 ` amonakov at gcc dot gnu.org
2022-11-30 14:36 ` amonakov at gcc dot gnu.org
2022-11-30 15:01 ` sanmayce at sanmayce dot com
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).