public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug rtl-optimization/110823] New: [missed optimization] >50% speedup for x86-64 ASCII processing a la GNU diffutils
@ 2023-07-26 19:37 eggert at cs dot ucla.edu
2023-07-26 19:38 ` [Bug rtl-optimization/110823] " eggert at cs dot ucla.edu
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: eggert at cs dot ucla.edu @ 2023-07-26 19:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110823
Bug ID: 110823
Summary: [missed optimization] >50% speedup for x86-64 ASCII
processing a la GNU diffutils
Product: gcc
Version: 13.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: eggert at cs dot ucla.edu
Target Milestone: ---
Created attachment 55643
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55643&action=edit
proprocessed source code inspired by GNU diffutils
This is GCC 13.1.1 20230614 (Red Hat 13.1.1-4) on x86-64.
While tuning GNU diffutils I noticed that its loops to process mostly-ASCII
text were not compiled well by GCC on x86-64. For a stripped-down example of
the problem, compile the attached program with:
gcc -O2 -S code-mbcel1.i
The result is in the attached file code-mbcel1.s. Its loop kernel assuming
ASCII text (starting on line 212) looks like this:
.L33:
testb %al, %al
js .L30
movl $1, %edx
.L31:
movl %eax, %eax
addq %rdx, %rbx
addq %rax, %rbp
movsbl (%rbx), %eax
testb %al, %al
jne .L33
As I understand it the "movl %eax, %eax" is unnecessary, as all code that
reaches .L31 guarantees that %rax's top 32 bits are zero.
Also, the loop body executes "testb %al, %al" twice when once would suffice.
(As a minor point, since the compiler can easily tell that %al is positive when
the loop is entered, it can omit the first testb.)
Suppose we change the above code to the following, as is done in the attached
file code-mbcel1-opt.s:
.L33:
movl $1, %edx
.L31:
addq %rdx, %rbx
addq %rax, %rbp
movsbl (%rbx), %eax
testb %al, %al
jg .L33
js .L30
This small change improves performance significantly: for me, the test program
runs 55% faster on a circa-2021 Intel Xeon W-1350, and 74% faster on a
circa-2010 AMD Phenom II x4 910e, using the following commands to benchmark:
gcc -O2 code-mbcel1.i -o code-mbcel1
gcc -O2 code-mbcel1-opt.s -o code-mbcel1-opt
time ./code-mbcel1
time ./code-mbcel1-opt
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug rtl-optimization/110823] [missed optimization] >50% speedup for x86-64 ASCII processing a la GNU diffutils
2023-07-26 19:37 [Bug rtl-optimization/110823] New: [missed optimization] >50% speedup for x86-64 ASCII processing a la GNU diffutils eggert at cs dot ucla.edu
@ 2023-07-26 19:38 ` eggert at cs dot ucla.edu
2023-07-26 19:38 ` pinskia at gcc dot gnu.org
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: eggert at cs dot ucla.edu @ 2023-07-26 19:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110823
--- Comment #1 from Paul Eggert <eggert at cs dot ucla.edu> ---
Created attachment 55644
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55644&action=edit
gcc -O2 -S output (from code-mbcel1.i)
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug rtl-optimization/110823] [missed optimization] >50% speedup for x86-64 ASCII processing a la GNU diffutils
2023-07-26 19:37 [Bug rtl-optimization/110823] New: [missed optimization] >50% speedup for x86-64 ASCII processing a la GNU diffutils eggert at cs dot ucla.edu
2023-07-26 19:38 ` [Bug rtl-optimization/110823] " eggert at cs dot ucla.edu
@ 2023-07-26 19:38 ` pinskia at gcc dot gnu.org
2023-07-26 19:39 ` eggert at cs dot ucla.edu
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-07-26 19:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110823
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
Severity|normal |enhancement
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug rtl-optimization/110823] [missed optimization] >50% speedup for x86-64 ASCII processing a la GNU diffutils
2023-07-26 19:37 [Bug rtl-optimization/110823] New: [missed optimization] >50% speedup for x86-64 ASCII processing a la GNU diffutils eggert at cs dot ucla.edu
2023-07-26 19:38 ` [Bug rtl-optimization/110823] " eggert at cs dot ucla.edu
2023-07-26 19:38 ` pinskia at gcc dot gnu.org
@ 2023-07-26 19:39 ` eggert at cs dot ucla.edu
2023-07-26 19:54 ` pinskia at gcc dot gnu.org
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: eggert at cs dot ucla.edu @ 2023-07-26 19:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110823
--- Comment #2 from Paul Eggert <eggert at cs dot ucla.edu> ---
Created attachment 55645
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55645&action=edit
code-mbcel1.s with the optimization suggested in the bug report
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug rtl-optimization/110823] [missed optimization] >50% speedup for x86-64 ASCII processing a la GNU diffutils
2023-07-26 19:37 [Bug rtl-optimization/110823] New: [missed optimization] >50% speedup for x86-64 ASCII processing a la GNU diffutils eggert at cs dot ucla.edu
` (2 preceding siblings ...)
2023-07-26 19:39 ` eggert at cs dot ucla.edu
@ 2023-07-26 19:54 ` pinskia at gcc dot gnu.org
2023-07-30 11:38 ` amonakov at gcc dot gnu.org
2023-08-25 0:40 ` eggert at cs dot ucla.edu
5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-07-26 19:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110823
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The gimple level looks like:
```
if (_54 >= 0)
goto <bb 4>; [90.00%]
else
goto <bb 5>; [10.00%]
<bb 4> [local count: 63261141172]:
_18 = (unsigned int) _54;
goto <bb 8>; [100.00%]
...
len_37 = mbrtoc32 (&ch, iter_39, _36, &mbs);
len.0_38 = (signed long) len_37;
if (len.0_38 < 0)
goto <bb 7>; [10.00%]
else
goto <bb 6>; [90.00%]
<bb 6> [local count: 632611429]:
ch.1_42 = ch; // Note this is a local variable
<bb 7> [local count: 7029015815]:
# SR.45_12 = PHI <ch.1_42(6), 0(5)>
# SR.46_46 = PHI <len_37(6), 1(5)>
mbs ={v} {CLOBBER(eol)};
ch ={v} {CLOBBER(eol)};
<bb 8> [local count: 70290156974]:
# SR.41_16 = PHI <_18(4), SR.45_12(7)>
# SR.42_47 = PHI <1(4), SR.46_46(7)>
_6 = (long long unsigned int) SR.41_16;
```
Maybe we should have a type promotion pass on the gimple level that promotes
_54 to `long unsigned int`.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug rtl-optimization/110823] [missed optimization] >50% speedup for x86-64 ASCII processing a la GNU diffutils
2023-07-26 19:37 [Bug rtl-optimization/110823] New: [missed optimization] >50% speedup for x86-64 ASCII processing a la GNU diffutils eggert at cs dot ucla.edu
` (3 preceding siblings ...)
2023-07-26 19:54 ` pinskia at gcc dot gnu.org
@ 2023-07-30 11:38 ` amonakov at gcc dot gnu.org
2023-08-25 0:40 ` eggert at cs dot ucla.edu
5 siblings, 0 replies; 7+ messages in thread
From: amonakov at gcc dot gnu.org @ 2023-07-30 11:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110823
Alexander Monakov <amonakov at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |amonakov at gcc dot gnu.org
--- Comment #4 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
It's a weakness in the REE pass. AFAICT normally it would handle this, but here
there are two elimination candidates in 'main', the first is eliminated
successfully, and then REE punts on the second because one if its reaching
definitions is the first redundant extension:
/* If def_insn is already scheduled to be deleted, don't attempt
to modify it. */
if (state->modified[INSN_UID (def_insn)].deleted)
return false;
While looking into this I noticed that the fix for PR 61094 introduced a
write-only bitfield 'do_not_reextend' (the Changelog wrongly claimed it was
used).
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug rtl-optimization/110823] [missed optimization] >50% speedup for x86-64 ASCII processing a la GNU diffutils
2023-07-26 19:37 [Bug rtl-optimization/110823] New: [missed optimization] >50% speedup for x86-64 ASCII processing a la GNU diffutils eggert at cs dot ucla.edu
` (4 preceding siblings ...)
2023-07-30 11:38 ` amonakov at gcc dot gnu.org
@ 2023-08-25 0:40 ` eggert at cs dot ucla.edu
5 siblings, 0 replies; 7+ messages in thread
From: eggert at cs dot ucla.edu @ 2023-08-25 0:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110823
--- Comment #5 from Paul Eggert <eggert at cs dot ucla.edu> ---
Also see bug 111143 for a related performance issue, which is perhaps more
important given the current state of bleeding-edge GNU diffutils.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-08-25 0:40 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-26 19:37 [Bug rtl-optimization/110823] New: [missed optimization] >50% speedup for x86-64 ASCII processing a la GNU diffutils eggert at cs dot ucla.edu
2023-07-26 19:38 ` [Bug rtl-optimization/110823] " eggert at cs dot ucla.edu
2023-07-26 19:38 ` pinskia at gcc dot gnu.org
2023-07-26 19:39 ` eggert at cs dot ucla.edu
2023-07-26 19:54 ` pinskia at gcc dot gnu.org
2023-07-30 11:38 ` amonakov at gcc dot gnu.org
2023-08-25 0:40 ` eggert at cs dot ucla.edu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).