public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/107050] New: duplicate load of return value when facing multiple branches
@ 2022-09-27 8:53 absoler at smail dot nju.edu.cn
2022-09-27 9:20 ` [Bug rtl-optimization/107050] " rguenth at gcc dot gnu.org
2022-09-27 19:53 ` segher at gcc dot gnu.org
0 siblings, 2 replies; 3+ messages in thread
From: absoler at smail dot nju.edu.cn @ 2022-09-27 8:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107050
Bug ID: 107050
Summary: duplicate load of return value when facing multiple
branches
Product: gcc
Version: 12.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: absoler at smail dot nju.edu.cn
Target Milestone: ---
Given this code:
int g_286 = (-5L);
int p;
int f = 1;
void func_58();
func_31(int c, int d) {
    if (c) {
        if (f) {
            if (d)
                func_58();
            return g_286;
        }
        g_286 = 0;
    }
    return 0;
}
func_58() {
    int arr[30];
    p = arr[0];
}
When compiled with gcc-12.1.0 at -O1, it generates:
0000000000401186 <func_58>:
401186: 48 83 ec 10 sub $0x10,%rsp
40118a: 8b 44 24 88 mov -0x78(%rsp),%eax
40118e: 89 05 f4 8c 00 00 mov %eax,0x8cf4(%rip) # 409e88 <p>
401194: 48 83 c4 10 add $0x10,%rsp
401198: c3 retq
0000000000401199 <func_31>:
401199: 89 f8 mov %edi,%eax
40119b: 85 ff test %edi,%edi
40119d: 74 1f je 4011be <func_31+0x25>
40119f: 8b 05 bb 2e 00 00 mov 0x2ebb(%rip),%eax # 404060 <f>
4011a5: 85 c0 test %eax,%eax
4011a7: 75 0b jne 4011b4 <func_31+0x1b>
4011a9: c7 05 b1 2e 00 00 00 movl $0x0,0x2eb1(%rip) # 404064 <g_286>
4011b0: 00 00 00
4011b3: c3 retq
4011b4: 8b 05 aa 2e 00 00 mov 0x2eaa(%rip),%eax # 404064 <g_286>
4011ba: 85 f6 test %esi,%esi
4011bc: 75 01 jne 4011bf <func_31+0x26>
4011be: c3 retq
4011bf: 48 83 ec 08 sub $0x8,%rsp
4011c3: b8 00 00 00 00 mov $0x0,%eax
4011c8: e8 b9 ff ff ff callq 401186 <func_58>
4011cd: 8b 05 91 2e 00 00 mov 0x2e91(%rip),%eax # 404064 <g_286>
4011d3: 48 83 c4 08 add $0x8,%rsp
4011d7: c3 retq
In func_31, the compiler loads g_286 into %eax (at 4011b4) before testing
whether d != 0. If d is nonzero, %eax is overwritten (mov $0x0,%eax at 4011c3)
and func_58 is called. Before returning, g_286 is loaded into %eax again
(at 4011cd), so the first load is wasted on that path.
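The control flow of the listing above can be sketched at source level to make the extra load visible. This is only a rough model, not compiler output: g_286 and f come from the report, while loads, load_g, func_58_stub, func_31_as_emitted and func_31_single_load are hypothetical names introduced for illustration.

```c
/* Rough source-level model of the -O1 listing; not compiler output. */
int g_286 = -5;
int f = 1;
int loads;                        /* hypothetical counter of g_286 reads */

int load_g(void) { loads++; return g_286; }
void func_58_stub(void) { }       /* stands in for the call to func_58 */

/* Follows the branch structure of the disassembly of func_31. */
int func_31_as_emitted(int c, int d) {
    int eax = c;                  /* 401199: mov %edi,%eax */
    if (c == 0)
        return eax;               /* je 4011be */
    if (f == 0) {
        g_286 = 0;                /* 4011a9: store, then return 0 */
        return 0;
    }
    eax = load_g();               /* 4011b4: first load of g_286 */
    if (d == 0)
        return eax;               /* 4011be: retq */
    func_58_stub();               /* 4011c8: callq func_58 */
    eax = load_g();               /* 4011cd: second load of g_286 */
    return eax;
}

/* What the report seems to expect: one load on every path. */
int func_31_single_load(int c, int d) {
    if (c == 0)
        return 0;
    if (f == 0) {
        g_286 = 0;
        return 0;
    }
    if (d != 0)
        func_58_stub();
    return load_g();
}
```

With c and d both nonzero, the first variant reads g_286 twice and the second reads it once. The load after the call cannot be dropped in general, since the callee might modify g_286; the redundant one is the load before the test of d.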
* [Bug rtl-optimization/107050] duplicate load of return value when facing multiple branches
From: rguenth at gcc dot gnu.org @ 2022-09-27 9:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107050
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
CC| |segher at gcc dot gnu.org
Last reconfirmed| |2022-09-27
Known to fail| |13.0
Target| |x86_64-*-*
Keywords| |missed-optimization
Status|UNCONFIRMED |NEW
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed. It is shrink-wrapping that duplicates blocks, and with them the load.
Before shrink-wrapping we have
1: NOTE_INSN_DELETED
5: NOTE_INSN_BASIC_BLOCK 2
2: ax:SI=di:SI
4: NOTE_INSN_FUNCTION_BEG
7: flags:CCZ=cmp(ax:SI,0)
8: pc={(flags:CCZ==0)?L27:pc}
REG_BR_PROB 536870916
9: NOTE_INSN_BASIC_BLOCK 3
10: ax:SI=[`f']
11: flags:CCZ=cmp(ax:SI,0)
12: pc={(flags:CCZ==0)?L24:pc}
REG_BR_PROB 708669604
13: NOTE_INSN_BASIC_BLOCK 4
14: flags:CCZ=cmp(si:SI,0)
15: pc={(flags:CCZ==0)?L19:pc}
REG_BR_PROB 719407028
16: NOTE_INSN_BASIC_BLOCK 5
17: ax:QI=0
18: call [`func_58'] argc:0
REG_EH_REGION 0
19: L19:
20: NOTE_INSN_BASIC_BLOCK 6
21: ax:SI=[`g_286']
38: pc=L27
39: barrier
24: L24:
25: NOTE_INSN_BASIC_BLOCK 7
26: [`g_286']=0
27: L27:
28: NOTE_INSN_BASIC_BLOCK 8
34: use ax:SI
40: NOTE_INSN_DELETED
Maybe shrink-wrapping should consider splitting blocks before doing the
transform?
* [Bug rtl-optimization/107050] duplicate load of return value when facing multiple branches
From: segher at gcc dot gnu.org @ 2022-09-27 19:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107050
--- Comment #2 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Splitting blocks in shrink-wrap will cause degraded performance compared
to the status quo, on average. (That is, if I understand correctly what would
be split, and how.) It certainly can be good to move more code, much much more than
prepare_shrink_wrap does, but that is a good trade-off most of the time
only because it makes the fast path faster, makes less code executed when
there is an early return: just randomly moving code to be executed later
makes code *slower*.
Where shrink-wrapping duplicates code here only one copy is executed, ever.
The real question seems to be why global variable accesses are not optimised
very well at -O1. The answer is that this is -O1; if you want good
optimisation, use -O2!
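For readers unfamiliar with the transform, here is a small made-up example of the kind of function where shrink-wrapping pays off through an early return. Nothing in it comes from the report: table, slow_path and lookup are hypothetical names, and whether a given GCC version actually shrink-wraps it is an assumption to check by comparing the assembly with and without -fno-shrink-wrap.

```c
/* Hypothetical illustration of an early-return fast path. */
int table[64];

/* noinline keeps the call visible to the compiler (GCC/Clang attribute). */
__attribute__((noinline)) void slow_path(int *slot) {
    *slot = 42;                   /* pretend this is expensive */
}

int lookup(int i) {
    if (i < 0 || i >= 64)
        return -1;                /* fast path: no call, so no frame setup is needed */
    if (table[i] == 0)
        slow_path(&table[i]);     /* only this path needs the prologue/epilogue */
    return table[i];
}
```

When the prologue is moved down to the path that makes the call, the tail of the function may be duplicated, as with the `return g_286` block in this bug, but any single invocation still runs only one of the copies.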