public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/107050] New: duplicate load of return value when facing multiple branches
@ 2022-09-27  8:53 absoler at smail dot nju.edu.cn
  2022-09-27  9:20 ` [Bug rtl-optimization/107050] " rguenth at gcc dot gnu.org
  2022-09-27 19:53 ` segher at gcc dot gnu.org
  0 siblings, 2 replies; 3+ messages in thread
From: absoler at smail dot nju.edu.cn @ 2022-09-27  8:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107050

            Bug ID: 107050
           Summary: duplicate load of return value when facing multiple
                    branches
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: absoler at smail dot nju.edu.cn
  Target Milestone: ---

Given this code:

int g_286 = (-5L);
int p;
int f = 1;

void func_58();

func_31(int c, int d) {
        if (c) {
                if (f){
                        if (d)
                                func_58();
                        return g_286;
                }
                g_286 = 0;
        }
        return 0;
}
func_58() {
        int arr[30];
        p = arr[0];
}

When compiled with gcc-12.1.0 at -O1, it generates:

0000000000401186 <func_58>:
  401186:       48 83 ec 10             sub    $0x10,%rsp
  40118a:       8b 44 24 88             mov    -0x78(%rsp),%eax
  40118e:       89 05 f4 8c 00 00       mov    %eax,0x8cf4(%rip)        # 409e88 <p>
  401194:       48 83 c4 10             add    $0x10,%rsp
  401198:       c3                      retq   

0000000000401199 <func_31>:
  401199:       89 f8                   mov    %edi,%eax
  40119b:       85 ff                   test   %edi,%edi
  40119d:       74 1f                   je     4011be <func_31+0x25>
  40119f:       8b 05 bb 2e 00 00       mov    0x2ebb(%rip),%eax        # 404060 <f>
  4011a5:       85 c0                   test   %eax,%eax
  4011a7:       75 0b                   jne    4011b4 <func_31+0x1b>
  4011a9:       c7 05 b1 2e 00 00 00    movl   $0x0,0x2eb1(%rip)        # 404064 <g_286>
  4011b0:       00 00 00 
  4011b3:       c3                      retq   
  4011b4:       8b 05 aa 2e 00 00       mov    0x2eaa(%rip),%eax        # 404064 <g_286>
  4011ba:       85 f6                   test   %esi,%esi
  4011bc:       75 01                   jne    4011bf <func_31+0x26>
  4011be:       c3                      retq   
  4011bf:       48 83 ec 08             sub    $0x8,%rsp
  4011c3:       b8 00 00 00 00          mov    $0x0,%eax
  4011c8:       e8 b9 ff ff ff          callq  401186 <func_58>
  4011cd:       8b 05 91 2e 00 00       mov    0x2e91(%rip),%eax        # 404064 <g_286>
  4011d3:       48 83 c4 08             add    $0x8,%rsp
  4011d7:       c3                      retq 

In func_31, the compiler loads g_286 into %eax (at 4011b4) before testing
whether d != 0. If d is nonzero, %eax is overwritten and func_58 is called, so
g_286 is loaded into %eax again (at 4011cd) before returning.
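
The control flow of the listing above can be paraphrased at the source level.
This is only a sketch for illustration, not compiler output: load_g and
loads_of_g are hypothetical helpers that make each load of g_286 countable,
func_58 is replaced by a stand-in body (the original reads an uninitialized
array), and explicit return types are added.

```c
int g_286 = -5;
int p;
int f = 1;

int loads_of_g;                 /* hypothetical counter, not in the testcase */

/* Hypothetical helper: makes each load of g_286 observable. */
static int load_g(void) { loads_of_g++; return g_286; }

/* Stand-in for func_58; the original body reads an uninitialized array. */
static void func_58(void) { p = 0; }

/* Source-level paraphrase of the -O1 assembly for func_31. */
int func_31_as_compiled(int c, int d) {
    int ret = c;                /* mov %edi,%eax                   */
    if (c == 0)
        return ret;             /* je 4011be: returns 0            */
    if (f == 0) {
        g_286 = 0;              /* movl $0x0,g_286                 */
        return 0;               /* %eax still holds f, which is 0  */
    }
    ret = load_g();             /* first load, at 4011b4           */
    if (d == 0)
        return ret;             /* early return reuses that load   */
    func_58();                  /* %eax is clobbered for the call  */
    ret = load_g();             /* second load, at 4011cd          */
    return ret;
}
```

With c and d both nonzero and f nonzero, the counter ends at 2, which is the
duplicate load this report is about; with d == 0 it ends at 1.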


* [Bug rtl-optimization/107050] duplicate load of return value when facing multiple branches
From: rguenth at gcc dot gnu.org @ 2022-09-27  9:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107050

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
                 CC|                            |segher at gcc dot gnu.org
   Last reconfirmed|                            |2022-09-27
      Known to fail|                            |13.0
             Target|                            |x86_64-*-*
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  It is shrink-wrapping that duplicates the blocks and the load.
Before shrink-wrapping we have

    1: NOTE_INSN_DELETED
    5: NOTE_INSN_BASIC_BLOCK 2
    2: ax:SI=di:SI
    4: NOTE_INSN_FUNCTION_BEG
    7: flags:CCZ=cmp(ax:SI,0)
    8: pc={(flags:CCZ==0)?L27:pc}
      REG_BR_PROB 536870916
    9: NOTE_INSN_BASIC_BLOCK 3
   10: ax:SI=[`f']
   11: flags:CCZ=cmp(ax:SI,0)
   12: pc={(flags:CCZ==0)?L24:pc}
      REG_BR_PROB 708669604
   13: NOTE_INSN_BASIC_BLOCK 4
   14: flags:CCZ=cmp(si:SI,0)
   15: pc={(flags:CCZ==0)?L19:pc}
      REG_BR_PROB 719407028
   16: NOTE_INSN_BASIC_BLOCK 5
   17: ax:QI=0
   18: call [`func_58'] argc:0
      REG_EH_REGION 0
   19: L19:
   20: NOTE_INSN_BASIC_BLOCK 6
   21: ax:SI=[`g_286']
   38: pc=L27
   39: barrier
   24: L24:
   25: NOTE_INSN_BASIC_BLOCK 7
   26: [`g_286']=0
   27: L27:
   28: NOTE_INSN_BASIC_BLOCK 8
   34: use ax:SI
   40: NOTE_INSN_DELETED

Maybe shrink-wrapping should consider splitting blocks before doing the
transform?


* [Bug rtl-optimization/107050] duplicate load of return value when facing multiple branches
From: segher at gcc dot gnu.org @ 2022-09-27 19:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107050

--- Comment #2 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Splitting blocks in shrink-wrap will, on average, degrade performance compared
to the status quo (if I understand correctly what would be split, and how).
It certainly can be good to move more code, much more than
prepare_shrink_wrap does, but that is a good trade-off most of the time
only because it makes the fast path faster, so that less code is executed when
there is an early return; just randomly moving code to be executed later
makes code *slower*.
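
The fast-path argument can be illustrated with a toy model in C. This is a
sketch under stated assumptions, not how GCC implements the pass: prologue,
epilogue, slow_work and prologue_runs are hypothetical names, and a counter
stands in for the cost of setting up a stack frame.

```c
int prologue_runs;              /* hypothetical counter for frame setups */

/* Hypothetical stand-ins for the prologue and epilogue of a function. */
static void prologue(void) { prologue_runs++; }
static void epilogue(void) { }

/* Hypothetical slow path that needs the frame. */
static int slow_work(int x) { return x * 2; }

/* Without shrink-wrapping: the prologue runs on every call. */
int unwrapped(int x) {
    int r;
    prologue();
    if (x == 0)
        r = 0;                  /* early-return case still paid for the frame */
    else
        r = slow_work(x);
    epilogue();
    return r;
}

/* With shrink-wrapping: the prologue is sunk onto the path that needs it. */
int shrink_wrapped(int x) {
    int r;
    if (x == 0)
        return 0;               /* fast path: no prologue at all */
    prologue();
    r = slow_work(x);
    epilogue();
    return r;
}
```

In this model the early-return call shrink_wrapped(0) never runs the prologue,
while unwrapped(0) does; the slow path costs the same in both versions.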

Where shrink-wrapping duplicates code here, only one copy is ever executed.

The real question seems to be why global variable accesses are not optimised
very well at -O1.  The answer is that this is -O1; if you want good
optimisation, you should use -O2!

