public inbox for gcc-bugs@sourceware.org
* [Bug ipa/99788] New: missed optimization for dead code elimination at -O3 (vs. -O1)
@ 2021-03-26 10:44 zhendong.su at inf dot ethz.ch
  2021-03-26 11:48 ` [Bug tree-optimization/99788] " rguenth at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: zhendong.su at inf dot ethz.ch @ 2021-03-26 10:44 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99788

            Bug ID: 99788
           Summary: missed optimization for dead code elimination at -O3
                    (vs. -O1)
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ipa
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zhendong.su at inf dot ethz.ch
                CC: marxin at gcc dot gnu.org
  Target Milestone: ---

[606] % gcctk -v
Using built-in specs.
COLLECT_GCC=gcctk
COLLECT_LTO_WRAPPER=/local/suz-local/software/local/gcc-trunk/libexec/gcc/x86_64-pc-linux-gnu/11.0.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk/configure --disable-bootstrap
--prefix=/local/suz-local/software/local/gcc-trunk --enable-languages=c,c++
--disable-werror --enable-multilib --with-system-zlib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.0.1 20210326 (experimental) [master revision
9d45e848d02:ca344bbd24f:6081d8994ed1a0aef6b7f5fb34f091faa3580416] (GCC) 
[607] % 
[607] % gcctk -O1 -S -o O1.s small.c
[608] % gcctk -O3 -S -o O3.s small.c
[609] % 
[609] % wc O1.s O3.s
  79  162  986 O1.s
 109  229 1433 O3.s
 188  391 2419 total
[610] % 
[610] % grep foo O1.s
[611] % grep foo O3.s
        call    foo
[612] % 
[612] % cat small.c
extern void foo(void);

char a;
int b, *c, g, h = 1, i = 1, j, *k = &i;

static void d();
static int *e() {
  for (a = 1; a; a = a+2)
    ;
  foo();
  h = (g % h) % i;
  *k = -j;
  return 0;
}
static void f() {
  if (b)
    d(e);
}
void d() {
  for (;;)
    c = e();
}
int main() {
  f();
  return 0;
}
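
A note on why the call to foo() in e() is dead: the loop starts at a = 1
and adds 2 each time, so a only ever holds odd values and can never become
0. The sketch below replays that update on a char to check this. It assumes
an 8-bit char and GCC's modulo behaviour when narrowing an out-of-range int
back to char (that conversion is implementation-defined); the helper names
loop_can_exit and cycle_length are hypothetical and not part of the report.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: replay "a = a + 2" from e() for `steps` iterations
   and report whether a ever became 0, i.e. whether the loop could exit. */
bool loop_can_exit(int steps) {
  assert(steps >= 0);
  char a = 1;
  for (int n = 0; n < steps; n++) {
    a = (char)(a + 2);  /* a + 2 is computed in int, then narrowed to char */
    if (a == 0)
      return true;      /* the loop condition `a` would be false here */
  }
  return false;
}

/* Hypothetical helper: number of updates until a is back at its start
   value 1.  Assumes an 8-bit char with modulo narrowing. */
int cycle_length(void) {
  char a = 1;
  int n = 0;
  do {
    a = (char)(a + 2);
    n++;
  } while (a != 1);
  return n;
}
```

Under those assumptions a cycles through the 128 odd char values, so
checking any number of steps beyond 128 adds nothing new.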


* [Bug tree-optimization/99788] missed optimization for dead code elimination at -O3 (vs. -O1)
From: rguenth at gcc dot gnu.org @ 2021-03-26 11:48 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99788

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-03-26
            Version|unknown                     |11.0
          Component|ipa                         |tree-optimization
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  The issue is that at -O3 we inline e().  Inside the out-of-line
e() we eliminate the call to foo since the preceding for() loop does not
terminate (CCP figures this out).  The inline copy, however, has the loop
header PHI not simplified at the point CCP runs (and CCP doesn't run again
later):

  <bb 3> [local count: 43379093]:
  a = 1;
  a.3_4 = a;

  <bb 4> [local count: 350976297]:
  # a.3_3 = PHI <a.3_5(4), a.3_4(3)>
  a.2_6 = (unsigned char) a.3_3;
  _7 = a.2_6 + 2;
  _8 = (char) _7;
  a = _8;
  a.3_5 = a;
  if (a.3_5 != 0)
    goto <bb 4>; [87.64%]
  else
    goto <bb 5>; [12.36%]

  <bb 5> [local count: 43379093]:
  foo ();

vs.

  <bb 3> [local count: 955630225]:
  # a.3_22 = PHI <_3(3), 1(2)>
  a.2_1 = (unsigned char) a.3_22;
  _2 = a.2_1 + 2;
  _3 = (char) _2;
  a = _3;
  if (_3 != 0)
    goto <bb 3>; [89.00%]
  else
    goto <bb 4>; [11.00%]

  <bb 4> [local count: 118111600]:
  foo ();

and the difference starts with loop header copying, which is applied to
the out-of-line copy of the loop but not to the inline copy.

Analyzing loop 1
Loop 1 is not do-while loop: latch is not empty.
    Will duplicate bb 4
  Not duplicating bb 3: it is single succ.
Duplicating header of the loop 1 up to edge 4->3, 3 insns.
Loop 1 is do-while loop
Loop 1 is now do-while loop.

vs.

Analyzing loop 1
Analyzing loop 2
Loop 2 is not do-while loop: latch is not empty.
  Not duplicating bb 5: optimizing for size.

where the decision on optimizing for size is because this is main().  Renaming
main() to baz() fixes the issue.
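
For readers unfamiliar with the transformation: loop header copying
duplicates the loop's exit test in front of the loop, so that the loop itself
becomes a do-while loop with the test at the bottom. The sketch below shows
the idea by hand on an unrelated loop. It is a source-level analogy only, not
the pass's output on the test case, and the function names sum_while and
sum_do_while are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the exit test sits in the loop header and is evaluated at the
   top of every iteration. */
long sum_while(const int *v, size_t n) {
  assert(v != NULL || n == 0);
  long s = 0;
  size_t i = 0;
  while (i < n) {
    s += v[i];
    i++;
  }
  return s;
}

/* After (written by hand): a copy of the header test guards loop entry,
   and the loop proper is a do-while with the test at the bottom. */
long sum_do_while(const int *v, size_t n) {
  assert(v != NULL || n == 0);
  long s = 0;
  size_t i = 0;
  if (i < n) {          /* copied header: decides whether to enter at all */
    do {
      s += v[i];
      i++;
    } while (i < n);    /* original test, now at the latch */
  }
  return s;
}
```

Both functions compute the same result for every input, including n == 0.
In the dumps above, the "optimizing for size" decision is what prevents this
duplication for the copy of the loop inlined into main().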

But I wonder why we inline e() into cold main at all.  Honza?  I see

Processing frequency f/9
  Called by main/11 that is normal or hot
t.c:24:3: note: Inlining f/9 to main/11 with frequency 1.00

so here main() is normal or hot but loop header copying sees
optimize_loop_for_size_p () == true!?

IPA inlining sees

Considering d/10 with 20 size
 to be inlined into main/11 in t.c:17
 Estimated badness is -0.000046, frequency 0.00.
    Badness calculation for main/11 -> d/10
      size growth 16, time 8428.908463 unspec 8428.908463
      -0.000011: guessed profile. frequency 0.000400, count -1 caller count -1
time saved 0.004400 overall growth -4 (current) -4 (original) -4 (compensated)
      Adjusted by hints -0.000046
Updated mod-ref summary for main/11
  loads:
    Limits: 32 bases, 16 refs
    Every base
  stores:
    Limits: 32 bases, 16 refs
                Accounting size:17.00, time:2.97 on predicate exec:(true)
Processing frequency d/10
  Called by main/11 that is executed once
Processing frequency e/13
  Called by d/10 that is executed once
Node e/13 promoted to executed once.
                Accounting size:-2.00, time:-0.00 on predicate exec:(true)
                Accounting size:1.00, time:0.40 on predicate exec:(true)
t.c:17:5: optimized:  Inlined d/10 into main/11 which now has time 8.370758 and
size 24, net change of -4.

so something is off with how we process speed/size optimization.  Note
it looks like the loop copy in main gets cold also because it is predicated
by if (b) which is predicted as very cold:

  <bb 2> [local count: 1073741824]:
  b.0_2 = b;
  if (b.0_2 != 0)
    goto <bb 8>; [0.04%]
  else
    goto <bb 7>; [99.96%]

  <bb 8> [local count: 429496]:

  <bb 3> [local count: 43379093]:
  a = 1;
  goto <bb 5>; [100.00%]

  <bb 4> [local count: 350976297]:
  a.2_6 = (unsigned char) a.3_5;
  _7 = a.2_6 + 2;
  _8 = (char) _7;
  a = _8;

  <bb 5> [local count: 394355390]:
  a.3_5 = a;
  if (a.3_5 != 0)
    goto <bb 4>; [89.00%]
  else
    goto <bb 6>; [11.00%]

Still, when the function is not called main(), the
optimize_loop_for_size_p () predicate does not evaluate to true (with the
exact same local profile as above!).
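
As a cross-check of the numbers in the dump above, the local counts can be
related to the edge probabilities with plain arithmetic. The sketch below
uses ordinary floating point with round-to-nearest, which is an assumption
for illustration and not how GCC itself maintains profile counts; the helper
names relative_frequency and edge_count are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: how often a block runs relative to function entry. */
double relative_frequency(long long block_count, long long entry_count) {
  assert(entry_count > 0);
  return (double)block_count / (double)entry_count;
}

/* Hypothetical helper: count flowing along an edge that is taken with the
   given probability, rounded to the nearest integer. */
long long edge_count(long long block_count, double probability) {
  assert(probability >= 0.0 && probability <= 1.0);
  return (long long)((double)block_count * probability + 0.5);
}
```

With the values from the dump, relative_frequency(429496, 1073741824) is
about 0.0004, matching the 0.04% on the if (b) branch and the "frequency
0.000400" reported by the inliner, and edge_count(394355390, 0.89) gives
350976297, the local count shown for bb 4.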


* [Bug tree-optimization/99788] missed optimization for dead code elimination at -O3 (vs. -O1)
From: rguenth at gcc dot gnu.org @ 2021-03-26 11:49 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99788

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Honza?  (why do we inline into cold main()?)


* [Bug tree-optimization/99788] missed optimization for dead code elimination at -O3 (vs. -O1)
From: pinskia at gcc dot gnu.org @ 2023-08-18  1:35 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99788

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |12.1.0
           Keywords|                            |needs-bisection

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Looks to be fixed in GCC 12.
