From: "hubicka at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug ipa/103227] 58% exchange2 regression with -Ofast -march=native on zen3 between g:1ae8edf5f73ca5c3 and g:2af63f0f53a12a72
Date: Sat, 13 Nov 2021 22:11:16 +0000
Message-ID: <bug-103227-4-dXZGkwgmen@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-103227-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103227

--- Comment #2 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
There is a difference in the inliner's decisions.  Since all the clones are
the same size, the result depends on the order in which the inliner picks them
and combines them before hitting large-function-growth.  It seems that with
the isra ordering the inliner is simply less lucky.
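
The order dependence can be illustrated with a toy model.  The sketch below
is not GCC's inliner: the function names f0..f3, the equal body size of 10
and the cap of 35 are all made up, and the single size cap only stands in for
large-function-growth.  It walks call edges in a given order and inlines each
one unless the combined body would exceed the cap.

```python
def greedy_inline(sizes, edges, limit):
    """Toy greedy inliner (hypothetical model, not GCC's algorithm).

    sizes: dict mapping function name -> body size
    edges: list of (caller, callee) pairs, processed in the given order
    limit: maximum allowed size of any combined body
    Returns a dict mapping each remaining out-of-line function to the
    list of functions whose bodies it now contains.
    """
    body = dict(sizes)                   # current size of each out-of-line body
    host = {f: f for f in sizes}         # out-of-line function containing f
    members = {f: [f] for f in sizes}    # bodies merged into each function

    for caller, callee in edges:
        target = host[caller]
        if host[callee] != callee:
            continue                     # callee was already inlined elsewhere
        new_size = body[target] + body[callee]
        if new_size > limit:
            continue                     # growth cap reached, edge stays a call
        body[target] = new_size
        for f in members[callee]:
            host[f] = target
        members[target].extend(members.pop(callee))
        del body[callee]
    return members


# A recursion-like chain of four equally sized clones (assumed sizes).
SIZES = {"f0": 10, "f1": 10, "f2": 10, "f3": 10}
TOP_DOWN = [("f0", "f1"), ("f1", "f2"), ("f2", "f3")]
BOTTOM_UP = list(reversed(TOP_DOWN))

top_down = greedy_inline(SIZES, TOP_DOWN, limit=35)
bottom_up = greedy_inline(SIZES, BOTTOM_UP, limit=35)
```

With the same clones and the same cap, the top-down order leaves f0
containing three bodies, while the bottom-up order merges the three deeper
clones first and then cannot inline the result into f0 at all.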

Instead of the inline stack:
IPA function summary for digits_2.constprop/143 inlinable
  global time:     22960.500916
  self size:       1277
  global size:     2534
  min size:       513
  self stack:      261
  global stack:    783
  estimated growth:-488
    size:513.000000, time:6690.410500
    size:3.000000, time:2.000001,  executed if:(not inlined)
    size:0.500000, time:0.500000,  executed if:(not inlined),  nonconst if:(op0[ref offset: 0] changed) && (not inlined)
    size:138.500000, time:217.532556,  nonconst if:(op0[ref offset: 0] changed)
    size:36.000000, time:34.793911,  executed if:(op0[ref offset: 0],(# % 3) == 2),  nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0],(# % 3) == 2)
    size:198.000000, time:574.099545,  executed if:(op0[ref offset: 0],(# % 3) == 2)
    size:36.000000, time:34.793911,  executed if:(op0[ref offset: 0],(# % 3) == 1),  nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0],(# % 3) == 1)
    size:270.000000, time:1357.103458,  executed if:(op0[ref offset: 0],(# % 3) == 1)
    size:21.000000, time:375.971570,  executed if:(op0[ref offset: 0] == 5)
    size:1263.000000, time:12359.502960,  executed if:(op0[ref offset: 0] != 8)
    size:1.000000, time:0.900000,  executed if:(op0[ref offset: 0] != 8),  nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0] != 8)
    size:48.000000, time:1300.920311,  executed if:(op0[ref offset: 0] == 8)
  loop iterations:  0.68 for (op0[ref offset: 0] changed)
  0.76 for (op0[ref offset: 0] changed)
  0.88 for (op0[ref offset: 0] changed)
  1.08 for (op0[ref offset: 0] changed)
  1.40 for (op0[ref offset: 0] changed)
  1.93 for (op0[ref offset: 0] changed)
  2.80 for (op0[ref offset: 0] changed)
  4.23 for (op0[ref offset: 0] changed)
  11.88 for (op0[ref offset: 0] changed)
  4.59 for (op0[ref offset: 0] changed)
  3.16 for (op0[ref offset: 0] changed)
  2.29 for (op0[ref offset: 0] changed)
  1.76 for (op0[ref offset: 0] changed)
  1.44 for (op0[ref offset: 0] changed)
  1.24 for (op0[ref offset: 0] changed)
  1.12 for (op0[ref offset: 0] changed)
  calls:
    covered.constprop/148 --param max-inline-insns-auto limit reached
      freq:0.30 loop depth: 9 size: 4 time: 13 callee size:262 stack:1472 predicate: (op0[ref offset: 0] == 8)
       op0 is compile time invariant
       op0 points to local or readonly memory
       op1 is compile time invariant
       op1 points to local or readonly memory
    digits_2.constprop/144 inlined
      freq:0.90
      Stack frame offset 261, callee self size 261
      __builtin_unreachable/156 unreachable
        freq:0.00 cross module loop depth:18 size: 0 time:  0 predicate: (false)
         op0 is compile time invariant
         op0 points to local or readonly memory
         op1 is compile time invariant
         op1 points to local or readonly memory
      digits_2.constprop/145 inlined
        freq:0.81
        Stack frame offset 522, callee self size 261
        __builtin_unreachable/156 unreachable
          freq:0.00 cross module loop depth:27 size: 0 time:  0 predicate: (false)
           op0 points to local or readonly memory
           op1 is compile time invariant
           op1 points to local or readonly memory
        digits_2.constprop/146 --param large-function-growth limit reached
          freq:0.73 loop depth:27 size: 2 time: 11 callee size:1019 stack:522 predicate: (op0[ref offset: 0] != 8)
           op0 is compile time invariant
           op0 points to local or readonly memory

where inlining fails only at recursion depth 4, we get:

IPA function summary for digits_2.constprop.isra/163 inlinable
  global time:     17184.704285
  self size:       1277
  global size:     1994
  min size:       513
  self stack:      261
  global stack:    522
  estimated growth:301
    size:513.000000, time:6690.410500
    size:3.000000, time:2.000001,  executed if:(not inlined)
    size:0.500000, time:0.500000,  executed if:(not inlined),  nonconst if:(op0[ref offset: 0] changed) && (not inlined)
    size:138.500000, time:217.532556,  nonconst if:(op0[ref offset: 0] changed)
    size:36.000000, time:34.793911,  executed if:(op0[ref offset: 0],(# % 3) == 2),  nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0],(# % 3) == 2)
    size:198.000000, time:574.099545,  executed if:(op0[ref offset: 0],(# % 3) == 2)
    size:36.000000, time:34.793911,  executed if:(op0[ref offset: 0],(# % 3) == 1),  nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0],(# % 3) == 1)
    size:270.000000, time:1357.103458,  executed if:(op0[ref offset: 0],(# % 3) == 1)
    size:21.000000, time:375.971570,  executed if:(op0[ref offset: 0] == 5)
    size:723.000000, time:6582.815331,  executed if:(op0[ref offset: 0] != 8)
    size:1.000000, time:0.900000,  executed if:(op0[ref offset: 0] != 8),  nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0] != 8)
    size:48.000000, time:1300.920311,  executed if:(op0[ref offset: 0] == 8)
  loop iterations:  0.68 for (op0[ref offset: 0] changed)
  0.76 for (op0[ref offset: 0] changed)
  0.88 for (op0[ref offset: 0] changed)
  1.08 for (op0[ref offset: 0] changed)
  1.40 for (op0[ref offset: 0] changed)
  1.93 for (op0[ref offset: 0] changed)
  2.80 for (op0[ref offset: 0] changed)
  4.23 for (op0[ref offset: 0] changed)
  11.88 for (op0[ref offset: 0] changed)
  4.59 for (op0[ref offset: 0] changed)
  3.16 for (op0[ref offset: 0] changed)
  2.29 for (op0[ref offset: 0] changed)
  1.76 for (op0[ref offset: 0] changed)
  1.44 for (op0[ref offset: 0] changed)
  1.24 for (op0[ref offset: 0] changed)
  1.12 for (op0[ref offset: 0] changed)
  calls:
    digits_2.constprop.isra/162 inlined
      freq:0.90
      Stack frame offset 261, callee self size 261
      digits_2.constprop.isra/161 --param large-function-growth limit reached
        freq:0.81 loop depth:18 size: 2 time: 11 callee size:1033 stack:522 predicate: (op0[ref offset: 0] != 8)
         op0 is compile time invariant
         op0 points to local or readonly memory
      __builtin_unreachable/168 unreachable
        freq:0.00 cross module loop depth:18 size: 0 time:  0 predicate: (false)
         op0 is compile time invariant
         op0 points to local or readonly memory
         op1 is compile time invariant
         op1 points to local or readonly memory
    covered.constprop/148 --param max-inline-insns-auto limit reached
      freq:0.30 loop depth: 9 size: 4 time: 13 callee size:262 stack:1472 predicate: (op0[ref offset: 0] == 8)
       op0 is compile time invariant
       op0 points to local or readonly memory
       op1 is compile time invariant
       op1 points to local or readonly memory

where we fail at depth 2.
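
As a cross-check, the headline numbers of the two summaries can be compared
directly.  The sketch below only does arithmetic on values copied from the
dumps above; the dictionary keys and variable names are made up for
illustration, and reading the deltas as "one fewer inlined copy" is an
interpretation, not something the dump states.

```python
# Values copied from the two "IPA function summary" dumps in this comment.
# "ne8 size" is the size of the entry guarded by (op0[ref offset: 0] != 8).
without_isra = {"global size": 2534, "global stack": 783, "ne8 size": 1263}
with_isra = {"global size": 1994, "global stack": 522, "ne8 size": 723}

# "callee self size" reported for each inlined digits_2 clone.
CALLEE_SELF_STACK = 261


def deltas(a, b):
    """Per-key difference a - b for two summaries with the same keys."""
    return {key: a[key] - b[key] for key in a}


d = deltas(without_isra, with_isra)

# Number of 261-byte frames accounted for in each global stack figure.
frames_without_isra = without_isra["global stack"] // CALLEE_SELF_STACK
frames_with_isra = with_isra["global stack"] // CALLEE_SELF_STACK
```

The global size difference (540) equals the difference in the size of the
`!= 8` entry, and the global stack difference (261) is exactly one callee
frame, which is consistent with the isra variant stopping one inlining level
earlier.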
