From: "hubicka at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug ipa/103227] 58% exchange2 regression with -Ofast -march=native on zen3 between g:1ae8edf5f73ca5c3 and g:2af63f0f53a12a72
Date: Sat, 13 Nov 2021 22:11:16 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103227

--- Comment #2 from Jan Hubicka ---
There is a difference in the inliner decisions. Since all clones are of the same size, the outcome depends on the order in which the inliner picks them and combines them before hitting the large-function-growth limit. It seems that with the isra ordering the inliner is simply less lucky.
Instead of the inline stack:

IPA function summary for digits_2.constprop/143 inlinable
  global time: 22960.500916
  self size: 1277
  global size: 2534
  min size: 513
  self stack: 261
  global stack: 783
  estimated growth:-488
    size:513.000000, time:6690.410500
    size:3.000000, time:2.000001, executed if:(not inlined)
    size:0.500000, time:0.500000, executed if:(not inlined), nonconst if:(op0[ref offset: 0] changed) && (not inlined)
    size:138.500000, time:217.532556, nonconst if:(op0[ref offset: 0] changed)
    size:36.000000, time:34.793911, executed if:(op0[ref offset: 0],(# % 3) == 2), nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0],(# % 3) == 2)
    size:198.000000, time:574.099545, executed if:(op0[ref offset: 0],(# % 3) == 2)
    size:36.000000, time:34.793911, executed if:(op0[ref offset: 0],(# % 3) == 1), nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0],(# % 3) == 1)
    size:270.000000, time:1357.103458, executed if:(op0[ref offset: 0],(# % 3) == 1)
    size:21.000000, time:375.971570, executed if:(op0[ref offset: 0] == 5)
    size:1263.000000, time:12359.502960, executed if:(op0[ref offset: 0] != 8)
    size:1.000000, time:0.900000, executed if:(op0[ref offset: 0] != 8), nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0] != 8)
    size:48.000000, time:1300.920311, executed if:(op0[ref offset: 0] == 8)
  loop iterations:
    0.68 for (op0[ref offset: 0] changed)
    0.76 for (op0[ref offset: 0] changed)
    0.88 for (op0[ref offset: 0] changed)
    1.08 for (op0[ref offset: 0] changed)
    1.40 for (op0[ref offset: 0] changed)
    1.93 for (op0[ref offset: 0] changed)
    2.80 for (op0[ref offset: 0] changed)
    4.23 for (op0[ref offset: 0] changed)
    11.88 for (op0[ref offset: 0] changed)
    4.59 for (op0[ref offset: 0] changed)
    3.16 for (op0[ref offset: 0] changed)
    2.29 for (op0[ref offset: 0] changed)
    1.76 for (op0[ref offset: 0] changed)
    1.44 for (op0[ref offset: 0] changed)
    1.24 for (op0[ref offset: 0] changed)
    1.12 for (op0[ref offset: 0] changed)
  calls:
    covered.constprop/148 --param max-inline-insns-auto limit reached
      freq:0.30 loop depth: 9 size: 4 time: 13 callee size:262 stack:1472 predicate: (op0[ref offset: 0] == 8)
      op0 is compile time invariant
      op0 points to local or readonly memory
      op1 is compile time invariant
      op1 points to local or readonly memory
    digits_2.constprop/144 inlined
      freq:0.90
      Stack frame offset 261, callee self size 261
      __builtin_unreachable/156 unreachable
        freq:0.00 cross module loop depth:18 size: 0 time: 0 predicate: (false)
        op0 is compile time invariant
        op0 points to local or readonly memory
        op1 is compile time invariant
        op1 points to local or readonly memory
      digits_2.constprop/145 inlined
        freq:0.81
        Stack frame offset 522, callee self size 261
        __builtin_unreachable/156 unreachable
          freq:0.00 cross module loop depth:27 size: 0 time: 0 predicate: (false)
          op0 points to local or readonly memory
          op1 is compile time invariant
          op1 points to local or readonly memory
        digits_2.constprop/146 --param large-function-growth limit reached
          freq:0.73 loop depth:27 size: 2 time: 11 callee size:1019 stack:522 predicate: (op0[ref offset: 0] != 8)
          op0 is compile time invariant
          op0 points to local or readonly memory

where inlining fails only at recursion depth 4, we get:

IPA function summary for digits_2.constprop.isra/163 inlinable
  global time: 17184.704285
  self size: 1277
  global size: 1994
  min size: 513
  self stack: 261
  global stack: 522
  estimated growth:301
    size:513.000000, time:6690.410500
    size:3.000000, time:2.000001, executed if:(not inlined)
    size:0.500000, time:0.500000, executed if:(not inlined), nonconst if:(op0[ref offset: 0] changed) && (not inlined)
    size:138.500000, time:217.532556, nonconst if:(op0[ref offset: 0] changed)
    size:36.000000, time:34.793911, executed if:(op0[ref offset: 0],(# % 3) == 2), nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0],(# % 3) == 2)
    size:198.000000, time:574.099545, executed if:(op0[ref offset: 0],(# % 3) == 2)
    size:36.000000, time:34.793911, executed if:(op0[ref offset: 0],(# % 3) == 1), nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0],(# % 3) == 1)
    size:270.000000, time:1357.103458, executed if:(op0[ref offset: 0],(# % 3) == 1)
    size:21.000000, time:375.971570, executed if:(op0[ref offset: 0] == 5)
    size:723.000000, time:6582.815331, executed if:(op0[ref offset: 0] != 8)
    size:1.000000, time:0.900000, executed if:(op0[ref offset: 0] != 8), nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0] != 8)
    size:48.000000, time:1300.920311, executed if:(op0[ref offset: 0] == 8)
  loop iterations:
    0.68 for (op0[ref offset: 0] changed)
    0.76 for (op0[ref offset: 0] changed)
    0.88 for (op0[ref offset: 0] changed)
    1.08 for (op0[ref offset: 0] changed)
    1.40 for (op0[ref offset: 0] changed)
    1.93 for (op0[ref offset: 0] changed)
    2.80 for (op0[ref offset: 0] changed)
    4.23 for (op0[ref offset: 0] changed)
    11.88 for (op0[ref offset: 0] changed)
    4.59 for (op0[ref offset: 0] changed)
    3.16 for (op0[ref offset: 0] changed)
    2.29 for (op0[ref offset: 0] changed)
    1.76 for (op0[ref offset: 0] changed)
    1.44 for (op0[ref offset: 0] changed)
    1.24 for (op0[ref offset: 0] changed)
    1.12 for (op0[ref offset: 0] changed)
  calls:
    digits_2.constprop.isra/162 inlined
      freq:0.90
      Stack frame offset 261, callee self size 261
      digits_2.constprop.isra/161 --param large-function-growth limit reached
        freq:0.81 loop depth:18 size: 2 time: 11 callee size:1033 stack:522 predicate: (op0[ref offset: 0] != 8)
        op0 is compile time invariant
        op0 points to local or readonly memory
      __builtin_unreachable/168 unreachable
        freq:0.00 cross module loop depth:18 size: 0 time: 0 predicate: (false)
        op0 is compile time invariant
        op0 points to local or readonly memory
        op1 is compile time invariant
        op1 points to local or readonly memory
    covered.constprop/148 --param max-inline-insns-auto limit reached
      freq:0.30 loop depth: 9 size: 4 time: 13 callee size:262 stack:1472 predicate: (op0[ref offset: 0] == 8)
      op0 is compile time invariant
      op0 points to local or readonly memory
      op1 is compile time invariant
      op1 points to local or readonly memory

where we fail at depth 2.
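The order dependence described in this comment can be illustrated with a deliberately simplified toy model. It is a hypothetical sketch, not GCC's inliner: the chain of clones, the fixed `cap` standing in for the large-function-growth limit, and the names `greedy_inline`, `top_down` and `bottom_up` are all assumptions made for illustration. Each clone calls the next one, has exactly one caller, and all clones have the same size.

```python
# Hypothetical toy model, NOT GCC's ipa-inline algorithm: a chain of equally
# sized clones f0 -> f1 -> ... -> f(n-1), inlined greedily under a fixed
# body-size cap that stands in (as an assumption) for large-function-growth.
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (caller clone, callee clone)


def greedy_inline(n: int, size: int, cap: int, order: List[Edge]) -> Dict[int, int]:
    """Visit call edges in `order`, inlining each one whose resulting body
    stays within `cap`.  Returns {out-of-line function: clones in its body}."""
    owner = list(range(n))              # owner[i]: function whose body now holds clone i
    body = {i: size for i in range(n)}  # body size of each out-of-line function
    for caller, callee in order:
        root = owner[caller]            # the call site lives in this function's body
        if body[root] + body[callee] > cap:
            continue                    # toy "growth limit reached": keep the call
        body[root] += body.pop(callee)  # single caller, so the callee disappears
        for i in range(n):
            if owner[i] == callee:
                owner[i] = root
    return {f: b // size for f, b in body.items()}


def top_down(n: int) -> List[Edge]:
    """Visit f0->f1 first, then f1->f2, and so on."""
    return [(i, i + 1) for i in range(n - 1)]


def bottom_up(n: int) -> List[Edge]:
    """Visit the innermost edge f(n-2)->f(n-1) first."""
    return list(reversed(top_down(n)))


if __name__ == "__main__":
    print(greedy_inline(9, 1, 4, top_down(9)))   # {0: 4, 4: 4, 8: 1}
    print(greedy_inline(9, 1, 4, bottom_up(9)))  # {0: 1, 1: 4, 5: 4}
```

With the same cap, both visiting orders end with three out-of-line functions, but top-down leaves four clones combined into the entry clone f0 while bottom-up leaves f0 with only its own body. In miniature, this is the effect described above: which clone gets the deep inline stack depends on the order in which equally sized clones are combined.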