From mboxrd@z Thu Jan 1 00:00:00 1970
From: "hubicka at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
Date: Thu, 27 Jul 2023 18:01:54 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: hubicka at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 13.3
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Content-Type: text/plain; charset="UTF-8"
Auto-Submitted: auto-generated

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #15 from Jan Hubicka ---
   if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
       /* If result of comparsion is unknown, prefer EARLY_BB.
          Thus use !(...>=..) rather than (...<...)  */
-      && !(best_bb->count * 100 >= early_bb->count * threshold))
+      && !(best_bb->count * 100 > early_bb->count * threshold))
     return best_bb;

Comparing loop depths certainly seems odd.
If we want to test that best_bb and early_bb are in the same loop, we want to
test loop_father.  What is the benefit of testing across loop nests?

Profile report here claims:

dump id |static mismat|dynamic mismatch                               |
        |in count     |in count                  |time                |
lsplit  |     5     +5|  8151850567   +8151850567| 531506481006 +57.9%|
ldist   |     9     +4| 15345493501   +7193642934| 606848841056 +14.2%|
ifcvt   |    10     +1| 15487514871    +142021370| 689469797790 +13.6%|
vect    |    35    +25| 17558425961   +2070911090| 517375405715 -25.0%|
cunroll |    42     +7| 16898736178    -659689783| 452445796198  -4.9%|
loopdone|    33     -9|  2678017188  -14220718990| 330969127663       |
tracer  |    34     +1|  2678018710         +1522| 330613415364  +0.0%|
fre     |    33     -1|  2676980249      -1038461| 330465677073  -0.0%|
expand  |    28     -5|  2497468467    -179511782|--------------------|

so it looks like loop splitting, loop distribution and the vectorizer disturb
the profile significantly.  (ifcvt does so by design and the damage is undone
later.)  Not sure if that is the real problem though.
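
To make the loop-depth remark concrete, here is a minimal sketch of why
comparing depths is weaker than comparing loop_father.  The toy_loop and
toy_bb structures and all toy_* / *_p helpers below are hypothetical
stand-ins invented for this illustration, not GCC's real data structures;
only the idea (two sibling loops share a depth but are different loops) is
taken from the comment above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's loop and basic-block structures;
   only the fields needed for the comparison are modelled.  */
struct toy_loop
{
  struct toy_loop *outer;        /* enclosing loop, NULL for the root */
};

struct toy_bb
{
  struct toy_loop *loop_father;  /* innermost loop containing the block */
};

/* Number of loops enclosing LOOP; the root has depth 0.  */
static unsigned
toy_loop_depth (const struct toy_loop *loop)
{
  unsigned depth = 0;
  for (; loop->outer != NULL; loop = loop->outer)
    depth++;
  return depth;
}

/* The kind of test the quoted condition performs: equal nesting depth.  */
static bool
same_depth_p (const struct toy_bb *a, const struct toy_bb *b)
{
  return toy_loop_depth (a->loop_father) == toy_loop_depth (b->loop_father);
}

/* The stricter test suggested above: identical innermost loop.  */
static bool
same_loop_p (const struct toy_bb *a, const struct toy_bb *b)
{
  return a->loop_father == b->loop_father;
}

/* Two sibling loops under one root: same depth, different loops.  */
static void
sibling_loops_example (void)
{
  struct toy_loop root = { NULL };
  struct toy_loop first = { &root };
  struct toy_loop second = { &root };
  struct toy_bb early_bb = { &first };
  struct toy_bb best_bb = { &second };

  assert (same_depth_p (&best_bb, &early_bb));
  assert (!same_loop_p (&best_bb, &early_bb));
}
```

In this toy model a depth-only check accepts a pair of blocks that sit in
unrelated sibling loops, while the loop_father check rejects it.  Whether
that case actually occurs for the blocks considered in the quoted code is
not established here.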