From: "rguenther at suse dot de"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
Date: Fri, 28 Jul 2023 07:22:19 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 13.3
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #17 from rguenther at suse dot de ---
On Thu, 27 Jul 2023, hubicka at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293
>
> --- Comment #15 from Jan Hubicka ---
>    if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
>        /* If result of comparsion is unknown, prefer EARLY_BB.
>          Thus use !(...>=..) rather than (...<...)
>          */
> -      && !(best_bb->count * 100 >= early_bb->count * threshold))
> +      && !(best_bb->count * 100 > early_bb->count * threshold))
>      return best_bb;
>
> Comparing loop depths seems certainly odd.
> If we want to test best_bb and early_bb to be in same loop, we want to test
> loop_father.  What is a benefit of testing across loop nests?

This heuristic wants to catch

  if (foo) abort ();

and avoid sinking "too far" across a path with "similar enough"
execution count (I think the original motivation was to fix some
spilling / register pressure issue).  The loop depth test
should be !(bb_loop_depth (best_bb) < bb_loop_depth (early_bb)),
so we shouldn't limit sinking to a more outer nest.  As we rule
out > before, this becomes ==.

It looks tempting to sink to the earliest place with the same
execution count rather than the latest, but the above doesn't
really achieve that (it doesn't look "upwards" but simply fails).
With a guessed profile it's also going to be hard.

And it in no way implements register pressure / spilling sensitivity
(see also Ajit's attempts at producing a patch that avoids sinking
across a call).  All these are ultimately doomed unless we at least
consider a group of stmts together.
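
For illustration only, the two tests discussed above can be modelled
with plain integers.  This is a rough sketch, not the GCC code:
prefer_best and its parameters are hypothetical names, counts are
assumed to be known (the "unknown comparison" case from the quoted
comment is not modelled), a shallower best_bb is assumed to be always
accepted, and the caller is assumed to have ruled out a deeper best_bb.
The strict flag selects the ">" form from the quoted patch instead of
the existing ">=" form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the sinking decision.  Returns true if the
   sketch would pick BEST_BB, false if it would stay with EARLY_BB.
   Depths stand in for bb_loop_depth (), counts for bb->count.  */
static bool
prefer_best (int best_depth, int early_depth,
             uint64_t best_count, uint64_t early_count,
             unsigned threshold, bool strict)
{
  /* Assumption: a deeper BEST_BB was ruled out before we get here.  */
  assert (best_depth <= early_depth);

  /* Assumption: sinking to a more outer nest is never limited.  */
  if (best_depth < early_depth)
    return true;

  /* Same depth: require BEST_BB to be executed clearly less often.  */
  uint64_t lhs = best_count * 100;
  uint64_t rhs = early_count * (uint64_t) threshold;
  bool similar_enough = strict ? lhs > rhs : lhs >= rhs;

  return !similar_enough;
}
```

With an arbitrary threshold of 75, the "if (foo) abort ();" situation
corresponds to something like counts 99 vs. 100 at equal depth, where
prefer_best (1, 1, 99, 100, 75, false) is false, so the statement is
not sunk.  The two forms only differ on the exact boundary
(75 vs. 100 here).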