From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48) id 127FB3858D35; Thu, 16 Mar 2023 17:03:05 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 127FB3858D35
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1678986185; bh=KukIghGOLY88zDmbh82yNrHFs8wDP6EmhoExRCHtAlI=; h=From:To:Subject:Date:In-Reply-To:References:From; b=F9VpgnW3dHGP2Y0aqcD/U7jvvampG0XItwOG5leZDHZrt8mAJi97d/wtZtYfQEPdb l561hI9LDRvR4gyZyviikzlmbjmPBRaYQfooPqQIOiUEx+Loc4xZCXfMS9seW24TQv FX+b8ym/+aBBjZ4PfaJ1nvV+PE+bJvne3oIliXAA=
From: "tnfchris at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/109154] [13 regression] aarch64 -mcpu=neoverse-v1 microbude performance regression
Date: Thu, 16 Mar 2023 17:03:03 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: tnfchris at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 13.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154

--- Comment #2 from Tamar Christina ---

Confirmed. It looks like the extra range information from
g:4fbe3e6aa74dae5c75a73c46ae6683fdecd1a75d is leading jump threading down the
wrong path.

Reduced testcase:

---
int etot_0, fasten_main_natpro_chrg_init;
void fasten_main_natpro() {
  float elcdst = 1;
  for (int l; l < 1; l++) {
    int zone1 = l < 0.0f,
        chrg_e = fasten_main_natpro_chrg_init * (zone1 ?: 1) *
                 (l < elcdst ? 1 : 0.0f);
    etot_0 += chrg_e;
  }
}
---

Compile with `-O1`. The issue also affects all targets, not just AArch64
(https://godbolt.org/z/qes4K4oTz), and `-fno-thread-jumps` is confirmed to
"fix" it.

With the new range information, jump threading duplicates the edges on the
l < 0.0f check. The dump says:

"Jump threading proved probability of edge 5->7 too small
 (it is 41.0% (guessed) should be 69.5% (guessed))"

In BB 3 the branch probabilities are guessed as:

  if (_1 < 0.0)
    goto ; [41.00%]
  else
    goto ; [59.00%]

and in BB 5:

  if (_1 < 1.0e+0)
    goto ; [41.00%]
  else
    goto ; [59.00%]

and so it thinks that the chance of _1 >= 0.0 && _1 < 1.0 is very small:

  if (_1 < 1.0e+0)
    goto ; [14.80%]
  else
    goto ; [85.20%]

The problem is that BB 4 falls through to BB 5, and BB 6 falls through to
BB 7. Jump threading optimizes BB 5 by splitting the work done in BB 5 for
the fall-through from BB 4 back into BB 4. It then threads the additional
edge to BB 7, where the final calculation is now more expensive than before
(a three-way PHI node). But because the hot path through BB 6 also falls
into BB 7, the overall result is that all paths become slower: the hot path
picked up an additional comparison.

This is why the code slows down. For each instance of this occurrence (and
in the example provided by microbude it happens often) we get an additional
branch on a few paths. The slowdown is bigger for SVE (vs. the scalar
slowdown) because it then creates a longer dependency chain on producing
the predicate for the BB.

It looks like this threading shouldn't be done if both the hot and cold
branches end up in the same place?