From: "aldyh at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/107009] [13 Regression] massive unnecessary code blowup in vectorizer
Date: Fri, 23 Sep 2022 17:56:56 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P1
X-Bugzilla-Target-Milestone: 13.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107009

--- Comment #5 from Aldy Hernandez ---
There are two things needed to fix this regression.

First, we need an op1_range entry for bitwise-and, so that the 2->4 edge
range has the correct nonzero bits for n_12.

    [local count: 118111600]:
    _1 = n_12(D) & 7;
    if (_1 != 0)
      goto ; [0.00%]
    else
      goto ; [100.00%]

With the correct tweak to range-ops, we have:

    2->4  (F) n_12(D) :  [irange] size_t [1, 18446744073709551608] NONZERO 0xfffffffffffffff8

This is correct, and it is what DOM would need from ranger to get the
nonzero mask right.

However, set_global_ranges_from_unreachable_edges() in DOM is only
exporting ranges for unreachable edges on the SSA names feeding the final
conditional above (_1). It also needs to calculate these ranges for other
exports from this BB. In this case, we'd need to do the same thing for
n_12 as well as _1.

In my conversion of DOM+evrp to DOM+ranger, I missed that evrp was doing
this dance for all the ranges it knew about coming out of the BB, not just
op1 of the conditional.

This is legacy evrp:

      /* Push updated ranges only after finding all of them to avoid
         ordering issues that can lead to worse ranges.  */
      for (unsigned i = 0; i < vrs.length (); ++i)
    ...
    ...
          if (is_fallthru
              && m_update_global_ranges
              && all_uses_feed_or_dominated_by_stmt (vrs[i].first, stmt)
              /* The condition must post-dominate the definition point.  */
              && (SSA_NAME_IS_DEFAULT_DEF (vrs[i].first)
                  || (gimple_bb (SSA_NAME_DEF_STMT (vrs[i].first))
                      == pred_e->src)))
            {
              set_ssa_range_info (vrs[i].first, vrs[i].second);
              maybe_set_nonzero_bits (pred_e, vrs[i].first);
            }

All we'd need in theory is to loop over the exports to BB2, and export
those if the same logic applies.

I have two patches in testing that fix the regression.
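
For reference, the nonzero-bits arithmetic on the 2->4 edge can be checked
with a small standalone sketch. This is not the range-ops code and does not
use any GCC API; the helper names below are made up for illustration, and
it assumes a 64-bit size_t as in the dump above. On the false edge of
"(n_12 & 7) != 0" we know n_12 & 7 == 0, so only the bits outside the mask
may be set.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of GCC): if (op1 & mask) == 0 is known to
// hold on an edge, every bit set in MASK must be clear in op1, so the bits
// that may still be nonzero are exactly the complement of MASK.
constexpr std::uint64_t
nonzero_bits_when_and_is_zero (std::uint64_t mask)
{
  return ~mask;
}

// Hypothetical helper: true if VALUE sets no bit outside the
// possibly-nonzero mask NZ.
constexpr bool
fits_nonzero_bits (std::uint64_t value, std::uint64_t nz)
{
  return (value & ~nz) == 0;
}

// Compile-time check against the numbers in the dump: masking with 7
// leaves 0xfffffffffffffff8, which is also the upper bound of the range.
static_assert (nonzero_bits_when_and_is_zero (7) == 0xfffffffffffffff8ULL,
	       "low three bits must be known zero");
static_assert (nonzero_bits_when_and_is_zero (7) == 18446744073709551608ULL,
	       "upper bound of the range equals the nonzero mask");
```

Any multiple of 8 passes fits_nonzero_bits() with that mask, and anything
with one of the low three bits set does not, which is the property the
vectorizer needs to see for n_12.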