From mboxrd@z Thu Jan 1 00:00:00 1970
Received: by sourceware.org (Postfix, from userid 48) id 4F438385802A; Mon, 15 Mar 2021 09:52:09 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 4F438385802A
From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/90579] [8/9/10/11 Regression] Huge store forward stall due to vectorizer, missed CSE
Date: Mon, 15 Mar 2021 09:52:08 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 9.1.1
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 8.5
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
X-List-Received-Date: Mon, 15 Mar 2021 09:52:09 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579

--- Comment #12 from Richard Biener ---
(In reply to Richard Biener from comment #9)
> So we now have a "real" FRE after the vectorizer but we fail to CSE
>
>   MEM [(double *)&r] = vect__3.20_74;
> ...
>   MEM [(double *)&r + 32B] = vect__62.26_88;
> ...
>   vect__5.7_34 = MEM [(double *)&r + 16B];
>
> mine for GCC 11 to look at.
> The code to CSE that load for _74 and _88
> is going to be a bit awkward though but it will nicely combine with the
> following stmts
>
>   vect__5.8_35 = VEC_PERM_EXPR ;
>   stmp_t_12.9_36 = BIT_FIELD_REF ;
>   stmp_t_12.9_37 = stmp_t_12.9_36 + 0.0;
>   stmp_t_12.9_38 = BIT_FIELD_REF ;
>   stmp_t_12.9_39 = stmp_t_12.9_37 + stmp_t_12.9_38;
>   stmp_t_12.9_40 = BIT_FIELD_REF ;
>   stmp_t_12.9_41 = stmp_t_12.9_39 + stmp_t_12.9_40;
>   stmp_t_12.9_42 = BIT_FIELD_REF ;
>   t_12 = stmp_t_12.9_41 + stmp_t_12.9_42;
>
> and hopefully elide 'r' completely.

So the difficult part is that we need to compose the upper v2df half of
vect__3.20_74 with the v2df vect__62.26_88. The assembly for that would be
something like

  vextractf128 $0x1, %ymm0, %xmm0
  vinsertf128 $0x1, %xmm1, %ymm0, %ymm0

and on GIMPLE

  tem_42 = BIT_FIELD_REF ;
  vect__5.7_34 = { tem_42, vect__62.26_88 };

That is two stmts, which VN simplification insertion currently doesn't
support. It would be "nicer" to enhance, for example, VEC_PERM to allow

  vect__5.7_34 = VEC_PERM

"implicitly" extending _88 to v4df (aka a paradoxical v4df subreg of the
v2df SSE reg). That would turn VEC_PERM into a concat + select operation
that does not require the intermediate to have a vector mode (here the
intermediate would be v6df, which is not a possible mode), so no subregs
need to be introduced. Unfortunately, on RTL

  (vec_select:V4DF (vec_concat (reg:V4DF ..) (reg:V2DF ..)) ..)

is not possible because of that restriction. On the other hand, RTL lacks
such a concat-and-select operation; having one would allow the cited form
and vec_merge to be "merged" (vec_merge doesn't require such an
intermediate mode either).

I'll see how difficult it is to teach VN multi-stmt insertions.