From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/98138] BB vect fail to SLP one case
Date: Wed, 06 Jan 2021 09:48:38 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138

--- Comment #6 from Richard Biener ---
Starting from the loads is not how SLP discovery works, so there will be zero re-use of code. Sure, the only important thing is that you end up with a valid SLP graph.

But going back to the original testcase and the proposed vectorization for power: is that faster in the end?

For the "rewrite" of the vectorizer into all-SLP we do have to address that "interleaving scheme not carried out as interleaving" at some point, but that's usually for loop vectorization; for BB vectorization all we have is optimize_slp.
I have patches that would build the vector load SLP node (you still have to kill that 'build from scalars' thing to make it trigger). But then we end up with a shared vector load node and N extract/splat operations at the 'scalar' points, and it's not entirely clear to me how to re-arrange the SLP graph at that point.

Btw, on current trunk the simplified testcase no longer runs into the 'scalar operand' build case, but of course the vectorization is deemed not profitable. Pattern recognition of the plus/minus subgraphs may help (I'm not sure whether ppc has those as instructions; x86 does).

That said, the "failure" to identify the common (vector) load is known, and I do have experimental patches trying to address it, but I have not yet arrived at a conclusive "best" approach.
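To make the "shared vector load node plus N extract/splat operations" shape more concrete, here is a minimal C sketch using the GCC/Clang generic vector extension (`vector_size`). It is a source-level analogy only, not GCC's internal SLP graph: the functions `outer_scalar` and `outer_shared_load` and the outer-product computation are hypothetical illustrations, not the testcase from this PR. The second function loads `p[0..3]` once as a vector and then, at each "scalar" use point, extracts lane `i` and broadcasts it to all lanes.

```c
#include <string.h>

/* Generic vector of four 32-bit unsigned lanes (GCC/Clang extension). */
typedef unsigned int v4su __attribute__((vector_size(16)));

/* Scalar form: each p[i] is used as an individual scalar value. */
void outer_scalar(const unsigned *p, const unsigned *q, unsigned out[4][4])
{
  for (int i = 0; i < 4; i++)
    for (int j = 0; j < 4; j++)
      out[i][j] = p[i] * q[j];
}

/* Hypothetical "shared load" form: p is loaded once as a vector, and
   every scalar use point becomes an extract of lane i plus a splat. */
void outer_shared_load(const unsigned *p, const unsigned *q, unsigned out[4][4])
{
  v4su vp, vq;
  memcpy(&vp, p, sizeof vp);              /* single shared vector load of p[0..3] */
  memcpy(&vq, q, sizeof vq);              /* vector load of q[0..3] */
  for (int i = 0; i < 4; i++) {
    unsigned lane = vp[i];                /* extract lane i */
    v4su splat = {lane, lane, lane, lane}; /* broadcast it to all lanes */
    v4su row = splat * vq;                /* one vector multiply per row */
    memcpy(out[i], &row, sizeof row);
  }
}
```

Both functions compute the same result; whether one vector load with per-lane splats beats four scalar loads is a cost-model question, since each extract/splat typically costs a permute-type instruction.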
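The plus/minus subgraph pattern is one where neighbouring lanes alternate between subtraction and addition. Without a dedicated instruction, a vectorizer can compute a full vector subtract and a full vector add and then blend them; an addsub-style pattern would fuse this into a single operation (on x86, SSE3's `addsubps`/`addsubpd` do this for floating point, subtracting in even lanes and adding in odd lanes). The sketch below assumes unsigned 32-bit lanes so the blend can be written portably with bitwise operators; `addsub_scalar` and `addsub_blend` are hypothetical names.

```c
#include <string.h>

/* Generic vector of four 32-bit unsigned lanes (GCC/Clang extension). */
typedef unsigned int v4su __attribute__((vector_size(16)));

/* Scalar plus/minus lanes: even lanes subtract, odd lanes add. */
void addsub_scalar(const unsigned *a, const unsigned *b, unsigned *r)
{
  r[0] = a[0] - b[0];
  r[1] = a[1] + b[1];
  r[2] = a[2] - b[2];
  r[3] = a[3] + b[3];
}

/* Lowering without an addsub instruction: compute both a full add and a
   full subtract, then blend with a lane mask (all-ones selects the sum). */
void addsub_blend(const unsigned *a, const unsigned *b, unsigned *r)
{
  v4su va, vb;
  memcpy(&va, a, sizeof va);
  memcpy(&vb, b, sizeof vb);
  v4su sum = va + vb;
  v4su diff = va - vb;
  const v4su take_sum = {0u, ~0u, 0u, ~0u}; /* odd lanes take the sum */
  v4su res = (sum & take_sum) | (diff & ~take_sum);
  memcpy(r, &res, sizeof res);
}
```

The blend version spends two arithmetic operations plus mask logic per vector, which is the overhead a recognized addsub pattern would remove on targets that provide such an instruction.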