From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id B4B5C38582B8; Fri, 2 Sep 2022 13:00:26 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B4B5C38582B8 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1662123626; bh=rPshdtOKFSV2D+lYmxODK5sfxblZQbtw4RsKkT5mUzs=; h=From:To:Subject:Date:In-Reply-To:References:From; b=xq3NyUOChSuF3A2LvgSEYB1V34Utgy9tsNILPSvGZcjv472UG74zKOGXsIORAN3XG WEP0x/Dzi3vpjEBpAEmue5WYYYcxZ4p4yAla/Yin3V6PRB9ZRKvuYi9tuJhOGgrXyh Y9Hu+i+VAfKAl+ecz7jizyrK2B8wJGs9xp5S4nGE= From: "cvs-commit at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug tree-optimization/106787] [13 Regression] ICE in vect_schedule_slp_node, at tree-vect-slp.cc:8648 since r13-2288-g61c4c989034548f4 Date: Fri, 02 Sep 2022 13:00:25 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: tree-optimization X-Bugzilla-Version: 13.0 X-Bugzilla-Keywords: ice-on-valid-code X-Bugzilla-Severity: normal X-Bugzilla-Who: cvs-commit at gcc dot gnu.org X-Bugzilla-Status: ASSIGNED X-Bugzilla-Resolution: X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: rsandifo at gcc dot gnu.org X-Bugzilla-Target-Milestone: 13.0 X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 List-Id: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D106787 --- Comment #3 from CVS Commits --- The trunk branch has been updated by Richard Sandiford : https://gcc.gnu.org/g:eab511df13ca6abb24c3c2abb0f420a89c91e310 commit r13-2377-geab511df13ca6abb24c3c2abb0f420a89c91e310 Author: Richard Sandiford Date: Fri Sep 2 14:00:14 2022 +0100 vect: Ensure SLP nodes don't end up in multiple BB partitions [PR106787] In the PR we have two REDUC_PLUS SLP instances that share a common load of stride 4. Each instance also has a unique contiguous load. Initially all three loads are out of order, so have a nontrivial load permutation. The layout pass puts them in order instead, For the two contiguous loads it is possible to do this by adjusting the SLP_LOAD_PERMUTATION to be { 0, 1, 2, 3 }. But a SLP_LOAD_PERMUTATION of { 0, 4, 8, 12 } is rejected as unsupported, so the pass creates a separate VEC_PERM_EXPR instead. Later the 4-stride load's initial SLP_LOAD_PERMUTATION is rejected too, so that the load gets replaced by an external node built from scalars. We then have an external node feeding a VEC_PERM_EXPR. VEC_PERM_EXPRs created in this way do not have any associated SLP_TREE_SCALAR_STMTS. This means that they do not affect the decision about which nodes should be in which subgraph for costing purposes. If the VEC_PERM_EXPR is fed by a vect_external_def, then the VEC_PERM_EXPR's input doesn't affect that decision either. The net effect is that a shared VEC_PERM_EXPR fed by an external def can appear in more than one subgraph. This triggered an ICE in vect_schedule_node, which (rightly) expects to be called no more than once for the same internal def. There seemed to be many possible fixes, including: (1) Replace unsupported loads with external defs *before* doing the layout optimisation. This would avoid the need for the VEC_PERM_EXPR altogether. (2) If the target doesn't support a load in its original layout, stop the layout optimisation from checking whether the target supports loads in any new candidate layout. In other words, treat all layouts as if they were supported whenever the original layout is not in fact supported. I'd rather not do this. In principle, the layout optimisation could convert an unsupported layout to a supported one. Selectively ignoring target support would work against that. We could try to look specifically for loads that will need to be decomposed, but that just seems like admitting that things are happening in the wrong order. (3) Add SLP_TREE_SCALAR_STMTS to VEC_PERM_EXPRs. That would be OK for this case, but wouldn't be possible for external defs that represent existing vectors. (4) Make vect_schedule_slp share SCC info between subgraphs. It feels like that's working around the partitioning problem rather than a real fix though. (5) Directly ensure that internal def nodes belong to a single subgraph. (1) is probably the best long-term fix, but (5) is much simpler. The subgraph partitioning code already has a hash set to record which nodes have been visited; we just need to convert that to a map from nodes to instances instead. gcc/ PR tree-optimization/106787 * tree-vect-slp.cc (vect_map_to_instance): New function, split = out from... (vect_bb_partition_graph_r): ...here. Replace the visited set with a map from nodes to instances. Ensure that a node only appears in one partition. (vect_bb_partition_graph): Update accordingly. gcc/testsuite/ * gcc.dg/vect/bb-slp-layout-19.c: New test.=