From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 05C55385840C; Wed, 10 Jan 2024 08:12:46 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 05C55385840C
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1704874367;
	bh=cmt3fffsjwHt3PltpL/riAAJDvA6YkSW6a6VZEFYURo=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=Ep4zlTU3Yhtn9NKYsDoS3HHCYA+/y3MVQNyJkJ4Y77FI0ce9Kvx1bFL2lZZFi06n5
	 s8rSwUz+kEL/VWXCcVc3TV1jDHdIHDM9kkm1zeMgqq0wZycy9wwhtPejzwwExr2dtC
	 bZS0FFvj1FfLV5xZ97yA8MetVqDEavxntb4Azjb8=
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/113205] [14 Regression] internal compiler error: in
 backward_pass, at tree-vect-slp.cc:5346 since r14-3220
Date: Wed, 10 Jan 2024 08:12:45 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: ice-on-valid-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P1
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 14.0
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: cc see_also
Message-ID: <bug-113205-4-5AoQyZWXYF@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-113205-4@http.gcc.gnu.org/bugzilla/>
References: <bug-113205-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113205

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=110935

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, so this should already reproduce before the change if the invariant
add (p + 8000) is removed.  The issue seems to be that the SLP build ends up
with an unsupported load permutation when we retry with V2SImode vectorization
after V4SImode is rejected on cost grounds.  We have

t.c:18:10: note:   node 0x6471a48 (max_nunits=2, refcnt=2) vector(2) int
t.c:18:10: note:   op template: _3 = MEM[(int *)i.0_1 + 4B];
t.c:18:10: note:        stmt 0 _3 = MEM[(int *)i.0_1 + 4B];
t.c:18:10: note:        stmt 1 _5 = MEM[(int *)i.0_1 + 12B];
t.c:18:10: note:        stmt 2 _4 = MEM[(int *)i.0_1 + 8B];
t.c:18:10: note:        stmt 3 _2 =3D *i.0_1;
t.c:18:10: note:        load permutation { 1 3 2 0 }
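
As a reading aid for the dump above: the load permutation lists, for each
scalar stmt in order, which element of the load group it reads.  A minimal
sketch, assuming 4-byte int elements and a gap-free group based at *i.0_1
(the helper name load_permutation is hypothetical, not a GCC function):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not GCC code: turn each scalar load's byte offset
// from the group base into its element index within the load group.
// Assumes the group is contiguous, so index == offset / element size.
std::vector<unsigned>
load_permutation (const std::vector<std::size_t> &byte_offsets,
                  std::size_t elem_size)
{
  std::vector<unsigned> perm;
  perm.reserve (byte_offsets.size ());
  for (std::size_t off : byte_offsets)
    perm.push_back (static_cast<unsigned> (off / elem_size));
  return perm;
}
```

Feeding it the offsets of stmts 0 to 3 from the dump (4, 12, 8, 0) with an
element size of 4 gives { 1 3 2 0 }, matching the dumped load permutation.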

I'm not sure whether this load permutation is a supported situation.
Changing the code to fail more gracefully, like

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index b6cce55ce90..a12214bc1ad 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -5343,8 +5343,8 @@ vect_optimize_slp_pass::backward_pass ()
            }
        }

-      gcc_assert (min_layout_cost.is_possible ());
-      partition.layout = min_layout_i;
+      if (min_layout_cost.is_possible ())
+       partition.layout = min_layout_i;
     }
 }

then yields

t.c:18:10: note:  SLP optimize permutations:
t.c:18:10: note:    1: { 1, 3, 2, 0 }
t.c:18:10: note:  SLP optimize partitions:
t.c:18:10: note:    -------------
t.c:18:10: note:    partition 0 (layout 0):
t.c:18:10: note:      nodes:
t.c:18:10: note:        - 0x5f0d9b0:
t.c:18:10: note:            weight: 1.000000
t.c:18:10: note:            out weight: 1.000000 (degree 1)
t.c:18:10: note:            op template: _20 = (int) _19;
t.c:18:10: note:      edges:
t.c:18:10: note:        - 0x5f0d9b0 --> [2] 0x5f0d928
t.c:18:10: note:      layout 0: rejected
t.c:18:10: note:      layout 1: rejected
t.c:18:10: note:    -------------
t.c:18:10: note:    partition 1 (layout 1):
t.c:18:10: note:      nodes:
t.c:18:10: note:        - 0x5f0da38:
t.c:18:10: note:            weight: 1.000000
t.c:18:10: note:            out weight: 1.000000 (degree 1)
t.c:18:10: note:            op template: _3 = MEM[(int *)i.0_1 + 4B];
t.c:18:10: note:      edges:
t.c:18:10: note:        - 0x5f0da38 --> [2] 0x5f0d928
t.c:18:10: note:      layout 0: rejected
t.c:18:10: note:      layout 1: rejected
t.c:18:10: note:    -------------
t.c:18:10: note:    partition 2 (layout 1):
t.c:18:10: note:      nodes:
t.c:18:10: note:        - 0x5f0d928:
t.c:18:10: note:            weight: 1.000000
t.c:18:10: note:            out weight: 1.000000 (degree 1)
t.c:18:10: note:            op template: _21 = _3 * _20;
t.c:18:10: note:      edges:
t.c:18:10: note:        - 0x5f0d928 --> [3] 0x5f0d8a0
t.c:18:10: note:        - 0x5f0d9b0 [0] --> 0x5f0d928
t.c:18:10: note:        - 0x5f0da38 [1] --> 0x5f0d928
t.c:18:10: note:      layout 0: rejected
t.c:18:10: note:      layout 1: rejected
t.c:18:10: note:    -------------
t.c:18:10: note:    partition 3 (layout 1):
t.c:18:10: note:      nodes:
t.c:18:10: note:        - 0x5f0d8a0:
t.c:18:10: note:            weight: 1.000000
t.c:18:10: note:            op template: _22 = (unsigned int) _21;
t.c:18:10: note:      edges:
t.c:18:10: note:        - 0x5f0d928 [2] --> 0x5f0d8a0
t.c:18:10: note:      layout 0:
t.c:18:10: note:          {depth: 1.000000, total: 1.000000}
t.c:18:10: note:        + {depth: 0.000000, total: 0.000000}
t.c:18:10: note:        + {depth: 0.000000, total: 0.000000}
t.c:18:10: note:        = {depth: 1.000000, total: 1.000000}
t.c:18:10: note:      layout 1: (*)
t.c:18:10: note:          {depth: 0.000000, total: 0.000000}
t.c:18:10: note:        + {depth: 0.000000, total: 0.000000}
t.c:18:10: note:        + {depth: 0.000000, total: 0.000000}
t.c:18:10: note:        = {depth: 0.000000, total: 0.000000}
t.c:18:10: note:  inserting permutation node in place of 0x5f0d9b0
t.c:18:10: note:  recording new base alignment for i.0_1
...
t.c:18:10: note:   vectorizing permutation op0[3] op0[0] op0[2] op0[1]
t.c:18:10: note:   vectorizing permutation op0[3] op0[0] op0[2] op0[1]
t.c:18:10: note:   as vops0[1][1] vops0[0][0], vops0[1][0] vops0[0][1]
t.c:18:10: missed:   unsupported vect permute { 1 2 }
t.c:18:10: note:   Building vector operands of 0x5f0db48 from scalars instead
...
t.c:18:10: note:   removing SLP instance operations starting from: _25 = _24 + _40;
t.c:18:10: missed:  not vectorized: bad operation in basic block.
t.c:18:10: note: ***** Analysis failed with vector mode V8QI
t.c:18:10: note: ***** Re-trying analysis with vector mode V4QI

and the ICE is gone.
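
For reading the "vectorizing permutation" lines in the dump: with two lanes
per vector, scalar lane op0[N] lives in vector N / 2 at element N % 2, which
is how op0[3] op0[0] becomes vops0[1][1] vops0[0][0].  A rough sketch of that
split and of the resulting two-input mask follows.  The helper names are
hypothetical, and the rule that the first vector referenced becomes permute
input 0 is an assumption for illustration, not taken from the dump:

```cpp
#include <utility>
#include <vector>

// Hypothetical helper, not GCC code: split a scalar lane index into
// (source vector index, element index within that vector).
std::pair<unsigned, unsigned>
split_lane (unsigned lane, unsigned nunits)
{
  return {lane / nunits, lane % nunits};
}

// Build a two-input permute mask for one output vector.  Assumption: the
// vector holding the first lane is input 0; any other vector is treated as
// input 1, whose elements are numbered from nunits upwards.  Only valid
// when the lanes come from at most two source vectors.
std::vector<unsigned>
permute_mask (const std::vector<unsigned> &lanes, unsigned nunits)
{
  std::vector<unsigned> mask;
  if (lanes.empty ())
    return mask;
  const unsigned first_vec = split_lane (lanes.front (), nunits).first;
  for (unsigned lane : lanes)
    {
      const std::pair<unsigned, unsigned> pos = split_lane (lane, nunits);
      mask.push_back (pos.first == first_vec ? pos.second
                                             : nunits + pos.second);
    }
  return mask;
}
```

Under those assumptions, lanes 3 and 0 with nunits 2 give the mask { 1 2 },
the one the dump reports as unsupported.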

I'm not sure whether we can "recover" in this way, or whether leaving
partition.layout unchanged could lead to wrong-code if the result could
actually be code generated; that is, whether it is really the inability
to generate the permute that triggers this issue.
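
To make the "recover" behaviour concrete, here is a standalone sketch of the
pattern: pick the cheapest layout that is possible and keep the current one
when every candidate is rejected.  The layout_cost struct and choose_layout
are hypothetical stand-ins, not the types in tree-vect-slp.cc; only
is_possible mirrors a name used in the patch.  It does not settle whether
keeping the old layout is safe:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-layout cost; not GCC's cost class.
struct layout_cost
{
  double total;    // Illustrative scalar cost.
  bool possible;   // False when the layout was rejected.
  bool is_possible () const { return possible; }
};

// Return the index of the cheapest possible layout.  If every layout was
// rejected, return CURRENT unchanged instead of asserting.
std::size_t
choose_layout (const std::vector<layout_cost> &costs, std::size_t current)
{
  bool found = false;
  std::size_t best = current;
  for (std::size_t i = 0; i < costs.size (); ++i)
    if (costs[i].is_possible ()
        && (!found || costs[i].total < costs[best].total))
      {
        best = i;
        found = true;
      }
  return best;
}
```

With both layouts rejected, as for partitions 0 to 2 in the dump, the call
returns the layout passed in as current; with costs 1.0 and 0.0 both
possible, as for partition 3, it returns layout 1.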

Related to PR110935, with -Ofast we should elide the unsupported permute.