From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 7C932395B066; Wed, 16 Nov 2022 15:13:29 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 7C932395B066
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1668611609;
	bh=A5pu9OlC4VEjneAnotV2ShFwFjZ2Klib0/tJ/HIxhD8=;
	h=From:To:Subject:Date:From;
	b=Xu4IMnIEfJovYNFb8NsevzvnMwoVQOsfoETeoTO+ZkWAZensVVRYhRHk/vhNt1Kfp
	 5EDL41mU2DGLBmKvQZvLpBZvFJYeKzUgT74uVBO01dJbKmJvPsxJOC+JP+KEJfpEwD
	 x7iYUfFaH5+GJKanUpJyBPY3FQUa/HifD0aQ4V7E=
From: "tnfchris at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/107717] New: [13 Regression] ICEs expanding
 permutes after g:dc95e1e9702f2f6367bbc108c8d01169be1b66d2
Date: Wed, 16 Nov 2022 15:13:29 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: ice-on-valid-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: tnfchris at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
 keywords bug_severity priority component assigned_to reporter
 target_milestone cf_gcctarget
Message-ID: <bug-107717-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107717

            Bug ID: 107717
           Summary: [13 Regression] ICEs expanding permutes after
                    g:dc95e1e9702f2f6367bbc108c8d01169be1b66d2
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Keywords: ice-on-valid-code
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64*

After

commit dc95e1e9702f2f6367bbc108c8d01169be1b66d2 (origin/trunk, origin/master,
origin/HEAD)
Author: Hongyu Wang <hongyu.wang@intel.com>
Date:   Mon Jan 17 13:01:51 2022 +0800

    Optimize VEC_PERM_EXPR with same permutation index and operation

    The sequence
         c1 = VEC_PERM_EXPR (a, a, mask)
         c2 = VEC_PERM_EXPR (b, b, mask)
         c3 = c1 op c2
    can be optimized to
         c = a op b
         c3 = VEC_PERM_EXPR (c, c, mask)
    for all integer vector operation, and float operation with
    full permutation.

    gcc/ChangeLog:

            PR target/98167
            * match.pd: New perm + vector op patterns for int and fp vector.

    gcc/testsuite/ChangeLog:

            PR target/98167
            * gcc.target/i386/pr98167.c: New test.
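
For reference, the identity the commit relies on can be checked with a small
plain-C model. This is only a sketch under assumptions: the four-lane arrays,
the helper names (perm1, add, fold_agrees) and the use of addition as "op" are
all hypothetical and are not taken from match.pd or from the commit.

```c
#include <assert.h>
#include <string.h>

#define N 4  /* lanes per vector in this toy model (assumption) */

/* Model of VEC_PERM_EXPR (v, v, mask): lane i takes v[mask[i] % N]. */
static void perm1(const int *v, const unsigned *mask, int *out)
{
  for (int i = 0; i < N; i++)
    out[i] = v[mask[i] % N];
}

/* Lane-wise addition standing in for "op". */
static void add(const int *a, const int *b, int *out)
{
  for (int i = 0; i < N; i++)
    out[i] = a[i] + b[i];
}

/* Before the fold: c1 = perm (a), c2 = perm (b), c3 = c1 op c2. */
void perm_then_op(const int *a, const int *b, const unsigned *mask, int *c3)
{
  int c1[N], c2[N];
  perm1(a, mask, c1);
  perm1(b, mask, c2);
  add(c1, c2, c3);
}

/* After the fold: c = a op b, c3 = perm (c). */
void op_then_perm(const int *a, const int *b, const unsigned *mask, int *c3)
{
  int c[N];
  add(a, b, c);
  perm1(c, mask, c3);
}

/* Returns 1 when both orderings produce the same lanes. */
int fold_agrees(const int *a, const int *b, const unsigned *mask)
{
  int before[N], after[N];
  perm_then_op(a, b, mask, before);
  op_then_perm(a, b, mask, after);
  return memcmp(before, after, sizeof before) == 0;
}

/* Small self-check with a lane-reversing mask. */
void self_check(void)
{
  const int a[N] = {1, 2, 3, 4};
  const int b[N] = {10, 20, 30, 40};
  const unsigned mask[N] = {3, 2, 1, 0};
  assert(fold_agrees(a, b, mask));
}
```

The model only shows that the two orderings agree lane by lane for an integer
operation. It says nothing about how the folded form is later expanded, which
is where the problem reported here arises.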

We see various ICEs. One example is

void foo(int n, char *restrict out, char *restrict in) {
  for (int i=n; i-->0; ) {
    out[i] += in[i];
  }
}

compiled with

aarch64-none-linux-gnu -O3 -march=armv8-a+sve2

The problem is that the match.pd pattern as written causes the permute to
switch from a single-register permute to a two-register one.

The reason is that when the folded result is expanded in SSA form

vec_perm (op @0 @1) (op @0 @1)

applying op twice produces two distinct SSA names. This fails because
expand_vec_perm_const now tries to use a two-operand expansion, since there
is no easy way to tell that these two operands are the same.
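
As an illustration of why the two forms are equivalent in value even though
they look different to the expander, here is a plain-C model of a two-operand
permute. It is a sketch under assumptions: the four-lane arrays, the helper
names (perm2, add, two_copies_match_one) and the use of addition as "op" are
hypothetical, and the model does not reflect how expand_vec_perm_const is
implemented.

```c
#include <assert.h>
#include <string.h>

#define N 4  /* lanes per vector in this toy model (assumption) */

/* Model of VEC_PERM_EXPR (v0, v1, mask): index k in [0, 2N) selects
   v0[k] when k < N and v1[k - N] otherwise. */
static void perm2(const int *v0, const int *v1, const unsigned *mask, int *out)
{
  for (int i = 0; i < N; i++) {
    unsigned k = mask[i] % (2 * N);
    out[i] = k < N ? v0[k] : v1[k - N];
  }
}

/* Lane-wise addition standing in for "op". */
static void add(const int *a, const int *b, int *out)
{
  for (int i = 0; i < N; i++)
    out[i] = a[i] + b[i];
}

/* Returns 1 when permuting two separately computed copies of a op b
   gives the same lanes as permuting a single copy. */
int two_copies_match_one(const int *a, const int *b, const unsigned *mask)
{
  int t1[N], t2[N], two_op[N], one_op[N];
  add(a, b, t1);                /* first evaluation of (op @0 @1) */
  add(a, b, t2);                /* second evaluation: other storage, same value */
  perm2(t1, t2, mask, two_op);  /* two distinct operands */
  perm2(t1, t1, mask, one_op);  /* one operand used twice, as after CSE */
  return memcmp(two_op, one_op, sizeof two_op) == 0;
}

/* Small self-check with a mask that reads from both operands. */
void self_check(void)
{
  const int a[N] = {1, 2, 3, 4};
  const int b[N] = {10, 20, 30, 40};
  const unsigned mask[N] = {7, 2, 5, 0};
  assert(two_copies_match_one(a, b, mask));
}
```

In the model, t1 and t2 play the role of the two SSA names: they hold the same
value, but nothing in the permute itself records that fact.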

If it happens early enough we can CSE the operands, but when this happens
after vec_lower it generates something the target does not support.

I tried getting expand_vec_perm_const to recognize that they are the same,
but that's quite hard.

It's best to prevent the generation of the two SSA names to begin with, or
add an additional rule for match.pd that's able to CSE this.

I'm filing this issue because I don't know which approach upstream would
prefer, so it's easier to ask first.