From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 14E683858C29; Sat, 30 Dec 2023 12:35:51 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 14E683858C29
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1703939751;
	bh=0oyYfwR+Vn/nGvmlkKUG+HPyIenYcU79++X08AHnrG0=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=MGJ8/+Ij5qZyP+7rL+AMyFqHxk6lGnuNQkbmpkrnvmMgeAJsQwTxx5VhHLBi9gmKU
	 myedI3aP4CNo3coqA59QhU5cybcW/323nCI+pEynAwIlWwpW8Uisg5CBowO9TnjqJO
	 BBovEr5R39i1iN1IG6ZxEJR9jKD8mS17nm8mqjzI=
From: "rsandifo at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/113104] Suboptimal loop-based slp node
 splicing across iterations
Date: Sat, 30 Dec 2023 12:35:49 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rsandifo at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rsandifo at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_status cf_reconfirmed_on everconfirmed cc
 assigned_to
Message-ID: <bug-113104-4-Ky6KpQU5nY@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-113104-4@http.gcc.gnu.org/bugzilla/>
References: <bug-113104-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113104

Richard Sandiford <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2023-12-30
     Ever confirmed|0                           |1
                 CC|                            |rsandifo at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org      |rsandifo at gcc dot gnu.org

--- Comment #4 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
FWIW, we do get the desired code with -march=armv8-a+sve (even though the test
doesn't use SVE).  This is because of:

  /* Consider enabling VECT_COMPARE_COSTS for SVE, both so that we
     can compare SVE against Advanced SIMD and so that we can compare
     multiple SVE vectorization approaches against each other.  There's
     not really any point doing this for Advanced SIMD only, since the
     first mode that works should always be the best.  */
  if (TARGET_SVE && aarch64_sve_compare_costs)
    flags |= VECT_COMPARE_COSTS;

The testcase in this PR is a counterexample to the claim in the final sentence.
I think the comment might predate significant support for mixed-sized Advanced
SIMD vectorisation.

If we enable SVE (or comment out the "if" line, so that the flag is always
set), the costs are 13 units per vector iteration for 128-bit vectors and 4
units per vector iteration for 64-bit vectors (so 8 units per 128 bits on a
like-for-like basis).  The 64-bit version is therefore seen as significantly
cheaper and is chosen ahead of the 128-bit version.
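
To make the like-for-like comparison concrete, here is a minimal sketch of
the arithmetic only.  The helpers cost_per_bits and cheaper_width are
hypothetical names invented for illustration; they are not GCC functions and
this is not how the vectoriser's cost model is implemented.  The sketch
assumes the wider vector size is a multiple of the narrower one.

```c
#include <assert.h>

/* Hypothetical helper (not part of GCC): scale a per-iteration cost for
   VECTOR_BITS-wide vectors to the cost of processing COMMON_BITS of data.
   Assumes COMMON_BITS is a multiple of VECTOR_BITS.  */
static unsigned int
cost_per_bits (unsigned int cost_per_iter, unsigned int vector_bits,
               unsigned int common_bits)
{
  assert (vector_bits > 0 && common_bits % vector_bits == 0);
  return cost_per_iter * (common_bits / vector_bits);
}

/* Hypothetical helper (not part of GCC): return the width in bits of the
   cheaper of two candidates after scaling both to the wider width.
   Ties go to the first candidate.  */
static unsigned int
cheaper_width (unsigned int cost_a, unsigned int bits_a,
               unsigned int cost_b, unsigned int bits_b)
{
  unsigned int common = bits_a > bits_b ? bits_a : bits_b;
  unsigned int norm_a = cost_per_bits (cost_a, bits_a, common);
  unsigned int norm_b = cost_per_bits (cost_b, bits_b, common);
  return norm_b < norm_a ? bits_b : bits_a;
}
```

With the figures quoted above, cost_per_bits (4, 64, 128) gives 8, which is
less than the 13 units for the 128-bit candidate, so
cheaper_width (13, 128, 4, 64) picks the 64-bit version.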

I think this PR is enough proof that we should enable VECT_COMPARE_COSTS even
without SVE.  Assigning to myself for that.