From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id BFFFB385843B; Tue,  5 Mar 2024 10:44:59 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org BFFFB385843B
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1709635499;
	bh=Wpl09r0WukfDN4BobVs71ol7BwxFTl0P61jQ/Pbp5Go=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=vBh5eIM8B3c6cga+J5kpv/+diHa2ToweFypFGtris1JnhN565BeFsSmdRTi4CrNuC
	 eExYifW+irUBPG+WOfdmiqBnIZ76BBMEAKQB4L24FoCwcGYjLX1KFh1Vej3VobVgBv
	 RZW6PoU443b6ixeJm7BhkuZ61yL2PNz6nI1ZWHAQ=
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/113441] [14 Regression] Fail to fold the last
 element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7
Date: Tue, 05 Mar 2024 10:44:57 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 14.0
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-113441-4-sSMoWAdntZ@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-113441-4@http.gcc.gnu.org/bugzilla/>
References: <bug-113441-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441
--- Comment #44 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Sandiford from comment #42)
> Created attachment 57605 [details]
> proof-of-concept patch to suppress peeling for gaps
>
> How about the attached?  It records whether all accesses that require
> peeling for gaps could instead have used gathers, and only retries when
> that's true.  It means that we retry for only 0.034% of calls to
> vect_analyze_loop_1 in a build of SPEC2017 with -mcpu=neoverse-v1 -Ofast
> -fomit-frame-pointer.

I guess this idea would work, but as I said, a full re-analysis shouldn't be
required; instead, "just" the cost of the affected loads/stores needs to be
recomputed?  Of course this would require quite some implementation work.
If we just want to fix this regression, the approach looks sensible, but it
would also be applied to x86, which doesn't want to compare costs, right?
I'm not sure the gather vs. permute costing there makes this a good idea
for stage4?