From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop since g:2efe3a7de0107618397264017fb045f237764cc7
Date: Tue, 05 Mar 2024 10:44:57 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #44 from Richard Biener ---
(In reply to Richard Sandiford from comment #42)
> Created attachment 57605 [details]
> proof-of-concept patch to suppress peeling for gaps
>
> How about the attached?  It records whether all accesses that require
> peeling for gaps could instead have used gathers, and only retries when
> that's true.  It means that we retry for only 0.034% of calls to
> vect_analyze_loop_1 in a build of SPEC2017 with -mcpu=neoverse-v1 -Ofast
> -fomit-frame-pointer.

I guess this idea would work but, as said, full re-analysis shouldn't be
required; instead "just" the updated costs of the affected loads/stores
need to be recomputed?  Of course this would require quite some
implementation work.

If we just want to fix this regression the approach looks sensible, but it
would also be applied to x86, which doesn't want to compare costs, right?
I'm not sure the gather vs. permute costing there makes this a good idea
for stage4?
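
To make the difference between "retry the whole analysis" and "recompute
only the affected costs" concrete, here is a toy C++ sketch.  It is not GCC
code and does not reflect the attached patch: Access, worth_retrying and
recost_delta are hypothetical names, and the per-access costs are made-up
inputs rather than anything a real target cost model produces.  It only
shows the retry condition described in comment #42 next to an incremental
cost update over the same accesses.

```cpp
#include <vector>

// Hypothetical toy model; none of these names exist in GCC.
enum class AccessKind { Contiguous, PermuteWithGapPeeling, Gather };

struct Access {
  AccessKind kind;        // strategy chosen by the first analysis
  bool gather_possible;   // could this access have used a gather instead?
  unsigned permute_cost;  // assumed cost of the load/permute form
  unsigned gather_cost;   // assumed cost of the gather form
};

// Retry condition as described in comment #42: there is at least one
// access that requires peeling for gaps, and every such access could
// instead have used a gather.
bool worth_retrying(const std::vector<Access>& accesses) {
  bool any_peeling = false;
  for (const Access& a : accesses) {
    if (a.kind != AccessKind::PermuteWithGapPeeling)
      continue;
    if (!a.gather_possible)
      return false;
    any_peeling = true;
  }
  return any_peeling;
}

// Incremental recosting: instead of re-analysing the loop, sum the cost
// change for only those accesses that would switch to gathers.  A positive
// result means the gather forms are more expensive in total.
long recost_delta(const std::vector<Access>& accesses) {
  long delta = 0;
  for (const Access& a : accesses) {
    if (a.kind == AccessKind::PermuteWithGapPeeling && a.gather_possible)
      delta += static_cast<long>(a.gather_cost) -
               static_cast<long>(a.permute_cost);
  }
  return delta;
}
```

In this model the caller would weigh recost_delta against whatever is saved
by no longer peeling for gaps; how that saving is measured is left open
here, as it is in the discussion above.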