From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/110625] [AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large
Date: Tue, 11 Jul 2023 10:41:44 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |aarch64
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener ---
Well, I think count is handled correctly even for SLP.
Given we accumulate 'short' to 'double', we likely perform 'count' adds to the m's here, and those are chained in a simple way. We specifically avoid creating more reduction variables, with and without SLP, if possible, because of register pressure issues. Note that when you have, for example, three scalar reductions, we will up the number of IVs to use with SLP, so using 'count' isn't always 100% accurate, but in the case of the testcase it should be. But I'm not sure what "reduction-latency" tries to measure.
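
To make the chaining point concrete, here is a minimal hand-written sketch. It is not the testcase from this report and not what the vectorizer emits; the function names and the two-accumulator split are assumptions chosen only for illustration. With a single accumulator every add waits on the previous one, so n adds form one serial chain. A second accumulator shortens the chain but keeps one more value live, which is the register pressure trade-off mentioned above.

```c
#include <stddef.h>

/* Hypothetical example: one accumulator, so the n adds form a single
   serial dependency chain (each add needs the previous result). */
double sum_chained(const short *a, size_t n)
{
    double m = 0.0;
    for (size_t i = 0; i < n; i++)
        m += (double)a[i];
    return m;
}

/* Hypothetical example: two accumulators give two independent chains of
   roughly n/2 adds each, at the cost of one more live register. */
double sum_two_chains(const short *a, size_t n)
{
    double m0 = 0.0, m1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        m0 += (double)a[i];
        m1 += (double)a[i + 1];
    }
    if (i < n)              /* odd element left over */
        m0 += (double)a[i];
    return m0 + m1;         /* final combine of the partial sums */
}
```

Both functions return the same value for small integer inputs like these; for general floating-point data the reassociation in the second variant can change rounding.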