From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/52252] An opportunity for x86 gcc vectorizer (gain up to 3 times)
Date: Thu, 31 Aug 2023 07:07:48 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 4.7.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52252

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #9 from Richard Biener ---
We are not yet vectorizing this optimally. We use SLP to cover out[0], out[1] and out[2], and single-element interleaving for out[3]. The stores end up strided (i.e. scalar), which is not what the reporter intended.
We also unroll the loop four times. SLP discovery splits the store group (eventually we should avoid throwing away such information). This leaves the group with a gap, and stores with a gap are only supported as "strided" (we could at least store two elements and then one element, but ...). We don't support "merging" the group back together from its SLP and non-SLP parts. With SLP only we might recover here. Possibly we shouldn't allow a store group to be half SLP / half non-SLP, but SLP can still fail after discovery, so it might be difficult to force this. Maybe a good case to "prime" single-lane SLP.
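For illustration, here is a hypothetical C loop of the shape discussed above. It is a sketch under assumptions, not the reporter's testcase. Each iteration reads three bytes and writes a four-element store group out[0]..out[3]. The stores to out[0], out[1] and out[2] are isomorphic (each is "value minus k"), which is what SLP can cover. The store to out[3] is computed differently, which is where single-element interleaving comes in. The names `rgb_to_cmyk` and `min_u8` and the arithmetic are invented for this sketch.

```c
#include <stddef.h>

/* Hypothetical helper: minimum of two bytes. */
static unsigned char min_u8(unsigned char a, unsigned char b)
{
    return a < b ? a : b;
}

/* Hypothetical loop: 3 input bytes -> 4 output bytes per iteration.
   out[0..2] are isomorphic ("x - k") and form the SLP-able part of the
   store group; out[3] = k is a different operation. */
void rgb_to_cmyk(const unsigned char *in, unsigned char *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char c = 255 - in[0];
        unsigned char m = 255 - in[1];
        unsigned char y = 255 - in[2];
        unsigned char k = min_u8(c, min_u8(m, y));
        out[0] = c - k;   /* k <= c, so no wrap-around */
        out[1] = m - k;
        out[2] = y - k;
        out[3] = k;       /* not isomorphic with out[0..2] */
        in += 3;
        out += 4;
    }
}
```

Compiling a loop like this with `gcc -O3 -fopt-info-vec-missed` is one way to see how a given GCC version treats the store group. The exact diagnostics differ between releases.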