From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 965763858402; Thu, 31 Aug 2023 07:07:49 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 965763858402
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1693465669;
	bh=RP4S0D2I+gDCYolf4nxsr3ZG3T1DXf/EujFjaRGjbvI=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=Hk054I/j5lk3yR3nLNebmpo3uHohY+VNEAyQ/HFBBZtkTZIOAr7T4SVYKMZcYSANB
	 iXJ4tLvHN/K/4gVXg1fYA58n6pZ2iqkcU7tQDvwJvfgZOq/zYZegpMRpOc7HffzxM8
	 K/UHDMV0vQSCkF/9HQm3yJsSgukKH1jq30Quzu7o=
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/52252] An opportunity for x86 gcc vectorizer
 (gain up to 3 times)
Date: Thu, 31 Aug 2023 07:07:48 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 4.7.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: cc
Message-ID: <bug-52252-4-A1kBeSVjRY@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-52252-4@http.gcc.gnu.org/bugzilla/>
References: <bug-52252-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52252

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
We are not vectorizing this optimally yet: we use SLP to cover out[0],
out[1] and out[2], and single-element interleaving for out[3].  The stores
end up strided (aka scalar), which is not what the reporter intended.  We
also unroll the loop four times.
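
For illustration, a minimal C sketch of the loop shape under discussion.
It is an assumption, not the reporter's test case: the names convert_ref
and convert_split, the arithmetic, and the 3-in/4-out layout are all
hypothetical.  convert_split only emulates, in scalar code, how the store
group is divided between an SLP part and a single-element interleaved
part; it is not what the vectorizer literally emits.

```c
#include <stddef.h>

typedef unsigned char byte;

/* Hypothetical reference loop: every iteration reads 3 bytes and writes
   one store group of 4 adjacent bytes.  Lanes 0..2 compute the same kind
   of expression (one add); lane 3 differs (two adds). */
void convert_ref(const byte *in, byte *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        byte a = in[3 * i], b = in[3 * i + 1], c = in[3 * i + 2];
        out[4 * i + 0] = (byte)(a + b);
        out[4 * i + 1] = (byte)(a + c);
        out[4 * i + 2] = (byte)(b + c);
        out[4 * i + 3] = (byte)(a + b + c);
    }
}

/* Scalar emulation of the split described above.  Assumes in and out do
   not overlap; otherwise the two passes would not match convert_ref. */
void convert_split(const byte *in, byte *out, size_t n)
{
    /* "SLP" part: lanes 0..2 only, so each group of 4 has a gap at
       lane 3. */
    for (size_t i = 0; i < n; i++) {
        byte a = in[3 * i], b = in[3 * i + 1], c = in[3 * i + 2];
        out[4 * i + 0] = (byte)(a + b);
        out[4 * i + 1] = (byte)(a + c);
        out[4 * i + 2] = (byte)(b + c);
    }
    /* "Single-element interleaving" part: lane 3 only, stride 4. */
    for (size_t i = 0; i < n; i++) {
        byte a = in[3 * i], b = in[3 * i + 1], c = in[3 * i + 2];
        out[4 * i + 3] = (byte)(a + b + c);
    }
}
```

Both functions produce the same output.  The point is only that once the
group is split this way, neither part writes a full contiguous group of 4
on its own.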

The SLP discovery code splits the store group (in the end we should avoid
throwing away such information).  The split leaves the group with a gap,
and stores with a gap are only supported "strided" (we could at least
store two elements and then one, but ...).  We don't support "merging" the
group back together from its SLP and non-SLP parts.  With SLP only we
might recover here.  Possibly we shouldn't allow a half SLP / half non-SLP
store group at all, but SLP might still fail after discovery, so that
could be difficult to enforce.  Maybe a good case to "prime" single-lane
SLP.