From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 601B338930DA; Tue,  6 Apr 2021 11:44:22 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 601B338930DA
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/99881] Regression compare -O2 -ftree-vectorize with -O2
 on SKX/CLX
Date: Tue, 06 Apr 2021 11:44:22 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-99881-4-Q7vReEIVQq@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-99881-4@http.gcc.gnu.org/bugzilla/>
References: <bug-99881-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 06 Apr 2021 11:44:22 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99881
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #4)
> (In reply to Richard Biener from comment #3)
> > But 2 element construction _should_ be cheap.  What is missing is the move
> > cost from GPR to XMM regs (but we do not have a good idea whether the sources
> > are memory, so it's not as clear-cut here either).
> >
> > IMHO a better approach might be to up unaligned vector store/load costs?
> >=20
> > For the testcase at hand why does a throughput of 1 pose a problem?  There's
> > only one punpckldq instruction around?
> >
>
> There are several lea/add (which also may use port 5) instructions around
> punpckldq, considering that FAST LEA and Int ALU will be common in address
> computation, throughput of 1 for punpckldq will be a bottleneck.
>
> refer to https://godbolt.org/z/hK9r5vTzd for original case

Too bad.  But this is starting to model resource constraints which are not
at all handled by the generic part of the vectorizer cost model.  We kind-of
have the ability to do this in the target (see how rs6000 models some of this
in its finish_cost hook via rs6000_density_test).  But then the cost model
suffers from quite some GIGO already and I fear adding complexity will only
produce more 'G'.
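
To illustrate the kind of adjustment meant here, a minimal sketch of a
density-style penalty applied when the costs are finalized.  This is not the
rs6000 code: the struct, the function name and the threshold values are all
made up for illustration, and it only looks at the loop body.

```c
/* Hypothetical cost summary for a loop body; not a GCC data structure.  */
struct body_costs {
  int vec_cost;     /* accumulated cost of vector statements */
  int scalar_cost;  /* accumulated cost of statements left scalar */
};

/* Illustrative thresholds only, chosen for the example.  */
enum {
  DENSITY_PCT_THRESHOLD = 85,   /* % of body cost that is vector work */
  DENSITY_SIZE_THRESHOLD = 70,  /* minimum total body cost to care */
  DENSITY_PENALTY_PCT = 10      /* extra cost applied to dense bodies */
};

/* Return the body cost, bumped by a penalty when the body is both large
   and dominated by vector statements (a crude stand-in for "the vector
   units will be the bottleneck").  */
int density_adjusted_cost(const struct body_costs *c)
{
  int total = c->vec_cost + c->scalar_cost;
  if (total <= 0)
    return 0;

  int density_pct = c->vec_cost * 100 / total;
  if (density_pct > DENSITY_PCT_THRESHOLD && total > DENSITY_SIZE_THRESHOLD)
    return total * (100 + DENSITY_PENALTY_PCT) / 100;

  return total;
}
```

Note this only knows "vector" vs. "not vector"; it has no notion of individual
ports, which is what the punpckldq vs. lea/add conflict would need.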

As you have seen, you need quite some offset to make up for the saved store.
I think trying to get integer_to_sse costed for the movd/pinsrq would be a
better way than parametrizing 'vec_construct' (because there's no vec_construct
instruction - there are multiple pieces to it).
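
Roughly what I have in mind, as a sketch only: cost the construction as the
sum of its pieces, depending on where each element comes from.  The struct,
enum and function below are hypothetical (the field names merely echo the
existing tuning entries), the numbers a caller plugs in are up to the tuning,
and the model is simplified (it charges one combining op per extra element
and ignores that pinsrq can take a GPR source directly).

```c
/* Hypothetical per-tuning costs; not the real processor_costs layout.  */
struct piece_costs {
  int integer_to_sse;  /* GPR -> XMM move (movd/movq) */
  int scalar_load;     /* load of one element from memory into XMM */
  int sse_op;          /* one combining op (punpckldq, unpcklpd, ...) */
};

/* Where an element of the vector CTOR lives before construction.  */
enum elt_source { ELT_IN_GPR, ELT_IN_MEMORY, ELT_IN_XMM };

/* Cost of building an NELTS-element vector: move each element into an
   XMM register if it is not already there, then combine the pieces.  */
int vec_construct_cost(const struct piece_costs *t,
                       const enum elt_source *src, int nelts)
{
  int cost = 0;

  for (int i = 0; i < nelts; i++)
    switch (src[i])
      {
      case ELT_IN_GPR:
        cost += t->integer_to_sse;
        break;
      case ELT_IN_MEMORY:
        cost += t->scalar_load;
        break;
      case ELT_IN_XMM:
        /* Already in a vector register, nothing to pay.  */
        break;
      }

  /* One combining instruction per element beyond the first.  */
  if (nelts > 1)
    cost += (nelts - 1) * t->sse_op;

  return cost;
}
```

With this shape a two element 'double' CTOR whose inputs already sit in XMM
registers stays at a single sse_op, while the integer case pays for the
GPR to XMM moves, without a separate knob on 'vec_construct' itself.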

> > Note that for the case of non-loop vectorization of 'double' the two element
> > vector CTORs are common and important to handle cheaply.  See also all the
> > discussion in PR98856