From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 70A8E383B80B; Mon, 16 Aug 2021 09:43:38 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 70A8E383B80B
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/101929] r12-2549 regress x264_r by 4% on CLX.
Date: Mon, 16 Aug 2021 09:43:37 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: cc
Message-ID: <bug-101929-4-MGB48OpQRL@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-101929-4@http.gcc.gnu.org/bugzilla/>
References: <bug-101929-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Mon, 16 Aug 2021 09:43:38 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101929

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
It's interesting to note that in

-  _820 = {_187, _189, _187, _189};
-  vect_t2_188.65_821 = VIEW_CONVERT_EXPR<vector(4) int>(_820);
-  vect__200.67_823 = vect_t0_184.64_819 - vect_t2_188.65_821;
-  vect__191.66_822 = vect_t0_184.64_819 + vect_t2_188.65_821;
-  _824 = VEC_PERM_EXPR <vect__191.66_822, vect__200.67_823, { 0, 1, 6, 7 }>;

we only need parts of the CTOR for the add/sub parts (because we ignore
some lanes with the blend).  That might even allow eliding the final
compose of the low/high part and expose some more insn parallelism.

Of course that looks quite difficult to achieve.
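
To make the lane observation concrete, here is a minimal scalar sketch in C
(not GCC code).  It models the four-lane vectors as plain int arrays and
checks that the blend with selector { 0, 1, 6, 7 } only ever reads lanes
0..1 of the add and lanes 2..3 of the sub.  The function names, the int
element type and the use of arrays in place of real vector registers are
assumptions made for illustration only.

```c
#include <assert.h>
#include <string.h>

#define LANES 4

/* Full-width version, mirroring the GIMPLE above: add and sub on all four
   lanes, then a blend of the two results.  */
void addsub_full(const int t0[LANES], int a, int b, int out[LANES])
{
    /* Models the CTOR {_187, _189, _187, _189}.  */
    int t2[LANES] = { a, b, a, b };
    /* Models the VEC_PERM_EXPR selector: 0..3 index the first operand
       (the sum), 4..7 index the second operand (the difference).  */
    const int sel[LANES] = { 0, 1, 6, 7 };
    int sum[LANES], diff[LANES];

    for (int i = 0; i < LANES; i++) {
        sum[i] = t0[i] + t2[i];
        diff[i] = t0[i] - t2[i];
    }
    for (int i = 0; i < LANES; i++) {
        assert(sel[i] >= 0 && sel[i] < 2 * LANES);
        out[i] = sel[i] < LANES ? sum[sel[i]] : diff[sel[i] - LANES];
    }
}

/* Narrow version: compute only the lanes the blend actually keeps, so the
   add needs just the low half of the CTOR and the sub just the high half.  */
void addsub_halves(const int t0[LANES], int a, int b, int out[LANES])
{
    out[0] = t0[0] + a;
    out[1] = t0[1] + b;
    out[2] = t0[2] - a;
    out[3] = t0[3] - b;
}

/* Returns 1 when both versions produce the same four lanes.  */
int addsub_versions_agree(const int t0[LANES], int a, int b)
{
    int full[LANES], halves[LANES];

    addsub_full(t0, a, b, full);
    addsub_halves(t0, a, b, halves);
    return memcmp(full, halves, sizeof full) == 0;
}
```

Calling addsub_versions_agree with any inputs that do not overflow returns 1,
which is the scalar equivalent of saying that half of each full-width
operation is dead after the blend.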

--

Note your CTOR cost estimates might be off given the CTORs are mostly
regular like

{ _181, _181, _181, _181, _262, _262, _262, _262, _343, _343, _343, _343,
_48, _48, _48, _48 }

thus could use 4 splats to xmm and 4 inserts?  For the V4SI vectorization
we unfortunately decide to do

t.c:37:9: note:   Using a splat of the uniform operand
t.c:37:9: note:   Using a splat of the uniform operand
t.c:37:9: note:   Building parent vector operands from scalars instead

and thus end up with { _49, _50, _49, _50 }.  That said, I don't think
the backend gets easy access to the actual CTOR layout yet to improve
costing (similar to permutes and the actual permute mask).
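
As a rough illustration of why a regular CTOR could be costed lower, the
following scalar C sketch (again not GCC code) builds the 16-element layout
two ways: one write per element, and one splat per group of four followed by
an insert of that group.  The names, the int element type and the modelling
of xmm splats and inserts as small array copies are assumptions; no claim is
made about the instructions the backend would actually pick.

```c
#include <assert.h>
#include <string.h>

enum { GROUP = 4, NGROUPS = 4, NELTS = GROUP * NGROUPS };

/* Returns 1 when every aligned group of GROUP elements is uniform, i.e.
   the CTOR has the "regular" shape shown above.  */
int ctor_is_regular(const int elts[NELTS])
{
    for (int i = 0; i < NELTS; i++)
        if (elts[i] != elts[i - i % GROUP])
            return 0;
    return 1;
}

/* Generic construction: one scalar write per element (16 in total).  */
void ctor_elementwise(const int elts[NELTS], int out[NELTS])
{
    for (int i = 0; i < NELTS; i++)
        out[i] = elts[i];
}

/* Regular construction: 4 splats of one scalar each, then 4 group inserts.
   Only valid for CTORs accepted by ctor_is_regular.  */
void ctor_splat_insert(const int elts[NELTS], int out[NELTS])
{
    assert(ctor_is_regular(elts));
    for (int g = 0; g < NGROUPS; g++) {
        int part[GROUP];

        /* Splat: broadcast the group's single scalar to all four lanes.  */
        for (int i = 0; i < GROUP; i++)
            part[i] = elts[g * GROUP];
        /* Insert: place the splatted group into its slot of the result.  */
        memcpy(out + g * GROUP, part, sizeof part);
    }
}
```

Both functions produce identical output for a regular CTOR, but the second
touches only four distinct scalars, which is the information a cost hook
would need to see in order to price the two cases differently.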

--

It's difficult (if not impossible) for the vectorizer to second-guess
the followup FRE; we're a long way from doing loop + SLP vectorization
in one go and discovering we can elide the vector store.