From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 67E2B3858C60; Tue, 22 Aug 2023 09:34:25 +0000 (GMT)
From: "cvs-commit at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/94864] Failure to combine vunpckhpd+movsd into
 single vunpckhpd
Date: Tue, 22 Aug 2023 09:34:22 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 10.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: cvs-commit at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-94864-4-60iWid9rzt@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-94864-4@http.gcc.gnu.org/bugzilla/>
References: <bug-94864-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864
--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:27de9aa152141e7f3ee66372647d0f2cd94c4b90

commit r14-3381-g27de9aa152141e7f3ee66372647d0f2cd94c4b90
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Jul 12 15:01:47 2023 +0200

    tree-optimization/94864 - vector insert of vector extract simplification

    The PRs ask for optimizing

      _1 = BIT_FIELD_REF <b_3(D), 64, 64>;
      result_4 = BIT_INSERT_EXPR <a_2(D), _1, 64>;

    to a vector permutation.  The following implements this as a
    match.pd pattern, improving code generation on x86_64.
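
    As a rough illustration (not the match.pd pattern itself, and not one
    of the new testcases), the equivalence can be modelled with plain
    arrays in C.  The helper names vec_perm and extract_insert are made up
    for this sketch, and the selector convention (0..n-1 pick from the
    first input, n..2n-1 from the second) is assumed to mirror
    VEC_PERM_EXPR:

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of a two-input vector permute: selector values 0..n-1
   pick from v0, values n..2n-1 pick from v1 (assumed convention). */
static void vec_perm(const double *v0, const double *v1,
                     const unsigned *sel, size_t n, double *out)
{
    for (size_t i = 0; i < n; i++) {
        assert(sel[i] < 2 * n);
        out[i] = sel[i] < n ? v0[sel[i]] : v1[sel[i] - n];
    }
}

/* Model of the extract + insert pair: copy a, then overwrite lane `to`
   with lane `from` of b.  `out` must not alias `b`. */
static void extract_insert(const double *a, const double *b,
                           size_t from, size_t to, size_t n, double *out)
{
    assert(from < n && to < n);
    for (size_t i = 0; i < n; i++)
        out[i] = a[i];
    out[to] = b[from];
}
```

    With a = {1.0, 2.0} and b = {3.0, 4.0}, extract_insert (a, b, 1, 1, 2,
    out) and vec_perm (a, b, sel, 2, out) with sel = {0, 3} both give
    {1.0, 4.0}; lane 1 corresponds to bit offset 64 for 64-bit elements.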

    On the RTL level we face the issue that backend patterns inconsistently
    use vec_merge and vec_select of vec_concat to represent permutes.

    I think using a (supported) permute is almost always better
    than an extract plus insert, maybe excluding the case where we
    extract element zero and that's aliased to a register that can be
    used directly for insertion (not sure how to query that).

    The patch FAILs one case in gcc.target/i386/avx512fp16-vmovsh-1a.c
    where we now expand from

     __A_28 = VEC_PERM_EXPR <x2.8_9, x1.9_10, { 0, 9, 10, 11, 12, 13, 14, 15 }>;

    instead of

     _28 = BIT_FIELD_REF <x2.8_9, 16, 0>;
     __A_29 = BIT_INSERT_EXPR <x1.9_10, _28, 0>;

    producing a vpblendw instruction instead of the expected vmovsh.  That's
    either a missed vec_perm_const expansion optimization or even better,
    an improvement - Zen4 for example has 4 ports to execute vpblendw
    but only 3 for executing vmovsh and both instructions have the same size.

    The patch XFAILs the sub-testcase.
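
    The selector in the VEC_PERM_EXPR above can be read as "keep every
    lane of the second operand except the inserted one".  A minimal sketch
    of building such a selector follows; insert_selector is a hypothetical
    helper for illustration only, and the operand order (source of the
    extracted element first, destination second) is taken from the dump
    shown above rather than from the pattern's implementation:

```c
#include <assert.h>
#include <stddef.h>

/* Build a selector for a permute of (src, dst) that yields dst with
   lane `to` replaced by src[from].  Indices 0..n-1 address src and
   n..2n-1 address dst. */
static void insert_selector(size_t n, size_t from, size_t to, unsigned *sel)
{
    assert(from < n && to < n);
    for (size_t i = 0; i < n; i++)
        sel[i] = (unsigned)(n + i);   /* default: keep dst lane i */
    sel[to] = (unsigned)from;         /* replaced lane comes from src */
}
```

    For eight 16-bit lanes with from = 0 and to = 0 this produces
    { 0, 9, 10, 11, 12, 13, 14, 15 }, matching the selector in the dump.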

            PR tree-optimization/94864
            PR tree-optimization/94865
            PR tree-optimization/93080
            * match.pd (bit_insert @0 (BIT_FIELD_REF @1 ..) ..): New pattern
            for vector insertion from vector extraction.

            * gcc.target/i386/pr94864.c: New testcase.
            * gcc.target/i386/pr94865.c: Likewise.
            * gcc.target/i386/avx512fp16-vmovsh-1a.c: XFAIL.
            * gcc.dg/tree-ssa/forwprop-40.c: Likewise.
            * gcc.dg/tree-ssa/forwprop-41.c: Likewise.