From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id DE4B03858404; Thu, 28 Oct 2021 04:44:27 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org DE4B03858404
From: "pinskia at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/102977] [GCC12 regression] vectorizer failed to
 generate complex fma with SVE
Date: Thu, 28 Oct 2021 04:44:27 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: pinskia at gcc dot gnu.org
X-Bugzilla-Status: RESOLVED
X-Bugzilla-Resolution: INVALID
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: resolution bug_status
Message-ID: <bug-102977-4-0wp59iw0Ea@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-102977-4@http.gcc.gnu.org/bugzilla/>
References: <bug-102977-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 28 Oct 2021 04:44:28 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102977

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Huh.
The trunk code is vectorized all the way:
        ptrue   p1.h, vl8 ; set p1.h to 8 wide
        ptrue   p0.b, all ; set p0.b to all ones
        ld2h    {z2.h - z3.h}, p1/z, [x1] ; load the 8x2 vector into z2/z3
        ld2h    {z0.h - z1.h}, p1/z, [x2] ; load the 8x2 vector into z0/z1
        ld2h    {z16.h - z17.h}, p1/z, [x0] ; load the 8x2 vector into z16/z17
        fmul    z6.h, z0.h, z3.h ; z6 = z0 * z3
        movprfx z7, z16          ; z7 = z16
        fmla    z7.h, p0/m, z0.h, z2.h ; z7 += z0*z2
        fmla    z6.h, p0/m, z1.h, z2.h ; z6 += z1*z2
        movprfx z4, z7                 ; z4 = z7
        fmls    z4.h, p0/m, z1.h, z3.h ; z4 -= z1*z3
        fadd    z5.h, z6.h, z17.h      ; z5 =3D z6 + z17
        st2h    {z4.h - z5.h}, p1, [x0] ; store the 8x2 vector into x0


Note how ld2 works: the first element goes into the first vector, the second
element goes into the second vector, the 3rd element goes into the first
vector, the 4th element goes into the second vector, and so on.
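
A scalar C model may make the listing easier to follow. This is only a sketch:
the names ld2_model, st2_model and complex_fma_model are hypothetical, float
stands in for the half-float elements, and the assumption is that the source
loop computes acc += b * c on interleaved (real, imaginary) pairs, which is
what the instruction sequence above works out to.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 8 /* matches the "vl8" predicate in the listing */

/* Model of an ld2-style load: even-indexed elements go to the first
   vector, odd-indexed elements go to the second vector. */
static void ld2_model(const float *mem, float first[LANES], float second[LANES])
{
    assert(mem != NULL);
    for (size_t i = 0; i < LANES; i++) {
        first[i] = mem[2 * i];
        second[i] = mem[2 * i + 1];
    }
}

/* Model of an st2-style store: the inverse interleave. */
static void st2_model(float *mem, const float first[LANES], const float second[LANES])
{
    assert(mem != NULL);
    for (size_t i = 0; i < LANES; i++) {
        mem[2 * i] = first[i];
        mem[2 * i + 1] = second[i];
    }
}

/* acc += b * c on LANES interleaved complex values, using the same
   sequence of operations (and register names) as the listing. */
void complex_fma_model(float *acc, const float *b, const float *c)
{
    float z0[LANES], z1[LANES], z2[LANES], z3[LANES];
    float z4[LANES], z5[LANES], z16[LANES], z17[LANES];

    ld2_model(b, z2, z3);     /* ld2h {z2, z3}, [x1]   */
    ld2_model(c, z0, z1);     /* ld2h {z0, z1}, [x2]   */
    ld2_model(acc, z16, z17); /* ld2h {z16, z17}, [x0] */

    for (size_t i = 0; i < LANES; i++) {
        float z6 = z0[i] * z3[i];          /* fmul                 */
        float z7 = z16[i] + z0[i] * z2[i]; /* movprfx + fmla       */
        z6 += z1[i] * z2[i];               /* fmla                 */
        z4[i] = z7 - z1[i] * z3[i];        /* movprfx + fmls: real */
        z5[i] = z6 + z17[i];               /* fadd: imaginary      */
    }

    st2_model(acc, z4, z5);   /* st2h {z4, z5}, [x0]   */
}
```

The real parts end up in one vector and the imaginary parts in the other, so
the complex multiply-add needs no shuffles between the loads and the store.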

So this is optimized all the way. The lower limit on the size of the vectors
is 128 bits (or 8 half floats), so 8 half floats will always fit into one
vector just fine.
So this is vectorized all the way, such that it is even fully unrolled.
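
The lane arithmetic behind that can be sketched in C as follows. The helper
names lanes_per_vector and fits_in_min_vector are hypothetical; the only
assumption is the architectural minimum SVE vector length of 128 bits, with
16-bit half-float elements.

```c
#include <stdbool.h>

#define SVE_MIN_VECTOR_BITS 128u /* architectural minimum SVE vector length */
#define HALF_FLOAT_BITS 16u      /* size of one half-float element */

/* Number of elements of the given width that fit in one vector. */
unsigned lanes_per_vector(unsigned vector_bits, unsigned element_bits)
{
    return vector_bits / element_bits;
}

/* True if `count` elements fit in a single minimum-length SVE vector,
   and therefore in a vector of any legal SVE length. */
bool fits_in_min_vector(unsigned count, unsigned element_bits)
{
    return count <= lanes_per_vector(SVE_MIN_VECTOR_BITS, element_bits);
}
```

With these definitions, fits_in_min_vector(8, HALF_FLOAT_BITS) holds, which is
why a single "vl8" predicate covers the whole loop and no loop remains.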