From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id C667D384A06B; Fri, 10 May 2024 07:52:37 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org C667D384A06B
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1715327557;
	bh=FT1KiJnjJM9glKwD9rYa+J+cBUKVVjV8MTnpQk394a4=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=SFSh8N4aCAasP0dbM/LmDTe7nYvhNmWLlIKGRwV6saGGE0pWcYKX8D2Qfl+As5/xC
	 7FeBmgP2BQMvFOGNz27wtoEKjEh9V3cPTXEHMLJak7XtKZYUauqYzoS8gkD1JtcM9j
	 8Oh3vJ0e9vjYLyBi3v5ETiC5OfJ5424YnWHvNFqs=
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/114987] [14/15 Regression] floating point vector
 regression, x86, between gcc 14 and gcc-13 using -O3 and target clones on
 skylake platforms
Date: Fri, 10 May 2024 07:52:37 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 14.2
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_status short_desc everconfirmed
 cf_reconfirmed_on cf_gcctarget target_milestone
Message-ID: <bug-114987-4-5kQ2WlvEjX@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-114987-4@http.gcc.gnu.org/bugzilla/>
References: <bug-114987-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114987

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
            Summary|[14/15 regression] floating |[14/15 Regression] floating
                   |point vector regression,    |point vector regression,
                   |x86, between gcc 14 and     |x86, between gcc 14 and
                   |gcc-13 using -O3 and target |gcc-13 using -O3 and target
                   |clones on skylake platforms |clones on skylake platforms
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-05-10
             Target|x86_64                      |x86_64-*-*
   Target Milestone|---                         |14.2
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
I can't reproduce a slowdown on a Zen2 CPU.  The difference seems to be merely
instruction scheduling.  I do note we're not doing a good job in handling

        for (i = 0; i < LOOPS_PER_CALL; i++) {
                r.v = r.v + add.v;
        }

where r.v and add.v are AVX512 sized vectors when emulating them with AVX
vectors.  We end up with

  r_v_lsm.48_48 = r.v;
  _11 = add.v;

  <bb 3> [local count: 1063004408]:
  # r_v_lsm.48_50 = PHI <_12(3), r_v_lsm.48_48(2)>
  # ivtmp_56 = PHI <ivtmp_55(3), 65536(2)>
  _16 = BIT_FIELD_REF <_11, 256, 0>;
  _37 = BIT_FIELD_REF <r_v_lsm.48_50, 256, 0>;
  _29 = _16 + _37;
  _387 = BIT_FIELD_REF <_11, 256, 256>;
  _375 = BIT_FIELD_REF <r_v_lsm.48_50, 256, 256>;
  _363 = _387 + _375;
  _12 = {_29, _363};
  ivtmp_55 = ivtmp_56 - 1;
  if (ivtmp_55 != 0)
    goto <bb 3>; [98.99%]
  else
    goto <bb 4>; [1.01%]

  <bb 4> [local count: 10737416]:

after lowering from 512-bit to 256-bit vectors, and there's no pass that
would demote the 512-bit reduction value to two 256-bit ones, so the value is
split and recombined on every iteration.
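
For illustration only, here is a hand-written sketch of what such a demotion
would amount to at the source level, using GNU vector extensions.  The function
names, the split/join helpers and the v8sf typedef are hypothetical and not
taken from the reported testcase; LOOPS_PER_CALL is set to 65536 only to match
the iteration count visible in the dump above.  It is not what any GCC pass
produces today.

```c
#include <string.h>

#define LOOPS_PER_CALL 65536  /* assumed, matches the count in the dump */

typedef float v16sf __attribute__((vector_size(sizeof(float) * 16)));
typedef float v8sf  __attribute__((vector_size(sizeof(float) * 8)));

/* Copy the low and high 256-bit halves out of a 512-bit vector.  */
static void split(const v16sf *x, v8sf *lo, v8sf *hi)
{
  memcpy(lo, x, sizeof *lo);
  memcpy(hi, (const char *)x + sizeof *lo, sizeof *hi);
}

/* Rebuild a 512-bit vector from its two 256-bit halves.  */
static void join(v16sf *x, const v8sf *lo, const v8sf *hi)
{
  memcpy(x, lo, sizeof *lo);
  memcpy((char *)x + sizeof *lo, hi, sizeof *hi);
}

/* The loop as written: one 512-bit accumulator.  */
void accumulate_wide(v16sf *r, const v16sf *add)
{
  v16sf acc = *r, inc = *add;
  for (int i = 0; i < LOOPS_PER_CALL; i++)
    acc = acc + inc;
  *r = acc;
}

/* Hand-demoted form: split once before the loop, keep two 256-bit
   accumulators live across it, recombine once afterwards.  */
void accumulate_split(v16sf *r, const v16sf *add)
{
  v8sf acc_lo, acc_hi, inc_lo, inc_hi;
  split(r, &acc_lo, &acc_hi);
  split(add, &inc_lo, &inc_hi);
  for (int i = 0; i < LOOPS_PER_CALL; i++) {
    acc_lo = acc_lo + inc_lo;
    acc_hi = acc_hi + inc_hi;
  }
  join(r, &acc_lo, &acc_hi);
}
```

Both functions perform the same per-lane additions in the same order, so they
should give identical results; the second just avoids the BIT_FIELD_REF /
CONSTRUCTOR pair inside the loop body.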

There are also weird things going on in the target/on RTL.  A smaller testcase
illustrating the code generation issue is

typedef float v16sf __attribute__((vector_size(sizeof(float)*16)));

void foo (v16sf * __restrict r, v16sf *a, int n)
{
  for (int i = 0; i < n; ++i)
    *r = *r + *a;
}

So confirmed for non-optimal code, but I don't see how it's a regression.