From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/114987] [14/15 Regression] floating point vector regression, x86, between gcc 14 and gcc-13 using -O3 and target clones on skylake platforms
Date: Fri, 10 May 2024 07:52:37 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 14.2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114987

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
            Summary|[14/15 regression] floating |[14/15 Regression] floating
                   |point vector regression,    |point vector regression,
                   |x86, between gcc 14 and     |x86, between gcc 14 and
                   |gcc-13 using -O3 and target |gcc-13 using -O3 and target
                   |clones on skylake platforms |clones on skylake platforms
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-05-10
             Target|x86_64                      |x86_64-*-*
   Target Milestone|---                         |14.2

--- Comment #4 from Richard Biener ---
I can't reproduce a slowdown on a Zen2 CPU. The difference seems to be merely instruction scheduling.

I do note that we're not doing a good job of handling

  for (i = 0; i < LOOPS_PER_CALL; i++)
    {
      r.v = r.v + add.v;
    }

where r.v and add.v are AVX512-sized vectors. The problem appears when they are emulated with AVX vectors. We end up with

  r_v_lsm.48_48 = r.v;
  _11 = add.v;

   [local count: 1063004408]:
  # r_v_lsm.48_50 = PHI <_12(3), r_v_lsm.48_48(2)>
  # ivtmp_56 = PHI
  _16 = BIT_FIELD_REF <_11, 256, 0>;
  _37 = BIT_FIELD_REF ;
  _29 = _16 + _37;
  _387 = BIT_FIELD_REF <_11, 256, 256>;
  _375 = BIT_FIELD_REF ;
  _363 = _387 + _375;
  _12 = {_29, _363};
  ivtmp_55 = ivtmp_56 - 1;
  if (ivtmp_55 != 0)
    goto ; [98.99%]
  else
    goto ; [1.01%]

   [local count: 10737416]:

after lowering from 512-bit to 256-bit vectors. No pass demotes the 512-bit reduction value to two 256-bit ones. There are also weird things going on in the target/on RTL.

A smaller testcase illustrating the code generation issue is

  typedef float v16sf __attribute__((vector_size(sizeof(float)*16)));

  void foo (v16sf * __restrict r, v16sf *a, int n)
  {
    for (int i = 0; i < n; ++i)
      *r = *r + *a;
  }

So this is confirmed for non-optimal code, but I don't see how it's a regression.
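The sketch below shows what demoting the 512-bit reduction value to two 256-bit ones would amount to if it were done by hand. It is written in C with the same GCC vector extensions as the testcase.

- foo is the testcase from the comment.
- v8sf, foo_split and same_result are hypothetical names introduced only for this illustration.
- It assumes the low 256 bits hold elements 0..7. This matches BIT_FIELD_REF <_11, 256, 0> in the dump.

The sketch does not describe any existing GCC pass or the code GCC actually emits.

```c
#include <string.h>

/* Vector type from the testcase (GCC vector extensions).  */
typedef float v16sf __attribute__((vector_size (sizeof (float) * 16)));
/* Hypothetical 256-bit half type, introduced only for this sketch.  */
typedef float v8sf __attribute__((vector_size (sizeof (float) * 8)));

/* The testcase: a 512-bit value updated in a loop.  When only 256-bit
   vectors are available, the lowered loop extracts both halves, adds them
   and rebuilds the 512-bit value on every iteration.  */
void
foo (v16sf *__restrict r, v16sf *a, int n)
{
  for (int i = 0; i < n; ++i)
    *r = *r + *a;
}

/* Hypothetical hand-demoted variant: split the 512-bit values once, carry
   two 256-bit accumulators through the loop, and write them back only
   after the loop.  Nothing is extracted or rebuilt inside the loop.  */
void
foo_split (v16sf *__restrict r, v16sf *a, int n)
{
  v8sf r_lo, r_hi, a_lo, a_hi;
  memcpy (&r_lo, (char *) r, sizeof r_lo);               /* elements 0..7  */
  memcpy (&r_hi, (char *) r + sizeof r_lo, sizeof r_hi); /* elements 8..15 */
  memcpy (&a_lo, (char *) a, sizeof a_lo);
  memcpy (&a_hi, (char *) a + sizeof a_lo, sizeof a_hi);
  for (int i = 0; i < n; ++i)
    {
      r_lo += a_lo;
      r_hi += a_hi;
    }
  memcpy ((char *) r, &r_lo, sizeof r_lo);
  memcpy ((char *) r + sizeof r_lo, &r_hi, sizeof r_hi);
}

/* Hypothetical check: run both versions on the same input and return 1 if
   the results are bit-identical.  Both perform the same per-element float
   additions in the same order.  */
int
same_result (int n)
{
  v16sf r1, r2, a;
  for (int k = 0; k < 16; ++k)
    {
      r1[k] = r2[k] = (float) k;
      a[k] = 0.5f * (float) k + 0.25f;
    }
  foo (&r1, &a, n);
  foo_split (&r2, &a, n);
  return memcmp (&r1, &r2, sizeof r1) == 0;
}
```

For example, same_result (1000) is expected to return 1. The hand split changes only the loop structure, not the numerical result. To inspect the difference, compile both functions with something like -O3 -mavx2 and compare the generated loops.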