From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id E39353858C50; Thu, 11 Apr 2024 21:40:06 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org E39353858C50
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1712871606;
	bh=rDky/zYMOFVAxvZbqXS2601A+43s3Qr4ZnngRKar5Pc=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=QaOqUlI3zFMF7SEFEQ2vWVb7ZoBxeYuluXjjPZbM5m6KeGMtJ+17deNFDXmfAA05S
	 ZpWVWCuZJXXiLSASnsID8DA3ahGjTJhbcaICgPC6a+DZEDT7C6/0rbLpUXOwXH/1zK
	 KX+Cm4pya/hSEgfMIOWd3KWwjbWDSPug8ExOilhI=
From: "tnfchris at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114403] [14 regression] LLVM miscompiled with
 -O3 -march=znver2 -fno-vect-cost-model since r14-6822-g01f4251b8775c8
Date: Thu, 11 Apr 2024 21:40:03 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: needs-reduction, wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: tnfchris at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P1
X-Bugzilla-Assigned-To: tnfchris at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 14.0
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: attachments.created
Message-ID: <bug-114403-4-X2wJAOmp8o@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-114403-4@http.gcc.gnu.org/bugzilla/>
References: <bug-114403-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403

--- Comment #21 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Created attachment 57932
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57932&action=edit
loop.c

Attached is a reduced testcase that reproduces the issue and also checks the
buffer position and copied values.

As discussed on IRC, when peeling for gaps we need to either adjust the upper
bounds of the vector loop or force the vector loop to get to the scalar loop.
However, we already go to the scalar loop, just with the wrong induction value,
because we were never supposed to take the main exit.

Whether we go to the scalar loop depends on
x = (((ptr2 - ptr1) - 16) / 16) + 1
x == (((x - 1) >> 2) << 2)

In this case x == 26, so we do go to the scalar code already, but through the
main exit.

Exiting through the main exit assumes you've done all vector iterations, in
this case 6 iterations based on the main exit condition, which is first != last.

In this case the induction values will be set based on niters_vector_mult.

So in this case first += 24
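
As a rough sketch of where the 6 iterations and the value 24 come from: the
helper names below are hypothetical, and the vectorization factor of 4 is an
assumption inferred from the shift by 2 in the expression for x.

```c
/* Hypothetical helper: number of full vector iterations when one scalar
   iteration is held back, as in the (x - 1) term above.  */
static long vector_iters (long x, long vf)
{
  return (x - 1) / vf;
}

/* Hypothetical helper: how far the induction variable is advanced when
   the main exit is taken, i.e. all vector iterations are assumed done.  */
static long main_exit_advance (long x, long vf)
{
  return vector_iters (x, vf) * vf;
}
```

For x == 26 and an assumed factor of 4, vector_iters gives 6 and
main_exit_advance gives 24, matching the numbers quoted here.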

But that's wrong since the secondary exit has a known iteration count of 9, due
to (buffer_ptr + store_size) <= buffer_end.

Statement (exit)if (ivtmp_21 != 0)
 is executed at most 8 (bounded by 8) + 1 times in loop 1.

So we will always exit through it as 9 < 24.

That means that when we calculate the upper bounds of the vector loop, we must
add a bias so that in this boundary condition we do an extra partial vector
iteration.

I think the discussion on IRC went off track for a bit, and hopefully this
testcase and the explanation above show that for all early break and all
epilogue peeling reasons, we must bias up the upper bound to give the
secondary exits a chance to trigger.

So I really do think the correct patch is:
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 4375ebdcb49..0973b952c70 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -12144,6 +12144,9 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
      -min_epilogue_iters to remove iterations that cannot be performed
        by the vector code.  */
   int bias_for_lowest = 1 - min_epilogue_iters;
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    bias_for_lowest = 1;
+
   int bias_for_assumed =3D bias_for_lowest;
   int alignment_npeels =3D LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
   if (alignment_npeels && LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))

for the reasons described above.  There's no way for us to take the main exit,
which signifies that we've reached the end of all iterations we can possibly do
as vector, and still get the correct induction values in this case.