From: "tnfchris at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114403] [14 regression] LLVM miscompiled with -O3 -march=znver2 -fno-vect-cost-model since r14-6822-g01f4251b8775c8
Date: Thu, 11 Apr 2024 21:40:03 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: needs-reduction, wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: tnfchris at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P1
X-Bugzilla-Assigned-To: tnfchris at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 14.0
X-Bugzilla-Changed-Fields: attachments.created
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403

--- Comment #21 from Tamar Christina ---
Created attachment 57932
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57932&action=edit
loop.c

Attached is a reduced testcase that reproduces the issue and also checks the
buffer position and copied values.

As discussed on IRC, when peeling for gaps we need to either adjust the upper
bounds of the vector loop or force the vector loop to get to the scalar loop.

However, we already go to the scalar loop, just with the wrong induction value,
because we were never supposed to take the main exit.

Whether we go to the scalar loop depends on

  x = (((ptr2 - ptr1) - 16) / 16) + 1
  x == (((x - 1) >> 2) << 2)

In this case x == 26 and the right-hand side evaluates to 24, so the comparison
fails and we do go to the scalar code already, but through the main exit.

Exiting through the main exit assumes you've done all vector iterations, in
this case 6 iterations, based on the main exit condition, which is
first != last. The induction values are then set based on niters_vector_mult,
so in this case first += 24.

But that's wrong, since the secondary exit has a known iteration count of 9,
due to (buffer_ptr + store_size) <= buffer_end:

  Statement (exit)if (ivtmp_21 != 0)
   is executed at most 8 (bounded by 8) + 1 times in loop 1.

So we will always exit through it, as 9 < 24.

That means that when we calculate the upper bounds of the vector loop, we must
add a bias so that in this boundary condition we do an extra partial vector
iteration.

I think the discussion on IRC went off track for a bit, and hopefully this
testcase and the explanation above show that for all early break and all
epilogue peeling reasons, we must bias up the upper bound to give the
secondary exits a chance to trigger.

So I really do think the correct patch is:

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 4375ebdcb49..0973b952c70 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -12144,6 +12144,9 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
      -min_epilogue_iters to remove iterations that cannot be performed
        by the vector code.  */
   int bias_for_lowest = 1 - min_epilogue_iters;
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    bias_for_lowest = 1;
+
   int bias_for_assumed = bias_for_lowest;
   int alignment_npeels = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
   if (alignment_npeels && LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))

for the reasons described above. There's no way for us to take the main exit,
which signifies that we've reached the end of all the iterations we can
possibly do as vector, and still get the correct induction values in this case.