From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48) id 3E1883858409; Wed, 20 Oct 2021 06:24:34 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 3E1883858409
From: "linkw at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/102789] [12 regression] libgomp.c++/simd-3.C fails after r12-4340 for 32 bits
Date: Wed, 20 Oct 2021 06:24:34 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: linkw at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: linkw at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 12.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: cc bug_status
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 20 Oct 2021 06:24:34 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102789

Kewen Lin changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bergner at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org,
                   |                            |wschmidt at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #5 from Kewen Lin ---
As Jakub noted, r12-4340 just exposed a latent bug: even without r12-4340, the
issue still exists with -fvect-cost-model=dynamic. The key is whether the
vectorizer peels for alignment in the prologue:
  unsigned max_allowed_peel = param_vect_max_peeling_for_alignment;
  if (flag_vect_cost_model <= VECT_COST_MODEL_CHEAP)
    max_allowed_peel = 0;

Passing --param vect-max-peeling-for-alignment=14 disables the peeling, and the
test passes.

I think this is a bug in the vectorizer. I reduced the culprit loop to the
following (and also moved the first loop out of the function):

  for (i = n; i < o; i++)
    {
      k += m + 1;
      t = k + p[i];
      s2 += t;
      c[i]++;
    }

We have some temporary storage for the omp clauses, such as:

  int D.3802[16]; // for k
  int D.3800[16]; // for s2
  int D.3799[16]; // for t

After peeling (one prologue), the addresses of k, s2 and t become:

  _187 = prolog_loop_niters.27_88 * 4;
  vectp.37_186 = &D.3802 + _187;
  _213 = prolog_loop_niters.27_88 * 4;
  vectp.46_212 = &D.3799 + _213;
  _222 = prolog_loop_niters.27_88 * 4;
  vectp.48_221 = &D.3800 + _222;

The main vectorized loop body then acts on the biased addresses, which is
wrong:

  vect__61.49_223 = MEM [(int *)vectp.48_221];
  vectp.48_224 = vectp.48_221 + 16;
  vect__61.50_225 = MEM [(int *)vectp.48_224];
  vectp.48_226 = vectp.48_221 + 32;
  vect__61.51_227 = MEM [(int *)vectp.48_226];
  vectp.48_228 = vectp.48_221 + 48;
  vect__61.52_229 = MEM [(int *)vectp.48_228];
  _61 = D.3800[_56];
  vect__62.53_230 = vect__59.44_208 + vect__61.49_223;
  vect__62.53_231 = vect__59.44_209 + vect__61.50_225;
  vect__62.53_232 = vect__59.44_210 + vect__61.51_227;
  vect__62.53_233 = vect__59.44_211 + vect__61.52_229;
  _62 = _59 + _61;
  MEM [(int *)vectp.55_234] = vect__62.53_230;
  vectp.55_237 = vectp.55_234 + 16;
  MEM [(int *)vectp.55_237] = vect__62.53_231;
  vectp.55_239 = vectp.55_234 + 32;
  MEM [(int *)vectp.55_239] = vect__62.53_232;
  vectp.55_241 = vectp.55_234 + 48;
  MEM [(int *)vectp.55_241] = vect__62.53_233;

A fix would be to avoid the address biasing for this kind of DR, i.e. DRs for
omp clause specific storage.
These DRs are mainly used in the main loop (lanes?). In this case the storage
is for a reduction: in the prologue we use element 0, and in the epilogue we
use the last element or apply reduc_op to all elements, depending on the type.

The small fix below makes it pass:

diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 4988c93fdb6..a447f457f93 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1820,7 +1820,7 @@ vect_update_inits_of_drs (loop_vec_info loop_vinfo, tree niters,
   FOR_EACH_VEC_ELT (datarefs, i, dr)
     {
       dr_vec_info *dr_info = loop_vinfo->lookup_dr (dr);
-      if (!STMT_VINFO_GATHER_SCATTER_P (dr_info->stmt))
+      if (!STMT_VINFO_GATHER_SCATTER_P (dr_info->stmt) && !STMT_VINFO_SIMD_LANE_ACCESS_P (dr_info->stmt))
         vect_update_init_of_dr (dr_info, niters, code);
     }
 }

I've not looked into the meaning of the different values (1, 2, 3, 4) of
STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info); they seem to correspond to different
omp clauses? The fix above assumes that whenever
STMT_VINFO_SIMD_LANE_ACCESS_P > 0, the related DR is used mainly in the
vectorized loop body, so it does not need any update for the prologue. I'm
going to do broader testing to see whether we need more restrictions on that.
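
To make the effect of that condition concrete, here is a rough standalone model of the start-offset update, not GCC code. ModelDR, model_update_inits and make_model_drs are hypothetical names invented for this sketch; the two flags only mimic STMT_VINFO_GATHER_SCATTER_P and STMT_VINFO_SIMD_LANE_ACCESS_P, and a 4-byte int is assumed as in the dump above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a vectorizer data reference (not GCC's type).
struct ModelDR {
  std::string name;        // label only, e.g. "c[i]" or "D.3800"
  std::size_t init_bytes;  // byte offset where the vector loop starts
  std::size_t elem_size;   // size of one element in bytes
  bool gather_scatter;     // mimics STMT_VINFO_GATHER_SCATTER_P
  int simd_lane_access;    // mimics STMT_VINFO_SIMD_LANE_ACCESS_P (0 = none)
};

// Advance each start offset past the iterations peeled into the prologue.
// skip_simd_lane == false models the old condition, true models the patch.
void model_update_inits(std::vector<ModelDR> &drs, std::size_t prolog_niters,
                        bool skip_simd_lane) {
  for (ModelDR &dr : drs) {
    assert(dr.elem_size > 0);
    if (dr.gather_scatter)
      continue;  // never biased, with or without the patch
    if (skip_simd_lane && dr.simd_lane_access > 0)
      continue;  // per-lane omp storage keeps starting at element 0
    dr.init_bytes += prolog_niters * dr.elem_size;
  }
}

// One ordinary array access and one per-lane temporary, both starting at 0.
std::vector<ModelDR> make_model_drs() {
  return {
      {"c[i]", 0, sizeof(int), false, 0},
      {"D.3800", 0, sizeof(int), false, 1},
  };
}
```

With 3 peeled iterations, the old condition moves both c[i] and D.3800 forward by 3 * sizeof(int) bytes, which corresponds to the biased vectp.48_221 in the dump. With skip_simd_lane set, only c[i] moves and D.3800 stays at offset 0.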