From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 3E1883858409; Wed, 20 Oct 2021 06:24:34 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 3E1883858409
From: "linkw at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/102789] [12 regression] libgomp.c++/simd-3.C fails after
 r12-4340 for 32 bits
Date: Wed, 20 Oct 2021 06:24:34 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: linkw at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: linkw at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 12.0
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: cc bug_status
Message-ID: <bug-102789-4-FiCLOdMygF@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-102789-4@http.gcc.gnu.org/bugzilla/>
References: <bug-102789-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102789

Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bergner at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org,
                   |                            |wschmidt at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #5 from Kewen Lin <linkw at gcc dot gnu.org> ---
As Jakub noted, r12-4340 only exposed a latent bug: even without r12-4340, the
issue still exists with -fvect-cost-model=dynamic. The key is whether the
vectorizer peels for alignment in the prologue.

          unsigned max_allowed_peel
            = param_vect_max_peeling_for_alignment;
          if (flag_vect_cost_model <= VECT_COST_MODEL_CHEAP)
            max_allowed_peel = 0;

With --param vect-max-peeling-for-alignment=14 the peeling is disabled and the
test passes.
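
The interaction between the cost model and the peeling parameter can be
sketched as below. This is only a rough model, not the vectorizer's code: the
enum values, the function names, and the rule "peeling is rejected when the
worst-case peel count exceeds the cap" are assumptions made for illustration,
and the worst-case count of 15 used in the note after the block is a made-up
number.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the cost-model setting; the numeric values
   are an assumption, only their ordering matters here.  */
enum cost_model
{
  COST_VERY_CHEAP = -1,
  COST_CHEAP = 0,
  COST_DYNAMIC = 1,
  COST_UNLIMITED = 2
};

/* Mirrors the quoted snippet: the cap is the --param value, except that
   the cheap (or cheaper) cost model forces it to 0.  */
unsigned
max_allowed_peel (enum cost_model model, unsigned param_max_peel)
{
  if (model <= COST_CHEAP)
    return 0;
  return param_max_peel;
}

/* Assumed decision rule: peeling is allowed only if the worst-case number
   of peeled iterations fits under the cap.  UINT_MAX models "no limit".  */
int
peeling_allowed (enum cost_model model, unsigned param_max_peel,
                 unsigned worst_case_peel)
{
  unsigned cap = max_allowed_peel (model, param_max_peel);
  if (cap == UINT_MAX)
    return 1;
  return worst_case_peel <= cap;
}
```

Under this model, peeling_allowed (COST_DYNAMIC, 14, 15) is 0, matching the
observation that a cap of 14 disables the peeling, and any call with
COST_CHEAP is 0 unless worst_case_peel is 0.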

I think this is a bug in the vectorizer. I reduced the culprit loop to the
following (also moving the first loop out of the function):

  for (i = n; i < o; i++)
    {
      k += m + 1;
      t = k + p[i];
      s2 += t;
      c[i]++;
    }

We have some temporary storage for the omp clauses, such as:

  int D.3802[16];  // for k
  int D.3800[16];  // for s2
  int D.3799[16];  // for t

After peeling (one prologue), the addresses of k, s2 and t become:

  _187 = prolog_loop_niters.27_88 * 4;
  vectp.37_186 = &D.3802 + _187;
  _213 = prolog_loop_niters.27_88 * 4;
  vectp.46_212 = &D.3799 + _213;
  _222 = prolog_loop_niters.27_88 * 4;
  vectp.48_221 = &D.3800 + _222;

The main vectorized loop body then operates on the biased addresses, which is
wrong:

  vect__61.49_223 = MEM <vector(4) int> [(int *)vectp.48_221];
  vectp.48_224 = vectp.48_221 + 16;
  vect__61.50_225 = MEM <vector(4) int> [(int *)vectp.48_224];
  vectp.48_226 = vectp.48_221 + 32;
  vect__61.51_227 = MEM <vector(4) int> [(int *)vectp.48_226];
  vectp.48_228 = vectp.48_221 + 48;
  vect__61.52_229 = MEM <vector(4) int> [(int *)vectp.48_228];
  _61 = D.3800[_56];

  vect__62.53_230 = vect__59.44_208 + vect__61.49_223;
  vect__62.53_231 = vect__59.44_209 + vect__61.50_225;
  vect__62.53_232 = vect__59.44_210 + vect__61.51_227;
  vect__62.53_233 = vect__59.44_211 + vect__61.52_229;
  _62 = _59 + _61;

  MEM <vector(4) int> [(int *)vectp.55_234] = vect__62.53_230;
  vectp.55_237 = vectp.55_234 + 16;
  MEM <vector(4) int> [(int *)vectp.55_237] = vect__62.53_231;
  vectp.55_239 = vectp.55_234 + 32;
  MEM <vector(4) int> [(int *)vectp.55_239] = vect__62.53_232;
  vectp.55_241 = vectp.55_234 + 48;
  MEM <vector(4) int> [(int *)vectp.55_241] = vect__62.53_233;
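
To make the problem with the biased pointers concrete, here is a small
arithmetic model of the accesses in the dump above. The constants come from
the dump (int[16] temporaries, four vector(4) int accesses at byte offsets 0,
16, 32 and 48, and a bias of prolog_loop_niters * 4 bytes); the function names
are hypothetical and this is only a sketch of the address computation, not of
the vectorizer itself.

```c
#include <assert.h>
#include <stddef.h>

enum
{
  LANES = 16,         /* elements in each temporary, e.g. int D.3800[16] */
  ELEM_SIZE = 4,      /* sizeof (int) assumed to be 4, as in the dump */
  VEC_BYTES = 16,     /* one vector(4) int */
  VECS_PER_ITER = 4   /* accesses at +0, +16, +32, +48 */
};

/* Byte offset, relative to the start of the temporary, one past the last
   byte touched by a single iteration of the main vectorized loop.  */
size_t
end_of_access (size_t prolog_niters)
{
  size_t bias = prolog_niters * ELEM_SIZE;
  size_t last_vec_start = bias + (VECS_PER_ITER - 1) * VEC_BYTES;
  return last_vec_start + VEC_BYTES;
}

/* Nonzero if that iteration stays inside the 16-element temporary.  */
int
access_in_bounds (size_t prolog_niters)
{
  return end_of_access (prolog_niters) <= (size_t) LANES * ELEM_SIZE;
}
```

In this model the accesses fit exactly when prolog_niters is 0; any nonzero
prologue count shifts the four vector accesses so that they no longer line up
with the 16 lanes and the last one runs past the end of the temporary.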


A fix seems to be to avoid the address biasing for these kinds of DRs, which
refer to omp clause specific storage. These DRs are mainly used in the main
loop (lanes?). In this case they are for a reduction: in the prologue we use
element 0, and in the epilogue we use the last element or apply reduc_op to
all elements, depending on the type. The small fix below makes the test pass:

diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 4988c93fdb6..a447f457f93 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1820,7 +1820,7 @@ vect_update_inits_of_drs (loop_vec_info loop_vinfo, tree niters,
   FOR_EACH_VEC_ELT (datarefs, i, dr)
     {
       dr_vec_info *dr_info = loop_vinfo->lookup_dr (dr);
-      if (!STMT_VINFO_GATHER_SCATTER_P (dr_info->stmt))
+      if (!STMT_VINFO_GATHER_SCATTER_P (dr_info->stmt) && !STMT_VINFO_SIMD_LANE_ACCESS_P (dr_info->stmt))
        vect_update_init_of_dr (dr_info, niters, code);
     }
 }

I've not looked into the meaning of the different values (1, 2, 3, 4) of
STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info); they seem to correspond to
different omp clauses? The assumption behind the above fix is that in all
cases where STMT_VINFO_SIMD_LANE_ACCESS_P > 0, the related DR is used mainly
in the vectorized loop body, so we don't need to update it for the prologue.
I'm going to do broader testing to see if we need more restrictions on that.
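
The condition introduced by the patch can be restated as a standalone
predicate. The function and parameter names below are hypothetical stand-ins
for the two macros tested in the diff; the sketch only encodes the assumption
stated above, namely that every nonzero lane-access value is treated the same
way.

```c
#include <assert.h>
#include <stdbool.h>

/* gather_scatter models STMT_VINFO_GATHER_SCATTER_P and simd_lane_access
   models STMT_VINFO_SIMD_LANE_ACCESS_P (0 meaning "not a SIMD lane
   access").  Returns true if the DR's initial address should be advanced
   by the number of peeled prologue iterations.  */
bool
needs_init_update (bool gather_scatter, int simd_lane_access)
{
  return !gather_scatter && simd_lane_access == 0;
}
```

With this predicate, values 1 through 4 all skip the update, so any
restriction found by further testing would show up as a narrower condition on
simd_lane_access.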