From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id DF635385803D; Wed, 27 Apr 2022 13:44:07 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org DF635385803D
From: "avieira at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/105219] [12 Regression] SVE: Wrong code with
 -O3 -msve-vector-bits=128 -mtune=thunderx
Date: Wed, 27 Apr 2022 13:44:07 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords: wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: avieira at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P1
X-Bugzilla-Assigned-To: avieira at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 12.0
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-105219-4-d9UXqWyAOA@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-105219-4@http.gcc.gnu.org/bugzilla/>
References: <bug-105219-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Apr 2022 13:44:08 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105219

--- Comment #18 from avieira at gcc dot gnu.org ---
(In reply to Richard Biener from comment #16)
> (In reply to rsandifo@gcc.gnu.org from comment #15)
> > (In reply to Richard Biener from comment #14)
> > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > index d7bc34636bd..3b63ab7b669 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -9977,7 +9981,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple
> > > *loop_vectorized_call)
> > >                             lowest_vf) - 1
> > >            : wi::udiv_floor (loop->nb_iterations_upper_bound +
> > > bias_for_lowest,
> > >                              lowest_vf) - 1);
> > > -      if (main_vinfo)
> > > +      if (main_vinfo && !main_vinfo->peeling_for_alignment)
> > >         {
> > >           unsigned int bound;
> > >           poly_uint64 main_iters
> > It might be better to add the maximum peeling amount to main_iters.
> > Maybe you'd prefer this anyway for GCC 12 though.
> >=20
> > I wonder if there's a similar problem for peeling for gaps,
> > in cases where the epilogue doesn't need the same peeling.
>
> I don't quite understand the code in if (main_vinfo) but the point is
> that for our case main_iters is zero (and so is prologue_iters if that
> would exist).  I'm not sure how the code can be adjusted with that
> given it computes upper bounds and uses min() for the upper bound
> of the epilogue - we'd need to adjust that with a max (2*vf-2,
> old-upper-bound)
> when there's prologue peeling and the short cut exists (I don't actually
> compute that).
>
> peeling for gaps means we run the epilogue for main VF more iterations,
> but that would just mean the vectorized epilogue executes one more time
> and has peeling for gaps applied as well, so the scalar epilogue runs
> for epilogue VF more iterations.
>
> I'm not sure what conditions prevent epilogue vectorization but I think
> there were some at least.


I think disabling this for peeling makes sense for now, but let me explain how
the code works.

The perhaps misnamed 'main_iters' represents the maximum number of iterations
left to do after the main loop, whether or not the main loop was entered. It is
the largest of these three values:
 - the main loop's VF: if we enter the main loop, there are at most VF-1
iterations left (I see I didn't add a -1 there).
 - LOOP_VINFO_COST_MODEL_THRESHOLD or LOOP_VINFO_VERSIONING_THRESHOLD: if we
don't enter the main loop because we don't have enough iterations to meet
these thresholds (but do still have enough for the epilogue).
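
To make that concrete, here is a rough standalone sketch of the bound as
described above. It is not the actual code in vect_transform_loop: the struct
and function names are made up for illustration, and plain integers stand in
for whatever types GCC uses for these values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the three main-loop values discussed above.
struct MainLoopInfo {
  uint64_t vf;                    // main loop's vectorization factor
  uint64_t cost_model_threshold;  // cf. LOOP_VINFO_COST_MODEL_THRESHOLD
  uint64_t versioning_threshold;  // cf. LOOP_VINFO_VERSIONING_THRESHOLD
};

// Upper bound on the iterations left after the main loop, entered or not.
// Uses VF rather than VF-1, mirroring the missing "-1" noted above; that is
// only a looser (still safe) bound.
uint64_t main_iters_bound(const MainLoopInfo &m) {
  assert(m.vf >= 1);
  return std::max({m.vf, m.cost_model_threshold, m.versioning_threshold});
}
```

With VF = 4 and both thresholds at 0 this gives 4; with thresholds of 10 and 7
it gives 10. Note that nothing here accounts for prologue peeling.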

Our problem is that this didn't take peeling into account: skipping the main
loop also means skipping the peeling, so the number of iterations we could be
left with after skipping main is really main_iters + to_peel.

So I think the approach should be to add 'to_peel' to main_iters, where
'to_peel' is either:
 - VF - 1 if PEELING_FOR_GAPS or PEELING_FOR_ALIGNMENT = -1
 - PEELING_FOR_ALIGNMENT otherwise.
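
A minimal sketch of that rule, again with made-up names rather than real GCC
identifiers. It assumes that a PEELING_FOR_ALIGNMENT of -1 means the peel count
is not known at compile time, and that any other value is the known number of
peeled iterations.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the peeling-related state of the main loop.
struct PeelInfo {
  uint64_t vf;                // main loop's vectorization factor
  bool peeling_for_gaps;      // cf. PEELING_FOR_GAPS
  int peeling_for_alignment;  // cf. PEELING_FOR_ALIGNMENT; -1 assumed "unknown"
};

// Maximum number of iterations the skipped peeling could have handled.
uint64_t to_peel(const PeelInfo &p) {
  assert(p.vf >= 1);
  assert(p.peeling_for_alignment >= -1);
  if (p.peeling_for_gaps || p.peeling_for_alignment == -1)
    return p.vf - 1;  // worst case when the amount is unknown or gaps are peeled
  return static_cast<uint64_t>(p.peeling_for_alignment);
}

// Proposed adjustment: widen main_iters by the peeling that may be skipped.
uint64_t adjusted_main_iters(uint64_t main_iters, const PeelInfo &p) {
  return main_iters + to_peel(p);
}
```

For VF = 4 with unknown alignment peeling, to_peel is 3, so a main_iters of 4
would become 7; with a known peel count of 2 and no gap peeling it would
become 6.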

But as I said at the start, disabling is probably the safest and easiest
option for gcc 12, and given how niche this is, I'm not even sure it's worth
tightening it for gcc 13.