public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
From: "rguenther at suse dot de" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/106346] [11/12/13 Regression] Potential regression on vectorization of left shift with constants since r11-5160-g9fc9573f9a5e94
Date: Wed, 27 Jul 2022 08:50:00 +0000	[thread overview]
Message-ID: <bug-106346-4-LsgLU75Dwn@http.gcc.gnu.org/bugzilla/> (raw)
In-Reply-To: <bug-106346-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106346

--- Comment #7 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 27 Jul 2022, tnfchris at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106346
> 
> --- Comment #6 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #5)
> > (In reply to Tamar Christina from comment #4)
> > > I believe the problem is actually g:27842e2a1eb26a7eae80b8efd98fb8c8bd74a68e
> > > 
> > > We added an optab for the widening left shift pattern there however the
> > > operation requires a uniform shift constant to work. See
> > > https://godbolt.org/z/4hqKc69Ke
> > > 
> > > The existing pattern that deals with this is vect_recog_widen_shift_pattern
> > > which is a scalar pattern.  during build_slp it validates that constants are
> > > the same and when they're not it aborts SLP.  This is why we lose
> > > vectorization.  Eventually we hit V4HI for which we have no widening shift
> > > optab for and it vectorizes using that low VF.
> > > 
> > > This example shows a number of things wrong:
> > > 
> > > 1. The generic costing seems off, this sequence shouldn't have been
> > > generated, as a vector sequence it's more inefficient than the scalar
> > > sequence. Using -mcpu=neover-n1 or any other costing structure correctly
> > > only gives scalar.
> > > 
> > > 2. vect_recog_widen_shift_pattern is implemented in the wrong place.  It
> > > predates the existence of the SLP pattern matcher. Because of the uniform
> > > requirements it's better to use the SLP pattern matcher where we have access
> > > to all the constants to decide whether the pattern is a match or not.  That
> > > way we don't abort SLP. Are you ok with this as a fix Richi?
> > 
> > patterns are difficult beasts - I think vect_recog_widen_shift_pattern is
> > at the correct place but instead what is lacking is SLP discovery support
> > for scrapping it - that is, ideally the vectorizer would take patterns as
> > a hint and ignore them when they are not helpful.
> 
> Hmm, yes but the problem is that we've consumed additional related statements
> which now need to be handled by build_slp as well. I suppose you could do an
> in-place build_slp on the pattern stmt seq iterator.  Though that seems like
> undoing a mistake.

Yes.  But it would be actually undoing a mistake, so ... ;)  The issue
is of course that we have many ways to vectorize things and patterns
just add to that - and SLP discovery needs to somehow pick one and
we stick to it (and cost based selection is only done on the vector mode
choice).

I'm not sure what's the best thing to do in the very end - for patterns
one could think of adding alternate (sub-)SLP-graph overlays instead
of forking the whole SLP graph for every combination of 
pattern/non-pattern which would surely explode.  At analysis time
one could choose the locally cheapest variant assuming the local
choice does not influence analysis of the rest of the graph (which
some widening patterns do after all, at least in the loop case by
influencing the VF).

> > 
> > Now - in theory, for SLP vectorization, all patterns could be handled
> > as SLP patterns and scalar patterns disabled.  But that isn't easy to
> > do either.
> 
> As long as we still have the non-SLP loop vect it's probably not a good idea
> no?
> since we then lose all patterns for it.  The widening shift was already
> sufficiently
> limited that it wouldn't really regress here.

Well, we'd need both obviously.  But yes, if we get to the point (sorry
for it taking forever) that we have SLP data structures for everything
we could shelve the stmt based patterns.

> > 
> > I fear to fight this regression the easiest route is to pretend the
> > ISA can do widen shift by vector and fixup in the expander ...
> 
> I can do this, but we're hiding the cost then. Or did you want me to fudge the
> numbers in the costing hooks?

Yes.

> > 
> > > 3. The epilogue costing seems off..
> > > 
> > > This example https://godbolt.org/z/YoPcWv6Td ends up generating an
> > > exceptionally high epilogue cost and so thinks vectorization at the higher
> > > VF is not profitable.
> > > 
> > > *src1_18(D) 1 times vec_to_scalar costs 2 in epilogue
> > > MEM[(uint16_t *)src1_18(D) + 2B] 1 times vec_to_scalar costs 2 in epilogue
> > > MEM[(uint16_t *)src1_18(D) + 4B] 1 times vec_to_scalar costs 2 in epilogue
> > > MEM[(uint16_t *)src1_18(D) + 6B] 1 times vec_to_scalar costs 2 in epilogue
> > > MEM[(uint16_t *)src1_18(D) + 8B] 1 times vec_to_scalar costs 2 in epilogue
> > > MEM[(uint16_t *)src1_18(D) + 10B] 1 times vec_to_scalar costs 2 in epilogue
> > > MEM[(uint16_t *)src1_18(D) + 12B] 1 times vec_to_scalar costs 2 in epilogue
> > > MEM[(uint16_t *)src1_18(D) + 14B] 1 times vec_to_scalar costs 2 in epilogue
> > > /app/example.c:16:12: note: Cost model analysis for part in loop 0:
> > >   Vector cost: 23
> > >   Scalar cost: 17
> > 
> > I don't see any epilogue cost - the example doesn't have a loop.  With BB
> > vect you could see no epilogue costs?
> 
> That was my expectation, but see e.g. https://godbolt.org/z/MGEMYEe86 the SLP
> shows the
> above output.  I don't understand where the vec_to_scalar costs come from.
> 
> > 
> > > For some reason it thinks it needs a scalar epilogue? using
> > > -fno-vect-cost-model gives the desired codegen.
> 
>

  parent reply	other threads:[~2022-07-27  8:50 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-18 18:28 [Bug tree-optimization/106346] New: Potential regression on vectorization of left shift with constants manolis.tsamis at vrull dot eu
2022-07-19  6:52 ` [Bug target/106346] " rguenth at gcc dot gnu.org
2022-07-19 13:09 ` manolis.tsamis at vrull dot eu
2022-07-22 10:11 ` [Bug target/106346] Potential regression on vectorization of left shift with constants since r11-5160-g9fc9573f9a5e94 marxin at gcc dot gnu.org
2022-07-22 11:00 ` tnfchris at gcc dot gnu.org
2022-07-27  7:27 ` [Bug target/106346] [11/12/13 Regression] " tnfchris at gcc dot gnu.org
2022-07-27  7:48 ` rguenth at gcc dot gnu.org
2022-07-27  8:26 ` tnfchris at gcc dot gnu.org
2022-07-27  8:50 ` rguenther at suse dot de [this message]
2023-07-31 15:30 ` [Bug target/106346] [11/12/13/14 " tnfchris at gcc dot gnu.org
2023-08-04 12:50 ` cvs-commit at gcc dot gnu.org
2023-08-04 13:58 ` tnfchris at gcc dot gnu.org

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-106346-4-LsgLU75Dwn@http.gcc.gnu.org/bugzilla/ \
    --to=gcc-bugzilla@gcc.gnu.org \
    --cc=gcc-bugs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).