[Bug tree-optimization/110310] New: vector epilogue handling is inefficient

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug tree-optimization/110310] New: vector epilogue handling is inefficient
@ 2023-06-19 11:00 rguenth at gcc dot gnu.org
  2023-06-19 11:01 ` [Bug tree-optimization/110310] " rguenth at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-06-19 11:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110310

            Bug ID: 110310
           Summary: vector epilogue handling is inefficient
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

It looks like we apply some analysis only when transforming the main vector
loop.  In particular vect_do_peeling does the following which elides a
vector epilogue after costing.

  /* If we know the number of scalar iterations for the main loop we should
     check whether after the main loop there are enough iterations left over
     for the epilogue.  */
  if (vect_epilogues
      && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
      && prolog_peeling >= 0
      && known_eq (vf, lowest_vf))
    {
      unsigned HOST_WIDE_INT eiters
        = (LOOP_VINFO_INT_NITERS (loop_vinfo)
           - LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo));

      eiters -= prolog_peeling;
      eiters
        = eiters % lowest_vf + LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo);

      while (!vect_update_epilogue_niters (epilogue_vinfo, eiters))
        {
          delete epilogue_vinfo;
          epilogue_vinfo = NULL;
          if (loop_vinfo->epilogue_vinfos.length () == 0)
            {
              vect_epilogues = false;
              break;
            }
          epilogue_vinfo = loop_vinfo->epilogue_vinfos[0];
          loop_vinfo->epilogue_vinfos.ordered_remove (0);
        }
      vect_epilogues_updated_niters = true;

So for example for the loop

void foo (int * __restrict a, int *b)
{
  for (int i = 0; i < 20; ++i)
    a[i] = b[i] + 42;
}

we end up with no vectorized epilogue when using AVX512 but instead of the
AVX2 epilogue which is discarded we'd like to use a SSE2 epilogue.  It
seems that vect_determine_partial_vectors_and_peeling as called from
vect_update_epilogue_niters should have been already determined when
analyzing the epilogue, but during the epilogue costing the loop_vinfo
still inherits the main loop NITER.

For the testcase at hand we're somewhat saved by BB vectorization but when
doing partial loop vectorization we unnecessarily get a AVX512 masked
epilogue here and the cost model doesn't get a chance to see the updated
known niter for the epilogue nor would there be a meaningful way to
do this when costs are compared because we have no way of estimating the
number of masked out lanes for example.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/110310] vector epilogue handling is inefficient
  2023-06-19 11:00 [Bug tree-optimization/110310] New: vector epilogue handling is inefficient rguenth at gcc dot gnu.org
@ 2023-06-19 11:01 ` rguenth at gcc dot gnu.org
  2023-06-22  9:35 ` avieira at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-06-19 11:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110310

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |avieira at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org
             Blocks|                            |53947
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I don't remember why that epilogue niter updating is only done during
transform?


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/110310] vector epilogue handling is inefficient
  2023-06-19 11:00 [Bug tree-optimization/110310] New: vector epilogue handling is inefficient rguenth at gcc dot gnu.org
  2023-06-19 11:01 ` [Bug tree-optimization/110310] " rguenth at gcc dot gnu.org
@ 2023-06-22  9:35 ` avieira at gcc dot gnu.org
  2023-06-22 12:03 ` rguenther at suse dot de
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: avieira at gcc dot gnu.org @ 2023-06-22  9:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110310

--- Comment #2 from avieira at gcc dot gnu.org ---
I can't remember the exact reason either, though I do vaguely remember niter
updating being something that we felt 'needed more future work' at the time.

Just a side question, AVX512 has predication right? So how come you are
expecting an epilogue?

I'm also curious about the condition on that snippet of code, 'known_eq (vf,
lowest_vf)' seems odd.. lowest_vf is by definition constant, so known_eq only
succeeds if vf is constant and the same as lowest_vf, but lowest_vf is the
constant lower bound of vf, i.e. that seems like a very convoluted way of doing
vf.is_constant (&lowest_vf)? Maybe this helper function wasn't around back
then. Either way, it feels like we shouldn't be doing this if loop_vinfo is
predicated? But I also agree that we probably want to be doing all of this
during analysis, seems odd to be ruling out loop_vinfo's during transformation.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/110310] vector epilogue handling is inefficient
  2023-06-19 11:00 [Bug tree-optimization/110310] New: vector epilogue handling is inefficient rguenth at gcc dot gnu.org
  2023-06-19 11:01 ` [Bug tree-optimization/110310] " rguenth at gcc dot gnu.org
  2023-06-22  9:35 ` avieira at gcc dot gnu.org
@ 2023-06-22 12:03 ` rguenther at suse dot de
  2023-06-22 14:30 ` avieira at gcc dot gnu.org
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rguenther at suse dot de @ 2023-06-22 12:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110310

--- Comment #3 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 22 Jun 2023, avieira at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110310
> 
> --- Comment #2 from avieira at gcc dot gnu.org ---
> I can't remember the exact reason either, though I do vaguely remember niter
> updating being something that we felt 'needed more future work' at the time.
> 
> Just a side question, AVX512 has predication right? So how come you are
> expecting an epilogue?

I'm asking to only predicate the epilogues.

> I'm also curious about the condition on that snippet of code, 'known_eq (vf,
> lowest_vf)' seems odd.. lowest_vf is by definition constant, so known_eq only
> succeeds if vf is constant and the same as lowest_vf, but lowest_vf is the
> constant lower bound of vf, i.e. that seems like a very convoluted way of doing
> vf.is_constant (&lowest_vf)? Maybe this helper function wasn't around back
> then. Either way, it feels like we shouldn't be doing this if loop_vinfo is
> predicated? But I also agree that we probably want to be doing all of this
> during analysis, seems odd to be ruling out loop_vinfo's during transformation.

OK, so I take away from this that you don't think this is done the way
it is on purpose.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/110310] vector epilogue handling is inefficient
  2023-06-19 11:00 [Bug tree-optimization/110310] New: vector epilogue handling is inefficient rguenth at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2023-06-22 12:03 ` rguenther at suse dot de
@ 2023-06-22 14:30 ` avieira at gcc dot gnu.org
  2023-07-03  9:01 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: avieira at gcc dot gnu.org @ 2023-06-22 14:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110310

--- Comment #4 from avieira at gcc dot gnu.org ---
> OK, so I take away from this that you don't think this is done the way
it is on purpose.

I don't think so, I think I just found a place where it was safe to do so, i.e.
where we knew the vectorization factor would not change after. 

I have a vague recollection that vect_analyze_loop used to be somewhat more
complex, but given the now clear separation between main loop and epilogue
vinfo selection we have now, we could probably do this as we analyze
loop_vinfos for epilogue?

Assuming that during analysis we've had determined vf, peeling and use of
masks, which I'm pretty sure we have.

Might be worth asking Richard Sandiford if he can think of anything that we
might not be 'fixing' during analysis.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/110310] vector epilogue handling is inefficient
  2023-06-19 11:00 [Bug tree-optimization/110310] New: vector epilogue handling is inefficient rguenth at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2023-06-22 14:30 ` avieira at gcc dot gnu.org
@ 2023-07-03  9:01 ` rguenth at gcc dot gnu.org
  2023-07-04  7:08 ` cvs-commit at gcc dot gnu.org
  2023-07-04  7:08 ` rguenth at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-03  9:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110310

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
   Last reconfirmed|                            |2023-07-03
     Ever confirmed|0                           |1

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
I will see if I can fix this.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/110310] vector epilogue handling is inefficient
  2023-06-19 11:00 [Bug tree-optimization/110310] New: vector epilogue handling is inefficient rguenth at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2023-07-03  9:01 ` rguenth at gcc dot gnu.org
@ 2023-07-04  7:08 ` cvs-commit at gcc dot gnu.org
  2023-07-04  7:08 ` rguenth at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-07-04  7:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110310

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:0682a32c026f1e246eb07bb8066abca4636f01d8

commit r14-2281-g0682a32c026f1e246eb07bb8066abca4636f01d8
Author: Richard Biener <rguenther@suse.de>
Date:   Mon Jul 3 13:59:33 2023 +0200

    tree-optimization/110310 - move vector epilogue disabling to analysis phase

    The following removes late deciding to elide vectorized epilogues to
    the analysis phase and also avoids altering the epilogues niter.
    The costing part from vect_determine_partial_vectors_and_peeling is
    moved to vect_analyze_loop_costing where we use the main loop
    analysis to constrain the epilogue scalar iterations.

    I have not tried to integrate this with vect_known_niters_smaller_than_vf.

    It seems the for_epilogue_p parameter in
    vect_determine_partial_vectors_and_peeling is largely useless and
    we could compute that in the function itself.

            PR tree-optimization/110310
            * tree-vect-loop.cc (vect_determine_partial_vectors_and_peeling):
            Move costing part ...
            (vect_analyze_loop_costing): ... here.  Integrate better
            estimate for epilogues from ...
            (vect_analyze_loop_2): Call
vect_determine_partial_vectors_and_peeling
            with actual epilogue status.
            * tree-vect-loop-manip.cc (vect_do_peeling): ... here and
            avoid cancelling epilogue vectorization.
            (vect_update_epilogue_niters): Remove.  No longer update
            epilogue LOOP_VINFO_NITERS.

            * gcc.target/i386/pr110310.c: New testcase.
            * gcc.dg/vect/slp-perm-12.c: Disable epilogue vectorization.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/110310] vector epilogue handling is inefficient
  2023-06-19 11:00 [Bug tree-optimization/110310] New: vector epilogue handling is inefficient rguenth at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2023-07-04  7:08 ` cvs-commit at gcc dot gnu.org
@ 2023-07-04  7:08 ` rguenth at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-04  7:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110310

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-07-04  7:08 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-19 11:00 [Bug tree-optimization/110310] New: vector epilogue handling is inefficient rguenth at gcc dot gnu.org
2023-06-19 11:01 ` [Bug tree-optimization/110310] " rguenth at gcc dot gnu.org
2023-06-22  9:35 ` avieira at gcc dot gnu.org
2023-06-22 12:03 ` rguenther at suse dot de
2023-06-22 14:30 ` avieira at gcc dot gnu.org
2023-07-03  9:01 ` rguenth at gcc dot gnu.org
2023-07-04  7:08 ` cvs-commit at gcc dot gnu.org
2023-07-04  7:08 ` rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).