[Bug tree-optimization/112325] Missed vectorization of reduction after unrolling

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

From: "rguenther at suse dot de" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/112325] Missed vectorization of reduction after unrolling
Date: Wed, 28 Feb 2024 08:23:47 +0000	[thread overview]
Message-ID: <bug-112325-4-J9MYNGNEkj@http.gcc.gnu.org/bugzilla/> (raw)
In-Reply-To: <bug-112325-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325

--- Comment #15 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 28 Feb 2024, liuhongt at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325
> 
> --- Comment #14 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> (In reply to rguenther@suse.de from comment #13)
> > On Tue, 27 Feb 2024, liuhongt at gcc dot gnu.org wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325
> > > 
> > > --- Comment #11 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> > > 
> > > >    Loop body is likely going to simplify further, this is difficult
> > > >    to guess, we just decrease the result by 1/3.  */
> > > > 
> > > 
> > > This is introduced by r0-68074-g91a01f21abfe19
> > > 
> > > /* Estimate number of insns of completely unrolled loop.  We assume
> > > +   that the size of the unrolled loop is decreased in the
> > > +   following way (the numbers of insns are based on what
> > > +   estimate_num_insns returns for appropriate statements):
> > > +
> > > +   1) exit condition gets removed (2 insns)
> > > +   2) increment of the control variable gets removed (2 insns)
> > > +   3) All remaining statements are likely to get simplified
> > > +      due to constant propagation.  Hard to estimate; just
> > > +      as a heuristics we decrease the rest by 1/3.
> > > +
> > > +   NINSNS is the number of insns in the loop before unrolling.
> > > +   NUNROLL is the number of times the loop is unrolled.  */
> > > +
> > > +static unsigned HOST_WIDE_INT
> > > +estimated_unrolled_size (unsigned HOST_WIDE_INT ninsns,
> > > +                        unsigned HOST_WIDE_INT nunroll)
> > > +{
> > > +  HOST_WIDE_INT unr_insns = 2 * ((HOST_WIDE_INT) ninsns - 4) / 3;
> > > +  if (unr_insns <= 0)
> > > +    unr_insns = 1;
> > > +  unr_insns *= (nunroll + 1);
> > > +
> > > +  return unr_insns;
> > > +}
> > > 
> > > And r0-93444-g08f1af2ed022e0 try do it more accurately by marking
> > > likely_eliminated stmt and minus that from total insns, But 2 / 3 is still
> > > keeped.
> > > 
> > > +/* Estimate number of insns of completely unrolled loop.
> > > +   It is (NUNROLL + 1) * size of loop body with taking into account
> > > +   the fact that in last copy everything after exit conditional
> > > +   is dead and that some instructions will be eliminated after
> > > +   peeling.
> > > 
> > > -   NINSNS is the number of insns in the loop before unrolling.
> > > -   NUNROLL is the number of times the loop is unrolled.  */
> > > +   Loop body is likely going to simplify futher, this is difficult
> > > +   to guess, we just decrease the result by 1/3.  */
> > > 
> > >  static unsigned HOST_WIDE_INT
> > > -estimated_unrolled_size (unsigned HOST_WIDE_INT ninsns,
> > > +estimated_unrolled_size (struct loop_size *size,
> > >                          unsigned HOST_WIDE_INT nunroll)
> > >  {
> > > -  HOST_WIDE_INT unr_insns = 2 * ((HOST_WIDE_INT) ninsns - 4) / 3;
> > > +  HOST_WIDE_INT unr_insns = ((nunroll)
> > > +                            * (HOST_WIDE_INT) (size->overall
> > > +                                               -
> > > size->eliminated_by_peeling));
> > > +  if (!nunroll)
> > > +    unr_insns = 0;
> > > +  unr_insns += size->last_iteration -
> > > size->last_iteration_eliminated_by_peeling;
> > > +
> > > +  unr_insns = unr_insns * 2 / 3;
> > >    if (unr_insns <= 0)
> > >      unr_insns = 1;
> > > -  unr_insns *= (nunroll + 1);
> > > 
> > > It looks to me 1 / 3 overestimates the instructions that can be optimised away,
> > > especially if we've subtracted eliminated_by_peeling
> > 
> > Yes, that 1/3 reduction is a bit odd - you could have the same effect
> > by increasing the instruction limit by 1/3, but that means it doesn't
> > really matter, does it?  It would be interesting to see if increasing
> > the limit by 1/3 and removing the above is neutral on SPEC?
> 
> Remove 1/3 reduction get ~2% improvement for 525.x264_r on SPR with
> -march=native -O3, no big impact on other integer benchmark.

454.calculix was always the benchmark to cross check as that benefits
from much unrolling.

I'm all for removing the 1/3 for innermost loop handling (in cunroll
the unrolled loop is then innermost).  I'm more concerned about
unrolling more than one level which is exactly what's required for
454.calculix.

next prev parent reply	other threads:[~2024-02-28  8:23 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-01  2:41 [Bug tree-optimization/112325] New: Missed vectorization after cunrolli wwwhhhyyy333 at gmail dot com
2023-11-01  2:46 ` [Bug tree-optimization/112325] " pinskia at gcc dot gnu.org
2023-11-02  9:51 ` [Bug tree-optimization/112325] Missed vectorization of reduction after unrolling rguenth at gcc dot gnu.org
2023-11-16  8:03 ` liuhongt at gcc dot gnu.org
2023-11-16  8:15 ` liuhongt at gcc dot gnu.org
2023-11-16  9:15 ` rguenth at gcc dot gnu.org
2023-11-17  6:19 ` pinskia at gcc dot gnu.org
2023-11-17  6:21 ` pinskia at gcc dot gnu.org
2023-11-20  2:52 ` cvs-commit at gcc dot gnu.org
2023-11-21  0:34 ` cvs-commit at gcc dot gnu.org
2024-02-27  6:02 ` liuhongt at gcc dot gnu.org
2024-02-27  6:13 ` liuhongt at gcc dot gnu.org
2024-02-27  7:26 ` liuhongt at gcc dot gnu.org
2024-02-27  7:53 ` rguenther at suse dot de
2024-02-27  7:58 ` rguenther at suse dot de
2024-02-28  7:26 ` liuhongt at gcc dot gnu.org
2024-02-28  8:23 ` rguenther at suse dot de [this message]
2024-02-28  8:26 ` liuhongt at gcc dot gnu.org

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-112325-4-J9MYNGNEkj@http.gcc.gnu.org/bugzilla/ \
    --to=gcc-bugzilla@gcc.gnu.org \
    --cc=gcc-bugs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).