[Bug tree-optimization/112325] Missed vectorization of reduction after unrolling

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

From: "liuhongt at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/112325] Missed vectorization of reduction after unrolling
Date: Wed, 28 Feb 2024 07:26:28 +0000	[thread overview]
Message-ID: <bug-112325-4-DS3TBtcB5a@http.gcc.gnu.org/bugzilla/> (raw)
In-Reply-To: <bug-112325-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325

--- Comment #14 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #13)
> On Tue, 27 Feb 2024, liuhongt at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325
> > 
> > --- Comment #11 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> > 
> > >    Loop body is likely going to simplify further, this is difficult
> > >    to guess, we just decrease the result by 1/3.  */
> > > 
> > 
> > This is introduced by r0-68074-g91a01f21abfe19
> > 
> > /* Estimate number of insns of completely unrolled loop.  We assume
> > +   that the size of the unrolled loop is decreased in the
> > +   following way (the numbers of insns are based on what
> > +   estimate_num_insns returns for appropriate statements):
> > +
> > +   1) exit condition gets removed (2 insns)
> > +   2) increment of the control variable gets removed (2 insns)
> > +   3) All remaining statements are likely to get simplified
> > +      due to constant propagation.  Hard to estimate; just
> > +      as a heuristics we decrease the rest by 1/3.
> > +
> > +   NINSNS is the number of insns in the loop before unrolling.
> > +   NUNROLL is the number of times the loop is unrolled.  */
> > +
> > +static unsigned HOST_WIDE_INT
> > +estimated_unrolled_size (unsigned HOST_WIDE_INT ninsns,
> > +                        unsigned HOST_WIDE_INT nunroll)
> > +{
> > +  HOST_WIDE_INT unr_insns = 2 * ((HOST_WIDE_INT) ninsns - 4) / 3;
> > +  if (unr_insns <= 0)
> > +    unr_insns = 1;
> > +  unr_insns *= (nunroll + 1);
> > +
> > +  return unr_insns;
> > +}
> > 
> > And r0-93444-g08f1af2ed022e0 try do it more accurately by marking
> > likely_eliminated stmt and minus that from total insns, But 2 / 3 is still
> > keeped.
> > 
> > +/* Estimate number of insns of completely unrolled loop.
> > +   It is (NUNROLL + 1) * size of loop body with taking into account
> > +   the fact that in last copy everything after exit conditional
> > +   is dead and that some instructions will be eliminated after
> > +   peeling.
> > 
> > -   NINSNS is the number of insns in the loop before unrolling.
> > -   NUNROLL is the number of times the loop is unrolled.  */
> > +   Loop body is likely going to simplify futher, this is difficult
> > +   to guess, we just decrease the result by 1/3.  */
> > 
> >  static unsigned HOST_WIDE_INT
> > -estimated_unrolled_size (unsigned HOST_WIDE_INT ninsns,
> > +estimated_unrolled_size (struct loop_size *size,
> >                          unsigned HOST_WIDE_INT nunroll)
> >  {
> > -  HOST_WIDE_INT unr_insns = 2 * ((HOST_WIDE_INT) ninsns - 4) / 3;
> > +  HOST_WIDE_INT unr_insns = ((nunroll)
> > +                            * (HOST_WIDE_INT) (size->overall
> > +                                               -
> > size->eliminated_by_peeling));
> > +  if (!nunroll)
> > +    unr_insns = 0;
> > +  unr_insns += size->last_iteration -
> > size->last_iteration_eliminated_by_peeling;
> > +
> > +  unr_insns = unr_insns * 2 / 3;
> >    if (unr_insns <= 0)
> >      unr_insns = 1;
> > -  unr_insns *= (nunroll + 1);
> > 
> > It looks to me 1 / 3 overestimates the instructions that can be optimised away,
> > especially if we've subtracted eliminated_by_peeling
> 
> Yes, that 1/3 reduction is a bit odd - you could have the same effect
> by increasing the instruction limit by 1/3, but that means it doesn't
> really matter, does it?  It would be interesting to see if increasing
> the limit by 1/3 and removing the above is neutral on SPEC?

Remove 1/3 reduction get ~2% improvement for 525.x264_r on SPR with
-march=native -O3, no big impact on other integer benchmark.

The regression comes from below function, cunrolli unrolls the inner loop,
cunroll unrolls the outer loop, and causes lots of spills.

typedef unsigned long long uint64_t;
typedef unsigned char uint8_t;
typedef unsigned int uint32_t;
uint64_t x264_pixel_var_8x8(uint8_t *pix, int i_stride )
{
    uint32_t sum = 0, sqr = 0;
    for( int y = 0; y < 8; y++ )
    {
        for( int x = 0; x < 8; x++ ) 
        {
            sum += pix[x]; 
            sqr += pix[x] * pix[x]; 
        }                                                              
        pix += i_stride;                                                       
    }                                                                           
    return sum + ((uint64_t)sqr << 32);    
}

next prev parent reply	other threads:[~2024-02-28  7:26 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-01  2:41 [Bug tree-optimization/112325] New: Missed vectorization after cunrolli wwwhhhyyy333 at gmail dot com
2023-11-01  2:46 ` [Bug tree-optimization/112325] " pinskia at gcc dot gnu.org
2023-11-02  9:51 ` [Bug tree-optimization/112325] Missed vectorization of reduction after unrolling rguenth at gcc dot gnu.org
2023-11-16  8:03 ` liuhongt at gcc dot gnu.org
2023-11-16  8:15 ` liuhongt at gcc dot gnu.org
2023-11-16  9:15 ` rguenth at gcc dot gnu.org
2023-11-17  6:19 ` pinskia at gcc dot gnu.org
2023-11-17  6:21 ` pinskia at gcc dot gnu.org
2023-11-20  2:52 ` cvs-commit at gcc dot gnu.org
2023-11-21  0:34 ` cvs-commit at gcc dot gnu.org
2024-02-27  6:02 ` liuhongt at gcc dot gnu.org
2024-02-27  6:13 ` liuhongt at gcc dot gnu.org
2024-02-27  7:26 ` liuhongt at gcc dot gnu.org
2024-02-27  7:53 ` rguenther at suse dot de
2024-02-27  7:58 ` rguenther at suse dot de
2024-02-28  7:26 ` liuhongt at gcc dot gnu.org [this message]
2024-02-28  8:23 ` rguenther at suse dot de
2024-02-28  8:26 ` liuhongt at gcc dot gnu.org

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-112325-4-DS3TBtcB5a@http.gcc.gnu.org/bugzilla/ \
    --to=gcc-bugzilla@gcc.gnu.org \
    --cc=gcc-bugs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).