Re: [RFC] avoid type conversion through versioning loop

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

From: guojiufu <guojiufu@linux.ibm.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Jakub Jelinek <jakub@redhat.com>,
	GCC Development <gcc@gcc.gnu.org>,
	Richard Guenther <rguenther@suse.de>,
	Segher Boessenkool <segher@kernel.crashing.org>
Subject: Re: [RFC] avoid type conversion through versioning loop
Date: Wed, 24 Mar 2021 20:12:15 +0800	[thread overview]
Message-ID: <f5ca33b3ce8fe1dd0265d51a575a34ed@imap.linux.ibm.com> (raw)
In-Reply-To: <CAFiYyc2XSd2PKn+CkVPgLYeNsC-2ZCNFKzJU6Z07KV6xAXf=Bw@mail.gmail.com>

On 2021-03-24 15:55, Richard Biener wrote:
> On Wed, Mar 24, 2021 at 3:55 AM guojiufu <guojiufu@linux.ibm.com> 
> wrote:
>> 
>> On 2021-03-23 16:25, Richard Biener via Gcc wrote:
>> > On Tue, Mar 23, 2021 at 4:33 AM guojiufu <guojiufu@imap.linux.ibm.com>
>> > wrote:
>> >>
>> >> On 2021-03-22 16:31, Jakub Jelinek via Gcc wrote:
>> >> > On Mon, Mar 22, 2021 at 09:22:26AM +0100, Richard Biener via Gcc wrote:
>> >> >> Better than doing loop versioning is to enhance SCEV (and thus also
>> >> >> dependence analysis) to track extra conditions they need to handle
>> >> >> cases similar as to how niter analysis computes it's 'assumptions'
>> >> >> condition.  That allows the versioning to be done when there's an
>> >> >> actual beneficial transform (like vectorization) rather than just
>> >> >> upfront for the eventual chance that there'll be any.  Ideally such
>> >> >> transform would then choose IVs in their transformed copy that
>> >> >> are analyzable w/o repeating such versioning exercise for the next
>> >> >> transform.
>> >> >
>> >> > And it might be beneficial to perform some type promotion/demotion
>> >> > pass, either early during vectorization or separately before
>> >> > vectorization
>> >> > on a loop copy guarded with the ifns e.g. ifconv uses too.
>> >> > Find out what type sizes the loop use, first try to demote computations
>> >> > to narrower types in the vectorized loop candidate (e.g. if something
>> >> > is computed in a wider type only to have the result demoted to narrower
>> >> > type), then pick up the widest type size still in use in the loop (ok,
>> >> > this assumes we don't mix multiple vector sizes in the loop, but
>> >> > currently
>> >> > our vectorizer doesn't do that) and try to promote computations that
>> >> > could
>> >> > be promoted to that type size.  We do partially something like that
>> >> > during
>> >> > vect patterns for bool types, but not other types I think.
>> >> >
>> >> >       Jakub
>> >>
>> >> Thanks for the suggestions!
>> >>
>> >> Enhancing SCEV could help other optimizations and improve performance
>> >> in
>> >> some cases.
>> >> While one of the direct ideas of using the '64bit type' is to
>> >> eliminate
>> >> conversions,
>> >> even for some cases which are not easy to be optimized through
>> >> ifconv/vectorization,
>> >> for examples:
>> >>
>> >>    unsigned int i = 0;
>> >>    while (a[i]>1e-3)
>> >>      i++;
>> >>
>> >>    unsigned int i = 0;
>> >>    while (p1[i] == p2[i] && p1[i] != '\0')
>> >>      i++;
>> >>
>> >> Or only do versioning on type for this kind of loop? Any suggestions?
>> >
>> > But the "optimization" resulting from such versioning is hard to
>> > determine upfront which means we'll pay quite a big code size cost
>> > for unknown questionable gain.  What's the particular optimization
>> 
>> Right.  Code size increasing is a big pain on large loops. If the gain
>> is not significant, this optimization may not positive.
>> 
>> > in the above cases?  Note that for example for
>> >
>> >     unsigned int i = 0;
>> >     while (a[i]>1e-3)
>> >        i++;
>> >
>> > you know that when 'i' wraps then the loop will not terminate.  There's
>> 
>> Thanks :) The code would be "while (a[i]>1e-3 && i < n)", the upbound 
>> is
>> checkable.  Otherwise, the optimization to avoid zext is not 
>> adoptable.
>> 
>> > the address computation that is i * sizeof (T) which is done in a
>> > larger
>> > type to avoid overflow so we have &a + zext (i) * 8 - is that the
>> > operation
>> > that is 'slow' for you?
>> 
>> This is the point: "zext(i)" is the instruction that I want to
>> eliminate,
>> which is the direct goal of the optimization.
>> 
>> The gain of eliminating the 'zext' is visible or not, and the code 
>> size
>> increasing is small enough or not, this is a question and needs to
>> trade-off.
>> It may be only acceptable if the loop is very small, then eliminating
>> 'zext'
>> would help to save runtime, and code size increase maybe not big.
> 
> OK, so I indeed think that the desire to micro-optimize a 'zext' 
> doesn't
> make versioning a good trade-off.  The micro-architecture should better
> not make that totally slow (I'd expect an extra latency comparable to
> the multiply or add on the &a + zext(i) * 8 instruction chain).

Agree, I understand your point.  The concern is some micro-architectures 
are not
very well on this yet.  I tested the above example code:
   unsigned i = 0;
   while (a[i] > 1e-3 && i < n)
     i++;
there are ~30% performance improvement if using "long i" instead 
of"unsigned i"
on ppc64le and x86.  It seems those instructions are not optimized too 
much on
some platforms.  So, I'm wondering we need to do this in GCC.

> 
> OTOH making SCEV analysis not give up but instead record the 
> constraints
> under which its solution is valid is a very good and useful thing to 
> do.

Thanks! Enhance SCEV could help a few cases, especially when other 
optimizations
are enabled.

Thanks again for your suggestions!

BR.
Jiufu Guo.

> 
> Richard.
> 
>> Thanks again for your very helpful comments!
>> 
>> BR.
>> Jiufu Guo.
>> 
>> >
>> > Richard.
>> >
>> >> BR.
>> >> Jiufu Guo.

next prev parent reply	other threads:[~2021-03-24 12:12 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-22  6:39 guojiufu
2021-03-22  8:22 ` Richard Biener
2021-03-22  8:31   ` Jakub Jelinek
2021-03-23  3:33     ` guojiufu
2021-03-23  8:25       ` Richard Biener
2021-03-24  2:55         ` guojiufu
2021-03-24  7:55           ` Richard Biener
2021-03-24 12:12             ` guojiufu [this message]
2021-03-24 12:33               ` Richard Biener
2021-03-24 15:17                 ` guojiufu
2021-03-25  8:35                   ` Richard Biener
2021-03-25 13:32                     ` guojiufu
2021-03-25 13:47                       ` guojiufu
2021-03-26  8:19                     ` guojiufu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f5ca33b3ce8fe1dd0265d51a575a34ed@imap.linux.ibm.com \
    --to=guojiufu@linux.ibm.com \
    --cc=gcc@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=rguenther@suse.de \
    --cc=richard.guenther@gmail.com \
    --cc=segher@kernel.crashing.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).