Re: [PATCH PR66388]Add sizetype cand for BIV of smaller type if it's used as index of memory ref

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: "Bin.Cheng" <amker.cheng@gmail.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Bin Cheng <bin.cheng@arm.com>, GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH PR66388]Add sizetype cand for BIV of smaller type if it's used as index of memory ref
Date: Tue, 08 Sep 2015 10:07:00 -0000	[thread overview]
Message-ID: <CAHFci28X0SehQh34W9V-WFoDwm1OfZ24BN5XKXiiwf_xu42=nw@mail.gmail.com> (raw)
In-Reply-To: <CAFiYyc1heF7Doc-GSfu4w_h6jqd=2kZVFn69v-YZNNiyz8tLBA@mail.gmail.com>

On Wed, Sep 2, 2015 at 10:12 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Wed, Sep 2, 2015 at 5:26 AM, Bin Cheng <bin.cheng@arm.com> wrote:
>> Hi,
>> This patch is a new approach to fix PR66388.  IVO today computes iv_use with
>> iv_cand which has at least same type precision as the use.  On 64bit
>> platforms like AArch64, this results in different iv_cand created for each
>> address type iv_use, and register pressure increased.  As a matter of fact,
>> the BIV should be used for all iv_uses in some of these cases.  It is a
>> latent bug but recently getting worse because of overflow changes.
>>
>> The original approach at
>> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01484.html can fix the issue
>> except it conflict with IV elimination.  Seems to me it is impossible to
>> mitigate the contradiction.
>>
>> This new approach fixes the issue by adding sizetype iv_cand for BIVs
>> directly.  In cases if the original BIV is preferred, the sizetype iv_cand
>> will be chosen.  As for code generation, the sizetype iv_cand has the same
>> effect as the original BIV.  Actually, it's better because BIV needs to be
>> explicitly extended to sizetype to be used in address expression on most
>> targets.
>>
>> One shortage of this approach is it may introduce more iv candidates.  To
>> minimize the impact, this patch does sophisticated code analysis and adds
>> sizetype candidate for BIV only if it is used as index.  Moreover, it avoids
>> to add candidate of the original type if the BIV is only used as index.
>> Statistics for compiling spec2k6 shows increase of candidate number is
>> modest and can be ignored.
>>
>> There are two more patches following to fix corner cases revealed by this
>> one.  In together they bring obvious perf improvement for spec26k/int on
>> aarch64.
>> Spec2k6/int
>> 400.perlbench   3.44%
>> 445.gobmk       -0.86%
>> 456.hmmer       14.83%
>> 458.sjeng       2.49%
>> 462.libquantum  -0.79%
>> GEOMEAN         1.68%
>>
>> There is also about 0.36% improvement for spec2k6/fp, mostly because of case
>> 436.cactusADM.  I believe it can be further improved, but that should be
>> another patch.
>>
>> I also collected benchmark data for x86_64.  Spec2k6/fp is not affected.  As
>> for spec2k6/int, though the geomean is improved slightly, 400.perlbench is
>> regressed by ~3%.  I can see BIVs are chosen for some loops instead of
>> address candidates.  Generally, the loop header will be simplified because
>> iv elimination with BIV is simpler; the number of instructions in loop body
>> isn't changed.  I suspect the regression comes from different addressing
>> modes.  With BIV, complex addressing mode like [base + index << scale +
>> disp] is used, rather than [base + disp].  I guess the former has more
>> micro-ops, thus more expensive.  This guess can be confirmed by manually
>> suppressing the complex addressing mode with higher address cost.
>> Now the problem becomes why overall cost of BIV is computed lower while the
>> actual cost is higher.  I noticed for most affected loops, loop header is
>> bloated because of iv elimination using the old address candidate.  The
>> bloated loop header results in much higher cost than BIV.  As a result, BIV
>> is preferred.  I also noticed the bloated loop header generally can be
>> simplified (I have a following patch for this).  After applying the local
>> patch, the old address candidate is chosen, and most of regression is
>> recovered.
>> Conclusion is I think loop header bloated issue should be blamed for the
>> regression, and it can be resolved.
>>
>> Bootstrap and test on x64_64 and aarch64.  It fixes failure of
>> gcc.target/i386/pr49781-1.c, without new breakage.
>>
>> So what do you think?
>
> The data above looks ok to me.
>
> +static struct iv *
> +find_deriving_biv_for_iv (struct ivopts_data *data, struct iv *iv)
> +{
> +  aff_tree aff;
> +  struct expand_data exp_data;
> +
> +  if (!iv->ssa_name || TREE_CODE (iv->ssa_name) != SSA_NAME)
> +    return iv;
> +
> +  /* Expand IV's ssa_name till the deriving biv is found.  */
> +  exp_data.data = data;
> +  exp_data.biv = NULL;
> +  tree_to_aff_combination_expand (iv->ssa_name, TREE_TYPE (iv->ssa_name),
> +                                 &aff, &data->name_expansion_cache,
> +                                 stop_expand, &exp_data);
> +  return exp_data.biv;
>
> that's actually "abusing" tree_to_aff_combination_expand for simply walking
> SSA uses and their defs uses recursively until you hit "stop".  ISTR past
> discussion to add a generic walk_ssa_use interface for that.  Not sure if it
> materialized with a name I can't remember or whether it didn't.
Thanks for reviewing.  I didn't found existing interface to walk up
definition chains of ssa vars.  In this updated patch, I implemented a
simple function which meets the minimal requirement of walking up
definition chains of BIV variables.  I also counted number of
no_overflow BIVs that are not used in address type use.  Since
generally there are only two BIVs in a loop, this can prevent us from
visiting definition chains most of time.  Statistics shows that number
of call to find_deriving_biv_for_expr plummet.

>
> -  add_candidate (data, iv->base, iv->step, true, NULL);
> +  /* Check if this biv is used in address type use.  */
> +  if (iv->no_overflow  && iv->have_address_use
> +      && INTEGRAL_TYPE_P (TREE_TYPE (iv->base))
> +      && TYPE_PRECISION (TREE_TYPE (iv->base)) < TYPE_PRECISION (sizetype))
> +    {
> +      tree type = unsigned_type_for (sizetype);
>
> sizetype is unsigned.
Fixed.

Bootstrap and test on x86_64, is this OK?

Thanks,
bin
>
> the rest looks ok to me but I really don't like the abuse of
> tree_to_aff_combination_expand...
>
> Thanks,
> Richard.
>
>> Thanks,
>> bin
>>
>> 2015-08-31  Bin Cheng  <bin.cheng@arm.com>
>>
>>         * tree-affine.c (aff_combination_expand): New parameters.
>>         (tree_to_aff_combination_expand): Ditto.
>>         * tree-affine.h (aff_combination_expand): New declaration.
>>         (tree_to_aff_combination_expand): Ditto.
>>         * tree-ssa-loop-ivopts.c (struct iv, iv_cand): New fields.
>>         (dump_iv): Dump no_overflow information.
>>         (alloc_iv): Initialize new field for struct iv.
>>         (struct expand_data): New struct for affine combination expanding.
>>         (stop_expand): New callback func for affine combination expanding.
>>         (find_deriving_biv_for_iv, record_biv_for_address_use): New
>> functions.
>>         (idx_find_step): Call new functions above.
>>         (find_depends, add_candidate): New paramter.
>>         (add_iv_candidate_for_biv): Add sizetype cand for BIV.
>>         (get_computation_aff): Simplify convertion of cand for BIV.
>>         (get_computation_cost_at): Step cand's base if necessary.
>>

next prev parent reply	other threads:[~2015-09-08 10:06 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-02  3:26 Bin Cheng
2015-09-02 14:12 ` Richard Biener
2015-09-08 10:07   ` Bin.Cheng [this message]
2015-09-08 10:47     ` Bin.Cheng
2015-09-15  5:57       ` Bin.Cheng
2015-09-16 11:55         ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHFci28X0SehQh34W9V-WFoDwm1OfZ24BN5XKXiiwf_xu42=nw@mail.gmail.com' \
    --to=amker.cheng@gmail.com \
    --cc=bin.cheng@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).