From: Yuri Rumyantsev <ysrumyan@gmail.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Ilya Enkovich <enkovich.gnu@gmail.com>,
gcc-patches <gcc-patches@gcc.gnu.org>, Jeff Law <law@redhat.com>,
Igor Zamyatin <izamyatin@gmail.com>
Subject: Re: [RFC] Combine vectorized loops with its scalar remainder.
Date: Mon, 30 Nov 2015 15:04:00 -0000 [thread overview]
Message-ID: <CAEoMCqRcYCuGbSohGkLvZ6ttX4dy9Y6jOSer8-amdZXKQRC7Xw@mail.gmail.com> (raw)
In-Reply-To: <CAFiYyc2F_x3V8QP+z6cPDL9DMvJm5E8nqq06S8-=Ms1jacSfmQ@mail.gmail.com>
Richard,
Thanks a lot for your detailed comments!
A few words about the 436.cactusADM gain. The loop which was transformed
for avx2 is huge and it is the last innermost loop in routine
Bench_StaggeredLeapfrog2 (StaggeredLeapfrog2.F #366). If you don't
have the sources, let me know.
Yuri.
2015-11-27 16:45 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Fri, Nov 13, 2015 at 11:35 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Hi Richard,
>>
>> Here is an updated version of the patch which (1) is in sync with the trunk
>> compiler and (2) contains a simple cost model to estimate the profitability
>> of scalar epilogue elimination. The part related to vectorization of
>> loops with small trip count is still in development. Note that the
>> implemented cost model was not tuned well for HASWELL and KNL, but we
>> got a ~6% speed-up on 436.cactusADM from the spec2006 suite for HASWELL.
>
> Ok, so I don't know where to start with this.
>
> First of all, while I wanted the actual stmt processing to be done
> as post-processing
> on the vectorized loop body, I didn't want to have this completely separated
> from vectorizing.
>
> So call combine_vect_loop_remainder () from vect_transform_loop, rather than
> iterating over all (vectorized) loops at the end.
>
> Second, all the adjustments of the number of iterations for the vector
> loop should
> be integrated into the main vectorization scheme as should determining the
> cost of the predication. So you'll end up adding a
> LOOP_VINFO_MASK_MAIN_LOOP_FOR_EPILOGUE flag, determined during
> cost analysis and during code generation adjust vector iteration computation
> accordingly and _not_ generate the epilogue loop (or wire it up correctly in
> the first place).
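> To illustrate the iteration-count adjustment being asked for (a sketch
> with made-up helper names, not code from the patch): with a separate
> scalar epilogue the vector loop rounds the trip count down, while a
> masked final vector iteration rounds it up and needs no epilogue loop:

```c
#include <assert.h>

/* Vector iterations when the remainder runs in a scalar epilogue:
   round down; the remaining niters % vf iterations run scalar.  */
static unsigned
vect_iters_with_epilogue (unsigned niters, unsigned vf)
{
  return niters / vf;
}

/* Vector iterations when the last partial iteration runs under a mask:
   round up; no scalar epilogue loop is generated.  */
static unsigned
vect_iters_masked (unsigned niters, unsigned vf)
{
  return (niters + vf - 1) / vf;
}
```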
>
> The actual stmt processing should then still happen in a similar way as you do.
>
> So I'm going to comment on that part only as I expect the rest will look a lot
> different.
>
> +/* Generate the induction vector which will be used for mask evaluation. */
> +
> +static tree
> +gen_vec_induction (loop_vec_info loop_vinfo, unsigned elem_size, unsigned size)
> +{
>
> please make use of create_iv. Add more comments. I reverse-engineered
> that you add a { { 0, ..., vf }, +, {vf, ... vf } } IV which you use
> in gen_mask_for_remainder
> by comparing it against { niter, ..., niter }.
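> For reference, a scalar model of that IV and mask (illustrative only;
> VF stands in for the vectorization factor, the lane loop for the
> element-wise vector compare):

```c
#include <assert.h>

#define VF 4

/* Scalar model of the { { 0, ..., VF-1 }, +, { VF, ..., VF } } IV:
   on vector iteration ITER, lane L holds index ITER * VF + L, and the
   mask is obtained by comparing each lane against NITER.  */
static void
compute_mask (unsigned iter, unsigned niter, int mask[VF])
{
  for (unsigned lane = 0; lane < VF; lane++)
    mask[lane] = (iter * VF + lane) < niter;
}
```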
>
> + gsi = gsi_after_labels (loop->header);
> + niters = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
> + ? LOOP_VINFO_NITERS (loop_vinfo)
> + : LOOP_VINFO_NITERS_UNCHANGED (loop_vinfo);
>
> that's either wrong or unnecessary: if not peeling for alignment,
> loop-vinfo-niters is equal to loop-vinfo-niters-unchanged.
>
> + ptr = build_int_cst (reference_alias_ptr_type (ref), 0);
> + if (!SSA_NAME_PTR_INFO (addr))
> + copy_ref_info (build2 (MEM_REF, TREE_TYPE (ref), addr, ptr), ref);
>
> vect_duplicate_ssa_name_ptr_info.
>
> +
> +static void
> +fix_mask_for_masked_ld_st (vec<gimple *> *masked_stmt, tree mask)
> +{
> + gimple *stmt, *new_stmt;
> + tree old, lhs, vectype, var, n_lhs;
>
> no comment? what's this for.
>
> +/* Convert vectorized reductions to VEC_COND statements to preserve
> + reduction semantics:
> + s1 = x + s2 --> t = x + s2; s1 = (mask) ? t : s2. */
> +
> +static void
> +convert_reductions (loop_vec_info loop_vinfo, tree mask)
> +{
>
> for reductions it looks like preserving the last iteration x plus the mask
> could avoid predicating it this way and compensate in the reduction
> epilogue by "subtracting" x & mask? With true predication support
> that'll likely be more expensive of course.
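> In scalar terms the quoted transform behaves like the sketch below
> (my model, not the patch's code): the VEC_COND keeps the previous
> partial sum for masked-out lanes, so they do not contribute:

```c
#include <assert.h>

#define VF 4

/* Model of the predicated reduction: t = x + s; s = (mask) ? t : s.
   Lanes past N are inactive and keep their old partial sum.  */
static int
masked_sum (const int *x, unsigned n)
{
  int s[VF] = { 0, 0, 0, 0 };
  unsigned iters = (n + VF - 1) / VF;
  for (unsigned i = 0; i < iters; i++)
    for (unsigned lane = 0; lane < VF; lane++)
      {
        unsigned idx = i * VF + lane;
        int active = idx < n;
        int t = s[lane] + (active ? x[idx] : 0);
        s[lane] = active ? t : s[lane];  /* s1 = (mask) ? t : s2 */
      }
  return s[0] + s[1] + s[2] + s[3];     /* reduction epilogue */
}
```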
>
> + /* Generate new VEC_COND expr. */
> + vec_cond_expr = build3 (VEC_COND_EXPR, vectype, mask, new_lhs, rhs);
> + new_stmt = gimple_build_assign (lhs, vec_cond_expr);
>
> gimple_build_assign (lhs, VEC_COND_EXPR, mask, new_lhs, rhs);
>
> +/* Return true if MEM_REF is incremented by vector size and false
> otherwise. */
> +
> +static bool
> +mem_ref_is_vec_size_incremented (loop_vec_info loop_vinfo, tree lhs)
> +{
> + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>
> what?! Just look at DR_STEP of the store?
>
>
> +void
> +combine_vect_loop_remainder (loop_vec_info loop_vinfo)
> +{
> + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> + auto_vec<gimple *, 10> loads;
> + auto_vec<gimple *, 5> stores;
>
> so you need to re-structure this in a way that it computes
>
> a) whether it can perform the operation - and you need to do that
> reliably before the operation has taken place
> b) its cost
>
> instead of looking at def types or gimple_assign_load/store_p predicates
> please look at STMT_VINFO_TYPE instead.
>
> I don't like the new target hook for the costing. We do need some major
> re-structuring in the vectorizer cost model implementation, this doesn't go
> into the right direction.
>
> A simplistic hook following the current scheme would have used
> vect_cost_for_stmt as the argument and mirrored builtin_vectorization_cost.
>
> There is not a single testcase in the patch. I would have expected one that
> makes sure we keep the 6% speedup for cactusADM at least.
>
>
> So this was a 45-minute "overall" review, not going into all the
> implementation details.
>
> Thanks,
> Richard.
>
>
>> 2015-11-10 17:52 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Tue, Nov 10, 2015 at 2:02 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>> 2015-11-10 15:30 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Tue, Nov 3, 2015 at 1:08 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> It looks like a misunderstanding - we assumed that for GCC v6 the simple
>>>>>> scheme of remainder handling would be used, introducing a new IV :
>>>>>> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01435.html
>>>>>>
>>>>>> Is that true, or did we miss something?
>>>>>
>>>>> <quote>
>>>>>> > Do you have an idea how "masking" is better be organized to be usable
>>>>>> > for both 4b and 4c?
>>>>>>
>>>>>> Do 2a ...
>>>>> Okay.
>>>>> </quote>
>>>>
>>>> 2a was 'transform already vectorized loop as a separate
>>>> post-processing'. Isn't that what this prototype patch implements?
>>>> The current version only masks the loop body, which in practice is
>>>> applicable to AVX-512 only in most cases. With AVX-512 it is easier to see
>>>> how profitable masking might be, and it is the main target for the first
>>>> masking version. Extending it to prologues/epilogues, and thus making
>>>> it more profitable for other targets, is the next step and is out of
>>>> the scope of this patch.
>>>
>>> Ok, technically the prototype transforms the already-vectorized loop.
>>> Of course I meant that the vectorized loop be copied, masked, and the
>>> result used as the epilogue...
>>>
>>> I'll queue a more detailed look into the patch for this week.
>>>
>>> Did you perform any measurements with this patch like # of
>>> masked epilogues in SPEC 2006 FP (and any speedup?)
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> Thanks,
>>>> Ilya
>>>>
>>>>>
>>>>> Richard.
>>>>>
Thread overview: 17+ messages
2015-10-28 10:57 Yuri Rumyantsev
2015-11-03 10:08 ` Richard Henderson
2015-11-03 10:35 ` Yuri Rumyantsev
2015-11-03 11:47 ` Richard Biener
2015-11-03 12:08 ` Yuri Rumyantsev
2015-11-10 12:30 ` Richard Biener
2015-11-10 13:02 ` Ilya Enkovich
2015-11-10 14:52 ` Richard Biener
2015-11-13 10:36 ` Yuri Rumyantsev
2015-11-23 15:54 ` Yuri Rumyantsev
2015-11-24 9:21 ` Richard Biener
2015-11-27 13:49 ` Richard Biener
2015-11-30 15:04 ` Yuri Rumyantsev [this message]
2015-12-15 16:41 ` Yuri Rumyantsev
2016-01-11 10:07 ` Yuri Rumyantsev
2016-02-09 16:10 ` Ilya Enkovich
2016-02-09 16:18 ` Jeff Law