From: Yuri Rumyantsev <ysrumyan@gmail.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Ilya Enkovich <enkovich.gnu@gmail.com>,
gcc-patches <gcc-patches@gcc.gnu.org>, Jeff Law <law@redhat.com>,
Igor Zamyatin <izamyatin@gmail.com>
Subject: Re: [RFC] Combine vectorized loops with its scalar remainder.
Date: Mon, 30 Nov 2015 15:04:00 -0000 [thread overview]
Message-ID: <CAEoMCqRcYCuGbSohGkLvZ6ttX4dy9Y6jOSer8-amdZXKQRC7Xw@mail.gmail.com> (raw)
In-Reply-To: <CAFiYyc2F_x3V8QP+z6cPDL9DMvJm5E8nqq06S8-=Ms1jacSfmQ@mail.gmail.com>
Richard,
Thanks a lot for your detailed comments!
A few words about the 436.cactusADM gain. The loop which was transformed
for avx2 is huge and it is the last innermost loop in routine
Bench_StaggeredLeapfrog2 (StaggeredLeapfrog2.F #366). If you don't
have the sources, let me know.
Yuri.
2015-11-27 16:45 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Fri, Nov 13, 2015 at 11:35 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Hi Richard,
>>
>> Here is an updated version of the patch which (1) is in sync with the trunk
>> compiler and (2) contains a simple cost model to estimate the profitability
>> of scalar epilogue elimination. The part related to vectorization of
>> loops with small trip count is still in development. Note that the
>> implemented cost model was not tuned well for HASWELL and KNL, but we
>> got a ~6% speed-up on 436.cactusADM from the spec2006 suite for HASWELL.
>
> Ok, so I don't know where to start with this.
>
> First of all, while I wanted the actual stmt processing to be done
> as post-processing
> on the vectorized loop body, I didn't want to have this completely separated
> from vectorizing.
>
> So call combine_vect_loop_remainder () from vect_transform_loop, rather than
> iterating over all (vectorized) loops at the end.
>
> Second, all the adjustments of the number of iterations for the vector
> loop should
> be integrated into the main vectorization scheme as should determining the
> cost of the predication. So you'll end up adding a
> LOOP_VINFO_MASK_MAIN_LOOP_FOR_EPILOGUE flag, determined during
> cost analysis and during code generation adjust vector iteration computation
> accordingly and _not_ generate the epilogue loop (or wire it up correctly in
> the first place).
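> To illustrate the iteration-count adjustment being asked for (a sketch
> with made-up helper names, not code from the patch): with a separate
> scalar epilogue the vector loop rounds the trip count down, while a
> masked final vector iteration rounds it up and needs no epilogue loop:

```c
#include <assert.h>

/* Vector iterations when the remainder runs in a scalar epilogue:
   round down; the remaining niters % vf iterations run scalar.  */
static unsigned
vect_iters_with_epilogue (unsigned niters, unsigned vf)
{
  return niters / vf;
}

/* Vector iterations when the last partial iteration runs under a mask:
   round up; no scalar epilogue loop is generated.  */
static unsigned
vect_iters_masked (unsigned niters, unsigned vf)
{
  return (niters + vf - 1) / vf;
}
```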
>
> The actual stmt processing should then still happen in a similar way as you do.
>
> So I'm going to comment on that part only as I expect the rest will look a lot
> different.
>
> +/* Generate the induction vector which will be used for mask evaluation. */
> +
> +static tree
> +gen_vec_induction (loop_vec_info loop_vinfo, unsigned elem_size, unsigned size)
> +{
>
> please make use of create_iv. Add more comments. I reverse-engineered
> that you add a { { 0, ..., vf }, +, {vf, ... vf } } IV which you use
> in gen_mask_for_remainder
> by comparing it against { niter, ..., niter }.
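> For reference, a scalar model of that IV and mask (illustrative only;
> VF stands in for the vectorization factor, the lane loop for the
> element-wise vector compare):

```c
#include <assert.h>

#define VF 4

/* Scalar model of the { { 0, ..., VF-1 }, +, { VF, ..., VF } } IV:
   on vector iteration ITER, lane L holds index ITER * VF + L, and the
   mask is obtained by comparing each lane against NITER.  */
static void
compute_mask (unsigned iter, unsigned niter, int mask[VF])
{
  for (unsigned lane = 0; lane < VF; lane++)
    mask[lane] = (iter * VF + lane) < niter;
}
```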
>
> + gsi = gsi_after_labels (loop->header);
> + niters = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
> + ? LOOP_VINFO_NITERS (loop_vinfo)
> + : LOOP_VINFO_NITERS_UNCHANGED (loop_vinfo);
>
> that's either wrong or unnecessary: if not peeling for alignment,
> loop-vinfo-niters is equal to loop-vinfo-niters-unchanged.
>
> + ptr = build_int_cst (reference_alias_ptr_type (ref), 0);
> + if (!SSA_NAME_PTR_INFO (addr))
> + copy_ref_info (build2 (MEM_REF, TREE_TYPE (ref), addr, ptr), ref);
>
> vect_duplicate_ssa_name_ptr_info.
>
> +
> +static void
> +fix_mask_for_masked_ld_st (vec<gimple *> *masked_stmt, tree mask)
> +{
> + gimple *stmt, *new_stmt;
> + tree old, lhs, vectype, var, n_lhs;
>
> no comment? what's this for.
>
> +/* Convert vectorized reductions to VEC_COND statements to preserve
> + reduction semantics:
> + s1 = x + s2 --> t = x + s2; s1 = (mask) ? t : s2. */
> +
> +static void
> +convert_reductions (loop_vec_info loop_vinfo, tree mask)
> +{
>
> for reductions it looks like preserving the last iteration x plus the mask
> could avoid predicating it this way and compensate in the reduction
> epilogue by "subtracting" x & mask? With true predication support
> that'll likely be more expensive of course.
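> In scalar terms the quoted transform behaves like the sketch below
> (my model, not the patch's code): the VEC_COND keeps the previous
> partial sum for masked-out lanes, so they do not contribute:

```c
#include <assert.h>

#define VF 4

/* Model of the predicated reduction: t = x + s; s = (mask) ? t : s.
   Lanes past N are inactive and keep their old partial sum.  */
static int
masked_sum (const int *x, unsigned n)
{
  int s[VF] = { 0, 0, 0, 0 };
  unsigned iters = (n + VF - 1) / VF;
  for (unsigned i = 0; i < iters; i++)
    for (unsigned lane = 0; lane < VF; lane++)
      {
        unsigned idx = i * VF + lane;
        int active = idx < n;
        int t = s[lane] + (active ? x[idx] : 0);
        s[lane] = active ? t : s[lane];  /* s1 = (mask) ? t : s2 */
      }
  return s[0] + s[1] + s[2] + s[3];     /* reduction epilogue */
}
```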
>
> + /* Generate new VEC_COND expr. */
> + vec_cond_expr = build3 (VEC_COND_EXPR, vectype, mask, new_lhs, rhs);
> + new_stmt = gimple_build_assign (lhs, vec_cond_expr);
>
> gimple_build_assign (lhs, VEC_COND_EXPR, mask, new_lhs, rhs);
>
> +/* Return true if MEM_REF is incremented by vector size and false
> otherwise. */
> +
> +static bool
> +mem_ref_is_vec_size_incremented (loop_vec_info loop_vinfo, tree lhs)
> +{
> + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>
> what?! Just look at DR_STEP of the store?
>
>
> +void
> +combine_vect_loop_remainder (loop_vec_info loop_vinfo)
> +{
> + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> + auto_vec<gimple *, 10> loads;
> + auto_vec<gimple *, 5> stores;
>
> so you need to re-structure this in a way that it computes
>
> a) whether it can perform the operation - and you need to do that
> reliably before the operation has taken place
> b) its cost
>
> instead of looking at def types or gimple_assign_load/store_p predicates
> please look at STMT_VINFO_TYPE instead.
>
> I don't like the new target hook for the costing. We do need some major
> re-structuring in the vectorizer cost model implementation, this doesn't go
> into the right direction.
>
> A simplistic hook following the current scheme would have used
> vect_cost_for_stmt as the argument and mirrored builtin_vectorization_cost.
>
> There is not a single testcase in the patch. I would have expected one that
> makes sure we keep the 6% speedup for cactusADM at least.
>
>
> So this was a 45-minute "overall" review, not going into all the
> implementation details.
>
> Thanks,
> Richard.
>
>
>> 2015-11-10 17:52 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Tue, Nov 10, 2015 at 2:02 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
>>>> 2015-11-10 15:30 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Tue, Nov 3, 2015 at 1:08 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> It looks like a misunderstanding - we assumed that for GCC v6 the simple
>>>>>> scheme of remainder handling would be used, introducing a new IV :
>>>>>> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01435.html
>>>>>>
>>>>>> Is that true, or did we miss something?
>>>>>
>>>>> <quote>
>>>>>> > Do you have an idea how "masking" is better be organized to be usable
>>>>>> > for both 4b and 4c?
>>>>>>
>>>>>> Do 2a ...
>>>>> Okay.
>>>>> </quote>
>>>>
>>>> 2a was 'transform already vectorized loop as a separate
>>>> post-processing'. Isn't that what this prototype patch implements?
>>>> The current version only masks the loop body, which in practice is
>>>> applicable to AVX-512 only in most cases. With AVX-512 it is easier to see
>>>> how profitable masking might be, and it is the main target for the first
>>>> masking version. Extending it to prologues/epilogues, and thus making
>>>> it more profitable for other targets, is the next step and is out of
>>>> the scope of this patch.
>>>
>>> Ok, technically the prototype transforms the already-vectorized loop.
>>> Of course I meant that the vectorized loop be copied, masked, and the
>>> result used as the epilogue...
>>>
>>> I'll queue a more detailed look into the patch for this week.
>>>
>>> Did you perform any measurements with this patch like # of
>>> masked epilogues in SPEC 2006 FP (and any speedup?)
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> Thanks,
>>>> Ilya
>>>>
>>>>>
>>>>> Richard.
>>>>>
Thread overview: 17+ messages
2015-10-28 10:57 Yuri Rumyantsev
2015-11-03 10:08 ` Richard Henderson
2015-11-03 10:35 ` Yuri Rumyantsev
2015-11-03 11:47 ` Richard Biener
2015-11-03 12:08 ` Yuri Rumyantsev
2015-11-10 12:30 ` Richard Biener
2015-11-10 13:02 ` Ilya Enkovich
2015-11-10 14:52 ` Richard Biener
2015-11-13 10:36 ` Yuri Rumyantsev
2015-11-23 15:54 ` Yuri Rumyantsev
2015-11-24 9:21 ` Richard Biener
2015-11-27 13:49 ` Richard Biener
2015-11-30 15:04 ` Yuri Rumyantsev [this message]
2015-12-15 16:41 ` Yuri Rumyantsev
2016-01-11 10:07 ` Yuri Rumyantsev
2016-02-09 16:10 ` Ilya Enkovich
2016-02-09 16:18 ` Jeff Law