From: "Bin.Cheng" <amker.cheng@gmail.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH GCC][3/4]Generalize dead store elimination (or store motion) across loop iterations in predcom
Date: Mon, 24 Jul 2017 14:27:00 -0000 [thread overview]
Message-ID: <CAHFci2_7BsYZ-rXP-a=LMUxOncmO3rbWzxfYTqh9yTYHs0QKrQ@mail.gmail.com> (raw)
In-Reply-To: <CAHFci29QwmFnNGUHh1-9hKJswm_GNUYtnrihXb7zUrvzcqteLw@mail.gmail.com>
Ping^1.
Thanks,
bin
On Mon, Jul 10, 2017 at 9:23 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Tue, Jul 4, 2017 at 1:29 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Tue, Jul 4, 2017 at 2:06 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Tue, Jul 4, 2017 at 12:19 PM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Mon, Jul 3, 2017 at 4:17 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>>>> On Mon, Jul 3, 2017 at 10:38 AM, Richard Biener
>>>>> <richard.guenther@gmail.com> wrote:
>>>>>> On Tue, Jun 27, 2017 at 12:49 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>>>>> Hi,
>>>>>>> For the moment, tree-predcom.c only supports invariant/load-loads/store-loads chains.
>>>>>>> This patch generalizes dead store elimination (or store motion) across loop iterations in
>>>>>>> predictive commoning pass by supporting store-store chain. As comment in the patch:
>>>>>>>
>>>>>>> Apart from predictive commoning on Load-Load and Store-Load chains, we
>>>>>>> also support Store-Store chains -- stores killed by other store can be
>>>>>>> eliminated. Given below example:
>>>>>>>
>>>>>>> for (i = 0; i < n; i++)
>>>>>>> {
>>>>>>> a[i] = 1;
>>>>>>> a[i+2] = 2;
>>>>>>> }
>>>>>>>
>>>>>>> It can be replaced with:
>>>>>>>
>>>>>>> t0 = a[0];
>>>>>>> t1 = a[1];
>>>>>>> for (i = 0; i < n; i++)
>>>>>>> {
>>>>>>> a[i] = 1;
>>>>>>> t2 = 2;
>>>>>>> t0 = t1;
>>>>>>> t1 = t2;
>>>>>>> }
>>>>>>> a[n] = t0;
>>>>>>> a[n+1] = t1;
>>>>>>>
>>>>>>> If the loop runs more than 1 iterations, it can be further simplified into:
>>>>>>>
>>>>>>> for (i = 0; i < n; i++)
>>>>>>> {
>>>>>>> a[i] = 1;
>>>>>>> }
>>>>>>> a[n] = 2;
>>>>>>> a[n+1] = 2;
>>>>>>>
>>>>>>> The interesting part is this can be viewed either as general store motion
>>>>>>> or general dead store elimination in either intra/inter-iterations way.
>>>>>>>
>>>>>>> There are number of interesting facts about this enhancement:
>>>>>>> a) This patch supports dead store elimination for both across-iteration case and single-iteration
>>>>>>> case. For the latter, it is dead store elimination.
>>>>>>> b) There are advantages supporting dead store elimination in predcom, for example, it has
>>>>>>> complete information about memory address. On the contrary, DSE pass can only handle
>>>>>>> memory references with exact the same memory address expression.
>>>>>>> c) It's cheap to support store-stores chain in predcom based on existing code.
>>>>>>> d) As commented, the enhancement can be viewed as either generalized dead store elimination
>>>>>>> or generalized store motion. I prefer DSE here.
>>>>>>>
>>>>>>> Bootstrap(O2/O3) in patch series on x86_64 and AArch64. Is it OK?
>>>>>>
>>>>>> Looks mostly ok. I have a few questions though.
>>>>>>
>>>>>> + /* Don't do store elimination if loop has multiple exit edges. */
>>>>>> + bool eliminate_store_p = single_exit (loop) != NULL;
>>>>>>
>>>>>> handling this would be an enhancement? IIRC LIM store-motion handles this
>>>>>> just fine by emitting code on all exits.
>>>>> It is an enhancement with a little bit more complication. We would
>>>>> need to setup/record finalizer memory references for different exit
>>>>> edges. I added TODO description for this (and following one). Is it
>>>>> okay to pick up this in the future?
>>>>
>>>> Yes.
>>>>
>>>>>>
>>>>>> @@ -1773,6 +2003,9 @@ determine_unroll_factor (vec<chain_p> chains)
>>>>>> {
>>>>>> if (chain->type == CT_INVARIANT)
>>>>>> continue;
>>>>>> + /* Don't unroll when eliminating stores. */
>>>>>> + else if (chain->type == CT_STORE_STORE)
>>>>>> + return 1;
>>>>>>
>>>>>> this is a hard exit value so we do not handle the case where another chain
>>>>>> in the loop would want to unroll? (enhancement?) I'd have expected to do
>>>>>> the same as for CT_INVARIANT here.
>>>>> I didn't check what change is needed in case of unrolling. I am not
>>>>> very sure if we should prefer unroll for *load chains or prefer not
>>>>> unroll for store-store chains, because unroll in general increases
>>>>> loop-carried register pressure for store-store chains rather than
>>>>> decreases register pressure for *load chains.
>>>>> I was also thinking if it's possible to restrict unrolling somehow in
>>>>> order to enable predcom at O2. BTW, this is not common, it only
>>>>> happens once in spec2k6 with factor forced to 1. So okay if as it is
>>>>> now?
>>>>
>>>> I think it is ok for now with a TODO added. Please change the comment
>>>> to "we can't handle unrolling when eliminating stores" (it's not clear if we
>>>> can -- did you try? maybe add a testcase that would ICE)
>>>>
>>>>>
>>>>>>
>>>>>> + tree init = ref_at_iteration (dr, (int) 0 - i, &stmts);
>>>>>> + if (!chain->all_always_accessed && tree_could_trap_p (init))
>>>>>> + {
>>>>>> + gimple_seq_discard (stmts);
>>>>>> + return false;
>>>>>>
>>>>>> so this is the only place that remotely cares for not always performed stores.
>>>>>> But as-is the patch doesn't seem to avoid speculating stores and thus
>>>>>> violates the C++ memory model, aka, introduces store-data-races? The LIM
>>>>>> store-motion code was fixed to avoid this by keeping track of whether a BB
>>>>>> has executed to guard the stores done in the compensation code on the loop
>>>>>> exit.
>>>>>>
>>>>>> That said, to "fix" this all && tree_could_trap_p cases would need to be removed
>>>>>> (or similarly flag vars be introduced). Speculating loads that do not
>>>>>> trap is ok
>>>>>> (might only introduce false uninit use reports by tools like valgrind).
>>>>> Hmm, not sure IIUC. Patch updated, is it correct (though conservative)?
>>>>
>>>> +static bool
>>>> +prepare_initializers_chain_store_elim (struct loop *, chain_p chain)
>>>> +{
>>>> ...
>>>> + tree init = ref_at_iteration (dr, (int) 0 - i, &stmts);
>>>> + if (!chain->all_always_accessed && tree_could_trap_p (init))
>>>> + {
>>>>
>>>> remove && tree_could_trap_p and hoist the other check.
>>>>
>>>> Same in prepare_finalizers_chain. I think you should check
>>>>
>>>> if (! chain->all_always_accessed
>>>> && ! PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES))
>>>> return false;
>>>>
>>>> at the beginning of both functions and retain
>>>>
>>>> if (! chain->all_always_accessed && tree_could_trap_p ())
>>>>
>>>> in the loops.
>>>>
>>>> Ok with that change and a testcase that would exercise failure/ICE of
>>>> store chains w/ unrolling.
>>> Hmm, now I remember maybe these
>>> all_always_accessed/trap/data_store_race checks can be simplified. In
>>> function suitable_component_p, we call just_once_each_iteration_p for
>>> all references. So we shouldn't not end up with
>>> chain->all_always_accessed == false cases, right? why means we don't
>>> really need to check at all?
>>
>> Not sure. We have this check in the existing code already. Please come
>> up with a testcase that we might not DSE-pcom because of store speculation.
> So just_once_each_iteration_p function call skips all if-condition
> cases, while chain->all_always_accessed skips multi-exit cases. As
> for store-elimination, given we already requires single-exit loops, we
> can assume chain->all_always_accessed is true in
> prepare_initializers_chain_store_elim and prepare_finalizers_chain.
> To make code clearer, this updated patch explicitly checks
> chain->all_always_accessed at the beginning of both functions. The
> fact is, we can't eliminate conditional stores (at least with current
> code) because both the condition and niters need to be compilation
> time constants to propagate the value for storing. It's complicated
> and may not worth the effort if they are constants.
> I also added two new tests, one for conditional store elimination, the
> other for unrolling/store-elimination.
> Bootstrap and test on x86_64 and AArch64. Is it OK?
>
> Thanks,
> bin
> 2017-07-04 Bin Cheng <bin.cheng@arm.com>
>
> * tree-predcom.c: Revise general description of pass.
> (enum chain_type): New enum type for store elimination.
> (struct chain): New field supporting store elimination.
> (dump_chain): Dump store-stores chain.
> (release_chain): Release resources.
> (split_data_refs_to_components): Compute and create component
> contains only stores for elimination.
> (get_chain_last_ref_at): New function.
> (make_invariant_chain): Initialization.
> (make_rooted_chain): Specify chain type in parameter.
> (add_looparound_copies): Skip for store-stores chain.
> (determine_roots_comp): Compute type of chain and pass it to
> make_rooted_chain.
> (initialize_root_vars_store_elim_2): New function.
> (finalize_eliminated_stores): New function.
> (remove_stmt): Handle store for elimination.
> (execute_pred_commoning_chain): Execute predictive commoning on
> store-store chains.
> (determine_unroll_factor): Skip unroll for store-stores chain.
> (prepare_initializers_chain_store_elim): New function.
> (prepare_initializers_chain): Hanlde store-store chain.
> (prepare_finalizers_chain, prepare_finalizers): New function.
> (tree_predictive_commoning_loop): Return integer value indicating
> if loop is unrolled or lcssa form is corrupted.
> (tree_predictive_commoning): Rewrite for lcssa form if necessary.
>
> gcc/testsuite/ChangeLog
> 2017-07-04 Bin Cheng <bin.cheng@arm.com>
>
> * gcc.dg/tree-ssa/predcom-dse-1.c: New test.
> * gcc.dg/tree-ssa/predcom-dse-2.c: New test.
> * gcc.dg/tree-ssa/predcom-dse-3.c: New test.
> * gcc.dg/tree-ssa/predcom-dse-4.c: New test.
> * gcc.dg/tree-ssa/predcom-dse-5.c: New test.
> * gcc.dg/tree-ssa/predcom-dse-6.c: New test.
> * gcc.dg/tree-ssa/predcom-dse-7.c: New test.
> * gcc.dg/tree-ssa/predcom-dse-8.c: New test.
> * gcc.dg/tree-ssa/predcom-dse-9.c: New test.
> * gcc.dg/tree-ssa/predcom-dse-10.c: New test.
> * gcc.dg/tree-ssa/predcom-dse-11.c: New test.
next prev parent reply other threads:[~2017-07-24 14:27 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-06-27 10:49 Bin Cheng
2017-07-03 9:38 ` Richard Biener
2017-07-03 14:18 ` Bin.Cheng
2017-07-04 11:19 ` Richard Biener
2017-07-04 12:06 ` Bin.Cheng
2017-07-04 12:29 ` Richard Biener
2017-07-10 8:24 ` Bin.Cheng
2017-07-24 14:27 ` Bin.Cheng [this message]
2017-07-25 11:46 ` Richard Biener
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAHFci2_7BsYZ-rXP-a=LMUxOncmO3rbWzxfYTqh9yTYHs0QKrQ@mail.gmail.com' \
--to=amker.cheng@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=richard.guenther@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).