From: Ajit Agarwal <aagarwa1@linux.ibm.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>,
Jeff Law <jeffreyalaw@gmail.com>,
Segher Boessenkool <segher@kernel.crashing.org>,
Peter Bergner <bergner@linux.ibm.com>,
"Kewen.Lin" <linkw@linux.ibm.com>
Subject: Re: [PATCH] tree-optimization: Add register pressure heuristics
Date: Fri, 3 Nov 2023 20:24:54 +0530 [thread overview]
Message-ID: <d12a4582-ad35-46eb-83c6-7558cd1a3707@linux.ibm.com> (raw)
In-Reply-To: <CAFiYyc1s6gxe1sbPb-KLUCuReRmEvF=XhxZTYeBZ_rQRMNiSyg@mail.gmail.com>
Hello Richard:
On 03/11/23 7:06 pm, Richard Biener wrote:
> On Fri, Nov 3, 2023 at 11:20 AM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote:
>>
>> Hello Richard:
>>
>> On 03/11/23 12:51 pm, Richard Biener wrote:
>>> On Thu, Nov 2, 2023 at 9:50 PM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote:
>>>>
>>>> Hello All:
>>>>
> [...]
>>>>
>>>> High register pressure region is the region where there are live-in of
>>>> early blocks that has been modified by the early block. If there are
>>>> modification of the variables in best block that are live-in in early
>>>> block that are live-out of best block.
>>>
>>> ?! Parse error.
>>>
>>
>> I didnt understand what you meant here. Please suggest.
>
> I can't even guess what that paragraph means. It fails at a
> parsing level already, I can't even start to reason about what
> the sentences mean.
Sorry for that I will modify.
>
>>>> Bootstrapped and regtested on powerpc64-linux-gnu.
>>>
>>> What's the effect on code generation?
>>>
>>> Note that live is a quadratic problem while sinking was not. You
>>> are effectively making the pass unfit for -O1.
>>>
>>> You are computing "liveness" on GIMPLE where within EBBs there
>>> isn't really any particular order of stmts, so it's kind of a garbage
>>> heuristic. Likewise you are not computing the effect that moving
>>> a stmt has on liveness as far as I can see but you are just identifying
>>> some odd metrics (I don't really understand them) to rank blocks,
>>> not even taking the register file size into account.
>>
>>
>> if the live out of best_bb <= live out of early_bb, that shows
>> that there are modification in best_bb.
>
> Hm? Do you maybe want to say that if live_out (bb) < live_in (bb)
> then some variables die during the execution of bb?
live_out (bb) < live_in(bb) means in bb there may be KILL (Variables)
and there are more GEN (Variables).
Otherwise,
> if live_out (early) > live_out (best) then somewhere on the path
> from early to best some variables die.
>
If live_out (early) > live_out (best) means there are more GEN (Variables)
between path from early to best.
>> Then it's
>> safer to move statements in best_bb as there are lesser interfering
>> live variables in best_bb.
>
> consider a stmt
>
> a = b + c;
>
> where b and c die at the definition of a. Then moving the stmt
> down from early_bb means you increase live_out (early_bb) by
> one. So why's that "safer" then? Of course live_out (best_bb)
> also increases by two then.
>
If b and c die at the definition of a and generates a live_in(early_bb)
would be live_out(early_bb) - 2 + 1.
the moving the stmt from early_bb down to best_bb increases live_out(early_bb)
by one and live_out (best_bb) depends on the LIVEIN(for all successors of best_bb)
which may be same even if we move down.
There are chances that live_out (best_bb) greater if for all successors of
best_bb there are more GEN ( variables). If live_out (best_bb) is less
means there more KILL (Variables) in successors of best_bb.
With my heuristics live_out (best_bb ) > live_out (early_bb) then we dont do
code motion as there are chances of more interfering live ranges. If liveout(best_bb)
<= liveout (early_bb) then we do code motion as there is there are more KILL(for
all successors of best_bb) and there is less chance of interfering live ranges.
With moving down above stmt from early_bb to best_bb increases live_out(early_bb)
by one but live_out(best_bb) may be remains. If live_out (early_bb) increase by 1
but if it becomes > live_out(best_bb) then we dont do code motion if we have more GEN (Variables) in best_bb otherewise its safer to do
code motion.
for above statement a = b + c dies b and c and generates a in early_bb then
liveout(early_bb) increases by 1. If before moving if liveout (best_bb) is 10
and then liveout (early_bb) becomes > 10 then we dont do code motion otherwise
we do code motion.
>> if there are lesser live out in best_bb, there is lesser chance
>> of interfering live ranges and hence moving statements in best_bb
>> will not increase register pressure.
>>
>> If the liveout of best_bb is greater than live-out of early_bb,
>> moving statements in best_bb will increase chances of more interfering
>> live ranges and hence increase in register pressure.
>>
>> This is how the heuristics is defined.
>
> I don't think they will work. Do you have any numbers?
>
My heuristics will work as mentioned above. I will run the spec benchmarks
and will able to give performance numbers.
Thanks & Regards
Ajit
>>
>>>
>>> You are replacing the hot/cold heuristic.
>>
>>>
>>> IMHO the sinking pass is the totally wrong place to do anything
>>> about register pressure. You are trying to solve a scheduling
>>> problem by just looking at a single stmt.
>>>
>>
>> bb->count from profile.cc are prone to errors as you have
>> mentioned in previous mails. Main bottlenecks with code
>> motion is increase in register pressure as that counts to
>> spills in later phases of the compiler backend.
>>
>> Calculation of best_bb based of immediate dominator should
>> consider register pressure instead of hold cold regions as that
>> would effect code generation.
>>
>> If there is increase in register pressure with code motion and if
>> we are moving into colder regions, that wont improve code generations.
>>
>> Hold/cold should be criteria but not the improved criteria with
>> code motion.
>>
>> We should consider register pressure with code motion than hot/cold
>> regions.
>>
>> Thanks & Regards
>> Ajit
>>
>>> Richard.
>>>
>>>> Thanks & Regards
>>
>>
>>>> Ajit
>>>>
>>>> tree-optimization: Add register pressure heuristics
>>>>
>>>> Currently code sinking heuristics are based on profile data like
>>>> basic block count and sink frequency threshold. We have removed
>>>> such heuristics to add register pressure heuristics based on
>>>> live-in and live-out of early blocks and immediate dominator of
>>>> use blocks.
>>>>
>>>> High register pressure region is the region where there are live-in of
>>>> early blocks that has been modified by the early block. If there are
>>>> modification of the variables in best block that are live-in in early
>>>> block that are live-out of best block.
>>>>
>>>> 2023-11-03 Ajit Kumar Agarwal <aagarwa1@linux.ibm.com>
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> * tree-ssa-sink.cc (statement_sink_location): Add tree_live_info_p
>>>> as paramters.
>>>> (sink_code_in_bb): Ditto.
>>>> (select_best_block): Add register pressure heuristics to select
>>>> the best blocks in the immediate dominator for same loop nest depth.
>>>> (execute): Add live range analysis.
>>>> (additional_var_map): New function.
>>>> * tree-ssa-live.cc (set_var_live_on_entry): Add virtual operand
>>>> tests on ssa_names.
>>>> (verify_live_on_entry): Ditto.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> * gcc.dg/tree-ssa/ssa-sink-21.c: New test.
>>>> * gcc.dg/tree-ssa/ssa-sink-22.c: New test.
>>>> ---
>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 ++++
>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 +++++
>>>> gcc/tree-ssa-live.cc | 11 ++-
>>>> gcc/tree-ssa-sink.cc | 93 ++++++++++++++-------
>>>> 4 files changed, 104 insertions(+), 34 deletions(-)
>>>> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>>>> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
>>>>
>>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>>>> new file mode 100644
>>>> index 00000000000..d3b79ca5803
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>>>> @@ -0,0 +1,15 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
>>>> +void bar();
>>>> +int j;
>>>> +void foo(int a, int b, int c, int d, int e, int f)
>>>> +{
>>>> + int l;
>>>> + l = a + b + c + d +e + f;
>>>> + if (a != 5)
>>>> + {
>>>> + bar();
>>>> + j = l;
>>>> + }
>>>> +}
>>>> +/* { dg-final { scan-tree-dump {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
>>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
>>>> new file mode 100644
>>>> index 00000000000..84e7938c54f
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
>>>> @@ -0,0 +1,19 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
>>>> +void bar();
>>>> +int j, x;
>>>> +void foo(int a, int b, int c, int d, int e, int f)
>>>> +{
>>>> + int l;
>>>> + l = a + b + c + d +e + f;
>>>> + if (a != 5)
>>>> + {
>>>> + bar();
>>>> + if (b != 3)
>>>> + x = 3;
>>>> + else
>>>> + x = 5;
>>>> + j = l;
>>>> + }
>>>> +}
>>>> +/* { dg-final { scan-tree-dump {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
>>>> diff --git a/gcc/tree-ssa-live.cc b/gcc/tree-ssa-live.cc
>>>> index f06daf23035..998fe588278 100644
>>>> --- a/gcc/tree-ssa-live.cc
>>>> +++ b/gcc/tree-ssa-live.cc
>>>> @@ -1141,7 +1141,8 @@ set_var_live_on_entry (tree ssa_name, tree_live_info_p live)
>>>> def_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
>>>>
>>>> /* An undefined local variable does not need to be very alive. */
>>>> - if (ssa_undefined_value_p (ssa_name, false))
>>>> + if (virtual_operand_p (ssa_name)
>>>> + || ssa_undefined_value_p (ssa_name, false))
>>>> return;
>>>>
>>>> /* Visit each use of SSA_NAME and if it isn't in the same block as the def,
>>>> @@ -1540,7 +1541,6 @@ debug (tree_live_info_d *ptr)
>>>>
>>>>
>>>> /* Verify that the info in LIVE matches the current cfg. */
>>>> -
>>>> static void
>>>> verify_live_on_entry (tree_live_info_p live)
>>>> {
>>>> @@ -1569,11 +1569,13 @@ verify_live_on_entry (tree_live_info_p live)
>>>> tree d = NULL_TREE;
>>>> bitmap loe;
>>>> var = partition_to_var (map, i);
>>>> + if (var == NULL_TREE)
>>>> + continue;
>>>> stmt = SSA_NAME_DEF_STMT (var);
>>>> tmp = gimple_bb (stmt);
>>>> +
>>>> if (SSA_NAME_VAR (var))
>>>> d = ssa_default_def (cfun, SSA_NAME_VAR (var));
>>>> -
>>>> loe = live_on_entry (live, e->dest);
>>>> if (loe && bitmap_bit_p (loe, i))
>>>> {
>>>> @@ -1614,7 +1616,8 @@ verify_live_on_entry (tree_live_info_p live)
>>>> {
>>>> /* An undefined local variable does not need to be very
>>>> alive. */
>>>> - if (ssa_undefined_value_p (var, false))
>>>> + if (virtual_operand_p (var)
>>>> + || ssa_undefined_value_p (var, false))
>>>> continue;
>>>>
>>>> /* The only way this var shouldn't be marked live on entry is
>>>> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
>>>> index a360c5cdd6e..d0c9ef1ab86 100644
>>>> --- a/gcc/tree-ssa-sink.cc
>>>> +++ b/gcc/tree-ssa-sink.cc
>>>> @@ -176,6 +176,9 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
>>>> tree, return the best basic block between them (inclusive) to place
>>>> statements.
>>>>
>>>> + The best basic block should be an immediate dominator of
>>>> + best basic block if we've moved to same loop nest.
>>>> +
>>>> We want the most control dependent block in the shallowest loop nest.
>>>>
>>>> If the resulting block is in a shallower loop nest, then use it. Else
>>>> @@ -191,24 +194,23 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
>>>> static basic_block
>>>> select_best_block (basic_block early_bb,
>>>> basic_block late_bb,
>>>> - gimple *stmt)
>>>> + gimple *stmt,
>>>> + tree_live_info_p &live)
>>>> {
>>>> basic_block best_bb = late_bb;
>>>> basic_block temp_bb = late_bb;
>>>> - int threshold;
>>>>
>>>> while (temp_bb != early_bb)
>>>> {
>>>> /* If we've moved into a lower loop nest, then that becomes
>>>> our best block. */
>>>> - if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
>>>> + if (bb_loop_depth (temp_bb) <= bb_loop_depth (best_bb))
>>>> best_bb = temp_bb;
>>>>
>>>> /* Walk up the dominator tree, hopefully we'll find a shallower
>>>> loop nest. */
>>>> temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
>>>> }
>>>> -
>>>> /* Placing a statement before a setjmp-like function would be invalid
>>>> (it cannot be reevaluated when execution follows an abnormal edge).
>>>> If we selected a block with abnormal predecessors, just punt. */
>>>> @@ -233,24 +235,36 @@ select_best_block (basic_block early_bb,
>>>> && !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, best_bb))
>>>> return early_bb;
>>>>
>>>> - /* Get the sinking threshold. If the statement to be moved has memory
>>>> - operands, then increase the threshold by 7% as those are even more
>>>> - profitable to avoid, clamping at 100%. */
>>>> - threshold = param_sink_frequency_threshold;
>>>> - if (gimple_vuse (stmt) || gimple_vdef (stmt))
>>>> - {
>>>> - threshold += 7;
>>>> - if (threshold > 100)
>>>> - threshold = 100;
>>>> - }
>>>> + int best_bb_liveout_cnt
>>>> + = bitmap_count_bits (&live->liveout[best_bb->index]);
>>>> + int early_bb_liveout_cnt
>>>> + = bitmap_count_bits (&live->liveout[early_bb->index]);
>>>> + int early_livein_cnt
>>>> + = bitmap_count_bits (&live->livein[early_bb->index]);
>>>> +
>>>> + /* High register pressure region is the region where there are live-in of
>>>> + early blocks that has been modified by the early block. If there are
>>>> + modification of the variables in best block that are live-in in early
>>>> + block that are live-out of best block. */
>>>> + bool live_in_rgn = (early_livein_cnt != 0
>>>> + && early_bb_liveout_cnt <= early_livein_cnt);
>>>> +
>>>> + bool high_reg_pressure_rgn
>>>> + = ((best_bb_liveout_cnt != 0 || live_in_rgn)
>>>> + && best_bb_liveout_cnt <= early_bb_liveout_cnt);
>>>>
>>>> /* If BEST_BB is at the same nesting level, then require it to have
>>>> - significantly lower execution frequency to avoid gratuitous movement. */
>>>> + significantly high register pressure region to avoid gratuitous
>>>> + movement. */
>>>> if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
>>>> - /* If result of comparsion is unknown, prefer EARLY_BB.
>>>> - Thus use !(...>=..) rather than (...<...) */
>>>> - && !(best_bb->count * 100 >= early_bb->count * threshold))
>>>> - return best_bb;
>>>> + && high_reg_pressure_rgn)
>>>> + {
>>>> + /* Avoid sinking to immediate dominator if the statement to be moved
>>>> + has memory operand and same loop nest. */
>>>> + if (best_bb != late_bb && gimple_vuse (stmt))
>>>> + return late_bb;
>>>> + return best_bb;
>>>> + }
>>>>
>>>> /* No better block found, so return EARLY_BB, which happens to be the
>>>> statement's original block. */
>>>> @@ -265,7 +279,8 @@ select_best_block (basic_block early_bb,
>>>> static bool
>>>> statement_sink_location (gimple *stmt, basic_block frombb,
>>>> gimple_stmt_iterator *togsi, bool *zero_uses_p,
>>>> - virtual_operand_live &vop_live)
>>>> + virtual_operand_live &vop_live,
>>>> + tree_live_info_p &live)
>>>> {
>>>> gimple *use;
>>>> use_operand_p one_use = NULL_USE_OPERAND_P;
>>>> @@ -413,7 +428,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
>>>> if (!dominated_by_p (CDI_DOMINATORS, commondom, frombb))
>>>> return false;
>>>>
>>>> - commondom = select_best_block (frombb, commondom, stmt);
>>>> + commondom = select_best_block (frombb, commondom, stmt, live);
>>>>
>>>> if (commondom == frombb)
>>>> return false;
>>>> @@ -430,19 +445,17 @@ statement_sink_location (gimple *stmt, basic_block frombb,
>>>> continue;
>>>> break;
>>>> }
>>>> +
>>>> use = USE_STMT (one_use);
>>>>
>>>> if (gimple_code (use) != GIMPLE_PHI)
>>>> {
>>>> - sinkbb = select_best_block (frombb, gimple_bb (use), stmt);
>>>> + sinkbb = select_best_block (frombb, gimple_bb (use), stmt, live);
>>>>
>>>> if (sinkbb == frombb)
>>>> return false;
>>>>
>>>> - if (sinkbb == gimple_bb (use))
>>>> - *togsi = gsi_for_stmt (use);
>>>> - else
>>>> - *togsi = gsi_after_labels (sinkbb);
>>>> + *togsi = gsi_after_labels (sinkbb);
>>>>
>>>> return true;
>>>> }
>>>> @@ -454,7 +467,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
>>>> if (!sinkbb)
>>>> return false;
>>>>
>>>> - sinkbb = select_best_block (frombb, sinkbb, stmt);
>>>> + sinkbb = select_best_block (frombb, sinkbb, stmt, live);
>>>> if (!sinkbb || sinkbb == frombb)
>>>> return false;
>>>>
>>>> @@ -643,7 +656,8 @@ sink_common_stores_to_bb (basic_block bb)
>>>> /* Perform code sinking on BB */
>>>>
>>>> static unsigned
>>>> -sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live)
>>>> +sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live,
>>>> + tree_live_info_p &live)
>>>> {
>>>> gimple_stmt_iterator gsi;
>>>> edge_iterator ei;
>>>> @@ -670,7 +684,8 @@ sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live)
>>>> gimple_stmt_iterator togsi;
>>>> bool zero_uses_p;
>>>>
>>>> - if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p, vop_live))
>>>> + if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p,
>>>> + vop_live, live))
>>>> {
>>>> gimple_stmt_iterator saved = gsi;
>>>> if (!gsi_end_p (gsi))
>>>> @@ -814,6 +829,15 @@ private:
>>>> bool unsplit_edges;
>>>> }; // class pass_sink_code
>>>>
>>>> +void additional_var_map (var_map &map)
>>>> +{
>>>> + map->bmp_bbs = BITMAP_ALLOC (NULL);
>>>> + map->outofssa_p = false;
>>>> + basic_block bb;
>>>> + FOR_EACH_BB_FN (bb, cfun)
>>>> + bitmap_set_bit (map->bmp_bbs, bb->index);
>>>> +}
>>>> +
>>>> unsigned int
>>>> pass_sink_code::execute (function *fun)
>>>> {
>>>> @@ -827,12 +851,21 @@ pass_sink_code::execute (function *fun)
>>>> calculate_dominance_info (CDI_DOMINATORS);
>>>>
>>>> virtual_operand_live vop_live;
>>>> + var_map map = init_var_map (num_ssa_names);
>>>> + additional_var_map (map);
>>>> + tree_live_info_p live = calculate_live_ranges (map, true);
>>>>
>>>> int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
>>>> int n = inverted_rev_post_order_compute (fun, rpo);
>>>> +
>>>> for (int i = 0; i < n; ++i)
>>>> - todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live);
>>>> + todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live, live);
>>>> free (rpo);
>>>> + if (live)
>>>> + delete_tree_live_info (live);
>>>> +
>>>> + if (map)
>>>> + delete_var_map (map);
>>>>
>>>> statistics_counter_event (fun, "Sunk statements", sink_stats.sunk);
>>>> statistics_counter_event (fun, "Commoned stores", sink_stats.commoned);
>>>> --
>>>> 2.39.3
>>>>
>>>>
next prev parent reply other threads:[~2023-11-03 14:55 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-02 20:50 Ajit Agarwal
2023-11-03 7:21 ` Richard Biener
2023-11-03 10:20 ` Ajit Agarwal
2023-11-03 13:36 ` Richard Biener
2023-11-03 14:54 ` Ajit Agarwal [this message]
2023-11-04 17:49 ` Ajit Agarwal
2023-11-16 6:12 ` Ajit Agarwal
2023-11-16 7:16 ` Richard Biener
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d12a4582-ad35-46eb-83c6-7558cd1a3707@linux.ibm.com \
--to=aagarwa1@linux.ibm.com \
--cc=bergner@linux.ibm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=jeffreyalaw@gmail.com \
--cc=linkw@linux.ibm.com \
--cc=richard.guenther@gmail.com \
--cc=segher@kernel.crashing.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).