From: Richard Sandiford
To: "Kewen.Lin"
Cc: GCC Patches, Richard Biener, Jeff Law, Vladimir Makarov,
    Alexander Monakov, zhroma@ispras.ru, abel@ispras.ru,
    Segher Boessenkool, Peter Bergner, Michael Meissner
Subject: Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]
References: <85b4098e-a72f-d013-ff17-8097971f71ba@linux.ibm.com>
Date: Wed, 08 Nov 2023 09:45:33 +0000
In-Reply-To: (Kewen Lin's message of "Wed, 8 Nov 2023 10:49:01 +0800")

"Kewen.Lin" writes:
> Hi,
>
> Gentle ping this:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634201.html

Sorry for the lack of review on this. Personally, I've never looked
at this part of code base in detail, so I don't think I can do a proper
review. I'll try to have a look in stage 3 if no one more qualified
beats me to it.

Thanks,
Richard

>
> BR,
> Kewen
>
> on 2023/10/25 10:45, Kewen.Lin wrote:
>> Hi,
>>
>> This is almost a repost for v2 which was posted at [1] in March
>> excepting for:
>>   1) rebased from r14-4810 which is relatively up-to-date,
>>      some conflicts on "int to bool" return type change have
>>      been resolved;
>>   2) adjust commit log a bit;
>>   3) fix misspelled "articial" with "artificial" somewhere;
>>
>> --
>> *v2 comments*:
>>
>> By addressing Alexander's comments, against v1 this
>> patch v2 mainly:
>>
>>   - Rename no_real_insns_p to no_real_nondebug_insns_p;
>>   - Introduce enum rgn_bb_deps_free_action for three
>>     kinds of actions to free deps;
>>   - Change function free_deps_for_bb_no_real_insns_p to
>>     resolve_forw_deps which only focuses on forward deps;
>>   - Extend the handlings to cover dbg-cnt sched_block,
>>     add one test case for it;
>>   - Move free_trg_info call in schedule_region to an
>>     appropriate place.
>>
>> One thing I'm not sure about is the change in function
>> sched_rgn_local_finish, currently the invocation to
>> sched_rgn_local_free is guarded with !sel_sched_p (),
>> so I just follow it, but the initialization of those
>> structures (in sched_rgn_local_init) isn't guarded
>> with !sel_sched_p (), it looks odd.
>>
>> --
>>
>> As PR108273 shows, when there is one block which only has
>> NOTE_P and LABEL_P insns at non-debug mode while has some
>> extra DEBUG_INSN_P insns at debug mode, after scheduling
>> it, the DFA states would be different between debug mode
>> and non-debug mode. Since at non-debug mode, the block
>> meets no_real_insns_p, it gets skipped; while at debug
>> mode, it gets scheduled, even though it only has NOTE_P, LABEL_P
>> and DEBUG_INSN_P, the call of function advance_one_cycle
>> will change the DFA state. PR108519 also shows this issue
>> can be exposed by some scheduler changes.
>>
>> This patch is to change function no_real_insns_p to
>> function no_real_nondebug_insns_p by taking debug insn into
>> account, which makes us not try to schedule for the block
>> having only NOTE_P, LABEL_P and DEBUG_INSN_P insns,
>> resulting in consistent DFA states between non-debug and
>> debug mode.
>>
>> Changing no_real_insns_p to no_real_nondebug_insns_p caused
>> ICE when doing free_block_dependencies, the root cause is
>> that we create dependencies for debug insns, those
>> dependencies are expected to be resolved during scheduling
>> insns, but they get skipped after this change.
>> By checking the code, it looks it's reasonable to skip to
>> compute block dependences for no_real_nondebug_insns_p
>> blocks. There is also another issue, which gets exposed
>> in SPEC2017 bmks build at option -O2 -g, is that we could
>> skip to schedule some block, which already gets dependency
>> graph built so has dependencies computed and rgn_n_insns
>> accumulated, then the later verification on if the graph
>> becomes exhausted by scheduling would fail as follows:
>>
>>   /* Sanity check: verify that all region insns were
>>      scheduled. */
>>   gcc_assert (sched_rgn_n_insns == rgn_n_insns);
>>
>> , and also some forward deps aren't resolved.
>>
>> As Alexander pointed out, the current debug count handling
>> also suffers the similar issue, so this patch handles these
>> two cases together: one is for some block gets skipped by
>> !dbg_cnt (sched_block), the other is for some block which
>> is not no_real_nondebug_insns_p initially but becomes
>> no_real_nondebug_insns_p due to speculative scheduling.
>>
>> This patch can be bootstrapped and regress-tested on
>> x86_64-redhat-linux, aarch64-linux-gnu and
>> powerpc64{,le}-linux-gnu.
>>
>> I also verified this patch can pass SPEC2017 both intrate
>> and fprate bmks building at -g -O2/-O3.
>>
>> Any thoughts? Is it ok for trunk?
>>
>> [1] v2: https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614818.html
>> [2] v1: https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614224.html
>>
>> BR,
>> Kewen
>> -----
>> 	PR rtl-optimization/108273
>>
>> gcc/ChangeLog:
>>
>> 	* haifa-sched.cc (no_real_insns_p): Rename to ...
>> 	(no_real_nondebug_insns_p): ... this, and consider DEBUG_INSN_P insn.
>> 	* sched-ebb.cc (schedule_ebb): Replace no_real_insns_p with
>> 	no_real_nondebug_insns_p.
>> 	* sched-int.h (no_real_insns_p): Rename to ...
>> 	(no_real_nondebug_insns_p): ... this.
>> 	* sched-rgn.cc (enum rgn_bb_deps_free_action): New enum.
>> 	(bb_deps_free_actions): New static variable.
>> 	(compute_block_dependences): Skip for no_real_nondebug_insns_p.
>> 	(resolve_forw_deps): New function.
>> 	(free_block_dependencies): Check bb_deps_free_actions and call
>> 	function resolve_forw_deps for RGN_BB_DEPS_FREE_ARTIFICIAL.
>> 	(compute_priorities): Replace no_real_insns_p with
>> 	no_real_nondebug_insns_p.
>> 	(schedule_region): Replace no_real_insns_p with
>> 	no_real_nondebug_insns_p, set RGN_BB_DEPS_FREE_ARTIFICIAL if the block
>> 	get dependencies computed before but skipped now, fix up count
>> 	sched_rgn_n_insns for it too. Call free_trg_info when the block
>> 	gets scheduled, and move sched_rgn_local_finish after the loop
>> 	of free_block_dependencies loop.
>> 	(sched_rgn_local_init): Allocate and compute bb_deps_free_actions.
>> 	(sched_rgn_local_finish): Free bb_deps_free_actions.
>> 	* sel-sched.cc (sel_region_target_finish): Replace no_real_insns_p with
>> 	no_real_nondebug_insns_p.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	* gcc.target/powerpc/pr108273.c: New test.
>> ---
>>  gcc/haifa-sched.cc                          |   9 +-
>>  gcc/sched-ebb.cc                            |   2 +-
>>  gcc/sched-int.h                             |   2 +-
>>  gcc/sched-rgn.cc                            | 148 +++++++++++++++-----
>>  gcc/sel-sched.cc                            |   3 +-
>>  gcc/testsuite/gcc.target/powerpc/pr108273.c |  26 ++++
>>  6 files changed, 150 insertions(+), 40 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108273.c
>>
>> diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
>> index 8e8add709b3..30cc90ec49f 100644
>> --- a/gcc/haifa-sched.cc
>> +++ b/gcc/haifa-sched.cc
>> @@ -5033,14 +5033,17 @@ get_ebb_head_tail (basic_block beg, basic_block end,
>>    *tailp = end_tail;
>>  }
>>
>> -/* Return true if there are no real insns in the range [ HEAD, TAIL ]. */
>> +/* Return true if there are no real nondebug insns in the range
>> +   [ HEAD, TAIL ]. */
>>
>>  bool
>> -no_real_insns_p (const rtx_insn *head, const rtx_insn *tail)
>> +no_real_nondebug_insns_p (const rtx_insn *head, const rtx_insn *tail)
>>  {
>>    while (head != NEXT_INSN (tail))
>>      {
>> -      if (!NOTE_P (head) && !LABEL_P (head))
>> +      if (!NOTE_P (head)
>> +          && !LABEL_P (head)
>> +          && !DEBUG_INSN_P (head))
>>          return false;
>>        head = NEXT_INSN (head);
>>      }
>> diff --git a/gcc/sched-ebb.cc b/gcc/sched-ebb.cc
>> index 110fcdbca4d..03d96290a7c 100644
>> --- a/gcc/sched-ebb.cc
>> +++ b/gcc/sched-ebb.cc
>> @@ -491,7 +491,7 @@ schedule_ebb (rtx_insn *head, rtx_insn *tail, bool modulo_scheduling)
>>    first_bb = BLOCK_FOR_INSN (head);
>>    last_bb = BLOCK_FOR_INSN (tail);
>>
>> -  if (no_real_insns_p (head, tail))
>> +  if (no_real_nondebug_insns_p (head, tail))
>>      return BLOCK_FOR_INSN (tail);
>>
>>    gcc_assert (INSN_P (head) && INSN_P (tail));
>> diff --git a/gcc/sched-int.h b/gcc/sched-int.h
>> index 64a2f0bcff9..adca494ade5 100644
>> --- a/gcc/sched-int.h
>> +++ b/gcc/sched-int.h
>> @@ -1397,7 +1397,7 @@ extern void free_global_sched_pressure_data (void);
>>  extern int haifa_classify_insn (const_rtx);
>>  extern void get_ebb_head_tail (basic_block, basic_block,
>>                                 rtx_insn **, rtx_insn **);
>> -extern bool no_real_insns_p (const rtx_insn *, const rtx_insn *);
>> +extern bool no_real_nondebug_insns_p (const rtx_insn *, const rtx_insn *);
>>
>>  extern int insn_sched_cost (rtx_insn *);
>>  extern int dep_cost_1 (dep_t, dw_t);
>> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
>> index e5964f54ead..2549e834aa8 100644
>> --- a/gcc/sched-rgn.cc
>> +++ b/gcc/sched-rgn.cc
>> @@ -213,6 +213,22 @@ static int rgn_nr_edges;
>>  /* Array of size rgn_nr_edges. */
>>  static edge *rgn_edges;
>>
>> +/* Possible actions for dependencies freeing. */
>> +enum rgn_bb_deps_free_action
>> +{
>> +  /* This block doesn't get dependencies computed so don't need to free. */
>> +  RGN_BB_DEPS_FREE_NO,
>> +  /* This block gets scheduled normally so free dependencies as usual. */
>> +  RGN_BB_DEPS_FREE_NORMAL,
>> +  /* This block gets skipped in scheduling but has dependencies computed early,
>> +     need to free the forward list artificially. */
>> +  RGN_BB_DEPS_FREE_ARTIFICIAL
>> +};
>> +
>> +/* For basic block i, bb_deps_free_actions[i] indicates which action needs
>> +   to be taken for freeing its dependencies. */
>> +static enum rgn_bb_deps_free_action *bb_deps_free_actions;
>> +
>>  /* Mapping from each edge in the graph to its number in the rgn. */
>>  #define EDGE_TO_BIT(edge) ((int)(size_t)(edge)->aux)
>>  #define SET_EDGE_TO_BIT(edge,nr) ((edge)->aux = (void *)(size_t)(nr))
>> @@ -2735,6 +2751,15 @@ compute_block_dependences (int bb)
>>    gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
>>    get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
>>
>> +  /* Don't compute block dependences if there are no real nondebug insns. */
>> +  if (no_real_nondebug_insns_p (head, tail))
>> +    {
>> +      if (current_nr_blocks > 1)
>> +        propagate_deps (bb, &tmp_deps);
>> +      free_deps (&tmp_deps);
>> +      return;
>> +    }
>> +
>>    sched_analyze (&tmp_deps, head, tail);
>>
>>    add_branch_dependences (head, tail);
>> @@ -2749,6 +2774,24 @@ compute_block_dependences (int bb)
>>      targetm.sched.dependencies_evaluation_hook (head, tail);
>>  }
>>
>> +/* Artificially resolve forward dependencies for instructions HEAD to TAIL. */
>> +
>> +static void
>> +resolve_forw_deps (rtx_insn *head, rtx_insn *tail)
>> +{
>> +  rtx_insn *insn;
>> +  rtx_insn *next_tail = NEXT_INSN (tail);
>> +  sd_iterator_def sd_it;
>> +  dep_t dep;
>> +
>> +  /* There could be some insns which get skipped in scheduling but we compute
>> +     dependencies for them previously, so make them resolved. */
>> +  for (insn = head; insn != next_tail; insn = NEXT_INSN (insn))
>> +    for (sd_it = sd_iterator_start (insn, SD_LIST_FORW);
>> +         sd_iterator_cond (&sd_it, &dep);)
>> +      sd_resolve_dep (sd_it);
>> +}
>> +
>>  /* Free dependencies of instructions inside BB. */
>>  static void
>>  free_block_dependencies (int bb)
>> @@ -2758,9 +2801,12 @@ free_block_dependencies (int bb)
>>
>>    get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
>>
>> -  if (no_real_insns_p (head, tail))
>> +  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_NO)
>>      return;
>>
>> +  if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_ARTIFICIAL)
>> +    resolve_forw_deps (head, tail);
>> +
>>    sched_free_deps (head, tail, true);
>>  }
>>
>> @@ -3024,7 +3070,7 @@ compute_priorities (void)
>>        gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb));
>>        get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
>>
>> -      if (no_real_insns_p (head, tail))
>> +      if (no_real_nondebug_insns_p (head, tail))
>>          continue;
>>
>>        rgn_n_insns += set_priorities (head, tail);
>> @@ -3158,7 +3204,7 @@ schedule_region (int rgn)
>>
>>            get_ebb_head_tail (first_bb, last_bb, &head, &tail);
>>
>> -          if (no_real_insns_p (head, tail))
>> +          if (no_real_nondebug_insns_p (head, tail))
>>              {
>>                gcc_assert (first_bb == last_bb);
>>                continue;
>> @@ -3178,44 +3224,62 @@ schedule_region (int rgn)
>>
>>        get_ebb_head_tail (first_bb, last_bb, &head, &tail);
>>
>> -      if (no_real_insns_p (head, tail))
>> +      if (no_real_nondebug_insns_p (head, tail))
>>          {
>>            gcc_assert (first_bb == last_bb);
>>            save_state_for_fallthru_edge (last_bb, bb_state[first_bb->index]);
>> -          continue;
>> +
>> +          if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_NO)
>> +            continue;
>> +
>> +          /* As it's not no_real_nondebug_insns_p initially, then it has some
>> +             dependencies computed so free it artificially. */
>> +          bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_ARTIFICIAL;
>>          }
>> +      else
>> +        {
>> +          current_sched_info->prev_head = PREV_INSN (head);
>> +          current_sched_info->next_tail = NEXT_INSN (tail);
>>
>> -      current_sched_info->prev_head = PREV_INSN (head);
>> -      current_sched_info->next_tail = NEXT_INSN (tail);
>> +          remove_notes (head, tail);
>>
>> -      remove_notes (head, tail);
>> +          unlink_bb_notes (first_bb, last_bb);
>>
>> -      unlink_bb_notes (first_bb, last_bb);
>> +          target_bb = bb;
>>
>> -      target_bb = bb;
>> +          gcc_assert (flag_schedule_interblock || current_nr_blocks == 1);
>> +          current_sched_info->queue_must_finish_empty = current_nr_blocks == 1;
>>
>> -      gcc_assert (flag_schedule_interblock || current_nr_blocks == 1);
>> -      current_sched_info->queue_must_finish_empty = current_nr_blocks == 1;
>> +          curr_bb = first_bb;
>> +          if (dbg_cnt (sched_block))
>> +            {
>> +              int saved_last_basic_block = last_basic_block_for_fn (cfun);
>>
>> -      curr_bb = first_bb;
>> -      if (dbg_cnt (sched_block))
>> -        {
>> -          int saved_last_basic_block = last_basic_block_for_fn (cfun);
>> +              schedule_block (&curr_bb, bb_state[first_bb->index]);
>> +              gcc_assert (EBB_FIRST_BB (bb) == first_bb);
>> +              sched_rgn_n_insns += sched_n_insns;
>> +              realloc_bb_state_array (saved_last_basic_block);
>> +              save_state_for_fallthru_edge (last_bb, curr_state);
>>
>> -          schedule_block (&curr_bb, bb_state[first_bb->index]);
>> -          gcc_assert (EBB_FIRST_BB (bb) == first_bb);
>> -          sched_rgn_n_insns += sched_n_insns;
>> -          realloc_bb_state_array (saved_last_basic_block);
>> -          save_state_for_fallthru_edge (last_bb, curr_state);
>> -        }
>> -      else
>> -        {
>> -          sched_rgn_n_insns += rgn_n_insns;
>> -        }
>> +              /* Clean up. */
>> +              if (current_nr_blocks > 1)
>> +                free_trg_info ();
>> +            }
>> +          else
>> +            bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_ARTIFICIAL;
>> +        }
>>
>> -      /* Clean up. */
>> -      if (current_nr_blocks > 1)
>> -        free_trg_info ();
>> +      /* We have counted this block when computing rgn_n_insns
>> +         previously, so need to fix up sched_rgn_n_insns now. */
>> +      if (bb_deps_free_actions[bb] == RGN_BB_DEPS_FREE_ARTIFICIAL)
>> +        {
>> +          while (head != NEXT_INSN (tail))
>> +            {
>> +              if (INSN_P (head))
>> +                sched_rgn_n_insns++;
>> +              head = NEXT_INSN (head);
>> +            }
>> +        }
>>      }
>>
>>    /* Sanity check: verify that all region insns were scheduled. */
>> @@ -3223,13 +3287,13 @@ schedule_region (int rgn)
>>
>>    sched_finish_ready_list ();
>>
>> -  /* Done with this region. */
>> -  sched_rgn_local_finish ();
>> -
>>    /* Free dependencies. */
>>    for (bb = 0; bb < current_nr_blocks; ++bb)
>>      free_block_dependencies (bb);
>>
>> +  /* Done with this region. */
>> +  sched_rgn_local_finish ();
>> +
>>    gcc_assert (haifa_recovery_bb_ever_added_p
>>                || deps_pools_are_empty_p ());
>>  }
>> @@ -3450,6 +3514,19 @@ sched_rgn_local_init (int rgn)
>>              e->aux = NULL;
>>          }
>>      }
>> +
>> +  /* Initialize bb_deps_free_actions. */
>> +  bb_deps_free_actions
>> +    = XNEWVEC (enum rgn_bb_deps_free_action, current_nr_blocks);
>> +  for (bb = 0; bb < current_nr_blocks; bb++)
>> +    {
>> +      rtx_insn *head, *tail;
>> +      get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
>> +      if (no_real_nondebug_insns_p (head, tail))
>> +        bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_NO;
>> +      else
>> +        bb_deps_free_actions[bb] = RGN_BB_DEPS_FREE_NORMAL;
>> +    }
>>  }
>>
>>  /* Free data computed for the finished region. */
>> @@ -3467,9 +3544,12 @@ sched_rgn_local_free (void)
>>  void
>>  sched_rgn_local_finish (void)
>>  {
>> -  if (current_nr_blocks > 1 && !sel_sched_p ())
>> +  if (!sel_sched_p ())
>>      {
>> -      sched_rgn_local_free ();
>> +      if (current_nr_blocks > 1)
>> +        sched_rgn_local_free ();
>> +
>> +      free (bb_deps_free_actions);
>>      }
>>  }
>>
>> diff --git a/gcc/sel-sched.cc b/gcc/sel-sched.cc
>> index 1925f4a9461..8310c892e13 100644
>> --- a/gcc/sel-sched.cc
>> +++ b/gcc/sel-sched.cc
>> @@ -7213,7 +7213,8 @@ sel_region_target_finish (bool reset_sched_cycles_p)
>>
>>        find_ebb_boundaries (EBB_FIRST_BB (i), scheduled_blocks);
>>
>> -      if (no_real_insns_p (current_sched_info->head, current_sched_info->tail))
>> +      if (no_real_nondebug_insns_p (current_sched_info->head,
>> +                                    current_sched_info->tail))
>>          continue;
>>
>>        if (reset_sched_cycles_p)
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108273.c b/gcc/testsuite/gcc.target/powerpc/pr108273.c
>> new file mode 100644
>> index 00000000000..937224eaa69
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr108273.c
>> @@ -0,0 +1,26 @@
>> +/* { dg-options "-O2 -fdbg-cnt=sched_block:1" } */
>> +/* { dg-prune-output {\*\*\*dbgcnt:.*limit.*reached} } */
>> +
>> +/* Verify there is no ICE. */
>> +
>> +int a, b, c, e, f;
>> +float d;
>> +
>> +void
>> +g ()
>> +{
>> +  float h, i[1];
>> +  for (; f;)
>> +    if (c)
>> +      {
>> +        d *e;
>> +        if (b)
>> +          {
>> +            float *j = i;
>> +            j[0] += 0;
>> +          }
>> +        h += d;
>> +      }
>> +  if (h)
>> +    a = i[0];
>> +}
>> --
>> 2.39.1
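
For readers without a GCC tree at hand, the predicate change in haifa-sched.cc can be modelled outside the compiler. The following is only a minimal sketch and not GCC code: `toy_insn`, `toy_kind` and every `toy_*` function are hypothetical stand-ins for rtx_insn, for the NOTE_P / LABEL_P / DEBUG_INSN_P tests and for the old and new predicates. It is meant to show that a block holding just a label, a note and a debug insn counts as skippable under the new predicate but not under the old one, which is the debug versus non-debug divergence the cover letter describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's insn classes; not the real rtx codes. */
enum toy_kind { TOY_NOTE, TOY_LABEL, TOY_DEBUG, TOY_REAL };

struct toy_insn
{
  enum toy_kind kind;
  struct toy_insn *next;
};

/* Models the old no_real_insns_p: only notes and labels are ignored. */
static bool
toy_no_real_insns_p (const struct toy_insn *head, const struct toy_insn *tail)
{
  for (; head != tail->next; head = head->next)
    if (head->kind != TOY_NOTE && head->kind != TOY_LABEL)
      return false;
  return true;
}

/* Models the new no_real_nondebug_insns_p: debug insns are ignored too. */
static bool
toy_no_real_nondebug_insns_p (const struct toy_insn *head,
                              const struct toy_insn *tail)
{
  for (; head != tail->next; head = head->next)
    if (head->kind != TOY_NOTE
        && head->kind != TOY_LABEL
        && head->kind != TOY_DEBUG)
      return false;
  return true;
}

/* Build the block "label, note" (plus a trailing debug insn when WITH_DEBUG)
   and report whether the chosen predicate would let the scheduler skip it. */
static bool
toy_block_skipped_p (bool with_debug, bool use_new_predicate)
{
  struct toy_insn debug = { TOY_DEBUG, NULL };
  struct toy_insn note = { TOY_NOTE, with_debug ? &debug : NULL };
  struct toy_insn label = { TOY_LABEL, &note };
  const struct toy_insn *tail = with_debug ? &debug : &note;

  return use_new_predicate
         ? toy_no_real_nondebug_insns_p (&label, tail)
         : toy_no_real_insns_p (&label, tail);
}

/* Old predicate: debug and non-debug builds disagree.
   New predicate: both builds skip the block. */
static void
toy_self_check (void)
{
  assert (toy_block_skipped_p (false, false));
  assert (!toy_block_skipped_p (true, false));
  assert (toy_block_skipped_p (false, true));
  assert (toy_block_skipped_p (true, true));
}
```

Calling `toy_self_check ()` from a small driver exercises all four combinations. The model deliberately leaves out the dependency bookkeeping (bb_deps_free_actions, resolve_forw_deps) that the real patch needs once blocks start being skipped.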