From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id C8F8A3858D20 for ; Mon, 20 Mar 2023 09:49:11 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C8F8A3858D20 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com Received: from pps.filterd (m0098409.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 32K7vCsY016782; Mon, 20 Mar 2023 09:49:08 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=message-id : date : mime-version : subject : to : cc : references : from : in-reply-to : content-type : content-transfer-encoding; s=pp1; bh=ZREPHKZuSbfYpf+ilXAo3TW1m82VXyibCOsnnldi2uY=; b=Uw0Zd48O+agLm5q32OPx4h3XhBN6U8034yiEp/ycUdde9AXwsyhwFRa8WlkV+dko0aXh Axz/40PTRsZ67LKx5zcuJV8UIRYq/ynv7HiMEMo2cwC/eTXC0Gj5S1wqyQZUAzh++crZ pzEWWcXnDIKI+5sykbOFbrP53KT9GEndEbzdtZpe8zC5RSZVdTp5g4EZoIeFFYCRaK00 AJPkPx9B+UlDr6H7FKLnTslKDFlSUCMc87sJD02Fu4u/xZ6WDJF4ERf1hQsYvSbi/UqS 132XKf9hTcDIAayDOCN0qjB0VIf1tTaVXF11cr6QV0ZT1eDu4XZf3gEIvVXG00L7kIYq ZA== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3pdq3tkx0d-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 20 Mar 2023 09:49:08 +0000 Received: from m0098409.ppops.net (m0098409.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 32K9SPS7010121; Mon, 20 Mar 2023 09:49:07 GMT Received: from ppma06fra.de.ibm.com (48.49.7a9f.ip4.static.sl-reverse.com [159.122.73.72]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3pdq3tkwye-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 20 Mar 2023 09:49:07 +0000 Received: from pps.filterd (ppma06fra.de.ibm.com [127.0.0.1]) by ppma06fra.de.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 32JGA0er018062; Mon, 20 Mar 2023 09:49:05 GMT Received: from smtprelay03.fra02v.mail.ibm.com ([9.218.2.224]) by ppma06fra.de.ibm.com (PPS) with ESMTPS id 3pd4jfahqx-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 20 Mar 2023 09:49:05 +0000 Received: from smtpav06.fra02v.mail.ibm.com (smtpav06.fra02v.mail.ibm.com [10.20.54.105]) by smtprelay03.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 32K9n2J040501646 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 20 Mar 2023 09:49:02 GMT Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id B2D5D20049; Mon, 20 Mar 2023 09:49:02 +0000 (GMT) Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 9C8D120043; Mon, 20 Mar 2023 09:48:59 +0000 (GMT) Received: from [9.177.11.227] (unknown [9.177.11.227]) by smtpav06.fra02v.mail.ibm.com (Postfix) with ESMTP; Mon, 20 Mar 2023 09:48:59 +0000 (GMT) Message-ID: <4bb281df-8c7b-94b5-2586-56e5a6add564@linux.ibm.com> Date: Mon, 20 Mar 2023 17:48:57 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.6.1 Subject: Re: [RFC/PATCH] sched: Consider debug insn in no_real_insns_p [PR108273] Content-Language: en-US To: Alexander Monakov Cc: GCC Patches , Segher Boessenkool , Peter Bergner , Richard Biener , Jeff Law , Vladimir Makarov , Richard Sandiford References: <928b5bd5-387c-5400-6863-0c045fd22aef@linux.ibm.com> From: "Kewen.Lin" In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: flRvB05Cpu76sngGCkZbQwP856WoZq4I X-Proofpoint-GUID: o1kDZUvM142hfvtrhn4tmA5sIvbshVZV X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-03-20_06,2023-03-16_02,2023-02-09_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 adultscore=0 lowpriorityscore=0 mlxscore=0 malwarescore=0 spamscore=0 priorityscore=1501 bulkscore=0 impostorscore=0 phishscore=0 suspectscore=0 mlxlogscore=999 clxscore=1015 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2303150002 definitions=main-2303200082 X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,GIT_PATCH_0,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Alexander, Thanks a lot for your comments and suggestions! on 2023/3/20 16:03, Alexander Monakov wrote: > > On Mon, 20 Mar 2023, Kewen.Lin wrote: > >> Hi, > > Hi. Thank you for the thorough analysis. Since I analyzed > PR108519, I'd like to offer my comments. > >> As PR108273 shows, when there is one block which only has >> NOTE_P and LABEL_P insns at non-debug mode while has some >> extra DEBUG_INSN_P insns at debug mode, after scheduling >> it, the DFA states would be different between debug mode >> and non-debug mode. Since at non-debug mode, the block >> meets no_real_insns_p, it gets skipped; while at debug >> mode, it gets scheduled, even it only has NOTE_P, LABEL_P >> and DEBUG_INSN_P, the call of function advance_one_cycle >> will change the DFA state. PR108519 also shows this issue >> issue can be exposed by some scheduler changes. > > (yes, so an alternative is to avoid extraneous advance_one_cycle > calls, but I think adjusting no_real_insns_p is preferable) > Yeah, it looks no value to enter schedule_block for this kind of no_real_nondebug_insns_p block. >> This patch is to take debug insn into account in function >> no_real_insns_p, which make us not try to schedule for the >> block having only NOTE_P, LABEL_P and DEBUG_INSN_P insns, >> resulting in consistent DFA states between non-debug and >> debug mode. Changing no_real_insns_p caused ICE when doing >> free_block_dependencies, the root cause is that we create >> dependencies for debug insns, those dependencies are >> expected to be resolved during scheduling insns which gets >> skipped after the change in no_real_insns_p. By checking >> the code, it looks it's reasonable to skip to compute block >> dependencies for no_real_insns_p blocks. It can be >> bootstrapped and regtested but it hit one ICE when built >> SPEC2017 bmks at option -O2 -g. The root cause is that >> initially there are no no_real_insns_p blocks in a region, >> but in the later scheduling one block has one insn scheduled >> speculatively then becomes no_real_insns_p, so we compute >> dependencies and rgn_n_insns for this special block before >> scheduling, later it gets skipped so not scheduled, the >> following counts would mismatch: >> >> /* Sanity check: verify that all region insns were scheduled. */ >> gcc_assert (sched_rgn_n_insns == rgn_n_insns); >> >> , and we miss to release the allocated dependencies. > > Hm, but it is quite normal for BBs to become empty via speculative > scheduling in non-debug mode as well. So I don't think it's the > right way to frame the problem. > > I think the main issue here is that debug_insns are "not real insns" > except we add them together with normal insns in the dependency graph, > and then we verify that the graph was exhausted by the scheduler. Thanks for your better description and explanation! Yeah, it creates dependencies and makes statistics for debug insns like the other normal insns. > > We already handle a situation when dbg_cnt is telling the scheduler > to skip blocks. I guess the dbg_cnt handling is broken for a similar > reason? Yes, the dbg_cnt handlings looks wrong, I didn't dig into the history but I guessed the original meaning (usage) of rgn_n_insns or sched_rgn_n_insns have changed over time. > > Can we fix this issue together with the debug_cnt issue by adjusting > dbg_cnt handling in schedule_region, i.e. if no_real_insns_p || !dbg_cnt > then adjust sched_rgn_n_insns and manually resolve+free dependencies? Good idea! I'll expand this to handle dbg_cnt issue. One thing I'm not sure about is that if it's a good idea to skip dependencies computing for this kind of no_real_nondebug_insns_p block. The pro is that it's more efficient by skipping, while the con is that it needs an extra bitmap and some checks. > >> To avoid the unexpected mis-matchings, this patch adds one >> bitmap to track this kind of special block which isn't >> no_real_insns_p but becomes no_real_insns_p later, then we >> can adjust the count and free deps for it. > > Per above, I hope a simpler solution is possible. > > (some comments on the patch below) > >> This patch can be bootstrapped and regress-tested on >> x86_64-redhat-linux, aarch64-linux-gnu and >> powerpc64{,le}-linux-gnu. >> >> I also verified this patch can pass SPEC2017 both intrate >> and fprate bmks building at -g -O2/-O3. >> >> This is for next stage 1, but since I know little on the >> scheduler, I'd like to post it early for more comments. >> >> Is it on the right track? Any thoughts? >> >> BR, >> Kewen >> ----- >> PR rtl-optimization/108273 >> >> gcc/ChangeLog: >> >> * haifa-sched.cc (no_real_insns_p): Consider DEBUG_INSN_P insn. >> * sched-rgn.cc (no_real_insns): New static bitmap variable. >> (compute_block_dependences): Skip for no_real_insns_p. >> (free_deps_for_bb_no_real_insns_p): New function. >> (free_block_dependencies): Call free_deps_for_bb_no_real_insns_p for >> no_real_insns_p bb. >> (schedule_region): Fix up sched_rgn_n_insns for some block for which >> rgn_n_insns is computed before, and move sched_rgn_local_finish after >> free_block_dependencies loop. >> (sched_rgn_local_init): Allocate and compute no_real_insns. >> (sched_rgn_local_free): Free no_real_insns. >> --- >> gcc/haifa-sched.cc | 8 ++++- >> gcc/sched-rgn.cc | 84 +++++++++++++++++++++++++++++++++++++++++++--- >> 2 files changed, 87 insertions(+), 5 deletions(-) >> >> diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc >> index 48b53776fa9..378f3b34cc0 100644 >> --- a/gcc/haifa-sched.cc >> +++ b/gcc/haifa-sched.cc >> @@ -5040,7 +5040,13 @@ no_real_insns_p (const rtx_insn *head, const rtx_insn *tail) >> { >> while (head != NEXT_INSN (tail)) >> { >> - if (!NOTE_P (head) && !LABEL_P (head)) >> + /* Take debug insn into account here, otherwise we can have different >> + DFA states after scheduling a block which only has NOTE_P, LABEL_P >> + and DEBUG_P (debug mode) insns between non-debug and debug modes, >> + it could cause -fcompare-debug failure. */ > > Sorry, I don't think this comment is appropriate. I'd suggest to rename the > function to no_real_nondebug_insns_p (so when you adjust all callers it > becomes apparent what's affected), and then no comment will be necessary. Good idea, will update, thanks! > >> + if (!NOTE_P (head) >> + && !LABEL_P (head) >> + && !DEBUG_INSN_P (head)) >> return 0; >> head = NEXT_INSN (head); >> } >> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc >> index f2751f62450..211b62e2b4a 100644 >> --- a/gcc/sched-rgn.cc >> +++ b/gcc/sched-rgn.cc >> @@ -213,6 +213,11 @@ static int rgn_nr_edges; >> /* Array of size rgn_nr_edges. */ >> static edge *rgn_edges; >> >> +/* For basic block i, the corresponding set bit i in bitmap indicates this basic >> + block meets predicate no_real_insns_p before scheduling any basic blocks in >> + the region. */ >> +static bitmap no_real_insns; >> + >> /* Mapping from each edge in the graph to its number in the rgn. */ >> #define EDGE_TO_BIT(edge) ((int)(size_t)(edge)->aux) >> #define SET_EDGE_TO_BIT(edge,nr) ((edge)->aux = (void *)(size_t)(nr)) >> @@ -2730,6 +2735,15 @@ compute_block_dependences (int bb) >> gcc_assert (EBB_FIRST_BB (bb) == EBB_LAST_BB (bb)); >> get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail); >> >> + /* Don't compute block dependencies if there are no real insns. */ >> + if (no_real_insns_p (head, tail)) >> + { >> + if (current_nr_blocks > 1) >> + propagate_deps (bb, &tmp_deps); >> + free_deps (&tmp_deps); >> + return; >> + } >> + >> sched_analyze (&tmp_deps, head, tail); >> >> add_branch_dependences (head, tail); >> @@ -2744,6 +2758,42 @@ compute_block_dependences (int bb) >> targetm.sched.dependencies_evaluation_hook (head, tail); >> } >> >> +/* The basic block without any real insns (no_real_insns_p) would be >> + skipped in compute_block_dependences, so for most no_real_insns_p >> + basic block, we don't need to free dependencies for them. But >> + sometimes some basic block which isn't no_real_insns_p before >> + scheduling can become no_real_insns_p with speculative scheduling, >> + so we still need to free dependencies computed early for it. So >> + this function is to free dependencies if need. */ > > Please try to keep comments more concise. The following might have been > more appropriate: > > /* Artificially resolve and free dependencies for instructions HEAD to TAIL. */ > > and the explanation why we are doing that would go to the caller. Will update, thanks! > >> + >> +static void >> +free_deps_for_bb_no_real_insns_p (rtx_insn *head, rtx_insn *tail, int bb) >> +{ >> + gcc_assert (no_real_insns_p (head, tail)); >> + >> + /* Don't bother if there is only one block. */ >> + if (current_nr_blocks == 1) >> + return; > > Sorry, can you explain this check? Since bitmap no_real_insns only gets allocated when current_nr_blocks > 1, this is to avoid an access to an un-allocated bitmap. > >> + >> + /* We don't compute dependencies before. */ >> + if (bitmap_bit_p (no_real_insns, bb)) >> + return; > > This looks more appropriate to check in the caller. Actually in the initial draft I checked this in the call site, but later I thought maybe it's to make the call site concise by putting all the checks here. Will adjust it back. :) Thanks again! BR, Kewen