From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jin Ma
To: gcc-patches@gcc.gnu.org
Cc: jeffreyalaw@gmail.com, palmer@dabbelt.com, richard.sandiford@arm.com,
 kito.cheng@gmail.com, philipp.tomsich@vrull.eu, christoph.muellner@vrull.eu,
 rdapp.gcc@gmail.com, juzhe.zhong@rivai.ai, vineetg@rivosinc.com,
 jinma.contrib@gmail.com, Jin Ma
Subject: [PATCH v2] In the pipeline, USE or CLOBBER should delay execution if it starts a new live range.
Date: Mon, 14 Aug 2023 19:22:55 +0800
Message-Id: <20230814112255.2071-1-jinma@linux.alibaba.com>

CLOBBER and USE do not represent real instructions, but during scheduling
they wait in the ready list to be issued like any other insn, without regard
to resource conflicts or cycles. On a multi-issue CPU, this means they can be
issued at any time when regular insns have resource conflicts or cannot be
issued for other reasons. As a result, they move earlier in the generated
insn sequence, which affects register allocation and often leads to
redundant mov instructions.

A simple example:
https://github.com/majin2020/gcc-test/blob/master/test.c
This is a function in the dhrystone benchmark.

https://github.com/majin2020/gcc-test/blob/0b08c1a13de9663d7d9aba7539b960ec0607ca24/test.c.299r.sched1
This is the log of the 'sched1' pass with -mtune=rocket but issue_rate == 2.

The pipeline is:
;;      | insn | prio |
;;      |  17  |  3   | r142=a0         alu
;;      |  14  |  0   | clobber r136    nothing
;;      |  13  |  0   | clobber a0      nothing
;;      |  18  |  2   | r143=a1         alu
...
;;      |  12  |  0   | a0=r136         alu
;;      |  15  |  0   | use a0          nothing

In this log, insns 13 and 14 are scheduled far earlier than necessary, which
risks generating redundant mov instructions. I am therefore resubmitting the
patch, revised according to the previous review comments, to try to solve
this problem.

https://github.com/majin2020/gcc-test/commit/efcb43e3369e771bde702955048bfe3f501263dd#diff-805031b1be5092a2322852a248d0b0f92eef7cad5784a8209f4dfc6221407457L189
This is the diff of the sched1 log after the patch is applied.

The new pipeline is:
;;      | insn | prio |
;;      |  17  |  3   | r142=a0         alu
...
;;      |  10  |  0   | [r144]=r141     alu
;;      |  13  |  0   | clobber a0      nothing
;;      |  14  |  0   | clobber r136    nothing
;;      |  12  |  0   | a0=r136         alu
;;      |  15  |  0   | use a0          nothing

gcc/ChangeLog:

	* haifa-sched.cc (use_or_clobber_starts_range_p): New.
	(prune_ready_list): USE or CLOBBER should delay execution if
	it starts a new live range.
---
 gcc/haifa-sched.cc | 55 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 50 insertions(+), 5 deletions(-)

diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
index 8e8add709b3..47ad09457c7 100644
--- a/gcc/haifa-sched.cc
+++ b/gcc/haifa-sched.cc
@@ -765,6 +765,23 @@ real_insn_for_shadow (rtx_insn *insn)
   return pair->i1;
 }
 
+/* Return TRUE if INSN (a USE or CLOBBER) starts a new live
+   range, FALSE otherwise.  */
+
+static bool
+use_or_clobber_starts_range_p (rtx_insn *insn)
+{
+  gcc_assert (insn);
+
+  if ((GET_CODE (PATTERN (insn)) == CLOBBER
+       || GET_CODE (PATTERN (insn)) == USE)
+      && !sd_lists_empty_p (insn, SD_LIST_FORW)
+      && sd_lists_empty_p (insn, SD_LIST_BACK))
+    return true;
+
+  return false;
+}
+
 /* For a pair P of insns, return the fixed distance in cycles from the first
    insn after which the second must be scheduled.  */
 static int
@@ -6320,11 +6337,39 @@ prune_ready_list (state_t temp_state, bool first_cycle_insn_p,
             }
           else if (recog_memoized (insn) < 0)
             {
-              if (!first_cycle_insn_p
-                  && (GET_CODE (PATTERN (insn)) == ASM_INPUT
-                      || asm_noperands (PATTERN (insn)) >= 0))
-                cost = 1;
-              reason = "asm";
+              if (GET_CODE (PATTERN (insn)) == ASM_INPUT
+                  || asm_noperands (PATTERN (insn)) >= 0)
+                {
+                  reason = "asm";
+                  if (!first_cycle_insn_p)
+                    cost = 1;
+                }
+              else if (use_or_clobber_starts_range_p (insn))
+                {
+                  /* If USE or CLOBBER opens an active range, its execution should
+                     be delayed so as to be closer to the relevant instructions and
+                     avoid the generation of some redundant mov instructions.
+                     Otherwise, it should be executed as soon as possible.  */
+                  reason = "unrecog insn";
+                  if (!first_cycle_insn_p)
+                    /* If USE or CLOBBER is not in the first cycle, simply delay it
+                       by one cycle.  */
+                    cost = 1;
+                  else
+                    {
+                      /* If the USE or CLOBBER is in the first cycle and there are no
+                         other non-USE or non-CLOBBER instructions after it, we need
+                         to execute it immediately, otherwise we need to execute the
+                         non-USE or non-CLOBBER instructions first and postpone the
+                         execution of the USE or CLOBBER instructions.  */
+                      int j = i;
+                      while (n > ++j)
+                        if (!use_or_clobber_starts_range_p (ready_element (&ready, j)))
+                          break;
+
+                      cost = (j == n) ? 0 : 1;
+                    }
+                }
             }
           else if (sched_pressure != SCHED_PRESSURE_NONE)
             {

base-commit: c944ded09595946290778a26794074e69cc65f3e
-- 
2.17.1
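
For anyone who wants to experiment with the delay rule outside of GCC, the
cost decision in the prune_ready_list hunk can be modeled roughly as below.
This is only a sketch under stated assumptions: ToyInsn, starts_range and
delay_cost are hypothetical names and not GCC APIs, the three booleans stand
in for the pattern check and the two sd_lists_empty_p queries, and the vector
index order is assumed to match the order that ready_element walks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an insn in the ready list (not GCC's rtx_insn).
struct ToyInsn {
  bool is_use_or_clobber;  // pattern is a bare USE or CLOBBER
  bool has_forward_deps;   // models !sd_lists_empty_p (insn, SD_LIST_FORW)
  bool has_backward_deps;  // models !sd_lists_empty_p (insn, SD_LIST_BACK)
};

// Models use_or_clobber_starts_range_p: a USE/CLOBBER that later insns
// depend on, but that itself depends on nothing earlier.
bool starts_range(const ToyInsn &insn) {
  return insn.is_use_or_clobber && insn.has_forward_deps &&
         !insn.has_backward_deps;
}

// Models the cost chosen for ready[i] when ready[i] starts a range.
// Returns 0 to allow issue in this cycle, 1 to delay by one cycle.
int delay_cost(const std::vector<ToyInsn> &ready, std::size_t i,
               bool first_cycle_insn_p) {
  assert(i < ready.size());
  assert(starts_range(ready[i]));

  // Not the first insn of the cycle: always delay by one cycle.
  if (!first_cycle_insn_p)
    return 1;

  // First insn of the cycle: delay only if some later ready insn is not
  // itself a range-starting USE/CLOBBER and can therefore go first.
  for (std::size_t j = i + 1; j < ready.size(); ++j)
    if (!starts_range(ready[j]))
      return 1;

  return 0;
}
```

With a ready list holding one range-starting CLOBBER followed by an ordinary
insn, delay_cost returns 1 for the CLOBBER, so the ordinary insn is issued
first. When only range-starting USE/CLOBBER insns remain, it returns 0 in the
first cycle so they still get issued.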