From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 28E123839828 for ; Fri, 11 Jun 2021 13:16:31 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 28E123839828 Received: from pps.filterd (m0098421.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 15BD47T2170052; Fri, 11 Jun 2021 09:16:29 -0400 Received: from ppma06fra.de.ibm.com (48.49.7a9f.ip4.static.sl-reverse.com [159.122.73.72]) by mx0a-001b2d01.pphosted.com with ESMTP id 3947nrsrx2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 11 Jun 2021 09:16:29 -0400 Received: from pps.filterd (ppma06fra.de.ibm.com [127.0.0.1]) by ppma06fra.de.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 15BDC9Sq026135; Fri, 11 Jun 2021 13:16:27 GMT Received: from b06cxnps3075.portsmouth.uk.ibm.com (d06relay10.portsmouth.uk.ibm.com [9.149.109.195]) by ppma06fra.de.ibm.com with ESMTP id 3900hhhwu5-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 11 Jun 2021 13:16:27 +0000 Received: from d06av26.portsmouth.uk.ibm.com (d06av26.portsmouth.uk.ibm.com [9.149.105.62]) by b06cxnps3075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 15BDGOA931588834 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 11 Jun 2021 13:16:24 GMT Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id AA94AAE06A; Fri, 11 Jun 2021 13:16:24 +0000 (GMT) Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 411BAAE05F; Fri, 11 Jun 2021 13:16:23 +0000 (GMT) Received: from KewenLins-MacBook-Pro.local (unknown [9.200.34.6]) by d06av26.portsmouth.uk.ibm.com (Postfix) with ESMTP; Fri, 11 Jun 2021 13:16:22 +0000 (GMT) Subject: [PATCH v2] combine: Tweak the condition of last_set invalidation To: Segher Boessenkool Cc: GCC Patches , Bill Schmidt References: <6bcd32fa-d0ef-b136-ddd9-92a1d21f60af@linux.ibm.com> <20210609201735.GJ18427@gate.crashing.org> From: "Kewen.Lin" Message-ID: Date: Fri, 11 Jun 2021 21:16:21 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.10.0 MIME-Version: 1.0 In-Reply-To: <20210609201735.GJ18427@gate.crashing.org> Content-Type: multipart/mixed; boundary="------------9F22921F91F236D413676FF1" Content-Language: en-US X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: guU9s8JXPyHFH1plcyFv1fQA8WlTHLdt X-Proofpoint-GUID: guU9s8JXPyHFH1plcyFv1fQA8WlTHLdt X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.761 definitions=2021-06-11_05:2021-06-11, 2021-06-11 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 bulkscore=0 phishscore=0 adultscore=0 malwarescore=0 mlxscore=0 mlxlogscore=999 impostorscore=0 clxscore=1015 suspectscore=0 spamscore=0 lowpriorityscore=0 priorityscore=1501 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2104190000 definitions=main-2106110084 X-Spam-Status: No, score=-8.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, MIME_CHARSET_FARAWAY, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Jun 2021 13:16:33 -0000 This is a multi-part message in MIME format. --------------9F22921F91F236D413676FF1 Content-Type: text/plain; charset=gbk Content-Transfer-Encoding: 8bit Hi Segher, Thanks for the review! on 2021/6/10 ÉÏÎç4:17, Segher Boessenkool wrote: > Hi! > > On Wed, Dec 16, 2020 at 04:49:49PM +0800, Kewen.Lin wrote: >> Currently we have the check: >> >> if (!insn >> || (value && rsp->last_set_table_tick >= label_tick_ebb_start)) >> rsp->last_set_invalid = 1; >> >> which means if we want to record some value for some reg and >> this reg got refered before in a valid scope, > > If we already know it is *set* in this same extended basic block. > Possibly by the same instruction btw. > >> we invalidate the >> set of reg (last_set_invalid to 1). It avoids to find the wrong >> set for one reg reference, such as the case like: >> >> ... op regX // this regX could find wrong last_set below >> regX = ... // if we think this set is valid >> ... op regX > > Yup, exactly. > >> But because of retry's existence, the last_set_table_tick could >> be set by some later reference insns, but we see it's set due >> to retry on the set (for that reg) insn again, such as: >> >> insn 1 >> insn 2 >> >> regX = ... --> (a) >> ... op regX --> (b) >> >> insn 3 >> >> // assume all in the same BB. >> >> Assuming we combine 1, 2 -> 3 sucessfully and replace them as two >> (3 insns -> 2 insns), > > This will delete insn 1 and write the combined result to insns 2 and 3. > >> retrying from insn1 or insn2 again: > > Always 2, but your point remains valid. > >> it will scan insn (a) again, the below condition holds for regX: >> >> (value && rsp->last_set_table_tick >= label_tick_ebb_start) >> >> it will mark this set as invalid set. But actually the >> last_set_table_tick here is set by insn (b) before retrying, so it >> should be safe to be taken as valid set. > > Yup. > >> This proposal is to check whether the last_set_table safely happens >> after the current set, make the set still valid if so. > >> Full SPEC2017 building shows this patch gets more sucessful combines >> from 1902208 to 1902243 (trivial though). > > Do you have some example, or maybe even a testcase? :-) > Sorry for the late reply, it took some time to get one reduced case. typedef struct SA *pa_t; struct SC { int h; pa_t elem[]; }; struct SD { struct SC *e; }; struct SA { struct { struct SD f[1]; } g; }; void foo(pa_t *k, char **m) { int l, i; pa_t a; l = (int)a->g.f[5].e; i = 0; for (; i < l; i++) { k[i] = a->g.f[5].e->elem[i]; m[i] = ""; } } Baseline is r12-0 and the option is "-O3 -mcpu=power9 -fno-strict-aliasing", with this patch, the generated assembly can save two rlwinm s. >> + /* Record the luid of the insn whose expression involving register n. */ >> + >> + int last_set_table_luid; > > "Record the luid of the insn for which last_set_table_tick was set", > right? > But it can be updated later to one smaller luid, how about the wording like: + /* Record the luid of the insn which uses register n, the insn should + be the first one using register n in that block of the insn which + last_set_table_tick was set for. */ >> -static void update_table_tick (rtx); >> +static void update_table_tick (rtx, int); > > Please remove this declaration instead, the function is not used until > after its actual definition :-) > Done. >> @@ -13243,7 +13247,21 @@ update_table_tick (rtx x) >> for (r = regno; r < endregno; r++) >> { >> reg_stat_type *rsp = ®_stat[r]; >> - rsp->last_set_table_tick = label_tick; >> + if (rsp->last_set_table_tick >= label_tick_ebb_start) >> + { >> + /* Later references should not have lower ticks. */ >> + gcc_assert (label_tick >= rsp->last_set_table_tick); > > This should be obvious, but checking it won't hurt, okay. > >> + /* Should pick up the lowest luid if the references >> + are in the same block. */ >> + if (label_tick == rsp->last_set_table_tick >> + && rsp->last_set_table_luid > insn_luid) >> + rsp->last_set_table_luid = insn_luid; > > Why? Is it conservative for the check you will do later? Please spell > this out, it is crucial! > Since later the combinations involving this insn probably make the register be used in one insn sitting ahead (which has smaller luid than the one which was recorded before). Yes, it's very conservative, this ensure that we always use the luid of the insn which is the first insn using this register in the block. The last_set invalidation is going to catch the case like: ... regX // avoid the set used here ... regX = ... ... Once we have the smallest luid one of all insns which use register X, any unsafe regX sets should be caught. I updated the comments to: + /* Since combination may generate some instructions + to replace some foregoing instructions with the + references to register r (using register r), we + need to make sure we record the first instruction + which is using register r, so always update with + the lowest luid here. If the given set happens + before this recorded earliest reference, the set + value should be safe to be used. */ >> @@ -13359,7 +13378,10 @@ record_value_for_reg (rtx reg, rtx_insn *insn, rtx value) >> >> /* Mark registers that are being referenced in this value. */ >> if (value) >> - update_table_tick (value); >> + { >> + gcc_assert (insn); >> + update_table_tick (value, DF_INSN_LUID (insn)); >> + } > > Don't add that assert please. If you really want one it should come > right at the start of the function, not 60 lines later :-) > Exactly, fixed. > Looks good if I understood this correctly :-) > > Thanks again, I also updated the comments in func record_value_for_reg, the new version is attached. BR, Kewen ----- gcc/ChangeLog: * combine.c (struct reg_stat_type): New member last_set_table_luid. (update_table_tick): Add one argument for insn luid and set last_set_table_luid with it, remove its declaration. (record_value_for_reg): Adjust the condition to set last_set_invalid nonzero. --------------9F22921F91F236D413676FF1 Content-Type: text/plain; charset=UTF-8; x-mac-type="0"; x-mac-creator="0"; name="last_set_luid_v2.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="last_set_luid_v2.diff" diff --git a/gcc/combine.c b/gcc/combine.c index 62bf4aeaaba..6ef6a08d54f 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -202,6 +202,12 @@ struct reg_stat_type { int last_set_table_tick; + /* Record the luid of the insn which uses register n, the insn should + be the first one using register n in that block of the insn which + last_set_table_tick was set for. */ + + int last_set_table_luid; + /* Record the value of label_tick when the value for register n is placed in last_set_value. */ @@ -476,7 +482,6 @@ static rtx gen_lowpart_for_combine (machine_mode, rtx); static enum rtx_code simplify_compare_const (enum rtx_code, machine_mode, rtx, rtx *); static enum rtx_code simplify_comparison (enum rtx_code, rtx *, rtx *); -static void update_table_tick (rtx); static void record_value_for_reg (rtx, rtx_insn *, rtx); static void check_promoted_subreg (rtx_insn *, rtx); static void record_dead_and_set_regs_1 (rtx, const_rtx, void *); @@ -13179,7 +13184,7 @@ count_rtxs (rtx x) for each register mentioned. Similar to mention_regs in cse.c */ static void -update_table_tick (rtx x) +update_table_tick (rtx x, int insn_luid) { enum rtx_code code = GET_CODE (x); const char *fmt = GET_RTX_FORMAT (code); @@ -13194,7 +13199,27 @@ update_table_tick (rtx x) for (r = regno; r < endregno; r++) { reg_stat_type *rsp = ®_stat[r]; - rsp->last_set_table_tick = label_tick; + if (rsp->last_set_table_tick >= label_tick_ebb_start) + { + /* Later references should not have lower ticks. */ + gcc_assert (label_tick >= rsp->last_set_table_tick); + /* Since combination may generate some instructions + to replace some foregoing instructions with the + references to register r (using register r), we + need to make sure we record the first instruction + which is using register r, so always update with + the lowest luid here. If the given set happens + before this recorded earliest reference, the set + value should be safe to be used. */ + if (label_tick == rsp->last_set_table_tick + && rsp->last_set_table_luid > insn_luid) + rsp->last_set_table_luid = insn_luid; + } + else + { + rsp->last_set_table_tick = label_tick; + rsp->last_set_table_luid = insn_luid; + } } return; @@ -13230,16 +13255,17 @@ update_table_tick (rtx x) if (ARITHMETIC_P (x0) && (x1 == XEXP (x0, 0) || x1 == XEXP (x0, 1))) { - update_table_tick (XEXP (x0, x1 == XEXP (x0, 0) ? 1 : 0)); + update_table_tick (XEXP (x0, x1 == XEXP (x0, 0) ? 1 : 0), + insn_luid); break; } } - update_table_tick (XEXP (x, i)); + update_table_tick (XEXP (x, i), insn_luid); } else if (fmt[i] == 'E') for (j = 0; j < XVECLEN (x, i); j++) - update_table_tick (XVECEXP (x, i, j)); + update_table_tick (XVECEXP (x, i, j), insn_luid); } /* Record that REG is set to VALUE in insn INSN. If VALUE is zero, we @@ -13310,21 +13336,26 @@ record_value_for_reg (rtx reg, rtx_insn *insn, rtx value) /* Mark registers that are being referenced in this value. */ if (value) - update_table_tick (value); + update_table_tick (value, DF_INSN_LUID (insn)); /* Now update the status of each register being set. If someone is using this register in this block, set this register to invalid since we will get confused between the two lives in this basic block. This makes using this register always invalid. In cse, we scan the table to invalidate all entries using this register, but this - is too much work for us. */ + is too much work for us. If we know this register set and its register + uses are in the same block, and the set always happens before any uses, + we don't need to make it invalid. */ for (i = regno; i < endregno; i++) { rsp = ®_stat[i]; rsp->last_set_label = label_tick; + gcc_assert (label_tick >= rsp->last_set_table_tick); if (!insn - || (value && rsp->last_set_table_tick >= label_tick_ebb_start)) + || (value && rsp->last_set_table_tick >= label_tick_ebb_start + && !(label_tick == rsp->last_set_table_tick + && DF_INSN_LUID (insn) < rsp->last_set_table_luid))) rsp->last_set_invalid = 1; else rsp->last_set_invalid = 0; --------------9F22921F91F236D413676FF1--