From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 77A3A3992026 for ; Thu, 22 Jul 2021 12:06:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 77A3A3992026 Received: from pps.filterd (m0098416.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 16MC4Can021668; Thu, 22 Jul 2021 08:06:51 -0400 Received: from ppma04ams.nl.ibm.com (63.31.33a9.ip4.static.sl-reverse.com [169.51.49.99]) by mx0b-001b2d01.pphosted.com with ESMTP id 39y6jh3rvr-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 22 Jul 2021 08:06:50 -0400 Received: from pps.filterd (ppma04ams.nl.ibm.com [127.0.0.1]) by ppma04ams.nl.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 16MBws5b017917; Thu, 22 Jul 2021 12:06:49 GMT Received: from b06avi18878370.portsmouth.uk.ibm.com (b06avi18878370.portsmouth.uk.ibm.com [9.149.26.194]) by ppma04ams.nl.ibm.com with ESMTP id 39xhx48m9t-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 22 Jul 2021 12:06:49 +0000 Received: from d06av26.portsmouth.uk.ibm.com (d06av26.portsmouth.uk.ibm.com [9.149.105.62]) by b06avi18878370.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 16MC4IvU23462320 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 22 Jul 2021 12:04:18 GMT Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id D8C85AE04D; Thu, 22 Jul 2021 12:06:46 +0000 (GMT) Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 989BAAE055; Thu, 22 Jul 2021 12:06:46 +0000 (GMT) Received: from li-926bd7cc-2dd1-11b2-a85c-f6adc0f5efec.ibm.com (unknown [9.171.28.15]) by d06av26.portsmouth.uk.ibm.com (Postfix) with ESMTPS; Thu, 22 Jul 2021 12:06:46 +0000 (GMT) Subject: Re: [PATCH 1/7] ifcvt: Check if cmovs are needed. To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com References: <20210625160905.23786-1-rdapp@linux.ibm.com> <20210625160905.23786-2-rdapp@linux.ibm.com> From: Robin Dapp Message-ID: <399e9ada-8a0a-e95b-f037-b3f3cb8a6c48@linux.ibm.com> Date: Thu, 22 Jul 2021 14:06:46 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.0 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/mixed; boundary="------------F281BAA34AB9F55F17647FCD" Content-Language: en-US X-TM-AS-GCONF: 00 X-Proofpoint-GUID: -kb8wuF3OjsXW3TnmyjKc0-EpMCz8hCX X-Proofpoint-ORIG-GUID: -kb8wuF3OjsXW3TnmyjKc0-EpMCz8hCX X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-07-22_04:2021-07-22, 2021-07-22 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 mlxlogscore=999 lowpriorityscore=0 adultscore=0 suspectscore=0 priorityscore=1501 bulkscore=0 impostorscore=0 spamscore=0 malwarescore=0 clxscore=1015 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2104190000 definitions=main-2107220081 X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, NICE_REPLY_A, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Jul 2021 12:06:55 -0000 This is a multi-part message in MIME format. --------------F281BAA34AB9F55F17647FCD Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Hi Richard, thanks for going through the patch set. > A hash_set might be simpler, given that the code only enters insns > for which the bool is false. “rtx_insn *” would be better than rtx. Right, changed that. > Do you mean the sets might be removed or that the checks might be > removed? It's actually the checks that are removed. I clarified and amended the comments. > The patch is quite hard to review on its own, since nothing actually > uses this variable. It's also not obvious how the > reg_overlap_mentioned_p code works if the old target is referenced > later. > > Could you refactor the series a bit so that each patch is > self-contained? It's OK if that means fewer patches. The attached v2 makes use of need_cmov now, I hope this makes it a bit clearer. Regarding the reg_overlap_mentioned_p: it is used to detect the swap idioms as well as when a cmov destination is used in the condition as well. If needed this could be two separate patches (most of the second patch would be comments, though). Regards Robin --------------F281BAA34AB9F55F17647FCD Content-Type: text/x-patch; charset=UTF-8; name="v2-0001-ifcvt-Check-if-cmovs-are-needed.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="v2-0001-ifcvt-Check-if-cmovs-are-needed.patch" >From bf7e372d7f48e614f20676c7cf4b2fbde741d4fc Mon Sep 17 00:00:00 2001 From: Robin Dapp Date: Thu, 24 Jun 2021 16:40:04 +0200 Subject: [PATCH v2 1/7] ifcvt: Check if cmovs are needed. When if-converting multiple SETs and we encounter a swap-style idiom if (a > b) { tmp = c; // [1] c = d; d = tmp; } ifcvt should not generate a conditional move for the instruction at [1]. In order to achieve that, this patch goes through all relevant SETs and marks the relevant instructions. This helps to evaluate costs. On top, only generate temporaries if the current cmov is going to overwrite one of the comparands of the initial compare. --- gcc/ifcvt.c | 138 ++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 118 insertions(+), 20 deletions(-) diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index 017944f4f79..a5e55456d88 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -98,6 +98,7 @@ static int dead_or_predicable (basic_block, basic_block, basic_block, edge, int); static void noce_emit_move_insn (rtx, rtx); static rtx_insn *block_has_only_trap (basic_block); +static void check_need_cmovs (basic_block, hash_set *); /* Count the number of non-jump active insns in BB. */ @@ -3203,6 +3204,10 @@ noce_convert_multiple_sets (struct noce_if_info *if_info) auto_vec unmodified_insns; int count = 0; + hash_set need_no_cmov; + + check_need_cmovs (then_bb, &need_no_cmov); + FOR_BB_INSNS (then_bb, insn) { /* Skip over non-insns. */ @@ -3213,26 +3218,47 @@ noce_convert_multiple_sets (struct noce_if_info *if_info) gcc_checking_assert (set); rtx target = SET_DEST (set); - rtx temp = gen_reg_rtx (GET_MODE (target)); + rtx temp; rtx new_val = SET_SRC (set); rtx old_val = target; - /* If we were supposed to read from an earlier write in this block, - we've changed the register allocation. Rewire the read. While - we are looking, also try to catch a swap idiom. */ - for (int i = count - 1; i >= 0; --i) - if (reg_overlap_mentioned_p (new_val, targets[i])) - { - /* Catch a "swap" style idiom. */ - if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX) - /* The write to targets[i] is only live until the read - here. As the condition codes match, we can propagate - the set to here. */ - new_val = SET_SRC (single_set (unmodified_insns[i])); - else - new_val = temporaries[i]; - break; - } + /* As we are transforming + if (x > y) + { + a = b; + c = d; + } + into + a = (x > y) ... + c = (x > y) ... + + we potentially check x > y before every set. + Even though the check might be removed by subsequent passes, this means + that we cannot transform + if (x > y) + { + x = y; + ... + } + into + x = (x > y) ... + ... + since this would invalidate x and the following to-be-removed checks. + Therefore we introduce a temporary every time we are about to + overwrite a variable used in the check. Costing of a sequence with + these is going to be inaccurate so only use temporaries when + needed. */ + if (reg_overlap_mentioned_p (target, cond)) + temp = gen_reg_rtx (GET_MODE (target)); + else + temp = target; + + /* We have identified swap-style idioms in check_need_cmovs. A normal + set will need to be a cmov while the first instruction of a swap-style + idiom can be a regular move. This helps with costing. */ + bool need_cmov = true; + if (need_no_cmov.contains (insn)) + need_cmov = false; /* If we had a non-canonical conditional jump (i.e. one where the fallthrough is to the "else" case) we need to reverse @@ -3275,9 +3301,22 @@ noce_convert_multiple_sets (struct noce_if_info *if_info) old_val = lowpart_subreg (dst_mode, old_val, src_mode); } - /* Actually emit the conditional move. */ - rtx temp_dest = noce_emit_cmove (if_info, temp, cond_code, - x, y, new_val, old_val); + rtx temp_dest = NULL_RTX; + + if (need_cmov) + { + /* Actually emit the conditional move. */ + temp_dest = noce_emit_cmove (if_info, temp, cond_code, + x, y, new_val, old_val); + } + else + { + if (if_info->then_else_reverse) + noce_emit_move_insn (temp, old_val); + else + noce_emit_move_insn (temp, new_val); + temp_dest = temp; + } /* If we failed to expand the conditional move, drop out and don't try to continue. */ @@ -3808,6 +3847,65 @@ check_cond_move_block (basic_block bb, return TRUE; } +/* Find local swap-style idioms in BB and mark the first insn (1) + that is only a temporary as not needing a conditional move as + it is going to be dead afterwards anyway. + + (1) int tmp = a; + a = b; + b = tmp; + + ifcvt + --> + + load tmp,a + cmov a,b + cmov b,tmp */ + +static void +check_need_cmovs (basic_block bb, hash_set *need_no_cmov) +{ + rtx_insn *insn; + int count = 0; + auto_vec insns; + auto_vec dests; + + /* Iterate over all SETs, storing the destinations + in DEST. If we hit a SET that reads from a destination + that we have seen before and the corresponding register + is dead afterwards, it must be a swap. */ + FOR_BB_INSNS (bb, insn) + { + rtx set, src, dest; + + if (!active_insn_p (insn)) + continue; + + set = single_set (insn); + if (set == NULL_RTX) + continue; + + src = SET_SRC (set); + dest = SET_DEST (set); + + /* Check if the current SET's source is the same + as any previously seen destination. + This is quadratic but the number of insns in BB + is bounded by PARAM_MAX_RTL_IF_CONVERSION_INSNS. */ + for (int i = count - 1; i >= 0; --i) + { + if (reg_overlap_mentioned_p (src, dests[i]) + && find_reg_note (insn, REG_DEAD, src) != NULL_RTX) + need_no_cmov->add (insns[i]); + } + + insns.safe_push (insn); + dests.safe_push (dest); + + count++; + } +} + /* Given a basic block BB suitable for conditional move conversion, a condition COND, and pointer maps THEN_VALS and ELSE_VALS containing the register values depending on COND, emit the insns in the block as -- 2.31.1 --------------F281BAA34AB9F55F17647FCD--