From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id C39473858C2C for ; Thu, 14 Oct 2021 08:45:02 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C39473858C2C Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 7F9A411D4; Thu, 14 Oct 2021 01:45:02 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.88]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 0924A3F66F; Thu, 14 Oct 2021 01:45:01 -0700 (PDT) From: Richard Sandiford To: Robin Dapp Mail-Followup-To: Robin Dapp , gcc-patches@gcc.gnu.org, richard.sandiford@arm.com Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH 1/7] ifcvt: Check if cmovs are needed. References: <20210625160905.23786-1-rdapp@linux.ibm.com> <20210625160905.23786-2-rdapp@linux.ibm.com> <399e9ada-8a0a-e95b-f037-b3f3cb8a6c48@linux.ibm.com> Date: Thu, 14 Oct 2021 09:45:00 +0100 In-Reply-To: (Robin Dapp's message of "Wed, 15 Sep 2021 10:39:54 +0200") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 14 Oct 2021 08:45:05 -0000 Hi Robin, Thanks for the update and sorry for the late response. Robin Dapp writes: > Hi Richard, > >> Don't we still need this code (without the REG_DEAD handling) for the >> case in which=E2=80=A6 >>=20 >>> + /* As we are transforming >>> + if (x > y) >>> + { >>> + a =3D b; >>> + c =3D d; >>> + } >>> + into >>> + a =3D (x > y) ... >>> + c =3D (x > y) ... >>> + >>> + we potentially check x > y before every set. >>> + Even though the check might be removed by subsequent passes, this me= ans >>> + that we cannot transform >>> + if (x > y) >>> + { >>> + x =3D y; >>> + ... >>> + } >>> + into >>> + x =3D (x > y) ... >>> + ... >>> + since this would invalidate x and the following to-be-removed checks. >>> + Therefore we introduce a temporary every time we are about to >>> + overwrite a variable used in the check. Costing of a sequence with >>> + these is going to be inaccurate so only use temporaries when >>> + needed. */ >>> + if (reg_overlap_mentioned_p (target, cond)) >>> + temp =3D gen_reg_rtx (GET_MODE (target)); >>=20 >> =E2=80=A6this code triggers? I don't see otherwise how later uses of x = would >> pick up =E2=80=9Ctemp=E2=80=9D instead of the original target. E.g. sup= pose we had: >>=20 >> if (x > y) >> { >> x =3D =E2=80=A6; >> z =3D x; // x does not die here >> } >>=20 >> Without the loop, it looks like z would pick up the old value of x >> (used in the comparison) instead of the new one. > > getting back to this now. I re-added handling of the situation you=20 > mentioned (even though I didn't manage to trigger it myself). Yeah, the only reliable way to test for this might be an rtx test (see testsuite/gcc.dg/rtl for some examples). A test isn't necessary though. My only remaining concern is that bb_ok_for_noce_convert_multiple_sets allows subreg sources: if (!(REG_P (src) || (GET_CODE (src) =3D=3D SUBREG && REG_P (SUBREG_REG (src)) && subreg_lowpart_p (src)))) return false; These subregs could also require rewrites. It looks like the original code checks for overlaps correctly, but doesn't necessarily deal with a hit correctly. So this is at least partly pre-existing. I think the way of handling that is as follows: > diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c > index 017944f4f79..f1448667732 100644 > --- a/gcc/ifcvt.c > +++ b/gcc/ifcvt.c > @@ -98,6 +98,8 @@ static int dead_or_predicable (basic_block, basic_block= , basic_block, > edge, int); > static void noce_emit_move_insn (rtx, rtx); > static rtx_insn *block_has_only_trap (basic_block); > +static void need_cmov_or_rewire (basic_block, hash_set *, > + hash_map *); > > /* Count the number of non-jump active insns in BB. */ >=20=20 > @@ -3203,6 +3205,11 @@ noce_convert_multiple_sets (struct noce_if_info *i= f_info) > auto_vec unmodified_insns; > int count =3D 0; >=20=20 > + hash_set need_no_cmov; > + hash_map rewired_src; > + > + need_cmov_or_rewire (then_bb, &need_no_cmov, &rewired_src); > + > FOR_BB_INSNS (then_bb, insn) > { > /* Skip over non-insns. */ > @@ -3213,26 +3220,47 @@ noce_convert_multiple_sets (struct noce_if_info *= if_info) > gcc_checking_assert (set); >=20=20 > rtx target =3D SET_DEST (set); > - rtx temp =3D gen_reg_rtx (GET_MODE (target)); > - rtx new_val =3D SET_SRC (set); > + rtx temp; > + > + int *ii =3D rewired_src.get (insn); > + rtx new_val =3D ii =3D=3D NULL ? SET_SRC (set) : temporaries[*ii]; (1) here use something like: rtx new_val =3D SET_SRC (set); if (int *ii =3D rewired_src.get (insn)) new_val =3D simplify_replace_rtx (new_val, targets[*ii], temporaries[*ii]); > rtx old_val =3D target; >=20=20 > - /* If we were supposed to read from an earlier write in this block, > - we've changed the register allocation. Rewire the read. While > - we are looking, also try to catch a swap idiom. */ > - for (int i =3D count - 1; i >=3D 0; --i) > - if (reg_overlap_mentioned_p (new_val, targets[i])) > - { > - /* Catch a "swap" style idiom. */ > - if (find_reg_note (insn, REG_DEAD, new_val) !=3D NULL_RTX) > - /* The write to targets[i] is only live until the read > - here. As the condition codes match, we can propagate > - the set to here. */ > - new_val =3D SET_SRC (single_set (unmodified_insns[i])); > - else > - new_val =3D temporaries[i]; > - break; > - } > + /* As we are transforming > + if (x > y) > + { > + a =3D b; > + c =3D d; > + } > + into > + a =3D (x > y) ... > + c =3D (x > y) ... > + > + we potentially check x > y before every set. > + Even though the check might be removed by subsequent passes, this means > + that we cannot transform > + if (x > y) > + { > + x =3D y; > + ... > + } > + into > + x =3D (x > y) ... > + ... > + since this would invalidate x and the following to-be-removed checks. > + Therefore we introduce a temporary every time we are about to > + overwrite a variable used in the check. Costing of a sequence with > + these is going to be inaccurate so only use temporaries when > + needed. */ > + if (reg_overlap_mentioned_p (target, cond)) > + temp =3D gen_reg_rtx (GET_MODE (target)); > + else > + temp =3D target; > + > + /* We have identified swap-style idioms in check_need_cmovs. A no= rmal > + set will need to be a cmov while the first instruction of a swap-style > + idiom can be a regular move. This helps with costing. */ > + bool need_cmov =3D !need_no_cmov.contains (insn); >=20=20 > /* If we had a non-canonical conditional jump (i.e. one where > the fallthrough is to the "else" case) we need to reverse > @@ -3275,16 +3303,29 @@ noce_convert_multiple_sets (struct noce_if_info *= if_info) > old_val =3D lowpart_subreg (dst_mode, old_val, src_mode); > } >=20=20 > - /* Actually emit the conditional move. */ > - rtx temp_dest =3D noce_emit_cmove (if_info, temp, cond_code, > - x, y, new_val, old_val); > + rtx temp_dest =3D NULL_RTX; >=20=20 > - /* If we failed to expand the conditional move, drop out and don't > - try to continue. */ > - if (temp_dest =3D=3D NULL_RTX) > + if (need_cmov) > { > - end_sequence (); > - return FALSE; > + /* Actually emit the conditional move. */ > + temp_dest =3D noce_emit_cmove (if_info, temp, cond_code, > + x, y, new_val, old_val); > + > + /* If we failed to expand the conditional move, drop out and don't > + try to continue. */ > + if (temp_dest =3D=3D NULL_RTX) > + { > + end_sequence (); > + return FALSE; > + } > + } > + else > + { > + if (if_info->then_else_reversed) > + noce_emit_move_insn (temp, old_val); > + else > + noce_emit_move_insn (temp, new_val); > + temp_dest =3D temp; > } >=20=20 > /* Bookkeeping. */ > @@ -3808,6 +3849,87 @@ check_cond_move_block (basic_block bb, > return TRUE; > } >=20=20 > +/* Find local swap-style idioms in BB and mark the first insn (1) > + that is only a temporary as not needing a conditional move as > + it is going to be dead afterwards anyway. > + > + (1) int tmp =3D a; > + a =3D b; > + b =3D tmp; > + > + ifcvt > + --> > + > + tmp =3D a; > + a =3D cond ? b : a_old; > + b =3D cond ? tmp : b_old; > + > + Additionally, store the index of insns like (2) when a subsequent > + SET reads from their destination. > + > + (2) int c =3D a; > + int d =3D c; > + > + ifcvt > + --> > + > + c =3D cond ? a : c_old; > + d =3D cond ? d : c; // Need to use c rather than c_old here. > +*/ > + > +static void > +need_cmov_or_rewire (basic_block bb, > + hash_set *need_no_cmov, > + hash_map *rewired_src) > +{ > + rtx_insn *insn; > + int count =3D 0; > + auto_vec insns; > + auto_vec dests; > + > + /* Iterate over all SETs, storing the destinations > + in DEST. > + - If we hit a SET that reads from a destination > + that we have seen before and the corresponding register > + is dead afterwards, the register does not need to be > + moved conditionally. > + - If we encounter a previously changed register, > + rewire the read to the original source. */ > + FOR_BB_INSNS (bb, insn) > + { > + rtx set, src, dest; > + > + if (!active_insn_p (insn)) > + continue; > + > + set =3D single_set (insn); > + if (set =3D=3D NULL_RTX) > + continue; > + > + src =3D SET_SRC (set); > + dest =3D SET_DEST (set); > + > + /* Check if the current SET's source is the same > + as any previously seen destination. > + This is quadratic but the number of insns in BB > + is bounded by PARAM_MAX_RTL_IF_CONVERSION_INSNS. */ (2) Insert: if (SUBREG_P (src)) src =3D SUBREG_REG (src); here. OK with those changes if that works. Let me know if they don't =E2=80=94 I'll try to be quicker with the next review. Thanks, Richard > + if (REG_P (src)) > + for (int i =3D count - 1; i >=3D 0; --i) > + if (reg_overlap_mentioned_p (src, dests[i])) > + { > + if (find_reg_note (insn, REG_DEAD, src) !=3D NULL_RTX) > + need_no_cmov->add (insns[i]); > + else > + rewired_src->put (insn, i); > + } > + > + insns.safe_push (insn); > + dests.safe_push (dest); > + > + count++; > + } > +} > + > /* Given a basic block BB suitable for conditional move conversion, > a condition COND, and pointer maps THEN_VALS and ELSE_VALS containing > the register values depending on COND, emit the insns in the block as