From: Richard Sandiford
To: Tamar Christina via Gcc-patches
Cc: Richard Biener, Tamar Christina, Richard Biener, Aldy Hernandez, Jeff Law, nd, "MacLeod\, Andrew"
Subject: Re: [PATCH 1/2]middle-end: Add new tbranch optab to add support for bit-test-and-branch operations
Date: Mon, 05 Dec 2022 13:14:56 +0000
In-Reply-To: (Richard Sandiford via Gcc-patches's message of "Mon, 05 Dec 2022 12:00:40 +0000")

Richard Sandiford via Gcc-patches writes:
> Tamar Christina via Gcc-patches writes:
>>> > +/* Check to see if the supplied comparison in PTEST can be performed as a
>>> > +   bit-test-and-branch instead. VAL must contain the original tree
>>> > +   expression of the non-zero operand which will be used to rewrite the
>>> > +   comparison in PTEST.
>>> > +
>>> > +   Returns TRUE if operation succeeds and returns updated PMODE and
>>> PTEST,
>>> > +   else FALSE. */
>>> > +
>>> > +enum insn_code
>>> > +static validate_test_and_branch (tree val, rtx *ptest, machine_mode
>>> > +*pmode) {
>>> > +  if (!val || TREE_CODE (val) != SSA_NAME)
>>> > +    return CODE_FOR_nothing;
>>> > +
>>> > +  machine_mode mode = TYPE_MODE (TREE_TYPE (val)); rtx test =
>>> > + *ptest;
>>> > +
>>> > +  if (GET_CODE (test) != EQ && GET_CODE (test) != NE)
>>> > +    return CODE_FOR_nothing;
>>> > +
>>> > +  /* If the target supports the testbit comparison directly, great.
>>> > + */ auto icode = direct_optab_handler (tbranch_optab, mode); if
>>> > + (icode == CODE_FOR_nothing)
>>> > +    return icode;
>>> > +
>>> > +  if (tree_zero_one_valued_p (val))
>>> > +    {
>>> > +      auto pos = BYTES_BIG_ENDIAN ? GET_MODE_BITSIZE (mode) - 1 : 0;
>>>
>>> Does this work for BYTES_BIG_ENDIAN && !WORDS_BIG_ENDIAN and mode
>>> > word_mode?
>>>
>>
>> It does now. In this particular case all that matters is the bit ordering, so I've changed
>> It to BITS_BIG_ENDIAN.
>>
>> Also during the review of the AArch64 optab Richard Sandiford wanted me to split the
>> optabs apart into two. The reason is that a match_operator still gets the full RTL.
>>
>> In the case of a tbranch the full RTL has an invalid comparison, so if a target doesn't implement
>> the hook correctly this would lead to incorrect code. We've now moved the operator as part of
>> the name itself to avoid this.
>>
>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>>
>> Ok for master?
>>
>> Thanks,
>> Tamar
>>
>> gcc/ChangeLog:
>>
>>         * dojump.cc (do_jump): Pass along value.
>>         (do_jump_by_parts_greater_rtx): Likewise.
>>         (do_jump_by_parts_zero_rtx): Likewise.
>>         (do_jump_by_parts_equality_rtx): Likewise.
>>         (do_compare_rtx_and_jump): Likewise.
>>         (do_compare_and_jump): Likewise.
>>         * dojump.h (do_compare_rtx_and_jump): New.
>>         * optabs.cc (emit_cmp_and_jump_insn_1): Refactor to take optab to check.
>>         (validate_test_and_branch): New.
>>         (emit_cmp_and_jump_insns): Optionally take a value, and when value is
>>         supplied then check if it's suitable for tbranch.
>>         * optabs.def (tbranch_eq$a4, tbranch_ne$a4): New.
>>         * doc/md.texi (tbranch_@var{op}@var{mode}4): Document it.
>>         * optabs.h (emit_cmp_and_jump_insns):
>>         * tree.h (tree_zero_one_valued_p): New.
>
> Thanks for doing this.
>
>> --- inline copy of patch ---
>>
>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
>> index d0a71ecbb806de3a6564c6ffe973fec5da5c597b..c6c4b13d756de28078a0a779876a00c614246914 100644
>> --- a/gcc/doc/md.texi
>> +++ b/gcc/doc/md.texi
>> @@ -6964,6 +6964,14 @@ case, you can and should make operand 1's predicate reject some operators
>>  in the @samp{cstore@var{mode}4} pattern, or remove the pattern altogether
>>  from the machine description.
>>
>> +@cindex @code{tbranch_@var{op}@var{mode}4} instruction pattern
>> +@item @samp{tbranch_@var{op}@var{mode}4}
>> +Conditional branch instruction combined with a bit test-and-compare
>> +instruction. Operand 0 is a comparison operator. Operand 1 is the
>> +operand of the comparison. Operand 2 is the bit position of Operand 1 to test.
>> +Operand 3 is the @code{code_label} to jump to. @var{op} is one of @var{eq} or
>> +@var{ne}.
>> +
>
> The documentation still describes the old interface. Also, there are only 3
> operands now, rather than 4, so the optab name should end with 3.
>
>>  @cindex @code{cbranch@var{mode}4} instruction pattern
>>  @item @samp{cbranch@var{mode}4}
>>  Conditional branch instruction combined with a compare instruction.
>> diff --git a/gcc/dojump.h b/gcc/dojump.h
>> index e379cceb34bb1765cb575636e4c05b61501fc2cf..d1d79c490c420a805fe48d58740a79c1f25fb839 100644
>> --- a/gcc/dojump.h
>> +++ b/gcc/dojump.h
>> @@ -71,6 +71,10 @@ extern void jumpifnot (tree exp, rtx_code_label *label,
>>  extern void jumpifnot_1 (enum tree_code, tree, tree, rtx_code_label *,
>>                           profile_probability);
>>
>> +extern void do_compare_rtx_and_jump (rtx, rtx, enum rtx_code, int, tree,
>> +                                     machine_mode, rtx, rtx_code_label *,
>> +                                     rtx_code_label *, profile_probability);
>> +
>>  extern void do_compare_rtx_and_jump (rtx, rtx, enum rtx_code, int,
>>                                       machine_mode, rtx, rtx_code_label *,
>>                                       rtx_code_label *, profile_probability);
>> diff --git a/gcc/dojump.cc b/gcc/dojump.cc
>> index 2af0cd1aca3b6af13d5d8799094ee93f18022296..190324f36f1a31990f8c49bc8c0f45c23da5c31e 100644
>> --- a/gcc/dojump.cc
>> +++ b/gcc/dojump.cc
>> @@ -619,7 +619,7 @@ do_jump (tree exp, rtx_code_label *if_false_label,
>>          }
>>        do_compare_rtx_and_jump (temp, CONST0_RTX (GET_MODE (temp)),
>>                                 NE, TYPE_UNSIGNED (TREE_TYPE (exp)),
>> -                               GET_MODE (temp), NULL_RTX,
>> +                               exp, GET_MODE (temp), NULL_RTX,
>>                                 if_false_label, if_true_label, prob);
>>      }
>>
>> @@ -687,7 +687,7 @@ do_jump_by_parts_greater_rtx (scalar_int_mode mode, int unsignedp, rtx op0,
>>
>>        /* All but high-order word must be compared as unsigned. */
>>        do_compare_rtx_and_jump (op0_word, op1_word, code, (unsignedp || i > 0),
>> -                               word_mode, NULL_RTX, NULL, if_true_label,
>> +                               NULL, word_mode, NULL_RTX, NULL, if_true_label,
>>                                 prob);
>>
>>        /* Emit only one comparison for 0. Do not emit the last cond jump. */
>> @@ -695,8 +695,8 @@ do_jump_by_parts_greater_rtx (scalar_int_mode mode, int unsignedp, rtx op0,
>>          break;
>>
>>        /* Consider lower words only if these are equal. */
>> -      do_compare_rtx_and_jump (op0_word, op1_word, NE, unsignedp, word_mode,
>> -                               NULL_RTX, NULL, if_false_label,
>> +      do_compare_rtx_and_jump (op0_word, op1_word, NE, unsignedp, NULL,
>> +                               word_mode, NULL_RTX, NULL, if_false_label,
>>                                 prob.invert ());
>>      }
>>
>> @@ -755,7 +755,7 @@ do_jump_by_parts_zero_rtx (scalar_int_mode mode, rtx op0,
>>
>>    if (part != 0)
>>      {
>> -      do_compare_rtx_and_jump (part, const0_rtx, EQ, 1, word_mode,
>> +      do_compare_rtx_and_jump (part, const0_rtx, EQ, 1, NULL, word_mode,
>>                                 NULL_RTX, if_false_label, if_true_label, prob);
>>        return;
>>      }
>> @@ -766,7 +766,7 @@ do_jump_by_parts_zero_rtx (scalar_int_mode mode, rtx op0,
>>
>>    for (i = 0; i < nwords; i++)
>>      do_compare_rtx_and_jump (operand_subword_force (op0, i, mode),
>> -                             const0_rtx, EQ, 1, word_mode, NULL_RTX,
>> +                             const0_rtx, EQ, 1, NULL, word_mode, NULL_RTX,
>>                               if_false_label, NULL, prob);
>>
>>    if (if_true_label)
>> @@ -809,8 +809,8 @@ do_jump_by_parts_equality_rtx (scalar_int_mode mode, rtx op0, rtx op1,
>>
>>    for (i = 0; i < nwords; i++)
>>      do_compare_rtx_and_jump (operand_subword_force (op0, i, mode),
>> -                             operand_subword_force (op1, i, mode),
>> -                             EQ, 0, word_mode, NULL_RTX,
>> +                             operand_subword_force (op1, i, mode),
>> +                             EQ, 0, NULL, word_mode, NULL_RTX,
>>                               if_false_label, NULL, prob);
>>
>>    if (if_true_label)
>> @@ -962,6 +962,23 @@ do_compare_rtx_and_jump (rtx op0, rtx op1, enum rtx_code code, int unsignedp,
>>                           rtx_code_label *if_false_label,
>>                           rtx_code_label *if_true_label,
>>                           profile_probability prob)
>> +{
>> +  do_compare_rtx_and_jump (op0, op1, code, unsignedp, NULL, mode, size,
>> +                           if_false_label, if_true_label, prob);
>> +}
>> +
>> +/* Like do_compare_and_jump but expects the values to compare as two rtx's.
>> +   The decision as to signed or unsigned comparison must be made by the caller.
>> +
>> +   If MODE is BLKmode, SIZE is an RTX giving the size of the objects being
>> +   compared. */
>> +
>> +void
>> +do_compare_rtx_and_jump (rtx op0, rtx op1, enum rtx_code code, int unsignedp,
>> +                         tree val, machine_mode mode, rtx size,
>> +                         rtx_code_label *if_false_label,
>> +                         rtx_code_label *if_true_label,
>> +                         profile_probability prob)
>>  {
>>    rtx tem;
>>    rtx_code_label *dummy_label = NULL;
>> @@ -1177,8 +1194,10 @@ do_compare_rtx_and_jump (rtx op0, rtx op1, enum rtx_code code, int unsignedp,
>>              }
>>            else
>>              dest_label = if_false_label;
>> -          do_compare_rtx_and_jump (op0, op1, first_code, unsignedp, mode,
>> -                                   size, dest_label, NULL, first_prob);
>> +
>> +          do_compare_rtx_and_jump (op0, op1, first_code, unsignedp,
>> +                                   val, mode, size, dest_label, NULL,
>> +                                   first_prob);
>>          }
>>        /* For !and_them we want to split:
>>           if (x) goto t; // prob;
>> @@ -1192,8 +1211,9 @@ do_compare_rtx_and_jump (rtx op0, rtx op1, enum rtx_code code, int unsignedp,
>>        else
>>          {
>>            profile_probability first_prob = prob.split (cprob);
>> -          do_compare_rtx_and_jump (op0, op1, first_code, unsignedp, mode,
>> -                                   size, NULL, if_true_label, first_prob);
>> +          do_compare_rtx_and_jump (op0, op1, first_code, unsignedp,
>> +                                   val, mode, size, NULL,
>> +                                   if_true_label, first_prob);
>>            if (orig_code == NE && can_compare_p (UNEQ, mode, ccp_jump))
>>              {
>>                /* x != y can be split into x unord y || x ltgt y
>> @@ -1215,7 +1235,7 @@ do_compare_rtx_and_jump (rtx op0, rtx op1, enum rtx_code code, int unsignedp,
>>              }
>>          }
>>
>> -  emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp,
>> +  emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp, val,
>>                             if_true_label, prob);
>>  }
>>
>> @@ -1289,9 +1309,9 @@ do_compare_and_jump (tree treeop0, tree treeop1, enum rtx_code signed_code,
>>        op1 = new_op1;
>>      }
>>
>> -  do_compare_rtx_and_jump (op0, op1, code, unsignedp, mode,
>> -                           ((mode == BLKmode)
>> -                            ? expr_size (treeop0) : NULL_RTX),
>> +  do_compare_rtx_and_jump (op0, op1, code, unsignedp, treeop0, mode,
>> +                           ((mode == BLKmode)
>> +                            ? expr_size (treeop0) : NULL_RTX),
>>                             if_false_label, if_true_label, prob);
>>  }
>>
>> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
>> index 31b15fd3df5fa88119867a23d2abbed139a05115..303b4fd2def9278ddbc3d586103ac8274e73a982 100644
>> --- a/gcc/optabs.cc
>> +++ b/gcc/optabs.cc
>> @@ -46,6 +46,8 @@ along with GCC; see the file COPYING3. If not see
>>  #include "libfuncs.h"
>>  #include "internal-fn.h"
>>  #include "langhooks.h"
>> +#include "gimple.h"
>> +#include "ssa.h"
>>
>>  static void prepare_float_lib_cmp (rtx, rtx, enum rtx_code, rtx *,
>>                                     machine_mode *);
>> @@ -4623,7 +4625,8 @@ prepare_operand (enum insn_code icode, rtx x, int opnum, machine_mode mode,
>>
>>  static void
>>  emit_cmp_and_jump_insn_1 (rtx test, machine_mode mode, rtx label,
>> -                          profile_probability prob)
>> +                          direct_optab cmp_optab, profile_probability prob,
>> +                          bool test_branch)
>>  {
>>    machine_mode optab_mode;
>>    enum mode_class mclass;
>> @@ -4632,12 +4635,17 @@ emit_cmp_and_jump_insn_1 (rtx test, machine_mode mode, rtx label,
>>
>>    mclass = GET_MODE_CLASS (mode);
>>    optab_mode = (mclass == MODE_CC) ? CCmode : mode;
>> -  icode = optab_handler (cbranch_optab, optab_mode);
>> +  icode = optab_handler (cmp_optab, optab_mode);
>>
>>    gcc_assert (icode != CODE_FOR_nothing);
>> -  gcc_assert (insn_operand_matches (icode, 0, test));
>> -  insn = emit_jump_insn (GEN_FCN (icode) (test, XEXP (test, 0),
>> -                                          XEXP (test, 1), label));
>> +  gcc_assert (test_branch || insn_operand_matches (icode, 0, test));
>> +  if (test_branch)
>> +    insn = emit_jump_insn (GEN_FCN (icode) (XEXP (test, 0),
>> +                                            XEXP (test, 1), label));
>> +  else
>> +    insn = emit_jump_insn (GEN_FCN (icode) (test, XEXP (test, 0),
>> +                                            XEXP (test, 1), label));
>> +
>>    if (prob.initialized_p ()
>>        && profile_status_for_fn (cfun) != PROFILE_ABSENT
>>        && insn
>> @@ -4647,6 +4655,63 @@ emit_cmp_and_jump_insn_1 (rtx test, machine_mode mode, rtx label,
>>      add_reg_br_prob_note (insn, prob);
>>  }
>>
>> +/* Check to see if the supplied comparison in PTEST can be performed as a
>> +   bit-test-and-branch instead. VAL must contain the original tree
>> +   expression of the non-zero operand which will be used to rewrite the
>> +   comparison in PTEST.
>> +
>> +   Returns TRUE if operation succeeds and returns updated PMODE and PTEST,
>> +   else FALSE. */
>
> The function now returns an icode rather than true/false. I think it'd
> also be good to clarify what *PTEST means for the tbranch case. How about:
>
> /* PTEST points to a comparison that compares its first operand with zero.
>    Check to see if it can be performed as a bit-test-and-branch instead.
>    On success, return the instruction that performs the bit-and-test-and-branch

(bit-test-and-branch)

>    and replace the second operand of *PTEST with the bit number to test.
>    On failure, return CODE_FOR_nothing and leave *PTEST unchanged.
>
>    Note that the comparison described by *PTEST should not be taken
>    literally after a successful return. *PTEST is just a convenient
>    place to store the two operands of the bit-and-test.
>
>    VAL must contain the original tree expression for the first operand
>    of *PTEST. */
>
> Looks good to me otherwise.
>
> Thanks,
> Richard
>
>> +static enum insn_code
>> +validate_test_and_branch (tree val, rtx *ptest, machine_mode *pmode, optab *res)
>> +{
>> +  if (!val || TREE_CODE (val) != SSA_NAME)
>> +    return CODE_FOR_nothing;
>> +
>> +  machine_mode mode = TYPE_MODE (TREE_TYPE (val));
>> +  rtx test = *ptest;
>> +  direct_optab optab;
>> +
>> +  if (GET_CODE (test) == EQ)
>> +    optab = tbranch_eq_optab;
>> +  else if (GET_CODE (test) == NE)
>> +    optab = tbranch_ne_optab;
>> +  else
>> +    return CODE_FOR_nothing;
>> +
>> +  *res = optab;
>> +
>> +  /* If the target supports the testbit comparison directly, great. */
>> +  auto icode = direct_optab_handler (optab, mode);
>> +  if (icode == CODE_FOR_nothing)
>> +    return icode;
>> +
>> +  if (tree_zero_one_valued_p (val))
>> +    {
>> +      auto pos = BITS_BIG_ENDIAN ? GET_MODE_BITSIZE (mode) - 1 : 0;
>> +      XEXP (test, 1) = gen_int_mode (pos, mode);
>> +      *ptest = test;
>> +      *pmode = mode;
>> +      return icode;
>> +    }
>> +
>> +  wide_int wcst = get_nonzero_bits (val);
>> +  if (wcst == -1)
>> +    return CODE_FOR_nothing;
>> +
>> +  int bitpos;
>> +
>> +  if ((bitpos = wi::exact_log2 (wcst)) == -1)
>> +    return CODE_FOR_nothing;
>> +
>> +  auto pos = BITS_BIG_ENDIAN ? GET_MODE_BITSIZE (mode) - 1 - bitpos : bitpos;
>> +  XEXP (test, 1) = gen_int_mode (pos, mode);
>> +  *ptest = test;
>> +  *pmode = mode;
>> +  return icode;
>> +}
>> +
>>  /* Generate code to compare X with Y so that the condition codes are
>>     set and to jump to LABEL if the condition is true. If X is a
>>     constant and Y is not a constant, then the comparison is swapped to
>> @@ -4664,11 +4729,13 @@ emit_cmp_and_jump_insn_1 (rtx test, machine_mode mode, rtx label,
>>     It will be potentially converted into an unsigned variant based on
>>     UNSIGNEDP to select a proper jump instruction.
>>
>> -   PROB is the probability of jumping to LABEL. */
>> +   PROB is the probability of jumping to LABEL. If the comparison is against
>> +   zero then VAL contains the expression from which the non-zero RTL is
>> +   derived. */
>>
>>  void
>>  emit_cmp_and_jump_insns (rtx x, rtx y, enum rtx_code comparison, rtx size,
>> -                         machine_mode mode, int unsignedp, rtx label,
>> +                         machine_mode mode, int unsignedp, tree val, rtx label,
>>                           profile_probability prob)
>>  {
>>    rtx op0 = x, op1 = y;
>> @@ -4693,10 +4760,34 @@ emit_cmp_and_jump_insns (rtx x, rtx y, enum rtx_code comparison, rtx size,
>>
>>    prepare_cmp_insn (op0, op1, comparison, size, unsignedp, OPTAB_LIB_WIDEN,
>>                      &test, &mode);
>> -  emit_cmp_and_jump_insn_1 (test, mode, label, prob);
>> +
>> +  /* Check if we're comparing a truth type with 0, and if so check if
>> +     the target supports tbranch. */
>> +  machine_mode tmode = mode;
>> +  direct_optab optab;
>> +  if (op1 == CONST0_RTX (GET_MODE (op1))
>> +      && validate_test_and_branch (val, &test, &tmode,
>> +                                   &optab) != CODE_FOR_nothing)
>> +    {
>> +      emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob, true);
>> +      return;
>> +    }
>> +
>> +  emit_cmp_and_jump_insn_1 (test, mode, label, cbranch_optab, prob, false);
>>  }
>>
>> -
>>
>> +/* Overloaded version of emit_cmp_and_jump_insns in which VAL is unknown. */
>> +
>> +void
>> +emit_cmp_and_jump_insns (rtx x, rtx y, enum rtx_code comparison, rtx size,
>> +                         machine_mode mode, int unsignedp, rtx label,
>> +                         profile_probability prob)
>> +{
>> +  emit_cmp_and_jump_insns (x, y, comparison, size, mode, unsignedp, NULL,
>> +                           label, prob);
>> +}
>> +
>> +
>>  /* Emit a library call comparison between floating point X and Y.
>>     COMPARISON is the rtl operator to compare with (EQ, NE, GT, etc.). */
>>
>> diff --git a/gcc/optabs.def b/gcc/optabs.def
>> index a6db2342bed6baf13ecbd84112c8432c6972e6fe..3199b05e90d6b9b9c6fb3c0353db3db02321e964 100644
>> --- a/gcc/optabs.def
>> +++ b/gcc/optabs.def
>> @@ -220,6 +220,8 @@ OPTAB_D (reload_in_optab, "reload_in$a")
>>  OPTAB_D (reload_out_optab, "reload_out$a")
>>
>>  OPTAB_DC(cbranch_optab, "cbranch$a4", COMPARE)
>> +OPTAB_D (tbranch_eq_optab, "tbranch_eq$a4")
>> +OPTAB_D (tbranch_ne_optab, "tbranch_ne$a4")
>>  OPTAB_D (addcc_optab, "add$acc")
>>  OPTAB_D (negcc_optab, "neg$acc")
>>  OPTAB_D (notcc_optab, "not$acc")
>> diff --git a/gcc/optabs.h b/gcc/optabs.h
>> index cfd7c742d2d21b0539f5227c22a94f32c793d6f7..cd55604bc3d452d7e28c5530bb4793d481766f4f 100644
>> --- a/gcc/optabs.h
>> +++ b/gcc/optabs.h
>> @@ -268,6 +268,10 @@ extern void emit_cmp_and_jump_insns (rtx, rtx, enum rtx_code, rtx,
>>                                       machine_mode, int, rtx,
>>                                       profile_probability prob
>>                                          = profile_probability::uninitialized ());
>> +extern void emit_cmp_and_jump_insns (rtx, rtx, enum rtx_code, rtx,
>> +                                     machine_mode, int, tree, rtx,
>> +                                     profile_probability prob
>> +                                        = profile_probability::uninitialized ());
>>
>>  /* Generate code to indirectly jump to a location given in the rtx LOC. */
>>  extern void emit_indirect_jump (rtx);
>> diff --git a/gcc/tree.h b/gcc/tree.h
>> index a863d2e50e5ecafa3f5da4dda98d9637261d07a9..abedaa80a3983ebb6f9ac733b2eaa8d039688f0a 100644
>> --- a/gcc/tree.h
>> +++ b/gcc/tree.h
>> @@ -4726,6 +4726,7 @@ extern tree signed_or_unsigned_type_for (int, tree);
>>  extern tree signed_type_for (tree);
>>  extern tree unsigned_type_for (tree);
>>  extern bool is_truth_type_for (tree, tree);
>> +extern bool tree_zero_one_valued_p (tree);
>>  extern tree truth_type_for (tree);
>>  extern tree build_pointer_type_for_mode (tree, machine_mode, bool);
>>  extern tree build_pointer_type (tree);
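
For anyone following the bit-number selection in validate_test_and_branch quoted above, here is a minimal standalone sketch of the same idea. It is not GCC code and does not use GCC's wide_int or RTL types. The function name pick_test_bit and its parameters are hypothetical stand-ins: nonzero_mask plays the role of get_nonzero_bits (val), mode_bits of GET_MODE_BITSIZE (mode), and bits_big_endian of BITS_BIG_ENDIAN. It assumes values no wider than 64 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper, for illustration only (not part of the patch).
// Returns the bit number a bit-test-and-branch would test, or nullopt
// when the value is not known to have exactly one possibly-set bit.
std::optional<unsigned>
pick_test_bit (std::uint64_t nonzero_mask, unsigned mode_bits,
               bool bits_big_endian)
{
  // Assumption of this sketch: the mode fits in 64 bits.
  assert (mode_bits >= 1 && mode_bits <= 64);

  // Require exactly one possibly-nonzero bit; this mirrors the
  // wi::exact_log2 (wcst) == -1 rejection in the quoted patch.
  if (nonzero_mask == 0 || (nonzero_mask & (nonzero_mask - 1)) != 0)
    return std::nullopt;

  // Find the index of that single bit, counting from the least
  // significant bit.
  unsigned bitpos = 0;
  while ((nonzero_mask >> bitpos) != 1)
    ++bitpos;

  // A bit outside the mode cannot be tested in that mode.
  if (bitpos >= mode_bits)
    return std::nullopt;

  // With big-endian bit numbering, bit 0 names the most significant bit,
  // so the index is mirrored within the mode.
  return bits_big_endian ? mode_bits - 1 - bitpos : bitpos;
}
```

A zero-or-one value corresponds to nonzero_mask == 1, which gives bit 0 (or mode_bits - 1 under big-endian bit numbering), matching the tree_zero_one_valued_p branch of the patch. For a 32-bit mode and a mask of 0x10 the sketch yields 4, or 27 when bits_big_endian is true.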