From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id C81F93857C41 for ; Fri, 11 Mar 2022 06:33:45 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C81F93857C41 Received: from pps.filterd (m0098419.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 22B4F303008186; Fri, 11 Mar 2022 06:33:44 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0b-001b2d01.pphosted.com with ESMTP id 3eqg9e5kks-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 11 Mar 2022 06:33:44 +0000 Received: from m0098419.ppops.net (m0098419.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 22B5pHuH021245; Fri, 11 Mar 2022 06:33:43 GMT Received: from ppma04dal.us.ibm.com (7a.29.35a9.ip4.static.sl-reverse.com [169.53.41.122]) by mx0b-001b2d01.pphosted.com with ESMTP id 3eqg9e5kkg-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 11 Mar 2022 06:33:43 +0000 Received: from pps.filterd (ppma04dal.us.ibm.com [127.0.0.1]) by ppma04dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 22B6NLeF015984; Fri, 11 Mar 2022 06:33:42 GMT Received: from b01cxnp22033.gho.pok.ibm.com (b01cxnp22033.gho.pok.ibm.com [9.57.198.23]) by ppma04dal.us.ibm.com with ESMTP id 3epb9cxupp-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 11 Mar 2022 06:33:42 +0000 Received: from b01ledav005.gho.pok.ibm.com (b01ledav005.gho.pok.ibm.com [9.57.199.110]) by b01cxnp22033.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 22B6XfiZ26345800 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 11 Mar 2022 06:33:41 GMT Received: from b01ledav005.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id D45F8AE062; Fri, 11 Mar 2022 06:33:41 +0000 (GMT) Received: from b01ledav005.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 62D58AE05F; Fri, 11 Mar 2022 06:33:41 +0000 (GMT) Received: from perkins (unknown [9.3.90.38]) by b01ledav005.gho.pok.ibm.com (Postfix) with ESMTPS; Fri, 11 Mar 2022 06:33:41 +0000 (GMT) From: Jiufu Guo To: Richard Biener Cc: Segher Boessenkool , gcc-patches@gcc.gnu.org, dje.gcc@gmail.com, jlaw@tachyum.com Subject: Re: [PATCH] Check if loading const from mem is faster References: <20220222065313.2040127-1-guojiufu@linux.ibm.com> <70r5oq10-988r-3rns-356-o3s79o292nn0@fhfr.qr> <1d471fba-a966-3e90-92ce-ae4707fe53b6@linux.ibm.com> <20220223212749.GI614@gate.crashing.org> <20220228164511.GB614@gate.crashing.org> <7p35k1u4pi.fsf@linux.ibm.com> <20220302202416.GG614@gate.crashing.org> <7pilsvqref.fsf@linux.ibm.com> <7pcziwy9cp.fsf@linux.ibm.com> <9s5op1r4-9nps-r156-95r-os98o6474153@fhfr.qr> <7pwnh3wxkd.fsf@linux.ibm.com> <7pr17awob7.fsf@linux.ibm.com> <55n87321-7o53-ro4-on74-75qspq8q215@fhfr.qr> Date: Fri, 11 Mar 2022 14:33:38 +0800 In-Reply-To: <55n87321-7o53-ro4-on74-75qspq8q215@fhfr.qr> (Richard Biener's message of "Thu, 10 Mar 2022 08:09:15 +0100 (CET)") Message-ID: <7pfsnpvvzx.fsf@linux.ibm.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-TM-AS-GCONF: 00 X-Proofpoint-GUID: SyPTxidhsAqqdgjMADGjf2pcZPQcXlcA X-Proofpoint-ORIG-GUID: JYoOz2Q-R7oo2b8JzL7PU0falhD4VJUu X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.816,Hydra:6.0.425,FMLib:17.11.64.514 definitions=2022-03-10_09,2022-03-11_01,2022-02-23_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 lowpriorityscore=0 malwarescore=0 clxscore=1015 adultscore=0 spamscore=0 impostorscore=0 bulkscore=0 suspectscore=0 mlxscore=0 mlxlogscore=999 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2202240000 definitions=main-2203110027 X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H5, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Mar 2022 06:33:47 -0000 Hi! Richard Biener writes: > On Thu, 10 Mar 2022, Jiufu Guo wrote: > >> >> Hi! >> >> Richard Biener writes: >> >> > On Wed, 9 Mar 2022, Jiufu Guo wrote: >> > >> >> >> >> Hi! >> >> >> >> Richard Biener writes: >> >> >> >> > On Tue, 8 Mar 2022, Jiufu Guo wrote: >> >> > >> >> >> Jiufu Guo writes: >> >> >> >> >> >> Hi! >> >> >> >> >> >> > Hi Sehger, >> >> >> > >> >> >> > Segher Boessenkool writes: >> >> >> > >> >> >> >> On Tue, Mar 01, 2022 at 10:28:57PM +0800, Jiufu Guo wrote: >> >> >> >>> Segher Boessenkool writes: >> >> >> >>> > No. insn_cost is only for correct, existing instructions, not for >> >> >> >>> > made-up nonsense. I created insn_cost precisely to get away from that >> >> >> >>> > aspect of rtx_cost (and some other issues, like, it is incredibly hard >> >> >> >>> > and cumbersome to write a correct rtx_cost). >> >> >> >>> >> >> >> >>> Thanks! The implementations of hook insn_cost are align with this >> >> >> >>> design, they are checking insn's attributes and COSTS_N_INSNS. >> >> >> >>> >> >> >> >>> One question on the speciall case: >> >> >> >>> For instruction: "r119:DI=0x100803004101001" >> >> >> >>> Would we treat it as valid instruction? >> >> >> >> >> >> >> >> Currently we do, alternative 6 in *movdi_internal64: we allow any r<-n. >> >> >> >> This is costed as 5 insns (cost=20). >> >> >> >> >> >> >> >> It generally is better to split things into patterns close to the >> >> >> >> eventual machine isntructions as early as possible: all the more generic >> >> >> >> optimisations can take advantage of that then. >> >> >> > Get it! >> >> >> >> >> >> >> >>> A patch, which is attached the end of this mail, accepts >> >> >> >>> "r119:DI=0x100803004101001" as input of insn_cost. >> >> >> >>> In this patch, >> >> >> >>> - A tmp instruction is generated via make_insn_raw. >> >> >> >>> - A few calls to rtx_cost (in cse_insn) is replaced by insn_cost. >> >> >> >>> - In hook of insn_cost, checking the special 'constant' instruction. >> >> >> >>> Are these make sense? >> >> >> >> >> >> >> >> I'll review that patch inline. >> >> >> >> >> >> I drafted a new patch that replace rtx_cost with insn_cost for cse.cc. >> >> >> Different from the previous partial patch, this patch replaces all usage >> >> >> of rtx_cost. It may be better/aggressive than previous one. >> >> > >> >> > I think there's no advantage for using insn_cost over rtx_cost for >> >> > the simple SET case. >> >> >> >> Thanks for your comments and raise this concern. >> >> >> >> For those targets which do not implement insn_cost, insn_cost calls >> >> rtx_cost through pattern_cost, then insn_cost is equal to rtx_cost. >> >> >> >> While, for those targets which have insn_cost, it seems insn_cost would >> >> be better(or say more accurate/consistent?) than rtx_cost. Since: >> >> - insn_cost recog the insn first, and compute cost through something >> > >> > target hooks are expected to call recog on the insn but the generic >> > fallback does not!? Or do you say a target _could_ call recog? >> > I think it would be valid to only expect recognized insns here >> > and thus cse.cc obligation would be to call regoc on the "fake" >> > instruction which then raises the obvious issue whether you should >> > ever call recog on something "fake". >> Thanks Richard! I also feel this is a main point of insn_cost. >> From my understanding: it would be better to let insn_cost check >> the valid recognizable insns; this would be the major purpose of >> insn_cost. While, I'm also wondering, we may let it go for 'fake' >> instruction for current implementations (e.g. arm_insn_cost) which >> behaviors to walk/check pattern. > > Note for the CSE case you'll always end up with the single move > pattern so it's somewhat pointless to do the recog & insn_cost > dance there. For move cost using rtx_cost (SET, ..) should > be good enough. One could argue we should standardize a (wrapping) > API like move_cost (enum machine_mode mode, rtx to, rtx from), > with to/from optional (if omitted a REG of MODE). But the existing > rtx_cost target hook implementation should be sufficient to handle > it, without building a fake insn and without doing (the pointless, > if not failing) recog process. Thanks so much for your comments and suggestions! Using rtx_cost is able to handle those things in cse.cc. For insn_cost, it is good for checking an instruction. To simulate 'rtx_cost' using insn_cost, there is excess work: 'recog' on the setting of rtx expr and alternative estimation. I just wondering we may prefer to use insn_cost for its consistency and accuracy. :-) So, the patch is prepared. BR, Jiufu > > Richard. > >> > >> > I also see that rs6000_insn_cost does >> > >> > static int >> > rs6000_insn_cost (rtx_insn *insn, bool speed) >> > { >> > if (recog_memoized (insn) < 0) >> > return 0; >> > >> > so not recognized insns become quite cheap - it looks like >> > insn_cost has no documented failure mode and initial implementors >> > chickened out asserting that an insn is / can be recognized. >> > In fact I'd assert that INSN_CODE (insn) >= 0 simply because there's >> > no failure mode and the _caller_ should have made sure the >> > insn can be recognized. That said, the insn_cost should do that. >> Thanks! Using the assert would help to catch un-recog failures. >> >> > (is INSN_CODE () == 0 a real thing?) >> 0 is specialy, it is some thing like CODE_FOR_nothing. :-) >> >> > >> >> (like length/cost attributes from .md file) for the 'machine insn'. >> >> - rtx_cost estimates the cost through analyzing the 'rtx content'. >> >> The accurate estimation relates to the context. >> >> >> >> For a special example: "%r100 = C", as a previous patch, by tunning >> >> target's rtx_cost hook, cost could be computed according to the value >> >> of C. insn_cost may just model the cost in the define of the machine >> >> instruction. >> >> >> >> These reasons are my initial thoughts. Segher may have better >> >> explain. :-) >> > >> > OK, so in theory the targets insn_cost can use insn attributes >> > which allows to keep the cost parts of an instruction close to the >> > instruction patterns which is probably a good thing for >> > maintainability. But as you point out this requires recognizing >> > of possibly generated instructions. And before reload it has >> > the issue that the alternative is not determined which means the >> > true cost cannot be determined without walking the pattern, >> > like on x86 where many instruction operands have both register >> > and memory alternatives. >> Right, when there is no alternative fits for true cost, it may have to >> check the pattern like 'rtx_cost' then. :-) >> >> >> BR, >> Jiufu >> >> > >> >> To replace rtx_cost with insn_cost, this patch build a SET instruction: >> >> "%r = rtx_expr", then using "%r = rtx_expr" from insn_cost to simulate >> >> the cost of "rtx_expr" from rtx_cost. >> >> >> >> >> >> BR, >> >> Jiufu >> >> >> >> > >> >> > Richard. >> >> > >> >> >> With this patch, bootstrap pass. >> >> >> From regtest, only output of fusion-p10-ldcmpi.c is changed, and the >> >> >> change seems as expected. >> >> >> >> >> >> >> >> >> BR, >> >> >> Jiufu >> >> >> >> >> >> diff --git a/gcc/cse.cc b/gcc/cse.cc >> >> >> index a18b599d324..e623ad298db 100644 >> >> >> --- a/gcc/cse.cc >> >> >> +++ b/gcc/cse.cc >> >> >> @@ -262,6 +262,9 @@ static struct qty_table_elem *qty_table; >> >> >> static rtx_insn *this_insn; >> >> >> static bool optimize_this_for_speed_p; >> >> >> >> >> >> +/* Used for insn_cost. */ >> >> >> +static rtx_insn *estimate_insn; >> >> >> + >> >> >> /* Index by register number, gives the number of the next (or >> >> >> previous) register in the chain of registers sharing the same >> >> >> value. >> >> >> @@ -445,7 +448,7 @@ struct table_elt >> >> >> /* Compute cost of X, as stored in the `cost' field of a table_elt. Fixed >> >> >> hard registers and pointers into the frame are the cheapest with a cost >> >> >> of 0. Next come pseudos with a cost of one and other hard registers with >> >> >> - a cost of 2. Aside from these special cases, call `rtx_cost'. */ >> >> >> + a cost of 2. Aside from these special cases, call `insn_cost'. */ >> >> >> >> >> >> #define CHEAP_REGNO(N) \ >> >> >> (REGNO_PTR_FRAME_P (N) \ >> >> >> @@ -698,18 +701,33 @@ preferable (int cost_a, int regcost_a, int cost_b, int regcost_b) >> >> >> from COST macro to keep it simple. */ >> >> >> >> >> >> static int >> >> >> -notreg_cost (rtx x, machine_mode mode, enum rtx_code outer, int opno) >> >> >> +notreg_cost (rtx x, machine_mode mode, enum rtx_code /*outer*/, int /*opno*/) >> >> >> { >> >> >> scalar_int_mode int_mode, inner_mode; >> >> >> - return ((GET_CODE (x) == SUBREG >> >> >> - && REG_P (SUBREG_REG (x)) >> >> >> - && is_int_mode (mode, &int_mode) >> >> >> - && is_int_mode (GET_MODE (SUBREG_REG (x)), &inner_mode) >> >> >> - && GET_MODE_SIZE (int_mode) < GET_MODE_SIZE (inner_mode) >> >> >> - && subreg_lowpart_p (x) >> >> >> - && TRULY_NOOP_TRUNCATION_MODES_P (int_mode, inner_mode)) >> >> >> - ? 0 >> >> >> - : rtx_cost (x, mode, outer, opno, optimize_this_for_speed_p) * 2); >> >> >> + if (GET_CODE (x) == SUBREG && REG_P (SUBREG_REG (x)) >> >> >> + && is_int_mode (mode, &int_mode) >> >> >> + && is_int_mode (GET_MODE (SUBREG_REG (x)), &inner_mode) >> >> >> + && GET_MODE_SIZE (int_mode) < GET_MODE_SIZE (inner_mode) >> >> >> + && subreg_lowpart_p (x) >> >> >> + && TRULY_NOOP_TRUNCATION_MODES_P (int_mode, inner_mode)) >> >> >> + return 0; >> >> >> + >> >> >> + if (estimate_insn == NULL) >> >> >> + { >> >> >> + estimate_insn = make_insn_raw ( >> >> >> + gen_rtx_SET (gen_rtx_REG (mode, LAST_VIRTUAL_REGISTER + 1), x)); >> >> >> + SET_PREV_INSN (estimate_insn) = NULL_RTX; >> >> >> + SET_NEXT_INSN (estimate_insn) = NULL_RTX; >> >> >> + INSN_LOCATION (estimate_insn) = 0; >> >> >> + } >> >> >> + else >> >> >> + { >> >> >> + /* Update for new context. */ >> >> >> + INSN_CODE (estimate_insn) = -1; >> >> >> + PUT_MODE (SET_DEST (PATTERN (estimate_insn)), mode); >> >> >> + SET_SRC (PATTERN (estimate_insn)) = x; >> >> >> + } >> >> >> + return insn_cost (estimate_insn, optimize_this_for_speed_p); >> >> >> } >> >> >> >> >> >> >> >> >> @@ -6667,6 +6685,7 @@ cse_main (rtx_insn *f ATTRIBUTE_UNUSED, int nregs) >> >> >> >> >> >> init_recog (); >> >> >> init_alias_analysis (); >> >> >> + estimate_insn = NULL; >> >> >> >> >> >> reg_eqv_table = XNEWVEC (struct reg_eqv_elem, nregs); >> >> >> >> >> >> >> >> >>