From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by sourceware.org (Postfix) with ESMTPS id B8A5C3858C2C for ; Thu, 10 Mar 2022 07:09:29 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B8A5C3858C2C Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 89C191F381; Thu, 10 Mar 2022 07:09:17 +0000 (UTC) Received: from murzim.suse.de (murzim.suse.de [10.160.4.192]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id 8EF24A3B81; Thu, 10 Mar 2022 07:09:15 +0000 (UTC) Date: Thu, 10 Mar 2022 08:09:15 +0100 (CET) From: Richard Biener To: Jiufu Guo cc: Segher Boessenkool , gcc-patches@gcc.gnu.org, dje.gcc@gmail.com, jlaw@tachyum.com Subject: Re: [PATCH] Check if loading const from mem is faster In-Reply-To: <7pr17awob7.fsf@linux.ibm.com> Message-ID: <55n87321-7o53-ro4-on74-75qspq8q215@fhfr.qr> References: <20220222065313.2040127-1-guojiufu@linux.ibm.com> <70r5oq10-988r-3rns-356-o3s79o292nn0@fhfr.qr> <1d471fba-a966-3e90-92ce-ae4707fe53b6@linux.ibm.com> <20220223212749.GI614@gate.crashing.org> <20220228164511.GB614@gate.crashing.org> <7p35k1u4pi.fsf@linux.ibm.com> <20220302202416.GG614@gate.crashing.org> <7pilsvqref.fsf@linux.ibm.com> <7pcziwy9cp.fsf@linux.ibm.com> <9s5op1r4-9nps-r156-95r-os98o6474153@fhfr.qr> <7pwnh3wxkd.fsf@linux.ibm.com> <7pr17awob7.fsf@linux.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-9.7 required=5.0 tests=BAYES_00, DKIM_INVALID, DKIM_SIGNED, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Mar 2022 07:09:32 -0000 On Thu, 10 Mar 2022, Jiufu Guo wrote: > > Hi! > > Richard Biener writes: > > > On Wed, 9 Mar 2022, Jiufu Guo wrote: > > > >> > >> Hi! > >> > >> Richard Biener writes: > >> > >> > On Tue, 8 Mar 2022, Jiufu Guo wrote: > >> > > >> >> Jiufu Guo writes: > >> >> > >> >> Hi! > >> >> > >> >> > Hi Sehger, > >> >> > > >> >> > Segher Boessenkool writes: > >> >> > > >> >> >> On Tue, Mar 01, 2022 at 10:28:57PM +0800, Jiufu Guo wrote: > >> >> >>> Segher Boessenkool writes: > >> >> >>> > No. insn_cost is only for correct, existing instructions, not for > >> >> >>> > made-up nonsense. I created insn_cost precisely to get away from that > >> >> >>> > aspect of rtx_cost (and some other issues, like, it is incredibly hard > >> >> >>> > and cumbersome to write a correct rtx_cost). > >> >> >>> > >> >> >>> Thanks! The implementations of hook insn_cost are align with this > >> >> >>> design, they are checking insn's attributes and COSTS_N_INSNS. > >> >> >>> > >> >> >>> One question on the speciall case: > >> >> >>> For instruction: "r119:DI=0x100803004101001" > >> >> >>> Would we treat it as valid instruction? > >> >> >> > >> >> >> Currently we do, alternative 6 in *movdi_internal64: we allow any r<-n. > >> >> >> This is costed as 5 insns (cost=20). > >> >> >> > >> >> >> It generally is better to split things into patterns close to the > >> >> >> eventual machine isntructions as early as possible: all the more generic > >> >> >> optimisations can take advantage of that then. > >> >> > Get it! > >> >> >> > >> >> >>> A patch, which is attached the end of this mail, accepts > >> >> >>> "r119:DI=0x100803004101001" as input of insn_cost. > >> >> >>> In this patch, > >> >> >>> - A tmp instruction is generated via make_insn_raw. > >> >> >>> - A few calls to rtx_cost (in cse_insn) is replaced by insn_cost. > >> >> >>> - In hook of insn_cost, checking the special 'constant' instruction. > >> >> >>> Are these make sense? > >> >> >> > >> >> >> I'll review that patch inline. > >> >> > >> >> I drafted a new patch that replace rtx_cost with insn_cost for cse.cc. > >> >> Different from the previous partial patch, this patch replaces all usage > >> >> of rtx_cost. It may be better/aggressive than previous one. > >> > > >> > I think there's no advantage for using insn_cost over rtx_cost for > >> > the simple SET case. > >> > >> Thanks for your comments and raise this concern. > >> > >> For those targets which do not implement insn_cost, insn_cost calls > >> rtx_cost through pattern_cost, then insn_cost is equal to rtx_cost. > >> > >> While, for those targets which have insn_cost, it seems insn_cost would > >> be better(or say more accurate/consistent?) than rtx_cost. Since: > >> - insn_cost recog the insn first, and compute cost through something > > > > target hooks are expected to call recog on the insn but the generic > > fallback does not!? Or do you say a target _could_ call recog? > > I think it would be valid to only expect recognized insns here > > and thus cse.cc obligation would be to call regoc on the "fake" > > instruction which then raises the obvious issue whether you should > > ever call recog on something "fake". > Thanks Richard! I also feel this is a main point of insn_cost. > From my understanding: it would be better to let insn_cost check > the valid recognizable insns; this would be the major purpose of > insn_cost. While, I'm also wondering, we may let it go for 'fake' > instruction for current implementations (e.g. arm_insn_cost) which > behaviors to walk/check pattern. Note for the CSE case you'll always end up with the single move pattern so it's somewhat pointless to do the recog & insn_cost dance there. For move cost using rtx_cost (SET, ..) should be good enough. One could argue we should standardize a (wrapping) API like move_cost (enum machine_mode mode, rtx to, rtx from), with to/from optional (if omitted a REG of MODE). But the existing rtx_cost target hook implementation should be sufficient to handle it, without building a fake insn and without doing (the pointless, if not failing) recog process. Richard. > > > > I also see that rs6000_insn_cost does > > > > static int > > rs6000_insn_cost (rtx_insn *insn, bool speed) > > { > > if (recog_memoized (insn) < 0) > > return 0; > > > > so not recognized insns become quite cheap - it looks like > > insn_cost has no documented failure mode and initial implementors > > chickened out asserting that an insn is / can be recognized. > > In fact I'd assert that INSN_CODE (insn) >= 0 simply because there's > > no failure mode and the _caller_ should have made sure the > > insn can be recognized. That said, the insn_cost should do that. > Thanks! Using the assert would help to catch un-recog failures. > > > (is INSN_CODE () == 0 a real thing?) > 0 is specialy, it is some thing like CODE_FOR_nothing. :-) > > > > >> (like length/cost attributes from .md file) for the 'machine insn'. > >> - rtx_cost estimates the cost through analyzing the 'rtx content'. > >> The accurate estimation relates to the context. > >> > >> For a special example: "%r100 = C", as a previous patch, by tunning > >> target's rtx_cost hook, cost could be computed according to the value > >> of C. insn_cost may just model the cost in the define of the machine > >> instruction. > >> > >> These reasons are my initial thoughts. Segher may have better > >> explain. :-) > > > > OK, so in theory the targets insn_cost can use insn attributes > > which allows to keep the cost parts of an instruction close to the > > instruction patterns which is probably a good thing for > > maintainability. But as you point out this requires recognizing > > of possibly generated instructions. And before reload it has > > the issue that the alternative is not determined which means the > > true cost cannot be determined without walking the pattern, > > like on x86 where many instruction operands have both register > > and memory alternatives. > Right, when there is no alternative fits for true cost, it may have to > check the pattern like 'rtx_cost' then. :-) > > > BR, > Jiufu > > > > >> To replace rtx_cost with insn_cost, this patch build a SET instruction: > >> "%r = rtx_expr", then using "%r = rtx_expr" from insn_cost to simulate > >> the cost of "rtx_expr" from rtx_cost. > >> > >> > >> BR, > >> Jiufu > >> > >> > > >> > Richard. > >> > > >> >> With this patch, bootstrap pass. > >> >> From regtest, only output of fusion-p10-ldcmpi.c is changed, and the > >> >> change seems as expected. > >> >> > >> >> > >> >> BR, > >> >> Jiufu > >> >> > >> >> diff --git a/gcc/cse.cc b/gcc/cse.cc > >> >> index a18b599d324..e623ad298db 100644 > >> >> --- a/gcc/cse.cc > >> >> +++ b/gcc/cse.cc > >> >> @@ -262,6 +262,9 @@ static struct qty_table_elem *qty_table; > >> >> static rtx_insn *this_insn; > >> >> static bool optimize_this_for_speed_p; > >> >> > >> >> +/* Used for insn_cost. */ > >> >> +static rtx_insn *estimate_insn; > >> >> + > >> >> /* Index by register number, gives the number of the next (or > >> >> previous) register in the chain of registers sharing the same > >> >> value. > >> >> @@ -445,7 +448,7 @@ struct table_elt > >> >> /* Compute cost of X, as stored in the `cost' field of a table_elt. Fixed > >> >> hard registers and pointers into the frame are the cheapest with a cost > >> >> of 0. Next come pseudos with a cost of one and other hard registers with > >> >> - a cost of 2. Aside from these special cases, call `rtx_cost'. */ > >> >> + a cost of 2. Aside from these special cases, call `insn_cost'. */ > >> >> > >> >> #define CHEAP_REGNO(N) \ > >> >> (REGNO_PTR_FRAME_P (N) \ > >> >> @@ -698,18 +701,33 @@ preferable (int cost_a, int regcost_a, int cost_b, int regcost_b) > >> >> from COST macro to keep it simple. */ > >> >> > >> >> static int > >> >> -notreg_cost (rtx x, machine_mode mode, enum rtx_code outer, int opno) > >> >> +notreg_cost (rtx x, machine_mode mode, enum rtx_code /*outer*/, int /*opno*/) > >> >> { > >> >> scalar_int_mode int_mode, inner_mode; > >> >> - return ((GET_CODE (x) == SUBREG > >> >> - && REG_P (SUBREG_REG (x)) > >> >> - && is_int_mode (mode, &int_mode) > >> >> - && is_int_mode (GET_MODE (SUBREG_REG (x)), &inner_mode) > >> >> - && GET_MODE_SIZE (int_mode) < GET_MODE_SIZE (inner_mode) > >> >> - && subreg_lowpart_p (x) > >> >> - && TRULY_NOOP_TRUNCATION_MODES_P (int_mode, inner_mode)) > >> >> - ? 0 > >> >> - : rtx_cost (x, mode, outer, opno, optimize_this_for_speed_p) * 2); > >> >> + if (GET_CODE (x) == SUBREG && REG_P (SUBREG_REG (x)) > >> >> + && is_int_mode (mode, &int_mode) > >> >> + && is_int_mode (GET_MODE (SUBREG_REG (x)), &inner_mode) > >> >> + && GET_MODE_SIZE (int_mode) < GET_MODE_SIZE (inner_mode) > >> >> + && subreg_lowpart_p (x) > >> >> + && TRULY_NOOP_TRUNCATION_MODES_P (int_mode, inner_mode)) > >> >> + return 0; > >> >> + > >> >> + if (estimate_insn == NULL) > >> >> + { > >> >> + estimate_insn = make_insn_raw ( > >> >> + gen_rtx_SET (gen_rtx_REG (mode, LAST_VIRTUAL_REGISTER + 1), x)); > >> >> + SET_PREV_INSN (estimate_insn) = NULL_RTX; > >> >> + SET_NEXT_INSN (estimate_insn) = NULL_RTX; > >> >> + INSN_LOCATION (estimate_insn) = 0; > >> >> + } > >> >> + else > >> >> + { > >> >> + /* Update for new context. */ > >> >> + INSN_CODE (estimate_insn) = -1; > >> >> + PUT_MODE (SET_DEST (PATTERN (estimate_insn)), mode); > >> >> + SET_SRC (PATTERN (estimate_insn)) = x; > >> >> + } > >> >> + return insn_cost (estimate_insn, optimize_this_for_speed_p); > >> >> } > >> >> > >> >> > >> >> @@ -6667,6 +6685,7 @@ cse_main (rtx_insn *f ATTRIBUTE_UNUSED, int nregs) > >> >> > >> >> init_recog (); > >> >> init_alias_analysis (); > >> >> + estimate_insn = NULL; > >> >> > >> >> reg_eqv_table = XNEWVEC (struct reg_eqv_elem, nregs); > >> >> > >> >> > >> > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)