From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 111192 invoked by alias); 22 Jun 2016 19:11:00 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 111180 invoked by uid 89); 22 Jun 2016 19:10:59 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.8 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD,SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.2 spammy=sch, determines, power8, scheduled X-HELO: gate.crashing.org Received: from gate.crashing.org (HELO gate.crashing.org) (63.228.1.57) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-SHA encrypted) ESMTPS; Wed, 22 Jun 2016 19:10:49 +0000 Received: from gate.crashing.org (localhost.localdomain [127.0.0.1]) by gate.crashing.org (8.14.1/8.13.8) with ESMTP id u5MJAjtT028789; Wed, 22 Jun 2016 14:10:45 -0500 Received: (from segher@localhost) by gate.crashing.org (8.14.1/8.14.1/Submit) id u5MJAilL028788; Wed, 22 Jun 2016 14:10:44 -0500 Date: Wed, 22 Jun 2016 19:11:00 -0000 From: Segher Boessenkool To: Pat Haugen Cc: GCC Patches , David Edelsohn Subject: Re: [PATCH, rs6000] Scheduling update Message-ID: <20160622191044.GA15865@gate.crashing.org> References: <57697D36.3040704@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <57697D36.3040704@linux.vnet.ibm.com> User-Agent: Mutt/1.4.2.3i X-IsSubscribed: yes X-SW-Source: 2016-06/txt/msg01639.txt.bz2 Hi Pat, On Tue, Jun 21, 2016 at 12:45:26PM -0500, Pat Haugen wrote: > 2016-06-21 Pat Haugen > > * config/rs6000/power8.md (power8-fp): Include dfp type. > * config/rs6000/power6.md (power6-fp): Likewise. Please put the files in the changelog in some logical order, even if your patch tool is a bit dumb. > * config/rs6000/htm.md (various insns): Change type atribute to > htmsimple and set power9_alu2 appropriately. "attribute", trailing space. The "power9_alu2" attribute is writing part of the scheduling description inside the machine description proper. Can this be reduced, maybe by adding an attribute describing something about the insns that makes them be handled by the alu2? I realise it isn't all so regular :-( > (rs6000_option_override_internal): Remove temporary code setting > tuning to power8. Don't set rs6000_sched_groups for power9. Two spaces after full stop. > (divCnt, vec_load_pendulum): New variables. camelCase? > (_rs6000_sched_context, rs6000_init_sched_context, > rs6000_set_sched_context): Handle context save/restore of new > variables. Pre-existing, but we shouldn't use names starting with underscore+lowercase. > Index: config/rs6000/htm.md > =================================================================== > --- config/rs6000/htm.md (revision 237621) > +++ config/rs6000/htm.md (working copy) > @@ -72,7 +72,8 @@ (define_insn "*tabort" > (set (match_operand:BLK 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))] > "TARGET_HTM" > "tabort. %0" > - [(set_attr "type" "htm") > + [(set_attr "type" "htmsimple") > + (set_attr "power9_alu2" "yes") > (set_attr "length" "4")]) What determines if an insn is htm or htmsimple? > +(define_cpu_unit "x0_power9,x1_power9,xa0_power9,xa1_power9,\ > + x2_power9,x3_power9,xb0_power9,xb1_power9, > + br0_power9,br1_power9" "power9dsp") One lines has a backslash and one does not. None are needed I think? > +; The xa0/xa1 units really represent the 3rd dispatch port for a superslice but > +; are listed as separate units to allow those insns that preclude its use to > +; still be scheduled two to a superslice while reserving the 3rd slot. The > +; same applies for xb0/xb1. Two spaces after a full stop. > +; Any execution slice dispatch Trailing space. > +; Superslice > +(define_reservation "DU_super_power9" > + "x0_power9+x1_power9|x2_power9+x3_power9") This needs parens around the alternatives? Or is it superfluous in all the other cases that use it? > +(define_reservation "LSU_pair_power9" > + "lsu0_power9+lsu1_power9|lsu1_power9+lsu2_power9|\ > + lsu2_power9+lsu3_power9|lsu3_power9+lsu1_power9") The 3+1 looks strange, please check (we've talked about that). > +; 2 cycle FP ops > +(define_attr "power9_fp_2cyc" "no,yes" > + (cond [(eq_attr "mnemonic" "fabs,fcpsgn,fmr,fmrgow,fnabs,fneg,\ > + xsabsdp,xscpsgndp,xsnabsdp,xsnegdp,\ > + xsabsqp,xscpsgnqp,xsnabsqp,xsnegqp") > + (const_string "yes")] > + (const_string "no"))) Eww. Can we have an attribute for the FP move instructions, instead? Maybe a value "fpmove" for the "type", even? > +; Quad-precision FP ops, execute in DFU > +(define_attr "power9_qp" "no,yes" > + (if_then_else (ior (match_operand:KF 0 "" "") > + (match_operand:TF 0 "" "") > + (match_operand:KF 1 "" "") > + (match_operand:TF 1 "" "")) > + (const_string "yes") > + (const_string "no"))) (The "" are not needed I think). This perhaps could be better handled with the "size" attribute. > +(define_insn_reservation "power9-load-ext" 6 > + (and (eq_attr "type" "load") > + (eq_attr "sign_extend" "yes") > + (eq_attr "update" "no") > + (eq_attr "cpu" "power9")) > + "DU_C2_power9,LSU_power9") So you do not describe the units used after the first cycle? Why is that, to keep the size of the automaton down? > +(define_insn_reservation "power9-fpload-double" 4 > + (and (eq_attr "type" "fpload") > + (eq_attr "update" "no") > + (match_operand:DF 0 "" "") > + (eq_attr "cpu" "power9")) > + "DU_slice_3_power9,LSU_power9") Using match_operand here is asking for trouble. "size", and you can default that for "fpload" insns, and document there that it looks at the mode of operands[0] for fpload? > +; Store data can issue 2 cycles after AGEN issue, 3 cycles for vector store > +(define_insn_reservation "power9-store" 0 > + (and (eq_attr "type" "store") > + (not (and (eq_attr "update" "yes") > + (eq_attr "indexed" "yes"))) > + (eq_attr "cpu" "power9")) > + "DU_slice_3_power9,LSU_power9") That should be +(define_insn_reservation "power9-store" 0 + (and (eq_attr "type" "store") + (eq_attr "update" "no") + (eq_attr "indexed" "no") + (eq_attr "cpu" "power9")) + "DU_slice_3_power9,LSU_power9") > +; Fixed point ops > + > +; Most ALU insns are simple 2 cycl, including record form "cycl"? > +(define_insn_reservation "power9-alu" 2 > + (and (ior (eq_attr "type" "add,cmp,exts,integer,logical,trap,isel") > + (and (eq_attr "type" "insert,shift") > + (eq_attr "dot" "no"))) > + (eq_attr "cpu" "power9")) > + "DU_any_power9,VSU_power9") > + > +; Record form rotate/shift are cracked > +(define_insn_reservation "power9-cracked-alu" 2 > + (and (eq_attr "type" "insert,shift") > + (eq_attr "dot" "yes") > + (eq_attr "cpu" "power9")) > + "DU_C2_power9,VSU_power9") > +; 4 cycle CR latency Trailing space. > +(define_insn_reservation "power9-alu2" 3 > + (and (eq_attr "type" "cntlz,popcnt") > + (eq_attr "cpu" "power9")) > + "DU_any_power9,VSU_power9") These alu2 insns are nice and clean and easy, maybe we can do something similar for (at least some of) the others. > +; FP div/sqrt are executed in VSU slices, not pipelined for other divides, but for the > +; most part do not block pipelined ops. Line too long. Maybe you can rewrite it, it's not super clear ;-) > +(define_insn_reservation "power9-vecnormal" 7 > + (and (eq_attr "type" "vecfloat,vecdouble") > + (eq_attr "power9_fp_2cyc" "no") > + (eq_attr "power9_alu2" "no") > + (eq_attr "power9_qp" "no") > + (eq_attr "cpu" "power9")) > + "DU_super_power9,VSU_super_power9") So this has all three "eww"s :-) > +(define_insn_reservation "power9-vecnormal2" 2 > + (and (eq_attr "type" "vecfloat") > + (eq_attr "power9_fp_2cyc" "yes") > + (eq_attr "cpu" "power9")) > + "DU_super_power9,VSU_super_power9") > + > +(define_insn_reservation "power9-vecnormal-alu2" 3 > + (and (eq_attr "type" "vecfloat,vecdouble") > + (eq_attr "power9_alu2" "yes") > + (eq_attr "cpu" "power9")) > + "DU_super_power9,VSU_super_power9") > + > +(define_insn_reservation "power9-qp" 12 > + (and (eq_attr "type" "vecfloat,vecdouble") > + (eq_attr "power9_qp" "yes") > + (eq_attr "cpu" "power9")) > + "DU_super_power9,dfu_power9") ... luckily they are exclusive. > +; Crytpo Unit Typo. > /* The following variable value is the last issued insn. */ > > -static rtx last_scheduled_insn; > +static rtx_insn * last_scheduled_insn; No space after *. > +/* The following variables are used to keep track of various scheduling > + information. */ > +static int divCnt; > +static int vec_load_pendulum; "divCnt" is typographically wrong, and not very descriptive either (and neither is the comment btw). It's not like this variable is used so often, you can make it a bit longer name if that helps. > @@ -30144,6 +30134,8 @@ rs6000_adjust_cost (rtx_insn *insn, rtx > break; > } > } > + /* Fall through, no cost for output dependency. */ Two spaces after full stop. > @@ -30786,6 +30801,10 @@ static int > rs6000_sched_reorder2 (FILE *dump, int sched_verbose, rtx_insn **ready, > int *pn_ready, int clock_var ATTRIBUTE_UNUSED) We can write this as int *pn_ready, int /*clock_var*/) nowadays. Not sure which is preferred. > { > + int pos; > + int i; > + rtx_insn *tmp; Moving these to an outer scope is really a step back. The new code could just declare them itself; in fact, it should probably be a separate function anyway. > + /* Try to issue fixed point divides back-to-back in pairs so they will > + be routed to separate execution units and execute in parallel. */ Two spaces after dot (many times here). > + pos = *pn_ready-1; Spaces around "-" (many times). Maybe you want an extra variable to hold this value, you use it a lot? lastpos or something. > + 3 : 2 vector insns and a vector load have been issued and another > + vector load has been found and moved to the end of the ready > + list. > + */ */ should not be on a line on its own. > @@ -31699,8 +31933,10 @@ rs6000_sched_finish (FILE *dump, int sch > struct _rs6000_sched_context > { > short cached_can_issue_more; > - rtx last_scheduled_insn; > + rtx_insn * last_scheduled_insn; No space after asterisk. Segher