From: Maxim Kuvyrkov
To: GCC Patches
Cc: Vladimir Makarov, Richard Sandiford
Subject: [PATCH] Account for prologue spills in reg_pressure scheduling
Date: Mon, 20 Oct 2014 07:03:00 -0000
Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\))
Content-Type: multipart/mixed; boundary="Apple-Mail=_D0151207-D3B4-433A-8D16-0C2F7C77E9FF"
X-SW-Source: 2014-10/txt/msg01873.txt.bz2

--Apple-Mail=_D0151207-D3B4-433A-8D16-0C2F7C77E9FF
Content-Type: text/plain; charset=us-ascii

Hi,

This patch improves register pressure scheduling (both
SCHED_PRESSURE_WEIGHTED and SCHED_PRESSURE_MODEL) to better estimate the
number of available registers.

At the moment the scheduler does not account for spills in the prologue
and restores in the epilogue, which occur from use of call-used
registers.  The current state is, essentially, optimized for the case
where there is a hot loop inside the function, and the loop executes
significantly more often than the prologue/epilogue.  However, on the
opposite end, we have the case where the function is just a single
non-cyclic basic block, which executes just as often as the
prologue/epilogue, so spills in the prologue hurt performance as much as
spills in the basic block itself.  In such a case the scheduler should
throttle down the number of available registers and try not to go beyond
the call-clobbered registers.

The patch uses basic block frequencies to balance the cost of using
call-used registers for intermediate cases between the two extremes
above.

The motivation for this patch was a floating-point testcase on
arm-linux-gnueabihf (ARM is one of the few targets that use register
pressure scheduling by default).
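To make the frequency balancing concrete, here is a minimal standalone sketch of the arithmetic that sched_pressure_start_bb applies to one register class. It is an illustration only and not part of the patch: the helper name effective_regs_num and its plain-int parameters are assumptions that stand in for ira_class_hard_regs_num[cl], call_used_regs_num[cl], the entry block frequency and bb->frequency.

```c
#include <assert.h>

/* Hypothetical helper, not part of the patch: computes the effective
   number of registers for one class the same way the loop in
   sched_pressure_start_bb does.  */
static int
effective_regs_num (int hard_regs, int call_used, int entry_freq, int bb_freq)
{
  assert (call_used >= 0 && hard_regs >= call_used);
  assert (entry_freq >= 0 && bb_freq >= 0);

  /* With no frequency data at all, treat the block like the entry block
     (this also avoids a division by zero).  */
  if (bb_freq == 0 && entry_freq == 0)
    entry_freq = bb_freq = 1;

  /* A block colder than the entry block is clamped to the entry
     frequency, so the full call-used count is subtracted.  */
  if (bb_freq < entry_freq)
    bb_freq = entry_freq;

  /* The hotter the block relative to the entry block, the smaller the
     penalty subtracted from the class size.  */
  return hard_regs - (call_used * entry_freq) / bb_freq;
}
```

For an assumed class with 16 hard registers, 8 of them call-used, and an entry frequency of 1000, a block with frequency 1000 is given 8 registers, a block with frequency 2000 is given 12, and a block with frequency 10000 is given all 16 (the integer division rounds the penalty down to zero).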
Thanks go to Richard for a good discussion of the problem and for
suggestions on how to fix it.

The patch was bootstrapped on x86_64-linux-gnu (which doesn't really
exercise the patch), and cross-tested on arm-linux-gnueabihf and
aarch64-linux-gnu.

OK to apply?

--
Maxim Kuvyrkov
www.linaro.org

--Apple-Mail=_D0151207-D3B4-433A-8D16-0C2F7C77E9FF
Content-Disposition: attachment; filename=0001-sched_class_reg_num.ChangeLog
Content-Type: application/octet-stream; name="0001-sched_class_reg_num.ChangeLog"

Account for prologue spills in reg_pressure scheduling

        * haifa-sched.c (sched_class_regs_num, call_used_regs_num): New static
        arrays.  Use sched_class_regs_num instead of ira_class_hard_regs_num.
        (print_curr_reg_pressure, setup_insn_reg_pressure_info)
        (model_update_pressure, model_spill_cost): Use sched_class_regs_num.
        (model_start_schedule): Update.
        (sched_pressure_start_bb): New static function.  Calculate
        sched_class_regs_num.
        (schedule_block): Use it.
        (alloc_global_sched_pressure_data): Calculate call_used_regs_num.

--Apple-Mail=_D0151207-D3B4-433A-8D16-0C2F7C77E9FF
Content-Disposition: attachment; filename=0001-sched_class_reg_num.patch
Content-Type: application/octet-stream; name="0001-sched_class_reg_num.patch"

From 12e043a184ad6773d3c42baf23bd2003f6ebe72d Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov
Date: Mon, 20 Oct 2014 05:04:23 +0100
Subject: [PATCH 1/2] sched_class_reg_num

---
 gcc/haifa-sched.c |   97 +++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 83 insertions(+), 14 deletions(-)

diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index db8a45c..2b624a1 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -933,6 +933,13 @@ static bitmap saved_reg_live;
 /* Registers mentioned in the current region.  */
 static bitmap region_ref_regs;
 
+/* Effective number of available registers of a given class (see comment
+   in model_start_schedule).  */
+static int sched_class_regs_num[N_REG_CLASSES];
+/* Number of call_used_regs.  This is a helper for calculating of
+   sched_class_regs_num.  */
+static int call_used_regs_num[N_REG_CLASSES];
+
 /* Initiate register pressure relative info for scheduling the current
    region.  Currently it is only clearing register mentioned in the
    current region.  */
@@ -1116,7 +1123,7 @@ print_curr_reg_pressure (void)
       gcc_assert (curr_reg_pressure[cl] >= 0);
       fprintf (sched_dump, " %s:%d(%d)", reg_class_names[cl],
                curr_reg_pressure[cl],
-               curr_reg_pressure[cl] - ira_class_hard_regs_num[cl]);
+               curr_reg_pressure[cl] - sched_class_regs_num[cl]);
     }
   fprintf (sched_dump, "\n");
 }
@@ -1731,9 +1738,9 @@ setup_insn_reg_pressure_info (rtx_insn *insn)
       cl = ira_pressure_classes[i];
       gcc_assert (curr_reg_pressure[cl] >= 0);
       change = (int) pressure_info[i].set_increase - death[cl];
-      before = MAX (0, max_reg_pressure[i] - ira_class_hard_regs_num[cl]);
+      before = MAX (0, max_reg_pressure[i] - sched_class_regs_num[cl]);
       after = MAX (0, max_reg_pressure[i] + change
-                   - ira_class_hard_regs_num[cl]);
+                   - sched_class_regs_num[cl]);
       hard_regno = ira_class_hard_regs[cl][0];
       gcc_assert (hard_regno >= 0);
       mode = reg_raw_mode[hard_regno];
@@ -2070,7 +2077,7 @@ model_update_pressure (struct model_pressure_group *group,
 
           /* Check whether the maximum pressure in the overall schedule
              has increased.  (This means that the MODEL_MAX_PRESSURE of
-             every point <= POINT will need to increae too; see below.)  */
+             every point <= POINT will need to increase too; see below.)  */
           if (group->limits[pci].pressure < ref_pressure)
             group->limits[pci].pressure = ref_pressure;
 
@@ -2347,7 +2354,7 @@ must_restore_pattern_p (rtx_insn *next, dep_t dep)
 /* Return the cost of increasing the pressure in class CL from FROM to TO.
 
    Here we use the very simplistic cost model that every register above
-   ira_class_hard_regs_num[CL] has a spill cost of 1.  We could use other
+   sched_class_regs_num[CL] has a spill cost of 1.  We could use other
    measures instead, such as one based on MEMORY_MOVE_COST.  However:
 
    (1) In order for an instruction to be scheduled, the higher cost
@@ -2371,7 +2378,7 @@ must_restore_pattern_p (rtx_insn *next, dep_t dep)
 static int
 model_spill_cost (int cl, int from, int to)
 {
-  from = MAX (from, ira_class_hard_regs_num[cl]);
+  from = MAX (from, sched_class_regs_num[cl]);
   return MAX (to, from) - from;
 }
 
@@ -2477,7 +2484,7 @@ model_set_excess_costs (rtx_insn **insns, int count)
   bool print_p;
 
   /* Record the baseECC value for each instruction in the model schedule,
-     except that negative costs are converted to zero ones now rather thatn
+     except that negative costs are converted to zero ones now rather than
      later.  Do not assign a cost to debug instructions, since they must
      not change code-generation decisions.  Experiments suggest we also
      get better results by not assigning a cost to instructions from
@@ -3727,15 +3734,13 @@ model_dump_pressure_summary (void)
    scheduling region.  */
 
 static void
-model_start_schedule (void)
+model_start_schedule (basic_block bb)
 {
-  basic_block bb;
-
   model_next_priority = 1;
   model_schedule.create (sched_max_luid);
   model_insns = XCNEWVEC (struct model_insn_info, sched_max_luid);
 
-  bb = BLOCK_FOR_INSN (NEXT_INSN (current_sched_info->prev_head));
+  gcc_assert (bb == BLOCK_FOR_INSN (NEXT_INSN (current_sched_info->prev_head)));
   initiate_reg_pressure_info (df_get_live_in (bb));
 
   model_analyze_insns ();
@@ -3773,6 +3778,53 @@ model_end_schedule (void)
   model_finalize_pressure_group (&model_before_pressure);
   model_schedule.release ();
 }
+
+/* Prepare reg pressure scheduling for basic block BB.  */
+static void
+sched_pressure_start_bb (basic_block bb)
+{
+  /* Set the number of available registers for each class taking into account
+     relative probability of current basic block versus function prologue and
+     epilogue.
+     * If the basic block executes much more often than the prologue/epilogue
+     (e.g., inside a hot loop), then cost of spill in the prologue is close to
+     nil, so the effective number of available registers is
+     (ira_class_hard_regs_num[cl] - 0).
+     * If the basic block executes as often as the prologue/epilogue,
+     then spill in the block is as costly as in the prologue, so the effective
+     number of available registers is
+     (ira_class_hard_regs_num[cl] - call_used_regs_num[cl]).
+     Note that all-else-equal, we prefer to spill in the prologue, since that
+     allows "extra" registers for other basic blocks of the function.
+     * If the basic block is on the cold path of the function and executes
+     rarely, then we should always prefer to spill in the block, rather than
+     in the prologue/epilogue.  The effective number of available register is
+     (ira_class_hard_regs_num[cl] - call_used_regs_num[cl]).  */
+  {
+    int i;
+    int entry_freq = ENTRY_BLOCK_PTR_FOR_FN (cfun)->frequency;
+    int bb_freq = bb->frequency;
+
+    if (bb_freq == 0)
+      {
+        if (entry_freq == 0)
+          entry_freq = bb_freq = 1;
+      }
+    if (bb_freq < entry_freq)
+      bb_freq = entry_freq;
+
+    for (i = 0; i < ira_pressure_classes_num; ++i)
+      {
+        enum reg_class cl = ira_pressure_classes[i];
+        sched_class_regs_num[cl] = ira_class_hard_regs_num[cl];
+        sched_class_regs_num[cl]
+          -= (call_used_regs_num[cl] * entry_freq) / bb_freq;
+      }
+  }
+
+  if (sched_pressure == SCHED_PRESSURE_MODEL)
+    model_start_schedule (bb);
+}
 
 /* A structure that holds local state for the loop in schedule_block.  */
 struct sched_block_state
@@ -6053,8 +6105,8 @@ schedule_block (basic_block *target_bb, state_t init_state)
      in try_ready () (which is called through init_ready_list ()).  */
   (*current_sched_info->init_ready_list) ();
 
-  if (sched_pressure == SCHED_PRESSURE_MODEL)
-    model_start_schedule ();
+  if (sched_pressure)
+    sched_pressure_start_bb (*target_bb);
 
   /* The algorithm is O(n^2) in the number of ready insns at any given
      time in the worst case.  Before reload we are more likely to have
@@ -6681,7 +6733,7 @@ alloc_global_sched_pressure_data (void)
 {
   if (sched_pressure != SCHED_PRESSURE_NONE)
     {
-      int i, max_regno = max_reg_num ();
+      int i, c, max_regno = max_reg_num ();
 
       if (sched_dump != NULL)
         /* We need info about pseudos for rtl dumps about pseudo
@@ -6701,6 +6753,23 @@ alloc_global_sched_pressure_data (void)
           saved_reg_live = BITMAP_ALLOC (NULL);
           region_ref_regs = BITMAP_ALLOC (NULL);
         }
+
+      /* Calculate number of CALL_USED_REGS in register classes that
+         we calculate register pressure for.  */
+      for (c = 0; c < ira_pressure_classes_num; ++c)
+        {
+          enum reg_class cl = ira_pressure_classes[c];
+          call_used_regs_num[cl] = 0;
+        }
+
+      for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
+        if (call_used_regs[i])
+          for (c = 0; c < ira_pressure_classes_num; ++c)
+            {
+              enum reg_class cl = ira_pressure_classes[c];
+              if (ira_class_hard_regs[cl][i])
+                ++call_used_regs_num[cl];
+            }
     }
 }
 
--
1.7.9.5

--Apple-Mail=_D0151207-D3B4-433A-8D16-0C2F7C77E9FF--