From mboxrd@z Thu Jan  1 00:00:00 1970
From: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
Content-Type: multipart/mixed; boundary="Apple-Mail=_D0151207-D3B4-433A-8D16-0C2F7C77E9FF"
Subject: [PATCH] Account for prologue spills in reg_pressure scheduling
Date: Mon, 20 Oct 2014 07:03:00 -0000
Message-Id: <FE7CEF4D-A27B-46D0-96B3-3569C1558DF9@linaro.org>
Cc: Vladimir Makarov <vmakarov@redhat.com>, Richard Sandiford <rdsandiford@googlemail.com>
To: GCC Patches <gcc-patches@gcc.gnu.org>
Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\))


--Apple-Mail=_D0151207-D3B4-433A-8D16-0C2F7C77E9FF
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii
Content-length: 1521

Hi,

This patch improves register pressure scheduling (both
SCHED_PRESSURE_WEIGHTED and SCHED_PRESSURE_MODEL) to better estimate the
number of available registers.

At the moment the scheduler does not account for the spills in the
prologue and the restores in the epilogue that occur when the function
uses call-saved registers.  The current state is, essentially, optimized
for the case when there is a hot loop inside the function and the loop
executes significantly more often than the prologue/epilogue.  At the
opposite end, however, is the case when the function is just a single
non-cyclic basic block, which executes just as often as the
prologue/epilogue, so spills in the prologue hurt performance as much as
spills in the basic block itself.  In such a case the scheduler should
throttle down the number of available registers and try not to go beyond
the call-clobbered registers.

The patch uses basic block frequencies to balance the cost of using
call-saved registers for the intermediate cases between these two
extremes.

The motivation for this patch was a floating-point testcase on
arm-linux-gnueabihf (ARM is one of the few targets that use register
pressure scheduling by default).

A "thanks" goes to Richard for the good discussion of the problem and the
suggestions on how to fix it.

The patch was bootstrapped on x86_64-linux-gnu (which doesn't really
exercise the patch) and cross-tested on arm-linux-gnueabihf and
aarch64-linux-gnu.

OK to apply?

--
Maxim Kuvyrkov
www.linaro.org



--Apple-Mail=_D0151207-D3B4-433A-8D16-0C2F7C77E9FF
Content-Disposition: attachment;
	filename=0001-sched_class_reg_num.ChangeLog
Content-Type: application/octet-stream;
	name="0001-sched_class_reg_num.ChangeLog"
Content-Transfer-Encoding: 7bit
Content-length: 537

Account for prologue spills in reg_pressure scheduling

	* haifa-sched.c (sched_class_regs_num, call_used_regs_num): New static
	arrays.  Use sched_class_regs_num instead of ira_class_hard_regs_num.
	(print_curr_reg_pressure, setup_insn_reg_pressure_info)
	(model_update_pressure, model_spill_cost): Use sched_class_regs_num.
	(model_start_schedule): Update.
	(sched_pressure_start_bb): New static function.  Calculate
	sched_class_regs_num.
	(schedule_block): Use it.
	(alloc_global_sched_pressure_data): Calculate call_used_regs_num.

--Apple-Mail=_D0151207-D3B4-433A-8D16-0C2F7C77E9FF
Content-Disposition: attachment;
	filename=0001-sched_class_reg_num.patch
Content-Type: application/octet-stream;
	name="0001-sched_class_reg_num.patch"
Content-Transfer-Encoding: quoted-printable
Content-length: 8990

>From 12e043a184ad6773d3c42baf23bd2003f6ebe72d Mon Sep 17 00:00:00 2001
From: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
Date: Mon, 20 Oct 2014 05:04:23 +0100
Subject: [PATCH 1/2] sched_class_reg_num

---
 gcc/haifa-sched.c |   97 +++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 83 insertions(+), 14 deletions(-)

diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index db8a45c..2b624a1 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -933,6 +933,13 @@ static bitmap saved_reg_live;
 /* Registers mentioned in the current region.  */
 static bitmap region_ref_regs;
 
+/* Effective number of available registers of a given class (see comment
+   in sched_pressure_start_bb).  */
+static int sched_class_regs_num[N_REG_CLASSES];
+/* Number of call-used registers in each class.  This is a helper for
+   calculating sched_class_regs_num.  */
+static int call_used_regs_num[N_REG_CLASSES];
+
 /* Initiate register pressure relative info for scheduling the current
    region.  Currently it is only clearing register mentioned in the
    current region.  */
@@ -1116,7 +1123,7 @@ print_curr_reg_pressure (void)
       gcc_assert (curr_reg_pressure[cl] >= 0);
       fprintf (sched_dump, "  %s:%d(%d)", reg_class_names[cl],
 	       curr_reg_pressure[cl],
-	       curr_reg_pressure[cl] - ira_class_hard_regs_num[cl]);
+	       curr_reg_pressure[cl] - sched_class_regs_num[cl]);
     }
   fprintf (sched_dump, "\n");
 }
@@ -1731,9 +1738,9 @@ setup_insn_reg_pressure_info (rtx_insn *insn)
       cl = ira_pressure_classes[i];
       gcc_assert (curr_reg_pressure[cl] >= 0);
       change = (int) pressure_info[i].set_increase - death[cl];
-      before = MAX (0, max_reg_pressure[i] - ira_class_hard_regs_num[cl]);
+      before = MAX (0, max_reg_pressure[i] - sched_class_regs_num[cl]);
       after = MAX (0, max_reg_pressure[i] + change
-		   - ira_class_hard_regs_num[cl]);
+		   - sched_class_regs_num[cl]);
       hard_regno = ira_class_hard_regs[cl][0];
       gcc_assert (hard_regno >= 0);
       mode = reg_raw_mode[hard_regno];
@@ -2070,7 +2077,7 @@ model_update_pressure (struct model_pressure_group *group,
 
       /* Check whether the maximum pressure in the overall schedule
 	 has increased.  (This means that the MODEL_MAX_PRESSURE of
-	 every point <= POINT will need to increae too; see below.)  */
+	 every point <= POINT will need to increase too; see below.)  */
       if (group->limits[pci].pressure < ref_pressure)
 	group->limits[pci].pressure = ref_pressure;
 
@@ -2347,7 +2354,7 @@ must_restore_pattern_p (rtx_insn *next, dep_t dep)
 /* Return the cost of increasing the pressure in class CL from FROM to TO.
 
    Here we use the very simplistic cost model that every register above
-   ira_class_hard_regs_num[CL] has a spill cost of 1.  We could use other
+   sched_class_regs_num[CL] has a spill cost of 1.  We could use other
    measures instead, such as one based on MEMORY_MOVE_COST.  However:
 
       (1) In order for an instruction to be scheduled, the higher cost
@@ -2371,7 +2378,7 @@ must_restore_pattern_p (rtx_insn *next, dep_t dep)
 static int
 model_spill_cost (int cl, int from, int to)
 {
-  from = MAX (from, ira_class_hard_regs_num[cl]);
+  from = MAX (from, sched_class_regs_num[cl]);
   return MAX (to, from) - from;
 }
 
@@ -2477,7 +2484,7 @@ model_set_excess_costs (rtx_insn **insns, int count)
   bool print_p;
 
   /* Record the baseECC value for each instruction in the model schedule,
-     except that negative costs are converted to zero ones now rather thatn
+     except that negative costs are converted to zero ones now rather than
      later.  Do not assign a cost to debug instructions, since they must
      not change code-generation decisions.  Experiments suggest we also
      get better results by not assigning a cost to instructions from
@@ -3727,15 +3734,13 @@ model_dump_pressure_summary (void)
    scheduling region.  */
 
 static void
-model_start_schedule (void)
+model_start_schedule (basic_block bb)
 {
-  basic_block bb;
-
   model_next_priority = 1;
   model_schedule.create (sched_max_luid);
   model_insns = XCNEWVEC (struct model_insn_info, sched_max_luid);
 
-  bb = BLOCK_FOR_INSN (NEXT_INSN (current_sched_info->prev_head));
+  gcc_assert (bb == BLOCK_FOR_INSN (NEXT_INSN (current_sched_info->prev_head)));
   initiate_reg_pressure_info (df_get_live_in (bb));
 
   model_analyze_insns ();
@@ -3773,6 +3778,53 @@ model_end_schedule (void)
   model_finalize_pressure_group (&model_before_pressure);
   model_schedule.release ();
 }
+
+/* Prepare reg pressure scheduling for basic block BB.  */
+static void
+sched_pressure_start_bb (basic_block bb)
+{
+  /* Set the number of available registers for each class, taking into
+     account the relative frequency of the current basic block versus the
+     function prologue and epilogue.
+     * If the basic block executes much more often than the prologue/epilogue
+     (e.g., inside a hot loop), then the cost of a spill in the prologue is
+     close to nil, so the effective number of available registers is
+     ira_class_hard_regs_num[cl].
+     * If the basic block executes as often as the prologue/epilogue,
+     then a spill in the block is as costly as one in the prologue, so the
+     effective number of available registers is just the number of
+     call-used (call-clobbered) registers, call_used_regs_num[cl].
+     Note that, all else being equal, we prefer to spill in the prologue,
+     since that allows "extra" registers for other basic blocks.
+     * If the basic block is on the cold path of the function and executes
+     rarely, then we should always prefer to spill in the block rather than
+     in the prologue/epilogue, so the effective number of available
+     registers is again call_used_regs_num[cl].  */
+  {
+    int i;
+    int entry_freq = ENTRY_BLOCK_PTR_FOR_FN (cfun)->frequency;
+    int bb_freq = bb->frequency;
+
+    if (bb_freq == 0)
+      {
+	if (entry_freq == 0)
+	  entry_freq = bb_freq = 1;
+      }
+    if (bb_freq < entry_freq)
+      bb_freq = entry_freq;
+
+    for (i = 0; i < ira_pressure_classes_num; ++i)
+      {
+	enum reg_class cl = ira_pressure_classes[i];
+	int call_saved = ira_class_hard_regs_num[cl] - call_used_regs_num[cl];
+	sched_class_regs_num[cl]
+	  = ira_class_hard_regs_num[cl] - (call_saved * entry_freq) / bb_freq;
+      }
+  }
+
+  if (sched_pressure == SCHED_PRESSURE_MODEL)
+    model_start_schedule (bb);
+}
 
 /* A structure that holds local state for the loop in schedule_block.  */
 struct sched_block_state
@@ -6053,8 +6105,8 @@ schedule_block (basic_block *target_bb, state_t init_state)
      in try_ready () (which is called through init_ready_list ()).  */
   (*current_sched_info->init_ready_list) ();
 
-  if (sched_pressure == SCHED_PRESSURE_MODEL)
-    model_start_schedule ();
+  if (sched_pressure)
+    sched_pressure_start_bb (*target_bb);
 
   /* The algorithm is O(n^2) in the number of ready insns at any given
      time in the worst case.  Before reload we are more likely to have
@@ -6681,7 +6733,7 @@ alloc_global_sched_pressure_data (void)
 {
   if (sched_pressure != SCHED_PRESSURE_NONE)
     {
-      int i, max_regno = max_reg_num ();
+      int i, c, max_regno = max_reg_num ();
 
       if (sched_dump != NULL)
 	/* We need info about pseudos for rtl dumps about pseudo
@@ -6701,6 +6753,23 @@ alloc_global_sched_pressure_data (void)
 	  saved_reg_live = BITMAP_ALLOC (NULL);
 	  region_ref_regs = BITMAP_ALLOC (NULL);
 	}
+
+      /* Calculate the number of call-used registers among the allocatable
+	 hard registers of each class that we calculate register pressure
+	 for.  ira_class_hard_regs[CL] lists the hard register numbers of
+	 class CL and is indexed by position, not by register number.  */
+      for (c = 0; c < ira_pressure_classes_num; ++c)
+	{
+	  enum reg_class cl = ira_pressure_classes[c];
+
+	  call_used_regs_num[cl] = 0;
+	  for (i = 0; i < ira_class_hard_regs_num[cl]; ++i)
+	    {
+	      int hard_regno = ira_class_hard_regs[cl][i];
+	      if (call_used_regs[hard_regno])
+		++call_used_regs_num[cl];
+	    }
+	}
     }
 }
 
-- 
1.7.9.5

--Apple-Mail=_D0151207-D3B4-433A-8D16-0C2F7C77E9FF--