From: Craig Blackmore
Subject: [PATCH] RISC-V: Allow more load/stores to be compressed
To: gcc-patches@gcc.gnu.org
Cc: jimw@sifive.com, Ofer Shinaar, Nidal Faour
Date: Thu, 12 Sep 2019 16:19:00 -0000

This patch aims to allow more load/store instructions to be compressed by
replacing a load/store of 'base register + large offset' with a new
load/store of 'new base + small offset'.  If the new base gets stored in a
compressed register, then the new load/store can be compressed.  Since there
is an overhead in creating the new base, this change is only attempted when
'base register' is referenced in at least 4 load/stores in a basic block.

The optimization is implemented in a new RISC-V specific pass called
shorten_memrefs, which is enabled for RVC targets.  It has been developed
for the 32-bit lw/sw instructions but could also be extended to 64-bit
ld/sd in future.

The patch saves 164 bytes (0.3%) on a proprietary application compiled for
rv32imc bare metal with -Os (59286 bytes with the patch, compared to 59450
bytes without).
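To make the rewrite concrete, here is a minimal standalone C sketch
(illustrative only, not part of the patch; the constant 124 is the largest
offset a compressed lw/sw can encode, since the c.lw/c.sw immediate is
5 bits scaled by 4) of the offset split that riscv_legitimize_address
performs in the hunks below:

  #include <assert.h>
  #include <stdio.h>

  int main (void)
  {
    /* A load at BASE + 2000 cannot be compressed: c.lw only accepts
       word-aligned offsets in [0, 124].  */
    long offset = 2000;

    /* Split the offset as the patch does: fold the excess into a new
       base register and keep a small remainder on the load/store.  */
    long high = offset & ~124;  /* 1920: added to BASE once.  */
    long low = offset & 124;    /* 80: fits a compressed load/store.  */

    assert (high + low == offset);
    printf ("base+%ld -> (base+%ld)+%ld\n", offset, high, low);
    return 0;
  }

Each load/store rewritten this way shrinks from 4 to 2 bytes if register
allocation places the new base in a compressed register (x8-x15), which is
why the one-time cost of materializing the new base pays off once a base
register has four or more such references.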
On the Embench benchmark suite (https://www.embench.org/) we see code size
reductions of up to 18 bytes (0.7%) and only two cases where code size is
increased slightly, by 2 bytes each:

Embench results (.text size in bytes, excluding .rodata)

Benchmark        Without patch  With patch  Diff
aha-mont64                1052        1052     0
crc32                      232         232     0
cubic                     2446        2448     2
edn                       1454        1450    -4
huffbench                 1642        1642     0
matmult-int                420         420     0
minver                    1056        1056     0
nbody                      714         714     0
nettle-aes                2888        2884    -4
nettle-sha256             5566        5564    -2
nsichneu                 15052       15052     0
picojpeg                  8078        8078     0
qrduino                   6140        6140     0
sglib-combined            2444        2444     0
slre                      2438        2420   -18
st                         880         880     0
statemate                 3842        3842     0
ud                         702         702     0
wikisort                  4278        4280     2
-------------------------------------------------
Total                    61324       61300   -24

The patch has been tested on the following bare metal targets using QEMU
and there were no regressions:

  rv32i
  rv32iac
  rv32im
  rv32imac
  rv32imafc
  rv64imac
  rv64imafdc

We noticed that sched2 undoes some of the address changes made by this
optimization and consequently increases code size, so this patch also adds
a check in sched-deps.c to avoid changes that are expected to increase code
size when not optimizing for speed.

Since this change touches target-independent code, the patch has been
bootstrapped and tested on x86 with no regressions.

gcc/ChangeLog

	* config/riscv/riscv.c (tree-pass.h): New include.
	(cfg.h): Likewise.
	(context.h): Likewise.
	(riscv_compressed_reg_p): New function.
	(riscv_compressed_lw_address_p): Likewise.
	(riscv_legitimize_address): Attempt to convert base + large_offset
	to compressible new_base + small_offset.
	(riscv_address_cost): Make anticipated compressed load/stores
	cheaper for code size than uncompressed load/stores.
	(class pass_shorten_memrefs): New pass.
	(pass_shorten_memrefs::execute): Likewise.
	(make_pass_shorten_memrefs): Likewise.
	(riscv_option_override): Register shorten_memrefs pass for
	TARGET_RVC.
	(riscv_register_priority): Move compressed register check to
	riscv_compressed_reg_p.
	* sched-deps.c (attempt_change): When optimizing for code size,
	don't make the change if it increases code size.

---
 gcc/config/riscv/riscv.c | 179 +++++++++++++++++++++++++++++++++++++++++++--
 gcc/sched-deps.c         |  10 +++
 2 files changed, 183 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 39bf87a..e510314 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -55,6 +55,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic.h"
 #include "builtins.h"
 #include "predict.h"
+#include "tree-pass.h"
+#include "cfg.h"
+#include "context.h"
 
 /* True if X is an UNSPEC wrapper around a SYMBOL_REF or LABEL_REF.  */
 #define UNSPEC_ADDRESS_P(X)	\
@@ -848,6 +851,44 @@ riscv_legitimate_address_p (machine_mode mode, rtx x, bool strict_p)
   return riscv_classify_address (&addr, x, mode, strict_p);
 }
 
+/* Return true if hard reg REGNO can be used in compressed instructions.  */
+
+static bool
+riscv_compressed_reg_p (int regno)
+{
+  /* x8-x15/f8-f15 are compressible registers.  */
+  return (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
+	  || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)));
+}
+
+/* Return true if load/store from/to address X can be compressed.  */
+
+static bool
+riscv_compressed_lw_address_p (rtx x)
+{
+  struct riscv_address_info addr;
+  bool result = riscv_classify_address (&addr, x, GET_MODE (x),
+					reload_completed);
+
+  /* Before reload, assuming all load/stores of valid addresses get
+     compressed gives better code size than checking if the address is
+     reg + small_offset early on.  */
+  if (result && !reload_completed)
+    return true;
+
+  /* Return false if address is not compressed_reg + small_offset.  */
+  if (!result
+      || addr.type != ADDRESS_REG
+      || (!riscv_compressed_reg_p (REGNO (addr.reg))
+	  && addr.reg != stack_pointer_rtx)
+      || !CONST_INT_P (addr.offset)
+      || (INTVAL (addr.offset) & 3) != 0
+      || !IN_RANGE (INTVAL (addr.offset), 0, 124))
+    return false;
+
+  return result;
+}
+
 /* Return the number of instructions needed to load or store a value
    of mode MODE at address X.  Return 0 if X isn't valid for MODE.
    Assume that multiword moves may need to be split into word moves
@@ -1318,7 +1359,9 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
   if (riscv_split_symbol (NULL, x, mode, &addr))
     return riscv_force_address (addr, mode);
 
-  /* Handle BASE + OFFSET using riscv_add_offset.  */
+  /* When optimizing for size, try to convert BASE + LARGE_OFFSET into
+     NEW_BASE + SMALL_OFFSET to allow a possible compressed load/store;
+     otherwise, handle BASE + OFFSET using riscv_add_offset.  */
   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1))
       && INTVAL (XEXP (x, 1)) != 0)
     {
@@ -1327,7 +1370,24 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
 
       if (!riscv_valid_base_register_p (base, mode, false))
	base = copy_to_mode_reg (Pmode, base);
-      addr = riscv_add_offset (NULL, base, offset);
+      if (optimize_function_for_size_p (cfun)
+	  && (strcmp (current_pass->name, "shorten_memrefs") == 0)
+	  && mode == SImode
+	  && (offset & 3) == 0
+	  && !IN_RANGE (offset, 0, 124))
+	{
+	  rtx high;
+
+	  /* Leave OFFSET as a 7-bit offset and put the excess in HIGH.  */
+	  high = GEN_INT (offset & ~124);
+	  offset &= 124;
+	  if (!SMALL_OPERAND (INTVAL (high)))
+	    high = force_reg (Pmode, high);
+	  base = force_reg (Pmode, gen_rtx_PLUS (Pmode, high, base));
+	  addr = plus_constant (Pmode, base, offset);
+	}
+      else
+	addr = riscv_add_offset (NULL, base, offset);
       return riscv_force_address (addr, mode);
     }
 
@@ -1812,7 +1872,10 @@ riscv_address_cost (rtx addr, machine_mode mode,
		   addr_space_t as ATTRIBUTE_UNUSED,
		   bool speed ATTRIBUTE_UNUSED)
 {
-  return riscv_address_insns (addr, mode, false);
+  if (!speed && mode == SImode
+      && riscv_compressed_lw_address_p (addr))
+    return 1;
+  return !speed + riscv_address_insns (addr, mode, false);
 }
 
 /* Return one word of double-word value OP.  HIGH_P is true to select the
@@ -4541,6 +4604,106 @@ riscv_init_machine_status (void)
   return ggc_cleared_alloc<machine_function> ();
 }
 
+namespace {
+
+const pass_data pass_data_shorten_memrefs =
+{
+  RTL_PASS, /* type */
+  "shorten_memrefs", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_shorten_memrefs : public rtl_opt_pass
+{
+public:
+  pass_shorten_memrefs (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_shorten_memrefs, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return optimize > 0; }
+  virtual unsigned int execute (function *);
+
+}; // class pass_shorten_memrefs
+
+/* Try to make more use of compressed load and store instructions by
+   replacing a load/store at address BASE + LARGE_OFFSET with a new
+   load/store at address NEW_BASE + SMALL_OFFSET.  If NEW_BASE is stored in
+   a compressed register, the load/store can be compressed.  Since creating
+   NEW_BASE incurs an overhead, the change is only attempted when BASE is
+   referenced by at least four load/stores in the same basic block.  */
+
+unsigned int
+pass_shorten_memrefs::execute (function *fn)
+{
+  typedef int_hash <HOST_WIDE_INT, -1> regno_hash;
+  typedef hash_map <regno_hash, int> regno_map;
+
+  basic_block bb;
+  rtx_insn *insn;
+
+  regstat_init_n_sets_and_refs ();
+
+  FOR_ALL_BB_FN (bb, fn)
+    {
+      regno_map *m = hash_map <regno_hash, int>::create_ggc (10);
+      for (int pass = 0; !optimize_bb_for_speed_p (bb) && pass < 2; pass++)
+	FOR_BB_INSNS (bb, insn)
+	  {
+	    if (!NONJUMP_INSN_P (insn))
+	      continue;
+	    rtx pat = PATTERN (insn);
+	    if (GET_CODE (pat) != SET)
+	      continue;
+	    start_sequence ();
+	    for (int i = 0; i < 2; i++)
+	      {
+		rtx mem = XEXP (pat, i);
+		if (MEM_P (mem) && GET_MODE (mem) == SImode)
+		  {
+		    rtx addr = XEXP (mem, 0);
+		    if (GET_CODE (addr) != PLUS)
+		      continue;
+		    if (!REG_P (XEXP (addr, 0)))
+		      continue;
+		    HOST_WIDE_INT regno = REGNO (XEXP (addr, 0));
+		    if (REG_N_REFS (regno) < 4)
+		      continue;
+		    if (pass == 0)
+		      m->get_or_insert (regno)++;
+		    else if (m->get_or_insert (regno) > 3)
+		      {
+			addr = riscv_legitimize_address (addr, addr,
+							 GET_MODE (mem));
+			XEXP (pat, i) = replace_equiv_address (mem, addr);
+			df_insn_rescan (insn);
+		      }
+		  }
+	      }
+	    rtx_insn *seq = get_insns ();
+	    end_sequence ();
+	    emit_insn_before (seq, insn);
+	  }
+    }
+
+  regstat_free_n_sets_and_refs ();
+
+  return 0;
+}
+
+} // anon namespace
+
+opt_pass *
+make_pass_shorten_memrefs (gcc::context *ctxt)
+{
+  return new pass_shorten_memrefs (ctxt);
+}
+
 /* Implement TARGET_OPTION_OVERRIDE.  */
 
 static void
@@ -4637,6 +4800,10 @@ riscv_option_override (void)
     error ("%<-mriscv-attribute%> RISC-V ELF attribute requires GNU as 2.32"
	   " [%<-mriscv-attribute%>]");
 #endif
+
+  if (TARGET_RVC)
+    register_pass (make_pass_shorten_memrefs (g),
+		   PASS_POS_INSERT_AFTER, "store_motion", 1);
 }
 
 /* Implement TARGET_CONDITIONAL_REGISTER_USAGE.  */
@@ -4676,9 +4843,9 @@ riscv_conditional_register_usage (void)
 static int
 riscv_register_priority (int regno)
 {
-  /* Favor x8-x15/f8-f15 to improve the odds of RVC instruction selection.  */
-  if (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
-		     || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)))
+  /* Favor compressed registers to improve the odds of RVC instruction
+     selection.  */
+  if (riscv_compressed_reg_p (regno))
     return 1;
 
   return 0;
diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
index 52db3cc..92a0893 100644
--- a/gcc/sched-deps.c
+++ b/gcc/sched-deps.c
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "sched-int.h"
 #include "params.h"
 #include "cselib.h"
+#include "predict.h"
 
 #ifdef INSN_SCHEDULING
 
@@ -4707,6 +4708,15 @@ attempt_change (struct mem_inc_info *mii, rtx new_addr)
   rtx mem = *mii->mem_loc;
   rtx new_mem;
 
+  /* When not optimizing for speed, avoid changes that are expected to make
+     code size larger.  */
+  addr_space_t as = MEM_ADDR_SPACE (mem);
+  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (mii->mem_insn));
+  int old_cost = address_cost (XEXP (mem, 0), GET_MODE (mem), as, speed);
+  int new_cost = address_cost (new_addr, GET_MODE (mem), as, speed);
+  if (new_cost > old_cost && !speed)
+    return NULL_RTX;
+
   /* Jump through a lot of hoops to keep the attributes up to date.  We
      do not want to call one of the change address variants that take
      an offset even though we know the offset in many cases.  These