public inbox for gcc-patches@gcc.gnu.org
* [PATCH] RISC-V: Allow more load/stores to be compressed
@ 2019-09-12 16:19 Craig Blackmore
  2019-09-18 10:01 ` Kito Cheng
From: Craig Blackmore @ 2019-09-12 16:19 UTC
  To: gcc-patches; +Cc: jimw, Ofer Shinaar, Nidal Faour

This patch aims to allow more load/store instructions to be compressed by
replacing a load/store of 'base register + large offset' with a new load/store
of 'new base + small offset'. If the new base gets stored in a compressed
register, then the new load/store can be compressed. Since there is an overhead
in creating the new base, this change is only attempted when 'base register' is
referenced in at least 4 load/stores in a basic block.
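
For example (a hypothetical sketch, not taken from the patch: the offset 400
does not fit the unsigned 5-bit scaled-by-4 offset field of c.lw, whose
maximum is 124, so it is split into 384 + 16; a3 stands in for whichever
compressed register the new base lands in):

  lw a1, 400(a0)          addi a3, a0, 384
  lw a2, 404(a0)    ->    lw a1, 16(a3)
  lw a4, 408(a0)          lw a2, 20(a3)
  lw a5, 412(a0)          lw a4, 24(a3)
                          lw a5, 28(a3)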

The optimization is implemented in a new RISC-V specific pass called
shorten_memrefs which is enabled for RVC targets. It has been developed for the
32-bit lw/sw instructions but could also be extended to 64-bit ld/sd in future.

The patch saves 164 bytes (0.3%) on a proprietary application compiled for
rv32imc bare metal with -Os, reducing its size from 59450 bytes to 59286
bytes. On the Embench benchmark suite (https://www.embench.org/) we see code
size reductions of up to 18 bytes (0.7%) and only two cases where code size
increases slightly, by 2 bytes each:

Embench results (.text size in bytes, excluding .rodata)

Benchmark       Without patch  With patch  Diff
aha-mont64      1052           1052        0
crc32           232            232         0
cubic           2446           2448        2
edn             1454           1450        -4
huffbench       1642           1642        0
matmult-int     420            420         0
minver          1056           1056        0
nbody           714            714         0
nettle-aes      2888           2884        -4
nettle-sha256   5566           5564        -2
nsichneu        15052          15052       0
picojpeg        8078           8078        0
qrduino         6140           6140        0
sglib-combined  2444           2444        0
slre            2438           2420        -18
st              880            880         0
statemate       3842           3842        0
ud              702            702         0
wikisort        4278           4280        2
-------------------------------------------------
Total           61324          61300       -24

The patch has been tested on the following bare metal targets using QEMU
and there were no regressions:

  rv32i
  rv32iac
  rv32im
  rv32imac
  rv32imafc
  rv64imac
  rv64imafdc

We noticed that sched2 undoes some of the address transformations generated
by this optimization and consequently increases code size, so this patch adds
a check in sched-deps.c to avoid changes that are expected to increase code
size when not optimizing for speed. Since this change touches
target-independent code, the patch has been bootstrapped and tested on x86
with no regressions.

gcc/ChangeLog

	* config/riscv/riscv.c (tree-pass.h): New include.
	(cfg.h): Likewise.
	(context.h): Likewise.
	(riscv_compressed_reg_p): New function.
	(riscv_compressed_lw_address_p): Likewise.
	(riscv_legitimize_address): Attempt to convert base + large_offset
	to compressible new_base + small_offset.
	(riscv_address_cost): Make anticipated compressed load/stores
	cheaper for code size than uncompressed load/stores.
	(class pass_shorten_memrefs): New pass.
	(pass_shorten_memrefs::execute): Likewise.
	(make_pass_shorten_memrefs): Likewise.
	(riscv_option_override): Register shorten_memrefs pass for
	TARGET_RVC.
	(riscv_register_priority): Move compressed register check to
	riscv_compressed_reg_p.
	* sched-deps.c (attempt_change): When optimizing for code size
	don't make change if it increases code size.

---
 gcc/config/riscv/riscv.c | 179 +++++++++++++++++++++++++++++++++++++++++++++--
 gcc/sched-deps.c         |  10 +++
 2 files changed, 183 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 39bf87a..e510314 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -55,6 +55,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic.h"
 #include "builtins.h"
 #include "predict.h"
+#include "tree-pass.h"
+#include "cfg.h"
+#include "context.h"
 
 /* True if X is an UNSPEC wrapper around a SYMBOL_REF or LABEL_REF.  */
 #define UNSPEC_ADDRESS_P(X)					\
@@ -848,6 +851,44 @@ riscv_legitimate_address_p (machine_mode mode, rtx x, bool strict_p)
   return riscv_classify_address (&addr, x, mode, strict_p);
 }
 
+/* Return true if hard reg REGNO can be used in compressed instructions.  */
+
+static bool
+riscv_compressed_reg_p (int regno)
+{
+  /* x8-x15/f8-f15 are compressible registers.  */
+  return (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
+	  || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)));
+}
+
+/* Return true if load/store from/to address x can be compressed.  */
+
+static bool
+riscv_compressed_lw_address_p (rtx x)
+{
+  struct riscv_address_info addr;
+  bool result = riscv_classify_address (&addr, x, GET_MODE (x),
+					reload_completed);
+
+  /* Before reload, assuming all load/stores of valid addresses get compressed
+     gives better code size than checking if the address is reg + small_offset
+     early on.  */
+  if (result && !reload_completed)
+    return true;
+
+  /* Return false if address is not compressed_reg + small_offset.  */
+  if (!result
+      || addr.type != ADDRESS_REG
+      || (!riscv_compressed_reg_p (REGNO (addr.reg))
+	    && addr.reg != stack_pointer_rtx)
+      || !CONST_INT_P (addr.offset)
+      || (INTVAL (addr.offset) & 3) != 0
+      || !IN_RANGE (INTVAL (addr.offset), 0, 124))
+    return false;
+
+  return result;
+}
+
 /* Return the number of instructions needed to load or store a value
    of mode MODE at address X.  Return 0 if X isn't valid for MODE.
    Assume that multiword moves may need to be split into word moves
@@ -1318,7 +1359,9 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
   if (riscv_split_symbol (NULL, x, mode, &addr))
     return riscv_force_address (addr, mode);
 
-  /* Handle BASE + OFFSET using riscv_add_offset.  */
+  /* When optimizing for size, try to convert BASE + LARGE_OFFSET into
+     NEW_BASE + SMALL_OFFSET to allow possible compressed load/store, otherwise,
+     handle BASE + OFFSET using riscv_add_offset.  */
   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1))
       && INTVAL (XEXP (x, 1)) != 0)
     {
@@ -1327,7 +1370,24 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
 
       if (!riscv_valid_base_register_p (base, mode, false))
 	base = copy_to_mode_reg (Pmode, base);
-      addr = riscv_add_offset (NULL, base, offset);
+      if (optimize_function_for_size_p (cfun)
+	  && (strcmp (current_pass->name, "shorten_memrefs") == 0)
+	  && mode == SImode
+	  && (offset & 3) == 0
+	  && !IN_RANGE (offset, 0, 124))
+	{
+	  rtx high;
+
+	  /* Leave OFFSET as a 7-bit offset and put the excess in HIGH.  */
+	  high = GEN_INT (offset & ~124);
+	  offset &= 124;
+	  if (!SMALL_OPERAND (INTVAL (high)))
+	    high = force_reg (Pmode, high);
+	  base = force_reg (Pmode, gen_rtx_PLUS (Pmode, high, base));
+	  addr = plus_constant (Pmode, base, offset);
+	}
+      else
+	addr = riscv_add_offset (NULL, base, offset);
       return riscv_force_address (addr, mode);
     }
 
@@ -1812,7 +1872,10 @@ riscv_address_cost (rtx addr, machine_mode mode,
 		    addr_space_t as ATTRIBUTE_UNUSED,
 		    bool speed ATTRIBUTE_UNUSED)
 {
-  return riscv_address_insns (addr, mode, false);
+  if (!speed && mode == SImode
+      && riscv_compressed_lw_address_p (addr))
+    return 1;
+  return !speed + riscv_address_insns (addr, mode, false);
 }
 
 /* Return one word of double-word value OP.  HIGH_P is true to select the
@@ -4541,6 +4604,106 @@ riscv_init_machine_status (void)
   return ggc_cleared_alloc<machine_function> ();
 }
 
+namespace {
+
+const pass_data pass_data_shorten_memrefs =
+{
+  RTL_PASS, /* type */
+  "shorten_memrefs", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_shorten_memrefs : public rtl_opt_pass
+{
+public:
+  pass_shorten_memrefs (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_shorten_memrefs, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return optimize > 0; }
+  virtual unsigned int execute (function *);
+
+}; // class pass_shorten_memrefs
+
+/* Try to make more use of compressed load and store instructions by replacing
+   a load/store at address BASE + LARGE_OFFSET with a new load/store at address
+   NEW BASE + SMALL OFFSET.  If NEW BASE is stored in a compressed register, the
+   load/store can be compressed.  Since creating NEW BASE incurs an overhead,
+   the change is only attempted when BASE is referenced by at least four
+   load/stores in the same basic block.  */
+unsigned int
+pass_shorten_memrefs::execute (function *fn)
+{
+  typedef int_hash <HOST_WIDE_INT, 0> regno_hash;
+  typedef hash_map <regno_hash, int> regno_map;
+
+  basic_block bb;
+  rtx_insn *insn;
+
+  regstat_init_n_sets_and_refs ();
+
+  FOR_ALL_BB_FN (bb, fn)
+  {
+    regno_map *m = hash_map<regno_hash, int>::create_ggc (10);
+    for (int pass = 0; !optimize_bb_for_speed_p (bb) && pass < 2; pass++)
+      FOR_BB_INSNS (bb, insn)
+	{
+	  if (!NONJUMP_INSN_P (insn))
+	    continue;
+	  rtx pat = PATTERN (insn);
+	  if (GET_CODE (pat) != SET)
+	    continue;
+	  start_sequence ();
+	  for (int i = 0; i < 2; i++)
+	    {
+	      rtx mem = XEXP (pat, i);
+	      if (MEM_P (mem) && GET_MODE (mem) == SImode)
+		{
+		  rtx addr = XEXP (mem, 0);
+		  if (GET_CODE (addr) != PLUS)
+		    continue;
+		  if (!REG_P (XEXP (addr, 0)))
+		    continue;
+		  HOST_WIDE_INT regno = REGNO (XEXP (addr, 0));
+		  if (REG_N_REFS (regno) < 4)
+		    continue;
+		  if (pass == 0)
+		    m->get_or_insert (regno)++;
+		  else if (m->get_or_insert (regno) > 3)
+		    {
+		      addr
+			= riscv_legitimize_address (addr, addr, GET_MODE (mem));
+		      XEXP (pat, i) = replace_equiv_address (mem, addr);
+		      df_insn_rescan (insn);
+		    }
+		}
+	    }
+	  rtx_insn *seq = get_insns ();
+	  end_sequence ();
+	  emit_insn_before (seq, insn);
+	}
+
+  }
+  regstat_free_n_sets_and_refs ();
+
+  return 0;
+}
+
+} // anon namespace
+
+opt_pass *
+make_pass_shorten_memrefs (gcc::context *ctxt)
+{
+  return new pass_shorten_memrefs (ctxt);
+}
+
 /* Implement TARGET_OPTION_OVERRIDE.  */
 
 static void
@@ -4637,6 +4800,10 @@ riscv_option_override (void)
     error ("%<-mriscv-attribute%> RISC-V ELF attribute requires GNU as 2.32"
 	   " [%<-mriscv-attribute%>]");
 #endif
+
+  if (TARGET_RVC)
+    register_pass (make_pass_shorten_memrefs (g),
+		   PASS_POS_INSERT_AFTER, "store_motion", 1);
 }
 
 /* Implement TARGET_CONDITIONAL_REGISTER_USAGE.  */
@@ -4676,9 +4843,9 @@ riscv_conditional_register_usage (void)
 static int
 riscv_register_priority (int regno)
 {
-  /* Favor x8-x15/f8-f15 to improve the odds of RVC instruction selection.  */
-  if (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
-		     || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)))
+  /* Favor compressed registers to improve the odds of RVC instruction
+     selection.  */
+  if (riscv_compressed_reg_p (regno))
     return 1;
 
   return 0;
diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
index 52db3cc..92a0893 100644
--- a/gcc/sched-deps.c
+++ b/gcc/sched-deps.c
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "sched-int.h"
 #include "params.h"
 #include "cselib.h"
+#include "predict.h"
 
 #ifdef INSN_SCHEDULING
 
@@ -4707,6 +4708,15 @@ attempt_change (struct mem_inc_info *mii, rtx new_addr)
   rtx mem = *mii->mem_loc;
   rtx new_mem;
 
+  /* When not optimizing for speed, avoid changes that are expected to make code
+     size larger.  */
+  addr_space_t as = MEM_ADDR_SPACE (mem);
+  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (mii->mem_insn));
+  int old_cost = address_cost (XEXP (mem, 0), GET_MODE (mem), as, speed);
+  int new_cost = address_cost (new_addr, GET_MODE (mem), as, speed);
+  if (new_cost > old_cost && !speed)
+    return NULL_RTX;
+
   /* Jump through a lot of hoops to keep the attributes up to date.  We
      do not want to call one of the change address variants that take
      an offset even though we know the offset in many cases.  These


* Re: [PATCH] RISC-V: Allow more load/stores to be compressed
  2019-09-12 16:19 [PATCH] RISC-V: Allow more load/stores to be compressed Craig Blackmore
@ 2019-09-18 10:01 ` Kito Cheng
  2019-10-25 17:40   ` [PATCH v2 0/2] " Craig Blackmore
From: Kito Cheng @ 2019-09-18 10:01 UTC
  To: Craig Blackmore; +Cc: GCC Patches, Jim Wilson, Ofer Shinaar, Nidal Faour

Hi Craig:

Some general review comments:
- Split the new pass into a new file.
- Add a new option to enable/disable this pass.
- Could you extend this patch to support lw/sw/ld/sd/flw/fsw/fld/fsd?
  I think there is a lot of common logic shared with supporting the other
  types of compressed load/store instruction, and I'd like to see that
  support added all at once.
- Do you have experimental data about doing this after register
  allocation/reload?  I'd prefer doing such an optimization after RA,
  because then we can accurately estimate how many bytes we gain.  I guess
  the code size increase happens because RA didn't assign a suitable
  src/dest reg or base reg?

On Fri, Sep 13, 2019 at 12:20 AM Craig Blackmore
<craig.blackmore@embecosm.com> wrote:
>
> [... full patch description, benchmark results, ChangeLog and earlier
> hunks trimmed; see the original message above ...]
>
> @@ -1318,7 +1359,9 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
>    if (riscv_split_symbol (NULL, x, mode, &addr))
>      return riscv_force_address (addr, mode);
>
> -  /* Handle BASE + OFFSET using riscv_add_offset.  */
> +  /* When optimizing for size, try to convert BASE + LARGE_OFFSET into
> +     NEW_BASE + SMALL_OFFSET to allow possible compressed load/store, otherwise,
> +     handle BASE + OFFSET using riscv_add_offset.  */
>    if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1))
>        && INTVAL (XEXP (x, 1)) != 0)
>      {
> @@ -1327,7 +1370,24 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
>
>        if (!riscv_valid_base_register_p (base, mode, false))
>         base = copy_to_mode_reg (Pmode, base);
> -      addr = riscv_add_offset (NULL, base, offset);
> +      if (optimize_function_for_size_p (cfun)
> +         && (strcmp (current_pass->name, "shorten_memrefs") == 0)
> +         && mode == SImode
> +         && (offset & 3) == 0
> +         && !IN_RANGE (offset, 0, 124))

I think the offset check can be relaxed here, since you can move part of
the offset into HIGH, and the remaining offset for the compressed
load/store can still be aligned to fit the instruction format.

e.g.
  lw a1, 125(a0)
  lw a2, 129(a0)
->
  addi a3, a0, 125
  lw a1, 0(a3)
  lw a2, 4(a3)

> +       {
> +         rtx high;
> +
> +         /* Leave OFFSET as a 7-bit offset and put the excess in HIGH.  */
> +         high = GEN_INT (offset & ~124);
> +         offset &= 124;

124 appears 4 times in this patch like a magic number.  I know it comes
from (((2^5) << 2) - 1) & ~0x3 = 124, but it would be good to find some
way to make this more readable.
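
One way to avoid the magic number would be to derive it from the encoding,
for example (a sketch; v2 of this patch below instead keeps the literal 124
as a named constant with an explanatory comment):

  /* C.LW takes an unsigned 5-bit offset scaled by 4, so the maximum
     representable offset is (2^5 - 1) * 4 = 124.  */
  #define RISCV_MAX_COMPRESSED_LW_OFFSET (((1 << 5) - 1) << 2)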

> +         if (!SMALL_OPERAND (INTVAL (high)))
> +           high = force_reg (Pmode, high);
> +         base = force_reg (Pmode, gen_rtx_PLUS (Pmode, high, base));
> +         addr = plus_constant (Pmode, base, offset);
> +       }
> +      else
> +       addr = riscv_add_offset (NULL, base, offset);

Could you split this logic into a new function?

>        return riscv_force_address (addr, mode);
>      }
>
> @@ -1812,7 +1872,10 @@ riscv_address_cost (rtx addr, machine_mode mode,
>                     addr_space_t as ATTRIBUTE_UNUSED,
>                     bool speed ATTRIBUTE_UNUSED)
>  {
> -  return riscv_address_insns (addr, mode, false);
> +  if (!speed && mode == SImode
> +      && riscv_compressed_lw_address_p (addr))
> +    return 1;
> +  return !speed + riscv_address_insns (addr, mode, false);
>  }
>
>  /* Return one word of double-word value OP.  HIGH_P is true to select the
> @@ -4541,6 +4604,106 @@ riscv_init_machine_status (void)
>    return ggc_cleared_alloc<machine_function> ();
>  }
>
> +namespace {
> +
> +const pass_data pass_data_shorten_memrefs =
> +{
> +  RTL_PASS, /* type */
> +  "shorten_memrefs", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  0, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  0, /* todo_flags_finish */
> +};
> +
> +class pass_shorten_memrefs : public rtl_opt_pass
> +{
> +public:
> +  pass_shorten_memrefs (gcc::context *ctxt)
> +    : rtl_opt_pass (pass_data_shorten_memrefs, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *) { return optimize > 0; }
> +  virtual unsigned int execute (function *);
> +
> +}; // class pass_shorten_memrefs
> +
> +/* Try to make more use of compressed load and store instructions by replacing
> +   a load/store at address BASE + LARGE_OFFSET with a new load/store at address
> +   NEW BASE + SMALL OFFSET.  If NEW BASE is stored in a compressed register, the
> +   load/store can be compressed.  Since creating NEW BASE incurs an overhead,
> +   the change is only attempted when BASE is referenced by at least four
> +   load/stores in the same basic block.  */
> +unsigned int
> +pass_shorten_memrefs::execute (function *fn)
> +{
> +  typedef int_hash <HOST_WIDE_INT, 0> regno_hash;
> +  typedef hash_map <regno_hash, int> regno_map;
> +
> +  basic_block bb;
> +  rtx_insn *insn;
> +
> +  regstat_init_n_sets_and_refs ();
> +
> +  FOR_ALL_BB_FN (bb, fn)
> +  {
> +    regno_map *m = hash_map<regno_hash, int>::create_ggc (10);
> +    for (int pass = 0; !optimize_bb_for_speed_p (bb) && pass < 2; pass++)

It seems like you collect info in the first pass and do the transform in
the second pass, but you do the transform in the first pass too.  Splitting
the two passes into two separate functions would be more readable.

> +      FOR_BB_INSNS (bb, insn)
> +       {
> +         if (!NONJUMP_INSN_P (insn))
> +           continue;
> +         rtx pat = PATTERN (insn);
> +         if (GET_CODE (pat) != SET)
> +           continue;
> +         start_sequence ();
> +         for (int i = 0; i < 2; i++)
> +           {
> +             rtx mem = XEXP (pat, i);
> +             if (MEM_P (mem) && GET_MODE (mem) == SImode)
> +               {
> +                 rtx addr = XEXP (mem, 0);
> +                 if (GET_CODE (addr) != PLUS)
> +                   continue;
> +                 if (!REG_P (XEXP (addr, 0)))
> +                   continue;
> +                 HOST_WIDE_INT regno = REGNO (XEXP (addr, 0));
> +                 if (REG_N_REFS (regno) < 4)
> +                   continue;
> +                 if (pass == 0)
> +                   m->get_or_insert (regno)++;
> +                 else if (m->get_or_insert (regno) > 3)
> +                   {
> +                     addr
> +                       = riscv_legitimize_address (addr, addr, GET_MODE (mem));
> +                     XEXP (pat, i) = replace_equiv_address (mem, addr);
> +                     df_insn_rescan (insn);
> +                   }
> +               }
> +           }
> +         rtx_insn *seq = get_insns ();
> +         end_sequence ();
> +         emit_insn_before (seq, insn);
> +       }
> +
> +  }
> +  regstat_free_n_sets_and_refs ();
> +
> +  return 0;
> +}
> +
> +} // anon namespace
> +
> +opt_pass *
> +make_pass_shorten_memrefs (gcc::context *ctxt)
> +{
> +  return new pass_shorten_memrefs (ctxt);
> +}
> +
>  /* Implement TARGET_OPTION_OVERRIDE.  */
>
>  static void
> @@ -4637,6 +4800,10 @@ riscv_option_override (void)
>      error ("%<-mriscv-attribute%> RISC-V ELF attribute requires GNU as 2.32"
>            " [%<-mriscv-attribute%>]");
>  #endif
> +
> +  if (TARGET_RVC)
> +    register_pass (make_pass_shorten_memrefs (g),
> +                  PASS_POS_INSERT_AFTER, "store_motion", 1);
>  }

GCC has a new interface for adding target-specific optimization passes.
You can create a new file riscv-passes.def like
aarch64-passes.def/i386-passes.def and add
PASSES_EXTRA += $(srcdir)/config/riscv/riscv-passes.def to t-riscv.
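
A sketch of what that .def file could contain (matching the
INSERT_PASS_AFTER line that v2 of the patch adds below):

  /* config/riscv/riscv-passes.def */
  INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);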

>
>  /* Implement TARGET_CONDITIONAL_REGISTER_USAGE.  */
> @@ -4676,9 +4843,9 @@ riscv_conditional_register_usage (void)
>  static int
>  riscv_register_priority (int regno)
>  {
> -  /* Favor x8-x15/f8-f15 to improve the odds of RVC instruction selection.  */
> -  if (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
> -                    || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)))
> +  /* Favor compressed registers to improve the odds of RVC instruction
> +     selection.  */
> +  if (riscv_compressed_reg_p (regno))
>      return 1;
>
>    return 0;
> diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
> index 52db3cc..92a0893 100644
> --- a/gcc/sched-deps.c
> +++ b/gcc/sched-deps.c
> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "sched-int.h"
>  #include "params.h"
>  #include "cselib.h"
> +#include "predict.h"
>
>  #ifdef INSN_SCHEDULING
>
> @@ -4707,6 +4708,15 @@ attempt_change (struct mem_inc_info *mii, rtx new_addr)
>    rtx mem = *mii->mem_loc;
>    rtx new_mem;
>
> +  /* When not optimizing for speed, avoid changes that are expected to make code
> +     size larger.  */
> +  addr_space_t as = MEM_ADDR_SPACE (mem);
> +  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (mii->mem_insn));
> +  int old_cost = address_cost (XEXP (mem, 0), GET_MODE (mem), as, speed);
> +  int new_cost = address_cost (new_addr, GET_MODE (mem), as, speed);
> +  if (new_cost > old_cost && !speed)

I think !speed should not be needed here; it would mean address_cost is
incorrect if it generates worse code.  But this change will affect all
other targets, so I think it would be better to split it into another
patch and CC the related reviewer.


> +    return NULL_RTX;
> +
>    /* Jump through a lot of hoops to keep the attributes up to date.  We
>       do not want to call one of the change address variants that take
>       an offset even though we know the offset in many cases.  These
>


* [PATCH v2 0/2] RISC-V: Allow more load/stores to be compressed
  2019-09-18 10:01 ` Kito Cheng
@ 2019-10-25 17:40   ` Craig Blackmore
  2019-10-25 17:40     ` [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass Craig Blackmore
  2019-10-25 17:57     ` [PATCH v2 2/2] sched-deps.c: Avoid replacing address if it increases address cost Craig Blackmore
From: Craig Blackmore @ 2019-10-25 17:40 UTC
  To: gcc-patches
  Cc: jimw, Ofer.Shinaar, Nidal.Faour, kito.cheng, law, Craig Blackmore

Hi Kito,

Thank you for taking the time to review my patch. I am posting an updated
patchset taking into account your comments.

On 18/09/2019 11:01, Kito Cheng wrote:
> Hi Craig:
>
> Some general review comments:
> - Split the new pass into a new file.
> - Add a new option to enable/disable this pass.
> - Could you extend this patch to support lw/sw/ld/sd/flw/fsw/fld/fsd?
>   I think there is a lot of common logic shared with supporting the other
>   types of compressed load/store instruction, and I'd like to see that
>   support added all at once.

I agree the patch could be extended to other load/store instructions but
unfortunately I don't have the time to do this at the moment. Can the lw/sw
support be merged and the others added later?

> - Do you have experimental data about doing this after register
>   allocation/reload?

I don't think it is feasible to move the pass after reload, because the pass
requires a new register to be allocated for the new base.

>   I'd prefer doing such an optimization after RA, because then we can
>   accurately estimate how many bytes we gain.  I guess the code size
>   increase happens because RA didn't assign a suitable src/dest reg or
>   base reg?
>

Before reload, we do not know whether the base reg will be a compressed register
or not.


>
> On Fri, Sep 13, 2019 at 12:20 AM Craig Blackmore
> <craig.blackmore@embecosm.com> wrote:
>>
>> [... patch description, benchmark results and ChangeLog trimmed ...]

>> diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
>> index 52db3cc..92a0893 100644
>> --- a/gcc/sched-deps.c
>> +++ b/gcc/sched-deps.c
>> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "sched-int.h"
>>  #include "params.h"
>>  #include "cselib.h"
>> +#include "predict.h"
>>
>>  #ifdef INSN_SCHEDULING
>>
>> @@ -4707,6 +4708,15 @@ attempt_change (struct mem_inc_info *mii, rtx new_addr)
>>    rtx mem = *mii->mem_loc;
>>    rtx new_mem;
>>
>> +  /* When not optimizing for speed, avoid changes that are expected to make code
>> +     size larger.  */
>> +  addr_space_t as = MEM_ADDR_SPACE (mem);
>> +  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (mii->mem_insn));
>> +  int old_cost = address_cost (XEXP (mem, 0), GET_MODE (mem), as, speed);
>> +  int new_cost = address_cost (new_addr, GET_MODE (mem), as, speed);
>> +  if (new_cost > old_cost && !speed)
>
> I think !speed should not be needed here; it would mean address_cost is
> incorrect if it generates worse code.  But this change will affect all
> other targets, so I think it would be better to split it into another
> patch and CC the related reviewer.
>
>

I have removed !speed in the updated patch and CC'd Jeff Law.

Jeff - please could you review my change to sched-deps.c in patch 2/2?

Thanks,
Craig
--

Craig Blackmore (2):
  RISC-V: Add shorten_memrefs pass
  sched-deps.c: Avoid replacing address if it increases address cost

 gcc/config.gcc                           |   2 +-
 gcc/config/riscv/riscv-passes.def        |  20 +++
 gcc/config/riscv/riscv-protos.h          |   2 +
 gcc/config/riscv/riscv-shorten-memrefs.c | 188 +++++++++++++++++++++++
 gcc/config/riscv/riscv.c                 |  86 ++++++++++-
 gcc/config/riscv/riscv.h                 |   5 +
 gcc/config/riscv/riscv.opt               |   6 +
 gcc/config/riscv/t-riscv                 |   5 +
 gcc/doc/invoke.texi                      |  10 ++
 gcc/sched-deps.c                         |   9 ++
 10 files changed, 327 insertions(+), 6 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-passes.def
 create mode 100644 gcc/config/riscv/riscv-shorten-memrefs.c

-- 
2.17.1


* [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass
  2019-10-25 17:40   ` [PATCH v2 0/2] " Craig Blackmore
@ 2019-10-25 17:40     ` Craig Blackmore
  2019-10-26 18:25       ` Jeff Law
                         ` (2 more replies)
  2019-10-25 17:57     ` [PATCH v2 2/2] sched-deps.c: Avoid replacing address if it increases address cost Craig Blackmore
From: Craig Blackmore @ 2019-10-25 17:40 UTC
  To: gcc-patches
  Cc: jimw, Ofer.Shinaar, Nidal.Faour, kito.cheng, law, Craig Blackmore

This patch aims to allow more load/store instructions to be compressed by
replacing a load/store of 'base register + large offset' with a new load/store
of 'new base + small offset'. If the new base gets stored in a compressed
register, then the new load/store can be compressed. Since there is an overhead
in creating the new base, this change is only attempted when 'base register' is
referenced in at least 4 load/stores in a basic block.

The optimization is implemented in a new RISC-V specific pass called
shorten_memrefs which is enabled for RVC targets. It has been developed for the
32-bit lw/sw instructions but could also be extended to 64-bit ld/sd in future.

Tested on bare metal rv32i, rv32iac, rv32im, rv32imac, rv32imafc, rv64imac,
rv64imafdc via QEMU. No regressions.

gcc/ChangeLog:

	* config.gcc: Add riscv-shorten-memrefs.o to extra_objs for riscv.
	* config/riscv/riscv-passes.def: New file.
	* config/riscv/riscv-protos.h (make_pass_shorten_memrefs): Declare.
	* config/riscv/riscv-shorten-memrefs.c: New file.
	* config/riscv/riscv.c (tree-pass.h): New include.
	(riscv_compressed_reg_p): New function.
	(riscv_compressed_lw_offset_p): Likewise.
	(riscv_compressed_lw_address_p): Likewise.
	(riscv_shorten_lw_offset): Likewise.
	(riscv_legitimize_address): Attempt to convert base + large_offset
	to compressible new_base + small_offset.
	(riscv_address_cost): Make anticipated compressed load/stores
	cheaper for code size than uncompressed load/stores.
	(riscv_register_priority): Move compressed register check to
	riscv_compressed_reg_p.
	* config/riscv/riscv.h (RISCV_MAX_COMPRESSED_LW_OFFSET): Define.
	* config/riscv/riscv.opt (mshorten-memrefs): New option.
	* config/riscv/t-riscv (riscv-shorten-memrefs.o): New rule.
	(PASSES_EXTRA): Add riscv-passes.def.
	* doc/invoke.texi: Document -mshorten-memrefs.
---
 gcc/config.gcc                           |   2 +-
 gcc/config/riscv/riscv-passes.def        |  20 +++
 gcc/config/riscv/riscv-protos.h          |   2 +
 gcc/config/riscv/riscv-shorten-memrefs.c | 188 +++++++++++++++++++++++
 gcc/config/riscv/riscv.c                 |  86 ++++++++++-
 gcc/config/riscv/riscv.h                 |   5 +
 gcc/config/riscv/riscv.opt               |   6 +
 gcc/config/riscv/t-riscv                 |   5 +
 gcc/doc/invoke.texi                      |  10 ++
 9 files changed, 318 insertions(+), 6 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-passes.def
 create mode 100644 gcc/config/riscv/riscv-shorten-memrefs.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index bdc2253f8ef..e617215314b 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -523,7 +523,7 @@ pru-*-*)
 	;;
 riscv*)
 	cpu_type=riscv
-	extra_objs="riscv-builtins.o riscv-c.o"
+	extra_objs="riscv-builtins.o riscv-c.o riscv-shorten-memrefs.o"
 	d_target_objs="riscv-d.o"
 	;;
 rs6000*-*-*)
diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def
new file mode 100644
index 00000000000..8a4ea0918db
--- /dev/null
+++ b/gcc/config/riscv/riscv-passes.def
@@ -0,0 +1,20 @@
+/* Declaration of target-specific passes for RISC-V.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5092294803c..78008c28b75 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -89,4 +89,6 @@ extern void riscv_init_builtins (void);
 /* Routines implemented in riscv-common.c.  */
 extern std::string riscv_arch_str ();
 
+rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
+
 #endif /* ! GCC_RISCV_PROTOS_H */
diff --git a/gcc/config/riscv/riscv-shorten-memrefs.c b/gcc/config/riscv/riscv-shorten-memrefs.c
new file mode 100644
index 00000000000..aed7ddb792e
--- /dev/null
+++ b/gcc/config/riscv/riscv-shorten-memrefs.c
@@ -0,0 +1,188 @@
+/* Shorten memrefs pass for RISC-V.
+   Copyright (C) 2018-2019 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "backend.h"
+#include "regs.h"
+#include "target.h"
+#include "memmodel.h"
+#include "emit-rtl.h"
+#include "df.h"
+#include "predict.h"
+#include "tree-pass.h"
+
+/* Try to make more use of compressed load and store instructions by replacing
+   a load/store at address BASE + LARGE_OFFSET with a new load/store at address
+   NEW BASE + SMALL OFFSET.  If NEW BASE is stored in a compressed register, the
+   load/store can be compressed.  Since creating NEW BASE incurs an overhead,
+   the change is only attempted when BASE is referenced by at least four
+   load/stores in the same basic block.  */
+
+namespace {
+
+const pass_data pass_data_shorten_memrefs =
+{
+  RTL_PASS, /* type */
+  "shorten_memrefs", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_shorten_memrefs : public rtl_opt_pass
+{
+public:
+  pass_shorten_memrefs (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_shorten_memrefs, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+    {
+      return TARGET_RVC && riscv_mshorten_memrefs && optimize > 0;
+    }
+  virtual unsigned int execute (function *);
+
+private:
+  typedef int_hash <HOST_WIDE_INT, 0> regno_hash;
+  typedef hash_map <regno_hash, int> regno_map;
+
+  regno_map * analyze (basic_block bb);
+  void transform (regno_map *m, basic_block bb);
+  bool get_si_mem_base_reg (rtx mem, rtx *addr);
+}; // class pass_shorten_memrefs
+
+bool
+pass_shorten_memrefs::get_si_mem_base_reg (rtx mem, rtx *addr)
+{
+  if (!MEM_P (mem) || GET_MODE (mem) != SImode)
+    return false;
+  *addr = XEXP (mem, 0);
+  return GET_CODE (*addr) == PLUS && REG_P (XEXP (*addr, 0));
+}
+
+/* Count how many times each regno is referenced as base address for a memory
+   access.  */
+
+pass_shorten_memrefs::regno_map *
+pass_shorten_memrefs::analyze (basic_block bb)
+{
+  regno_map *m = hash_map<regno_hash, int>::create_ggc (10);
+  rtx_insn *insn;
+
+  regstat_init_n_sets_and_refs ();
+
+  FOR_BB_INSNS (bb, insn)
+    {
+      if (!NONJUMP_INSN_P (insn))
+	continue;
+      rtx pat = PATTERN (insn);
+      if (GET_CODE (pat) != SET)
+	continue;
+      /* Analyze stores first then loads.  */
+      for (int i = 0; i < 2; i++)
+	{
+	  rtx mem = XEXP (pat, i);
+	  rtx addr;
+	  if (get_si_mem_base_reg (mem, &addr))
+	    {
+	      HOST_WIDE_INT regno = REGNO (XEXP (addr, 0));
+	      if (REG_N_REFS (regno) < 4)
+		continue;
+	      m->get_or_insert (regno)++;
+	    }
+	  }
+    }
+  regstat_free_n_sets_and_refs ();
+
+  return m;
+}
+
+/* Convert BASE + LARGE_OFFSET to NEW_BASE + SMALL_OFFSET for each load/store
+   with a base reg referenced at least 4 times.  */
+
+void
+pass_shorten_memrefs::transform (regno_map *m, basic_block bb)
+{
+  rtx_insn *insn;
+  FOR_BB_INSNS (bb, insn)
+    {
+      if (!NONJUMP_INSN_P (insn))
+	continue;
+      rtx pat = PATTERN (insn);
+      if (GET_CODE (pat) != SET)
+	continue;
+      start_sequence ();
+      /* Transform stores first then loads.  */
+      for (int i = 0; i < 2; i++)
+	{
+	  rtx mem = XEXP (pat, i);
+	  rtx addr;
+	  if (get_si_mem_base_reg (mem, &addr))
+	    {
+	      HOST_WIDE_INT regno = REGNO (XEXP (addr, 0));
+	      if (m->get_or_insert (regno) > 3)
+		{
+		  addr
+		    = targetm.legitimize_address (addr, addr, GET_MODE (mem));
+		  XEXP (pat, i) = replace_equiv_address (mem, addr);
+		  df_insn_rescan (insn);
+		}
+	    }
+	}
+      rtx_insn *seq = get_insns ();
+      end_sequence ();
+      emit_insn_before (seq, insn);
+    }
+}
+
+unsigned int
+pass_shorten_memrefs::execute (function *fn)
+{
+  basic_block bb;
+
+  FOR_ALL_BB_FN (bb, fn)
+  {
+    regno_map *m;
+    if (optimize_bb_for_speed_p (bb))
+      continue;
+    m = analyze (bb);
+    transform (m, bb);
+  }
+
+  return 0;
+}
+
+} // anon namespace
+
+rtl_opt_pass *
+make_pass_shorten_memrefs (gcc::context *ctxt)
+{
+  return new pass_shorten_memrefs (ctxt);
+}
diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 77a3ad94aa8..532eaa92632 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic.h"
 #include "builtins.h"
 #include "predict.h"
+#include "tree-pass.h"
 
 /* True if X is an UNSPEC wrapper around a SYMBOL_REF or LABEL_REF.  */
 #define UNSPEC_ADDRESS_P(X)					\
@@ -848,6 +849,52 @@ riscv_legitimate_address_p (machine_mode mode, rtx x, bool strict_p)
   return riscv_classify_address (&addr, x, mode, strict_p);
 }
 
+/* Return true if hard reg REGNO can be used in compressed instructions.  */
+
+static bool
+riscv_compressed_reg_p (int regno)
+{
+  /* x8-x15/f8-f15 are compressible registers.  */
+  return (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
+	  || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)));
+}
+
+/* Return true if x is an unsigned 5-bit immediate scaled by 4.  */
+
+static bool
+riscv_compressed_lw_offset_p (rtx x)
+{
+  return CONST_INT_P (x)
+	 && (INTVAL (x) & 3) == 0
+	 && IN_RANGE (INTVAL (x), 0, RISCV_MAX_COMPRESSED_LW_OFFSET);
+}
+
+/* Return true if load/store from/to address x can be compressed.  */
+
+static bool
+riscv_compressed_lw_address_p (rtx x)
+{
+  struct riscv_address_info addr;
+  bool result = riscv_classify_address (&addr, x, GET_MODE (x),
+					reload_completed);
+
+  /* Before reload, assuming all load/stores of valid addresses get compressed
+     gives better code size than checking if the address is reg + small_offset
+     early on.  */
+  if (result && !reload_completed)
+    return true;
+
+  /* Return false if address is not compressed_reg + small_offset.  */
+  if (!result
+      || addr.type != ADDRESS_REG
+      || (!riscv_compressed_reg_p (REGNO (addr.reg))
+	    && addr.reg != stack_pointer_rtx)
+      || !riscv_compressed_lw_offset_p (addr.offset))
+    return false;
+
+  return result;
+}
+
 /* Return the number of instructions needed to load or store a value
    of mode MODE at address X.  Return 0 if X isn't valid for MODE.
    Assume that multiword moves may need to be split into word moves
@@ -1305,6 +1352,24 @@ riscv_force_address (rtx x, machine_mode mode)
   return x;
 }
 
+/* Modify base + offset so that offset fits within a compressed load/store insn
+   and the excess is added to base.  */
+
+static rtx
+riscv_shorten_lw_offset (rtx base, HOST_WIDE_INT offset)
+{
+  rtx addr, high;
+  /* Leave OFFSET as an unsigned 5-bit offset scaled by 4 and put the excess
+     into HIGH.  */
+  high = GEN_INT (offset & ~RISCV_MAX_COMPRESSED_LW_OFFSET);
+  offset &= RISCV_MAX_COMPRESSED_LW_OFFSET;
+  if (!SMALL_OPERAND (INTVAL (high)))
+    high = force_reg (Pmode, high);
+  base = force_reg (Pmode, gen_rtx_PLUS (Pmode, high, base));
+  addr = plus_constant (Pmode, base, offset);
+  return addr;
+}
+
 /* This function is used to implement LEGITIMIZE_ADDRESS.  If X can
    be legitimized in a way that the generic machinery might not expect,
    return a new address, otherwise return NULL.  MODE is the mode of
@@ -1323,7 +1388,7 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
   if (riscv_split_symbol (NULL, x, mode, &addr, FALSE))
     return riscv_force_address (addr, mode);
 
-  /* Handle BASE + OFFSET using riscv_add_offset.  */
+  /* Handle BASE + OFFSET.  */
   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1))
       && INTVAL (XEXP (x, 1)) != 0)
     {
@@ -1332,7 +1397,14 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
 
       if (!riscv_valid_base_register_p (base, mode, false))
 	base = copy_to_mode_reg (Pmode, base);
-      addr = riscv_add_offset (NULL, base, offset);
+      if (optimize_function_for_size_p (cfun)
+	  && (strcmp (current_pass->name, "shorten_memrefs") == 0)
+	  && mode == SImode)
+	/* Convert BASE + LARGE_OFFSET into NEW_BASE + SMALL_OFFSET to allow
+	   possible compressed load/store.  */
+	addr = riscv_shorten_lw_offset (base, offset);
+      else
+	addr = riscv_add_offset (NULL, base, offset);
       return riscv_force_address (addr, mode);
     }
 
@@ -1822,6 +1894,9 @@ riscv_address_cost (rtx addr, machine_mode mode,
 		    addr_space_t as ATTRIBUTE_UNUSED,
 		    bool speed ATTRIBUTE_UNUSED)
 {
+  if (!speed && riscv_mshorten_memrefs && mode == SImode
+      && !riscv_compressed_lw_address_p (addr))
+    return riscv_address_insns (addr, mode, false) + 1;
   return riscv_address_insns (addr, mode, false);
 }
 
@@ -4647,6 +4722,7 @@ riscv_option_override (void)
     error ("%<-mriscv-attribute%> RISC-V ELF attribute requires GNU as 2.32"
 	   " [%<-mriscv-attribute%>]");
 #endif
+
 }
 
 /* Implement TARGET_CONDITIONAL_REGISTER_USAGE.  */
@@ -4686,9 +4762,9 @@ riscv_conditional_register_usage (void)
 static int
 riscv_register_priority (int regno)
 {
-  /* Favor x8-x15/f8-f15 to improve the odds of RVC instruction selection.  */
-  if (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
-		     || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)))
+  /* Favor compressed registers to improve the odds of RVC instruction
+     selection.  */
+  if (riscv_compressed_reg_p (regno))
     return 1;
 
   return 0;
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 246494663f6..36b84c7e8ef 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -876,6 +876,11 @@ while (0)
 
 #define SET_RATIO(speed) (CLEAR_RATIO (speed) - ((speed) ? 0 : 2))
 
+/* This is the maximum value that can be represented in a compressed load/store
+   offset (an unsigned 5-bit value scaled by 4).  */
+
+#define RISCV_MAX_COMPRESSED_LW_OFFSET 124
+
 #ifndef USED_FOR_TARGET
 extern const enum reg_class riscv_regno_to_class[];
 extern bool riscv_slow_unaligned_access_p;
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 7f0c35e9e9c..138bddd369f 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -87,6 +87,12 @@ msave-restore
 Target Report Mask(SAVE_RESTORE)
 Use smaller but slower prologue and epilogue code.
 
+mshorten-memrefs
+Target Bool Var(riscv_mshorten_memrefs) Init(1)
+Convert BASE + LARGE_OFFSET addresses to NEW_BASE + SMALL_OFFSET to allow more
+memory accesses to be generated as compressed instructions.  Currently targets
+32-bit integer load/stores.
+
 mcmodel=
 Target Report RejectNegative Joined Enum(code_model) Var(riscv_cmodel) Init(TARGET_DEFAULT_CMODEL)
 Specify the code model.
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index ece3a75d512..1e31a49c9c5 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -14,3 +14,8 @@ riscv-d.o: $(srcdir)/config/riscv/riscv-d.c
 	$(COMPILE) $<
 	$(POSTCOMPILE)
 
+riscv-shorten-memrefs.o: $(srcdir)/config/riscv/riscv-shorten-memrefs.c
+	$(COMPILE) $<
+	$(POSTCOMPILE)
+
+PASSES_EXTRA += $(srcdir)/config/riscv/riscv-passes.def
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 1407d019d14..6de3e032447 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1075,6 +1075,7 @@ See RS/6000 and PowerPC Options.
 -mpreferred-stack-boundary=@var{num} @gol
 -msmall-data-limit=@var{N-bytes} @gol
 -msave-restore  -mno-save-restore @gol
+-mshorten-memrefs  -mno-shorten-memrefs @gol
 -mstrict-align  -mno-strict-align @gol
 -mcmodel=medlow  -mcmodel=medany @gol
 -mexplicit-relocs  -mno-explicit-relocs @gol
@@ -24210,6 +24211,15 @@ Do or don't use smaller but slower prologue and epilogue code that uses
 library function calls.  The default is to use fast inline prologues and
 epilogues.
 
+@item -mshorten-memrefs
+@itemx -mno-shorten-memrefs
+@opindex mshorten-memrefs
+Do or do not attempt to make more use of compressed load/store instructions by
+replacing a load/store of 'base register + large offset' with a new load/store
+of 'new base + small offset'.  If the new base gets stored in a compressed
+register, then the new load/store can be compressed.  Currently targets 32-bit
+integer load/stores only.
+
 @item -mstrict-align
 @itemx -mno-strict-align
 @opindex mstrict-align
-- 
2.17.1


* [PATCH v2 2/2] sched-deps.c: Avoid replacing address if it increases address cost
  2019-10-25 17:40   ` [PATCH v2 0/2] " Craig Blackmore
  2019-10-25 17:40     ` [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass Craig Blackmore
@ 2019-10-25 17:57     ` Craig Blackmore
  2019-10-31  2:00       ` Jim Wilson
From: Craig Blackmore @ 2019-10-25 17:57 UTC
  To: gcc-patches
  Cc: jimw, Ofer.Shinaar, Nidal.Faour, kito.cheng, law, Craig Blackmore

The sched2 pass undoes some of the address transformations generated by the
RISC-V shorten_memrefs code size optimization (patch 1/2) and consequently
increases code size. This patch prevents sched-deps.c from changing an
address if the change is expected to increase address cost.

Tested on bare metal rv32i, rv32iac, rv32im, rv32imac, rv32imafc, rv64imac,
rv64imafdc via QEMU. Bootstrapped and tested on x86_64-linux-gnu. No
regressions.

gcc/ChangeLog:

	* sched-deps.c (attempt_change): Use old address if it is
	cheaper than new address.
---
 gcc/sched-deps.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
index 308db4e3ca0..c7d0ca550df 100644
--- a/gcc/sched-deps.c
+++ b/gcc/sched-deps.c
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "cselib.h"
 #include "function-abi.h"
+#include "predict.h"
 
 #ifdef INSN_SCHEDULING
 
@@ -4694,6 +4695,14 @@ attempt_change (struct mem_inc_info *mii, rtx new_addr)
   rtx mem = *mii->mem_loc;
   rtx new_mem;
 
+  /* Prefer old address if it is less expensive.  */
+  addr_space_t as = MEM_ADDR_SPACE (mem);
+  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (mii->mem_insn));
+  int old_cost = address_cost (XEXP (mem, 0), GET_MODE (mem), as, speed);
+  int new_cost = address_cost (new_addr, GET_MODE (mem), as, speed);
+  if (new_cost > old_cost)
+    return NULL_RTX;
+
   /* Jump through a lot of hoops to keep the attributes up to date.  We
      do not want to call one of the change address variants that take
      an offset even though we know the offset in many cases.  These
-- 
2.17.1


* Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass
  2019-10-25 17:40     ` [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass Craig Blackmore
@ 2019-10-26 18:25       ` Jeff Law
  2019-10-26 19:16         ` Oleg Endo
  2019-10-26 20:02         ` Andrew Waterman
  2019-10-31  0:00       ` Jim Wilson
  2019-10-31  0:03       ` Jim Wilson
From: Jeff Law @ 2019-10-26 18:25 UTC
  To: Craig Blackmore, gcc-patches; +Cc: jimw, Ofer.Shinaar, Nidal.Faour, kito.cheng

On 10/25/19 11:39 AM, Craig Blackmore wrote:
> This patch aims to allow more load/store instructions to be compressed by
> replacing a load/store of 'base register + large offset' with a new load/store
> of 'new base + small offset'. If the new base gets stored in a compressed
> register, then the new load/store can be compressed. Since there is an overhead
> in creating the new base, this change is only attempted when 'base register' is
> referenced in at least 4 load/stores in a basic block.
> 
> The optimization is implemented in a new RISC-V specific pass called
> shorten_memrefs which is enabled for RVC targets. It has been developed for the
> 32-bit lw/sw instructions but could also be extended to 64-bit ld/sd in future.
> 
> Tested on bare metal rv32i, rv32iac, rv32im, rv32imac, rv32imafc, rv64imac,
> rv64imafdc via QEMU. No regressions.
> 
> gcc/ChangeLog:
> 
> 	* config.gcc: Add riscv-shorten-memrefs.o to extra_objs for riscv.
> 	* config/riscv/riscv-passes.def: New file.
> 	* config/riscv/riscv-protos.h (make_pass_shorten_memrefs): Declare.
> 	* config/riscv/riscv-shorten-memrefs.c: New file.
> 	* config/riscv/riscv.c (tree-pass.h): New include.
> 	(riscv_compressed_reg_p): New function.
> 	(riscv_compressed_lw_offset_p): Likewise.
> 	(riscv_compressed_lw_address_p): Likewise.
> 	(riscv_shorten_lw_offset): Likewise.
> 	(riscv_legitimize_address): Attempt to convert base + large_offset
>         to compressible new_base + small_offset.
> 	(riscv_address_cost): Make anticipated compressed load/stores
>         cheaper for code size than uncompressed load/stores.
> 	(riscv_register_priority): Move compressed register check to
>         riscv_compressed_reg_p.
> 	* config/riscv/riscv.h (RISCV_MAX_COMPRESSED_LW_OFFSET): Define.
> 	* config/riscv/riscv.opt (mshorten-memrefs): New option.
> 	* config/riscv/t-riscv (riscv-shorten-memrefs.o): New rule.
> 	(PASSES_EXTRA): Add riscv-passes.def.
> 	* doc/invoke.texi: Document -mshorten-memrefs.
This has traditionally been done via the legitimize_address hook.
Is there some reason that hook is insufficient for this case?

The hook, IIRC, is called from explow.c.

Jeff


* Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass
  2019-10-26 18:25       ` Jeff Law
@ 2019-10-26 19:16         ` Oleg Endo
  2019-10-26 20:04           ` Jeff Law
  2019-10-26 20:02         ` Andrew Waterman
  1 sibling, 1 reply; 24+ messages in thread
From: Oleg Endo @ 2019-10-26 19:16 UTC (permalink / raw)
  To: Jeff Law, Craig Blackmore, gcc-patches
  Cc: jimw, Ofer.Shinaar, Nidal.Faour, kito.cheng

On Sat, 2019-10-26 at 12:21 -0600, Jeff Law wrote:
> This has traditionally been done via the legitimize_address hook.
> Is there some reason that hook is insufficient for this case?
> 
> The hook, IIRC, is called from explow.c.
> 

This sounds like some of my addressing mode selection (AMS) attempts on
SH.  Haven't looked at the patch (sorry), but I'm sure the problem is
pretty much the same.

On SH legitimize_address is used to do ... "something" ... to the
address in order to make the displacement fit.  The issue is,
legitimize_address doesn't get any context, so it can't even try to
find a locally optimal base address or something like that.
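
For reference, the hook's entire context is its three arguments; this
is its signature as documented in GCC's internals manual:

  /* X is the address to legitimize, OLDX is the operand before address
     simplification, and MODE is the mode of the memory being accessed.
     No insn, basic block, or access-sequence context is available.  */
  rtx TARGET_LEGITIMIZE_ADDRESS (rtx x, rtx oldx, machine_mode mode);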

Cheers,
Oleg


* Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass
  2019-10-26 18:25       ` Jeff Law
  2019-10-26 19:16         ` Oleg Endo
@ 2019-10-26 20:02         ` Andrew Waterman
  2019-10-26 20:16           ` Jeff Law
  1 sibling, 1 reply; 24+ messages in thread
From: Andrew Waterman @ 2019-10-26 20:02 UTC (permalink / raw)
  To: Jeff Law
  Cc: Craig Blackmore, GCC Patches, Jim Wilson, Ofer Shinaar,
	Nidal.Faour, Kito Cheng

I don't know enough to say whether the legitimize_address hook is
sufficient for the intended purpose, but I am sure that RISC-V's
concerns are not unique: other GCC targets have to cope with
offset-size constraints that are coupled to register-allocation
constraints.


On Sat, Oct 26, 2019 at 11:21 AM Jeff Law <law@redhat.com> wrote:
> This has traditionally been done via the legitimize_address hook.
> Is there some reason that hook is insufficient for this case?
>
> The hook, IIRC, is called from explow.c.
>
> Jeff
>


* Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass
  2019-10-26 19:16         ` Oleg Endo
@ 2019-10-26 20:04           ` Jeff Law
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff Law @ 2019-10-26 20:04 UTC (permalink / raw)
  To: Oleg Endo, Craig Blackmore, gcc-patches
  Cc: jimw, Ofer.Shinaar, Nidal.Faour, kito.cheng

On 10/26/19 1:10 PM, Oleg Endo wrote:
> On Sat, 2019-10-26 at 12:21 -0600, Jeff Law wrote:
>>
>> This has traditionally been done via the legitimize_address hook.
>> Is there some reason that hook is insufficient for this case?
>>
>> The hook, IIRC, is called from explow.c.
>>
> 
> This sounds like some of my addressing mode selection (AMS) attempts on
> SH.  Haven't looked at the patch (sorry), but I'm sure the problem is
> pretty much the same.
> 
> On SH legitimize_address is used to do ... "something" ... to the
> address in order to make the displacement fit.  The issue is,
> legitimize_address doesn't get any context, so it can't even try to
> find a locally optimal base address or something like that.
True, it's not given any context.  I think most ports try to form the
address in such a way that the base+offset is likely to be a common
subexpression.
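
A minimal sketch of that idea (hand-written, not taken from any port):
split the offset so that the high part is identical for neighbouring
accesses and can be CSE'd into one shared base:

  #include <stdint.h>

  /* Split OFFSET into HIGH + LOW with LOW a signed 12-bit value, so
     that BASE + OFFSET becomes (BASE + HIGH) + LOW and BASE + HIGH is
     a common subexpression across accesses to nearby offsets.  */
  static void
  split_offset (int32_t offset, int32_t *high, int32_t *low)
  {
    *low = ((offset & 0xfff) ^ 0x800) - 0x800; /* sign-extend low 12 bits */
    *high = offset - *low;
  }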

There was a similar hook for reload, which I didn't mention as I wasn't
sure it was used in the LRA world.

SH is likely more complex than most.  Joern tried really hard to get
good code on the SH and as a result the implementation is sometimes
tough to follow.

Jeff


* Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass
  2019-10-26 20:02         ` Andrew Waterman
@ 2019-10-26 20:16           ` Jeff Law
  2019-10-27  7:13             ` Oleg Endo
  0 siblings, 1 reply; 24+ messages in thread
From: Jeff Law @ 2019-10-26 20:16 UTC (permalink / raw)
  To: Andrew Waterman
  Cc: Craig Blackmore, GCC Patches, Jim Wilson, Ofer Shinaar,
	Nidal.Faour, Kito Cheng

On 10/26/19 1:33 PM, Andrew Waterman wrote:
> I don't know enough to say whether the legitimize_address hook is
> sufficient for the intended purpose, but I am sure that RISC-V's
> concerns are not unique: other GCC targets have to cope with
> offset-size constraints that are coupled to register-allocation
> constraints.
Yup.  I think every RISC port in the 90s faced this problem.  I always
wished for a generic mechanism for ports to handle it.

Regardless, it's probably worth investigating.

jeff


* Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass
  2019-10-26 20:16           ` Jeff Law
@ 2019-10-27  7:13             ` Oleg Endo
  0 siblings, 0 replies; 24+ messages in thread
From: Oleg Endo @ 2019-10-27  7:13 UTC (permalink / raw)
  To: Jeff Law, Andrew Waterman
  Cc: Craig Blackmore, GCC Patches, Jim Wilson, Ofer Shinaar,
	Nidal.Faour, Kito Cheng

On Sat, 2019-10-26 at 14:04 -0600, Jeff Law wrote:
> On 10/26/19 1:33 PM, Andrew Waterman wrote:
> > I don't know enough to say whether the legitimize_address hook is
> > sufficient for the intended purpose, but I am sure that RISC-V's
> > concerns are not unique: other GCC targets have to cope with
> > offset-size constraints that are coupled to register-allocation
> > constraints.
> 
> Yup.  I think every RISC port in the 90s faced this problem.  I
> always
> wished for a generic mechanism for ports to handle it.
> 
> Regardless, it's probably worth investigating.
> 

What we tried to do with the address mode selection (AMS) optimization
some time ago was the following:

  - Extract memory accesses from the insns stream and put them in 
    "access sequences".  Also analyze the address expression and try   
    to find effective base addresses by tracing back address 
    calculations.

  - For each memory access, get a set of address mode alternatives and 
    the corresponding costs from the backend.  The full context of each
    access is provided, so the backend can detect things like 
    "in this access sequence, FP loads dominate" and use this 
    information to tune the alternative costs.

  - Given the alternatives and costs for each memory access, the pass 
    would then try to minimize the costs of the whole memory access    
    sequence, taking costs of address modification isnns into account. 

I think this is quite generic, but of course also complex.  The
optimization problem itself is hard.  There was some research done by
others using CPLEX or PBQP solvers.  To keep things simple we used a
backtracking algorithm and handled only a limited set of scenarios.
For example, I think we could not get loop constructs to work nicely to
improve post-inc address mode utilization.
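
A much-simplified sketch of that structure (made-up names, nothing from
the real ams.h; a per-access greedy pick stands in for the backtracking
search just to show the objective):

  #include <algorithm>
  #include <climits>
  #include <vector>

  struct alternative   // one address mode offered for an access
  {
    int cost;          // backend cost of using this mode here
    int mod_cost;      // cost of the address-modification insns needed
  };

  struct mem_access    // one load/store in an access sequence
  {
    std::vector<alternative> alts;  // filled in by the backend
  };

  // Total cost if each access takes its cheapest alternative.  The
  // real pass minimized this with backtracking, since one access's
  // choice changes the modification costs of its neighbours.
  static int
  sequence_cost (const std::vector<mem_access> &seq)
  {
    int total = 0;
    for (const mem_access &a : seq)
      {
        int best = INT_MAX;
        for (const alternative &alt : a.alts)
          best = std::min (best, alt.cost + alt.mod_cost);
        total += best;
      }
    return total;
  }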

The SH ISA has very similar properties to ARM Thumb and RVC, and
perhaps others.  Advantages would not be limited to RISC only; even
CISC ISAs like M68K, RX, ... can benefit from it, as the "proper
optimization" can reduce instruction sizes by shortening the
addresses in the instruction stream.

If anyone is interested, here is the AMS optimization pass class:

https://github.com/erikvarga/gcc/blob/master/gcc/ams.h
https://github.com/erikvarga/gcc/blob/master/gcc/ams.cc

It's using a different style of callback into the backend code: not
GCC's "hooks" but a delegate pattern.  The SH backend's delegate
implementation is here:

https://github.com/erikvarga/gcc/blob/master/gcc/config/sh/sh.c#L11897
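
Reduced to a sketch, the delegate style looks like this (hypothetical
names; see the linked sh.c for the real implementation):

  // Instead of a free-standing target hook, the pass calls virtual
  // functions on an object the backend hands it, so the pass can
  // thread through as much context as it likes.
  struct mem_access;   // one access plus its surrounding sequence

  class ams_delegate
  {
  public:
    virtual ~ams_delegate () {}
    virtual int access_cost (const mem_access &acc) = 0;
  };

  class sh_ams_delegate : public ams_delegate
  {
  public:
    int access_cost (const mem_access &) override { return 1; } // stub
  };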

We were getting some improvements in the generated code, but it started
putting pressure on register allocation, exposing issues in the SH
backend (the R0 problem), so we could not do more thorough testing.

Maybe somebody can get some ideas out of it.

Cheers,
Oleg


* Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass
  2019-10-25 17:40     ` [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass Craig Blackmore
  2019-10-26 18:25       ` Jeff Law
@ 2019-10-31  0:00       ` Jim Wilson
  2019-10-31  9:42         ` Nidal Faour
  2019-12-10 18:28         ` Craig Blackmore
  2019-10-31  0:03       ` Jim Wilson
  2 siblings, 2 replies; 24+ messages in thread
From: Jim Wilson @ 2019-10-31  0:00 UTC (permalink / raw)
  To: Craig Blackmore
  Cc: GCC Patches, Ofer Shinaar, Nidal Faour, Kito Cheng, Jeff Law

On Fri, Oct 25, 2019 at 10:40 AM Craig Blackmore
<craig.blackmore@embecosm.com> wrote:
> This patch aims to allow more load/store instructions to be compressed by
> replacing a load/store of 'base register + large offset' with a new load/store
> of 'new base + small offset'. If the new base gets stored in a compressed
> register, then the new load/store can be compressed. Since there is an overhead
> in creating the new base, this change is only attempted when 'base register' is
> referenced in at least 4 load/stores in a basic block.
>
> The optimization is implemented in a new RISC-V specific pass called
> shorten_memrefs which is enabled for RVC targets. It has been developed for the
> 32-bit lw/sw instructions but could also be extended to 64-bit ld/sd in future.

The fact that this needs 4 load/stores in a block with the same base
address means it won't trigger very often.  But it does seem to work.
I see about a 0.05% code size reduction for a rv32gc newlib nano
libc.a.  There might be other codes that benefit more.

I'm concerned about the number of RISC-V specific optimization passes
people are writing.  I've seen at least 3 so far.  This one is at
least small and simple enough to understand that it doesn't look like
it will cause maintenance problems.

The config.gcc change conflicts with the save-restore optimization
pass that Andrew Burgess added, but that is a trivial fix.

The code only works for 32-bit load/stores.  rv64 has compressed
64-bit load/stores, and the F and D extensions have float and double
compressed loads and stores.  The patch would be more useful if it
handled all of these.

The patch doesn't check the other operand, it only looks at the memory
operand.  This results in some cases where the code rewrites
instructions that can never be compressed.  For instance, given
void
store1z (int *array)
{
  array[200] = 0;
  array[201] = 0;
  array[202] = 0;
  array[203] = 0;
}
the patch increases code size by 4 bytes because it rewrites the
stores, but we don't have compressed store 0, and x0 is not a
compressed reg, so there is no point in rewriting the stores.  I can
also create examples that show a size increase with loads but it is a
little trickier.
extern int sub1 (int, int, int, int, int, int, int);
int
load1a (int a0, int a1, int a2, int a3, int a4, int *array)
{
  int a = 0;
  a += array[200];
  a += array[201];
  a += array[202];
  a += array[203];
  return sub1 (a0, a1, a2, a3, a4, 0, a);
}
The register allocator will pass a0-a4 directly through from args to
the call, leaving only a5-a7 for the load base address and dest, and
only one of those regs is a compressed reg, but we need two, so these
loads can't be compressed.  The code still gets rewritten, resulting
in a size increase for the extra add.  Not sure if you can do anything
about that though, since you are doing this before register
allocation.
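
Put concretely (a hand-written helper, not from the patch): c.lw needs
the destination and the base both in x8-x15 and the offset to be an
unsigned 5-bit immediate scaled by 4, so with a6/a7 (x16/x17) involved
the rewritten loads stay 4 bytes and the added addi is pure overhead:

  #include <stdbool.h>
  #include <stdint.h>

  /* Whether "lw rd, offset(rs1)" is encodable as c.lw: both registers
     in x8-x15 and offset a multiple of 4 in [0, 124].  */
  static bool
  clw_encodable (unsigned rd, unsigned rs1, uint32_t offset)
  {
    bool regs_ok = rd >= 8 && rd <= 15 && rs1 >= 8 && rs1 <= 15;
    return regs_ok && offset % 4 == 0 && offset <= 124;
  }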

It isn't clear what the change to riscv_address_cost is for.  That
should be commented.

I'd suggest parens in the riscv_compressed_lw_offset_p return
statement, that makes it easier to align the code correctly (in emacs
at least).

The RISCV_MAX_COMPRESSED_LW_OFFSET should be moved down to the "ISA
constants" section of riscv.h, and maybe renamed similar to the other
constants.

There is a REG_P check in get_si_mem_reg.  That should probably handle
SUBREGs too.

A testcase to verify the patch would be nice.  I have one I wrote for
testing that shows some tests that work, some tests that don't work,
and some that work but maybe shouldn't.

I vaguely remember Michael Meissner talking about doing something
similar for PPC in a GNU Cauldron maybe 2 years ago.  But I don't know
if he tried, or how far he got if he did try.  It might be useful to
ask him about that work.

Otherwise this looks OK to me.

Jim


* Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass
  2019-10-25 17:40     ` [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass Craig Blackmore
  2019-10-26 18:25       ` Jeff Law
  2019-10-31  0:00       ` Jim Wilson
@ 2019-10-31  0:03       ` Jim Wilson
  2 siblings, 0 replies; 24+ messages in thread
From: Jim Wilson @ 2019-10-31  0:03 UTC (permalink / raw)
  To: Craig Blackmore
  Cc: GCC Patches, Ofer Shinaar, Nidal Faour, Kito Cheng, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 187 bytes --]

FYI the testcase I'm using to test the patch.  Some of the functions
get smaller, some of them get bigger, and some don't change in size
but should when compiled for an rv64 target.

Jim

[-- Attachment #2: tmp.c --]
[-- Type: text/x-csrc, Size: 1323 bytes --]

void
store1z (int *array)
{
  array[200] = 0;
  array[201] = 0;
  array[202] = 0;
  array[203] = 0;
}

void
store2z (long long *array)
{
  array[200] = 0;
  array[201] = 0;
  array[202] = 0;
  array[203] = 0;
}

void
store1a (int *array, int a)
{
  array[200] = a;
  array[201] = a;
  array[202] = a;
  array[203] = a;
}

void
store2a (long long *array, long long a)
{
  array[200] = a;
  array[201] = a;
  array[202] = a;
  array[203] = a;
}

int
load1r (int *array)
{
  int a = 0;
  a += array[200];
  a += array[201];
  a += array[202];
  a += array[203];
  return a;
}

long long
load2r (long long *array)
{
  int a = 0;
  a += array[200];
  a += array[201];
  a += array[202];
  a += array[203];
  return a;
}

extern int sub1 (int, int, int, int, int, int, int);

int
load1a (int a0, int a1, int a2, int a3, int a4, int *array)
{
  int a = 0;
  a += array[200];
  a += array[201];
  a += array[202];
  a += array[203];
  return sub1 (a0, a1, a2, a3, a4, 0, a);
}

extern long long sub2 (long long, long long, long long, long long, long long,
		       long long, long long);

long long
load2a (long long a0, long long a1, long long a2, long long a3, long long a4,
	long long *array)
{
  int a = 0;
  a += array[200];
  a += array[201];
  a += array[202];
  a += array[203];
  return sub2 (a0, a1, a2, a3, a4, 0, a);
}


* Re: [PATCH v2 2/2] sched-deps.c: Avoid replacing address if it increases address cost
  2019-10-25 17:57     ` [PATCH v2 2/2] sched-deps.c: Avoid replacing address if it increases address cost Craig Blackmore
@ 2019-10-31  2:00       ` Jim Wilson
  2019-12-10 18:29         ` Craig Blackmore
  0 siblings, 1 reply; 24+ messages in thread
From: Jim Wilson @ 2019-10-31  2:00 UTC (permalink / raw)
  To: Craig Blackmore
  Cc: GCC Patches, Ofer Shinaar, Nidal Faour, Kito Cheng, Jeff Law

On Fri, Oct 25, 2019 at 10:40 AM Craig Blackmore
<craig.blackmore@embecosm.com> wrote:
> The sched2 pass undoes some of the addresses generated by the RISC-V
> shorten_memrefs code size optimization (patch 1/2) and consequently increases
> code size. This patch prevents sched-deps.c from changing an address if it is
> expected to increase address cost.

This should be rewritten as a target hook, and then we can define the
hook to do what we want for RISC-V.  It isn't OK to make this change
for other targets without testing them.  If the default hook does
nothing, then other targets won't be affected.

Jim


* RE: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass
  2019-10-31  0:00       ` Jim Wilson
@ 2019-10-31  9:42         ` Nidal Faour
  2019-10-31 10:42           ` Andrew Waterman
  2019-12-10 18:28         ` Craig Blackmore
  1 sibling, 1 reply; 24+ messages in thread
From: Nidal Faour @ 2019-10-31  9:42 UTC (permalink / raw)
  To: Jim Wilson, Craig Blackmore
  Cc: GCC Patches, Ofer Shinaar, Kito Cheng, Jeff Law

A test case for this patch has been written and pushed to the WD GitHub repository at the following link:
https://github.com/westerndigitalcorporation/riscv32-Code-density-test-bench/tree/master/common_load_store_base_address_offset

The test case itself has been written based on a real product scenario.
We got a saving of about 10% in code size of the test itself.

Best Regards,

Nidal Faour
Staff Engineer, R&D Engineering – Firmware & Toolchain, CTO Group

Western Digital®
Migdal Tefen 24959, P.O Box 3
Email: nidal.faour@wdc.com
Office: +972-4-9078756
Mobile: +972-50-8867756



* Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass
  2019-10-31  9:42         ` Nidal Faour
@ 2019-10-31 10:42           ` Andrew Waterman
  2019-10-31 15:57             ` Ofer Shinaar
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Waterman @ 2019-10-31 10:42 UTC (permalink / raw)
  To: Nidal Faour
  Cc: Jim Wilson, Craig Blackmore, GCC Patches, Ofer Shinaar,
	Kito Cheng, Jeff Law

Nevertheless, as Jim observed, it's a great burden on the RISC-V
backend maintainers to support all these passes.  Are you saying WD is
willing to put its money where its mouth is and will provide active
support for these passes?

On Thu, Oct 31, 2019 at 2:42 AM Nidal Faour <Nidal.Faour@wdc.com> wrote:
>
> A test case for this patch has been written and pushed to the WD GitHub repository at the following link:
> https://github.com/westerndigitalcorporation/riscv32-Code-density-test-bench/tree/master/common_load_store_base_address_offset
>
> The test case itself has been written based on a real product scenario.
> We got a saving of about 10% in code size of the test itself.
>


* Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass
  2019-10-31 10:42           ` Andrew Waterman
@ 2019-10-31 15:57             ` Ofer Shinaar
  0 siblings, 0 replies; 24+ messages in thread
From: Ofer Shinaar @ 2019-10-31 15:57 UTC (permalink / raw)
  To: Andrew Waterman, Nidal Faour
  Cc: Jim Wilson, Craig Blackmore, GCC Patches, Kito Cheng, Jeff Law


The test case presents a real embedded case.  Yes, it is not a
functional benchmark, it is a test case, but these days most of the
MCUs for IoT devices act as controllers and less as computing/algorithm
units.  These days, functional algorithms are done in HW and managed by
SW with snippets like the given test case.
This means that SW accessing memory on the SoC to manage some HW
feature is commonly used, therefore the benefit of this pass will serve
everyone.

We (WD) will obviously support it as we are part of this joint effort
to drive better density.  As a next phase, we can improve this, taking
Jim's comments among others.
Also, as Jim said, this is a simple case and should not cause
maintenance problems.

On 31/10/2019 12:18, Andrew Waterman wrote:
> Nevertheless, as Jim observed, it's a great burden on the RISC-V
> backend maintainers to support all these passes.  Are you saying WD is
> willing to put its money where its mouth is and will provide active
> support for these passes?
>
> On Thu, Oct 31, 2019 at 2:42 AM Nidal Faour <Nidal.Faour@wdc.com> wrote:
>> A tests case for this patch has been written and pushed to WD github repository at the following link:
>> https://github.com/westerndigitalcorporation/riscv32-Code-density-test-bench/tree/master/common_load_store_base_address_offset
>>
>> the test case itself has been written based on a real product scenario.
>> We got a saving of about 10% in code size of the test itself.
>>
>> Best Regards,
>>
>> Nidal Faour
>> Staff Engineer, R&D Engineering – Firmware & Toolchain, CTO Group
>>
>> Western Digital®
>> Migdal Tefen 24959, P.O Box 3
>> Email: nidal.faour@wdc.com
>> Office: +972-4-9078756
>> Mobile: +972-50-8867756
>>
>>> -----Original Message-----
>>> From: Jim Wilson <jimw@sifive.com>
>>> Sent: Thursday, 31 October 2019 1:57
>>> To: Craig Blackmore <craig.blackmore@embecosm.com>
>>> Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Ofer Shinaar
>>> <Ofer.Shinaar@wdc.com>; Nidal Faour <Nidal.Faour@wdc.com>; Kito Cheng
>>> <kito.cheng@gmail.com>; Jeff Law <law@redhat.com>
>>> Subject: Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass
>>>
>>> CAUTION: This email originated from outside of Western Digital. Do not click
>>> on links or open attachments unless you recognize the sender and know that
>>> the content is safe.
>>>
>>>
>>> On Fri, Oct 25, 2019 at 10:40 AM Craig Blackmore
>>> <craig.blackmore@embecosm.com> wrote:
>>>> This patch aims to allow more load/store instructions to be compressed
>>>> by replacing a load/store of 'base register + large offset' with a new
>>>> load/store of 'new base + small offset'. If the new base gets stored
>>>> in a compressed register, then the new load/store can be compressed.
>>>> Since there is an overhead in creating the new base, this change is
>>>> only attempted when 'base register' is referenced in at least 4 load/stores in
>>> a basic block.
>>>> The optimization is implemented in a new RISC-V specific pass called
>>>> shorten_memrefs which is enabled for RVC targets. It has been
>>>> developed for the 32-bit lw/sw instructions but could also be extended to
>>> 64-bit ld/sd in future.
>>>
>>> The fact that this needs 4 load/stores in a block with the same base address
>>> means it won't trigger very often.  But it does seem to work.
>>> I see about a 0.05% code size reduction for a rv32gc newlib nano libc.a.  There
>>> might be other codes that benefit more.
>>>
>>> I'm concerned about the number of RISC-V specific optimization passes
>>> people are writing.  I've seen at least 3 so far.  This one is at least small and
>>> simple enough to understand that it doesn't look like it will cause maintenance
>>> problems.
>>>
>>> The config.gcc change conflicts with the save-restore optimization pass that
>>> Andrew Burgess added, but that is a trivial fix.
>>>
>>> The code only works for 32-bit load/stores.  rv64 has compressed 64-bit
>>> load/stores, and the F and D extensions have float and double compressed
>>> loads and stores.  The patch would be more useful if it handled all of these.
>>>
>>> The patch doesn't check the other operand, it only looks at the memory
>>> operand.  This results in some cases where the code rewrites instructions that
>>> can never be compressed.  For instance, given void store1z (int *array) {
>>>   array[200] = 0;
>>>   array[201] = 0;
>>>   array[202] = 0;
>>>   array[203] = 0;
>>> }
>>> the patch increases code size by 4 bytes because it rewrites the stores, but we
>>> don't have compressed store 0, and x0 is not a compressed reg, so there is no
>>> point in rewriting the stores.  I can also create examples that show a size
>>> increase with loads but it is a little trickier.
>>> extern int sub1 (int, int, int, int, int, int, int); int load1a (int a0, int a1, int a2, int
>>> a3, int a4, int *array) {
>>>   int a = 0;
>>>   a += array[200];
>>>   a += array[201];
>>>   a += array[202];
>>>   a += array[203];
>>>   return sub1 (a0, a1, a2, a3, a4, 0, a); } The register allocator will pass a0-a4
>>> directly through from args to the call, leaving only a5-a7 for the load base
>>> address and dest, and only one of those regs is a compressed reg, but we
>>> need two, so these loads can't be compressed.  The code still gets rewritten,
>>> resulting in a size increase for the extra add.  Not sure if you can do anything
>>> about that though, since you are doing this before register allocation.
>>>
>>> It isn't clear that the change riscv_address_cost is for.  That should be
>>> commented.
>>>
>>> I'd suggest parens in the riscv_compressed_lw_offset_p return statement,
>>> that makes it easier to align the code correctly (in emacs at least).
>>>
>>> The RISCV_MAX_COMPRESSED_LW_OFFSET should be moved down to the
>>> "ISA constants" section of riscv.h, and maybe renamed similar to the other
>>> constants.
>>>
>>> There is a REG_P check in get_si_mem_reg.  That should probably handle
>>> SUBREGs too.
>>>
>>> A testcase to verify the patch would be nice.  I have one I wrote for testing
>>> that shows some tests that work, some tests that don't work, and some that
>>> work but maybe shouldn't.
>>>
>>> I vaguely remember Micheal Meissner talking about doing something similar
>>> for PPC in a GNU Cauldron maybe 2 years ago.  But I don't know if he tried, or
>>> how far he got if he did try.  It might be useful to ask him about that work.
>>>
>>> Otherwise this looks OK to me.
>>>
>>> Jim




* Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass
  2019-10-31  0:00       ` Jim Wilson
  2019-10-31  9:42         ` Nidal Faour
@ 2019-12-10 18:28         ` Craig Blackmore
  2020-02-19 11:40           ` Craig Blackmore
  1 sibling, 1 reply; 24+ messages in thread
From: Craig Blackmore @ 2019-12-10 18:28 UTC (permalink / raw)
  To: Jim Wilson; +Cc: GCC Patches, Ofer Shinaar, Nidal Faour, Kito Cheng, Jeff Law

Hi Jim,

Thank you for your review. I have posted an updated patch below which I think
addresses your comments.

On 30/10/2019 23:57, Jim Wilson wrote:
> On Fri, Oct 25, 2019 at 10:40 AM Craig Blackmore
> <craig.blackmore@embecosm.com> wrote:
>
> The fact that this needs 4 load/stores in a block with the same base
> address means it won't trigger very often.  But it does seem to work.
> I see about a 0.05% code size reduction for a rv32gc newlib nano
> libc.a.  There might be other codes that benefit more.
>
> I'm concerned about the number of RISC-V specific optimization passes
> people are writing.  I've seen at least 3 so far.  This one is at
> least small and simple enough to understand that it doesn't look like
> it will cause maintenance problems.
>
> The config.gcc change conflicts with the save-restore optimization
> pass that Andrew Burgess added, but that is a trivial fix.
>

I have rebased onto latest trunk and fixed this.

> The code only works for 32-bit load/stores.  rv64 has compressed
> 64-bit load/stores, and the F and D extensions have float and double
> compressed loads and stores.  The patch would be more useful if it
> handled all of these.
>

We plan to add support for these in the future.

> The patch doesn't check the other operand, it only looks at the memory
> operand.  This results in some cases where the code rewrites
> instructions that can never be compressed.  For instance, given
> void
> store1z (int *array)
> {
>   array[200] = 0;
>   array[201] = 0;
>   array[202] = 0;
>   array[203] = 0;
> }
> the patch increases code size by 4 bytes because it rewrites the
> stores, but we don't have compressed store 0, and x0 is not a
> compressed reg, so there is no point in rewriting the stores.

I have now excluded store zero from being counted and rewritten.

>  I can
> also create examples that show a size increase with loads but it is a
> little trickier.
> extern int sub1 (int, int, int, int, int, int, int);
> int
> load1a (int a0, int a1, int a2, int a3, int a4, int *array)
> {
>   int a = 0;
>   a += array[200];
>   a += array[201];
>   a += array[202];
>   a += array[203];
>   return sub1 (a0, a1, a2, a3, a4, 0, a);
> }
> The register allocator will pass a0-a4 directly through from args to
> the call, leaving only a5-a7 for the load base address and dest, and
> only one of those regs is a compressed reg, but we need two, so these
> loads can't be compressed.  The code still gets rewritten, resulting
> in a size increase for the extra add.  Not sure if you can do anything
> about that though, since you are doing this before register
> allocation.

I haven't got a way of avoiding this at present, so I have added this
case as an xfail.

>
> It isn't clear what the change to riscv_address_cost is for.  That should
> be commented.
>

This is now commented.

> I'd suggest parens in the riscv_compressed_lw_offset_p return
> statement, that makes it easier to align the code correctly (in emacs
> at least).
>

Parens added.

> The RISCV_MAX_COMPRESSED_LW_OFFSET should be moved down to the "ISA
> constants" section of riscv.h, and maybe renamed similar to the other
> constants.
>

Done.

> There is a REG_P check in get_si_mem_reg.  That should probably handle
> SUBREGs too.
>

I couldn't produce a case where the address is 'subreg + offset', so I have
not added handling for SUBREGs for now.

> A testcase to verify the patch would be nice.  I have one I wrote for
> testing that shows some tests that work, some tests that don't work,
> and some that work but maybe shouldn't.
>

Thank you for the test file. I have added some testcases to the patch
based on this. 

> I vaguely remember Michael Meissner talking about doing something
> similar for PPC in a GNU Cauldron maybe 2 years ago.  But I don't know
> if he tried, or how far he got if he did try.  It might be useful to
> ask him about that work.
>
> Otherwise this looks OK to me.
>
> Jim

Thanks,
Craig

---

gcc/ChangeLog:

	* config.gcc: Add riscv-shorten-memrefs.o to extra_objs for riscv.
	* config/riscv/riscv-passes.def: New file.
	* config/riscv/riscv-protos.h (make_pass_shorten_memrefs): Declare.
	* config/riscv/riscv-shorten-memrefs.c: New file.
	* config/riscv/riscv.c (tree-pass.h): New include.
	(riscv_compressed_reg_p): New function.
	(riscv_compressed_lw_offset_p): Likewise.
	(riscv_compressed_lw_address_p): Likewise.
	(riscv_shorten_lw_offset): Likewise.
	(riscv_legitimize_address): Attempt to convert base + large_offset
	to compressible new_base + small_offset.
	(riscv_address_cost): Make anticipated compressed load/stores
	cheaper for code size than uncompressed load/stores.
	(riscv_register_priority): Move compressed register check to
	riscv_compressed_reg_p.
	(riscv_new_address_profitable_p): New function.
	(TARGET_NEW_ADDRESS_PROFITABLE_P): Define.
	* config/riscv/riscv.h (C_S_BITS): Define.
	(CSW_MAX_OFFSET): Define.
	* config/riscv/riscv.opt (mshorten-memrefs): New option.
	* config/riscv/t-riscv (riscv-shorten-memrefs.o): New rule.
	(PASSES_EXTRA): Add riscv-passes.def.
	* doc/invoke.texi: Document -mshorten-memrefs.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/shorten-memrefs-1.c: New test.
	* gcc.target/riscv/shorten-memrefs-2.c: New test.
	* gcc.target/riscv/shorten-memrefs-3.c: New test.
	* gcc.target/riscv/shorten-memrefs-4.c: New test.
	* gcc.target/riscv/shorten-memrefs-5.c: New test.
	* gcc.target/riscv/shorten-memrefs-6.c: New test.
	* gcc.target/riscv/shorten-memrefs-7.c: New test.
---
 gcc/config.gcc                                |   2 +-
 gcc/config/riscv/riscv-passes.def             |  20 ++
 gcc/config/riscv/riscv-protos.h               |   2 +
 gcc/config/riscv/riscv-shorten-memrefs.c      | 200 ++++++++++++++++++
 gcc/config/riscv/riscv.c                      |  88 +++++++-
 gcc/config/riscv/riscv.h                      |   5 +
 gcc/config/riscv/riscv.opt                    |   6 +
 gcc/config/riscv/t-riscv                      |   5 +
 gcc/doc/invoke.texi                           |  10 +
 .../gcc.target/riscv/shorten-memrefs-1.c      |  26 +++
 .../gcc.target/riscv/shorten-memrefs-2.c      |  51 +++++
 .../gcc.target/riscv/shorten-memrefs-3.c      |  39 ++++
 .../gcc.target/riscv/shorten-memrefs-4.c      |  26 +++
 .../gcc.target/riscv/shorten-memrefs-5.c      |  53 +++++
 .../gcc.target/riscv/shorten-memrefs-6.c      |  39 ++++
 .../gcc.target/riscv/shorten-memrefs-7.c      |  46 ++++
 16 files changed, 612 insertions(+), 6 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-passes.def
 create mode 100644 gcc/config/riscv/riscv-shorten-memrefs.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-7.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 5aa0130135f..29840752544 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -523,7 +523,7 @@ pru-*-*)
 	;;
 riscv*)
 	cpu_type=riscv
-	extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o"
+	extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o"
 	d_target_objs="riscv-d.o"
 	;;
 rs6000*-*-*)
diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def
new file mode 100644
index 00000000000..8a4ea0918db
--- /dev/null
+++ b/gcc/config/riscv/riscv-passes.def
@@ -0,0 +1,20 @@
+/* Declaration of target-specific passes for RISC-V.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5092294803c..78008c28b75 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -89,4 +89,6 @@ extern void riscv_init_builtins (void);
 /* Routines implemented in riscv-common.c.  */
 extern std::string riscv_arch_str ();
 
+rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
+
 #endif /* ! GCC_RISCV_PROTOS_H */
diff --git a/gcc/config/riscv/riscv-shorten-memrefs.c b/gcc/config/riscv/riscv-shorten-memrefs.c
new file mode 100644
index 00000000000..3686005fe2e
--- /dev/null
+++ b/gcc/config/riscv/riscv-shorten-memrefs.c
@@ -0,0 +1,200 @@
+/* Shorten memrefs pass for RISC-V.
+   Copyright (C) 2018-2019 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "backend.h"
+#include "regs.h"
+#include "target.h"
+#include "memmodel.h"
+#include "emit-rtl.h"
+#include "df.h"
+#include "predict.h"
+#include "tree-pass.h"
+
+/* Try to make more use of compressed load and store instructions by replacing
+   a load/store at address BASE + LARGE_OFFSET with a new load/store at address
+   NEW BASE + SMALL OFFSET.  If NEW BASE is stored in a compressed register, the
+   load/store can be compressed.  Since creating NEW BASE incurs an overhead,
+   the change is only attempted when BASE is referenced by at least four
+   load/stores in the same basic block.  */
+
+namespace {
+
+const pass_data pass_data_shorten_memrefs =
+{
+  RTL_PASS, /* type */
+  "shorten_memrefs", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_shorten_memrefs : public rtl_opt_pass
+{
+public:
+  pass_shorten_memrefs (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_shorten_memrefs, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+    {
+      return TARGET_RVC && riscv_mshorten_memrefs && optimize > 0;
+    }
+  virtual unsigned int execute (function *);
+
+private:
+  typedef int_hash <HOST_WIDE_INT, 0> regno_hash;
+  typedef hash_map <regno_hash, int> regno_map;
+
+  regno_map * analyze (basic_block bb);
+  void transform (regno_map *m, basic_block bb);
+  bool get_si_mem_base_reg (rtx mem, rtx *addr);
+}; // class pass_shorten_memrefs
+
+bool
+pass_shorten_memrefs::get_si_mem_base_reg (rtx mem, rtx *addr)
+{
+  if (!MEM_P (mem) || GET_MODE (mem) != SImode)
+    return false;
+  *addr = XEXP (mem, 0);
+  return GET_CODE (*addr) == PLUS && REG_P (XEXP (*addr, 0));
+}
+
+/* Count how many times each regno is referenced as base address for a memory
+   access.  */
+
+pass_shorten_memrefs::regno_map *
+pass_shorten_memrefs::analyze (basic_block bb)
+{
+  regno_map *m = hash_map<regno_hash, int>::create_ggc (10);
+  rtx_insn *insn;
+
+  regstat_init_n_sets_and_refs ();
+
+  FOR_BB_INSNS (bb, insn)
+    {
+      if (!NONJUMP_INSN_P (insn))
+	continue;
+      rtx pat = PATTERN (insn);
+      if (GET_CODE (pat) != SET)
+	continue;
+      /* Analyze stores first then loads.  */
+      for (int i = 0; i < 2; i++)
+	{
+	  rtx mem = XEXP (pat, i);
+	  rtx addr;
+	  if (get_si_mem_base_reg (mem, &addr))
+	    {
+	      HOST_WIDE_INT regno = REGNO (XEXP (addr, 0));
+	      /* Do not count store zero as these cannot be compressed.  */
+	      if (i == 0)
+		{
+		  if (XEXP (pat, 1) == CONST0_RTX (GET_MODE (XEXP (pat, 1))))
+		    continue;
+		}
+	      if (REG_N_REFS (regno) < 4)
+		continue;
+	      m->get_or_insert (regno)++;
+	    }
+	}
+    }
+  regstat_free_n_sets_and_refs ();
+
+  return m;
+}
+
+/* Convert BASE + LARGE_OFFSET to NEW_BASE + SMALL_OFFSET for each load/store
+   with a base reg referenced at least 4 times.  */
+
+void
+pass_shorten_memrefs::transform (regno_map *m, basic_block bb)
+{
+  rtx_insn *insn;
+  FOR_BB_INSNS (bb, insn)
+    {
+      if (!NONJUMP_INSN_P (insn))
+	continue;
+      rtx pat = PATTERN (insn);
+      if (GET_CODE (pat) != SET)
+	continue;
+      start_sequence ();
+      /* Transform stores first then loads.  */
+      for (int i = 0; i < 2; i++)
+	{
+	  rtx mem = XEXP (pat, i);
+	  rtx addr;
+	  if (get_si_mem_base_reg (mem, &addr))
+	    {
+	      HOST_WIDE_INT regno = REGNO (XEXP (addr, 0));
+	      /* Do not transform stores of zero, as these cannot be compressed.  */
+	      if (i == 0)
+		{
+		  if (XEXP (pat, 1) == CONST0_RTX (GET_MODE (XEXP (pat, 1))))
+		    continue;
+		}
+	      if (m->get_or_insert (regno) > 3)
+		{
+		  addr
+		    = targetm.legitimize_address (addr, addr, GET_MODE (mem));
+		  XEXP (pat, i) = replace_equiv_address (mem, addr);
+		  df_insn_rescan (insn);
+		}
+	    }
+	}
+      rtx_insn *seq = get_insns ();
+      end_sequence ();
+      emit_insn_before (seq, insn);
+    }
+}
+
+unsigned int
+pass_shorten_memrefs::execute (function *fn)
+{
+  basic_block bb;
+
+  FOR_ALL_BB_FN (bb, fn)
+  {
+    regno_map *m;
+    if (optimize_bb_for_speed_p (bb))
+      continue;
+    m = analyze (bb);
+    transform (m, bb);
+  }
+
+  return 0;
+}
+
+} // anon namespace
+
+rtl_opt_pass *
+make_pass_shorten_memrefs (gcc::context *ctxt)
+{
+  return new pass_shorten_memrefs (ctxt);
+}
diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 9aa4d266e6b..1d6d2f89f7d 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic.h"
 #include "builtins.h"
 #include "predict.h"
+#include "tree-pass.h"
 
 /* True if X is an UNSPEC wrapper around a SYMBOL_REF or LABEL_REF.  */
 #define UNSPEC_ADDRESS_P(X)					\
@@ -848,6 +849,52 @@ riscv_legitimate_address_p (machine_mode mode, rtx x, bool strict_p)
   return riscv_classify_address (&addr, x, mode, strict_p);
 }
 
+/* Return true if hard reg REGNO can be used in compressed instructions.  */
+
+static bool
+riscv_compressed_reg_p (int regno)
+{
+  /* x8-x15/f8-f15 are compressible registers.  */
+  return (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
+	  || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)));
+}
+
+/* Return true if X is an unsigned 5-bit immediate scaled by 4.  */
+
+static bool
+riscv_compressed_lw_offset_p (rtx x)
+{
+  return (CONST_INT_P (x)
+	  && (INTVAL (x) & 3) == 0
+	  && IN_RANGE (INTVAL (x), 0, CSW_MAX_OFFSET));
+}
+
+/* Return true if a load/store from/to address X can be compressed.  */
+
+static bool
+riscv_compressed_lw_address_p (rtx x)
+{
+  struct riscv_address_info addr;
+  bool result = riscv_classify_address (&addr, x, GET_MODE (x),
+					reload_completed);
+
+  /* Before reload, assuming all load/stores of valid addresses get compressed
+     gives better code size than checking if the address is reg + small_offset
+     early on.  */
+  if (result && !reload_completed)
+    return true;
+
+  /* Return false if address is not compressed_reg + small_offset.  */
+  if (!result
+      || addr.type != ADDRESS_REG
+      || (!riscv_compressed_reg_p (REGNO (addr.reg))
+	    && addr.reg != stack_pointer_rtx)
+      || !riscv_compressed_lw_offset_p (addr.offset))
+    return false;
+
+  return result;
+}
+
 /* Return the number of instructions needed to load or store a value
    of mode MODE at address X.  Return 0 if X isn't valid for MODE.
    Assume that multiword moves may need to be split into word moves
@@ -1305,6 +1352,24 @@ riscv_force_address (rtx x, machine_mode mode)
   return x;
 }
 
+/* Modify BASE + OFFSET so that OFFSET fits within a compressed load/store insn
+   and the excess is added to BASE.  */
+
+static rtx
+riscv_shorten_lw_offset (rtx base, HOST_WIDE_INT offset)
+{
+  rtx addr, high;
+  /* Leave OFFSET as an unsigned 5-bit offset scaled by 4 and put the excess
+     into HIGH.  */
+  high = GEN_INT (offset & ~CSW_MAX_OFFSET);
+  offset &= CSW_MAX_OFFSET;
+  if (!SMALL_OPERAND (INTVAL (high)))
+    high = force_reg (Pmode, high);
+  base = force_reg (Pmode, gen_rtx_PLUS (Pmode, high, base));
+  addr = plus_constant (Pmode, base, offset);
+  return addr;
+}
+
 /* This function is used to implement LEGITIMIZE_ADDRESS.  If X can
    be legitimized in a way that the generic machinery might not expect,
    return a new address, otherwise return NULL.  MODE is the mode of
@@ -1323,7 +1388,7 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
   if (riscv_split_symbol (NULL, x, mode, &addr, FALSE))
     return riscv_force_address (addr, mode);
 
-  /* Handle BASE + OFFSET using riscv_add_offset.  */
+  /* Handle BASE + OFFSET.  */
   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1))
       && INTVAL (XEXP (x, 1)) != 0)
     {
@@ -1332,7 +1397,14 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
 
       if (!riscv_valid_base_register_p (base, mode, false))
 	base = copy_to_mode_reg (Pmode, base);
-      addr = riscv_add_offset (NULL, base, offset);
+      if (optimize_function_for_size_p (cfun)
+	  && (strcmp (current_pass->name, "shorten_memrefs") == 0)
+	  && mode == SImode)
+	/* Convert BASE + LARGE_OFFSET into NEW_BASE + SMALL_OFFSET to allow
+	   possible compressed load/store.  */
+	addr = riscv_shorten_lw_offset (base, offset);
+      else
+	addr = riscv_add_offset (NULL, base, offset);
       return riscv_force_address (addr, mode);
     }
 
@@ -1822,6 +1894,11 @@ riscv_address_cost (rtx addr, machine_mode mode,
 		    addr_space_t as ATTRIBUTE_UNUSED,
 		    bool speed ATTRIBUTE_UNUSED)
 {
+  /* When optimizing for size, make uncompressible 32-bit addresses more
+     expensive so that compressible 32-bit addresses are preferred.  */
+  if (!speed && riscv_mshorten_memrefs && mode == SImode
+      && !riscv_compressed_lw_address_p (addr))
+    return riscv_address_insns (addr, mode, false) + 1;
   return riscv_address_insns (addr, mode, false);
 }
 
@@ -4647,6 +4724,7 @@ riscv_option_override (void)
     error ("%<-mriscv-attribute%> RISC-V ELF attribute requires GNU as 2.32"
 	   " [%<-mriscv-attribute%>]");
 #endif
+
 }
 
 /* Implement TARGET_CONDITIONAL_REGISTER_USAGE.  */
@@ -4686,9 +4764,9 @@ riscv_conditional_register_usage (void)
 static int
 riscv_register_priority (int regno)
 {
-  /* Favor x8-x15/f8-f15 to improve the odds of RVC instruction selection.  */
-  if (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
-		     || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)))
+  /* Favor compressed registers to improve the odds of RVC instruction
+     selection.  */
+  if (riscv_compressed_reg_p (regno))
     return 1;
 
   return 0;
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index b1e3403f8a7..a0d5354eae9 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -913,6 +913,7 @@ extern unsigned riscv_stack_boundary;
 #define SHIFT_RS1 15
 #define SHIFT_IMM 20
 #define IMM_BITS 12
+#define C_S_BITS 5
 #define C_SxSP_BITS 6
 
 #define IMM_REACH (1LL << IMM_BITS)
@@ -922,6 +923,10 @@ extern unsigned riscv_stack_boundary;
 #define SWSP_REACH (4LL << C_SxSP_BITS)
 #define SDSP_REACH (8LL << C_SxSP_BITS)
 
+/* This is the maximum value that can be represented in a compressed load/store
+   offset (an unsigned 5-bit value scaled by 4).  */
+#define CSW_MAX_OFFSET (((4LL << C_S_BITS) - 1) & ~3)
+
 /* Called from RISCV_REORG, this is defined in riscv-sr.c.  */
 
 extern void riscv_remove_unneeded_save_restore_calls (void);
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 7f0c35e9e9c..138bddd369f 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -87,6 +87,12 @@ msave-restore
 Target Report Mask(SAVE_RESTORE)
 Use smaller but slower prologue and epilogue code.
 
+mshorten-memrefs
+Target Bool Var(riscv_mshorten_memrefs) Init(1)
+Convert BASE + LARGE_OFFSET addresses to NEW_BASE + SMALL_OFFSET to allow more
+memory accesses to be generated as compressed instructions.  Currently targets
+32-bit integer load/stores.
+
 mcmodel=
 Target Report RejectNegative Joined Enum(code_model) Var(riscv_cmodel) Init(TARGET_DEFAULT_CMODEL)
 Specify the code model.
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index 5ecb3c160a6..4820fb35d31 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -19,3 +19,8 @@ riscv-d.o: $(srcdir)/config/riscv/riscv-d.c
 	$(COMPILE) $<
 	$(POSTCOMPILE)
 
+riscv-shorten-memrefs.o: $(srcdir)/config/riscv/riscv-shorten-memrefs.c
+	$(COMPILE) $<
+	$(POSTCOMPILE)
+
+PASSES_EXTRA += $(srcdir)/config/riscv/riscv-passes.def
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index af3c7f2b910..329083fb9b9 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1079,6 +1079,7 @@ See RS/6000 and PowerPC Options.
 -mpreferred-stack-boundary=@var{num} @gol
 -msmall-data-limit=@var{N-bytes} @gol
 -msave-restore  -mno-save-restore @gol
+-mshorten-memrefs  -mno-shorten-memrefs @gol
 -mstrict-align  -mno-strict-align @gol
 -mcmodel=medlow  -mcmodel=medany @gol
 -mexplicit-relocs  -mno-explicit-relocs @gol
@@ -24334,6 +24335,15 @@ Do or don't use smaller but slower prologue and epilogue code that uses
 library function calls.  The default is to use fast inline prologues and
 epilogues.
 
+@item -mshorten-memrefs
+@itemx -mno-shorten-memrefs
+@opindex mshorten-memrefs
+Do or do not attempt to make more use of compressed load/store instructions by
+replacing a load/store of 'base register + large offset' with a new load/store
+of 'new base + small offset'.  If the new base gets stored in a compressed
+register, then the new load/store can be compressed.  Currently targets 32-bit
+integer load/stores only.
+
 @item -mstrict-align
 @itemx -mno-strict-align
 @opindex mstrict-align
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-1.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-1.c
new file mode 100644
index 00000000000..958942a6f7f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-1.c
@@ -0,0 +1,26 @@
+/* { dg-options "-Os -march=rv32imc -mabi=ilp32" } */
+
+/* These stores cannot be compressed because x0 is not a compressed reg.
+   Therefore the shorten_memrefs pass should not attempt to rewrite them into a
+   compressible format.  */
+
+void
+store1z (int *array)
+{
+  array[200] = 0;
+  array[201] = 0;
+  array[202] = 0;
+  array[203] = 0;
+}
+
+void
+store2z (long long *array)
+{
+  array[200] = 0;
+  array[201] = 0;
+  array[202] = 0;
+  array[203] = 0;
+}
+
+/* { dg-final { scan-assembler-not "store1z:\n\taddi" } } */
+/* { dg-final { scan-assembler-not "store2z:\n\taddi" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-2.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-2.c
new file mode 100644
index 00000000000..2c2f41548c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-2.c
@@ -0,0 +1,51 @@
+/* { dg-options "-Os -march=rv32imc -mabi=ilp32" } */
+
+/* shorten_memrefs should rewrite these load/stores into a compressible
+   format.  */
+
+void
+store1a (int *array, int a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+void
+store2a (long long *array, long long a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+int
+load1r (int *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+long long
+load2r (long long *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+/* { dg-final { scan-assembler "store1a:\n\taddi" } } */
+/* The sd insns in store2a are not rewritten because shorten_memrefs currently
+   only optimizes lw and sw.  */
+/* { dg-final { scan-assembler "store2a:\n\taddi" { xfail riscv*-*-*  } } } */
+/* { dg-final { scan-assembler "load1r:\n\taddi" } } */
+/* { dg-final { scan-assembler "load2r:\n\taddi" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-3.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-3.c
new file mode 100644
index 00000000000..2001fe871ee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-3.c
@@ -0,0 +1,39 @@
+/* { dg-options "-Os -march=rv32imc -mabi=ilp32" } */
+
+/* These loads cannot be compressed because only one compressed reg is
+   available (since args are passed in a0-a4, that leaves a5-a7 available, of
+   which only a5 is a compressed reg). Therefore the shorten_memrefs pass should
+   not attempt to rewrite these loads into a compressible format. It may not
+   be possible to avoid this because shorten_memrefs happens before reg alloc.
+*/
+
+extern int sub1 (int, int, int, int, int, int, int);
+
+int
+load1a (int a0, int a1, int a2, int a3, int a4, int *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return sub1 (a0, a1, a2, a3, a4, 0, a);
+}
+
+extern long long sub2 (long long, long long, long long, long long, long long,
+           long long, long long);
+
+long long
+load2a (long long a0, long long a1, long long a2, long long a3, long long a4,
+  long long *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return sub2 (a0, a1, a2, a3, a4, 0, a);
+}
+
+/* { dg-final { scan-assembler-not "load1a:\n\taddi" { xfail riscv*-*-* } } } */
+/* { dg-final { scan-assembler-not "load2a:\n.*addi\[ \t\]*\[at\]\[0-9\],\[at\]\[0-9\],\[0-9\]*" { xfail riscv*-*-*  } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-4.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-4.c
new file mode 100644
index 00000000000..cd4784913e4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-4.c
@@ -0,0 +1,26 @@
+/* { dg-options "-Os -march=rv64imc -mabi=lp64" } */
+
+/* These stores cannot be compressed because x0 is not a compressed reg.
+   Therefore the shorten_memrefs pass should not attempt to rewrite them into a
+   compressible format.  */
+
+void
+store1z (int *array)
+{
+  array[200] = 0;
+  array[201] = 0;
+  array[202] = 0;
+  array[203] = 0;
+}
+
+void
+store2z (long long *array)
+{
+  array[200] = 0;
+  array[201] = 0;
+  array[202] = 0;
+  array[203] = 0;
+}
+
+/* { dg-final { scan-assembler-not "store1z:\n\taddi" } } */
+/* { dg-final { scan-assembler-not "store2z:\n\taddi" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-5.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-5.c
new file mode 100644
index 00000000000..80b3897e4da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-5.c
@@ -0,0 +1,53 @@
+/* { dg-options "-Os -march=rv64imc -mabi=lp64" } */
+
+/* shorten_memrefs should rewrite these load/stores into a compressible
+   format.  */
+
+void
+store1a (int *array, int a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+void
+store2a (long long *array, long long a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+int
+load1r (int *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+long long
+load2r (long long *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+/* { dg-final { scan-assembler "store1a:\n\taddi" } } */
+/* The sd insns in store2a are not rewritten because shorten_memrefs currently
+   only optimizes lw and sw.  */
+/* { dg-final { scan-assembler "store2a:\n\taddi" { xfail riscv*-*-* } } } */
+/* { dg-final { scan-assembler "load1r:\n\taddi" } } */
+/* The ld insns in load2r are not rewritten because shorten_memrefs currently
+   only optimizes lw and sw.  */
+/* { dg-final { scan-assembler "load2r:\n\taddi" { xfail riscv*-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-6.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-6.c
new file mode 100644
index 00000000000..3403c7044df
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-6.c
@@ -0,0 +1,39 @@
+/* { dg-options "-Os -march=rv64imc -mabi=lp64" } */
+
+/* These loads cannot be compressed because only one compressed reg is
+   available (since args are passed in a0-a4, that leaves a5-a7 available, of
+   which only a5 is a compressed reg). Therefore the shorten_memrefs pass should
+   not attempt to rewrite these loads into a compressible format. It may not
+   be possible to avoid this because shorten_memrefs happens before reg alloc.
+*/
+
+extern int sub1 (int, int, int, int, int, int, int);
+
+int
+load1a (int a0, int a1, int a2, int a3, int a4, int *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return sub1 (a0, a1, a2, a3, a4, 0, a);
+}
+
+extern long long sub2 (long long, long long, long long, long long, long long,
+           long long, long long);
+
+long long
+load2a (long long a0, long long a1, long long a2, long long a3, long long a4,
+  long long *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return sub2 (a0, a1, a2, a3, a4, 0, a);
+}
+
+/* { dg-final { scan-assembler-not "load1a:\n\taddi" { xfail riscv*-*-* } } } */
+/* { dg-final { scan-assembler-not "load2a:\n.*addi\[ \t\]*\[at\]\[0-9\],\[at\]\[0-9\],\[0-9\]*" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-7.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-7.c
new file mode 100644
index 00000000000..a5833fd356d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-7.c
@@ -0,0 +1,46 @@
+/* { dg-options "-Os -march=rv32imc -mabi=ilp32 -mno-shorten-memrefs" } */
+
+/* Check that these load/stores do not get rewritten into a compressible format
+   when shorten_memrefs is disabled.  */
+
+void
+store1a (int *array, int a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+void
+store2a (long long *array, long long a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+int
+load1r (int *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+long long
+load2r (long long *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+/* { dg-final { scan-assembler-not "addi" } } */
-- 
2.17.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/2] sched-deps.c: Avoid replacing address if it increases address cost
  2019-10-31  2:00       ` Jim Wilson
@ 2019-12-10 18:29         ` Craig Blackmore
  0 siblings, 0 replies; 24+ messages in thread
From: Craig Blackmore @ 2019-12-10 18:29 UTC (permalink / raw)
  To: Jim Wilson; +Cc: GCC Patches, Ofer Shinaar, Nidal Faour, Kito Cheng, Jeff Law

Hi Jim,

On 31/10/2019 00:03, Jim Wilson wrote:
> On Fri, Oct 25, 2019 at 10:40 AM Craig Blackmore
> <craig.blackmore@embecosm.com> wrote:
>> The sched2 pass undoes some of the addresses generated by the RISC-V
>> shorten_memrefs code size optimization (patch 1/2) and consequently increases
>> code size. This patch prevents sched-deps.c from changing an address if it is
>> expected to increase address cost.
>
> This should be rewritten as a target hook, and then we can define the
> hook to do what we want for RISC-V.  It isn't OK to make this change
> for other targets without testing them.  If the default hook does
> nothing, then other targets won't be affected.
>

Thanks for the review. Here is an updated patch rewriting this as a target hook.

Thanks,
Craig

---
gcc/ChangeLog:

	* config/riscv/riscv.c (riscv_new_address_profitable_p): New function.
	(TARGET_NEW_ADDRESS_PROFITABLE_P): Define.
	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in (TARGET_NEW_ADDRESS_PROFITABLE_P): New hook.
	* sched-deps.c (attempt_change): Use old address if it is cheaper than
	new address.
	* target.def (new_address_profitable_p): New hook.
	* targhooks.c (default_new_address_profitable_p): New function.
	* targhooks.h (default_new_address_profitable_p): Declare.
---
 gcc/config/riscv/riscv.c | 16 ++++++++++++++++
 gcc/doc/tm.texi          |  7 +++++++
 gcc/doc/tm.texi.in       |  2 ++
 gcc/sched-deps.c         |  3 +++
 gcc/target.def           | 11 +++++++++++
 gcc/targhooks.c          | 13 +++++++++++++
 gcc/targhooks.h          |  1 +
 7 files changed, 53 insertions(+)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 1d6d2f89f7d..a508894d16e 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -5095,6 +5095,19 @@ riscv_reorg (void)
     riscv_remove_unneeded_save_restore_calls ();
 }
 
+/* Implement TARGET_NEW_ADDRESS_PROFITABLE_P.  */
+
+bool
+riscv_new_address_profitable_p (rtx memref, rtx_insn *insn, rtx new_addr)
+{
+  /* Prefer old address if it is less expensive.  */
+  addr_space_t as = MEM_ADDR_SPACE (memref);
+  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn));
+  int old_cost = address_cost (XEXP (memref, 0), GET_MODE (memref), as, speed);
+  int new_cost = address_cost (new_addr, GET_MODE (memref), as, speed);
+  return new_cost <= old_cost;
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -5272,6 +5285,9 @@ riscv_reorg (void)
 #undef TARGET_MACHINE_DEPENDENT_REORG
 #define TARGET_MACHINE_DEPENDENT_REORG riscv_reorg
 
+#undef TARGET_NEW_ADDRESS_PROFITABLE_P
+#define TARGET_NEW_ADDRESS_PROFITABLE_P riscv_new_address_profitable_p
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-riscv.h"
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 5b8b68bd710..85470c8afa0 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6944,6 +6944,13 @@ candidate as a replacement for the if-convertible sequence described in
 @code{if_info}.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_NEW_ADDRESS_PROFITABLE_P (rtx @var{memref}, rtx_insn * @var{insn}, rtx @var{new_addr})
+Return @code{true} if it is profitable to replace the address in
+@var{memref} with @var{new_addr}.  This allows targets to prevent the
+scheduler from undoing address optimizations.  The instruction containing the
+memref is @var{insn}.  The default implementation returns @code{true}.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P (void)
 This predicate controls the use of the eager delay slot filler to disallow
 speculatively executed instructions being placed in delay slots.  Targets
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 1b061d70127..b5c80ce2fdc 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4695,6 +4695,8 @@ Define this macro if a non-short-circuit operation produced by
 
 @hook TARGET_NOCE_CONVERSION_PROFITABLE_P
 
+@hook TARGET_NEW_ADDRESS_PROFITABLE_P
+
 @hook TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P
 
 @hook TARGET_ESTIMATED_POLY_VALUE
diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
index a9e73fc6044..8f56594b745 100644
--- a/gcc/sched-deps.c
+++ b/gcc/sched-deps.c
@@ -4693,6 +4693,9 @@ attempt_change (struct mem_inc_info *mii, rtx new_addr)
   rtx mem = *mii->mem_loc;
   rtx new_mem;
 
+  if (!targetm.new_address_profitable_p (mem, mii->mem_insn, new_addr))
+    return NULL_RTX;
+
   /* Jump through a lot of hoops to keep the attributes up to date.  We
      do not want to call one of the change address variants that take
      an offset even though we know the offset in many cases.  These
diff --git a/gcc/target.def b/gcc/target.def
index e0e856979a9..81c78d8ffea 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3824,6 +3824,17 @@ candidate as a replacement for the if-convertible sequence described in\n\
 bool, (rtx_insn *seq, struct noce_if_info *if_info),
 default_noce_conversion_profitable_p)
 
+/* Return true if new_addr should be preferred over the existing address used by
+   memref in insn.  */
+DEFHOOK
+(new_address_profitable_p,
+ "Return @code{true} if it is profitable to replace the address in\n\
+@var{memref} with @var{new_addr}.  This allows targets to prevent the\n\
+scheduler from undoing address optimizations.  The instruction containing the\n\
+memref is @var{insn}.  The default implementation returns @code{true}.",
+bool, (rtx memref, rtx_insn * insn, rtx new_addr),
+default_new_address_profitable_p)
+
 DEFHOOK
 (estimated_poly_value,
  "Return an estimate of the runtime value of @var{val}, for use in\n\
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 2d3bcbf3e89..8e3ad63f1c1 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1564,6 +1564,19 @@ default_mode_dependent_address_p (const_rtx addr ATTRIBUTE_UNUSED,
   return false;
 }
 
+extern bool default_new_address_profitable_p (rtx, rtx_insn *, rtx);
+
+
+/* The default implementation of TARGET_NEW_ADDRESS_PROFITABLE_P.  */
+
+bool
+default_new_address_profitable_p (rtx memref ATTRIBUTE_UNUSED,
+				  rtx_insn *insn ATTRIBUTE_UNUSED,
+				  rtx new_addr ATTRIBUTE_UNUSED)
+{
+  return true;
+}
+
 bool
 default_target_option_valid_attribute_p (tree ARG_UNUSED (fndecl),
 					 tree ARG_UNUSED (name),
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 34987bbcdb3..76a55428851 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -185,6 +185,7 @@ extern tree default_emutls_var_init (tree, tree, tree);
 extern unsigned int default_hard_regno_nregs (unsigned int, machine_mode);
 extern bool default_hard_regno_scratch_ok (unsigned int);
 extern bool default_mode_dependent_address_p (const_rtx, addr_space_t);
+extern bool default_new_address_profitable_p (rtx, rtx_insn *, rtx);
 extern bool default_target_option_valid_attribute_p (tree, tree, tree, int);
 extern bool default_target_option_pragma_parse (tree, tree);
 extern bool default_target_can_inline_p (tree, tree);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass
  2019-12-10 18:28         ` Craig Blackmore
@ 2020-02-19 11:40           ` Craig Blackmore
  2020-04-08 16:04             ` Jim Wilson
  0 siblings, 1 reply; 24+ messages in thread
From: Craig Blackmore @ 2020-02-19 11:40 UTC (permalink / raw)
  To: Jim Wilson; +Cc: GCC Patches, Ofer Shinaar, Nidal Faour, Kito Cheng, Jeff Law


On 10/12/2019 18:28, Craig Blackmore wrote:
>
> Hi Jim,
>
> Thank you for your review. I have posted an updated patch below which I think
> addresses your comments.
>
Ping

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00712.html

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00713.html

Craig


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass
  2020-02-19 11:40           ` Craig Blackmore
@ 2020-04-08 16:04             ` Jim Wilson
  2020-04-27 17:08               ` Craig Blackmore
  0 siblings, 1 reply; 24+ messages in thread
From: Jim Wilson @ 2020-04-08 16:04 UTC (permalink / raw)
  To: Craig Blackmore
  Cc: GCC Patches, Ofer Shinaar, Nidal Faour, Kito Cheng, Jeff Law

On Wed, Feb 19, 2020 at 3:40 AM Craig Blackmore
<craig.blackmore@embecosm.com> wrote:
> On 10/12/2019 18:28, Craig Blackmore wrote:
> Thank you for your review. I have posted an updated patch below which I think
> addresses your comments.
>
> Ping
>
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00712.html

This looks OK.  There are some minor issues.

        (riscv_new_address_profitable_p): New function.
        (TARGET_NEW_ADDRESS_PROFITABLE_P): Define.

These are actually in part2, and part2 already has changelog entries
for them, so these can just be dropped.

+  /* When optimizing for size, make uncompressible 32-bit addresses more
+   * expensive so that compressible 32-bit addresses are preferred.  */
+  if (!speed && riscv_mshorten_memrefs && mode == SImode
+      && !riscv_compressed_lw_address_p (addr))
+    return riscv_address_insns (addr, mode, false) + 1;

I think that there should be a TARGET_RVC check here, just like in the gate
function for the new pass.  But I also suspect that this probably
doesn't matter much.
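
For reference, the check with the suggested TARGET_RVC guard added would read
as follows (a sketch only; the updated patch later in this thread contains the
actual change):

  /* When optimizing for size, make uncompressible 32-bit addresses more
     expensive so that compressible 32-bit addresses are preferred.  */
  if (TARGET_RVC && !speed && riscv_mshorten_memrefs && mode == SImode
      && !riscv_compressed_lw_address_p (addr))
    return riscv_address_insns (addr, mode, false) + 1;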

> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00713.html

This also looks OK to me, but adding a new target hook is something
that should really wait for stage1 which should only be a few weeks
away now.  That will also give us more time to test the patch before
it appears in a gcc release.  So this is OK once stage1 opens.

Jim

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass
  2020-04-08 16:04             ` Jim Wilson
@ 2020-04-27 17:08               ` Craig Blackmore
  2020-05-12 22:33                 ` Jim Wilson
  0 siblings, 1 reply; 24+ messages in thread
From: Craig Blackmore @ 2020-04-27 17:08 UTC (permalink / raw)
  To: Jim Wilson; +Cc: GCC Patches, Ofer Shinaar, Nidal Faour, Kito Cheng, Jeff Law

On 08/04/2020 17:04, Jim Wilson wrote:

> On Wed, Feb 19, 2020 at 3:40 AM Craig Blackmore
> <craig.blackmore@embecosm.com> wrote:
>> On 10/12/2019 18:28, Craig Blackmore wrote:
>> Thank you for your review. I have posted an updated patch below which I think
>> addresses your comments.
>>
>> Ping
>>
>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00712.html
> This looks OK.  There are some minor issues.
>
>         (riscv_new_address_profitable_p): New function.
>         (TARGET_NEW_ADDRESS_PROFITABLE_P): Define.
>
> These are actually in part2, and part2 already has changelog entries
> for them, so these can just be dropped.
>
> +  /* When optimizing for size, make uncompressible 32-bit addresses more
> +   * expensive so that compressible 32-bit addresses are preferred.  */
> +  if (!speed && riscv_mshorten_memrefs && mode == SImode
> +      && !riscv_compressed_lw_address_p (addr))
> +    return riscv_address_insns (addr, mode, false) + 1;
>
> I think that there should be a TARGET_RVC check here, just like in the gate
> function for the new pass.  But I also suspect that this probably
> doesn't matter much.

Hi Jim,

Thanks for the review. I have updated the following patch with those changes.

Craig

---

gcc/ChangeLog:

	* config.gcc: Add riscv-shorten-memrefs.o to extra_objs for riscv.
	* config/riscv/riscv-passes.def: New file.
	* config/riscv/riscv-protos.h (make_pass_shorten_memrefs): Declare.
	* config/riscv/riscv-shorten-memrefs.c: New file.
	* config/riscv/riscv.c (tree-pass.h): New include.
	(riscv_compressed_reg_p): New function.
	(riscv_compressed_lw_offset_p): Likewise.
	(riscv_compressed_lw_address_p): Likewise.
	(riscv_shorten_lw_offset): Likewise.
	(riscv_legitimize_address): Attempt to convert base + large_offset
	to compressible new_base + small_offset.
	(riscv_address_cost): Make anticipated compressed load/stores
	cheaper for code size than uncompressed load/stores.
	(riscv_register_priority): Move compressed register check to
	riscv_compressed_reg_p.
	* config/riscv/riscv.h (C_S_BITS): Define.
	(CSW_MAX_OFFSET): Define.
	* config/riscv/riscv.opt (mshorten-memrefs): New option.
	* config/riscv/t-riscv (riscv-shorten-memrefs.o): New rule.
	(PASSES_EXTRA): Add riscv-passes.def.
	* doc/invoke.texi: Document -mshorten-memrefs.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/shorten-memrefs-1.c: New test.
	* gcc.target/riscv/shorten-memrefs-2.c: New test.
	* gcc.target/riscv/shorten-memrefs-3.c: New test.
	* gcc.target/riscv/shorten-memrefs-4.c: New test.
	* gcc.target/riscv/shorten-memrefs-5.c: New test.
	* gcc.target/riscv/shorten-memrefs-6.c: New test.
	* gcc.target/riscv/shorten-memrefs-7.c: New test.
---
 gcc/config.gcc                                |   2 +-
 gcc/config/riscv/riscv-passes.def             |  20 ++
 gcc/config/riscv/riscv-protos.h               |   2 +
 gcc/config/riscv/riscv-shorten-memrefs.c      | 200 ++++++++++++++++++
 gcc/config/riscv/riscv.c                      |  88 +++++++-
 gcc/config/riscv/riscv.h                      |   5 +
 gcc/config/riscv/riscv.opt                    |   6 +
 gcc/config/riscv/t-riscv                      |   5 +
 gcc/doc/invoke.texi                           |  10 +
 .../gcc.target/riscv/shorten-memrefs-1.c      |  26 +++
 .../gcc.target/riscv/shorten-memrefs-2.c      |  51 +++++
 .../gcc.target/riscv/shorten-memrefs-3.c      |  39 ++++
 .../gcc.target/riscv/shorten-memrefs-4.c      |  26 +++
 .../gcc.target/riscv/shorten-memrefs-5.c      |  53 +++++
 .../gcc.target/riscv/shorten-memrefs-6.c      |  39 ++++
 .../gcc.target/riscv/shorten-memrefs-7.c      |  46 ++++
 16 files changed, 612 insertions(+), 6 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-passes.def
 create mode 100644 gcc/config/riscv/riscv-shorten-memrefs.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/shorten-memrefs-7.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index cf1a87e2efd..3c2a0389b98 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -525,7 +525,7 @@ pru-*-*)
 	;;
 riscv*)
 	cpu_type=riscv
-	extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o"
+	extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o"
 	d_target_objs="riscv-d.o"
 	;;
 rs6000*-*-*)
diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def
new file mode 100644
index 00000000000..8a4ea0918db
--- /dev/null
+++ b/gcc/config/riscv/riscv-passes.def
@@ -0,0 +1,20 @@
+/* Declaration of target-specific passes for RISC-V.
+   Copyright (C) 2019-2020 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 8cf9137b5e7..72280ec1c76 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -91,4 +91,6 @@ extern std::string riscv_arch_str ();
 
 extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
 
+rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
+
 #endif /* ! GCC_RISCV_PROTOS_H */
diff --git a/gcc/config/riscv/riscv-shorten-memrefs.c b/gcc/config/riscv/riscv-shorten-memrefs.c
new file mode 100644
index 00000000000..3686005fe2e
--- /dev/null
+++ b/gcc/config/riscv/riscv-shorten-memrefs.c
@@ -0,0 +1,200 @@
+/* Shorten memrefs pass for RISC-V.
+   Copyright (C) 2018-2020 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "backend.h"
+#include "regs.h"
+#include "target.h"
+#include "memmodel.h"
+#include "emit-rtl.h"
+#include "df.h"
+#include "predict.h"
+#include "tree-pass.h"
+
+/* Try to make more use of compressed load and store instructions by replacing
+   a load/store at address BASE + LARGE_OFFSET with a new load/store at address
+   NEW BASE + SMALL OFFSET.  If NEW BASE is stored in a compressed register, the
+   load/store can be compressed.  Since creating NEW BASE incurs an overhead,
+   the change is only attempted when BASE is referenced by at least four
+   load/stores in the same basic block.  */
+
+namespace {
+
+const pass_data pass_data_shorten_memrefs =
+{
+  RTL_PASS, /* type */
+  "shorten_memrefs", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_shorten_memrefs : public rtl_opt_pass
+{
+public:
+  pass_shorten_memrefs (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_shorten_memrefs, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+    {
+      return TARGET_RVC && riscv_mshorten_memrefs && optimize > 0;
+    }
+  virtual unsigned int execute (function *);
+
+private:
+  typedef int_hash <HOST_WIDE_INT, 0> regno_hash;
+  typedef hash_map <regno_hash, int> regno_map;
+
+  regno_map * analyze (basic_block bb);
+  void transform (regno_map *m, basic_block bb);
+  bool get_si_mem_base_reg (rtx mem, rtx *addr);
+}; // class pass_shorten_memrefs
+
+bool
+pass_shorten_memrefs::get_si_mem_base_reg (rtx mem, rtx *addr)
+{
+  if (!MEM_P (mem) || GET_MODE (mem) != SImode)
+    return false;
+  *addr = XEXP (mem, 0);
+  return GET_CODE (*addr) == PLUS && REG_P (XEXP (*addr, 0));
+}
+
+/* Count how many times each regno is referenced as a base address for a memory
+   access.  */
+
+pass_shorten_memrefs::regno_map *
+pass_shorten_memrefs::analyze (basic_block bb)
+{
+  regno_map *m = hash_map<regno_hash, int>::create_ggc (10);
+  rtx_insn *insn;
+
+  regstat_init_n_sets_and_refs ();
+
+  FOR_BB_INSNS (bb, insn)
+    {
+      if (!NONJUMP_INSN_P (insn))
+	continue;
+      rtx pat = PATTERN (insn);
+      if (GET_CODE (pat) != SET)
+	continue;
+      /* Analyze stores first then loads.  */
+      for (int i = 0; i < 2; i++)
+	{
+	  rtx mem = XEXP (pat, i);
+	  rtx addr;
+	  if (get_si_mem_base_reg (mem, &addr))
+	    {
+	      HOST_WIDE_INT regno = REGNO (XEXP (addr, 0));
+	      /* Do not count stores of zero, as these cannot be compressed.  */
+	      if (i == 0)
+		{
+		  if (XEXP (pat, 1) == CONST0_RTX (GET_MODE (XEXP (pat, 1))))
+		    continue;
+		}
+	      if (REG_N_REFS (regno) < 4)
+		continue;
+	      m->get_or_insert (regno)++;
+	    }
+	}
+    }
+  regstat_free_n_sets_and_refs ();
+
+  return m;
+}
+
+/* Convert BASE + LARGE_OFFSET to NEW_BASE + SMALL_OFFSET for each load/store
+   with a base reg referenced at least 4 times.  */
+
+void
+pass_shorten_memrefs::transform (regno_map *m, basic_block bb)
+{
+  rtx_insn *insn;
+  FOR_BB_INSNS (bb, insn)
+    {
+      if (!NONJUMP_INSN_P (insn))
+	continue;
+      rtx pat = PATTERN (insn);
+      if (GET_CODE (pat) != SET)
+	continue;
+      start_sequence ();
+      /* Transform stores first then loads.  */
+      for (int i = 0; i < 2; i++)
+	{
+	  rtx mem = XEXP (pat, i);
+	  rtx addr;
+	  if (get_si_mem_base_reg (mem, &addr))
+	    {
+	      HOST_WIDE_INT regno = REGNO (XEXP (addr, 0));
+	      /* Do not transform stores of zero, as these cannot be compressed.  */
+	      if (i == 0)
+		{
+		  if (XEXP (pat, 1) == CONST0_RTX (GET_MODE (XEXP (pat, 1))))
+		    continue;
+		}
+	      if (m->get_or_insert (regno) > 3)
+		{
+		  addr
+		    = targetm.legitimize_address (addr, addr, GET_MODE (mem));
+		  XEXP (pat, i) = replace_equiv_address (mem, addr);
+		  df_insn_rescan (insn);
+		}
+	    }
+	}
+      rtx_insn *seq = get_insns ();
+      end_sequence ();
+      emit_insn_before (seq, insn);
+    }
+}
+
+unsigned int
+pass_shorten_memrefs::execute (function *fn)
+{
+  basic_block bb;
+
+  FOR_ALL_BB_FN (bb, fn)
+  {
+    regno_map *m;
+    if (optimize_bb_for_speed_p (bb))
+      continue;
+    m = analyze (bb);
+    transform (m, bb);
+  }
+
+  return 0;
+}
+
+} // anon namespace
+
+rtl_opt_pass *
+make_pass_shorten_memrefs (gcc::context *ctxt)
+{
+  return new pass_shorten_memrefs (ctxt);
+}
diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 94b5ac01762..0c0879c8aee 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic.h"
 #include "builtins.h"
 #include "predict.h"
+#include "tree-pass.h"
 
 /* True if X is an UNSPEC wrapper around a SYMBOL_REF or LABEL_REF.  */
 #define UNSPEC_ADDRESS_P(X)					\
@@ -848,6 +849,52 @@ riscv_legitimate_address_p (machine_mode mode, rtx x, bool strict_p)
   return riscv_classify_address (&addr, x, mode, strict_p);
 }
 
+/* Return true if hard reg REGNO can be used in compressed instructions.  */
+
+static bool
+riscv_compressed_reg_p (int regno)
+{
+  /* x8-x15/f8-f15 are compressible registers.  */
+  return (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
+	  || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)));
+}
+
+/* Return true if X is an unsigned 5-bit immediate scaled by 4.  */
+
+static bool
+riscv_compressed_lw_offset_p (rtx x)
+{
+  return (CONST_INT_P (x)
+	  && (INTVAL (x) & 3) == 0
+	  && IN_RANGE (INTVAL (x), 0, CSW_MAX_OFFSET));
+}
+
+/* Return true if a load/store from/to address X can be compressed.  */
+
+static bool
+riscv_compressed_lw_address_p (rtx x)
+{
+  struct riscv_address_info addr;
+  bool result = riscv_classify_address (&addr, x, GET_MODE (x),
+					reload_completed);
+
+  /* Before reload, assuming all load/stores of valid addresses get compressed
+     gives better code size than checking if the address is reg + small_offset
+     early on.  */
+  if (result && !reload_completed)
+    return true;
+
+  /* Return false if address is not compressed_reg + small_offset.  */
+  if (!result
+      || addr.type != ADDRESS_REG
+      || (!riscv_compressed_reg_p (REGNO (addr.reg))
+	    && addr.reg != stack_pointer_rtx)
+      || !riscv_compressed_lw_offset_p (addr.offset))
+    return false;
+
+  return result;
+}
+
 /* Return the number of instructions needed to load or store a value
    of mode MODE at address X.  Return 0 if X isn't valid for MODE.
    Assume that multiword moves may need to be split into word moves
@@ -1308,6 +1355,24 @@ riscv_force_address (rtx x, machine_mode mode)
   return x;
 }
 
+/* Modify BASE + OFFSET so that OFFSET fits within a compressed load/store insn
+   and the excess is added to BASE.  */
+
+static rtx
+riscv_shorten_lw_offset (rtx base, HOST_WIDE_INT offset)
+{
+  rtx addr, high;
+  /* Leave OFFSET as an unsigned 5-bit offset scaled by 4 and put the excess
+     into HIGH.  */
+  high = GEN_INT (offset & ~CSW_MAX_OFFSET);
+  offset &= CSW_MAX_OFFSET;
+  if (!SMALL_OPERAND (INTVAL (high)))
+    high = force_reg (Pmode, high);
+  base = force_reg (Pmode, gen_rtx_PLUS (Pmode, high, base));
+  addr = plus_constant (Pmode, base, offset);
+  return addr;
+}
+
 /* This function is used to implement LEGITIMIZE_ADDRESS.  If X can
    be legitimized in a way that the generic machinery might not expect,
    return a new address, otherwise return NULL.  MODE is the mode of
@@ -1326,7 +1391,7 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
   if (riscv_split_symbol (NULL, x, mode, &addr, FALSE))
     return riscv_force_address (addr, mode);
 
-  /* Handle BASE + OFFSET using riscv_add_offset.  */
+  /* Handle BASE + OFFSET.  */
   if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1))
       && INTVAL (XEXP (x, 1)) != 0)
     {
@@ -1335,7 +1400,14 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
 
       if (!riscv_valid_base_register_p (base, mode, false))
 	base = copy_to_mode_reg (Pmode, base);
-      addr = riscv_add_offset (NULL, base, offset);
+      if (optimize_function_for_size_p (cfun)
+	  && (strcmp (current_pass->name, "shorten_memrefs") == 0)
+	  && mode == SImode)
+	/* Convert BASE + LARGE_OFFSET into NEW_BASE + SMALL_OFFSET to allow
+	   possible compressed load/store.  */
+	addr = riscv_shorten_lw_offset (base, offset);
+      else
+	addr = riscv_add_offset (NULL, base, offset);
       return riscv_force_address (addr, mode);
     }
 
@@ -1833,6 +1905,11 @@ riscv_address_cost (rtx addr, machine_mode mode,
 		    addr_space_t as ATTRIBUTE_UNUSED,
 		    bool speed ATTRIBUTE_UNUSED)
 {
+  /* When optimizing for size, make uncompressible 32-bit addresses more
+     expensive so that compressible 32-bit addresses are preferred.  */
+  if (TARGET_RVC && !speed && riscv_mshorten_memrefs && mode == SImode
+      && !riscv_compressed_lw_address_p (addr))
+    return riscv_address_insns (addr, mode, false) + 1;
   return riscv_address_insns (addr, mode, false);
 }
 
@@ -4666,6 +4743,7 @@ riscv_option_override (void)
     error ("%<-mriscv-attribute%> RISC-V ELF attribute requires GNU as 2.32"
 	   " [%<-mriscv-attribute%>]");
 #endif
+
 }
 
 /* Implement TARGET_CONDITIONAL_REGISTER_USAGE.  */
@@ -4705,9 +4783,9 @@ riscv_conditional_register_usage (void)
 static int
 riscv_register_priority (int regno)
 {
-  /* Favor x8-x15/f8-f15 to improve the odds of RVC instruction selection.  */
-  if (TARGET_RVC && (IN_RANGE (regno, GP_REG_FIRST + 8, GP_REG_FIRST + 15)
-		     || IN_RANGE (regno, FP_REG_FIRST + 8, FP_REG_FIRST + 15)))
+  /* Favor compressed registers to improve the odds of RVC instruction
+     selection.  */
+  if (riscv_compressed_reg_p (regno))
     return 1;
 
   return 0;
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 567c23380fe..e6209ede9d6 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -920,6 +920,7 @@ extern unsigned riscv_stack_boundary;
 #define SHIFT_RS1 15
 #define SHIFT_IMM 20
 #define IMM_BITS 12
+#define C_S_BITS 5
 #define C_SxSP_BITS 6
 
 #define IMM_REACH (1LL << IMM_BITS)
@@ -929,6 +930,10 @@ extern unsigned riscv_stack_boundary;
 #define SWSP_REACH (4LL << C_SxSP_BITS)
 #define SDSP_REACH (8LL << C_SxSP_BITS)
 
+/* This is the maximum value that can be represented in a compressed load/store
+   offset (an unsigned 5-bit value scaled by 4).  */
+#define CSW_MAX_OFFSET (((4LL << C_S_BITS) - 1) & ~3)
+
 /* Called from RISCV_REORG, this is defined in riscv-sr.c.  */
 
 extern void riscv_remove_unneeded_save_restore_calls (void);
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 29de246759e..e4bfcb86f51 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -87,6 +87,12 @@ msave-restore
 Target Report Mask(SAVE_RESTORE)
 Use smaller but slower prologue and epilogue code.
 
+mshorten-memrefs
+Target Bool Var(riscv_mshorten_memrefs) Init(1)
+Convert BASE + LARGE_OFFSET addresses to NEW_BASE + SMALL_OFFSET to allow more
+memory accesses to be generated as compressed instructions.  Currently targets
+32-bit integer load/stores.
+
 mcmodel=
 Target Report RejectNegative Joined Enum(code_model) Var(riscv_cmodel) Init(TARGET_DEFAULT_CMODEL)
 Specify the code model.
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index 5ecb3c160a6..4820fb35d31 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -19,3 +19,8 @@ riscv-d.o: $(srcdir)/config/riscv/riscv-d.c
 	$(COMPILE) $<
 	$(POSTCOMPILE)
 
+riscv-shorten-memrefs.o: $(srcdir)/config/riscv/riscv-shorten-memrefs.c
+	$(COMPILE) $<
+	$(POSTCOMPILE)
+
+PASSES_EXTRA += $(srcdir)/config/riscv/riscv-passes.def
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a37a2ee9c19..ad4c6d94f82 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1129,6 +1129,7 @@ See RS/6000 and PowerPC Options.
 -mpreferred-stack-boundary=@var{num} @gol
 -msmall-data-limit=@var{N-bytes} @gol
 -msave-restore  -mno-save-restore @gol
+-mshorten-memrefs  -mno-shorten-memrefs @gol
 -mstrict-align  -mno-strict-align @gol
 -mcmodel=medlow  -mcmodel=medany @gol
 -mexplicit-relocs  -mno-explicit-relocs @gol
@@ -25321,6 +25322,15 @@ Do or don't use smaller but slower prologue and epilogue code that uses
 library function calls.  The default is to use fast inline prologues and
 epilogues.
 
+@item -mshorten-memrefs
+@itemx -mno-shorten-memrefs
+@opindex mshorten-memrefs
+Do or do not attempt to make more use of compressed load/store instructions by
+replacing a load/store of 'base register + large offset' with a new load/store
+of 'new base + small offset'.  If the new base gets stored in a compressed
+register, then the new load/store can be compressed.  Currently targets 32-bit
+integer load/stores only.
+
 @item -mstrict-align
 @itemx -mno-strict-align
 @opindex mstrict-align
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-1.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-1.c
new file mode 100644
index 00000000000..958942a6f7f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-1.c
@@ -0,0 +1,26 @@
+/* { dg-options "-Os -march=rv32imc -mabi=ilp32" } */
+
+/* These stores cannot be compressed because x0 is not a compressed reg.
+   Therefore the shorten_memrefs pass should not attempt to rewrite them into a
+   compressible format.  */
+
+void
+store1z (int *array)
+{
+  array[200] = 0;
+  array[201] = 0;
+  array[202] = 0;
+  array[203] = 0;
+}
+
+void
+store2z (long long *array)
+{
+  array[200] = 0;
+  array[201] = 0;
+  array[202] = 0;
+  array[203] = 0;
+}
+
+/* { dg-final { scan-assembler-not "store1z:\n\taddi" } } */
+/* { dg-final { scan-assembler-not "store2z:\n\taddi" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-2.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-2.c
new file mode 100644
index 00000000000..2c2f41548c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-2.c
@@ -0,0 +1,51 @@
+/* { dg-options "-Os -march=rv32imc -mabi=ilp32" } */
+
+/* shorten_memrefs should rewrite these load/stores into a compressible
+   format.  */
+
+void
+store1a (int *array, int a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+void
+store2a (long long *array, long long a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+int
+load1r (int *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+long long
+load2r (long long *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+/* { dg-final { scan-assembler "store1a:\n\taddi" } } */
+/* The sd insns in store2a are not rewritten because shorten_memrefs currently
+   only optimizes lw and sw.  */
+/* { dg-final { scan-assembler "store2a:\n\taddi" { xfail riscv*-*-*  } } } */
+/* { dg-final { scan-assembler "load1r:\n\taddi" } } */
+/* { dg-final { scan-assembler "load2r:\n\taddi" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-3.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-3.c
new file mode 100644
index 00000000000..2001fe871ee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-3.c
@@ -0,0 +1,39 @@
+/* { dg-options "-Os -march=rv32imc -mabi=ilp32" } */
+
+/* These loads cannot be compressed because only one compressed reg is
+   available (since args are passed in a0-a4, that leaves a5-a7 available, of
+   which only a5 is a compressed reg). Therefore the shorten_memrefs pass should
+   not attempt to rewrite these loads into a compressible format. It may not
+   be possible to avoid this because shorten_memrefs happens before reg alloc.
+*/
+
+extern int sub1 (int, int, int, int, int, int, int);
+
+int
+load1a (int a0, int a1, int a2, int a3, int a4, int *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return sub1 (a0, a1, a2, a3, a4, 0, a);
+}
+
+extern long long sub2 (long long, long long, long long, long long, long long,
+           long long, long long);
+
+long long
+load2a (long long a0, long long a1, long long a2, long long a3, long long a4,
+  long long *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return sub2 (a0, a1, a2, a3, a4, 0, a);
+}
+
+/* { dg-final { scan-assembler-not "load1a:\n\taddi" { xfail riscv*-*-* } } } */
+/* { dg-final { scan-assembler-not "load2a:\n.*addi\[ \t\]*\[at\]\[0-9\],\[at\]\[0-9\],\[0-9\]*" { xfail riscv*-*-*  } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-4.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-4.c
new file mode 100644
index 00000000000..cd4784913e4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-4.c
@@ -0,0 +1,26 @@
+/* { dg-options "-Os -march=rv64imc -mabi=lp64" } */
+
+/* These stores cannot be compressed because x0 is not a compressed reg.
+   Therefore the shorten_memrefs pass should not attempt to rewrite them into a
+   compressible format.  */
+
+void
+store1z (int *array)
+{
+  array[200] = 0;
+  array[201] = 0;
+  array[202] = 0;
+  array[203] = 0;
+}
+
+void
+store2z (long long *array)
+{
+  array[200] = 0;
+  array[201] = 0;
+  array[202] = 0;
+  array[203] = 0;
+}
+
+/* { dg-final { scan-assembler-not "store1z:\n\taddi" } } */
+/* { dg-final { scan-assembler-not "store2z:\n\taddi" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-5.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-5.c
new file mode 100644
index 00000000000..80b3897e4da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-5.c
@@ -0,0 +1,53 @@
+/* { dg-options "-Os -march=rv64imc -mabi=lp64" } */
+
+/* shorten_memrefs should rewrite these load/stores into a compressible
+   format.  */
+
+void
+store1a (int *array, int a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+void
+store2a (long long *array, long long a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+int
+load1r (int *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+long long
+load2r (long long *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+/* { dg-final { scan-assembler "store1a:\n\taddi" } } */
+/* The sd insns in store2a are not rewritten because shorten_memrefs currently
+   only optimizes lw and sw.  */
+/* { dg-final { scan-assembler "store2a:\n\taddi" { xfail riscv*-*-* } } } */
+/* { dg-final { scan-assembler "load1r:\n\taddi" } } */
+/* The ld insns in load2r are not rewritten because shorten_memrefs currently
+   only optimizes lw and sw.  */
+/* { dg-final { scan-assembler "load2r:\n\taddi" { xfail riscv*-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-6.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-6.c
new file mode 100644
index 00000000000..3403c7044df
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-6.c
@@ -0,0 +1,39 @@
+/* { dg-options "-Os -march=rv64imc -mabi=lp64" } */
+
+/* These loads cannot be compressed because only one compressed reg is
+   available (since args are passed in a0-a4, that leaves a5-a7 available, of
+   which only a5 is a compressed reg). Therefore the shorten_memrefs pass should
+   not attempt to rewrite these loads into a compressible format. It may not
+   be possible to avoid this because shorten_memrefs happens before reg alloc.
+*/
+
+extern int sub1 (int, int, int, int, int, int, int);
+
+int
+load1a (int a0, int a1, int a2, int a3, int a4, int *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return sub1 (a0, a1, a2, a3, a4, 0, a);
+}
+
+extern long long sub2 (long long, long long, long long, long long, long long,
+           long long, long long);
+
+long long
+load2a (long long a0, long long a1, long long a2, long long a3, long long a4,
+  long long *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return sub2 (a0, a1, a2, a3, a4, 0, a);
+}
+
+/* { dg-final { scan-assembler-not "load1a:\n\taddi" { xfail riscv*-*-* } } } */
+/* { dg-final { scan-assembler-not "load2a:\n.*addi\[ \t\]*\[at\]\[0-9\],\[at\]\[0-9\],\[0-9\]*" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/shorten-memrefs-7.c b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-7.c
new file mode 100644
index 00000000000..a5833fd356d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/shorten-memrefs-7.c
@@ -0,0 +1,46 @@
+/* { dg-options "-Os -march=rv32imc -mabi=ilp32 -mno-shorten-memrefs" } */
+
+/* Check that these load/stores do not get rewritten into a compressible format
+   when shorten_memrefs is disabled.  */
+
+void
+store1a (int *array, int a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+void
+store2a (long long *array, long long a)
+{
+  array[200] = a;
+  array[201] = a;
+  array[202] = a;
+  array[203] = a;
+}
+
+int
+load1r (int *array)
+{
+  int a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+long long
+load2r (long long *array)
+{
+  long long a = 0;
+  a += array[200];
+  a += array[201];
+  a += array[202];
+  a += array[203];
+  return a;
+}
+
+/* { dg-final { scan-assembler-not "addi" } } */
-- 
2.17.1




* Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass
  2020-04-27 17:08               ` Craig Blackmore
@ 2020-05-12 22:33                 ` Jim Wilson
  2020-05-13 17:51                   ` Craig Blackmore
  0 siblings, 1 reply; 24+ messages in thread
From: Jim Wilson @ 2020-05-12 22:33 UTC (permalink / raw)
  To: Craig Blackmore
  Cc: GCC Patches, Ofer Shinaar, Nidal Faour, Kito Cheng, Jeff Law

On Mon, Apr 27, 2020 at 10:08 AM Craig Blackmore
<craig.blackmore@embecosm.com> wrote:
> Thanks for the review. I have updated the following patch with those changes.

This looks good, and the tree is open for development work again, so I
committed both parts 1 and 2 and pushed them.

One weird thing is that while the patch program accepted the patch
fine, git am would not, and kept giving me an error that didn't make
any sense, pointing at a line that didn't have any visible problem.
So I had to commit it by hand and then use git commit --amend to fix
the authorship before pushing it.  The end result looks OK to me
though.
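
For the record, the by-hand sequence was roughly the following (the
patch file name here is illustrative):

  $ git am 0001-RISC-V-Add-shorten_memrefs-pass.patch   # fails
  $ git am --abort
  $ patch -p1 < 0001-RISC-V-Add-shorten_memrefs-pass.patch
  $ git add -A && git commit
  $ git commit --amend \
      --author="Craig Blackmore <craig.blackmore@embecosm.com>"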

Jim


* Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass
  2020-05-12 22:33                 ` Jim Wilson
@ 2020-05-13 17:51                   ` Craig Blackmore
  0 siblings, 0 replies; 24+ messages in thread
From: Craig Blackmore @ 2020-05-13 17:51 UTC (permalink / raw)
  To: Jim Wilson; +Cc: GCC Patches, Ofer Shinaar, Nidal Faour

On 12/05/2020 23:33, Jim Wilson wrote:
> On Mon, Apr 27, 2020 at 10:08 AM Craig Blackmore
> <craig.blackmore@embecosm.com> wrote:
>> Thanks for the review. I have updated the following patch with those changes.
> This looks good, and the tree is open for development work again, so I
> committed both parts 1 and 2 and pushed them.
>
> One weird thing is that while the patch program accepted the patch
> fine, git am would not, and kept giving me an error that didn't make
> any sense, pointing at a line that didn't have any visible problem.
> So I had to commit it by hand and then use git commit --amend to fix
> the authorship before pushing it.  The end result looks OK to me
> though.
>
> Jim

Hi Jim,

Many thanks for your help in reviewing this patch.

I am not sure what happened with git am; thanks for working around it.
The end result looks good to me.

Craig



end of thread

Thread overview: 24+ messages
2019-09-12 16:19 [PATCH] RISC-V: Allow more load/stores to be compressed Craig Blackmore
2019-09-18 10:01 ` Kito Cheng
2019-10-25 17:40   ` [PATCH v2 0/2] " Craig Blackmore
2019-10-25 17:40     ` [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass Craig Blackmore
2019-10-26 18:25       ` Jeff Law
2019-10-26 19:16         ` Oleg Endo
2019-10-26 20:04           ` Jeff Law
2019-10-26 20:02         ` Andrew Waterman
2019-10-26 20:16           ` Jeff Law
2019-10-27  7:13             ` Oleg Endo
2019-10-31  0:00       ` Jim Wilson
2019-10-31  9:42         ` Nidal Faour
2019-10-31 10:42           ` Andrew Waterman
2019-10-31 15:57             ` Ofer Shinaar
2019-12-10 18:28         ` Craig Blackmore
2020-02-19 11:40           ` Craig Blackmore
2020-04-08 16:04             ` Jim Wilson
2020-04-27 17:08               ` Craig Blackmore
2020-05-12 22:33                 ` Jim Wilson
2020-05-13 17:51                   ` Craig Blackmore
2019-10-31  0:03       ` Jim Wilson
2019-10-25 17:57     ` [PATCH v2 2/2] sched-deps.c: Avoid replacing address if it increases address cost Craig Blackmore
2019-10-31  2:00       ` Jim Wilson
2019-12-10 18:29         ` Craig Blackmore
