From: Jeff Law <jeffreyalaw@gmail.com>
To: Manolis Tsamis <manolis.tsamis@vrull.eu>, gcc-patches@gcc.gnu.org
Cc: Philipp Tomsich <philipp.tomsich@vrull.eu>,
Vineet Gupta <vineetg@rivosinc.com>,
Richard Biener <richard.guenther@gmail.com>
Subject: Re: [PATCH v5] Implement new RTL optimizations pass: fold-mem-offsets.
Date: Mon, 11 Sep 2023 18:47:44 -0600
Message-ID: <a8af8bf2-042d-492b-a939-47b1326bd4bc@gmail.com>
In-Reply-To: <20230909084652.2655745-1-manolis.tsamis@vrull.eu>
On 9/9/23 02:46, Manolis Tsamis wrote:
> This is a new RTL pass that tries to optimize memory offset calculations
> by moving them from add immediate instructions to the memory loads/stores.
> For example it can transform this:
>
> addi t4,sp,16
> add t2,a6,t4
> slli t3,t2,1
> ld a2,0(t3)
> addi a2,a2,1
> sd a2,8(t2)
>
> into the following (one instruction less):
>
> add t2,a6,sp
> slli t3,t2,1
> ld a2,32(t3)
> addi a2,a2,1
> sd a2,24(t2)
>
> Although there are places where this is done already, this pass is more
> powerful and can handle the more difficult cases that are currently not
> optimized. Also, it runs late enough that it can optimize away unnecessary
> stack pointer calculations.
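FWIW, for anyone following along, here's a made-up C example (not from the
patch) of the kind of code that produces sequences like the one above,
i.e. an indexed access to a stack array:

  /* Illustrative only.  */
  long
  f (long *src, long idx)
  {
    long buf[16];
    buf[idx] = src[idx] + 1;
    return buf[idx];
  }

The addi off of sp comes from addressing the stack slot for buf.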
>
> gcc/ChangeLog:
>
> * Makefile.in: Add fold-mem-offsets.o.
> * passes.def: Schedule a new pass.
> * tree-pass.h (make_pass_fold_mem_offsets): Declare.
> * common.opt: New options.
> * doc/invoke.texi: Document new option.
> * fold-mem-offsets.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/fold-mem-offsets-1.c: New test.
> * gcc.target/riscv/fold-mem-offsets-2.c: New test.
> * gcc.target/riscv/fold-mem-offsets-3.c: New test.
>
> Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> ---
>
> Changes in v5:
> - Introduce new helper function fold_offsets_1.
> - Fix a bug where constants could be partially propagated
> through instructions that weren't understood.
> - Introduce helper class fold_mem_info that stores f-m-o
> info for an instruction.
> - Calculate fold_offsets only once with do_fold_info_calculation.
> - Fix correctness issue by introducing compute_validity_closure.
> - Propagate in more cases for PLUS/MINUS with constant.
>
> Changes in v4:
> - Add DF_EQ_NOTES flag to avoid incorrect state in notes.
> - Remove fold_mem_offsets_driver and enum fold_mem_phase.
> - Call recog when patching offsets in do_commit_offset.
> - Restore INSN_CODE after modifying insn in do_check_validity.
>
> Changes in v3:
> - Added propagation for more codes:
> sub, neg, mul.
> - Added folding / elimination for sub and
> const int moves.
> - For the validity check of the generated addresses
> also test memory_address_addr_space_p.
> - Replaced GEN_INT with gen_int_mode.
> - Replaced some bitmap_head with auto_bitmap.
> - Refactor each phase into own function for readability.
> - Add dump details.
> - Replace rtx iteration with reg_mentioned_p.
> - Return early for codes that we can't propagate through.
>
> Changes in v2:
> - Made the pass target-independent instead of RISC-V specific.
> - Fixed a number of bugs.
> - Add code to handle more ADD patterns as found
> in other targets (x86, aarch64).
> - Improved naming and comments.
> - Fixed bitmap memory leak.
>
> +
> +/* Get the single reaching definition of an instruction inside a BB.
> + The definition is desired for REG used in INSN.
> + Return the definition insn or NULL if there's no definition with
> + the desired criteria. */
> +static rtx_insn*
> +get_single_def_in_bb (rtx_insn *insn, rtx reg)
> +{
> + df_ref use;
> + struct df_link *ref_chain, *ref_link;
> +
> + FOR_EACH_INSN_USE (use, insn)
> + {
> + if (GET_CODE (DF_REF_REG (use)) == SUBREG)
> + return NULL;
> + if (REGNO (DF_REF_REG (use)) == REGNO (reg))
> + break;
> + }
> +
> + if (!use)
> + return NULL;
> +
> + ref_chain = DF_REF_CHAIN (use);
So what if there are two uses of REG in INSN? I don't think that'd be
common at all, but probably better to be safe and reject than sorry, right?
Or is that case filtered out earlier?
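Something along these lines (completely untested, just to illustrate the
concern) would reject the ambiguous case instead of silently taking the
first matching use:

  df_ref use, reg_use = NULL;
  FOR_EACH_INSN_USE (use, insn)
    {
      if (GET_CODE (DF_REF_REG (use)) == SUBREG)
        return NULL;
      if (REGNO (DF_REF_REG (use)) == REGNO (reg))
        {
          /* Punt if REG is used more than once in INSN.  */
          if (reg_use)
            return NULL;
          reg_use = use;
        }
    }

  if (!reg_use)
    return NULL;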
> +
> + rtx_insn* def = DF_REF_INSN (ref_chain->ref);
Formatting nit. The '*' should be next to the variable, not the type.
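i.e. just:

  rtx_insn *def = DF_REF_INSN (ref_chain->ref);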
> +
> +
> +static HOST_WIDE_INT
> +fold_offsets (rtx_insn* insn, rtx reg, bool analyze, bitmap foldable_insns);
> +
> +/* Helper function for fold_offsets.
> +
> + If DO_RECURSION is false and ANALYZE is true this function returns true iff
> + it understands the structure of INSN and knows how to propagate constants
> + through it. In this case OFFSET_OUT and FOLDABLE_INSNS are unused.
> +
> + If DO_RECURSION is true then it also calls fold_offsets for each recognised
> + part of INSN with the appropriate arguments.
> +
> + If DO_RECURSION is true and ANALYZE is false then the offset that would
> + result from folding is computed and returned through the pointer OFFSET_OUT.
> + The instructions that can be folded are recorded in FOLDABLE_INSNS.
> +*/
> +static bool fold_offsets_1 (rtx_insn* insn, bool analyze, bool do_recursion,
> + HOST_WIDE_INT *offset_out, bitmap foldable_insns)
Nit. Linkage and return type on separate line. That makes the function
name start at the beginning of a line.
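i.e. something like:

  static bool
  fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
                  HOST_WIDE_INT *offset_out, bitmap foldable_insns)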
> +
> +/* Test if INSN is a memory load / store that can have an offset folded to it.
> + Return true iff INSN is such an instruction and return through MEM_OUT,
> + REG_OUT and OFFSET_OUT the RTX that has a MEM code, the register that is
> + used as a base address and the offset accordingly.
> + All of the out pointers may be NULL in which case they will be ignored. */
> +bool
> +get_fold_mem_root (rtx_insn* insn, rtx *mem_out, rtx *reg_out,
> + HOST_WIDE_INT *offset_out)
> +{
> + rtx set = single_set (insn);
> + rtx mem = NULL_RTX;
> +
> + if (set != NULL_RTX)
> + {
> + rtx src = SET_SRC (set);
> + rtx dest = SET_DEST (set);
> +
> + /* Don't fold when we have unspec / volatile. */
> + if (GET_CODE (src) == UNSPEC
> + || GET_CODE (src) == UNSPEC_VOLATILE
> + || GET_CODE (dest) == UNSPEC
> + || GET_CODE (dest) == UNSPEC_VOLATILE)
> + return false;
> +
> + if (MEM_P (src))
> + mem = src;
> + else if (MEM_P (dest))
> + mem = dest;
> + else if ((GET_CODE (src) == SIGN_EXTEND
> + || GET_CODE (src) == ZERO_EXTEND)
> + && MEM_P (XEXP (src, 0)))
Note that some architectures allow both a memory source and a memory
destination.  It looks like your code will prefer the source operand in
that case.
That's fine, just pointing it out.
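e.g. on such a target you can end up with something along these lines
(made-up RTL, just to illustrate):

  (set (mem:SI (reg:SI %a0))
       (mem:SI (plus:SI (reg:SI %a1) (const_int 4))))

and with MEM_P (src) tested first it's the PLUS on the source side that
would get folded.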
> +
> +static bool
> +compute_validity_closure (fold_info_map *fold_info)
> +{
> + /* Let's say we have an arbitrary chain of foldable instructions xN = xN + C
> + and memory operations rN that use xN as shown below. If folding x1 in r1
> + turns out to be invalid for whatever reason then it's also invalid to fold
> + any of the other xN into any rN. That means that we need the transitive
> + closure of validity to determine whether we can fold an xN instruction.
> +
> + +--------------+ +-------------------+ +-------------------+
> + | r1 = mem[x1] | | r2 = mem[x1 + x2] | | r3 = mem[x2 + x3] | ...
> + +--------------+ +-------------------+ +-------------------+
> + ^ ^ ^ ^ ^
> + | / | / | ...
> + | / | / |
> + +-------------+ / +-------------+ / +-------------+
> + | x1 = x1 + 1 |-----+ | x2 = x2 + 1 |-----+ | x3 = x3 + 1 |--- ...
> + +-------------+ +-------------+ +-------------+
> + ^ ^ ^
> + | | |
> + ... ... ...
> + */
> +
> + int max_iters = 5;
> + for (int i = 0; i < max_iters; i++)
> + {
> + bool made_changes = false;
> + for (fold_info_map::iterator iter = fold_info->begin ();
> + iter != fold_info->end (); ++iter)
> + {
> + fold_mem_info *info = (*iter).second;
> + if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
> + made_changes |= bitmap_ior_into (&cannot_fold_insns,
> + info->fold_insns);
> + }
> +
> + if (!made_changes)
> + return true;
> + }
> +
> + return false;
So how was the magic value of "5" determined here? In general we try
not to have magic #s like that and instead find a better way to control
iterations, falling back to a PARAM when all else fails.
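E.g. a sketch of what I mean, with a placeholder name, would be an entry in
params.opt along the lines of:

  -param=fold-mem-offsets-max-validity-iterations=
  Common Joined UInteger Var(param_fold_mem_offsets_max_validity_iterations) Init(5) Param Optimization
  Maximum number of iterations of the validity closure computation in the fold-mem-offsets pass.

and then the loop bound becomes

  int max_iters = param_fold_mem_offsets_max_validity_iterations;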
> +}
> +
> + machine_mode mode = GET_MODE (XEXP (mem, 0));
> + XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
> + INSN_CODE (insn) = recog (PATTERN (insn), insn, 0);
> + df_insn_rescan (insn);
Don't we need to check if NEW_OFFSET is zero and if so generate simple
register indirect addressing rather than indirect + displacement?
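Something along these lines is what I have in mind (untested):

  machine_mode mode = GET_MODE (XEXP (mem, 0));
  if (new_offset == 0)
    /* Plain register indirect.  */
    XEXP (mem, 0) = reg;
  else
    XEXP (mem, 0) = gen_rtx_PLUS (mode, reg,
                                  gen_int_mode (new_offset, mode));
  INSN_CODE (insn) = recog (PATTERN (insn), insn, 0);
  df_insn_rescan (insn);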
This looks *so* close ;-) But I think we need a few questions answered
and a few minor adjustments.
On a positive note, m68k passed with this version of the f-m-o patch :-)
Jeff