From: Jeff Law <jeffreyalaw@gmail.com>
To: Manolis Tsamis <manolis.tsamis@vrull.eu>, gcc-patches@gcc.gnu.org
Cc: Philipp Tomsich <philipp.tomsich@vrull.eu>,
Vineet Gupta <vineetg@rivosinc.com>,
Richard Biener <richard.guenther@gmail.com>
Subject: Re: [PATCH v5] Implement new RTL optimizations pass: fold-mem-offsets.
Date: Mon, 11 Sep 2023 18:47:44 -0600
Message-ID: <a8af8bf2-042d-492b-a939-47b1326bd4bc@gmail.com>
In-Reply-To: <20230909084652.2655745-1-manolis.tsamis@vrull.eu>
On 9/9/23 02:46, Manolis Tsamis wrote:
> This is a new RTL pass that tries to optimize memory offset calculations
> by moving them from add immediate instructions to the memory loads/stores.
> For example it can transform this:
>
> addi t4,sp,16
> add t2,a6,t4
> slli t3,t2,1
> ld a2,0(t3)
> addi a2,a2,1
> sd a2,8(t2)
>
> into the following (one instruction less):
>
> add t2,a6,sp
> slli t3,t2,1
> ld a2,32(t3)
> addi a2,a2,1
> sd a2,24(t2)
>
> Although there are places where this is done already, this pass is more
> powerful and can handle the more difficult cases that are currently not
> optimized. Also, it runs late enough that it can optimize away unnecessary
> stack pointer calculations.
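FWIW, for anyone following along, here's a made-up C example (not from the
patch) of the kind of code that produces sequences like the one above,
i.e. an indexed access to a stack array:

  /* Illustrative only.  */
  long
  f (long *src, long idx)
  {
    long buf[16];
    buf[idx] = src[idx] + 1;
    return buf[idx];
  }

The addi off of sp comes from addressing the stack slot for buf.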
>
> gcc/ChangeLog:
>
> * Makefile.in: Add fold-mem-offsets.o.
> * passes.def: Schedule a new pass.
> * tree-pass.h (make_pass_fold_mem_offsets): Declare.
> * common.opt: New options.
> * doc/invoke.texi: Document new option.
> * fold-mem-offsets.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/fold-mem-offsets-1.c: New test.
> * gcc.target/riscv/fold-mem-offsets-2.c: New test.
> * gcc.target/riscv/fold-mem-offsets-3.c: New test.
>
> Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> ---
>
> Changes in v5:
> - Introduce new helper function fold_offsets_1.
> - Fix a bug where constants could be partially propagated
> through instructions that weren't understood.
> - Introduce helper class fold_mem_info that stores f-m-o
> info for an instruction.
> - Calculate fold_offsets only once with do_fold_info_calculation.
> - Fix correctness issue by introducing compute_validity_closure.
> - Propagate in more cases for PLUS/MINUS with constant.
>
> Changes in v4:
> - Add DF_EQ_NOTES flag to avoid incorrect state in notes.
> - Remove fold_mem_offsets_driver and enum fold_mem_phase.
> - Call recog when patching offsets in do_commit_offset.
> - Restore INSN_CODE after modifying insn in do_check_validity.
>
> Changes in v3:
> - Added propagation for more codes:
> sub, neg, mul.
> - Added folding / elimination for sub and
> const int moves.
> - For the validity check of the generated addresses
> also test memory_address_addr_space_p.
> - Replaced GEN_INT with gen_int_mode.
> - Replaced some bitmap_head with auto_bitmap.
> - Refactor each phase into own function for readability.
> - Add dump details.
> - Replace rtx iteration with reg_mentioned_p.
> - Return early for codes that we can't propagate through.
>
> Changes in v2:
> - Made the pass target-independent instead of RISC-V specific.
> - Fixed a number of bugs.
> - Add code to handle more ADD patterns as found
> in other targets (x86, aarch64).
> - Improved naming and comments.
> - Fixed bitmap memory leak.
>
> +
> +/* Get the single reaching definition of an instruction inside a BB.
> + The definition is desired for REG used in INSN.
> + Return the definition insn or NULL if there's no definition with
> + the desired criteria. */
> +static rtx_insn*
> +get_single_def_in_bb (rtx_insn *insn, rtx reg)
> +{
> + df_ref use;
> + struct df_link *ref_chain, *ref_link;
> +
> + FOR_EACH_INSN_USE (use, insn)
> + {
> + if (GET_CODE (DF_REF_REG (use)) == SUBREG)
> + return NULL;
> + if (REGNO (DF_REF_REG (use)) == REGNO (reg))
> + break;
> + }
> +
> + if (!use)
> + return NULL;
> +
> + ref_chain = DF_REF_CHAIN (use);
So what if there are two uses of REG in INSN? I don't think that'd be
common at all, but probably better to be safe and reject than sorry, right?
Or is that case filtered out earlier?
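Something along these lines (completely untested, just to illustrate the
concern) would reject the ambiguous case instead of silently taking the
first matching use:

  df_ref use, reg_use = NULL;
  FOR_EACH_INSN_USE (use, insn)
    {
      if (GET_CODE (DF_REF_REG (use)) == SUBREG)
        return NULL;
      if (REGNO (DF_REF_REG (use)) == REGNO (reg))
        {
          /* Punt if REG is used more than once in INSN.  */
          if (reg_use)
            return NULL;
          reg_use = use;
        }
    }

  if (!reg_use)
    return NULL;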
> +
> + rtx_insn* def = DF_REF_INSN (ref_chain->ref);
Formatting nit. The '*' should be next to the variable, not the type.
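i.e. just:

  rtx_insn *def = DF_REF_INSN (ref_chain->ref);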
> +
> +
> +static HOST_WIDE_INT
> +fold_offsets (rtx_insn* insn, rtx reg, bool analyze, bitmap foldable_insns);
> +
> +/* Helper function for fold_offsets.
> +
> + If DO_RECURSION is false and ANALYZE is true this function returns true iff
> + it understands the structure of INSN and knows how to propagate constants
> + through it. In this case OFFSET_OUT and FOLDABLE_INSNS are unused.
> +
> + If DO_RECURSION is true then it also calls fold_offsets for each recognised
> + part of INSN with the appropriate arguments.
> +
> + If DO_RECURSION is true and ANALYZE is false then the offset that would
> + result from folding is computed and returned through the pointer OFFSET_OUT.
> + The instructions that can be folded are recorded in FOLDABLE_INSNS.
> +*/
> +static bool fold_offsets_1 (rtx_insn* insn, bool analyze, bool do_recursion,
> + HOST_WIDE_INT *offset_out, bitmap foldable_insns)
Nit. Linkage and return type on separate line. That makes the function
name start at the beginning of a line.
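i.e. something like:

  static bool
  fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
                  HOST_WIDE_INT *offset_out, bitmap foldable_insns)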
> +
> +/* Test if INSN is a memory load / store that can have an offset folded to it.
> + Return true iff INSN is such an instruction and return through MEM_OUT,
> + REG_OUT and OFFSET_OUT the RTX that has a MEM code, the register that is
> + used as a base address and the offset accordingly.
> + All of the out pointers may be NULL in which case they will be ignored. */
> +bool
> +get_fold_mem_root (rtx_insn* insn, rtx *mem_out, rtx *reg_out,
> + HOST_WIDE_INT *offset_out)
> +{
> + rtx set = single_set (insn);
> + rtx mem = NULL_RTX;
> +
> + if (set != NULL_RTX)
> + {
> + rtx src = SET_SRC (set);
> + rtx dest = SET_DEST (set);
> +
> + /* Don't fold when we have unspec / volatile. */
> + if (GET_CODE (src) == UNSPEC
> + || GET_CODE (src) == UNSPEC_VOLATILE
> + || GET_CODE (dest) == UNSPEC
> + || GET_CODE (dest) == UNSPEC_VOLATILE)
> + return false;
> +
> + if (MEM_P (src))
> + mem = src;
> + else if (MEM_P (dest))
> + mem = dest;
> + else if ((GET_CODE (src) == SIGN_EXTEND
> + || GET_CODE (src) == ZERO_EXTEND)
> + && MEM_P (XEXP (src, 0)))
Note that some architectures allow both a memory source and a memory
destination.  It looks like your code will prefer the source operand in
that case.
That's fine, just pointing it out.
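e.g. on such a target you can end up with something along these lines
(made-up RTL, just to illustrate):

  (set (mem:SI (reg:SI %a0))
       (mem:SI (plus:SI (reg:SI %a1) (const_int 4))))

and with MEM_P (src) tested first it's the PLUS on the source side that
would get folded.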
> +
> +static bool
> +compute_validity_closure (fold_info_map *fold_info)
> +{
> + /* Let's say we have an arbitrary chain of foldable instructions xN = xN + C
> + and memory operations rN that use xN as shown below. If folding x1 in r1
> + turns out to be invalid for whatever reason then it's also invalid to fold
> + any of the other xN into any rN. That means that we need the transitive
> + closure of validity to determine whether we can fold an xN instruction.
> +
> + +--------------+ +-------------------+ +-------------------+
> + | r1 = mem[x1] | | r2 = mem[x1 + x2] | | r3 = mem[x2 + x3] | ...
> + +--------------+ +-------------------+ +-------------------+
> + ^ ^ ^ ^ ^
> + | / | / | ...
> + | / | / |
> + +-------------+ / +-------------+ / +-------------+
> + | x1 = x1 + 1 |-----+ | x2 = x2 + 1 |-----+ | x3 = x3 + 1 |--- ...
> + +-------------+ +-------------+ +-------------+
> + ^ ^ ^
> + | | |
> + ... ... ...
> + */
> +
> + int max_iters = 5;
> + for (int i = 0; i < max_iters; i++)
> + {
> + bool made_changes = false;
> + for (fold_info_map::iterator iter = fold_info->begin ();
> + iter != fold_info->end (); ++iter)
> + {
> + fold_mem_info *info = (*iter).second;
> + if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
> + made_changes |= bitmap_ior_into (&cannot_fold_insns,
> + info->fold_insns);
> + }
> +
> + if (!made_changes)
> + return true;
> + }
> +
> + return false;
So how was the magic value of "5" determined here? In general we try
not to have magic #s like that and instead find a better way to control
iterations, falling back to a PARAM when all else fails.
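E.g. a sketch of what I mean, with a placeholder name, would be an entry in
params.opt along the lines of:

  -param=fold-mem-offsets-max-validity-iterations=
  Common Joined UInteger Var(param_fold_mem_offsets_max_validity_iterations) Init(5) Param Optimization
  Maximum number of iterations of the validity closure computation in the fold-mem-offsets pass.

and then the loop bound becomes

  int max_iters = param_fold_mem_offsets_max_validity_iterations;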
> +}
> +
> + machine_mode mode = GET_MODE (XEXP (mem, 0));
> + XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
> + INSN_CODE (insn) = recog (PATTERN (insn), insn, 0);
> + df_insn_rescan (insn);
Don't we need to check if NEW_OFFSET is zero and if so generate simple
register indirect addressing rather than indirect + displacement?
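Something along these lines is what I have in mind (untested):

  machine_mode mode = GET_MODE (XEXP (mem, 0));
  if (new_offset == 0)
    /* Plain register indirect.  */
    XEXP (mem, 0) = reg;
  else
    XEXP (mem, 0) = gen_rtx_PLUS (mode, reg,
                                  gen_int_mode (new_offset, mode));
  INSN_CODE (insn) = recog (PATTERN (insn), insn, 0);
  df_insn_rescan (insn);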
This looks *so* close ;-) But I think we need a few questions answered
and a few minor adjustments.
On a positive note, m68k passed with this version of the f-m-o patch :-)
Jeff