From: Manolis Tsamis <manolis.tsamis@vrull.eu>
To: gcc-patches@gcc.gnu.org
Cc: Philipp Tomsich <philipp.tomsich@vrull.eu>,
Vineet Gupta <vineetg@rivosinc.com>,
Jeff Law <jeffreyalaw@gmail.com>,
Richard Biener <richard.guenther@gmail.com>
Subject: Re: [PATCH v5] Implement new RTL optimizations pass: fold-mem-offsets.
Date: Sat, 9 Sep 2023 11:54:30 +0300 [thread overview]
Message-ID: <CAM3yNXo9R7UY=NE5oKhdE+5opV=YeX_KG7ViO3aEiqx88Wr=yw@mail.gmail.com> (raw)
In-Reply-To: <20230909084652.2655745-1-manolis.tsamis@vrull.eu>
This new version fixes the issues discussed in v4 and also addresses
an issue described in the newly introduced compute_validity_closure
function.
Bootstrapped on x86-64 and AArch64.
I also ran the GCC testsuite on x86-64, AArch64 and RISCV64. There are
no regressions except for gcc.target/i386/pr52146.c, which I have
already mentioned and which I believe shouldn't be fixed in f-m-o.
I have also measured the number of eliminated instructions for SPEC
intrate on these three architectures (SPEC benchmark: eliminated
instructions), as follows:
RISCV64:
500: 112
502: 443
505: 0
520: 808
523: 20
525: 384
531: 41
541: 97
548: 101
557: 9
AArch64:
500: 71
502: 318
505: 0
520: 23
523: 205
525: 73
531: 7
541: 56
548: 0
557: 2
x86-64:
500: 8
502: 16
505: 0
520: 4
523: 5
525: 2
531: 0
541: 0
548: 0
557: 0
Thanks,
Manolis
On Sat, Sep 9, 2023 at 11:47 AM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
>
> This is a new RTL pass that tries to optimize memory offset calculations
> by moving them from add immediate instructions to the memory loads/stores.
> For example it can transform this:
>
> addi t4,sp,16
> add t2,a6,t4
> shl t3,t2,1
> ld a2,0(t3)
> addi a2,1
> sd a2,8(t2)
>
> into the following (one instruction less):
>
> add t2,a6,sp
> shl t3,t2,1
> ld a2,32(t3)
> addi a2,1
> sd a2,24(t2)
>
> Although there are places where this is done already, this pass is more
> powerful and can handle the more difficult cases that are currently not
> optimized. Also, it runs late enough that it can optimize away unnecessary
> stack pointer calculations.
>
> gcc/ChangeLog:
>
> * Makefile.in: Add fold-mem-offsets.o.
> * passes.def: Schedule a new pass.
> * tree-pass.h (make_pass_fold_mem_offsets): Declare.
> * common.opt: New options.
> * doc/invoke.texi: Document new option.
> * fold-mem-offsets.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/fold-mem-offsets-1.c: New test.
> * gcc.target/riscv/fold-mem-offsets-2.c: New test.
> * gcc.target/riscv/fold-mem-offsets-3.c: New test.
>
> Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> ---
>
> Changes in v5:
> - Introduce new helper function fold_offsets_1.
> - Fix bug because constants could be partially propagated
> through instructions that weren't understood.
> - Introduce helper class fold_mem_info that stores f-m-o
> info for an instruction.
> - Calculate fold_offsets only once with do_fold_info_calculation.
> - Fix correctness issue by introducing compute_validity_closure.
> - Propagate in more cases for PLUS/MINUS with constant.
>
> Changes in v4:
> - Add DF_EQ_NOTES flag to avoid incorrect state in notes.
> - Remove fold_mem_offsets_driver and enum fold_mem_phase.
> - Call recog when patching offsets in do_commit_offset.
> - Restore INSN_CODE after modifying insn in do_check_validity.
>
> Changes in v3:
> - Added propagation for more codes:
> sub, neg, mul.
> - Added folding / elimination for sub and
> const int moves.
> - For the validity check of the generated addresses
> also test memory_address_addr_space_p.
> - Replaced GEN_INT with gen_int_mode.
> - Replaced some bitmap_head with auto_bitmap.
> - Refactor each phase into its own function for readability.
> - Add dump details.
> - Replace rtx iteration with reg_mentioned_p.
> - Return early for codes that we can't propagate through.
>
> Changes in v2:
> - Made the pass target-independent instead of RISCV specific.
> - Fixed a number of bugs.
> - Add code to handle more ADD patterns as found
> in other targets (x86, aarch64).
> - Improved naming and comments.
> - Fixed bitmap memory leak.
>
> gcc/Makefile.in | 1 +
> gcc/common.opt | 4 +
> gcc/doc/invoke.texi | 8 +
> gcc/fold-mem-offsets.cc | 891 ++++++++++++++++++
> gcc/passes.def | 1 +
> .../gcc.target/riscv/fold-mem-offsets-1.c | 16 +
> .../gcc.target/riscv/fold-mem-offsets-2.c | 24 +
> .../gcc.target/riscv/fold-mem-offsets-3.c | 17 +
> gcc/tree-pass.h | 1 +
> 9 files changed, 963 insertions(+)
> create mode 100644 gcc/fold-mem-offsets.cc
> create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 6d608db4dd2..d18bef1be4b 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1435,6 +1435,7 @@ OBJS = \
> fixed-value.o \
> fold-const.o \
> fold-const-call.o \
> + fold-mem-offsets.o \
> function.o \
> function-abi.o \
> function-tests.o \
> diff --git a/gcc/common.opt b/gcc/common.opt
> index f137a1f81ac..b103b8d28ed 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1252,6 +1252,10 @@ fcprop-registers
> Common Var(flag_cprop_registers) Optimization
> Perform a register copy-propagation optimization pass.
>
> +ffold-mem-offsets
> +Target Bool Var(flag_fold_mem_offsets) Init(1)
> +Fold instructions calculating memory offsets into the memory access instructions if possible.
> +
> fcrossjumping
> Common Var(flag_crossjumping) Optimization
> Perform cross-jumping optimization.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 33befee7d6b..ce5a83a2d9c 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -543,6 +543,7 @@ Objective-C and Objective-C++ Dialects}.
> -fauto-inc-dec -fbranch-probabilities
> -fcaller-saves
> -fcombine-stack-adjustments -fconserve-stack
> +-ffold-mem-offsets
> -fcompare-elim -fcprop-registers -fcrossjumping
> -fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules
> -fcx-limited-range
> @@ -14355,6 +14356,13 @@ the comparison operation before register allocation is complete.
>
> Enabled at levels @option{-O1}, @option{-O2}, @option{-O3}, @option{-Os}.
>
> +@opindex ffold-mem-offsets
> +@item -ffold-mem-offsets
> +@itemx -fno-fold-mem-offsets
> +Try to eliminate add instructions by folding them into memory loads/stores.
> +
> +Enabled at levels @option{-O2}, @option{-O3}.
> +
> @opindex fcprop-registers
> @item -fcprop-registers
> After register allocation and post-register allocation instruction splitting,
> diff --git a/gcc/fold-mem-offsets.cc b/gcc/fold-mem-offsets.cc
> new file mode 100644
> index 00000000000..981c7a5e527
> --- /dev/null
> +++ b/gcc/fold-mem-offsets.cc
> @@ -0,0 +1,891 @@
> +/* Late RTL pass to fold memory offsets.
> + Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +GNU General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3. If not see
> +<http://www.gnu.org/licenses/>. */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tm.h"
> +#include "rtl.h"
> +#include "tree.h"
> +#include "expr.h"
> +#include "backend.h"
> +#include "regs.h"
> +#include "target.h"
> +#include "memmodel.h"
> +#include "emit-rtl.h"
> +#include "insn-config.h"
> +#include "recog.h"
> +#include "predict.h"
> +#include "df.h"
> +#include "tree-pass.h"
> +#include "cfgrtl.h"
> +
> +/* This pass tries to optimize memory offset calculations by moving constants
> + from add instructions to the memory instructions (loads / stores).
> + For example it can transform code like this:
> +
> + add t4, sp, 16
> + add t2, a6, t4
> + shl t3, t2, 1
> + ld a2, 0(t3)
> + add a2, 1
> + sd a2, 8(t2)
> +
> + into the following (one instruction less):
> +
> + add t2, a6, sp
> + shl t3, t2, 1
> + ld a2, 32(t3)
> + add a2, 1
> + sd a2, 24(t2)
> +
> + Although the previous passes try to emit efficient offset calculations,
> + this pass is still beneficial because:
> +
> + - The mechanisms that optimize memory offsets usually work with specific
> + patterns or have limitations. This pass is designed to fold offsets
> + through complex calculations that affect multiple memory operations
> + and have partially overlapping calculations.
> +
> + - There are cases where add instructions are introduced in late rtl passes
> + and the rest of the pipeline cannot eliminate them. Arrays and structs
> + allocated on the stack can result in unwanted add instructions that
> + cannot be eliminated easily.
> +
> + This pass works on a basic block level and consists of 4 phases:
> +
> + - Phase 1 (Analysis): Find "foldable" instructions.
> + Foldable instructions are those that we know how to propagate
> + a constant addition through (add, shift, move, ...) and only have other
> + foldable instructions for uses. In that phase a DFS traversal on the
> + definition tree is performed and foldable instructions are marked on
> + a bitmap. The add immediate instructions that are reachable in this
> + DFS are candidates for folding since all the intermediate calculations
> + affected by them are also foldable.
> +
> + - Phase 2 (Validity): Traverse and calculate the offsets that would result
> + from folding the add immediate instructions. Check whether the
> + calculated offsets result in a valid instruction for the target.
> +
> + - Phase 3 (Commit offsets): Traverse again. It is now known which folds
> + are valid so at this point change the offsets in the memory instructions.
> +
> + - Phase 4 (Commit instruction deletions): Scan all instructions and delete
> + or simplify (reduce to move) all add immediate instructions that were
> + folded.
> +
> + This pass should run before hard register propagation because it creates
> + register moves that we expect to be eliminated. */
> +
> +namespace {
> +
> +const pass_data pass_data_fold_mem =
> +{
> + RTL_PASS, /* type */
> + "fold_mem_offsets", /* name */
> + OPTGROUP_NONE, /* optinfo_flags */
> + TV_NONE, /* tv_id */
> + 0, /* properties_required */
> + 0, /* properties_provided */
> + 0, /* properties_destroyed */
> + 0, /* todo_flags_start */
> + TODO_df_finish, /* todo_flags_finish */
> +};
> +
> +class pass_fold_mem_offsets : public rtl_opt_pass
> +{
> +public:
> + pass_fold_mem_offsets (gcc::context *ctxt)
> + : rtl_opt_pass (pass_data_fold_mem, ctxt)
> + {}
> +
> + /* opt_pass methods: */
> + virtual bool gate (function *)
> + {
> + return flag_fold_mem_offsets && optimize >= 2;
> + }
> +
> + virtual unsigned int execute (function *);
> +}; // class pass_fold_mem_offsets
> +
> +/* Class that holds in FOLD_INSNS the instructions that, if folded, would
> + increase the offset of a memory instruction by ADDED_OFFSET. */
> +class fold_mem_info {
> +public:
> + auto_bitmap fold_insns;
> + HOST_WIDE_INT added_offset;
> +};
> +
> +typedef hash_map<rtx_insn *, fold_mem_info *> fold_info_map;
> +
> +/* Tracks which instructions can be reached through instructions that can
> + propagate offsets for folding. */
> +static bitmap_head can_fold_insns;
> +
> +/* Marks instructions that are currently eligible for folding. */
> +static bitmap_head candidate_fold_insns;
> +
> +/* Tracks instructions that cannot be folded because it turned out that
> + folding will result in creating an invalid memory instruction.
> + An instruction can be in both CANDIDATE_FOLD_INSNS and CANNOT_FOLD_INSNS
> + at the same time, in which case it is not legal to fold. */
> +static bitmap_head cannot_fold_insns;
> +
> +/* The number of instructions that were simplified or eliminated. */
> +static int stats_fold_count;
> +
> +/* Get the single reaching definition of an instruction inside a BB.
> + The definition is desired for REG used in INSN.
> + Return the definition insn or NULL if there's no definition with
> + the desired criteria. */
> +static rtx_insn*
> +get_single_def_in_bb (rtx_insn *insn, rtx reg)
> +{
> + df_ref use;
> + struct df_link *ref_chain, *ref_link;
> +
> + FOR_EACH_INSN_USE (use, insn)
> + {
> + if (GET_CODE (DF_REF_REG (use)) == SUBREG)
> + return NULL;
> + if (REGNO (DF_REF_REG (use)) == REGNO (reg))
> + break;
> + }
> +
> + if (!use)
> + return NULL;
> +
> + ref_chain = DF_REF_CHAIN (use);
> +
> + if (!ref_chain)
> + return NULL;
> +
> + for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
> + {
> + /* Problem getting some definition for this instruction. */
> + if (ref_link->ref == NULL)
> + return NULL;
> + if (DF_REF_INSN_INFO (ref_link->ref) == NULL)
> + return NULL;
> + if (global_regs[REGNO (reg)]
> + && !set_of (reg, DF_REF_INSN (ref_link->ref)))
> + return NULL;
> + }
> +
> + if (ref_chain->next)
> + return NULL;
> +
> + rtx_insn* def = DF_REF_INSN (ref_chain->ref);
> +
> + if (BLOCK_FOR_INSN (def) != BLOCK_FOR_INSN (insn))
> + return NULL;
> +
> + if (DF_INSN_LUID (def) > DF_INSN_LUID (insn))
> + return NULL;
> +
> + return def;
> +}
> +
> +/* Get all uses of REG which is set in INSN. Return the use list or NULL if a
> + use is missing / irregular. If SUCCESS is not NULL then set it to false if
> + there are missing / irregular uses and true otherwise. */
> +static struct df_link*
> +get_uses (rtx_insn *insn, rtx reg, bool* success)
> +{
> + df_ref def;
> + struct df_link *ref_chain, *ref_link;
> +
> + if (success)
> + *success = false;
> +
> + FOR_EACH_INSN_DEF (def, insn)
> + if (REGNO (DF_REF_REG (def)) == REGNO (reg))
> + break;
> +
> + if (!def)
> + return NULL;
> +
> + ref_chain = DF_REF_CHAIN (def);
> +
> + for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
> + {
> + /* Problem getting a use for this instruction. */
> + if (ref_link->ref == NULL)
> + return NULL;
> + if (DF_REF_CLASS (ref_link->ref) != DF_REF_REGULAR)
> + return NULL;
> + /* We do not handle REG_EQUIV/REG_EQ notes for now. */
> + if (DF_REF_FLAGS (ref_link->ref) & DF_REF_IN_NOTE)
> + return NULL;
> + }
> +
> + if (success)
> + *success = true;
> +
> + return ref_chain;
> +}
> +
> +static HOST_WIDE_INT
> +fold_offsets (rtx_insn* insn, rtx reg, bool analyze, bitmap foldable_insns);
> +
> +/* Helper function for fold_offsets.
> +
> + If DO_RECURSION is false and ANALYZE is true this function returns true iff
> + it understands the structure of INSN and knows how to propagate constants
> + through it. In this case OFFSET_OUT and FOLDABLE_INSNS are unused.
> +
> + If DO_RECURSION is true then it also calls fold_offsets for each recognised
> + part of INSN with the appropriate arguments.
> +
> + If DO_RECURSION is true and ANALYZE is false then the offset that would
> + result from folding is computed and returned through the pointer OFFSET_OUT.
> + The instructions that can be folded are recorded in FOLDABLE_INSNS.
> +*/
> +static bool fold_offsets_1 (rtx_insn* insn, bool analyze, bool do_recursion,
> + HOST_WIDE_INT *offset_out, bitmap foldable_insns)
> +{
> + /* Doesn't make sense if both DO_RECURSION and ANALYZE are false. */
> + gcc_checking_assert (do_recursion || analyze);
> + gcc_checking_assert (GET_CODE (PATTERN (insn)) == SET);
> +
> + rtx src = SET_SRC (PATTERN (insn));
> + HOST_WIDE_INT offset = 0;
> +
> + switch (GET_CODE (src))
> + {
> + case PLUS:
> + {
> + /* Propagate through add. */
> + rtx arg1 = XEXP (src, 0);
> + rtx arg2 = XEXP (src, 1);
> +
> + if (REG_P (arg1))
> + {
> + if (do_recursion)
> + offset += fold_offsets (insn, arg1, analyze, foldable_insns);
> + }
> + else if (GET_CODE (arg1) == ASHIFT
> + && REG_P (XEXP (arg1, 0))
> + && CONST_INT_P (XEXP (arg1, 1)))
> + {
> + /* Handle R1 = (R2 << C) + ... */
> + if (do_recursion)
> + {
> + HOST_WIDE_INT scale
> + = (HOST_WIDE_INT_1U << INTVAL (XEXP (arg1, 1)));
> + offset += scale * fold_offsets (insn, XEXP (arg1, 0), analyze,
> + foldable_insns);
> + }
> + }
> + else if (GET_CODE (arg1) == PLUS
> + && REG_P (XEXP (arg1, 0))
> + && REG_P (XEXP (arg1, 1)))
> + {
> + /* Handle R1 = (R2 + R3) + ... */
> + if (do_recursion)
> + {
> + offset += fold_offsets (insn, XEXP (arg1, 0), analyze,
> + foldable_insns);
> + offset += fold_offsets (insn, XEXP (arg1, 1), analyze,
> + foldable_insns);
> + }
> + }
> + else if (GET_CODE (arg1) == PLUS
> + && GET_CODE (XEXP (arg1, 0)) == ASHIFT
> + && REG_P (XEXP (XEXP (arg1, 0), 0))
> + && CONST_INT_P (XEXP (XEXP (arg1, 0), 1))
> + && REG_P (XEXP (arg1, 1)))
> + {
> + /* Handle R1 = ((R2 << C) + R3) + ... */
> + if (do_recursion)
> + {
> + HOST_WIDE_INT scale
> + = (HOST_WIDE_INT_1U << INTVAL (XEXP (XEXP (arg1, 0), 1)));
> + offset += scale * fold_offsets (insn, XEXP (XEXP (arg1, 0), 0),
> + analyze, foldable_insns);
> + offset += fold_offsets (insn, XEXP (arg1, 1), analyze,
> + foldable_insns);
> + }
> + }
> + else
> + return false;
> +
> + if (REG_P (arg2))
> + {
> + if (do_recursion)
> + offset += fold_offsets (insn, arg2, analyze, foldable_insns);
> + }
> + else if (CONST_INT_P (arg2))
> + {
> + if (REG_P (arg1))
> + {
> + offset += INTVAL (arg2);
> + /* This is a R1 = R2 + C instruction, candidate for folding. */
> + if (!analyze)
> + bitmap_set_bit (foldable_insns, INSN_UID (insn));
> + }
> + }
> + else
> + return false;
> +
> + /* Pattern recognised for folding. */
> + break;
> + }
> + case MINUS:
> + {
> + /* Propagate through minus. */
> + rtx arg1 = XEXP (src, 0);
> + rtx arg2 = XEXP (src, 1);
> +
> + if (REG_P (arg1))
> + {
> + if (do_recursion)
> + offset += fold_offsets (insn, arg1, analyze, foldable_insns);
> + }
> + else
> + return false;
> +
> + if (REG_P (arg2))
> + {
> + if (do_recursion)
> + offset -= fold_offsets (insn, arg2, analyze, foldable_insns);
> + }
> + else if (CONST_INT_P (arg2))
> + {
> + if (REG_P (arg1))
> + {
> + offset -= INTVAL (arg2);
> + /* This is a R1 = R2 - C instruction, candidate for folding. */
> + if (!analyze)
> + bitmap_set_bit (foldable_insns, INSN_UID (insn));
> + }
> + }
> + else
> + return false;
> +
> + /* Pattern recognised for folding. */
> + break;
> + }
> + case NEG:
> + {
> + /* Propagate through negation. */
> + rtx arg1 = XEXP (src, 0);
> + if (REG_P (arg1))
> + {
> + if (do_recursion)
> + offset = -fold_offsets (insn, arg1, analyze, foldable_insns);
> + }
> + else
> + return false;
> +
> + /* Pattern recognised for folding. */
> + break;
> + }
> + case MULT:
> + {
> + /* Propagate through multiply by constant. */
> + rtx arg1 = XEXP (src, 0);
> + rtx arg2 = XEXP (src, 1);
> +
> + if (REG_P (arg1) && CONST_INT_P (arg2))
> + {
> + if (do_recursion)
> + {
> + HOST_WIDE_INT scale = INTVAL (arg2);
> + offset = scale * fold_offsets (insn, arg1, analyze,
> + foldable_insns);
> + }
> + }
> + else
> + return false;
> +
> + /* Pattern recognised for folding. */
> + break;
> + }
> + case ASHIFT:
> + {
> + /* Propagate through shift left by constant. */
> + rtx arg1 = XEXP (src, 0);
> + rtx arg2 = XEXP (src, 1);
> +
> + if (REG_P (arg1) && CONST_INT_P (arg2))
> + {
> + if (do_recursion)
> + {
> + HOST_WIDE_INT scale = (HOST_WIDE_INT_1U << INTVAL (arg2));
> + offset = scale * fold_offsets (insn, arg1, analyze,
> + foldable_insns);
> + }
> + }
> + else
> + return false;
> +
> + /* Pattern recognised for folding. */
> + break;
> + }
> + case REG:
> + {
> + /* Propagate through register move. */
> + if (do_recursion)
> + offset = fold_offsets (insn, src, analyze, foldable_insns);
> +
> + /* Pattern recognised for folding. */
> + break;
> + }
> + case CONST_INT:
> + {
> + offset = INTVAL (src);
> + /* R1 = C is candidate for folding. */
> + if (!analyze)
> + bitmap_set_bit (foldable_insns, INSN_UID (insn));
> +
> + /* Pattern recognised for folding. */
> + break;
> + }
> + default:
> + /* Cannot recognise. */
> + return false;
> + }
> +
> + if (do_recursion && !analyze)
> + *offset_out = offset;
> +
> + return true;
> +}
> +
> +/* Function that computes the offset that would have to be added to all uses
> + of REG if the instructions marked in FOLDABLE_INSNS were to be eliminated.
> +
> + If ANALYZE is true then mark in CAN_FOLD_INSNS which instructions
> + transitively only affect other instructions found in CAN_FOLD_INSNS.
> + If ANALYZE is false then compute the offset required for folding. */
> +static HOST_WIDE_INT
> +fold_offsets (rtx_insn* insn, rtx reg, bool analyze, bitmap foldable_insns)
> +{
> + rtx_insn* def = get_single_def_in_bb (insn, reg);
> +
> + if (!def || GET_CODE (PATTERN (def)) != SET)
> + return 0;
> +
> + rtx dest = SET_DEST (PATTERN (def));
> +
> + if (!REG_P (dest))
> + return 0;
> +
> + /* We can only affect the values of GPR registers. */
> + unsigned int dest_regno = REGNO (dest);
> + if (fixed_regs[dest_regno]
> + || !TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], dest_regno))
> + return 0;
> +
> + if (analyze)
> + {
> + /* Check if we know how to handle DEF. */
> + if (!fold_offsets_1 (def, true, false, NULL, NULL))
> + return 0;
> +
> + /* We only fold through instructions that are transitively used as
> + memory addresses and do not have other uses. Use the same logic
> + from offset calculation to visit instructions that can propagate
> + offsets and keep track of them in CAN_FOLD_INSNS. */
> + bool success;
> + struct df_link *uses = get_uses (def, dest, &success), *ref_link;
> +
> + if (!success)
> + return 0;
> +
> + for (ref_link = uses; ref_link; ref_link = ref_link->next)
> + {
> + rtx_insn* use = DF_REF_INSN (ref_link->ref);
> +
> + if (DEBUG_INSN_P (use))
> + continue;
> +
> + /* Punt if the use is anything more complicated than a set
> + (clobber, use, etc). */
> + if (!NONJUMP_INSN_P (use) || GET_CODE (PATTERN (use)) != SET)
> + return 0;
> +
> + /* This use affects instructions outside of CAN_FOLD_INSNS. */
> + if (!bitmap_bit_p (&can_fold_insns, INSN_UID (use)))
> + return 0;
> +
> + rtx use_set = PATTERN (use);
> +
> + /* Special case: a memory store is not foldable if it
> + mentions DEST outside of the address calculation. */
> + if (use_set && MEM_P (SET_DEST (use_set))
> + && reg_mentioned_p (dest, SET_SRC (use_set)))
> + return 0;
> + }
> +
> + bitmap_set_bit (&can_fold_insns, INSN_UID (def));
> +
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> + fprintf (dump_file, "Instruction marked for propagation: ");
> + print_rtl_single (dump_file, def);
> + }
> + }
> + else
> + {
> + /* We cannot propagate through this instruction. */
> + if (!bitmap_bit_p (&can_fold_insns, INSN_UID (def)))
> + return 0;
> + }
> +
> + HOST_WIDE_INT offset = 0;
> + bool recognised = fold_offsets_1 (def, analyze, true, &offset,
> + foldable_insns);
> +
> + if (!recognised)
> + return 0;
> +
> + return offset;
> +}
> +
> +/* Test if INSN is a memory load / store that can have an offset folded to it.
> + Return true iff INSN is such an instruction and return through MEM_OUT,
> + REG_OUT and OFFSET_OUT the RTX that has a MEM code, the register that is
> + used as a base address and the offset accordingly.
> + All of the out pointers may be NULL in which case they will be ignored. */
> +bool
> +get_fold_mem_root (rtx_insn* insn, rtx *mem_out, rtx *reg_out,
> + HOST_WIDE_INT *offset_out)
> +{
> + rtx set = single_set (insn);
> + rtx mem = NULL_RTX;
> +
> + if (set != NULL_RTX)
> + {
> + rtx src = SET_SRC (set);
> + rtx dest = SET_DEST (set);
> +
> + /* Don't fold when we have unspec / volatile. */
> + if (GET_CODE (src) == UNSPEC
> + || GET_CODE (src) == UNSPEC_VOLATILE
> + || GET_CODE (dest) == UNSPEC
> + || GET_CODE (dest) == UNSPEC_VOLATILE)
> + return false;
> +
> + if (MEM_P (src))
> + mem = src;
> + else if (MEM_P (dest))
> + mem = dest;
> + else if ((GET_CODE (src) == SIGN_EXTEND
> + || GET_CODE (src) == ZERO_EXTEND)
> + && MEM_P (XEXP (src, 0)))
> + mem = XEXP (src, 0);
> + }
> +
> + if (mem == NULL_RTX)
> + return false;
> +
> + rtx mem_addr = XEXP (mem, 0);
> + rtx reg;
> + HOST_WIDE_INT offset;
> +
> + if (REG_P (mem_addr))
> + {
> + reg = mem_addr;
> + offset = 0;
> + }
> + else if (GET_CODE (mem_addr) == PLUS
> + && REG_P (XEXP (mem_addr, 0))
> + && CONST_INT_P (XEXP (mem_addr, 1)))
> + {
> + reg = XEXP (mem_addr, 0);
> + offset = INTVAL (XEXP (mem_addr, 1));
> + }
> + else
> + return false;
> +
> + if (mem_out)
> + *mem_out = mem;
> + if (reg_out)
> + *reg_out = reg;
> + if (offset_out)
> + *offset_out = offset;
> +
> + return true;
> +}
> +
> +/* If INSN is a root memory instruction then do a DFS traversal on its
> + definitions and find folding candidates. */
> +static void
> +do_analysis (rtx_insn* insn)
> +{
> + rtx reg;
> + if (!get_fold_mem_root (insn, NULL, &reg, NULL))
> + return;
> +
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> + fprintf (dump_file, "Starting analysis from root: ");
> + print_rtl_single (dump_file, insn);
> + }
> +
> + /* Analyse folding opportunities for this memory instruction. */
> + bitmap_set_bit (&can_fold_insns, INSN_UID (insn));
> + fold_offsets (insn, reg, true, NULL);
> +}
> +
> +static void
> +do_fold_info_calculation (rtx_insn* insn, fold_info_map *fold_info)
> +{
> + rtx mem, reg;
> + HOST_WIDE_INT cur_offset;
> + if (!get_fold_mem_root (insn, &mem, &reg, &cur_offset))
> + return;
> +
> + fold_mem_info *info = new fold_mem_info;
> + info->added_offset = fold_offsets (insn, reg, false, info->fold_insns);
> +
> + fold_info->put (insn, info);
> +}
> +
> +/* If INSN is a root memory instruction then compute a potentially new offset
> + for it and test if the resulting instruction is valid. */
> +static void
> +do_check_validity (rtx_insn* insn, fold_mem_info *info)
> +{
> + rtx mem, reg;
> + HOST_WIDE_INT cur_offset;
> + if (!get_fold_mem_root (insn, &mem, &reg, &cur_offset))
> + return;
> +
> + HOST_WIDE_INT new_offset = cur_offset + info->added_offset;
> +
> + /* Test if it is valid to change MEM's address offset to NEW_OFFSET. */
> + int icode = INSN_CODE (insn);
> + rtx mem_addr = XEXP (mem, 0);
> + machine_mode mode = GET_MODE (mem_addr);
> + XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
> +
> + bool illegal = insn_invalid_p (insn, false)
> + || !memory_address_addr_space_p (mode, XEXP (mem, 0),
> + MEM_ADDR_SPACE (mem));
> +
> + /* Restore the instruction. */
> + XEXP (mem, 0) = mem_addr;
> + INSN_CODE (insn) = icode;
> +
> + if (illegal)
> + bitmap_ior_into (&cannot_fold_insns, info->fold_insns);
> + else
> + bitmap_ior_into (&candidate_fold_insns, info->fold_insns);
> +}
> +
> +static bool
> +compute_validity_closure (fold_info_map *fold_info)
> +{
> + /* Let's say we have an arbitrary chain of foldable instructions xN = xN + C
> + and memory operations rN that use xN as shown below. If folding x1 in r1
> + turns out to be invalid for whatever reason then it's also invalid to fold
> + any of the other xN into any rN. That means that we need the transitive
> + closure of validity to determine whether we can fold a xN instruction.
> +
> + +--------------+ +-------------------+ +-------------------+
> + | r1 = mem[x1] | | r2 = mem[x1 + x2] | | r3 = mem[x2 + x3] | ...
> + +--------------+ +-------------------+ +-------------------+
> + ^ ^ ^ ^ ^
> + | / | / | ...
> + | / | / |
> + +-------------+ / +-------------+ / +-------------+
> + | x1 = x1 + 1 |-----+ | x2 = x2 + 1 |-----+ | x3 = x3 + 1 |--- ...
> + +-------------+ +-------------+ +-------------+
> + ^ ^ ^
> + | | |
> + ... ... ...
> + */
> +
> + int max_iters = 5;
> + for (int i = 0; i < max_iters; i++)
> + {
> + bool made_changes = false;
> + for (fold_info_map::iterator iter = fold_info->begin ();
> + iter != fold_info->end (); ++iter)
> + {
> + fold_mem_info *info = (*iter).second;
> + if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
> + made_changes |= bitmap_ior_into (&cannot_fold_insns,
> + info->fold_insns);
> + }
> +
> + if (!made_changes)
> + return true;
> + }
> +
> + return false;
> +}
> +
> +/* If INSN is a root memory instruction that was affected by any folding
> + then update its offset as necessary. */
> +static void
> +do_commit_offset (rtx_insn* insn, fold_mem_info *info)
> +{
> + rtx mem, reg;
> + HOST_WIDE_INT cur_offset;
> + if (!get_fold_mem_root (insn, &mem, &reg, &cur_offset))
> + return;
> +
> + HOST_WIDE_INT new_offset = cur_offset + info->added_offset;
> +
> + if (new_offset == cur_offset)
> + return;
> +
> + gcc_assert (!bitmap_empty_p (info->fold_insns));
> +
> + if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
> + return;
> +
> + if (dump_file)
> + {
> + fprintf (dump_file, "Memory offset changed from "
> + HOST_WIDE_INT_PRINT_DEC " to " HOST_WIDE_INT_PRINT_DEC
> + " for instruction:\n", cur_offset, new_offset);
> + print_rtl_single (dump_file, insn);
> + }
> +
> + machine_mode mode = GET_MODE (XEXP (mem, 0));
> + XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
> + INSN_CODE (insn) = recog (PATTERN (insn), insn, 0);
> + df_insn_rescan (insn);
> +}
> +
> +/* If INSN is a move / add instruction that was folded then replace its
> + constant part with zero. */
> +static void
> +do_commit_insn (rtx_insn* insn)
> +{
> + if (bitmap_bit_p (&candidate_fold_insns, INSN_UID (insn))
> + && !bitmap_bit_p (&cannot_fold_insns, INSN_UID (insn)))
> + {
> + if (dump_file)
> + {
> + fprintf (dump_file, "Instruction folded:");
> + print_rtl_single (dump_file, insn);
> + }
> +
> + stats_fold_count++;
> +
> + rtx set = single_set (insn);
> + rtx dest = SET_DEST (set);
> + rtx src = SET_SRC (set);
> +
> + /* Emit a move and let subsequent passes eliminate it if possible. */
> + if (GET_CODE (src) == CONST_INT)
> + {
> + /* INSN is R1 = C.
> + Replace it with R1 = 0 because C was folded. */
> + rtx mov_rtx
> + = gen_move_insn (dest, gen_int_mode (0, GET_MODE (dest)));
> + df_insn_rescan (emit_insn_after (mov_rtx, insn));
> + }
> + else
> + {
> + /* INSN is R1 = R2 + C.
> + Replace it with R1 = R2 because C was folded. */
> + rtx arg1 = XEXP (src, 0);
> +
> + /* If DEST == ARG1 then the move is a no-op. */
> + if (REGNO (dest) != REGNO (arg1))
> + {
> + gcc_checking_assert (GET_MODE (dest) == GET_MODE (arg1));
> + rtx mov_rtx = gen_move_insn (dest, arg1);
> + df_insn_rescan (emit_insn_after (mov_rtx, insn));
> + }
> + }
> +
> + /* Delete the original move / add instruction. */
> + delete_insn (insn);
> + }
> +}
> +
> +unsigned int
> +pass_fold_mem_offsets::execute (function *fn)
> +{
> + df_set_flags (DF_EQ_NOTES + DF_RD_PRUNE_DEAD_DEFS + DF_DEFER_INSN_RESCAN);
> + df_chain_add_problem (DF_UD_CHAIN + DF_DU_CHAIN);
> + df_analyze ();
> +
> + bitmap_initialize (&can_fold_insns, NULL);
> + bitmap_initialize (&candidate_fold_insns, NULL);
> + bitmap_initialize (&cannot_fold_insns, NULL);
> +
> + stats_fold_count = 0;
> +
> + basic_block bb;
> + rtx_insn *insn;
> + FOR_ALL_BB_FN (bb, fn)
> + {
> + /* There is a conflict between this pass and RISCV's shorten-memrefs
> + pass. For now disable folding if optimizing for size because
> + otherwise this cancels the effects of shorten-memrefs. */
> + if (optimize_bb_for_size_p (bb))
> + continue;
> +
> + fold_info_map fold_info;
> +
> + bitmap_clear (&can_fold_insns);
> + bitmap_clear (&candidate_fold_insns);
> + bitmap_clear (&cannot_fold_insns);
> +
> + FOR_BB_INSNS (bb, insn)
> + do_analysis (insn);
> +
> + FOR_BB_INSNS (bb, insn)
> + do_fold_info_calculation (insn, &fold_info);
> +
> + FOR_BB_INSNS (bb, insn)
> + if (fold_mem_info **info = fold_info.get (insn))
> + do_check_validity (insn, *info);
> +
> + if (compute_validity_closure (&fold_info))
> + {
> + FOR_BB_INSNS (bb, insn)
> + if (fold_mem_info **info = fold_info.get (insn))
> + do_commit_offset (insn, *info);
> +
> + FOR_BB_INSNS (bb, insn)
> + do_commit_insn (insn);
> + }
> +
> + for (fold_info_map::iterator iter = fold_info.begin ();
> + iter != fold_info.end (); ++iter)
> + delete (*iter).second;
> + }
> +
> + statistics_counter_event (cfun, "Number of folded instructions",
> + stats_fold_count);
> +
> + bitmap_release (&can_fold_insns);
> + bitmap_release (&candidate_fold_insns);
> + bitmap_release (&cannot_fold_insns);
> +
> + return 0;
> +}
> +
> +} // anon namespace
> +
> +rtl_opt_pass *
> +make_pass_fold_mem_offsets (gcc::context *ctxt)
> +{
> + return new pass_fold_mem_offsets (ctxt);
> +}
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 4110a472914..93e15cba65b 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -519,6 +519,7 @@ along with GCC; see the file COPYING3. If not see
> NEXT_PASS (pass_peephole2);
> NEXT_PASS (pass_if_after_reload);
> NEXT_PASS (pass_regrename);
> + NEXT_PASS (pass_fold_mem_offsets);
> NEXT_PASS (pass_cprop_hardreg);
> NEXT_PASS (pass_fast_rtl_dce);
> NEXT_PASS (pass_reorder_blocks);
> diff --git a/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
> new file mode 100644
> index 00000000000..8b31cd9e22e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ffold-mem-offsets" } */
> +
> +void sink(int arr[2]);
> +
> +void
> +foo(int a, int b, int i)
> +{
> + int arr[2] = {a, b};
> + arr[i]++;
> + sink(arr);
> +}
> +
> +// Should compile without negative memory offsets when using -ffold-mem-offsets
> +/* { dg-final { scan-assembler-not "lw\t.*,-.*\\(.*\\)" } } */
> +/* { dg-final { scan-assembler-not "sw\t.*,-.*\\(.*\\)" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
> new file mode 100644
> index 00000000000..2e1348efdf1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ffold-mem-offsets" } */
> +
> +void sink(int arr[3]);
> +
> +void
> +foo(int a, int b, int c, int i)
> +{
> + int arr1[3] = {a, b, c};
> + int arr2[3] = {a, c, b};
> + int arr3[3] = {c, b, a};
> +
> + arr1[i]++;
> + arr2[i]++;
> + arr3[i]++;
> +
> + sink(arr1);
> + sink(arr2);
> + sink(arr3);
> +}
> +
> +// Should compile without negative memory offsets when using -ffold-mem-offsets
> +/* { dg-final { scan-assembler-not "lw\t.*,-.*\\(.*\\)" } } */
> +/* { dg-final { scan-assembler-not "sw\t.*,-.*\\(.*\\)" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
> new file mode 100644
> index 00000000000..9c50c51ac1c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ffold-mem-offsets" } */
> +
> +void load(int arr[2]);
> +
> +int
> +foo(long unsigned int i)
> +{
> + int arr[2];
> + load(arr);
> +
> + return arr[3 * i + 77];
> +}
> +
> +// Should compile without negative memory offsets when using -ffold-mem-offsets
> +/* { dg-final { scan-assembler-not "lw\t.*,-.*\\(.*\\)" } } */
> +/* { dg-final { scan-assembler-not "addi\t.*,.*,77" } } */
> \ No newline at end of file
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index eba2d54ac76..34a38dbd148 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -621,6 +621,7 @@ extern rtl_opt_pass *make_pass_sched_fusion (gcc::context *ctxt);
> extern rtl_opt_pass *make_pass_peephole2 (gcc::context *ctxt);
> extern rtl_opt_pass *make_pass_if_after_reload (gcc::context *ctxt);
> extern rtl_opt_pass *make_pass_regrename (gcc::context *ctxt);
> +extern rtl_opt_pass *make_pass_fold_mem_offsets (gcc::context *ctxt);
> extern rtl_opt_pass *make_pass_cprop_hardreg (gcc::context *ctxt);
> extern rtl_opt_pass *make_pass_reorder_blocks (gcc::context *ctxt);
> extern rtl_opt_pass *make_pass_leaf_regs (gcc::context *ctxt);
> --
> 2.34.1
>
Thread overview: 6+ messages
2023-09-09 8:46 Manolis Tsamis
2023-09-09 8:54 ` Manolis Tsamis [this message]
2023-09-12 0:47 ` Jeff Law
2023-09-12 10:13 ` Manolis Tsamis
2023-09-29 19:22 ` Jeff Law
2023-10-03 11:49 ` Manolis Tsamis