From: Manolis Tsamis
Date: Sat, 9 Sep 2023 11:54:30 +0300
Subject: Re: [PATCH v5] Implement new RTL optimizations pass: fold-mem-offsets.
To: gcc-patches@gcc.gnu.org
Cc: Philipp Tomsich, Vineet Gupta, Jeff Law, Richard Biener

This new version fixes the issues discussed in v4 and also fixes an issue
described in the newly introduced compute_validity_closure.

Bootstrapped on x86-64 and AArch64. I also ran the GCC testsuite on x86-64,
AArch64 and RISCV64. There are no regressions except for
gcc.target/i386/pr52146.c, which I have mentioned before and which I believe
shouldn't be fixed in f-m-o.

I have also measured the number of instructions eliminated in SPEC intrate
on these three architectures:

  Benchmark   RISCV64   AArch64   x86-64
  500             112        71        8
  502             443       318       16
  505               0         0        0
  520             808        23        4
  523              20       205        5
  525             384        73        2
  531              41         7        0
  541              97        56        0
  548             101         0        0
  557               9         2        0
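As a concrete illustration of where these eliminations come from, consider
variable indexing into a stack-allocated array, which is essentially what the
new fold-mem-offsets-1.c test below exercises. (This sketch and the
description of the generated code are illustrative only and not part of the
patch itself.)

  void sink (int arr[2]);

  void
  foo (int a, int b, int i)
  {
    /* ARR lives at a constant offset from the stack pointer, so the
       address of arr[i] is computed as (sp + C) + i*4.  f-m-o can fold
       C into the load/store offsets and delete the addi that built it.  */
    int arr[2] = { a, b };
    arr[i]++;
    sink (arr);
  }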
Thanks,
Manolis

On Sat, Sep 9, 2023 at 11:47 AM Manolis Tsamis wrote:
>
> This is a new RTL pass that tries to optimize memory offset calculations
> by moving them from add immediate instructions to the memory loads/stores.
> For example it can transform this:
>
>   addi t4,sp,16
>   add  t2,a6,t4
>   shl  t3,t2,1
>   ld   a2,0(t3)
>   addi a2,1
>   sd   a2,8(t2)
>
> into the following (one instruction less):
>
>   add  t2,a6,sp
>   shl  t3,t2,1
>   ld   a2,32(t3)
>   addi a2,1
>   sd   a2,24(t2)
>
> Although there are places where this is done already, this pass is more
> powerful and can handle the more difficult cases that are currently not
> optimized. Also, it runs late enough and can optimize away unnecessary
> stack pointer calculations.
>
> gcc/ChangeLog:
>
>         * Makefile.in: Add fold-mem-offsets.o.
>         * passes.def: Schedule a new pass.
>         * tree-pass.h (make_pass_fold_mem_offsets): Declare.
>         * common.opt: New options.
>         * doc/invoke.texi: Document new option.
>         * fold-mem-offsets.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/fold-mem-offsets-1.c: New test.
>         * gcc.target/riscv/fold-mem-offsets-2.c: New test.
>         * gcc.target/riscv/fold-mem-offsets-3.c: New test.
>
> Signed-off-by: Manolis Tsamis
> ---
>
> Changes in v5:
>         - Introduce new helper function fold_offsets_1.
>         - Fix a bug where constants could be partially propagated
>           through instructions that weren't understood.
>         - Introduce helper class fold_mem_info that stores f-m-o
>           info for an instruction.
>         - Calculate fold_offsets only once with do_fold_info_calculation.
>         - Fix a correctness issue by introducing compute_validity_closure.
>         - Propagate in more cases for PLUS/MINUS with constant.
>
> Changes in v4:
>         - Add DF_EQ_NOTES flag to avoid incorrect state in notes.
>         - Remove fold_mem_offsets_driver and enum fold_mem_phase.
>         - Call recog when patching offsets in do_commit_offset.
>         - Restore INSN_CODE after modifying insn in do_check_validity.
>
> Changes in v3:
>         - Added propagation for more codes: sub, neg, mul.
>         - Added folding / elimination for sub and const int moves.
>         - For the validity check of the generated addresses also
>           test memory_address_addr_space_p.
>         - Replaced GEN_INT with gen_int_mode.
>         - Replaced some bitmap_head with auto_bitmap.
>         - Refactored each phase into its own function for readability.
>         - Added dump details.
>         - Replaced rtx iteration with reg_mentioned_p.
>         - Return early for codes that we can't propagate through.
>
> Changes in v2:
>         - Made the pass target-independent instead of RISCV specific.
>         - Fixed a number of bugs.
>         - Added code to handle more ADD patterns as found
>           in other targets (x86, aarch64).
>         - Improved naming and comments.
>         - Fixed bitmap memory leak.
>
>  gcc/Makefile.in                               |   1 +
>  gcc/common.opt                                |   4 +
>  gcc/doc/invoke.texi                           |   8 +
>  gcc/fold-mem-offsets.cc                       | 891 ++++++++++++++++++
>  gcc/passes.def                                |   1 +
>  .../gcc.target/riscv/fold-mem-offsets-1.c     |  16 +
>  .../gcc.target/riscv/fold-mem-offsets-2.c     |  24 +
>  .../gcc.target/riscv/fold-mem-offsets-3.c     |  17 +
>  gcc/tree-pass.h                               |   1 +
>  9 files changed, 963 insertions(+)
>  create mode 100644 gcc/fold-mem-offsets.cc
>  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 6d608db4dd2..d18bef1be4b 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1435,6 +1435,7 @@ OBJS = \
>         fixed-value.o \
>         fold-const.o \
>         fold-const-call.o \
> +       fold-mem-offsets.o \
>         function.o \
>         function-abi.o \
>         function-tests.o \
> diff --git a/gcc/common.opt b/gcc/common.opt
> index f137a1f81ac..b103b8d28ed 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1252,6 +1252,10 @@ fcprop-registers
>  Common Var(flag_cprop_registers) Optimization
>  Perform a register copy-propagation optimization pass.
>
> +ffold-mem-offsets
> +Target Bool Var(flag_fold_mem_offsets) Init(1)
> +Fold instructions calculating memory offsets to the memory access instruction if possible.
> +
>  fcrossjumping
>  Common Var(flag_crossjumping) Optimization
>  Perform cross-jumping optimization.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 33befee7d6b..ce5a83a2d9c 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -543,6 +543,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fauto-inc-dec  -fbranch-probabilities
>  -fcaller-saves
>  -fcombine-stack-adjustments  -fconserve-stack
> +-ffold-mem-offsets
>  -fcompare-elim  -fcprop-registers  -fcrossjumping
>  -fcse-follow-jumps  -fcse-skip-blocks  -fcx-fortran-rules
>  -fcx-limited-range
> @@ -14355,6 +14356,13 @@ the comparison operation before register allocation is complete.
>
>  Enabled at levels @option{-O1}, @option{-O2}, @option{-O3}, @option{-Os}.
>
> +@opindex ffold-mem-offsets
> +@item -ffold-mem-offsets
> +@itemx -fno-fold-mem-offsets
> +Try to eliminate add instructions by folding them into memory loads/stores.
> +
> +Enabled at levels @option{-O2}, @option{-O3}.
> +
>  @opindex fcprop-registers
>  @item -fcprop-registers
>  After register allocation and post-register allocation instruction splitting,
> diff --git a/gcc/fold-mem-offsets.cc b/gcc/fold-mem-offsets.cc
> new file mode 100644
> index 00000000000..981c7a5e527
> --- /dev/null
> +++ b/gcc/fold-mem-offsets.cc
> @@ -0,0 +1,891 @@
> +/* Late RTL pass to fold memory offsets.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +GNU General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tm.h"
> +#include "rtl.h"
> +#include "tree.h"
> +#include "expr.h"
> +#include "backend.h"
> +#include "regs.h"
> +#include "target.h"
> +#include "memmodel.h"
> +#include "emit-rtl.h"
> +#include "insn-config.h"
> +#include "recog.h"
> +#include "predict.h"
> +#include "df.h"
> +#include "tree-pass.h"
> +#include "cfgrtl.h"
> +
> +/* This pass tries to optimize memory offset calculations by moving constants
> +   from add instructions to the memory instructions (loads / stores).
> +   For example it can transform code like this:
> +
> +     add  t4, sp, 16
> +     add  t2, a6, t4
> +     shl  t3, t2, 1
> +     ld   a2, 0(t3)
> +     add  a2, 1
> +     sd   a2, 8(t2)
> +
> +   into the following (one instruction less):
> +
> +     add  t2, a6, sp
> +     shl  t3, t2, 1
> +     ld   a2, 32(t3)
> +     add  a2, 1
> +     sd   a2, 24(t2)
> +
> +   Although the previous passes try to emit efficient offset calculations,
> +   this pass is still beneficial because:
> +
> +    - The mechanisms that optimize memory offsets usually work with specific
> +      patterns or have limitations.  This pass is designed to fold offsets
> +      through complex calculations that affect multiple memory operations
> +      and have partially overlapping calculations.
> +
> +    - There are cases where add instructions are introduced in late rtl passes
> +      and the rest of the pipeline cannot eliminate them.  Arrays and structs
> +      allocated on the stack can result in unwanted add instructions that
> +      cannot be eliminated easily.
> +
> +   This pass works on a basic block level and consists of 4 phases:
> +
> +    - Phase 1 (Analysis): Find "foldable" instructions.
> +      Foldable instructions are those that we know how to propagate
> +      a constant addition through (add, shift, move, ...) and only have other
> +      foldable instructions as uses.  In that phase a DFS traversal on the
> +      definition tree is performed and foldable instructions are marked on
> +      a bitmap.  The add immediate instructions that are reachable in this
> +      DFS are candidates for folding, since all the intermediate calculations
> +      affected by them are also foldable.
> +
> +    - Phase 2 (Validity): Traverse and calculate the offsets that would result
> +      from folding the add immediate instructions.  Check whether the
> +      calculated offsets result in a valid instruction for the target.
> +
> +    - Phase 3 (Commit offsets): Traverse again.  It is now known which folds
> +      are valid so at this point change the offsets in the memory instructions.
> +
> +    - Phase 4 (Commit instruction deletions): Scan all instructions and delete
> +      or simplify (reduce to move) all add immediate instructions that were
> +      folded.
> +
> +   This pass should run before hard register propagation because it creates
> +   register moves that we expect to be eliminated.  */
> +
> +namespace {
> +
> +const pass_data pass_data_fold_mem =
> +{
> +  RTL_PASS, /* type */
> +  "fold_mem_offsets", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  0, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  TODO_df_finish, /* todo_flags_finish */
> +};
> +
> +class pass_fold_mem_offsets : public rtl_opt_pass
> +{
> +public:
> +  pass_fold_mem_offsets (gcc::context *ctxt)
> +    : rtl_opt_pass (pass_data_fold_mem, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *)
> +    {
> +      return flag_fold_mem_offsets && optimize >= 2;
> +    }
> +
> +  virtual unsigned int execute (function *);
> +}; // class pass_fold_mem_offsets
> +
> +/* Class that holds in FOLD_INSNS the instructions that, if folded, would
> +   increase the offset of a memory instruction by ADDED_OFFSET.  */
> +class fold_mem_info {
> +public:
> +  auto_bitmap fold_insns;
> +  HOST_WIDE_INT added_offset;
> +};
> +
> +typedef hash_map<rtx_insn *, fold_mem_info *> fold_info_map;
> +
> +/* Tracks which instructions can be reached through instructions that can
> +   propagate offsets for folding.  */
> +static bitmap_head can_fold_insns;
> +
> +/* Marks instructions that are currently eligible for folding.  */
> +static bitmap_head candidate_fold_insns;
> +
> +/* Tracks instructions that cannot be folded because it turned out that
> +   folding will result in creating an invalid memory instruction.
> +   An instruction can be in both CANDIDATE_FOLD_INSNS and CANNOT_FOLD_INSNS
> +   at the same time, in which case it is not legal to fold.  */
> +static bitmap_head cannot_fold_insns;
> +
> +/* The number of instructions that were simplified or eliminated.  */
> +static int stats_fold_count;
> +
> +/* Get the single reaching definition of an instruction inside a BB.
> +   The definition is desired for REG used in INSN.
> +   Return the definition insn or NULL if there's no definition with
> +   the desired criteria.  */
> +static rtx_insn*
> +get_single_def_in_bb (rtx_insn *insn, rtx reg)
> +{
> +  df_ref use;
> +  struct df_link *ref_chain, *ref_link;
> +
> +  FOR_EACH_INSN_USE (use, insn)
> +    {
> +      if (GET_CODE (DF_REF_REG (use)) == SUBREG)
> +       return NULL;
> +      if (REGNO (DF_REF_REG (use)) == REGNO (reg))
> +       break;
> +    }
> +
> +  if (!use)
> +    return NULL;
> +
> +  ref_chain = DF_REF_CHAIN (use);
> +
> +  if (!ref_chain)
> +    return NULL;
> +
> +  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
> +    {
> +      /* Problem getting some definition for this instruction.  */
> +      if (ref_link->ref == NULL)
> +       return NULL;
> +      if (DF_REF_INSN_INFO (ref_link->ref) == NULL)
> +       return NULL;
> +      if (global_regs[REGNO (reg)]
> +         && !set_of (reg, DF_REF_INSN (ref_link->ref)))
> +       return NULL;
> +    }
> +
> +  if (ref_chain->next)
> +    return NULL;
> +
> +  rtx_insn* def = DF_REF_INSN (ref_chain->ref);
> +
> +  if (BLOCK_FOR_INSN (def) != BLOCK_FOR_INSN (insn))
> +    return NULL;
> +
> +  if (DF_INSN_LUID (def) > DF_INSN_LUID (insn))
> +    return NULL;
> +
> +  return def;
> +}
> +
> +/* Get all uses of REG which is set in INSN.  Return the use list or NULL if a
> +   use is missing / irregular.  If SUCCESS is not NULL then set it to false if
> +   there are missing / irregular uses and true otherwise.  */
> +static struct df_link*
> +get_uses (rtx_insn *insn, rtx reg, bool* success)
> +{
> +  df_ref def;
> +  struct df_link *ref_chain, *ref_link;
> +
> +  if (success)
> +    *success = false;
> +
> +  FOR_EACH_INSN_DEF (def, insn)
> +    if (REGNO (DF_REF_REG (def)) == REGNO (reg))
> +      break;
> +
> +  if (!def)
> +    return NULL;
> +
> +  ref_chain = DF_REF_CHAIN (def);
> +
> +  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
> +    {
> +      /* Problem getting a use for this instruction.  */
> +      if (ref_link->ref == NULL)
> +       return NULL;
> +      if (DF_REF_CLASS (ref_link->ref) != DF_REF_REGULAR)
> +       return NULL;
> +      /* We do not handle REG_EQUIV/REG_EQ notes for now.  */
> +      if (DF_REF_FLAGS (ref_link->ref) & DF_REF_IN_NOTE)
> +       return NULL;
> +    }
> +
> +  if (success)
> +    *success = true;
> +
> +  return ref_chain;
> +}
> +
> +static HOST_WIDE_INT
> +fold_offsets (rtx_insn* insn, rtx reg, bool analyze, bitmap foldable_insns);
> +
> +/* Helper function for fold_offsets.
> +
> +   If DO_RECURSION is false and ANALYZE is true this function returns true iff
> +   it understands the structure of INSN and knows how to propagate constants
> +   through it.  In this case OFFSET_OUT and FOLDABLE_INSNS are unused.
> +
> +   If DO_RECURSION is true then it also calls fold_offsets for each recognised
> +   part of INSN with the appropriate arguments.
> +
> +   If DO_RECURSION is true and ANALYZE is false then the offset that would
> +   result from folding is computed and returned through the pointer
> +   OFFSET_OUT.  The instructions that can be folded are recorded in
> +   FOLDABLE_INSNS.
> +*/
> +static bool fold_offsets_1 (rtx_insn* insn, bool analyze, bool do_recursion,
> +                           HOST_WIDE_INT *offset_out, bitmap foldable_insns)
> +{
> +  /* Doesn't make sense if both DO_RECURSION and ANALYZE are false.  */
> +  gcc_checking_assert (do_recursion || analyze);
> +  gcc_checking_assert (GET_CODE (PATTERN (insn)) == SET);
> +
> +  rtx src = SET_SRC (PATTERN (insn));
> +  HOST_WIDE_INT offset = 0;
> +
> +  switch (GET_CODE (src))
> +    {
> +    case PLUS:
> +      {
> +       /* Propagate through add.  */
> +       rtx arg1 = XEXP (src, 0);
> +       rtx arg2 = XEXP (src, 1);
> +
> +       if (REG_P (arg1))
> +         {
> +           if (do_recursion)
> +             offset += fold_offsets (insn, arg1, analyze, foldable_insns);
> +         }
> +       else if (GET_CODE (arg1) == ASHIFT
> +                && REG_P (XEXP (arg1, 0))
> +                && CONST_INT_P (XEXP (arg1, 1)))
> +         {
> +           /* Handle R1 = (R2 << C) + ...  */
> +           if (do_recursion)
> +             {
> +               HOST_WIDE_INT scale
> +                 = (HOST_WIDE_INT_1U << INTVAL (XEXP (arg1, 1)));
> +               offset += scale * fold_offsets (insn, XEXP (arg1, 0), analyze,
> +                                               foldable_insns);
> +             }
> +         }
> +       else if (GET_CODE (arg1) == PLUS
> +                && REG_P (XEXP (arg1, 0))
> +                && REG_P (XEXP (arg1, 1)))
> +         {
> +           /* Handle R1 = (R2 + R3) + ...  */
> +           if (do_recursion)
> +             {
> +               offset += fold_offsets (insn, XEXP (arg1, 0), analyze,
> +                                       foldable_insns);
> +               offset += fold_offsets (insn, XEXP (arg1, 1), analyze,
> +                                       foldable_insns);
> +             }
> +         }
> +       else if (GET_CODE (arg1) == PLUS
> +                && GET_CODE (XEXP (arg1, 0)) == ASHIFT
> +                && REG_P (XEXP (XEXP (arg1, 0), 0))
> +                && CONST_INT_P (XEXP (XEXP (arg1, 0), 1))
> +                && REG_P (XEXP (arg1, 1)))
> +         {
> +           /* Handle R1 = ((R2 << C) + R3) + ...  */
> +           if (do_recursion)
> +             {
> +               HOST_WIDE_INT scale
> +                 = (HOST_WIDE_INT_1U << INTVAL (XEXP (XEXP (arg1, 0), 1)));
> +               offset += scale * fold_offsets (insn, XEXP (XEXP (arg1, 0), 0),
> +                                               analyze, foldable_insns);
> +               offset += fold_offsets (insn, XEXP (arg1, 1), analyze,
> +                                       foldable_insns);
> +             }
> +         }
> +       else
> +         return false;
> +
> +       if (REG_P (arg2))
> +         {
> +           if (do_recursion)
> +             offset += fold_offsets (insn, arg2, analyze, foldable_insns);
> +         }
> +       else if (CONST_INT_P (arg2))
> +         {
> +           if (REG_P (arg1))
> +             {
> +               offset += INTVAL (arg2);
> +               /* This is a R1 = R2 + C instruction, candidate for folding.  */
> +               if (!analyze)
> +                 bitmap_set_bit (foldable_insns, INSN_UID (insn));
> +             }
> +         }
> +       else
> +         return false;
> +
> +       /* Pattern recognised for folding.  */
> +       break;
> +      }
> +    case MINUS:
> +      {
> +       /* Propagate through minus.  */
> +       rtx arg1 = XEXP (src, 0);
> +       rtx arg2 = XEXP (src, 1);
> +
> +       if (REG_P (arg1))
> +         {
> +           if (do_recursion)
> +             offset += fold_offsets (insn, arg1, analyze, foldable_insns);
> +         }
> +       else
> +         return false;
> +
> +       if (REG_P (arg2))
> +         {
> +           if (do_recursion)
> +             offset -= fold_offsets (insn, arg2, analyze, foldable_insns);
> +         }
> +       else if (CONST_INT_P (arg2))
> +         {
> +           if (REG_P (arg1))
> +             {
> +               offset -= INTVAL (arg2);
> +               /* This is a R1 = R2 - C instruction, candidate for folding.  */
> +               if (!analyze)
> +                 bitmap_set_bit (foldable_insns, INSN_UID (insn));
> +             }
> +         }
> +       else
> +         return false;
> +
> +       /* Pattern recognised for folding.  */
> +       break;
> +      }
> +    case NEG:
> +      {
> +       /* Propagate through negation.  */
> +       rtx arg1 = XEXP (src, 0);
> +       if (REG_P (arg1))
> +         {
> +           if (do_recursion)
> +             offset = -fold_offsets (insn, arg1, analyze, foldable_insns);
> +         }
> +       else
> +         return false;
> +
> +       /* Pattern recognised for folding.  */
> +       break;
> +      }
> +    case MULT:
> +      {
> +       /* Propagate through multiply by constant.  */
> +       rtx arg1 = XEXP (src, 0);
> +       rtx arg2 = XEXP (src, 1);
> +
> +       if (REG_P (arg1) && CONST_INT_P (arg2))
> +         {
> +           if (do_recursion)
> +             {
> +               HOST_WIDE_INT scale = INTVAL (arg2);
> +               offset = scale * fold_offsets (insn, arg1, analyze,
> +                                              foldable_insns);
> +             }
> +         }
> +       else
> +         return false;
> +
> +       /* Pattern recognised for folding.  */
> +       break;
> +      }
> +    case ASHIFT:
> +      {
> +       /* Propagate through shift left by constant.  */
> +       rtx arg1 = XEXP (src, 0);
> +       rtx arg2 = XEXP (src, 1);
> +
> +       if (REG_P (arg1) && CONST_INT_P (arg2))
> +         {
> +           if (do_recursion)
> +             {
> +               HOST_WIDE_INT scale = (HOST_WIDE_INT_1U << INTVAL (arg2));
> +               offset = scale * fold_offsets (insn, arg1, analyze,
> +                                              foldable_insns);
> +             }
> +         }
> +       else
> +         return false;
> +
> +       /* Pattern recognised for folding.  */
> +       break;
> +      }
> +    case REG:
> +      {
> +       /* Propagate through register move.  */
> +       if (do_recursion)
> +         offset = fold_offsets (insn, src, analyze, foldable_insns);
> +
> +       /* Pattern recognised for folding.  */
> +       break;
> +      }
> +    case CONST_INT:
> +      {
> +       offset = INTVAL (src);
> +       /* R1 = C is candidate for folding.  */
> +       if (!analyze)
> +         bitmap_set_bit (foldable_insns, INSN_UID (insn));
> +
> +       /* Pattern recognised for folding.  */
> +       break;
> +      }
> +    default:
> +      /* Cannot recognise.  */
> +      return false;
> +    }
> +
> +  if (do_recursion && !analyze)
> +    *offset_out = offset;
> +
> +  return true;
> +}
> +
> +/* Function that computes the offset that would have to be added to all uses
> +   of REG if the instructions marked in FOLDABLE_INSNS were to be eliminated.
> +
> +   If ANALYZE is true then mark in CAN_FOLD_INSNS which instructions
> +   transitively only affect other instructions found in CAN_FOLD_INSNS.
> +   If ANALYZE is false then compute the offset required for folding.  */
> +static HOST_WIDE_INT
> +fold_offsets (rtx_insn* insn, rtx reg, bool analyze, bitmap foldable_insns)
> +{
> +  rtx_insn* def = get_single_def_in_bb (insn, reg);
> +
> +  if (!def || GET_CODE (PATTERN (def)) != SET)
> +    return 0;
> +
> +  rtx dest = SET_DEST (PATTERN (def));
> +
> +  if (!REG_P (dest))
> +    return 0;
> +
> +  /* We can only affect the values of GPR registers.  */
> +  unsigned int dest_regno = REGNO (dest);
> +  if (fixed_regs[dest_regno]
> +      || !TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], dest_regno))
> +    return 0;
> +
> +  if (analyze)
> +    {
> +      /* Check if we know how to handle DEF.  */
> +      if (!fold_offsets_1 (def, true, false, NULL, NULL))
> +       return 0;
> +
> +      /* We only fold through instructions that are transitively used as
> +        memory addresses and do not have other uses.  Use the same logic
> +        from offset calculation to visit instructions that can propagate
> +        offsets and keep track of them in CAN_FOLD_INSNS.  */
> +      bool success;
> +      struct df_link *uses = get_uses (def, dest, &success), *ref_link;
> +
> +      if (!success)
> +       return 0;
> +
> +      for (ref_link = uses; ref_link; ref_link = ref_link->next)
> +       {
> +         rtx_insn* use = DF_REF_INSN (ref_link->ref);
> +
> +         if (DEBUG_INSN_P (use))
> +           continue;
> +
> +         /* Punt if the use is anything more complicated than a set
> +            (clobber, use, etc).  */
> +         if (!NONJUMP_INSN_P (use) || GET_CODE (PATTERN (use)) != SET)
> +           return 0;
> +
> +         /* This use affects instructions outside of CAN_FOLD_INSNS.  */
> +         if (!bitmap_bit_p (&can_fold_insns, INSN_UID (use)))
> +           return 0;
> +
> +         rtx use_set = PATTERN (use);
> +
> +         /* Special case: A foldable memory store is not foldable if it
> +            mentions DEST outside of the address calculation.  */
> +         if (use_set && MEM_P (SET_DEST (use_set))
> +             && reg_mentioned_p (dest, SET_SRC (use_set)))
> +           return 0;
> +       }
> +
> +      bitmap_set_bit (&can_fold_insns, INSN_UID (def));
> +
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       {
> +         fprintf (dump_file, "Instruction marked for propagation: ");
> +         print_rtl_single (dump_file, def);
> +       }
> +    }
> +  else
> +    {
> +      /* We cannot propagate through this instruction.  */
> +      if (!bitmap_bit_p (&can_fold_insns, INSN_UID (def)))
> +       return 0;
> +    }
> +
> +  HOST_WIDE_INT offset = 0;
> +  bool recognised = fold_offsets_1 (def, analyze, true, &offset,
> +                                   foldable_insns);
> +
> +  if (!recognised)
> +    return 0;
> +
> +  return offset;
> +}
> +
> +/* Test if INSN is a memory load / store that can have an offset folded to it.
> +   Return true iff INSN is such an instruction and return through MEM_OUT,
> +   REG_OUT and OFFSET_OUT the RTX that has a MEM code, the register that is
> +   used as a base address and the offset accordingly.
> +   All of the out pointers may be NULL in which case they will be ignored.  */
> +bool
> +get_fold_mem_root (rtx_insn* insn, rtx *mem_out, rtx *reg_out,
> +                  HOST_WIDE_INT *offset_out)
> +{
> +  rtx set = single_set (insn);
> +  rtx mem = NULL_RTX;
> +
> +  if (set != NULL_RTX)
> +    {
> +      rtx src = SET_SRC (set);
> +      rtx dest = SET_DEST (set);
> +
> +      /* Don't fold when we have unspec / volatile.  */
> +      if (GET_CODE (src) == UNSPEC
> +         || GET_CODE (src) == UNSPEC_VOLATILE
> +         || GET_CODE (dest) == UNSPEC
> +         || GET_CODE (dest) == UNSPEC_VOLATILE)
> +       return false;
> +
> +      if (MEM_P (src))
> +       mem = src;
> +      else if (MEM_P (dest))
> +       mem = dest;
> +      else if ((GET_CODE (src) == SIGN_EXTEND
> +               || GET_CODE (src) == ZERO_EXTEND)
> +              && MEM_P (XEXP (src, 0)))
> +       mem = XEXP (src, 0);
> +    }
> +
> +  if (mem == NULL_RTX)
> +    return false;
> +
> +  rtx mem_addr = XEXP (mem, 0);
> +  rtx reg;
> +  HOST_WIDE_INT offset;
> +
> +  if (REG_P (mem_addr))
> +    {
> +      reg = mem_addr;
> +      offset = 0;
> +    }
> +  else if (GET_CODE (mem_addr) == PLUS
> +          && REG_P (XEXP (mem_addr, 0))
> +          && CONST_INT_P (XEXP (mem_addr, 1)))
> +    {
> +      reg = XEXP (mem_addr, 0);
> +      offset = INTVAL (XEXP (mem_addr, 1));
> +    }
> +  else
> +    return false;
> +
> +  if (mem_out)
> +    *mem_out = mem;
> +  if (reg_out)
> +    *reg_out = reg;
> +  if (offset_out)
> +    *offset_out = offset;
> +
> +  return true;
> +}
> +
> +/* If INSN is a root memory instruction then do a DFS traversal on its
> +   definitions and find folding candidates.  */
> +static void
> +do_analysis (rtx_insn* insn)
> +{
> +  rtx reg;
> +  if (!get_fold_mem_root (insn, NULL, &reg, NULL))
> +    return;
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fprintf (dump_file, "Starting analysis from root: ");
> +      print_rtl_single (dump_file, insn);
> +    }
> +
> +  /* Analyse folding opportunities for this memory instruction.  */
> +  bitmap_set_bit (&can_fold_insns, INSN_UID (insn));
> +  fold_offsets (insn, reg, true, NULL);
> +}
> +
> +static void
> +do_fold_info_calculation (rtx_insn* insn, fold_info_map *fold_info)
> +{
> +  rtx mem, reg;
> +  HOST_WIDE_INT cur_offset;
> +  if (!get_fold_mem_root (insn, &mem, &reg, &cur_offset))
> +    return;
> +
> +  fold_mem_info *info = new fold_mem_info;
> +  info->added_offset = fold_offsets (insn, reg, false, info->fold_insns);
> +
> +  fold_info->put (insn, info);
> +}
> +
> +/* If INSN is a root memory instruction then compute a potentially new offset
> +   for it and test if the resulting instruction is valid.  */
> +static void
> +do_check_validity (rtx_insn* insn, fold_mem_info *info)
> +{
> +  rtx mem, reg;
> +  HOST_WIDE_INT cur_offset;
> +  if (!get_fold_mem_root (insn, &mem, &reg, &cur_offset))
> +    return;
> +
> +  HOST_WIDE_INT new_offset = cur_offset + info->added_offset;
> +
> +  /* Test if it is valid to change MEM's address offset to NEW_OFFSET.  */
> +  int icode = INSN_CODE (insn);
> +  rtx mem_addr = XEXP (mem, 0);
> +  machine_mode mode = GET_MODE (mem_addr);
> +  XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
> +
> +  bool illegal = insn_invalid_p (insn, false)
> +                || !memory_address_addr_space_p (mode, XEXP (mem, 0),
> +                                                 MEM_ADDR_SPACE (mem));
> +
> +  /* Restore the instruction.  */
> +  XEXP (mem, 0) = mem_addr;
> +  INSN_CODE (insn) = icode;
> +
> +  if (illegal)
> +    bitmap_ior_into (&cannot_fold_insns, info->fold_insns);
> +  else
> +    bitmap_ior_into (&candidate_fold_insns, info->fold_insns);
> +}
> +
> +static bool
> +compute_validity_closure (fold_info_map *fold_info)
> +{
> +  /* Let's say we have an arbitrary chain of foldable instructions xN = xN + C
> +     and memory operations rN that use xN as shown below.  If folding x1 in r1
> +     turns out to be invalid for whatever reason then it's also invalid to fold
> +     any of the other xN into any rN.  That means that we need the transitive
> +     closure of validity to determine whether we can fold a xN instruction.
> +
> +     +--------------+    +-------------------+    +-------------------+
> +     | r1 = mem[x1] |    | r2 = mem[x1 + x2] |    | r3 = mem[x2 + x3] | ...
> +     +--------------+    +-------------------+    +-------------------+
> +            ^               ^        ^               ^        ^
> +            |              /         |              /         |        ...
> +            |             /          |             /          |
> +     +-------------+     /    +-------------+     /    +-------------+
> +     | x1 = x1 + 1 |-----+    | x2 = x2 + 1 |-----+    | x3 = x3 + 1 |--- ...
> +     +-------------+          +-------------+          +-------------+
> +            ^                        ^                        ^
> +            |                        |                        |
> +           ...                      ...                      ...
> +  */
> +
> +  int max_iters = 5;
> +  for (int i = 0; i < max_iters; i++)
> +    {
> +      bool made_changes = false;
> +      for (fold_info_map::iterator iter = fold_info->begin ();
> +          iter != fold_info->end (); ++iter)
> +       {
> +         fold_mem_info *info = (*iter).second;
> +         if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
> +           made_changes |= bitmap_ior_into (&cannot_fold_insns,
> +                                            info->fold_insns);
> +       }
> +
> +      if (!made_changes)
> +       return true;
> +    }
> +
> +  return false;
> +}
> +
> +/* If INSN is a root memory instruction that was affected by any folding
> +   then update its offset as necessary.  */
> +static void
> +do_commit_offset (rtx_insn* insn, fold_mem_info *info)
> +{
> +  rtx mem, reg;
> +  HOST_WIDE_INT cur_offset;
> +  if (!get_fold_mem_root (insn, &mem, &reg, &cur_offset))
> +    return;
> +
> +  HOST_WIDE_INT new_offset = cur_offset + info->added_offset;
> +
> +  if (new_offset == cur_offset)
> +    return;
> +
> +  gcc_assert (!bitmap_empty_p (info->fold_insns));
> +
> +  if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
> +    return;
> +
> +  if (dump_file)
> +    {
> +      fprintf (dump_file, "Memory offset changed from "
> +              HOST_WIDE_INT_PRINT_DEC " to " HOST_WIDE_INT_PRINT_DEC
> +              " for instruction:\n", cur_offset, new_offset);
> +      print_rtl_single (dump_file, insn);
> +    }
> +
> +  machine_mode mode = GET_MODE (XEXP (mem, 0));
> +  XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
> +  INSN_CODE (insn) = recog (PATTERN (insn), insn, 0);
> +  df_insn_rescan (insn);
> +}
> +
> +/* If INSN is a move / add instruction that was folded then replace its
> +   constant part with zero.  */
> +static void
> +do_commit_insn (rtx_insn* insn)
> +{
> +  if (bitmap_bit_p (&candidate_fold_insns, INSN_UID (insn))
> +      && !bitmap_bit_p (&cannot_fold_insns, INSN_UID (insn)))
> +    {
> +      if (dump_file)
> +       {
> +         fprintf (dump_file, "Instruction folded:");
> +         print_rtl_single (dump_file, insn);
> +       }
> +
> +      stats_fold_count++;
> +
> +      rtx set = single_set (insn);
> +      rtx dest = SET_DEST (set);
> +      rtx src = SET_SRC (set);
> +
> +      /* Emit a move and let subsequent passes eliminate it if possible.  */
> +      if (GET_CODE (src) == CONST_INT)
> +       {
> +         /* INSN is R1 = C.
> +            Replace it with R1 = 0 because C was folded.  */
> +         rtx mov_rtx
> +           = gen_move_insn (dest, gen_int_mode (0, GET_MODE (dest)));
> +         df_insn_rescan (emit_insn_after (mov_rtx, insn));
> +       }
> +      else
> +       {
> +         /* INSN is R1 = R2 + C.
> +            Replace it with R1 = R2 because C was folded.  */
> +         rtx arg1 = XEXP (src, 0);
> +
> +         /* If the DEST == ARG1 then the move is a no-op.  */
> +         if (REGNO (dest) != REGNO (arg1))
> +           {
> +             gcc_checking_assert (GET_MODE (dest) == GET_MODE (arg1));
> +             rtx mov_rtx = gen_move_insn (dest, arg1);
> +             df_insn_rescan (emit_insn_after (mov_rtx, insn));
> +           }
> +       }
> +
> +      /* Delete the original move / add instruction.  */
> +      delete_insn (insn);
> +    }
> +}
> +
> +unsigned int
> +pass_fold_mem_offsets::execute (function *fn)
> +{
> +  df_set_flags (DF_EQ_NOTES + DF_RD_PRUNE_DEAD_DEFS + DF_DEFER_INSN_RESCAN);
> +  df_chain_add_problem (DF_UD_CHAIN + DF_DU_CHAIN);
> +  df_analyze ();
> +
> +  bitmap_initialize (&can_fold_insns, NULL);
> +  bitmap_initialize (&candidate_fold_insns, NULL);
> +  bitmap_initialize (&cannot_fold_insns, NULL);
> +
> +  stats_fold_count = 0;
> +
> +  basic_block bb;
> +  rtx_insn *insn;
> +  FOR_ALL_BB_FN (bb, fn)
> +    {
> +      /* There is a conflict between this pass and RISCV's shorten-memrefs
> +        pass.  For now disable folding if optimizing for size because
> +        otherwise this cancels the effects of shorten-memrefs.  */
> +      if (optimize_bb_for_size_p (bb))
> +       continue;
> +
> +      fold_info_map fold_info;
> +
> +      bitmap_clear (&can_fold_insns);
> +      bitmap_clear (&candidate_fold_insns);
> +      bitmap_clear (&cannot_fold_insns);
> +
> +      FOR_BB_INSNS (bb, insn)
> +       do_analysis (insn);
> +
> +      FOR_BB_INSNS (bb, insn)
> +       do_fold_info_calculation (insn, &fold_info);
> +
> +      FOR_BB_INSNS (bb, insn)
> +       if (fold_mem_info **info = fold_info.get (insn))
> +         do_check_validity (insn, *info);
> +
> +      if (compute_validity_closure (&fold_info))
> +       {
> +         FOR_BB_INSNS (bb, insn)
> +           if (fold_mem_info **info = fold_info.get (insn))
> +             do_commit_offset (insn, *info);
> +
> +         FOR_BB_INSNS (bb, insn)
> +           do_commit_insn (insn);
> +       }
> +
> +      for (fold_info_map::iterator iter = fold_info.begin ();
> +          iter != fold_info.end (); ++iter)
> +       delete (*iter).second;
> +    }
> +
> +  statistics_counter_event (cfun, "Number of folded instructions",
> +                           stats_fold_count);
> +
> +  bitmap_release (&can_fold_insns);
> +  bitmap_release (&candidate_fold_insns);
> +  bitmap_release (&cannot_fold_insns);
> +
> +  return 0;
> +}
> +
> +} // anon namespace
> +
> +rtl_opt_pass *
> +make_pass_fold_mem_offsets (gcc::context *ctxt)
> +{
> +  return new pass_fold_mem_offsets (ctxt);
> +}
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 4110a472914..93e15cba65b 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -519,6 +519,7 @@ along with GCC; see the file COPYING3.  If not see
>        NEXT_PASS (pass_peephole2);
>        NEXT_PASS (pass_if_after_reload);
>        NEXT_PASS (pass_regrename);
> +      NEXT_PASS (pass_fold_mem_offsets);
>        NEXT_PASS (pass_cprop_hardreg);
>        NEXT_PASS (pass_fast_rtl_dce);
>        NEXT_PASS (pass_reorder_blocks);
> diff --git a/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
> new file mode 100644
> index 00000000000..8b31cd9e22e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ffold-mem-offsets" } */
> +
> +void sink(int arr[2]);
> +
> +void
> +foo(int a, int b, int i)
> +{
> +  int arr[2] = {a, b};
> +  arr[i]++;
> +  sink(arr);
> +}
> +
> +// Should compile without negative memory offsets when using -ffold-mem-offsets
> +/* { dg-final { scan-assembler-not "lw\t.*,-.*\\(.*\\)" } } */
> +/* { dg-final { scan-assembler-not "sw\t.*,-.*\\(.*\\)" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
> new file mode 100644
> index 00000000000..2e1348efdf1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ffold-mem-offsets" } */
> +
> +void sink(int arr[3]);
> +
> +void
> +foo(int a, int b, int c, int i)
> +{
> +  int arr1[3] = {a, b, c};
> +  int arr2[3] = {a, c, b};
> +  int arr3[3] = {c, b, a};
> +
> +  arr1[i]++;
> +  arr2[i]++;
> +  arr3[i]++;
> +
> +  sink(arr1);
> +  sink(arr2);
> +  sink(arr3);
> +}
> +
> +// Should compile without negative memory offsets when using -ffold-mem-offsets
> +/* { dg-final { scan-assembler-not "lw\t.*,-.*\\(.*\\)" } } */
> +/* { dg-final { scan-assembler-not "sw\t.*,-.*\\(.*\\)" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
> new file mode 100644
> index 00000000000..9c50c51ac1c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ffold-mem-offsets" } */
> +
> +void load(int arr[2]);
> +
> +int
> +foo(long unsigned int i)
> +{
> +  int arr[2];
> +  load(arr);
> +
> +  return arr[3 * i + 77];
> +}
> +
> +// Should compile without negative memory offsets when using -ffold-mem-offsets
> +/* { dg-final { scan-assembler-not "lw\t.*,-.*\\(.*\\)" } } */
> +/* { dg-final { scan-assembler-not "addi\t.*,.*,77" } } */
> \ No newline at end of file
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index eba2d54ac76..34a38dbd148 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -621,6 +621,7 @@ extern rtl_opt_pass *make_pass_sched_fusion (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_peephole2 (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_if_after_reload (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_regrename (gcc::context *ctxt);
> +extern rtl_opt_pass *make_pass_fold_mem_offsets (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_cprop_hardreg (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_reorder_blocks (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_leaf_regs (gcc::context *ctxt);
> --
> 2.34.1
>