From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id F28F73A1982F for ; Fri, 13 Nov 2020 08:25:01 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org F28F73A1982F
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id ADDCF142F for ; Fri, 13 Nov 2020 00:25:01 -0800 (PST)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 01C543F718 for ; Fri, 13 Nov 2020 00:25:00 -0800 (PST)
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com
Subject: [PATCH 23/23] fwprop: Rewrite to use RTL SSA
References:
Date: Fri, 13 Nov 2020 08:24:59 +0000
In-Reply-To: (Richard Sandiford's message of "Fri, 13 Nov 2020 08:10:54 +0000")
Message-ID:
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-6.6 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 13 Nov 2020 08:25:08 -0000

This patch rewrites fwprop.c to use the RTL SSA framework.  It tries
as far as possible to mimic the old behaviour, even in cases where
that doesn't fit naturally with the new framework.  I've added ???
comments to mark those places, but I think “fixing” them should be
done separately to make bisection easier.
In particular:

* The old implementation iterated over uses, and after a successful
  substitution, the new insn's uses were added to the end of the list.
  The pass still processed those uses, but because it processed them at
  the end, it didn't fully optimise one instruction before propagating
  it into the next.

  The new version follows the same approach for comparison purposes,
  but I'd like to drop that as a follow-on patch.

* The old implementation operated on single use sites (DF_REF_LOCs).
  This doesn't work well for instructions with match_dups, where it's
  necessary to update both an operand and its dups at the same time.
  For example, attempting to substitute into a divmod instruction would
  fail because only the div or the mod side would be updated.

  The new version again follows this to some extent for comparison
  purposes (although not exactly).  Again I'd like to drop it as a
  follow-on patch.

  One difference is that if a register occurs in multiple MEM addresses
  in a set, the new version will try to update them all at once.  This
  is what causes the SVE ACLE st4* output to improve.

Also, the old version didn't naturally guarantee termination (PR79405),
whereas the new one does.

gcc/
	* fwprop.c: Rewrite to use the RTL SSA framework.

gcc/testsuite/
	* gcc.dg/rtl/x86_64/test-return-const.c.before-fwprop.c: Don't
	expect insn updates to be deferred.
	* gcc.target/aarch64/sve/acle/asm/st4_s8.c: Expect the addition
	to be folded into the address.
	* gcc.target/aarch64/sve/acle/asm/st4_u8.c: Likewise.
---
 gcc/fwprop.c                                  | 1698 ++++++-----------
 .../test-return-const.c.before-fwprop.c       |    2 +-
 .../gcc.target/aarch64/sve/acle/asm/st4_s8.c  |    8 +-
 .../gcc.target/aarch64/sve/acle/asm/st4_u8.c  |    8 +-
 4 files changed, 561 insertions(+), 1155 deletions(-)

*** /tmp/9upGS6_fwprop.c	2020-11-13 08:23:52.837409271 +0000
--- gcc/fwprop.c	2020-11-13 08:05:06.490403698 +0000
***************
*** 18,49 ****
  along with GCC; see the file COPYING3.  If not see .
  */
  
  #include "config.h"
  #include "system.h"
  #include "coretypes.h"
  #include "backend.h"
- #include "target.h"
  #include "rtl.h"
- #include "predict.h"
  #include "df.h"
! #include "memmodel.h"
! #include "tm_p.h"
! #include "insn-config.h"
! #include "emit-rtl.h"
! #include "recog.h"
  
  #include "sparseset.h"
  #include "cfgrtl.h"
  #include "cfgcleanup.h"
  #include "cfgloop.h"
  #include "tree-pass.h"
- #include "domwalk.h"
  #include "rtl-iter.h"
! 
  
  /* This pass does simple forward propagation and simplification when an
     operand of an insn can only come from a single def.  This pass uses
!    df.c, so it is global.  However, we only do limited analysis of
     available expressions.
  
     1) The pass tries to propagate the source of the def into the use,
--- 18,47 ----
  along with GCC; see the file COPYING3.  If not see .  */
  
+ #define ADD_NOTES 0
+ 
+ #define INCLUDE_ALGORITHM
+ #define INCLUDE_FUNCTIONAL
  #include "config.h"
  #include "system.h"
  #include "coretypes.h"
  #include "backend.h"
  #include "rtl.h"
  #include "df.h"
! #include "rtl-ssa.h"
  
  #include "sparseset.h"
+ #include "predict.h"
  #include "cfgrtl.h"
  #include "cfgcleanup.h"
  #include "cfgloop.h"
  #include "tree-pass.h"
  #include "rtl-iter.h"
! #include "target.h"
  
  /* This pass does simple forward propagation and simplification when an
     operand of an insn can only come from a single def.  This pass uses
!    RTL SSA, so it is global.  However, we only do limited analysis of
     available expressions.
  
     1) The pass tries to propagate the source of the def into the use,
***************
*** 60,68 ****
       (set (subreg:SI (reg:DI 120) 0) (const_int 0))
       (set (subreg:SI (reg:DI 120) 4) (const_int -1))
       (set (subreg:SI (reg:DI 122) 0)
!         (ior:SI (subreg:SI (reg:DI 119) 0) (subreg:SI (reg:DI 120) 0)))
       (set (subreg:SI (reg:DI 122) 4)
!         (ior:SI (subreg:SI (reg:DI 119) 4) (subreg:SI (reg:DI 120) 4)))
  
     can be simplified to the much simpler
  
--- 58,66 ----
       (set (subreg:SI (reg:DI 120) 0) (const_int 0))
       (set (subreg:SI (reg:DI 120) 4) (const_int -1))
       (set (subreg:SI (reg:DI 122) 0)
!         (ior:SI (subreg:SI (reg:DI 119) 0) (subreg:SI (reg:DI 120) 0)))
       (set (subreg:SI (reg:DI 122) 4)
!         (ior:SI (subreg:SI (reg:DI 119) 4) (subreg:SI (reg:DI 120) 4)))
  
     can be simplified to the much simpler
  
***************
*** 89,95 ****
       (set (reg:QI 120) (subreg:QI (reg:SI 118) 0))
       (set (reg:QI 121) (subreg:QI (reg:SI 119) 0))
       (set (reg:SI 122) (plus:SI (subreg:SI (reg:QI 120) 0)
!                                 (subreg:SI (reg:QI 121) 0)))
  
     are very common on machines that can only do word-sized operations.
     For each use of a paradoxical subreg (subreg:WIDER (reg:NARROW N) 0),
--- 87,93 ----
       (set (reg:QI 120) (subreg:QI (reg:SI 118) 0))
       (set (reg:QI 121) (subreg:QI (reg:SI 119) 0))
       (set (reg:SI 122) (plus:SI (subreg:SI (reg:QI 120) 0)
!                                 (subreg:SI (reg:QI 121) 0)))
  
     are very common on machines that can only do word-sized operations.
     For each use of a paradoxical subreg (subreg:WIDER (reg:NARROW N) 0),
***************
*** 101,318 ****
       (set (reg:QI 121) (subreg:QI (reg:SI 119) 0))
       (set (reg:SI 122) (plus:SI (reg:SI 118) (reg:SI 119)))
  
!    where the first two insns are now dead.
! 
!    We used to use reaching definitions to find which uses have a
!    single reaching definition (sounds obvious...), but this is too
!    complex a problem in nasty testcases like PR33928.  Now we use the
!    multiple definitions problem in df-problems.c.  The similarity
!    between that problem and SSA form creation is taken further, in
!    that fwprop does a dominator walk to create its chains; however,
!    instead of creating a PHI function where multiple definitions meet
!    I just punt and record only singleton use-def chains, which is
!    all that is needed by fwprop.
*/ =20=20 =20=20 static int num_changes; =20=20 - static vec use_def_ref; - static vec reg_defs; - static vec reg_defs_stack; -=20 - /* The maximum number of propagations that are still allowed. If we do - more propagations than originally we had uses, we must have ended up - in a propagation loop, as in PR79405. Until the algorithm fwprop - uses can obviously not get into such loops we need a workaround like - this. */ - static int propagations_left; -=20 - /* The MD bitmaps are trimmed to include only live registers to cut - memory usage on testcases like insn-recog.c. Track live registers - in the basic block and do not perform forward propagation if the - destination is a dead pseudo occurring in a note. */ - static bitmap local_md; - static bitmap local_lr; -=20 - /* Return the only def in USE's use-def chain, or NULL if there is - more than one def in the chain. */ -=20 - static inline df_ref - get_def_for_use (df_ref use) - { - return use_def_ref[DF_REF_ID (use)]; - } -=20 -=20 - /* Update the reg_defs vector with non-partial definitions in DEF_REC. - TOP_FLAG says which artificials uses should be used, when DEF_REC - is an artificial def vector. LOCAL_MD is modified as after a - df_md_simulate_* function; we do more or less the same processing - done there, so we do not use those functions. */ -=20 - #define DF_MD_GEN_FLAGS \ - (DF_REF_PARTIAL | DF_REF_CONDITIONAL | DF_REF_MAY_CLOBBER) -=20 - static void - process_defs (df_ref def, int top_flag) - { - for (; def; def =3D DF_REF_NEXT_LOC (def)) - { - df_ref curr_def =3D reg_defs[DF_REF_REGNO (def)]; - unsigned int dregno; -=20 - if ((DF_REF_FLAGS (def) & DF_REF_AT_TOP) !=3D top_flag) - continue; -=20 - dregno =3D DF_REF_REGNO (def); - if (curr_def) - reg_defs_stack.safe_push (curr_def); - else - { - /* Do not store anything if "transitioning" from NULL to NULL. But - otherwise, push a special entry on the stack to tell the - leave_block callback that the entry in reg_defs was NULL. 
*/ - if (DF_REF_FLAGS (def) & DF_MD_GEN_FLAGS) - ; - else - reg_defs_stack.safe_push (def); - } -=20 - if (DF_REF_FLAGS (def) & DF_MD_GEN_FLAGS) - { - bitmap_set_bit (local_md, dregno); - reg_defs[dregno] =3D NULL; - } - else - { - bitmap_clear_bit (local_md, dregno); - reg_defs[dregno] =3D def; - } - } - } -=20 -=20 - /* Fill the use_def_ref vector with values for the uses in USE_REC, - taking reaching definitions info from LOCAL_MD and REG_DEFS. - TOP_FLAG says which artificials uses should be used, when USE_REC - is an artificial use vector. */ -=20 - static void - process_uses (df_ref use, int top_flag) - { - for (; use; use =3D DF_REF_NEXT_LOC (use)) - if ((DF_REF_FLAGS (use) & DF_REF_AT_TOP) =3D=3D top_flag) - { - unsigned int uregno =3D DF_REF_REGNO (use); - if (reg_defs[uregno] - && !bitmap_bit_p (local_md, uregno) - && bitmap_bit_p (local_lr, uregno)) - use_def_ref[DF_REF_ID (use)] =3D reg_defs[uregno]; - } - } -=20 - class single_def_use_dom_walker : public dom_walker - { - public: - single_def_use_dom_walker (cdi_direction direction) - : dom_walker (direction) {} - virtual edge before_dom_children (basic_block); - virtual void after_dom_children (basic_block); - }; -=20 - edge - single_def_use_dom_walker::before_dom_children (basic_block bb) - { - int bb_index =3D bb->index; - class df_md_bb_info *md_bb_info =3D df_md_get_bb_info (bb_index); - class df_lr_bb_info *lr_bb_info =3D df_lr_get_bb_info (bb_index); - rtx_insn *insn; -=20 - bitmap_copy (local_md, &md_bb_info->in); - bitmap_copy (local_lr, &lr_bb_info->in); -=20 - /* Push a marker for the leave_block callback. */ - reg_defs_stack.safe_push (NULL); -=20 - process_uses (df_get_artificial_uses (bb_index), DF_REF_AT_TOP); - process_defs (df_get_artificial_defs (bb_index), DF_REF_AT_TOP); -=20 - /* We don't call df_simulate_initialize_forwards, as it may overestimate - the live registers if there are unused artificial defs. We prefer - liveness to be underestimated. 
*/ -=20 - FOR_BB_INSNS (bb, insn) - if (INSN_P (insn)) - { - unsigned int uid =3D INSN_UID (insn); - process_uses (DF_INSN_UID_USES (uid), 0); - process_uses (DF_INSN_UID_EQ_USES (uid), 0); - process_defs (DF_INSN_UID_DEFS (uid), 0); - df_simulate_one_insn_forwards (bb, insn, local_lr); - } -=20 - process_uses (df_get_artificial_uses (bb_index), 0); - process_defs (df_get_artificial_defs (bb_index), 0); -=20 - return NULL; - } -=20 - /* Pop the definitions created in this basic block when leaving its - dominated parts. */ -=20 - void - single_def_use_dom_walker::after_dom_children (basic_block bb ATTRIBUTE_U= NUSED) - { - df_ref saved_def; - while ((saved_def =3D reg_defs_stack.pop ()) !=3D NULL) - { - unsigned int dregno =3D DF_REF_REGNO (saved_def); -=20 - /* See also process_defs. */ - if (saved_def =3D=3D reg_defs[dregno]) - reg_defs[dregno] =3D NULL; - else - reg_defs[dregno] =3D saved_def; - } - } -=20 -=20 - /* Build a vector holding the reaching definitions of uses reached by a - single dominating definition. */ -=20 - static void - build_single_def_use_links (void) - { - /* We use the multiple definitions problem to compute our restricted - use-def chains. */ - df_set_flags (DF_EQ_NOTES); - df_md_add_problem (); - df_note_add_problem (); - df_analyze (); - df_maybe_reorganize_use_refs (DF_REF_ORDER_BY_INSN_WITH_NOTES); -=20 - use_def_ref.create (DF_USES_TABLE_SIZE ()); - use_def_ref.safe_grow_cleared (DF_USES_TABLE_SIZE (), true); -=20 - reg_defs.create (max_reg_num ()); - reg_defs.safe_grow_cleared (max_reg_num (), true); -=20 - reg_defs_stack.create (n_basic_blocks_for_fn (cfun) * 10); - local_md =3D BITMAP_ALLOC (NULL); - local_lr =3D BITMAP_ALLOC (NULL); -=20 - /* Walk the dominator tree looking for single reaching definitions - dominating the uses. This is similar to how SSA form is built. 
*/ - single_def_use_dom_walker (CDI_DOMINATORS) - .walk (cfun->cfg->x_entry_block_ptr); -=20 - BITMAP_FREE (local_lr); - BITMAP_FREE (local_md); - reg_defs.release (); - reg_defs_stack.release (); - } -=20 - /* Do not try to replace constant addresses or addresses of local and argument slots. These MEM expressions are made only once and inserted in many instructions, as well as being used to control symbol table --- 99,110 ---- (set (reg:QI 121) (subreg:QI (reg:SI 119) 0)) (set (reg:SI 122) (plus:SI (reg:SI 118) (reg:SI 119))) =20=20 ! where the first two insns are now dead. */ =20=20 + using namespace rtl_ssa; =20=20 static int num_changes; =20=20 /* Do not try to replace constant addresses or addresses of local and argument slots. These MEM expressions are made only once and inserted in many instructions, as well as being used to control symbol table *************** *** 342,1114 **** && REGNO (reg) !=3D ARG_POINTER_REGNUM)); } =20=20 ! /* Returns a canonical version of X for the address, from the point of vi= ew, ! that all multiplications are represented as MULT instead of the multip= ly ! by a power of 2 being represented as ASHIFT. !=20 ! Every ASHIFT we find has been made by simplify_gen_binary and was not ! there before, so it is not shared. So we can do this in place. */ !=20 ! static void ! canonicalize_address (rtx x) ! { ! for (;;) ! switch (GET_CODE (x)) ! { ! case ASHIFT: ! if (CONST_INT_P (XEXP (x, 1)) ! && INTVAL (XEXP (x, 1)) < GET_MODE_UNIT_BITSIZE (GET_MODE (x)) ! && INTVAL (XEXP (x, 1)) >=3D 0) ! { ! HOST_WIDE_INT shift =3D INTVAL (XEXP (x, 1)); ! PUT_CODE (x, MULT); ! XEXP (x, 1) =3D gen_int_mode (HOST_WIDE_INT_1 << shift, ! GET_MODE (x)); ! } !=20 ! x =3D XEXP (x, 0); ! break; !=20 ! case PLUS: ! if (GET_CODE (XEXP (x, 0)) =3D=3D PLUS ! || GET_CODE (XEXP (x, 0)) =3D=3D ASHIFT ! || GET_CODE (XEXP (x, 0)) =3D=3D CONST) ! canonicalize_address (XEXP (x, 0)); !=20 ! x =3D XEXP (x, 1); ! break; !=20 ! case CONST: ! x =3D XEXP (x, 0); ! break; !=20 ! 
default: ! return; ! } ! } !=20 ! /* OLD is a memory address. Return whether it is good to use NEW instead, ! for a memory access in the given MODE. */ =20=20 static bool ! should_replace_address (rtx old_rtx, rtx new_rtx, machine_mode mode, ! addr_space_t as, bool speed) { int gain; =20=20 - if (rtx_equal_p (old_rtx, new_rtx) - || !memory_address_addr_space_p (mode, new_rtx, as)) - return false; -=20 - /* Copy propagation is always ok. */ - if (REG_P (old_rtx) && REG_P (new_rtx)) - return true; -=20 /* Prefer the new address if it is less expensive. */ ! gain =3D (address_cost (old_rtx, mode, as, speed) ! - address_cost (new_rtx, mode, as, speed)); =20=20 /* If the addresses have equivalent cost, prefer the new address if it has the highest `set_src_cost'. That has the potential of eliminating the most insns without additional costs, and it is the same that cse.c used to do. */ if (gain =3D=3D 0) ! gain =3D (set_src_cost (new_rtx, VOIDmode, speed) ! - set_src_cost (old_rtx, VOIDmode, speed)); =20=20 return (gain > 0); } =20=20 =20=20 ! /* Flags for the last parameter of propagate_rtx_1. */ !=20 ! enum { ! /* If PR_CAN_APPEAR is true, propagate_rtx_1 always returns true; ! if it is false, propagate_rtx_1 returns false if, for at least ! one occurrence OLD, it failed to collapse the result to a constant. ! For example, (mult:M (reg:M A) (minus:M (reg:M B) (reg:M A))) may ! collapse to zero if replacing (reg:M B) with (reg:M A). !=20 ! PR_CAN_APPEAR is disregarded inside MEMs: in that case, ! propagate_rtx_1 just tries to make cheaper and valid memory ! addresses. */ ! PR_CAN_APPEAR =3D 1, !=20 ! /* If PR_HANDLE_MEM is not set, propagate_rtx_1 won't attempt any repla= cement ! outside memory addresses. This is needed because propagate_rtx_1 do= es ! not do any analysis on memory; thus it is very conservative and in g= eneral ! it will fail if non-read-only MEMs are found in the source expressio= n. !=20 ! 
PR_HANDLE_MEM is set when the source of the propagation was not ! another MEM. Then, it is safe not to treat non-read-only MEMs as ! ``opaque'' objects. */ ! PR_HANDLE_MEM =3D 2, !=20 ! /* Set when costs should be optimized for speed. */ ! PR_OPTIMIZE_FOR_SPEED =3D 4 ! }; =20=20 ! /* Check that X has a single def. */ =20=20 ! static bool ! reg_single_def_p (rtx x) ! { ! if (!REG_P (x)) ! return false; =20=20 ! int regno =3D REGNO (x); ! return (DF_REG_DEF_COUNT (regno) =3D=3D 1 ! && !bitmap_bit_p (DF_LR_OUT (ENTRY_BLOCK_PTR_FOR_FN (cfun)), regno)); } =20=20 ! /* Replace all occurrences of OLD in *PX with NEW and try to simplify the ! resulting expression. Replace *PX with a new RTL expression if an ! occurrence of OLD was found. =20=20 ! This is only a wrapper around simplify-rtx.c: do not add any pattern ! matching code here. (The sole exception is the handling of LO_SUM, but ! that is because there is no simplify_gen_* function for LO_SUM). */ =20=20 ! static bool ! propagate_rtx_1 (rtx *px, rtx old_rtx, rtx new_rtx, int flags) { ! rtx x =3D *px, tem =3D NULL_RTX, op0, op1, op2; ! enum rtx_code code =3D GET_CODE (x); ! machine_mode mode =3D GET_MODE (x); ! machine_mode op_mode; ! bool can_appear =3D (flags & PR_CAN_APPEAR) !=3D 0; ! bool valid_ops =3D true; !=20 ! if (!(flags & PR_HANDLE_MEM) && MEM_P (x) && !MEM_READONLY_P (x)) ! { ! /* If unsafe, change MEMs to CLOBBERs or SCRATCHes (to preserve whe= ther ! they have side effects or not). */ ! *px =3D (side_effects_p (x) ! ? gen_rtx_CLOBBER (GET_MODE (x), const0_rtx) ! : gen_rtx_SCRATCH (GET_MODE (x))); return false; } =20=20 ! /* If X is OLD_RTX, return NEW_RTX. But not if replacing only within an ! address, and we are *not* inside one. */ ! if (x =3D=3D old_rtx) ! { ! *px =3D new_rtx; ! return can_appear; ! } !=20 ! /* If this is an expression, try recursive substitution. */ ! switch (GET_RTX_CLASS (code)) ! { ! case RTX_UNARY: ! op0 =3D XEXP (x, 0); ! op_mode =3D GET_MODE (op0); ! 
valid_ops &=3D propagate_rtx_1 (&op0, old_rtx, new_rtx, flags); ! if (op0 =3D=3D XEXP (x, 0)) ! return true; ! tem =3D simplify_gen_unary (code, mode, op0, op_mode); ! break; !=20 ! case RTX_BIN_ARITH: ! case RTX_COMM_ARITH: ! op0 =3D XEXP (x, 0); ! op1 =3D XEXP (x, 1); ! valid_ops &=3D propagate_rtx_1 (&op0, old_rtx, new_rtx, flags); ! valid_ops &=3D propagate_rtx_1 (&op1, old_rtx, new_rtx, flags); ! if (op0 =3D=3D XEXP (x, 0) && op1 =3D=3D XEXP (x, 1)) ! return true; ! tem =3D simplify_gen_binary (code, mode, op0, op1); ! break; !=20 ! case RTX_COMPARE: ! case RTX_COMM_COMPARE: ! op0 =3D XEXP (x, 0); ! op1 =3D XEXP (x, 1); ! op_mode =3D GET_MODE (op0) !=3D VOIDmode ? GET_MODE (op0) : GET_MOD= E (op1); ! valid_ops &=3D propagate_rtx_1 (&op0, old_rtx, new_rtx, flags); ! valid_ops &=3D propagate_rtx_1 (&op1, old_rtx, new_rtx, flags); ! if (op0 =3D=3D XEXP (x, 0) && op1 =3D=3D XEXP (x, 1)) ! return true; ! tem =3D simplify_gen_relational (code, mode, op_mode, op0, op1); ! break; !=20 ! case RTX_TERNARY: ! case RTX_BITFIELD_OPS: ! op0 =3D XEXP (x, 0); ! op1 =3D XEXP (x, 1); ! op2 =3D XEXP (x, 2); ! op_mode =3D GET_MODE (op0); ! valid_ops &=3D propagate_rtx_1 (&op0, old_rtx, new_rtx, flags); ! valid_ops &=3D propagate_rtx_1 (&op1, old_rtx, new_rtx, flags); ! valid_ops &=3D propagate_rtx_1 (&op2, old_rtx, new_rtx, flags); ! if (op0 =3D=3D XEXP (x, 0) && op1 =3D=3D XEXP (x, 1) && op2 =3D=3D = XEXP (x, 2)) ! return true; ! if (op_mode =3D=3D VOIDmode) ! op_mode =3D GET_MODE (op0); ! tem =3D simplify_gen_ternary (code, mode, op_mode, op0, op1, op2); ! break; !=20 ! case RTX_EXTRA: ! /* The only case we try to handle is a SUBREG. */ ! if (code =3D=3D SUBREG) ! { ! op0 =3D XEXP (x, 0); ! valid_ops &=3D propagate_rtx_1 (&op0, old_rtx, new_rtx, flags); ! if (op0 =3D=3D XEXP (x, 0)) ! return true; ! tem =3D simplify_gen_subreg (mode, op0, GET_MODE (SUBREG_REG (x)), ! SUBREG_BYTE (x)); ! } !=20 ! else ! { ! rtvec vec; ! rtvec newvec; ! 
const char *fmt =3D GET_RTX_FORMAT (code); ! rtx op; !=20 ! for (int i =3D 0; fmt[i]; i++) ! switch (fmt[i]) ! { ! case 'E': ! vec =3D XVEC (x, i); ! newvec =3D vec; ! for (int j =3D 0; j < GET_NUM_ELEM (vec); j++) ! { ! op =3D RTVEC_ELT (vec, j); ! valid_ops &=3D propagate_rtx_1 (&op, old_rtx, new_rtx, flags); ! if (op !=3D RTVEC_ELT (vec, j)) ! { ! if (newvec =3D=3D vec) ! { ! newvec =3D shallow_copy_rtvec (vec); ! if (!tem) ! tem =3D shallow_copy_rtx (x); ! XVEC (tem, i) =3D newvec; ! } ! RTVEC_ELT (newvec, j) =3D op; ! } ! } ! break; !=20 ! case 'e': ! if (XEXP (x, i)) ! { ! op =3D XEXP (x, i); ! valid_ops &=3D propagate_rtx_1 (&op, old_rtx, new_rtx, flags); ! if (op !=3D XEXP (x, i)) ! { ! if (!tem) ! tem =3D shallow_copy_rtx (x); ! XEXP (tem, i) =3D op; ! } ! } ! break; ! } ! } !=20 ! break; !=20 ! case RTX_OBJ: ! if (code =3D=3D MEM && x !=3D new_rtx) ! { ! rtx new_op0; ! op0 =3D XEXP (x, 0); !=20 ! /* There are some addresses that we cannot work on. */ ! if (!can_simplify_addr (op0)) ! return true; !=20 ! op0 =3D new_op0 =3D targetm.delegitimize_address (op0); ! valid_ops &=3D propagate_rtx_1 (&new_op0, old_rtx, new_rtx, ! flags | PR_CAN_APPEAR); !=20 ! /* Dismiss transformation that we do not want to carry on. */ ! if (!valid_ops ! || new_op0 =3D=3D op0 ! || !(GET_MODE (new_op0) =3D=3D GET_MODE (op0) ! || GET_MODE (new_op0) =3D=3D VOIDmode)) ! return true; !=20 ! canonicalize_address (new_op0); !=20 ! /* Copy propagations are always ok. Otherwise check the costs. */ ! if (!(REG_P (old_rtx) && REG_P (new_rtx)) ! && !should_replace_address (op0, new_op0, GET_MODE (x), ! MEM_ADDR_SPACE (x), ! flags & PR_OPTIMIZE_FOR_SPEED)) ! return true; !=20 ! tem =3D replace_equiv_address_nv (x, new_op0); ! } !=20 ! else if (code =3D=3D LO_SUM) ! { ! op0 =3D XEXP (x, 0); ! op1 =3D XEXP (x, 1); =20=20 ! /* The only simplification we do attempts to remove references to op0 ! or make it constant -- in both cases, op0's invalidity will not ! make the result invalid. */ ! 
propagate_rtx_1 (&op0, old_rtx, new_rtx, flags | PR_CAN_APPEAR); ! valid_ops &=3D propagate_rtx_1 (&op1, old_rtx, new_rtx, flags); ! if (op0 =3D=3D XEXP (x, 0) && op1 =3D=3D XEXP (x, 1)) ! return true; =20=20 ! /* (lo_sum (high x) x) -> x */ ! if (GET_CODE (op0) =3D=3D HIGH && rtx_equal_p (XEXP (op0, 0), op1)) ! tem =3D op1; ! else ! tem =3D gen_rtx_LO_SUM (mode, op0, op1); !=20 ! /* OP1 is likely not a legitimate address, otherwise there would have ! been no LO_SUM. We want it to disappear if it is invalid, return ! false in that case. */ ! return memory_address_p (mode, tem); ! } =20=20 ! else if (code =3D=3D REG) ! { ! if (rtx_equal_p (x, old_rtx)) ! { ! *px =3D new_rtx; ! return can_appear; ! } ! } ! break; =20=20 ! default: ! break; } =20=20 - /* No change, no trouble. */ - if (tem =3D=3D NULL_RTX) - return true; -=20 - *px =3D tem; -=20 /* Allow replacements that simplify operations on a vector or complex value to a component. The most prominent case is (subreg ([vec_]concat ...)). */ ! if (REG_P (tem) && !HARD_REGISTER_P (tem) ! && (VECTOR_MODE_P (GET_MODE (new_rtx)) ! || COMPLEX_MODE_P (GET_MODE (new_rtx))) ! && GET_MODE (tem) =3D=3D GET_MODE_INNER (GET_MODE (new_rtx))) ! return true; =20=20 ! /* The replacement we made so far is valid, if all of the recursive ! replacements were valid, or we could simplify everything to ! a constant. */ ! return valid_ops || can_appear || CONSTANT_P (tem); } =20=20 =20=20 ! /* Return true if X constains a non-constant mem. */ =20=20 ! static bool ! varying_mem_p (const_rtx x) { ! subrtx_iterator::array_type array; ! FOR_EACH_SUBRTX (iter, array, x, NONCONST) ! if (MEM_P (*iter) && !MEM_READONLY_P (*iter)) ! return true; ! return false; ! } !=20 =20=20 ! /* Replace all occurrences of OLD in X with NEW and try to simplify the ! resulting expression (in mode MODE). Return a new expression if it is ! a constant, otherwise X. !=20 ! Simplifications where occurrences of NEW collapse to a constant are al= ways ! accepted. 
All simplifications are accepted if NEW is a pseudo too. ! Otherwise, we accept simplifications that have a lower or equal cost. = */ !=20 ! static rtx ! propagate_rtx (rtx x, machine_mode mode, rtx old_rtx, rtx new_rtx, ! bool speed) ! { ! rtx tem; ! bool collapsed; ! int flags; =20=20 ! if (REG_P (new_rtx) && REGNO (new_rtx) < FIRST_PSEUDO_REGISTER) ! return NULL_RTX; =20=20 ! flags =3D 0; ! if (REG_P (new_rtx) ! || CONSTANT_P (new_rtx) ! || (GET_CODE (new_rtx) =3D=3D SUBREG ! && REG_P (SUBREG_REG (new_rtx)) ! && !paradoxical_subreg_p (new_rtx))) ! flags |=3D PR_CAN_APPEAR; ! if (!varying_mem_p (new_rtx)) ! flags |=3D PR_HANDLE_MEM; !=20 ! if (speed) ! flags |=3D PR_OPTIMIZE_FOR_SPEED; !=20 ! tem =3D x; ! collapsed =3D propagate_rtx_1 (&tem, old_rtx, copy_rtx (new_rtx), flags= ); ! if (tem =3D=3D x || !collapsed) ! return NULL_RTX; !=20 ! /* gen_lowpart_common will not be able to process VOIDmode entities oth= er ! than CONST_INTs. */ ! if (GET_MODE (tem) =3D=3D VOIDmode && !CONST_INT_P (tem)) ! return NULL_RTX; =20=20 ! if (GET_MODE (tem) =3D=3D VOIDmode) ! tem =3D rtl_hooks.gen_lowpart_no_emit (mode, tem); ! else ! gcc_assert (GET_MODE (tem) =3D=3D mode); =20=20 ! return tem; } =20=20 =20=20 ! =20=20 ! /* Return true if the register from reference REF is killed ! between FROM to (but not including) TO. */ =20=20 static bool ! local_ref_killed_between_p (df_ref ref, rtx_insn *from, rtx_insn *to) { ! rtx_insn *insn; !=20 ! for (insn =3D from; insn !=3D to; insn =3D NEXT_INSN (insn)) { ! df_ref def; ! if (!INSN_P (insn)) ! continue; !=20 ! FOR_EACH_INSN_DEF (def, insn) ! if (DF_REF_REGNO (ref) =3D=3D DF_REF_REGNO (def)) ! return true; } return false; } =20=20 =20=20 ! /* Check if USE is killed between DEF_INSN and TARGET_INSN. This would ! require full computation of available expressions; we check only a few ! restricted conditions: ! - if the reg in USE has only one definition, go ahead; ! - in the same basic block, we check for no definitions killing the use; ! 
- if TARGET_INSN's basic block has DEF_INSN's basic block as its sole ! predecessor, we check if the use is killed after DEF_INSN or before ! TARGET_INSN insn, in their respective basic blocks. */ =20=20 ! static bool ! use_killed_between (df_ref use, rtx_insn *def_insn, rtx_insn *target_insn) { ! basic_block def_bb =3D BLOCK_FOR_INSN (def_insn); ! basic_block target_bb =3D BLOCK_FOR_INSN (target_insn); ! int regno; ! df_ref def; !=20 ! /* We used to have a def reaching a use that is _before_ the def, ! with the def not dominating the use even though the use and def ! are in the same basic block, when a register may be used ! uninitialized in a loop. This should not happen anymore since ! we do not use reaching definitions, but still we test for such ! cases and assume that DEF is not available. */ ! if (def_bb =3D=3D target_bb ! ? DF_INSN_LUID (def_insn) >=3D DF_INSN_LUID (target_insn) ! : !dominated_by_p (CDI_DOMINATORS, target_bb, def_bb)) ! return true; !=20 ! /* Check if the reg in USE has only one definition. We already ! know that this definition reaches use, or we wouldn't be here. ! However, this is invalid for hard registers because if they are ! live at the beginning of the function it does not mean that we ! have an uninitialized access. And we have to check for the case ! where a register may be used uninitialized in a loop as above. */ ! regno =3D DF_REF_REGNO (use); ! def =3D DF_REG_DEF_CHAIN (regno); ! if (def ! && DF_REF_NEXT_REG (def) =3D=3D NULL ! && regno >=3D FIRST_PSEUDO_REGISTER ! && (BLOCK_FOR_INSN (DF_REF_INSN (def)) =3D=3D def_bb ! ? DF_INSN_LUID (DF_REF_INSN (def)) < DF_INSN_LUID (def_insn) ! : dominated_by_p (CDI_DOMINATORS, ! def_bb, BLOCK_FOR_INSN (DF_REF_INSN (def))))) ! return false; !=20 ! /* Check locally if we are in the same basic block. */ ! if (def_bb =3D=3D target_bb) ! return local_ref_killed_between_p (use, def_insn, target_insn); !=20 ! /* Finally, if DEF_BB is the sole predecessor of TARGET_BB. */ ! 
if (single_pred_p (target_bb) ! && single_pred (target_bb) =3D=3D def_bb) ! { ! df_ref x; !=20 ! /* See if USE is killed between DEF_INSN and the last insn in the ! basic block containing DEF_INSN. */ ! x =3D df_bb_regno_last_def_find (def_bb, regno); ! if (x && DF_INSN_LUID (DF_REF_INSN (x)) >=3D DF_INSN_LUID (def_insn= )) ! return true; =20=20 ! /* See if USE is killed between TARGET_INSN and the first insn in t= he ! basic block containing TARGET_INSN. */ ! x =3D df_bb_regno_first_def_find (target_bb, regno); ! if (x && DF_INSN_LUID (DF_REF_INSN (x)) < DF_INSN_LUID (target_insn= )) ! return true; !=20 ! return false; } =20=20 ! /* Otherwise assume the worst case. */ ! return true; ! } =20=20 !=20 ! /* Check if all uses in DEF_INSN can be used in TARGET_INSN. This ! would require full computation of available expressions; ! we check only restricted conditions, see use_killed_between. */ ! static bool ! all_uses_available_at (rtx_insn *def_insn, rtx_insn *target_insn) ! { ! df_ref use; ! struct df_insn_info *insn_info =3D DF_INSN_INFO_GET (def_insn); ! rtx def_set =3D single_set (def_insn); ! rtx_insn *next; !=20 ! gcc_assert (def_set); !=20 ! /* If target_insn comes right after def_insn, which is very common ! for addresses, we can use a quicker test. Ignore debug insns ! other than target insns for this. */ ! next =3D NEXT_INSN (def_insn); ! while (next && next !=3D target_insn && DEBUG_INSN_P (next)) ! next =3D NEXT_INSN (next); ! if (next =3D=3D target_insn && REG_P (SET_DEST (def_set))) ! { ! rtx def_reg =3D SET_DEST (def_set); !=20 ! /* If the insn uses the reg that it defines, the substitution is ! invalid. */ ! FOR_EACH_INSN_INFO_USE (use, insn_info) ! if (rtx_equal_p (DF_REF_REG (use), def_reg)) ! return false; ! FOR_EACH_INSN_INFO_EQ_USE (use, insn_info) ! if (rtx_equal_p (DF_REF_REG (use), def_reg)) ! return false; ! } ! else { ! rtx def_reg =3D REG_P (SET_DEST (def_set)) ? SET_DEST (def_set) : N= ULL_RTX; !=20 ! 
/* Look at all the uses of DEF_INSN, and see if they are not ! killed between DEF_INSN and TARGET_INSN. */ ! FOR_EACH_INSN_INFO_USE (use, insn_info) { ! if (def_reg && rtx_equal_p (DF_REF_REG (use), def_reg)) ! return false; ! if (use_killed_between (use, def_insn, target_insn)) ! return false; } ! FOR_EACH_INSN_INFO_EQ_USE (use, insn_info) { ! if (def_reg && rtx_equal_p (DF_REF_REG (use), def_reg)) ! return false; ! if (use_killed_between (use, def_insn, target_insn)) ! return false; } } =20=20 ! return true; } =20=20 ! ! static df_ref *active_defs; ! static sparseset active_defs_check; !=20 ! /* Fill the ACTIVE_DEFS array with the use->def link for the registers ! mentioned in USE_REC. Register the valid entries in ACTIVE_DEFS_CHECK ! too, for checking purposes. */ =20=20 ! static void ! register_active_defs (df_ref use) { ! for (; use; use =3D DF_REF_NEXT_LOC (use)) ! { ! df_ref def =3D get_def_for_use (use); ! int regno =3D DF_REF_REGNO (use); =20=20 ! if (flag_checking) ! sparseset_set_bit (active_defs_check, regno); ! active_defs[regno] =3D def; } - } -=20 =20=20 ! /* Build the use->def links that we use to update the dataflow info ! for new uses. Note that building the links is very cheap and if ! it were done earlier, they could be used to rule out invalid ! propagations (in addition to what is done in all_uses_available_at). ! I'm not doing this yet, though. */ =20=20 ! static void ! update_df_init (rtx_insn *def_insn, rtx_insn *insn) ! { ! if (flag_checking) ! sparseset_clear (active_defs_check); ! register_active_defs (DF_INSN_USES (def_insn)); ! register_active_defs (DF_INSN_USES (insn)); ! register_active_defs (DF_INSN_EQ_USES (insn)); ! } =20=20 =20=20 ! /* Update the USE_DEF_REF array for the given use, using the active defin= itions ! in the ACTIVE_DEFS array to match pseudos to their def. */ =20=20 ! static inline void ! update_uses (df_ref use) ! { ! for (; use; use =3D DF_REF_NEXT_LOC (use)) { ! int regno =3D DF_REF_REGNO (use); =20=20 ! 
/* Set up the use-def chain. */ ! if (DF_REF_ID (use) >=3D (int) use_def_ref.length ()) ! use_def_ref.safe_grow_cleared (DF_REF_ID (use) + 1, true); =20=20 ! if (flag_checking) ! gcc_assert (sparseset_bit_p (active_defs_check, regno)); ! use_def_ref[DF_REF_ID (use)] =3D active_defs[regno]; ! } ! } =20=20 =20=20 ! /* Update the USE_DEF_REF array for the uses in INSN. Only update note ! uses if NOTES_ONLY is true. */ =20=20 ! static void ! update_df (rtx_insn *insn, rtx note) ! { ! struct df_insn_info *insn_info =3D DF_INSN_INFO_GET (insn); =20=20 ! if (note) ! { ! df_uses_create (&XEXP (note, 0), insn, DF_REF_IN_NOTE); ! df_notes_rescan (insn); } ! else { ! df_uses_create (&PATTERN (insn), insn, 0); ! df_insn_rescan (insn); ! update_uses (DF_INSN_INFO_USES (insn_info)); } =20=20 ! update_uses (DF_INSN_INFO_EQ_USES (insn_info)); } =20=20 !=20 ! /* Try substituting NEW into LOC, which originated from forward propagati= on ! of USE's value from DEF_INSN. SET_REG_EQUAL says whether we are ! substituting the whole SET_SRC, so we can set a REG_EQUAL note if the ! new insn is not recognized. Return whether the substitution was ! performed. */ =20=20 static bool ! try_fwprop_subst (df_ref use, rtx *loc, rtx new_rtx, rtx_insn *def_insn, ! bool set_reg_equal) ! { ! rtx_insn *insn =3D DF_REF_INSN (use); ! rtx set =3D single_set (insn); ! rtx note =3D NULL_RTX; ! bool speed =3D optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn)); ! int old_cost =3D 0; ! bool ok; =20=20 ! update_df_init (def_insn, insn); =20=20 ! /* forward_propagate_subreg may be operating on an instruction with ! multiple sets. If so, assume the cost of the new instruction is ! not greater than the old one. */ ! if (set) ! old_cost =3D set_src_cost (SET_SRC (set), GET_MODE (SET_DEST (set)), = speed); ! if (dump_file) ! { ! fprintf (dump_file, "\nIn insn %d, replacing\n ", INSN_UID (insn)); ! print_inline_rtx (dump_file, *loc, 2); ! fprintf (dump_file, "\n with "); ! print_inline_rtx (dump_file, new_rtx, 2); ! 
fprintf (dump_file, "\n"); ! } =20=20 ! validate_unshare_change (insn, loc, new_rtx, true); ! if (!verify_changes (0)) ! { ! if (dump_file) ! fprintf (dump_file, "Changes to insn %d not recognized\n", ! INSN_UID (insn)); ! ok =3D false; ! } !=20 ! else if (DF_REF_TYPE (use) =3D=3D DF_REF_REG_USE ! && set ! && (set_src_cost (SET_SRC (set), GET_MODE (SET_DEST (set)), speed) ! > old_cost)) ! { ! if (dump_file) ! fprintf (dump_file, "Changes to insn %d not profitable\n", ! INSN_UID (insn)); ! ok =3D false; ! } =20=20 ! else ! { ! if (dump_file) ! fprintf (dump_file, "Changed insn %d\n", INSN_UID (insn)); ! ok =3D true; ! } =20=20 ! if (ok) { ! confirm_change_group (); ! num_changes++; } - else - { - cancel_changes (0); =20=20 ! /* Can also record a simplified value in a REG_EQUAL note, ! making a new one if one does not already exist. */ ! if (set_reg_equal) ! { ! /* If there are any paradoxical SUBREGs, don't add REG_EQUAL note, ! because the bits in there can be anything and so might not ! match the REG_EQUAL note content. See PR70574. */ ! subrtx_var_iterator::array_type array; ! FOR_EACH_SUBRTX_VAR (iter, array, *loc, NONCONST) ! { ! rtx x =3D *iter; ! if (SUBREG_P (x) && paradoxical_subreg_p (x)) ! { ! set_reg_equal =3D false; ! break; ! } ! } =20=20 ! if (set_reg_equal) ! { ! if (dump_file) ! fprintf (dump_file, " Setting REG_EQUAL note\n"); =20=20 ! note =3D set_unique_reg_note (insn, REG_EQUAL, copy_rtx (new_rtx)); ! } ! } ! } !=20 ! if ((ok || note) && !CONSTANT_P (new_rtx)) ! update_df (insn, note); =20=20 ! return ok; } =20=20 /* For the given single_set INSN, containing SRC known to be a --- 134,610 ---- && REGNO (reg) !=3D ARG_POINTER_REGNUM)); } =20=20 ! /* MEM is the result of an address simplification, and temporarily ! undoing changes OLD_NUM_CHANGES onwards restores the original address. ! Return whether it is good to use the new address instead of the ! old one. INSN is the containing instruction. */ =20=20 static bool ! 
should_replace_address (int old_num_changes, rtx mem, rtx_insn *insn) { int gain; =20=20 /* Prefer the new address if it is less expensive. */ ! bool speed =3D optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn)); ! temporarily_undo_changes (old_num_changes); ! gain =3D address_cost (XEXP (mem, 0), GET_MODE (mem), ! MEM_ADDR_SPACE (mem), speed); ! redo_changes (old_num_changes); ! gain -=3D address_cost (XEXP (mem, 0), GET_MODE (mem), ! MEM_ADDR_SPACE (mem), speed); =20=20 /* If the addresses have equivalent cost, prefer the new address if it has the highest `set_src_cost'. That has the potential of eliminating the most insns without additional costs, and it is the same that cse.c used to do. */ if (gain =3D=3D 0) ! { ! gain =3D set_src_cost (XEXP (mem, 0), VOIDmode, speed); ! temporarily_undo_changes (old_num_changes); ! gain -=3D set_src_cost (XEXP (mem, 0), VOIDmode, speed); ! redo_changes (old_num_changes); ! } =20=20 return (gain > 0); } =20=20 =20=20 ! namespace ! { ! class fwprop_propagation : public insn_propagation ! { ! public: ! static const uint16_t CHANGED_MEM =3D FIRST_SPARE_RESULT; ! static const uint16_t CONSTANT =3D FIRST_SPARE_RESULT << 1; ! static const uint16_t PROFITABLE =3D FIRST_SPARE_RESULT << 2; =20=20 ! fwprop_propagation (rtx_insn *, rtx, rtx); =20=20 ! bool changed_mem_p () const { return result_flags & CHANGED_MEM; } ! bool folded_to_constants_p () const; ! bool profitable_p () const; =20=20 ! bool check_mem (int, rtx) final override; ! void note_simplification (int, uint16_t, rtx, rtx) final override; ! uint16_t classify_result (rtx, rtx); ! }; } =20=20 ! /* Prepare to replace FROM with TO in INSN. */ !=20 ! fwprop_propagation::fwprop_propagation (rtx_insn *insn, rtx from, rtx to) ! : insn_propagation (insn, from, to) ! { ! should_check_mems =3D true; ! should_note_simplifications =3D true; ! } =20=20 ! /* MEM is the result of an address simplification, and temporarily ! undoing changes OLD_NUM_CHANGES onwards restores the original address. 
! Return true if the propagation should continue, false if it has failed= . */ =20=20 ! bool ! fwprop_propagation::check_mem (int old_num_changes, rtx mem) { ! if (!memory_address_addr_space_p (GET_MODE (mem), XEXP (mem, 0), ! MEM_ADDR_SPACE (mem))) ! { ! failure_reason =3D "would create an invalid MEM"; return false; } =20=20 ! temporarily_undo_changes (old_num_changes); ! bool can_simplify =3D can_simplify_addr (XEXP (mem, 0)); ! redo_changes (old_num_changes); ! if (!can_simplify) ! { ! failure_reason =3D "would replace a frame address"; ! return false; ! } =20=20 ! /* Copy propagations are always ok. Otherwise check the costs. */ ! if (!(REG_P (from) && REG_P (to)) ! && !should_replace_address (old_num_changes, mem, insn)) ! { ! failure_reason =3D "would increase the cost of a MEM"; ! return false; ! } =20=20 ! result_flags |=3D CHANGED_MEM; ! return true; ! } =20=20 ! /* OLDX has been simplified to NEWX. Describe the change in terms of ! result_flags. */ =20=20 ! uint16_t ! fwprop_propagation::classify_result (rtx old_rtx, rtx new_rtx) ! { ! if (CONSTANT_P (new_rtx)) ! { ! /* If OLD_RTX is a LO_SUM, then it presumably exists for a reason, ! and NEW_RTX is likely not a legitimate address. We want it to ! disappear if it is invalid. !=20 ! ??? Using the mode of the LO_SUM as the mode of the address ! seems odd, but it was what the pre-SSA code did. */ ! if (GET_CODE (old_rtx) =3D=3D LO_SUM ! && !memory_address_p (GET_MODE (old_rtx), new_rtx)) ! return CONSTANT; ! return CONSTANT | PROFITABLE; } =20=20 /* Allow replacements that simplify operations on a vector or complex value to a component. The most prominent case is (subreg ([vec_]concat ...)). */ ! if (REG_P (new_rtx) ! && !HARD_REGISTER_P (new_rtx) ! && (VECTOR_MODE_P (GET_MODE (from)) ! || COMPLEX_MODE_P (GET_MODE (from))) ! && GET_MODE (new_rtx) =3D=3D GET_MODE_INNER (GET_MODE (from))) ! return PROFITABLE; !=20 ! return 0; ! } =20=20 ! /* Record that OLD_RTX has been simplified to NEW_RTX. 
OLD_NUM_CHANGES ! is the number of unrelated changes that had been made before processing ! OLD_RTX and its subrtxes. OLD_RESULT_FLAGS is the value that result_f= lags ! had at that point. */ !=20 ! void ! fwprop_propagation::note_simplification (int old_num_changes, ! uint16_t old_result_flags, ! rtx old_rtx, rtx new_rtx) ! { ! result_flags &=3D ~(CONSTANT | PROFITABLE); ! uint16_t new_flags =3D classify_result (old_rtx, new_rtx); ! if (old_num_changes) ! new_flags &=3D old_result_flags; ! result_flags |=3D new_flags; ! } !=20 ! /* Return true if all substitutions eventually folded to constants. */ !=20 ! bool ! fwprop_propagation::folded_to_constants_p () const ! { ! /* If we're propagating a HIGH, require it to be folded with a ! partnering LO_SUM. For example, a REG_EQUAL note with a register ! replaced by an unfolded HIGH is not useful. */ ! if (CONSTANT_P (to) && GET_CODE (to) !=3D HIGH) ! return true; ! return !(result_flags & UNSIMPLIFIED) && (result_flags & CONSTANT); } =20=20 =20=20 ! /* Return true if it is worth keeping the result of the propagation, ! false if it would increase the complexity of the pattern too much. */ =20=20 ! bool ! fwprop_propagation::profitable_p () const { ! if (changed_mem_p ()) ! return true; =20=20 ! if (!(result_flags & UNSIMPLIFIED) ! && (result_flags & PROFITABLE)) ! return true; =20=20 ! if (REG_P (to)) ! return true; =20=20 ! if (GET_CODE (to) =3D=3D SUBREG ! && REG_P (SUBREG_REG (to)) ! && !paradoxical_subreg_p (to)) ! return true; =20=20 ! if (CONSTANT_P (to)) ! return true; =20=20 ! return false; } =20=20 + /* Check that X has a single def. */ =20=20 ! static bool ! reg_single_def_p (rtx x) ! { ! return REG_P (x) && crtl->ssa->single_dominating_def (REGNO (x)); ! } =20=20 ! /* Return true if X contains a paradoxical subreg. */ =20=20 static bool ! contains_paradoxical_subreg_p (rtx x) { ! subrtx_var_iterator::array_type array; ! FOR_EACH_SUBRTX_VAR (iter, array, x, NONCONST) { ! x =3D *iter; ! 
if (SUBREG_P (x) && paradoxical_subreg_p (x)) ! return true; } return false; } =20=20 + /* Try to substitute (set DEST SRC) from DEF_INSN into note NOTE of USE_I= NSN. + Return the number of substitutions on success, otherwise return -1 and + leave USE_INSN unchanged. =20=20 ! If REQUIRE_CONSTANT is true, require all substituted occurrences of SRC ! to fold to a constant, so that the note does not use any more registers ! than it did previously. If REQUIRE_CONSTANT is false, also allow the ! substitution if it's something we'd normally allow for the main ! instruction pattern. */ =20=20 ! static int ! try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn, ! rtx note, rtx dest, rtx src, bool require_constant) { ! rtx_insn *use_rtl =3D use_insn->rtl (); =20=20 ! insn_change_watermark watermark; ! fwprop_propagation prop (use_rtl, dest, src); ! if (!prop.apply_to_rvalue (&XEXP (note, 0))) ! { ! if (dump_file && (dump_flags & TDF_DETAILS)) ! fprintf (dump_file, "cannot propagate from insn %d into" ! " notes of insn %d: %s\n", def_insn->uid (), ! use_insn->uid (), prop.failure_reason); ! return -1; } =20=20 ! if (prop.num_replacements =3D=3D 0) ! return 0; =20=20 ! if (require_constant) { ! if (!prop.folded_to_constants_p ()) { ! if (dump_file && (dump_flags & TDF_DETAILS)) ! fprintf (dump_file, "cannot propagate from insn %d into" ! " notes of insn %d: %s\n", def_insn->uid (), ! use_insn->uid (), "wouldn't fold to constants"); ! return -1; } ! } ! else ! { ! if (!prop.folded_to_constants_p () && !prop.profitable_p ()) { ! if (dump_file && (dump_flags & TDF_DETAILS)) ! fprintf (dump_file, "cannot propagate from insn %d into" ! " notes of insn %d: %s\n", def_insn->uid (), ! use_insn->uid (), "would increase complexity of node"); ! return -1; } } =20=20 ! if (dump_file && (dump_flags & TDF_DETAILS)) ! { ! fprintf (dump_file, "\nin notes of insn %d, replacing:\n ", ! INSN_UID (use_rtl)); ! temporarily_undo_changes (0); ! print_inline_rtx (dump_file, note, 2); !
redo_changes (0); ! fprintf (dump_file, "\n with:\n "); ! print_inline_rtx (dump_file, note, 2); ! fprintf (dump_file, "\n"); ! } ! watermark.keep (); ! return prop.num_replacements; } =20=20 ! /* Try to substitute (set DEST SRC) from DEF_INSN into location LOC of ! USE_INSN's pattern. Return true on success, otherwise leave USE_INSN ! unchanged. */ =20=20 ! static bool ! try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_ch= ange, ! insn_info *def_insn, rtx *loc, rtx dest, rtx src) { ! insn_info *use_insn =3D use_change.insn (); ! rtx_insn *use_rtl =3D use_insn->rtl (); =20=20 ! insn_change_watermark watermark; ! fwprop_propagation prop (use_rtl, dest, src); ! if (!prop.apply_to_pattern (loc)) ! { ! if (dump_file && (dump_flags & TDF_DETAILS)) ! fprintf (dump_file, "cannot propagate from insn %d into" ! " insn %d: %s\n", def_insn->uid (), use_insn->uid (), ! prop.failure_reason); ! return false; } =20=20 ! if (prop.num_replacements =3D=3D 0) ! return false; =20=20 ! if (!prop.profitable_p ()) ! { ! if (dump_file && (dump_flags & TDF_DETAILS)) ! fprintf (dump_file, "cannot propagate from insn %d into" ! " insn %d: %s\n", def_insn->uid (), use_insn->uid (), ! "would increase complexity of pattern"); ! return false; ! } =20=20 + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "\npropagating insn %d into insn %d, replacing:= \n", + def_insn->uid (), use_insn->uid ()); + temporarily_undo_changes (0); + print_rtl_single (dump_file, PATTERN (use_rtl)); + redo_changes (0); + } =20=20 ! /* ??? In theory, it should be better to use insn costs rather than ! set_src_costs here. That would involve replacing this code with ! change_is_worthwhile. */ ! bool ok =3D recog (attempt, use_change); ! if (ok && !prop.changed_mem_p () && !use_insn->is_asm ()) ! if (rtx use_set =3D single_set (use_rtl)) ! { ! bool speed =3D optimize_bb_for_speed_p (BLOCK_FOR_INSN (use_rtl)); ! temporarily_undo_changes (0); ! 
auto old_cost =3D set_src_cost (SET_SRC (use_set), ! GET_MODE (SET_DEST (use_set)), speed); ! redo_changes (0); ! auto new_cost =3D set_src_cost (SET_SRC (use_set), ! GET_MODE (SET_DEST (use_set)), speed); ! if (new_cost > old_cost) ! { ! if (dump_file) ! fprintf (dump_file, "change not profitable" ! " (cost %d -> cost %d)\n", old_cost, new_cost); ! ok =3D false; ! } ! } =20=20 ! if (!ok) { ! /* The pattern didn't match, but if all uses of SRC folded to ! constants, we can add a REG_EQUAL note for the result, if there ! isn't one already. */ ! if (!prop.folded_to_constants_p ()) ! return false; =20=20 ! /* Test this first to avoid creating an unnecessary copy of SRC. */ ! if (find_reg_note (use_rtl, REG_EQUAL, NULL_RTX)) ! return false; =20=20 ! rtx set =3D set_for_reg_notes (use_rtl); ! if (!set || !REG_P (SET_DEST (set))) ! return false; =20=20 + rtx value =3D copy_rtx (SET_SRC (set)); + cancel_changes (0); =20=20 ! /* If there are any paradoxical SUBREGs, drop the REG_EQUAL note, ! because the bits in there can be anything and so might not ! match the REG_EQUAL note content. See PR70574. */ ! if (contains_paradoxical_subreg_p (SET_SRC (set))) ! return false; =20=20 ! if (dump_file && (dump_flags & TDF_DETAILS)) ! fprintf (dump_file, " Setting REG_EQUAL note\n"); =20=20 ! return set_unique_reg_note (use_rtl, REG_EQUAL, value); } !=20 ! rtx *note_ptr =3D &REG_NOTES (use_rtl); ! while (rtx note =3D *note_ptr) { ! if ((REG_NOTE_KIND (note) =3D=3D REG_EQUAL ! || REG_NOTE_KIND (note) =3D=3D REG_EQUIV) ! && try_fwprop_subst_note (use_insn, def_insn, note, ! dest, src, false) < 0) ! { ! *note_ptr =3D XEXP (note, 1); ! free_EXPR_LIST_node (note); ! } ! else ! note_ptr =3D &XEXP (note, 1); } =20=20 ! confirm_change_group (); ! crtl->ssa->change_insn (use_change); ! num_changes++; ! return true; } =20=20 ! /* Try to substitute (set DEST SRC) from DEF_INSN into USE_INSN's notes, ! given that it was not possible to do this for USE_INSN's main pattern. !
Return true on success, otherwise leave USE_INSN unchanged. */ =20=20 static bool ! try_fwprop_subst_notes (insn_info *use_insn, insn_info *def_insn, ! rtx dest, rtx src) ! { ! rtx_insn *use_rtl =3D use_insn->rtl (); ! for (rtx note =3D REG_NOTES (use_rtl); note; note =3D XEXP (note, 1)) ! if ((REG_NOTE_KIND (note) =3D=3D REG_EQUAL ! || REG_NOTE_KIND (note) =3D=3D REG_EQUIV) ! && try_fwprop_subst_note (use_insn, def_insn, note, ! dest, src, true) > 0) ! { ! confirm_change_group (); ! return true; ! } !=20 ! return false; ! } =20=20 ! /* Check whether we could validly substitute (set DEST SRC) from DEF_INSN ! into USE. If so, first try performing the substitution in location LOC ! of USE->insn ()'s pattern. If that fails, try instead to substitute ! into the notes. =20=20 ! Return true on success, otherwise leave USE_INSN unchanged. */ =20=20 ! static bool ! try_fwprop_subst (use_info *use, insn_info *def_insn, ! rtx *loc, rtx dest, rtx src) ! { ! insn_info *use_insn =3D use->insn (); =20=20 ! auto attempt =3D crtl->ssa->new_change_attempt (); ! use_array src_uses =3D remove_note_accesses (attempt, def_insn->uses ()= ); =20=20 ! /* ??? Not really a meaningful test: it means we can propagate arithmet= ic ! involving hard registers but not bare references to them. A better ! test would be to iterate over src_uses looking for hard registers ! that are not fixed. */ ! if (REG_P (src) && HARD_REGISTER_P (src)) ! return false; !=20 ! /* ??? It would be better to make this EBB-based instead. That would ! involve checking for equal EBBs rather than equal BBs and trying ! to make the uses available at use_insn->ebb ()->first_bb (). */ ! if (def_insn->bb () !=3D use_insn->bb ()) { ! src_uses =3D crtl->ssa->make_uses_available (attempt, src_uses, ! use_insn->bb ()); ! if (!src_uses.is_valid ()) ! return false; } =20=20 ! insn_change use_change (use_insn); ! use_change.new_uses =3D merge_access_arrays (attempt, use_change.new_us= es, ! src_uses); !
if (!use_change.new_uses.is_valid ()) ! return false; =20=20 ! /* ??? We could allow movement within the EBB by adding: =20=20 ! use_change.move_range =3D use_insn->ebb ()->insn_range (); */ ! if (!restrict_movement (use_change)) ! return false; =20=20 ! return (try_fwprop_subst_pattern (attempt, use_change, def_insn, ! loc, dest, src) ! || try_fwprop_subst_notes (use_insn, def_insn, dest, src)); } =20=20 /* For the given single_set INSN, containing SRC known to be a *************** *** 1117,1149 **** load from memory. */ =20=20 static bool ! free_load_extend (rtx src, rtx_insn *insn) { ! rtx reg; ! df_ref def, use; !=20 ! reg =3D XEXP (src, 0); if (load_extend_op (GET_MODE (reg)) !=3D GET_CODE (src)) return false; =20=20 ! FOR_EACH_INSN_USE (use, insn) ! if (!DF_REF_IS_ARTIFICIAL (use) ! && DF_REF_TYPE (use) =3D=3D DF_REF_REG_USE ! && DF_REF_REG (use) =3D=3D reg) ! break; ! if (!use) ! return false; =20=20 - def =3D get_def_for_use (use); if (!def) return false; =20=20 ! if (DF_REF_IS_ARTIFICIAL (def)) return false; =20=20 ! if (NONJUMP_INSN_P (DF_REF_INSN (def))) { ! rtx patt =3D PATTERN (DF_REF_INSN (def)); =20=20 if (GET_CODE (patt) =3D=3D SET && GET_CODE (SET_SRC (patt)) =3D=3D MEM --- 613,643 ---- load from memory. */ =20=20 static bool ! free_load_extend (rtx src, insn_info *insn) { ! rtx reg =3D XEXP (src, 0); if (load_extend_op (GET_MODE (reg)) !=3D GET_CODE (src)) return false; =20=20 ! def_info *def =3D nullptr; ! for (use_info *use : insn->uses ()) ! if (use->regno () =3D=3D REGNO (reg)) ! { ! def =3D use->def (); ! break; ! } =20=20 if (!def) return false; =20=20 ! insn_info *def_insn =3D def->insn (); ! if (def_insn->is_artificial ()) return false; =20=20 ! rtx_insn *def_rtl =3D def_insn->rtl (); ! if (NONJUMP_INSN_P (def_rtl)) { ! rtx patt =3D PATTERN (def_rtl); =20=20 if (GET_CODE (patt) =3D=3D SET && GET_CODE (SET_SRC (patt)) =3D=3D MEM *************** *** 1153,1174 **** return false; } =20=20 ! 
/* If USE is a subreg, see if it can be replaced by a pseudo. */ =20=20 static bool ! forward_propagate_subreg (df_ref use, rtx_insn *def_insn, rtx def_set) { - rtx use_reg =3D DF_REF_REG (use); - rtx_insn *use_insn; - rtx src; scalar_int_mode int_use_mode, src_mode; =20=20 /* Only consider subregs... */ machine_mode use_mode =3D GET_MODE (use_reg); if (GET_CODE (use_reg) !=3D SUBREG ! || !REG_P (SET_DEST (def_set))) return false; =20=20 if (paradoxical_subreg_p (use_reg)) { /* If this is a paradoxical SUBREG, we have no idea what value the --- 647,670 ---- return false; } =20=20 ! /* Subroutine of forward_propagate_subreg that handles a use of DEST ! in REF. The other parameters are the same. */ =20=20 static bool ! forward_propagate_subreg (use_info *use, insn_info *def_insn, ! rtx dest, rtx src, df_ref ref) { scalar_int_mode int_use_mode, src_mode; =20=20 /* Only consider subregs... */ + rtx use_reg =3D DF_REF_REG (ref); machine_mode use_mode =3D GET_MODE (use_reg); if (GET_CODE (use_reg) !=3D SUBREG ! || GET_MODE (SUBREG_REG (use_reg)) !=3D GET_MODE (dest)) return false; =20=20 + /* ??? Replacing throughout the pattern would help for match_dups. */ + rtx *loc =3D DF_REF_LOC (ref); if (paradoxical_subreg_p (use_reg)) { /* If this is a paradoxical SUBREG, we have no idea what value the *************** *** 1176,1191 **** a SUBREG whose operand is the same as our mode, and all the modes are within a word, we can just use the inner operand because these SUBREGs just say how to treat the register. */ - use_insn =3D DF_REF_INSN (use); - src =3D SET_SRC (def_set); if (GET_CODE (src) =3D=3D SUBREG && REG_P (SUBREG_REG (src)) && REGNO (SUBREG_REG (src)) >=3D FIRST_PSEUDO_REGISTER && GET_MODE (SUBREG_REG (src)) =3D=3D use_mode ! && subreg_lowpart_p (src) ! && all_uses_available_at (def_insn, use_insn)) ! return try_fwprop_subst (use, DF_REF_LOC (use), SUBREG_REG (src), ! 
def_insn, false); } =20=20 /* If this is a SUBREG of a ZERO_EXTEND or SIGN_EXTEND, and the SUBREG --- 672,684 ---- a SUBREG whose operand is the same as our mode, and all the modes are within a word, we can just use the inner operand because these SUBREGs just say how to treat the register. */ if (GET_CODE (src) =3D=3D SUBREG && REG_P (SUBREG_REG (src)) && REGNO (SUBREG_REG (src)) >=3D FIRST_PSEUDO_REGISTER && GET_MODE (SUBREG_REG (src)) =3D=3D use_mode ! && subreg_lowpart_p (src)) ! return try_fwprop_subst (use, def_insn, loc, ! use_reg, SUBREG_REG (src)); } =20=20 /* If this is a SUBREG of a ZERO_EXTEND or SIGN_EXTEND, and the SUBREG *************** *** 1206,1213 **** else if (is_a <scalar_int_mode> (use_mode, &int_use_mode) && subreg_lowpart_p (use_reg)) { - use_insn =3D DF_REF_INSN (use); - src =3D SET_SRC (def_set); if ((GET_CODE (src) =3D=3D ZERO_EXTEND || GET_CODE (src) =3D=3D SIGN_EXTEND) && is_a <scalar_int_mode> (GET_MODE (src), &src_mode) --- 699,704 ---- *************** *** 1216,1354 **** && GET_MODE (XEXP (src, 0)) =3D=3D use_mode && !free_load_extend (src, def_insn) && (targetm.mode_rep_extended (int_use_mode, src_mode) ! !=3D (int) GET_CODE (src)) ! && all_uses_available_at (def_insn, use_insn)) ! return try_fwprop_subst (use, DF_REF_LOC (use), XEXP (src, 0), ! def_insn, false); } =20=20 return false; } =20=20 ! /* Try to replace USE with SRC (defined in DEF_INSN) in __asm. */ =20=20 static bool ! forward_propagate_asm (df_ref use, rtx_insn *def_insn, rtx def_set, rtx r= eg) { ! rtx_insn *use_insn =3D DF_REF_INSN (use); ! rtx src, use_pat, asm_operands, new_rtx, *loc; ! int speed_p, i; ! df_ref uses; !=20 ! gcc_assert ((DF_REF_FLAGS (use) & DF_REF_IN_NOTE) =3D=3D 0); !=20 ! src =3D SET_SRC (def_set); ! use_pat =3D PATTERN (use_insn); =20=20 ! /* In __asm don't replace if src might need more registers than ! reg, as that could increase register pressure on the __asm. */ ! uses =3D DF_INSN_USES (def_insn); ! if (uses && DF_REF_NEXT_LOC (uses)) return false; =20=20 !
update_df_init (def_insn, use_insn); ! speed_p =3D optimize_bb_for_speed_p (BLOCK_FOR_INSN (use_insn)); ! asm_operands =3D NULL_RTX; ! switch (GET_CODE (use_pat)) ! { ! case ASM_OPERANDS: ! asm_operands =3D use_pat; ! break; ! case SET: ! if (MEM_P (SET_DEST (use_pat))) ! { ! loc =3D &SET_DEST (use_pat); ! new_rtx =3D propagate_rtx (*loc, GET_MODE (*loc), reg, src, speed_p); ! if (new_rtx) ! validate_unshare_change (use_insn, loc, new_rtx, true); ! } ! asm_operands =3D SET_SRC (use_pat); ! break; ! case PARALLEL: ! for (i =3D 0; i < XVECLEN (use_pat, 0); i++) ! if (GET_CODE (XVECEXP (use_pat, 0, i)) =3D=3D SET) ! { ! if (MEM_P (SET_DEST (XVECEXP (use_pat, 0, i)))) ! { ! loc =3D &SET_DEST (XVECEXP (use_pat, 0, i)); ! new_rtx =3D propagate_rtx (*loc, GET_MODE (*loc), reg, ! src, speed_p); ! if (new_rtx) ! validate_unshare_change (use_insn, loc, new_rtx, true); ! } ! asm_operands =3D SET_SRC (XVECEXP (use_pat, 0, i)); ! } ! else if (GET_CODE (XVECEXP (use_pat, 0, i)) =3D=3D ASM_OPERANDS) ! asm_operands =3D XVECEXP (use_pat, 0, i); ! break; ! default: ! gcc_unreachable (); ! } =20=20 ! gcc_assert (asm_operands && GET_CODE (asm_operands) =3D=3D ASM_OPERANDS= ); ! for (i =3D 0; i < ASM_OPERANDS_INPUT_LENGTH (asm_operands); i++) ! { ! loc =3D &ASM_OPERANDS_INPUT (asm_operands, i); ! new_rtx =3D propagate_rtx (*loc, GET_MODE (*loc), reg, src, speed_p= ); ! if (new_rtx) ! validate_unshare_change (use_insn, loc, new_rtx, true); ! } =20=20 ! if (num_changes_pending () =3D=3D 0 || !apply_change_group ()) ! return false; =20=20 ! update_df (use_insn, NULL); ! num_changes++; ! return true; } =20=20 ! /* Try to replace USE with SRC (defined in DEF_INSN) and simplify the ! result. */ =20=20 static bool ! forward_propagate_and_simplify (df_ref use, rtx_insn *def_insn, rtx def_s= et) { ! rtx_insn *use_insn =3D DF_REF_INSN (use); ! rtx use_set =3D single_set (use_insn); ! rtx src, reg, new_rtx, *loc; ! bool set_reg_equal; ! machine_mode mode; ! int asm_use =3D -1; !=20 ! 
if (INSN_CODE (use_insn) < 0) ! asm_use =3D asm_noperands (PATTERN (use_insn)); =20=20 ! if (!use_set && asm_use < 0 && !DEBUG_INSN_P (use_insn)) return false; =20=20 ! /* Do not propagate into PC, CC0, etc. */ if (use_set && GET_MODE (SET_DEST (use_set)) =3D=3D VOIDmode) return false; =20=20 ! /* If def and use are subreg, check if they match. */ ! reg =3D DF_REF_REG (use); ! if (GET_CODE (reg) =3D=3D SUBREG && GET_CODE (SET_DEST (def_set)) =3D= =3D SUBREG) ! { ! if (maybe_ne (SUBREG_BYTE (SET_DEST (def_set)), SUBREG_BYTE (reg))) ! return false; ! } ! /* Check if the def had a subreg, but the use has the whole reg. */ ! else if (REG_P (reg) && GET_CODE (SET_DEST (def_set)) =3D=3D SUBREG) ! return false; ! /* Check if the use has a subreg, but the def had the whole reg. Unlik= e the ! previous case, the optimization is possible and often useful indeed.= */ ! else if (GET_CODE (reg) =3D=3D SUBREG && REG_P (SET_DEST (def_set))) ! reg =3D SUBREG_REG (reg); !=20 ! /* Make sure that we can treat REG as having the same mode as the ! source of DEF_SET. */ ! if (GET_MODE (SET_DEST (def_set)) !=3D GET_MODE (reg)) ! return false; !=20 ! /* Check if the substitution is valid (last, because it's the most ! expensive check!). */ ! src =3D SET_SRC (def_set); ! if (!CONSTANT_P (src) && !all_uses_available_at (def_insn, use_insn)) return false; =20=20 /* Check if the def is loading something from the constant pool; in this --- 707,779 ---- && GET_MODE (XEXP (src, 0)) =3D=3D use_mode && !free_load_extend (src, def_insn) && (targetm.mode_rep_extended (int_use_mode, src_mode) ! !=3D (int) GET_CODE (src))) ! return try_fwprop_subst (use, def_insn, loc, use_reg, XEXP (src, 0)); } =20=20 return false; } =20=20 ! /* Try to substitute (set DEST SRC) from DEF_INSN into USE and simplify ! the result, handling cases where DEST is used in a subreg and where ! applying that subreg to SRC results in a useful simplification. */ =20=20 static bool ! 
forward_propagate_subreg (use_info *use, insn_info *def_insn, ! rtx dest, rtx src) { ! if (!use->includes_subregs () || !REG_P (dest)) ! return false; =20=20 ! if (GET_CODE (src) !=3D SUBREG ! && GET_CODE (src) !=3D ZERO_EXTEND ! && GET_CODE (src) !=3D SIGN_EXTEND) return false; =20=20 ! rtx_insn *use_rtl =3D use->insn ()->rtl (); ! df_ref ref; =20=20 ! FOR_EACH_INSN_USE (ref, use_rtl) ! if (DF_REF_REGNO (ref) =3D=3D use->regno () ! && forward_propagate_subreg (use, def_insn, dest, src, ref)) ! return true; =20=20 ! FOR_EACH_INSN_EQ_USE (ref, use_rtl) ! if (DF_REF_REGNO (ref) =3D=3D use->regno () ! && forward_propagate_subreg (use, def_insn, dest, src, ref)) ! return true; =20=20 ! return false; } =20=20 ! /* Try to substitute (set DEST SRC) from DEF_INSN into USE and ! simplify the result. */ =20=20 static bool ! forward_propagate_and_simplify (use_info *use, insn_info *def_insn, ! rtx dest, rtx src) { ! insn_info *use_insn =3D use->insn (); ! rtx_insn *use_rtl =3D use_insn->rtl (); =20=20 ! /* ??? This check seems unnecessary. We should be able to propagate ! into any kind of instruction, regardless of whether it's a single se= t. ! It seems odd to be more permissive with asms than normal instruction= s. */ ! bool need_single_set =3D (!use_insn->is_asm () && !use_insn->is_debug_i= nsn ()); ! rtx use_set =3D single_set (use_rtl); ! if (need_single_set && !use_set) return false; =20=20 ! /* Do not propagate into PC, CC0, etc. !=20 ! ??? This too seems unnecessary. The current code should work correc= tly ! without it, including cases where jumps become unconditional. */ if (use_set && GET_MODE (SET_DEST (use_set)) =3D=3D VOIDmode) return false; =20=20 ! /* In __asm don't replace if src might need more registers than ! reg, as that could increase register pressure on the __asm. */ ! 
if (use_insn->is_asm () && def_insn->uses ().size () > 1) return false; =20=20 /* Check if the def is loading something from the constant pool; in this *************** *** 1357,1505 **** if (MEM_P (src) && MEM_READONLY_P (src)) { rtx x =3D avoid_constant_pool_reference (src); ! if (x !=3D src && use_set) { ! rtx note =3D find_reg_note (use_insn, REG_EQUAL, NULL_RTX); ! rtx old_rtx =3D note ? XEXP (note, 0) : SET_SRC (use_set); rtx new_rtx =3D simplify_replace_rtx (old_rtx, src, x); if (old_rtx !=3D new_rtx) ! set_unique_reg_note (use_insn, REG_EQUAL, copy_rtx (new_rtx)); } return false; } =20=20 ! if (asm_use >=3D 0) ! return forward_propagate_asm (use, def_insn, def_set, reg); !=20 ! /* Else try simplifying. */ !=20 ! if (DF_REF_TYPE (use) =3D=3D DF_REF_REG_MEM_STORE) ! { ! loc =3D &SET_DEST (use_set); ! set_reg_equal =3D false; ! } ! else if (!use_set) ! { ! loc =3D &INSN_VAR_LOCATION_LOC (use_insn); ! set_reg_equal =3D false; ! } ! else ! { ! rtx note =3D find_reg_note (use_insn, REG_EQUAL, NULL_RTX); ! if (DF_REF_FLAGS (use) & DF_REF_IN_NOTE) ! loc =3D &XEXP (note, 0); ! else ! loc =3D &SET_SRC (use_set); !=20 ! /* Do not replace an existing REG_EQUAL note if the insn is not ! recognized. Either we're already replacing in the note, or we'll ! separately try plugging the definition in the note and simplifying. ! And only install a REQ_EQUAL note when the destination is a REG ! that isn't mentioned in USE_SET, as the note would be invalid ! otherwise. We also don't want to install a note if we are merely ! propagating a pseudo since verifying that this pseudo isn't dead ! is a pain; moreover such a note won't help anything. ! If the use is a paradoxical subreg, make sure we don't add a ! REG_EQUAL note for it, because it is not equivalent, it is one ! possible value for it, but we can't rely on it holding that value. ! See PR70574. */ ! set_reg_equal =3D (note =3D=3D NULL_RTX ! && REG_P (SET_DEST (use_set)) ! && !REG_P (src) ! && !(GET_CODE (src) =3D=3D SUBREG ! 
&& REG_P (SUBREG_REG (src))) ! && !reg_mentioned_p (SET_DEST (use_set), ! SET_SRC (use_set)) ! && !paradoxical_subreg_p (DF_REF_REG (use))); ! } !=20 ! if (GET_MODE (*loc) =3D=3D VOIDmode) ! mode =3D GET_MODE (SET_DEST (use_set)); ! else ! mode =3D GET_MODE (*loc); !=20 ! new_rtx =3D propagate_rtx (*loc, mode, reg, src, ! optimize_bb_for_speed_p (BLOCK_FOR_INSN (use_insn))); !=20 ! if (!new_rtx) ! return false; !=20 ! return try_fwprop_subst (use, loc, new_rtx, def_insn, set_reg_equal); } =20=20 -=20 /* Given a use USE of an insn, if it has a single reaching definition, try to forward propagate it into that insn. ! Return true if cfg cleanup will be needed. REG_PROP_ONLY is true if we should only propagate register copies. */ =20=20 static bool ! forward_propagate_into (df_ref use, bool reg_prop_only =3D false) { ! df_ref def; ! rtx_insn *def_insn, *use_insn; ! rtx def_set; ! rtx parent; !=20 ! if (DF_REF_FLAGS (use) & DF_REF_READ_WRITE) ! return false; ! if (DF_REF_IS_ARTIFICIAL (use)) return false; =20=20 ! /* Only consider uses that have a single definition. */ ! def =3D get_def_for_use (use); if (!def) return false; - if (DF_REF_FLAGS (def) & DF_REF_READ_WRITE) - return false; - if (DF_REF_IS_ARTIFICIAL (def)) - return false; =20=20 ! /* Check if the use is still present in the insn! */ ! use_insn =3D DF_REF_INSN (use); ! if (DF_REF_FLAGS (use) & DF_REF_IN_NOTE) ! parent =3D find_reg_note (use_insn, REG_EQUAL, NULL_RTX); ! else ! parent =3D PATTERN (use_insn); =20=20 ! if (!reg_mentioned_p (DF_REF_REG (use), parent)) return false; =20=20 ! def_insn =3D DF_REF_INSN (def); ! if (multiple_sets (def_insn)) return false; ! def_set =3D single_set (def_insn); if (!def_set) return false; =20=20 ! if (reg_prop_only ! && (!reg_single_def_p (SET_SRC (def_set)) ! || !reg_single_def_p (SET_DEST (def_set)))) ! return false; =20=20 /* Allow propagations into a loop only for reg-to-reg copies, since replacing one register by another shouldn't increase the cost. */ =20=20 ! 
if (DF_REF_BB (def)->loop_father != DF_REF_BB (use)->loop_father ! && (!reg_single_def_p (SET_SRC (def_set)) ! || !reg_single_def_p (SET_DEST (def_set)))) return false; ! /* Only try one kind of propagation. If two are possible, we'll ! do it on the following iterations. */ ! if (forward_propagate_and_simplify (use, def_insn, def_set) ! || forward_propagate_subreg (use, def_insn, def_set)) ! { ! propagations_left--; - if (cfun->can_throw_non_call_exceptions - && find_reg_note (use_insn, REG_EH_REGION, NULL_RTX) - && purge_dead_edges (DF_REF_BB (use))) - return true; - } return false; } - static void fwprop_init (void) --- 782,876 ---- if (MEM_P (src) && MEM_READONLY_P (src)) { rtx x = avoid_constant_pool_reference (src); ! rtx note_set; ! if (x != src ! && (note_set = set_for_reg_notes (use_rtl)) ! && REG_P (SET_DEST (note_set)) ! && !contains_paradoxical_subreg_p (SET_SRC (note_set))) { ! rtx note = find_reg_note (use_rtl, REG_EQUAL, NULL_RTX); ! rtx old_rtx = note ? XEXP (note, 0) : SET_SRC (note_set); rtx new_rtx = simplify_replace_rtx (old_rtx, src, x); if (old_rtx != new_rtx) ! set_unique_reg_note (use_rtl, REG_EQUAL, copy_rtx (new_rtx)); } return false; } ! /* ??? Unconditionally propagating into PATTERN would work better ! for instructions that have match_dups. */ ! rtx *loc = need_single_set ? &use_set : &PATTERN (use_rtl); ! return try_fwprop_subst (use, def_insn, loc, dest, src); } /* Given a use USE of an insn, if it has a single reaching definition, try to forward propagate it into that insn. ! Return true if something changed. ! REG_PROP_ONLY is true if we should only propagate register copies. */ static bool ! forward_propagate_into (use_info *use, bool reg_prop_only = false) { ! if (use->includes_read_writes ()) return false; ! /* Disregard uninitialized uses. */ ! def_info *def = use->def (); if (!def) return false; ! /* Only consider single-register definitions.
This could be relaxed, ! but it should rarely be needed before RA. */ ! def = look_through_degenerate_phi (def); ! if (def->includes_multiregs ()) ! return false; ! /* Only consider uses whose definition comes from a real instruction. */ ! insn_info *def_insn = def->insn (); ! if (def_insn->is_artificial ()) return false; ! rtx_insn *def_rtl = def_insn->rtl (); ! if (!NONJUMP_INSN_P (def_rtl)) return false; ! /* ??? This seems an unnecessary restriction. We can easily tell ! which set the definition comes from. */ ! if (multiple_sets (def_rtl)) ! return false; ! rtx def_set = simple_regno_set (PATTERN (def_rtl), def->regno ()); if (!def_set) return false; ! rtx dest = SET_DEST (def_set); ! rtx src = SET_SRC (def_set); /* Allow propagations into a loop only for reg-to-reg copies, since replacing one register by another shouldn't increase the cost. */ + struct loop *def_loop = def_insn->bb ()->cfg_bb ()->loop_father; + struct loop *use_loop = use->bb ()->cfg_bb ()->loop_father; + if ((reg_prop_only || def_loop != use_loop) + && (!reg_single_def_p (dest) || !reg_single_def_p (src))) + return false; ! /* Don't substitute into a non-local goto, this confuses CFG. */ ! insn_info *use_insn = use->insn (); ! rtx_insn *use_rtl = use_insn->rtl (); ! if (JUMP_P (use_rtl) ! && find_reg_note (use_rtl, REG_NON_LOCAL_GOTO, NULL_RTX)) return false; ! /* Don't replace register asms in asm statements; we mustn't ! change the user's register allocation. */ ! if (use_insn->is_asm () && register_asm_p (dest)) ! return false; ! ! if (forward_propagate_and_simplify (use, def_insn, dest, src) ! || forward_propagate_subreg (use, def_insn, dest, src)) ! return true; return false; } static void fwprop_init (void) *************** *** 1513,1526 **** build_single_def_use_links. */ loop_optimizer_init (AVOID_CFG_MODIFICATIONS); ! build_single_def_use_links (); ! df_set_flags (DF_DEFER_INSN_RESCAN); ! !
active_defs = XNEWVEC (df_ref, max_reg_num ()); ! if (flag_checking) ! active_defs_check = sparseset_alloc (max_reg_num ()); ! ! propagations_left = DF_USES_TABLE_SIZE (); } static void --- 884,897 ---- build_single_def_use_links. */ loop_optimizer_init (AVOID_CFG_MODIFICATIONS); ! #if ADD_NOTES ! /* Not necessary with the SSA version, just makes comparing the dumps ! easier. */ ! df_set_flags (DF_EQ_NOTES); ! df_note_add_problem (); ! #endif ! df_analyze (); ! crtl->ssa = new rtl_ssa::function_info (cfun); } static void *************** *** 1528,1540 **** { loop_optimizer_finalize (); ! use_def_ref.release (); ! free (active_defs); ! if (flag_checking) ! sparseset_free (active_defs_check); ! free_dominance_info (CDI_DOMINATORS); cleanup_cfg (0); delete_trivially_dead_insns (get_insns (), max_reg_num ()); if (dump_file) --- 899,911 ---- { loop_optimizer_finalize (); ! crtl->ssa->perform_pending_updates (); free_dominance_info (CDI_DOMINATORS); cleanup_cfg (0); + + delete crtl->ssa; + crtl->ssa = nullptr; + delete_trivially_dead_insns (get_insns (), max_reg_num ()); if (dump_file) *************** *** 1543,1548 **** --- 914,954 ---- num_changes); } + /* Try to optimize INSN, returning true if something changes. + FWPROP_ADDR_P is true if we are running fwprop_addr rather than + the full fwprop. */ + + static bool + fwprop_insn (insn_info *insn, bool fwprop_addr_p) + { + for (use_info *use : insn->uses ()) + { + if (use->is_mem ()) + continue; + /* ??? The choices here follow those in the pre-SSA code. */ + if (!use->includes_address_uses ()) + { + if (forward_propagate_into (use, fwprop_addr_p)) + return true; + } + else + { + struct loop *loop = insn->bb ()->cfg_bb ()->loop_father; + /* The outermost loop is not really a loop.
*/ + if (loop == NULL || loop_outer (loop) == NULL) + { + if (forward_propagate_into (use, fwprop_addr_p)) + return true; + } + else if (fwprop_addr_p) + { + if (forward_propagate_into (use, false)) + return true; + } + } + } + return false; + } /* Main entry point. */ *************** *** 1555,1587 **** static unsigned int fwprop (bool fwprop_addr_p) { - unsigned i; - fwprop_init (); ! /* Go through all the uses. df_uses_create will create new ones at the ! end, and we'll go through them as well. Do not forward propagate addresses into loops until after unrolling. CSE did so because it was able to fix its own mess, but we are not. */ ! for (i = 0; i < DF_USES_TABLE_SIZE (); i++) ! { ! if (!propagations_left) ! break; ! ! df_ref use = DF_USES_GET (i); ! if (use) ! { ! if (DF_REF_TYPE (use) == DF_REF_REG_USE ! || DF_REF_BB (use)->loop_father == NULL ! /* The outer most loop is not really a loop. */ ! || loop_outer (DF_REF_BB (use)->loop_father) == NULL) ! forward_propagate_into (use, fwprop_addr_p); ! else if (fwprop_addr_p) ! forward_propagate_into (use, false); ! } } fwprop_done (); --- 961,993 ---- static unsigned int fwprop (bool fwprop_addr_p) { fwprop_init (); ! /* Go through all the instructions (including debug instructions) looking ! for uses that we could propagate into. Do not forward propagate addresses into loops until after unrolling. CSE did so because it was able to fix its own mess, but we are not. */ ! insn_info *next; ! /* ??? This code uses a worklist in order to preserve the behavior ! of the pre-SSA implementation. It would be better to instead ! iterate on each instruction until no more propagations are ! possible, then move on to the next. */ ! auto_vec<insn_info *> worklist; ! for (insn_info *insn = crtl->ssa->first_insn (); insn; insn = next) ! { ! next = insn->next_any_insn (); !
if (insn->can_be_optimized () || insn->is_debug_insn ()) ! if (fwprop_insn (insn, fwprop_addr_p)) ! worklist.safe_push (insn); ! } ! for (unsigned int i = 0; i < worklist.length (); ++i) ! { ! insn_info *insn = worklist[i]; ! if (fwprop_insn (insn, fwprop_addr_p)) ! worklist.safe_push (insn); } fwprop_done (); *** /tmp/dfQipa_test-return-const.c.before-fwprop.c 2020-11-13 08:23:52.853409199 +0000 --- gcc/testsuite/gcc.dg/rtl/x86_64/test-return-const.c.before-fwprop.c 2020-11-13 08:05:06.490403698 +0000 *************** *** 31,37 **** } /* Verify that insn 5 is eliminated. */ ! /* { dg-final { scan-rtl-dump "deferring deletion of insn with uid = 5" "fwprop1" } } */ /* { dg-final { scan-rtl-dump "Deleted 1 trivially dead insns" "fwprop1" } } */ int main (void) --- 31,37 ---- } /* Verify that insn 5 is eliminated. */ ! /* { dg-final { scan-rtl-dump "deleting insn with uid = 5" "fwprop1" } } */ /* { dg-final { scan-rtl-dump "Deleted 1 trivially dead insns" "fwprop1" } } */ int main (void) *** /tmp/XS4Rr9_st4_s8.c 2020-11-13 08:23:52.865409146 +0000 --- gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st4_s8.c 2020-11-13 08:05:06.490403698 +0000 *************** *** 74,80 **** /* ** st4_s8_32: ** [^{]* ! ** st4b {z0\.b - z3\.b}, p0, \[x[0-9]+\] ** ret */ TEST_STORE (st4_s8_32, svint8x4_t, int8_t, --- 74,80 ---- /* ** st4_s8_32: ** [^{]* ! ** st4b {z0\.b - z3\.b}, p0, \[x[0-9]+, x[0-9]+\] ** ret */ TEST_STORE (st4_s8_32, svint8x4_t, int8_t, *************** *** 135,141 **** /* ** st4_s8_m36: ** [^{]* ! ** st4b {z0\.b - z3\.b}, p0, \[x[0-9]+\] ** ret */ TEST_STORE (st4_s8_m36, svint8x4_t, int8_t, --- 135,141 ---- /* ** st4_s8_m36: ** [^{]* ! ** st4b {z0\.b - z3\.b}, p0, \[x[0-9]+, x[0-9]+\] ** ret */ TEST_STORE (st4_s8_m36, svint8x4_t, int8_t, *************** *** 205,211 **** /* ** st4_vnum_s8_32: ** [^{]* !
** st4b {z0\.b - z3\.b}, p0, \[x[0-9]+\] ** ret */ TEST_STORE (st4_vnum_s8_32, svint8x4_t, int8_t, --- 205,211 ---- /* ** st4_vnum_s8_32: ** [^{]* ! ** st4b {z0\.b - z3\.b}, p0, \[x[0-9]+, x[0-9]+\] ** ret */ TEST_STORE (st4_vnum_s8_32, svint8x4_t, int8_t, *************** *** 266,272 **** /* ** st4_vnum_s8_m36: ** [^{]* ! ** st4b {z0\.b - z3\.b}, p0, \[x[0-9]+\] ** ret */ TEST_STORE (st4_vnum_s8_m36, svint8x4_t, int8_t, --- 266,272 ---- /* ** st4_vnum_s8_m36: ** [^{]* ! ** st4b {z0\.b - z3\.b}, p0, \[x[0-9]+, x[0-9]+\] ** ret */ TEST_STORE (st4_vnum_s8_m36, svint8x4_t, int8_t, *** /tmp/nepSbd_st4_u8.c 2020-11-13 08:23:52.881409075 +0000 --- gcc/testsuite/gcc.target/aarch64/sve/acle/asm/st4_u8.c 2020-11-13 08:05:06.490403698 +0000 *************** *** 74,80 **** /* ** st4_u8_32: ** [^{]* ! ** st4b {z0\.b - z3\.b}, p0, \[x[0-9]+\] ** ret */ TEST_STORE (st4_u8_32, svuint8x4_t, uint8_t, --- 74,80 ---- /* ** st4_u8_32: ** [^{]* ! ** st4b {z0\.b - z3\.b}, p0, \[x[0-9]+, x[0-9]+\] ** ret */ TEST_STORE (st4_u8_32, svuint8x4_t, uint8_t, *************** *** 135,141 **** /* ** st4_u8_m36: ** [^{]* ! ** st4b {z0\.b - z3\.b}, p0, \[x[0-9]+\] ** ret */ TEST_STORE (st4_u8_m36, svuint8x4_t, uint8_t, --- 135,141 ---- /* ** st4_u8_m36: ** [^{]* ! ** st4b {z0\.b - z3\.b}, p0, \[x[0-9]+, x[0-9]+\] ** ret */ TEST_STORE (st4_u8_m36, svuint8x4_t, uint8_t, *************** *** 205,211 **** /* ** st4_vnum_u8_32: ** [^{]* ! ** st4b {z0\.b - z3\.b}, p0, \[x[0-9]+\] ** ret */ TEST_STORE (st4_vnum_u8_32, svuint8x4_t, uint8_t, --- 205,211 ---- /* ** st4_vnum_u8_32: ** [^{]* ! ** st4b {z0\.b - z3\.b}, p0, \[x[0-9]+, x[0-9]+\] ** ret */ TEST_STORE (st4_vnum_u8_32, svuint8x4_t, uint8_t, *************** *** 266,272 **** /* ** st4_vnum_u8_m36: ** [^{]* !
** st4b {z0\.b - z3\.b}, p0, \[x[0-9]+, x[0-9]+\] ** ret */ TEST_STORE (st4_vnum_u8_m36, svuint8x4_t, uint8_t,