From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id A5ADA385772B for ; Thu, 7 Sep 2023 13:42:59 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A5ADA385772B Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 210D71762; Thu, 7 Sep 2023 06:43:37 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 79BA63F766; Thu, 7 Sep 2023 06:42:58 -0700 (PDT) From: Richard Sandiford To: Robin Dapp Mail-Followup-To: Robin Dapp ,Jeff Law , gcc-patches , richard.sandiford@arm.com Cc: Jeff Law , gcc-patches Subject: Re: [PATCH] fwprop: Allow UNARY_P and check register pressure. References: <5a90c8a9-1570-5af4-bfdc-19d097bfee6e@gmail.com> <9acc1a24-5d01-40ad-b4b2-5948585d3e8c@gmail.com> <48bed106-190e-ab5f-4099-fdfd4f5a193f@gmail.com> <77daf152-e1fe-4980-8297-d37901d925e8@gmail.com> Date: Thu, 07 Sep 2023 14:42:57 +0100 In-Reply-To: (Robin Dapp's message of "Wed, 6 Sep 2023 13:22:12 +0200") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-25.1 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_SHORT,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Robin Dapp writes: > Hi Richard, > > I did some testing with the attached v2 that does not restrict to UNARY > anymore. As feared ;) there is some more fallout that I'm detailing below. > > On Power there is one guality fail (pr43051-1.c) that I would take > the liberty of ignoring for now. > > On x86 there are four fails: > > - cond_op_addsubmuldiv__Float16-2.c: assembler error > unsupported masking for `vmovsh'. I guess that's a latent backend > problem. > > - ifcvt-3.c, pr49095.c: Here we propagate into a compare. Before, we had > (cmp (reg/CC) 0) and now we have (cmp (plus (reg1 reg2) 0). > That looks like a costing problem and can hopefully solveable by making > the second compare more expensive, preventing the propagation. > i386 costing (or every costing?) is brittle so that could well break other > things. > > - pr88873.c: This is interesting because even before this patch we > propagated with different register classes (V2DF vs DI). With the patch > we check the register pressure, find the class NO_REGS for V2DF and > abort (because the patch assumes NO_REGS = high pressure). I'm thinking > of keeping the old behavior for reg-reg propagations and only checking > the pressure for more complex operations. > > aarch64 has the most fails: > > - One guality fail (same as Power). > - shrn-combine-[123].c as before. > > - A class of (hopefully, I only checked some) similar cases where we > propagate an unspec_whilelo into an unspec_ptest. Before we would only > set a REG_EQUALS note. > Before we managed to create a while_ultsivnx16bi_cc whereas now we have > while_ultsivnx16bi and while_ultsivnx16bi_ptest that won't be combined. > We create redundant whilelos and I'm not sure how to improve that. I > guess a peephole is out of the question :) Yeah, I think this is potentially a blocker for propagating A into B when A is used elsewhere. Combine is able to combine A and B while keeping A in parallel with the result. I think either fwprop would need to try that too, or it would need to be restricted to cases where A is only used in B. I don't think that's really a UNARY_P-or-not thing. The same problem would in principle apply to plain unary operations. > - pred-combine-and.c: Here the new propagation appears useful at first. I'm not sure about that, because... > We propagate a "vector mask and" into a while_ultsivnx4bi_ptest and the > individual and registers remain live up to the propagation site (while > being dead before the patch). > With the registers dead, combine could create a single fcmgt before. > Now it only manages a 2->2 combination because we still need the registers > and end up with two fcmgts. > The code is worse but this seems more bad luck than anything. ...the fwprop1 change is: (insn 26 25 27 4 (set (reg:VNx4BI 102 [ vec_mask_and_64 ]) (and:VNx4BI (reg:VNx4BI 116 [ mask__30.13 ]) (reg:VNx4BI 98 [ loop_mask_58 ]))) 8420 {andvnx4bi3} (nil)) ... (insn 31 30 32 4 (set (reg:VNx4BI 106 [ mask__24.18 ]) (and:VNx4BI (reg:VNx4BI 118 [ mask__25.17 ]) (reg:VNx4BI 102 [ vec_mask_and_64 ]))) 8420 {andvnx4bi3} (nil)) to: (insn 26 25 27 4 (set (reg:VNx4BI 102 [ vec_mask_and_64 ]) (and:VNx4BI (reg:VNx4BI 116 [ mask__30.13 ]) (reg:VNx4BI 98 [ loop_mask_58 ]))) 8420 {andvnx4bi3} (expr_list:REG_DEAD (reg:VNx4BI 116 [ mask__30.13 ]) (expr_list:REG_DEAD (reg:VNx4BI 98 [ loop_mask_58 ]) (nil)))) ... (insn 31 30 32 4 (set (reg:VNx4BI 106 [ mask__24.18 ]) (and:VNx4BI (and:VNx4BI (reg:VNx4BI 116 [ mask__30.13 ]) (reg:VNx4BI 98 [ loop_mask_58 ])) (reg:VNx4BI 118 [ mask__25.17 ]))) 8428 {aarch64_pred_andvnx4bi_z} (nil)) On its own this isn't worse. But it's also not a win. The before and after sequences have equal cost, but the after sequence is more complex. That would probably be OK for something that runs near the end of the pre-RA pipeline, since it could in principle increase parallelism. But it's probably a bad idea when we know that the main instruction combination pass is still to run. By making insn 31 more complex, we're making it a less likely combination candidate. So this isn't necesarily the wrong thing to do. But I think it is the wrong time to do it. > - Addressing fails from before: I looked into these and suspect all of > them are a similar. > What happens is that we have a poly_int offset that we shift, negate > and then add to x0. The result is used as load address. > Before, we would pull (combine) the (plus x0 reg) into the load keeping > the neg and shift. > Now we propagate everything into a single (set (minus x0 offset)). > The propagation itself seems worthwhile because we save one insn. > However as we got rid of the base/offset split by lumping everything > together, combine cannot pull the (plus) into the address load and > we require an aarch64_split_add_offset. This will emit the longer > sequence of ashiftl and subtract. The "base" address is x0 here so > we cannot convert (minus x0 ...)) into neg. > I didn't go through all of aarch64_split_add_offset. I suppose we > could re-add the separation of base/offset there but that might be > a loss when the result is not used as an address. Yeah, at least for ld4_s8.c, I agree there are improvements as well as regressions. And we don't use (shifted) index addressing for ld4_s16.c etc even before the patch. So I agree those changes are OK. I don't know whether this will show up elsewhere. But we can ignore it for these particular instances. I think the summary is: IMO, we have to be mindful that combine is still to run. We need to avoid making equal-cost changes if the new form is more complex, or otherwise likely to interfere with combine. Alternatively, we could delay the optimisation until after combine and have freer rein, since we're then just mopping up opportunities that other passes left behind. A while back I was experimenting with a second combine pass. That was the original motiviation for rtl-ssa. I never got chance to finish it off though. We could do with something similar for PR106594. Thanks, Richard > Again, all in all no fatal problems but pretty annoying :) It's not much > but just gradually worse than with just UNARY. Any idea on how/whether to > continue? > > Regards > Robin > > gcc/ChangeLog: > > * fwprop.cc (fwprop_propagation::profitable_p): Add unary > handling. > (fwprop_propagation::update_register_pressure): New function. > (fwprop_propagation::register_pressure_high_p): New function > (reg_single_def_for_src_p): Look through unary expressions. > (try_fwprop_subst_pattern): Check register pressure. > (forward_propagate_into): Call new function. > (fwprop_init): Init register pressure. > (fwprop_done): Clean up register pressure. > (fwprop_insn): Add comment. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/autovec/binop/vadd-vx-fwprop.c: New test. > --- > gcc/fwprop.cc | 359 +++++++++++++++++- > .../riscv/rvv/autovec/binop/vadd-vx-fwprop.c | 64 ++++ > 2 files changed, 419 insertions(+), 4 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-vx-fwprop.c > > diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc > index 0707a234726..ce6f5a74b00 100644 > --- a/gcc/fwprop.cc > +++ b/gcc/fwprop.cc > @@ -36,6 +36,10 @@ along with GCC; see the file COPYING3. If not see > #include "tree-pass.h" > #include "rtl-iter.h" > #include "target.h" > +#include "dominance.h" > + > +#include "ira.h" > +#include "regpressure.h" > > /* This pass does simple forward propagation and simplification when an > operand of an insn can only come from a single def. This pass uses > @@ -103,6 +107,10 @@ using namespace rtl_ssa; > > static int num_changes; > > +/* Keep track of which registers already increased the pressure to avoid double > + booking. */ > +sbitmap pressure_accounted; > + > /* Do not try to replace constant addresses or addresses of local and > argument slots. These MEM expressions are made only once and inserted > in many instructions, as well as being used to control symbol table > @@ -181,6 +189,8 @@ namespace > bool changed_mem_p () const { return result_flags & CHANGED_MEM; } > bool folded_to_constants_p () const; > bool profitable_p () const; > + bool register_pressure_high_p (rtx, rtx, rtx_insn *, rtx_insn *) const; > + bool update_register_pressure (rtx, rtx, rtx_insn *, rtx_insn *) const; > > bool check_mem (int, rtx) final override; > void note_simplification (int, uint16_t, rtx, rtx) final override; > @@ -340,18 +350,272 @@ fwprop_propagation::profitable_p () const > && !paradoxical_subreg_p (to)) > return true; > > - if (CONSTANT_P (to)) > + /* Disallow propagations into debug insns when it is not a reg-reg > + propagation. Their costs are usually 0 and a cost comparison will > + always be profitable no matter how much we complicate the pattern. */ > + if (DEBUG_INSN_P (insn)) > + return false; > + > + /* For more complex operations check each sub-rtx. */ > + subrtx_iterator::array_type array; > + FOR_EACH_SUBRTX (iter, array, to, NONCONST) > + { > + const_rtx x = *iter; > + > + if (REG_P (x)) > + continue; > + > + else if (GET_CODE (x) == SUBREG > + && REG_P (SUBREG_REG (x)) > + && !paradoxical_subreg_p (x)) > + continue; > + > + else if (rtx_equal_p (x, to)) > + continue; > + > + else > + return false; > + } > + > + return true; > +} > + > +/* Check if the register pressure in any predecessor block of USE's block > + until DEF's block is equal or higher than the number of hardregs in NU's > + register class. */ > +bool > +fwprop_propagation::register_pressure_high_p (rtx nu, rtx old, rtx_insn *def, > + rtx_insn *use) const > +{ > + enum reg_class nu_class, old_class; > + int nu_nregs, old_nregs; > + nu_class = regpressure_get_regno_pressure_class (REGNO (nu), &nu_nregs); > + old_class > + = regpressure_get_regno_pressure_class (REGNO (old), &old_nregs); > + > + if (nu_class == NO_REGS && old_class == NO_REGS) > return true; > > - return false; > This patch enables the forwarding of sources from more complex > operations like unary (i.e. vec_duplicate) or binary. As this > potentially involves replacing a register with a register of a different > class the ira_hoist_pressure machinery is used to calculate the change > in register pressure. If the propagation would increase the pressure > beyond the number of hard regs, we don't perform it. > + if (nu_class == old_class) > + return false; > + > + basic_block bbfrom = BLOCK_FOR_INSN (def); > + basic_block bbto = BLOCK_FOR_INSN (use); > + > + basic_block bb; > + > + sbitmap visited = sbitmap_alloc (last_basic_block_for_fn (cfun)); > + bitmap_clear (visited); > + auto_vec q; > + q.safe_push (bbto); > + > + bool high = false; > + > + while (!q.is_empty ()) > + { > + bb = q.pop (); > + > + if (bitmap_bit_p (visited, bb->index)) > + continue; > + > + /* Nothing to do if the register to be replaced is not live > + in this BB. */ > + if (bb != bbfrom && !regpressure_is_live_in (bb, REGNO (old))) > + continue; > + > + /* Nothing to do if the replacement register is already live in > + this BB. */ > + if (regpressure_is_live_in (bb, REGNO (nu))) > + continue; > + > + bitmap_set_bit (visited, bb->index); > + > + int press = regpressure_get (bb, nu_class); > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, "pressure for reg %d in bb %d: %d\n", > + REGNO (nu), bb->index, press); > + > + if (press + nu_nregs > ira_class_hard_regs_num[nu_class]) > + { > + high = true; > + break; > + } > + > + if (bb == bbfrom || bb == ENTRY_BLOCK_PTR_FOR_FN (cfun)) > + break; > + > + edge e; > + edge_iterator ei; > + > + FOR_EACH_EDGE (e, ei, bb->preds) > + { > + basic_block pred = e->src; > + q.safe_push (pred); > + } > + } > + > + sbitmap_free (visited); > + > + return high; > } > > -/* Check that X has a single def. */ > +/* Update the register pressure when propagating from DEF to USE > + replacing OLD with NU. Return TRUE if the pressure was updated. */ > +bool > +fwprop_propagation::update_register_pressure (rtx nu, rtx old, rtx_insn *def, > + rtx_insn *use) const > +{ > + basic_block bb; > + > + enum reg_class nu_class, old_class; > + int nu_nregs, old_nregs; > + nu_class = regpressure_get_regno_pressure_class (REGNO (nu), &nu_nregs); > + old_class = regpressure_get_regno_pressure_class (REGNO (old), &old_nregs); > + > + bool changed = false; > + > + if (nu_class == old_class) > + return false; > + > + basic_block bbfrom = BLOCK_FOR_INSN (def); > + basic_block bbto = BLOCK_FOR_INSN (use); > + > + sbitmap visited = sbitmap_alloc (last_basic_block_for_fn (cfun)); > + bitmap_clear (visited); > + auto_vec q; > + q.safe_push (bbto); > + > + /* Precompute the users. */ > + auto_vec users; > + df_ref duse; > + bool should_decrease = true; > + FOR_EACH_INSN_USE (duse, def) > + { > + rtx dreg = DF_REF_REAL_REG (duse); > + if (REGNO (dreg) != REGNO (old)) > + continue; > + > + df_ref op_ref = DF_REG_USE_CHAIN (REGNO (dreg)); > + for (; op_ref; op_ref = DF_REF_NEXT_REG (op_ref)) > + { > + if (!DF_REF_INSN_INFO (op_ref)) > + continue; > + > + rtx_insn *insn = DF_REF_INSN (op_ref); > + > + /* If there are other users in BBTO, never decrease the > + pressure. */ > + if (BLOCK_FOR_INSN (insn) == bbto && insn != use) > + { > + should_decrease = false; > + break; > + } > + > + users.safe_push (insn); > + } > + } > + > + while (!q.is_empty ()) > + { > + bb = q.pop (); > + > + if (bitmap_bit_p (visited, bb->index)) > + continue; > + > + /* Nothing to do if the register to be replaced is not live > + in this BB. */ > + if (bb != bbfrom && !regpressure_is_live_in (bb, REGNO (old))) > + continue; > + > + /* Nothing to do if the new register is already live in this BB. */ > + if (regpressure_is_live_in (bb, REGNO (nu))) > + continue; > + > + bitmap_set_bit (visited, bb->index); > + > + if (regpressure_is_live_in (bb, REGNO (old))) > + { > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, "increasing pressure for " > + "reg %d in bb %d by %d reg(s).\n", > + REGNO (nu), bb->index, nu_nregs); > + regpressure_set_live_in (bb, REGNO (nu)); > + regpressure_increase (bb, nu_class, nu_nregs); > + changed = true; > + > + if (should_decrease) > + { > + rtx_insn *user; > + int ix; > + bool should_decrease_local = true; > + > + /* If this BB dominates no BB where OLD is used (except BBTO) > + the register pressure is decreased. */ > + FOR_EACH_VEC_ELT(users, ix, user) > + { > + basic_block ubb = BLOCK_FOR_INSN (user); > + if (ubb == bbto) > + continue; > + if (dominated_by_p (CDI_DOMINATORS, ubb, bb)) > + { > + should_decrease_local = false; > + break; > + } > + } > + > + if (should_decrease_local) > + { > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, "decreasing pressure for " > + "reg %d in bb %d by %d reg(s).\n", > + REGNO (old), bb->index, old_nregs); > + regpressure_clear_live_in (bb, REGNO (old)); > + regpressure_decrease (bb, old_class, old_nregs); > + } > + } > + } > + > + if (bb == bbfrom || bb == ENTRY_BLOCK_PTR_FOR_FN (cfun)) > + break; > + > + edge e; > + edge_iterator ei; > + > + FOR_EACH_EDGE (e, ei, bb->preds) > + { > + basic_block pred = e->src; > + q.safe_push (pred); > + } > + } > + > + sbitmap_free (visited); > + return changed; > +} > + > +/* Check that X has a single def. In case of more complex operations > + check each sub-rtx. */ > > static bool > reg_single_def_p (rtx x) > { > - return REG_P (x) && crtl->ssa->single_dominating_def (REGNO (x)); > + if (REG_P (x)) > + return crtl->ssa->single_dominating_def (REGNO (x)); > + > + subrtx_var_iterator::array_type array; > + FOR_EACH_SUBRTX_VAR (iter, array, x, ALL) > + { > + rtx sub = *iter; > + > + if (SUBREG_P (x) && !contains_paradoxical_subreg_p (sub)) > + sub = SUBREG_REG (x); > + > + bool ok = REG_P (sub) && crtl->ssa->single_dominating_def (REGNO (sub)); > + > + if (!ok) > + return false; > + } > + > + return true; > } > > /* Try to substitute (set DEST SRC), which defines DEF, into note NOTE of > @@ -490,6 +754,81 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change, > } > } > > + /* Perform register-pressure checks for all sub-rtxs. > + If FLAG_IRA_HOIST_PRESSURE is set only allow the propagation if the > + register pressure is not high. > + Otherwise only allow the propagation if DEF has only a single use. */ > + if (ok) > + { > + rtx_insn *def_rtl = def_insn->rtl (); > + > + subrtx_var_iterator::array_type array; > + FOR_EACH_SUBRTX_VAR (iter, array, src, NONCONST) > + { > + rtx srcreg = *iter; > + rtx dstreg = dest; > + > + /* For register pressure we do not care about subregs so strip > + them. */ > + while (SUBREG_P (dstreg)) > + dstreg = SUBREG_REG (dstreg); > + > + while (SUBREG_P (srcreg)) > + srcreg = SUBREG_REG (srcreg); > + > + if (!REG_P (dstreg) || !REG_P (srcreg)) > + continue; > + > + /* Nothing to do for the same class. */ > + bool same_class = regpressure_same_class (dstreg, srcreg); > + if (same_class) > + { > + if (flag_ira_hoist_pressure) > + continue; > + else > + { > + /* Check if this is the only use. If not do not > + continue. */ > + for (use_info *use : def->all_uses ()) > + if (use->insn () != use_insn) > + return false; > + } > + } > + > + /* Never allow different pressure classes when not considering > + register pressure. */ > + if (!same_class && !flag_ira_hoist_pressure) > + return false; > + > + if (!bitmap_bit_p (pressure_accounted, REGNO (srcreg))) > + { > + if (!prop.register_pressure_high_p (srcreg, dstreg, def_rtl, > + use_rtl)) > + { > + bool updated > + = prop.update_register_pressure (srcreg, dstreg, def_rtl, > + use_rtl); > + if (updated) > + bitmap_set_bit (pressure_accounted, REGNO (srcreg)); > + } > + else > + { > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, "cannot propagate from insn %d into" > + " insn %d: %s\n", def_insn->uid (), > + use_insn->uid (), > + "would increase register pressure beyond number" > + " of hard regs"); > + ok = false; > + } > + } > + else > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, "pressure for reg %d has already" > + " been accounted\n", REGNO (srcreg)); > + } > + } > + > if (!ok) > { > /* The pattern didn't match, but if all uses of SRC folded to > @@ -538,6 +877,7 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change, > confirm_change_group (); > crtl->ssa->change_insn (use_change); > num_changes++; > + > return true; > } > > @@ -890,6 +1230,10 @@ fwprop_init (void) > > df_analyze (); > crtl->ssa = new rtl_ssa::function_info (cfun); > + > + regpressure_init (); > + pressure_accounted = sbitmap_alloc (DF_REG_SIZE (df)); > + bitmap_clear (pressure_accounted); > } > > static void > @@ -910,6 +1254,9 @@ fwprop_done (void) > fprintf (dump_file, > "\nNumber of successful forward propagations: %d\n\n", > num_changes); > + > + regpressure_cleanup (); > + sbitmap_free (pressure_accounted); > } > > /* Try to optimize INSN, returning true if something changes. > @@ -919,6 +1266,10 @@ fwprop_done (void) > static bool > fwprop_insn (insn_info *insn, bool fwprop_addr_p) > { > + /* TODO: when rejecting propagations due to register pressure > + we would actually like to try the propagation with most > + potential = uses first instead of going through them sequentially. */ > + > for (use_info *use : insn->uses ()) > { > if (use->is_mem ()) > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-vx-fwprop.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-vx-fwprop.c > new file mode 100644 > index 00000000000..0ed1bb80f73 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-vx-fwprop.c > @@ -0,0 +1,64 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-O2 -march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=fixed-vlmax -fdump-rtl-fwprop1-details" } */ > + > +/* Check that we propagate from vec_duplicates into vadd.vv, turning it into > + vadd.vx. */ > + > +#include > + > +/* Define 26 variables so that we can perform two forward propagations until we > + reach the hard reg limit of 28. */ > +uint64_t *restrict dst1; > +uint64_t *restrict dst2; > +uint64_t *restrict dst3; > +uint64_t *restrict dst4; > +uint64_t *restrict dst5; > +uint64_t *restrict dst6; > +uint64_t *restrict dst7; > +uint64_t *restrict dst8; > +uint64_t *restrict dst9; > +uint64_t *restrict dst10; > +uint64_t *restrict dst11; > +uint64_t *restrict a1; > +uint64_t *restrict a2; > +uint64_t *restrict a3; > +uint64_t *restrict a4; > +uint64_t *restrict a5; > +uint64_t *restrict a6; > +uint64_t *restrict a7; > +uint64_t *restrict a8; > +uint64_t *restrict a9; > +uint64_t *restrict a10; > +uint64_t *restrict a11; > +uint64_t b1; > +uint64_t b2; > +uint64_t b3; > +uint64_t b4; > + > +void foo (int n) > +{ > + __builtin_expect (n % 128, 0); > + for (int i = 0; i < n; i++) > + { > + /* We expect b3 to be propagated into vadd here as well as in the other > + 7 occurences (without increasing the pressure every time). > + Then b2 is still supposed to be propagated but b1 is not even though > + it would be better to do so. */ > + dst1[i] = a1[i] + b3; > + dst2[i] = a2[i] + b3; > + dst3[i] = a3[i] + b2; > + dst4[i] = a4[i] + b1; > + dst5[i] = a5[i] + b1; > + dst6[i] = a6[i] + b3; > + dst7[i] = a7[i] + b3; > + dst8[i] = a8[i] + b3; > + dst9[i] = a9[i] + b3; > + dst10[i] = a10[i] + b3; > + dst11[i] = a11[i] + b3; > + } > +} > + > +/* { dg-final { scan-rtl-dump-times "increasing pressure for reg" 2 "fwprop1" } } */ > +/* { dg-final { scan-rtl-dump-times "pressure for reg \[0-9\]+ has already been accounted" 7 "fwprop1" } } */ > +/* { dg-final { scan-rtl-dump-times "would increase register pressure beyond number of hard regs" 2 "fwprop1" } } */ > +/* { dg-final { scan-assembler-times "\tvadd\.vx" 9 } } */