From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lj1-x22a.google.com (mail-lj1-x22a.google.com [IPv6:2a00:1450:4864:20::22a]) by sourceware.org (Postfix) with ESMTPS id ECA733858D28 for ; Mon, 7 Aug 2023 10:27:02 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org ECA733858D28 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-lj1-x22a.google.com with SMTP id 38308e7fff4ca-2b9db1de50cso65808821fa.3 for ; Mon, 07 Aug 2023 03:27:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1691404021; x=1692008821; h=content-transfer-encoding:to:subject:from:content-language:cc :user-agent:mime-version:date:message-id:from:to:cc:subject:date :message-id:reply-to; bh=EY/mt2adNXYap0BQN0ANeeoFUSoPxDXmGMUL+xzLqns=; b=YUI8T0ax98v5p0urEv3ERK0L2HF9DkzYxy/foOVTMP59a6wjxfctNVu4Jb9B1HWtMW SMvWzdZo6HN3XLs1Jj+mbBCYTDzFk450C8NKfu19Wd7yo8rjAN2b8kbT8P4H06ZOn1HY qjZJ48L44DFDBWj7exVLEvYpGDsHzDMH2ZOBGwvsslG8xakzTEcE9hFl0SfnJDk0OLVP IShVg87UQE4IevA50xw0I1KbKMnn6h7zXDNsr4tC7T26WL6bumb+WSSH9NXccpx+ekbl RhOAp5yFYS+uKUUxb903tR69tZeo9IpVYNpZxQdFsGgrQbhJKAxCH7AJdApAH5TU/xtS 6OlA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691404021; x=1692008821; h=content-transfer-encoding:to:subject:from:content-language:cc :user-agent:mime-version:date:message-id:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=EY/mt2adNXYap0BQN0ANeeoFUSoPxDXmGMUL+xzLqns=; b=dZiluY6nJmcYam7tcmtaqiM51vfA5wVdgsfvhxTz8LiBtCt055+CvVeYRBiQWV/bZ2 zq7SQCVXgCKm9lZ6e+3OMUNdoMg9ctjbxzntGHCpXpudYggEGd/TFRcXR5MAmqhaa0tp Z2WFf/ycHQo3aXf7hVtuwHv3HR3X/Vi/h5GPJHje1jl4tqfBi1qCGntp0Q3E+T1qni8U FO2rA1ALPU7mJVBqEswXIzDMjyUh5JhfKSuZpjaTmJbRQW55DYF51gwxaJSzno4MkT5Z qLenyXtIrL+eQLqgdv8keq3f4wywn84wW20Ji7M3p0SxHurEuw6ur9yrXItnZTCORycQ IQfg== X-Gm-Message-State: AOJu0YxCEt5MEmqn5wwYLqB0lCIjELDkH4PPJrgAP2duXuTWdLa2dWbl Y/v9wd+7mShQ8p72gwGBhy4pBrXN50g= X-Google-Smtp-Source: AGHT+IFpjW9b/MKp+CmffWleol9QSNO4a5hnAek7CbZMnEwTKEm0M515aXfKARUlYRdcX6BYnq5mdQ== X-Received: by 2002:a2e:a163:0:b0:2b6:dd26:c02a with SMTP id u3-20020a2ea163000000b002b6dd26c02amr6163776ljl.14.1691404020822; Mon, 07 Aug 2023 03:27:00 -0700 (PDT) Received: from [192.168.1.23] (ip-046-005-130-086.um12.pools.vodafone-ip.de. [46.5.130.86]) by smtp.gmail.com with ESMTPSA id e14-20020a170906374e00b0099ce025f8ccsm746794ejc.186.2023.08.07.03.27.00 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 07 Aug 2023 03:27:00 -0700 (PDT) Message-ID: <5a90c8a9-1570-5af4-bfdc-19d097bfee6e@gmail.com> Date: Mon, 7 Aug 2023 12:26:59 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 Cc: rdapp.gcc@gmail.com Content-Language: en-US From: Robin Dapp Subject: [PATCH] fwprop: Allow UNARY_P and check register pressure. To: gcc-patches , "richard.sandiford" Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-9.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi, originally inspired by the wish to transform vmv v3, a0 ; = vec_duplicate vadd.vv v1, v2, v3 into vadd.vx v1, v2, a0 via fwprop for riscv, this patch enables the forward propagation of UNARY_P sources. As this involves potentially replacing a vector register with a scalar register the ira_hoist_pressure machinery is used to calculate the change in register pressure. If the propagation would increase the pressure beyond the number of hard regs, we don't perform it. The regpressure commit this patch depends on is not yet pushed because I figured I'd post the code using it in case of further comments. The testsuite is unchanged on i386 and power10 but aarch64 has some new FAILs but I'm not terribly familiar with aarch64, hence some examples here: The following cases shrn-combine-1/2/3 seem worse as we emit one instruction more. Before: shrn v30.8b, v30.8h, 2 shrn2 v30.16b, v31.8h, 2 str q30, [x1, x3] After: ushr v30.8h, v30.8h, 2 ushr v31.8h, v31.8h, 2 uzp1 v31.16b, v30.16b, v31.16b str q31, [x1, x3] Here, one optimization already happens in fwprop so combine cannot do the same work it did before. vec-init-22-speed.c changes from sxth w0, w0 sxth w1, w1 fmov d31, x0 fmov d0, x1 to: dup v31.4h, w0 dup v0.4h, w1 which I hope has the same semantics and is shorter. Apart from that there are numerous check-asm testcases where a new, earlier propagation prevents a later, supposedly better propagation. One such example is from ld4_s8.c: (insn 11 10 12 2 (set (reg:DI 100) (neg:DI (reg:DI 102))) 385 {negdi2} (expr_list:REG_EQUAL (const_poly_int:DI [-576, -576]) (nil))) (insn 12 11 13 2 (set (reg/f:DI 99) (plus:DI (reg/v/f:DI 97 [ x0 ]) (reg:DI 100))) 153 {*adddi3_aarch64} (expr_list:REG_EQUAL (plus:DI (reg/v/f:DI 97 [ x0 ]) (const_poly_int:DI [-576, -576])) (nil))) (insn 13 12 14 2 (set (reg/v:VNx64QI 94 [ z0 ]) (unspec:VNx64QI [ (reg/v:VNx16BI 96 [ p0 ]) (mem:VNx64QI (reg/f:DI 99) [0 MEM [(signed char *)_1]+0 S[64, 64] A8]) ] UNSPEC_LDN)) 5885 {vec_mask_load_lanesvnx64qivnx16qi} (nil)) where we now do: propagating insn 11 into insn 12, replacing: (set (reg/f:DI 99) (plus:DI (reg/v/f:DI 97 [ x0 ]) (reg:DI 100))) successfully matched this instruction to subdi3: (set (reg/f:DI 99) (minus:DI (reg/v/f:DI 97 [ x0 ]) (reg:DI 102))) vs before: cannot propagate from insn 11 into insn 12: would increase complexity of pattern propagating insn 12 into insn 13, replacing: (set (reg/v:VNx64QI 94 [ z0 ]) (unspec:VNx64QI [ (reg/v:VNx16BI 96 [ p0 ]) (mem:VNx64QI (reg/f:DI 99) [0 MEM [(signed char *)_1]+0 S[64, 64] A8]) ] UNSPEC_LDN)) successfully matched this instruction to vec_mask_load_lanesvnx64qivnx16qi: (set (reg/v:VNx64QI 94 [ z0 ]) (unspec:VNx64QI [ (reg/v:VNx16BI 96 [ p0 ]) (mem:VNx64QI (plus:DI (reg/v/f:DI 97 [ x0 ]) (reg:DI 100)) [0 MEM [(signed char *)_1]+0 S[64, 64] A8]) ] UNSPEC_LDN)) All in all this seems like a general problem with earlier optimizations preventing later ones and surely all of those could be fixed individually. Still, the question remains if the general approach is useful or desired or if we not rather prevent more optimizations that we gain. Suggestions welcome. I have some riscv tests for it but didn't attach them yet in order to focus on the main part first. Regards Robin gcc/ChangeLog: * fwprop.cc (fwprop_propagation::profitable_p): Add unary handling. (fwprop_propagation::update_register_pressure): New function. (fwprop_propagation::register_pressure_high_p): New function (reg_single_def_for_src_p): Look through unary expressions. (try_fwprop_subst_pattern): Check register pressure. (forward_propagate_into): Call new function. (fwprop_init): Init register pressure. (fwprop_done): Clean up register pressure. (fwprop_insn): Add comment. --- gcc/fwprop.cc | 307 ++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 300 insertions(+), 7 deletions(-) diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc index 0707a234726..413fe4e7335 100644 --- a/gcc/fwprop.cc +++ b/gcc/fwprop.cc @@ -36,6 +36,10 @@ along with GCC; see the file COPYING3. If not see #include "tree-pass.h" #include "rtl-iter.h" #include "target.h" +#include "dominance.h" + +#include "ira.h" +#include "regpressure.h" /* This pass does simple forward propagation and simplification when an operand of an insn can only come from a single def. This pass uses @@ -103,6 +107,10 @@ using namespace rtl_ssa; static int num_changes; +/* Keep track of which registers already increased the pressure to avoid double + booking. */ +sbitmap pressure_accounted; + /* Do not try to replace constant addresses or addresses of local and argument slots. These MEM expressions are made only once and inserted in many instructions, as well as being used to control symbol table @@ -181,6 +189,8 @@ namespace bool changed_mem_p () const { return result_flags & CHANGED_MEM; } bool folded_to_constants_p () const; bool profitable_p () const; + bool register_pressure_high_p (rtx, rtx, rtx_insn *, rtx_insn *) const; + void update_register_pressure (rtx, rtx, rtx_insn *, rtx_insn *) const; bool check_mem (int, rtx) final override; void note_simplification (int, uint16_t, rtx, rtx) final override; @@ -332,25 +342,243 @@ fwprop_propagation::profitable_p () const && (result_flags & PROFITABLE)) return true; - if (REG_P (to)) + /* Only continue with an unary operation if we consider register + pressure. */ + rtx what = copy_rtx (to); + if (UNARY_P (what) && flag_ira_hoist_pressure) + what = XEXP (what, 0); + + if (REG_P (what)) return true; - if (GET_CODE (to) == SUBREG - && REG_P (SUBREG_REG (to)) - && !paradoxical_subreg_p (to)) + if (GET_CODE (what) == SUBREG + && REG_P (SUBREG_REG (what)) + && !paradoxical_subreg_p (what)) return true; - if (CONSTANT_P (to)) + if (CONSTANT_P (what)) return true; return false; } -/* Check that X has a single def. */ +/* Check if the register pressure in any predecessor block of USE's block + until DEF's block is equal or higher to the number of hardregs in NU's + register class. */ +bool +fwprop_propagation::register_pressure_high_p (rtx nu, rtx old, rtx_insn *def, + rtx_insn *use) const +{ + enum reg_class nu_class, old_class; + int nu_nregs, old_nregs; + nu_class = regpressure_get_regno_pressure_class (REGNO (nu), &nu_nregs); + old_class + = regpressure_get_regno_pressure_class (REGNO (old), &old_nregs); + + if (nu_class == NO_REGS && old_class == NO_REGS) + return true; + + if (nu_class == old_class) + return false; + + basic_block bbfrom = BLOCK_FOR_INSN (def); + basic_block bbto = BLOCK_FOR_INSN (use); + + basic_block bb; + + sbitmap visited = sbitmap_alloc (last_basic_block_for_fn (cfun)); + bitmap_clear (visited); + auto_vec q; + q.safe_push (bbto); + + while (!q.is_empty ()) + { + bb = q.pop (); + + if (bitmap_bit_p (visited, bb->index)) + continue; + + /* Nothing to do if the register to be replaced is not live + in this BB. */ + if (bb != bbfrom && !regpressure_is_live_in (bb, REGNO (old))) + continue; + + /* Nothing to do if the replacement register is already live in + this BB. */ + if (regpressure_is_live_in (bb, REGNO (nu))) + continue; + + bitmap_set_bit (visited, bb->index); + + int press = regpressure_get (bb, nu_class); + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "pressure for reg %d in bb %d: %d\n", + REGNO (nu), bb->index, press); + + if (press + nu_nregs > ira_class_hard_regs_num[nu_class]) + return true; + + if (bb == bbfrom || bb == ENTRY_BLOCK_PTR_FOR_FN (cfun)) + break; + + edge e; + edge_iterator ei; + + FOR_EACH_EDGE (e, ei, bb->preds) + { + basic_block pred = e->src; + q.safe_push (pred); + } + } + + sbitmap_free (visited); + + return false; +} + +/* Update the register pressure when propagating from DEF to USE + replacing OLD with NU. */ +void +fwprop_propagation::update_register_pressure (rtx nu, rtx old, rtx_insn *def, + rtx_insn *use) const +{ + basic_block bb; + + enum reg_class nu_class, old_class; + int nu_nregs, old_nregs; + nu_class = regpressure_get_regno_pressure_class (REGNO (nu), &nu_nregs); + old_class = regpressure_get_regno_pressure_class (REGNO (old), &old_nregs); + + if (nu_class == old_class) + return; + + basic_block bbfrom = BLOCK_FOR_INSN (def); + basic_block bbto = BLOCK_FOR_INSN (use); + + sbitmap visited = sbitmap_alloc (last_basic_block_for_fn (cfun)); + bitmap_clear (visited); + auto_vec q; + q.safe_push (bbto); + + /* Precompute the users. */ + auto_vec users; + df_ref duse; + bool should_decrease = true; + FOR_EACH_INSN_USE (duse, def) + { + rtx dreg = DF_REF_REAL_REG (duse); + if (REGNO (dreg) != REGNO (old)) + continue; + + df_ref op_ref = DF_REG_USE_CHAIN (REGNO (dreg)); + for (; op_ref; op_ref = DF_REF_NEXT_REG (op_ref)) + { + if (!DF_REF_INSN_INFO (op_ref)) + continue; + + rtx_insn *insn = DF_REF_INSN (op_ref); + + /* If there are other users in BBTO, never decrease the + pressure. */ + if (BLOCK_FOR_INSN (insn) == bbto && insn != use) + { + should_decrease = false; + break; + } + + users.safe_push (insn); + } + } + + while (!q.is_empty ()) + { + bb = q.pop (); + + if (bitmap_bit_p (visited, bb->index)) + continue; + + /* Nothing to do if the register to be replaced is not live + in this BB. */ + if (bb != bbfrom && !regpressure_is_live_in (bb, REGNO (old))) + continue; + + /* Nothing to do if the new register is already live in this BB. */ + if (regpressure_is_live_in (bb, REGNO (nu))) + continue; + + bitmap_set_bit (visited, bb->index); + + if (regpressure_is_live_in (bb, REGNO (old))) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "increasing pressure for " + "reg %d in bb %d by %d regs.\n", + REGNO (nu), bb->index, nu_nregs); + regpressure_set_live_in (bb, REGNO (nu)); + regpressure_increase (bb, nu_class, nu_nregs); + + if (should_decrease) + { + rtx_insn *user; + int ix; + bool should_decrease_local = true; + + /* If this BB dominates no BB where OLD is used (except BBTO) + the register pressure is decreased. */ + FOR_EACH_VEC_ELT(users, ix, user) + { + basic_block ubb = BLOCK_FOR_INSN (user); + if (ubb == bbto) + continue; + if (dominated_by_p (CDI_DOMINATORS, ubb, bb)) + { + should_decrease_local = false; + break; + } + } + + if (should_decrease_local) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "decreasing pressure for " + "reg %d in bb %d by %d regs.\n", + REGNO (nu), bb->index, old_nregs); + regpressure_clear_live_in (bb, REGNO (old)); + regpressure_decrease (bb, old_class, old_nregs); + } + } + } + + if (bb == bbfrom || bb == ENTRY_BLOCK_PTR_FOR_FN (cfun)) + break; + + edge e; + edge_iterator ei; + + FOR_EACH_EDGE (e, ei, bb->preds) + { + basic_block pred = e->src; + q.safe_push (pred); + } + } + + sbitmap_free (visited); +} + +/* Check that X has a single def. Also look through unary operations if + we take register pressure into consideration. */ static bool reg_single_def_p (rtx x) { + if (UNARY_P (x) && flag_ira_hoist_pressure) + { + x = XEXP (x, 0); + + if (SUBREG_P (x) && !contains_paradoxical_subreg_p (x)) + x = SUBREG_REG (x); + } + return REG_P (x) && crtl->ssa->single_dominating_def (REGNO (x)); } @@ -490,6 +718,53 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change, } } + if (ok && UNARY_P (src)) + { + /* Currently the register classes can only be different with a + unary operation like vec_duplicate. + In that case the propagation can increase the register pressure + on the way to the target BB. + Therefore, we first check if the pressure of the respective register + is already high, i.e. equals or exceeds the number of hardregs for + that class. If not, we allow the propagation and update the + pressure. */ + + rtx src1 = src; + + /* Strip unary operation and possible subreg. */ + src1 = XEXP (src, 0); + if (GET_CODE (src1) == SUBREG) + src1 = SUBREG_REG (src1); + + rtx_insn *def_rtl = def_insn->rtl (); + + if (REG_P (dest) && REG_P (src1)) + { + if (!bitmap_bit_p (pressure_accounted, REGNO (src1))) + { + if (!prop.register_pressure_high_p (src1, dest, def_rtl, use_rtl)) + { + prop.update_register_pressure (src1, dest, def_rtl, use_rtl); + bitmap_set_bit (pressure_accounted, REGNO (src1)); + } + else + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "cannot propagate from insn %d into" + " insn %d: %s\n", def_insn->uid (), + use_insn->uid (), + "would increase register pressure beyond number" + " of hard regs"); + ok = false; + } + } + else + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "pressure for reg %d has already" + " been accounted\n", REGNO (src1)); + } + } + if (!ok) { /* The pattern didn't match, but if all uses of SRC folded to @@ -859,7 +1134,7 @@ forward_propagate_into (use_info *use, bool reg_prop_only = false) if ((reg_prop_only || (def_loop != use_loop && !flow_loop_nested_p (use_loop, def_loop))) - && (!reg_single_def_p (dest) || !reg_single_def_p (src))) + && !reg_single_def_p (dest)) return false; /* Don't substitute into a non-local goto, this confuses CFG. */ @@ -890,6 +1165,14 @@ fwprop_init (void) df_analyze (); crtl->ssa = new rtl_ssa::function_info (cfun); + + /* Calculate register pressure for each basic block. */ + if (flag_ira_hoist_pressure) + { + regpressure_init (); + + pressure_accounted = sbitmap_alloc (DF_REG_SIZE (df)); + } } static void @@ -910,6 +1193,12 @@ fwprop_done (void) fprintf (dump_file, "\nNumber of successful forward propagations: %d\n\n", num_changes); + + if (flag_ira_hoist_pressure) + { + regpressure_cleanup (); + sbitmap_free (pressure_accounted); + } } /* Try to optimize INSN, returning true if something changes. @@ -919,6 +1208,10 @@ fwprop_done (void) static bool fwprop_insn (insn_info *insn, bool fwprop_addr_p) { + /* TODO: when rejecting propagations due to register pressure + we would actually like to try the propagation with most + potential = uses first instead of going through them + sequentially. */ for (use_info *use : insn->uses ()) { if (use->is_mem ()) -- 2.41.0