public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
*  [PATCH PR95696] regrename creates overlapping register allocations for vliw
@ 2020-07-16 16:18 zhongyunde
  2020-07-20  0:59 ` Zhongyunde
  0 siblings, 1 reply; 5+ messages in thread
From: zhongyunde @ 2020-07-16 16:18 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1715 bytes --]

hi,   Insometarget,itislimitedtoissuetwoinsnstogetherwithchangethesameregister,so Imakeapatchtoextendtheliverangeuntiltheendofvliwtoavoidit.(Theinsn73startwithinsn:TI,soitwillbeissuedtogetherwithothersinsnsuntilanewinsnstartwithinsn:TI,suchasinsn71)TheregrenamecanknownthemodeV2VFininsn73needtwosuccessiveregisters,i.e.v2andv3,hereisdumpsnippetbeforetheregrename.(insn:TI7376714(set(reg/v:V2VF37v2[orig:180_62][180])(unspec:V2VF[(reg/v:VHF43v8[orig:210Dest_value][210])(reg/v:VHF43v8[orig:210Dest_value][210])]UNSPEC_HFSQMAG_32X32))"../test_modify.c":57710{hfsqmag_v2vf}(expr_list:REG_DEAD(reg/v:VHF43v8[orig:210Dest_value][210])(expr_list:REG_UNUSED(reg:VHF38v3)(expr_list:REG_STAGE(const_int2[0x2])(expr_list:REG_CYCLE(const_int2[0x2])(expr_list:REG_UNITS(const_int256[0x100])(nil)))))))(insn71732434(set(reg:VHF43v8[orig:265MEM[(constvfloat32x16*)Src_base_134]][265])(mem:VHF(reg/v/f:DI13a13[orig:207Src_base][207])[1MEM[(constvfloat32x16*)Src_base_134]+0S64A512]))"../test_modify.c":56450{movvhf_internal}(expr_list:REG_STAGE(const_int1[0x1])(expr_list:REG_CYCLE(const_int2[0x2])(nil))))Then,intheregrename,theinsn71willbetransformedintofollowingcodewithregisterv3,sothereisanconflictbetweeninsn73andinsn71,asbothofthemsetthev3register.Registerv2(2):73[SVEC_REGS]Registerv8(1):71[VEC_ALL_REGS]....(insn71732434(set(reg:VHF38v3[orig:265MEM[(constvfloat32x16*)Src_base_134]][265])(mem:VHF(reg/v/f:DI13a13[orig:207Src_base][207])[1MEM[(constvfloat32x16*)Src_base_134]+0S64A512]))"../test_modify.c":56450{movvhf_internal}(expr_list:REG_STAGE(const_int1[0x1]) (expr_list:REG_CYCLE(const_int2[0x2]) 随心邮-在微信里收发邮件,及时省电又安心

[-- Attachment #2: PR95696.patch --]
[-- Type: application/octet-stream, Size: 2658 bytes --]

diff --git a/gcc/regrename.c b/gcc/regrename.c
index c38173a77..e54794413 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -1614,12 +1614,26 @@ record_out_operands (rtx_insn *insn, bool earlyclobber, insn_rr_info *insn_info)
   cur_operand = NULL;
 }
 
+/* Get the first real insn of next vliw in current BB.  */
+static rtx_insn *
+get_next_vliw_first_insn (rtx_insn *cur_insn, basic_block bb)
+{
+  rtx_insn *insn = next_real_insn (cur_insn);
+
+  for (; insn && insn != BB_END (bb); insn = next_real_insn (insn))
+    if (GET_MODE (insn) == TImode)
+      return insn;
+
+  return cur_insn;
+}
+
 /* Build def/use chain.  */
 
 static bool
 build_def_use (basic_block bb)
 {
   rtx_insn *insn;
+  rtx_insn *vliw_start_insn = NULL;
   unsigned HOST_WIDE_INT untracked_operands;
 
   fail_current_block = false;
@@ -1663,6 +1677,9 @@ build_def_use (basic_block bb)
 	     to be marked unrenamable or even cause us to abort the entire
 	     basic block.  */
 
+	  if (GET_MODE (insn) == TImode)
+	    vliw_start_insn = insn;
+
 	  extract_constrain_insn (insn);
 	  preprocess_constraints (insn);
 	  const operand_alternative *op_alt = which_op_alt ();
@@ -1858,17 +1875,26 @@ build_def_use (basic_block bb)
 	      scan_rtx (insn, &XEXP (note, 0), ALL_REGS, mark_access,
 			OP_INOUT);
 
-	  /* Step 7: Close chains for registers that were never
-	     really used here.  */
-	  for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
-	    if (REG_NOTE_KIND (note) == REG_UNUSED)
-	      {
-		remove_from_hard_reg_set (&live_hard_regs,
-					  GET_MODE (XEXP (note, 0)),
-					  REGNO (XEXP (note, 0)));
-		scan_rtx (insn, &XEXP (note, 0), NO_REGS, terminate_dead,
-			  OP_IN);
-	      }
+	  /* Step 7: Close chains for registers that were never
+	     really used delayed at the end of vliw.  */
+	  if (vliw_start_insn
+	      && next_real_insn (insn) == get_next_vliw_first_insn (insn, bb))
+	    {
+	      rtx_insn *member;
+
+	      for (member = vliw_start_insn;
+	           member != next_active_insn (insn);
+	           member = next_active_insn (member))
+	        for (note = REG_NOTES (member); note; note = XEXP (note, 1))
+	          if (REG_NOTE_KIND (note) == REG_UNUSED)
+	            {
+	              remove_from_hard_reg_set (&live_hard_regs,
+	                                        GET_MODE (XEXP (note, 0)),
+	                                        REGNO (XEXP (note, 0)));
+	              scan_rtx (member, &XEXP (note, 0), NO_REGS, terminate_dead,
+	                        OP_IN);
+	            }
+	    }
 
 	  /* Step 8: Kill the chains involving register restores.  Those
 	     should restore _that_ register.  */

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH PR95696] regrename creates overlapping register allocations for vliw
  2020-07-16 16:18 [PATCH PR95696] regrename creates overlapping register allocations for vliw zhongyunde
@ 2020-07-20  0:59 ` Zhongyunde
  2020-07-20 16:05   ` Richard Sandiford
  0 siblings, 1 reply; 5+ messages in thread
From: Zhongyunde @ 2020-07-20  0:59 UTC (permalink / raw)
  To: gcc-patches; +Cc: Zhanghaijian (A), Yangfei (A)

[-- Attachment #1: Type: text/plain, Size: 2312 bytes --]

Hi,

In most target, it is limited to issue two insns with change the same register. So a register is not realy unused if there is another insn, which set the register in the save VLIW.

For example, The insn 73 start with insn:TI, so it will be issued together with others insns until a new insn start with insn:TI, here is insn 243.

The regrename pass known the mode V2VF in insn 73 need two successive registers, i.e. v2 and v3, here is dump snippet before the regrename:



==========================================================================================

(insn:TI 73 76 71 4 (set (reg/v:V2VF 37 v2 [orig:180 _62 ] [180])

        (unspec:V2VF [

                (reg/v:VHF 43 v8 [orig:210 Dest_value ] [210])

                (reg/v:VHF 43 v8 [orig:210 Dest_value ] [210])

            ] UNSPEC_HFSQMAG_32X32)) "../test_modify.c":57 710 {hfsqmag_v2vf}

     (expr_list:REG_DEAD (reg/v:VHF 43 v8 [orig:210 Dest_value ] [210])

        (expr_list:REG_UNUSED (reg:VHF 38 v3)

            (expr_list:REG_STAGE (const_int 2 [0x2])

                (expr_list:REG_CYCLE (const_int 2 [0x2])

                    (expr_list:REG_UNITS (const_int 256 [0x100])

                        (nil)))))))



(insn 71 73 243 4 (set (reg:VHF 43 v8 [orig:265 MEM[(const vfloat32x16 *)Src_base_134] ] [265])

        (mem:VHF (reg/v/f:DI 13 a13 [orig:207 Src_base ] [207]) [1 MEM[(const vfloat32x16 *)Src_base_134]+0 S64 A512])) "../test_modify.c":56 450 {movvhf_internal}

     (expr_list:REG_STAGE (const_int 1 [0x1])

        (expr_list:REG_CYCLE (const_int 2 [0x2])

            (nil))))



(insn:TI 243 …



Then, in the regrename, the insn 71 will be transformed as following code with register v3, and there is an conflict between insn 73 and insn 71 (as both of them set the v3 register).



Register v2 (2): 73 [SVEC_REGS]

Register v8 (1): 71 [VEC_ALL_REGS]



....



(insn 71 73 243 4 (set (reg:VHF 38 v3 [orig:265 MEM[(const vfloat32x16 *)Src_base_134] ] [265])

        (mem:VHF (reg/v/f:DI 13 a13 [orig:207 Src_base ] [207]) [1 MEM[(const vfloat32x16 *)Src_base_134]+0 S64 A512])) "../test_modify.c":56 450 {movvhf_internal}

     (expr_list:REG_STAGE (const_int 1 [0x1])

        (expr_list:REG_CYCLE (const_int 2 [0x2])

[-- Attachment #2: PR95696.patch --]
[-- Type: application/octet-stream, Size: 2736 bytes --]

diff --git a/gcc/regrename.c b/gcc/regrename.c
index c38173a77..e54794413 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -1614,12 +1614,26 @@ record_out_operands (rtx_insn *insn, bool earlyclobber, insn_rr_info *insn_info)
   cur_operand = NULL;
 }
 
+/* Get the first real insn of next vliw in current BB.  */
+static rtx_insn *
+get_next_vliw_first_insn (rtx_insn *cur_insn, basic_block bb)
+{
+  rtx_insn *insn = next_real_insn (cur_insn);
+
+  for (; insn && insn != BB_END (bb); insn = next_real_insn (insn))
+    if (GET_MODE (insn) == TImode)
+      return insn;
+
+  return cur_insn;
+}
+
 /* Build def/use chain.  */
 
 static bool
 build_def_use (basic_block bb)
 {
   rtx_insn *insn;
+  rtx_insn *vliw_start_insn = NULL;
   unsigned HOST_WIDE_INT untracked_operands;
 
   fail_current_block = false;
@@ -1663,6 +1677,9 @@ build_def_use (basic_block bb)
 	     to be marked unrenamable or even cause us to abort the entire
 	     basic block.  */
 
+	  if (GET_MODE (insn) == TImode)
+	    vliw_start_insn = insn;
+
 	  extract_constrain_insn (insn);
 	  preprocess_constraints (insn);
 	  const operand_alternative *op_alt = which_op_alt ();
@@ -1858,17 +1875,26 @@ build_def_use (basic_block bb)
 	      scan_rtx (insn, &XEXP (note, 0), ALL_REGS, mark_access,
 			OP_INOUT);
 
-	  /* Step 7: Close chains for registers that were never
-	     really used here.  */
-	  for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
-	    if (REG_NOTE_KIND (note) == REG_UNUSED)
-	      {
-		remove_from_hard_reg_set (&live_hard_regs,
-					  GET_MODE (XEXP (note, 0)),
-					  REGNO (XEXP (note, 0)));
-		scan_rtx (insn, &XEXP (note, 0), NO_REGS, terminate_dead,
-			  OP_IN);
-	      }
+	  /* Step 7: Close chains for registers that were never
+	     really used delayed at the end of vliw.  */
+	  if (vliw_start_insn
+	      && next_real_insn (insn) == get_next_vliw_first_insn (insn, bb))
+	    {
+	      rtx_insn *member;
+
+	      for (member = vliw_start_insn;
+	           member != next_active_insn (insn);
+	           member = next_active_insn (member))
+	        for (note = REG_NOTES (member); note; note = XEXP (note, 1))
+	          if (REG_NOTE_KIND (note) == REG_UNUSED)
+	            {
+	              remove_from_hard_reg_set (&live_hard_regs,
+	                                        GET_MODE (XEXP (note, 0)),
+	                                        REGNO (XEXP (note, 0)));
+	              scan_rtx (member, &XEXP (note, 0), NO_REGS, terminate_dead,
+	                        OP_IN);
+	            }
+	    }
 
 	  /* Step 8: Kill the chains involving register restores.  Those
 	     should restore _that_ register.  */

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH PR95696] regrename creates overlapping register allocations for vliw
  2020-07-20  0:59 ` Zhongyunde
@ 2020-07-20 16:05   ` Richard Sandiford
  2020-07-21  7:20     ` 答复: " Zhongyunde
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Sandiford @ 2020-07-20 16:05 UTC (permalink / raw)
  To: Zhongyunde; +Cc: gcc-patches, Yangfei (A)

Hi,

Zhongyunde <zhongyunde@huawei.com> writes:
> Hi,
>
> In most target, it is limited to issue two insns with change the same register. So a register is not realy unused if there is another insn, which set the register in the save VLIW.
>
> For example, The insn 73 start with insn:TI, so it will be issued together with others insns until a new insn start with insn:TI, here is insn 243.
>
> The regrename pass known the mode V2VF in insn 73 need two successive registers, i.e. v2 and v3, here is dump snippet before the regrename:
>
>
>
> ==========================================================================================
>
> (insn:TI 73 76 71 4 (set (reg/v:V2VF 37 v2 [orig:180 _62 ] [180])
>
>         (unspec:V2VF [
>
>                 (reg/v:VHF 43 v8 [orig:210 Dest_value ] [210])
>
>                 (reg/v:VHF 43 v8 [orig:210 Dest_value ] [210])
>
>             ] UNSPEC_HFSQMAG_32X32)) "../test_modify.c":57 710 {hfsqmag_v2vf}
>
>      (expr_list:REG_DEAD (reg/v:VHF 43 v8 [orig:210 Dest_value ] [210])
>
>         (expr_list:REG_UNUSED (reg:VHF 38 v3)
>
>             (expr_list:REG_STAGE (const_int 2 [0x2])
>
>                 (expr_list:REG_CYCLE (const_int 2 [0x2])
>
>                     (expr_list:REG_UNITS (const_int 256 [0x100])
>
>                         (nil)))))))
>
>
>
> (insn 71 73 243 4 (set (reg:VHF 43 v8 [orig:265 MEM[(const vfloat32x16 *)Src_base_134] ] [265])
>
>         (mem:VHF (reg/v/f:DI 13 a13 [orig:207 Src_base ] [207]) [1 MEM[(const vfloat32x16 *)Src_base_134]+0 S64 A512])) "../test_modify.c":56 450 {movvhf_internal}
>
>      (expr_list:REG_STAGE (const_int 1 [0x1])
>
>         (expr_list:REG_CYCLE (const_int 2 [0x2])
>
>             (nil))))
>
>
>
> (insn:TI 243 …
>
>
>
> Then, in the regrename, the insn 71 will be transformed as following code with register v3, and there is an conflict between insn 73 and insn 71 (as both of them set the v3 register).
>
>
>
> Register v2 (2): 73 [SVEC_REGS]
>
> Register v8 (1): 71 [VEC_ALL_REGS]
>
>
>
> ....
>
>
>
> (insn 71 73 243 4 (set (reg:VHF 38 v3 [orig:265 MEM[(const vfloat32x16 *)Src_base_134] ] [265])
>
>         (mem:VHF (reg/v/f:DI 13 a13 [orig:207 Src_base ] [207]) [1 MEM[(const vfloat32x16 *)Src_base_134]+0 S64 A512])) "../test_modify.c":56 450 {movvhf_internal}
>
>      (expr_list:REG_STAGE (const_int 1 [0x1])
>
>         (expr_list:REG_CYCLE (const_int 2 [0x2])

Do you need this for correctness, or is it “just” an optimisation?

The reason for asking is that regrename runs quite a long time after
the first scheduling pass, with things like register allocation
happening in between.  The :TI markers on the instructions are
therefore going to be very stale by the time that regrename runs.

If it's just a heuristic then it might be OK to use them anyway,
but it would be worth having a comment to say what's going on.

One alternative would be to run the DFA over the instructions
to recompute the VLIW bundles, but that isn't simple.  It also isn't
necessarily a better heuristic, since the second scheduling pass is
likely to change things anyway.

There's also the problem that :TI markers are used on non-VLIW
targets too, whereas the problem is really specific to VLIW targets.

Unfortunately, I can't think of any good counter-suggestions
at the moment…

Thanks,
Richard

>
> diff --git a/gcc/regrename.c b/gcc/regrename.c
> index c38173a77..e54794413 100644
> --- a/gcc/regrename.c
> +++ b/gcc/regrename.c
> @@ -1614,12 +1614,26 @@ record_out_operands (rtx_insn *insn, bool earlyclobber, insn_rr_info *insn_info)
>    cur_operand = NULL;
>  }
>  
> +/* Get the first real insn of next vliw in current BB.  */
> +static rtx_insn *
> +get_next_vliw_first_insn (rtx_insn *cur_insn, basic_block bb)
> +{
> +  rtx_insn *insn = next_real_insn (cur_insn);
> +
> +  for (; insn && insn != BB_END (bb); insn = next_real_insn (insn))
> +    if (GET_MODE (insn) == TImode)
> +      return insn;
> +
> +  return cur_insn;
> +}
> +
>  /* Build def/use chain.  */
>  
>  static bool
>  build_def_use (basic_block bb)
>  {
>    rtx_insn *insn;
> +  rtx_insn *vliw_start_insn = NULL;
>    unsigned HOST_WIDE_INT untracked_operands;
>  
>    fail_current_block = false;
> @@ -1663,6 +1677,9 @@ build_def_use (basic_block bb)
>  	     to be marked unrenamable or even cause us to abort the entire
>  	     basic block.  */
>  
> +	  if (GET_MODE (insn) == TImode)
> +	    vliw_start_insn = insn;
> +
>  	  extract_constrain_insn (insn);
>  	  preprocess_constraints (insn);
>  	  const operand_alternative *op_alt = which_op_alt ();
> @@ -1858,17 +1875,26 @@ build_def_use (basic_block bb)
>  	      scan_rtx (insn, &XEXP (note, 0), ALL_REGS, mark_access,
>  			OP_INOUT);
>  
> -	  /* Step 7: Close chains for registers that were never
> -	     really used here.  */
> -	  for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
> -	    if (REG_NOTE_KIND (note) == REG_UNUSED)
> -	      {
> -		remove_from_hard_reg_set (&live_hard_regs,
> -					  GET_MODE (XEXP (note, 0)),
> -					  REGNO (XEXP (note, 0)));
> -		scan_rtx (insn, &XEXP (note, 0), NO_REGS, terminate_dead,
> -			  OP_IN);
> -	      }
> +	  /* Step 7: Close chains for registers that were never
> +	     really used delayed at the end of vliw.  */
> +	  if (vliw_start_insn
> +	      && next_real_insn (insn) == get_next_vliw_first_insn (insn, bb))
> +	    {
> +	      rtx_insn *member;
> +
> +	      for (member = vliw_start_insn;
> +	           member != next_active_insn (insn);
> +	           member = next_active_insn (member))
> +	        for (note = REG_NOTES (member); note; note = XEXP (note, 1))
> +	          if (REG_NOTE_KIND (note) == REG_UNUSED)
> +	            {
> +	              remove_from_hard_reg_set (&live_hard_regs,
> +	                                        GET_MODE (XEXP (note, 0)),
> +	                                        REGNO (XEXP (note, 0)));
> +	              scan_rtx (member, &XEXP (note, 0), NO_REGS, terminate_dead,
> +	                        OP_IN);
> +	            }
> +	    }
>  
>  	  /* Step 8: Kill the chains involving register restores.  Those
>  	     should restore _that_ register.  */

^ permalink raw reply	[flat|nested] 5+ messages in thread

* 答复: [PATCH PR95696] regrename creates overlapping register allocations for vliw
  2020-07-20 16:05   ` Richard Sandiford
@ 2020-07-21  7:20     ` Zhongyunde
  2020-07-21 16:12       ` Richard Sandiford
  0 siblings, 1 reply; 5+ messages in thread
From: Zhongyunde @ 2020-07-21  7:20 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, Yangfei (A)

Thanks for your review.

First of all, this is an optimization.
   gcc do sms before reload, and here each insn use pseudo-register. After reload, they are allocated hard-register, then the regrename pass try to adjust the register number with def/use chain created by build_def_use.
 As now gcc doesn't consider the VLIW bundles, so regrename pass may updated a reg which may not really unused, which will bring in invalid VLIW bundles. 
   Before the final schedule, we usually recheck the validation of VLIW bundles, and reschedule the conflicted insns into two VLIW to make them validation to avoid above issue, so this is not a correctness issue. 
 Certainly, reschedule the conflicted insns into two VLIW will destroy the kernel loop's sms schedule result, and usually it will be harmful to the performance.


-----邮件原件-----
发件人: Richard Sandiford [mailto:richard.sandiford@arm.com] 
发送时间: 2020年7月21日 0:05
收件人: Zhongyunde <zhongyunde@huawei.com>
抄送: gcc-patches@gcc.gnu.org; Yangfei (A) <lancelo.yang@huawei.com>
主题: Re: [PATCH PR95696] regrename creates overlapping register allocations for vliw

Hi,

Zhongyunde <zhongyunde@huawei.com> writes:
> Hi,
>
> In most target, it is limited to issue two insns with change the same register. So a register is not realy unused if there is another insn, which set the register in the save VLIW.
>
> For example, The insn 73 start with insn:TI, so it will be issued together with others insns until a new insn start with insn:TI, here is insn 243.
>
> The regrename pass known the mode V2VF in insn 73 need two successive registers, i.e. v2 and v3, here is dump snippet before the regrename:
>
>
>
> ======================================================================
> ====================
>
> (insn:TI 73 76 71 4 (set (reg/v:V2VF 37 v2 [orig:180 _62 ] [180])
>
>         (unspec:V2VF [
>
>                 (reg/v:VHF 43 v8 [orig:210 Dest_value ] [210])
>
>                 (reg/v:VHF 43 v8 [orig:210 Dest_value ] [210])
>
>             ] UNSPEC_HFSQMAG_32X32)) "../test_modify.c":57 710 
> {hfsqmag_v2vf}
>
>      (expr_list:REG_DEAD (reg/v:VHF 43 v8 [orig:210 Dest_value ] 
> [210])
>
>         (expr_list:REG_UNUSED (reg:VHF 38 v3)
>
>             (expr_list:REG_STAGE (const_int 2 [0x2])
>
>                 (expr_list:REG_CYCLE (const_int 2 [0x2])
>
>                     (expr_list:REG_UNITS (const_int 256 [0x100])
>
>                         (nil)))))))
>
>
>
> (insn 71 73 243 4 (set (reg:VHF 43 v8 [orig:265 MEM[(const vfloat32x16 
> *)Src_base_134] ] [265])
>
>         (mem:VHF (reg/v/f:DI 13 a13 [orig:207 Src_base ] [207]) [1 
> MEM[(const vfloat32x16 *)Src_base_134]+0 S64 A512])) 
> "../test_modify.c":56 450 {movvhf_internal}
>
>      (expr_list:REG_STAGE (const_int 1 [0x1])
>
>         (expr_list:REG_CYCLE (const_int 2 [0x2])
>
>             (nil))))
>
>
>
> (insn:TI 243 …
>
>
>
> Then, in the regrename, the insn 71 will be transformed as following code with register v3, and there is an conflict between insn 73 and insn 71 (as both of them set the v3 register).
>
>
>
> Register v2 (2): 73 [SVEC_REGS]
>
> Register v8 (1): 71 [VEC_ALL_REGS]
>
>
>
> ....
>
>
>
> (insn 71 73 243 4 (set (reg:VHF 38 v3 [orig:265 MEM[(const vfloat32x16 
> *)Src_base_134] ] [265])
>
>         (mem:VHF (reg/v/f:DI 13 a13 [orig:207 Src_base ] [207]) [1 
> MEM[(const vfloat32x16 *)Src_base_134]+0 S64 A512])) 
> "../test_modify.c":56 450 {movvhf_internal}
>
>      (expr_list:REG_STAGE (const_int 1 [0x1])
>
>         (expr_list:REG_CYCLE (const_int 2 [0x2])

Do you need this for correctness, or is it “just” an optimisation?

The reason for asking is that regrename runs quite a long time after the first scheduling pass, with things like register allocation happening in between.  The :TI markers on the instructions are therefore going to be very stale by the time that regrename runs.

If it's just a heuristic then it might be OK to use them anyway, but it would be worth having a comment to say what's going on.

One alternative would be to run the DFA over the instructions to recompute the VLIW bundles, but that isn't simple.  It also isn't necessarily a better heuristic, since the second scheduling pass is likely to change things anyway.

There's also the problem that :TI markers are used on non-VLIW targets too, whereas the problem is really specific to VLIW targets.

Unfortunately, I can't think of any good counter-suggestions at the moment…

Thanks,
Richard

>
> diff --git a/gcc/regrename.c b/gcc/regrename.c index 
> c38173a77..e54794413 100644
> --- a/gcc/regrename.c
> +++ b/gcc/regrename.c
> @@ -1614,12 +1614,26 @@ record_out_operands (rtx_insn *insn, bool earlyclobber, insn_rr_info *insn_info)
>    cur_operand = NULL;
>  }
>  
> +/* Get the first real insn of next vliw in current BB.  */ static 
> +rtx_insn * get_next_vliw_first_insn (rtx_insn *cur_insn, basic_block 
> +bb) {
> +  rtx_insn *insn = next_real_insn (cur_insn);
> +
> +  for (; insn && insn != BB_END (bb); insn = next_real_insn (insn))
> +    if (GET_MODE (insn) == TImode)
> +      return insn;
> +
> +  return cur_insn;
> +}
> +
>  /* Build def/use chain.  */
>  
>  static bool
>  build_def_use (basic_block bb)
>  {
>    rtx_insn *insn;
> +  rtx_insn *vliw_start_insn = NULL;
>    unsigned HOST_WIDE_INT untracked_operands;
>  
>    fail_current_block = false;
> @@ -1663,6 +1677,9 @@ build_def_use (basic_block bb)
>  	     to be marked unrenamable or even cause us to abort the entire
>  	     basic block.  */
>  
> +	  if (GET_MODE (insn) == TImode)
> +	    vliw_start_insn = insn;
> +
>  	  extract_constrain_insn (insn);
>  	  preprocess_constraints (insn);
>  	  const operand_alternative *op_alt = which_op_alt (); @@ -1858,17 
> +1875,26 @@ build_def_use (basic_block bb)
>  	      scan_rtx (insn, &XEXP (note, 0), ALL_REGS, mark_access,
>  			OP_INOUT);
>  
> -	  /* Step 7: Close chains for registers that were never
> -	     really used here.  */
> -	  for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
> -	    if (REG_NOTE_KIND (note) == REG_UNUSED)
> -	      {
> -		remove_from_hard_reg_set (&live_hard_regs,
> -					  GET_MODE (XEXP (note, 0)),
> -					  REGNO (XEXP (note, 0)));
> -		scan_rtx (insn, &XEXP (note, 0), NO_REGS, terminate_dead,
> -			  OP_IN);
> -	      }
> +	  /* Step 7: Close chains for registers that were never
> +	     really used delayed at the end of vliw.  */
> +	  if (vliw_start_insn
> +	      && next_real_insn (insn) == get_next_vliw_first_insn (insn, bb))
> +	    {
> +	      rtx_insn *member;
> +
> +	      for (member = vliw_start_insn;
> +	           member != next_active_insn (insn);
> +	           member = next_active_insn (member))
> +	        for (note = REG_NOTES (member); note; note = XEXP (note, 1))
> +	          if (REG_NOTE_KIND (note) == REG_UNUSED)
> +	            {
> +	              remove_from_hard_reg_set (&live_hard_regs,
> +	                                        GET_MODE (XEXP (note, 0)),
> +	                                        REGNO (XEXP (note, 0)));
> +	              scan_rtx (member, &XEXP (note, 0), NO_REGS, terminate_dead,
> +	                        OP_IN);
> +	            }
> +	    }
>  
>  	  /* Step 8: Kill the chains involving register restores.  Those
>  	     should restore _that_ register.  */

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: 答复: [PATCH PR95696] regrename creates overlapping register allocations for vliw
  2020-07-21  7:20     ` 答复: " Zhongyunde
@ 2020-07-21 16:12       ` Richard Sandiford
  0 siblings, 0 replies; 5+ messages in thread
From: Richard Sandiford @ 2020-07-21 16:12 UTC (permalink / raw)
  To: Zhongyunde; +Cc: gcc-patches, Yangfei (A)

Zhongyunde <zhongyunde@huawei.com> writes:
> Thanks for your review.
>
> First of all, this is an optimization.

OK, good.

>    gcc do sms before reload, and here each insn use pseudo-register. After reload, they are allocated hard-register, then the regrename pass try to adjust the register number with def/use chain created by build_def_use.
>  As now gcc doesn't consider the VLIW bundles, so regrename pass may updated a reg which may not really unused, which will bring in invalid VLIW bundles. 
>    Before the final schedule, we usually recheck the validation of VLIW bundles, and reschedule the conflicted insns into two VLIW to make them validation to avoid above issue, so this is not a correctness issue. 
>  Certainly, reschedule the conflicted insns into two VLIW will destroy the kernel loop's sms schedule result, and usually it will be harmful to the performance.

Yeah.  The reason I was worried about the TI markers being stale is
that, in general, register allocation can introduce new spills and
reloads, can add and remove instructions, and can convert instructions
into different forms (e.g. as a result of register elimination).
There are then post-reload optimisers that can change the code further.
All these things could invalidate the VLIW bundling done by the first
scheduler.

It sounds like that's not happening in your motivating testcase, and the
VLIW bundling is still correct (for this loop) by the time that regrename
runs.  Is that right?

It's interesting that this is for a testcase using SMS.  One of the
traditional problems with the GCC implementation of SMS has been
ensuring that later passes don't mess up the scheduled loop.  So in your
testcase, does register allocation succeed for the SMS loop without
invalidating the bundling decisions?

If so, then it's probably better to avoid running regrename on it at all.
It mostly exists to help the second scheduling pass, but the second
scheduling pass shouldn't be messing with an SMS loop anyway.  Also,
although the patch deals with one case in which regrename could disrupt
the bundling, there are others too.

So maybe one option would be to make regrename ignore blocks that have
BB_DISABLE_SCHEDULE set.  (Sorry if that's been discussed and discounted
already.)

Thanks,
Richard

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-07-21 16:12 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-16 16:18 [PATCH PR95696] regrename creates overlapping register allocations for vliw zhongyunde
2020-07-20  0:59 ` Zhongyunde
2020-07-20 16:05   ` Richard Sandiford
2020-07-21  7:20     ` 答复: " Zhongyunde
2020-07-21 16:12       ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).