public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>
To: "Richard Biener" <richard.guenther@gmail.com>
Cc: rguenther <rguenther@suse.de>,
	 gcc-patches <gcc-patches@gcc.gnu.org>,
	 kito.cheng <kito.cheng@gmail.com>,
	 richard.sandiford <richard.sandiford@arm.com>,
	 jeffreyalaw <jeffreyalaw@gmail.com>,
	 apinski <apinski@marvell.com>
Subject: Re: Re: [PATCH] CPROP: Allow cprop optimization when the function has a single block
Date: Thu, 2 Feb 2023 09:49:43 +0800	[thread overview]
Message-ID: <AE1E047A272B75D7+20230202094942622550102@rivai.ai> (raw)
In-Reply-To: <0CEDD257-703F-4ED3-B927-C21279557CE6@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 14114 bytes --]

Yeah, Thanks. You are right. CSE should do the job. 
Now I know the reason CSE failed to optimize is I include VL_REGNUM(66)/VTYPE_RENUM(67) hard reg
as the dependency of pred_broadcast:
(insn 19 18 20 4 (set (reg:VNx1DI 152)
>         (if_then_else:VNx1DI (unspec:VNx1BI [
>                     (const_vector:VNx1BI repeat [
>                             (const_int 1 [0x1])
>                         ])
>                     (const_int 4 [0x4])
>                     (const_int 2 [0x2]) repeated x2
>                     (const_int 0 [0])
>                     (reg:SI 66 vl)
>                     (reg:SI 67 vtype)
>                 ] UNSPEC_VPREDICATE)
>             (vec_duplicate:VNx1DI (reg/v:DI 148 [ x ]))
>             (unspec:VNx1DI [
>                     (const_int 0 [0])
>                 ] UNSPEC_VUNDEF))) "rvv.c":22:23 695 {pred_broadcastvnx1di}
>      (nil))
Then CSE failed to set the 152 as copy.

VL_REGNUM(66)/VTYPE_RENUM(67) are the global hard reg that I should make each RVV instruction depend on them.
Since we use vsetvl instruction (which is setting global VL_REGNUM(66)/VTYPE_RENUM(67) status) to set the global status for
each RVV instruction. 
Including the dependency here is to make sure the global VL/VTYPE status is correct of each RVV instruction. (If we don't include
such dependency in RVV instruction, instruction scheduling may move the RVV instructions and vsetvl instructions randomly then
produce incorrect vsetvl configuration)

The original reg_class of VL_REGNUM(66)/VTYPE_RENUM(67) I set here:
riscv_regno_to_class [VL_REGNUM] = VL_REGS;
riscv_regno_to_class [VTYPE_RENUM] = VTYPE_REGS;
Such configuration make CSE failed.

However, if I change the reg_class :
riscv_regno_to_class [VL_REGNUM] = NO_REGS;
riscv_regno_to_class [VTYPE_RENUM] = NO_REGS;
The CSE now can do the optimization now!

1) Would you mind telling me the difference between them?
2) If I set these 2 global status register as NO_REGS, will it create issues for the global status configuration of each RVV instructions ?

Thanks.


juzhe.zhong@rivai.ai
 
From: Richard Biener
Date: 2023-02-01 23:46
To: juzhe.zhong
CC: rguenther; gcc-patches; kito.cheng; richard.sandiford; jeffreyalaw; apinski
Subject: Re: [PATCH] CPROP: Allow cprop optimization when the function has a single block


Am 01.02.2023 um 13:57 schrieb juzhe.zhong@rivai.ai:

 
>> I see loads of UNSPECs, that might explain why some passes do something
>> and some not.  That said, not sure what exactly CPROP does, the two
>> pred_broadcast insns look exactly the same so CSE should CSE them?
Yes, the "source" these 2 pred_broadcast the same. However, they have different pseudos in their "dest" (operands[0])
The first one is 151, the second one is 152. And the result in these 2 pseudos are the same.
The first pred_add use 151, the second pred_add use 152.

CSE should do the job to eliminate the second pred_broadcast, but I am not sure whether  CSE can propagate the 151 pseudo
to the second pred_add ??

It should do that.  At least it should make setting 152 a copy.  So you need to figure why it doesn’t do that?  Maybe it uses costs and those say recomputing is cheaper than a copy?



juzhe.zhong@rivai.ai
 
From: Richard Biener
Date: 2023-02-01 20:51
To: juzhe.zhong@rivai.ai
CC: gcc-patches; kito.cheng; richard.sandiford; jeffreyalaw; apinski
Subject: Re: Re: [PATCH] CPROP: Allow cprop optimization when the function has a single block
On Wed, 1 Feb 2023, juzhe.zhong@rivai.ai wrote:
 
>  I don't known whether CSE do the job. What I saw is CPROP do the optimization when we have more than 1 block.
> 
> This the RTL dump before CPROP:
> 
> (insn 19 18 20 4 (set (reg:VNx1DI 151)
>         (if_then_else:VNx1DI (unspec:VNx1BI [
>                     (const_vector:VNx1BI repeat [
>                             (const_int 1 [0x1])
>                         ])
>                     (const_int 4 [0x4])
>                     (const_int 2 [0x2]) repeated x2
>                     (const_int 0 [0])
>                     (reg:SI 66 vl)
>                     (reg:SI 67 vtype)
>                 ] UNSPEC_VPREDICATE)
>             (vec_duplicate:VNx1DI (reg/v:DI 148 [ x ]))
>             (unspec:VNx1DI [
>                     (const_int 0 [0])
>                 ] UNSPEC_VUNDEF))) "rvv.c":22:23 695 {pred_broadcastvnx1di}
>      (nil))
> (insn 20 19 21 4 (set (reg/v:VNx1DI 139 [ v3 ])
>         (if_then_else:VNx1DI (unspec:VNx1BI [
>                     (const_vector:VNx1BI repeat [
>                             (const_int 1 [0x1])
>                         ])
>                     (const_int 4 [0x4])
>                     (const_int 2 [0x2]) repeated x2
>                     (const_int 0 [0])
>                     (reg:SI 66 vl)
>                     (reg:SI 67 vtype)
>                 ] UNSPEC_VPREDICATE)
>             (plus:VNx1DI (reg/v:VNx1DI 138 [ v2 ])
>                 (reg:VNx1DI 151))
>             (unspec:VNx1DI [
>                     (const_int 0 [0])
>                 ] UNSPEC_VUNDEF))) "rvv.c":22:23 1528 {pred_addvnx1di}
>      (expr_list:REG_DEAD (reg:VNx1DI 151)
>         (nil)))
> (insn 21 20 22 4 (set (reg:VNx1DI 152)
>         (if_then_else:VNx1DI (unspec:VNx1BI [
>                     (const_vector:VNx1BI repeat [
>                             (const_int 1 [0x1])
>                         ])
>                     (const_int 4 [0x4])
>                     (const_int 2 [0x2]) repeated x2
>                     (const_int 0 [0])
>                     (reg:SI 66 vl)
>                     (reg:SI 67 vtype)
>                 ] UNSPEC_VPREDICATE)
>             (vec_duplicate:VNx1DI (reg/v:DI 148 [ x ]))
>             (unspec:VNx1DI [
>                     (const_int 0 [0])
>                 ] UNSPEC_VUNDEF))) "rvv.c":23:23 695 {pred_broadcastvnx1di}
>      (nil))
> (insn 22 21 23 4 (set (reg/v:VNx1DI 140 [ v4 ])
>         (if_then_else:VNx1DI (unspec:VNx1BI [
>                     (const_vector:VNx1BI repeat [
>                             (const_int 1 [0x1])
>                         ])
>                     (const_int 4 [0x4])
>                     (const_int 0 [0])
>                     (const_int 2 [0x2])
>                     (const_int 0 [0])
>                     (reg:SI 66 vl)
>                     (reg:SI 67 vtype)
>                 ] UNSPEC_VPREDICATE)
>             (plus:VNx1DI (reg/v:VNx1DI 138 [ v2 ])
>                 (reg:VNx1DI 152))
>             (reg/v:VNx1DI 139 [ v3 ]))) "rvv.c":23:23 1528 {pred_addvnx1di}
>      (expr_list:REG_DEAD (reg:VNx1DI 152)
>         (expr_list:REG_DEAD (reg/v:VNx1DI 139 [ v3 ])
>             (expr_list:REG_DEAD (reg/v:VNx1DI 138 [ v2 ])
>                 (nil)))))
> 
> After CRPOP:
> (insn 15 14 16 4 (set (reg:VNx1DI 147)
>         (if_then_else:VNx1DI (unspec:VNx1BI [
>                     (const_vector:VNx1BI repeat [
>                             (const_int 1 [0x1])
>                         ])
>                     (const_int 4 [0x4])
>                     (const_int 2 [0x2]) repeated x2
>                     (const_int 0 [0])
>                     (reg:SI 66 vl)
>                     (reg:SI 67 vtype)
>                 ] UNSPEC_VPREDICATE)
>             (vec_duplicate:VNx1DI (reg/v:DI 143 [ x ]))
>             (unspec:VNx1DI [
>                     (const_int 0 [0])
>                 ] UNSPEC_VUNDEF))) "rvv.c":11:23 695 {pred_broadcastvnx1di}
>      (nil))
> (insn 16 15 18 4 (set (reg/v:VNx1DI 139 [ v3 ])
>         (if_then_else:VNx1DI (unspec:VNx1BI [
>                     (const_vector:VNx1BI repeat [
>                             (const_int 1 [0x1])
>                         ])
>                     (const_int 4 [0x4])
>                     (const_int 2 [0x2]) repeated x2
>                     (const_int 0 [0])
>                     (reg:SI 66 vl)
>                     (reg:SI 67 vtype)
>                 ] UNSPEC_VPREDICATE)
>             (plus:VNx1DI (reg/v:VNx1DI 138 [ v2 ])
>                 (reg:VNx1DI 147))
>             (unspec:VNx1DI [
>                     (const_int 0 [0])
>                 ] UNSPEC_VUNDEF))) "rvv.c":11:23 1528 {pred_addvnx1di}
>      (expr_list:REG_DEAD (reg:VNx1DI 147)
>         (expr_list:REG_DEAD (reg/v:VNx1DI 138 [ v2 ])
>             (nil))))
> (insn 18 16 19 4 (set (reg/v:VNx1DI 140 [ v4 ])
>         (if_then_else:VNx1DI (unspec:VNx1BI [
>                     (const_vector:VNx1BI repeat [
>                             (const_int 1 [0x1])
>                         ])
>                     (const_int 4 [0x4])
>                     (const_int 2 [0x2]) repeated x2
>                     (const_int 0 [0])
>                     (reg:SI 66 vl)
>                     (reg:SI 67 vtype)
>                 ] UNSPEC_VPREDICATE)
>             (plus:VNx1DI (reg/v:VNx1DI 139 [ v3 ])
>                 (reg:VNx1DI 147))
>             (unspec:VNx1DI [
>                     (const_int 0 [0])
>                 ] UNSPEC_VUNDEF))) "rvv.c":12:23 1528 {pred_addvnx1di}
>      (expr_list:REG_DEAD (reg:VNx1DI 148)
>         (expr_list:REG_DEAD (reg/v:VNx1DI 139 [ v3 ])
>             (nil))))
> 
> You can see CPROP remove the second the "pred_broadcast" instruction and propagate the result to the second "pred_add" instruction?
 
I see loads of UNSPECs, that might explain why some passes do something
and some not.  That said, not sure what exactly CPROP does, the two
pred_broadcast insns look exactly the same so CSE should CSE them?
 
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2023-02-01 20:40
> To: Ju-Zhe Zhong
> CC: gcc-patches; kito.cheng; richard.sandiford; jeffreyalaw; apinski
> Subject: Re: [PATCH] CPROP: Allow cprop optimization when the function has a single block
> On Wed, 1 Feb 2023, juzhe.zhong@rivai.ai wrote:
>  
> > From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> > 
> > Hi, this patch is present for GCC 14 since I understand it's not appropriate
> > to land it in GCC 13.
> > 
> > NUM_FIXED_BLOCKS = 2 since GCC define each function has aleast 2 blocks
> > one is entry block, the other is exit block.
> > So according this code, the function will not do cprop optimization when
> > there is only exactly one single block.
>  
> cprop / GCSE are global dataflow problems, there's "nothing" to do for
> a single block, the local problem plus its application isn't considered
> but probably left for CSE.
>  
> Why does CSE not perform the desired transform?
>  
> > I am not sure whether it's correct to fix it like this.
> > Can anyone tell me why forbid cprop optimization if the function only has s
> > single block ?
> > 
> > Let's take a look at these 2 examples of RVV intrinsics:
> > 1.    void f1 (void * in, void *out, int64_t x, int n)
> >       {
> >       vint64m1_t v = __riscv_vle64_v_i64m1 (in + 1, 4);
> >       vint64m1_t v2 = __riscv_vle64_v_i64m1_tu (v, in + 2, 4);
> >       vint64m1_t v3 = __riscv_vadd_vx_i64m1 (v2, x, 4);
> >       vint64m1_t v4 = __riscv_vadd_vx_i64m1 (v3, x, 4);
> >       __riscv_vse64_v_i64m1 (out + 2, v4, 4);
> >       }
> > 
> > asm:
> >      addi sp,sp,-16
> > sw a2,8(sp)
> > sw a3,12(sp)
> > sw a2,0(sp)
> > sw a3,4(sp)
> > addi a5,a0,1
> > vsetivli zero,4,e64,m1,ta,ma
> > addi a0,a0,2
> > vle64.v v24,0(a5)
> > addi a5,sp,8
> > vlse64.v v27,0(a5),zero
> > addi a1,a1,2
> > vsetvli zero,zero,e64,m1,tu,ma
> > vle64.v v24,0(a0)
> > vsetvli zero,zero,e64,m1,ta,ma
> > vlse64.v v25,0(sp),zero
> > vadd.vv v26,v24,v27
> > vadd.vv v24,v26,v25
> > vse64.v v24,0(a1)
> > addi sp,sp,16
> > jr ra
> > you can see here there are 2 vlse64.v instructions that broadcast the scalar value "x"
> > GCC fail to eliminate the second vlse64.v instruction since GCC doesn't do the cprop
> > optimization (the function only has 1 single block). It can be optimized if we apply
> > this patch.
> > 
> > 2. void f1 (void * in, void *out, int64_t x, int n)
> > {
> >     if (n) {
> >       vint64m1_t v = __riscv_vle64_v_i64m1 (in + 1, 4);
> >       vint64m1_t v2 = __riscv_vle64_v_i64m1_tu (v, in + 2, 4);
> >       vint64m1_t v3 = __riscv_vadd_vx_i64m1 (v2, x, 4);
> >       vint64m1_t v4 = __riscv_vadd_vx_i64m1 (v3, x, 4);
> >       __riscv_vse64_v_i64m1 (out + 2, v4, 4);
> >     }
> > }
> > 
> > asm:
> >   f1:
> > vsetivli zero,4,e64,m1,ta,ma
> > beq a4,zero,.L7
> > addi sp,sp,-16
> > sw a2,8(sp)
> > sw a3,12(sp)
> > addi a5,a0,1
> > vle64.v v24,0(a5)
> > addi a0,a0,2
> > addi a5,sp,8
> > vlse64.v v25,0(a5),zero
> > addi a1,a1,2
> > vsetvli zero,zero,e64,m1,tu,ma
> > vle64.v v24,0(a0)
> > vadd.vv v26,v24,v25
> > vadd.vv v24,v26,v25
> > vse64.v v24,0(a1)
> > addi sp,sp,16
> > jr ra
> > .L7:
> > ret
> > 
> > Here, if we add if (n) condition here, the program will end up with more than 1 block.
> > So GCC will do the cprop optimization and the second vlse64.v instruction is eliminated.
> > 
> > I am not sure whether this patch is correct.
> > Can anyone help me with that ?
> > Thanks.
> > 
> > 
> > gcc/ChangeLog:
> > 
> >         * cprop.cc (one_cprop_pass): Remove +1.
> > 
> > ---
> >  gcc/cprop.cc | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/gcc/cprop.cc b/gcc/cprop.cc
> > index 6ec0bda4a24..615bc5078b6 100644
> > --- a/gcc/cprop.cc
> > +++ b/gcc/cprop.cc
> > @@ -1749,7 +1749,7 @@ one_cprop_pass (void)
> >    int changed = 0;
> >  
> >    /* Return if there's nothing to do, or it is too expensive.  */
> > -  if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1
> > +  if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS
> >        || gcse_or_cprop_is_too_expensive (_ ("const/copy propagation disabled")))
> >      return 0;
> >  
> > 
>  
> 
 
-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
 

  parent reply	other threads:[~2023-02-02  1:49 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-01 12:24 juzhe.zhong
2023-02-01 12:40 ` Richard Biener
2023-02-01 12:47   ` juzhe.zhong
2023-02-01 12:51     ` Richard Biener
2023-02-01 12:56       ` juzhe.zhong
     [not found]         ` <0CEDD257-703F-4ED3-B927-C21279557CE6@gmail.com>
2023-02-02  1:49           ` juzhe.zhong [this message]
2023-02-02 12:26             ` Richard Biener
2023-02-02 12:35               ` juzhe.zhong
2023-02-02 14:36                 ` Jeff Law
2023-02-02 14:43                   ` juzhe.zhong
2023-02-02 15:18                   ` Kito Cheng
2023-02-02 14:36               ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AE1E047A272B75D7+20230202094942622550102@rivai.ai \
    --to=juzhe.zhong@rivai.ai \
    --cc=apinski@marvell.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jeffreyalaw@gmail.com \
    --cc=kito.cheng@gmail.com \
    --cc=rguenther@suse.de \
    --cc=richard.guenther@gmail.com \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).