Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Jeff Law <jeffreyalaw@gmail.com>
To: 钟居哲 <juzhe.zhong@rivai.ai>, gcc-patches <gcc-patches@gcc.gnu.org>
Cc: "kito.cheng" <kito.cheng@gmail.com>,
	"kito.cheng" <kito.cheng@sifive.com>, palmer <palmer@dabbelt.com>,
	palmer <palmer@rivosinc.com>, "rdapp.gcc" <rdapp.gcc@gmail.com>
Subject: Re: [PATCH] RISC-V: Support vfwmul.vv combine lowering
Date: Thu, 29 Jun 2023 17:39:22 -0600	[thread overview]
Message-ID: <62be1be8-280f-2989-6a31-32c87dcbadcd@gmail.com> (raw)
In-Reply-To: <70EE22A0D10CD0D8+2023062906005450585022@rivai.ai>



On 6/28/23 16:00, 钟居哲 wrote:
> You can see here:
> 
> https://godbolt.org/z/d78646hWb <https://godbolt.org/z/d78646hWb>
So just to be explicit, I see no difference with that test before/after 
your proposed change.  Nor would I expect one based on my understanding 
of the patch.

The explicit conversions I see are because we need the output of the 
conversion in multiple vfmul instructions.  That won't be helped by the 
patch you've proposed.

To be more concrete:

>        vsetvli t1,t5,e32,mf2,ta,ma     # 99    [c=0 l=4]  vsetvldi
>         vle32.v v2,0(a4)        # 23    [c=4 l=4]  pred_movvnx2sf/1
>         vle32.v v1,0(a5)        # 25    [c=4 l=4]  pred_movvnx2sf/1
>         vsetvli t0,zero,e32,mf2,ta,ma   # 101   [c=0 l=4]  vsetvldi
>         vfwcvt.f.f.v    v3,v2   # 77    [c=4 l=4]  pred_extendvnx2df/0
>         vfwcvt.f.f.v    v2,v1   # 79    [c=4 l=4]  pred_extendvnx2df/0
>         vsetvli zero,t1,e32,mf2,ta,ma   # 102   [c=0 l=4]  vsetvl_discard_resultdi
>         vle32.v v5,0(a6)        # 31    [c=4 l=4]  pred_movvnx2sf/1
>         vle32.v v4,0(a7)        # 39    [c=4 l=4]  pred_movvnx2sf/1
>         vsetvli t0,zero,e32,mf2,ta,ma   # 103   [c=0 l=4]  vsetvldi
>         vfwcvt.f.f.v    v1,v5   # 81    [c=4 l=4]  pred_extendvnx2df/0
>         vsetvli zero,zero,e64,m1,ta,ma  # 104   [c=16 l=4]  vsetvl_vtype_change_only
>         vfmul.vv        v5,v2,v3        # 29    [c=4 l=4]  pred_mulvnx2df/2
>         vfmul.vv        v2,v1,v2        # 34    [c=4 l=4]  pred_mulvnx2df/2
>         vsetvli zero,t1,e64,m1,ta,ma    # 105   [c=0 l=4]  vsetvl_discard_resultdi
>         vse64.v v2,0(a1)        # 35    [c=4 l=4]  pred_storevnx2df
>         vse64.v v5,0(a0)        # 30    [c=4 l=4]  pred_storevnx2df
>         vsetvli t6,zero,e64,m1,ta,ma    # 106   [c=0 l=4]  vsetvldi
>         vfmul.vv        v1,v1,v3        # 37    [c=4 l=4]  pred_mulvnx2df/2
>         vsetvli zero,zero,e32,mf2,ta,ma # 107   [c=20 l=4]  vsetvl_vtype_change_only
>         vfwcvt.f.f.v    v2,v4   # 83    [c=4 l=4]  pred_extendvnx2df/0
>         vsetvli zero,t1,e64,m1,ta,ma    # 108   [c=0 l=4]  vsetvl_discard_resultdi
>         vse64.v v1,0(a2)        # 38    [c=4 l=4]  pred_storevnx2df
>         vsetvli t6,zero,e64,m1,ta,ma    # 109   [c=0 l=4]  vsetvldi
>         slli    t4,t1,2 # 22    [c=4 l=4]  ashldi3
>         slli    t3,t1,3 # 27    [c=4 l=4]  ashldi3
>         vfmul.vv        v1,v2,v3        # 42    [c=4 l=4]  pred_mulvnx2df/2


Note how the output of the explicit conversion done in insn 77 is used 
by the vfmul in insns 29, 37 and 42.  Similarly for the other explcit 
conversions.

Your pattern isn't going to help that problem.

You could model this as a dependency height reduction.  I think that 
will get you were you want to go.

You'll need a pattern that matches this:

> (parallel [     
>         (set (reg:VNx2DF 160 [ vect__11.15 ])
>             (if_then_else:VNx2DF (unspec:VNx2BI [
>                         (const_vector:VNx2BI repeat [
>                                 (const_int 1 [0x1])
>                             ])      
>                         (reg:DI 169)
>                         (const_int 2 [0x2]) repeated x2
>                         (const_int 1 [0x1]) 
>                         (const_int 7 [0x7])
>                         (reg:SI 66 vl)
>                         (reg:SI 67 vtype)
>                         (reg:SI 69 frm)
>                     ] UNSPEC_VPREDICATE)
>                 (mult:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 144 [ vect__7.13 ]))
>                     (float_extend:VNx2DF (reg:VNx2SF 146 [ vect__4.9 ])))
>                 (unspec:VNx2DF [
>                         (reg:SI 0 zero)
>                     ] UNSPEC_VUNDEF)))
>         (set (reg:VNx2DF 143 [ vect__8.14 ])
>             (float_extend:VNx2DF (reg:VNx2SF 144 [ vect__7.13 ])))
>         (set (reg:VNx2DF 145 [ vect__5.10 ])
>             (float_extend:VNx2DF (reg:VNx2SF 146 [ vect__4.9 ])))
>     ])

It'll need to be a define_insn_and_split as its a 3->3 splitter.  The 
split will emit the two extensions and the widening multiply as 3 
distinct insns.

This has two positive effects.  First the widening multiply is no longer 
data dependent on the float_extend and so it can issue when ever r144 
and r146 are ready rather than when r143 and r145 are ready.

The second effect is I think this pattern will end up matching all the 
multiplies in this sample code.  As a result all the float_extend insns 
you generated when splitting become dead and should be removed by DCE.


Jeff

next prev parent reply	other threads:[~2023-06-29 23:39 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-28  4:15 Juzhe-Zhong
2023-06-28 16:24 ` Jeff Law
2023-06-28 22:00   ` 钟居哲
2023-06-29 22:59     ` Jeff Law
2023-06-29 23:02       ` 钟居哲
2023-06-29 23:04       ` 钟居哲
2023-06-29 23:39     ` Jeff Law [this message]
2023-06-30 10:14       ` Robin Dapp
2023-06-30 22:35         ` Jeff Law
2023-07-01 11:45           ` Robin Dapp
     [not found]           ` <8D5801744511A6AD+6077E043-F267-4BC0-90B8-B2FCDCA10089@rivai.ai>
2023-07-03  7:49             ` Robin Dapp
2023-07-03  8:42               ` juzhe.zhong
2023-07-03  8:44                 ` Robin Dapp
2023-07-03  8:45                   ` juzhe.zhong
2023-07-03  8:49                     ` Robin Dapp
2023-07-03  8:51                       ` juzhe.zhong
2023-07-07 21:11                 ` Jeff Law
2023-07-07 23:05                   ` 钟居哲
2023-06-29 23:41     ` Jeff Law
     [not found]     ` <99D6E636A491D16D+F0E92F80-33DF-4109-912E-F9CAAD6F07B5@rivai.ai>
2023-06-29 23:48       ` Jeff Law
2023-06-30  0:44         ` juzhe.zhong
     [not found]   ` <2023062906005450585022@rivai.ai>
2023-06-28 22:59     ` 钟居哲

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=62be1be8-280f-2989-6a31-32c87dcbadcd@gmail.com \
    --to=jeffreyalaw@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=juzhe.zhong@rivai.ai \
    --cc=kito.cheng@gmail.com \
    --cc=kito.cheng@sifive.com \
    --cc=palmer@dabbelt.com \
    --cc=palmer@rivosinc.com \
    --cc=rdapp.gcc@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).