public inbox for gcc-patches@gcc.gnu.org
* State of risc-v port in the current merge, revert, rinse-repeat commotion
From: Vineet Gupta @ 2024-04-24 17:58 UTC
  To: Jeff Law, Robin Dapp, Palmer Dabbelt, Kito Cheng, juzhe.zhong,
	jeremy.bennett
  Cc: GCC Patches, gnu-toolchain

Hi,

Per the discussion on yesterday's patchworks call, I gave the trunk
snapshot a spin for SPEC2017 builds/runs on QEMU (usual flags:
-Ofast -flto=auto -march=rv64gcv_zba_zbb_zbs_zicond):

     2024-04-23 6f0a646dd2fc Remove repeated information in -ftree-loop-distribute-patterns doc

The dynamic icounts look sane (vs. the Apr 10 snapshot) except for a
regression in x264, which is likely independent of the chaos going on.

     Apr 10     |     Apr 23      |  delta
  109f1b28fc94  |  6f0a646dd2fc   |
----------------+-----------------+--------
276,584,692,883 | 277,816,987,018 |  -0.45%
913,452,236,000 | 927,291,935,180 |  -1.52%
903,916,092,805 | 915,364,006,176 |  -1.27%
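
The delta column is consistent with (old - new) / old, so a negative
value means the newer snapshot executes more instructions. A minimal
Python sketch of that arithmetic, assuming this sign convention (the
thread does not state it explicitly; icount_delta_pct is a hypothetical
helper name):

```python
def icount_delta_pct(old: int, new: int) -> float:
    """Relative change in percent; negative means more instructions (assumed convention)."""
    return (old - new) / old * 100.0


# (Apr 10 icount, Apr 23 icount) pairs copied from the table above.
RUNS = [
    (276_584_692_883, 277_816_987_018),
    (913_452_236_000, 927_291_935_180),
    (903_916_092_805, 915_364_006_176),
]

for old, new in RUNS:
    print(f"{old:>15,} | {new:>15,} | {icount_delta_pct(old, new):+.2f}%")
```

Rounded to two decimals this gives the same percentages as the table.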

cheers,
-Vineet


* Re: State of risc-v port in the current merge, revert, rinse-repeat commotion
From: Robin Dapp @ 2024-04-24 19:22 UTC
  To: Vineet Gupta, Jeff Law, Palmer Dabbelt, Kito Cheng, juzhe.zhong,
	jeremy.bennett
  Cc: rdapp.gcc, GCC Patches, gnu-toolchain

Thanks Vineet!

> The dynamic icounts look sane (vs. the Apr 10 snapshot) except for a
> regression in x264, which is likely independent of the chaos going on.
> 
>      Apr 10     |     Apr 23      |
>   109f1b28fc94  |  6f0a646dd2fc   |
> ----------------+-----------------+--------
> 276,584,692,883 | 277,816,987,018 |  -0.45%
> 913,452,236,000 | 927,291,935,180 |  -1.52%
> 903,916,092,805 | 915,364,006,176 |  -1.27%

x264 uses widening arithmetic, so it could be the reverts.
Can you compare the hot functions (e.g. x264_pixel_sad_16x16)
and see if anything stands out around the vwadd.wv, for example?
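
For context, a sum of absolute differences (SAD) over 8-bit pixels needs
wider intermediates, which is where widening vector instructions come
in. The following is a scalar Python model of that data flow, not x264's
actual code; the function name and the mapping to RVV instructions in
the comments are illustrative assumptions.

```python
def sad_widening(cur: bytes, ref: bytes) -> int:
    """Scalar model of a SAD using 8 -> 16 -> 32 bit widening steps."""
    if len(cur) != len(ref):
        raise ValueError("pixel rows must have the same length")
    acc = 0  # models one lane of a 32-bit accumulator
    for a, b in zip(cur, ref):
        # u8 - u8 widened to 16 bits; wraps modulo 2**16 (cf. vwsubu.vv)
        diff16 = (a - b) & 0xFFFF
        # Lanes that are negative as signed 16-bit values get negated
        # (cf. a vmsle.vi mask followed by a masked vneg.v)
        if diff16 & 0x8000:
            diff16 = (-diff16) & 0xFFFF
        # Add the 16-bit value into the 32-bit accumulator (cf. vwadd.wv)
        acc = (acc + diff16) & 0xFFFFFFFF
    return acc


print(sad_widening(bytes([10, 200, 0]), bytes([20, 100, 255])))
```

The widening add into the accumulator is the step a change in vwadd.wv
code generation would affect.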

Regards
 Robin



* Re: State of risc-v port in the current merge, revert, rinse-repeat commotion
From: Vineet Gupta @ 2024-04-24 22:58 UTC
  To: Robin Dapp, Jeff Law, Palmer Dabbelt, Kito Cheng, juzhe.zhong,
	jeremy.bennett
  Cc: GCC Patches, gnu-toolchain


On 4/24/24 12:22, Robin Dapp wrote:
>> The dynamic icounts look sane (vs. the Apr 10 snapshot) except for a
>> regression in x264 which is likely independent of the chaos going on.
>>
>>      Apr 10     |     Apr 23      |
>>   109f1b28fc94  |  6f0a646dd2fc   |
>> ----------------+-----------------+--------
>> 276,584,692,883 | 277,816,987,018 |  -0.45%
>> 913,452,236,000 | 927,291,935,180 |  -1.52%
>> 903,916,092,805 | 915,364,006,176 |  -1.27%
> x264 uses widening arithmetic so it could be the reverts.
> Can you compare the hot functions (e.g. x264_pixel_sad_16x16)

Function                                     old     new   delta
x264_pixel_sad_x4_16x8.lto_priv             5188    5288    +100
x264_pixel_sad_x4_8x16.lto_priv             5844    5924     +80
x264_pixel_sad_x3_16x8.lto_priv             3904    3980     +76
x264_pixel_sad_x4_16x16.lto_priv             834     898     +64
x264_pixel_sad_x3_8x16.lto_priv             4408    4468     +60
x264_pixel_sad_x4_8x8.lto_priv              3010    3058     +48
x264_pixel_sad_x3_8x8.lto_priv              2290    2338     +48
...
...
x264_pixel_sad_x4_4x8.lto_priv              1366    1362      -4
x264_pixel_sad_x4_4x4.lto_priv               716     712      -4
x264_pixel_sad_4x8.lto_priv                  332     328      -4
x264_pixel_sad_4x4.lto_priv                  172     168      -4
hpel_filter.lto_priv                         984     980      -4
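
The thread does not say how the size table above was generated. As a
rough sketch, assuming per-symbol sizes have already been extracted from
the two binaries into dictionaries (for example from `nm -S` output),
the same old/new/delta listing can be produced like this; size_deltas is
a hypothetical helper:

```python
def size_deltas(old: dict[str, int], new: dict[str, int]) -> list[tuple[str, int, int, int]]:
    """Return (name, old, new, delta) rows for changed symbols, largest growth first."""
    rows = []
    for name in sorted(old.keys() | new.keys()):
        o, n = old.get(name, 0), new.get(name, 0)
        if o != n:
            rows.append((name, o, n, n - o))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


# Three rows copied from the table above as sample input.
OLD = {
    "x264_pixel_sad_x4_16x8.lto_priv": 5188,
    "x264_pixel_sad_x4_16x16.lto_priv": 834,
    "hpel_filter.lto_priv": 984,
}
NEW = {
    "x264_pixel_sad_x4_16x8.lto_priv": 5288,
    "x264_pixel_sad_x4_16x16.lto_priv": 898,
    "hpel_filter.lto_priv": 980,
}

print(f"{'Function':<40}{'old':>8}{'new':>8}{'delta':>8}")
for name, o, n, d in size_deltas(OLD, NEW):
    print(f"{name:<40}{o:>8}{n:>8}{d:>+8}")
```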



> and see if anything stands out around the vwadd.wv, for example?

Yeah, it does: not specifically in the routine you mentioned above, but
in its various brethren. See the attached objdumps of
x264_pixel_sad_x4_16x16 () for the two snapshots.
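
Comparing the two attached listings, the Apr 10 loops accumulate in
place (e.g. vwadd.wv v9,v9,v4), while the Apr 23 loops appear to copy
the accumulator with vmv1r.v first and then use a different source
register (e.g. vmv1r.v v10,v9 followed by vwadd.wv v9,v10,v4). That
looks like four extra 4-byte instructions in each of the four loops,
which would account for the +64 bytes reported for this function. One
way to spot such a difference mechanically is to count mnemonics per
listing. This is a minimal sketch, assuming objdump output without raw
instruction bytes as in the attachments; mnemonic_counts is a
hypothetical helper and the two snippets are short excerpts from the
attachments:

```python
from collections import Counter


def mnemonic_counts(objdump_text: str) -> Counter:
    """Count instruction mnemonics in lines shaped like '<addr>: <mnemonic> <operands>'."""
    counts = Counter()
    for line in objdump_text.splitlines():
        parts = line.split()
        # Symbol header lines start with a bare address, so they are skipped here.
        if len(parts) >= 2 and parts[0].endswith(":"):
            counts[parts[1]] += 1
    return counts


OLD_SNIPPET = """
   2190e: vmsle.vi v10,v1,-1
   21912: vwadd.wv v9,v9,v4
   21916: vneg.v   v3,v3,v0.t
"""

NEW_SNIPPET = """
   21942: vmv1r.v  v0,v10
   21946: vmv1r.v  v10,v9
   2194a: vneg.v   v1,v1,v0.t
   2194e: vwadd.wv v9,v10,v4
"""

old_counts = mnemonic_counts(OLD_SNIPPET)
new_counts = mnemonic_counts(NEW_SNIPPET)
for mnemonic in sorted(old_counts.keys() | new_counts.keys()):
    delta = new_counts[mnemonic] - old_counts[mnemonic]
    if delta:
        print(f"{mnemonic:<10}{delta:+d}")
```

Running the same count over the full attachments, rather than these
excerpts, would show the overall instruction-mix difference.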

-Vineet

[-- Attachment #2: objd-x264-240409 --]
[-- Type: text/plain, Size: 6327 bytes --]

0000000000021872 <x264_pixel_sad_x4_16x16.lto_priv.0>:
   21872:	vsetivli	zero,4,e32,m1,ta,ma
   21876:	vmv.v.i	v5,0
   2187a:	add	sp,sp,-32
   2187c:	add	t4,a0,4
   21880:	vmv1r.v	v7,v5
   21884:	vmv1r.v	v8,v5
   21888:	vmv1r.v	v9,v5
   2188c:	vmv1r.v	v6,v5
   21890:	add	t5,a0,8
   21894:	add	t6,a0,12
   21898:	sd	s0,24(sp)
   2189a:	sd	s1,16(sp)
   2189c:	mv	s0,a6
   2189e:	sd	s2,8(sp)
   218a0:	sd	s3,0(sp)
   218a2:	mv	a6,t4
   218a4:	mv	t2,t5
   218a6:	mv	t0,t6
   218a8:	mv	t1,a0
   218aa:	add	a7,a0,256
   218ae:	mv	t3,a0
   218b0:	vsetvli	zero,zero,e8,mf4,ta,ma
   218b4:	add	s3,a1,4
   218b8:	add	s2,a1,8
   218bc:	vle8.v	v3,(s3)
   218c0:	vle8.v	v14,(a6)
   218c4:	vle8.v	v2,(s2)
   218c8:	vle8.v	v13,(t2)
   218cc:	add	s1,a1,12
   218d0:	vle8.v	v12,(t0)
   218d4:	vle8.v	v11,(t3)
   218d8:	vle8.v	v10,(a1)
   218dc:	vle8.v	v1,(s1)
   218e0:	vwsubu.vv	v4,v14,v3
   218e4:	vwsubu.vv	v3,v13,v2
   218e8:	add	t3,t3,16
   218ea:	add	a6,a6,16
   218ec:	add	t2,t2,16
   218ee:	vwsubu.vv	v2,v12,v1
   218f2:	vwsubu.vv	v1,v11,v10
   218f6:	vsetvli	zero,zero,e16,mf2,ta,mu
   218fa:	vmsle.vi	v0,v4,-1
   218fe:	vmsle.vi	v12,v3,-1
   21902:	vmsle.vi	v11,v2,-1
   21906:	vneg.v	v4,v4,v0.t
   2190a:	vmv1r.v	v0,v12
   2190e:	vmsle.vi	v10,v1,-1
   21912:	vwadd.wv	v9,v9,v4
   21916:	vneg.v	v3,v3,v0.t
   2191a:	vmv1r.v	v0,v11
   2191e:	add	t0,t0,16
   21920:	vwadd.wv	v8,v8,v3
   21924:	vneg.v	v2,v2,v0.t
   21928:	vmv1r.v	v0,v10
   2192c:	add	a1,a1,a5
   2192e:	vwadd.wv	v7,v7,v2
   21932:	vneg.v	v1,v1,v0.t
   21936:	vwadd.wv	v6,v6,v1
   2193a:	bne	t3,a7,218b0 <x264_pixel_sad_x4_16x16.lto_priv.0+0x3e>
   2193e:	vsetvli	zero,zero,e32,m1,ta,ma
   21942:	vadd.vv	v1,v6,v9
   21946:	li	a6,0
   21948:	vmv.s.x	v2,a6
   2194c:	vadd.vv	v1,v1,v8
   21950:	vmv1r.v	v9,v5
   21954:	vmv1r.v	v8,v5
   21958:	vadd.vv	v1,v1,v7
   2195c:	vmv1r.v	v6,v5
   21960:	vmv1r.v	v7,v5
   21964:	vredsum.vs	v1,v1,v2
   21968:	mv	t2,t6
   2196a:	mv	t0,t5
   2196c:	mv	t3,t4
   2196e:	mv	a1,a0
   21970:	vmv.x.s	a6,v1
   21974:	sw	a6,0(s0)
   21978:	vsetvli	zero,zero,e8,mf4,ta,ma
   2197c:	add	s2,a2,4
   21980:	add	s1,a2,8
   21984:	vle8.v	v3,(s2)
   21988:	vle8.v	v14,(t3)
   2198c:	vle8.v	v2,(s1)
   21990:	vle8.v	v13,(t0)
   21994:	add	a6,a2,12
   21998:	vle8.v	v12,(t2)
   2199c:	vle8.v	v11,(a1)
   219a0:	vle8.v	v10,(a2)
   219a4:	vle8.v	v1,(a6)
   219a8:	vwsubu.vv	v4,v14,v3
   219ac:	vwsubu.vv	v3,v13,v2
   219b0:	add	a1,a1,16
   219b2:	add	t3,t3,16
   219b4:	add	t0,t0,16
   219b6:	vwsubu.vv	v2,v12,v1
   219ba:	vwsubu.vv	v1,v11,v10
   219be:	vsetvli	zero,zero,e16,mf2,ta,mu
   219c2:	vmsle.vi	v0,v4,-1
   219c6:	vmsle.vi	v12,v3,-1
   219ca:	vmsle.vi	v11,v2,-1
   219ce:	vneg.v	v4,v4,v0.t
   219d2:	vmv1r.v	v0,v12
   219d6:	vmsle.vi	v10,v1,-1
   219da:	vwadd.wv	v9,v9,v4
   219de:	vneg.v	v3,v3,v0.t
   219e2:	vmv1r.v	v0,v11
   219e6:	add	t2,t2,16
   219e8:	vwadd.wv	v7,v7,v3
   219ec:	vneg.v	v2,v2,v0.t
   219f0:	vmv1r.v	v0,v10
   219f4:	add	a2,a2,a5
   219f6:	vwadd.wv	v8,v8,v2
   219fa:	vneg.v	v1,v1,v0.t
   219fe:	vwadd.wv	v6,v6,v1
   21a02:	bne	a1,a7,21978 <x264_pixel_sad_x4_16x16.lto_priv.0+0x106>
   21a06:	vsetvli	zero,zero,e32,m1,ta,ma
   21a0a:	vadd.vv	v1,v6,v9
   21a0e:	li	a1,0
   21a10:	vmv.s.x	v2,a1
   21a14:	vadd.vv	v1,v1,v7
   21a18:	vmv1r.v	v9,v5
   21a1c:	vmv1r.v	v7,v5
   21a20:	vadd.vv	v1,v1,v8
   21a24:	vmv1r.v	v6,v5
   21a28:	vmv1r.v	v8,v5
   21a2c:	vredsum.vs	v1,v1,v2
   21a30:	mv	a2,a0
   21a32:	mv	t3,t6
   21a34:	mv	a0,t5
   21a36:	mv	a1,t4
   21a38:	vmv.x.s	a6,v1
   21a3c:	sw	a6,4(s0)
   21a40:	vsetvli	zero,zero,e8,mf4,ta,ma
   21a44:	add	t2,a3,4
   21a48:	add	t0,a3,8
   21a4c:	vle8.v	v3,(t2)
   21a50:	vle8.v	v14,(a1)
   21a54:	vle8.v	v2,(t0)
   21a58:	vle8.v	v13,(a0)
   21a5c:	add	a6,a3,12
   21a60:	vle8.v	v12,(t3)
   21a64:	vle8.v	v11,(a2)
   21a68:	vle8.v	v10,(a3)
   21a6c:	vle8.v	v1,(a6)
   21a70:	vwsubu.vv	v4,v14,v3
   21a74:	vwsubu.vv	v3,v13,v2
   21a78:	add	a2,a2,16
   21a7a:	add	a1,a1,16
   21a7c:	add	a0,a0,16
   21a7e:	vwsubu.vv	v2,v12,v1
   21a82:	vwsubu.vv	v1,v11,v10
   21a86:	vsetvli	zero,zero,e16,mf2,ta,mu
   21a8a:	vmsle.vi	v0,v4,-1
   21a8e:	vmsle.vi	v12,v3,-1
   21a92:	vmsle.vi	v11,v2,-1
   21a96:	vneg.v	v4,v4,v0.t
   21a9a:	vmv1r.v	v0,v12
   21a9e:	vmsle.vi	v10,v1,-1
   21aa2:	vwadd.wv	v9,v9,v4
   21aa6:	vneg.v	v3,v3,v0.t
   21aaa:	vmv1r.v	v0,v11
   21aae:	add	t3,t3,16
   21ab0:	vwadd.wv	v8,v8,v3
   21ab4:	vneg.v	v2,v2,v0.t
   21ab8:	vmv1r.v	v0,v10
   21abc:	add	a3,a3,a5
   21abe:	vwadd.wv	v7,v7,v2
   21ac2:	vneg.v	v1,v1,v0.t
   21ac6:	vwadd.wv	v6,v6,v1
   21aca:	bne	a7,a2,21a40 <x264_pixel_sad_x4_16x16.lto_priv.0+0x1ce>
   21ace:	vsetvli	zero,zero,e32,m1,ta,ma
   21ad2:	vadd.vv	v1,v6,v9
   21ad6:	li	a2,0
   21ad8:	vmv.s.x	v2,a2
   21adc:	vadd.vv	v1,v1,v8
   21ae0:	vmv1r.v	v6,v5
   21ae4:	vmv1r.v	v8,v5
   21ae8:	vadd.vv	v1,v1,v7
   21aec:	vmv1r.v	v7,v5
   21af0:	vredsum.vs	v1,v1,v2
   21af4:	vmv.x.s	a3,v1
   21af8:	sw	a3,8(s0)
   21afa:	vsetvli	zero,zero,e8,mf4,ta,ma
   21afe:	add	a1,a4,4
   21b02:	add	a2,a4,8
   21b06:	vle8.v	v3,(a1)
   21b0a:	vle8.v	v13,(t4)
   21b0e:	vle8.v	v2,(a2)
   21b12:	vle8.v	v12,(t5)
   21b16:	add	a3,a4,12
   21b1a:	vle8.v	v11,(t6)
   21b1e:	vle8.v	v10,(t1)
   21b22:	vle8.v	v9,(a4)
   21b26:	vle8.v	v1,(a3)
   21b2a:	vwsubu.vv	v4,v13,v3
   21b2e:	vwsubu.vv	v3,v12,v2
   21b32:	add	t1,t1,16
   21b34:	add	t4,t4,16
   21b36:	add	t5,t5,16
   21b38:	vwsubu.vv	v2,v11,v1
   21b3c:	vwsubu.vv	v1,v10,v9
   21b40:	vsetvli	zero,zero,e16,mf2,ta,mu
   21b44:	vmsle.vi	v0,v4,-1
   21b48:	vmsle.vi	v11,v3,-1
   21b4c:	vmsle.vi	v10,v2,-1
   21b50:	vneg.v	v4,v4,v0.t
   21b54:	vmv1r.v	v0,v11
   21b58:	vmsle.vi	v9,v1,-1
   21b5c:	vwadd.wv	v8,v8,v4
   21b60:	vneg.v	v3,v3,v0.t
   21b64:	vmv1r.v	v0,v10
   21b68:	add	t6,t6,16
   21b6a:	vwadd.wv	v7,v7,v3
   21b6e:	vneg.v	v2,v2,v0.t
   21b72:	vmv1r.v	v0,v9
   21b76:	add	a4,a4,a5
   21b78:	vwadd.wv	v5,v5,v2
   21b7c:	vneg.v	v1,v1,v0.t
   21b80:	vwadd.wv	v6,v6,v1
   21b84:	bne	a7,t1,21afa <x264_pixel_sad_x4_16x16.lto_priv.0+0x288>
   21b88:	vsetvli	zero,zero,e32,m1,ta,ma
   21b8c:	vadd.vv	v1,v6,v8
   21b90:	li	a4,0
   21b92:	vmv.s.x	v2,a4
   21b96:	vadd.vv	v1,v1,v7
   21b9a:	vadd.vv	v1,v1,v5
   21b9e:	vredsum.vs	v1,v1,v2
   21ba2:	vmv.x.s	a5,v1
   21ba6:	sw	a5,12(s0)
   21ba8:	ld	s0,24(sp)
   21baa:	ld	s1,16(sp)
   21bac:	ld	s2,8(sp)
   21bae:	ld	s3,0(sp)
   21bb0:	add	sp,sp,32
   21bb2:	ret

[-- Attachment #3: objd-x264-240423 --]
[-- Type: text/plain, Size: 6717 bytes --]

0000000000021892 <x264_pixel_sad_x4_16x16.lto_priv.0>:
   21892:	vsetivli	zero,4,e32,m1,ta,ma
   21896:	vmv.v.i	v5,0
   2189a:	add	sp,sp,-32
   2189c:	add	t4,a0,4
   218a0:	vmv1r.v	v7,v5
   218a4:	vmv1r.v	v8,v5
   218a8:	vmv1r.v	v9,v5
   218ac:	vmv1r.v	v6,v5
   218b0:	add	t5,a0,8
   218b4:	add	t6,a0,12
   218b8:	sd	s0,24(sp)
   218ba:	sd	s1,16(sp)
   218bc:	mv	s0,a6
   218be:	sd	s2,8(sp)
   218c0:	sd	s3,0(sp)
   218c2:	mv	a6,t4
   218c4:	mv	t2,t5
   218c6:	mv	t0,t6
   218c8:	mv	t1,a0
   218ca:	add	a7,a0,256
   218ce:	mv	t3,a0
   218d0:	vsetvli	zero,zero,e8,mf4,ta,ma
   218d4:	add	s3,a1,4
   218d8:	add	s2,a1,8
   218dc:	vle8.v	v3,(s3)
   218e0:	vle8.v	v14,(a6)
   218e4:	vle8.v	v2,(s2)
   218e8:	vle8.v	v13,(t2)
   218ec:	add	s1,a1,12
   218f0:	vle8.v	v12,(t0)
   218f4:	vle8.v	v11,(t3)
   218f8:	vle8.v	v10,(a1)
   218fc:	vle8.v	v1,(s1)
   21900:	vwsubu.vv	v4,v14,v3
   21904:	vwsubu.vv	v3,v13,v2
   21908:	add	t3,t3,16
   2190a:	add	a6,a6,16
   2190c:	add	t2,t2,16
   2190e:	vwsubu.vv	v2,v12,v1
   21912:	vwsubu.vv	v1,v11,v10
   21916:	vsetvli	zero,zero,e16,mf2,ta,mu
   2191a:	vmsle.vi	v0,v4,-1
   2191e:	vmsle.vi	v12,v3,-1
   21922:	vmsle.vi	v11,v2,-1
   21926:	vneg.v	v4,v4,v0.t
   2192a:	vmv1r.v	v0,v12
   2192e:	vmsle.vi	v10,v1,-1
   21932:	add	t0,t0,16
   21934:	vneg.v	v3,v3,v0.t
   21938:	vmv1r.v	v0,v11
   2193c:	add	a1,a1,a5
   2193e:	vneg.v	v2,v2,v0.t
   21942:	vmv1r.v	v0,v10
   21946:	vmv1r.v	v10,v9
   2194a:	vneg.v	v1,v1,v0.t
   2194e:	vwadd.wv	v9,v10,v4
   21952:	vmv1r.v	v4,v8
   21956:	vwadd.wv	v8,v4,v3
   2195a:	vmv1r.v	v3,v7
   2195e:	vwadd.wv	v7,v3,v2
   21962:	vmv1r.v	v2,v6
   21966:	vwadd.wv	v6,v2,v1
   2196a:	bne	t3,a7,218d0 <x264_pixel_sad_x4_16x16.lto_priv.0+0x3e>
   2196e:	vsetvli	zero,zero,e32,m1,ta,ma
   21972:	vadd.vv	v1,v6,v9
   21976:	li	a6,0
   21978:	vmv.s.x	v2,a6
   2197c:	vadd.vv	v1,v1,v8
   21980:	vmv1r.v	v9,v5
   21984:	vmv1r.v	v8,v5
   21988:	vadd.vv	v1,v1,v7
   2198c:	vmv1r.v	v6,v5
   21990:	vmv1r.v	v7,v5
   21994:	vredsum.vs	v1,v1,v2
   21998:	mv	t2,t6
   2199a:	mv	t0,t5
   2199c:	mv	t3,t4
   2199e:	mv	a1,a0
   219a0:	vmv.x.s	a6,v1
   219a4:	sw	a6,0(s0)
   219a8:	vsetvli	zero,zero,e8,mf4,ta,ma
   219ac:	add	s2,a2,4
   219b0:	add	s1,a2,8
   219b4:	vle8.v	v3,(s2)
   219b8:	vle8.v	v14,(t3)
   219bc:	vle8.v	v2,(s1)
   219c0:	vle8.v	v13,(t0)
   219c4:	add	a6,a2,12
   219c8:	vle8.v	v12,(t2)
   219cc:	vle8.v	v11,(a1)
   219d0:	vle8.v	v10,(a2)
   219d4:	vle8.v	v1,(a6)
   219d8:	vwsubu.vv	v4,v14,v3
   219dc:	vwsubu.vv	v3,v13,v2
   219e0:	add	a1,a1,16
   219e2:	add	t3,t3,16
   219e4:	add	t0,t0,16
   219e6:	vwsubu.vv	v2,v12,v1
   219ea:	vwsubu.vv	v1,v11,v10
   219ee:	vsetvli	zero,zero,e16,mf2,ta,mu
   219f2:	vmsle.vi	v0,v4,-1
   219f6:	vmsle.vi	v12,v3,-1
   219fa:	vmsle.vi	v11,v2,-1
   219fe:	vneg.v	v4,v4,v0.t
   21a02:	vmv1r.v	v0,v12
   21a06:	vmsle.vi	v10,v1,-1
   21a0a:	add	t2,t2,16
   21a0c:	vneg.v	v3,v3,v0.t
   21a10:	vmv1r.v	v0,v11
   21a14:	add	a2,a2,a5
   21a16:	vneg.v	v2,v2,v0.t
   21a1a:	vmv1r.v	v0,v10
   21a1e:	vmv1r.v	v10,v9
   21a22:	vneg.v	v1,v1,v0.t
   21a26:	vwadd.wv	v9,v10,v4
   21a2a:	vmv1r.v	v4,v7
   21a2e:	vwadd.wv	v7,v4,v3
   21a32:	vmv1r.v	v3,v8
   21a36:	vwadd.wv	v8,v3,v2
   21a3a:	vmv1r.v	v2,v6
   21a3e:	vwadd.wv	v6,v2,v1
   21a42:	bne	a1,a7,219a8 <x264_pixel_sad_x4_16x16.lto_priv.0+0x116>
   21a46:	vsetvli	zero,zero,e32,m1,ta,ma
   21a4a:	vadd.vv	v1,v6,v9
   21a4e:	li	a1,0
   21a50:	vmv.s.x	v2,a1
   21a54:	vadd.vv	v1,v1,v7
   21a58:	vmv1r.v	v9,v5
   21a5c:	vmv1r.v	v7,v5
   21a60:	vadd.vv	v1,v1,v8
   21a64:	vmv1r.v	v6,v5
   21a68:	vmv1r.v	v8,v5
   21a6c:	vredsum.vs	v1,v1,v2
   21a70:	mv	a2,a0
   21a72:	mv	t3,t6
   21a74:	mv	a0,t5
   21a76:	mv	a1,t4
   21a78:	vmv.x.s	a6,v1
   21a7c:	sw	a6,4(s0)
   21a80:	vsetvli	zero,zero,e8,mf4,ta,ma
   21a84:	add	t2,a3,4
   21a88:	add	t0,a3,8
   21a8c:	vle8.v	v3,(t2)
   21a90:	vle8.v	v14,(a1)
   21a94:	vle8.v	v2,(t0)
   21a98:	vle8.v	v13,(a0)
   21a9c:	add	a6,a3,12
   21aa0:	vle8.v	v12,(t3)
   21aa4:	vle8.v	v11,(a2)
   21aa8:	vle8.v	v10,(a3)
   21aac:	vle8.v	v1,(a6)
   21ab0:	vwsubu.vv	v4,v14,v3
   21ab4:	vwsubu.vv	v3,v13,v2
   21ab8:	add	a2,a2,16
   21aba:	add	a1,a1,16
   21abc:	add	a0,a0,16
   21abe:	vwsubu.vv	v2,v12,v1
   21ac2:	vwsubu.vv	v1,v11,v10
   21ac6:	vsetvli	zero,zero,e16,mf2,ta,mu
   21aca:	vmsle.vi	v0,v4,-1
   21ace:	vmsle.vi	v12,v3,-1
   21ad2:	vmsle.vi	v11,v2,-1
   21ad6:	vneg.v	v4,v4,v0.t
   21ada:	vmv1r.v	v0,v12
   21ade:	vmsle.vi	v10,v1,-1
   21ae2:	add	t3,t3,16
   21ae4:	vneg.v	v3,v3,v0.t
   21ae8:	vmv1r.v	v0,v11
   21aec:	add	a3,a3,a5
   21aee:	vneg.v	v2,v2,v0.t
   21af2:	vmv1r.v	v0,v10
   21af6:	vmv1r.v	v10,v9
   21afa:	vneg.v	v1,v1,v0.t
   21afe:	vwadd.wv	v9,v10,v4
   21b02:	vmv1r.v	v4,v8
   21b06:	vwadd.wv	v8,v4,v3
   21b0a:	vmv1r.v	v3,v7
   21b0e:	vwadd.wv	v7,v3,v2
   21b12:	vmv1r.v	v2,v6
   21b16:	vwadd.wv	v6,v2,v1
   21b1a:	bne	a7,a2,21a80 <x264_pixel_sad_x4_16x16.lto_priv.0+0x1ee>
   21b1e:	vsetvli	zero,zero,e32,m1,ta,ma
   21b22:	vadd.vv	v1,v6,v9
   21b26:	li	a2,0
   21b28:	vmv.s.x	v2,a2
   21b2c:	vadd.vv	v1,v1,v8
   21b30:	vmv1r.v	v6,v5
   21b34:	vmv1r.v	v8,v5
   21b38:	vadd.vv	v1,v1,v7
   21b3c:	vmv1r.v	v7,v5
   21b40:	vredsum.vs	v1,v1,v2
   21b44:	vmv.x.s	a3,v1
   21b48:	sw	a3,8(s0)
   21b4a:	vsetvli	zero,zero,e8,mf4,ta,ma
   21b4e:	add	a1,a4,4
   21b52:	add	a2,a4,8
   21b56:	vle8.v	v3,(a1)
   21b5a:	vle8.v	v13,(t4)
   21b5e:	vle8.v	v2,(a2)
   21b62:	vle8.v	v12,(t5)
   21b66:	add	a3,a4,12
   21b6a:	vle8.v	v11,(t6)
   21b6e:	vle8.v	v10,(t1)
   21b72:	vle8.v	v9,(a4)
   21b76:	vle8.v	v1,(a3)
   21b7a:	vwsubu.vv	v4,v13,v3
   21b7e:	vwsubu.vv	v3,v12,v2
   21b82:	add	t1,t1,16
   21b84:	add	t4,t4,16
   21b86:	add	t5,t5,16
   21b88:	vwsubu.vv	v2,v11,v1
   21b8c:	vwsubu.vv	v1,v10,v9
   21b90:	vsetvli	zero,zero,e16,mf2,ta,mu
   21b94:	vmsle.vi	v0,v4,-1
   21b98:	vmsle.vi	v11,v3,-1
   21b9c:	vmsle.vi	v10,v2,-1
   21ba0:	vneg.v	v4,v4,v0.t
   21ba4:	vmv1r.v	v0,v11
   21ba8:	vmsle.vi	v9,v1,-1
   21bac:	add	t6,t6,16
   21bae:	vneg.v	v3,v3,v0.t
   21bb2:	vmv1r.v	v0,v10
   21bb6:	add	a4,a4,a5
   21bb8:	vneg.v	v2,v2,v0.t
   21bbc:	vmv1r.v	v0,v9
   21bc0:	vmv1r.v	v9,v8
   21bc4:	vneg.v	v1,v1,v0.t
   21bc8:	vwadd.wv	v8,v9,v4
   21bcc:	vmv1r.v	v4,v7
   21bd0:	vwadd.wv	v7,v4,v3
   21bd4:	vmv1r.v	v3,v5
   21bd8:	vwadd.wv	v5,v3,v2
   21bdc:	vmv1r.v	v2,v6
   21be0:	vwadd.wv	v6,v2,v1
   21be4:	bne	a7,t1,21b4a <x264_pixel_sad_x4_16x16.lto_priv.0+0x2b8>
   21be8:	vsetvli	zero,zero,e32,m1,ta,ma
   21bec:	vadd.vv	v1,v6,v8
   21bf0:	li	a4,0
   21bf2:	vmv.s.x	v2,a4
   21bf6:	vadd.vv	v1,v1,v7
   21bfa:	vadd.vv	v1,v1,v5
   21bfe:	vredsum.vs	v1,v1,v2
   21c02:	vmv.x.s	a5,v1
   21c06:	sw	a5,12(s0)
   21c08:	ld	s0,24(sp)
   21c0a:	ld	s1,16(sp)
   21c0c:	ld	s2,8(sp)
   21c0e:	ld	s3,0(sp)
   21c10:	add	sp,sp,32
   21c12:	ret
