From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=Pzh6=55=rivai.ai=juzhe.zhong@sourceware.org>
Received: from smtpbg156.qq.com (smtpbg156.qq.com [15.184.82.18])
	by sourceware.org (Postfix) with ESMTPS id 8330A3858C60
	for <gcc-patches@gcc.gnu.org>; Wed,  1 Feb 2023 12:25:00 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 8330A3858C60
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivai.ai
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivai.ai
X-QQ-mid: bizesmtp77t1675254288t133siq9
Received: from server1.localdomain ( [58.60.1.22])
	by bizesmtp.qq.com (ESMTP) with 
	id ; Wed, 01 Feb 2023 20:24:47 +0800 (CST)
X-QQ-SSF: 01400000000000E0L000000A0000000
X-QQ-FEAT: C46Rb8GPIEf/MMFw/9Wp+alic7bFzqnF6cP1SLmCeRmEerJDRfsa4XuVONxCT
	fcmJc4fppldcVr2n9YMMbu4ga7dxAV7yySZmZUG7iCHDW85KBwE8GPsAQ6Gr8mDDDpU1vZU
	b8rN16bAU5qt+VybBipeyUKaAYimXDoI3bREUs2HI/exlH7RDDPPm3xctrKOdrkX/K5hpWm
	tGFXvwz7Ms1kWRgurilaaEliU3vGRzP8Ta+gG33NNOYEUqXLz8jiTHio4XVZjobPqwg61Zt
	JQSjm0w5wAYnz9I64zuiMRXeaToPx71w0Rzmmt/K89BStUTWjsTp5AURmrGlv805qq+ArT6
	aOsfGqiuXlY96X3sjt41I+w99y/pMF+E9SZKCUy9sWnfAGAH5BC0YMX3We4GofF9dY/l9WI
	4m8LXVRRJCs=
X-QQ-GoodBg: 2
From: juzhe.zhong@rivai.ai
To: gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com,
	richard.sandiford@arm.com,
	rguenther@suse.de,
	jeffreyalaw@gmail.com,
	apinski@marvell.com,
	Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
Subject: [PATCH] CPROP: Allow cprop optimization when the function has a single block
Date: Wed,  1 Feb 2023 20:24:44 +0800
Message-Id: <20230201122444.253620-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-QQ-SENDSIZE: 520
Feedback-ID: bizesmtp:rivai.ai:qybglogicsvr:qybglogicsvr7
X-Spam-Status: No, score=-10.7 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_STATUS,RCVD_IN_BARRACUDACENTRAL,RCVD_IN_DNSWL_NONE,SPF_HELO_PASS,TXREP,T_SPF_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>

Hi, this patch is present for GCC 14 since I understand it's not appropriate
to land it in GCC 13.

NUM_FIXED_BLOCKS = 2 since GCC define each function has aleast 2 blocks
one is entry block, the other is exit block.
So according this code, the function will not do cprop optimization when
there is only exactly one single block.

I am not sure whether it's correct to fix it like this.
Can anyone tell me why forbid cprop optimization if the function only has s
single block ?

Let's take a look at these 2 examples of RVV intrinsics:
1.    void f1 (void * in, void *out, int64_t x, int n)
      {
      vint64m1_t v = __riscv_vle64_v_i64m1 (in + 1, 4);
      vint64m1_t v2 = __riscv_vle64_v_i64m1_tu (v, in + 2, 4);
      vint64m1_t v3 = __riscv_vadd_vx_i64m1 (v2, x, 4);
      vint64m1_t v4 = __riscv_vadd_vx_i64m1 (v3, x, 4);
      __riscv_vse64_v_i64m1 (out + 2, v4, 4);
      }

asm:
     	addi	sp,sp,-16
	sw	a2,8(sp)
	sw	a3,12(sp)
	sw	a2,0(sp)
	sw	a3,4(sp)
	addi	a5,a0,1
	vsetivli	zero,4,e64,m1,ta,ma
	addi	a0,a0,2
	vle64.v	v24,0(a5)
	addi	a5,sp,8
	vlse64.v	v27,0(a5),zero
	addi	a1,a1,2
	vsetvli	zero,zero,e64,m1,tu,ma
	vle64.v	v24,0(a0)
	vsetvli	zero,zero,e64,m1,ta,ma
	vlse64.v	v25,0(sp),zero
	vadd.vv	v26,v24,v27
	vadd.vv	v24,v26,v25
	vse64.v	v24,0(a1)
	addi	sp,sp,16
	jr	ra
you can see here there are 2 vlse64.v instructions that broadcast the scalar value "x"
GCC fail to eliminate the second vlse64.v instruction since GCC doesn't do the cprop
optimization (the function only has 1 single block). It can be optimized if we apply
this patch.

2. void f1 (void * in, void *out, int64_t x, int n)
{
    if (n) {
      vint64m1_t v = __riscv_vle64_v_i64m1 (in + 1, 4);
      vint64m1_t v2 = __riscv_vle64_v_i64m1_tu (v, in + 2, 4);
      vint64m1_t v3 = __riscv_vadd_vx_i64m1 (v2, x, 4);
      vint64m1_t v4 = __riscv_vadd_vx_i64m1 (v3, x, 4);
      __riscv_vse64_v_i64m1 (out + 2, v4, 4);
    }
}

asm:
  f1:
	vsetivli	zero,4,e64,m1,ta,ma
	beq	a4,zero,.L7
	addi	sp,sp,-16
	sw	a2,8(sp)
	sw	a3,12(sp)
	addi	a5,a0,1
	vle64.v	v24,0(a5)
	addi	a0,a0,2
	addi	a5,sp,8
	vlse64.v	v25,0(a5),zero
	addi	a1,a1,2
	vsetvli	zero,zero,e64,m1,tu,ma
	vle64.v	v24,0(a0)
	vadd.vv	v26,v24,v25
	vadd.vv	v24,v26,v25
	vse64.v	v24,0(a1)
	addi	sp,sp,16
	jr	ra
.L7:
	ret

Here, if we add if (n) condition here, the program will end up with more than 1 block.
So GCC will do the cprop optimization and the second vlse64.v instruction is eliminated.

I am not sure whether this patch is correct.
Can anyone help me with that ?
Thanks.


gcc/ChangeLog:

        * cprop.cc (one_cprop_pass): Remove +1.

---
 gcc/cprop.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/cprop.cc b/gcc/cprop.cc
index 6ec0bda4a24..615bc5078b6 100644
--- a/gcc/cprop.cc
+++ b/gcc/cprop.cc
@@ -1749,7 +1749,7 @@ one_cprop_pass (void)
   int changed = 0;
 
   /* Return if there's nothing to do, or it is too expensive.  */
-  if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1
+  if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS
       || gcse_or_cprop_is_too_expensive (_ ("const/copy propagation disabled")))
     return 0;
 
-- 
2.36.1