From: Jiufu Guo
To: gcc-patches@gcc.gnu.org
Cc: segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org, rguenther@suse.de, jeffreyalaw@gmail.com
Subject: Re: [PATCH V2] Use subscalar mode to move struct block for parameter
Date: Mon, 21 Nov 2022 11:07:05 +0800
Message-ID: <7ea64lroo6.fsf@pike.rch.stglabs.ibm.com>
In-Reply-To: <20221117061549.178481-1-guojiufu@linux.ibm.com> (Jiufu Guo's message of "Thu, 17 Nov 2022 14:15:49 +0800")
References: <20221117061549.178481-1-guojiufu@linux.ibm.com>
Jiufu Guo writes:

> Hi,
>
> As mentioned in the previous version of this patch:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html
> suboptimal code is generated when "assigning from parameter" or
> "assigning to return value".
> This patch improves assignments from parameters, as in the cases
> below:
> ////case1.c
> typedef struct SA {double a[3];long l; } A;
> A ret_arg (A a) {return a;}
> void st_arg (A a, A *p) {*p = a;}
>
> ////case2.c
> typedef struct SA {double a[3];} A;
> A ret_arg (A a) {return a;}
> void st_arg (A a, A *p) {*p = a;}
>
> Bootstrap and regtest pass with this patch on ppc64{,le}
> and x86_64.
> * Besides asking for help reviewing this patch, I would also like
> comments on improving "assigning to returns".

I updated the patch to fix the issue for returns as well.  It adds a
flag, DECL_USEDBY_RETURN_P, to indicate whether a VAR_DECL is used by
a return stmt.  Since the patch only fixes the issue in the expand
pass, we will try to update it so that this flag is not needed.

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index dd29ffffc03..09b8ec64cea 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -2158,6 +2158,20 @@ expand_used_vars (bitmap forced_stack_vars)
       frame_phase = off ? align - off : 0;
     }
 
+  /* Collect VARs on returns.  */
+  if (DECL_RESULT (current_function_decl))
+    {
+      edge_iterator ei;
+      edge e;
+      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+        if (greturn *ret = safe_dyn_cast (last_stmt (e->src)))
+          {
+            tree val = gimple_return_retval (ret);
+            if (val && VAR_P (val))
+              DECL_USEDBY_RETURN_P (val) = 1;
+          }
+    }
+
   /* Set TREE_USED on all variables in the local_decls.  */
   FOR_EACH_LOCAL_DECL (cfun, i, var)
     TREE_USED (var) = 1;
diff --git a/gcc/expr.cc b/gcc/expr.cc
index d9407432ea5..20973649963 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -6045,6 +6045,52 @@ expand_assignment (tree to, tree from, bool nontemporal)
       return;
     }
 
+  if ((TREE_CODE (from) == PARM_DECL && DECL_INCOMING_RTL (from)
+       && TYPE_MODE (TREE_TYPE (from)) == BLKmode
+       && (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL
+           || REG_P (DECL_INCOMING_RTL (from))))
+      || (VAR_P (to) && DECL_USEDBY_RETURN_P (to)
+          && TYPE_MODE (TREE_TYPE (to)) == BLKmode
+          && GET_CODE (DECL_RTL (DECL_RESULT (current_function_decl)))
+               == PARALLEL))
+    {
+      push_temp_slots ();
+      rtx par_ret;
+      machine_mode mode;
+      par_ret = TREE_CODE (from) == PARM_DECL
+                  ? DECL_INCOMING_RTL (from)
+                  : DECL_RTL (DECL_RESULT (current_function_decl));
+      mode = GET_CODE (par_ret) == PARALLEL
+               ? GET_MODE (XEXP (XVECEXP (par_ret, 0, 0), 0))
+               : word_mode;
+      int mode_size = GET_MODE_SIZE (mode).to_constant ();
+      int size = INTVAL (expr_size (from));
+
+      /* Whether and how to move the parameter in sub-modes depends on
+         its size and position.  Use a heuristic limit here.  */
+      int hurstc_num = 8;
+      if (size < mode_size || (size % mode_size) != 0
+          || size > (mode_size * hurstc_num))
+        result = store_expr (from, to_rtx, 0, nontemporal, false);
+      else
+        {
+          rtx from_rtx
+            = expand_expr (from, NULL_RTX, GET_MODE (to_rtx), EXPAND_NORMAL);
+          for (int i = 0; i < size / mode_size; i++)
+            {
+              rtx temp = gen_reg_rtx (mode);
+              rtx src = adjust_address (from_rtx, mode, mode_size * i);
+              rtx dest = adjust_address (to_rtx, mode, mode_size * i);
+              emit_move_insn (temp, src);
+              emit_move_insn (dest, temp);
+            }
+          result = to_rtx;
+        }
+      preserve_temp_slots (result);
+      pop_temp_slots ();
+      return;
+    }
+
   /* Compute FROM and store the value in the rtx we got.  */
 
   push_temp_slots ();
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index af75522504f..be42e1464de 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1808,7 +1808,8 @@ struct GTY(()) tree_decl_common {
      In VAR_DECL, PARM_DECL and RESULT_DECL, this is
      DECL_HAS_VALUE_EXPR_P.  */
   unsigned decl_flag_2 : 1;
-  /* In FIELD_DECL, this is DECL_PADDING_P.  */
+  /* In FIELD_DECL, this is DECL_PADDING_P.
+     In VAR_DECL, this is DECL_USEDBY_RETURN_P.  */
   unsigned decl_flag_3 : 1;
   /* Logically, these two would go in a theoretical base shared by var and
      parm decl. */
diff --git a/gcc/tree.h b/gcc/tree.h
index a863d2e50e5..73c0314dac1 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -3011,6 +3011,10 @@ extern void decl_value_expr_insert (tree, tree);
 #define DECL_PADDING_P(NODE) \
   (FIELD_DECL_CHECK (NODE)->decl_common.decl_flag_3)
 
+/* Used in a VAR_DECL to indicate that it is used by a return stmt.  */
+#define DECL_USEDBY_RETURN_P(NODE) \
+  (VAR_DECL_CHECK (NODE)->decl_common.decl_flag_3)
+
 /* Used in a FIELD_DECL to indicate whether this field is not a flexible
    array member. This is only valid for the last array type field of a
    structure.  */

>
> On some targets (ppc64), for the case below:
> ////case3.c
> typedef struct SA {double a[3]; long l; } A;
> A ret_arg_pt (A *a) {return *a;}
>
> The optimized GIMPLE code looks like:
>   = *a_2(D);
>   return ;
> Here, the RESULT_DECL is a MEM, and "aggregate_value_p" returns
> true for it.
>
> * For the case below, however, the generated code is still suboptimal.
> ////case4.c
> typedef struct SA {double a[3];} A;
> A ret_arg_pt (A *a) {return *a;}
>
> The optimized GIMPLE code looks like:
>   D.3951 = *a_2(D);
>   return D.3951;
> The return and assignment stmts use D.3951 (a VAR_DECL) instead of
> the RESULT_DECL.  Both D.3951 and the RESULT_DECL have BLKmode.
> The RTL of D.3951 is a MEM, while the RTL of the RESULT_DECL is a
> PARALLEL.  For a PARALLEL, aggregate_value_p returns false.
>
> In expand_assignment, there is this code:
>   if (TREE_CODE (to) == RESULT_DECL
>       && (REG_P (to_rtx) || GET_CODE (to_rtx) == PARALLEL))
> This code handles the RESULT_DECL, but not "D.3951".
>
> One way to handle this issue would be to update the GIMPLE
> sequence to " = *a_2(D); return ;", so that the RESULT_DECL is
> assigned and returned directly.  Another would be to collect the
> VARs used by return stmts and, for assignments to those VARs, use
> a sub-scalar mode for the block move.
>
> Thanks for any comments and suggestions!
>
>
> BR,
> Jeff (Jiufu)
>
> ---
>  gcc/expr.cc | 40 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
>
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index d9407432ea5..420f9cf3662 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -6045,6 +6045,46 @@ expand_assignment (tree to, tree from, bool nontemporal)
>        return;
>      }
>  
> +  if (TREE_CODE (from) == PARM_DECL && DECL_INCOMING_RTL (from)
> +      && TYPE_MODE (TREE_TYPE (from)) == BLKmode
> +      && (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL
> +          || REG_P (DECL_INCOMING_RTL (from))))
> +    {
> +      rtx parm = DECL_INCOMING_RTL (from);
> +
> +      push_temp_slots ();
> +      machine_mode mode;
> +      mode = GET_CODE (parm) == PARALLEL
> +               ? GET_MODE (XEXP (XVECEXP (parm, 0, 0), 0))
> +               : word_mode;
> +      int mode_size = GET_MODE_SIZE (mode).to_constant ();
> +      int size = INTVAL (expr_size (from));
> +
> +      /* Whether and how to move the parameter in sub-modes depends on
> +         its size and position.  Use a heuristic limit here.  */
> +      int hurstc_num = 8;
> +      if (size < mode_size || (size % mode_size) != 0
> +          || size > (mode_size * hurstc_num))
> +        result = store_expr (from, to_rtx, 0, nontemporal, false);
> +      else
> +        {
> +          rtx from_rtx
> +            = expand_expr (from, NULL_RTX, GET_MODE (to_rtx), EXPAND_NORMAL);
> +          for (int i = 0; i < size / mode_size; i++)
> +            {
> +              rtx temp = gen_reg_rtx (mode);
> +              rtx src = adjust_address (from_rtx, mode, mode_size * i);
> +              rtx dest = adjust_address (to_rtx, mode, mode_size * i);
> +              emit_move_insn (temp, src);
> +              emit_move_insn (dest, temp);
> +            }
> +          result = to_rtx;
> +        }
> +      preserve_temp_slots (result);
> +      pop_temp_slots ();
> +      return;
> +    }
> +
>    /* Compute FROM and store the value in the rtx we got.  */
>  
>    push_temp_slots ();