From: Jiufu Guo
To: Jiufu Guo via Gcc-patches
Cc: Segher Boessenkool, Richard Biener, Richard Biener, dje.gcc@gmail.com, linkw@gcc.gnu.org, jeffreyalaw@gmail.com
Subject: Re: [PATCH] loading float member of parameter stored via int registers
Date: Fri, 30 Dec 2022 10:22:31 +0800

Hi,

Jiufu Guo via Gcc-patches writes:

> Hi,
>
> Jiufu Guo via Gcc-patches writes:
>
>> Hi,
>>
>> Segher Boessenkool writes:
>>
>>> On Fri, Dec 23, 2022 at 08:13:48PM +0100, Richard Biener wrote:
>>>> > On 23.12.2022 at 17:55, Segher Boessenkool wrote:
>>>> > There are at least six very different kinds of subreg:
>>>> >
>>>> > 0) Lvalue subregs.  Most archs have no use for it, and it can be
>>>> > expressed much more clearly and cleanly always.
>>>> > 1) Subregs of mem.  Do not use, deprecated.  When old reload goes away
>>>> > this will go away.
>>>> > 2) Subregs of hard registers.  Do not use, there are much better ways to
>>>> > write subregs of a non-zero byte offset, and for zero offset this is
>>>> > non-canonical RTL.
>>>> > 3) Bitcast subregs.  In principle they go from one mode to another mode
>>>> > of the same size (but read on).
>>>> > 4) Paradoxical subregs.  A concept completely separate from the rest,
>>>> > different rules for everything, it has to be special cased almost
>>>> > everywhere, it would be better if it was a separate rtx_code imo.
>>>> > 5) Finally, normal subregs, taking a contiguous span of bits from some
>>>> > value.
>>>> >
>>>> > Now, it is invalid to have a subreg of a subreg, so a 3) of a 5) is
>>>> > written as just one subreg, as you say.  And a 4) of a 5) is just
>>>> > invalid afaics (and let's not talk about 0)..2) anymore :-) )
>>>> >
>>>> >> Note whether targets actually support subreg operations needs to be queried and I’m not sure how subreg with offset validation should work there.
>>>> >
>>>> > But 3) is always valid, no?  On pseudos
>> I also have a similar question: do we need to query/recog whether "SF(SI#0)"
>> is valid on the target, or would it always work (even through reload)?
>> I also hit this during debugging on ppc64le: "SF(SI#0)" is valid,
>> and "SF(DI#4)" is not valid.
>>>>
>>>> Yes, but it will eventually result in a spill/reload which is
>>>> undesirable when we created this from CSE from a load.  So I think
>>>> for CSE we do want to know whether a spill will definitely not
>>>> occur.
>>>
>>> Does it cause reloads though?  On any sane backend?  If no movsf pattern
>>> allows integer registers, can things work at all?
>>>
>>> Anyway, the normal way to test if some RTL is valid is to just generate
>>> it (using validate_change) and then do apply_change_group, which then
>>> cancels the changes if they do not work.  CSE already does some of
>>> this.
>> validate_change seems ok.  Thanks!
>>>
>>> (I am doubtful doing any of this in CSE is a good idea fwiw).
>> I understand your concern, especially when we need to emit additional
>> insns in CSE.
>> Still, CSE already does some similar work.  It transforms
>> "[sf:DI]=%x:DI; %y:DI=[sf:DI]" to "%y:DI=%x:DI",
>> and "see if a MEM has already been loaded with a widening operation;
>> if it has, we can use a subreg of that." (only for int modes).
>> So, it may be acceptable to do this in CSE (although it may still
>> seem hacky).
>
> This may work for "DI to DF", because the "mode converting
> subreg:DF(x:DI,0)" is cheaper than the "mem load DF([sf:DI])".
> Then "y:DF=[sf:DI]" can be replaced by "y:DF=x:DI#0".
>
> But for "subreg:SF(x:SI,0)" in CSE, it may not be cheaper.
> So, doing "convert DI to SF" in CSE may be doubtful.

Considering the limitations of CSE, I looked for other places to handle
this issue, and noticed that DSE can already transform
"[sfp:DI]=x:DI ; y:SI=[sfp:DI]" into "y:SI=x:DI#0".

So, I drafted a patch to update DSE to handle DI->DF/SF.
The patch updates "extract_low_bits" so that it can do the mode change
with a subreg.

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index b12b0e000c2..5e36331082c 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -2439,7 +2439,10 @@ extract_low_bits (machine_mode mode, machine_mode src_mode, rtx src)
 
   if (!targetm.modes_tieable_p (src_int_mode, src_mode))
     return NULL_RTX;
-  if (!targetm.modes_tieable_p (int_mode, mode))
+  if (!targetm.modes_tieable_p (int_mode, mode)
+      && !(known_le (GET_MODE_BITSIZE (mode), GET_MODE_BITSIZE (src_mode))
+           && GET_MODE_CLASS (mode) == MODE_FLOAT
+           && GET_MODE_CLASS (src_mode) == MODE_INT))
     return NULL_RTX;
 
   src = gen_lowpart (src_int_mode, src);
-----------

However, I am aware that DSE does not handle cases that cross basic
blocks well.  For example, the following code is not optimized:

typedef struct SF { float a[4]; int i1; int i2; } SF;
int
foo_si (SF a, int flag)
{
  if (flag == 2)
    return a.i1 + a.i2;
  return 0;
}

So, we may need to go back to the ideas we discussed earlier in this thread:
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608872.html

BR,
Jeff (Jiufu)

>
> Any comments or suggestions?
>
>
> BR,
> Jeff (Jiufu)
>
>>
>> Thanks for the great comments!
>>
>> BR,
>> Jeff (Jiufu)
>>
>>>
>>>
>>> Segher
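
For readers less used to RTL, the two ideas discussed in this thread can be modelled in plain C.  This is a minimal sketch, not GCC code: the toy_* types and functions are hypothetical, it assumes an IEEE-754 32-bit float, and it models only the second modes_tieable_p test in the patched extract_low_bits (the target hook is reduced to a boolean argument; the earlier src_int_mode check is not modelled).  The value-level helper assumes, roughly, that the DI value is first cut down to its low-order 32 bits and then reinterpreted bit-for-bit as SF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* A bitcast between SFmode and a 32-bit integer needs same-size types.  */
static_assert (sizeof (float) == sizeof (uint32_t),
               "float must be 32 bits for this sketch");

/* Hypothetical stand-ins for GCC's mode classes and modes; these are NOT
   GCC's machine_mode / mode_class types, just enough to model the check.  */
enum toy_mode_class { TOY_MODE_INT, TOY_MODE_FLOAT };

struct toy_mode
{
  enum toy_mode_class cls;
  unsigned bitsize;
};

/* Models the second tieability test in extract_low_bits after the patch:
   even when the target does not report INT_MODE and MODE as tieable, a
   float MODE that is no wider than an integer SRC_MODE is let through.  */
static bool
toy_patched_check (bool int_mode_tieable, struct toy_mode mode,
                   struct toy_mode src_mode)
{
  if (int_mode_tieable)
    return true;
  return mode.bitsize <= src_mode.bitsize
         && mode.cls == TOY_MODE_FLOAT
         && src_mode.cls == TOY_MODE_INT;
}

/* Value-level meaning of "y:SF = lowpart of x:DI": keep the low-order
   32 bits of the 64-bit integer, then reinterpret those bits as a float
   (a bitcast, like a same-size subreg).  */
static float
toy_lowpart_as_sf (uint64_t x)
{
  uint32_t lo = (uint32_t) x;   /* truncation keeps the low-order bits */
  float f;
  memcpy (&f, &lo, sizeof f);   /* bit-for-bit copy, no value conversion */
  return f;
}
```

For example, toy_lowpart_as_sf (0xdeadbeef3f800000) yields 1.0f because only the low word 0x3f800000 survives.  On a little-endian target such as ppc64le, those low-order bits sit at subreg byte offset 0, i.e. SF(DI#0) rather than SF(DI#4).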