From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jiufu Guo <guojiufu@linux.ibm.com>
To: Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>,
        Richard Biener <richard.guenther@gmail.com>,
        Richard Biener <rguenther@suse.de>, dje.gcc@gmail.com,
        linkw@gcc.gnu.org, jeffreyalaw@gmail.com
Subject: Re: [PATCH] loading float member of parameter stored via int registers
References: <20221223165239.GA25951@gate.crashing.org>
	<E2B896DC-D44E-444C-B191-8758ACF894D5@gmail.com>
	<20221223195207.GB25951@gate.crashing.org>
	<7ek02dft05.fsf@pike.rch.stglabs.ibm.com>
	<7e8ritexu0.fsf@pike.rch.stglabs.ibm.com>
Date: Fri, 30 Dec 2022 10:22:31 +0800
In-Reply-To: <7e8ritexu0.fsf@pike.rch.stglabs.ibm.com> (Jiufu Guo via
	Gcc-patches's message of "Tue, 27 Dec 2022 22:16:23 +0800")
Message-ID: <7epmc1eil4.fsf@pike.rch.stglabs.ibm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux)
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
List-Id: <gcc-patches.gcc.gnu.org>

Hi,

Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:

> Hi,
>
> Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>
>> Hi,
>>
>> Segher Boessenkool <segher@kernel.crashing.org> writes:
>>
>>> On Fri, Dec 23, 2022 at 08:13:48PM +0100, Richard Biener wrote:
>>>> > On 23.12.2022 at 17:55, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>>> > There are at least six very different kinds of subreg:
>>>> >
>>>> > 0) Lvalue subregs.  Most archs have no use for it, and it can be
>>>> >   expressed much more clearly and cleanly always.
>>>> > 1) Subregs of mem.  Do not use, deprecated.  When old reload goes away
>>>> >   this will go away.
>>>> > 2) Subregs of hard registers.  Do not use, there are much better ways to
>>>> >   write subregs of a non-zero byte offset, and for zero offset this is
>>>> >   non-canonical RTL.
>>>> > 3) Bitcast subregs.  In principle they go from one mode to another mode
>>>> >   of the same size (but read on).
>>>> > 4) Paradoxical subregs.  A concept completely separate from the rest,
>>>> >   different rules for everything, it has to be special cased almost
>>>> >   everywhere, it would be better if it was a separate rtx_code imo.
>>>> > 5) Finally, normal subregs, taking a contiguous span of bits from some
>>>> >   value.
>>>> >
>>>> > Now, it is invalid to have a subreg of a subreg, so a 3) of a 5) is
>>>> > written as just one subreg, as you say.  And a 4) of a 5) is just
>>>> > invalid afaics (and let's not talk about 0)..2) anymore :-) )
>>>> >
>>>> >> Note whether targets actually support subreg operations needs to be
>>>> >> queried and I’m not sure how subreg with offset validation should
>>>> >> work there.
>>>> >
>>>> > But 3) is always valid, no?  On pseudos
>> I also have a similar question: do we need to query/recog whether
>> "SF(SI#0)" is valid on the target, or would it always work (even
>> through reload)?
>> I also hit this while debugging on ppc64le: "SF(SI#0)" is valid,
>> and "SF(DI#4)" is not valid.
>>>>
>>>> Yes, but it will eventually result in a spill/reload which is
>>>> undesirable when we created this from CSE from a load.  So I think
>>>> for CSE we do want to know whether a spill will definitely not
>>>> occur.
>>>
>>> Does it cause reloads though?  On any sane backend?  If no movsf pattern
>>> allows integer registers, can things work at all?
>>>
>>> Anyway, the normal way to test if some RTL is valid is to just generate
>>> it (using validate_change) and then do apply_change_group, which then
>>> cancels the changes if they do not work.  CSE already does some of
>>> this.
>> validate_change seems ok. Thanks!
>>>
>>> (I am doubtful doing any of this in CSE is a good idea fwiw).
>> I understand your concern, especially since we would need to emit
>> additional insns in CSE.
>> CSE does already do some similar work, though. It transforms
>> "[sf:DI]=%x:DI; %y:DI=[sf:DI]" to "%y:DI=%x:DI",
>> and it will "see if a MEM has already been loaded with a widening operation;
>> if it has, we can use a subreg of that." (only for int modes).
>> So, it may be acceptable to do this in CSE (though it may still seem
>> hacky).
>
> This may work for "DI to DF", because the mode-converting
> "subreg:DF(x:DI,0)" is cheaper than the memory load "DF([sf:DI])". Then
> "y:DF=[sf:DI]" can be replaced by "y:DF=x:DI#0".
>
> For "subreg:SF(x:SI,0)", however, it may not be cheaper in CSE.
> So handling "convert DI to SF" in CSE is doubtful.

Considering the limitations of CSE, I tried to find other places
to handle this issue, and noticed that DSE can optimize the code below:
"[sfp:DI]=x:DI ; y:SI=[sfp:DI]" to "y:SI=x:DI#0".

So, I drafted a patch to update DSE to handle DI->DF/SF.
The patch updates "extract_low_bits" to do the mode change
with a subreg.

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index b12b0e000c2..5e36331082c 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -2439,7 +2439,10 @@ extract_low_bits (machine_mode mode, machine_mode src_mode, rtx src)
 
   if (!targetm.modes_tieable_p (src_int_mode, src_mode))
     return NULL_RTX;
-  if (!targetm.modes_tieable_p (int_mode, mode))
+  if (!targetm.modes_tieable_p (int_mode, mode)
+      && !(known_le (GET_MODE_BITSIZE (mode), GET_MODE_BITSIZE (src_mode))
+	   && GET_MODE_CLASS (mode) == MODE_FLOAT
+	   && GET_MODE_CLASS (src_mode) == MODE_INT))
     return NULL_RTX;
 
   src = gen_lowpart (src_int_mode, src);
-----------
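
For readers less familiar with RTL, the effect wanted for the DI->SF
case can be sketched in plain C. This is only an illustration of "take
the low-order bits of an integer value and view them as a float while
preserving the bit pattern"; it is not GCC code. The helper name
low_bits_as_float is made up for this sketch, and it assumes that float
is a 32-bit IEEE-754 type.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of GCC: reinterpret the low 32 bits of a
   64-bit integer as a float, preserving the bit pattern.  Assumes float
   is a 32-bit IEEE-754 type.  */
float
low_bits_as_float (uint64_t x)
{
  uint32_t lo = (uint32_t) x;   /* keep only the low-order 32 bits */
  float f;
  memcpy (&f, &lo, sizeof f);   /* bit-preserving copy, no value conversion */
  return f;
}
```

Under these assumptions, low_bits_as_float (0x3f800000) gives 1.0f, and
the upper 32 bits of the argument are ignored.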

However, I am aware that DSE is not good at working across basic blocks.
For example, the code below was not optimized:

typedef struct SF {float a[4];int i1; int i2; } SF;
int foo_si (SF a, int flag)
{
  if (flag == 2)
    return a.i1 + a.i2;
  return 0;
}
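
For contrast, here is a hypothetical straight-line variant (the name
foo_si_straight is made up, and this case is not taken from the thread).
Here the member loads are not separated from the function entry by
control flow, which is the shape a single-basic-block pass such as DSE
would be expected to see. Whether it is actually optimized would still
need checking on the target.

```c
typedef struct SF { float a[4]; int i1; int i2; } SF;

/* Hypothetical variant of foo_si without the branch: the loads of a.i1
   and a.i2 sit in the same basic block as the function entry.  */
int
foo_si_straight (SF a)
{
  return a.i1 + a.i2;
}
```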


So, we may go back to the previous ideas that we discussed
earlier in this thread.
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608872.html


BR,
Jeff (Jiufu)

>
> Any comments or suggestions?
>
>
> BR,
> Jeff (Jiufu)
>
>>
>> Thanks for the great comments!
>>
>> BR,
>> Jeff (Jiufu)
>>
>>>
>>>
>>> Segher