From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=yh1s=4U=linux.ibm.com=guojiufu@sourceware.org>
Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5])
	by sourceware.org (Postfix) with ESMTPS id 265963858D1E;
	Thu, 22 Dec 2022 07:25:05 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 265963858D1E
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com
Received: from pps.filterd (m0127361.ppops.net [127.0.0.1])
	by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 2BM7DcWc018746;
	Thu, 22 Dec 2022 07:25:04 GMT
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=mime-version : date :
 from : to : cc : subject : in-reply-to : references : message-id :
 content-type : content-transfer-encoding; s=pp1;
 bh=OSbYnhv0VlXVyyfYnMcseoxqwQF++9wFwCc+RjkWA/E=;
 b=dqRVJcRhwyJV8nONF3FqjjWWxQjUQaTGMlGfo/Og+5WnFwSwlU44nitnX++1jxqHnyMt
 BgqW0g9QVNqqF1vi+XKz9nxmdCKXsjPVQAxRXtYsZb6792208iRIXeZlmZNtoILiqYET
 df5sI3Ft10GuysKvYZkmtOEZOnYVSe7UoZbo+/TNUZjYaZ+HuINtBN3DuoR71SBcm9H4
 CWBDreOhLZF1sZeXGp3z7gRPoaza5Bz/yfSYQKWFA0oFmFs7Us1qWo9MYsXGaG/sxF7n
 LQEWHQLHjZBjv8J+7JQ9MbqhjXKu4VT0EqY7i5/cJWhU89V6gnMsoa5Wndy6zOY7B/j8 Vg== 
Received: from pps.reinject (localhost [127.0.0.1])
	by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3mmjnhr813-1
	(version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
	Thu, 22 Dec 2022 07:25:04 +0000
Received: from m0127361.ppops.net (m0127361.ppops.net [127.0.0.1])
	by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 2BM7KKUr001974;
	Thu, 22 Dec 2022 07:25:03 GMT
Received: from ppma04wdc.us.ibm.com (1a.90.2fa9.ip4.static.sl-reverse.com [169.47.144.26])
	by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3mmjnhr80n-1
	(version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
	Thu, 22 Dec 2022 07:25:03 +0000
Received: from pps.filterd (ppma04wdc.us.ibm.com [127.0.0.1])
	by ppma04wdc.us.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 2BM5UFXg008294;
	Thu, 22 Dec 2022 07:25:03 GMT
Received: from smtprelay04.wdc07v.mail.ibm.com ([9.208.129.114])
	by ppma04wdc.us.ibm.com (PPS) with ESMTPS id 3mh6yxg5g0-1
	(version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
	Thu, 22 Dec 2022 07:25:03 +0000
Received: from smtpav05.dal12v.mail.ibm.com (smtpav05.dal12v.mail.ibm.com [10.241.53.104])
	by smtprelay04.wdc07v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 2BM7P1Oh26149598
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
	Thu, 22 Dec 2022 07:25:02 GMT
Received: from smtpav05.dal12v.mail.ibm.com (unknown [127.0.0.1])
	by IMSVA (Postfix) with ESMTP id 9CBB85804C;
	Thu, 22 Dec 2022 07:25:01 +0000 (GMT)
Received: from smtpav05.dal12v.mail.ibm.com (unknown [127.0.0.1])
	by IMSVA (Postfix) with ESMTP id 4F62D5805D;
	Thu, 22 Dec 2022 07:25:01 +0000 (GMT)
Received: from ltc.linux.ibm.com (unknown [9.5.196.140])
	by smtpav05.dal12v.mail.ibm.com (Postfix) with ESMTP;
	Thu, 22 Dec 2022 07:25:01 +0000 (GMT)
MIME-Version: 1.0
Date: Thu, 22 Dec 2022 15:25:01 +0800
From: guojiufu <guojiufu@linux.ibm.com>
To: Richard Biener <rguenther@suse.de>
Cc: gcc-patches@gcc.gnu.org, segher@kernel.crashing.org, dje.gcc@gmail.com,
        linkw@gcc.gnu.org, jeffreyalaw@gmail.com
Subject: Re: [PATCH] loading float member of parameter stored via int
 registers
In-Reply-To: <nycvar.YFH.7.77.849.2212210718030.5956@jbgna.fhfr.qr>
References: <20221221062736.78036-1-guojiufu@linux.ibm.com>
 <nycvar.YFH.7.77.849.2212210718030.5956@jbgna.fhfr.qr>
Message-ID: <58beeb5dd65a10b7480f73462da904a4@linux.ibm.com>
X-Sender: guojiufu@linux.ibm.com
Content-Type: text/plain; charset=US-ASCII;
 format=flowed
Content-Transfer-Encoding: 7bit
X-TM-AS-GCONF: 00
X-Proofpoint-ORIG-GUID: TT4qCJKu2-w8zWnwH4VuXS_II9KUlEQe
X-Proofpoint-GUID: gncIrTdz60pgBrNML0ZbfnTTv1xl59z6
X-Proofpoint-Virus-Version: vendor=baseguard
 engine=ICAP:2.0.205,Aquarius:18.0.923,Hydra:6.0.545,FMLib:17.11.122.1
 definitions=2022-12-22_01,2022-12-21_01,2022-06-22_01
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 malwarescore=0
 clxscore=1015 lowpriorityscore=0 spamscore=0 impostorscore=0
 mlxlogscore=999 priorityscore=1501 bulkscore=0 adultscore=0 phishscore=0
 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1
 engine=8.12.0-2212070000 definitions=main-2212220062
X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

Hi,

On 2022-12-21 15:30, Richard Biener wrote:
> On Wed, 21 Dec 2022, Jiufu Guo wrote:
> 
>> Hi,
>> 
>> This patch is fixing an issue about parameter accessing if the
>> parameter is struct type and passed through integer registers, and
>> there is floating member is accessed. Like below code:
>> 
>> typedef struct DF {double a[4]; long l; } DF;
>> double foo_df (DF arg){return arg.a[3];}
>> 
>> On ppc64le, with trunk gcc, "std 6,-24(1) ; lfd 1,-24(1)" is
>> generated.  While instruction "mtvsrd 1, 6" would be enough for
>> this case.
> 
> So why do we end up spilling for PPC?

Good question! According to GCC source code (in function.cc/expr.cc),
it is common behavior: using "word_mode" to store the parameter to 
stack,
And using the field's mode (e.g. float mode) to load from the stack.
But with some tries, I fail to construct cases on many platforms.
So, I convert the fix to a target hook and implemented the rs6000 part
first.

> 
> struct X { int i; float f; };
> 
> float foo (struct X x)
> {
>   return x.f;
> }
> 
> does pass the structure in $RDI on x86_64 and we manage (with
> optimization, with -O0 we spill) to generate
> 
>         shrq    $32, %rdi
>         movd    %edi, %xmm0
> 
> and RTL expansion generates
> 
> (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> (insn 2 4 3 2 (set (reg/v:DI 83 [ x ])
>         (reg:DI 5 di [ x ])) "t.c":4:1 -1
>      (nil))
> (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
> (insn 6 3 7 2 (parallel [
>             (set (reg:DI 85)
>                 (ashiftrt:DI (reg/v:DI 83 [ x ])
>                     (const_int 32 [0x20])))
>             (clobber (reg:CC 17 flags))
>         ]) "t.c":5:11 -1
>      (nil))
> (insn 7 6 8 2 (set (reg:SI 86)
>         (subreg:SI (reg:DI 85) 0)) "t.c":5:11 -1
>      (nil))
> 
> I would imagine that for the ppc case we only see the subreg here
> which should be even easier to optimize.
> 
> So how's this not fixable by providing proper patterns / subreg
> capabilities?  Looking a bit at the RTL we have the issue might
> be that nothing seems to handle CSE of
> 

This case is also related to 'parameter on struct', PR89310 is
just for this case. On trunk, it is fixed.
One difference: the parameter is in DImode, and passed via an
integer register for "{int i; float f;}".
But for "{double a[4]; long l;}", the parameter is in BLKmode,
and stored to stack during the argument setup.

> (note 8 0 5 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> (insn 5 8 7 2 (set (mem/c:DI (plus:DI (reg/f:DI 110 sfp)
>                 (const_int 56 [0x38])) [2 arg+24 S8 A64])
>         (reg:DI 6 6)) "t.c":2:23 679 {*movdi_internal64}
>      (expr_list:REG_DEAD (reg:DI 6 6)
>         (nil)))
> (note 7 5 10 2 NOTE_INSN_FUNCTION_BEG)
> (note 10 7 15 2 NOTE_INSN_DELETED)
> (insn 15 10 16 2 (set (reg/i:DF 33 1)
>         (mem/c:DF (plus:DI (reg/f:DI 110 sfp)
>                 (const_int 56 [0x38])) [1 arg.a[3]+0 S8 A64])) 
> "t.c":2:40
> 576 {*movdf_hardfloat64}
>      (nil))
> 
> Possibly because the store and load happen in a different mode?  Can
> you see why CSE doesn't handle this (producing a subreg)?  On

Yes, exactly! For "{double a[4]; long l;}", because the store and load
are using a different mode, and then CSE does not optimize it.  This
patch makes the store and load using the same mode (DImode), and then
leverage CSE to handle it.

> the GIMPLE side we'd happily do that (but we don't see the argument
> setup).

Thanks for your comments!


BR,
Jeff (Jiufu)

> 
> Thanks,
> Richard.
> 
>> This patch updates the behavior when loading floating members of a
>> parameter: if that floating member is stored via integer register,
>> then loading it as integer mode first, and converting it to floating
>> mode.
>> 
>> I also thought of a method: before storing the register to stack,
>> convert it to float mode first. While there are some cases that may
>> still prefer to keep an integer register store.
>> 
>> Bootstrap and regtest passes on ppc64{,le}.
>> I would ask for help to review for comments and if this patch is
>> acceptable for the trunk.
>> 
>> 
>> BR,
>> Jeff (Jiufu)
>> 
>> 	PR target/108073
>> 
>> gcc/ChangeLog:
>> 
>> 	* config/rs6000/rs6000.cc (TARGET_LOADING_INT_CONVERT_TO_FLOAT): New
>> 	macro definition.
>> 	(rs6000_loading_int_convert_to_float): New hook implement.
>> 	* doc/tm.texi: Regenerated.
>> 	* doc/tm.texi.in (loading_int_convert_to_float): New hook.
>> 	* expr.cc (expand_expr_real_1): Updated to use the new hook.
>> 	* target.def (loading_int_convert_to_float): New hook.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> 	* g++.target/powerpc/pr102024.C: Update.
>> 	* gcc.target/powerpc/pr108073.c: New test.
>> 
>> ---
>>  gcc/config/rs6000/rs6000.cc                 | 70 
>> +++++++++++++++++++++
>>  gcc/doc/tm.texi                             |  6 ++
>>  gcc/doc/tm.texi.in                          |  2 +
>>  gcc/expr.cc                                 | 15 +++++
>>  gcc/target.def                              | 11 ++++
>>  gcc/testsuite/g++.target/powerpc/pr102024.C |  2 +-
>>  gcc/testsuite/gcc.target/powerpc/pr108073.c | 24 +++++++
>>  7 files changed, 129 insertions(+), 1 deletion(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
>> 
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index b3a609f3aa3..af676eea276 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -1559,6 +1559,9 @@ static const struct attribute_spec 
>> rs6000_attribute_table[] =
>>  #undef TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN
>>  #define TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN 
>> invalid_arg_for_unprototyped_fn
>> 
>> +#undef TARGET_LOADING_INT_CONVERT_TO_FLOAT
>> +#define TARGET_LOADING_INT_CONVERT_TO_FLOAT 
>> rs6000_loading_int_convert_to_float
>> +
>>  #undef TARGET_MD_ASM_ADJUST
>>  #define TARGET_MD_ASM_ADJUST rs6000_md_asm_adjust
>> 
>> @@ -24018,6 +24021,73 @@ invalid_arg_for_unprototyped_fn (const_tree 
>> typelist, const_tree funcdecl, const
>>  	  : NULL;
>>  }
>> 
>> +/* Implement the TARGET_LOADING_INT_CONVERT_TO_FLOAT. */
>> +static rtx
>> +rs6000_loading_int_convert_to_float (machine_mode mode, rtx source, 
>> rtx base)
>> +{
>> +  rtx src_base = XEXP (source, 0);
>> +  poly_uint64 offset = MEM_OFFSET (source);
>> +
>> +  if (GET_CODE (src_base) == PLUS && CONSTANT_P (XEXP (src_base, 1)))
>> +    {
>> +      offset += INTVAL (XEXP (src_base, 1));
>> +      src_base = XEXP (src_base, 0);
>> +    }
>> +
>> +  if (!rtx_equal_p (XEXP (base, 0), src_base))
>> +    return NULL_RTX;
>> +
>> +  rtx temp_reg = gen_reg_rtx (word_mode);
>> +  rtx temp_mem = copy_rtx (source);
>> +  PUT_MODE (temp_mem, word_mode);
>> +
>> +  /* DI->DF */
>> +  if (word_mode == DImode && mode == DFmode)
>> +    {
>> +      if (multiple_p (offset, GET_MODE_SIZE (word_mode)))
>> +	{
>> +	  emit_move_insn (temp_reg, temp_mem);
>> +	  rtx float_subreg = simplify_gen_subreg (mode, temp_reg, word_mode, 
>> 0);
>> +	  rtx target_reg = gen_reg_rtx (mode);
>> +	  emit_move_insn (target_reg, float_subreg);
>> +	  return target_reg;
>> +	}
>> +      return NULL_RTX;
>> +    }
>> +
>> +  /* Sub DI#->SF */
>> +  if (word_mode == DImode && mode == SFmode)
>> +    {
>> +      poly_uint64 byte_off = 0;
>> +      if (multiple_p (offset, GET_MODE_SIZE (word_mode)))
>> +	byte_off = 0;
>> +      else if (multiple_p (offset - GET_MODE_SIZE (mode),
>> +			   GET_MODE_SIZE (word_mode)))
>> +	byte_off = GET_MODE_SIZE (mode);
>> +      else
>> +	return NULL_RTX;
>> +
>> +      temp_mem = adjust_address (temp_mem, word_mode, -byte_off);
>> +      emit_move_insn (temp_reg, temp_mem);
>> +
>> +      /* little endia only? */
>> +      poly_uint64 high_off = subreg_highpart_offset (SImode, 
>> word_mode);
>> +      if (known_eq (byte_off, high_off))
>> +	{
>> +	  temp_reg = expand_shift (RSHIFT_EXPR, word_mode, temp_reg,
>> +				   GET_MODE_PRECISION (SImode), temp_reg, 0);
>> +	}
>> +      rtx subreg_si = gen_reg_rtx (SImode);
>> +      emit_move_insn (subreg_si,  gen_lowpart (SImode, temp_reg));
>> +      rtx float_subreg = simplify_gen_subreg (mode, subreg_si, 
>> SImode, 0);
>> +      rtx target_reg = gen_reg_rtx (mode);
>> +      emit_move_insn (target_reg, float_subreg);
>> +      return target_reg;
>> +    }
>> +
>> +  return NULL_RTX;
>> +}
>> +
>>  /* For TARGET_SECURE_PLT 32-bit PIC code we can save PIC register
>>     setup by using __stack_chk_fail_local hidden function instead of
>>     calling __stack_chk_fail directly.  Otherwise it is better to call
>> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>> index 8fe49c2ba3d..10f94af553d 100644
>> --- a/gcc/doc/tm.texi
>> +++ b/gcc/doc/tm.texi
>> @@ -11933,6 +11933,12 @@ or when the back end is in a 
>> partially-initialized state.
>>  outside of any function scope.
>>  @end deftypefn
>> 
>> +@deftypefn {Target Hook} rtx TARGET_LOADING_INT_CONVERT_TO_FLOAT 
>> (machine_mode @var{mode}, rtx @var{source}, rtx @var{base})
>> +If the target is protifiable to load an integer in word_mode from
>> +@var{source} which is based on @var{base}, then convert to a floating
>> +point value in @var{mode}.
>> +@end deftypefn
>> +
>>  @defmac TARGET_OBJECT_SUFFIX
>>  Define this macro to be a C string representing the suffix for object
>>  files on your target machine.  If you do not define this macro, GCC 
>> will
>> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
>> index 62c49ac46de..1ca6a671d86 100644
>> --- a/gcc/doc/tm.texi.in
>> +++ b/gcc/doc/tm.texi.in
>> @@ -7756,6 +7756,8 @@ to by @var{ce_info}.
>> 
>>  @hook TARGET_SET_CURRENT_FUNCTION
>> 
>> +@hook TARGET_LOADING_INT_CONVERT_TO_FLOAT
>> +
>>  @defmac TARGET_OBJECT_SUFFIX
>>  Define this macro to be a C string representing the suffix for object
>>  files on your target machine.  If you do not define this macro, GCC 
>> will
>> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> index d9407432ea5..466079220e7 100644
>> --- a/gcc/expr.cc
>> +++ b/gcc/expr.cc
>> @@ -11812,6 +11812,21 @@ expand_expr_real_1 (tree exp, rtx target, 
>> machine_mode tmode,
>>  	    && modifier != EXPAND_WRITE)
>>  	  op0 = flip_storage_order (mode1, op0);
>> 
>> +	/* Accessing float field of struct parameter which passed via 
>> integer
>> +	   registers.  */
>> +	if (targetm.loading_int_convert_to_float && mode == mode1
>> +	    && GET_MODE_CLASS (mode) == MODE_FLOAT
>> +	    && TREE_CODE (tem) == PARM_DECL && DECL_INCOMING_RTL (tem)
>> +	    && REG_P (DECL_INCOMING_RTL (tem))
>> +	    && GET_MODE (DECL_INCOMING_RTL (tem)) == BLKmode && MEM_P (op0)
>> +	    && MEM_OFFSET_KNOWN_P (op0))
>> +	  {
>> +	    rtx res = targetm.loading_int_convert_to_float (mode, op0,
>> +							    DECL_RTL (tem));
>> +	    if (res)
>> +	      op0 = res;
>> +	  }
>> +
>>  	if (mode == mode1 || mode1 == BLKmode || mode1 == tmode
>>  	    || modifier == EXPAND_CONST_ADDRESS
>>  	    || modifier == EXPAND_INITIALIZER)
>> diff --git a/gcc/target.def b/gcc/target.def
>> index 082a7c62f34..837ce902489 100644
>> --- a/gcc/target.def
>> +++ b/gcc/target.def
>> @@ -4491,6 +4491,17 @@ original and the returned modes should be 
>> @code{MODE_INT}.",
>>   (machine_mode mode),
>>   default_preferred_doloop_mode)
>> 
>> +
>> +/* Loading an integer value from memory first, then convert (bitcast) 
>> to\n\n
>> +   floating point value, if target is able to support such behavior.  
>> */
>> +DEFHOOK
>> +(loading_int_convert_to_float,
>> +"If the target is protifiable to load an integer in word_mode from\n\
>> +@var{source} which is based on @var{base}, then convert to a 
>> floating\n\
>> +point value in @var{mode}.",
>> + rtx, (machine_mode mode, rtx source, rtx base),
>> + NULL)
>> +
>>  /* Returns true for a legitimate combined insn.  */
>>  DEFHOOK
>>  (legitimate_combined_insn,
>> diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C 
>> b/gcc/testsuite/g++.target/powerpc/pr102024.C
>> index 769585052b5..c8995cae707 100644
>> --- a/gcc/testsuite/g++.target/powerpc/pr102024.C
>> +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
>> @@ -5,7 +5,7 @@
>>  // Test that a zero-width bit field in an otherwise homogeneous 
>> aggregate
>>  // generates a psabi warning and passes arguments in GPRs.
>> 
>> -// { dg-final { scan-assembler-times {\mstd\M} 4 } }
>> +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } }
>> 
>>  struct a_thing
>>  {
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr108073.c
>> new file mode 100644
>> index 00000000000..aa02de56405
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c
>> @@ -0,0 +1,24 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O2 -save-temps" } */
>> +
>> +typedef struct DF {double a[4]; long l; } DF;
>> +typedef struct SF {float a[4];short l; } SF;
>> +
>> +/* Each of below function contains one mtvsrd.  */
>> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 {target { 
>> has_arch_ppc64 && has_arch_pwr8 } } } } */
>> +double __attribute__ ((noipa)) foo_df (DF arg){return arg.a[3];}
>> +float  __attribute__ ((noipa)) foo_sf (SF arg){return arg.a[2];}
>> +float  __attribute__ ((noipa)) foo_sf1 (SF arg){return arg.a[1];}
>> +
>> +double gd = 4.0;
>> +float gf1 = 3.0f, gf2 = 2.0f;
>> +DF gdf = {{1.0,2.0,3.0,4.0}, 1L};
>> +SF gsf = {{1.0f,2.0f,3.0f,4.0f}, 1L};
>> +
>> +int main()
>> +{
>> +  if (!(foo_df (gdf) == gd && foo_sf (gsf) == gf1 && foo_sf1 (gsf) == 
>> gf2))
>> +    __builtin_abort ();
>> +  return 0;
>> +}
>> +
>>