From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 265963858D1E; Thu, 22 Dec 2022 07:25:05 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 265963858D1E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com Received: from pps.filterd (m0127361.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 2BM7DcWc018746; Thu, 22 Dec 2022 07:25:04 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=mime-version : date : from : to : cc : subject : in-reply-to : references : message-id : content-type : content-transfer-encoding; s=pp1; bh=OSbYnhv0VlXVyyfYnMcseoxqwQF++9wFwCc+RjkWA/E=; b=dqRVJcRhwyJV8nONF3FqjjWWxQjUQaTGMlGfo/Og+5WnFwSwlU44nitnX++1jxqHnyMt BgqW0g9QVNqqF1vi+XKz9nxmdCKXsjPVQAxRXtYsZb6792208iRIXeZlmZNtoILiqYET df5sI3Ft10GuysKvYZkmtOEZOnYVSe7UoZbo+/TNUZjYaZ+HuINtBN3DuoR71SBcm9H4 CWBDreOhLZF1sZeXGp3z7gRPoaza5Bz/yfSYQKWFA0oFmFs7Us1qWo9MYsXGaG/sxF7n LQEWHQLHjZBjv8J+7JQ9MbqhjXKu4VT0EqY7i5/cJWhU89V6gnMsoa5Wndy6zOY7B/j8 Vg== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3mmjnhr813-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 22 Dec 2022 07:25:04 +0000 Received: from m0127361.ppops.net (m0127361.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 2BM7KKUr001974; Thu, 22 Dec 2022 07:25:03 GMT Received: from ppma04wdc.us.ibm.com (1a.90.2fa9.ip4.static.sl-reverse.com [169.47.144.26]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3mmjnhr80n-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 22 Dec 2022 07:25:03 +0000 Received: from pps.filterd (ppma04wdc.us.ibm.com [127.0.0.1]) by ppma04wdc.us.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 2BM5UFXg008294; Thu, 22 Dec 2022 07:25:03 GMT Received: from smtprelay04.wdc07v.mail.ibm.com ([9.208.129.114]) by ppma04wdc.us.ibm.com (PPS) with ESMTPS id 3mh6yxg5g0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 22 Dec 2022 07:25:03 +0000 Received: from smtpav05.dal12v.mail.ibm.com (smtpav05.dal12v.mail.ibm.com [10.241.53.104]) by smtprelay04.wdc07v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 2BM7P1Oh26149598 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 22 Dec 2022 07:25:02 GMT Received: from smtpav05.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 9CBB85804C; Thu, 22 Dec 2022 07:25:01 +0000 (GMT) Received: from smtpav05.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 4F62D5805D; Thu, 22 Dec 2022 07:25:01 +0000 (GMT) Received: from ltc.linux.ibm.com (unknown [9.5.196.140]) by smtpav05.dal12v.mail.ibm.com (Postfix) with ESMTP; Thu, 22 Dec 2022 07:25:01 +0000 (GMT) MIME-Version: 1.0 Date: Thu, 22 Dec 2022 15:25:01 +0800 From: guojiufu To: Richard Biener Cc: gcc-patches@gcc.gnu.org, segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org, jeffreyalaw@gmail.com Subject: Re: [PATCH] loading float member of parameter stored via int registers In-Reply-To: References: <20221221062736.78036-1-guojiufu@linux.ibm.com> Message-ID: <58beeb5dd65a10b7480f73462da904a4@linux.ibm.com> X-Sender: guojiufu@linux.ibm.com Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: TT4qCJKu2-w8zWnwH4VuXS_II9KUlEQe X-Proofpoint-GUID: gncIrTdz60pgBrNML0ZbfnTTv1xl59z6 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.923,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-12-22_01,2022-12-21_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 malwarescore=0 clxscore=1015 lowpriorityscore=0 spamscore=0 impostorscore=0 mlxlogscore=999 priorityscore=1501 bulkscore=0 adultscore=0 phishscore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2212070000 definitions=main-2212220062 X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi, On 2022-12-21 15:30, Richard Biener wrote: > On Wed, 21 Dec 2022, Jiufu Guo wrote: > >> Hi, >> >> This patch is fixing an issue about parameter accessing if the >> parameter is struct type and passed through integer registers, and >> there is floating member is accessed. Like below code: >> >> typedef struct DF {double a[4]; long l; } DF; >> double foo_df (DF arg){return arg.a[3];} >> >> On ppc64le, with trunk gcc, "std 6,-24(1) ; lfd 1,-24(1)" is >> generated. While instruction "mtvsrd 1, 6" would be enough for >> this case. > > So why do we end up spilling for PPC? Good question! According to GCC source code (in function.cc/expr.cc), it is common behavior: using "word_mode" to store the parameter to stack, And using the field's mode (e.g. float mode) to load from the stack. But with some tries, I fail to construct cases on many platforms. So, I convert the fix to a target hook and implemented the rs6000 part first. > > struct X { int i; float f; }; > > float foo (struct X x) > { > return x.f; > } > > does pass the structure in $RDI on x86_64 and we manage (with > optimization, with -O0 we spill) to generate > > shrq $32, %rdi > movd %edi, %xmm0 > > and RTL expansion generates > > (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK) > (insn 2 4 3 2 (set (reg/v:DI 83 [ x ]) > (reg:DI 5 di [ x ])) "t.c":4:1 -1 > (nil)) > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG) > (insn 6 3 7 2 (parallel [ > (set (reg:DI 85) > (ashiftrt:DI (reg/v:DI 83 [ x ]) > (const_int 32 [0x20]))) > (clobber (reg:CC 17 flags)) > ]) "t.c":5:11 -1 > (nil)) > (insn 7 6 8 2 (set (reg:SI 86) > (subreg:SI (reg:DI 85) 0)) "t.c":5:11 -1 > (nil)) > > I would imagine that for the ppc case we only see the subreg here > which should be even easier to optimize. > > So how's this not fixable by providing proper patterns / subreg > capabilities? Looking a bit at the RTL we have the issue might > be that nothing seems to handle CSE of > This case is also related to 'parameter on struct', PR89310 is just for this case. On trunk, it is fixed. One difference: the parameter is in DImode, and passed via an integer register for "{int i; float f;}". But for "{double a[4]; long l;}", the parameter is in BLKmode, and stored to stack during the argument setup. > (note 8 0 5 2 [bb 2] NOTE_INSN_BASIC_BLOCK) > (insn 5 8 7 2 (set (mem/c:DI (plus:DI (reg/f:DI 110 sfp) > (const_int 56 [0x38])) [2 arg+24 S8 A64]) > (reg:DI 6 6)) "t.c":2:23 679 {*movdi_internal64} > (expr_list:REG_DEAD (reg:DI 6 6) > (nil))) > (note 7 5 10 2 NOTE_INSN_FUNCTION_BEG) > (note 10 7 15 2 NOTE_INSN_DELETED) > (insn 15 10 16 2 (set (reg/i:DF 33 1) > (mem/c:DF (plus:DI (reg/f:DI 110 sfp) > (const_int 56 [0x38])) [1 arg.a[3]+0 S8 A64])) > "t.c":2:40 > 576 {*movdf_hardfloat64} > (nil)) > > Possibly because the store and load happen in a different mode? Can > you see why CSE doesn't handle this (producing a subreg)? On Yes, exactly! For "{double a[4]; long l;}", because the store and load are using a different mode, and then CSE does not optimize it. This patch makes the store and load using the same mode (DImode), and then leverage CSE to handle it. > the GIMPLE side we'd happily do that (but we don't see the argument > setup). Thanks for your comments! BR, Jeff (Jiufu) > > Thanks, > Richard. > >> This patch updates the behavior when loading floating members of a >> parameter: if that floating member is stored via integer register, >> then loading it as integer mode first, and converting it to floating >> mode. >> >> I also thought of a method: before storing the register to stack, >> convert it to float mode first. While there are some cases that may >> still prefer to keep an integer register store. >> >> Bootstrap and regtest passes on ppc64{,le}. >> I would ask for help to review for comments and if this patch is >> acceptable for the trunk. >> >> >> BR, >> Jeff (Jiufu) >> >> PR target/108073 >> >> gcc/ChangeLog: >> >> * config/rs6000/rs6000.cc (TARGET_LOADING_INT_CONVERT_TO_FLOAT): New >> macro definition. >> (rs6000_loading_int_convert_to_float): New hook implement. >> * doc/tm.texi: Regenerated. >> * doc/tm.texi.in (loading_int_convert_to_float): New hook. >> * expr.cc (expand_expr_real_1): Updated to use the new hook. >> * target.def (loading_int_convert_to_float): New hook. >> >> gcc/testsuite/ChangeLog: >> >> * g++.target/powerpc/pr102024.C: Update. >> * gcc.target/powerpc/pr108073.c: New test. >> >> --- >> gcc/config/rs6000/rs6000.cc | 70 >> +++++++++++++++++++++ >> gcc/doc/tm.texi | 6 ++ >> gcc/doc/tm.texi.in | 2 + >> gcc/expr.cc | 15 +++++ >> gcc/target.def | 11 ++++ >> gcc/testsuite/g++.target/powerpc/pr102024.C | 2 +- >> gcc/testsuite/gcc.target/powerpc/pr108073.c | 24 +++++++ >> 7 files changed, 129 insertions(+), 1 deletion(-) >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c >> >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc >> index b3a609f3aa3..af676eea276 100644 >> --- a/gcc/config/rs6000/rs6000.cc >> +++ b/gcc/config/rs6000/rs6000.cc >> @@ -1559,6 +1559,9 @@ static const struct attribute_spec >> rs6000_attribute_table[] = >> #undef TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN >> #define TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN >> invalid_arg_for_unprototyped_fn >> >> +#undef TARGET_LOADING_INT_CONVERT_TO_FLOAT >> +#define TARGET_LOADING_INT_CONVERT_TO_FLOAT >> rs6000_loading_int_convert_to_float >> + >> #undef TARGET_MD_ASM_ADJUST >> #define TARGET_MD_ASM_ADJUST rs6000_md_asm_adjust >> >> @@ -24018,6 +24021,73 @@ invalid_arg_for_unprototyped_fn (const_tree >> typelist, const_tree funcdecl, const >> : NULL; >> } >> >> +/* Implement the TARGET_LOADING_INT_CONVERT_TO_FLOAT. */ >> +static rtx >> +rs6000_loading_int_convert_to_float (machine_mode mode, rtx source, >> rtx base) >> +{ >> + rtx src_base = XEXP (source, 0); >> + poly_uint64 offset = MEM_OFFSET (source); >> + >> + if (GET_CODE (src_base) == PLUS && CONSTANT_P (XEXP (src_base, 1))) >> + { >> + offset += INTVAL (XEXP (src_base, 1)); >> + src_base = XEXP (src_base, 0); >> + } >> + >> + if (!rtx_equal_p (XEXP (base, 0), src_base)) >> + return NULL_RTX; >> + >> + rtx temp_reg = gen_reg_rtx (word_mode); >> + rtx temp_mem = copy_rtx (source); >> + PUT_MODE (temp_mem, word_mode); >> + >> + /* DI->DF */ >> + if (word_mode == DImode && mode == DFmode) >> + { >> + if (multiple_p (offset, GET_MODE_SIZE (word_mode))) >> + { >> + emit_move_insn (temp_reg, temp_mem); >> + rtx float_subreg = simplify_gen_subreg (mode, temp_reg, word_mode, >> 0); >> + rtx target_reg = gen_reg_rtx (mode); >> + emit_move_insn (target_reg, float_subreg); >> + return target_reg; >> + } >> + return NULL_RTX; >> + } >> + >> + /* Sub DI#->SF */ >> + if (word_mode == DImode && mode == SFmode) >> + { >> + poly_uint64 byte_off = 0; >> + if (multiple_p (offset, GET_MODE_SIZE (word_mode))) >> + byte_off = 0; >> + else if (multiple_p (offset - GET_MODE_SIZE (mode), >> + GET_MODE_SIZE (word_mode))) >> + byte_off = GET_MODE_SIZE (mode); >> + else >> + return NULL_RTX; >> + >> + temp_mem = adjust_address (temp_mem, word_mode, -byte_off); >> + emit_move_insn (temp_reg, temp_mem); >> + >> + /* little endia only? */ >> + poly_uint64 high_off = subreg_highpart_offset (SImode, >> word_mode); >> + if (known_eq (byte_off, high_off)) >> + { >> + temp_reg = expand_shift (RSHIFT_EXPR, word_mode, temp_reg, >> + GET_MODE_PRECISION (SImode), temp_reg, 0); >> + } >> + rtx subreg_si = gen_reg_rtx (SImode); >> + emit_move_insn (subreg_si, gen_lowpart (SImode, temp_reg)); >> + rtx float_subreg = simplify_gen_subreg (mode, subreg_si, >> SImode, 0); >> + rtx target_reg = gen_reg_rtx (mode); >> + emit_move_insn (target_reg, float_subreg); >> + return target_reg; >> + } >> + >> + return NULL_RTX; >> +} >> + >> /* For TARGET_SECURE_PLT 32-bit PIC code we can save PIC register >> setup by using __stack_chk_fail_local hidden function instead of >> calling __stack_chk_fail directly. Otherwise it is better to call >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi >> index 8fe49c2ba3d..10f94af553d 100644 >> --- a/gcc/doc/tm.texi >> +++ b/gcc/doc/tm.texi >> @@ -11933,6 +11933,12 @@ or when the back end is in a >> partially-initialized state. >> outside of any function scope. >> @end deftypefn >> >> +@deftypefn {Target Hook} rtx TARGET_LOADING_INT_CONVERT_TO_FLOAT >> (machine_mode @var{mode}, rtx @var{source}, rtx @var{base}) >> +If the target is protifiable to load an integer in word_mode from >> +@var{source} which is based on @var{base}, then convert to a floating >> +point value in @var{mode}. >> +@end deftypefn >> + >> @defmac TARGET_OBJECT_SUFFIX >> Define this macro to be a C string representing the suffix for object >> files on your target machine. If you do not define this macro, GCC >> will >> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in >> index 62c49ac46de..1ca6a671d86 100644 >> --- a/gcc/doc/tm.texi.in >> +++ b/gcc/doc/tm.texi.in >> @@ -7756,6 +7756,8 @@ to by @var{ce_info}. >> >> @hook TARGET_SET_CURRENT_FUNCTION >> >> +@hook TARGET_LOADING_INT_CONVERT_TO_FLOAT >> + >> @defmac TARGET_OBJECT_SUFFIX >> Define this macro to be a C string representing the suffix for object >> files on your target machine. If you do not define this macro, GCC >> will >> diff --git a/gcc/expr.cc b/gcc/expr.cc >> index d9407432ea5..466079220e7 100644 >> --- a/gcc/expr.cc >> +++ b/gcc/expr.cc >> @@ -11812,6 +11812,21 @@ expand_expr_real_1 (tree exp, rtx target, >> machine_mode tmode, >> && modifier != EXPAND_WRITE) >> op0 = flip_storage_order (mode1, op0); >> >> + /* Accessing float field of struct parameter which passed via >> integer >> + registers. */ >> + if (targetm.loading_int_convert_to_float && mode == mode1 >> + && GET_MODE_CLASS (mode) == MODE_FLOAT >> + && TREE_CODE (tem) == PARM_DECL && DECL_INCOMING_RTL (tem) >> + && REG_P (DECL_INCOMING_RTL (tem)) >> + && GET_MODE (DECL_INCOMING_RTL (tem)) == BLKmode && MEM_P (op0) >> + && MEM_OFFSET_KNOWN_P (op0)) >> + { >> + rtx res = targetm.loading_int_convert_to_float (mode, op0, >> + DECL_RTL (tem)); >> + if (res) >> + op0 = res; >> + } >> + >> if (mode == mode1 || mode1 == BLKmode || mode1 == tmode >> || modifier == EXPAND_CONST_ADDRESS >> || modifier == EXPAND_INITIALIZER) >> diff --git a/gcc/target.def b/gcc/target.def >> index 082a7c62f34..837ce902489 100644 >> --- a/gcc/target.def >> +++ b/gcc/target.def >> @@ -4491,6 +4491,17 @@ original and the returned modes should be >> @code{MODE_INT}.", >> (machine_mode mode), >> default_preferred_doloop_mode) >> >> + >> +/* Loading an integer value from memory first, then convert (bitcast) >> to\n\n >> + floating point value, if target is able to support such behavior. >> */ >> +DEFHOOK >> +(loading_int_convert_to_float, >> +"If the target is protifiable to load an integer in word_mode from\n\ >> +@var{source} which is based on @var{base}, then convert to a >> floating\n\ >> +point value in @var{mode}.", >> + rtx, (machine_mode mode, rtx source, rtx base), >> + NULL) >> + >> /* Returns true for a legitimate combined insn. */ >> DEFHOOK >> (legitimate_combined_insn, >> diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C >> b/gcc/testsuite/g++.target/powerpc/pr102024.C >> index 769585052b5..c8995cae707 100644 >> --- a/gcc/testsuite/g++.target/powerpc/pr102024.C >> +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C >> @@ -5,7 +5,7 @@ >> // Test that a zero-width bit field in an otherwise homogeneous >> aggregate >> // generates a psabi warning and passes arguments in GPRs. >> >> -// { dg-final { scan-assembler-times {\mstd\M} 4 } } >> +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } } >> >> struct a_thing >> { >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c >> b/gcc/testsuite/gcc.target/powerpc/pr108073.c >> new file mode 100644 >> index 00000000000..aa02de56405 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c >> @@ -0,0 +1,24 @@ >> +/* { dg-do run } */ >> +/* { dg-options "-O2 -save-temps" } */ >> + >> +typedef struct DF {double a[4]; long l; } DF; >> +typedef struct SF {float a[4];short l; } SF; >> + >> +/* Each of below function contains one mtvsrd. */ >> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 {target { >> has_arch_ppc64 && has_arch_pwr8 } } } } */ >> +double __attribute__ ((noipa)) foo_df (DF arg){return arg.a[3];} >> +float __attribute__ ((noipa)) foo_sf (SF arg){return arg.a[2];} >> +float __attribute__ ((noipa)) foo_sf1 (SF arg){return arg.a[1];} >> + >> +double gd = 4.0; >> +float gf1 = 3.0f, gf2 = 2.0f; >> +DF gdf = {{1.0,2.0,3.0,4.0}, 1L}; >> +SF gsf = {{1.0f,2.0f,3.0f,4.0f}, 1L}; >> + >> +int main() >> +{ >> + if (!(foo_df (gdf) == gd && foo_sf (gsf) == gf1 && foo_sf1 (gsf) == >> gf2)) >> + __builtin_abort (); >> + return 0; >> +} >> + >>