From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out2.suse.de (smtp-out2.suse.de [IPv6:2001:67c:2178:6::1d]) by sourceware.org (Postfix) with ESMTPS id 917FD3858425; Thu, 22 Dec 2022 11:28:03 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 917FD3858425 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 3526460936; Thu, 22 Dec 2022 11:28:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1671708482; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=/B3se4hBgPUdA3NXWGbFq4PaWY3olQq/60uM7pK3kb4=; b=KeHKuSJqaPHCIL721PWgniPsaU3INZqBrClUiBSs/j4FaAMmemGD3fSBAW7vKglMbW+3TM hpawCc0WEuj6+yFtAOvbTZn9cq7Hrrlt85QA+idKCNxD9Ml1UC6dCrgyKFl+EVXyG7wYQp 4ughcChudZBuOj3PRtdV9dAsPBA9914= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1671708482; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=/B3se4hBgPUdA3NXWGbFq4PaWY3olQq/60uM7pK3kb4=; b=JDmOHcZtDsBYI4Ym5btnwHo6jeP/CpNTh41KGnI8iR6jZyMtUzxX/7+XdqQ2eaRXRhte7c EprPJ1oFub35waCg== Received: from wotan.suse.de (wotan.suse.de [10.160.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id 036562C141; Thu, 22 Dec 2022 11:28:02 +0000 (UTC) Date: Thu, 22 Dec 2022 11:28:01 +0000 (UTC) From: Richard Biener To: Jiufu Guo cc: gcc-patches@gcc.gnu.org, segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org, jeffreyalaw@gmail.com Subject: Re: [PATCH] loading float member of parameter stored via int registers In-Reply-To: <7ebknvhkun.fsf@pike.rch.stglabs.ibm.com> Message-ID: References: <20221221062736.78036-1-guojiufu@linux.ibm.com> <58beeb5dd65a10b7480f73462da904a4@linux.ibm.com> <7ebknvhkun.fsf@pike.rch.stglabs.ibm.com> User-Agent: Alpine 2.22 (LSU 394 2020-01-19) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-11.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Thu, 22 Dec 2022, Jiufu Guo wrote: > > Hi, > > Richard Biener writes: > > > On Thu, 22 Dec 2022, guojiufu wrote: > > > >> Hi, > >> > >> On 2022-12-21 15:30, Richard Biener wrote: > >> > On Wed, 21 Dec 2022, Jiufu Guo wrote: > >> > > >> >> Hi, > >> >> > >> >> This patch is fixing an issue about parameter accessing if the > >> >> parameter is struct type and passed through integer registers, and > >> >> there is floating member is accessed. Like below code: > >> >> > >> >> typedef struct DF {double a[4]; long l; } DF; > >> >> double foo_df (DF arg){return arg.a[3];} > >> >> > >> >> On ppc64le, with trunk gcc, "std 6,-24(1) ; lfd 1,-24(1)" is > >> >> generated. While instruction "mtvsrd 1, 6" would be enough for > >> >> this case. > >> > > >> > So why do we end up spilling for PPC? > >> > >> Good question! According to GCC source code (in function.cc/expr.cc), > >> it is common behavior: using "word_mode" to store the parameter to stack, > >> And using the field's mode (e.g. float mode) to load from the stack. > >> But with some tries, I fail to construct cases on many platforms. > >> So, I convert the fix to a target hook and implemented the rs6000 part > >> first. > >> > >> > > >> > struct X { int i; float f; }; > >> > > >> > float foo (struct X x) > >> > { > >> > return x.f; > >> > } > >> > > >> > does pass the structure in $RDI on x86_64 and we manage (with > >> > optimization, with -O0 we spill) to generate > >> > > >> > shrq $32, %rdi > >> > movd %edi, %xmm0 > >> > > >> > and RTL expansion generates > >> > > >> > (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK) > >> > (insn 2 4 3 2 (set (reg/v:DI 83 [ x ]) > >> > (reg:DI 5 di [ x ])) "t.c":4:1 -1 > >> > (nil)) > >> > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG) > >> > (insn 6 3 7 2 (parallel [ > >> > (set (reg:DI 85) > >> > (ashiftrt:DI (reg/v:DI 83 [ x ]) > >> > (const_int 32 [0x20]))) > >> > (clobber (reg:CC 17 flags)) > >> > ]) "t.c":5:11 -1 > >> > (nil)) > >> > (insn 7 6 8 2 (set (reg:SI 86) > >> > (subreg:SI (reg:DI 85) 0)) "t.c":5:11 -1 > >> > (nil)) > >> > > >> > I would imagine that for the ppc case we only see the subreg here > >> > which should be even easier to optimize. > >> > > >> > So how's this not fixable by providing proper patterns / subreg > >> > capabilities? Looking a bit at the RTL we have the issue might > >> > be that nothing seems to handle CSE of > >> > > >> > >> This case is also related to 'parameter on struct', PR89310 is > >> just for this case. On trunk, it is fixed. > >> One difference: the parameter is in DImode, and passed via an > >> integer register for "{int i; float f;}". > >> But for "{double a[4]; long l;}", the parameter is in BLKmode, > >> and stored to stack during the argument setup. > > > > OK, so this would be another case for "heuristics" to use > > sth different than word_mode for storing, but of course > > the arguments are in integer registers and using different > > modes can for example prohibit store-multiple instruction use. > > > > As I said in the related thread an RTL expansion time "SRA" > > with the incoming argument assignment in mind could make > > more optimal decisions for these kind of special-cases. > > Thanks a lot for your comments! > > Yeap! Using SRA-like analysis during expansion for parameter > and returns (and may also some field accessing) would be a > generic improvement for this kind of issue (PR101926 collected > a lot of them). > While we may still need some work for various ABIs and different > targets, to analyze where the 'struct field' come from > (int/float/vector/.. registers, or stack) and how the struct > need to be handled (keep in pseudo or store in the stack). > This may indicate a mount of changes for param_setup code. > > To reduce risk, I'm just draft straightforward patches for > special cases currently, Like: > https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608081.html > and this patch. Heh, yes - though I'm not fond of special-casing things. RTL expansion is already full of special cases :/ So basically what I'd do is SRA analysis as if the aggregates were initialized by copies from the argument registers (and stack slots) and then try to avoid expanding the aggregate PARM_DECL itself but aim at fully scalarizing that with only the copies from the argument space remaining. > > > >> > (note 8 0 5 2 [bb 2] NOTE_INSN_BASIC_BLOCK) > >> > (insn 5 8 7 2 (set (mem/c:DI (plus:DI (reg/f:DI 110 sfp) > >> > (const_int 56 [0x38])) [2 arg+24 S8 A64]) > >> > (reg:DI 6 6)) "t.c":2:23 679 {*movdi_internal64} > >> > (expr_list:REG_DEAD (reg:DI 6 6) > >> > (nil))) > >> > (note 7 5 10 2 NOTE_INSN_FUNCTION_BEG) > >> > (note 10 7 15 2 NOTE_INSN_DELETED) > >> > (insn 15 10 16 2 (set (reg/i:DF 33 1) > >> > (mem/c:DF (plus:DI (reg/f:DI 110 sfp) > >> > (const_int 56 [0x38])) [1 arg.a[3]+0 S8 A64])) "t.c":2:40 > >> > 576 {*movdf_hardfloat64} > >> > (nil)) > >> > > >> > Possibly because the store and load happen in a different mode? Can > >> > you see why CSE doesn't handle this (producing a subreg)? On > >> > >> Yes, exactly! For "{double a[4]; long l;}", because the store and load > >> are using a different mode, and then CSE does not optimize it. This > >> patch makes the store and load using the same mode (DImode), and then > >> leverage CSE to handle it. > > > > So can we instead fix CSE to consider replacing insn 15 above with > > > > (insn 15 (set (reg/i:DF 33 1) > > (subreg:DF (reg/f:DI 6 6))) > > > > Thanks for your suggestion! I will check to see if able to draft > a quick fix. > > BR, > Jeff (Jiufu) > > > ? One peculiarity is that this introduces a hardreg use in this > > special case. > > > > Richard. > > > >> > the GIMPLE side we'd happily do that (but we don't see the argument > >> > setup). > >> > >> Thanks for your comments! > >> > >> > >> BR, > >> Jeff (Jiufu) > >> > >> > > >> > Thanks, > >> > Richard. > >> > > >> >> This patch updates the behavior when loading floating members of a > >> >> parameter: if that floating member is stored via integer register, > >> >> then loading it as integer mode first, and converting it to floating > >> >> mode. > >> >> > >> >> I also thought of a method: before storing the register to stack, > >> >> convert it to float mode first. While there are some cases that may > >> >> still prefer to keep an integer register store. > >> >> > >> >> Bootstrap and regtest passes on ppc64{,le}. > >> >> I would ask for help to review for comments and if this patch is > >> >> acceptable for the trunk. > >> >> > >> >> > >> >> BR, > >> >> Jeff (Jiufu) > >> >> > >> >> PR target/108073 > >> >> > >> >> gcc/ChangeLog: > >> >> > >> >> * config/rs6000/rs6000.cc (TARGET_LOADING_INT_CONVERT_TO_FLOAT): New > >> >> macro definition. > >> >> (rs6000_loading_int_convert_to_float): New hook implement. > >> >> * doc/tm.texi: Regenerated. > >> >> * doc/tm.texi.in (loading_int_convert_to_float): New hook. > >> >> * expr.cc (expand_expr_real_1): Updated to use the new hook. > >> >> * target.def (loading_int_convert_to_float): New hook. > >> >> > >> >> gcc/testsuite/ChangeLog: > >> >> > >> >> * g++.target/powerpc/pr102024.C: Update. > >> >> * gcc.target/powerpc/pr108073.c: New test. > >> >> > >> >> --- > >> >> gcc/config/rs6000/rs6000.cc | 70 +++++++++++++++++++++ > >> >> gcc/doc/tm.texi | 6 ++ > >> >> gcc/doc/tm.texi.in | 2 + > >> >> gcc/expr.cc | 15 +++++ > >> >> gcc/target.def | 11 ++++ > >> >> gcc/testsuite/g++.target/powerpc/pr102024.C | 2 +- > >> >> gcc/testsuite/gcc.target/powerpc/pr108073.c | 24 +++++++ > >> >> 7 files changed, 129 insertions(+), 1 deletion(-) > >> >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c > >> >> > >> >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > >> >> index b3a609f3aa3..af676eea276 100644 > >> >> --- a/gcc/config/rs6000/rs6000.cc > >> >> +++ b/gcc/config/rs6000/rs6000.cc > >> >> @@ -1559,6 +1559,9 @@ static const struct attribute_spec > >> >> rs6000_attribute_table[] = > >> >> #undef TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN > >> >> #define TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN > >> >> invalid_arg_for_unprototyped_fn > >> >> > >> >> +#undef TARGET_LOADING_INT_CONVERT_TO_FLOAT > >> >> +#define TARGET_LOADING_INT_CONVERT_TO_FLOAT > >> >> rs6000_loading_int_convert_to_float > >> >> + > >> >> #undef TARGET_MD_ASM_ADJUST > >> >> #define TARGET_MD_ASM_ADJUST rs6000_md_asm_adjust > >> >> > >> >> @@ -24018,6 +24021,73 @@ invalid_arg_for_unprototyped_fn (const_tree > >> >> typelist, const_tree funcdecl, const > >> >> : NULL; > >> >> } > >> >> > >> >> +/* Implement the TARGET_LOADING_INT_CONVERT_TO_FLOAT. */ > >> >> +static rtx > >> >> +rs6000_loading_int_convert_to_float (machine_mode mode, rtx source, rtx > >> >> base) > >> >> +{ > >> >> + rtx src_base = XEXP (source, 0); > >> >> + poly_uint64 offset = MEM_OFFSET (source); > >> >> + > >> >> + if (GET_CODE (src_base) == PLUS && CONSTANT_P (XEXP (src_base, 1))) > >> >> + { > >> >> + offset += INTVAL (XEXP (src_base, 1)); > >> >> + src_base = XEXP (src_base, 0); > >> >> + } > >> >> + > >> >> + if (!rtx_equal_p (XEXP (base, 0), src_base)) > >> >> + return NULL_RTX; > >> >> + > >> >> + rtx temp_reg = gen_reg_rtx (word_mode); > >> >> + rtx temp_mem = copy_rtx (source); > >> >> + PUT_MODE (temp_mem, word_mode); > >> >> + > >> >> + /* DI->DF */ > >> >> + if (word_mode == DImode && mode == DFmode) > >> >> + { > >> >> + if (multiple_p (offset, GET_MODE_SIZE (word_mode))) > >> >> + { > >> >> + emit_move_insn (temp_reg, temp_mem); > >> >> + rtx float_subreg = simplify_gen_subreg (mode, temp_reg, word_mode, > >> >> 0); > >> >> + rtx target_reg = gen_reg_rtx (mode); > >> >> + emit_move_insn (target_reg, float_subreg); > >> >> + return target_reg; > >> >> + } > >> >> + return NULL_RTX; > >> >> + } > >> >> + > >> >> + /* Sub DI#->SF */ > >> >> + if (word_mode == DImode && mode == SFmode) > >> >> + { > >> >> + poly_uint64 byte_off = 0; > >> >> + if (multiple_p (offset, GET_MODE_SIZE (word_mode))) > >> >> + byte_off = 0; > >> >> + else if (multiple_p (offset - GET_MODE_SIZE (mode), > >> >> + GET_MODE_SIZE (word_mode))) > >> >> + byte_off = GET_MODE_SIZE (mode); > >> >> + else > >> >> + return NULL_RTX; > >> >> + > >> >> + temp_mem = adjust_address (temp_mem, word_mode, -byte_off); > >> >> + emit_move_insn (temp_reg, temp_mem); > >> >> + > >> >> + /* little endia only? */ > >> >> + poly_uint64 high_off = subreg_highpart_offset (SImode, word_mode); > >> >> + if (known_eq (byte_off, high_off)) > >> >> + { > >> >> + temp_reg = expand_shift (RSHIFT_EXPR, word_mode, temp_reg, > >> >> + GET_MODE_PRECISION (SImode), temp_reg, 0); > >> >> + } > >> >> + rtx subreg_si = gen_reg_rtx (SImode); > >> >> + emit_move_insn (subreg_si, gen_lowpart (SImode, temp_reg)); > >> >> + rtx float_subreg = simplify_gen_subreg (mode, subreg_si, SImode, 0); > >> >> + rtx target_reg = gen_reg_rtx (mode); > >> >> + emit_move_insn (target_reg, float_subreg); > >> >> + return target_reg; > >> >> + } > >> >> + > >> >> + return NULL_RTX; > >> >> +} > >> >> + > >> >> /* For TARGET_SECURE_PLT 32-bit PIC code we can save PIC register > >> >> setup by using __stack_chk_fail_local hidden function instead of > >> >> calling __stack_chk_fail directly. Otherwise it is better to call > >> >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi > >> >> index 8fe49c2ba3d..10f94af553d 100644 > >> >> --- a/gcc/doc/tm.texi > >> >> +++ b/gcc/doc/tm.texi > >> >> @@ -11933,6 +11933,12 @@ or when the back end is in a > >> >> partially-initialized state. > >> >> outside of any function scope. > >> >> @end deftypefn > >> >> > >> >> +@deftypefn {Target Hook} rtx TARGET_LOADING_INT_CONVERT_TO_FLOAT > >> >> (machine_mode @var{mode}, rtx @var{source}, rtx @var{base}) > >> >> +If the target is protifiable to load an integer in word_mode from > >> >> +@var{source} which is based on @var{base}, then convert to a floating > >> >> +point value in @var{mode}. > >> >> +@end deftypefn > >> >> + > >> >> @defmac TARGET_OBJECT_SUFFIX > >> >> Define this macro to be a C string representing the suffix for object > >> >> files on your target machine. If you do not define this macro, GCC > >> >> will > >> >> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in > >> >> index 62c49ac46de..1ca6a671d86 100644 > >> >> --- a/gcc/doc/tm.texi.in > >> >> +++ b/gcc/doc/tm.texi.in > >> >> @@ -7756,6 +7756,8 @@ to by @var{ce_info}. > >> >> > >> >> @hook TARGET_SET_CURRENT_FUNCTION > >> >> > >> >> +@hook TARGET_LOADING_INT_CONVERT_TO_FLOAT > >> >> + > >> >> @defmac TARGET_OBJECT_SUFFIX > >> >> Define this macro to be a C string representing the suffix for object > >> >> files on your target machine. If you do not define this macro, GCC > >> >> will > >> >> diff --git a/gcc/expr.cc b/gcc/expr.cc > >> >> index d9407432ea5..466079220e7 100644 > >> >> --- a/gcc/expr.cc > >> >> +++ b/gcc/expr.cc > >> >> @@ -11812,6 +11812,21 @@ expand_expr_real_1 (tree exp, rtx target, > >> >> machine_mode tmode, > >> >> && modifier != EXPAND_WRITE) > >> >> op0 = flip_storage_order (mode1, op0); > >> >> > >> >> + /* Accessing float field of struct parameter which passed via integer > >> >> + registers. */ > >> >> + if (targetm.loading_int_convert_to_float && mode == mode1 > >> >> + && GET_MODE_CLASS (mode) == MODE_FLOAT > >> >> + && TREE_CODE (tem) == PARM_DECL && DECL_INCOMING_RTL (tem) > >> >> + && REG_P (DECL_INCOMING_RTL (tem)) > >> >> + && GET_MODE (DECL_INCOMING_RTL (tem)) == BLKmode && MEM_P (op0) > >> >> + && MEM_OFFSET_KNOWN_P (op0)) > >> >> + { > >> >> + rtx res = targetm.loading_int_convert_to_float (mode, op0, > >> >> + DECL_RTL (tem)); > >> >> + if (res) > >> >> + op0 = res; > >> >> + } > >> >> + > >> >> if (mode == mode1 || mode1 == BLKmode || mode1 == tmode > >> >> || modifier == EXPAND_CONST_ADDRESS > >> >> || modifier == EXPAND_INITIALIZER) > >> >> diff --git a/gcc/target.def b/gcc/target.def > >> >> index 082a7c62f34..837ce902489 100644 > >> >> --- a/gcc/target.def > >> >> +++ b/gcc/target.def > >> >> @@ -4491,6 +4491,17 @@ original and the returned modes should be > >> >> @code{MODE_INT}.", > >> >> (machine_mode mode), > >> >> default_preferred_doloop_mode) > >> >> > >> >> + > >> >> +/* Loading an integer value from memory first, then convert (bitcast) > >> >> to\n\n > >> >> + floating point value, if target is able to support such behavior. */ > >> >> +DEFHOOK > >> >> +(loading_int_convert_to_float, > >> >> +"If the target is protifiable to load an integer in word_mode from\n\ > >> >> +@var{source} which is based on @var{base}, then convert to a floating\n\ > >> >> +point value in @var{mode}.", > >> >> + rtx, (machine_mode mode, rtx source, rtx base), > >> >> + NULL) > >> >> + > >> >> /* Returns true for a legitimate combined insn. */ > >> >> DEFHOOK > >> >> (legitimate_combined_insn, > >> >> diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C > >> >> b/gcc/testsuite/g++.target/powerpc/pr102024.C > >> >> index 769585052b5..c8995cae707 100644 > >> >> --- a/gcc/testsuite/g++.target/powerpc/pr102024.C > >> >> +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C > >> >> @@ -5,7 +5,7 @@ > >> >> // Test that a zero-width bit field in an otherwise homogeneous aggregate > >> >> // generates a psabi warning and passes arguments in GPRs. > >> >> > >> >> -// { dg-final { scan-assembler-times {\mstd\M} 4 } } > >> >> +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } } > >> >> > >> >> struct a_thing > >> >> { > >> >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c > >> >> b/gcc/testsuite/gcc.target/powerpc/pr108073.c > >> >> new file mode 100644 > >> >> index 00000000000..aa02de56405 > >> >> --- /dev/null > >> >> +++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c > >> >> @@ -0,0 +1,24 @@ > >> >> +/* { dg-do run } */ > >> >> +/* { dg-options "-O2 -save-temps" } */ > >> >> + > >> >> +typedef struct DF {double a[4]; long l; } DF; > >> >> +typedef struct SF {float a[4];short l; } SF; > >> >> + > >> >> +/* Each of below function contains one mtvsrd. */ > >> >> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 {target { > >> >> has_arch_ppc64 && has_arch_pwr8 } } } } */ > >> >> +double __attribute__ ((noipa)) foo_df (DF arg){return arg.a[3];} > >> >> +float __attribute__ ((noipa)) foo_sf (SF arg){return arg.a[2];} > >> >> +float __attribute__ ((noipa)) foo_sf1 (SF arg){return arg.a[1];} > >> >> + > >> >> +double gd = 4.0; > >> >> +float gf1 = 3.0f, gf2 = 2.0f; > >> >> +DF gdf = {{1.0,2.0,3.0,4.0}, 1L}; > >> >> +SF gsf = {{1.0f,2.0f,3.0f,4.0f}, 1L}; > >> >> + > >> >> +int main() > >> >> +{ > >> >> + if (!(foo_df (gdf) == gd && foo_sf (gsf) == gf1 && foo_sf1 (gsf) == > >> >> gf2)) > >> >> + __builtin_abort (); > >> >> + return 0; > >> >> +} > >> >> + > >> >> > >> > >> > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)