From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by sourceware.org (Postfix) with ESMTPS id 8D69E3858D1E; Thu, 22 Dec 2022 07:54:49 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8D69E3858D1E Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 62FCB61253; Thu, 22 Dec 2022 07:54:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1671695688; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=H++Gk4pB6SRzqsovUPY4bhLG4pdSZYAX3vj3yL5Qmfs=; b=I3Nf8RBTHLCrbaCCzDFVCRdoKzU2Wm51iJ8WG6+gUHbK4MUat0PHeMbHvYqmgBaXHRoxGq pmfLdE2THOSUZMWPUfN1hj1xnj3ldp+GDBcVRbaVk2KLCGM1aorbJxzHmw6wwbRzb6PIod 8mH2wOcQz8wSXXHOF7lbC6wtKpL8MTY= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1671695688; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=H++Gk4pB6SRzqsovUPY4bhLG4pdSZYAX3vj3yL5Qmfs=; b=G8c9Z+kl1cQwj7/eorzUg5GpZXTBnqHHL31XfpGKrgDSkuK1jCWmTWpbxjqqIx0oRn7SW0 1RO/mjZhOeVJTICQ== Received: from wotan.suse.de (wotan.suse.de [10.160.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id 1D0B52C141; Thu, 22 Dec 2022 07:54:48 +0000 (UTC) Date: Thu, 22 Dec 2022 07:54:48 +0000 (UTC) From: Richard Biener To: guojiufu cc: gcc-patches@gcc.gnu.org, segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org, jeffreyalaw@gmail.com Subject: Re: [PATCH] loading float member of parameter stored via int registers In-Reply-To: <58beeb5dd65a10b7480f73462da904a4@linux.ibm.com> Message-ID: References: <20221221062736.78036-1-guojiufu@linux.ibm.com> <58beeb5dd65a10b7480f73462da904a4@linux.ibm.com> User-Agent: Alpine 2.22 (LSU 394 2020-01-19) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-11.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,PDS_TONAME_EQ_TOLOCAL_HDRS_LCASE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Thu, 22 Dec 2022, guojiufu wrote: > Hi, > > On 2022-12-21 15:30, Richard Biener wrote: > > On Wed, 21 Dec 2022, Jiufu Guo wrote: > > > >> Hi, > >> > >> This patch is fixing an issue about parameter accessing if the > >> parameter is struct type and passed through integer registers, and > >> there is floating member is accessed. Like below code: > >> > >> typedef struct DF {double a[4]; long l; } DF; > >> double foo_df (DF arg){return arg.a[3];} > >> > >> On ppc64le, with trunk gcc, "std 6,-24(1) ; lfd 1,-24(1)" is > >> generated. While instruction "mtvsrd 1, 6" would be enough for > >> this case. > > > > So why do we end up spilling for PPC? > > Good question! According to GCC source code (in function.cc/expr.cc), > it is common behavior: using "word_mode" to store the parameter to stack, > And using the field's mode (e.g. float mode) to load from the stack. > But with some tries, I fail to construct cases on many platforms. > So, I convert the fix to a target hook and implemented the rs6000 part > first. > > > > > struct X { int i; float f; }; > > > > float foo (struct X x) > > { > > return x.f; > > } > > > > does pass the structure in $RDI on x86_64 and we manage (with > > optimization, with -O0 we spill) to generate > > > > shrq $32, %rdi > > movd %edi, %xmm0 > > > > and RTL expansion generates > > > > (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK) > > (insn 2 4 3 2 (set (reg/v:DI 83 [ x ]) > > (reg:DI 5 di [ x ])) "t.c":4:1 -1 > > (nil)) > > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG) > > (insn 6 3 7 2 (parallel [ > > (set (reg:DI 85) > > (ashiftrt:DI (reg/v:DI 83 [ x ]) > > (const_int 32 [0x20]))) > > (clobber (reg:CC 17 flags)) > > ]) "t.c":5:11 -1 > > (nil)) > > (insn 7 6 8 2 (set (reg:SI 86) > > (subreg:SI (reg:DI 85) 0)) "t.c":5:11 -1 > > (nil)) > > > > I would imagine that for the ppc case we only see the subreg here > > which should be even easier to optimize. > > > > So how's this not fixable by providing proper patterns / subreg > > capabilities? Looking a bit at the RTL we have the issue might > > be that nothing seems to handle CSE of > > > > This case is also related to 'parameter on struct', PR89310 is > just for this case. On trunk, it is fixed. > One difference: the parameter is in DImode, and passed via an > integer register for "{int i; float f;}". > But for "{double a[4]; long l;}", the parameter is in BLKmode, > and stored to stack during the argument setup. OK, so this would be another case for "heuristics" to use sth different than word_mode for storing, but of course the arguments are in integer registers and using different modes can for example prohibit store-multiple instruction use. As I said in the related thread an RTL expansion time "SRA" with the incoming argument assignment in mind could make more optimal decisions for these kind of special-cases. > > (note 8 0 5 2 [bb 2] NOTE_INSN_BASIC_BLOCK) > > (insn 5 8 7 2 (set (mem/c:DI (plus:DI (reg/f:DI 110 sfp) > > (const_int 56 [0x38])) [2 arg+24 S8 A64]) > > (reg:DI 6 6)) "t.c":2:23 679 {*movdi_internal64} > > (expr_list:REG_DEAD (reg:DI 6 6) > > (nil))) > > (note 7 5 10 2 NOTE_INSN_FUNCTION_BEG) > > (note 10 7 15 2 NOTE_INSN_DELETED) > > (insn 15 10 16 2 (set (reg/i:DF 33 1) > > (mem/c:DF (plus:DI (reg/f:DI 110 sfp) > > (const_int 56 [0x38])) [1 arg.a[3]+0 S8 A64])) "t.c":2:40 > > 576 {*movdf_hardfloat64} > > (nil)) > > > > Possibly because the store and load happen in a different mode? Can > > you see why CSE doesn't handle this (producing a subreg)? On > > Yes, exactly! For "{double a[4]; long l;}", because the store and load > are using a different mode, and then CSE does not optimize it. This > patch makes the store and load using the same mode (DImode), and then > leverage CSE to handle it. So can we instead fix CSE to consider replacing insn 15 above with (insn 15 (set (reg/i:DF 33 1) (subreg:DF (reg/f:DI 6 6))) ? One peculiarity is that this introduces a hardreg use in this special case. Richard. > > the GIMPLE side we'd happily do that (but we don't see the argument > > setup). > > Thanks for your comments! > > > BR, > Jeff (Jiufu) > > > > > Thanks, > > Richard. > > > >> This patch updates the behavior when loading floating members of a > >> parameter: if that floating member is stored via integer register, > >> then loading it as integer mode first, and converting it to floating > >> mode. > >> > >> I also thought of a method: before storing the register to stack, > >> convert it to float mode first. While there are some cases that may > >> still prefer to keep an integer register store. > >> > >> Bootstrap and regtest passes on ppc64{,le}. > >> I would ask for help to review for comments and if this patch is > >> acceptable for the trunk. > >> > >> > >> BR, > >> Jeff (Jiufu) > >> > >> PR target/108073 > >> > >> gcc/ChangeLog: > >> > >> * config/rs6000/rs6000.cc (TARGET_LOADING_INT_CONVERT_TO_FLOAT): New > >> macro definition. > >> (rs6000_loading_int_convert_to_float): New hook implement. > >> * doc/tm.texi: Regenerated. > >> * doc/tm.texi.in (loading_int_convert_to_float): New hook. > >> * expr.cc (expand_expr_real_1): Updated to use the new hook. > >> * target.def (loading_int_convert_to_float): New hook. > >> > >> gcc/testsuite/ChangeLog: > >> > >> * g++.target/powerpc/pr102024.C: Update. > >> * gcc.target/powerpc/pr108073.c: New test. > >> > >> --- > >> gcc/config/rs6000/rs6000.cc | 70 +++++++++++++++++++++ > >> gcc/doc/tm.texi | 6 ++ > >> gcc/doc/tm.texi.in | 2 + > >> gcc/expr.cc | 15 +++++ > >> gcc/target.def | 11 ++++ > >> gcc/testsuite/g++.target/powerpc/pr102024.C | 2 +- > >> gcc/testsuite/gcc.target/powerpc/pr108073.c | 24 +++++++ > >> 7 files changed, 129 insertions(+), 1 deletion(-) > >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c > >> > >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > >> index b3a609f3aa3..af676eea276 100644 > >> --- a/gcc/config/rs6000/rs6000.cc > >> +++ b/gcc/config/rs6000/rs6000.cc > >> @@ -1559,6 +1559,9 @@ static const struct attribute_spec > >> rs6000_attribute_table[] = > >> #undef TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN > >> #define TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN > >> invalid_arg_for_unprototyped_fn > >> > >> +#undef TARGET_LOADING_INT_CONVERT_TO_FLOAT > >> +#define TARGET_LOADING_INT_CONVERT_TO_FLOAT > >> rs6000_loading_int_convert_to_float > >> + > >> #undef TARGET_MD_ASM_ADJUST > >> #define TARGET_MD_ASM_ADJUST rs6000_md_asm_adjust > >> > >> @@ -24018,6 +24021,73 @@ invalid_arg_for_unprototyped_fn (const_tree > >> typelist, const_tree funcdecl, const > >> : NULL; > >> } > >> > >> +/* Implement the TARGET_LOADING_INT_CONVERT_TO_FLOAT. */ > >> +static rtx > >> +rs6000_loading_int_convert_to_float (machine_mode mode, rtx source, rtx > >> base) > >> +{ > >> + rtx src_base = XEXP (source, 0); > >> + poly_uint64 offset = MEM_OFFSET (source); > >> + > >> + if (GET_CODE (src_base) == PLUS && CONSTANT_P (XEXP (src_base, 1))) > >> + { > >> + offset += INTVAL (XEXP (src_base, 1)); > >> + src_base = XEXP (src_base, 0); > >> + } > >> + > >> + if (!rtx_equal_p (XEXP (base, 0), src_base)) > >> + return NULL_RTX; > >> + > >> + rtx temp_reg = gen_reg_rtx (word_mode); > >> + rtx temp_mem = copy_rtx (source); > >> + PUT_MODE (temp_mem, word_mode); > >> + > >> + /* DI->DF */ > >> + if (word_mode == DImode && mode == DFmode) > >> + { > >> + if (multiple_p (offset, GET_MODE_SIZE (word_mode))) > >> + { > >> + emit_move_insn (temp_reg, temp_mem); > >> + rtx float_subreg = simplify_gen_subreg (mode, temp_reg, word_mode, > >> 0); > >> + rtx target_reg = gen_reg_rtx (mode); > >> + emit_move_insn (target_reg, float_subreg); > >> + return target_reg; > >> + } > >> + return NULL_RTX; > >> + } > >> + > >> + /* Sub DI#->SF */ > >> + if (word_mode == DImode && mode == SFmode) > >> + { > >> + poly_uint64 byte_off = 0; > >> + if (multiple_p (offset, GET_MODE_SIZE (word_mode))) > >> + byte_off = 0; > >> + else if (multiple_p (offset - GET_MODE_SIZE (mode), > >> + GET_MODE_SIZE (word_mode))) > >> + byte_off = GET_MODE_SIZE (mode); > >> + else > >> + return NULL_RTX; > >> + > >> + temp_mem = adjust_address (temp_mem, word_mode, -byte_off); > >> + emit_move_insn (temp_reg, temp_mem); > >> + > >> + /* little endia only? */ > >> + poly_uint64 high_off = subreg_highpart_offset (SImode, word_mode); > >> + if (known_eq (byte_off, high_off)) > >> + { > >> + temp_reg = expand_shift (RSHIFT_EXPR, word_mode, temp_reg, > >> + GET_MODE_PRECISION (SImode), temp_reg, 0); > >> + } > >> + rtx subreg_si = gen_reg_rtx (SImode); > >> + emit_move_insn (subreg_si, gen_lowpart (SImode, temp_reg)); > >> + rtx float_subreg = simplify_gen_subreg (mode, subreg_si, SImode, 0); > >> + rtx target_reg = gen_reg_rtx (mode); > >> + emit_move_insn (target_reg, float_subreg); > >> + return target_reg; > >> + } > >> + > >> + return NULL_RTX; > >> +} > >> + > >> /* For TARGET_SECURE_PLT 32-bit PIC code we can save PIC register > >> setup by using __stack_chk_fail_local hidden function instead of > >> calling __stack_chk_fail directly. Otherwise it is better to call > >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi > >> index 8fe49c2ba3d..10f94af553d 100644 > >> --- a/gcc/doc/tm.texi > >> +++ b/gcc/doc/tm.texi > >> @@ -11933,6 +11933,12 @@ or when the back end is in a > >> partially-initialized state. > >> outside of any function scope. > >> @end deftypefn > >> > >> +@deftypefn {Target Hook} rtx TARGET_LOADING_INT_CONVERT_TO_FLOAT > >> (machine_mode @var{mode}, rtx @var{source}, rtx @var{base}) > >> +If the target is protifiable to load an integer in word_mode from > >> +@var{source} which is based on @var{base}, then convert to a floating > >> +point value in @var{mode}. > >> +@end deftypefn > >> + > >> @defmac TARGET_OBJECT_SUFFIX > >> Define this macro to be a C string representing the suffix for object > >> files on your target machine. If you do not define this macro, GCC > >> will > >> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in > >> index 62c49ac46de..1ca6a671d86 100644 > >> --- a/gcc/doc/tm.texi.in > >> +++ b/gcc/doc/tm.texi.in > >> @@ -7756,6 +7756,8 @@ to by @var{ce_info}. > >> > >> @hook TARGET_SET_CURRENT_FUNCTION > >> > >> +@hook TARGET_LOADING_INT_CONVERT_TO_FLOAT > >> + > >> @defmac TARGET_OBJECT_SUFFIX > >> Define this macro to be a C string representing the suffix for object > >> files on your target machine. If you do not define this macro, GCC > >> will > >> diff --git a/gcc/expr.cc b/gcc/expr.cc > >> index d9407432ea5..466079220e7 100644 > >> --- a/gcc/expr.cc > >> +++ b/gcc/expr.cc > >> @@ -11812,6 +11812,21 @@ expand_expr_real_1 (tree exp, rtx target, > >> machine_mode tmode, > >> && modifier != EXPAND_WRITE) > >> op0 = flip_storage_order (mode1, op0); > >> > >> + /* Accessing float field of struct parameter which passed via integer > >> + registers. */ > >> + if (targetm.loading_int_convert_to_float && mode == mode1 > >> + && GET_MODE_CLASS (mode) == MODE_FLOAT > >> + && TREE_CODE (tem) == PARM_DECL && DECL_INCOMING_RTL (tem) > >> + && REG_P (DECL_INCOMING_RTL (tem)) > >> + && GET_MODE (DECL_INCOMING_RTL (tem)) == BLKmode && MEM_P (op0) > >> + && MEM_OFFSET_KNOWN_P (op0)) > >> + { > >> + rtx res = targetm.loading_int_convert_to_float (mode, op0, > >> + DECL_RTL (tem)); > >> + if (res) > >> + op0 = res; > >> + } > >> + > >> if (mode == mode1 || mode1 == BLKmode || mode1 == tmode > >> || modifier == EXPAND_CONST_ADDRESS > >> || modifier == EXPAND_INITIALIZER) > >> diff --git a/gcc/target.def b/gcc/target.def > >> index 082a7c62f34..837ce902489 100644 > >> --- a/gcc/target.def > >> +++ b/gcc/target.def > >> @@ -4491,6 +4491,17 @@ original and the returned modes should be > >> @code{MODE_INT}.", > >> (machine_mode mode), > >> default_preferred_doloop_mode) > >> > >> + > >> +/* Loading an integer value from memory first, then convert (bitcast) > >> to\n\n > >> + floating point value, if target is able to support such behavior. */ > >> +DEFHOOK > >> +(loading_int_convert_to_float, > >> +"If the target is protifiable to load an integer in word_mode from\n\ > >> +@var{source} which is based on @var{base}, then convert to a floating\n\ > >> +point value in @var{mode}.", > >> + rtx, (machine_mode mode, rtx source, rtx base), > >> + NULL) > >> + > >> /* Returns true for a legitimate combined insn. */ > >> DEFHOOK > >> (legitimate_combined_insn, > >> diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C > >> b/gcc/testsuite/g++.target/powerpc/pr102024.C > >> index 769585052b5..c8995cae707 100644 > >> --- a/gcc/testsuite/g++.target/powerpc/pr102024.C > >> +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C > >> @@ -5,7 +5,7 @@ > >> // Test that a zero-width bit field in an otherwise homogeneous aggregate > >> // generates a psabi warning and passes arguments in GPRs. > >> > >> -// { dg-final { scan-assembler-times {\mstd\M} 4 } } > >> +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } } > >> > >> struct a_thing > >> { > >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c > >> b/gcc/testsuite/gcc.target/powerpc/pr108073.c > >> new file mode 100644 > >> index 00000000000..aa02de56405 > >> --- /dev/null > >> +++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c > >> @@ -0,0 +1,24 @@ > >> +/* { dg-do run } */ > >> +/* { dg-options "-O2 -save-temps" } */ > >> + > >> +typedef struct DF {double a[4]; long l; } DF; > >> +typedef struct SF {float a[4];short l; } SF; > >> + > >> +/* Each of below function contains one mtvsrd. */ > >> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 {target { > >> has_arch_ppc64 && has_arch_pwr8 } } } } */ > >> +double __attribute__ ((noipa)) foo_df (DF arg){return arg.a[3];} > >> +float __attribute__ ((noipa)) foo_sf (SF arg){return arg.a[2];} > >> +float __attribute__ ((noipa)) foo_sf1 (SF arg){return arg.a[1];} > >> + > >> +double gd = 4.0; > >> +float gf1 = 3.0f, gf2 = 2.0f; > >> +DF gdf = {{1.0,2.0,3.0,4.0}, 1L}; > >> +SF gsf = {{1.0f,2.0f,3.0f,4.0f}, 1L}; > >> + > >> +int main() > >> +{ > >> + if (!(foo_df (gdf) == gd && foo_sf (gsf) == gf1 && foo_sf1 (gsf) == > >> gf2)) > >> + __builtin_abort (); > >> + return 0; > >> +} > >> + > >> > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)