From: Richard Biener <rguenther@suse.de>
To: guojiufu <guojiufu@linux.ibm.com>
Cc: gcc-patches@gcc.gnu.org, segher@kernel.crashing.org,
dje.gcc@gmail.com, linkw@gcc.gnu.org, jeffreyalaw@gmail.com
Subject: Re: [PATCH] loading float member of parameter stored via int registers
Date: Thu, 22 Dec 2022 07:54:48 +0000 (UTC) [thread overview]
Message-ID: <nycvar.YFH.7.77.849.2212220748510.5956@jbgna.fhfr.qr> (raw)
In-Reply-To: <58beeb5dd65a10b7480f73462da904a4@linux.ibm.com>
On Thu, 22 Dec 2022, guojiufu wrote:
> Hi,
>
> On 2022-12-21 15:30, Richard Biener wrote:
> > On Wed, 21 Dec 2022, Jiufu Guo wrote:
> >
> >> Hi,
> >>
> >> This patch is fixing an issue about parameter accessing if the
> >> parameter is struct type and passed through integer registers, and
> >> there is floating member is accessed. Like below code:
> >>
> >> typedef struct DF {double a[4]; long l; } DF;
> >> double foo_df (DF arg){return arg.a[3];}
> >>
> >> On ppc64le, with trunk gcc, "std 6,-24(1) ; lfd 1,-24(1)" is
> >> generated. While instruction "mtvsrd 1, 6" would be enough for
> >> this case.
> >
> > So why do we end up spilling for PPC?
>
> Good question! According to GCC source code (in function.cc/expr.cc),
> it is common behavior: using "word_mode" to store the parameter to stack,
> And using the field's mode (e.g. float mode) to load from the stack.
> But with some tries, I fail to construct cases on many platforms.
> So, I convert the fix to a target hook and implemented the rs6000 part
> first.
>
> >
> > struct X { int i; float f; };
> >
> > float foo (struct X x)
> > {
> > return x.f;
> > }
> >
> > does pass the structure in $RDI on x86_64 and we manage (with
> > optimization, with -O0 we spill) to generate
> >
> > shrq $32, %rdi
> > movd %edi, %xmm0
> >
> > and RTL expansion generates
> >
> > (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> > (insn 2 4 3 2 (set (reg/v:DI 83 [ x ])
> > (reg:DI 5 di [ x ])) "t.c":4:1 -1
> > (nil))
> > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
> > (insn 6 3 7 2 (parallel [
> > (set (reg:DI 85)
> > (ashiftrt:DI (reg/v:DI 83 [ x ])
> > (const_int 32 [0x20])))
> > (clobber (reg:CC 17 flags))
> > ]) "t.c":5:11 -1
> > (nil))
> > (insn 7 6 8 2 (set (reg:SI 86)
> > (subreg:SI (reg:DI 85) 0)) "t.c":5:11 -1
> > (nil))
> >
> > I would imagine that for the ppc case we only see the subreg here
> > which should be even easier to optimize.
> >
> > So how's this not fixable by providing proper patterns / subreg
> > capabilities? Looking a bit at the RTL we have the issue might
> > be that nothing seems to handle CSE of
> >
>
> This case is also related to 'parameter on struct', PR89310 is
> just for this case. On trunk, it is fixed.
> One difference: the parameter is in DImode, and passed via an
> integer register for "{int i; float f;}".
> But for "{double a[4]; long l;}", the parameter is in BLKmode,
> and stored to stack during the argument setup.
OK, so this would be another case for "heuristics" to use
sth different than word_mode for storing, but of course
the arguments are in integer registers and using different
modes can for example prohibit store-multiple instruction use.
As I said in the related thread an RTL expansion time "SRA"
with the incoming argument assignment in mind could make
more optimal decisions for these kind of special-cases.
> > (note 8 0 5 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> > (insn 5 8 7 2 (set (mem/c:DI (plus:DI (reg/f:DI 110 sfp)
> > (const_int 56 [0x38])) [2 arg+24 S8 A64])
> > (reg:DI 6 6)) "t.c":2:23 679 {*movdi_internal64}
> > (expr_list:REG_DEAD (reg:DI 6 6)
> > (nil)))
> > (note 7 5 10 2 NOTE_INSN_FUNCTION_BEG)
> > (note 10 7 15 2 NOTE_INSN_DELETED)
> > (insn 15 10 16 2 (set (reg/i:DF 33 1)
> > (mem/c:DF (plus:DI (reg/f:DI 110 sfp)
> > (const_int 56 [0x38])) [1 arg.a[3]+0 S8 A64])) "t.c":2:40
> > 576 {*movdf_hardfloat64}
> > (nil))
> >
> > Possibly because the store and load happen in a different mode? Can
> > you see why CSE doesn't handle this (producing a subreg)? On
>
> Yes, exactly! For "{double a[4]; long l;}", because the store and load
> are using a different mode, and then CSE does not optimize it. This
> patch makes the store and load using the same mode (DImode), and then
> leverage CSE to handle it.
So can we instead fix CSE to consider replacing insn 15 above with
(insn 15 (set (reg/i:DF 33 1)
(subreg:DF (reg/f:DI 6 6)))
? One peculiarity is that this introduces a hardreg use in this
special case.
Richard.
> > the GIMPLE side we'd happily do that (but we don't see the argument
> > setup).
>
> Thanks for your comments!
>
>
> BR,
> Jeff (Jiufu)
>
> >
> > Thanks,
> > Richard.
> >
> >> This patch updates the behavior when loading floating members of a
> >> parameter: if that floating member is stored via integer register,
> >> then loading it as integer mode first, and converting it to floating
> >> mode.
> >>
> >> I also thought of a method: before storing the register to stack,
> >> convert it to float mode first. While there are some cases that may
> >> still prefer to keep an integer register store.
> >>
> >> Bootstrap and regtest passes on ppc64{,le}.
> >> I would ask for help to review for comments and if this patch is
> >> acceptable for the trunk.
> >>
> >>
> >> BR,
> >> Jeff (Jiufu)
> >>
> >> PR target/108073
> >>
> >> gcc/ChangeLog:
> >>
> >> * config/rs6000/rs6000.cc (TARGET_LOADING_INT_CONVERT_TO_FLOAT): New
> >> macro definition.
> >> (rs6000_loading_int_convert_to_float): New hook implement.
> >> * doc/tm.texi: Regenerated.
> >> * doc/tm.texi.in (loading_int_convert_to_float): New hook.
> >> * expr.cc (expand_expr_real_1): Updated to use the new hook.
> >> * target.def (loading_int_convert_to_float): New hook.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> * g++.target/powerpc/pr102024.C: Update.
> >> * gcc.target/powerpc/pr108073.c: New test.
> >>
> >> ---
> >> gcc/config/rs6000/rs6000.cc | 70 +++++++++++++++++++++
> >> gcc/doc/tm.texi | 6 ++
> >> gcc/doc/tm.texi.in | 2 +
> >> gcc/expr.cc | 15 +++++
> >> gcc/target.def | 11 ++++
> >> gcc/testsuite/g++.target/powerpc/pr102024.C | 2 +-
> >> gcc/testsuite/gcc.target/powerpc/pr108073.c | 24 +++++++
> >> 7 files changed, 129 insertions(+), 1 deletion(-)
> >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
> >>
> >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> >> index b3a609f3aa3..af676eea276 100644
> >> --- a/gcc/config/rs6000/rs6000.cc
> >> +++ b/gcc/config/rs6000/rs6000.cc
> >> @@ -1559,6 +1559,9 @@ static const struct attribute_spec
> >> rs6000_attribute_table[] =
> >> #undef TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN
> >> #define TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN
> >> invalid_arg_for_unprototyped_fn
> >>
> >> +#undef TARGET_LOADING_INT_CONVERT_TO_FLOAT
> >> +#define TARGET_LOADING_INT_CONVERT_TO_FLOAT
> >> rs6000_loading_int_convert_to_float
> >> +
> >> #undef TARGET_MD_ASM_ADJUST
> >> #define TARGET_MD_ASM_ADJUST rs6000_md_asm_adjust
> >>
> >> @@ -24018,6 +24021,73 @@ invalid_arg_for_unprototyped_fn (const_tree
> >> typelist, const_tree funcdecl, const
> >> : NULL;
> >> }
> >>
> >> +/* Implement the TARGET_LOADING_INT_CONVERT_TO_FLOAT. */
> >> +static rtx
> >> +rs6000_loading_int_convert_to_float (machine_mode mode, rtx source, rtx
> >> base)
> >> +{
> >> + rtx src_base = XEXP (source, 0);
> >> + poly_uint64 offset = MEM_OFFSET (source);
> >> +
> >> + if (GET_CODE (src_base) == PLUS && CONSTANT_P (XEXP (src_base, 1)))
> >> + {
> >> + offset += INTVAL (XEXP (src_base, 1));
> >> + src_base = XEXP (src_base, 0);
> >> + }
> >> +
> >> + if (!rtx_equal_p (XEXP (base, 0), src_base))
> >> + return NULL_RTX;
> >> +
> >> + rtx temp_reg = gen_reg_rtx (word_mode);
> >> + rtx temp_mem = copy_rtx (source);
> >> + PUT_MODE (temp_mem, word_mode);
> >> +
> >> + /* DI->DF */
> >> + if (word_mode == DImode && mode == DFmode)
> >> + {
> >> + if (multiple_p (offset, GET_MODE_SIZE (word_mode)))
> >> + {
> >> + emit_move_insn (temp_reg, temp_mem);
> >> + rtx float_subreg = simplify_gen_subreg (mode, temp_reg, word_mode,
> >> 0);
> >> + rtx target_reg = gen_reg_rtx (mode);
> >> + emit_move_insn (target_reg, float_subreg);
> >> + return target_reg;
> >> + }
> >> + return NULL_RTX;
> >> + }
> >> +
> >> + /* Sub DI#->SF */
> >> + if (word_mode == DImode && mode == SFmode)
> >> + {
> >> + poly_uint64 byte_off = 0;
> >> + if (multiple_p (offset, GET_MODE_SIZE (word_mode)))
> >> + byte_off = 0;
> >> + else if (multiple_p (offset - GET_MODE_SIZE (mode),
> >> + GET_MODE_SIZE (word_mode)))
> >> + byte_off = GET_MODE_SIZE (mode);
> >> + else
> >> + return NULL_RTX;
> >> +
> >> + temp_mem = adjust_address (temp_mem, word_mode, -byte_off);
> >> + emit_move_insn (temp_reg, temp_mem);
> >> +
> >> + /* little endia only? */
> >> + poly_uint64 high_off = subreg_highpart_offset (SImode, word_mode);
> >> + if (known_eq (byte_off, high_off))
> >> + {
> >> + temp_reg = expand_shift (RSHIFT_EXPR, word_mode, temp_reg,
> >> + GET_MODE_PRECISION (SImode), temp_reg, 0);
> >> + }
> >> + rtx subreg_si = gen_reg_rtx (SImode);
> >> + emit_move_insn (subreg_si, gen_lowpart (SImode, temp_reg));
> >> + rtx float_subreg = simplify_gen_subreg (mode, subreg_si, SImode, 0);
> >> + rtx target_reg = gen_reg_rtx (mode);
> >> + emit_move_insn (target_reg, float_subreg);
> >> + return target_reg;
> >> + }
> >> +
> >> + return NULL_RTX;
> >> +}
> >> +
> >> /* For TARGET_SECURE_PLT 32-bit PIC code we can save PIC register
> >> setup by using __stack_chk_fail_local hidden function instead of
> >> calling __stack_chk_fail directly. Otherwise it is better to call
> >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> >> index 8fe49c2ba3d..10f94af553d 100644
> >> --- a/gcc/doc/tm.texi
> >> +++ b/gcc/doc/tm.texi
> >> @@ -11933,6 +11933,12 @@ or when the back end is in a
> >> partially-initialized state.
> >> outside of any function scope.
> >> @end deftypefn
> >>
> >> +@deftypefn {Target Hook} rtx TARGET_LOADING_INT_CONVERT_TO_FLOAT
> >> (machine_mode @var{mode}, rtx @var{source}, rtx @var{base})
> >> +If the target is protifiable to load an integer in word_mode from
> >> +@var{source} which is based on @var{base}, then convert to a floating
> >> +point value in @var{mode}.
> >> +@end deftypefn
> >> +
> >> @defmac TARGET_OBJECT_SUFFIX
> >> Define this macro to be a C string representing the suffix for object
> >> files on your target machine. If you do not define this macro, GCC
> >> will
> >> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> >> index 62c49ac46de..1ca6a671d86 100644
> >> --- a/gcc/doc/tm.texi.in
> >> +++ b/gcc/doc/tm.texi.in
> >> @@ -7756,6 +7756,8 @@ to by @var{ce_info}.
> >>
> >> @hook TARGET_SET_CURRENT_FUNCTION
> >>
> >> +@hook TARGET_LOADING_INT_CONVERT_TO_FLOAT
> >> +
> >> @defmac TARGET_OBJECT_SUFFIX
> >> Define this macro to be a C string representing the suffix for object
> >> files on your target machine. If you do not define this macro, GCC
> >> will
> >> diff --git a/gcc/expr.cc b/gcc/expr.cc
> >> index d9407432ea5..466079220e7 100644
> >> --- a/gcc/expr.cc
> >> +++ b/gcc/expr.cc
> >> @@ -11812,6 +11812,21 @@ expand_expr_real_1 (tree exp, rtx target,
> >> machine_mode tmode,
> >> && modifier != EXPAND_WRITE)
> >> op0 = flip_storage_order (mode1, op0);
> >>
> >> + /* Accessing float field of struct parameter which passed via integer
> >> + registers. */
> >> + if (targetm.loading_int_convert_to_float && mode == mode1
> >> + && GET_MODE_CLASS (mode) == MODE_FLOAT
> >> + && TREE_CODE (tem) == PARM_DECL && DECL_INCOMING_RTL (tem)
> >> + && REG_P (DECL_INCOMING_RTL (tem))
> >> + && GET_MODE (DECL_INCOMING_RTL (tem)) == BLKmode && MEM_P (op0)
> >> + && MEM_OFFSET_KNOWN_P (op0))
> >> + {
> >> + rtx res = targetm.loading_int_convert_to_float (mode, op0,
> >> + DECL_RTL (tem));
> >> + if (res)
> >> + op0 = res;
> >> + }
> >> +
> >> if (mode == mode1 || mode1 == BLKmode || mode1 == tmode
> >> || modifier == EXPAND_CONST_ADDRESS
> >> || modifier == EXPAND_INITIALIZER)
> >> diff --git a/gcc/target.def b/gcc/target.def
> >> index 082a7c62f34..837ce902489 100644
> >> --- a/gcc/target.def
> >> +++ b/gcc/target.def
> >> @@ -4491,6 +4491,17 @@ original and the returned modes should be
> >> @code{MODE_INT}.",
> >> (machine_mode mode),
> >> default_preferred_doloop_mode)
> >>
> >> +
> >> +/* Loading an integer value from memory first, then convert (bitcast)
> >> to\n\n
> >> + floating point value, if target is able to support such behavior. */
> >> +DEFHOOK
> >> +(loading_int_convert_to_float,
> >> +"If the target is protifiable to load an integer in word_mode from\n\
> >> +@var{source} which is based on @var{base}, then convert to a floating\n\
> >> +point value in @var{mode}.",
> >> + rtx, (machine_mode mode, rtx source, rtx base),
> >> + NULL)
> >> +
> >> /* Returns true for a legitimate combined insn. */
> >> DEFHOOK
> >> (legitimate_combined_insn,
> >> diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C
> >> b/gcc/testsuite/g++.target/powerpc/pr102024.C
> >> index 769585052b5..c8995cae707 100644
> >> --- a/gcc/testsuite/g++.target/powerpc/pr102024.C
> >> +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
> >> @@ -5,7 +5,7 @@
> >> // Test that a zero-width bit field in an otherwise homogeneous aggregate
> >> // generates a psabi warning and passes arguments in GPRs.
> >>
> >> -// { dg-final { scan-assembler-times {\mstd\M} 4 } }
> >> +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } }
> >>
> >> struct a_thing
> >> {
> >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c
> >> b/gcc/testsuite/gcc.target/powerpc/pr108073.c
> >> new file mode 100644
> >> index 00000000000..aa02de56405
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c
> >> @@ -0,0 +1,24 @@
> >> +/* { dg-do run } */
> >> +/* { dg-options "-O2 -save-temps" } */
> >> +
> >> +typedef struct DF {double a[4]; long l; } DF;
> >> +typedef struct SF {float a[4];short l; } SF;
> >> +
> >> +/* Each of below function contains one mtvsrd. */
> >> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 {target {
> >> has_arch_ppc64 && has_arch_pwr8 } } } } */
> >> +double __attribute__ ((noipa)) foo_df (DF arg){return arg.a[3];}
> >> +float __attribute__ ((noipa)) foo_sf (SF arg){return arg.a[2];}
> >> +float __attribute__ ((noipa)) foo_sf1 (SF arg){return arg.a[1];}
> >> +
> >> +double gd = 4.0;
> >> +float gf1 = 3.0f, gf2 = 2.0f;
> >> +DF gdf = {{1.0,2.0,3.0,4.0}, 1L};
> >> +SF gsf = {{1.0f,2.0f,3.0f,4.0f}, 1L};
> >> +
> >> +int main()
> >> +{
> >> + if (!(foo_df (gdf) == gd && foo_sf (gsf) == gf1 && foo_sf1 (gsf) ==
> >> gf2))
> >> + __builtin_abort ();
> >> + return 0;
> >> +}
> >> +
> >>
>
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
next prev parent reply other threads:[~2022-12-22 7:54 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-21 6:27 Jiufu Guo
2022-12-21 7:30 ` Richard Biener
2022-12-22 7:25 ` guojiufu
2022-12-22 7:54 ` Richard Biener [this message]
2022-12-22 9:02 ` Jiufu Guo
2022-12-22 11:28 ` Richard Biener
2022-12-22 18:40 ` Segher Boessenkool
2022-12-23 12:23 ` Jiufu Guo
2022-12-23 12:36 ` Jiufu Guo
2022-12-23 14:45 ` Segher Boessenkool
2022-12-23 16:20 ` Richard Biener
2022-12-23 16:52 ` Segher Boessenkool
2022-12-23 19:13 ` Richard Biener
2022-12-23 19:52 ` Segher Boessenkool
2022-12-27 3:03 ` Jiufu Guo
2022-12-27 14:16 ` Jiufu Guo
2022-12-30 2:22 ` Jiufu Guo
2022-12-30 7:44 ` Segher Boessenkool
2022-12-30 8:30 ` Andrew Pinski
2023-01-03 3:28 ` Jiufu Guo
2023-01-03 8:59 ` Segher Boessenkool
2023-01-03 9:10 ` Hu, Lin1
2022-12-27 2:15 ` Jiufu Guo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=nycvar.YFH.7.77.849.2212220748510.5956@jbgna.fhfr.qr \
--to=rguenther@suse.de \
--cc=dje.gcc@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=guojiufu@linux.ibm.com \
--cc=jeffreyalaw@gmail.com \
--cc=linkw@gcc.gnu.org \
--cc=segher@kernel.crashing.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).