From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=NjgS=4T=suse.de=rguenther@sourceware.org>
Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29])
	by sourceware.org (Postfix) with ESMTPS id 2FBC53857C44;
	Wed, 21 Dec 2022 07:30:25 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 2FBC53857C44
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de
Received: from relay2.suse.de (relay2.suse.de [149.44.160.134])
	by smtp-out2.suse.de (Postfix) with ESMTP id B61974E23;
	Wed, 21 Dec 2022 07:30:23 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa;
	t=1671607823; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=rWIa7y+CF8EhNd4fzb8b5Tv05HP85m6dqzhwz4ctIEc=;
	b=EQUMrsKeAi141zE+N1nTkhhIG0xhQ9SBofQD4OplRtcs01ojj+rSpAGZDkyzQOvecdwk4J
	wzyd6ETSZgv6Gu1K6QPJ/2n7NH2Ba0hVe4T8vndOaJ+VAf9+0x+Ic6sxsFGpoN1OHbdIGT
	bWxUSQC4OOKUweG523S1XdfLHgOPtoQ=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
	s=susede2_ed25519; t=1671607823;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=rWIa7y+CF8EhNd4fzb8b5Tv05HP85m6dqzhwz4ctIEc=;
	b=TEwhSmC3cU6aLLXFUkfLAFfFk0Y4mud5X3v5+POodED6C6nWLH92MRfzPr567bpVMsRUPc
	hKxnwaHLGOjyGMDA==
Received: from wotan.suse.de (wotan.suse.de [10.160.0.1])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by relay2.suse.de (Postfix) with ESMTPS id 4609A2C141;
	Wed, 21 Dec 2022 07:30:23 +0000 (UTC)
Date: Wed, 21 Dec 2022 07:30:23 +0000 (UTC)
From: Richard Biener <rguenther@suse.de>
To: Jiufu Guo <guojiufu@linux.ibm.com>
cc: gcc-patches@gcc.gnu.org, segher@kernel.crashing.org, dje.gcc@gmail.com, 
    linkw@gcc.gnu.org, jeffreyalaw@gmail.com
Subject: Re: [PATCH] loading float member of parameter stored via int
 registers
In-Reply-To: <20221221062736.78036-1-guojiufu@linux.ibm.com>
Message-ID: <nycvar.YFH.7.77.849.2212210718030.5956@jbgna.fhfr.qr>
References: <20221221062736.78036-1-guojiufu@linux.ibm.com>
User-Agent: Alpine 2.22 (LSU 394 2020-01-19)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Spam-Status: No, score=-11.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Wed, 21 Dec 2022, Jiufu Guo wrote:

> Hi,
> 
> This patch is fixing an issue about parameter accessing if the
> parameter is struct type and passed through integer registers, and
> there is floating member is accessed. Like below code:
> 
> typedef struct DF {double a[4]; long l; } DF;
> double foo_df (DF arg){return arg.a[3];}
> 
> On ppc64le, with trunk gcc, "std 6,-24(1) ; lfd 1,-24(1)" is
> generated.  While instruction "mtvsrd 1, 6" would be enough for
> this case.

So why do we end up spilling for ppc?

struct X { int i; float f; };

float foo (struct X x)
{
  return x.f;
}

does pass the structure in $RDI on x86_64 and we manage (with 
optimization, with -O0 we spill) to generate

        shrq    $32, %rdi
        movd    %edi, %xmm0

and RTL expansion generates

(note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 2 4 3 2 (set (reg/v:DI 83 [ x ])
        (reg:DI 5 di [ x ])) "t.c":4:1 -1
     (nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (parallel [
            (set (reg:DI 85)
                (ashiftrt:DI (reg/v:DI 83 [ x ])
                    (const_int 32 [0x20])))
            (clobber (reg:CC 17 flags))
        ]) "t.c":5:11 -1
     (nil))
(insn 7 6 8 2 (set (reg:SI 86)
        (subreg:SI (reg:DI 85) 0)) "t.c":5:11 -1
     (nil))

I would imagine that for the ppc case we only see the subreg here
which should be even easier to optimize.

So how's this not fixable by providing proper patterns / subreg
capabilities?  Looking a bit at the RTL we have the issue might
be that nothing seems to handle CSE of

(note 8 0 5 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 5 8 7 2 (set (mem/c:DI (plus:DI (reg/f:DI 110 sfp)
                (const_int 56 [0x38])) [2 arg+24 S8 A64])
        (reg:DI 6 6)) "t.c":2:23 679 {*movdi_internal64}
     (expr_list:REG_DEAD (reg:DI 6 6)
        (nil)))
(note 7 5 10 2 NOTE_INSN_FUNCTION_BEG)
(note 10 7 15 2 NOTE_INSN_DELETED)
(insn 15 10 16 2 (set (reg/i:DF 33 1)
        (mem/c:DF (plus:DI (reg/f:DI 110 sfp)
                (const_int 56 [0x38])) [1 arg.a[3]+0 S8 A64])) "t.c":2:40 
576 {*movdf_hardfloat64}
     (nil))

Possibly because the store and load happen in a different mode?  Can
you see why CSE doesn't handle this (producing a subreg)?  On
the GIMPLE side we'd happily do that (but we don't see the argument
setup).

Thanks,
Richard.

> This patch updates the behavior when loading floating members of a
> parameter: if that floating member is stored via integer register,
> then loading it as integer mode first, and converting it to floating
> mode.
> 
> I also thought of a method: before storing the register to stack,
> convert it to float mode first. While there are some cases that may
> still prefer to keep an integer register store.                                                  
> 
> Bootstrap and regtest passes on ppc64{,le}.
> I would ask for help to review for comments and if this patch is
> acceptable for the trunk.
> 
> 
> BR,
> Jeff (Jiufu)
> 
> 	PR target/108073
> 
> gcc/ChangeLog:
> 
> 	* config/rs6000/rs6000.cc (TARGET_LOADING_INT_CONVERT_TO_FLOAT): New
> 	macro definition.
> 	(rs6000_loading_int_convert_to_float): New hook implement.
> 	* doc/tm.texi: Regenerated.
> 	* doc/tm.texi.in (loading_int_convert_to_float): New hook.
> 	* expr.cc (expand_expr_real_1): Updated to use the new hook.
> 	* target.def (loading_int_convert_to_float): New hook.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* g++.target/powerpc/pr102024.C: Update.
> 	* gcc.target/powerpc/pr108073.c: New test.
> 
> ---
>  gcc/config/rs6000/rs6000.cc                 | 70 +++++++++++++++++++++
>  gcc/doc/tm.texi                             |  6 ++
>  gcc/doc/tm.texi.in                          |  2 +
>  gcc/expr.cc                                 | 15 +++++
>  gcc/target.def                              | 11 ++++
>  gcc/testsuite/g++.target/powerpc/pr102024.C |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr108073.c | 24 +++++++
>  7 files changed, 129 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index b3a609f3aa3..af676eea276 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -1559,6 +1559,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
>  #undef TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN
>  #define TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN invalid_arg_for_unprototyped_fn
>  
> +#undef TARGET_LOADING_INT_CONVERT_TO_FLOAT
> +#define TARGET_LOADING_INT_CONVERT_TO_FLOAT rs6000_loading_int_convert_to_float
> +
>  #undef TARGET_MD_ASM_ADJUST
>  #define TARGET_MD_ASM_ADJUST rs6000_md_asm_adjust
>  
> @@ -24018,6 +24021,73 @@ invalid_arg_for_unprototyped_fn (const_tree typelist, const_tree funcdecl, const
>  	  : NULL;
>  }
>  
> +/* Implement the TARGET_LOADING_INT_CONVERT_TO_FLOAT. */
> +static rtx
> +rs6000_loading_int_convert_to_float (machine_mode mode, rtx source, rtx base)
> +{
> +  rtx src_base = XEXP (source, 0);
> +  poly_uint64 offset = MEM_OFFSET (source);
> +
> +  if (GET_CODE (src_base) == PLUS && CONSTANT_P (XEXP (src_base, 1)))
> +    {
> +      offset += INTVAL (XEXP (src_base, 1));
> +      src_base = XEXP (src_base, 0);
> +    }
> +
> +  if (!rtx_equal_p (XEXP (base, 0), src_base))
> +    return NULL_RTX;
> +
> +  rtx temp_reg = gen_reg_rtx (word_mode);
> +  rtx temp_mem = copy_rtx (source);
> +  PUT_MODE (temp_mem, word_mode);
> +
> +  /* DI->DF */
> +  if (word_mode == DImode && mode == DFmode)
> +    {
> +      if (multiple_p (offset, GET_MODE_SIZE (word_mode)))
> +	{
> +	  emit_move_insn (temp_reg, temp_mem);
> +	  rtx float_subreg = simplify_gen_subreg (mode, temp_reg, word_mode, 0);
> +	  rtx target_reg = gen_reg_rtx (mode);
> +	  emit_move_insn (target_reg, float_subreg);
> +	  return target_reg;
> +	}
> +      return NULL_RTX;
> +    }
> +
> +  /* Sub DI#->SF */
> +  if (word_mode == DImode && mode == SFmode)
> +    {
> +      poly_uint64 byte_off = 0;
> +      if (multiple_p (offset, GET_MODE_SIZE (word_mode)))
> +	byte_off = 0;
> +      else if (multiple_p (offset - GET_MODE_SIZE (mode),
> +			   GET_MODE_SIZE (word_mode)))
> +	byte_off = GET_MODE_SIZE (mode);
> +      else
> +	return NULL_RTX;
> +
> +      temp_mem = adjust_address (temp_mem, word_mode, -byte_off);
> +      emit_move_insn (temp_reg, temp_mem);
> +
> +      /* little endia only? */
> +      poly_uint64 high_off = subreg_highpart_offset (SImode, word_mode);
> +      if (known_eq (byte_off, high_off))
> +	{
> +	  temp_reg = expand_shift (RSHIFT_EXPR, word_mode, temp_reg,
> +				   GET_MODE_PRECISION (SImode), temp_reg, 0);
> +	}
> +      rtx subreg_si = gen_reg_rtx (SImode);
> +      emit_move_insn (subreg_si,  gen_lowpart (SImode, temp_reg));
> +      rtx float_subreg = simplify_gen_subreg (mode, subreg_si, SImode, 0);
> +      rtx target_reg = gen_reg_rtx (mode);
> +      emit_move_insn (target_reg, float_subreg);
> +      return target_reg;
> +    }
> +
> +  return NULL_RTX;
> +}
> +
>  /* For TARGET_SECURE_PLT 32-bit PIC code we can save PIC register
>     setup by using __stack_chk_fail_local hidden function instead of
>     calling __stack_chk_fail directly.  Otherwise it is better to call
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 8fe49c2ba3d..10f94af553d 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -11933,6 +11933,12 @@ or when the back end is in a partially-initialized state.
>  outside of any function scope.
>  @end deftypefn
>  
> +@deftypefn {Target Hook} rtx TARGET_LOADING_INT_CONVERT_TO_FLOAT (machine_mode @var{mode}, rtx @var{source}, rtx @var{base})
> +If the target is protifiable to load an integer in word_mode from
> +@var{source} which is based on @var{base}, then convert to a floating
> +point value in @var{mode}.
> +@end deftypefn
> +
>  @defmac TARGET_OBJECT_SUFFIX
>  Define this macro to be a C string representing the suffix for object
>  files on your target machine.  If you do not define this macro, GCC will
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 62c49ac46de..1ca6a671d86 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -7756,6 +7756,8 @@ to by @var{ce_info}.
>  
>  @hook TARGET_SET_CURRENT_FUNCTION
>  
> +@hook TARGET_LOADING_INT_CONVERT_TO_FLOAT
> +
>  @defmac TARGET_OBJECT_SUFFIX
>  Define this macro to be a C string representing the suffix for object
>  files on your target machine.  If you do not define this macro, GCC will
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index d9407432ea5..466079220e7 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -11812,6 +11812,21 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
>  	    && modifier != EXPAND_WRITE)
>  	  op0 = flip_storage_order (mode1, op0);
>  
> +	/* Accessing float field of struct parameter which passed via integer
> +	   registers.  */
> +	if (targetm.loading_int_convert_to_float && mode == mode1
> +	    && GET_MODE_CLASS (mode) == MODE_FLOAT
> +	    && TREE_CODE (tem) == PARM_DECL && DECL_INCOMING_RTL (tem)
> +	    && REG_P (DECL_INCOMING_RTL (tem))
> +	    && GET_MODE (DECL_INCOMING_RTL (tem)) == BLKmode && MEM_P (op0)
> +	    && MEM_OFFSET_KNOWN_P (op0))
> +	  {
> +	    rtx res = targetm.loading_int_convert_to_float (mode, op0,
> +							    DECL_RTL (tem));
> +	    if (res)
> +	      op0 = res;
> +	  }
> +
>  	if (mode == mode1 || mode1 == BLKmode || mode1 == tmode
>  	    || modifier == EXPAND_CONST_ADDRESS
>  	    || modifier == EXPAND_INITIALIZER)
> diff --git a/gcc/target.def b/gcc/target.def
> index 082a7c62f34..837ce902489 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -4491,6 +4491,17 @@ original and the returned modes should be @code{MODE_INT}.",
>   (machine_mode mode),
>   default_preferred_doloop_mode)
>  
> +
> +/* Loading an integer value from memory first, then convert (bitcast) to\n\n
> +   floating point value, if target is able to support such behavior.  */
> +DEFHOOK
> +(loading_int_convert_to_float,
> +"If the target is protifiable to load an integer in word_mode from\n\
> +@var{source} which is based on @var{base}, then convert to a floating\n\
> +point value in @var{mode}.",
> + rtx, (machine_mode mode, rtx source, rtx base),
> + NULL)
> +
>  /* Returns true for a legitimate combined insn.  */
>  DEFHOOK
>  (legitimate_combined_insn,
> diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C b/gcc/testsuite/g++.target/powerpc/pr102024.C
> index 769585052b5..c8995cae707 100644
> --- a/gcc/testsuite/g++.target/powerpc/pr102024.C
> +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
> @@ -5,7 +5,7 @@
>  // Test that a zero-width bit field in an otherwise homogeneous aggregate
>  // generates a psabi warning and passes arguments in GPRs.
>  
> -// { dg-final { scan-assembler-times {\mstd\M} 4 } }
> +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } }
>  
>  struct a_thing
>  {
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c b/gcc/testsuite/gcc.target/powerpc/pr108073.c
> new file mode 100644
> index 00000000000..aa02de56405
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c
> @@ -0,0 +1,24 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -save-temps" } */
> +
> +typedef struct DF {double a[4]; long l; } DF;
> +typedef struct SF {float a[4];short l; } SF;
> +
> +/* Each of below function contains one mtvsrd.  */
> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 {target { has_arch_ppc64 && has_arch_pwr8 } } } } */
> +double __attribute__ ((noipa)) foo_df (DF arg){return arg.a[3];}
> +float  __attribute__ ((noipa)) foo_sf (SF arg){return arg.a[2];}
> +float  __attribute__ ((noipa)) foo_sf1 (SF arg){return arg.a[1];}
> +
> +double gd = 4.0;
> +float gf1 = 3.0f, gf2 = 2.0f;
> +DF gdf = {{1.0,2.0,3.0,4.0}, 1L};
> +SF gsf = {{1.0f,2.0f,3.0f,4.0f}, 1L};
> +
> +int main()
> +{
> +  if (!(foo_df (gdf) == gd && foo_sf (gsf) == gf1 && foo_sf1 (gsf) == gf2))
> +    __builtin_abort ();
> +  return 0;
> +}
> +
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)