public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Patch: PR40900, extending call patterns
@ 2010-04-30  1:05 Bernd Schmidt
  2010-04-30  3:33 ` Andrew Pinski
  0 siblings, 1 reply; 15+ messages in thread
From: Bernd Schmidt @ 2010-04-30  1:05 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 684 bytes --]

The following patch fixes PR40900, which points out an unnecessary
sign-extension for the following code:

extern short shortv2();
short shortv1()
{
  return shortv2();
}

The idea is to represent the sign-extension in the RTL call patterns,
choosing extending variants in calls.c when we notice a suitable
promotion.  Then we let combine do its work; this needed another little
tweak to make sure it recorded the extension of the return register.

Bootstrapped and regression tested on i686-linux.  An earlier version,
which had some unnecessary code and other small differences, was
regression tested on
arm-linux-gnueabi(qemu-system-armv7{arch=armv7-a/thumb,thumb,}).  Ok?


Bernd

[-- Attachment #2: extend-calls3.diff --]
[-- Type: text/plain, Size: 16422 bytes --]


	PR rtl-optimization/40900
	* doc/md.texi (Standard Names): Document call_value_zext<mode,
	call_value_sext<mode>, call_value_pop_sext<mode> and
	call_value_pop_zext<mode>.
	* optabs.h (enum optab_index): Add OTI_call_value_sext,
	OTI_call_value_zext, OTI_call_value_pop_sext, OTI_call_value_po_zext.
	(call_value_sext_optab, call_value_zext_optab,
	call_value_pop_sext_optab, call_value_pop_zext_optab): New macros.
	* genopinit.c (optabs): Initialize the new optabs.
	* final.c (call_from_call_insn): Handle ZERO_EXTEND and SIGN_EXTEND.
	* calls.c: Include "recog.h".
	(emit_call_1): Look for and use extending call patterns if the return
	value is promoted.
	* combine.c (record_dead_and_set_regs): Use record_dead_and_set_regs_1
	normally for call insns.
	* config/arm/arm.md (define_code_iterator cext): New.
	(define_code_attr optab): New.
	(define_mode_iterator CEXT): New.
	(call_value_<optab><mode>, call_value_<optab><mode>_internal,
	call_value_<optab><mode>_reg_armv5, call_value_<optab><mode>_reg_arm,
	call_value_<optab><mode>_mem, call_value_<optab><mode>_reg_thumb1_v5,
	call_value_<optab><mode>_reg_thumb1, call_value_<optab><mode>_symbol,
	call_value_<optab><mode>_insn): New patterns.
	* config/arm/thumb2.md (call_value<optab><mode>_reg_thumb2): New
	pattern.

	PR rtl-optimization/40900
	* gcc.target/arm/pr40900.c: New test.

Index: doc/md.texi
===================================================================
--- doc/md.texi	(revision 158639)
+++ doc/md.texi	(working copy)
@@ -4775,6 +4775,17 @@ For machines where @code{RETURN_POPS_ARG
 patterns increases the number of functions for which the frame pointer
 can be eliminated, if desired.
 
+@cindex @code{call_value_sext@var{mode}} instruction pattern
+@cindex @code{call_value_zext@var{mode}} instruction pattern
+@cindex @code{call_value_pop_sext@var{mode}} instruction pattern
+@cindex @code{call_value_pop_zext@var{mode}} instruction pattern
+
+These patterns correspond to @samp{call_value} and @samp{call_value_pop},
+but they describe a call which returns a sign-extended or zero-extended
+result.  This may enable some optimization opportunities.  Defining them
+is optional; if they don't exist, @samp{call_value} or @samp{call_value_pop}
+will be used in their place.
+
 @cindex @code{untyped_call} instruction pattern
 @item @samp{untyped_call}
 Subroutine call instruction returning a value of any type.  Operand 0 is
Index: optabs.h
===================================================================
--- optabs.h	(revision 158643)
+++ optabs.h	(working copy)
@@ -369,6 +369,12 @@ enum optab_index
   /* Perform a raise to the power of integer.  */
   OTI_powi,
 
+  /* Perform a call where we know how the return value is extended.  */
+  OTI_call_value_sext,
+  OTI_call_value_zext,
+  OTI_call_value_pop_sext,
+  OTI_call_value_pop_zext,
+
   OTI_MAX
 };
 
@@ -546,6 +552,11 @@ extern struct optab_d optab_table[OTI_MA
 
 #define powi_optab (&optab_table[OTI_powi])
 
+#define call_value_sext_optab (&optab_table[OTI_call_value_sext])
+#define call_value_zext_optab (&optab_table[OTI_call_value_zext])
+#define call_value_pop_sext_optab (&optab_table[OTI_call_value_pop_sext])
+#define call_value_pop_zext_optab (&optab_table[OTI_call_value_pop_zext])
+
 /* Conversion optabs have their own table and indexes.  */
 enum convert_optab_index
 {
Index: genopinit.c
===================================================================
--- genopinit.c	(revision 158639)
+++ genopinit.c	(working copy)
@@ -272,7 +272,11 @@ static const char * const optabs[] =
   "optab_handler (vec_pack_ssat_optab, $A)->insn_code = CODE_FOR_$(vec_pack_ssat_$a$)",
   "optab_handler (vec_pack_usat_optab, $A)->insn_code = CODE_FOR_$(vec_pack_usat_$a$)",
   "optab_handler (vec_pack_sfix_trunc_optab, $A)->insn_code = CODE_FOR_$(vec_pack_sfix_trunc_$a$)",
-  "optab_handler (vec_pack_ufix_trunc_optab, $A)->insn_code = CODE_FOR_$(vec_pack_ufix_trunc_$a$)"
+  "optab_handler (vec_pack_ufix_trunc_optab, $A)->insn_code = CODE_FOR_$(vec_pack_ufix_trunc_$a$)",
+  "optab_handler (call_value_sext_optab, $A)->insn_code = CODE_FOR_$(call_value_sext$I$a$)",
+  "optab_handler (call_value_zext_optab, $A)->insn_code = CODE_FOR_$(call_value_zext$I$a$)",
+  "optab_handler (call_value_pop_sext_optab, $A)->insn_code = CODE_FOR_$(call_value_pop_sext$I$a$)",
+  "optab_handler (call_value_pop_zext_optab, $A)->insn_code = CODE_FOR_$(call_value_pop_zext$I$a$)"
 };
 
 static void gen_insn (rtx);
Index: final.c
===================================================================
--- final.c	(revision 158639)
+++ final.c	(working copy)
@@ -1791,6 +1791,10 @@ call_from_call_insn (rtx insn)
 	case PARALLEL:
 	  x = XVECEXP (x, 0, 0);
 	  break;
+	case ZERO_EXTEND:
+	case SIGN_EXTEND:
+	  x = XEXP (x, 0);
+	  break;
 	case SET:
 	  x = XEXP (x, 1);
 	  break;
Index: calls.c
===================================================================
--- calls.c	(revision 158639)
+++ calls.c	(working copy)
@@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.  
 #include "flags.h"
 #include "expr.h"
 #include "optabs.h"
+#include "recog.h"
 #include "libfuncs.h"
 #include "function.h"
 #include "regs.h"
@@ -256,6 +257,8 @@ emit_call_1 (rtx funexp, tree fntree ATT
   rtx call_insn;
   int already_popped = 0;
   HOST_WIDE_INT n_popped = RETURN_POPS_ARGS (fndecl, funtype, stack_size);
+  enum insn_code pop_icode = CODE_FOR_nothing;
+  enum insn_code icode = CODE_FOR_nothing;
 
 #ifdef CALL_POPS_ARGS
   n_popped += CALL_POPS_ARGS (* args_so_far);
@@ -267,6 +270,31 @@ emit_call_1 (rtx funexp, tree fntree ATT
   if (GET_CODE (funexp) != SYMBOL_REF)
     funexp = memory_address (FUNCTION_MODE, funexp);
 
+
+  if (valreg)
+    {
+      enum machine_mode retval_mode, promoted_mode;
+      int retval_unsignedp;
+
+      retval_unsignedp = TYPE_UNSIGNED (TREE_TYPE (funtype));
+      retval_mode = TYPE_MODE (TREE_TYPE (funtype));
+      promoted_mode
+	= promote_function_mode (TREE_TYPE (funtype), retval_mode,
+				 &retval_unsignedp, funtype, 1);
+      
+      if (promoted_mode != retval_mode)
+	{
+	  optab call_optab = (retval_unsignedp
+			      ? call_value_pop_zext_optab
+			      :  call_value_pop_sext_optab);
+	  pop_icode = optab_handler (call_optab, retval_mode)->insn_code;
+	  call_optab = (retval_unsignedp
+			? call_value_zext_optab
+			:  call_value_sext_optab);
+	  icode = optab_handler (call_optab, retval_mode)->insn_code;
+	}
+    }
+
 #if defined (HAVE_sibcall_pop) && defined (HAVE_sibcall_value_pop)
   if ((ecf_flags & ECF_SIBCALL)
       && HAVE_sibcall_pop && HAVE_sibcall_value_pop
@@ -311,7 +339,11 @@ emit_call_1 (rtx funexp, tree fntree ATT
       /* If this subroutine pops its own args, record that in the call insn
 	 if possible, for the sake of frame pointer elimination.  */
 
-      if (valreg)
+      if (valreg && pop_icode != CODE_FOR_nothing)
+	pat = GEN_FCN (pop_icode) (valreg,
+				   gen_rtx_MEM (FUNCTION_MODE, funexp),
+				   rounded_stack_size_rtx, next_arg_reg, n_pop);
+      else if (valreg)
 	pat = GEN_CALL_VALUE_POP (valreg,
 				  gen_rtx_MEM (FUNCTION_MODE, funexp),
 				  rounded_stack_size_rtx, next_arg_reg, n_pop);
@@ -345,7 +377,12 @@ emit_call_1 (rtx funexp, tree fntree ATT
 #if defined (HAVE_call) && defined (HAVE_call_value)
   if (HAVE_call && HAVE_call_value)
     {
-      if (valreg)
+      if (valreg && icode != CODE_FOR_nothing)
+	emit_call_insn (GEN_FCN (icode) (valreg,
+					 gen_rtx_MEM (FUNCTION_MODE, funexp),
+					 rounded_stack_size_rtx, next_arg_reg,
+					 NULL_RTX));
+      else if (valreg)
 	emit_call_insn (GEN_CALL_VALUE (valreg,
 					gen_rtx_MEM (FUNCTION_MODE, funexp),
 					rounded_stack_size_rtx, next_arg_reg,
Index: combine.c
===================================================================
--- combine.c	(revision 158639)
+++ combine.c	(working copy)
@@ -11889,15 +11889,8 @@ record_dead_and_set_regs (rtx insn)
 	  }
 
       last_call_luid = mem_last_set = DF_INSN_LUID (insn);
-
-      /* We can't combine into a call pattern.  Remember, though, that
-	 the return value register is set at this LUID.  We could
-	 still replace a register with the return value from the
-	 wrong subroutine call!  */
-      note_stores (PATTERN (insn), record_dead_and_set_regs_1, NULL_RTX);
     }
-  else
-    note_stores (PATTERN (insn), record_dead_and_set_regs_1, insn);
+  note_stores (PATTERN (insn), record_dead_and_set_regs_1, insn);
 }
 
 /* If a SUBREG has the promoted bit set, it is in fact a property of the
Index: Makefile.in
===================================================================
--- Makefile.in	(revision 158639)
+++ Makefile.in	(working copy)
@@ -2807,7 +2807,7 @@ builtins.o : builtins.c $(CONFIG_H) $(SY
    libfuncs.h $(REAL_H) langhooks.h $(BASIC_BLOCK_H) tree-mudflap.h \
    $(BUILTINS_DEF) $(MACHMODE_H) $(DIAGNOSTIC_H) $(TREE_FLOW_H) value-prof.h
 calls.o : calls.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
-   $(TREE_H) $(FLAGS_H) $(EXPR_H) $(OPTABS_H) langhooks.h $(TARGET_H) \
+   $(TREE_H) $(FLAGS_H) $(EXPR_H) $(OPTABS_H) $(RECOG_H) langhooks.h $(TARGET_H) \
    libfuncs.h $(REGS_H) $(TOPLEV_H) output.h $(FUNCTION_H) $(TIMEVAR_H) $(TM_P_H) \
    $(CGRAPH_H) $(EXCEPT_H) sbitmap.h $(DBGCNT_H) $(TREE_FLOW_H)
 expmed.o : expmed.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(TREE_H) \
Index: config/arm/arm.md
===================================================================
--- config/arm/arm.md	(revision 158639)
+++ config/arm/arm.md	(working copy)
@@ -8561,6 +8561,51 @@ (define_expand "call_value"
   }"
 )
 
+(define_code_iterator cext [sign_extend zero_extend])
+(define_code_attr optab [(sign_extend "sext")
+			 (zero_extend "zext")])
+
+(define_mode_iterator CEXT [QI HI])
+
+(define_expand "call_value_<optab><mode>"
+  [(parallel [(set (match_operand:SI       0 "" "")
+	           (cext:SI
+		    (call:CEXT (match_operand 1 "memory_operand" "")
+			       (match_operand 2 "general_operand" ""))))
+	      (use (match_operand 3 "" ""))
+	      (clobber (reg:SI LR_REGNUM))])]
+  "TARGET_EITHER"
+  "
+  {
+    rtx pat, callee;
+    
+    /* In an untyped call, we can get NULL for operand 2.  */
+    if (operands[3] == 0)
+      operands[3] = const0_rtx;
+      
+    /* Decide if we should generate indirect calls by loading the
+       32-bit address of the callee into a register before performing the
+       branch and link.  */
+    callee = XEXP (operands[1], 0);
+    if (GET_CODE (callee) == SYMBOL_REF
+	? arm_is_long_call_p (SYMBOL_REF_DECL (callee))
+	: !REG_P (callee))
+      XEXP (operands[1], 0) = force_reg (Pmode, callee);
+
+    pat = gen_call_value_<optab><mode>_internal (operands[0], operands[1],
+				   		 operands[2], operands[3]);
+    arm_emit_call_insn (pat, XEXP (operands[1], 0));
+    DONE;
+  }"
+)
+
+(define_expand "call_value_<optab><mode>_internal"
+  [(parallel [(set (match_operand 0 "" "")
+	           (cext:SI (call:CEXT (match_operand 1 "memory_operand" "")
+				       (match_operand 2 "general_operand" ""))))
+	      (use (match_operand 3 "" ""))
+	      (clobber (reg:SI LR_REGNUM))])])
+
 (define_expand "call_value_internal"
   [(parallel [(set (match_operand       0 "" "")
 	           (call (match_operand 1 "memory_operand" "")
@@ -8642,6 +8687,80 @@ (define_insn "*call_value_reg_thumb1"
   [(set_attr "type" "call")]
 )
 
+(define_insn "*call_value_<optab><mode>_reg_armv5"
+  [(set (match_operand:SI 0 "" "")
+        (cext:SI (call:CEXT (mem:SI (match_operand:SI 1 "s_register_operand" "r"))
+			    (match_operand 2 "" ""))))
+   (use (match_operand 3 "" ""))
+   (clobber (reg:SI LR_REGNUM))]
+  "TARGET_ARM && arm_arch5"
+  "blx%?\\t%1"
+  [(set_attr "type" "call")]
+)
+
+(define_insn "*call_value_<optab><mode>_reg_arm"
+  [(set (match_operand:SI 0 "" "")
+        (cext:SI (call:CEXT (mem:SI (match_operand:SI 1 "s_register_operand" "r"))
+			    (match_operand 2 "" ""))))
+   (use (match_operand 3 "" ""))
+   (clobber (reg:SI LR_REGNUM))]
+  "TARGET_ARM && !arm_arch5"
+  "*
+  return output_call (&operands[1]);
+  "
+  [(set_attr "length" "12")
+   (set_attr "type" "call")]
+)
+
+;; Note: see *call_mem
+
+(define_insn "*call_value_<optab><mode>_mem"
+  [(set (match_operand:SI 0 "" "")
+	(cext:SI (call:CEXT (mem:SI (match_operand:SI 1 "call_memory_operand" "m"))
+			    (match_operand 2 "" ""))))
+   (use (match_operand 3 "" ""))
+   (clobber (reg:SI LR_REGNUM))]
+  "TARGET_ARM && !arm_arch5 && (!CONSTANT_ADDRESS_P (XEXP (operands[1], 0)))"
+  "*
+  return output_call_mem (&operands[1]);
+  "
+  [(set_attr "length" "12")
+   (set_attr "type" "call")]
+)
+
+(define_insn "*call_value_<optab><mode>_reg_thumb1_v5"
+  [(set (match_operand:SI 0 "" "")
+	(cext:SI (call:CEXT (mem:SI (match_operand:SI 1 "register_operand" "l*r"))
+			    (match_operand 2 "" ""))))
+   (use (match_operand 3 "" ""))
+   (clobber (reg:SI LR_REGNUM))]
+  "TARGET_THUMB1 && arm_arch5"
+  "blx\\t%1"
+  [(set_attr "length" "2")
+   (set_attr "type" "call")]
+)
+
+(define_insn "*call_value_<optab><mode>_reg_thumb1"
+  [(set (match_operand:SI 0 "" "")
+	(cext:SI (call:CEXT (mem:SI (match_operand:SI 1 "register_operand" "l*r"))
+			    (match_operand 2 "" ""))))
+   (use (match_operand 3 "" ""))
+   (clobber (reg:SI LR_REGNUM))]
+  "TARGET_THUMB1 && !arm_arch5"
+  "*
+  {
+    if (!TARGET_CALLER_INTERWORKING)
+      return thumb_call_via_reg (operands[1]);
+    else if (operands[2] == const0_rtx)
+      return \"bl\\t%__interwork_call_via_%1\";
+    else if (frame_pointer_needed)
+      return \"bl\\t%__interwork_r7_call_via_%1\";
+    else
+      return \"bl\\t%__interwork_r11_call_via_%1\";
+  }"
+  [(set_attr "type" "call")]
+)
+
 ;; Allow calls to SYMBOL_REFs specially as they are not valid general addresses
 ;; The 'a' causes the operand to be treated as an address, i.e. no '#' output.
 
@@ -8676,6 +8795,22 @@ (define_insn "*call_value_symbol"
   [(set_attr "type" "call")]
 )
 
+(define_insn "*call_value_<optab><mode>_symbol"
+  [(set (match_operand:SI 0 "" "")
+	(cext:SI (call:CEXT (mem:SI (match_operand:SI 1 "" ""))
+			    (match_operand:SI 2 "" ""))))
+   (use (match_operand 3 "" ""))
+   (clobber (reg:SI LR_REGNUM))]
+  "TARGET_ARM
+   && (GET_CODE (operands[1]) == SYMBOL_REF)
+   && !arm_is_long_call_p (SYMBOL_REF_DECL (operands[1]))"
+  "*
+  {
+    return NEED_PLT_RELOC ? \"bl%?\\t%a1(PLT)\" : \"bl%?\\t%a1\";
+  }"
+  [(set_attr "type" "call")]
+)
+
 (define_insn "*call_insn"
   [(call (mem:SI (match_operand:SI 0 "" ""))
 	 (match_operand:SI 1 "" ""))
@@ -8703,6 +8838,20 @@ (define_insn "*call_value_insn"
    (set_attr "type" "call")]
 )
 
+(define_insn "*call_value_<optab><mode>_insn"
+  [(set (match_operand:SI 0 "" "")
+	(cext:SI (call:CEXT (mem:SI (match_operand 1 "" ""))
+			    (match_operand 2 "" ""))))
+   (use (match_operand 3 "" ""))
+   (clobber (reg:SI LR_REGNUM))]
+  "TARGET_THUMB
+   && GET_CODE (operands[1]) == SYMBOL_REF
+   && !arm_is_long_call_p (SYMBOL_REF_DECL (operands[1]))"
+  "bl\\t%a1"
+  [(set_attr "length" "4")
+   (set_attr "type" "call")]
+)
+
 ;; We may also be able to do sibcalls for Thumb, but it's much harder...
 (define_expand "sibcall"
   [(parallel [(call (match_operand 0 "memory_operand" "")
Index: config/arm/thumb2.md
===================================================================
--- config/arm/thumb2.md	(revision 158639)
+++ config/arm/thumb2.md	(working copy)
@@ -448,6 +448,17 @@ (define_insn "*call_value_reg_thumb2"
   [(set_attr "type" "call")]
 )
 
+(define_insn "*call_value_<optab><mode>_reg_thumb2"
+  [(set (match_operand:SI 0 "" "")
+	(cext:SI (call:CEXT (mem:SI (match_operand:SI 1 "register_operand" "l*r"))
+			    (match_operand 2 "" ""))))
+   (use (match_operand 3 "" ""))
+   (clobber (reg:SI LR_REGNUM))]
+  "TARGET_THUMB2"
+  "blx\\t%1"
+  [(set_attr "type" "call")]
+)
+
 (define_insn "*thumb2_indirect_jump"
   [(set (pc)
 	(match_operand:SI 0 "register_operand" "l*r"))]
Index: testsuite/gcc.target/arm/pr40900.c
===================================================================
--- testsuite/gcc.target/arm/pr40900.c	(revision 0)
+++ testsuite/gcc.target/arm/pr40900.c	(revision 0)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-optimize-sibling-calls" }  */
+
+extern short shortv2();
+short shortv1()
+{
+  return shortv2();
+}
+
+/* { dg-final { scan-assembler-not "lsl" } } */
+/* { dg-final { scan-assembler-not "asr" } } */
+/* { dg-final { scan-assembler-not "sxth" } } */

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Patch: PR40900, extending call patterns
  2010-04-30  1:05 Patch: PR40900, extending call patterns Bernd Schmidt
@ 2010-04-30  3:33 ` Andrew Pinski
  2010-04-30  9:23   ` Paolo Bonzini
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Pinski @ 2010-04-30  3:33 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

On Thu, Apr 29, 2010 at 5:32 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> The idea is to represent the sign-extension in the RTL call patterns,
> choosing extending variants in calls.c when we notice a suitable
> promotion.  Then we let combine do its work; this needed another little
> tweak to make sure it recorded the extension of the return register.

On PowerPC (and MIPS even with 4.3.3), this is already optimized
without a sign extend.

 (call_insn 5 4 6 3 t.c:4 (parallel [
            (set (reg:SI 3 3)
                (call (mem:SI (symbol_ref:SI ("shortv2") [flags 0x41]
<function_decl 0x7fc0bfb81a00 shortv2>) [0 S4 A8])
                    (const_int 0 [0x0])))
            (use (const_int 0 [0x0]))
            (clobber (reg:SI 65 lr))
        ]) -1 (nil)
    (nil))

(insn 6 5 7 3 t.c:4 (set (reg:SI 121)
        (reg:SI 3 3)) -1 (nil))

(insn 7 6 8 3 t.c:4 (set (reg:SI 119 [ D.2012+-2 ])
        (reg:SI 121)) -1 (nil))

(insn 8 7 9 3 t.c:4 (set (reg:SI 120 [ <retval>+-2 ])
        (reg:SI 119 [ D.2012+-2 ])) -1 (nil))

--- CUT ---

Maybe I am missing something but the arm RTL should look something
similar without a sign extension.

Thanks,
Andrew Pinski

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Patch: PR40900, extending call patterns
  2010-04-30  3:33 ` Andrew Pinski
@ 2010-04-30  9:23   ` Paolo Bonzini
  2010-04-30 15:09     ` Bernd Schmidt
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2010-04-30  9:23 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Bernd Schmidt, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1584 bytes --]

On 04/30/2010 03:13 AM, Andrew Pinski wrote:
> On Thu, Apr 29, 2010 at 5:32 PM, Bernd Schmidt<bernds@codesourcery.com>  wrote:
>> The idea is to represent the sign-extension in the RTL call patterns,
>> choosing extending variants in calls.c when we notice a suitable
>> promotion.  Then we let combine do its work; this needed another little
>> tweak to make sure it recorded the extension of the return register.
>
> On PowerPC (and MIPS even with 4.3.3), this is already optimized
> without a sign extend.
>
>   (call_insn 5 4 6 3 t.c:4 (parallel [
>              (set (reg:SI 3 3)
>                  (call (mem:SI (symbol_ref:SI ("shortv2") [flags 0x41]
> <function_decl 0x7fc0bfb81a00 shortv2>) [0 S4 A8])
>                      (const_int 0 [0x0])))
>              (use (const_int 0 [0x0]))
>              (clobber (reg:SI 65 lr))
>          ]) -1 (nil)
>      (nil))
>
> (insn 6 5 7 3 t.c:4 (set (reg:SI 121)
>          (reg:SI 3 3)) -1 (nil))
>
> (insn 7 6 8 3 t.c:4 (set (reg:SI 119 [ D.2012+-2 ])
>          (reg:SI 121)) -1 (nil))
>
> (insn 8 7 9 3 t.c:4 (set (reg:SI 120 [<retval>+-2 ])
>          (reg:SI 119 [ D.2012+-2 ])) -1 (nil))
>
> --- CUT ---
>
> Maybe I am missing something but the arm RTL should look something
> similar without a sign extension.

I suspect that's due to a different definition of promote_prototypes and 
other hooks.

However, I agree with Andrew that all the information should already be 
there (through the promote_function_mode hook), it's just that combine 
isn't trying hard enough.  Maybe something based on the attached patch?

Paolo

[-- Attachment #2: pr40900-stub.patch --]
[-- Type: text/plain, Size: 2619 bytes --]

Index: combine.c
===================================================================
--- combine.c	(revision 158698)
+++ combine.c	(working copy)
@@ -427,7 +427,7 @@ static int recog_for_combine (rtx *, rtx
 static rtx gen_lowpart_for_combine (enum machine_mode, rtx);
 static enum rtx_code simplify_comparison (enum rtx_code, rtx *, rtx *);
 static void update_table_tick (rtx);
-static void record_value_for_reg (rtx, rtx, rtx);
+static reg_stat_type *record_value_for_reg (rtx, rtx, rtx);
 static void check_promoted_subreg (rtx, rtx);
 static void record_dead_and_set_regs_1 (rtx, const_rtx, void *);
 static void record_dead_and_set_regs (rtx);
@@ -11688,7 +11688,7 @@ update_table_tick (rtx x)
    only permitted with VALUE also zero and is used to invalidate the
    register.  */
 
-static void
+static reg_stat_type *
 record_value_for_reg (rtx reg, rtx insn, rtx value)
 {
   unsigned int regno = REGNO (reg);
@@ -11798,6 +11798,8 @@ record_value_for_reg (rtx reg, rtx insn,
       rsp->last_set_sign_bit_copies
 	= num_sign_bit_copies (value, GET_MODE (reg));
     }
+
+  return rsp;
 }
 
 /* Called via note_stores from record_dead_and_set_regs to handle one
@@ -11812,10 +11814,20 @@ record_dead_and_set_regs_1 (rtx dest, co
   if (GET_CODE (dest) == SUBREG)
     dest = SUBREG_REG (dest);
 
-  if (!record_dead_insn)
+  /* We can't combine into a call pattern.  Remember that
+     the return value register is set at this LUID.  We could
+     still replace a register with the return value from the
+     wrong subroutine call!  */
+  if (CALL_P (record_dead_insn))
     {
       if (REG_P (dest))
-	record_value_for_reg (dest, NULL_RTX, NULL_RTX);
+        {
+          reg_stat_type *rsp = record_value_for_reg (dest, NULL_RTX, NULL_RTX);
+
+          /* TODO: set last_set_mode, last_set_nonzero_bits,
+           * last_set_sign_bit_copies depending on the
+           * promote_function_mode hook...  */
+        }
       return;
     }
 
@@ -11898,15 +11910,9 @@ record_dead_and_set_regs (rtx insn)
 	  }
 
       last_call_luid = mem_last_set = DF_INSN_LUID (insn);
-
-      /* We can't combine into a call pattern.  Remember, though, that
-	 the return value register is set at this LUID.  We could
-	 still replace a register with the return value from the
-	 wrong subroutine call!  */
-      note_stores (PATTERN (insn), record_dead_and_set_regs_1, NULL_RTX);
     }
-  else
-    note_stores (PATTERN (insn), record_dead_and_set_regs_1, insn);
+
+  note_stores (PATTERN (insn), record_dead_and_set_regs_1, insn);
 }
 
 /* If a SUBREG has the promoted bit set, it is in fact a property of the

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Patch: PR40900, extending call patterns
  2010-04-30  9:23   ` Paolo Bonzini
@ 2010-04-30 15:09     ` Bernd Schmidt
       [not found]       ` <201005010054.17350.ebotcazou@adacore.com>
  0 siblings, 1 reply; 15+ messages in thread
From: Bernd Schmidt @ 2010-04-30 15:09 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Andrew Pinski, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 749 bytes --]

On 04/30/2010 10:57 AM, Paolo Bonzini wrote:
> However, I agree with Andrew that all the information should already be
> there (through the promote_function_mode hook), it's just that combine
> isn't trying hard enough.  Maybe something based on the attached patch?

When finished, it looks like the one below.  Simply setting
sign_bit_copies in record_dead_and_set_regs_1 doesn't work, and rather
than duplicating the bookkeeping I construct a small piece of RTL to
pass to record_value_for_reg that makes it do the right thing.

This works without modifying backends, but it has the drawback that it
can't optimize indirect calls since we're lacking a decl for them.

Testing now in progress on i686-linux.  Ok (this or the previous one)?


Bernd

[-- Attachment #2: pr40900-b.diff --]
[-- Type: text/plain, Size: 3625 bytes --]


	PR rtl-optimization/40900
	* combine.c (record_dead_and_set_regs_1): Handle calls with a MEM
	argument by examining the decl and recording an extension if the
	return value was promoted.
	(record_dead_and_set_regs): Don't handle calls specially.

	PR rtl-optimization/40900
	* gcc.target/arm/pr40900.c: New test.

Index: combine.c
===================================================================
--- combine.c	(revision 158639)
+++ combine.c	(working copy)
@@ -11803,20 +11803,49 @@ record_dead_and_set_regs_1 (rtx dest, co
   if (GET_CODE (dest) == SUBREG)
     dest = SUBREG_REG (dest);
 
-  if (!record_dead_insn)
-    {
-      if (REG_P (dest))
-	record_value_for_reg (dest, NULL_RTX, NULL_RTX);
-      return;
-    }
-
   if (REG_P (dest))
     {
       /* If we are setting the whole register, we know its value.  Otherwise
 	 show that we don't know the value.  We can handle SUBREG in
 	 some cases.  */
       if (GET_CODE (setter) == SET && dest == SET_DEST (setter))
-	record_value_for_reg (dest, record_dead_insn, SET_SRC (setter));
+	{
+	  if (CALL_P (record_dead_insn) && GET_CODE (SET_SRC (setter)) == CALL
+	      && MEM_P (XEXP (SET_SRC (setter), 0))
+	      && GET_CODE (XEXP (XEXP (SET_SRC (setter), 0), 0)) == SYMBOL_REF)
+	    {
+	      /* Try to detect promotions of the return value.  */
+	      tree decl = SYMBOL_REF_DECL (XEXP (XEXP (SET_SRC (setter), 0), 0));
+	      if (decl)
+		{
+		  tree fntype = TREE_TYPE (decl);
+		  tree result_type = TREE_TYPE (fntype);
+		  int uns = TYPE_UNSIGNED (result_type);
+		  enum machine_mode mode_from, mode_to;
+		  reg_stat_type *rsp;
+		  rtx t;
+
+		  uns = TYPE_UNSIGNED (result_type);
+		  mode_from = TYPE_MODE (result_type);
+		  mode_to = promote_function_mode (result_type, mode_from, &uns,
+						   fntype, 1);
+
+		  if (mode_to == GET_MODE (dest)
+		      && GET_MODE_CLASS (mode_to) == MODE_INT
+		      && mode_from != mode_to)
+		    {
+		      rtx t = gen_rtx_fmt_e (uns ? ZERO_EXTEND : SIGN_EXTEND,
+					     mode_to,
+					     gen_rtx_SCRATCH (mode_from));
+		      record_value_for_reg (dest, record_dead_insn, t);
+		      return;
+		    }
+		}
+	      record_value_for_reg (dest, NULL_RTX, NULL_RTX);
+	    }
+	  else
+	    record_value_for_reg (dest, record_dead_insn, SET_SRC (setter));
+	}
       else if (GET_CODE (setter) == SET
 	       && GET_CODE (SET_DEST (setter)) == SUBREG
 	       && SUBREG_REG (SET_DEST (setter)) == dest
@@ -11889,15 +11918,9 @@ record_dead_and_set_regs (rtx insn)
 	  }
 
       last_call_luid = mem_last_set = DF_INSN_LUID (insn);
-
-      /* We can't combine into a call pattern.  Remember, though, that
-	 the return value register is set at this LUID.  We could
-	 still replace a register with the return value from the
-	 wrong subroutine call!  */
-      note_stores (PATTERN (insn), record_dead_and_set_regs_1, NULL_RTX);
     }
-  else
-    note_stores (PATTERN (insn), record_dead_and_set_regs_1, insn);
+
+  note_stores (PATTERN (insn), record_dead_and_set_regs_1, insn);
 }
 
 /* If a SUBREG has the promoted bit set, it is in fact a property of the
Index: testsuite/gcc.target/arm/pr40900.c
===================================================================
--- testsuite/gcc.target/arm/pr40900.c	(revision 0)
+++ testsuite/gcc.target/arm/pr40900.c	(revision 0)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-optimize-sibling-calls" }  */
+
+extern short shortv2();
+short shortv1()
+{
+  return shortv2();
+}
+
+/* { dg-final { scan-assembler-not "lsl" } } */
+/* { dg-final { scan-assembler-not "asr" } } */
+/* { dg-final { scan-assembler-not "sxth" } } */

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Patch: PR40900, extending call patterns
       [not found]       ` <201005010054.17350.ebotcazou@adacore.com>
@ 2010-05-01 10:45         ` Paolo Bonzini
  2010-05-03  9:39         ` Bernd Schmidt
  1 sibling, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2010-05-01 10:45 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: Bernd Schmidt, gcc-patches, Andrew Pinski

On Sat, May 1, 2010 at 00:54, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> When finished, it looks like the one below.  Simply setting
>> sign_bit_copies in record_dead_and_set_regs_1 doesn't work, and rather
>> than duplicating the bookkeeping I construct a small piece of RTL to
>> pass to record_value_for_reg that makes it do the right thing.
>>
>> This works without modifying backends, but it has the drawback that it
>> can't optimize indirect calls since we're lacking a decl for them.

REG_EXPR?

> And going back to the Tree level in the RTL combiner is ugly.  Why can PowerPC
> and MIPS already optimize this and not ARM?  That's worth investigating.

True, however promote_function_mode is already being used by combine
for similar reasons (see setup_incoming_promotions), which is why I
suggested it.

Paolo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Patch: PR40900, extending call patterns
       [not found]       ` <201005010054.17350.ebotcazou@adacore.com>
  2010-05-01 10:45         ` Paolo Bonzini
@ 2010-05-03  9:39         ` Bernd Schmidt
  2010-05-05 13:01           ` Eric Botcazou
  1 sibling, 1 reply; 15+ messages in thread
From: Bernd Schmidt @ 2010-05-03  9:39 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches, Paolo Bonzini, Andrew Pinski

On 05/01/2010 12:54 AM, Eric Botcazou wrote:
>> When finished, it looks like the one below.  Simply setting
>> sign_bit_copies in record_dead_and_set_regs_1 doesn't work, and rather
>> than duplicating the bookkeeping I construct a small piece of RTL to
>> pass to record_value_for_reg that makes it do the right thing.
>>
>> This works without modifying backends, but it has the drawback that it
>> can't optimize indirect calls since we're lacking a decl for them.
> 
> And going back to the Tree level in the RTL combiner is ugly.

As Paolo mentioned, it's done elsewhere in combine already.  If we put a
decl in a SYMBOL_REF, we might as well use it.

>  Why can PowerPC 
> and MIPS already optimize this and not ARM?  That's worth investigating.

Seems to be because it's a signed operation, and arm has

#define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE)     \
  if (GET_MODE_CLASS (MODE) == MODE_INT         \
      && GET_MODE_SIZE (MODE) < 4)              \
    {                                           \
      if (MODE == QImode)                       \
        UNSIGNEDP = 1;                          \
      else if (MODE == HImode)                  \
        UNSIGNEDP = 1;                          \
      (MODE) = SImode;                          \
    }

while ppc doesn't modify UNSIGNEDP.


Bernd

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Patch: PR40900, extending call patterns
  2010-05-03  9:39         ` Bernd Schmidt
@ 2010-05-05 13:01           ` Eric Botcazou
  2010-05-06  7:14             ` Paolo Bonzini
  2010-06-08 21:56             ` Bernd Schmidt
  0 siblings, 2 replies; 15+ messages in thread
From: Eric Botcazou @ 2010-05-05 13:01 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches, Paolo Bonzini, Andrew Pinski

> Seems to be because it's a signed operation, and arm has
>
> #define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE)     \
>   if (GET_MODE_CLASS (MODE) == MODE_INT         \
>       && GET_MODE_SIZE (MODE) < 4)              \
>     {                                           \
>       if (MODE == QImode)                       \
>         UNSIGNEDP = 1;                          \
>       else if (MODE == HImode)                  \
>         UNSIGNEDP = 1;                          \
>       (MODE) = SImode;                          \
>     }
>
> while ppc doesn't modify UNSIGNEDP.

Can't we be clever during RTL expansion and avoid blindly zero-extending the 
value when we known that

  D.2014_1 = shortv2 ();

and promote_function_mode sign-extends?  The kind of extension for a specific 
variable can be changed since the SUBREG_PROMOTED_* machinery records it.  In 
other words, can't we just override promote_decl_mode in the SSA_NAME case of 
expand_expr_real_1?

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Patch: PR40900, extending call patterns
  2010-05-05 13:01           ` Eric Botcazou
@ 2010-05-06  7:14             ` Paolo Bonzini
  2010-05-06  7:36               ` Eric Botcazou
  2010-06-08 21:56             ` Bernd Schmidt
  1 sibling, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2010-05-06  7:14 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: Bernd Schmidt, gcc-patches, Andrew Pinski

On Wed, May 5, 2010 at 14:58, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> Seems to be because it's a signed operation, and arm has
>>
>> #define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE)     \
>>   if (GET_MODE_CLASS (MODE) == MODE_INT         \
>>       && GET_MODE_SIZE (MODE) < 4)              \
>>     {                                           \
>>       if (MODE == QImode)                       \
>>         UNSIGNEDP = 1;                          \
>>       else if (MODE == HImode)                  \
>>         UNSIGNEDP = 1;                          \
>>       (MODE) = SImode;                          \
>>     }
>>
>> while ppc doesn't modify UNSIGNEDP.
>
> Can't we be clever during RTL expansion and avoid blindly zero-extending the
> value when we known that
>
>  D.2014_1 = shortv2 ();
>
> and promote_function_mode sign-extends?

Instead I wonder if ppc optimizes something like

unsigned short short2();

unsigned int int1()
{
  return short2() & 0xFFFF;
}

This would be caught by Bernd's patch too (presumably).

Paolo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Patch: PR40900, extending call patterns
  2010-05-06  7:14             ` Paolo Bonzini
@ 2010-05-06  7:36               ` Eric Botcazou
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Botcazou @ 2010-05-06  7:36 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Bernd Schmidt, gcc-patches, Andrew Pinski

> Instead I wonder if ppc optimizes something like
>
> unsigned short short2();
>
> unsigned int int1()
> {
>   return short2() & 0xFFFF;
> }
>
> This would be caught by Bernd's patch too (presumably).

Let's try to generate sane RTL from the start if we can, this will make the 
compiler faster and save some memory.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Patch: PR40900, extending call patterns
  2010-05-05 13:01           ` Eric Botcazou
  2010-05-06  7:14             ` Paolo Bonzini
@ 2010-06-08 21:56             ` Bernd Schmidt
  2010-06-09 22:14               ` Eric Botcazou
  1 sibling, 1 reply; 15+ messages in thread
From: Bernd Schmidt @ 2010-06-08 21:56 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches, Paolo Bonzini, Andrew Pinski

On 05/05/2010 02:58 PM, Eric Botcazou wrote:
>> Seems to be because it's a signed operation, and arm has
>>
>> #define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE)     \
>>   if (GET_MODE_CLASS (MODE) == MODE_INT         \
>>       && GET_MODE_SIZE (MODE) < 4)              \
>>     {                                           \
>>       if (MODE == QImode)                       \
>>         UNSIGNEDP = 1;                          \
>>       else if (MODE == HImode)                  \
>>         UNSIGNEDP = 1;                          \
>>       (MODE) = SImode;                          \
>>     }
>>
>> while ppc doesn't modify UNSIGNEDP.
> 
> Can't we be clever during RTL expansion and avoid blindly zero-extending the 
> value when we known that
> 
>   D.2014_1 = shortv2 ();
> 
> and promote_function_mode sign-extends?  The kind of extension for a specific 
> variable can be changed since the SUBREG_PROMOTED_* machinery records it.  In 
> other words, can't we just override promote_decl_mode in the SSA_NAME case of 
> expand_expr_real_1?

I've looked into this now, and I don't really see a way to do better at
rtl expansion time.

In current sources, the function ends up as
  D.2006_1 = shortv2 (); [tail call]
  return D.2006_1;

PROMOTE_MODE ensures that D.2006_1 becomes
  #0  store_expr (exp=0xf7cbb5e8, target=0xf7d2c5ac, call_param_p=0,
      nontemporal=0 '\000') at ../../trunk/gcc/expr.c:4582
  4582	      rtx inner_target = 0;
  (gdb) p target(gdb) p debug_rtx (target)
  (subreg/s/u:HI (reg:SI 133 [ D.2006 ]) 0)

We create zero-extensions around the CALL_EXPR in store_expr and call
expand_expr.  At some point, in expand_call, we create a signed target:
  (subreg/s:HI (reg:SI 136) 0)
which is returned from expand_call.  This value then gets zero extended
due to the NOP_EXPRs around the call which were previously created in
store_expr.  The zero extension makes it into initial RTL but is quickly
deleted as useless.

The value returned by the call to expand_expr in store_expr is simply
(reg:SI 133), with no hint that there was a sign-extended promoted
subreg available anywhere.

Later, during expand_return, we create the superfluous sign extension.
This seems unavoidable since
a) what we return is D.2006_1, i.e. a subreg of reg 133, so we no
   longer see the subreg created by expand_call, and
b) even if we did see it, it had its SUBREG_PROMOTED_VAR_P flag deleted
   anyway by this code in expand_expr_real_2:

	  /* If the signedness of the conversion differs and OP0 is
	     a promoted SUBREG, clear that indication since we now
	     have to do the proper extension.  */
	  if (TYPE_UNSIGNED (TREE_TYPE (treeop0)) != unsignedp
	      && GET_CODE (op0) == SUBREG)
	    SUBREG_PROMOTED_VAR_P (op0) = 0;

I see no obvious place in this sequence of events where we could have
done anything better, and I'm doubtful that arbitrarily ignoring the
choice made by PROMOTE_MODE at any point would be a good idea.  In light
of this, is my previous combiner patch OK?


Bernd

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Patch: PR40900, extending call patterns
  2010-06-08 21:56             ` Bernd Schmidt
@ 2010-06-09 22:14               ` Eric Botcazou
  2010-06-11 13:58                 ` Bernd Schmidt
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Botcazou @ 2010-06-09 22:14 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches, Paolo Bonzini, Andrew Pinski

> I've looked into this now, and I don't really see a way to do better at
> rtl expansion time.

IIRC I did manage to get your testcase optimized with only a few changed lines 
in expr.c.  I can try again, but not before the end of the week.

> In current sources, the function ends up as
>   D.2006_1 = shortv2 (); [tail call]
>   return D.2006_1;
>
> PROMOTE_MODE ensures that D.2006_1 becomes
>   #0  store_expr (exp=0xf7cbb5e8, target=0xf7d2c5ac, call_param_p=0,
>       nontemporal=0 '\000') at ../../trunk/gcc/expr.c:4582
>   4582	      rtx inner_target = 0;
>   (gdb) p target(gdb) p debug_rtx (target)
>   (subreg/s/u:HI (reg:SI 133 [ D.2006 ]) 0)

The root of the problem, this unsigned extension must never be generated or 
the game is indeed over.  Instead of a signed one should, by switching to 
PROMOTE_FUNCTION_MODE in this particular case (lhs of a call).

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Patch: PR40900, extending call patterns
  2010-06-09 22:14               ` Eric Botcazou
@ 2010-06-11 13:58                 ` Bernd Schmidt
  2010-06-16 21:23                   ` Eric Botcazou
  0 siblings, 1 reply; 15+ messages in thread
From: Bernd Schmidt @ 2010-06-11 13:58 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches, Paolo Bonzini, Andrew Pinski

On 06/09/2010 11:57 PM, Eric Botcazou wrote:
> IIRC I did manage to get your testcase optimized with only a few changed lines 
> in expr.c.  I can try again, but not before the end of the week.
> 
That would be good.


Bernd

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Patch: PR40900, extending call patterns
  2010-06-11 13:58                 ` Bernd Schmidt
@ 2010-06-16 21:23                   ` Eric Botcazou
  2010-06-18  7:31                     ` Bernd Schmidt
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Botcazou @ 2010-06-16 21:23 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches, Paolo Bonzini, Andrew Pinski

[-- Attachment #1: Type: text/plain, Size: 312 bytes --]

> That would be good.

Very lightly tested.


	* expr.c (expand_expr_real_1) <SSA_NAME>: Fix long line.  Save the
	original expression for later reuse.
	<expand_decl_rtl>: Use promote_function_mode to compute the signedness
	of the promoted RTL for a SSA_NAME on the LHS of a call statement.


-- 
Eric Botcazou

[-- Attachment #2: pr40900.diff --]
[-- Type: text/x-diff, Size: 2555 bytes --]

Index: expr.c
===================================================================
--- expr.c	(revision 160776)
+++ expr.c	(working copy)
@@ -8284,6 +8284,8 @@ expand_expr_real_1 (tree exp, rtx target
   location_t loc = EXPR_LOCATION (exp);
   struct separate_ops ops;
   tree treeop0, treeop1, treeop2;
+  tree ssa_name = NULL_TREE;
+  gimple g;
 
   type = TREE_TYPE (exp);
   mode = TYPE_MODE (type);
@@ -8396,15 +8398,17 @@ expand_expr_real_1 (tree exp, rtx target
 	 base variable.  This unnecessarily allocates a pseudo, see how we can
 	 reuse it, if partition base vars have it set already.  */
       if (!currently_expanding_to_rtl)
-	return expand_expr_real_1 (SSA_NAME_VAR (exp), target, tmode, modifier, NULL);
-      {
-	gimple g = get_gimple_for_ssa_name (exp);
-	if (g)
-	  return expand_expr_real (gimple_assign_rhs_to_tree (g), target,
-				   tmode, modifier, NULL);
-      }
-      decl_rtl = get_rtx_for_ssa_name (exp);
-      exp = SSA_NAME_VAR (exp);
+	return expand_expr_real_1 (SSA_NAME_VAR (exp), target, tmode, modifier,
+				   NULL);
+
+      g = get_gimple_for_ssa_name (exp);
+      if (g)
+	return expand_expr_real (gimple_assign_rhs_to_tree (g), target, tmode,
+				 modifier, NULL);
+
+      ssa_name = exp;
+      decl_rtl = get_rtx_for_ssa_name (ssa_name);
+      exp = SSA_NAME_VAR (ssa_name);
       goto expand_decl_rtl;
 
     case PARM_DECL:
@@ -8506,15 +8510,21 @@ expand_expr_real_1 (tree exp, rtx target
       /* If the mode of DECL_RTL does not match that of the decl, it
 	 must be a promoted value.  We return a SUBREG of the wanted mode,
 	 but mark it so that we know that it was already extended.  */
-
-      if (REG_P (decl_rtl)
-	  && GET_MODE (decl_rtl) != DECL_MODE (exp))
+      if (REG_P (decl_rtl) && GET_MODE (decl_rtl) != DECL_MODE (exp))
 	{
 	  enum machine_mode pmode;
 
-	  /* Get the signedness used for this variable.  Ensure we get the
-	     same mode we got when the variable was declared.  */
-	  pmode = promote_decl_mode (exp, &unsignedp);
+	  /* Get the signedness to be used for this variable.  Ensure we get
+	     the same mode we got when the variable was declared.  */
+	  if (code == SSA_NAME
+	      && (g = SSA_NAME_DEF_STMT (ssa_name))
+	      && gimple_code (g) == GIMPLE_CALL)
+	    pmode = promote_function_mode (type, mode, &unsignedp,
+					   TREE_TYPE
+					   (TREE_TYPE (gimple_call_fn (g))),
+					   2);
+	  else
+	    pmode = promote_decl_mode (exp, &unsignedp);
 	  gcc_assert (GET_MODE (decl_rtl) == pmode);
 
 	  temp = gen_lowpart_SUBREG (mode, decl_rtl);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Patch: PR40900, extending call patterns
  2010-06-16 21:23                   ` Eric Botcazou
@ 2010-06-18  7:31                     ` Bernd Schmidt
  2010-06-18 17:09                       ` Eric Botcazou
  0 siblings, 1 reply; 15+ messages in thread
From: Bernd Schmidt @ 2010-06-18  7:31 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches, Paolo Bonzini, Andrew Pinski

On 06/16/2010 10:34 PM, Eric Botcazou wrote:
> Very lightly tested.
> 
> 
> 	* expr.c (expand_expr_real_1) <SSA_NAME>: Fix long line.  Save the
> 	original expression for later reuse.
> 	<expand_decl_rtl>: Use promote_function_mode to compute the signedness
> 	of the promoted RTL for a SSA_NAME on the LHS of a call statement.

This survived my normal arm-none-linux-gnueabi testing.  Ok, but please
also check in the testcase I included with my patch.


Bernd

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Patch: PR40900, extending call patterns
  2010-06-18  7:31                     ` Bernd Schmidt
@ 2010-06-18 17:09                       ` Eric Botcazou
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Botcazou @ 2010-06-18 17:09 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches, Paolo Bonzini, Andrew Pinski

> This survived my normal arm-none-linux-gnueabi testing.  Ok, but please
> also check in the testcase I included with my patch.

Thanks, bootstrapped/regtested on x86_64-suse-linux, applied on the mainline.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2010-06-18 16:39 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-30  1:05 Patch: PR40900, extending call patterns Bernd Schmidt
2010-04-30  3:33 ` Andrew Pinski
2010-04-30  9:23   ` Paolo Bonzini
2010-04-30 15:09     ` Bernd Schmidt
     [not found]       ` <201005010054.17350.ebotcazou@adacore.com>
2010-05-01 10:45         ` Paolo Bonzini
2010-05-03  9:39         ` Bernd Schmidt
2010-05-05 13:01           ` Eric Botcazou
2010-05-06  7:14             ` Paolo Bonzini
2010-05-06  7:36               ` Eric Botcazou
2010-06-08 21:56             ` Bernd Schmidt
2010-06-09 22:14               ` Eric Botcazou
2010-06-11 13:58                 ` Bernd Schmidt
2010-06-16 21:23                   ` Eric Botcazou
2010-06-18  7:31                     ` Bernd Schmidt
2010-06-18 17:09                       ` Eric Botcazou

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).