public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH], PR 50341, Fix TOC load scheduling for powerpc on GCC 4.6 and 4.7
@ 2011-09-15 18:21 Michael Meissner
  2011-09-15 18:35 ` [PATCH], PR 50341, Fix TOC load scheduling for powerpc on GCC 4.6 and 4.7 (patches included) Michael Meissner
  2011-09-15 19:01 ` [PATCH], PR 50341, Fix TOC load scheduling for powerpc on GCC 4.6 and 4.7 Andrew Pinski
  0 siblings, 2 replies; 6+ messages in thread
From: Michael Meissner @ 2011-09-15 18:21 UTC (permalink / raw)
  To: gcc-patches, dje.gcc

The recent changes for the new code models tickled a latent bug in the GCC 4.6
and 4.7 compilers with the AIX calling sequence (PR 50341).  When you call a
function indirectly, the compiler has to save the current function's TOC
pointer that is in r2 on the stack frame, and then load the new function's TOC
pointer before doing the call.  If the indirect function that you are calling
has the same TOC value as the current function, the code will run, but if the
indirection function has a different TOC (for example it is in the main
program, and the indirect call is in a shared library), the wrong address will
be used.

The existing code did this by doing a split after reload has finished.
Scheduling after reload then moves the load of the new TOC.  These patches for
GCC 4.6 and GCC 4.7 disable the splits so that the load of the new TOC is
always next to the bctrl instruction, preventing the compiler from moving the
load.  Ideally, we can eventually tackle the underlying problem that we did not
have the right dependencies to prevent the scheduler from moving the new TOC
load before the use.

[gcc-4.6]
2011-09-15  Alan Modra  <amodra@gmail.com>
	    Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/50341
	* config/rs6000/rs6000.md (call_indirect_aix32): Do not split the
	load of the indirect function's TOC from the call to prevent the
	compiler from moving the load of the new TOC above code that
	references the current function's TOC.
	(call_indirect_aix64): Ditto.
	(call_value_indirect_aix32): Ditto.
	(call_value_indirect_aix64): Ditto.
	(call_indirect_nonlocal_aix32_internal): Ditto.
	(call_indirect_nonlocal_aix32): Ditto.
	(call_indirect_nonlocal_aix64_internal): Ditto.
	(call_indirect_nonlocal_aix64): Ditto.
	(call_value_indirect_nonlocal_aix32_internal): Ditto.
	(call_value_indirect_nonlocal_aix32): Ditto.
	(call_value_indirect_nonlocal_aix64_internal): Ditto.
	(call_value_indirect_nonlocal_aix64): Ditto.


[gcc-4.7]
2011-09-15  Alan Modra  <amodra@gmail.com>
	    Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/50341
	* config/rs6000/rs6000.md (call_indirect_aix<ptrsize>): Do not
	split the load of the indirect function's TOC from the call to
	prevent the compiler from moving the load of the new TOC above
	code that references the current function's TOC.
	(call_indirect_aix<ptrsize>_internal): Ditto.
	(call_indirect_aix<ptrsize>_nor11): Ditto.
	(call_indirect_aix<ptrsize>_internal2): Ditto.
	(call_value_indirect_aix<ptrsize>): Ditto.
	(call_value_indirect_aix<ptrsize>_internal): Ditto.
	(call_value_indirect_aix<ptrsize>_nor11): Ditto.
	(call_value_indirect_aix<ptrsize>_internal2): Ditto.

I have bootstraped both compilers for C, C++, and Fortran.  There were no
differences in the make check output.

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meissner@linux.vnet.ibm.com	fax +1 (978) 399-6899

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH], PR 50341, Fix TOC load scheduling for powerpc on GCC 4.6 and 4.7 (patches included)
  2011-09-15 18:21 [PATCH], PR 50341, Fix TOC load scheduling for powerpc on GCC 4.6 and 4.7 Michael Meissner
@ 2011-09-15 18:35 ` Michael Meissner
  2011-09-16  7:05   ` David Edelsohn
  2011-09-15 19:01 ` [PATCH], PR 50341, Fix TOC load scheduling for powerpc on GCC 4.6 and 4.7 Andrew Pinski
  1 sibling, 1 reply; 6+ messages in thread
From: Michael Meissner @ 2011-09-15 18:35 UTC (permalink / raw)
  To: gcc-patches, dje.gcc

[-- Attachment #1: Type: text/plain, Size: 1736 bytes --]

It would help if I included the patches:

[gcc-4.6]
2011-09-15  Alan Modra  <amodra@gmail.com>
	    Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/50341
	* config/rs6000/rs6000.md (call_indirect_aix32): Do not split the
	load of the indirect function's TOC from the call to prevent the
	compiler from moving the load of the new TOC above code that
	references the current function's TOC.
	(call_indirect_aix64): Ditto.
	(call_value_indirect_aix32): Ditto.
	(call_value_indirect_aix64): Ditto.
	(call_indirect_nonlocal_aix32_internal): Ditto.
	(call_indirect_nonlocal_aix32): Ditto.
	(call_indirect_nonlocal_aix64_internal): Ditto.
	(call_indirect_nonlocal_aix64): Ditto.
	(call_value_indirect_nonlocal_aix32_internal): Ditto.
	(call_value_indirect_nonlocal_aix32): Ditto.
	(call_value_indirect_nonlocal_aix64_internal): Ditto.
	(call_value_indirect_nonlocal_aix64): Ditto.

[gcc-4.7]
2011-09-15  Alan Modra  <amodra@gmail.com>
	    Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/50341
	* config/rs6000/rs6000.md (call_indirect_aix<ptrsize>): Do not
	split the load of the indirect function's TOC from the call to
	prevent the compiler from moving the load of the new TOC above
	code that references the current function's TOC.
	(call_indirect_aix<ptrsize>_internal): Ditto.
	(call_indirect_aix<ptrsize>_nor11): Ditto.
	(call_indirect_aix<ptrsize>_internal2): Ditto.
	(call_value_indirect_aix<ptrsize>): Ditto.
	(call_value_indirect_aix<ptrsize>_internal): Ditto.
	(call_value_indirect_aix<ptrsize>_nor11): Ditto.
	(call_value_indirect_aix<ptrsize>_internal2): Ditto.


-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meissner@linux.vnet.ibm.com	fax +1 (978) 399-6899

[-- Attachment #2: bug-50341-gcc46.patch02b --]
[-- Type: text/plain, Size: 12632 bytes --]

Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 178888)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -12165,43 +12165,67 @@ (define_insn "largetoc_low"
 (define_expand "call_indirect_aix32"
   [(set (match_dup 2)
 	(mem:SI (match_operand:SI 0 "gpc_reg_operand" "")))
-   (set (mem:SI (plus:SI (reg:SI 1) (const_int 20)))
+   (set (match_dup 3)
 	(reg:SI 2))
    (set (reg:SI 11)
 	(mem:SI (plus:SI (match_dup 0)
 			 (const_int 8))))
    (parallel [(call (mem:SI (match_dup 2))
 		    (match_operand 1 "" ""))
-	      (use (mem:SI (plus:SI (match_dup 0) (const_int 4))))
+	      (use (match_dup 4))
+	      (set (reg:SI 2) (match_dup 3))
 	      (use (reg:SI 11))
-	      (use (mem:SI (plus:SI (reg:SI 1) (const_int 20))))
 	      (clobber (reg:SI LR_REGNO))])]
   "TARGET_32BIT"
   "
-{ operands[2] = gen_reg_rtx (SImode); }")
+{
+  operands[2] = gen_reg_rtx (SImode);
+  operands[3] = gen_rtx_MEM (SImode,
+			     gen_rtx_PLUS (SImode, stack_pointer_rtx,
+					   GEN_INT (20)));
+
+  operands[4] = gen_rtx_MEM (SImode,
+			     gen_rtx_PLUS (SImode, operands[0],
+					   GEN_INT (4)));
+
+  /* Make sure the compiler does not optimize away the store of the TOC.  */
+  MEM_VOLATILE_P (operands[3]) = 1;
+}")
 
 (define_expand "call_indirect_aix64"
   [(set (match_dup 2)
 	(mem:DI (match_operand:DI 0 "gpc_reg_operand" "")))
-   (set (mem:DI (plus:DI (reg:DI 1) (const_int 40)))
+   (set (match_dup 3)
 	(reg:DI 2))
    (set (reg:DI 11)
 	(mem:DI (plus:DI (match_dup 0)
 			 (const_int 16))))
    (parallel [(call (mem:SI (match_dup 2))
 		    (match_operand 1 "" ""))
-	      (use (mem:DI (plus:DI (match_dup 0) (const_int 8))))
+	      (use (match_dup 4))
+	      (set (reg:DI 2) (match_dup 3))
 	      (use (reg:DI 11))
-	      (use (mem:DI (plus:DI (reg:DI 1) (const_int 40))))
-	      (clobber (reg:SI LR_REGNO))])]
+	      (clobber (reg:DI LR_REGNO))])]
   "TARGET_64BIT"
   "
-{ operands[2] = gen_reg_rtx (DImode); }")
+{
+  operands[2] = gen_reg_rtx (DImode);
+  operands[3] = gen_rtx_MEM (DImode,
+			     gen_rtx_PLUS (DImode, stack_pointer_rtx,
+					   GEN_INT (40)));
+
+  operands[4] = gen_rtx_MEM (DImode,
+			     gen_rtx_PLUS (DImode, operands[0],
+					   GEN_INT (8)));
+
+  /* Make sure the compiler does not optimize away the store of the TOC.  */
+  MEM_VOLATILE_P (operands[3]) = 1;
+}")
 
 (define_expand "call_value_indirect_aix32"
   [(set (match_dup 3)
 	(mem:SI (match_operand:SI 1 "gpc_reg_operand" "")))
-   (set (mem:SI (plus:SI (reg:SI 1) (const_int 20)))
+   (set (match_dup 4)
 	(reg:SI 2))
    (set (reg:SI 11)
 	(mem:SI (plus:SI (match_dup 1)
@@ -12209,18 +12233,30 @@ (define_expand "call_value_indirect_aix3
    (parallel [(set (match_operand 0 "" "")
 		   (call (mem:SI (match_dup 3))
 			 (match_operand 2 "" "")))
-	      (use (mem:SI (plus:SI (match_dup 1) (const_int 4))))
+	      (use (match_dup 5))
+	      (set (reg:SI 2) (match_dup 4))
 	      (use (reg:SI 11))
-	      (use (mem:SI (plus:SI (reg:SI 1) (const_int 20))))
 	      (clobber (reg:SI LR_REGNO))])]
   "TARGET_32BIT"
   "
-{ operands[3] = gen_reg_rtx (SImode); }")
+{
+  operands[3] = gen_reg_rtx (SImode);
+  operands[4] = gen_rtx_MEM (DImode,
+			     gen_rtx_PLUS (DImode, stack_pointer_rtx,
+					   GEN_INT (20)));
+
+  operands[5] = gen_rtx_MEM (SImode,
+			     gen_rtx_PLUS (SImode, operands[1],
+					   GEN_INT (4)));
+
+  /* Make sure the compiler does not optimize away the store of the TOC.  */
+  MEM_VOLATILE_P (operands[4]) = 1;
+}")
 
 (define_expand "call_value_indirect_aix64"
   [(set (match_dup 3)
 	(mem:DI (match_operand:DI 1 "gpc_reg_operand" "")))
-   (set (mem:DI (plus:DI (reg:DI 1) (const_int 40)))
+   (set (match_dup 4)
 	(reg:DI 2))
    (set (reg:DI 11)
 	(mem:DI (plus:DI (match_dup 1)
@@ -12228,13 +12264,25 @@ (define_expand "call_value_indirect_aix6
    (parallel [(set (match_operand 0 "" "")
 		   (call (mem:SI (match_dup 3))
 			 (match_operand 2 "" "")))
-	      (use (mem:DI (plus:DI (match_dup 1) (const_int 8))))
+	      (use (match_dup 5))
+	      (set (reg:DI 2) (match_dup 4))
 	      (use (reg:DI 11))
-	      (use (mem:DI (plus:DI (reg:DI 1) (const_int 40))))
-	      (clobber (reg:SI LR_REGNO))])]
+	      (clobber (reg:DI LR_REGNO))])]
   "TARGET_64BIT"
   "
-{ operands[3] = gen_reg_rtx (DImode); }")
+{
+  operands[3] = gen_reg_rtx (DImode);
+  operands[4] = gen_rtx_MEM (DImode,
+			     gen_rtx_PLUS (DImode, stack_pointer_rtx,
+					   GEN_INT (40)));
+
+  operands[5] = gen_rtx_MEM (DImode,
+			     gen_rtx_PLUS (DImode, operands[1],
+					   GEN_INT (8)));
+
+  /* Make sure the compiler does not optimize away the store of the TOC.  */
+  MEM_VOLATILE_P (operands[4]) = 1;
+}")
 
 ;; Now the definitions for the call and call_value insns
 (define_expand "call"
@@ -12427,47 +12475,27 @@ (define_insn "*call_value_local64"
 
 ;; Call to function which may be in another module.  Restore the TOC
 ;; pointer (r2) after the call unless this is System V.
-;; Operand2 is nonzero if we are using the V.4 calling sequence and
+;; Operand1 is nonzero if we are using the V.4 calling sequence and
 ;; either the function was not prototyped, or it was prototyped as a
 ;; variable argument function.  It is > 0 if FP registers were passed
 ;; and < 0 if they were not.
+;; Operand2 is the address of the 3 word function pointer that offset 4 points
+;; to the value to be loaded in the TOC register.  Do not split the load from
+;; the call, as it may move the load of the TOC before any addresses using
+;; the TOC.
 
-(define_insn_and_split "*call_indirect_nonlocal_aix32_internal"
+(define_insn "*call_indirect_nonlocal_aix32"
   [(call (mem:SI (match_operand:SI 0 "register_operand" "c,*l"))
 		 (match_operand 1 "" "g,g"))
-   (use (mem:SI (plus:SI (match_operand:SI 2 "register_operand" "b,b") (const_int 4))))
+   (use (match_operand:SI 2 "memory_operand" "m,m"))
+   (set (reg:SI 2) (match_operand:SI 3 "memory_operand" "m,m"))
    (use (reg:SI 11))
-   (use (mem:SI (plus:SI (reg:SI 1) (const_int 20))))
    (clobber (reg:SI LR_REGNO))]
   "TARGET_32BIT && DEFAULT_ABI == ABI_AIX"
-  "#"
-  "&& reload_completed"
-  [(set (reg:SI 2)
-	(mem:SI (plus:SI (match_dup 2) (const_int 4))))
-   (parallel [(call (mem:SI (match_dup 0))
-		    (match_dup 1))
-	      (use (reg:SI 2))
-	      (use (reg:SI 11))
-	      (set (reg:SI 2)
-		   (mem:SI (plus:SI (reg:SI 1) (const_int 20))))
-	      (clobber (reg:SI LR_REGNO))])]
-  ""
+  "{l|lwz} 2,%2\;b%T0l\;{l|lwz} 2,%3"
   [(set_attr "type" "jmpreg")
    (set_attr "length" "12")])
 
-(define_insn "*call_indirect_nonlocal_aix32"
-  [(call (mem:SI (match_operand:SI 0 "register_operand" "c,*l"))
-	 (match_operand 1 "" "g,g"))
-   (use (reg:SI 2))
-   (use (reg:SI 11))
-   (set (reg:SI 2)
-	(mem:SI (plus:SI (reg:SI 1) (const_int 20))))
-   (clobber (reg:SI LR_REGNO))]
-  "TARGET_32BIT && DEFAULT_ABI == ABI_AIX && reload_completed"
-  "b%T0l\;{l|lwz} 2,20(1)"
-  [(set_attr "type" "jmpreg")
-   (set_attr "length" "8")])
-
 (define_insn "*call_nonlocal_aix32"
   [(call (mem:SI (match_operand:SI 0 "symbol_ref_operand" "s"))
 	 (match_operand 1 "" "g"))
@@ -12480,43 +12508,18 @@ (define_insn "*call_nonlocal_aix32"
   [(set_attr "type" "branch")
    (set_attr "length" "8")])
    
-(define_insn_and_split "*call_indirect_nonlocal_aix64_internal"
+(define_insn "*call_indirect_nonlocal_aix64"
   [(call (mem:SI (match_operand:DI 0 "register_operand" "c,*l"))
 		 (match_operand 1 "" "g,g"))
-   (use (mem:DI (plus:DI (match_operand:DI 2 "register_operand" "b,b")
-			 (const_int 8))))
+   (use (match_operand:DI 2 "memory_operand" "m,m"))
+   (set (reg:DI 2) (match_operand:DI 3 "memory_operand" "m,m"))
    (use (reg:DI 11))
-   (use (mem:DI (plus:DI (reg:DI 1) (const_int 40))))
-   (clobber (reg:SI LR_REGNO))]
+   (clobber (reg:DI LR_REGNO))]
   "TARGET_64BIT && DEFAULT_ABI == ABI_AIX"
-  "#"
-  "&& reload_completed"
-  [(set (reg:DI 2)
-	(mem:DI (plus:DI (match_dup 2) (const_int 8))))
-   (parallel [(call (mem:SI (match_dup 0))
-		    (match_dup 1))
-	      (use (reg:DI 2))
-	      (use (reg:DI 11))
-	      (set (reg:DI 2)
-		   (mem:DI (plus:DI (reg:DI 1) (const_int 40))))
-	      (clobber (reg:SI LR_REGNO))])]
-  ""
+  "ld  2,%2\;b%T0l\;ld 2,%3"
   [(set_attr "type" "jmpreg")
    (set_attr "length" "12")])
 
-(define_insn "*call_indirect_nonlocal_aix64"
-  [(call (mem:SI (match_operand:DI 0 "register_operand" "c,*l"))
-	 (match_operand 1 "" "g,g"))
-   (use (reg:DI 2))
-   (use (reg:DI 11))
-   (set (reg:DI 2)
-	(mem:DI (plus:DI (reg:DI 1) (const_int 40))))
-   (clobber (reg:SI LR_REGNO))]
-  "TARGET_64BIT && DEFAULT_ABI == ABI_AIX && reload_completed"
-  "b%T0l\;ld 2,40(1)"
-  [(set_attr "type" "jmpreg")
-   (set_attr "length" "8")])
-
 (define_insn "*call_nonlocal_aix64"
   [(call (mem:SI (match_operand:DI 0 "symbol_ref_operand" "s"))
 	 (match_operand 1 "" "g"))
@@ -12529,44 +12532,18 @@ (define_insn "*call_nonlocal_aix64"
   [(set_attr "type" "branch")
    (set_attr "length" "8")])
 
-(define_insn_and_split "*call_value_indirect_nonlocal_aix32_internal"
-  [(set (match_operand 0 "" "")
-	(call (mem:SI (match_operand:SI 1 "register_operand" "c,*l"))
-		      (match_operand 2 "" "g,g")))
-	(use (mem:SI (plus:SI (match_operand:SI 3 "register_operand" "b,b")
-			      (const_int 4))))
-	(use (reg:SI 11))
-	(use (mem:SI (plus:SI (reg:SI 1) (const_int 20))))
-	(clobber (reg:SI LR_REGNO))]
-  "TARGET_32BIT && DEFAULT_ABI == ABI_AIX"
-  "#"
-  "&& reload_completed"
-  [(set (reg:SI 2)
-	(mem:SI (plus:SI (match_dup 3) (const_int 4))))
-   (parallel [(set (match_dup 0) (call (mem:SI (match_dup 1))
-				       (match_dup 2)))
-	      (use (reg:SI 2))
-	      (use (reg:SI 11))
-	      (set (reg:SI 2)
-		   (mem:SI (plus:SI (reg:SI 1) (const_int 20))))
-	      (clobber (reg:SI LR_REGNO))])]
-  ""
-  [(set_attr "type" "jmpreg")
-   (set_attr "length" "12")])
-
 (define_insn "*call_value_indirect_nonlocal_aix32"
   [(set (match_operand 0 "" "")
 	(call (mem:SI (match_operand:SI 1 "register_operand" "c,*l"))
-	      (match_operand 2 "" "g,g")))
-   (use (reg:SI 2))
+		      (match_operand 2 "" "g,g")))
+   (use (match_operand:SI 3 "memory_operand" "m,m"))
+   (set (reg:SI 2) (match_operand:SI 4 "memory_operand" "m,m"))
    (use (reg:SI 11))
-   (set (reg:SI 2)
-	(mem:SI (plus:SI (reg:SI 1) (const_int 20))))
    (clobber (reg:SI LR_REGNO))]
-  "TARGET_32BIT && DEFAULT_ABI == ABI_AIX && reload_completed"
-  "b%T1l\;{l|lwz} 2,20(1)"
+  "TARGET_32BIT && DEFAULT_ABI == ABI_AIX"
+  "{l|lwz} 2,%3\;b%T1l\;{l|lwz} 2,%4"
   [(set_attr "type" "jmpreg")
-   (set_attr "length" "8")])
+   (set_attr "length" "12")])
 
 (define_insn "*call_value_nonlocal_aix32"
   [(set (match_operand 0 "" "")
@@ -12581,45 +12558,19 @@ (define_insn "*call_value_nonlocal_aix32
   [(set_attr "type" "branch")
    (set_attr "length" "8")])
 
-(define_insn_and_split "*call_value_indirect_nonlocal_aix64_internal"
+(define_insn "*call_value_indirect_nonlocal_aix64"
   [(set (match_operand 0 "" "")
 	(call (mem:SI (match_operand:DI 1 "register_operand" "c,*l"))
 		      (match_operand 2 "" "g,g")))
-	(use (mem:DI (plus:DI (match_operand:DI 3 "register_operand" "b,b")
-			      (const_int 8))))
-	(use (reg:DI 11))
-	(use (mem:DI (plus:DI (reg:DI 1) (const_int 40))))
-	(clobber (reg:SI LR_REGNO))]
+   (use (match_operand:DI 3 "memory_operand" "m,m"))
+   (set (reg:DI 2) (match_operand:DI 4 "memory_operand" "m,m"))
+   (use (reg:DI 11))
+   (clobber (reg:DI LR_REGNO))]
   "TARGET_64BIT && DEFAULT_ABI == ABI_AIX"
-  "#"
-  "&& reload_completed"
-  [(set (reg:DI 2)
-	(mem:DI (plus:DI (match_dup 3) (const_int 8))))
-   (parallel [(set (match_dup 0) (call (mem:SI (match_dup 1))
-				       (match_dup 2)))
-	      (use (reg:DI 2))
-	      (use (reg:DI 11))
-	      (set (reg:DI 2)
-		   (mem:DI (plus:DI (reg:DI 1) (const_int 40))))
-	      (clobber (reg:SI LR_REGNO))])]
-  ""
+  "ld 2,%3\;b%T1l\;ld 2,%4"
   [(set_attr "type" "jmpreg")
    (set_attr "length" "12")])
 
-(define_insn "*call_value_indirect_nonlocal_aix64"
-  [(set (match_operand 0 "" "")
-	(call (mem:SI (match_operand:DI 1 "register_operand" "c,*l"))
-	      (match_operand 2 "" "g,g")))
-   (use (reg:DI 2))
-   (use (reg:DI 11))
-   (set (reg:DI 2)
-	(mem:DI (plus:DI (reg:DI 1) (const_int 40))))
-   (clobber (reg:SI LR_REGNO))]
-  "TARGET_64BIT && DEFAULT_ABI == ABI_AIX && reload_completed"
-  "b%T1l\;ld 2,40(1)"
-  [(set_attr "type" "jmpreg")
-   (set_attr "length" "8")])
-
 (define_insn "*call_value_nonlocal_aix64"
   [(set (match_operand 0 "" "")
 	(call (mem:SI (match_operand:DI 1 "symbol_ref_operand" "s"))

[-- Attachment #3: bug-50341-gcc47.patch02b --]
[-- Type: text/plain, Size: 7711 bytes --]

Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 178858)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -12379,169 +12379,73 @@ (define_insn "*call_value_local64"
 ;; Operand2 is the location in the function descriptor to load r2 from
 ;; Operand3 is the stack location to hold the current TOC pointer
 
-(define_insn_and_split "call_indirect_aix<ptrsize>"
+(define_insn "call_indirect_aix<ptrsize>"
   [(call (mem:SI (match_operand:P 0 "register_operand" "c,*l"))
 	 (match_operand 1 "" "g,g"))
    (use (match_operand:P 2 "memory_operand" "m,m"))
-   (use (match_operand:P 3 "memory_operand" "m,m"))
+   (set (reg:P TOC_REGNUM) (match_operand:P 3 "memory_operand" "m,m"))
    (use (reg:P STATIC_CHAIN_REGNUM))
    (clobber (reg:P LR_REGNO))]
   "DEFAULT_ABI == ABI_AIX && TARGET_POINTERS_TO_NESTED_FUNCTIONS"
-  "#"
-  "&& reload_completed"
-  [(set (reg:P TOC_REGNUM) (match_dup 2))
-   (parallel [(call (mem:SI (match_dup 0))
-		    (match_dup 1))
-	      (use (reg:P TOC_REGNUM))
-	      (use (reg:P STATIC_CHAIN_REGNUM))
-	      (use (match_dup 3))
-	      (set (reg:P TOC_REGNUM) (match_dup 3))
-	      (clobber (reg:P LR_REGNO))])]
-  ""
+  "<ptrload> 2,%2\;b%T0l\;<ptrload> 2,%3"
   [(set_attr "type" "jmpreg")
    (set_attr "length" "12")])
 
-(define_insn "*call_indirect_aix<ptrsize>_internal"
-  [(call (mem:SI (match_operand:P 0 "register_operand" "c,*l"))
-	 (match_operand 1 "" "g,g"))
-   (use (reg:P TOC_REGNUM))
-   (use (reg:P STATIC_CHAIN_REGNUM))
-   (use (match_operand:P 2 "memory_operand" "m,m"))
-   (set (reg:P TOC_REGNUM) (match_dup 2))
-   (clobber (reg:P LR_REGNO))]
-  "DEFAULT_ABI == ABI_AIX && reload_completed
-   && TARGET_POINTERS_TO_NESTED_FUNCTIONS"
-  "b%T0l\;<ptrload> 2,%2"
-  [(set_attr "type" "jmpreg")
-   (set_attr "length" "8")])
-
-;; Like call_indirect_aix<ptrsize>, except don't load the static chain
+;; Like call_indirect_aix<ptrsize>, but no use of the static chain
 ;; Operand0 is the addresss of the function to call
 ;; Operand1 is the flag for System V.4 for unprototyped or FP registers
 ;; Operand2 is the location in the function descriptor to load r2 from
 ;; Operand3 is the stack location to hold the current TOC pointer
 
-(define_insn_and_split "call_indirect_aix<ptrsize>_nor11"
+(define_insn "call_indirect_aix<ptrsize>_nor11"
   [(call (mem:SI (match_operand:P 0 "register_operand" "c,*l"))
 	 (match_operand 1 "" "g,g"))
    (use (match_operand:P 2 "memory_operand" "m,m"))
-   (use (match_operand:P 3 "memory_operand" "m,m"))
+   (set (reg:P TOC_REGNUM) (match_operand:P 3 "memory_operand" "m,m"))
    (clobber (reg:P LR_REGNO))]
   "DEFAULT_ABI == ABI_AIX && !TARGET_POINTERS_TO_NESTED_FUNCTIONS"
-  "#"
-  "&& reload_completed"
-  [(set (reg:P TOC_REGNUM) (match_dup 2))
-   (parallel [(call (mem:SI (match_dup 0))
-		    (match_dup 1))
-	      (use (reg:P TOC_REGNUM))
-	      (use (match_dup 3))
-	      (set (reg:P TOC_REGNUM) (match_dup 3))
-	      (clobber (reg:P LR_REGNO))])]
-  ""
+  "<ptrload> 2,%2\;b%T0l\;<ptrload> 2,%3"
   [(set_attr "type" "jmpreg")
    (set_attr "length" "12")])
 
-(define_insn "*call_indirect_aix<ptrsize>_internal2"
-  [(call (mem:SI (match_operand:P 0 "register_operand" "c,*l"))
-	 (match_operand 1 "" "g,g"))
-   (use (reg:P TOC_REGNUM))
-   (use (match_operand:P 2 "memory_operand" "m,m"))
-   (set (reg:P TOC_REGNUM) (match_dup 2))
-   (clobber (reg:P LR_REGNO))]
-  "DEFAULT_ABI == ABI_AIX && reload_completed
-   && !TARGET_POINTERS_TO_NESTED_FUNCTIONS"
-  "b%T0l\;<ptrload> 2,%2"
-  [(set_attr "type" "jmpreg")
-   (set_attr "length" "8")])
-
 ;; Operand0 is the return result of the function
 ;; Operand1 is the addresss of the function to call
 ;; Operand2 is the flag for System V.4 for unprototyped or FP registers
 ;; Operand3 is the location in the function descriptor to load r2 from
 ;; Operand4 is the stack location to hold the current TOC pointer
 
-(define_insn_and_split "call_value_indirect_aix<ptrsize>"
+(define_insn "call_value_indirect_aix<ptrsize>"
   [(set (match_operand 0 "" "")
 	(call (mem:SI (match_operand:P 1 "register_operand" "c,*l"))
 	      (match_operand 2 "" "g,g")))
    (use (match_operand:P 3 "memory_operand" "m,m"))
-   (use (match_operand:P 4 "memory_operand" "m,m"))
+   (set (reg:P TOC_REGNUM) (match_operand:P 4 "memory_operand" "m,m"))
    (use (reg:P STATIC_CHAIN_REGNUM))
    (clobber (reg:P LR_REGNO))]
   "DEFAULT_ABI == ABI_AIX && TARGET_POINTERS_TO_NESTED_FUNCTIONS"
-  "#"
-  "&& reload_completed"
-  [(set (reg:P TOC_REGNUM) (match_dup 3))
-   (parallel [(set (match_dup 0)
-		   (call (mem:SI (match_dup 1))
-			 (match_dup 2)))
-	      (use (reg:P TOC_REGNUM))
-	      (use (reg:P STATIC_CHAIN_REGNUM))
-	      (use (match_dup 4))
-	      (set (reg:P TOC_REGNUM) (match_dup 4))
-	      (clobber (reg:P LR_REGNO))])]
-  ""
+  "<ptrload> 2,%3\;b%T1l\;<ptrload> 2,%4"
   [(set_attr "type" "jmpreg")
    (set_attr "length" "12")])
 
-(define_insn "*call_value_indirect_aix<ptrsize>_internal"
-  [(set (match_operand 0 "" "")
-	(call (mem:SI (match_operand:P 1 "register_operand" "c,*l"))
-	      (match_operand 2 "" "g,g")))
-   (use (reg:P TOC_REGNUM))
-   (use (reg:P STATIC_CHAIN_REGNUM))
-   (use (match_operand:P 3 "memory_operand" "m,m"))
-   (set (reg:P TOC_REGNUM) (match_dup 3))
-   (clobber (reg:P LR_REGNO))]
-  "DEFAULT_ABI == ABI_AIX && reload_completed
-   && TARGET_POINTERS_TO_NESTED_FUNCTIONS"
-  "b%T1l\;<ptrload> 2,%3"
-  [(set_attr "type" "jmpreg")
-   (set_attr "length" "8")])
-
-;; Like call_value_indirect_aix<ptrsize>, but don't load the static chain
+;; Like call_value_indirect_aix<ptrsize>, but no use of the static chain
 ;; Operand0 is the return result of the function
 ;; Operand1 is the addresss of the function to call
 ;; Operand2 is the flag for System V.4 for unprototyped or FP registers
 ;; Operand3 is the location in the function descriptor to load r2 from
 ;; Operand4 is the stack location to hold the current TOC pointer
 
-(define_insn_and_split "call_value_indirect_aix<ptrsize>_nor11"
+(define_insn "call_value_indirect_aix<ptrsize>_nor11"
   [(set (match_operand 0 "" "")
 	(call (mem:SI (match_operand:P 1 "register_operand" "c,*l"))
 	      (match_operand 2 "" "g,g")))
    (use (match_operand:P 3 "memory_operand" "m,m"))
-   (use (match_operand:P 4 "memory_operand" "m,m"))
+   (set (reg:P TOC_REGNUM) (match_operand:P 4 "memory_operand" "m,m"))
    (clobber (reg:P LR_REGNO))]
   "DEFAULT_ABI == ABI_AIX && !TARGET_POINTERS_TO_NESTED_FUNCTIONS"
-  "#"
-  "&& reload_completed"
-  [(set (reg:P TOC_REGNUM) (match_dup 3))
-   (parallel [(set (match_dup 0)
-		   (call (mem:SI (match_dup 1))
-			 (match_dup 2)))
-	      (use (reg:P TOC_REGNUM))
-	      (use (match_dup 4))
-	      (set (reg:P TOC_REGNUM) (match_dup 4))
-	      (clobber (reg:P LR_REGNO))])]
-  ""
+  "<ptrload> 2,%3\;b%T1l\;<ptrload> 2,%4"
   [(set_attr "type" "jmpreg")
    (set_attr "length" "12")])
 
-(define_insn "*call_value_indirect_aix<ptrsize>_internal2"
-  [(set (match_operand 0 "" "")
-	(call (mem:SI (match_operand:P 1 "register_operand" "c,*l"))
-	      (match_operand 2 "" "g,g")))
-   (use (reg:P TOC_REGNUM))
-   (use (match_operand:P 3 "memory_operand" "m,m"))
-   (set (reg:P TOC_REGNUM) (match_dup 3))
-   (clobber (reg:P LR_REGNO))]
-  "DEFAULT_ABI == ABI_AIX && reload_completed
-   && !TARGET_POINTERS_TO_NESTED_FUNCTIONS"
-  "b%T1l\;<ptrload> 2,%3"
-  [(set_attr "type" "jmpreg")
-   (set_attr "length" "8")])
-
 ;; Call to function which may be in another module.  Restore the TOC
 ;; pointer (r2) after the call unless this is System V.
 ;; Operand2 is nonzero if we are using the V.4 calling sequence and

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH], PR 50341, Fix TOC load scheduling for powerpc on GCC 4.6 and 4.7
  2011-09-15 18:21 [PATCH], PR 50341, Fix TOC load scheduling for powerpc on GCC 4.6 and 4.7 Michael Meissner
  2011-09-15 18:35 ` [PATCH], PR 50341, Fix TOC load scheduling for powerpc on GCC 4.6 and 4.7 (patches included) Michael Meissner
@ 2011-09-15 19:01 ` Andrew Pinski
  2011-09-15 19:13   ` Peter Bergner
  2011-09-15 19:42   ` Michael Meissner
  1 sibling, 2 replies; 6+ messages in thread
From: Andrew Pinski @ 2011-09-15 19:01 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, dje.gcc

On Thu, Sep 15, 2011 at 10:52 AM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> The recent changes for the new code models tickled a latent bug in the GCC 4.6
> and 4.7 compilers with the AIX calling sequence (PR 50341).  When you call a
> function indirectly, the compiler has to save the current function's TOC
> pointer that is in r2 on the stack frame, and then load the new function's TOC
> pointer before doing the call.  If the indirect function that you are calling
> has the same TOC value as the current function, the code will run, but if the
> indirection function has a different TOC (for example it is in the main
> program, and the indirect call is in a shared library), the wrong address will
> be used.
>
> The existing code did this by doing a split after reload has finished.
> Scheduling after reload then moves the load of the new TOC.  These patches for
> GCC 4.6 and GCC 4.7 disable the splits so that the load of the new TOC is
> always next to the bctrl instruction, preventing the compiler from moving the
> load.  Ideally, we can eventually tackle the underlying problem that we did not
> have the right dependencies to prevent the scheduler from moving the new TOC
> load before the use.

ENOPATCH.

It was originally splitting before reload and I had changed it to be
splitting after reload assuming the dependencies would be correct.
What is the instruction which is being described as not dependent on
the load of r2?


-- Andrew

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH], PR 50341, Fix TOC load scheduling for powerpc on GCC 4.6 and 4.7
  2011-09-15 19:01 ` [PATCH], PR 50341, Fix TOC load scheduling for powerpc on GCC 4.6 and 4.7 Andrew Pinski
@ 2011-09-15 19:13   ` Peter Bergner
  2011-09-15 19:42   ` Michael Meissner
  1 sibling, 0 replies; 6+ messages in thread
From: Peter Bergner @ 2011-09-15 19:13 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Michael Meissner, gcc-patches, dje.gcc

On Thu, 2011-09-15 at 11:07 -0700, Andrew Pinski wrote:
> It was originally splitting before reload and I had changed it to be
> splitting after reload assuming the dependencies would be correct.
> What is the instruction which is being described as not dependent on
> the load of r2?

Alan describes the scenario here:

    http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00304.html


Peter



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH], PR 50341, Fix TOC load scheduling for powerpc on GCC 4.6 and 4.7
  2011-09-15 19:01 ` [PATCH], PR 50341, Fix TOC load scheduling for powerpc on GCC 4.6 and 4.7 Andrew Pinski
  2011-09-15 19:13   ` Peter Bergner
@ 2011-09-15 19:42   ` Michael Meissner
  1 sibling, 0 replies; 6+ messages in thread
From: Michael Meissner @ 2011-09-15 19:42 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Michael Meissner, gcc-patches, dje.gcc

On Thu, Sep 15, 2011 at 11:07:43AM -0700, Andrew Pinski wrote:
> On Thu, Sep 15, 2011 at 10:52 AM, Michael Meissner
> <meissner@linux.vnet.ibm.com> wrote:
> > The recent changes for the new code models tickled a latent bug in the GCC 4.6
> > and 4.7 compilers with the AIX calling sequence (PR 50341).  When you call a
> > function indirectly, the compiler has to save the current function's TOC
> > pointer that is in r2 on the stack frame, and then load the new function's TOC
> > pointer before doing the call.  If the indirect function that you are calling
> > has the same TOC value as the current function, the code will run, but if the
> > indirection function has a different TOC (for example it is in the main
> > program, and the indirect call is in a shared library), the wrong address will
> > be used.
> >
> > The existing code did this by doing a split after reload has finished.
> > Scheduling after reload then moves the load of the new TOC.  These patches for
> > GCC 4.6 and GCC 4.7 disable the splits so that the load of the new TOC is
> > always next to the bctrl instruction, preventing the compiler from moving the
> > load.  Ideally, we can eventually tackle the underlying problem that we did not
> > have the right dependencies to prevent the scheduler from moving the new TOC
> > load before the use.
> 
> ENOPATCH.

Yes, I realized that after the fact and put it out later.
 
> It was originally splitting before reload and I had changed it to be
> splitting after reload assuming the dependencies would be correct.
> What is the instruction which is being described as not dependent on
> the load of r2?

From the example in the bug, it is the 3rd cpu_fprintf, where it decides to
move loading up an address into register 27, and move the load of the address
before the call, since register 27 is preserved across calls.

void cpu_dump_state (struct CPUPPCState *env, FILE *f, fprintf_function cpu_fprintf,
                     int flags)
{



    int i;

    cpu_fprintf(f, "NIP " "%016" "l" "x" "   LR " "%016" "l" "x" " CTR "
                "%016" "l" "x" " XER " "%016" "l" "x" "\n",
                env->nip, env->lr, env->ctr, env->xer);
    cpu_fprintf(f, "MSR " "%016" "l" "x" " HID0 " "%016" "l" "x" "  HF "
                "%016" "l" "x" " idx %d\n", env->msr, env->spr[(0x3F0)],
                env->hflags, env->mmu_idx);

    cpu_fprintf(f, "TB %08" "u" " %08" "l" "u"

                " DECR %08" "u"

                "\n",
                cpu_ppc_load_tbu(env), cpu_ppc_load_tbl(env)

                , cpu_ppc_load_decr(env)

                );

These are the insns that are moved before the call:

(insn 1105 153 203 2 (set (reg/f:DI 27 27 [603])
        (const:DI (plus:DI (reg:DI 2 2)
                (high:DI (const:DI (unspec:DI [
                                (symbol_ref/f:DI ("*.LC5") [flags 0x82]  <var_decl 0xfff931e6360 *.LC5>)
                            ] UNSPEC_TOCREL)))))) 513 {largetoc_high}
     (expr_list:REG_EQUIV (const:DI (plus:DI (reg:DI 2 2)
                (high:DI (const:DI (unspec:DI [
                                (symbol_ref/f:DI ("*.LC5") [flags 0x82]  <var_decl 0xfff931e6360 *.LC5>)
                            ] UNSPEC_TOCREL)))))
        (nil)))

(insn 1107 161 204 2 (set (reg/f:DI 27 27 [601])
        (lo_sum:DI (reg/f:DI 27 27 [603])
            (const:DI (unspec:DI [
                        (symbol_ref/f:DI ("*.LC5") [flags 0x82]  <var_decl 0xfff931e6360 *.LC5>)
                    ] UNSPEC_TOCREL)))) 514 {largetoc_low}
     (expr_list:REG_EQUIV (symbol_ref/f:DI ("*.LC5") [flags 0x82]  <var_decl 0xfff931e6360 *.LC5>)
        (nil)))

The RTL logic is not looking into the const.

This is triggered by Alan's patch in June:

2011-06-20  Alan Modra  <amodra@gmail.com>

	* config/rs6000/rs6000.c (create_TOC_reference): Wrap high part
	of toc-relative address in CONST.
	(rs6000_delegitimize_address): Recognize changed address.
	(rs6000_legitimize_reload_address): Likewise.
	(rs6000_emit_move): Don't force these constants to memory.
	* config/rs6000/rs6000.md (tls_gd, tls_gd_high): Wrap high part of
	toc-relative address in CONST.
	(tls_ld, tls_ld_high, tls_got_dtprel, tls_got_dtprel_high): Likewise.
	(tls_got_tprel, tls_got_tprel_high, largetoc_high): Likewise.

Now, I anticipate that the patch in 4.7 will be temporary until we are
confident that we have the right solution, but it is better to put a bandaid on
the patch to limit the damage.

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meissner@linux.vnet.ibm.com	fax +1 (978) 399-6899

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH], PR 50341, Fix TOC load scheduling for powerpc on GCC 4.6 and 4.7 (patches included)
  2011-09-15 18:35 ` [PATCH], PR 50341, Fix TOC load scheduling for powerpc on GCC 4.6 and 4.7 (patches included) Michael Meissner
@ 2011-09-16  7:05   ` David Edelsohn
  0 siblings, 0 replies; 6+ messages in thread
From: David Edelsohn @ 2011-09-16  7:05 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches

On Thu, Sep 15, 2011 at 2:04 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> It would help if I included the patches:
>
> [gcc-4.6]
> 2011-09-15  Alan Modra  <amodra@gmail.com>
>            Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>        PR target/50341
>        * config/rs6000/rs6000.md (call_indirect_aix32): Do not split the
>        load of the indirect function's TOC from the call to prevent the
>        compiler from moving the load of the new TOC above code that
>        references the current function's TOC.
>        (call_indirect_aix64): Ditto.
>        (call_value_indirect_aix32): Ditto.
>        (call_value_indirect_aix64): Ditto.
>        (call_indirect_nonlocal_aix32_internal): Ditto.
>        (call_indirect_nonlocal_aix32): Ditto.
>        (call_indirect_nonlocal_aix64_internal): Ditto.
>        (call_indirect_nonlocal_aix64): Ditto.
>        (call_value_indirect_nonlocal_aix32_internal): Ditto.
>        (call_value_indirect_nonlocal_aix32): Ditto.
>        (call_value_indirect_nonlocal_aix64_internal): Ditto.
>        (call_value_indirect_nonlocal_aix64): Ditto.
>
> [gcc-4.7]
> 2011-09-15  Alan Modra  <amodra@gmail.com>
>            Michael Meissner  <meissner@linux.vnet.ibm.com>
>
>        PR target/50341
>        * config/rs6000/rs6000.md (call_indirect_aix<ptrsize>): Do not
>        split the load of the indirect function's TOC from the call to
>        prevent the compiler from moving the load of the new TOC above
>        code that references the current function's TOC.
>        (call_indirect_aix<ptrsize>_internal): Ditto.
>        (call_indirect_aix<ptrsize>_nor11): Ditto.
>        (call_indirect_aix<ptrsize>_internal2): Ditto.
>        (call_value_indirect_aix<ptrsize>): Ditto.
>        (call_value_indirect_aix<ptrsize>_internal): Ditto.
>        (call_value_indirect_aix<ptrsize>_nor11): Ditto.
>        (call_value_indirect_aix<ptrsize>_internal2): Ditto.

Okay.

Thanks, David

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2011-09-16  5:25 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-15 18:21 [PATCH], PR 50341, Fix TOC load scheduling for powerpc on GCC 4.6 and 4.7 Michael Meissner
2011-09-15 18:35 ` [PATCH], PR 50341, Fix TOC load scheduling for powerpc on GCC 4.6 and 4.7 (patches included) Michael Meissner
2011-09-16  7:05   ` David Edelsohn
2011-09-15 19:01 ` [PATCH], PR 50341, Fix TOC load scheduling for powerpc on GCC 4.6 and 4.7 Andrew Pinski
2011-09-15 19:13   ` Peter Bergner
2011-09-15 19:42   ` Michael Meissner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).