public inbox for gcc@gcc.gnu.org
* Painful problems with -fpic implementation on powerpc-sysv
       [not found] ` <98072823153402.00476@ns1102.munich.netsurf.de>
@ 1998-08-20  0:51   ` Geoff Keating
  1998-08-20  9:57     ` H.J. Lu
                       ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: Geoff Keating @ 1998-08-20  0:51 UTC (permalink / raw)
  To: egcs; +Cc: Franz Sirl

This is an analysis of two very nasty bugs with -fpic/-fPIC and
optimisation on powerpc SVR4 ABI targets.

Two testcases for the bugs are attached below, called 'test3.c' and
'test4.c'.  The original files were from TclX 8.0.

If these are compiled with 'gcc -O2 -fpic test3.c -S', gcc fails with
an internal compiler error.  For test3.c, it's:

test3.c: In function `TclX_MaxObjCmd':
test3.c:12: internal error--insn does not satisfy its constraints:

(insn 48 11 13 (set (reg:SI 11 r11)
        (unspec[ 
                (symbol_ref/u:SI ("*.LC2"))
                (reg:SI 65 lr)
            ]  8)) 398 {*movsi_got_internal} (nil)
    (nil))
toplev.c:1360: Internal compiler error in function fatal_insn


For test4.c:

test4.c: In function `TclX_MaxObjCmd':
test4.c:8: internal error -- needed new GOT register during reload phase to load:

(symbol_ref/u:SI ("*.LC0"))
toplev.c:1360: Internal compiler error in function fatal_insn


The underlying problem for both of these is the same.

The powerpc -fpic implementation works by allocating a pseudo to hold
the GOT pointer when one is required (in the routine
rs6000_got_register).  The pseudo is initialised in
rs6000_finalize_pic, and the result is used in the movsi_got_internal
insn.

There are two problems with this:

1. rs6000_finalize_pic is run before reload.  But sometimes reload can
   create a new memory reference (see, for instance, test4.c).  So
   rs6000_finalize_pic does not know where all the memory references
   are, or even if there are any at all.

2. Worse, rs6000_finalize_pic is actually run before scheduling, and
   so the scheduler can get the dependencies wrong if new memory
   references appear later.  For instance, in test3.c after local
   register allocation, there is:

(insn 13 11 38 (set (reg:DI 84)
        (const_double (const_int 0) 0 2146435072)) 425 {*movdi_32} (nil)
    (expr_list:REG_EQUIV (const_double (const_int 0) 0 2146435072)
        (nil)))

(insn 38 13 14 (set (reg:SI 88)
        (unspec[ 
                (const_int 0)
            ]  7)) 512 {init_v4_pic} (nil)
    (nil))

But the first insn, number 13, is a memory reference.  It doesn't look
like it now, but pseudo 84 will be allocated to a floating-point
register, and the only way to load a constant into a FP register is
from memory.  So reload will push the const_double out to a constant
memory location.  The load from memory will require the GOT pointer.
But the GOT pointer is not initialised until the _next_ insn.
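
For reference, the constant in insn 13 is the bit pattern from the
testcase: 2146435072 is 0x7ff00000, which is presumably the high word
of 0x7ff0000000000000, i.e. +infinity as an IEEE double.  A small
standalone check, assuming a 64-bit IEEE double and using memcpy
instead of the GNU union initialiser (the helper names are invented
for this sketch):

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit pattern as a double.
   Assumption: double is a 64-bit IEEE 754 type. */
double double_from_bits(uint64_t bits)
{
    double d;
    assert(sizeof d == sizeof bits);
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Upper 32 bits of the pattern, the value printed in the const_double. */
uint32_t high_word(uint64_t bits)
{
    return (uint32_t)(bits >> 32);
}

/* Nonzero if the pattern encodes +infinity (larger than any finite double). */
int is_positive_infinity(uint64_t bits)
{
    return double_from_bits(bits) > DBL_MAX;
}
```

With these helpers, high_word(0x7ff0000000000000ULL) is 2146435072.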

Worse, because reload doesn't know any better at this point, it will
decide that pseudo 88 can be left in the link register (which is where
init_v4_pic puts it initially), even though it has to be moved to a
general register before movsi_got_internal can use it.

test4.c is similar, but here no init_v4_pic insn is ever emitted, and
this case is caught earlier.
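
The ordering problem can be modelled with a toy insn list.  This is
only a sketch, not GCC code: the enum and all function names are
invented, and "reload" is reduced to turning every FP constant load
into a GOT-relative load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction kinds; this is not GCC's RTL. */
enum insn_kind {
    INSN_OTHER,
    INSN_LOAD_FP_CONST,  /* looks register-only before reload */
    INSN_INIT_GOT,       /* stands in for init_v4_pic */
    INSN_GOT_LOAD        /* stands in for movsi_got_internal */
};

/* Pre-reload view: index of the first insn that visibly uses the GOT,
   or -1 if the function appears not to need the GOT at all. */
int first_visible_got_use(const enum insn_kind *insns, size_t n)
{
    assert(insns != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (insns[i] == INSN_GOT_LOAD)
            return (int)i;
    return -1;
}

/* Toy "reload": every FP constant load becomes a GOT-relative load. */
void toy_reload(enum insn_kind *insns, size_t n)
{
    assert(insns != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (insns[i] == INSN_LOAD_FP_CONST)
            insns[i] = INSN_GOT_LOAD;
}

/* True if no GOT load appears before the GOT has been initialised. */
bool got_uses_are_initialised(const enum insn_kind *insns, size_t n)
{
    bool initialised = false;
    assert(insns != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (insns[i] == INSN_INIT_GOT)
            initialised = true;
        else if (insns[i] == INSN_GOT_LOAD && !initialised)
            return false;
    }
    return true;
}
```

A list shaped like test3.c (FP constant load, GOT init, GOT load)
passes the check before toy_reload and fails it afterwards; a list
shaped like test4.c (a lone FP constant load) shows no GOT use at all
until toy_reload has run.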


How to fix?  Well, clearly, if the scheduler is going to move insns
around based on what registers they use, then it is impermissible to
add new occurrences of the GOT pseudo after init_v4_pic has been
emitted; they will confuse sched's dependency checking.

A partial patch to try to (1) ensure and (2) enforce this is attached
below.  It is much better at enforcement, so much so that
'make LANGUAGES=c' fails when it tries to compile crtbegin.o with the
patched egcs :-(.  It does however let the tests below compile, and if
you remove
+  if (too_late_for_got)
+    abort();
from the patch, it's somewhat usable---at least, no worse than before.
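
The enforcement half of the patch is a per-function latch.  A
stripped-down sketch of the idea follows; only the flag name comes
from the patch, the toy_ functions are hypothetical, and the request
returns false where the patch calls abort():

```c
#include <assert.h>
#include <stdbool.h>

/* Same name as the flag in the patch; everything else is a toy model. */
static bool too_late_for_got;

/* Called at the start of each function (cf. rs6000_init_expanders). */
void toy_init_function(void)
{
    too_late_for_got = false;
}

/* Called once the GOT initialisation has been placed
   (cf. rs6000_finalize_pic).  Expected to run once per function. */
void toy_finalize_pic(void)
{
    assert(!too_late_for_got);
    too_late_for_got = true;
}

/* Request the GOT register.  Returns false if the request arrives after
   finalisation; the real patch aborts at this point instead. */
bool toy_got_register(void)
{
    return !too_late_for_got;
}
```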

The problem here is that half the compiler runs after FINALIZE_PIC.
Perhaps init_v4_pic should be shifted down to just before reload?  Or
maybe the (unspec[(const_int 0)] 7) should be inserted directly into
the insns whenever we need to make a new GOT reference between
FINALIZE_PIC and reload (this happens often, so it might hurt
performance); the patch below could then try to avoid creating new GOT
references during and after reload.

Can anyone think of a better way?  The other ports use a fixed
register for the GOT.  The advantage of the rs6000 technique is that
it lets the GOT pointer be spilled to memory, which actually happens
when compiling the internals of printf (among other places); and it
allows reuse of the GOT register if the procedure doesn't actually
access global data.


PS: The change to PREFERRED_RELOAD_CLASS should probably go in no
matter how the -fpic problems are fixed.

-- 
Geoffrey Keating <geoffk@ozemail.com.au>

===File ~/gcc-bugs/test3.c==================================
double
TclX_MaxObjCmd (int objc, double value)
{
    double maxValue = - (__extension__	((union { 
	   unsigned __l __attribute__((__mode__(__DI__)));
	   double __d;})
	{ __l: 0x7ff0000000000000ULL }).__d);
    if (value > maxValue) {
      maxValue = value;
    }
    return 0 ;
}
============================================================

===File ~/gcc-bugs/test4.c==================================
double
TclX_MaxObjCmd (int objc, double value)
{
    return - (__extension__ ((union {
	   unsigned __l __attribute__((__mode__(__DI__)));
	   double __d; })
	{ __l: 0x7ff0000000000000ULL }).__d)  ;
}
============================================================

===File ~/patches/egcs-3.diff===============================
Wed Aug 19 04:26:35 1998  Geoff Keating  <geoffk@ozemail.com.au>

	* config/rs6000/rs6000.c (too_late_for_got): New flag.
	(rs6000_init_expanders): Initialise it.
	(rs6000_finalize_pic): Set it.
	(rs6000_got_register): Check it.
	* config/rs6000/rs6000.h (AVOID_RELOAD_CONST_MEM): Use new flag.
	(PREFERRED_RELOAD_CLASS): Correct definition, to be like sparc.h.
	* reload.c (find_reloads): Use AVOID_RELOAD_CONST_MEM.

--- reload.c~	Wed Aug 19 04:13:00 1998
+++ reload.c	Wed Aug 19 04:21:35 1998
@@ -113,6 +113,10 @@ a register with any other reload.  */
 #ifndef REG_MODE_OK_FOR_BASE_P
 #define REG_MODE_OK_FOR_BASE_P(REGNO, MODE) REG_OK_FOR_BASE_P (REGNO)
 #endif
+
+#ifndef AVOID_RELOAD_CONST_MEM
+#define AVOID_RELOAD_CONST_MEM(X) 0
+#endif
 \f
 /* The variables set up by `find_reloads' are:
 
@@ -2995,7 +2999,8 @@ find_reloads (insn, replace, ind_levels,
 		  win = 1;
 		if (CONSTANT_P (operand)
 		    /* force_const_mem does not accept HIGH.  */
-		    && GET_CODE (operand) != HIGH)
+		    && GET_CODE (operand) != HIGH
+		    && !AVOID_RELOAD_CONST_MEM (operand))
 		  badop = 0;
 		constmemok = 1;
 		break;
@@ -3071,7 +3076,9 @@ find_reloads (insn, replace, ind_levels,
 			    || (reg_equiv_address[REGNO (operand)] != 0))))
 		  win = 1;
 		/* force_const_mem does not accept HIGH.  */
-		if ((CONSTANT_P (operand) && GET_CODE (operand) != HIGH)
+		if ((CONSTANT_P (operand)
+		     && GET_CODE (operand) != HIGH
+		     && !AVOID_RELOAD_CONST_MEM (operand))
 		    || GET_CODE (operand) == MEM)
 		  badop = 0;
 		constmemok = 1;
--- config/rs6000/rs6000.h~	Mon Aug 17 17:19:30 1998
+++ config/rs6000/rs6000.h	Wed Aug 19 04:25:31 1998
@@ -1105,8 +1105,11 @@ enum reg_class
    floating-point CONST_DOUBLE to force it to be copied to memory.  */
 
 #define PREFERRED_RELOAD_CLASS(X,CLASS)			\
-  ((GET_CODE (X) == CONST_DOUBLE			\
-    && GET_MODE_CLASS (GET_MODE (X)) == MODE_FLOAT)	\
+  (CONSTANT_P (X)					\
+   && ((CLASS) == FLOAT_REGS				\
+       || (GET_MODE_CLASS (GET_MODE (X)) == MODE_FLOAT	\
+	   && (HOST_FLOAT_FORMAT != IEEE_FLOAT_FORMAT	\
+	       || HOST_BITS_PER_INT != BITS_PER_WORD)))	\
    ? NO_REGS : (CLASS))
 
 /* Return the register class of a scratch register needed to copy IN into
@@ -1122,6 +1125,11 @@ enum reg_class
 #define SECONDARY_MEMORY_NEEDED(CLASS1,CLASS2,MODE) \
  ((CLASS1) != (CLASS2) && ((CLASS1) == FLOAT_REGS || (CLASS2) == FLOAT_REGS))
 
+/* There are some times when it is inconvenient to force a constant
+   to memory.  */
+
+#define AVOID_RELOAD_CONST_MEM(X) (flag_pic && too_late_for_got)
+
 /* Return the maximum number of consecutive registers
    needed to represent mode MODE in a register of class CLASS.
 
@@ -3165,6 +3173,7 @@ extern int flag_pic;
 extern int optimize;
 extern int flag_expensive_optimizations;
 extern int frame_pointer_needed;
+extern int too_late_for_got;
 
 /* Declare functions in rs6000.c */
 extern void output_options ();
--- config/rs6000/rs6000.c~	Mon Aug 17 17:19:48 1998
+++ config/rs6000/rs6000.c	Wed Aug 19 04:26:16 1998
@@ -2263,16 +2263,22 @@ ccr_bit (op, scc_p)
     }
 }
 \f
+/* When this is 1, we cannot reliably return the GOT register.  */
+int too_late_for_got = 0;
+
 /* Return the GOT register, creating it if needed.  */
 
 struct rtx_def *
 rs6000_got_register (value)
      rtx value;
 {
+  if (too_late_for_got)
+    abort();
+
   if (!current_function_uses_pic_offset_table || !pic_offset_table_rtx)
     {
       if (reload_in_progress || reload_completed)
-	fatal_insn ("internal error -- needed new GOT register during reload phase to load:", value);
+	abort();
 
       current_function_uses_pic_offset_table = 1;
       pic_offset_table_rtx = gen_rtx_REG (Pmode, GOT_TOC_REGNUM);
@@ -2391,6 +2397,7 @@ rs6000_finalize_pic ()
 	    last_insn = insn;
 	}
 
+      too_late_for_got = 1;
       if (reg)
 	{
 	  rtx init = gen_init_v4_pic (reg);
@@ -2473,6 +2480,7 @@ rs6000_init_expanders ()
   rs6000_fpmem_size = 0;
   rs6000_fpmem_offset = 0;
   pic_offset_table_rtx = (rtx)0;
+  too_late_for_got = 0;
 
   /* Arrange to save and restore machine status around nested functions.  */
   save_machine_status = rs6000_save_machine_status;
============================================================


* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-20  0:51   ` Painful problems with -fpic implementation on powerpc-sysv Geoff Keating
@ 1998-08-20  9:57     ` H.J. Lu
  1998-08-20 15:53       ` Jeffrey A Law
  1998-08-20 15:53     ` David Edelsohn
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 29+ messages in thread
From: H.J. Lu @ 1998-08-20  9:57 UTC (permalink / raw)
  To: Geoff Keating; +Cc: egcs, Franz.Sirl-kernel

> The underlying problem for both of these is the same.
> 
> The powerpc -fpic implementation works by allocating a pseudo to hold
> the GOT pointer, when required, (in the routine rs6000_got_register).
> In rs6000_finalise_pic, the pseudo is initialised.  The result is used
> in the movsi_got_internal insn.
> 
> There are two problems with this:
> 
> 1. rs6000_finalise_pic is run before reload.  But sometimes reload can
>    create a new memory reference (see, for instance, test4.c).  So
>    rs6000_finalise_pic does not know where all the memory references
>    are, or even if there are any at all.
> 
> 2. Worse, rs6000_finalise_pic is actually run before scheduling, and
>    so the scheduler can get the dependencies wrong if new memory
>    references appear later.  For instance, in test3.c after local
>    register allocation, there is:
> 

That sounds similar to the x86 -fPIC/-fomit-frame-pointer bug we fixed
earlier. What we did was to make sure the PIC register is used when in
doubt:

Sun Jul 26 01:11:12 1998  H.J. Lu  (hjl@gnu.org)
       
        * i386.h (CONST_DOUBLE_OK_FOR_LETTER_P): Return 0 when eliminating
        the frame pointer and compiling PIC code and reload has not completed.
   
Can you do something like that for PPC?


H.J.


* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-20  0:51   ` Painful problems with -fpic implementation on powerpc-sysv Geoff Keating
  1998-08-20  9:57     ` H.J. Lu
@ 1998-08-20 15:53     ` David Edelsohn
  1998-08-21 12:51       ` Geoff Keating
  1998-08-21 12:51     ` David Edelsohn
  1998-08-26  3:53     ` Jeffrey A Law
  3 siblings, 1 reply; 29+ messages in thread
From: David Edelsohn @ 1998-08-20 15:53 UTC (permalink / raw)
  To: Geoff Keating; +Cc: egcs, Franz Sirl, Jeffrey Law

>>>>> Geoff Keating writes:

Geoff> (insn 13 11 38 (set (reg:DI 84)
Geoff> (const_double (const_int 0) 0 2146435072)) 425 {*movdi_32} (nil)
Geoff> (expr_list:REG_EQUIV (const_double (const_int 0) 0 2146435072)
Geoff> (nil)))

Geoff> (insn 38 13 14 (set (reg:SI 88)
Geoff> (unspec[ 
Geoff> (const_int 0)
Geoff> ]  7)) 512 {init_v4_pic} (nil)
Geoff> (nil))

Geoff> But the first insn, number 13, is a memory reference.  It doesn't look
Geoff> like it now, but pseudo 84 will be allocated to a floating-point
Geoff> register, and the only way to load a constant into a FP register is
Geoff> from memory.  So reload will push the const_double out to a constant
Geoff> memory location.  The load from memory will require the GOT pointer.
Geoff> But the GOT pointer is not initialised until the _next_ insn.

	FP constants can be materialized in GPRs and transferred to FPRs.
The define_splits for loading CONST_DOUBLEs into FPRs specifically test
reload_completed.  The problem seems to be the use of a memory location
that requires the GOT instead of the stack.

	Does GCC decide early on that it will construct the constant in
the GOT pool and load it from there, but then not have the GOT handy?  In
these cases is there some way to convince GCC to reload the value directly
via an alternative method instead of from the GOT slot?

David


* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-20  9:57     ` H.J. Lu
@ 1998-08-20 15:53       ` Jeffrey A Law
  0 siblings, 0 replies; 29+ messages in thread
From: Jeffrey A Law @ 1998-08-20 15:53 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Geoff Keating, egcs, Franz.Sirl-kernel

  In message < m0z9Y1d-00038xC@ocean.lucon.org >you write:
  > That sounds similar to the x86 -fPIC/-fomit-frame-pointer bug we fixed
  > earlier. What we did was to make sure the PIC register is used when in
  >  doubt:
  > 
  > Sun Jul 26 01:11:12 1998  H.J. Lu  (hjl@gnu.org)
  >        
  >         * i386.h (CONST_DOUBLE_OK_FOR_LETTER_P): Return 0 when eliminating
  >         the frame pointer and compiling PIC code and reload has not completed.
It's a closely related problem.  But the solution is more complex.



jeff


* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-20 15:53     ` David Edelsohn
@ 1998-08-21 12:51       ` Geoff Keating
  1998-08-21 13:17         ` David Edelsohn
  0 siblings, 1 reply; 29+ messages in thread
From: Geoff Keating @ 1998-08-21 12:51 UTC (permalink / raw)
  To: dje; +Cc: egcs, Franz.Sirl-kernel, law

> Cc: egcs@cygnus.com, Franz Sirl <Franz.Sirl-kernel@lauterbach.com>,
>         Jeffrey Law <law@cygnus.com>
> Date: Thu, 20 Aug 1998 14:04:57 -0400
> From: David Edelsohn <dje@watson.ibm.com>

> 	FP constants can be materialized in GPRs and transferred to FPRs.
> The define_splits for loading CONST_DOUBLEs into FPRs specifically test
> reload_completed.  The problem seems to be the use of a memory location
> that requires the GOT instead of the stack.

Yes.  My patch (attached to that e-mail message) `fixes' the problem
in a really ugly way by forcing egcs to use the stack, not the GOT.

> 	Does GCC decide early on that it will construct the constant in
> the GOT pool and load from there but then doesn't have the GOT
> handy? 

No, that's why the problem is so rare.  GCC can only see that it is
going to use the constant pool once it has allocated registers; before
then, it looks like it would load the constants into a general
register which it can do with immediate operands.

> In these cases is there some way to convince GCC to reload the value
> directly via an alternative method instead of from the GOT slot?

That is what my patch does: it forces reload to try some other
alternative.  My patch has to add an extra hook to force this; H.J.
suggested `CONST_DOUBLE_OK_FOR_LETTER_P' but that seems to be
something else (the name is good, but it isn't called when the letter
is 'm' or 'o' :-).  I don't dare to change the meaning of something
like that because it would surely break other ports.

I have now built glibc and tcl 8.0.2 with my patch (with that abort()
commented out), and current glibc, and nothing seems to have gone
wrong.  However, I have spotted one bug; it should be

#define AVOID_RELOAD_CONST_MEM(X) (flag_pic == 1 && too_late_for_got)

although this doesn't affect correctness, just code quality.

In fact, I think you could possibly eliminate all the too_late_for_got
changes altogether; the macro is only used at points where it is
already too late.
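
The hook pattern, with the corrected condition, can be tried outside
the compiler.  This is a sketch under assumptions: the two macro
definitions follow the patch and the correction above, while
const_may_go_to_memory is an invented stand-in for the test in
find_reloads.  In GCC, flag_pic is 1 for -fpic and 2 for -fPIC, so the
corrected macro only fires for -fpic.

```c
#include <assert.h>
#include <stdbool.h>

int flag_pic;           /* 0: no PIC, 1: -fpic, 2: -fPIC */
int too_late_for_got;   /* set once the GOT initialisation is placed */

/* Target definition (as it would appear in rs6000.h, corrected). */
#define AVOID_RELOAD_CONST_MEM(X) (flag_pic == 1 && too_late_for_got)

/* Generic fallback (as in reload.c); unused here because the target
   has already defined the macro. */
#ifndef AVOID_RELOAD_CONST_MEM
#define AVOID_RELOAD_CONST_MEM(X) 0
#endif

/* Invented stand-in for the find_reloads test: may a constant operand
   be forced into the constant pool? */
bool const_may_go_to_memory(bool is_constant, bool is_high)
{
    assert(flag_pic >= 0 && flag_pic <= 2);
    return is_constant && !is_high && !AVOID_RELOAD_CONST_MEM(0);
}
```

Ports that do not define the macro get the fallback and behave exactly
as before, which is the point of adding a new hook rather than
changing an existing one.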

-- 
Geoffrey Keating <geoffk@ozemail.com.au>


* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-20  0:51   ` Painful problems with -fpic implementation on powerpc-sysv Geoff Keating
  1998-08-20  9:57     ` H.J. Lu
  1998-08-20 15:53     ` David Edelsohn
@ 1998-08-21 12:51     ` David Edelsohn
  1998-08-24 18:48       ` Joern Rennecke
  1998-08-26  0:13       ` Jeffrey A Law
  1998-08-26  3:53     ` Jeffrey A Law
  3 siblings, 2 replies; 29+ messages in thread
From: David Edelsohn @ 1998-08-21 12:51 UTC (permalink / raw)
  To: Geoff Keating; +Cc: egcs, Franz Sirl

>>>>> Geoff Keating writes:

Geoff> But the first insn, number 13, is a memory reference.  It doesn't look
Geoff> like it now, but pseudo 84 will be allocated to a floating-point
Geoff> register, and the only way to load a constant into a FP register is
Geoff> from memory.  So reload will push the const_double out to a constant
Geoff> memory location.  The load from memory will require the GOT pointer.
Geoff> But the GOT pointer is not initialised until the _next_ insn.

	The movdf define_splits to load a const_double into an FPR
currently have "... && reload_completed" as a final condition.  Would
changing this to "... && (reload_in_progress || reload_completed)" have a
positive effect?  This might allow reload to create the constant in GPRs
and move it via the stack instead of creating a GOT symbol, depending on
what reload is trying to accomplish when these failures are occurring.

David



* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-21 12:51       ` Geoff Keating
@ 1998-08-21 13:17         ` David Edelsohn
  1998-08-23  7:45           ` Geoff Keating
  0 siblings, 1 reply; 29+ messages in thread
From: David Edelsohn @ 1998-08-21 13:17 UTC (permalink / raw)
  To: Geoff Keating; +Cc: egcs, Franz.Sirl-kernel, law

>>>>> Geoff Keating writes:

Geoff> Yes.  My patch (attached to that e-mail message) `fixes' the problem
Geoff> in a really ugly way by forcing egcs to use the stack, not the GOT.

Geoff> That is what my patch does: it forces reload to try some other
Geoff> alternative.  My patch has to add an extra hook to force this; H.J
Geoff> suggested `CONST_DOUBLE_OK_FOR_LETTER_P' but that seems to be
Geoff> something else (the name is good, but it isn't called when the letter
Geoff> is 'm' or 'o' :-).  I don't dare to change the meaning of something
Geoff> like that because it would surely break other ports.

	I wouldn't have thought that changes to reload.c itself were
necessary to accomplish that.  The PowerPC port already materializes FP
constants in GPRs and moves them through the stack.  The FPMEM "register"
also exists and could perhaps be pulled into service for this, or maybe
SECONDARY_MEMORY_NEEDED, which also tells reload that it needs an extra
stack slot for moves between FPRs and anything else.

	I am not sure when REGISTER_MOVE_COST is used, but possibly it is
encouraging the backend to obtain the value from a symbol instead of
materializing it in a GPR and moving it.

	There are just so many hooks already in place to tell the backend
that it needs an additional stack slot at a late stage, that I am
surprised the changes to reload.c are necessary.

David


* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-23  7:45           ` Geoff Keating
@ 1998-08-23  7:45             ` Franz Sirl
  1998-08-23 23:15               ` Geoff Keating
  1998-08-23 19:12             ` David Edelsohn
  1 sibling, 1 reply; 29+ messages in thread
From: Franz Sirl @ 1998-08-23  7:45 UTC (permalink / raw)
  To: Geoff Keating; +Cc: egcs, Franz.Sirl-kernel, law

On Sun, 23 Aug 1998, Geoff Keating wrote:
>> Cc: egcs@cygnus.com, Franz.Sirl-kernel@lauterbach.com, law@cygnus.com
>> Date: Fri, 21 Aug 1998 11:27:39 -0400
>> From: David Edelsohn <dje@watson.ibm.com>
>> X-UIDL: 9c6f676aff83562f25614b7bea1441f5
>> 
>> >>>>> Geoff Keating writes:
>> 
>> Geoff> Yes.  My patch (attached to that e-mail message) `fixes' the problem
>> Geoff> in a really ugly way by forcing egcs to use the stack, not the GOT.
>> 
>> Geoff> That is what my patch does: it forces reload to try some other
>> Geoff> alternative.  My patch has to add an extra hook to force this; H.J
>> Geoff> suggested `CONST_DOUBLE_OK_FOR_LETTER_P' but that seems to be
>> Geoff> something else (the name is good, but it isn't called when the letter
>> Geoff> is 'm' or 'o' :-).  I don't dare to change the meaning of something
>> Geoff> like that because it would surely break other ports.
>> 
>> 	I wouldn't have thought that changes to reload.c itself were
>> necessary to accomplish that.  The PowerPC port already materializes FP
>> constants in GPRs and moves them through the stack. 
>...
>> 	There are just so many hooks already in place to tell the backend
>> that it needs an additional stack slot at a late stage, that I am
>> surprised the changes to reload.c are necessary.
>
>That's why it's really ugly :-).
>
>The problem is not that reload can't load the value using GPRs; it
>can, my patch relies on it (well, there was a small bug that had to be
>fixed).  It's just that reload _won't_, because it is less efficient.
>
>Reload sees that its choices are either:
>
>- load immediate value in GPRs, store value to memory, load FPR back
>  from memory;  or
>- load FPR directly from memory.
>
>Naturally, it chooses the second.
>
>
>I have discovered another case when reload can generate new symbol_ref
>references.  Consider the attached test case, compiled with '-O -fpic'.

Hmm, is this one different from the one included in the testsuite
(gcc.dg/980523-1.c), which is not fixed by your patch? I can confirm that
egcs-1.1 bootstraps and tests fine with your patch, gcc.dg/980526-1.c is fixed.

Franz.

PS: 
did you finish your thesis?


* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-21 13:17         ` David Edelsohn
@ 1998-08-23  7:45           ` Geoff Keating
  1998-08-23  7:45             ` Franz Sirl
  1998-08-23 19:12             ` David Edelsohn
  0 siblings, 2 replies; 29+ messages in thread
From: Geoff Keating @ 1998-08-23  7:45 UTC (permalink / raw)
  To: dje; +Cc: egcs, Franz.Sirl-kernel, law

> Cc: egcs@cygnus.com, Franz.Sirl-kernel@lauterbach.com, law@cygnus.com
> Date: Fri, 21 Aug 1998 11:27:39 -0400
> From: David Edelsohn <dje@watson.ibm.com>
> X-UIDL: 9c6f676aff83562f25614b7bea1441f5
> 
> >>>>> Geoff Keating writes:
> 
> Geoff> Yes.  My patch (attached to that e-mail message) `fixes' the problem
> Geoff> in a really ugly way by forcing egcs to use the stack, not the GOT.
> 
> Geoff> That is what my patch does: it forces reload to try some other
> Geoff> alternative.  My patch has to add an extra hook to force this; H.J
> Geoff> suggested `CONST_DOUBLE_OK_FOR_LETTER_P' but that seems to be
> Geoff> something else (the name is good, but it isn't called when the letter
> Geoff> is 'm' or 'o' :-).  I don't dare to change the meaning of something
> Geoff> like that because it would surely break other ports.
> 
> 	I wouldn't have thought that changes to reload.c itself were
> necessary to accomplish that.  The PowerPC port already materializes FP
> constants in GPRs and moves them through the stack. 
...
> 	There are just so many hooks already in place to tell the backend
> that it needs an additional stack slot at a late stage, that I am
> surprised the changes to reload.c are necessary.

That's why it's really ugly :-).

The problem is not that reload can't load the value using GPRs; it
can, my patch relies on it (well, there was a small bug that had to be
fixed).  It's just that reload _won't_, because it is less efficient.

Reload sees that its choices are either:

- load immediate value in GPRs, store value to memory, load FPR back
  from memory;  or
- load FPR directly from memory.

Naturally, it chooses the second.
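
As a toy model of that choice (not reload's real cost machinery; the
struct, the function and the cost numbers are all made up for
illustration), picking the cheapest alternative and then vetoing the
GOT-dependent one looks like this:

```c
#include <assert.h>
#include <stddef.h>

struct reload_alternative {
    const char *name;
    int cost;        /* made-up relative cost, lower is better */
    int needs_got;   /* nonzero if the alternative loads via the GOT */
};

/* Return the index of the cheapest usable alternative, or -1 if none.
   When got_unavailable is nonzero, alternatives that need the GOT are
   skipped, which is roughly what the extra hook forces. */
int pick_alternative(const struct reload_alternative *alts, size_t n,
                     int got_unavailable)
{
    int best = -1;
    assert(alts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (got_unavailable && alts[i].needs_got)
            continue;
        if (best < 0 || alts[i].cost < alts[best].cost)
            best = (int)i;
    }
    return best;
}
```

With one entry for "GPRs, then via the stack" at cost 3 and one for
"load from the constant pool" at cost 1 that needs the GOT, the second
wins unless got_unavailable is set.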


I have discovered another case when reload can generate new symbol_ref
references.  Consider the attached test case, compiled with '-O -fpic'.

After CSE and local register allocation, doit() looks like this:

;; Start of basic block 0, registers live: 1 [1] 3 [3] 4 [4] 5 [5] 6 [6] 31 [31] 93
(insn 238 2 4 (set (reg:SI 128)
        (unspec[ 
                (const_int 0)
            ]  7)) 512 {init_v4_pic} (nil)
    (nil))

...  [lots of unrelated stuff.]

(insn 284 281 286 (set (reg:SI 117)
        (unspec[ 
                (symbol_ref:SI ("@h_malloc"))
                (reg:SI 128)
            ]  8)) 398 {*movsi_got_internal} (insn_list 238 (nil))
    (expr_list:REG_DEAD (reg:SI 128)
        (expr_list:REG_EQUIV (symbol_ref:SI ("@h_malloc"))
            (nil))))

... [three more insns]

;; End of basic block 0

(note 50 289 56 "" NOTE_INSN_LOOP_BEG)

;; Start of basic block 1, registers live: 1 [1] 31 [31] 81 82 83 84 86 88 89 91 92 93 105 117 119 121 127

[the loop continues, using register 117.  CSE (or something) has
hoisted the load of h_malloc's address out of the loop.]

...

(insn 160 158 162 (set (reg:SI 120)
        (mem:SI (reg:SI 117))) 402 {movsi+1} (nil)
    (nil))

(insn 162 160 164 (set (reg:SI 3 r3)
        (ashift:SI (reg:SI 119)
            (reg/v:SI 81))) 179 {ashlsi3_no_power} (nil)
    (nil))

(call_insn 164 162 166 (parallel[ 
            (set (reg:SI 3 r3)
                (call (mem:SI (reg:SI 120))
                    (const_int 0)))
            (use (const_int 0))
            (clobber (scratch:SI))
        ] ) 497 {call_value_indirect_sysv} (insn_list 160 (insn_list 162 (nil)))
    (expr_list:REG_DEAD (reg:SI 120)
        (expr_list:REG_UNUSED (scratch:SI)
            (nil)))
    (expr_list (use (reg:SI 3 r3))
        (nil)))

[this is the call to *h_malloc.  It is the only other use of pseudo
117.]

Local alloc has decided to put pseudo 128 in hard reg 26---it can do
this because pseudo 128 is used only in block 0.

Now, pseudo 117 doesn't get a register in global alloc initially.  So
reload sees that register 117 is REG_EQUIV to
(symbol_ref ("@h_malloc")), and tries using that directly; this is
what it eventually decides to do:

;; Start of basic block 8, registers live: 1 [1] 81 82 83 84 85 86 87 88 89 91 93 103 105 107 117 119 121 125 127
...
(insn 297 158 160 (set (reg:SI 10 r10)
        (unspec[ 
                (symbol_ref:SI ("@h_malloc"))
                (reg:SI 26 r26)
            ]  8)) 398 {*movsi_got_internal} (nil)
    (nil))

(insn:HI 160 297 162 (set (reg:SI 0 r0)
        (mem:SI (reg:SI 10 r10))) 402 {movsi+1} (nil)
    (nil))
...
(insn 300 162 164 (set (reg:SI 65 lr)
        (reg:SI 0 r0)) 402 {movsi+1} (nil)
    (nil))

(call_insn:HI 164 300 166 (parallel[ 
            (set (reg:SI 3 r3)
                (call (mem:SI (reg:SI 65 lr))
                    (const_int 0)))
            (use (const_int 0))
            (clobber (reg:SI 65 lr))
        ] ) 497 {call_value_indirect_sysv} (insn_list 160 (insn_list 162 (nil)))
    (expr_list:REG_DEAD (reg:SI 0 r0)
        (expr_list:REG_UNUSED (reg:SI 65 lr)
            (nil)))
    (expr_list (use (reg:SI 3 r3))
        (nil)))


But look!  Insn 297 is now using register 26, which is only valid in
basic block 0.  Unfortunately, register 26 was allocated to a
different pseudo before reload by global alloc (it's the loop counter
'a'), and the generated code is bogus.

Now, like the const_double in the earlier case, the REG_EQUIV note is
correct; it's just that reload shouldn't be using it---and, before
anyone mentions it, I tried using LEGITIMATE_PIC_OPERAND_P, but it
does too much.  It's pretty easy to add an extra macro for reload to
also prevent this case, but I'd like to try to find a general solution
that convinces me I haven't missed yet another case.

[The example is from kaffe-1.0b1, but it turns out that linuxthreads
in glibc also suffers from this.]

-- 
Geoffrey Keating <geoffk@ozemail.com.au>

===File ~/gcc-bugs/test11.c=================================
typedef struct _huft {
  unsigned 	e;
  struct _huft* t;
} huft;

huft*   (*h_malloc)(unsigned);

static void
doit( unsigned p2, unsigned p3, huft *p4, huft** t)
{
  unsigned a, h, j, k, w;
  unsigned *xp, *l;
  unsigned lx[2*7+1], v[2*2];
  huft *q;
  huft r;
  huft *u[2];

  memset(v, 0, sizeof(v));
  memset(lx, 0, sizeof(lx));

  w = 0;
  q = 0;
  h = 0;
  l = lx+1;
  
  for (k=1; k <= 2; k++)
    for (a=0; a < 7; a++)
    {
      while (k > w)
      {
        w = l[h++];   

	j = w;
        if (w > a + 1)
        {
          xp = v + k;
          while (j < 2 && w <= xp[j])
	    ++j;
        }
        l[h] = j;        

        q = h_malloc(1<<p2);
        *t = q;
        t = &(q->t);
	*t = 0;
	u[k-1] = q;
      }

      r.e = p3;
      r.t = p4;
      q[0] = r;
    }
}

huft *is_ok(unsigned x) 
{
  exit(0);
}

int main() { h_malloc = is_ok; doit(0, 0, 0, 0); abort(); }

 
============================================================


* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-23  7:45           ` Geoff Keating
  1998-08-23  7:45             ` Franz Sirl
@ 1998-08-23 19:12             ` David Edelsohn
  1998-08-26  0:13               ` Jeffrey A Law
  1 sibling, 1 reply; 29+ messages in thread
From: David Edelsohn @ 1998-08-23 19:12 UTC (permalink / raw)
  To: Geoff Keating; +Cc: egcs, Franz.Sirl-kernel, law

>>>>> Geoff Keating writes:

Geoff> Reload sees that its choices are either:

Geoff> - load immediate value in GPRs, store value to memory, load FPR back
Geoff> from memory;  or
Geoff> - load FPR directly from memory.

Geoff> Naturally, it chooses the second.

	That is exactly my point.  reload has another alternative and is
*choosing* the load from memory based upon some cost analysis.  I propose
changing the cost analysis when in reload so that the value is
materialized in GPRs and transferred via stack memory instead of a
symbol_ref. 

David


* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-23  7:45             ` Franz Sirl
@ 1998-08-23 23:15               ` Geoff Keating
  0 siblings, 0 replies; 29+ messages in thread
From: Geoff Keating @ 1998-08-23 23:15 UTC (permalink / raw)
  To: Franz.Sirl-kernel; +Cc: egcs, Franz.Sirl-kernel, law

> From: Franz Sirl <Franz.Sirl-kernel@lauterbach.com>
> Date: Sun, 23 Aug 1998 16:25:39 +0200

> Hmm, is this one different from the one included in the testsuite
> (gcc.dg/980523-1.c), which is not fixed by your patch? I can confirm
> that egcs-1.1 bootstraps and tests fine with your patch,
> gcc.dg/980526-1.c is fixed.

It's the same.

-- 
Geoffrey Keating <geoffk@ozemail.com.au>


* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-21 12:51     ` David Edelsohn
@ 1998-08-24 18:48       ` Joern Rennecke
  1998-08-26  0:13       ` Jeffrey A Law
  1 sibling, 0 replies; 29+ messages in thread
From: Joern Rennecke @ 1998-08-24 18:48 UTC (permalink / raw)
  To: David Edelsohn; +Cc: geoffk, egcs, Franz.Sirl-kernel

> 	The movdf define_splits to load a const_double into an FPR
> currently have "... && reload_completed" as a final condition.  Would
> changing this to "... && (reload_in_progress || reload_completed)" have a
> positive effect?  This might allow reload to create the constant in GPRs
> and move it via the stack instead of creating a GOT symbol, depending on
> what reload is trying to accomplish when these failures are occurring.

Reload doesn't try to do splits, hence changing the conditions of
define_splits should have no effect.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-21 12:51     ` David Edelsohn
  1998-08-24 18:48       ` Joern Rennecke
@ 1998-08-26  0:13       ` Jeffrey A Law
  1998-08-26 12:26         ` Richard Henderson
  1 sibling, 1 reply; 29+ messages in thread
From: Jeffrey A Law @ 1998-08-26  0:13 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Geoff Keating, egcs, Franz Sirl

  In message < 9808211609.AA36466@marc.watson.ibm.com >you write:
  > 	The movdf define_splits to load a const_double into an FPR
  > currently have "... && reload_completed" as a final condition.  Would
  > changing this to "... && (reload_in_progress || reload_completed)" have a
  > positive effect?  This might allow reload to create the constant in GPRs
  > and move it via the stack instead of creating a GOT symbol, depending on
  > what reload is trying to accomplish when these failures are occurring.
This could work for cases where one is trying to build up a constant,
but would not work if (for example) reload decided to try to put
(const (plus (symbol_ref) (offset))) into the constant pool.  Yes, this
does really happen and it is painful.

jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-23 19:12             ` David Edelsohn
@ 1998-08-26  0:13               ` Jeffrey A Law
  0 siblings, 0 replies; 29+ messages in thread
From: Jeffrey A Law @ 1998-08-26  0:13 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Geoff Keating, egcs, Franz.Sirl-kernel

  In message < 9808232356.AA36800@marc.watson.ibm.com >you write:
  > >>>>> Geoff Keating writes:
  > 
  > Geoff> Reload sees that its choices are either:
  > 
  > Geoff> - load immediate value in GPRs, store value to memory, load FPR back
  > Geoff> from memory;  or
  > Geoff> - load FPR directly from memory.
  > 
  > Geoff> Naturally, it chooses the second.
  > 
  > 	That is exactly my point.  reload has another alternative and is
  > *choosing* the load from memory based upon some cost analysis.  I propose
  > changing the cost analysis when in reload so that the value is
  > materialized in GPRs and transferred via stack memory instead of a
  > symbol_ref. 
Generally that does not work -- it's just a cost, and eventually the
costs will do something unexpected and you'll lose.  One of my all-time
losers was when a cost computation in reload overflowed and reload
thought using the 5-bit shift register on the PA was really cheap
instead of really expensive (and wrong if you've got a 32-bit value).

I would not recommend this approach.
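
To make the failure mode concrete, here is a minimal sketch of how a
wrapped cost can make the most expensive alternative look like the
cheapest.  This is not reload's actual code: the narrow `cost_t` type and
the helpers `add_penalty` and `cheapest` are hypothetical, chosen only so
the wrap-around is easy to see.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cost type, deliberately narrow so wrap-around is visible. */
typedef uint8_t cost_t;

/* Add a penalty to a base cost.  Unsigned arithmetic wraps modulo 256,
   so a large enough penalty produces a small result. */
static cost_t add_penalty(cost_t base, unsigned penalty)
{
    return (cost_t)(base + penalty);
}

/* Return the index of the cheapest alternative (first one wins ties). */
static int cheapest(const cost_t *costs, int n)
{
    int best = 0;

    assert(n > 0);
    for (int i = 1; i < n; i++)
        if (costs[i] < costs[best])
            best = i;
    return best;
}
```

With alternative 0 costed at `add_penalty(20, 0)` and alternative 1 at
`add_penalty(250, 10)`, the second cost wraps from 260 to 4, so
`cheapest` picks alternative 1 even though it was meant to be the
expensive one.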


jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-20  0:51   ` Painful problems with -fpic implementation on powerpc-sysv Geoff Keating
                       ` (2 preceding siblings ...)
  1998-08-21 12:51     ` David Edelsohn
@ 1998-08-26  3:53     ` Jeffrey A Law
  1998-08-26 11:57       ` David Edelsohn
  1998-08-26 12:26       ` Richard Henderson
  3 siblings, 2 replies; 29+ messages in thread
From: Jeffrey A Law @ 1998-08-26  3:53 UTC (permalink / raw)
  To: Geoff Keating; +Cc: egcs, Franz Sirl

  In message < 199808190652.QAA15243@geoffk.wattle.id.au >you write:
  > 
  > This is an analysis of two very nasty bugs with -fpic/-fPIC and
  > optimisation on powerpc SVR4 ABI targets.
  > 
  > Two testcases for the bugs are attached below, called 'test3.c' and
  > 'test4.c'.  The original files were from TclX 8.0.
  > 
  > If these are compiled with 'gcc -O2 -fpic test3.c -S', gcc fails with
  > an internal compiler error.  For test3.c, it's:
[ ... ]

  > The underlying problem for both of these is the same.
  > 
  > The powerpc -fpic implementation works by allocating a pseudo to hold
  > the GOT pointer, when required, (in the routine rs6000_got_register).
  > In rs6000_finalise_pic, the pseudo is initialised.  The result is used
  > in the movsi_got_internal insn.
  > 
  > There are two problems with this:
  > 
  > 1. rs6000_finalise_pic is run before reload.  But sometimes reload can
  >    create a new memory reference (see, for instance, test4.c).  So
  >    rs6000_finalise_pic does not know where all the memory references
  >    are, or even if there are any at all.
  > 
  > 2. Worse, rs6000_finalise_pic is actually run before scheduling, and
  >    so the scheduler can get the dependencies wrong if new memory
  >    references appear later.  For instance, in test3.c after local
  >    register allocation, there is:
First, having the PIC register be a pseudo is bad.  This is the core
problem that we need to fix.

We should select a callee saved register and reserve it for PIC.

This is non-optimal, but will work.  Trying to allocate the PIC reg
via a pseudo will not, because we cannot allocate a pseudo for the
PIC register during the reload pass (or anytime after flow).

I think addressing this core problem will go a long way towards fixing
the various PIC related issues on the powerpc.


Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-26  3:53     ` Jeffrey A Law
@ 1998-08-26 11:57       ` David Edelsohn
  1998-08-26 20:55         ` Jeffrey A Law
  1998-08-26 12:26       ` Richard Henderson
  1 sibling, 1 reply; 29+ messages in thread
From: David Edelsohn @ 1998-08-26 11:57 UTC (permalink / raw)
  To: law; +Cc: Geoff Keating, egcs, Franz Sirl, Michael Meissner

>>>>> Jeffrey A Law writes:

Jeff> We should select a callee saved register and reserve it for PIC.

	What does the SVR4 PPC ABI say about r2, which AIX uses for its
addressability TOC register?  I thought that it was "reserved" or
something.  It's fixed throughout the PowerPC port -- both AIX and
SVR4/eABI.  If not that one, then you run into conflicts with argument
registers or saved registers.

David

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-26  3:53     ` Jeffrey A Law
  1998-08-26 11:57       ` David Edelsohn
@ 1998-08-26 12:26       ` Richard Henderson
  1998-08-26 18:28         ` Jeffrey A Law
  1 sibling, 1 reply; 29+ messages in thread
From: Richard Henderson @ 1998-08-26 12:26 UTC (permalink / raw)
  To: law, Geoff Keating; +Cc: egcs, Franz Sirl

On Wed, Aug 26, 1998 at 01:06:45AM -0600, Jeffrey A Law wrote:
> First, having the PIC register be a pseudo is bad.  This is the core
> problem that we need to fix.

It would be really nice if we could solve this, as I have plans to
try exactly the same thing on x86, where added register pressure due
to losing 1 of 6 registers is horrible.

> This is non-optimal, but will work.  Trying to allocate the PIC reg
> via a pseudo will not because we can not allocate a pseudo for the
> PIC register during the reload pass (or anytime after flow).

A possible solution is not to create such a pseudo any longer, but
instead to reload the PIC value into a hardreg via secondary reloads.


r~

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-26  0:13       ` Jeffrey A Law
@ 1998-08-26 12:26         ` Richard Henderson
  1998-08-26 20:55           ` Jeffrey A Law
  0 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 1998-08-26 12:26 UTC (permalink / raw)
  To: law, David Edelsohn; +Cc: Geoff Keating, egcs, Franz Sirl

On Wed, Aug 26, 1998 at 01:11:02AM -0600, Jeffrey A Law wrote:
> This could work for cases where one is trying to build up a constant,
> but would not work if (for example) reload decided to try to put
> (const (plus (symbol_ref) (offset))) into the constant pool.  Yes, this
> does really happen and it is painful.

If this happens, it is a bug.  With -fpic, a symbol_ref is not a
constant, and will result in a run-time relocation to .rodata.
Not good.

This should instead be reloaded as

	(set tmp (mem:PI pic (symbol_ref)))
	(set tmp (plus:PI tmp (offset)))

splitting up the first as dictated by -fpic vs -fPIC.
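
The two-step shape of that reload can be sketched in C as a toy model.
Everything here is an assumption for illustration: `struct got_model`,
`load_symbol_address` and `symbol_plus_offset` are made-up names, and the
GOT is modelled as a plain array of addresses rather than anything the
linker actually produces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: the GOT is an array of addresses filled in at load time. */
struct got_model {
    const uintptr_t *slots;
    size_t nslots;
};

/* Step 1, (set tmp (mem pic (symbol_ref))): fetch the symbol's address
   from its GOT slot. */
static uintptr_t load_symbol_address(const struct got_model *got, size_t slot)
{
    assert(slot < got->nslots);
    return got->slots[slot];
}

/* Step 2, (set tmp (plus tmp (offset))): adjust the address in place.
   Nothing here needs a relocated constant in read-only memory. */
static uintptr_t symbol_plus_offset(const struct got_model *got,
                                    size_t slot, intptr_t offset)
{
    uintptr_t tmp = load_symbol_address(got, slot);

    tmp += (uintptr_t)offset;
    return tmp;
}
```

The point of the split is that the only memory read goes through the
GOT, which is already writable and relocated, and the offset is applied
afterwards with an ordinary add.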


r~

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-26 12:26       ` Richard Henderson
@ 1998-08-26 18:28         ` Jeffrey A Law
  0 siblings, 0 replies; 29+ messages in thread
From: Jeffrey A Law @ 1998-08-26 18:28 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Geoff Keating, egcs, Franz Sirl

  In message < 19980826121846.C1747@dot.cygnus.com >you write:
  > On Wed, Aug 26, 1998 at 01:06:45AM -0600, Jeffrey A Law wrote:
  > > First, having the PIC register be a pseudo is bad.  This is the core
  > > problem that we need to fix.
  > 
  > It would be really nice if we could solve this, as I have plans to
  > try exactly the same thing on x86, where added register pressure due
  > to losing 1 of 6 registers is horrible.
It's certainly more painful on the x86 due to the shortage of regs.

  > > This is non-optimal, but will work.  Trying to allocate the PIC reg
  > > via a pseudo will not because we can not allocate a pseudo for the
  > > PIC register during the reload pass (or anytime after flow).
  > 
  > A possible solution is not to create such a pseudo any longer, but
  > instead to reload the PIC value into a hardreg via secondary reloads.
But you've got to find a solution to the calls to force_const_mem
which stuff things into readonly memory and give back a symbol_ref.

Calls to force_const_mem can occur after we've counted all the reload
regs, set the elimination offsets, etc in the "big while loop" in
reload1.c.  Plus you've got to deal with the "can't allocate a new
reg after flow" problem too.


jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-26 20:55             ` Richard Henderson
@ 1998-08-26 20:55               ` Jeffrey A Law
  0 siblings, 0 replies; 29+ messages in thread
From: Jeffrey A Law @ 1998-08-26 20:55 UTC (permalink / raw)
  To: Richard Henderson; +Cc: David Edelsohn, Geoff Keating, egcs, Franz Sirl

  In message < 19980826191807.B27094@dot.cygnus.com >you write:
  > On Wed, Aug 26, 1998 at 07:29:57PM -0600, Jeffrey A Law wrote:
  > > Consider a target where it is impossible to synthesize the sequence
  > > at reload time.
  > 
  > Define "impossible to synthesize".  Ugly, yes.  I can also see
  > needing two or three reload regs.  But it should be possible, 
  > ignoring asms, and given that any one insn doesn't have more than
  > a couple inputs.
The case that comes up on the PA is function addresses.  A subset of
them you can synthesize in registers, but others you can't.  The
linker doesn't provide the necessary relocation support.

Right now we just punt and throw them all into the data section.

jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-26 20:55           ` Jeffrey A Law
@ 1998-08-26 20:55             ` Richard Henderson
  1998-08-26 20:55               ` Jeffrey A Law
  0 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 1998-08-26 20:55 UTC (permalink / raw)
  To: law, Richard Henderson; +Cc: David Edelsohn, Geoff Keating, egcs, Franz Sirl

On Wed, Aug 26, 1998 at 07:29:57PM -0600, Jeffrey A Law wrote:
> Consider a target where it is impossible to synthesize the sequence
> at reload time.

Define "impossible to synthesize".  Ugly, yes.  I can also see
needing two or three reload regs.  But it should be possible, 
ignoring asms, and given that any one insn doesn't have more than
a couple inputs.

In any case, from what little I know about ppc, I wouldn't think
there is such a thing, at least for pic in the 32-bit sysv abi:

 * Having the pic register somewhere is non-negotiable.  You need
   it for either reading the actual data or for reading the address
   from constant memory.  We ought to be able to emit this 
   two-insn sequence just about anywhere.

 * From there, one can continue modifying the address in-place.


r~

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-26 12:26         ` Richard Henderson
@ 1998-08-26 20:55           ` Jeffrey A Law
  1998-08-26 20:55             ` Richard Henderson
  0 siblings, 1 reply; 29+ messages in thread
From: Jeffrey A Law @ 1998-08-26 20:55 UTC (permalink / raw)
  To: Richard Henderson; +Cc: David Edelsohn, Geoff Keating, egcs, Franz Sirl

  In message < 19980826122503.D1747@dot.cygnus.com >you write:
  > On Wed, Aug 26, 1998 at 01:11:02AM -0600, Jeffrey A Law wrote:
  > > This could work for cases where one is trying to build up a constant,
  > > but would not work if (for example) reload decided to try to put
  > > (const (plus (symbol_ref) (offset))) into the constant pool.  Yes, this
  > > does really happen and it is painful.
  > 
  > If this happens, it is a bug.  With -fpic, a symbol_ref is not a
  > constant, and will result in a run-time relocation to .rodata.
  > Not good.
Certainly not good.  But it happens.

Consider a target where it is impossible to synthesize the sequence
at reload time.  It can (and does) happen on the PA.  It has to go into
memory.  Of course, we don't want it in readonly memory because of
shared library issues, so we arrange to get it into the normal data
segment.

jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-26 11:57       ` David Edelsohn
@ 1998-08-26 20:55         ` Jeffrey A Law
  1998-08-27  0:40           ` Richard Henderson
                             ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Jeffrey A Law @ 1998-08-26 20:55 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Geoff Keating, egcs, Franz Sirl, Michael Meissner

  In message < 9808261614.AA32650@marc.watson.ibm.com >you write:
  > >>>>> Jeffrey A Law writes:
  > 
  > Jeff> We should select a callee saved register and reserve it for PIC.
  > 
  > 	What does the SVR4 PPC ABI say about r2 which AIX uses for its
  > addressability TOC register?  I thought that it was "reserved" or
  > something.  It's fixed throughout the PowerPC port -- both AIX and
  > SVR4/eABI.  If not that one, then you run into conflicts with argument
  > registers or saved registers.
It says:

  Register r2 is reserved for system use and should not be changed by
  application code.

I don't know if that gives us the leeway we need to claim it.

jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-26 20:55         ` Jeffrey A Law
  1998-08-27  0:40           ` Richard Henderson
@ 1998-08-27  0:40           ` David Edelsohn
  1998-08-27 22:21             ` Geoff Keating
  1998-08-27  6:18           ` Rask Ingemann Lambertsen
  2 siblings, 1 reply; 29+ messages in thread
From: David Edelsohn @ 1998-08-27  0:40 UTC (permalink / raw)
  To: law; +Cc: Geoff Keating, egcs, Franz Sirl, Michael Meissner

>>>>> Jeffrey A Law writes:

Jeff> It says:
Jeff> Register r2 is reserved for system use and should not be changed by
Jeff> application code.

Jeff> I don't know if that gives us the leeway we need to claim it.

	Yeah, that's sort of what I remembered.  I have no idea what that
means, though I would think that GOT addressability comes under system use
and not application code.

	I would defer to Meissner, as he has been involved with the PPC
eABI, which is a variant of PPC SVR4, so he should know how the committee
intended r2 to be used.  If the ABI intended "system use" to mean
something else and r2 is not available, then I guess r30 would be the
next best choice: it is used as the AIX "minimal-TOC" alternate TOC
register, which has similar requirements for an addressability register
allocated by the compiler.

David

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-26 20:55         ` Jeffrey A Law
@ 1998-08-27  0:40           ` Richard Henderson
  1998-08-27 12:32             ` David Edelsohn
  1998-08-27  0:40           ` David Edelsohn
  1998-08-27  6:18           ` Rask Ingemann Lambertsen
  2 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 1998-08-27  0:40 UTC (permalink / raw)
  To: law, David Edelsohn; +Cc: Geoff Keating, egcs, Franz Sirl, Michael Meissner

On Wed, Aug 26, 1998 at 08:01:03PM -0600, Jeffrey A Law wrote:
>   Register r2 is reserved for system use and should not be changed by
>   application code.
> 
> I don't know if that gives us the leeway we need to claim it.

I doubt it.  The Sparc ABI uses the same language for a register
Solaris uses for the thread context pointer.

Interesting to know, though.  Geoff, thought about using this in
glibc for THREAD_SELF?


r~

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-26 20:55         ` Jeffrey A Law
  1998-08-27  0:40           ` Richard Henderson
  1998-08-27  0:40           ` David Edelsohn
@ 1998-08-27  6:18           ` Rask Ingemann Lambertsen
  2 siblings, 0 replies; 29+ messages in thread
From: Rask Ingemann Lambertsen @ 1998-08-27  6:18 UTC (permalink / raw)
  To: Jeffrey A Law; +Cc: meissner, dje


On 27-Aug-98 04:01:03, Jeffrey A Law wrote the following about "Re: Painful problems with -fpic implementation on powerpc-sysv":
> It says:

>   Register r2 is reserved for system use and should not be changed by
>   application code.

> I don't know if that gives us the leeway we need to claim it.

   I asked a PowerPC kernel developer, and his understanding is that r2 is
reserved for the OS, not for the compiler.

Regards,

/¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯T¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯\
| Rask Ingemann Lambertsen       | E-mail: mailto:rask@kampsax.k-net.dk  |
| Registered Phase5 developer    | WWW: http://www.gbar.dtu.dk/~c948374/ |
| A4000, 775 kkeys/s (RC5-64)    | "ThrustMe" on XPilot, ARCnet and IRC  |
|      If there is something I can't stand, it's intolerant people       |


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-27  0:40           ` Richard Henderson
@ 1998-08-27 12:32             ` David Edelsohn
  0 siblings, 0 replies; 29+ messages in thread
From: David Edelsohn @ 1998-08-27 12:32 UTC (permalink / raw)
  To: Richard Henderson; +Cc: law, Geoff Keating, egcs, Franz Sirl, Michael Meissner

>>>>> Richard Henderson writes:

Richard> On Wed, Aug 26, 1998 at 08:01:03PM -0600, Jeffrey A Law wrote:
>> Register r2 is reserved for system use and should not be changed by
>> application code.
>> 
>> I don't know if that gives us the leeway we need to claim it.

Richard> I doubt it.  The Sparc ABI uses the same language for a register
Richard> Solaris uses for the thread context pointer.

Richard> Interesting to know, though.  Geoff, thought about using this in
Richard> glibc for THREAD_SELF?

	The AIX 64-bit ABI has allocated GPR13 for thread-self.  The original
AIX 32-bit ABI works around not having a thread-local data pointer.

David


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Painful problems with -fpic implementation on powerpc-sysv
  1998-08-27  0:40           ` David Edelsohn
@ 1998-08-27 22:21             ` Geoff Keating
  0 siblings, 0 replies; 29+ messages in thread
From: Geoff Keating @ 1998-08-27 22:21 UTC (permalink / raw)
  To: dje; +Cc: law, egcs, Franz.Sirl-kernel, meissner

> Date: Wed, 26 Aug 1998 22:57:28 -0400
> From: David Edelsohn <dje@watson.ibm.com>

> >>>>> Jeffrey A Law writes:
> 
> Jeff> It says:
> Jeff> Register r2 is reserved for system use and should not be changed by
> Jeff> application code.
> 
> Jeff> I don't know if that gives us the leeway we need to claim it.
> 
> 	Yeah, that's sort of what I remembered.  I have no idea what that
> means though I would think that GOT addressability comes under system use
> and not application code.

In this context, 'system use' is the kernel and libc.  'applications'
are ELF executables and shared libraries that don't come with the
system.  For instance, it's allowed for the libc to clobber r2, which
might happen if AIX was doing a sysv ABI emulation; or to use r2 as a
thread ID, which is what I think the old pre-2.0 glibc does.

-fPIC uses register 30.  But this is really expensive, because it
means that even a trivial routine that uses one global data item has
to have a stack frame and has to save two registers (r30 and r31).
One of the nice things about -fpic is that accesses to global data do
not necessarily cause a leaf routine to have a stack frame.
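
A minimal input for comparing the two modes might look like the
following.  The names `counter` and `next_count` are made up for
illustration.  Compiling it once with 'gcc -O2 -fpic -S' and once with
'gcc -O2 -fPIC -S' on a powerpc SVR4 target and comparing the prologues
should show whether a stack frame and register saves appear; no
particular assembly output is claimed here.

```c
/* A single global data item; under -fpic/-fPIC its address is found
   through the GOT rather than being known at link time. */
int counter = 41;

/* A trivial leaf routine: it makes no calls and touches one global,
   which is the case described above. */
int next_count(void)
{
    return counter + 1;
}
```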

> 	I would defer to Meissner as he has been involved with the PPC
> eABI which is a variant of PPC SVR4, so he should know how the committee
> intended r2 to be used.  If not r2, then I guess r30 would be the next
> best choice as that is used for the AIX "minimal-TOC" alternate TOC
> register which has similar requirements for an addressability register
> allocated by the compiler if the ABI intended "system use" to mean
> something else.

I believe that the EABI defines r2 for accesses of small read-only
constant data, which is important for embedded code (r13 is
used by both ABIs for writable small data).


I don't think it's that hard to get the current powerpc -fpic
implementation working properly.  At worst, it might be necessary to,
under some circumstances, generate a new GOT register on-the-fly and
explain to reload that certain kinds of reloads require the link
register---you can always get the address of any global data by writing

  bl  _GLOBAL_OFFSET_TABLE_@local
  mflr %0
  lwz  %0,something@got(%0)

which requires two registers, one for the destination (which reload
already knows about) and the link register.  In pseudo-RTL, this looks
like

(set (reg 65 lr) (unspec [(const_int 0)] 7))
(set (reg %0) (reg 65 lr))
(set (reg %0) (unspec [(symbol_ref "something") (reg %0)] 8))

at the moment.
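
As a rough C model of the register traffic in that sequence (a sketch
only: `struct regs` and `load_global_address` are hypothetical, and the
GOT is modelled as an array of addresses), note that the destination is
the only general register used, but the previous contents of LR are
lost:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy register file holding just the two registers the sequence touches. */
struct regs {
    uintptr_t lr;    /* link register, (reg 65 lr) in the RTL above */
    uintptr_t dest;  /* the destination register, %0 */
};

/* Model of the three-insn sequence.  `got` stands in for the GOT and
   `slot` for the something@got displacement. */
static void load_global_address(struct regs *r, const uintptr_t *got,
                                size_t slot)
{
    assert(r != NULL && got != NULL);
    r->lr = (uintptr_t)got;                       /* bl: LR now holds the GOT address */
    r->dest = r->lr;                              /* mflr %0 */
    r->dest = ((const uintptr_t *)r->dest)[slot]; /* lwz %0,something@got(%0) */
}
```

Whatever value LR held before the call is overwritten, which is why
reload would have to be told that this kind of reload needs the link
register as well as the destination.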

Of course, this is pretty expensive, requires two reload registers,
and I'm not sure how to explain it to reload.  But the need for it is
rare, and there are often (always?) other alternatives so it should
be possible to convince reload to do it very infrequently.

-- 
Geoffrey Keating <geoffk@ozemail.com.au>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Painful problems with -fpic implementation on powerpc-sysv
@ 1998-08-28 15:55 Michael Meissner
  0 siblings, 0 replies; 29+ messages in thread
From: Michael Meissner @ 1998-08-28 15:55 UTC (permalink / raw)
  To: dje, geoffk; +Cc: egcs, Franz.Sirl-kernel, law, meissner

| In this context, 'system use' is the kernel and libc.  'applications'
| are ELF executables and shared libraries that don't come with the
| system.  For instance, it's allowed for the libc to clobber r2, which
| might happen if AIX was doing a sysv ABI emulation; or to use r2 as a
| thread ID, which is what I think the old pre-2.0 glibc does.
|
| -fPIC uses register 30.  But this is really expensive, because it
| means that even a trivial routine that uses one global data item has
| to have a stack frame and has to save two registers (r30 and r31).
| One of the nice things about -fpic is that accesses to global data do
| not necessarily cause a leaf routine to have a stack frame.
|
| > 	I would defer to Meissner as he has been involved with the PPC
| > eABI which is a variant of PPC SVR4, so he should know how the committee
| > intended r2 to be used.  If not r2, then I guess r30 would be the next
| > best choice as that is used for the AIX "minimal-TOC" alternate TOC
| > register which has similar requirements for an addressability register
| > allocated by the compiler if the ABI intended "system use" to mean
| > something else.
|
| I believe that the EABI defines r2 for accesses of small read-only
| constant data, which is important for embedded code (r13 is
| used by both ABIs for writable small data).

Yep, and people use it (though one of our customers is using the -msdata=sysv
option, and using r2 as a thread pointer).

| I don't think it's that hard to get the current powerpc -fpic
| implementation working properly.  At worst, it might be necessary to,
| under some circumstances, generate a new GOT register on-the-fly and
| explain to reload that certain kinds of reloads require the link
| register---you can always get the address of any global data by writing
|
|   bl  _GLOBAL_OFFSET_TABLE_@local
|   mflr %0
|   lwz  %0,something@got(%0)
|
| which requires two registers, one for the destination (which reload
| already knows about) and the link register.  In pseudo-RTL, this looks
| like
|
| (set (reg 65 lr) (unspec [(const_int 0)] 7))
| (set (reg %0) (reg 65 lr))
| (set (reg %0) (unspec [(symbol_ref "something") (reg %0)] 8))
|
| at the moment.
|
| Of course, this is pretty expensive, requires two reload registers,
| and I'm not sure how to explain it to reload.  But the need for it is
| rare, and there are often (always?) other alternatives so it should
| be possible to convince reload to do it very infrequently.

Well, if it is rare, you could always use the same register for the load
register.  You still have the problem that LR is trashed, and reload might be
keeping something else in it (such as an address it is about to jump to).

^ permalink raw reply	[flat|nested] 29+ messages in thread
