public inbox for java@gcc.gnu.org
 help / color / mirror / Atom feed
* Exception causing insns in delay slots
@ 2002-04-25 21:41 David S. Miller
  2002-04-26  3:53 ` Jason Merrill
  2002-04-26 13:08 ` Mark Mitchell
  0 siblings, 2 replies; 18+ messages in thread
From: David S. Miller @ 2002-04-25 21:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: mark, java


[ Java folks, ignore the majority of this email, and please skip
  down to "Longer term:", item 2.  It is following up to the posting
  I made earlier this evening, specifically:

     http://gcc.gnu.org/ml/java/2002-04/msg00357.html

  that posting. ]

This patch fixes half of PR target/6422, the other change needed to
close that bug is in libjava and will go to java-patches shortly.

Currently, reorg will happily put exception causing instructions
into delay slots.

This causes major problems when convert_to_eh_region_ranges
runs and hits such a delay instruction.

The problematic case we end up with is a SEQUENCE containing a branch
and a delay slot exception causing instruction, in that order.

In convert_to_eh_region_ranges we try to emit:

1) NOTE_INSN_EH_REGION_BEG before the second insn in the
   SEQUENCE.

2) NOTE_INSN_EH_REGION_END after the second insn in the
   SEQUENCE.

#2 goes ok, the note is put after the SEQUENCE itself.

However, #1 is completely mishandled, this label is placed after the
SEQUENCE too.  The culprit is the SEQUENCE handling code in
emit-etl.c:add_insn_before().

The SEQUENCE handling of add_insn_before() has many problems, in fact
if you try to emit something before the first insn of a SEQUENCE this
code will create a loop in the insn chain!

However, after some discussion with Richard, it is just as wrong to
put exception causing instructions into delay slots by default.  I
also believe that this is the safest way to fix this bug at this time.

Thus, the fix implemented is to disallow reorg.c to consider exception
causing instructions for delay slots.

Longer term:

1) add_insn_before's SEQUENCE handling needs fixing or it needs to
   signal an assertion failure when something is attempted inside of
   a SEQUENCE.  The current behavior is just wrong. :-)

2) We might try to let reorg.c put exception causing instructions
   in delay slots for targets that can handle it properly.

   After some analysis of how unwinding and throw work, I think it
   may not even be supportable on Sparc.  The reason has to do with
   a combination of two things:

   a) How unwind-dw2.c looks for FDEs at "context->ra - 1" instead of
      just "context->ra"

   b) How on delay slot architectures "context->ra - 1" is not
      necessarily the previous instruction executed.

   Basically, for signal frame dwarf2 unwind to work in the current
   framework the signal handler advances the program counter past
   the exception causing instruction and that value minus one has to
   be within the exception causing instruction.

   I made a posting to java@gcc.gnu.org earlier this evening before
   realizing this point.  Sorry Java people for jumping the gun
   and possibly scaring everyone :-)  However my other point about
   how unwind-dwarf2 adjusts the program counter still stands.

   It appears Boehm has caught the errors in my initial analysis
   already :-)

Mark, ok for 3.1 branch?

2002-04-25  David S. Miller  <davem@redhat.com>

	PR target/6422
	* reorg.c (optimize_skip): Do not allow exception causing
	instructions to be considered for delay slots.
	(fill_simply_delay_slots, fill_slots_from_thread): Likewise.
	(relax_delay_slots): Do not try to consider exception causing
	instructions as redundant.

--- reorg.c.~1~	Thu Apr 25 21:05:14 2002
+++ reorg.c	Thu Apr 25 21:16:15 2002
@@ -750,7 +750,8 @@ optimize_skip (insn)
       || GET_CODE (PATTERN (trial)) == SEQUENCE
       || recog_memoized (trial) < 0
       || (! eligible_for_annul_false (insn, 0, trial, flags)
-	  && ! eligible_for_annul_true (insn, 0, trial, flags)))
+	  && ! eligible_for_annul_true (insn, 0, trial, flags))
+      || can_throw_internal (trial))
     return 0;
 
   /* There are two cases where we are just executing one insn (we assume
@@ -2127,7 +2128,8 @@ fill_simple_delay_slots (non_jumps_p)
 	  && GET_CODE (trial) == JUMP_INSN
 	  && simplejump_p (trial)
 	  && eligible_for_delay (insn, slots_filled, trial, flags)
-	  && no_labels_between_p (insn, trial))
+	  && no_labels_between_p (insn, trial)
+	  && ! can_throw_internal (trial))
 	{
 	  rtx *tmp;
 	  slots_filled++;
@@ -2197,7 +2199,7 @@ fill_simple_delay_slots (non_jumps_p)
 		  /* Can't separate set of cc0 from its use.  */
 		  && ! (reg_mentioned_p (cc0_rtx, pat) && ! sets_cc0_p (pat))
 #endif
-		  )
+		  && ! can_throw_internal (trial))
 		{
 		  trial = try_split (pat, trial, 1);
 		  next_trial = prev_nonnote_insn (trial);
@@ -2273,7 +2275,7 @@ fill_simple_delay_slots (non_jumps_p)
 
 	     Presumably, we should also check to see if we could get
 	     back to this function via `setjmp'.  */
-	  && !can_throw_internal (insn)
+	  && ! can_throw_internal (insn)
 	  && (GET_CODE (insn) != JUMP_INSN
 	      || ((condjump_p (insn) || condjump_in_parallel_p (insn))
 		  && ! simplejump_p (insn)
@@ -2340,7 +2342,8 @@ fill_simple_delay_slots (non_jumps_p)
 #endif
 		    && ! (maybe_never && may_trap_p (pat))
 		    && (trial = try_split (pat, trial, 0))
-		    && eligible_for_delay (insn, slots_filled, trial, flags))
+		    && eligible_for_delay (insn, slots_filled, trial, flags)
+		    && ! can_throw_internal(trial))
 		  {
 		    next_trial = next_nonnote_insn (trial);
 		    delay_list = add_to_delay_list (trial, delay_list);
@@ -2392,7 +2395,8 @@ fill_simple_delay_slots (non_jumps_p)
 #endif
 	      && ! (maybe_never && may_trap_p (PATTERN (next_trial)))
 	      && (next_trial = try_split (PATTERN (next_trial), next_trial, 0))
-	      && eligible_for_delay (insn, slots_filled, next_trial, flags))
+	      && eligible_for_delay (insn, slots_filled, next_trial, flags)
+	      && ! can_throw_internal (trial))
 	    {
 	      rtx new_label = next_active_insn (next_trial);
 
@@ -2496,7 +2500,7 @@ fill_simple_delay_slots (non_jumps_p)
 	  /* Don't want to mess with cc0 here.  */
 	  && ! reg_mentioned_p (cc0_rtx, pat)
 #endif
-	  )
+	  && ! can_throw_internal (trial))
 	{
 	  trial = try_split (pat, trial, 1);
 	  if (ELIGIBLE_FOR_EPILOGUE_DELAY (trial, slots_filled))
@@ -2637,7 +2641,7 @@ fill_slots_from_thread (insn, condition,
 	  && ! (reg_mentioned_p (cc0_rtx, pat)
 		&& (! own_thread || ! sets_cc0_p (pat)))
 #endif
-	  )
+	  && ! can_throw_internal (trial))
 	{
 	  rtx prior_insn;
 
@@ -2874,8 +2878,10 @@ fill_slots_from_thread (insn, condition,
       trial = new_thread;
       pat = PATTERN (trial);
 
-      if (GET_CODE (trial) != INSN || GET_CODE (pat) != SET
-	  || ! eligible_for_delay (insn, 0, trial, flags))
+      if (GET_CODE (trial) != INSN
+	  || GET_CODE (pat) != SET
+	  || ! eligible_for_delay (insn, 0, trial, flags)
+	  || can_throw_internal (trial))
 	return 0;
 
       dest = SET_DEST (pat), src = SET_SRC (pat);
@@ -3286,7 +3292,8 @@ relax_delay_slots (first)
 	     insn, redirect the jump to the following insn process again.  */
 	  trial = next_active_insn (target_label);
 	  if (trial && GET_CODE (PATTERN (trial)) != SEQUENCE
-	      && redundant_insn (trial, insn, 0))
+	      && redundant_insn (trial, insn, 0)
+	      && ! can_throw_internal (trial))
 	    {
 	      rtx tmp;
 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exception causing insns in delay slots
  2002-04-25 21:41 Exception causing insns in delay slots David S. Miller
@ 2002-04-26  3:53 ` Jason Merrill
  2002-04-26  4:15   ` David S. Miller
  2002-04-26 13:08 ` Mark Mitchell
  1 sibling, 1 reply; 18+ messages in thread
From: Jason Merrill @ 2002-04-26  3:53 UTC (permalink / raw)
  To: David S. Miller; +Cc: gcc-patches, mark, java

>>>>> "David" == David S Miller <davem@redhat.com> writes:

>    After some analysis of how unwinding and throw work, I think it
>    may not even be supportable on Sparc.  The reason has to do with
>    a combination of two things:

>    a) How unwind-dw2.c looks for FDEs at "context->ra - 1" instead of
>       just "context->ra"

>    b) How on delay slot architectures "context->ra - 1" is not
>       necessarily the previous instruction executed.

context->ra - 1 isn't supposed to be the previous instruction; it's
supposed to point somewhere inside the throwing instruction.

>    Basically, for signal frame dwarf2 unwind to work in the current
>    framework the signal handler advances the program counter past
>    the exception causing instruction and that value minus one has to
>    be within the exception causing instruction.

Sounds right to me.  What's the problem?

The dwarf2 unwind information does not know about delay slots; it assumes
that code is executed linearly, so it's important that the shuffling of
instructions for delay slot scheduling doesn't affect register save
information.  Other than that, it should be fine.

Jason

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exception causing insns in delay slots
  2002-04-26  3:53 ` Jason Merrill
@ 2002-04-26  4:15   ` David S. Miller
  2002-04-26  4:19     ` Jason Merrill
  0 siblings, 1 reply; 18+ messages in thread
From: David S. Miller @ 2002-04-26  4:15 UTC (permalink / raw)
  To: jason; +Cc: gcc-patches, mark, java

   From: Jason Merrill <jason@redhat.com>
   Date: Fri, 26 Apr 2002 11:43:39 +0100

   context->ra - 1 isn't supposed to be the previous instruction; it's
   supposed to point somewhere inside the throwing instruction.
   
Aha, that makes sense.

   >    Basically, for signal frame dwarf2 unwind to work in the current
   >    framework the signal handler advances the program counter past
   >    the exception causing instruction and that value minus one has to
   >    be within the exception causing instruction.
   
   Sounds right to me.  What's the problem?
   
No problem, just didn't understand how it was supposed
to work I guess.

   The dwarf2 unwind information does not know about delay slots; it assumes
   that code is executed linearly, so it's important that the shuffling of
   instructions for delay slot scheduling doesn't affect register save
   information.  Other than that, it should be fine.
   
Ok, this seems to doubly confirm the need for my reorg fix.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exception causing insns in delay slots
  2002-04-26  4:15   ` David S. Miller
@ 2002-04-26  4:19     ` Jason Merrill
  2002-04-26  4:45       ` David S. Miller
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Merrill @ 2002-04-26  4:19 UTC (permalink / raw)
  To: David S. Miller; +Cc: gcc-patches, mark, java, dave.anglin

>>>>> "David" == David S Miller <davem@redhat.com> writes:

>    The dwarf2 unwind information does not know about delay slots; it assumes
>    that code is executed linearly, so it's important that the shuffling of
>    instructions for delay slot scheduling doesn't affect register save
>    information.  Other than that, it should be fine.
   
> Ok, this seems to doubly confirm the need for my reorg fix.

I wouldn't expect trapping instructions to have any unwind information
impact, except that the unwind info needs to be correct when we reach them.
This would only be a problem if the trapping insn is scheduled in the
delay slot of an insn which adjusts the stack.  I could imagine that being
true of some sort of combined call-push insn, but I'm not aware of any such
beast, certainly not on a target with delay slots.

The problem with delay slots and dwarf2 has more to do with scheduling a
stack adjustment into the delay slot of a call which throws.  Some
discussion of this WRT the PA:

  http://gcc.gnu.org/ml/gcc-patches/2002-01/msg00209.html
  http://gcc.gnu.org/ml/gcc-patches/2002-04/msg00169.html

the upshot of which seems to be that the PA now suppresses scheduling of
prologue insns into the body to avoid the above.  I would think that a
better fix long term would be to recognize delay slot patterns in
dwarf2out.c, and emit dwarf2 info which pretends that the adjustment in the
delay slot actually happens after the instruction numerically preceding the
call.

Jason

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exception causing insns in delay slots
  2002-04-26  4:19     ` Jason Merrill
@ 2002-04-26  4:45       ` David S. Miller
  2002-04-26  4:51         ` Jason Merrill
  0 siblings, 1 reply; 18+ messages in thread
From: David S. Miller @ 2002-04-26  4:45 UTC (permalink / raw)
  To: jason; +Cc: gcc-patches, mark, java, dave.anglin

   From: Jason Merrill <jason@redhat.com>
   Date: Fri, 26 Apr 2002 12:15:30 +0100

   > Ok, this seems to doubly confirm the need for my reorg fix.
   
   I wouldn't expect trapping instructions to have any unwind information
   impact, except that the unwind info needs to be correct when we reach them.
   This would only be a problem if the trapping insn is scheduled in the
   delay slot of an insn which adjusts the stack.  I could imagine that being
   true of some sort of combined call-push insn, but I'm not aware of any such
   beast, certainly not on a target with delay slots.

We don't emit the notes properly.  When reorg puts an exception
causing instruction into a delay slot, and that instruction is the
only exception causing instruction in that particular exception
region, we currently don't emit the notes correctly.  My original
email of this thread shows the details around this.

That is the problem.

The safe fix for that is the reorg changes.

Even if it could emit the notes properly, we couldn't unwind to
the sucker correctly.  This is because of all the "current->ra - 1"
buisness and the assumption (which you mentioned) of the code stream
being linear.

Tell me which part isn't making sense and I'll try to elaborate
further.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exception causing insns in delay slots
  2002-04-26  4:45       ` David S. Miller
@ 2002-04-26  4:51         ` Jason Merrill
  2002-04-26  4:59           ` David S. Miller
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Merrill @ 2002-04-26  4:51 UTC (permalink / raw)
  To: David S. Miller; +Cc: gcc-patches, mark, java, dave.anglin

>>>>> "David" == David S Miller <davem@redhat.com> writes:

>    From: Jason Merrill <jason@redhat.com>
>    Date: Fri, 26 Apr 2002 12:15:30 +0100

>> Ok, this seems to doubly confirm the need for my reorg fix.
   
>    I wouldn't expect trapping instructions to have any unwind information
>    impact, except that the unwind info needs to be correct when we reach them.
>    This would only be a problem if the trapping insn is scheduled in the
>    delay slot of an insn which adjusts the stack.  I could imagine that being
>    true of some sort of combined call-push insn, but I'm not aware of any such
>    beast, certainly not on a target with delay slots.

> We don't emit the notes properly.  When reorg puts an exception
> causing instruction into a delay slot, and that instruction is the
> only exception causing instruction in that particular exception
> region, we currently don't emit the notes correctly.  My original
> email of this thread shows the details around this.

Right, I agree that your patch is a good short-term fix to the problem of
not properly finding the exception region for an insn in a delay slot, due
to the notes being in the wrong place.

> Even if it could emit the notes properly, we couldn't unwind to
> the sucker correctly.  This is because of all the "current->ra - 1"
> buisness and the assumption (which you mentioned) of the code stream
> being linear.

This is what I don't understand.  The current->ra - 1 business should give
us the trapping instruction, which is what we want.  The only way the
non-linearity would be a problem for a delayed trapping insn would be if
the unwind info were different before and after (numerically) the call insn,
which I don't think can happen.

Jason

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exception causing insns in delay slots
  2002-04-26  4:51         ` Jason Merrill
@ 2002-04-26  4:59           ` David S. Miller
  2002-04-26  5:03             ` Jason Merrill
  0 siblings, 1 reply; 18+ messages in thread
From: David S. Miller @ 2002-04-26  4:59 UTC (permalink / raw)
  To: jason; +Cc: gcc-patches, mark, java, dave.anglin

   From: Jason Merrill <jason@redhat.com>
   Date: Fri, 26 Apr 2002 12:45:48 +0100
   
   > Even if it could emit the notes properly, we couldn't unwind to
   > the sucker correctly.  This is because of all the "current->ra - 1"
   > buisness and the assumption (which you mentioned) of the code stream
   > being linear.
   
   This is what I don't understand.  The current->ra - 1 business should give
   us the trapping instruction, which is what we want.  The only way the
   non-linearity would be a problem for a delayed trapping insn would be if
   the unwind info were different before and after (numerically) the call insn,
   which I don't think can happen.

Consider a signal on a null pointer dereference in a delay
slot on Sparc.

Here is the sequence of events:

	b	label
EXCEPTION_BEGIN_LABEL:
	 ld	[%g0], %g2
EXCEPTION_END_LABEL:
...
label:	/* some more code goes here */

We trap, take a SIGSEGV, this lands us eventually in the
frame unwinder.  The kernel provides us with the two
program counter values PC (equal to the LOAD instruction)
and NPC (equal to 'label').

We have to advance PC, and the only valid way to do that
is "PC = NPC".  PC + 4 is not the next instruction we would
execute.

This new PC value is 'label' and 'label' - 1 is likely not in the load
instruction and therefore not in the exception region.

Are you suggesting I just ignore NPC and set PC to "PC + 4" instead?
Hmmm... ok, but does PC need to be accurate for other purposes?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exception causing insns in delay slots
  2002-04-26  4:59           ` David S. Miller
@ 2002-04-26  5:03             ` Jason Merrill
  2002-04-26 13:34               ` John David Anglin
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Merrill @ 2002-04-26  5:03 UTC (permalink / raw)
  To: David S. Miller; +Cc: gcc-patches, mark, java, dave.anglin

>>>>> "David" == David S Miller <davem@redhat.com> writes:

> Are you suggesting I just ignore NPC and set PC to "PC + 4" instead?

Yes, exactly.

> Hmmm... ok, but does PC need to be accurate for other purposes?

The uses of PC in the EH mechanism are only interested in the location of
the throw.  The return address is just a way of getting at it.

Jason

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exception causing insns in delay slots
  2002-04-25 21:41 Exception causing insns in delay slots David S. Miller
  2002-04-26  3:53 ` Jason Merrill
@ 2002-04-26 13:08 ` Mark Mitchell
  1 sibling, 0 replies; 18+ messages in thread
From: Mark Mitchell @ 2002-04-26 13:08 UTC (permalink / raw)
  To: David S. Miller, gcc-patches; +Cc: java

> Thus, the fix implemented is to disallow reorg.c to consider exception
> causing instructions for delay slots.

This seems safest, indeed.

> Mark, ok for 3.1 branch?

Yes, please!

--
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exception causing insns in delay slots
  2002-04-26  5:03             ` Jason Merrill
@ 2002-04-26 13:34               ` John David Anglin
  2002-04-26 14:42                 ` law
  2002-04-27 23:31                 ` David S. Miller
  0 siblings, 2 replies; 18+ messages in thread
From: John David Anglin @ 2002-04-26 13:34 UTC (permalink / raw)
  To: Jason Merrill; +Cc: davem, gcc-patches, mark, java, dave.anglin

> > Hmmm... ok, but does PC need to be accurate for other purposes?
> 
> The uses of PC in the EH mechanism are only interested in the location of
> the throw.  The return address is just a way of getting at it.

One of the optimizations that was disabled in the hppa-linux dw2
implementation was the return pointer adjustment for unconditional
branches in the delay slot of a call sequence.  In this optimization,
the return pointer is adjusted to effect the branch when the return
occurs.  This causes the loss of the location of the throw.

I must admit that I find it strange that the compiler puts these
insns into the delay slot at all.  In all other cases, the delay
slot insn executes before the branch is actually taken.  This seems
inconsistent.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6605)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exception causing insns in delay slots
  2002-04-26 13:34               ` John David Anglin
@ 2002-04-26 14:42                 ` law
  2002-04-26 18:59                   ` John David Anglin
  2002-04-28  8:23                   ` David S. Miller
  2002-04-27 23:31                 ` David S. Miller
  1 sibling, 2 replies; 18+ messages in thread
From: law @ 2002-04-26 14:42 UTC (permalink / raw)
  To: John David Anglin
  Cc: Jason Merrill, davem, gcc-patches, mark, java, dave.anglin

 In message <200204262003.g3QK3Xrp004014@hiauly1.hia.nrc.ca>, "John David 
Anglin" writes:
 > One of the optimizations that was disabled in the hppa-linux dw2
 > implementation was the return pointer adjustment for unconditional
 > branches in the delay slot of a call sequence.  In this optimization,
 > the return pointer is adjusted to effect the branch when the return
 > occurs.  This causes the loss of the location of the throw.
 > 
 > I must admit that I find it strange that the compiler puts these
 > insns into the delay slot at all.  In all other cases, the delay
 > slot insn executes before the branch is actually taken.  This seems
 > inconsistent.
It's not inconsistent.

The branch executes before its delay slot -- only the side effect of changing
flow control happens after the delay slot.  All other effects such as
generation of the return address into the return register occur _before_ the
delay slot instruction.

It's a very important distinction and it's key to understanding why things
like modifying a return pointer in the delay slot of a call instruction to
eliminate an unconditional branch after the call work.

It's also an important distinction when dealing with branches that have
other side effects such as movb, addb.

jeff

ps.  The trick of modifying the return value in a delay slot of a call is an
interesting optimization, but is not generally useful on targets which have a
call/return stack for predicting branches.  This is why we disable it anytime
we're optimizing for a PA8000 or newer machine.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exception causing insns in delay slots
  2002-04-26 14:42                 ` law
@ 2002-04-26 18:59                   ` John David Anglin
  2002-04-27  9:32                     ` law
  2002-04-28  8:23                   ` David S. Miller
  1 sibling, 1 reply; 18+ messages in thread
From: John David Anglin @ 2002-04-26 18:59 UTC (permalink / raw)
  To: law; +Cc: jason, davem, gcc-patches, mark, java, dave.anglin

>  > I must admit that I find it strange that the compiler puts these
>  > insns into the delay slot at all.  In all other cases, the delay
>  > slot insn executes before the branch is actually taken.  This seems
>  > inconsistent.
> It's not inconsistent.
> 
> The branch executes before its delay slot -- only the side effect of
> changing
> flow control happens after the delay slot.  All other effects such as
> generation of the return address into the return register occur _before_ the
> delay slot instruction.

Yes, the branch actually just updates iaoq_next.  However, when a branch
follows a branch, the second branch modifies the insn queue so that
one insn at the target of the first branch is executed, then the flow
transfers to the target of the second branch.  This differs from what
happens with the return pointer optimization where the effect doesn't
take place until the return of the function called.  You can create
a two insn timer loop using this capability and there is a significant
performance improvement on older machines without branch prediction.

I still think that a branch in the delay slot of another branch (call)
on the PA is not equivalent to the return pointer optimization (assuming
the adjustment can be made).

> call/return stack for predicting branches.  This is why we disable it
> anytime
> we're optimizing for a PA8000 or newer machine.

I looked at output_call again and I couldn't see that this is disabled
for PA8000 or newer machines.  In the dw2 testing, we were definitely
getting unconditional branches in the delay slot of calls on PA8000
or newer machines.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6605)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exception causing insns in delay slots
  2002-04-26 18:59                   ` John David Anglin
@ 2002-04-27  9:32                     ` law
  2002-04-27 11:03                       ` John David Anglin
  0 siblings, 1 reply; 18+ messages in thread
From: law @ 2002-04-27  9:32 UTC (permalink / raw)
  To: John David Anglin; +Cc: jason, davem, gcc-patches, mark, java, dave.anglin

 In message <200204262146.g3QLk4Hp007025@hiauly1.hia.nrc.ca>, "John David 
Anglin" writes:
 > Yes, the branch actually just updates iaoq_next.  However, when a branch
 > follows a branch, the second branch modifies the insn queue so that
 > one insn at the target of the first branch is executed, then the flow
 > transfers to the target of the second branch.  This differs from what
 > happens with the return pointer optimization where the effect doesn't
 > take place until the return of the function called.  You can create
 > a two insn timer loop using this capability and there is a significant
 > performance improvement on older machines without branch prediction.
 > 
 > I still think that a branch in the delay slot of another branch (call)
 > on the PA is not equivalent to the return pointer optimization (assuming
 > the adjustment can be made).
Maybe this is your confusion.  The return address twiddling isn't meant
to optimize a branch in the delay slot of the call, but instead a branch
after the call (and after the call's delay slot).  Maybe code would be better.

Let's assume we have something like this before delay slot optimizations
are run:

  bl foo,%r2
    <delay slot>
  bl newtarget,%r0
    <delay slot>


We can arrange for the return from "foo" to resume execution at "newtarget"
by twiddling the value in %r2 in the delay slot of foo.  It turns the code
into something like this:

temp:
  bl foo,%r2
  ldo newtarget-temp-8(%r2),%r2


Note carefully we are not trying to optimize the case of a branch in another
branch's delay slot.  We do not generate such code to the best of my knowledge,
and such code is not generally useful (you get a single instruction executed
from the target of the first branch if I recall correctly).  It's also the
case that anytime you have a branch in the delay slot of another branch all
predictions are turned off, so even if you could come up with an obscure way
to use this feature, it'd probably perform poorly, even on older HPs.



 > > call/return stack for predicting branches.  This is why we disable it
 > > anytime
 > > we're optimizing for a PA8000 or newer machine.
 > 
 > I looked at output_call again and I couldn't see that this is disabled
 > for PA8000 or newer machines.  In the dw2 testing, we were definitely
 > getting unconditional branches in the delay slot of calls on PA8000
 > or newer machines.
It's handled elsewhere -- output_call is way too late to catch this.  You
have to prevent reorg from filling the delay slot of the call with the
return address adjustment.

Also remember, this is controlled by optimizing for the PA8000 (-msched=8000).
If you didn't use -msched=8000, then you'd still get the return pointer
adjustments.

jeff







^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exception causing insns in delay slots
  2002-04-27  9:32                     ` law
@ 2002-04-27 11:03                       ` John David Anglin
  2002-04-29  8:25                         ` law
  0 siblings, 1 reply; 18+ messages in thread
From: John David Anglin @ 2002-04-27 11:03 UTC (permalink / raw)
  To: law; +Cc: jason, davem, gcc-patches, mark, java, dave.anglin

>  > I still think that a branch in the delay slot of another branch (call)
>  > on the PA is not equivalent to the return pointer optimization (assuming
>  > the adjustment can be made).
> Maybe this is your confusion.  The return address twiddling isn't meant
> to optimize a branch in the delay slot of the call, but instead a branch
> after the call (and after the call's delay slot).  Maybe code would be
> better.
> 
> Let's assume we have something like this before delay slot optimizations
> are run:
> 
>   bl foo,%r2
>     <delay slot>
>   bl newtarget,%r0
>     <delay slot>

The point that I was trying to make is that in output_call this would
appear as a sequence with the second branch in the "delay slot" of the
the first branch.  We check for this and do the return adjust if we
can.  If we can't we just put a nop in the delay slot.

If the insn in the "delay slot" is not a branch, we just output it.
This is often an insn to load an argument register for the branch
to foo.  So, clearly in this case we assume that the second insn
executes before the transfer to foo occurs.

As you have noted before, it is generally a good idea to make the
rtl match the hardware as closely as is possible.  Maybe these sequences
are created at such a late stage that the above inconsistency doesn't
matter but I think there are a number of reasons not to allow it:

1)  As you noted below, there is no gain on PA8000 and above.
2)  We have to disable the return pointer optimization under hppa-linux
    for correct operation of the dwarf2 exception support.
3)  If we didn't allow an unconditional branch as the second insn in
    a delayed branch sequence, then the compiler might be able to find
    some other insn to fit in the sequence.  I guess we could allow
    an unconditional jump as the last alternative if there are no
    others.

At a minimum, we should turn off allowing branches in the "delay slot"
of sequences under hppa-linux at the same point as we do for the PA8000.

> We can arrange for the return from "foo" to resume execution at "newtarget"
> by twiddling the value in %r2 in the delay slot of foo.  It turns the code
> into something like this:
> 
> temp:
>   bl foo,%r2
>   ldo newtarget-temp-8(%r2),%r2

I understand this part.

> 
> Note carefully we are not trying to optimize the case of a branch in another
> branch's delay slot.  We do not generate such code to the best of my
> knowledge,
> and such code is not generally useful (you get a single instruction executed

The loop that I had was something like

extern __inline__ void __delay(unsigned long loops) {
  asm volatile(
"	addib,UV,n      -1,%0,.
	addib,NUV,n     -1,%0,.+8
	nop"
 : "=r" (loops) : "0" (loops));

Don't know if this is generally useful.  It does run much faster on a
7100 (more or less one cycle per decrement and test).

> Also remember, this is controlled by optimizing for the PA8000
> (-msched=8000).
> If you didn't use -msched=8000, then you'd still get the return pointer
> adjustments.

I wasn't using -msched=8000 so that explains why this was occuring.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6605)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exception causing insns in delay slots
  2002-04-26 13:34               ` John David Anglin
  2002-04-26 14:42                 ` law
@ 2002-04-27 23:31                 ` David S. Miller
  1 sibling, 0 replies; 18+ messages in thread
From: David S. Miller @ 2002-04-27 23:31 UTC (permalink / raw)
  To: dave; +Cc: jason, gcc-patches, mark, java, dave.anglin

   From: "John David Anglin" <dave@hiauly1.hia.nrc.ca>
   Date: Fri, 26 Apr 2002 16:03:32 -0400 (EDT)

   One of the optimizations that was disabled in the hppa-linux dw2
   implementation was the return pointer adjustment for unconditional
   branches in the delay slot of a call sequence.

Identically on Sparc, all of the peepholes we use to generate that
kind of return pointer adjustment are protected like so:

  "short_branch (INSN_UID (insn), INSN_UID (operands[3]))
   && (USING_SJLJ_EXCEPTIONS || ! can_throw_internal (ins1))"

Check out config/sparc/sparc.md starting at the line that reads:

;; Now peepholes to do a call followed by a jump.

which is around line 9356.

Franks a lot,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exception causing insns in delay slots
  2002-04-26 14:42                 ` law
  2002-04-26 18:59                   ` John David Anglin
@ 2002-04-28  8:23                   ` David S. Miller
  2002-04-28 23:24                     ` law
  1 sibling, 1 reply; 18+ messages in thread
From: David S. Miller @ 2002-04-28  8:23 UTC (permalink / raw)
  To: law; +Cc: dave, jason, gcc-patches, mark, java, dave.anglin

   From: law@redhat.com
   Date: Fri, 26 Apr 2002 14:37:49 -0600
   
   ps.  The trick of modifying the return value in a delay slot of a call is an
   interesting optimization, but is not generally useful on targets which have a
   call/return stack for predicting branches.  This is why we disable it anytime
   we're optimizing for a PA8000 or newer machine.
   
I keep forgetting the make the Sparc backend do likewise when
generating code for UltraSPARC or later, for the same reason.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exception causing insns in delay slots
  2002-04-28  8:23                   ` David S. Miller
@ 2002-04-28 23:24                     ` law
  0 siblings, 0 replies; 18+ messages in thread
From: law @ 2002-04-28 23:24 UTC (permalink / raw)
  To: David S. Miller; +Cc: dave, jason, gcc-patches, mark, java, dave.anglin

 In message <20020427.232129.123986259.davem@redhat.com>, "David S. Miller" 
writes:
 > I keep forgetting the make the Sparc backend do likewise when
 > generating code for UltraSPARC or later, for the same reason.
Yea.  I don't remember the exact numbers, but it was a percentage point or
two for integer codes.  Enough to be measurable :-)

What I keep meaning to do is switch the PA to default to PA8000 scheduling as
they're probably the most commonly found machines these days.

jeff


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Exception causing insns in delay slots
  2002-04-27 11:03                       ` John David Anglin
@ 2002-04-29  8:25                         ` law
  0 siblings, 0 replies; 18+ messages in thread
From: law @ 2002-04-29  8:25 UTC (permalink / raw)
  To: John David Anglin; +Cc: jason, davem, gcc-patches, mark, java, dave.anglin

 In message <200204271632.g3RGWIfV013676@hiauly1.hia.nrc.ca>, "John David 
Anglin" writes:
 > The point that I was trying to make is that in output_call this would
 > appear as a sequence with the second branch in the "delay slot" of the
 > the first branch.  We check for this and do the return adjust if we
 > can.  If we can't we just put a nop in the delay slot.
Right.  So what's the problem?  The vast majority of the time if we've put
the JUMP_INSN in the delay slot of a CALL_INSN, then we'll be able to
apply the return pointer adjustment trick.

Yes you get sub-optimal code if the branch target is out of range, but what
would you propose to do?  Re-run the delay slot optimizer again?  It's not
that common and in the cases where it does happen I seriously doubt the
call/jump is on the critical path.

 > If the insn in the "delay slot" is not a branch, we just output it.
 > This is often an insn to load an argument register for the branch
 > to foo.  So, clearly in this case we assume that the second insn
 > executes before the transfer to foo occurs.
No, that's not a safe assumption at all.  In fact, it would be a very unsafe
assumption.  Consider

	copy %r1,%r10
	movb,= %r10,%r11,target
	nop

You can not transform that into

	movb,= %r10,%r11,target
	copy %r1,%r10

Remember, only the side effect of control transfer is delayed.





 > As you have noted before, it is generally a good idea to make the
 > rtl match the hardware as closely as is possible.  Maybe these sequences
 > are created at such a late stage that the above inconsistency doesn't
 > matter but I think there are a number of reasons not to allow it:
I still don't see the inconsistency.  The RTL describes to the fullest
extent possible what is happening.  We have a branch and of some kind and
an insn in the delay slot of the branch.  The specifics of what we can 
safely put into a delay slot are handled by a combination of generic code
which tracks dataflow and target code to deal with testing candidate insns
for validity.

The delay slot does _not_ execute before the branch, this is precisely why
you can't (for example) take an insn from before the branch which sets a
resource used by the branch.  The generic code in reorg which tracks dataflow
ensures this can't happen.

Are you saying it would be good if we put an adjustment of %r2/%r31 in the
delay slot instead of the JUMP_INSN as that more accurately represents what
assembly code we will produce?  If so, maybe, but you have some technical
difficulties to consider for the representation and ultimately I don't think
it's going to be that useful.


 > 1)  As you noted below, there is no gain on PA8000 and above.
Right.

 > 2)  We have to disable the return pointer optimization under hppa-linux
 >     for correct operation of the dwarf2 exception support.
Right.

 > 3)  If we didn't allow an unconditional branch as the second insn in
 >     a delayed branch sequence, then the compiler might be able to find
 >     some other insn to fit in the sequence.  I guess we could allow
 >     an unconditional jump as the last alternative if there are no
 >     others.
For targets where it's useful (older PAs running HPUX), then it is far
more profitable to do the return address hack than any other filling
of those delay slots.  That's why it's done first.  If we fall outside
the confines of the systems where it is useful, then we should not do
the return address hack at all (ie #1 & #2 above).

 > At a minimum, we should turn off allowing branches in the "delay slot"
 > of sequences under hppa-linux at the same point as we do for the PA8000.
Yup.  It's somewhere in pa.c.  following_call or something like that

 > > Note carefully we are not trying to optimize the case of a branch in anoth
 > er
 > > branch's delay slot.  We do not generate such code to the best of my
 > > knowledge,
 > > and such code is not generally useful (you get a single instruction execut
 > ed
 > 
 > The loop that I had was something like
 > 
 > extern __inline__ void __delay(unsigned long loops) {
 >   asm volatile(
 > "	addib,UV,n      -1,%0,.
 > 	addib,NUV,n     -1,%0,.+8
 > 	nop"
 >  : "=r" (loops) : "0" (loops));
 > 
 > Don't know if this is generally useful.  It does run much faster on a
 > 7100 (more or less one cycle per decrement and test).
Much faster than what?


jeff

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2002-04-29 15:02 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-04-25 21:41 Exception causing insns in delay slots David S. Miller
2002-04-26  3:53 ` Jason Merrill
2002-04-26  4:15   ` David S. Miller
2002-04-26  4:19     ` Jason Merrill
2002-04-26  4:45       ` David S. Miller
2002-04-26  4:51         ` Jason Merrill
2002-04-26  4:59           ` David S. Miller
2002-04-26  5:03             ` Jason Merrill
2002-04-26 13:34               ` John David Anglin
2002-04-26 14:42                 ` law
2002-04-26 18:59                   ` John David Anglin
2002-04-27  9:32                     ` law
2002-04-27 11:03                       ` John David Anglin
2002-04-29  8:25                         ` law
2002-04-28  8:23                   ` David S. Miller
2002-04-28 23:24                     ` law
2002-04-27 23:31                 ` David S. Miller
2002-04-26 13:08 ` Mark Mitchell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).