public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] gold: Add linker relaxation of tail calls on sparc.
@ 2012-04-23  9:34 David Miller
  2012-04-24 21:45 ` David Miller
  2012-04-24 22:40 ` Ian Lance Taylor
  0 siblings, 2 replies; 4+ messages in thread
From: David Miller @ 2012-04-23  9:34 UTC (permalink / raw)
  To: binutils


This gets GOLD in line with the relaxations performed by the
BFD linker on sparc.  Basically we can turn a tail call into
a branch if certain conditions are satisfied, and in particular
the final call displacement is small enough.

The code is heavily and thoroughly commented.

I've validated that the relaxation triggers during the testsuite on
both 32-bit and 64-bit sparc, and that the transformations performed
are correct.

I looked at the relaxation support that exists for the kind of stuff
ARM does with stubs where section sizes and offsets end up changing,
and that didn't seem appropriate nor necessary for what I'm doing
here.  So I implemented this similarly to how the TLS code
optimizations are done.

This is relative to "[PATCH] gold: Maintain sparc ELF header bits
properly" for which I'm still waiting approval.

After this I only have 3 items on my agenda for GOLD/Sparc:

1) Implement proper R_SPARC_REGISTER support, this has been bugging
   me for years but last time I tried this it was hard to frob out
   dummy relocations and/or symbols and have GOLD just leave them
   completely alone and let them propagate unimpeded to the target
   relocation function.

   My frustration matched that which Roland recently endured with
   those special ARM PLT symbols.  :-) Hopefully this kind of thing is
   easier now.

2) Fix all of the bugs wrt. large PLTs on sparc64.  As I mentioned
   in my IFUNC patch posting, in order to get this right we'll need
   to pass PLT indexes around rather than PLT offsets.

   The core issue is that PLTs are variably sized on 64-bit sparc, the
   first 32767 PLT slots are one size, and then the format and the
   size of the PLT entries change starting at index 32768.  So if you
   have an IFUNC plt slot, you won't know how big it's going to be
   until you know it's final index.

3) Implement full incremental linking support just like x86-64 does.
   Any known gotchas or implementation issues would be appreciated.

Thanks.

gold/

	* sparc.cc (Target_sparc::Relocate::relax_call): New function.
	(Target_sparc::Relocate::relocate): Call it for R_SPARC_WDISP30
	and R_SPARC_WPLT30.

diff --git a/gold/sparc.cc b/gold/sparc.cc
index 004a265..a9fe9de 100644
--- a/gold/sparc.cc
+++ b/gold/sparc.cc
@@ -333,6 +333,11 @@ class Target_sparc : public Sized_target<size, big_endian>
 		 typename elfcpp::Elf_types<size>::Elf_Addr,
 		 section_size_type);
 
+    inline void
+    relax_call(Target_sparc<size, big_endian>* target,
+	       unsigned char* view,
+	       const elfcpp::Rela<size, big_endian>& rela);
+
     // Ignore the next relocation which should be R_SPARC_TLS_GD_ADD
     bool ignore_gd_add_;
 
@@ -3304,6 +3309,8 @@ Target_sparc<size, big_endian>::Relocate::relocate(
     case elfcpp::R_SPARC_WDISP30:
     case elfcpp::R_SPARC_WPLT30:
       Reloc::wdisp30(view, object, psymval, addend, address);
+      if (target->may_relax())
+	relax_call(target, view, rela);
       break;
 
     case elfcpp::R_SPARC_WDISP22:
@@ -3954,6 +3961,145 @@ Target_sparc<size, big_endian>::Relocate::relocate_tls(
     }
 }
 
+// Relax a call instruction.
+
+template<int size, bool big_endian>
+inline void
+Target_sparc<size, big_endian>::Relocate::relax_call(
+    Target_sparc<size, big_endian>* target,
+    unsigned char* view,
+    const elfcpp::Rela<size, big_endian>& rela)
+{
+  typedef typename elfcpp::Swap<32, true>::Valtype Insntype;
+  Insntype *wv = reinterpret_cast<Insntype*>(view);
+  Insntype call_insn, delay_insn, set_insn;
+  uint32_t op3, reg, off;
+
+  // This code tries to relax call instructions that meet
+  // certain criteria.
+  //
+  // The first criteria is that the call must be such that the return
+  // address which the call writes into %o7 is unused.  Two sequences
+  // meet this criteria, and are used to implement tail calls.
+  //
+  // Leaf function tail call:
+  //
+  // or %o7, %g0, %ANY_REG
+  // call FUNC
+  //  or %ANY_REG, %g0, %o7
+  //
+  // Non-leaf function tail call:
+  //
+  // call FUNC
+  //  restore
+  //
+  // The second criteria is that the call destination is close.  If
+  // the displacement can fit in a signed 22-bit immediate field of a
+  // pre-V9 branch, we can do it.  If we are generating a 64-bit
+  // object or a 32-bit object with ELF machine type EF_SPARC32PLUS,
+  // and the displacement fits in a signed 19-bit immediate field,
+  // then we can use a V9 branch.
+
+  call_insn = elfcpp::Swap<32, true>::readval(wv);
+  delay_insn = elfcpp::Swap<32, true>::readval(wv + 1);
+
+  // Make sure it is really a call instruction.
+  if (((call_insn >> 30) & 0x3) != 1)
+    return;
+
+  if (((delay_insn >> 30) & 0x3) != 2)
+    return;
+
+  // Accept only a restore or an integer arithmetic operation whose
+  // sole side effect is to write the %o7 register (and perhaps set
+  // the condition codes, which are considered clobbered across
+  // function calls).
+  //
+  // For example, we don't want to match a tagged addition or
+  // subtraction.  We also don't want to match something like a
+  // divide.
+  //
+  // Specifically we accept add{,cc}, and{,cc}, or{,cc},
+  // xor{,cc}, sub{,cc}, andn{,cc}, orn{,cc}, and xnor{,cc}.
+
+  op3 = (delay_insn >> 19) & 0x3f;
+  reg = (delay_insn >> 25) & 0x1f;
+  if (op3 != 0x3d
+      && ((op3 & 0x28) != 0 || reg != 15))
+    return;
+
+  // For non-restore instructions, make sure %o7 isn't
+  // an input.
+  if (op3 != 0x3d)
+    {
+      // First check RS1
+      reg = (delay_insn >> 14) & 0x15;
+      if (reg == 15)
+	return;
+
+      // And if non-immediate, check RS2
+      if (((delay_insn >> 13) & 1) == 0)
+	{
+	  reg = (delay_insn & 0x1f);
+	  if (reg == 15)
+	    return;
+	}
+    }
+
+  // Now check the branch distance.  We are called after the
+  // call has been relocated, so we just have to peek at the
+  // offset contained in the instruction.
+  off = call_insn & 0x3fffffff;
+  if ((off & 0x3fe00000)
+      && (off & 0x3fe00000) != 0x3fe00000)
+    return;
+
+  if ((size == 64 || target->elf_machine_ == elfcpp::EM_SPARC32PLUS)
+      && ((off & 0x3c0000) == 0
+	  || (off & 0x3c0000) == 0x3c0000))
+    {
+      // ba,pt %xcc, FUNC
+      call_insn = 0x10680000 | (off & 0x07ffff);
+    }
+  else
+    {
+      // ba FUNC
+      call_insn = 0x10800000 | (off & 0x3fffff);
+    }
+  elfcpp::Swap<32, true>::writeval(wv, call_insn);
+
+  // See if we can NOP out the delay slot instruction.  We peek
+  // at the instruction before the call to make sure we're dealing
+  // with exactly the:
+  //
+  // or %o7, %g0, %ANY_REG
+  // call
+  //  or %ANY_REG, %g0, %o7
+  //
+  // case.  Otherwise this might be a tricky piece of hand written
+  // assembler calculating %o7 in some non-trivial way, and therefore
+  // we can't be sure that NOP'ing out the delay slot is safe.
+  if (op3 == 0x02
+      && rela.get_r_offset() >= 4)
+    {
+      if ((delay_insn & ~(0x1f << 14)) != 0x9e100000)
+	return;
+
+      set_insn = elfcpp::Swap<32, true>::readval(wv - 1);
+      if ((set_insn & ~(0x1f << 25)) != 0x8013c000)
+	return;
+
+      reg = (set_insn >> 25) & 0x1f;
+      if (reg == 0 || reg == 15)
+	return;
+      if (reg != ((delay_insn >> 14) & 0x1f))
+	return;
+
+      // All tests pass, nop it out.
+      elfcpp::Swap<32, true>::writeval(wv + 1, sparc_nop);
+    }
+}
+
 // Relocate section data.
 
 template<int size, bool big_endian>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] gold: Add linker relaxation of tail calls on sparc.
  2012-04-23  9:34 [PATCH] gold: Add linker relaxation of tail calls on sparc David Miller
@ 2012-04-24 21:45 ` David Miller
  2012-04-24 22:40 ` Ian Lance Taylor
  1 sibling, 0 replies; 4+ messages in thread
From: David Miller @ 2012-04-24 21:45 UTC (permalink / raw)
  To: binutils; +Cc: iant

From: David Miller <davem@davemloft.net>
Date: Mon, 23 Apr 2012 03:08:21 -0400 (EDT)

Ian, could you please review this patch?

Thanks a lot.

> This gets GOLD in line with the relaxations performed by the
> BFD linker on sparc.  Basically we can turn a tail call into
> a branch if certain conditions are satisfied, and in particular
> the final call displacement is small enough.
> 
> The code is heavily and thoroughly commented.
> 
> I've validated that the relaxation triggers during the testsuite on
> both 32-bit and 64-bit sparc, and that the transformations performed
> are correct.
> 
> I looked at the relaxation support that exists for the kind of stuff
> ARM does with stubs where section sizes and offsets end up changing,
> and that didn't seem appropriate nor necessary for what I'm doing
> here.  So I implemented this similarly to how the TLS code
> optimizations are done.
> 
> This is relative to "[PATCH] gold: Maintain sparc ELF header bits
> properly" for which I'm still waiting approval.
> 
> After this I only have 3 items on my agenda for GOLD/Sparc:
> 
> 1) Implement proper R_SPARC_REGISTER support, this has been bugging
>    me for years but last time I tried this it was hard to frob out
>    dummy relocations and/or symbols and have GOLD just leave them
>    completely alone and let them propagate unimpeded to the target
>    relocation function.
> 
>    My frustration matched that which Roland recently endured with
>    those special ARM PLT symbols.  :-) Hopefully this kind of thing is
>    easier now.
> 
> 2) Fix all of the bugs wrt. large PLTs on sparc64.  As I mentioned
>    in my IFUNC patch posting, in order to get this right we'll need
>    to pass PLT indexes around rather than PLT offsets.
> 
>    The core issue is that PLTs are variably sized on 64-bit sparc, the
>    first 32767 PLT slots are one size, and then the format and the
>    size of the PLT entries change starting at index 32768.  So if you
>    have an IFUNC plt slot, you won't know how big it's going to be
>    until you know it's final index.
> 
> 3) Implement full incremental linking support just like x86-64 does.
>    Any known gotchas or implementation issues would be appreciated.
> 
> Thanks.
> 
> gold/
> 
> 	* sparc.cc (Target_sparc::Relocate::relax_call): New function.
> 	(Target_sparc::Relocate::relocate): Call it for R_SPARC_WDISP30
> 	and R_SPARC_WPLT30.
> 
> diff --git a/gold/sparc.cc b/gold/sparc.cc
> index 004a265..a9fe9de 100644
> --- a/gold/sparc.cc
> +++ b/gold/sparc.cc
> @@ -333,6 +333,11 @@ class Target_sparc : public Sized_target<size, big_endian>
>  		 typename elfcpp::Elf_types<size>::Elf_Addr,
>  		 section_size_type);
>  
> +    inline void
> +    relax_call(Target_sparc<size, big_endian>* target,
> +	       unsigned char* view,
> +	       const elfcpp::Rela<size, big_endian>& rela);
> +
>      // Ignore the next relocation which should be R_SPARC_TLS_GD_ADD
>      bool ignore_gd_add_;
>  
> @@ -3304,6 +3309,8 @@ Target_sparc<size, big_endian>::Relocate::relocate(
>      case elfcpp::R_SPARC_WDISP30:
>      case elfcpp::R_SPARC_WPLT30:
>        Reloc::wdisp30(view, object, psymval, addend, address);
> +      if (target->may_relax())
> +	relax_call(target, view, rela);
>        break;
>  
>      case elfcpp::R_SPARC_WDISP22:
> @@ -3954,6 +3961,145 @@ Target_sparc<size, big_endian>::Relocate::relocate_tls(
>      }
>  }
>  
> +// Relax a call instruction.
> +
> +template<int size, bool big_endian>
> +inline void
> +Target_sparc<size, big_endian>::Relocate::relax_call(
> +    Target_sparc<size, big_endian>* target,
> +    unsigned char* view,
> +    const elfcpp::Rela<size, big_endian>& rela)
> +{
> +  typedef typename elfcpp::Swap<32, true>::Valtype Insntype;
> +  Insntype *wv = reinterpret_cast<Insntype*>(view);
> +  Insntype call_insn, delay_insn, set_insn;
> +  uint32_t op3, reg, off;
> +
> +  // This code tries to relax call instructions that meet
> +  // certain criteria.
> +  //
> +  // The first criteria is that the call must be such that the return
> +  // address which the call writes into %o7 is unused.  Two sequences
> +  // meet this criteria, and are used to implement tail calls.
> +  //
> +  // Leaf function tail call:
> +  //
> +  // or %o7, %g0, %ANY_REG
> +  // call FUNC
> +  //  or %ANY_REG, %g0, %o7
> +  //
> +  // Non-leaf function tail call:
> +  //
> +  // call FUNC
> +  //  restore
> +  //
> +  // The second criteria is that the call destination is close.  If
> +  // the displacement can fit in a signed 22-bit immediate field of a
> +  // pre-V9 branch, we can do it.  If we are generating a 64-bit
> +  // object or a 32-bit object with ELF machine type EF_SPARC32PLUS,
> +  // and the displacement fits in a signed 19-bit immediate field,
> +  // then we can use a V9 branch.
> +
> +  call_insn = elfcpp::Swap<32, true>::readval(wv);
> +  delay_insn = elfcpp::Swap<32, true>::readval(wv + 1);
> +
> +  // Make sure it is really a call instruction.
> +  if (((call_insn >> 30) & 0x3) != 1)
> +    return;
> +
> +  if (((delay_insn >> 30) & 0x3) != 2)
> +    return;
> +
> +  // Accept only a restore or an integer arithmetic operation whose
> +  // sole side effect is to write the %o7 register (and perhaps set
> +  // the condition codes, which are considered clobbered across
> +  // function calls).
> +  //
> +  // For example, we don't want to match a tagged addition or
> +  // subtraction.  We also don't want to match something like a
> +  // divide.
> +  //
> +  // Specifically we accept add{,cc}, and{,cc}, or{,cc},
> +  // xor{,cc}, sub{,cc}, andn{,cc}, orn{,cc}, and xnor{,cc}.
> +
> +  op3 = (delay_insn >> 19) & 0x3f;
> +  reg = (delay_insn >> 25) & 0x1f;
> +  if (op3 != 0x3d
> +      && ((op3 & 0x28) != 0 || reg != 15))
> +    return;
> +
> +  // For non-restore instructions, make sure %o7 isn't
> +  // an input.
> +  if (op3 != 0x3d)
> +    {
> +      // First check RS1
> +      reg = (delay_insn >> 14) & 0x15;
> +      if (reg == 15)
> +	return;
> +
> +      // And if non-immediate, check RS2
> +      if (((delay_insn >> 13) & 1) == 0)
> +	{
> +	  reg = (delay_insn & 0x1f);
> +	  if (reg == 15)
> +	    return;
> +	}
> +    }
> +
> +  // Now check the branch distance.  We are called after the
> +  // call has been relocated, so we just have to peek at the
> +  // offset contained in the instruction.
> +  off = call_insn & 0x3fffffff;
> +  if ((off & 0x3fe00000)
> +      && (off & 0x3fe00000) != 0x3fe00000)
> +    return;
> +
> +  if ((size == 64 || target->elf_machine_ == elfcpp::EM_SPARC32PLUS)
> +      && ((off & 0x3c0000) == 0
> +	  || (off & 0x3c0000) == 0x3c0000))
> +    {
> +      // ba,pt %xcc, FUNC
> +      call_insn = 0x10680000 | (off & 0x07ffff);
> +    }
> +  else
> +    {
> +      // ba FUNC
> +      call_insn = 0x10800000 | (off & 0x3fffff);
> +    }
> +  elfcpp::Swap<32, true>::writeval(wv, call_insn);
> +
> +  // See if we can NOP out the delay slot instruction.  We peek
> +  // at the instruction before the call to make sure we're dealing
> +  // with exactly the:
> +  //
> +  // or %o7, %g0, %ANY_REG
> +  // call
> +  //  or %ANY_REG, %g0, %o7
> +  //
> +  // case.  Otherwise this might be a tricky piece of hand written
> +  // assembler calculating %o7 in some non-trivial way, and therefore
> +  // we can't be sure that NOP'ing out the delay slot is safe.
> +  if (op3 == 0x02
> +      && rela.get_r_offset() >= 4)
> +    {
> +      if ((delay_insn & ~(0x1f << 14)) != 0x9e100000)
> +	return;
> +
> +      set_insn = elfcpp::Swap<32, true>::readval(wv - 1);
> +      if ((set_insn & ~(0x1f << 25)) != 0x8013c000)
> +	return;
> +
> +      reg = (set_insn >> 25) & 0x1f;
> +      if (reg == 0 || reg == 15)
> +	return;
> +      if (reg != ((delay_insn >> 14) & 0x1f))
> +	return;
> +
> +      // All tests pass, nop it out.
> +      elfcpp::Swap<32, true>::writeval(wv + 1, sparc_nop);
> +    }
> +}
> +
>  // Relocate section data.
>  
>  template<int size, bool big_endian>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] gold: Add linker relaxation of tail calls on sparc.
  2012-04-23  9:34 [PATCH] gold: Add linker relaxation of tail calls on sparc David Miller
  2012-04-24 21:45 ` David Miller
@ 2012-04-24 22:40 ` Ian Lance Taylor
  2012-04-25  8:04   ` David Miller
  1 sibling, 1 reply; 4+ messages in thread
From: Ian Lance Taylor @ 2012-04-24 22:40 UTC (permalink / raw)
  To: David Miller; +Cc: binutils

David Miller <davem@davemloft.net> writes:

> 	* sparc.cc (Target_sparc::Relocate::relax_call): New function.
> 	(Target_sparc::Relocate::relocate): Call it for R_SPARC_WDISP30
> 	and R_SPARC_WPLT30.


> +template<int size, bool big_endian>
> +inline void
> +Target_sparc<size, big_endian>::Relocate::relax_call(
> +    Target_sparc<size, big_endian>* target,
> +    unsigned char* view,
> +    const elfcpp::Rela<size, big_endian>& rela)
> +{
> +  typedef typename elfcpp::Swap<32, true>::Valtype Insntype;
> +  Insntype *wv = reinterpret_cast<Insntype*>(view);
> +  Insntype call_insn, delay_insn, set_insn;
> +  uint32_t op3, reg, off;
> +
> +  // This code tries to relax call instructions that meet
> +  // certain criteria.
> +  //
> +  // The first criteria is that the call must be such that the return
> +  // address which the call writes into %o7 is unused.  Two sequences
> +  // meet this criteria, and are used to implement tail calls.
> +  //
> +  // Leaf function tail call:
> +  //
> +  // or %o7, %g0, %ANY_REG
> +  // call FUNC
> +  //  or %ANY_REG, %g0, %o7
> +  //
> +  // Non-leaf function tail call:
> +  //
> +  // call FUNC
> +  //  restore
> +  //
> +  // The second criteria is that the call destination is close.  If
> +  // the displacement can fit in a signed 22-bit immediate field of a
> +  // pre-V9 branch, we can do it.  If we are generating a 64-bit
> +  // object or a 32-bit object with ELF machine type EF_SPARC32PLUS,
> +  // and the displacement fits in a signed 19-bit immediate field,
> +  // then we can use a V9 branch.
> +
> +  call_insn = elfcpp::Swap<32, true>::readval(wv);
> +  delay_insn = elfcpp::Swap<32, true>::readval(wv + 1);

It is possible that reading delay_insn is reading past the end of the
section data.  Consider passing in view_size from the caller and
verifying that rela.get_r_offset() + 8 <= view_size.

> +  // Now check the branch distance.  We are called after the
> +  // call has been relocated, so we just have to peek at the
> +  // offset contained in the instruction.
> +  off = call_insn & 0x3fffffff;
> +  if ((off & 0x3fe00000)
> +      && (off & 0x3fe00000) != 0x3fe00000)
> +    return;

I think this is a little clearer if you write the explicit != 0.

  if ((off & 0x3fe00000) != 0
      && (off & 0x3fe00000) != 0x3fe00000)


This is OK with those changes.

Thanks.

Ian

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] gold: Add linker relaxation of tail calls on sparc.
  2012-04-24 22:40 ` Ian Lance Taylor
@ 2012-04-25  8:04   ` David Miller
  0 siblings, 0 replies; 4+ messages in thread
From: David Miller @ 2012-04-25  8:04 UTC (permalink / raw)
  To: iant; +Cc: binutils

From: Ian Lance Taylor <iant@google.com>
Date: Tue, 24 Apr 2012 14:57:54 -0700

>> +  call_insn = elfcpp::Swap<32, true>::readval(wv);
>> +  delay_insn = elfcpp::Swap<32, true>::readval(wv + 1);
> 
> It is possible that reading delay_insn is reading past the end of the
> section data.  Consider passing in view_size from the caller and
> verifying that rela.get_r_offset() + 8 <= view_size.

Good idea, I'll write down a TODO to add similar checks to the sparc
TLS relocation handling which does the same kind of delay slot access.

>> +  // Now check the branch distance.  We are called after the
>> +  // call has been relocated, so we just have to peek at the
>> +  // offset contained in the instruction.
>> +  off = call_insn & 0x3fffffff;
>> +  if ((off & 0x3fe00000)
>> +      && (off & 0x3fe00000) != 0x3fe00000)
>> +    return;
> 
> I think this is a little clearer if you write the explicit != 0.
> 
>   if ((off & 0x3fe00000) != 0
>       && (off & 0x3fe00000) != 0x3fe00000)

Ok.

> This is OK with those changes.

All done and committed, thanks Ian.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2012-04-24 22:40 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-23  9:34 [PATCH] gold: Add linker relaxation of tail calls on sparc David Miller
2012-04-24 21:45 ` David Miller
2012-04-24 22:40 ` Ian Lance Taylor
2012-04-25  8:04   ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).