public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] LoongArch: Use copy relocation for %pc_lo12 against external symbol
@ 2022-08-31 13:22 Xi Ruoyao
  2022-08-31 13:41 ` Xi Ruoyao
  2022-09-01  1:27 ` liuzhensong
  0 siblings, 2 replies; 12+ messages in thread
From: Xi Ruoyao @ 2022-08-31 13:22 UTC (permalink / raw)
  To: binutils; +Cc: liuzhensong, WANG Xuerui, caiyinyu, Xi Ruoyao

We'd like to use PC-relative addressing instead of GOT for external
symbols in main executable images.  This can improve code locality.
Doing so will need to implement copy relocation.

Despite there was a comment saying "Glibc does not support copy
relocation yet", I've run a very simple test and it seems copy
relocation is handled by ld.so without a problem:

	pcalau12i       $a1, %pc_hi20(stdout)
	ld.d            $a1, $a1, %pc_lo12(stdout)
	pcalau12i       $a0, %pc_hi20(msg)
	addi.d          $a0, $a0, %pc_lo12(msg)
	pcalau12i       $ra, %pc_hi20(fputs)
	jirl            $ra, $ra, %pc_lo12(fputs)

With this patch, R_LARCH_COPY is correctly emitted for "stdout" in
Glibc, and the test program runs and outputs "Hello world" in "msg"
successfully.
---
 bfd/elfnn-loongarch.c | 93 ++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 84 insertions(+), 9 deletions(-)

diff --git a/bfd/elfnn-loongarch.c b/bfd/elfnn-loongarch.c
index ed42b8b6770..0cf24bd24d8 100644
--- a/bfd/elfnn-loongarch.c
+++ b/bfd/elfnn-loongarch.c
@@ -743,14 +743,26 @@ loongarch_elf_check_relocs (bfd *abfd, struct bfd_link_info *info,
 	    h->non_got_ref = 1;
 	  break;
 
-	case R_LARCH_PCALA_HI20:
+	case R_LARCH_PCALA_LO12:
 	  if (h != NULL)
 	    {
-	      /* For pcalau12i + jirl.  */
-	      h->needs_plt = 1;
-	      if (h->plt.refcount < 0)
-		h->plt.refcount = 0;
-	      h->plt.refcount++;
+	      /* Check if it's a jirl instruction.  */
+	      bfd_byte *contents = elf_section_data (sec)->this_hdr.contents;
+	      uint32_t insn = 0;
+
+	      if (contents || bfd_malloc_and_get_section (abfd, sec, &contents))
+		memcpy(&insn, contents + rel->r_offset, sizeof(insn));
+
+	      if ((insn & 0xfc000000) == 0x4c000000)
+		{
+		  /* For pcalau12i + jirl.  */
+		  h->needs_plt = 1;
+		  if (h->plt.refcount < 0)
+		    h->plt.refcount = 0;
+		  h->plt.refcount++;
+		}
+	      else
+		need_dynreloc = 1;
 
 	      h->non_got_ref = 1;
 	      h->pointer_equality_needed = 1;
@@ -949,7 +961,9 @@ loongarch_elf_adjust_dynamic_symbol (struct bfd_link_info *info,
 				     struct elf_link_hash_entry *h)
 {
   struct loongarch_elf_link_hash_table *htab;
+  struct loongarch_elf_link_hash_entry *hent;
   bfd *dynobj;
+  asection *s, *srel;
 
   htab = loongarch_elf_hash_table (info);
   BFD_ASSERT (htab != NULL);
@@ -999,9 +1013,52 @@ loongarch_elf_adjust_dynamic_symbol (struct bfd_link_info *info,
       return true;
     }
 
-  /* R_LARCH_COPY is not adept glibc, not to generate.  */
-  /* Can not print anything, because make check ld.  */
-  return true;
+  /* If copy relocations is disabled via -z nocopyreloc, or we don't find any
+     dynamic relocs in read-only sections, avoid the copy reloc.  */
+  if (info->nocopyreloc || !_bfd_elf_readonly_dynrelocs (h))
+    h->non_got_ref = 0;
+
+  if (!h->non_got_ref)
+    return true;
+
+  /* We must allocate the symbol in our .dynbss section, which will
+     become part of the .bss section of the executable.  There will be
+     an entry for this symbol in the .dynsym section.  The dynamic
+     object will contain position independent code, so all references
+     from the dynamic object to this symbol will go through the global
+     offset table.  The dynamic linker will use the .dynsym entry to
+     determine the address it must put in the global offset table, so
+     both the dynamic object and the regular object will refer to the
+     same memory location for the variable.  */
+
+  /* Generate a copy relocation to tell the dynamic linker to copy the
+     initial value out of the dynamic object and into the runtime process
+     image.  We need to remember the offset into the .rel.bss section we
+     are going to use.  */
+  hent = (struct loongarch_elf_link_hash_entry *) h;
+  if (hent->tls_type & ~GOT_NORMAL)
+    {
+      s = htab->sdyntdata;
+      srel = htab->elf.srelbss;
+    }
+  else if (h->root.u.def.section->flags & SEC_READONLY)
+    {
+      s = htab->elf.sdynrelro;
+      srel = htab->elf.sreldynrelro;
+    }
+  else
+    {
+      s = htab->elf.sdynbss;
+      srel = htab->elf.srelbss;
+    }
+
+  if ((h->root.u.def.section->flags & SEC_ALLOC) != 0 && h->size != 0)
+    {
+      srel->size += sizeof (ElfNN_External_Rela);
+      h->needs_copy = 1;
+    }
+
+  return _bfd_elf_adjust_dynamic_copy (info, h, s);
 }
 
 /* Allocate space in .plt, .got and associated reloc sections for
@@ -3702,6 +3759,24 @@ loongarch_elf_finish_dynamic_symbol (bfd *output_bfd,
       loongarch_elf_append_rela (output_bfd, srela, &rela);
     }
 
+  if (h->needs_copy)
+    {
+      Elf_Internal_Rela rela;
+      asection *s;
+
+      /* This symbols needs a copy reloc.  Set it up.  */
+      BFD_ASSERT (h->dynindx != -1);
+
+      rela.r_offset = sec_addr (h->root.u.def.section) + h->root.u.def.value;
+      rela.r_info = ELFNN_R_INFO (h->dynindx, R_LARCH_COPY);
+      rela.r_addend = 0;
+      if (h->root.u.def.section == htab->elf.sdynrelro)
+	s = htab->elf.sreldynrelro;
+      else
+	s = htab->elf.srelbss;
+      loongarch_elf_append_rela (output_bfd, s, &rela);
+    }
+
   /* Mark some specially defined symbols as absolute.  */
   if (h == htab->elf.hdynamic || h == htab->elf.hgot || h == htab->elf.hplt)
     sym->st_shndx = SHN_ABS;
-- 
2.37.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] LoongArch: Use copy relocation for %pc_lo12 against external symbol
  2022-08-31 13:22 [PATCH] LoongArch: Use copy relocation for %pc_lo12 against external symbol Xi Ruoyao
@ 2022-08-31 13:41 ` Xi Ruoyao
  2022-09-01  1:38   ` liuzhensong
  2022-09-01  1:27 ` liuzhensong
  1 sibling, 1 reply; 12+ messages in thread
From: Xi Ruoyao @ 2022-08-31 13:41 UTC (permalink / raw)
  To: binutils; +Cc: liuzhensong, WANG Xuerui, caiyinyu

Some self review:

On Wed, 2022-08-31 at 21:22 +0800, Xi Ruoyao wrote:
> -             /* For pcalau12i + jirl.  */
> -             h->needs_plt = 1;
> -             if (h->plt.refcount < 0)
> -               h->plt.refcount = 0;
> -             h->plt.refcount++;
> +             /* Check if it's a jirl instruction.  */
> +             bfd_byte *contents = elf_section_data (sec)->this_hdr.contents;
> +             uint32_t insn = 0;
> +
> +             if (contents || bfd_malloc_and_get_section (abfd, sec, &contents))
> +               memcpy(&insn, contents + rel->r_offset, sizeof(insn));
> +
> +             if ((insn & 0xfc000000) == 0x4c000000)

This is tricky.  But in commit 42bd525 we already started to rely on
undocumented %pc_{hi20,lo12} behavior: if you just apply them "as
documented" to the pcalau12i/jirl pairs the result will be absolutely
wrong.  And 42bd525 behavior is not fully correct: if you just write

pcalau12i $t0, %pc_hi20(data)
ld.d, $t0, $t0, %pc_lo12(data)

With 42bd525, the BFD linker generates a "PLT for data" (absolutely
nonsense), and destroys the ld.d instruction.  All of these buggy
behavior happens silently, the user will only find something wrong when
the linked program crashes.

I'm not sure if checking JIRL (i. e. the behavior of a relocation now
depends on the instruction where it's applied) is a good idea.  Maybe we
should use the following instead of pcalau12i/jirl:

pcaddu18i $t0, %b16_hi20(func)
jirl $ra, $t0, %b16(func)

("b16_hi20" is a hypothetical thing here.)  This will make things less
tricky, and also expand the range to 128 GiB.

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] LoongArch: Use copy relocation for %pc_lo12 against external symbol
  2022-08-31 13:22 [PATCH] LoongArch: Use copy relocation for %pc_lo12 against external symbol Xi Ruoyao
  2022-08-31 13:41 ` Xi Ruoyao
@ 2022-09-01  1:27 ` liuzhensong
  2022-09-01  1:27   ` liuzhensong
  2022-09-01  1:41   ` Xi Ruoyao
  1 sibling, 2 replies; 12+ messages in thread
From: liuzhensong @ 2022-09-01  1:27 UTC (permalink / raw)
  To: Xi Ruoyao, binutils; +Cc: WANG Xuerui, chenglulu


在 2022/8/31 下午9:22, Xi Ruoyao 写道:
> We'd like to use PC-relative addressing instead of GOT for external
> symbols in main executable images.  This can improve code locality.
> Doing so will need to implement copy relocation.
>
> Despite there was a comment saying "Glibc does not support copy
> relocation yet", I've run a very simple test and it seems copy
> relocation is handled by ld.so without a problem:
>
> 	pcalau12i       $a1, %pc_hi20(stdout)
> 	ld.d            $a1, $a1, %pc_lo12(stdout)
> 	pcalau12i       $a0, %pc_hi20(msg)
> 	addi.d          $a0, $a0, %pc_lo12(msg)
> 	pcalau12i       $ra, %pc_hi20(fputs)
> 	jirl            $ra, $ra, %pc_lo12(fputs)
>
> With this patch, R_LARCH_COPY is correctly emitted for "stdout" in
> Glibc, and the test program runs and outputs "Hello world" in "msg"
> successfully.
> ---
>   bfd/elfnn-loongarch.c | 93 ++++++++++++++++++++++++++++++++++++++-----
>   1 file changed, 84 insertions(+), 9 deletions(-)
>
> diff --git a/bfd/elfnn-loongarch.c b/bfd/elfnn-loongarch.c
> index ed42b8b6770..0cf24bd24d8 100644
> --- a/bfd/elfnn-loongarch.c
> +++ b/bfd/elfnn-loongarch.c
> @@ -743,14 +743,26 @@ loongarch_elf_check_relocs (bfd *abfd, struct bfd_link_info *info,
>   	    h->non_got_ref = 1;
>   	  break;
>   
> -	case R_LARCH_PCALA_HI20:
> +	case R_LARCH_PCALA_LO12:
>   	  if (h != NULL)
>   	    {
> -	      /* For pcalau12i + jirl.  */
> -	      h->needs_plt = 1;
> -	      if (h->plt.refcount < 0)
> -		h->plt.refcount = 0;
> -	      h->plt.refcount++;
> +	      /* Check if it's a jirl instruction.  */
> +	      bfd_byte *contents = elf_section_data (sec)->this_hdr.contents;
> +	      uint32_t insn = 0;
> +
> +	      if (contents || bfd_malloc_and_get_section (abfd, sec, &contents))
> +		memcpy(&insn, contents + rel->r_offset, sizeof(insn));
> +
> +	      if ((insn & 0xfc000000) == 0x4c000000)
> +		{
> +		  /* For pcalau12i + jirl.  */
> +		  h->needs_plt = 1;
> +		  if (h->plt.refcount < 0)
> +		    h->plt.refcount = 0;
> +		  h->plt.refcount++;
> +		}
> +	      else
> +		need_dynreloc = 1;
>   
>   	      h->non_got_ref = 1;
>   	      h->pointer_equality_needed = 1;
> @@ -949,7 +961,9 @@ loongarch_elf_adjust_dynamic_symbol (struct bfd_link_info *info,
>   				     struct elf_link_hash_entry *h)
>   {
>     struct loongarch_elf_link_hash_table *htab;
> +  struct loongarch_elf_link_hash_entry *hent;
>     bfd *dynobj;
> +  asection *s, *srel;
>   
>     htab = loongarch_elf_hash_table (info);
>     BFD_ASSERT (htab != NULL);
> @@ -999,9 +1013,52 @@ loongarch_elf_adjust_dynamic_symbol (struct bfd_link_info *info,
>         return true;
>       }
>   
> -  /* R_LARCH_COPY is not adept glibc, not to generate.  */
> -  /* Can not print anything, because make check ld.  */
> -  return true;
> +  /* If copy relocations is disabled via -z nocopyreloc, or we don't find any
> +     dynamic relocs in read-only sections, avoid the copy reloc.  */
> +  if (info->nocopyreloc || !_bfd_elf_readonly_dynrelocs (h))
> +    h->non_got_ref = 0;
> +
> +  if (!h->non_got_ref)
> +    return true;
> +
> +  /* We must allocate the symbol in our .dynbss section, which will
> +     become part of the .bss section of the executable.  There will be
> +     an entry for this symbol in the .dynsym section.  The dynamic
> +     object will contain position independent code, so all references
> +     from the dynamic object to this symbol will go through the global
> +     offset table.  The dynamic linker will use the .dynsym entry to
> +     determine the address it must put in the global offset table, so
> +     both the dynamic object and the regular object will refer to the
> +     same memory location for the variable.  */
> +
> +  /* Generate a copy relocation to tell the dynamic linker to copy the
> +     initial value out of the dynamic object and into the runtime process
> +     image.  We need to remember the offset into the .rel.bss section we
> +     are going to use.  */
> +  hent = (struct loongarch_elf_link_hash_entry *) h;
> +  if (hent->tls_type & ~GOT_NORMAL)
> +    {
> +      s = htab->sdyntdata;
> +      srel = htab->elf.srelbss;
> +    }
> +  else if (h->root.u.def.section->flags & SEC_READONLY)
> +    {
> +      s = htab->elf.sdynrelro;
> +      srel = htab->elf.sreldynrelro;
> +    }
> +  else
> +    {
> +      s = htab->elf.sdynbss;
> +      srel = htab->elf.srelbss;
> +    }
> +
> +  if ((h->root.u.def.section->flags & SEC_ALLOC) != 0 && h->size != 0)
> +    {
> +      srel->size += sizeof (ElfNN_External_Rela);
> +      h->needs_copy = 1;
> +    }
> +
> +  return _bfd_elf_adjust_dynamic_copy (info, h, s);
>   }
>   
>   /* Allocate space in .plt, .got and associated reloc sections for
> @@ -3702,6 +3759,24 @@ loongarch_elf_finish_dynamic_symbol (bfd *output_bfd,
>         loongarch_elf_append_rela (output_bfd, srela, &rela);
>       }
>   
> +  if (h->needs_copy)
> +    {
> +      Elf_Internal_Rela rela;
> +      asection *s;
> +
> +      /* This symbols needs a copy reloc.  Set it up.  */
> +      BFD_ASSERT (h->dynindx != -1);
> +
> +      rela.r_offset = sec_addr (h->root.u.def.section) + h->root.u.def.value;
> +      rela.r_info = ELFNN_R_INFO (h->dynindx, R_LARCH_COPY);
> +      rela.r_addend = 0;
> +      if (h->root.u.def.section == htab->elf.sdynrelro)
> +	s = htab->elf.sreldynrelro;
> +      else
> +	s = htab->elf.srelbss;
> +      loongarch_elf_append_rela (output_bfd, s, &rela);
> +    }
> +
>     /* Mark some specially defined symbols as absolute.  */
>     if (h == htab->elf.hdynamic || h == htab->elf.hgot || h == htab->elf.hplt)
>       sym->st_shndx = SHN_ABS;

The R_LARCH_COPY is supported in older versions,we removed this feature 
and are not going to support.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] LoongArch: Use copy relocation for %pc_lo12 against external symbol
  2022-09-01  1:27 ` liuzhensong
@ 2022-09-01  1:27   ` liuzhensong
  2022-09-01  1:41   ` Xi Ruoyao
  1 sibling, 0 replies; 12+ messages in thread
From: liuzhensong @ 2022-09-01  1:27 UTC (permalink / raw)
  To: Xi Ruoyao, binutils; +Cc: WANG Xuerui, chenglulu

[-- Attachment #1: Type: text/plain, Size: 5715 bytes --]


在 2022/8/31 下午9:22, Xi Ruoyao 写道:
> We'd like to use PC-relative addressing instead of GOT for external
> symbols in main executable images.  This can improve code locality.
> Doing so will need to implement copy relocation.
>
> Despite there was a comment saying "Glibc does not support copy
> relocation yet", I've run a very simple test and it seems copy
> relocation is handled by ld.so without a problem:
>
> 	pcalau12i       $a1, %pc_hi20(stdout)
> 	ld.d            $a1, $a1, %pc_lo12(stdout)
> 	pcalau12i       $a0, %pc_hi20(msg)
> 	addi.d          $a0, $a0, %pc_lo12(msg)
> 	pcalau12i       $ra, %pc_hi20(fputs)
> 	jirl            $ra, $ra, %pc_lo12(fputs)
>
> With this patch, R_LARCH_COPY is correctly emitted for "stdout" in
> Glibc, and the test program runs and outputs "Hello world" in "msg"
> successfully.
> ---
>   bfd/elfnn-loongarch.c | 93 ++++++++++++++++++++++++++++++++++++++-----
>   1 file changed, 84 insertions(+), 9 deletions(-)
>
> diff --git a/bfd/elfnn-loongarch.c b/bfd/elfnn-loongarch.c
> index ed42b8b6770..0cf24bd24d8 100644
> --- a/bfd/elfnn-loongarch.c
> +++ b/bfd/elfnn-loongarch.c
> @@ -743,14 +743,26 @@ loongarch_elf_check_relocs (bfd *abfd, struct bfd_link_info *info,
>   	    h->non_got_ref = 1;
>   	  break;
>   
> -	case R_LARCH_PCALA_HI20:
> +	case R_LARCH_PCALA_LO12:
>   	  if (h != NULL)
>   	    {
> -	      /* For pcalau12i + jirl.  */
> -	      h->needs_plt = 1;
> -	      if (h->plt.refcount < 0)
> -		h->plt.refcount = 0;
> -	      h->plt.refcount++;
> +	      /* Check if it's a jirl instruction.  */
> +	      bfd_byte *contents = elf_section_data (sec)->this_hdr.contents;
> +	      uint32_t insn = 0;
> +
> +	      if (contents || bfd_malloc_and_get_section (abfd, sec, &contents))
> +		memcpy(&insn, contents + rel->r_offset, sizeof(insn));
> +
> +	      if ((insn & 0xfc000000) == 0x4c000000)
> +		{
> +		  /* For pcalau12i + jirl.  */
> +		  h->needs_plt = 1;
> +		  if (h->plt.refcount < 0)
> +		    h->plt.refcount = 0;
> +		  h->plt.refcount++;
> +		}
> +	      else
> +		need_dynreloc = 1;
>   
>   	      h->non_got_ref = 1;
>   	      h->pointer_equality_needed = 1;
> @@ -949,7 +961,9 @@ loongarch_elf_adjust_dynamic_symbol (struct bfd_link_info *info,
>   				     struct elf_link_hash_entry *h)
>   {
>     struct loongarch_elf_link_hash_table *htab;
> +  struct loongarch_elf_link_hash_entry *hent;
>     bfd *dynobj;
> +  asection *s, *srel;
>   
>     htab = loongarch_elf_hash_table (info);
>     BFD_ASSERT (htab != NULL);
> @@ -999,9 +1013,52 @@ loongarch_elf_adjust_dynamic_symbol (struct bfd_link_info *info,
>         return true;
>       }
>   
> -  /* R_LARCH_COPY is not adept glibc, not to generate.  */
> -  /* Can not print anything, because make check ld.  */
> -  return true;
> +  /* If copy relocations is disabled via -z nocopyreloc, or we don't find any
> +     dynamic relocs in read-only sections, avoid the copy reloc.  */
> +  if (info->nocopyreloc || !_bfd_elf_readonly_dynrelocs (h))
> +    h->non_got_ref = 0;
> +
> +  if (!h->non_got_ref)
> +    return true;
> +
> +  /* We must allocate the symbol in our .dynbss section, which will
> +     become part of the .bss section of the executable.  There will be
> +     an entry for this symbol in the .dynsym section.  The dynamic
> +     object will contain position independent code, so all references
> +     from the dynamic object to this symbol will go through the global
> +     offset table.  The dynamic linker will use the .dynsym entry to
> +     determine the address it must put in the global offset table, so
> +     both the dynamic object and the regular object will refer to the
> +     same memory location for the variable.  */
> +
> +  /* Generate a copy relocation to tell the dynamic linker to copy the
> +     initial value out of the dynamic object and into the runtime process
> +     image.  We need to remember the offset into the .rel.bss section we
> +     are going to use.  */
> +  hent = (struct loongarch_elf_link_hash_entry *) h;
> +  if (hent->tls_type & ~GOT_NORMAL)
> +    {
> +      s = htab->sdyntdata;
> +      srel = htab->elf.srelbss;
> +    }
> +  else if (h->root.u.def.section->flags & SEC_READONLY)
> +    {
> +      s = htab->elf.sdynrelro;
> +      srel = htab->elf.sreldynrelro;
> +    }
> +  else
> +    {
> +      s = htab->elf.sdynbss;
> +      srel = htab->elf.srelbss;
> +    }
> +
> +  if ((h->root.u.def.section->flags & SEC_ALLOC) != 0 && h->size != 0)
> +    {
> +      srel->size += sizeof (ElfNN_External_Rela);
> +      h->needs_copy = 1;
> +    }
> +
> +  return _bfd_elf_adjust_dynamic_copy (info, h, s);
>   }
>   
>   /* Allocate space in .plt, .got and associated reloc sections for
> @@ -3702,6 +3759,24 @@ loongarch_elf_finish_dynamic_symbol (bfd *output_bfd,
>         loongarch_elf_append_rela (output_bfd, srela, &rela);
>       }
>   
> +  if (h->needs_copy)
> +    {
> +      Elf_Internal_Rela rela;
> +      asection *s;
> +
> +      /* This symbols needs a copy reloc.  Set it up.  */
> +      BFD_ASSERT (h->dynindx != -1);
> +
> +      rela.r_offset = sec_addr (h->root.u.def.section) + h->root.u.def.value;
> +      rela.r_info = ELFNN_R_INFO (h->dynindx, R_LARCH_COPY);
> +      rela.r_addend = 0;
> +      if (h->root.u.def.section == htab->elf.sdynrelro)
> +	s = htab->elf.sreldynrelro;
> +      else
> +	s = htab->elf.srelbss;
> +      loongarch_elf_append_rela (output_bfd, s, &rela);
> +    }
> +
>     /* Mark some specially defined symbols as absolute.  */
>     if (h == htab->elf.hdynamic || h == htab->elf.hgot || h == htab->elf.hplt)
>       sym->st_shndx = SHN_ABS;

The R_LARCH_COPY is supported in older versions,we removed this feature 
and are not going to support.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] LoongArch: Use copy relocation for %pc_lo12 against external symbol
  2022-08-31 13:41 ` Xi Ruoyao
@ 2022-09-01  1:38   ` liuzhensong
  2022-09-01  1:38     ` liuzhensong
  2022-09-01  2:12     ` Xi Ruoyao
  0 siblings, 2 replies; 12+ messages in thread
From: liuzhensong @ 2022-09-01  1:38 UTC (permalink / raw)
  To: Xi Ruoyao, binutils; +Cc: WANG Xuerui, chenglulu


在 2022/8/31 下午9:41, Xi Ruoyao 写道:
> Some self review:
>
> On Wed, 2022-08-31 at 21:22 +0800, Xi Ruoyao wrote:
>> -             /* For pcalau12i + jirl.  */
>> -             h->needs_plt = 1;
>> -             if (h->plt.refcount < 0)
>> -               h->plt.refcount = 0;
>> -             h->plt.refcount++;
>> +             /* Check if it's a jirl instruction.  */
>> +             bfd_byte *contents = elf_section_data (sec)->this_hdr.contents;
>> +             uint32_t insn = 0;
>> +
>> +             if (contents || bfd_malloc_and_get_section (abfd, sec, &contents))
>> +               memcpy(&insn, contents + rel->r_offset, sizeof(insn));
>> +
>> +             if ((insn & 0xfc000000) == 0x4c000000)
> This is tricky.  But in commit 42bd525 we already started to rely on
> undocumented %pc_{hi20,lo12} behavior: if you just apply them "as
> documented" to the pcalau12i/jirl pairs the result will be absolutely
> wrong.  And 42bd525 behavior is not fully correct: if you just write
>
> pcalau12i $t0, %pc_hi20(data)
> ld.d, $t0, $t0, %pc_lo12(data)

Do you have a test case?

>
> With 42bd525, the BFD linker generates a "PLT for data" (absolutely
> nonsense), and destroys the ld.d instruction.  All of these buggy
> behavior happens silently, the user will only find something wrong when
> the linked program crashes.
>
> I'm not sure if checking JIRL (i. e. the behavior of a relocation now
> depends on the instruction where it's applied) is a good idea.  Maybe we
> should use the following instead of pcalau12i/jirl:
>
> pcaddu18i $t0, %b16_hi20(func)
> jirl $ra, $t0, %b16(func)
>
> ("b16_hi20" is a hypothetical thing here.)  This will make things less
> tricky, and also expand the range to 128 GiB.

It doesn't make sense for only "pcaddu18i + jirl" to access 128G.What we 
need is a jump that can access ±2G, just like any other pc-relative instructions can access ±2G.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] LoongArch: Use copy relocation for %pc_lo12 against external symbol
  2022-09-01  1:38   ` liuzhensong
@ 2022-09-01  1:38     ` liuzhensong
  2022-09-01  2:12     ` Xi Ruoyao
  1 sibling, 0 replies; 12+ messages in thread
From: liuzhensong @ 2022-09-01  1:38 UTC (permalink / raw)
  To: Xi Ruoyao, binutils; +Cc: WANG Xuerui, chenglulu

[-- Attachment #1: Type: text/plain, Size: 2043 bytes --]


在 2022/8/31 下午9:41, Xi Ruoyao 写道:
> Some self review:
>
> On Wed, 2022-08-31 at 21:22 +0800, Xi Ruoyao wrote:
>> -             /* For pcalau12i + jirl.  */
>> -             h->needs_plt = 1;
>> -             if (h->plt.refcount < 0)
>> -               h->plt.refcount = 0;
>> -             h->plt.refcount++;
>> +             /* Check if it's a jirl instruction.  */
>> +             bfd_byte *contents = elf_section_data (sec)->this_hdr.contents;
>> +             uint32_t insn = 0;
>> +
>> +             if (contents || bfd_malloc_and_get_section (abfd, sec, &contents))
>> +               memcpy(&insn, contents + rel->r_offset, sizeof(insn));
>> +
>> +             if ((insn & 0xfc000000) == 0x4c000000)
> This is tricky.  But in commit 42bd525 we already started to rely on
> undocumented %pc_{hi20,lo12} behavior: if you just apply them "as
> documented" to the pcalau12i/jirl pairs the result will be absolutely
> wrong.  And 42bd525 behavior is not fully correct: if you just write
>
> pcalau12i $t0, %pc_hi20(data)
> ld.d, $t0, $t0, %pc_lo12(data)

Do you have a test case?

>
> With 42bd525, the BFD linker generates a "PLT for data" (absolutely
> nonsense), and destroys the ld.d instruction.  All of these buggy
> behavior happens silently, the user will only find something wrong when
> the linked program crashes.
>
> I'm not sure if checking JIRL (i. e. the behavior of a relocation now
> depends on the instruction where it's applied) is a good idea.  Maybe we
> should use the following instead of pcalau12i/jirl:
>
> pcaddu18i $t0, %b16_hi20(func)
> jirl $ra, $t0, %b16(func)
>
> ("b16_hi20" is a hypothetical thing here.)  This will make things less
> tricky, and also expand the range to 128 GiB.

It doesn't make sense for only "pcaddu18i + jirl" to access 128G.What we 
need is a jump that can access ±2G, just like any other pc-relative instructions can access ±2G.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] LoongArch: Use copy relocation for %pc_lo12 against external symbol
  2022-09-01  1:27 ` liuzhensong
  2022-09-01  1:27   ` liuzhensong
@ 2022-09-01  1:41   ` Xi Ruoyao
  2022-09-01  2:09     ` liuzhensong
  2022-09-01  7:42     ` Fangrui Song
  1 sibling, 2 replies; 12+ messages in thread
From: Xi Ruoyao @ 2022-09-01  1:41 UTC (permalink / raw)
  To: liuzhensong, binutils; +Cc: WANG Xuerui, chenglulu

On Thu, 2022-09-01 at 09:27 +0800, liuzhensong wrote:

> The R_LARCH_COPY is supported in older versions, we removed this
> feature and are not going to support.

Hmm, then for

extern int x;
int f() { return x; }

it will produce a GOT access.  If x is in another TU but not in a shared
library, such a GOT access and the GOT entry is unneeded.

This will be really a pain for situations like building an OS kernel, or
with -static or -static-pie.  Even for this simple example, there is no
way to tell "x" is not in a shared library (maybe, expect LTO or linker
relaxation, but IIRC LTO or relaxation do not *guarantee* to avoid the
GOT, just "try to" avoid the GOT).

So is there any serious reason forbidding us from using R_LARCH_COPY? Or
any options for this "unneeded GOT" issue?
-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] LoongArch: Use copy relocation for %pc_lo12 against external symbol
  2022-09-01  1:41   ` Xi Ruoyao
@ 2022-09-01  2:09     ` liuzhensong
  2022-09-01  7:42     ` Fangrui Song
  1 sibling, 0 replies; 12+ messages in thread
From: liuzhensong @ 2022-09-01  2:09 UTC (permalink / raw)
  To: Xi Ruoyao, binutils; +Cc: WANG Xuerui, chenglulu


在 2022/9/1 上午9:41, Xi Ruoyao 写道:
> On Thu, 2022-09-01 at 09:27 +0800, liuzhensong wrote:
>
>> The R_LARCH_COPY is supported in older versions, we removed this
>> feature and are not going to support.
> Hmm, then for
>
> extern int x;
> int f() { return x; }
>
> it will produce a GOT access.  If x is in another TU but not in a shared
> library, such a GOT access and the GOT entry is unneeded.
>
> This will be really a pain for situations like building an OS kernel, or
> with -static or -static-pie.  Even for this simple example, there is no
> way to tell "x" is not in a shared library (maybe, expect LTO or linker
> relaxation, but IIRC LTO or relaxation do not *guarantee* to avoid the
> GOT, just "try to" avoid the GOT).
>
> So is there any serious reason forbidding us from using R_LARCH_COPY? Or
> any options for this "unneeded GOT" issue?
https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] LoongArch: Use copy relocation for %pc_lo12 against external symbol
  2022-09-01  1:38   ` liuzhensong
  2022-09-01  1:38     ` liuzhensong
@ 2022-09-01  2:12     ` Xi Ruoyao
  2022-09-01  2:31       ` liuzhensong
  1 sibling, 1 reply; 12+ messages in thread
From: Xi Ruoyao @ 2022-09-01  2:12 UTC (permalink / raw)
  To: liuzhensong, binutils; +Cc: WANG Xuerui, chenglulu

On Thu, 2022-09-01 at 09:38 +0800, liuzhensong wrote:
> > But in commit 42bd525 we already started to rely on
> > undocumented %pc_{hi20,lo12} behavior: if you just apply them "as
> > documented" to the pcalau12i/jirl pairs the result will be
> > absolutely
> > wrong.  And 42bd525 behavior is not fully correct: if you just write
> > 
> > pcalau12i $t0, %pc_hi20(data)
> > ld.d, $t0, $t0, %pc_lo12(data)

> Do you have a test case?

$ cat t2.s   
.text
.align 2
.type x, @function
.global x
x:
	pcalau12i	$a0, %pc_hi20(data)
	ld.d		$a0, $a0, %pc_lo12(data)
	jr		$ra
$ gcc t2.s -c
$ ./ld/ld-new t2.o -shared
$ objdump -d | grep -A50 data
0000000000000210 <data@plt>:
 210:	1c00010f 	pcaddu12i   	$t3, 8(0x8)
 214:	28f801ef 	ld.d        	$t3, $t3, -512(0xe00)
 218:	4c0001ed 	jirl        	$t1, $t3, 0
 21c:	03400000 	andi        	$zero, $zero, 0x0

Disassembly of section .text:

0000000000000220 <x>:
 220:	1a000004 	pcalau12i   	$a0, 0
 224:	28c84084 	ld.d        	$a0, $a0, 528(0x210)
 228:	4c000020 	jirl        	$zero, $ra, 0

i.e.  Instead of reporting an error like "cannot create a runtime
relocation against external symbol 'data'", the linker silently produces
a PLT (nonsense: can you use a PLT for data?) and load two instructions
from the PLT into the register (also nonsense).  So if someone mistypes
"la.local" where "la.global" should be used (it's just a simple
programming mistake, and it's likely to happen in the practice!), the
linking will "succeeds" silently.  Then the program blows up at runtime.

> It doesn't make sense for only "pcaddu18i + jirl" to access 128G.
> What we need is a jump that can access ±2G, just like any other pc-
> relative instructions can access ±2G.

The point is, if we interpret %pc_lo12 "as it's documented":

    "(*(uint32_t *) PC) [21 ... 10] = (S+A) [11 ... 0]"

it will be absolutely wrong for a jirl instruction.  You may update the
doc to say something like "if R_LARCH_PCALA_LO12 is applied to a jirl
instruction, a PLT entry will be created and blah blah". But again I'm
not sure about if "the behavior of a relocation depends on the
instruction for which it's applied" is a good idea.

<rant>We are using highly imprecise descriptions for PCALA-style
relocations in ELF psABI, despite I've disagreed in the review.  Now if
someone wants to know "how this relocation will *really* behave", he/she
will need to read BFD code.  PCALAU12I instruction itself is already
puzzling enough (comparing with PCADDU12I, which behaves more "normal"),
now the doc just makes it more puzzling.</rant>
-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] LoongArch: Use copy relocation for %pc_lo12 against external symbol
  2022-09-01  2:12     ` Xi Ruoyao
@ 2022-09-01  2:31       ` liuzhensong
  2022-09-01  2:31         ` liuzhensong
  0 siblings, 1 reply; 12+ messages in thread
From: liuzhensong @ 2022-09-01  2:31 UTC (permalink / raw)
  To: Xi Ruoyao, binutils; +Cc: WANG Xuerui, chenglulu


在 2022/9/1 上午10:12, Xi Ruoyao 写道:
> On Thu, 2022-09-01 at 09:38 +0800, liuzhensong wrote:
>>> But in commit 42bd525 we already started to rely on
>>> undocumented %pc_{hi20,lo12} behavior: if you just apply them "as
>>> documented" to the pcalau12i/jirl pairs the result will be
>>> absolutely
>>> wrong.  And 42bd525 behavior is not fully correct: if you just write
>>>
>>> pcalau12i $t0, %pc_hi20(data)
>>> ld.d, $t0, $t0, %pc_lo12(data)
>> Do you have a test case?
> $ cat t2.s
> .text
> .align 2
> .type x, @function
> .global x
> x:
> 	pcalau12i	$a0, %pc_hi20(data)
> 	ld.d		$a0, $a0, %pc_lo12(data)
> 	jr		$ra
> $ gcc t2.s -c
> $ ./ld/ld-new t2.o -shared
> $ objdump -d | grep -A50 data
> 0000000000000210 <data@plt>:
>   210:	1c00010f 	pcaddu12i   	$t3, 8(0x8)
>   214:	28f801ef 	ld.d        	$t3, $t3, -512(0xe00)
>   218:	4c0001ed 	jirl        	$t1, $t3, 0
>   21c:	03400000 	andi        	$zero, $zero, 0x0
>
> Disassembly of section .text:
>
> 0000000000000220 <x>:
>   220:	1a000004 	pcalau12i   	$a0, 0
>   224:	28c84084 	ld.d        	$a0, $a0, 528(0x210)
>   228:	4c000020 	jirl        	$zero, $ra, 0
>
> i.e.  Instead of reporting an error like "cannot create a runtime
> relocation against external symbol 'data'", the linker silently produces
> a PLT (nonsense: can you use a PLT for data?) and load two instructions
> from the PLT into the register (also nonsense).  So if someone mistypes
> "la.local" where "la.global" should be used (it's just a simple
> programming mistake, and it's likely to happen in the practice!), the
> linking will "succeeds" silently.  Then the program blows up at runtime.
This can be fixed as a bug.
>> It doesn't make sense for only "pcaddu18i + jirl" to access 128G.
>> What we need is a jump that can access ±2G, just like any other pc-
>> relative instructions can access ±2G.
> The point is, if we interpret %pc_lo12 "as it's documented":
>
>      "(*(uint32_t *) PC) [21 ... 10] = (S+A) [11 ... 0]"
>
> it will be absolutely wrong for a jirl instruction.  You may update the
> doc to say something like "if R_LARCH_PCALA_LO12 is applied to a jirl
> instruction, a PLT entry will be created and blah blah". But again I'm
> not sure about if "the behavior of a relocation depends on the
> instruction for which it's applied" is a good idea.
>
> <rant>We are using highly imprecise descriptions for PCALA-style
> relocations in ELF psABI, despite I've disagreed in the review.  Now if
> someone wants to know "how this relocation will *really* behave", he/she
> will need to read BFD code.  PCALAU12I instruction itself is already
> puzzling enough (comparing with PCADDU12I, which behaves more "normal"),
> now the doc just makes it more puzzling.</rant>
pcalau12i makes it easier to access 4k starting addresses, and pcaddu12i 
need more info in relocation.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] LoongArch: Use copy relocation for %pc_lo12 against external symbol
  2022-09-01  2:31       ` liuzhensong
@ 2022-09-01  2:31         ` liuzhensong
  0 siblings, 0 replies; 12+ messages in thread
From: liuzhensong @ 2022-09-01  2:31 UTC (permalink / raw)
  To: Xi Ruoyao, binutils; +Cc: WANG Xuerui, chenglulu

[-- Attachment #1: Type: text/plain, Size: 2824 bytes --]


在 2022/9/1 上午10:12, Xi Ruoyao 写道:
> On Thu, 2022-09-01 at 09:38 +0800, liuzhensong wrote:
>>> But in commit 42bd525 we already started to rely on
>>> undocumented %pc_{hi20,lo12} behavior: if you just apply them "as
>>> documented" to the pcalau12i/jirl pairs the result will be
>>> absolutely
>>> wrong.  And 42bd525 behavior is not fully correct: if you just write
>>>
>>> pcalau12i $t0, %pc_hi20(data)
>>> ld.d, $t0, $t0, %pc_lo12(data)
>> Do you have a test case?
> $ cat t2.s
> .text
> .align 2
> .type x, @function
> .global x
> x:
> 	pcalau12i	$a0, %pc_hi20(data)
> 	ld.d		$a0, $a0, %pc_lo12(data)
> 	jr		$ra
> $ gcc t2.s -c
> $ ./ld/ld-new t2.o -shared
> $ objdump -d | grep -A50 data
> 0000000000000210 <data@plt>:
>   210:	1c00010f 	pcaddu12i   	$t3, 8(0x8)
>   214:	28f801ef 	ld.d        	$t3, $t3, -512(0xe00)
>   218:	4c0001ed 	jirl        	$t1, $t3, 0
>   21c:	03400000 	andi        	$zero, $zero, 0x0
>
> Disassembly of section .text:
>
> 0000000000000220 <x>:
>   220:	1a000004 	pcalau12i   	$a0, 0
>   224:	28c84084 	ld.d        	$a0, $a0, 528(0x210)
>   228:	4c000020 	jirl        	$zero, $ra, 0
>
> i.e.  Instead of reporting an error like "cannot create a runtime
> relocation against external symbol 'data'", the linker silently produces
> a PLT (nonsense: can you use a PLT for data?) and load two instructions
> from the PLT into the register (also nonsense).  So if someone mistypes
> "la.local" where "la.global" should be used (it's just a simple
> programming mistake, and it's likely to happen in the practice!), the
> linking will "succeeds" silently.  Then the program blows up at runtime.
This can be fixed as a bug.
>> It doesn't make sense for only "pcaddu18i + jirl" to access 128G.
>> What we need is a jump that can access ±2G, just like any other pc-
>> relative instructions can access ±2G.
> The point is, if we interpret %pc_lo12 "as it's documented":
>
>      "(*(uint32_t *) PC) [21 ... 10] = (S+A) [11 ... 0]"
>
> it will be absolutely wrong for a jirl instruction.  You may update the
> doc to say something like "if R_LARCH_PCALA_LO12 is applied to a jirl
> instruction, a PLT entry will be created and blah blah". But again I'm
> not sure about if "the behavior of a relocation depends on the
> instruction for which it's applied" is a good idea.
>
> <rant>We are using highly imprecise descriptions for PCALA-style
> relocations in ELF psABI, despite I've disagreed in the review.  Now if
> someone wants to know "how this relocation will *really* behave", he/she
> will need to read BFD code.  PCALAU12I instruction itself is already
> puzzling enough (comparing with PCADDU12I, which behaves more "normal"),
> now the doc just makes it more puzzling.</rant>
pcalau12i makes it easier to access 4k starting addresses, and pcaddu12i 
need more info in relocation.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] LoongArch: Use copy relocation for %pc_lo12 against external symbol
  2022-09-01  1:41   ` Xi Ruoyao
  2022-09-01  2:09     ` liuzhensong
@ 2022-09-01  7:42     ` Fangrui Song
  1 sibling, 0 replies; 12+ messages in thread
From: Fangrui Song @ 2022-09-01  7:42 UTC (permalink / raw)
  To: Xi Ruoyao; +Cc: liuzhensong, binutils, chenglulu

On 2022-09-01, Xi Ruoyao via Binutils wrote:
>On Thu, 2022-09-01 at 09:27 +0800, liuzhensong wrote:
>
>> The R_LARCH_COPY is supported in older versions, we removed this
>> feature and are not going to support.
>
>Hmm, then for
>
>extern int x;
>int f() { return x; }
>
>it will produce a GOT access.  If x is in another TU but not in a shared
>library, such a GOT access and the GOT entry is unneeded.
>
>This will be really a pain for situations like building an OS kernel, or
>with -static or -static-pie.  Even for this simple example, there is no
>way to tell "x" is not in a shared library (maybe, expect LTO or linker
>relaxation, but IIRC LTO or relaxation do not *guarantee* to avoid the
>GOT, just "try to" avoid the GOT).
>
>So is there any serious reason forbidding us from using R_LARCH_COPY? Or
>any options for this "unneeded GOT" issue?
>-- 
>Xi Ruoyao <xry111@xry111.site>
>School of Aerospace Science and Technology, Xidian University

There is a slow trend to eliminate copy relocations for ELF platforms.
If a new port doesn't have copy relocations, leave it.

An architecture can add GOT indirection to PC-relative addressing
optimization if desired.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-09-01  7:42 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-31 13:22 [PATCH] LoongArch: Use copy relocation for %pc_lo12 against external symbol Xi Ruoyao
2022-08-31 13:41 ` Xi Ruoyao
2022-09-01  1:38   ` liuzhensong
2022-09-01  1:38     ` liuzhensong
2022-09-01  2:12     ` Xi Ruoyao
2022-09-01  2:31       ` liuzhensong
2022-09-01  2:31         ` liuzhensong
2022-09-01  1:27 ` liuzhensong
2022-09-01  1:27   ` liuzhensong
2022-09-01  1:41   ` Xi Ruoyao
2022-09-01  2:09     ` liuzhensong
2022-09-01  7:42     ` Fangrui Song

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).