[PATCH] pe/coff - add support for base64 encoded long section names

public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed

* [PATCH] pe/coff - add support for base64 encoded long section names
@ 2023-05-11  5:28 Tristan Gingold
  2023-05-17 13:30 ` Nick Clifton
  0 siblings, 1 reply; 9+ messages in thread
From: Tristan Gingold @ 2023-05-11  5:28 UTC (permalink / raw)
  To: binutils

Hello,

to overcome the 8 characters limit of COFF section names, PE has introduced a convention to put long section names in the strtab and use special section names '/nnnnnnn' when this convention is used.

LLVM has added another convention for very large strtab, using '//xxxxxx' names and base64 encoding of the index in the strtab.

For the LLVM implementation, see:
https://github.com/llvm/llvm-project/blob/6311ab21474a0f3e0340515185cd1d6e33a9892a/llvm/lib/BinaryFormat/COFF.cpp#L20

The object files generated by LLVM using this convention cannot be handled by binutils and this can result in weird errors like:

objdump: ../../unisim_retarget_VCOMP.o: warning: COMDAT symbol '.rdata$.refptr.ieee__numeric_std__ELABORATED' does not match section name '//AA1GKj'
objdump: ../../unisim_retarget_VCOMP.o: warning: COMDAT symbol '.rdata$.refptr.ieee__std_logic_1164__ELABORATED' does not match section name '//AA1GLQ'

This patch adds support in bfd to correctly decode those names.
There is no real need to support encoding as the section names are usually put first in the strtab (issues could happen only if there are a huge number of sections with very long names).

Mostly manually checked, no regressions with 'make check' (when configured for x86_64-linux + mingw64 target enabled).

It has been a while since I haven't submitted a patch, I hope I am correctly following the procedure!

Tristan.

bfd/
	* coffgen.c (extract_long_section_name): New function extracted
	from ...
	(make_a_section_from_file): ... here.  Add support for base64
	long section names.


diff --git a/bfd/coffgen.c b/bfd/coffgen.c
index ac936def566..c7ccfae59ed 100644
--- a/bfd/coffgen.c
+++ b/bfd/coffgen.c
@@ -43,6 +43,29 @@
#include "coff/internal.h"
#include "libcoff.h"

+/* Extract a long section name at STRINDEX and copy it to the bfd objstack.
+   Return NULL in case of error.  */
+
+static char *
+extract_long_section_name(bfd *abfd, unsigned long strindex)
+{
+  const char *strings;
+  char *name;
+
+  strings = _bfd_coff_read_string_table (abfd);
+  if (strings == NULL)
+    return NULL;
+  if ((bfd_size_type)(strindex + 2) >= obj_coff_strings_len (abfd))
+    return NULL;
+  strings += strindex;
+  name = (char *) bfd_alloc (abfd, (bfd_size_type) strlen (strings) + 1 + 1);
+  if (name == NULL)
+    return NULL;
+  strcpy (name, strings);
+
+  return name;
+}
+
/* Take a section header read from a coff file (in HOST byte order),
   and make a BFD "section" out of it.  This is used by ECOFF.  */

@@ -67,32 +90,62 @@ make_a_section_from_file (bfd *abfd,
  if (bfd_coff_set_long_section_names (abfd, bfd_coff_long_section_names (abfd))
      && hdr->s_name[0] == '/')
    {
-      char buf[SCNNMLEN];
-      long strindex;
-      char *p;
-      const char *strings;
-
      /* Flag that this BFD uses long names, even though the format might
	 expect them to be off by default.  This won't directly affect the
	 format of any output BFD created from this one, but the information
	 can be used to decide what to do.  */
      bfd_coff_set_long_section_names (abfd, true);
-      memcpy (buf, hdr->s_name + 1, SCNNMLEN - 1);
-      buf[SCNNMLEN - 1] = '\0';
-      strindex = strtol (buf, &p, 10);
-      if (*p == '\0' && strindex >= 0)
+
+      if (hdr->s_name[1] == '/')
	{
-	  strings = _bfd_coff_read_string_table (abfd);
-	  if (strings == NULL)
-	    return false;
-	  if ((bfd_size_type)(strindex + 2) >= obj_coff_strings_len (abfd))
-	    return false;
-	  strings += strindex;
-	  name = (char *) bfd_alloc (abfd,
-				     (bfd_size_type) strlen (strings) + 1 + 1);
+	  /* LLVM extension: the '/' is followed by another '/' and then by
+	     the index in the strtab encoded in base64 without NUL at the
+	     end.  */
+	  unsigned strindex;
+	  unsigned i;
+
+	  strindex = 0;
+	  for (i = 2; i < SCNNMLEN; i++)
+	    {
+	      char c = hdr->s_name[i];
+	      unsigned d;
+
+	      if (c >= 'A' && c <= 'Z')
+		d = c - 'A';
+	      else if (c >= 'a' && c <= 'z')
+		d = c - 'a' + 26;
+	      else if (c >= '0' && c <= '9')
+		d = c - '0' + 52;
+	      else if (c == '+')
+		d = 62;
+	      else if (c == '/')
+		d = 63;
+	      else
+		return false;
+	      strindex = (strindex << 6) + d;
+	    }
+
+	  name = extract_long_section_name (abfd, strindex);
	  if (name == NULL)
	    return false;
-	  strcpy (name, strings);
+	}
+      else
+	{
+	  /* PE classic long section name.  The '/' is followed by the index
+	     in the strtab.  The index is formatted as a decimal string.  */
+	  char buf[SCNNMLEN];
+	  long strindex;
+	  char *p;
+
+	  memcpy (buf, hdr->s_name + 1, SCNNMLEN - 1);
+	  buf[SCNNMLEN - 1] = '\0';
+	  strindex = strtol (buf, &p, 10);
+	  if (*p == '\0' && strindex >= 0)
+	    {
+	      name = extract_long_section_name (abfd, strindex);
+	      if (name == NULL)
+		return false;
+	    }
	}
    }

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] pe/coff - add support for base64 encoded long section names
  2023-05-11  5:28 [PATCH] pe/coff - add support for base64 encoded long section names Tristan Gingold
@ 2023-05-17 13:30 ` Nick Clifton
  2023-05-17 14:14   ` Jose E. Marchesi
  2023-05-21 15:40   ` Tristan Gingold
  0 siblings, 2 replies; 9+ messages in thread
From: Nick Clifton @ 2023-05-17 13:30 UTC (permalink / raw)
  To: Tristan Gingold, binutils

Hi Tristan,

> LLVM has added another convention for very large strtab, using '//xxxxxx' names and base64 encoding of the index in the strtab.

Coincidentally, this has also been reported in PR 30444...

> It has been a while since I haven't submitted a patch, I hope I am correctly following the procedure!

Oh you are. :-)

So - I have a couple of questions on the patch:

> +  name = (char *) bfd_alloc (abfd, (bfd_size_type) strlen (strings) + 1 + 1);
> +  if (name == NULL)
> +    return NULL;
> +  strcpy (name, strings);

Why "+1 +1" in the bfd_alloc ?  I understand one of them, but two ?

> +	  unsigned strindex;
> +	  unsigned i;
> +
> +	  strindex = 0;
> +	  for (i = 2; i < SCNNMLEN; i++)
> +	    {
> +	      char c = hdr->s_name[i];
> +	      unsigned d;
> +
> +	      if (c >= 'A' && c <= 'Z')
> +		d = c - 'A';
> +	      else if (c >= 'a' && c <= 'z')
> +		d = c - 'a' + 26;
> +	      else if (c >= '0' && c <= '9')
> +		d = c - '0' + 52;
> +	      else if (c == '+')
> +		d = 62;
> +	      else if (c == '/')
> +		d = 63;
> +	      else
> +		return false;
> +	      strindex = (strindex << 6) + d;
> +	    }

Is "unsigned" the right type for strindex ?  It seems to me that it might
be possible to encode really large numbers using base64.

Also - it might be nice to move this loop into a stand alone function so
that it can be used from other parts of the BFD library.  (Assuming that
the functionality requested in PR 30444 is implemented).

Cheers
   Nick



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] pe/coff - add support for base64 encoded long section names
  2023-05-17 13:30 ` Nick Clifton
@ 2023-05-17 14:14   ` Jose E. Marchesi
  2023-05-21 15:50     ` Tristan Gingold
  2023-05-21 15:40   ` Tristan Gingold
  1 sibling, 1 reply; 9+ messages in thread
From: Jose E. Marchesi @ 2023-05-17 14:14 UTC (permalink / raw)
  To: Nick Clifton via Binutils; +Cc: Tristan Gingold, Nick Clifton


Heh, I had spent a day working on this, due to PR 30444, not noticing
this existing patch :)

I went so far as implementing what Tristan's patch does, and then looked
at why gas is complaining about too big object file when you try to
assemble a file that needs using th enew //BASE64 schema.

In particular, I found that BFD complains about too big object in that
scenario in two places.

First, in coff_write_object_contents (the #if 0 is mine):

@@ -3635,7 +3636,11 @@ coff_write_object_contents (bfd * abfd)
 
 	      /* An inherent limitation of the /nnnnnnn notation used to indicate
 		 the offset of the long name in the string table is that we
-		 cannot address entries beyone the ten million byte boundary.  */
+		 cannot address entries beyone the ten million byte boundary.
+
+                 However, the //BASE64 notation allows addressing
+                 entries up to 0xFFFFFFFF.  */
+#if 0
 	      if (string_size >= 10000000)
 		{
 		  bfd_set_error (bfd_error_file_too_big);
@@ -3645,7 +3650,7 @@ coff_write_object_contents (bfd * abfd)
 		    abfd, current, (unsigned long) string_size);
 		  return false;
 		}
-
+#endif
 	      /* We do not need to use snprintf here as we have already verfied
 		 that string_size is not too big, plus we have an overlarge
 		 buffer, just in case.  */


Then, in coffcode.h coff_compute_section_file_positions, where it checks
bfd_coff_max_nscns, whose default is set to 32768 in the
bfd_coff_std_swap_table also in coffcode.h.

Tristan, you may want to try to assemble the test-gcc.s file that is
attached in PR30444 to reproduce the "file too big error", with your
patch applied.

> Hi Tristan,
>
>> LLVM has added another convention for very large strtab, using
>> '//xxxxxx' names and base64 encoding of the index in the strtab.
>
> Coincidentally, this has also been reported in PR 30444...
>
>> It has been a while since I haven't submitted a patch, I hope I am correctly following the procedure!
>
> Oh you are. :-)
>
> So - I have a couple of questions on the patch:
>
>> +  name = (char *) bfd_alloc (abfd, (bfd_size_type) strlen (strings) + 1 + 1);
>> +  if (name == NULL)
>> +    return NULL;
>> +  strcpy (name, strings);
>
> Why "+1 +1" in the bfd_alloc ?  I understand one of them, but two ?
>
>> +	  unsigned strindex;
>> +	  unsigned i;
>> +
>> +	  strindex = 0;
>> +	  for (i = 2; i < SCNNMLEN; i++)
>> +	    {
>> +	      char c = hdr->s_name[i];
>> +	      unsigned d;
>> +
>> +	      if (c >= 'A' && c <= 'Z')
>> +		d = c - 'A';
>> +	      else if (c >= 'a' && c <= 'z')
>> +		d = c - 'a' + 26;
>> +	      else if (c >= '0' && c <= '9')
>> +		d = c - '0' + 52;
>> +	      else if (c == '+')
>> +		d = 62;
>> +	      else if (c == '/')
>> +		d = 63;
>> +	      else
>> +		return false;
>> +	      strindex = (strindex << 6) + d;
>> +	    }
>
> Is "unsigned" the right type for strindex ?  It seems to me that it might
> be possible to encode really large numbers using base64.
>
> Also - it might be nice to move this loop into a stand alone function so
> that it can be used from other parts of the BFD library.  (Assuming that
> the functionality requested in PR 30444 is implemented).
>
> Cheers
>   Nick

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] pe/coff - add support for base64 encoded long section names
  2023-05-17 13:30 ` Nick Clifton
  2023-05-17 14:14   ` Jose E. Marchesi
@ 2023-05-21 15:40   ` Tristan Gingold
  2023-05-22  6:36     ` Jan Beulich
  1 sibling, 1 reply; 9+ messages in thread
From: Tristan Gingold @ 2023-05-21 15:40 UTC (permalink / raw)
  To: Nick Clifton, binutils

Hello,

On 17/05/2023 15:30, Nick Clifton wrote:
> Hi Tristan,
> 
>> LLVM has added another convention for very large strtab, using 
>> '//xxxxxx' names and base64 encoding of the index in the strtab.
> 
> Coincidentally, this has also been reported in PR 30444...

Not exactly a coincidence, but I will try to address one problem after 
another.

[...]

>> +  name = (char *) bfd_alloc (abfd, (bfd_size_type) strlen (strings) + 
>> 1 + 1);
>> +  if (name == NULL)
>> +    return NULL;
>> +  strcpy (name, strings);
> 
> Why "+1 +1" in the bfd_alloc ?  I understand one of them, but two ?

I think I moved the code without re-reading it!  I agree with you and
I fixed that.

[...]
> Is "unsigned" the right type for strindex ?  It seems to me that it might
> be possible to encode really large numbers using base64.

Ok, I have replaced "unsigned" with "uint32_t" to clarify my choice, 
although that doesn't change anything.

Yes, you can encode large numbers with base64, but as the section name 
length in PE/COFF header is 8 bytes, you can only have 6 base64 digits 
(the first two bytes are "//").  So you can encode only 6*6=36 bit.
But, and more important, the string table length is limited to 2^32 bit, 
as the length is written in the first four bytes.  So at the end, the 
index must fit on 32b otherwise it's an error.
I added a comment to explain.

> Also - it might be nice to move this loop into a stand alone function so
> that it can be used from other parts of the BFD library.  (Assuming that
> the functionality requested in PR 30444 is implemented).

Yes, I added the decoding in a separate function.

Thank you for the review, and here is the new patch!

Tristan.

2023-05-21  Tristan Gingold  <tgingold@thinkpad-e15>

	* coffgen.c (extract_long_section_name): New function extracted
	from ...
	(make_a_section_from_file): ... here.  Add support for base64
	long section names.
	(decode_base64): New function.

diff --git a/bfd/coffgen.c b/bfd/coffgen.c
index ac936def566..970e06bf2c0 100644
--- a/bfd/coffgen.c
+++ b/bfd/coffgen.c
@@ -43,6 +43,69 @@
  #include "coff/internal.h"
  #include "libcoff.h"

+/* Extract a long section name at STRINDEX and copy it to the bfd objstack.
+   Return NULL in case of error.  */
+
+static char *
+extract_long_section_name(bfd *abfd, unsigned long strindex)
+{
+  const char *strings;
+  char *name;
+
+  strings = _bfd_coff_read_string_table (abfd);
+  if (strings == NULL)
+    return NULL;
+  if ((bfd_size_type)(strindex + 2) >= obj_coff_strings_len (abfd))
+    return NULL;
+  strings += strindex;
+  name = (char *) bfd_alloc (abfd, (bfd_size_type) strlen (strings) + 1);
+  if (name == NULL)
+    return NULL;
+  strcpy (name, strings);
+
+  return name;
+}
+
+/* Decode a base64 coded string at STR of length LEN, and write the result
+   to RES.  Return true on success.
+   Return false in case of invalid character or overflow.  */
+
+static bool
+decode_base64 (const char *str, unsigned len, uint32_t *res)
+{
+  unsigned i;
+  uint32_t val;
+
+  val = 0;
+  for (i = 0; i < len; i++)
+    {
+      char c = str[i];
+      unsigned d;
+
+      if (c >= 'A' && c <= 'Z')
+	d = c - 'A';
+      else if (c >= 'a' && c <= 'z')
+	d = c - 'a' + 26;
+      else if (c >= '0' && c <= '9')
+	d = c - '0' + 52;
+      else if (c == '+')
+	d = 62;
+      else if (c == '/')
+	d = 63;
+      else
+	return false;
+
+      /* Check for overflow. */
+      if ((val >> 26) != 0)
+	return false;
+
+      val = (val << 6) + d;
+    }
+
+  *res = val;
+  return true;
+}
+
  /* Take a section header read from a coff file (in HOST byte order),
     and make a BFD "section" out of it.  This is used by ECOFF.  */

@@ -67,32 +130,46 @@ make_a_section_from_file (bfd *abfd,
    if (bfd_coff_set_long_section_names (abfd, 
bfd_coff_long_section_names (abfd))
        && hdr->s_name[0] == '/')
      {
-      char buf[SCNNMLEN];
-      long strindex;
-      char *p;
-      const char *strings;
-
        /* Flag that this BFD uses long names, even though the format might
  	 expect them to be off by default.  This won't directly affect the
  	 format of any output BFD created from this one, but the information
  	 can be used to decide what to do.  */
        bfd_coff_set_long_section_names (abfd, true);
-      memcpy (buf, hdr->s_name + 1, SCNNMLEN - 1);
-      buf[SCNNMLEN - 1] = '\0';
-      strindex = strtol (buf, &p, 10);
-      if (*p == '\0' && strindex >= 0)
+
+      if (hdr->s_name[1] == '/')
  	{
-	  strings = _bfd_coff_read_string_table (abfd);
-	  if (strings == NULL)
-	    return false;
-	  if ((bfd_size_type)(strindex + 2) >= obj_coff_strings_len (abfd))
+	  /* LLVM extension: the '/' is followed by another '/' and then by
+	     the index in the strtab encoded in base64 without NUL at the
+	     end.  */
+	  uint32_t strindex;
+
+	  /* Decode the index.  No overflow is expected as the string table
+	     length is at most 2^32 - 1 (the length is written on the first
+	     four bytes).  */
+	  if (!decode_base64 (hdr->s_name + 2, SCNNMLEN - 2, &strindex))
  	    return false;
-	  strings += strindex;
-	  name = (char *) bfd_alloc (abfd,
-				     (bfd_size_type) strlen (strings) + 1 + 1);
+
+	  name = extract_long_section_name (abfd, strindex);
  	  if (name == NULL)
  	    return false;
-	  strcpy (name, strings);
+	}
+      else
+	{
+	  /* PE classic long section name.  The '/' is followed by the index
+	     in the strtab.  The index is formatted as a decimal string.  */
+	  char buf[SCNNMLEN];
+	  long strindex;
+	  char *p;
+
+	  memcpy (buf, hdr->s_name + 1, SCNNMLEN - 1);
+	  buf[SCNNMLEN - 1] = '\0';
+	  strindex = strtol (buf, &p, 10);
+	  if (*p == '\0' && strindex >= 0)
+	    {
+	      name = extract_long_section_name (abfd, strindex);
+	      if (name == NULL)
+		return false;
+	    }
  	}
      }


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] pe/coff - add support for base64 encoded long section names
  2023-05-17 14:14   ` Jose E. Marchesi
@ 2023-05-21 15:50     ` Tristan Gingold
  0 siblings, 0 replies; 9+ messages in thread
From: Tristan Gingold @ 2023-05-21 15:50 UTC (permalink / raw)
  To: Jose E. Marchesi, Nick Clifton via Binutils; +Cc: Nick Clifton

Hello,

I am first addressing the decoding problem, without
trying to address the PR.

I am not sure it is possible to address it, simply because the number of 
sections in PE/COFF formats is limited to 2^16.  I have to look at what 
clang generates.

I am also not sure that bfd needs to emit base64 names, because it emits 
the section names at the beginning of the string table.  So unless you 
have very very long section names, it won't hit the issue.

Of course you can make an artificial testcase that could generate a 
correct output and not be accepted by gas.  But I am not sure how 
interesting it is.

Tristan.


On 17/05/2023 16:14, Jose E. Marchesi wrote:
> 
> Heh, I had spent a day working on this, due to PR 30444, not noticing
> this existing patch :)
> 
> I went so far as implementing what Tristan's patch does, and then looked
> at why gas is complaining about too big object file when you try to
> assemble a file that needs using th enew //BASE64 schema.
> 
> In particular, I found that BFD complains about too big object in that
> scenario in two places.
> 
> First, in coff_write_object_contents (the #if 0 is mine):
> 
> @@ -3635,7 +3636,11 @@ coff_write_object_contents (bfd * abfd)
>   
>   	      /* An inherent limitation of the /nnnnnnn notation used to indicate
>   		 the offset of the long name in the string table is that we
> -		 cannot address entries beyone the ten million byte boundary.  */
> +		 cannot address entries beyone the ten million byte boundary.
> +
> +                 However, the //BASE64 notation allows addressing
> +                 entries up to 0xFFFFFFFF.  */
> +#if 0
>   	      if (string_size >= 10000000)
>   		{
>   		  bfd_set_error (bfd_error_file_too_big);
> @@ -3645,7 +3650,7 @@ coff_write_object_contents (bfd * abfd)
>   		    abfd, current, (unsigned long) string_size);
>   		  return false;
>   		}
> -
> +#endif
>   	      /* We do not need to use snprintf here as we have already verfied
>   		 that string_size is not too big, plus we have an overlarge
>   		 buffer, just in case.  */
> 
> 
> Then, in coffcode.h coff_compute_section_file_positions, where it checks
> bfd_coff_max_nscns, whose default is set to 32768 in the
> bfd_coff_std_swap_table also in coffcode.h.
> 
> Tristan, you may want to try to assemble the test-gcc.s file that is
> attached in PR30444 to reproduce the "file too big error", with your
> patch applied.
> 
>> Hi Tristan,
>>
>>> LLVM has added another convention for very large strtab, using
>>> '//xxxxxx' names and base64 encoding of the index in the strtab.
>>
>> Coincidentally, this has also been reported in PR 30444...
>>
>>> It has been a while since I haven't submitted a patch, I hope I am correctly following the procedure!
>>
>> Oh you are. :-)
>>
>> So - I have a couple of questions on the patch:
>>
>>> +  name = (char *) bfd_alloc (abfd, (bfd_size_type) strlen (strings) + 1 + 1);
>>> +  if (name == NULL)
>>> +    return NULL;
>>> +  strcpy (name, strings);
>>
>> Why "+1 +1" in the bfd_alloc ?  I understand one of them, but two ?
>>
>>> +	  unsigned strindex;
>>> +	  unsigned i;
>>> +
>>> +	  strindex = 0;
>>> +	  for (i = 2; i < SCNNMLEN; i++)
>>> +	    {
>>> +	      char c = hdr->s_name[i];
>>> +	      unsigned d;
>>> +
>>> +	      if (c >= 'A' && c <= 'Z')
>>> +		d = c - 'A';
>>> +	      else if (c >= 'a' && c <= 'z')
>>> +		d = c - 'a' + 26;
>>> +	      else if (c >= '0' && c <= '9')
>>> +		d = c - '0' + 52;
>>> +	      else if (c == '+')
>>> +		d = 62;
>>> +	      else if (c == '/')
>>> +		d = 63;
>>> +	      else
>>> +		return false;
>>> +	      strindex = (strindex << 6) + d;
>>> +	    }
>>
>> Is "unsigned" the right type for strindex ?  It seems to me that it might
>> be possible to encode really large numbers using base64.
>>
>> Also - it might be nice to move this loop into a stand alone function so
>> that it can be used from other parts of the BFD library.  (Assuming that
>> the functionality requested in PR 30444 is implemented).
>>
>> Cheers
>>    Nick

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] pe/coff - add support for base64 encoded long section names
  2023-05-21 15:40   ` Tristan Gingold
@ 2023-05-22  6:36     ` Jan Beulich
  2023-05-22  8:53       ` Martin Storsjö
  2023-05-22 19:40       ` Tristan Gingold
  0 siblings, 2 replies; 9+ messages in thread
From: Jan Beulich @ 2023-05-22  6:36 UTC (permalink / raw)
  To: Tristan Gingold; +Cc: Nick Clifton, binutils

On 21.05.2023 17:40, Tristan Gingold via Binutils wrote:
> On 17/05/2023 15:30, Nick Clifton wrote:
>> Is "unsigned" the right type for strindex ?  It seems to me that it might
>> be possible to encode really large numbers using base64.
> 
> Ok, I have replaced "unsigned" with "uint32_t" to clarify my choice, 
> although that doesn't change anything.
> 
> Yes, you can encode large numbers with base64, but as the section name 
> length in PE/COFF header is 8 bytes, you can only have 6 base64 digits 
> (the first two bytes are "//").  So you can encode only 6*6=36 bit.
> But, and more important, the string table length is limited to 2^32 bit, 
> as the length is written in the first four bytes.  So at the end, the 
> index must fit on 32b otherwise it's an error.
> I added a comment to explain.

I wonder whether this wouldn't better be handled either by returning the
full 36-bit value and then letting the caller decide (it wants to check
against string table size anyway, not just 2³²), or by checking that the
first digit is only 'A'...'D'. The latter would of course make the
helper function less generic (if any further use appeared).

I further wonder what LLVM's motivation is to zero-pad (i.e. 'A'-prefix)
the encoded values, rather than - like pre-existing section name
representation - nul-padding at the end. At the very least I'm inclined
to suggest that we also support that obvious (I think) alternative here.

Jan

>> Also - it might be nice to move this loop into a stand alone function so
>> that it can be used from other parts of the BFD library.  (Assuming that
>> the functionality requested in PR 30444 is implemented).
> 
> Yes, I added the decoding in a separate function.
> 
> Thank you for the review, and here is the new patch!
> 
> Tristan.
> 
> 2023-05-21  Tristan Gingold  <tgingold@thinkpad-e15>
> 
> 	* coffgen.c (extract_long_section_name): New function extracted
> 	from ...
> 	(make_a_section_from_file): ... here.  Add support for base64
> 	long section names.
> 	(decode_base64): New function.
> 
> diff --git a/bfd/coffgen.c b/bfd/coffgen.c
> index ac936def566..970e06bf2c0 100644
> --- a/bfd/coffgen.c
> +++ b/bfd/coffgen.c
> @@ -43,6 +43,69 @@
>   #include "coff/internal.h"
>   #include "libcoff.h"
> 
> +/* Extract a long section name at STRINDEX and copy it to the bfd objstack.
> +   Return NULL in case of error.  */
> +
> +static char *
> +extract_long_section_name(bfd *abfd, unsigned long strindex)
> +{
> +  const char *strings;
> +  char *name;
> +
> +  strings = _bfd_coff_read_string_table (abfd);
> +  if (strings == NULL)
> +    return NULL;
> +  if ((bfd_size_type)(strindex + 2) >= obj_coff_strings_len (abfd))
> +    return NULL;
> +  strings += strindex;
> +  name = (char *) bfd_alloc (abfd, (bfd_size_type) strlen (strings) + 1);
> +  if (name == NULL)
> +    return NULL;
> +  strcpy (name, strings);
> +
> +  return name;
> +}
> +
> +/* Decode a base64 coded string at STR of length LEN, and write the result
> +   to RES.  Return true on success.
> +   Return false in case of invalid character or overflow.  */
> +
> +static bool
> +decode_base64 (const char *str, unsigned len, uint32_t *res)
> +{
> +  unsigned i;
> +  uint32_t val;
> +
> +  val = 0;
> +  for (i = 0; i < len; i++)
> +    {
> +      char c = str[i];
> +      unsigned d;
> +
> +      if (c >= 'A' && c <= 'Z')
> +	d = c - 'A';
> +      else if (c >= 'a' && c <= 'z')
> +	d = c - 'a' + 26;
> +      else if (c >= '0' && c <= '9')
> +	d = c - '0' + 52;
> +      else if (c == '+')
> +	d = 62;
> +      else if (c == '/')
> +	d = 63;
> +      else
> +	return false;
> +
> +      /* Check for overflow. */
> +      if ((val >> 26) != 0)
> +	return false;
> +
> +      val = (val << 6) + d;
> +    }
> +
> +  *res = val;
> +  return true;
> +}
> +
>   /* Take a section header read from a coff file (in HOST byte order),
>      and make a BFD "section" out of it.  This is used by ECOFF.  */
> 
> @@ -67,32 +130,46 @@ make_a_section_from_file (bfd *abfd,
>     if (bfd_coff_set_long_section_names (abfd, 
> bfd_coff_long_section_names (abfd))
>         && hdr->s_name[0] == '/')
>       {
> -      char buf[SCNNMLEN];
> -      long strindex;
> -      char *p;
> -      const char *strings;
> -
>         /* Flag that this BFD uses long names, even though the format might
>   	 expect them to be off by default.  This won't directly affect the
>   	 format of any output BFD created from this one, but the information
>   	 can be used to decide what to do.  */
>         bfd_coff_set_long_section_names (abfd, true);
> -      memcpy (buf, hdr->s_name + 1, SCNNMLEN - 1);
> -      buf[SCNNMLEN - 1] = '\0';
> -      strindex = strtol (buf, &p, 10);
> -      if (*p == '\0' && strindex >= 0)
> +
> +      if (hdr->s_name[1] == '/')
>   	{
> -	  strings = _bfd_coff_read_string_table (abfd);
> -	  if (strings == NULL)
> -	    return false;
> -	  if ((bfd_size_type)(strindex + 2) >= obj_coff_strings_len (abfd))
> +	  /* LLVM extension: the '/' is followed by another '/' and then by
> +	     the index in the strtab encoded in base64 without NUL at the
> +	     end.  */
> +	  uint32_t strindex;
> +
> +	  /* Decode the index.  No overflow is expected as the string table
> +	     length is at most 2^32 - 1 (the length is written on the first
> +	     four bytes).  */
> +	  if (!decode_base64 (hdr->s_name + 2, SCNNMLEN - 2, &strindex))
>   	    return false;
> -	  strings += strindex;
> -	  name = (char *) bfd_alloc (abfd,
> -				     (bfd_size_type) strlen (strings) + 1 + 1);
> +
> +	  name = extract_long_section_name (abfd, strindex);
>   	  if (name == NULL)
>   	    return false;
> -	  strcpy (name, strings);
> +	}
> +      else
> +	{
> +	  /* PE classic long section name.  The '/' is followed by the index
> +	     in the strtab.  The index is formatted as a decimal string.  */
> +	  char buf[SCNNMLEN];
> +	  long strindex;
> +	  char *p;
> +
> +	  memcpy (buf, hdr->s_name + 1, SCNNMLEN - 1);
> +	  buf[SCNNMLEN - 1] = '\0';
> +	  strindex = strtol (buf, &p, 10);
> +	  if (*p == '\0' && strindex >= 0)
> +	    {
> +	      name = extract_long_section_name (abfd, strindex);
> +	      if (name == NULL)
> +		return false;
> +	    }
>   	}
>       }
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] pe/coff - add support for base64 encoded long section names
  2023-05-22  6:36     ` Jan Beulich
@ 2023-05-22  8:53       ` Martin Storsjö
  2023-05-22 19:40       ` Tristan Gingold
  1 sibling, 0 replies; 9+ messages in thread
From: Martin Storsjö @ 2023-05-22  8:53 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Tristan Gingold, Nick Clifton, binutils

On Mon, 22 May 2023, Jan Beulich via Binutils wrote:

> I further wonder what LLVM's motivation is to zero-pad (i.e. 'A'-prefix)
> the encoded values, rather than - like pre-existing section name
> representation - nul-padding at the end.

FYI, this format isn't LLVM's own invention, it's modelled after what MSVC 
does produce. See 
https://github.com/llvm/llvm-project/commit/9d2c15eff7edd220a3246ebedd2e502051e49844 
for the commit where it was introduced in LLVM:

> MC: Support COFF string tables larger than 10MB
> 
> Offsets past the range of single-slash encoding are encoded as base64,
> padded to 6 characters, and prefixed with two slashes. This encoding is
> undocumented but used by MSVC.


> At the very least I'm inclined to suggest that we also support that 
> obvious (I think) alternative here.

I guess that's fine, but it's probably safest to produce the zero-padded 
format, if support for generating it gets added later.

// Martin


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] pe/coff - add support for base64 encoded long section names
  2023-05-22  6:36     ` Jan Beulich
  2023-05-22  8:53       ` Martin Storsjö
@ 2023-05-22 19:40       ` Tristan Gingold
  2023-05-23  6:31         ` Jan Beulich
  1 sibling, 1 reply; 9+ messages in thread
From: Tristan Gingold @ 2023-05-22 19:40 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Nick Clifton, binutils



On 22/05/2023 08:36, Jan Beulich wrote:
[...]
> I wonder whether this wouldn't better be handled either by returning
> the full 36-bit value and then letting the caller decide (it wants to
> check against string table size anyway, not just 2³²), or by checking
> that the first digit is only 'A'...'D'. The latter would of course
> make the helper function less generic (if any further use appeared).

I prefer to keep the overflow check and also to keep the function generic.
Using a 64b type might be tricky as it is easy to have the value 
silently truncated due to the use of a 32b type.

> I further wonder what LLVM's motivation is to zero-pad (i.e.
> 'A'-prefix) the encoded values, rather than - like pre-existing
> section name representation - nul-padding at the end. At the very
> least I'm inclined to suggest that we also support that obvious (I
> think) alternative here.

I don't see the point. The nul-padding is not used by anyone, and 
therefore cannot even be tested.  Why creating dead code ?

I will submit a v2 of the patch.

Tristan.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] pe/coff - add support for base64 encoded long section names
  2023-05-22 19:40       ` Tristan Gingold
@ 2023-05-23  6:31         ` Jan Beulich
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2023-05-23  6:31 UTC (permalink / raw)
  To: Tristan Gingold; +Cc: Nick Clifton, binutils

On 22.05.2023 21:40, Tristan Gingold wrote:
> On 22/05/2023 08:36, Jan Beulich wrote:
> [...]
>> I wonder whether this wouldn't better be handled either by returning
>> the full 36-bit value and then letting the caller decide (it wants to
>> check against string table size anyway, not just 2³²), or by checking
>> that the first digit is only 'A'...'D'. The latter would of course
>> make the helper function less generic (if any further use appeared).
> 
> I prefer to keep the overflow check and also to keep the function generic.
> Using a 64b type might be tricky as it is easy to have the value 
> silently truncated due to the use of a 32b type.

Yet that's exactly what leaves the function not really generic.

>> I further wonder what LLVM's motivation is to zero-pad (i.e.
>> 'A'-prefix) the encoded values, rather than - like pre-existing
>> section name representation - nul-padding at the end. At the very
>> least I'm inclined to suggest that we also support that obvious (I
>> think) alternative here.
> 
> I don't see the point. The nul-padding is not used by anyone, and 
> therefore cannot even be tested.  Why creating dead code ?

Oh, I see you've answered the question here, and I didn't look here
first. Fair enough, yet to explain myself: I find the 'A'-padding
simply odd. It's more clutter and inconsistent with the nul-padded
original long-section-name representation.

Anyway - with the two spelling nits taken care of I don't mind v3
going in otherwise as is.

Jan

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-05-23  6:31 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-11  5:28 [PATCH] pe/coff - add support for base64 encoded long section names Tristan Gingold
2023-05-17 13:30 ` Nick Clifton
2023-05-17 14:14   ` Jose E. Marchesi
2023-05-21 15:50     ` Tristan Gingold
2023-05-21 15:40   ` Tristan Gingold
2023-05-22  6:36     ` Jan Beulich
2023-05-22  8:53       ` Martin Storsjö
2023-05-22 19:40       ` Tristan Gingold
2023-05-23  6:31         ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).