From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=UJrE=BK=free.fr=tgingold@sourceware.org>
Received: from smtp4-g21.free.fr (smtp4-g21.free.fr [212.27.42.4])
	by sourceware.org (Postfix) with ESMTPS id 47D543857700
	for <binutils@sourceware.org>; Sun, 21 May 2023 15:40:19 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 47D543857700
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=free.fr
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=free.fr
Received: from [192.168.2.42] (unknown [88.173.63.180])
	(Authenticated sender: tgingold@free.fr)
	by smtp4-g21.free.fr (Postfix) with ESMTPSA id 8789419F750;
	Sun, 21 May 2023 17:40:15 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=free.fr;
	s=smtp-20201208; t=1684683617;
	bh=UlfdgF5FEMhYoV/NpYTx1v9U/O8pPhtWoGZy8By0ZRs=;
	h=Date:Subject:To:References:From:In-Reply-To:From;
	b=AjZQCOZbXV6Cr+/ASVsH69DJKUKDz3lnBlKMkJ84m3fc83hnL52Jq7BWHUujNHp5G
	 1GHciOmdswuQrPCbR3cfPsw7wSjrLRvRrZxr+F8sJcFQBCI/0WsOWfX7ay2av7nkQX
	 H2W+GJENUAZujobYuf4I83hRhtl53gqSH0K+E3DkQOcDVv8Olvosph8m7dPrFG6kFA
	 ty5PH9v3bxkUJRCFlH6ru1HBl4j2eonIJHF28x0kkjOxdntuX+dwEk00IkzFVPiOpI
	 bYeHdwPUoNYU/2icFerzzxh09qEhJKLEybK3h8vmRDDXZQZARo6bb63H7Hvd/4gJvg
	 syEB6J+S7QCvw==
Message-ID: <040cfba6-009c-4e57-34b9-07469191aa88@free.fr>
Date: Sun, 21 May 2023 17:40:15 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.10.0
Subject: Re: [PATCH] pe/coff - add support for base64 encoded long section
 names
To: Nick Clifton <nickc@redhat.com>, binutils <binutils@sourceware.org>
References: <243D0799-E3D0-4938-A438-DD8725593F67@free.fr>
 <aef8b910-1a0c-1a0c-3109-c5d5e01d6e37@redhat.com>
Content-Language: en-US
From: Tristan Gingold <tgingold@free.fr>
In-Reply-To: <aef8b910-1a0c-1a0c-3109-c5d5e01d6e37@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-11.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,JMQ_SPF_NEUTRAL,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <binutils.sourceware.org>

Hello,

On 17/05/2023 15:30, Nick Clifton wrote:
> Hi Tristan,
> 
>> LLVM has added another convention for very large strtab, using 
>> '//xxxxxx' names and base64 encoding of the index in the strtab.
> 
> Coincidentally, this has also been reported in PR 30444...

Not exactly a coincidence, but I will try to address one problem after 
another.

[...]

>> +  name = (char *) bfd_alloc (abfd, (bfd_size_type) strlen (strings) + 
>> 1 + 1);
>> +  if (name == NULL)
>> +    return NULL;
>> +  strcpy (name, strings);
> 
> Why "+1 +1" in the bfd_alloc ?  I understand one of them, but two ?

I think I moved the code without re-reading it!  I agree with you and
I fixed that.

[...]
> Is "unsigned" the right type for strindex ?  It seems to me that it might
> be possible to encode really large numbers using base64.

Ok, I have replaced "unsigned" with "uint32_t" to clarify my choice, 
although that doesn't change anything.

Yes, you can encode large numbers with base64, but as the section name 
length in PE/COFF header is 8 bytes, you can only have 6 base64 digits 
(the first two bytes are "//").  So you can encode only 6*6=36 bit.
But, and more important, the string table length is limited to 2^32 bit, 
as the length is written in the first four bytes.  So at the end, the 
index must fit on 32b otherwise it's an error.
I added a comment to explain.

> Also - it might be nice to move this loop into a stand alone function so
> that it can be used from other parts of the BFD library.  (Assuming that
> the functionality requested in PR 30444 is implemented).

Yes, I added the decoding in a separate function.

Thank you for the review, and here is the new patch!

Tristan.

2023-05-21  Tristan Gingold  <tgingold@thinkpad-e15>

	* coffgen.c (extract_long_section_name): New function extracted
	from ...
	(make_a_section_from_file): ... here.  Add support for base64
	long section names.
	(decode_base64): New function.

diff --git a/bfd/coffgen.c b/bfd/coffgen.c
index ac936def566..970e06bf2c0 100644
--- a/bfd/coffgen.c
+++ b/bfd/coffgen.c
@@ -43,6 +43,69 @@
  #include "coff/internal.h"
  #include "libcoff.h"

+/* Extract a long section name at STRINDEX and copy it to the bfd objstack.
+   Return NULL in case of error.  */
+
+static char *
+extract_long_section_name(bfd *abfd, unsigned long strindex)
+{
+  const char *strings;
+  char *name;
+
+  strings = _bfd_coff_read_string_table (abfd);
+  if (strings == NULL)
+    return NULL;
+  if ((bfd_size_type)(strindex + 2) >= obj_coff_strings_len (abfd))
+    return NULL;
+  strings += strindex;
+  name = (char *) bfd_alloc (abfd, (bfd_size_type) strlen (strings) + 1);
+  if (name == NULL)
+    return NULL;
+  strcpy (name, strings);
+
+  return name;
+}
+
+/* Decode a base64 coded string at STR of length LEN, and write the result
+   to RES.  Return true on success.
+   Return false in case of invalid character or overflow.  */
+
+static bool
+decode_base64 (const char *str, unsigned len, uint32_t *res)
+{
+  unsigned i;
+  uint32_t val;
+
+  val = 0;
+  for (i = 0; i < len; i++)
+    {
+      char c = str[i];
+      unsigned d;
+
+      if (c >= 'A' && c <= 'Z')
+	d = c - 'A';
+      else if (c >= 'a' && c <= 'z')
+	d = c - 'a' + 26;
+      else if (c >= '0' && c <= '9')
+	d = c - '0' + 52;
+      else if (c == '+')
+	d = 62;
+      else if (c == '/')
+	d = 63;
+      else
+	return false;
+
+      /* Check for overflow. */
+      if ((val >> 26) != 0)
+	return false;
+
+      val = (val << 6) + d;
+    }
+
+  *res = val;
+  return true;
+}
+
  /* Take a section header read from a coff file (in HOST byte order),
     and make a BFD "section" out of it.  This is used by ECOFF.  */

@@ -67,32 +130,46 @@ make_a_section_from_file (bfd *abfd,
    if (bfd_coff_set_long_section_names (abfd, 
bfd_coff_long_section_names (abfd))
        && hdr->s_name[0] == '/')
      {
-      char buf[SCNNMLEN];
-      long strindex;
-      char *p;
-      const char *strings;
-
        /* Flag that this BFD uses long names, even though the format might
  	 expect them to be off by default.  This won't directly affect the
  	 format of any output BFD created from this one, but the information
  	 can be used to decide what to do.  */
        bfd_coff_set_long_section_names (abfd, true);
-      memcpy (buf, hdr->s_name + 1, SCNNMLEN - 1);
-      buf[SCNNMLEN - 1] = '\0';
-      strindex = strtol (buf, &p, 10);
-      if (*p == '\0' && strindex >= 0)
+
+      if (hdr->s_name[1] == '/')
  	{
-	  strings = _bfd_coff_read_string_table (abfd);
-	  if (strings == NULL)
-	    return false;
-	  if ((bfd_size_type)(strindex + 2) >= obj_coff_strings_len (abfd))
+	  /* LLVM extension: the '/' is followed by another '/' and then by
+	     the index in the strtab encoded in base64 without NUL at the
+	     end.  */
+	  uint32_t strindex;
+
+	  /* Decode the index.  No overflow is expected as the string table
+	     length is at most 2^32 - 1 (the length is written on the first
+	     four bytes).  */
+	  if (!decode_base64 (hdr->s_name + 2, SCNNMLEN - 2, &strindex))
  	    return false;
-	  strings += strindex;
-	  name = (char *) bfd_alloc (abfd,
-				     (bfd_size_type) strlen (strings) + 1 + 1);
+
+	  name = extract_long_section_name (abfd, strindex);
  	  if (name == NULL)
  	    return false;
-	  strcpy (name, strings);
+	}
+      else
+	{
+	  /* PE classic long section name.  The '/' is followed by the index
+	     in the strtab.  The index is formatted as a decimal string.  */
+	  char buf[SCNNMLEN];
+	  long strindex;
+	  char *p;
+
+	  memcpy (buf, hdr->s_name + 1, SCNNMLEN - 1);
+	  buf[SCNNMLEN - 1] = '\0';
+	  strindex = strtol (buf, &p, 10);
+	  if (*p == '\0' && strindex >= 0)
+	    {
+	      name = extract_long_section_name (abfd, strindex);
+	      if (name == NULL)
+		return false;
+	    }
  	}
      }