public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
From: Tristan Gingold <tgingold@free.fr>
To: "Jose E. Marchesi" <jose.marchesi@oracle.com>,
	Nick Clifton via Binutils <binutils@sourceware.org>
Cc: Nick Clifton <nickc@redhat.com>
Subject: Re: [PATCH] pe/coff - add support for base64 encoded long section names
Date: Sun, 21 May 2023 17:50:10 +0200	[thread overview]
Message-ID: <cb7288a3-1d43-ce58-0209-4793a2581e03@free.fr> (raw)
In-Reply-To: <87zg63kpue.fsf@oracle.com>

Hello,

I am first addressing the decoding problem, without
trying to address the PR.

I am not sure it is possible to address it, simply because the number of 
sections in PE/COFF formats is limited to 2^16.  I have to look at what 
clang generates.

I am also not sure that bfd needs to emit base64 names, because it emits 
the section names at the beginning of the string table.  So unless you 
have very very long section names, it won't hit the issue.

Of course you can make an artificial testcase that could generate a 
correct output and not be accepted by gas.  But I am not sure how 
interesting it is.

Tristan.


On 17/05/2023 16:14, Jose E. Marchesi wrote:
> 
> Heh, I had spent a day working on this, due to PR 30444, not noticing
> this existing patch :)
> 
> I went so far as implementing what Tristan's patch does, and then looked
> at why gas is complaining about too big object file when you try to
> assemble a file that needs using th enew //BASE64 schema.
> 
> In particular, I found that BFD complains about too big object in that
> scenario in two places.
> 
> First, in coff_write_object_contents (the #if 0 is mine):
> 
> @@ -3635,7 +3636,11 @@ coff_write_object_contents (bfd * abfd)
>   
>   	      /* An inherent limitation of the /nnnnnnn notation used to indicate
>   		 the offset of the long name in the string table is that we
> -		 cannot address entries beyone the ten million byte boundary.  */
> +		 cannot address entries beyone the ten million byte boundary.
> +
> +                 However, the //BASE64 notation allows addressing
> +                 entries up to 0xFFFFFFFF.  */
> +#if 0
>   	      if (string_size >= 10000000)
>   		{
>   		  bfd_set_error (bfd_error_file_too_big);
> @@ -3645,7 +3650,7 @@ coff_write_object_contents (bfd * abfd)
>   		    abfd, current, (unsigned long) string_size);
>   		  return false;
>   		}
> -
> +#endif
>   	      /* We do not need to use snprintf here as we have already verfied
>   		 that string_size is not too big, plus we have an overlarge
>   		 buffer, just in case.  */
> 
> 
> Then, in coffcode.h coff_compute_section_file_positions, where it checks
> bfd_coff_max_nscns, whose default is set to 32768 in the
> bfd_coff_std_swap_table also in coffcode.h.
> 
> Tristan, you may want to try to assemble the test-gcc.s file that is
> attached in PR30444 to reproduce the "file too big error", with your
> patch applied.
> 
>> Hi Tristan,
>>
>>> LLVM has added another convention for very large strtab, using
>>> '//xxxxxx' names and base64 encoding of the index in the strtab.
>>
>> Coincidentally, this has also been reported in PR 30444...
>>
>>> It has been a while since I haven't submitted a patch, I hope I am correctly following the procedure!
>>
>> Oh you are. :-)
>>
>> So - I have a couple of questions on the patch:
>>
>>> +  name = (char *) bfd_alloc (abfd, (bfd_size_type) strlen (strings) + 1 + 1);
>>> +  if (name == NULL)
>>> +    return NULL;
>>> +  strcpy (name, strings);
>>
>> Why "+1 +1" in the bfd_alloc ?  I understand one of them, but two ?
>>
>>> +	  unsigned strindex;
>>> +	  unsigned i;
>>> +
>>> +	  strindex = 0;
>>> +	  for (i = 2; i < SCNNMLEN; i++)
>>> +	    {
>>> +	      char c = hdr->s_name[i];
>>> +	      unsigned d;
>>> +
>>> +	      if (c >= 'A' && c <= 'Z')
>>> +		d = c - 'A';
>>> +	      else if (c >= 'a' && c <= 'z')
>>> +		d = c - 'a' + 26;
>>> +	      else if (c >= '0' && c <= '9')
>>> +		d = c - '0' + 52;
>>> +	      else if (c == '+')
>>> +		d = 62;
>>> +	      else if (c == '/')
>>> +		d = 63;
>>> +	      else
>>> +		return false;
>>> +	      strindex = (strindex << 6) + d;
>>> +	    }
>>
>> Is "unsigned" the right type for strindex ?  It seems to me that it might
>> be possible to encode really large numbers using base64.
>>
>> Also - it might be nice to move this loop into a stand alone function so
>> that it can be used from other parts of the BFD library.  (Assuming that
>> the functionality requested in PR 30444 is implemented).
>>
>> Cheers
>>    Nick

  reply	other threads:[~2023-05-21 15:50 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-11  5:28 Tristan Gingold
2023-05-17 13:30 ` Nick Clifton
2023-05-17 14:14   ` Jose E. Marchesi
2023-05-21 15:50     ` Tristan Gingold [this message]
2023-05-21 15:40   ` Tristan Gingold
2023-05-22  6:36     ` Jan Beulich
2023-05-22  8:53       ` Martin Storsjö
2023-05-22 19:40       ` Tristan Gingold
2023-05-23  6:31         ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cb7288a3-1d43-ce58-0209-4793a2581e03@free.fr \
    --to=tgingold@free.fr \
    --cc=binutils@sourceware.org \
    --cc=jose.marchesi@oracle.com \
    --cc=nickc@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).