From: Jan Beulich <jbeulich@suse.com>
To: Michael Matz <matz@suse.de>
Cc: binutils@sourceware.org
Subject: Re: ld: Avoid overflows in string merging
Date: Wed, 8 Nov 2023 08:30:02 +0100
Message-ID: <ab97a876-28aa-9582-4fb4-12a2363a29c9@suse.com>
In-Reply-To: <alpine.LSU.2.20.2311071650430.15233@wotan.suse.de>

On 07.11.2023 17:51, Michael Matz wrote:
> as the bug report shows we had an overflow in the test if
> hash table resizing is needed.  Reorder the expression to avoid
> that.  There's still a bug somewhere in gracefully handling
> failure in resizing (e.g. out of memory), but this pushes the
> boundary for that occurring somewhen into the future and
> immediately helps the reporter.
> 
>     bfd/
> 
>     PR ld/31009
>     * merge.c (sec_merge_maybe_resize): Avoid overflow in expression.
>     (sec_merge_hash_insert): Adjust assert.
> ---
> 
> regtested on many targets, okay for master?

This is an improvement, so okay to put in, but:

> --- a/bfd/merge.c
> +++ b/bfd/merge.c
> @@ -167,7 +167,7 @@ static bool
>  sec_merge_maybe_resize (struct sec_merge_hash *table, unsigned added)
>  {
>    struct bfd_hash_table *bfdtab = &table->table;
> -  if (bfdtab->count + added > table->nbuckets * 2 / 3)
> +  if (bfdtab->count + added > table->nbuckets / 3 * 2)
>      {
>        unsigned i;
>        unsigned long newnb = table->nbuckets * 2;
> @@ -175,7 +175,7 @@ sec_merge_maybe_resize (struct sec_merge_hash *table, unsigned added)
>        uint64_t *newl;
>        unsigned long alloc;
>  
> -      while (bfdtab->count + added > newnb * 2 / 3)
> +      while (bfdtab->count + added > newnb / 3 * 2)
>  	{
>  	  newnb *= 2;
>  	  if (!newnb)

Isn't this overly aggressive? We want to resize once the table is more
than two thirds full, but why apply the two-thirds rule again inside
this loop? We've already doubled once, and all we care about is that
the new capacity is enough to cover "added". All the more so since, as
the comment there says, the caller already overestimates heavily.
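
To make that concrete, here is a rough sketch of a growth loop that
doubles once and then only keeps doubling until the new bucket count
covers count + added. The function name grow_buckets and the use of
uint64_t throughout are my assumptions for illustration; this is not
the code in bfd/merge.c, and returning 0 as the failure indication is
likewise just a choice made for the sketch.

```c
#include <stdint.h>

/* Hypothetical sketch, not taken from bfd/merge.c.
   Return a new bucket count that is at least twice NBUCKETS and
   large enough to hold COUNT + ADDED entries, or 0 if that cannot
   be represented.  */
static uint64_t
grow_buckets (uint64_t nbuckets, uint64_t count, uint64_t added)
{
  uint64_t needed = count + added;
  uint64_t newnb;

  /* Reject an empty table and any wrap-around in the sum or in the
     initial doubling.  */
  if (nbuckets == 0 || needed < count || nbuckets > UINT64_MAX / 2)
    return 0;

  newnb = nbuckets * 2;

  /* After the first doubling only raw capacity matters here; the
     two-thirds rule is deliberately not applied again.  */
  while (needed > newnb)
    {
      if (newnb > UINT64_MAX / 2)
	return 0;
      newnb *= 2;
    }
  return newnb;
}
```

With 16 buckets, 10 entries and 40 to be added, this settles on 64
buckets, whereas re-checking against two thirds of the new size would
go on to 128.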

Even that estimate could do with tweaking. For small sections,
assuming there may be relatively many very short strings is certainly
okay. But there can be only 255 distinct single-character strings (for
8-bit chars). So for larger sections, the estimate could surely be more
realistic.

Otoh aren't we also at risk of underestimating when entsize == 1 and
we're not dealing with strings?

> @@ -240,7 +240,7 @@ sec_merge_hash_insert (struct sec_merge_hash *table,
>    hashp->u.suffix = NULL;
>    hashp->next = NULL;
>    // We must not need resizing, otherwise _index is wrong
> -  BFD_ASSERT (bfdtab->count + 1 <= table->nbuckets * 2 / 3);
> +  BFD_ASSERT (bfdtab->count + 1 <= table->nbuckets / 3 * 2);
>    bfdtab->count++;
>    table->key_lens[_index] = (hash << 32) | (uint32_t)len;
>    table->values[_index] = hashp;

I'm puzzled by both comment and assertion here: Afaict we're past
resizing already, and hence all that matters is that the new index
is within table boundaries.

Further, the same expression occurring three times (and now needing
to be updated consistently) pretty clearly calls for putting it in a
macro or function. (Assuming, of course, that the same calculation
remains in all three places, which may not be the case as per above.)
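
For illustration, a minimal sketch of what such a helper could look
like. The names two_thirds, two_thirds_wrapping and needs_resize are
made up here, and I'm using fixed-width types so the wrap-around is
well defined for the example; the real fields in merge.c are plain
unsigned.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper, not present in bfd/merge.c: the load limit
   for N buckets, computed as in the patch.  Dividing first means the
   intermediate value never exceeds N.  */
static inline uint32_t
two_thirds (uint32_t n)
{
  return n / 3 * 2;
}

/* The pre-patch ordering, kept only for comparison: the
   multiplication wraps modulo 2^32 once N reaches 2^31.  */
static inline uint32_t
two_thirds_wrapping (uint32_t n)
{
  return (uint32_t) (n * 2u) / 3;
}

/* Hypothetical predicate for the three call sites.  The sum is
   widened to 64 bits so that COUNT + ADDED cannot wrap either.  */
static inline bool
needs_resize (uint32_t count, uint32_t added, uint32_t nbuckets)
{
  return (uint64_t) count + added > two_thirds (nbuckets);
}
```

For 2^31 buckets the old ordering yields a limit of 0, while the new
one yields 1431655764. Note the two orderings also differ slightly for
small values (8 buckets give 5 before and 4 after the patch), which is
one more reason to have the expression in a single place.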

Finally, is using unsigned int variables / fields actually
appropriate when BFD64 is in effect and hence section sizes can be
wider than 32 bits?
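
To spell out the concern, a small sketch, assuming a 32-bit unsigned
int (modelled here as uint32_t) and a 64-bit section size. Both helper
names are hypothetical and nothing like them exists in merge.c as far
as this mail is concerned.

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustration only: what an implicit conversion of a 64-bit size to
   a 32-bit count does.  The upper 32 bits are silently dropped.  */
static inline uint32_t
narrow_unchecked (uint64_t size)
{
  return (uint32_t) size;
}

/* A checked variant: store SIZE in *OUT and return true only if it
   fits in 32 bits; otherwise leave *OUT alone and return false.  */
static inline bool
narrow_checked (uint64_t size, uint32_t *out)
{
  if (size > UINT32_MAX)
    return false;
  *out = (uint32_t) size;
  return true;
}
```

A size of 4 GiB plus 16 bytes comes out of the unchecked conversion as
16, so any estimate derived from it would be far too small.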

Jan

