From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <binutils-return-90696-listarch-binutils=sources.redhat.com@sourceware.org>
Received: (qmail 80850 invoked by alias); 5 Feb 2016 22:38:05 -0000
Mailing-List: contact binutils-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <binutils.sourceware.org>
List-Subscribe: <mailto:binutils-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/binutils/>
List-Post: <mailto:binutils@sourceware.org>
List-Help: <mailto:binutils-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: binutils-owner@sourceware.org
Received: (qmail 80834 invoked by uid 89); 5 Feb 2016 22:38:05 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.6 required=5.0 tests=AWL,BAYES_50,RCVD_IN_DNSWL_NONE,RP_MATCHES_RCVD,SPF_PASS autolearn=ham version=3.3.2 spammy=H*Ad:D*cc, cruz, Cruz, santa
X-HELO: mailapp01.imgtec.com
Received: from mailapp01.imgtec.com (HELO mailapp01.imgtec.com) (195.59.15.196) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 05 Feb 2016 22:38:00 +0000
Received: from hhmail02.hh.imgtec.org (unknown [10.100.10.20])	by Websense Email Security Gateway with ESMTPS id 6399A20030041;	Fri,  5 Feb 2016 22:37:52 +0000 (GMT)
Received: from [10.100.200.149] (10.100.200.149) by hhmail02.hh.imgtec.org (10.100.10.21) with Microsoft SMTP Server id 14.3.266.1; Fri, 5 Feb 2016 22:37:55 +0000
Date: Fri, 05 Feb 2016 22:38:00 -0000
From: "Maciej W. Rozycki" <macro@imgtec.com>
To: "Neil Schellenberger (neschell)" <neschell@cisco.com>
CC: Nick Clifton <nickc@redhat.com>, Huang Pei <huangpei@loongson.cn>,	ma.jiang <ma.jiang@zte.com.cn>, <r@hev.cc>, <binutils@sourceware.org>
Subject: Re: [PATCH] MIPS support for --hash-style=gnu
In-Reply-To: <b49cd7d6b04d41749d56d557879ad4a6@XCH-RCD-006.cisco.com>
Message-ID: <alpine.DEB.2.00.1602011346020.15885@tp.orcam.me.uk>
References: <b49cd7d6b04d41749d56d557879ad4a6@XCH-RCD-006.cisco.com>
User-Agent: Alpine 2.00 (DEB 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
X-SW-Source: 2016-02/txt/msg00097.txt.bz2

Hi Neil,

 First of all, thank you for your contribution.

 It's been a while since you posted the proposal, long enough that 
meanwhile I took the post of a binutils MIPS target maintainer.  I 
understand that Nick, who is the binutils head maintainer, has already 
approved your change and I am not going to object it, not at least in 
principle, however since the process is still in progress I'm taking the 
opportunity to chime in and comment on your proposal from the MIPS 
architecture's rather than general binutils' point of view.

 Have you been able to sort out your copyright assignment paperwork with 
FSF meanwhile?

On Tue, 6 Oct 2015, Neil Schellenberger (neschell) wrote:

> I am tasked with reducing the time spent in the interpreter/loader at 
> program start time for a very large, multi-platform, multi-architecture, 
> legacy system (25+Mloc, 2000+ DSOs, 1+M symbols).  On MIPS this loader 
> overhead is several much larger than for other supported architectures, 
> which is not unexpected given the lack of .gnu.hash support. [1] 
> Measurement shows that a principal factor for the programs most affected 
> is the very large number of DSOs which are directly or indirectly needed 
> -- the chief expense being the cost of successive table misses during 
> symbol resolution.  A secondary factor is that some of the executables 
> and DSOs have a very large number of symbols with relocations -- on MIPS 
> these tend to run afoul of the multi-GOT mechanism which places many 
> into secondary GOTs, resulting in unconditionally forcing their 
> resolution at load time. [2][3][4] At this stage in the lifecycle of 
> this particular product, large scale architectural changes tend not to 
> be feasible (e.g. appreciably decreasing either the number of DSOs or 
> the number of relocations) so a more transparent solution is preferable.  
> (c.f. [5][6])

 Thank you for the detailed background information and analysis, and the 
references.

> In order to tackle the main problem -- large numbers of needed DSOs -- 
> some means of avoiding unprofitable (i.e. certain miss) hash table 
> probes would help significantly.  Since code to support Bloom filtering 
> already exists for GNU-style hash tables (DT_GNU_HASH), it seemed 
> attractive to enable that feature for MIPS (and then also benefit from 
> as much of the hash chain optimization as possible). [6] As I understand 
> it, the fundamental problem which has historically prevented this 
> support is, briefly, that the MIPS ABI requires that the .dynsym table 
> be sorted in a particular order, while .gnu.hash mandates a different 
> order; this appears to have stymied at least one earlier attempt. [8] As 
> I am expert neither with MIPS and its many ABI oddities, nor with the 
> implementation of the BFD linker, I have opted to take a very, very 
> simple -- if perhaps non-optimal -- approach.  Inspired by the comment 
> made by Richard Sandison to Hiroki Kaminaga concerning external symbol 
> blocks [9], I chose to reuse GNU-style hash tables, only with the simple 
> addition of a translation table to map the GNU symbol order to the MIPS 
> symbol order.  The MIPS .dynsym table proper continues to be completely 
> identical: its sort order and content are entirely unchanged.  The 
> proposed changes also leave the output of all other targets/backends 
> entirely unchanged (including MIPS using traditional SysV .hash).

 One important point I need to make here is that for many environments it 
is going to be necessary to keep a standard ELF hash section in binaries 
along your newly introduced GNU hash section, because they will have to be 
cooperative with the existing tools out there.  This is indeed a standard 
arrangement supported by GNU LD (with the `--hash-style=both' option) in 
addition to producing binaries embedding a single kind of a hash section 
of your choice.  And this has already been used with other targets which 
support using a GNU hash section.  In fact I have previously experienced 
problems myself in a configuration which stopped producing ELF hash 
sections along GNU hash as that caused a tool, the prelinker (more on the 
tool below), to stop working as it didn't support the GNU hash back then.

 So your "very, very simple -- if perhaps non-optimal -- approach" is in 
my opinion actually the best (as most simple solutions usually are), as 
not only it reuses proved existing code we already have in the tools, also 
making long-term maintenance easier, but by keeping all the other ELF file 
structures unchanged it ensures backwards compatibility as well, with 
environments out there that for whatever reasons have or want to keep 
working with the standard ELF hash.

> In an attempt to avoid forward compatibility issues, a new section and 
> related dynamic tag are proposed: .gnu.xhash and DT_GNU_XHASH.  (The "X" 
> standing either for "extended" or as a mnemonic for "translated", as the 
> reader prefers.)  This ensures that old binutils, glibc, and other third 
> party code do not attempt to behave as if a traditional 
> .gnu.hash/DT_GNU_HASH is present.  Likewise, the name of the section was 
> chosen so that it is not a textual prefix of .gnu.hash in an attempt to 
> preclude any insufficiently discriminating pattern from matching 
> inadvertently (e.g. in a script parsing readelf output).

 I agree this is a good choice.

> In practice, though, the .gnu.xhash section is virtually identical to 
> .gnu.hash. [9]
> 
> 	// Part 0: .gnu.xhash header
> 	Elf32_Word  ngnusyms;  // number of entries in chains (and xlat); dynsymcount=symndx+ngnusyms
> 	// Part 1: .gnu.hash header
> 	Elf32_Word  nbuckets;  // number of hash table buckets
> 	Elf32_Word  symndx;  // number of initial .dynsym entires skipped in chains[] (and xlat[])
> 	Elf32_Word  maskwords; // number of ElfW(Addr) words in bitmask
> 	Elf32_Word  shift2;  // bit shift of hashval for second Bloom filter bit
> 	// Part 2: .gnu.hash Bloom filter
> 	ElfW(Addr)  bitmask[maskwords];  // 2 bit Bloom filter on hashval
> 	// Part 3: .gnu.hash buckets
> 	Elf32_Word  buckets[nbuckets];  // indices into chains[]
> 	// Part 4: .gnu.hash chains
> 	Elf32_Word  chains[ngnusyms];  // consecutive hashvals in a given bucket; last entry in chain has LSB set
> 	// Part 5: .gnu.xhash translation table
> 	Elf32_Word  xlat[ngnusyms];  // parallel to chains[]; index into .dynsym
> 
> Parts 1 through 4 correspond exactly in layout to the traditional 
> .gnu.hash; part 4 has slightly different semantics since the correct 
> MIPS .dynsym index has to be first looked up in the parallel xlat table 
> i.e. the symbol corresponding to the hashval in chain[N] is 
> .dynsym[xlat[N]].  It is intentional that the .gnu.xhash layout contains 
> a .gnu.hash layout as a subcomponent: it represents an attempt to reuse 
> unchanged as much BFD and readelf code as possible.  The space cost of 
> .gnu.xhash relative to .gnu.hash is ngnusyms+1 32 bit words.  (For time 
> cost, the L2 cache hit from the xlat table lookup is probably atrocious, 
> but at that point we're already about to perform a string compare which 
> is probably going to be even more atrocious....  In any case, it's a 
> whole lot better than SysV .hash.)

 Fair enough, however as a matter of interest -- have you actually 
benchmarked the difference between your choice and a `.gnu.xhash' layout 
where parts 4 and 5 are intermixed i.e.:

	struct {
		Elf32_Word  hashval;
		Elf32_Word  indxlat;
	} xchains[ngnusyms];

-- maybe in reality that doesn't matter that much, especially with a set 
associative cache?

> In order to reuse the maximum amount of existing code with the minimum 
> amount of copying while also maintaining 100% backward compatibility, I 
> chose to implement the support as a BFD ELF backend hook: 
> record_hash_symbol().  For all platforms other than MIPS, this hook is 
> NULL and the behaviour is to write .gnu.hash (or not) exactly as they 
> always have done.  For MIPS, this hook is non-NULL and (when 
> --hash-style=gnu) will output a .gnu.xhash section which the MIPS 
> specific ELF backend then updates with a translation table to allow 
> .gnu.xhash symbol indices to be translated to MIPS .dynsym indices at 
> the right time during linking.  The principal changes to the common 
> support (in bfd/elflink.c) are in bfd_elf_size_dynsym_hash_dynstr() 
> which computes the correct size for the .gnu.[x]hash section; and in 
> elf_renumber_gnu_hash_syms() which did the sorting for .gnu.hash.  On 
> the ELF MIPS specific side, the main changes are to 
> mips_elf_sort_hash_table_f(); and in the addition of the backend 
> function _bfd_mips_elf_record_hash_symbol() with an associated new field 
> in struct mips_elf_link_hash_entry.

 Please rename the hook to `record_xhash_symbol', to give the name at 
least some meaning.  Right now it does not really say anything offhand, 
you need to know from elsewhere that it's specific to the modified GNU 
hash.

> In bfd_elf_size_dynsym_hash_dynstr() the code was modified as little as 
> possible in order to keep the diff small and simple to review; the 
> unfortunate corollary to that is that the changes are a little ugly and 
> somewhat brittle (conditionally shifting the contents pointers along 
> etc.)  This also to an extent dictated the layout of the .gnu.xhash 
> section: it is essentially a .gnu.hash section with a leading ngnusyms 
> word and a following translation table.  (The basic form of the 
> .gnu.hash section was retained so as also to keep the readelf changes to 
> a minimum.)

 With the switch to DT_MIPS_SYMTABNO as discussed below you can get rid of 
the shift, with the only change remaining being extra contents added at 
the end, which will be very little disturbing.

> The elf_renumber_gnu_hash_syms() function no longer unconditionally 
> renumbers the symbols.  Instead, if the backend supplies 
> record_hash_symbol(), then that is called instead to allow it to record 
> the offset of the translation table entry for that symbol.  The MIPS 
> backend will then fill this in later once it has finished fiddling with 
> the GOT(s).  I chose to pass an offset here rather than a pointer only 
> because I wasn't entirely certain if it was architecturally acceptable 
> to assume that the content of the section would never be replaced 
> sometime in between (although this is not currently the case).

 I think this is a good choice regardless of any assumptions you may or 
may not make, this way you have a single section pointer and handle the 
rest with offsets (of course you can still locally use temporary pointers 
calculated with these offsets if this makes code more reasonable).

 Given the changed semantics I think you need to rename the function 
though, as the old name becomes confusing now.  Something like 
`elf_gnu_hash_process_symidx' might do (no idea why the current name is 
plural, only one symbol is handled per call).  You'll need to update the 
comment too, both at the head of the function and at its (only) call site.

> On the ELF MIPS side, mips_elf_sort_hash_table_f() now also outputs the 
> final index of each symbol into the .gnu.xhash translation table as 
> required.  This is also a bit brittle since it assumes that nothing else 
> is going to come along and change the order yet again afterwards, but 
> that is also true of all of the already existing MIPS backend code.

 By design we have a certain order of processing within BFD, so as long as 
you follow it you can rely on it rather than assuming.  Here the sorting 
is the final processing stage before sections are written out to output, 
so nothing is going to change the order.  And if this is to be changed in 
the future, then it'll be the problem of whoever is going to make that 
change to ensure nothing breaks.

> No changes to gold are proposed here because, if I understand correctly, 
> there is as yet no general MIPS support in any case.

 There is MIPS support in GOLD I believe, but the tool is maintained 
separately and you don't need to update it at the same time, or at all 
unless you want.

> I have included the glibc changes here only as a convenience to 
> reviewers; I will be posting to libc-alpha as well.  (It is perhaps 
> interesting to note in passing that although --hash-style=gnu was 
> disabled in the linker, DT_GNU_HASH support was never removed from the 
> glibc MIPS sysdep dl-lookup.c.  This means that on MIPS the new and old 
> hashvals are currently always both computed at runtime for every symbol.  
> Fortunately in practice this cost appears to be entirely negligible.  
> Laterally, I suspect that Jakub Jelinek had independently confirmed this 
> insignificance and is why .hashvals never made it into .gnu.hash. [11] I 
> experimented with adding .hashvals as well as .gnu.xhash, but it made no 
> appreciable difference.)

 This looks like good material to discuss on `libc-alpha' indeed.  I've 
skimmed over the patch and you'll have to update it to use 
DT_MIPS_SYMTABNO too.

> The patch is relative to binutils-2_24 (which is where I'll ultimately 
> need it) but is equally applicable to trunk.  (The glibc patch is 
> relative to a lightly customized 2.16 but again is equally applicable to 
> trunk.)  As this is my first attempt at a patch for the linker, I've 
> omitted the ChangeLog and test cases for the moment, pending feedback.  
> Technical and procedural criticism is gratefully welcomed.  I should 
> very much like to thank the many who have taken the time to post on 
> these and related subjects over the years -- I would have found even 
> this modest attempt very difficult without the historical context.  
> Particular thanks are due to Michael Meeks, Jakub Jelinek, Richard 
> Sandiford, and Hiroki Kaminaga.  Errors in comprehension and judgement 
> are entirely my own.

 Our head has diverged a little bit, making your patch conflict in 3 
places.  All were purely mechanical and trivial to resolve, so I made them 
so as to apply your proposal to my working tree and run through the usual 
regression testing I do for any serious changes.  Right now it includes 35 
MIPS target configurations (subject to change), in addition to other 
targets.  There have been no regressions, so from the quality's point of 
view your change is fine to go in, once the problems I've listed below 
have been addressed.

> P.S.  It is not entirely clear to me how xgot support does or should 
> interact with multi-GOT. [12] With xgot supporting about 2**32 entries, 
> shouldn't it be the case that multiple GOTs are almost never needed?

 The problem with XGOT is it requires recompiling all the sources involved 
in a static link, also causing code size growth.  All this for a mere 
handful of programs which overflow 16-bit GOT, however with the code size 
regression applying to all the programs which may fit in 16-bit GOT just 
fine.  The multi-GOT approach does not suffer from this problem as it does 
not require changes to object files generated.  It does have some other 
limitations, but they are marginal enough for multi-GOT to have virtually 
superseded XGOT.  NB XGOT dates back to much earlier than multi-GOT, it 
was already specified in the original MIPS SVR4 psABI[1].

On Thu, 14 Jan 2016, Neil Schellenberger (neschell) wrote:

> In a separate email chain, it was also noted that DT_MIPS_SYMTABNO might 
> be used to compute ngnusyms.
> I originally avoided DT_MIPS_SYMTABNO only because I wasn't absolutely 
> certain that it was guaranteed that it (the number of entries in the 
> .dynsym section) was always going to be the same as the the number of 
> entries in chains (plus symidx) so I decided to play it safe.  That may 
> well be needless paranoia on my part.

 Yes, you absolutely have to avoid alignment issues in 64-bit ELF, and 
using DT_MIPS_SYMTABNO is the right choice -- the presence of this tag's 
entry is mandatory in the dynamic structure, as per Figure 5-7: "Dynamic 
Array Tags d_tag" in the MIPS psABI[1].  This entry is needed for the 
dynamic linker, to process the global parts of the GOT and the dynamic 
symbol table which are mapped to each other and the very cause of this all 
hassle, so you can rely on it; DT_MIPS_SYMTABNO=symndx+ngnusyms.

 A handful of nits as to the patch itself:

> diff --git a/bfd/elf-bfd.h b/bfd/elf-bfd.h
> index add80b3..5418e1d 100644
> --- a/bfd/elf-bfd.h
> +++ b/bfd/elf-bfd.h
> @@ -1216,6 +1216,13 @@ struct elf_backend_data
>    /* Return TRUE if symbol should be hashed in the `.gnu.hash' section.  */
>    bfd_boolean (*elf_hash_symbol) (struct elf_link_hash_entry *);
>  
> +  /* If non-NULL, called to register the location XLAT_LOC within
> +     .gnu.xhash at which real final dynindx for H should be written.
> +     If XLAT_LOC is zero, the symbol is not included in
> +     .gnu.xhash and no dynindx should be written.  */
> +  void (*record_hash_symbol)
> +    (struct elf_link_hash_entry *h, bfd_vma xlat_loc);
> +

 Please s/should/will/ as this is not a recommendation -- it describes how 
we will actually proceed.  Please also reformat the comment for more even 
alignment, i.e.:

  /* If non-NULL, called to register the location XLAT_LOC within
     .gnu.xhash at which real final dynindx for H will be written.
     If XLAT_LOC is zero, the symbol is not included in .gnu.xhash
     and no dynindx will be written.  */

> diff --git a/bfd/elflink.c b/bfd/elflink.c
> index 99b7ca1..1e3e2db 100644
> --- a/bfd/elflink.c
> +++ b/bfd/elflink.c
> @@ -271,8 +271,12 @@ _bfd_elf_link_create_dynamic_sections (bfd *abfd, struct bfd_link_info *info)
>  
>    if (info->emit_gnu_hash)
>      {
> -      s = bfd_make_section_anyway_with_flags (abfd, ".gnu.hash",
> -					      flags | SEC_READONLY);
> +      if (bed->record_hash_symbol == NULL)
> +	s = bfd_make_section_anyway_with_flags (abfd, ".gnu.hash",
> +						flags | SEC_READONLY);
> +      else
> +	s = bfd_make_section_anyway_with_flags (abfd, ".gnu.xhash",
> +						flags | SEC_READONLY);

 Sometimes you write the condition as `bed->record_hash_symbol == NULL' 
and sometimes as `bed->record_hash_symbol != NULL'.  I think it would make 
sense to keep it consistent, except where there is no `else' clause of 
course that is.  However in all cases where there actually is no `else' 
clause the condition is `bed->record_hash_symbol != NULL' and I find it 
more natural to read.  Can you therefore please adjust your code to use 
the `!=' variant throughout?

> @@ -5199,6 +5203,7 @@ struct collect_gnu_hash_codes
>    unsigned long int *counts;
>    bfd_vma *bitmask;
>    bfd_byte *contents;
> +  bfd_vma xlat;

 Please change the type to be `bfd_size_type' rather than `bfd_vma', as 
`bfd_vma' is used for target addresses (as the name implies) and we use 
`bfd_size_type' for offsets into structures processed internally (as these 
offsets will necessarily be in the same ranges as the respective sizes).

> @@ -5278,7 +5283,15 @@ elf_renumber_gnu_hash_syms (struct elf_link_hash_entry *h, void *data)
>    if (! (*s->bed->elf_hash_symbol) (h))
>      {
>        if (h->dynindx >= s->min_dynindx)
> -	h->dynindx = s->local_indx++;
> +	{
> +	  if (s->bed->record_hash_symbol != NULL)
> +	    {
> +	      (*s->bed->record_hash_symbol) (h, 0);
> +	      ++s->local_indx;
> +	    }
> +	  else
> +	    h->dynindx = s->local_indx++;
> +	}

 For consistency please use postincrementation in both legs.

> @@ -5295,7 +5308,14 @@ elf_renumber_gnu_hash_syms (struct elf_link_hash_entry *h, void *data)
>    bfd_put_32 (s->output_bfd, val,
>  	      s->contents + (s->indx[bucket] - s->symindx) * 4);
>    --s->counts[bucket];
> -  h->dynindx = s->indx[bucket]++;
> +  if (s->bed->record_hash_symbol != NULL)
> +    {
> +      bfd_vma xlat_loc = s->xlat + (s->indx[bucket]++ - s->symindx) * 4;
> +      BFD_ASSERT (xlat_loc != 0);
> +      (*s->bed->record_hash_symbol) (h, xlat_loc);

 Please add an empty line between variable declarations from following 
code.  With the observation below you can remove the assertion too.

> @@ -6645,12 +6685,15 @@ bfd_elf_size_dynsym_hash_dynstr (bfd *output_bfd, struct bfd_link_info *info)
>  		}
>  
>  	      cinfo.contents = contents;
> -
> +	      if (bed->record_hash_symbol != NULL)
> +		cinfo.xlat = contents + cinfo.nsyms * 4 - s->contents;

 I'd say just set `cinfo.xlat' unconditionally -- there's no benefit from 
the extra conditional and at worst the value won't be used (and then you 
can remove the assertion above).

> diff --git a/bfd/elfxx-mips.c b/bfd/elfxx-mips.c
> index d7498e1..d286a62 100644
> --- a/bfd/elfxx-mips.c
> +++ b/bfd/elfxx-mips.c
> @@ -3777,6 +3798,12 @@ mips_elf_sort_hash_table_f (struct mips_elf_link_hash_entry *h, void *data)
>        break;
>      }
>  
> +  if (h->gnuxhash_loc != 0 && hsd->gnuxhash != NULL)
> +    {
> +      bfd_put_32 (hsd->output_bfd, h->root.dynindx,
> +		  hsd->gnuxhash + h->gnuxhash_loc);
> +    }

 No need for curly brackets here.  Also please add a comment explaining 
what this piece does.

> @@ -15285,3 +15312,10 @@ _bfd_mips_post_process_headers (bfd *abfd, struct bfd_link_info *link_info)
>  	i_ehdrp->e_ident[EI_ABIVERSION] = 1;
>      }
>  }
> +
> +void
> +_bfd_mips_elf_record_hash_symbol (struct elf_link_hash_entry *h, bfd_vma xlat_loc)
> +{
> +  struct mips_elf_link_hash_entry *hmips = (struct mips_elf_link_hash_entry *) h;
> +  hmips->gnuxhash_loc = xlat_loc;
> +}

 Please add a comment to this function, explaining what it does.

 You need to wrap this piece as you went beyond 79 columns, see 
<https://www.gnu.org/prep/standards/html_node/Formatting.html>; I think 
separating the variable declaration from initialisation will help.  Also 
please add an empty line between the variable declaration and code that 
follows.

> diff --git a/binutils/readelf.c b/binutils/readelf.c
> index 61ea0ad..e87b111 100644
> --- a/binutils/readelf.c
> +++ b/binutils/readelf.c
> @@ -8454,6 +8457,16 @@ process_dynamic_section (FILE * file)
>  	    }
>  	  break;
>  
> +	case DT_GNU_XHASH:
> +	  dynamic_info_DT_GNU_XHASH = entry->d_un.d_val;
> +	  dynamic_info_DT_GNU_HASH = dynamic_info_DT_GNU_XHASH + 4;
> +	  if (do_dynamic)
> +	    {
> +	      print_vma (entry->d_un.d_val, PREFIX_HEX);
> +	      putchar ('\n');
> +	    }
> +	  break;

 With the removal of the `ngnusyms' entry you can make `DT_GNU_XHASH' fall 
through to `DT_GNU_HASH' here and avoid code duplication.

> @@ -9510,7 +9524,7 @@ process_symbol_table (FILE * file)
>        if (fseek (file,
>  		 (archive_file_offset
>  		  + offset_from_vma (file, buckets_vma
> -					   + 4 * (ngnubuckets + maxchain), 4)),
> +				     + 4 * (ngnubuckets + maxchain), 4)),

 Gratuitous change, please remove.  Formatting is wrong here (also after 
your change), but please don't mix coding style changes and code updates 
unless changing the ill-formatted line anyway.

> @@ -9543,7 +9581,45 @@ process_symbol_table (FILE * file)
>  
>        gnuchains = get_dynamic_data (file, maxchain, 4);
>  
> +      if (gnuchains == NULL)
> +	goto no_gnu_hash;
> +
> +      if (dynamic_info_DT_GNU_XHASH)
> +	{
> +	  if (fseek (file,
> +		     (archive_file_offset
> +		      + offset_from_vma (file, dynamic_info_DT_GNU_XHASH, 4)),
> +		     SEEK_SET))
> +	    {
> +	      error (_("Unable to seek to start of dynamic information\n"));
> +	      goto no_gnu_hash;
> +	    }
> +
> +	  if (fread (nb, 4, 1, file) != 1)
> +	    {
> +	      error (_("Failed to read in number of buckets\n"));
> +	      goto no_gnu_hash;
> +	    }
> +	  if (fseek (file,
> +		     (archive_file_offset
> +		      + offset_from_vma (file, buckets_vma
> +					       + 4 * (ngnubuckets
> +						      + maxchain), 4)),

 Bad formatting here, you need to enclose a wrapped expression in 
parentheses:

		      + offset_from_vma (file, (buckets_vma
						+ 4 * (ngnubuckets
						       + maxchain), 4))),

I wonder if one or more auxiliary variables can be used to reduce wrapping 
here and improve readability.  They might serve a documentation purpose 
even.

> @@ -9583,7 +9659,8 @@ process_symbol_table (FILE * file)
>  
>        if (dynamic_info_DT_GNU_HASH)
>  	{
> -	  printf (_("\nSymbol table of `.gnu.hash' for image:\n"));
> +	  printf (_("\nSymbol table of `%s' for image:\n"),
> +		  !dynamic_info_DT_GNU_XHASH ? ".gnu.hash" : ".gnu.xhash");

 Positive conditionals are easier to read, so please swap this expression.

> @@ -9597,7 +9674,10 @@ process_symbol_table (FILE * file)
>  
>  		do
>  		  {
> -		    print_dynamic_symbol (si, hn);
> +		    if (!dynamic_info_DT_GNU_XHASH)
> +		      print_dynamic_symbol (si, hn);
> +		    else
> +		      print_dynamic_symbol (gnuxlat[off], hn);

 And likewise this conditional.

> @@ -9931,7 +10011,8 @@ process_symbol_table (FILE * file)
>  	  return 0;
>  	}
>  
> -      printf (_("\nHistogram for `.gnu.hash' bucket list length (total of %lu buckets):\n"),
> +      printf (_("\nHistogram for `%s' bucket list length (total of %lu buckets):\n"),
> +	      !dynamic_info_DT_GNU_XHASH ? ".gnu.hash" : ".gnu.xhash",

 And likewise here.

> @@ -9977,6 +10058,8 @@ process_symbol_table (FILE * file)
>        free (lengths);
>        free (gnubuckets);
>        free (gnuchains);
> +      free (gnuxlat);
> +
>      }

 Extraneous new line, please remove.

> diff --git a/ld/testsuite/ld-mips-elf/hash1b.d b/ld/testsuite/ld-mips-elf/hash1b.d
> index 5af9037..6836ba9 100644
> --- a/ld/testsuite/ld-mips-elf/hash1b.d
> +++ b/ld/testsuite/ld-mips-elf/hash1b.d
> @@ -1,3 +1,4 @@
>  #source: hash1.s
>  #ld: -shared --hash-style=both
> -#error: .gnu.hash is incompatible with the MIPS ABI
> +#objdump: -dr
> +#pass
> diff --git a/ld/testsuite/ld-mips-elf/hash1c.d b/ld/testsuite/ld-mips-elf/hash1c.d
> index 09bff3c..72bdc18 100644
> --- a/ld/testsuite/ld-mips-elf/hash1c.d
> +++ b/ld/testsuite/ld-mips-elf/hash1c.d
> @@ -1,3 +1,4 @@
>  #source: hash1.s
>  #ld: -shared --hash-style=gnu
> -#error: .gnu.hash is incompatible with the MIPS ABI
> +#objdump: -dr
> +#pass

 Given that these tests (and `hash1a.d') were added along code to handle 
our non-support for GNU hash on the MIPS target in `mips_after_parse' with 
commit 73934d319dae it looks to me they were meant to verify that we bail 
out gracefully.  Now that this code is going away I think merely silencing 
the tests in this manner is not the way to go.

 With `mips_after_parse' removed they serve no purpose anymore, but I 
think at the very least we should have minimum coverage for the actual 
feature.  So instead please arrange for a dynamic symbol to be produced 
and then verify that the machinery works e.g. by passing the DSO built 
through `readelf -I'.  Especially as it seems we have little if any 
coverage here or at least I have troubles finding any relevant test cases.  
All I found is `ld/testsuite/ld-elf/hash.d' (which BTW needs to be updated 
accordingly; please include it with the next version of your patch) and 
that is really a bare minimum.

 I can help you with making such an update to these test cases if you find 
yourself having troubles with it.

 Then as the next step I think we should actually verify the contents of 
the section generated, so in addition to the minimal tests outlined above 
a `readelf -x .gnu.xhash' or `objdump -s -j .gnu.xhash' test would be good 
to have, with more than one dynamic symbol involved so as to make it less 
trivial.  This further test is not a prerequisite for the acceptance of 
your patch as far as I'm concerned, however if you'd like to experiment, 
learn a bit about our test suite and also to verify your work, then I 
encourage you and will greatly appreciate such an update.

 Please resubmit the change with the updates requested and a ChangeLog 
entry, written according to 
<https://www.gnu.org/prep/standards/html_node/Change-Logs.html> in case 
you haven't seen that, or let me know if you have any questions or 
comments.

 Regardless of this dynamic load performance improvement, which is greatly 
appreciated, what I think you might also look into to solve your problem 
is prelinking.  Have you considered that?  With the numerous mentions 
across the references you quoted you must have certainly been aware of the 
existence of the prelinker[2], which in principle is a tool to address the 
very problem you have, that is speeding up dynamic loading, especially 
where the number of dynamic symbols and/or DSOs is huge.  While not a part 
of the GNU project the tool is nevertheless free software available under 
the terms of the GNU GPL and currently maintained as a part of the Yocto 
Project[3].  Support for the MIPS target is included, which is what you 
may have not realised.

 The only drawback of prelinking I know of is that, by the nature of its 
operation, its incompatible with ASLR, so unless you need this feature the 
tool may be worth trying.

 And finally I apologise if the review process took maybe a little bit 
longer than you may have wished.  I'll do my best to drive it to a 
successful conclusion quickly now.  Again if you have any questions or 
comments, just let me know.

References:

[1] "SYSTEM V APPLICATION BINARY INTERFACE, MIPS RISC Processor
    Supplement, 3rd Edition", The Santa Cruz Operation, Inc., February 
    1996
    <http://www.linux-mips.org/pub/linux/mips/doc/ABI/mipsabi.pdf>
[2] Jakub Jelinek, "Prelink", Red Hat, Inc., December 10, 2003
    <https://people.redhat.com/jakub/prelink/prelink.pdf>
[3] "Cross-Prelink"
    <https://wiki.yoctoproject.org/wiki/Cross-Prelink>

  Maciej