From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <binutils-return-69306-listarch-binutils=sources.redhat.com@sourceware.org>
Received: (qmail 14796 invoked by alias); 14 Dec 2010 12:06:38 -0000
Received: (qmail 14749 invoked by uid 22791); 14 Dec 2010 12:06:31 -0000
X-SWARE-Spam-Status: No, hits=-1.7 required=5.0	tests=AWL,BAYES_00,TW_FX,TW_XF,T_RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (38.113.113.100)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 14 Dec 2010 12:06:24 +0000
Received: (qmail 2672 invoked from network); 14 Dec 2010 12:06:17 -0000
Received: from unknown (HELO pl.orcam.me.uk) (macro@127.0.0.2)  by mail.codesourcery.com with ESMTPA; 14 Dec 2010 12:06:17 -0000
Date: Tue, 14 Dec 2010 13:30:00 -0000
From: "Maciej W. Rozycki" <macro@codesourcery.com>
To: Richard Sandiford <rdsandiford@googlemail.com>,     Ilie Garbacea <ilie@mips.com>, Joseph Myers <joseph@codesourcery.com>
cc: binutils@sourceware.org, Chao-ying Fu <fu@mips.com>,     Rich Fuhler <rich@mips.com>, David Lau <davidlau@mips.com>,     Kevin Mills <kevinm@mips.com>, Catherine Moore <clm@codesourcery.com>,     Nathan Sidwell <nathan@codesourcery.com>,     Nathan Froyd <froydnj@codesourcery.com>
Subject: Re: [PATCH] MIPS: microMIPS ASE support
In-Reply-To: <871v5n9m7e.fsf@firetop.home>
Message-ID: <alpine.DEB.1.10.1012131400320.4142@tp.orcam.me.uk>
References: <alpine.DEB.1.10.1005181806590.4023@tp.orcam.me.uk> <87y6fa9u3t.fsf@firetop.home> <alpine.DEB.1.10.1007112325020.2824@tp.orcam.me.uk> <876302kqvu.fsf@firetop.home> <alpine.DEB.1.10.1012061924490.5345@tp.orcam.me.uk> <871v5n9m7e.fsf@firetop.home>
User-Agent: Alpine 1.10 (DEB 962 2008-03-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Mailing-List: contact binutils-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <binutils.sourceware.org>
List-Subscribe: <mailto:binutils-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/binutils/>
List-Post: <mailto:binutils@sourceware.org>
List-Help: <mailto:binutils-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: binutils-owner@sourceware.org
X-SW-Source: 2010-12/txt/msg00454.txt.bz2

Hello,

 Ilie, Joseph: I have some questions for you below, please respond.

> This is a review of everything up to the end of elfxx-mips.c.  Not sure
> when I'll be able to do the rest.

 OK, thanks for your effort so far.

> I think we're getting close, so could you post any updates as patches
> relative to this one, rather than reposting the whole thing?

 Yes, I think it will make it easier for us both to keep track of what has 
been addressed and what not.  I have no technical problem with including 
further changes in a separate patch.

> > @@ -2372,7 +3002,14 @@ bfd_elf64_bfd_reloc_name_lookup (bfd *ab
> >         i++)
> >      if (mips16_elf64_howto_table_rela[i].name != NULL
> >  	&& strcasecmp (mips16_elf64_howto_table_rela[i].name, r_name) == 0)
> > -      return &mips16_elf64_howto_table_rela[i];
> > +
> > +  for (i = 0;
> > +       i < (sizeof (micromips_elf64_howto_table_rela)
> > +	    / sizeof (micromips_elf64_howto_table_rela[0]));
> > +       i++)
> > +    if (micromips_elf64_howto_table_rela[i].name != NULL
> > +	&& strcasecmp (micromips_elf64_howto_table_rela[i].name, r_name) == 0)
> > +      return &micromips_elf64_howto_table_rela[i];
> >  
> >    if (strcasecmp (elf_mips_gnu_vtinherit_howto.name, r_name) == 0)
> >      return &elf_mips_gnu_vtinherit_howto;
> 
> This hunk looks wrong.  I doubt you meant to remove the mips16 line.
> Same for elfn32-mips.c.

 I believe I fixed this problem with elf32-mips.c, but obviously must have 
missed it here, thanks for spotting.

> > @@ -5040,6 +5159,10 @@ mips_elf_calculate_relocation (bfd *abfd
> >  	}
> >  
> >        target_is_16_bit_code_p = ELF_ST_IS_MIPS16 (h->root.other);
> > +      /* If the output section is the PLT section,
> > +         then the target is not microMIPS.  */
> > +      target_is_micromips_code_p = (htab->splt != sec)
> > +				    && ELF_ST_IS_MICROMIPS (h->root.other);
> >      }
> >  
> >    /* If this is a reference to a 16-bit function with a stub, we need
> 
> Formatting nit:
> 
>       /* If the output section is the PLT section,
>          then the target is not microMIPS.  */
>       target_is_micromips_code_p = (htab->splt != sec
> 				    && ELF_ST_IS_MICROMIPS (h->root.other));
> 
> More importantly, the comment isn't any help.  When do we create
> statically-resolved relocations against .plt?

 PLT entries are currently hardcoded to standard MIPS code -- this is 
what the comment refers to.  This has nothing to do with relocations 
being static or dynamic.  The implication is if jumping to the PLT from 
microMIPS code then the usual JALX's restrictions apply (no sibling calls 
and no short instructions in the delay slot).

 I've fixed formatting, thanks -- there were so many of such errors I 
could have easily missed one or two.

> >    /* Calls from 16-bit code to 32-bit code and vice versa require the
> > -     mode change.  */
> > -  *cross_mode_jump_p = !info->relocatable
> > -		       && ((r_type == R_MIPS16_26 && !target_is_16_bit_code_p)
> > -			   || ((r_type == R_MIPS_26 || r_type == R_MIPS_JALR)
> > -			       && target_is_16_bit_code_p));
> > +     mode change.  This is not required for calls to undefined weak
> > +     symbols, which should never be executed at runtime.  */
> 
> But why do we need to go out of our way to check for them?  I'm sure
> there's a good reason, but the comment doesn't give much clue what it is.

 Undefined weak symbols are, well, undefined, so they have resolved to nil 
and are meant never to be jumped to, so we don't want to error out on them 
just because they do not have the ISA bit set and a JALX therefore 
required could not be used for some reason, like the invocation being a 
sibling call or because it would not satisfy the fixed delay slot 
dependency.

 So we decide never to make a cross-mode jump in this situation and leave 
the original jump instruction (i.e. JAL, JALS or JR) in place.  If the 
instruction is indeed reached, then 1 will be written to the PC rather 
than 0 that would "canonically" be required here, but the outcome will be 
the same (assuming the zeroth page is unmapped), i.e. a segfault will 
happen.

 Joseph, I reckon you were involved with this piece -- did I get all the 
context right here?

> > @@ -9223,6 +9499,16 @@ _bfd_mips_elf_relocate_section (bfd *out
> >  	case bfd_reloc_ok:
> >  	  break;
> >  
> > +	case bfd_reloc_outofrange:
> > +	  if (jal_reloc_p (howto->type))
> > +	    {
> > +	      msg = _("JALX to not a word-aligned address");
> > +	      info->callbacks->warning
> > +		(info, msg, name, input_bfd, input_section, rel->r_offset);
> > +	      return FALSE;
> > +	    }
> > +	  /* Fall through.  */
> > +
> >  	default:
> >  	  abort ();
> >  	  break;
> 
> I suppose that should be something like "JALX to a non-word-aligned address".

 Updated, thanks.

> > +  /* microMIPS local and global symbols have the least significant bit
> > +     set.  Because all values are either multiple of 2 or multiple of
> > +     4, compensating for this bit is as simple as setting it in
> > +     comparisons. Just to be sure, check anyway before proceeding. */
> > +  BFD_ASSERT (addr % 2 == 0);
> > +  BFD_ASSERT (toaddr % 2 == 0);
> > +  BFD_ASSERT (count % 2 == 0);
> 
> I'm suspicious about the toaddr assert.  You can create text sections
> with odd-valued sizes:
> 
> 	.section	.text.foo,"ax",@progbits
> 	.byte	1
> 
> and as discussed elsewhere, asserts are really to double-check
> conditions that we think must already hold.

 Will check that.  [FIXME]

> > +  addr |= 1;
> > +  toaddr |= 1;
> > +
> > +  /* Adjust the local symbols defined in this section.  */
> > +  symtab_hdr = &elf_tdata (abfd)->symtab_hdr;
> > +  isym = (Elf_Internal_Sym *) symtab_hdr->contents;
> > +  for (isymend = isym + symtab_hdr->sh_info; isym < isymend; isym++)
> > +    {
> > +      if (isym->st_shndx == sec_shndx
> > +	  && isym->st_value > addr
> > +	  && isym->st_value < toaddr)
> > +	isym->st_value -= count;
> > +    }
> 
> Hmm, microMIPS symbols created in the proper, blessed way (by proper,
> blessed ways of defining MIPS16 labels in assembly) should be even in
> the input object, and IIRC, the linker keeps local symbols that way
> internally.  Only symbols entered into the hash table are odd.
> (c.f. mips_elf_calculate_relocation, where we have to add 1 to
> local symbols.)  So I would have expected the "|= 1"s to be applied
> after this block rather than before it.  Even then:
> 
> > +  /* Now adjust the global symbols defined in this section.  */
> > +  symcount = (symtab_hdr->sh_size / sizeof (Elf32_External_Sym)
> > +	      - symtab_hdr->sh_info);
> > +  sym_hashes = start_hashes = elf_sym_hashes (abfd);
> > +  end_hashes = sym_hashes + symcount;
> > +
> > +  for (; sym_hashes < end_hashes; sym_hashes++)
> > +    {
> > +      struct elf_link_hash_entry *sym_hash = *sym_hashes;
> > +
> > +      if ((sym_hash->root.type == bfd_link_hash_defined
> > +	   || sym_hash->root.type == bfd_link_hash_defweak)
> > +	  && sym_hash->root.u.def.section == sec
> > +	  && sym_hash->root.u.def.value > addr
> > +	  && sym_hash->root.u.def.value < toaddr)
> > +	sym_hash->root.u.def.value -= count;
> > +    }
> 
> ...if we're willing to extend the upper bound in this way, I wonder
> whether there's really any point having an upper bound at all.
> 
> Then again, why doesn't the standard range (used by most targets)
> include toaddr?  If you define an end marker:
> 
> end_of_section:
> 	# nothing after this
> 
> then wouldn't end_of_section == toaddr, and shouldn't it be included?

 Ilie, I'm told you were responsible for this piece of code -- would you 
please respond to these questions?

> I see some targets rightly adjust st_size too.  We should do
> the same here.

 Agreed.  [FIXME]

> > +static unsigned long
> > +find_match (unsigned long opcode, const struct opcode_descriptor insn[])
> > +{
> > +    unsigned long indx;
> > +
> > +    /* First opcode_table[] entry is ignored.  */
> > +    for (indx = 1; insn[indx].mask != 0; indx++)
> > +      if (MATCH (opcode, insn[indx]))
> > +	return indx;
> > +
> > +    return 0;
> > +}
> 
> But _why_ is the first entry ignored?

 There must be a reason, Ilie?

> > +/* Check if there *might* be a 16-bit branch or jump at OFFSET
> > +   with a fixed or variable length delay slot.  */
> > +
> > +static int
> > +relax_dslot_norel16 (bfd *abfd, bfd_byte *ptr)
> 
> There's no parameter called "offset".  How about:

 Yes, that's my typo; it should have been PTR, not OFFSET.

> /* If PTR is known to point to a 16-bit branch or jump, return the
>    minimum length of its delay slot, otherwise return 0.  */

 I'll keep the "*might*" because this is heuristics allowing possible 
false positives.  It may be the second half of another instruction and not 
a branch at all.  Otherwise OK, thanks.

> At least, that's what the code seems to do, since it returns 2 for
> things like:
> 
> > +  { /* "b(g|l)(e|t)z", "s,p",	*/ 0x40000000, 0xff200000 },
> 
> which as far as I can tell from the opcodes patch allows both 16-bit
> and 32-bit delay slots.  (IV-g doesn't seem to be public yet, so I can't
> check the spec.)  But from the way the function is used, it looks
> like 0 really means "no size constraints", so why doesn't the function
> return 0 for instructions like these?

 Only link variations of branches and jumps have a fixed-size delay slot 
-- that's because the link register is set to a fixed offset from the 
delay-slot instruction (either four as with JAL or two as with JALS).  Of 
all such jumps and branches only JALX does not have a JALXS counterpart 
(regrettably, as it would have made life of software much, much easier).

 I've explained the meaning of 0 below -- it's unsafe to return this value 
for a variable-size delay slot.

 BTW, I've just spotted and fixed a bug in relax_dslot_norel32 -- 
jal_insn_32_bd32 should be matched against instead of bz_insns_32 
obviously.  Hopefully you didn't get confused by that.

> Why are these routines checking for branches anyway?  We don't support
> branches without relocations when doing this kind of relaxation.
> I'd have expected only indirect jumps to be handled here.
> 
> Likewise relax_dslot_norel32.

 They are checked for because a LUI cannot be deleted entirely if it is in 
a delay slot as to do so would change the semantics of code being 
modified.  The implementation of relax_dslot_norel{16,32}() reflect this.  
These functions return the minimum delay slot size required, i.e. 0 if 
none, 2 if a fixed 16-bit or a variable-size delay slot is needed or 4 if 
a fixed 32-bit delay slot is needed.  In addition to this some branches 
may be replaced with a compact (no-delay-slot) variation making the 
deletion of the LUI possible (as in relax_dslot_rel()).

 If the deletion of the LUI is not possible after all, then it's replaced 
with a NOP according to the size of the delay slot.

 Good point about the relocations -- perhaps the instruction tables used 
for matching can be reduced to include jumps only (I'd keep absolute jumps 
here too as they may legitimately appear in input with no relocation 
associated and will normally cause no trouble even if moved around a bit, 
i.e. unless they cross to another 27-bit region).  I'll think about that.  
Original code I received didn't check for branch relocations here at all 
and relied purely on instruction decoding, sigh... :(  [FIXME]

> > +  /* We don't have to do anything for a relocatable link, if
> > +     this section does not have relocs, or if this is not a
> > +     code section.  */
> > +
> > +  if (link_info->relocatable
> > +      || (sec->flags & SEC_RELOC) == 0
> > +      || sec->reloc_count == 0
> > +      || (sec->flags & SEC_CODE) == 0)
> > +    return TRUE;
> 
> Should we also check whether the micromips bit is set in the ELF
> header flags?

 Yes, with my recent file ASE flag improvements this will definitely make 
sense, thanks for the hint.  [FIXME]

> > +	      nextopc = bfd_get_16 (abfd, contents + irel->r_offset + 4);
> > +	      /* B16  */
> > +	      if (MATCH (nextopc, b_insn_16))
> > +		break;
> > +	      /* JR16  */
> > +	      if (MATCH (nextopc, jr_insn_16) && reg != JR16_REG (nextopc))
> > +		break;
> > +	      /* BEQZ16, BNEZ16  */
> > +	      if (MATCH (nextopc, bz_insn_16) && reg != BZ16_REG (nextopc))
> > +		break;
> > +	      /* JALR16  */
> > +	      if (MATCH (nextopc, jalr_insn_16_bd32)
> > +		  && reg != JR16_REG (nextopc) && reg != RA)
> > +		break;
> > +	      continue;
> 
> I think this should be split out into a subfunction, like the relax_* ones.
> 
> /* PTR points to a 2-byte instruction.  Return true if it is a 16-bit
>    branch that does not use register REG.  */
> 
> static bfd_boolean
> is_16bit_branch_p (bfd *abfd, bfd_byte *ptr, unsigned int reg)
> 
> Likewise for the 4-byte case.

 Indeed, that should be possible.  I'll look into it.  [FIXME]

> > +	  /* Give up if not the same register used with both relocations.  */
> 
> "Give up unless the same register is used with both relocations."

 Fixed, thanks.

> > +	  /* Now adjust pcrval, subtracting the offset to the LO16 reloc,
> > +	     adding the size of the LUI instruction deleted and rounding
> > +	     up to take masking of the two LSBs into account.  */
> > +	  pcrval = ((pcrval - offset + 4 + 3) | 3) ^ 3;
> 
> Maybe its just me, but since pcrval was calculated much earlier,
> and hasn't been used since being set, I'd find it easier to understand
> if we simply calculate the pcrval for irel[1] directly:
> 
>       /* Calculate the pc-relative distance of the target from the LO16,
> 	 assuming that we can delete the 4-byte LUI.  */
>       pcrval = (symval
> 		- (sec->output_section->vma + sec->output_offset)
> 		- (irel[1].r_offset - 4));
> 
>       /* Round up to take the masking of the two LSBs into account.  */
>       pcrval = (prcval + 3) & -4;
> 
> What happens when we can't delete the LUI, because of its being in
> a branch delay slot?

 Hmm, that's a bug (although unlikely to trigger), thanks for spotting it.  
I've fixed it by making sure the distance fits into ADDIUPC both with and 
without the LUI deleted.  It's a minimal pessimisation only for a boundary 
case, so not worth any further improvement.

 Recalculating pcrval looks to me like a waste of CPU time here -- we've 
already calculated it for the other "if" clauses, so we just need to fetch 
it from the local frame now.

> > +	  int bdsize = -1;
> [...]
> > +	  /* See if there is a jump or a branch reloc preceding the
> > +	     LUI instruction immediately.  If so, then perform jump
> > +	     or branch relaxation.  */
> > +	  for (ibrrel = internal_relocs; ibrrel < irelend; ibrrel++)
> > +	    {
> > +	      offset = irel->r_offset - ibrrel->r_offset;
> > +	      if (offset != 2 && offset != 4)
> > +		continue;
> > +
> > +	      br_r_type = ELF32_R_TYPE (ibrrel->r_info);
> > +	      if (offset == 2
> > +		  && br_r_type != R_MICROMIPS_PC7_S1
> > +		  && br_r_type != R_MICROMIPS_PC10_S1
> > +		  && br_r_type != R_MICROMIPS_JALR)
> > +		continue;
> > +	      if (offset == 4
> > +		  && br_r_type != R_MICROMIPS_26_S1
> > +		  && br_r_type != R_MICROMIPS_PC16_S1
> > +		  && br_r_type != R_MICROMIPS_JALR)
> > +		continue;
> > +
> > +	      bdsize = relax_dslot_rel (abfd,
> > +					contents + ibrrel->r_offset, offset);
> > +	      break;
> > +	    }
> > +
> > +	  /* Otherwise see if the LUI instruction *might* be in a
> > +	     branch delay slot.  */
> > +	  if (bdsize == -1)
> > +	    {
> > +	      bfd_byte *ptr = contents + ibrrel->r_offset;
> > +
> > +	      /* Assume the 32-bit LUI instruction is in a fixed delay slot
> > +	         and cannot be optimised away.  */
> > +	      bdsize = 4;
> > +
> > +	      if (ibrrel->r_offset >= 2)
> > +		bdsize16 = relax_dslot_norel16 (abfd, ptr - 2);
> > +	      if (ibrrel->r_offset >= 4)
> > +		bdsize32 = relax_dslot_norel32 (abfd, ptr - 4);
> 
> I think the "ibrrel"s in the last block should be "irel"s instead.
> As best I can tell, "ibrrel" == "irelend" here.  Is this case
> covered by the testsuite?

 Correct and fixed, thanks.

 There are no testcases addressing any of this linker relaxation I would 
be aware of, sorry.

> I'm not sure whether I like the idea of making this function quadratic
> simply to decide whether the preceding instruction is a branch,
> but there's probably not much we can do about that.

 Hmm, we could do this in two passes over the reloc table (still O(n)) and 
first make a fast auxiliary data structure (e.g. a hash) to keep addresses 
of branch and jump relocations.  Given linker relaxation is off by default 
I'd vote for this as a future improvement.

 As a summary here's the patch resulting from the changes I made as noted 
above.  No regressions so far.  Regrettably owing to higher priority 
commitments I'll have to defer further work on issues marked with [FIXME] 
above until a later date.

  Maciej

binutils-umips-fix.diff
Index: binutils-fsf-trunk-quilt/bfd/elf64-mips.c
===================================================================
--- binutils-fsf-trunk-quilt.orig/bfd/elf64-mips.c	2010-12-13 15:20:06.000000000 +0000
+++ binutils-fsf-trunk-quilt/bfd/elf64-mips.c	2010-12-13 15:22:01.000000000 +0000
@@ -3002,6 +3002,7 @@ bfd_elf64_bfd_reloc_name_lookup (bfd *ab
        i++)
     if (mips16_elf64_howto_table_rela[i].name != NULL
 	&& strcasecmp (mips16_elf64_howto_table_rela[i].name, r_name) == 0)
+      return &mips16_elf64_howto_table_rela[i];
 
   for (i = 0;
        i < (sizeof (micromips_elf64_howto_table_rela)
Index: binutils-fsf-trunk-quilt/bfd/elfn32-mips.c
===================================================================
--- binutils-fsf-trunk-quilt.orig/bfd/elfn32-mips.c	2010-12-13 15:20:06.000000000 +0000
+++ binutils-fsf-trunk-quilt/bfd/elfn32-mips.c	2010-12-13 15:22:50.000000000 +0000
@@ -2820,6 +2820,7 @@ bfd_elf32_bfd_reloc_name_lookup (bfd *ab
        i++)
     if (elf_mips16_howto_table_rela[i].name != NULL
 	&& strcasecmp (elf_mips16_howto_table_rela[i].name, r_name) == 0)
+      return &elf_mips16_howto_table_rela[i];
 
   for (i = 0;
        i < (sizeof (elf_micromips_howto_table_rela)
Index: binutils-fsf-trunk-quilt/bfd/elfxx-mips.c
===================================================================
--- binutils-fsf-trunk-quilt.orig/bfd/elfxx-mips.c	2010-12-13 15:20:06.000000000 +0000
+++ binutils-fsf-trunk-quilt/bfd/elfxx-mips.c	2010-12-13 19:08:46.000000000 +0000
@@ -5161,8 +5161,8 @@ mips_elf_calculate_relocation (bfd *abfd
       target_is_16_bit_code_p = ELF_ST_IS_MIPS16 (h->root.other);
       /* If the output section is the PLT section,
          then the target is not microMIPS.  */
-      target_is_micromips_code_p = (htab->splt != sec)
-				    && ELF_ST_IS_MICROMIPS (h->root.other);
+      target_is_micromips_code_p = ((htab->splt != sec)
+				    && ELF_ST_IS_MICROMIPS (h->root.other));
     }
 
   /* If this is a reference to a 16-bit function with a stub, we need
@@ -9502,7 +9502,7 @@ _bfd_mips_elf_relocate_section (bfd *out
 	case bfd_reloc_outofrange:
 	  if (jal_reloc_p (howto->type))
 	    {
-	      msg = _("JALX to not a word-aligned address");
+	      msg = _("JALX to a non-word-aligned address");
 	      info->callbacks->warning
 		(info, msg, name, input_bfd, input_section, rel->r_offset);
 	      return FALSE;
@@ -12188,8 +12188,10 @@ find_match (unsigned long opcode, const 
 
 /* Delay slot size and relaxation support.  */
 
-/* Check if there *might* be a 16-bit branch or jump at OFFSET
-   with a fixed or variable length delay slot.  */
+/* If PTR points to what *might* be a 16-bit branch or jump, then
+   return the minimum length of its delay slot, otherwise return 0.
+   Non-zero results are not definitive as we might be checking against
+   the second half of another instruction.  */
 
 static int
 relax_dslot_norel16 (bfd *abfd, bfd_byte *ptr)
@@ -12212,8 +12214,10 @@ relax_dslot_norel16 (bfd *abfd, bfd_byte
   return bdsize;
 }
 
-/* Check if there *might* be a 32-bit branch or jump at OFFSET
-   with a fixed or variable length delay slot.  */
+/* If PTR points to what *might* be a 32-bit branch or jump, then
+   return the minimum length of its delay slot, otherwise return 0.
+   Non-zero results are not definitive as we might be checking against
+   the second half of another instruction.  */
 
 static int
 relax_dslot_norel32 (bfd *abfd, bfd_byte *ptr)
@@ -12223,7 +12227,7 @@ relax_dslot_norel32 (bfd *abfd, bfd_byte
 
   opcode = (bfd_get_16 (abfd, ptr) << 16) | bfd_get_16 (abfd, ptr + 2);
   if (find_match (opcode, call_insns_32_bd32) != 0
-      || find_match (opcode, bz_insns_32) != 0
+      || MATCH (opcode, jal_insn_32_bd32) != 0
       || MATCH (opcode, jalx_insn_32_bd32) != 0)
     /* 32-bit branch/jump with a 32-bit delay slot.  */
     bdsize = 4;
@@ -12550,14 +12554,13 @@ _bfd_mips_elf_relax_section (bfd *abfd, 
 	  nextopc  = bfd_get_16 (abfd, contents + irel[1].r_offset    ) << 16;
 	  nextopc |= bfd_get_16 (abfd, contents + irel[1].r_offset + 2);
 
-	  /* Give up if not the same register used with both relocations.  */
+	  /* Give up unless the same register used with both relocations.  */
 	  if (OP32_SREG (nextopc) != reg)
 	    continue;
 
-	  /* Now adjust pcrval, subtracting the offset to the LO16 reloc,
-	     adding the size of the LUI instruction deleted and rounding
-	     up to take masking of the two LSBs into account.  */
-	  pcrval = ((pcrval - offset + 4 + 3) | 3) ^ 3;
+	  /* Now adjust pcrval, subtracting the offset to the LO16 reloc
+	     and rounding up to take masking of the two LSBs into account.  */
+	  pcrval = ((pcrval - offset + 3) | 3) ^ 3;
 
 	  /* R_MICROMIPS_LO16 relaxation to R_MICROMIPS_HI0_LO16.  */
 	  if (IS_BITSIZE (symval, 16))
@@ -12573,9 +12576,14 @@ _bfd_mips_elf_relax_section (bfd *abfd, 
 			  contents + irel[1].r_offset);
 	    }
 
-	  /* R_MICROMIPS_LO16 / ADDIU relaxation to R_MICROMIPS_PC23_S2.  */
+	  /* R_MICROMIPS_LO16 / ADDIU relaxation to R_MICROMIPS_PC23_S2.
+	     As we don't know at this stage if we'll be able to delete
+	     the LUI in the end, we check both the original PC-relative
+	     distance and one with 4 added to take LUI deletion into
+	     account.  */
 	  else if (symval % 4 == 0
 		   && IS_BITSIZE (pcrval, 25)
+		   && IS_BITSIZE (pcrval + 4, 25)
 		   && MATCH (nextopc, addiu_insn)
 		   && OP32_TREG (nextopc) == OP32_SREG (nextopc)
 		   && OP16_VALID_REG (OP32_TREG (nextopc)))
@@ -12627,15 +12635,15 @@ _bfd_mips_elf_relax_section (bfd *abfd, 
 	     branch delay slot.  */
 	  if (bdsize == -1)
 	    {
-	      bfd_byte *ptr = contents + ibrrel->r_offset;
+	      bfd_byte *ptr = contents + irel->r_offset;
 
 	      /* Assume the 32-bit LUI instruction is in a fixed delay slot
 	         and cannot be optimised away.  */
 	      bdsize = 4;
 
-	      if (ibrrel->r_offset >= 2)
+	      if (irel->r_offset >= 2)
 		bdsize16 = relax_dslot_norel16 (abfd, ptr - 2);
-	      if (ibrrel->r_offset >= 4)
+	      if (irel->r_offset >= 4)
 		bdsize32 = relax_dslot_norel32 (abfd, ptr - 4);
 
 	      if (bdsize16 == -1 && bdsize32 == -1)