public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
From: Nelson Chu <nelson.chu@sifive.com>
To: "Patrick O'Neill" <patrick@rivosinc.com>
Cc: Binutils <binutils@sourceware.org>,
	gnu-toolchain@rivosinc.com,
	 Jim Wilson <jim.wilson.gcc@gmail.com>,
	Palmer Dabbelt <palmer@rivosinc.com>,
	 Andrew Waterman <andrew@sifive.com>,
	Kito Cheng <kito.cheng@gmail.com>
Subject: Re: [PATCH v2 3/5] RISCV: Implement piecewise deletion
Date: Fri, 20 May 2022 18:48:48 +0800	[thread overview]
Message-ID: <CAJYME4Gmp_VUpwX6qdkZ3n4u9gmODxPJjHk0tBsoENpF_P4kUQ@mail.gmail.com> (raw)
In-Reply-To: <20220502135048.1392596-4-patrick@rivosinc.com>

Hi Patrick,

Thanks for giving us this great idea.  I would suggest that we don't
consider the alignment issues for now, and focus on how to reduce the
time of the normal relaxations, which includes lui, auipc, call and
tls le relaxations.  The current relax behavior are listed as follows,

Pass 1: For lui, call and tls le relaxations, delete the redundant
bytes immediately;  For auipc, we build the table (pcgp_relocs) to
chain the %pcrel_lo insn to the corresponding %pcrel_hi insn, and then
actually delete the bytes until next relax pass.

Pass 2: Delete the bytes for each R_RISCV_DELETE  which are marked for
auipc relaxations.

Pass 3: Handle alignment, delete the bytes immediately for each R_RISCV_ALIGN.

So the current relax order should be,
Pass 1 -> Pass 1 -> ... -> Pass 1 -> Pass 2 -> Pass 3.  Since we
delete the bytes for lui/call/tls_le relaxations immediately, the Pass
1 will be done multiple times until there are no any bytes deleted.
There is a problem here - the auipc will not be deleted until Pass 2,
so we will need to update the pcgp_relocs table every time.  You can
see the PR28410 for details,
https://sourceware.org/bugzilla/show_bug.cgi?id=28410.

According to your patches, the relax order will be updated to,
Pass 1 (Do not delete the bytes immediately) -> Pass 2 (Handle
alignment) -> Pass 3 (Piecewise delete bytes for
lui/auipc/call/tls_le/alignment)

I would suggest we continue to put the alignment relaxations at the
last relax pass, that would reduce a lot of problems, including losing
too many relax opportunities in pass 1.  Therefore, the order will be,

Pass 1 (Do not delete the bytes immediately) -> Pass 2 (Piecewise
delete bytes for lui/auipc/call/tls_le) -> Pass 3 (Handle alignment)

However, the order doesn't help to increase the chance to relax more,
since we actually delete the bytes in Pass 2, so we won't be doing the
Pass 1 as much as we could, we just do it once.  However, if we want
to delete the bytes in the separate relax pass, then we probably need
the "restart_relax" again, to do the "Pass 1 -> Pass 2" many times.
Please see the following deprecated patch,
https://sourceware.org/pipermail/binutils-cvs/2021-March/056052.html.

Or we probably have another solution without adding the restart_relax
flag.  If my understanding is correct (please feel free to tell me if
I'm wrong), we always delete the bytes and adjust the tables (reloc,
local and global symbol tables) only for "one input section", and
won't cross sections.  That means, we could create another table to
record the deleted bytes rather than marking them as R_RISCV_DELETE,
and then actually delete them at the end of the
_bfd_riscv_relax_section, for each input section.  So that the order
will be,

Pass 1 (record the deleted bytes for lui/auipc/call/tls_le, and then
actually piecewise delete them at the end of _bfd_riscv_relax_section)
-> Pass 1 -> ... -> Pass 1 -> Pass 2 (Handle alignment).

As for the improvement of alignment, we could discuss it in the later
patches.  I would like to accept the piecewise deletion for the
lui/auipc/call/tls_le relaxations first, so let us start by making
some progress here.

BTW, we don't need to update the pcgp_relocs tables in the
_bfd_riscv_relax_delete anymore, since we don't delete bytes
immediately for each relax pattern.

Thanks
Nelson

On Mon, May 2, 2022 at 9:54 PM Patrick O'Neill <patrick@rivosinc.com> wrote:
>
> Rather than deleting bytes immediately, we can first mark the bytes for
> deletion. Then, using piecewise deletion, we reduce the runtime from
> O(n^2) to O(n).
>
> By moving the logic from the riscv_relax_delete_bytes function into the
> RISCV_DELETE reloc handler, we can delete the bytes in a piecewise
> manner.
>
> 2022-04-29 Patrick O'Neill <patrick@rivosinc.com>
>
>         * elfnn-riscv.c: Move deletion logic into DELETE reloc handler
>           and replace with DELETE reloc creation logic.
>
> Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
> ---
> This change overwrites the R_RISCV_RELAX with a R_RISCV_DELETE reloc
> when a relaxation wants to delete bytes. If it was going to delete a
> reloc, we use that reloc rather than deleting it.
> ---
> v2 Changelog:
> - Arrange functions as to reduce the diff of each patch and make
>   reviewing more manageable
> ---
>  bfd/elfnn-riscv.c | 144 ++++++++++++++++++++++++++++------------------
>  1 file changed, 88 insertions(+), 56 deletions(-)
>
> diff --git a/bfd/elfnn-riscv.c b/bfd/elfnn-riscv.c
> index 17f9607744f..10e76e8628d 100644
> --- a/bfd/elfnn-riscv.c
> +++ b/bfd/elfnn-riscv.c
> @@ -4054,15 +4054,28 @@ riscv_update_pcgp_relocs (riscv_pcgp_relocs *p, asection *deleted_sec,
>      }
>  }
>
> -/* Delete some bytes from a section while relaxing.  */
> +typedef bool (*relax_func_t) (bfd *, asection *, asection *,
> +                             struct bfd_link_info *,
> +                             Elf_Internal_Rela *,
> +                             bfd_vma, bfd_vma, bfd_vma, bool *,
> +                             riscv_pcgp_relocs *,
> +                             bool undefined_weak, bfd_vma *);
> +
> +/* Delete the bytes for R_RISCV_DELETE.  */
>
>  static bool
> -riscv_relax_delete_bytes (bfd *abfd,
> -                         asection *sec,
> -                         bfd_vma addr,
> -                         size_t count,
> -                         struct bfd_link_info *link_info,
> -                         riscv_pcgp_relocs *p)
> +_bfd_riscv_relax_delete (bfd *abfd,
> +                        asection *sec,
> +                        asection *sym_sec ATTRIBUTE_UNUSED,
> +                        struct bfd_link_info *link_info,
> +                        Elf_Internal_Rela *rel,
> +                        bfd_vma symval ATTRIBUTE_UNUSED,
> +                        bfd_vma max_alignment ATTRIBUTE_UNUSED,
> +                        bfd_vma reserve_size ATTRIBUTE_UNUSED,
> +                        bool *again ATTRIBUTE_UNUSED,
> +                        riscv_pcgp_relocs *pcgp_relocs,
> +                        bool undefined_weak ATTRIBUTE_UNUSED,
> +                        bfd_vma *delete_total)
>  {
>    unsigned int i, symcount;
>    bfd_vma toaddr = sec->size;
> @@ -4072,9 +4085,42 @@ riscv_relax_delete_bytes (bfd *abfd,
>    struct bfd_elf_section_data *data = elf_section_data (sec);
>    bfd_byte *contents = data->this_hdr.contents;
>
> +  bfd_vma addr = rel->r_offset;
> +  size_t count = rel->r_addend;
> +  riscv_pcgp_relocs *p = pcgp_relocs;
> +
> +  /* Find the next DELETE reloc (if one exists).  */
> +  Elf_Internal_Rela *relocs = elf_section_data (sec)->relocs;
> +  Elf_Internal_Rela *next_delete = NULL;
> +  /* Since we only replace existing relocs and don't add new relocs, the
> +     relocs are in sequential order. We can skip the relocs prior to this one,
> +     making this search linear time.  */
> +  int start = rel - relocs;
> +  for (unsigned int i = start; i < sec->reloc_count; i++)
> +    {
> +      next_delete = relocs + i;
> +      if (ELFNN_R_TYPE (next_delete->r_info) == R_RISCV_DELETE
> +         && next_delete->r_offset > rel->r_offset)
> +       break;
> +      else
> +       next_delete = NULL;
> +    }
> +
> +  size_t bytes_to_move;
> +  /* Make this a piecewise deletion.  */
> +  if (next_delete == NULL)
> +    bytes_to_move = toaddr - addr - count;
> +  else
> +    bytes_to_move = next_delete->r_offset - addr - count;
> +
>    /* Actually delete the bytes.  */
>    sec->size -= count;
> -  memmove (contents + addr, contents + addr + count, toaddr - addr - count);
> +  memmove (contents + addr, contents + addr + count + *delete_total, bytes_to_move);
> +
> +  *delete_total += count;
> +
> +  /* Delete the reloc.  */
> +  rel->r_info = ELFNN_R_INFO (0, R_RISCV_NONE);
>
>    /* Adjust the location of all of the relocs.  Note that we need not
>       adjust the addends, since all PC-relative references must be against
> @@ -4172,12 +4218,20 @@ riscv_relax_delete_bytes (bfd *abfd,
>    return true;
>  }
>
> -typedef bool (*relax_func_t) (bfd *, asection *, asection *,
> -                             struct bfd_link_info *,
> -                             Elf_Internal_Rela *,
> -                             bfd_vma, bfd_vma, bfd_vma, bool *,
> -                             riscv_pcgp_relocs *,
> -                             bool undefined_weak);
> +/* Delete some bytes from a section while relaxing.  */
> +
> +static bool
> +riscv_relax_delete_bytes (bfd_vma addr,
> +                         size_t count,
> +                         Elf_Internal_Rela *rel)
> +{
> +  /* Mark bytes for deletion.  */
> +  rel->r_info = ELFNN_R_INFO (0, R_RISCV_DELETE);
> +  rel->r_offset = addr;
> +  rel->r_addend = count;
> +
> +  return true;
> +}
>
>  /* Relax AUIPC + JALR into JAL.  */
>
> @@ -4190,7 +4244,8 @@ _bfd_riscv_relax_call (bfd *abfd, asection *sec, asection *sym_sec,
>                        bfd_vma reserve_size ATTRIBUTE_UNUSED,
>                        bool *again,
>                        riscv_pcgp_relocs *pcgp_relocs,
> -                      bool undefined_weak ATTRIBUTE_UNUSED)
> +                      bool undefined_weak ATTRIBUTE_UNUSED,
> +                      bfd_vma *delete_total ATTRIBUTE_UNUSED)
>  {
>    bfd_byte *contents = elf_section_data (sec)->this_hdr.contents;
>    bfd_vma foff = symval - (sec_addr (sec) + rel->r_offset);
> @@ -4252,8 +4307,7 @@ _bfd_riscv_relax_call (bfd *abfd, asection *sec, asection *sym_sec,
>
>    /* Delete unnecessary JALR.  */
>    *again = true;
> -  return riscv_relax_delete_bytes (abfd, sec, rel->r_offset + len, 8 - len,
> -                                  link_info, pcgp_relocs);
> +  return riscv_relax_delete_bytes (rel->r_offset + len, 8 - len, rel + 1);
>  }
>
>  /* Traverse all output sections and return the max alignment.  */
> @@ -4286,7 +4340,8 @@ _bfd_riscv_relax_lui (bfd *abfd,
>                       bfd_vma reserve_size,
>                       bool *again,
>                       riscv_pcgp_relocs *pcgp_relocs,
> -                     bool undefined_weak)
> +                     bool undefined_weak,
> +                     bfd_vma *delete_total ATTRIBUTE_UNUSED)
>  {
>    bfd_byte *contents = elf_section_data (sec)->this_hdr.contents;
>    bfd_vma gp = riscv_global_pointer_value (link_info);
> @@ -4343,11 +4398,10 @@ _bfd_riscv_relax_lui (bfd *abfd,
>           return true;
>
>         case R_RISCV_HI20:
> -         /* We can delete the unnecessary LUI and reloc.  */
> +         /* We can delete the unnecessary LUI and reuse the reloc.  */
>           rel->r_info = ELFNN_R_INFO (0, R_RISCV_NONE);
>           *again = true;
> -         return riscv_relax_delete_bytes (abfd, sec, rel->r_offset, 4,
> -                                          link_info, pcgp_relocs);
> +         return riscv_relax_delete_bytes (rel->r_offset, 4, rel);
>
>         default:
>           abort ();
> @@ -4379,8 +4433,7 @@ _bfd_riscv_relax_lui (bfd *abfd,
>        rel->r_info = ELFNN_R_INFO (ELFNN_R_SYM (rel->r_info), R_RISCV_RVC_LUI);
>
>        *again = true;
> -      return riscv_relax_delete_bytes (abfd, sec, rel->r_offset + 2, 2,
> -                                      link_info, pcgp_relocs);
> +      return riscv_relax_delete_bytes (rel->r_offset + 2, 2, rel + 1);
>      }
>
>    return true;
> @@ -4399,7 +4452,8 @@ _bfd_riscv_relax_tls_le (bfd *abfd,
>                          bfd_vma reserve_size ATTRIBUTE_UNUSED,
>                          bool *again,
>                          riscv_pcgp_relocs *pcgp_relocs,
> -                        bool undefined_weak ATTRIBUTE_UNUSED)
> +                        bool undefined_weak ATTRIBUTE_UNUSED,
> +                        bfd_vma *delete_total ATTRIBUTE_UNUSED)
>  {
>    /* See if this symbol is in range of tp.  */
>    if (RISCV_CONST_HIGH_PART (tpoff (link_info, symval)) != 0)
> @@ -4418,11 +4472,10 @@ _bfd_riscv_relax_tls_le (bfd *abfd,
>
>      case R_RISCV_TPREL_HI20:
>      case R_RISCV_TPREL_ADD:
> -      /* We can delete the unnecessary instruction and reloc.  */
> +      /* We can delete the unnecessary instruction and reuse the reloc.  */
>        rel->r_info = ELFNN_R_INFO (0, R_RISCV_NONE);
>        *again = true;
> -      return riscv_relax_delete_bytes (abfd, sec, rel->r_offset, 4, link_info,
> -                                      pcgp_relocs);
> +      return riscv_relax_delete_bytes (rel->r_offset, 4, rel);
>
>      default:
>        abort ();
> @@ -4442,7 +4495,8 @@ _bfd_riscv_relax_align (bfd *abfd, asection *sec,
>                         bfd_vma reserve_size ATTRIBUTE_UNUSED,
>                         bool *again ATTRIBUTE_UNUSED,
>                         riscv_pcgp_relocs *pcgp_relocs ATTRIBUTE_UNUSED,
> -                       bool undefined_weak ATTRIBUTE_UNUSED)
> +                       bool undefined_weak ATTRIBUTE_UNUSED,
> +                       bfd_vma *delete_total ATTRIBUTE_UNUSED)
>  {
>    bfd_byte *contents = elf_section_data (sec)->this_hdr.contents;
>    bfd_vma alignment = 1, pos;
> @@ -4496,10 +4550,8 @@ _bfd_riscv_relax_align (bfd *abfd, asection *sec,
>      bfd_putl16 (RVC_NOP, contents + rel->r_offset + pos);
>
>    /* Mark the excess bytes for deletion.  */
> -  rel->r_info = ELFNN_R_INFO (0, R_RISCV_DELETE);
> -  rel->r_addend = rel->r_addend - nop_bytes;
> -  rel->r_offset = rel->r_offset + nop_bytes;
> -  return true;
> +  return riscv_relax_delete_bytes (rel->r_offset + nop_bytes,
> +                                  rel->r_addend - nop_bytes, rel);
>  }
>
>  /* Relax PC-relative references to GP-relative references.  */
> @@ -4515,7 +4567,8 @@ _bfd_riscv_relax_pc (bfd *abfd ATTRIBUTE_UNUSED,
>                      bfd_vma reserve_size,
>                      bool *again ATTRIBUTE_UNUSED,
>                      riscv_pcgp_relocs *pcgp_relocs,
> -                    bool undefined_weak)
> +                    bool undefined_weak,
> +                    bfd_vma *delete_total ATTRIBUTE_UNUSED)
>  {
>    bfd_byte *contents = elf_section_data (sec)->this_hdr.contents;
>    bfd_vma gp = riscv_global_pointer_value (link_info);
> @@ -4654,28 +4707,6 @@ _bfd_riscv_relax_pc (bfd *abfd ATTRIBUTE_UNUSED,
>    return true;
>  }
>
> -/* Delete the bytes for R_RISCV_DELETE.  */
> -
> -static bool
> -_bfd_riscv_relax_delete (bfd *abfd,
> -                        asection *sec,
> -                        asection *sym_sec ATTRIBUTE_UNUSED,
> -                        struct bfd_link_info *link_info,
> -                        Elf_Internal_Rela *rel,
> -                        bfd_vma symval ATTRIBUTE_UNUSED,
> -                        bfd_vma max_alignment ATTRIBUTE_UNUSED,
> -                        bfd_vma reserve_size ATTRIBUTE_UNUSED,
> -                        bool *again ATTRIBUTE_UNUSED,
> -                        riscv_pcgp_relocs *pcgp_relocs ATTRIBUTE_UNUSED,
> -                        bool undefined_weak ATTRIBUTE_UNUSED)
> -{
> -  if (!riscv_relax_delete_bytes (abfd, sec, rel->r_offset, rel->r_addend,
> -                                link_info, NULL))
> -    return false;
> -  rel->r_info = ELFNN_R_INFO (0, R_RISCV_NONE);
> -  return true;
> -}
> -
>  /* Called by after_allocation to set the information of data segment
>     before relaxing.  */
>
> @@ -4706,6 +4737,7 @@ _bfd_riscv_relax_section (bfd *abfd, asection *sec,
>    unsigned int i;
>    bfd_vma max_alignment, reserve_size = 0;
>    riscv_pcgp_relocs pcgp_relocs;
> +  bfd_vma delete_bytes = 0;
>
>    *again = false;
>
> @@ -4941,7 +4973,7 @@ _bfd_riscv_relax_section (bfd *abfd, asection *sec,
>
>        if (!relax_func (abfd, sec, sym_sec, info, rel, symval,
>                        max_alignment, reserve_size, again,
> -                      &pcgp_relocs, undefined_weak))
> +                      &pcgp_relocs, undefined_weak, &delete_bytes))
>         goto fail;
>      }
>
> --
> 2.25.1
>

  reply	other threads:[~2022-05-20 10:49 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-12 16:25 [PATCH 0/4] RISCV: Improve linker time complexity Patrick O'Neill
2022-04-12 16:25 ` [PATCH 1/4] RISCV: Add linker relaxation tests Patrick O'Neill
2022-04-12 16:25 ` [PATCH 2/4] RISCV: Arrange DELETE pass after .align pass Patrick O'Neill
2022-04-12 16:26 ` [PATCH 3/4] RISCV: Implement piecewise deletion Patrick O'Neill
2022-04-12 16:26 ` [PATCH 4/4] RISCV: Improve runtime of align directives Patrick O'Neill
2022-04-13  0:58 ` [PATCH 0/4] RISCV: Improve linker time complexity Kito Cheng
2022-04-13  2:23   ` Palmer Dabbelt
2022-04-13  5:12   ` Alan Modra
2022-04-13 18:11     ` Palmer Dabbelt
2022-04-25 17:26       ` Patrick O'Neill
2022-05-02 13:50 ` [PATCH v2 0/5] " Patrick O'Neill
2022-05-02 13:50   ` [PATCH v2 1/5] RISCV: Add linker relaxation tests Patrick O'Neill
2022-05-02 13:50   ` [PATCH v2 2/5] RISCV: Arrange DELETE pass after .align pass Patrick O'Neill
2022-05-02 13:50   ` [PATCH v2 3/5] RISCV: Implement piecewise deletion Patrick O'Neill
2022-05-20 10:48     ` Nelson Chu [this message]
2022-05-20 17:36       ` Patrick O'Neill
2022-05-02 13:50   ` [PATCH v2 4/5] RISCV: Improve runtime of align directives Patrick O'Neill
2022-05-02 13:50   ` [PATCH v2 5/5] RISCV: Add --defer-deletion flag Patrick O'Neill
2022-05-27 21:20   ` [PATCH v3 0/3] RISCV: Improve linker time complexity Patrick O'Neill
2022-05-27 21:20     ` [PATCH v3 1/3] RISCV: Add linker relaxation tests Patrick O'Neill
2022-05-27 21:20     ` [PATCH v3 2/3] RISCV: Implement piecewise deletion Patrick O'Neill
2022-05-27 21:20     ` [PATCH v3 3/3] RISCV: Add --defer-deletion flag Patrick O'Neill

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJYME4Gmp_VUpwX6qdkZ3n4u9gmODxPJjHk0tBsoENpF_P4kUQ@mail.gmail.com \
    --to=nelson.chu@sifive.com \
    --cc=andrew@sifive.com \
    --cc=binutils@sourceware.org \
    --cc=gnu-toolchain@rivosinc.com \
    --cc=jim.wilson.gcc@gmail.com \
    --cc=kito.cheng@gmail.com \
    --cc=palmer@rivosinc.com \
    --cc=patrick@rivosinc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).