From: Nelson Chu
Date: Fri, 20 May 2022 18:48:48 +0800
Subject: Re: [PATCH v2 3/5] RISCV: Implement piecewise deletion
To: "Patrick O'Neill"
Cc: Binutils, gnu-toolchain@rivosinc.com, Jim Wilson, Palmer Dabbelt, Andrew Waterman, Kito Cheng
In-Reply-To: <20220502135048.1392596-4-patrick@rivosinc.com>
References: <20220412162601.146507-1-patrick@rivosinc.com> <20220502135048.1392596-1-patrick@rivosinc.com> <20220502135048.1392596-4-patrick@rivosinc.com>

Hi Patrick,

Thanks for giving us this great idea.  I would suggest that we not consider the alignment issues for now, and instead focus on reducing the time spent on the normal relaxations, which include the lui, auipc, call and tls_le relaxations.
The current relax behavior is as follows:

Pass 1: For the lui, call and tls_le relaxations, delete the redundant bytes immediately.  For auipc, build the pcgp_relocs table to chain each %pcrel_lo insn to its corresponding %pcrel_hi insn, and defer the actual byte deletion to a later relax pass.
Pass 2: Delete the bytes for each R_RISCV_DELETE marked by the auipc relaxation.
Pass 3: Handle alignment; delete the bytes immediately for each R_RISCV_ALIGN.

So the current relax order is

  Pass 1 -> Pass 1 -> ... -> Pass 1 -> Pass 2 -> Pass 3.

Since the bytes for the lui/call/tls_le relaxations are deleted immediately, Pass 1 is repeated until no more bytes are deleted.  There is a problem here: the auipc bytes are not deleted until Pass 2, so we need to update the pcgp_relocs table every time.  See PR28410 for details, https://sourceware.org/bugzilla/show_bug.cgi?id=28410.

With your patches, the relax order becomes

  Pass 1 (do not delete the bytes immediately) -> Pass 2 (handle alignment) -> Pass 3 (piecewise delete the bytes for lui/auipc/call/tls_le/alignment).

I would suggest we continue to put the alignment relaxation in the last relax pass; that avoids a lot of problems, including losing too many relax opportunities in Pass 1.  The order would then be

  Pass 1 (do not delete the bytes immediately) -> Pass 2 (piecewise delete the bytes for lui/auipc/call/tls_le) -> Pass 3 (handle alignment).

However, this order does not increase the chance to relax more: since the bytes are only actually deleted in Pass 2, Pass 1 is not repeated as often as it could be; it effectively runs only once.  If we want to delete the bytes in a separate relax pass, we probably need the "restart_relax" mechanism again, to run "Pass 1 -> Pass 2" many times.  Please see the following deprecated patch, https://sourceware.org/pipermail/binutils-cvs/2021-March/056052.html.

Alternatively, there may be a solution that does not need the restart_relax flag.  If my understanding is correct (please feel free to tell me if I'm wrong), we only ever delete bytes and adjust the tables (reloc table, local and global symbol tables) within one input section, never across sections.  That means we could record the deleted bytes in another table, rather than marking them as R_RISCV_DELETE, and then actually delete them at the end of _bfd_riscv_relax_section for each input section.  The order would then be

  Pass 1 (record the deleted bytes for lui/auipc/call/tls_le, then piecewise delete them at the end of _bfd_riscv_relax_section) -> Pass 1 -> ... -> Pass 1 -> Pass 2 (handle alignment).

As for improving the alignment handling, we can discuss that in the later patches.  I would like to accept the piecewise deletion for the lui/auipc/call/tls_le relaxations first, so let us start by making some progress there.  BTW, we no longer need to update the pcgp_relocs table in _bfd_riscv_relax_delete, since we no longer delete bytes immediately for each relax pattern.

Thanks
Nelson

On Mon, May 2, 2022 at 9:54 PM Patrick O'Neill wrote:
>
> Rather than deleting bytes immediately, we can first mark the bytes for
> deletion. Then, using piecewise deletion, we reduce the runtime from
> O(n^2) to O(n).
>
> By moving the logic from the riscv_relax_delete_bytes function into the
> RISCV_DELETE reloc handler, we can delete the bytes in a piecewise
> manner.
>
> 2022-04-29 Patrick O'Neill
>
>         * elfnn-riscv.c: Move deletion logic into DELETE reloc handler
>         and replace with DELETE reloc creation logic.
>
> Signed-off-by: Patrick O'Neill
> ---
> This change overwrites the R_RISCV_RELAX with a R_RISCV_DELETE reloc
> when a relaxation wants to delete bytes. If it was going to delete a
> reloc, we use that reloc rather than deleting it.
> ---
> v2 Changelog:
> - Arrange functions as to reduce the diff of each patch and make
>   reviewing more manageable
> ---
>  bfd/elfnn-riscv.c | 144 ++++++++++++++++++++++++++++------------------
>  1 file changed, 88 insertions(+), 56 deletions(-)
>
> diff --git a/bfd/elfnn-riscv.c b/bfd/elfnn-riscv.c
> index 17f9607744f..10e76e8628d 100644
> --- a/bfd/elfnn-riscv.c
> +++ b/bfd/elfnn-riscv.c
> @@ -4054,15 +4054,28 @@ riscv_update_pcgp_relocs (riscv_pcgp_relocs *p, asection *deleted_sec,
>      }
>  }
>
> -/* Delete some bytes from a section while relaxing. */
> +typedef bool (*relax_func_t) (bfd *, asection *, asection *,
> +                              struct bfd_link_info *,
> +                              Elf_Internal_Rela *,
> +                              bfd_vma, bfd_vma, bfd_vma, bool *,
> +                              riscv_pcgp_relocs *,
> +                              bool undefined_weak, bfd_vma *);
> +
> +/* Delete the bytes for R_RISCV_DELETE. */
>
>  static bool
> -riscv_relax_delete_bytes (bfd *abfd,
> -                          asection *sec,
> -                          bfd_vma addr,
> -                          size_t count,
> -                          struct bfd_link_info *link_info,
> -                          riscv_pcgp_relocs *p)
> +_bfd_riscv_relax_delete (bfd *abfd,
> +                         asection *sec,
> +                         asection *sym_sec ATTRIBUTE_UNUSED,
> +                         struct bfd_link_info *link_info,
> +                         Elf_Internal_Rela *rel,
> +                         bfd_vma symval ATTRIBUTE_UNUSED,
> +                         bfd_vma max_alignment ATTRIBUTE_UNUSED,
> +                         bfd_vma reserve_size ATTRIBUTE_UNUSED,
> +                         bool *again ATTRIBUTE_UNUSED,
> +                         riscv_pcgp_relocs *pcgp_relocs,
> +                         bool undefined_weak ATTRIBUTE_UNUSED,
> +                         bfd_vma *delete_total)
>  {
>    unsigned int i, symcount;
>    bfd_vma toaddr = sec->size;
> @@ -4072,9 +4085,42 @@ riscv_relax_delete_bytes (bfd *abfd,
>    struct bfd_elf_section_data *data = elf_section_data (sec);
>    bfd_byte *contents = data->this_hdr.contents;
>
> +  bfd_vma addr = rel->r_offset;
> +  size_t count = rel->r_addend;
> +  riscv_pcgp_relocs *p = pcgp_relocs;
> +
> +  /* Find the next DELETE reloc (if one exists). */
> +  Elf_Internal_Rela *relocs = elf_section_data (sec)->relocs;
> +  Elf_Internal_Rela *next_delete = NULL;
> +  /* Since we only replace existing relocs and don't add new relocs, the
> +     relocs are in sequential order. We can skip the relocs prior to this one,
> +     making this search linear time. */
> +  int start = rel - relocs;
> +  for (unsigned int i = start; i < sec->reloc_count; i++)
> +    {
> +      next_delete = relocs + i;
> +      if (ELFNN_R_TYPE (next_delete->r_info) == R_RISCV_DELETE
> +          && next_delete->r_offset > rel->r_offset)
> +        break;
> +      else
> +        next_delete = NULL;
> +    }
> +
> +  size_t bytes_to_move;
> +  /* Make this a piecewise deletion. */
> +  if (next_delete == NULL)
> +    bytes_to_move = toaddr - addr - count;
> +  else
> +    bytes_to_move = next_delete->r_offset - addr - count;
> +
>    /* Actually delete the bytes. */
>    sec->size -= count;
> -  memmove (contents + addr, contents + addr + count, toaddr - addr - count);
> +  memmove (contents + addr, contents + addr + count + *delete_total, bytes_to_move);
> +
> +  *delete_total += count;
> +
> +  /* Delete the reloc. */
> +  rel->r_info = ELFNN_R_INFO (0, R_RISCV_NONE);
>
>    /* Adjust the location of all of the relocs.  Note that we need not
>       adjust the addends, since all PC-relative references must be against
> @@ -4172,12 +4218,20 @@ riscv_relax_delete_bytes (bfd *abfd,
>    return true;
>  }
>
> -typedef bool (*relax_func_t) (bfd *, asection *, asection *,
> -                              struct bfd_link_info *,
> -                              Elf_Internal_Rela *,
> -                              bfd_vma, bfd_vma, bfd_vma, bool *,
> -                              riscv_pcgp_relocs *,
> -                              bool undefined_weak);
> +/* Delete some bytes from a section while relaxing. */
> +
> +static bool
> +riscv_relax_delete_bytes (bfd_vma addr,
> +                          size_t count,
> +                          Elf_Internal_Rela *rel)
> +{
> +  /* Mark bytes for deletion. */
> +  rel->r_info = ELFNN_R_INFO (0, R_RISCV_DELETE);
> +  rel->r_offset = addr;
> +  rel->r_addend = count;
> +
> +  return true;
> +}
>
>  /* Relax AUIPC + JALR into JAL. */
>
> @@ -4190,7 +4244,8 @@ _bfd_riscv_relax_call (bfd *abfd, asection *sec, asection *sym_sec,
>                         bfd_vma reserve_size ATTRIBUTE_UNUSED,
>                         bool *again,
>                         riscv_pcgp_relocs *pcgp_relocs,
> -                       bool undefined_weak ATTRIBUTE_UNUSED)
> +                       bool undefined_weak ATTRIBUTE_UNUSED,
> +                       bfd_vma *delete_total ATTRIBUTE_UNUSED)
>  {
>    bfd_byte *contents = elf_section_data (sec)->this_hdr.contents;
>    bfd_vma foff = symval - (sec_addr (sec) + rel->r_offset);
> @@ -4252,8 +4307,7 @@ _bfd_riscv_relax_call (bfd *abfd, asection *sec, asection *sym_sec,
>
>    /* Delete unnecessary JALR. */
>    *again = true;
> -  return riscv_relax_delete_bytes (abfd, sec, rel->r_offset + len, 8 - len,
> -                                   link_info, pcgp_relocs);
> +  return riscv_relax_delete_bytes (rel->r_offset + len, 8 - len, rel + 1);
>  }
>
>  /* Traverse all output sections and return the max alignment. */
> @@ -4286,7 +4340,8 @@ _bfd_riscv_relax_lui (bfd *abfd,
>                        bfd_vma reserve_size,
>                        bool *again,
>                        riscv_pcgp_relocs *pcgp_relocs,
> -                      bool undefined_weak)
> +                      bool undefined_weak,
> +                      bfd_vma *delete_total ATTRIBUTE_UNUSED)
>  {
>    bfd_byte *contents = elf_section_data (sec)->this_hdr.contents;
>    bfd_vma gp = riscv_global_pointer_value (link_info);
> @@ -4343,11 +4398,10 @@ _bfd_riscv_relax_lui (bfd *abfd,
>        return true;
>
>      case R_RISCV_HI20:
> -      /* We can delete the unnecessary LUI and reloc. */
> +      /* We can delete the unnecessary LUI and reuse the reloc. */
>        rel->r_info = ELFNN_R_INFO (0, R_RISCV_NONE);
>        *again = true;
> -      return riscv_relax_delete_bytes (abfd, sec, rel->r_offset, 4,
> -                                       link_info, pcgp_relocs);
> +      return riscv_relax_delete_bytes (rel->r_offset, 4, rel);
>
>      default:
>        abort ();
> @@ -4379,8 +4433,7 @@ _bfd_riscv_relax_lui (bfd *abfd,
>        rel->r_info = ELFNN_R_INFO (ELFNN_R_SYM (rel->r_info), R_RISCV_RVC_LUI);
>
>        *again = true;
> -      return riscv_relax_delete_bytes (abfd, sec, rel->r_offset + 2, 2,
> -                                       link_info, pcgp_relocs);
> +      return riscv_relax_delete_bytes (rel->r_offset + 2, 2, rel + 1);
>      }
>
>    return true;
> @@ -4399,7 +4452,8 @@ _bfd_riscv_relax_tls_le (bfd *abfd,
>                          bfd_vma reserve_size ATTRIBUTE_UNUSED,
>                          bool *again,
>                          riscv_pcgp_relocs *pcgp_relocs,
> -                        bool undefined_weak ATTRIBUTE_UNUSED)
> +                        bool undefined_weak ATTRIBUTE_UNUSED,
> +                        bfd_vma *delete_total ATTRIBUTE_UNUSED)
>  {
>    /* See if this symbol is in range of tp. */
>    if (RISCV_CONST_HIGH_PART (tpoff (link_info, symval)) != 0)
> @@ -4418,11 +4472,10 @@ _bfd_riscv_relax_tls_le (bfd *abfd,
>
>      case R_RISCV_TPREL_HI20:
>      case R_RISCV_TPREL_ADD:
> -      /* We can delete the unnecessary instruction and reloc. */
> +      /* We can delete the unnecessary instruction and reuse the reloc. */
>        rel->r_info = ELFNN_R_INFO (0, R_RISCV_NONE);
>        *again = true;
> -      return riscv_relax_delete_bytes (abfd, sec, rel->r_offset, 4, link_info,
> -                                       pcgp_relocs);
> +      return riscv_relax_delete_bytes (rel->r_offset, 4, rel);
>
>      default:
>        abort ();
> @@ -4442,7 +4495,8 @@ _bfd_riscv_relax_align (bfd *abfd, asection *sec,
>                         bfd_vma reserve_size ATTRIBUTE_UNUSED,
>                         bool *again ATTRIBUTE_UNUSED,
>                         riscv_pcgp_relocs *pcgp_relocs ATTRIBUTE_UNUSED,
> -                       bool undefined_weak ATTRIBUTE_UNUSED)
> +                       bool undefined_weak ATTRIBUTE_UNUSED,
> +                       bfd_vma *delete_total ATTRIBUTE_UNUSED)
>  {
>    bfd_byte *contents = elf_section_data (sec)->this_hdr.contents;
>    bfd_vma alignment = 1, pos;
> @@ -4496,10 +4550,8 @@ _bfd_riscv_relax_align (bfd *abfd, asection *sec,
>      bfd_putl16 (RVC_NOP, contents + rel->r_offset + pos);
>
>    /* Mark the excess bytes for deletion. */
> -  rel->r_info = ELFNN_R_INFO (0, R_RISCV_DELETE);
> -  rel->r_addend = rel->r_addend - nop_bytes;
> -  rel->r_offset = rel->r_offset + nop_bytes;
> -  return true;
> +  return riscv_relax_delete_bytes (rel->r_offset + nop_bytes,
> +                                   rel->r_addend - nop_bytes, rel);
>  }
>
>  /* Relax PC-relative references to GP-relative references. */
> @@ -4515,7 +4567,8 @@ _bfd_riscv_relax_pc (bfd *abfd ATTRIBUTE_UNUSED,
>                       bfd_vma reserve_size,
>                       bool *again ATTRIBUTE_UNUSED,
>                       riscv_pcgp_relocs *pcgp_relocs,
> -                     bool undefined_weak)
> +                     bool undefined_weak,
> +                     bfd_vma *delete_total ATTRIBUTE_UNUSED)
>  {
>    bfd_byte *contents = elf_section_data (sec)->this_hdr.contents;
>    bfd_vma gp = riscv_global_pointer_value (link_info);
> @@ -4654,28 +4707,6 @@ _bfd_riscv_relax_pc (bfd *abfd ATTRIBUTE_UNUSED,
>    return true;
>  }
>
> -/* Delete the bytes for R_RISCV_DELETE. */
> -
> -static bool
> -_bfd_riscv_relax_delete (bfd *abfd,
> -                         asection *sec,
> -                         asection *sym_sec ATTRIBUTE_UNUSED,
> -                         struct bfd_link_info *link_info,
> -                         Elf_Internal_Rela *rel,
> -                         bfd_vma symval ATTRIBUTE_UNUSED,
> -                         bfd_vma max_alignment ATTRIBUTE_UNUSED,
> -                         bfd_vma reserve_size ATTRIBUTE_UNUSED,
> -                         bool *again ATTRIBUTE_UNUSED,
> -                         riscv_pcgp_relocs *pcgp_relocs ATTRIBUTE_UNUSED,
> -                         bool undefined_weak ATTRIBUTE_UNUSED)
> -{
> -  if (!riscv_relax_delete_bytes (abfd, sec, rel->r_offset, rel->r_addend,
> -                                 link_info, NULL))
> -    return false;
> -  rel->r_info = ELFNN_R_INFO (0, R_RISCV_NONE);
> -  return true;
> -}
> -
>  /* Called by after_allocation to set the information of data segment
>     before relaxing. */
>
> @@ -4706,6 +4737,7 @@ _bfd_riscv_relax_section (bfd *abfd, asection *sec,
>    unsigned int i;
>    bfd_vma max_alignment, reserve_size = 0;
>    riscv_pcgp_relocs pcgp_relocs;
> +  bfd_vma delete_bytes = 0;
>
>    *again = false;
>
> @@ -4941,7 +4973,7 @@ _bfd_riscv_relax_section (bfd *abfd, asection *sec,
>
>        if (!relax_func (abfd, sec, sym_sec, info, rel, symval,
>                         max_alignment, reserve_size, again,
> -                       &pcgp_relocs, undefined_weak))
> +                       &pcgp_relocs, undefined_weak, &delete_bytes))
>          goto fail;
>      }
>
> --
> 2.25.1
>
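To make the record-then-delete idea in the reply above concrete, here is a minimal, self-contained C sketch of what a per-section deletion table could look like. This is only an illustration under stated assumptions: the names (struct pending_delete, record_delete, apply_deletes) are invented and are not part of binutils, and a real implementation inside _bfd_riscv_relax_section would also have to adjust the reloc entries and the local/global symbol tables, as riscv_relax_delete_bytes does today.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not binutils code: one pending deletion of COUNT
   bytes starting at OFFSET within a single input section.  */
struct pending_delete
{
  size_t offset;
  size_t count;
};

struct delete_list
{
  struct pending_delete *items;
  size_t len, cap;
};

/* Record a deletion instead of memmove'ing immediately; each relax
   pattern (lui/auipc/call/tls_le) would call this, and it is O(1).
   Entries are assumed to be recorded in increasing offset order, which
   holds when the relocs of a section are walked front to back.  */
static void
record_delete (struct delete_list *dl, size_t offset, size_t count)
{
  if (dl->len == dl->cap)
    {
      dl->cap = dl->cap ? dl->cap * 2 : 16;
      dl->items = realloc (dl->items, dl->cap * sizeof *dl->items);
      if (dl->items == NULL)
        abort ();
    }
  dl->items[dl->len].offset = offset;
  dl->items[dl->len].count = count;
  dl->len++;
}

/* Apply all recorded deletions at the end of the per-section relax pass
   with one left-to-right sweep: every kept byte is moved at most once,
   so this is O(section size) rather than O(size * deletions).  Returns
   the new section size.  */
static size_t
apply_deletes (unsigned char *contents, size_t size,
               const struct delete_list *dl)
{
  size_t dst = 0, src = 0;

  for (size_t i = 0; i < dl->len; i++)
    {
      /* Copy the bytes kept before this hole, then skip the hole.  */
      size_t keep = dl->items[i].offset - src;
      memmove (contents + dst, contents + src, keep);
      dst += keep;
      src = dl->items[i].offset + dl->items[i].count;
    }

  /* Copy the tail after the last hole.  */
  memmove (contents + dst, contents + src, size - src);
  return dst + (size - src);
}

int
main (void)
{
  /* Toy example: delete 2 bytes at offset 2 and 3 bytes at offset 6.  */
  unsigned char buf[] = "AABBCCDDDEE";
  struct delete_list dl = { NULL, 0, 0 };

  record_delete (&dl, 2, 2);
  record_delete (&dl, 6, 3);
  size_t new_size = apply_deletes (buf, sizeof buf - 1, &dl);

  printf ("%.*s\n", (int) new_size, buf);   /* prints "AACCEE" */
  free (dl.items);
  return 0;
}
```

Either representation (a side table as above, or the R_RISCV_DELETE relocs used in the posted patch) can give the piecewise behaviour; the point of the sketch is only that deletions are batched per input section and applied in one linear sweep at the end of the section's relax pass.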