From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pj1-x102b.google.com (mail-pj1-x102b.google.com [IPv6:2607:f8b0:4864:20::102b]) by sourceware.org (Postfix) with ESMTPS id 2BF56382D502 for ; Fri, 20 May 2022 17:36:05 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 2BF56382D502 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivosinc.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivosinc.com Received: by mail-pj1-x102b.google.com with SMTP id l20-20020a17090a409400b001dd2a9d555bso8464062pjg.0 for ; Fri, 20 May 2022 10:36:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20210112.gappssmtp.com; s=20210112; h=message-id:date:mime-version:user-agent:subject:content-language:to :cc:references:from:in-reply-to; bh=FLgNMGDVWIp7knUQr8+YIhN1Eb/jErWZgq+Mc+9go2Y=; b=E1bJUtHjmSGVI/1xpshwU1OjGLdzMopVJbx+jhtf6eIzdz/94+QqtAfUSX3w3dr3hk JkRNOReegLe8dkQ2ZotnbCZwhHjTUfiRlKRtUYFkeuDWwL0RfUAJzf1XxOi+ZhP1W8BI O1RRTPviUJkYu2g1IAs+Dlf0ZD7tDyEOyUigKRu4zykMWl3bOhWauqz4l6cgijqTaACB 74Lohip0N9nbvFNGu53tgALGNfVv3BSGtRkstB6iaw5KVS8e0vFcmBbAaFMWtw7+xaNS dvM5735wZglkLFrFUXQW/Kx2creEo0PRIm7Uybp3CHlmGUO913lSPEXf+KNO7fafb+WT tnoQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:to:cc:references:from:in-reply-to; bh=FLgNMGDVWIp7knUQr8+YIhN1Eb/jErWZgq+Mc+9go2Y=; b=8Pt2wnegg8SlyeG+12GBkgTbA98bdeGakr5JREGhvLeKo3wS3tUFxg53N51E07kxRf bWQ01ckglGHgutklm0KlftZCst3X+g7cIjhVEouTmLuYpGDhMqCMiHQCKDh1rMPZrKeS BQqFYqOr/qlgqwisSckF/4ftPw2RjuX1nA8bvfRT9ARuS6Sjjz0hn+Ux/YXSD2a16kgy aqUPOXNg1fYH2NZcFSS8VYj6VMJ/4MhNSpXPgrKoQCJ2iIc1D8a5D6EV7K/bVEkbIsyn dcL72/fVpQSiS8szNpsBcylxzuS1satrWrbWA6tAMa0hW7R+AR1XzyX0P5aGs7QhLST4 KRCQ== X-Gm-Message-State: AOAM530N2cZKbjCAthqUleoyQxsjI6inGmHKuY7XZA5RYqHWynFDEiJU /0WENWEUUoLfCyCWRBaJ2di2YQ== X-Google-Smtp-Source: ABdhPJzXCVIfaSFT+Zbz8B9BjneQDQmieCRN/LCqn/NpRNDsvmq3Au0X7Q9dIbHeeL2F0nNCNMrJiA== X-Received: by 2002:a17:903:110f:b0:15e:7d64:bdad with SMTP id n15-20020a170903110f00b0015e7d64bdadmr11237214plh.59.1653068164125; Fri, 20 May 2022 10:36:04 -0700 (PDT) Received: from [10.0.17.150] ([12.3.194.138]) by smtp.gmail.com with ESMTPSA id l13-20020a6542cd000000b003c619f3d086sm42542pgp.2.2022.05.20.10.36.02 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 20 May 2022 10:36:03 -0700 (PDT) Message-ID: Date: Fri, 20 May 2022 10:36:01 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.1 Subject: Re: [PATCH v2 3/5] RISCV: Implement piecewise deletion Content-Language: en-US To: Nelson Chu Cc: Binutils , gnu-toolchain@rivosinc.com, Jim Wilson , Palmer Dabbelt , Andrew Waterman , Kito Cheng References: <20220412162601.146507-1-patrick@rivosinc.com> <20220502135048.1392596-1-patrick@rivosinc.com> <20220502135048.1392596-4-patrick@rivosinc.com> From: Patrick O'Neill In-Reply-To: X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, GIT_PATCH_0, HTML_MESSAGE, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: binutils@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Binutils mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 May 2022 17:36:09 -0000 On 5/20/22 03:48, Nelson Chu wrote: > Hi Patrick, > > Thanks for giving us this great idea. I would suggest that we don't > consider the alignment issues for now, and focus on how to reduce the > time of the normal relaxations, which includes lui, auipc, call and > tls le relaxations. The current relax behavior are listed as follows, > > Pass 1: For lui, call and tls le relaxations, delete the redundant > bytes immediately; For auipc, we build the table (pcgp_relocs) to > chain the %pcrel_lo insn to the corresponding %pcrel_hi insn, and then > actually delete the bytes until next relax pass. > > Pass 2: Delete the bytes for each R_RISCV_DELETE which are marked for > auipc relaxations. > > Pass 3: Handle alignment, delete the bytes immediately for each R_RISCV_ALIGN. > > So the current relax order should be, > Pass 1 -> Pass 1 -> ... -> Pass 1 -> Pass 2 -> Pass 3. Since we > delete the bytes for lui/call/tls_le relaxations immediately, the Pass > 1 will be done multiple times until there are no any bytes deleted. > There is a problem here - the auipc will not be deleted until Pass 2, > so we will need to update the pcgp_relocs table every time. You can > see the PR28410 for details, > https://sourceware.org/bugzilla/show_bug.cgi?id=28410. > > According to your patches, the relax order will be updated to, > Pass 1 (Do not delete the bytes immediately) -> Pass 2 (Handle > alignment) -> Pass 3 (Piecewise delete bytes for > lui/auipc/call/tls_le/alignment) > > I would suggest we continue to put the alignment relaxations at the > last relax pass, that would reduce a lot of problems, including losing > too many relax opportunities in pass 1. Therefore, the order will be, > > Pass 1 (Do not delete the bytes immediately) -> Pass 2 (Piecewise > delete bytes for lui/auipc/call/tls_le) -> Pass 3 (Handle alignment) Hi Nelson, Thanks for reviewing the patch! I had misunderstood the flow of the passes. I had assumed that the flow was: Pass 1 (again=true) -> Pass 2 (skipped when again is true) -> Pass 3 -> Pass 1 (again=false) -> Pass 2 (performed) -> Pass 3 Basically trying the whole sequence until again=false. When I put a print statement to display the pass that is being performed I get this output (isolating print statements from a single section id): (relax-call-2.s from Patch 1/5) Pass 0 Pass 0 Pass 1 Pass 2 Pass 0 Pass 1 Pass 2 The second pass through was what had tripped me up. Thanks for the clarification. > However, the order doesn't help to increase the chance to relax more, > since we actually delete the bytes in Pass 2, so we won't be doing the > Pass 1 as much as we could, we just do it once. However, if we want > to delete the bytes in the separate relax pass, then we probably need > the "restart_relax" again, to do the "Pass 1 -> Pass 2" many times. > Please see the following deprecated patch, > https://sourceware.org/pipermail/binutils-cvs/2021-March/056052.html. > > Or we probably have another solution without adding the restart_relax > flag. If my understanding is correct (please feel free to tell me if > I'm wrong), we always delete the bytes and adjust the tables (reloc, > local and global symbol tables) only for "one input section", and > won't cross sections. That means, we could create another table to > record the deleted bytes rather than marking them as R_RISCV_DELETE, > and then actually delete them at the end of the > _bfd_riscv_relax_section, for each input section. So that the order > will be, > > Pass 1 (record the deleted bytes for lui/auipc/call/tls_le, and then > actually piecewise delete them at the end of _bfd_riscv_relax_section) > -> Pass 1 -> ... -> Pass 1 -> Pass 2 (Handle alignment). I agree that that would be better. I'll try implementing this - it has the added benefit of being reusable for the alignment directives in future patches. It's fundamentally the same issue - deleting bytes piecewise per pass rather than immediately. > As for the improvement of alignment, we could discuss it in the later > patches. I would like to accept the piecewise deletion for the > lui/auipc/call/tls_le relaxations first, so let us start by making > some progress here. Sounds good. > BTW, we don't need to update the pcgp_relocs tables in the > _bfd_riscv_relax_delete anymore, since we don't delete bytes > immediately for each relax pattern. Thanks, I'll remove that for v3. Thank you again for looking at this, linkers are complicated and I'm still learning ;) Thanks, Patrick > > Thanks > Nelson > > On Mon, May 2, 2022 at 9:54 PM Patrick O'Neill wrote: >> Rather than deleting bytes immediately, we can first mark the bytes for >> deletion. Then, using piecewise deletion, we reduce the runtime from >> O(n^2) to O(n). >> >> By moving the logic from the riscv_relax_delete_bytes function into the >> RISCV_DELETE reloc handler, we can delete the bytes in a piecewise >> manner. >> >> 2022-04-29 Patrick O'Neill >> >> * elfnn-riscv.c: Move deletion logic into DELETE reloc handler >> and replace with DELETE reloc creation logic. >> >> Signed-off-by: Patrick O'Neill >> --- >> This change overwrites the R_RISCV_RELAX with a R_RISCV_DELETE reloc >> when a relaxation wants to delete bytes. If it was going to delete a >> reloc, we use that reloc rather than deleting it. >> --- >> v2 Changelog: >> - Arrange functions as to reduce the diff of each patch and make >> reviewing more manageable >> --- >> bfd/elfnn-riscv.c | 144 ++++++++++++++++++++++++++++------------------ >> 1 file changed, 88 insertions(+), 56 deletions(-) >> >> diff --git a/bfd/elfnn-riscv.c b/bfd/elfnn-riscv.c >> index 17f9607744f..10e76e8628d 100644 >> --- a/bfd/elfnn-riscv.c >> +++ b/bfd/elfnn-riscv.c >> @@ -4054,15 +4054,28 @@ riscv_update_pcgp_relocs (riscv_pcgp_relocs *p, asection *deleted_sec, >> } >> } >> >> -/* Delete some bytes from a section while relaxing. */ >> +typedef bool (*relax_func_t) (bfd *, asection *, asection *, >> + struct bfd_link_info *, >> + Elf_Internal_Rela *, >> + bfd_vma, bfd_vma, bfd_vma, bool *, >> + riscv_pcgp_relocs *, >> + bool undefined_weak, bfd_vma *); >> + >> +/* Delete the bytes for R_RISCV_DELETE. */ >> >> static bool >> -riscv_relax_delete_bytes (bfd *abfd, >> - asection *sec, >> - bfd_vma addr, >> - size_t count, >> - struct bfd_link_info *link_info, >> - riscv_pcgp_relocs *p) >> +_bfd_riscv_relax_delete (bfd *abfd, >> + asection *sec, >> + asection *sym_sec ATTRIBUTE_UNUSED, >> + struct bfd_link_info *link_info, >> + Elf_Internal_Rela *rel, >> + bfd_vma symval ATTRIBUTE_UNUSED, >> + bfd_vma max_alignment ATTRIBUTE_UNUSED, >> + bfd_vma reserve_size ATTRIBUTE_UNUSED, >> + bool *again ATTRIBUTE_UNUSED, >> + riscv_pcgp_relocs *pcgp_relocs, >> + bool undefined_weak ATTRIBUTE_UNUSED, >> + bfd_vma *delete_total) >> { >> unsigned int i, symcount; >> bfd_vma toaddr = sec->size; >> @@ -4072,9 +4085,42 @@ riscv_relax_delete_bytes (bfd *abfd, >> struct bfd_elf_section_data *data = elf_section_data (sec); >> bfd_byte *contents = data->this_hdr.contents; >> >> + bfd_vma addr = rel->r_offset; >> + size_t count = rel->r_addend; >> + riscv_pcgp_relocs *p = pcgp_relocs; >> + >> + /* Find the next DELETE reloc (if one exists). */ >> + Elf_Internal_Rela *relocs = elf_section_data (sec)->relocs; >> + Elf_Internal_Rela *next_delete = NULL; >> + /* Since we only replace existing relocs and don't add new relocs, the >> + relocs are in sequential order. We can skip the relocs prior to this one, >> + making this search linear time. */ >> + int start = rel - relocs; >> + for (unsigned int i = start; i < sec->reloc_count; i++) >> + { >> + next_delete = relocs + i; >> + if (ELFNN_R_TYPE (next_delete->r_info) == R_RISCV_DELETE >> + && next_delete->r_offset > rel->r_offset) >> + break; >> + else >> + next_delete = NULL; >> + } >> + >> + size_t bytes_to_move; >> + /* Make this a piecewise deletion. */ >> + if (next_delete == NULL) >> + bytes_to_move = toaddr - addr - count; >> + else >> + bytes_to_move = next_delete->r_offset - addr - count; >> + >> /* Actually delete the bytes. */ >> sec->size -= count; >> - memmove (contents + addr, contents + addr + count, toaddr - addr - count); >> + memmove (contents + addr, contents + addr + count + *delete_total, bytes_to_move); >> + >> + *delete_total += count; >> + >> + /* Delete the reloc. */ >> + rel->r_info = ELFNN_R_INFO (0, R_RISCV_NONE); >> >> /* Adjust the location of all of the relocs. Note that we need not >> adjust the addends, since all PC-relative references must be against >> @@ -4172,12 +4218,20 @@ riscv_relax_delete_bytes (bfd *abfd, >> return true; >> } >> >> -typedef bool (*relax_func_t) (bfd *, asection *, asection *, >> - struct bfd_link_info *, >> - Elf_Internal_Rela *, >> - bfd_vma, bfd_vma, bfd_vma, bool *, >> - riscv_pcgp_relocs *, >> - bool undefined_weak); >> +/* Delete some bytes from a section while relaxing. */ >> + >> +static bool >> +riscv_relax_delete_bytes (bfd_vma addr, >> + size_t count, >> + Elf_Internal_Rela *rel) >> +{ >> + /* Mark bytes for deletion. */ >> + rel->r_info = ELFNN_R_INFO (0, R_RISCV_DELETE); >> + rel->r_offset = addr; >> + rel->r_addend = count; >> + >> + return true; >> +} >> >> /* Relax AUIPC + JALR into JAL. */ >> >> @@ -4190,7 +4244,8 @@ _bfd_riscv_relax_call (bfd *abfd, asection *sec, asection *sym_sec, >> bfd_vma reserve_size ATTRIBUTE_UNUSED, >> bool *again, >> riscv_pcgp_relocs *pcgp_relocs, >> - bool undefined_weak ATTRIBUTE_UNUSED) >> + bool undefined_weak ATTRIBUTE_UNUSED, >> + bfd_vma *delete_total ATTRIBUTE_UNUSED) >> { >> bfd_byte *contents = elf_section_data (sec)->this_hdr.contents; >> bfd_vma foff = symval - (sec_addr (sec) + rel->r_offset); >> @@ -4252,8 +4307,7 @@ _bfd_riscv_relax_call (bfd *abfd, asection *sec, asection *sym_sec, >> >> /* Delete unnecessary JALR. */ >> *again = true; >> - return riscv_relax_delete_bytes (abfd, sec, rel->r_offset + len, 8 - len, >> - link_info, pcgp_relocs); >> + return riscv_relax_delete_bytes (rel->r_offset + len, 8 - len, rel + 1); >> } >> >> /* Traverse all output sections and return the max alignment. */ >> @@ -4286,7 +4340,8 @@ _bfd_riscv_relax_lui (bfd *abfd, >> bfd_vma reserve_size, >> bool *again, >> riscv_pcgp_relocs *pcgp_relocs, >> - bool undefined_weak) >> + bool undefined_weak, >> + bfd_vma *delete_total ATTRIBUTE_UNUSED) >> { >> bfd_byte *contents = elf_section_data (sec)->this_hdr.contents; >> bfd_vma gp = riscv_global_pointer_value (link_info); >> @@ -4343,11 +4398,10 @@ _bfd_riscv_relax_lui (bfd *abfd, >> return true; >> >> case R_RISCV_HI20: >> - /* We can delete the unnecessary LUI and reloc. */ >> + /* We can delete the unnecessary LUI and reuse the reloc. */ >> rel->r_info = ELFNN_R_INFO (0, R_RISCV_NONE); >> *again = true; >> - return riscv_relax_delete_bytes (abfd, sec, rel->r_offset, 4, >> - link_info, pcgp_relocs); >> + return riscv_relax_delete_bytes (rel->r_offset, 4, rel); >> >> default: >> abort (); >> @@ -4379,8 +4433,7 @@ _bfd_riscv_relax_lui (bfd *abfd, >> rel->r_info = ELFNN_R_INFO (ELFNN_R_SYM (rel->r_info), R_RISCV_RVC_LUI); >> >> *again = true; >> - return riscv_relax_delete_bytes (abfd, sec, rel->r_offset + 2, 2, >> - link_info, pcgp_relocs); >> + return riscv_relax_delete_bytes (rel->r_offset + 2, 2, rel + 1); >> } >> >> return true; >> @@ -4399,7 +4452,8 @@ _bfd_riscv_relax_tls_le (bfd *abfd, >> bfd_vma reserve_size ATTRIBUTE_UNUSED, >> bool *again, >> riscv_pcgp_relocs *pcgp_relocs, >> - bool undefined_weak ATTRIBUTE_UNUSED) >> + bool undefined_weak ATTRIBUTE_UNUSED, >> + bfd_vma *delete_total ATTRIBUTE_UNUSED) >> { >> /* See if this symbol is in range of tp. */ >> if (RISCV_CONST_HIGH_PART (tpoff (link_info, symval)) != 0) >> @@ -4418,11 +4472,10 @@ _bfd_riscv_relax_tls_le (bfd *abfd, >> >> case R_RISCV_TPREL_HI20: >> case R_RISCV_TPREL_ADD: >> - /* We can delete the unnecessary instruction and reloc. */ >> + /* We can delete the unnecessary instruction and reuse the reloc. */ >> rel->r_info = ELFNN_R_INFO (0, R_RISCV_NONE); >> *again = true; >> - return riscv_relax_delete_bytes (abfd, sec, rel->r_offset, 4, link_info, >> - pcgp_relocs); >> + return riscv_relax_delete_bytes (rel->r_offset, 4, rel); >> >> default: >> abort (); >> @@ -4442,7 +4495,8 @@ _bfd_riscv_relax_align (bfd *abfd, asection *sec, >> bfd_vma reserve_size ATTRIBUTE_UNUSED, >> bool *again ATTRIBUTE_UNUSED, >> riscv_pcgp_relocs *pcgp_relocs ATTRIBUTE_UNUSED, >> - bool undefined_weak ATTRIBUTE_UNUSED) >> + bool undefined_weak ATTRIBUTE_UNUSED, >> + bfd_vma *delete_total ATTRIBUTE_UNUSED) >> { >> bfd_byte *contents = elf_section_data (sec)->this_hdr.contents; >> bfd_vma alignment = 1, pos; >> @@ -4496,10 +4550,8 @@ _bfd_riscv_relax_align (bfd *abfd, asection *sec, >> bfd_putl16 (RVC_NOP, contents + rel->r_offset + pos); >> >> /* Mark the excess bytes for deletion. */ >> - rel->r_info = ELFNN_R_INFO (0, R_RISCV_DELETE); >> - rel->r_addend = rel->r_addend - nop_bytes; >> - rel->r_offset = rel->r_offset + nop_bytes; >> - return true; >> + return riscv_relax_delete_bytes (rel->r_offset + nop_bytes, >> + rel->r_addend - nop_bytes, rel); >> } >> >> /* Relax PC-relative references to GP-relative references. */ >> @@ -4515,7 +4567,8 @@ _bfd_riscv_relax_pc (bfd *abfd ATTRIBUTE_UNUSED, >> bfd_vma reserve_size, >> bool *again ATTRIBUTE_UNUSED, >> riscv_pcgp_relocs *pcgp_relocs, >> - bool undefined_weak) >> + bool undefined_weak, >> + bfd_vma *delete_total ATTRIBUTE_UNUSED) >> { >> bfd_byte *contents = elf_section_data (sec)->this_hdr.contents; >> bfd_vma gp = riscv_global_pointer_value (link_info); >> @@ -4654,28 +4707,6 @@ _bfd_riscv_relax_pc (bfd *abfd ATTRIBUTE_UNUSED, >> return true; >> } >> >> -/* Delete the bytes for R_RISCV_DELETE. */ >> - >> -static bool >> -_bfd_riscv_relax_delete (bfd *abfd, >> - asection *sec, >> - asection *sym_sec ATTRIBUTE_UNUSED, >> - struct bfd_link_info *link_info, >> - Elf_Internal_Rela *rel, >> - bfd_vma symval ATTRIBUTE_UNUSED, >> - bfd_vma max_alignment ATTRIBUTE_UNUSED, >> - bfd_vma reserve_size ATTRIBUTE_UNUSED, >> - bool *again ATTRIBUTE_UNUSED, >> - riscv_pcgp_relocs *pcgp_relocs ATTRIBUTE_UNUSED, >> - bool undefined_weak ATTRIBUTE_UNUSED) >> -{ >> - if (!riscv_relax_delete_bytes (abfd, sec, rel->r_offset, rel->r_addend, >> - link_info, NULL)) >> - return false; >> - rel->r_info = ELFNN_R_INFO (0, R_RISCV_NONE); >> - return true; >> -} >> - >> /* Called by after_allocation to set the information of data segment >> before relaxing. */ >> >> @@ -4706,6 +4737,7 @@ _bfd_riscv_relax_section (bfd *abfd, asection *sec, >> unsigned int i; >> bfd_vma max_alignment, reserve_size = 0; >> riscv_pcgp_relocs pcgp_relocs; >> + bfd_vma delete_bytes = 0; >> >> *again = false; >> >> @@ -4941,7 +4973,7 @@ _bfd_riscv_relax_section (bfd *abfd, asection *sec, >> >> if (!relax_func (abfd, sec, sym_sec, info, rel, symval, >> max_alignment, reserve_size, again, >> - &pcgp_relocs, undefined_weak)) >> + &pcgp_relocs, undefined_weak, &delete_bytes)) >> goto fail; >> } >> >> -- >> 2.25.1 >>