From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi1-x236.google.com (mail-oi1-x236.google.com [IPv6:2607:f8b0:4864:20::236]) by sourceware.org (Postfix) with ESMTPS id 081D63858418 for ; Tue, 25 Oct 2022 04:36:13 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 081D63858418 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivosinc.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivosinc.com Received: by mail-oi1-x236.google.com with SMTP id j188so12993208oih.4 for ; Mon, 24 Oct 2022 21:36:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20210112.gappssmtp.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=7bGTxFh/D3WA/nKDcVBk+F0FE8wxiWFWpZBkFiGKbNc=; b=iiyUfr57esBzJggLuOC2rn9HuTkdRk4NL1BCyhaS9/2IhunrX82W/geY7ixEgEguA/ knApWVxyDwjQeQQPwtcOP9Rd2crzst1Hmy6eYIUH/Ob3w/6Pg5TnptrvrKM+LkBOSwqX G/93wg7Z3wdt+lFJIV40stYp+oCpgxqifMsjCLhyadIMljNV2JTpHfWCX5urL7hTZqyF KgYYyX5HdWC9+B6HZxa9PsYr7+G5ty32apkPXQPodyZT3z/XhHqnye7v9mPPIsfN9bKe Oy29ea1nEbXO7knW+0J+Y6GOPk0TMMDKWV+OpUz2vWbcze4E52onNnQ+NTisG+2ERG1a u8QQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=7bGTxFh/D3WA/nKDcVBk+F0FE8wxiWFWpZBkFiGKbNc=; b=t1To9kOzgUWPN2YskWJLfBm51eXRhj62B3ETSCXUqoyjjZv+Zuc87XyOT8L5uBLP7u Gz0zN5hckMeTL8u/NUTtysyJsLuvGhvd/nDiMroF8bTeLQEVC7l3N7Rvxnx3jAwOVKR4 kftOOWJdrRcH41u6R4GEKIKqMI/K3RYMcVisWMz5yxRQr36EeurFJIgypSDraV9e+ko+ UFPtJtXGE2PsG0aV/LmASLVKes3T9zBekGZOXnhR5y1brH1/dKPFl68xlVPTCOiVTbOq gDG2J+/MabhNeGk57VgPgpRegOLR6QxmgP3xLd1Tj+sVmenEh9YV8TTw91YsPVNeKZd+ ns8Q== X-Gm-Message-State: ACrzQf2InlvEYf0tqOOlHmWIfTJfv/km3IwYqdASv+RrVD8VM16ivgkr XexjurcF4DErhtANt97DPVAYYeLh3ryPIewT6Vu7dfUsY5lrwQ== X-Google-Smtp-Source: AMsMyM6MGSj25MNVo8t5rgZiHDos4fCEMh4RIEeoBYlhb6PFHpPeRvLW/O1nNR/kytkauVJlf6IBGErqzRyt62aZK+U= X-Received: by 2002:a05:6808:1910:b0:355:3525:8d5 with SMTP id bf16-20020a056808191000b00355352508d5mr22384422oib.82.1666672572112; Mon, 24 Oct 2022 21:36:12 -0700 (PDT) MIME-Version: 1.0 References: <20221025013347.68282-1-nelson@rivosinc.com> In-Reply-To: <20221025013347.68282-1-nelson@rivosinc.com> From: Nelson Chu Date: Tue, 25 Oct 2022 12:36:01 +0800 Message-ID: Subject: Re: [committed 1/2] RISC-V: Improve link time complexity. To: binutils@sourceware.org, Jim Wilson , Palmer Dabbelt , Andrew Waterman Cc: "Patrick O'Neill" Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-8.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Btw, for some corner cases that the relaxations may have dependency to others (relax A needs to relax B in advance), though the piecewise deletion reduce the time of memove, but it seems like the time is still wasted by the more relax rounds since the `again' flag. Not sure if it is worth it to support multiple levels of deletions, for example, to limit the relax round to 1 when using the piecewise deletion, so that people who don't care much about the relaxed results will have a faster linker. Thanks Nelson On Tue, Oct 25, 2022 at 9:33 AM Nelson Chu wrote: > > From: Patrick O'Neill > > The riscv port does deletion and symbol table update for each relocation > while relaxing, so we are moving section bytes and scanning symbol table once > for each relocation. Compared to microblaze port, they record the relaxation > changes into a table, then do the deletion and symbol table update once per > section, rather than per relocation. Therefore, they should have better link > time complexity than us. > > To improve the link time complexity, this patch try to make the deletion in > linear time. Compared to record the relaxation changes into a table, we > replace the unused relocation with R_RISCV_DELETE for the deleted bytes, and > then resolve them at the end of the section. Assuming the number of > R_RISCV_DELETE is m, and the number of symbols is n, the total link complexity > should be O(m) for moving section bytes, and O(m*n^2) for symbol table update. > If we record the relaxation changes into the table, and then sort the symbol > table by values, then probably can reduce the time complexity to O(m*n*log(n)) > for updating symbol table, but it doesn't seem worth it for now. > > bfd/ > * elfnn-riscv.c (_riscv_relax_delete_bytes): Renamed from > riscv_relax_delete_bytes, updated to reduce the tiem complexity to O(m) > for memmove. > (typedef relax_delete_t): Function pointer declaration of delete functions. > (riscv_relax_delete_bytes): Can choose to use _riscv_relax_delete_piecewise > or _riscv_relax_delete_immediate for deletion. > (_riscv_relax_delete_piecewise): Just mark the deleted bytes as R_RISCV_DELETE. > (_riscv_relax_delete_immediate): Delete some bytes from a section while > relaxing. > (riscv_relax_resolve_delete_relocs): Delete the bytes for R_RISCV_DELETE > relocations from a section, at the end of _bfd_riscv_relax_section. > (_bfd_riscv_relax_call): Mark deleted bytes as R_RISCV_DELETE by reusing > R_RISCV_RELAX. > (_bfd_riscv_relax_lui): Likewise, but reuse R_RISCV_HI20 for lui, and reuse > R_RISCV_RELAX for c.lui. > (_bfd_riscv_relax_tls_le): Likewise, but resue R_RISCV_TPREL_HI20 and > R_RISCV_TPREL_ADD. > (_bfd_riscv_relax_pc): Likewise, but resue R_RISCV_PCREL_HI20 for auipc. > (_bfd_riscv_relax_align): Updated, don't need to resue relocation since > calling _riscv_relax_delete_immediate. > (_bfd_riscv_relax_delete): Removed. > (_bfd_riscv_relax_section): Set riscv_relax_delete_bytes for each relax_func, > to delete bytes immediately or later. Call riscv_relax_resolve_delete_relocs > to delete bytes for DELETE relocations from a section. > --- > bfd/elfnn-riscv.c | 180 +++++++++++++++++++++++++++++++++------------- > 1 file changed, 131 insertions(+), 49 deletions(-) > > diff --git a/bfd/elfnn-riscv.c b/bfd/elfnn-riscv.c > index 3d2ddf4e651..e4064313724 100644 > --- a/bfd/elfnn-riscv.c > +++ b/bfd/elfnn-riscv.c > @@ -4043,27 +4043,32 @@ riscv_update_pcgp_relocs (riscv_pcgp_relocs *p, asection *deleted_sec, > } > } > > -/* Delete some bytes from a section while relaxing. */ > +/* Delete some bytes, adjust relcocations and symbol table from a section. */ > > static bool > -riscv_relax_delete_bytes (bfd *abfd, > - asection *sec, > - bfd_vma addr, > - size_t count, > - struct bfd_link_info *link_info, > - riscv_pcgp_relocs *p) > +_riscv_relax_delete_bytes (bfd *abfd, > + asection *sec, > + bfd_vma addr, > + size_t count, > + struct bfd_link_info *link_info, > + riscv_pcgp_relocs *p, > + bfd_vma delete_total, > + bfd_vma toaddr) > { > unsigned int i, symcount; > - bfd_vma toaddr = sec->size; > struct elf_link_hash_entry **sym_hashes = elf_sym_hashes (abfd); > Elf_Internal_Shdr *symtab_hdr = &elf_tdata (abfd)->symtab_hdr; > unsigned int sec_shndx = _bfd_elf_section_from_bfd_section (abfd, sec); > struct bfd_elf_section_data *data = elf_section_data (sec); > bfd_byte *contents = data->this_hdr.contents; > + size_t bytes_to_move = toaddr - addr - count; > > /* Actually delete the bytes. */ > sec->size -= count; > - memmove (contents + addr, contents + addr + count, toaddr - addr - count); > + memmove (contents + addr, contents + addr + count + delete_total, bytes_to_move); > + > + /* Still adjust relocations and symbols in non-linear times. */ > + toaddr = sec->size + count; > > /* Adjust the location of all of the relocs. Note that we need not > adjust the addends, since all PC-relative references must be against > @@ -4161,6 +4166,99 @@ riscv_relax_delete_bytes (bfd *abfd, > return true; > } > > +typedef bool (*relax_delete_t) (bfd *, asection *, > + bfd_vma, size_t, > + struct bfd_link_info *, > + riscv_pcgp_relocs *, > + Elf_Internal_Rela *); > + > +static relax_delete_t riscv_relax_delete_bytes; > + > +/* Do not delete some bytes from a section while relaxing. > + Just mark the deleted bytes as R_RISCV_DELETE. */ > + > +static bool > +_riscv_relax_delete_piecewise (bfd *abfd ATTRIBUTE_UNUSED, > + asection *sec ATTRIBUTE_UNUSED, > + bfd_vma addr, > + size_t count, > + struct bfd_link_info *link_info ATTRIBUTE_UNUSED, > + riscv_pcgp_relocs *p ATTRIBUTE_UNUSED, > + Elf_Internal_Rela *rel) > +{ > + if (rel == NULL) > + return false; > + rel->r_info = ELFNN_R_INFO (0, R_RISCV_DELETE); > + rel->r_offset = addr; > + rel->r_addend = count; > + return true; > +} > + > +/* Delete some bytes from a section while relaxing. */ > + > +static bool > +_riscv_relax_delete_immediate (bfd *abfd, > + asection *sec, > + bfd_vma addr, > + size_t count, > + struct bfd_link_info *link_info, > + riscv_pcgp_relocs *p, > + Elf_Internal_Rela *rel) > +{ > + if (rel != NULL) > + rel->r_info = ELFNN_R_INFO (0, R_RISCV_NONE); > + return _riscv_relax_delete_bytes (abfd, sec, addr, count, > + link_info, p, 0, sec->size); > +} > + > +/* Delete the bytes for R_RISCV_DELETE relocs. */ > + > +static bool > +riscv_relax_resolve_delete_relocs (bfd *abfd, > + asection *sec, > + struct bfd_link_info *link_info, > + Elf_Internal_Rela *relocs) > +{ > + bfd_vma delete_total = 0; > + unsigned int i; > + > + for (i = 0; i < sec->reloc_count; i++) > + { > + Elf_Internal_Rela *rel = relocs + i; > + if (ELFNN_R_TYPE (rel->r_info) != R_RISCV_DELETE) > + continue; > + > + /* Find the next R_RISCV_DELETE reloc if possible. */ > + Elf_Internal_Rela *rel_next = NULL; > + unsigned int start = rel - relocs; > + for (i = start; i < sec->reloc_count; i++) > + { > + /* Since we only replace existing relocs and don't add new relocs, the > + relocs are in sequential order. We can skip the relocs prior to this > + one, making this search linear time. */ > + rel_next = relocs + i; > + if (ELFNN_R_TYPE ((rel_next)->r_info) == R_RISCV_DELETE > + && (rel_next)->r_offset > rel->r_offset) > + break; > + else > + rel_next = NULL; > + } > + > + bfd_vma toaddr = rel_next == NULL ? sec->size : rel_next->r_offset; > + if (!_riscv_relax_delete_bytes (abfd, sec, rel->r_offset, rel->r_addend, > + link_info, NULL, delete_total, toaddr)) > + return false; > + > + delete_total += rel->r_addend; > + rel->r_info = ELFNN_R_INFO (0, R_RISCV_NONE); > + > + /* Skip ahead to the next delete reloc. */ > + i = rel_next != NULL ? rel_next - relocs - 1 : sec->reloc_count; > + } > + > + return true; > +} > + > typedef bool (*relax_func_t) (bfd *, asection *, asection *, > struct bfd_link_info *, > Elf_Internal_Rela *, > @@ -4239,10 +4337,10 @@ _bfd_riscv_relax_call (bfd *abfd, asection *sec, asection *sym_sec, > /* Replace the AUIPC. */ > riscv_put_insn (8 * len, auipc, contents + rel->r_offset); > > - /* Delete unnecessary JALR. */ > + /* Delete unnecessary JALR and reuse the R_RISCV_RELAX reloc. */ > *again = true; > return riscv_relax_delete_bytes (abfd, sec, rel->r_offset + len, 8 - len, > - link_info, pcgp_relocs); > + link_info, pcgp_relocs, rel + 1); > } > > /* Traverse all output sections and return the max alignment. */ > @@ -4332,11 +4430,10 @@ _bfd_riscv_relax_lui (bfd *abfd, > return true; > > case R_RISCV_HI20: > - /* We can delete the unnecessary LUI and reloc. */ > - rel->r_info = ELFNN_R_INFO (0, R_RISCV_NONE); > + /* Delete unnecessary LUI and reuse the reloc. */ > *again = true; > return riscv_relax_delete_bytes (abfd, sec, rel->r_offset, 4, > - link_info, pcgp_relocs); > + link_info, pcgp_relocs, rel); > > default: > abort (); > @@ -4367,9 +4464,10 @@ _bfd_riscv_relax_lui (bfd *abfd, > /* Replace the R_RISCV_HI20 reloc. */ > rel->r_info = ELFNN_R_INFO (ELFNN_R_SYM (rel->r_info), R_RISCV_RVC_LUI); > > + /* Delete extra bytes and reuse the R_RISCV_RELAX reloc. */ > *again = true; > return riscv_relax_delete_bytes (abfd, sec, rel->r_offset + 2, 2, > - link_info, pcgp_relocs); > + link_info, pcgp_relocs, rel + 1); > } > > return true; > @@ -4407,11 +4505,10 @@ _bfd_riscv_relax_tls_le (bfd *abfd, > > case R_RISCV_TPREL_HI20: > case R_RISCV_TPREL_ADD: > - /* We can delete the unnecessary instruction and reloc. */ > - rel->r_info = ELFNN_R_INFO (0, R_RISCV_NONE); > + /* Delete unnecessary instruction and reuse the reloc. */ > *again = true; > return riscv_relax_delete_bytes (abfd, sec, rel->r_offset, 4, link_info, > - pcgp_relocs); > + pcgp_relocs, rel); > > default: > abort (); > @@ -4472,10 +4569,10 @@ _bfd_riscv_relax_align (bfd *abfd, asection *sec, > if (nop_bytes % 4 != 0) > bfd_putl16 (RVC_NOP, contents + rel->r_offset + pos); > > - /* Delete the excess bytes. */ > + /* Delete excess bytes. */ > return riscv_relax_delete_bytes (abfd, sec, rel->r_offset + nop_bytes, > rel->r_addend - nop_bytes, link_info, > - NULL); > + NULL, NULL); > } > > /* Relax PC-relative references to GP-relative references. */ > @@ -4617,9 +4714,9 @@ _bfd_riscv_relax_pc (bfd *abfd ATTRIBUTE_UNUSED, > ELFNN_R_SYM(rel->r_info), > sym_sec, > undefined_weak); > - /* We can delete the unnecessary AUIPC and reloc. */ > - rel->r_info = ELFNN_R_INFO (0, R_RISCV_DELETE); > - rel->r_addend = 4; > + /* Delete unnecessary AUIPC and reuse the reloc. */ > + riscv_relax_delete_bytes (abfd, sec, rel->r_offset, 4, link_info, > + pcgp_relocs, rel); > return true; > > default: > @@ -4630,28 +4727,6 @@ _bfd_riscv_relax_pc (bfd *abfd ATTRIBUTE_UNUSED, > return true; > } > > -/* Delete the bytes for R_RISCV_DELETE. */ > - > -static bool > -_bfd_riscv_relax_delete (bfd *abfd, > - asection *sec, > - asection *sym_sec ATTRIBUTE_UNUSED, > - struct bfd_link_info *link_info, > - Elf_Internal_Rela *rel, > - bfd_vma symval ATTRIBUTE_UNUSED, > - bfd_vma max_alignment ATTRIBUTE_UNUSED, > - bfd_vma reserve_size ATTRIBUTE_UNUSED, > - bool *again ATTRIBUTE_UNUSED, > - riscv_pcgp_relocs *pcgp_relocs ATTRIBUTE_UNUSED, > - bool undefined_weak ATTRIBUTE_UNUSED) > -{ > - if (!riscv_relax_delete_bytes (abfd, sec, rel->r_offset, rel->r_addend, > - link_info, NULL)) > - return false; > - rel->r_info = ELFNN_R_INFO (0, R_RISCV_NONE); > - return true; > -} > - > /* Called by after_allocation to set the information of data segment > before relaxing. */ > > @@ -4729,6 +4804,7 @@ _bfd_riscv_relax_section (bfd *abfd, asection *sec, > bool undefined_weak = false; > > relax_func = NULL; > + riscv_relax_delete_bytes = NULL; > if (info->relax_pass == 0) > { > if (type == R_RISCV_CALL > @@ -4750,6 +4826,7 @@ _bfd_riscv_relax_section (bfd *abfd, asection *sec, > relax_func = _bfd_riscv_relax_pc; > else > continue; > + riscv_relax_delete_bytes = _riscv_relax_delete_piecewise; > > /* Only relax this reloc if it is paired with R_RISCV_RELAX. */ > if (i == sec->reloc_count - 1 > @@ -4760,10 +4837,11 @@ _bfd_riscv_relax_section (bfd *abfd, asection *sec, > /* Skip over the R_RISCV_RELAX. */ > i++; > } > - else if (info->relax_pass == 1 && type == R_RISCV_DELETE) > - relax_func = _bfd_riscv_relax_delete; > - else if (info->relax_pass == 2 && type == R_RISCV_ALIGN) > - relax_func = _bfd_riscv_relax_align; > + else if (info->relax_pass == 1 && type == R_RISCV_ALIGN) > + { > + relax_func = _bfd_riscv_relax_align; > + riscv_relax_delete_bytes = _riscv_relax_delete_immediate; > + } > else > continue; > > @@ -4921,6 +4999,10 @@ _bfd_riscv_relax_section (bfd *abfd, asection *sec, > goto fail; > } > > + /* Resolve R_RISCV_DELETE relocations. */ > + if (!riscv_relax_resolve_delete_relocs (abfd, sec, info, relocs)) > + goto fail; > + > ret = true; > > fail: > -- > 2.37.0 (Apple Git-136) >