From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 24 Sep 2020 12:06:48 -0700
From: Fangrui Song
To: Jozef Lawrynowicz
Cc: Michael Matz, Binutils, "H.J.
 Lu", ccoutant@gmail.com
Subject: Re: [PATCH] Support SHF_GNU_RETAIN ELF section flag
Message-ID: <20200924190647.d3oeplxvt7qnruoz@gmail.com>
References: <20200923165211.fr4rqzp5uqqmrufq@jozef-acer-manjaro>
 <20200923184735.4k2tji4yro452bep@jozef-acer-manjaro>
 <20200923200437.mnegrmwebjuzmfeu@jozef-acer-manjaro>
 <20200923232943.kasbrmqtpone4yi7@gmail.com>
 <20200924113910.zm2ocfura3egmq44@jozef-acer-manjaro>
In-Reply-To: <20200924113910.zm2ocfura3egmq44@jozef-acer-manjaro>
List-Id: Binutils mailing list

On 2020-09-24, Jozef Lawrynowicz wrote:
>On Wed, Sep 23, 2020 at 04:29:43PM -0700, Fangrui Song wrote:
>> Hi Jozef,
>
>Hi Fangrui,
>
>> I saw your proposal https://sourceware.org/pipermail/gnu-gabi/2020q3/000429.html
>> I did not subscribe to gnu-gabi before yesterday so it is inconvenient for me to
>> reply there. Since SHF_GNU_RETAIN is a new feature, and we already have a facility
>> for making arbitrary sections alive with R_*_NONE, can you highlight the selling
>> point of a new flag?
>>
>> Copying my previous reply here
>> > We already have a way to create an artificial reference:
>> >
>> >   .reloc ., R_X86_64_NONE, target_symbol
>> >
>> > If we allow a relocation number for the second operand
>> >
>> >   .reloc ., 0, target_symbol
>> >
>> > this will be generic.
>> > You can insert the directives in a GC root (e.g.
>> > _start or a symbol referenced by -u or maybe an .init_array)
>>
>> If you do not want to touch the section containing the -e (--entry) symbol, you
>> can use:
>>
>>   .section .init_array.1,"a",@init_array
>>   .reloc ., R_X86_64_NONE, retained_section
>>
>> (I find that gold has an internal error with such a relocation.)
>> But GNU ld should have supported this for a very long time.
>>
>> (I added these directives to llvm last year: https://reviews.llvm.org/D62014 )
>>
>
>The fact that this relies on the compiler knowing a specific section
>will be present in a linker script, when we are dealing with such a
>broad ecosystem of targets and operating systems, makes me uneasy. The
>functionality simply breaks if the user has a custom linker script which
>does not have .init_array.
>
>Many embedded applications can be written without requiring this
>section. If someone has written their linker script from scratch, only
>including the section directives for the sections they actually need,
>why must we enforce that they have a .init_array input section rule just
>so they can make use of the "retain" attribute. It doesn't make sense -
>.init_array and "retain" are not related.

I used .init_array only as an example, because it happens to be a GC root;
I am not advocating .init_array itself. My main point is about .reloc:
you can place a .reloc directive in any known GC root. It is not too bad
if you think of it as a benign zero-sized section.

>Even if this approach would work and pick the right section, I think
>it is nicer for the user for the "retain" attribute to have a
>dedicated ELF construct which describes the requirement to retain the
>section, instead of using an existing construct whose purpose is not
>related.

Relocations are the keystone of --gc-sections. In some cases we want a
dependency relation but do not want the relocation to alter the content.
We use R_*_NONE in such cases. A relocation gives more control than a
section flag: it can express "if this section is retained, please retain
some other sections", rather than only "please always retain these sections".

>Your average user is going to be very confused why there are relocs in
>section X which point to various symbols in their code. If they have
>written the entire application, they might be able to infer that it is
>the "retain" attribute which generated these relocs, but if someone else
>wrote the code or the code is from a library or SDK it will not be
>clear.
>
>Ok we could maybe name a reloc like BFD_RELOC_RETAIN, but then what
>would the description be?
> This relocation type does not actually perform any relocation action,
> but is used to indicate that the symbol it references should not be
> discarded by linker garbage collection. It must be placed in a section
> which will definitely be present in the linked output file, and not be
> subject to garbage collection, otherwise it will not have any effect.
>
>Can you tell me why it is preferable to use the relocation mechanism to
>implement this, instead of a precisely defined new section flag?
>
>Why must we look to workarounds to implement something like this
>anyway? We can work out the details of a new section flag, and ensure
>it is precisely specified to ensure robustness, and then developers can
>benefit from understanding more about how their program has been put
>together.
>
>Do we want to make life easier for ourselves, or easier for our users?
>
>I get that ABI changes can be a bit disruptive, but this new flag in
>particular really isn't complicated anyway.
>
>> ---
>>
>> For a new section flag, there are a bunch of things needing thoughts
>>
>> * assembler
>>
>> The .retain directive seems to be discouraged... For section flags:
>>
>> .section .foo,"a"
>> .section .foo,"aR" # is this a new section
>> .pushsection .foo,"aR" # is this a new section
>
>No they are not new sections.
>From my original proposal:

If we use a section flag, my expected behavior for the second .section with
different flags is an error:
https://sourceware.org/pipermail/binutils/2020-February/109945.html

>> .section .foo,"a"
>> .section .foo,"aR" # error

In this case, I agree that a separate directive can be more convenient, because
the compiler does not need to know the flag when it is about to emit the first
.section directive (for example, due to a faraway __attribute__((section(...)))).

But then it would be an innovation: I do not know of any precedent.

>> Alternatively, the "R" flag is recognized by the "flags" argument to the
>> .section directive and will apply SHF_GNU_RETAIN to that section.
>> It is intended that SHF_GNU_RETAIN does not interfere with any validation when
>> switching to a section. It can be used to augment the section flags in a section
>> which has already been created.
>
>When you have two .section directives for the same section, GAS
>"switches" between them instead of creating new sections, which is what
>I referred to above.
>
>This is why the .retain directive more precisely describes what is
>happening. The compiler is telling the assembler that the section
>containing the declaration of the function or data symbol should have
>the SHF_GNU_RETAIN flag applied.
>
>>
>> Does the compiler need to remember that a section has the flag?
>> (Think how this works with __attribute__((section(...))); many asm streamers are
>> one-pass)
>
>The compiler does not need to worry about sections beyond getting the
>name of the section the declaration is in. The "retain" attribute just
>means that the section containing the declaration of the function or
>data object must be retained, so it emits a directive to describe that.
>Once the assembler has set SHF_GNU_RETAIN on a section, it will not be
>unset.
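
To make the two assembler behaviors above concrete, here is a toy Python
model of a section table. It is only a sketch, not GAS or LLVM code: the
function name, the policy names "strict" and "augment", and the exception
class are all made up for illustration. "strict" is the behavior I said I
would expect (a second .section with different flags is an error);
"augment" is my reading of the proposal (a later "R" adds the retain flag
and is never unset).

```python
# Toy model of an assembler's section table; not GAS or LLVM code.
# "R" stands for the proposed SHF_GNU_RETAIN flag letter.
# All names below are hypothetical and exist only for this sketch.

class SectionFlagMismatch(Exception):
    """Raised when a later .section directive disagrees on flags."""


def switch_section(table, name, flags, policy):
    """Handle `.section name,"flags"` under one of two hypothetical policies.

    policy="strict":  a second directive with different flags is an error.
    policy="augment": a second directive may add "R"; all other flags must
                      match, and "R" is never removed once set.
    """
    new = set(flags)
    if name not in table:
        table[name] = new
        return table[name]
    old = table[name]
    if policy == "strict":
        if new != old:
            raise SectionFlagMismatch(name)
    elif policy == "augment":
        if new - {"R"} != old - {"R"}:
            raise SectionFlagMismatch(name)
        old |= new & {"R"}  # sticky: only ever adds the retain flag
    else:
        raise ValueError(policy)
    return old


table = {}
switch_section(table, ".foo", "a", "augment")
switch_section(table, ".foo", "aR", "augment")  # .foo now carries "R"
switch_section(table, ".foo", "a", "augment")   # "R" stays set
```

Under "strict", the second call would raise instead, which is why a
one-pass compiler would have to know about the flag before emitting the
first .section directive.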
>
>I expect the most common use case to actually be when either the
>"section" attribute has been used, or the -f{function,data}-sections GCC
>options have been passed. If the user is trying to make the most out of
>garbage collection, they should be using -f{function,data}-sections.
>
>>
>> * linker
>> - What does -r do on two sections of the same name, one with the flag and the other
>> without? (as HJ mentioned)
>
>To reply to H.J. as well for this point:
>I don't think this warrants any special behavior, SHF_GNU_RETAIN doesn't
>need to change the behavior of section merging. The user should put the
>object to retain in its own section if they don't want large parts of
>their program to possibly be unnecessarily retained. The unique section
>name they give their SHF_GNU_RETAIN section will not be merged into a
>general output section name until they perform the final non-relocatable
>link.
>
>A section with SHF_GNU_RETAIN applied is being retained because it
>contains some information that is important to the program. So wherever
>that information ends up needs to be retained.
>
>> - Does the output section have the flag?
>
>SHF_GNU_RETAIN is applied to an input section.
>To ensure the input section is retained, SHF_GNU_RETAIN must be applied
>to any section that input section is merged with. The flag doesn't get
>removed from output sections.
>
>> - Does the flag retain other sections in the same section group?
>
>Yes.
>From the description on section groups from the ELF spec:
> ... such groups must be included or omitted from the linked
> object as a unit.
>
>I think potentially the only confusing part of any section flag merging
>behavior is the fact that the assembly code might have different
>.section directives for the same section, some with "R" and some without
>(+1 for a .retain directive ;)).
>Once the assembler has emitted its output, the SHF_GNU_RETAIN flag
>applied to an input section behaves like any other section flag.
>There is only one line of linker code which does anything specific with
>SHF_GNU_RETAIN, and that is the code in bfd/elflink.c to "gc_mark" the
>section.
>
>Thanks,
>Jozef
>
>>
>>
>> On 2020-09-23, H.J. Lu via Binutils wrote:
>> > On Wed, Sep 23, 2020 at 1:04 PM Jozef Lawrynowicz wrote:
>> > >
>> > > On Wed, Sep 23, 2020 at 12:03:28PM -0700, H.J. Lu via Binutils wrote:
>> > > > On Wed, Sep 23, 2020 at 11:47 AM Jozef Lawrynowicz wrote:
>> > > > >
>> > > > > On Wed, Sep 23, 2020 at 10:13:37AM -0700, H.J. Lu via Binutils wrote:
>> > > > > > On Wed, Sep 23, 2020 at 9:52 AM Jozef Lawrynowicz wrote:
>> > > > > > >
>> > > > > > > On Wed, Sep 23, 2020 at 01:51:56PM +0000, Michael Matz wrote:
>> > > > > > > > Hello,
>> > > > > > > >
>> > > > > > > > On Wed, 23 Sep 2020, H.J. Lu via Binutils wrote:
>> > > > > > > >
>> > > > > > > > > > I think that:
>> > > > > > > > > >
>> > > > > > > > > > > .section .text,"ax"
>> > > > > > > > > > > ...
>> > > > > > > > > > > foo:
>> > > > > > > > > > > ...
>> > > > > > > > > > > .retain
>> > > > > > > > > > > retained_fn:
>> > > > > > > > > > > ...
>> > > > > > > > > >
>> > > > > > > > > > is some nice syntactic sugar compared to:
>> > > > > > > > > >
>> > > > > > > > > > > .section .text,"ax"
>> > > > > > > > > > > ...
>> > > > > > > > > > > foo:
>> > > > > > > > > > > ...
>> > > > > > > > > > > .section .text,"axR"
>> > > > > > > > > > > retained_fn:
>> > > > > > > > > > > ...
>> > > > > > > > > >
>> > > > > > > > > > It's also partly for convenience; we have other directives which are
>> > > > > > > > > > synonyms or short-hand for each other.
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > You don't need to keep the whole section when only one symbol should
>> > > > > > > > > be kept. Please drop the .retain directive. GCC, as and ld should do the
>> > > > > > > > > right thing with
>> > > > > > > > >
>> > > > > > > > > .section .text,"ax"
>> > > > > > > > > ...
>> > > > > > > > > foo:
>> > > > > > > > > ...
>> > > > > > > > > .section .text,"axR"
>> > > > > > > > >
>> > > > > > > > > retained_fn:
>> > > > > > > > >
>> > > > > > > > > where foo can be dropped and retained_fn will be kept.
>> > > > > > > >
>> > > > > > > > This is not what we discussed at the ABI list, the flag is per section, so
>> > > > > > > > either the whole section is retained or not. What you describe is
>> > > > > > > > something else that would work on a per symbol basis, which would have to
>> > > > > > > > be specified in a different way and might or might not be a good idea.
>> > > > > > > > But let's not conflate these two.
>> > > > > > >
>> > > > > > > Also, the linker cannot currently dissect a section and remove a
>> > > > > > > particular unused symbol anyway. Since garbage collection only operates
>> > > > > > > on the section level, marking the section itself as "retained" seems
>> > > > > > > most appropriate.
>> > > > > >
>> > > > > > It can be done. If you put your branch on
>> > > > > >
>> > > > > > https://gitlab.com/x86-binutils/binutils-gdb
>> > > > > >
>> > > > > > I can help you implement it.
>> > > > >
>> > > > > It's not something I have time to look into at the moment, for now the
>> > > > > aim is just to prevent garbage collection of sections.
>> > > >
>> > > > Linker and assembler already support it. You just need to add SHF_GNU_RETAIN
>> > > > to the framework. Check how SHF_GNU_MBIND works.
>> > >
>> > > Sorry, I don't understand.
>> > >
>> > > Are you saying that LD already supports the garbage collection of
>> > > individual unused symbol definitions from input sections? Whilst
>> > > retaining other symbol definitions which are required by the program?
>> > > I cannot find any reference to this.
>> > >
>> > > How does that relate to SHF_GNU_MBIND?
>> > > I looked at all the references
>> > > to "mbind" in Binutils and nothing seemed related to garbage collection of
>> > > sections, since SHF_GNU_MBIND is just used to indicate a particular
>> > > section should be placed in a special memory area.
>> >
>> > For
>> >
>> > .section .text,"ax"
>> > ...
>> > foo:
>> > ...
>> > .section .text,"axR"
>> > retained_fn:
>> >
>> > you need to create a new .text section with SHF_GNU_RETAIN for
>> > retained_fn. See get_section in obj-elf.c. If you want to avoid
>> > merging .text section with SHF_GNU_RETAIN with other .text
>> > sections by ld -r, linker needs to distinguish sections of the
>> > same name with and without SHF_GNU_RETAIN.
>
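
P.S. To illustrate the difference between a relocation edge and a
section flag that I described earlier in this message, here is a toy
Python model of a section-level mark phase. It is a sketch under my own
assumptions, not code from GNU ld, gold or lld: the function name, the
section names and the idea of modelling a retain flag as an extra GC root
are all hypothetical.

```python
# Toy model of section-level garbage collection (mark phase only).
# Not taken from GNU ld, gold, or lld; all names here are hypothetical.
from collections import deque


def live_sections(relocs, roots, retained=()):
    """Return the set of sections kept by a --gc-sections style mark phase.

    relocs:   dict mapping a section to the sections its relocations
              reference (an R_*_NONE relocation is an edge like any other).
    roots:    sections that are always kept (entry symbol, -u symbols, ...).
    retained: sections carrying a retain flag; modelled as extra roots.
    """
    live = set()
    work = deque(list(roots) + list(retained))
    while work:
        sec = work.popleft()
        if sec in live:
            continue
        live.add(sec)
        work.extend(relocs.get(sec, ()))
    return live


# Conditional retention: .data.table is kept only when .text.foo is kept.
relocs = {".text.start": [".text.main"], ".text.foo": [".data.table"]}
print(sorted(live_sections(relocs, roots=[".text.start"])))

# Unconditional retention: flagging .data.table keeps it regardless.
print(sorted(live_sections(relocs, roots=[".text.start"],
                           retained=[".data.table"])))
```

In the first call .text.foo is unreachable, so .data.table is dropped
with it; that is the "if this section is retained, retain that one too"
relation a relocation can express. In the second call the flag keeps
.data.table unconditionally, which is the only relation a section flag
can express.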