From: Richard Biener
Date: Wed, 11 Mar 2020 13:51:33 +0100
Subject: Re: [PATCH][RFC] API extension for binutils (type of symbols).
To: Martin Liška
Cc: Jan Hubicka, GCC Patches

On Wed, Mar 11, 2020 at 1:22 PM Martin Liška wrote:
>
> On 3/11/20 11:22 AM, Richard Biener wrote:
> > On Wed, Mar 11, 2020 at 10:19 AM Martin Liška wrote:
> >>
> >> On 3/10/20 1:07 PM, Martin Liška wrote:
> >>> On 3/10/20 12:24 PM, Richard Biener wrote:
> >>>> Not sure how symtab is encoded right now but we also could have
> >>>
> >>> Ok, right now I don't see the symtab entry as very extensible.
> >>>
> >>> But what I am suggesting is to parse the LTO bytecode version and then
> >>> do conditional parsing of the lto_symtab section.
> >>>
> >>> Thoughts?
> >>> Martin
> >>
> >> So as H.J. correctly pointed out, I can't add a new symbol_type into the ld_plugin_symbol struct.
> >> It would make an ABI change, as correctly identified by abidiff:
> >>
> >> abidiff /tmp/before.o /tmp/after.o
> >> Functions changes summary: 0 Removed, 1 Changed, 0 Added function
> >> Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
> >> Function symbols changes summary: 0 Removed, 0 Added function symbol not referenced by debug info
> >> Variable symbols changes summary: 0 Removed, 1 Added variable symbol not referenced by debug info
> >>
> >> 1 function with some indirect sub-type change:
> >>
> >>   [C]'function ld_plugin_status onload(ld_plugin_tv*)' at lto-plugin.c:1275:1 has some indirect sub-type changes:
> >>     parameter 1 of type 'ld_plugin_tv*' has sub-type changes:
> >>       in pointed to type 'struct ld_plugin_tv' at plugin-api.h:451:1:
> >>         type size hasn't changed
> >>         1 data member changes (1 filtered):
> >>           type of 'union {int tv_val; const char* tv_string; ld_plugin_register_claim_file tv_register_claim_file; ld_plugin_register_all_symbols_read tv_register_all_symbols_read; ld_plugin_register_cleanup tv_register_cleanup; ld_plugin_add_symbols tv_add_symbols; ld_plugin_get_symbols tv_get_symbols; ld_plugin_add_input_file tv_add_input_file; ld_plugin_message tv_message; ld_plugin_get_input_file tv_get_input_file; ld_plugin_get_view tv_get_view; ld_plugin_release_input_file tv_release_input_file; ld_plugin_add_input_library tv_add_input_library; ld_plugin_set_extra_library_path tv_set_extra_library_path; ld_plugin_get_input_section_count tv_get_input_section_count; ld_plugin_get_input_section_type tv_get_input_section_type; ld_plugin_get_input_section_name tv_get_input_section_name; ld_plugin_get_input_section_contents tv_get_input_section_contents; ld_plugin_update_section_order tv_update_section_order; ld_plugin_allow_section_ordering tv_allow_section_ordering; ld_plugin_allow_unique_segment_for_sections tv_allow_unique_segment_for_sections; ld_plugin_unique_segment_for_sections tv_unique_segment_for_sections; ld_plugin_get_input_section_alignment tv_get_input_section_alignment; ld_plugin_get_input_section_size tv_get_input_section_size; ld_plugin_register_new_input tv_register_new_input; ld_plugin_get_wrap_symbols tv_get_wrap_symbols;} ld_plugin_tv::tv_u' changed:
> >>             type size hasn't changed
> >>             1 data member changes (1 filtered):
> >>               type of 'ld_plugin_add_symbols tv_add_symbols' changed:
> >>                 underlying type 'enum ld_plugin_status (void*, int, const ld_plugin_symbol*)*' changed:
> >>                   in pointed to type 'function type enum ld_plugin_status (void*, int, const ld_plugin_symbol*)':
> >>                     parameter 3 of type 'const ld_plugin_symbol*' has sub-type changes:
> >>                       in pointed to type 'const ld_plugin_symbol':
> >>                         in unqualified underlying type 'struct ld_plugin_symbol' at plugin-api.h:86:1:
> >>                           type size changed from 256 to 288 (in bits)
> >>                           1 data member insertion:
> >>                             'int ld_plugin_symbol::symbol_type', at offset 256 (in bits) at plugin-api.h:95:1
> >>
> >> So I need to come up with ld_plugin_symbol_v2. It brings more challenges: one has 2 parallel symbol
> >> tables:
> >>
> >> struct plugin_symtab
> >> {
> >>   ...
> >>   struct ld_plugin_symbol_v2 *syms;
> >>   struct ld_plugin_symbol *syms_v1;
> >>   ...
> >> };
> >>
> >> and the information in these should be kept in sync.
> >>
> >> The patch can survive lto.exp and I would like to ask H.J. to write the binutils counterpart that will
> >> utilize the new LDPT_GET_SYMBOLS_V4, LDPT_ADD_SYMBOLS_V2.
> >>
> >> Thoughts?
> >
> > Can't we simply have _V4/V2 use the upper half of
> > ld_plugin_symbol::def? If the linker then requests _V4 but the
> > plugin cannot cope it could still "use" the data but get
> > LDST_UNKNOWN (zero) there.
>
> That is possible, but it's a bit of a hack. The plugin has a mechanism for versioning
> and this change does not align with the idea.

I'm not sure I understand the versioning. We should aim at something
where an updated plugin can talk to old and new ld and where a new ld
can also talk to an old plugin. That requires an arbitration which I
don't see implemented?

Splitting an existing field isn't hackish IMHO. I guess even explicitly
changing it to one short and two char fields would be OK.

Is there a comprehensive list of plugins out in the wild using the LD
plugin API? Note we also have to bring in the gold folks (not sure if
lld also implements the same plugin API).

> >
> > IMHO LDST_VARIABLE_BSS is "misplaced"? "BSS" is the section of the variable.
>
> Yes.
>
> > If we want to encode more of ELF it should be LDST_OBJECT and LDST_FUNC.
> > Note there's also rodata vs data info that would be missing in case
> > we'd want tools like readelf -s to dump the symbol table of the IL
> > part of an object. It looks like nm can also distinguish rodata
> > from data ("R", "r" vs "d") and "small object" data sections (not
> > sure what that's about). It seems nm cannot distinguish symbols in
> > mergeable string sections (it dumps "R" for me there). So instead
> > of mangling everything into enum ld_plugin_symbol_type should we
> > instead add a
> >
> > enum ld_plugin_symbol_special_section
> > {
> >   LDSSS_DEFAULT,
> >   LDSSS_BSS,
> >   LDSSS_RODATA,
> >   ...
> > }
>
> Which maps to what we have for:
>
> /* Information that is provided by all instances of the section type. */
> struct GTY(()) section_common {
>   /* The set of SECTION_* flags that apply to this section. */
>   unsigned int flags;
> };
>
> #define SECTION_CODE 0x00100 /* contains code */
> ...
> #define SECTION_BSS 0x02000 /* contains zeros only */
> ...
> #define SECTION_TLS 0x40000 /* contains thread-local storage */
> ...
> #define SECTION_COMMON 0x800000 /* contains common data */
> #define SECTION_RELRO 0x1000000 /* data is readonly after relocation processing */
> ...
>
> Anyway, that would be another type which we need in ld_plugin_symbol.
>
> Martin
>
> >
> > where LDSSS_DEFAULT means .text for FUNC and .data for OBJECT?
> > LDSSS_COMDAT might also apply but there's already the comdat_key
> > member which makes this info implicitly available.
> >
> > Richard.
> >
> >> Martin
> >>
> >>
>
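
To illustrate the "upper half of ld_plugin_symbol::def" idea, here is a
minimal sketch. It is not taken from any patch in this thread: every
name prefixed with sketch_/SKETCH_ is hypothetical, the enum values are
made up for the example, and an unsigned value stands in for the int
def field. It uses shifts and masks rather than separate short/char
members, so it does not depend on byte order. The only property it
relies on from the discussion is that "unknown" is zero, so a value
written by an old plugin (kind only) decodes as unknown on the new side.

```c
#include <assert.h>

/* Hypothetical names for illustration only; none of these exist in
   plugin-api.h.  SKETCH_LDST_UNKNOWN is zero so that a def value
   written by an old plugin, which stores only the kind, decodes as
   "unknown" when read by a new linker.  */
enum sketch_symbol_type
{
  SKETCH_LDST_UNKNOWN = 0,
  SKETCH_LDST_FUNCTION = 1,
  SKETCH_LDST_VARIABLE = 2
};

#define SKETCH_KIND_MASK 0xffffu
#define SKETCH_TYPE_SHIFT 16

/* Store the symbol kind in the lower 16 bits and the symbol type in
   the upper 16 bits of a single 32-bit value.  */
static unsigned int
sketch_pack_def (unsigned int kind, enum sketch_symbol_type type)
{
  assert (kind <= SKETCH_KIND_MASK);
  return kind | ((unsigned int) type << SKETCH_TYPE_SHIFT);
}

/* Recover the kind; this is all an old consumer would need to do if it
   masked off the upper half.  */
static unsigned int
sketch_def_kind (unsigned int def)
{
  return def & SKETCH_KIND_MASK;
}

/* Recover the symbol type; zero (unknown) for values from old producers.  */
static enum sketch_symbol_type
sketch_def_type (unsigned int def)
{
  return (enum sketch_symbol_type) (def >> SKETCH_TYPE_SHIFT);
}
```

Note the asymmetry: a new reader handles old values fine, but an old
reader that compares def directly against the kind constants would
misread a packed value, which is why some arbitration between plugin
and linker is still needed before the upper half is populated.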