From: Rui Ueyama
Date: Sun, 15 May 2022 14:57:25 +0800
Subject: Re: [PATCH] lto-plugin: add support for feature detection
To: Alexander Monakov
Cc: Martin Liška, GCC Patches, Jan Hubicka
List-Id: Gcc-patches mailing list

On Fri, May 6, 2022 at 10:47 PM Alexander Monakov wrote:
>
> On Thu, 5 May 2022, Martin Liška wrote:
>
> > On 5/5/22 12:52, Alexander Monakov wrote:
> > > Feels a bit weird to ask, but before entertaining such an API extension,
> > > can we step back and understand the v3 variant of get_symbols? It is not
> > > documented, and from what little I saw I did not get the "motivation" for
> > > its existence (what it is doing that couldn't be done with the v2 api).
> >
> > Please see here:
> > https://github.com/rui314/mold/issues/181#issuecomment-1037927757
>
> Thanks. I've also re-read [1] and [2] which provided some relevant ideas.
>
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86490
> [2] https://sourceware.org/bugzilla/show_bug.cgi?id=23411
>
>
> OK, so the crux of the issue is that sometimes the linker needs to feed the
> compiler plugin with LTO .o files extracted from static archives. This is
> not really obvious, because normally .a archives have an index that enumerates
> symbols defined/used by its .o files, and even during LTO the linker can simply
> consult the index to find out which members to extract. In theory, at least.
>
> The theory breaks in the following cases:
>
> - ld.bfd and common symbols (I wonder if weak/comdat code is also affected?):
> archive index does not indicate which definitions are common, so ld.bfd
> extracts the member and feeds it to the plugin to find out;
>
> - ld.gold and emulated archives via --start-lib a.o b.o ... --end-lib: here
> there's no index to consult and ld.gold feeds each .o to the plugin.
>
> In those cases it may happen that the linker extracts an .o file that would
> not be extracted during non-LTO link, and if that happens, the linker needs to
> inform the plugin. This is not the same as marking each symbol from spuriously
> extracted .o file as PREEMPTED when the .o file has constructors (the plugin
> will assume the constructors are kept while the linker needs to discard them).
>
> So get_symbols_v3 allows the linker to discard an LTO .o file to solve this.
>
> In absence of get_symbols_v3 mold tries to ensure correctness by restarting
> itself while appending a list of .o files to be discarded to its command line.
>
> I wonder if mold can invoke plugin cleanup callback to solve this without
> restarting.

We can call the plugin cleanup callback from mold, but there are a few
problems:

First of all, it is not clear what state the plugin cleanup callback
resets the plugin to. It may reset it to the initial state, in which case
we need to restart everything from the `onload` callback, or it may not
deregister the functions registered by the previous `onload` call. Since
the exact semantics are not documented, the LLVM gold plugin may behave
differently from the GCC plugin.

Second, if we reset a plugin's internal state, we need to register all
input files again by calling the `claim_file_hook` callback, which in turn
calls the `add_symbols` callback. But we don't need any symbol information
at this point, because mold already knows what is in the LTO object files:
it has already called `claim_file_hook` on the same set of files.
So the `add_symbols` invocations would be ignored, which is a waste of
resources.

So, I prefer get_symbols_v3 over calling the plugin cleanup callback.

> (also, hm, it seems to confirm my idea that LTO .o files should have had the
> correct symbol table so normal linker algorithms would work)

I agree. If GCC LTO object files contained a correct ELF symbol table, we
could also eliminate the need for the special LTO-aware ar command. It
looks like it is a very common error to use an ar command that doesn't
understand LTO object files, which results in mysterious "undefined
symbol" errors even though the object files in the archive provide
those very symbols.
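
To make the get_symbols_v3 point above concrete, here is a minimal toy
model of the handshake. It is only a sketch: every name in it
(`toy_status`, `toy_file`, `toy_get_symbols`, `toy_files_to_compile`) is
made up for illustration and is not a declaration from the real
plugin-api.h, and the `needed` flag simply assumes the linker already
knows which .o files a non-LTO link would have extracted.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are NOT the declarations
   from the real plugin-api.h. */
enum toy_status { TOY_OK, TOY_NO_SYMS };

struct toy_file {
    const char *name;
    int needed; /* linker's verdict: would a non-LTO link extract this .o? */
};

/* Linker side. An older-style query always succeeds, so the plugin cannot
   learn that a file was extracted spuriously. A v3-style query can answer
   "this file is not part of the link". */
static enum toy_status toy_get_symbols(const struct toy_file *f,
                                       int api_version)
{
    if (api_version >= 3 && !f->needed)
        return TOY_NO_SYMS;
    return TOY_OK;
}

/* Plugin side. Collects the files that go into LTO code generation and
   returns how many were kept; `out` must have room for `n` entries. */
static size_t toy_files_to_compile(const struct toy_file *files, size_t n,
                                   int api_version, const char **out)
{
    size_t kept = 0;
    assert(files != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (toy_get_symbols(&files[i], api_version) == TOY_OK)
            out[kept++] = files[i].name;
    }
    return kept;
}
```

With three inputs where only the second one is not needed, a version-2
style query keeps all three files (constructors of the spurious one
included), while a version-3 style query keeps only the first and the
third, with no restart and no second round of `claim_file_hook` calls.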