Date: Thu, 24 Sep 2020 10:03:38 +0200 (CEST)
From: Richard Biener
To: Tobias Burnus
Cc: gcc-patches, Jakub Jelinek, Jan Hubicka
Subject: Re: [Patch] LTO: Force externally_visible for offload_vars/funcs (PR97179)

On Thu, 24 Sep 2020, Tobias Burnus wrote:

> On 9/24/20 9:03 AM, Richard Biener wrote:
>
> > Hmm, but offload_vars and offload_funcs do not need to be exported
> > since
> > they get stored into tables with addresses pointing to them
> > (and that table is exported).
>
> Granted but the x86-64 linker does not seem to be able to resolve
> the symbol if the table is in a.ltrans0.ltrans.o and the variable
> or function is in a.ltrans1.ltrans.o
>
> That's both host/x86-64 code; the linker might not see that the
> table is used by a dynamic library - but still it should resolve
> the links, shouldn't it?
>
> Possibly, the 'externally_visible = 1' in my code is also a
> red herring; it also works by using:
>   TREE_PUBLIC (decl) = 1;
>   gcc_assert (!node->offloadable);
>   node->offloadable = 1;
> and below
>   if (node->offloadable)
>     {
>       node->offloadable = 0;
>       validize_symbol_for_target (node);
>       continue;
>     }
> Namely: PUBLIC + avoid calling promote_symbol.
>
> > Note that ultimately the desired visibility is determined by
> > the linker and communicated via the resolution file to the WPA
> > stage.  I'm not sure whether both host and offload code participate
> > in the same link and thus if the offload tables are properly
> > seen as being referenced
>
> This could be the problem. The device part is linked by the
> host/x86-64 linker - but the device's ".o" files are just linked
> and not processed by 'ld'. (In case of nvptx, they are host
> compiled .o files which contain everything as strings with the
> nvptx as text - to be passed to the JIT at startup.)
>
> Note that *no* WPA/LTO is done on the device side - there only all
> generated files are collected without any inter-file
> optimizations. (Sufficient for the code generated by the program,
> which is all in one file - but it still would be useful to
> inline, e.g., libm functions.)
>
> > (for a non-DSO symbols are usually _not_
> > force-exported) - so, how is the offload table constructed?
>
> First, the offload tables exist both on the host and on the
> device(s). They have to be identical as otherwise the
> association between variables and function is lost.
>
> The symbols are added to offload_vars + offload_funcs.
>
> In lto-cgraph.c's output_offload_tables there is the last chance
> to remove now unused nodes - as once the tables are streamed
> for device usage, they cannot be changed. Hence, there one
> has
>   node->force_output = 1;
> [Unrelated: this prevents later optimizations, which still
> could be done; cf. PR95622]
>
>
> The table itself is written in omp-offload.c's omp_finish_file.

But this is called at LTRANS time only, in particular we seem to
stream the offload_funcs/vars array, marking streamed nodes as
force_output but we do not make the offload table visible to
the partitioner.  But force_output should make the nodes not
renamed.  But then output_offload_tables is called at the very
end and we likely do not stream the altered force_output state.

So - can you try, in prune_offload_funcs, in addition to setting
DECL_PRESERVE_P, mark the cgraph node ->force_output so this
happens early?  I guess the same is needed for variables
(there's no prune_offload_vars ...).

> For the host, the constructor is constructed in
> add_decls_addresses_to_decl_constructor, which does:
>   CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, addr);
>   if (is_var)
>     CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, size);
> and then in omp_finish_file:
>   tree funcs_decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
>                                 get_identifier (".offload_func_table"),
>                                 funcs_decl_type);
>   DECL_USER_ALIGN (funcs_decl) = DECL_USER_ALIGN (vars_decl) = 1;
>   SET_DECL_ALIGN (funcs_decl, TYPE_ALIGN (funcs_decl_type));
>   DECL_INITIAL (funcs_decl) = ctor_f;
>   set_decl_section_name (funcs_decl, OFFLOAD_FUNC_TABLE_SECTION_NAME);
>   varpool_node::finalize_decl (vars_decl);
>
> Tobias
>
> -----------------
> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander
> Walter
>

-- 
Richard Biener
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend