From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ed1-x536.google.com (mail-ed1-x536.google.com [IPv6:2a00:1450:4864:20::536]) by sourceware.org (Postfix) with ESMTPS id 929EF3858C60 for ; Thu, 2 Sep 2021 11:52:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 929EF3858C60 Received: by mail-ed1-x536.google.com with SMTP id r7so2458734edd.6 for ; Thu, 02 Sep 2021 04:52:10 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=S7xUY9KK41OhZ/4AN2ulXFNU4pj8GOenG2i5h8XSCGU=; b=QQqsWZiizCXY5/OPiHROsxYtdNGnWia4GB6CBAOzDDnOalKAuKHhpNG/AYrEiISGPU uj+3fiZRCwqMKqE2AoFqbT+KQWhwcJYGMInhoOvmoWFkFC1fES+3rXZehi9nz5yM6yGH 4IDnGsspFnBsWNJ/6xgjTkNGszQSCvudjJd/FvFZSw6nDlvDsDB/cZdv4wwvL4WjhHGK LmdpePJAtXa2Y7WJyFYhjqQx7XAuLArVpW1FIvKSlODCr/dxJZHKunSHLGp07y6aKzYl 3GtzOn7d8CRQMXdqHHOCg5yg/fBSG9fW1s+NU5ShaBsMFswPQqBkRpr0aX2nN/Ah4EI9 RRig== X-Gm-Message-State: AOAM533OwEg9ih4xgeSY1ggklsrlVYMduLLrkKy1axuydmDgy3KvfB8G t2GPT3YQ7kUuRe81XfNi+9U7An2/ctrI228skhQ= X-Google-Smtp-Source: ABdhPJyF8zV1U84XgZMSNnp4J2yJmcF6MahMFBVQZIMJNvIrrbjio7cb5FwHQTGzWOqCMPUYzW4ethRiLyAFii95K3Q= X-Received: by 2002:a05:6402:5188:: with SMTP id q8mr3262955edd.138.1630583528984; Thu, 02 Sep 2021 04:52:08 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Richard Biener Date: Thu, 2 Sep 2021 13:51:58 +0200 Message-ID: Subject: Re: [RFC/PATCH] ipa-inline: Add target info into fn summary [PR102059] To: "Kewen.Lin" Cc: GCC Patches , Jan Hubicka , =?UTF-8?Q?Martin_Li=C5=A1ka?= , Segher Boessenkool , Bill Schmidt , Florian Weimer , Martin Jambor Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-3.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Sep 2021 11:52:21 -0000 On Thu, Sep 2, 2021 at 1:13 PM Kewen.Lin wrote: > > Hi Richi, > > Thanks for the comments! > > on 2021/9/2 =E4=B8=8B=E5=8D=885:25, Richard Biener wrote: > > On Wed, Sep 1, 2021 at 9:02 AM Kewen.Lin wrote: > >> > >> Hi! > >> > >> Power ISA 2.07 (Power8) introduces transactional memory feature > >> but ISA3.1 (Power10) removes it. It exposes one troublesome > >> issue as PR102059 shows. Users define some function with > >> target pragma cpu=3Dpower10 then it calls one function with > >> attribute always_inline which inherits command line option > >> cpu=3Dpower8 which enables HTM implicitly. The current isa_flags > >> check doesn't allow this inlining due to "target specific > >> option mismatch" and error mesasge is emitted. > >> > >> Normally, the callee function isn't intended to exploit HTM > >> feature, but the default flag setting make it look it has. > >> As Richi raised in the PR, we have fp_expressions flag in > >> function summary, and allow us to check the function actually > >> contains any floating point expressions to avoid overkill. > >> So this patch follows the similar idea but is more target > >> specific, for this rs6000 port specific requirement on HTM > >> feature check, we would like to check rs6000 specific HTM > >> built-in functions and inline assembly, it allows targets > >> to do their own customized checks and updates. > >> > >> It introduces two target hooks need_ipa_fn_target_info and > >> update_ipa_fn_target_info. The former allows target to do > >> some previous check and decides to collect target specific > >> information for this function or not. For some special case, > >> it can predict the analysis result and push it early without > >> any scannings. The latter allows the analyze_function_body > >> to pass gimple stmts down just like fp_expressions handlings, > >> target can do its own tricks. I put them as one hook initially > >> with one boolean to indicates whether it's initial time, but > >> the code looks a bit ugly, to separate them seems to have > >> better readability. > >> > >> To make it simple, this patch uses HOST_WIDE_INT to record the > >> flags just like what we use for isa_flags. For rs6000's HTM > >> need, one HOST_WIDE_INT variable is quite enough, but it seems > >> good to have one auto_vec for scalability as I noticed some > >> targets have more than one HOST_WIDE_INT flag. For now, this > >> target information collection is only for always_inline function, > >> function ipa_merge_fn_summary_after_inlining deals with target > >> information merging. > >> > >> The patch has been bootstrapped and regress-tested on > >> powerpc64le-linux-gnu Power9. > >> > >> Is it on the right track? > > > > + if (always_inline) > > + { > > + cgraph_node *callee_node =3D cgraph_node::get (callee); > > + if (ipa_fn_summaries && ipa_fn_summaries->get (callee_node) !=3D= NULL) > > + { > > + if (dump_file) > > + ipa_dump_fn_summary (dump_file, callee_node); > > + const vec &info =3D > > + ipa_fn_summaries->get (callee_node)->target_info; > > + if (!info.is_empty ()) > > + always_inline_safe_mask |=3D ~info[0] & OPTION_MASK_HTM; > > + } > > + > > + caller_isa &=3D ~always_inline_safe_mask; > > + callee_isa &=3D ~always_inline_safe_mask; > > + } > > > > that's a bit convoluted but obviously the IPA info can be used for > > non-always_inline cases as well. > > > > As said above the info can be produced for not always-inline functions > > as well, the usual case would be for LTO inlining across TUs compiled > > with different target options. In your case the special -mcpu=3Dpower1= 0 > > TU would otherwise not be able to inline from a general -mcpu=3Dpower8 = TU. > > > > Agree it can be extended to non-always_inline cases. Since always_inline > is kind of user "forced" requirement and compiler emits error if it fails > to inline, while non-always_inline will have warning instead. Considerin= g > the scanning might be considered as costly for some big functions, I > guessed it might be good to start from always_inline as the first step. > But if different target options among LTO TUs is a common user case, I > think it's worth to extending it now. I was merely looking at this from the perspective of test coverage - with restricting it to always-inline we're not going to notice issues very reliably I think. > > On the streaming side we possibly have to take care about the > > GPU offloading path where we likely want to avoid pushing host target > > bits to the GPU target in some way. > > > > I guess this comment is about lto_stream_offload_p, I just did some quick > checks, this flag seems to guard things into section offload_lto, while > the function summary has its own section, it seems fine? Yes, but the data, since its target specific, is interpreted different by the host target than by the offload target so IMHO we should drop this to a conservative state when offloading? > > Your case is specifically looking for HTM target builtins - for more ge= neral > > cases, like for example deciding whether we can inline across, say, > > -mlzcnt on x86 the scanning would have to include at least direct inter= nal-fns > > mapping to optabs that could change. With such inlining we might also > > work against heuristic tuning decisions based on the ISA availability > > making code (much) more expensive to expand without such ISA availabili= ty, > > but that wouldn't be a correctness issue then. > > OK=EF=BC=8CI would assume the hook function parameter gimple* will be als= o enough for > this example. :) Yes, I think so. > IMHO, even with this target information collection, we are unable to chec= k all > ISA features, it can only work for some "dull" ISA features, like HTM on > Power which can only be exploited by builtin (or inline asm), the downstr= eam > passes don't try to exploit it in other ways. For some features like VSX= , > vectorization pass can produce vector code after we generating fn summary= and > think it doesn't use, it seems no way to get it right in early stage of p= ass > pipeline. I don't think later passes are an issue - they would operate under the constraints of the caller flag and thus still generate valid code. Yes, if, say, someb= ody disabled VSX on a function in the attempt to not vectorize an unprofitable = case and that function gets inlined it might end up using VSX (as now active in = the caller) to vectorize the unprofitable case. But in general it should work for any ISA feature (and some ISA features might just change what we expand common GIMPLE to - differences in those ISA features do not need to prevent inlining from a correctness perspective). Richard. > > > > Otherwise the overall bits look OK but I'll leave the details to our IP= A folks. > > > > Thanks again! > > BR, > Kewen > > > Thanks, > > Richard. > > > >> > >> Any comments are highly appreciated! > >> > >> BR, > >> Kewen > >> ------ > >> gcc/ChangeLog: > >> > >> PR ipa/102059 > >> * config/rs6000/rs6000-call.c (rs6000_builtin_mask_set_p): New > >> function. > >> * config/rs6000/rs6000-internal.h (rs6000_builtin_mask_set_p):= New > >> declare. > >> * config/rs6000/rs6000.c (TARGET_NEED_IPA_FN_TARGET_INFO): New= macro. > >> (TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise. > >> (rs6000_need_ipa_fn_target_info): New function. > >> (rs6000_update_ipa_fn_target_info): Likewise. > >> (rs6000_can_inline_p): Adjust for ipa function summary target = info. > >> * ipa-fnsummary.c (ipa_dump_fn_summary): Adjust for ipa functi= on > >> summary target info. > >> (analyze_function_body): Adjust for ipa function summary targe= t > >> info and call hook rs6000_need_ipa_fn_target_info and > >> rs6000_update_ipa_fn_target_info. > >> (ipa_merge_fn_summary_after_inlining): Adjust for ipa function > >> summary target info. > >> (inline_read_section): Likewise. > >> (ipa_fn_summary_write): Likewise. > >> * ipa-fnsummary.h (ipa_fn_summary::target_info): New member. > >> * doc/tm.texi: Regenerate. > >> * doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document = new > >> hook. > >> (TARGET_NEED_IPA_FN_TARGET_INFO): Likewise. > >> * target.def (update_ipa_fn_target_info): New hook. > >> (need_ipa_fn_target_info): Likewise. > >> * targhooks.c (default_need_ipa_fn_target_info): New function. > >> (default_update_ipa_fn_target_info): Likewise. > >> * targhooks.h (default_update_ipa_fn_target_info): New declare= . > >> (default_need_ipa_fn_target_info): Likewise. > >> > >> gcc/testsuite/ChangeLog: > >> > >> PR ipa/102059 > >> * gcc.dg/lto/pr102059_0.c: New test. > >> * gcc.dg/lto/pr102059_1.c: New test. > >> * gcc.dg/lto/pr102059_2.c: New test. > >> * gcc.target/powerpc/pr102059-5.c: New test. > >> * gcc.target/powerpc/pr102059-6.c: New test. > >>