From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 1579 invoked by alias); 2 May 2011 19:32:57 -0000 Received: (qmail 1561 invoked by uid 22791); 2 May 2011 19:32:55 -0000 X-SWARE-Spam-Status: No, hits=-1.7 required=5.0 tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,SPF_HELO_PASS,T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from smtp-out.google.com (HELO smtp-out.google.com) (216.239.44.51) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Mon, 02 May 2011 19:32:39 +0000 Received: from kpbe19.cbf.corp.google.com (kpbe19.cbf.corp.google.com [172.25.105.83]) by smtp-out.google.com with ESMTP id p42JWcWH018915 for ; Mon, 2 May 2011 12:32:38 -0700 Received: from yxe42 (yxe42.prod.google.com [10.190.2.42]) by kpbe19.cbf.corp.google.com with ESMTP id p42JW4Zd020608 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=NOT) for ; Mon, 2 May 2011 12:32:37 -0700 Received: by yxe42 with SMTP id 42so2083264yxe.30 for ; Mon, 02 May 2011 12:32:37 -0700 (PDT) MIME-Version: 1.0 Received: by 10.150.165.12 with SMTP id n12mr7874950ybe.16.1304364756984; Mon, 02 May 2011 12:32:36 -0700 (PDT) Received: by 10.150.192.11 with HTTP; Mon, 2 May 2011 12:32:36 -0700 (PDT) In-Reply-To: References: <20110429025248.90D61B21AB@azwildcat.mtv.corp.google.com> Date: Mon, 02 May 2011 19:32:00 -0000 Message-ID: Subject: Re: [google] Patch to support calling multi-versioned functions via new GCC builtin. (issue4440078) From: Xinliang David Li To: Sriraman Tallam Cc: Richard Guenther , reply@codereview.appspotmail.com, gcc-patches@gcc.gnu.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-System-Of-Record: true X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2011-05/txt/msg00109.txt.bz2 Ok. There may be more correctness related comments -- but those can be addressed when available. For trunk, you need to address issues such as multi-way dispatch. Thanks, David On Mon, May 2, 2011 at 12:11 PM, Sriraman Tallam wrot= e: > Hi, > > =A0I want to submit this patch to google/main to make sure it is > available for our internal use at Google in order to materialize some > optimization opportunities. Let us continue this dicussion as I make > changes and submit this for review for trunk. > > Thanks, > -Sri. > > > On Mon, May 2, 2011 at 9:41 AM, Xinliang David Li wr= ote: >> On Mon, May 2, 2011 at 2:11 AM, Richard Guenther >> wrote: >>> On Fri, Apr 29, 2011 at 6:23 PM, Xinliang David Li = wrote: >>>> Here is the background for this feature: >>>> >>>> 1) People relies on function multi-version to explore hw features and >>>> squeeze performance, but there is no standard ways of doing so, either >>>> a) using indirect function calls with function pointers set at program >>>> initialization; b) using manual dispatch at each callsite; b) using >>>> features like IFUNC. =A0The dispatch mechanism needs to be promoted to >>>> the language level and becomes the first class citizen; >>> >>> You are not doing that, you are inventing a new (crude) GCC extension. >> >> To capture the high level semantics and prevent user from lowering the >> dispatch calls into forms compiler can not recognize, language >> extension is the way to go. >> >>> >> >>>> See above. Multi-way dispatch can be added similarly. >>> >>> Not with the specified syntax. =A0So you'd end up with _two_ >>> language extensions. =A0That's bad (and unacceptable, IMHO). >> >> This part can be improved. >> >>> >>>> >>>>> >>>>> That's a nice idea, but why not simply process all functions with >>>>> a const attribute and no arguments this way? =A0IMHO >>>>> >>>>> int testsse (void) __attribute__((const)); >>>>> >>>>> is as good and avoids the new attribute completely. =A0The pass would >>>>> also benefit other uses of this idiom (giving a way to have global >>>>> dynamic initializers in C). =A0The functions may not be strictly 'con= st' >>>>> in a way the compiler autodetects this attribute but it presents >>>>> exactly the promises to the compiler that allow this optimization. >>>>> >>>>> Thus, please split this piece out into a separate patch and use >>>>> the const attribute. >>>> >>>> >>>> Sounds good. >>>> >> >>>> >>>> 1) the function selection may happen in a different function; >>> >>> Well, sure. =A0I propose to have the above as lowered form. =A0If people >>> deliberately obfuscate code it's their fault. =A0Your scheme simply >>> makes that obfuscation impossible (or less likely possible). =A0So >>> 1) is not a valid argument. >> >> Lowering too early may defeat the purpose because >> >> 1) the desired optimization may not happen subject to many compiler >> heuristic changes; >> 2) it has other side effects such as wrong estimation of function size >> which may impact inlining >> 3) it limits the lowering into one form which may not be ideal =A0-- >> with builtin_dispatch, after hoisting optimization, the lowering can >> use more efficient IFUNC scheme, for instance. >> >>> >>> >>> My point is that such optimization is completely independent of >>> that dispatch thing. =A0The above could as well be a selection to >>> use different input arrays, one causing alias analysis issues >>> and one not. >>> >>> Thus, a __builtin_dispatch centric optimization pass is the wrong >>> way to go. >> >> I agree that many things can implemented in different ways, but a high >> level standard builtin_dispatch mechanism doing hoisting >> interprocedcurally is cleaner and simpler and targeted for function >> multi-versioning -- it does not depend on/rely on later phase's >> heuristic tunings to make the right things to happen. Function MV >> deserves this level of treatment as it will become more and more >> important for some users (e.g., Google). >> >>> >>> Now, with FDO I'd expect the foo is inlined into bar (because foo >>> is deemed hot), >> >> That is a myth -- the truth is that there are other heuristics which >> can prevent this from happening. >> >>> then you only need to deal with loop unswitching, >>> which should be easy to drive from FDO. >> >> Same here -- the loop body may not be well formed/recognized. The loop >> nests may not be perfectly nested, or other unswitching heuristics may >> block it from happening. =A0This is the common problem form many other >> things that get lowered too early. It is cleaner to make the high >> level transformation first in IPA, and let unswitching dealing with >> intra-procedural optimization. >> >> Thanks, >> >> David >> >>> >>> Richard. >>> >>>> >>>> >>>> Thanks, >>>> >>>> David >>>> >>>> >>>> >>>>> >>>>> Note that we already have special FDO support for indirect to direct >>>>> call promotion, so that might work already. >>>>> >>>>> Thus, with all this, __builtin_dispatch would be more like syntactic >>>>> sugar to avoid writing above switch statement yourself. =A0If you res= trict >>>>> that sugar to a constant number of candidates it can be replaced by >>>>> a macro (or several ones, specialized for the number of candidates). >>>>> >>>>> Richard. >>>>> >>>>>> =A0For the call graph that looks like this before cloning : >>>>>> >>>>>> func_cold ----> func_hot ----> findOnes ----> IsPopCntSupported ----= > popcnt >>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0 =A0 =A0 =A0| >>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0 =A0 =A0 =A0-----> no_popcnt >>>>>> >>>>>> =A0where popcnt and no_popcnt are the multi-versioned functions, the= n after cloning : >>>>>> >>>>>> func_cold ---->IsPopCntSupported ----> func_hot_clone0 ----> findOne= s_clone0 ----> popcnt >>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0| >>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 ----> fu= nc_hot_clone1 ----> findOnes_clone1 ----> no_popcnt >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> =A0Flags : -fclone-hot-version-paths does function unswitching via c= loning. >>>>>> =A0 =A0 =A0 =A0 =A0--param=3Dnum-mversn-clones=3D allows to spe= cify the number of >>>>>> =A0 =A0 =A0 =A0 =A0functions that should be cloned. >>>>>> =A0 =A0 =A0 =A0 =A0--param=3Dmversn-clone-depth=3D allows to sp= ecify the length of >>>>>> =A0 =A0 =A0 =A0 =A0the call graph path that should be cloned. =A0num= =3D 0 implies only >>>>>> =A0 =A0 =A0 =A0 =A0leaf node that contains the __builtin_dispatch st= atement must be >>>>>> =A0 =A0 =A0 =A0 =A0cloned. >>>>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >>>>>> >>>>>> Google ref 45947, 45986 >>>>>> >>>>>> 2011-04-28 =A0Sriraman Tallam =A0 >>>>>> >>>>>> =A0 =A0 =A0 =A0* c-family/c-common.c =A0 (revision 173122) (handle_v= ersion_selector_attribute): New function. >>>>>> =A0 =A0 =A0 =A0(c_common_attribute_table): New attribute "version_se= lector". >>>>>> =A0 =A0 =A0 =A0* tree-pass.h =A0 (revision 173122) (pass_tree_conver= t_builtin_dispatch): New pass. >>>>>> =A0 =A0 =A0 =A0(pass_ipa_multiversion_dispatch): New pass. >>>>>> =A0 =A0 =A0 =A0* testsuite/gcc.dg/mversn7.c =A0 =A0(revision 0): New= test case. >>>>>> =A0 =A0 =A0 =A0* testsuite/gcc.dg/mversn4.c =A0 =A0(revision 0): New= test case. >>>>>> =A0 =A0 =A0 =A0* testsuite/gcc.dg/mversn4.h =A0 =A0(revision 0): New= test case. >>>>>> =A0 =A0 =A0 =A0* testsuite/gcc.dg/mversn4a.c =A0 (revision 0): New t= est case. >>>>>> =A0 =A0 =A0 =A0* testsuite/gcc.dg/torture/mversn1.c =A0 =A0(revision= 0): New test case. >>>>>> =A0 =A0 =A0 =A0* testsuite/gcc.dg/mversn2.c =A0 =A0(revision 0): New= test case. >>>>>> =A0 =A0 =A0 =A0* testsuite/gcc.dg/mversn6.c =A0 =A0(revision 0): New= test case. >>>>>> =A0 =A0 =A0 =A0* testsuite/gcc.dg/mversn3.c =A0 =A0(revision 0): New= test case. >>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/mversn8.C =A0 =A0(revision 0): New= test case. >>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/mversn10a.C =A0(revision 0): New t= est case. >>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/mversn14a.C =A0(revision 0): New t= est case. >>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/tree-prof/mversn13.C (revision 0):= New test case. >>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/tree-prof/mversn15.C (revision 0):= New test case. >>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/tree-prof/mversn15a.C =A0 =A0 =A0 = =A0(revision 0): New test case. >>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/mversn9.C =A0 =A0(revision 0): New= test case. >>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/mversn10.C =A0 (revision 0): New t= est case. >>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/mversn12.C =A0 (revision 0): New t= est case. >>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/mversn14.C =A0 (revision 0): New t= est case. >>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/mversn16.C =A0 (revision 0): New t= est case. >>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/torture/mversn11.C =A0 (revision 0= ): New test case. >>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/torture/mversn5.C =A0 =A0(revision= 0): New test case. >>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/torture/mversn5.h =A0 =A0(revision= 0): New test case. >>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/torture/mversn5a.C =A0 (revision 0= ): New test case. >>>>>> =A0 =A0 =A0 =A0* builtin-types.def =A0 =A0 (revision 173122) (BT_PTR= _FN_INT): New pointer type. >>>>>> =A0 =A0 =A0 =A0(BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR): New function type= for __builtin_dispatch. >>>>>> =A0 =A0 =A0 =A0* builtins.def =A0(revision 173122) (BUILT_IN_DISPATC= H): New builtin to >>>>>> =A0 =A0 =A0 =A0support multi-version calls. >>>>>> =A0 =A0 =A0 =A0* mversn-dispatch.c =A0 =A0 (revision 0): New file. >>>>>> =A0 =A0 =A0 =A0* timevar.def =A0 (revision 173122) (TV_MVERSN_DISPAT= CH): New time var. >>>>>> =A0 =A0 =A0 =A0* common.opt =A0 =A0(revision 173122) (fclone-hot-ver= sion-paths): New flag. >>>>>> =A0 =A0 =A0 =A0* Makefile.in =A0 (revision 173122) (mversn-dispatch.= o): New rule. >>>>>> =A0 =A0 =A0 =A0* passes.c =A0 =A0 =A0(revision 173122) (init_optimiz= ation_passes): Add the new >>>>>> =A0 =A0 =A0 =A0multi-version and dispatch passes to the pass list. >>>>>> =A0 =A0 =A0 =A0* params.def =A0 =A0(revision 173122) (PARAM_NUMBER_O= F_MVERSN_CLONES): Define. >>>>>> =A0 =A0 =A0 =A0(PARAM_MVERSN_CLONE_CGRAPH_DEPTH): Define. >>>>>> =A0 =A0 =A0 =A0* doc/invoke.texi =A0 =A0 =A0 (revision 173122) (mver= sn-clone-depth): Document. >>>>>> =A0 =A0 =A0 =A0(num-mversn-clones): Document. >>>>>> =A0 =A0 =A0 =A0(fclone-hot-version-paths): Document. >>>>> >>>> >>> >> >