From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 1624 invoked by alias); 29 Apr 2011 16:23:58 -0000 Received: (qmail 1588 invoked by uid 22791); 29 Apr 2011 16:23:54 -0000 X-SWARE-Spam-Status: No, hits=0.2 required=5.0 tests=AWL,BAYES_99,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,SPF_HELO_PASS,T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from smtp-out.google.com (HELO smtp-out.google.com) (74.125.121.67) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 29 Apr 2011 16:23:39 +0000 Received: from kpbe17.cbf.corp.google.com (kpbe17.cbf.corp.google.com [172.25.105.81]) by smtp-out.google.com with ESMTP id p3TGNaH0015472 for ; Fri, 29 Apr 2011 09:23:37 -0700 Received: from yic13 (yic13.prod.google.com [10.243.65.141]) by kpbe17.cbf.corp.google.com with ESMTP id p3TGNZ2R012763 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=NOT) for ; Fri, 29 Apr 2011 09:23:35 -0700 Received: by yic13 with SMTP id 13so2005457yic.17 for ; Fri, 29 Apr 2011 09:23:34 -0700 (PDT) MIME-Version: 1.0 Received: by 10.150.170.12 with SMTP id s12mr2005971ybe.358.1304094214735; Fri, 29 Apr 2011 09:23:34 -0700 (PDT) Received: by 10.150.192.11 with HTTP; Fri, 29 Apr 2011 09:23:34 -0700 (PDT) In-Reply-To: References: <20110429025248.90D61B21AB@azwildcat.mtv.corp.google.com> Date: Fri, 29 Apr 2011 16:55:00 -0000 Message-ID: Subject: Re: [google] Patch to support calling multi-versioned functions via new GCC builtin. (issue4440078) From: Xinliang David Li To: Richard Guenther Cc: Sriraman Tallam , reply@codereview.appspotmail.com, gcc-patches@gcc.gnu.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-System-Of-Record: true X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2011-04/txt/msg02339.txt.bz2 Here is the background for this feature: 1) People relies on function multi-version to explore hw features and squeeze performance, but there is no standard ways of doing so, either a) using indirect function calls with function pointers set at program initialization; b) using manual dispatch at each callsite; b) using features like IFUNC. The dispatch mechanism needs to be promoted to the language level and becomes the first class citizen; 2) Most importantly, the above non standard approaches block interprocedural optimizations such as inlining. With the introduction of buitlin_dispatch and the clear semantics, compiler can do more aggressive optimization. Multi-way dispatch will be good, but most of the use cases we have seen is 2-way -- a fast path + a default path. Who are the targeted consumer of the feature? 1) For developers who like to MV function manually; 2) For user directed function cloning e.g, int foo (...) __attribute__((clone ("sse")): 3) Automatic function cloning determined by compiler. > > I am not sure that combining the function choice and its invocation > to a single builtin is good. =A0First, you use variadic arguments for > the actual function arguments which will cause vararg promotions > to apply and thus it will be impossible to dispatch to functions > taking single precision float values (and dependent on ABI details > many more similar issues). =A0Second, you restrict yourself to > only two versions of a function - that looks quite limited and this > restriction is not easy to lift with your proposed interface. See above. Multi-way dispatch can be added similarly. > > That's a nice idea, but why not simply process all functions with > a const attribute and no arguments this way? =A0IMHO > > int testsse (void) __attribute__((const)); > > is as good and avoids the new attribute completely. =A0The pass would > also benefit other uses of this idiom (giving a way to have global > dynamic initializers in C). =A0The functions may not be strictly 'const' > in a way the compiler autodetects this attribute but it presents > exactly the promises to the compiler that allow this optimization. > > Thus, please split this piece out into a separate patch and use > the const attribute. Sounds good. > >> =A0What happens with cloning, -fclone-hot-version-paths ? >> =A0----------------------------------------------------- > > Now, here you lost me somewhat, because I didn't look into the > patch details and I am missing an example on how the lowered > IL looks before that cloning. =A0So for now I suppose this > -fclone-hot-version-paths > is to expose direct calls to the IL. =A0If you would lower __builtin_disp= atch > to a style like > > =A0int sel =3D selector (); > =A0switch (sel) > =A0 =A0 { > =A0 =A0 case 0: > =A0 =A0 =A0 fn =3D popcnt; > =A0 =A0 =A0break; > =A0 =A0 case 1: > =A0 =A0 =A0 fn =3D popcnt_sse4; > =A0 =A0 =A0 break; > =A0 =A0 } > =A0 res =3D (*fn) (25); > > then rather than a new pass specialized for __builtin_dispatch existing > optimization passes that are able to do tracing like VRP and DOM > via jump-threading or the tracer pass should be able to do this > optimization for you. =A0If they do not use FDO in a good way it is better > to improve them for this case instead of writing a new pass. What you describe may not work 1) the function selection may happen in a different function; 2) Compiler will need to hoist the selection into the program initializer to avoid overhead As an example of why dispatch hoisting and call chain cloning is needed: void foo(); void bar(); void doit_v1(); void doit_v2(); bool check_v () __attribute__((const)); int test(); void bar() { .... for (.....) { foo(); .... } } void foo() { ... for (...) { __builtin_dispatch(check_v, doit_v1, doit_v2,...); ... } } int test () { .. bar (); } The feature testing and dispatch is embedded in a 2-deep loop nest crossing function boundaries. The call paths test --> bar --> foo needs to be cloned. This is done by hoisting dispatch up the call chain -- it ends up : void bar_v1() { .... for (..) { foo_v1 (); } .. } void bar_v2 () { ... for (..) { foo_v2(); } .. } void foo_v1 () { .. for () { doit_v1() } } void foo_v2 () { .. for () { doit_v2() } } int test () { __builtin_dispatch(check_v, bar_v1, bar_v2); .. } Thanks, David > > Note that we already have special FDO support for indirect to direct > call promotion, so that might work already. > > Thus, with all this, __builtin_dispatch would be more like syntactic > sugar to avoid writing above switch statement yourself. =A0If you restrict > that sugar to a constant number of candidates it can be replaced by > a macro (or several ones, specialized for the number of candidates). > > Richard. > >> =A0For the call graph that looks like this before cloning : >> >> func_cold ----> func_hot ----> findOnes ----> IsPopCntSupported ----> po= pcnt >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0 =A0 =A0| >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0 =A0 =A0-----> no_popcnt >> >> =A0where popcnt and no_popcnt are the multi-versioned functions, then af= ter cloning : >> >> func_cold ---->IsPopCntSupported ----> func_hot_clone0 ----> findOnes_cl= one0 ----> popcnt >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0| >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 ----> func_h= ot_clone1 ----> findOnes_clone1 ----> no_popcnt >> >> >> >> >> =A0Flags : -fclone-hot-version-paths does function unswitching via cloni= ng. >> =A0 =A0 =A0 =A0 =A0--param=3Dnum-mversn-clones=3D allows to specify= the number of >> =A0 =A0 =A0 =A0 =A0functions that should be cloned. >> =A0 =A0 =A0 =A0 =A0--param=3Dmversn-clone-depth=3D allows to specif= y the length of >> =A0 =A0 =A0 =A0 =A0the call graph path that should be cloned. =A0num =3D= 0 implies only >> =A0 =A0 =A0 =A0 =A0leaf node that contains the __builtin_dispatch statem= ent must be >> =A0 =A0 =A0 =A0 =A0cloned. >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> >> Google ref 45947, 45986 >> >> 2011-04-28 =A0Sriraman Tallam =A0 >> >> =A0 =A0 =A0 =A0* c-family/c-common.c =A0 (revision 173122) (handle_versi= on_selector_attribute): New function. >> =A0 =A0 =A0 =A0(c_common_attribute_table): New attribute "version_select= or". >> =A0 =A0 =A0 =A0* tree-pass.h =A0 (revision 173122) (pass_tree_convert_bu= iltin_dispatch): New pass. >> =A0 =A0 =A0 =A0(pass_ipa_multiversion_dispatch): New pass. >> =A0 =A0 =A0 =A0* testsuite/gcc.dg/mversn7.c =A0 =A0(revision 0): New tes= t case. >> =A0 =A0 =A0 =A0* testsuite/gcc.dg/mversn4.c =A0 =A0(revision 0): New tes= t case. >> =A0 =A0 =A0 =A0* testsuite/gcc.dg/mversn4.h =A0 =A0(revision 0): New tes= t case. >> =A0 =A0 =A0 =A0* testsuite/gcc.dg/mversn4a.c =A0 (revision 0): New test = case. >> =A0 =A0 =A0 =A0* testsuite/gcc.dg/torture/mversn1.c =A0 =A0(revision 0):= New test case. >> =A0 =A0 =A0 =A0* testsuite/gcc.dg/mversn2.c =A0 =A0(revision 0): New tes= t case. >> =A0 =A0 =A0 =A0* testsuite/gcc.dg/mversn6.c =A0 =A0(revision 0): New tes= t case. >> =A0 =A0 =A0 =A0* testsuite/gcc.dg/mversn3.c =A0 =A0(revision 0): New tes= t case. >> =A0 =A0 =A0 =A0* testsuite/g++.dg/mversn8.C =A0 =A0(revision 0): New tes= t case. >> =A0 =A0 =A0 =A0* testsuite/g++.dg/mversn10a.C =A0(revision 0): New test = case. >> =A0 =A0 =A0 =A0* testsuite/g++.dg/mversn14a.C =A0(revision 0): New test = case. >> =A0 =A0 =A0 =A0* testsuite/g++.dg/tree-prof/mversn13.C (revision 0): New= test case. >> =A0 =A0 =A0 =A0* testsuite/g++.dg/tree-prof/mversn15.C (revision 0): New= test case. >> =A0 =A0 =A0 =A0* testsuite/g++.dg/tree-prof/mversn15a.C =A0 =A0 =A0 =A0(= revision 0): New test case. >> =A0 =A0 =A0 =A0* testsuite/g++.dg/mversn9.C =A0 =A0(revision 0): New tes= t case. >> =A0 =A0 =A0 =A0* testsuite/g++.dg/mversn10.C =A0 (revision 0): New test = case. >> =A0 =A0 =A0 =A0* testsuite/g++.dg/mversn12.C =A0 (revision 0): New test = case. >> =A0 =A0 =A0 =A0* testsuite/g++.dg/mversn14.C =A0 (revision 0): New test = case. >> =A0 =A0 =A0 =A0* testsuite/g++.dg/mversn16.C =A0 (revision 0): New test = case. >> =A0 =A0 =A0 =A0* testsuite/g++.dg/torture/mversn11.C =A0 (revision 0): N= ew test case. >> =A0 =A0 =A0 =A0* testsuite/g++.dg/torture/mversn5.C =A0 =A0(revision 0):= New test case. >> =A0 =A0 =A0 =A0* testsuite/g++.dg/torture/mversn5.h =A0 =A0(revision 0):= New test case. >> =A0 =A0 =A0 =A0* testsuite/g++.dg/torture/mversn5a.C =A0 (revision 0): N= ew test case. >> =A0 =A0 =A0 =A0* builtin-types.def =A0 =A0 (revision 173122) (BT_PTR_FN_= INT): New pointer type. >> =A0 =A0 =A0 =A0(BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR): New function type for= __builtin_dispatch. >> =A0 =A0 =A0 =A0* builtins.def =A0(revision 173122) (BUILT_IN_DISPATCH): = New builtin to >> =A0 =A0 =A0 =A0support multi-version calls. >> =A0 =A0 =A0 =A0* mversn-dispatch.c =A0 =A0 (revision 0): New file. >> =A0 =A0 =A0 =A0* timevar.def =A0 (revision 173122) (TV_MVERSN_DISPATCH):= New time var. >> =A0 =A0 =A0 =A0* common.opt =A0 =A0(revision 173122) (fclone-hot-version= -paths): New flag. >> =A0 =A0 =A0 =A0* Makefile.in =A0 (revision 173122) (mversn-dispatch.o): = New rule. >> =A0 =A0 =A0 =A0* passes.c =A0 =A0 =A0(revision 173122) (init_optimizatio= n_passes): Add the new >> =A0 =A0 =A0 =A0multi-version and dispatch passes to the pass list. >> =A0 =A0 =A0 =A0* params.def =A0 =A0(revision 173122) (PARAM_NUMBER_OF_MV= ERSN_CLONES): Define. >> =A0 =A0 =A0 =A0(PARAM_MVERSN_CLONE_CGRAPH_DEPTH): Define. >> =A0 =A0 =A0 =A0* doc/invoke.texi =A0 =A0 =A0 (revision 173122) (mversn-c= lone-depth): Document. >> =A0 =A0 =A0 =A0(num-mversn-clones): Document. >> =A0 =A0 =A0 =A0(fclone-hot-version-paths): Document. >