From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 26053 invoked by alias); 2 May 2011 20:38:29 -0000 Received: (qmail 25856 invoked by uid 22791); 2 May 2011 20:38:27 -0000 X-SWARE-Spam-Status: No, hits=-1.2 required=5.0 tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,SPF_HELO_PASS,T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from smtp-out.google.com (HELO smtp-out.google.com) (216.239.44.51) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Mon, 02 May 2011 20:38:11 +0000 Received: from wpaz29.hot.corp.google.com (wpaz29.hot.corp.google.com [172.24.198.93]) by smtp-out.google.com with ESMTP id p42KcAo7016140 for ; Mon, 2 May 2011 13:38:10 -0700 Received: from gwb1 (gwb1.prod.google.com [10.200.2.1]) by wpaz29.hot.corp.google.com with ESMTP id p42KbOHx029477 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=NOT) for ; Mon, 2 May 2011 13:38:09 -0700 Received: by gwb1 with SMTP id 1so3031639gwb.22 for ; Mon, 02 May 2011 13:38:08 -0700 (PDT) MIME-Version: 1.0 Received: by 10.100.46.2 with SMTP id t2mr802352ant.87.1304368688727; Mon, 02 May 2011 13:38:08 -0700 (PDT) Received: by 10.101.54.3 with HTTP; Mon, 2 May 2011 13:38:08 -0700 (PDT) In-Reply-To: References: <20110429025248.90D61B21AB@azwildcat.mtv.corp.google.com> Date: Mon, 02 May 2011 20:38:00 -0000 Message-ID: Subject: Re: [google] Patch to support calling multi-versioned functions via new GCC builtin. (issue4440078) From: Sriraman Tallam To: Xinliang David Li Cc: Richard Guenther , reply@codereview.appspotmail.com, gcc-patches@gcc.gnu.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-System-Of-Record: true X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2011-05/txt/msg00113.txt.bz2 Thanks. I committed this patch. -Sri. svn commit Sending gcc/ChangeLog.google-main Sending gcc/Makefile.in Sending gcc/builtin-types.def Sending gcc/builtins.def Sending gcc/c-family/c-common.c Sending gcc/common.opt Sending gcc/doc/invoke.texi Adding gcc/mversn-dispatch.c Sending gcc/params.def Sending gcc/passes.c Adding gcc/testsuite/g++.dg/mversn10.C Adding gcc/testsuite/g++.dg/mversn10a.C Adding gcc/testsuite/g++.dg/mversn12.C Adding gcc/testsuite/g++.dg/mversn14.C Adding gcc/testsuite/g++.dg/mversn14a.C Adding gcc/testsuite/g++.dg/mversn16.C Adding gcc/testsuite/g++.dg/mversn8.C Adding gcc/testsuite/g++.dg/mversn9.C Adding gcc/testsuite/g++.dg/torture/mversn11.C Adding gcc/testsuite/g++.dg/torture/mversn5.C Adding gcc/testsuite/g++.dg/torture/mversn5.h Adding gcc/testsuite/g++.dg/torture/mversn5a.C Adding gcc/testsuite/g++.dg/tree-prof/mversn13.C Adding gcc/testsuite/g++.dg/tree-prof/mversn15.C Adding gcc/testsuite/g++.dg/tree-prof/mversn15a.C Adding gcc/testsuite/gcc.dg/mversn2.c Adding gcc/testsuite/gcc.dg/mversn3.c Adding gcc/testsuite/gcc.dg/mversn4.c Adding gcc/testsuite/gcc.dg/mversn4.h Adding gcc/testsuite/gcc.dg/mversn4a.c Adding gcc/testsuite/gcc.dg/mversn6.c Adding gcc/testsuite/gcc.dg/mversn7.c Adding gcc/testsuite/gcc.dg/torture/mversn1.c Sending gcc/timevar.def Sending gcc/tree-pass.h Transmitting file data ................................... Committed revision 173271. On Mon, May 2, 2011 at 12:32 PM, Xinliang David Li wro= te: > Ok. There may be more correctness related comments -- but those can be > addressed when available. For trunk, you need to address issues such > as multi-way dispatch. > > Thanks, > > David > > On Mon, May 2, 2011 at 12:11 PM, Sriraman Tallam wr= ote: >> Hi, >> >> =A0I want to submit this patch to google/main to make sure it is >> available for our internal use at Google in order to materialize some >> optimization opportunities. Let us continue this dicussion as I make >> changes and submit this for review for trunk. >> >> Thanks, >> -Sri. >> >> >> On Mon, May 2, 2011 at 9:41 AM, Xinliang David Li w= rote: >>> On Mon, May 2, 2011 at 2:11 AM, Richard Guenther >>> wrote: >>>> On Fri, Apr 29, 2011 at 6:23 PM, Xinliang David Li wrote: >>>>> Here is the background for this feature: >>>>> >>>>> 1) People relies on function multi-version to explore hw features and >>>>> squeeze performance, but there is no standard ways of doing so, either >>>>> a) using indirect function calls with function pointers set at program >>>>> initialization; b) using manual dispatch at each callsite; b) using >>>>> features like IFUNC. =A0The dispatch mechanism needs to be promoted to >>>>> the language level and becomes the first class citizen; >>>> >>>> You are not doing that, you are inventing a new (crude) GCC extension. >>> >>> To capture the high level semantics and prevent user from lowering the >>> dispatch calls into forms compiler can not recognize, language >>> extension is the way to go. >>> >>>> >>> >>>>> See above. Multi-way dispatch can be added similarly. >>>> >>>> Not with the specified syntax. =A0So you'd end up with _two_ >>>> language extensions. =A0That's bad (and unacceptable, IMHO). >>> >>> This part can be improved. >>> >>>> >>>>> >>>>>> >>>>>> That's a nice idea, but why not simply process all functions with >>>>>> a const attribute and no arguments this way? =A0IMHO >>>>>> >>>>>> int testsse (void) __attribute__((const)); >>>>>> >>>>>> is as good and avoids the new attribute completely. =A0The pass would >>>>>> also benefit other uses of this idiom (giving a way to have global >>>>>> dynamic initializers in C). =A0The functions may not be strictly 'co= nst' >>>>>> in a way the compiler autodetects this attribute but it presents >>>>>> exactly the promises to the compiler that allow this optimization. >>>>>> >>>>>> Thus, please split this piece out into a separate patch and use >>>>>> the const attribute. >>>>> >>>>> >>>>> Sounds good. >>>>> >>> >>>>> >>>>> 1) the function selection may happen in a different function; >>>> >>>> Well, sure. =A0I propose to have the above as lowered form. =A0If peop= le >>>> deliberately obfuscate code it's their fault. =A0Your scheme simply >>>> makes that obfuscation impossible (or less likely possible). =A0So >>>> 1) is not a valid argument. >>> >>> Lowering too early may defeat the purpose because >>> >>> 1) the desired optimization may not happen subject to many compiler >>> heuristic changes; >>> 2) it has other side effects such as wrong estimation of function size >>> which may impact inlining >>> 3) it limits the lowering into one form which may not be ideal =A0-- >>> with builtin_dispatch, after hoisting optimization, the lowering can >>> use more efficient IFUNC scheme, for instance. >>> >>>> >>>> >>>> My point is that such optimization is completely independent of >>>> that dispatch thing. =A0The above could as well be a selection to >>>> use different input arrays, one causing alias analysis issues >>>> and one not. >>>> >>>> Thus, a __builtin_dispatch centric optimization pass is the wrong >>>> way to go. >>> >>> I agree that many things can implemented in different ways, but a high >>> level standard builtin_dispatch mechanism doing hoisting >>> interprocedcurally is cleaner and simpler and targeted for function >>> multi-versioning -- it does not depend on/rely on later phase's >>> heuristic tunings to make the right things to happen. Function MV >>> deserves this level of treatment as it will become more and more >>> important for some users (e.g., Google). >>> >>>> >>>> Now, with FDO I'd expect the foo is inlined into bar (because foo >>>> is deemed hot), >>> >>> That is a myth -- the truth is that there are other heuristics which >>> can prevent this from happening. >>> >>>> then you only need to deal with loop unswitching, >>>> which should be easy to drive from FDO. >>> >>> Same here -- the loop body may not be well formed/recognized. The loop >>> nests may not be perfectly nested, or other unswitching heuristics may >>> block it from happening. =A0This is the common problem form many other >>> things that get lowered too early. It is cleaner to make the high >>> level transformation first in IPA, and let unswitching dealing with >>> intra-procedural optimization. >>> >>> Thanks, >>> >>> David >>> >>>> >>>> Richard. >>>> >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> David >>>>> >>>>> >>>>> >>>>>> >>>>>> Note that we already have special FDO support for indirect to direct >>>>>> call promotion, so that might work already. >>>>>> >>>>>> Thus, with all this, __builtin_dispatch would be more like syntactic >>>>>> sugar to avoid writing above switch statement yourself. =A0If you re= strict >>>>>> that sugar to a constant number of candidates it can be replaced by >>>>>> a macro (or several ones, specialized for the number of candidates). >>>>>> >>>>>> Richard. >>>>>> >>>>>>> =A0For the call graph that looks like this before cloning : >>>>>>> >>>>>>> func_cold ----> func_hot ----> findOnes ----> IsPopCntSupported ---= -> popcnt >>>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 =A0 =A0 =A0 =A0| >>>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 =A0 =A0 =A0 =A0-----> no_popcnt >>>>>>> >>>>>>> =A0where popcnt and no_popcnt are the multi-versioned functions, th= en after cloning : >>>>>>> >>>>>>> func_cold ---->IsPopCntSupported ----> func_hot_clone0 ----> findOn= es_clone0 ----> popcnt >>>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0| >>>>>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 ----> f= unc_hot_clone1 ----> findOnes_clone1 ----> no_popcnt >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> =A0Flags : -fclone-hot-version-paths does function unswitching via = cloning. >>>>>>> =A0 =A0 =A0 =A0 =A0--param=3Dnum-mversn-clones=3D allows to sp= ecify the number of >>>>>>> =A0 =A0 =A0 =A0 =A0functions that should be cloned. >>>>>>> =A0 =A0 =A0 =A0 =A0--param=3Dmversn-clone-depth=3D allows to s= pecify the length of >>>>>>> =A0 =A0 =A0 =A0 =A0the call graph path that should be cloned. =A0nu= m =3D 0 implies only >>>>>>> =A0 =A0 =A0 =A0 =A0leaf node that contains the __builtin_dispatch s= tatement must be >>>>>>> =A0 =A0 =A0 =A0 =A0cloned. >>>>>>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >>>>>>> >>>>>>> Google ref 45947, 45986 >>>>>>> >>>>>>> 2011-04-28 =A0Sriraman Tallam =A0 >>>>>>> >>>>>>> =A0 =A0 =A0 =A0* c-family/c-common.c =A0 (revision 173122) (handle_= version_selector_attribute): New function. >>>>>>> =A0 =A0 =A0 =A0(c_common_attribute_table): New attribute "version_s= elector". >>>>>>> =A0 =A0 =A0 =A0* tree-pass.h =A0 (revision 173122) (pass_tree_conve= rt_builtin_dispatch): New pass. >>>>>>> =A0 =A0 =A0 =A0(pass_ipa_multiversion_dispatch): New pass. >>>>>>> =A0 =A0 =A0 =A0* testsuite/gcc.dg/mversn7.c =A0 =A0(revision 0): Ne= w test case. >>>>>>> =A0 =A0 =A0 =A0* testsuite/gcc.dg/mversn4.c =A0 =A0(revision 0): Ne= w test case. >>>>>>> =A0 =A0 =A0 =A0* testsuite/gcc.dg/mversn4.h =A0 =A0(revision 0): Ne= w test case. >>>>>>> =A0 =A0 =A0 =A0* testsuite/gcc.dg/mversn4a.c =A0 (revision 0): New = test case. >>>>>>> =A0 =A0 =A0 =A0* testsuite/gcc.dg/torture/mversn1.c =A0 =A0(revisio= n 0): New test case. >>>>>>> =A0 =A0 =A0 =A0* testsuite/gcc.dg/mversn2.c =A0 =A0(revision 0): Ne= w test case. >>>>>>> =A0 =A0 =A0 =A0* testsuite/gcc.dg/mversn6.c =A0 =A0(revision 0): Ne= w test case. >>>>>>> =A0 =A0 =A0 =A0* testsuite/gcc.dg/mversn3.c =A0 =A0(revision 0): Ne= w test case. >>>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/mversn8.C =A0 =A0(revision 0): Ne= w test case. >>>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/mversn10a.C =A0(revision 0): New = test case. >>>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/mversn14a.C =A0(revision 0): New = test case. >>>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/tree-prof/mversn13.C (revision 0)= : New test case. >>>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/tree-prof/mversn15.C (revision 0)= : New test case. >>>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/tree-prof/mversn15a.C =A0 =A0 =A0= =A0(revision 0): New test case. >>>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/mversn9.C =A0 =A0(revision 0): Ne= w test case. >>>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/mversn10.C =A0 (revision 0): New = test case. >>>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/mversn12.C =A0 (revision 0): New = test case. >>>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/mversn14.C =A0 (revision 0): New = test case. >>>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/mversn16.C =A0 (revision 0): New = test case. >>>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/torture/mversn11.C =A0 (revision = 0): New test case. >>>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/torture/mversn5.C =A0 =A0(revisio= n 0): New test case. >>>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/torture/mversn5.h =A0 =A0(revisio= n 0): New test case. >>>>>>> =A0 =A0 =A0 =A0* testsuite/g++.dg/torture/mversn5a.C =A0 (revision = 0): New test case. >>>>>>> =A0 =A0 =A0 =A0* builtin-types.def =A0 =A0 (revision 173122) (BT_PT= R_FN_INT): New pointer type. >>>>>>> =A0 =A0 =A0 =A0(BT_FN_INT_PTR_FN_INT_PTR_PTR_VAR): New function typ= e for __builtin_dispatch. >>>>>>> =A0 =A0 =A0 =A0* builtins.def =A0(revision 173122) (BUILT_IN_DISPAT= CH): New builtin to >>>>>>> =A0 =A0 =A0 =A0support multi-version calls. >>>>>>> =A0 =A0 =A0 =A0* mversn-dispatch.c =A0 =A0 (revision 0): New file. >>>>>>> =A0 =A0 =A0 =A0* timevar.def =A0 (revision 173122) (TV_MVERSN_DISPA= TCH): New time var. >>>>>>> =A0 =A0 =A0 =A0* common.opt =A0 =A0(revision 173122) (fclone-hot-ve= rsion-paths): New flag. >>>>>>> =A0 =A0 =A0 =A0* Makefile.in =A0 (revision 173122) (mversn-dispatch= .o): New rule. >>>>>>> =A0 =A0 =A0 =A0* passes.c =A0 =A0 =A0(revision 173122) (init_optimi= zation_passes): Add the new >>>>>>> =A0 =A0 =A0 =A0multi-version and dispatch passes to the pass list. >>>>>>> =A0 =A0 =A0 =A0* params.def =A0 =A0(revision 173122) (PARAM_NUMBER_= OF_MVERSN_CLONES): Define. >>>>>>> =A0 =A0 =A0 =A0(PARAM_MVERSN_CLONE_CGRAPH_DEPTH): Define. >>>>>>> =A0 =A0 =A0 =A0* doc/invoke.texi =A0 =A0 =A0 (revision 173122) (mve= rsn-clone-depth): Document. >>>>>>> =A0 =A0 =A0 =A0(num-mversn-clones): Document. >>>>>>> =A0 =A0 =A0 =A0(fclone-hot-version-paths): Document. >>>>>> >>>>> >>>> >>> >> >