From: Richard Guenther
To: Xinliang David Li
Cc: Sriraman Tallam, gcc@gcc.gnu.org
Subject: Re: Function Multiversioning Usability.
Date: Thu, 18 Aug 2011 07:51:00 -0000

On Wed, Aug 17, 2011 at 6:37 PM, Xinliang David Li wrote:
> On Wed, Aug 17, 2011 at 8:12 AM, Richard Guenther wrote:
>> On Wed, Aug 17, 2011 at 4:52 PM, Xinliang David Li wrote:
>>> The gist of previous discussion is to use function overloading instead
>>> of exposing underlying implementation such as builtin_dispatch to the
>>> user. This new refined proposal has not changed in that, but is more
>>> elaborate on various use cases which have been carefully thought out.
>>> Please be specific on which part needs improvement.
>>
>> See below ...
>>
>>> Thanks,
>>>
>>> David
>>>
>>> On Wed, Aug 17, 2011 at 12:29 AM, Richard Guenther wrote:
>>>> On Tue, Aug 16, 2011 at 10:37 PM, Sriraman Tallam wrote:
>>>>> Hi,
>>>>>
>>>>>  I am working on supporting function multi-versioning in GCC and here
>>>>> is a write-up on its usability.
>>>>>
>>>>> Multiversioning Usability
>>>>> ====================
>>>>>
>>>>> For a simple motivating example,
>>>>>
>>>>> int
>>>>> find_popcount(unsigned int i)
>>>>> {
>>>>>  return __builtin_popcount(i);
>>>>> }
>>>>>
>>>>> Currently, compiling this with -mpopcnt will result in the "popcnt"
>>>>> instruction being used, and otherwise in a call to a built-in generic
>>>>> implementation. It is desirable to have two versions of this function
>>>>> so that it can be run both on targets that support the popcnt insn and
>>>>> those that do not.
>>>>>
>>>>>
>>>>> * Case I - User Guided Versioning where only one function body is
>>>>> provided by the user.
>>>>>
>>>>> This case addresses a use where the user wants multi-versioning but
>>>>> provides only one function body.  I want to add a new attribute called
>>>>> "mversion" which will be used like this:
>>>>>
>>>>> int __attribute__((mversion("popcnt")))
>>>>> find_popcount(unsigned int i)
>>>>> {
>>>>>  return __builtin_popcount(i);
>>>>> }
>>>>>
>>>>> With the attribute, the compiler should understand that it should
>>>>> generate two versions for this function. The user makes a call to this
>>>>> function like a regular call but the code generated would call the
>>>>> appropriate function at run-time based on a check to determine if that
>>>>> instruction is supported or not.
>>
>> The example seems to be particularly ill-suited.  Trying to 2nd guess you
>> here I think you want to direct the compiler to emit multiple versions
>> with different target capabilities enabled, probably for elaborate code that
>> _doesn't_ use any fancy builtins, right?
>> It seems this is a shortcut for
>>
>> static inline __attribute__((always_inline)) implementation () { ... }
>>
>> symbol __attribute__((target("sse2"))) { implementation(); }
>> symbol __attribute__((target("sse3"))) { implementation(); }
>> ...
>>
>> and so should be fully handled by the frontend (if at all, it seems to
>> be purely syntactic sugar).
>
> Yes, it is a handy short cut -- I don't see the basis for objection to
> this convenience.

And I don't see why we need to discuss it at this point.  It also seems
severely limited considering when I want to version for -msse2 -mpopcnt
and -msse4a - that doesn't look expressible.  A more elaborate variant
would be, say,

foo () { ... };
foo __attribute__((version("sse2","popcnt")));
foo __attribute__((version("sse4a")));

thus trigger an overload clone by a declaration as well, not just by a
definition, similar to an explicit template instantiation.  That sounds
more scalable to me.

>>
>>>>> The attribute can be scaled to support many versions by allowing a
>>>>> comma separated list of values for the mversion attribute. For
>>>>> instance, __attribute__((mversion("sse3", "sse4", ...))) will provide a
>>>>> version for each. For N attributes, N clones plus one clone for the
>>>>> default case will have to be generated by the compiler. The arguments
>>>>> to the "mversion" attribute will be similar to the arguments supported
>>>>> by the "target" attribute.
>>>>>
>>>>> This attribute is useful if the same source is going to be used to
>>>>> generate the different versions. If this has to be done manually, the
>>>>> user has to duplicate the body of the function and specify a target
>>>>> attribute of "popcnt" on one clone. Then, the user has to use
>>>>> something like IFUNC support or manually write code to call the
>>>>> appropriate version. All of this will be done automatically by the
>>>>> compiler with this new attribute.
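
The manual route described in the paragraph above (one shared body, one clone carrying a target attribute, and hand-written dispatch) can be sketched in C roughly as follows. This is only an illustration under assumptions: the names popcount_body, find_popcount_popcnt and find_popcount_generic are hypothetical, and the run-time check uses __builtin_cpu_supports, an x86 builtin from later GCC releases that is not part of the proposal in this thread. On non-x86 targets the sketch falls back to the generic clone.

```c
/* Shared body: forced inline so each clone compiles it with its own
   target options (mirrors the always_inline idea from the thread). */
static inline __attribute__((always_inline)) int
popcount_body(unsigned int i)
{
    return __builtin_popcount(i);
}

#if defined(__x86_64__) || defined(__i386__)
/* Hypothetical specialized clone: may use the popcnt instruction. */
static int __attribute__((target("popcnt")))
find_popcount_popcnt(unsigned int i)
{
    return popcount_body(i);
}
#endif

/* Hypothetical default clone: built with the command-line options only. */
static int
find_popcount_generic(unsigned int i)
{
    return popcount_body(i);
}

/* Hand-written dispatcher: what the proposed attribute would generate
   automatically. The feature test is an assumption, see the lead-in. */
int
find_popcount(unsigned int i)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("popcnt"))
        return find_popcount_popcnt(i);
#endif
    return find_popcount_generic(i);
}
```

Callers simply call find_popcount(); the per-call feature test is the cost that an IFUNC-style resolver would pay only once.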
>>>>>
>>>>> * Case II - User Guided Versioning where the function bodies for each
>>>>> version differ and are provided by the user.
>>>>>
>>>>> This case pertains to multi-versioning when the source bodies of the
>>>>> two or more versions are different and are provided by the user. Here
>>>>> too, I want to use a new attribute, "version". Now, the user can
>>>>> specify versioning intent like this:
>>>>>
>>>>> int __attribute__((version("popcnt")))
>>>>> find_popcnt(unsigned int i)
>>>>> {
>>>>>   // inline assembly of the popcnt instruction, specialized version.
>>>>>  asm("popcnt ....");
>>>>> }
>>>>>
>>>>> int
>>>>> find_popcnt(unsigned int i)
>>>>> {
>>>>>  //generic code for doing this
>>>>>  ...
>>>>> }
>>>>>
>>>>> This uses function overloading to specify versions.  The compiler will
>>>>> understand that versioning is requested, since the functions have
>>>>> different attributes with "version", and will generate the code to
>>>>> execute the right function at run-time.  The compiler should check for
>>>>> the existence of one body without the attribute which will be the
>>>>> default version.
>>
>> Yep, we agreed that this is a good idea.  But we also agreed to
>> use either the target attribute (for compiler-generated tests) or
>> a single predicate attribute that takes a function which is const
>> with no arguments and returns whether the variant is selected or not.
>
> 'target' attribute is an existing one, so adding overloading changes
> its semantics -- that is why a new 'version' attribute is proposed.
> For most of the cases, user does not need to provide his selector
> function, and compiler can use runtime support (builtins) to do the
> selection (See Sri's runtime patch).

Sure, I just want to make sure we re-use the same infrastructure for both.

> For power users, yes, the original agreed proposal is useful. The
> flavor of syntax that supports selector can be added back.

Well, you said it was definitely required ;)  I had my doubts that it
would be relevant in practice, so we can as well leave it out.

>>
>>>>> * Case III - Versioning is done automatically by the compiler.
>>>>>
>>>>> I want to add a new compiler flag "-mversion" along the lines of "-m".
>>>>> If the user specifies "-mversion=popcnt" then the compiler will
>>>>> automatically create two versions of any function that is impacted by
>>>>> the new instruction. The difference between "-m" and "-mversion" will
>>
>> How do you plan to detect "impacted by the new instruction?".  Again
>> popcnt seems to be a poor example - most use probably lies in
>> autovectorization (but then it's closely tied to active capabilities of the
>> backend and not really ready for auto-versioning).
>>
>
> This is just an example. Major use cases involve versioning against
> the cpu model, such as core2, corei7, amd15h, etc. It has impact on
> decisions on code layout, unrolling, vectorization, scheduling, etc,
> but then again, this (MV heuristic) is a whole different topic.  The
> discussion here is about infrastructure.

I don't see how auto-MV has any impact on the infrastructure, so we
might as well postpone any discussion until the infrastructure is set.

Richard.

> Thanks,
>
> David
>
>> This will be a lot of work if it shouldn't be very inefficient.
>>
>> Richard.
>>
>>>>> be that while "-m" generates only the specialized version, "-mversion"
>>>>> will generate both the specialized and the generic versions.  There is
>>>>> no need to explicitly mark any function for versioning, no source
>>>>> changes.
>>>>>
>>>>> The compiler will decide if it is beneficial to multi-version a
>>>>> function based on heuristics using hotness information, code size
>>>>> growth, etc.
>>>>>
>>>>>
>>>>> Runtime support
>>>>> ===============
>>>>>
>>>>> In order for the compiler to generate multi-versioned code, it needs
>>>>> to call functions that would test if a particular feature exists or
>>>>> not at run-time. For example, IsPopcntSupported() would be one such
>>>>> function. I have prepared a patch to do this which adds the runtime
>>>>> support in libgcc and supports new builtins to test the various
>>>>> features. I will send the patch separately to keep the discussions
>>>>> focused.
>>>>>
>>>>>
>>>>> Thoughts?
>>>>
>>>> Please focus on one mechanism and re-use existing facilities as much as
>>>> possible.  Thus, see the old discussion where we settled on overloading
>>>> with either using the existing target attribute or a selector function.
>>>> I don't see any benefit repeating the discussions here.
>>>>
>>>> Richard.
>>>>
>>>>> Thanks,
>>>>> -Sri.
>>>>>
>>>>
>>>
>>
>
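
The selector-function scheme mentioned in the thread (a predicate that takes no arguments and reports whether a variant is selected) might look roughly like the C sketch below when written by hand. Everything here is an assumption for illustration: the struct version table, select_popcnt, find_popcnt_special and find_popcnt_default are hypothetical names, and the predicate relies on __builtin_cpu_supports, an x86 builtin from later GCC releases, not on the IsPopcntSupported() helper named in the proposal.

```c
#include <stddef.h>

typedef int (*popcnt_fn)(unsigned int);
typedef int (*selector_fn)(void);   /* no arguments, nonzero means "use me" */

#if defined(__x86_64__) || defined(__i386__)
#define POPCNT_TARGET __attribute__((target("popcnt")))
#else
#define POPCNT_TARGET
#endif

/* Specialized version: may be compiled to the popcnt instruction. */
static int POPCNT_TARGET find_popcnt_special(unsigned int i)
{
    return __builtin_popcount(i);
}

/* Default version: portable bit-clearing loop. */
static int find_popcnt_default(unsigned int i)
{
    int n = 0;
    while (i) {
        i &= i - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical selector predicate for the specialized version. */
static int select_popcnt(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __builtin_cpu_supports("popcnt");
#else
    return 0;
#endif
}

struct version { selector_fn selected; popcnt_fn impl; };

static const struct version versions[] = {
    { select_popcnt, find_popcnt_special },
};

static popcnt_fn resolved;   /* cached choice; not synchronized across threads */

int find_popcnt(unsigned int i)
{
    if (!resolved) {
        resolved = find_popcnt_default;
        for (size_t k = 0; k < sizeof versions / sizeof versions[0]; k++) {
            if (versions[k].selected()) {
                resolved = versions[k].impl;
                break;
            }
        }
    }
    return resolved(i);
}
```

The first matching selector wins and the version without a selector is the fallback, which matches the rule in the proposal that exactly one body without the attribute serves as the default.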