From: Richard Guenther
To: Xinliang David Li
Cc: Sriraman Tallam, reply@codereview.appspotmail.com, gcc-patches@gcc.gnu.org
Date: Wed, 04 May 2011 09:30:00 -0000
Subject: Re: [google] Patch to support calling multi-versioned functions via new GCC builtin. (issue4440078)
References: <20110429025248.90D61B21AB@azwildcat.mtv.corp.google.com>

On Tue, May 3, 2011 at 11:57 PM, Xinliang David Li wrote:
> On Tue, May 3, 2011 at 3:00 AM, Richard Guenther wrote:
>> On Tue, May 3, 2011 at 1:07 AM, Xinliang David Li wrote:
>>> On Mon, May 2, 2011 at 2:33 PM, Richard Guenther wrote:
>>>> On Mon, May 2, 2011 at 6:41 PM, Xinliang David Li wrote:
>>>>> On Mon, May 2, 2011 at 2:11 AM, Richard Guenther wrote:
>>>>>> On Fri, Apr 29, 2011 at 6:23 PM, Xinliang David Li wrote:
>>>>>>> Here is the background for this feature:
>>>>>>>
>>>>>>> 1) People rely on function multi-versioning to exploit hw features and
>>>>>>> squeeze performance, but there is no standard way of doing so, either
>>>>>>> a) using indirect function calls with function pointers set at program
>>>>>>> initialization; b) using manual dispatch at each callsite; c) using
>>>>>>> features like IFUNC.  The dispatch mechanism needs to be promoted to
>>>>>>> the language level and become a first class citizen;
>>>>>>
>>>>>> You are not doing that, you are inventing a new (crude) GCC extension.
>>>>>
>>>>> To capture the high level semantics and prevent the user from lowering the
>>>>> dispatch calls into forms the compiler cannot recognize, a language
>>>>> extension is the way to go.
>>>>
>>>> I don't think so.  With your patch only two passes understand the new
>>>> high-level form, the rest of the gimple passes are just confused.
>>>
>>> There is no need for other passes to understand it -- just treat it as
>>> opaque calls.
>>> This is goodness, otherwise other passes need to be
>>> modified. This is true (only some passes understand it) for things
>>> like __builtin_expect.
>>
>> Certainly __builtin_dispatch has to be understood by alias analysis and
>> all other passes that care about calls (like all IPA passes).  You can
>> of course treat it conservatively (may call any function, even those
>> which have their address not taken, clobber and read all memory, even
>> that which doesn't escape the TU).
>>
>> Why obfuscate things when it is not necessary?
>
> MVed functions are usually non-trivial, so I doubt anything will be
> lost due to the obfuscation. It won't be too difficult to teach
> aliaser to 'merge' the attributes from target functions either.
>
>
>>> No that is not my argument. What I tried to say is it will be harder
>>> to achieve without high level semantics -- it requires more
>>> handshaking between compiler passes.
>>
>> Sure - that's life.
>>
>
> We are looking at improving the life ..
>
>>>> Which nobody will see benefit
>>>> from unless they rewrite their code?
>>>
>>> The target users for the builtin include the compiler itself -- it can
>>> synthesize dispatch calls.
>>
>> Hum.  I'm not at all sure the dispatch calls are the best representation
>> for the IL.
>>
>
> The intention is to provide an interface at both C level (for
> programmers) and IL level.  It does not have to be a builtin (both
> internally and externally) -- but it needs to map to some language
> construct.
>
>
>>>> Well, I say if we can improve
>>>> _some_ of the existing usages that's better than never doing wrong
>>>> on a new language extension.
>>>
>>> This is independent.
>>
>> It is not.
>>
>>>> One that I'm not convinced is the way
>>>> to go (you didn't address at all the inability to use float arguments
>>>> and the ABI issues with using variadic arguments - after all you
>>>> did a poor-man's language extension by using GCC builtins instead
>>>> of inventing a true one).
>>>
>>> This is an independent issue that either needs to be addressed or
>>> marked as limitation. The key of the debate is whether source/IR
>>> annotation using construct with high level semantics helps optimizer.
>>> In fact this is common. Would it make any difference (in terms of
>>> acceptance) if the builtin is only used internally by the compiler and
>>> not exposed to the user?
>>
>> No.  I don't see at all why having everything in a single stmt is so much
>> more convenient.  And I don't see why existing IL features cannot be
>> used to make things a little more convenient.
>
> Why not? The high level construct is simpler to deal with. It is all
> about doing the right optimization at the right level of abstraction.
> Set aside the question whether using builtin for MV dispatch is the
> right high level construct, looking at gcc, we can find that gcc's IR
> is pretty low level resulting in missing optimizations.
>
> For instance, there is no high level doloop representation -- Fortran
> do-loop needs to be lowered and raised back again -- the consequence
> is that you may not raise the loop nest into the way it was originally
> written -- perfect nested loop become non-perfect loop nest --
> blocking certain loop transformations.  Not only that, I am not sure
> it is even possible to record any loop level information anywhere --
> is it possible to have per loop attribute such as unroll factor?
>
> Assuming gcc can do full math function inlining (for common math
> routines) -- in this case, do we still want to do sin/cos optimization
> or rely on the scalar optimizer to optimize the inlined copies of sin
> and cos?
>
> Not sure about gcc, I remember that dead temporary variable removal
> can be very hard to do if some intrinsic gets lowered too early
> introducing allocator and deallocator calls etc.

Sure, there is always a trade-off between lowering early and lowering
late.  Both can have advantages.
For all the examples above we already have both, a high-level and a
low-level form - for dispatch we currently only have a low-level form
for which I think it is not difficult to improve optimization a tad bit.

And of course I don't like the initial __builtin_dispatch () proposal
for a high-level form.  I can think of some more-or-less obvious
high-level forms, one would for example simply stick a new DISPATCH
tree into gimple_call_fn (similar to how we can have OBJ_TYPE_REF
there), the DISPATCH tree would be of variable length, first operand
the selector function and further operands function addresses.
That would keep the actual call visible (instead of a fake
__builtin_dispatch call), something I'd really like to see.  Lowering
that would then simply "gimplify" that DISPATCH tree.

That doesn't map to a source construct, but as I said it doesn't
have to ;)

>>>>> 3) it limits the lowering into one form which may not be ideal --
>>>>> with builtin_dispatch, after hoisting optimization, the lowering can
>>>>> use more efficient IFUNC scheme, for instance.
>>>>
>>>> I see no reason why we cannot transform a switch-indirect-call
>>>> pattern into an IFUNC call.
>>>>
>>>
>>> It is possible -- but it is like asking user to lower the dispatch and
>>> tell compiler to raise it again ..
>>
>> There is no possibility for a high-level dispatch at the source level.
>> And if I'd have to design one I would use function overloading, like
>>
>> float compute_sth (float) __attribute__((version("sse4")))
>> {
>>   ... sse4 code ...
>> }
>>
>> float compute_sth (float)
>> {
>>   ... fallback ...
>> }
>>
>> float foo (float f)
>> {
>>   return compute_sth (f);
>> }
>>
>> and if you not only want to dispatch for target features you could
>> specify a selector function and value in the attribute.
>> You might notice that the above eventually matches the target attribute
>> directly, just the frontends need to be taught to emit dispatch
>> code whenever overload resolution results in ambiguities involving
>> target attribute differences.
>
> Now we are talking.  Allowing selector function is a must -- as
> target features are just too weak. If we have this, it would be
> really nice for users.

Restricting ourselves to use the existing target attribute at the
beginning (with a single, compiler-generated selector function)
is probably good enough to get a prototype up and running.
Extending it to arbitrary selector-function, value pairs using a
new attribute is then probably easy (I don't see the exact use-case
for that yet, but I suppose it exists if you say so).

For the overloading to work we probably have to force that the
functions are local (so we can mangle them arbitrarily) and that
if the function should be visible externally people add an
externally visible dispatcher (foo in the above example would be one).

> The implicit dispatch lowering can be done
> after the dispatch hoisting is done -- most of the ipa-clone work by
> Sri can be retained.  Not sure how hard the FE part of the work is
> though.

The FE parts would also be language specific I guess, eventually
easier in the C++ frontend.

>> Now, a language extension to support multi-versioning should be
>> completely independent on any IL representation - with using
>> a builtin you are tying them together with the only convenient
>> mechanism we have - a mechanism that isn't optimal for either
>> side IMNSHO.
>>
>
> Yes -- they don't have to be tied -- they just happen to suit the
> needs of both ends -- but I see the value of the latest proposal
> (overloading) above.

I did realize that using builtins was convenient (been there and
done the same for some experiments ...).

Richard.
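
For readers following the thread, the low-level form discussed above
(local versions, a selector function, and an externally visible
dispatcher making an indirect call) might look roughly like this in
plain C.  This is only a sketch under stated assumptions, not code from
the patch: have_sse4, select_version, compute_sth_sse4 and
compute_sth_generic are hypothetical names, the flag stands in for a
real CPU feature check, and the two bodies are placeholders.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a CPU feature check (an assumption for
   illustration; a real selector would query the hardware). */
static int have_sse4 = 0;

/* Two local versions of one function.  The names imitate what a
   compiler might produce by mangling; the bodies are placeholders
   that compute the same result in different ways. */
static float compute_sth_sse4(float x)    { return x * 2.0f; }
static float compute_sth_generic(float x) { return x + x; }

/* Version table: index 0 is the specialised version, index 1 the
   fallback. */
static float (*const versions[])(float) = {
    compute_sth_sse4,
    compute_sth_generic,
};

/* Selector function: returns an index into versions[]. */
static int select_version(void)
{
    return have_sse4 ? 0 : 1;
}

/* Externally visible dispatcher, playing the role of foo in the
   thread: ask the selector, then make an indirect call. */
float foo(float f)
{
    int i = select_version();
    assert(i >= 0 && (size_t)i < sizeof versions / sizeof versions[0]);
    return versions[i](f);
}
```

Written this way, the selector call and the indirect call are ordinary
statements that every pass already handles; whether an optimizer can
reliably recognise the pattern and hoist the selector or turn it into
an IFUNC is the open question in the discussion above.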