From: "Hite, Christopher"
To: "gcc@gcc.gnu.org"
CC: Richard Guenther, 'Piotr Rak'
Date: Mon, 17 Oct 2011 16:35:00 -0000
Subject: RE: optimization question: mpl

I'm sorry I haven't gotten around to chasing this sooner. I needed to learn
how to use objdump and read the assembler. The code now lets you define the
depth.

It generates some pretty crappy code even with gcc-4.6.0 at -O3. With a
20-deep template the compiler only inlines the recursion about 6 levels at a
time; at depth 50 it only does 2 levels at a time.

Someone mentioned taking control of this myself and implementing array
lookups. Sure, I could also do binary searches and hash tables when the id
numbers aren't contiguous. The problem with doing these things is that it
takes control away from the compiler's optimizer. Theoretically an optimizer
should know best when to go from if/elses to binary lookups to hash tables. I
could do this kind of stuff at the metaprogramming level, but that could work
against a better optimizer in the future. It reminds me of all the "if x==0"
you see in old Fortran code, which made things run faster on old machines
where branching was cheap.

Theoretically I might know which ids are most probable, but profile-feedback
optimization could be used for this.

Would gcc join switch/case statements?

switch(id) { case 1: foo(); break; }
switch(id) { case 2: bar(); break; }

Another idea to revive the switch/case is to implement a utility with N
switch cases, one for each size. I could maybe use self-including headers or
generate such code.
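For illustration, here is roughly what such a generated utility would expand
to at depth 3, written by hand against the M1/M2/M3 and DecodeContext types
from the improved code further down (just a sketch, the name foo_switch3 is
made up, and I haven't benchmarked it):

// Hand-written sketch of what a generated 3-case dispatcher would look like.
// Relies on M1/M2/M3 and DecodeContext as defined in the improved code below.
inline void foo_switch3(int id, DecodeContext& dc) {
    switch (id) {
        case M1::id: M1::decode(dc); break;   // 13
        case M2::id: M2::decode(dc); break;   // 17
        case M3::id: M3::decode(dc); break;   // 19
        default: break;                       // unknown id: do nothing
    }
}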
By the way, this kind of code exists: Boost::variant::apply_visitor uses
repeated switch/case statements. Changing foo() below to a switch/case doesn't
seem to help.

Chris

------------------ improved code

#include <iostream>
#include <cstdlib>

struct DecodeContext;

struct M1{
        static const int id=13;
        static void decode(DecodeContext& ){ std::cout<<"1\n";}
};

struct M2{
        static const int id=17;
        static void decode(DecodeContext& ){ std::cout<<"2\n";}
};

struct M3{
        static const int id=19;
        static void decode(DecodeContext& ){ std::cout<<"3\n";}
};

template<int N> struct M{
        static const int id=N*3+1024;
        static void decode(DecodeContext& ){ std::cout<<N<<"\n";}
};

template<int N> struct ListMember;

template <> struct ListMember<1>{ typedef M1 type; };
template <> struct ListMember<2>{ typedef M2 type; };
template <> struct ListMember<3>{ typedef M3 type; };
template<int N> struct ListMember { typedef M<N> type; };

template<int N>
void foo(int id, DecodeContext& dc) {
        typedef typename ListMember<N>::type M;
        if(M::id==id)
                M::decode(dc);
        else
                foo<N-1>(id,dc);
}

template<>
void foo<0>(int id, DecodeContext& dc) {}

int main(){
        DecodeContext& dc= *(DecodeContext*)0; // junk for now
        int id=std::rand();                    // runtime dependent
        foo<50>(id,dc);
        return 0;
}
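One thing that might be worth trying on top of this is the flatten attribute
Richard suggests in the quoted thread below: put it on a top-level wrapper so
gcc inlines the whole foo<50> chain into one function instead of stopping
every few levels. Whether that beats the default heuristics, and whether
inlining all the decode() bodies is acceptable, is something I'd have to
measure; the wrapper name is made up:

// Untested: ask gcc to inline every call inside this function, which should
// flatten the recursive foo<50> chain (and the M*::decode bodies) into it.
void __attribute__((flatten)) dispatch_all(int id, DecodeContext& dc) {
    foo<50>(id, dc);
}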
-----Original Message-----
From: Piotr Rak [mailto:piotr.rak@gmail.com]
Sent: Wednesday, July 28, 2010 8:17 PM
To: Hite, Christopher
Cc: Richard Guenther; gcc@gcc.gnu.org
Subject: Re: optimization question: mpl

Hi,

2010/7/28 Hite, Christopher:
>> Generally without knowing the compiler version you are using it is
>> hard to tell.
> I'll use whatever's best.  Right now I'm still on 4.4.3.  I'll
> probably upgrade soon to 4.5.
>
>> The same is true without a complete compilable testcase.
> I didn't want to make a test case that depends on boost::mpl. How's
> this:
>
> struct DecodeContext;
>
> struct M1{
>         static const int id=1;
>         static void decode(DecodeContext& ){} };
>
> struct M2{
>         static const int id=2;
>         static void decode(DecodeContext& ){} };
>
> struct M3{
>         static const int id=3;
>         static void decode(DecodeContext& ){} };
>
>
> template<int N> struct ListMember;
>
> template <> struct ListMember<1>{ typedef M1 type; };
> template <> struct ListMember<2>{ typedef M2 type; };
> template <> struct ListMember<3>{ typedef M3 type; };
>
> template<int N>
> void foo(int id, DecodeContext& dc) {
>         typedef typename ListMember<N>::type M;
>         if(M::id==id)
>                 M::decode(dc);
>         else
>                 foo<N-1>(id,dc);
> }
>
> template<>
> void foo<0>(int id, DecodeContext& dc) {}
>
>
> int main(){
>         DecodeContext& dc= *(DecodeContext*)0; // junk for now
>         int id=2; // sometimes runtime dependent
>         foo<3>(id,dc);
>         return 0;
> }
>
>> You can use the flatten attribute to tell the compiler to inline all
>> calls in a given function, like
>>
>> void __attribute__((flatten)) foo(void)
>> {
>>   ...
>>   decode1();
>>   ...
>> }
>
> That would cause decode1() to be inlined, which might not be what you
> want.
>
> Hmm maybe I could rewrite things so the switch case returns a function
> pointer.  I'm guessing that would make things slower though.

Or you could just initialize a static array of pointers, like this (please
note that I never compiled the code below):

typedef void (*pfn)(DecodeContext&);

template<int I>
struct InitDispatchHelper
{
  static void init(std::tr1::array<pfn, X>& a)
  {
    a[I-1] = &ListMember<I>::type::decode;
    InitDispatchHelper<I-1>::init(a);
  }
};

// End recursion
template<>
struct InitDispatchHelper<0>
{
  static void init(std::tr1::array<pfn, X>& ) { /* does nothing */ }
};

// Dispatch table
template<int N>
struct DispatchFoo : std::tr1::array<pfn, N>
{
  DispatchFoo()
    : std::tr1::array<pfn, N>()
  {
    InitDispatchHelper<N>::init(*this);
  }
};

then your call would look like:

static const int X = /* your maximal I in the ListMember meta-class */;

void call_foo(int i, DecodeContext& c)
{
  static const DispatchFoo<X> foo_dispatch;
  foo_dispatch[i](c);
}

Bonus points for benchmarking both solutions and making a policy class that
chooses between those two solutions depending on X.

Good luck,
Piotr
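For reference, a policy class along the lines Piotr suggests could look
roughly like this (untested sketch; Dispatcher is a made-up name, the cutoff
of 8 is an arbitrary placeholder that would have to come from benchmarking,
and it assumes the foo<N> and DispatchFoo templates above with ids numbered
1..X):

// Untested sketch: pick a dispatch strategy at compile time based on X.
// Small lists fall back to the recursive foo<X>; large ones use the table.
template<int X, bool UseTable = (X > 8)>    // 8 is an arbitrary cutoff
struct Dispatcher {
    static void call(int id, DecodeContext& dc) { foo<X>(id, dc); }
};

template<int X>
struct Dispatcher<X, true> {
    static void call(int id, DecodeContext& dc) {
        static const DispatchFoo<X> table;  // built once, on first use
        table[id - 1](dc);                  // assumes ids are 1..X, as in Piotr's sketch
    }
};

Callers would then write Dispatcher<50>::call(id, dc) instead of
foo<50>(id, dc) and let the policy decide.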