* optimization question: mpl @ 2010-07-28 14:38 Hite, Christopher 2010-07-28 15:26 ` Richard Guenther 2010-07-28 15:35 ` Larry Evans 0 siblings, 2 replies; 9+ messages in thread From: Hite, Christopher @ 2010-07-28 14:38 UTC (permalink / raw) To: gcc I'm writing a decoder using a meta programming techniques alla boost::mpl and I'd like to know if I'm asking too much of the compiler. Basically I've got lots of packet types with different ids. The classical way to write these would be a switch case int id; switch(id){ case Id1: decode1() break; case Id2: decode2() break; ... } I'm tring to use the template compiler to generate equivalent code. I have a mpl::list of packet descriptions like this: struct Packet1{ static const int id=1; void decode(); }; I then use boost::mpl::fold to generate a decode function which might look sort of like this: void decode1(int id){ if(id==1) Packet1::decode(); else decode2(id); } What I'm hoping is that the compiler is smart enough * to inline all decode*() calls into one function * to notice I compare id to N constants and use switch case optimization (binary search or lookup table) Am I asking too much? Do I have to watch for any pitfalls that willl break the optimization, like making id a member of a class or leaving out the 'else' ? I could write more complicated meta-code which sorts by id and does a runtime binary search, that would probably prevent the optimizer from doing anything smarter. Chris Hite ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: optimization question: mpl 2010-07-28 14:38 optimization question: mpl Hite, Christopher @ 2010-07-28 15:26 ` Richard Guenther 2010-07-28 15:53 ` Hite, Christopher 2010-07-29 12:16 ` Larry Evans 2010-07-28 15:35 ` Larry Evans 1 sibling, 2 replies; 9+ messages in thread From: Richard Guenther @ 2010-07-28 15:26 UTC (permalink / raw) To: Hite, Christopher; +Cc: gcc On Wed, Jul 28, 2010 at 4:37 PM, Hite, Christopher <Christopher.Hite@partner.commerzbank.com> wrote: > > > > I'm writing a decoder using a meta programming techniques alla > boost::mpl and I'd like to know if I'm asking too much of the compiler. > > Basically I've got lots of packet types with different ids. The > classical way to write these would be a switch case > > int id; > switch(id){ > case Id1: decode1() break; > case Id2: decode2() break; > ... > } > > I'm tring to use the template compiler to generate equivalent code. I > have a mpl::list of packet descriptions like this: > struct Packet1{ > static const int id=1; > void decode(); > }; > > I then use boost::mpl::fold to generate a decode function which might > look sort of like this: > > void decode1(int id){ > if(id==1) > Packet1::decode(); > else > decode2(id); > } > > What I'm hoping is that the compiler is smart enough > * to inline all decode*() calls into one function > * to notice I compare id to N constants and use switch case optimization > (binary search or lookup table) > > Am I asking too much? > > Do I have to watch for any pitfalls that willl break the optimization, > like making id a member of a class or leaving out the 'else' ? > > > I could write more complicated meta-code which sorts by id and does a > runtime binary search, that would probably prevent the optimizer from > doing anything smarter. Generally without knowing the compiler version you are using it is hard to tell. The same is true without a complete compilable testcase. You can use the flatten attribute to tell the compiler to inline all calls in a given function, like void __attribute__((flatten)) foo(void) { ... decode1(); ... } and it will inline decode1() (and recursively all its callees). Richard. > Chris Hite > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* RE: optimization question: mpl 2010-07-28 15:26 ` Richard Guenther @ 2010-07-28 15:53 ` Hite, Christopher 2010-07-28 17:26 ` Larry Evans 2010-07-28 18:17 ` Piotr Rak 2010-07-29 12:16 ` Larry Evans 1 sibling, 2 replies; 9+ messages in thread From: Hite, Christopher @ 2010-07-28 15:53 UTC (permalink / raw) To: Richard Guenther; +Cc: gcc > Generally without knowing the compiler version you are using > it is hard to tell. I'll use whatever's best. Right now I'm still on 4.4.3. I'll probably upgrade soon to 4.5. > The same is true without a complete compilable testcase. I didn't want to make a test case that depends on boost::mpl. How's this: struct DecodeContext; struct M1{ static const int id=1; static void decode(DecodeContext& ){} }; struct M2{ static const int id=2; static void decode(DecodeContext& ){} }; struct M3{ static const int id=3; static void decode(DecodeContext& ){} }; template <int> struct ListMember; template <> struct ListMember<1>{ typedef M1 type; }; template <> struct ListMember<2>{ typedef M2 type; }; template <> struct ListMember<3>{ typedef M3 type; }; template<int i> void foo(int id, DecodeContext& dc) { typedef typename ListMember<i>::type M; if(M::id==id) M::decode(dc); else foo<i-1>(id,dc); } template<> void foo<0>(int id, DecodeContext& dc) {} int main(){ DecodeContext& dc= *(DecodeContext*)0;// junk for now int id=2; //sometime runtime dependent foo<3>(id,dc); return 0; } > You can use the flatten attribute to tell the compiler to inline all > calls in a given function, like > > void __attribute__((flatten)) foo(void) > { > ... > decode1(); > ... > } That would cause decode1() to be inlined, which might not be what you want. Hmm maybe I could rewrite things so the switch case returns a function pointer. I'm guessing that would make things slower though. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: optimization question: mpl 2010-07-28 15:53 ` Hite, Christopher @ 2010-07-28 17:26 ` Larry Evans 2010-07-28 18:17 ` Piotr Rak 1 sibling, 0 replies; 9+ messages in thread From: Larry Evans @ 2010-07-28 17:26 UTC (permalink / raw) To: gcc On 07/28/10 10:53, Hite, Christopher wrote: [snip] > struct DecodeContext; > > struct M1{ > static const int id=1; > static void decode(DecodeContext& ){} > }; > > struct M2{ > static const int id=2; > static void decode(DecodeContext& ){} > }; > > struct M3{ > static const int id=3; > static void decode(DecodeContext& ){} > }; > > > template <int> struct ListMember; > > template <> struct ListMember<1>{ typedef M1 type; }; > template <> struct ListMember<2>{ typedef M2 type; }; > template <> struct ListMember<3>{ typedef M3 type; }; > > template<int i> > void foo(int id, DecodeContext& dc) { > typedef typename ListMember<i>::type M; > if(M::id==id) > M::decode(dc); > else > foo<i-1>(id,dc); > } > > template<> > void foo<0>(int id, DecodeContext& dc) {} > > > > int main(){ > DecodeContext& dc= *(DecodeContext*)0;// junk for now > int id=2; //sometime runtime dependent > foo<3>(id,dc); > return 0; > } > [snip] Wouldn't the following simplification work just as well? template <int Id> struct ListMember{ static void decode(DecodeContext& ){} }; template<int i> void foo(int id, DecodeContext& dc) { if(i=id) ListMember<i>::decode(dc); else foo<i-1>(id,dc); } ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: optimization question: mpl 2010-07-28 15:53 ` Hite, Christopher 2010-07-28 17:26 ` Larry Evans @ 2010-07-28 18:17 ` Piotr Rak 2010-07-28 19:00 ` Larry Evans 2011-10-17 16:35 ` Hite, Christopher 1 sibling, 2 replies; 9+ messages in thread From: Piotr Rak @ 2010-07-28 18:17 UTC (permalink / raw) To: Hite, Christopher; +Cc: Richard Guenther, gcc Hi, 2010/7/28 Hite, Christopher <Christopher.Hite@partner.commerzbank.com>: >> Generally without knowing the compiler version you are using >> it is hard to tell. > I'll use whatever's best. Right now I'm still on 4.4.3. I'll probably > upgrade soon to 4.5. > >> The same is true without a complete compilable testcase. > I didn't want to make a test case that depends on boost::mpl. How's > this: > > struct DecodeContext; > > struct M1{ > static const int id=1; > static void decode(DecodeContext& ){} > }; > > struct M2{ > static const int id=2; > static void decode(DecodeContext& ){} > }; > > struct M3{ > static const int id=3; > static void decode(DecodeContext& ){} > }; > > > template <int> struct ListMember; > > template <> struct ListMember<1>{ typedef M1 type; }; > template <> struct ListMember<2>{ typedef M2 type; }; > template <> struct ListMember<3>{ typedef M3 type; }; > > template<int i> > void foo(int id, DecodeContext& dc) { > typedef typename ListMember<i>::type M; > if(M::id==id) > M::decode(dc); > else > foo<i-1>(id,dc); > } > > template<> > void foo<0>(int id, DecodeContext& dc) {} > > > > int main(){ > DecodeContext& dc= *(DecodeContext*)0;// junk for now > int id=2; //sometime runtime dependent > foo<3>(id,dc); > return 0; > } > > >> You can use the flatten attribute to tell the compiler to inline all >> calls in a given function, like >> >> void __attribute__((flatten)) foo(void) >> { >> ... >> decode1(); >> ... >> } > > That would cause decode1() to be inlined, which might not be what you > want. > > Hmm maybe I could rewrite things so the switch case returns a function > pointer. I'm guessing that would make things slower though. > Or you could just initialize static array of pointers, like that (please note, that I never compiled code below): typedef void (*pfn) (DeclContext&); tempate <int I> struct InitDispatchHelper { static void init(std::tr1::array<pfn,N>& a) { a[I-1] = ListMemeber<I-1>::foo; InitPointersHelper<I-1>::init(a); } }; // End recursion template<> struct InitDispatchHelper { static void init(std::tr1::array<pfn,N>& a) { /*does nothing */ } }; // Dispatch table; template <int N> struct DispatchFoo : std::tr1::array<pfn, N> { DispatchFoo() : std::tr1::array<pfn, N>() { InitDispatchHelper<N>::init(*this); } }; then your call would look like: static const int X = /* your maximal I in ListMembers meta-class */ void call_foo(int i, DeclContext& c) { static const DispatchFoo<X> foo_dispatch; foo_dispatch[i] (c); } Bonus points for benchmarking both solutions and making it policy class choosing between those two solutions depending on X. Good luck, Piotr ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: optimization question: mpl 2010-07-28 18:17 ` Piotr Rak @ 2010-07-28 19:00 ` Larry Evans 2011-10-17 16:35 ` Hite, Christopher 1 sibling, 0 replies; 9+ messages in thread From: Larry Evans @ 2010-07-28 19:00 UTC (permalink / raw) To: gcc On 07/28/10 13:16, Piotr Rak wrote: [snip] > Or you could just initialize static array of pointers, like that > (please note, that I never compiled code below): [snip] > Piotr, something similar was proposed here: http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/364c0dd5ce512497# however, the response there indicates the code would be faster with a switch. I don't really understand why, especially if the array of pointers was a constexpr, but then I'm not compiler writer. -regards, Larry ^ permalink raw reply [flat|nested] 9+ messages in thread
* RE: optimization question: mpl 2010-07-28 18:17 ` Piotr Rak 2010-07-28 19:00 ` Larry Evans @ 2011-10-17 16:35 ` Hite, Christopher 1 sibling, 0 replies; 9+ messages in thread From: Hite, Christopher @ 2011-10-17 16:35 UTC (permalink / raw) To: gcc; +Cc: Richard Guenther, 'Piotr Rak' I'm sorry I haven't gotten around to chasing this sooner. I needed to learn how to use objdump and read the assembler. The code now lets you define the depth. It generates some pretty crappy code even with gcc-4.6.0 on O3. The code when generating a 20 deep template inlines every 6 levels. At 50 depth it only does 2 levels at a time. Someone mentioned taking control of this myself and implementing array lookups. Sure, I could also do binary searches and hash tables when the data numbers aren't continuous. The problem with doing these things is that it takes control away from the compiler's optimizer. Theoretically an optimizer should know best when to go from if/elses to binary lookups to hash tables. I could do this kind of stuff at the metaprogramming level, but that could break an optimizer in the future. It reminds me of all the "if x==0" you see in old Fortran code which made things run faster on old machines were branching was cheap. Theoretically I might know which ids are most probable, but profile feedback optimization could be used for this. Would gcc join switch/case statements? switch(id) { case 1: foo(); break;} switch(id) { case 2: bar(); break;} Another idea to revive the switch/case is to implement a utility with a N switch cases, one for each size. I could maybe use self including headers or generate such code. By the way this kind of code exists. Boost::variant::apply_visitor uses repeated switch/case statements. Changing foo() below to switch/case doesn't seem to help. Chris ------------------ improved code #include <iostream> #include <cstdlib> #include <typeinfo> struct DecodeContext; struct M1{ static const int id=13; static void decode(DecodeContext& ){ std::cout<<"1\n";} }; struct M2{ static const int id=17; static void decode(DecodeContext& ){ std::cout<<"2\n";} }; struct M3{ static const int id=19; static void decode(DecodeContext& ){ std::cout<<"3\n";} }; template<int N> struct M{ static const int id=N*3+1024; static void decode(DecodeContext& ){ std::cout<<typeid(M).name()<<" "<<N<<"!\n";} }; template <int> struct ListMember; template <> struct ListMember<1>{ typedef M1 type; }; template <> struct ListMember<2>{ typedef M2 type; }; template <> struct ListMember<3>{ typedef M3 type; }; template <int N> struct ListMember { typedef M<N> type; }; template<int i> void foo(int id, DecodeContext& dc) { typedef typename ListMember<i>::type M; if(M::id==id) M::decode(dc); else foo<i-1>(id,dc); } template<> void foo<0>(int id, DecodeContext& dc) {} int main(){ DecodeContext& dc= *(DecodeContext*)0;// junk for now int id=std::rand(); // runtime dependent foo<50>(id,dc); return 0; } -----Original Message----- From: Piotr Rak [mailto:piotr.rak@gmail.com] Sent: Wednesday, July 28, 2010 8:17 PM To: Hite, Christopher Cc: Richard Guenther; gcc@gcc.gnu.org Subject: Re: optimization question: mpl Hi, 2010/7/28 Hite, Christopher <Christopher.Hite@partner.commerzbank.com>: >> Generally without knowing the compiler version you are using it is >> hard to tell. > I'll use whatever's best. Right now I'm still on 4.4.3. I'll > probably upgrade soon to 4.5. > >> The same is true without a complete compilable testcase. > I didn't want to make a test case that depends on boost::mpl. How's > this: > > struct DecodeContext; > > struct M1{ > static const int id=1; > static void decode(DecodeContext& ){} }; > > struct M2{ > static const int id=2; > static void decode(DecodeContext& ){} }; > > struct M3{ > static const int id=3; > static void decode(DecodeContext& ){} }; > > > template <int> struct ListMember; > > template <> struct ListMember<1>{ typedef M1 type; }; > template <> struct ListMember<2>{ typedef M2 type; }; > template <> struct ListMember<3>{ typedef M3 type; }; > > template<int i> > void foo(int id, DecodeContext& dc) { > typedef typename ListMember<i>::type M; > if(M::id==id) > M::decode(dc); > else > foo<i-1>(id,dc); > } > > template<> > void foo<0>(int id, DecodeContext& dc) {} > > > > int main(){ > DecodeContext& dc= *(DecodeContext*)0;// junk for now > int id=2; //sometime runtime dependent > foo<3>(id,dc); > return 0; > } > > >> You can use the flatten attribute to tell the compiler to inline all >> calls in a given function, like >> >> void __attribute__((flatten)) foo(void) >> { >> ... >> decode1(); >> ... >> } > > That would cause decode1() to be inlined, which might not be what you > want. > > Hmm maybe I could rewrite things so the switch case returns a function > pointer. I'm guessing that would make things slower though. > Or you could just initialize static array of pointers, like that (please note, that I never compiled code below): typedef void (*pfn) (DeclContext&); tempate <int I> struct InitDispatchHelper { static void init(std::tr1::array<pfn,N>& a) { a[I-1] = ListMemeber<I-1>::foo; InitPointersHelper<I-1>::init(a); } }; // End recursion template<> struct InitDispatchHelper { static void init(std::tr1::array<pfn,N>& a) { /*does nothing */ } }; // Dispatch table; template <int N> struct DispatchFoo : std::tr1::array<pfn, N> { DispatchFoo() : std::tr1::array<pfn, N>() { InitDispatchHelper<N>::init(*this); } }; then your call would look like: static const int X = /* your maximal I in ListMembers meta-class */ void call_foo(int i, DeclContext& c) { static const DispatchFoo<X> foo_dispatch; foo_dispatch[i] (c); } Bonus points for benchmarking both solutions and making it policy class choosing between those two solutions depending on X. Good luck, Piotr ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: optimization question: mpl 2010-07-28 15:26 ` Richard Guenther 2010-07-28 15:53 ` Hite, Christopher @ 2010-07-29 12:16 ` Larry Evans 1 sibling, 0 replies; 9+ messages in thread From: Larry Evans @ 2010-07-29 12:16 UTC (permalink / raw) To: gcc On 07/28/10 10:26, Richard Guenther wrote: [snip] > > You can use the flatten attribute to tell the compiler to inline all > calls in a given function, like > > void __attribute__((flatten)) foo(void) > { > ... > decode1(); > ... > } > > and it will inline decode1() (and recursively all its callees). > [snip] Will this attribute work when the function calls are done by dereferencing const function pointers in a const array as done in the dispatch_funvec shown here: http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/364c0dd5ce512497# or is there maybe some other attribute needed? TIA. -Larry ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: optimization question: mpl 2010-07-28 14:38 optimization question: mpl Hite, Christopher 2010-07-28 15:26 ` Richard Guenther @ 2010-07-28 15:35 ` Larry Evans 1 sibling, 0 replies; 9+ messages in thread From: Larry Evans @ 2010-07-28 15:35 UTC (permalink / raw) To: gcc On 07/28/10 09:37, Hite, Christopher wrote: [snip] > I'm tring to use the template compiler to generate equivalent code. I > have a mpl::list of packet descriptions like this: > struct Packet1{ > static const int id=1; > void decode(); > }; > > I then use boost::mpl::fold to generate a decode function which might > look sort of like this: > > void decode1(int id){ > if(id==1) > Packet1::decode(); > else > decode2(id); > } > [snip] boost::mpl::fold generates types, not functions; hence, I don't understand how it can be used to generate this function. Maybe if you posted the code that generates the function or functions it would be open my eyes. Also, maybe the proposed boost switch library would be some help: http://svn.boost.org/svn/boost/sandbox/switch/ HTH. -Larry BTW, I've asked a somewhat similar question about used a constant vector of function pointers and whether that would be faster than a switch: http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/364c0dd5ce512497# The response was it wouldn't. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2011-10-17 15:58 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2010-07-28 14:38 optimization question: mpl Hite, Christopher 2010-07-28 15:26 ` Richard Guenther 2010-07-28 15:53 ` Hite, Christopher 2010-07-28 17:26 ` Larry Evans 2010-07-28 18:17 ` Piotr Rak 2010-07-28 19:00 ` Larry Evans 2011-10-17 16:35 ` Hite, Christopher 2010-07-29 12:16 ` Larry Evans 2010-07-28 15:35 ` Larry Evans
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).