optimization question: mpl

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* optimization question: mpl
@ 2010-07-28 14:38 Hite, Christopher
  2010-07-28 15:26 ` Richard Guenther
  2010-07-28 15:35 ` Larry Evans
  0 siblings, 2 replies; 9+ messages in thread
From: Hite, Christopher @ 2010-07-28 14:38 UTC (permalink / raw)
  To: gcc

I'm writing a decoder using a meta programming techniques alla
boost::mpl and I'd like to know if I'm asking too much of the compiler.

Basically I've got lots of packet types with different ids.  The
classical way to write these would be a switch case

	int id;
	switch(id){
	case Id1: decode1() break;
	case Id2: decode2() break;
	...
	}

I'm tring to use the template compiler to generate equivalent code.  I
have a mpl::list of packet descriptions like this:
	struct Packet1{
		static const int id=1;
		void decode();
	};

I then use boost::mpl::fold to generate a decode function which might
look sort of like this:

	void decode1(int id){
		if(id==1)
			Packet1::decode();
		else 
			decode2(id);
	}

What I'm hoping is that the compiler is smart enough 
* to inline all decode*() calls into one function
* to notice I compare id to N constants and use switch case optimization
(binary search or lookup table)

Am I asking too much?

Do I have to watch for any pitfalls that willl break the optimization,
like making id a member of a class or leaving out the 'else' ?

I could write more complicated meta-code which sorts by id and does a
runtime binary search, that would probably prevent the optimizer from
doing anything smarter.

Chris Hite

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: optimization question: mpl
  2010-07-28 14:38 optimization question: mpl Hite, Christopher
@ 2010-07-28 15:26 ` Richard Guenther
  2010-07-28 15:53   ` Hite, Christopher
  2010-07-29 12:16   ` Larry Evans
  2010-07-28 15:35 ` Larry Evans
  1 sibling, 2 replies; 9+ messages in thread
From: Richard Guenther @ 2010-07-28 15:26 UTC (permalink / raw)
  To: Hite, Christopher; +Cc: gcc

On Wed, Jul 28, 2010 at 4:37 PM, Hite, Christopher
<Christopher.Hite@partner.commerzbank.com> wrote:
>
>
>
> I'm writing a decoder using a meta programming techniques alla
> boost::mpl and I'd like to know if I'm asking too much of the compiler.
>
> Basically I've got lots of packet types with different ids.  The
> classical way to write these would be a switch case
>
>        int id;
>        switch(id){
>        case Id1: decode1() break;
>        case Id2: decode2() break;
>        ...
>        }
>
> I'm tring to use the template compiler to generate equivalent code.  I
> have a mpl::list of packet descriptions like this:
>        struct Packet1{
>                static const int id=1;
>                void decode();
>        };
>
> I then use boost::mpl::fold to generate a decode function which might
> look sort of like this:
>
>        void decode1(int id){
>                if(id==1)
>                        Packet1::decode();
>                else
>                        decode2(id);
>        }
>
> What I'm hoping is that the compiler is smart enough
> * to inline all decode*() calls into one function
> * to notice I compare id to N constants and use switch case optimization
> (binary search or lookup table)
>
> Am I asking too much?
>
> Do I have to watch for any pitfalls that willl break the optimization,
> like making id a member of a class or leaving out the 'else' ?
>
>
> I could write more complicated meta-code which sorts by id and does a
> runtime binary search, that would probably prevent the optimizer from
> doing anything smarter.

Generally without knowing the compiler version you are using
it is hard to tell.  The same is true without a complete compilable
testcase.

You can use the flatten attribute to tell the compiler to inline all
calls in a given function, like

void __attribute__((flatten)) foo(void)
{
...
decode1();
...
}

and it will inline decode1() (and recursively all its callees).

Richard.

> Chris Hite
>
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: optimization question: mpl
  2010-07-28 14:38 optimization question: mpl Hite, Christopher
  2010-07-28 15:26 ` Richard Guenther
@ 2010-07-28 15:35 ` Larry Evans
  1 sibling, 0 replies; 9+ messages in thread
From: Larry Evans @ 2010-07-28 15:35 UTC (permalink / raw)
  To: gcc

On 07/28/10 09:37, Hite, Christopher wrote:
[snip]
> I'm tring to use the template compiler to generate equivalent code.  I
> have a mpl::list of packet descriptions like this:
> 	struct Packet1{
> 		static const int id=1;
> 		void decode();
> 	};
> 
> I then use boost::mpl::fold to generate a decode function which might
> look sort of like this:
> 
> 	void decode1(int id){
> 		if(id==1)
> 			Packet1::decode();
> 		else 
> 			decode2(id);
> 	}
> 
[snip]
boost::mpl::fold generates types, not functions; hence, I don't
understand how it can be used to generate this function.
Maybe if you posted the code that generates the function or functions
it would be open my eyes.

Also, maybe the proposed boost switch library would be some help:

   http://svn.boost.org/svn/boost/sandbox/switch/

HTH.

-Larry

BTW, I've asked a somewhat similar question about used a
constant vector of function pointers and whether that would
be faster than a switch:

http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/364c0dd5ce512497# 


The response was it wouldn't.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: optimization question: mpl
  2010-07-28 15:26 ` Richard Guenther
@ 2010-07-28 15:53   ` Hite, Christopher
  2010-07-28 17:26     ` Larry Evans
  2010-07-28 18:17     ` Piotr Rak
  2010-07-29 12:16   ` Larry Evans
  1 sibling, 2 replies; 9+ messages in thread
From: Hite, Christopher @ 2010-07-28 15:53 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc

> Generally without knowing the compiler version you are using
> it is hard to tell.  
I'll use whatever's best.  Right now I'm still on 4.4.3.  I'll probably
upgrade soon to 4.5.

> The same is true without a complete compilable testcase.
I didn't want to make a test case that depends on boost::mpl. How's
this:

struct DecodeContext;

struct M1{
	static const int id=1;
	static void decode(DecodeContext& ){}
};

struct M2{
	static const int id=2;
	static void decode(DecodeContext& ){}
};

struct M3{
	static const int id=3;
	static void decode(DecodeContext& ){}
};


template <int> struct ListMember;

template <> struct ListMember<1>{ typedef M1 type; };
template <> struct ListMember<2>{ typedef M2 type; };
template <> struct ListMember<3>{ typedef M3 type; };

template<int i>
void foo(int id, DecodeContext& dc) {
	typedef typename ListMember<i>::type M;
	if(M::id==id)
		M::decode(dc);
	else
		foo<i-1>(id,dc);
}

template<>
void foo<0>(int id, DecodeContext& dc) {}



int main(){
	DecodeContext& dc= *(DecodeContext*)0;// junk for now
	int id=2; //sometime runtime dependent
	foo<3>(id,dc);
	return 0;
}


> You can use the flatten attribute to tell the compiler to inline all
> calls in a given function, like
> 
> void __attribute__((flatten)) foo(void)
> {
> ...
> decode1();
> ...
> }

That would cause decode1() to be inlined, which might not be what you
want.

Hmm maybe I could rewrite things so the switch case returns a function
pointer.  I'm guessing that would make things slower though.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: optimization question: mpl
  2010-07-28 15:53   ` Hite, Christopher
@ 2010-07-28 17:26     ` Larry Evans
  2010-07-28 18:17     ` Piotr Rak
  1 sibling, 0 replies; 9+ messages in thread
From: Larry Evans @ 2010-07-28 17:26 UTC (permalink / raw)
  To: gcc

On 07/28/10 10:53, Hite, Christopher wrote:
[snip]
> struct DecodeContext;
> 
> struct M1{
> 	static const int id=1;
> 	static void decode(DecodeContext& ){}
> };
> 
> struct M2{
> 	static const int id=2;
> 	static void decode(DecodeContext& ){}
> };
> 
> struct M3{
> 	static const int id=3;
> 	static void decode(DecodeContext& ){}
> };
> 
> 
> template <int> struct ListMember;
> 
> template <> struct ListMember<1>{ typedef M1 type; };
> template <> struct ListMember<2>{ typedef M2 type; };
> template <> struct ListMember<3>{ typedef M3 type; };
> 
> template<int i>
> void foo(int id, DecodeContext& dc) {
> 	typedef typename ListMember<i>::type M;
> 	if(M::id==id)
> 		M::decode(dc);
> 	else
> 		foo<i-1>(id,dc);
> }
> 
> template<>
> void foo<0>(int id, DecodeContext& dc) {}
> 
> 
> 
> int main(){
> 	DecodeContext& dc= *(DecodeContext*)0;// junk for now
> 	int id=2; //sometime runtime dependent
> 	foo<3>(id,dc);
> 	return 0;
> }
> 
[snip]
Wouldn't the following simplification work just as well?

template <int Id> struct ListMember{
         static void decode(DecodeContext& ){}
};

template<int i>
void foo(int id, DecodeContext& dc) {
	if(i=id)
                 ListMember<i>::decode(dc);
	else
                 foo<i-1>(id,dc);
}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: optimization question: mpl
  2010-07-28 15:53   ` Hite, Christopher
  2010-07-28 17:26     ` Larry Evans
@ 2010-07-28 18:17     ` Piotr Rak
  2010-07-28 19:00       ` Larry Evans
  2011-10-17 16:35       ` Hite, Christopher
  1 sibling, 2 replies; 9+ messages in thread
From: Piotr Rak @ 2010-07-28 18:17 UTC (permalink / raw)
  To: Hite, Christopher; +Cc: Richard Guenther, gcc

Hi,

2010/7/28 Hite, Christopher <Christopher.Hite@partner.commerzbank.com>:
>> Generally without knowing the compiler version you are using
>> it is hard to tell.
> I'll use whatever's best.  Right now I'm still on 4.4.3.  I'll probably
> upgrade soon to 4.5.
>
>> The same is true without a complete compilable testcase.
> I didn't want to make a test case that depends on boost::mpl. How's
> this:
>
> struct DecodeContext;
>
> struct M1{
>        static const int id=1;
>        static void decode(DecodeContext& ){}
> };
>
> struct M2{
>        static const int id=2;
>        static void decode(DecodeContext& ){}
> };
>
> struct M3{
>        static const int id=3;
>        static void decode(DecodeContext& ){}
> };
>
>
> template <int> struct ListMember;
>
> template <> struct ListMember<1>{ typedef M1 type; };
> template <> struct ListMember<2>{ typedef M2 type; };
> template <> struct ListMember<3>{ typedef M3 type; };
>
> template<int i>
> void foo(int id, DecodeContext& dc) {
>        typedef typename ListMember<i>::type M;
>        if(M::id==id)
>                M::decode(dc);
>        else
>                foo<i-1>(id,dc);
> }
>
> template<>
> void foo<0>(int id, DecodeContext& dc) {}
>
>
>
> int main(){
>        DecodeContext& dc= *(DecodeContext*)0;// junk for now
>        int id=2; //sometime runtime dependent
>        foo<3>(id,dc);
>        return 0;
> }
>
>
>> You can use the flatten attribute to tell the compiler to inline all
>> calls in a given function, like
>>
>> void __attribute__((flatten)) foo(void)
>> {
>> ...
>> decode1();
>> ...
>> }
>
> That would cause decode1() to be inlined, which might not be what you
> want.
>
> Hmm maybe I could rewrite things so the switch case returns a function
> pointer.  I'm guessing that would make things slower though.
>

Or you could just initialize static array of pointers, like that
(please note, that I never compiled code below):

typedef void (*pfn) (DeclContext&);

tempate <int I>
struct InitDispatchHelper
{
  static void init(std::tr1::array<pfn,N>& a)
  {
    a[I-1] = ListMemeber<I-1>::foo;
    InitPointersHelper<I-1>::init(a);
  }
};

// End recursion
template<>
struct InitDispatchHelper
{
  static void init(std::tr1::array<pfn,N>& a)
  {    /*does nothing */  }
};

// Dispatch table;
template <int N>
struct DispatchFoo : std::tr1::array<pfn, N>
{
  DispatchFoo() : std::tr1::array<pfn, N>()
  {
     InitDispatchHelper<N>::init(*this);
  }
};

then your call would look like:

static const int X = /* your maximal I in ListMembers meta-class */

void call_foo(int i, DeclContext& c)
{
  static const DispatchFoo<X> foo_dispatch;
  foo_dispatch[i] (c);
}

Bonus points for benchmarking both solutions and making it policy
class choosing between those two solutions depending on X.

Good luck,

Piotr

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: optimization question: mpl
  2010-07-28 18:17     ` Piotr Rak
@ 2010-07-28 19:00       ` Larry Evans
  2011-10-17 16:35       ` Hite, Christopher
  1 sibling, 0 replies; 9+ messages in thread
From: Larry Evans @ 2010-07-28 19:00 UTC (permalink / raw)
  To: gcc

On 07/28/10 13:16, Piotr Rak wrote:
[snip]
> Or you could just initialize static array of pointers, like that
> (please note, that I never compiled code below):
[snip]
> 
Piotr, something similar was proposed here:

http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/364c0dd5ce512497#

however, the response there indicates the code would be faster with a
switch.  I don't really understand why, especially if the array of
pointers was a constexpr, but then I'm not compiler writer.

-regards,
Larry

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: optimization question: mpl
  2010-07-28 15:26 ` Richard Guenther
  2010-07-28 15:53   ` Hite, Christopher
@ 2010-07-29 12:16   ` Larry Evans
  1 sibling, 0 replies; 9+ messages in thread
From: Larry Evans @ 2010-07-29 12:16 UTC (permalink / raw)
  To: gcc

On 07/28/10 10:26, Richard Guenther wrote:
[snip]
> 
> You can use the flatten attribute to tell the compiler to inline all
> calls in a given function, like
> 
> void __attribute__((flatten)) foo(void)
> {
> ...
> decode1();
> ...
> }
> 
> and it will inline decode1() (and recursively all its callees).
> 
[snip]
Will this attribute work when the function calls are done
by dereferencing const function pointers in a const array
as done in the dispatch_funvec shown here:

http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/364c0dd5ce512497#

or is there maybe some other attribute needed?

TIA.

-Larry




^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: optimization question: mpl
  2010-07-28 18:17     ` Piotr Rak
  2010-07-28 19:00       ` Larry Evans
@ 2011-10-17 16:35       ` Hite, Christopher
  1 sibling, 0 replies; 9+ messages in thread
From: Hite, Christopher @ 2011-10-17 16:35 UTC (permalink / raw)
  To: gcc; +Cc: Richard Guenther, 'Piotr Rak'


I'm sorry I haven't gotten around to chasing this sooner.  I needed to learn how to use objdump and read the assembler.  The code now lets you define the depth.

It generates some pretty crappy code even with gcc-4.6.0 on O3.  The code when generating a 20 deep template inlines every 6 levels.  At 50 depth it only does 2 levels at a time.

Someone mentioned taking control of this myself and implementing array lookups.  Sure, I could also do binary searches and hash tables when the data numbers aren't continuous.  The problem with doing these things is that it takes control away from the compiler's optimizer.  Theoretically an optimizer should know best when to go from if/elses to binary lookups to hash tables.  I could do this kind of stuff at the metaprogramming level, but that could break an optimizer in the future.  It reminds me of all the "if x==0" you see in old Fortran code which made things run faster on old machines were branching was cheap.

Theoretically I might know which ids are most probable, but profile feedback optimization could be used for this.

Would gcc join switch/case statements?
	switch(id) { case 1: foo(); break;}
	switch(id) { case 2: bar(); break;}

Another idea to revive the switch/case is to implement a utility with a N switch cases, one for each size.  I could maybe use self including headers or generate such code.


By the way this kind of code exists.  Boost::variant::apply_visitor uses repeated switch/case statements.  Changing foo() below to switch/case doesn't seem to help.

Chris

------------------ improved code
#include <iostream>
#include <cstdlib>
#include <typeinfo>

struct DecodeContext;

struct M1{
	static const int id=13;
	static void decode(DecodeContext& ){ std::cout<<"1\n";}
};

struct M2{
	static const int id=17;
	static void decode(DecodeContext& ){ std::cout<<"2\n";}
};

struct M3{
	static const int id=19;
	static void decode(DecodeContext& ){ std::cout<<"3\n";}
};


template<int N>
struct M{
	static const int id=N*3+1024;
	static void decode(DecodeContext& ){ std::cout<<typeid(M).name()<<" "<<N<<"!\n";}
};


template <int> struct ListMember;

template <> struct ListMember<1>{ typedef M1 type; };
template <> struct ListMember<2>{ typedef M2 type; };
template <> struct ListMember<3>{ typedef M3 type; };

template <int N> struct ListMember { typedef M<N> type; };



template<int i>
void foo(int id, DecodeContext& dc) {
	typedef typename ListMember<i>::type M;
	if(M::id==id)
		M::decode(dc);
	else
		foo<i-1>(id,dc);
}

template<>
void foo<0>(int id, DecodeContext& dc) {}



int main(){
	DecodeContext& dc= *(DecodeContext*)0;// junk for now
	int id=std::rand(); // runtime dependent
	foo<50>(id,dc);
	return 0;
}

-----Original Message-----
From: Piotr Rak [mailto:piotr.rak@gmail.com] 
Sent: Wednesday, July 28, 2010 8:17 PM
To: Hite, Christopher
Cc: Richard Guenther; gcc@gcc.gnu.org
Subject: Re: optimization question: mpl

Hi,

2010/7/28 Hite, Christopher <Christopher.Hite@partner.commerzbank.com>:
>> Generally without knowing the compiler version you are using it is 
>> hard to tell.
> I'll use whatever's best.  Right now I'm still on 4.4.3.  I'll 
> probably upgrade soon to 4.5.
>
>> The same is true without a complete compilable testcase.
> I didn't want to make a test case that depends on boost::mpl. How's
> this:
>
> struct DecodeContext;
>
> struct M1{
>        static const int id=1;
>        static void decode(DecodeContext& ){} };
>
> struct M2{
>        static const int id=2;
>        static void decode(DecodeContext& ){} };
>
> struct M3{
>        static const int id=3;
>        static void decode(DecodeContext& ){} };
>
>
> template <int> struct ListMember;
>
> template <> struct ListMember<1>{ typedef M1 type; };
> template <> struct ListMember<2>{ typedef M2 type; };
> template <> struct ListMember<3>{ typedef M3 type; };
>
> template<int i>
> void foo(int id, DecodeContext& dc) {
>        typedef typename ListMember<i>::type M;
>        if(M::id==id)
>                M::decode(dc);
>        else
>                foo<i-1>(id,dc);
> }
>
> template<>
> void foo<0>(int id, DecodeContext& dc) {}
>
>
>
> int main(){
>        DecodeContext& dc= *(DecodeContext*)0;// junk for now
>        int id=2; //sometime runtime dependent
>        foo<3>(id,dc);
>        return 0;
> }
>
>
>> You can use the flatten attribute to tell the compiler to inline all
>> calls in a given function, like
>>
>> void __attribute__((flatten)) foo(void)
>> {
>> ...
>> decode1();
>> ...
>> }
>
> That would cause decode1() to be inlined, which might not be what you
> want.
>
> Hmm maybe I could rewrite things so the switch case returns a function
> pointer.  I'm guessing that would make things slower though.
>

Or you could just initialize static array of pointers, like that
(please note, that I never compiled code below):

typedef void (*pfn) (DeclContext&);

tempate <int I>
struct InitDispatchHelper
{
  static void init(std::tr1::array<pfn,N>& a)
  {
    a[I-1] = ListMemeber<I-1>::foo;
    InitPointersHelper<I-1>::init(a);
  }
};

// End recursion
template<>
struct InitDispatchHelper
{
  static void init(std::tr1::array<pfn,N>& a)
  {    /*does nothing */  }
};

// Dispatch table;
template <int N>
struct DispatchFoo : std::tr1::array<pfn, N>
{
  DispatchFoo() : std::tr1::array<pfn, N>()
  {
     InitDispatchHelper<N>::init(*this);
  }
};

then your call would look like:

static const int X = /* your maximal I in ListMembers meta-class */

void call_foo(int i, DeclContext& c)
{
  static const DispatchFoo<X> foo_dispatch;
  foo_dispatch[i] (c);
}

Bonus points for benchmarking both solutions and making it policy
class choosing between those two solutions depending on X.

Good luck,

Piotr

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2011-10-17 15:58 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-28 14:38 optimization question: mpl Hite, Christopher
2010-07-28 15:26 ` Richard Guenther
2010-07-28 15:53   ` Hite, Christopher
2010-07-28 17:26     ` Larry Evans
2010-07-28 18:17     ` Piotr Rak
2010-07-28 19:00       ` Larry Evans
2011-10-17 16:35       ` Hite, Christopher
2010-07-29 12:16   ` Larry Evans
2010-07-28 15:35 ` Larry Evans

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).