public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* Propose C Metaprogramming-based qsort as GNU extension
@ 2015-02-16 21:13 Daniel Santos
  2015-04-01 15:27 ` Ondřej Bílka
  0 siblings, 1 reply; 3+ messages in thread
From: Daniel Santos @ 2015-02-16 21:13 UTC (permalink / raw)
  To: libc-alpha; +Cc: carlos

As part of my work on a paper on C Metaprogramming with gcc, I have 
implemented a sqort/insertion sort algorithm based off of the legacy 
glibc implementation (when msort is not used) who's performance far 
exceeds the current implementation. I have opened an enhancement bug 
here https://sourceware.org/bugzilla/show_bug.cgi?id=17941 with details 
about this, and was told I need to talk to you guys.

In short, a basic benchmark shows an average performance increase 83% 
(16k of data, using element sizes from 1 to 8192 bytes). These are 
techniques that I appear to have invented in 2012; I haven't run into 
anybody else doing it just yet, but I hope it will catch on. Currently I 
only know this to work on GCC, but it may be possible that other 
compilers have enough support (in their extensions) to make it possible.

The basic theory of operation is that by exploiting -findirect-inline, 
and attributes always_inline & flatten (in some cases), we force the 
compiler to inline a larger function than would normally be considered 
beneficial -- this is where instantiation of the "pseudo-template 
function" (as I call it) occurs. In this way a more efficient sort 
function is generated, because we know at compile-time the exact size 
and alignment of data elements and we can inline the compar function. We 
can even decide if we'll be doing a direct or indirect sort at 
compile-time, tune stack size, etc. The code its self needs more cleanup 
to be ready to integrate into glibc, although the algorithm appears to 
now be quite efficient and error-free.

The main question is really where it belongs. The Boost project was 
started as a place for experimental libraries, many of which ended up in 
a later C++ standard. As I see it, a similar process must take place 
with C metaprograming, as it provides a powerful tool to improve 
performance in programs, libraries and system-level code. I thought that 
glibc might be a nice place for this because there are already so many 
extensions. This is slightly different however, because we aren't just 
providing new functions, but new metafunctions.

Either way, I would very much like this work to fall under the GNU 
umbrella if at all possible.

Daniel

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Propose C Metaprogramming-based qsort as GNU extension
  2015-02-16 21:13 Propose C Metaprogramming-based qsort as GNU extension Daniel Santos
@ 2015-04-01 15:27 ` Ondřej Bílka
  2015-05-13 15:11   ` Daniel Santos
  0 siblings, 1 reply; 3+ messages in thread
From: Ondřej Bílka @ 2015-04-01 15:27 UTC (permalink / raw)
  To: Daniel Santos; +Cc: libc-alpha, carlos

On Mon, Feb 16, 2015 at 03:15:07PM -0600, Daniel Santos wrote:
> As part of my work on a paper on C Metaprogramming with gcc, I have
> implemented a sqort/insertion sort algorithm based off of the legacy
> glibc implementation (when msort is not used) who's performance far
> exceeds the current implementation. I have opened an enhancement bug
> here https://sourceware.org/bugzilla/show_bug.cgi?id=17941 with
> details about this, and was told I need to talk to you guys.
> 
> In short, a basic benchmark shows an average performance increase
> 83% (16k of data, using element sizes from 1 to 8192 bytes). These
> are techniques that I appear to have invented in 2012; I haven't run
> into anybody else doing it just yet, but I hope it will catch on.
> Currently I only know this to work on GCC, but it may be possible
> that other compilers have enough support (in their extensions) to
> make it possible.
> 
> The basic theory of operation is that by exploiting
> -findirect-inline, and attributes always_inline & flatten (in some
> cases), we force the compiler to inline a larger function than would
> normally be considered beneficial -- this is where instantiation of
> the "pseudo-template function" (as I call it) occurs. In this way a
> more efficient sort function is generated, because we know at
> compile-time the exact size and alignment of data elements and we
> can inline the compar function. We can even decide if we'll be doing
> a direct or indirect sort at compile-time, tune stack size, etc. The
> code its self needs more cleanup to be ready to integrate into
> glibc, although the algorithm appears to now be quite efficient and
> error-free.
> 
> The main question is really where it belongs. The Boost project was
> started as a place for experimental libraries, many of which ended
> up in a later C++ standard. As I see it, a similar process must take
> place with C metaprograming, as it provides a powerful tool to
> improve performance in programs, libraries and system-level code. I
> thought that glibc might be a nice place for this because there are
> already so many extensions. This is slightly different however,
> because we aren't just providing new functions, but new
> metafunctions.
> 
> Either way, I would very much like this work to fall under the GNU
> umbrella if at all possible.
> 
> Daniel

Something like this is on my backlog. You can start with patch with
always_inline qsort template in header surrounded by

# if __GNUC_PREREQ (4, 8)

You do several basic optimizations. I wanted to do more, a hard part
would make gcc recognize different comparison functions. In most cases
radixsort is faster than quick/mergesort.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Propose C Metaprogramming-based qsort as GNU extension
  2015-04-01 15:27 ` Ondřej Bílka
@ 2015-05-13 15:11   ` Daniel Santos
  0 siblings, 0 replies; 3+ messages in thread
From: Daniel Santos @ 2015-05-13 15:11 UTC (permalink / raw)
  To: Ondřej Bílka; +Cc: libc-alpha, carlos

On 04/01/2015 10:26 AM, Ondřej Bílka wrote:
> On Mon, Feb 16, 2015 at 03:15:07PM -0600, Daniel Santos wrote:
>> As part of my work on a paper on C Metaprogramming with gcc, I have
>> implemented a sqort/insertion sort algorithm based off of the legacy
>> glibc implementation (when msort is not used) who's performance far
>> exceeds the current implementation. I have opened an enhancement bug
>> here https://sourceware.org/bugzilla/show_bug.cgi?id=17941 with
>> details about this, and was told I need to talk to you guys.
>>
>> In short, a basic benchmark shows an average performance increase
>> 83% (16k of data, using element sizes from 1 to 8192 bytes). These
>> are techniques that I appear to have invented in 2012; I haven't run
>> into anybody else doing it just yet, but I hope it will catch on.
>> Currently I only know this to work on GCC, but it may be possible
>> that other compilers have enough support (in their extensions) to
>> make it possible.
>>
>> The basic theory of operation is that by exploiting
>> -findirect-inline, and attributes always_inline & flatten (in some
>> cases), we force the compiler to inline a larger function than would
>> normally be considered beneficial -- this is where instantiation of
>> the "pseudo-template function" (as I call it) occurs. In this way a
>> more efficient sort function is generated, because we know at
>> compile-time the exact size and alignment of data elements and we
>> can inline the compar function. We can even decide if we'll be doing
>> a direct or indirect sort at compile-time, tune stack size, etc. The
>> code its self needs more cleanup to be ready to integrate into
>> glibc, although the algorithm appears to now be quite efficient and
>> error-free.
>>
>> The main question is really where it belongs. The Boost project was
>> started as a place for experimental libraries, many of which ended
>> up in a later C++ standard. As I see it, a similar process must take
>> place with C metaprograming, as it provides a powerful tool to
>> improve performance in programs, libraries and system-level code. I
>> thought that glibc might be a nice place for this because there are
>> already so many extensions. This is slightly different however,
>> because we aren't just providing new functions, but new
>> metafunctions.
>>
>> Either way, I would very much like this work to fall under the GNU
>> umbrella if at all possible.
>>
>> Daniel
> Something like this is on my backlog. You can start with patch with
> always_inline qsort template in header surrounded by
>
> # if __GNUC_PREREQ (4, 8)
>
> You do several basic optimizations. I wanted to do more, a hard part
> would make gcc recognize different comparison functions. In most cases
> radixsort is faster than quick/mergesort.

I sincerely apologize for my delayed response, I've had some things come 
up that have arrested my attention for the last month or so.

Please elaborate on the difficulty of getting gcc to recognize different 
comparison functions. At current, in none of my tests have I found a 
comparison function not being fully inlined (via pointer indirection). 
It is true that __attribute__((always_inline)) isn't really guaranteed 
to always inline a function via pointer (at current), but if I 
understand correctly, that is the reason that the flatten attribute was 
added -- exactly to ensure that that is occurs. Obviously, the 
comparison function needs to be in the same translation unit or an 
inefficient function call is made -- and it's not the function call so 
much that tends to be costly, but the inability to perform optimizations 
across the function boundary.

Aside from that, I would love to hear some suggestions on other 
optimizations.  I think I need to finish up my benchmarking scripts to 
be able to more quickly discern performance changes due to differing 
techniques. As far as radix sort vs quick/merge sort, your suggestion 
causes me to reconsider even the naming of the template function, being 
that there's no naming standard that needs being followed, perhaps 
"qmsort_template" would be more descriptive and a separate 
"radixsort_template" could be added.

One final note on the quick/merge sort, one of it's major drawbacks is 
memory consumption and one thing I haven't addressed yet is the static 
stack size the function allocates. This will eventually be controllable 
via the struct qsort_def fields max_count_bits and max_stack_space, 
which don't do anything yet. Another optimization that isn't employed is 
that if a program knows that it will always have less than 2^16 or 2^32 
data elements (assuming 64 bit pointers on the later), stack space can 
be further reduced by using offsets rather than pointers, potentially 
decreasing data cache misses at the expense (perhaps nominal) of using 
indexed addressing.

Daniel

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2015-05-13 15:11 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-16 21:13 Propose C Metaprogramming-based qsort as GNU extension Daniel Santos
2015-04-01 15:27 ` Ondřej Bílka
2015-05-13 15:11   ` Daniel Santos

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).