public inbox for gcc@gcc.gnu.org
* RFC on a new optimization
@ 2019-07-01 21:58 Gary Oblock
  2019-07-01 22:08 ` Jeff Law
  2019-07-02  9:46 ` Richard Biener
  0 siblings, 2 replies; 7+ messages in thread
From: Gary Oblock @ 2019-07-01 21:58 UTC (permalink / raw)
  To: gcc

I've been looking at trying to optimize the performance of code for
programs that use functions like qsort where a function is passed the
name of a function and some constant parameter(s).

The function qsort itself is an excellent example of what I want to
do, except that it lives in a library, so please ignore that while I
proceed assuming that qsort is not in a library.  In qsort the user
passes in the size of the array elements and a comparison function
name in addition to the location of the array to be sorted. I noticed
that for a given call site the first two are always the same, so why
not create a specialized version of qsort that eliminates them,
internally uses a constant value for the size parameter, and does a
direct call instead of an indirect call? The latter lets the
comparison function code be inlined.
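
To make the idea concrete, here is a minimal sketch in C. It is an
illustration only, not the code from the application mentioned in this
message: sort_generic, cmp_int, sort_int and MAX_ELEM_SIZE are
hypothetical names, and a simple insertion sort stands in for qsort so
that the example stays short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MAX_ELEM_SIZE = 64 };  /* hypothetical limit for the temporary buffer */

/* Generic form: the element size and the comparator arrive as run-time
   arguments, so every comparison is an indirect call and every move is
   a memcpy of a length unknown at compile time.  */
static void
sort_generic (void *base, size_t n, size_t size,
              int (*cmp) (const void *, const void *))
{
  unsigned char tmp[MAX_ELEM_SIZE];
  unsigned char *a = base;

  assert (size > 0 && size <= MAX_ELEM_SIZE);
  for (size_t i = 1; i < n; i++)
    {
      size_t j = i;
      memcpy (tmp, a + i * size, size);
      while (j > 0 && cmp (a + (j - 1) * size, tmp) > 0)
        {
          memcpy (a + j * size, a + (j - 1) * size, size);
          j--;
        }
      memcpy (a + j * size, tmp, size);
    }
}

/* Comparator that a call site would pass to sort_generic.  */
static int
cmp_int (const void *x, const void *y)
{
  int a = *(const int *) x, b = *(const int *) y;
  return (a > b) - (a < b);
}

/* Hand-specialized form of the call
     sort_generic (v, n, sizeof (int), cmp_int);
   the size is the constant sizeof (int) and the comparison is written
   out in place of the indirect call.  */
static void
sort_int (int *v, size_t n)
{
  for (size_t i = 1; i < n; i++)
    {
      int tmp = v[i];
      size_t j = i;
      while (j > 0 && v[j - 1] > tmp)
        {
          v[j] = v[j - 1];
          j--;
        }
      v[j] = tmp;
    }
}
```

Both functions produce the same ordering for an int array; the
specialized one simply has nothing left to decide at run time.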

This seems to me to be a very useful optimization where heavy use is
made of this programming idiom. I saw a 30%+ overall improvement when
I specialized a function like this by hand in an application.

My question is: does anything inside gcc do something similar? I don't
want to reinvent the wheel and I want to do something that plays
nicely with the rest of gcc so it makes it into the real world. Note,
I should mention that I'm an experienced compiler developer and I'm
planning on adding this optimization unless it's obvious from the
ensuing discussion that either it's a bad idea or that it's a matter
of simply tweaking gcc a bit to get this optimization to occur.

Thanks,

Gary Oblock


* Re: RFC on a new optimization
  2019-07-01 21:58 RFC on a new optimization Gary Oblock
@ 2019-07-01 22:08 ` Jeff Law
  2019-07-01 23:02   ` [EXT] " Gary Oblock
  2019-07-02  9:46 ` Richard Biener
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff Law @ 2019-07-01 22:08 UTC (permalink / raw)
  To: Gary Oblock, gcc

Jan is the expert in this space, but yes, GCC has devirtualization and
function specialization.  See ipa-devirt.c and ipa-cp.c.  You can use
the -fdump-ipa-all-details option to produce debugging dumps for the
IPA passes.  That might help guide you a bit.
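
As a rough starting point for reading those dumps, a small test case
along the following lines could be compiled with the option above. This
is only a sketch under assumptions: the file name and all identifiers
(sort_ints, ascending, descending, sort_up, sort_down) are made up for
illustration, and whether any clone is actually created depends on the
GCC version, the optimization level and the cloning heuristics.

```c
/* Hypothetical file name; build with for example:
     gcc -O3 -c -fdump-ipa-all-details ipa-test.c
   and then look through the generated IPA dump files for clones of
   sort_ints.  */
#include <assert.h>
#include <stddef.h>

typedef int (*cmp_fn) (int, int);

static int ascending (int a, int b) { return (a > b) - (a < b); }
static int descending (int a, int b) { return (b > a) - (b < a); }

/* noinline keeps this function out of the inliner, so any
   specialization for a known comparator has to come from cloning.  */
static __attribute__ ((noinline)) void
sort_ints (int *v, size_t n, cmp_fn cmp)
{
  assert (cmp != NULL);
  for (size_t i = 1; i < n; i++)
    {
      int tmp = v[i];
      size_t j = i;
      while (j > 0 && cmp (v[j - 1], tmp) > 0)
        {
          v[j] = v[j - 1];
          j--;
        }
      v[j] = tmp;
    }
}

/* Two call sites, each passing a different compile-time-known
   comparator.  */
void sort_up (int *v, size_t n) { sort_ints (v, n, ascending); }
void sort_down (int *v, size_t n) { sort_ints (v, n, descending); }
```

Each call site passes a comparator that is known at compile time, which
is the situation described in the original message.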


jeff


* Re: [EXT] Re: RFC on a new optimization
  2019-07-01 22:08 ` Jeff Law
@ 2019-07-01 23:02   ` Gary Oblock
  2019-07-01 23:05     ` Jeff Law
  0 siblings, 1 reply; 7+ messages in thread
From: Gary Oblock @ 2019-07-01 23:02 UTC (permalink / raw)
  To: Jeff Law, gcc

Jeff,

I assume you mean Jan Hubicka?

I'll certainly have a look at the code dump you mention. I do have
a high level design in mind already but I'm always up for making
my life easier.

Thanks,

Gary


* Re: [EXT] Re: RFC on a new optimization
  2019-07-01 23:02   ` [EXT] " Gary Oblock
@ 2019-07-01 23:05     ` Jeff Law
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff Law @ 2019-07-01 23:05 UTC (permalink / raw)
  To: Gary Oblock, gcc

On 7/1/19 5:01 PM, Gary Oblock wrote:
> Jeff,
> 
> I assume you mean Jan Hubicka?
Yes.

Jeff


* Re: RFC on a new optimization
  2019-07-01 21:58 RFC on a new optimization Gary Oblock
  2019-07-01 22:08 ` Jeff Law
@ 2019-07-02  9:46 ` Richard Biener
  2019-07-02 10:02   ` Martin Jambor
  2019-07-02 18:04   ` [EXT] " Gary Oblock
  1 sibling, 2 replies; 7+ messages in thread
From: Richard Biener @ 2019-07-02  9:46 UTC (permalink / raw)
  To: Gary Oblock; +Cc: gcc

GCC performs interprocedural constant propagation (IPA-CP) and
this should catch your case already.  The limits on IPA-CP function
cloning (on code bloat) might be too tight for it to apply to a
specific testcase, but all functionality for the qsort case should
be available.

Richard.



* Re: RFC on a new optimization
  2019-07-02  9:46 ` Richard Biener
@ 2019-07-02 10:02   ` Martin Jambor
  2019-07-02 18:04   ` [EXT] " Gary Oblock
  1 sibling, 0 replies; 7+ messages in thread
From: Martin Jambor @ 2019-07-02 10:02 UTC (permalink / raw)
  To: Richard Biener, Gary Oblock; +Cc: gcc

Hi,

On Tue, Jul 02 2019, Richard Biener wrote:
> GCC performs interprocedural constant propagation (IPA-CP) and
> this should catch your case already.  The IPA-CP function cloning
> might have too constrained limits (on code bloat) to apply on a
> specific testcase but all functionality for the qsort case should
> be available.

At least in 505.mcf/605.mcf we do inline the comparator into the qsort
function, and in order to do that, IPA-CP actually creates two clones,
one for each of the two comparators used in the benchmark, see:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149

I'll be happy to see any examples where it fails to do the right thing.

Martin


* Re: [EXT] Re: RFC on a new optimization
  2019-07-02  9:46 ` Richard Biener
  2019-07-02 10:02   ` Martin Jambor
@ 2019-07-02 18:04   ` Gary Oblock
  1 sibling, 0 replies; 7+ messages in thread
From: Gary Oblock @ 2019-07-02 18:04 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc

On 7/2/19 2:45 AM, Richard Biener wrote:
> GCC performs interprocedural constant propagation (IPA-CP) and
> this should catch your case already.  The IPA-CP function cloning
> might have too constrained limits (on code bloat) to apply on a
> specific testcase but all functionality for the qsort case should
> be available.
Richard, I'm planning on using profile-based heuristics
that are fairly conservative. However, I'll also give the user
access to a parameter that relaxes the heuristics to whatever
degree they want.

Gary
