Creating a wrapper around a function at compile time

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* Creating a wrapper around a function at compile time
@ 2022-07-14 12:38 Erick Ochoa
  2022-07-14 13:27 ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Erick Ochoa @ 2022-07-14 12:38 UTC (permalink / raw)
  To: gcc

Hello,

I'm looking for some help in how to create a new function at compile time /
link time. The idea is an alternative form of constant propagation.

The current implementation of ipa-cp, may specialize functions for which
arguments may be known at compile time. Call graph edges from the caller to
the new specialized functions will replace the old call graph edges from
the caller to the original functions. Call graph edges which have no known
compile time constants will still point to the original unspecialized
function.

I would like to explore a different approach to function specialization.
Instead of only specializing functions which are guaranteed to have a
compile time constant, I would like to also attempt to specialize the edges
which do not have compile time constants with a parameter test. In other
words, for call graph edges with non-constant arguments at compile time,
create a wrapper function around the original function and do a switch
statement around parameters.

For example, let's say we have a function mul, which multiplies two
integers.

int
mul (int a, int b) {
  return a * b;
}

Function mul is called from three different callsites in the whole program:

A: mul (a, 2);
B: mul (b, 4);
C: mul (c, d);

At the moment, ipa-cp might specialize mul into 3 different versions:

// unoptimized original mul
int
mul (int a, int b) {
  return a * b;
}

// optimized for b = 2;
int
mul.constprop1 (int a) {
  // DEBUG b => 2
  return a << 1;
}

// optimized for b = 4;
int
mul.constprop2 (int a) {
  // DEBUG b => 4
  return a << 2;
}

and change the callsites to:

A: mul.constprop1 (a);
B: mul.constprop2 (b);
C: mul (c, d);

I would like instead to do the following:

Create a function mul_test_param

int
mul_test_param (int a, int b) {
  switch (b)
  {
    case 2:
      return mul.constprop1 (a);
      break;
    case 4:
      return mul.constprop2 (a);
      break;
    default:
      return mul (a, b);
      break;
  }
}

The function mul_test_param will test each parameter and then call the
specialized function. The callsites can either be changed to:

A: mul.constprop1 (a);
B: mul.constprop2 (b);
C: mul_test_param (c, d);

or

A: mul_test_param (a, 2);
B: mul_test_param (b, 4);
C: mul_test_param (c, d);

The idea is that there exist some class of functions for which the
parameter test and the specialized version is less expensive than the
original function version. And if, at runtime, d might be a quasi-constant
with a good likelihood of being either 2 or 4, then it makes sense to have
this parameter test.

This is very similar to function tests for making direct to indirect
functions and to what could be done in value profiling.

I already know how to achieve most of this, but I have never created a
function from scratch. That is the bit that is challenging to me at the
moment. Any help is appreciated.

Thanks!

-Erick

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Creating a wrapper around a function at compile time
  2022-07-14 12:38 Creating a wrapper around a function at compile time Erick Ochoa
@ 2022-07-14 13:27 ` Richard Biener
  2022-07-14 13:29   ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2022-07-14 13:27 UTC (permalink / raw)
  To: Erick Ochoa; +Cc: GCC Development

On Thu, Jul 14, 2022 at 2:35 PM Erick Ochoa via Gcc <gcc@gcc.gnu.org> wrote:
>
> Hello,
>
> I'm looking for some help in how to create a new function at compile time /
> link time. The idea is an alternative form of constant propagation.
>
> The current implementation of ipa-cp, may specialize functions for which
> arguments may be known at compile time. Call graph edges from the caller to
> the new specialized functions will replace the old call graph edges from
> the caller to the original functions. Call graph edges which have no known
> compile time constants will still point to the original unspecialized
> function.
>
> I would like to explore a different approach to function specialization.
> Instead of only specializing functions which are guaranteed to have a
> compile time constant, I would like to also attempt to specialize the edges
> which do not have compile time constants with a parameter test. In other
> words, for call graph edges with non-constant arguments at compile time,
> create a wrapper function around the original function and do a switch
> statement around parameters.
>
> For example, let's say we have a function mul, which multiplies two
> integers.
>
> int
> mul (int a, int b) {
>   return a * b;
> }
>
> Function mul is called from three different callsites in the whole program:
>
> A: mul (a, 2);
> B: mul (b, 4);
> C: mul (c, d);
>
> At the moment, ipa-cp might specialize mul into 3 different versions:
>
> // unoptimized original mul
> int
> mul (int a, int b) {
>   return a * b;
> }
>
> // optimized for b = 2;
> int
> mul.constprop1 (int a) {
>   // DEBUG b => 2
>   return a << 1;
> }
>
> // optimized for b = 4;
> int
> mul.constprop2 (int a) {
>   // DEBUG b => 4
>   return a << 2;
> }
>
> and change the callsites to:
>
> A: mul.constprop1 (a);
> B: mul.constprop2 (b);
> C: mul (c, d);
>
> I would like instead to do the following:
>
> Create a function mul_test_param
>
> int
> mul_test_param (int a, int b) {
>   switch (b)
>   {
>     case 2:
>       return mul.constprop1 (a);
>       break;
>     case 4:
>       return mul.constprop2 (a);
>       break;
>     default:
>       return mul (a, b);
>       break;
>   }
> }
>
> The function mul_test_param will test each parameter and then call the
> specialized function. The callsites can either be changed to:
>
> A: mul.constprop1 (a);
> B: mul.constprop2 (b);
> C: mul_test_param (c, d);
>
> or
>
> A: mul_test_param (a, 2);
> B: mul_test_param (b, 4);
> C: mul_test_param (c, d);
>
> The idea is that there exist some class of functions for which the
> parameter test and the specialized version is less expensive than the
> original function version. And if, at runtime, d might be a quasi-constant
> with a good likelihood of being either 2 or 4, then it makes sense to have
> this parameter test.
>
> This is very similar to function tests for making direct to indirect
> functions and to what could be done in value profiling.
>
> I already know how to achieve most of this, but I have never created a
> function from scratch. That is the bit that is challenging to me at the
> moment. Any help is appreciated.

So instead of wrapping the function why not transform the original function
to have a prologue doing a runtime check for the compile-time specialized
versions and perform tail-calls to them?

What I'm missing is who would call mul_test_param in your case?

>
> Thanks!
>
> -Erick

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Creating a wrapper around a function at compile time
  2022-07-14 13:27 ` Richard Biener
@ 2022-07-14 13:29   ` Richard Biener
  2022-07-14 13:46     ` Erick Ochoa
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2022-07-14 13:29 UTC (permalink / raw)
  To: Erick Ochoa; +Cc: GCC Development

On Thu, Jul 14, 2022 at 3:27 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, Jul 14, 2022 at 2:35 PM Erick Ochoa via Gcc <gcc@gcc.gnu.org> wrote:
> >
> > Hello,
> >
> > I'm looking for some help in how to create a new function at compile time /
> > link time. The idea is an alternative form of constant propagation.
> >
> > The current implementation of ipa-cp, may specialize functions for which
> > arguments may be known at compile time. Call graph edges from the caller to
> > the new specialized functions will replace the old call graph edges from
> > the caller to the original functions. Call graph edges which have no known
> > compile time constants will still point to the original unspecialized
> > function.
> >
> > I would like to explore a different approach to function specialization.
> > Instead of only specializing functions which are guaranteed to have a
> > compile time constant, I would like to also attempt to specialize the edges
> > which do not have compile time constants with a parameter test. In other
> > words, for call graph edges with non-constant arguments at compile time,
> > create a wrapper function around the original function and do a switch
> > statement around parameters.
> >
> > For example, let's say we have a function mul, which multiplies two
> > integers.
> >
> > int
> > mul (int a, int b) {
> >   return a * b;
> > }
> >
> > Function mul is called from three different callsites in the whole program:
> >
> > A: mul (a, 2);
> > B: mul (b, 4);
> > C: mul (c, d);
> >
> > At the moment, ipa-cp might specialize mul into 3 different versions:
> >
> > // unoptimized original mul
> > int
> > mul (int a, int b) {
> >   return a * b;
> > }
> >
> > // optimized for b = 2;
> > int
> > mul.constprop1 (int a) {
> >   // DEBUG b => 2
> >   return a << 1;
> > }
> >
> > // optimized for b = 4;
> > int
> > mul.constprop2 (int a) {
> >   // DEBUG b => 4
> >   return a << 2;
> > }
> >
> > and change the callsites to:
> >
> > A: mul.constprop1 (a);
> > B: mul.constprop2 (b);
> > C: mul (c, d);
> >
> > I would like instead to do the following:
> >
> > Create a function mul_test_param
> >
> > int
> > mul_test_param (int a, int b) {
> >   switch (b)
> >   {
> >     case 2:
> >       return mul.constprop1 (a);
> >       break;
> >     case 4:
> >       return mul.constprop2 (a);
> >       break;
> >     default:
> >       return mul (a, b);
> >       break;
> >   }
> > }
> >
> > The function mul_test_param will test each parameter and then call the
> > specialized function. The callsites can either be changed to:
> >
> > A: mul.constprop1 (a);
> > B: mul.constprop2 (b);
> > C: mul_test_param (c, d);
> >
> > or
> >
> > A: mul_test_param (a, 2);
> > B: mul_test_param (b, 4);
> > C: mul_test_param (c, d);
> >
> > The idea is that there exist some class of functions for which the
> > parameter test and the specialized version is less expensive than the
> > original function version. And if, at runtime, d might be a quasi-constant
> > with a good likelihood of being either 2 or 4, then it makes sense to have
> > this parameter test.
> >
> > This is very similar to function tests for making direct to indirect
> > functions and to what could be done in value profiling.
> >
> > I already know how to achieve most of this, but I have never created a
> > function from scratch. That is the bit that is challenging to me at the
> > moment. Any help is appreciated.
>
> So instead of wrapping the function why not transform the original function
> to have a prologue doing a runtime check for the compile-time specialized
> versions and perform tail-calls to them?
>
> What I'm missing is who would call mul_test_param in your case?

Following your variant more closely would be doing value profiling
of function parameters and then "speculative IPA-CP".

Richard.

> >
> > Thanks!
> >
> > -Erick

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Creating a wrapper around a function at compile time
  2022-07-14 13:29   ` Richard Biener
@ 2022-07-14 13:46     ` Erick Ochoa
  2022-07-14 13:50       ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Erick Ochoa @ 2022-07-14 13:46 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Development

Hi Richard,

>
> > So instead of wrapping the function why not transform the original
> function
> > to have a prologue doing a runtime check for the compile-time specialized
> > versions and perform tail-calls to them?
> >
> > What I'm missing is who would call mul_test_param in your case?
>

The idea is that all callsites to mul may call instead mul_test_param or
only those for which there is no known compile time constant. If we had
some more information about the distribution of the parameter values at
runtime, then we could even choose not to use the wrapper.

> Following your variant more closely would be doing value profiling
> of function parameters and then "speculative IPA-CP".
>

Yes, the idea of doing value profiling on function parameters certainly
fits this very well. I refrained from calling it "speculative IPA-CP" and
instead called it specialization with a parameter test but the idea I think
is the same. However, the main difference between "speculative IPA-CP" and
this idea is that (if I understand correctly how speculative IPA-CP works)
is that instead of changing the callsite C to the following version:

C: mul_test_param (c, d);

speculative IPA-CP will have the transformation

C: if (a == 2) mul.constprop1 (a)
    else if (a == 4) mul.constprop2 (a)
    else mul (a, b)

Which depending on how many non compile-time constant callsites there are,
will increase the size of the binary. That's why I prefer the wrapper
around the function. But this would be essentially inlining the wrapper.

As of now, the only "speculative IPA-CP" that I've seen is the speculation
on the target of the indirect edge. I could look into exactly how this
mechanism works to try to instead speculate on an arbitrary argument. Do
you think that would be a good first step instead of creating a wrapper and
replacing the edges to the wrapper?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Creating a wrapper around a function at compile time
  2022-07-14 13:46     ` Erick Ochoa
@ 2022-07-14 13:50       ` Richard Biener
  2022-07-14 14:08         ` Erick Ochoa
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2022-07-14 13:50 UTC (permalink / raw)
  To: Erick Ochoa; +Cc: GCC Development

On Thu, Jul 14, 2022 at 3:42 PM Erick Ochoa <eochoa@gcc.gnu.org> wrote:
>
> Hi Richard,
>
>
>> >
>> > So instead of wrapping the function why not transform the original function
>> > to have a prologue doing a runtime check for the compile-time specialized
>> > versions and perform tail-calls to them?
>> >
>> > What I'm missing is who would call mul_test_param in your case?
>
>
> The idea is that all callsites to mul may call instead mul_test_param or only those for which there is no known compile time constant. If we had some more information about the distribution of the parameter values at runtime, then we could even choose not to use the wrapper.
>
>>
>> Following your variant more closely would be doing value profiling
>> of function parameters and then "speculative IPA-CP".
>
>
> Yes, the idea of doing value profiling on function parameters certainly fits this very well. I refrained from calling it "speculative IPA-CP" and instead called it specialization with a parameter test but the idea I think is the same. However, the main difference between "speculative IPA-CP" and this idea is that (if I understand correctly how speculative IPA-CP works) is that instead of changing the callsite C to the following version:
>
> C: mul_test_param (c, d);
>
> speculative IPA-CP will have the transformation
>
> C: if (a == 2) mul.constprop1 (a)
>     else if (a == 4) mul.constprop2 (a)
>     else mul (a, b)
>
> Which depending on how many non compile-time constant callsites there are, will increase the size of the binary. That's why I prefer the wrapper around the function. But this would be essentially inlining the wrapper.

With a value-profile it would be per call site and IPA-CP can use that
to direct the cloning.  I'm not sure how many
values a value histogram can track reliably here.

>
> As of now, the only "speculative IPA-CP" that I've seen is the speculation on the target of the indirect edge. I could look into exactly how this mechanism works to try to instead speculate on an arbitrary argument. Do you think that would be a good first step instead of creating a wrapper and replacing the edges to the wrapper?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Creating a wrapper around a function at compile time
  2022-07-14 13:50       ` Richard Biener
@ 2022-07-14 14:08         ` Erick Ochoa
  2022-07-14 14:10           ` Martin Liška
  0 siblings, 1 reply; 10+ messages in thread
From: Erick Ochoa @ 2022-07-14 14:08 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Development

On Thu, 14 Jul 2022 at 15:51, Richard Biener <richard.guenther@gmail.com>
wrote:

> With a value-profile it would be per call site and IPA-CP can use that
> to direct the cloning.  I'm not sure how many
> values a value histogram can track reliably here.
>

Last time I checked, value profiling can only track a single value per
statement. So that would need to be augmented if we wanted to specialize
for more than one parameter. However, this can also be a bad idea. Because
even if parameters a and b are quasi-constants, perhaps the cross product a
x b follows a more uniform distribution. This means that optimizing on a or
on b might be a good idea but optimizing on a x b is a bad idea. (There is
still some work to be done defining when specializing is a good idea or
not). Also, I do not believe that value profiling profiles arguments?
According to the comments only values regarding division and/or
indirect/virtual call specialization are tracked during value profiling.

However, even if value profiling would be a great addition to this, I would
like to explore the transformation independent of value profiling. There
are some heuristics that could be used, for example looking at the
callsites which do have constant arguments and which optimizations are
enabled by those constants.

You mention that: "IPA-CP can use that to direct the cloning". Can you
elaborate a bit further? I guess the idea here is augmenting ipa-cp with a
list of "likely values" and extend the infrastructure which deals with
speculation to support arguments. Am I following correctly? Any other
suggestions?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Creating a wrapper around a function at compile time
  2022-07-14 14:08         ` Erick Ochoa
@ 2022-07-14 14:10           ` Martin Liška
  2022-07-14 14:25             ` Erick Ochoa
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Liška @ 2022-07-14 14:10 UTC (permalink / raw)
  To: Erick Ochoa, Richard Biener; +Cc: GCC Development

On 7/14/22 16:08, Erick Ochoa via Gcc wrote:
> Last time I checked, value profiling can only track a single value per
> statement.

Hi.

Take a look at HIST_TYPE_INDIR_CALL which we use for tracking at maximum 32
(#define GCOV_TOPN_MAXIMUM_TRACKED_VALUES 32) and we use for indirect call
speculative calls which you can see for instance here:

./gcc/testsuite/g++.dg/tree-prof/indir-call-prof.C

Cheers,
Martin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Creating a wrapper around a function at compile time
  2022-07-14 14:10           ` Martin Liška
@ 2022-07-14 14:25             ` Erick Ochoa
  2022-07-15  8:10               ` Martin Liška
  0 siblings, 1 reply; 10+ messages in thread
From: Erick Ochoa @ 2022-07-14 14:25 UTC (permalink / raw)
  To: Martin Liška; +Cc: Richard Biener, GCC Development

On Thu, 14 Jul 2022 at 16:10, Martin Liška <mliska@suse.cz> wrote:

> On 7/14/22 16:08, Erick Ochoa via Gcc wrote:
> > Last time I checked, value profiling can only track a single value per
> > statement.
>
> Hi.
>
> Take a look at HIST_TYPE_INDIR_CALL which we use for tracking at maximum 32
> (#define GCOV_TOPN_MAXIMUM_TRACKED_VALUES 32) and we use for indirect call
> speculative calls which you can see for instance here:
>
> ./gcc/testsuite/g++.dg/tree-prof/indir-call-prof.C
>

Thanks Martin,

I'll give it a read. However, I have mis-spoken. If my understanding is
correct: multiple values are tracked, but only the values of a single
variable/expression per statement are tracked. That means that for a gcall
(which is a single statement and) which has n argument expressions, I
believe that the naive way to track all argument expressions is not
possible without extending how histograms are associated to statements.
Perhaps canonicalizing how callsites work (i.e., only variables are allowed
as arguments in call sites and then associating a histogram to the
definition of the variables being used in call sites) would be enough, but
I haven't given it much thought for the consequences that might follow from
this.

>
> Cheers,
> Martin
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Creating a wrapper around a function at compile time
  2022-07-14 14:25             ` Erick Ochoa
@ 2022-07-15  8:10               ` Martin Liška
  2022-07-15  8:33                 ` Erick Ochoa
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Liška @ 2022-07-15  8:10 UTC (permalink / raw)
  To: Erick Ochoa; +Cc: Richard Biener, GCC Development

On 7/14/22 16:25, Erick Ochoa wrote:
> 
> 
> On Thu, 14 Jul 2022 at 16:10, Martin Liška <mliska@suse.cz <mailto:mliska@suse.cz>> wrote:
> 
>     On 7/14/22 16:08, Erick Ochoa via Gcc wrote:
>     > Last time I checked, value profiling can only track a single value per
>     > statement.
> 
>     Hi.
> 
>     Take a look at HIST_TYPE_INDIR_CALL which we use for tracking at maximum 32
>     (#define GCOV_TOPN_MAXIMUM_TRACKED_VALUES 32) and we use for indirect call
>     speculative calls which you can see for instance here:
> 
>     ./gcc/testsuite/g++.dg/tree-prof/indir-call-prof.C
> 
> 
> Thanks Martin,
> 
> I'll give it a read. However, I have mis-spoken. If my understanding is correct: multiple values are tracked, but only the values of a single variable/expression per statement are tracked. That means that for a gcall (which is a single statement and) which has n argument expressions, I believe that the naive way to track all argument expressions is not possible without extending how histograms are associated to statements. Perhaps canonicalizing how callsites work (i.e., only variables are allowed as arguments in call sites and then associating a histogram to the definition of the variables being used in call sites) would be enough, but I haven't given it much thought for the consequences that might follow from this.

Yes, you are correct that we track only one type of histogram per each gimple statement:

histogram_value
gimple_histogram_value_of_type (struct function *fun, gimple *stmt,
				enum hist_type type)
{
  histogram_value hist;
  for (hist = gimple_histogram_value (fun, stmt); hist;
       hist = hist->hvalue.next)
    if (hist->type == type)
      return hist;
  return NULL;
}

but that could be theoretically relaxed for your use-case.

Cheers,
Martin

>  
> 
> 
>     Cheers,
>     Martin
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Creating a wrapper around a function at compile time
  2022-07-15  8:10               ` Martin Liška
@ 2022-07-15  8:33                 ` Erick Ochoa
  0 siblings, 0 replies; 10+ messages in thread
From: Erick Ochoa @ 2022-07-15  8:33 UTC (permalink / raw)
  To: Martin Liška; +Cc: Richard Biener, GCC Development

Awesome! Thanks!

Let's go a little bit into the transformation mechanics itself. I am
somewhat familiar with LTO but I am always confused about the
transformations scheduled during WPA.

Let's assume that:

1. we have the profiling pass, which profiles each argument in each
callsite.
2. we are now running in the -fprofile-use so that means that we have
access to the value profiles for each of the arguments
3. there's an IPA/LTO pass which creates edge summaries storing "likely"
values for each of callsite/argument
4. we have a heuristic that let's us decide to some extent which subset of
likely values will yield good results (heavily used and specialization is
greatly beneficial if found) and run it during WPA.

Then, what happens next? I know that the idea is to create some parameter
tests at the callsite and call the respective parameter. The gap in my
understanding is that at this point I am assuming we are in WPA and
therefore have no function bodies to add these parameter tests. Would it be
sufficient to do the following:

1. Store in optimization (edge) summaries the (argument position x value x
specialized function version)
2. During LTRANS actually perform the transformation for the parameter test

Or is there some other way? I am still working on understanding how
ipa_make_edge_direct_to_target with speculative = true adds the edge and
function test during WPA. Otherwise, if we go with the two steps above, the
edges won't be visible to the rest of the IPA optimizations.

I'll be reading the cgraph_edge::make_speculative and the rest of the
infrastructure a bit more closely...

Any help or direction is greatly appreciated. Thanks!

-Erick

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-07-15  8:29 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-14 12:38 Creating a wrapper around a function at compile time Erick Ochoa
2022-07-14 13:27 ` Richard Biener
2022-07-14 13:29   ` Richard Biener
2022-07-14 13:46     ` Erick Ochoa
2022-07-14 13:50       ` Richard Biener
2022-07-14 14:08         ` Erick Ochoa
2022-07-14 14:10           ` Martin Liška
2022-07-14 14:25             ` Erick Ochoa
2022-07-15  8:10               ` Martin Liška
2022-07-15  8:33                 ` Erick Ochoa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).