public inbox for gcc@gcc.gnu.org
* Question about function splitting
From: Hanke Zhang @ 2023-10-02 15:59 UTC
  To: gcc

Hi, I have a question about the strategy and behavior of function
splitting in GCC. Take the following code as an example:

int glob;
void f() {
  if (glob) {
    printf("short path\n");
    return;
  }
  // do lots of expensive things
  // ...
}

I would like it to be split as shown below, so that what remains of
f() is small enough to perhaps be inlined, which is more efficient.

int glob;
void f_part(void);  // declared before its first use

void f() {
  if (glob) {
    printf("short path\n");
    return;
  }
  f_part();
}

void f_part() {
  // do lots of expensive things
  // ...
}


Instead, GCC splits it as shown below, which not only brings no
benefit but may even increase run time, because the extra function
call has a cost of its own.

int glob;
void f_part(void);  // declared before its first use

void f() {
  if (glob) {
    f_part();
    return;
  }
  // do lots of expensive things
  // ...
}

void f_part() {
  printf("short path\n"); // just do this????
}

Are there any options I can pass to GCC to change this behavior? Or
do I need to make changes in ipa-split.cc?
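
For illustration only, the desired split can be written by hand. This is a minimal sketch, not something GCC produces: the noinline attribute on f_part() is an assumption about how one might keep the expensive body out of line, and the expensive_runs counter is a hypothetical stand-in for the real work so that the two paths can be told apart.

```c
#include <stdio.h>

int glob;
int expensive_runs; /* hypothetical counter, only here to make the effect observable */

/* Hand-written counterpart of the desired f_part():
   noinline keeps the expensive body out of line. */
void __attribute__((noinline)) f_part(void) {
  /* stand-in for "lots of expensive things" */
  expensive_runs++;
}

/* Only the cheap check and the short path remain here, so this
   wrapper is a plausible candidate for inlining into its callers. */
void f(void) {
  if (glob) {
    printf("short path\n");
    return;
  }
  f_part();
}
```

Calling f() with glob set to a nonzero value prints the message and leaves expensive_runs untouched; calling it with glob == 0 goes through f_part().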


* Re: Question about function splitting
From: Martin Jambor @ 2023-10-02 16:34 UTC
  To: Hanke Zhang; +Cc: gcc

Hello,

I'd suggest you file a bug in Bugzilla with a specific example that is
mishandled; then we can have a look and discuss what happens, why,
and what can be done about it.

Thanks,

Martin


* Re: Question about function splitting
From: Hanke Zhang @ 2023-10-02 17:13 UTC
  To: Martin Jambor; +Cc: gcc

Hi, thanks for your reply.

I'm trying to create an account right now. In the meantime, here is a
copy of the example code in case someone is interested.

I'm using GCC 12.3.0. If you compile the code below with 'gcc
test.c -O3 -flto -fdump-tree-fnsplit', the GIMPLE dumped by fnsplit
shows the behavior I described above.

#include <stdio.h>
#include <stdlib.h>

int opstatus;
unsigned char *objcode = 0;
unsigned long position = 0;
char *globalfile;

int test_split_write(char *file) {
  FILE *fhd;

  if (!opstatus) {
    // short path here
    printf("Object code generation not active! Forgot to call "
           "quantum_objcode_start?\n");
    return 1;
  }

  if (!file)
    file = globalfile;

  fhd = fopen(file, "w");

  if (fhd == 0)
    return -1;

  fwrite(objcode, position, 1, fhd);

  fclose(fhd);

  int *arr = malloc(1000 * sizeof *arr);  // room for 1000 ints, not 1000 bytes
  for (int i = 0; i < 1000; i++) {
    arr[i] = rand();
  }

  return 0;
}

// to avoid `test_split_write` inlining into main
void __attribute__((noinline)) call() { test_split_write("./txt"); }

int main() {
  opstatus = rand();
  objcode = malloc(100);
  position = 0;
  call();
  return 0;
}


* Re: Question about function splitting
From: Richard Biener @ 2023-10-04  8:17 UTC
  To: Hanke Zhang; +Cc: Martin Jambor, gcc

On Mon, Oct 2, 2023 at 7:15 PM Hanke Zhang via Gcc <gcc@gcc.gnu.org> wrote:
> Hi, thanks for your reply.
>
> I'm trying to create an account right now. In the meantime, here is a
> copy of the example code in case someone is interested.
>
> I'm using GCC 12.3.0. If you compile the code below with 'gcc
> test.c -O3 -flto -fdump-tree-fnsplit', the GIMPLE dumped by fnsplit
> shows the behavior I described above.

I think fnsplit currently splits out _cold_ code; I suppose !opstatus
is predicted to be false most of the time.

It looks like your intent is to inline this very early check as

  if (!opstatus) { test_split_write_1 (..); } else { test_split_write_2 (..); }

to possibly elide that test?  I would guess that IPA-CP is supposed to
do this but eventually refuses to create a clone for this case since
it would be large.

Unfortunately, function splitting doesn't run during IPA transforms,
but maybe IPA-CP can be taught to avoid the expensive clone
by performing what IPA split does in the case where a check in the
entry block that splits control flow can be optimized?

Richard.
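
The idea of splitting out cold code can be sketched by hand with GCC's documented cold function attribute, which marks a function as unlikely to be executed. This is only an illustration under stated assumptions: report_inactive() and write_checked() are hypothetical names, the body of test_split_write() is reduced to a stub, and nothing here claims to reproduce what fnsplit itself emits.

```c
#include <stdio.h>

int opstatus;

/* Hypothetical helper: cold marks it as unlikely to run,
   noinline keeps it out of line. */
int __attribute__((cold, noinline)) report_inactive(void) {
  printf("Object code generation not active!\n");
  return 1;
}

/* Simplified stand-in for test_split_write(): returns 1 on the
   short path and 0 after the (elided) main work. */
int write_checked(void) {
  if (!opstatus)
    return report_inactive();
  /* the fopen/fwrite/fclose work would go here */
  return 0;
}
```

Written this way, the error path lives in its own out-of-line function, which has the same shape as the split shown in the fnsplit dump.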



* Re: Question about function splitting
From: Hanke Zhang @ 2023-10-04 14:22 UTC
  To: Richard Biener; +Cc: Martin Jambor, gcc

But when I change 'opstatus = rand()' to 'opstatus = rand() % 2',
the probability of opstatus being 0 should be 50%, yet the result
remains the same, i.e. the function is still split at that point.

The details are in Bugzilla:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111672
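
One note on the 50% expectation: without profile feedback, GCC estimates branch probabilities from static heuristics, not from the values the program produces at run time, so changing rand() to rand() % 2 would not by itself be expected to change the estimate. A possible way to state the probability in the source is the __builtin_expect_with_probability builtin (available since GCC 9). The sketch below uses a hypothetical function short_path_taken() as a stand-in for the early check; whether fnsplit then decides differently has not been verified here.

```c
int opstatus;

/* Simplified stand-in for the early check in test_split_write().
   The third argument states the assumed probability (0.5) that
   !opstatus evaluates to 1. */
int short_path_taken(void) {
  if (__builtin_expect_with_probability(!opstatus, 1, 0.5))
    return 1; /* short path */
  return 0;   /* long path */
}
```

The builtin only changes the compiler's estimate; the function returns the same values with or without it.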


Thread overview: 5+ messages
2023-10-02 15:59 Question about function splitting Hanke Zhang
2023-10-02 16:34 ` Martin Jambor
2023-10-02 17:13   ` Hanke Zhang
2023-10-04  8:17     ` Richard Biener
2023-10-04 14:22       ` Hanke Zhang