public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Question on patch -fprofile-partial-training
@ 2023-05-03 19:10 Qing Zhao
  2023-05-04  8:30 ` Martin Liška
  0 siblings, 1 reply; 11+ messages in thread
From: Qing Zhao @ 2023-05-03 19:10 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc Patches

Hi, Jan,

You added the following patch into gcc10:

From 34fbe3f0946f88828765184ed6581bda62cdf49f Mon Sep 17 00:00:00 2001
From: Jan Hubicka <hubicka@ucw.cz>
Date: Thu, 5 Dec 2019 19:12:51 +0100
Subject: [PATCH] cgraphclones.c (localize_profile): New function.

        * cgraphclones.c (localize_profile): New function.
        (cgraph_node::create_clone): Use it for partial profiles.
        * common.opt (fprofile-partial-training): New flag.
        * doc/invoke.texi (-fprofile-partial-training): Document.
        * ipa-cp.c (update_profiling_info): For partial profiles do not
        set function profile to zero.
        * profile.c (compute_branch_probabilities): With partial profile
        watch if edge count is zero and turn all probabilities to guessed.
        (compute_branch_probabilities): For partial profiles do not apply
        profile when entry count is zero.
        * tree-profile.c (tree_profiling): Only do value_profile_transformations
        when profile is read.

My question is:

Can this patch be back ported to GCC8 easily? I am wondering any significant
Change between GCC8 and GCC10 that might make the backporting very hard?

Thanks a lot for your help.

Qing

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question on patch -fprofile-partial-training
  2023-05-03 19:10 Question on patch -fprofile-partial-training Qing Zhao
@ 2023-05-04  8:30 ` Martin Liška
  2023-05-04 12:54   ` Qing Zhao
  0 siblings, 1 reply; 11+ messages in thread
From: Martin Liška @ 2023-05-04  8:30 UTC (permalink / raw)
  To: Qing Zhao, Jan Hubicka; +Cc: gcc Patches

On 5/3/23 21:10, Qing Zhao via Gcc-patches wrote:
> Hi, Jan,
> 
> You added the following patch into gcc10:
> 
> From 34fbe3f0946f88828765184ed6581bda62cdf49f Mon Sep 17 00:00:00 2001
> From: Jan Hubicka <hubicka@ucw.cz>
> Date: Thu, 5 Dec 2019 19:12:51 +0100
> Subject: [PATCH] cgraphclones.c (localize_profile): New function.
> 
>         * cgraphclones.c (localize_profile): New function.
>         (cgraph_node::create_clone): Use it for partial profiles.
>         * common.opt (fprofile-partial-training): New flag.
>         * doc/invoke.texi (-fprofile-partial-training): Document.
>         * ipa-cp.c (update_profiling_info): For partial profiles do not
>         set function profile to zero.
>         * profile.c (compute_branch_probabilities): With partial profile
>         watch if edge count is zero and turn all probabilities to guessed.
>         (compute_branch_probabilities): For partial profiles do not apply
>         profile when entry count is zero.
>         * tree-profile.c (tree_profiling): Only do value_profile_transformations
>         when profile is read.
> 
> My question is:

Hello.

Why would anybody backport such change to unsupported code-stream of GCC 8?
Generally speaking, I discourage from doing that.

Martin

> 
> Can this patch be back ported to GCC8 easily? I am wondering any significant
> Change between GCC8 and GCC10 that might make the backporting very hard> 
> Thanks a lot for your help.
> 
> Qing


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question on patch -fprofile-partial-training
  2023-05-04  8:30 ` Martin Liška
@ 2023-05-04 12:54   ` Qing Zhao
  2023-05-04 13:05     ` Martin Liška
  0 siblings, 1 reply; 11+ messages in thread
From: Qing Zhao @ 2023-05-04 12:54 UTC (permalink / raw)
  To: Martin Liška; +Cc: Jan Hubicka, gcc Patches



> On May 4, 2023, at 4:30 AM, Martin Liška <mliska@suse.cz> wrote:
> 
> On 5/3/23 21:10, Qing Zhao via Gcc-patches wrote:
>> Hi, Jan,
>> 
>> You added the following patch into gcc10:
>> 
>> From 34fbe3f0946f88828765184ed6581bda62cdf49f Mon Sep 17 00:00:00 2001
>> From: Jan Hubicka <hubicka@ucw.cz>
>> Date: Thu, 5 Dec 2019 19:12:51 +0100
>> Subject: [PATCH] cgraphclones.c (localize_profile): New function.
>> 
>>        * cgraphclones.c (localize_profile): New function.
>>        (cgraph_node::create_clone): Use it for partial profiles.
>>        * common.opt (fprofile-partial-training): New flag.
>>        * doc/invoke.texi (-fprofile-partial-training): Document.
>>        * ipa-cp.c (update_profiling_info): For partial profiles do not
>>        set function profile to zero.
>>        * profile.c (compute_branch_probabilities): With partial profile
>>        watch if edge count is zero and turn all probabilities to guessed.
>>        (compute_branch_probabilities): For partial profiles do not apply
>>        profile when entry count is zero.
>>        * tree-profile.c (tree_profiling): Only do value_profile_transformations
>>        when profile is read.
>> 
>> My question is:
> 
> Hello.
> 
> Why would anybody backport such change to unsupported code-stream of GCC 8?
> Generally speaking, I discourage from doing that.

Yes, I agree.
However, many users still use GCC8 right now, and some of them are asking for more performance
from PGO recently. That’s the reason I am studying this right now. 

From my understanding, -fprofile-partial-training is one important option for PGO performance. I’d like
to see any big technique difficult to prevent it from being back ported to GCC8. 

Thanks.

Qing

> 
> Martin
> 
>> 
>> Can this patch be back ported to GCC8 easily? I am wondering any significant
>> Change between GCC8 and GCC10 that might make the backporting very hard> 
>> Thanks a lot for your help.
>> 
>> Qing


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question on patch -fprofile-partial-training
  2023-05-04 12:54   ` Qing Zhao
@ 2023-05-04 13:05     ` Martin Liška
  2023-05-04 13:37       ` Qing Zhao
  0 siblings, 1 reply; 11+ messages in thread
From: Martin Liška @ 2023-05-04 13:05 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Jan Hubicka, gcc Patches

On 5/4/23 14:54, Qing Zhao wrote:
> 
> 
>> On May 4, 2023, at 4:30 AM, Martin Liška <mliska@suse.cz> wrote:
>>
>> On 5/3/23 21:10, Qing Zhao via Gcc-patches wrote:
>>> Hi, Jan,
>>>
>>> You added the following patch into gcc10:
>>>
>>> From 34fbe3f0946f88828765184ed6581bda62cdf49f Mon Sep 17 00:00:00 2001
>>> From: Jan Hubicka <hubicka@ucw.cz>
>>> Date: Thu, 5 Dec 2019 19:12:51 +0100
>>> Subject: [PATCH] cgraphclones.c (localize_profile): New function.
>>>
>>>        * cgraphclones.c (localize_profile): New function.
>>>        (cgraph_node::create_clone): Use it for partial profiles.
>>>        * common.opt (fprofile-partial-training): New flag.
>>>        * doc/invoke.texi (-fprofile-partial-training): Document.
>>>        * ipa-cp.c (update_profiling_info): For partial profiles do not
>>>        set function profile to zero.
>>>        * profile.c (compute_branch_probabilities): With partial profile
>>>        watch if edge count is zero and turn all probabilities to guessed.
>>>        (compute_branch_probabilities): For partial profiles do not apply
>>>        profile when entry count is zero.
>>>        * tree-profile.c (tree_profiling): Only do value_profile_transformations
>>>        when profile is read.
>>>
>>> My question is:
>>
>> Hello.
>>
>> Why would anybody backport such change to unsupported code-stream of GCC 8?
>> Generally speaking, I discourage from doing that.
> 
> Yes, I agree.
> However, many users still use GCC8 right now, and some of them are asking for more performance
> from PGO recently. That’s the reason I am studying this right now. 

I understand there are products that are based on GCC8, but as the branch is officially unsupported, I don't
see a reason to backport a new feature from newer release. It's just asking for troubles. If your clients are
interested in more performance, then they should use a recent supported release.

> 
> From my understanding, -fprofile-partial-training is one important option for PGO performance.

I don't think so, speed benefit would be rather small I guess.

> I’d like
> to see any big technique difficult to prevent it from being back ported to GCC8. 

There might be of course some patch dependencies and I don't see a point why should we waste
time with that.

Cheers,
Martin

> 
> Thanks.
> 
> Qing
> 
>>
>> Martin
>>
>>>
>>> Can this patch be back ported to GCC8 easily? I am wondering any significant
>>> Change between GCC8 and GCC10 that might make the backporting very hard> 
>>> Thanks a lot for your help.
>>>
>>> Qing
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question on patch -fprofile-partial-training
  2023-05-04 13:05     ` Martin Liška
@ 2023-05-04 13:37       ` Qing Zhao
  2023-05-09 10:06         ` Martin Liška
  0 siblings, 1 reply; 11+ messages in thread
From: Qing Zhao @ 2023-05-04 13:37 UTC (permalink / raw)
  To: Martin Liška; +Cc: Jan Hubicka, gcc Patches



> On May 4, 2023, at 9:05 AM, Martin Liška <mliska@suse.cz> wrote:
> 
> On 5/4/23 14:54, Qing Zhao wrote:
>> 
>> 
>>> On May 4, 2023, at 4:30 AM, Martin Liška <mliska@suse.cz> wrote:
>>> 
>>> On 5/3/23 21:10, Qing Zhao via Gcc-patches wrote:
>>>> Hi, Jan,
>>>> 
>>>> You added the following patch into gcc10:
>>>> 
>>>> From 34fbe3f0946f88828765184ed6581bda62cdf49f Mon Sep 17 00:00:00 2001
>>>> From: Jan Hubicka <hubicka@ucw.cz>
>>>> Date: Thu, 5 Dec 2019 19:12:51 +0100
>>>> Subject: [PATCH] cgraphclones.c (localize_profile): New function.
>>>> 
>>>>       * cgraphclones.c (localize_profile): New function.
>>>>       (cgraph_node::create_clone): Use it for partial profiles.
>>>>       * common.opt (fprofile-partial-training): New flag.
>>>>       * doc/invoke.texi (-fprofile-partial-training): Document.
>>>>       * ipa-cp.c (update_profiling_info): For partial profiles do not
>>>>       set function profile to zero.
>>>>       * profile.c (compute_branch_probabilities): With partial profile
>>>>       watch if edge count is zero and turn all probabilities to guessed.
>>>>       (compute_branch_probabilities): For partial profiles do not apply
>>>>       profile when entry count is zero.
>>>>       * tree-profile.c (tree_profiling): Only do value_profile_transformations
>>>>       when profile is read.
>>>> 
>>>> My question is:
>>> 
>>> Hello.
>>> 
>>> Why would anybody backport such change to unsupported code-stream of GCC 8?
>>> Generally speaking, I discourage from doing that.
>> 
>> Yes, I agree.
>> However, many users still use GCC8 right now, and some of them are asking for more performance
>> from PGO recently. That’s the reason I am studying this right now. 
> 
> I understand there are products that are based on GCC8, but as the branch is officially unsupported, I don't
> see a reason to backport a new feature from newer release. It's just asking for troubles. If your clients are
> interested in more performance, then they should use a recent supported release.
We are trying to persuade them to use newer GCC, but it’s quite hard...
> 
>> 
>> From my understanding, -fprofile-partial-training is one important option for PGO performance.
> 
> I don't think so, speed benefit would be rather small I guess.
I saw some articles online to introduce this option for gcc10, 
https://documentation.suse.com/sbp/all/html/SBP-GCC-10/index.html#sec-gcc10-pgo
And also based on my previous experience in Studio compiler, I guess that this one might have
Some good performance impact on PGO.  Is there any old performance data on this option? (I cannot find online)

thanks.

Qing

> 
>> I’d like
>> to see any big technique difficult to prevent it from being back ported to GCC8. 
> 
> There might be of course some patch dependencies and I don't see a point why should we waste
> time with that.
> 
> Cheers,
> Martin
> 
>> 
>> Thanks.
>> 
>> Qing
>> 
>>> 
>>> Martin
>>> 
>>>> 
>>>> Can this patch be back ported to GCC8 easily? I am wondering any significant
>>>> Change between GCC8 and GCC10 that might make the backporting very hard> 
>>>> Thanks a lot for your help.
>>>> 
>>>> Qing


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question on patch -fprofile-partial-training
  2023-05-04 13:37       ` Qing Zhao
@ 2023-05-09 10:06         ` Martin Liška
  2023-05-09 10:22           ` Jan Hubicka
  0 siblings, 1 reply; 11+ messages in thread
From: Martin Liška @ 2023-05-09 10:06 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Jan Hubicka, gcc Patches, Martin Jambor

On 5/4/23 15:37, Qing Zhao wrote:
> 
> 
>> On May 4, 2023, at 9:05 AM, Martin Liška <mliska@suse.cz> wrote:
>>
>> On 5/4/23 14:54, Qing Zhao wrote:
>>>
>>>
>>>> On May 4, 2023, at 4:30 AM, Martin Liška <mliska@suse.cz> wrote:
>>>>
>>>> On 5/3/23 21:10, Qing Zhao via Gcc-patches wrote:
>>>>> Hi, Jan,
>>>>>
>>>>> You added the following patch into gcc10:
>>>>>
>>>>>  From 34fbe3f0946f88828765184ed6581bda62cdf49f Mon Sep 17 00:00:00 2001
>>>>> From: Jan Hubicka <hubicka@ucw.cz>
>>>>> Date: Thu, 5 Dec 2019 19:12:51 +0100
>>>>> Subject: [PATCH] cgraphclones.c (localize_profile): New function.
>>>>>
>>>>>        * cgraphclones.c (localize_profile): New function.
>>>>>        (cgraph_node::create_clone): Use it for partial profiles.
>>>>>        * common.opt (fprofile-partial-training): New flag.
>>>>>        * doc/invoke.texi (-fprofile-partial-training): Document.
>>>>>        * ipa-cp.c (update_profiling_info): For partial profiles do not
>>>>>        set function profile to zero.
>>>>>        * profile.c (compute_branch_probabilities): With partial profile
>>>>>        watch if edge count is zero and turn all probabilities to guessed.
>>>>>        (compute_branch_probabilities): For partial profiles do not apply
>>>>>        profile when entry count is zero.
>>>>>        * tree-profile.c (tree_profiling): Only do value_profile_transformations
>>>>>        when profile is read.
>>>>>
>>>>> My question is:
>>>>
>>>> Hello.
>>>>
>>>> Why would anybody backport such change to unsupported code-stream of GCC 8?
>>>> Generally speaking, I discourage from doing that.
>>>
>>> Yes, I agree.
>>> However, many users still use GCC8 right now, and some of them are asking for more performance
>>> from PGO recently. That’s the reason I am studying this right now.
>>
>> I understand there are products that are based on GCC8, but as the branch is officially unsupported, I don't
>> see a reason to backport a new feature from newer release. It's just asking for troubles. If your clients are
>> interested in more performance, then they should use a recent supported release.
> We are trying to persuade them to use newer GCC, but it’s quite hard...
>>
>>>
>>>  From my understanding, -fprofile-partial-training is one important option for PGO performance.
>>
>> I don't think so, speed benefit would be rather small I guess.
> I saw some articles online to introduce this option for gcc10,
> https://documentation.suse.com/sbp/all/html/SBP-GCC-10/index.html#sec-gcc10-pgo

Hi.

Ah, I see.

> And also based on my previous experience in Studio compiler, I guess that this one might have
> Some good performance impact on PGO.  Is there any old performance data on this option? (I cannot find online)

Maybe Honza can chime in here? Or Martin who is the author of the white paper.

Martin

> 
> thanks.
> 
> Qing
> 
>>
>>> I’d like
>>> to see any big technique difficult to prevent it from being back ported to GCC8.
>>
>> There might be of course some patch dependencies and I don't see a point why should we waste
>> time with that.
>>
>> Cheers,
>> Martin
>>
>>>
>>> Thanks.
>>>
>>> Qing
>>>
>>>>
>>>> Martin
>>>>
>>>>>
>>>>> Can this patch be back ported to GCC8 easily? I am wondering any significant
>>>>> Change between GCC8 and GCC10 that might make the backporting very hard>
>>>>> Thanks a lot for your help.
>>>>>
>>>>> Qing
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question on patch -fprofile-partial-training
  2023-05-09 10:06         ` Martin Liška
@ 2023-05-09 10:22           ` Jan Hubicka
  2023-05-09 20:11             ` Qing Zhao
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Hubicka @ 2023-05-09 10:22 UTC (permalink / raw)
  To: Martin Liška; +Cc: Qing Zhao, gcc Patches, Martin Jambor

> > > > 
> > > >  From my understanding, -fprofile-partial-training is one important option for PGO performance.
> > > 
> > > I don't think so, speed benefit would be rather small I guess.
> > I saw some articles online to introduce this option for gcc10,
> > https://documentation.suse.com/sbp/all/html/SBP-GCC-10/index.html#sec-gcc10-pgo
> 
> Hi.
> 
> Ah, I see.
> 
> > And also based on my previous experience in Studio compiler, I guess that this one might have
> > Some good performance impact on PGO.  Is there any old performance data on this option? (I cannot find online)
> 
> Maybe Honza can chime in here? Or Martin who is the author of the white paper.

Main motivation for this was profiling programs that contain specific
code paths for different CPUs (such as graphics library in Firefox or Linux
kernel). In the situation training machine differs from the machine
program is run later, we end up optimizing for size all code paths
except ones taken by the specific CPU.  This patch essentially tells gcc
to consider every non-trained function as built without profile
feedback.

For Firefox it had important impact on graphics rendering tests back
then since the building machined had AVX while the benchmarking did not.
Some benchmarks improved several times which is not a surprise if you
consider tight graphics rendering loop optimized for size versus
vectorized one.  The patch has bad effect on code size which in turn
impacts performance too, so I think it makes sense to use
-fprofile-partial-training with bit of care (i.e. only one code where
such scenarios are likely).

As for backporting, I do not have checkout of GCC 8 right now. It
depends on profile infrastructure that was added in 2017 (so stage1 of
GCC 8), so the patch may backport quite easilly.  I am not 100% sure
what shape the infrastrucure was in the first version, but I am quite
convinced it had the necessary bits - it was able to make the difference
between 0 profile count and missing profile feedback.

Honza
> 
> Martin
> 
> > 
> > thanks.
> > 
> > Qing
> > 
> > > 
> > > > I’d like
> > > > to see any big technique difficult to prevent it from being back ported to GCC8.
> > > 
> > > There might be of course some patch dependencies and I don't see a point why should we waste
> > > time with that.
> > > 
> > > Cheers,
> > > Martin
> > > 
> > > > 
> > > > Thanks.
> > > > 
> > > > Qing
> > > > 
> > > > > 
> > > > > Martin
> > > > > 
> > > > > > 
> > > > > > Can this patch be back ported to GCC8 easily? I am wondering any significant
> > > > > > Change between GCC8 and GCC10 that might make the backporting very hard>
> > > > > > Thanks a lot for your help.
> > > > > > 
> > > > > > Qing
> > 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question on patch -fprofile-partial-training
  2023-05-09 10:22           ` Jan Hubicka
@ 2023-05-09 20:11             ` Qing Zhao
  2023-05-10 13:15               ` Jan Hubicka
  0 siblings, 1 reply; 11+ messages in thread
From: Qing Zhao @ 2023-05-09 20:11 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Martin Liška, gcc Patches, Martin Jambor

Honza,

Thanks a lot for your comments. 

> On May 9, 2023, at 6:22 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> 
>>>>> 
>>>>> From my understanding, -fprofile-partial-training is one important option for PGO performance.
>>>> 
>>>> I don't think so, speed benefit would be rather small I guess.
>>> I saw some articles online to introduce this option for gcc10,
>>> https://documentation.suse.com/sbp/all/html/SBP-GCC-10/index.html#sec-gcc10-pgo
>> 
>> Hi.
>> 
>> Ah, I see.
>> 
>>> And also based on my previous experience in Studio compiler, I guess that this one might have
>>> Some good performance impact on PGO.  Is there any old performance data on this option? (I cannot find online)
>> 
>> Maybe Honza can chime in here? Or Martin who is the author of the white paper.
> 
> Main motivation for this was profiling programs that contain specific
> code paths for different CPUs (such as graphics library in Firefox or Linux
> kernel). In the situation training machine differs from the machine
> program is run later, we end up optimizing for size all code paths
> except ones taken by the specific CPU.  This patch essentially tells gcc
> to consider every non-trained function as built without profile
> feedback.
Make sense.
> 
> For Firefox it had important impact on graphics rendering tests back
> then since the building machined had AVX while the benchmarking did not.
> Some benchmarks improved several times which is not a surprise if you
> consider tight graphics rendering loop optimized for size versus
> vectorized one.  

That’s a lot of improvement. So, without -fprofile-partial-training, the PGO hurt the performance for those cases? 

> The patch has bad effect on code size which in turn
> impacts performance too, so I think it makes sense to use
> -fprofile-partial-training with bit of care (i.e. only one code where
> such scenarios are likely).

Right. 
> 
> As for backporting, I do not have checkout of GCC 8 right now. It
> depends on profile infrastructure that was added in 2017 (so stage1 of
> GCC 8), so the patch may backport quite easilly.  I am not 100% sure
> what shape the infrastrucure was in the first version, but I am quite
> convinced it had the necessary bits - it was able to make the difference
> between 0 profile count and missing profile feedback.

This is good to know, I will try to back port to GCC8 and let them test to see any good impact.

Qing
> 
> Honza
>> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question on patch -fprofile-partial-training
  2023-05-09 20:11             ` Qing Zhao
@ 2023-05-10 13:15               ` Jan Hubicka
  2023-05-11 16:08                 ` Qing Zhao
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Hubicka @ 2023-05-10 13:15 UTC (permalink / raw)
  To: Qing Zhao; +Cc: Martin Liška, gcc Patches, Martin Jambor

> Honza,
> > Main motivation for this was profiling programs that contain specific
> > code paths for different CPUs (such as graphics library in Firefox or Linux
> > kernel). In the situation training machine differs from the machine
> > program is run later, we end up optimizing for size all code paths
> > except ones taken by the specific CPU.  This patch essentially tells gcc
> > to consider every non-trained function as built without profile
> > feedback.
> Make sense.
> > 
> > For Firefox it had important impact on graphics rendering tests back
> > then since the building machined had AVX while the benchmarking did not.
> > Some benchmarks improved several times which is not a surprise if you
> > consider tight graphics rendering loop optimized for size versus
> > vectorized one.  
> 
> That’s a lot of improvement. So, without -fprofile-partial-training, the PGO hurt the performance for those cases? 

Yes, to get code size improvements we assume that the non-trained part
of code is cold and with -Os we are very aggressive to optimize for
size.  We now have two-level optimize_for size, so I think we could
make this more fine grained this stage1.

Honza
> 
> > The patch has bad effect on code size which in turn
> > impacts performance too, so I think it makes sense to use
> > -fprofile-partial-training with bit of care (i.e. only one code where
> > such scenarios are likely).
> 
> Right. 
> > 
> > As for backporting, I do not have checkout of GCC 8 right now. It
> > depends on profile infrastructure that was added in 2017 (so stage1 of
> > GCC 8), so the patch may backport quite easilly.  I am not 100% sure
> > what shape the infrastrucure was in the first version, but I am quite
> > convinced it had the necessary bits - it was able to make the difference
> > between 0 profile count and missing profile feedback.
> 
> This is good to know, I will try to back port to GCC8 and let them test to see any good impact.
> 
> Qing
> > 
> > Honza
> >> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question on patch -fprofile-partial-training
  2023-05-10 13:15               ` Jan Hubicka
@ 2023-05-11 16:08                 ` Qing Zhao
  2023-05-15 18:14                   ` Qing Zhao
  0 siblings, 1 reply; 11+ messages in thread
From: Qing Zhao @ 2023-05-11 16:08 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Martin Liška, gcc Patches, Martin Jambor



> On May 10, 2023, at 9:15 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> 
>> Honza,
>>> Main motivation for this was profiling programs that contain specific
>>> code paths for different CPUs (such as graphics library in Firefox or Linux
>>> kernel). In the situation training machine differs from the machine
>>> program is run later, we end up optimizing for size all code paths
>>> except ones taken by the specific CPU.  This patch essentially tells gcc
>>> to consider every non-trained function as built without profile
>>> feedback.
>> Make sense.
>>> 
>>> For Firefox it had important impact on graphics rendering tests back
>>> then since the building machined had AVX while the benchmarking did not.
>>> Some benchmarks improved several times which is not a surprise if you
>>> consider tight graphics rendering loop optimized for size versus
>>> vectorized one.  
>> 
>> That’s a lot of improvement. So, without -fprofile-partial-training, the PGO hurt the performance for those cases? 
> 
> Yes, to get code size improvements we assume that the non-trained part
> of code is cold and with -Os we are very aggressive to optimize for
> size.  We now have two-level optimize_for size, so I think we could
> make this more fine grained this stage1.

Okay. I see. 

Thanks a lot for the info.

Another question (which is confusing us very much right now is):

When we lower the following  parameter from 999 to 950: (in GCC8)

DEFPARAM(HOT_BB_COUNT_WS_PERMILLE,
         "hot-bb-count-ws-permille",
         "A basic block profile count is considered hot if it contributes to "
         "the given permillage of the entire profiled execution.”
         999, 0, 1000)

The size of the “text.hot" section is 4x times SMALLER than the default one. Is this expected behavior? 
(From my reading of the GCC8 source code, when this parameter is getting smaller, more basic blocks and functions will
Be considered as HOT by GCC, then the text.hot section should be larger, not smaller, do I miss anything here?)

Thanks a lot for your help.

Qing

> 
> Honza
>> 
>>> The patch has bad effect on code size which in turn
>>> impacts performance too, so I think it makes sense to use
>>> -fprofile-partial-training with bit of care (i.e. only one code where
>>> such scenarios are likely).
>> 
>> Right. 
>>> 
>>> As for backporting, I do not have checkout of GCC 8 right now. It
>>> depends on profile infrastructure that was added in 2017 (so stage1 of
>>> GCC 8), so the patch may backport quite easilly.  I am not 100% sure
>>> what shape the infrastrucure was in the first version, but I am quite
>>> convinced it had the necessary bits - it was able to make the difference
>>> between 0 profile count and missing profile feedback.
>> 
>> This is good to know, I will try to back port to GCC8 and let them test to see any good impact.
>> 
>> Qing
>>> 
>>> Honza


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question on patch -fprofile-partial-training
  2023-05-11 16:08                 ` Qing Zhao
@ 2023-05-15 18:14                   ` Qing Zhao
  0 siblings, 0 replies; 11+ messages in thread
From: Qing Zhao @ 2023-05-15 18:14 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Martin Liška, gcc Patches, Martin Jambor



> On May 11, 2023, at 12:08 PM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> 
> 
>> On May 10, 2023, at 9:15 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> 
>>> Honza,
>>>> Main motivation for this was profiling programs that contain specific
>>>> code paths for different CPUs (such as graphics library in Firefox or Linux
>>>> kernel). In the situation training machine differs from the machine
>>>> program is run later, we end up optimizing for size all code paths
>>>> except ones taken by the specific CPU.  This patch essentially tells gcc
>>>> to consider every non-trained function as built without profile
>>>> feedback.
>>> Make sense.
>>>> 
>>>> For Firefox it had important impact on graphics rendering tests back
>>>> then since the building machined had AVX while the benchmarking did not.
>>>> Some benchmarks improved several times which is not a surprise if you
>>>> consider tight graphics rendering loop optimized for size versus
>>>> vectorized one.  
>>> 
>>> That’s a lot of improvement. So, without -fprofile-partial-training, the PGO hurt the performance for those cases? 
>> 
>> Yes, to get code size improvements we assume that the non-trained part
>> of code is cold and with -Os we are very aggressive to optimize for
>> size.  We now have two-level optimize_for size, so I think we could
>> make this more fine grained this stage1.
> 
> Okay. I see. 
> 
> Thanks a lot for the info.
> 
> Another question (which is confusing us very much right now is):
> 
> When we lower the following  parameter from 999 to 950: (in GCC8)
> 
> DEFPARAM(HOT_BB_COUNT_WS_PERMILLE,
>         "hot-bb-count-ws-permille",
>         "A basic block profile count is considered hot if it contributes to "
>         "the given permillage of the entire profiled execution.”
>         999, 0, 1000)
> 
> The size of the “text.hot" section is 4x times SMALLER than the default one. Is this expected behavior? 

As my further study of GCC8, yes, this is the expected behavior. -:).

Qing
> (From my reading of the GCC8 source code, when this parameter is getting smaller, more basic blocks and functions will
> Be considered as HOT by GCC, then the text.hot section should be larger, not smaller, do I miss anything here?)
> 
> Thanks a lot for your help.
> 
> Qing
> 
>> 
>> Honza
>>> 
>>>> The patch has bad effect on code size which in turn
>>>> impacts performance too, so I think it makes sense to use
>>>> -fprofile-partial-training with bit of care (i.e. only one code where
>>>> such scenarios are likely).
>>> 
>>> Right. 
>>>> 
>>>> As for backporting, I do not have checkout of GCC 8 right now. It
>>>> depends on profile infrastructure that was added in 2017 (so stage1 of
>>>> GCC 8), so the patch may backport quite easilly.  I am not 100% sure
>>>> what shape the infrastrucure was in the first version, but I am quite
>>>> convinced it had the necessary bits - it was able to make the difference
>>>> between 0 profile count and missing profile feedback.
>>> 
>>> This is good to know, I will try to back port to GCC8 and let them test to see any good impact.
>>> 
>>> Qing
>>>> 
>>>> Honza


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-05-15 18:14 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-03 19:10 Question on patch -fprofile-partial-training Qing Zhao
2023-05-04  8:30 ` Martin Liška
2023-05-04 12:54   ` Qing Zhao
2023-05-04 13:05     ` Martin Liška
2023-05-04 13:37       ` Qing Zhao
2023-05-09 10:06         ` Martin Liška
2023-05-09 10:22           ` Jan Hubicka
2023-05-09 20:11             ` Qing Zhao
2023-05-10 13:15               ` Jan Hubicka
2023-05-11 16:08                 ` Qing Zhao
2023-05-15 18:14                   ` Qing Zhao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).