public inbox for gcc-patches@gcc.gnu.org
* [PATCH GCC][5/5]Enable tree loop distribution at -O3 and above optimization levels.
@ 2017-06-02 11:52 Bin Cheng
  2017-06-06 17:47 ` Jeff Law
  0 siblings, 1 reply; 11+ messages in thread
From: Bin Cheng @ 2017-06-02 11:52 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd

[-- Attachment #1: Type: text/plain, Size: 413 bytes --]

Hi,
This patch enables -ftree-loop-distribution by default at -O3 and above optimization levels.
Bootstrapped and tested at O2/O3 on x86_64 and AArch64.  Is it OK?

Note I don't have a strong opinion here and am fine with it being either accepted or rejected.

Thanks,
bin
2017-05-31  Bin Cheng  <bin.cheng@arm.com>

	* opts.c (default_options_table): Enable OPT_ftree_loop_distribution
	for -O3 and above levels.

[-- Attachment #2: 0005-enable-loop-distribution-O3-20170525.txt.patch --]
[-- Type: text/x-patch; name="0005-enable-loop-distribution-O3-20170525.txt.patch", Size: 837 bytes --]

From e7f43d62eb8aa8d29700e5ed1cb737eec813860f Mon Sep 17 00:00:00 2001
From: Bin Cheng <binche01@e108451-lin.cambridge.arm.com>
Date: Tue, 30 May 2017 15:02:36 +0100
Subject: [PATCH 5/5] enable-loop-distribution-O3-20170525.txt

---
 gcc/opts.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/opts.c b/gcc/opts.c
index ffedb10..e2427b3 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -525,6 +525,7 @@ static const struct default_options default_options_table[] =
 
     /* -O3 optimizations.  */
     { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
+    { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribution, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fpredictive_commoning, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fsplit_paths, NULL, 1 },
     /* Inlining of functions reducing size is a good idea with -Os
-- 
1.9.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH GCC][5/5]Enable tree loop distribution at -O3 and above optimization levels.
  2017-06-02 11:52 [PATCH GCC][5/5]Enable tree loop distribution at -O3 and above optimization levels Bin Cheng
@ 2017-06-06 17:47 ` Jeff Law
  2017-06-07  8:07   ` Bin.Cheng
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff Law @ 2017-06-06 17:47 UTC (permalink / raw)
  To: Bin Cheng, gcc-patches; +Cc: nd

On 06/02/2017 05:52 AM, Bin Cheng wrote:
> Hi,
> This patch enables -ftree-loop-distribution by default at -O3 and above optimization levels.
> Bootstrap and test at O2/O3 on x86_64 and AArch64.  is it OK?
> 
> Note I don't have strong opinion here and am fine with either it's accepted or rejected.
> 
> Thanks,
> bin
> 2017-05-31  Bin Cheng  <bin.cheng@arm.com>
> 
> 	* opts.c (default_options_table): Enable OPT_ftree_loop_distribution
> 	for -O3 and above levels.
I think the question is how does this generally impact the performance
of the generated code and to a lesser degree compile-time.

Do you have any performance data?

jeff


* Re: [PATCH GCC][5/5]Enable tree loop distribution at -O3 and above optimization levels.
  2017-06-06 17:47 ` Jeff Law
@ 2017-06-07  8:07   ` Bin.Cheng
  2017-06-07  8:33     ` Richard Biener
  2017-06-23  5:05     ` Jeff Law
  0 siblings, 2 replies; 11+ messages in thread
From: Bin.Cheng @ 2017-06-07  8:07 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On Tue, Jun 6, 2017 at 6:47 PM, Jeff Law <law@redhat.com> wrote:
> On 06/02/2017 05:52 AM, Bin Cheng wrote:
>> Hi,
>> This patch enables -ftree-loop-distribution by default at -O3 and above optimization levels.
>> Bootstrap and test at O2/O3 on x86_64 and AArch64.  is it OK?
>>
>> Note I don't have strong opinion here and am fine with either it's accepted or rejected.
>>
>> Thanks,
>> bin
>> 2017-05-31  Bin Cheng  <bin.cheng@arm.com>
>>
>>       * opts.c (default_options_table): Enable OPT_ftree_loop_distribution
>>       for -O3 and above levels.
> I think the question is how does this generally impact the performance
> of the generated code and to a lesser degree compile-time.
>
> Do you have any performance data?
Hi Jeff,
At this stage of the patch, only hmmer is impacted, and it clearly
improves in my local SPEC2006 runs on x86_64 and AArch64.  In the long
term, loop distribution is also a prerequisite transformation for
handling bwaves (at least).  For these two impacted cases, it helps to
close the gap against ICC.  I didn't check the compilation-time
slowdown; we can restrict distribution to problems with a small number
of partitions if that turns out to be a problem.

Thanks,
bin
>
> jeff
>


* Re: [PATCH GCC][5/5]Enable tree loop distribution at -O3 and above optimization levels.
  2017-06-07  8:07   ` Bin.Cheng
@ 2017-06-07  8:33     ` Richard Biener
  2017-06-07  8:49       ` Bin.Cheng
  2017-06-23  5:05     ` Jeff Law
  1 sibling, 1 reply; 11+ messages in thread
From: Richard Biener @ 2017-06-07  8:33 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Jeff Law, gcc-patches

On Wed, Jun 7, 2017 at 10:07 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Tue, Jun 6, 2017 at 6:47 PM, Jeff Law <law@redhat.com> wrote:
>> On 06/02/2017 05:52 AM, Bin Cheng wrote:
>>> Hi,
>>> This patch enables -ftree-loop-distribution by default at -O3 and above optimization levels.
>>> Bootstrap and test at O2/O3 on x86_64 and AArch64.  is it OK?
>>>
>>> Note I don't have strong opinion here and am fine with either it's accepted or rejected.
>>>
>>> Thanks,
>>> bin
>>> 2017-05-31  Bin Cheng  <bin.cheng@arm.com>
>>>
>>>       * opts.c (default_options_table): Enable OPT_ftree_loop_distribution
>>>       for -O3 and above levels.
>> I think the question is how does this generally impact the performance
>> of the generated code and to a lesser degree compile-time.
>>
>> Do you have any performance data?
> Hi Jeff,
> At this stage of the patch, only hmmer is impacted and improved
> obviously in my local run of spec2006 for x86_64 and AArch64.  In long
> term, loop distribution is also one prerequisite transformation to
> handle bwaves (at least).  For these two impacted cases, it helps to
> resolve the gap against ICC.  I didn't check compilation time slow
> down, we can restrict it to problem with small partition number if
> that's a problem.

The source of extra compile time will be dependence checking, which
is quadratic; there is currently no limit in place on
(# writes * (# reads + # writes)), but one could easily be added.
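
To make the quadratic growth concrete, here is a minimal sketch that only evaluates the expression quoted above. The function name `dependence_checks` is hypothetical and is not part of GCC; the sketch assumes each write is tested once against every read and every write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a GCC function): number of pairwise
   dependence tests if every write is checked against every read
   and every write, i.e. #writes * (#reads + #writes).  */
size_t
dependence_checks (size_t n_reads, size_t n_writes)
{
  size_t total = n_reads + n_writes;

  /* Guard the arithmetic so the sketch never silently wraps.  */
  assert (total >= n_reads);
  assert (n_writes == 0 || total <= SIZE_MAX / n_writes);

  return n_writes * total;
}
```

With 10 reads and 10 writes this gives 200 checks; doubling both counts to 20 gives 800, which is the fourfold growth expected from a quadratic term.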

Note that I recently added -fopt-info support for loop distribution, so it
should be possible to get an idea of how many loops in SPEC are distributed
and, if the number is small, double-check them.

The cost model at this point is very conservative, but due to implementation
details distributing a loop can cause quite some arithmetic to be duplicated,
as in

int a[1024], b[1204];

void foo()
{
  for (int i = 0; i < 1024; ++i)
    {
       a[i] = i * i * i ... * i;
       b[i] = a[i];
    }
}

it will distribute to two loops both computing i * i * i .... rather than
reading from a[i] in the second loop.
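
A hand-written approximation of that effect is sketched below. It is not compiler output; the function names are hypothetical, and the elided product is replaced by `i * i * i` purely for illustration (this stays within `int` range for `i < 1024`).

```c
#include <assert.h>

enum { N = 1024 };
int a[N], b[N];

/* The original single loop: b[i] simply reuses the value stored in a[i].  */
void
foo_fused (void)
{
  for (int i = 0; i < N; ++i)
    {
      a[i] = i * i * i;
      b[i] = a[i];
    }
}

/* Assumed shape after distribution as described above: two loops,
   with the arithmetic repeated in the second one instead of a load
   from a[i].  */
void
foo_distributed (void)
{
  for (int i = 0; i < N; ++i)
    a[i] = i * i * i;

  for (int i = 0; i < N; ++i)
    b[i] = i * i * i;   /* recomputed, not read back from a[i] */
}

/* Both forms must leave the arrays in the same state.  */
void
check_arrays_agree (void)
{
  for (int i = 0; i < N; ++i)
    assert (a[i] == b[i]);
}
```

Both versions store the same values; the difference is that the distributed form pays for the multiplication chain twice per iteration count.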

Richard.

> Thanks,
> bin
>>
>> jeff
>>


* Re: [PATCH GCC][5/5]Enable tree loop distribution at -O3 and above optimization levels.
  2017-06-07  8:33     ` Richard Biener
@ 2017-06-07  8:49       ` Bin.Cheng
  2017-06-07  9:49         ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Bin.Cheng @ 2017-06-07  8:49 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, gcc-patches

On Wed, Jun 7, 2017 at 9:33 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Wed, Jun 7, 2017 at 10:07 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Tue, Jun 6, 2017 at 6:47 PM, Jeff Law <law@redhat.com> wrote:
>>> On 06/02/2017 05:52 AM, Bin Cheng wrote:
>>>> Hi,
>>>> This patch enables -ftree-loop-distribution by default at -O3 and above optimization levels.
>>>> Bootstrap and test at O2/O3 on x86_64 and AArch64.  is it OK?
>>>>
>>>> Note I don't have strong opinion here and am fine with either it's accepted or rejected.
>>>>
>>>> Thanks,
>>>> bin
>>>> 2017-05-31  Bin Cheng  <bin.cheng@arm.com>
>>>>
>>>>       * opts.c (default_options_table): Enable OPT_ftree_loop_distribution
>>>>       for -O3 and above levels.
>>> I think the question is how does this generally impact the performance
>>> of the generated code and to a lesser degree compile-time.
>>>
>>> Do you have any performance data?
>> Hi Jeff,
>> At this stage of the patch, only hmmer is impacted and improved
>> obviously in my local run of spec2006 for x86_64 and AArch64.  In long
>> term, loop distribution is also one prerequisite transformation to
>> handle bwaves (at least).  For these two impacted cases, it helps to
>> resolve the gap against ICC.  I didn't check compilation time slow
>> down, we can restrict it to problem with small partition number if
>> that's a problem.
>
> The source of extra compile-time will be dependence checking which
> is quadratic, there is currently no limit in place on (# writes * (#
> reads + # writes))
> but one could easily be added.
Ah yes, the patch moves dependence computation before partition
construction now.  More likely this is the bottleneck now.

>
> Note that I recently added -fopt-info support for loop distribution so
> it should be
> possible to get an idea how many loops in SPEC are distributed and if small,
> double-check them.
During development, quite a lot of loops got distributed.  I checked some
of them and restricted the pass so that it does not distribute cases where
there is no benefit.  But I didn't re-check with the final version of the patch.
>
> The cost model at this point is very conservative but due to
> implementation details
> distributing a loop can cause quite some arithmetic to be duplicated like for
>
> int a[1024], b[1204];
>
> void foo()
> {
>   for (int i = 0; i < 1024; ++i)
>     {
>        a[i] = i * i * i ... * i;
>        b[i] = a[i];
>     }
> }
>
> it will distribute to two loops both computing i * i * i .... rather than
> reading from a[i] in the second loop.
Hmm, this patch no longer distributes this case.  I think it is more
conservative than the original model; for example, the ldist tests that
were changed are no longer distributed because there is no benefit in
doing so.

Thanks,
bin
>
> Richard.
>
>> Thanks,
>> bin
>>>
>>> jeff
>>>


* Re: [PATCH GCC][5/5]Enable tree loop distribution at -O3 and above optimization levels.
  2017-06-07  8:49       ` Bin.Cheng
@ 2017-06-07  9:49         ` Richard Biener
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Biener @ 2017-06-07  9:49 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Jeff Law, gcc-patches

On Wed, Jun 7, 2017 at 10:49 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Wed, Jun 7, 2017 at 9:33 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Wed, Jun 7, 2017 at 10:07 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Tue, Jun 6, 2017 at 6:47 PM, Jeff Law <law@redhat.com> wrote:
>>>> On 06/02/2017 05:52 AM, Bin Cheng wrote:
>>>>> Hi,
>>>>> This patch enables -ftree-loop-distribution by default at -O3 and above optimization levels.
>>>>> Bootstrap and test at O2/O3 on x86_64 and AArch64.  is it OK?
>>>>>
>>>>> Note I don't have strong opinion here and am fine with either it's accepted or rejected.
>>>>>
>>>>> Thanks,
>>>>> bin
>>>>> 2017-05-31  Bin Cheng  <bin.cheng@arm.com>
>>>>>
>>>>>       * opts.c (default_options_table): Enable OPT_ftree_loop_distribution
>>>>>       for -O3 and above levels.
>>>> I think the question is how does this generally impact the performance
>>>> of the generated code and to a lesser degree compile-time.
>>>>
>>>> Do you have any performance data?
>>> Hi Jeff,
>>> At this stage of the patch, only hmmer is impacted and improved
>>> obviously in my local run of spec2006 for x86_64 and AArch64.  In long
>>> term, loop distribution is also one prerequisite transformation to
>>> handle bwaves (at least).  For these two impacted cases, it helps to
>>> resolve the gap against ICC.  I didn't check compilation time slow
>>> down, we can restrict it to problem with small partition number if
>>> that's a problem.
>>
>> The source of extra compile-time will be dependence checking which
>> is quadratic, there is currently no limit in place on (# writes * (#
>> reads + # writes))
>> but one could easily be added.
> Ah yes, the patch moves dependence computation before partition
> construction now.  More likely this is the bottleneck now.

Ah, that's bad (didn't look at the patch yet).  The idea of the current code
was that applying any cost-based merging reduces the number of checks that
need to be done.

Do you absolutely need to perform dependence checking upfront?

Richard.

>>
>> Note that I recently added -fopt-info support for loop distribution so
>> it should be
>> possible to get an idea how many loops in SPEC are distributed and if small,
>> double-check them.
> During development, quite a lot loops get distributed.  I checked some
> of them and restricted the pass to not distribute cases with no good.
> But I didn't check with the final version patch.
>>
>> The cost model at this point is very conservative but due to
>> implementation details
>> distributing a loop can cause quite some arithmetic to be duplicated like for
>>
>> int a[1024], b[1204];
>>
>> void foo()
>> {
>>   for (int i = 0; i < 1024; ++i)
>>     {
>>        a[i] = i * i * i ... * i;
>>        b[i] = a[i];
>>     }
>> }
>>
>> it will distribute to two loops both computing i * i * i .... rather than
>> reading from a[i] in the second loop.
> Hmm, this patch no longer distributes this case.  I think it is more
> conservative than the original model, for example, the ldist tests
> changed are now not distributed because there is no good to do it.
>
> Thanks,
> bin
>>
>> Richard.
>>
>>> Thanks,
>>> bin
>>>>
>>>> jeff
>>>>


* Re: [PATCH GCC][5/5]Enable tree loop distribution at -O3 and above optimization levels.
  2017-06-07  8:07   ` Bin.Cheng
  2017-06-07  8:33     ` Richard Biener
@ 2017-06-23  5:05     ` Jeff Law
  2017-06-23  8:47       ` Bin.Cheng
  1 sibling, 1 reply; 11+ messages in thread
From: Jeff Law @ 2017-06-23  5:05 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: gcc-patches

On 06/07/2017 02:07 AM, Bin.Cheng wrote:
> On Tue, Jun 6, 2017 at 6:47 PM, Jeff Law <law@redhat.com> wrote:
>> On 06/02/2017 05:52 AM, Bin Cheng wrote:
>>> Hi,
>>> This patch enables -ftree-loop-distribution by default at -O3 and above optimization levels.
>>> Bootstrap and test at O2/O3 on x86_64 and AArch64.  is it OK?
>>>
>>> Note I don't have strong opinion here and am fine with either it's accepted or rejected.
>>>
>>> Thanks,
>>> bin
>>> 2017-05-31  Bin Cheng  <bin.cheng@arm.com>
>>>
>>>       * opts.c (default_options_table): Enable OPT_ftree_loop_distribution
>>>       for -O3 and above levels.
>> I think the question is how does this generally impact the performance
>> of the generated code and to a lesser degree compile-time.
>>
>> Do you have any performance data?
> Hi Jeff,
> At this stage of the patch, only hmmer is impacted and improved
> obviously in my local run of spec2006 for x86_64 and AArch64.  In long
> term, loop distribution is also one prerequisite transformation to
> handle bwaves (at least).  For these two impacted cases, it helps to
> resolve the gap against ICC.  I didn't check compilation time slow
> down, we can restrict it to problem with small partition number if
> that's a problem.
Just a note. I know you've iterated further with Richi -- I'm not
objecting to the patch, nor was I ready to approve.

Are you and Richi happy with this as-is or are you looking to submit
something newer based on the conversation the two of you have had?

jeff


* Re: [PATCH GCC][5/5]Enable tree loop distribution at -O3 and above optimization levels.
  2017-06-23  5:05     ` Jeff Law
@ 2017-06-23  8:47       ` Bin.Cheng
  2017-06-23 11:04         ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Bin.Cheng @ 2017-06-23  8:47 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On Fri, Jun 23, 2017 at 6:04 AM, Jeff Law <law@redhat.com> wrote:
> On 06/07/2017 02:07 AM, Bin.Cheng wrote:
>> On Tue, Jun 6, 2017 at 6:47 PM, Jeff Law <law@redhat.com> wrote:
>>> On 06/02/2017 05:52 AM, Bin Cheng wrote:
>>>> Hi,
>>>> This patch enables -ftree-loop-distribution by default at -O3 and above optimization levels.
>>>> Bootstrap and test at O2/O3 on x86_64 and AArch64.  is it OK?
>>>>
>>>> Note I don't have strong opinion here and am fine with either it's accepted or rejected.
>>>>
>>>> Thanks,
>>>> bin
>>>> 2017-05-31  Bin Cheng  <bin.cheng@arm.com>
>>>>
>>>>       * opts.c (default_options_table): Enable OPT_ftree_loop_distribution
>>>>       for -O3 and above levels.
>>> I think the question is how does this generally impact the performance
>>> of the generated code and to a lesser degree compile-time.
>>>
>>> Do you have any performance data?
>> Hi Jeff,
>> At this stage of the patch, only hmmer is impacted and improved
>> obviously in my local run of spec2006 for x86_64 and AArch64.  In long
>> term, loop distribution is also one prerequisite transformation to
>> handle bwaves (at least).  For these two impacted cases, it helps to
>> resolve the gap against ICC.  I didn't check compilation time slow
>> down, we can restrict it to problem with small partition number if
>> that's a problem.
> Just a note. I know you've iterated further with Richi -- I'm not
> objecting to the patch, nor was I ready to approve.
>
> Are you and Richi happy with this as-is or are you looking to submit
> something newer based on the conversation the two of you have had?
Hi Jeff,
The patch series has been updated in various ways according to review
comments; for example, it bounds compilation time by checking the
number of data references against MAX_DATAREFS_FOR_DATADEPS, and it
restores the data dependence cache.  There are still two missing parts
I'd like to do as follow-up patches: one is loop nest distribution and
the other is a data-locality cost model (at least) for small cases.
Richi has now approved most patches except the last major one, but I
still need another iteration on some (approved) patches in order to fix
mistakes/typos introduced when I split the patch.
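
The two compile-time measures mentioned above (a cap on the number of data references, and a cache of dependence results) can be illustrated with a small sketch. Everything here is hypothetical: the names, the cap of 64, and the placeholder dependence test are assumptions for illustration and do not reflect the actual GCC implementation or the real value behind MAX_DATAREFS_FOR_DATADEPS.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative cap only; not the value GCC uses.  */
enum { SKETCH_MAX_DATAREFS = 64 };

enum dep_state { DEP_UNKNOWN = 0, DEP_NONE, DEP_EXISTS };

struct dep_cache
{
  size_t n;                 /* number of data references in the loop */
  unsigned long computed;   /* how many pairs were actually analysed */
  enum dep_state pair[SKETCH_MAX_DATAREFS][SKETCH_MAX_DATAREFS];
};

/* Returns 0 (skip the loop) when there are too many data references,
   otherwise clears the cache and returns 1.  */
int
dep_cache_init (struct dep_cache *c, size_t n_datarefs)
{
  if (n_datarefs > SKETCH_MAX_DATAREFS)
    return 0;

  c->n = n_datarefs;
  c->computed = 0;
  for (size_t i = 0; i < n_datarefs; ++i)
    for (size_t j = 0; j < n_datarefs; ++j)
      c->pair[i][j] = DEP_UNKNOWN;
  return 1;
}

/* Placeholder for the expensive analysis; the real test is far more
   involved than this parity check.  */
static enum dep_state
expensive_test (size_t i, size_t j)
{
  return ((i + j) % 2) ? DEP_EXISTS : DEP_NONE;
}

/* Each pair is analysed at most once; later queries hit the cache.  */
enum dep_state
dep_cache_query (struct dep_cache *c, size_t i, size_t j)
{
  assert (i < c->n && j < c->n);
  if (c->pair[i][j] == DEP_UNKNOWN)
    {
      c->pair[i][j] = expensive_test (i, j);
      c->computed++;
    }
  return c->pair[i][j];
}
```

The cap bounds the worst case up front, while the cache keeps repeated queries for the same pair from redoing the analysis.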

Thanks,
bin
>
> jeff


* Re: [PATCH GCC][5/5]Enable tree loop distribution at -O3 and above optimization levels.
  2017-06-23  8:47       ` Bin.Cheng
@ 2017-06-23 11:04         ` Richard Biener
  2017-08-07  9:10           ` Bin.Cheng
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Biener @ 2017-06-23 11:04 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Jeff Law, gcc-patches

On Fri, Jun 23, 2017 at 10:47 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Fri, Jun 23, 2017 at 6:04 AM, Jeff Law <law@redhat.com> wrote:
>> On 06/07/2017 02:07 AM, Bin.Cheng wrote:
>>> On Tue, Jun 6, 2017 at 6:47 PM, Jeff Law <law@redhat.com> wrote:
>>>> On 06/02/2017 05:52 AM, Bin Cheng wrote:
>>>>> Hi,
>>>>> This patch enables -ftree-loop-distribution by default at -O3 and above optimization levels.
>>>>> Bootstrap and test at O2/O3 on x86_64 and AArch64.  is it OK?
>>>>>
>>>>> Note I don't have strong opinion here and am fine with either it's accepted or rejected.
>>>>>
>>>>> Thanks,
>>>>> bin
>>>>> 2017-05-31  Bin Cheng  <bin.cheng@arm.com>
>>>>>
>>>>>       * opts.c (default_options_table): Enable OPT_ftree_loop_distribution
>>>>>       for -O3 and above levels.
>>>> I think the question is how does this generally impact the performance
>>>> of the generated code and to a lesser degree compile-time.
>>>>
>>>> Do you have any performance data?
>>> Hi Jeff,
>>> At this stage of the patch, only hmmer is impacted and improved
>>> obviously in my local run of spec2006 for x86_64 and AArch64.  In long
>>> term, loop distribution is also one prerequisite transformation to
>>> handle bwaves (at least).  For these two impacted cases, it helps to
>>> resolve the gap against ICC.  I didn't check compilation time slow
>>> down, we can restrict it to problem with small partition number if
>>> that's a problem.
>> Just a note. I know you've iterated further with Richi -- I'm not
>> objecting to the patch, nor was I ready to approve.
>>
>> Are you and Richi happy with this as-is or are you looking to submit
>> something newer based on the conversation the two of you have had?
> Hi Jeff,
> The patch series is updated in various ways according to review
> comments, for example, it restricts compilation time by checking
> number of data references against MAX_DATAREFS_FOR_DATADEPS as well as
> restores data dependence cache.  There are still two missing parts I'd
> like to do as followup patches: one is loop nest distribution and the
> other is a data-locality cost model (at least) for small cases.  Now
> Richi approved most patches except the last major one, but I still
> need another iterate for some (approved) patches in order to fix
> mistake/typo introduced when I separating the patch.

The patch is ok after the approved parts of the ldist series have been committed.
Note your patch lacks updates to invoke.texi (what options are enabled at -O3).
Please adjust that before committing.

Thanks,
Richard.

> Thanks,
> bin
>>
>> jeff


* Re: [PATCH GCC][5/5]Enable tree loop distribution at -O3 and above optimization levels.
  2017-06-23 11:04         ` Richard Biener
@ 2017-08-07  9:10           ` Bin.Cheng
  2017-08-08 13:10             ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Bin.Cheng @ 2017-08-07  9:10 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3142 bytes --]

On Fri, Jun 23, 2017 at 12:04 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Jun 23, 2017 at 10:47 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Fri, Jun 23, 2017 at 6:04 AM, Jeff Law <law@redhat.com> wrote:
>>> On 06/07/2017 02:07 AM, Bin.Cheng wrote:
>>>> On Tue, Jun 6, 2017 at 6:47 PM, Jeff Law <law@redhat.com> wrote:
>>>>> On 06/02/2017 05:52 AM, Bin Cheng wrote:
>>>>>> Hi,
>>>>>> This patch enables -ftree-loop-distribution by default at -O3 and above optimization levels.
>>>>>> Bootstrap and test at O2/O3 on x86_64 and AArch64.  is it OK?
>>>>>>
>>>>>> Note I don't have strong opinion here and am fine with either it's accepted or rejected.
>>>>>>
>>>>>> Thanks,
>>>>>> bin
>>>>>> 2017-05-31  Bin Cheng  <bin.cheng@arm.com>
>>>>>>
>>>>>>       * opts.c (default_options_table): Enable OPT_ftree_loop_distribution
>>>>>>       for -O3 and above levels.
>>>>> I think the question is how does this generally impact the performance
>>>>> of the generated code and to a lesser degree compile-time.
>>>>>
>>>>> Do you have any performance data?
>>>> Hi Jeff,
>>>> At this stage of the patch, only hmmer is impacted and improved
>>>> obviously in my local run of spec2006 for x86_64 and AArch64.  In long
>>>> term, loop distribution is also one prerequisite transformation to
>>>> handle bwaves (at least).  For these two impacted cases, it helps to
>>>> resolve the gap against ICC.  I didn't check compilation time slow
>>>> down, we can restrict it to problem with small partition number if
>>>> that's a problem.
>>> Just a note. I know you've iterated further with Richi -- I'm not
>>> objecting to the patch, nor was I ready to approve.
>>>
>>> Are you and Richi happy with this as-is or are you looking to submit
>>> something newer based on the conversation the two of you have had?
>> Hi Jeff,
>> The patch series is updated in various ways according to review
>> comments, for example, it restricts compilation time by checking
>> number of data references against MAX_DATAREFS_FOR_DATADEPS as well as
>> restores data dependence cache.  There are still two missing parts I'd
>> like to do as followup patches: one is loop nest distribution and the
>> other is a data-locality cost model (at least) for small cases.  Now
>> Richi approved most patches except the last major one, but I still
>> need another iterate for some (approved) patches in order to fix
>> mistake/typo introduced when I separating the patch.
>
> The patch is ok after the approved parts of the ldist series has been committed.
> Note your patch lacks updates to invoke.texi (what options are enabled at -O3).
> Please adjust that before committing.
Hi All,
Given that the loop distribution patches have been merged for a while and
a couple of issues have been fixed, I am submitting an updated patch to
enable the pass by default at O3 and above.
Bootstrap and test on x86_64 and AArch64 are ongoing.  Hmmer can still be
improved.  Is it OK if there are no failures?

Thanks,
bin
2017-08-07  Bin Cheng  <bin.cheng@arm.com>

    * doc/invoke.texi: Document -ftree-loop-distribution for O3.
    * opts.c (default_options_table): Add OPT_ftree_loop_distribution.

[-- Attachment #2: 0001-enable-loop-distribution-O3-20170802.txt.patch --]
[-- Type: text/x-patch, Size: 2072 bytes --]

From 2bda01a939ac8c0bf54f04f7e29cc0d3155c7626 Mon Sep 17 00:00:00 2001
From: Bin Cheng <binche01@e108451-lin.cambridge.arm.com>
Date: Wed, 28 Jun 2017 10:54:17 +0100
Subject: [PATCH] enable-loop-distribution-O3-20170802.txt

---
 gcc/doc/invoke.texi | 21 ++++++++++++++-------
 gcc/opts.c          |  1 +
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5ae9dc4..f48a71a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7248,13 +7248,20 @@ invoking @option{-O2} on programs that use computed gotos.
 @item -O3
 @opindex O3
 Optimize yet more.  @option{-O3} turns on all optimizations specified
-by @option{-O2} and also turns on the @option{-finline-functions},
-@option{-funswitch-loops}, @option{-fpredictive-commoning},
-@option{-fgcse-after-reload}, @option{-ftree-loop-vectorize},
-@option{-ftree-loop-distribute-patterns}, @option{-fsplit-paths}
-@option{-ftree-slp-vectorize}, @option{-fvect-cost-model},
-@option{-ftree-partial-pre}, @option{-fpeel-loops}
-and @option{-fipa-cp-clone} options.
+by @option{-O2} and also turns on the following optimization flags:
+@gccoptlist{-finline-functions @gol
+-funswitch-loops @gol
+-fpredictive-commoning @gol
+-fgcse-after-reload @gol
+-ftree-loop-vectorize @gol
+-ftree-loop-distribution @gol
+-ftree-loop-distribute-patterns @gol
+-fsplit-paths @gol
+-ftree-slp-vectorize @gol
+-fvect-cost-model @gol
+-ftree-partial-pre @gol
+-fpeel-loops @gol
+-fipa-cp-clone}
 
 @item -O0
 @opindex O0
diff --git a/gcc/opts.c b/gcc/opts.c
index 989cc6b..19e8c7f 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -525,6 +525,7 @@ static const struct default_options default_options_table[] =
 
     /* -O3 optimizations.  */
     { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
+    { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribution, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fpredictive_commoning, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fsplit_paths, NULL, 1 },
     /* Inlining of functions reducing size is a good idea with -Os
-- 
1.9.1



* Re: [PATCH GCC][5/5]Enable tree loop distribution at -O3 and above optimization levels.
  2017-08-07  9:10           ` Bin.Cheng
@ 2017-08-08 13:10             ` Richard Biener
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Biener @ 2017-08-08 13:10 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Jeff Law, gcc-patches

On Mon, Aug 7, 2017 at 11:10 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Fri, Jun 23, 2017 at 12:04 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Fri, Jun 23, 2017 at 10:47 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Fri, Jun 23, 2017 at 6:04 AM, Jeff Law <law@redhat.com> wrote:
>>>> On 06/07/2017 02:07 AM, Bin.Cheng wrote:
>>>>> On Tue, Jun 6, 2017 at 6:47 PM, Jeff Law <law@redhat.com> wrote:
>>>>>> On 06/02/2017 05:52 AM, Bin Cheng wrote:
>>>>>>> Hi,
>>>>>>> This patch enables -ftree-loop-distribution by default at -O3 and above optimization levels.
>>>>>>> Bootstrap and test at O2/O3 on x86_64 and AArch64.  is it OK?
>>>>>>>
>>>>>>> Note I don't have strong opinion here and am fine with either it's accepted or rejected.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> bin
>>>>>>> 2017-05-31  Bin Cheng  <bin.cheng@arm.com>
>>>>>>>
>>>>>>>       * opts.c (default_options_table): Enable OPT_ftree_loop_distribution
>>>>>>>       for -O3 and above levels.
>>>>>> I think the question is how does this generally impact the performance
>>>>>> of the generated code and to a lesser degree compile-time.
>>>>>>
>>>>>> Do you have any performance data?
>>>>> Hi Jeff,
>>>>> At this stage of the patch, only hmmer is impacted and improved
>>>>> obviously in my local run of spec2006 for x86_64 and AArch64.  In long
>>>>> term, loop distribution is also one prerequisite transformation to
>>>>> handle bwaves (at least).  For these two impacted cases, it helps to
>>>>> resolve the gap against ICC.  I didn't check compilation time slow
>>>>> down, we can restrict it to problem with small partition number if
>>>>> that's a problem.
>>>> Just a note. I know you've iterated further with Richi -- I'm not
>>>> objecting to the patch, nor was I ready to approve.
>>>>
>>>> Are you and Richi happy with this as-is or are you looking to submit
>>>> something newer based on the conversation the two of you have had?
>>> Hi Jeff,
>>> The patch series is updated in various ways according to review
>>> comments, for example, it restricts compilation time by checking
>>> number of data references against MAX_DATAREFS_FOR_DATADEPS as well as
>>> restores data dependence cache.  There are still two missing parts I'd
>>> like to do as followup patches: one is loop nest distribution and the
>>> other is a data-locality cost model (at least) for small cases.  Now
>>> Richi approved most patches except the last major one, but I still
>>> need another iterate for some (approved) patches in order to fix
>>> mistake/typo introduced when I separating the patch.
>>
>> The patch is ok after the approved parts of the ldist series has been committed.
>> Note your patch lacks updates to invoke.texi (what options are enabled at -O3).
>> Please adjust that before committing.
> Hi All,
> Given the loop distribution patches have been merged for a while and
> couple of issues fixed.  I am submitting updated patch to enable the
> pass by default at O3/above levels.
> Bootstrap and test on x86_64 and AArch64 ongoing.  Hmmer still can be
> improved.  Is it OK if no failure?

Ok.

Thanks,
Richard.

> Thanks,
> bin
> 2017-08-07  Bin Cheng  <bin.cheng@arm.com>
>
>     * doc/invoke.texi: Document -ftree-loop-distribution for O3.
>     * opts.c (default_options_table): Add OPT_ftree_loop_distribution.


Thread overview: 11 messages
2017-06-02 11:52 [PATCH GCC][5/5]Enable tree loop distribution at -O3 and above optimization levels Bin Cheng
2017-06-06 17:47 ` Jeff Law
2017-06-07  8:07   ` Bin.Cheng
2017-06-07  8:33     ` Richard Biener
2017-06-07  8:49       ` Bin.Cheng
2017-06-07  9:49         ` Richard Biener
2017-06-23  5:05     ` Jeff Law
2017-06-23  8:47       ` Bin.Cheng
2017-06-23 11:04         ` Richard Biener
2017-08-07  9:10           ` Bin.Cheng
2017-08-08 13:10             ` Richard Biener
