public inbox for gcc-patches@gcc.gnu.org
* [PATCH] Increase lto-min-partition
@ 2016-09-22 13:19 Wilco Dijkstra
  2016-09-22 13:42 ` Richard Biener
  0 siblings, 1 reply; 15+ messages in thread
From: Wilco Dijkstra @ 2016-09-22 13:19 UTC (permalink / raw)
  To: GCC Patches; +Cc: nd

Increase the lto-min-partition size to 50000 to reduce the number of partitions.
See e.g. https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00235.html for a concise
explanation of why 10000 is too small for modern CPU/memory sizes.  Additionally,
larger values increase optimization opportunities and reduce bad decisions in the
layout of global variables across partitions (anchors do not work well with LTO).
Looking at SPEC2000, 8 more benchmarks now use a single LTO partition, which
is optimal.  Build time with LTO increases only slightly, e.g. SPEC2006
now takes 2% more time on an 8-core ARM server.

ChangeLog:
2016-09-22  Wilco Dijkstra  <wdijkstr@arm.com>

    gcc/
	* params.def (MIN_PARTITION_SIZE): Increase to 50000.

--
diff --git a/gcc/params.def b/gcc/params.def
index 79b7dd4cca9ec1bb67a64725fb1a596b6e937419..da8fd1825e15f2aa800b1c8b680985776c1080ed 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1045,7 +1045,7 @@ DEFPARAM (PARAM_LTO_PARTITIONS,
 DEFPARAM (MIN_PARTITION_SIZE,
 	  "lto-min-partition",
 	  "Minimal size of a partition for LTO (in estimated instructions).",
-	  10000, 0, 0)
+	  50000, 0, 0)
 
 DEFPARAM (MAX_PARTITION_SIZE,
 	  "lto-max-partition",


* Re: [PATCH] Increase lto-min-partition
  2016-09-22 13:19 [PATCH] Increase lto-min-partition Wilco Dijkstra
@ 2016-09-22 13:42 ` Richard Biener
  2016-09-22 13:46   ` Markus Trippelsdorf
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Biener @ 2016-09-22 13:42 UTC (permalink / raw)
  To: Wilco Dijkstra, Markus Trippelsdorf; +Cc: GCC Patches, nd

On Thu, Sep 22, 2016 at 3:13 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Increase the lto-min-partition size to 50000 to reduce the number of partitions.
> See eg. https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00235.html for a concise
> explanation why 10000 is too small for modern CPU/memory size.  Additionally,
> larger values increase optimization opportunities and reduce bad decisions in the
> layout of global variables across partitions (anchors do not work well with LTO).
> Looking at SPEC2000, 8 more benchmarks now use a single LTO partition which
> is the most optimal.  Build time with LTO increases only slightly, eg. SPEC2006
> now takes 2% more time on an 8-core ARM server.

Ok.  Markus, how many partitions do we get with libreoffice/firefox currently
(I suppose they all hit lto-max-partition now)?

Thanks,
Richard.

> ChangeLog:
> 2016-09-22  Wilco Dijkstra  <wdijkstr@arm.com>
>
>     gcc/
>         * params.def (MIN_PARTITION_SIZE): Increase to 50000.
>
> --
> diff --git a/gcc/params.def b/gcc/params.def
> index 79b7dd4cca9ec1bb67a64725fb1a596b6e937419..da8fd1825e15f2aa800b1c8b680985776c1080ed 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -1045,7 +1045,7 @@ DEFPARAM (PARAM_LTO_PARTITIONS,
>  DEFPARAM (MIN_PARTITION_SIZE,
>           "lto-min-partition",
>           "Minimal size of a partition for LTO (in estimated instructions).",
> -         10000, 0, 0)
> +         50000, 0, 0)
>
>  DEFPARAM (MAX_PARTITION_SIZE,
>           "lto-max-partition",
>


* Re: [PATCH] Increase lto-min-partition
  2016-09-22 13:42 ` Richard Biener
@ 2016-09-22 13:46   ` Markus Trippelsdorf
  2016-09-23 13:15     ` Markus Trippelsdorf
  0 siblings, 1 reply; 15+ messages in thread
From: Markus Trippelsdorf @ 2016-09-22 13:46 UTC (permalink / raw)
  To: Richard Biener; +Cc: Wilco Dijkstra, GCC Patches, nd

On 2016.09.22 at 15:36 +0200, Richard Biener wrote:
> On Thu, Sep 22, 2016 at 3:13 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> > Increase the lto-min-partition size to 50000 to reduce the number of partitions.
> > See eg. https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00235.html for a concise
> > explanation why 10000 is too small for modern CPU/memory size.  Additionally,
> > larger values increase optimization opportunities and reduce bad decisions in the
> > layout of global variables across partitions (anchors do not work well with LTO).
> > Looking at SPEC2000, 8 more benchmarks now use a single LTO partition which
> > is the most optimal.  Build time with LTO increases only slightly, eg. SPEC2006
> > now takes 2% more time on an 8-core ARM server.
> 
> Ok.  Markus, how many partitions do we get with libreoffice/firefox currently
> (I suppose they all hit lto-max-partition now?)

Yes. Even tramp3d currently gets 30 partitions. With this patch it gets
reduced to 20.
And I guess bigger projects like Firefox are unchanged at 32.

-- 
Markus


* Re: [PATCH] Increase lto-min-partition
  2016-09-22 13:46   ` Markus Trippelsdorf
@ 2016-09-23 13:15     ` Markus Trippelsdorf
  2016-09-23 13:31       ` Richard Biener
  0 siblings, 1 reply; 15+ messages in thread
From: Markus Trippelsdorf @ 2016-09-23 13:15 UTC (permalink / raw)
  To: Richard Biener; +Cc: Wilco Dijkstra, GCC Patches, nd

On 2016.09.22 at 15:42 +0200, Markus Trippelsdorf wrote:
> On 2016.09.22 at 15:36 +0200, Richard Biener wrote:
> > On Thu, Sep 22, 2016 at 3:13 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> > > Increase the lto-min-partition size to 50000 to reduce the number of partitions.
> > > See eg. https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00235.html for a concise
> > > explanation why 10000 is too small for modern CPU/memory size.  Additionally,
> > > larger values increase optimization opportunities and reduce bad decisions in the
> > > layout of global variables across partitions (anchors do not work well with LTO).
> > > Looking at SPEC2000, 8 more benchmarks now use a single LTO partition which
> > > is the most optimal.  Build time with LTO increases only slightly, eg. SPEC2006
> > > now takes 2% more time on an 8-core ARM server.
> > 
> > Ok.  Markus, how many partitions do we get with libreoffice/firefox currently
> > (I suppose they all hit lto-max-partition now?)
> 
> Yes. Even tramp3d currently gets 30 partitions. With this patch it gets
> reduced to 20.
> And I guess bigger projects like Firefox are unchanged at 32.

Sorry, I reported the wrong numbers above.

lto-min-partition was already increased from 1000 to 10000 on trunk by
Prathamesh in April.
And tramp3d only uses ten partitions (lto-min-partition=10000).
With lto-min-partition=50000 (current patch) this decreases to only two
partitions. As a result we lose the possible speedup on many-core
machines (-flto=n).

E.g. on my 4-core machine I get the following tramp3d compile times with
-flto=4:

lto-min-partition=50000: 20.146 total
lto-min-partition=10000: 16.299 total
lto-min-partition=1000 : 16.093 total

So 50000 looks too big to me. 
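
To put the three timings above in relative terms, here is a minimal sketch
in plain Python (not part of the patch; the helper name is made up for
illustration, and the numbers are just the wall-clock totals listed above):

```python
# Wall-clock totals (seconds) for tramp3d with -flto=4, keyed by lto-min-partition.
totals = {50000: 20.146, 10000: 16.299, 1000: 16.093}


def slowdown_percent(times, baseline_key):
    """Return, per setting, how much slower (in percent) it is than the baseline."""
    base = times[baseline_key]
    return {key: (value / base - 1.0) * 100.0 for key, value in times.items()}


# Compare everything against the current trunk default of 10000.
relative = slowdown_percent(totals, 10000)
for size in sorted(relative, reverse=True):
    print(f"lto-min-partition={size}: {relative[size]:+.1f}% vs 10000")
```

With these inputs, 50000 comes out roughly 24% slower than 10000, while
1000 and 10000 are within about 1% of each other.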

Also the "increased optimization opportunities" with fewer partitions
were unmeasurable in the past. If I recall correctly Honza once said
that there should be no difference between single vs. many partitions.

-- 
Markus


* Re: [PATCH] Increase lto-min-partition
  2016-09-23 13:15     ` Markus Trippelsdorf
@ 2016-09-23 13:31       ` Richard Biener
  2016-09-23 13:48         ` Richard Biener
                           ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Richard Biener @ 2016-09-23 13:31 UTC (permalink / raw)
  To: Markus Trippelsdorf; +Cc: Wilco Dijkstra, GCC Patches, nd

On Fri, Sep 23, 2016 at 3:02 PM, Markus Trippelsdorf
<markus@trippelsdorf.de> wrote:
> On 2016.09.22 at 15:42 +0200, Markus Trippelsdorf wrote:
>> On 2016.09.22 at 15:36 +0200, Richard Biener wrote:
>> > On Thu, Sep 22, 2016 at 3:13 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>> > > Increase the lto-min-partition size to 50000 to reduce the number of partitions.
>> > > See eg. https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00235.html for a concise
>> > > explanation why 10000 is too small for modern CPU/memory size.  Additionally,
>> > > larger values increase optimization opportunities and reduce bad decisions in the
>> > > layout of global variables across partitions (anchors do not work well with LTO).
>> > > Looking at SPEC2000, 8 more benchmarks now use a single LTO partition which
>> > > is the most optimal.  Build time with LTO increases only slightly, eg. SPEC2006
>> > > now takes 2% more time on an 8-core ARM server.
>> >
>> > Ok.  Markus, how many partitions do we get with libreoffice/firefox currently
>> > (I suppose they all hit lto-max-partition now?)
>>
>> Yes. Even tramp3d currently gets 30 partitions. With this patch it gets
>> reduced to 20.
>> And I guess bigger projects like Firefox are unchanged at 32.
>
> Sorry I've reported wrong numbers above.
>
> lto-min-partition was already increased from 1000 to 10000 on trunk by
> Prathamesh in April.

Ah, I forgot about this.  10000 is equal to large-unit-insns btw and about
four times large-function-insns.

> And tramp3d only uses ten partitions (lto-min-partition=10000).
> With lto-min-partition=50000 (current patch) this decreases to only two
> partitions. As a result we lose the possible speedup on many-core
> machines (-flto=n).
>
> E.g. on my 4-core machine I get the following tramp3d compile times with
> -flto=4:
>
> lto-min-partition=50000: 20.146 total
> lto-min-partition=10000: 16.299 total
> lto-min-partition=1000 : 16.093 total
>
> So 50000 looks too big to me.

I think the issue is that the default number of partitions is too high
(32) which pessimizes 4-core machines if the units are too small.

Maybe we can tune the triplet lto-partitions, lto-min-partition and
lto-max-partition in a way that it roughly scales the number of
partitions produced with program size rather than quickly raising
to 32 and then hovering there until the first unit hits lto-max-partition?

> Also the "increased optimization opportunities" with fewer partitions
> were unmeasurable in the past. If I recall correctly Honza once said
> that there should be no difference between single vs. many partitions.

Well, it definitely makes a difference for late IPA passes (that's mainly
IPA PTA).

Richard.

> --
> Markus


* Re: [PATCH] Increase lto-min-partition
  2016-09-23 13:31       ` Richard Biener
@ 2016-09-23 13:48         ` Richard Biener
  2016-09-23 14:23         ` Wilco Dijkstra
  2016-09-24 11:58         ` Markus Trippelsdorf
  2 siblings, 0 replies; 15+ messages in thread
From: Richard Biener @ 2016-09-23 13:48 UTC (permalink / raw)
  To: Markus Trippelsdorf; +Cc: Wilco Dijkstra, GCC Patches, nd

On Fri, Sep 23, 2016 at 3:29 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Sep 23, 2016 at 3:02 PM, Markus Trippelsdorf
> <markus@trippelsdorf.de> wrote:
>> On 2016.09.22 at 15:42 +0200, Markus Trippelsdorf wrote:
>>> On 2016.09.22 at 15:36 +0200, Richard Biener wrote:
>>> > On Thu, Sep 22, 2016 at 3:13 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>>> > > Increase the lto-min-partition size to 50000 to reduce the number of partitions.
>>> > > See eg. https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00235.html for a concise
>>> > > explanation why 10000 is too small for modern CPU/memory size.  Additionally,
>>> > > larger values increase optimization opportunities and reduce bad decisions in the
>>> > > layout of global variables across partitions (anchors do not work well with LTO).
>>> > > Looking at SPEC2000, 8 more benchmarks now use a single LTO partition which
>>> > > is the most optimal.  Build time with LTO increases only slightly, eg. SPEC2006
>>> > > now takes 2% more time on an 8-core ARM server.
>>> >
>>> > Ok.  Markus, how many partitions do we get with libreoffice/firefox currently
>>> > (I suppose they all hit lto-max-partition now?)
>>>
>>> Yes. Even tramp3d currently gets 30 partitions. With this patch it gets
>>> reduced to 20.
>>> And I guess bigger projects like Firefox are unchanged at 32.
>>
>> Sorry I've reported wrong numbers above.
>>
>> lto-min-partition was already increased from 1000 to 10000 on trunk by
>> Prathamesh in April.
>
> Ah, I forgot about this.  10000 is equal to large-unit-insns btw and about
> four times of large-function-insns.
>
>> And tramp3d only uses ten partitions (lto-min-partition=10000).
>> With lto-min-partition=50000 (current patch) this decreases to only two
>> partitions. As a result we lose the possible speedup on many-core
>> machines (-flto=n).
>>
>> E.g. on my 4-core machine I get the following tramp3d compile times with
>> -flto=4:
>>
>> lto-min-partition=50000: 20.146 total
>> lto-min-partition=10000: 16.299 total
>> lto-min-partition=1000 : 16.093 total
>>
>> So 50000 looks too big to me.
>
> I think the issue is that the default number of partitions is too high
> (32) which pessimizes 4-core machines if the units are too small.
>
> Maybe we can tune the triplet lto-partitions, lto-min-partition and
> lto-max-partition in a way that it roughly scales the number of
> partitions produced with program size rather than quickly raising
> to 32 and then hovering there until the first unit hits lto-max-partition?

Which would imply lto-max-partition being on the order of
lto-partitions * lto-min-partition
or simply only having a single lto-partition-size param.

I suppose making all this depend on the number of cores at run time isn't
something we can do, as it would lead to code-generation changes.

Richard.

>
>> Also the "increased optimization opportunities" with fewer partitions
>> were unmeasurable in the past. If I recall correctly Honza once said
>> that there should be no difference between single vs. many partitions.
>
> Well, it definitely makes a difference for late IPA passes (that's mainly
> IPA PTA).
>
> Richard.
>
>> --
>> Markus


* Re: [PATCH] Increase lto-min-partition
  2016-09-23 13:31       ` Richard Biener
  2016-09-23 13:48         ` Richard Biener
@ 2016-09-23 14:23         ` Wilco Dijkstra
  2016-09-23 14:48           ` Markus Trippelsdorf
  2016-09-23 15:41           ` Prathamesh Kulkarni
  2016-09-24 11:58         ` Markus Trippelsdorf
  2 siblings, 2 replies; 15+ messages in thread
From: Wilco Dijkstra @ 2016-09-23 14:23 UTC (permalink / raw)
  To: Richard Biener, Markus Trippelsdorf; +Cc: GCC Patches, nd

Richard Biener wrote:
>On Fri, Sep 23, 2016 at 3:02 PM, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
> > And tramp3d only uses ten partitions (lto-min-partition=10000).
> > With lto-min-partition=50000 (current patch) this decreases to only two
> > partitions. As a result we lose the possible speedup on many-core
> > machines (-flto=n).

Only if the size is close to the lto-min-partition. For larger applications there is
little difference.

> > E.g. on my 4-core machine I get the following tramp3d compile times with
> > -flto=4:
> >
> > lto-min-partition=50000: 20.146 total
> > lto-min-partition=10000: 16.299 total
> > lto-min-partition=1000 : 16.093 total
> >
> > So 50000 looks too big to me.

That's only 16 seconds? Seems like it's small so ideally it should have
used a single partition...

> I think the issue is that the default number of partitions is too high
> (32) which pessimizes 4-core machines if the units are too small.

Yes, 8 might be a better value as 32 core machines are rare.

> Maybe we can tune the triplet lto-partitions, lto-min-partition and
> lto-max-partition in a way that it roughly scales the number of
> partitions produced with program size rather than quickly raising
> to 32 and then hovering there until the first unit hits lto-max-partition?

Or use a single partition size rather than have the maximum size 
a hundred times the minimum size (which doesn't make sense at all).

> > Also the "increased optimization opportunities" with fewer partitions
> > were unmeasurable in the past. If I recall correctly Honza once said
> > that there should be no difference between single vs. many partitions.
>
> Well, it definitely makes a difference for late IPA passes (that's mainly
> IPA PTA).

Also anchors don't work with multiple partitions. I get around 1% gain
from using a single partition.

Wilco


* Re: [PATCH] Increase lto-min-partition
  2016-09-23 14:23         ` Wilco Dijkstra
@ 2016-09-23 14:48           ` Markus Trippelsdorf
  2016-09-23 15:15             ` Wilco Dijkstra
  2016-09-23 15:41           ` Prathamesh Kulkarni
  1 sibling, 1 reply; 15+ messages in thread
From: Markus Trippelsdorf @ 2016-09-23 14:48 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: Richard Biener, GCC Patches, nd

On 2016.09.23 at 14:19 +0000, Wilco Dijkstra wrote:
> Richard Biener wrote:
> >On Fri, Sep 23, 2016 at 3:02 PM, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
> > > And tramp3d only uses ten partitions (lto-min-partition=10000).
> > > With lto-min-partition=50000 (current patch) this decreases to only two
> > > partitions. As a result we lose the possible speedup on many-core
> > > machines (-flto=n).
> 
> Only if the size is close to the lto-min-partition. For larger applications there is
> little difference.
> 
> > > E.g. on my 4-core machine I get the following tramp3d compile times with
> > > -flto=4:
> > >
> > > lto-min-partition=50000: 20.146 total
> > > lto-min-partition=10000: 16.299 total
> > > lto-min-partition=1000 : 16.093 total
> > >
> > > So 50000 looks too big to me.
> 
> That's only 16 seconds? Seems like it's small so ideally it should have
> used a single partition...

What I wanted to point out is that, with only a single partition, you of course
lose the speedup you'd get from running the backends in parallel.

 % time g++ -w -Ofast tramp3d-v4.cpp
g++ -w -Ofast tramp3d-v4.cpp  25.61s user 0.31s system 99% cpu 25.944 total

 % time g++ -flto=4 -w -Ofast tramp3d-v4.cpp
g++ -flto=4 -w -Ofast tramp3d-v4.cpp  28.15s user 1.02s system 181% cpu 16.075 total

 % time g++ --param=lto-partitions=1 -flto=4 -w -Ofast tramp3d-v4.cpp
g++ --param=lto-partitions=1 -flto=4 -w -Ofast tramp3d-v4.cpp  26.98s user 0.57s system 99% cpu 27.629 total

-- 
Markus


* Re: [PATCH] Increase lto-min-partition
  2016-09-23 14:48           ` Markus Trippelsdorf
@ 2016-09-23 15:15             ` Wilco Dijkstra
  0 siblings, 0 replies; 15+ messages in thread
From: Wilco Dijkstra @ 2016-09-23 15:15 UTC (permalink / raw)
  To: Markus Trippelsdorf; +Cc: Richard Biener, GCC Patches, nd

Markus Trippelsdorf wrote:
> What I wanted to point out is that, with only a single partition, you of course
> lose the speedup you'd get from running the backends in parallel.

Absolutely. For every possible value of lto-min-partition you can find an
application that will build with more parallelism if you reduce the partition size.

So the question is whether the goal of LTO is to build with as much parallelism
as possible at all times, or whether the minimum should be set to a fairly large
value that still keeps plenty of parallelism for large projects.

Wilco




* Re: [PATCH] Increase lto-min-partition
  2016-09-23 14:23         ` Wilco Dijkstra
  2016-09-23 14:48           ` Markus Trippelsdorf
@ 2016-09-23 15:41           ` Prathamesh Kulkarni
  2016-09-26 13:16             ` Wilco Dijkstra
  1 sibling, 1 reply; 15+ messages in thread
From: Prathamesh Kulkarni @ 2016-09-23 15:41 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: Richard Biener, Markus Trippelsdorf, GCC Patches, nd

On 23 September 2016 at 19:49, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Richard Biener wrote:
>>On Fri, Sep 23, 2016 at 3:02 PM, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
>> > And tramp3d only uses ten partitions (lto-min-partition=10000).
>> > With lto-min-partition=50000 (current patch) this decreases to only two
>> > partitions. As a result we lose the possible speedup on many-core
>> > machines (-flto=n).
>
> Only if the size is close to the lto-min-partition. For larger applications there is
> little difference.
>
>> > E.g. on my 4-core machine I get the following tramp3d compile times with
>> > -flto=4:
>> >
>> > lto-min-partition=50000: 20.146 total
>> > lto-min-partition=10000: 16.299 total
>> > lto-min-partition=1000 : 16.093 total
>> >
>> > So 50000 looks too big to me.
>
> That's only 16 seconds? Seems like it's small so ideally it should have
> used a single partition...
>
>> I think the issue is that the default number of partitions is too high
>> (32) which pessimizes 4-core machines if the units are too small.
>
> Yes, 8 might be a better value as 32 core machines are rare.
>
>> Maybe we can tune the triplet lto-partitions, lto-min-partition and
>> lto-max-partition in a way that it roughly scales the number of
>> partitions produced with program size rather than quickly raising
>> to 32 and then hovering there until the first unit hits lto-max-partition?
>
> Or use a single partition size rather than have the maximum size
> a hundred times the minimum size (which doesn't make sense at all).
>
>> > Also the "increased optimization opportunities" with fewer partitions
>> > were unmeasurable in the past. If I recall correctly Honza once said
>> > that there should be no difference between single vs. many partitions.
>>
>> Well, it definitely makes a difference for late IPA passes (that's mainly
>> IPA PTA).
>
> Also anchors don't work with multiple partitions. I get around 1% gain
> from using a single partition.

Hi Wilco,
I am working on LTO varpool partitioning to improve performance for
section anchors.
I posted a preliminary patch at:
https://gcc.gnu.org/ml/gcc/2016-07/msg00033.html
Unfortunately I haven't been able to benchmark it on ARM yet.
I am planning to resume work on it soon.

Building with a single partition is not scalable. An LTO build of
chromium with an x86->arm cross compiler and a single partition results
in a "branch out of range" assembler error. I added lto-max-partition
primarily to work around that limitation.

Thanks,
Prathamesh
>
> Wilco
>


* Re: [PATCH] Increase lto-min-partition
  2016-09-23 13:31       ` Richard Biener
  2016-09-23 13:48         ` Richard Biener
  2016-09-23 14:23         ` Wilco Dijkstra
@ 2016-09-24 11:58         ` Markus Trippelsdorf
  2016-09-26  8:27           ` Richard Biener
  2 siblings, 1 reply; 15+ messages in thread
From: Markus Trippelsdorf @ 2016-09-24 11:58 UTC (permalink / raw)
  To: Richard Biener; +Cc: Wilco Dijkstra, GCC Patches, nd

On 2016.09.23 at 15:29 +0200, Richard Biener wrote:
> >
> > So 50000 looks too big to me.
> 
> I think the issue is that the default number of partitions is too high
> (32) which pessimizes 4-core machines if the units are too small.

The more partitions are used the less memory is required at LTRANS time.

If for example you limit partitions to 4 on a 4-core machine with 8GB
memory, you would start swapping when building Firefox.

And even lto-partitions=8 is slower than the default of 32:

(Firefox libxul build times with gcc-6.)

--param=lto-partitions=8 -flto=4:
1670.19s user 23.39s system 305% cpu 9:14.13 total

default -flto=4:
1668.94s user 32.51s system 320% cpu 8:50.36 total
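
For reference, the two lines above can be turned into comparable numbers
with a small sketch like the following (plain Python; the helper names are
hypothetical, and the "M:SS.ss" total format is assumed from the output
shown here):

```python
def wall_seconds(total):
    """Convert an 'M:SS.ss' wall-clock total into seconds."""
    minutes, seconds = total.split(":")
    return int(minutes) * 60 + float(seconds)


def average_busy_cores(user, system, wall):
    """CPU time divided by wall time, i.e. cores kept busy on average."""
    return (user + system) / wall


# Figures copied from the two Firefox libxul runs above.
runs = {
    "lto-partitions=8": {"user": 1670.19, "system": 23.39, "total": "9:14.13"},
    "default": {"user": 1668.94, "system": 32.51, "total": "8:50.36"},
}

for name, run in runs.items():
    wall = wall_seconds(run["total"])
    cores = average_busy_cores(run["user"], run["system"], wall)
    print(f"{name}: {wall:.2f}s wall, {cores:.2f} cores busy on average")
```

The total CPU time is nearly the same in both runs; the difference is in
how well the four jobs are kept busy.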

If someone wants fewer partitions he can use -flto-partition=one/none 
or --param=lto-partitions=1.

-- 
Markus


* Re: [PATCH] Increase lto-min-partition
  2016-09-24 11:58         ` Markus Trippelsdorf
@ 2016-09-26  8:27           ` Richard Biener
  2016-09-26  9:44             ` Markus Trippelsdorf
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Biener @ 2016-09-26  8:27 UTC (permalink / raw)
  To: Markus Trippelsdorf; +Cc: Wilco Dijkstra, GCC Patches, nd

On Sat, Sep 24, 2016 at 10:52 AM, Markus Trippelsdorf
<markus@trippelsdorf.de> wrote:
> On 2016.09.23 at 15:29 +0200, Richard Biener wrote:
>> >
>> > So 50000 looks too big to me.
>>
>> I think the issue is that the default number of partitions is too high
>> (32) which pessimizes 4-core machines if the units are too small.
>
> The more partitions are used the less memory is required at LTRANS time.
>
> If for example you limit partitions to 4 on a 4-core machine with 8GB
> memory, you would start swapping when building Firefox.
>
> And even lto-partitions=8 is slower than the default of 32:
>
> (Firefox libxul build times with gcc-6.)
>
> --param=lto-partitions=8 -flto=4:
> 1670.19s user 23.39s system 305% cpu 9:14.13 total
>
> default -flto=4:
> 1668.94s user 32.51s system 320% cpu 8:50.36 total
>
> If someone wants fewer partitions he can use -flto-partition=one/none
> or --param=lto-partitions=1.

I know all this.  But then we seem to be stuck at 32 partitions from
an input size of 32 * lto-min-partition up to 32 * lto-max-partition,
which is currently two orders of magnitude of difference in input size!

That can't be a good heuristic.
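
A rough way to see that plateau, as a sketch only: this assumes the
expected partition size is max (total_size / lto_partitions,
min_partition_size), capped at the maximum, and uses the defaults discussed
in this thread (32 partitions, minimum 10000, maximum 1000000).  The
function name is hypothetical and the count is an estimate, not what the
partitioner computes exactly.

```python
def estimated_partition_count(total_size, lto_partitions=32,
                              min_partition_size=10000,
                              max_partition_size=1000000):
    """Estimate the number of partitions for a program of total_size insns."""
    # Expected size of one partition, bounded below and above.
    expected = max(total_size // lto_partitions, min_partition_size)
    expected = min(expected, max_partition_size)
    # Ceiling division; even a tiny program gets one partition.
    return max(1, -(-total_size // expected))


for total in (100000, 320000, 3200000, 32000000, 64000000):
    print(total, estimated_partition_count(total))
```

Under these assumptions the estimate stays at 32 for every input size
between 320000 and 32000000 estimated instructions.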

It's also about temporary disk space of which we use more the more
partitions we use (because we essentially duplicate the whole global
types/decls section for each partition).

I'm not saying increasing lto-min-partition is the best solution but it
certainly looks like the most appealing one to me.

Richard.

> --
> Markus


* Re: [PATCH] Increase lto-min-partition
  2016-09-26  8:27           ` Richard Biener
@ 2016-09-26  9:44             ` Markus Trippelsdorf
  2016-09-26 12:31               ` Wilco Dijkstra
  0 siblings, 1 reply; 15+ messages in thread
From: Markus Trippelsdorf @ 2016-09-26  9:44 UTC (permalink / raw)
  To: Richard Biener; +Cc: Wilco Dijkstra, GCC Patches, nd

On 2016.09.26 at 09:42 +0200, Richard Biener wrote:
> On Sat, Sep 24, 2016 at 10:52 AM, Markus Trippelsdorf
> <markus@trippelsdorf.de> wrote:
> > On 2016.09.23 at 15:29 +0200, Richard Biener wrote:
> >> >
> >> > So 50000 looks too big to me.
> >>
> >> I think the issue is that the default number of partitions is too high
> >> (32) which pessimizes 4-core machines if the units are too small.
> >
> > The more partitions are used the less memory is required at LTRANS time.
> >
> > If for example you limit partitions to 4 on a 4-core machine with 8GB
> > memory, you would start swapping when building Firefox.
> >
> > And even lto-partitions=8 is slower than the default of 32:
> >
> > (Firefox libxul build times with gcc-6.)
> >
> > --param=lto-partitions=8 -flto=4:
> > 1670.19s user 23.39s system 305% cpu 9:14.13 total
> >
> > default -flto=4:
> > 1668.94s user 32.51s system 320% cpu 8:50.36 total
> >
> > If someone wants fewer partitions he can use -flto-partition=one/none
> > or --param=lto-partitions=1.
>
> I know all this.  But then we seem to be stuck at 32 partitions from
> an input size of 32 * lto-min-partition up to 32 * lto-max-partition,
> which is currently two orders of magnitude of difference in input size!
>
> That can't be a good heuristic.
>
> It's also about temporary disk space of which we use more the more
> partitions we use (because we essentially duplicate the whole global
> types/decls section for each partition).
>
> I'm not saying increasing lto-min-partition is the best solution but it
> certainly looks like the most appealing one to me.

I think the current lto-min-partition value of 10000 is reasonable, and
the proposed value of 50000 seems excessive.

Also see the comment in gcc/lto/lto-partition.c:

   We compute the expected size of a partition as:

     max (total_size / lto_partitions, min_partition_size)

   We use dynamic expected size of partition so small programs are partitioned
   into enough partitions to allow use of multiple CPUs, while large programs
   are not partitioned too much.  Creating too many partitions significantly
   increases the streaming overhead.
...
   The function implements a simple greedy algorithm.  Nodes are being added
   to the current partition until after 3/4 of the expected partition size is
   reached.  Past this threshold, we keep track of boundary size (number of
   edges going to other partitions) and continue adding functions until after
   the current partition has grown to twice the expected partition size,
   or is bigger than max_partition_size.
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ : this sentence should be added.

   Then the process is undone to the point where the minimal ratio of boundary size
   and in-partition calls was reached.  */
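
The greedy step described in that comment can be sketched roughly as
follows.  This is a simplified illustration in Python, not the code in
lto-partition.c: it closes a partition as soon as the expected size is
reached and leaves out the 3/4 threshold, the boundary-size tracking and
the undo step.  The function name and the per-function sizes are
hypothetical; the defaults (32, 10000, 1000000) are the values discussed
in this thread.

```python
def greedy_partitions(func_sizes, lto_partitions=32,
                      min_partition_size=10000, max_partition_size=1000000):
    """Group functions (by index) into partitions, greedily and in order."""
    total_size = sum(func_sizes)
    # Expected size as quoted in the comment, capped by the maximum.
    expected = max(total_size // lto_partitions, min_partition_size)
    expected = min(expected, max_partition_size)

    partitions, current, current_size = [], [], 0
    for index, size in enumerate(func_sizes):
        current.append(index)
        current_size += size
        if current_size >= expected:
            # Partition is full: start a new one.
            partitions.append(current)
            current, current_size = [], 0
    if current:
        partitions.append(current)
    return partitions


# Hypothetical program: 40 functions of 2500 estimated instructions each.
sizes = [2500] * 40
print(len(greedy_partitions(sizes, min_partition_size=10000)))  # 10
print(len(greedy_partitions(sizes, min_partition_size=50000)))  # 2
```

For a program of about 100000 estimated instructions, raising the minimum
from 10000 to 50000 takes this sketch from ten partitions to two.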


--
Markus


* Re: [PATCH] Increase lto-min-partition
  2016-09-26  9:44             ` Markus Trippelsdorf
@ 2016-09-26 12:31               ` Wilco Dijkstra
  0 siblings, 0 replies; 15+ messages in thread
From: Wilco Dijkstra @ 2016-09-26 12:31 UTC (permalink / raw)
  To: Markus Trippelsdorf, Richard Biener; +Cc: GCC Patches, nd

Markus Trippelsdorf wrote: 
> On 2016.09.26 at 09:42 +0200, Richard Biener wrote:
> > On Sat, Sep 24, 2016 at 10:52 AM, Markus Trippelsdorf
> > <markus@trippelsdorf.de> wrote:
> > > On 2016.09.23 at 15:29 +0200, Richard Biener wrote:

> > > If for example you limit partitions to 4 on a 4-core machine with 8GB
> > > memory, you would start swapping when building Firefox.
> >
> > > And even lto-partitions=8 is slower than the default of 32:

If certain applications swap with 8 partitions, other applications that are
4 times larger will still swap with 32 partitions, agreed?

I.e. it implies the max partition size is way too large, not that 32 partitions
is best. You'd set it as large as possible to avoid the overhead of having
lots of partitions, but small enough that a typical machine wouldn't swap.

> Also see the comment in gcc/lto/lto-partition.c:

   We compute the expected size of a partition as:

     max (total_size / lto_partitions, min_partition_size)

That looks a bit too simplistic with current default settings... So up to
320000 instructions (i.e. a binary size of ~1.3MB) it uses as many partitions
of 10000 insns as possible; after that it uses 32 partitions until 32000000
instructions...

Wilco


* Re: [PATCH] Increase lto-min-partition
  2016-09-23 15:41           ` Prathamesh Kulkarni
@ 2016-09-26 13:16             ` Wilco Dijkstra
  0 siblings, 0 replies; 15+ messages in thread
From: Wilco Dijkstra @ 2016-09-26 13:16 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: Richard Biener, Markus Trippelsdorf, GCC Patches, nd

Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> wrote:

> Hi Wilco,
> I am working on LTO varpool partitioning to improve performance for
> section anchors.
> I posted a preliminary patch at:
> https://gcc.gnu.org/ml/gcc/2016-07/msg00033.html
> Unfortunately I haven't been able to benchmark it on ARM yet.
> I am planning to resume work on it soon.

Thanks, I'll have a look. However I'm not 100% convinced smarter symbol
partitioning is the best way forward. Although it should help, it doesn't take into
account which symbols are currently suitable as anchors (-fcommon
is still the default, and big arrays are not suitable). And you still have to make
difficult choices for symbols that are frequently used across most partitions.

So I believe the best solution is to assign anchors early on so that all partitions
can make use of anchors. Assuming we sort symbols by size and frequency,
it should be feasible to use a single anchor for all simple integer global variables
across the whole application. Assigning early should also allow common
variables to be used in anchors, further increasing the benefit.

Do you think that is feasible?

> Building with a single partition is not scalable. An LTO build of
> chromium with an x86->arm cross compiler and a single partition results
> in a "branch out of range" assembler error. I added lto-max-partition
> primarily to work around that limitation.

Yes, GCC doesn't split huge compilation units into multiple text sections
so that the linker can insert long branch veneers. So it's a workaround
for LTO but most RISC targets can still hit the same issue with a single
huge file.

Wilco

