GCC missing -flto optimizations? SPEC lbm benchmark

public inbox for gcc@gcc.gnu.org

* GCC missing -flto optimizations?  SPEC lbm benchmark
@ 2019-02-14 19:30 Steve Ellcey
  2019-02-15  9:12 ` Bin.Cheng
  0 siblings, 1 reply; 13+ messages in thread
From: Steve Ellcey @ 2019-02-14 19:30 UTC (permalink / raw)
  To: gcc

I have a question about SPEC CPU 2017 and what GCC can and cannot do
with -flto.  As part of some SPEC analysis I am doing I found that with
-Ofast, ICC and GCC were not that far apart (especially spec int rate,
spec fp rate was a slightly larger difference).

But when I added -ipo to the ICC command and -flto to the GCC command,
the difference got larger.  In particular the 519.lbm_r was more than
twice as fast with ICC and -ipo, but -flto did not help GCC at all.

There are other tests that also show this type of improvement with -ipo
like 538.imagick_r, 544.nab_r, 525.x264_r, 531.deepsjeng_r, and
548.exchange2_r, but none are as dramatic as 519.lbm_r.  Anyone have
any idea on what ICC is doing that GCC is missing?  Is GCC just not
aggressive enough with its inlining?

Steve Ellcey
sellcey@marvell.com

* Re: GCC missing -flto optimizations? SPEC lbm benchmark
  2019-02-14 19:30 GCC missing -flto optimizations? SPEC lbm benchmark Steve Ellcey
@ 2019-02-15  9:12 ` Bin.Cheng
  2019-02-15  9:48   ` Jun Ma
  0 siblings, 1 reply; 13+ messages in thread
From: Bin.Cheng @ 2019-02-15  9:12 UTC (permalink / raw)
  To: Steve Ellcey; +Cc: gcc, Jun Ma

On Fri, Feb 15, 2019 at 3:30 AM Steve Ellcey <sellcey@marvell.com> wrote:
>
> I have a question about SPEC CPU 2017 and what GCC can and cannot do
> with -flto.  As part of some SPEC analysis I am doing I found that with
> -Ofast, ICC and GCC were not that far apart (especially spec int rate,
> spec fp rate was a slightly larger difference).
>
> But when I added -ipo to the ICC command and -flto to the GCC command,
> the difference got larger.  In particular the 519.lbm_r was more than
> twice as fast with ICC and -ipo, but -flto did not help GCC at all.
>
> There are other tests that also show this type of improvement with -ipo
> like 538.imagick_r, 544.nab_r, 525.x264_r, 531.deepsjeng_r, and
> 548.exchange2_r, but none are as dramatic as 519.lbm_r.  Anyone have
> any idea on what ICC is doing that GCC is missing?  Is GCC just not
> aggressive enough with its inlining?

IIRC Jun did some investigation before? CCing.

Thanks,
bin
>
> Steve Ellcey
> sellcey@marvell.com

* Re: GCC missing -flto optimizations? SPEC lbm benchmark
  2019-02-15  9:12 ` Bin.Cheng
@ 2019-02-15  9:48   ` Jun Ma
  2019-02-15 12:45     ` Hi-Angel
  2019-02-15 17:53     ` [EXT] " Steve Ellcey
  0 siblings, 2 replies; 13+ messages in thread
From: Jun Ma @ 2019-02-15  9:48 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Steve Ellcey, gcc

On Fri, Feb 15, 2019 at 5:12 PM Bin.Cheng <amker.cheng@gmail.com> wrote:

> On Fri, Feb 15, 2019 at 3:30 AM Steve Ellcey <sellcey@marvell.com> wrote:
> >
> > I have a question about SPEC CPU 2017 and what GCC can and cannot do
> > with -flto.  As part of some SPEC analysis I am doing I found that with
> > -Ofast, ICC and GCC were not that far apart (especially spec int rate,
> > spec fp rate was a slightly larger difference).
> >
> > But when I added -ipo to the ICC command and -flto to the GCC command,
> > the difference got larger.  In particular the 519.lbm_r was more than
> > twice as fast with ICC and -ipo, but -flto did not help GCC at all.
> >
> > There are other tests that also show this type of improvement with -ipo
> > like 538.imagick_r, 544.nab_r, 525.x264_r, 531.deepsjeng_r, and
> > 548.exchange2_r, but none are as dramatic as 519.lbm_r.  Anyone have
> > any idea on what ICC is doing that GCC is missing?  Is GCC just not
> > aggressive enough with its inlining?
>
> IIRC Jun did some investigation before? CCing.
>
> Thanks,
> bin
> >
> > Steve Ellcey
> > sellcey@marvell.com

ICC does much more than GCC under -ipo, especially memory layout
optimizations. See https://software.intel.com/en-us/node/522667.
ICC is more aggressive about array transposition, structure splitting,
and field reordering. However, these optimizations were removed
from GCC a long time ago.
As for lbm_r, IIRC the most time-consuming part is a loop whose memory
accesses have a stride of 20.  ICC will optimize the array (maybe a
structure?) and vectorize the loop under -ipo.
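
To make the stride remark concrete, here is a minimal sketch of an
interleaved layout with 20 values per cell, next to its transposed
equivalent. The constant, the function names, and the layout are
assumptions for illustration only; this is not code from 519.lbm_r, and
it makes no claim about what ICC actually does to that benchmark.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 20 values per cell, matching the stride of 20 mentioned above. */
enum { FIELDS_PER_CELL = 20 };

/* Interleaved layout: field f of cell i lives at grid[i * FIELDS_PER_CELL + f],
   so a loop over cells reads memory with a stride of 20 doubles. */
double sum_field_interleaved(const double *grid, size_t n_cells, size_t f)
{
    double sum = 0.0;
    for (size_t i = 0; i < n_cells; i++)
        sum += grid[i * FIELDS_PER_CELL + f];
    return sum;
}

/* Transposed layout: field f of cell i lives at grid[f * n_cells + i],
   so the same loop reads consecutive doubles (unit stride). */
double sum_field_transposed(const double *grid, size_t n_cells, size_t f)
{
    double sum = 0.0;
    for (size_t i = 0; i < n_cells; i++)
        sum += grid[f * n_cells + i];
    return sum;
}

/* Copy an interleaved grid into the transposed layout (buffers must differ). */
void transpose_grid(const double *src, double *dst, size_t n_cells)
{
    assert(src != dst);
    for (size_t i = 0; i < n_cells; i++)
        for (size_t f = 0; f < FIELDS_PER_CELL; f++)
            dst[f * n_cells + i] = src[i * FIELDS_PER_CELL + f];
}
```

Both summation loops compute the same value; only the access pattern
differs. Unit-stride loads are generally much easier for a vectorizer to
handle than loads separated by 20 elements.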

Thanks
Jun

* Re: GCC missing -flto optimizations? SPEC lbm benchmark
  2019-02-15  9:48   ` Jun Ma
@ 2019-02-15 12:45     ` Hi-Angel
  2019-02-15 13:12       ` Richard Biener
  2019-02-15 15:01       ` Ian Lance Taylor
  2019-02-15 17:53     ` [EXT] " Steve Ellcey
  1 sibling, 2 replies; 13+ messages in thread
From: Hi-Angel @ 2019-02-15 12:45 UTC (permalink / raw)
  To: Jun Ma; +Cc: Bin.Cheng, Steve Ellcey, gcc

I never could understand, why field reordering was removed from GCC? I
mean, I know that it's prohibited in C and C++, but surely GCC can
detect whether it could possibly influence application behavior and,
if not, just do the reordering.

The veto is important to C/C++ as programming languages, but not to
the machine code generated from them. As long as the app can't
detect through means defined by C/C++ that its fields were reordered,
field reordering by the compiler is fine, isn't it?

* Re: GCC missing -flto optimizations? SPEC lbm benchmark
  2019-02-15 12:45     ` Hi-Angel
@ 2019-02-15 13:12       ` Richard Biener
  2019-02-15 13:15         ` Jakub Jelinek
  2019-02-15 15:01       ` Ian Lance Taylor
  1 sibling, 1 reply; 13+ messages in thread
From: Richard Biener @ 2019-02-15 13:12 UTC (permalink / raw)
  To: gcc, Hi-Angel, Jun Ma; +Cc: Bin.Cheng, Steve Ellcey, gcc

On February 15, 2019 1:45:10 PM GMT+01:00, Hi-Angel <hiangel999@gmail.com> wrote:
>I never could understand, why field reordering was removed from GCC?

The implementation simply was seriously broken, bitrotten and unmaintained. 

Richard 

* Re: GCC missing -flto optimizations? SPEC lbm benchmark
  2019-02-15 13:12       ` Richard Biener
@ 2019-02-15 13:15         ` Jakub Jelinek
  2019-02-15 13:34           ` Ramana Radhakrishnan
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Jelinek @ 2019-02-15 13:15 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc, Hi-Angel, Jun Ma, Bin.Cheng, Steve Ellcey

On Fri, Feb 15, 2019 at 02:12:27PM +0100, Richard Biener wrote:
> On February 15, 2019 1:45:10 PM GMT+01:00, Hi-Angel <hiangel999@gmail.com> wrote:
> >I never could understand, why field reordering was removed from GCC?
> 
> The implementation simply was seriously broken, bitrotten and unmaintained. 

Which of course doesn't mean somebody else can't submit a new
implementation, as long as it would be properly maintained and would avoid
the issues the old implementation had.  It is just better not to have it if
it causes lots of wrong-code issues and there is nobody to fix those.

	Jakub

* Re: GCC missing -flto optimizations? SPEC lbm benchmark
  2019-02-15 13:15         ` Jakub Jelinek
@ 2019-02-15 13:34           ` Ramana Radhakrishnan
  0 siblings, 0 replies; 13+ messages in thread
From: Ramana Radhakrishnan @ 2019-02-15 13:34 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Biener, gcc mailing list, Hi-Angel, Jun Ma, Bin.Cheng,
	Steve Ellcey

On Fri, Feb 15, 2019 at 1:16 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Fri, Feb 15, 2019 at 02:12:27PM +0100, Richard Biener wrote:
> > On February 15, 2019 1:45:10 PM GMT+01:00, Hi-Angel <hiangel999@gmail.com> wrote:
> > >I never could understand, why field reordering was removed from GCC?
> >
> > The implementation simply was seriously broken, bitrotten and unmaintained.
>
> Which of course doesn't mean somebody else can't submit a new
> implementation, as long as it would be properly maintained and would avoid
> the issues the old implementation had.  It is just better not to have it if
> it causes lots of wrong-code issues and there is nobody to fix those.

I also remember a Cauldron talk about this in the recent past. It was in Prague.
Ah, here's a YouTube video of it:

https://www.youtube.com/watch?v=vhV75sys0Nw

Ramana

* Re: GCC missing -flto optimizations? SPEC lbm benchmark
  2019-02-15 12:45     ` Hi-Angel
  2019-02-15 13:12       ` Richard Biener
@ 2019-02-15 15:01       ` Ian Lance Taylor
  2019-02-15 18:10         ` Joel Sherrill
  1 sibling, 1 reply; 13+ messages in thread
From: Ian Lance Taylor @ 2019-02-15 15:01 UTC (permalink / raw)
  To: Hi-Angel; +Cc: Jun Ma, Bin.Cheng, Steve Ellcey, gcc

On Fri, Feb 15, 2019 at 4:46 AM Hi-Angel <hiangel999@gmail.com> wrote:
>
> I never could understand, why field reordering was removed from GCC? I
> mean, I know that it's prohibited in C and C++, but surely GCC can
> detect whether it could possibly influence application behavior and,
> if not, just do the reordering.
>
> The veto is important to C/C++ as programming languages, but not to
> the machine code generated from them. As long as the app can't
> detect through means defined by C/C++ that its fields were reordered,
> field reordering by the compiler is fine, isn't it?

In my opinion field reordering is very hard for the compiler to do
correctly and trivial for a human programmer to do correctly.  So in
practice the best approach is for the compiler, or some other tool, to
say "you should reorder the fields here."  As far as I can see, the
only real reason to implement field reordering in a compiler is for
benchmark cracking, since benchmarks typically don't let you modify
the source code.  It's not a useful optimization in practice other
than for benchmarks.
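
As a small illustration of how simple the manual fix usually is, here is
a sketch with a hypothetical struct. The type and member names are made
up for this example, and the exact sizes depend on the ABI; on x86-64
Linux the first layout is typically 24 bytes and the second 16.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record, in the order a programmer might first write it. */
struct as_written {
    char   tag;     /* 1 byte, usually followed by padding up to the double */
    double value;   /* 8 bytes */
    char   flag;    /* 1 byte, usually followed by tail padding */
};

/* Same members, largest first: the two chars now share one padded slot. */
struct reordered_by_hand {
    double value;
    char   tag;
    char   flag;
};

/* Reordering the members should never make this record larger. */
static_assert(sizeof(struct reordered_by_hand) <= sizeof(struct as_written),
              "reordering should not grow the struct");

/* Padding in each layout: total size minus the sum of the member sizes. */
size_t padding_as_written(void)
{
    return sizeof(struct as_written) - (sizeof(double) + 2 * sizeof(char));
}

size_t padding_reordered(void)
{
    return sizeof(struct reordered_by_hand) - (sizeof(double) + 2 * sizeof(char));
}
```

A tool only has to report the two padding figures; the programmer can
then decide whether the declaration order matters for other reasons.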

(Array transformations and struct splitting, on the other hand, can be useful.)

Ian

* Re: GCC missing -flto optimizations? SPEC lbm benchmark
  2019-02-15 15:01       ` Ian Lance Taylor
@ 2019-02-15 18:10         ` Joel Sherrill
  2019-02-15 18:28           ` Richard Kenner
  2019-02-15 20:52           ` Eric Botcazou
  0 siblings, 2 replies; 13+ messages in thread
From: Joel Sherrill @ 2019-02-15 18:10 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Hi-Angel, Jun Ma, Bin.Cheng, Steve Ellcey, gcc

On Fri, Feb 15, 2019 at 9:02 AM Ian Lance Taylor <iant@golang.org> wrote:

> On Fri, Feb 15, 2019 at 4:46 AM Hi-Angel <hiangel999@gmail.com> wrote:
> >
> > I never could understand, why field reordering was removed from GCC? I
> > mean, I know that it's prohibited in C and C++, but surely GCC can
> > detect whether it could possibly influence application behavior and,
> > if not, just do the reordering.
> >
> > The veto is important to C/C++ as programming languages, but not to
> > the machine code generated from them. As long as the app can't
> > detect through means defined by C/C++ that its fields were reordered,
> > field reordering by the compiler is fine, isn't it?
>
> In my opinion field reordering is very hard for the compiler to do
> correctly and trivial for a human programmer to do correctly.  So in
> practice the best approach is for the compiler, or some other tool, to
> say "you should reorder the fields here."  As far as I can see, the
> only real reason to implement field reordering in a compiler is for
> benchmark cracking, since benchmarks typically don't let you modify
> the source code.  It's not a useful optimization in practice other
> than for benchmarks.
>

Hasn't GNAT sorted Ada elements in records (e.g. structures) by size
since near its initial addition to GCC in the mid-90s? This results in the
largest elements up front and minimizes the need for alignment gaps.

I know Ada is traditionally more strongly typed than C/C++, but if it can
be done for Ada programs reliably, why could it not be reliable in C?

>
> (Array transformations and struct splitting, on the other hand, can be
> useful.)
>

--joel

* Re: GCC missing -flto optimizations? SPEC lbm benchmark
  2019-02-15 18:10         ` Joel Sherrill
@ 2019-02-15 18:28           ` Richard Kenner
  2019-02-15 20:52           ` Eric Botcazou
  1 sibling, 0 replies; 13+ messages in thread
From: Richard Kenner @ 2019-02-15 18:28 UTC (permalink / raw)
  To: joel; +Cc: amker.cheng, gcc, hiangel999, iant, majun4950646, sellcey

> Hasn't GNAT sorted Ada elements in records (e.g. structures) by size
> since near its initial addition to GCC in the mid-90s? 

No, it wasn't done early on, and even now it isn't done in that major a
way.  Most reordering (possibly all; I'm not sure) is done between
objects of variable and fixed size, not between objects of differing
fixed sizes.

> I know Ada is traditionally more strongly typed than C/C++, but if it can
> be done for Ada programs reliably, why could it not be reliable in C?

I don't see it as a reliability issue, but one of expectations.  One might
be using a struct to map some hardware layout or records in a file so that
reordering fields could break things.
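
A minimal sketch of that expectation, assuming a hypothetical on-disk
record header (the struct, its field names, and the offsets are invented
for illustration): code like this relies on the declared order, and the
static assertions show what a silent reordering would violate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical file-format header; the field order is part of the format. */
struct record_header {
    uint32_t magic;    /* bytes 0..3  */
    uint16_t version;  /* bytes 4..5  */
    uint16_t flags;    /* bytes 6..7  */
    uint64_t length;   /* bytes 8..15 */
};

/* Fail the build if the layout ever stops matching the format. */
static_assert(offsetof(struct record_header, magic) == 0, "magic moved");
static_assert(offsetof(struct record_header, version) == 4, "version moved");
static_assert(offsetof(struct record_header, flags) == 6, "flags moved");
static_assert(offsetof(struct record_header, length) == 8, "length moved");
static_assert(sizeof(struct record_header) == 16, "unexpected size");

/* Overlay raw bytes onto the struct, as code reading such a file often does.
   Byte order is ignored here; the sketch is only about field placement. */
uint16_t read_version(const unsigned char *raw)
{
    struct record_header h;
    memcpy(&h, raw, sizeof h);
    return h.version;
}
```

If a compiler moved `length` to the front to suit its own heuristics,
`read_version` would return bytes belonging to a different field.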

* Re: GCC missing -flto optimizations? SPEC lbm benchmark
  2019-02-15 18:10         ` Joel Sherrill
  2019-02-15 18:28           ` Richard Kenner
@ 2019-02-15 20:52           ` Eric Botcazou
  1 sibling, 0 replies; 13+ messages in thread
From: Eric Botcazou @ 2019-02-15 20:52 UTC (permalink / raw)
  To: joel; +Cc: gcc, Ian Lance Taylor, Hi-Angel, Jun Ma, Bin.Cheng, Steve Ellcey

> Hasn't GNAT sorted Ada elements in records (e.g. structures) by size
> since near its initial addition to GCC in the mid-90s? This results in the
> largest elements up front and minimizes the need for alignment gaps.

No, that's a serious misconception, since one of the features of GNAT is to be 
compatible with C by default as much as possible.  But we started to do some 
reordering recently when the records don't have (direct) equivalents in C.

-- 
Eric Botcazou

* Re: [EXT] Re: GCC missing -flto optimizations? SPEC lbm benchmark
  2019-02-15  9:48   ` Jun Ma
  2019-02-15 12:45     ` Hi-Angel
@ 2019-02-15 17:53     ` Steve Ellcey
  2019-02-16 14:36       ` Jun Ma
  1 sibling, 1 reply; 13+ messages in thread
From: Steve Ellcey @ 2019-02-15 17:53 UTC (permalink / raw)
  To: amker.cheng, majun4950646; +Cc: gcc

On Fri, 2019-02-15 at 17:48 +0800, Jun Ma wrote:
> 
> ICC does much more than GCC under -ipo, especially memory layout
> optimizations. See https://software.intel.com/en-us/node/522667.
> ICC is more aggressive about array transposition, structure splitting,
> and field reordering. However, these optimizations were removed
> from GCC a long time ago.
> As for lbm_r, IIRC the most time-consuming part is a loop whose memory
> accesses have a stride of 20.  ICC will optimize the array (maybe a
> structure?) and vectorize the loop under -ipo.
>  
> Thanks
> Jun

Interesting.  I tried using '-qno-opt-mem-layout-trans' on ICC
along with '-Ofast -ipo' and that had no effect on the speed.  I also
tried '-no-vec' and that had no effect either.  The only thing that
slowed down ICC was '-ip-no-inlining' or '-fno-inline'.  I see that
'-Ofast -ipo' resulted in everything (except libc functions) getting
inlined into the main program when using ICC.  GCC did not do that, but
if I forced it to by using the always_inline attribute, GCC could
inline everything into main the way ICC does.  But that did not speed
up the GCC executable.
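
For reference, a minimal sketch of the kind of forced inlining described
above, using GCC's always_inline function attribute. The function names
are hypothetical stand-ins and are not taken from 519.lbm_r.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper standing in for one of the benchmark's small functions.
   The attribute asks GCC to inline every call, bypassing its size heuristics. */
static inline __attribute__((always_inline))
double square(double x)
{
    return x * x;
}

/* Hypothetical caller: with always_inline, the body of square() is expanded here. */
double sum_of_squares(const double *v, size_t n)
{
    assert(v != NULL || n == 0);  /* precondition: a non-empty input needs a buffer */
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += square(v[i]);
    return sum;
}
```

Forcing the inlining removes the call and exposes the callee to the
caller's optimizers, but it does not by itself guarantee a speedup, which
is consistent with the result reported here.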

Steve Ellcey
sellcey@marvell.com

* Re: [EXT] Re: GCC missing -flto optimizations? SPEC lbm benchmark
  2019-02-15 17:53     ` [EXT] " Steve Ellcey
@ 2019-02-16 14:36       ` Jun Ma
  0 siblings, 0 replies; 13+ messages in thread
From: Jun Ma @ 2019-02-16 14:36 UTC (permalink / raw)
  To: Steve Ellcey; +Cc: amker.cheng, gcc

On Sat, Feb 16, 2019 at 1:53 AM Steve Ellcey <sellcey@marvell.com> wrote:

> On Fri, 2019-02-15 at 17:48 +0800, Jun Ma wrote:
> >
> > ICC does much more than GCC under -ipo, especially memory layout
> > optimizations. See https://software.intel.com/en-us/node/522667.
> > ICC is more aggressive about array transposition, structure splitting,
> > and field reordering. However, these optimizations were removed
> > from GCC a long time ago.
> > As for lbm_r, IIRC the most time-consuming part is a loop whose memory
> > accesses have a stride of 20.  ICC will optimize the array (maybe a
> > structure?) and vectorize the loop under -ipo.
> >
> > Thanks
> > Jun
>
> Interesting.  I tried using '-qno-opt-mem-layout-trans' on ICC
> along with '-Ofast -ipo' and that had no effect on the speed.  I also
> tried '-no-vec' and that had no effect either.  The only thing that
> slowed down ICC was '-ip-no-inlining' or '-fno-inline'.  I see that
> '-Ofast -ipo' resulted in everything (except libc functions) getting
> inlined into the main program when using ICC.  GCC did not do that, but
> if I forced it to by using the always_inline attribute, GCC could
> inline everything into main the way ICC does.  But that did not speed
> up the GCC executable.
>
> Steve Ellcey
> sellcey@marvell.com

You can use '-qopt-report' to see which optimizations have been applied by
ICC.

Thanks
Jun
