* GCC missing -flto optimizations? SPEC lbm benchmark
From: Steve Ellcey @ 2019-02-14 19:30 UTC
To: gcc

I have a question about SPEC CPU 2017 and what GCC can and cannot do
with -flto. As part of some SPEC analysis I am doing, I found that with
-Ofast, ICC and GCC were not that far apart (especially on spec int rate;
the spec fp rate difference was slightly larger).

But when I added -ipo to the ICC command and -flto to the GCC command,
the difference got larger. In particular, 519.lbm_r was more than
twice as fast with ICC and -ipo, but -flto did not help GCC at all.

There are other tests that also show this type of improvement with -ipo,
like 538.imagick_r, 544.nab_r, 525.x264_r, 531.deepsjeng_r, and
548.exchange2_r, but none are as dramatic as 519.lbm_r. Does anyone have
any idea what ICC is doing that GCC is missing? Is GCC just not
aggressive enough with its inlining?

Steve Ellcey
sellcey@marvell.com
* Re: GCC missing -flto optimizations? SPEC lbm benchmark
From: Bin.Cheng @ 2019-02-15 9:12 UTC
To: Steve Ellcey; +Cc: gcc, Jun Ma

On Fri, Feb 15, 2019 at 3:30 AM Steve Ellcey <sellcey@marvell.com> wrote:
> [...]
> In particular the 519.lbm_r was more than
> twice as fast with ICC and -ipo, but -flto did not help GCC at all.
> [...]
> Anyone have any idea on what ICC is doing that GCC is missing?
> Is GCC just not aggressive enough with its inlining?

IIRC Jun did some investigation before? CCing.

Thanks,
bin
* Re: GCC missing -flto optimizations? SPEC lbm benchmark
From: Jun Ma @ 2019-02-15 9:48 UTC
To: Bin.Cheng; +Cc: Steve Ellcey, gcc

Bin.Cheng <amker.cheng@gmail.com> wrote on Fri, Feb 15, 2019 at 5:12 PM:
> On Fri, Feb 15, 2019 at 3:30 AM Steve Ellcey <sellcey@marvell.com> wrote:
> > [...]
> > Anyone have any idea on what ICC is doing that GCC is missing?
> > Is GCC just not aggressive enough with its inlining?
>
> IIRC Jun did some investigation before? CCing.

ICC does much more than GCC under IPO, especially memory layout
optimizations. See https://software.intel.com/en-us/node/522667.
ICC is more aggressive about array transposition, structure splitting,
and field reordering. However, these optimizations were removed
from GCC a long time ago.

As for lbm_r, IIRC the most time-consuming part is a loop whose memory
accesses have a stride of 20. ICC will optimize the array (maybe
structure?) and vectorize the loop under IPO.

Thanks
Jun
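[Editorial illustration] The kind of layout change Jun describes can be sketched in C. This is only a minimal sketch under stated assumptions: it assumes each cell holds 20 doubles (to match the stride of 20 mentioned above), and the names `N_FIELDS`, `sum_field_strided`, `transpose_to_field_major`, and `sum_field_contiguous` are hypothetical. It is not the lbm source and not a description of what ICC actually emits.

```c
#include <stddef.h>

/* Assumption: 20 values per cell, matching the stride mentioned above. */
enum { N_FIELDS = 20 };

/* Cell-major layout: field f of cell i lives at grid[i * N_FIELDS + f],
   so a loop over one field touches every 20th element. */
double sum_field_strided(const double *grid, size_t n_cells, size_t f)
{
    double sum = 0.0;
    for (size_t i = 0; i < n_cells; i++)
        sum += grid[i * N_FIELDS + f];
    return sum;
}

/* Transpose cell-major `src` into field-major `dst`:
   field f of cell i moves to dst[f * n_cells + i]. */
void transpose_to_field_major(const double *src, double *dst, size_t n_cells)
{
    for (size_t i = 0; i < n_cells; i++)
        for (size_t f = 0; f < N_FIELDS; f++)
            dst[f * n_cells + i] = src[i * N_FIELDS + f];
}

/* Field-major layout: the same sum now reads consecutive elements
   (unit stride), which is generally easier for a vectorizer. */
double sum_field_contiguous(const double *grid, size_t n_cells, size_t f)
{
    const double *field = grid + f * n_cells;
    double sum = 0.0;
    for (size_t i = 0; i < n_cells; i++)
        sum += field[i];
    return sum;
}
```

Both sum functions return the same value for the same logical data; only the memory access pattern differs. A compiler doing this automatically would have to prove that no other code depends on the original layout, which is where whole-program (IPO/LTO) visibility comes in.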
* Re: GCC missing -flto optimizations? SPEC lbm benchmark
From: Hi-Angel @ 2019-02-15 12:45 UTC
To: Jun Ma; +Cc: Bin.Cheng, Steve Ellcey, gcc

I never could understand why field reordering was removed from GCC.
I mean, I know that it's prohibited in C and C++, but surely GCC can
detect whether it could possibly influence application behavior, and if
not, just do the reorder.

The veto is important to C/C++ as programming languages, but not to
the machine code that is generated from them. As long as the app can't
detect that its fields were reordered through means defined by C/C++,
field reordering by the compiler is fine, isn't it?

On Fri, 15 Feb 2019 at 12:49, Jun Ma <majun4950646@gmail.com> wrote:
> ICC is doing much more than GCC in ipo, especially memory layout
> optimizations. See https://software.intel.com/en-us/node/522667.
> ICC is more aggressive in array transposition/structure splitting
> /field reordering. However, these optimizations have been removed
> from GCC long time ago.
> [...]
* Re: GCC missing -flto optimizations? SPEC lbm benchmark
From: Richard Biener @ 2019-02-15 13:12 UTC
To: gcc, Hi-Angel, Jun Ma; +Cc: Bin.Cheng, Steve Ellcey, gcc

On February 15, 2019 1:45:10 PM GMT+01:00, Hi-Angel <hiangel999@gmail.com> wrote:
> I never could understand, why field reordering was removed from GCC?

The implementation simply was seriously broken, bitrotten and unmaintained.

Richard

> I mean, I know that it's prohibited in C and C++, but, sure, GCC can
> detect whether it possibly can influence application behavior, and if
> not, just do the reorder.
> [...]
* Re: GCC missing -flto optimizations? SPEC lbm benchmark
From: Jakub Jelinek @ 2019-02-15 13:15 UTC
To: Richard Biener; +Cc: gcc, Hi-Angel, Jun Ma, Bin.Cheng, Steve Ellcey

On Fri, Feb 15, 2019 at 02:12:27PM +0100, Richard Biener wrote:
> On February 15, 2019 1:45:10 PM GMT+01:00, Hi-Angel <hiangel999@gmail.com> wrote:
> > I never could understand, why field reordering was removed from GCC?
>
> The implementation simply was seriously broken, bitrotten and unmaintained.

Which of course doesn't mean somebody else can't submit a new
implementation, as long as it is properly maintained and avoids
the issues the old implementation had. It is just better not to have it if
it causes lots of wrong-code issues and there is nobody to fix those.

Jakub
* Re: GCC missing -flto optimizations? SPEC lbm benchmark
From: Ramana Radhakrishnan @ 2019-02-15 13:34 UTC
To: Jakub Jelinek
Cc: Richard Biener, gcc mailing list, Hi-Angel, Jun Ma, Bin.Cheng, Steve Ellcey

On Fri, Feb 15, 2019 at 1:16 PM Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Feb 15, 2019 at 02:12:27PM +0100, Richard Biener wrote:
> > The implementation simply was seriously broken, bitrotten and unmaintained.
>
> Which of course doesn't mean somebody else can't submit a new
> implementation, as long as it would be properly maintained and would avoid
> the issues the old implementation had. Just it is better not to have it if
> it causes lots of wrong-code issues and there is nobody to fix those.

I also remember a Cauldron talk about this in the recent past. It was in
Prague. Ah, here's a YouTube video of it:
https://www.youtube.com/watch?v=vhV75sys0Nw

Ramana
* Re: GCC missing -flto optimizations? SPEC lbm benchmark
From: Ian Lance Taylor @ 2019-02-15 15:01 UTC
To: Hi-Angel; +Cc: Jun Ma, Bin.Cheng, Steve Ellcey, gcc

On Fri, Feb 15, 2019 at 4:46 AM Hi-Angel <hiangel999@gmail.com> wrote:
> I never could understand, why field reordering was removed from GCC? I
> mean, I know that it's prohibited in C and C++, but, sure, GCC can
> detect whether it possibly can influence application behavior, and if
> not, just do the reorder.
>
> The veto is important to C/C++ as programming languages, but not to
> machine code that is being generated from them. As long as app can't
> detect that its fields were reordered through means defined by C/C++,
> field reordering by compiler is fine, isn't it?

In my opinion field reordering is very hard for the compiler to do
correctly and trivial for a human programmer to do correctly. So in
practice the best approach is for the compiler, or some other tool, to
say "you should reorder the fields here." As far as I can see, the
only real reason to implement field reordering in a compiler is for
benchmark cracking, since benchmarks typically don't let you modify
the source code. It's not a useful optimization in practice other
than for benchmarks.

(Array transformations and struct splitting, on the other hand, can be
useful.)

Ian
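[Editorial illustration] Ian's point that reordering is trivial for a programmer can be shown with a small C sketch. The struct and function names below are hypothetical, and exact sizes depend on the target ABI; on common LP64 targets such as x86-64 Linux the first struct is typically 32 bytes and the second 24. GCC's `-Wpadded` warning is one existing way for the compiler to point out where padding was inserted.

```c
#include <stddef.h>

/* Fields in the order a programmer might first write them: each char is
   followed by padding so that the next double is suitably aligned. */
struct as_written {
    char   tag;
    double x;
    char   flag;
    double y;
};

/* Same fields, reordered by hand with the most strictly aligned members
   first; the two chars now share the tail of the struct. */
struct reordered {
    double x;
    double y;
    char   tag;
    char   flag;
};

/* Bytes of padding = total size minus the bytes the fields really need. */
size_t padding_bytes_as_written(void)
{
    return sizeof(struct as_written) - (2 * sizeof(double) + 2 * sizeof(char));
}

size_t padding_bytes_reordered(void)
{
    return sizeof(struct reordered) - (2 * sizeof(double) + 2 * sizeof(char));
}
```

The source change is two moved lines, whereas a compiler would first have to prove that nothing in the whole program observes the field order.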
* Re: GCC missing -flto optimizations? SPEC lbm benchmark
From: Joel Sherrill @ 2019-02-15 18:10 UTC
To: Ian Lance Taylor; +Cc: Hi-Angel, Jun Ma, Bin.Cheng, Steve Ellcey, gcc

On Fri, Feb 15, 2019 at 9:02 AM Ian Lance Taylor <iant@golang.org> wrote:
> On Fri, Feb 15, 2019 at 4:46 AM Hi-Angel <hiangel999@gmail.com> wrote:
> > [...]
> > As long as app can't
> > detect that its fields were reordered through means defined by C/C++,
> > field reordering by compiler is fine, isn't it?
>
> In my opinion field reordering is very hard for the compiler to do
> correctly and trivial for a human programmer to do correctly. So in
> practice the best approach is for the compiler, or some other tool, to
> say "you should reorder the fields here." As far as I can see, the
> only real reason to implement field reordering in a compiler is for
> benchmark cracking, since benchmarks typically don't let you modify
> the source code. It's not a useful optimization in practice other
> than for benchmarks.

Hasn't GNAT sorted Ada elements in records (e.g. structures) by size
since near its initial addition to GCC in the mid-90s? This results in the
largest elements up front and minimizes the need for alignment gaps.

I know Ada is traditionally more strongly typed than C/C++, but if it can
be done for Ada programs reliably, why could it not be reliable in C?

> (Array transformations and struct splitting, on the other hand, can be
> useful.)

--joel
* Re: GCC missing -flto optimizations? SPEC lbm benchmark
From: Richard Kenner @ 2019-02-15 18:28 UTC
To: joel; +Cc: amker.cheng, gcc, hiangel999, iant, majun4950646, sellcey

> Hasn't GNAT sorted Ada elements in records (e.g. structures) by size
> since near its initial addition to GCC in the mid-90s?

No, it wasn't done early on, and even now it isn't done in that major a way.
Most reordering (possibly all; I'm not sure) is done between objects
of variable and fixed size, not between objects of differing fixed sizes.

> I know Ada is traditionally more strongly typed than C/C++, but if it can
> be done for Ada programs reliably, why could it not be reliable in C?

I don't see it as a reliability issue, but one of expectations. One
might be using a struct to map some hardware layout or records in a
file, so reordering fields could break things.
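[Editorial illustration] The expectation Richard Kenner describes can be made concrete with a short C11 sketch. The record format here is entirely hypothetical (`struct record_header` and `read_header` are invented for illustration); the point is only that code like this treats the declared field order as part of an external contract.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk record header; the field order mirrors the
   byte layout of the file format. */
struct record_header {
    uint32_t magic;    /* bytes 0..3  */
    uint16_t version;  /* bytes 4..5  */
    uint16_t flags;    /* bytes 6..7  */
    uint64_t length;   /* bytes 8..15 */
};

/* The program relies on these offsets. A compiler that silently
   reordered the fields would either break these assertions or break
   every file written by an earlier build. */
_Static_assert(offsetof(struct record_header, magic) == 0, "magic offset");
_Static_assert(offsetof(struct record_header, version) == 4, "version offset");
_Static_assert(offsetof(struct record_header, flags) == 6, "flags offset");
_Static_assert(offsetof(struct record_header, length) == 8, "length offset");
_Static_assert(sizeof(struct record_header) == 16, "header size");

/* Read a header by copying raw bytes straight into the struct
   (same-endianness producer and consumer assumed). */
struct record_header read_header(const unsigned char *bytes)
{
    struct record_header h;
    memcpy(&h, bytes, sizeof h);
    return h;
}
```

Whether a struct is used this way is exactly what an automatic reordering pass would need to rule out before touching it.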
* Re: GCC missing -flto optimizations? SPEC lbm benchmark
From: Eric Botcazou @ 2019-02-15 20:52 UTC
To: joel; +Cc: gcc, Ian Lance Taylor, Hi-Angel, Jun Ma, Bin.Cheng, Steve Ellcey

> Hasn't GNAT sorted Ada elements in records (e.g. structures) by size
> since near its initial addition to GCC in the mid-90s? This results in the
> largest elements up front and minimizes the need for alignment gaps.

No, that's a serious misconception, since one of the features of GNAT is to be
compatible with C by default as much as possible. But we started to do some
reordering recently when the records don't have (direct) equivalents in C.

--
Eric Botcazou
* Re: [EXT] Re: GCC missing -flto optimizations? SPEC lbm benchmark
From: Steve Ellcey @ 2019-02-15 17:53 UTC
To: amker.cheng, majun4950646; +Cc: gcc

On Fri, 2019-02-15 at 17:48 +0800, Jun Ma wrote:
> ICC is doing much more than GCC in ipo, especially memory layout
> optimizations. See https://software.intel.com/en-us/node/522667.
> ICC is more aggressive in array transposition/structure splitting
> /field reordering. However, these optimizations have been removed
> from GCC long time ago.
> As for case lbm_r, IIRC a loop with memory access which stride is 20 is
> most time-consuming. ICC will optimize the array(maybe structure?)
> and vectorize the loop under ipo.
>
> Thanks
> Jun

Interesting. I tried using '-qno-opt-mem-layout-trans' on ICC
along with '-Ofast -ipo' and that had no effect on the speed. I also
tried '-no-vec' and that had no effect either. The only thing that
slowed down ICC was '-ip-no-inlining' or '-fno-inline'. I see that
'-Ofast -ipo' resulted in everything (except libc functions) getting
inlined into the main program when using ICC. GCC did not do that, but
if I forced it to by using the always_inline attribute, GCC could
inline everything into main the way ICC does. But that did not speed
up the GCC executable.

Steve Ellcey
sellcey@marvell.com
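[Editorial illustration] For readers unfamiliar with the attribute Steve mentions, here is a minimal sketch of forcing inlining with GCC's `always_inline` function attribute. The functions `collide` and `relax_all` are hypothetical stand-ins, not code from 519.lbm_r, and the sketch does not reproduce Steve's experiment.

```c
/* GCC (and Clang) extension: request inlining of this function at its
   call sites regardless of the inliner's usual cost heuristics. */
static inline __attribute__((always_inline))
double collide(double f, double omega, double feq)
{
    /* Relax the value f toward feq by the factor omega. */
    return f + omega * (feq - f);
}

/* Caller: with always_inline, the body of collide() is expanded
   directly inside this loop instead of being called. */
double relax_all(double *cells, int n, double omega, double feq)
{
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        cells[i] = collide(cells[i], omega, feq);
        total += cells[i];
    }
    return total;
}
```

As Steve's result suggests, inlining by itself does not guarantee a speedup; it mainly exposes the inlined code to later optimization passes.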
* Re: [EXT] Re: GCC missing -flto optimizations? SPEC lbm benchmark
From: Jun Ma @ 2019-02-16 14:36 UTC
To: Steve Ellcey; +Cc: amker.cheng, gcc

Steve Ellcey <sellcey@marvell.com> wrote on Sat, Feb 16, 2019 at 1:53 AM:
> Interesting. I tried using '-qno-opt-mem-layout-trans' on ICC
> along with '-Ofast -ipo' and that had no effect on the speed. I also
> tried '-no-vec' and that had no effect either. The only thing that
> slowed down ICC was '-ip-no-inlining' or '-fno-inline'. I see that
> '-Ofast -ipo' resulted in everything (except libc functions) getting
> inlined into the main program when using ICC. GCC did not do that, but
> if I forced it to by using the always_inline attribute, GCC could
> inline everything into main the way ICC does. But that did not speed
> up the GCC executable.
>
> Steve Ellcey
> sellcey@marvell.com

You can use '-qopt-report' to see which optimizations have been applied
by ICC.

Thanks
Jun
Thread overview: 13+ messages
2019-02-14 19:30 GCC missing -flto optimizations? SPEC lbm benchmark Steve Ellcey
2019-02-15  9:12 ` Bin.Cheng
2019-02-15  9:48   ` Jun Ma
2019-02-15 12:45     ` Hi-Angel
2019-02-15 13:12       ` Richard Biener
2019-02-15 13:15         ` Jakub Jelinek
2019-02-15 13:34           ` Ramana Radhakrishnan
2019-02-15 15:01       ` Ian Lance Taylor
2019-02-15 18:10         ` Joel Sherrill
2019-02-15 18:28           ` Richard Kenner
2019-02-15 20:52           ` Eric Botcazou
2019-02-15 17:53     ` [EXT] " Steve Ellcey
2019-02-16 14:36       ` Jun Ma