public inbox for fortran@gcc.gnu.org
* Advice with finding speed between O2 and O3
@ 2023-05-22 15:31 Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]
  2023-05-22 21:36 ` Thomas Koenig
  0 siblings, 1 reply; 5+ messages in thread
From: Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] @ 2023-05-22 15:31 UTC (permalink / raw)
  To: fortran


All,

Recently, one of the computing centers I run on updated their OS. And in that update, the model went from "working with GNU" to "crashing with GNU". No code change on our side, just OS.

Some experimenting later and I found that the code did run with debugging options, and it still ran with our "aggressive" options (much of which is due to Jerry DeLisle from here). Only our release flags failed. Surprising since the Aggressive options seem more likely to have issues as they are speed for speed's sake (different MPI layouts lead to different answers).

But one of the main differences is that the aggressive flags use -O2 and our release flags use -O3. So I tested our release flags with -O2 and boom, works again! Bad news: much slower.

Our release flags are (essentially):

  -O3 -march=haswell -mtune=generic -funroll-loops -g -fPIC -fopenmp

so we aren't doing anything fancy (portability at the cost of speed).

Staring at the man page I saw this:

                   gcc -c -Q -O3 --help=optimizers > /tmp/O3-opts
                   gcc -c -Q -O2 --help=optimizers > /tmp/O2-opts
                   diff /tmp/O2-opts /tmp/O3-opts | grep enabled

and when I did that I saw:

$ diff /tmp/O2-opts /tmp/O3-opts | grep enabled
>   -fgcse-after-reload               [enabled]
>   -fipa-cp-clone                    [enabled]
>   -floop-interchange                [enabled]
>   -floop-unroll-and-jam             [enabled]
>   -fpeel-loops                      [enabled]
>   -fpredictive-commoning            [enabled]
>   -fsplit-loops                     [enabled]
>   -fsplit-paths                     [enabled]
>   -ftree-loop-distribution          [enabled]
>   -ftree-partial-pre                [enabled]
>   -funroll-completely-grow-size     [enabled]
>   -funswitch-loops                  [enabled]
>   -fversion-loops-for-strides       [enabled]

Now, I'll be doing some experiments, but...that's a lot of tests and rebuilds. I was hoping maybe someone here can point me to "this flag is useful for Fortran" vs "this doesn't matter".

And maybe which one might be triggered by an OS update? ¯\_(ツ)_/¯

Thanks,
Matt
--
Matt Thompson, SSAI, Ld Scientific Programmer/Analyst
NASA GSFC,    Global Modeling and Assimilation Office
Code 610.1,  8800 Greenbelt Rd,  Greenbelt,  MD 20771
Phone: 301-614-6712                 Fax: 301-614-6246
http://science.gsfc.nasa.gov/sed/bio/matthew.thompson


* Re: Advice with finding speed between O2 and O3
  2023-05-22 15:31 Advice with finding speed between O2 and O3 Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]
@ 2023-05-22 21:36 ` Thomas Koenig
  2023-05-25 16:05   ` [EXTERNAL] " Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Koenig @ 2023-05-22 21:36 UTC (permalink / raw)
  To: Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC], fortran

Hi Matt,

> Recently, one of the computing centers I run on updated their
> OS. And in that update, the model went from "working with GNU"
> to "crashing with GNU". No code change on our side, just OS.

That sounds suspicious, and points to possible bugs in the
code.

Hmm... does the upgrade mean another compiler version?
That could break things, one way or another.  Which
version were you using on the old system, and which one are
you using now?

Does code compiled on the old system still work?

In your case, I would try out whatever debugging options you have
at your disposal, to find the culprit(s).  Use -fcheck=all.
Link with -static-libgfortran to make sure the right library
is used.  Use -fsanitize=undefined and -fsanitize=address. Run
your code under valgrind. Use another compiler (nagfor is excellent
at finding bugs with its catch-all debug option). Use -finit-real=NAN.
Use -Wall -Werror and look at the warnings. Use LTO to find mismatches
in code, or concatenate the whole source into one file and compile
it (newer versions of gfortran will then issue errors on suspect code).



> 
> Some experimenting later and I found that the code did run with
> debugging options, and it still ran with our "aggressive" options
> (much of which is due to Jerry DeLisle from here). Only our release
> flags failed. Surprising since the Aggressive options seem more
> likely to have issues as they are speed for speed's sake (different
> MPI layouts lead to different answers).

I've never used MPI, but what you describe also sounds suspicious;
maybe some sort of race condition in the code?

> But one of the main differences is that the aggressive flags use -O2
> and our release flags use -O3. So I tested our release flags with -O2
> and boom, works again! Bad news: much slower.
> 
> Our release flags are (essentially):
> 
>    -O3 -march=haswell -mtune=generic -funroll-loops -g -fPIC -fopenmp
> 
> so we aren't doing anything fancy (portability at the cost of speed).
> 
> Staring at the man page I saw this:
> 
>                     gcc -c -Q -O3 --help=optimizers > /tmp/O3-opts
>                     gcc -c -Q -O2 --help=optimizers > /tmp/O2-opts
>                     diff /tmp/O2-opts /tmp/O3-opts | grep enabled
> 
> and when I did that I saw:
> 
> $ diff /tmp/O2-opts /tmp/O3-opts | grep enabled
>>    -fgcse-after-reload               [enabled]
>>    -fipa-cp-clone                    [enabled]
>>    -floop-interchange                [enabled]
>>    -floop-unroll-and-jam             [enabled]
>>    -fpeel-loops                      [enabled]
>>    -fpredictive-commoning            [enabled]
>>    -fsplit-loops                     [enabled]
>>    -fsplit-paths                     [enabled]
>>    -ftree-loop-distribution          [enabled]
>>    -ftree-partial-pre                [enabled]
>>    -funroll-completely-grow-size     [enabled]
>>    -funswitch-loops                  [enabled]
>>    -fversion-loops-for-strides       [enabled]
> 
> Now, I'll be doing some experiments, but...that's a lot of tests
> and rebuilds. I was hoping maybe someone here can point me to
> "this flag is useful for Fortran"

I think -floop-interchange has little effect on Fortran,
there is a PR on it, IIRC.

If you want to test, a binary search could help.
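
A minimal sketch of such a binary search, written in Python purely for
illustration. It assumes that exactly one of the extra -O3 flags triggers
the crash on its own, which is not established in this thread. The `fails`
callback is hypothetical: in practice it would rebuild the model with -O2
plus the given flags, run it, and return True if the run crashes.

```python
from typing import Callable, Optional, Sequence

# The flags reported by `diff /tmp/O2-opts /tmp/O3-opts | grep enabled`.
O3_EXTRA_FLAGS = [
    "-fgcse-after-reload",
    "-fipa-cp-clone",
    "-floop-interchange",
    "-floop-unroll-and-jam",
    "-fpeel-loops",
    "-fpredictive-commoning",
    "-fsplit-loops",
    "-fsplit-paths",
    "-ftree-loop-distribution",
    "-ftree-partial-pre",
    "-funroll-completely-grow-size",
    "-funswitch-loops",
    "-fversion-loops-for-strides",
]


def bisect_flags(
    flags: Sequence[str],
    fails: Callable[[Sequence[str]], bool],
) -> Optional[str]:
    """Return the one flag that reproduces the failure, or None.

    Assumption: a single flag is responsible, independent of the others.
    If the crash needs a combination of flags, this search is not valid.
    """
    if not fails(flags):
        return None  # -O2 plus every candidate flag still works
    candidates = list(flags)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Keep whichever half still reproduces the crash.
        if fails(half):
            candidates = half
        else:
            candidates = candidates[len(half):]
    return candidates[0]
```

With 13 candidate flags this needs at most five rebuilds (one initial
check plus four halvings) instead of thirteen. A `None` result means the
flags alone do not reproduce the failure, so the cause lies elsewhere.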

> vs "this doesn't matter".
> 
> And maybe which one might be triggered by an OS update? ¯\_(ツ)_/¯

Was the compiler also upgraded, or was it just the kernel?
Like I wrote above, different compiler versions may well
cause problems, which is why there is a porting_to.html
file for every gcc release.  The newest one can be found
here: https://gcc.gnu.org/gcc-13/porting_to.html

Best regards

	Thomas


* Re: [EXTERNAL] Re: Advice with finding speed between O2 and O3
  2023-05-22 21:36 ` Thomas Koenig
@ 2023-05-25 16:05   ` Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]
  2023-05-25 17:01     ` Steve Kargl
  0 siblings, 1 reply; 5+ messages in thread
From: Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] @ 2023-05-25 16:05 UTC (permalink / raw)
  To: Thomas Koenig, fortran

Thomas,

Well, the code did not change. Period. Neither did the compiler. It was 12.3. (We can't use GCC 13 because it seems not to like something in our advanced Fortran code (lots of OO, submodules, string fun...)).

And I did a run with essentially all the GNU checks on (our Debug build mode) and it happily runs!

That said, I did some further tests and I am *really* confused. This fails:

-O3 -march=haswell -mtune=generic -funroll-loops -g

And this works:

-O2 -march=haswell -mtune=generic -funroll-loops -g

Now I just tried:

-O2     -fgcse-after-reload    -fipa-cp-clone    -floop-interchange    -floop-unroll-and-jam    -fpeel-loops    -fpredictive-commoning    -fsplit-loops    -fsplit-paths    -ftree-loop-distribution    -ftree-partial-pre    -funroll-completely-grow-size    -funswitch-loops    -fversion-loops-for-strides  -march=haswell -mtune=generic -funroll-loops -g

which as far as I can see from the gcc man page:

       -O3 Optimize yet more.  -O3 turns on all optimizations specified by -O2 and also turns on the following optimization flags:

           -fgcse-after-reload -fipa-cp-clone -floop-interchange -floop-unroll-and-jam -fpeel-loops -fpredictive-commoning -fsplit-loops -fsplit-paths -ftree-loop-distribution -ftree-partial-pre -funswitch-loops
           -fvect-cost-model=dynamic -fversion-loops-for-strides

means I am running essentially -O3.

But it works.

I'm...baffled. Is there something that *gfortran* enables with -O3 that isn't visible from the *gcc* man page?
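
One way to check the "essentially -O3" assumption is to compare the two
flag lists mechanically. The sketch below (Python, for illustration only)
uses nothing but the flags quoted in this message: the man page list and
the flags added on top of -O2 in the experiment. Whether any difference it
reports is related to the crash is not something this comparison can show.

```python
# Flags from the "-O3 ... also turns on" list in the gcc man page excerpt.
MAN_PAGE_O3 = (
    "-fgcse-after-reload -fipa-cp-clone -floop-interchange "
    "-floop-unroll-and-jam -fpeel-loops -fpredictive-commoning "
    "-fsplit-loops -fsplit-paths -ftree-loop-distribution "
    "-ftree-partial-pre -funswitch-loops -fvect-cost-model=dynamic "
    "-fversion-loops-for-strides"
).split()

# Flags added on top of -O2 in the experiment that works.
TRIED_WITH_O2 = (
    "-fgcse-after-reload -fipa-cp-clone -floop-interchange "
    "-floop-unroll-and-jam -fpeel-loops -fpredictive-commoning "
    "-fsplit-loops -fsplit-paths -ftree-loop-distribution "
    "-ftree-partial-pre -funroll-completely-grow-size -funswitch-loops "
    "-fversion-loops-for-strides"
).split()

# Set differences in both directions show where the lists disagree.
missing = sorted(set(MAN_PAGE_O3) - set(TRIED_WITH_O2))
extra = sorted(set(TRIED_WITH_O2) - set(MAN_PAGE_O3))

print("in man page list but not tried:", missing)
print("tried but not in man page list:", extra)
```

The `grep enabled` filter only keeps on/off options, so an option that
takes a value, such as -fvect-cost-model, cannot show up in that diff.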

Matt
--
Matt Thompson, SSAI, Ld Scientific Prog/Analyst/Super
NASA GSFC, Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
Phone: 301-614-6712 Fax: 301-614-6246
http://science.gsfc.nasa.gov/sed/bio/matthew.thompson




* Re: [EXTERNAL] Re: Advice with finding speed between O2 and O3
  2023-05-25 16:05   ` [EXTERNAL] " Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]
@ 2023-05-25 17:01     ` Steve Kargl
  2023-05-25 18:51       ` Harald Anlauf
  0 siblings, 1 reply; 5+ messages in thread
From: Steve Kargl @ 2023-05-25 17:01 UTC (permalink / raw)
  To: Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] via Fortran
  Cc: Thomas Koenig

On Thu, May 25, 2023 at 04:05:11PM +0000, Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] via Fortran wrote:
> Thomas,
> 
> Well, the code did not change. Period. Neither did the compiler. It was 12.3. (We can't use GCC 13 because it seems not to like something in our advanced Fortran code (lots of OO, submodules, string fun...)).
> 
> And I did a run with essentially all the GNU checks on (our Debug build mode) and it happily runs!
> 
> That said, I did some further tests and I am *really* confused. This fails:
> 
> -O3 -march=haswell -mtune=generic -funroll-loops -g
> 
> And this works:
> 
> -O2 -march=haswell -mtune=generic -funroll-loops -g
> 
> Now I just tried:
> 
> -O2     -fgcse-after-reload    -fipa-cp-clone    -floop-interchange    -floop-unroll-and-jam    -fpeel-loops    -fpredictive-commoning    -fsplit-loops    -fsplit-paths    -ftree-loop-distribution    -ftree-partial-pre    -funroll-completely-grow-size    -funswitch-loops    -fversion-loops-for-strides  -march=haswell -mtune=generic -funroll-loops -g
> 
> which as far as I can see from the gcc man page:
> 
>        -O3 Optimize yet more.  -O3 turns on all optimizations specified by -O2 and also turns on the following optimization flags:
> 
>            -fgcse-after-reload -fipa-cp-clone -floop-interchange -floop-unroll-and-jam -fpeel-loops -fpredictive-commoning -fsplit-loops -fsplit-paths -ftree-loop-distribution -ftree-partial-pre -funswitch-loops
>            -fvect-cost-model=dynamic -fversion-loops-for-strides
> 
> means I am running essentially -O3.
> 
> But it works.
> 
> I'm...baffled. Is there something that *gfortran* enables with -O3 that isn't visible from the *gcc* man page?
> 

gcc/gcc/opts.cc also shows some fiddling with parameters.

  /* -O3 parameters.  */
  { OPT_LEVELS_3_PLUS, OPT__param_max_inline_insns_auto_, NULL, 30 },
  { OPT_LEVELS_3_PLUS, OPT__param_early_inlining_insns_, NULL, 14 },
  { OPT_LEVELS_3_PLUS, OPT__param_inline_heuristics_hint_percent_, NULL, 600 },
  { OPT_LEVELS_3_PLUS, OPT__param_inline_min_speedup_, NULL, 15 },
  { OPT_LEVELS_3_PLUS, OPT__param_max_inline_insns_single_, NULL, 200 },

AFAICT, gfortran does not add or change anything with -O3.
Out of curiosity, does it compile and run with -O3 if you
remove one or both of '-march=haswell -mtune=generic'?

One other possibility is an issue with signed integer overflow,
but I don't remember if the change that causes the issue has
reached 12.x.  Does the code run if you add -fwrapv to your
options list?
-- 
Steve


* Re: [EXTERNAL] Re: Advice with finding speed between O2 and O3
  2023-05-25 17:01     ` Steve Kargl
@ 2023-05-25 18:51       ` Harald Anlauf
  0 siblings, 0 replies; 5+ messages in thread
From: Harald Anlauf @ 2023-05-25 18:51 UTC (permalink / raw)
  To: sgk, Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] via Fortran
  Cc: Thomas Koenig

On 25.05.23 at 19:01, Steve Kargl wrote:
> On Thu, May 25, 2023 at 04:05:11PM +0000, Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] via Fortran wrote:
>> Thomas,
>>
>> Well, the code did not change. Period. Neither did the compiler. It was 12.3. (We can't use GCC 13 because it seems not to like something in our advanced Fortran code (lots of OO, submodules, string fun...)).
>>
>> And I did a run with essentially all the GNU checks on (our Debug build mode) and it happily runs!
>>
>> That said, I did some further tests and I am *really* confused. This fails:
>>
>> -O3 -march=haswell -mtune=generic -funroll-loops -g
>>
>> And this works:
>>
>> -O2 -march=haswell -mtune=generic -funroll-loops -g
>>
>> Now I just tried:
>>
>> -O2     -fgcse-after-reload    -fipa-cp-clone    -floop-interchange    -floop-unroll-and-jam    -fpeel-loops    -fpredictive-commoning    -fsplit-loops    -fsplit-paths    -ftree-loop-distribution    -ftree-partial-pre    -funroll-completely-grow-size    -funswitch-loops    -fversion-loops-for-strides  -march=haswell -mtune=generic -funroll-loops -g
>>
>> which as far as I can see from the gcc man page:
>>
>>         -O3 Optimize yet more.  -O3 turns on all optimizations specified by -O2 and also turns on the following optimization flags:
>>
>>             -fgcse-after-reload -fipa-cp-clone -floop-interchange -floop-unroll-and-jam -fpeel-loops -fpredictive-commoning -fsplit-loops -fsplit-paths -ftree-loop-distribution -ftree-partial-pre -funswitch-loops
>>             -fvect-cost-model=dynamic -fversion-loops-for-strides
>>
>> means I am running essentially -O3.
>>
>> But it works.
>>
>> I'm...baffled. Is there something that *gfortran* enables with -O3 that isn't visible from the *gcc* man page?
>>

When I look at the complete *difference* of
   gfortran-12 -c -Q --help=optimizers
between -O2 and -O3, I see other differing parameters:

-  -fvect-cost-model=[unlimited|dynamic|cheap|very-cheap]       very-cheap

+  -fvect-cost-model=[unlimited|dynamic|cheap|very-cheap]       dynamic

Could these be relevant?
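
For anyone who wants to see every differing line rather than only the
[enabled] ones, here is a small sketch in Python. It assumes the two
`-Q --help=optimizers` outputs have been captured as text. The sample
strings are hand-written in the format quoted in this thread, not real
captured output, and the function names are made up for illustration.

```python
def parse_optimizers(text):
    """Map each option name to the state or value printed after it."""
    states = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts or not parts[0].startswith("-"):
            continue  # skip the header line and blank lines
        # "-fvect-cost-model=[...]" becomes the key "-fvect-cost-model".
        name = parts[0].split("=", 1)[0]
        states[name] = " ".join(parts[1:])
    return states


def changed_options(o2_text, o3_text):
    """Return {option: (state at -O2, state at -O3)} for differing options."""
    o2 = parse_optimizers(o2_text)
    o3 = parse_optimizers(o3_text)
    return {
        name: (o2.get(name), o3.get(name))
        for name in sorted(set(o2) | set(o3))
        if o2.get(name) != o3.get(name)
    }


# Hand-written illustrative samples (assumption: not real gfortran output).
O2_SAMPLE = """The following options control optimizations:
  -finline-functions                [enabled]
  -fpeel-loops                      [disabled]
  -fvect-cost-model=[unlimited|dynamic|cheap|very-cheap]       very-cheap
"""
O3_SAMPLE = """The following options control optimizations:
  -finline-functions                [enabled]
  -fpeel-loops                      [enabled]
  -fvect-cost-model=[unlimited|dynamic|cheap|very-cheap]       dynamic
"""

for name, (at_o2, at_o3) in changed_options(O2_SAMPLE, O3_SAMPLE).items():
    print(f"{name}: {at_o2} -> {at_o3}")
```

On real output, the two strings would be read from files such as the
/tmp/O2-opts and /tmp/O3-opts produced earlier in the thread.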

Harald

