Date: Thu, 25 May 2023 10:01:06 -0700
From: Steve Kargl
To: "Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] via Fortran"
Cc: Thomas Koenig
Subject: Re: [EXTERNAL] Re: Advice with finding speed between O2 and O3
Reply-To: sgk@troutmask.apl.washington.edu

On Thu, May 25, 2023 at 04:05:11PM +0000, Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] via Fortran wrote:
> Thomas,
>
> Well, the code did not change.
> Period. Neither did the compiler. It was 12.3. (We can't use GCC 13 because it seems not to like something in our advanced Fortran code (lots of OO, submodules, string fun...)).
>
> And I did a run with essentially all the GNU checks on (our Debug build mode) and it happily runs!
>
> That said, I did some further tests and I am *really* confused. This fails:
>
>   -O3 -march=haswell -mtune=generic -funroll-loops -g
>
> And this works:
>
>   -O2 -march=haswell -mtune=generic -funroll-loops -g
>
> Now I just tried:
>
>   -O2 -fgcse-after-reload -fipa-cp-clone -floop-interchange -floop-unroll-and-jam -fpeel-loops -fpredictive-commoning -fsplit-loops -fsplit-paths -ftree-loop-distribution -ftree-partial-pre -funroll-completely-grow-size -funswitch-loops -fversion-loops-for-strides -march=haswell -mtune=generic -funroll-loops -g
>
> which as far as I can see from the gcc man page:
>
>   -O3 Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the following optimization flags:
>
>   -fgcse-after-reload -fipa-cp-clone -floop-interchange -floop-unroll-and-jam -fpeel-loops -fpredictive-commoning -fsplit-loops -fsplit-paths -ftree-loop-distribution -ftree-partial-pre -funswitch-loops
>   -fvect-cost-model=dynamic -fversion-loops-for-strides
>
> means I am running essentially -O3.
>
> But it works.
>
> I'm...baffled. Is there something that *gfortran* enables with -O3 that isn't visible from the *gcc* man page?
>

gcc/gcc/opts.cc also shows some fiddling with parameters.

    /* -O3 parameters.  */
    { OPT_LEVELS_3_PLUS, OPT__param_max_inline_insns_auto_, NULL, 30 },
    { OPT_LEVELS_3_PLUS, OPT__param_early_inlining_insns_, NULL, 14 },
    { OPT_LEVELS_3_PLUS, OPT__param_inline_heuristics_hint_percent_, NULL, 600 },
    { OPT_LEVELS_3_PLUS, OPT__param_inline_min_speedup_, NULL, 15 },
    { OPT_LEVELS_3_PLUS, OPT__param_max_inline_insns_single_, NULL, 200 },

AFAICT, gfortran does not add or change anything with -O3.
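One way to check whether the hand-built flag list really matches -O3 is to ask the driver what it enables under each set of flags and diff the answers. The sketch below is only an illustration under some assumptions: it assumes `gfortran` is on the PATH and accepts `-Q --help=optimizers` and `-Q --help=params` the way the gcc driver does, and that each printed setting starts with the option name followed by its state. The helper names (`dump_settings`, `parse_settings`, `diff_settings`) are made up for this example. The quoted hand-built list has -funroll-completely-grow-size but not -fvect-cost-model=dynamic, so some difference should be expected in addition to the parameters set in opts.cc.

```python
import subprocess

COMMON = ["-march=haswell", "-mtune=generic", "-funroll-loops", "-g"]

# The hand-built "-O2 plus extras" list quoted in the message above.
HAND_BUILT = [
    "-O2", "-fgcse-after-reload", "-fipa-cp-clone", "-floop-interchange",
    "-floop-unroll-and-jam", "-fpeel-loops", "-fpredictive-commoning",
    "-fsplit-loops", "-fsplit-paths", "-ftree-loop-distribution",
    "-ftree-partial-pre", "-funroll-completely-grow-size",
    "-funswitch-loops", "-fversion-loops-for-strides",
]


def dump_settings(flags, compiler="gfortran", kind="optimizers"):
    """Return the text printed by `<compiler> -Q <flags> --help=<kind>`.

    The flags go before --help= because the report reflects the options
    that precede it on the command line.
    """
    cmd = [compiler, "-Q", *flags, f"--help={kind}"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_settings(text):
    """Map each listed option to whatever state is printed after it."""
    settings = {}
    for line in text.splitlines():
        parts = line.split()
        # Option lines start with '-'; headings and blank lines do not.
        if parts and parts[0].startswith("-"):
            settings[parts[0]] = " ".join(parts[1:])
    return settings


def diff_settings(a, b):
    """Return {option: (state_in_a, state_in_b)} for options that differ."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    for kind in ("optimizers", "params"):
        hand = parse_settings(dump_settings(HAND_BUILT + COMMON, kind=kind))
        o3 = parse_settings(dump_settings(["-O3"] + COMMON, kind=kind))
        for name, (h, o) in diff_settings(hand, o3).items():
            print(f"{kind}: {name}: hand-built={h!r} -O3={o!r}")
```

Anything this prints is a setting that differs between the two builds, which gives a short list of candidates to toggle one at a time.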
Out of curiosity, does it compile and run with -O3 if you remove one or both of '-march=haswell -mtune=generic'?

One other possibility is an issue with signed integer overflow, but I don't remember if the change that causes the issue has reached 12.x. Does the code run if you add -fwrapv to your options list?

-- 
Steve
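The experiments suggested above amount to a small matrix of builds: -O3 with each subset of the two architecture flags, each with and without -fwrapv. A minimal sketch that enumerates those option lists follows. The `variants` function name and the way the lists are assembled are assumptions for illustration only, and feeding each line into the real build system is left out.

```python
from itertools import product

ARCH_FLAGS = ["-march=haswell", "-mtune=generic"]
TAIL = ["-funroll-loops", "-g"]


def variants(level="-O3", arch_flags=ARCH_FLAGS, tail=TAIL, extra="-fwrapv"):
    """Yield one flag list per experiment.

    Covers every subset of arch_flags (both, either one, neither),
    each with and without the extra flag.
    """
    for keep in product([True, False], repeat=len(arch_flags)):
        arch = [flag for flag, kept in zip(arch_flags, keep) if kept]
        for with_extra in (False, True):
            yield [level] + arch + tail + ([extra] if with_extra else [])


if __name__ == "__main__":
    for flags in variants():
        print(" ".join(flags))
```

Recording which of the eight lines produce a working executable would show whether the failure follows the architecture flags, the overflow behaviour, or neither.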