From: "tkoenig at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
Date: Tue, 12 May 2020 09:30:25 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 10.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 10.2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #21 from Thomas Koenig ---
(In reply to Richard Biener from comment #19)
> Is libgfortran built with -O2 -funroll-loops or with -O3 (IIRC -O3?).

Just plain -O2 (for size reasons), with matmul as an exception
where we add -funroll-loops and other options.

> so what's the speciality on POWER?  Code growth should trigger with -O3 only.
> Given we have only a guessed profile (and that does not detect the inner
> loop as completely cold) we're allowing growth then.  GCC has no idea the
> outer loop iterates more than the inner.

As a test, I changed the condition of the loop in question to

@@ -88,7 +88,7 @@ internal_pack_r4 (gfc_array_r4 * source)
       count[0]++;
       /* Advance to the next source element.  */
       index_type n = 0;
-      while (count[n] == extent[n])
+      while (unlikely (count[n] == extent[n]))
         {
           /* When we get to the end of a dimension, reset it and increment
              the next dimension.  */

which then results in

  while (__builtin_expect(!!(count[n] == extent[n]), 0))

and the loop is still completely peeled on POWER at -O2,
which I do not understand.

> Thomas - where did you measure the slowness?  For which dimensionality?

Actually, I didn't; I just assumed that it would be bad for speed
as well.  The tests that I ran then didn't show any such slowdown, so
I guess the POWER9 branch predictor is doing a good job here.

However, this kind of loop is the standard way of accessing
multi-dimensional arrays of unknown dimension in libgfortran.
It occurs in around 400 files there, sometimes more than once, so
the size issue is significant.

I haven't checked if there is an actual degradation for other
use cases.

> I'm quite sure the loop structure will be sub-optimal for certain
> input shapes... (stride0 == 1 could even use memcpy for the inner dimension).

Yes.  I plan to revisit this when looking at PR 93114, where I have
to touch that part of the code anyway.
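
For readers who have not seen it, the loop pattern discussed above can
be sketched as a standalone function.  This is a minimal sketch and not
the libgfortran source: pack_strided, MAX_RANK and the local definitions
of index_type and unlikely are stand-ins assumed for illustration, the
element type is fixed to float, and the __builtin_expect form assumes a
GCC-compatible compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; libgfortran has its own definitions.  */
typedef ptrdiff_t index_type;
#define MAX_RANK 15

#if defined (__GNUC__)
#define unlikely(x) __builtin_expect (!!(x), 0)
#else
#define unlikely(x) (x)
#endif

/* Copy a strided array of rank RANK into the contiguous buffer DEST.
   extent[n] is the number of elements along dimension n; stride[n] is
   the distance, in elements, between neighbours along dimension n.
   Returns the number of elements copied.  */
static index_type
pack_strided (float *dest, const float *src, int rank,
              const index_type *extent, const index_type *stride)
{
  index_type count[MAX_RANK] = { 0 };
  index_type copied = 0;

  assert (rank >= 1 && rank <= MAX_RANK);

  /* An empty dimension means there is nothing to copy.  */
  for (int n = 0; n < rank; n++)
    if (extent[n] <= 0)
      return 0;

  while (src)
    {
      /* Copy one element and step along dimension 0.  */
      *dest++ = *src;
      copied++;
      src += stride[0];
      count[0]++;

      /* Carry into higher dimensions, like an odometer.  This inner
         loop body runs once per completed row, hence the hint.  */
      index_type n = 0;
      while (unlikely (count[n] == extent[n]))
        {
          /* Rewind dimension n and advance dimension n + 1.  */
          count[n] = 0;
          src -= stride[n] * extent[n];
          n++;
          if (n == rank)
            {
              /* Every dimension wrapped around: we are done.  */
              src = NULL;
              break;
            }
          count[n]++;
          src += stride[n];
        }
    }
  return copied;
}
```

Called with extent = {2, 2} and stride = {1, 3} on a 12-element buffer,
this packs the 2x2 corner of a 3x4 column-major array.  The inner while
loop is the one whose complete peeling is at issue in this PR.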