From: "anlauf at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug fortran/102510] Function call has unnecessary stride check
Date: Tue, 28 Sep 2021 19:03:22 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102510

--- Comment #3 from anlauf at gcc dot gnu.org ---
It helps to look at the (Fortran) context. As written, the subroutine version
is declared with explicit-size, contiguous arrays. If the caller has a
non-contiguous (strided) result array, it needs to pack/unpack.

For the function version - as is - we might need a temporary to handle
different situations.

However, if you offer the compiler the chance to inline the calls, and let
optimization inline the packing as well, you may get better code than you
think.
Compile this example with -O3 -mavx:

module p
  use iso_fortran_env, only: r32 => real32
  real(r32), dimension(8)  :: a,b
  real(r32), dimension(8)  :: c1, c2
  real(r32), dimension(16) :: d1, d2
contains
  subroutine add2vecs1(a,b,c)
    real(r32), dimension(8), intent(in)  :: a,b
    real(r32), dimension(8), intent(out) :: c
    c = a + b
  end subroutine add2vecs1
  function add2vecs2(a,b)
    real(r32), dimension(8), intent(in) :: a,b
    real(r32), dimension(8) :: add2vecs2
    add2vecs2 = a + b
  end function add2vecs2
  !-
  subroutine s1 ()
    call add2vecs1 (a, b, c1)
  end subroutine s1
  !-
  subroutine s2 ()
    c2 = add2vecs2 (a, b)
  end subroutine s2
  !-
  subroutine s3 ()
    call add2vecs1 (a, b, d1(1:16:2))
  end subroutine s3
  !-
  subroutine s4 ()
    d2(1:16:2) = add2vecs2 (a, b)
  end subroutine s4
end

You'll find that s1 and s2 compile to the same code, and so do the strided
versions s3 and s4 (at least this is my reading of the assembly, but correct
me if I am wrong).

Is there really more to expect?
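
To make the pack/unpack step concrete: for a strided actual argument such as
d1(1:16:2) passed to an explicit-size dummy, the caller conceptually copies
the section into a contiguous temporary, makes the call, and copies the result
back. The following is only a hand-written sketch of that idea, not what
gfortran literally emits; the names tmp, d_ref and pack_unpack_sketch are made
up for illustration.

```fortran
program pack_unpack_sketch
  use iso_fortran_env, only: r32 => real32
  implicit none
  real(r32) :: a(8), b(8)
  real(r32) :: d1(16), d_ref(16)
  real(r32) :: tmp(8)        ! hypothetical contiguous temporary
  integer   :: i

  a = [(real(i, r32), i = 1, 8)]
  b = 10.0_r32
  d1    = 0.0_r32
  d_ref = 0.0_r32

  ! Reference: pass the strided section directly and let the
  ! compiler arrange whatever packing it needs.
  call add2vecs1 (a, b, d_ref(1:16:2))

  ! Hand-written equivalent: pack, call, unpack.
  tmp = d1(1:16:2)           ! pack (copy-in); may be skipped for intent(out)
  call add2vecs1 (a, b, tmp)
  d1(1:16:2) = tmp           ! unpack (copy-out)

  ! Both variants should leave the same values in the same places.
  print *, all (d1 == d_ref)
contains
  subroutine add2vecs1 (a, b, c)
    real(r32), dimension(8), intent(in)  :: a, b
    real(r32), dimension(8), intent(out) :: c
    c = a + b
  end subroutine add2vecs1
end program pack_unpack_sketch
```

This should print T. Once add2vecs1 is inlined, the temporary and both copies
are plain array assignments that the optimizer can fold into the strided
stores, which is the effect described above for s3 and s4.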