From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id E7AFE3858408; Tue, 28 Sep 2021 13:55:47 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org E7AFE3858408
From: "dwwork at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug fortran/102510] Function call has unnecessary stride check
Date: Tue, 28 Sep 2021 13:55:47 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: fortran
X-Bugzilla-Version: 11.2.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: dwwork at gmail dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-102510-4-aa3UGBvg4N@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-102510-4@http.gcc.gnu.org/bugzilla/>
References: <bug-102510-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
X-List-Received-Date: Tue, 28 Sep 2021 13:55:48 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102510
--- Comment #2 from Dalon Work <dwwork at gmail dot com> ---
Thanks for the information. Based on your comments, I've created 2 new
subroutines that call the "bad" function. The first places the result in a
contiguous array, while the second places the result in a strided array.
(https://godbolt.org/z/bTnWr3bMn)

The first:

subroutine add2vecs3(a,b,c)
    real(r32), dimension(8), intent(in) :: a,b
    real(r32), dimension(8), intent(out) :: c
    c = add2vecs2(a,b)
end subroutine

With "-O3 -mavx", this subroutine becomes fully vectorized:

__blah_MOD_add2vecs3:
        vmovups ymm0, YMMWORD PTR [rdi]
        vaddps  ymm0, ymm0, YMMWORD PTR [rsi]
        vmovups YMMWORD PTR [rdx], ymm0
        vzeroupper
        ret

The second:

subroutine add2vecs4(a,b,c)
    real(r32), dimension(8), intent(in) :: a,b
    real(r32), dimension(16), intent(out) :: c
    c(1:16:2) = add2vecs2(a,b)
end subroutine

In this case the addition is still a single vaddps, but the stores are no longer vectorized; each element is written out individually:

__blah_MOD_add2vecs4:
        vmovups ymm0, YMMWORD PTR [rsi]
        vaddps  ymm0, ymm0, YMMWORD PTR [rdi]
        vmovss  DWORD PTR [rdx], xmm0
        vextractps      DWORD PTR [rdx+8], xmm0, 1
        vextractps      DWORD PTR [rdx+16], xmm0, 2
        vextractps      DWORD PTR [rdx+24], xmm0, 3
        vextractf128    xmm0, ymm0, 0x1
        vmovss  DWORD PTR [rdx+32], xmm0
        vextractps      DWORD PTR [rdx+40], xmm0, 1
        vextractps      DWORD PTR [rdx+48], xmm0, 2
        vextractps      DWORD PTR [rdx+56], xmm0, 3
        vzeroupper
        ret

From this, it seems you are correct. The result gets passed in as a descriptor
to a block of memory, and from that the function figures out the best way to
fill in the data. Perhaps other compilers handle this differently, but there we
have it.
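To make that mechanism concrete, here is a minimal sketch that writes the hidden result argument out explicitly as an assumed-shape intent(out) dummy. The module and subroutine names (descriptor_sketch, add2vecs_into) are hypothetical, and this is only an analogy for how gfortran treats an array-valued function result, not the code it actually generates. Because c is assumed-shape, gfortran passes it by descriptor (base address, bounds and stride), so the callee only learns the stride at run time and has to handle a non-unit one.

```fortran
module descriptor_sketch
    use iso_fortran_env, only: r32 => real32
    implicit none
contains
    ! Hypothetical explicit analogue of the hidden result argument.
    ! c is assumed-shape, so the caller hands over a descriptor that
    ! carries the base address and stride of whatever array or
    ! section it passes, e.g. c(1:16:2).
    subroutine add2vecs_into(a, b, c)
        real(r32), dimension(:), intent(in)  :: a, b
        real(r32), dimension(:), intent(out) :: c
        ! The stores go through the descriptor's stride, which is
        ! not known at compile time.
        c = a + b
    end subroutine
end module
```

Called as `call add2vecs_into(a, b, c(1:16:2))` it corresponds to add2vecs4; called with a contiguous 8-element c it corresponds to add2vecs3.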

Changing this behavior might be difficult or impossible, as it would be an
ABI change, would it not? It's arguable whether it's even worth changing.
Perhaps other compilers do it differently. I guess what I assumed is that the
compiler would have a contiguous block of memory available for the return
result, and that any necessary striding would happen outside the function.
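The arrangement assumed above, where the function fills a contiguous block and the caller does the striding, can be written out by hand with an explicit temporary. This is a sketch under assumptions: add8 is a hypothetical stand-in for add2vecs2, whose definition is not shown in this message, and the module and subroutine names are likewise made up. It only shows the structure; whether gfortran turns this form into better code than add2vecs4 would still have to be checked on Compiler Explorer.

```fortran
module contiguous_sketch
    use iso_fortran_env, only: r32 => real32
    implicit none
contains
    ! Hypothetical stand-in for add2vecs2: a fixed-size,
    ! array-valued function.
    function add8(a, b) result(r)
        real(r32), dimension(8), intent(in) :: a, b
        real(r32), dimension(8) :: r
        r = a + b
    end function

    subroutine add2vecs_strided(a, b, c)
        real(r32), dimension(8), intent(in)   :: a, b
        real(r32), dimension(16), intent(out) :: c
        real(r32), dimension(8) :: tmp
        ! Step 1: the function result lands in a contiguous local.
        tmp = add8(a, b)
        ! Step 2: the caller, not the function, applies the stride.
        c(1:16:2) = tmp
    end subroutine
end module
```

Only c(1:16:2) is defined on return; because c is intent(out), the even-indexed elements are left undefined, just as in add2vecs4.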