public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug fortran/102510] New: Function call has unnecessary aliasing check
@ 2021-09-28  2:17 dwwork at gmail dot com
  2021-09-28  8:54 ` [Bug fortran/102510] Function call has unnecessary stride check rguenth at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: dwwork at gmail dot com @ 2021-09-28  2:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102510

            Bug ID: 102510
           Summary: Function call has unnecessary aliasing check
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dwwork at gmail dot com
  Target Milestone: ---

The following 2 functions semantically do the same thing, they add two fixed
size arrays and store them into a third. When compiled with "-O3 -mavx" for
x86_64, I expect to see a single avx instruction. The first version does this
correctly, while the second has an aliasing check with a vectorized branch and
a scalar branch (I think). The second version is incorrect, and should produce
similar vectorized assembly to the first, as fortran does not allow function
arguments to alias. I could be wrong of course, but that is my understanding.

subroutine add2vecs1(a,b,c)
    use iso_fortran_env, only: r32 => real32
    real(r32), dimension(8), intent(in) :: a,b
    real(r32), dimension(8), intent(out) :: c
    c = a + b
end subroutine

Output Assembly (from godbolt.org, https://godbolt.org/z/aedEe7rGM):

add2vecs1_:
        vmovups ymm0, YMMWORD PTR [rdi]
        vaddps  ymm0, ymm0, YMMWORD PTR [rsi]
        vmovups YMMWORD PTR [rdx], ymm0
        vzeroupper
        ret

function add2vecs2(a,b)
    use iso_fortran_env, only: r32 => real32
    real(r32), dimension(8), intent(in) :: a,b
    real(r32), dimension(8) :: add2vecs2
    add2vecs2 = a + b
end function

Output Assembly:

add2vecs2_:
        mov     rax, QWORD PTR [rdi+40]
        mov     rcx, QWORD PTR [rdi]
        test    rax, rax
        je      .L5
        cmp     rax, 1
        jne     .L11
.L5:
        vmovups ymm0, YMMWORD PTR [rdx]
        vaddps  ymm0, ymm0, YMMWORD PTR [rsi]
        vmovups YMMWORD PTR [rcx], ymm0
        vzeroupper
        ret
.L11:
        vmovups xmm1, XMMWORD PTR [rdx]
        vaddps  xmm0, xmm1, XMMWORD PTR [rsi]
        lea     rdi, [rcx+rax*8]
        mov     r8, rax
        sal     r8, 4
        vmovss  DWORD PTR [rcx], xmm0
        vextractps      DWORD PTR [rcx+rax*4], xmm0, 1
        vextractps      DWORD PTR [rcx+rax*8], xmm0, 2
        vextractps      DWORD PTR [rdi+rax*4], xmm0, 3
        vmovups xmm0, XMMWORD PTR [rdx+16]
        vaddps  xmm0, xmm0, XMMWORD PTR [rsi+16]
        lea     rdi, [rcx+r8]
        lea     rdx, [rdi+rax*8]
        vmovss  DWORD PTR [rcx+r8], xmm0
        vextractps      DWORD PTR [rdi+rax*4], xmm0, 1
        vextractps      DWORD PTR [rdi+rax*8], xmm0, 2
        vextractps      DWORD PTR [rdx+rax*4], xmm0, 3
        ret

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug fortran/102510] Function call has unnecessary stride check
  2021-09-28  2:17 [Bug fortran/102510] New: Function call has unnecessary aliasing check dwwork at gmail dot com
@ 2021-09-28  8:54 ` rguenth at gcc dot gnu.org
  2021-09-28 13:55 ` dwwork at gmail dot com
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-09-28  8:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102510

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-09-28
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
      Known to fail|                            |12.0
           Keywords|                            |missed-optimization
            Summary|Function call has           |Function call has
                   |unnecessary aliasing check  |unnecessary stride check

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
It's not an alias but we seem to not know that the array stride of the result
is one and thus we version on that.  I'm not sure whether fortran allows

  res(/1, 16, 2/) = add2vecs2(a, b)

or so or why  we know the inputs are not strided.  The result comes in as
array descriptor while the other arguments are pointers to a flat [8] array.
Adding 'contiguous' yields

Error: 'add2vecs2' at (1) has the CONTIGUOUS attribute but is not an array
pointer or an assumed-shape or assumed-rank array

so maybe it's really just a missed optimization in the Frontend?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug fortran/102510] Function call has unnecessary stride check
  2021-09-28  2:17 [Bug fortran/102510] New: Function call has unnecessary aliasing check dwwork at gmail dot com
  2021-09-28  8:54 ` [Bug fortran/102510] Function call has unnecessary stride check rguenth at gcc dot gnu.org
@ 2021-09-28 13:55 ` dwwork at gmail dot com
  2021-09-28 19:03 ` anlauf at gcc dot gnu.org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: dwwork at gmail dot com @ 2021-09-28 13:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102510

--- Comment #2 from Dalon Work <dwwork at gmail dot com> ---
Thanks for the information. Based on your comments, I've created 2 new
subroutines that call the "bad" function. The first places the result in a
contiguous array, while the second places the result in a strided array.
(https://godbolt.org/z/bTnWr3bMn)

The first:

subroutine add2vecs3(a,b,c)
    real(r32), dimension(8), intent(in) :: a,b
    real(r32), dimension(8), intent(out) :: c
    c = add2vecs2(a,b)
end subroutine

With "-O3 -mavx", this subroutine becomes fully vectorized:

__blah_MOD_add2vecs3:
        vmovups ymm0, YMMWORD PTR [rdi]
        vaddps  ymm0, ymm0, YMMWORD PTR [rsi]
        vmovups YMMWORD PTR [rdx], ymm0
        vzeroupper
        ret

The second:

subroutine add2vecs4(a,b,c)
    real(r32), dimension(8), intent(in) :: a,b
    real(r32), dimension(16), intent(out) :: c
    c(1:16:2) = add2vecs2(a,b)
end subroutine

In this case we get the non-vectorized version:

__blah_MOD_add2vecs4:
        vmovups ymm0, YMMWORD PTR [rsi]
        vaddps  ymm0, ymm0, YMMWORD PTR [rdi]
        vmovss  DWORD PTR [rdx], xmm0
        vextractps      DWORD PTR [rdx+8], xmm0, 1
        vextractps      DWORD PTR [rdx+16], xmm0, 2
        vextractps      DWORD PTR [rdx+24], xmm0, 3
        vextractf128    xmm0, ymm0, 0x1
        vmovss  DWORD PTR [rdx+32], xmm0
        vextractps      DWORD PTR [rdx+40], xmm0, 1
        vextractps      DWORD PTR [rdx+48], xmm0, 2
        vextractps      DWORD PTR [rdx+56], xmm0, 3
        vzeroupper
        ret

>From this, it seems you are correct. The result gets passed in as a descriptor
to a block of memory and from that the function figures out the best way to
fill in the data. Perhaps other compilers handle this differently, but there we
have it.

Changing this behavior might be difficult or impossible, as this would be an
ABI change, would it not? It's arguable whether it's even worth changing.
Perhaps other compilers do it differently. I guess what I assumed is that the
compiler would have a contigous block of memory available for the return
result. Any necessary striding would happen external to the function.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug fortran/102510] Function call has unnecessary stride check
  2021-09-28  2:17 [Bug fortran/102510] New: Function call has unnecessary aliasing check dwwork at gmail dot com
  2021-09-28  8:54 ` [Bug fortran/102510] Function call has unnecessary stride check rguenth at gcc dot gnu.org
  2021-09-28 13:55 ` dwwork at gmail dot com
@ 2021-09-28 19:03 ` anlauf at gcc dot gnu.org
  2021-09-28 19:26 ` dwwork at gmail dot com
  2021-09-29 21:01 ` anlauf at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: anlauf at gcc dot gnu.org @ 2021-09-28 19:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102510

--- Comment #3 from anlauf at gcc dot gnu.org ---
It helps to look at the (Fortran) context.  As written, the subroutine version
is declared with explicit size contiguous arrays.  If the caller has a
non-contiguous (strided) result array, it needs to pack/unpack.  For the
function version - as is - we might need a temporary to handle different
situations.

However, if you offer the compiler the chance to inline the calls, and using
optimization to inline the packing, you may get better code than you think.

Compile this example with -O3 -mavx:

module p
  use iso_fortran_env, only: r32 => real32
  real(r32), dimension(8)  :: a,b
  real(r32), dimension(8)  :: c1, c2
  real(r32), dimension(16) :: d1, d2
contains
  subroutine add2vecs1(a,b,c)
    real(r32), dimension(8), intent(in) :: a,b
    real(r32), dimension(8), intent(out) :: c
    c = a + b
  end subroutine add2vecs1
  function add2vecs2(a,b)
    real(r32), dimension(8), intent(in) :: a,b
    real(r32), dimension(8) :: add2vecs2
    add2vecs2 = a + b
  end function add2vecs2
  !-
  subroutine s1 ()
    call add2vecs1 (a, b, c1)
  end subroutine s1
  !-
  subroutine s2 ()
    c2         = add2vecs2 (a, b)
  end subroutine s2
  !-
  subroutine s3 ()
    call add2vecs1 (a, b, d1(1:16:2))
  end subroutine s3
  !-
  subroutine s4 ()
    d2(1:16:2) = add2vecs2 (a, b)
  end subroutine s4
end

You'll find that s1 and s2 compile to the same code, and the strided versions
s3 and s4 (at least this is my reading of the assembly, but correct me if I
am wrong).

Is there really more to expect?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug fortran/102510] Function call has unnecessary stride check
  2021-09-28  2:17 [Bug fortran/102510] New: Function call has unnecessary aliasing check dwwork at gmail dot com
                   ` (2 preceding siblings ...)
  2021-09-28 19:03 ` anlauf at gcc dot gnu.org
@ 2021-09-28 19:26 ` dwwork at gmail dot com
  2021-09-29 21:01 ` anlauf at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: dwwork at gmail dot com @ 2021-09-28 19:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102510

--- Comment #4 from Dalon Work <dwwork at gmail dot com> ---
No, I don't think there is more to expect. My mistaken assumption was that the
return value of the function had to be a contiguous array of 8 elements. I
don't find this to be a stupid assumption, since the declaration looks so
similar to that of the dummy arguments. See:

function add2vecs2(a,b)
    use iso_fortran_env, only: r32 => real32
    real(r32), dimension(8), intent(in) :: a,b
    real(r32), dimension(8) :: add2vecs2 ! Basically the same as the prev line
    add2vecs2 = a + b
end function


But, as I've been informed, that is not the case. Apparently nothing enforces
this "contiguous" restriction on function array return values, and so gfortran
has taken the liberty of constructing an array descriptor instead. It's good to
know that when given more information, inlining is able to figure that out, but
when compiled in isolation, the function needs to check the array descriptor.

I'm satisfied with the answers here, so if you want to close it as not a bug,
I'm okay with that.

Thanks for the help.

Dalon

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug fortran/102510] Function call has unnecessary stride check
  2021-09-28  2:17 [Bug fortran/102510] New: Function call has unnecessary aliasing check dwwork at gmail dot com
                   ` (3 preceding siblings ...)
  2021-09-28 19:26 ` dwwork at gmail dot com
@ 2021-09-29 21:01 ` anlauf at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: anlauf at gcc dot gnu.org @ 2021-09-29 21:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102510

anlauf at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |WORKSFORME

--- Comment #5 from anlauf at gcc dot gnu.org ---
If you collect your procedures in a separate file, there is the possible
question of what happens when you "use" these in another module.

A possible expectation is that this situation should be handled with LTO.
I have not checked whether this works (at the time of writing), though.

I'ld like to close this issue for now.  Please reopen or open a new one
if you think there is room for improvement.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-09-29 21:01 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-28  2:17 [Bug fortran/102510] New: Function call has unnecessary aliasing check dwwork at gmail dot com
2021-09-28  8:54 ` [Bug fortran/102510] Function call has unnecessary stride check rguenth at gcc dot gnu.org
2021-09-28 13:55 ` dwwork at gmail dot com
2021-09-28 19:03 ` anlauf at gcc dot gnu.org
2021-09-28 19:26 ` dwwork at gmail dot com
2021-09-29 21:01 ` anlauf at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).