public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug fortran/102510] New: Function call has unnecessary aliasing check
@ 2021-09-28 2:17 dwwork at gmail dot com
2021-09-28 8:54 ` [Bug fortran/102510] Function call has unnecessary stride check rguenth at gcc dot gnu.org
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: dwwork at gmail dot com @ 2021-09-28 2:17 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102510
Bug ID: 102510
Summary: Function call has unnecessary aliasing check
Product: gcc
Version: 11.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
Assignee: unassigned at gcc dot gnu.org
Reporter: dwwork at gmail dot com
Target Milestone: ---
The following 2 functions semantically do the same thing, they add two fixed
size arrays and store them into a third. When compiled with "-O3 -mavx" for
x86_64, I expect to see a single avx instruction. The first version does this
correctly, while the second has an aliasing check with a vectorized branch and
a scalar branch (I think). The second version is incorrect, and should produce
similar vectorized assembly to the first, as fortran does not allow function
arguments to alias. I could be wrong of course, but that is my understanding.
subroutine add2vecs1(a,b,c)
use iso_fortran_env, only: r32 => real32
real(r32), dimension(8), intent(in) :: a,b
real(r32), dimension(8), intent(out) :: c
c = a + b
end subroutine
Output Assembly (from godbolt.org, https://godbolt.org/z/aedEe7rGM):
add2vecs1_:
vmovups ymm0, YMMWORD PTR [rdi]
vaddps ymm0, ymm0, YMMWORD PTR [rsi]
vmovups YMMWORD PTR [rdx], ymm0
vzeroupper
ret
function add2vecs2(a,b)
use iso_fortran_env, only: r32 => real32
real(r32), dimension(8), intent(in) :: a,b
real(r32), dimension(8) :: add2vecs2
add2vecs2 = a + b
end function
Output Assembly:
add2vecs2_:
mov rax, QWORD PTR [rdi+40]
mov rcx, QWORD PTR [rdi]
test rax, rax
je .L5
cmp rax, 1
jne .L11
.L5:
vmovups ymm0, YMMWORD PTR [rdx]
vaddps ymm0, ymm0, YMMWORD PTR [rsi]
vmovups YMMWORD PTR [rcx], ymm0
vzeroupper
ret
.L11:
vmovups xmm1, XMMWORD PTR [rdx]
vaddps xmm0, xmm1, XMMWORD PTR [rsi]
lea rdi, [rcx+rax*8]
mov r8, rax
sal r8, 4
vmovss DWORD PTR [rcx], xmm0
vextractps DWORD PTR [rcx+rax*4], xmm0, 1
vextractps DWORD PTR [rcx+rax*8], xmm0, 2
vextractps DWORD PTR [rdi+rax*4], xmm0, 3
vmovups xmm0, XMMWORD PTR [rdx+16]
vaddps xmm0, xmm0, XMMWORD PTR [rsi+16]
lea rdi, [rcx+r8]
lea rdx, [rdi+rax*8]
vmovss DWORD PTR [rcx+r8], xmm0
vextractps DWORD PTR [rdi+rax*4], xmm0, 1
vextractps DWORD PTR [rdi+rax*8], xmm0, 2
vextractps DWORD PTR [rdx+rax*4], xmm0, 3
ret
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug fortran/102510] Function call has unnecessary stride check
2021-09-28 2:17 [Bug fortran/102510] New: Function call has unnecessary aliasing check dwwork at gmail dot com
@ 2021-09-28 8:54 ` rguenth at gcc dot gnu.org
2021-09-28 13:55 ` dwwork at gmail dot com
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-09-28 8:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102510
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed| |2021-09-28
Ever confirmed|0 |1
Status|UNCONFIRMED |NEW
Known to fail| |12.0
Keywords| |missed-optimization
Summary|Function call has |Function call has
|unnecessary aliasing check |unnecessary stride check
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
It's not an alias but we seem to not know that the array stride of the result
is one and thus we version on that. I'm not sure whether fortran allows
res(/1, 16, 2/) = add2vecs2(a, b)
or so or why we know the inputs are not strided. The result comes in as
array descriptor while the other arguments are pointers to a flat [8] array.
Adding 'contiguous' yields
Error: 'add2vecs2' at (1) has the CONTIGUOUS attribute but is not an array
pointer or an assumed-shape or assumed-rank array
so maybe it's really just a missed optimization in the Frontend?
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug fortran/102510] Function call has unnecessary stride check
2021-09-28 2:17 [Bug fortran/102510] New: Function call has unnecessary aliasing check dwwork at gmail dot com
2021-09-28 8:54 ` [Bug fortran/102510] Function call has unnecessary stride check rguenth at gcc dot gnu.org
@ 2021-09-28 13:55 ` dwwork at gmail dot com
2021-09-28 19:03 ` anlauf at gcc dot gnu.org
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: dwwork at gmail dot com @ 2021-09-28 13:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102510
--- Comment #2 from Dalon Work <dwwork at gmail dot com> ---
Thanks for the information. Based on your comments, I've created 2 new
subroutines that call the "bad" function. The first places the result in a
contiguous array, while the second places the result in a strided array.
(https://godbolt.org/z/bTnWr3bMn)
The first:
subroutine add2vecs3(a,b,c)
real(r32), dimension(8), intent(in) :: a,b
real(r32), dimension(8), intent(out) :: c
c = add2vecs2(a,b)
end subroutine
With "-O3 -mavx", this subroutine becomes fully vectorized:
__blah_MOD_add2vecs3:
vmovups ymm0, YMMWORD PTR [rdi]
vaddps ymm0, ymm0, YMMWORD PTR [rsi]
vmovups YMMWORD PTR [rdx], ymm0
vzeroupper
ret
The second:
subroutine add2vecs4(a,b,c)
real(r32), dimension(8), intent(in) :: a,b
real(r32), dimension(16), intent(out) :: c
c(1:16:2) = add2vecs2(a,b)
end subroutine
In this case we get the non-vectorized version:
__blah_MOD_add2vecs4:
vmovups ymm0, YMMWORD PTR [rsi]
vaddps ymm0, ymm0, YMMWORD PTR [rdi]
vmovss DWORD PTR [rdx], xmm0
vextractps DWORD PTR [rdx+8], xmm0, 1
vextractps DWORD PTR [rdx+16], xmm0, 2
vextractps DWORD PTR [rdx+24], xmm0, 3
vextractf128 xmm0, ymm0, 0x1
vmovss DWORD PTR [rdx+32], xmm0
vextractps DWORD PTR [rdx+40], xmm0, 1
vextractps DWORD PTR [rdx+48], xmm0, 2
vextractps DWORD PTR [rdx+56], xmm0, 3
vzeroupper
ret
>From this, it seems you are correct. The result gets passed in as a descriptor
to a block of memory and from that the function figures out the best way to
fill in the data. Perhaps other compilers handle this differently, but there we
have it.
Changing this behavior might be difficult or impossible, as this would be an
ABI change, would it not? It's arguable whether it's even worth changing.
Perhaps other compilers do it differently. I guess what I assumed is that the
compiler would have a contigous block of memory available for the return
result. Any necessary striding would happen external to the function.
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug fortran/102510] Function call has unnecessary stride check
2021-09-28 2:17 [Bug fortran/102510] New: Function call has unnecessary aliasing check dwwork at gmail dot com
2021-09-28 8:54 ` [Bug fortran/102510] Function call has unnecessary stride check rguenth at gcc dot gnu.org
2021-09-28 13:55 ` dwwork at gmail dot com
@ 2021-09-28 19:03 ` anlauf at gcc dot gnu.org
2021-09-28 19:26 ` dwwork at gmail dot com
2021-09-29 21:01 ` anlauf at gcc dot gnu.org
4 siblings, 0 replies; 6+ messages in thread
From: anlauf at gcc dot gnu.org @ 2021-09-28 19:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102510
--- Comment #3 from anlauf at gcc dot gnu.org ---
It helps to look at the (Fortran) context. As written, the subroutine version
is declared with explicit size contiguous arrays. If the caller has a
non-contiguous (strided) result array, it needs to pack/unpack. For the
function version - as is - we might need a temporary to handle different
situations.
However, if you offer the compiler the chance to inline the calls, and using
optimization to inline the packing, you may get better code than you think.
Compile this example with -O3 -mavx:
module p
use iso_fortran_env, only: r32 => real32
real(r32), dimension(8) :: a,b
real(r32), dimension(8) :: c1, c2
real(r32), dimension(16) :: d1, d2
contains
subroutine add2vecs1(a,b,c)
real(r32), dimension(8), intent(in) :: a,b
real(r32), dimension(8), intent(out) :: c
c = a + b
end subroutine add2vecs1
function add2vecs2(a,b)
real(r32), dimension(8), intent(in) :: a,b
real(r32), dimension(8) :: add2vecs2
add2vecs2 = a + b
end function add2vecs2
!-
subroutine s1 ()
call add2vecs1 (a, b, c1)
end subroutine s1
!-
subroutine s2 ()
c2 = add2vecs2 (a, b)
end subroutine s2
!-
subroutine s3 ()
call add2vecs1 (a, b, d1(1:16:2))
end subroutine s3
!-
subroutine s4 ()
d2(1:16:2) = add2vecs2 (a, b)
end subroutine s4
end
You'll find that s1 and s2 compile to the same code, and the strided versions
s3 and s4 (at least this is my reading of the assembly, but correct me if I
am wrong).
Is there really more to expect?
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug fortran/102510] Function call has unnecessary stride check
2021-09-28 2:17 [Bug fortran/102510] New: Function call has unnecessary aliasing check dwwork at gmail dot com
` (2 preceding siblings ...)
2021-09-28 19:03 ` anlauf at gcc dot gnu.org
@ 2021-09-28 19:26 ` dwwork at gmail dot com
2021-09-29 21:01 ` anlauf at gcc dot gnu.org
4 siblings, 0 replies; 6+ messages in thread
From: dwwork at gmail dot com @ 2021-09-28 19:26 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102510
--- Comment #4 from Dalon Work <dwwork at gmail dot com> ---
No, I don't think there is more to expect. My mistaken assumption was that the
return value of the function had to be a contiguous array of 8 elements. I
don't find this to be a stupid assumption, since the declaration looks so
similar to that of the dummy arguments. See:
function add2vecs2(a,b)
use iso_fortran_env, only: r32 => real32
real(r32), dimension(8), intent(in) :: a,b
real(r32), dimension(8) :: add2vecs2 ! Basically the same as the prev line
add2vecs2 = a + b
end function
But, as I've been informed, that is not the case. Apparently nothing enforces
this "contiguous" restriction on function array return values, and so gfortran
has taken the liberty of constructing an array descriptor instead. It's good to
know that when given more information, inlining is able to figure that out, but
when compiled in isolation, the function needs to check the array descriptor.
I'm satisfied with the answers here, so if you want to close it as not a bug,
I'm okay with that.
Thanks for the help.
Dalon
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug fortran/102510] Function call has unnecessary stride check
2021-09-28 2:17 [Bug fortran/102510] New: Function call has unnecessary aliasing check dwwork at gmail dot com
` (3 preceding siblings ...)
2021-09-28 19:26 ` dwwork at gmail dot com
@ 2021-09-29 21:01 ` anlauf at gcc dot gnu.org
4 siblings, 0 replies; 6+ messages in thread
From: anlauf at gcc dot gnu.org @ 2021-09-29 21:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102510
anlauf at gcc dot gnu.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |WORKSFORME
--- Comment #5 from anlauf at gcc dot gnu.org ---
If you collect your procedures in a separate file, there is the possible
question of what happens when you "use" these in another module.
A possible expectation is that this situation should be handled with LTO.
I have not checked whether this works (at the time of writing), though.
I'ld like to close this issue for now. Please reopen or open a new one
if you think there is room for improvement.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2021-09-29 21:01 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-28 2:17 [Bug fortran/102510] New: Function call has unnecessary aliasing check dwwork at gmail dot com
2021-09-28 8:54 ` [Bug fortran/102510] Function call has unnecessary stride check rguenth at gcc dot gnu.org
2021-09-28 13:55 ` dwwork at gmail dot com
2021-09-28 19:03 ` anlauf at gcc dot gnu.org
2021-09-28 19:26 ` dwwork at gmail dot com
2021-09-29 21:01 ` anlauf at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).