public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug fortran/101053] New: Incorrect code at -O1 on arm64
@ 2021-06-14 5:25 gilles.gouaillardet at gmail dot com
2021-06-14 5:34 ` [Bug fortran/101053] " pinskia at gcc dot gnu.org
` (9 more replies)
0 siblings, 10 replies; 11+ messages in thread
From: gilles.gouaillardet at gmail dot com @ 2021-06-14 5:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053
Bug ID: 101053
Summary: Incorrect code at -O1 on arm64
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
Assignee: unassigned at gcc dot gnu.org
Reporter: gilles.gouaillardet at gmail dot com
Target Milestone: ---
Created attachment 51003
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51003&action=edit
A simple reproducer
This issue was initially reported at
https://github.com/numpy/numpy/issues/18422
Bottom line, since the gcc-9 series(!), gfortran generates incorrect code for
OpenBLAS from -O1 on arm64.
Here is how to reproduce the issue:
# set the local prefix (to be customized)
prefix=...
# Download OpenBLAS
wget
https://github.com/xianyi/OpenBLAS/releases/download/v0.3.15/OpenBLAS-0.3.15.tar.gz
# Build and install OpenBLAS
tar xfz OpenBLAS-0.3.15.tar.gz
cd OpenBLAS-0.3.15/
make -j 56 libs netlib shared BINARY='64' CC='gcc' FC='gfortran'
MAKE_NB_JOBS='-1' USE_OPENMP='1' USE_THREAD='1' COMMON_OPT="-g -O1"
make install PREFIX=$prefix
cd ..
# Build and execute the attached reproducer
gfortran dgehd2.f90 -o dgehd2 -L$prefix/lib -Wl,-rpath,$prefix/lib -lopenblas
./dgehd2
Expected result (obtained with gfortran 8.3.1 (from rhel8) and 8.5.0, or if
OpenBLAS is built with COMMON_OPT="-g -O0":
INFO = 0
1.0000000000000000 -8.0622577482985491 0.58032253547122137
-3.5970073030870449 11.461538461538458 -3.6923076923076938
-0.24806946917841688 4.3076923076923075 2.5384615384615383
Current result (from gfortran 9.1.0 up to the gcc-12-20210606 snapshot):
INFO = 0
1.0000000000000000 -8.0622577482985491 0.58032253547122137
-Infinity NaN NaN
-Infinity NaN NaN
The faulty code is in the dgehd2 subroutine:
PARAMETER ( ONE = 1.0D+0 )
DO 10 I = ILO, IHI - 1
CALL DLARFG( IHI-I, A( I+1, I ), A( MIN( I+2, N ), I ), 1,
$ TAU( I ) )
AII = A( I+1, I )
A( I+1, I ) = ONE
CALL DLARF( 'Right', IHI, IHI-I, A( I+1, I ), 1, TAU( I ),
$ A( 1, I+1 ), LDA, WORK )
CALL DLARF( 'Left', IHI-I, N-I, A( I+1, I ), 1, TAU( I ),
$ A( I+1, I+1 ), LDA, WORK )
A( I+1, I ) = AII
10 CONTINUE
At the following line
A( I+1, I ) = ONE
Here is a snippet of the assembly (generated with gfortran 10.3.0)
.LBE9:
.loc 1 206 72 view .LVU34
fmov d9, 1.0e+0
.LBB10:
.loc 1 211 72 view .LVU35
adrp x0, .LC1
add x0, x0, :lo12:.LC1
str x0, [sp, 192]
.LBE10:
.LBB11:
.loc 1 216 72 view .LVU36
adrp x0, .LC2
add x0, x0, :lo12:.LC2
str x0, [sp, 200]
.LVL20:
.L7:
.loc 1 216 72 is_stmt 0 view .LVU37
.LBE11:
.LBB12:
.loc 1 204 72 is_stmt 1 view .LVU38
ldr w0, [x22]
sub w0, w0, w20
str w0, [sp, 224]
add w0, w20, 2
ldr w2, [x26]
cmp w2, w0
csel w2, w2, w0, le
mov w24, w20
add w20, w20, 1
.LVL21:
.loc 1 204 72 is_stmt 0 view .LVU39
add x2, x23, x2, sxtw
mov x4, x21
mov x3, x25
ldr x0, [sp, 136]
add x2, x0, x2, lsl 3
mov x1, x19
ldr x0, [sp, 184]
bl dlarfg_
.LVL22:
.LBE12:
.loc 1 205 72 is_stmt 1 view .LVU40
ldr d8, [x19]
.LVL23:
.loc 1 206 72 view .LVU41
str d9, [x19]
The constant 1.0D+0 is stored in $d9, but this register is used **after** the
invocation of the dlarfg_ subroutine, and it turns out this subroutine does
modify the $d9 register.
When $d9 is used to be stored into [x19], its value is
(gdb) p $d9
$1 = ( f = inf, u = 9218868437227405312, s = 9218868437227405312 )
If I set a breakpoint at that instruction, and manually
(gdb) set $d9=1.0
then the program behaves as expected.
Bottom line, there is an issue from gfortran 9 on arm64 from -O1 with this:
- Did gfortran incorrectly assume $d9 will not be modified (or at least, will
be restored) by other subroutines?
- Did dlarfg_ forget to restore $d9?
- Something else?
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug fortran/101053] Incorrect code at -O1 on arm64
2021-06-14 5:25 [Bug fortran/101053] New: Incorrect code at -O1 on arm64 gilles.gouaillardet at gmail dot com
@ 2021-06-14 5:34 ` pinskia at gcc dot gnu.org
2021-06-14 5:52 ` [Bug target/101053] " pinskia at gcc dot gnu.org
` (8 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-14 5:34 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>From aapcs64:
Registers v8-v15 must be preserved by a callee across subroutine calls
...
Additionally, only the bottom 64 bits of each value stored in v8-v15 need to be
preserved
So the bug is in dlarfg_.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/101053] Incorrect code at -O1 on arm64
2021-06-14 5:25 [Bug fortran/101053] New: Incorrect code at -O1 on arm64 gilles.gouaillardet at gmail dot com
2021-06-14 5:34 ` [Bug fortran/101053] " pinskia at gcc dot gnu.org
@ 2021-06-14 5:52 ` pinskia at gcc dot gnu.org
2021-06-14 5:57 ` gilles.gouaillardet at gmail dot com
` (7 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-14 5:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The code for dlarfg_ looks correct.
78: 6d0627e8 stp d8, d9, [sp, #96]
....
238: 6d4627e8 ldp d8, d9, [sp, #96]
23c: fd402fea ldr d10, [sp, #88]
240: 17ffff8a b 68 <dlarfg_+0x68>
The only thing I can think of is that someone is writing in the stack location.
I guess I Need to run the code to find out.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/101053] Incorrect code at -O1 on arm64
2021-06-14 5:25 [Bug fortran/101053] New: Incorrect code at -O1 on arm64 gilles.gouaillardet at gmail dot com
2021-06-14 5:34 ` [Bug fortran/101053] " pinskia at gcc dot gnu.org
2021-06-14 5:52 ` [Bug target/101053] " pinskia at gcc dot gnu.org
@ 2021-06-14 5:57 ` gilles.gouaillardet at gmail dot com
2021-06-14 6:07 ` pinskia at gcc dot gnu.org
` (6 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: gilles.gouaillardet at gmail dot com @ 2021-06-14 5:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053
--- Comment #3 from Gilles Gouaillardet <gilles.gouaillardet at gmail dot com> ---
Thanks for the clarification about which registers have to be preserved.
I will dig this a bit more from now
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/101053] Incorrect code at -O1 on arm64
2021-06-14 5:25 [Bug fortran/101053] New: Incorrect code at -O1 on arm64 gilles.gouaillardet at gmail dot com
` (2 preceding siblings ...)
2021-06-14 5:57 ` gilles.gouaillardet at gmail dot com
@ 2021-06-14 6:07 ` pinskia at gcc dot gnu.org
2021-06-14 6:08 ` pinskia at gcc dot gnu.org
` (5 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-14 6:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I tried to reproduce it but I could not on the trunk (it has one patch which
should not make a difference):
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/home/ubuntu/upstream-gcc/libexec/gcc/aarch64-unknown-linux-gnu/12.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: /home/ubuntu/src/upstream-gcc-aarch64/gcc/configure
--prefix=/home/ubuntu/upstream-gcc --enable-languages=c,c++,fortran,lto,objc,go
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.0.0 20210604 (experimental) [master revision
a3f6bd78914:c52e7e6a230:f1e9719e06611fdbdedccbab54d32fc6611a2fb0] (GCC)
make -j 24 libs netlib shared BINARY='64' CC='gcc' FC='gfortran'
MAKE_NB_JOBS='-1' USE_OPENMP='1' USE_THREAD='1' COMMON_OPT="-g -O1"
ubuntu@ubuntu:~/src/OpenBLAS-0.3.15\# gfortran libopenblas.a -o test1 test.f90
libopenblas.a -lgomp -pthread
ubuntu@ubuntu:~/src/OpenBLAS-0.3.15\# ./test1
INFO = 0
1.0000000000000000 -8.0622577482985491 0.58032253547122137
-3.5970073030870449 11.461538461538458 -3.6923076923076938
-0.24806946917841688 4.3076923076923075 2.5384615384615383
--------- CUT -------
I tried the shared library version and that also passed.
Maybe the best thing to do is set a breakpoint at dlarfg_ and then watch the
stack location where d9 is stored to and see who changes it.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/101053] Incorrect code at -O1 on arm64
2021-06-14 5:25 [Bug fortran/101053] New: Incorrect code at -O1 on arm64 gilles.gouaillardet at gmail dot com
` (3 preceding siblings ...)
2021-06-14 6:07 ` pinskia at gcc dot gnu.org
@ 2021-06-14 6:08 ` pinskia at gcc dot gnu.org
2021-06-14 6:30 ` gilles.gouaillardet at gmail dot com
` (4 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-14 6:08 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #4)
And yes I got the d9 register having 1.0 inside dgehd2 .
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/101053] Incorrect code at -O1 on arm64
2021-06-14 5:25 [Bug fortran/101053] New: Incorrect code at -O1 on arm64 gilles.gouaillardet at gmail dot com
` (4 preceding siblings ...)
2021-06-14 6:08 ` pinskia at gcc dot gnu.org
@ 2021-06-14 6:30 ` gilles.gouaillardet at gmail dot com
2021-06-14 6:33 ` gilles.gouaillardet at gmail dot com
` (3 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: gilles.gouaillardet at gmail dot com @ 2021-06-14 6:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053
--- Comment #6 from Gilles Gouaillardet <gilles.gouaillardet at gmail dot com> ---
I will set the watchpoint and follow the flow ...
That being said, I still see the issue with the latest snapshot
gcc (GCC) 12.0.0 20210613 (experimental)
./dgehd2
INFO = 0
1.0000000000000000 -8.0622577482985491 0.58032253547122137
-Infinity NaN NaN
-Infinity NaN NaN
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/101053] Incorrect code at -O1 on arm64
2021-06-14 5:25 [Bug fortran/101053] New: Incorrect code at -O1 on arm64 gilles.gouaillardet at gmail dot com
` (5 preceding siblings ...)
2021-06-14 6:30 ` gilles.gouaillardet at gmail dot com
@ 2021-06-14 6:33 ` gilles.gouaillardet at gmail dot com
2021-06-14 6:50 ` gilles.gouaillardet at gmail dot com
` (2 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: gilles.gouaillardet at gmail dot com @ 2021-06-14 6:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053
--- Comment #7 from Gilles Gouaillardet <gilles.gouaillardet at gmail dot com> ---
Note you have to 'make clean' before re-running 'make ...' with different
options.
Otherwise, pretty much nothing gets rebuilt.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/101053] Incorrect code at -O1 on arm64
2021-06-14 5:25 [Bug fortran/101053] New: Incorrect code at -O1 on arm64 gilles.gouaillardet at gmail dot com
` (6 preceding siblings ...)
2021-06-14 6:33 ` gilles.gouaillardet at gmail dot com
@ 2021-06-14 6:50 ` gilles.gouaillardet at gmail dot com
2021-06-14 7:01 ` gilles.gouaillardet at gmail dot com
2021-06-14 8:45 ` pinskia at gcc dot gnu.org
9 siblings, 0 replies; 11+ messages in thread
From: gilles.gouaillardet at gmail dot com @ 2021-06-14 6:50 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053
--- Comment #8 from Gilles Gouaillardet <gilles.gouaillardet at gmail dot com> ---
It seems OpenBLAS is to be blamed after all ...
>From kernel/arm64/dznrm2_thunderx2t99.c:
#define REGINF "d9"
static void nrm2_compute(BLASLONG n, FLOAT *x, BLASLONG inc_x,
double *ssq, double *scale)
{
__asm__ __volatile__ (
" fmov "REGINF", x6
There is no mechanism to push/pop $d9 in the inline assembly :-(
If you concur, please close this issue and my apologies for the noise.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/101053] Incorrect code at -O1 on arm64
2021-06-14 5:25 [Bug fortran/101053] New: Incorrect code at -O1 on arm64 gilles.gouaillardet at gmail dot com
` (7 preceding siblings ...)
2021-06-14 6:50 ` gilles.gouaillardet at gmail dot com
@ 2021-06-14 7:01 ` gilles.gouaillardet at gmail dot com
2021-06-14 8:45 ` pinskia at gcc dot gnu.org
9 siblings, 0 replies; 11+ messages in thread
From: gilles.gouaillardet at gmail dot com @ 2021-06-14 7:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053
--- Comment #9 from Gilles Gouaillardet <gilles.gouaillardet at gmail dot com> ---
And here is the one-liner to fix this mess
--- orig/OpenBLAS-0.3.15/kernel/arm64/dznrm2_thunderx2t99.c 2021-05-03
06:50:22.000000000 +0900
+++ OpenBLAS-0.3.15/kernel/arm64/dznrm2_thunderx2t99.c 2021-06-14
15:54:26.087232321 +0900
@@ -321,7 +321,7 @@
: "cc",
"memory",
"x0", "x1", "x2", "x3", "x4", "x5", "x6",
- "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"
+ "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", REGINF
);
}
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug target/101053] Incorrect code at -O1 on arm64
2021-06-14 5:25 [Bug fortran/101053] New: Incorrect code at -O1 on arm64 gilles.gouaillardet at gmail dot com
` (8 preceding siblings ...)
2021-06-14 7:01 ` gilles.gouaillardet at gmail dot com
@ 2021-06-14 8:45 ` pinskia at gcc dot gnu.org
9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-14 8:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |MOVED
Status|UNCONFIRMED |RESOLVED
--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Gilles Gouaillardet from comment #8)
> It seems OpenBLAS is to be blamed after all ...
>
>
> From kernel/arm64/dznrm2_thunderx2t99.c:
>
>
> #define REGINF "d9"
>
> static void nrm2_compute(BLASLONG n, FLOAT *x, BLASLONG inc_x,
> double *ssq, double *scale)
> {
>
> __asm__ __volatile__ (
> " fmov "REGINF", x6
>
>
> There is no mechanism to push/pop $d9 in the inline assembly :-(
>
>
> If you concur, please close this issue and my apologies for the noise.
Yes this is an openblas bug. I also see why I could not reproduce it either
cpuid_arm64.c in openblas does not know OcteonTX2 96xx which I was doing the
testing on.
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2021-06-14 8:45 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-14 5:25 [Bug fortran/101053] New: Incorrect code at -O1 on arm64 gilles.gouaillardet at gmail dot com
2021-06-14 5:34 ` [Bug fortran/101053] " pinskia at gcc dot gnu.org
2021-06-14 5:52 ` [Bug target/101053] " pinskia at gcc dot gnu.org
2021-06-14 5:57 ` gilles.gouaillardet at gmail dot com
2021-06-14 6:07 ` pinskia at gcc dot gnu.org
2021-06-14 6:08 ` pinskia at gcc dot gnu.org
2021-06-14 6:30 ` gilles.gouaillardet at gmail dot com
2021-06-14 6:33 ` gilles.gouaillardet at gmail dot com
2021-06-14 6:50 ` gilles.gouaillardet at gmail dot com
2021-06-14 7:01 ` gilles.gouaillardet at gmail dot com
2021-06-14 8:45 ` pinskia at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).