public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug fortran/101053] New: Incorrect code at -O1 on arm64
@ 2021-06-14  5:25 gilles.gouaillardet at gmail dot com
  2021-06-14  5:34 ` [Bug fortran/101053] " pinskia at gcc dot gnu.org
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: gilles.gouaillardet at gmail dot com @ 2021-06-14  5:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053

            Bug ID: 101053
           Summary: Incorrect code at -O1 on arm64
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gilles.gouaillardet at gmail dot com
  Target Milestone: ---

Created attachment 51003
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51003&action=edit
A simple reproducer

This issue was initially reported at
https://github.com/numpy/numpy/issues/18422

Bottom line, since the gcc-9 series(!), gfortran generates incorrect code for
OpenBLAS from -O1 on arm64.

Here is how to reproduce the issue:

# set the local prefix (to be customized)
prefix=...

# Download OpenBLAS
wget
https://github.com/xianyi/OpenBLAS/releases/download/v0.3.15/OpenBLAS-0.3.15.tar.gz

# Build and install OpenBLAS
tar xfz OpenBLAS-0.3.15.tar.gz
cd OpenBLAS-0.3.15/
make -j 56 libs netlib shared  BINARY='64'  CC='gcc'  FC='gfortran' 
MAKE_NB_JOBS='-1'  USE_OPENMP='1'  USE_THREAD='1' COMMON_OPT="-g -O1"
make install PREFIX=$prefix
cd ..

# Build and execute the attached reproducer
gfortran dgehd2.f90 -o dgehd2 -L$prefix/lib -Wl,-rpath,$prefix/lib -lopenblas
./dgehd2

Expected result (obtained with gfortran 8.3.1 (from rhel8) and 8.5.0, or if
OpenBLAS is built with COMMON_OPT="-g -O0":
 INFO =            0
   1.0000000000000000       -8.0622577482985491       0.58032253547122137      
-3.5970073030870449        11.461538461538458       -3.6923076923076938     
-0.24806946917841688        4.3076923076923075        2.5384615384615383     

Current result (from gfortran 9.1.0 up to the gcc-12-20210606 snapshot):
 INFO =            0
   1.0000000000000000       -8.0622577482985491       0.58032253547122137      
               -Infinity                       NaN                       NaN   
             -Infinity                       NaN                       NaN


The faulty code is in the dgehd2 subroutine:
      PARAMETER          ( ONE = 1.0D+0 )

      DO 10 I = ILO, IHI - 1
         CALL DLARFG( IHI-I, A( I+1, I ), A( MIN( I+2, N ), I ), 1,
     $                TAU( I ) )
         AII = A( I+1, I )
         A( I+1, I ) = ONE
         CALL DLARF( 'Right', IHI, IHI-I, A( I+1, I ), 1, TAU( I ),
     $               A( 1, I+1 ), LDA, WORK )
         CALL DLARF( 'Left', IHI-I, N-I, A( I+1, I ), 1, TAU( I ),
     $               A( I+1, I+1 ), LDA, WORK )
         A( I+1, I ) = AII
   10 CONTINUE

At the following line
         A( I+1, I ) = ONE


Here is a snippet of the assembly (generated with gfortran 10.3.0)
.LBE9:  
        .loc 1 206 72 view .LVU34
        fmov    d9, 1.0e+0
.LBB10: 
        .loc 1 211 72 view .LVU35
        adrp    x0, .LC1
        add     x0, x0, :lo12:.LC1
        str     x0, [sp, 192]
.LBE10:
.LBB11: 
        .loc 1 216 72 view .LVU36
        adrp    x0, .LC2
        add     x0, x0, :lo12:.LC2
        str     x0, [sp, 200]
.LVL20:
.L7:
        .loc 1 216 72 is_stmt 0 view .LVU37
.LBE11:
.LBB12: 
        .loc 1 204 72 is_stmt 1 view .LVU38
        ldr     w0, [x22]
        sub     w0, w0, w20
        str     w0, [sp, 224]
        add     w0, w20, 2
        ldr     w2, [x26]
        cmp     w2, w0
        csel    w2, w2, w0, le
        mov     w24, w20
        add     w20, w20, 1
.LVL21: 
        .loc 1 204 72 is_stmt 0 view .LVU39
        add     x2, x23, x2, sxtw
        mov     x4, x21
        mov     x3, x25
        ldr     x0, [sp, 136]
        add     x2, x0, x2, lsl 3
        mov     x1, x19
        ldr     x0, [sp, 184]
        bl      dlarfg_
.LVL22:
.LBE12: 
        .loc 1 205 72 is_stmt 1 view .LVU40
        ldr     d8, [x19]
.LVL23: 
        .loc 1 206 72 view .LVU41
        str     d9, [x19]


The constant 1.0D+0 is stored in $d9, but this register is used **after** the
invocation of the dlarfg_ subroutine, and it turns out this subroutine does
modify the $d9 register.
When $d9 is used to be stored into [x19], its value is
(gdb) p $d9
$1 = ( f = inf, u = 9218868437227405312, s = 9218868437227405312 )

If I set a breakpoint at that instruction, and manually
(gdb) set $d9=1.0
then the program behaves as expected.


Bottom line, there is an issue from gfortran 9 on arm64 from -O1 with this:
 - Did gfortran incorrectly assume $d9 will not be modified (or at least, will
be restored) by other subroutines?
 - Did dlarfg_ forget to restore $d9?
 - Something else?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug fortran/101053] Incorrect code at -O1 on arm64
  2021-06-14  5:25 [Bug fortran/101053] New: Incorrect code at -O1 on arm64 gilles.gouaillardet at gmail dot com
@ 2021-06-14  5:34 ` pinskia at gcc dot gnu.org
  2021-06-14  5:52 ` [Bug target/101053] " pinskia at gcc dot gnu.org
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-14  5:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>From aapcs64:
Registers v8-v15 must be preserved by a callee across subroutine calls
...
Additionally, only the bottom 64 bits of each value stored in v8-v15 need to be
preserved



So the bug is in dlarfg_.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/101053] Incorrect code at -O1 on arm64
  2021-06-14  5:25 [Bug fortran/101053] New: Incorrect code at -O1 on arm64 gilles.gouaillardet at gmail dot com
  2021-06-14  5:34 ` [Bug fortran/101053] " pinskia at gcc dot gnu.org
@ 2021-06-14  5:52 ` pinskia at gcc dot gnu.org
  2021-06-14  5:57 ` gilles.gouaillardet at gmail dot com
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-14  5:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The code for dlarfg_ looks correct.
  78:   6d0627e8        stp     d8, d9, [sp, #96]
....
 238:   6d4627e8        ldp     d8, d9, [sp, #96]
 23c:   fd402fea        ldr     d10, [sp, #88]
 240:   17ffff8a        b       68 <dlarfg_+0x68>


The only thing I can think of is that someone is writing in the stack location.
I guess I Need to run the code to find out.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/101053] Incorrect code at -O1 on arm64
  2021-06-14  5:25 [Bug fortran/101053] New: Incorrect code at -O1 on arm64 gilles.gouaillardet at gmail dot com
  2021-06-14  5:34 ` [Bug fortran/101053] " pinskia at gcc dot gnu.org
  2021-06-14  5:52 ` [Bug target/101053] " pinskia at gcc dot gnu.org
@ 2021-06-14  5:57 ` gilles.gouaillardet at gmail dot com
  2021-06-14  6:07 ` pinskia at gcc dot gnu.org
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: gilles.gouaillardet at gmail dot com @ 2021-06-14  5:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053

--- Comment #3 from Gilles Gouaillardet <gilles.gouaillardet at gmail dot com> ---
Thanks for the clarification about which registers have to be preserved.

I will dig this a bit more from now

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/101053] Incorrect code at -O1 on arm64
  2021-06-14  5:25 [Bug fortran/101053] New: Incorrect code at -O1 on arm64 gilles.gouaillardet at gmail dot com
                   ` (2 preceding siblings ...)
  2021-06-14  5:57 ` gilles.gouaillardet at gmail dot com
@ 2021-06-14  6:07 ` pinskia at gcc dot gnu.org
  2021-06-14  6:08 ` pinskia at gcc dot gnu.org
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-14  6:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I tried to reproduce it but I could not on the trunk (it has one patch which
should not make a difference):
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/home/ubuntu/upstream-gcc/libexec/gcc/aarch64-unknown-linux-gnu/12.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: /home/ubuntu/src/upstream-gcc-aarch64/gcc/configure
--prefix=/home/ubuntu/upstream-gcc --enable-languages=c,c++,fortran,lto,objc,go
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.0.0 20210604 (experimental) [master revision
a3f6bd78914:c52e7e6a230:f1e9719e06611fdbdedccbab54d32fc6611a2fb0] (GCC)

make -j 24 libs netlib shared  BINARY='64'  CC='gcc'  FC='gfortran' 
MAKE_NB_JOBS='-1'  USE_OPENMP='1'  USE_THREAD='1' COMMON_OPT="-g -O1"


ubuntu@ubuntu:~/src/OpenBLAS-0.3.15\# gfortran libopenblas.a -o test1 test.f90 
libopenblas.a -lgomp -pthread
ubuntu@ubuntu:~/src/OpenBLAS-0.3.15\# ./test1
 INFO =            0
   1.0000000000000000       -8.0622577482985491       0.58032253547122137      
-3.5970073030870449        11.461538461538458       -3.6923076923076938     
-0.24806946917841688        4.3076923076923075        2.5384615384615383

--------- CUT -------
I tried the shared library version and that also passed.

Maybe the best thing to do is set a breakpoint at dlarfg_ and then watch the
stack location where d9 is stored to and see who changes it.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/101053] Incorrect code at -O1 on arm64
  2021-06-14  5:25 [Bug fortran/101053] New: Incorrect code at -O1 on arm64 gilles.gouaillardet at gmail dot com
                   ` (3 preceding siblings ...)
  2021-06-14  6:07 ` pinskia at gcc dot gnu.org
@ 2021-06-14  6:08 ` pinskia at gcc dot gnu.org
  2021-06-14  6:30 ` gilles.gouaillardet at gmail dot com
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-14  6:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #4)
And yes I got the d9 register having 1.0 inside dgehd2 .

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/101053] Incorrect code at -O1 on arm64
  2021-06-14  5:25 [Bug fortran/101053] New: Incorrect code at -O1 on arm64 gilles.gouaillardet at gmail dot com
                   ` (4 preceding siblings ...)
  2021-06-14  6:08 ` pinskia at gcc dot gnu.org
@ 2021-06-14  6:30 ` gilles.gouaillardet at gmail dot com
  2021-06-14  6:33 ` gilles.gouaillardet at gmail dot com
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: gilles.gouaillardet at gmail dot com @ 2021-06-14  6:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053

--- Comment #6 from Gilles Gouaillardet <gilles.gouaillardet at gmail dot com> ---
I will set the watchpoint and follow the flow ...

That being said, I still see the issue with the latest snapshot

gcc (GCC) 12.0.0 20210613 (experimental)

./dgehd2 
 INFO =            0
   1.0000000000000000       -8.0622577482985491       0.58032253547122137      
               -Infinity                       NaN                       NaN   
             -Infinity                       NaN                       NaN

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/101053] Incorrect code at -O1 on arm64
  2021-06-14  5:25 [Bug fortran/101053] New: Incorrect code at -O1 on arm64 gilles.gouaillardet at gmail dot com
                   ` (5 preceding siblings ...)
  2021-06-14  6:30 ` gilles.gouaillardet at gmail dot com
@ 2021-06-14  6:33 ` gilles.gouaillardet at gmail dot com
  2021-06-14  6:50 ` gilles.gouaillardet at gmail dot com
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: gilles.gouaillardet at gmail dot com @ 2021-06-14  6:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053

--- Comment #7 from Gilles Gouaillardet <gilles.gouaillardet at gmail dot com> ---
Note you have to 'make clean' before re-running 'make ...' with different
options.
Otherwise, pretty much nothing gets rebuilt.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/101053] Incorrect code at -O1 on arm64
  2021-06-14  5:25 [Bug fortran/101053] New: Incorrect code at -O1 on arm64 gilles.gouaillardet at gmail dot com
                   ` (6 preceding siblings ...)
  2021-06-14  6:33 ` gilles.gouaillardet at gmail dot com
@ 2021-06-14  6:50 ` gilles.gouaillardet at gmail dot com
  2021-06-14  7:01 ` gilles.gouaillardet at gmail dot com
  2021-06-14  8:45 ` pinskia at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: gilles.gouaillardet at gmail dot com @ 2021-06-14  6:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053

--- Comment #8 from Gilles Gouaillardet <gilles.gouaillardet at gmail dot com> ---
It seems OpenBLAS is to be blamed after all ...


>From kernel/arm64/dznrm2_thunderx2t99.c:


#define REGINF          "d9"

static void nrm2_compute(BLASLONG n, FLOAT *x, BLASLONG inc_x,
                         double *ssq, double *scale)
{

        __asm__ __volatile__ (
        "       fmov    "REGINF", x6


There is no mechanism to push/pop $d9 in the inline assembly :-(


If you concur, please close this issue and my apologies for the noise.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/101053] Incorrect code at -O1 on arm64
  2021-06-14  5:25 [Bug fortran/101053] New: Incorrect code at -O1 on arm64 gilles.gouaillardet at gmail dot com
                   ` (7 preceding siblings ...)
  2021-06-14  6:50 ` gilles.gouaillardet at gmail dot com
@ 2021-06-14  7:01 ` gilles.gouaillardet at gmail dot com
  2021-06-14  8:45 ` pinskia at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: gilles.gouaillardet at gmail dot com @ 2021-06-14  7:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053

--- Comment #9 from Gilles Gouaillardet <gilles.gouaillardet at gmail dot com> ---
And here is the one-liner to fix this mess

--- orig/OpenBLAS-0.3.15/kernel/arm64/dznrm2_thunderx2t99.c     2021-05-03
06:50:22.000000000 +0900
+++ OpenBLAS-0.3.15/kernel/arm64/dznrm2_thunderx2t99.c  2021-06-14
15:54:26.087232321 +0900
@@ -321,7 +321,7 @@
        : "cc",
          "memory",
          "x0", "x1", "x2", "x3", "x4", "x5", "x6",
-         "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"
+         "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", REGINF
        );

 }

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug target/101053] Incorrect code at -O1 on arm64
  2021-06-14  5:25 [Bug fortran/101053] New: Incorrect code at -O1 on arm64 gilles.gouaillardet at gmail dot com
                   ` (8 preceding siblings ...)
  2021-06-14  7:01 ` gilles.gouaillardet at gmail dot com
@ 2021-06-14  8:45 ` pinskia at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-14  8:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101053

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |MOVED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Gilles Gouaillardet from comment #8)
> It seems OpenBLAS is to be blamed after all ...
> 
> 
> From kernel/arm64/dznrm2_thunderx2t99.c:
> 
> 
> #define REGINF          "d9"
> 
> static void nrm2_compute(BLASLONG n, FLOAT *x, BLASLONG inc_x,
>                          double *ssq, double *scale)
> {
> 
>         __asm__ __volatile__ (
>         "       fmov    "REGINF", x6
> 
> 
> There is no mechanism to push/pop $d9 in the inline assembly :-(
> 
> 
> If you concur, please close this issue and my apologies for the noise.

Yes this is an openblas bug.  I also see why I could not reproduce it either
cpuid_arm64.c in openblas does not know OcteonTX2 96xx which I was doing the
testing on.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-06-14  8:45 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-14  5:25 [Bug fortran/101053] New: Incorrect code at -O1 on arm64 gilles.gouaillardet at gmail dot com
2021-06-14  5:34 ` [Bug fortran/101053] " pinskia at gcc dot gnu.org
2021-06-14  5:52 ` [Bug target/101053] " pinskia at gcc dot gnu.org
2021-06-14  5:57 ` gilles.gouaillardet at gmail dot com
2021-06-14  6:07 ` pinskia at gcc dot gnu.org
2021-06-14  6:08 ` pinskia at gcc dot gnu.org
2021-06-14  6:30 ` gilles.gouaillardet at gmail dot com
2021-06-14  6:33 ` gilles.gouaillardet at gmail dot com
2021-06-14  6:50 ` gilles.gouaillardet at gmail dot com
2021-06-14  7:01 ` gilles.gouaillardet at gmail dot com
2021-06-14  8:45 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).