public inbox for gcc-bugs@sourceware.org
* [Bug c/105922] New: AArch64 SVE instruction generated with all SIMD lane active and zero-divide exception flag raised
@ 2022-06-11  0:10 kawakami.k at fujitsu dot com
  2022-06-12 21:07 ` [Bug target/105922] autovectorizer does not handle fp exceptions correctly for SVE pinskia at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: kawakami.k at fujitsu dot com @ 2022-06-11  0:10 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105922

            Bug ID: 105922
           Summary: AArch64 SVE instruction generated with all SIMD lane
                    active and zero-divide exception flag raised
           Product: gcc
           Version: 12.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kawakami.k at fujitsu dot com
  Target Milestone: ---

Created attachment 53118
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53118&action=edit
preprocessed file

An SVE FDIV instruction is generated with a predicate register in which all
bits are active. This FDIV unnecessarily sets the divide-by-zero flag (DZC,
bit 1 of the FPSR register). Shouldn't this instruction be executed with the
appropriate predicate bits?

Besides DZC, other FPSR flags (IOC, OFC, UFC, IXC, etc.) could also be set
when FADD/FSUB/FMUL operate on SIMD lanes that do not contain the intended
values.

/*
float a[7];
float b[7];

for(int i=0; i<7; i++) {
  a[i] = some_initial_values;
}

for(int i=0; i<7; i++) {
  b[i] = COEF / a[i];
}
*/

p0 = {0x11, 0x11, 0x11, 0x1,  0x0,  0x0,  0x0, 0x0}
p1 = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff}
z1.s = {3, 3.29999995, 3.5999999, 3.9000001, 4.19999981, 4.5, 4.80000019,
        0, 0, 0, 0, 0, 0, 0, 0, 0}
add     x0, sp, 64
fdiv    z0.s, p0/m, z0.s, z1.s  # divide-by-zero exception raised
st1w    z0.s, p1, [x0]
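
To make the effect of the predicate concrete, here is a plain C model of a
16-lane vector like the z1.s dump above (7 loop values followed by 9 zeros).
It is a sketch, not real SVE code: count_divbyzero_lanes is a hypothetical
name, predicates are modelled as int arrays, and the numerator is assumed to
be finite and nonzero.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 16  /* sixteen 32-bit lanes, matching the z1.s dump */

/* Count the lanes on which a predicated divide would signal
   divide-by-zero: the lane is active and its denominator is zero.
   Assumes num is finite and nonzero. */
size_t count_divbyzero_lanes(const int active[LANES], float num,
                             const float den[LANES])
{
    size_t count = 0;

    assert(active != NULL && den != NULL);
    for (size_t i = 0; i < LANES; i++) {
        if (active[i] && num != 0.0f && den[i] == 0.0f)
            count++;
    }
    return count;
}
```

With a predicate covering only the first 7 lanes the count is 0. With an
all-true predicate the 9 zero-filled lanes each count as a divide-by-zero,
which is the unwanted DZC setting described in this report.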


% gcc -v -save-temps -O3 -g -march=armv8.2-a+sve main.c

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/causal/gcc-12.1.0_gcc-11.3.0/libexec/gcc/aarch64-unknown-linux-gnu/12.1.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ./configure --prefix=/opt/causal/gcc-12.1.0_gcc-11.3.0
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.1.0 (GCC) 
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3' '-g' '-march=armv8.2-a+sve'
'-mlittle-endian' '-mabi=lp64' '-dumpdir' 'a-'

/opt/causal/gcc-12.1.0_gcc-11.3.0/libexec/gcc/aarch64-unknown-linux-gnu/12.1.0/cc1
-E -quiet -v main.c -march=armv8.2-a+sve -mlittle-endian -mabi=lp64 -g
-fworking-directory -O3 -fpch-preprocess -o a-main.i
ignoring nonexistent directory
"/opt/causal/gcc-12.1.0_gcc-11.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.1.0/../../../../aarch64-unknown-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:

/opt/causal/gcc-12.1.0_gcc-11.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.1.0/include
 /usr/local/include
 /opt/causal/gcc-12.1.0_gcc-11.3.0/include

/opt/causal/gcc-12.1.0_gcc-11.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.1.0/include-fixed
 /usr/include
End of search list.
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3' '-g' '-march=armv8.2-a+sve'
'-mlittle-endian' '-mabi=lp64' '-dumpdir' 'a-'

/opt/causal/gcc-12.1.0_gcc-11.3.0/libexec/gcc/aarch64-unknown-linux-gnu/12.1.0/cc1
-fpreprocessed a-main.i -quiet -dumpdir a- -dumpbase main.c -dumpbase-ext .c
-march=armv8.2-a+sve -mlittle-endian -mabi=lp64 -g -O3 -version -o a-main.s
GNU C17 (GCC) version 12.1.0 (aarch64-unknown-linux-gnu)
        compiled by GNU C version 12.1.0, GMP version 6.1.2, MPFR version
3.1.6-p2, MPC version 1.0.2, isl version none
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C17 (GCC) version 12.1.0 (aarch64-unknown-linux-gnu)
        compiled by GNU C version 12.1.0, GMP version 6.1.2, MPFR version
3.1.6-p2, MPC version 1.0.2, isl version none
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 44e79a65ea64887de47cdc9a11ff4739
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3' '-g' '-march=armv8.2-a+sve'
'-mlittle-endian' '-mabi=lp64' '-dumpdir' 'a-'
 as -v --gdwarf-5 -EL -march=armv8.2-a+sve -mabi=lp64 -o a-main.o a-main.s
GNU assembler version 2.38 (aarch64-unknown-linux-gnu) using BFD version (GNU
Binutils) 2.38
COMPILER_PATH=/opt/causal/gcc-12.1.0_gcc-11.3.0/libexec/gcc/aarch64-unknown-linux-gnu/12.1.0/:/opt/causal/gcc-12.1.0_gcc-11.3.0/libexec/gcc/aarch64-unknown-linux-gnu/12.1.0/:/opt/causal/gcc-12.1.0_gcc-11.3.0/libexec/gcc/aarch64-unknown-linux-gnu/:/opt/causal/gcc-12.1.0_gcc-11.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.1.0/:/opt/causal/gcc-12.1.0_gcc-11.3.0/lib/gcc/aarch64-unknown-linux-gnu/
LIBRARY_PATH=/opt/causal/gcc-12.1.0_gcc-11.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.1.0/:/opt/causal/gcc-12.1.0_gcc-11.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.1.0/../../../../lib64/:/lib/../lib64/:/usr/lib/../lib64/:/opt/causal/gcc-12.1.0_gcc-11.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.1.0/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3' '-g' '-march=armv8.2-a+sve'
'-mlittle-endian' '-mabi=lp64' '-dumpdir' 'a.'

/opt/causal/gcc-12.1.0_gcc-11.3.0/libexec/gcc/aarch64-unknown-linux-gnu/12.1.0/collect2
-plugin
/opt/causal/gcc-12.1.0_gcc-11.3.0/libexec/gcc/aarch64-unknown-linux-gnu/12.1.0/liblto_plugin.so
-plugin-opt=/opt/causal/gcc-12.1.0_gcc-11.3.0/libexec/gcc/aarch64-unknown-linux-gnu/12.1.0/lto-wrapper
-plugin-opt=-fresolution=a.res -plugin-opt=-pass-through=-lgcc
-plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc
-plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s
--eh-frame-hdr -dynamic-linker /lib/ld-linux-aarch64.so.1 -X -EL -maarch64linux
/lib/../lib64/crt1.o /lib/../lib64/crti.o
/opt/causal/gcc-12.1.0_gcc-11.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.1.0/crtbegin.o
-L/opt/causal/gcc-12.1.0_gcc-11.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.1.0
-L/opt/causal/gcc-12.1.0_gcc-11.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.1.0/../../../../lib64
-L/lib/../lib64 -L/usr/lib/../lib64
-L/opt/causal/gcc-12.1.0_gcc-11.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.1.0/../../..
a-main.o -lgcc --push-state --as-needed -lgcc_s --pop-state -lc -lgcc
--push-state --as-needed -lgcc_s --pop-state
/opt/causal/gcc-12.1.0_gcc-11.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.1.0/crtend.o
/lib/../lib64/crtn.o
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-O3' '-g' '-march=armv8.2-a+sve'
'-mlittle-endian' '-mabi=lp64' '-dumpdir' 'a.'


* [Bug target/105922] autovectorizer does not handle fp exceptions correctly for SVE
From: pinskia at gcc dot gnu.org @ 2022-06-12 21:07 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105922

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|AArch64 SVE instruction     |autovectorizer does not
                   |generated with all SIMD     |handle fp exceptions
                   |lane active and zero-divide |correctly for SVE
                   |exception flag raised       |
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-06-12

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed. The division should have been predicated on the same predicate as
the load/store, but GCC currently does not do that.

GCC also does not really support inspecting FPU status bits or exceptions
while vectorizing.
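
For illustration, this is roughly what "predicated on the same predicate as
the load/store" looks like when written by hand with the SVE ACLE intrinsics.
It is a sketch only, not GCC's vectorizer output: scaled_reciprocal is a
hypothetical name, and the scalar fallback exists just so the snippet also
builds on targets without SVE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Computes b[i] = coef / a[i] for i in [0, n). */
void scaled_reciprocal(const float *a, float *b, size_t n, float coef)
{
    assert(a != NULL && b != NULL);
#if defined(__ARM_FEATURE_SVE)
    for (size_t i = 0; i < n; i += svcntw()) {
        /* One predicate per iteration: lanes at or beyond n are inactive. */
        svbool_t pg = svwhilelt_b32((uint64_t)i, (uint64_t)n);
        svfloat32_t va = svld1_f32(pg, a + i);
        /* The divide uses pg as well, so inactive lanes are not divided. */
        svfloat32_t vb = svdiv_f32_m(pg, svdup_n_f32(coef), va);
        svst1_f32(pg, b + i, vb);
    }
#else
    /* Scalar fallback for targets without SVE. */
    for (size_t i = 0; i < n; i++)
        b[i] = coef / a[i];
#endif
}
```

The point of the sketch is that the load, the divide, and the store all take
the same pg, so lanes past the end of the array never take part in the
division.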


* [Bug target/105922] autovectorizer does not handle fp exceptions correctly for SVE
From: rguenth at gcc dot gnu.org @ 2022-06-14  7:38 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105922

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |wrong-code

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #1)
> Confirmed. The division should have been predicated on the same as the
> load/store but currently GCC does not do that.
> 
> GCC does not really support looking into fpu status bits or exceptions while
> vectorizing either.

It effectively "supports" it by refusing to vectorize when exception state
builtins are used in the vectorized region; otherwise it just accumulates
exception bits. It does not support in-order traps if you enable exceptions
to trap.

Note that there is some confusion about what exactly controls FP exception
bit correctness, and the documentation should probably be clarified.


* [Bug target/105922] autovectorizer does not handle fp exceptions correctly for SVE
From: pinskia at gcc dot gnu.org @ 2024-02-29  5:33 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105922

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Dup of bug 96373.

*** This bug has been marked as a duplicate of bug 96373 ***
