public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/114676] New: [12/13/14 Regression] DSE removes assignment that is used later
@ 2024-04-10 10:52 aleksei.nikiforov at linux dot ibm.com
  2024-04-10 11:06 ` [Bug target/114676] " pinskia at gcc dot gnu.org
                   ` (17 more replies)
  0 siblings, 18 replies; 19+ messages in thread
From: aleksei.nikiforov at linux dot ibm.com @ 2024-04-10 10:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

            Bug ID: 114676
           Summary: [12/13/14 Regression] DSE removes assignment that is
                    used later
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: aleksei.nikiforov at linux dot ibm.com
  Target Milestone: ---

Created attachment 57916
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57916&action=edit
GridSamplerKernel.cpp.ZVECTOR.cpp.o.prep2.cpp.bz2

When building pytorch on s390x with gcc >= 12, resulting pytorch application
crashes in some tests. It doesn't happen with gcc <= 11. I've bisected gcc, and
issue first appears with gcc commit 32955416d8040b1fa1ba21cd4179b3264e6c5bd6.
I've also found in which object file miscompilation happens.

gcc configuration:
/bin/sh /var/tmp/portage/sys-devel/gcc-12.3.9999/work/gcc-12.3.9999/configure
--host=s390x-ibm-linux-gnu --build=s390x-ibm-linux-gnu --prefix=/usr
--bindir=/usr/s390x-ibm-linux-gnu/gcc-bin/12
--includedir=/usr/lib/gcc/s390x-ibm-linux-gnu/12/include
--datadir=/usr/share/gcc-data/s390x-ibm-linux-gnu/12
--mandir=/usr/share/gcc-data/s390x-ibm-linux-gnu/12/man
--infodir=/usr/share/gcc-data/s390x-ibm-linux-gnu/12/info
--with-gxx-include-dir=/usr/lib/gcc/s390x-ibm-linux-gnu/12/include/g++-v12
--disable-silent-rules --disable-dependency-tracking
--with-python-dir=/share/gcc-data/s390x-ibm-linux-gnu/12/python
--enable-languages='c,c++,fortran' --enable-obsolete --enable-secureplt
--disable-werror --with-system-zlib --enable-nls --without-included-gettext
--disable-libunwind-exceptions --enable-checking=release
--with-bugurl='https://bugs.gentoo.org/' --with-pkgversion='Gentoo 12.0.0,
commit 32955416d8040b1fa1ba21cd4179b3264e6c5bd6' --with-gcc-major-version-only
--enable-libstdcxx-time --enable-lto --disable-libstdcxx-pch --enable-shared
--enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
--disable-multilib --disable-fixed-point --enable-libgomp --disable-libssp
--disable-libada --disable-cet --disable-systemtap
--disable-valgrind-annotations --disable-vtable-verify --disable-libvtv
--without-zstd --without-isl --disable-libsanitizer --enable-default-pie
--enable-default-ssp --with-arch=z15

I'm attaching preprocessed file. Full compilation command is:
/usr/bin/g++-12 -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB
-DFMT_HEADER_ONLY=1 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1
-DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS
-DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_C10D_GLOO
-DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_RPC -DUSE_TENSORPIPE
-D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS
-I/home/user/work12/pytorch/build/aten/src -I/home/user/work12/pytorch/aten/src
-I/home/user/work12/pytorch/build -I/home/user/work12/pytorch
-I/home/user/work12/pytorch/cmake/../third_party/benchmark/include
-I/home/user/work12/pytorch/third_party/onnx
-I/home/user/work12/pytorch/build/third_party/onnx
-I/home/user/work12/pytorch/third_party/foxi
-I/home/user/work12/pytorch/build/third_party/foxi
-I/home/user/work12/pytorch/torch/csrc/api
-I/home/user/work12/pytorch/torch/csrc/api/include
-I/home/user/work12/pytorch/caffe2/aten/src/TH
-I/home/user/work12/pytorch/build/caffe2/aten/src/TH
-I/home/user/work12/pytorch/build/caffe2/aten/src
-I/home/user/work12/pytorch/build/caffe2/../aten/src
-I/home/user/work12/pytorch/torch/csrc
-I/home/user/work12/pytorch/third_party/miniz-2.1.0
-I/home/user/work12/pytorch/third_party/kineto/libkineto/include
-I/home/user/work12/pytorch/third_party/kineto/libkineto/src
-I/home/user/work12/pytorch/aten/src/ATen/.. -I/home/user/work12/pytorch/c10/..
-I/home/user/work12/pytorch/third_party/FP16/include
-I/home/user/work12/pytorch/third_party/tensorpipe
-I/home/user/work12/pytorch/build/third_party/tensorpipe
-I/home/user/work12/pytorch/third_party/tensorpipe/third_party/libnop/include
-I/home/user/work12/pytorch/third_party/fmt/include
-I/home/user/work12/pytorch/third_party/flatbuffers/include -isystem
/home/user/work12/pytorch/build/third_party/gloo -isystem
/home/user/work12/pytorch/cmake/../third_party/gloo -isystem
/home/user/work12/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include
-isystem
/home/user/work12/pytorch/cmake/../third_party/googletest/googlemock/include
-isystem
/home/user/work12/pytorch/cmake/../third_party/googletest/googletest/include
-isystem /home/user/work12/pytorch/third_party/protobuf/src -isystem
/home/user/work12/pytorch/cmake/../third_party/eigen -isystem
/home/user/work12/pytorch/build/include -march=z15 -D_GLIBCXX_USE_CXX11_ABI=1
-fvisibility-inlines-hidden -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI
-DLIBKINETO_NOROCTRACER -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall
-Wextra -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits
-Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter
-Wno-unused-function -Wno-unused-result -Wno-strict-overflow
-Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi
-Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces
-fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable
-Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math
-Wno-stringop-overflow -DHAVE_ZVECTOR_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG
-std=gnu++17 -fPIC -DTORCH_USE_LIBUV -DCAFFE2_USE_GLOO -Wall -Wextra
-Wdeprecated -Wno-unused-parameter -Wno-unused-function
-Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-type-limits
-Wno-array-bounds -Wno-strict-overflow -Wno-strict-aliasing
-Wno-maybe-uninitialized -fvisibility=hidden -O2 -fopenmp -O3    -mvx -mzvector
-march=z15 -mtune=z15 -DCPU_CAPABILITY=ZVECTOR -DCPU_CAPABILITY_ZVECTOR -MD -MT
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/GridSamplerKernel.cpp.ZVECTOR.cpp.o
-MF
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/GridSamplerKernel.cpp.ZVECTOR.cpp.o.d
-o
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/GridSamplerKernel.cpp.ZVECTOR.cpp.o
-c
/home/user/work12/pytorch/build/aten/src/ATen/native/cpu/GridSamplerKernel.cpp.ZVECTOR.cpp


There are following lines in file around line 121590:

    integer_t mask_arr[iVec::size()];
    mask.store(mask_arr);


    scalar_t gInp_corner_arr[Vec::size()];
    delta.store(gInp_corner_arr);

    mask_scatter_add(gInp_corner_arr, data, i_gInp_offset_arr, mask_arr, len);

store call (lines 117929-117940):
  void __attribute__((__always_inline__)) inline store(void* ptr, int count =
size()) const {
    if (count == size()) {

# 421
"/home/user/work12/pytorch/aten/src/ATen/cpu/vec/vec256/zarch/vec256_zarch.h" 3
4
     __builtin_s390_vec_xst
# 421
"/home/user/work12/pytorch/aten/src/ATen/cpu/vec/vec256/zarch/vec256_zarch.h"
            (_vec0, offset0, reinterpret_cast<ElementType*>(ptr));

# 422
"/home/user/work12/pytorch/aten/src/ATen/cpu/vec/vec256/zarch/vec256_zarch.h" 3
4
     __builtin_s390_vec_xst
# 422
"/home/user/work12/pytorch/aten/src/ATen/cpu/vec/vec256/zarch/vec256_zarch.h"
            (_vec1, offset16, reinterpret_cast<ElementType*>(ptr));

mask.store(mask_arr) is first replaced by 2 corresponding calls to
__builtin_s390_vec_xst, and those are later incorrectly removed by DSE.

I've also ran compilation command with -fdump-tree-all-all -fdump-rtl-all-all.
In file *.040t.dse1 I've found following lines:

;; Function at::native::{anonymous}::ApplyGridSample<double, 2,
at::native::detail::GridSamplerInterpolation::Bicubic,
at::native::detail::GridSamplerPadding::Border, true>::add_value_bounded
(_ZNK2at6nat
ive12_GLOBAL__N_115ApplyGridSampleIdLi2ELNS0_6detail24GridSamplerInterpolationE2ELNS3_18GridSamplerPaddingE1ELb1EE17add_value_boundedEPdlRKNS_3vec7ZVECTOR10VectorizedIdvEESD_SD_,
funcdef_no=13629, decl_ui
d=274419, cgraph_uid=8075, symbol_order=9478)


Pass statistics of "dse": ----------------

  Deleted dead store: # .MEM_369 = VDEF <.MEM_368>
MEM <const vtypeD.254540> [(ElementTypeD.254545 *)&mask_arrD.383153 + 16B] =
_244;

  Deleted dead store: # .MEM_368 = VDEF <.MEM_360>
MEM <const vtypeD.254540> [(ElementTypeD.254545 *)&mask_arrD.383153] = _242;

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
  2024-04-10 10:52 [Bug tree-optimization/114676] New: [12/13/14 Regression] DSE removes assignment that is used later aleksei.nikiforov at linux dot ibm.com
@ 2024-04-10 11:06 ` pinskia at gcc dot gnu.org
  2024-04-10 11:22 ` aleksei.nikiforov at linux dot ibm.com
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-04-10 11:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |wrong-code
          Component|tree-optimization           |target
   Target Milestone|---                         |12.4

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
        /* Build a vector type with the alignment of the target
           location in order to enable correct alignment hints to be
           generated for vst.  */
        tree mem_type = build_aligned_type (TREE_TYPE((*arglist)[0]),
                                            TYPE_ALIGN (TREE_TYPE (TREE_TYPE
((*arglist)[2]))));
        return build2 (MODIFY_EXPR, mem_type,
                       build1 (INDIRECT_REF, mem_type,
                               fold_build_pointer_plus ((*arglist)[2],
(*arglist)[1])),
                       (*arglist)[0]);


Does -fno-strict-aliasing help? I wonder if the above (which is
S390_OVERLOADED_BUILTIN_s390_vec_xst from s390_expand_overloaded_builtin)
should be have an may_alias variant .

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
  2024-04-10 10:52 [Bug tree-optimization/114676] New: [12/13/14 Regression] DSE removes assignment that is used later aleksei.nikiforov at linux dot ibm.com
  2024-04-10 11:06 ` [Bug target/114676] " pinskia at gcc dot gnu.org
@ 2024-04-10 11:22 ` aleksei.nikiforov at linux dot ibm.com
  2024-04-10 11:25 ` pinskia at gcc dot gnu.org
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: aleksei.nikiforov at linux dot ibm.com @ 2024-04-10 11:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #2 from Aleksei Nikiforov <aleksei.nikiforov at linux dot ibm.com> ---
Yes, -fno-strict-aliasing helped.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
  2024-04-10 10:52 [Bug tree-optimization/114676] New: [12/13/14 Regression] DSE removes assignment that is used later aleksei.nikiforov at linux dot ibm.com
  2024-04-10 11:06 ` [Bug target/114676] " pinskia at gcc dot gnu.org
  2024-04-10 11:22 ` aleksei.nikiforov at linux dot ibm.com
@ 2024-04-10 11:25 ` pinskia at gcc dot gnu.org
  2024-04-10 11:28 ` aleksei.nikiforov at linux dot ibm.com
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-04-10 11:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Aleksei Nikiforov from comment #2)
> Yes, -fno-strict-aliasing helped.

Then the issue is in s390_expand_overloaded_builtin where it should be more
aliasing friendly. and it is a target issue ...

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
  2024-04-10 10:52 [Bug tree-optimization/114676] New: [12/13/14 Regression] DSE removes assignment that is used later aleksei.nikiforov at linux dot ibm.com
                   ` (2 preceding siblings ...)
  2024-04-10 11:25 ` pinskia at gcc dot gnu.org
@ 2024-04-10 11:28 ` aleksei.nikiforov at linux dot ibm.com
  2024-04-10 11:33 ` pinskia at gcc dot gnu.org
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: aleksei.nikiforov at linux dot ibm.com @ 2024-04-10 11:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #4 from Aleksei Nikiforov <aleksei.nikiforov at linux dot ibm.com> ---
Yes, it doesn't reproduce on x86_64, and previously getting rid of -O3 or using
-fno-tree-dse also helped.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
  2024-04-10 10:52 [Bug tree-optimization/114676] New: [12/13/14 Regression] DSE removes assignment that is used later aleksei.nikiforov at linux dot ibm.com
                   ` (3 preceding siblings ...)
  2024-04-10 11:28 ` aleksei.nikiforov at linux dot ibm.com
@ 2024-04-10 11:33 ` pinskia at gcc dot gnu.org
  2024-04-10 13:40 ` jakub at gcc dot gnu.org
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-04-10 11:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
> I've bisected gcc, and issue first appears with gcc commit
> 32955416d8040b1fa1ba21cd4179b3264e6c5bd6.

This just improved DSE but I suspect there might be a way to reproduce it
before that.

Note the rs6000 backend works correctly with vec_sxt/vec_xl because it builds a
MEM_REF directly instead of an INDIRECT_REF.
```
        /* Since arg1 may be cast to a different type, just use ptr_type_node
           here instead of trying to enforce TBAA on pointer types.  */
        tree arg1_type = ptr_type_node;

...
        /* Use the build2 helper to set up the mem_ref.  The MEM_REF could also
           take an offset, but since we've already incorporated the offset
           above, here we just pass in a zero.  */
        gimple *g
          = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, aligned_addr,
                                              build_int_cst (arg1_type, 0)));

```
arg1_type is used for the aliasing set ...

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
  2024-04-10 10:52 [Bug tree-optimization/114676] New: [12/13/14 Regression] DSE removes assignment that is used later aleksei.nikiforov at linux dot ibm.com
                   ` (4 preceding siblings ...)
  2024-04-10 11:33 ` pinskia at gcc dot gnu.org
@ 2024-04-10 13:40 ` jakub at gcc dot gnu.org
  2024-04-10 16:47 ` rguenth at gcc dot gnu.org
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-04-10 13:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |iii at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org,
                   |                            |krebbel at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #1)
>         /* Build a vector type with the alignment of the target
>            location in order to enable correct alignment hints to be
>            generated for vst.  */
>         tree mem_type = build_aligned_type (TREE_TYPE((*arglist)[0]),
>                                             TYPE_ALIGN (TREE_TYPE (TREE_TYPE
> ((*arglist)[2]))));
>         return build2 (MODIFY_EXPR, mem_type,
>                        build1 (INDIRECT_REF, mem_type,
>                                fold_build_pointer_plus ((*arglist)[2],
> (*arglist)[1])),
>                        (*arglist)[0]);
> 
> 
> Does -fno-strict-aliasing help? I wonder if the above (which is
> S390_OVERLOADED_BUILTIN_s390_vec_xst from s390_expand_overloaded_builtin)
> should be have an may_alias variant .

Building MEM_REF is one option, another one is to cast the operand of
INDIRECT_REF to
build_pointer_type (mem_type, VOIDmode, true) type.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
  2024-04-10 10:52 [Bug tree-optimization/114676] New: [12/13/14 Regression] DSE removes assignment that is used later aleksei.nikiforov at linux dot ibm.com
                   ` (5 preceding siblings ...)
  2024-04-10 13:40 ` jakub at gcc dot gnu.org
@ 2024-04-10 16:47 ` rguenth at gcc dot gnu.org
  2024-04-11 12:30 ` krebbel at gcc dot gnu.org
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-04-10 16:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
The question is what the intrinsic constraints are on TBAA.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
  2024-04-10 10:52 [Bug tree-optimization/114676] New: [12/13/14 Regression] DSE removes assignment that is used later aleksei.nikiforov at linux dot ibm.com
                   ` (6 preceding siblings ...)
  2024-04-10 16:47 ` rguenth at gcc dot gnu.org
@ 2024-04-11 12:30 ` krebbel at gcc dot gnu.org
  2024-04-11 12:35 ` jakub at gcc dot gnu.org
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: krebbel at gcc dot gnu.org @ 2024-04-11 12:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #8 from Andreas Krebbel <krebbel at gcc dot gnu.org> ---
Apparently, I decided to go with a MEM_REF already for the load variant of the
builtin - vec_xl. I've to check whether there was any reason not to do this
also for vec_xst.

Making it a pointer which aliases everything might be too big of a hammer I
guess?!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
  2024-04-10 10:52 [Bug tree-optimization/114676] New: [12/13/14 Regression] DSE removes assignment that is used later aleksei.nikiforov at linux dot ibm.com
                   ` (7 preceding siblings ...)
  2024-04-11 12:30 ` krebbel at gcc dot gnu.org
@ 2024-04-11 12:35 ` jakub at gcc dot gnu.org
  2024-04-11 12:36 ` jakub at gcc dot gnu.org
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-04-11 12:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
It depends on what the vec_xl*/vec_xst* documentation requires/user expect.
If the expectation is that the loads/stores should alias the scalar pointee of
the pointer type passed to those intrinsics, then
--- gcc/config/s390/s390-c.cc.jj        2024-01-03 11:51:54.847407588 +0100
+++ gcc/config/s390/s390-c.cc   2024-04-10 16:30:31.896197832 +0200
@@ -498,8 +498,8 @@ s390_expand_overloaded_builtin (location
        /* Build a vector type with the alignment of the source
           location in order to enable correct alignment hints to be
           generated for vl.  */
-       tree mem_type = build_aligned_type (return_type,
-                                           TYPE_ALIGN (TREE_TYPE (TREE_TYPE
((*arglist)[1]))));
+       unsigned align = TYPE_ALIGN (TREE_TYPE (TREE_TYPE ((*arglist)[1])));
+       tree mem_type = build_aligned_type (return_type, align);
        return build2 (MEM_REF, mem_type,
                       fold_build_pointer_plus ((*arglist)[1], (*arglist)[0]),
                       build_int_cst (TREE_TYPE ((*arglist)[1]), 0));
@@ -511,11 +511,13 @@ s390_expand_overloaded_builtin (location
        /* Build a vector type with the alignment of the target
           location in order to enable correct alignment hints to be
           generated for vst.  */
-       tree mem_type = build_aligned_type (TREE_TYPE((*arglist)[0]),
-                                           TYPE_ALIGN (TREE_TYPE (TREE_TYPE
((*arglist)[2]))));
+       unsigned align = TYPE_ALIGN (TREE_TYPE (TREE_TYPE ((*arglist)[2])));
+       tree mem_type = build_aligned_type (TREE_TYPE ((*arglist)[0]), align);
        return build2 (MODIFY_EXPR, mem_type,
-                      build1 (INDIRECT_REF, mem_type,
-                              fold_build_pointer_plus ((*arglist)[2],
(*arglist)[1])),
+                      build2 (MEM_REF, mem_type,
+                              fold_build_pointer_plus ((*arglist)[2],
+                                                       (*arglist)[1]),
+                              build_int_cst (TREE_TYPE ((*arglist)[2]), 0)),
                       (*arglist)[0]);
       }
     case S390_OVERLOADED_BUILTIN_s390_vec_load_pair:
might be all that is needed (if it is needed at all).
If the expectation is that one can read what has been written by those
intrinsics or write
what will be read by those intrinsics though unrelated effective pointers, then
it should alias everything,
which could be say just using ptr_type_node instead of the current types in
both of the build_int_cst (..., 0) calls above.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
  2024-04-10 10:52 [Bug tree-optimization/114676] New: [12/13/14 Regression] DSE removes assignment that is used later aleksei.nikiforov at linux dot ibm.com
                   ` (8 preceding siblings ...)
  2024-04-11 12:35 ` jakub at gcc dot gnu.org
@ 2024-04-11 12:36 ` jakub at gcc dot gnu.org
  2024-04-11 15:48 ` krebbel at gcc dot gnu.org
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-04-11 12:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I admit I haven't studied what exactly pytorch does there.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
  2024-04-10 10:52 [Bug tree-optimization/114676] New: [12/13/14 Regression] DSE removes assignment that is used later aleksei.nikiforov at linux dot ibm.com
                   ` (9 preceding siblings ...)
  2024-04-11 12:36 ` jakub at gcc dot gnu.org
@ 2024-04-11 15:48 ` krebbel at gcc dot gnu.org
  2024-04-12 13:32 ` law at gcc dot gnu.org
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: krebbel at gcc dot gnu.org @ 2024-04-11 15:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #11 from Andreas Krebbel <krebbel at gcc dot gnu.org> ---
The documentation of vec_xl and vec_xst doesn't seem to mention anything
special with regard to that. So I understand the memory is only accessed
through pointers which are compatible to the ones used when invoking the
builtin.

That particular usage within pytorch looks ok to me.

I'm already testing a patch which matches what you are proposing. I hope to be
able to reduce the testcase somewhat.

Thanks for your help!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
  2024-04-10 10:52 [Bug tree-optimization/114676] New: [12/13/14 Regression] DSE removes assignment that is used later aleksei.nikiforov at linux dot ibm.com
                   ` (10 preceding siblings ...)
  2024-04-11 15:48 ` krebbel at gcc dot gnu.org
@ 2024-04-12 13:32 ` law at gcc dot gnu.org
  2024-04-12 18:36 ` sjames at gcc dot gnu.org
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: law at gcc dot gnu.org @ 2024-04-12 13:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

Jeffrey A. Law <law at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
                 CC|                            |law at gcc dot gnu.org
           Priority|P3                          |P1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2024-04-12

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
  2024-04-10 10:52 [Bug tree-optimization/114676] New: [12/13/14 Regression] DSE removes assignment that is used later aleksei.nikiforov at linux dot ibm.com
                   ` (11 preceding siblings ...)
  2024-04-12 13:32 ` law at gcc dot gnu.org
@ 2024-04-12 18:36 ` sjames at gcc dot gnu.org
  2024-04-17 17:01 ` krebbel at gcc dot gnu.org
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: sjames at gcc dot gnu.org @ 2024-04-12 18:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

Sam James <sjames at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P1                          |P2

--- Comment #12 from Sam James <sjames at gcc dot gnu.org> ---
P1->P2 after IRC discussion (we released with it & unclear what the intrinsic
behaviour is even supposed to be wrt tbaa).

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
  2024-04-10 10:52 [Bug tree-optimization/114676] New: [12/13/14 Regression] DSE removes assignment that is used later aleksei.nikiforov at linux dot ibm.com
                   ` (12 preceding siblings ...)
  2024-04-12 18:36 ` sjames at gcc dot gnu.org
@ 2024-04-17 17:01 ` krebbel at gcc dot gnu.org
  2024-04-17 17:34 ` jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: krebbel at gcc dot gnu.org @ 2024-04-17 17:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #13 from Andreas Krebbel <krebbel at gcc dot gnu.org> ---
We will go and fix PyTorch instead. Although it is not clearly documented, the
way PyTorch uses the builtin right now is probably not what was intended. It is
pretty clear that the element type pointer needs to alias vectors of the same
element type, but there is no saying about aliasing everything.

I'm just wondering how to improve the diagnostics in our backend to catch this.
The example below is similar to what PyTorch does today. Casting mem to
(float*) prevents our builtin code from complaining about the type mismatch and
by that opens the door for the much harder to debug TBAA problem.

#include <vecintrin.h>

void __attribute__((noinline)) foo (int *mem)
{
  vec_xst ((vector float){ 1.0f, 2.0f, 3.0f, 4.0f }, 0, (float*)mem);
}

int
main ()
{
  int m[4] = { 0 };
  foo (m);
  if (m[3] == 0)
    __builtin_abort ();
  return 0;
}

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
  2024-04-10 10:52 [Bug tree-optimization/114676] New: [12/13/14 Regression] DSE removes assignment that is used later aleksei.nikiforov at linux dot ibm.com
                   ` (13 preceding siblings ...)
  2024-04-17 17:01 ` krebbel at gcc dot gnu.org
@ 2024-04-17 17:34 ` jakub at gcc dot gnu.org
  2024-04-18 14:29 ` aleksei.nikiforov at linux dot ibm.com
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-04-17 17:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Andreas Krebbel from comment #13)
> We will go and fix PyTorch instead. Although it is not clearly documented,
> the way PyTorch uses the builtin right now is probably not what was
> intended. It is pretty clear that the element type pointer needs to alias
> vectors of the same element type, but there is no saying about aliasing
> everything.
> 
> I'm just wondering how to improve the diagnostics in our backend to catch
> this. The example below is similar to what PyTorch does today. Casting mem
> to (float*) prevents our builtin code from complaining about the type
> mismatch and by that opens the door for the much harder to debug TBAA
> problem.

We need a TBAA analyzer among sanitizers (but writing it is really hard).

> #include <vecintrin.h>
> 
> void __attribute__((noinline)) foo (int *mem)
> {
>   vec_xst ((vector float){ 1.0f, 2.0f, 3.0f, 4.0f }, 0, (float*)mem);

So use
  *(vector float __attribute__((__may_alias__)) *)mem = (vector float){ 1.0f,
2.0f, 3.0f, 4.0f };
instead?  Sure, GCC extension, not an intrinsic in that case...

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
  2024-04-10 10:52 [Bug tree-optimization/114676] New: [12/13/14 Regression] DSE removes assignment that is used later aleksei.nikiforov at linux dot ibm.com
                   ` (14 preceding siblings ...)
  2024-04-17 17:34 ` jakub at gcc dot gnu.org
@ 2024-04-18 14:29 ` aleksei.nikiforov at linux dot ibm.com
  2024-04-22  9:37 ` krebbel at gcc dot gnu.org
  2024-04-23  8:17 ` cvs-commit at gcc dot gnu.org
  17 siblings, 0 replies; 19+ messages in thread
From: aleksei.nikiforov at linux dot ibm.com @ 2024-04-18 14:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #15 from Aleksei Nikiforov <aleksei.nikiforov at linux dot ibm.com> ---
I think fixing compiled code should be possible. I'm not sure if this bug
should be just closed.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
  2024-04-10 10:52 [Bug tree-optimization/114676] New: [12/13/14 Regression] DSE removes assignment that is used later aleksei.nikiforov at linux dot ibm.com
                   ` (15 preceding siblings ...)
  2024-04-18 14:29 ` aleksei.nikiforov at linux dot ibm.com
@ 2024-04-22  9:37 ` krebbel at gcc dot gnu.org
  2024-04-23  8:17 ` cvs-commit at gcc dot gnu.org
  17 siblings, 0 replies; 19+ messages in thread
From: krebbel at gcc dot gnu.org @ 2024-04-22  9:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #16 from Andreas Krebbel <krebbel at gcc dot gnu.org> ---
(In reply to Aleksei Nikiforov from comment #15)
> I think fixing compiled code should be possible. I'm not sure if this bug
> should be just closed.

In addition to fixing the PyTorch usage of the builtin, I also plan to change
GCC to the "alias everything" approach now. Although the documentation does not
strictly requires us to, it prevents other users from falling into the same
trap and makes GCC to match what Clang already does. The documentation anyway
discourages everyone from using these builtins. So it should not be a big deal,
if we sacrifice a bit of performance to make it more robust.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
  2024-04-10 10:52 [Bug tree-optimization/114676] New: [12/13/14 Regression] DSE removes assignment that is used later aleksei.nikiforov at linux dot ibm.com
                   ` (16 preceding siblings ...)
  2024-04-22  9:37 ` krebbel at gcc dot gnu.org
@ 2024-04-23  8:17 ` cvs-commit at gcc dot gnu.org
  17 siblings, 0 replies; 19+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-04-23  8:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #17 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Andreas Krebbel <krebbel@gcc.gnu.org>:

https://gcc.gnu.org/g:42189f21b22c43ac8ab46edf5f6a7b4d99bc86a5

commit r14-10087-g42189f21b22c43ac8ab46edf5f6a7b4d99bc86a5
Author: Andreas Krebbel <krebbel@linux.ibm.com>
Date:   Tue Apr 23 10:05:46 2024 +0200

    s390x: Fix vec_xl/vec_xst type aliasing [PR114676]

    The requirements of the vec_xl/vec_xst intrinsincs wrt aliasing of the
    pointer argument are not really documented.  As it turns out, users
    are likely to get it wrong.  With this patch we let the pointer
    argument alias everything in order to make it more robust for users.

    gcc/ChangeLog:

            PR target/114676
            * config/s390/s390-c.cc (s390_expand_overloaded_builtin): Use a
            MEM_REF with an addend of type ptr_type_node.

    gcc/testsuite/ChangeLog:

            PR target/114676
            * gcc.target/s390/zvector/pr114676.c: New test.

    Suggested-by: Jakub Jelinek <jakub@redhat.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2024-04-23  8:17 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-10 10:52 [Bug tree-optimization/114676] New: [12/13/14 Regression] DSE removes assignment that is used later aleksei.nikiforov at linux dot ibm.com
2024-04-10 11:06 ` [Bug target/114676] " pinskia at gcc dot gnu.org
2024-04-10 11:22 ` aleksei.nikiforov at linux dot ibm.com
2024-04-10 11:25 ` pinskia at gcc dot gnu.org
2024-04-10 11:28 ` aleksei.nikiforov at linux dot ibm.com
2024-04-10 11:33 ` pinskia at gcc dot gnu.org
2024-04-10 13:40 ` jakub at gcc dot gnu.org
2024-04-10 16:47 ` rguenth at gcc dot gnu.org
2024-04-11 12:30 ` krebbel at gcc dot gnu.org
2024-04-11 12:35 ` jakub at gcc dot gnu.org
2024-04-11 12:36 ` jakub at gcc dot gnu.org
2024-04-11 15:48 ` krebbel at gcc dot gnu.org
2024-04-12 13:32 ` law at gcc dot gnu.org
2024-04-12 18:36 ` sjames at gcc dot gnu.org
2024-04-17 17:01 ` krebbel at gcc dot gnu.org
2024-04-17 17:34 ` jakub at gcc dot gnu.org
2024-04-18 14:29 ` aleksei.nikiforov at linux dot ibm.com
2024-04-22  9:37 ` krebbel at gcc dot gnu.org
2024-04-23  8:17 ` cvs-commit at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).