From: "aleksei.nikiforov at linux dot ibm.com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114676] New: [12/13/14 Regression] DSE removes assignment that is used later
Date: Wed, 10 Apr 2024 10:52:52 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

            Bug ID: 114676
           Summary: [12/13/14 Regression] DSE removes assignment that is used later
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: aleksei.nikiforov at linux dot ibm.com
  Target Milestone: ---

Created attachment 57916 -->
https://gcc.gnu.org/bugzilla/attachment.cgi?id=57916&action=edit
GridSamplerKernel.cpp.ZVECTOR.cpp.o.prep2.cpp.bz2

When building pytorch on s390x with gcc >= 12, the resulting pytorch
application crashes in some tests. This doesn't happen with gcc <= 11. I've
bisected gcc, and the issue first appears with gcc commit
32955416d8040b1fa1ba21cd4179b3264e6c5bd6.

I've also found the object file in which the miscompilation happens.

gcc configuration:

/bin/sh /var/tmp/portage/sys-devel/gcc-12.3.9999/work/gcc-12.3.9999/configure
--host=s390x-ibm-linux-gnu --build=s390x-ibm-linux-gnu --prefix=/usr
--bindir=/usr/s390x-ibm-linux-gnu/gcc-bin/12
--includedir=/usr/lib/gcc/s390x-ibm-linux-gnu/12/include
--datadir=/usr/share/gcc-data/s390x-ibm-linux-gnu/12
--mandir=/usr/share/gcc-data/s390x-ibm-linux-gnu/12/man
--infodir=/usr/share/gcc-data/s390x-ibm-linux-gnu/12/info
--with-gxx-include-dir=/usr/lib/gcc/s390x-ibm-linux-gnu/12/include/g++-v12
--disable-silent-rules --disable-dependency-tracking
--with-python-dir=/share/gcc-data/s390x-ibm-linux-gnu/12/python
--enable-languages='c,c++,fortran' --enable-obsolete --enable-secureplt
--disable-werror --with-system-zlib --enable-nls --without-included-gettext
--disable-libunwind-exceptions --enable-checking=release
--with-bugurl='https://bugs.gentoo.org/'
--with-pkgversion='Gentoo 12.0.0, commit 32955416d8040b1fa1ba21cd4179b3264e6c5bd6'
--with-gcc-major-version-only --enable-libstdcxx-time --enable-lto
--disable-libstdcxx-pch --enable-shared --enable-threads=posix
--enable-__cxa_atexit --enable-clocale=gnu --disable-multilib
--disable-fixed-point --enable-libgomp --disable-libssp --disable-libada
--disable-cet --disable-systemtap --disable-valgrind-annotations
--disable-vtable-verify --disable-libvtv --without-zstd --without-isl
--disable-libsanitizer --enable-default-pie --enable-default-ssp
--with-arch=z15

I'm attaching the preprocessed file.

The full compilation command is:

/usr/bin/g++-12 -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB
-DFMT_HEADER_ONLY=1 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1
-DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS
-DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch
-DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_RPC
-DUSE_TENSORPIPE -D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS
-I/home/user/work12/pytorch/build/aten/src
-I/home/user/work12/pytorch/aten/src
-I/home/user/work12/pytorch/build
-I/home/user/work12/pytorch
-I/home/user/work12/pytorch/cmake/../third_party/benchmark/include
-I/home/user/work12/pytorch/third_party/onnx
-I/home/user/work12/pytorch/build/third_party/onnx
-I/home/user/work12/pytorch/third_party/foxi
-I/home/user/work12/pytorch/build/third_party/foxi
-I/home/user/work12/pytorch/torch/csrc/api
-I/home/user/work12/pytorch/torch/csrc/api/include
-I/home/user/work12/pytorch/caffe2/aten/src/TH
-I/home/user/work12/pytorch/build/caffe2/aten/src/TH
-I/home/user/work12/pytorch/build/caffe2/aten/src
-I/home/user/work12/pytorch/build/caffe2/../aten/src
-I/home/user/work12/pytorch/torch/csrc
-I/home/user/work12/pytorch/third_party/miniz-2.1.0
-I/home/user/work12/pytorch/third_party/kineto/libkineto/include
-I/home/user/work12/pytorch/third_party/kineto/libkineto/src
-I/home/user/work12/pytorch/aten/src/ATen/..
-I/home/user/work12/pytorch/c10/..
-I/home/user/work12/pytorch/third_party/FP16/include
-I/home/user/work12/pytorch/third_party/tensorpipe
-I/home/user/work12/pytorch/build/third_party/tensorpipe
-I/home/user/work12/pytorch/third_party/tensorpipe/third_party/libnop/include
-I/home/user/work12/pytorch/third_party/fmt/include
-I/home/user/work12/pytorch/third_party/flatbuffers/include
-isystem /home/user/work12/pytorch/build/third_party/gloo
-isystem /home/user/work12/pytorch/cmake/../third_party/gloo
-isystem /home/user/work12/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include
-isystem /home/user/work12/pytorch/cmake/../third_party/googletest/googlemock/include
-isystem /home/user/work12/pytorch/cmake/../third_party/googletest/googletest/include
-isystem /home/user/work12/pytorch/third_party/protobuf/src
-isystem /home/user/work12/pytorch/cmake/../third_party/eigen
-isystem /home/user/work12/pytorch/build/include
-march=z15 -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DNDEBUG
-DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER
-DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Wnarrowing
-Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds
-Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function
-Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing
-Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic
-Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always
-faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized
-fno-math-errno -fno-trapping-math -Wno-stringop-overflow
-DHAVE_ZVECTOR_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -std=gnu++17 -fPIC
-DTORCH_USE_LIBUV -DCAFFE2_USE_GLOO -Wall -Wextra -Wdeprecated
-Wno-unused-parameter -Wno-unused-function -Wno-missing-field-initializers
-Wno-unknown-pragmas -Wno-type-limits -Wno-array-bounds -Wno-strict-overflow
-Wno-strict-aliasing -Wno-maybe-uninitialized -fvisibility=hidden -O2
-fopenmp -O3 -mvx -mzvector -march=z15
-mtune=z15 -DCPU_CAPABILITY=ZVECTOR -DCPU_CAPABILITY_ZVECTOR -MD
-MT caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/GridSamplerKernel.cpp.ZVECTOR.cpp.o
-MF caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/GridSamplerKernel.cpp.ZVECTOR.cpp.o.d
-o caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/GridSamplerKernel.cpp.ZVECTOR.cpp.o
-c /home/user/work12/pytorch/build/aten/src/ATen/native/cpu/GridSamplerKernel.cpp.ZVECTOR.cpp

The preprocessed file contains the following lines around line 121590:

      integer_t mask_arr[iVec::size()];
      mask.store(mask_arr);
      scalar_t gInp_corner_arr[Vec::size()];
      delta.store(gInp_corner_arr);
      mask_scatter_add(gInp_corner_arr, data, i_gInp_offset_arr, mask_arr, len);

The store call (lines 117929-117940):

  void __attribute__((__always_inline__)) inline store(void* ptr, int count = size()) const {
    if (count == size()) {
# 421 "/home/user/work12/pytorch/aten/src/ATen/cpu/vec/vec256/zarch/vec256_zarch.h" 3 4
      __builtin_s390_vec_xst
# 421 "/home/user/work12/pytorch/aten/src/ATen/cpu/vec/vec256/zarch/vec256_zarch.h"
      (_vec0, offset0, reinterpret_cast(ptr));
# 422 "/home/user/work12/pytorch/aten/src/ATen/cpu/vec/vec256/zarch/vec256_zarch.h" 3 4
      __builtin_s390_vec_xst
# 422 "/home/user/work12/pytorch/aten/src/ATen/cpu/vec/vec256/zarch/vec256_zarch.h"
      (_vec1, offset16, reinterpret_cast(ptr));

mask.store(mask_arr) is first replaced by the two corresponding calls to
__builtin_s390_vec_xst, and those are later incorrectly removed by DSE, even
though mask_arr is subsequently read by mask_scatter_add.

I've also run the compilation command with -fdump-tree-all-all -fdump-rtl-all-all.
In the file *.040t.dse1 I found the following lines:

;; Function at::native::{anonymous}::ApplyGridSample::add_value_bounded
(_ZNK2at6native12_GLOBAL__N_115ApplyGridSampleIdLi2ELNS0_6detail24GridSamplerInterpolationE2ELNS3_18GridSamplerPaddingE1ELb1EE17add_value_boundedEPdlRKNS_3vec7ZVECTOR10VectorizedIdvEESD_SD_,
funcdef_no=13629, decl_uid=274419, cgraph_uid=8075, symbol_order=9478)

Pass statistics of "dse": ----------------

  Deleted dead store: # .MEM_369 = VDEF <.MEM_368>
MEM [(ElementTypeD.254545 *)&mask_arrD.383153 + 16B] = _244;

  Deleted dead store: # .MEM_368 = VDEF <.MEM_360>
MEM [(ElementTypeD.254545 *)&mask_arrD.383153] = _242;