public inbox for gcc-bugs@sourceware.org
* [Bug lto/102649] New: GCC 9.3.1 LTO bug -- incorrect function call, bad stack arguments pushed
@ 2021-10-08 13:38 davidhaufegcc at gmail dot com
  2021-10-09  1:49 ` [Bug lto/102649] " pinskia at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: davidhaufegcc at gmail dot com @ 2021-10-08 13:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102649

            Bug ID: 102649
           Summary: GCC 9.3.1 LTO bug -- incorrect function call, bad
                    stack arguments pushed
           Product: gcc
           Version: 9.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: lto
          Assignee: unassigned at gcc dot gnu.org
          Reporter: davidhaufegcc at gmail dot com
                CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Hello,
We observed incorrect application behavior in a large binary built using LTO.
We identified the issue by single-stepping through the binary's assembly.
We have a function with 21 parameters. The function is called from many
call sites. At the call site that is not working properly, the C++ caller
passes a hard-coded integer '0' for a parameter that is passed on the stack
(i.e. not in a register). Under LTO, GCC ends up generating two versions of
the called function: one that takes this integer parameter, and one that
optimizes away the need for this integer to be passed at all, since it is a
hard-coded 0.

The issue is that the caller is still pushing the integer 0 function parameter
onto the stack. The callee does not expect the caller to have done this, so it
reads its stack arguments from the wrong locations: each one is offset by this
extra stack argument.

This issue was complicated to track down because, some time later in our
codebase, unrelated classes/files in the same static library as the caller were
touched, and the bug has since stopped. Rolling back in Git, we can reproduce
the bug across about 10 check-ins of unrelated code, after which an unrelated
change causes the bug to stop: GCC then generates the proper argument-passing
stack for the optimized function.

Compile flag investigation:
All builds were done with -O3 -flto -fno-fat-lto-objects -ffast-math
-funroll-loops
Disabling LTO -- bug does not present itself
With LTO on, we decomposed -ffast-math into its individual flags. If we leave
all -ffast-math flags on but disable -freciprocal-math, the bug does not
present itself. The code in question doesn't have any division anywhere around
it.

We speculate that disabling -freciprocal-math, or the codebase generally
changing, made the bug disappear simply because it changes the global state of
the compile. This made us very nervous, as there is no way to anticipate this
bug going forward.

We are using the devtoolset-9 (GCC 9.3.1) centos7/rh7 package. Moving to the
devtoolset-10 (GCC 10.2.1) package "fixes" the issue with the same code and
build flags. devtoolset-8 (GCC 8.3.1) does not present the bug either.

Our concern is that the bug is not actually fixed though, and that moving
versions of GCC is like changing our codebase by 10 unrelated check-ins or
disabling -freciprocal-math. It is simply changing the state of the compile.
The bug may or may not be fixed.

I would like to help in any way I can. This build generates a binary that is
200MB w/o debug symbols. It is a lot of code. I do not think we can create a
smaller test case showing this behavior. I thought about doing a bisect of the
GCC repo, but even that might just be changing the state of GCC and not
actually showing the bug is fixed. 

It is a concerning bug. I can try to provide any further information that would
be useful. 

Thanks,
Dave Haufe

$ ./gcc -v
Using built-in specs.
COLLECT_GCC=./gcc
COLLECT_LTO_WRAPPER=/opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap
--enable-languages=c,c++,fortran,lto --prefix=/opt/rh/devtoolset-9/root/usr
--mandir=/opt/rh/devtoolset-9/root/usr/share/man
--infodir=/opt/rh/devtoolset-9/root/usr/share/info
--with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared
--enable-threads=posix --enable-checking=release --enable-multilib
--with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions
--enable-gnu-unique-object --enable-linker-build-id
--with-gcc-major-version-only --with-linker-hash-style=gnu
--with-default-libstdcxx-abi=gcc4-compatible --enable-plugin
--enable-initfini-array
--with-isl=/builddir/build/BUILD/gcc-9.3.1-20200408/obj-x86_64-redhat-linux/isl-install
--disable-libmpx --enable-gnu-indirect-function --with-tune=generic
--with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 9.3.1 20200408 (Red Hat 9.3.1-2) (GCC)

$ cat /etc/*release*
CentOS Linux release 7.9.2009 (Core)
Derived from Red Hat Enterprise Linux 7.9 (Source)
cat: /etc/lsb-release.d: Is a directory
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

CentOS Linux release 7.9.2009 (Core)
CentOS Linux release 7.9.2009 (Core)
cpe:/o:centos:centos:7

Example of .cpp file compile with args
g++      -m64 -std=c++17 -Wsuggest-override -Wduplicated-cond
-Wduplicated-branches -Wcast-qual -Wmissing-include-dirs      -Wall -Werror
-Wextra -fno-strict-aliasing -ggdb -frecord-gcc-switches  -I. -I...... -O3
-flto -fno-fat-lto-objects -ffast-math -funroll-loops -c ServiceThread.cpp -o
release/gcc/ServiceThread.o

Example of final link
g++ -Werror -Wl,--fatal-warnings release/gcc/main.o ...many *.a libs ...   
-lcap -lnuma -lpthread -lrt -ldl -lutil -lstdc++ -lstdc++fs -lm -lcrypto -lz
-flto=4  -O3 -ffast-math -funroll-loops   -o ./release/gcc/app


* [Bug lto/102649] GCC 9.3.1 LTO bug -- incorrect function call, bad stack arguments pushed
  2021-10-08 13:38 [Bug lto/102649] New: GCC 9.3.1 LTO bug -- incorrect function call, bad stack arguments pushed davidhaufegcc at gmail dot com
@ 2021-10-09  1:49 ` pinskia at gcc dot gnu.org
  2021-10-11 17:12 ` davidhaufegcc at gmail dot com
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-10-09  1:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102649

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-10-09

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This is going to need more information, mainly the sources and how exactly to
reproduce the issue.  There have been some known issues with -ffast-math and
-flto, but I don't remember offhand whether something like this has been
described before.


* [Bug lto/102649] GCC 9.3.1 LTO bug -- incorrect function call, bad stack arguments pushed
  2021-10-08 13:38 [Bug lto/102649] New: GCC 9.3.1 LTO bug -- incorrect function call, bad stack arguments pushed davidhaufegcc at gmail dot com
  2021-10-09  1:49 ` [Bug lto/102649] " pinskia at gcc dot gnu.org
@ 2021-10-11 17:12 ` davidhaufegcc at gmail dot com
  2021-10-12 13:10 ` marxin at gcc dot gnu.org
  2024-04-13  1:05 ` pinskia at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: davidhaufegcc at gmail dot com @ 2021-10-11 17:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102649

--- Comment #2 from David Haufe <davidhaufegcc at gmail dot com> ---
I had assumed this would be the response. Unfortunately the source code
involved is both large (1000+ object files in this build) and proprietary. The
behavior we see, where rolling forward in Git and rebuilding lets unrelated
changes "fix" the problem, makes it seem futile to develop an isolated test
case.

I can provide the assembly for the functions that highlight the error if that
would be beneficial? Not sure how helpful that would be though. Are there any
other best practices in a case like this one?


* [Bug lto/102649] GCC 9.3.1 LTO bug -- incorrect function call, bad stack arguments pushed
  2021-10-08 13:38 [Bug lto/102649] New: GCC 9.3.1 LTO bug -- incorrect function call, bad stack arguments pushed davidhaufegcc at gmail dot com
  2021-10-09  1:49 ` [Bug lto/102649] " pinskia at gcc dot gnu.org
  2021-10-11 17:12 ` davidhaufegcc at gmail dot com
@ 2021-10-12 13:10 ` marxin at gcc dot gnu.org
  2024-04-13  1:05 ` pinskia at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-10-12 13:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102649

--- Comment #3 from Martin Liška <marxin at gcc dot gnu.org> ---
All right, so being in the described situation I would recommend the following
steps:

1) please try bisecting GCC revisions to find the first one that fixed the
issue between the 9.x and 10.x releases. That would tell us more.

2) You can experiment with -fno-lto, which you can use for a subset of object
files (the rest will be built with LTO). Doing that, you will be able to
isolate that to a minimal set of objects that need -flto in order to expose the
issue.

3) You can use -fdump-tree-all for the LTO linking step to investigate whether
the caller really calls the function with an argument set to 0.


* [Bug lto/102649] GCC 9.3.1 LTO bug -- incorrect function call, bad stack arguments pushed
  2021-10-08 13:38 [Bug lto/102649] New: GCC 9.3.1 LTO bug -- incorrect function call, bad stack arguments pushed davidhaufegcc at gmail dot com
                   ` (2 preceding siblings ...)
  2021-10-12 13:10 ` marxin at gcc dot gnu.org
@ 2024-04-13  1:05 ` pinskia at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-04-13  1:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102649

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|---                         |INVALID

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This is hard to reproduce without a testcase, and it does not seem that a
testcase is forthcoming, since it has been 2.5 years since this was filed.
