public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/102206] New: amd zen hosts running zen-optimized gcc: gimplification ICE after 94e24187
@ 2021-09-05  7:49 gmt@be-evil.net
From: gmt@be-evil.net @ 2021-09-05  7:49 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102206

            Bug ID: 102206
           Summary: amd zen hosts running zen-optimized gcc:
                    gimplification ICE after 94e24187
           Product: gcc
           Version: 10.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gmt@be-evil.net
  Target Milestone: ---

The bad news: why this bug report will be long and confusing
============================================================

There are some things in this bug report that will probably make folks think "This
is a hardware or software stability problem and not gcc's fault."

Strictly speaking, I can't entirely disprove this hypothesis, but I will
present evidence below which has led me to believe it's probably a legit gcc
bug.

AFAICT, due to the nature of binary distributions (which ship generically tuned
compilers), this bug manifests exclusively on Gentoo.  I imagine there are like
50 Gentoo users on Zen and 25
of them have experienced the bug, half of whom filed bug reports, and the rest
of whom shrugged it off and decided it must have been a cosmic ray or something
:)

So, Gentoo-only.  "Rice" is an arguably racially-insensitive term Gentoo people
use to describe excessive customization of Gentoo systems resulting in various
breakage and non-reproducible problems.  Initially I thought this was probably
a "rice"-related problem, but I have taken considerable pains to rule this out.
 It's not rice-related.

But wait, there's more!  It's also non-deterministically non-deterministic!

That is, almost everyone experiences this bug non-deterministically.  But, for
reasons not yet understood, some users (I was once one of them) have found
<software, hardware> configurations in which this bug manifests fully
repeatably and deterministically.  Sadly, AFAICT none of these users have
managed to preserve these fully deterministic software configurations.

OK, enough prefacing. I just want to prepare the reader: the nature of this
bug/issue will raise some doubts, which will likely need to be overcome before
this bug looks "legit".  I also wish to encourage the reader not to jump to
easy "not-a-bug" conclusions without careful consideration of the circumstances
presented below.

Scope/Domain of the bug
=======================

On Gentoo, there are several AMD Zen hardware users who report that they must
either

A) avoid building gcc with

  -m{arch,tune}=znver?

and their

  -m{arch,tune}=native

equivalents, or

B) downgrade to gcc-9 or earlier.

If they do neither, the bug will eventually occur.  The build that seems to
reproduce the bug most reliably is boost (any recent version will do the
trick), but it appears in other builds as well.

Zen (1xxx) and Zen+ (2xxx) hosts seem most susceptible, but Zen 2 and Zen 3
hosts (3xxx/4xxx(?) and 5xxx, respectively) also appear to be at least
occasionally affected.

Note that once a Zen-optimized gcc has been built, the target build does not
also need similar -m{arch,tune} options for the bug to appear.  Such target
optimizations do, however, seem to make the problem considerably more likely to
reproduce.

Bug Manifestation
=================

The bug itself appears as an ICE, stack smash, or null-pointer-dereference fault
during C++ compiles.  The problem always seems to manifest during gimplification
and produces distinctive stack dumps, e.g.:

  Thread 2.1 "cc1plus" received signal SIGABRT, Aborted.
  [Switching to process 15911]
  0x00007ffff7bc3f71 in raise () from /lib64/libc.so.6
  #0  0x00007ffff7bc3f71 in raise () from /lib64/libc.so.6
  #1  0x00007ffff7bad537 in abort () from /lib64/libc.so.6
  #2  0x00007ffff7c08207 in ?? () from /lib64/libc.so.6
  #3  0x00007ffff7c99892 in __fortify_fail () from /lib64/libc.so.6
  #4  0x00007ffff7c99870 in __stack_chk_fail () from /lib64/libc.so.6
  #5  0x000000000065a1f2 in cp_gimplify_expr(tree_node**, gimple**, gimple**) ()
  #6  0x00000000009f4ffc in gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*), int) ()
  #7  0x00000000009faeb1 in ?? ()
  #8  0x00000000009f6304 in gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*), int) ()
  #9  0x00000000009f6196 in gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*), int) ()
  #10 0x00000000009f5d42 in gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*), int) ()
  #11 0x00000000009f6196 in gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*), int) ()
  #12 0x00000000009fd5b9 in ?? ()
  #13 0x00000000009f638d in gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*), int) ()
  #14 0x00000000009f6196 in gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*), int) ()
  #15 0x00000000009f9ed9 in gimplify_body(tree_node*, bool) ()
  #16 0x00000000009fa316 in gimplify_function_tree(tree_node*) ()
  #17 0x0000000000885f58 in cgraph_node::analyze() ()
  #18 0x0000000000888878 in ?? ()
  #19 0x00000000008894c3 in symbol_table::finalize_compilation_unit() ()
  #20 0x0000000000c769b1 in ?? ()
  #21 0x000000000060065a in toplev::main(int, char**) ()
  #22 0x000000000060413c in main ()

This is a pretty manageable example; others report stack traces with very deep
gimplify_expr recursion*.

Git Bisect: 94e2418780f1d13235f3e2e6e5c09dbe821c1ce3
====================================================

A few months ago I git bisected this thing.  Since the bug was manifesting
nondeterministically, it took some doing; I wrote scripts to repeatedly build
boost, treating a point in history as "good" after no fewer than 50 consecutive
successful builds with the resulting optimized compiler*.  Thankfully, this did
identify a culprit that could be reverted without crippling gcc:

  94e24187 | c++: Avoid unnecessary empty class copy [94175]

I must admit I don't really understand what this commit does.  But reverting it
and rebuilding gcc-1{0.{1,2,3},1.{1,2}} (i.e. 10.1, 10.2, 10.3, 11.1 and 11.2)
results in a compiler which seems to work fine and does not suffer from the
bug/issue.
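The bisection procedure described above can be sketched as a `git bisect run` helper. In that procedure, gcc is rebuilt at each candidate commit, and the commit is treated as "good" only after 50 consecutive clean boost builds. This is a minimal Python sketch, not the reporter's actual scripts. The setup and test scripts it calls (`build-gcc.sh`, `build-boost.sh`) are hypothetical placeholders, and it assumes that each test build fails independently. It relies only on the documented `git bisect run` convention: exit status 0 means good, 125 means skip, and any other code from 1 to 127 means bad. The `required_runs` helper shows why a large run count matters. A bad commit whose builds each fail with probability p passes n runs in a row with probability (1 - p)^n.

```python
#!/usr/bin/env python3
# Hypothetical `git bisect run` helper for a nondeterministic compiler failure.
# Sketch only: the setup/test scripts are placeholders, not the reporter's own.
import math
import subprocess
import sys

# Exit codes interpreted by `git bisect run`.
GOOD = 0    # commit is good
BAD = 1     # commit is bad (any code 1..127 other than 125 also means bad)
SKIP = 125  # commit cannot be tested, e.g. gcc itself failed to build


def required_runs(p_fail: float, alpha: float) -> int:
    """Smallest n with (1 - p_fail) ** n <= alpha.

    Assumes a bad commit makes each test build fail independently with
    probability p_fail, so n consecutive passes mislabel it "good" with
    probability (1 - p_fail) ** n.
    """
    return math.ceil(math.log(alpha) / math.log(1.0 - p_fail))


def bisect_step(setup_cmd: list, test_cmd: list, runs: int) -> int:
    """Build the compiler once, then run the test build up to `runs` times."""
    if subprocess.run(setup_cmd).returncode != 0:
        return SKIP  # could not build gcc at this commit
    for i in range(runs):
        if subprocess.run(test_cmd).returncode != 0:
            print(f"test build failed on run {i + 1}/{runs}", file=sys.stderr)
            return BAD  # a single failure is enough to call the commit bad
    return GOOD  # survived every run


# Usage (hypothetical script names):
#   git bisect run ./bisect_step.py 50 ./build-gcc.sh ./build-boost.sh
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(bisect_step([sys.argv[2]], [sys.argv[3]], int(sys.argv[1])))
```

With 50 runs, a commit whose builds fail 10% of the time would be mislabeled "good" with probability 0.9^50, roughly 0.5%. If only 2% of builds fail, 0.98^50 is about 0.36, so rarer failures need many more runs. In practice the test script should exit non-zero only when it actually observes the ICE, so that unrelated build breakage is not mistaken for the bug.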

Since then, every reporter so far in Gentoo bug 724314 (where most discussion
of this bug has taken place) has found that applying this revert also solved
the problem for them*.

The specific Gentoo-friendly patch folks have been using is available at:

  https://724314.bugs.gentoo.org/attachment.cgi?id=718944 

Significance of this finding
============================

?

--
* See https://bugs.gentoo.org/724314 for examples/specifics


Thread overview (14+ messages):
2021-09-05  7:49 [Bug middle-end/102206] New: amd zen hosts running zen-optimized gcc: gimplification ICE after 94e24187 gmt@be-evil.net
2021-09-06  4:33 ` [Bug middle-end/102206] " gmt@be-evil.net
2021-09-06  6:53 ` [Bug middle-end/102206] amd zen hosts running zen-optimized gcc: gimplification ICE after r10-7284 marxin at gcc dot gnu.org
2021-09-06  8:00 ` gmt@be-evil.net
2021-09-06  8:22 ` marxin at gcc dot gnu.org
2021-09-06  8:33 ` gmt@be-evil.net
2021-09-06  8:41 ` marxin at gcc dot gnu.org
2021-09-06  8:41 ` gmt@be-evil.net
2021-09-06  8:48 ` gmt@be-evil.net
2021-09-06  8:50 ` gmt@be-evil.net
2021-09-06  9:02 ` gmt@be-evil.net
2021-09-06  9:14 ` jakub at gcc dot gnu.org
2021-09-06  9:39 ` marxin at gcc dot gnu.org
2021-09-06 16:33 ` amonakov at gcc dot gnu.org
