From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 9A3CD3858D28; Thu, 28 Mar 2024 10:39:11 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 9A3CD3858D28
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1711622351;
	bh=3W7yzsiRss2gL0lMb3JZEsjWm1lKKVeowQWNN0VC7cs=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=L+XCjXsxP/GUCN18Xowm0QA34oOGUdm4zQ3CBUhNZU9QkDqxipUk86ZvRIsN6eOM2
	 oV880fxzcEilgpoKOaY/f4+0KVq5OHrqRjCTVGFshmkxz0Ej87F4EgKUgE08VPjv5o
	 7Pj/waXbvY+nlFBgtGqSTyxNR2ncK7Qtwa+qdkEk=
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c++/114480] g++: internal compiler error: Segmentation fault
 signal terminated program cc1plus
Date: Thu, 28 Mar 2024 10:39:11 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: c++
X-Bugzilla-Version: 11.4.0
X-Bugzilla-Keywords: compile-time-hog, memory-hog, ra
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: assigned_to bug_status
Message-ID: <bug-114480-4-TIb4EtgwJr@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-114480-4@http.gcc.gnu.org/bugzilla/>
References: <bug-114480-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
             Status|NEW                         |ASSIGNED
--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #14)
> Created attachment 57829 [details]
> smaller testcase
> 
> Smaller testcase, shows the same compile-time issue at -O0.  At -O1 it's a
> lot less bad but memory usage is better (8GB), so the slowness of the full
> testcase is likely memory bandwidth related.
> 
> -O1 is then
> 
>  tree PTA                           :  20.59 ( 21%)
>  expand vars                        :   9.19 (  9%)
>  expand                             :  14.26 ( 15%)

The memory use goes into RTXen created during RTL expansion.  The
compile-time part is add_scope_conflicts.  One possibility is to do what
var-tracking does and use rev_post_order_and_mark_dfs_back_seme, which
avoids iterating for non-loops and gives better cache locality.
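
To illustrate the idea (this is only a sketch, not the GCC code): a forward
union dataflow problem solved by sweeping blocks in reverse post order
reaches its fixed point in a single sweep when the DFS finds no back edge,
and only needs more sweeps for loops.  The Cfg type, reverse_post_order and
solve below are hypothetical names made up for the example; they are not
rev_post_order_and_mark_dfs_back_seme or add_scope_conflicts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical toy CFG: succs[b] lists the successor blocks of block b.
struct Cfg {
  std::vector<std::vector<int>> succs;
};

// Iterative DFS from entry.  Returns the reachable blocks in reverse post
// order and sets *has_back_edge when an edge targets a block that is still
// on the DFS stack.
std::vector<int> reverse_post_order(const Cfg &g, int entry,
                                    bool *has_back_edge) {
  enum { White, Grey, Black };
  std::vector<int> color(g.succs.size(), White), post;
  std::vector<std::pair<int, std::size_t>> stack;
  *has_back_edge = false;
  color[entry] = Grey;
  stack.push_back({entry, 0});
  while (!stack.empty()) {
    int b = stack.back().first;
    std::size_t i = stack.back().second;
    if (i < g.succs[b].size()) {
      stack.back().second = i + 1;
      int s = g.succs[b][i];
      if (color[s] == White) {
        color[s] = Grey;
        stack.push_back({s, 0});
      } else if (color[s] == Grey) {
        *has_back_edge = true;
      }
    } else {
      color[b] = Black;
      post.push_back(b);
      stack.pop_back();
    }
  }
  std::reverse(post.begin(), post.end());
  return post;
}

// Solves out[b] = gen[b] | union of out[p] over predecessors p, using
// 64-bit masks as sets.  Returns the number of sweeps over the blocks.
int solve(const Cfg &g, int entry, const std::vector<std::uint64_t> &gen,
          std::vector<std::uint64_t> *out) {
  bool has_back_edge = false;
  std::vector<int> rpo = reverse_post_order(g, entry, &has_back_edge);
  out->assign(g.succs.size(), 0);
  int sweeps = 0;
  bool changed = true;
  while (changed) {
    changed = false;
    ++sweeps;
    for (int b : rpo) {
      std::uint64_t v = (*out)[b] | gen[b];
      if (v != (*out)[b]) { (*out)[b] = v; changed = true; }
      for (int s : g.succs[b]) {
        std::uint64_t w = (*out)[s] | v;
        if (w != (*out)[s]) { (*out)[s] = w; changed = true; }
      }
    }
    // Without back edges every predecessor was visited before its
    // successors, so one RPO sweep is already the fixed point.
    if (!has_back_edge)
      break;
  }
  return sweeps;
}
```

On a diamond-shaped CFG solve finishes after one sweep; a two-block loop
needs a second sweep to confirm that nothing changes any more.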

We have half of the profile hits on ggc_internal_alloc and it's

    17 | d8:+- mov    %r14,%rax
       |    |  mov    (%r14),%r14
  1440 |    |  test   %r14,%r14
     4 |    |  je     530
       |    |if (p->bytes == entry_size)
       | e7:|  cmp    0x10(%r14),%r12
 65582 |    +--jne    d8

which is the linear walk

  /* Check the list of free pages for one we can use.  */
  for (pp = &G.free_pages, p = *pp; p; pp = &p->next, p = *pp)
    if (p->bytes == entry_size)
      break;

so we seem to have many free pages for some reason, and the free page
pool is global rather than per order?!
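
A toy model of why that walk hurts, and of what a per-size pool would buy
(a sketch under assumptions, not the ggc-page code): Page, take_global,
PerSizeFreePages and visited_for_last_page are hypothetical names, and the
pool is keyed by page size in bytes here purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Toy stand-in for a free page; not GCC's page_entry.
struct Page {
  std::size_t bytes;
  Page *next;
};

// One global list holding pages of every size, walked like the loop above.
// Unlinks and returns the first page of the wanted size and counts the
// list nodes visited on the way.
Page *take_global(Page **head, std::size_t entry_size, std::size_t *visited) {
  Page **pp = head;
  for (Page *p = *pp; p; pp = &p->next, p = *pp) {
    ++*visited;
    if (p->bytes == entry_size) {
      *pp = p->next;
      p->next = nullptr;
      return p;
    }
  }
  return nullptr;
}

// Hypothetical alternative: one free list per page size, so a lookup
// never steps over pages of the wrong size.
struct PerSizeFreePages {
  std::map<std::size_t, Page *> heads;

  void put(Page *p) {
    Page *&head = heads[p->bytes];
    p->next = head;
    head = p;
  }

  Page *take(std::size_t entry_size) {
    auto it = heads.find(entry_size);
    if (it == heads.end() || it->second == nullptr)
      return nullptr;
    Page *p = it->second;
    it->second = p->next;
    p->next = nullptr;
    return p;
  }
};

// Builds a global list of n small pages followed by one large page and
// returns how many nodes the walk visits before it finds the large one.
std::size_t visited_for_last_page(std::size_t n) {
  std::vector<Page> pages(n + 1);
  for (std::size_t i = 0; i <= n; ++i) {
    pages[i].bytes = (i == n) ? 8192 : 4096;
    pages[i].next = (i == n) ? nullptr : &pages[i + 1];
  }
  Page *head = &pages[0];
  std::size_t visited = 0;
  Page *found = take_global(&head, 8192, &visited);
  return found ? visited : 0;
}
```

With n free pages of the wrong size in front, the global walk visits all
of them for every allocation, while the per-size lookup does not depend on
how many pages of other sizes are free.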

Samples: 299K of event 'cycles', Event count (approx.): 338413178083
Overhead       Samples  Command  Shared Object       Symbol
  23.16%         67756  cc1plus  cc1plus             [.] ggc_internal_alloc
   6.98%         21637  cc1plus  cc1plus             [.] bitmap_tree_splay
   6.89%         20413  cc1plus  cc1plus             [.] bitmap_ior_into
   4.05%         11989  cc1plus  cc1plus             [.] bitmap_elt_ior
   3.16%          9840  cc1plus  cc1plus             [.] mergesort<sort_ctx>
   2.90%          8860  cc1plus  cc1plus             [.] bitmap_set_bit
   2.76%          8281  cc1plus  cc1plus             [.] get_ref_base_and_extent
   1.37%          4071  cc1plus  cc1plus             [.] stmt_may_clobber_ref_p_1
   1.32%          4095  cc1plus  cc1plus             [.] dominated_by_p
   1.16%          3597  cc1plus  cc1plus             [.] bitmap_tree_unlink_element
   1.06%          3128  cc1plus  cc1plus             [.] walk_aliased_vdefs_1

The bitmap_tree_splay hits are from compute_idf.  Refactoring that some
more, also avoiding the duplicate processing and doing away with the bitmap
for the workset, might help a bit there (not using tree view just pushes
bitmap_set_bit up in the profile with no overall positive change).
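
Roughly what I have in mind for the workset, as a sketch only: a plain
vector used as a stack plus a "queued" flag per block, so no block is
processed twice and no bitmap is needed for the worklist itself.  This is
the textbook iterated dominance frontier computation, not the current
compute_idf; iterated_dominance_frontier and its parameters are
hypothetical, and the per-block dominance frontiers are assumed to be
precomputed.

```cpp
#include <cassert>
#include <vector>

// df[b] lists the dominance frontier of block b (assumed precomputed).
// Returns a flag per block telling whether it is in the iterated
// dominance frontier of def_blocks.
std::vector<bool> iterated_dominance_frontier(
    const std::vector<std::vector<int>> &df,
    const std::vector<int> &def_blocks) {
  std::vector<bool> in_idf(df.size(), false);
  std::vector<bool> queued(df.size(), false);
  std::vector<int> work;  // plain stack instead of a bitmap workset

  for (int b : def_blocks) {
    if (!queued[b]) {
      queued[b] = true;
      work.push_back(b);
    }
  }

  while (!work.empty()) {
    int b = work.back();
    work.pop_back();
    for (int f : df[b]) {
      if (in_idf[f])
        continue;
      in_idf[f] = true;
      // Each block enters the worklist at most once.
      if (!queued[f]) {
        queued[f] = true;
        work.push_back(f);
      }
    }
  }
  return in_idf;
}
```

For the CFG 0->1, 0->2, 0->4, 1->3, 2->3, 3->4 the frontiers are
df[1] = df[2] = {3} and df[3] = {4}, so a definition in block 1 yields the
iterated frontier {3, 4}.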

I will look into the above things more (but not the RA slowness at -O0).