From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 2A39B3858439; Tue, 25 Jul 2023 08:39:08 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 2A39B3858439
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1690274348;
	bh=xHCzNMIm+HVY19s7Lw32UP2YgNxcKditqe2nMD6XXak=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=oXcT3a0xcofZElT0M+MLyUMXoRAN9qlAMqttxotypIKdI1OJYLbMqqbpT6k06tmIM
	 A+hHAAlv8eGDFCVplSSg60MBPKNrROwnv7DC2tOjB9KcTRbyNOoLnBm1SBMNuU461z
	 CAvLCr+MGcZQwpOf0dVo3tbmQXII+5olIIG4AA1c=
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile
 time regression since r14-2337-g37a231cc7594d1
Date: Tue, 25 Jul 2023 08:38:44 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: compile-time-hog, ra
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: roger at nextmovesoftware dot com
X-Bugzilla-Target-Milestone: 14.0
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-110587-4-djbPdOEyOa@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-110587-4@http.gcc.gnu.org/bugzilla/>
References: <bug-110587-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587
--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
Compile time is back to the level of the first jump caused by
r14-2337-g37a231cc7594d1, thanks Roger.  We still have

 LRA non-specific                   :   3.53 ( 75%)

at -O0 here, which Roger's follow-up patch will improve (without solving
the issue in general).

At -O1 combine dominates; at -O2 we see other parts of RA being slow:

 integrated RA                      :   7.10 ( 23%)
 LRA non-specific                   :   1.56 (  5%)
 LRA virtuals elimination           :   0.07 (  0%)
 LRA reload inheritance             :   1.02 (  3%)
 LRA create live ranges             :   0.88 (  3%)
 LRA hard reg assignment            :   8.22 ( 27%)
 LRA coalesce pseudo regs           :   0.00 (  0%)
 LRA rematerialization              :   0.18 (  1%)

Samples: 124K of event 'cycles:u', Event count (approx.): 164730867020
Overhead       Samples  Command  Shared Object       Symbol
  16.60%         20660  cc1      cc1                 [.] find_hard_regno_for_1
  11.90%         14742  cc1      cc1                 [.] bitmap_set_bit
   6.47%          7973  cc1      cc1                 [.] color_allocnos
   3.31%          4023  cc1      cc1                 [.] bitmap_bit_p
   3.07%          3791  cc1      cc1                 [.] remove_allocno_from_bucket_and_push
   2.77%          3435  cc1      cc1                 [.] assign_hard_reg
   2.54%          3138  cc1      cc1                 [.] ira_build_conflicts

In find_hard_regno_for_1 the loop over live ranges is what's costly,
especially because the conditionals in the loop seem to depend on
(indirect) memory loads whose data no longer fits nicely into the caches.

Maybe regno_allocno_class_array can be shrunk from 'enum reg_class'
(unsigned int) to something smaller.  It looks like this array is a
memory optimization since reg_allocno_class would perform a much sparser
access.
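
To illustrate the kind of shrinking meant above, here is a minimal
standalone sketch.  It is not GCC code: the enum, the typedef and every
demo_* name are hypothetical stand-ins, and it assumes the number of
register classes fits into 8 bits (checked at compile time).  It only
compares the footprint of a per-regno array of an enum against one of a
narrower integer type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a target's 'enum reg_class'; the real
   enumeration is target-specific and has different members.  */
enum demo_reg_class
{
  DEMO_NO_REGS,
  DEMO_GENERAL_REGS,
  DEMO_FLOAT_REGS,
  DEMO_ALL_REGS,
  DEMO_LIM_REG_CLASSES
};

/* Assumed compact element type: one byte per pseudo register.  */
typedef uint8_t demo_small_class;

/* The narrow type is only valid while every class value fits in it.  */
_Static_assert (DEMO_LIM_REG_CLASSES <= UINT8_MAX,
                "register classes must fit into the compact type");

/* Bytes needed for an array with one enum element per regno.  */
size_t
demo_wide_bytes (size_t max_regno)
{
  return max_regno * sizeof (enum demo_reg_class);
}

/* Bytes needed for the same array using the compact element type.  */
size_t
demo_small_bytes (size_t max_regno)
{
  return max_regno * sizeof (demo_small_class);
}

/* Store a class into the compact array, checking the value range.  */
void
demo_set_class (demo_small_class *array, size_t regno,
                enum demo_reg_class cl)
{
  assert ((unsigned) cl < DEMO_LIM_REG_CLASSES);
  array[regno] = (demo_small_class) cl;
}

/* Read a class back from the compact array.  */
enum demo_reg_class
demo_get_class (const demo_small_class *array, size_t regno)
{
  return (enum demo_reg_class) array[regno];
}
```

On hosts where the enum is as wide as int, the compact array takes a
quarter of the memory, so more of it stays cache-resident during the
live-range loop.  Whether that helps this testcase measurably would
need profiling.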