From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
Date: Tue, 25 Jul 2023 08:38:44 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: compile-time-hog, ra
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: roger at nextmovesoftware dot com
X-Bugzilla-Target-Milestone: 14.0
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587

--- Comment #14 from Richard Biener ---
Compile time is back to the level of the first jump caused by
r14-2337-g37a231cc7594d1, thanks Roger. We still have

 LRA non-specific                   :   3.53 ( 75%)

at -O0 here, which Roger's follow-up patch will improve (but it will not
generally solve the issue).

At -O1 combine dominates; at -O2 we see other parts of RA being slow:

 integrated RA                      :   7.10 ( 23%)
 LRA non-specific                   :   1.56 (  5%)
 LRA virtuals elimination           :   0.07 (  0%)
 LRA reload inheritance             :   1.02 (  3%)
 LRA create live ranges             :   0.88 (  3%)
 LRA hard reg assignment            :   8.22 ( 27%)
 LRA coalesce pseudo regs           :   0.00 (  0%)
 LRA rematerialization              :   0.18 (  1%)

Samples: 124K of event 'cycles:u', Event count (approx.): 164730867020
Overhead  Samples  Command  Shared Object  Symbol
  16.60%    20660  cc1      cc1            [.] find_hard_regno_for_1
  11.90%    14742  cc1      cc1            [.] bitmap_set_bit
   6.47%     7973  cc1      cc1            [.] color_allocnos
   3.31%     4023  cc1      cc1            [.] bitmap_bit_p
   3.07%     3791  cc1      cc1            [.] remove_allocno_from_bucket_and_push
   2.77%     3435  cc1      cc1            [.] assign_hard_reg
   2.54%     3138  cc1      cc1            [.] ira_build_conflicts

In find_hard_regno_for_1 the loop over live ranges is what's costly,
especially because the conditionals in the loops seem to depend on (indirect)
memory loads, and that data no longer fits nicely into the caches. Maybe
regno_allocno_class_array can be shrunk from 'enum reg_class' (unsigned int)
to something smaller. It looks like this array is a memory optimization,
since reg_allocno_class would perform a much sparser access.
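
To illustrate the idea of shrinking regno_allocno_class_array, here is a
minimal standalone sketch in C. It is not GCC code: the enum
demo_reg_class, the type demo_reg_class_small and all demo_* helpers are
hypothetical stand-ins, and it assumes the number of register classes fits
in 8 bits. It only shows that storing the class in a narrower element type
reduces the bytes touched per pseudo while reads still yield the enum value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for 'enum reg_class'; the real enumerators are
   target specific and are not reproduced here.  */
enum demo_reg_class
{
  DEMO_NO_REGS,
  DEMO_GENERAL_REGS,
  DEMO_FLOAT_REGS,
  DEMO_ALL_REGS,
  DEMO_LIM_REG_CLASSES
};

/* Assumed narrow storage type for the per-regno class array.  */
typedef uint8_t demo_reg_class_small;

/* Narrowing is only safe if every class value fits the small type.  */
static_assert (DEMO_LIM_REG_CLASSES <= UINT8_MAX,
               "register class count must fit in 8 bits");

/* Bytes needed for N entries when each entry is the full enum.  */
size_t
demo_wide_bytes (size_t n)
{
  return n * sizeof (enum demo_reg_class);
}

/* Bytes needed for N entries with the narrow element type.  */
size_t
demo_narrow_bytes (size_t n)
{
  return n * sizeof (demo_reg_class_small);
}

/* Store class CL for pseudo REGNO in the narrow array ARR.  */
void
demo_set_class (demo_reg_class_small *arr, size_t regno,
                enum demo_reg_class cl)
{
  arr[regno] = (demo_reg_class_small) cl;
}

/* Read the class of pseudo REGNO back as the enum type.  */
enum demo_reg_class
demo_get_class (const demo_reg_class_small *arr, size_t regno)
{
  return (enum demo_reg_class) arr[regno];
}
```

On a typical ABI where the enum is int-sized, the narrow array is a quarter
of the size, so more entries share a cache line. Whether that helps the loop
in find_hard_regno_for_1 in practice would need to be measured.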