From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id 66CCD385843F; Tue, 1 Mar 2022 21:58:47 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 66CCD385843F From: "vmakarov at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug target/104686] [12 Regression] Huge compile-time regression building SPEC 2017 538.imagick_r with -march=skylake Date: Tue, 01 Mar 2022 21:58:47 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: target X-Bugzilla-Version: 12.0 X-Bugzilla-Keywords: compile-time-hog, ra X-Bugzilla-Severity: normal X-Bugzilla-Who: vmakarov at gcc dot gnu.org X-Bugzilla-Status: NEW X-Bugzilla-Resolution: X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org X-Bugzilla-Target-Milestone: 12.0 X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: gcc-bugs@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-bugs mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Mar 2022 21:58:47 -0000 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D104686 --- Comment #19 from Vladimir Makarov --- (In reply to Richard Biener from comment #16) > it doesn't make a difference for this testcase but profiling shows that > allocnos_conflict_p is quite expensive so it's best to do it after the ot= her > continue checks like the following. I also notice that the comment of > allocnos_conflict_p says >=20 > /* Return TRUE if allocnos A1 and A2 conflicts. Here we are > interesting only in conflicts of allocnos with intersected allocno > classes. */ >=20 > so doing it after the ira_reg_classes_intersect_p check makes even more > sense(?) >=20 > diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc > index 8b6db1bb417..a5fd79484eb 100644 > --- a/gcc/ira-color.cc > +++ b/gcc/ira-color.cc > @@ -1572,15 +1572,14 @@ update_conflict_hard_regno_costs (int *costs, enum > reg_class aclass, > else > gcc_unreachable (); >=20=20 > + another_aclass =3D ALLOCNO_CLASS (another_allocno); > if (another_allocno =3D=3D from > + || ALLOCNO_ASSIGNED_P (another_allocno) > + || ALLOCNO_COLOR_DATA (another_allocno)->may_be_spilled_p > + || ! ira_reg_classes_intersect_p[aclass][another_aclass] > || allocnos_conflict_p (another_allocno, start)) > continue; >=20=20 > - another_aclass =3D ALLOCNO_CLASS (another_allocno); > - if (! ira_reg_classes_intersect_p[aclass][another_aclass] > - || ALLOCNO_ASSIGNED_P (another_allocno) > - || ALLOCNO_COLOR_DATA (another_allocno)->may_be_spilled_p) > - continue; > class_size =3D ira_class_hard_regs_num[another_aclass]; > ira_allocate_and_copy_costs > (&ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (another_allocno), >=20 >=20 If it is allocnos_conflict_p takes significant time, this change definitely= has sense. On my estimation it will decrease allocnos_conflict_p calls in abou= t 4 times (assuming fp and int reg classes and half allocnos already assigned). In any case, the above change is profitable as allocnos_conflict_p practica= lly always takes more time than the condition tests moved up. > Now, what's more odd is that we sometimes have a nice bitmap representati= on > for the conflicts but we always iterate. So it _seems_ we should be able > to do sth like >=20 > diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc > index 8b6db1bb417..682d1ef7562 100644 > --- a/gcc/ira-color.cc > +++ b/gcc/ira-color.cc > @@ -1352,9 +1352,23 @@ allocnos_conflict_p (ira_allocno_t a1, ira_allocno= _t > a2) > { > obj =3D ALLOCNO_OBJECT (a1, word); > /* Take preferences of conflicting allocnos into account. */ > - FOR_EACH_OBJECT_CONFLICT (obj, conflict_obj, oci) > - if (OBJECT_ALLOCNO (conflict_obj) =3D=3D a2) > - return true; > + if (!OBJECT_CONFLICT_VEC_P (obj)) > + { > + for (int w2 =3D 0; w2 < ALLOCNO_NUM_OBJECTS (a2); w2++) > + { > + ira_object_t obj2 =3D ALLOCNO_OBJECT (a2, w2); > + if (OBJECT_CONFLICT_ID (obj2) >=3D OBJECT_MIN (obj) > + && OBJECT_CONFLICT_ID (obj2) <=3D OBJECT_MAX (obj) > + && TEST_MINMAX_SET_BIT (OBJECT_CONFLICT_BITVEC (obj), > + OBJECT_CONFLICT_ID (obj2), > + OBJECT_MIN (obj), OBJECT_MAX > (obj))) > + return true; > + } > + } > + else > + FOR_EACH_OBJECT_CONFLICT (obj, conflict_obj, oci) > + if (OBJECT_ALLOCNO (conflict_obj) =3D=3D a2) > + return true; > } > return false; > }=20=20 >=20 > which reduces compile-time from 10s to 1s for me ... the above should > be split out so we can "optimally" use the bit test for > object vs. allocno when possible. >=20 > Vlad - any thoughts about the above two things? Shall I try to polish and > optimize the bit test or would you be willing to pick those two speedups = up? This change also has sense. Usually for big functions conflict sets are ve= ry sparse and bit vectors are not used. But it seems this is not the case for= the PR. Please, polish and optimize the change as you proposed and I approve the fi= nal version promptly. Thank you for working on this PR, Richard.=