From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id CB4FB3858417; Fri, 22 Mar 2024 09:41:08 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org CB4FB3858417 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1711100468; bh=UbBJJW9RqmB/9lXYz0KXYN9zrj4FLjFIMM5JsrM4+s8=; h=From:To:Subject:Date:In-Reply-To:References:From; b=uJFKlSq+59t9GDLlXzaG2RwY4TL5z5haZ9VHas7sYm7zIYebOyklwFKYFkI5YG2jI XAbbADeB0CLxoj0SWfg+AxeFh+WBrj2WvHoKX0/z/fqjXPrd+CtSkvSV2egac7ZAXs WuzTDrrI9Xm+UZkjfJnPCpQ/8RLoUm+Ms5FkE90I= From: "rguenth at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug target/101523] Huge number of combine attempts Date: Fri, 22 Mar 2024 09:41:05 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: target X-Bugzilla-Version: 12.0 X-Bugzilla-Keywords: compile-time-hog, memory-hog X-Bugzilla-Severity: normal X-Bugzilla-Who: rguenth at gcc dot gnu.org X-Bugzilla-Status: NEW X-Bugzilla-Resolution: X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: segher at gcc dot gnu.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 List-Id: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D101523 --- Comment #43 from Richard Biener --- The interesting bit is that there are only 12026 overall loglinks, the number of combine attempts is way higher than that would suggest also given the few successful combinations. So something is odd here. There's one interesting machinery in try_combine through added_links_insn and added_notes_insn we can end up re-processing a large swat of insns (even though we should need to only re-process the link target insns, not all insns inbetween). There might be the opportunity, for the "reprocessin= g", to use a worklist instead of resetting the insn walk. I added statistics to note the "distance" we travel there by taking DF_INSN_LUID (ret) - DF_INSN_LUID (added_{notes,links}_insn) as that. This shows 48 such jumps with seemingly large distances: 305 combine "restart earlier =3D=3D 143" 3 305 combine "restart earlier =3D=3D 254" 1 305 combine "restart earlier =3D=3D 684" 1 305 combine "restart earlier =3D=3D 726" 1 305 combine "restart earlier =3D=3D 777" 1 305 combine "restart earlier =3D=3D 1158" 1 305 combine "restart earlier =3D=3D 1421" 1 305 combine "restart earlier =3D=3D 2073" 1 305 combine "restart earlier =3D=3D 2130" 1 ... 305 combine "restart earlier =3D=3D 49717" 1 305 combine "restart earlier =3D=3D 49763" 1 305 combine "restart earlier =3D=3D 49866" 1 305 combine "restart earlier =3D=3D 50010" 1 305 combine "restart earlier =3D=3D 50286" 1 305 combine "restart earlier =3D=3D 50754" 1 305 combine "restart earlier =3D=3D 50809" 1 killing this feature doesn't improve things to -O1 levels though so it's more likely the fact that we also do rtx_insn *ret =3D newi2pat ? i2 : i3; thus re-start at i2 when we altered i2. We re-start through this 6910 times. Always re-starting at i3 helps a lot and gets us -O1 performance back. From comment#1 it suggests that newi2pat might in fact be equal to the old, so I tried to count how many times this happens with a stupid diff --git a/gcc/combine.cc b/gcc/combine.cc index a4479f8d836..acd176d3185 100644 --- a/gcc/combine.cc +++ b/gcc/combine.cc @@ -4435,6 +4435,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, propagate_for_debug (i2, last_combined_insn, i2dest, i2src, this_basic_block); INSN_CODE (i2) =3D i2_code_number; + if (rtx_equal_p (PATTERN (i2), newi2pat)) + statistics_counter_event (cfun, "equal newi2pat", 1); PATTERN (i2) =3D newi2pat; } else and indeed this shows this to be the case 9211 times. The following improves compile-time to 20s and 460MB memory use. In general the algorithmic deficiency with the "restarting" remains and a proper fix is to use a worklist for those that you'd drain before advancing in the instruction chain (so not have a single 'ret' insn to reprocess but add to the worklist). I'm not sure whether identifying a not changed "new" i2 can be done better. I'll leave it all to Segher of course - he'll be fastest to produce somethi= ng he likes. diff --git a/gcc/combine.cc b/gcc/combine.cc index a4479f8d836..0c61dcedaa1 100644 --- a/gcc/combine.cc +++ b/gcc/combine.cc @@ -4276,6 +4276,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx _insn *i0, } } + bool newi2pat_not_new =3D false; { rtx i3notes, i2notes, i1notes =3D 0, i0notes =3D 0; struct insn_link *i3links, *i2links, *i1links =3D 0, *i0links =3D 0; @@ -4435,6 +4436,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, propagate_for_debug (i2, last_combined_insn, i2dest, i2src, this_basic_block); INSN_CODE (i2) =3D i2_code_number; + if (rtx_equal_p (PATTERN (i2), newi2pat)) + newi2pat_not_new =3D true; PATTERN (i2) =3D newi2pat; } else @@ -4752,7 +4755,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, combine_successes++; undo_commit (); - rtx_insn *ret =3D newi2pat ? i2 : i3; + rtx_insn *ret =3D newi2pat && !newi2pat_not_new ? i2 : i3; if (added_links_insn && DF_INSN_LUID (added_links_insn) < DF_INSN_LUID (ret)) ret =3D added_links_insn; if (added_notes_insn && DF_INSN_LUID (added_notes_insn) < DF_INSN_LUID (ret))=