From: "segher at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/101523] Huge number of combine attempts
Date: Thu, 21 Mar 2024 12:57:33 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523

--- Comment #38 from Segher Boessenkool ---
(In reply to Richard Biener from comment #36)
> > No, it definitely should be done.  As I showed back then, it costs less
> > than 1% extra compile time on *any platform* on average, and it reduced
> > code size by 1%-2% everywhere.
> >
> > It also cannot get stuck, any combination is attempted only once, and any
> > combination that succeeds eats up a loglink.  It is finite, (almost)
> > linear in fact.
>
> So the slowness for the testcase comes from failed attempts.

Of course.  Most attempts do not succeed; there aren't instructions for most
"random" combinations of instructions feeding each other.  But combine blindly
tries everything; that is its strength!  It ends up finding many more things
than any recognition automaton does.

> > Something that is the real complaint here: it seems we do not GC often
> > enough, only after processing a BB (or EBB)?  That adds up for artificial
> > code like this, sure.
>
> For memory use, if you know combine doesn't have "dangling" links to GC
> memory you can call ggc_collect at any point you like.  Or, when you create
> throw-away RTL, ggc_free it explicitly (yeah, that only frees the
> "toplevel").

A lot of it *is* toplevel (well, completely disconnected RTX), just
temporaries, things we can just throw away.  At every try_combine call even,
kinda.  There might be some more RTX that needs some protection.  We'll see.

> > And the "param to give an upper limit to how many combination attempts
> > are done (per BB)" offer is on the table still, too.  I don't think it
> > would ever be useful (if you want your code to compile faster, just write
> > better code!), but :-)
>
> Well, while you say the number of successful combinations is linear, the
> number of combine attempts apparently isn't

It is, and that is pretty easy to show, even.  With retries it stays linear,
but with a hefty constant.
And on some targets (with more than three inputs for some instructions, say)
it can be a big constant anyway.

But linear is linear, and stays linear; for way too big code it is just as
acceptable as for "normal" code.  Just slow.  If you don't want the compiler
to take a long time compiling your way too big code, use -O0, or preferably
do not write insane code in the first place :-)

> (well, of course, if we ever combine from multi-use defs).  So yeah, a
> param might be useful here, but instead of some constant limit on the
> number of combine attempts per function or per BB, it might make sense to
> instead limit it on the number of DEFs?

We still use loglinks in combine.  These are nice to prove that things stay
linear, even (every time combine succeeds, a loglink is used up).

The number of loglinks and insns (insns combine can do anything with) differs
by a small constant factor.

> I understand we work on the uses

We work on the loglinks, a def-use pair if you want.

> so it'll be a bit hard to apply this in a way to, say, combine a DEF only
> with the N nearest uses (but not any ones farther out),

There is only a loglink from a def to the very next use.  If that combines,
the insn that does the def is retained as well, if there is any other use.
But a combination of a def with a later use is never tried if the earliest
use does not combine.

> and maintaining such a count per DEF would cost.  So more practical might
> be to limit the number of attempts to combine into an (unchanged?) insn?
>
> Basically I would hope with a hard limit in place we'd not stop after the
> first half of a BB, leaving trivial combinations in the second half
> unhandled, but instead somehow throttle the "expensive" cases?

Ideally we'll not do *any* artificial limitations.  But GCC just throws its
hat in the ring in other cases as well, say, too big RA problems.  You do not
get as high-quality code as wanted, but at least you get something compiled
in an acceptable timeframe :-)
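
As a purely illustrative aside (this is a standalone toy, not anything from
combine.cc; the names insn, run_bb, attempts_per_link, and
max_attempts_per_bb are all made up here), the counting argument and the
simple per-BB cap being discussed can be sketched in a few lines of C++:
each insn has at most one loglink to its nearest earlier def, only a bounded
number of combination shapes is tried per loglink, so the total number of
attempts is linear in the number of insns, and a cap just stops trying once
the budget is used up.

  #include <cstddef>
  #include <cstdio>
  #include <vector>

  /* One "insn" in the toy model.  loglink is the index of the nearest
     earlier insn whose result this one uses, or -1 if there is none,
     i.e. at most one loglink per insn, as in combine.  */
  struct insn
  {
    int loglink = -1;
  };

  /* Walk one "basic block" and count combination attempts.
     attempts_per_link models the bounded number of 2-/3-/4-insn shapes
     tried per loglink; max_attempts_per_bb == 0 means "no limit".  */
  static unsigned
  run_bb (const std::vector<insn> &bb, unsigned attempts_per_link,
          unsigned max_attempts_per_bb)
  {
    unsigned attempts = 0;
    for (const insn &i : bb)
      {
        if (i.loglink < 0)
          continue;
        for (unsigned k = 0; k < attempts_per_link; k++)
          {
            if (max_attempts_per_bb && attempts >= max_attempts_per_bb)
              return attempts;  /* Budget gone: stop trying combinations.  */
            attempts++;         /* One (most likely failing) attempt.  */
          }
      }
    return attempts;
  }

  int
  main ()
  {
    /* A block where every second insn feeds the next one.  */
    std::vector<insn> bb (100000);
    for (std::size_t n = 1; n < bb.size (); n += 2)
      bb[n].loglink = (int) n - 1;

    /* Unlimited: attempts are (number of loglinks) * (shapes per link),
       so linear in the number of insns, with a constant factor.  */
    printf ("unlimited: %u attempts for %zu insns\n",
            run_bb (bb, 3, 0), bb.size ());

    /* With the hypothetical cap, the tail of the block is simply not
       tried any more once the budget runs out.  */
    printf ("capped:    %u attempts\n", run_bb (bb, 3, 100000));
    return 0;
  }

Note that this naive cap has exactly the drawback raised in comment #36: once
the budget is exhausted, even trivial combinations later in the block are no
longer tried, so a real --param in combine.cc would want something smarter
than this.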