From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48) id 85F49383303D; Thu, 24 Jun 2021 12:11:37 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 85F49383303D
From: "linkw at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/100328] IRA doesn't model matching constraint well
Date: Thu, 24 Jun 2021 12:11:36 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: linkw at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 24 Jun 2021 12:11:37 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328

--- Comment #3 from Kewen Lin ---
(In reply to Vladimir Makarov from comment #2)
> (In reply to Kewen Lin from comment #1)
> > Created attachment 50715 [details]
> > ira:consider matching cstr in all alternatives
> >
> > With little understanding on ira, I am not quite sure this patch is on the
> > reasonable direction. It aims to check the matching constraint in all
> > alternatives, if there is one alternative with matching constraint and
> > matches the current preferred regclass, it will record the output operand
> > number and further create one copy for it.
> > Normally it can get the priority
> > against shuffle copies and the matching constraint will get satisfied with
> > higher possibility, reload doesn't create extra copies to meet the matching
> > constraint or the desirable register class when it has to.
> >
> > For FMA A,B,C,D, I think ideally copies A/B, A/C, A/D can firstly stay as
> > shuffle copies, and later any of A,B,C,D gets assigned by one hardware
> > register which is a VSX register but not a FP register, which means it has
> > to pay costs once we can NOT go with VSX alternatives, so at that time we
> > can increase the freq for the remaining copies related to this, once the
> > matching constraint gets satisfied further, there aren't any extra costs to
> > pay. This idea seems a bit complicated in the current framework, so the
> > proposed patch aggressively emphasizes the matching constraint at the time
> > of creating copies.
> >
> > FWIW bootstrapped/regtested on powerpc64le-linux-gnu P9. The evaluation with
> > Power9 SPEC2017 all run shows 505.mcf_r +2.98%, 508.namd_r +3.37%, 519.lbm_r
> > +2.51%, no remarkable degradation is observed.
>
> Thank you for working on this issue.
>
> The current implementation of ira_get_dup_out_num was specifically tuned for
> better register allocation for x86-64 div insns.
>
> Your patch definitely improves code for power9 and I would say significantly
> (congratulations!). The patch you proposed makes me think that it might
> work for major targets as well.
>
> I would prefer to avoid introducing new parameter because there are too many
> of them already and its description is cryptic.
>

Thanks for your comments, Vladimir!
Yeah, Segher also thought it could benefit other targets and suggested
making it on by default. I've made this parameter on by default in v2; if
it's fine on x86-64 and aarch64 after some testing and benchmarking later, I
think we can simply get rid of the parameter as you suggested.

> It would be nice if you benchmark the patch on x86-64 too, If there is no
> overall degradation with new behaviour we could remove the parameter and
> make the new behaviour as a default. If it is not, well we will keep the
> parameter.
>

Sorry, I don't have an x86-64 or aarch64 performance machine at hand. The
new version v2 was bootstrapped/regtested on powerpc64le-linux-gnu P9 and
x86_64-redhat-linux, but had some failures on aarch64, which I am still
investigating. Once they are root-caused and fixed, I will ask folks around
to help benchmark this.

> As for the patch itself, I don't like some variable names. Sorry. Could
> you use op_regno, out_regno, and present_alt instead of op_no, out_no, tot.
> Please, in general use longer variable names reflecting their purpose as GCC
> developers reads code in many times more than writing it.

Got it, thanks for the suggestion! This part has been simplified with
recog_op_alt, hope it looks better.
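
For readers unfamiliar with matching constraints, the check described in the
quoted comment #1 (look through all alternatives for one where an input is
tied to the output and the output's constraint fits the preferred class) can
be sketched with a toy model. This is only an illustration in Python, not the
code in the attachment and not GCC's ira_get_dup_out_num. The function names,
the constraint strings for the FMA-like instruction, and the use of a set of
constraint names to stand in for "the preferred regclass" are all assumptions
made for the example. The only GCC facts relied on are that alternatives in a
constraint string are separated by commas and that a digit is a matching
constraint naming another operand.

```python
# Toy model only: not GCC code. Names and data are hypothetical.

def split_alternatives(constraint):
    """Split one operand's constraint string into per-alternative strings."""
    return constraint.split(",")


def find_matched_output(constraints, op_num, preferred):
    """Scan every alternative of operand `op_num`.

    If some alternative ties the operand to another operand through a
    matching constraint (a digit), and that operand's constraint in the
    same alternative is in `preferred`, return the matched operand number.
    Return None when no alternative qualifies.
    """
    alts = [split_alternatives(c) for c in constraints]
    for alt_idx, op_alt in enumerate(alts[op_num]):
        digits = [ch for ch in op_alt if ch.isdigit()]
        if not digits:
            continue  # no matching constraint in this alternative
        out_num = int(digits[0])
        # Drop modifier characters such as '=', '+', '&', '%'.
        out_alt = alts[out_num][alt_idx].lstrip("=+&%")
        if out_alt in preferred:
            return out_num
    return None


# Hypothetical FMA-like insn: operand 0 is the output, 1..3 are inputs.
# "d" and "wa" are used here as illustrative register-class constraints.
fma = ["=d,wa,wa", "%d,wa,wa", "d,wa,0", "d,0,wa"]

print(find_matched_output(fma, 2, {"wa"}))
print(find_matched_output(fma, 1, {"wa"}))
```

In this model operands 2 and 3 are each tied to operand 0 in one alternative,
so a copy between them and the output would be recorded when "wa" is the
preferred class, while operand 1 has no matching constraint in any
alternative.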