From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 85854 invoked by alias); 6 May 2015 16:13:20 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 85713 invoked by uid 48); 6 May 2015 16:13:07 -0000
From: "robert.suchanek at imgtec dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/65862] [MIPS] IRA/LRA issue: integers spilled to floating-point registers
Date: Wed, 06 May 2015 16:13:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 5.1.1
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: robert.suchanek at imgtec dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-05/txt/msg00505.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65862

--- Comment #5 from Robert Suchanek ---
Sorry for the late reply, I was on vacation.

> The costs are equal if cost of moving general regs to/from fp regs or
> memory are equal. So it looks ok to me.
>
> r218 spilled in IRA is reassigned to a fp reg in *LRA*.
> But I could try to use preferred class in LRA (after checking how it
> affects x86/x86-64 performance), if such solution is ok for you.

Indeed, the above test case only shows the problem in LRA. If the
preferred class would be the winner, then why not. However, there are
still some issues with IRA, and I have another testcase to show it.

> I am not sure, that the result code is better as we access memory 3
> times instead of access to $f20.

On one hand, yes, it seems good, but it's not always desirable to use
FP regs unless absolutely necessary. For instance, a dynamic linker
compiled to use FP regs does not seem right.

I had another thought about spilling into registers and how we could
guarantee spilling into the desirable class. In the majority of cases
where integers end up in floating-point registers, I see the following
in the dumps:

...
  Reassigning non-reload pseudos
    Assign 52 to r217 (freq=46)
...

This use of FP registers is introduced here (in lra-assigns.c):

...
  if (n != 0 && lra_dump_file != NULL)
    fprintf (lra_dump_file, " Reassigning non-reload pseudos\n");
  qsort (sorted_pseudos, n, sizeof (int), pseudo_compare_func);
  for (i = 0; i < n; i++)
    {
      regno = sorted_pseudos[i];
      hard_regno = find_hard_regno_for (regno, &cost, -1, false);
      if (hard_regno >= 0)
        ...
      else
        ...
    }
...

find_hard_regno_for chooses the FP registers freely because the allocno
class is ALL_REGS. A quick hack in the if condition, skipping the body
for pseudos spilled to memory:

...
      if (hard_regno >= 0 && ! in_mem_p (regno))
...

forces the use of the TARGET_SPILL_CLASS hook and resolves spilling to
FP regs in over 95% of cases, but not entirely. In terms of code size,
this change gave a minor improvement in the average case. Would this
approach be the correct way to guarantee spilling to the desired class?
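
To illustrate the effect of that hack outside of GCC, here is a small
self-contained toy model in C. None of the names below are GCC
internals: struct pseudo, find_free_hard_reg and reassign are made up
for illustration, spilled_to_mem merely stands in for in_mem_p (regno),
and the four-register file (two general, two FP) is an assumption. It
only shows why an ALL_REGS pseudo drifts into an FP register once the
general registers are taken, and how skipping already-spilled pseudos
leaves the decision to a later step.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; these are NOT GCC data structures.  */
enum reg_class { GR_REGS, FP_REGS, ALL_REGS };

struct pseudo
{
  enum reg_class allocno_class; /* class chosen by the earlier pass */
  bool spilled_to_mem;          /* stands in for in_mem_p (regno) */
  int hard_regno;               /* -1 while unassigned */
};

#define N_HARD_REGS 4

/* Assumed register file: hard regs 0-1 are general, 2-3 are FP.  */
enum reg_class
hard_reg_class (int hard_regno)
{
  assert (hard_regno >= 0 && hard_regno < N_HARD_REGS);
  return hard_regno < 2 ? GR_REGS : FP_REGS;
}

/* Return the first free hard reg permitted by CL, or -1 if none.  */
int
find_free_hard_reg (enum reg_class cl, const bool *in_use)
{
  for (int r = 0; r < N_HARD_REGS; r++)
    if (!in_use[r] && (cl == ALL_REGS || hard_reg_class (r) == cl))
      return r;
  return -1;
}

/* Reassignment loop.  When SKIP_SPILLED is true, pseudos that already
   live in memory are left unassigned, mirroring the extra
   "! in_mem_p (regno)" test in the hack.  */
void
reassign (struct pseudo *p, int n, bool *in_use, bool skip_spilled)
{
  for (int i = 0; i < n; i++)
    {
      if (p[i].hard_regno >= 0)
        continue;
      int r = find_free_hard_reg (p[i].allocno_class, in_use);
      if (r >= 0 && !(skip_spilled && p[i].spilled_to_mem))
        {
          p[i].hard_regno = r;
          in_use[r] = true;
        }
    }
}
```

With both general registers busy, calling reassign with skip_spilled
false hands hard reg 2 (an FP register in this model) to a spilled
ALL_REGS pseudo; with skip_spilled true the same pseudo stays at -1.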

In the remaining 5% of cases, IRA assigns FP regs and LRA blindly
follows IRA's decisions, as in the following reduced case:

int a, b, d, e, j, k, n, o;
unsigned c, h, i, l, m, p;
int *f;
int *g;
int fn1(int p1) { return p1 - a; }
int fn2() {
  b = b + 1 - a;
  e = 1 + o + 1518500249;
  d = d + n;
  c = (int)c + g[0];
  b = b + m + 1;
  d = d + p + 1518500249;
  d = d + k - 1;
  c = fn1(c + j + 1518500249);
  e = fn1(e + i + 1);
  d = d + h + 1859775393 - a;
  c = fn1(c + (d ^ 1 ^ b) + g[1] + 1);
  b = fn1(b + m + 3);
  d = fn1(d + l + 1);
  b = b + (c ^ 1) + p + 1;
  e = fn1(e + (b ^ c ^ d) + n + 1);
  d = o;
  b = 0;
  e = e + k + 1859775393;
  f[0] = e;
}

I'm not sure how this could be fixed in LRA, and again this is related
to ALL_REGS for allocnos. Perhaps changing the class for reloads to the
spill class in LRA would do the trick, but it may have other problems.

My last attempt was to increase the cost of FP_REGS in IRA for integral
modes (similar in effect to increasing the costs of moving FP<>GR in
the backend), but the cost pass looks complicated and I'm not entirely
sure where to tweak it. Any suggestions/ideas?

> I tried reverting the ALL_REGS patch and I don't see any regressions - in
> fact allocations are slightly better (fewer registers with ALL_REGS
> preference which is what we need - a strong decision to allocate to either
> FP or int regs). So what was the motivation for it?

AFAICS, the aim was to fix a code generation regression for x86.

x86 doesn't seem to be affected as much as other targets. I did not
notice code size differences with -O2 and the default arch for the
x86_64-unknown-linux-gnu triplet on the CSiBE benchmark; -Os showed
some minor improvements/regressions, with the largest difference in
mpeg2dec-0.3.1 yielding a ~0.3% improvement. I haven't evaluated
performance changes. For MIPS, I also saw allocation improvements, more
erratic than on x86, with an improvement of about 0.5% on average.

Reverting the patch does bring the old issue back, but I wonder what
its impact is and whether the patch is a justifiable fix, to the extent
that it outweighs the disadvantages. Or maybe the original problem
could be fixed differently?
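
For reference, the FP_REGS cost idea mentioned earlier can be sketched
as a toy model in C. This is not how IRA's cost pass is structured:
pick_class, penalize_fp_for_int and the numeric costs are hypothetical,
and the rule "equal costs give ALL_REGS" is a deliberate simplification
of the tie discussed at the top of this comment. It only shows that a
small penalty on the FP side for integral modes is enough to turn an
ALL_REGS preference into a firm GR_REGS one.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; these are NOT GCC data structures.  */
enum reg_class { GR_REGS, FP_REGS, ALL_REGS };

struct class_costs
{
  int gr_cost; /* assumed cost of keeping the value in general regs */
  int fp_cost; /* assumed cost of keeping the value in FP regs */
};

/* Simplified class choice: the cheaper class wins, and a tie yields
   ALL_REGS, i.e. no firm preference.  */
enum reg_class
pick_class (struct class_costs c)
{
  if (c.gr_cost < c.fp_cost)
    return GR_REGS;
  if (c.fp_cost < c.gr_cost)
    return FP_REGS;
  return ALL_REGS;
}

/* Hypothetical tweak: make FP registers look more expensive, but only
   for pseudos with an integral mode.  */
struct class_costs
penalize_fp_for_int (struct class_costs c, bool integral_mode, int penalty)
{
  assert (penalty >= 0);
  if (integral_mode)
    c.fp_cost += penalty;
  return c;
}
```

Starting from equal costs, pick_class reports ALL_REGS; after
penalize_fp_for_int with integral_mode true and any positive penalty it
reports GR_REGS, while a non-integral pseudo is left unchanged.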