From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 5258 invoked by alias); 17 Mar 2014 09:09:22 -0000 Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-bugs-owner@gcc.gnu.org Received: (qmail 5221 invoked by uid 48); 17 Mar 2014 09:09:16 -0000 From: "olegendo at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug target/53513] SH Target: Add support for fschg and fpchg insns Date: Mon, 17 Mar 2014 09:09:00 -0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: target X-Bugzilla-Version: 4.8.0 X-Bugzilla-Keywords: X-Bugzilla-Severity: normal X-Bugzilla-Who: olegendo at gcc dot gnu.org X-Bugzilla-Status: NEW X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-SW-Source: 2014-03/txt/msg01374.txt.bz2 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513 --- Comment #7 from Oleg Endo --- (In reply to Rich Felker from comment #6) > On Sun, Mar 16, 2014 at 11:32:21PM +0000, olegendo at gcc dot gnu.org wrote: > > If it's OK for a temporary mode switch to clobber other FPSCR bits (such as in > > the PR = single mode above), it should also be OK to load the FPSCR value from > > a thread local variable inside the thread-control-block: > > > > mov.l @(, gbr),r0 // r0 = address to fpscr value for a > > // particular mode setting > > lds.l @r0+,fpscr // mode switch > > IMO this is an ugly hack that shouldn't be taken. It has lots of > complex interactions with other things: signal handlers, the > ucontext.h functions, fenv, pthread, etc. Could you please elaborate on the problems you see there? > that could probably be > achieved correctly if somebody wanted to spend the effort on it, but > it would be ugly and SH-specific and honestly we already have a > shortage of people willing to spend time fixing SH problems without > introducing even more work. I failed to mention, that the idea with TLS variables was meant to be a separate option. If implemented in the compiler it would need to be turned on explicitly, similar to the option '-matomic-model=soft-tcb,gbr-offset='. Thus, if a libc/kernel/whatever doesn't want to make use of it, it won't have to. > > > This would require that any FPSCR setting change is also propagated to > > the TLS variables. E.g. setting the rounding mode would have to update > > FPSCR mode values in all such TLS variables. > > I guess that it would be useful to be able to select an FPSCR value for at > > least all combinations of FR and SZ bit settings, in other words having > > a TLS __fpscr_values array with 4 entries. However, it would make things > > such as toggling FPSCR.FR via frchg inefficient due to the required updates > > of the TLS variables. Other setting changes such as denorm or rounding > > mode are probably not so critical. > > If I'm not mistaken, the toggle approach will be efficient without > this TLS hack once it's implemented, right? No. The toggle approach is only efficient on SH4A. Other SH FPUs don't implement the fpchg instruction and require sts-lds fpscr sequences, regardless of toggling or explicitly setting the PR mode. > I don't think it makes > sense to introduce a hack for just a short-term mitigation of the > performance regression. If you think the short-term fix for this issue > is too costly, the proper solution is probably to add a -m option to > turn it back off (using the old __fpscr_values approach). This was not meant to be a short-term mitigation. In fact figuring out whether FPSCR bits can be clobbered during a PR mode switch or not is already not so simple. If at all, it would be an additional optimization opportunity after everything else is in place. Keeping the 'global __fpscr_values' approach as an option could be useful for thread model = single, or for bare-metal applications where the rounding mode etc is never changed from the default and FP status bits are never read back by application code.