From: "olegendo at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/64305] New: [SH] Add support for fschg insn and 64 bit FP moves
Date: Sun, 14 Dec 2014 13:56:00 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64305

            Bug ID: 64305
           Summary: [SH] Add support for fschg insn and 64 bit FP moves
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: olegendo at gcc dot gnu.org
            Target: sh*-*-*

Currently, 64 bit FP moves are used only for handling DFmode types, and only when the option -mfmovd is specified. The way it's done right now works only on SH4A and SH2A, since FPSCR.SZ is tied to FPSCR.PR. On SH4A and SH2A, loading DFmode types from memory using 64 bit FP moves (FPSCR.SZ = 1) performs little/big endian swapping if FPSCR.PR = 1.
If FPSCR.PR = 0, the two 32 bit halves are loaded as a pair of SFmode values in big endian order.

On SH4, 64 bit FP moves are only defined for FPSCR.SZ = 1 and FPSCR.PR = 0, which allows loading of DFmode values in big endian ordering only. 64 bit FP moves can be used for accessing DFmode types in memory on SH4 little endian, but the memory layout for those values would have to be half little-endian, half big-endian. This could be realized with some optional -m setting.

64 bit FP moves with FPSCR.SZ = 1 and FPSCR.PR = 0 can be utilized on SH4, SH4A and SH2A for doing SFmode vector loads, since the order of the vector elements is endian invariant. E.g. the following

typedef float v4sf __attribute__ ((vector_size (16)));

float test (v4sf* x)
{
  return (*x)[0];
}

compiles to
        rts
        fmov.s  @r4,fr0

regardless of the endian mode.

What is currently lacking to realize the above is FPSCR.SZ mode switching. Notice that the 'fschg' insn is only valid when FPSCR.PR = 0 on all FPU enabled cores (SH2A, SH4, SH4A). Thus FPSCR.SZ mode switching depends on FPSCR.PR mode switching to some extent. On SH2A and SH4, FPSCR.PR mode switching is done using sts-modify-lds sequences of FPSCR, since there is no fpchg insn. If SZ and PR mode switching is done independently, multiple FPSCR mode switches might need combining for better efficiency.

In some cases FP register-to-register moves and loads/stores of adjacent SFmode values can also be done via 64 bit FP moves. Recently a new pass 'pass_sched_fusion' has been added, which tries to fuse such adjacent loads/stores. On SH, 64 bit FP moves can only operate on even register numbers, thus fusing loads/stores has an impact on the register allocation. The new pass 'pass_sched_fusion', on the other hand, is done before peephole2, which is after register allocation/reload, and thus will probably not be that useful on SH.
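To make the even-register constraint concrete, here is a minimal sketch of the check a fusion step would have to pass before two adjacent SFmode accesses could become one 64 bit FP move. The struct and the function name are hypothetical and are not taken from GCC. Address alignment and the current FPSCR.SZ/PR state are deliberately not modeled.

```c
#include <stdbool.h>

/* One SFmode load or store, reduced to what matters for pairing.
   Hypothetical type for this sketch, not a GCC data structure.  */
struct sf_access
{
  int fpreg;     /* n of the FP register frN */
  int base_reg;  /* general register holding the base address */
  int offset;    /* byte offset from the base address */
  bool is_load;  /* true for a load, false for a store */
};

/* Return true if LO and HI could be replaced by a single 64 bit FP move:
   same direction, same base register, adjacent 4 byte slots, and a
   register pair that starts at an even register number.  */
static bool
can_fuse_sf_pair (struct sf_access lo, struct sf_access hi)
{
  return lo.is_load == hi.is_load
         && lo.base_reg == hi.base_reg
         && hi.offset == lo.offset + 4
         && lo.fpreg % 2 == 0
         && hi.fpreg == lo.fpreg + 1;
}
```

A pair fr0/fr1 loaded from @(0,r4) and @(4,r4) passes the check, while fr1/fr2 with the same addresses does not. This is why the decision really belongs before register allocation: after reload the register numbers are fixed and an odd starting register cannot be repaired cheaply.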
Whether using a 64 bit FP move (either for SFmode vectors, DFmode types or fused SFmode access) will be beneficial or not depends on the surrounding code and the number of FPSCR.SZ mode switches that need to be inserted. If insns can't be grouped to minimize mode switches (see PR 64299) it might be better to split 64 bit FP moves into 32 bit FP moves.
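The trade-off can be illustrated with a deliberately crude cost model that counts insns only. The enum, the function names and the rule "keep 64 bit moves only if the switches cost fewer insns than splitting" are assumptions made for this sketch and do not describe what GCC does. It also assumes that every switch is a single fschg (i.e. FPSCR.PR = 0 throughout) and that the code must start and end in 32 bit mode.

```c
#include <stddef.h>

/* FPSCR.SZ requirement of one insn (hypothetical classification).  */
enum sz_req { SZ_ANY, SZ_32, SZ_64 };

/* Count the fschg insns needed to execute SEQ in order, starting in
   32 bit mode and returning to 32 bit mode afterwards.  */
static int
count_sz_switches (const enum sz_req *seq, size_t n)
{
  enum sz_req cur = SZ_32;
  int switches = 0;

  for (size_t i = 0; i < n; i++)
    if (seq[i] != SZ_ANY && seq[i] != cur)
      {
        cur = seq[i];
        switches++;
      }

  if (cur != SZ_32)
    switches++;  /* restore the assumed default mode */
  return switches;
}

/* Splitting each 64 bit move costs one extra insn per move and needs no
   switches.  Keep the 64 bit moves only if the switches are cheaper.  */
static int
keep_64bit_moves (const enum sz_req *seq, size_t n)
{
  int n64 = 0;

  for (size_t i = 0; i < n; i++)
    if (seq[i] == SZ_64)
      n64++;

  return count_sz_switches (seq, n) < n64;
}
```

With the sequence { SZ_64, SZ_32, SZ_64, SZ_32 } the model counts four switches for two 64 bit moves, so splitting wins. With { SZ_64, SZ_64, SZ_64, SZ_32 } it counts two switches for three 64 bit moves, so keeping them wins. A real heuristic would also have to weigh latencies and the PR-dependent cost of switching SZ.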