public inbox for gcc-bugs@sourceware.org
From: "amylaar at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/29969] New: should use floating point registers for block copies
Date: Fri, 24 Nov 2006 12:03:00 -0000
Message-ID: <bug-29969-5394@http.gcc.gnu.org/bugzilla/>

In integer-dominated code, it is often useful to use floating point registers to do block copies. If suitable alignment is available, 64 bit loads / stores allow the copy to be done with half as many memory operations. If the source is loop invariant, the loads can be hoisted out of the loop; register pressure usually makes this infeasible for integer registers.

The destination, and, if not loop invariant, the source need to be at least 32 bit aligned for this to be profitable (or at least there must be a known constant offset to such an alignment; at -O3, preconditioning could be used to cover all possible offsets and select the code at run-time). Also, a minimum size is required. The total size need not be aligned, as smaller pieces can be copied in integer registers.

A testcase for this is the main loop of dhrystone, where the two strings fit into 4 64-bit values each (after padding), and cse allows them to fit in 5 64-bit values together. Four of these fit into the call saved registers dr12, dr14, xd12 and xd14, thus their loads can be hoisted out of the loop.

The tree of the current function could be examined for heuristics to determine if using floating point registers for block copies makes sense: look for high integer register pressure and low floating point register pressure (call saved registers if a loop invariant crosses a call). The heuristics might also take different integer / floating point memory latencies into account if the block is relatively short, by checking if there appear to be a sufficient number of other instructions to hide some of the latency. Alternatively or additionally, an option and/or parameters used in the heuristics can be used to control the behaviour.
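The eligibility conditions above (alignment of destination and source, minimum size, unaligned tail handled in integer registers) can be sketched as a small C predicate. This is only an illustration, not GCC code: the names `plan_block_copy` and `struct copy_plan` are made up here, and the threshold `FP_COPY_MIN_BYTES` is an assumed value, since the report does not give one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold; the report only says a minimum size is required. */
#define FP_COPY_MIN_BYTES 16

/* Hypothetical description of how one block copy would be split up. */
struct copy_plan {
    bool use_fp;      /* true if the 64-bit FP register path is chosen */
    size_t moves64;   /* 64-bit load/store pairs done in FP registers */
    size_t moves32;   /* residual pieces copied in integer registers */
    size_t moves16;
    size_t moves8;
};

static struct copy_plan
plan_block_copy(size_t bytes, unsigned dst_align_bits,
                unsigned src_align_bits, bool src_loop_invariant)
{
    struct copy_plan plan = { false, 0, 0, 0, 0 };

    /* The destination always needs at least 32 bit alignment. */
    if (bytes < FP_COPY_MIN_BYTES || dst_align_bits < 32)
        return plan;
    /* The source alignment only matters if its loads stay in the loop. */
    if (!src_loop_invariant && src_align_bits < 32)
        return plan;

    plan.use_fp = true;
    plan.moves64 = bytes / 8;
    /* The total size need not be aligned: split the tail into 32/16/8 bit
       pieces for the integer registers. */
    size_t rest = bytes % 8;
    plan.moves32 = rest / 4;
    rest %= 4;
    plan.moves16 = rest / 2;
    rest %= 2;
    plan.moves8 = rest;
    return plan;
}
```

For a 30 byte block with 32 bit aligned source and destination, this yields three 64-bit moves plus one 32-bit and one 16-bit integer move. A real implementation would additionally weigh register pressure and latency as described above; this sketch leaves that out.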
To increase the incidence of suitably aligned copies, constant alignment and data alignment for block copy destinations of suitable size which are defined in the current compilation unit should be increased to 64 bit, and such data items should also be padded to 64 bits. This may be controlled by an invocation option. (If the last 64 bit item would contain no more than 32 bits, and the register pressure is too high to hoist out all loads, padding to fit 8 / 16 / 32 bit is sufficient. The latter padding is useful for integer copies in general.) When doing LTO, this might be expanded to items which are defined in other compilation units, and to special cases of indirect references.

The actual copy is best done exploiting post-increment for load and pre-decrement for store, and is thus highly machine specific. It therefore seems best to do this in sh.c:expand_block_move. Thus, STORE_BY_PIECES_P and MOVE_BY_PIECES_P will have to reject the size and alignment combinations of copies that we want to handle this way.

Due to a quirk in the SH4 specification, we need a third fp_mode value for 64 bit loads / stores (unless FMOVD_WORKS is true). This mode has FPSCR.PR cleared and FPSCR.SZ set. To get the full benefit for copies that are in a loop that does calls, we should fix rtl-optimization/29349 first. When using the -m4-single ABI, the new mode can be generated from the normal mode by issuing one fschg instruction; we can switch back with another fschg instruction. For -m4a or -m4-300, we need both an fpchg and an fschg; -m4 must load the new mode from a third value in fpscr_values.

The actual loads and stores must not look like ordinary SImode or DImode loads and stores, because that would give (via GO_IF_LEGITIMATE_ADDRESS) the wrong message to the optimizers about the available addressing modes. Moreover, POST_INC / PRE_DEC are currently not allowed at rtl generation time.
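The FPSCR mode switching described above can be modelled in plain C to check that the stated instruction sequences reach the mode with PR cleared and SZ set. This is a sketch, not compiler code: the helper names `fschg`, `fpchg` and `block_copy_fpscr` are made up here to mimic the effect of the instructions on the register value, and the "normal" modes are assumed to be PR=0, SZ=0 for -m4-single and PR=1, SZ=0 for -m4a / -m4-300.

```c
#include <stdint.h>

/* FPSCR bit positions on SH4: PR selects precision, SZ the transfer size. */
#define FPSCR_PR (UINT32_C(1) << 19)
#define FPSCR_SZ (UINT32_C(1) << 20)

/* Model of the fschg instruction: toggles FPSCR.SZ. */
static uint32_t fschg(uint32_t fpscr)
{
    return fpscr ^ FPSCR_SZ;
}

/* Model of the fpchg instruction: toggles FPSCR.PR. */
static uint32_t fpchg(uint32_t fpscr)
{
    return fpscr ^ FPSCR_PR;
}

/* The third fp_mode value wanted for 64 bit loads / stores:
   PR cleared, SZ set, all other bits unchanged. */
static uint32_t block_copy_fpscr(uint32_t normal)
{
    return (normal & ~FPSCR_PR) | FPSCR_SZ;
}
```

Starting from the assumed single-precision default, `fschg(0)` equals `block_copy_fpscr(0)`, and a second `fschg` restores the original value. Starting from the assumed double-precision default, `fschg(fpchg(FPSCR_PR))` equals `block_copy_fpscr(FPSCR_PR)`. Without fpchg, as with -m4, the value has to be loaded instead.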
A possible solution to the addressing mode problem is to use patterns that pair the load / store with an explicit set of the address register. I'd prefer to use two match_dup to keep the address register in sync, since otherwise the optimizers can too easily hijack the pattern for something inappropriate. The MEMs are probably best using SFmode / DFmode, but wrapping them in an SImode / DImode unspec; however, care must be taken to still get the right alias set for the MEM.

-- 
           Summary: should use floating point registers for block copies
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: amylaar at gcc dot gnu dot org
GCC target triplet: sh4*-*-*
 BugsThisDependsOn: 29349
OtherBugsDependingOnThis: 29842

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29969
Thread overview: 4+ messages
2006-11-24 12:03 amylaar at gcc dot gnu dot org [this message]
2006-11-25 23:36 ` [Bug target/29969] " pinskia at gcc dot gnu dot org
2010-06-10 16:43 ` marc dot mengel at gmail dot com
2010-06-10 19:59 ` amylaar at gcc dot gnu dot org