public inbox for gcc-bugs@sourceware.org
* [Bug target/29969]  New: should use floating point registers for block copies
@ 2006-11-24 12:03 amylaar at gcc dot gnu dot org
  2006-11-25 23:36 ` [Bug target/29969] " pinskia at gcc dot gnu dot org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: amylaar at gcc dot gnu dot org @ 2006-11-24 12:03 UTC (permalink / raw)
  To: gcc-bugs

In integer-dominated code, it is often useful to use floating point
registers to do block copies.  If suitable alignment is available,
64 bit loads / stores allow the copy to be done with half as many memory
operations.  If the source is loop invariant, the loads can be
hoisted out of the loop; register pressure usually makes this
infeasible for integer registers.
The destination and, if it is not loop invariant, the source need to be
at least 32 bit aligned for this to be profitable (or at least there must
be a known constant offset to such an alignment.  At -O3, preconditioning
could be used to cover all possible offsets and select the code at
run-time).  Also, a minimum size is required.  The total size need not be
aligned, as smaller pieces can be copied in integer registers.
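
To make the "half as many memory operations" point concrete, here is a
rough C sketch that copies a block in pieces of a given maximum width and
counts the loads and stores.  The name copy_block_wide and the counter are
hypothetical, and uint64_t merely stands in for a 64 bit floating point
register move; this is not what the SH backend emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: copy SIZE bytes using pieces of at most MAX_PIECE
   bytes (4 models integer registers, 8 models 64 bit FP register moves).
   Leftover bytes are copied in smaller power-of-two pieces.
   Returns the number of load + store operations performed.  */
static unsigned
copy_block_wide (void *dst, const void *src, size_t size, size_t max_piece)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  unsigned ops = 0;

  assert (max_piece == 4 || max_piece == 8);
  for (size_t piece = max_piece; piece >= 1; piece /= 2)
    while (size >= piece)
      {
        /* A fixed-size memcpy stands in for one load and one store
           of that width.  */
        memcpy (d, s, piece);
        d += piece;
        s += piece;
        size -= piece;
        ops += 2;
      }
  return ops;
}
```

For a 32 byte block this model needs 8 operations with 64 bit pieces and
16 with 32 bit pieces; a 31 byte block finishes with 4, 2 and 1 byte pieces.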

A testcase for this is the main loop of dhrystone, where
the two strings fit into 4 64-bit values each (after padding),
and CSE allows them to fit in 5 64-bit values together.
Four of these fit into the call saved registers dr12, dr14, xd12 and xd14,
thus their loads can be hoisted out of the loop.
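
The arithmetic can be checked with a small C sketch.  It assumes that the
two strings are the Dhrystone 2.1 literals "DHRYSTONE PROGRAM, 2'ND STRING"
and "DHRYSTONE PROGRAM, 3'RD STRING" (the report does not name them), and
distinct_words is a hypothetical helper, not anything in GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_WORDS 16

/* Hypothetical helper: pad each string (including its terminating NUL)
   with zeros to a multiple of 8 bytes, split it into 64-bit words, and
   count how many distinct word values are needed to hold both strings.  */
static size_t
distinct_words (const char *a, const char *b)
{
  const char *strs[2] = { a, b };
  uint64_t seen[MAX_WORDS];
  size_t n_seen = 0;

  for (int i = 0; i < 2; i++)
    {
      unsigned char buf[MAX_WORDS / 2 * 8] = { 0 };
      size_t len = strlen (strs[i]) + 1;   /* include the NUL */
      size_t n_words = (len + 7) / 8;      /* round up to 64 bits */

      assert (len <= sizeof buf);
      memcpy (buf, strs[i], len);
      for (size_t w = 0; w < n_words; w++)
        {
          uint64_t word;
          size_t k;

          memcpy (&word, buf + 8 * w, 8);
          for (k = 0; k < n_seen; k++)
            if (seen[k] == word)
              break;
          if (k == n_seen)
            seen[n_seen++] = word;   /* first time this value appears */
        }
    }
  return n_seen;
}
```

Under that assumption each string is 31 bytes with its NUL, so 4 words
after padding, and only the third word differs between the two, giving
5 distinct values.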

The tree of the current function could be examined for heuristics to
determine if using floating point registers for block copies makes sense
(look for high integer register pressure and low floating point register
pressure - call saved registers if a loop invariant crosses a call).  The
heuristics might also take different integer / floating point memory
latencies into account if the block is relatively short, by checking if
there appear to be a sufficient number of other instructions to hide some
of the latency.
Alternatively or additionally, an option and/or parameters used in the
heuristics can be used to control the behaviour.

To increase the incidence of suitably aligned copies, constant alignment and
data alignment for block copy destinations of suitable size which are
defined in the current compilation unit should be increased to 64 bit,
and such data items should also be padded to 64 bits.
This may be controlled by an invocation option.
(If the last 64 bit item would contain no more than 32 bits, and the
register pressure is too high to hoist out all loads, padding to fit 8
/ 16 / 32 bit is sufficient.  The latter padding is useful for integer
copies in general.)
When doing LTO, this might be expanded to items which are defined in other
compilation units, and to special cases of indirect references.
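
One possible reading of the padding rule, as a C sketch.  The function
padded_block_size, its hoist_all_loads parameter and the example_dest
object are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the padding rule.  SIZE is the object size in
   bytes.  If not all loads can be hoisted and the final 64 bit item would
   hold at most 32 bits, pad the tail only to 1, 2 or 4 bytes; otherwise
   pad the whole object to a multiple of 8 bytes.  */
static size_t
padded_block_size (size_t size, int hoist_all_loads)
{
  size_t tail = size % 8;

  assert (size > 0);
  if (tail == 0)
    return size;
  if (!hoist_all_loads && tail <= 4)
    {
      size_t piece = 1;
      while (piece < tail)
        piece *= 2;
      return size - tail + piece;
    }
  return size - tail + 8;
}

/* A block copy destination defined in this compilation unit, with its
   alignment raised to 64 bits.  */
static _Alignas (8) char example_dest[32];
```

With this reading a 19 byte object is padded to 24 bytes when all loads
are hoisted, and to 20 bytes otherwise.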

The actual copy is best done exploiting post-increment for load and
pre-decrement for store, and is thus highly machine specific.  It therefore
seems best to do this in sh.c:expand_block_move.
Thus, STORE_BY_PIECES_P and MOVE_BY_PIECES_P will have to reject the
size and alignment combinations of copies that we want to handle this way.
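
The addressing pattern can be sketched in portable C.  This only models
the order of accesses; copy_postinc_predec and MAX_REGS are hypothetical,
and uint64_t stands in for a 64 bit floating point register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGS 4   /* hypothetical number of 64 bit registers used */

/* Hypothetical sketch: copy N_WORDS 64 bit words (at most MAX_REGS).
   All loads walk the source upwards with post-increment; all stores walk
   the destination downwards with pre-decrement, starting just past its
   end.  */
static void
copy_postinc_predec (uint64_t *dst, const uint64_t *src, size_t n_words)
{
  uint64_t regs[MAX_REGS];
  uint64_t *d = dst + n_words;   /* one past the last destination word */
  size_t i;

  assert (n_words <= MAX_REGS);
  for (i = 0; i < n_words; i++)
    regs[i] = *src++;            /* load, then bump the address */
  while (i > 0)
    *--d = regs[--i];            /* step the address back, then store */
}
```

Because the stores run backwards, the destination address register has to
start at the end of the block, which is one reason the expansion is tied
to the machine-specific expander rather than the generic by-pieces code.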

Due to a quirk in the SH4 specification, we need a third fp_mode value
for 64 bit loads / stores (unless FMOVD_WORKS is true).
This mode has FPSCR.PR cleared and FPSCR.SZ set.
To get the full benefit for copies that are in a loop that does calls,
we should fix rtl-optimization/29349 first.
When using the -m4-single ABI, the new mode can be generated from the
normal mode by issuing one fschg instruction; we can switch back with
another fschg instruction.
For -m4a or -m4-300, we need both an fpchg and an fschg; -m4 must load
the new mode from a third value in fpscr_values.
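
The mode switching can be modelled with a few lines of C.  The FPSCR bit
positions below (PR = bit 19, SZ = bit 20) are as I understand them from
the SH-4 manual and should be treated as an assumption, as should the
premise that the normal mode has PR clear for -m4-single and PR set for
-m4a / -m4-300.  The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed FPSCR bit positions.  */
#define FPSCR_PR (UINT32_C (1) << 19)   /* precision: 1 = double */
#define FPSCR_SZ (UINT32_C (1) << 20)   /* transfer size: 1 = 64 bit fmov */

/* Models of the toggle instructions: fschg flips SZ, fpchg flips PR.  */
static uint32_t fschg (uint32_t fpscr) { return fpscr ^ FPSCR_SZ; }
static uint32_t fpchg (uint32_t fpscr) { return fpscr ^ FPSCR_PR; }

/* The proposed third mode: PR cleared, SZ set.  */
static int
is_copy_mode (uint32_t fpscr)
{
  return (fpscr & (FPSCR_PR | FPSCR_SZ)) == FPSCR_SZ;
}

/* -m4-single: normal mode assumed PR = 0, SZ = 0; one fschg suffices.  */
static uint32_t
enter_copy_mode_single (uint32_t fpscr)
{
  assert ((fpscr & (FPSCR_PR | FPSCR_SZ)) == 0);
  return fschg (fpscr);
}

/* -m4a / -m4-300: normal mode assumed PR = 1, SZ = 0; needs fpchg
   and fschg.  */
static uint32_t
enter_copy_mode_double (uint32_t fpscr)
{
  assert ((fpscr & (FPSCR_PR | FPSCR_SZ)) == FPSCR_PR);
  return fschg (fpchg (fpscr));
}
```

Since both instructions are toggles, issuing the same sequence again
returns to the normal mode.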

The actual loads and stores must not look like ordinary SImode or DImode
loads and stores, because that would give - via GO_IF_LEGITIMATE_ADDRESS -
the wrong message to the optimizers about the available addressing modes.
Moreover, POST_INC / PRE_DEC are currently not allowed at rtl generation
time.
A possible solution is to use patterns that pair the load / store
with an explicit set of the address register.  I'd prefer to use
two match_dup to keep the address register in sync, since otherwise
the optimizers can too easily hijack the pattern for something inappropriate.
The MEMs should probably use SFmode / DFmode and be wrapped in an
SImode / DImode unspec; however, care must be taken to still get the
right alias set for the MEM.


-- 
           Summary: should use floating point registers for block copies
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: amylaar at gcc dot gnu dot org
GCC target triplet: sh4*-*-*
 BugsThisDependsOn: 29349
OtherBugsDependingOnThis: 29842


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29969



* [Bug target/29969] should use floating point registers for block copies
  2006-11-24 12:03 [Bug target/29969] New: should use floating point registers for block copies amylaar at gcc dot gnu dot org
@ 2006-11-25 23:36 ` pinskia at gcc dot gnu dot org
  2010-06-10 16:43 ` marc dot mengel at gmail dot com
  2010-06-10 19:59 ` amylaar at gcc dot gnu dot org
  2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2006-11-25 23:36 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from pinskia at gcc dot gnu dot org  2006-11-25 23:36 -------
Confirmed.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2006-11-25 23:36:47
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29969



* [Bug target/29969] should use floating point registers for block copies
  2006-11-24 12:03 [Bug target/29969] New: should use floating point registers for block copies amylaar at gcc dot gnu dot org
  2006-11-25 23:36 ` [Bug target/29969] " pinskia at gcc dot gnu dot org
@ 2010-06-10 16:43 ` marc dot mengel at gmail dot com
  2010-06-10 19:59 ` amylaar at gcc dot gnu dot org
  2 siblings, 0 replies; 4+ messages in thread
From: marc dot mengel at gmail dot com @ 2010-06-10 16:43 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from marc dot mengel at gmail dot com  2010-06-10 16:43 -------
This could be a disaster if floating point exceptions are enabled, as it would
trigger an exception whenever some part of the block was an invalid floating
point number.
One would at least need to save/restore the floating point exception flags
around such a block copy.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29969



* [Bug target/29969] should use floating point registers for block copies
  2006-11-24 12:03 [Bug target/29969] New: should use floating point registers for block copies amylaar at gcc dot gnu dot org
  2006-11-25 23:36 ` [Bug target/29969] " pinskia at gcc dot gnu dot org
  2010-06-10 16:43 ` marc dot mengel at gmail dot com
@ 2010-06-10 19:59 ` amylaar at gcc dot gnu dot org
  2 siblings, 0 replies; 4+ messages in thread
From: amylaar at gcc dot gnu dot org @ 2010-06-10 19:59 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from amylaar at gcc dot gnu dot org  2010-06-10 19:58 -------
(In reply to comment #2)
> This could be a disaster if floating point exceptions are enabled, as it would
> trigger an exception whenever some part of the block was an invalid floating
> point number.
> One would at least need to save/restore the floating point exception flags
> around such a block copy.

Whether that is an issue is target specific.  First, it depends on whether
all floating point loads detect invalid floating point numbers being loaded,
and second, on whether floating point exceptions are used for the target in
practical terms.
E.g. for the SH4 (at least SH4-100 / SH4-200), you always get a floating
point exception if the exception is enabled and it could happen for any
input data, so it is not very popular to enable it.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29969


