public inbox for gcc@gcc.gnu.org
* [SH4] DFmode splits after reload
@ 2003-12-12  8:52 Rakesh Kumar - Software, Noida
  2003-12-16  1:23 ` Jim Wilson
  0 siblings, 1 reply; 5+ messages in thread
From: Rakesh Kumar - Software, Noida @ 2003-12-12  8:52 UTC
  To: gcc

Hi All,

  The statement
        a[i] += a[j];       /* a is a double array */
 when compiled for SH4 (-O2 -ml -m4), generates the following assembly:

fmov.s @r2+,fr5
fmov.s @r1,fr2
add #-4,r1              <-- Here
fmov.s @r2,fr4
add #4,r1               <-- Here
fadd dr4,dr2
fmov.s fr2,@r1
fmov.s fr3,@-r1

 Clearly, the marked instructions are redundant and should be omitted.
SH4 doesn't allow 64-bit loads/stores, so a DFmode load/store insn is
split into 3 insns (after reload): two 32-bit loads/stores and one
address arithmetic insn. These address arithmetic insns clutter the
code.
The unnecessary insns also restrict the freedom of ISP2. An extreme case
is present in the stress-1.17/layer3.i benchmark.
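
For reference, a minimal test case that should exercise this might look
like the sketch below. Only the statement in the function body comes from
this post; the function name add_elem and its signature are assumptions
made for illustration.

```c
/* Hypothetical reproduction: only the statement in the body is taken
   from the post; the name and parameters are assumed. */
void add_elem(double *a, int i, int j)
{
    a[i] += a[j];   /* a is a double array */
}
```

Compiling this to assembly with the options quoted above (-O2 -ml -m4)
should presumably show a similar sequence, though the exact output will
depend on the GCC version.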

  Is there any possibility of recombining split instructions
with other insns? There are a lot of operators in scope: ADD, SUB,
OR, AND, Post/Pre Inc/Dec, etc. IMHO, it should clean up the code on other
processors as well.

Regards,
Rakesh Kumar


* Re: [SH4] DFmode splits after reload
  2003-12-12  8:52 [SH4] DFmode splits after reload Rakesh Kumar - Software, Noida
@ 2003-12-16  1:23 ` Jim Wilson
  2003-12-16  1:42   ` Jim Wilson
  0 siblings, 1 reply; 5+ messages in thread
From: Jim Wilson @ 2003-12-16  1:23 UTC
  To: Rakesh Kumar - Software, Noida; +Cc: gcc

Rakesh Kumar - Software, Noida wrote:
> fmov.s @r1,fr2
> add #-4,r1              <-- Here
> fmov.s @r2,fr4
> add #4,r1               <-- Here

The code seems broken in that it isn't setting fr3 before using it.  Why 
did that load get optimized out?  When did it get optimized out?  Was 
the load really unnecessary?

Maybe it is unnecessary because of a loop?  If so, it seems like a 
bigger problem here in that the a[i] load should have been hoisted up 
out of the inner loop, and the store pushed after the inner loop.

If it was legitimately optimized out, then one could argue that there is 
a problem with optimizing out unnecessary loads/stores.  We should have 
been able to simplify the load before it was split.

If it is still a legitimate optimization, and the load was optimized out 
before the postreload pass, then it might be reasonable to do something 
there.  There are already some somewhat similar optimizations done by 
the postreload pass.  See reload_cse_move2add.
-- 
Jim Wilson, GNU Tools Support, http://www.SpecifixInc.com


* RE: [SH4] DFmode splits after reload
  2003-12-16 12:56 Rakesh Kumar - Software, Noida
@ 2003-12-16 13:27 ` Jim Wilson
  0 siblings, 0 replies; 5+ messages in thread
From: Jim Wilson @ 2003-12-16 13:27 UTC
  To: Rakesh Kumar - Software, Noida; +Cc: gcc

On Mon, 2003-12-15 at 21:58, Rakesh Kumar - Software, Noida wrote:
>         fmov.s  @r1+,fr3   <-- Load 64-bit a[i] in two 32-bit halves
>         fmov.s  @r1,fr2
>         add     #-4,r1     <-- Restore the value in r1
>...
>         add     #4,r1
>         fmov.s  fr2,@r1    <-- Store a[i] in 32-bit pieces
>         fmov.s  fr3,@-r1

Thanks, that helps.  The two adds come from different insn splits.  I
thought they were coming from the same split, and hence I got confused
about where the missing instruction was.

> My idea is to recombine the instructions,
> if possible, as and when splitting takes place. As in this case, at the
> time of DFmode store insn, the splitter could have looked for the previous
> use/definition of r1, hence removing two address arithmetic instructions.

That seems risky.  Splitting can happen in multiple places, and an
instruction that is split is not necessarily emitted.  The combine pass,
for instance, will try splits and then sometimes throw the result away.

Try defining a peephole2 pattern to handle this.  You need to split it
into nothing, but there should be a way to make that work.
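
The effect such a peephole would have can be modelled outside GCC. The
following is only a sketch in plain C, not a peephole2 pattern and not
GCC code: the insn record, the touched bitmask and the function
cancel_add_pairs are all hypothetical. It removes a pair of opposite
immediate adds to the same register when no instruction in between reads
or writes that register, which is the situation in the assembly quoted
above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified insn record for illustration; not GCC's RTL. */
enum op_kind { OP_ADD_IMM, OP_OTHER };

struct insn {
    enum op_kind kind;
    int reg;            /* OP_ADD_IMM: register being adjusted (0..31) */
    int imm;            /* OP_ADD_IMM: small signed immediate */
    unsigned touched;   /* OP_OTHER: bitmask of registers read or written */
};

/* Does insn p read or write register reg? */
static int touches(const struct insn *p, int reg)
{
    if (p->kind == OP_ADD_IMM)
        return p->reg == reg;
    return (int)((p->touched >> reg) & 1u);
}

/* Delete code[k] from an array of n insns by shifting the tail down. */
static void delete_insn(struct insn *code, size_t n, size_t k)
{
    memmove(&code[k], &code[k + 1], (n - k - 1) * sizeof code[0]);
}

/* Remove "add #k,rN ... add #-k,rN" pairs when nothing in between
   touches rN.  Assumes straight-line code (a single basic block) and
   small immediates.  Returns the new insn count. */
size_t cancel_add_pairs(struct insn *code, size_t n)
{
    size_t i = 0;
    while (i < n) {
        int removed = 0;
        if (code[i].kind == OP_ADD_IMM) {
            int reg = code[i].reg;
            assert(reg >= 0 && reg < 32);
            for (size_t j = i + 1; j < n; j++) {
                if (!touches(&code[j], reg))
                    continue;
                /* First later insn touching reg: cancel only if it is
                   the opposite add; otherwise give up on this insn. */
                if (code[j].kind == OP_ADD_IMM
                    && code[i].imm + code[j].imm == 0) {
                    delete_insn(code, n, j);    /* later insn first */
                    n--;
                    delete_insn(code, n, i);
                    n--;
                    removed = 1;
                }
                break;
            }
        }
        if (!removed)
            i++;    /* after a removal, re-examine the insn now at i */
    }
    return n;
}
```

Applied to the nine-instruction sequence from this thread, the model
should drop add #-4,r1 and add #4,r1 and leave seven instructions. Note
that a real peephole2 matches a short window of consecutive insns, so the
intervening loads and the fadd would presumably have to be part of the
pattern, or the cleanup would need some other way to look across them.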
-- 
Jim Wilson, GNU Tools Support, http://www.SpecifixInc.com


* RE: [SH4] DFmode splits after reload
@ 2003-12-16 12:56 Rakesh Kumar - Software, Noida
  2003-12-16 13:27 ` Jim Wilson
  0 siblings, 1 reply; 5+ messages in thread
From: Rakesh Kumar - Software, Noida @ 2003-12-16 12:56 UTC
  To: Jim Wilson, Rakesh Kumar - Software, Noida; +Cc: gcc



> -----Original Message-----
> From: Jim Wilson [mailto:wilson@specifixinc.com]
> Sent: Tuesday, December 16, 2003 6:46 AM
> To: Rakesh Kumar - Software, Noida
> Cc: gcc@gcc.gnu.org
> Subject: Re: [SH4] DFmode splits after reload
> 
> The code seems broken in that it isn't setting fr3 before 
> using it.  Why 
> did that load get optimized out?  When did it get optimized out?  Was 
> the load really unnecessary?
> 

Hi,

  Probably it was my mistake to snip the assembly. I have a simple statement

         a[i] += a[j];   /* a is a double array */

There is no loop involved. On SH4, with -O2 -ml -m4 -fno-schedule-insns2,
GCC outputs the following assembly (r1 holds the address of a[i] and r2
the address of a[j]).

        fmov.s  @r1+,fr3   <-- Load 64-bit a[i] in two 32-bit halves
        fmov.s  @r1,fr2
        add     #-4,r1     <-- Restore the value in r1
        fmov.s  @r2+,fr5   <-- Load first 32-bits of a[j]
        fmov.s  @r2,fr4    <-- Load second 32-bits of a[j]
        fadd    dr4,dr2    <-- a[i] + a[j]
        add     #4,r1
        fmov.s  fr2,@r1    <-- Store a[i] in 32-bit pieces
        fmov.s  fr3,@-r1

  As I said earlier, SH4 doesn't allow 64-bit loads/stores. Each one has
to be broken into two 32-bit transfers in the flow2 pass. Hence, the first 3
instructions in the assembly are the result of splitting a single
instruction (shown here as it appears after the postreload pass):

         (insn:HI 19 18 20 1 0x4024f800 (parallel [
            (set (reg:DF 66 fr2 [171])
                (mem:DF (reg/f:SI 1 r1 [164]) [3 S8 A32]))
            (use (reg/v:PSI 151 fpscr))
            (clobber (scratch:SI))
        ]) 142 {movdf_i4} (insn_list 12 (nil))
    (nil))

Similarly, the last 3 instructions are split from

         (insn:HI 22 21 24 1 0x4024f800 (parallel [
            (set (mem:DF (reg/f:SI 1 r1 [164]) [3 S8 A32])
                (reg:DF 66 fr2 [171]))
            (use (reg/v:PSI 151 fpscr))
            (clobber (scratch:SI))
        ]) 142 {movdf_i4} (insn_list 21 (nil))
    (nil))

In the ideal case, if the splitter had known in advance that r1 does not
need to be restored, because it is being used in subsequent instructions,
we could have avoided the address arithmetic insns. But the splitter cannot
foresee this. It could, however, be modified to compensate for
its past mistakes.

reload_cse_move2add does not help in this case, since these splits
happen in flow2. My idea is to recombine the instructions,
if possible, as and when splitting takes place. In this case, when
splitting the DFmode store insn, the splitter could have looked for the
previous use/definition of r1 and removed the two address arithmetic
instructions.

Thanks and Regards,
Rakesh Kumar
