lower subreg optimization

public inbox for gcc@gcc.gnu.org

* lower subreg optimization
@ 2010-04-06  9:24 roy rosen
  2010-04-06 16:37 ` Jim Wilson
  2010-04-06 16:58 ` Ian Lance Taylor
  0 siblings, 2 replies; 10+ messages in thread
From: roy rosen @ 2010-04-06  9:24 UTC (permalink / raw)
  To: gcc

Hi,

I have encountered several problems with the lower-subreg optimization in my port.
In some cases I noticed that insns are decomposed in the subreg1 pass and
are not recomposed later, so the final code uses two insns
instead of one.


For example, I have the following dump before subreg1:

(note 30 93 31 7 [bb 7] NOTE_INSN_BASIC_BLOCK)

(insn 31 30 32 7 a.c:25 (set (reg:V4HI 112)
        (mem:V4HI (reg/f:SI 98 [ __vect_p_41 ]) [2 S8 A64])) 115
{*movv4hi_load} (nil))

(insn 32 31 33 7 a.c:25 (set (reg:V4HI 113)
        (mem:V4HI (reg/f:SI 99 [ __vect_p_36 ]) [2 S8 A64])) 115
{*movv4hi_load} (nil))

(insn 33 32 34 7 a.c:25 (set (subreg:V2HI (reg:V4HI 114) 0)
        (plus:V2HI (subreg:V2HI (reg:V4HI 112) 0)
            (subreg:V2HI (reg:V4HI 113) 0))) 118 {addv2hi3} (nil))

(insn 34 33 35 7 a.c:25 (set (subreg:V2HI (reg:V4HI 114) 4)
        (plus:V2HI (subreg:V2HI (reg:V4HI 112) 4)
            (subreg:V2HI (reg:V4HI 113) 4))) 118 {addv2hi3} (nil))

(insn 35 34 36 7 a.c:25 (set (reg:V4HI 114)
        (vec_concat:V4HI (subreg:V2HI (reg:V4HI 114) 0)
            (subreg:V2HI (reg:V4HI 114) 4))) 119 {concat_v2hi_to_v4hi}
(expr_list:REG_EQUAL (plus:V4HI (reg:V4HI 112)
            (reg:V4HI 113))
        (nil)))

(insn 36 35 37 7 a.c:25 (set (mem:V4HI (reg/f:SI 97 [ __vect_p_47 ]) [2 S8 A64])
        (reg:V4HI 114)) 116 {*movv4hi_store} (nil))

which turns into:

(note 30 93 94 7 [bb 7] NOTE_INSN_BASIC_BLOCK)

(insn 94 30 95 7 a.c:25 (set (reg:SI 142)
        (mem:SI (reg/f:SI 98 [ __vect_p_41 ]) [2 S4 A64])) 62
{movsi_load} (nil))

(insn 95 94 96 7 a.c:25 (set (reg:SI 143 [+4 ])
        (mem:SI (plus:SI (reg/f:SI 98 [ __vect_p_41 ])
                (const_int 4 [0x4])) [2 S4 A32])) 62 {movsi_load} (nil))

(insn 96 95 97 7 a.c:25 (set (reg:SI 144)
        (mem:SI (reg/f:SI 99 [ __vect_p_36 ]) [2 S4 A64])) 62
{movsi_load} (nil))

(insn 97 96 33 7 a.c:25 (set (reg:SI 145 [+4 ])
        (mem:SI (plus:SI (reg/f:SI 99 [ __vect_p_36 ])
                (const_int 4 [0x4])) [2 S4 A32])) 62 {movsi_load} (nil))

(insn 33 97 34 7 a.c:25 (set (subreg:V2HI (reg:V4HI 114) 0)
        (plus:V2HI (subreg:V2HI (reg:SI 142) 0)
            (subreg:V2HI (reg:SI 144) 0))) 118 {addv2hi3} (nil))

(insn 34 33 35 7 a.c:25 (set (subreg:V2HI (reg:V4HI 114) 4)
        (plus:V2HI (subreg:V2HI (reg:SI 143 [+4 ]) 0)
            (subreg:V2HI (reg:SI 145 [+4 ]) 0))) 118 {addv2hi3} (nil))

(insn 35 34 36 7 a.c:25 (set (reg:V4HI 114)
        (vec_concat:V4HI (subreg:V2HI (reg:V4HI 114) 0)
            (subreg:V2HI (reg:V4HI 114) 4))) 119 {concat_v2hi_to_v4hi} (nil))

(insn 36 35 98 7 a.c:25 (set (mem:V4HI (reg/f:SI 97 [ __vect_p_47 ]) [2 S8 A64])
        (reg:V4HI 114)) 116 {*movv4hi_store} (nil))

Notice that the loads are now done in SI mode, which is twice as
expensive as doing them in V4HI mode.
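
The effect in the dump above, where one 8-byte V4HI load becomes two 4-byte SI loads at offsets 0 and 4, can be modelled with a small standalone sketch. This is a toy model only, not code from lower-subreg.c; `struct word_access` and `split_into_words` are hypothetical names introduced for illustration, and the 4-byte word size is an assumption read off the SI-mode loads in the dump.

```c
#include <assert.h>

/* One word-sized piece of a wider memory access (hypothetical type). */
struct word_access {
    unsigned offset;  /* byte offset from the original address */
    unsigned size;    /* bytes accessed by this piece */
};

/* Split an access of TOTAL_BYTES into pieces of at most WORD_BYTES.
   Writes at most MAX_OUT pieces to OUT and returns how many were written. */
unsigned split_into_words(unsigned total_bytes, unsigned word_bytes,
                          struct word_access *out, unsigned max_out)
{
    unsigned n = 0;

    assert(word_bytes > 0);
    for (unsigned off = 0; off < total_bytes && n < max_out; off += word_bytes) {
        unsigned remaining = total_bytes - off;
        out[n].offset = off;
        out[n].size = remaining < word_bytes ? remaining : word_bytes;
        n++;
    }
    return n;
}
```

Calling `split_into_words(8, 4, ...)` yields two pieces, which corresponds to insns 94 and 95 replacing insn 31.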

Can someone please help with that?
Should this code be decomposed and then recomposed (which does not happen), or
should it not be decomposed in the first place?
What should I change in order to end up with a single V4HI load?
Thanks, Roy.


* Re: lower subreg optimization
  2010-04-06  9:24 lower subreg optimization roy rosen
@ 2010-04-06 16:37 ` Jim Wilson
  2010-04-08  6:16   ` roy rosen
  2010-04-06 16:58 ` Ian Lance Taylor
  1 sibling, 1 reply; 10+ messages in thread
From: Jim Wilson @ 2010-04-06 16:37 UTC (permalink / raw)
  To: roy rosen; +Cc: gcc

On 04/06/2010 02:24 AM, roy rosen wrote:
> (insn 33 32 34 7 a.c:25 (set (subreg:V2HI (reg:V4HI 114) 0)
>          (plus:V2HI (subreg:V2HI (reg:V4HI 112) 0)
>              (subreg:V2HI (reg:V4HI 113) 0))) 118 {addv2hi3} (nil))

Only subregs are decomposed.  So use vec_select instead of subreg.  I 
see you already have a vec_concat to combine the two v2hi into one v4hi, 
so there is no need for the subreg in the dest.  You should try 
eliminating that first and see if that helps.  If that isn't enough, 
then replace the subregs in the source with vec_select operations.

Jim


* Re: lower subreg optimization
  2010-04-06 16:37 ` Jim Wilson
@ 2010-04-08  6:16   ` roy rosen
  2010-04-09 16:52     ` Jim Wilson
  0 siblings, 1 reply; 10+ messages in thread
From: roy rosen @ 2010-04-08  6:16 UTC (permalink / raw)
  To: Jim Wilson; +Cc: gcc

2010/4/6, Jim Wilson <wilson@codesourcery.com>:
> On 04/06/2010 02:24 AM, roy rosen wrote:
> > (insn 33 32 34 7 a.c:25 (set (subreg:V2HI (reg:V4HI 114) 0)
> >         (plus:V2HI (subreg:V2HI (reg:V4HI 112) 0)
> >             (subreg:V2HI (reg:V4HI 113) 0))) 118 {addv2hi3} (nil))
> >
>
> Only subregs are decomposed.  So use vec_select instead of subreg.  I see
> you already have a vec_concat to combine the two v2hi into one v4hi, so
> there is no need for the subreg in the dest.  You should try eliminating
> that first and see if that helps.  If that isn't enough, then replace the
> subregs in the source with vec_select operations.
>
> Jim
>

Thanks Jim,

I have implemented your suggestion: I am now using vec_select, and
the subreg optimization no longer decomposes the instruction.
The problem now is that I am stuck with redundant instructions (which
I translate to move insns).
For example:

(insn 37 32 38 7 a.c:25 (set (reg:V2HI 116)
        (vec_concat:V2HI (vec_select:HI (reg:V4HI 112)
                (parallel [
                        (const_int 0 [0x0])
                    ]))
            (vec_select:HI (reg:V4HI 112)
                (parallel [
                        (const_int 1 [0x1])
                    ])))) 121 {v4hi_extract_low_v2hi}
(expr_list:REG_DEAD (reg:V4HI 112)
        (nil)))

This instruction eventually has to be optimized out somehow. It
extracts a V2HI from a V4HI. A V4HI is stored in a register
pair (like r0:r1), so extracting a V2HI simply means taking one of these
registers, which does not need an instruction.

I saw in arm/neon.md that they have a similar problem:

; FIXME: We wouldn't need the following insns if we could write subregs of
; vector registers. Make an attempt at removing unnecessary moves, though
; we're really at the mercy of the register allocator.

(define_insn "move_lo_quad_v4si"
  [(set (match_operand:V4SI 0 "s_register_operand" "+w")
        (vec_concat:V4SI
          (match_operand:V2SI 1 "s_register_operand" "w")
          (vec_select:V2SI (match_dup 0)
			   (parallel [(const_int 2) (const_int 3)]))))]
  "TARGET_NEON"
{
  int dest = REGNO (operands[0]);
  int src = REGNO (operands[1]);

  if (dest != src)
    return "vmov\t%e0, %P1";
  else
    return "";
}
  [(set_attr "neon_type" "neon_bp_simple")]
)

Their solution is also not complete.
What is the proper way to handle such a case and how do I let gcc know
that this is a simple move instruction so that gcc would be able to
optimize it out?

Thanks, Roy.


* Re: lower subreg optimization
  2010-04-08  6:16   ` roy rosen
@ 2010-04-09 16:52     ` Jim Wilson
  0 siblings, 0 replies; 10+ messages in thread
From: Jim Wilson @ 2010-04-09 16:52 UTC (permalink / raw)
  To: roy rosen; +Cc: gcc

On 04/07/2010 10:48 PM, roy rosen wrote:
> I saw in arm/neon.md that they have a similar problem:
> ...
> Their solution is also not complete.
> What is the proper way to handle such a case and how do I let gcc know
> that this is a simple move instruction so that gcc would be able to
> optimize it out?

The only simple solution at the moment is the one that the ARM port is 
using.  You avoid emitting the move when you got the lucky reg-alloc 
result, and you emit the move when you aren't lucky.
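
As a standalone illustration of that approach (a sketch modelled on the neon.md output code quoted earlier, not actual port code): the output routine compares the allocated register numbers and returns an empty string when they coincide. The function name `toy_output_extract_low`, its buffer parameters, and the `mov` mnemonic are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the assembly text for copying the low half of a register pair
   into DEST_REGNO.  SRC_REGNO is the first register of the pair, which is
   assumed to hold the low half.  If the allocator already placed the
   destination in that register, no instruction is needed. */
const char *toy_output_extract_low(unsigned dest_regno, unsigned src_regno,
                                   char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    if (dest_regno == src_regno)
        return "";  /* lucky reg-alloc result: emit nothing */

    /* Unlucky result: an explicit register move is still required. */
    snprintf(buf, buflen, "mov\tr%u, r%u", dest_regno, src_regno);
    return buf;
}
```

The limitation discussed in this thread still applies: the insn exists until final output, so earlier passes cannot treat it as a no-op.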

As the neon.md comment suggests, and as Ian Taylor mentioned in his 
response, a possible solution is to modify the lower-subreg.c pass 
somehow so that it no longer splits subregs of vector modes, possibly 
controlled by a hook.

We might be able to modify the register allocator to look for this 
pattern, to increase the chances of getting the good reg-alloc result, 
but the lower-subreg.c change is probably better.

Another solution might be to add a pass (or modify an existing one like 
regmove.c) to try to put things back together again, but this is 
probably also not as good as the lower-subreg.c change.

Jim


* Re: lower subreg optimization
  2010-04-06  9:24 lower subreg optimization roy rosen
  2010-04-06 16:37 ` Jim Wilson
@ 2010-04-06 16:58 ` Ian Lance Taylor
  2010-04-06 17:13   ` Nathan Froyd
  1 sibling, 1 reply; 10+ messages in thread
From: Ian Lance Taylor @ 2010-04-06 16:58 UTC (permalink / raw)
  To: roy rosen; +Cc: gcc

roy rosen <roy.1rosen@gmail.com> writes:

> I have encountered several problems with lower subreg optimization in my port.
> In some cases I noticed that insns are decomposed in subreg1 pass and
> do not get recomposed later which causes at the end using two insns
> instead of one.

In the code the register is always accessed via a subreg, so the
lower-subregs pass thinks that it is OK to decompose the register.
Once it is decomposed, nothing is expected to put it back together.

To fix this, you should probably look at simple_move in
lower-subreg.c.  You will want it to return NULL_RTX for a vector load
or store.  Perhaps it should check costs, or perhaps it should never
decompose explicit vector modes.
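
A minimal sketch of the "never decompose explicit vector modes" variant, written as a self-contained toy model rather than as a patch to lower-subreg.c. The `toy_mode` enum, the helper names, and the 4-byte word size are assumptions made for illustration; the real simple_move works on RTL and returns the SET or NULL_RTX.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for a few machine modes; not GCC's real mode enum. */
enum toy_mode { TOY_SI, TOY_DI, TOY_V2HI, TOY_V4HI };

#define TOY_WORD_BYTES 4u  /* assumed word size of the target */

/* Size in bytes of each toy mode. */
unsigned toy_mode_size(enum toy_mode m)
{
    switch (m) {
    case TOY_SI:
    case TOY_V2HI:
        return 4;
    case TOY_DI:
    case TOY_V4HI:
        return 8;
    }
    assert(!"unknown toy mode");
    return 0;
}

bool toy_vector_mode_p(enum toy_mode m)
{
    return m == TOY_V2HI || m == TOY_V4HI;
}

/* Should a plain move in mode M be a candidate for splitting into
   word-sized moves?  Vector modes are rejected outright. */
bool toy_decomposable_move_p(enum toy_mode m)
{
    if (toy_mode_size(m) <= TOY_WORD_BYTES)
        return false;  /* already fits in one word */
    if (toy_vector_mode_p(m))
        return false;  /* keep vector loads and stores whole */
    return true;
}
```

With this predicate a DI move is still split, while the V4HI loads from the original dump would be left alone. A cost-based check would replace the vector test with a comparison of target costs.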

Ian


* Re: lower subreg optimization
  2010-04-06 16:58 ` Ian Lance Taylor
@ 2010-04-06 17:13   ` Nathan Froyd
  2010-04-06 17:27     ` Steven Bosscher
  2010-04-06 18:55     ` Ian Lance Taylor
  0 siblings, 2 replies; 10+ messages in thread
From: Nathan Froyd @ 2010-04-06 17:13 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: roy rosen, gcc

On Tue, Apr 06, 2010 at 09:58:23AM -0700, Ian Lance Taylor wrote:
> In the code the register is always accessed via a subreg, so the
> lower-subregs pass thinks that it is OK to decompose the register.
> Once it is decomposed, nothing is expected to put it back together.
> 
> To fix this, you should probably look at simple_move in
> lower-subreg.c.  You will want it to return NULL_RTX for a vector load
> or store.  Perhaps it should check costs, or perhaps it should never
> decompose explicit vector modes.

Compiling anything that uses doubles on powerpc e500v2 produces awful
code due in part to lower-subregs (the register allocator doesn't help,
either, but that's a different story).  Code that looks like:

  rY:DI = r<ARG>:DI
  rX:DI = rY:DI
  (subreg:DF rZ:DI 0) = rX:DI

<ARG> is a hard register for argument passing; the code looks equally
awful inside of a function, too.  The above gets lowered to:

1:  r<Y>:SI = r<ARG>:SI
2:  r<Y+1>:SI = r<ARG+1>:SI
3:  (subreg:SI rX:DI 0) = r<Y>:SI
4:  (subreg:SI rX:DI 4) = r<Y+1>:SI
5:  (subreg:DF rZ:DI 0) = rX:DI

which usually results in two stores and a load against the stack, rather
than a single-instruction dealing entirely in registers.  I realize
e500v2 is not exactly a mainstream target, but perhaps a target hook is
appropriate here?  I suppose checking costs might achieve the same
thing.

-Nathan


* Re: lower subreg optimization
  2010-04-06 17:13   ` Nathan Froyd
@ 2010-04-06 17:27     ` Steven Bosscher
  2010-04-06 18:55     ` Ian Lance Taylor
  1 sibling, 0 replies; 10+ messages in thread
From: Steven Bosscher @ 2010-04-06 17:27 UTC (permalink / raw)
  To: Nathan Froyd; +Cc: Ian Lance Taylor, roy rosen, gcc

On Tue, Apr 6, 2010 at 7:12 PM, Nathan Froyd <froydnj@codesourcery.com> wrote:
> I realize
> e500v2 is not exactly a mainstream target, but perhaps a target hook is
> appropriate here

Big hammer. Preferred tool for jobs in the real world.

>  I suppose checking costs might achieve the same
> thing.

Small hammer. Preferred tool for jobs in the GCC world.

I suppose the trouble will be that rtx_cost is not going to be
accurate enough for the job...

Ciao!
Steven


* Re: lower subreg optimization
  2010-04-06 17:13   ` Nathan Froyd
  2010-04-06 17:27     ` Steven Bosscher
@ 2010-04-06 18:55     ` Ian Lance Taylor
  2010-04-06 19:05       ` Nathan Froyd
  2010-04-06 19:23       ` Joseph S. Myers
  1 sibling, 2 replies; 10+ messages in thread
From: Ian Lance Taylor @ 2010-04-06 18:55 UTC (permalink / raw)
  To: Nathan Froyd; +Cc: roy rosen, gcc

Nathan Froyd <froydnj@codesourcery.com> writes:

> Compiling anything that uses doubles on powerpc e500v2 produces awful
> code due in part to lower-subregs (the register allocator doesn't help,
> either, but that's a different story).  Code that looks like:
>
>   rY:DI = r<ARG>:DI
>   rX:DI = rY:DI
>   (subreg:DF rZ:DI 0) = rX:DI
>
> <ARG> is a hard register for argument passing; the code looks equally
> awful inside of a function, too.  The above gets lowered to:
>
> 1:  r<Y>:SI = r<ARG>:SI
> 2:  r<Y+1>:SI = r<ARG+1>:SI
> 3:  (subreg:SI rX:DI 0) = r<Y>:SI
> 4:  (subreg:SI rX:DI 4) = r<Y+1>:SI
> 5:  (subreg:DF rZ:DI 0) = rX:DI
>
> which usually results in two stores and a load against the stack, rather
> than a single-instruction dealing entirely in registers.  I realize
> e500v2 is not exactly a mainstream target, but perhaps a target hook is
> appropriate here?  I suppose checking costs might achieve the same
> thing.

I doubt that a target hook is required to avoid this.  Perhaps
simple_move_operand should reject a mode changing subreg when the two
modes are !MODE_TIEABLE_P.
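
A rough standalone sketch of that check, assuming a toy target on which integer modes tie with each other but not with DF. The `e5_` names are hypothetical, and the tieability rule is an illustrative assumption, not the actual e500 definition (the target macro in GCC sources of this period is spelled MODES_TIEABLE_P).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the modes in the example above. */
enum e5_mode { E5_SI, E5_DI, E5_DF };

/* Assumed rule for illustration only: two modes are tieable when both
   are integer modes or both are floating-point modes. */
bool e5_modes_tieable_p(enum e5_mode a, enum e5_mode b)
{
    assert(a <= E5_DF && b <= E5_DF);

    bool a_float = (a == E5_DF);
    bool b_float = (b == E5_DF);
    return a_float == b_float;
}

/* Would an operand of the form (subreg:OUTER (reg:INNER)) be accepted
   as part of a simple move that may be decomposed? */
bool e5_simple_move_operand_p(enum e5_mode outer, enum e5_mode inner)
{
    if (outer == inner)
        return true;  /* not a mode-changing subreg */
    return e5_modes_tieable_p(outer, inner);
}
```

Under these assumptions `(subreg:DF rZ:DI 0)` is rejected, so the move feeding it would not be split into SI halves.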

This code is sort of weird, though; why the conversion from DImode to
DFmode?

Ian


* Re: lower subreg optimization
  2010-04-06 18:55     ` Ian Lance Taylor
@ 2010-04-06 19:05       ` Nathan Froyd
  2010-04-06 19:23       ` Joseph S. Myers
  1 sibling, 0 replies; 10+ messages in thread
From: Nathan Froyd @ 2010-04-06 19:05 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: roy rosen, gcc

On Tue, Apr 06, 2010 at 11:55:01AM -0700, Ian Lance Taylor wrote:
> Nathan Froyd <froydnj@codesourcery.com> writes:
> > Compiling anything that uses doubles on powerpc e500v2 produces awful
> > code due in part to lower-subregs (the register allocator doesn't help,
> > either, but that's a different story).
> 
> I doubt that a target hook is required to avoid this.  Perhaps
> simple_move_operand should reject a mode changing subreg when the two
> modes are !MODE_TIEABLE_P.

Ah, thanks for the pointer. I'll try poking at that.

> This code is sort of weird, though; why the conversion from DImode to
> DFmode?

Welcome to the wonderful world of e500, which has floating-point
instructions operating on the general purpose registers.

-Nathan


* Re: lower subreg optimization
  2010-04-06 18:55     ` Ian Lance Taylor
  2010-04-06 19:05       ` Nathan Froyd
@ 2010-04-06 19:23       ` Joseph S. Myers
  1 sibling, 0 replies; 10+ messages in thread
From: Joseph S. Myers @ 2010-04-06 19:23 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Nathan Froyd, roy rosen, gcc

On Tue, 6 Apr 2010, Ian Lance Taylor wrote:

> This code is sort of weird, though; why the conversion from DImode to
> DFmode?

See <http://gcc.gnu.org/ml/gcc-patches/2006-12/msg00245.html> for 
discussion of the e500 subregs and their semantics.  I made them work 
reliably (for 4.3 and later, 4.4 and later when decimal floating-point 
modes are involved) - but not necessarily efficiently in all cases.

-- 
Joseph S. Myers
joseph@codesourcery.com
