public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* lower subreg optimization
@ 2010-04-06  9:24 roy rosen
  2010-04-06 16:37 ` Jim Wilson
  2010-04-06 16:58 ` Ian Lance Taylor
  0 siblings, 2 replies; 10+ messages in thread
From: roy rosen @ 2010-04-06  9:24 UTC (permalink / raw)
  To: gcc

Hi,

I have encountered several problems with lower subreg optimization in my port.
In some cases I noticed that insns are decomposed in subreg1 pass and
do not get recomposed later which causes at the end using two insns
instead of one.


For example I have the following dump before subreg1

(note 30 93 31 7 [bb 7] NOTE_INSN_BASIC_BLOCK)

(insn 31 30 32 7 a.c:25 (set (reg:V4HI 112)
        (mem:V4HI (reg/f:SI 98 [ __vect_p_41 ]) [2 S8 A64])) 115
{*movv4hi_load} (nil))

(insn 32 31 33 7 a.c:25 (set (reg:V4HI 113)
        (mem:V4HI (reg/f:SI 99 [ __vect_p_36 ]) [2 S8 A64])) 115
{*movv4hi_load} (nil))

(insn 33 32 34 7 a.c:25 (set (subreg:V2HI (reg:V4HI 114) 0)
        (plus:V2HI (subreg:V2HI (reg:V4HI 112) 0)
            (subreg:V2HI (reg:V4HI 113) 0))) 118 {addv2hi3} (nil))

(insn 34 33 35 7 a.c:25 (set (subreg:V2HI (reg:V4HI 114) 4)
        (plus:V2HI (subreg:V2HI (reg:V4HI 112) 4)
            (subreg:V2HI (reg:V4HI 113) 4))) 118 {addv2hi3} (nil))

(insn 35 34 36 7 a.c:25 (set (reg:V4HI 114)
        (vec_concat:V4HI (subreg:V2HI (reg:V4HI 114) 0)
            (subreg:V2HI (reg:V4HI 114) 4))) 119 {concat_v2hi_to_v4hi}
(expr_list:REG_EQUAL (plus:V4HI (reg:V4HI 112)
            (reg:V4HI 113))
        (nil)))

(insn 36 35 37 7 a.c:25 (set (mem:V4HI (reg/f:SI 97 [ __vect_p_47 ]) [2 S8 A64])
        (reg:V4HI 114)) 116 {*movv4hi_store} (nil))

which turns into:

(note 30 93 94 7 [bb 7] NOTE_INSN_BASIC_BLOCK)

(insn 94 30 95 7 a.c:25 (set (reg:SI 142)
        (mem:SI (reg/f:SI 98 [ __vect_p_41 ]) [2 S4 A64])) 62
{movsi_load} (nil))

(insn 95 94 96 7 a.c:25 (set (reg:SI 143 [+4 ])
        (mem:SI (plus:SI (reg/f:SI 98 [ __vect_p_41 ])
                (const_int 4 [0x4])) [2 S4 A32])) 62 {movsi_load} (nil))

(insn 96 95 97 7 a.c:25 (set (reg:SI 144)
        (mem:SI (reg/f:SI 99 [ __vect_p_36 ]) [2 S4 A64])) 62
{movsi_load} (nil))

(insn 97 96 33 7 a.c:25 (set (reg:SI 145 [+4 ])
        (mem:SI (plus:SI (reg/f:SI 99 [ __vect_p_36 ])
                (const_int 4 [0x4])) [2 S4 A32])) 62 {movsi_load} (nil))

(insn 33 97 34 7 a.c:25 (set (subreg:V2HI (reg:V4HI 114) 0)
        (plus:V2HI (subreg:V2HI (reg:SI 142) 0)
            (subreg:V2HI (reg:SI 144) 0))) 118 {addv2hi3} (nil))

(insn 34 33 35 7 a.c:25 (set (subreg:V2HI (reg:V4HI 114) 4)
        (plus:V2HI (subreg:V2HI (reg:SI 143 [+4 ]) 0)
            (subreg:V2HI (reg:SI 145 [+4 ]) 0))) 118 {addv2hi3} (nil))

(insn 35 34 36 7 a.c:25 (set (reg:V4HI 114)
        (vec_concat:V4HI (subreg:V2HI (reg:V4HI 114) 0)
            (subreg:V2HI (reg:V4HI 114) 4))) 119 {concat_v2hi_to_v4hi} (nil))

(insn 36 35 98 7 a.c:25 (set (mem:V4HI (reg/f:SI 97 [ __vect_p_47 ]) [2 S8 A64])
        (reg:V4HI 114)) 116 {*movv4hi_store} (nil))

notice that now the loads are being done in SI mode which is twice
expensive than in V4HI mode.

Can someone please help with that?
Should this code be decomposed and then composed (which it doesn't) or
should it not be decoposed at the first place.
What should I change in order to get at the end a load for v4hi.
Thanks, Roy.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2010-04-09 16:44 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-06  9:24 lower subreg optimization roy rosen
2010-04-06 16:37 ` Jim Wilson
2010-04-08  6:16   ` roy rosen
2010-04-09 16:52     ` Jim Wilson
2010-04-06 16:58 ` Ian Lance Taylor
2010-04-06 17:13   ` Nathan Froyd
2010-04-06 17:27     ` Steven Bosscher
2010-04-06 18:55     ` Ian Lance Taylor
2010-04-06 19:05       ` Nathan Froyd
2010-04-06 19:23       ` Joseph S. Myers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).