From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-160419-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 30183 invoked by alias); 6 Apr 2010 09:24:43 -0000
Received: (qmail 30154 invoked by uid 22791); 6 Apr 2010 09:24:31 -0000
X-SWARE-Spam-Status: No, hits=0.3 required=5.0 	tests=BAYES_05,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,SARE_MSGID_LONG45,T_TO_NO_BRKTS_FREEMAIL
X-Spam-Check-By: sourceware.org
Received: from mail-pv0-f175.google.com (HELO mail-pv0-f175.google.com) (74.125.83.175)     by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 06 Apr 2010 09:24:26 +0000
Received: by pvb32 with SMTP id 32so122588pvb.20         for <gcc@gcc.gnu.org>; Tue, 06 Apr 2010 02:24:25 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.231.113.7 with HTTP; Tue, 6 Apr 2010 02:24:25 -0700 (PDT)
Date: Tue, 06 Apr 2010 09:24:00 -0000
Received: by 10.114.139.10 with SMTP id m10mr5859009wad.128.1270545865175;  	Tue, 06 Apr 2010 02:24:25 -0700 (PDT)
Message-ID: <m2sbba479b11004060224g59086b3j5e53b9704ea179bd@mail.gmail.com>
Subject: lower subreg optimization
From: roy rosen <roy.1rosen@gmail.com>
To: gcc@gcc.gnu.org
Content-Type: text/plain; charset=ISO-8859-1
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
X-SW-Source: 2010-04/txt/msg00075.txt.bz2

Hi,

I have run into several problems with the lower-subreg optimization in my port.
In some cases insns are decomposed in the subreg1 pass and are never
recombined later, so the final code ends up using two insns
instead of one.


For example, I have the following dump before subreg1:

(note 30 93 31 7 [bb 7] NOTE_INSN_BASIC_BLOCK)

(insn 31 30 32 7 a.c:25 (set (reg:V4HI 112)
        (mem:V4HI (reg/f:SI 98 [ __vect_p_41 ]) [2 S8 A64])) 115
{*movv4hi_load} (nil))

(insn 32 31 33 7 a.c:25 (set (reg:V4HI 113)
        (mem:V4HI (reg/f:SI 99 [ __vect_p_36 ]) [2 S8 A64])) 115
{*movv4hi_load} (nil))

(insn 33 32 34 7 a.c:25 (set (subreg:V2HI (reg:V4HI 114) 0)
        (plus:V2HI (subreg:V2HI (reg:V4HI 112) 0)
            (subreg:V2HI (reg:V4HI 113) 0))) 118 {addv2hi3} (nil))

(insn 34 33 35 7 a.c:25 (set (subreg:V2HI (reg:V4HI 114) 4)
        (plus:V2HI (subreg:V2HI (reg:V4HI 112) 4)
            (subreg:V2HI (reg:V4HI 113) 4))) 118 {addv2hi3} (nil))

(insn 35 34 36 7 a.c:25 (set (reg:V4HI 114)
        (vec_concat:V4HI (subreg:V2HI (reg:V4HI 114) 0)
            (subreg:V2HI (reg:V4HI 114) 4))) 119 {concat_v2hi_to_v4hi}
(expr_list:REG_EQUAL (plus:V4HI (reg:V4HI 112)
            (reg:V4HI 113))
        (nil)))

(insn 36 35 37 7 a.c:25 (set (mem:V4HI (reg/f:SI 97 [ __vect_p_47 ]) [2 S8 A64])
        (reg:V4HI 114)) 116 {*movv4hi_store} (nil))

which turns into:

(note 30 93 94 7 [bb 7] NOTE_INSN_BASIC_BLOCK)

(insn 94 30 95 7 a.c:25 (set (reg:SI 142)
        (mem:SI (reg/f:SI 98 [ __vect_p_41 ]) [2 S4 A64])) 62
{movsi_load} (nil))

(insn 95 94 96 7 a.c:25 (set (reg:SI 143 [+4 ])
        (mem:SI (plus:SI (reg/f:SI 98 [ __vect_p_41 ])
                (const_int 4 [0x4])) [2 S4 A32])) 62 {movsi_load} (nil))

(insn 96 95 97 7 a.c:25 (set (reg:SI 144)
        (mem:SI (reg/f:SI 99 [ __vect_p_36 ]) [2 S4 A64])) 62
{movsi_load} (nil))

(insn 97 96 33 7 a.c:25 (set (reg:SI 145 [+4 ])
        (mem:SI (plus:SI (reg/f:SI 99 [ __vect_p_36 ])
                (const_int 4 [0x4])) [2 S4 A32])) 62 {movsi_load} (nil))

(insn 33 97 34 7 a.c:25 (set (subreg:V2HI (reg:V4HI 114) 0)
        (plus:V2HI (subreg:V2HI (reg:SI 142) 0)
            (subreg:V2HI (reg:SI 144) 0))) 118 {addv2hi3} (nil))

(insn 34 33 35 7 a.c:25 (set (subreg:V2HI (reg:V4HI 114) 4)
        (plus:V2HI (subreg:V2HI (reg:SI 143 [+4 ]) 0)
            (subreg:V2HI (reg:SI 145 [+4 ]) 0))) 118 {addv2hi3} (nil))

(insn 35 34 36 7 a.c:25 (set (reg:V4HI 114)
        (vec_concat:V4HI (subreg:V2HI (reg:V4HI 114) 0)
            (subreg:V2HI (reg:V4HI 114) 4))) 119 {concat_v2hi_to_v4hi} (nil))

(insn 36 35 98 7 a.c:25 (set (mem:V4HI (reg/f:SI 97 [ __vect_p_47 ]) [2 S8 A64])
        (reg:V4HI 114)) 116 {*movv4hi_store} (nil))

Notice that the loads are now done in SI mode, which is twice as
expensive as loading in V4HI mode.
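
The source file a.c is not reproduced in this message. As a rough
illustration only, a loop of the following shape is the kind of code the
vectorizer could turn into V4HI loads, V2HI adds and a V4HI store like
the ones in the dumps above. The function name add_hi, its signature and
the 16-bit element type are assumptions made for this sketch, not taken
from the original a.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the loop at a.c:25: an element-wise add of
   16-bit values.  On a target with 64-bit V4HI loads/stores and a V2HI
   add, the vectorizer may process four elements per iteration.  */
void add_hi(short *restrict dst, const short *restrict a,
            const short *restrict b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);

    for (size_t i = 0; i < n; i++) {
        /* The sum is truncated back to 16 bits, as in HImode arithmetic.  */
        dst[i] = (short)(a[i] + b[i]);
    }
}
```

Compiling such a file with -fdump-rtl-subreg1 shows the RTL after the
pass, and building once more with -fno-split-wide-types (which disables
the lower-subreg passes) gives a dump to compare against.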

Can someone please help with this?
Should this code be decomposed and then recomposed (which does not happen), or
should it not be decomposed in the first place?
What should I change in order to end up with a single V4HI load?
Thanks, Roy.