From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-160470-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 22333 invoked by alias); 8 Apr 2010 05:48:41 -0000
Received: (qmail 22101 invoked by uid 22791); 8 Apr 2010 05:48:40 -0000
Received: from mail-iw0-f200.google.com (HELO mail-iw0-f200.google.com) (209.85.223.200)     by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 08 Apr 2010 05:48:35 +0000
Received: by iwn38 with SMTP id 38so1281241iwn.8         for <gcc@gcc.gnu.org>; Wed, 07 Apr 2010 22:48:33 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.231.113.7 with HTTP; Wed, 7 Apr 2010 22:48:33 -0700 (PDT)
In-Reply-To: <4BBB6358.4050602@codesourcery.com>
References: <m2sbba479b11004060224g59086b3j5e53b9704ea179bd@mail.gmail.com> 	 <4BBB6358.4050602@codesourcery.com>
Date: Thu, 08 Apr 2010 06:16:00 -0000
Received: by 10.231.149.10 with SMTP id r10mr702416ibv.63.1270705713433; Wed,  	07 Apr 2010 22:48:33 -0700 (PDT)
Message-ID: <i2vbba479b11004072248gabfac370o3737996793a15d6c@mail.gmail.com>
Subject: Re: lower subreg optimization
From: roy rosen <roy.1rosen@gmail.com>
To: Jim Wilson <wilson@codesourcery.com>
Cc: gcc@gcc.gnu.org
Content-Type: text/plain; charset=ISO-8859-1
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
X-SW-Source: 2010-04/txt/msg00126.txt.bz2

2010/4/6, Jim Wilson <wilson@codesourcery.com>:
> On 04/06/2010 02:24 AM, roy rosen wrote:
> > (insn 33 32 34 7 a.c:25 (set (subreg:V2HI (reg:V4HI 114) 0)
> >         (plus:V2HI (subreg:V2HI (reg:V4HI 112) 0)
> >             (subreg:V2HI (reg:V4HI 113) 0))) 118 {addv2hi3} (nil))
> >
>
> Only subregs are decomposed.  So use vec_select instead of subreg.  I see
> you already have a vec_concat to combine the two v2hi into one v4hi, so
> there is no need for the subreg in the dest.  You should try eliminating
> that first and see if that helps.  If that isn't enough, then replace the
> subregs in the source with vec_select operations.
>
> Jim
>

Thanks Jim,

I have implemented your suggestion and am now using vec_select, and
the subreg optimization does not decompose the instruction.
The problem now is that I am left with redundant instructions (which
I translate into move insns).
For example:

(insn 37 32 38 7 a.c:25 (set (reg:V2HI 116)
        (vec_concat:V2HI (vec_select:HI (reg:V4HI 112)
                (parallel [
                        (const_int 0 [0x0])
                    ]))
            (vec_select:HI (reg:V4HI 112)
                (parallel [
                        (const_int 1 [0x1])
                    ])))) 121 {v4hi_extract_low_v2hi}
(expr_list:REG_DEAD (reg:V4HI 112)
        (nil)))

This instruction eventually has to be optimized out somehow. It
extracts a V2HI from a V4HI. The V4HI is stored in a register pair
(e.g. r0:r1), and the V2HI is simply one of those two registers, so
no instruction should be needed.

I saw that arm/neon.md has a similar problem:

; FIXME: We wouldn't need the following insns if we could write subregs of
; vector registers. Make an attempt at removing unnecessary moves, though
; we're really at the mercy of the register allocator.

(define_insn "move_lo_quad_v4si"
  [(set (match_operand:V4SI 0 "s_register_operand" "+w")
        (vec_concat:V4SI
          (match_operand:V2SI 1 "s_register_operand" "w")
          (vec_select:V2SI (match_dup 0)
			   (parallel [(const_int 2) (const_int 3)]))))]
  "TARGET_NEON"
{
  int dest = REGNO (operands[0]);
  int src = REGNO (operands[1]);

  if (dest != src)
    return "vmov\t%e0, %P1";
  else
    return "";
}
  [(set_attr "neon_type" "neon_bp_simple")]
)

Their solution is also not complete.
What is the proper way to handle such a case, and how do I let gcc know
that this is a simple move instruction so that it can optimize it out?

Thanks, Roy.