From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-333254-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 8996 invoked by alias); 30 Nov 2012 22:33:05 -0000
Received: (qmail 8921 invoked by uid 22791); 30 Nov 2012 22:33:02 -0000
X-SWARE-Spam-Status: No, hits=-8.0 required=5.0	tests=AWL,BAYES_00,KHOP_RCVD_UNTRUST,KHOP_THREADED,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD,TW_ZJ
X-Spam-Check-By: sourceware.org
Received: from mail4-relais-sop.national.inria.fr (HELO mail4-relais-sop.national.inria.fr) (192.134.164.105)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 30 Nov 2012 22:32:41 +0000
Received: from ip-53.net-81-220-141.rev.numericable.fr (HELO laptop-mg.local) ([81.220.141.53])  by mail4-relais-sop.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-SHA; 30 Nov 2012 23:32:39 +0100
Date: Fri, 30 Nov 2012 22:36:00 -0000
From: Marc Glisse <marc.glisse@inria.fr>
To: Uros Bizjak <ubizjak@gmail.com>
cc: gcc-patches@gcc.gnu.org
Subject: Re: [i386] scalar ops that preserve the high part of a vector
In-Reply-To: <CAFULd4bVFbr1KC6FrjWOhEfmqEr_v2VuQMFjg4+TpTB681=HhA@mail.gmail.com>
Message-ID: <alpine.DEB.2.02.1211302244290.3783@laptop-mg.saclay.inria.fr>
References: <alpine.DEB.2.02.1210131032460.9651@stedding.saclay.inria.fr> <CAFULd4YHdLF1ZyxrMG8MhRjo40f-EfAJZnDOEBc80pOGa4WNGQ@mail.gmail.com> <alpine.DEB.2.02.1210141057010.3752@laptop-mg.saclay.inria.fr> <alpine.DEB.2.02.1211301317160.3783@laptop-mg.saclay.inria.fr> <CAFULd4bVFbr1KC6FrjWOhEfmqEr_v2VuQMFjg4+TpTB681=HhA@mail.gmail.com>
User-Agent: Alpine 2.02 (DEB 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2012-11/txt/msg02576.txt.bz2

On Fri, 30 Nov 2012, Uros Bizjak wrote:

> For reference, we are talking about:
>
> (define_insn "<sse>_vm<plusminus_insn><mode>3"
>  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
> 	(vec_merge:VF_128
> 	  (plusminus:VF_128
> 	    (match_operand:VF_128 1 "register_operand" "0,x")
> 	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,xm"))
> 	  (match_dup 1)
> 	  (const_int 1)))]
>  "TARGET_SSE"
>  "@
>   <plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %0|%0, %2}
>   v<plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
>  [(set_attr "isa" "noavx,avx")
>   (set_attr "type" "sseadd")
>   (set_attr "prefix" "orig,vex")
>   (set_attr "mode" "<ssescalarmode>")])
>
> No, looking at your description, the operand 2 should be scalar
> operand (we use _s{s,d} scalar instruction here), and for doubles this
> should refer to 64bit memory location. I don't remember all the
> details about vec_merge scalar instructions, but it looks to me that
> canonical representation should be more like your proposal:
>
> +(define_insn "*sse2_vm<plusminus_insn>v2df3"
> +  [(set (match_operand:V2DF 0 "register_operand" "=x,x")
> +    (vec_concat:V2DF
> +      (plusminus:DF
> +        (vec_select:DF
> +          (match_operand:V2DF 1 "register_operand" "0,x")
> +          (parallel [(const_int 0)]))
> +        (match_operand:DF 2 "nonimmediate_operand" "xm,xm"))
> +      (vec_select:DF (match_dup 1) (parallel [(const_int 1)]))))]
> +  "TARGET_SSE2"

Thank you.

Among the following possible patterns, my choice (if nobody objects) is to 
use 4) for V2DF and 3) (rewritten without iterators) for V4SF. The 
question is then what should be done about the builtins and intrinsics. 
_mm_add_sd takes two __m128d. If I change the signature of 
__builtin_ia32_addsd, I can make _mm_add_sd pass __B[0] as the second 
argument, but I don't know whether I am allowed to change that signature. 
Otherwise I guess I'll need to keep a separate expander for it (I'd rather 
not). There are also several operations other than +/- to handle.
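
To make the signature question concrete, here is a rough plain C model of 
the two options. It is only a sketch, not GCC's headers or builtins: v2df, 
model_addsd_vec, model_addsd_scalar and model_mm_add_sd are hypothetical 
names standing in for __m128d, the current __builtin_ia32_addsd, a 
builtin with a scalar second argument, and _mm_add_sd respectively.

```c
#include <assert.h>

/* Hypothetical stand-in for a 128-bit vector of two doubles. */
typedef struct { double e[2]; } v2df;

/* Models the current builtin signature: both arguments are full
   vectors, although only element 0 of b is ever read. */
static v2df model_addsd_vec(v2df a, v2df b)
{
    v2df r = a;                 /* high element is preserved from a */
    r.e[0] = a.e[0] + b.e[0];
    return r;
}

/* Models a changed signature: the second argument is a scalar. */
static v2df model_addsd_scalar(v2df a, double b0)
{
    v2df r = a;                 /* high element is preserved from a */
    r.e[0] = a.e[0] + b0;
    return r;
}

/* Models the intrinsic keeping its two-vector interface while
   forwarding only element 0 of b to the scalar builtin. */
static v2df model_mm_add_sd(v2df a, v2df b)
{
    return model_addsd_scalar(a, b.e[0]);
}

/* Checks that both routes give the same vector for one input pair. */
static void check_models_agree(v2df a, v2df b)
{
    v2df r = model_addsd_vec(a, b);
    v2df s = model_mm_add_sd(a, b);
    assert(r.e[0] == s.e[0] && r.e[1] == s.e[1]);
}
```

In this model the user-visible wrapper is unchanged; only the internal 
builtin differs, which is the part whose signature I am unsure about.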


1) Current pattern:

   [(set (match_operand:VF_128 0 "register_operand" "=x,x")
 	(vec_merge:VF_128
 	  (plusminus:VF_128
 	    (match_operand:VF_128 1 "register_operand" "0,x")
 	    (match_operand:VF_128 2 "nonimmediate_operand" "xm,xm"))
 	  (match_dup 1)
 	  (const_int 1)))]

2) Minimal fix:

   [(set (match_operand:VF_128 0 "register_operand" "=x,x")
 	(vec_merge:VF_128
 	  (plusminus:VF_128
 	    (match_operand:VF_128 1 "register_operand" "0,x")
 	    (vec_duplicate:VF_128
 	      (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "xm,xm")))
 	  (match_dup 1)
 	  (const_int 1)))]

3) With the operation in scalar mode:

   [(set (match_operand:VF_128 0 "register_operand" "=x,x")
 	(vec_merge:VF_128
 	  (vec_duplicate:VF_128
 	    (plusminus:<ssescalarmode>
 	      (vec_select:<ssescalarmode>
 		(match_operand:VF_128 1 "register_operand" "0,x")
 		(parallel [(const_int 0)]))
 	      (match_operand:<ssescalarmode> 2 "nonimmediate_operand" "xm,xm")))
 	  (match_dup 1)
 	  (const_int 1)))]

4) Special version which only makes sense for vectors of 2 elements:

   [(set (match_operand:V2DF 0 "register_operand" "=x,x")
 	(vec_concat:V2DF
 	  (plusminus:DF
 	    (vec_select:DF
 	      (match_operand:V2DF 1 "register_operand" "0,x")
 	      (parallel [(const_int 0)]))
 	    (match_operand:DF 2 "nonimmediate_operand" "xm,xm"))
 	  (vec_select:DF (match_dup 1) (parallel [(const_int 1)]))))]
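
For the V2DF case with plus, the four patterns can be compared with a 
small C model of the RTL operators. This is only a sketch under stated 
assumptions: the v2df struct and the pattern1..pattern4 functions are 
hypothetical, and vec_merge is modelled as taking element i from its 
first operand when bit i of the mask is set and from the second operand 
otherwise.

```c
#include <assert.h>

/* Hypothetical model of a V2DF register. */
typedef struct { double e[2]; } v2df;

/* vec_merge: element i comes from x when bit i of mask is set,
   otherwise from y. */
static v2df vec_merge(v2df x, v2df y, unsigned mask)
{
    v2df r;
    for (int i = 0; i < 2; i++)
        r.e[i] = ((mask >> i) & 1u) ? x.e[i] : y.e[i];
    return r;
}

/* vec_duplicate: broadcast a scalar to every element. */
static v2df vec_duplicate(double s)
{
    v2df r = {{s, s}};
    return r;
}

/* vec_concat: build a vector from a low and a high scalar. */
static v2df vec_concat(double lo, double hi)
{
    v2df r = {{lo, hi}};
    return r;
}

/* plus in vector mode: element-wise addition. */
static v2df plus_v2df(v2df a, v2df b)
{
    v2df r = {{a.e[0] + b.e[0], a.e[1] + b.e[1]}};
    return r;
}

/* 1) Operand 2 is a whole vector, so b.e[1] is read and then discarded. */
static v2df pattern1(v2df a, v2df b)
{
    return vec_merge(plus_v2df(a, b), a, 1);
}

/* 2) Operand 2 is a scalar, duplicated before a vector-mode add. */
static v2df pattern2(v2df a, double b0)
{
    return vec_merge(plus_v2df(a, vec_duplicate(b0)), a, 1);
}

/* 3) The add itself is done in scalar mode, then duplicated and merged. */
static v2df pattern3(v2df a, double b0)
{
    return vec_merge(vec_duplicate(a.e[0] + b0), a, 1);
}

/* 4) Scalar add concatenated with the untouched high element. */
static v2df pattern4(v2df a, double b0)
{
    return vec_concat(a.e[0] + b0, a.e[1]);
}
```

All four produce the same register contents here; they differ in whether 
operand 2 is described as a vector or as a scalar.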

-- 
Marc Glisse