public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [RTL, i386] Use subreg instead of UNSPEC_CAST
@ 2013-03-19 15:47 Marc Glisse
  2013-03-19 21:22 ` Richard Henderson
  0 siblings, 1 reply; 12+ messages in thread
From: Marc Glisse @ 2013-03-19 15:47 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: TEXT/PLAIN, Size: 574 bytes --]

Hello,

the following patch passes bootstrap+testsuite on x86_64-linux-gnu. I 
don't see any particular reason to forbid vector subregs of vectors, since 
we can already do it through a scalar. And not using unspecs helps avoid 
unnecessary copies.

2013-01-03  Marc Glisse  <marc.glisse@inria.fr>

 	PR target/50829
gcc/
 	* config/i386/sse.md (enum unspec): Remove UNSPEC_CAST.
 	(avx_<castmode><avxsizesuffix>_<castmode>): Use subreg.
 	* emit-rtl.c (validate_subreg): Allow vector-vector subregs.

gcc/testsuite/
 	* gcc.target/i386/pr50829.c: New file.

-- 
Marc Glisse

[-- Attachment #2: Type: TEXT/PLAIN, Size: 3620 bytes --]

Index: gcc/testsuite/gcc.target/i386/pr50829.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr50829.c	(revision 0)
+++ gcc/testsuite/gcc.target/i386/pr50829.c	(revision 0)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -mavx" } */
+
+#include <x86intrin.h>
+
+__m256d
+concat (__m128d x)
+{
+  __m256d z = _mm256_castpd128_pd256 (x);
+  return _mm256_insertf128_pd (z, x, 1);
+}
+
+/* { dg-final { scan-assembler-not "vmov" } } */

Property changes on: gcc/testsuite/gcc.target/i386/pr50829.c
___________________________________________________________________
Added: svn:keywords
   + Author Date Id Revision URL
Added: svn:eol-style
   + native

Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md	(revision 196633)
+++ gcc/config/i386/sse.md	(working copy)
@@ -66,21 +66,20 @@
   UNSPEC_AESKEYGENASSIST
 
   ;; For PCLMUL support
   UNSPEC_PCLMUL
 
   ;; For AVX support
   UNSPEC_PCMP
   UNSPEC_VPERMIL
   UNSPEC_VPERMIL2
   UNSPEC_VPERMIL2F128
-  UNSPEC_CAST
   UNSPEC_VTESTP
   UNSPEC_VCVTPH2PS
   UNSPEC_VCVTPS2PH
 
   ;; For AVX2 support
   UNSPEC_VPERMVAR
   UNSPEC_VPERMTI
   UNSPEC_GATHER
   UNSPEC_VSIBADDR
 ])
@@ -11089,23 +11088,22 @@
   "TARGET_AVX"
   "v<sseintprefix>maskmov<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "sselog1")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "vex")
    (set_attr "btver2_decode" "vector") 
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>"
   [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
-	(unspec:AVX256MODE2P
-	  [(match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x")]
-	  UNSPEC_CAST))]
+	(subreg:AVX256MODE2P
+	  (match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x") 0))]
   "TARGET_AVX"
   "#"
   "&& reload_completed"
   [(const_int 0)]
 {
   rtx op0 = operands[0];
   rtx op1 = operands[1];
   if (REG_P (op0))
     op0 = gen_rtx_REG (<ssehalfvecmode>mode, REGNO (op0));
   else
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	(revision 196633)
+++ gcc/emit-rtl.c	(working copy)
@@ -707,20 +707,23 @@ validate_subreg (enum machine_mode omode
   else if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode))
 	   && GET_MODE_INNER (imode) == omode)
     ;
   /* ??? x86 sse code makes heavy use of *paradoxical* vector subregs,
      i.e. (subreg:V4SF (reg:SF) 0).  This surely isn't the cleanest way to
      represent this.  It's questionable if this ought to be represented at
      all -- why can't this all be hidden in post-reload splitters that make
      arbitrarily mode changes to the registers themselves.  */
   else if (VECTOR_MODE_P (omode) && GET_MODE_INNER (omode) == imode)
     ;
+  else if (VECTOR_MODE_P (omode) && VECTOR_MODE_P (imode)
+	   && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
+    ;
   /* Subregs involving floating point modes are not allowed to
      change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
      (subreg:SI (reg:DF) 0) isn't.  */
   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
     {
       if (! (isize == osize
 	     /* LRA can use subreg to store a floating point value in
 		an integer mode.  Although the floating point and the
 		integer modes need the same number of hard registers,
 		the size of floating point mode can be less than the

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RTL, i386] Use subreg instead of UNSPEC_CAST
  2013-03-19 15:47 [RTL, i386] Use subreg instead of UNSPEC_CAST Marc Glisse
@ 2013-03-19 21:22 ` Richard Henderson
  2013-03-20 15:00   ` Marc Glisse
  2014-06-10  6:36   ` Marc Glisse
  0 siblings, 2 replies; 12+ messages in thread
From: Richard Henderson @ 2013-03-19 21:22 UTC (permalink / raw)
  To: Marc Glisse; +Cc: gcc-patches

On 03/19/2013 08:47 AM, Marc Glisse wrote:
>  (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>"
>    [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
> -	(unspec:AVX256MODE2P
> -	  [(match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x")]
> -	  UNSPEC_CAST))]
> +	(subreg:AVX256MODE2P
> +	  (match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x") 0))]
>    "TARGET_AVX"
>    "#"
>    "&& reload_completed"
>    [(const_int 0)]

I'm not fond of this, primarily because I believe the pattern should
not exist at all.

One of the following is true:

  (1) reload needs working around (thus all the reload_completed nonsense)
or
  (2) the entire pattern is useless and would be subsumed by mov<mode>
or
  (3) the entire pattern is useless and is *already* subsumed by
      mov<mode>, since mov is earlier in the md file, making this
      pattern dead code.



r~

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RTL, i386] Use subreg instead of UNSPEC_CAST
  2013-03-19 21:22 ` Richard Henderson
@ 2013-03-20 15:00   ` Marc Glisse
  2013-03-20 15:13     ` Richard Henderson
  2014-06-10  6:36   ` Marc Glisse
  1 sibling, 1 reply; 12+ messages in thread
From: Marc Glisse @ 2013-03-20 15:00 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc-patches

On Tue, 19 Mar 2013, Richard Henderson wrote:

> On 03/19/2013 08:47 AM, Marc Glisse wrote:
>>  (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>"
>>    [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
>> -	(unspec:AVX256MODE2P
>> -	  [(match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x")]
>> -	  UNSPEC_CAST))]
>> +	(subreg:AVX256MODE2P
>> +	  (match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x") 0))]
>>    "TARGET_AVX"
>>    "#"
>>    "&& reload_completed"
>>    [(const_int 0)]
>
> I'm not fond of this, primarily because I believe the pattern should
> not exist at all.

Sure, removing it would be even better.

> One of the following is true:
>
>  (1) reload needs working around (thus all the reload_completed nonsense)
> or
>  (2) the entire pattern is useless and would be subsumed by mov<mode>
> or
>  (3) the entire pattern is useless and is *already* subsumed by
>      mov<mode>, since mov is earlier in the md file, making this
>      pattern dead code.

We need something to expand _mm256_castpd128_pd256 to. I tried making it a 
define_expand (with the subreg pattern, and keeping the {} part intact), 
but that gives check_rtl errors in lra. I then tried to remove the REG_P 
condition and use simplify_gen_subreg or gen_lowpart, but the first one 
gives unrecognizable insn at -O0 (same as removing the {} part completely) 
(it seems happier at -O1), while the second ICEs (gen_lowpart_common 
returns 0) for any -Ox except -O0. As must be obvious from this paragraph, 
I just tried a few random bad ideas... and when none worked I posted the 
minimal patch that worked.

Do you at least agree that vector-vector subregs make sense, or is that 
part wrong as well?

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RTL, i386] Use subreg instead of UNSPEC_CAST
  2013-03-20 15:00   ` Marc Glisse
@ 2013-03-20 15:13     ` Richard Henderson
  2013-03-20 15:17       ` Richard Biener
  2013-03-20 15:29       ` Marc Glisse
  0 siblings, 2 replies; 12+ messages in thread
From: Richard Henderson @ 2013-03-20 15:13 UTC (permalink / raw)
  To: Marc Glisse; +Cc: gcc-patches

On 03/20/2013 08:00 AM, Marc Glisse wrote:
> Do you at least agree that vector-vector subregs make sense, or is that part
> wrong as well?

You mean a V4SImode subreg of a V8SImode register, not just same-size casting?
It makes logical sense, but I'm fairly sure you'll need a lot more surgery
throughout the compiler to make that happen.

I'm curious how a define_expand can fail in LRA, but your define_insn succeeds?
Is the failure because of ix86_cannot_change_mode_class?  Because that hook
fairly well defines what subregs are valid.  And if that says it isn't valid,
then even having a define_insn that uses such is wrong.


r~

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RTL, i386] Use subreg instead of UNSPEC_CAST
  2013-03-20 15:13     ` Richard Henderson
@ 2013-03-20 15:17       ` Richard Biener
  2013-03-20 15:29       ` Marc Glisse
  1 sibling, 0 replies; 12+ messages in thread
From: Richard Biener @ 2013-03-20 15:17 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Marc Glisse, gcc-patches

On Wed, Mar 20, 2013 at 4:13 PM, Richard Henderson <rth@redhat.com> wrote:
> On 03/20/2013 08:00 AM, Marc Glisse wrote:
>> Do you at least agree that vector-vector subregs make sense, or is that part
>> wrong as well?
>
> You mean a V4SImode subreg of a V8SImode register, not just same-size casting?
> It makes logical sense, but I'm fairly sure you'll need a lot more surgery
> throughout the compiler to make that happen.
>
> I'm curious how a define_expand can fail in LRA, but your define_insn succeeds?
> Is the failure because of ix86_cannot_change_mode_class?  Because that hook
> fairly well defines what subregs are valid.  And if that says it isn't valid,
> then even having a define_insn that uses such is wrong.

Don't we have vec_select to get a V4SImode out of a V8SImode?  So you
only need a define_insn that special-cases the subreg-like ones?

Richard.

>
> r~

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RTL, i386] Use subreg instead of UNSPEC_CAST
  2013-03-20 15:13     ` Richard Henderson
  2013-03-20 15:17       ` Richard Biener
@ 2013-03-20 15:29       ` Marc Glisse
  2013-03-20 15:44         ` Richard Biener
  1 sibling, 1 reply; 12+ messages in thread
From: Marc Glisse @ 2013-03-20 15:29 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc-patches

On Wed, 20 Mar 2013, Richard Henderson wrote:

> On 03/20/2013 08:00 AM, Marc Glisse wrote:
>> Do you at least agree that vector-vector subregs make sense, or is that part
>> wrong as well?
>
> You mean a V4SImode subreg of a V8SImode register, not just same-size casting?

I am mostly interested in the reverse, a paradoxical subreg, since 
vec_select can only model one direction (and only rvalues, but that's a 
different question).

> It makes logical sense, but I'm fairly sure you'll need a lot more surgery
> throughout the compiler to make that happen.
>
> I'm curious how a define_expand can fail in LRA, but your define_insn succeeds?

Total guesswork:

I think it is related to that REG_P protected code, and the 
reload_complete test. With the define_insn_and_split, we keep the insn 
until after reload and only do the subreg magic then. With a 
define_expand, we end up writing to reg 60 as a V2DF and reading it as a 
V4DF, and since it isn't a hard register, that causes a problem.

> Is the failure because of ix86_cannot_change_mode_class?  Because that hook
> fairly well defines what subregs are valid.  And if that says it isn't valid,
> then even having a define_insn that uses such is wrong.

A quick look at ix86_cannot_change_mode_class seems to indicate that it 
does not mind such paradoxical subregs.

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RTL, i386] Use subreg instead of UNSPEC_CAST
  2013-03-20 15:29       ` Marc Glisse
@ 2013-03-20 15:44         ` Richard Biener
  2013-03-20 15:54           ` Marc Glisse
  0 siblings, 1 reply; 12+ messages in thread
From: Richard Biener @ 2013-03-20 15:44 UTC (permalink / raw)
  To: Marc Glisse; +Cc: Richard Henderson, gcc-patches

On Wed, Mar 20, 2013 at 4:29 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Wed, 20 Mar 2013, Richard Henderson wrote:
>
>> On 03/20/2013 08:00 AM, Marc Glisse wrote:
>>>
>>> Do you at least agree that vector-vector subregs make sense, or is that
>>> part
>>> wrong as well?
>>
>>
>> You mean a V4SImode subreg of a V8SImode register, not just same-size
>> casting?
>
>
> I am mostly interested in the reverse, a paradoxical subreg, since
> vec_select can only model one direction (and only rvalues, but that's a
> different question).

vec_duplicate?

Honestly, what semantics should  _mm256_castpd128_pd256 have if
it is supposed to cast a v2df to a v4df?  Or what use?

Richard.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RTL, i386] Use subreg instead of UNSPEC_CAST
  2013-03-20 15:44         ` Richard Biener
@ 2013-03-20 15:54           ` Marc Glisse
  2013-03-21  9:24             ` Richard Biener
  0 siblings, 1 reply; 12+ messages in thread
From: Marc Glisse @ 2013-03-20 15:54 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Henderson, gcc-patches

On Wed, 20 Mar 2013, Richard Biener wrote:

> On Wed, Mar 20, 2013 at 4:29 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
>> On Wed, 20 Mar 2013, Richard Henderson wrote:
>>
>>> On 03/20/2013 08:00 AM, Marc Glisse wrote:
>>>>
>>>> Do you at least agree that vector-vector subregs make sense, or is that
>>>> part
>>>> wrong as well?
>>>
>>>
>>> You mean a V4SImode subreg of a V8SImode register, not just same-size
>>> casting?
>>
>>
>> I am mostly interested in the reverse, a paradoxical subreg, since
>> vec_select can only model one direction (and only rvalues, but that's a
>> different question).
>
> vec_duplicate?

There is already some of that in various places, and there may be even 
more vec_merge+vec_duplicate patterns soon, but you want to make sure you 
don't actually do the duplication.

> Honestly, what semantics should  _mm256_castpd128_pd256 have if
> it is supposed to cast a v2df to a v4df?

NOP. We don't care what is in the high part of the vector.

> Or what use?

Many vector operations are defined as taking 2 vectors and merging them 
somehow. I didn't check if this case works, but for instance if you want 
to copy a V2DF to the bottom part of a V4DF using Intel's intrinsics, you 
will probably have to cast the V2DF to a V4DF and then use an intrinsic 
that takes 2 V4DF. (there are many issues with those intrinsics, but we 
don't control them)

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RTL, i386] Use subreg instead of UNSPEC_CAST
  2013-03-20 15:54           ` Marc Glisse
@ 2013-03-21  9:24             ` Richard Biener
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Biener @ 2013-03-21  9:24 UTC (permalink / raw)
  To: Marc Glisse; +Cc: Richard Henderson, gcc-patches

On Wed, Mar 20, 2013 at 4:54 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Wed, 20 Mar 2013, Richard Biener wrote:
>
>> On Wed, Mar 20, 2013 at 4:29 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
>>>
>>> On Wed, 20 Mar 2013, Richard Henderson wrote:
>>>
>>>> On 03/20/2013 08:00 AM, Marc Glisse wrote:
>>>>>
>>>>>
>>>>> Do you at least agree that vector-vector subregs make sense, or is that
>>>>> part
>>>>> wrong as well?
>>>>
>>>>
>>>>
>>>> You mean a V4SImode subreg of a V8SImode register, not just same-size
>>>> casting?
>>>
>>>
>>>
>>> I am mostly interested in the reverse, a paradoxical subreg, since
>>> vec_select can only model one direction (and only rvalues, but that's a
>>> different question).
>>
>>
>> vec_duplicate?
>
>
> There is already some of that in various places, and there may be even more
> vec_merge+vec_duplicate patterns soon, but you want to make sure you don't
> actually do the duplication.
>
>
>> Honestly, what semantics should  _mm256_castpd128_pd256 have if
>> it is supposed to cast a v2df to a v4df?
>
>
> NOP. We don't care what is in the high part of the vector.
>
>> Or what use?
>
>
> Many vector operations are defined as taking 2 vectors and merging them
> somehow. I didn't check if this case works, but for instance if you want to
> copy a V2DF to the bottom part of a V4DF using Intel's intrinsics, you will
> probably have to cast the V2DF to a V4DF and then use an intrinsic that
> takes 2 V4DF. (there are many issues with those intrinsics, but we don't
> control them)

Hmm, I see.  I still think that we should expose most of the intrinsics
and builtins implementation details earlier, at the GIMPLE level.  This one
would be an awkward one there, too.  You'd need sth like

  v4df_3 = CONSTRUCTOR { v2df_2, v2df_1(D) };

thus, make that "uninitialized" explicit by using a default def.  I
think we don't
support generating the above from C/C++ source with GNU extensions as
vector type casts are quite restricted at the moment, so there you'd have
to write sth like

  double uninit;
  v4df res = { v2dfv[0], v2dfv[1], uninit, uninit };

which would get you

  D.1723 = BIT_FIELD_REF <x, 64, 0>;
  D.1724 = BIT_FIELD_REF <x, 64, 64>;
  D.1725 = {D.1723, D.1724, uninit, uninit};

at the moment.  And of course awkward code in the end ;)

Which leaves the other option of folding the __builtin_ia32_ps256_ps
in the target (and most other builtins).

Just side-tracking from the RTL issue of course ...

Richard.

> --
> Marc Glisse

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RTL, i386] Use subreg instead of UNSPEC_CAST
  2013-03-19 21:22 ` Richard Henderson
  2013-03-20 15:00   ` Marc Glisse
@ 2014-06-10  6:36   ` Marc Glisse
  2014-07-26  9:53     ` Marc Glisse
  1 sibling, 1 reply; 12+ messages in thread
From: Marc Glisse @ 2014-06-10  6:36 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc-patches

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1239 bytes --]

On Tue, 19 Mar 2013, Richard Henderson wrote:

> I'm not fond of this, primarily because I believe the pattern should
> not exist at all.

One year later, new try. Tweaking the pattern, I ended up with a copy of 
the mov pattern (the subreg is generated automatically when the modes 
don't match), so I just removed it. I know the comment in emit-rtl.c says 
splitters are a better way forward than subregs, but I haven't managed 
with splitters while the subreg patch is very simple :-) I added a -O0 
testcase because when I was experimenting I had many versions that worked 
for -O2 but ICEd at -O0 (and vice versa), but it might be redundant with 
some other tests.

Bootstrap+testsuite on x86_64-linux-gnu.

2014-06-10  Marc Glisse  <marc.glisse@inria.fr>

 	PR target/50829
gcc/
 	* config/i386/sse.md (enum unspec): Remove UNSPEC_CAST.
 	(avx_<castmode><avxsizesuffix>_<castmode>): Remove.
 	* config/i386/i386.c (builtin_description) [__builtin_ia32_si256_si,
 	__builtin_ia32_ps256_ps, __builtin_ia32_pd256_pd]: Replace the
 	removed insn with mov.
 	* emit-rtl.c (validate_subreg): Allow vector-vector subregs.

gcc/testsuite/
 	* gcc.target/i386/pr50829-1.c: New file.
 	* gcc.target/i386/pr50829-2.c: New file.

-- 
Marc Glisse

[-- Attachment #2: Type: TEXT/PLAIN, Size: 7531 bytes --]

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 211397)
+++ gcc/config/i386/i386.c	(working copy)
@@ -29793,23 +29793,23 @@ static const struct builtin_description
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_roundps_sfix256, "__builtin_ia32_ceilps_sfix256", IX86_BUILTIN_CEILPS_SFIX256, (enum rtx_code) ROUND_CEIL, (int) V8SI_FTYPE_V8SF_ROUND },
 
   { OPTION_MASK_ISA_AVX, CODE_FOR_roundv8sf2, "__builtin_ia32_roundps_az256", IX86_BUILTIN_ROUNDPS_AZ256, UNKNOWN, (int) V8SF_FTYPE_V8SF },
   { OPTION_MASK_ISA_AVX, CODE_FOR_roundv8sf2_sfix, "__builtin_ia32_roundps_az_sfix256", IX86_BUILTIN_ROUNDPS_AZ_SFIX256, UNKNOWN, (int) V8SI_FTYPE_V8SF },
 
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_unpckhpd256,  "__builtin_ia32_unpckhpd256", IX86_BUILTIN_UNPCKHPD256, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_unpcklpd256,  "__builtin_ia32_unpcklpd256", IX86_BUILTIN_UNPCKLPD256, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_unpckhps256,  "__builtin_ia32_unpckhps256", IX86_BUILTIN_UNPCKHPS256, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_unpcklps256,  "__builtin_ia32_unpcklps256", IX86_BUILTIN_UNPCKLPS256, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF },
 
-  { OPTION_MASK_ISA_AVX, CODE_FOR_avx_si256_si, "__builtin_ia32_si256_si", IX86_BUILTIN_SI256_SI, UNKNOWN, (int) V8SI_FTYPE_V4SI },
-  { OPTION_MASK_ISA_AVX, CODE_FOR_avx_ps256_ps, "__builtin_ia32_ps256_ps", IX86_BUILTIN_PS256_PS, UNKNOWN, (int) V8SF_FTYPE_V4SF },
-  { OPTION_MASK_ISA_AVX, CODE_FOR_avx_pd256_pd, "__builtin_ia32_pd256_pd", IX86_BUILTIN_PD256_PD, UNKNOWN, (int) V4DF_FTYPE_V2DF },
+  { OPTION_MASK_ISA_AVX, CODE_FOR_movv8si, "__builtin_ia32_si256_si", IX86_BUILTIN_SI256_SI, UNKNOWN, (int) V8SI_FTYPE_V4SI },
+  { OPTION_MASK_ISA_AVX, CODE_FOR_movv8sf, "__builtin_ia32_ps256_ps", IX86_BUILTIN_PS256_PS, UNKNOWN, (int) V8SF_FTYPE_V4SF },
+  { OPTION_MASK_ISA_AVX, CODE_FOR_movv4df, "__builtin_ia32_pd256_pd", IX86_BUILTIN_PD256_PD, UNKNOWN, (int) V4DF_FTYPE_V2DF },
   { OPTION_MASK_ISA_AVX, CODE_FOR_vec_extract_lo_v8si, "__builtin_ia32_si_si256", IX86_BUILTIN_SI_SI256, UNKNOWN, (int) V4SI_FTYPE_V8SI },
   { OPTION_MASK_ISA_AVX, CODE_FOR_vec_extract_lo_v8sf, "__builtin_ia32_ps_ps256", IX86_BUILTIN_PS_PS256, UNKNOWN, (int) V4SF_FTYPE_V8SF },
   { OPTION_MASK_ISA_AVX, CODE_FOR_vec_extract_lo_v4df, "__builtin_ia32_pd_pd256", IX86_BUILTIN_PD_PD256, UNKNOWN, (int) V2DF_FTYPE_V4DF },
 
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_vtestpd, "__builtin_ia32_vtestzpd", IX86_BUILTIN_VTESTZPD, EQ, (int) INT_FTYPE_V2DF_V2DF_PTEST },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_vtestpd, "__builtin_ia32_vtestcpd", IX86_BUILTIN_VTESTCPD, LTU, (int) INT_FTYPE_V2DF_V2DF_PTEST },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_vtestpd, "__builtin_ia32_vtestnzcpd", IX86_BUILTIN_VTESTNZCPD, GTU, (int) INT_FTYPE_V2DF_V2DF_PTEST },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_vtestps, "__builtin_ia32_vtestzps", IX86_BUILTIN_VTESTZPS, EQ, (int) INT_FTYPE_V4SF_V4SF_PTEST },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_vtestps, "__builtin_ia32_vtestcps", IX86_BUILTIN_VTESTCPS, LTU, (int) INT_FTYPE_V4SF_V4SF_PTEST },
   { OPTION_MASK_ISA_AVX, CODE_FOR_avx_vtestps, "__builtin_ia32_vtestnzcps", IX86_BUILTIN_VTESTNZCPS, GTU, (int) INT_FTYPE_V4SF_V4SF_PTEST },
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md	(revision 211397)
+++ gcc/config/i386/sse.md	(working copy)
@@ -66,21 +66,20 @@
   UNSPEC_AESKEYGENASSIST
 
   ;; For PCLMUL support
   UNSPEC_PCLMUL
 
   ;; For AVX support
   UNSPEC_PCMP
   UNSPEC_VPERMIL
   UNSPEC_VPERMIL2
   UNSPEC_VPERMIL2F128
-  UNSPEC_CAST
   UNSPEC_VTESTP
   UNSPEC_VCVTPH2PS
   UNSPEC_VCVTPS2PH
 
   ;; For AVX2 support
   UNSPEC_VPERMVAR
   UNSPEC_VPERMTI
   UNSPEC_GATHER
   UNSPEC_VSIBADDR
 
@@ -14816,40 +14815,20 @@
 
 (define_expand "maskstore<mode>"
   [(set (match_operand:V48_AVX2 0 "memory_operand")
 	(unspec:V48_AVX2
 	  [(match_operand:<sseintvecmode> 2 "register_operand")
 	   (match_operand:V48_AVX2 1 "register_operand")
 	   (match_dup 0)]
 	  UNSPEC_MASKMOV))]
   "TARGET_AVX")
 
-(define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>"
-  [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
-	(unspec:AVX256MODE2P
-	  [(match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x")]
-	  UNSPEC_CAST))]
-  "TARGET_AVX"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-{
-  rtx op0 = operands[0];
-  rtx op1 = operands[1];
-  if (REG_P (op0))
-    op0 = gen_rtx_REG (<ssehalfvecmode>mode, REGNO (op0));
-  else
-    op1 = gen_rtx_REG (<MODE>mode, REGNO (op1));
-  emit_move_insn (op0, op1);
-  DONE;
-})
-
 (define_expand "vec_init<mode>"
   [(match_operand:V_256 0 "register_operand")
    (match_operand 1)]
   "TARGET_AVX"
 {
   ix86_expand_vector_init (false, operands[0], operands[1]);
   DONE;
 })
 
 (define_expand "vec_init<mode>"
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	(revision 211397)
+++ gcc/emit-rtl.c	(working copy)
@@ -775,20 +775,23 @@ validate_subreg (enum machine_mode omode
   else if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode))
 	   && GET_MODE_INNER (imode) == omode)
     ;
   /* ??? x86 sse code makes heavy use of *paradoxical* vector subregs,
      i.e. (subreg:V4SF (reg:SF) 0).  This surely isn't the cleanest way to
      represent this.  It's questionable if this ought to be represented at
      all -- why can't this all be hidden in post-reload splitters that make
      arbitrarily mode changes to the registers themselves.  */
   else if (VECTOR_MODE_P (omode) && GET_MODE_INNER (omode) == imode)
     ;
+  else if (VECTOR_MODE_P (omode) && VECTOR_MODE_P (imode)
+	   && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
+    ;
   /* Subregs involving floating point modes are not allowed to
      change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
      (subreg:SI (reg:DF) 0) isn't.  */
   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
     {
       if (! (isize == osize
 	     /* LRA can use subreg to store a floating point value in
 		an integer mode.  Although the floating point and the
 		integer modes need the same number of hard registers,
 		the size of floating point mode can be less than the
Index: gcc/testsuite/gcc.target/i386/pr50829-1.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr50829-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/i386/pr50829-1.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx" } */
+
+#include <x86intrin.h>
+
+__m256d
+concat (__m128d x)
+{
+  __m256d z = _mm256_castpd128_pd256 (x);
+  return _mm256_insertf128_pd (z, x, 1);
+}
+
+/* { dg-final { scan-assembler-not "vmov" } } */
Index: gcc/testsuite/gcc.target/i386/pr50829-2.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr50829-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/i386/pr50829-2.c	(working copy)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mavx" } */
+
+#include <x86intrin.h>
+
+__m256d
+concat (__m128d x)
+{
+  __m256d z = _mm256_castpd128_pd256 (x);
+  return _mm256_insertf128_pd (z, x, 1);
+}

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RTL, i386] Use subreg instead of UNSPEC_CAST
  2014-06-10  6:36   ` Marc Glisse
@ 2014-07-26  9:53     ` Marc Glisse
  2014-08-26 12:44       ` Marc Glisse
  0 siblings, 1 reply; 12+ messages in thread
From: Marc Glisse @ 2014-07-26  9:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Henderson

Hello,

any comment on this patch?
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00769.html

On Tue, 10 Jun 2014, Marc Glisse wrote:

> On Tue, 19 Mar 2013, Richard Henderson wrote:
>
>> I'm not fond of this, primarily because I believe the pattern should
>> not exist at all.
>
> One year later, new try. Tweaking the pattern, I ended up with a copy of the 
> mov pattern (the subreg is generated automatically when the modes don't 
> match), so I just removed it. I know the comment in emit-rtl.c says splitters 
> are a better way forward than subregs, but I haven't managed with splitters 
> while the subreg patch is very simple :-) I added a -O0 testcase because when 
> I was experimenting I had many versions that worked for -O2 but ICEd at -O0 
> (and vice versa), but it might be redundant with some other tests.
>
> Bootstrap+testsuite on x86_64-linux-gnu.
>
> 2014-06-10  Marc Glisse  <marc.glisse@inria.fr>
>
> 	PR target/50829
> gcc/
> 	* config/i386/sse.md (enum unspec): Remove UNSPEC_CAST.
> 	(avx_<castmode><avxsizesuffix>_<castmode>): Remove.
> 	* config/i386/i386.c (builtin_description) [__builtin_ia32_si256_si,
> 	__builtin_ia32_ps256_ps, __builtin_ia32_pd256_pd]: Replace the
> 	removed insn with mov.
> 	* emit-rtl.c (validate_subreg): Allow vector-vector subregs.
>
> gcc/testsuite/
> 	* gcc.target/i386/pr50829-1.c: New file.
> 	* gcc.target/i386/pr50829-2.c: New file.

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RTL, i386] Use subreg instead of UNSPEC_CAST
  2014-07-26  9:53     ` Marc Glisse
@ 2014-08-26 12:44       ` Marc Glisse
  0 siblings, 0 replies; 12+ messages in thread
From: Marc Glisse @ 2014-08-26 12:44 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Henderson

Ping?

On Sat, 26 Jul 2014, Marc Glisse wrote:

> Hello,
>
> any comment on this patch?
> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00769.html
>
> On Tue, 10 Jun 2014, Marc Glisse wrote:
>
>> On Tue, 19 Mar 2013, Richard Henderson wrote:
>> 
>>> I'm not fond of this, primarily because I believe the pattern should
>>> not exist at all.
>> 
>> One year later, new try. Tweaking the pattern, I ended up with a copy of 
>> the mov pattern (the subreg is generated automatically when the modes don't 
>> match), so I just removed it. I know the comment in emit-rtl.c says 
>> splitters are a better way forward than subregs, but I haven't managed with 
>> splitters while the subreg patch is very simple :-) I added a -O0 testcase 
>> because when I was experimenting I had many versions that worked for -O2 
>> but ICEd at -O0 (and vice versa), but it might be redundant with some other 
>> tests.
>> 
>> Bootstrap+testsuite on x86_64-linux-gnu.
>> 
>> 2014-06-10  Marc Glisse  <marc.glisse@inria.fr>
>>
>> 	PR target/50829
>> gcc/
>> 	* config/i386/sse.md (enum unspec): Remove UNSPEC_CAST.
>> 	(avx_<castmode><avxsizesuffix>_<castmode>): Remove.
>> 	* config/i386/i386.c (builtin_description) [__builtin_ia32_si256_si,
>> 	__builtin_ia32_ps256_ps, __builtin_ia32_pd256_pd]: Replace the
>> 	removed insn with mov.
>> 	* emit-rtl.c (validate_subreg): Allow vector-vector subregs.
>> 
>> gcc/testsuite/
>> 	* gcc.target/i386/pr50829-1.c: New file.
>> 	* gcc.target/i386/pr50829-2.c: New file.

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-08-26 12:44 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-19 15:47 [RTL, i386] Use subreg instead of UNSPEC_CAST Marc Glisse
2013-03-19 21:22 ` Richard Henderson
2013-03-20 15:00   ` Marc Glisse
2013-03-20 15:13     ` Richard Henderson
2013-03-20 15:17       ` Richard Biener
2013-03-20 15:29       ` Marc Glisse
2013-03-20 15:44         ` Richard Biener
2013-03-20 15:54           ` Marc Glisse
2013-03-21  9:24             ` Richard Biener
2014-06-10  6:36   ` Marc Glisse
2014-07-26  9:53     ` Marc Glisse
2014-08-26 12:44       ` Marc Glisse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).