* [RTL, i386] Use subreg instead of UNSPEC_CAST @ 2013-03-19 15:47 Marc Glisse 2013-03-19 21:22 ` Richard Henderson 0 siblings, 1 reply; 12+ messages in thread From: Marc Glisse @ 2013-03-19 15:47 UTC (permalink / raw) To: gcc-patches [-- Attachment #1: Type: TEXT/PLAIN, Size: 574 bytes --] Hello, the following patch passes bootstrap+testsuite on x86_64-linux-gnu. I don't see any particular reason to forbid vector subregs of vectors, since we can already do it through a scalar. And not using unspecs helps avoid unnecessary copies. 2013-01-03 Marc Glisse <marc.glisse@inria.fr> PR target/50829 gcc/ * config/i386/sse.md (enum unspec): Remove UNSPEC_CAST. (avx_<castmode><avxsizesuffix>_<castmode>): Use subreg. * emit-rtl.c (validate_subreg): Allow vector-vector subregs. gcc/testsuite/ * gcc.target/i386/pr50829.c: New file. -- Marc Glisse [-- Attachment #2: Type: TEXT/PLAIN, Size: 3620 bytes --] Index: gcc/testsuite/gcc.target/i386/pr50829.c =================================================================== --- gcc/testsuite/gcc.target/i386/pr50829.c (revision 0) +++ gcc/testsuite/gcc.target/i386/pr50829.c (revision 0) @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O1 -mavx" } */ + +#include <x86intrin.h> + +__m256d +concat (__m128d x) +{ + __m256d z = _mm256_castpd128_pd256 (x); + return _mm256_insertf128_pd (z, x, 1); +} + +/* { dg-final { scan-assembler-not "vmov" } } */ Property changes on: gcc/testsuite/gcc.target/i386/pr50829.c ___________________________________________________________________ Added: svn:keywords + Author Date Id Revision URL Added: svn:eol-style + native Index: gcc/config/i386/sse.md =================================================================== --- gcc/config/i386/sse.md (revision 196633) +++ gcc/config/i386/sse.md (working copy) @@ -66,21 +66,20 @@ UNSPEC_AESKEYGENASSIST ;; For PCLMUL support UNSPEC_PCLMUL ;; For AVX support UNSPEC_PCMP UNSPEC_VPERMIL UNSPEC_VPERMIL2 UNSPEC_VPERMIL2F128 - UNSPEC_CAST UNSPEC_VTESTP UNSPEC_VCVTPH2PS UNSPEC_VCVTPS2PH ;; For AVX2 support UNSPEC_VPERMVAR UNSPEC_VPERMTI UNSPEC_GATHER UNSPEC_VSIBADDR ]) @@ -11089,23 +11088,22 @@ "TARGET_AVX" "v<sseintprefix>maskmov<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sselog1") (set_attr "prefix_extra" "1") (set_attr "prefix" "vex") (set_attr "btver2_decode" "vector") (set_attr "mode" "<sseinsnmode>")]) (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>" [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m") - (unspec:AVX256MODE2P - [(match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x")] - UNSPEC_CAST))] + (subreg:AVX256MODE2P + (match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x") 0))] "TARGET_AVX" "#" "&& reload_completed" [(const_int 0)] { rtx op0 = operands[0]; rtx op1 = operands[1]; if (REG_P (op0)) op0 = gen_rtx_REG (<ssehalfvecmode>mode, REGNO (op0)); else Index: gcc/emit-rtl.c =================================================================== --- gcc/emit-rtl.c (revision 196633) +++ gcc/emit-rtl.c (working copy) @@ -707,20 +707,23 @@ validate_subreg (enum machine_mode omode else if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode)) && GET_MODE_INNER (imode) == omode) ; /* ??? x86 sse code makes heavy use of *paradoxical* vector subregs, i.e. (subreg:V4SF (reg:SF) 0). This surely isn't the cleanest way to represent this. It's questionable if this ought to be represented at all -- why can't this all be hidden in post-reload splitters that make arbitrarily mode changes to the registers themselves. */ else if (VECTOR_MODE_P (omode) && GET_MODE_INNER (omode) == imode) ; + else if (VECTOR_MODE_P (omode) && VECTOR_MODE_P (imode) + && GET_MODE_INNER (omode) == GET_MODE_INNER (imode)) + ; /* Subregs involving floating point modes are not allowed to change size. Therefore (subreg:DI (reg:DF) 0) is fine, but (subreg:SI (reg:DF) 0) isn't. */ else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode)) { if (! (isize == osize /* LRA can use subreg to store a floating point value in an integer mode. Although the floating point and the integer modes need the same number of hard registers, the size of floating point mode can be less than the ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RTL, i386] Use subreg instead of UNSPEC_CAST 2013-03-19 15:47 [RTL, i386] Use subreg instead of UNSPEC_CAST Marc Glisse @ 2013-03-19 21:22 ` Richard Henderson 2013-03-20 15:00 ` Marc Glisse 2014-06-10 6:36 ` Marc Glisse 0 siblings, 2 replies; 12+ messages in thread From: Richard Henderson @ 2013-03-19 21:22 UTC (permalink / raw) To: Marc Glisse; +Cc: gcc-patches On 03/19/2013 08:47 AM, Marc Glisse wrote: > (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>" > [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m") > - (unspec:AVX256MODE2P > - [(match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x")] > - UNSPEC_CAST))] > + (subreg:AVX256MODE2P > + (match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x") 0))] > "TARGET_AVX" > "#" > "&& reload_completed" > [(const_int 0)] I'm not fond of this, primarily because I believe the pattern should not exist at all. One of the following is true: (1) reload needs working around (thus all the reload_completed nonsense) or (2) the entire pattern is useless and would be subsumed by mov<mode> or (3) the entire pattern is useless and is *already* subsumed by mov<mode>, since mov is earlier in the md file, making this pattern dead code. r~ ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RTL, i386] Use subreg instead of UNSPEC_CAST 2013-03-19 21:22 ` Richard Henderson @ 2013-03-20 15:00 ` Marc Glisse 2013-03-20 15:13 ` Richard Henderson 2014-06-10 6:36 ` Marc Glisse 1 sibling, 1 reply; 12+ messages in thread From: Marc Glisse @ 2013-03-20 15:00 UTC (permalink / raw) To: Richard Henderson; +Cc: gcc-patches On Tue, 19 Mar 2013, Richard Henderson wrote: > On 03/19/2013 08:47 AM, Marc Glisse wrote: >> (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>" >> [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m") >> - (unspec:AVX256MODE2P >> - [(match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x")] >> - UNSPEC_CAST))] >> + (subreg:AVX256MODE2P >> + (match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x") 0))] >> "TARGET_AVX" >> "#" >> "&& reload_completed" >> [(const_int 0)] > > I'm not fond of this, primarily because I believe the pattern should > not exist at all. Sure, removing it would be even better. > One of the following is true: > > (1) reload needs working around (thus all the reload_completed nonsense) > or > (2) the entire pattern is useless and would be subsumed by mov<mode> > or > (3) the entire pattern is useless and is *already* subsumed by > mov<mode>, since mov is earlier in the md file, making this > pattern dead code. We need something to expand _mm256_castpd128_pd256 to. I tried making it a define_expand (with the subreg pattern, and keeping the {} part intact), but that gives check_rtl errors in lra. I then tried to remove the REG_P condition and use simplify_gen_subreg or gen_lowpart, but the first one gives unrecognizable insn at -O0 (same as removing the {} part completely) (it seems happier at -O1), while the second ICEs (gen_lowpart_common returns 0) for any -Ox except -O0. As must be obvious from this paragraph, I just tried a few random bad ideas... and when none worked I posted the minimal patch that worked. Do you at least agree that vector-vector subregs make sense, or is that part wrong as well? -- Marc Glisse ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RTL, i386] Use subreg instead of UNSPEC_CAST 2013-03-20 15:00 ` Marc Glisse @ 2013-03-20 15:13 ` Richard Henderson 2013-03-20 15:17 ` Richard Biener 2013-03-20 15:29 ` Marc Glisse 0 siblings, 2 replies; 12+ messages in thread From: Richard Henderson @ 2013-03-20 15:13 UTC (permalink / raw) To: Marc Glisse; +Cc: gcc-patches On 03/20/2013 08:00 AM, Marc Glisse wrote: > Do you at least agree that vector-vector subregs make sense, or is that part > wrong as well? You mean a V4SImode subreg of a V8SImode register, not just same-size casting? It makes logical sense, but I'm fairly sure you'll need a lot more surgery throughout the compiler to make that happen. I'm curious how a define_expand can fail in LRA, but your define_insn succeeds? Is the failure because of ix86_cannot_change_mode_class? Because that hook fairly well defines what subregs are valid. And if that says it isn't valid, then even having a define_insn that uses such is wrong. r~ ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RTL, i386] Use subreg instead of UNSPEC_CAST 2013-03-20 15:13 ` Richard Henderson @ 2013-03-20 15:17 ` Richard Biener 2013-03-20 15:29 ` Marc Glisse 1 sibling, 0 replies; 12+ messages in thread From: Richard Biener @ 2013-03-20 15:17 UTC (permalink / raw) To: Richard Henderson; +Cc: Marc Glisse, gcc-patches On Wed, Mar 20, 2013 at 4:13 PM, Richard Henderson <rth@redhat.com> wrote: > On 03/20/2013 08:00 AM, Marc Glisse wrote: >> Do you at least agree that vector-vector subregs make sense, or is that part >> wrong as well? > > You mean a V4SImode subreg of a V8SImode register, not just same-size casting? > It makes logical sense, but I'm fairly sure you'll need a lot more surgery > throughout the compiler to make that happen. > > I'm curious how a define_expand can fail in LRA, but your define_insn succeeds? > Is the failure because of ix86_cannot_change_mode_class? Because that hook > fairly well defines what subregs are valid. And if that says it isn't valid, > then even having a define_insn that uses such is wrong. Don't we have vec_select to get a V4SImode out of a V8SImode? So you only need a define_insn that special-cases the subreg-like ones? Richard. > > r~ ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RTL, i386] Use subreg instead of UNSPEC_CAST 2013-03-20 15:13 ` Richard Henderson 2013-03-20 15:17 ` Richard Biener @ 2013-03-20 15:29 ` Marc Glisse 2013-03-20 15:44 ` Richard Biener 1 sibling, 1 reply; 12+ messages in thread From: Marc Glisse @ 2013-03-20 15:29 UTC (permalink / raw) To: Richard Henderson; +Cc: gcc-patches On Wed, 20 Mar 2013, Richard Henderson wrote: > On 03/20/2013 08:00 AM, Marc Glisse wrote: >> Do you at least agree that vector-vector subregs make sense, or is that part >> wrong as well? > > You mean a V4SImode subreg of a V8SImode register, not just same-size casting? I am mostly interested in the reverse, a paradoxical subreg, since vec_select can only model one direction (and only rvalues, but that's a different question). > It makes logical sense, but I'm fairly sure you'll need a lot more surgery > throughout the compiler to make that happen. > > I'm curious how a define_expand can fail in LRA, but your define_insn succeeds? Total guesswork: I think it is related to that REG_P protected code, and the reload_complete test. With the define_insn_and_split, we keep the insn until after reload and only do the subreg magic then. With a define_expand, we end up writing to reg 60 as a V2DF and reading it as a V4DF, and since it isn't a hard register, that causes a problem. > Is the failure because of ix86_cannot_change_mode_class? Because that hook > fairly well defines what subregs are valid. And if that says it isn't valid, > then even having a define_insn that uses such is wrong. A quick look at ix86_cannot_change_mode_class seems to indicate that it does not mind such paradoxical subregs. -- Marc Glisse ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RTL, i386] Use subreg instead of UNSPEC_CAST 2013-03-20 15:29 ` Marc Glisse @ 2013-03-20 15:44 ` Richard Biener 2013-03-20 15:54 ` Marc Glisse 0 siblings, 1 reply; 12+ messages in thread From: Richard Biener @ 2013-03-20 15:44 UTC (permalink / raw) To: Marc Glisse; +Cc: Richard Henderson, gcc-patches On Wed, Mar 20, 2013 at 4:29 PM, Marc Glisse <marc.glisse@inria.fr> wrote: > On Wed, 20 Mar 2013, Richard Henderson wrote: > >> On 03/20/2013 08:00 AM, Marc Glisse wrote: >>> >>> Do you at least agree that vector-vector subregs make sense, or is that >>> part >>> wrong as well? >> >> >> You mean a V4SImode subreg of a V8SImode register, not just same-size >> casting? > > > I am mostly interested in the reverse, a paradoxical subreg, since > vec_select can only model one direction (and only rvalues, but that's a > different question). vec_duplicate? Honestly, what semantics should _mm256_castpd128_pd256 have if it is supposed to cast a v2df to a v4df? Or what use? Richard. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RTL, i386] Use subreg instead of UNSPEC_CAST 2013-03-20 15:44 ` Richard Biener @ 2013-03-20 15:54 ` Marc Glisse 2013-03-21 9:24 ` Richard Biener 0 siblings, 1 reply; 12+ messages in thread From: Marc Glisse @ 2013-03-20 15:54 UTC (permalink / raw) To: Richard Biener; +Cc: Richard Henderson, gcc-patches On Wed, 20 Mar 2013, Richard Biener wrote: > On Wed, Mar 20, 2013 at 4:29 PM, Marc Glisse <marc.glisse@inria.fr> wrote: >> On Wed, 20 Mar 2013, Richard Henderson wrote: >> >>> On 03/20/2013 08:00 AM, Marc Glisse wrote: >>>> >>>> Do you at least agree that vector-vector subregs make sense, or is that >>>> part >>>> wrong as well? >>> >>> >>> You mean a V4SImode subreg of a V8SImode register, not just same-size >>> casting? >> >> >> I am mostly interested in the reverse, a paradoxical subreg, since >> vec_select can only model one direction (and only rvalues, but that's a >> different question). > > vec_duplicate? There is already some of that in various places, and there may be even more vec_merge+vec_duplicate patterns soon, but you want to make sure you don't actually do the duplication. > Honestly, what semantics should _mm256_castpd128_pd256 have if > it is supposed to cast a v2df to a v4df? NOP. We don't care what is in the high part of the vector. > Or what use? Many vector operations are defined as taking 2 vectors and merging them somehow. I didn't check if this case works, but for instance if you want to copy a V2DF to the bottom part of a V4DF using Intel's intrinsics, you will probably have to cast the V2DF to a V4DF and then use an intrinsic that takes 2 V4DF. (there are many issues with those intrinsics, but we don't control them) -- Marc Glisse ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RTL, i386] Use subreg instead of UNSPEC_CAST 2013-03-20 15:54 ` Marc Glisse @ 2013-03-21 9:24 ` Richard Biener 0 siblings, 0 replies; 12+ messages in thread From: Richard Biener @ 2013-03-21 9:24 UTC (permalink / raw) To: Marc Glisse; +Cc: Richard Henderson, gcc-patches On Wed, Mar 20, 2013 at 4:54 PM, Marc Glisse <marc.glisse@inria.fr> wrote: > On Wed, 20 Mar 2013, Richard Biener wrote: > >> On Wed, Mar 20, 2013 at 4:29 PM, Marc Glisse <marc.glisse@inria.fr> wrote: >>> >>> On Wed, 20 Mar 2013, Richard Henderson wrote: >>> >>>> On 03/20/2013 08:00 AM, Marc Glisse wrote: >>>>> >>>>> >>>>> Do you at least agree that vector-vector subregs make sense, or is that >>>>> part >>>>> wrong as well? >>>> >>>> >>>> >>>> You mean a V4SImode subreg of a V8SImode register, not just same-size >>>> casting? >>> >>> >>> >>> I am mostly interested in the reverse, a paradoxical subreg, since >>> vec_select can only model one direction (and only rvalues, but that's a >>> different question). >> >> >> vec_duplicate? > > > There is already some of that in various places, and there may be even more > vec_merge+vec_duplicate patterns soon, but you want to make sure you don't > actually do the duplication. > > >> Honestly, what semantics should _mm256_castpd128_pd256 have if >> it is supposed to cast a v2df to a v4df? > > > NOP. We don't care what is in the high part of the vector. > >> Or what use? > > > Many vector operations are defined as taking 2 vectors and merging them > somehow. I didn't check if this case works, but for instance if you want to > copy a V2DF to the bottom part of a V4DF using Intel's intrinsics, you will > probably have to cast the V2DF to a V4DF and then use an intrinsic that > takes 2 V4DF. (there are many issues with those intrinsics, but we don't > control them) Hmm, I see. I still think that we should expose most of the intrinsics and builtins implementation details earlier, at the GIMPLE level. This one would be an awkward one there, too. You'd need sth like v4df_3 = CONSTRUCTOR { v2df_2, v2df_1(D) }; thus, make that "uninitialized" explicit by using a default def. I think we don't support generating the above from C/C++ source with GNU extensions as vector type casts are quite restricted at the moment, so there you'd have to write sth like double uninit; v4df res = { v2dfv[0], v2dfv[1], uninit, uninit }; which would get you D.1723 = BIT_FIELD_REF <x, 64, 0>; D.1724 = BIT_FIELD_REF <x, 64, 64>; D.1725 = {D.1723, D.1724, uninit, uninit}; at the moment. And of course awkward code in the end ;) Which leaves the other option of folding the __builtin_ia32_ps256_ps in the target (and most other builtins). Just side-tracking from the RTL issue of course ... Richard. > -- > Marc Glisse ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RTL, i386] Use subreg instead of UNSPEC_CAST 2013-03-19 21:22 ` Richard Henderson 2013-03-20 15:00 ` Marc Glisse @ 2014-06-10 6:36 ` Marc Glisse 2014-07-26 9:53 ` Marc Glisse 1 sibling, 1 reply; 12+ messages in thread From: Marc Glisse @ 2014-06-10 6:36 UTC (permalink / raw) To: Richard Henderson; +Cc: gcc-patches [-- Attachment #1: Type: TEXT/PLAIN, Size: 1239 bytes --] On Tue, 19 Mar 2013, Richard Henderson wrote: > I'm not fond of this, primarily because I believe the pattern should > not exist at all. One year later, new try. Tweaking the pattern, I ended up with a copy of the mov pattern (the subreg is generated automatically when the modes don't match), so I just removed it. I know the comment in emit-rtl.c says splitters are a better way forward than subregs, but I haven't managed with splitters while the subreg patch is very simple :-) I added a -O0 testcase because when I was experimenting I had many versions that worked for -O2 but ICEd at -O0 (and vice versa), but it might be redundant with some other tests. Bootstrap+testsuite on x86_64-linux-gnu. 2014-06-10 Marc Glisse <marc.glisse@inria.fr> PR target/50829 gcc/ * config/i386/sse.md (enum unspec): Remove UNSPEC_CAST. (avx_<castmode><avxsizesuffix>_<castmode>): Remove. * config/i386/i386.c (builtin_description) [__builtin_ia32_si256_si, __builtin_ia32_ps256_ps, __builtin_ia32_pd256_pd]: Replace the removed insn with mov. * emit-rtl.c (validate_subreg): Allow vector-vector subregs. gcc/testsuite/ * gcc.target/i386/pr50829-1.c: New file. * gcc.target/i386/pr50829-2.c: New file. -- Marc Glisse [-- Attachment #2: Type: TEXT/PLAIN, Size: 7531 bytes --] Index: gcc/config/i386/i386.c =================================================================== --- gcc/config/i386/i386.c (revision 211397) +++ gcc/config/i386/i386.c (working copy) @@ -29793,23 +29793,23 @@ static const struct builtin_description { OPTION_MASK_ISA_AVX, CODE_FOR_avx_roundps_sfix256, "__builtin_ia32_ceilps_sfix256", IX86_BUILTIN_CEILPS_SFIX256, (enum rtx_code) ROUND_CEIL, (int) V8SI_FTYPE_V8SF_ROUND }, { OPTION_MASK_ISA_AVX, CODE_FOR_roundv8sf2, "__builtin_ia32_roundps_az256", IX86_BUILTIN_ROUNDPS_AZ256, UNKNOWN, (int) V8SF_FTYPE_V8SF }, { OPTION_MASK_ISA_AVX, CODE_FOR_roundv8sf2_sfix, "__builtin_ia32_roundps_az_sfix256", IX86_BUILTIN_ROUNDPS_AZ_SFIX256, UNKNOWN, (int) V8SI_FTYPE_V8SF }, { OPTION_MASK_ISA_AVX, CODE_FOR_avx_unpckhpd256, "__builtin_ia32_unpckhpd256", IX86_BUILTIN_UNPCKHPD256, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF }, { OPTION_MASK_ISA_AVX, CODE_FOR_avx_unpcklpd256, "__builtin_ia32_unpcklpd256", IX86_BUILTIN_UNPCKLPD256, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF }, { OPTION_MASK_ISA_AVX, CODE_FOR_avx_unpckhps256, "__builtin_ia32_unpckhps256", IX86_BUILTIN_UNPCKHPS256, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF }, { OPTION_MASK_ISA_AVX, CODE_FOR_avx_unpcklps256, "__builtin_ia32_unpcklps256", IX86_BUILTIN_UNPCKLPS256, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF }, - { OPTION_MASK_ISA_AVX, CODE_FOR_avx_si256_si, "__builtin_ia32_si256_si", IX86_BUILTIN_SI256_SI, UNKNOWN, (int) V8SI_FTYPE_V4SI }, - { OPTION_MASK_ISA_AVX, CODE_FOR_avx_ps256_ps, "__builtin_ia32_ps256_ps", IX86_BUILTIN_PS256_PS, UNKNOWN, (int) V8SF_FTYPE_V4SF }, - { OPTION_MASK_ISA_AVX, CODE_FOR_avx_pd256_pd, "__builtin_ia32_pd256_pd", IX86_BUILTIN_PD256_PD, UNKNOWN, (int) V4DF_FTYPE_V2DF }, + { OPTION_MASK_ISA_AVX, CODE_FOR_movv8si, "__builtin_ia32_si256_si", IX86_BUILTIN_SI256_SI, UNKNOWN, (int) V8SI_FTYPE_V4SI }, + { OPTION_MASK_ISA_AVX, CODE_FOR_movv8sf, "__builtin_ia32_ps256_ps", IX86_BUILTIN_PS256_PS, UNKNOWN, (int) V8SF_FTYPE_V4SF }, + { OPTION_MASK_ISA_AVX, CODE_FOR_movv4df, "__builtin_ia32_pd256_pd", IX86_BUILTIN_PD256_PD, UNKNOWN, (int) V4DF_FTYPE_V2DF }, { OPTION_MASK_ISA_AVX, CODE_FOR_vec_extract_lo_v8si, "__builtin_ia32_si_si256", IX86_BUILTIN_SI_SI256, UNKNOWN, (int) V4SI_FTYPE_V8SI }, { OPTION_MASK_ISA_AVX, CODE_FOR_vec_extract_lo_v8sf, "__builtin_ia32_ps_ps256", IX86_BUILTIN_PS_PS256, UNKNOWN, (int) V4SF_FTYPE_V8SF }, { OPTION_MASK_ISA_AVX, CODE_FOR_vec_extract_lo_v4df, "__builtin_ia32_pd_pd256", IX86_BUILTIN_PD_PD256, UNKNOWN, (int) V2DF_FTYPE_V4DF }, { OPTION_MASK_ISA_AVX, CODE_FOR_avx_vtestpd, "__builtin_ia32_vtestzpd", IX86_BUILTIN_VTESTZPD, EQ, (int) INT_FTYPE_V2DF_V2DF_PTEST }, { OPTION_MASK_ISA_AVX, CODE_FOR_avx_vtestpd, "__builtin_ia32_vtestcpd", IX86_BUILTIN_VTESTCPD, LTU, (int) INT_FTYPE_V2DF_V2DF_PTEST }, { OPTION_MASK_ISA_AVX, CODE_FOR_avx_vtestpd, "__builtin_ia32_vtestnzcpd", IX86_BUILTIN_VTESTNZCPD, GTU, (int) INT_FTYPE_V2DF_V2DF_PTEST }, { OPTION_MASK_ISA_AVX, CODE_FOR_avx_vtestps, "__builtin_ia32_vtestzps", IX86_BUILTIN_VTESTZPS, EQ, (int) INT_FTYPE_V4SF_V4SF_PTEST }, { OPTION_MASK_ISA_AVX, CODE_FOR_avx_vtestps, "__builtin_ia32_vtestcps", IX86_BUILTIN_VTESTCPS, LTU, (int) INT_FTYPE_V4SF_V4SF_PTEST }, { OPTION_MASK_ISA_AVX, CODE_FOR_avx_vtestps, "__builtin_ia32_vtestnzcps", IX86_BUILTIN_VTESTNZCPS, GTU, (int) INT_FTYPE_V4SF_V4SF_PTEST }, Index: gcc/config/i386/sse.md =================================================================== --- gcc/config/i386/sse.md (revision 211397) +++ gcc/config/i386/sse.md (working copy) @@ -66,21 +66,20 @@ UNSPEC_AESKEYGENASSIST ;; For PCLMUL support UNSPEC_PCLMUL ;; For AVX support UNSPEC_PCMP UNSPEC_VPERMIL UNSPEC_VPERMIL2 UNSPEC_VPERMIL2F128 - UNSPEC_CAST UNSPEC_VTESTP UNSPEC_VCVTPH2PS UNSPEC_VCVTPS2PH ;; For AVX2 support UNSPEC_VPERMVAR UNSPEC_VPERMTI UNSPEC_GATHER UNSPEC_VSIBADDR @@ -14816,40 +14815,20 @@ (define_expand "maskstore<mode>" [(set (match_operand:V48_AVX2 0 "memory_operand") (unspec:V48_AVX2 [(match_operand:<sseintvecmode> 2 "register_operand") (match_operand:V48_AVX2 1 "register_operand") (match_dup 0)] UNSPEC_MASKMOV))] "TARGET_AVX") -(define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>" - [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m") - (unspec:AVX256MODE2P - [(match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "xm,x")] - UNSPEC_CAST))] - "TARGET_AVX" - "#" - "&& reload_completed" - [(const_int 0)] -{ - rtx op0 = operands[0]; - rtx op1 = operands[1]; - if (REG_P (op0)) - op0 = gen_rtx_REG (<ssehalfvecmode>mode, REGNO (op0)); - else - op1 = gen_rtx_REG (<MODE>mode, REGNO (op1)); - emit_move_insn (op0, op1); - DONE; -}) - (define_expand "vec_init<mode>" [(match_operand:V_256 0 "register_operand") (match_operand 1)] "TARGET_AVX" { ix86_expand_vector_init (false, operands[0], operands[1]); DONE; }) (define_expand "vec_init<mode>" Index: gcc/emit-rtl.c =================================================================== --- gcc/emit-rtl.c (revision 211397) +++ gcc/emit-rtl.c (working copy) @@ -775,20 +775,23 @@ validate_subreg (enum machine_mode omode else if ((COMPLEX_MODE_P (imode) || VECTOR_MODE_P (imode)) && GET_MODE_INNER (imode) == omode) ; /* ??? x86 sse code makes heavy use of *paradoxical* vector subregs, i.e. (subreg:V4SF (reg:SF) 0). This surely isn't the cleanest way to represent this. It's questionable if this ought to be represented at all -- why can't this all be hidden in post-reload splitters that make arbitrarily mode changes to the registers themselves. */ else if (VECTOR_MODE_P (omode) && GET_MODE_INNER (omode) == imode) ; + else if (VECTOR_MODE_P (omode) && VECTOR_MODE_P (imode) + && GET_MODE_INNER (omode) == GET_MODE_INNER (imode)) + ; /* Subregs involving floating point modes are not allowed to change size. Therefore (subreg:DI (reg:DF) 0) is fine, but (subreg:SI (reg:DF) 0) isn't. */ else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode)) { if (! (isize == osize /* LRA can use subreg to store a floating point value in an integer mode. Although the floating point and the integer modes need the same number of hard registers, the size of floating point mode can be less than the Index: gcc/testsuite/gcc.target/i386/pr50829-1.c =================================================================== --- gcc/testsuite/gcc.target/i386/pr50829-1.c (revision 0) +++ gcc/testsuite/gcc.target/i386/pr50829-1.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx" } */ + +#include <x86intrin.h> + +__m256d +concat (__m128d x) +{ + __m256d z = _mm256_castpd128_pd256 (x); + return _mm256_insertf128_pd (z, x, 1); +} + +/* { dg-final { scan-assembler-not "vmov" } } */ Index: gcc/testsuite/gcc.target/i386/pr50829-2.c =================================================================== --- gcc/testsuite/gcc.target/i386/pr50829-2.c (revision 0) +++ gcc/testsuite/gcc.target/i386/pr50829-2.c (working copy) @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O0 -mavx" } */ + +#include <x86intrin.h> + +__m256d +concat (__m128d x) +{ + __m256d z = _mm256_castpd128_pd256 (x); + return _mm256_insertf128_pd (z, x, 1); +} ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RTL, i386] Use subreg instead of UNSPEC_CAST 2014-06-10 6:36 ` Marc Glisse @ 2014-07-26 9:53 ` Marc Glisse 2014-08-26 12:44 ` Marc Glisse 0 siblings, 1 reply; 12+ messages in thread From: Marc Glisse @ 2014-07-26 9:53 UTC (permalink / raw) To: gcc-patches; +Cc: Richard Henderson Hello, any comment on this patch? https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00769.html On Tue, 10 Jun 2014, Marc Glisse wrote: > On Tue, 19 Mar 2013, Richard Henderson wrote: > >> I'm not fond of this, primarily because I believe the pattern should >> not exist at all. > > One year later, new try. Tweaking the pattern, I ended up with a copy of the > mov pattern (the subreg is generated automatically when the modes don't > match), so I just removed it. I know the comment in emit-rtl.c says splitters > are a better way forward than subregs, but I haven't managed with splitters > while the subreg patch is very simple :-) I added a -O0 testcase because when > I was experimenting I had many versions that worked for -O2 but ICEd at -O0 > (and vice versa), but it might be redundant with some other tests. > > Bootstrap+testsuite on x86_64-linux-gnu. > > 2014-06-10 Marc Glisse <marc.glisse@inria.fr> > > PR target/50829 > gcc/ > * config/i386/sse.md (enum unspec): Remove UNSPEC_CAST. > (avx_<castmode><avxsizesuffix>_<castmode>): Remove. > * config/i386/i386.c (builtin_description) [__builtin_ia32_si256_si, > __builtin_ia32_ps256_ps, __builtin_ia32_pd256_pd]: Replace the > removed insn with mov. > * emit-rtl.c (validate_subreg): Allow vector-vector subregs. > > gcc/testsuite/ > * gcc.target/i386/pr50829-1.c: New file. > * gcc.target/i386/pr50829-2.c: New file. -- Marc Glisse ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RTL, i386] Use subreg instead of UNSPEC_CAST 2014-07-26 9:53 ` Marc Glisse @ 2014-08-26 12:44 ` Marc Glisse 0 siblings, 0 replies; 12+ messages in thread From: Marc Glisse @ 2014-08-26 12:44 UTC (permalink / raw) To: gcc-patches; +Cc: Richard Henderson Ping? On Sat, 26 Jul 2014, Marc Glisse wrote: > Hello, > > any comment on this patch? > https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00769.html > > On Tue, 10 Jun 2014, Marc Glisse wrote: > >> On Tue, 19 Mar 2013, Richard Henderson wrote: >> >>> I'm not fond of this, primarily because I believe the pattern should >>> not exist at all. >> >> One year later, new try. Tweaking the pattern, I ended up with a copy of >> the mov pattern (the subreg is generated automatically when the modes don't >> match), so I just removed it. I know the comment in emit-rtl.c says >> splitters are a better way forward than subregs, but I haven't managed with >> splitters while the subreg patch is very simple :-) I added a -O0 testcase >> because when I was experimenting I had many versions that worked for -O2 >> but ICEd at -O0 (and vice versa), but it might be redundant with some other >> tests. >> >> Bootstrap+testsuite on x86_64-linux-gnu. >> >> 2014-06-10 Marc Glisse <marc.glisse@inria.fr> >> >> PR target/50829 >> gcc/ >> * config/i386/sse.md (enum unspec): Remove UNSPEC_CAST. >> (avx_<castmode><avxsizesuffix>_<castmode>): Remove. >> * config/i386/i386.c (builtin_description) [__builtin_ia32_si256_si, >> __builtin_ia32_ps256_ps, __builtin_ia32_pd256_pd]: Replace the >> removed insn with mov. >> * emit-rtl.c (validate_subreg): Allow vector-vector subregs. >> >> gcc/testsuite/ >> * gcc.target/i386/pr50829-1.c: New file. >> * gcc.target/i386/pr50829-2.c: New file. -- Marc Glisse ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2014-08-26 12:44 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-03-19 15:47 [RTL, i386] Use subreg instead of UNSPEC_CAST Marc Glisse 2013-03-19 21:22 ` Richard Henderson 2013-03-20 15:00 ` Marc Glisse 2013-03-20 15:13 ` Richard Henderson 2013-03-20 15:17 ` Richard Biener 2013-03-20 15:29 ` Marc Glisse 2013-03-20 15:44 ` Richard Biener 2013-03-20 15:54 ` Marc Glisse 2013-03-21 9:24 ` Richard Biener 2014-06-10 6:36 ` Marc Glisse 2014-07-26 9:53 ` Marc Glisse 2014-08-26 12:44 ` Marc Glisse
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).