* [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts @ 2017-07-27 23:21 Michael Meissner 2017-07-28 7:51 ` Richard Biener 2017-07-28 21:09 ` Segher Boessenkool 0 siblings, 2 replies; 16+ messages in thread From: Michael Meissner @ 2017-07-27 23:21 UTC (permalink / raw) To: GCC Patches, Segher Boessenkool, David Edelsohn, Bill Schmidt [-- Attachment #1: Type: text/plain, Size: 2105 bytes --] This patches optimizes the PowerPC vector set operation for 64-bit doubles and longs where the elements in the vector set may have been extracted from another vector (PR target/81593): Here an an example: vector double test_vpasted (vector double high, vector double low) { vector double res; res[1] = high[1]; res[0] = low[0]; return res; } Previously it would generate: xxpermdi 12,34,34,2 vspltisw 2,0 xxlor 0,35,35 xxpermdi 34,34,12,0 xxpermdi 34,0,34,1 and with these patches, it now generates: xxpermdi 34,35,34,1 I have tested it on a little endian power8 system and a big endian power7 system with the usual bootstrap and make checks with no regressions. Can I check this into the trunk? I also built Spec 2006 with the compiler, and saw no changes in the code generated. This isn't surprising because it isn't something that auto vectorization might generate by default. [gcc] 2017-07-27 Michael Meissner <meissner@linux.vnet.ibm.com> PR target/81593 * config/rs6000/rs6000-protos.h (rs6000_emit_xxpermdi): New declaration. * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to emit XXPERMDI accessing either double word in either vector register inputs. * config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D iterator): Rewrite VEC_CONCAT insn to call rs6000_emit_xxpermdi. Simplify the constraints with the removal of the -mupper-regs-* switches. (vsx_concat_<mode>_1): New combiner insns to optimize CONCATs where either register might have come from VEC_SELECT. (vsx_concat_<mode>_2): Likewise. (vsx_concat_<mode>_3): Likewise. (vsx_set_<mode>, VSX_D iterator): Rewrite insn to generate a VEC_CONCAT rather than use an UNSPEC to specify the option. [gcc/testsuite] 2017-07-27 Michael Meissner <meissner@linux.vnet.ibm.com> PR target/81593 * gcc.target/powerpc/vsx-extract-6.c: New test. * gcc.target/powerpc/vsx-extract-7.c: Likewise. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797 [-- Attachment #2: pr81593.patch01b --] [-- Type: text/plain, Size: 8484 bytes --] Index: gcc/config/rs6000/rs6000-protos.h =================================================================== --- gcc/config/rs6000/rs6000-protos.h (svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000/rs6000-protos.h) (revision 250577) +++ gcc/config/rs6000/rs6000-protos.h (.../gcc/config/rs6000/rs6000-protos.h) (working copy) @@ -233,6 +233,7 @@ extern void rs6000_asm_output_dwarf_pcre const char *label); extern void rs6000_asm_output_dwarf_datarel (FILE *file, int size, const char *label); +extern const char *rs6000_emit_xxpermdi (rtx[], rtx, rtx); /* Declare functions in rs6000-c.c */ Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000/rs6000.c) (revision 250577) +++ gcc/config/rs6000/rs6000.c (.../gcc/config/rs6000/rs6000.c) (working copy) @@ -39167,6 +39167,38 @@ rs6000_optab_supported_p (int op, machin return true; } } + +\f +/* Emit a XXPERMDI instruction that can extract from either double word of the + two arguments. ELEMENT1 and ELEMENT2 are either NULL or they are 0/1 giving + which double word to be used for the operand. */ + +const char * +rs6000_emit_xxpermdi (rtx operands[], rtx element1, rtx element2) +{ + int op1_dword = (!element1) ? 0 : INTVAL (element1); + int op2_dword = (!element2) ? 0 : INTVAL (element2); + + gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1)); + + if (BYTES_BIG_ENDIAN) + { + operands[3] = GEN_INT (2*op1_dword + op2_dword); + return "xxpermdi %x0,%x1,%x2,%3"; + } + else + { + if (element1) + op1_dword = 1 - op1_dword; + + if (element2) + op2_dword = 1 - op2_dword; + + operands[3] = GEN_INT (op1_dword + 2*op2_dword); + return "xxpermdi %x0,%x2,%x1,%3"; + } +} + \f struct gcc_target targetm = TARGET_INITIALIZER; Index: gcc/config/rs6000/vsx.md =================================================================== --- gcc/config/rs6000/vsx.md (svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000/vsx.md) (revision 250577) +++ gcc/config/rs6000/vsx.md (.../gcc/config/rs6000/vsx.md) (working copy) @@ -2366,19 +2366,17 @@ (define_insn "*vsx_float_fix_v2df2" ;; Build a V2DF/V2DI vector from two scalars (define_insn "vsx_concat_<mode>" - [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we") + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we") (vec_concat:VSX_D - (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b") - (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))] + (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b") + (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))] "VECTOR_MEM_VSX_P (<MODE>mode)" { if (which_alternative == 0) - return (BYTES_BIG_ENDIAN - ? "xxpermdi %x0,%x1,%x2,0" - : "xxpermdi %x0,%x2,%x1,0"); + return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX); else if (which_alternative == 1) - return (BYTES_BIG_ENDIAN + return (VECTOR_ELT_ORDER_BIG ? "mtvsrdd %x0,%1,%2" : "mtvsrdd %x0,%2,%1"); @@ -2387,6 +2385,47 @@ (define_insn "vsx_concat_<mode>" } [(set_attr "type" "vecperm")]) +;; Combiner patterns to allow creating XXPERMDI's to access either double +;; register in a vector register. Note, rs6000_emit_xxpermdi expects +;; operands[0..2] to be the vector registers. +(define_insn "*vsx_concat_<mode>_1" + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") + (vec_concat:VSX_D + (vec_select:<VS_scalar> + (match_operand:VSX_D 1 "gpc_reg_operand" "wa") + (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")])) + (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa")))] + "VECTOR_MEM_VSX_P (<MODE>mode)" +{ + return rs6000_emit_xxpermdi (operands, operands[3], NULL_RTX); +}) + +(define_insn "*vsx_concat_<mode>_2" + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") + (vec_concat:VSX_D + (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa") + (vec_select:<VS_scalar> + (match_operand:VSX_D 2 "gpc_reg_operand" "wa") + (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")]))))] + "VECTOR_MEM_VSX_P (<MODE>mode)" +{ + return rs6000_emit_xxpermdi (operands, NULL_RTX, operands[3]); +}) + +(define_insn "*vsx_concat_<mode>_3" + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") + (vec_concat:VSX_D + (vec_select:<VS_scalar> + (match_operand:VSX_D 1 "gpc_reg_operand" "wa") + (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")])) + (vec_select:<VS_scalar> + (match_operand:VSX_D 2 "gpc_reg_operand" "wa") + (parallel [(match_operand:QI 4 "const_0_to_1_operand" "n")]))))] + "VECTOR_MEM_VSX_P (<MODE>mode)" +{ + return rs6000_emit_xxpermdi (operands, operands[3], operands[4]); +}) + ;; Special purpose concat using xxpermdi to glue two single precision values ;; together, relying on the fact that internally scalar floats are represented ;; as doubles. This is used to initialize a V4SF vector with 4 floats @@ -2587,25 +2626,35 @@ (define_expand "vsx_set_v1ti" DONE; }) -;; Set the element of a V2DI/VD2F mode -(define_insn "vsx_set_<mode>" - [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wd,?<VSa>") - (unspec:VSX_D - [(match_operand:VSX_D 1 "vsx_register_operand" "wd,<VSa>") - (match_operand:<VS_scalar> 2 "vsx_register_operand" "<VS_64reg>,<VSa>") - (match_operand:QI 3 "u5bit_cint_operand" "i,i")] - UNSPEC_VSX_SET))] +;; Rewrite V2DF/V2DI set in terms of VEC_CONCAT +(define_expand "vsx_set_<mode>" + [(use (match_operand:VSX_D 0 "vsx_register_operand")) + (use (match_operand:VSX_D 1 "vsx_register_operand")) + (use (match_operand:<VS_scalar> 2 "gpc_reg_operand")) + (use (match_operand:QI 3 "const_0_to_1_operand"))] "VECTOR_MEM_VSX_P (<MODE>mode)" { - int idx_first = BYTES_BIG_ENDIAN ? 0 : 1; - if (INTVAL (operands[3]) == idx_first) - return \"xxpermdi %x0,%x2,%x1,1\"; - else if (INTVAL (operands[3]) == 1 - idx_first) - return \"xxpermdi %x0,%x1,%x2,0\"; + rtx dest = operands[0]; + rtx vec_reg = operands[1]; + rtx value = operands[2]; + rtx ele = operands[3]; + rtx tmp = gen_reg_rtx (<VS_scalar>mode); + + if (ele == const0_rtx) + { + emit_insn (gen_vsx_extract_<mode> (tmp, vec_reg, const1_rtx)); + emit_insn (gen_vsx_concat_<mode> (dest, value, tmp)); + DONE; + } + else if (ele == const1_rtx) + { + emit_insn (gen_vsx_extract_<mode> (tmp, vec_reg, const0_rtx)); + emit_insn (gen_vsx_concat_<mode> (dest, tmp, value)); + DONE; + } else gcc_unreachable (); -} - [(set_attr "type" "vecperm")]) +}) ;; Extract a DF/DI element from V2DF/V2DI ;; Optimize cases were we can do a simple or direct move. Index: gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c (svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c) (revision 0) +++ gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c (.../gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c) (revision 250640) @@ -0,0 +1,15 @@ +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +vector unsigned long +test_vpasted (vector unsigned long high, vector unsigned long low) +{ + vector unsigned long res; + res[1] = high[1]; + res[0] = low[0]; + return res; +} + +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */ Index: gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c (svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c) (revision 0) +++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c (.../gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c) (revision 250640) @@ -0,0 +1,15 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +vector double +test_vpasted (vector double high, vector double low) +{ + vector double res; + res[1] = high[1]; + res[0] = low[0]; + return res; +} + +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts 2017-07-27 23:21 [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts Michael Meissner @ 2017-07-28 7:51 ` Richard Biener 2017-07-28 8:02 ` Andrew Pinski 2017-07-28 14:44 ` Michael Meissner 2017-07-28 21:09 ` Segher Boessenkool 1 sibling, 2 replies; 16+ messages in thread From: Richard Biener @ 2017-07-28 7:51 UTC (permalink / raw) To: Michael Meissner, GCC Patches, Segher Boessenkool, David Edelsohn, Bill Schmidt On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner <meissner@linux.vnet.ibm.com> wrote: > This patches optimizes the PowerPC vector set operation for 64-bit doubles and > longs where the elements in the vector set may have been extracted from another > vector (PR target/81593): > > Here an an example: > > vector double > test_vpasted (vector double high, vector double low) > { > vector double res; > res[1] = high[1]; > res[0] = low[0]; > return res; > } Interesting. We expand from <bb 2> [100.00%] [count: INV]: _1 = BIT_FIELD_REF <high_4(D), 64, 64>; res_6 = BIT_INSERT_EXPR <res_5(D), _1, 64 (64 bits)>; _2 = BIT_FIELD_REF <low_7(D), 64, 0>; res_8 = BIT_INSERT_EXPR <res_6, _2, 0 (64 bits)>; return res_8; but ideally we'd pattern-match that to a VEC_PERM_EXPR. The bswap pass looks like the canonical pass for this even though it's quite awkward to fill this in. So a match.pd rule would work as well here - your ppc backend patterns are v2df specific, right? > Previously it would generate: > > xxpermdi 12,34,34,2 > vspltisw 2,0 > xxlor 0,35,35 > xxpermdi 34,34,12,0 > xxpermdi 34,0,34,1 > > and with these patches, it now generates: > > xxpermdi 34,35,34,1 > > I have tested it on a little endian power8 system and a big endian power7 > system with the usual bootstrap and make checks with no regressions. Can I > check this into the trunk? > > I also built Spec 2006 with the compiler, and saw no changes in the code > generated. This isn't surprising because it isn't something that auto > vectorization might generate by default. > > [gcc] > 2017-07-27 Michael Meissner <meissner@linux.vnet.ibm.com> > > PR target/81593 > * config/rs6000/rs6000-protos.h (rs6000_emit_xxpermdi): New > declaration. > * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to > emit XXPERMDI accessing either double word in either vector > register inputs. > * config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D iterator): > Rewrite VEC_CONCAT insn to call rs6000_emit_xxpermdi. Simplify > the constraints with the removal of the -mupper-regs-* switches. > (vsx_concat_<mode>_1): New combiner insns to optimize CONCATs > where either register might have come from VEC_SELECT. > (vsx_concat_<mode>_2): Likewise. > (vsx_concat_<mode>_3): Likewise. > (vsx_set_<mode>, VSX_D iterator): Rewrite insn to generate a > VEC_CONCAT rather than use an UNSPEC to specify the option. > > [gcc/testsuite] > 2017-07-27 Michael Meissner <meissner@linux.vnet.ibm.com> > > PR target/81593 > * gcc.target/powerpc/vsx-extract-6.c: New test. > * gcc.target/powerpc/vsx-extract-7.c: Likewise. > > -- > Michael Meissner, IBM > IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA > email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts 2017-07-28 7:51 ` Richard Biener @ 2017-07-28 8:02 ` Andrew Pinski 2017-07-28 8:20 ` Richard Biener 2017-07-28 8:42 ` Marc Glisse 2017-07-28 14:44 ` Michael Meissner 1 sibling, 2 replies; 16+ messages in thread From: Andrew Pinski @ 2017-07-28 8:02 UTC (permalink / raw) To: Richard Biener Cc: Michael Meissner, GCC Patches, Segher Boessenkool, David Edelsohn, Bill Schmidt On Fri, Jul 28, 2017 at 12:51 AM, Richard Biener <richard.guenther@gmail.com> wrote: > On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner > <meissner@linux.vnet.ibm.com> wrote: >> This patches optimizes the PowerPC vector set operation for 64-bit doubles and >> longs where the elements in the vector set may have been extracted from another >> vector (PR target/81593): >> >> Here an an example: >> >> vector double >> test_vpasted (vector double high, vector double low) >> { >> vector double res; >> res[1] = high[1]; >> res[0] = low[0]; >> return res; >> } > > Interesting. We expand from > > <bb 2> [100.00%] [count: INV]: > _1 = BIT_FIELD_REF <high_4(D), 64, 64>; > res_6 = BIT_INSERT_EXPR <res_5(D), _1, 64 (64 bits)>; > _2 = BIT_FIELD_REF <low_7(D), 64, 0>; > res_8 = BIT_INSERT_EXPR <res_6, _2, 0 (64 bits)>; > return res_8; > > but ideally we'd pattern-match that to a VEC_PERM_EXPR. The bswap > pass looks like the canonical pass for this even though it's quite awkward > to fill this in. I was thinking about this exactly. Though for the scale use of BIT_INSERT_EXPR/BIT_FIELD_REF. I have a case where someone writes (this shows up in GCC too): a->b = c->b; a->d = c->d; a->e = c->e; a->f = c->f; a->g = c->g; a->h = c->h; Where b,d,e,f,g,h are adjacent bit-fields after I lowered the bit-fields I have: _1 = BIT_FIELD_REF <a.1_3, 2, 0>; _8 = BIT_INSERT_EXPR <c.1_4, _1, 0 (2 bits)>; _2 = BIT_FIELD_REF <a.1_3, 4, 2>; _9 = BIT_INSERT_EXPR <_8, _2, 2 (4 bits)>; .... For the vector case, can't we write it as: _1 = BIT_FIELD_REF <high_4(D), 64, 64>; _2 = BIT_FIELD_REF <low_7(D), 64, 0>; res_8 = {_1, _2}; And then have some match.pd patterns (which might get complex), to rewrite that into VEC_PERM_EXPR? The reason why I ask that is because say someone who wrote: vector double test_vpasted (vector double high, vector double low) { vector double res = { high[1], low[0] }; return res; } Thanks, Andrew Pinski > > So a match.pd rule would work as well here - your ppc backend patterns > are v2df specific, right? > >> Previously it would generate: >> >> xxpermdi 12,34,34,2 >> vspltisw 2,0 >> xxlor 0,35,35 >> xxpermdi 34,34,12,0 >> xxpermdi 34,0,34,1 >> >> and with these patches, it now generates: >> >> xxpermdi 34,35,34,1 >> >> I have tested it on a little endian power8 system and a big endian power7 >> system with the usual bootstrap and make checks with no regressions. Can I >> check this into the trunk? >> >> I also built Spec 2006 with the compiler, and saw no changes in the code >> generated. This isn't surprising because it isn't something that auto >> vectorization might generate by default. >> >> [gcc] >> 2017-07-27 Michael Meissner <meissner@linux.vnet.ibm.com> >> >> PR target/81593 >> * config/rs6000/rs6000-protos.h (rs6000_emit_xxpermdi): New >> declaration. >> * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to >> emit XXPERMDI accessing either double word in either vector >> register inputs. >> * config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D iterator): >> Rewrite VEC_CONCAT insn to call rs6000_emit_xxpermdi. Simplify >> the constraints with the removal of the -mupper-regs-* switches. >> (vsx_concat_<mode>_1): New combiner insns to optimize CONCATs >> where either register might have come from VEC_SELECT. >> (vsx_concat_<mode>_2): Likewise. >> (vsx_concat_<mode>_3): Likewise. >> (vsx_set_<mode>, VSX_D iterator): Rewrite insn to generate a >> VEC_CONCAT rather than use an UNSPEC to specify the option. >> >> [gcc/testsuite] >> 2017-07-27 Michael Meissner <meissner@linux.vnet.ibm.com> >> >> PR target/81593 >> * gcc.target/powerpc/vsx-extract-6.c: New test. >> * gcc.target/powerpc/vsx-extract-7.c: Likewise. >> >> -- >> Michael Meissner, IBM >> IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA >> email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts 2017-07-28 8:02 ` Andrew Pinski @ 2017-07-28 8:20 ` Richard Biener 2017-07-28 8:42 ` Marc Glisse 1 sibling, 0 replies; 16+ messages in thread From: Richard Biener @ 2017-07-28 8:20 UTC (permalink / raw) To: Andrew Pinski Cc: Michael Meissner, GCC Patches, Segher Boessenkool, David Edelsohn, Bill Schmidt On Fri, Jul 28, 2017 at 10:02 AM, Andrew Pinski <pinskia@gmail.com> wrote: > On Fri, Jul 28, 2017 at 12:51 AM, Richard Biener > <richard.guenther@gmail.com> wrote: >> On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner >> <meissner@linux.vnet.ibm.com> wrote: >>> This patches optimizes the PowerPC vector set operation for 64-bit doubles and >>> longs where the elements in the vector set may have been extracted from another >>> vector (PR target/81593): >>> >>> Here an an example: >>> >>> vector double >>> test_vpasted (vector double high, vector double low) >>> { >>> vector double res; >>> res[1] = high[1]; >>> res[0] = low[0]; >>> return res; >>> } >> >> Interesting. We expand from >> >> <bb 2> [100.00%] [count: INV]: >> _1 = BIT_FIELD_REF <high_4(D), 64, 64>; >> res_6 = BIT_INSERT_EXPR <res_5(D), _1, 64 (64 bits)>; >> _2 = BIT_FIELD_REF <low_7(D), 64, 0>; >> res_8 = BIT_INSERT_EXPR <res_6, _2, 0 (64 bits)>; >> return res_8; >> >> but ideally we'd pattern-match that to a VEC_PERM_EXPR. The bswap >> pass looks like the canonical pass for this even though it's quite awkward >> to fill this in. > > I was thinking about this exactly. Though for the scale use of > BIT_INSERT_EXPR/BIT_FIELD_REF. > I have a case where someone writes (this shows up in GCC too): > a->b = c->b; > a->d = c->d; > a->e = c->e; > a->f = c->f; > a->g = c->g; > a->h = c->h; > > Where b,d,e,f,g,h are adjacent bit-fields after I lowered the bit-fields I have: > _1 = BIT_FIELD_REF <a.1_3, 2, 0>; > _8 = BIT_INSERT_EXPR <c.1_4, _1, 0 (2 bits)>; > _2 = BIT_FIELD_REF <a.1_3, 4, 2>; > _9 = BIT_INSERT_EXPR <_8, _2, 2 (4 bits)>; > .... > > For the vector case, can't we write it as: > _1 = BIT_FIELD_REF <high_4(D), 64, 64>; > _2 = BIT_FIELD_REF <low_7(D), 64, 0>; > res_8 = {_1, _2}; > > And then have some match.pd patterns (which might get complex), to > rewrite that into VEC_PERM_EXPR? > The reason why I ask that is because say someone who wrote: > vector double > test_vpasted (vector double high, vector double low) > { > vector double res = { high[1], low[0] }; > return res; > } I still believe a proper pass is better than match.pd patterns (which are awkward when dealing with "variable operand number" cases). I believe in the end we want to "unify" SRA, bswap and store-merging at least. Analyze memory/component accesses, their flow and then pattern-match the result. bswap is good with the flow stuff but its memory/component access analysis is too ad-hoc. "unify" in the sense of using common infrastructure. Richard. > > Thanks, > Andrew Pinski > >> >> So a match.pd rule would work as well here - your ppc backend patterns >> are v2df specific, right? >> >>> Previously it would generate: >>> >>> xxpermdi 12,34,34,2 >>> vspltisw 2,0 >>> xxlor 0,35,35 >>> xxpermdi 34,34,12,0 >>> xxpermdi 34,0,34,1 >>> >>> and with these patches, it now generates: >>> >>> xxpermdi 34,35,34,1 >>> >>> I have tested it on a little endian power8 system and a big endian power7 >>> system with the usual bootstrap and make checks with no regressions. Can I >>> check this into the trunk? >>> >>> I also built Spec 2006 with the compiler, and saw no changes in the code >>> generated. This isn't surprising because it isn't something that auto >>> vectorization might generate by default. >>> >>> [gcc] >>> 2017-07-27 Michael Meissner <meissner@linux.vnet.ibm.com> >>> >>> PR target/81593 >>> * config/rs6000/rs6000-protos.h (rs6000_emit_xxpermdi): New >>> declaration. >>> * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to >>> emit XXPERMDI accessing either double word in either vector >>> register inputs. >>> * config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D iterator): >>> Rewrite VEC_CONCAT insn to call rs6000_emit_xxpermdi. Simplify >>> the constraints with the removal of the -mupper-regs-* switches. >>> (vsx_concat_<mode>_1): New combiner insns to optimize CONCATs >>> where either register might have come from VEC_SELECT. >>> (vsx_concat_<mode>_2): Likewise. >>> (vsx_concat_<mode>_3): Likewise. >>> (vsx_set_<mode>, VSX_D iterator): Rewrite insn to generate a >>> VEC_CONCAT rather than use an UNSPEC to specify the option. >>> >>> [gcc/testsuite] >>> 2017-07-27 Michael Meissner <meissner@linux.vnet.ibm.com> >>> >>> PR target/81593 >>> * gcc.target/powerpc/vsx-extract-6.c: New test. >>> * gcc.target/powerpc/vsx-extract-7.c: Likewise. >>> >>> -- >>> Michael Meissner, IBM >>> IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA >>> email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts 2017-07-28 8:02 ` Andrew Pinski 2017-07-28 8:20 ` Richard Biener @ 2017-07-28 8:42 ` Marc Glisse 1 sibling, 0 replies; 16+ messages in thread From: Marc Glisse @ 2017-07-28 8:42 UTC (permalink / raw) To: Andrew Pinski Cc: Richard Biener, Michael Meissner, GCC Patches, Segher Boessenkool, David Edelsohn, Bill Schmidt On Fri, 28 Jul 2017, Andrew Pinski wrote: > For the vector case, can't we write it as: > _1 = BIT_FIELD_REF <high_4(D), 64, 64>; > _2 = BIT_FIELD_REF <low_7(D), 64, 0>; > res_8 = {_1, _2}; > > And then have some match.pd patterns (which might get complex), to > rewrite that into VEC_PERM_EXPR? For this last part, we have simplify_vector_constructor in tree-ssa-forwprop.c, which currently only recognizes VEC_PERM_EXPR of a single vector, but I guess it could be extended to 2 vectors. Not as good as a bswap revamp (which will be needed anyway at some point), but less work. -- Marc Glisse ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts 2017-07-28 7:51 ` Richard Biener 2017-07-28 8:02 ` Andrew Pinski @ 2017-07-28 14:44 ` Michael Meissner 1 sibling, 0 replies; 16+ messages in thread From: Michael Meissner @ 2017-07-28 14:44 UTC (permalink / raw) To: Richard Biener Cc: Michael Meissner, GCC Patches, Segher Boessenkool, David Edelsohn, Bill Schmidt On Fri, Jul 28, 2017 at 09:51:30AM +0200, Richard Biener wrote: > On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner > <meissner@linux.vnet.ibm.com> wrote: > > This patches optimizes the PowerPC vector set operation for 64-bit doubles and > > longs where the elements in the vector set may have been extracted from another > > vector (PR target/81593): > > > > Here an an example: > > > > vector double > > test_vpasted (vector double high, vector double low) > > { > > vector double res; > > res[1] = high[1]; > > res[0] = low[0]; > > return res; > > } > > Interesting. We expand from > > <bb 2> [100.00%] [count: INV]: > _1 = BIT_FIELD_REF <high_4(D), 64, 64>; > res_6 = BIT_INSERT_EXPR <res_5(D), _1, 64 (64 bits)>; > _2 = BIT_FIELD_REF <low_7(D), 64, 0>; > res_8 = BIT_INSERT_EXPR <res_6, _2, 0 (64 bits)>; > return res_8; > > but ideally we'd pattern-match that to a VEC_PERM_EXPR. The bswap > pass looks like the canonical pass for this even though it's quite awkward > to fill this in. > > So a match.pd rule would work as well here - your ppc backend patterns > are v2df specific, right? Both V2DF and V2DI. While it would be great to have a machine independent optimization, my patches would also work for PowerPC specific built-ins for vector extract and vector insert. Also my patches replaces an UNSPEC to create the vector with VEC_CONCAT. Thus work going on in for machine independent support should not preclude this patch from being accepted in the PowerPC backend. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts 2017-07-27 23:21 [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts Michael Meissner 2017-07-28 7:51 ` Richard Biener @ 2017-07-28 21:09 ` Segher Boessenkool 2017-07-30 14:01 ` Bill Schmidt ` (2 more replies) 1 sibling, 3 replies; 16+ messages in thread From: Segher Boessenkool @ 2017-07-28 21:09 UTC (permalink / raw) To: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt Hi! On Thu, Jul 27, 2017 at 07:21:14PM -0400, Michael Meissner wrote: > This patches optimizes the PowerPC vector set operation for 64-bit doubles and > longs where the elements in the vector set may have been extracted from another > vector (PR target/81593): > * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to > emit XXPERMDI accessing either double word in either vector > register inputs. "emit" is not a good name for this: that is generally used for something that does emit_insn, i.e. put an insn in the instruction stream. This function returns a string a define_insn can return. For the rl* insns I called the similar functions rs6000_insn_for_*, maybe something like that is better here? > +/* Emit a XXPERMDI instruction that can extract from either double word of the > + two arguments. ELEMENT1 and ELEMENT2 are either NULL or they are 0/1 giving > + which double word to be used for the operand. */ > + > +const char * > +rs6000_emit_xxpermdi (rtx operands[], rtx element1, rtx element2) > +{ > + int op1_dword = (!element1) ? 0 : INTVAL (element1); > + int op2_dword = (!element2) ? 0 : INTVAL (element2); > + > + gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1)); > + > + if (BYTES_BIG_ENDIAN) > + { > + operands[3] = GEN_INT (2*op1_dword + op2_dword); > + return "xxpermdi %x0,%x1,%x2,%3"; > + } > + else > + { > + if (element1) > + op1_dword = 1 - op1_dword; > + > + if (element2) > + op2_dword = 1 - op2_dword; > + > + operands[3] = GEN_INT (op1_dword + 2*op2_dword); > + return "xxpermdi %x0,%x2,%x1,%3"; > + } > +} I think calling this with the rtx elementN args makes this only more complicated (the function comment doesn't say what they are or what NULL means, btw). > (define_insn "vsx_concat_<mode>" > - [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we") > + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we") > (vec_concat:VSX_D > - (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b") > - (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))] > + (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b") > + (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))] > "VECTOR_MEM_VSX_P (<MODE>mode)" > { > if (which_alternative == 0) > - return (BYTES_BIG_ENDIAN > - ? "xxpermdi %x0,%x1,%x2,0" > - : "xxpermdi %x0,%x2,%x1,0"); > + return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX); > > else if (which_alternative == 1) > - return (BYTES_BIG_ENDIAN > + return (VECTOR_ELT_ORDER_BIG > ? "mtvsrdd %x0,%1,%2" > : "mtvsrdd %x0,%2,%1"); This one could be { if (!BYTES_BIG_ENDIAN) std::swap (operands[1], operands[2]); switch (which_alternative) { case 0: return "xxpermdi %x0,%x1,%x2,0"; case 1: return "mtvsrdd %x0,%1,%2"; default: gcc_unreachable (); } } (Could/should we use xxmrghd instead? Do all supported assemblers know that extended mnemonic, is it actually more readable?) > --- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c (svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c) (revision 0) > +++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c (.../gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c) (revision 250640) > @@ -0,0 +1,15 @@ > +/* { dg-do compile { target { powerpc*-*-* } } } */ > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ > +/* { dg-require-effective-target powerpc_vsx_ok } */ > +/* { dg-options "-O2 -mvsx" } */ > + > +vector double > +test_vpasted (vector double high, vector double low) > +{ > + vector double res; > + res[1] = high[1]; > + res[0] = low[0]; > + return res; > +} > + > +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */ In this and the other testcase, should you test no other insns at all are generated? Segher ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts 2017-07-28 21:09 ` Segher Boessenkool @ 2017-07-30 14:01 ` Bill Schmidt 2017-07-31 17:41 ` Michael Meissner 2017-07-31 17:38 ` Michael Meissner 2017-08-02 14:29 ` Michael Meissner 2 siblings, 1 reply; 16+ messages in thread From: Bill Schmidt @ 2017-07-30 14:01 UTC (permalink / raw) To: Segher Boessenkool; +Cc: Michael Meissner, GCC Patches, David Edelsohn > On Jul 28, 2017, at 4:08 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote: > > Hi! > > On Thu, Jul 27, 2017 at 07:21:14PM -0400, Michael Meissner wrote: >> This patches optimizes the PowerPC vector set operation for 64-bit doubles and >> longs where the elements in the vector set may have been extracted from another >> vector (PR target/81593): > >> * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to >> emit XXPERMDI accessing either double word in either vector >> register inputs. > > "emit" is not a good name for this: that is generally used for something > that does emit_insn, i.e. put an insn in the instruction stream. This > function returns a string a define_insn can return. For the rl* insns > I called the similar functions rs6000_insn_for_*, maybe something like > that is better here? > >> +/* Emit a XXPERMDI instruction that can extract from either double word of the >> + two arguments. ELEMENT1 and ELEMENT2 are either NULL or they are 0/1 giving >> + which double word to be used for the operand. */ >> + >> +const char * >> +rs6000_emit_xxpermdi (rtx operands[], rtx element1, rtx element2) >> +{ >> + int op1_dword = (!element1) ? 0 : INTVAL (element1); >> + int op2_dword = (!element2) ? 0 : INTVAL (element2); >> + >> + gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1)); >> + >> + if (BYTES_BIG_ENDIAN) >> + { >> + operands[3] = GEN_INT (2*op1_dword + op2_dword); >> + return "xxpermdi %x0,%x1,%x2,%3"; >> + } >> + else >> + { >> + if (element1) >> + op1_dword = 1 - op1_dword; >> + >> + if (element2) >> + op2_dword = 1 - op2_dword; >> + >> + operands[3] = GEN_INT (op1_dword + 2*op2_dword); >> + return "xxpermdi %x0,%x2,%x1,%3"; >> + } >> +} > > I think calling this with the rtx elementN args makes this only more > complicated (the function comment doesn't say what they are or what > NULL means, btw). > >> (define_insn "vsx_concat_<mode>" >> - [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we") >> + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we") >> (vec_concat:VSX_D >> - (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b") >> - (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))] >> + (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b") >> + (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))] >> "VECTOR_MEM_VSX_P (<MODE>mode)" >> { >> if (which_alternative == 0) >> - return (BYTES_BIG_ENDIAN >> - ? "xxpermdi %x0,%x1,%x2,0" >> - : "xxpermdi %x0,%x2,%x1,0"); >> + return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX); >> >> else if (which_alternative == 1) >> - return (BYTES_BIG_ENDIAN >> + return (VECTOR_ELT_ORDER_BIG >> ? "mtvsrdd %x0,%1,%2" >> : "mtvsrdd %x0,%2,%1"); > > This one could be > > { > if (!BYTES_BIG_ENDIAN) !VECTOR_ELT_ORDER_BIG (to accommodate -maltivec=be). (We have some general bitrot associated with -maltivec=be that needs to be addressed, or we need to give up on it altogether. Still of two minds about this.) Bill > std::swap (operands[1], operands[2]); > > switch (which_alternative) > { > case 0: > return "xxpermdi %x0,%x1,%x2,0"; > case 1: > return "mtvsrdd %x0,%1,%2"; > default: > gcc_unreachable (); > } > } > > (Could/should we use xxmrghd instead? Do all supported assemblers know > that extended mnemonic, is it actually more readable?) > >> --- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c (svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c) (revision 0) >> +++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c (.../gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c) (revision 250640) >> @@ -0,0 +1,15 @@ >> +/* { dg-do compile { target { powerpc*-*-* } } } */ >> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ >> +/* { dg-require-effective-target powerpc_vsx_ok } */ >> +/* { dg-options "-O2 -mvsx" } */ >> + >> +vector double >> +test_vpasted (vector double high, vector double low) >> +{ >> + vector double res; >> + res[1] = high[1]; >> + res[0] = low[0]; >> + return res; >> +} >> + >> +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */ > > In this and the other testcase, should you test no other insns at all > are generated? > > > Segher > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts 2017-07-30 14:01 ` Bill Schmidt @ 2017-07-31 17:41 ` Michael Meissner 2017-07-31 17:42 ` Bill Schmidt 0 siblings, 1 reply; 16+ messages in thread From: Michael Meissner @ 2017-07-31 17:41 UTC (permalink / raw) To: Bill Schmidt Cc: Segher Boessenkool, Michael Meissner, GCC Patches, David Edelsohn On Sun, Jul 30, 2017 at 09:00:58AM -0500, Bill Schmidt wrote: > >> (define_insn "vsx_concat_<mode>" > >> - [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we") > >> + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we") > >> (vec_concat:VSX_D > >> - (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b") > >> - (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))] > >> + (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b") > >> + (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))] > >> "VECTOR_MEM_VSX_P (<MODE>mode)" > >> { > >> if (which_alternative == 0) > >> - return (BYTES_BIG_ENDIAN > >> - ? "xxpermdi %x0,%x1,%x2,0" > >> - : "xxpermdi %x0,%x2,%x1,0"); > >> + return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX); > >> > >> else if (which_alternative == 1) > >> - return (BYTES_BIG_ENDIAN > >> + return (VECTOR_ELT_ORDER_BIG > >> ? "mtvsrdd %x0,%1,%2" > >> : "mtvsrdd %x0,%2,%1"); > > > > This one could be > > > > { > > if (!BYTES_BIG_ENDIAN) > > !VECTOR_ELT_ORDER_BIG (to accommodate -maltivec=be). (We have some general bitrot associated with -maltivec=be that needs to be addressed, or we need to give up on it altogether. Still of two minds about this.) > > Bill In this case, I believe I tested -maltivec=be, and BYTES_BIG_ENDIAN is correct (I originally had it using VECTOR_ELT_ORDER_BIG and got failures). But I need to look at it again. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts 2017-07-31 17:41 ` Michael Meissner @ 2017-07-31 17:42 ` Bill Schmidt 0 siblings, 0 replies; 16+ messages in thread From: Bill Schmidt @ 2017-07-31 17:42 UTC (permalink / raw) To: Michael Meissner; +Cc: Segher Boessenkool, GCC Patches, David Edelsohn > On Jul 31, 2017, at 12:40 PM, Michael Meissner <meissner@linux.vnet.ibm.com> wrote: > > On Sun, Jul 30, 2017 at 09:00:58AM -0500, Bill Schmidt wrote: >>>> (define_insn "vsx_concat_<mode>" >>>> - [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we") >>>> + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we") >>>> (vec_concat:VSX_D >>>> - (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b") >>>> - (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))] >>>> + (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b") >>>> + (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))] >>>> "VECTOR_MEM_VSX_P (<MODE>mode)" >>>> { >>>> if (which_alternative == 0) >>>> - return (BYTES_BIG_ENDIAN >>>> - ? "xxpermdi %x0,%x1,%x2,0" >>>> - : "xxpermdi %x0,%x2,%x1,0"); >>>> + return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX); >>>> >>>> else if (which_alternative == 1) >>>> - return (BYTES_BIG_ENDIAN >>>> + return (VECTOR_ELT_ORDER_BIG >>>> ? "mtvsrdd %x0,%1,%2" >>>> : "mtvsrdd %x0,%2,%1"); >>> >>> This one could be >>> >>> { >>> if (!BYTES_BIG_ENDIAN) >> >> !VECTOR_ELT_ORDER_BIG (to accommodate -maltivec=be). (We have some general bitrot associated with -maltivec=be that needs to be addressed, or we need to give up on it altogether. Still of two minds about this.) >> >> Bill > > In this case, I believe I tested -maltivec=be, and BYTES_BIG_ENDIAN is correct > (I originally had it using VECTOR_ELT_ORDER_BIG and got failures). But I need > to look at it again. Hi Mike, You misunderstand me, I think you had it right (you did move to VECTOR_ELT_ORDER_BIG here) but I just wanted to clarify that Segher's suggestion would also need to use VECTOR_ELT_ORDER_BIG. Thanks, Bill > > -- > Michael Meissner, IBM > IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA > email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts 2017-07-28 21:09 ` Segher Boessenkool 2017-07-30 14:01 ` Bill Schmidt @ 2017-07-31 17:38 ` Michael Meissner 2017-08-02 14:29 ` Michael Meissner 2 siblings, 0 replies; 16+ messages in thread From: Michael Meissner @ 2017-07-31 17:38 UTC (permalink / raw) To: Segher Boessenkool Cc: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote: > Hi! > > On Thu, Jul 27, 2017 at 07:21:14PM -0400, Michael Meissner wrote: > > This patches optimizes the PowerPC vector set operation for 64-bit doubles and > > longs where the elements in the vector set may have been extracted from another > > vector (PR target/81593): > > > * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to > > emit XXPERMDI accessing either double word in either vector > > register inputs. > > "emit" is not a good name for this: that is generally used for something > that does emit_insn, i.e. put an insn in the instruction stream. This > function returns a string a define_insn can return. For the rl* insns > I called the similar functions rs6000_insn_for_*, maybe something like > that is better here? Yeah, I should have used rs6000_output_xxpermdi or similar (or output_xxpermdi, etc.), which is what the other functions used. > > +/* Emit a XXPERMDI instruction that can extract from either double word of the > > + two arguments. ELEMENT1 and ELEMENT2 are either NULL or they are 0/1 giving > > + which double word to be used for the operand. */ > > + > > +const char * > > +rs6000_emit_xxpermdi (rtx operands[], rtx element1, rtx element2) > > +{ > > + int op1_dword = (!element1) ? 0 : INTVAL (element1); > > + int op2_dword = (!element2) ? 0 : INTVAL (element2); > > + > > + gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1)); > > + > > + if (BYTES_BIG_ENDIAN) > > + { > > + operands[3] = GEN_INT (2*op1_dword + op2_dword); > > + return "xxpermdi %x0,%x1,%x2,%3"; > > + } > > + else > > + { > > + if (element1) > > + op1_dword = 1 - op1_dword; > > + > > + if (element2) > > + op2_dword = 1 - op2_dword; > > + > > + operands[3] = GEN_INT (op1_dword + 2*op2_dword); > > + return "xxpermdi %x0,%x2,%x1,%3"; > > + } > > +} > > I think calling this with the rtx elementN args makes this only more > complicated (the function comment doesn't say what they are or what > NULL means, btw). Ok, let me think on it. > > > (define_insn "vsx_concat_<mode>" > > - [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we") > > + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we") > > (vec_concat:VSX_D > > - (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b") > > - (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))] > > + (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b") > > + (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))] > > "VECTOR_MEM_VSX_P (<MODE>mode)" > > { > > if (which_alternative == 0) > > - return (BYTES_BIG_ENDIAN > > - ? "xxpermdi %x0,%x1,%x2,0" > > - : "xxpermdi %x0,%x2,%x1,0"); > > + return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX); > > > > else if (which_alternative == 1) > > - return (BYTES_BIG_ENDIAN > > + return (VECTOR_ELT_ORDER_BIG > > ? "mtvsrdd %x0,%1,%2" > > : "mtvsrdd %x0,%2,%1"); > > This one could be > > { > if (!BYTES_BIG_ENDIAN) > std::swap (operands[1], operands[2]); > > switch (which_alternative) > { > case 0: > return "xxpermdi %x0,%x1,%x2,0"; > case 1: > return "mtvsrdd %x0,%1,%2"; > default: > gcc_unreachable (); > } > } > (Could/should we use xxmrghd instead? Do all supported assemblers know > that extended mnemonic, is it actually more readable?) For me no, xxpermdi is clearer. But if you want xxmrghd, I can do it. > > --- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c (svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c) (revision 0) > > +++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c (.../gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c) (revision 250640) > > @@ -0,0 +1,15 @@ > > +/* { dg-do compile { target { powerpc*-*-* } } } */ > > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ > > +/* { dg-require-effective-target powerpc_vsx_ok } */ > > +/* { dg-options "-O2 -mvsx" } */ > > + > > +vector double > > +test_vpasted (vector double high, vector double low) > > +{ > > + vector double res; > > + res[1] = high[1]; > > + res[0] = low[0]; > > + return res; > > +} > > + > > +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */ > > In this and the other testcase, should you test no other insns at all > are generated? It is kind of hard to test a negative, without trying to guess what possible instructions could be generated. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts 2017-07-28 21:09 ` Segher Boessenkool 2017-07-30 14:01 ` Bill Schmidt 2017-07-31 17:38 ` Michael Meissner @ 2017-08-02 14:29 ` Michael Meissner 2017-08-03 15:01 ` Segher Boessenkool 2 siblings, 1 reply; 16+ messages in thread From: Michael Meissner @ 2017-08-02 14:29 UTC (permalink / raw) To: Segher Boessenkool Cc: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt [-- Attachment #1: Type: text/plain, Size: 1951 bytes --] On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote: > > "emit" is not a good name for this: that is generally used for something > that does emit_insn, i.e. put an insn in the instruction stream. This > function returns a string a define_insn can return. For the rl* insns > I called the similar functions rs6000_insn_for_*, maybe something like > that is better here? ... > I think calling this with the rtx elementN args makes this only more > complicated (the function comment doesn't say what they are or what > NULL means, btw). ... > In this and the other testcase, should you test no other insns at all > are generated? Here are the revised patches. I tested on a little endian power8 system and a big endian power7 system. Are these patches ok for the trunk? [gcc] 2017-08-02 Michael Meissner <meissner@linux.vnet.ibm.com> PR target/81593 * config/rs6000/rs6000-protos.h (rs6000_output_xxpermdi): New declaration. * config/rs6000/rs6000.c (rs6000_output_xxpermdi): New function to emit XXPERMDI accessing either double word in either vector register inputs. * config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D iterator): Rewrite VEC_CONCAT insn to call rs6000_output_xxpermdi. Simplify the constraints with the removal of the -mupper-regs-* switches. (vsx_concat_<mode>_1): New combiner insns to optimize CONCATs where either register might have come from VEC_SELECT. (vsx_concat_<mode>_2): Likewise. (vsx_concat_<mode>_3): Likewise. (vsx_set_<mode>, VSX_D iterator): Rewrite insn to generate a VEC_CONCAT rather than use an UNSPEC to specify the option. [gcc/testsuite] 2017-08-02 Michael Meissner <meissner@linux.vnet.ibm.com> PR target/81593 * gcc.target/powerpc/vsx-extract-6.c: New test. * gcc.target/powerpc/vsx-extract-7.c: Likewise. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797 [-- Attachment #2: pr81593.patch02b --] [-- Type: text/plain, Size: 10486 bytes --] Index: gcc/config/rs6000/rs6000-protos.h =================================================================== --- gcc/config/rs6000/rs6000-protos.h (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 250793) +++ gcc/config/rs6000/rs6000-protos.h (.../gcc/config/rs6000) (working copy) @@ -233,6 +233,7 @@ extern void rs6000_asm_output_dwarf_pcre const char *label); extern void rs6000_asm_output_dwarf_datarel (FILE *file, int size, const char *label); +extern const char *rs6000_output_xxpermdi (rtx, rtx, rtx, rtx, rtx); /* Declare functions in rs6000-c.c */ Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 250793) +++ gcc/config/rs6000/rs6000.c (.../gcc/config/rs6000) (working copy) @@ -39007,6 +39007,60 @@ rs6000_optab_supported_p (int op, machin return true; } } + +\f +/* Output a xxpermdi instruction that sets a 128-bit vector DEST combining two + inputs SRC1 and SRC2. + + If ELEMENT1 is null, use the top 64-bit double word of ARG1. If it is + non-NULL, it is a 0 or 1 constant that gives the vector element number to + use for extracting the 64-bit double word from ARG1. + + If ELEMENT2 is null, use the top 64-bit double word of ARG2. If it is + non-NULL, it is a 0 or 1 constant that gives the vector element number to + use for extracting the 64-bit double word from ARG2. + + The element number is based on the user element ordering, set by the + endianess and by the -maltivec={le,be} options. */ + +const char * +rs6000_output_xxpermdi (rtx dest, + rtx src1, + rtx src2, + rtx element1, + rtx element2) +{ + int op1_dword = (!element1) ? 0 : INTVAL (element1); + int op2_dword = (!element2) ? 0 : INTVAL (element2); + rtx xops[10]; + const char *insn_string; + + gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1)); + xops[0] = dest; + xops[1] = src1; + xops[2] = src2; + + if (BYTES_BIG_ENDIAN) + { + xops[3] = GEN_INT (2*op1_dword + op2_dword); + insn_string = "xxpermdi %x0,%x1,%x2,%3"; + } + else + { + if (element1) + op1_dword = 1 - op1_dword; + + if (element2) + op2_dword = 1 - op2_dword; + + xops[3] = GEN_INT (op1_dword + 2*op2_dword); + insn_string = "xxpermdi %x0,%x2,%x1,%3"; + } + + output_asm_insn (insn_string, xops); + return ""; +} + \f struct gcc_target targetm = TARGET_INITIALIZER; Index: gcc/config/rs6000/vsx.md =================================================================== --- gcc/config/rs6000/vsx.md (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 250793) +++ gcc/config/rs6000/vsx.md (.../gcc/config/rs6000) (working copy) @@ -2364,19 +2364,18 @@ (define_insn "*vsx_float_fix_v2df2" ;; Build a V2DF/V2DI vector from two scalars (define_insn "vsx_concat_<mode>" - [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we") + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we") (vec_concat:VSX_D - (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b") - (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))] + (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b") + (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))] "VECTOR_MEM_VSX_P (<MODE>mode)" { if (which_alternative == 0) - return (BYTES_BIG_ENDIAN - ? "xxpermdi %x0,%x1,%x2,0" - : "xxpermdi %x0,%x2,%x1,0"); + return rs6000_output_xxpermdi (operands[0], operands[1], operands[2], + NULL_RTX, NULL_RTX); else if (which_alternative == 1) - return (BYTES_BIG_ENDIAN + return (VECTOR_ELT_ORDER_BIG ? "mtvsrdd %x0,%1,%2" : "mtvsrdd %x0,%2,%1"); @@ -2385,6 +2384,50 @@ (define_insn "vsx_concat_<mode>" } [(set_attr "type" "vecperm")]) +;; Combiner patterns to allow creating XXPERMDI's to access either double +;; register in a vector register. Note, rs6000_output_xxpermdi expects +;; operands[0..2] to be the vector registers. +(define_insn "*vsx_concat_<mode>_1" + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") + (vec_concat:VSX_D + (vec_select:<VS_scalar> + (match_operand:VSX_D 1 "gpc_reg_operand" "wa") + (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")])) + (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa")))] + "VECTOR_MEM_VSX_P (<MODE>mode)" +{ + return rs6000_output_xxpermdi (operands[0], operands[1], operands[2], + operands[3], NULL_RTX); +}) + +(define_insn "*vsx_concat_<mode>_2" + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") + (vec_concat:VSX_D + (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa") + (vec_select:<VS_scalar> + (match_operand:VSX_D 2 "gpc_reg_operand" "wa") + (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")]))))] + "VECTOR_MEM_VSX_P (<MODE>mode)" +{ + return rs6000_output_xxpermdi (operands[0], operands[1], operands[2], + NULL_RTX, operands[3]); +}) + +(define_insn "*vsx_concat_<mode>_3" + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") + (vec_concat:VSX_D + (vec_select:<VS_scalar> + (match_operand:VSX_D 1 "gpc_reg_operand" "wa") + (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")])) + (vec_select:<VS_scalar> + (match_operand:VSX_D 2 "gpc_reg_operand" "wa") + (parallel [(match_operand:QI 4 "const_0_to_1_operand" "n")]))))] + "VECTOR_MEM_VSX_P (<MODE>mode)" +{ + return rs6000_output_xxpermdi (operands[0], operands[1], operands[2], + operands[3], operands[4]); +}) + ;; Special purpose concat using xxpermdi to glue two single precision values ;; together, relying on the fact that internally scalar floats are represented ;; as doubles. This is used to initialize a V4SF vector with 4 floats @@ -2585,25 +2628,35 @@ (define_expand "vsx_set_v1ti" DONE; }) -;; Set the element of a V2DI/VD2F mode -(define_insn "vsx_set_<mode>" - [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wd,?<VSa>") - (unspec:VSX_D - [(match_operand:VSX_D 1 "vsx_register_operand" "wd,<VSa>") - (match_operand:<VS_scalar> 2 "vsx_register_operand" "<VS_64reg>,<VSa>") - (match_operand:QI 3 "u5bit_cint_operand" "i,i")] - UNSPEC_VSX_SET))] +;; Rewrite V2DF/V2DI set in terms of VEC_CONCAT +(define_expand "vsx_set_<mode>" + [(use (match_operand:VSX_D 0 "vsx_register_operand")) + (use (match_operand:VSX_D 1 "vsx_register_operand")) + (use (match_operand:<VS_scalar> 2 "gpc_reg_operand")) + (use (match_operand:QI 3 "const_0_to_1_operand"))] "VECTOR_MEM_VSX_P (<MODE>mode)" { - int idx_first = BYTES_BIG_ENDIAN ? 0 : 1; - if (INTVAL (operands[3]) == idx_first) - return \"xxpermdi %x0,%x2,%x1,1\"; - else if (INTVAL (operands[3]) == 1 - idx_first) - return \"xxpermdi %x0,%x1,%x2,0\"; + rtx dest = operands[0]; + rtx vec_reg = operands[1]; + rtx value = operands[2]; + rtx ele = operands[3]; + rtx tmp = gen_reg_rtx (<VS_scalar>mode); + + if (ele == const0_rtx) + { + emit_insn (gen_vsx_extract_<mode> (tmp, vec_reg, const1_rtx)); + emit_insn (gen_vsx_concat_<mode> (dest, value, tmp)); + DONE; + } + else if (ele == const1_rtx) + { + emit_insn (gen_vsx_extract_<mode> (tmp, vec_reg, const0_rtx)); + emit_insn (gen_vsx_concat_<mode> (dest, tmp, value)); + DONE; + } else gcc_unreachable (); -} - [(set_attr "type" "vecperm")]) +}) ;; Extract a DF/DI element from V2DF/V2DI ;; Optimize cases were we can do a simple or direct move. Index: gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0) +++ gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c (.../gcc/testsuite/gcc.target/powerpc) (revision 250804) @@ -0,0 +1,25 @@ +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +vector unsigned long +test_vpasted (vector unsigned long high, vector unsigned long low) +{ + vector unsigned long res; + res[1] = high[1]; + res[0] = low[0]; + return res; +} + +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */ +/* { dg-final { scan-assembler-not {\mvspltisw\M} } } */ +/* { dg-final { scan-assembler-not {\mxxlor\M} } } */ +/* { dg-final { scan-assembler-not {\mxxlxor\M} } } */ +/* { dg-final { scan-assembler-not {\mxxspltib\M} } } */ +/* { dg-final { scan-assembler-not {\mlxvx?\M} } } */ +/* { dg-final { scan-assembler-not {\mlxv[dw][24]x\M} } } */ +/* { dg-final { scan-assembler-not {\mlvx\M} } } */ +/* { dg-final { scan-assembler-not {\mstxvx?\M} } } */ +/* { dg-final { scan-assembler-not {\mstxv[dw][24]x\M} } } */ +/* { dg-final { scan-assembler-not {\mstvx\M} } } */ Index: gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0) +++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c (.../gcc/testsuite/gcc.target/powerpc) (revision 250804) @@ -0,0 +1,25 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +vector double +test_vpasted (vector double high, vector double low) +{ + vector double res; + res[1] = high[1]; + res[0] = low[0]; + return res; +} + +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */ +/* { dg-final { scan-assembler-not {\mvspltisw\M} } } */ +/* { dg-final { scan-assembler-not {\mxxlor\M} } } */ +/* { dg-final { scan-assembler-not {\mxxlxor\M} } } */ +/* { dg-final { scan-assembler-not {\mxxspltib\M} } } */ +/* { dg-final { scan-assembler-not {\mlxvx?\M} } } */ +/* { dg-final { scan-assembler-not {\mlxv[dw][24]x\M} } } */ +/* { dg-final { scan-assembler-not {\mlvx\M} } } */ +/* { dg-final { scan-assembler-not {\mstxvx?\M} } } */ +/* { dg-final { scan-assembler-not {\mstxv[dw][24]x\M} } } */ +/* { dg-final { scan-assembler-not {\mstvx\M} } } */ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts 2017-08-02 14:29 ` Michael Meissner @ 2017-08-03 15:01 ` Segher Boessenkool 2017-08-03 17:23 ` Michael Meissner 2017-08-07 13:18 ` Michael Meissner 0 siblings, 2 replies; 16+ messages in thread From: Segher Boessenkool @ 2017-08-03 15:01 UTC (permalink / raw) To: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt Hi Mike, On Wed, Aug 02, 2017 at 10:28:55AM -0400, Michael Meissner wrote: > On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote: > > I think calling this with the rtx elementN args makes this only more > > complicated (the function comment doesn't say what they are or what > > NULL means, btw). You didn't handle the first part of this as far as I see? It's the big complicating issue here. > + If ELEMENT1 is null, use the top 64-bit double word of ARG1. If it is > + non-NULL, it is a 0 or 1 constant that gives the vector element number to > + use for extracting the 64-bit double word from ARG1. > + > + If ELEMENT2 is null, use the top 64-bit double word of ARG2. If it is > + non-NULL, it is a 0 or 1 constant that gives the vector element number to > + use for extracting the 64-bit double word from ARG2. > + > + The element number is based on the user element ordering, set by the > + endianess and by the -maltivec={le,be} options. */ ("endianness", two n's). I don't like using NULL as a magic value at all; it does not simplify this interface, it complicates it instead. Can you move the "which half is high" decision to the callers? Segher ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts 2017-08-03 15:01 ` Segher Boessenkool @ 2017-08-03 17:23 ` Michael Meissner 2017-08-07 13:18 ` Michael Meissner 1 sibling, 0 replies; 16+ messages in thread From: Michael Meissner @ 2017-08-03 17:23 UTC (permalink / raw) To: Segher Boessenkool Cc: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt On Thu, Aug 03, 2017 at 10:01:41AM -0500, Segher Boessenkool wrote: > Hi Mike, > > On Wed, Aug 02, 2017 at 10:28:55AM -0400, Michael Meissner wrote: > > On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote: > > > I think calling this with the rtx elementN args makes this only more > > > complicated (the function comment doesn't say what they are or what > > > NULL means, btw). > > You didn't handle the first part of this as far as I see? It's the > big complicating issue here. I am not sure exactly what you are asking for. This is like the other output functions that take the rtx insns. > > + If ELEMENT1 is null, use the top 64-bit double word of ARG1. If it is > > + non-NULL, it is a 0 or 1 constant that gives the vector element number to > > + use for extracting the 64-bit double word from ARG1. > > + > > + If ELEMENT2 is null, use the top 64-bit double word of ARG2. If it is > > + non-NULL, it is a 0 or 1 constant that gives the vector element number to > > + use for extracting the 64-bit double word from ARG2. > > + > > + The element number is based on the user element ordering, set by the > > + endianess and by the -maltivec={le,be} options. */ > > ("endianness", two n's). > > I don't like using NULL as a magic value at all; it does not simplify > this interface, it complicates it instead. > > Can you move the "which half is high" decision to the callers? And then essentially there is no need for the function, since each of the 4 concat variants have to have the logic to support big endian, -maltivec=be, and -maltivec=le. Let me see what I can do about it. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts 2017-08-03 15:01 ` Segher Boessenkool 2017-08-03 17:23 ` Michael Meissner @ 2017-08-07 13:18 ` Michael Meissner 2017-08-07 21:54 ` Segher Boessenkool 1 sibling, 1 reply; 16+ messages in thread From: Michael Meissner @ 2017-08-07 13:18 UTC (permalink / raw) To: Segher Boessenkool Cc: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt [-- Attachment #1: Type: text/plain, Size: 3765 bytes --] On Thu, Aug 03, 2017 at 10:01:41AM -0500, Segher Boessenkool wrote: > Hi Mike, > > On Wed, Aug 02, 2017 at 10:28:55AM -0400, Michael Meissner wrote: > > On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote: > > > I think calling this with the rtx elementN args makes this only more > > > complicated (the function comment doesn't say what they are or what > > > NULL means, btw). > > You didn't handle the first part of this as far as I see? It's the > big complicating issue here. > > > + If ELEMENT1 is null, use the top 64-bit double word of ARG1. If it is > > + non-NULL, it is a 0 or 1 constant that gives the vector element number to > > + use for extracting the 64-bit double word from ARG1. > > + > > + If ELEMENT2 is null, use the top 64-bit double word of ARG2. If it is > > + non-NULL, it is a 0 or 1 constant that gives the vector element number to > > + use for extracting the 64-bit double word from ARG2. > > + > > + The element number is based on the user element ordering, set by the > > + endianess and by the -maltivec={le,be} options. */ > > ("endianness", two n's). > > I don't like using NULL as a magic value at all; it does not simplify > this interface, it complicates it instead. > > Can you move the "which half is high" decision to the callers? I rewrote the patch to eliminate the rs6000_output_xxpermdi function, and do the calculation of the XXPERMDI mask in each of the vsx_concat_<mask>_{1,2,3} insns. Just to be sure I got things correct, I wrote a new executable test that tests various methods of creating/inserting 2 element vectors with double word elements, and tested in BE, LE -maltivec=be, and LE, and the results match previous compilers. I have done bootstrap/build checks on a big endian power7, a little endian power8 system, and I have done a non-bootstrap/check on a power9 prototype (I have script issues that prevents a bootstrap build on power9 that I need to look into). There are no regressions in the tests and the new tests were run on each of the systems. Can I check this into the trunk? I would also like to backport it to all open branches (particularly GCC 7, but GCC 6 if possible). Note, the patch will need a slight tweak on the older systems due to GCC 7 still supporting -mupper-regs-{df,di} and I have to adjust the constraints to accomidate this, and under GCC 6 DImode not being allowed in traditional Altivec registers. [gcc] 2017-08-07 Michael Meissner <meissner@linux.vnet.ibm.com> PR target/81593 * config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D): Cleanup constraints since the -mupper-regs-* switches have been eliminated. (vsx_concat_<mode>_1): New combiner insns to recognize inserting into a vector from a double word element that was extracted from another vector, and eliminate extra XXPERMDI instructions. (vsx_concat_<mode>_2): Likewise. (vsx_concat_<mode>_3): Likewise. (vsx_set_<mode>, VSX_D): Rewrite vector set in terms of vector concat to allow optimizing inserts from previous extracts. [gcc/testsuite] 2017-08-07 Michael Meissner <meissner@linux.vnet.ibm.com> PR target/81593 * gcc.target/powerpc/vec-setup.h: New tests to test various combinations of setting up vectors of 2 double word elements. * gcc.target/powerpc/vec-setup-long.c: Likewise. * gcc.target/powerpc/vec-setup-double.c: Likewise. * gcc.target/powerpc/vec-setup-be-long.c: Likewise. * gcc.target/powerpc/vec-setup-be-double.c: Likewise. * gcc.target/powerpc/vsx-extract-6.c: New tests for optimzing vector inserts from vector extracts. * gcc.target/powerpc/vsx-extract-7.c: Likewise. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797 [-- Attachment #2: pr81593.patch05b --] [-- Type: text/plain, Size: 21249 bytes --] Index: gcc/config/rs6000/vsx.md =================================================================== --- gcc/config/rs6000/vsx.md (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 250858) +++ gcc/config/rs6000/vsx.md (.../gcc/config/rs6000) (working copy) @@ -2364,10 +2364,10 @@ (define_insn "*vsx_float_fix_v2df2" ;; Build a V2DF/V2DI vector from two scalars (define_insn "vsx_concat_<mode>" - [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we") + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we") (vec_concat:VSX_D - (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b") - (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))] + (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b") + (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))] "VECTOR_MEM_VSX_P (<MODE>mode)" { if (which_alternative == 0) @@ -2385,6 +2385,80 @@ (define_insn "vsx_concat_<mode>" } [(set_attr "type" "vecperm")]) +;; Combiner patterns to allow creating XXPERMDI's to access either double +;; word element in a vector register. +(define_insn "*vsx_concat_<mode>_1" + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") + (vec_concat:VSX_D + (vec_select:<VS_scalar> + (match_operand:VSX_D 1 "gpc_reg_operand" "wa") + (parallel [(match_operand:QI 2 "const_0_to_1_operand" "n")])) + (match_operand:<VS_scalar> 3 "gpc_reg_operand" "wa")))] + "VECTOR_MEM_VSX_P (<MODE>mode)" +{ + HOST_WIDE_INT dword = INTVAL (operands[2]); + if (BYTES_BIG_ENDIAN) + { + operands[4] = GEN_INT (2*dword); + return "xxpermdi %x0,%x1,%x3,%4"; + } + else + { + operands[4] = GEN_INT (!dword); + return "xxpermdi %x0,%x3,%x1,%4"; + } +} + [(set_attr "type" "vecperm")]) + +(define_insn "*vsx_concat_<mode>_2" + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") + (vec_concat:VSX_D + (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa") + (vec_select:<VS_scalar> + (match_operand:VSX_D 2 "gpc_reg_operand" "wa") + (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")]))))] + "VECTOR_MEM_VSX_P (<MODE>mode)" +{ + HOST_WIDE_INT dword = INTVAL (operands[3]); + if (BYTES_BIG_ENDIAN) + { + operands[4] = GEN_INT (dword); + return "xxpermdi %x0,%x1,%x2,%4"; + } + else + { + operands[4] = GEN_INT (2 * !dword); + return "xxpermdi %x0,%x2,%x1,%4"; + } +} + [(set_attr "type" "vecperm")]) + +(define_insn "*vsx_concat_<mode>_3" + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") + (vec_concat:VSX_D + (vec_select:<VS_scalar> + (match_operand:VSX_D 1 "gpc_reg_operand" "wa") + (parallel [(match_operand:QI 2 "const_0_to_1_operand" "n")])) + (vec_select:<VS_scalar> + (match_operand:VSX_D 3 "gpc_reg_operand" "wa") + (parallel [(match_operand:QI 4 "const_0_to_1_operand" "n")]))))] + "VECTOR_MEM_VSX_P (<MODE>mode)" +{ + HOST_WIDE_INT dword1 = INTVAL (operands[2]); + HOST_WIDE_INT dword2 = INTVAL (operands[4]); + if (BYTES_BIG_ENDIAN) + { + operands[5] = GEN_INT ((2 * dword1) + dword2); + return "xxpermdi %x0,%x1,%x3,%5"; + } + else + { + operands[5] = GEN_INT ((2 * !dword2) + !dword1); + return "xxpermdi %x0,%x3,%x1,%5"; + } +} + [(set_attr "type" "vecperm")]) + ;; Special purpose concat using xxpermdi to glue two single precision values ;; together, relying on the fact that internally scalar floats are represented ;; as doubles. This is used to initialize a V4SF vector with 4 floats @@ -2585,25 +2659,35 @@ (define_expand "vsx_set_v1ti" DONE; }) -;; Set the element of a V2DI/VD2F mode -(define_insn "vsx_set_<mode>" - [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wd,?<VSa>") - (unspec:VSX_D - [(match_operand:VSX_D 1 "vsx_register_operand" "wd,<VSa>") - (match_operand:<VS_scalar> 2 "vsx_register_operand" "<VS_64reg>,<VSa>") - (match_operand:QI 3 "u5bit_cint_operand" "i,i")] - UNSPEC_VSX_SET))] +;; Rewrite V2DF/V2DI set in terms of VEC_CONCAT +(define_expand "vsx_set_<mode>" + [(use (match_operand:VSX_D 0 "vsx_register_operand")) + (use (match_operand:VSX_D 1 "vsx_register_operand")) + (use (match_operand:<VS_scalar> 2 "gpc_reg_operand")) + (use (match_operand:QI 3 "const_0_to_1_operand"))] "VECTOR_MEM_VSX_P (<MODE>mode)" { - int idx_first = BYTES_BIG_ENDIAN ? 0 : 1; - if (INTVAL (operands[3]) == idx_first) - return \"xxpermdi %x0,%x2,%x1,1\"; - else if (INTVAL (operands[3]) == 1 - idx_first) - return \"xxpermdi %x0,%x1,%x2,0\"; + rtx dest = operands[0]; + rtx vec_reg = operands[1]; + rtx value = operands[2]; + rtx ele = operands[3]; + rtx tmp = gen_reg_rtx (<VS_scalar>mode); + + if (ele == const0_rtx) + { + emit_insn (gen_vsx_extract_<mode> (tmp, vec_reg, const1_rtx)); + emit_insn (gen_vsx_concat_<mode> (dest, value, tmp)); + DONE; + } + else if (ele == const1_rtx) + { + emit_insn (gen_vsx_extract_<mode> (tmp, vec_reg, const0_rtx)); + emit_insn (gen_vsx_concat_<mode> (dest, tmp, value)); + DONE; + } else gcc_unreachable (); -} - [(set_attr "type" "vecperm")]) +}) ;; Extract a DF/DI element from V2DF/V2DI ;; Optimize cases were we can do a simple or direct move. Index: gcc/testsuite/gcc.target/powerpc/vec-setup-be-long.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/vec-setup-be-long.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0) +++ gcc/testsuite/gcc.target/powerpc/vec-setup-be-long.c (.../gcc/testsuite/gcc.target/powerpc) (revision 250878) @@ -0,0 +1,11 @@ +/* { dg-do run { target { powerpc64le*-*-linux* } } } */ +/* { dg-require-effective-target vsx_hw } */ +/* { dg-options "-O2 -mvsx -maltivec=be" } */ + +/* Test various ways of creating vectors with 2 double words and accessing the + elements. This test uses the long (on 64-bit systems) or long long datatype + (on 32-bit systems). + + This test explicitly tests -maltivec=be to make sure things are correct. */ + +#include "vec-setup.h" Index: gcc/testsuite/gcc.target/powerpc/vec-setup.h =================================================================== --- gcc/testsuite/gcc.target/powerpc/vec-setup.h (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0) +++ gcc/testsuite/gcc.target/powerpc/vec-setup.h (.../gcc/testsuite/gcc.target/powerpc) (revision 250878) @@ -0,0 +1,366 @@ +#include <altivec.h> + +/* Test various ways of creating vectors with 2 double words and accessing the + elements. This include files supports: + + testing double + testing long on 64-bit systems + testing long long on 32-bit systems. + + The endian support is: + + big endian + little endian with little endian element ordering + little endian with big endian element ordering. */ + +#ifdef DEBUG +#include <stdio.h> +#define DEBUG0(STR) fputs (STR, stdout) +#define DEBUG2(STR,A,B) printf (STR, A, B) + +static int errors = 0; + +#else +#include <stdlib.h> +#define DEBUG0(STR) +#define DEBUG2(STR,A,B) +#endif + +#if defined(DO_DOUBLE) +#define TYPE double +#define STYPE "double" +#define ZERO 0.0 +#define ONE 1.0 +#define TWO 2.0 +#define THREE 3.0 +#define FOUR 4.0 +#define FIVE 5.0 +#define SIX 6.0 +#define FMT "g" + +#elif defined(_ARCH_PPC64) +#define TYPE long +#define STYPE "long" +#define ZERO 0L +#define ONE 1L +#define TWO 2L +#define THREE 3L +#define FOUR 4L +#define FIVE 5L +#define SIX 6L +#define FMT "ld" + +#else +#define TYPE long long +#define STYPE "long long" +#define ZERO 0LL +#define ONE 1LL +#define TWO 2LL +#define THREE 3LL +#define FOUR 4LL +#define FIVE 5LL +#define SIX 6LL +#define FMT "lld" +#endif + +/* Macros to order the left/right values correctly. Note, -maltivec=be does + not change the order for static initializations, so we have to handle it + specially. */ + +#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ +#define INIT_ORDER(A, B) (TYPE) A, (TYPE) B +#define ELEMENT_ORDER(A, B) (TYPE) A, (TYPE) B +#define ENDIAN "-mbig" + +#elif __VEC_ELEMENT_REG_ORDER__ == __ORDER_BIG_ENDIAN__ +#define NO_ARRAY +#define INIT_ORDER(A, B) (TYPE) B, (TYPE) A +#define ELEMENT_ORDER(A, B) (TYPE) A, (TYPE) B +#define ENDIAN "-mlittle -maltivec=be" + +#else +#define INIT_ORDER(A, B) (TYPE) B, (TYPE) A +#define ELEMENT_ORDER(A, B) (TYPE) B, (TYPE) A +#define ENDIAN "-mlittle" +#endif + +static volatile TYPE five = FIVE; +static volatile TYPE six = SIX; +static volatile vector TYPE s_v12 = { ONE, TWO }; +static volatile vector TYPE g_v34 = { THREE, FOUR }; + + +__attribute__((__noinline__)) +static void +vector_check (vector TYPE v, TYPE expect_hi, TYPE expect_lo) +{ + TYPE actual_hi, actual_lo; +#ifdef DEBUG + const char *pass_fail; +#endif + + __asm__ ("xxlor %x0,%x1,%x1" : "=&wa" (actual_hi) : "wa" (v)); + __asm__ ("xxpermdi %x0,%x1,%x1,3" : "=&wa" (actual_lo) : "wa" (v)); + +#ifdef DEBUG + if ((actual_hi == expect_hi) && (actual_lo == expect_lo)) + pass_fail = ", pass"; + else + { + pass_fail = ", fail"; + errors++; + } + + printf ("Expected %" FMT ", %" FMT ", got %" FMT ", %" FMT "%s\n", + expect_hi, expect_lo, + actual_hi, actual_lo, + pass_fail); +#else + if ((actual_hi != expect_hi) || (actual_lo != expect_lo)) + abort (); +#endif +} + +__attribute__((__noinline__)) +static vector TYPE +combine (TYPE op0, TYPE op1) +{ + return (vector TYPE) { op0, op1 }; +} + +__attribute__((__noinline__)) +static vector TYPE +combine_insert (TYPE op0, TYPE op1) +{ + vector TYPE ret = (vector TYPE) { ZERO, ZERO }; + ret = vec_insert (op0, ret, 0); + ret = vec_insert (op1, ret, 1); + return ret; +} + +__attribute__((__noinline__)) +static vector TYPE +concat_extract_00 (vector TYPE a, vector TYPE b) +{ + return (vector TYPE) { vec_extract (a, 0), vec_extract (b, 0) }; +} + +__attribute__((__noinline__)) +static vector TYPE +concat_extract_01 (vector TYPE a, vector TYPE b) +{ + return (vector TYPE) { vec_extract (a, 0), vec_extract (b, 1) }; +} + +__attribute__((__noinline__)) +static vector TYPE +concat_extract_10 (vector TYPE a, vector TYPE b) +{ + return (vector TYPE) { vec_extract (a, 1), vec_extract (b, 0) }; +} + +__attribute__((__noinline__)) +static vector TYPE +concat_extract_11 (vector TYPE a, vector TYPE b) +{ + return (vector TYPE) { vec_extract (a, 1), vec_extract (b, 1) }; +} + +__attribute__((__noinline__)) +static vector TYPE +concat_extract2_0s (vector TYPE a, TYPE b) +{ + return (vector TYPE) { vec_extract (a, 0), b }; +} + +__attribute__((__noinline__)) +static vector TYPE +concat_extract2_1s (vector TYPE a, TYPE b) +{ + return (vector TYPE) { vec_extract (a, 1), b }; +} + +__attribute__((__noinline__)) +static vector TYPE +concat_extract2_s0 (TYPE a, vector TYPE b) +{ + return (vector TYPE) { a, vec_extract (b, 0) }; +} + +__attribute__((__noinline__)) +static vector TYPE +concat_extract2_s1 (TYPE a, vector TYPE b) +{ + return (vector TYPE) { a, vec_extract (b, 1) }; +} + +__attribute__((__noinline__)) +static vector TYPE +concat_extract_nn (vector TYPE a, vector TYPE b, size_t i, size_t j) +{ + return (vector TYPE) { vec_extract (a, i), vec_extract (b, j) }; +} + +#ifndef NO_ARRAY +__attribute__((__noinline__)) +static vector TYPE +array_0 (vector TYPE v, TYPE a) +{ + v[0] = a; + return v; +} + +__attribute__((__noinline__)) +static vector TYPE +array_1 (vector TYPE v, TYPE a) +{ + v[1] = a; + return v; +} + +__attribute__((__noinline__)) +static vector TYPE +array_01 (vector TYPE v, TYPE a, TYPE b) +{ + v[0] = a; + v[1] = b; + return v; +} + +__attribute__((__noinline__)) +static vector TYPE +array_01b (TYPE a, TYPE b) +{ + vector TYPE v = (vector TYPE) { 0, 0 }; + v[0] = a; + v[1] = b; + return v; +} +#endif + +int +main (void) +{ + vector TYPE a = (vector TYPE) { ONE, TWO }; + vector TYPE b = (vector TYPE) { THREE, FOUR }; + size_t i, j; + +#ifndef NO_ARRAY + vector TYPE z = (vector TYPE) { ZERO, ZERO }; +#endif + + DEBUG2 ("Endian: %s, type: %s\n", ENDIAN, STYPE); + DEBUG0 ("\nStatic/global initialization\n"); + vector_check (s_v12, INIT_ORDER (1, 2)); + vector_check (g_v34, INIT_ORDER (3, 4)); + + DEBUG0 ("\nVector via constant runtime intiialization\n"); + vector_check (a, INIT_ORDER (1, 2)); + vector_check (b, INIT_ORDER (3, 4)); + + DEBUG0 ("\nCombine scalars using vector initialization\n"); + vector_check (combine (1, 2), INIT_ORDER (1, 2)); + vector_check (combine (3, 4), INIT_ORDER (3, 4)); + + DEBUG0 ("\nSetup with vec_insert\n"); + a = combine_insert (1, 2); + b = combine_insert (3, 4); + vector_check (a, ELEMENT_ORDER (1, 2)); + vector_check (b, ELEMENT_ORDER (3, 4)); + +#ifndef NO_ARRAY + DEBUG0 ("\nTesting array syntax\n"); + vector_check (array_0 (a, FIVE), ELEMENT_ORDER (5, 2)); + vector_check (array_1 (b, SIX), ELEMENT_ORDER (3, 6)); + vector_check (array_01 (z, FIVE, SIX), ELEMENT_ORDER (5, 6)); + vector_check (array_01b (FIVE, SIX), ELEMENT_ORDER (5, 6)); + + vector_check (array_0 (a, five), ELEMENT_ORDER (5, 2)); + vector_check (array_1 (b, six), ELEMENT_ORDER (3, 6)); + vector_check (array_01 (z, five, six), ELEMENT_ORDER (5, 6)); + vector_check (array_01b (five, six), ELEMENT_ORDER (5, 6)); +#else + DEBUG0 ("\nSkipping array syntax on -maltivec=be\n"); +#endif + + DEBUG0 ("\nTesting concat and extract\n"); + vector_check (concat_extract_00 (a, b), INIT_ORDER (1, 3)); + vector_check (concat_extract_01 (a, b), INIT_ORDER (1, 4)); + vector_check (concat_extract_10 (a, b), INIT_ORDER (2, 3)); + vector_check (concat_extract_11 (a, b), INIT_ORDER (2, 4)); + + DEBUG0 ("\nTesting concat and extract #2\n"); + vector_check (concat_extract2_0s (a, FIVE), INIT_ORDER (1, 5)); + vector_check (concat_extract2_1s (a, FIVE), INIT_ORDER (2, 5)); + vector_check (concat_extract2_s0 (SIX, a), INIT_ORDER (6, 1)); + vector_check (concat_extract2_s1 (SIX, a), INIT_ORDER (6, 2)); + + DEBUG0 ("\nTesting variable concat and extract\n"); + for (i = 0; i < 2; i++) + { + for (j = 0; j < 2; j++) + { + static struct { + TYPE hi; + TYPE lo; + } hilo[2][2] = + { { { ONE, THREE }, { ONE, FOUR } }, + { { TWO, THREE }, { TWO, FOUR } } }; + + vector_check (concat_extract_nn (a, b, i, j), + INIT_ORDER (hilo[i][j].hi, hilo[i][j].lo)); + } + } + + DEBUG0 ("\nTesting separate function\n"); + vector_check (combine (vec_extract (a, 0), vec_extract (b, 0)), + INIT_ORDER (1, 3)); + + vector_check (combine (vec_extract (a, 0), vec_extract (b, 1)), + INIT_ORDER (1, 4)); + + vector_check (combine (vec_extract (a, 1), vec_extract (b, 0)), + INIT_ORDER (2, 3)); + + vector_check (combine (vec_extract (a, 1), vec_extract (b, 1)), + INIT_ORDER (2, 4)); + + vector_check (combine_insert (vec_extract (a, 0), vec_extract (b, 0)), + ELEMENT_ORDER (1, 3)); + + vector_check (combine_insert (vec_extract (a, 0), vec_extract (b, 1)), + ELEMENT_ORDER (1, 4)); + + vector_check (combine_insert (vec_extract (a, 1), vec_extract (b, 0)), + ELEMENT_ORDER (2, 3)); + + vector_check (combine_insert (vec_extract (a, 1), vec_extract (b, 1)), + ELEMENT_ORDER (2, 4)); + + +#if defined(DO_DOUBLE) + DEBUG0 ("\nTesting explicit 2df concat\n"); + vector_check (__builtin_vsx_concat_2df (FIVE, SIX), INIT_ORDER (5, 6)); + vector_check (__builtin_vsx_concat_2df (five, six), INIT_ORDER (5, 6)); + +#elif defined(_ARCH_PPC64) + DEBUG0 ("\nTesting explicit 2di concat\n"); + vector_check (__builtin_vsx_concat_2di (FIVE, SIX), INIT_ORDER (5, 6)); + vector_check (__builtin_vsx_concat_2di (five, six), INIT_ORDER (5, 6)); + +#else + DEBUG0 ("\nSkip explicit 2di concat on 32-bit\n"); +#endif + +#ifdef DEBUG + if (errors) + printf ("\n%d error%s were found", errors, (errors == 1) ? "" : "s"); + else + printf ("\nNo errors were found.\n"); + + return errors; + +#else + return 0; +#endif +} Index: gcc/testsuite/gcc.target/powerpc/vec-setup-be-double.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/vec-setup-be-double.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0) +++ gcc/testsuite/gcc.target/powerpc/vec-setup-be-double.c (.../gcc/testsuite/gcc.target/powerpc) (revision 250878) @@ -0,0 +1,12 @@ +/* { dg-do run { target { powerpc*-*-linux* } } } */ +/* { dg-require-effective-target vsx_hw } */ +/* { dg-options "-O2 -mvsx" } */ + +/* Test various ways of creating vectors with 2 double words and accessing the + elements. This test uses the double datatype. + + This test explicitly tests -maltivec=be to make sure things are correct. */ + +#define DO_DOUBLE + +#include "vec-setup.h" Index: gcc/testsuite/gcc.target/powerpc/vec-setup-double.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/vec-setup-double.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0) +++ gcc/testsuite/gcc.target/powerpc/vec-setup-double.c (.../gcc/testsuite/gcc.target/powerpc) (revision 250878) @@ -0,0 +1,11 @@ +/* { dg-do run { target { powerpc*-*-linux* } } } */ +/* { dg-require-effective-target vsx_hw } */ +/* { dg-options "-O2 -mvsx" } */ + +/* Test various ways of creating vectors with 2 double words and accessing the + elements. This test uses the double datatype and the default endian + order. */ + +#define DO_DOUBLE + +#include "vec-setup.h" Index: gcc/testsuite/gcc.target/powerpc/vec-setup-long.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/vec-setup-long.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0) +++ gcc/testsuite/gcc.target/powerpc/vec-setup-long.c (.../gcc/testsuite/gcc.target/powerpc) (revision 250878) @@ -0,0 +1,9 @@ +/* { dg-do run { target { powerpc*-*-linux* } } } */ +/* { dg-require-effective-target vsx_hw } */ +/* { dg-options "-O2 -mvsx" } */ + +/* Test various ways of creating vectors with 2 double words and accessing the + elements. This test uses the long (on 64-bit systems) or long long datatype + (on 32-bit systems). The default endian order is used. */ + +#include "vec-setup.h" Index: gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0) +++ gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c (.../gcc/testsuite/gcc.target/powerpc) (revision 250858) @@ -0,0 +1,25 @@ +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +vector unsigned long +test_vpasted (vector unsigned long high, vector unsigned long low) +{ + vector unsigned long res; + res[1] = high[1]; + res[0] = low[0]; + return res; +} + +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */ +/* { dg-final { scan-assembler-not {\mvspltisw\M} } } */ +/* { dg-final { scan-assembler-not {\mxxlor\M} } } */ +/* { dg-final { scan-assembler-not {\mxxlxor\M} } } */ +/* { dg-final { scan-assembler-not {\mxxspltib\M} } } */ +/* { dg-final { scan-assembler-not {\mlxvx?\M} } } */ +/* { dg-final { scan-assembler-not {\mlxv[dw][24]x\M} } } */ +/* { dg-final { scan-assembler-not {\mlvx\M} } } */ +/* { dg-final { scan-assembler-not {\mstxvx?\M} } } */ +/* { dg-final { scan-assembler-not {\mstxv[dw][24]x\M} } } */ +/* { dg-final { scan-assembler-not {\mstvx\M} } } */ Index: gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0) +++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c (.../gcc/testsuite/gcc.target/powerpc) (revision 250858) @@ -0,0 +1,25 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +vector double +test_vpasted (vector double high, vector double low) +{ + vector double res; + res[1] = high[1]; + res[0] = low[0]; + return res; +} + +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */ +/* { dg-final { scan-assembler-not {\mvspltisw\M} } } */ +/* { dg-final { scan-assembler-not {\mxxlor\M} } } */ +/* { dg-final { scan-assembler-not {\mxxlxor\M} } } */ +/* { dg-final { scan-assembler-not {\mxxspltib\M} } } */ +/* { dg-final { scan-assembler-not {\mlxvx?\M} } } */ +/* { dg-final { scan-assembler-not {\mlxv[dw][24]x\M} } } */ +/* { dg-final { scan-assembler-not {\mlvx\M} } } */ +/* { dg-final { scan-assembler-not {\mstxvx?\M} } } */ +/* { dg-final { scan-assembler-not {\mstxv[dw][24]x\M} } } */ +/* { dg-final { scan-assembler-not {\mstvx\M} } } */ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts 2017-08-07 13:18 ` Michael Meissner @ 2017-08-07 21:54 ` Segher Boessenkool 0 siblings, 0 replies; 16+ messages in thread From: Segher Boessenkool @ 2017-08-07 21:54 UTC (permalink / raw) To: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt On Mon, Aug 07, 2017 at 09:18:30AM -0400, Michael Meissner wrote: > > I don't like using NULL as a magic value at all; it does not simplify > > this interface, it complicates it instead. > > > > Can you move the "which half is high" decision to the callers? > > I rewrote the patch to eliminate the rs6000_output_xxpermdi function, and do > the calculation of the XXPERMDI mask in each of the vsx_concat_<mask>_{1,2,3} > insns. Just to be sure I got things correct, I wrote a new executable test > that tests various methods of creating/inserting 2 element vectors with double > word elements, and tested in BE, LE -maltivec=be, and LE, and the results match > previous compilers. > > I have done bootstrap/build checks on a big endian power7, a little endian > power8 system, and I have done a non-bootstrap/check on a power9 prototype (I > have script issues that prevents a bootstrap build on power9 that I need to > look into). There are no regressions in the tests and the new tests were run > on each of the systems. Can I check this into the trunk? > > I would also like to backport it to all open branches (particularly GCC 7, but > GCC 6 if possible). Note, the patch will need a slight tweak on the older > systems due to GCC 7 still supporting -mupper-regs-{df,di} and I have to adjust > the constraints to accomidate this, and under GCC 6 DImode not being allowed in > traditional Altivec registers. Thanks! Okay for trunk. The 7 branch is frozen; okay for 7 after the release, and for 6 too. Segher ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2017-08-07 21:54 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-07-27 23:21 [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts Michael Meissner 2017-07-28 7:51 ` Richard Biener 2017-07-28 8:02 ` Andrew Pinski 2017-07-28 8:20 ` Richard Biener 2017-07-28 8:42 ` Marc Glisse 2017-07-28 14:44 ` Michael Meissner 2017-07-28 21:09 ` Segher Boessenkool 2017-07-30 14:01 ` Bill Schmidt 2017-07-31 17:41 ` Michael Meissner 2017-07-31 17:42 ` Bill Schmidt 2017-07-31 17:38 ` Michael Meissner 2017-08-02 14:29 ` Michael Meissner 2017-08-03 15:01 ` Segher Boessenkool 2017-08-03 17:23 ` Michael Meissner 2017-08-07 13:18 ` Michael Meissner 2017-08-07 21:54 ` Segher Boessenkool
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).