* [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
@ 2017-07-27 23:21 Michael Meissner
2017-07-28 7:51 ` Richard Biener
2017-07-28 21:09 ` Segher Boessenkool
0 siblings, 2 replies; 16+ messages in thread
From: Michael Meissner @ 2017-07-27 23:21 UTC (permalink / raw)
To: GCC Patches, Segher Boessenkool, David Edelsohn, Bill Schmidt
[-- Attachment #1: Type: text/plain, Size: 2105 bytes --]
This patches optimizes the PowerPC vector set operation for 64-bit doubles and
longs where the elements in the vector set may have been extracted from another
vector (PR target/81593):
Here an an example:
vector double
test_vpasted (vector double high, vector double low)
{
vector double res;
res[1] = high[1];
res[0] = low[0];
return res;
}
Previously it would generate:
xxpermdi 12,34,34,2
vspltisw 2,0
xxlor 0,35,35
xxpermdi 34,34,12,0
xxpermdi 34,0,34,1
and with these patches, it now generates:
xxpermdi 34,35,34,1
I have tested it on a little endian power8 system and a big endian power7
system with the usual bootstrap and make checks with no regressions. Can I
check this into the trunk?
I also built Spec 2006 with the compiler, and saw no changes in the code
generated. This isn't surprising because it isn't something that auto
vectorization might generate by default.
[gcc]
2017-07-27 Michael Meissner <meissner@linux.vnet.ibm.com>
PR target/81593
* config/rs6000/rs6000-protos.h (rs6000_emit_xxpermdi): New
declaration.
* config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
emit XXPERMDI accessing either double word in either vector
register inputs.
* config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D iterator):
Rewrite VEC_CONCAT insn to call rs6000_emit_xxpermdi. Simplify
the constraints with the removal of the -mupper-regs-* switches.
(vsx_concat_<mode>_1): New combiner insns to optimize CONCATs
where either register might have come from VEC_SELECT.
(vsx_concat_<mode>_2): Likewise.
(vsx_concat_<mode>_3): Likewise.
(vsx_set_<mode>, VSX_D iterator): Rewrite insn to generate a
VEC_CONCAT rather than use an UNSPEC to specify the option.
[gcc/testsuite]
2017-07-27 Michael Meissner <meissner@linux.vnet.ibm.com>
PR target/81593
* gcc.target/powerpc/vsx-extract-6.c: New test.
* gcc.target/powerpc/vsx-extract-7.c: Likewise.
--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
[-- Attachment #2: pr81593.patch01b --]
[-- Type: text/plain, Size: 8484 bytes --]
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h (svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000/rs6000-protos.h) (revision 250577)
+++ gcc/config/rs6000/rs6000-protos.h (.../gcc/config/rs6000/rs6000-protos.h) (working copy)
@@ -233,6 +233,7 @@ extern void rs6000_asm_output_dwarf_pcre
const char *label);
extern void rs6000_asm_output_dwarf_datarel (FILE *file, int size,
const char *label);
+extern const char *rs6000_emit_xxpermdi (rtx[], rtx, rtx);
/* Declare functions in rs6000-c.c */
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000/rs6000.c) (revision 250577)
+++ gcc/config/rs6000/rs6000.c (.../gcc/config/rs6000/rs6000.c) (working copy)
@@ -39167,6 +39167,38 @@ rs6000_optab_supported_p (int op, machin
return true;
}
}
+
+\f
+/* Emit a XXPERMDI instruction that can extract from either double word of the
+ two arguments. ELEMENT1 and ELEMENT2 are either NULL or they are 0/1 giving
+ which double word to be used for the operand. */
+
+const char *
+rs6000_emit_xxpermdi (rtx operands[], rtx element1, rtx element2)
+{
+ int op1_dword = (!element1) ? 0 : INTVAL (element1);
+ int op2_dword = (!element2) ? 0 : INTVAL (element2);
+
+ gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1));
+
+ if (BYTES_BIG_ENDIAN)
+ {
+ operands[3] = GEN_INT (2*op1_dword + op2_dword);
+ return "xxpermdi %x0,%x1,%x2,%3";
+ }
+ else
+ {
+ if (element1)
+ op1_dword = 1 - op1_dword;
+
+ if (element2)
+ op2_dword = 1 - op2_dword;
+
+ operands[3] = GEN_INT (op1_dword + 2*op2_dword);
+ return "xxpermdi %x0,%x2,%x1,%3";
+ }
+}
+
\f
struct gcc_target targetm = TARGET_INITIALIZER;
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md (svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000/vsx.md) (revision 250577)
+++ gcc/config/rs6000/vsx.md (.../gcc/config/rs6000/vsx.md) (working copy)
@@ -2366,19 +2366,17 @@ (define_insn "*vsx_float_fix_v2df2"
;; Build a V2DF/V2DI vector from two scalars
(define_insn "vsx_concat_<mode>"
- [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
+ [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
(vec_concat:VSX_D
- (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b")
- (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))]
+ (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b")
+ (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))]
"VECTOR_MEM_VSX_P (<MODE>mode)"
{
if (which_alternative == 0)
- return (BYTES_BIG_ENDIAN
- ? "xxpermdi %x0,%x1,%x2,0"
- : "xxpermdi %x0,%x2,%x1,0");
+ return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
else if (which_alternative == 1)
- return (BYTES_BIG_ENDIAN
+ return (VECTOR_ELT_ORDER_BIG
? "mtvsrdd %x0,%1,%2"
: "mtvsrdd %x0,%2,%1");
@@ -2387,6 +2385,47 @@ (define_insn "vsx_concat_<mode>"
}
[(set_attr "type" "vecperm")])
+;; Combiner patterns to allow creating XXPERMDI's to access either double
+;; register in a vector register. Note, rs6000_emit_xxpermdi expects
+;; operands[0..2] to be the vector registers.
+(define_insn "*vsx_concat_<mode>_1"
+ [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+ (vec_concat:VSX_D
+ (vec_select:<VS_scalar>
+ (match_operand:VSX_D 1 "gpc_reg_operand" "wa")
+ (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")]))
+ (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa")))]
+ "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+ return rs6000_emit_xxpermdi (operands, operands[3], NULL_RTX);
+})
+
+(define_insn "*vsx_concat_<mode>_2"
+ [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+ (vec_concat:VSX_D
+ (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa")
+ (vec_select:<VS_scalar>
+ (match_operand:VSX_D 2 "gpc_reg_operand" "wa")
+ (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")]))))]
+ "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+ return rs6000_emit_xxpermdi (operands, NULL_RTX, operands[3]);
+})
+
+(define_insn "*vsx_concat_<mode>_3"
+ [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+ (vec_concat:VSX_D
+ (vec_select:<VS_scalar>
+ (match_operand:VSX_D 1 "gpc_reg_operand" "wa")
+ (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")]))
+ (vec_select:<VS_scalar>
+ (match_operand:VSX_D 2 "gpc_reg_operand" "wa")
+ (parallel [(match_operand:QI 4 "const_0_to_1_operand" "n")]))))]
+ "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+ return rs6000_emit_xxpermdi (operands, operands[3], operands[4]);
+})
+
;; Special purpose concat using xxpermdi to glue two single precision values
;; together, relying on the fact that internally scalar floats are represented
;; as doubles. This is used to initialize a V4SF vector with 4 floats
@@ -2587,25 +2626,35 @@ (define_expand "vsx_set_v1ti"
DONE;
})
-;; Set the element of a V2DI/VD2F mode
-(define_insn "vsx_set_<mode>"
- [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wd,?<VSa>")
- (unspec:VSX_D
- [(match_operand:VSX_D 1 "vsx_register_operand" "wd,<VSa>")
- (match_operand:<VS_scalar> 2 "vsx_register_operand" "<VS_64reg>,<VSa>")
- (match_operand:QI 3 "u5bit_cint_operand" "i,i")]
- UNSPEC_VSX_SET))]
+;; Rewrite V2DF/V2DI set in terms of VEC_CONCAT
+(define_expand "vsx_set_<mode>"
+ [(use (match_operand:VSX_D 0 "vsx_register_operand"))
+ (use (match_operand:VSX_D 1 "vsx_register_operand"))
+ (use (match_operand:<VS_scalar> 2 "gpc_reg_operand"))
+ (use (match_operand:QI 3 "const_0_to_1_operand"))]
"VECTOR_MEM_VSX_P (<MODE>mode)"
{
- int idx_first = BYTES_BIG_ENDIAN ? 0 : 1;
- if (INTVAL (operands[3]) == idx_first)
- return \"xxpermdi %x0,%x2,%x1,1\";
- else if (INTVAL (operands[3]) == 1 - idx_first)
- return \"xxpermdi %x0,%x1,%x2,0\";
+ rtx dest = operands[0];
+ rtx vec_reg = operands[1];
+ rtx value = operands[2];
+ rtx ele = operands[3];
+ rtx tmp = gen_reg_rtx (<VS_scalar>mode);
+
+ if (ele == const0_rtx)
+ {
+ emit_insn (gen_vsx_extract_<mode> (tmp, vec_reg, const1_rtx));
+ emit_insn (gen_vsx_concat_<mode> (dest, value, tmp));
+ DONE;
+ }
+ else if (ele == const1_rtx)
+ {
+ emit_insn (gen_vsx_extract_<mode> (tmp, vec_reg, const0_rtx));
+ emit_insn (gen_vsx_concat_<mode> (dest, tmp, value));
+ DONE;
+ }
else
gcc_unreachable ();
-}
- [(set_attr "type" "vecperm")])
+})
;; Extract a DF/DI element from V2DF/V2DI
;; Optimize cases were we can do a simple or direct move.
Index: gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c (svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c) (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c (.../gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c) (revision 250640)
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+vector unsigned long
+test_vpasted (vector unsigned long high, vector unsigned long low)
+{
+ vector unsigned long res;
+ res[1] = high[1];
+ res[0] = low[0];
+ return res;
+}
+
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
Index: gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c (svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c) (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c (.../gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c) (revision 250640)
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+vector double
+test_vpasted (vector double high, vector double low)
+{
+ vector double res;
+ res[1] = high[1];
+ res[0] = low[0];
+ return res;
+}
+
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
2017-07-27 23:21 [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts Michael Meissner
@ 2017-07-28 7:51 ` Richard Biener
2017-07-28 8:02 ` Andrew Pinski
2017-07-28 14:44 ` Michael Meissner
2017-07-28 21:09 ` Segher Boessenkool
1 sibling, 2 replies; 16+ messages in thread
From: Richard Biener @ 2017-07-28 7:51 UTC (permalink / raw)
To: Michael Meissner, GCC Patches, Segher Boessenkool,
David Edelsohn, Bill Schmidt
On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> This patches optimizes the PowerPC vector set operation for 64-bit doubles and
> longs where the elements in the vector set may have been extracted from another
> vector (PR target/81593):
>
> Here an an example:
>
> vector double
> test_vpasted (vector double high, vector double low)
> {
> vector double res;
> res[1] = high[1];
> res[0] = low[0];
> return res;
> }
Interesting. We expand from
<bb 2> [100.00%] [count: INV]:
_1 = BIT_FIELD_REF <high_4(D), 64, 64>;
res_6 = BIT_INSERT_EXPR <res_5(D), _1, 64 (64 bits)>;
_2 = BIT_FIELD_REF <low_7(D), 64, 0>;
res_8 = BIT_INSERT_EXPR <res_6, _2, 0 (64 bits)>;
return res_8;
but ideally we'd pattern-match that to a VEC_PERM_EXPR. The bswap
pass looks like the canonical pass for this even though it's quite awkward
to fill this in.
So a match.pd rule would work as well here - your ppc backend patterns
are v2df specific, right?
> Previously it would generate:
>
> xxpermdi 12,34,34,2
> vspltisw 2,0
> xxlor 0,35,35
> xxpermdi 34,34,12,0
> xxpermdi 34,0,34,1
>
> and with these patches, it now generates:
>
> xxpermdi 34,35,34,1
>
> I have tested it on a little endian power8 system and a big endian power7
> system with the usual bootstrap and make checks with no regressions. Can I
> check this into the trunk?
>
> I also built Spec 2006 with the compiler, and saw no changes in the code
> generated. This isn't surprising because it isn't something that auto
> vectorization might generate by default.
>
> [gcc]
> 2017-07-27 Michael Meissner <meissner@linux.vnet.ibm.com>
>
> PR target/81593
> * config/rs6000/rs6000-protos.h (rs6000_emit_xxpermdi): New
> declaration.
> * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
> emit XXPERMDI accessing either double word in either vector
> register inputs.
> * config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D iterator):
> Rewrite VEC_CONCAT insn to call rs6000_emit_xxpermdi. Simplify
> the constraints with the removal of the -mupper-regs-* switches.
> (vsx_concat_<mode>_1): New combiner insns to optimize CONCATs
> where either register might have come from VEC_SELECT.
> (vsx_concat_<mode>_2): Likewise.
> (vsx_concat_<mode>_3): Likewise.
> (vsx_set_<mode>, VSX_D iterator): Rewrite insn to generate a
> VEC_CONCAT rather than use an UNSPEC to specify the option.
>
> [gcc/testsuite]
> 2017-07-27 Michael Meissner <meissner@linux.vnet.ibm.com>
>
> PR target/81593
> * gcc.target/powerpc/vsx-extract-6.c: New test.
> * gcc.target/powerpc/vsx-extract-7.c: Likewise.
>
> --
> Michael Meissner, IBM
> IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
> email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
2017-07-28 7:51 ` Richard Biener
@ 2017-07-28 8:02 ` Andrew Pinski
2017-07-28 8:20 ` Richard Biener
2017-07-28 8:42 ` Marc Glisse
2017-07-28 14:44 ` Michael Meissner
1 sibling, 2 replies; 16+ messages in thread
From: Andrew Pinski @ 2017-07-28 8:02 UTC (permalink / raw)
To: Richard Biener
Cc: Michael Meissner, GCC Patches, Segher Boessenkool,
David Edelsohn, Bill Schmidt
On Fri, Jul 28, 2017 at 12:51 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner
> <meissner@linux.vnet.ibm.com> wrote:
>> This patches optimizes the PowerPC vector set operation for 64-bit doubles and
>> longs where the elements in the vector set may have been extracted from another
>> vector (PR target/81593):
>>
>> Here an an example:
>>
>> vector double
>> test_vpasted (vector double high, vector double low)
>> {
>> vector double res;
>> res[1] = high[1];
>> res[0] = low[0];
>> return res;
>> }
>
> Interesting. We expand from
>
> <bb 2> [100.00%] [count: INV]:
> _1 = BIT_FIELD_REF <high_4(D), 64, 64>;
> res_6 = BIT_INSERT_EXPR <res_5(D), _1, 64 (64 bits)>;
> _2 = BIT_FIELD_REF <low_7(D), 64, 0>;
> res_8 = BIT_INSERT_EXPR <res_6, _2, 0 (64 bits)>;
> return res_8;
>
> but ideally we'd pattern-match that to a VEC_PERM_EXPR. The bswap
> pass looks like the canonical pass for this even though it's quite awkward
> to fill this in.
I was thinking about this exactly. Though for the scale use of
BIT_INSERT_EXPR/BIT_FIELD_REF.
I have a case where someone writes (this shows up in GCC too):
a->b = c->b;
a->d = c->d;
a->e = c->e;
a->f = c->f;
a->g = c->g;
a->h = c->h;
Where b,d,e,f,g,h are adjacent bit-fields after I lowered the bit-fields I have:
_1 = BIT_FIELD_REF <a.1_3, 2, 0>;
_8 = BIT_INSERT_EXPR <c.1_4, _1, 0 (2 bits)>;
_2 = BIT_FIELD_REF <a.1_3, 4, 2>;
_9 = BIT_INSERT_EXPR <_8, _2, 2 (4 bits)>;
....
For the vector case, can't we write it as:
_1 = BIT_FIELD_REF <high_4(D), 64, 64>;
_2 = BIT_FIELD_REF <low_7(D), 64, 0>;
res_8 = {_1, _2};
And then have some match.pd patterns (which might get complex), to
rewrite that into VEC_PERM_EXPR?
The reason why I ask that is because say someone who wrote:
vector double
test_vpasted (vector double high, vector double low)
{
vector double res = { high[1], low[0] };
return res;
}
Thanks,
Andrew Pinski
>
> So a match.pd rule would work as well here - your ppc backend patterns
> are v2df specific, right?
>
>> Previously it would generate:
>>
>> xxpermdi 12,34,34,2
>> vspltisw 2,0
>> xxlor 0,35,35
>> xxpermdi 34,34,12,0
>> xxpermdi 34,0,34,1
>>
>> and with these patches, it now generates:
>>
>> xxpermdi 34,35,34,1
>>
>> I have tested it on a little endian power8 system and a big endian power7
>> system with the usual bootstrap and make checks with no regressions. Can I
>> check this into the trunk?
>>
>> I also built Spec 2006 with the compiler, and saw no changes in the code
>> generated. This isn't surprising because it isn't something that auto
>> vectorization might generate by default.
>>
>> [gcc]
>> 2017-07-27 Michael Meissner <meissner@linux.vnet.ibm.com>
>>
>> PR target/81593
>> * config/rs6000/rs6000-protos.h (rs6000_emit_xxpermdi): New
>> declaration.
>> * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
>> emit XXPERMDI accessing either double word in either vector
>> register inputs.
>> * config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D iterator):
>> Rewrite VEC_CONCAT insn to call rs6000_emit_xxpermdi. Simplify
>> the constraints with the removal of the -mupper-regs-* switches.
>> (vsx_concat_<mode>_1): New combiner insns to optimize CONCATs
>> where either register might have come from VEC_SELECT.
>> (vsx_concat_<mode>_2): Likewise.
>> (vsx_concat_<mode>_3): Likewise.
>> (vsx_set_<mode>, VSX_D iterator): Rewrite insn to generate a
>> VEC_CONCAT rather than use an UNSPEC to specify the option.
>>
>> [gcc/testsuite]
>> 2017-07-27 Michael Meissner <meissner@linux.vnet.ibm.com>
>>
>> PR target/81593
>> * gcc.target/powerpc/vsx-extract-6.c: New test.
>> * gcc.target/powerpc/vsx-extract-7.c: Likewise.
>>
>> --
>> Michael Meissner, IBM
>> IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
>> email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
2017-07-28 8:02 ` Andrew Pinski
@ 2017-07-28 8:20 ` Richard Biener
2017-07-28 8:42 ` Marc Glisse
1 sibling, 0 replies; 16+ messages in thread
From: Richard Biener @ 2017-07-28 8:20 UTC (permalink / raw)
To: Andrew Pinski
Cc: Michael Meissner, GCC Patches, Segher Boessenkool,
David Edelsohn, Bill Schmidt
On Fri, Jul 28, 2017 at 10:02 AM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Fri, Jul 28, 2017 at 12:51 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner
>> <meissner@linux.vnet.ibm.com> wrote:
>>> This patches optimizes the PowerPC vector set operation for 64-bit doubles and
>>> longs where the elements in the vector set may have been extracted from another
>>> vector (PR target/81593):
>>>
>>> Here an an example:
>>>
>>> vector double
>>> test_vpasted (vector double high, vector double low)
>>> {
>>> vector double res;
>>> res[1] = high[1];
>>> res[0] = low[0];
>>> return res;
>>> }
>>
>> Interesting. We expand from
>>
>> <bb 2> [100.00%] [count: INV]:
>> _1 = BIT_FIELD_REF <high_4(D), 64, 64>;
>> res_6 = BIT_INSERT_EXPR <res_5(D), _1, 64 (64 bits)>;
>> _2 = BIT_FIELD_REF <low_7(D), 64, 0>;
>> res_8 = BIT_INSERT_EXPR <res_6, _2, 0 (64 bits)>;
>> return res_8;
>>
>> but ideally we'd pattern-match that to a VEC_PERM_EXPR. The bswap
>> pass looks like the canonical pass for this even though it's quite awkward
>> to fill this in.
>
> I was thinking about this exactly. Though for the scale use of
> BIT_INSERT_EXPR/BIT_FIELD_REF.
> I have a case where someone writes (this shows up in GCC too):
> a->b = c->b;
> a->d = c->d;
> a->e = c->e;
> a->f = c->f;
> a->g = c->g;
> a->h = c->h;
>
> Where b,d,e,f,g,h are adjacent bit-fields after I lowered the bit-fields I have:
> _1 = BIT_FIELD_REF <a.1_3, 2, 0>;
> _8 = BIT_INSERT_EXPR <c.1_4, _1, 0 (2 bits)>;
> _2 = BIT_FIELD_REF <a.1_3, 4, 2>;
> _9 = BIT_INSERT_EXPR <_8, _2, 2 (4 bits)>;
> ....
>
> For the vector case, can't we write it as:
> _1 = BIT_FIELD_REF <high_4(D), 64, 64>;
> _2 = BIT_FIELD_REF <low_7(D), 64, 0>;
> res_8 = {_1, _2};
>
> And then have some match.pd patterns (which might get complex), to
> rewrite that into VEC_PERM_EXPR?
> The reason why I ask that is because say someone who wrote:
> vector double
> test_vpasted (vector double high, vector double low)
> {
> vector double res = { high[1], low[0] };
> return res;
> }
I still believe a proper pass is better than match.pd patterns (which are
awkward when dealing with "variable operand number" cases).
I believe in the end we want to "unify" SRA, bswap and store-merging
at least. Analyze memory/component accesses, their flow and then
pattern-match the result. bswap is good with the flow stuff but its
memory/component access analysis is too ad-hoc.
"unify" in the sense of using common infrastructure.
Richard.
>
> Thanks,
> Andrew Pinski
>
>>
>> So a match.pd rule would work as well here - your ppc backend patterns
>> are v2df specific, right?
>>
>>> Previously it would generate:
>>>
>>> xxpermdi 12,34,34,2
>>> vspltisw 2,0
>>> xxlor 0,35,35
>>> xxpermdi 34,34,12,0
>>> xxpermdi 34,0,34,1
>>>
>>> and with these patches, it now generates:
>>>
>>> xxpermdi 34,35,34,1
>>>
>>> I have tested it on a little endian power8 system and a big endian power7
>>> system with the usual bootstrap and make checks with no regressions. Can I
>>> check this into the trunk?
>>>
>>> I also built Spec 2006 with the compiler, and saw no changes in the code
>>> generated. This isn't surprising because it isn't something that auto
>>> vectorization might generate by default.
>>>
>>> [gcc]
>>> 2017-07-27 Michael Meissner <meissner@linux.vnet.ibm.com>
>>>
>>> PR target/81593
>>> * config/rs6000/rs6000-protos.h (rs6000_emit_xxpermdi): New
>>> declaration.
>>> * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
>>> emit XXPERMDI accessing either double word in either vector
>>> register inputs.
>>> * config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D iterator):
>>> Rewrite VEC_CONCAT insn to call rs6000_emit_xxpermdi. Simplify
>>> the constraints with the removal of the -mupper-regs-* switches.
>>> (vsx_concat_<mode>_1): New combiner insns to optimize CONCATs
>>> where either register might have come from VEC_SELECT.
>>> (vsx_concat_<mode>_2): Likewise.
>>> (vsx_concat_<mode>_3): Likewise.
>>> (vsx_set_<mode>, VSX_D iterator): Rewrite insn to generate a
>>> VEC_CONCAT rather than use an UNSPEC to specify the option.
>>>
>>> [gcc/testsuite]
>>> 2017-07-27 Michael Meissner <meissner@linux.vnet.ibm.com>
>>>
>>> PR target/81593
>>> * gcc.target/powerpc/vsx-extract-6.c: New test.
>>> * gcc.target/powerpc/vsx-extract-7.c: Likewise.
>>>
>>> --
>>> Michael Meissner, IBM
>>> IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
>>> email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
2017-07-28 8:02 ` Andrew Pinski
2017-07-28 8:20 ` Richard Biener
@ 2017-07-28 8:42 ` Marc Glisse
1 sibling, 0 replies; 16+ messages in thread
From: Marc Glisse @ 2017-07-28 8:42 UTC (permalink / raw)
To: Andrew Pinski
Cc: Richard Biener, Michael Meissner, GCC Patches,
Segher Boessenkool, David Edelsohn, Bill Schmidt
On Fri, 28 Jul 2017, Andrew Pinski wrote:
> For the vector case, can't we write it as:
> _1 = BIT_FIELD_REF <high_4(D), 64, 64>;
> _2 = BIT_FIELD_REF <low_7(D), 64, 0>;
> res_8 = {_1, _2};
>
> And then have some match.pd patterns (which might get complex), to
> rewrite that into VEC_PERM_EXPR?
For this last part, we have simplify_vector_constructor in
tree-ssa-forwprop.c, which currently only recognizes VEC_PERM_EXPR of a
single vector, but I guess it could be extended to 2 vectors. Not as good
as a bswap revamp (which will be needed anyway at some point), but less
work.
--
Marc Glisse
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
2017-07-28 7:51 ` Richard Biener
2017-07-28 8:02 ` Andrew Pinski
@ 2017-07-28 14:44 ` Michael Meissner
1 sibling, 0 replies; 16+ messages in thread
From: Michael Meissner @ 2017-07-28 14:44 UTC (permalink / raw)
To: Richard Biener
Cc: Michael Meissner, GCC Patches, Segher Boessenkool,
David Edelsohn, Bill Schmidt
On Fri, Jul 28, 2017 at 09:51:30AM +0200, Richard Biener wrote:
> On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner
> <meissner@linux.vnet.ibm.com> wrote:
> > This patches optimizes the PowerPC vector set operation for 64-bit doubles and
> > longs where the elements in the vector set may have been extracted from another
> > vector (PR target/81593):
> >
> > Here an an example:
> >
> > vector double
> > test_vpasted (vector double high, vector double low)
> > {
> > vector double res;
> > res[1] = high[1];
> > res[0] = low[0];
> > return res;
> > }
>
> Interesting. We expand from
>
> <bb 2> [100.00%] [count: INV]:
> _1 = BIT_FIELD_REF <high_4(D), 64, 64>;
> res_6 = BIT_INSERT_EXPR <res_5(D), _1, 64 (64 bits)>;
> _2 = BIT_FIELD_REF <low_7(D), 64, 0>;
> res_8 = BIT_INSERT_EXPR <res_6, _2, 0 (64 bits)>;
> return res_8;
>
> but ideally we'd pattern-match that to a VEC_PERM_EXPR. The bswap
> pass looks like the canonical pass for this even though it's quite awkward
> to fill this in.
>
> So a match.pd rule would work as well here - your ppc backend patterns
> are v2df specific, right?
Both V2DF and V2DI.
While it would be great to have a machine independent optimization, my patches
would also work for PowerPC specific built-ins for vector extract and vector
insert.
Also my patches replaces an UNSPEC to create the vector with VEC_CONCAT.
Thus work going on in for machine independent support should not preclude
this patch from being accepted in the PowerPC backend.
--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
2017-07-27 23:21 [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts Michael Meissner
2017-07-28 7:51 ` Richard Biener
@ 2017-07-28 21:09 ` Segher Boessenkool
2017-07-30 14:01 ` Bill Schmidt
` (2 more replies)
1 sibling, 3 replies; 16+ messages in thread
From: Segher Boessenkool @ 2017-07-28 21:09 UTC (permalink / raw)
To: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt
Hi!
On Thu, Jul 27, 2017 at 07:21:14PM -0400, Michael Meissner wrote:
> This patches optimizes the PowerPC vector set operation for 64-bit doubles and
> longs where the elements in the vector set may have been extracted from another
> vector (PR target/81593):
> * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
> emit XXPERMDI accessing either double word in either vector
> register inputs.
"emit" is not a good name for this: that is generally used for something
that does emit_insn, i.e. put an insn in the instruction stream. This
function returns a string a define_insn can return. For the rl* insns
I called the similar functions rs6000_insn_for_*, maybe something like
that is better here?
> +/* Emit a XXPERMDI instruction that can extract from either double word of the
> + two arguments. ELEMENT1 and ELEMENT2 are either NULL or they are 0/1 giving
> + which double word to be used for the operand. */
> +
> +const char *
> +rs6000_emit_xxpermdi (rtx operands[], rtx element1, rtx element2)
> +{
> + int op1_dword = (!element1) ? 0 : INTVAL (element1);
> + int op2_dword = (!element2) ? 0 : INTVAL (element2);
> +
> + gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1));
> +
> + if (BYTES_BIG_ENDIAN)
> + {
> + operands[3] = GEN_INT (2*op1_dword + op2_dword);
> + return "xxpermdi %x0,%x1,%x2,%3";
> + }
> + else
> + {
> + if (element1)
> + op1_dword = 1 - op1_dword;
> +
> + if (element2)
> + op2_dword = 1 - op2_dword;
> +
> + operands[3] = GEN_INT (op1_dword + 2*op2_dword);
> + return "xxpermdi %x0,%x2,%x1,%3";
> + }
> +}
I think calling this with the rtx elementN args makes this only more
complicated (the function comment doesn't say what they are or what
NULL means, btw).
> (define_insn "vsx_concat_<mode>"
> - [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
> + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
> (vec_concat:VSX_D
> - (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b")
> - (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))]
> + (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b")
> + (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))]
> "VECTOR_MEM_VSX_P (<MODE>mode)"
> {
> if (which_alternative == 0)
> - return (BYTES_BIG_ENDIAN
> - ? "xxpermdi %x0,%x1,%x2,0"
> - : "xxpermdi %x0,%x2,%x1,0");
> + return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
>
> else if (which_alternative == 1)
> - return (BYTES_BIG_ENDIAN
> + return (VECTOR_ELT_ORDER_BIG
> ? "mtvsrdd %x0,%1,%2"
> : "mtvsrdd %x0,%2,%1");
This one could be
{
if (!BYTES_BIG_ENDIAN)
std::swap (operands[1], operands[2]);
switch (which_alternative)
{
case 0:
return "xxpermdi %x0,%x1,%x2,0";
case 1:
return "mtvsrdd %x0,%1,%2";
default:
gcc_unreachable ();
}
}
(Could/should we use xxmrghd instead? Do all supported assemblers know
that extended mnemonic, is it actually more readable?)
> --- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c (svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c) (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c (.../gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c) (revision 250640)
> @@ -0,0 +1,15 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx" } */
> +
> +vector double
> +test_vpasted (vector double high, vector double low)
> +{
> + vector double res;
> + res[1] = high[1];
> + res[0] = low[0];
> + return res;
> +}
> +
> +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
In this and the other testcase, should you test no other insns at all
are generated?
Segher
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
2017-07-28 21:09 ` Segher Boessenkool
@ 2017-07-30 14:01 ` Bill Schmidt
2017-07-31 17:41 ` Michael Meissner
2017-07-31 17:38 ` Michael Meissner
2017-08-02 14:29 ` Michael Meissner
2 siblings, 1 reply; 16+ messages in thread
From: Bill Schmidt @ 2017-07-30 14:01 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: Michael Meissner, GCC Patches, David Edelsohn
> On Jul 28, 2017, at 4:08 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>
> Hi!
>
> On Thu, Jul 27, 2017 at 07:21:14PM -0400, Michael Meissner wrote:
>> This patches optimizes the PowerPC vector set operation for 64-bit doubles and
>> longs where the elements in the vector set may have been extracted from another
>> vector (PR target/81593):
>
>> * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
>> emit XXPERMDI accessing either double word in either vector
>> register inputs.
>
> "emit" is not a good name for this: that is generally used for something
> that does emit_insn, i.e. put an insn in the instruction stream. This
> function returns a string a define_insn can return. For the rl* insns
> I called the similar functions rs6000_insn_for_*, maybe something like
> that is better here?
>
>> +/* Emit a XXPERMDI instruction that can extract from either double word of the
>> + two arguments. ELEMENT1 and ELEMENT2 are either NULL or they are 0/1 giving
>> + which double word to be used for the operand. */
>> +
>> +const char *
>> +rs6000_emit_xxpermdi (rtx operands[], rtx element1, rtx element2)
>> +{
>> + int op1_dword = (!element1) ? 0 : INTVAL (element1);
>> + int op2_dword = (!element2) ? 0 : INTVAL (element2);
>> +
>> + gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1));
>> +
>> + if (BYTES_BIG_ENDIAN)
>> + {
>> + operands[3] = GEN_INT (2*op1_dword + op2_dword);
>> + return "xxpermdi %x0,%x1,%x2,%3";
>> + }
>> + else
>> + {
>> + if (element1)
>> + op1_dword = 1 - op1_dword;
>> +
>> + if (element2)
>> + op2_dword = 1 - op2_dword;
>> +
>> + operands[3] = GEN_INT (op1_dword + 2*op2_dword);
>> + return "xxpermdi %x0,%x2,%x1,%3";
>> + }
>> +}
>
> I think calling this with the rtx elementN args makes this only more
> complicated (the function comment doesn't say what they are or what
> NULL means, btw).
>
>> (define_insn "vsx_concat_<mode>"
>> - [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
>> + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
>> (vec_concat:VSX_D
>> - (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b")
>> - (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))]
>> + (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b")
>> + (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))]
>> "VECTOR_MEM_VSX_P (<MODE>mode)"
>> {
>> if (which_alternative == 0)
>> - return (BYTES_BIG_ENDIAN
>> - ? "xxpermdi %x0,%x1,%x2,0"
>> - : "xxpermdi %x0,%x2,%x1,0");
>> + return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
>>
>> else if (which_alternative == 1)
>> - return (BYTES_BIG_ENDIAN
>> + return (VECTOR_ELT_ORDER_BIG
>> ? "mtvsrdd %x0,%1,%2"
>> : "mtvsrdd %x0,%2,%1");
>
> This one could be
>
> {
> if (!BYTES_BIG_ENDIAN)
!VECTOR_ELT_ORDER_BIG (to accommodate -maltivec=be). (We have some general bitrot associated with -maltivec=be that needs to be addressed, or we need to give up on it altogether. Still of two minds about this.)
Bill
> std::swap (operands[1], operands[2]);
>
> switch (which_alternative)
> {
> case 0:
> return "xxpermdi %x0,%x1,%x2,0";
> case 1:
> return "mtvsrdd %x0,%1,%2";
> default:
> gcc_unreachable ();
> }
> }
>
> (Could/should we use xxmrghd instead? Do all supported assemblers know
> that extended mnemonic, is it actually more readable?)
>
>> --- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c (svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c) (revision 0)
>> +++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c (.../gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c) (revision 250640)
>> @@ -0,0 +1,15 @@
>> +/* { dg-do compile { target { powerpc*-*-* } } } */
>> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
>> +/* { dg-require-effective-target powerpc_vsx_ok } */
>> +/* { dg-options "-O2 -mvsx" } */
>> +
>> +vector double
>> +test_vpasted (vector double high, vector double low)
>> +{
>> + vector double res;
>> + res[1] = high[1];
>> + res[0] = low[0];
>> + return res;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
>
> In this and the other testcase, should you test no other insns at all
> are generated?
>
>
> Segher
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
2017-07-28 21:09 ` Segher Boessenkool
2017-07-30 14:01 ` Bill Schmidt
@ 2017-07-31 17:38 ` Michael Meissner
2017-08-02 14:29 ` Michael Meissner
2 siblings, 0 replies; 16+ messages in thread
From: Michael Meissner @ 2017-07-31 17:38 UTC (permalink / raw)
To: Segher Boessenkool
Cc: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt
On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote:
> Hi!
>
> On Thu, Jul 27, 2017 at 07:21:14PM -0400, Michael Meissner wrote:
> > This patches optimizes the PowerPC vector set operation for 64-bit doubles and
> > longs where the elements in the vector set may have been extracted from another
> > vector (PR target/81593):
>
> > * config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
> > emit XXPERMDI accessing either double word in either vector
> > register inputs.
>
> "emit" is not a good name for this: that is generally used for something
> that does emit_insn, i.e. put an insn in the instruction stream. This
> function returns a string a define_insn can return. For the rl* insns
> I called the similar functions rs6000_insn_for_*, maybe something like
> that is better here?
Yeah, I should have used rs6000_output_xxpermdi or similar (or output_xxpermdi,
etc.), which is what the other functions used.
> > +/* Emit a XXPERMDI instruction that can extract from either double word of the
> > + two arguments. ELEMENT1 and ELEMENT2 are either NULL or they are 0/1 giving
> > + which double word to be used for the operand. */
> > +
> > +const char *
> > +rs6000_emit_xxpermdi (rtx operands[], rtx element1, rtx element2)
> > +{
> > + int op1_dword = (!element1) ? 0 : INTVAL (element1);
> > + int op2_dword = (!element2) ? 0 : INTVAL (element2);
> > +
> > + gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1));
> > +
> > + if (BYTES_BIG_ENDIAN)
> > + {
> > + operands[3] = GEN_INT (2*op1_dword + op2_dword);
> > + return "xxpermdi %x0,%x1,%x2,%3";
> > + }
> > + else
> > + {
> > + if (element1)
> > + op1_dword = 1 - op1_dword;
> > +
> > + if (element2)
> > + op2_dword = 1 - op2_dword;
> > +
> > + operands[3] = GEN_INT (op1_dword + 2*op2_dword);
> > + return "xxpermdi %x0,%x2,%x1,%3";
> > + }
> > +}
>
> I think calling this with the rtx elementN args makes this only more
> complicated (the function comment doesn't say what they are or what
> NULL means, btw).
Ok, let me think on it.
>
> > (define_insn "vsx_concat_<mode>"
> > - [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
> > + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
> > (vec_concat:VSX_D
> > - (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b")
> > - (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))]
> > + (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b")
> > + (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))]
> > "VECTOR_MEM_VSX_P (<MODE>mode)"
> > {
> > if (which_alternative == 0)
> > - return (BYTES_BIG_ENDIAN
> > - ? "xxpermdi %x0,%x1,%x2,0"
> > - : "xxpermdi %x0,%x2,%x1,0");
> > + return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
> >
> > else if (which_alternative == 1)
> > - return (BYTES_BIG_ENDIAN
> > + return (VECTOR_ELT_ORDER_BIG
> > ? "mtvsrdd %x0,%1,%2"
> > : "mtvsrdd %x0,%2,%1");
>
> This one could be
>
> {
> if (!BYTES_BIG_ENDIAN)
> std::swap (operands[1], operands[2]);
>
> switch (which_alternative)
> {
> case 0:
> return "xxpermdi %x0,%x1,%x2,0";
> case 1:
> return "mtvsrdd %x0,%1,%2";
> default:
> gcc_unreachable ();
> }
> }
> (Could/should we use xxmrghd instead? Do all supported assemblers know
> that extended mnemonic, is it actually more readable?)
For me no, xxpermdi is clearer. But if you want xxmrghd, I can do it.
> > --- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c (svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c) (revision 0)
> > +++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c (.../gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c) (revision 250640)
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile { target { powerpc*-*-* } } } */
> > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> > +/* { dg-require-effective-target powerpc_vsx_ok } */
> > +/* { dg-options "-O2 -mvsx" } */
> > +
> > +vector double
> > +test_vpasted (vector double high, vector double low)
> > +{
> > + vector double res;
> > + res[1] = high[1];
> > + res[0] = low[0];
> > + return res;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
>
> In this and the other testcase, should you test no other insns at all
> are generated?
It is kind of hard to test a negative, without trying to guess what possible
instructions could be generated.
--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
2017-07-30 14:01 ` Bill Schmidt
@ 2017-07-31 17:41 ` Michael Meissner
2017-07-31 17:42 ` Bill Schmidt
0 siblings, 1 reply; 16+ messages in thread
From: Michael Meissner @ 2017-07-31 17:41 UTC (permalink / raw)
To: Bill Schmidt
Cc: Segher Boessenkool, Michael Meissner, GCC Patches, David Edelsohn
On Sun, Jul 30, 2017 at 09:00:58AM -0500, Bill Schmidt wrote:
> >> (define_insn "vsx_concat_<mode>"
> >> - [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
> >> + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
> >> (vec_concat:VSX_D
> >> - (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b")
> >> - (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))]
> >> + (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b")
> >> + (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))]
> >> "VECTOR_MEM_VSX_P (<MODE>mode)"
> >> {
> >> if (which_alternative == 0)
> >> - return (BYTES_BIG_ENDIAN
> >> - ? "xxpermdi %x0,%x1,%x2,0"
> >> - : "xxpermdi %x0,%x2,%x1,0");
> >> + return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
> >>
> >> else if (which_alternative == 1)
> >> - return (BYTES_BIG_ENDIAN
> >> + return (VECTOR_ELT_ORDER_BIG
> >> ? "mtvsrdd %x0,%1,%2"
> >> : "mtvsrdd %x0,%2,%1");
> >
> > This one could be
> >
> > {
> > if (!BYTES_BIG_ENDIAN)
>
> !VECTOR_ELT_ORDER_BIG (to accommodate -maltivec=be). (We have some general bitrot associated with -maltivec=be that needs to be addressed, or we need to give up on it altogether. Still of two minds about this.)
>
> Bill
In this case, I believe I tested -maltivec=be, and BYTES_BIG_ENDIAN is correct
(I originally had it using VECTOR_ELT_ORDER_BIG and got failures). But I need
to look at it again.
--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
2017-07-31 17:41 ` Michael Meissner
@ 2017-07-31 17:42 ` Bill Schmidt
0 siblings, 0 replies; 16+ messages in thread
From: Bill Schmidt @ 2017-07-31 17:42 UTC (permalink / raw)
To: Michael Meissner; +Cc: Segher Boessenkool, GCC Patches, David Edelsohn
> On Jul 31, 2017, at 12:40 PM, Michael Meissner <meissner@linux.vnet.ibm.com> wrote:
>
> On Sun, Jul 30, 2017 at 09:00:58AM -0500, Bill Schmidt wrote:
>>>> (define_insn "vsx_concat_<mode>"
>>>> - [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
>>>> + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
>>>> (vec_concat:VSX_D
>>>> - (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b")
>>>> - (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))]
>>>> + (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b")
>>>> + (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))]
>>>> "VECTOR_MEM_VSX_P (<MODE>mode)"
>>>> {
>>>> if (which_alternative == 0)
>>>> - return (BYTES_BIG_ENDIAN
>>>> - ? "xxpermdi %x0,%x1,%x2,0"
>>>> - : "xxpermdi %x0,%x2,%x1,0");
>>>> + return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
>>>>
>>>> else if (which_alternative == 1)
>>>> - return (BYTES_BIG_ENDIAN
>>>> + return (VECTOR_ELT_ORDER_BIG
>>>> ? "mtvsrdd %x0,%1,%2"
>>>> : "mtvsrdd %x0,%2,%1");
>>>
>>> This one could be
>>>
>>> {
>>> if (!BYTES_BIG_ENDIAN)
>>
>> !VECTOR_ELT_ORDER_BIG (to accommodate -maltivec=be). (We have some general bitrot associated with -maltivec=be that needs to be addressed, or we need to give up on it altogether. Still of two minds about this.)
>>
>> Bill
>
> In this case, I believe I tested -maltivec=be, and BYTES_BIG_ENDIAN is correct
> (I originally had it using VECTOR_ELT_ORDER_BIG and got failures). But I need
> to look at it again.
Hi Mike,
You misunderstand me, I think you had it right (you did move to VECTOR_ELT_ORDER_BIG here)
but I just wanted to clarify that Segher's suggestion would also need to use VECTOR_ELT_ORDER_BIG.
Thanks,
Bill
>
> --
> Michael Meissner, IBM
> IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
> email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
2017-07-28 21:09 ` Segher Boessenkool
2017-07-30 14:01 ` Bill Schmidt
2017-07-31 17:38 ` Michael Meissner
@ 2017-08-02 14:29 ` Michael Meissner
2017-08-03 15:01 ` Segher Boessenkool
2 siblings, 1 reply; 16+ messages in thread
From: Michael Meissner @ 2017-08-02 14:29 UTC (permalink / raw)
To: Segher Boessenkool
Cc: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt
[-- Attachment #1: Type: text/plain, Size: 1951 bytes --]
On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote:
>
> "emit" is not a good name for this: that is generally used for something
> that does emit_insn, i.e. put an insn in the instruction stream. This
> function returns a string a define_insn can return. For the rl* insns
> I called the similar functions rs6000_insn_for_*, maybe something like
> that is better here?
...
> I think calling this with the rtx elementN args makes this only more
> complicated (the function comment doesn't say what they are or what
> NULL means, btw).
...
> In this and the other testcase, should you test no other insns at all
> are generated?
Here are the revised patches. I tested on a little endian power8 system and a
big endian power7 system. Are these patches ok for the trunk?
[gcc]
2017-08-02 Michael Meissner <meissner@linux.vnet.ibm.com>
PR target/81593
* config/rs6000/rs6000-protos.h (rs6000_output_xxpermdi): New
declaration.
* config/rs6000/rs6000.c (rs6000_output_xxpermdi): New function to
emit XXPERMDI accessing either double word in either vector
register inputs.
* config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D iterator):
Rewrite VEC_CONCAT insn to call rs6000_output_xxpermdi. Simplify
the constraints with the removal of the -mupper-regs-* switches.
(vsx_concat_<mode>_1): New combiner insns to optimize CONCATs
where either register might have come from VEC_SELECT.
(vsx_concat_<mode>_2): Likewise.
(vsx_concat_<mode>_3): Likewise.
(vsx_set_<mode>, VSX_D iterator): Rewrite insn to generate a
VEC_CONCAT rather than use an UNSPEC to specify the option.
[gcc/testsuite]
2017-08-02 Michael Meissner <meissner@linux.vnet.ibm.com>
PR target/81593
* gcc.target/powerpc/vsx-extract-6.c: New test.
* gcc.target/powerpc/vsx-extract-7.c: Likewise.
--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
[-- Attachment #2: pr81593.patch02b --]
[-- Type: text/plain, Size: 10486 bytes --]
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 250793)
+++ gcc/config/rs6000/rs6000-protos.h (.../gcc/config/rs6000) (working copy)
@@ -233,6 +233,7 @@ extern void rs6000_asm_output_dwarf_pcre
const char *label);
extern void rs6000_asm_output_dwarf_datarel (FILE *file, int size,
const char *label);
+extern const char *rs6000_output_xxpermdi (rtx, rtx, rtx, rtx, rtx);
/* Declare functions in rs6000-c.c */
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 250793)
+++ gcc/config/rs6000/rs6000.c (.../gcc/config/rs6000) (working copy)
@@ -39007,6 +39007,60 @@ rs6000_optab_supported_p (int op, machin
return true;
}
}
+
+\f
+/* Output a xxpermdi instruction that sets a 128-bit vector DEST combining two
+ inputs SRC1 and SRC2.
+
+ If ELEMENT1 is null, use the top 64-bit double word of ARG1. If it is
+ non-NULL, it is a 0 or 1 constant that gives the vector element number to
+ use for extracting the 64-bit double word from ARG1.
+
+ If ELEMENT2 is null, use the top 64-bit double word of ARG2. If it is
+ non-NULL, it is a 0 or 1 constant that gives the vector element number to
+ use for extracting the 64-bit double word from ARG2.
+
+ The element number is based on the user element ordering, set by the
+ endianess and by the -maltivec={le,be} options. */
+
+const char *
+rs6000_output_xxpermdi (rtx dest,
+ rtx src1,
+ rtx src2,
+ rtx element1,
+ rtx element2)
+{
+ int op1_dword = (!element1) ? 0 : INTVAL (element1);
+ int op2_dword = (!element2) ? 0 : INTVAL (element2);
+ rtx xops[10];
+ const char *insn_string;
+
+ gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1));
+ xops[0] = dest;
+ xops[1] = src1;
+ xops[2] = src2;
+
+ if (BYTES_BIG_ENDIAN)
+ {
+ xops[3] = GEN_INT (2*op1_dword + op2_dword);
+ insn_string = "xxpermdi %x0,%x1,%x2,%3";
+ }
+ else
+ {
+ if (element1)
+ op1_dword = 1 - op1_dword;
+
+ if (element2)
+ op2_dword = 1 - op2_dword;
+
+ xops[3] = GEN_INT (op1_dword + 2*op2_dword);
+ insn_string = "xxpermdi %x0,%x2,%x1,%3";
+ }
+
+ output_asm_insn (insn_string, xops);
+ return "";
+}
+
\f
struct gcc_target targetm = TARGET_INITIALIZER;
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 250793)
+++ gcc/config/rs6000/vsx.md (.../gcc/config/rs6000) (working copy)
@@ -2364,19 +2364,18 @@ (define_insn "*vsx_float_fix_v2df2"
;; Build a V2DF/V2DI vector from two scalars
(define_insn "vsx_concat_<mode>"
- [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
+ [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
(vec_concat:VSX_D
- (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b")
- (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))]
+ (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b")
+ (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))]
"VECTOR_MEM_VSX_P (<MODE>mode)"
{
if (which_alternative == 0)
- return (BYTES_BIG_ENDIAN
- ? "xxpermdi %x0,%x1,%x2,0"
- : "xxpermdi %x0,%x2,%x1,0");
+ return rs6000_output_xxpermdi (operands[0], operands[1], operands[2],
+ NULL_RTX, NULL_RTX);
else if (which_alternative == 1)
- return (BYTES_BIG_ENDIAN
+ return (VECTOR_ELT_ORDER_BIG
? "mtvsrdd %x0,%1,%2"
: "mtvsrdd %x0,%2,%1");
@@ -2385,6 +2384,50 @@ (define_insn "vsx_concat_<mode>"
}
[(set_attr "type" "vecperm")])
+;; Combiner patterns to allow creating XXPERMDI's to access either double
+;; register in a vector register. Note, rs6000_output_xxpermdi expects
+;; operands[0..2] to be the vector registers.
+(define_insn "*vsx_concat_<mode>_1"
+ [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+ (vec_concat:VSX_D
+ (vec_select:<VS_scalar>
+ (match_operand:VSX_D 1 "gpc_reg_operand" "wa")
+ (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")]))
+ (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa")))]
+ "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+ return rs6000_output_xxpermdi (operands[0], operands[1], operands[2],
+ operands[3], NULL_RTX);
+})
+
+(define_insn "*vsx_concat_<mode>_2"
+ [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+ (vec_concat:VSX_D
+ (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa")
+ (vec_select:<VS_scalar>
+ (match_operand:VSX_D 2 "gpc_reg_operand" "wa")
+ (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")]))))]
+ "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+ return rs6000_output_xxpermdi (operands[0], operands[1], operands[2],
+ NULL_RTX, operands[3]);
+})
+
+(define_insn "*vsx_concat_<mode>_3"
+ [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+ (vec_concat:VSX_D
+ (vec_select:<VS_scalar>
+ (match_operand:VSX_D 1 "gpc_reg_operand" "wa")
+ (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")]))
+ (vec_select:<VS_scalar>
+ (match_operand:VSX_D 2 "gpc_reg_operand" "wa")
+ (parallel [(match_operand:QI 4 "const_0_to_1_operand" "n")]))))]
+ "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+ return rs6000_output_xxpermdi (operands[0], operands[1], operands[2],
+ operands[3], operands[4]);
+})
+
;; Special purpose concat using xxpermdi to glue two single precision values
;; together, relying on the fact that internally scalar floats are represented
;; as doubles. This is used to initialize a V4SF vector with 4 floats
@@ -2585,25 +2628,35 @@ (define_expand "vsx_set_v1ti"
DONE;
})
-;; Set the element of a V2DI/VD2F mode
-(define_insn "vsx_set_<mode>"
- [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wd,?<VSa>")
- (unspec:VSX_D
- [(match_operand:VSX_D 1 "vsx_register_operand" "wd,<VSa>")
- (match_operand:<VS_scalar> 2 "vsx_register_operand" "<VS_64reg>,<VSa>")
- (match_operand:QI 3 "u5bit_cint_operand" "i,i")]
- UNSPEC_VSX_SET))]
+;; Rewrite V2DF/V2DI set in terms of VEC_CONCAT
+(define_expand "vsx_set_<mode>"
+ [(use (match_operand:VSX_D 0 "vsx_register_operand"))
+ (use (match_operand:VSX_D 1 "vsx_register_operand"))
+ (use (match_operand:<VS_scalar> 2 "gpc_reg_operand"))
+ (use (match_operand:QI 3 "const_0_to_1_operand"))]
"VECTOR_MEM_VSX_P (<MODE>mode)"
{
- int idx_first = BYTES_BIG_ENDIAN ? 0 : 1;
- if (INTVAL (operands[3]) == idx_first)
- return \"xxpermdi %x0,%x2,%x1,1\";
- else if (INTVAL (operands[3]) == 1 - idx_first)
- return \"xxpermdi %x0,%x1,%x2,0\";
+ rtx dest = operands[0];
+ rtx vec_reg = operands[1];
+ rtx value = operands[2];
+ rtx ele = operands[3];
+ rtx tmp = gen_reg_rtx (<VS_scalar>mode);
+
+ if (ele == const0_rtx)
+ {
+ emit_insn (gen_vsx_extract_<mode> (tmp, vec_reg, const1_rtx));
+ emit_insn (gen_vsx_concat_<mode> (dest, value, tmp));
+ DONE;
+ }
+ else if (ele == const1_rtx)
+ {
+ emit_insn (gen_vsx_extract_<mode> (tmp, vec_reg, const0_rtx));
+ emit_insn (gen_vsx_concat_<mode> (dest, tmp, value));
+ DONE;
+ }
else
gcc_unreachable ();
-}
- [(set_attr "type" "vecperm")])
+})
;; Extract a DF/DI element from V2DF/V2DI
;; Optimize cases were we can do a simple or direct move.
Index: gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c (.../gcc/testsuite/gcc.target/powerpc) (revision 250804)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+vector unsigned long
+test_vpasted (vector unsigned long high, vector unsigned long low)
+{
+ vector unsigned long res;
+ res[1] = high[1];
+ res[0] = low[0];
+ return res;
+}
+
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
+/* { dg-final { scan-assembler-not {\mvspltisw\M} } } */
+/* { dg-final { scan-assembler-not {\mxxlor\M} } } */
+/* { dg-final { scan-assembler-not {\mxxlxor\M} } } */
+/* { dg-final { scan-assembler-not {\mxxspltib\M} } } */
+/* { dg-final { scan-assembler-not {\mlxvx?\M} } } */
+/* { dg-final { scan-assembler-not {\mlxv[dw][24]x\M} } } */
+/* { dg-final { scan-assembler-not {\mlvx\M} } } */
+/* { dg-final { scan-assembler-not {\mstxvx?\M} } } */
+/* { dg-final { scan-assembler-not {\mstxv[dw][24]x\M} } } */
+/* { dg-final { scan-assembler-not {\mstvx\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c (.../gcc/testsuite/gcc.target/powerpc) (revision 250804)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+vector double
+test_vpasted (vector double high, vector double low)
+{
+ vector double res;
+ res[1] = high[1];
+ res[0] = low[0];
+ return res;
+}
+
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
+/* { dg-final { scan-assembler-not {\mvspltisw\M} } } */
+/* { dg-final { scan-assembler-not {\mxxlor\M} } } */
+/* { dg-final { scan-assembler-not {\mxxlxor\M} } } */
+/* { dg-final { scan-assembler-not {\mxxspltib\M} } } */
+/* { dg-final { scan-assembler-not {\mlxvx?\M} } } */
+/* { dg-final { scan-assembler-not {\mlxv[dw][24]x\M} } } */
+/* { dg-final { scan-assembler-not {\mlvx\M} } } */
+/* { dg-final { scan-assembler-not {\mstxvx?\M} } } */
+/* { dg-final { scan-assembler-not {\mstxv[dw][24]x\M} } } */
+/* { dg-final { scan-assembler-not {\mstvx\M} } } */
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
2017-08-02 14:29 ` Michael Meissner
@ 2017-08-03 15:01 ` Segher Boessenkool
2017-08-03 17:23 ` Michael Meissner
2017-08-07 13:18 ` Michael Meissner
0 siblings, 2 replies; 16+ messages in thread
From: Segher Boessenkool @ 2017-08-03 15:01 UTC (permalink / raw)
To: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt
Hi Mike,
On Wed, Aug 02, 2017 at 10:28:55AM -0400, Michael Meissner wrote:
> On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote:
> > I think calling this with the rtx elementN args makes this only more
> > complicated (the function comment doesn't say what they are or what
> > NULL means, btw).
You didn't handle the first part of this as far as I see? It's the
big complicating issue here.
> + If ELEMENT1 is null, use the top 64-bit double word of ARG1. If it is
> + non-NULL, it is a 0 or 1 constant that gives the vector element number to
> + use for extracting the 64-bit double word from ARG1.
> +
> + If ELEMENT2 is null, use the top 64-bit double word of ARG2. If it is
> + non-NULL, it is a 0 or 1 constant that gives the vector element number to
> + use for extracting the 64-bit double word from ARG2.
> +
> + The element number is based on the user element ordering, set by the
> + endianess and by the -maltivec={le,be} options. */
("endianness", two n's).
I don't like using NULL as a magic value at all; it does not simplify
this interface, it complicates it instead.
Can you move the "which half is high" decision to the callers?
Segher
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
2017-08-03 15:01 ` Segher Boessenkool
@ 2017-08-03 17:23 ` Michael Meissner
2017-08-07 13:18 ` Michael Meissner
1 sibling, 0 replies; 16+ messages in thread
From: Michael Meissner @ 2017-08-03 17:23 UTC (permalink / raw)
To: Segher Boessenkool
Cc: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt
On Thu, Aug 03, 2017 at 10:01:41AM -0500, Segher Boessenkool wrote:
> Hi Mike,
>
> On Wed, Aug 02, 2017 at 10:28:55AM -0400, Michael Meissner wrote:
> > On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote:
> > > I think calling this with the rtx elementN args makes this only more
> > > complicated (the function comment doesn't say what they are or what
> > > NULL means, btw).
>
> You didn't handle the first part of this as far as I see? It's the
> big complicating issue here.
I am not sure exactly what you are asking for. This is like the other output
functions that take the rtx insns.
> > + If ELEMENT1 is null, use the top 64-bit double word of ARG1. If it is
> > + non-NULL, it is a 0 or 1 constant that gives the vector element number to
> > + use for extracting the 64-bit double word from ARG1.
> > +
> > + If ELEMENT2 is null, use the top 64-bit double word of ARG2. If it is
> > + non-NULL, it is a 0 or 1 constant that gives the vector element number to
> > + use for extracting the 64-bit double word from ARG2.
> > +
> > + The element number is based on the user element ordering, set by the
> > + endianess and by the -maltivec={le,be} options. */
>
> ("endianness", two n's).
>
> I don't like using NULL as a magic value at all; it does not simplify
> this interface, it complicates it instead.
>
> Can you move the "which half is high" decision to the callers?
And then essentially there is no need for the function, since each of the 4
concat variants have to have the logic to support big endian, -maltivec=be, and
-maltivec=le.
Let me see what I can do about it.
--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
2017-08-03 15:01 ` Segher Boessenkool
2017-08-03 17:23 ` Michael Meissner
@ 2017-08-07 13:18 ` Michael Meissner
2017-08-07 21:54 ` Segher Boessenkool
1 sibling, 1 reply; 16+ messages in thread
From: Michael Meissner @ 2017-08-07 13:18 UTC (permalink / raw)
To: Segher Boessenkool
Cc: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt
[-- Attachment #1: Type: text/plain, Size: 3765 bytes --]
On Thu, Aug 03, 2017 at 10:01:41AM -0500, Segher Boessenkool wrote:
> Hi Mike,
>
> On Wed, Aug 02, 2017 at 10:28:55AM -0400, Michael Meissner wrote:
> > On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote:
> > > I think calling this with the rtx elementN args makes this only more
> > > complicated (the function comment doesn't say what they are or what
> > > NULL means, btw).
>
> You didn't handle the first part of this as far as I see? It's the
> big complicating issue here.
>
> > + If ELEMENT1 is null, use the top 64-bit double word of ARG1. If it is
> > + non-NULL, it is a 0 or 1 constant that gives the vector element number to
> > + use for extracting the 64-bit double word from ARG1.
> > +
> > + If ELEMENT2 is null, use the top 64-bit double word of ARG2. If it is
> > + non-NULL, it is a 0 or 1 constant that gives the vector element number to
> > + use for extracting the 64-bit double word from ARG2.
> > +
> > + The element number is based on the user element ordering, set by the
> > + endianess and by the -maltivec={le,be} options. */
>
> ("endianness", two n's).
>
> I don't like using NULL as a magic value at all; it does not simplify
> this interface, it complicates it instead.
>
> Can you move the "which half is high" decision to the callers?
I rewrote the patch to eliminate the rs6000_output_xxpermdi function, and do
the calculation of the XXPERMDI mask in each of the vsx_concat_<mask>_{1,2,3}
insns. Just to be sure I got things correct, I wrote a new executable test
that tests various methods of creating/inserting 2 element vectors with double
word elements, and tested in BE, LE -maltivec=be, and LE, and the results match
previous compilers.
I have done bootstrap/build checks on a big endian power7, a little endian
power8 system, and I have done a non-bootstrap/check on a power9 prototype (I
have script issues that prevents a bootstrap build on power9 that I need to
look into). There are no regressions in the tests and the new tests were run
on each of the systems. Can I check this into the trunk?
I would also like to backport it to all open branches (particularly GCC 7, but
GCC 6 if possible). Note, the patch will need a slight tweak on the older
systems due to GCC 7 still supporting -mupper-regs-{df,di} and I have to adjust
the constraints to accomidate this, and under GCC 6 DImode not being allowed in
traditional Altivec registers.
[gcc]
2017-08-07 Michael Meissner <meissner@linux.vnet.ibm.com>
PR target/81593
* config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D): Cleanup
constraints since the -mupper-regs-* switches have been
eliminated.
(vsx_concat_<mode>_1): New combiner insns to recognize inserting
into a vector from a double word element that was extracted from
another vector, and eliminate extra XXPERMDI instructions.
(vsx_concat_<mode>_2): Likewise.
(vsx_concat_<mode>_3): Likewise.
(vsx_set_<mode>, VSX_D): Rewrite vector set in terms of vector
concat to allow optimizing inserts from previous extracts.
[gcc/testsuite]
2017-08-07 Michael Meissner <meissner@linux.vnet.ibm.com>
PR target/81593
* gcc.target/powerpc/vec-setup.h: New tests to test various
combinations of setting up vectors of 2 double word elements.
* gcc.target/powerpc/vec-setup-long.c: Likewise.
* gcc.target/powerpc/vec-setup-double.c: Likewise.
* gcc.target/powerpc/vec-setup-be-long.c: Likewise.
* gcc.target/powerpc/vec-setup-be-double.c: Likewise.
* gcc.target/powerpc/vsx-extract-6.c: New tests for optimzing
vector inserts from vector extracts.
* gcc.target/powerpc/vsx-extract-7.c: Likewise.
--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
[-- Attachment #2: pr81593.patch05b --]
[-- Type: text/plain, Size: 21249 bytes --]
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 250858)
+++ gcc/config/rs6000/vsx.md (.../gcc/config/rs6000) (working copy)
@@ -2364,10 +2364,10 @@ (define_insn "*vsx_float_fix_v2df2"
;; Build a V2DF/V2DI vector from two scalars
(define_insn "vsx_concat_<mode>"
- [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
+ [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
(vec_concat:VSX_D
- (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b")
- (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))]
+ (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b")
+ (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))]
"VECTOR_MEM_VSX_P (<MODE>mode)"
{
if (which_alternative == 0)
@@ -2385,6 +2385,80 @@ (define_insn "vsx_concat_<mode>"
}
[(set_attr "type" "vecperm")])
+;; Combiner patterns to allow creating XXPERMDI's to access either double
+;; word element in a vector register.
+(define_insn "*vsx_concat_<mode>_1"
+ [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+ (vec_concat:VSX_D
+ (vec_select:<VS_scalar>
+ (match_operand:VSX_D 1 "gpc_reg_operand" "wa")
+ (parallel [(match_operand:QI 2 "const_0_to_1_operand" "n")]))
+ (match_operand:<VS_scalar> 3 "gpc_reg_operand" "wa")))]
+ "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+ HOST_WIDE_INT dword = INTVAL (operands[2]);
+ if (BYTES_BIG_ENDIAN)
+ {
+ operands[4] = GEN_INT (2*dword);
+ return "xxpermdi %x0,%x1,%x3,%4";
+ }
+ else
+ {
+ operands[4] = GEN_INT (!dword);
+ return "xxpermdi %x0,%x3,%x1,%4";
+ }
+}
+ [(set_attr "type" "vecperm")])
+
+(define_insn "*vsx_concat_<mode>_2"
+ [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+ (vec_concat:VSX_D
+ (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa")
+ (vec_select:<VS_scalar>
+ (match_operand:VSX_D 2 "gpc_reg_operand" "wa")
+ (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")]))))]
+ "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+ HOST_WIDE_INT dword = INTVAL (operands[3]);
+ if (BYTES_BIG_ENDIAN)
+ {
+ operands[4] = GEN_INT (dword);
+ return "xxpermdi %x0,%x1,%x2,%4";
+ }
+ else
+ {
+ operands[4] = GEN_INT (2 * !dword);
+ return "xxpermdi %x0,%x2,%x1,%4";
+ }
+}
+ [(set_attr "type" "vecperm")])
+
+(define_insn "*vsx_concat_<mode>_3"
+ [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+ (vec_concat:VSX_D
+ (vec_select:<VS_scalar>
+ (match_operand:VSX_D 1 "gpc_reg_operand" "wa")
+ (parallel [(match_operand:QI 2 "const_0_to_1_operand" "n")]))
+ (vec_select:<VS_scalar>
+ (match_operand:VSX_D 3 "gpc_reg_operand" "wa")
+ (parallel [(match_operand:QI 4 "const_0_to_1_operand" "n")]))))]
+ "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+ HOST_WIDE_INT dword1 = INTVAL (operands[2]);
+ HOST_WIDE_INT dword2 = INTVAL (operands[4]);
+ if (BYTES_BIG_ENDIAN)
+ {
+ operands[5] = GEN_INT ((2 * dword1) + dword2);
+ return "xxpermdi %x0,%x1,%x3,%5";
+ }
+ else
+ {
+ operands[5] = GEN_INT ((2 * !dword2) + !dword1);
+ return "xxpermdi %x0,%x3,%x1,%5";
+ }
+}
+ [(set_attr "type" "vecperm")])
+
;; Special purpose concat using xxpermdi to glue two single precision values
;; together, relying on the fact that internally scalar floats are represented
;; as doubles. This is used to initialize a V4SF vector with 4 floats
@@ -2585,25 +2659,35 @@ (define_expand "vsx_set_v1ti"
DONE;
})
-;; Set the element of a V2DI/VD2F mode
-(define_insn "vsx_set_<mode>"
- [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wd,?<VSa>")
- (unspec:VSX_D
- [(match_operand:VSX_D 1 "vsx_register_operand" "wd,<VSa>")
- (match_operand:<VS_scalar> 2 "vsx_register_operand" "<VS_64reg>,<VSa>")
- (match_operand:QI 3 "u5bit_cint_operand" "i,i")]
- UNSPEC_VSX_SET))]
+;; Rewrite V2DF/V2DI set in terms of VEC_CONCAT
+(define_expand "vsx_set_<mode>"
+ [(use (match_operand:VSX_D 0 "vsx_register_operand"))
+ (use (match_operand:VSX_D 1 "vsx_register_operand"))
+ (use (match_operand:<VS_scalar> 2 "gpc_reg_operand"))
+ (use (match_operand:QI 3 "const_0_to_1_operand"))]
"VECTOR_MEM_VSX_P (<MODE>mode)"
{
- int idx_first = BYTES_BIG_ENDIAN ? 0 : 1;
- if (INTVAL (operands[3]) == idx_first)
- return \"xxpermdi %x0,%x2,%x1,1\";
- else if (INTVAL (operands[3]) == 1 - idx_first)
- return \"xxpermdi %x0,%x1,%x2,0\";
+ rtx dest = operands[0];
+ rtx vec_reg = operands[1];
+ rtx value = operands[2];
+ rtx ele = operands[3];
+ rtx tmp = gen_reg_rtx (<VS_scalar>mode);
+
+ if (ele == const0_rtx)
+ {
+ emit_insn (gen_vsx_extract_<mode> (tmp, vec_reg, const1_rtx));
+ emit_insn (gen_vsx_concat_<mode> (dest, value, tmp));
+ DONE;
+ }
+ else if (ele == const1_rtx)
+ {
+ emit_insn (gen_vsx_extract_<mode> (tmp, vec_reg, const0_rtx));
+ emit_insn (gen_vsx_concat_<mode> (dest, tmp, value));
+ DONE;
+ }
else
gcc_unreachable ();
-}
- [(set_attr "type" "vecperm")])
+})
;; Extract a DF/DI element from V2DF/V2DI
;; Optimize cases were we can do a simple or direct move.
Index: gcc/testsuite/gcc.target/powerpc/vec-setup-be-long.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-setup-be-long.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-setup-be-long.c (.../gcc/testsuite/gcc.target/powerpc) (revision 250878)
@@ -0,0 +1,11 @@
+/* { dg-do run { target { powerpc64le*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx -maltivec=be" } */
+
+/* Test various ways of creating vectors with 2 double words and accessing the
+ elements. This test uses the long (on 64-bit systems) or long long datatype
+ (on 32-bit systems).
+
+ This test explicitly tests -maltivec=be to make sure things are correct. */
+
+#include "vec-setup.h"
Index: gcc/testsuite/gcc.target/powerpc/vec-setup.h
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-setup.h (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-setup.h (.../gcc/testsuite/gcc.target/powerpc) (revision 250878)
@@ -0,0 +1,366 @@
+#include <altivec.h>
+
+/* Test various ways of creating vectors with 2 double words and accessing the
+ elements. This include files supports:
+
+ testing double
+ testing long on 64-bit systems
+ testing long long on 32-bit systems.
+
+ The endian support is:
+
+ big endian
+ little endian with little endian element ordering
+ little endian with big endian element ordering. */
+
+#ifdef DEBUG
+#include <stdio.h>
+#define DEBUG0(STR) fputs (STR, stdout)
+#define DEBUG2(STR,A,B) printf (STR, A, B)
+
+static int errors = 0;
+
+#else
+#include <stdlib.h>
+#define DEBUG0(STR)
+#define DEBUG2(STR,A,B)
+#endif
+
+#if defined(DO_DOUBLE)
+#define TYPE double
+#define STYPE "double"
+#define ZERO 0.0
+#define ONE 1.0
+#define TWO 2.0
+#define THREE 3.0
+#define FOUR 4.0
+#define FIVE 5.0
+#define SIX 6.0
+#define FMT "g"
+
+#elif defined(_ARCH_PPC64)
+#define TYPE long
+#define STYPE "long"
+#define ZERO 0L
+#define ONE 1L
+#define TWO 2L
+#define THREE 3L
+#define FOUR 4L
+#define FIVE 5L
+#define SIX 6L
+#define FMT "ld"
+
+#else
+#define TYPE long long
+#define STYPE "long long"
+#define ZERO 0LL
+#define ONE 1LL
+#define TWO 2LL
+#define THREE 3LL
+#define FOUR 4LL
+#define FIVE 5LL
+#define SIX 6LL
+#define FMT "lld"
+#endif
+
+/* Macros to order the left/right values correctly. Note, -maltivec=be does
+ not change the order for static initializations, so we have to handle it
+ specially. */
+
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+#define INIT_ORDER(A, B) (TYPE) A, (TYPE) B
+#define ELEMENT_ORDER(A, B) (TYPE) A, (TYPE) B
+#define ENDIAN "-mbig"
+
+#elif __VEC_ELEMENT_REG_ORDER__ == __ORDER_BIG_ENDIAN__
+#define NO_ARRAY
+#define INIT_ORDER(A, B) (TYPE) B, (TYPE) A
+#define ELEMENT_ORDER(A, B) (TYPE) A, (TYPE) B
+#define ENDIAN "-mlittle -maltivec=be"
+
+#else
+#define INIT_ORDER(A, B) (TYPE) B, (TYPE) A
+#define ELEMENT_ORDER(A, B) (TYPE) B, (TYPE) A
+#define ENDIAN "-mlittle"
+#endif
+
+static volatile TYPE five = FIVE;
+static volatile TYPE six = SIX;
+static volatile vector TYPE s_v12 = { ONE, TWO };
+static volatile vector TYPE g_v34 = { THREE, FOUR };
+
+
+__attribute__((__noinline__))
+static void
+vector_check (vector TYPE v, TYPE expect_hi, TYPE expect_lo)
+{
+ TYPE actual_hi, actual_lo;
+#ifdef DEBUG
+ const char *pass_fail;
+#endif
+
+ __asm__ ("xxlor %x0,%x1,%x1" : "=&wa" (actual_hi) : "wa" (v));
+ __asm__ ("xxpermdi %x0,%x1,%x1,3" : "=&wa" (actual_lo) : "wa" (v));
+
+#ifdef DEBUG
+ if ((actual_hi == expect_hi) && (actual_lo == expect_lo))
+ pass_fail = ", pass";
+ else
+ {
+ pass_fail = ", fail";
+ errors++;
+ }
+
+ printf ("Expected %" FMT ", %" FMT ", got %" FMT ", %" FMT "%s\n",
+ expect_hi, expect_lo,
+ actual_hi, actual_lo,
+ pass_fail);
+#else
+ if ((actual_hi != expect_hi) || (actual_lo != expect_lo))
+ abort ();
+#endif
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+combine (TYPE op0, TYPE op1)
+{
+ return (vector TYPE) { op0, op1 };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+combine_insert (TYPE op0, TYPE op1)
+{
+ vector TYPE ret = (vector TYPE) { ZERO, ZERO };
+ ret = vec_insert (op0, ret, 0);
+ ret = vec_insert (op1, ret, 1);
+ return ret;
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract_00 (vector TYPE a, vector TYPE b)
+{
+ return (vector TYPE) { vec_extract (a, 0), vec_extract (b, 0) };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract_01 (vector TYPE a, vector TYPE b)
+{
+ return (vector TYPE) { vec_extract (a, 0), vec_extract (b, 1) };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract_10 (vector TYPE a, vector TYPE b)
+{
+ return (vector TYPE) { vec_extract (a, 1), vec_extract (b, 0) };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract_11 (vector TYPE a, vector TYPE b)
+{
+ return (vector TYPE) { vec_extract (a, 1), vec_extract (b, 1) };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract2_0s (vector TYPE a, TYPE b)
+{
+ return (vector TYPE) { vec_extract (a, 0), b };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract2_1s (vector TYPE a, TYPE b)
+{
+ return (vector TYPE) { vec_extract (a, 1), b };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract2_s0 (TYPE a, vector TYPE b)
+{
+ return (vector TYPE) { a, vec_extract (b, 0) };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract2_s1 (TYPE a, vector TYPE b)
+{
+ return (vector TYPE) { a, vec_extract (b, 1) };
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+concat_extract_nn (vector TYPE a, vector TYPE b, size_t i, size_t j)
+{
+ return (vector TYPE) { vec_extract (a, i), vec_extract (b, j) };
+}
+
+#ifndef NO_ARRAY
+__attribute__((__noinline__))
+static vector TYPE
+array_0 (vector TYPE v, TYPE a)
+{
+ v[0] = a;
+ return v;
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+array_1 (vector TYPE v, TYPE a)
+{
+ v[1] = a;
+ return v;
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+array_01 (vector TYPE v, TYPE a, TYPE b)
+{
+ v[0] = a;
+ v[1] = b;
+ return v;
+}
+
+__attribute__((__noinline__))
+static vector TYPE
+array_01b (TYPE a, TYPE b)
+{
+ vector TYPE v = (vector TYPE) { 0, 0 };
+ v[0] = a;
+ v[1] = b;
+ return v;
+}
+#endif
+
+int
+main (void)
+{
+ vector TYPE a = (vector TYPE) { ONE, TWO };
+ vector TYPE b = (vector TYPE) { THREE, FOUR };
+ size_t i, j;
+
+#ifndef NO_ARRAY
+ vector TYPE z = (vector TYPE) { ZERO, ZERO };
+#endif
+
+ DEBUG2 ("Endian: %s, type: %s\n", ENDIAN, STYPE);
+ DEBUG0 ("\nStatic/global initialization\n");
+ vector_check (s_v12, INIT_ORDER (1, 2));
+ vector_check (g_v34, INIT_ORDER (3, 4));
+
+ DEBUG0 ("\nVector via constant runtime intiialization\n");
+ vector_check (a, INIT_ORDER (1, 2));
+ vector_check (b, INIT_ORDER (3, 4));
+
+ DEBUG0 ("\nCombine scalars using vector initialization\n");
+ vector_check (combine (1, 2), INIT_ORDER (1, 2));
+ vector_check (combine (3, 4), INIT_ORDER (3, 4));
+
+ DEBUG0 ("\nSetup with vec_insert\n");
+ a = combine_insert (1, 2);
+ b = combine_insert (3, 4);
+ vector_check (a, ELEMENT_ORDER (1, 2));
+ vector_check (b, ELEMENT_ORDER (3, 4));
+
+#ifndef NO_ARRAY
+ DEBUG0 ("\nTesting array syntax\n");
+ vector_check (array_0 (a, FIVE), ELEMENT_ORDER (5, 2));
+ vector_check (array_1 (b, SIX), ELEMENT_ORDER (3, 6));
+ vector_check (array_01 (z, FIVE, SIX), ELEMENT_ORDER (5, 6));
+ vector_check (array_01b (FIVE, SIX), ELEMENT_ORDER (5, 6));
+
+ vector_check (array_0 (a, five), ELEMENT_ORDER (5, 2));
+ vector_check (array_1 (b, six), ELEMENT_ORDER (3, 6));
+ vector_check (array_01 (z, five, six), ELEMENT_ORDER (5, 6));
+ vector_check (array_01b (five, six), ELEMENT_ORDER (5, 6));
+#else
+ DEBUG0 ("\nSkipping array syntax on -maltivec=be\n");
+#endif
+
+ DEBUG0 ("\nTesting concat and extract\n");
+ vector_check (concat_extract_00 (a, b), INIT_ORDER (1, 3));
+ vector_check (concat_extract_01 (a, b), INIT_ORDER (1, 4));
+ vector_check (concat_extract_10 (a, b), INIT_ORDER (2, 3));
+ vector_check (concat_extract_11 (a, b), INIT_ORDER (2, 4));
+
+ DEBUG0 ("\nTesting concat and extract #2\n");
+ vector_check (concat_extract2_0s (a, FIVE), INIT_ORDER (1, 5));
+ vector_check (concat_extract2_1s (a, FIVE), INIT_ORDER (2, 5));
+ vector_check (concat_extract2_s0 (SIX, a), INIT_ORDER (6, 1));
+ vector_check (concat_extract2_s1 (SIX, a), INIT_ORDER (6, 2));
+
+ DEBUG0 ("\nTesting variable concat and extract\n");
+ for (i = 0; i < 2; i++)
+ {
+ for (j = 0; j < 2; j++)
+ {
+ static struct {
+ TYPE hi;
+ TYPE lo;
+ } hilo[2][2] =
+ { { { ONE, THREE }, { ONE, FOUR } },
+ { { TWO, THREE }, { TWO, FOUR } } };
+
+ vector_check (concat_extract_nn (a, b, i, j),
+ INIT_ORDER (hilo[i][j].hi, hilo[i][j].lo));
+ }
+ }
+
+ DEBUG0 ("\nTesting separate function\n");
+ vector_check (combine (vec_extract (a, 0), vec_extract (b, 0)),
+ INIT_ORDER (1, 3));
+
+ vector_check (combine (vec_extract (a, 0), vec_extract (b, 1)),
+ INIT_ORDER (1, 4));
+
+ vector_check (combine (vec_extract (a, 1), vec_extract (b, 0)),
+ INIT_ORDER (2, 3));
+
+ vector_check (combine (vec_extract (a, 1), vec_extract (b, 1)),
+ INIT_ORDER (2, 4));
+
+ vector_check (combine_insert (vec_extract (a, 0), vec_extract (b, 0)),
+ ELEMENT_ORDER (1, 3));
+
+ vector_check (combine_insert (vec_extract (a, 0), vec_extract (b, 1)),
+ ELEMENT_ORDER (1, 4));
+
+ vector_check (combine_insert (vec_extract (a, 1), vec_extract (b, 0)),
+ ELEMENT_ORDER (2, 3));
+
+ vector_check (combine_insert (vec_extract (a, 1), vec_extract (b, 1)),
+ ELEMENT_ORDER (2, 4));
+
+
+#if defined(DO_DOUBLE)
+ DEBUG0 ("\nTesting explicit 2df concat\n");
+ vector_check (__builtin_vsx_concat_2df (FIVE, SIX), INIT_ORDER (5, 6));
+ vector_check (__builtin_vsx_concat_2df (five, six), INIT_ORDER (5, 6));
+
+#elif defined(_ARCH_PPC64)
+ DEBUG0 ("\nTesting explicit 2di concat\n");
+ vector_check (__builtin_vsx_concat_2di (FIVE, SIX), INIT_ORDER (5, 6));
+ vector_check (__builtin_vsx_concat_2di (five, six), INIT_ORDER (5, 6));
+
+#else
+ DEBUG0 ("\nSkip explicit 2di concat on 32-bit\n");
+#endif
+
+#ifdef DEBUG
+ if (errors)
+ printf ("\n%d error%s were found", errors, (errors == 1) ? "" : "s");
+ else
+ printf ("\nNo errors were found.\n");
+
+ return errors;
+
+#else
+ return 0;
+#endif
+}
Index: gcc/testsuite/gcc.target/powerpc/vec-setup-be-double.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-setup-be-double.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-setup-be-double.c (.../gcc/testsuite/gcc.target/powerpc) (revision 250878)
@@ -0,0 +1,12 @@
+/* { dg-do run { target { powerpc*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+/* Test various ways of creating vectors with 2 double words and accessing the
+ elements. This test uses the double datatype.
+
+ This test explicitly tests -maltivec=be to make sure things are correct. */
+
+#define DO_DOUBLE
+
+#include "vec-setup.h"
Index: gcc/testsuite/gcc.target/powerpc/vec-setup-double.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-setup-double.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-setup-double.c (.../gcc/testsuite/gcc.target/powerpc) (revision 250878)
@@ -0,0 +1,11 @@
+/* { dg-do run { target { powerpc*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+/* Test various ways of creating vectors with 2 double words and accessing the
+ elements. This test uses the double datatype and the default endian
+ order. */
+
+#define DO_DOUBLE
+
+#include "vec-setup.h"
Index: gcc/testsuite/gcc.target/powerpc/vec-setup-long.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-setup-long.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-setup-long.c (.../gcc/testsuite/gcc.target/powerpc) (revision 250878)
@@ -0,0 +1,9 @@
+/* { dg-do run { target { powerpc*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+/* Test various ways of creating vectors with 2 double words and accessing the
+ elements. This test uses the long (on 64-bit systems) or long long datatype
+ (on 32-bit systems). The default endian order is used. */
+
+#include "vec-setup.h"
Index: gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c (.../gcc/testsuite/gcc.target/powerpc) (revision 250858)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+vector unsigned long
+test_vpasted (vector unsigned long high, vector unsigned long low)
+{
+ vector unsigned long res;
+ res[1] = high[1];
+ res[0] = low[0];
+ return res;
+}
+
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
+/* { dg-final { scan-assembler-not {\mvspltisw\M} } } */
+/* { dg-final { scan-assembler-not {\mxxlor\M} } } */
+/* { dg-final { scan-assembler-not {\mxxlxor\M} } } */
+/* { dg-final { scan-assembler-not {\mxxspltib\M} } } */
+/* { dg-final { scan-assembler-not {\mlxvx?\M} } } */
+/* { dg-final { scan-assembler-not {\mlxv[dw][24]x\M} } } */
+/* { dg-final { scan-assembler-not {\mlvx\M} } } */
+/* { dg-final { scan-assembler-not {\mstxvx?\M} } } */
+/* { dg-final { scan-assembler-not {\mstxv[dw][24]x\M} } } */
+/* { dg-final { scan-assembler-not {\mstvx\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c (.../gcc/testsuite/gcc.target/powerpc) (revision 250858)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+vector double
+test_vpasted (vector double high, vector double low)
+{
+ vector double res;
+ res[1] = high[1];
+ res[0] = low[0];
+ return res;
+}
+
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
+/* { dg-final { scan-assembler-not {\mvspltisw\M} } } */
+/* { dg-final { scan-assembler-not {\mxxlor\M} } } */
+/* { dg-final { scan-assembler-not {\mxxlxor\M} } } */
+/* { dg-final { scan-assembler-not {\mxxspltib\M} } } */
+/* { dg-final { scan-assembler-not {\mlxvx?\M} } } */
+/* { dg-final { scan-assembler-not {\mlxv[dw][24]x\M} } } */
+/* { dg-final { scan-assembler-not {\mlvx\M} } } */
+/* { dg-final { scan-assembler-not {\mstxvx?\M} } } */
+/* { dg-final { scan-assembler-not {\mstxv[dw][24]x\M} } } */
+/* { dg-final { scan-assembler-not {\mstvx\M} } } */
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts
2017-08-07 13:18 ` Michael Meissner
@ 2017-08-07 21:54 ` Segher Boessenkool
0 siblings, 0 replies; 16+ messages in thread
From: Segher Boessenkool @ 2017-08-07 21:54 UTC (permalink / raw)
To: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt
On Mon, Aug 07, 2017 at 09:18:30AM -0400, Michael Meissner wrote:
> > I don't like using NULL as a magic value at all; it does not simplify
> > this interface, it complicates it instead.
> >
> > Can you move the "which half is high" decision to the callers?
>
> I rewrote the patch to eliminate the rs6000_output_xxpermdi function, and do
> the calculation of the XXPERMDI mask in each of the vsx_concat_<mask>_{1,2,3}
> insns. Just to be sure I got things correct, I wrote a new executable test
> that tests various methods of creating/inserting 2 element vectors with double
> word elements, and tested in BE, LE -maltivec=be, and LE, and the results match
> previous compilers.
>
> I have done bootstrap/build checks on a big endian power7, a little endian
> power8 system, and I have done a non-bootstrap/check on a power9 prototype (I
> have script issues that prevents a bootstrap build on power9 that I need to
> look into). There are no regressions in the tests and the new tests were run
> on each of the systems. Can I check this into the trunk?
>
> I would also like to backport it to all open branches (particularly GCC 7, but
> GCC 6 if possible). Note, the patch will need a slight tweak on the older
> systems due to GCC 7 still supporting -mupper-regs-{df,di} and I have to adjust
> the constraints to accomidate this, and under GCC 6 DImode not being allowed in
> traditional Altivec registers.
Thanks! Okay for trunk. The 7 branch is frozen; okay for 7 after the
release, and for 6 too.
Segher
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2017-08-07 21:54 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-27 23:21 [PATCH], PR target/81593, Optimize PowerPC vector sets coming from a vector extracts Michael Meissner
2017-07-28 7:51 ` Richard Biener
2017-07-28 8:02 ` Andrew Pinski
2017-07-28 8:20 ` Richard Biener
2017-07-28 8:42 ` Marc Glisse
2017-07-28 14:44 ` Michael Meissner
2017-07-28 21:09 ` Segher Boessenkool
2017-07-30 14:01 ` Bill Schmidt
2017-07-31 17:41 ` Michael Meissner
2017-07-31 17:42 ` Bill Schmidt
2017-07-31 17:38 ` Michael Meissner
2017-08-02 14:29 ` Michael Meissner
2017-08-03 15:01 ` Segher Boessenkool
2017-08-03 17:23 ` Michael Meissner
2017-08-07 13:18 ` Michael Meissner
2017-08-07 21:54 ` Segher Boessenkool
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).