* [PATCH, rs6000] folding of vector stores in GIMPLE
@ 2017-09-21 20:56 Will Schmidt
2017-09-22 10:27 ` Richard Biener
0 siblings, 1 reply; 4+ messages in thread
From: Will Schmidt @ 2017-09-21 20:56 UTC (permalink / raw)
To: GCC Patches
Cc: Segher Boessenkool, Richard Biener, Bill Schmidt, David Edelsohn
Hi,
Folding of vector stores in GIMPLE.
- Add code to handle gimple folding for the vec_st (vector store) builtins.
- Remove the now obsoleted folding code for vec_st from rs6000-c.c.
There are two spots that I could use some feedback on.
First -
An early exit remains in place to prevent folding of statements that do not
have a LHS. To allow folding of the stores to get past the check, I have
added a helper function (rs6000_builtin_valid_without_lhs) that allows
those store intrinsics to proceed. I'm not sure the approach (or the name I chose)
is the best choice, so I'll defer to recommendations on how to improve that. :-)
Second -
This code (as-is) is subject to a TBAA-related issue (similar to what was noticed
in the gimple folding code for loads). As-is, with a testcase such as:
void testst_struct1b (vector double vd1, long long ll1, struct S *p)
{
vec_st (vd1, ll1, (vector double *)p);
}
will generate gimple that looks like:
MEM[(struct S *)D.3218] = vd1;
If I rework the code, setting arg2_type to be ptr_type_node, i.e.
+ tree arg2_type = TREE_TYPE (arg2);
to:
+ tree arg2_type = ptr_type_node;
the generated gimple then looks like
MEM[(void *)D.3218] = vd1;
Which is probably OK, but I cannot say for certain. The generated .s content is at least equivalent.
The resulting code is verified by testcases powerpc/fold-vec-st-*.c, which
have been posted separately.
regtest looks clean on power6 and newer.
pending feedback, OK for trunk?
Thanks,
-Will
[gcc]
2017-09-21 Will Schmidt <will_schmidt@vnet.ibm.com>
* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling
for early folding of vector stores (ALTIVEC_BUILTIN_ST_*).
(rs6000_builtin_valid_without_lhs): New helper function.
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_ST.
diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index a49db97..4a363a1 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -6470,82 +6470,10 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
convert (TREE_TYPE (stmt), arg0));
stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
return stmt;
}
- /* Expand vec_st into an expression that masks the address and
- performs the store. We need to expand this early to allow
- the best aliasing, as by the time we get into RTL we no longer
- are able to honor __restrict__, for example. We may want to
- consider this for all memory access built-ins.
-
- When -maltivec=be is specified, or the wrong number of arguments
- is provided, simply punt to existing built-in processing. */
-
- if (fcode == ALTIVEC_BUILTIN_VEC_ST
- && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
- && nargs == 3)
- {
- tree arg0 = (*arglist)[0];
- tree arg1 = (*arglist)[1];
- tree arg2 = (*arglist)[2];
-
- /* Construct the masked address. Let existing error handling take
- over if we don't have a constant offset. */
- arg1 = fold (arg1);
-
- if (TREE_CODE (arg1) == INTEGER_CST)
- {
- if (!ptrofftype_p (TREE_TYPE (arg1)))
- arg1 = build1 (NOP_EXPR, sizetype, arg1);
-
- tree arg2_type = TREE_TYPE (arg2);
- if (TREE_CODE (arg2_type) == ARRAY_TYPE && c_dialect_cxx ())
- {
- /* Force array-to-pointer decay for C++. */
- arg2 = default_conversion (arg2);
- arg2_type = TREE_TYPE (arg2);
- }
-
- /* Find the built-in to make sure a compatible one exists; if not
- we fall back to default handling to get the error message. */
- for (desc = altivec_overloaded_builtins;
- desc->code && desc->code != fcode; desc++)
- continue;
-
- for (; desc->code == fcode; desc++)
- if (rs6000_builtin_type_compatible (TREE_TYPE (arg0), desc->op1)
- && rs6000_builtin_type_compatible (TREE_TYPE (arg1), desc->op2)
- && rs6000_builtin_type_compatible (TREE_TYPE (arg2),
- desc->op3))
- {
- tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg2_type,
- arg2, arg1);
- tree aligned
- = fold_build2_loc (loc, BIT_AND_EXPR, arg2_type,
- addr, build_int_cst (arg2_type, -16));
-
- tree arg0_type = TREE_TYPE (arg0);
- if (TYPE_MODE (arg0_type) == V2DImode)
- /* Type-based aliasing analysis thinks vector long
- and vector long long are different and will put them
- in distinct alias classes. Force our address type
- to be a may-alias type to avoid this. */
- arg0_type
- = build_pointer_type_for_mode (arg0_type, Pmode,
- true/*can_alias_all*/);
- else
- arg0_type = build_pointer_type (arg0_type);
- aligned = build1 (NOP_EXPR, arg0_type, aligned);
- tree stg = build_indirect_ref (loc, aligned, RO_NULL);
- tree retval = build2 (MODIFY_EXPR, TREE_TYPE (stg), stg,
- convert (TREE_TYPE (stg), arg0));
- return retval;
- }
- }
- }
-
for (n = 0;
!VOID_TYPE_P (TREE_VALUE (fnargs)) && n < nargs;
fnargs = TREE_CHAIN (fnargs), n++)
{
tree decl_type = TREE_VALUE (fnargs);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 1978634..ef41534 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -16155,10 +16155,29 @@ rs6000_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED,
#else
return NULL_TREE;
#endif
}
+/* Helper function to sort out which built-ins may be valid without having
+ a LHS. */
+bool
+rs6000_builtin_valid_without_lhs (enum rs6000_builtins fn_code)
+{
+ switch (fn_code)
+ {
+ case ALTIVEC_BUILTIN_STVX_V16QI:
+ case ALTIVEC_BUILTIN_STVX_V8HI:
+ case ALTIVEC_BUILTIN_STVX_V4SI:
+ case ALTIVEC_BUILTIN_STVX_V4SF:
+ case ALTIVEC_BUILTIN_STVX_V2DI:
+ case ALTIVEC_BUILTIN_STVX_V2DF:
+ return true;
+ default:
+ return false;
+ }
+}
+
/* Fold a machine-dependent built-in in GIMPLE. (For folding into
a constant, use rs6000_fold_builtin.) */
bool
rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
@@ -16182,12 +16201,13 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
fn_code, fn_name1, fn_name2);
if (!rs6000_fold_gimple)
return false;
- /* Generic solution to prevent gimple folding of code without a LHS. */
- if (!gimple_call_lhs (stmt))
+ /* Prevent gimple folding for code that does not have a LHS, unless it is
+ allowed per the rs6000_builtin_valid_without_lhs helper function. */
+ if (!gimple_call_lhs (stmt) && !rs6000_builtin_valid_without_lhs (fn_code))
return false;
switch (fn_code)
{
/* Flavors of vec_add. We deliberately don't expand
@@ -16585,11 +16605,48 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
build_int_cst (arg1_type, 0)));
gimple_set_location (g, loc);
gsi_replace (gsi, g, true);
return true;
}
-
+ /* Vector stores. */
+ case ALTIVEC_BUILTIN_STVX_V16QI:
+ case ALTIVEC_BUILTIN_STVX_V8HI:
+ case ALTIVEC_BUILTIN_STVX_V4SI:
+ case ALTIVEC_BUILTIN_STVX_V4SF:
+ case ALTIVEC_BUILTIN_STVX_V2DI:
+ case ALTIVEC_BUILTIN_STVX_V2DF:
+ {
+ /* Do not fold for -maltivec=be on LE targets. */
+ if (VECTOR_ELT_ORDER_BIG && !BYTES_BIG_ENDIAN)
+ return false;
+ arg0 = gimple_call_arg (stmt, 0); /* Value to be stored. */
+ arg1 = gimple_call_arg (stmt, 1); /* Offset. */
+ tree arg2 = gimple_call_arg (stmt, 2); /* Store-to address. */
+ location_t loc = gimple_location (stmt);
+ tree arg0_type = TREE_TYPE (arg0);
+ tree arg2_type = TREE_TYPE (arg2);
+ /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'. Create
+ the tree using the value from arg1. The resulting type will match
+ the type of arg2. */
+ gimple_seq stmts = NULL;
+ tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg1);
+ tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
+ arg2_type, arg2, temp_offset);
+ /* Mask off any lower bits from the address. */
+ tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,
+ arg2_type, temp_addr,
+ build_int_cst (arg2_type, -16));
+ gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+ /* The desired gimple result should be similar to:
+ MEM[(__vector floatD.1407 *)_1] = vf1D.2697; */
+ gimple *g;
+ g = gimple_build_assign (build2 (MEM_REF, arg0_type, aligned_addr,
+ build_int_cst (arg2_type, 0)), arg0);
+ gimple_set_location (g, loc);
+ gsi_replace (gsi, g, true);
+ return true;
+ }
default:
if (TARGET_DEBUG_BUILTIN)
fprintf (stderr, "gimple builtin intrinsic not matched:%d %s %s\n",
fn_code, fn_name1, fn_name2);
break;
* Re: [PATCH, rs6000] folding of vector stores in GIMPLE
2017-09-21 20:56 [PATCH, rs6000] folding of vector stores in GIMPLE Will Schmidt
@ 2017-09-22 10:27 ` Richard Biener
2017-09-22 19:59 ` Bill Schmidt
0 siblings, 1 reply; 4+ messages in thread
From: Richard Biener @ 2017-09-22 10:27 UTC (permalink / raw)
To: will_schmidt
Cc: GCC Patches, Segher Boessenkool, Bill Schmidt, David Edelsohn
On Thu, Sep 21, 2017 at 10:56 PM, Will Schmidt
<will_schmidt@vnet.ibm.com> wrote:
> Hi,
>
> Folding of vector stores in GIMPLE.
>
> - Add code to handle gimple folding for the vec_st (vector store) builtins.
> - Remove the now obsoleted folding code for vec_st from rs6000-c.c.
>
> There are two spots that I could use some feedback on.
>
> First -
> An early exit remains in place to prevent folding of statements that do not
> have a LHS. To allow folding of the stores to get past the check, I have
> added a helper function (rs6000_builtin_valid_without_lhs) that allows
> those store intrinsics to proceed. I'm not sure the approach (or the name I chose)
> is the best choice, so I'll defer to recommendations on how to improve that. :-)
>
> Second -
> This code (as-is) is subject to a TBAA-related issue (similar to what was noticed
> in the gimple folding code for loads). As-is, with a testcase such as:
>
> void testst_struct1b (vector double vd1, long long ll1, struct S *p)
> {
> vec_st (vd1, ll1, (vector double *)p);
> }
>
> will generate gimple that looks like:
> MEM[(struct S *)D.3218] = vd1;
>
> If I rework the code, setting arg2_type to be ptr_type_node, i.e.
> + tree arg2_type = TREE_TYPE (arg2);
> to:
> + tree arg2_type = ptr_type_node;
>
> the generated gimple then looks like
> MEM[(void *)D.3218] = vd1;
>
> Which is probably OK, but I cannot say for certain. The generated .s content is at least equivalent.
It looks safe.
The question I had is whether vec_st (vd1, ll1, (vector double *)p) is
equivalent to *(vector double *)((char *)p + ll1) = vd1; from a TBAA
perspective. Thus whether the type of the third argument to vec_st defines
the type of the access (in C standards meaning). If so then what we do now
is pessimizing (but as you say previously (long long *) and (long *) were
aliased together and you got wrong-code with aliasing with regular stores
of the "wrong" same type).
A proper fix would be to transition this type as seen from the frontend to
GIMPLE, for example in a similar way we do for MEM_REFs by piggy-backing
that on an extra argument, a constant zero pointer of the alias pointer
type to use (which would also serve as a type indicator of the store
itself). I'd use a target specific internal function for this (not sure
if we can have those target specific, but I guess if it's folded away then
that's fine) and get away with the overload set.
Richard.
> The resulting code is verified by testcases powerpc/fold-vec-st-*.c, which
> has been posted separately.
>
> regtest looks clean on power6 and newer.
>
> pending feedback, OK for trunk?
>
> Thanks,
> -Will
>
> [gcc]
>
> 2017-09-21 Will Schmidt <will_schmidt@vnet.ibm.com>
>
> * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling
> for early folding of vector stores (ALTIVEC_BUILTIN_ST_*).
> (rs6000_builtin_valid_without_lhs): New helper function.
> * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
> Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_ST.
>
> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
> index a49db97..4a363a1 100644
> --- a/gcc/config/rs6000/rs6000-c.c
> +++ b/gcc/config/rs6000/rs6000-c.c
> @@ -6470,82 +6470,10 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
> convert (TREE_TYPE (stmt), arg0));
> stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
> return stmt;
> }
>
> - /* Expand vec_st into an expression that masks the address and
> - performs the store. We need to expand this early to allow
> - the best aliasing, as by the time we get into RTL we no longer
> - are able to honor __restrict__, for example. We may want to
> - consider this for all memory access built-ins.
> -
> - When -maltivec=be is specified, or the wrong number of arguments
> - is provided, simply punt to existing built-in processing. */
> -
> - if (fcode == ALTIVEC_BUILTIN_VEC_ST
> - && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
> - && nargs == 3)
> - {
> - tree arg0 = (*arglist)[0];
> - tree arg1 = (*arglist)[1];
> - tree arg2 = (*arglist)[2];
> -
> - /* Construct the masked address. Let existing error handling take
> - over if we don't have a constant offset. */
> - arg1 = fold (arg1);
> -
> - if (TREE_CODE (arg1) == INTEGER_CST)
> - {
> - if (!ptrofftype_p (TREE_TYPE (arg1)))
> - arg1 = build1 (NOP_EXPR, sizetype, arg1);
> -
> - tree arg2_type = TREE_TYPE (arg2);
> - if (TREE_CODE (arg2_type) == ARRAY_TYPE && c_dialect_cxx ())
> - {
> - /* Force array-to-pointer decay for C++. */
> - arg2 = default_conversion (arg2);
> - arg2_type = TREE_TYPE (arg2);
> - }
> -
> - /* Find the built-in to make sure a compatible one exists; if not
> - we fall back to default handling to get the error message. */
> - for (desc = altivec_overloaded_builtins;
> - desc->code && desc->code != fcode; desc++)
> - continue;
> -
> - for (; desc->code == fcode; desc++)
> - if (rs6000_builtin_type_compatible (TREE_TYPE (arg0), desc->op1)
> - && rs6000_builtin_type_compatible (TREE_TYPE (arg1), desc->op2)
> - && rs6000_builtin_type_compatible (TREE_TYPE (arg2),
> - desc->op3))
> - {
> - tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg2_type,
> - arg2, arg1);
> - tree aligned
> - = fold_build2_loc (loc, BIT_AND_EXPR, arg2_type,
> - addr, build_int_cst (arg2_type, -16));
> -
> - tree arg0_type = TREE_TYPE (arg0);
> - if (TYPE_MODE (arg0_type) == V2DImode)
> - /* Type-based aliasing analysis thinks vector long
> - and vector long long are different and will put them
> - in distinct alias classes. Force our address type
> - to be a may-alias type to avoid this. */
> - arg0_type
> - = build_pointer_type_for_mode (arg0_type, Pmode,
> - true/*can_alias_all*/);
> - else
> - arg0_type = build_pointer_type (arg0_type);
> - aligned = build1 (NOP_EXPR, arg0_type, aligned);
> - tree stg = build_indirect_ref (loc, aligned, RO_NULL);
> - tree retval = build2 (MODIFY_EXPR, TREE_TYPE (stg), stg,
> - convert (TREE_TYPE (stg), arg0));
> - return retval;
> - }
> - }
> - }
> -
> for (n = 0;
> !VOID_TYPE_P (TREE_VALUE (fnargs)) && n < nargs;
> fnargs = TREE_CHAIN (fnargs), n++)
> {
> tree decl_type = TREE_VALUE (fnargs);
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 1978634..ef41534 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -16155,10 +16155,29 @@ rs6000_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED,
> #else
> return NULL_TREE;
> #endif
> }
>
> +/* Helper function to sort out which built-ins may be valid without having
> + a LHS. */
> +bool
> +rs6000_builtin_valid_without_lhs (enum rs6000_builtins fn_code)
> +{
> + switch (fn_code)
> + {
> + case ALTIVEC_BUILTIN_STVX_V16QI:
> + case ALTIVEC_BUILTIN_STVX_V8HI:
> + case ALTIVEC_BUILTIN_STVX_V4SI:
> + case ALTIVEC_BUILTIN_STVX_V4SF:
> + case ALTIVEC_BUILTIN_STVX_V2DI:
> + case ALTIVEC_BUILTIN_STVX_V2DF:
> + return true;
> + default:
> + return false;
> + }
> +}
> +
> /* Fold a machine-dependent built-in in GIMPLE. (For folding into
> a constant, use rs6000_fold_builtin.) */
>
> bool
> rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
> @@ -16182,12 +16201,13 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
> fn_code, fn_name1, fn_name2);
>
> if (!rs6000_fold_gimple)
> return false;
>
> - /* Generic solution to prevent gimple folding of code without a LHS. */
> - if (!gimple_call_lhs (stmt))
> + /* Prevent gimple folding for code that does not have a LHS, unless it is
> + allowed per the rs6000_builtin_valid_without_lhs helper function. */
> + if (!gimple_call_lhs (stmt) && !rs6000_builtin_valid_without_lhs (fn_code))
> return false;
>
> switch (fn_code)
> {
> /* Flavors of vec_add. We deliberately don't expand
> @@ -16585,11 +16605,48 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
> build_int_cst (arg1_type, 0)));
> gimple_set_location (g, loc);
> gsi_replace (gsi, g, true);
> return true;
> }
> -
> + /* Vector stores. */
> + case ALTIVEC_BUILTIN_STVX_V16QI:
> + case ALTIVEC_BUILTIN_STVX_V8HI:
> + case ALTIVEC_BUILTIN_STVX_V4SI:
> + case ALTIVEC_BUILTIN_STVX_V4SF:
> + case ALTIVEC_BUILTIN_STVX_V2DI:
> + case ALTIVEC_BUILTIN_STVX_V2DF:
> + {
> + /* Do not fold for -maltivec=be on LE targets. */
> + if (VECTOR_ELT_ORDER_BIG && !BYTES_BIG_ENDIAN)
> + return false;
> + arg0 = gimple_call_arg (stmt, 0); /* Value to be stored. */
> + arg1 = gimple_call_arg (stmt, 1); /* Offset. */
> + tree arg2 = gimple_call_arg (stmt, 2); /* Store-to address. */
> + location_t loc = gimple_location (stmt);
> + tree arg0_type = TREE_TYPE (arg0);
> + tree arg2_type = TREE_TYPE (arg2);
> + /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'. Create
> + the tree using the value from arg1. The resulting type will match
> + the type of arg2. */
> + gimple_seq stmts = NULL;
> + tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg1);
> + tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
> + arg2_type, arg2, temp_offset);
> + /* Mask off any lower bits from the address. */
> + tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,
> + arg2_type, temp_addr,
> + build_int_cst (arg2_type, -16));
> + gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> + /* The desired gimple result should be similar to:
> + MEM[(__vector floatD.1407 *)_1] = vf1D.2697; */
> + gimple *g;
> + g = gimple_build_assign (build2 (MEM_REF, arg0_type, aligned_addr,
> + build_int_cst (arg2_type, 0)), arg0);
> + gimple_set_location (g, loc);
> + gsi_replace (gsi, g, true);
> + return true;
> + }
> default:
> if (TARGET_DEBUG_BUILTIN)
> fprintf (stderr, "gimple builtin intrinsic not matched:%d %s %s\n",
> fn_code, fn_name1, fn_name2);
> break;
>
>
* Re: [PATCH, rs6000] folding of vector stores in GIMPLE
2017-09-22 10:27 ` Richard Biener
@ 2017-09-22 19:59 ` Bill Schmidt
2017-09-22 23:44 ` Segher Boessenkool
0 siblings, 1 reply; 4+ messages in thread
From: Bill Schmidt @ 2017-09-22 19:59 UTC (permalink / raw)
To: Richard Biener
Cc: will_schmidt, GCC Patches, Segher Boessenkool, David Edelsohn
On Sep 22, 2017, at 5:27 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Thu, Sep 21, 2017 at 10:56 PM, Will Schmidt
> <will_schmidt@vnet.ibm.com> wrote:
>> Hi,
>>
>> Folding of vector stores in GIMPLE.
>>
>> - Add code to handle gimple folding for the vec_st (vector store) builtins.
>> - Remove the now obsoleted folding code for vec_st from rs6000-c.c.
>>
>> There are two spots that I could use some feedback on.
>>
>> First -
>> An early exit remains in place to prevent folding of statements that do not
>> have a LHS. To allow folding of the stores to get past the check, I have
>> added a helper function (rs6000_builtin_valid_without_lhs) that allows
>> those store intrinsics to proceed. I'm not sure the approach (or the name I chose)
>> is the best choice, so I'll defer to recommendations on how to improve that. :-)
It's fine, but please make the helper function static.
>>
>> Second -
>> This code (as-is) is subject to a TBAA-related issue (similar to what was noticed
>> in the gimple folding code for loads). As-is, with a testcase such as:
>>
>> void testst_struct1b (vector double vd1, long long ll1, struct S *p)
>> {
>> vec_st (vd1, ll1, (vector double *)p);
>> }
>>
>> will generate gimple that looks like:
>> MEM[(struct S *)D.3218] = vd1;
>>
>> If I rework the code, setting arg2_type to be ptr_type_node, i.e.
>> + tree arg2_type = TREE_TYPE (arg2);
>> to:
>> + tree arg2_type = ptr_type_node;
>>
>> the generated gimple then looks like
>> MEM[(void *)D.3218] = vd1;
>>
>> Which is probably OK, but I cannot say for certain. The generated .s content is at least equivalent.
>
> It looks safe.
>
> The question I had is whether vec_st (vd1, ll1, (vector double *)p) is
> equivalent
> to *(vector double *)((char *)p + ll1) = vd1; from a TBAA perspective. Thus
> whether the type of the third argument to vec_st defines the type of the access
> (in C standards meaning). If so then what we do now is pessimizing (but
> as you say previously (long long *) and (long *) were aliased together and
> you got wrong-code with aliasing with regular stores of the "wrong" same type).
>
> A proper fix would be to transition this type as seen from the frontend to
> GIMPLE, for example in a similar way we do for MEM_REFs by piggy-backing
> that on an extra argument, a constant zero pointer of the alias
> pointer type to use
> (which would also serve as a type indicator of the store itself). I'd use a
> target specific internal function for this (not sure if we can have those target
> specific, but I guess if it's folded away then that's fine) and get away with
> the overload set.
OK. Will, for now, let's again use the (void *) solution for the time being, and
add commentary recording this improvement for future work. Same would
go for the vec_vsx_ld/st variations once you get to those.
Otherwise the patch looks ok to me. I'll defer to Segher for approval, of course.
Thanks,
Bill
>
> Richard.
>
>> The resulting code is verified by testcases powerpc/fold-vec-st-*.c, which
>> has been posted separately.
>>
>> regtest looks clean on power6 and newer.
>>
>> pending feedback, OK for trunk?
>>
>> Thanks,
>> -Will
>>
>> [gcc]
>>
>> 2017-09-21 Will Schmidt <will_schmidt@vnet.ibm.com>
>>
>> * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling
>> for early folding of vector stores (ALTIVEC_BUILTIN_ST_*).
>> (rs6000_builtin_valid_without_lhs): New helper function.
>> * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
>> Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_ST.
>>
>> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
>> index a49db97..4a363a1 100644
>> --- a/gcc/config/rs6000/rs6000-c.c
>> +++ b/gcc/config/rs6000/rs6000-c.c
>> @@ -6470,82 +6470,10 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
>> convert (TREE_TYPE (stmt), arg0));
>> stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
>> return stmt;
>> }
>>
>> - /* Expand vec_st into an expression that masks the address and
>> - performs the store. We need to expand this early to allow
>> - the best aliasing, as by the time we get into RTL we no longer
>> - are able to honor __restrict__, for example. We may want to
>> - consider this for all memory access built-ins.
>> -
>> - When -maltivec=be is specified, or the wrong number of arguments
>> - is provided, simply punt to existing built-in processing. */
>> -
>> - if (fcode == ALTIVEC_BUILTIN_VEC_ST
>> - && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
>> - && nargs == 3)
>> - {
>> - tree arg0 = (*arglist)[0];
>> - tree arg1 = (*arglist)[1];
>> - tree arg2 = (*arglist)[2];
>> -
>> - /* Construct the masked address. Let existing error handling take
>> - over if we don't have a constant offset. */
>> - arg1 = fold (arg1);
>> -
>> - if (TREE_CODE (arg1) == INTEGER_CST)
>> - {
>> - if (!ptrofftype_p (TREE_TYPE (arg1)))
>> - arg1 = build1 (NOP_EXPR, sizetype, arg1);
>> -
>> - tree arg2_type = TREE_TYPE (arg2);
>> - if (TREE_CODE (arg2_type) == ARRAY_TYPE && c_dialect_cxx ())
>> - {
>> - /* Force array-to-pointer decay for C++. */
>> - arg2 = default_conversion (arg2);
>> - arg2_type = TREE_TYPE (arg2);
>> - }
>> -
>> - /* Find the built-in to make sure a compatible one exists; if not
>> - we fall back to default handling to get the error message. */
>> - for (desc = altivec_overloaded_builtins;
>> - desc->code && desc->code != fcode; desc++)
>> - continue;
>> -
>> - for (; desc->code == fcode; desc++)
>> - if (rs6000_builtin_type_compatible (TREE_TYPE (arg0), desc->op1)
>> - && rs6000_builtin_type_compatible (TREE_TYPE (arg1), desc->op2)
>> - && rs6000_builtin_type_compatible (TREE_TYPE (arg2),
>> - desc->op3))
>> - {
>> - tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg2_type,
>> - arg2, arg1);
>> - tree aligned
>> - = fold_build2_loc (loc, BIT_AND_EXPR, arg2_type,
>> - addr, build_int_cst (arg2_type, -16));
>> -
>> - tree arg0_type = TREE_TYPE (arg0);
>> - if (TYPE_MODE (arg0_type) == V2DImode)
>> - /* Type-based aliasing analysis thinks vector long
>> - and vector long long are different and will put them
>> - in distinct alias classes. Force our address type
>> - to be a may-alias type to avoid this. */
>> - arg0_type
>> - = build_pointer_type_for_mode (arg0_type, Pmode,
>> - true/*can_alias_all*/);
>> - else
>> - arg0_type = build_pointer_type (arg0_type);
>> - aligned = build1 (NOP_EXPR, arg0_type, aligned);
>> - tree stg = build_indirect_ref (loc, aligned, RO_NULL);
>> - tree retval = build2 (MODIFY_EXPR, TREE_TYPE (stg), stg,
>> - convert (TREE_TYPE (stg), arg0));
>> - return retval;
>> - }
>> - }
>> - }
>> -
>> for (n = 0;
>> !VOID_TYPE_P (TREE_VALUE (fnargs)) && n < nargs;
>> fnargs = TREE_CHAIN (fnargs), n++)
>> {
>> tree decl_type = TREE_VALUE (fnargs);
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index 1978634..ef41534 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -16155,10 +16155,29 @@ rs6000_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED,
>> #else
>> return NULL_TREE;
>> #endif
>> }
>>
>> +/* Helper function to sort out which built-ins may be valid without having
>> + a LHS. */
>> +bool
>> +rs6000_builtin_valid_without_lhs (enum rs6000_builtins fn_code)
>> +{
>> + switch (fn_code)
>> + {
>> + case ALTIVEC_BUILTIN_STVX_V16QI:
>> + case ALTIVEC_BUILTIN_STVX_V8HI:
>> + case ALTIVEC_BUILTIN_STVX_V4SI:
>> + case ALTIVEC_BUILTIN_STVX_V4SF:
>> + case ALTIVEC_BUILTIN_STVX_V2DI:
>> + case ALTIVEC_BUILTIN_STVX_V2DF:
>> + return true;
>> + default:
>> + return false;
>> + }
>> +}
>> +
>> /* Fold a machine-dependent built-in in GIMPLE. (For folding into
>> a constant, use rs6000_fold_builtin.) */
>>
>> bool
>> rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>> @@ -16182,12 +16201,13 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>> fn_code, fn_name1, fn_name2);
>>
>> if (!rs6000_fold_gimple)
>> return false;
>>
>> - /* Generic solution to prevent gimple folding of code without a LHS. */
>> - if (!gimple_call_lhs (stmt))
>> + /* Prevent gimple folding for code that does not have a LHS, unless it is
>> + allowed per the rs6000_builtin_valid_without_lhs helper function. */
>> + if (!gimple_call_lhs (stmt) && !rs6000_builtin_valid_without_lhs (fn_code))
>> return false;
>>
>> switch (fn_code)
>> {
>> /* Flavors of vec_add. We deliberately don't expand
>> @@ -16585,11 +16605,48 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>> build_int_cst (arg1_type, 0)));
>> gimple_set_location (g, loc);
>> gsi_replace (gsi, g, true);
>> return true;
>> }
>> -
>> + /* Vector stores. */
>> + case ALTIVEC_BUILTIN_STVX_V16QI:
>> + case ALTIVEC_BUILTIN_STVX_V8HI:
>> + case ALTIVEC_BUILTIN_STVX_V4SI:
>> + case ALTIVEC_BUILTIN_STVX_V4SF:
>> + case ALTIVEC_BUILTIN_STVX_V2DI:
>> + case ALTIVEC_BUILTIN_STVX_V2DF:
>> + {
>> + /* Do not fold for -maltivec=be on LE targets. */
>> + if (VECTOR_ELT_ORDER_BIG && !BYTES_BIG_ENDIAN)
>> + return false;
>> + arg0 = gimple_call_arg (stmt, 0); /* Value to be stored. */
>> + arg1 = gimple_call_arg (stmt, 1); /* Offset. */
>> + tree arg2 = gimple_call_arg (stmt, 2); /* Store-to address. */
>> + location_t loc = gimple_location (stmt);
>> + tree arg0_type = TREE_TYPE (arg0);
>> + tree arg2_type = TREE_TYPE (arg2);
>> + /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'. Create
>> + the tree using the value from arg1. The resulting type will match
>> + the type of arg2. */
>> + gimple_seq stmts = NULL;
>> + tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg1);
>> + tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
>> + arg2_type, arg2, temp_offset);
>> + /* Mask off any lower bits from the address. */
>> + tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,
>> + arg2_type, temp_addr,
>> + build_int_cst (arg2_type, -16));
>> + gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>> + /* The desired gimple result should be similar to:
>> + MEM[(__vector floatD.1407 *)_1] = vf1D.2697; */
>> + gimple *g;
>> + g = gimple_build_assign (build2 (MEM_REF, arg0_type, aligned_addr,
>> + build_int_cst (arg2_type, 0)), arg0);
>> + gimple_set_location (g, loc);
>> + gsi_replace (gsi, g, true);
>> + return true;
>> + }
>> default:
>> if (TARGET_DEBUG_BUILTIN)
>> fprintf (stderr, "gimple builtin intrinsic not matched:%d %s %s\n",
>> fn_code, fn_name1, fn_name2);
>> break;
* Re: [PATCH, rs6000] folding of vector stores in GIMPLE
2017-09-22 19:59 ` Bill Schmidt
@ 2017-09-22 23:44 ` Segher Boessenkool
0 siblings, 0 replies; 4+ messages in thread
From: Segher Boessenkool @ 2017-09-22 23:44 UTC (permalink / raw)
To: Bill Schmidt; +Cc: Richard Biener, will_schmidt, GCC Patches, David Edelsohn
On Fri, Sep 22, 2017 at 02:58:54PM -0500, Bill Schmidt wrote:
> OK. Will, for now, let's again use the (void *) solution for the time being, and
> add commentary recording this improvement for future work. Same would
> go for the vec_vsx_ld/st variations once you get to those.
>
> Otherwise the patch looks ok to me. I'll defer to Segher for approval, of course.
It's okay for trunk with the suggested improvements. Thanks for the
review!
Segher