* [PATCH, rs6000] folding of vector stores in GIMPLE
@ 2017-09-21 20:56 UTC  Will Schmidt
To: GCC Patches
Cc: Segher Boessenkool, Richard Biener, Bill Schmidt, David Edelsohn

Hi,

Folding of vector stores in GIMPLE.

- Add code to handle gimple folding for the vec_st (vector store) builtins.
- Remove the now-obsolete folding code for vec_st from rs6000-c.c.

There are two spots where I could use some feedback.

First -
An early exit remains in place to prevent folding of statements that do
not have a LHS.  To allow folding of the stores to get past that check,
I have added a helper function (rs6000_builtin_valid_without_lhs) that
allows those store intrinsics to proceed.  I'm not sure the approach (or
the name I chose) is the best choice, so I'll defer to recommendations
on how to improve that. :-)

Second -
This code (as-is) is subject to a TBAA-related issue (similar to what
was noticed in the gimple folding code for loads).  As-is, a testcase
such as:

void testst_struct1b (vector double vd1, long long ll1, struct S *p)
{
  vec_st (vd1, ll1, (vector double *)p);
}

will generate gimple that looks like:

  MEM[(struct S *)D.3218] = vd1;

If I rework the code, changing arg2_type from:

+  tree arg2_type = TREE_TYPE (arg2);

to:

+  tree arg2_type = ptr_type_node;

the generated gimple then looks like:

  MEM[(void *)D.3218] = vd1;

which is probably OK, but I cannot say for certain.  The generated .s
content is at least equivalent.

The resulting code is verified by testcases powerpc/fold-vec-st-*.c,
which have been posted separately.

Regtest looks clean on power6 and newer.

Pending feedback, OK for trunk?

Thanks,
-Will

[gcc]

2017-09-21  Will Schmidt  <will_schmidt@vnet.ibm.com>

	* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling
	for early folding of vector stores (ALTIVEC_BUILTIN_ST_*).
	(rs6000_builtin_valid_without_lhs): New helper function.
	* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
	Remove obsolete code for handling ALTIVEC_BUILTIN_VEC_ST.

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index a49db97..4a363a1 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -6470,82 +6470,10 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
			     convert (TREE_TYPE (stmt), arg0));
       stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
       return stmt;
     }
 
-  /* Expand vec_st into an expression that masks the address and
-     performs the store.  We need to expand this early to allow
-     the best aliasing, as by the time we get into RTL we no longer
-     are able to honor __restrict__, for example.  We may want to
-     consider this for all memory access built-ins.
-
-     When -maltivec=be is specified, or the wrong number of arguments
-     is provided, simply punt to existing built-in processing.  */
-
-  if (fcode == ALTIVEC_BUILTIN_VEC_ST
-      && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
-      && nargs == 3)
-    {
-      tree arg0 = (*arglist)[0];
-      tree arg1 = (*arglist)[1];
-      tree arg2 = (*arglist)[2];
-
-      /* Construct the masked address.  Let existing error handling take
-	 over if we don't have a constant offset.  */
-      arg1 = fold (arg1);
-
-      if (TREE_CODE (arg1) == INTEGER_CST)
-	{
-	  if (!ptrofftype_p (TREE_TYPE (arg1)))
-	    arg1 = build1 (NOP_EXPR, sizetype, arg1);
-
-	  tree arg2_type = TREE_TYPE (arg2);
-	  if (TREE_CODE (arg2_type) == ARRAY_TYPE && c_dialect_cxx ())
-	    {
-	      /* Force array-to-pointer decay for C++.  */
-	      arg2 = default_conversion (arg2);
-	      arg2_type = TREE_TYPE (arg2);
-	    }
-
-	  /* Find the built-in to make sure a compatible one exists; if not
-	     we fall back to default handling to get the error message.  */
-	  for (desc = altivec_overloaded_builtins;
-	       desc->code && desc->code != fcode; desc++)
-	    continue;
-
-	  for (; desc->code == fcode; desc++)
-	    if (rs6000_builtin_type_compatible (TREE_TYPE (arg0), desc->op1)
-		&& rs6000_builtin_type_compatible (TREE_TYPE (arg1), desc->op2)
-		&& rs6000_builtin_type_compatible (TREE_TYPE (arg2),
-						   desc->op3))
-	      {
-		tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg2_type,
-					     arg2, arg1);
-		tree aligned
-		  = fold_build2_loc (loc, BIT_AND_EXPR, arg2_type,
-				     addr, build_int_cst (arg2_type, -16));
-
-		tree arg0_type = TREE_TYPE (arg0);
-		if (TYPE_MODE (arg0_type) == V2DImode)
-		  /* Type-based aliasing analysis thinks vector long
-		     and vector long long are different and will put them
-		     in distinct alias classes.  Force our address type
-		     to be a may-alias type to avoid this.  */
-		  arg0_type
-		    = build_pointer_type_for_mode (arg0_type, Pmode,
-						   true/*can_alias_all*/);
-		else
-		  arg0_type = build_pointer_type (arg0_type);
-		aligned = build1 (NOP_EXPR, arg0_type, aligned);
-		tree stg = build_indirect_ref (loc, aligned, RO_NULL);
-		tree retval = build2 (MODIFY_EXPR, TREE_TYPE (stg), stg,
-				      convert (TREE_TYPE (stg), arg0));
-		return retval;
-	      }
-	}
-    }
-
   for (n = 0;
        !VOID_TYPE_P (TREE_VALUE (fnargs)) && n < nargs;
        fnargs = TREE_CHAIN (fnargs), n++)
     {
       tree decl_type = TREE_VALUE (fnargs);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 1978634..ef41534 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -16155,10 +16155,29 @@ rs6000_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED,
 #else
   return NULL_TREE;
 #endif
 }
 
+/* Helper function to sort out which built-ins may be valid without having
+   a LHS.  */
+bool
+rs6000_builtin_valid_without_lhs (enum rs6000_builtins fn_code)
+{
+  switch (fn_code)
+    {
+    case ALTIVEC_BUILTIN_STVX_V16QI:
+    case ALTIVEC_BUILTIN_STVX_V8HI:
+    case ALTIVEC_BUILTIN_STVX_V4SI:
+    case ALTIVEC_BUILTIN_STVX_V4SF:
+    case ALTIVEC_BUILTIN_STVX_V2DI:
+    case ALTIVEC_BUILTIN_STVX_V2DF:
+      return true;
+    default:
+      return false;
+    }
+}
+
 /* Fold a machine-dependent built-in in GIMPLE.  (For folding into
    a constant, use rs6000_fold_builtin.)  */
 
 bool
 rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
@@ -16182,12 +16201,13 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
	     fn_code, fn_name1, fn_name2);
 
   if (!rs6000_fold_gimple)
     return false;
 
-  /* Generic solution to prevent gimple folding of code without a LHS.  */
-  if (!gimple_call_lhs (stmt))
+  /* Prevent gimple folding for code that does not have a LHS, unless it is
+     allowed per the rs6000_builtin_valid_without_lhs helper function.  */
+  if (!gimple_call_lhs (stmt) && !rs6000_builtin_valid_without_lhs (fn_code))
     return false;
 
   switch (fn_code)
     {
     /* Flavors of vec_add.  We deliberately don't expand
@@ -16585,11 +16605,48 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
					build_int_cst (arg1_type, 0)));
       gimple_set_location (g, loc);
       gsi_replace (gsi, g, true);
       return true;
     }
-
+    /* Vector stores.  */
+    case ALTIVEC_BUILTIN_STVX_V16QI:
+    case ALTIVEC_BUILTIN_STVX_V8HI:
+    case ALTIVEC_BUILTIN_STVX_V4SI:
+    case ALTIVEC_BUILTIN_STVX_V4SF:
+    case ALTIVEC_BUILTIN_STVX_V2DI:
+    case ALTIVEC_BUILTIN_STVX_V2DF:
+      {
+	/* Do not fold for -maltivec=be on LE targets.  */
+	if (VECTOR_ELT_ORDER_BIG && !BYTES_BIG_ENDIAN)
+	  return false;
+	arg0 = gimple_call_arg (stmt, 0); /* Value to be stored.  */
+	arg1 = gimple_call_arg (stmt, 1); /* Offset.  */
+	tree arg2 = gimple_call_arg (stmt, 2); /* Store-to address.  */
+	location_t loc = gimple_location (stmt);
+	tree arg0_type = TREE_TYPE (arg0);
+	tree arg2_type = TREE_TYPE (arg2);
+	/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.
+	   Create the tree using the value from arg1.  The resulting type
+	   will match the type of arg2.  */
+	gimple_seq stmts = NULL;
+	tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg1);
+	tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
+				       arg2_type, arg2, temp_offset);
+	/* Mask off any lower bits from the address.  */
+	tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,
+					  arg2_type, temp_addr,
+					  build_int_cst (arg2_type, -16));
+	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+	/* The desired gimple result should be similar to:
+	   MEM[(__vector floatD.1407 *)_1] = vf1D.2697;  */
+	gimple *g;
+	g = gimple_build_assign (build2 (MEM_REF, arg0_type, aligned_addr,
+					 build_int_cst (arg2_type, 0)), arg0);
+	gimple_set_location (g, loc);
+	gsi_replace (gsi, g, true);
+	return true;
+      }
     default:
       if (TARGET_DEBUG_BUILTIN)
	fprintf (stderr, "gimple builtin intrinsic not matched:%d %s %s\n",
		 fn_code, fn_name1, fn_name2);
       break;
* Re: [PATCH, rs6000] folding of vector stores in GIMPLE
@ 2017-09-22 10:27 UTC  Richard Biener
To: will_schmidt
Cc: GCC Patches, Segher Boessenkool, Bill Schmidt, David Edelsohn

On Thu, Sep 21, 2017 at 10:56 PM, Will Schmidt
<will_schmidt@vnet.ibm.com> wrote:
> Hi,
>
> Folding of vector stores in GIMPLE.
>
> - Add code to handle gimple folding for the vec_st (vector store) builtins.
> - Remove the now-obsolete folding code for vec_st from rs6000-c.c.
>
> [...]
>
> Second -
> This code (as-is) is subject to a TBAA-related issue (similar to what
> was noticed in the gimple folding code for loads).  As-is, a testcase
> such as:
>
> void testst_struct1b (vector double vd1, long long ll1, struct S *p)
> {
>   vec_st (vd1, ll1, (vector double *)p);
> }
>
> will generate gimple that looks like:
>
>   MEM[(struct S *)D.3218] = vd1;
>
> If I rework the code, changing arg2_type from:
>
> +  tree arg2_type = TREE_TYPE (arg2);
>
> to:
>
> +  tree arg2_type = ptr_type_node;
>
> the generated gimple then looks like:
>
>   MEM[(void *)D.3218] = vd1;
>
> which is probably OK, but I cannot say for certain.  The generated .s
> content is at least equivalent.

It looks safe.

The question I had is whether vec_st (vd1, ll1, (vector double *)p) is
equivalent to

  *(vector double *)((char *)p + ll1) = vd1;

from a TBAA perspective.  Thus whether the type of the third argument to
vec_st defines the type of the access (in C standards meaning).  If so
then what we do now is pessimizing (but as you say, previously
(long long *) and (long *) were aliased together and you got wrong code
with aliasing with regular stores of the "wrong" same type).

A proper fix would be to transition this type as seen from the frontend
to GIMPLE, for example in a similar way to what we do for MEM_REFs: by
piggy-backing that on an extra argument, a constant zero pointer of the
alias pointer type to use (which would also serve as a type indicator of
the store itself).  I'd use a target-specific internal function for this
(not sure if we can have those target-specific, but I guess if it's
folded away then that's fine) and get away with the overload set.

Richard.

> [remainder of the original message, including the full patch, quoted
> verbatim; snipped]
* Re: [PATCH, rs6000] folding of vector stores in GIMPLE
@ 2017-09-22 19:59 UTC  Bill Schmidt
To: Richard Biener
Cc: will_schmidt, GCC Patches, Segher Boessenkool, David Edelsohn

On Sep 22, 2017, at 5:27 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Thu, Sep 21, 2017 at 10:56 PM, Will Schmidt
> <will_schmidt@vnet.ibm.com> wrote:
>> [...]
>>
>> First -
>> An early exit remains in place to prevent folding of statements that do
>> not have a LHS.  To allow folding of the stores to get past that check,
>> I have added a helper function (rs6000_builtin_valid_without_lhs) that
>> allows those store intrinsics to proceed.  I'm not sure the approach (or
>> the name I chose) is the best choice, so I'll defer to recommendations
>> on how to improve that. :-)

It's fine, but please make the helper function static.

>> [...]
>
> It looks safe.
>
> The question I had is whether vec_st (vd1, ll1, (vector double *)p) is
> equivalent to *(vector double *)((char *)p + ll1) = vd1; from a TBAA
> perspective.  Thus whether the type of the third argument to vec_st
> defines the type of the access (in C standards meaning).  [...]
>
> A proper fix would be to transition this type as seen from the frontend
> to GIMPLE, for example in a similar way to what we do for MEM_REFs: by
> piggy-backing that on an extra argument, a constant zero pointer of the
> alias pointer type to use (which would also serve as a type indicator of
> the store itself).  I'd use a target-specific internal function for this
> (not sure if we can have those target-specific, but I guess if it's
> folded away then that's fine) and get away with the overload set.

OK.  Will, for now, let's again use the (void *) solution for the time
being, and add commentary recording this improvement for future work.
The same goes for the vec_vsx_ld/st variations once you get to those.

Otherwise the patch looks OK to me.  I'll defer to Segher for approval,
of course.

Thanks,
Bill

> [remainder of the original message, including the full patch, quoted
> verbatim; snipped]
* Re: [PATCH, rs6000] folding of vector stores in GIMPLE
@ 2017-09-22 23:44 UTC  Segher Boessenkool
To: Bill Schmidt
Cc: Richard Biener, will_schmidt, GCC Patches, David Edelsohn

On Fri, Sep 22, 2017 at 02:58:54PM -0500, Bill Schmidt wrote:
> OK.  Will, for now, let's again use the (void *) solution for the time
> being, and add commentary recording this improvement for future work.
> Same would go for the vec_vsx_ld/st variations once you get to those.
>
> Otherwise the patch looks ok to me.  I'll defer to Segher for approval,
> of course.

It's okay for trunk with the suggested improvements.  Thanks for the
review!


Segher