Hi Richard, on 2022/11/24 17:24, Richard Sandiford wrote: > "Kewen.Lin" writes: >> Hi, >> >> As the test case in PR107412 shows, we can fold IFN .LEN_{LOAD, >> STORE} into normal vector load/store if the given length is known >> to be equal to the length of the whole vector. It would help to >> improve overall cycles as normally the latency of vector access >> with length in bytes is bigger than normal vector access, and it >> also saves the preparation for length if constant length can not >> be encoded into instruction (such as on power). >> >> Bootstrapped and regtested on x86_64-redhat-linux, >> aarch64-linux-gnu and powerpc64{,le}-linux-gnu. >> >> Is it ok for trunk? >> >> BR, >> Kewen >> ----- >> PR tree-optimization/107412 >> >> gcc/ChangeLog: >> >> * gimple-fold.cc (gimple_fold_mask_load_store_mem_ref): Rename to ... >> (gimple_fold_partial_load_store_mem_ref): ... this, add one parameter >> mask_p indicating it's for mask or length, and add some handlings for >> IFN LEN_{LOAD,STORE}. >> (gimple_fold_mask_load): Rename to ... >> (gimple_fold_partial_load): ... this, add one parameter mask_p. >> (gimple_fold_mask_store): Rename to ... >> (gimple_fold_partial_store): ... this, add one parameter mask_p. >> (gimple_fold_call): Add the handlings for IFN LEN_{LOAD,STORE}, >> and adjust calls on gimple_fold_mask_load_store_mem_ref to >> gimple_fold_partial_load_store_mem_ref. > > Sorry to reply to late (still catching up on email), but: > >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/powerpc/pr107412.c: New test. >> * gcc.target/powerpc/p9-vec-length-epil-8.c: Adjust scan times for >> folded LEN_LOAD. >> --- >> gcc/gimple-fold.cc | 57 ++++++++++++++----- >> .../gcc.target/powerpc/p9-vec-length-epil-8.c | 2 +- >> gcc/testsuite/gcc.target/powerpc/pr107412.c | 19 +++++++ >> 3 files changed, 64 insertions(+), 14 deletions(-) >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr107412.c >> >> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc >> index a1704784bc9..e3a087defa6 100644 >> --- a/gcc/gimple-fold.cc >> +++ b/gcc/gimple-fold.cc >> @@ -5370,19 +5370,39 @@ arith_overflowed_p (enum tree_code code, const_tree type, >> return wi::min_precision (wres, sign) > TYPE_PRECISION (type); >> } >> >> -/* If IFN_MASK_LOAD/STORE call CALL is unconditional, return a MEM_REF >> +/* If IFN_{MASK,LEN}_LOAD/STORE call CALL is unconditional, return a MEM_REF >> for the memory it references, otherwise return null. VECTYPE is the >> - type of the memory vector. */ >> + type of the memory vector. MASK_P indicates it's for MASK if true, >> + otherwise it's for LEN. */ >> >> static tree >> -gimple_fold_mask_load_store_mem_ref (gcall *call, tree vectype) >> +gimple_fold_partial_load_store_mem_ref (gcall *call, tree vectype, bool mask_p) >> { >> tree ptr = gimple_call_arg (call, 0); >> tree alias_align = gimple_call_arg (call, 1); >> - tree mask = gimple_call_arg (call, 2); >> - if (!tree_fits_uhwi_p (alias_align) || !integer_all_onesp (mask)) >> + if (!tree_fits_uhwi_p (alias_align)) >> return NULL_TREE; >> >> + if (mask_p) >> + { >> + tree mask = gimple_call_arg (call, 2); >> + if (!integer_all_onesp (mask)) >> + return NULL_TREE; >> + } else { > > Minor nit: }, else, and { should be on separate lines. But the thing > I actually wanted to say was... Thanks for catching, I must have forgotten to reformat these lines. > >> + tree basic_len = gimple_call_arg (call, 2); >> + if (!tree_fits_uhwi_p (basic_len)) >> + return NULL_TREE; >> + unsigned int nargs = gimple_call_num_args (call); >> + tree bias = gimple_call_arg (call, nargs - 1); >> + gcc_assert (tree_fits_uhwi_p (bias)); >> + tree biased_len = int_const_binop (MINUS_EXPR, basic_len, bias); >> + unsigned int len = tree_to_uhwi (biased_len); >> + unsigned int vect_len >> + = GET_MODE_SIZE (TYPE_MODE (vectype)).to_constant (); >> + if (vect_len != len) >> + return NULL_TREE; > > Using "unsigned int" truncates the value. I realise that's probably > safe in this context, since large values have undefined behaviour. > But it still seems better to use an untruncated type, so that it > looks less like an oversight. (I think this is one case where "auto" > can help, since it gets the type right automatically.) > > It would also be better to avoid the to_constant, since we haven't > proven is_constant. How about: > > tree basic_len = gimple_call_arg (call, 2); > if (!poly_int_tree_p (basic_len)) > return NULL_TREE; > unsigned int nargs = gimple_call_num_args (call); > tree bias = gimple_call_arg (call, nargs - 1); > gcc_assert (TREE_CODE (bias) == INTEGER_CST); > if (maybe_ne (wi::to_poly_widest (basic_len) - wi::to_widest (bias), > GET_MODE_SIZE (TYPE_MODE (vectype)))) > return NULL_TREE; > > which also avoids using tree arithmetic for the subtraction? I agree your proposed code has better robustness, thanks! Sorry that the original patch was committed, I made a patch as attached. It's bootstrapped and regresss-tested on powerpc64-linux-gnu P8, and powerpc64le-linux-gnu P9 and P10. Is it ok for trunk? BR, Kewen