gcc-patches-owner@gcc.gnu.org wrote on 01/07/2010 11:00:50 AM: > Hi, > > This patch adds vectorization support of min/max location pattern: > > for (i = 0; i < N; i++) > if (arr[i] < limit) > { > pos = i + 1; > limit = arr[i]; > } > > The recognized pattern is compound of two statements (and is called > compound pattern): > > # pos_22 = PHI > # limit_24 = PHI > ... > pos_1 = [cond_expr] limit_9 < limit_24 ? pos_10 : pos_22; > limit_4 = [cond_expr] limit_9 < limit_24 ? limit_9 : limit_24; > > both statements should be reductions with cond_expr and have the same > condition part. The min/max statement is expected to be of the form "x op > y ? x : y" (where op can be >, <, >= or <=), and the location is expected > to be an induction. > > To vectorize min/max location pattern we use a technique described in > "Multimedia vectorization of floating-point MIN/MAX reductions" by > A.J.C.Bik, X.Tian and M.B.Girkar, > http://portal.acm.org/citation.cfm?id=1145765. > > Vectorized loop (maxloc, first index): > vcx[0:vl-1:1] = | x |..| x |; - vector of max values > vck[0:vl-1:1] = | k |..| k |; - vector of positions > ind[0:vl-1:1] = |vl-1|..| 0 |; > inc[0:vl-1:1] = | vl |..| vl |; > for (i = 0; i < N; i += vl) { > msk[0:vl-1:1] = (a[i:i+vl-1:1] > vcx[0:vl-1:1]); > vck[0:vl-1:1] = (ind[0:vl-1:1] & msk[0:vl-1:1]) | > (vck[0:vl-1:1] & !msk[0:vl-1:1]); > vcx[0:vl-1:1] = VMAX(vcx[0:vl-1:1], a[i:i+vl-1:1]); > ind[0:vl-1:1] += inc[0:vl-1:1]; > } > x = HMAX(vcx[0:vl-1:1]); - scalar maximum extraction > msk[0:vl-1:1] = (vcx[0:vl-1:1] == |x|..|x|); > vck[0:vl-1:1] = (vck[0:vl-1:1] & msk[0:vl-1:1]) | > (|MaxInt|..|MaxInt| & !msk[0:vl-1:1]); > k = HMIN(vck[0:vl-1:1]); - first position extraction > > > Vectorization of minloc is supposed to help gas_dyn from Polyhedron as > discussed in PR 31067. > > PRs 44710 and 44711 currently prevent the vectorization. PR 44711 can be > bypassed by using -fno-tree-pre. I'll wait for a fix of PR 44710 before I > commit this patch (after I regtest it again). > Also the case of pos = i; instead of pos = i+1; is not supported since in > this case the operands are switched, i.e., we get "x op y ? y : x". > > > My main question is the implementation of vector comparisons. I understand > that different targets can return different types of results. So instead of > defining new tree codes, I used target builtin which also returns the type > of the result. > > Other comments are welcome too. > > Bootstrapped and tested on powerpc64-suse-linux. Since it looks like nobody objects the use of target builtins for vector comparison, I am resubmitting an updated patch (the code) for review of non-vectorizer parts. Thanks, Ira ChangeLog: * doc/tm.texi: Regenerate. * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_VECT_COMPARE): Document. * target.def (builtin_vect_compare): Add new builtin. * tree-vectorizer.h (enum vect_compound_pattern): New. (struct _stmt_vec_info): Add new fields compound_pattern and reduc_scalar_result_stmt. Add macros to access them. (is_pattern_stmt_p): Return true for compound pattern. (vectorizable_condition): Add arguments. (vect_recog_compound_func_ptr): New function-pointer type. (NUM_COMPOUND_PATTERNS): New. (vect_compound_pattern_recog): Declare. * tree-vect-loop.c (vect_determine_vectorization_factor): Fix assert for compound patterns. (vect_analyze_scalar_cycles_1): Fix typo. Detect compound reduction patterns. Update comment. (vect_analyze_scalar_cycles): Update comment. (destroy_loop_vec_info): Update def stmt for the original pattern statement. (vect_is_simple_reduction_1): Skip compound pattern statements in uses check. Add spaces. Skip commutativity and type checks for minimum location statement. Fix printings. (vect_model_reduction_cost): Add min/max location pattern cost computation. (vect_create_epilogue_for_compound_pattern): New function. (vect_create_epilog_for_reduction): Don't retrieve the original statement for compound pattern. Fix comment accordingly. Store the result of vector reduction computation in a variable and use it. Call vect_create_epilogue_for_compound_pattern (). Check if optab exists before using it. Keep the scalar result computation statement. Use either exit phi node result or compound pattern result in scalar extraction. Don't expect to find an exit phi node for min/max statement. (vectorizable_reduction): Skip check for uses in loop for compound patterns. Don't retrieve the original statement for compound pattern. Call vectorizable_condition () with additional parameters. Skip reduction code check for compound patterns. Prepare operands for min/max location statement vectorization and pass them to vectorizable_condition (). (vectorizable_live_operation): Return TRUE for compound patterns. * tree-vect-patterns.c (vect_recog_min_max_loc_pattern): Declare. (vect_recog_compound_func_ptrs): Likewise. (vect_recog_min_max_loc_pattern): New function. (vect_compound_pattern_recog): Likewise. * tree-vect-stmts.c (process_use): Mark compound pattern statements as used by reduction. (vect_mark_stmts_to_be_vectorized): Allow compound pattern statements to be used by reduction. (vectorize_minmax_location_pattern): New function. (vectorizable_condition): Update comment, add arguments. Skip checks irrelevant for compound pattern. Check that vector comparisons are supported by the target. Prepare operands using new arguments. Call vectorize_minmax_location_pattern(). (vect_analyze_stmt): Allow nested cycle statements to be used by reduction. Call vectorizable_condition () with additional arguments. (vect_transform_stmt): Call vectorizable_condition () with additional arguments. (new_stmt_vec_info): Initialize new fields. * config/rs6000/rs6000-builtin.def (ALTIVEC_BUILTIN_VCMPLTFP): New. (ALTIVEC_BUILTIN_VCMPLEFP): New. * config/rs6000/rs6000.c (rs6000_builtin_vect_compare): New. (TARGET_VECTORIZE_BUILTIN_VEC_CMP): Redefine. (struct builtin_description bdesc_2arg): Add altivec_vcmpltfp and altivec_vcmplefp. * config/rs6000/altivec.md (altivec_vcmpltfp): New pattern. (altivec_vcmplefp): Likewise. * tree-vect-slp.c (vect_get_and_check_slp_defs): Fail for compound patterns. (See attached file: minloc-new.txt) > > testsuite/ChangeLog: > > * gcc.dg/vect/vect.exp: Define how to run tests named fast-math*.c > * lib/target-supports.exp (check_effective_target_vect_cmp): New. > * gcc.dg/vect/fast-math-no-pre-minmax-loc-1.c: New test. > * gcc.dg/vect/fast-math-no-pre-minmax-loc-2.c, > gcc.dg/vect/fast-math-no-pre-minmax-loc-3.c, > gcc.dg/vect/fast-math-no-pre-minmax-loc-4.c, > gcc.dg/vect/fast-math-no-pre-minmax-loc-5.c, > gcc.dg/vect/fast-math-no-pre-minmax-loc-6.c, > gcc.dg/vect/fast-math-no-pre-minmax-loc-7.c, > gcc.dg/vect/fast-math-no-pre-minmax-loc-8.c, > gcc.dg/vect/fast-math-no-pre-minmax-loc-9.c: Likewise. >