gcc-patches-owner@gcc.gnu.org wrote on 01/07/2010 11:00:50 AM:

> Hi,
>
> This patch adds vectorization support of min/max location pattern:
>
>   for (i = 0; i < N; i++)
>     if (arr[i] < limit)
>       {
>         pos = i + 1;
>         limit = arr[i];
>       }
>
> The recognized pattern is compound of two statements (and is called
> compound pattern):
>
>   # pos_22 = PHI <pos_1(4), 1(2)>
>   # limit_24 = PHI <limit_4(4), 0(2)>
>   ...
>   pos_1 = [cond_expr] limit_9 < limit_24 ? pos_10 : pos_22;
>   limit_4 = [cond_expr] limit_9 < limit_24 ? limit_9 : limit_24;
>
> both statements should be reductions with cond_expr and have the same
> condition part. The min/max statement is expected to be of the form "x op
> y ? x : y" (where op can be >, <, >= or <=), and the location is expected
> to be an induction.
>
> To vectorize min/max location pattern we use a technique described in
> "Multimedia vectorization of floating-point MIN/MAX reductions" by
> A.J.C.Bik, X.Tian and M.B.Girkar,
> http://portal.acm.org/citation.cfm?id=1145765.
>
> Vectorized loop (maxloc, first index):
>      vcx[0:vl-1:1] = | x |..| x |;  - vector of max values
>      vck[0:vl-1:1] = | k |..| k |;  - vector of positions
>      ind[0:vl-1:1] = |vl-1|..| 0 |;
>      inc[0:vl-1:1] = | vl |..| vl |;
>      for (i = 0; i < N; i += vl) {
>        msk[0:vl-1:1] = (a[i:i+vl-1:1] > vcx[0:vl-1:1]);
>        vck[0:vl-1:1] = (ind[0:vl-1:1] & msk[0:vl-1:1]) |
>                        (vck[0:vl-1:1] & !msk[0:vl-1:1]);
>        vcx[0:vl-1:1] = VMAX(vcx[0:vl-1:1], a[i:i+vl-1:1]);
>        ind[0:vl-1:1] += inc[0:vl-1:1];
>      }
>      x = HMAX(vcx[0:vl-1:1]);       - scalar maximum extraction
>      msk[0:vl-1:1] = (vcx[0:vl-1:1] == |x|..|x|);
>      vck[0:vl-1:1] = (vck[0:vl-1:1] & msk[0:vl-1:1]) |
>                      (|MaxInt|..|MaxInt| & !msk[0:vl-1:1]);
>      k = HMIN(vck[0:vl-1:1]);       - first position extraction
>
>
> Vectorization of minloc is supposed to help gas_dyn from Polyhedron as
> discussed in PR 31067.
>
> PRs 44710 and 44711 currently prevent the vectorization. PR 44711 can be
> bypassed by using -fno-tree-pre. I'll wait for a fix of PR 44710 before I
> commit this patch (after I regtest it again).
> Also the case of pos = i; instead of pos = i+1; is not supported since in
> this case the operands are switched, i.e., we get "x op y ? y : x".
>
>
> My main question is the implementation of vector comparisons. I
understand
> that different targets can return different types of results. So instead
of
> defining new tree codes, I used target builtin which also returns the
type
> of the result.
>
> Other comments are welcome too.
>
> Bootstrapped and tested on powerpc64-suse-linux.

Since it looks like nobody objects the use of target builtins for vector
comparison, I am resubmitting an updated patch (the code) for review of
non-vectorizer parts.

Thanks,
Ira


ChangeLog:

      * doc/tm.texi: Regenerate.
      * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_VECT_COMPARE):
      Document.
      * target.def (builtin_vect_compare): Add new builtin.
      * tree-vectorizer.h (enum vect_compound_pattern): New.
      (struct _stmt_vec_info): Add new fields compound_pattern and
      reduc_scalar_result_stmt. Add macros to access them.
      (is_pattern_stmt_p): Return true for compound pattern.
      (vectorizable_condition): Add arguments.
      (vect_recog_compound_func_ptr): New function-pointer type.
      (NUM_COMPOUND_PATTERNS): New.
      (vect_compound_pattern_recog): Declare.
      * tree-vect-loop.c (vect_determine_vectorization_factor): Fix assert
      for compound patterns.
      (vect_analyze_scalar_cycles_1): Fix typo. Detect compound reduction
      patterns. Update comment.
      (vect_analyze_scalar_cycles): Update comment.
      (destroy_loop_vec_info): Update def stmt for the original pattern
      statement.
      (vect_is_simple_reduction_1): Skip compound pattern statements in
      uses check. Add spaces. Skip commutativity and type checks for
      minimum location statement. Fix printings.
      (vect_model_reduction_cost): Add min/max location pattern cost
      computation.
      (vect_create_epilogue_for_compound_pattern): New function.
      (vect_create_epilog_for_reduction): Don't retrieve the original
      statement for compound pattern. Fix comment accordingly. Store the
      result of vector reduction computation in a variable and use it. Call
      vect_create_epilogue_for_compound_pattern (). Check if optab exists
      before using it. Keep the scalar result computation statement. Use
      either exit phi node result or compound pattern result in scalar
      extraction. Don't expect to find an exit phi node for min/max
      statement.
      (vectorizable_reduction): Skip check for uses in loop for compound
      patterns. Don't retrieve the original statement for compound pattern.
      Call vectorizable_condition () with additional parameters. Skip
      reduction code check for compound patterns. Prepare operands for
      min/max location statement vectorization and pass them to
      vectorizable_condition ().
      (vectorizable_live_operation): Return TRUE for compound patterns.
      * tree-vect-patterns.c (vect_recog_min_max_loc_pattern): Declare.
      (vect_recog_compound_func_ptrs): Likewise.
      (vect_recog_min_max_loc_pattern): New function.
      (vect_compound_pattern_recog): Likewise.
      * tree-vect-stmts.c (process_use): Mark compound pattern statements
      as used by reduction.
      (vect_mark_stmts_to_be_vectorized): Allow compound pattern statements
      to be used by reduction.
      (vectorize_minmax_location_pattern): New function.
      (vectorizable_condition): Update comment, add arguments. Skip checks
      irrelevant for compound pattern. Check that vector comparisons are
      supported by the target. Prepare operands using new arguments. Call
      vectorize_minmax_location_pattern().
      (vect_analyze_stmt): Allow nested cycle statements to be used by
      reduction. Call vectorizable_condition () with additional arguments.
      (vect_transform_stmt): Call vectorizable_condition () with additional
      arguments.
      (new_stmt_vec_info): Initialize new fields.
      * config/rs6000/rs6000-builtin.def (ALTIVEC_BUILTIN_VCMPLTFP): New.
      (ALTIVEC_BUILTIN_VCMPLEFP): New.
      * config/rs6000/rs6000.c (rs6000_builtin_vect_compare): New.
      (TARGET_VECTORIZE_BUILTIN_VEC_CMP): Redefine.
      (struct builtin_description bdesc_2arg): Add altivec_vcmpltfp and
      altivec_vcmplefp.
      * config/rs6000/altivec.md (altivec_vcmpltfp): New pattern.
      (altivec_vcmplefp): Likewise.
      * tree-vect-slp.c (vect_get_and_check_slp_defs): Fail for compound
      patterns.

(See attached file: minloc-new.txt)

>
> testsuite/ChangeLog:
>
>       * gcc.dg/vect/vect.exp: Define how to run tests named fast-math*.c
>       * lib/target-supports.exp (check_effective_target_vect_cmp): New.
>       * gcc.dg/vect/fast-math-no-pre-minmax-loc-1.c: New test.
>       * gcc.dg/vect/fast-math-no-pre-minmax-loc-2.c,
>       gcc.dg/vect/fast-math-no-pre-minmax-loc-3.c,
>       gcc.dg/vect/fast-math-no-pre-minmax-loc-4.c,
>       gcc.dg/vect/fast-math-no-pre-minmax-loc-5.c,
>       gcc.dg/vect/fast-math-no-pre-minmax-loc-6.c,
>       gcc.dg/vect/fast-math-no-pre-minmax-loc-7.c,
>       gcc.dg/vect/fast-math-no-pre-minmax-loc-8.c,
>       gcc.dg/vect/fast-math-no-pre-minmax-loc-9.c: Likewise.
>