From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 22518 invoked by alias); 15 Sep 2015 12:09:56 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 21242 invoked by uid 89); 15 Sep 2015 12:09:55 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-yk0-f169.google.com Received: from mail-yk0-f169.google.com (HELO mail-yk0-f169.google.com) (209.85.160.169) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Tue, 15 Sep 2015 12:09:53 +0000 Received: by ykdg206 with SMTP id g206so182416843ykd.1 for ; Tue, 15 Sep 2015 05:09:51 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.170.74.85 with SMTP id q82mr20428536ykq.94.1442318991570; Tue, 15 Sep 2015 05:09:51 -0700 (PDT) Received: by 10.37.93.136 with HTTP; Tue, 15 Sep 2015 05:09:51 -0700 (PDT) In-Reply-To: References: Date: Tue, 15 Sep 2015 12:10:00 -0000 Message-ID: Subject: Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947) From: Richard Biener To: Alan Hayward Cc: "gcc-patches@gcc.gnu.org" Content-Type: text/plain; charset=UTF-8 X-IsSubscribed: yes X-SW-Source: 2015-09/txt/msg01053.txt.bz2 On Thu, Sep 10, 2015 at 4:51 PM, Alan Hayward wrote: > Hi, > This patch (attached) adds support for vectorizing conditional expressions > (PR 65947), for example: > > int condition_reduction (int *a, int min_v) > { > int last = 0; > for (int i = 0; i < N; i++) > if (a[i] < min_v) > last = a[i]; > return last; > } > > To do this the loop is vectorised to create a vector of data results (ie > of matching a[i] values). Using an induction variable, an additional > vector is added containing the indexes where the matches occured. In the > function epilogue this is reduced to a single max value and then used to > index into the vector of data results. > When no values are matched in the loop, the indexes vector will contain > all zeroes, eventually matching the first entry in the data results vector. > > To vectorize sucessfully, support is required for REDUC_MAX_EXPR. This is > supported by aarch64 and arm. On X86 and powerpc, gcc will complain that > REDUC_MAX_EXPR is not supported for the required modes, failing the > vectorization. On mips it complains that the required vcond expression is > not supported. It is suggested the relevant backend experts add the > required backend support. > > Using a simple testcase based around a large number of N and run on an > aarch64 juno board, with the patch in use, the runtime reduced to 0.8 of > it's original time. > > This patch caused binary differences in three spec2006 binaries on aarch64 > - 4.16.gamess, 435.gromacs and 456.hmmer. Running them on a juno board > showed no improvement or degregation in runtime. > > > In the near future I hope to submit a further patch (as PR 66558) which > optimises the case where the result is simply the index of the loop, for > example: > int condition_reduction (int *a, int min_v) > { > int last = 0; > for (int i = 0; i < N; i++) > if (a[i] < min_v) > last = i; > return last; > } > In this case a lot of the new code can be optimized away. > > I have run check for aarch64, arm and x86 and have seen no regressions. Now comments on the patch itself. + if (code == COND_EXPR) + *v_reduc_type = COND_REDUCTION; so why not simply use COND_EXPR instead of the new v_reduc_type? + if (check_reduction && code != COND_EXPR && + vect_is_slp_reduction (loop_info, phi, def_stmt)) &&s go to the next line + /* Reduction of the max index and a reduction of the found + values. */ + epilogue_cost += add_stmt_cost (target_cost_data, 1, + vec_to_scalar, stmt_info, 0, + vect_epilogue); vec_to_scalar once isn't what the comment suggests. Instead the comment suggests twice what a regular reduction would do but I guess we can "hide" the vec_to_scalar cost and "merge" it with the broadcast. Thus make the above two vector_stmt costs? + /* A broadcast of the max value. */ + epilogue_cost += add_stmt_cost (target_cost_data, 2, + scalar_to_vec, stmt_info, 0, + vect_epilogue); comment suggests a single broadcast. @@ -3705,7 +3764,7 @@ get_initial_def_for_induction (gimple iv_phi) the final vector of induction results: */ exit_phi = NULL; FOR_EACH_IMM_USE_FAST (use_p, imm_iter, loop_arg) - { + { gimple use_stmt = USE_STMT (use_p); if (is_gimple_debug (use_stmt)) continue; please avoid unrelated whitespace changes. + case COND_EXPR: + if (v_reduc_type == COND_REDUCTION) + { ... + /* Fall through. */ + case MIN_EXPR: case MAX_EXPR: - case COND_EXPR: aww, so we could already handle COND_EXPR reductions? How do they differ from what you add? Did you see if that path is even exercised today? + /* Create a vector of {init_value, 0, 0, 0...}. */ + vec *v; + vec_alloc (v, nunits); + CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, init_val); + if (SCALAR_FLOAT_TYPE_P (scalar_type)) + for (i = 1; i < nunits; ++i) + CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, + build_real (scalar_type, dconst0)); + else + for (i = 1; i < nunits; ++i) + CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, + build_int_cst (scalar_type, 0)); + init_def = build_constructor (vectype, v); you can unify the float/int case by using build_zero_cst (scalar_type). Note that you should build a vector constant instead of a constructor if init_val is a constant. The convenient way is to build the vector elements into a tree[] array and use build_vector_stat in that case. + /* Find maximum value from the vector of found indexes. */ + tree max_index = make_temp_ssa_name (index_scalar_type, NULL, ""); just use make_ssa_name (index_scalar_type); + /* Convert the vector of data to the same type as the EQ. */ + tree vec_data_cast; + if ( TYPE_UNSIGNED (index_vec_type)) + { How come it never happens the element sizes do not match? (int index type and double data type?) + /* Where the max index occured, use the value from the data vector. */ + tree vec_and = make_temp_ssa_name (index_vec_type_signed, NULL, ""); + gimple vec_and_stmt = gimple_build_assign (vec_and, BIT_AND_EXPR, + vec_compare, vec_data_cast); that is, don't you need to do some widening/shortening on the comparison result? (also what happens in the case "or all of the values"?) Definitely too much VIEW_CONVERT_MAGIC here for my taste ;) + This function also handles reduction of condition expressions, for example: + for (int i = 0; i < N; i++) + if (a[i] < value) + last = a[i]; + This is handled by vectorising the loop and creating an additional vector + containing the loop indexes for which "a[i] < value" was true. In the + function epilogue this is reduced to a single max value and then used to + index into the vector of results. I miss a comment that shows the kind of code we transform this to. "an additional vector containing the loop indexes" can't work - the vector will not be large enough ;) Naiively I would have made 'last' a vector, performing the reduction element-wise and in the epilogue reduce 'last' itself. And it looks like we are already doing that for int foo (int *a) { int val = 0; for (int i = 0; i < 1024; ++i) if (a[i] > val) val = a[i]; return val; } I must be missing something. Yes, I think we can't do index reduction yet, but pr65947-10.c is alrady handled? Stopping here. Thanks, Richard. > > Changelog: > > 2015-08-28 Alan Hayward > > PR tree-optimization/65947 > * tree-vect-loop.c > (vect_is_simple_reduction_1): Find condition reductions. > (vect_model_reduction_cost): Add condition reduction costs. > (get_initial_def_for_reduction): Add condition reduction initial > var. > (vect_create_epilog_for_reduction): Add condition reduction epilog. > (vectorizable_reduction): Condition reduction support. > * tree-vect-stmts.c > (vectorizable_condition): Add vect reduction arg > * doc/sourcebuild.texi (Vector-specific attributes): Document > vect_max_reduc > > testsuite/Changelog: > > PR tree-optimization/65947 > * lib/target-supports.exp > (check_effective_target_vect_max_reduc): Add. > * gcc.dg/vect/pr65947-1.c: New test. > * gcc.dg/vect/pr65947-2.c: New test. > * gcc.dg/vect/pr65947-3.c: New test. > * gcc.dg/vect/pr65947-4.c: New test. > * gcc.dg/vect/pr65947-5.c: New test. > * gcc.dg/vect/pr65947-6.c: New test. > * gcc.dg/vect/pr65947-7.c: New test. > * gcc.dg/vect/pr65947-8.c: New test. > * gcc.dg/vect/pr65947-9.c: New test. > * gcc.dg/vect/pr65947-10.c: New test. > * gcc.dg/vect/pr65947-11.c: New test. > > > > Thanks, > Alan > >