From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 22696 invoked by alias); 14 Jan 2014 09:01:43 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 22682 invoked by uid 89); 14 Jan 2014 09:01:42 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-3.4 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD autolearn=ham version=3.3.2 X-HELO: mx2.suse.de Received: from cantor2.suse.de (HELO mx2.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (CAMELLIA256-SHA encrypted) ESMTPS; Tue, 14 Jan 2014 09:01:41 +0000 Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id BA40BAC55; Tue, 14 Jan 2014 09:01:38 +0000 (UTC) Date: Tue, 14 Jan 2014 09:01:00 -0000 From: Richard Biener To: Cong Hou cc: Jakub Jelinek , GCC Patches Subject: Re: [PATCH] Fixing PR59006 and PR58921 by delaying loop invariant hoisting in vectorizer. In-Reply-To: Message-ID: References: <20131127113730.GZ892@tucnak.redhat.com> User-Agent: Alpine 2.11 (LSU 23 2013-08-11) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-SW-Source: 2014-01/txt/msg00763.txt.bz2 On Mon, 13 Jan 2014, Cong Hou wrote: > I noticed that LIM could not hoist vector invariant, and that is why > my first implementation tries to hoist them all. Yes, I filed PR59786 for this. I'll see if I can come up with a fix suitable for stage3. > In addition, there are two disadvantages of hoisting invariant load + > lim method: > > First, for some instructions the scalar version is faster than the > vector version, and in this case hoisting scalar instructions before > vectorization is better. Those instructions include data > packing/unpacking, integer multiplication with SSE2, etc.. > > Second, it may use more SIMD registers. > > The following code shows a simple example: > > char *a, *b, *c; > for (int i = 0; i < N; ++i) > a[i] = b[0] * c[0] + a[i]; > > Vectorizing b[0]*c[0] is worse than loading the result of b[0]*c[0] > into a vector. Yes. I've tried with adjusting the vec_def_type as in the prototype patch I sent before christmas but that's quite intrusive for this stage. You could argue that performing invariant motion is not really the vectorizers main task and that a combination of hoisting only the load, later LIM hoisting the rest and then tree-vect-generic.c demoting vector ops to scalar ops (unimplemented, but also a useful general optimization) would work as well. That said, we should definitely have a second look for 4.10. For now hoisting the load is an improvement over 4.8 (at least I hope so ;)) which needs to be good enough for 4.9. I had to fix a latent bug to cure some testsuite fallout so the following is what I ended up committing. Jakub, adding the new flag is ok with me. Thanks, Richard. 2014-01-14 Richard Biener PR tree-optimization/58921 PR tree-optimization/59006 * tree-vect-loop-manip.c (vect_loop_versioning): Remove code hoisting invariant stmts. * tree-vect-stmts.c (vectorizable_load): Insert the splat of invariant loads on the preheader edge if possible. * gcc.dg/torture/pr58921.c: New testcase. * gcc.dg/torture/pr59006.c: Likewise. * gcc.dg/vect/pr58508.c: XFAIL no longer handled cases. Index: gcc/tree-vect-loop-manip.c =================================================================== *** gcc/tree-vect-loop-manip.c (revision 206576) --- gcc/tree-vect-loop-manip.c (working copy) *************** vect_loop_versioning (loop_vec_info loop *** 2435,2507 **** } } - - /* Extract load statements on memrefs with zero-stride accesses. */ - - if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)) - { - /* In the loop body, we iterate each statement to check if it is a load. - Then we check the DR_STEP of the data reference. If DR_STEP is zero, - then we will hoist the load statement to the loop preheader. */ - - basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo); - int nbbs = loop->num_nodes; - - for (int i = 0; i < nbbs; ++i) - { - for (gimple_stmt_iterator si = gsi_start_bb (bbs[i]); - !gsi_end_p (si);) - { - gimple stmt = gsi_stmt (si); - stmt_vec_info stmt_info = vinfo_for_stmt (stmt); - struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); - - if (is_gimple_assign (stmt) - && (!dr - || (DR_IS_READ (dr) && integer_zerop (DR_STEP (dr))))) - { - bool hoist = true; - ssa_op_iter iter; - tree var; - - /* We hoist a statement if all SSA uses in it are defined - outside of the loop. */ - FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_USE) - { - gimple def = SSA_NAME_DEF_STMT (var); - if (!gimple_nop_p (def) - && flow_bb_inside_loop_p (loop, gimple_bb (def))) - { - hoist = false; - break; - } - } - - if (hoist) - { - if (dr) - gimple_set_vuse (stmt, NULL); - - gsi_remove (&si, false); - gsi_insert_on_edge_immediate (loop_preheader_edge (loop), - stmt); - - if (dump_enabled_p ()) - { - dump_printf_loc - (MSG_NOTE, vect_location, - "hoisting out of the vectorized loop: "); - dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0); - dump_printf (MSG_NOTE, "\n"); - } - continue; - } - } - gsi_next (&si); - } - } - } - /* End loop-exit-fixes after versioning. */ if (cond_expr_stmt_list) --- 2435,2440 ---- Index: gcc/tree-vect-stmts.c =================================================================== *** gcc/tree-vect-stmts.c (revision 206576) --- gcc/tree-vect-stmts.c (working copy) *************** vectorizable_load (gimple stmt, gimple_s *** 6368,6379 **** /* 4. Handle invariant-load. */ if (inv_p && !bb_vinfo) { - gimple_stmt_iterator gsi2 = *gsi; gcc_assert (!grouped_load); ! gsi_next (&gsi2); ! new_temp = vect_init_vector (stmt, scalar_dest, ! vectype, &gsi2); new_stmt = SSA_NAME_DEF_STMT (new_temp); } if (negative) --- 6368,6406 ---- /* 4. Handle invariant-load. */ if (inv_p && !bb_vinfo) { gcc_assert (!grouped_load); ! /* If we have versioned for aliasing then we are sure ! this is a loop invariant load and thus we can insert ! it on the preheader edge. */ ! if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)) ! { ! if (dump_enabled_p ()) ! { ! dump_printf_loc (MSG_NOTE, vect_location, ! "hoisting out of the vectorized " ! "loop: "); ! dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0); ! dump_printf (MSG_NOTE, "\n"); ! } ! tree tem = copy_ssa_name (scalar_dest, NULL); ! gsi_insert_on_edge_immediate ! (loop_preheader_edge (loop), ! gimple_build_assign (tem, ! unshare_expr ! (gimple_assign_rhs1 (stmt)))); ! new_temp = vect_init_vector (stmt, tem, vectype, NULL); ! } ! else ! { ! gimple_stmt_iterator gsi2 = *gsi; ! gsi_next (&gsi2); ! new_temp = vect_init_vector (stmt, scalar_dest, ! vectype, &gsi2); ! } new_stmt = SSA_NAME_DEF_STMT (new_temp); + set_vinfo_for_stmt (new_stmt, + new_stmt_vec_info (new_stmt, loop_vinfo, + bb_vinfo)); } if (negative) Index: gcc/testsuite/gcc.dg/torture/pr58921.c =================================================================== *** gcc/testsuite/gcc.dg/torture/pr58921.c (revision 0) --- gcc/testsuite/gcc.dg/torture/pr58921.c (working copy) *************** *** 0 **** --- 1,17 ---- + /* { dg-do compile } */ + + int a[7]; + int b; + + void + fn1 () + { + for (; b; b++) + a[b] = ((a[b] <= 0) == (a[0] != 0)); + } + + int + main () + { + return 0; + } Index: gcc/testsuite/gcc.dg/torture/pr59006.c =================================================================== *** gcc/testsuite/gcc.dg/torture/pr59006.c (revision 0) --- gcc/testsuite/gcc.dg/torture/pr59006.c (working copy) *************** *** 0 **** --- 1,13 ---- + /* { dg-do compile } */ + + int a[8], b; + void fn1(void) + { + int c; + for (; b; b++) + { + int d = a[b]; + c = a[0] ? d : 0; + a[b] = c; + } + } Index: gcc/testsuite/gcc.dg/vect/pr58508.c =================================================================== *** gcc/testsuite/gcc.dg/vect/pr58508.c (revision 206576) --- gcc/testsuite/gcc.dg/vect/pr58508.c (working copy) *************** void test5 (int* a, int* b) *** 66,70 **** } } ! /* { dg-final { scan-tree-dump-times "hoist" 8 "vect" { xfail vect_no_align } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */ --- 66,71 ---- } } ! /* { dg-final { scan-tree-dump-times "hoist" 8 "vect" { xfail *-*-* } } } */ ! /* { dg-final { scan-tree-dump-times "hoist" 3 "vect" { xfail vect_no_align } } } */ /* { dg-final { cleanup-tree-dump "vect" } } */