From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 22877 invoked by alias); 23 Sep 2013 18:23:32 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 22825 invoked by uid 48); 23 Sep 2013 18:23:28 -0000
From: "congh at google dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/58508] New: Redundant vector load of "actual" loop invariant in loop body.
Date: Mon, 23 Sep 2013 18:23:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 4.9.0
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: congh at google dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2013-09/txt/msg01663.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58508

            Bug ID: 58508
           Summary: Redundant vector load of "actual" loop invariant in
                    loop body.
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: congh at google dot com

When GCC vectorizes the loop below, it first performs loop versioning with
an aliasing check on a and b. Since a and b have different strides (1 and
0), the check guarantees that there is no aliasing between a and b across
all iterations.
With this precondition, *b becomes a loop invariant and can be loaded
outside the loop during vectorization. (Note that this precondition always
holds when the loop is being vectorized.) This saves a load and a shuffle
instruction in each iteration.


void foo (int* a, int* b, int n)
{
  for (int i = 0; i < n; ++i)
    a[i] += *b;
}


I have a patch that handles this case as an optimization. After loop
versioning, I detect all zero-strided data references and hoist their loads
to the loop preheader. The patch is shown below.


thanks,
Cong


Index: gcc/tree-vect-loop-manip.c
===================================================================
--- gcc/tree-vect-loop-manip.c (revision 202662)
+++ gcc/tree-vect-loop-manip.c (working copy)
@@ -2477,6 +2477,37 @@ vect_loop_versioning (loop_vec_info loop
       adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi));
     }
 
+  /* Extract load and store statements on pointers with zero-stride
+     accesses.  */
+  if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
+    {
+
+      /* In the loop body, we iterate each statement to check if it is a load
+         or store.  Then we check the DR_STEP of the data reference.  If
+         DR_STEP is zero, then we will hoist the load statement to the loop
+         preheader, and move the store statement to the loop exit.  */
+
+      for (gimple_stmt_iterator si = gsi_start_bb (loop->header);
+           !gsi_end_p (si); )
+        {
+          gimple stmt = gsi_stmt (si);
+          stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+          struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
+
+          if (dr && integer_zerop (DR_STEP (dr)))
+            {
+              if (DR_IS_READ (dr))
+                {
+                  basic_block preheader = loop_preheader_edge (loop)->src;
+                  gimple_stmt_iterator si_dst = gsi_last_bb (preheader);
+                  gsi_move_after (&si, &si_dst);
+                }
+            }
+          else
+            gsi_next (&si);
+        }
+    }
+
   /* End loop-exit-fixes after versioning.  */
 
   if (cond_expr_stmt_list)
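
For illustration only, the effect described in this report can be written out by hand at the source level. The sketch below is not GCC output and not part of the patch. The names foo_ref and foo_versioned and the byte-range overlap test are assumptions standing in for the runtime alias check that loop versioning inserts. It shows that once the check proves *b lies outside a[0..n), the load of *b can be done once before the loop.

```c
#include <stdint.h>

/* The loop from the report: without alias information, *b must be
   reloaded on every iteration because a store to a[i] may change it.  */
void foo_ref (int *a, int *b, int n)
{
  for (int i = 0; i < n; ++i)
    a[i] += *b;
}

/* Hand-written analogue of the versioned loop (hypothetical helper).
   The guard plays the role of the runtime alias check.  */
void foo_versioned (int *a, int *b, int n)
{
  if (n > 0)
    {
      uintptr_t lo = (uintptr_t) a;
      uintptr_t hi = (uintptr_t) (a + n);
      uintptr_t pb = (uintptr_t) b;

      /* *b does not overlap any byte of a[0..n).  */
      if (pb + sizeof (int) <= lo || pb >= hi)
        {
          int invariant = *b;   /* load hoisted out of the loop */
          for (int i = 0; i < n; ++i)
            a[i] += invariant;
          return;
        }
    }

  /* Fallback version: possible aliasing, keep the load in the loop.  */
  for (int i = 0; i < n; ++i)
    a[i] += *b;
}
```

Both functions should produce identical results for any inputs, including the case where b points into a. Only the first branch of foo_versioned is a candidate for the hoisting discussed above.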