From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 115513 invoked by alias); 1 Jun 2015 08:12:30 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 115503 invoked by uid 89); 1 Jun 2015 08:12:28 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.9 required=5.0 tests=AWL,BAYES_50,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: relay1.mentorg.com Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 01 Jun 2015 08:12:24 +0000 Received: from nat-ies.mentorg.com ([192.94.31.2] helo=SVR-IES-FEM-01.mgc.mentorg.com) by relay1.mentorg.com with esmtp id 1YzKpW-0005Ag-Ku from Tom_deVries@mentor.com ; Mon, 01 Jun 2015 01:12:19 -0700 Received: from [127.0.0.1] (137.202.0.76) by SVR-IES-FEM-01.mgc.mentorg.com (137.202.0.104) with Microsoft SMTP Server id 14.3.224.2; Mon, 1 Jun 2015 09:12:16 +0100 Message-ID: <556C13DC.50406@mentor.com> Date: Mon, 01 Jun 2015 08:12:00 -0000 From: Tom de Vries User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Richard Biener CC: GCC Patches , Jakub Jelinek Subject: [gomp4,committed,PR65443] Add transform_to_exit_first_loop_al References: <551564D0.2090308@mentor.com> <551E8A1F.5050908@mentor.com> <552E69ED.7020601@mentor.com> <55548A19.3000302@mentor.com> In-Reply-To: <55548A19.3000302@mentor.com> Content-Type: multipart/mixed; boundary="------------090404040505050607080209" X-SW-Source: 2015-06/txt/msg00010.txt.bz2 --------------090404040505050607080209 Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit Content-length: 6697 [ was: Re: [PING][PATCH][PR65443] Add transform_to_exit_first_loop_alt ] On 14/05/15 13:42, Tom de Vries wrote: > On 20-04-15 14:25, Richard Biener wrote: >> On Wed, 15 Apr 2015, Tom de Vries wrote: >> >>> On 03-04-15 14:39, Tom de Vries wrote: >>>> On 27-03-15 15:10, Tom de Vries wrote: >>>>> Hi, >>>>> >>>>> this patch fixes PR65443, a todo in the parloops pass for function >>>>> transform_to_exit_first_loop: >>>>> ... >>>>> TODO: the common case is that latch of the loop is empty and >>>>> immediately >>>>> follows the loop exit. In this case, it would be better not >>>>> to copy >>>>> the >>>>> body of the loop, but only move the entry of the loop directly >>>>> before >>>>> the >>>>> exit check and increase the number of iterations of the loop >>>>> by one. >>>>> This may need some additional preconditioning in case NIT = ~0. >>>>> ... >>>>> >>>>> The current implementation of transform_to_exit_first_loop >>>>> transforms the >>>>> loop >>>>> by moving and duplicating the loop body. This patch transforms the >>>>> loop >>>>> into the >>>>> same normal form, but without the duplication, if that's possible >>>>> (determined by >>>>> try_transform_to_exit_first_loop_alt). >>>>> >>>>> The actual transformation, done by transform_to_exit_first_loop_alt >>>>> transforms >>>>> this loop: >>>>> ... >>>>> bb preheader: >>>>> ... >>>>> goto >>>>> >>>>> : >>>>> ... >>>>> if (ivtmp < n) >>>>> goto ; >>>>> else >>>>> goto ; >>>>> >>>>> : >>>>> ivtmp = ivtmp + inc; >>>>> goto >>>>> ... >>>>> >>>>> into this one: >>>>> ... >>>>> bb preheader: >>>>> ... >>>>> goto >>>>> >>>>> : >>>>> ... >>>>> goto ; >>>>> >>>>> : >>>>> if (ivtmp < n + 1) >>>>> goto ; >>>>> else >>>>> goto ; >>>>> >>>>> : >>>>> ivtmp = ivtmp + inc; >>>>> goto >>>>> ... >>>>> >>>> >>>> Updated patch, now using redirect_edge_var_map and flush_pending_stmts. >>>> >>>> Bootstrapped and reg-tested on x86_64. >>>> >>>> OK for stage1 trunk? >>>> >>> >>> Ping. >> > > Hi Richard, > > thanks for the review. > >> +static void >> +replace_imm_uses (tree val, imm_use_iterator *imm_iter) >> +{ >> + use_operand_p use_p; >> + >> + FOR_EACH_IMM_USE_ON_STMT (use_p, *imm_iter) >> + SET_USE (use_p, val); >> >> Use propagate_value. Why this odd interface passing a imm_iter?! >> > > The replace_imm_uses function is common code factored out of > replace_uses_in_bb_by and replace_uses_in_bbs_by. I'm not sure what is > odd about the interface of the replace_imm_uses function, but I've > removed the function. > > I tried using propagate_value, but that one doesn't like virtuals. > >> In fact most of the "repair SSA" in transform_to_exit_first_loop_alt >> looks too ad-hoc to me ... it also looks somewhat familiar to other >> code ... >> >> Unfortunately the comment before the function isn't in SSA form >> so it's hard to follow the transform. >> > > I've added the ssa bits in the transformation comment before the > function, and updated variable names and comments in the function. > >> I consider the parloops code bitrotten, no repair possible, so >> I might be convinced to not care about new spaghetti in there... >> >> + /* Fix up loop structure. TODO: Check whether this is sufficient. */ >> + loop->header = new_header; >> + >> >> no, surely not. Number of iterations (and estimates) are off >> after the transform > > The loop is cancelled at the end of gen_parallel_loop, and the > estimations are removed. We only seem to be using a limited set of the > loop data until the loop is cancelled. Updated comment accordingly. > >> and loop->latch might also need updating. >> > > That one actually stays the same. Updated comment accordingly. > >> "Safest" is to simply schedule a fixup (but you'll lose any >> loop annotation in that process). >> > > Since the loop is cancelled, AFAIU we don't need that, but... You're > referring to the annotations mentioned in > replace_loop_annotate_in_block? Losing the annotations seems to be a > generic problem of scheduling such a fixup, not particular to this > patch. Do you have a concern related to the patch and these annotations? > >> + /* Figure out whether nit + 1 overflows. */ >> + if (TREE_CODE (nit) == INTEGER_CST) >> + { >> + if (!tree_int_cst_equal (nit, TYPE_MAXVAL (nit_type))) >> >> in case nit_type is a pointer type TYPE_MAXVAL will be NULL I think. >> > > We've done ivcanon just before this point, so AFAIU we can assume we're > dealing with an unsigned integer. > >> Is the whole exercise for performance? > > I'm trying to fix the todo in the code for parloops, which is about > performance, though I don't expect a large gain. > > OTOH, my main focus is a related oacc kernels issue. For an oacc kernels > region, the entire loop is enclosed in a kernels region. Peeling of the > last iteration means I have to guard that iteration to run on only one > thread, which currently isn't done, so in that sense it's a correctness > issue as well. > >> In that case using an >> entirely new, unsigned IV, that runs from 0 to niter should be easiest >> and > > Introducting such an IV would mean that we don't have to update the IV, > but still we have to connect the new IV to the uses of the old IV. > > The current special handling of the IV variable is actually not that > elaborate: > ... > + /* Replace ivtmp_a with ivtmp_c in condition 'if (ivtmp_a < n)'. */ > + replace_uses_in_bb_by (res_a, res_c, new_header); > ... > So I'm not sure it's easier. > > just run-time guard that niter == +INF case? > > That doesn't work out nicely for the oacc kernels case. I need to know > compile-time wheter I can parallelize or not. > >> For the graphite case, can't you make graphite generate the >> loops exit-first in the first place? >> > > Perhaps, but that doesn't make a difference for the oacc kernels case. > >> The testcases are just correctness ones? That is, there isn't >> any testcase that checks whether the new code is exercised? >> (no extra debugging dumping?) >> > > There are 3 new test patterns, each with a libgomp/testsuite/libgomp.c > and a gcc/testsuite/gcc.dg variant, so 6 new test-cases in total. The > gcc/testsuite/gcc.dg variant checks that the new code is exercised. The > libgomp/testsuite/libgomp.c variant checks that the new code generates > correct code. > Committed to gomp-4_0-branch. Thanks, - Tom --------------090404040505050607080209 Content-Type: text/x-patch; name="0001-Add-transform_to_exit_first_loop_alt.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0001-Add-transform_to_exit_first_loop_alt.patch" Content-length: 19918 Add transform_to_exit_first_loop_alt 2015-05-28 Tom de Vries PR tree-optimization/65443 * tree-parloops.c (replace_imm_uses, replace_uses_in_bb_by) (replace_uses_in_bbs_by, transform_to_exit_first_loop_alt) (try_transform_to_exit_first_loop_alt): New function. (transform_to_exit_first_loop): Use try_transform_to_exit_first_loop_alt. * gcc.dg/parloops-exit-first-loop-alt-2.c: New test. * gcc.dg/parloops-exit-first-loop-alt-3.c: New test. * gcc.dg/parloops-exit-first-loop-alt.c: New test. * testsuite/libgomp.c/parloops-exit-first-loop-alt-2.c: New test. * testsuite/libgomp.c/parloops-exit-first-loop-alt-3.c: New test. * testsuite/libgomp.c/parloops-exit-first-loop-alt.c: New test. diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-2.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-2.c new file mode 100644 index 0000000..a4d6cb2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-2.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target pthread } */ +/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops" } */ + +#define N 1000 + +unsigned int a[N]; +unsigned int b[N]; +unsigned int c[N]; + +void __attribute__((noclone,noinline)) +f (unsigned int n) +{ + int i; + + for (i = 0; i < N; ++i) + c[i] = a[i] + b[i]; +} + +/* Three times three array accesses: + - three in f._loopfn.0 + - three in the parallel + - three in the low iteration count loop + Crucially, none for a peeled off last iteration following the parallel. */ +/* { dg-final { scan-tree-dump-times "(?n)\\\[i" 9 "parloops" } } */ + +/* { dg-final { cleanup-tree-dump "parloops" } } */ diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c new file mode 100644 index 0000000..c7ab51f88 --- /dev/null +++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target pthread } */ +/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops" } */ + +unsigned int *a; + +unsigned int __attribute__((noclone,noinline)) +f (unsigned int n) +{ + int i; + unsigned int sum = 1; + + for (i = 0; i < n; ++i) + sum += a[i]; + + return sum; +} + +/* Three array accesses: + - one in f._loopfn.0 + - one in the parallel + - one in the low iteration count loop + Crucially, none for a peeled off last iteration following the parallel. */ +/* { dg-final { scan-tree-dump-times "(?n)\\\* 4" 3 "parloops" } } */ + +/* { dg-final { cleanup-tree-dump "parloops" } } */ diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c new file mode 100644 index 0000000..42ca3ac --- /dev/null +++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target pthread } */ +/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops" } */ + +#define N 1000 + +unsigned int a[N]; +unsigned int b[N]; +unsigned int c[N]; + +void __attribute__((noclone,noinline)) +f (unsigned int n) +{ + int i; + + for (i = 0; i < n; ++i) + c[i] = a[i] + b[i]; +} + +/* Three times three array accesses: + - three in f._loopfn.0 + - three in the parallel + - three in the low iteration count loop + Crucially, none for a peeled off last iteration following the parallel. */ +/* { dg-final { scan-tree-dump-times "(?n)\\\[i" 9 "parloops" } } */ + +/* { dg-final { cleanup-tree-dump "parloops" } } */ diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index e10179d..f698eea 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -78,6 +78,7 @@ along with GCC; see the file COPYING3. If not see #include "plugin-api.h" #include "ipa-ref.h" #include "cgraph.h" +#include "tree-ssa.h" /* This pass tries to distribute iterations of loops into several threads. The implementation is straightforward -- for each loop we test whether its @@ -1487,17 +1488,386 @@ create_loop_fn (location_t loc) return decl; } -/* Moves the exit condition of LOOP to the beginning of its header, and - duplicates the part of the last iteration that gets disabled to the - exit of the loop. NIT is the number of iterations of the loop - (used to initialize the variables in the duplicated part). +/* Replace uses of NAME by VAL in block BB. */ - TODO: the common case is that latch of the loop is empty and immediately - follows the loop exit. In this case, it would be better not to copy the - body of the loop, but only move the entry of the loop directly before the - exit check and increase the number of iterations of the loop by one. - This may need some additional preconditioning in case NIT = ~0. - REDUCTION_LIST describes the reductions in LOOP. */ +static void +replace_uses_in_bb_by (tree name, tree val, basic_block bb) +{ + gimple use_stmt; + imm_use_iterator imm_iter; + + FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, name) + { + if (gimple_bb (use_stmt) != bb) + continue; + + use_operand_p use_p; + FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter) + SET_USE (use_p, val); + } +} + +/* Replace uses of NAME by VAL in blocks BBS. */ + +static void +replace_uses_in_bbs_by (tree name, tree val, bitmap bbs) +{ + gimple use_stmt; + imm_use_iterator imm_iter; + + FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, name) + { + if (!bitmap_bit_p (bbs, gimple_bb (use_stmt)->index)) + continue; + + use_operand_p use_p; + FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter) + SET_USE (use_p, val); + } +} + +/* Do transformation from: + + : + ... + goto + + : + ivtmp_a = PHI + sum_a = PHI + ... + use (ivtmp_a) + ... + sum_b = sum_a + sum_update + ... + if (ivtmp_a < n) + goto ; + else + goto ; + + : + ivtmp_b = ivtmp_a + 1; + goto + + : + sum_z = PHI + + [1] Where is single_pred (bb latch); In the simplest case, + that's . + + to: + + : + ... + goto + + : + ivtmp_a = PHI + sum_a = PHI + ... + use (ivtmp_a) + ... + sum_b = sum_a + sum_update + ... + goto ; + + : + ivtmp_c = PHI + sum_c = PHI + if (ivtmp_c < n + 1) + goto ; + else + goto ; + + : + ivtmp_b = ivtmp_a + 1; + goto + + : + sum_z = PHI + + + In unified diff format: + + : + ... +- goto ++ goto + + : +- ivtmp_a = PHI +- sum_a = PHI ++ ivtmp_a = PHI ++ sum_a = PHI + ... + use (ivtmp_a) + ... + sum_b = sum_a + sum_update + ... +- if (ivtmp_a < n) +- goto ; ++ goto ; ++ ++ : ++ ivtmp_c = PHI ++ sum_c = PHI ++ if (ivtmp_c < n + 1) ++ goto ; + else + goto ; + + : + ivtmp_b = ivtmp_a + 1; +- goto ++ goto + + : +- sum_z = PHI ++ sum_z = PHI + + Note: the example does not show any virtual phis, but these are handled more + or less as reductions. + + + Moves the exit condition of LOOP to the beginning of its header. + REDUCTION_LIST describes the reductions in LOOP. BOUND is the new loop + bound. */ + +static void +transform_to_exit_first_loop_alt (struct loop *loop, + reduction_info_table_type *reduction_list, + tree bound) +{ + basic_block header = loop->header; + basic_block latch = loop->latch; + edge exit = single_dom_exit (loop); + basic_block exit_block = exit->dest; + gcond *cond_stmt = as_a (last_stmt (exit->src)); + tree control = gimple_cond_lhs (cond_stmt); + edge e; + + /* Gather the bbs dominated by the exit block. */ + bitmap exit_dominated = BITMAP_ALLOC (NULL); + bitmap_set_bit (exit_dominated, exit_block->index); + vec exit_dominated_vec + = get_dominated_by (CDI_DOMINATORS, exit_block); + + int i; + basic_block dom_bb; + FOR_EACH_VEC_ELT (exit_dominated_vec, i, dom_bb) + bitmap_set_bit (exit_dominated, dom_bb->index); + + exit_dominated_vec.release (); + + /* Create the new_header block. */ + basic_block new_header = split_block_before_cond_jump (exit->src); + edge split_edge = single_pred_edge (new_header); + + /* Redirect entry edge to new_header. */ + edge entry = loop_preheader_edge (loop); + e = redirect_edge_and_branch (entry, new_header); + gcc_assert (e == entry); + + /* Redirect post_inc_edge to new_header. */ + edge post_inc_edge = single_succ_edge (latch); + e = redirect_edge_and_branch (post_inc_edge, new_header); + gcc_assert (e == post_inc_edge); + + /* Redirect post_cond_edge to header. */ + edge post_cond_edge = single_pred_edge (latch); + e = redirect_edge_and_branch (post_cond_edge, header); + gcc_assert (e == post_cond_edge); + + /* Redirect split_edge to latch. */ + e = redirect_edge_and_branch (split_edge, latch); + gcc_assert (e == split_edge); + + /* Set the new loop bound. */ + gimple_cond_set_rhs (cond_stmt, bound); + + /* Repair the ssa. */ + vec *v = redirect_edge_var_map_vector (post_inc_edge); + edge_var_map *vm; + gphi_iterator gsi; + for (gsi = gsi_start_phis (header), i = 0; + !gsi_end_p (gsi) && v->iterate (i, &vm); + gsi_next (&gsi), i++) + { + gphi *phi = gsi.phi (); + tree res_a = PHI_RESULT (phi); + + /* Create new phi. */ + tree res_c = copy_ssa_name (res_a, phi); + gphi *nphi = create_phi_node (res_c, new_header); + + /* Replace ivtmp_a with ivtmp_c in condition 'if (ivtmp_a < n)'. */ + replace_uses_in_bb_by (res_a, res_c, new_header); + + /* Replace ivtmp/sum_b with ivtmp/sum_c in header phi. */ + add_phi_arg (phi, res_c, post_cond_edge, UNKNOWN_LOCATION); + + /* Replace sum_b with sum_c in exit phi. Loop-closed ssa does not hold + for virtuals, so we cannot get away with exit_block only. */ + tree res_b = redirect_edge_var_map_def (vm); + replace_uses_in_bbs_by (res_b, res_c, exit_dominated); + + struct reduction_info *red = reduction_phi (reduction_list, phi); + gcc_assert (virtual_operand_p (res_a) + || res_a == control + || red != NULL); + + if (red) + { + /* Register the new reduction phi. */ + red->reduc_phi = nphi; + gimple_set_uid (red->reduc_phi, red->reduc_version); + } + } + gcc_assert (gsi_end_p (gsi) && !v->iterate (i, &vm)); + BITMAP_FREE (exit_dominated); + + /* Set the preheader argument of the new phis to ivtmp/sum_init. */ + flush_pending_stmts (entry); + + /* Set the latch arguments of the new phis to ivtmp/sum_b. */ + flush_pending_stmts (post_inc_edge); + + /* Register the reduction exit phis. */ + for (gphi_iterator gsi = gsi_start_phis (exit_block); + !gsi_end_p (gsi); + gsi_next (&gsi)) + { + gphi *phi = gsi.phi (); + tree res_z = PHI_RESULT (phi); + if (virtual_operand_p (res_z)) + continue; + + tree res_c = PHI_ARG_DEF_FROM_EDGE (phi, exit); + gimple reduc_phi = SSA_NAME_DEF_STMT (res_c); + struct reduction_info *red = reduction_phi (reduction_list, reduc_phi); + if (red != NULL) + red->keep_res = phi; + } + + /* We're going to cancel the loop at the end of gen_parallel_loop, but until + then we're still using some fields, so only bother about fields that are + still used: header and latch. + The loop has a new header bb, so we update it. The latch bb stays the + same. */ + loop->header = new_header; + + /* Recalculate dominance info. */ + free_dominance_info (CDI_DOMINATORS); + calculate_dominance_info (CDI_DOMINATORS); +} + +/* Tries to moves the exit condition of LOOP to the beginning of its header + without duplication of the loop body. NIT is the number of iterations of the + loop. REDUCTION_LIST describes the reductions in LOOP. Return true if + transformation is successful. */ + +static bool +try_transform_to_exit_first_loop_alt (struct loop *loop, + reduction_info_table_type *reduction_list, + tree nit) +{ + /* Check whether the latch contains a single statement. */ + if (!gimple_seq_singleton_p (bb_seq (loop->latch))) + return true; + + /* Check whether the latch contains the loop iv increment. */ + edge back = single_succ_edge (loop->latch); + edge exit = single_dom_exit (loop); + gcond *cond_stmt = as_a (last_stmt (exit->src)); + tree control = gimple_cond_lhs (cond_stmt); + gphi *phi = as_a (SSA_NAME_DEF_STMT (control)); + tree inc_res = gimple_phi_arg_def (phi, back->dest_idx); + if (gimple_bb (SSA_NAME_DEF_STMT (inc_res)) != loop->latch) + return false; + + /* Check whether there's no code between the loop condition and the latch. */ + if (!single_pred_p (loop->latch) + || single_pred (loop->latch) != exit->src) + return false; + + tree alt_bound = NULL_TREE; + tree nit_type = TREE_TYPE (nit); + + /* Figure out whether nit + 1 overflows. */ + if (TREE_CODE (nit) == INTEGER_CST) + { + if (!tree_int_cst_equal (nit, TYPE_MAXVAL (nit_type))) + { + alt_bound = fold_build2_loc (UNKNOWN_LOCATION, PLUS_EXPR, nit_type, + nit, build_one_cst (nit_type)); + + gcc_assert (TREE_CODE (alt_bound) == INTEGER_CST); + } + else + { + /* Todo: Figure out if we can trigger this, if it's worth to handle + optimally, and if we can handle it optimally. */ + } + } + else + { + gcc_assert (TREE_CODE (nit) == SSA_NAME); + + gimple def = SSA_NAME_DEF_STMT (nit); + + if (def + && is_gimple_assign (def) + && gimple_assign_rhs_code (def) == PLUS_EXPR) + { + tree op1 = gimple_assign_rhs1 (def); + tree op2 = gimple_assign_rhs2 (def); + if (integer_minus_onep (op1)) + alt_bound = op2; + else if (integer_minus_onep (op2)) + alt_bound = op1; + } + + /* There is a number of test-cases for which we don't get an alt_bound + here: they're listed here, with the lhs of the last stmt as the nit: + + libgomp.graphite/force-parallel-1.c: + _21 = (signed long) N_6(D); + _19 = _21 + -1; + _7 = (unsigned long) _19; + + libgomp.graphite/force-parallel-2.c: + _33 = (signed long) N_9(D); + _16 = _33 + -1; + _37 = (unsigned long) _16; + + libgomp.graphite/force-parallel-5.c: + : + # graphite_IV.5_46 = PHI <0(5), graphite_IV.5_47(11)> + : + _33 = (unsigned long) graphite_IV.5_46; + + g++.dg/tree-ssa/pr34355.C: + _2 = (unsigned int) i_9; + _3 = 4 - _2; + + gcc.dg/pr53849.c: + _5 = d.0_11 + -2; + _18 = (unsigned int) _5; + + We will be able to handle some of these cases, if we can determine when + it's safe to look past casts. */ + } + + if (alt_bound == NULL_TREE) + return false; + + transform_to_exit_first_loop_alt (loop, reduction_list, alt_bound); + return true; +} + +/* Moves the exit condition of LOOP to the beginning of its header. NIT is the + number of iterations of the loop. REDUCTION_LIST describes the reductions in + LOOP. */ static void transform_to_exit_first_loop (struct loop *loop, @@ -1961,8 +2331,19 @@ gen_parallel_loop (struct loop *loop, /* Base all the induction variables in LOOP on a single control one. */ canonicalize_loop_ivs (loop, &nit, true); - /* Ensure that the exit condition is the first statement in the loop. */ - transform_to_exit_first_loop (loop, reduction_list, nit); + /* Ensure that the exit condition is the first statement in the loop. + The common case is that latch of the loop is empty (apart from the + increment) and immediately follows the loop exit test. Attempt to move the + entry of the loop directly before the exit check and increase the number of + iterations of the loop by one. */ + if (!try_transform_to_exit_first_loop_alt (loop, reduction_list, nit)) + { + /* Fall back on the method that handles more cases, but duplicates the + loop body: move the exit condition of LOOP to the beginning of its + header, and duplicate the part of the last iteration that gets disabled + to the exit of the loop. */ + transform_to_exit_first_loop (loop, reduction_list, nit); + } /* Generate initializations for reductions. */ if (reduction_list->elements () > 0) diff --git a/libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt-2.c b/libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt-2.c new file mode 100644 index 0000000..eb5e11f --- /dev/null +++ b/libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt-2.c @@ -0,0 +1,48 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -ftree-parallelize-loops=2" } */ + +#include +#include + +#define N 1000 + +unsigned int a[N]; +unsigned int b[N]; +unsigned int c[N]; + +void __attribute__((noclone,noinline)) +f (void) +{ + int i; + + for (i = 0; i < N; ++i) + c[i] = a[i] + b[i]; +} + +int +main (void) +{ + int i, j; + + /* Complexify loop to inhibit parloops. */ + for (j = 0; j < 100; ++j) + for (i = 0; i < 10; i++) + { + int k = i + (10 * j); + a[k] = k; + b[k] = (k * 3) % 7; + c[k] = k * 2; + } + + f (); + + for (i = 0; i < N; i++) + { + unsigned int actual = c[i]; + unsigned int expected = i + ((i * 3) % 7); + if (actual != expected) + abort (); + } + + return 0; +} diff --git a/libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt-3.c b/libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt-3.c new file mode 100644 index 0000000..b426b3f --- /dev/null +++ b/libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt-3.c @@ -0,0 +1,29 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -ftree-parallelize-loops=2" } */ + +unsigned int *a; + +unsigned int __attribute__((noclone,noinline)) +f (unsigned int n) +{ + int i; + unsigned int sum = 1; + + for (i = 0; i < n; ++i) + sum += a[i]; + + return sum; +} + +int +main (void) +{ + unsigned int res; + unsigned int array[4000]; + int i; + for (i = 0; i < 4000; ++i) + array[i] = i % 7; + a = &array[0]; + res = f (4000); + return !(res == 11995); +} diff --git a/libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt.c b/libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt.c new file mode 100644 index 0000000..d7d4003 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt.c @@ -0,0 +1,48 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -ftree-parallelize-loops=2" } */ + +#include +#include + +#define N 1000 + +unsigned int a[N]; +unsigned int b[N]; +unsigned int c[N]; + +void __attribute__((noclone,noinline)) +f (unsigned int n) +{ + int i; + + for (i = 0; i < n; ++i) + c[i] = a[i] + b[i]; +} + +int +main (void) +{ + int i, j; + + /* Complexify loop to inhibit parloops. */ + for (j = 0; j < 100; ++j) + for (i = 0; i < 10; i++) + { + int k = i + (10 * j); + a[k] = k; + b[k] = (k * 3) % 7; + c[k] = k * 2; + } + + f (N); + + for (i = 0; i < N; i++) + { + unsigned int actual = c[i]; + unsigned int expected = i + ((i * 3) % 7); + if (actual != expected) + abort (); + } + + return 0; +} -- 1.9.1 --------------090404040505050607080209--