From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 27905 invoked by alias); 16 Apr 2014 12:10:11 -0000 Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-bugs-owner@gcc.gnu.org Received: (qmail 27858 invoked by uid 48); 16 Apr 2014 12:10:06 -0000 From: "rguenth at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug tree-optimization/60841] [4.9/4.10 Regression] gcc: internal compiler error: Killed (program cc1) out of memory Date: Wed, 16 Apr 2014 12:10:00 -0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: tree-optimization X-Bugzilla-Version: 4.9.0 X-Bugzilla-Keywords: memory-hog X-Bugzilla-Severity: normal X-Bugzilla-Who: rguenth at gcc dot gnu.org X-Bugzilla-Status: ASSIGNED X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org X-Bugzilla-Target-Milestone: 4.9.0 X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-SW-Source: 2014-04/txt/msg01179.txt.bz2 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60841 --- Comment #12 from Richard Biener --- Ok, so this loop contains an incredibly connected value-graph (connecting the loads to the stores) and stupidly (yeah, well...) the vectorizer SLP code builds a tree (yes, by effectively duplicating 'shared' nodes). AFAIK that itself isn't a regression. What may be a regression is that we compute SLP now before other analysis (dependence analysis for example) would reject the loop as not vectorizable. So you can certainly build a testcase that should show similar behavior with 4.8 or 4.7. Note that the vectorization transform doesn't yet support SLP rooting of a build-from-scalars vector, so your obvious idea to fix this is a general missed optimization (also for scalar code vectorization - as soon as the SLP build fails we can make it succeed by building a starting vector from N (different) scalars). It also doesn't really work with SLP within loop vectorization I think. 4.8 fails to vectorize with t.i:9: note: === vect_analyze_data_ref_accesses === t.i:9: note: Detected interleaving of size 4 t.i:9: note: interleaving size is greater than step for p_240->f3 t.i:9: note: not vectorized: complicated access pattern. t.i:9: note: bad data access. but 4.9 and trunk are happy with them. Example testcase, vectorized with the same graph vs. tree issue with -fvect-cost-model=unlimited: int a[1024]; int b[1024]; void foo() { int i; for (i = 0; i < 1024/4; i+=4) { int a0 = a[i+0]; int a1 = a[i+1]; int a2 = a[i+2]; int a3 = a[i+3]; a0 = a0 * 3; a1 = a1 * 3; a2 = a2 * 3; a3 = a3 * 3; int a0p = a0 + 1; int a1p = a1 + 1; int a2p = a2 + 1; int a3p = a3 + 1; b[i+0] = a0p + a0; b[i+1] = a1p + a1; b[i+2] = a2p + a2; b[i+3] = a3p + a3; } } > grep 'vectorizing SLP node' t.c.114t.vect t.c:6:3: note: ------>vectorizing SLP node starting from: a0_4 = a[i_30]; t.c:6:3: note: ------>vectorizing SLP node starting from: a0_11 = a0_4 * 3; t.c:6:3: note: ------>vectorizing SLP node starting from: a0p_15 = a0_11 + 1; t.c:6:3: note: ------>vectorizing SLP node starting from: a0_4 = a[i_30]; t.c:6:3: note: ------>vectorizing SLP node starting from: a0_11 = a0_4 * 3; t.c:6:3: note: ------>vectorizing SLP node starting from: _19 = a0p_15 + a0_11; t.c:6:3: note: ------>vectorizing SLP node starting from: b[i_30] = _19; thus we have duplicate nodes for the SLP load and the SLP mult by 3. This issue grows the SLP tree exponentially. In theory each reference may correspond to a different SLP permutation (but we don't support that in the code-gen yet). Note we also emit duplicate vectorized code from such SLP trees (but usually we reject it via cost analysis). The above happens at least since GCC 4.7. I'm not sure if we should completely disregard multi-uses for this reason (will check if there are testcases that expect vectorization), but definitely that would fix the issue. Index: gcc/tree-vect-slp.c =================================================================== --- gcc/tree-vect-slp.c (revision 209423) +++ gcc/tree-vect-slp.c (working copy) @@ -911,6 +911,37 @@ vect_build_slp_tree (loop_vec_info loop_ if (oprnd_info->first_dt != vect_internal_def) continue; + /* Check if we have multiple uses on stmts not participating in + this SLP node. */ + unsigned j; + gimple def_stmt, use_stmt; + imm_use_iterator iter; + FOR_EACH_VEC_ELT (oprnd_info->def_stmts, j, def_stmt) + { + tree def; + if (gimple_code (def_stmt) == GIMPLE_PHI) + def = gimple_phi_result (def_stmt); + else + def = single_ssa_tree_operand (def_stmt, SSA_OP_DEF); + FOR_EACH_IMM_USE_STMT (use_stmt, iter, def) + { + unsigned k; + gimple sstmt; + FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (*node), k, sstmt) + if (use_stmt == sstmt) + break; + if (k == SLP_TREE_SCALAR_STMTS (*node).length ()) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "Build SLP failed: multiple uses\n"); + end_imm_use_stmt_traverse (&iter); + vect_free_oprnd_info (oprnds_info); + return false; + } + } + } + this retains multiple uses in the same SLP node which is supported. Unfortunately the check is quadratic in the SLP group size (but that's limited to the size of a vector register ...). We could, if we ensure like PLF_2 is always clear on stmts, mark the scalar stmts that way and aovid the inner loop, of course. The above patch for example disables SLP vectorization of slp-12a.c (and other vect testcases) because there is a non-SLP use of b6 with the store to ia[i]. So checking this at this point doesn't seem to work. We can impose an artificial limit on the SLP size. But that's not a real fix either. We can mark stmts "consumed" when building the SLP tree and check that. Index: gcc/tree-vect-stmts.c =================================================================== --- gcc/tree-vect-stmts.c (revision 209423) +++ gcc/tree-vect-stmts.c (working copy) @@ -7388,6 +7388,8 @@ new_stmt_vec_info (gimple stmt, loop_vec GROUP_GAP (res) = 0; GROUP_SAME_DR_STMT (res) = NULL; + gimple_set_plf (stmt, GF_PLF_1, false); + return res; } Index: gcc/tree-vect-slp.c =================================================================== --- gcc/tree-vect-slp.c (revision 209423) +++ gcc/tree-vect-slp.c (working copy) @@ -85,6 +85,8 @@ vect_free_slp_tree (slp_tree node) FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child) vect_free_slp_tree (child); + gimple_set_plf (SLP_TREE_SCALAR_STMTS (node)[0], GF_PLF_1, false); + SLP_TREE_CHILDREN (node).release (); SLP_TREE_SCALAR_STMTS (node).release (); SLP_TREE_VEC_STMTS (node).release (); @@ -862,6 +864,14 @@ vect_build_slp_tree (loop_vec_info loop_ matches[0] = false; stmt = SLP_TREE_SCALAR_STMTS (*node)[0]; + if (gimple_plf (stmt, GF_PLF_1)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "Build SLP failed: multiple uses\n"); + return false; + } + if (is_gimple_call (stmt)) nops = gimple_call_num_args (stmt); else if (is_gimple_assign (stmt)) @@ -977,6 +1020,8 @@ vect_build_slp_tree (loop_vec_info loop_ return false; } + gimple_set_plf (stmt, GF_PLF_1, true); + vect_free_oprnd_info (oprnds_info); return true; } That still fails slp-18.c which exactly has such a duplicate use (with very non-optimal initial vectorized codegen). Similar bb-slp-20.c and bb-slp-21.c. So we could add an artificial "number of allowed duplications" counter ... (but that couldn't use the flag approach anymore as we'd falsely clear it on duplicate uses).