[Bug tree-optimization/60841] [4.9/4.10 Regression] gcc: internal compiler error: Killed (program cc1) out of memory

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/60841] [4.9/4.10 Regression] gcc: internal compiler error: Killed (program cc1) out of memory
Date: Wed, 16 Apr 2014 12:10:00 -0000	[thread overview]
Message-ID: <bug-60841-4-9wFfXDRpgk@http.gcc.gnu.org/bugzilla/> (raw)
In-Reply-To: <bug-60841-4@http.gcc.gnu.org/bugzilla/>

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60841

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ok, so this loop contains an incredibly connected value-graph (connecting the
loads to the stores) and stupidly (yeah, well...) the vectorizer SLP code
builds a tree (yes, by effectively duplicating 'shared' nodes).

AFAIK that itself isn't a regression.  What may be a regression is that
we compute SLP now before other analysis (dependence analysis for example)
would reject the loop as not vectorizable.  So you can certainly build
a testcase that should show similar behavior with 4.8 or 4.7.

Note that the vectorization transform doesn't yet support SLP rooting
of a build-from-scalars vector, so your obvious idea to fix this is
a general missed optimization (also for scalar code vectorization - as
soon as the SLP build fails we can make it succeed by building a starting
vector from N (different) scalars).  It also doesn't really work with
SLP within loop vectorization I think.

4.8 fails to vectorize with

t.i:9: note: === vect_analyze_data_ref_accesses ===
t.i:9: note: Detected interleaving of size 4
t.i:9: note: interleaving size is greater than step for p_240->f3
t.i:9: note: not vectorized: complicated access pattern.
t.i:9: note: bad data access.

but 4.9 and trunk are happy with them.

Example testcase, vectorized with the same graph vs. tree issue with
-fvect-cost-model=unlimited:

int a[1024];
int b[1024];
void foo()
{
  int i;
  for (i = 0; i < 1024/4; i+=4)
    {
      int a0 = a[i+0];
      int a1 = a[i+1];
      int a2 = a[i+2];
      int a3 = a[i+3];
      a0 = a0 * 3;
      a1 = a1 * 3;
      a2 = a2 * 3;
      a3 = a3 * 3;
      int a0p = a0 + 1;
      int a1p = a1 + 1;
      int a2p = a2 + 1;
      int a3p = a3 + 1;
      b[i+0] = a0p + a0;
      b[i+1] = a1p + a1;
      b[i+2] = a2p + a2;
      b[i+3] = a3p + a3;
    }
}

> grep 'vectorizing SLP node' t.c.114t.vect 
t.c:6:3: note: ------>vectorizing SLP node starting from: a0_4 = a[i_30];
t.c:6:3: note: ------>vectorizing SLP node starting from: a0_11 = a0_4 * 3;
t.c:6:3: note: ------>vectorizing SLP node starting from: a0p_15 = a0_11 + 1;
t.c:6:3: note: ------>vectorizing SLP node starting from: a0_4 = a[i_30];
t.c:6:3: note: ------>vectorizing SLP node starting from: a0_11 = a0_4 * 3;
t.c:6:3: note: ------>vectorizing SLP node starting from: _19 = a0p_15 + a0_11;
t.c:6:3: note: ------>vectorizing SLP node starting from: b[i_30] = _19;

thus we have duplicate nodes for the SLP load and the SLP mult by 3.  This
issue grows the SLP tree exponentially.  In theory each reference may
correspond to a different SLP permutation (but we don't support that in
the code-gen yet).  Note we also emit duplicate vectorized code from
such SLP trees (but usually we reject it via cost analysis).  The above
happens at least since GCC 4.7.

I'm not sure if we should completely disregard multi-uses for this reason
(will check if there are testcases that expect vectorization), but
definitely that would fix the issue.

Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c (revision 209423)
+++ gcc/tree-vect-slp.c (working copy)
@@ -911,6 +911,37 @@ vect_build_slp_tree (loop_vec_info loop_
       if (oprnd_info->first_dt != vect_internal_def)
         continue;

+      /* Check if we have multiple uses on stmts not participating in
+        this SLP node.  */
+      unsigned j;
+      gimple def_stmt, use_stmt;
+      imm_use_iterator iter;
+      FOR_EACH_VEC_ELT (oprnd_info->def_stmts, j, def_stmt)
+       {
+         tree def;
+         if (gimple_code (def_stmt) == GIMPLE_PHI)
+           def = gimple_phi_result (def_stmt);
+         else
+           def = single_ssa_tree_operand (def_stmt, SSA_OP_DEF);
+         FOR_EACH_IMM_USE_STMT (use_stmt, iter, def)
+           {
+             unsigned k;
+             gimple sstmt;
+             FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (*node), k, sstmt)
+                 if (use_stmt == sstmt)
+                   break;
+             if (k == SLP_TREE_SCALAR_STMTS (*node).length ())
+               {
+                 if (dump_enabled_p ())
+                   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                                    "Build SLP failed: multiple uses\n");
+                 end_imm_use_stmt_traverse (&iter);
+                 vect_free_oprnd_info (oprnds_info);
+                 return false;
+               }
+           }
+       }
+

this retains multiple uses in the same SLP node which is supported.
Unfortunately the check is quadratic in the SLP group size (but that's
limited to the size of a vector register ...).  We could, if we ensure
like PLF_2 is always clear on stmts, mark the scalar stmts that way
and aovid the inner loop, of course.

The above patch for example disables SLP vectorization of slp-12a.c
(and other vect testcases) because there is a non-SLP use of b6 with the
store to ia[i].

So checking this at this point doesn't seem to work.

We can impose an artificial limit on the SLP size.  But that's not a real
fix either.

We can mark stmts "consumed" when building the SLP tree and check that.

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c       (revision 209423)
+++ gcc/tree-vect-stmts.c       (working copy)
@@ -7388,6 +7388,8 @@ new_stmt_vec_info (gimple stmt, loop_vec
   GROUP_GAP (res) = 0;
   GROUP_SAME_DR_STMT (res) = NULL;

+  gimple_set_plf (stmt, GF_PLF_1, false);
+
   return res;
 }

Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c (revision 209423)
+++ gcc/tree-vect-slp.c (working copy)
@@ -85,6 +85,8 @@ vect_free_slp_tree (slp_tree node)
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
     vect_free_slp_tree (child);

+  gimple_set_plf (SLP_TREE_SCALAR_STMTS (node)[0], GF_PLF_1, false);
+
   SLP_TREE_CHILDREN (node).release ();
   SLP_TREE_SCALAR_STMTS (node).release ();
   SLP_TREE_VEC_STMTS (node).release ();
@@ -862,6 +864,14 @@ vect_build_slp_tree (loop_vec_info loop_
   matches[0] = false;

   stmt = SLP_TREE_SCALAR_STMTS (*node)[0];
+  if (gimple_plf (stmt, GF_PLF_1))
+    {
+      if (dump_enabled_p ())
+       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                        "Build SLP failed: multiple uses\n");
+      return false;
+    }
+
   if (is_gimple_call (stmt))
     nops = gimple_call_num_args (stmt);
   else if (is_gimple_assign (stmt))
@@ -977,6 +1020,8 @@ vect_build_slp_tree (loop_vec_info loop_
       return false;
     }

+  gimple_set_plf (stmt, GF_PLF_1, true);
+
   vect_free_oprnd_info (oprnds_info);
   return true;
 }

That still fails slp-18.c which exactly has such a duplicate use (with
very non-optimal initial vectorized codegen).  Similar bb-slp-20.c
and bb-slp-21.c.  So we could add an artificial "number of allowed
duplications" counter ... (but that couldn't use the flag approach
anymore as we'd falsely clear it on duplicate uses).

next prev parent reply	other threads:[~2014-04-16 12:10 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-14 16:58 [Bug c/60841] New: gcc: internal compiler error: Killed (program cc1) in gcc 4.9.0 RC 2014-04-11 mike at vermeulen dot com
2014-04-14 17:01 ` [Bug c/60841] " trippels at gcc dot gnu.org
2014-04-14 17:09 ` trippels at gcc dot gnu.org
2014-04-14 17:13 ` [Bug c/60841] [4.9/4.10 Regression] gcc: internal compiler error: Killed (program cc1) out of memory mike at vermeulen dot com
2014-04-14 17:58 ` trippels at gcc dot gnu.org
2014-04-14 19:23 ` [Bug tree-optimization/60841] " jakub at gcc dot gnu.org
2014-04-15  7:16 ` jakub at gcc dot gnu.org
2014-04-15 13:28 ` jakub at gcc dot gnu.org
2014-04-15 13:41 ` trippels at gcc dot gnu.org
2014-04-15 16:37 ` jakub at gcc dot gnu.org
2014-04-15 16:59 ` jakub at gcc dot gnu.org
2014-04-16  8:35 ` rguenth at gcc dot gnu.org
2014-04-16 12:10 ` rguenth at gcc dot gnu.org [this message]
2014-04-16 13:06 ` rguenth at gcc dot gnu.org
2014-04-17  8:09 ` rguenth at gcc dot gnu.org
2014-04-17  8:10 ` [Bug tree-optimization/60841] [4.9 " rguenth at gcc dot gnu.org
2014-04-22 11:37 ` jakub at gcc dot gnu.org
2014-04-22 13:29 ` rguenth at gcc dot gnu.org
2014-04-22 13:30 ` rguenth at gcc dot gnu.org

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-60841-4-9wFfXDRpgk@http.gcc.gnu.org/bugzilla/ \
    --to=gcc-bugzilla@gcc.gnu.org \
    --cc=gcc-bugs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).