From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-473693-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 82756 invoked by alias); 22 Feb 2018 07:42:50 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 82747 invoked by uid 89); 22 Feb 2018 07:42:49 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-16.9 required=5.0 tests=BAYES_00,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,SPF_PASS,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy=nn, H*MI:41B8, H*M:41B8
X-HELO: mx2.suse.de
Received: from mx2.suse.de (HELO mx2.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 22 Feb 2018 07:42:47 +0000
Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254])	by mx2.suse.de (Postfix) with ESMTP id 32B13ACC3;	Thu, 22 Feb 2018 07:42:45 +0000 (UTC)
Date: Thu, 22 Feb 2018 07:42:00 -0000
User-Agent: K-9 Mail for Android
In-Reply-To: <20180221222836.GS5867@tucnak>
References: <20180221222836.GS5867@tucnak>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: [PATCH] Fix store merging (PR tree-optimization/84503)
To: Jakub Jelinek <jakub@redhat.com>,Jeff Law <law@redhat.com>
CC: gcc-patches@gcc.gnu.org
From: Richard Biener <rguenther@suse.de>
Message-ID: <F4CC4F09-B303-41B8-8789-C6E27A3C3F9D@suse.de>
X-SW-Source: 2018-02/txt/msg01261.txt.bz2

On February 21, 2018 11:28:36 PM GMT+01:00, Jakub Jelinek <jakub@redhat.com=
> wrote:
>Hi!
>
>The long comment above the new check_no_overlap function below should
>hopefully explain the problem.  coalesce_immediate_stores is splitting
>the
>stores from the same base into groups that we are going to optimize
>individually; for each successfully merged group we emit new stores at
>the
>location of the last store from that group.  And
>coalesce_immediate_stores
>processes stores ordered primarily by increasing bit position and only
>secondarily by the original ordering in the basic block.
>On the following testcases, we have the unfortunate ordering of:
>     MEM[(long long int *)p_28] =3D 0;
>     MEM[(long long int *)p_28 + 8B] =3D 0;
>     MEM[(long long int *)p_28 + 16B] =3D 0;
>     MEM[(long long int *)p_28 + 24B] =3D 0;
>     _129 =3D (int) _130;
>     MEM[(int *)p_28 + 8B] =3D _129;
>     MEM[(int *)p_28].a =3D -1;
>We already have
>     MEM[(long long int *)p_28] =3D 0;
>     MEM[(int *)p_28].a =3D -1;
>in the first group (which is fine) and the following 2 stores to
>consider
>are
>     MEM[(long long int *)p_28 + 8B] =3D 0;
>and
>     MEM[(int *)p_28 + 8B] =3D _129;
>We add the first store to the group, which has the effect of emitting
>the
>merged stores at the location of former MEM[(int *)p_28].a =3D -1;
>store.  Then we process MEM[(int *)p_28 + 8B] =3D _129; (which was
>handled
>just as potential bswap possibility and as it overlaps with previous
>and can't be merged with next either, it will be its own group and thus
>reuse the original stmt; the net effect is that we've reordered the
>MEM[(long long int *)p_28 + 8B] =3D 0; store from before the
>MEM[(int *)p_28 + 8B] =3D _129; store to after it, but those two do
>alias.
>
>Instead of detecting this situation late, where we could just throw
>away the
>whole group (which would mean we couldn't merge even the
>MEM[(long long int *)p_28] =3D 0; and MEM[(int *)p_28].a =3D -1; stores)
>this patch attempts to check before calling merge_into whether it will
>not overlap with later stores we won't be able to emit in the same
>group
>and if it does, it terminates the group right away.
>
>I had to fix also the merged_store_group::merge_into width computation,
>it was mistakenly just counting sum of the bits we've added to the
>group,
>but we really need the distance between the first bit in the group and
>last
>bit, including any gaps in between, but exclusing gaps before the start
>and
>after the end (still within the bitregion).
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.=20

Richard.=20

>For 7.x I think we'll need something along the lines of PR82916
>instead,
>while the same testcase fails there, we only handle stores with lhs
>being
>a constant and so the problem is that we aren't terminating the chain
>when
>we should on the variable store.
>
>2018-02-21  Jakub Jelinek  <jakub@redhat.com>
>
>	PR tree-optimization/84503
>	* gimple-ssa-store-merging.c (merged_store_group::merge_into): Compute
>	width as info->bitpos + info->bitsize - start.
>	(merged_store_group::merge_overlapping): Simplify width computation.
>	(check_no_overlap): New function.
>	(imm_store_chain_info::try_coalesce_bswap): Compute expected
>	start + width and last_order of the group, fail if check_no_overlap
>	fails.
>	(imm_store_chain_info::coalesce_immediate_stores): Don't merge info
>	to group if check_no_overlap fails.
>
>	* gcc.dg/pr84503-1.c: New test.
>	* gcc.dg/pr84503-2.c: New test.
>
>--- gcc/gimple-ssa-store-merging.c.jj	2018-01-16 09:52:26.230235131
>+0100
>+++ gcc/gimple-ssa-store-merging.c	2018-02-21 20:13:45.239129599 +0100
>@@ -1891,12 +1891,11 @@ merged_store_group::do_merge (store_imme
> void
> merged_store_group::merge_into (store_immediate_info *info)
> {
>-  unsigned HOST_WIDE_INT wid =3D info->bitsize;
>/* Make sure we're inserting in the position we think we're inserting.=20
>*/
>   gcc_assert (info->bitpos >=3D start + width
> 	      && info->bitregion_start <=3D bitregion_end);
>=20
>-  width +=3D wid;
>+  width =3D info->bitpos + info->bitsize - start;
>   do_merge (info);
> }
>=20
>@@ -1909,7 +1908,7 @@ merged_store_group::merge_overlapping (s
> {
>   /* If the store extends the size of the group, extend the width.  */
>   if (info->bitpos + info->bitsize > start + width)
>-    width +=3D info->bitpos + info->bitsize - (start + width);
>+    width =3D info->bitpos + info->bitsize - start;
>=20
>   do_merge (info);
> }
>@@ -2304,6 +2303,55 @@ gather_bswap_load_refs (vec<tree> *refs,
>     }
> }
>=20
>+/* Check if there are any stores in M_STORE_INFO after index I
>+   (where M_STORE_INFO must be sorted by sort_by_bitpos) that overlap
>+   a potential group ending with END that have their order
>+   smaller than LAST_ORDER.  RHS_CODE is the kind of store in the
>+   group.  Return true if there are no such stores.
>+   Consider:
>+     MEM[(long long int *)p_28] =3D 0;
>+     MEM[(long long int *)p_28 + 8B] =3D 0;
>+     MEM[(long long int *)p_28 + 16B] =3D 0;
>+     MEM[(long long int *)p_28 + 24B] =3D 0;
>+     _129 =3D (int) _130;
>+     MEM[(int *)p_28 + 8B] =3D _129;
>+     MEM[(int *)p_28].a =3D -1;
>+   We already have
>+     MEM[(long long int *)p_28] =3D 0;
>+     MEM[(int *)p_28].a =3D -1;
>+   stmts in the current group and need to consider if it is safe to
>+   add MEM[(long long int *)p_28 + 8B] =3D 0; store into the same group.
>+   There is an overlap between that store and the MEM[(int *)p_28 +
>8B] =3D _129;
>+   store though, so if we add the MEM[(long long int *)p_28 + 8B] =3D 0;
>+   into the group and merging of those 3 stores is successful, merged
>+   stmts will be emitted at the latest store from that group, i.e.
>+   LAST_ORDER, which is the MEM[(int *)p_28].a =3D -1; store.
>+   The MEM[(int *)p_28 + 8B] =3D _129; store that originally follows
>+   the MEM[(long long int *)p_28 + 8B] =3D 0; would now be before it,
>+   so we need to refuse merging MEM[(long long int *)p_28 + 8B] =3D 0;
>+   into the group.  That way it will be its own store group and will
>+   not be touched.  If RHS_CODE is INTEGER_CST and there are
>overlapping
>+   INTEGER_CST stores, those are mergeable using merge_overlapping,
>+   so don't return false for those.  */
>+
>+static bool
>+check_no_overlap (vec<store_immediate_info *> m_store_info, unsigned
>int i,
>+		  enum tree_code rhs_code, unsigned int last_order,
>+		  unsigned HOST_WIDE_INT end)
>+{
>+  unsigned int len =3D m_store_info.length ();
>+  for (++i; i < len; ++i)
>+    {
>+      store_immediate_info *info =3D m_store_info[i];
>+      if (info->bitpos >=3D end)
>+	break;
>+      if (info->order < last_order
>+	  && (rhs_code !=3D INTEGER_CST || info->rhs_code !=3D INTEGER_CST))
>+	return false;
>+    }
>+  return true;
>+}
>+
> /* Return true if m_store_info[first] and at least one following store
>  form a group which store try_size bitsize value which is byte swapped
>    from a memory load or some value, or identity from some value.
>@@ -2375,6 +2423,7 @@ imm_store_chain_info::try_coalesce_bswap
>   unsigned int last_order =3D merged_store->last_order;
>   gimple *first_stmt =3D merged_store->first_stmt;
>   gimple *last_stmt =3D merged_store->last_stmt;
>+  unsigned HOST_WIDE_INT end =3D merged_store->start +
>merged_store->width;
>   store_immediate_info *infof =3D m_store_info[first];
>=20
>   for (unsigned int i =3D first; i <=3D last; ++i)
>@@ -2413,25 +2462,23 @@ imm_store_chain_info::try_coalesce_bswap
> 	}
>       else
> 	{
>-	  if (n.base_addr)
>+	  if (n.base_addr && n.vuse !=3D this_n.vuse)
> 	    {
>-	      if (n.vuse !=3D this_n.vuse)
>-		{
>-		  if (vuse_store =3D=3D 0)
>-		    return false;
>-		  vuse_store =3D 1;
>-		}
>-	      if (info->order > last_order)
>-		{
>-		  last_order =3D info->order;
>-		  last_stmt =3D info->stmt;
>-		}
>-	      else if (info->order < first_order)
>-		{
>-		  first_order =3D info->order;
>-		  first_stmt =3D info->stmt;
>-		}
>+	      if (vuse_store =3D=3D 0)
>+		return false;
>+	      vuse_store =3D 1;
> 	    }
>+	  if (info->order > last_order)
>+	    {
>+	      last_order =3D info->order;
>+	      last_stmt =3D info->stmt;
>+	    }
>+	  else if (info->order < first_order)
>+	    {
>+	      first_order =3D info->order;
>+	      first_stmt =3D info->stmt;
>+	    }
>+	  end =3D MAX (end, info->bitpos + info->bitsize);
>=20
> 	  ins_stmt =3D perform_symbolic_merge (ins_stmt, &n, info->ins_stmt,
> 					     &this_n, &n);
>@@ -2452,6 +2499,9 @@ imm_store_chain_info::try_coalesce_bswap
>   if (n.base_addr =3D=3D NULL_TREE && !is_gimple_val (n.src))
>     return false;
>=20
>+  if (!check_no_overlap (m_store_info, last, LROTATE_EXPR, last_order,
>end))
>+    return false;
>+
>   /* Don't handle memory copy this way if normal non-bswap processing
>      would handle it too.  */
>   if (n.n =3D=3D cmpnop && (unsigned) n.n_ops =3D=3D last - first + 1)
>@@ -2633,7 +2683,13 @@ imm_store_chain_info::coalesce_immediate
> 	       : !info->ops[0].base_addr)
> 	      && (infof->ops[1].base_addr
> 		  ? compatible_load_p (merged_store, info, base_addr, 1)
>-		  : !info->ops[1].base_addr))
>+		  : !info->ops[1].base_addr)
>+	      && check_no_overlap (m_store_info, i, info->rhs_code,
>+				   MAX (merged_store->last_order,
>+					info->order),
>+				   MAX (merged_store->start
>+					+ merged_store->width,
>+					info->bitpos + info->bitsize)))
> 	    {
> 	      merged_store->merge_into (info);
> 	      continue;
>--- gcc/testsuite/gcc.dg/pr84503-1.c.jj	2018-02-21 20:36:33.474226039
>+0100
>+++ gcc/testsuite/gcc.dg/pr84503-1.c	2018-02-21 20:20:53.144165116
>+0100
>@@ -0,0 +1,68 @@
>+/* PR tree-optimization/84503 */
>+/* { dg-do run } */
>+/* { dg-options "-O3" } */
>+
>+typedef __SIZE_TYPE__ size_t;
>+typedef __UINTPTR_TYPE__ uintptr_t;
>+
>+struct S { int a; unsigned short b; int c, d, e; long f, g, h; int i,
>j; };
>+static struct S *k;
>+static size_t l =3D 0;
>+int m;
>+
>+static int
>+bar (void)
>+{
>+  unsigned i;
>+  int j;
>+  if (k[0].c =3D=3D 0)
>+    {
>+      ++m;
>+      size_t n =3D l * 2;
>+      struct S *o;
>+      o =3D (struct S *) __builtin_realloc (k, sizeof (struct S) * n);
>+      if (!o)
>+	__builtin_exit (0);
>+      k =3D o;
>+      for (i =3D l; i < n; i++)
>+	{
>+	  void *p =3D (void *) &k[i];
>+	  int q =3D 0;
>+	  size_t r =3D sizeof (struct S);
>+	  if ((((uintptr_t) p) % __alignof__ (long)) =3D=3D 0
>+	      && r % sizeof (long) =3D=3D 0)
>+	    {
>+	      long __attribute__ ((may_alias)) *s =3D (long *) p;
>+	      long *t =3D (long *) ((char *) s + r);
>+	      while (s < t)
>+		*s++ =3D 0;
>+	    }
>+	  else
>+	    __builtin_memset (p, q, r);
>+	  k[i].c =3D i + 1;
>+	  k[i].a =3D -1;
>+	}
>+      k[n - 1].c =3D 0;
>+      k[0].c =3D l;
>+      l =3D n;
>+    }
>+  j =3D k[0].c;
>+  k[0].c =3D k[j].c;
>+  return j;
>+}
>+
>+int
>+main ()
>+{
>+  k =3D (struct S *) __builtin_malloc (sizeof (struct S));
>+  if (!k)
>+    __builtin_exit (0);
>+  __builtin_memset (k, '\0', sizeof (struct S));
>+  k->a =3D -1;
>+  l =3D 1;
>+  for (int i =3D 0; i < 15; ++i)
>+    bar ();
>+  if (m !=3D 4)
>+    __builtin_abort ();
>+  return 0;
>+}
>--- gcc/testsuite/gcc.dg/pr84503-2.c.jj	2018-02-21 20:36:41.874226585
>+0100
>+++ gcc/testsuite/gcc.dg/pr84503-2.c	2018-02-21 20:36:59.875227757
>+0100
>@@ -0,0 +1,5 @@
>+/* PR tree-optimization/84503 */
>+/* { dg-do run } */
>+/* { dg-options "-O3 -fno-tree-vectorize -fno-ivopts" } */
>+
>+#include "pr84503-1.c"
>
>	Jakub