public inbox for gcc-regression@sourceware.org
help / color / mirror / Atom feed
* [TCWG CI] 464.h264ref grew in size by 2% after gcc: tree-optimization/102436 - restore loop store motion
@ 2021-11-21 22:36 ci_notify
  0 siblings, 0 replies; only message in thread
From: ci_notify @ 2021-11-21 22:36 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-regression

After gcc commit 0fc859f5efcb4624a8b4ffdbf34d63972af179a8
Author: Richard Biener <rguenther@suse.de>

    tree-optimization/102436 - restore loop store motion

the following benchmarks grew in size by more than 1%:
- 464.h264ref grew in size by 2% from 424532 to 432716 bytes

Below reproducer instructions can be used to re-build both "first_bad" and "last_good" cross-toolchains used in this bisection.  Naturally, the scripts will fail when triggerring benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and assembly files at:
- First_bad save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aarch64-spec2k6-Os/11/artifact/artifacts/build-0fc859f5efcb4624a8b4ffdbf34d63972af179a8/save-temps/
- Last_good save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aarch64-spec2k6-Os/11/artifact/artifacts/build-09d462146b3107c665265b11ad925c61a91c6efb/save-temps/
- Baseline save-temps: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aarch64-spec2k6-Os/11/artifact/artifacts/build-baseline/save-temps/

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: aarch64-linux-gnu
- Compiler flags: -Os
- Hardware: APM Mustang 8x X-Gene1

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at linaro-toolchain@lists.linaro.org .  In our improvement plans is to add support for SPEC CPU2017 benchmarks and provide "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
 - tcwg_bmk_gnu_apm/gnu-master-aarch64-spec2k6-Os

First_bad build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aarch64-spec2k6-Os/11/artifact/artifacts/build-0fc859f5efcb4624a8b4ffdbf34d63972af179a8/
Last_good build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aarch64-spec2k6-Os/11/artifact/artifacts/build-09d462146b3107c665265b11ad925c61a91c6efb/
Baseline build: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aarch64-spec2k6-Os/11/artifact/artifacts/build-baseline/
Even more details: https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aarch64-spec2k6-Os/11/artifact/artifacts/

Reproduce builds:
<cut>
mkdir investigate-gcc-0fc859f5efcb4624a8b4ffdbf34d63972af179a8
cd investigate-gcc-0fc859f5efcb4624a8b4ffdbf34d63972af179a8

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aarch64-spec2k6-Os/11/artifact/artifacts/manifests/build-baseline.sh --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aarch64-spec2k6-Os/11/artifact/artifacts/manifests/build-parameters.sh --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_apm-gnu-master-aarch64-spec2k6-Os/11/artifact/artifacts/test.sh --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 0fc859f5efcb4624a8b4ffdbf34d63972af179a8
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 09d462146b3107c665265b11ad925c61a91c6efb
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit 0fc859f5efcb4624a8b4ffdbf34d63972af179a8
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Nov 18 13:40:32 2021 +0100

    tree-optimization/102436 - restore loop store motion
    
    This restores a case of conditional store motion we fail to handle
    after the rewrite.  We can recognize the special case of all
    stores in a loop happening in a single conditionally executed
    block which ensures stores are not re-ordered by executing them
    in different loop iterations.  Separating out the case avoids
    complicating the already complex main path.
    
    2021-11-18  Richard Biener  <rguenther@suse.de>
    
            PR tree-optimization/102436
            * tree-ssa-loop-im.c (execute_sm_if_changed): Add mode
            to just create the if structure and return the then block.
            (execute_sm): Add flag to indicate the var will re-use
            another flag var.
            (hoist_memory_references): Support a single conditional
            block with all stores as special case.
    
            * gcc.dg/torture/20211118-1.c: New testcase.
            * gcc.dg/tree-ssa/ssa-lim-18.c: Likewise.
---
 gcc/testsuite/gcc.dg/torture/20211118-1.c  |  27 +++++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c |  19 ++++
 gcc/tree-ssa-loop-im.c                     | 162 +++++++++++++++++++++++++++--
 3 files changed, 199 insertions(+), 9 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/torture/20211118-1.c b/gcc/testsuite/gcc.dg/torture/20211118-1.c
new file mode 100644
index 00000000000..67ef68420df
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/20211118-1.c
@@ -0,0 +1,27 @@
+/* { dg-do run } */
+
+unsigned p[3];
+void __attribute__((noipa))
+bar (float *q, int n, int m)
+{
+  for (int i = 0; i < m; ++i)
+    {
+      if (i == n)
+        {
+          unsigned a = p[1];
+          p[1] = a + 1;
+          *q = 1.;
+        }
+      q++;
+    }
+}
+
+int main()
+{
+#if __SIZEOF_FLOAT__ == __SIZEOF_INT__
+  bar ((float *)p, 1, 3);
+  if (((float *)p)[1] != 1.)
+    __builtin_abort ();
+#endif
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
new file mode 100644
index 00000000000..da19806b712
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fstrict-aliasing -fdump-tree-lim2-details" } */
+
+unsigned p;
+
+void foo (float *q)
+{
+  for (int i = 0; i < 256; ++i)
+    {
+      if (p)
+        {
+          unsigned a = p;
+          *(q++) = 1.;
+          p = a + 1;
+        }
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "Executing store motion" 1 "lim2" } } */
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 8a81acae115..682406d26c1 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -1911,10 +1911,13 @@ first_mem_ref_loc (class loop *loop, im_mem_ref *ref)
        }
      }
      if (lsm_flag)	<--
-       MEM = lsm;	<--
+       MEM = lsm;	<-- (X)
+
+  In case MEM and TMP_VAR are NULL the function will return the then
+  block so the caller can insert (X) and other related stmts. 
 */
 
-static void
+static basic_block
 execute_sm_if_changed (edge ex, tree mem, tree tmp_var, tree flag,
 		       edge preheader, hash_set <basic_block> *flag_bbs,
 		       edge &append_cond_position, edge &last_cond_fallthru)
@@ -2009,10 +2012,13 @@ execute_sm_if_changed (edge ex, tree mem, tree tmp_var, tree flag,
 			    NULL_TREE, NULL_TREE);
   gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
 
-  gsi = gsi_start_bb (then_bb);
   /* Insert actual store.  */
-  stmt = gimple_build_assign (unshare_expr (mem), tmp_var);
-  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+  if (mem)
+    {
+      gsi = gsi_start_bb (then_bb);
+      stmt = gimple_build_assign (unshare_expr (mem), tmp_var);
+      gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+    }
 
   edge e1 = single_succ_edge (new_bb);
   edge e2 = make_edge (new_bb, then_bb,
@@ -2060,6 +2066,8 @@ execute_sm_if_changed (edge ex, tree mem, tree tmp_var, tree flag,
 	      update_stmt (phi);
 	    }
       }
+
+  return then_bb;
 }
 
 /* When REF is set on the location, set flag indicating the store.  */
@@ -2117,7 +2125,8 @@ struct sm_aux
 
 static void
 execute_sm (class loop *loop, im_mem_ref *ref,
-	    hash_map<im_mem_ref *, sm_aux *> &aux_map, bool maybe_mt)
+	    hash_map<im_mem_ref *, sm_aux *> &aux_map, bool maybe_mt,
+	    bool use_other_flag_var)
 {
   gassign *load;
   struct fmt_data fmt_data;
@@ -2146,7 +2155,7 @@ execute_sm (class loop *loop, im_mem_ref *ref,
 	  || (! flag_store_data_races && ! always_stored)))
     multi_threaded_model_p = true;
 
-  if (multi_threaded_model_p)
+  if (multi_threaded_model_p && !use_other_flag_var)
     aux->store_flag
       = execute_sm_if_changed_flag_set (loop, ref, &aux->flag_bbs);
   else
@@ -2182,7 +2191,7 @@ execute_sm (class loop *loop, im_mem_ref *ref,
   lim_data->tgt_loop = loop;
   gsi_insert_before (&gsi, load, GSI_SAME_STMT);
 
-  if (multi_threaded_model_p)
+  if (aux->store_flag)
     {
       load = gimple_build_assign (aux->store_flag, boolean_false_node);
       lim_data = init_lim_data (load);
@@ -2555,6 +2564,140 @@ hoist_memory_references (class loop *loop, bitmap mem_refs,
   unsigned  i;
   bitmap_iterator bi;
 
+  /* There's a special case we can use ordered re-materialization for
+     conditionally excuted stores which is when all stores in the loop
+     happen in the same basic-block.  In that case we know we'll reach
+     all stores and thus can simply process that BB and emit a single
+     conditional block of ordered materializations.  See PR102436.  */
+  basic_block single_store_bb = NULL;
+  EXECUTE_IF_SET_IN_BITMAP (&memory_accesses.all_refs_stored_in_loop[loop->num],
+			    0, i, bi)
+    {
+      bool fail = false;
+      ref = memory_accesses.refs_list[i];
+      for (auto loc : ref->accesses_in_loop)
+	if (!gimple_vdef (loc.stmt))
+	  ;
+	else if (!single_store_bb)
+	  {
+	    single_store_bb = gimple_bb (loc.stmt);
+	    bool conditional = false;
+	    for (edge e : exits)
+	      if (!dominated_by_p (CDI_DOMINATORS, e->src, single_store_bb))
+		{
+		  /* Conditional as seen from e.  */
+		  conditional = true;
+		  break;
+		}
+	    if (!conditional)
+	      {
+		fail = true;
+		break;
+	      }
+	  }
+	else if (single_store_bb != gimple_bb (loc.stmt))
+	  {
+	    fail = true;
+	    break;
+	  }
+      if (fail)
+	{
+	  single_store_bb = NULL;
+	  break;
+	}
+    }
+  if (single_store_bb)
+    {
+      /* Analyze the single block with stores.  */
+      auto_bitmap fully_visited;
+      auto_bitmap refs_not_supported;
+      auto_bitmap refs_not_in_seq;
+      auto_vec<seq_entry> seq;
+      bitmap_copy (refs_not_in_seq, mem_refs);
+      int res = sm_seq_valid_bb (loop, single_store_bb, NULL_TREE,
+				 seq, refs_not_in_seq, refs_not_supported,
+				 false, fully_visited);
+      if (res != 1)
+	{
+	  /* Unhandled refs can still fail this.  */
+	  bitmap_clear (mem_refs);
+	  return;
+	}
+
+      /* We cannot handle sm_other since we neither remember the
+	 stored location nor the value at the point we execute them.  */
+      for (unsigned i = 0; i < seq.length (); ++i)
+	{
+	  unsigned new_i;
+	  if (seq[i].second == sm_other
+	      && seq[i].from != NULL_TREE)
+	    seq[i].from = NULL_TREE;
+	  else if ((seq[i].second == sm_ord
+		    || (seq[i].second == sm_other
+			&& seq[i].from != NULL_TREE))
+		   && !sm_seq_push_down (seq, i, &new_i))
+	    {
+	      bitmap_set_bit (refs_not_supported, seq[new_i].first);
+	      seq[new_i].second = sm_other;
+	      seq[new_i].from = NULL_TREE;
+	    }
+	}
+      bitmap_and_compl_into (mem_refs, refs_not_supported);
+      if (bitmap_empty_p (mem_refs))
+	return;
+
+      /* Prune seq.  */
+      while (seq.last ().second == sm_other
+	     && seq.last ().from == NULL_TREE)
+	seq.pop ();
+
+      hash_map<im_mem_ref *, sm_aux *> aux_map;
+
+      /* Execute SM but delay the store materialization for ordered
+	 sequences on exit.  */
+      bool first_p = true;
+      EXECUTE_IF_SET_IN_BITMAP (mem_refs, 0, i, bi)
+	{
+	  ref = memory_accesses.refs_list[i];
+	  execute_sm (loop, ref, aux_map, true, !first_p);
+	  first_p = false;
+	}
+
+      /* Get at the single flag variable we eventually produced.  */
+      im_mem_ref *ref
+	= memory_accesses.refs_list[bitmap_first_set_bit (mem_refs)];
+      sm_aux *aux = *aux_map.get (ref);
+
+      /* Materialize ordered store sequences on exits.  */
+      edge e;
+      FOR_EACH_VEC_ELT (exits, i, e)
+	{
+	  edge append_cond_position = NULL;
+	  edge last_cond_fallthru = NULL;
+	  edge insert_e = e;
+	  /* Construct the single flag variable control flow and insert
+	     the ordered seq of stores in the then block.  With
+	     -fstore-data-races we can do the stores unconditionally.  */
+	  if (aux->store_flag)
+	    insert_e
+	      = single_pred_edge
+		  (execute_sm_if_changed (e, NULL_TREE, NULL_TREE,
+					  aux->store_flag,
+					  loop_preheader_edge (loop),
+					  &aux->flag_bbs, append_cond_position,
+					  last_cond_fallthru));
+	  execute_sm_exit (loop, insert_e, seq, aux_map, sm_ord,
+			   append_cond_position, last_cond_fallthru);
+	  gsi_commit_one_edge_insert (insert_e, NULL);
+	}
+
+      for (hash_map<im_mem_ref *, sm_aux *>::iterator iter = aux_map.begin ();
+	   iter != aux_map.end (); ++iter)
+	delete (*iter).second;
+
+      return;
+    }
+
   /* To address PR57359 before actually applying store-motion check
      the candidates found for validity with regards to reordering
      relative to other stores which we until here disambiguated using
@@ -2693,7 +2836,8 @@ hoist_memory_references (class loop *loop, bitmap mem_refs,
   EXECUTE_IF_SET_IN_BITMAP (mem_refs, 0, i, bi)
     {
       ref = memory_accesses.refs_list[i];
-      execute_sm (loop, ref, aux_map, bitmap_bit_p (refs_not_supported, i));
+      execute_sm (loop, ref, aux_map, bitmap_bit_p (refs_not_supported, i),
+		  false);
     }
 
   /* Materialize ordered store sequences on exits.  */
</cut>
>From hjl@sc.intel.com  Sun Nov 21 23:47:56 2021
Return-Path: <hjl@sc.intel.com>
X-Original-To: gcc-regression@gcc.gnu.org
Delivered-To: gcc-regression@gcc.gnu.org
Received: from mga17.intel.com (mga17.intel.com [192.55.52.151])
 by sourceware.org (Postfix) with ESMTPS id 25F723858C3A
 for <gcc-regression@gcc.gnu.org>; Sun, 21 Nov 2021 23:47:55 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 25F723858C3A
X-IronPort-AV: E=McAfee;i="6200,9189,10175"; a="215409830"
X-IronPort-AV: E=Sophos;i="5.87,253,1631602800"; d="scan'208";a="215409830"
Received: from orsmga008.jf.intel.com ([10.7.209.65])
 by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 21 Nov 2021 15:47:53 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.87,253,1631602800"; d="scan'208";a="508715008"
Received: from scymds01.sc.intel.com ([10.148.94.138])
 by orsmga008.jf.intel.com with ESMTP; 21 Nov 2021 15:47:53 -0800
Received: from gnu-ivb-1.sc.intel.com (gnu-ivb-1.sc.intel.com [172.25.70.227])
 by scymds01.sc.intel.com with ESMTP id 1ALNlrvh024562;
 Sun, 21 Nov 2021 15:47:53 -0800
Received: by gnu-ivb-1.sc.intel.com (Postfix, from userid 1000)
 id 6E190120064; Sun, 21 Nov 2021 15:47:53 -0800 (PST)
Date: Sun, 21 Nov 2021 15:47:53 -0800
To: skpgkp2@gmail.com, hjl.tools@gmail.com, gcc-regression@gcc.gnu.org
Subject: Regressions on master at commit r12-5440 vs commit r12-5437 on
 Linux/i686
User-Agent: Heirloom mailx 12.5 7/5/10
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-Id: <20211121234753.6E190120064@gnu-ivb-1.sc.intel.com>
From: "H.J. Lu" <hjl@sc.intel.com>
X-Spam-Status: No, score=-3468.1 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS,
 KAM_LAZY_DOMAIN_SECURITY, KAM_NUMSUBJECT, SPF_HELO_NONE, SPF_NONE,
 TXREP autolearn=no autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-regression@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-regression mailing list <gcc-regression.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-regression>,
 <mailto:gcc-regression-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-regression/>
List-Post: <mailto:gcc-regression@gcc.gnu.org>
List-Help: <mailto:gcc-regression-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-regression>,
 <mailto:gcc-regression-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Sun, 21 Nov 2021 23:47:56 -0000

New failures:

New passes:
FAIL: gcc.dg/tree-prof/merge_block.c scan-tree-dump-not optimized "Invalid sum"


^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2021-11-21 22:36 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-21 22:36 [TCWG CI] 464.h264ref grew in size by 2% after gcc: tree-optimization/102436 - restore loop store motion ci_notify

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).