public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH, GCC, gcc-5/6-branch] Fix PR77673: bswap loads passed end of object
@ 2016-12-13 11:43 Thomas Preudhomme
  2016-12-13 11:46 ` Thomas Preudhomme
  2016-12-14  8:08 ` Richard Biener
  0 siblings, 2 replies; 3+ messages in thread
From: Thomas Preudhomme @ 2016-12-13 11:43 UTC (permalink / raw)
  To: Richard Biener, Jakub Jelinek, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1625 bytes --]

Hi,

Fix in r242869 for PR77673 (bswap loads after end of object) applies cleanly on 
GCC 6 and with trivial fix for GCC 5 (gimple * in GCC 6 -> gimple in GCC 5). The 
backports also bootstrap fine on x86_64-linux-gnu and testsuite shows no regression.

ChangeLog entries are as follow:


*** gcc/ChangeLog ***

2016-12-12  Thomas Preud'homme  <thomas.preudhomme@arm.com>

         Backport from mainline
         2016-11-25  Thomas Preud'homme  <thomas.preudhomme@arm.com>

         PR tree-optimization/77673
         * tree-ssa-math-opts.c (struct symbolic_number): Add new src field.
         (init_symbolic_number): Initialize src field from src parameter.
         (perform_symbolic_merge): Select most dominated statement as the
         source statement.  Set src field of resulting n structure from the
         input src with the lowest address.
         (find_bswap_or_nop): Rename source_stmt into ins_stmt.
         (bswap_replace): Rename src_stmt into ins_stmt.  Initially get source
         of load from src field rather than insertion statement.  Cancel
         optimization if statement analyzed is not dominated by the insertion
         statement.
         (pass_optimize_bswap::execute): Rename src_stmt to ins_stmt.  Compute
         dominance information.


*** gcc/testsuite/ChangeLog ***

2016-12-12  Thomas Preud'homme  <thomas.preudhomme@arm.com>

         Backport from mainline
         2016-11-25  Thomas Preud'homme  <thomas.preudhomme@arm.com>

         PR tree-optimization/77673
         * gcc.dg/pr77673.c: New test.


Is this ok for gcc-5-branch and gcc-6-branch?

Best regards,

Thomas

[-- Attachment #2: 2_combine_profile_multilib.patch --]
[-- Type: text/x-patch, Size: 9988 bytes --]

diff --git a/gcc/config.gcc b/gcc/config.gcc
index bfd1127d6e8e647ca8c3a57dd2d58b586dffe4a5..d7508e89de0e1d7a3027da0b04e4fa36c1e95fd0 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3710,34 +3710,18 @@ case "${target}" in
 		# Add extra multilibs
 		if test "x$with_multilib_list" != x; then
 			arm_multilibs=`echo $with_multilib_list | sed -e 's/,/ /g'`
-			case ${arm_multilibs} in
-			aprofile)
-				# Note that arm/t-aprofile is a
-				# stand-alone make file fragment to be
-				# used only with itself.  We do not
-				# specifically use the
-				# TM_MULTILIB_OPTION framework because
-				# this shorthand is more
-				# pragmatic.
-				tmake_profile_file="arm/t-aprofile"
-				;;
-			rmprofile)
-				# Note that arm/t-rmprofile is a
-				# stand-alone make file fragment to be
-				# used only with itself.  We do not
-				# specifically use the
-				# TM_MULTILIB_OPTION framework because
-				# this shorthand is more
-				# pragmatic.
-				tmake_profile_file="arm/t-rmprofile"
-				;;
-			default)
-				;;
-			*)
-				echo "Error: --with-multilib-list=${with_multilib_list} not supported." 1>&2
-				exit 1
-				;;
-			esac
+			if test "x${arm_multilibs}" != xdefault ; then
+				for arm_multilib in ${arm_multilibs}; do
+					case ${arm_multilib} in
+					aprofile|rmprofile)
+						tmake_profile_file="arm/t-multilib"
+						;;
+					*)
+						echo "Error: --with-multilib-list=${with_multilib_list} not supported." 1>&2
+						exit 1
+						;;
+					esac
+				done
 
 			if test "x${tmake_profile_file}" != x ; then
 				# arm/t-aprofile and arm/t-rmprofile are only
diff --git a/gcc/config/arm/t-aprofile b/gcc/config/arm/t-aprofile
index 90305e1206e3964e08a673e385d3198747bdffa1..2cbd8e3c8e8bcd4bed6368bfea83ece953c8dbb4 100644
--- a/gcc/config/arm/t-aprofile
+++ b/gcc/config/arm/t-aprofile
@@ -24,30 +24,13 @@
 # have their default values during the configure step.  We enforce
 # this during the top-level configury.
 
-MULTILIB_OPTIONS     =
-MULTILIB_DIRNAMES    =
-MULTILIB_EXCEPTIONS  =
-MULTILIB_MATCHES     =
-MULTILIB_REUSE	     =
+# Arch and FPU variants to build libraries with
 
-# We have the following hierachy:
-#   ISA: A32 (.) or T32 (thumb)
-#   Architecture: ARMv7-A (v7-a), ARMv7VE (v7ve), or ARMv8-A (v8-a).
-#   FPU: VFPv3-D16 (fpv3), NEONv1 (simdv1), VFPv4-D16 (fpv4),
-#        NEON-VFPV4 (simdvfpv4), NEON for ARMv8 (simdv8), or None (.).
-#   Float-abi: Soft (.), softfp (softfp), or hard (hardfp).
+MULTI_ARCH_OPTS_A       = march=armv7-a/march=armv7ve/march=armv8-a
+MULTI_ARCH_DIRS_A       = v7-a v7ve v8-a
 
-MULTILIB_OPTIONS       += mthumb
-MULTILIB_DIRNAMES      += thumb
-
-MULTILIB_OPTIONS       += march=armv7-a/march=armv7ve/march=armv8-a
-MULTILIB_DIRNAMES      += v7-a v7ve v8-a
-
-MULTILIB_OPTIONS       += mfpu=vfpv3-d16/mfpu=neon/mfpu=vfpv4-d16/mfpu=neon-vfpv4/mfpu=neon-fp-armv8
-MULTILIB_DIRNAMES      += fpv3 simdv1 fpv4 simdvfpv4 simdv8
-
-MULTILIB_OPTIONS       += mfloat-abi=softfp/mfloat-abi=hard
-MULTILIB_DIRNAMES      += softfp hard
+MULTI_FPU_OPTS_A        = mfpu=vfpv3-d16/mfpu=neon/mfpu=vfpv4-d16/mfpu=neon-vfpv4/mfpu=neon-fp-armv8
+MULTI_FPU_DIRS_A        = fpv3 simdv1 fpv4 simdvfpv4 simdv8
 
 
 # Option combinations to build library with
@@ -71,7 +54,11 @@ MULTILIB_REQUIRED      += *march=armv8-a
 MULTILIB_REQUIRED      += *march=armv8-a/mfpu=neon-fp-armv8/mfloat-abi=*
 
 
+# Matches
+
 # CPU Matches
+MULTILIB_MATCHES       += march?armv7-a=mcpu?marvell-pj4
+MULTILIB_MATCHES       += march?armv7-a=mcpu?generic-armv7-a
 MULTILIB_MATCHES       += march?armv7-a=mcpu?cortex-a8
 MULTILIB_MATCHES       += march?armv7-a=mcpu?cortex-a9
 MULTILIB_MATCHES       += march?armv7-a=mcpu?cortex-a5
diff --git a/gcc/config/arm/t-multilib b/gcc/config/arm/t-multilib
new file mode 100644
index 0000000000000000000000000000000000000000..642e731765534904b752d7e15fb6b68a3db2708e
--- /dev/null
+++ b/gcc/config/arm/t-multilib
@@ -0,0 +1,69 @@
+# Copyright (C) 2016 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# This is a target makefile fragment that attempts to get
+# multilibs built for the range of CPU's, FPU's and ABI's that
+# are relevant for the ARM architecture.  It should not be used in
+# conjunction with another make file fragment and assumes --with-arch,
+# --with-cpu, --with-fpu, --with-float, --with-mode have their default
+# values during the configure step.  We enforce this during the
+# top-level configury.
+
+MULTILIB_OPTIONS     =
+MULTILIB_DIRNAMES    =
+MULTILIB_EXCEPTIONS  =
+MULTILIB_MATCHES     =
+MULTILIB_REUSE	     =
+
+comma := ,
+tm_multilib_list := $(subst $(comma), ,$(TM_MULTILIB_CONFIG))
+
+HAS_APROFILE := $(filter aprofile,$(tm_multilib_list))
+HAS_RMPROFILE := $(filter rmprofile,$(tm_multilib_list))
+
+ifneq (,$(HAS_APROFILE))
+include $(srcdir)/config/arm/t-aprofile
+endif
+ifneq (,$(HAS_RMPROFILE))
+include $(srcdir)/config/arm/t-rmprofile
+endif
+SEP := $(and $(HAS_APROFILE),$(HAS_RMPROFILE),/)
+
+
+# We have the following hierachy:
+#   ISA: A32 (.) or T16/T32 (thumb)
+#   Architecture: ARMv6-M (v6-m), ARMv7-M (v7-m), ARMv7E-M (v7e-m),
+#                 ARMv7 (v7-ar), ARMv7-A (v7-a), ARMv7VE (v7ve),
+#                 ARMv8-M Baseline (v8-m.base), ARMv8-M Mainline (v8-m.main)
+#                 or ARMv8-A (v8-a).
+#   FPU: VFPv3-D16 (fpv3), NEONv1 (simdv1), FPV4-SP-D16 (fpv4-sp),
+#        VFPv4-D16 (fpv4), NEON-VFPV4 (simdvfpv4), FPV5-SP-D16 (fpv5-sp),
+#        VFPv5-D16 (fpv5), NEON for ARMv8 (simdv8), or None (.).
+#   Float-abi: Soft (.), softfp (softfp), or hard (hard).
+
+MULTILIB_OPTIONS       += mthumb
+MULTILIB_DIRNAMES      += thumb
+
+MULTILIB_OPTIONS       += $(MULTI_ARCH_OPTS_A)$(SEP)$(MULTI_ARCH_OPTS_RM)
+MULTILIB_DIRNAMES      += $(MULTI_ARCH_DIRS_A) $(MULTI_ARCH_DIRS_RM)
+
+MULTILIB_OPTIONS       += $(MULTI_FPU_OPTS_A)$(SEP)$(MULTI_FPU_OPTS_RM)
+MULTILIB_DIRNAMES      += $(MULTI_FPU_DIRS_A) $(MULTI_FPU_DIRS_RM)
+
+MULTILIB_OPTIONS       += mfloat-abi=softfp/mfloat-abi=hard
+MULTILIB_DIRNAMES      += softfp hard
diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile
index c195a6590c2f8e1753a9f2498583f5be89df7d1e..ea6054dfd58be63426fc77e3a00a25fde96a8583 100644
--- a/gcc/config/arm/t-rmprofile
+++ b/gcc/config/arm/t-rmprofile
@@ -24,33 +24,14 @@
 # values during the configure step.  We enforce this during the
 # top-level configury.
 
-MULTILIB_OPTIONS     =
-MULTILIB_DIRNAMES    =
-MULTILIB_EXCEPTIONS  =
-MULTILIB_MATCHES     =
-MULTILIB_REUSE       =
 
-# We have the following hierachy:
-#   ISA: A32 (.) or T16/T32 (thumb).
-#   Architecture: ARMv6S-M (v6-m), ARMv7-M (v7-m), ARMv7E-M (v7e-m),
-#                 ARMv8-M Baseline (v8-m.base) or ARMv8-M Mainline (v8-m.main).
-#   FPU: VFPv3-D16 (fpv3), FPV4-SP-D16 (fpv4-sp), FPV5-SP-D16 (fpv5-sp),
-#        VFPv5-D16 (fpv5), or None (.).
-#   Float-abi: Soft (.), softfp (softfp), or hard (hardfp).
+# Arch and FPU variants to build libraries with
 
-# Options to build libraries with
+MULTI_ARCH_OPTS_RM      = march=armv6s-m/march=armv7-m/march=armv7e-m/march=armv7/march=armv8-m.base/march=armv8-m.main
+MULTI_ARCH_DIRS_RM      = v6-m v7-m v7e-m v7-ar v8-m.base v8-m.main
 
-MULTILIB_OPTIONS       += mthumb
-MULTILIB_DIRNAMES      += thumb
-
-MULTILIB_OPTIONS       += march=armv6s-m/march=armv7-m/march=armv7e-m/march=armv7/march=armv8-m.base/march=armv8-m.main
-MULTILIB_DIRNAMES      += v6-m v7-m v7e-m v7-ar v8-m.base v8-m.main
-
-MULTILIB_OPTIONS       += mfpu=vfpv3-d16/mfpu=fpv4-sp-d16/mfpu=fpv5-sp-d16/mfpu=fpv5-d16
-MULTILIB_DIRNAMES      += fpv3 fpv4-sp fpv5-sp fpv5
-
-MULTILIB_OPTIONS       += mfloat-abi=softfp/mfloat-abi=hard
-MULTILIB_DIRNAMES      += softfp hard
+MULTI_FPU_OPTS_RM       = mfpu=vfpv3-d16/mfpu=fpv4-sp-d16/mfpu=fpv5-sp-d16/mfpu=fpv5-d16
+MULTI_FPU_DIRS_RM       = fpv3 fpv4-sp fpv5-sp fpv5
 
 
 # Option combinations to build library with
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 0b94bc1931a226e58d06a7ed5a726454142c006a..d91c82b569c017819cc2390085ed159e02929819 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1114,14 +1114,18 @@ for each target is given below.
 
 @table @code
 @item arm*-*-*
-@var{list} is one of@code{default}, @code{aprofile} or @code{rmprofile}.
-Specifying @code{default} is equivalent to omitting this option, ie. only the
-default runtime library will be enabled.  Specifying @code{aprofile} or
-@code{rmprofile} builds multilibs for a combination of ISA, architecture,
-FPU available and floating-point ABI.
+@var{list} is a comma separated list of @code{aprofile} and @code{rmprofile}
+to build multilibs for A or R and M architecture profiles respectively.  Note
+that, due to some limitation of the current multilib framework, using the
+combined @code{aprofile,rmprofile} multilibs selects in some cases a less
+optimal multilib than when using the multilib profile for the architecture
+targetted.  The special value @code{default} is also accepted and is equivalent
+to omitting the option, ie. only the default run-time library will be enabled.
 
 The table below gives the combination of ISAs, architectures, FPUs and
 floating-point ABIs for which multilibs are built for each accepted value.
+The union of these options is considered when specifying both @code{aprofile}
+and @code{rmprofile}.
 
 @multitable @columnfractions .15 .28 .30
 @item Option @tab aprofile @tab rmprofile

[-- Attachment #3: fix_pr77673_gcc5.patch --]
[-- Type: text/x-patch, Size: 7027 bytes --]

diff --git a/gcc/testsuite/gcc.dg/pr77673.c b/gcc/testsuite/gcc.dg/pr77673.c
new file mode 100644
index 0000000000000000000000000000000000000000..e03bcb9284d1e5a0e496cfa547fdbab630bec04f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr77673.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target fpic } } */
+/* { dg-require-effective-target bswap32 } */
+/* { dg-options "-O2 -fPIC -fdump-tree-bswap" } */
+/* { dg-additional-options "-march=z900" { target s390*-*-* } } */
+
+void mach_parse_compressed(unsigned char* ptr, unsigned long int* val)
+{
+  if (ptr[0] < 0xC0U) {
+    *val = ptr[0] + ptr[1];
+    return;
+  }
+
+  *val = ((unsigned long int)(ptr[0]) << 24)
+    | ((unsigned long int)(ptr[1]) << 16)
+    | ((unsigned long int)(ptr[2]) << 8)
+    | ptr[3];
+}
+
+/* { dg-final { scan-tree-dump-not "load_dst_\\d+ =.* if \\(" "bswap" } } */
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 61b65bb824b1e90ab13e3cb3ac1b4fbbc34a3b70..3882652dcfc0e04642196243a034f1f7c1405936 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1645,6 +1645,7 @@ struct symbolic_number {
   tree base_addr;
   tree offset;
   HOST_WIDE_INT bytepos;
+  tree src;
   tree alias_set;
   tree vuse;
   unsigned HOST_WIDE_INT range;
@@ -1746,6 +1747,7 @@ init_symbolic_number (struct symbolic_number *n, tree src)
   int size;
 
   n->base_addr = n->offset = n->alias_set = n->vuse = NULL_TREE;
+  n->src = src;
 
   /* Set up the symbolic number N by setting each byte to a value between 1 and
      the byte size of rhs1.  The highest order byte is set to n->size and the
@@ -1859,6 +1861,7 @@ perform_symbolic_merge (gimple source_stmt1, struct symbolic_number *n1,
       uint64_t inc;
       HOST_WIDE_INT start_sub, end_sub, end1, end2, end;
       struct symbolic_number *toinc_n_ptr, *n_end;
+      basic_block bb1, bb2;
 
       if (!n1->base_addr || !n2->base_addr
 	  || !operand_equal_p (n1->base_addr, n2->base_addr, 0))
@@ -1872,15 +1875,20 @@ perform_symbolic_merge (gimple source_stmt1, struct symbolic_number *n1,
 	{
 	  n_start = n1;
 	  start_sub = n2->bytepos - n1->bytepos;
-	  source_stmt = source_stmt1;
 	}
       else
 	{
 	  n_start = n2;
 	  start_sub = n1->bytepos - n2->bytepos;
-	  source_stmt = source_stmt2;
 	}
 
+      bb1 = gimple_bb (source_stmt1);
+      bb2 = gimple_bb (source_stmt2);
+      if (dominated_by_p (CDI_DOMINATORS, bb1, bb2))
+	source_stmt = source_stmt1;
+      else
+	source_stmt = source_stmt2;
+
       /* Find the highest address at which a load is performed and
 	 compute related info.  */
       end1 = n1->bytepos + (n1->range - 1);
@@ -1937,6 +1945,7 @@ perform_symbolic_merge (gimple source_stmt1, struct symbolic_number *n1,
   n->vuse = n_start->vuse;
   n->base_addr = n_start->base_addr;
   n->offset = n_start->offset;
+  n->src = n_start->src;
   n->bytepos = n_start->bytepos;
   n->type = n_start->type;
   size = TYPE_PRECISION (n->type) / BITS_PER_UNIT;
@@ -2147,7 +2156,7 @@ find_bswap_or_nop (gimple stmt, struct symbolic_number *n, bool *bswap)
   uint64_t cmpxchg = CMPXCHG;
   uint64_t cmpnop = CMPNOP;
 
-  gimple source_stmt;
+  gimple ins_stmt;
   int limit;
 
   /* The last parameter determines the depth search limit.  It usually
@@ -2157,9 +2166,9 @@ find_bswap_or_nop (gimple stmt, struct symbolic_number *n, bool *bswap)
      in libgcc, and for initial shift/and operation of the src operand.  */
   limit = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (gimple_expr_type (stmt)));
   limit += 1 + (int) ceil_log2 ((unsigned HOST_WIDE_INT) limit);
-  source_stmt = find_bswap_or_nop_1 (stmt, n, limit);
+  ins_stmt = find_bswap_or_nop_1 (stmt, n, limit);
 
-  if (!source_stmt)
+  if (!ins_stmt)
     return NULL;
 
   /* Find real size of result (highest non-zero byte).  */
@@ -2201,7 +2210,7 @@ find_bswap_or_nop (gimple stmt, struct symbolic_number *n, bool *bswap)
     return NULL;
 
   n->range *= BITS_PER_UNIT;
-  return source_stmt;
+  return ins_stmt;
 }
 
 namespace {
@@ -2250,7 +2259,7 @@ public:
    changing of basic block.  */
 
 static bool
-bswap_replace (gimple cur_stmt, gimple src_stmt, tree fndecl, tree bswap_type,
+bswap_replace (gimple cur_stmt, gimple ins_stmt, tree fndecl, tree bswap_type,
 	       tree load_type, struct symbolic_number *n, bool bswap)
 {
   gimple_stmt_iterator gsi;
@@ -2258,18 +2267,24 @@ bswap_replace (gimple cur_stmt, gimple src_stmt, tree fndecl, tree bswap_type,
   gimple bswap_stmt;
 
   gsi = gsi_for_stmt (cur_stmt);
-  src = gimple_assign_rhs1 (src_stmt);
+  src = n->src;
   tgt = gimple_assign_lhs (cur_stmt);
 
   /* Need to load the value from memory first.  */
   if (n->base_addr)
     {
-      gimple_stmt_iterator gsi_ins = gsi_for_stmt (src_stmt);
+      gimple_stmt_iterator gsi_ins = gsi_for_stmt (ins_stmt);
       tree addr_expr, addr_tmp, val_expr, val_tmp;
       tree load_offset_ptr, aligned_load_type;
       gimple addr_stmt, load_stmt;
       unsigned align;
       HOST_WIDE_INT load_offset = 0;
+      basic_block ins_bb, cur_bb;
+
+      ins_bb = gimple_bb (ins_stmt);
+      cur_bb = gimple_bb (cur_stmt);
+      if (!dominated_by_p (CDI_DOMINATORS, cur_bb, ins_bb))
+	return false;
 
       align = get_object_alignment (src);
       /* If the new access is smaller than the original one, we need
@@ -2301,7 +2316,7 @@ bswap_replace (gimple cur_stmt, gimple src_stmt, tree fndecl, tree bswap_type,
       /* Move cur_stmt just before  one of the load of the original
 	 to ensure it has the same VUSE.  See PR61517 for what could
 	 go wrong.  */
-      if (gimple_bb (cur_stmt) != gimple_bb (src_stmt))
+      if (gimple_bb (cur_stmt) != gimple_bb (ins_stmt))
 	reset_flow_sensitive_info (gimple_assign_lhs (cur_stmt));
       gsi_move_before (&gsi, &gsi_ins);
       gsi = gsi_for_stmt (cur_stmt);
@@ -2474,6 +2489,7 @@ pass_optimize_bswap::execute (function *fun)
 
   memset (&nop_stats, 0, sizeof (nop_stats));
   memset (&bswap_stats, 0, sizeof (bswap_stats));
+  calculate_dominance_info (CDI_DOMINATORS);
 
   FOR_EACH_BB_FN (bb, fun)
     {
@@ -2485,7 +2501,7 @@ pass_optimize_bswap::execute (function *fun)
 	 variant wouldn't be detected.  */
       for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi);)
         {
-	  gimple src_stmt, cur_stmt = gsi_stmt (gsi);
+	  gimple ins_stmt, cur_stmt = gsi_stmt (gsi);
 	  tree fndecl = NULL_TREE, bswap_type = NULL_TREE, load_type;
 	  enum tree_code code;
 	  struct symbolic_number n;
@@ -2518,9 +2534,9 @@ pass_optimize_bswap::execute (function *fun)
 	      continue;
 	    }
 
-	  src_stmt = find_bswap_or_nop (cur_stmt, &n, &bswap);
+	  ins_stmt = find_bswap_or_nop (cur_stmt, &n, &bswap);
 
-	  if (!src_stmt)
+	  if (!ins_stmt)
 	    continue;
 
 	  switch (n.range)
@@ -2554,7 +2570,7 @@ pass_optimize_bswap::execute (function *fun)
 	  if (bswap && !fndecl && n.range != 16)
 	    continue;
 
-	  if (bswap_replace (cur_stmt, src_stmt, fndecl, bswap_type, load_type,
+	  if (bswap_replace (cur_stmt, ins_stmt, fndecl, bswap_type, load_type,
 			     &n, bswap))
 	    changed = true;
 	}

[-- Attachment #4: fix_pr77673_gcc6.patch --]
[-- Type: text/x-patch, Size: 6979 bytes --]

diff --git a/gcc/testsuite/gcc.dg/pr77673.c b/gcc/testsuite/gcc.dg/pr77673.c
new file mode 100644
index 0000000000000000000000000000000000000000..e03bcb9284d1e5a0e496cfa547fdbab630bec04f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr77673.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target fpic } } */
+/* { dg-require-effective-target bswap32 } */
+/* { dg-options "-O2 -fPIC -fdump-tree-bswap" } */
+/* { dg-additional-options "-march=z900" { target s390*-*-* } } */
+
+void mach_parse_compressed(unsigned char* ptr, unsigned long int* val)
+{
+  if (ptr[0] < 0xC0U) {
+    *val = ptr[0] + ptr[1];
+    return;
+  }
+
+  *val = ((unsigned long int)(ptr[0]) << 24)
+    | ((unsigned long int)(ptr[1]) << 16)
+    | ((unsigned long int)(ptr[2]) << 8)
+    | ptr[3];
+}
+
+/* { dg-final { scan-tree-dump-not "load_dst_\\d+ =.* if \\(" "bswap" } } */
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 735b7c67c31df0c8317544346396fb8b15315879..ac15e8179a3257d1e190086c8089bc85ed8552bf 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1951,6 +1951,7 @@ struct symbolic_number {
   tree base_addr;
   tree offset;
   HOST_WIDE_INT bytepos;
+  tree src;
   tree alias_set;
   tree vuse;
   unsigned HOST_WIDE_INT range;
@@ -2052,6 +2053,7 @@ init_symbolic_number (struct symbolic_number *n, tree src)
   int size;
 
   n->base_addr = n->offset = n->alias_set = n->vuse = NULL_TREE;
+  n->src = src;
 
   /* Set up the symbolic number N by setting each byte to a value between 1 and
      the byte size of rhs1.  The highest order byte is set to n->size and the
@@ -2167,6 +2169,7 @@ perform_symbolic_merge (gimple *source_stmt1, struct symbolic_number *n1,
       uint64_t inc;
       HOST_WIDE_INT start_sub, end_sub, end1, end2, end;
       struct symbolic_number *toinc_n_ptr, *n_end;
+      basic_block bb1, bb2;
 
       if (!n1->base_addr || !n2->base_addr
 	  || !operand_equal_p (n1->base_addr, n2->base_addr, 0))
@@ -2180,15 +2183,20 @@ perform_symbolic_merge (gimple *source_stmt1, struct symbolic_number *n1,
 	{
 	  n_start = n1;
 	  start_sub = n2->bytepos - n1->bytepos;
-	  source_stmt = source_stmt1;
 	}
       else
 	{
 	  n_start = n2;
 	  start_sub = n1->bytepos - n2->bytepos;
-	  source_stmt = source_stmt2;
 	}
 
+      bb1 = gimple_bb (source_stmt1);
+      bb2 = gimple_bb (source_stmt2);
+      if (dominated_by_p (CDI_DOMINATORS, bb1, bb2))
+	source_stmt = source_stmt1;
+      else
+	source_stmt = source_stmt2;
+
       /* Find the highest address at which a load is performed and
 	 compute related info.  */
       end1 = n1->bytepos + (n1->range - 1);
@@ -2245,6 +2253,7 @@ perform_symbolic_merge (gimple *source_stmt1, struct symbolic_number *n1,
   n->vuse = n_start->vuse;
   n->base_addr = n_start->base_addr;
   n->offset = n_start->offset;
+  n->src = n_start->src;
   n->bytepos = n_start->bytepos;
   n->type = n_start->type;
   size = TYPE_PRECISION (n->type) / BITS_PER_UNIT;
@@ -2455,7 +2464,7 @@ find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
   uint64_t cmpxchg = CMPXCHG;
   uint64_t cmpnop = CMPNOP;
 
-  gimple *source_stmt;
+  gimple *ins_stmt;
   int limit;
 
   /* The last parameter determines the depth search limit.  It usually
@@ -2465,9 +2474,9 @@ find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
      in libgcc, and for initial shift/and operation of the src operand.  */
   limit = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (gimple_expr_type (stmt)));
   limit += 1 + (int) ceil_log2 ((unsigned HOST_WIDE_INT) limit);
-  source_stmt = find_bswap_or_nop_1 (stmt, n, limit);
+  ins_stmt = find_bswap_or_nop_1 (stmt, n, limit);
 
-  if (!source_stmt)
+  if (!ins_stmt)
     return NULL;
 
   /* Find real size of result (highest non-zero byte).  */
@@ -2509,7 +2518,7 @@ find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
     return NULL;
 
   n->range *= BITS_PER_UNIT;
-  return source_stmt;
+  return ins_stmt;
 }
 
 namespace {
@@ -2558,7 +2567,7 @@ public:
    changing of basic block.  */
 
 static bool
-bswap_replace (gimple *cur_stmt, gimple *src_stmt, tree fndecl,
+bswap_replace (gimple *cur_stmt, gimple *ins_stmt, tree fndecl,
 	       tree bswap_type, tree load_type, struct symbolic_number *n,
 	       bool bswap)
 {
@@ -2567,18 +2576,24 @@ bswap_replace (gimple *cur_stmt, gimple *src_stmt, tree fndecl,
   gimple *bswap_stmt;
 
   gsi = gsi_for_stmt (cur_stmt);
-  src = gimple_assign_rhs1 (src_stmt);
+  src = n->src;
   tgt = gimple_assign_lhs (cur_stmt);
 
   /* Need to load the value from memory first.  */
   if (n->base_addr)
     {
-      gimple_stmt_iterator gsi_ins = gsi_for_stmt (src_stmt);
+      gimple_stmt_iterator gsi_ins = gsi_for_stmt (ins_stmt);
       tree addr_expr, addr_tmp, val_expr, val_tmp;
       tree load_offset_ptr, aligned_load_type;
       gimple *addr_stmt, *load_stmt;
       unsigned align;
       HOST_WIDE_INT load_offset = 0;
+      basic_block ins_bb, cur_bb;
+
+      ins_bb = gimple_bb (ins_stmt);
+      cur_bb = gimple_bb (cur_stmt);
+      if (!dominated_by_p (CDI_DOMINATORS, cur_bb, ins_bb))
+	return false;
 
       align = get_object_alignment (src);
       /* If the new access is smaller than the original one, we need
@@ -2610,7 +2625,7 @@ bswap_replace (gimple *cur_stmt, gimple *src_stmt, tree fndecl,
       /* Move cur_stmt just before  one of the load of the original
 	 to ensure it has the same VUSE.  See PR61517 for what could
 	 go wrong.  */
-      if (gimple_bb (cur_stmt) != gimple_bb (src_stmt))
+      if (gimple_bb (cur_stmt) != gimple_bb (ins_stmt))
 	reset_flow_sensitive_info (gimple_assign_lhs (cur_stmt));
       gsi_move_before (&gsi, &gsi_ins);
       gsi = gsi_for_stmt (cur_stmt);
@@ -2783,6 +2798,7 @@ pass_optimize_bswap::execute (function *fun)
 
   memset (&nop_stats, 0, sizeof (nop_stats));
   memset (&bswap_stats, 0, sizeof (bswap_stats));
+  calculate_dominance_info (CDI_DOMINATORS);
 
   FOR_EACH_BB_FN (bb, fun)
     {
@@ -2794,7 +2810,7 @@ pass_optimize_bswap::execute (function *fun)
 	 variant wouldn't be detected.  */
       for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi);)
         {
-	  gimple *src_stmt, *cur_stmt = gsi_stmt (gsi);
+	  gimple *ins_stmt, *cur_stmt = gsi_stmt (gsi);
 	  tree fndecl = NULL_TREE, bswap_type = NULL_TREE, load_type;
 	  enum tree_code code;
 	  struct symbolic_number n;
@@ -2827,9 +2843,9 @@ pass_optimize_bswap::execute (function *fun)
 	      continue;
 	    }
 
-	  src_stmt = find_bswap_or_nop (cur_stmt, &n, &bswap);
+	  ins_stmt = find_bswap_or_nop (cur_stmt, &n, &bswap);
 
-	  if (!src_stmt)
+	  if (!ins_stmt)
 	    continue;
 
 	  switch (n.range)
@@ -2863,7 +2879,7 @@ pass_optimize_bswap::execute (function *fun)
 	  if (bswap && !fndecl && n.range != 16)
 	    continue;
 
-	  if (bswap_replace (cur_stmt, src_stmt, fndecl, bswap_type, load_type,
+	  if (bswap_replace (cur_stmt, ins_stmt, fndecl, bswap_type, load_type,
 			     &n, bswap))
 	    changed = true;
 	}

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH, GCC, gcc-5/6-branch] Fix PR77673: bswap loads passed end of object
  2016-12-13 11:43 [PATCH, GCC, gcc-5/6-branch] Fix PR77673: bswap loads passed end of object Thomas Preudhomme
@ 2016-12-13 11:46 ` Thomas Preudhomme
  2016-12-14  8:08 ` Richard Biener
  1 sibling, 0 replies; 3+ messages in thread
From: Thomas Preudhomme @ 2016-12-13 11:46 UTC (permalink / raw)
  To: Richard Biener, Jakub Jelinek, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1911 bytes --]

Sorry, ignore the first attachment (2_combine_profile_multilib.patch). i always 
miss that Thunderbird selects the first file in the in-review folder upfront.

Best regards,

Thomas

On 13/12/16 11:43, Thomas Preudhomme wrote:
> Hi,
>
> Fix in r242869 for PR77673 (bswap loads after end of object) applies cleanly on
> GCC 6 and with trivial fix for GCC 5 (gimple * in GCC 6 -> gimple in GCC 5). The
> backports also bootstrap fine on x86_64-linux-gnu and testsuite shows no
> regression.
>
> ChangeLog entries are as follow:
>
>
> *** gcc/ChangeLog ***
>
> 2016-12-12  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>
>         Backport from mainline
>         2016-11-25  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>
>         PR tree-optimization/77673
>         * tree-ssa-math-opts.c (struct symbolic_number): Add new src field.
>         (init_symbolic_number): Initialize src field from src parameter.
>         (perform_symbolic_merge): Select most dominated statement as the
>         source statement.  Set src field of resulting n structure from the
>         input src with the lowest address.
>         (find_bswap_or_nop): Rename source_stmt into ins_stmt.
>         (bswap_replace): Rename src_stmt into ins_stmt.  Initially get source
>         of load from src field rather than insertion statement.  Cancel
>         optimization if statement analyzed is not dominated by the insertion
>         statement.
>         (pass_optimize_bswap::execute): Rename src_stmt to ins_stmt.  Compute
>         dominance information.
>
>
> *** gcc/testsuite/ChangeLog ***
>
> 2016-12-12  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>
>         Backport from mainline
>         2016-11-25  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>
>         PR tree-optimization/77673
>         * gcc.dg/pr77673.c: New test.
>
>
> Is this ok for gcc-5-branch and gcc-6-branch?
>
> Best regards,
>
> Thomas

[-- Attachment #2: fix_pr77673_gcc5.patch --]
[-- Type: text/x-patch, Size: 7027 bytes --]

diff --git a/gcc/testsuite/gcc.dg/pr77673.c b/gcc/testsuite/gcc.dg/pr77673.c
new file mode 100644
index 0000000000000000000000000000000000000000..e03bcb9284d1e5a0e496cfa547fdbab630bec04f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr77673.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target fpic } } */
+/* { dg-require-effective-target bswap32 } */
+/* { dg-options "-O2 -fPIC -fdump-tree-bswap" } */
+/* { dg-additional-options "-march=z900" { target s390*-*-* } } */
+
+void mach_parse_compressed(unsigned char* ptr, unsigned long int* val)
+{
+  if (ptr[0] < 0xC0U) {
+    *val = ptr[0] + ptr[1];
+    return;
+  }
+
+  *val = ((unsigned long int)(ptr[0]) << 24)
+    | ((unsigned long int)(ptr[1]) << 16)
+    | ((unsigned long int)(ptr[2]) << 8)
+    | ptr[3];
+}
+
+/* { dg-final { scan-tree-dump-not "load_dst_\\d+ =.* if \\(" "bswap" } } */
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 61b65bb824b1e90ab13e3cb3ac1b4fbbc34a3b70..3882652dcfc0e04642196243a034f1f7c1405936 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1645,6 +1645,7 @@ struct symbolic_number {
   tree base_addr;
   tree offset;
   HOST_WIDE_INT bytepos;
+  tree src;
   tree alias_set;
   tree vuse;
   unsigned HOST_WIDE_INT range;
@@ -1746,6 +1747,7 @@ init_symbolic_number (struct symbolic_number *n, tree src)
   int size;
 
   n->base_addr = n->offset = n->alias_set = n->vuse = NULL_TREE;
+  n->src = src;
 
   /* Set up the symbolic number N by setting each byte to a value between 1 and
      the byte size of rhs1.  The highest order byte is set to n->size and the
@@ -1859,6 +1861,7 @@ perform_symbolic_merge (gimple source_stmt1, struct symbolic_number *n1,
       uint64_t inc;
       HOST_WIDE_INT start_sub, end_sub, end1, end2, end;
       struct symbolic_number *toinc_n_ptr, *n_end;
+      basic_block bb1, bb2;
 
       if (!n1->base_addr || !n2->base_addr
 	  || !operand_equal_p (n1->base_addr, n2->base_addr, 0))
@@ -1872,15 +1875,20 @@ perform_symbolic_merge (gimple source_stmt1, struct symbolic_number *n1,
 	{
 	  n_start = n1;
 	  start_sub = n2->bytepos - n1->bytepos;
-	  source_stmt = source_stmt1;
 	}
       else
 	{
 	  n_start = n2;
 	  start_sub = n1->bytepos - n2->bytepos;
-	  source_stmt = source_stmt2;
 	}
 
+      bb1 = gimple_bb (source_stmt1);
+      bb2 = gimple_bb (source_stmt2);
+      if (dominated_by_p (CDI_DOMINATORS, bb1, bb2))
+	source_stmt = source_stmt1;
+      else
+	source_stmt = source_stmt2;
+
       /* Find the highest address at which a load is performed and
 	 compute related info.  */
       end1 = n1->bytepos + (n1->range - 1);
@@ -1937,6 +1945,7 @@ perform_symbolic_merge (gimple source_stmt1, struct symbolic_number *n1,
   n->vuse = n_start->vuse;
   n->base_addr = n_start->base_addr;
   n->offset = n_start->offset;
+  n->src = n_start->src;
   n->bytepos = n_start->bytepos;
   n->type = n_start->type;
   size = TYPE_PRECISION (n->type) / BITS_PER_UNIT;
@@ -2147,7 +2156,7 @@ find_bswap_or_nop (gimple stmt, struct symbolic_number *n, bool *bswap)
   uint64_t cmpxchg = CMPXCHG;
   uint64_t cmpnop = CMPNOP;
 
-  gimple source_stmt;
+  gimple ins_stmt;
   int limit;
 
   /* The last parameter determines the depth search limit.  It usually
@@ -2157,9 +2166,9 @@ find_bswap_or_nop (gimple stmt, struct symbolic_number *n, bool *bswap)
      in libgcc, and for initial shift/and operation of the src operand.  */
   limit = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (gimple_expr_type (stmt)));
   limit += 1 + (int) ceil_log2 ((unsigned HOST_WIDE_INT) limit);
-  source_stmt = find_bswap_or_nop_1 (stmt, n, limit);
+  ins_stmt = find_bswap_or_nop_1 (stmt, n, limit);
 
-  if (!source_stmt)
+  if (!ins_stmt)
     return NULL;
 
   /* Find real size of result (highest non-zero byte).  */
@@ -2201,7 +2210,7 @@ find_bswap_or_nop (gimple stmt, struct symbolic_number *n, bool *bswap)
     return NULL;
 
   n->range *= BITS_PER_UNIT;
-  return source_stmt;
+  return ins_stmt;
 }
 
 namespace {
@@ -2250,7 +2259,7 @@ public:
    changing of basic block.  */
 
 static bool
-bswap_replace (gimple cur_stmt, gimple src_stmt, tree fndecl, tree bswap_type,
+bswap_replace (gimple cur_stmt, gimple ins_stmt, tree fndecl, tree bswap_type,
 	       tree load_type, struct symbolic_number *n, bool bswap)
 {
   gimple_stmt_iterator gsi;
@@ -2258,18 +2267,24 @@ bswap_replace (gimple cur_stmt, gimple src_stmt, tree fndecl, tree bswap_type,
   gimple bswap_stmt;
 
   gsi = gsi_for_stmt (cur_stmt);
-  src = gimple_assign_rhs1 (src_stmt);
+  src = n->src;
   tgt = gimple_assign_lhs (cur_stmt);
 
   /* Need to load the value from memory first.  */
   if (n->base_addr)
     {
-      gimple_stmt_iterator gsi_ins = gsi_for_stmt (src_stmt);
+      gimple_stmt_iterator gsi_ins = gsi_for_stmt (ins_stmt);
       tree addr_expr, addr_tmp, val_expr, val_tmp;
       tree load_offset_ptr, aligned_load_type;
       gimple addr_stmt, load_stmt;
       unsigned align;
       HOST_WIDE_INT load_offset = 0;
+      basic_block ins_bb, cur_bb;
+
+      ins_bb = gimple_bb (ins_stmt);
+      cur_bb = gimple_bb (cur_stmt);
+      if (!dominated_by_p (CDI_DOMINATORS, cur_bb, ins_bb))
+	return false;
 
       align = get_object_alignment (src);
       /* If the new access is smaller than the original one, we need
@@ -2301,7 +2316,7 @@ bswap_replace (gimple cur_stmt, gimple src_stmt, tree fndecl, tree bswap_type,
       /* Move cur_stmt just before  one of the load of the original
 	 to ensure it has the same VUSE.  See PR61517 for what could
 	 go wrong.  */
-      if (gimple_bb (cur_stmt) != gimple_bb (src_stmt))
+      if (gimple_bb (cur_stmt) != gimple_bb (ins_stmt))
 	reset_flow_sensitive_info (gimple_assign_lhs (cur_stmt));
       gsi_move_before (&gsi, &gsi_ins);
       gsi = gsi_for_stmt (cur_stmt);
@@ -2474,6 +2489,7 @@ pass_optimize_bswap::execute (function *fun)
 
   memset (&nop_stats, 0, sizeof (nop_stats));
   memset (&bswap_stats, 0, sizeof (bswap_stats));
+  calculate_dominance_info (CDI_DOMINATORS);
 
   FOR_EACH_BB_FN (bb, fun)
     {
@@ -2485,7 +2501,7 @@ pass_optimize_bswap::execute (function *fun)
 	 variant wouldn't be detected.  */
       for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi);)
         {
-	  gimple src_stmt, cur_stmt = gsi_stmt (gsi);
+	  gimple ins_stmt, cur_stmt = gsi_stmt (gsi);
 	  tree fndecl = NULL_TREE, bswap_type = NULL_TREE, load_type;
 	  enum tree_code code;
 	  struct symbolic_number n;
@@ -2518,9 +2534,9 @@ pass_optimize_bswap::execute (function *fun)
 	      continue;
 	    }
 
-	  src_stmt = find_bswap_or_nop (cur_stmt, &n, &bswap);
+	  ins_stmt = find_bswap_or_nop (cur_stmt, &n, &bswap);
 
-	  if (!src_stmt)
+	  if (!ins_stmt)
 	    continue;
 
 	  switch (n.range)
@@ -2554,7 +2570,7 @@ pass_optimize_bswap::execute (function *fun)
 	  if (bswap && !fndecl && n.range != 16)
 	    continue;
 
-	  if (bswap_replace (cur_stmt, src_stmt, fndecl, bswap_type, load_type,
+	  if (bswap_replace (cur_stmt, ins_stmt, fndecl, bswap_type, load_type,
 			     &n, bswap))
 	    changed = true;
 	}

[-- Attachment #3: fix_pr77673_gcc6.patch --]
[-- Type: text/x-patch, Size: 6979 bytes --]

diff --git a/gcc/testsuite/gcc.dg/pr77673.c b/gcc/testsuite/gcc.dg/pr77673.c
new file mode 100644
index 0000000000000000000000000000000000000000..e03bcb9284d1e5a0e496cfa547fdbab630bec04f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr77673.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target fpic } } */
+/* { dg-require-effective-target bswap32 } */
+/* { dg-options "-O2 -fPIC -fdump-tree-bswap" } */
+/* { dg-additional-options "-march=z900" { target s390*-*-* } } */
+
+void mach_parse_compressed(unsigned char* ptr, unsigned long int* val)
+{
+  if (ptr[0] < 0xC0U) {
+    *val = ptr[0] + ptr[1];
+    return;
+  }
+
+  *val = ((unsigned long int)(ptr[0]) << 24)
+    | ((unsigned long int)(ptr[1]) << 16)
+    | ((unsigned long int)(ptr[2]) << 8)
+    | ptr[3];
+}
+
+/* { dg-final { scan-tree-dump-not "load_dst_\\d+ =.* if \\(" "bswap" } } */
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 735b7c67c31df0c8317544346396fb8b15315879..ac15e8179a3257d1e190086c8089bc85ed8552bf 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1951,6 +1951,7 @@ struct symbolic_number {
   tree base_addr;
   tree offset;
   HOST_WIDE_INT bytepos;
+  tree src;
   tree alias_set;
   tree vuse;
   unsigned HOST_WIDE_INT range;
@@ -2052,6 +2053,7 @@ init_symbolic_number (struct symbolic_number *n, tree src)
   int size;
 
   n->base_addr = n->offset = n->alias_set = n->vuse = NULL_TREE;
+  n->src = src;
 
   /* Set up the symbolic number N by setting each byte to a value between 1 and
      the byte size of rhs1.  The highest order byte is set to n->size and the
@@ -2167,6 +2169,7 @@ perform_symbolic_merge (gimple *source_stmt1, struct symbolic_number *n1,
       uint64_t inc;
       HOST_WIDE_INT start_sub, end_sub, end1, end2, end;
       struct symbolic_number *toinc_n_ptr, *n_end;
+      basic_block bb1, bb2;
 
       if (!n1->base_addr || !n2->base_addr
 	  || !operand_equal_p (n1->base_addr, n2->base_addr, 0))
@@ -2180,15 +2183,20 @@ perform_symbolic_merge (gimple *source_stmt1, struct symbolic_number *n1,
 	{
 	  n_start = n1;
 	  start_sub = n2->bytepos - n1->bytepos;
-	  source_stmt = source_stmt1;
 	}
       else
 	{
 	  n_start = n2;
 	  start_sub = n1->bytepos - n2->bytepos;
-	  source_stmt = source_stmt2;
 	}
 
+      bb1 = gimple_bb (source_stmt1);
+      bb2 = gimple_bb (source_stmt2);
+      if (dominated_by_p (CDI_DOMINATORS, bb1, bb2))
+	source_stmt = source_stmt1;
+      else
+	source_stmt = source_stmt2;
+
       /* Find the highest address at which a load is performed and
 	 compute related info.  */
       end1 = n1->bytepos + (n1->range - 1);
@@ -2245,6 +2253,7 @@ perform_symbolic_merge (gimple *source_stmt1, struct symbolic_number *n1,
   n->vuse = n_start->vuse;
   n->base_addr = n_start->base_addr;
   n->offset = n_start->offset;
+  n->src = n_start->src;
   n->bytepos = n_start->bytepos;
   n->type = n_start->type;
   size = TYPE_PRECISION (n->type) / BITS_PER_UNIT;
@@ -2455,7 +2464,7 @@ find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
   uint64_t cmpxchg = CMPXCHG;
   uint64_t cmpnop = CMPNOP;
 
-  gimple *source_stmt;
+  gimple *ins_stmt;
   int limit;
 
   /* The last parameter determines the depth search limit.  It usually
@@ -2465,9 +2474,9 @@ find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
      in libgcc, and for initial shift/and operation of the src operand.  */
   limit = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (gimple_expr_type (stmt)));
   limit += 1 + (int) ceil_log2 ((unsigned HOST_WIDE_INT) limit);
-  source_stmt = find_bswap_or_nop_1 (stmt, n, limit);
+  ins_stmt = find_bswap_or_nop_1 (stmt, n, limit);
 
-  if (!source_stmt)
+  if (!ins_stmt)
     return NULL;
 
   /* Find real size of result (highest non-zero byte).  */
@@ -2509,7 +2518,7 @@ find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
     return NULL;
 
   n->range *= BITS_PER_UNIT;
-  return source_stmt;
+  return ins_stmt;
 }
 
 namespace {
@@ -2558,7 +2567,7 @@ public:
    changing of basic block.  */
 
 static bool
-bswap_replace (gimple *cur_stmt, gimple *src_stmt, tree fndecl,
+bswap_replace (gimple *cur_stmt, gimple *ins_stmt, tree fndecl,
 	       tree bswap_type, tree load_type, struct symbolic_number *n,
 	       bool bswap)
 {
@@ -2567,18 +2576,24 @@ bswap_replace (gimple *cur_stmt, gimple *src_stmt, tree fndecl,
   gimple *bswap_stmt;
 
   gsi = gsi_for_stmt (cur_stmt);
-  src = gimple_assign_rhs1 (src_stmt);
+  src = n->src;
   tgt = gimple_assign_lhs (cur_stmt);
 
   /* Need to load the value from memory first.  */
   if (n->base_addr)
     {
-      gimple_stmt_iterator gsi_ins = gsi_for_stmt (src_stmt);
+      gimple_stmt_iterator gsi_ins = gsi_for_stmt (ins_stmt);
       tree addr_expr, addr_tmp, val_expr, val_tmp;
       tree load_offset_ptr, aligned_load_type;
       gimple *addr_stmt, *load_stmt;
       unsigned align;
       HOST_WIDE_INT load_offset = 0;
+      basic_block ins_bb, cur_bb;
+
+      ins_bb = gimple_bb (ins_stmt);
+      cur_bb = gimple_bb (cur_stmt);
+      if (!dominated_by_p (CDI_DOMINATORS, cur_bb, ins_bb))
+	return false;
 
       align = get_object_alignment (src);
       /* If the new access is smaller than the original one, we need
@@ -2610,7 +2625,7 @@ bswap_replace (gimple *cur_stmt, gimple *src_stmt, tree fndecl,
       /* Move cur_stmt just before  one of the load of the original
 	 to ensure it has the same VUSE.  See PR61517 for what could
 	 go wrong.  */
-      if (gimple_bb (cur_stmt) != gimple_bb (src_stmt))
+      if (gimple_bb (cur_stmt) != gimple_bb (ins_stmt))
 	reset_flow_sensitive_info (gimple_assign_lhs (cur_stmt));
       gsi_move_before (&gsi, &gsi_ins);
       gsi = gsi_for_stmt (cur_stmt);
@@ -2783,6 +2798,7 @@ pass_optimize_bswap::execute (function *fun)
 
   memset (&nop_stats, 0, sizeof (nop_stats));
   memset (&bswap_stats, 0, sizeof (bswap_stats));
+  calculate_dominance_info (CDI_DOMINATORS);
 
   FOR_EACH_BB_FN (bb, fun)
     {
@@ -2794,7 +2810,7 @@ pass_optimize_bswap::execute (function *fun)
 	 variant wouldn't be detected.  */
       for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi);)
         {
-	  gimple *src_stmt, *cur_stmt = gsi_stmt (gsi);
+	  gimple *ins_stmt, *cur_stmt = gsi_stmt (gsi);
 	  tree fndecl = NULL_TREE, bswap_type = NULL_TREE, load_type;
 	  enum tree_code code;
 	  struct symbolic_number n;
@@ -2827,9 +2843,9 @@ pass_optimize_bswap::execute (function *fun)
 	      continue;
 	    }
 
-	  src_stmt = find_bswap_or_nop (cur_stmt, &n, &bswap);
+	  ins_stmt = find_bswap_or_nop (cur_stmt, &n, &bswap);
 
-	  if (!src_stmt)
+	  if (!ins_stmt)
 	    continue;
 
 	  switch (n.range)
@@ -2863,7 +2879,7 @@ pass_optimize_bswap::execute (function *fun)
 	  if (bswap && !fndecl && n.range != 16)
 	    continue;
 
-	  if (bswap_replace (cur_stmt, src_stmt, fndecl, bswap_type, load_type,
+	  if (bswap_replace (cur_stmt, ins_stmt, fndecl, bswap_type, load_type,
 			     &n, bswap))
 	    changed = true;
 	}

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH, GCC, gcc-5/6-branch] Fix PR77673: bswap loads passed end of object
  2016-12-13 11:43 [PATCH, GCC, gcc-5/6-branch] Fix PR77673: bswap loads passed end of object Thomas Preudhomme
  2016-12-13 11:46 ` Thomas Preudhomme
@ 2016-12-14  8:08 ` Richard Biener
  1 sibling, 0 replies; 3+ messages in thread
From: Richard Biener @ 2016-12-14  8:08 UTC (permalink / raw)
  To: Thomas Preudhomme; +Cc: Jakub Jelinek, gcc-patches

On Tue, 13 Dec 2016, Thomas Preudhomme wrote:

> Hi,
> 
> Fix in r242869 for PR77673 (bswap loads after end of object) applies cleanly
> on GCC 6 and with trivial fix for GCC 5 (gimple * in GCC 6 -> gimple in GCC
> 5). The backports also bootstrap fine on x86_64-linux-gnu and testsuite shows
> no regression.

Ok.

Richard.

> ChangeLog entries are as follow:
> 
> 
> *** gcc/ChangeLog ***
> 
> 2016-12-12  Thomas Preud'homme  <thomas.preudhomme@arm.com>
> 
>         Backport from mainline
>         2016-11-25  Thomas Preud'homme  <thomas.preudhomme@arm.com>
> 
>         PR tree-optimization/77673
>         * tree-ssa-math-opts.c (struct symbolic_number): Add new src field.
>         (init_symbolic_number): Initialize src field from src parameter.
>         (perform_symbolic_merge): Select most dominated statement as the
>         source statement.  Set src field of resulting n structure from the
>         input src with the lowest address.
>         (find_bswap_or_nop): Rename source_stmt into ins_stmt.
>         (bswap_replace): Rename src_stmt into ins_stmt.  Initially get source
>         of load from src field rather than insertion statement.  Cancel
>         optimization if statement analyzed is not dominated by the insertion
>         statement.
>         (pass_optimize_bswap::execute): Rename src_stmt to ins_stmt.  Compute
>         dominance information.
> 
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2016-12-12  Thomas Preud'homme  <thomas.preudhomme@arm.com>
> 
>         Backport from mainline
>         2016-11-25  Thomas Preud'homme  <thomas.preudhomme@arm.com>
> 
>         PR tree-optimization/77673
>         * gcc.dg/pr77673.c: New test.
> 
> 
> Is this ok for gcc-5-branch and gcc-6-branch?
> 
> Best regards,
> 
> Thomas
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2016-12-14  8:07 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-13 11:43 [PATCH, GCC, gcc-5/6-branch] Fix PR77673: bswap loads passed end of object Thomas Preudhomme
2016-12-13 11:46 ` Thomas Preudhomme
2016-12-14  8:08 ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).