public inbox for gcc-patches@gcc.gnu.org
* [PATCH][v5] GIMPLE store merging pass
@ 2016-10-10  8:05 Kyrill Tkachov
  2016-10-29  8:07 ` Andreas Schwab
  0 siblings, 1 reply; 5+ messages in thread
From: Kyrill Tkachov @ 2016-10-10  8:05 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener

[-- Attachment #1: Type: text/plain, Size: 3080 bytes --]

Hi all,

This is another revision of the pass addressing Richard's feedback [1].
I believe I've addressed all of it and added more comments to the code
where needed.

The output_merged_store function now uses the new split_group helper to break
up the merged store into multiple regular-sized stores; for example, a 48-bit
merged store can be emitted as a 32-bit store followed by a 16-bit store.

The apply_stores function, which combines the values of the stores in a group
into a merged buffer, can now return a bool to indicate failure and is used to
quickly reject single-store groups and other store groups that we cannot
output.

One thing I've been struggling with is reimplementing encode_tree_to_bitpos,
the function that applies a tree constant to the merged byte array.
I tried to reimplement it by writing the constant to a byte array with
native_encode_expr and manipulating the bytes directly to insert them at the
appropriate bit position, without constructing an intermediate wide_int.
This works, but only for little-endian; on big-endian it generated wrong code.
So this patch doesn't include that implementation but rather keeps the
previous one, which uses a wide_int but is correct on both endiannesses.
Richard, I am sending out the patch that implements the cheaper algorithm
separately, in case you want to help debug it.
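
For reference, below is a minimal sketch of the byte-wise little-endian
insertion I had in mind.  It is illustrative only (the helper name and its
interface are mine, not code from that patch): it assumes SRC holds LEN bytes
produced by native_encode_expr, that the destination bits start out zero and
that DEST has one spare byte for the sub-byte spill-over.

/* OR the LEN bytes of SRC into DEST starting at bit offset BITPOS,
   little-endian only.  */
static void
insert_bytes_le (unsigned char *dest, const unsigned char *src,
		 unsigned int len, unsigned int bitpos)
{
  unsigned int shift = bitpos % BITS_PER_UNIT;
  unsigned char carry = 0;
  dest += bitpos / BITS_PER_UNIT;
  for (unsigned int i = 0; i < len; i++)
    {
      /* The low bits of each source byte land in the current destination
	 byte; the high bits spill over into the next one.  */
      dest[i] |= (unsigned char) ((src[i] << shift) | carry);
      carry = shift ? (unsigned char) (src[i] >> (BITS_PER_UNIT - shift)) : 0;
    }
  if (shift)
    dest[len] |= carry;
}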

This has been bootstrapped and tested on arm, aarch64, aarch64_be, x86_64.
Besides the encode_tree_to_bitpos reimplementation (which will have its own
thread), does this version look good?

Thanks,
Kyrill

[1] https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02225.html

2016-10-10  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR middle-end/22141
     * Makefile.in (OBJS): Add gimple-ssa-store-merging.o.
     * common.opt (fstore-merging): New Optimization option.
     * opts.c (default_options_table): Add entry for
     OPT_fstore_merging.
     * fold-const.h (can_native_encode_type_p): Declare prototype.
     * fold-const.c (can_native_encode_type_p): Define.
     * params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define.
     * passes.def: Insert pass_store_merging.
     * tree-pass.h (make_pass_store_merging): Declare extern
     prototype.
     * gimple-ssa-store-merging.c: New file.
     * doc/invoke.texi (Optimization Options): Document
     -fstore-merging.

2016-10-10  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
             Jakub Jelinek  <jakub@redhat.com>
             Andrew Pinski  <pinskia@gmail.com>

     PR middle-end/22141
     PR rtl-optimization/23684
     * gcc.c-torture/execute/pr22141-1.c: New test.
     * gcc.c-torture/execute/pr22141-2.c: Likewise.
     * gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging.
     * gcc.target/aarch64/ldp_stp_4.c: Likewise.
     * gcc.dg/store_merging_1.c: New test.
     * gcc.dg/store_merging_2.c: Likewise.
     * gcc.dg/store_merging_3.c: Likewise.
     * gcc.dg/store_merging_4.c: Likewise.
     * gcc.dg/store_merging_5.c: Likewise.
     * gcc.dg/store_merging_6.c: Likewise.
     * gcc.dg/store_merging_7.c: Likewise.
     * gcc.target/i386/pr22141.c: Likewise.
     * gcc.target/i386/pr34012.c: Add -fno-store-merging to dg-options.
     * g++.dg/init/new17.C: Likewise.

[-- Attachment #2: store-merging.patch --]
[-- Type: text/x-patch, Size: 62491 bytes --]

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 2a98e62b03ac8b84e4595660ac952a8bb3eb1d7f..fd4353fd94f3f12d1b4c799896a704981c6ea9a1 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1300,6 +1300,7 @@ OBJS = \
 	gimple-ssa-isolate-paths.o \
 	gimple-ssa-nonnull-compare.o \
 	gimple-ssa-split-paths.o \
+	gimple-ssa-store-merging.o \
 	gimple-ssa-strength-reduction.o \
 	gimple-ssa-sprintf.o \
 	gimple-streamer-in.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index ca48872072e0780c25178e714f96a8aa7f37eb1a..79255c865e4ff4d49b4331337ede172e9e2d7e31 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1460,6 +1460,10 @@ fstrict-volatile-bitfields
 Common Report Var(flag_strict_volatile_bitfields) Init(-1) Optimization
 Force bitfield accesses to match their type width.
 
+fstore-merging
+Common Report Var(flag_store_merging) Optimization
+Merge adjacent stores.
+
 fguess-branch-probability
 Common Report Var(flag_guess_branch_prob) Optimization
 Enable guessing of branch probabilities.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d9667e7f5d91b25e4160fdfc6aae2e5d64ba260d..bd60decb4d2f7883e2c3834d5c819fba78622177 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -403,7 +403,7 @@ Objective-C and Objective-C++ Dialects}.
 -fsingle-precision-constant -fsplit-ivs-in-unroller @gol
 -fsplit-paths @gol
 -fsplit-wide-types -fssa-backprop -fssa-phiopt @gol
--fstdarg-opt -fstrict-aliasing @gol
+-fstdarg-opt -fstore-merging -fstrict-aliasing @gol
 -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
 -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
 -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts @gol
@@ -414,8 +414,8 @@ Objective-C and Objective-C++ Dialects}.
 -ftree-loop-vectorize @gol
 -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-partial-pre -ftree-pta @gol
 -ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra @gol
--ftree-switch-conversion -ftree-tail-merge -ftree-ter @gol
--ftree-vectorize -ftree-vrp -funconstrained-commons @gol
+-ftree-switch-conversion -ftree-tail-merge @gol
+-ftree-ter -ftree-vectorize -ftree-vrp -funconstrained-commons @gol
 -funit-at-a-time -funroll-all-loops -funroll-loops @gol
 -funsafe-math-optimizations -funswitch-loops @gol
 -fipa-ra -fvariable-expansion-in-unroller -fvect-cost-model -fvpt @gol
@@ -6604,6 +6604,7 @@ compilation time.
 -fsplit-wide-types @gol
 -fssa-backprop @gol
 -fssa-phiopt @gol
+-fstore-merging @gol
 -ftree-bit-ccp @gol
 -ftree-ccp @gol
 -ftree-ch @gol
@@ -7938,6 +7939,13 @@ Perform scalar replacement of aggregates.  This pass replaces structure
 references with scalars to prevent committing structures to memory too
 early.  This flag is enabled by default at @option{-O} and higher.
 
+@item -fstore-merging
+@opindex fstore-merging
+Perform merging of narrow stores to consecutive memory addresses.  This pass
+merges contiguous stores of immediate values narrower than a word into fewer
+wider stores to reduce the number of instructions.  This is enabled by default
+at @option{-O2} and higher.
+
 @item -ftree-ter
 @opindex ftree-ter
 Perform temporary expression replacement during the SSA->normal phase.  Single
diff --git a/gcc/fold-const.h b/gcc/fold-const.h
index 637e46b0d48788217196ba53110b69d302b17db7..af9c6c96baa953c99438189266c53703d49f644b 100644
--- a/gcc/fold-const.h
+++ b/gcc/fold-const.h
@@ -27,6 +27,7 @@ extern int folding_initializer;
 /* Convert between trees and native memory representation.  */
 extern int native_encode_expr (const_tree, unsigned char *, int, int off = -1);
 extern tree native_interpret_expr (tree, const unsigned char *, int);
+extern bool can_native_encode_type_p (tree);
 
 /* Fold constants as much as possible in an expression.
    Returns the simplified expression.
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 65c75f639315d620eddfef3d8362b4902d28440a..564d086e9636743104d629cd6f5620ac2ee6a544 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -7515,6 +7515,26 @@ can_native_interpret_type_p (tree type)
     }
 }
 
+/* Return true iff a constant of type TYPE is accepted by
+   native_encode_expr.  */
+
+bool
+can_native_encode_type_p (tree type)
+{
+  switch (TREE_CODE (type))
+    {
+    case INTEGER_TYPE:
+    case REAL_TYPE:
+    case FIXED_POINT_TYPE:
+    case COMPLEX_TYPE:
+    case VECTOR_TYPE:
+    case POINTER_TYPE:
+      return true;
+    default:
+      return false;
+    }
+}
+
 /* Fold a VIEW_CONVERT_EXPR of a constant expression EXPR to type
    TYPE at compile-time.  If we're unable to perform the conversion
    return NULL_TREE.  */
diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c
new file mode 100644
index 0000000000000000000000000000000000000000..36c0d2396b6f4bd74aac2e9fb869b77432905767
--- /dev/null
+++ b/gcc/gimple-ssa-store-merging.c
@@ -0,0 +1,1219 @@
+/* GIMPLE store merging pass.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+/* The purpose of this pass is to combine multiple memory stores of
+   constant values to consecutive memory locations into fewer wider stores.
+   For example, if we have a sequence performing four byte stores to
+   consecutive memory locations:
+   [p     ] := imm1;
+   [p + 1B] := imm2;
+   [p + 2B] := imm3;
+   [p + 3B] := imm4;
+   we can transform this into a single 4-byte store if the target supports it:
+   [p] := imm1:imm2:imm3:imm4;  // concatenated immediates according to endianness.
+
+   The algorithm is applied to each basic block in three phases:
+
+   1) Scan through the basic block recording constant assignments to
+   destinations that can be expressed as a store to memory of a certain size
+   at a certain bit offset.  Record store chains to different bases in a
+   hash_map (m_stores) and make sure to terminate such chains when appropriate
+   (for example when the stored values get used subsequently).
+   These stores can be a result of structure element initializers, array stores
+   etc.  A store_immediate_info object is recorded for every such store.
+   Record as many such assignments to a single base as possible until a
+   statement that interferes with the store sequence is encountered.
+
+   2) Analyze the chain of stores recorded in phase 1) (i.e. the vector of
+   store_immediate_info objects) and coalesce contiguous stores into
+   merged_store_group objects.
+
+   For example, given the stores:
+   [p     ] := 0;
+   [p + 1B] := 1;
+   [p + 3B] := 0;
+   [p + 4B] := 1;
+   [p + 5B] := 0;
+   [p + 6B] := 0;
+   This phase would produce two merged_store_group objects, one recording the
+   two bytes stored in the memory region [p : p + 1] and another
+   recording the four bytes stored in the memory region [p + 3 : p + 6].
+
+   3) The merged_store_group objects produced in phase 2) are processed
+   to generate the sequence of wider stores that set the contiguous memory
+   regions to the sequence of bytes that correspond to it.  This may emit
+   multiple stores per store group to handle contiguous stores that are not
+   of a size that is a power of 2.  For example, it can try to emit a 40-bit
+   store as a 32-bit store followed by an 8-bit store.
+   We try to emit stores as wide as we can while respecting STRICT_ALIGNMENT or
+   SLOW_UNALIGNED_ACCESS rules.
+
+   Note on endianness and example:
+   Consider 2 contiguous 16-bit stores followed by 2 contiguous 8-bit stores:
+   [p     ] := 0x1234;
+   [p + 2B] := 0x5678;
+   [p + 4B] := 0xab;
+   [p + 5B] := 0xcd;
+
+   The memory layout for little-endian (LE) and big-endian (BE) must be:
+  p |LE|BE|
+  ---------
+  0 |34|12|
+  1 |12|34|
+  2 |78|56|
+  3 |56|78|
+  4 |ab|ab|
+  5 |cd|cd|
+
+  To merge these into a single 48-bit merged value 'val' in phase 2)
+  on little-endian we insert stores to higher (consecutive) bitpositions
+  into the most significant bits of the merged value.
+  The final merged value would be: 0xcdab56781234
+
+  For big-endian we insert stores to higher bitpositions into the least
+  significant bits of the merged value.
+  The final merged value would be: 0x12345678abcd
+
+  Then, in phase 3), we want to emit this 48-bit value as a 32-bit store
+  followed by a 16-bit store.  Again, we must consider endianness when
+  breaking down the 48-bit value 'val' computed above.
+  For little endian we emit:
+  [p]      (32-bit) := 0x56781234; // val & 0x0000ffffffff;
+  [p + 4B] (16-bit) := 0xcdab;    // (val & 0xffff00000000) >> 32;
+
+  Whereas for big-endian we emit:
+  [p]      (32-bit) := 0x12345678; // (val & 0xffffffff0000) >> 16;
+  [p + 4B] (16-bit) := 0xabcd;     //  val & 0x00000000ffff;  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "builtins.h"
+#include "fold-const.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "gimple-pretty-print.h"
+#include "alias.h"
+#include "fold-const.h"
+#include "params.h"
+#include "print-tree.h"
+#include "tree-hash-traits.h"
+#include "gimple-iterator.h"
+#include "gimplify.h"
+#include "stor-layout.h"
+#include "tree-cfg.h"
+#include "tree-eh.h"
+#include "target.h"
+
+/* The maximum size (in bits) of the stores this pass should generate.  */
+#define MAX_STORE_BITSIZE (BITS_PER_WORD)
+#define MAX_STORE_BYTES (MAX_STORE_BITSIZE / BITS_PER_UNIT)
+
+namespace {
+
+/* Struct recording the information about a single store of an immediate
+   to memory.  These are created in the first phase and coalesced into
+   merged_store_group objects in the second phase.  */
+
+struct store_immediate_info
+{
+  unsigned HOST_WIDE_INT bitsize;
+  unsigned HOST_WIDE_INT bitpos;
+  tree val;
+  tree dest;
+  gimple *stmt;
+  unsigned int order;
+  store_immediate_info (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, tree,
+			tree, gimple *, unsigned int);
+};
+
+store_immediate_info::store_immediate_info (unsigned HOST_WIDE_INT bs,
+					    unsigned HOST_WIDE_INT bp, tree v,
+					    tree d, gimple *st,
+					    unsigned int ord)
+  : bitsize (bs), bitpos (bp), val (v), dest (d), stmt (st), order (ord)
+{
+}
+
+/* Struct representing a group of stores to contiguous memory locations.
+   These are produced by the second phase (coalescing) and consumed in the
+   third phase that outputs the widened stores.  */
+
+struct merged_store_group
+{
+  unsigned HOST_WIDE_INT start;
+  unsigned HOST_WIDE_INT width;
+  /* The size of the allocated memory for val.  */
+  unsigned HOST_WIDE_INT buf_size;
+
+  unsigned int align;
+  unsigned int first_order;
+  unsigned int last_order;
+
+  auto_vec<struct store_immediate_info *> stores;
+  /* We record the first and last original statements in the sequence because
+     we'll need their vuse/vdef and replacement position.  It's easier to keep
+     track of them separately as 'stores' is reordered by apply_stores.  */
+  gimple *last_stmt;
+  gimple *first_stmt;
+  unsigned char *val;
+
+  merged_store_group (store_immediate_info *);
+  ~merged_store_group ();
+  void merge_into (store_immediate_info *);
+  void merge_overlapping (store_immediate_info *);
+  bool apply_stores ();
+};
+
+/* Debug helper.  Dump LEN elements of byte array PTR to FD in hex.  */
+
+static void
+dump_char_array (FILE *fd, unsigned char *ptr, unsigned int len)
+{
+  if (!fd)
+    return;
+
+  for (unsigned int i = 0; i < len; i++)
+    fprintf (fd, "%x ", ptr[i]);
+  fprintf (fd, "\n");
+}
+
+/* Write BITLEN bits of EXPR to the byte array PTR at
+   bit position BITPOS.  PTR should contain TOTAL_BYTES elements.
+   Return true if the operation succeeded.  */
+
+static bool
+encode_tree_to_bitpos (tree expr, unsigned char *ptr, int bitlen, int bitpos,
+			unsigned int total_bytes)
+{
+  unsigned int last_byte = (bitpos + bitlen - 1) / BITS_PER_UNIT;
+  unsigned int first_byte = bitpos / BITS_PER_UNIT;
+  tree tmp_int = expr;
+  bool sub_byte_op_p = (bitlen % BITS_PER_UNIT) || (bitpos % BITS_PER_UNIT);
+  /* If the expression is not an integer, encode it into the temporary buffer
+     and read it back as an integer.  */
+  if (TREE_CODE (expr) != INTEGER_CST && sub_byte_op_p)
+    {
+      unsigned char *tmpbuf = XALLOCAVEC (unsigned char, total_bytes);
+      memset (tmpbuf, 0, total_bytes);
+      native_encode_expr (expr, tmpbuf, total_bytes, 0);
+      tree read_back_type
+	= build_nonstandard_integer_type (tree_to_shwi (
+		TYPE_SIZE (TREE_TYPE (expr))), UNSIGNED);
+      tmp_int
+	= native_interpret_expr (read_back_type, tmpbuf, total_bytes);
+    }
+
+  gcc_assert (tmp_int);
+
+  /* If we're inserting a non-byte-sized width or not at a byte boundary
+     use an intermediate wide_int to perform the bit-insertion correctly.  */
+  if (sub_byte_op_p
+      || (TREE_CODE (expr) == INTEGER_CST
+	  && mode_for_size (bitlen, MODE_INT, 0) == BLKmode))
+    {
+      unsigned int byte_size = last_byte - first_byte + 1;
+
+      /* The functions native_encode_expr/native_interpret_expr use the
+	 TYPE_MODE of the type to determine the number of bytes to write/read
+	 so if we want to process a number of bytes that does not have a
+	 TYPE_MODE of equal size we need to use a type that has a valid mode
+	 for it.  */
+
+      machine_mode mode
+	= smallest_mode_for_size (byte_size * BITS_PER_UNIT, MODE_INT);
+      tree dest_int_type
+	= build_nonstandard_integer_type (GET_MODE_BITSIZE (mode), UNSIGNED);
+      byte_size = GET_MODE_SIZE (mode);
+      /* The region from the byte array that we're inserting into.  */
+      tree ptr_wide_int
+	= native_interpret_expr (dest_int_type, ptr + first_byte,
+				 total_bytes);
+
+      gcc_assert (ptr_wide_int);
+      wide_int dest_wide_int
+	= wi::to_wide (ptr_wide_int, TYPE_PRECISION (dest_int_type));
+
+      wide_int expr_wide_int
+	= wi::to_wide (tmp_int, byte_size * BITS_PER_UNIT);
+      if (BYTES_BIG_ENDIAN)
+	{
+	  unsigned int insert_pos
+	    = byte_size * BITS_PER_UNIT - bitlen - (bitpos % BITS_PER_UNIT);
+	  dest_wide_int
+	    = wi::insert (dest_wide_int, expr_wide_int, insert_pos, bitlen);
+	}
+      else
+	dest_wide_int = wi::insert (dest_wide_int, expr_wide_int,
+				    bitpos % BITS_PER_UNIT, bitlen);
+
+      tree res = wide_int_to_tree (dest_int_type, dest_wide_int);
+      if (!native_encode_expr (res, ptr + first_byte, total_bytes, 0))
+	return false;
+
+    }
+  /* If we're inserting a "well-behaved" value at a normal position just
+     call native_encode_expr directly.  */
+  else if (!native_encode_expr (tmp_int, ptr + first_byte,
+				 total_bytes, 0))
+    return false;
+
+  return true;
+}
+
+/* Sorting function for store_immediate_info objects.
+   Sorts them by bitposition.  */
+
+static int
+sort_by_bitpos (const void *x, const void *y)
+{
+  store_immediate_info *const *tmp = (store_immediate_info * const *) x;
+  store_immediate_info *const *tmp2 = (store_immediate_info * const *) y;
+
+  if ((*tmp)->bitpos < (*tmp2)->bitpos)
+    return -1;
+  else if ((*tmp)->bitpos > (*tmp2)->bitpos)
+    return 1;
+
+  /* Overlapping stores can have equal bit positions; return 0 so the
+     comparison remains symmetric for qsort.  */
+  return 0;
+}
+
+/* Sorting function for store_immediate_info objects.
+   Sorts them by the order field.  */
+
+static int
+sort_by_order (const void *x, const void *y)
+{
+  store_immediate_info *const *tmp = (store_immediate_info * const *) x;
+  store_immediate_info *const *tmp2 = (store_immediate_info * const *) y;
+
+  if ((*tmp)->order < (*tmp2)->order)
+    return -1;
+  else if ((*tmp)->order > (*tmp2)->order)
+    return 1;
+
+  gcc_unreachable ();
+}
+
+/* Initialize a merged_store_group object from a store_immediate_info
+   object.  */
+
+merged_store_group::merged_store_group (store_immediate_info *info)
+{
+  start = info->bitpos;
+  width = info->bitsize;
+  /* VAL has memory allocated for it in apply_stores once the group
+     width has been finalized.  */
+  val = NULL;
+  align = get_object_alignment (info->dest);
+  stores.create (1);
+  stores.safe_push (info);
+  last_stmt = info->stmt;
+  last_order = info->order;
+  first_stmt = last_stmt;
+  first_order = last_order;
+  buf_size = 0;
+}
+
+merged_store_group::~merged_store_group ()
+{
+  if (val)
+    XDELETEVEC (val);
+}
+
+/* Merge a store recorded by INFO into this merged store.
+   The store is not overlapping with the existing recorded
+   stores.  */
+
+void
+merged_store_group::merge_into (store_immediate_info *info)
+{
+  unsigned HOST_WIDE_INT wid = info->bitsize;
+  /* Make sure we're inserting at the position we think we are.  */
+  gcc_assert (info->bitpos == start + width);
+
+  width += wid;
+  gimple *stmt = info->stmt;
+  stores.safe_push (info);
+  if (info->order > last_order)
+    {
+      last_order = info->order;
+      last_stmt = stmt;
+    }
+  else if (info->order < first_order)
+    {
+      first_order = info->order;
+      first_stmt = stmt;
+    }
+}
+
+/* Merge a store described by INFO into this merged store.
+   INFO overlaps in some way with the current store (i.e. it's not contiguous
+   which is handled by merged_store_group::merge_into).  */
+
+void
+merged_store_group::merge_overlapping (store_immediate_info *info)
+{
+  gimple *stmt = info->stmt;
+  stores.safe_push (info);
+
+  /* If the store extends the size of the group, extend the width.  */
+  if ((info->bitpos + info->bitsize) > (start + width))
+    width += info->bitpos + info->bitsize - (start + width);
+
+  if (info->order > last_order)
+    {
+      last_order = info->order;
+      last_stmt = stmt;
+    }
+  else if (info->order < first_order)
+    {
+      first_order = info->order;
+      first_stmt = stmt;
+    }
+}
+
+/* Go through all the recorded stores in this group in program order and
+   apply their values to the VAL byte array to create the final merged
+   value.  Return true if the operation succeeded.  */
+
+bool
+merged_store_group::apply_stores ()
+{
+  /* The total width of the stores must add up to a whole number of bytes
+     and start at a byte boundary.  We don't support emitting bitfield
+     references for now.  Also, make sure we have more than one store
+     in the group, otherwise we cannot merge anything.  */
+  if (width % BITS_PER_UNIT != 0
+      || start % BITS_PER_UNIT != 0
+      || stores.length () == 1)
+    return false;
+
+  stores.qsort (sort_by_order);
+  struct store_immediate_info *info;
+  unsigned int i;
+  /* Create a buffer of a size that is 2 times the number of bytes we're
+     storing.  That way native_encode_expr can write power-of-2-sized
+     chunks without overrunning.  */
+  buf_size
+    = 2 * (ROUND_UP (width, BITS_PER_UNIT) / BITS_PER_UNIT);
+  val = XCNEWVEC (unsigned char, buf_size);
+
+  FOR_EACH_VEC_ELT (stores, i, info)
+    {
+      unsigned int pos_in_buffer = info->bitpos - start;
+      bool ret = encode_tree_to_bitpos (info->val, val, info->bitsize,
+					 pos_in_buffer, buf_size);
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  if (ret)
+	    {
+	      fprintf (dump_file, "After writing ");
+	      print_generic_expr (dump_file, info->val, 0);
+	      fprintf (dump_file, " of size " HOST_WIDE_INT_PRINT_DEC
+			" at position %d the merged region contains:\n",
+			info->bitsize, pos_in_buffer);
+	      dump_char_array (dump_file, val, buf_size);
+	    }
+	  else
+	    fprintf (dump_file, "Failed to merge stores\n");
+        }
+      if (!ret)
+	return false;
+    }
+  return true;
+}
+
+/* Structure describing the store chain.  */
+
+struct imm_store_chain_info
+{
+  auto_vec<struct store_immediate_info *> m_store_info;
+  auto_vec<merged_store_group *> m_merged_store_groups;
+
+  bool terminate_and_process_chain (tree);
+  bool coalesce_immediate_stores ();
+  bool output_merged_store (tree, merged_store_group *);
+  bool output_merged_stores (tree);
+};
+
+const pass_data pass_data_tree_store_merging = {
+  GIMPLE_PASS,     /* type */
+  "store-merging", /* name */
+  OPTGROUP_NONE,   /* optinfo_flags */
+  TV_NONE,	 /* tv_id */
+  PROP_ssa,	/* properties_required */
+  0,		   /* properties_provided */
+  0,		   /* properties_destroyed */
+  0,		   /* todo_flags_start */
+  TODO_update_ssa, /* todo_flags_finish */
+};
+
+class pass_store_merging : public gimple_opt_pass
+{
+public:
+  pass_store_merging (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_tree_store_merging, ctxt)
+  {
+  }
+
+  /* Pass not supported for PDP-endianness.  */
+  virtual bool
+  gate (function *)
+  {
+    return flag_store_merging && (WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN);
+  }
+
+  virtual unsigned int execute (function *);
+
+private:
+  hash_map<tree_operand_hash, struct imm_store_chain_info *> m_stores;
+
+  bool terminate_and_process_all_chains ();
+  bool terminate_all_aliasing_chains (tree, tree, gimple *);
+  bool terminate_and_release_chain (tree);
+}; // class pass_store_merging
+
+/* Terminate and process all recorded chains.  Return true if any changes
+   were made.  */
+
+bool
+pass_store_merging::terminate_and_process_all_chains ()
+{
+  hash_map<tree_operand_hash, struct imm_store_chain_info *>::iterator iter
+    = m_stores.begin ();
+  bool ret = false;
+  for (; iter != m_stores.end (); ++iter)
+    ret |= terminate_and_release_chain ((*iter).first);
+
+  return ret;
+}
+
+/* Terminate all chains that are affected by the assignment to DEST, which
+   appears in statement STMT and ultimately points to the object BASE.
+   Return true if at least one aliasing chain was terminated.  BASE and DEST
+   are allowed to be NULL_TREE.  In that case the aliasing checks are
+   performed on the whole statement rather than a particular operand in it.  */
+
+bool
+pass_store_merging::terminate_all_aliasing_chains (tree dest, tree base,
+						   gimple *stmt)
+{
+  bool ret = false;
+
+  /* If the statement doesn't touch memory it can't alias.  */
+  if (!gimple_vuse (stmt))
+    return false;
+
+  struct imm_store_chain_info **chain_info = NULL;
+
+  /* Check if the assignment destination (BASE) is part of a store chain.
+     This is to catch non-constant stores to destinations that may be part
+     of a chain.  */
+  if (base)
+    {
+      chain_info = m_stores.get (base);
+      if (chain_info)
+	{
+	  struct store_immediate_info *info;
+	  unsigned int i;
+	  FOR_EACH_VEC_ELT ((*chain_info)->m_store_info, i, info)
+	    {
+	      if (refs_may_alias_p (info->dest, dest))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    {
+		      fprintf (dump_file, "stmt causes chain termination:\n");
+		      print_gimple_stmt (dump_file, stmt, 0, 0);
+		    }
+		  terminate_and_release_chain (base);
+		  ret = true;
+		  break;
+		}
+	    }
+	}
+    }
+
+  hash_map<tree_operand_hash, struct imm_store_chain_info *>::iterator iter
+    = m_stores.begin ();
+
+  /* Check for aliasing with all other store chains.  */
+  for (; iter != m_stores.end (); ++iter)
+    {
+      /* We already checked all the stores in chain_info and terminated the
+	 chain if necessary.  Skip it here.  */
+      if (chain_info && (*chain_info) == (*iter).second)
+	continue;
+
+      tree key = (*iter).first;
+      if (ref_maybe_used_by_stmt_p (stmt, key)
+	  || stmt_may_clobber_ref_p (stmt, key))
+	{
+	  terminate_and_release_chain (key);
+	  ret = true;
+	}
+    }
+
+  return ret;
+}
+
+/* Helper function.  Terminate the recorded chain storing to base object
+   BASE.  Return true if the merging and output was successful.  The m_stores
+   entry is removed after the processing in any case.  */
+
+bool
+pass_store_merging::terminate_and_release_chain (tree base)
+{
+  struct imm_store_chain_info **chain_info = m_stores.get (base);
+
+  if (!chain_info)
+    return false;
+
+  gcc_assert (*chain_info);
+
+  bool ret = (*chain_info)->terminate_and_process_chain (base);
+  delete *chain_info;
+  m_stores.remove (base);
+
+  return ret;
+}
+
+/* Go through the candidate stores recorded in m_store_info and merge them
+   into merged_store_group objects recorded into m_merged_store_groups
+   representing the widened stores.  Return true if coalescing was successful
+   and the number of widened stores is fewer than the original number
+   of stores.  */
+
+bool
+imm_store_chain_info::coalesce_immediate_stores ()
+{
+  /* Anything less can't be processed.  */
+  if (m_store_info.length () < 2)
+    return false;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "Attempting to coalesce %u stores in chain.\n",
+	     m_store_info.length ());
+
+  store_immediate_info *info;
+  unsigned int i;
+
+  /* Order the stores by the bitposition they write to.  */
+  m_store_info.qsort (sort_by_bitpos);
+
+  info = m_store_info[0];
+  merged_store_group *merged_store = new merged_store_group (info);
+
+  FOR_EACH_VEC_ELT (m_store_info, i, info)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Store %u:\nbitsize:" HOST_WIDE_INT_PRINT_DEC
+			      " bitpos:" HOST_WIDE_INT_PRINT_DEC " val:\n",
+		   i, info->bitsize, info->bitpos);
+	  print_generic_expr (dump_file, info->val, 0);
+	  fprintf (dump_file, "\n------------\n");
+	}
+
+      if (i == 0)
+	continue;
+
+      /* |---store 1---|
+	       |---store 2---|
+       Overlapping stores.  */
+      unsigned HOST_WIDE_INT start = info->bitpos;
+      if (IN_RANGE (start, merged_store->start,
+		    merged_store->start + merged_store->width - 1))
+	{
+	  merged_store->merge_overlapping (info);
+	  continue;
+	}
+
+      /* |---store 1---| <gap> |---store 2---|.
+	 Gap between stores.  Start a new group.  */
+      if (start != merged_store->start + merged_store->width)
+	{
+	  /* Try to apply all the stores recorded for the group to determine
+	     the bitpattern they write and discard it if that fails.
+	     This will also reject single-store groups.  */
+	  if (!merged_store->apply_stores ())
+	    delete merged_store;
+	  else
+	    m_merged_store_groups.safe_push (merged_store);
+
+	  merged_store = new merged_store_group (info);
+
+	  continue;
+	}
+
+      /* |---store 1---||---store 2---|
+	 This store is consecutive to the previous one.
+	 Merge it into the current store group.  */
+       merged_store->merge_into (info);
+    }
+
+    /* Record or discard the last store group.  */
+    if (!merged_store->apply_stores ())
+      delete merged_store;
+    else
+      m_merged_store_groups.safe_push (merged_store);
+
+  gcc_assert (m_merged_store_groups.length () <= m_store_info.length ());
+  bool success
+    = !m_merged_store_groups.is_empty ()
+      && m_merged_store_groups.length () < m_store_info.length ();
+
+  if (success && dump_file)
+    fprintf (dump_file, "Coalescing successful!\n"
+			 "Merged into %u stores\n",
+		m_merged_store_groups.length ());
+
+  return success;
+}
+
+/* Return the type to use for the merged stores described by STMTS.
+   This is needed to get the alias sets right.  */
+
+static tree
+get_alias_type_for_stmts (auto_vec<gimple *> &stmts)
+{
+  gimple *stmt;
+  unsigned int i;
+  tree lhs = gimple_assign_lhs (stmts[0]);
+  tree type = reference_alias_ptr_type (lhs);
+
+  FOR_EACH_VEC_ELT (stmts, i, stmt)
+    {
+      if (i == 0)
+	continue;
+
+      lhs = gimple_assign_lhs (stmt);
+      tree type1 = reference_alias_ptr_type (lhs);
+      if (!alias_ptr_types_compatible_p (type, type1))
+	return ptr_type_node;
+    }
+  return type;
+}
+
+/* Return the location_t information we can find among the statements
+   in STMTS.  */
+
+static location_t
+get_location_for_stmts (auto_vec<gimple *> &stmts)
+{
+  gimple *stmt;
+  unsigned int i;
+
+  FOR_EACH_VEC_ELT (stmts, i, stmt)
+    if (gimple_has_location (stmt))
+      return gimple_location (stmt);
+
+  return UNKNOWN_LOCATION;
+}
+
+/* Used to describe a store resulting from splitting a wide store into
+   smaller regularly-sized stores in split_group.  */
+
+struct split_store
+{
+  unsigned HOST_WIDE_INT bytepos;
+  unsigned HOST_WIDE_INT size;
+  unsigned HOST_WIDE_INT align;
+  auto_vec<gimple *> orig_stmts;
+  split_store (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
+	       unsigned HOST_WIDE_INT);
+};
+
+/* Simple constructor.  */
+
+split_store::split_store (unsigned HOST_WIDE_INT bp,
+			  unsigned HOST_WIDE_INT sz,
+			  unsigned HOST_WIDE_INT al)
+			  : bytepos (bp), size (sz), align (al)
+{
+  orig_stmts.create (0);
+}
+
+/* Record all statements corresponding to stores in GROUP that write to
+   the region starting at BITPOS and of size BITSIZE.  Record such
+   statements in STMTS.  The stores in GROUP must be sorted by
+   bitposition.  */
+
+static void
+find_constituent_stmts (struct merged_store_group *group,
+			 auto_vec<gimple *> &stmts,
+			 unsigned HOST_WIDE_INT bitpos,
+			 unsigned HOST_WIDE_INT bitsize)
+{
+  struct store_immediate_info *info;
+  unsigned int i;
+  unsigned HOST_WIDE_INT end = bitpos + bitsize;
+  FOR_EACH_VEC_ELT (group->stores, i, info)
+    {
+      unsigned HOST_WIDE_INT stmt_start = info->bitpos;
+      unsigned HOST_WIDE_INT stmt_end = stmt_start + info->bitsize;
+      if (stmt_end < bitpos)
+	continue;
+      /* The stores in GROUP are ordered by bitposition so if we're past
+	  the region of interest, return early.  */
+      if (stmt_start > end)
+	return;
+
+      if (IN_RANGE (stmt_start, bitpos, bitpos + bitsize)
+	  || IN_RANGE (stmt_end, bitpos, end)
+	  /* The statement writes a region that completely encloses the region
+	     that this group writes.  Unlikely to occur but let's
+	     handle it.  */
+	  || IN_RANGE (bitpos, stmt_start, stmt_end))
+	stmts.safe_push (info->stmt);
+    }
+}
+
+/* Split a merged store described by GROUP by populating the SPLIT_STORES
+   vector with split_store structs describing the byte offset (from the base),
+   the bit size and alignment of each store as well as the original statements
+   involved in each such split group.
+   This is to separate the splitting strategy from the statement
+   building/emission/linking done in output_merged_store.
+   At the moment just start with the widest possible size and keep emitting
+   the widest we can until we have emitted all the bytes, halving the size
+   when appropriate.  */
+
+static bool
+split_group (merged_store_group *group,
+	     auto_vec<struct split_store *> &split_stores)
+{
+  unsigned HOST_WIDE_INT pos = group->start;
+  unsigned HOST_WIDE_INT size = group->width;
+  unsigned HOST_WIDE_INT bytepos = pos / BITS_PER_UNIT;
+  unsigned HOST_WIDE_INT align = group->align;
+
+  /* We don't handle partial bitfields for now.  We shouldn't have
+     reached this far.  */
+  gcc_assert ((size % BITS_PER_UNIT == 0) && (pos % BITS_PER_UNIT == 0));
+
+  bool allow_unaligned
+    = !STRICT_ALIGNMENT && PARAM_VALUE (PARAM_STORE_MERGING_ALLOW_UNALIGNED);
+
+  unsigned int try_size = MAX_STORE_BITSIZE;
+  while (try_size > size
+	 || (!allow_unaligned
+	     && try_size > align))
+    {
+      try_size /= 2;
+      if (try_size < BITS_PER_UNIT)
+	return false;
+    }
+
+  unsigned HOST_WIDE_INT try_pos = bytepos;
+  group->stores.qsort (sort_by_bitpos);
+
+  while (size > 0)
+    {
+      struct split_store *store = new split_store (try_pos, try_size, align);
+      unsigned HOST_WIDE_INT try_bitpos = try_pos * BITS_PER_UNIT;
+      find_constituent_stmts (group, store->orig_stmts, try_bitpos, try_size);
+      split_stores.safe_push (store);
+
+      try_pos += try_size / BITS_PER_UNIT;
+
+      size -= try_size;
+      align = try_size;
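+      /* The next chunk starts right after a store of TRY_SIZE bits, so it
+	 is aligned to at least TRY_SIZE; halve the candidate size until it
+	 fits in the remaining SIZE bits.  */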
+      while (size < try_size)
+	try_size /= 2;
+    }
+  return true;
+}
+
+/* Given a merged store group GROUP output the widened version of it.
+   The store chain is against the base object BASE.
+   Try store sizes of at most MAX_STORE_BITSIZE bits wide and don't output
+   unaligned stores for STRICT_ALIGNMENT targets or if it's too expensive.
+   Make sure that the number of statements output is less than the number of
+   original statements.  If a better sequence is possible emit it and
+   return true.  */
+
+bool
+imm_store_chain_info::output_merged_store (tree base, merged_store_group *group)
+{
+  unsigned HOST_WIDE_INT start_byte_pos = group->start / BITS_PER_UNIT;
+
+  unsigned int orig_num_stmts = group->stores.length ();
+  if (orig_num_stmts < 2)
+    return false;
+
+  auto_vec<struct split_store *> split_stores;
+  split_stores.create (0);
+  if (!split_group (group, split_stores))
+    return false;
+
+  gimple_stmt_iterator last_gsi = gsi_for_stmt (group->last_stmt);
+  gimple_seq seq = NULL;
+  unsigned int num_stmts = 0;
+  tree last_vdef, new_vuse;
+  last_vdef = gimple_vdef (group->last_stmt);
+  new_vuse = gimple_vuse (group->last_stmt);
+
+  gimple *stmt = NULL;
+  /* The new SSA names created.  Keep track of them so that we can free them
+     if we decide to not use the new sequence.  */
+  auto_vec<tree> new_ssa_names;
+  split_store *split_store;
+  unsigned int i;
+  bool fail = false;
+
+  FOR_EACH_VEC_ELT (split_stores, i, split_store)
+    {
+      unsigned HOST_WIDE_INT try_size = split_store->size;
+      unsigned HOST_WIDE_INT try_pos = split_store->bytepos;
+      unsigned HOST_WIDE_INT align = split_store->align;
+      tree offset_type = get_alias_type_for_stmts (split_store->orig_stmts);
+      location_t loc = get_location_for_stmts (split_store->orig_stmts);
+
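+      /* Build an unsigned integer type of TRY_SIZE bits for the new store
+	 and record the destination's known alignment on it.  */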
+      tree int_type = build_nonstandard_integer_type (try_size, UNSIGNED);
+      SET_TYPE_ALIGN (int_type, align);
+      tree addr = build_fold_addr_expr (base);
+      tree dest = fold_build2 (MEM_REF, int_type, addr,
+			       build_int_cst (offset_type, try_pos));
+
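+      /* Extract the bytes this split store should write from the merged
+	 byte buffer and reinterpret them as a constant of that type.  */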
+      tree src = native_interpret_expr (int_type,
+					group->val + try_pos - start_byte_pos,
+					group->buf_size);
+
+      stmt = gimple_build_assign (dest, src);
+      gimple_set_location (stmt, loc);
+      gimple_set_vuse (stmt, new_vuse);
+      gimple_seq_add_stmt_without_update (&seq, stmt);
+
+      /* We didn't manage to reduce the number of statements.  Bail out.  */
+      if (++num_stmts == orig_num_stmts)
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    {
+	      fprintf (dump_file, "Exceeded original number of stmts (%u)."
+				  "  Not profitable to emit new sequence.\n",
+		       orig_num_stmts);
+	    }
+	  unsigned int ssa_count;
+	  tree ssa_name;
+	  /* Don't forget to cleanup the temporary SSA names.  */
+	  FOR_EACH_VEC_ELT (new_ssa_names, ssa_count, ssa_name)
+	    release_ssa_name (ssa_name);
+
+	  fail = true;
+	  break;
+	}
+
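+      /* Chain the virtual operands: each new store defines a fresh VDEF
+	 that the next one uses, and the last store takes over the VDEF of
+	 the original last statement.  */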
+      tree new_vdef;
+      if (i < split_stores.length () - 1)
+	{
+	  new_vdef = make_ssa_name (gimple_vop (cfun), stmt);
+	  new_ssa_names.safe_push (new_vdef);
+	}
+      else
+	new_vdef = last_vdef;
+
+      gimple_set_vdef (stmt, new_vdef);
+      SSA_NAME_DEF_STMT (new_vdef) = stmt;
+      new_vuse = new_vdef;
+    }
+
+  FOR_EACH_VEC_ELT (split_stores, i, split_store)
+    delete split_store;
+
+  if (fail)
+    return false;
+
+  gcc_assert (seq);
+  if (dump_file)
+    {
+      fprintf (dump_file,
+	       "New sequence of %u stmts to replace old one of %u stmts\n",
+	       num_stmts, orig_num_stmts);
+      if (dump_flags & TDF_DETAILS)
+	print_gimple_seq (dump_file, seq, 0, TDF_VOPS | TDF_MEMSYMS);
+    }
+  gsi_insert_seq_after (&last_gsi, seq, GSI_SAME_STMT);
+
+  return true;
+}
+
+/* Process the merged_store_group objects created in the coalescing phase.
+   The stores are all against the base object BASE.
+   Try to output the widened stores and delete the original statements if
+   successful.  Return true iff any changes were made.  */
+
+bool
+imm_store_chain_info::output_merged_stores (tree base)
+{
+  unsigned int i;
+  merged_store_group *merged_store;
+  bool ret = false;
+  FOR_EACH_VEC_ELT (m_merged_store_groups, i, merged_store)
+    {
+      if (output_merged_store (base, merged_store))
+	{
+	  unsigned int j;
+	  store_immediate_info *store;
+	  FOR_EACH_VEC_ELT (merged_store->stores, j, store)
+	    {
+	      gimple *stmt = store->stmt;
+	      gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+	      gsi_remove (&gsi, true);
+	      if (stmt != merged_store->last_stmt)
+		{
+		  unlink_stmt_vdef (stmt);
+		  release_defs (stmt);
+		}
+	    }
+	  ret = true;
+	}
+    }
+  if (ret && dump_file)
+    fprintf (dump_file, "Merging successful!\n");
+
+  return ret;
+}
+
+/* Coalesce the store_immediate_info objects recorded against the base object
+   BASE in the first phase and output them.
+   Delete the allocated structures.
+   Return true if any changes were made.  */
+
+bool
+imm_store_chain_info::terminate_and_process_chain (tree base)
+{
+  /* Process store chain.  */
+  bool ret = false;
+  if (m_store_info.length () > 1)
+    {
+      ret = coalesce_immediate_stores ();
+      if (ret)
+	ret = output_merged_stores (base);
+    }
+
+  /* Delete all the entries we allocated ourselves.  */
+  store_immediate_info *info;
+  unsigned int i;
+  FOR_EACH_VEC_ELT (m_store_info, i, info)
+    delete info;
+
+  merged_store_group *merged_info;
+  FOR_EACH_VEC_ELT (m_merged_store_groups, i, merged_info)
+    delete merged_info;
+
+  return ret;
+}
+
+/* Return true iff LHS is a destination potentially interesting for
+   store merging.  In practice these are the codes that get_inner_reference
+   can process.  */
+
+static bool
+lhs_valid_for_store_merging_p (tree lhs)
+{
+  tree_code code = TREE_CODE (lhs);
+
+  if (code == ARRAY_REF || code == ARRAY_RANGE_REF || code == MEM_REF
+      || code == COMPONENT_REF || code == BIT_FIELD_REF)
+    return true;
+
+  return false;
+}
+
+/* Return true if the tree RHS is a constant we want to consider
+   during store merging.  In practice accept all codes that
+   native_encode_expr accepts.  */
+
+static bool
+rhs_valid_for_store_merging_p (tree rhs)
+{
+  tree type = TREE_TYPE (rhs);
+  if (TREE_CODE_CLASS (TREE_CODE (rhs)) != tcc_constant
+      || !can_native_encode_type_p (type))
+    return false;
+
+  return true;
+}
+
+/* Entry point for the pass.  Go over each basic block recording chains of
+   immediate stores.  Upon encountering a statement that interferes with
+   the recorded stores (as detected by terminate_all_aliasing_chains)
+   process them and emit the widened variants.  */
+
+unsigned int
+pass_store_merging::execute (function *fun)
+{
+  basic_block bb;
+  hash_set<gimple *> orig_stmts;
+
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      gimple_stmt_iterator gsi;
+      unsigned HOST_WIDE_INT num_statements = 0;
+      /* Skip basic blocks with fewer than two non-debug statements:
+	 there is nothing to merge in them.  */
+      for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  if (is_gimple_debug (gsi_stmt (gsi)))
+	    continue;
+
+	  if (++num_statements > 2)
+	    break;
+	}
+
+      if (num_statements < 2)
+	continue;
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "Processing basic block <%d>:\n", bb->index);
+
+      for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+
+	  if (gimple_has_volatile_ops (stmt))
+	    {
+	      /* Terminate all chains.  */
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "Volatile access terminates "
+				    "all chains\n");
+	      terminate_and_process_all_chains ();
+	      continue;
+	    }
+
+	  if (is_gimple_debug (stmt))
+	    continue;
+
+	  if (gimple_assign_single_p (stmt) && gimple_vdef (stmt)
+	      && !stmt_can_throw_internal (stmt)
+	      && lhs_valid_for_store_merging_p (gimple_assign_lhs (stmt)))
+	    {
+	      tree lhs = gimple_assign_lhs (stmt);
+	      tree rhs = gimple_assign_rhs1 (stmt);
+
+	      HOST_WIDE_INT bitsize, bitpos;
+	      machine_mode mode;
+	      int unsignedp = 0, reversep = 0, volatilep = 0;
+	      tree offset, base_addr;
+	      base_addr
+		= get_inner_reference (lhs, &bitsize, &bitpos, &offset, &mode,
+				       &unsignedp, &reversep, &volatilep);
+	      /* As a future enhancement we could handle stores with the same
+		 base and offset.  */
+	      bool invalid = offset || reversep
+			     || ((bitsize > MAX_BITSIZE_MODE_ANY_INT)
+				  && (TREE_CODE (rhs) != INTEGER_CST))
+			     || !rhs_valid_for_store_merging_p (rhs)
+		/* An access may not be volatile itself but base_addr may be
+		   a volatile decl i.e. MEM[&volatile-decl].  The hashing for
+		   tree_operand_hash won't consider such stores equal to each
+		   other so we can't track chains on them.  */
+			     || TREE_THIS_VOLATILE (base_addr);
+
+	      /* In some cases get_inner_reference may return a
+		 MEM_REF [ptr + byteoffset].  For the purposes of this pass
+		 canonicalize the base_addr to MEM_REF [ptr] and take
+		 byteoffset into account in the bitpos.  This occurs in
+		 PR 23684 and this way we can catch more chains.  */
+	      if (TREE_CODE (base_addr) == MEM_REF
+		  && POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND (base_addr, 0))))
+		{
+		  offset_int bit_off, byte_off = mem_ref_offset (base_addr);
+		  bit_off = byte_off << LOG2_BITS_PER_UNIT;
+		  if (!wi::neg_p (bit_off) && wi::fits_shwi_p (bit_off))
+		    {
+		      bitpos += bit_off.to_shwi ();
+
+		      base_addr = copy_node (base_addr);
+		      TREE_OPERAND (base_addr, 1)
+			= build_zero_cst (TREE_TYPE (TREE_OPERAND (
+						       base_addr, 1)));
+		    }
+		  else
+		    invalid = true;
+		}
+
+	      struct imm_store_chain_info **chain_info
+		= m_stores.get (base_addr);
+
+	      if (!invalid)
+		{
+		  store_immediate_info *info;
+		  if (chain_info)
+		    {
+		      info = new store_immediate_info (
+			bitsize, bitpos, rhs, lhs, stmt,
+			(*chain_info)->m_store_info.length ());
+		      if (dump_file && (dump_flags & TDF_DETAILS))
+			{
+			  fprintf (dump_file,
+				   "Recording immediate store from stmt:\n");
+			  print_gimple_stmt (dump_file, stmt, 0, 0);
+			}
+		      (*chain_info)->m_store_info.safe_push (info);
+		      continue;
+		    }
+
+		  /* Store aliases any existing chain?  */
+		  terminate_all_aliasing_chains (lhs, base_addr, stmt);
+		  /* Start a new chain.  */
+		  struct imm_store_chain_info *new_chain
+		    = new imm_store_chain_info;
+		  info = new store_immediate_info (bitsize, bitpos, rhs, lhs,
+						   stmt, 0);
+		  new_chain->m_store_info.safe_push (info);
+		  m_stores.put (base_addr, new_chain);
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    {
+		      fprintf (dump_file,
+			       "Starting new chain with statement:\n");
+		      print_gimple_stmt (dump_file, stmt, 0, 0);
+		      fprintf (dump_file, "The base object is:\n");
+		      print_generic_expr (dump_file, base_addr, 0);
+		      fprintf (dump_file, "\n");
+		    }
+		}
+	      else
+		terminate_all_aliasing_chains (lhs, base_addr, stmt);
+
+	      continue;
+	    }
+
+	  terminate_all_aliasing_chains (NULL_TREE, NULL_TREE, stmt);
+	}
+      terminate_and_process_all_chains ();
+    }
+  return 0;
+}
+
+} // anon namespace
+
+/* Construct and return a store merging pass object.  */
+
+gimple_opt_pass *
+make_pass_store_merging (gcc::context *ctxt)
+{
+  return new pass_store_merging (ctxt);
+}
diff --git a/gcc/opts.c b/gcc/opts.c
index 45f1f89cd16cbc1fc061e50f62fc8a1cffca5075..1cf474bee1839b406f1ce6feee5f784cc2ca9057 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -522,6 +522,7 @@ static const struct default_options default_options_table[] =
     { OPT_LEVELS_2_PLUS, OPT_fisolate_erroneous_paths_dereference, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_fipa_ra, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_flra_remat, NULL, 1 },
+    { OPT_LEVELS_2_PLUS, OPT_fstore_merging, NULL, 1 },
 
     /* -O3 optimizations.  */
     { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
diff --git a/gcc/params.def b/gcc/params.def
index 8907aa4a0ffa5619b2b4022faa81db4fad89ce52..e63e5948089e1a6569cce386292d0d0890612256 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1100,6 +1100,12 @@ DEFPARAM (PARAM_MAX_TAIL_MERGE_COMPARISONS,
           "Maximum amount of similar bbs to compare a bb with.",
           10, 0, 0)
 
+DEFPARAM (PARAM_STORE_MERGING_ALLOW_UNALIGNED,
+	  "store-merging-allow-unaligned",
+	  "Allow the store merging pass to introduce unaligned stores "
+	  "if it is legal to do so",
+	  1, 0, 1)
+
 DEFPARAM (PARAM_MAX_TAIL_MERGE_ITERATIONS,
           "max-tail-merge-iterations",
           "Maximum amount of iterations of the pass over a function.",
diff --git a/gcc/passes.def b/gcc/passes.def
index 2830421a277a25b92506ce09e8b21067b6e1a3e2..ee7dd5032730e8741a379b1ed3a2cfa7e5ab12ef 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -329,6 +329,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_phiopt);
       NEXT_PASS (pass_fold_builtins);
       NEXT_PASS (pass_optimize_widening_mul);
+      NEXT_PASS (pass_store_merging);
       NEXT_PASS (pass_tail_calls);
       /* If DCE is not run before checking for uninitialized uses,
 	 we may get false warnings (e.g., testsuite/gcc.dg/uninit-5.c).
diff --git a/gcc/testsuite/g++.dg/init/new17.C b/gcc/testsuite/g++.dg/init/new17.C
index a7b1659b97393b11c5ef043ca355808bfa0e22ce..f6a3231f9aa7eb8abdf061b50073176bc0cd37c3 100644
--- a/gcc/testsuite/g++.dg/init/new17.C
+++ b/gcc/testsuite/g++.dg/init/new17.C
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-O2 -fstrict-aliasing -fdump-tree-optimized" }
+// { dg-options "-O2 -fstrict-aliasing -fno-store-merging -fdump-tree-optimized" }
 
 // Test that placement new does not introduce an unnecessary memory
 // barrier.
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr22141-1.c b/gcc/testsuite/gcc.c-torture/execute/pr22141-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..7c888b469cf39f00ced8ddb8cc6dc245fa30b97b
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr22141-1.c
@@ -0,0 +1,122 @@
+/* PR middle-end/22141 */
+
+extern void abort (void);
+
+struct S
+{
+  struct T
+    {
+      char a;
+      char b;
+      char c;
+      char d;
+    } t;
+} u;
+
+struct U
+{
+  struct S s[4];
+};
+
+void __attribute__((noinline))
+c1 (struct T *p)
+{
+  if (p->a != 1 || p->b != 2 || p->c != 3 || p->d != 4)
+    abort ();
+  __builtin_memset (p, 0xaa, sizeof (*p));
+}
+
+void __attribute__((noinline))
+c2 (struct S *p)
+{
+  c1 (&p->t);
+}
+
+void __attribute__((noinline))
+c3 (struct U *p)
+{
+  c2 (&p->s[2]);
+}
+
+void __attribute__((noinline))
+f1 (void)
+{
+  u = (struct S) { { 1, 2, 3, 4 } };
+}
+
+void __attribute__((noinline))
+f2 (void)
+{
+  u.t.a = 1;
+  u.t.b = 2;
+  u.t.c = 3;
+  u.t.d = 4;
+}
+
+void __attribute__((noinline))
+f3 (void)
+{
+  u.t.d = 4;
+  u.t.b = 2;
+  u.t.a = 1;
+  u.t.c = 3;
+}
+
+void __attribute__((noinline))
+f4 (void)
+{
+  struct S v;
+  v.t.a = 1;
+  v.t.b = 2;
+  v.t.c = 3;
+  v.t.d = 4;
+  c2 (&v);
+}
+
+void __attribute__((noinline))
+f5 (struct S *p)
+{
+  p->t.a = 1;
+  p->t.c = 3;
+  p->t.d = 4;
+  p->t.b = 2;
+}
+
+void __attribute__((noinline))
+f6 (void)
+{
+  struct U v;
+  v.s[2].t.a = 1;
+  v.s[2].t.b = 2;
+  v.s[2].t.c = 3;
+  v.s[2].t.d = 4;
+  c3 (&v);
+}
+
+void __attribute__((noinline))
+f7 (struct U *p)
+{
+  p->s[2].t.a = 1;
+  p->s[2].t.c = 3;
+  p->s[2].t.d = 4;
+  p->s[2].t.b = 2;
+}
+
+int
+main (void)
+{
+  struct U w;
+  f1 ();
+  c2 (&u);
+  f2 ();
+  c1 (&u.t);
+  f3 ();
+  c2 (&u);
+  f4 ();
+  f5 (&u);
+  c2 (&u);
+  f6 ();
+  f7 (&w);
+  c3 (&w);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr22141-2.c b/gcc/testsuite/gcc.c-torture/execute/pr22141-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..cb9cc79026310260ffc3a83bfdf9bfc92f998a86
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr22141-2.c
@@ -0,0 +1,122 @@
+/* PR middle-end/22141 */
+
+extern void abort (void);
+
+struct S
+{
+  struct T
+    {
+      char a;
+      char b;
+      char c;
+      char d;
+    } t;
+} u __attribute__((aligned));
+
+struct U
+{
+  struct S s[4];
+};
+
+void __attribute__((noinline))
+c1 (struct T *p)
+{
+  if (p->a != 1 || p->b != 2 || p->c != 3 || p->d != 4)
+    abort ();
+  __builtin_memset (p, 0xaa, sizeof (*p));
+}
+
+void __attribute__((noinline))
+c2 (struct S *p)
+{
+  c1 (&p->t);
+}
+
+void __attribute__((noinline))
+c3 (struct U *p)
+{
+  c2 (&p->s[2]);
+}
+
+void __attribute__((noinline))
+f1 (void)
+{
+  u = (struct S) { { 1, 2, 3, 4 } };
+}
+
+void __attribute__((noinline))
+f2 (void)
+{
+  u.t.a = 1;
+  u.t.b = 2;
+  u.t.c = 3;
+  u.t.d = 4;
+}
+
+void __attribute__((noinline))
+f3 (void)
+{
+  u.t.d = 4;
+  u.t.b = 2;
+  u.t.a = 1;
+  u.t.c = 3;
+}
+
+void __attribute__((noinline))
+f4 (void)
+{
+  struct S v __attribute__((aligned));
+  v.t.a = 1;
+  v.t.b = 2;
+  v.t.c = 3;
+  v.t.d = 4;
+  c2 (&v);
+}
+
+void __attribute__((noinline))
+f5 (struct S *p)
+{
+  p->t.a = 1;
+  p->t.c = 3;
+  p->t.d = 4;
+  p->t.b = 2;
+}
+
+void __attribute__((noinline))
+f6 (void)
+{
+  struct U v __attribute__((aligned));
+  v.s[2].t.a = 1;
+  v.s[2].t.b = 2;
+  v.s[2].t.c = 3;
+  v.s[2].t.d = 4;
+  c3 (&v);
+}
+
+void __attribute__((noinline))
+f7 (struct U *p)
+{
+  p->s[2].t.a = 1;
+  p->s[2].t.c = 3;
+  p->s[2].t.d = 4;
+  p->s[2].t.b = 2;
+}
+
+int
+main (void)
+{
+  struct U w __attribute__((aligned));
+  f1 ();
+  c2 (&u);
+  f2 ();
+  c1 (&u.t);
+  f3 ();
+  c2 (&u);
+  f4 ();
+  f5 (&u);
+  c2 (&u);
+  f6 ();
+  f7 (&w);
+  c3 (&w);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/store_merging_1.c b/gcc/testsuite/gcc.dg/store_merging_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..35f4d82e6b22a231f1d7c6b3688a4bbcb57d2510
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/store_merging_1.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target non_strict_align } */
+/* { dg-options "-O2 -fdump-tree-store-merging" } */
+
+struct bar {
+  int a;
+  char b;
+  char c;
+  char d;
+  char e;
+  char f;
+  char g;
+};
+
+void
+foo1 (struct bar *p)
+{
+  p->b = 0;
+  p->a = 0;
+  p->c = 0;
+  p->d = 0;
+  p->e = 0;
+}
+
+void
+foo2 (struct bar *p)
+{
+  p->b = 0;
+  p->a = 0;
+  p->c = 1;
+  p->d = 0;
+  p->e = 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Merging successful" 2 "store-merging" } } */
diff --git a/gcc/testsuite/gcc.dg/store_merging_2.c b/gcc/testsuite/gcc.dg/store_merging_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..8e2acf390891284d96f646efd2b025a2ad7cb87d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/store_merging_2.c
@@ -0,0 +1,80 @@
+/* { dg-do run } */
+/* { dg-require-effective-target non_strict_align } */
+/* { dg-options "-O2 -fdump-tree-store-merging" } */
+
+struct bar
+{
+  int a;
+  unsigned char b;
+  unsigned char c;
+  short d;
+  unsigned char e;
+  unsigned char f;
+  unsigned char g;
+};
+
+__attribute__ ((noinline)) void
+foozero (struct bar *p)
+{
+  p->b = 0;
+  p->a = 0;
+  p->c = 0;
+  p->d = 0;
+  p->e = 0;
+  p->f = 0;
+  p->g = 0;
+}
+
+__attribute__ ((noinline)) void
+foo1 (struct bar *p)
+{
+  p->b = 1;
+  p->a = 2;
+  p->c = 3;
+  p->d = 4;
+  p->e = 5;
+  p->f = 0;
+  p->g = 0xff;
+}
+
+__attribute__ ((noinline)) void
+foo2 (struct bar *p, struct bar *p2)
+{
+  p->b = 0xff;
+  p2->b = 0xa;
+  p->a = 0xfffff;
+  p2->c = 0xc;
+  p->c = 0xff;
+  p2->d = 0xbf;
+  p->d = 0xfff;
+}
+
+int
+main (void)
+{
+  struct bar b1, b2;
+  foozero (&b1);
+  foozero (&b2);
+
+  foo1 (&b1);
+  if (b1.b != 1 || b1.a != 2 || b1.c != 3 || b1.d != 4 || b1.e != 5
+      || b1.f != 0 || b1.g != 0xff)
+    __builtin_abort ();
+
+  foozero (&b1);
+  /* Make sure writes to aliasing struct pointers preserve the
+     correct order.  */
+  foo2 (&b1, &b1);
+  if (b1.b != 0xa || b1.a != 0xfffff || b1.c != 0xff || b1.d != 0xfff)
+    __builtin_abort ();
+
+  foozero (&b1);
+  foo2 (&b1, &b2);
+  if (b1.a != 0xfffff || b1.b != 0xff || b1.c != 0xff || b1.d != 0xfff
+      || b2.b != 0xa || b2.c != 0xc || b2.d != 0xbf)
+    __builtin_abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Merging successful" 2 "store-merging" } } */
diff --git a/gcc/testsuite/gcc.dg/store_merging_3.c b/gcc/testsuite/gcc.dg/store_merging_3.c
new file mode 100644
index 0000000000000000000000000000000000000000..caf356da98159074488186dba6cad02233fa3aa2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/store_merging_3.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target non_strict_align } */
+/* { dg-options "-O2 -fdump-tree-store-merging-details" } */
+
+/* Make sure stores to volatile addresses don't get combined with
+   other accesses.  */
+
+struct bar
+{
+  int a;
+  char b;
+  char c;
+  volatile short d;
+  char e;
+  char f;
+  char g;
+};
+
+void
+foozero (struct bar *p)
+{
+  p->b = 0xa;
+  p->a = 0xb;
+  p->c = 0xc;
+  p->d = 0;
+  p->e = 0xd;
+  p->f = 0xe;
+  p->g = 0xf;
+}
+
+/* { dg-final { scan-tree-dump "Volatile access terminates all chains" "store-merging" } } */
+/* { dg-final { scan-tree-dump-times "=\{v\} 0;" 1 "store-merging" } } */
diff --git a/gcc/testsuite/gcc.dg/store_merging_4.c b/gcc/testsuite/gcc.dg/store_merging_4.c
new file mode 100644
index 0000000000000000000000000000000000000000..a3d67697d6418ba0cd8aaad2f92d9ea720ec7ffc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/store_merging_4.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target non_strict_align } */
+/* { dg-options "-O2 -fdump-tree-store-merging" } */
+
+/* Check that we can merge interleaving stores that are guaranteed
+   to be non-aliasing.  */
+
+struct bar
+{
+  int a;
+  char b;
+  char c;
+  short d;
+  char e;
+  char f;
+  char g;
+};
+
+void
+foozero (struct bar *restrict p, struct bar *restrict p2)
+{
+  p->b = 0xff;
+  p2->b = 0xa;
+  p->a = 0xfffff;
+  p2->a = 0xab;
+  p2->c = 0xc;
+  p->c = 0xff;
+  p2->d = 0xbf;
+  p->d = 0xfff;
+}
+
+/* { dg-final { scan-tree-dump-times "Merging successful" 2 "store-merging" } } */
diff --git a/gcc/testsuite/gcc.dg/store_merging_5.c b/gcc/testsuite/gcc.dg/store_merging_5.c
new file mode 100644
index 0000000000000000000000000000000000000000..4ffe512b842b81d97d0158cb765669758c5ff898
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/store_merging_5.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target non_strict_align } */
+/* { dg-options "-O2 -fdump-tree-store-merging" } */
+
+/* Make sure that non-aliasing non-constant interspersed stores do not
+   stop chains.  */
+
+struct bar {
+  int a;
+  char b;
+  char c;
+  char d;
+  char e;
+  char g;
+};
+
+void
+foo1 (struct bar *p, char tmp)
+{
+  p->a = 0;
+  p->b = 0;
+  p->g = tmp;
+  p->c = 0;
+  p->d = 0;
+  p->e = 0;
+}
+
+
+/* { dg-final { scan-tree-dump-times "Merging successful" 1 "store-merging" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[.*\\\]" 1 "store-merging" } } */
diff --git a/gcc/testsuite/gcc.dg/store_merging_6.c b/gcc/testsuite/gcc.dg/store_merging_6.c
new file mode 100644
index 0000000000000000000000000000000000000000..42b5c4f92dc8e99ec1a84cec4ce557eeaeab8d18
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/store_merging_6.c
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+/* { dg-require-effective-target non_strict_align } */
+/* { dg-options "-O2 -fdump-tree-store-merging" } */
+
+/* Check that we can widen accesses to bitfields.  */
+
+struct bar {
+  int a : 3;
+  unsigned char b : 4;
+  unsigned char c : 1;
+  char d;
+  char e;
+  char f;
+  char g;
+};
+
+__attribute__ ((noinline)) void
+foozero (struct bar *p)
+{
+  p->b = 0;
+  p->a = 0;
+  p->c = 0;
+  p->d = 0;
+  p->e = 0;
+  p->f = 0;
+  p->g = 0;
+}
+
+__attribute__ ((noinline)) void
+foo1 (struct bar *p)
+{
+  p->b = 3;
+  p->a = 2;
+  p->c = 1;
+  p->d = 4;
+  p->e = 5;
+}
+
+int
+main (void)
+{
+  struct bar p;
+  foozero (&p);
+  foo1 (&p);
+  if (p.a != 2 || p.b != 3 || p.c != 1 || p.d != 4 || p.e != 5
+      || p.f != 0 || p.g != 0)
+    __builtin_abort ();
+
+  return 0;
+}
+
+
+/* { dg-final { scan-tree-dump-times "Merging successful" 2 "store-merging" } } */
diff --git a/gcc/testsuite/gcc.dg/store_merging_7.c b/gcc/testsuite/gcc.dg/store_merging_7.c
new file mode 100644
index 0000000000000000000000000000000000000000..4be352fef4a61d97f0e95784f5061f7fc124b937
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/store_merging_7.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target non_strict_align } */
+/* { dg-options "-O2 -fdump-tree-store-merging" } */
+
+/* Check that we can merge consecutive array members through the pointer.
+   PR rtl-optimization/23684.  */
+
+void
+foo (char *input)
+{
+  input = __builtin_assume_aligned (input, 8);
+  input[0] = 'H';
+  input[1] = 'e';
+  input[2] = 'l';
+  input[3] = 'l';
+  input[4] = 'o';
+  input[5] = ' ';
+  input[6] = 'w';
+  input[7] = 'o';
+  input[8] = 'r';
+  input[9] = 'l';
+  input[10] = 'd';
+  input[11] = '\0';
+}
+
+/* { dg-final { scan-tree-dump-times "Merging successful" 1 "store-merging" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_1.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_1.c
index f02e55f1cc2f01063aea35b3b88f793bb2f7c532..9de4e771ab1e73bce960d4038f8ec5b49b5c612c 100644
--- a/gcc/testsuite/gcc.target/aarch64/ldp_stp_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_1.c
@@ -3,22 +3,22 @@
 int arr[4][4];
 
 void
-foo ()
+foo (int x, int y)
 {
-  arr[0][1] = 1;
-  arr[1][0] = -1;
-  arr[2][0] = 1;
-  arr[1][1] = -1;
-  arr[0][2] = 1;
-  arr[0][3] = -1;
-  arr[1][2] = 1;
-  arr[2][1] = -1;
-  arr[3][0] = 1;
-  arr[3][1] = -1;
-  arr[2][2] = 1;
-  arr[1][3] = -1;
-  arr[2][3] = 1;
-  arr[3][2] = -1;
+  arr[0][1] = x;
+  arr[1][0] = y;
+  arr[2][0] = x;
+  arr[1][1] = y;
+  arr[0][2] = x;
+  arr[0][3] = y;
+  arr[1][2] = x;
+  arr[2][1] = y;
+  arr[3][0] = x;
+  arr[3][1] = y;
+  arr[2][2] = x;
+  arr[1][3] = y;
+  arr[2][3] = x;
+  arr[3][2] = y;
 }
 
 /* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]" 7 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_4.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_4.c
index 40056b1adebd3fe3e473e378e123ee62041da9a2..824f0d2e81bc250f40ffb71b3e39cde76c9ff28d 100644
--- a/gcc/testsuite/gcc.target/aarch64/ldp_stp_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_4.c
@@ -3,22 +3,22 @@
 float arr[4][4];
 
 void
-foo ()
+foo (float x, float y)
 {
-  arr[0][1] = 1;
-  arr[1][0] = -1;
-  arr[2][0] = 1;
-  arr[1][1] = -1;
-  arr[0][2] = 1;
-  arr[0][3] = -1;
-  arr[1][2] = 1;
-  arr[2][1] = -1;
-  arr[3][0] = 1;
-  arr[3][1] = -1;
-  arr[2][2] = 1;
-  arr[1][3] = -1;
-  arr[2][3] = 1;
-  arr[3][2] = -1;
+  arr[0][1] = x;
+  arr[1][0] = y;
+  arr[2][0] = x;
+  arr[1][1] = y;
+  arr[0][2] = x;
+  arr[0][3] = y;
+  arr[1][2] = x;
+  arr[2][1] = y;
+  arr[3][0] = x;
+  arr[3][1] = y;
+  arr[2][2] = x;
+  arr[1][3] = y;
+  arr[2][3] = x;
+  arr[3][2] = y;
 }
 
 /* { dg-final { scan-assembler-times "stp\ts\[0-9\]+, s\[0-9\]" 7 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr22141.c b/gcc/testsuite/gcc.target/i386/pr22141.c
new file mode 100644
index 0000000000000000000000000000000000000000..036422e8ccf3a60c8dde10b7ce90dd391afe7f1d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr22141.c
@@ -0,0 +1,128 @@
+/* PR middle-end/22141 */
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+
+extern void abort (void);
+
+struct S
+{
+  struct T
+    {
+      char a;
+      char b;
+      char c;
+      char d;
+    } t;
+} u;
+
+struct U
+{
+  struct S s[4];
+};
+
+void __attribute__((noinline))
+c1 (struct T *p)
+{
+  if (p->a != 1 || p->b != 2 || p->c != 3 || p->d != 4)
+    abort ();
+  __builtin_memset (p, 0xaa, sizeof (*p));
+}
+
+void __attribute__((noinline))
+c2 (struct S *p)
+{
+  c1 (&p->t);
+}
+
+void __attribute__((noinline))
+c3 (struct U *p)
+{
+  c2 (&p->s[2]);
+}
+
+void __attribute__((noinline))
+f1 (void)
+{
+  u = (struct S) { { 1, 2, 3, 4 } };
+}
+
+void __attribute__((noinline))
+f2 (void)
+{
+  u.t.a = 1;
+  u.t.b = 2;
+  u.t.c = 3;
+  u.t.d = 4;
+}
+
+void __attribute__((noinline))
+f3 (void)
+{
+  u.t.d = 4;
+  u.t.b = 2;
+  u.t.a = 1;
+  u.t.c = 3;
+}
+
+void __attribute__((noinline))
+f4 (void)
+{
+  struct S v;
+  v.t.a = 1;
+  v.t.b = 2;
+  v.t.c = 3;
+  v.t.d = 4;
+  c2 (&v);
+}
+
+void __attribute__((noinline))
+f5 (struct S *p)
+{
+  p->t.a = 1;
+  p->t.c = 3;
+  p->t.d = 4;
+  p->t.b = 2;
+}
+
+void __attribute__((noinline))
+f6 (void)
+{
+  struct U v;
+  v.s[2].t.a = 1;
+  v.s[2].t.b = 2;
+  v.s[2].t.c = 3;
+  v.s[2].t.d = 4;
+  c3 (&v);
+}
+
+void __attribute__((noinline))
+f7 (struct U *p)
+{
+  p->s[2].t.a = 1;
+  p->s[2].t.c = 3;
+  p->s[2].t.d = 4;
+  p->s[2].t.b = 2;
+}
+
+int
+main (void)
+{
+  struct U w;
+  f1 ();
+  c2 (&u);
+  f2 ();
+  c1 (&u.t);
+  f3 ();
+  c2 (&u);
+  f4 ();
+  f5 (&u);
+  c2 (&u);
+  f6 ();
+  f7 (&w);
+  c3 (&w);
+  return 0;
+}
+
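+/* The four bytes 1, 2, 3, 4 merge into the 32-bit constant 0x04030201
+   on little-endian, which is 67305985 in decimal; accept either form.  */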
+/* { dg-final { scan-assembler-times "67305985\|4030201" 7 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr34012.c b/gcc/testsuite/gcc.target/i386/pr34012.c
index 00b1240d1b9fb96fa558459bd4f357ad13a3881d..d0cffa052905a1c11e0927e92ab295d71517f746 100644
--- a/gcc/testsuite/gcc.target/i386/pr34012.c
+++ b/gcc/testsuite/gcc.target/i386/pr34012.c
@@ -1,7 +1,7 @@
 /* PR rtl-optimization/34012 */
 /* { dg-do compile } */
 /* { dg-require-effective-target lp64 } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -fno-store-merging" } */
 
 void bar (long int *);
 void
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index a706729fcff3675fe9c915dc1146cf67e9c88133..b5373a3df157874e91c7dc727d5164007467e7d1 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -425,6 +425,7 @@ extern gimple_opt_pass *make_pass_late_warn_uninitialized (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_cse_reciprocals (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_cse_sincos (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_optimize_bswap (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_store_merging (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_optimize_widening_mul (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_warn_function_return (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_warn_function_noreturn (gcc::context *ctxt);

* Re: [PATCH][v5] GIMPLE store merging pass
  2016-10-10  8:05 [PATCH][v5] GIMPLE store merging pass Kyrill Tkachov
@ 2016-10-29  8:07 ` Andreas Schwab
  2016-10-29 15:57   ` [committed] Fix bootstrap with ada x86_64-linux and -fcompare-debug failure on ppc64le-linux (PR target/78148) Jakub Jelinek
  0 siblings, 1 reply; 5+ messages in thread
From: Andreas Schwab @ 2016-10-29  8:07 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches, Richard Biener

That breaks Ada:

a-teioed.adb: In function 'Ada.Text_Io.Editing.Format_Number':
a-teioed.adb:127:4: error: alignment of array elements is greater than element size
a-teioed.adb:127:4: error: alignment of array elements is greater than element size
a-teioed.adb:127:4: error: alignment of array elements is greater than element size
a-teioed.adb:127:4: error: alignment of array elements is greater than element size
a-teioed.adb:127:4: error: alignment of array elements is greater than element size
a-teioed.adb:127:4: error: alignment of array elements is greater than element size
a-teioed.adb:127:4: error: alignment of array elements is greater than element size
a-teioed.adb:127:4: error: alignment of array elements is greater than element size
a-teioed.adb:127:4: error: alignment of array elements is greater than element size
a-teioed.adb: In function 'Ada.Text_Io.Editing.To_Picture':
a-teioed.adb:2724:4: error: alignment of array elements is greater than element size
a-teioed.adb: In function 'Ada.Text_Io.Editing.Valid':
a-teioed.adb:2751:4: error: alignment of array elements is greater than element size

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

* [committed] Fix bootstrap with ada x86_64-linux and -fcompare-debug failure on ppc64le-linux (PR target/78148)
  2016-10-29  8:07 ` Andreas Schwab
@ 2016-10-29 15:57   ` Jakub Jelinek
  2016-10-29 19:17     ` Richard Biener
  2016-10-31  9:37     ` Kyrill Tkachov
  0 siblings, 2 replies; 5+ messages in thread
From: Jakub Jelinek @ 2016-10-29 15:57 UTC (permalink / raw)
  To: Richard Biener, Andreas Schwab, Kyrill Tkachov; +Cc: gcc-patches

On Sat, Oct 29, 2016 at 10:07:22AM +0200, Andreas Schwab wrote:
> That breaks Ada:
> 
> a-teioed.adb: In function 'Ada.Text_Io.Editing.Format_Number':
> a-teioed.adb:127:4: error: alignment of array elements is greater than element size
> [...]

The bug is the same one that makes the PR78148 testcase fail with
-fcompare-debug: the return values of build_nonstandard_integer_type are
shared (all calls with the same arguments return the same type), so using
SET_TYPE_ALIGN on the result is wrong.

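In other words, the difference between the two idioms is (a minimal
illustrative sketch, not code from the patch; try_size and align stand
for the local values output_merged_store already computes):

  /* Wrong: SET_TYPE_ALIGN mutates the shared type node itself, so
     every other caller of build_nonstandard_integer_type (try_size,
     UNSIGNED) suddenly sees the new alignment too.  */
  tree int_type = build_nonstandard_integer_type (try_size, UNSIGNED);
  SET_TYPE_ALIGN (int_type, align);

  /* Right: build_aligned_type returns a variant of the type with the
     requested alignment and leaves the shared main variant alone.  */
  tree int_type = build_nonstandard_integer_type (try_size, UNSIGNED);
  int_type = build_aligned_type (int_type, align);
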
As it is a bootstrap failure on a primary target and the fix is obvious,
I've committed it to trunk after bootstrapping/regtesting it on x86_64-linux
(where it fixed the Ada bootstrap) and i686-linux.

2016-10-29  Jakub Jelinek  <jakub@redhat.com>

	PR target/78148
	* gimple-ssa-store-merging.c
	(imm_store_chain_info::output_merged_store): Use build_aligned_type
	instead of SET_TYPE_ALIGN on shared integral type.

	* gcc.dg/pr78148.c: New test.

--- gcc/gimple-ssa-store-merging.c.jj	2016-10-29 14:39:24.000000000 +0200
+++ gcc/gimple-ssa-store-merging.c	2016-10-29 15:09:34.650749175 +0200
@@ -1130,7 +1130,7 @@ imm_store_chain_info::output_merged_stor
       location_t loc = get_location_for_stmts (split_store->orig_stmts);
 
       tree int_type = build_nonstandard_integer_type (try_size, UNSIGNED);
-      SET_TYPE_ALIGN (int_type, align);
+      int_type = build_aligned_type (int_type, align);
       tree addr = build_fold_addr_expr (base);
       tree dest = fold_build2 (MEM_REF, int_type, addr,
 			       build_int_cst (offset_type, try_pos));
--- gcc/testsuite/gcc.dg/pr78148.c.jj	2016-10-29 15:10:05.432358626 +0200
+++ gcc/testsuite/gcc.dg/pr78148.c	2016-10-29 15:09:09.000000000 +0200
@@ -0,0 +1,31 @@
+/* PR target/78148 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fcompare-debug" } */
+
+struct A { int a, b; };
+struct B { char c, d; };
+extern void bar (struct A, struct B);
+struct C { char e, f; } a;
+struct D
+{
+  int g;
+  struct C h[4];
+};
+struct D *e;
+
+struct D
+foo (void)
+{
+  int b;
+  struct B c;
+  struct A d;
+  d.b = c.c = c.d = 0;
+  bar (d, c);
+}
+
+void
+baz ()
+{
+  e->h[0].e = e->h[0].f = 0;
+  foo ();
+}

	Jakub

* Re: [committed] Fix bootstrap with ada x86_64-linux and -fcompare-debug failure on ppc64le-linux (PR target/78148)
  2016-10-29 15:57   ` [committed] Fix bootstrap with ada x86_64-linux and -fcompare-debug failure on ppc64le-linux (PR target/78148) Jakub Jelinek
@ 2016-10-29 19:17     ` Richard Biener
  2016-10-31  9:37     ` Kyrill Tkachov
  1 sibling, 0 replies; 5+ messages in thread
From: Richard Biener @ 2016-10-29 19:17 UTC (permalink / raw)
  To: Jakub Jelinek, Andreas Schwab, Kyrill Tkachov; +Cc: gcc-patches

On October 29, 2016 5:57:17 PM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote:
>On Sat, Oct 29, 2016 at 10:07:22AM +0200, Andreas Schwab wrote:
>> That breaks Ada:
>> 
>> a-teioed.adb: In function 'Ada.Text_Io.Editing.Format_Number':
>> a-teioed.adb:127:4: error: alignment of array elements is greater than element size
>> [...]
>
>The bug is the same one that makes the PR78148 testcase fail with
>-fcompare-debug: the return values of build_nonstandard_integer_type are
>shared (all calls with the same arguments return the same type), so using
>SET_TYPE_ALIGN on the result is wrong.
>
>As it is a bootstrap failure on a primary target and the fix is obvious,
>I've committed it to trunk after bootstrapping/regtesting it on
>x86_64-linux (where it fixed the Ada bootstrap) and i686-linux.

Whoops, sorry for not noticing during review.

Richard.

* Re: [committed] Fix bootstrap with ada x86_64-linux and -fcompare-debug failure on ppc64le-linux (PR target/78148)
  2016-10-29 15:57   ` [committed] Fix bootstrap with ada x86_64-linux and -fcompare-debug failure on ppc64le-linux (PR target/78148) Jakub Jelinek
  2016-10-29 19:17     ` Richard Biener
@ 2016-10-31  9:37     ` Kyrill Tkachov
  1 sibling, 0 replies; 5+ messages in thread
From: Kyrill Tkachov @ 2016-10-31  9:37 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Biener, Andreas Schwab; +Cc: gcc-patches


On 29/10/16 16:57, Jakub Jelinek wrote:
> On Sat, Oct 29, 2016 at 10:07:22AM +0200, Andreas Schwab wrote:
>> That breaks Ada:
>>
>> a-teioed.adb: In function 'Ada.Text_Io.Editing.Format_Number':
>> a-teioed.adb:127:4: error: alignment of array elements is greater than element size
>> [...]
> The bug is the same one that makes the PR78148 testcase fail with
> -fcompare-debug: the return values of build_nonstandard_integer_type are
> shared (all calls with the same arguments return the same type), so using
> SET_TYPE_ALIGN on the result is wrong.
>
> As it is a bootstrap failure on a primary target and the fix is obvious,
> I've committed it to trunk after bootstrapping/regtesting it on x86_64-linux
> (where it fixed the Ada bootstrap) and i686-linux.

Thanks for fixing this Jakub!
Sorry for the breakage.
Kyrill

