* [PATCH][v5] GIMPLE store merging pass
@ 2016-10-10 8:05 Kyrill Tkachov
From: Kyrill Tkachov @ 2016-10-10 8:05 UTC (permalink / raw)
To: GCC Patches; +Cc: Richard Biener
[-- Attachment #1: Type: text/plain, Size: 3080 bytes --]
Hi all,
This is another revision of the pass, addressing Richard's feedback [1].
I believe I've addressed all of it and added more comments to the code where
needed.
The output_merged_store function now uses the new split_group helper to break
up the merged store into multiple regular-sized stores.
The apply_stores function, which splats the stores in a group together into a
byte buffer, can now return a bool to indicate failure; it is used to quickly
reject one-store groups and other store groups that we cannot output.
One thing I've been struggling with is reimplementing encode_tree_to_bitpos,
the function that applies a tree constant to the merged byte array.
I tried to reimplement it by writing the constant to a byte array with native_encode_expr
and manipulating the bytes directly to insert them at the appropriate bit position without
constructing an intermediate wide_int. This works, but only for little-endian;
on big-endian it generates wrong code.
So this patch doesn't include that implementation but rather keeps the previous one,
which goes through a wide_int but is correct for both endiannesses.
Richard, I'm sending out the patch that implements the cheaper algorithm separately
in case you want to help debug it.
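For reference, here is a minimal C sketch of the kind of source the pass targets (purely illustrative; the struct and function names are made up and not part of the patch or its testsuite):

```c
#include <assert.h>

/* Four contiguous byte-sized fields, so the stores below write to
   consecutive memory locations.  */
struct quad { unsigned char a, b, c, d; };

void
set_quad (struct quad *p)
{
  /* Four adjacent 1-byte immediate stores; with -fstore-merging the
     compiler may emit a single 32-bit store of the concatenated value,
     with the byte order chosen according to the target's endianness.
     The observable result is identical either way.  */
  p->a = 0x12;
  p->b = 0x34;
  p->c = 0x56;
  p->d = 0x78;
}
```

Whether the merge actually happens depends on the target's word size and alignment rules, per the STRICT_ALIGNMENT/SLOW_UNALIGNED_ACCESS handling in the patch.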
This has been bootstrapped and tested on arm, aarch64, aarch64_be, x86_64.
Besides the encode_tree_to_bitpos reimplementation (which will have its own thread)
does this version look good?
Thanks,
Kyrill
[1] https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02225.html
2016-10-10 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
PR middle-end/22141
* Makefile.in (OBJS): Add gimple-ssa-store-merging.o.
* common.opt (fstore-merging): New Optimization option.
* opts.c (default_options_table): Add entry for
OPT_ftree_store_merging.
* fold-const.h (can_native_encode_type_p): Declare prototype.
* fold-const.c (can_native_encode_type_p): Define.
* params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define.
* passes.def: Insert pass_tree_store_merging.
* tree-pass.h (make_pass_store_merging): Declare extern
prototype.
* gimple-ssa-store-merging.c: New file.
* doc/invoke.texi (Optimization Options): Document
-fstore-merging.
2016-10-10 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Jakub Jelinek <jakub@redhat.com>
Andrew Pinski <pinskia@gmail.com>
PR middle-end/22141
PR rtl-optimization/23684
* gcc.c-torture/execute/pr22141-1.c: New test.
* gcc.c-torture/execute/pr22141-2.c: Likewise.
* gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging.
* gcc.target/aarch64/ldp_stp_4.c: Likewise.
* gcc.dg/store_merging_1.c: New test.
* gcc.dg/store_merging_2.c: Likewise.
* gcc.dg/store_merging_3.c: Likewise.
* gcc.dg/store_merging_4.c: Likewise.
* gcc.dg/store_merging_5.c: Likewise.
* gcc.dg/store_merging_6.c: Likewise.
* gcc.dg/store_merging_7.c: Likewise.
* gcc.target/i386/pr22141.c: Likewise.
* gcc.target/i386/pr34012.c: Add -fno-store-merging to dg-options.
* g++.dg/init/new17.C: Likewise.
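As a rough idea of the shape such an execution test can take (this is an illustrative sketch, not the contents of any of the new tests above): store a mix of narrow immediates and verify the fields at run time, so a miscompiled merged store is caught on either endianness.

```c
#include <assert.h>
#include <string.h>

/* Two 16-bit stores followed by two 8-bit stores, mirroring the
   endianness example in the pass's header comment.  */
struct s { unsigned short h1, h2; unsigned char b1, b2; };

int
check (void)
{
  struct s x;
  memset (&x, 0, sizeof x);
  x.h1 = 0x1234;
  x.h2 = 0x5678;
  x.b1 = 0xab;
  x.b2 = 0xcd;
  /* Read the fields back through the structure, which is valid on any
     endianness, rather than type-punning the raw bytes.  */
  return x.h1 == 0x1234 && x.h2 == 0x5678 && x.b1 == 0xab && x.b2 == 0xcd;
}
```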
[-- Attachment #2: store-merging.patch --]
[-- Type: text/x-patch, Size: 62491 bytes --]
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 2a98e62b03ac8b84e4595660ac952a8bb3eb1d7f..fd4353fd94f3f12d1b4c799896a704981c6ea9a1 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1300,6 +1300,7 @@ OBJS = \
gimple-ssa-isolate-paths.o \
gimple-ssa-nonnull-compare.o \
gimple-ssa-split-paths.o \
+ gimple-ssa-store-merging.o \
gimple-ssa-strength-reduction.o \
gimple-ssa-sprintf.o \
gimple-streamer-in.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index ca48872072e0780c25178e714f96a8aa7f37eb1a..79255c865e4ff4d49b4331337ede172e9e2d7e31 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1460,6 +1460,10 @@ fstrict-volatile-bitfields
Common Report Var(flag_strict_volatile_bitfields) Init(-1) Optimization
Force bitfield accesses to match their type width.
+fstore-merging
+Common Report Var(flag_store_merging) Optimization
+Merge adjacent stores.
+
fguess-branch-probability
Common Report Var(flag_guess_branch_prob) Optimization
Enable guessing of branch probabilities.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d9667e7f5d91b25e4160fdfc6aae2e5d64ba260d..bd60decb4d2f7883e2c3834d5c819fba78622177 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -403,7 +403,7 @@ Objective-C and Objective-C++ Dialects}.
-fsingle-precision-constant -fsplit-ivs-in-unroller @gol
-fsplit-paths @gol
-fsplit-wide-types -fssa-backprop -fssa-phiopt @gol
--fstdarg-opt -fstrict-aliasing @gol
+-fstdarg-opt -fstore-merging -fstrict-aliasing @gol
-fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
-ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
-ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts @gol
@@ -414,8 +414,8 @@ Objective-C and Objective-C++ Dialects}.
-ftree-loop-vectorize @gol
-ftree-parallelize-loops=@var{n} -ftree-pre -ftree-partial-pre -ftree-pta @gol
-ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra @gol
--ftree-switch-conversion -ftree-tail-merge -ftree-ter @gol
--ftree-vectorize -ftree-vrp -funconstrained-commons @gol
+-ftree-switch-conversion -ftree-tail-merge @gol
+-ftree-ter -ftree-vectorize -ftree-vrp -funconstrained-commons @gol
-funit-at-a-time -funroll-all-loops -funroll-loops @gol
-funsafe-math-optimizations -funswitch-loops @gol
-fipa-ra -fvariable-expansion-in-unroller -fvect-cost-model -fvpt @gol
@@ -6604,6 +6604,7 @@ compilation time.
-fsplit-wide-types @gol
-fssa-backprop @gol
-fssa-phiopt @gol
+-fstore-merging @gol
-ftree-bit-ccp @gol
-ftree-ccp @gol
-ftree-ch @gol
@@ -7938,6 +7939,13 @@ Perform scalar replacement of aggregates. This pass replaces structure
references with scalars to prevent committing structures to memory too
early. This flag is enabled by default at @option{-O} and higher.
+@item -fstore-merging
+@opindex fstore-merging
+Perform merging of narrow stores to consecutive memory addresses. This pass
+merges contiguous stores of immediate values narrower than a word into fewer
+wider stores to reduce the number of instructions. This is enabled by default
+at @option{-O} and higher.
+
@item -ftree-ter
@opindex ftree-ter
Perform temporary expression replacement during the SSA->normal phase. Single
diff --git a/gcc/fold-const.h b/gcc/fold-const.h
index 637e46b0d48788217196ba53110b69d302b17db7..af9c6c96baa953c99438189266c53703d49f644b 100644
--- a/gcc/fold-const.h
+++ b/gcc/fold-const.h
@@ -27,6 +27,7 @@ extern int folding_initializer;
/* Convert between trees and native memory representation. */
extern int native_encode_expr (const_tree, unsigned char *, int, int off = -1);
extern tree native_interpret_expr (tree, const unsigned char *, int);
+extern bool can_native_encode_type_p (tree);
/* Fold constants as much as possible in an expression.
Returns the simplified expression.
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 65c75f639315d620eddfef3d8362b4902d28440a..564d086e9636743104d629cd6f5620ac2ee6a544 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -7515,6 +7515,26 @@ can_native_interpret_type_p (tree type)
}
}
+/* Return true iff a constant of type TYPE is accepted by
+ native_encode_expr. */
+
+bool
+can_native_encode_type_p (tree type)
+{
+ switch (TREE_CODE (type))
+ {
+ case INTEGER_TYPE:
+ case REAL_TYPE:
+ case FIXED_POINT_TYPE:
+ case COMPLEX_TYPE:
+ case VECTOR_TYPE:
+ case POINTER_TYPE:
+ return true;
+ default:
+ return false;
+ }
+}
+
/* Fold a VIEW_CONVERT_EXPR of a constant expression EXPR to type
TYPE at compile-time. If we're unable to perform the conversion
return NULL_TREE. */
diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c
new file mode 100644
index 0000000000000000000000000000000000000000..36c0d2396b6f4bd74aac2e9fb869b77432905767
--- /dev/null
+++ b/gcc/gimple-ssa-store-merging.c
@@ -0,0 +1,1219 @@
+/* GIMPLE store merging pass.
+ Copyright (C) 2016 Free Software Foundation, Inc.
+ Contributed by ARM Ltd.
+
+ This file is part of GCC.
+
+ GCC is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3, or (at your option)
+ any later version.
+
+ GCC is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with GCC; see the file COPYING3. If not see
+ <http://www.gnu.org/licenses/>. */
+
+/* The purpose of this pass is to combine multiple memory stores of
+ constant values to consecutive memory locations into fewer wider stores.
+ For example, if we have a sequence performing four byte stores to
+ consecutive memory locations:
+ [p ] := imm1;
+ [p + 1B] := imm2;
+ [p + 2B] := imm3;
+ [p + 3B] := imm4;
+ we can transform this into a single 4-byte store if the target supports it:
+ [p] := imm1:imm2:imm3:imm4 //concatenated immediates according to endianness.
+
+ The algorithm is applied to each basic block in three phases:
+
+ 1) Scan through the basic block recording constant assignments to
+ destinations that can be expressed as a store to memory of a certain size
+ at a certain bit offset. Record store chains to different bases in a
+ hash_map (m_stores) and make sure to terminate such chains when appropriate
+ (for example when the stored values get used subsequently).
+ These stores can be a result of structure element initializers, array stores
+ etc. A store_immediate_info object is recorded for every such store.
+ Record as many such assignments to a single base as possible until a
+ statement that interferes with the store sequence is encountered.
+
+ 2) Analyze the chain of stores recorded in phase 1) (i.e. the vector of
+ store_immediate_info objects) and coalesce contiguous stores into
+ merged_store_group objects.
+
+ For example, given the stores:
+ [p ] := 0;
+ [p + 1B] := 1;
+ [p + 3B] := 0;
+ [p + 4B] := 1;
+ [p + 5B] := 0;
+ [p + 6B] := 0;
+ This phase would produce two merged_store_group objects, one recording the
+ two bytes stored in the memory region [p : p + 1] and another
+ recording the four bytes stored in the memory region [p + 3 : p + 6].
+
+ 3) The merged_store_group objects produced in phase 2) are processed
+ to generate the sequence of wider stores that set the contiguous memory
+ regions to the sequence of bytes that corresponds to them. This may emit
+ multiple stores per store group to handle contiguous stores that are not
+ of a size that is a power of 2. For example, it can try to emit a 40-bit
+ store as a 32-bit store followed by an 8-bit store.
+ We try to emit stores as wide as we can while respecting STRICT_ALIGNMENT or
+ SLOW_UNALIGNED_ACCESS rules.
+
+ Note on endianness and example:
+ Consider 2 contiguous 16-bit stores followed by 2 contiguous 8-bit stores:
+ [p ] := 0x1234;
+ [p + 2B] := 0x5678;
+ [p + 4B] := 0xab;
+ [p + 5B] := 0xcd;
+
+ The memory layout for little-endian (LE) and big-endian (BE) must be:
+ p |LE|BE|
+ ---------
+ 0 |34|12|
+ 1 |12|34|
+ 2 |78|56|
+ 3 |56|78|
+ 4 |ab|ab|
+ 5 |cd|cd|
+
+ To merge these into a single 48-bit merged value 'val' in phase 2)
+ on little-endian we insert stores to higher (consecutive) bitpositions
+ into the most significant bits of the merged value.
+ The final merged value would be: 0xcdab56781234
+
+ For big-endian we insert stores to higher bitpositions into the least
+ significant bits of the merged value.
+ The final merged value would be: 0x12345678abcd
+
+ Then, in phase 3), we want to emit this 48-bit value as a 32-bit store
+ followed by a 16-bit store. Again, we must consider endianness when
+ breaking down the 48-bit value 'val' computed above.
+ For little endian we emit:
+ [p] (32-bit) := 0x56781234; // val & 0x0000ffffffff;
+ [p + 4B] (16-bit) := 0xcdab; // (val & 0xffff00000000) >> 32;
+
+ Whereas for big-endian we emit:
+ [p] (32-bit) := 0x12345678; // (val & 0xffffffff0000) >> 16;
+ [p + 4B] (16-bit) := 0xabcd; // val & 0x00000000ffff; */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "builtins.h"
+#include "fold-const.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "gimple-pretty-print.h"
+#include "alias.h"
+#include "fold-const.h"
+#include "params.h"
+#include "print-tree.h"
+#include "tree-hash-traits.h"
+#include "gimple-iterator.h"
+#include "gimplify.h"
+#include "stor-layout.h"
+#include "tree-cfg.h"
+#include "tree-eh.h"
+#include "target.h"
+
+/* The maximum size (in bits) of the stores this pass should generate. */
+#define MAX_STORE_BITSIZE (BITS_PER_WORD)
+#define MAX_STORE_BYTES (MAX_STORE_BITSIZE / BITS_PER_UNIT)
+
+namespace {
+
+/* Struct recording the information about a single store of an immediate
+ to memory. These are created in the first phase and coalesced into
+ merged_store_group objects in the second phase. */
+
+struct store_immediate_info
+{
+ unsigned HOST_WIDE_INT bitsize;
+ unsigned HOST_WIDE_INT bitpos;
+ tree val;
+ tree dest;
+ gimple *stmt;
+ unsigned int order;
+ store_immediate_info (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, tree,
+ tree, gimple *, unsigned int);
+};
+
+store_immediate_info::store_immediate_info (unsigned HOST_WIDE_INT bs,
+ unsigned HOST_WIDE_INT bp, tree v,
+ tree d, gimple *st,
+ unsigned int ord)
+ : bitsize (bs), bitpos (bp), val (v), dest (d), stmt (st), order (ord)
+{
+}
+
+/* Struct representing a group of stores to contiguous memory locations.
+ These are produced by the second phase (coalescing) and consumed in the
+ third phase that outputs the widened stores. */
+
+struct merged_store_group
+{
+ unsigned HOST_WIDE_INT start;
+ unsigned HOST_WIDE_INT width;
+ /* The size of the allocated memory for val. */
+ unsigned HOST_WIDE_INT buf_size;
+
+ unsigned int align;
+ unsigned int first_order;
+ unsigned int last_order;
+
+ auto_vec<struct store_immediate_info *> stores;
+ /* We record the first and last original statements in the sequence because
+ we'll need their vuse/vdef and replacement position. It's easier to keep
+ track of them separately as 'stores' is reordered by apply_stores. */
+ gimple *last_stmt;
+ gimple *first_stmt;
+ unsigned char *val;
+
+ merged_store_group (store_immediate_info *);
+ ~merged_store_group ();
+ void merge_into (store_immediate_info *);
+ void merge_overlapping (store_immediate_info *);
+ bool apply_stores ();
+};
+
+/* Debug helper. Dump LEN elements of byte array PTR to FD in hex. */
+
+static void
+dump_char_array (FILE *fd, unsigned char *ptr, unsigned int len)
+{
+ if (!fd)
+ return;
+
+ for (unsigned int i = 0; i < len; i++)
+ fprintf (fd, "%x ", ptr[i]);
+ fprintf (fd, "\n");
+}
+
+/* Write BITLEN bits of EXPR to the byte array PTR at
+ bit position BITPOS. PTR should contain TOTAL_BYTES elements.
+ Return true if the operation succeeded. */
+
+static bool
+encode_tree_to_bitpos (tree expr, unsigned char *ptr, int bitlen, int bitpos,
+ unsigned int total_bytes)
+{
+ unsigned int last_byte = (bitpos + bitlen - 1) / BITS_PER_UNIT;
+ unsigned int first_byte = bitpos / BITS_PER_UNIT;
+ tree tmp_int = expr;
+ bool sub_byte_op_p = (bitlen % BITS_PER_UNIT) || (bitpos % BITS_PER_UNIT);
+ /* If the expression is not an integer, encode it into the temporary buffer
+ and read it back as an integer. */
+ if (TREE_CODE (expr) != INTEGER_CST && sub_byte_op_p)
+ {
+ unsigned char *tmpbuf = XALLOCAVEC (unsigned char, total_bytes);
+ memset (tmpbuf, 0, total_bytes);
+ native_encode_expr (expr, tmpbuf, total_bytes, 0);
+ tree read_back_type
+ = build_nonstandard_integer_type (tree_to_shwi (
+ TYPE_SIZE (TREE_TYPE (expr))), UNSIGNED);
+ tmp_int
+ = native_interpret_expr (read_back_type, tmpbuf, total_bytes);
+ }
+
+ gcc_assert (tmp_int);
+
+ /* If we're inserting a non-byte-sized width or not at a byte boundary,
+ use an intermediate wide_int to perform the bit-insertion correctly. */
+ if (sub_byte_op_p
+ || (TREE_CODE (expr) == INTEGER_CST
+ && mode_for_size (bitlen, MODE_INT, 0) == BLKmode))
+ {
+ unsigned int byte_size = last_byte - first_byte + 1;
+
+ /* The functions native_encode_expr/native_interpret_expr use the
+ TYPE_MODE of the type to determine the number of bytes to write/read,
+ so if we want to process a number of bytes that does not have a
+ TYPE_MODE of equal size, we need to use a type that has a valid mode
+ for it. */
+
+ machine_mode mode
+ = smallest_mode_for_size (byte_size * BITS_PER_UNIT, MODE_INT);
+ tree dest_int_type
+ = build_nonstandard_integer_type (GET_MODE_BITSIZE (mode), UNSIGNED);
+ byte_size = GET_MODE_SIZE (mode);
+ /* The region from the byte array that we're inserting into. */
+ tree ptr_wide_int
+ = native_interpret_expr (dest_int_type, ptr + first_byte,
+ total_bytes);
+
+ gcc_assert (ptr_wide_int);
+ wide_int dest_wide_int
+ = wi::to_wide (ptr_wide_int, TYPE_PRECISION (dest_int_type));
+
+ wide_int expr_wide_int
+ = wi::to_wide (tmp_int, byte_size * BITS_PER_UNIT);
+ if (BYTES_BIG_ENDIAN)
+ {
+ unsigned int insert_pos
+ = byte_size * BITS_PER_UNIT - bitlen - (bitpos % BITS_PER_UNIT);
+ dest_wide_int
+ = wi::insert (dest_wide_int, expr_wide_int, insert_pos, bitlen);
+ }
+ else
+ dest_wide_int = wi::insert (dest_wide_int, expr_wide_int,
+ bitpos % BITS_PER_UNIT, bitlen);
+
+ tree res = wide_int_to_tree (dest_int_type, dest_wide_int);
+ if (!native_encode_expr (res, ptr + first_byte, total_bytes, 0))
+ return false;
+
+ }
+ /* If we're inserting a "well-behaved" value at a normal position just
+ call native_encode_expr directly. */
+ else if (!native_encode_expr (tmp_int, ptr + first_byte,
+ total_bytes, 0))
+ return false;
+
+ return true;
+}
+
+/* Sorting function for store_immediate_info objects.
+ Sorts them by bitposition. */
+
+static int
+sort_by_bitpos (const void *x, const void *y)
+{
+ store_immediate_info *const *tmp = (store_immediate_info * const *) x;
+ store_immediate_info *const *tmp2 = (store_immediate_info * const *) y;
+
+ if ((*tmp)->bitpos <= (*tmp2)->bitpos)
+ return -1;
+ else if ((*tmp)->bitpos > (*tmp2)->bitpos)
+ return 1;
+
+ gcc_unreachable ();
+}
+
+/* Sorting function for store_immediate_info objects.
+ Sorts them by the order field. */
+
+static int
+sort_by_order (const void *x, const void *y)
+{
+ store_immediate_info *const *tmp = (store_immediate_info * const *) x;
+ store_immediate_info *const *tmp2 = (store_immediate_info * const *) y;
+
+ if ((*tmp)->order < (*tmp2)->order)
+ return -1;
+ else if ((*tmp)->order > (*tmp2)->order)
+ return 1;
+
+ gcc_unreachable ();
+}
+
+/* Initialize a merged_store_group object from a store_immediate_info
+ object. */
+
+merged_store_group::merged_store_group (store_immediate_info *info)
+{
+ start = info->bitpos;
+ width = info->bitsize;
+ /* VAL has memory allocated for it in apply_stores once the group
+ width has been finalized. */
+ val = NULL;
+ align = get_object_alignment (info->dest);
+ stores.create (1);
+ stores.safe_push (info);
+ last_stmt = info->stmt;
+ last_order = info->order;
+ first_stmt = last_stmt;
+ first_order = last_order;
+ buf_size = 0;
+}
+
+merged_store_group::~merged_store_group ()
+{
+ if (val)
+ XDELETEVEC (val);
+}
+
+/* Merge a store recorded by INFO into this merged store.
+ The store is not overlapping with the existing recorded
+ stores. */
+
+void
+merged_store_group::merge_into (store_immediate_info *info)
+{
+ unsigned HOST_WIDE_INT wid = info->bitsize;
+ /* Make sure we're inserting in the position we think we're inserting. */
+ gcc_assert (info->bitpos == start + width);
+
+ width += wid;
+ gimple *stmt = info->stmt;
+ stores.safe_push (info);
+ if (info->order > last_order)
+ {
+ last_order = info->order;
+ last_stmt = stmt;
+ }
+ else if (info->order < first_order)
+ {
+ first_order = info->order;
+ first_stmt = stmt;
+ }
+}
+
+/* Merge a store described by INFO into this merged store.
+ INFO overlaps in some way with the current store (i.e. it's not contiguous
+ which is handled by merged_store_group::merge_into). */
+
+void
+merged_store_group::merge_overlapping (store_immediate_info *info)
+{
+ gimple *stmt = info->stmt;
+ stores.safe_push (info);
+
+ /* If the store extends the size of the group, extend the width. */
+ if ((info->bitpos + info->bitsize) > (start + width))
+ width += info->bitpos + info->bitsize - (start + width);
+
+ if (info->order > last_order)
+ {
+ last_order = info->order;
+ last_stmt = stmt;
+ }
+ else if (info->order < first_order)
+ {
+ first_order = info->order;
+ first_stmt = stmt;
+ }
+}
+
+/* Go through all the recorded stores in this group in program order and
+ apply their values to the VAL byte array to create the final merged
+ value. Return true if the operation succeeded. */
+
+bool
+merged_store_group::apply_stores ()
+{
+ /* The total width of the stores must add up to a whole number of bytes
+ and start at a byte boundary. We don't support emitting bitfield
+ references for now. Also, make sure we have more than one store
+ in the group, otherwise we cannot merge anything. */
+ if (width % BITS_PER_UNIT != 0
+ || start % BITS_PER_UNIT != 0
+ || stores.length () == 1)
+ return false;
+
+ stores.qsort (sort_by_order);
+ struct store_immediate_info *info;
+ unsigned int i;
+ /* Create a buffer of a size that is 2 times the number of bytes we're
+ storing. That way native_encode_expr can write power-of-2-sized
+ chunks without overrunning. */
+ buf_size
+ = 2 * (ROUND_UP (width, BITS_PER_UNIT) / BITS_PER_UNIT);
+ val = XCNEWVEC (unsigned char, buf_size);
+
+ FOR_EACH_VEC_ELT (stores, i, info)
+ {
+ unsigned int pos_in_buffer = info->bitpos - start;
+ bool ret = encode_tree_to_bitpos (info->val, val, info->bitsize,
+ pos_in_buffer, buf_size);
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ if (ret)
+ {
+ fprintf (dump_file, "After writing ");
+ print_generic_expr (dump_file, info->val, 0);
+ fprintf (dump_file, " of size " HOST_WIDE_INT_PRINT_DEC
+ " at position %d the merged region contains:\n",
+ info->bitsize, pos_in_buffer);
+ dump_char_array (dump_file, val, buf_size);
+ }
+ else
+ fprintf (dump_file, "Failed to merge stores\n");
+ }
+ if (!ret)
+ return false;
+ }
+ return true;
+}
+
+/* Structure describing the store chain. */
+
+struct imm_store_chain_info
+{
+ auto_vec<struct store_immediate_info *> m_store_info;
+ auto_vec<merged_store_group *> m_merged_store_groups;
+
+ bool terminate_and_process_chain (tree);
+ bool coalesce_immediate_stores ();
+ bool output_merged_store (tree, merged_store_group *);
+ bool output_merged_stores (tree);
+};
+
+const pass_data pass_data_tree_store_merging = {
+ GIMPLE_PASS, /* type */
+ "store-merging", /* name */
+ OPTGROUP_NONE, /* optinfo_flags */
+ TV_NONE, /* tv_id */
+ PROP_ssa, /* properties_required */
+ 0, /* properties_provided */
+ 0, /* properties_destroyed */
+ 0, /* todo_flags_start */
+ TODO_update_ssa, /* todo_flags_finish */
+};
+
+class pass_store_merging : public gimple_opt_pass
+{
+public:
+ pass_store_merging (gcc::context *ctxt)
+ : gimple_opt_pass (pass_data_tree_store_merging, ctxt)
+ {
+ }
+
+ /* Pass not supported for PDP-endianness. */
+ virtual bool
+ gate (function *)
+ {
+ return flag_store_merging && (WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN);
+ }
+
+ virtual unsigned int execute (function *);
+
+private:
+ hash_map<tree_operand_hash, struct imm_store_chain_info *> m_stores;
+
+ bool terminate_and_process_all_chains ();
+ bool terminate_all_aliasing_chains (tree, tree, gimple *);
+ bool terminate_and_release_chain (tree);
+}; // class pass_store_merging
+
+/* Terminate and process all recorded chains. Return true if any changes
+ were made. */
+
+bool
+pass_store_merging::terminate_and_process_all_chains ()
+{
+ hash_map<tree_operand_hash, struct imm_store_chain_info *>::iterator iter
+ = m_stores.begin ();
+ bool ret = false;
+ for (; iter != m_stores.end (); ++iter)
+ ret |= terminate_and_release_chain ((*iter).first);
+
+ return ret;
+}
+
+/* Terminate all chains that are affected by the assignment to DEST, appearing
+ in statement STMT and ultimately pointing to the object BASE. Return true if
+ at least one aliasing chain was terminated. BASE and DEST are allowed to
+ be NULL_TREE. In that case the aliasing checks are performed on the whole
+ statement rather than a particular operand in it. */
+
+bool
+pass_store_merging::terminate_all_aliasing_chains (tree dest, tree base,
+ gimple *stmt)
+{
+ bool ret = false;
+
+ /* If the statement doesn't touch memory it can't alias. */
+ if (!gimple_vuse (stmt))
+ return false;
+
+ struct imm_store_chain_info **chain_info = NULL;
+
+ /* Check if the assignment destination (BASE) is part of a store chain.
+ This is to catch non-constant stores to destinations that may be part
+ of a chain. */
+ if (base)
+ {
+ chain_info = m_stores.get (base);
+ if (chain_info)
+ {
+ struct store_immediate_info *info;
+ unsigned int i;
+ FOR_EACH_VEC_ELT ((*chain_info)->m_store_info, i, info)
+ {
+ if (refs_may_alias_p (info->dest, dest))
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ fprintf (dump_file, "stmt causes chain termination:\n");
+ print_gimple_stmt (dump_file, stmt, 0, 0);
+ }
+ terminate_and_release_chain (base);
+ ret = true;
+ break;
+ }
+ }
+ }
+ }
+
+ hash_map<tree_operand_hash, struct imm_store_chain_info *>::iterator iter
+ = m_stores.begin ();
+
+ /* Check for aliasing with all other store chains. */
+ for (; iter != m_stores.end (); ++iter)
+ {
+ /* We already checked all the stores in chain_info and terminated the
+ chain if necessary. Skip it here. */
+ if (chain_info && (*chain_info) == (*iter).second)
+ continue;
+
+ tree key = (*iter).first;
+ if (ref_maybe_used_by_stmt_p (stmt, key)
+ || stmt_may_clobber_ref_p (stmt, key))
+ {
+ terminate_and_release_chain (key);
+ ret = true;
+ }
+ }
+
+ return ret;
+}
+
+/* Helper function. Terminate the recorded chain storing to base object
+ BASE. Return true if the merging and output was successful. The m_stores
+ entry is removed after the processing in any case. */
+
+bool
+pass_store_merging::terminate_and_release_chain (tree base)
+{
+ struct imm_store_chain_info **chain_info = m_stores.get (base);
+
+ if (!chain_info)
+ return false;
+
+ gcc_assert (*chain_info);
+
+ bool ret = (*chain_info)->terminate_and_process_chain (base);
+ delete *chain_info;
+ m_stores.remove (base);
+
+ return ret;
+}
+
+/* Go through the candidate stores recorded in m_store_info and merge them
+ into merged_store_group objects recorded into m_merged_store_groups
+ representing the widened stores. Return true if coalescing was successful
+ and the number of widened stores is fewer than the original number
+ of stores. */
+
+bool
+imm_store_chain_info::coalesce_immediate_stores ()
+{
+ /* Anything less can't be processed. */
+ if (m_store_info.length () < 2)
+ return false;
+
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "Attempting to coalesce %u stores in chain.\n",
+ m_store_info.length ());
+
+ store_immediate_info *info;
+ unsigned int i;
+
+ /* Order the stores by the bitposition they write to. */
+ m_store_info.qsort (sort_by_bitpos);
+
+ info = m_store_info[0];
+ merged_store_group *merged_store = new merged_store_group (info);
+
+ FOR_EACH_VEC_ELT (m_store_info, i, info)
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ fprintf (dump_file, "Store %u:\nbitsize:" HOST_WIDE_INT_PRINT_DEC
+ " bitpos:" HOST_WIDE_INT_PRINT_DEC " val:\n",
+ i, info->bitsize, info->bitpos);
+ print_generic_expr (dump_file, info->val, 0);
+ fprintf (dump_file, "\n------------\n");
+ }
+
+ if (i == 0)
+ continue;
+
+ /* |---store 1---|
+ |---store 2---|
+ Overlapping stores. */
+ unsigned HOST_WIDE_INT start = info->bitpos;
+ if (IN_RANGE (start, merged_store->start,
+ merged_store->start + merged_store->width - 1))
+ {
+ merged_store->merge_overlapping (info);
+ continue;
+ }
+
+ /* |---store 1---| <gap> |---store 2---|.
+ Gap between stores. Start a new group. */
+ if (start != merged_store->start + merged_store->width)
+ {
+ /* Try to apply all the stores recorded for the group to determine
+ the bitpattern they write and discard it if that fails.
+ This will also reject single-store groups. */
+ if (!merged_store->apply_stores ())
+ delete merged_store;
+ else
+ m_merged_store_groups.safe_push (merged_store);
+
+ merged_store = new merged_store_group (info);
+
+ continue;
+ }
+
+ /* |---store 1---||---store 2---|
+ This store is consecutive to the previous one.
+ Merge it into the current store group. */
+ merged_store->merge_into (info);
+ }
+
+ /* Record or discard the last store group. */
+ if (!merged_store->apply_stores ())
+ delete merged_store;
+ else
+ m_merged_store_groups.safe_push (merged_store);
+
+ gcc_assert (m_merged_store_groups.length () <= m_store_info.length ());
+ bool success
+ = !m_merged_store_groups.is_empty ()
+ && m_merged_store_groups.length () < m_store_info.length ();
+
+ if (success && dump_file)
+ fprintf (dump_file, "Coalescing successful!\n"
+ "Merged into %u stores\n",
+ m_merged_store_groups.length ());
+
+ return success;
+}
+
+/* Return the type to use for the merged stores described by STMTS.
+ This is needed to get the alias sets right. */
+
+static tree
+get_alias_type_for_stmts (auto_vec<gimple *> &stmts)
+{
+ gimple *stmt;
+ unsigned int i;
+ tree lhs = gimple_assign_lhs (stmts[0]);
+ tree type = reference_alias_ptr_type (lhs);
+
+ FOR_EACH_VEC_ELT (stmts, i, stmt)
+ {
+ if (i == 0)
+ continue;
+
+ lhs = gimple_assign_lhs (stmt);
+ tree type1 = reference_alias_ptr_type (lhs);
+ if (!alias_ptr_types_compatible_p (type, type1))
+ return ptr_type_node;
+ }
+ return type;
+}
+
+/* Return the location_t information we can find among the statements
+ in STMTS. */
+
+static location_t
+get_location_for_stmts (auto_vec<gimple *> &stmts)
+{
+ gimple *stmt;
+ unsigned int i;
+
+ FOR_EACH_VEC_ELT (stmts, i, stmt)
+ if (gimple_has_location (stmt))
+ return gimple_location (stmt);
+
+ return UNKNOWN_LOCATION;
+}
+
+/* Used to describe a store resulting from splitting a wide store into smaller
+ regularly-sized stores in split_group. */
+
+struct split_store
+{
+ unsigned HOST_WIDE_INT bytepos;
+ unsigned HOST_WIDE_INT size;
+ unsigned HOST_WIDE_INT align;
+ auto_vec<gimple *> orig_stmts;
+ split_store (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
+ unsigned HOST_WIDE_INT);
+};
+
+/* Simple constructor. */
+
+split_store::split_store (unsigned HOST_WIDE_INT bp,
+ unsigned HOST_WIDE_INT sz,
+ unsigned HOST_WIDE_INT al)
+ : bytepos (bp), size (sz), align (al)
+{
+ orig_stmts.create (0);
+}
+
+/* Record all statements corresponding to stores in GROUP that write to
+ the region starting at BITPOS and of size BITSIZE. Record such
+ statements in STMTS. The stores in GROUP must be sorted by
+ bitposition. */
+
+static void
+find_constituent_stmts (struct merged_store_group *group,
+ auto_vec<gimple *> &stmts,
+ unsigned HOST_WIDE_INT bitpos,
+ unsigned HOST_WIDE_INT bitsize)
+{
+ struct store_immediate_info *info;
+ unsigned int i;
+ unsigned HOST_WIDE_INT end = bitpos + bitsize;
+ FOR_EACH_VEC_ELT (group->stores, i, info)
+ {
+ unsigned HOST_WIDE_INT stmt_start = info->bitpos;
+ unsigned HOST_WIDE_INT stmt_end = stmt_start + info->bitsize;
+ if (stmt_end < bitpos)
+ continue;
+ /* The stores in GROUP are ordered by bitposition so if we're past
+ the region for this group return early. */
+ if (stmt_start > end)
+ return;
+
+ if (IN_RANGE (stmt_start, bitpos, bitpos + bitsize)
+ || IN_RANGE (stmt_end, bitpos, end)
+ /* The statement writes a region that completely encloses the region
+ that this group writes. Unlikely to occur but let's
+ handle it. */
+ || IN_RANGE (bitpos, stmt_start, stmt_end))
+ stmts.safe_push (info->stmt);
+ }
+}
+
+/* Split a merged store described by GROUP by populating the SPLIT_STORES
+ vector with split_store structs describing the byte offset (from the base),
+ the bit size and alignment of each store as well as the original
+ statements involved in each such split store.
+ This separates the splitting strategy from the statement
+ building/emission/linking done in output_merged_store.
+ For now, start with the widest possible size and keep emitting
+ stores of that width until all the bytes are covered, halving the
+ width when the remaining region requires it. */
+
+static bool
+split_group (merged_store_group *group,
+ auto_vec<struct split_store *> &split_stores)
+{
+ unsigned HOST_WIDE_INT pos = group->start;
+ unsigned HOST_WIDE_INT size = group->width;
+ unsigned HOST_WIDE_INT bytepos = pos / BITS_PER_UNIT;
+ unsigned HOST_WIDE_INT align = group->align;
+
+ /* We don't handle partial bitfields for now. We shouldn't have
+ reached this far. */
+ gcc_assert ((size % BITS_PER_UNIT == 0) && (pos % BITS_PER_UNIT == 0));
+
+ bool allow_unaligned
+ = !STRICT_ALIGNMENT && PARAM_VALUE (PARAM_STORE_MERGING_ALLOW_UNALIGNED);
+
+ unsigned int try_size = MAX_STORE_BITSIZE;
+ while (try_size > size
+ || (!allow_unaligned
+ && try_size > align))
+ {
+ try_size /= 2;
+ if (try_size < BITS_PER_UNIT)
+ return false;
+ }
+
+ unsigned HOST_WIDE_INT try_pos = bytepos;
+ group->stores.qsort (sort_by_bitpos);
+
+ while (size > 0)
+ {
+ struct split_store *store = new split_store (try_pos, try_size, align);
+ unsigned HOST_WIDE_INT try_bitpos = try_pos * BITS_PER_UNIT;
+ find_constituent_stmts (group, store->orig_stmts, try_bitpos, try_size);
+ split_stores.safe_push (store);
+
+ try_pos += try_size / BITS_PER_UNIT;
+
+ size -= try_size;
+ align = try_size;
+ while (size < try_size)
+ try_size /= 2;
+ }
+ return true;
+}
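The splitting strategy of split_group can be sketched as a standalone greedy loop. This is a simplified model with hypothetical names, and the 64-bit/8-bit constants stand in for the target's MAX_STORE_BITSIZE and BITS_PER_UNIT: start from the widest store width allowed by the region size and alignment, emit stores of that width, and halve the width whenever the remaining region no longer fits it.

```c
#include <assert.h>

#define SKETCH_MAX_STORE_BITSIZE 64
#define SKETCH_BITS_PER_UNIT 8

/* Fill SIZES with the bit widths of the split stores covering a region
   of SIZE bits with alignment ALIGN (both in bits).  Return the number
   of stores used, or -1 if even a byte-sized store is not possible.  */
static int
sketch_split (unsigned size, unsigned align, int allow_unaligned,
              unsigned sizes[], int max_stores)
{
  unsigned try_size = SKETCH_MAX_STORE_BITSIZE;

  /* Shrink the initial width until it fits the region and, when
     unaligned stores are not allowed, the alignment.  */
  while (try_size > size || (!allow_unaligned && try_size > align))
    {
      try_size /= 2;
      if (try_size < SKETCH_BITS_PER_UNIT)
        return -1;
    }

  int n = 0;
  while (size > 0 && n < max_stores)
    {
      sizes[n++] = try_size;
      size -= try_size;
      /* Halve the width until the next store fits the remainder.  */
      while (size < try_size)
        try_size /= 2;
    }
  return n;
}
```

For example, a 48-bit region with unaligned stores allowed splits into a 32-bit and a 16-bit store, while a 64-bit region restricted to byte alignment degrades to eight byte stores.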
+
+/* Given a merged store group GROUP, output the widened version of it.
+ The store chain is against the base object BASE.
+ Use stores of at most MAX_STORE_BITSIZE bits and don't output
+ unaligned stores for STRICT_ALIGNMENT targets or if they would be
+ too expensive. Make sure that the number of statements output is
+ less than the number of original statements. If a better sequence
+ is possible emit it and return true. */
+
+bool
+imm_store_chain_info::output_merged_store (tree base, merged_store_group *group)
+{
+ unsigned HOST_WIDE_INT start_byte_pos = group->start / BITS_PER_UNIT;
+
+ unsigned int orig_num_stmts = group->stores.length ();
+ if (orig_num_stmts < 2)
+ return false;
+
+ auto_vec<struct split_store *> split_stores;
+ split_stores.create (0);
+ if (!split_group (group, split_stores))
+ return false;
+
+ gimple_stmt_iterator last_gsi = gsi_for_stmt (group->last_stmt);
+ gimple_seq seq = NULL;
+ unsigned int num_stmts = 0;
+ tree last_vdef, new_vuse;
+ last_vdef = gimple_vdef (group->last_stmt);
+ new_vuse = gimple_vuse (group->last_stmt);
+
+ gimple *stmt = NULL;
+ /* The new SSA names created. Keep track of them so that we can free
+ them if we decide not to use the new sequence. */
+ auto_vec<tree> new_ssa_names;
+ split_store *split_store;
+ unsigned int i;
+ bool fail = false;
+
+ FOR_EACH_VEC_ELT (split_stores, i, split_store)
+ {
+ unsigned HOST_WIDE_INT try_size = split_store->size;
+ unsigned HOST_WIDE_INT try_pos = split_store->bytepos;
+ unsigned HOST_WIDE_INT align = split_store->align;
+ tree offset_type = get_alias_type_for_stmts (split_store->orig_stmts);
+ location_t loc = get_location_for_stmts (split_store->orig_stmts);
+
+ tree int_type = build_nonstandard_integer_type (try_size, UNSIGNED);
+ SET_TYPE_ALIGN (int_type, align);
+ tree addr = build_fold_addr_expr (base);
+ tree dest = fold_build2 (MEM_REF, int_type, addr,
+ build_int_cst (offset_type, try_pos));
+
+ tree src = native_interpret_expr (int_type,
+ group->val + try_pos - start_byte_pos,
+ group->buf_size);
+
+ stmt = gimple_build_assign (dest, src);
+ gimple_set_location (stmt, loc);
+ gimple_set_vuse (stmt, new_vuse);
+ gimple_seq_add_stmt_without_update (&seq, stmt);
+
+ /* We didn't manage to reduce the number of statements. Bail out. */
+ if (++num_stmts == orig_num_stmts)
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ fprintf (dump_file, "Reached original number of stmts (%u)."
+ " Not profitable to emit new sequence.\n",
+ orig_num_stmts);
+ }
+ unsigned int ssa_count;
+ tree ssa_name;
+ /* Don't forget to cleanup the temporary SSA names. */
+ FOR_EACH_VEC_ELT (new_ssa_names, ssa_count, ssa_name)
+ release_ssa_name (ssa_name);
+
+ fail = true;
+ break;
+ }
+
+ tree new_vdef;
+ if (i < split_stores.length () - 1)
+ {
+ new_vdef = make_ssa_name (gimple_vop (cfun), stmt);
+ new_ssa_names.safe_push (new_vdef);
+ }
+ else
+ new_vdef = last_vdef;
+
+ gimple_set_vdef (stmt, new_vdef);
+ SSA_NAME_DEF_STMT (new_vdef) = stmt;
+ new_vuse = new_vdef;
+ }
+
+ FOR_EACH_VEC_ELT (split_stores, i, split_store)
+ delete split_store;
+
+ if (fail)
+ return false;
+
+ gcc_assert (seq);
+ if (dump_file)
+ {
+ fprintf (dump_file,
+ "New sequence of %u stmts to replace old one of %u stmts\n",
+ num_stmts, orig_num_stmts);
+ if (dump_flags & TDF_DETAILS)
+ print_gimple_seq (dump_file, seq, 0, TDF_VOPS | TDF_MEMSYMS);
+ }
+ gsi_insert_seq_after (&last_gsi, seq, GSI_SAME_STMT);
+
+ return true;
+}
+
+/* Process the merged_store_group objects created in the coalescing phase.
+ The stores are all against the base object BASE.
+ Try to output the widened stores and delete the original statements if
+ successful. Return true iff any changes were made. */
+
+bool
+imm_store_chain_info::output_merged_stores (tree base)
+{
+ unsigned int i;
+ merged_store_group *merged_store;
+ bool ret = false;
+ FOR_EACH_VEC_ELT (m_merged_store_groups, i, merged_store)
+ {
+ if (output_merged_store (base, merged_store))
+ {
+ unsigned int j;
+ store_immediate_info *store;
+ FOR_EACH_VEC_ELT (merged_store->stores, j, store)
+ {
+ gimple *stmt = store->stmt;
+ gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+ gsi_remove (&gsi, true);
+ if (stmt != merged_store->last_stmt)
+ {
+ unlink_stmt_vdef (stmt);
+ release_defs (stmt);
+ }
+ }
+ ret = true;
+ }
+ }
+ if (ret && dump_file)
+ fprintf (dump_file, "Merging successful!\n");
+
+ return ret;
+}
+
+/* Coalesce the store_immediate_info objects recorded against the base object
+ BASE in the first phase and output them.
+ Delete the allocated structures.
+ Return true if any changes were made. */
+
+bool
+imm_store_chain_info::terminate_and_process_chain (tree base)
+{
+ /* Process store chain. */
+ bool ret = false;
+ if (m_store_info.length () > 1)
+ {
+ ret = coalesce_immediate_stores ();
+ if (ret)
+ ret = output_merged_stores (base);
+ }
+
+ /* Delete all the entries we allocated ourselves. */
+ store_immediate_info *info;
+ unsigned int i;
+ FOR_EACH_VEC_ELT (m_store_info, i, info)
+ delete info;
+
+ merged_store_group *merged_info;
+ FOR_EACH_VEC_ELT (m_merged_store_groups, i, merged_info)
+ delete merged_info;
+
+ return ret;
+}
+
+/* Return true iff LHS is a destination potentially interesting for
+ store merging. In practice these are the codes that get_inner_reference
+ can process. */
+
+static bool
+lhs_valid_for_store_merging_p (tree lhs)
+{
+ tree_code code = TREE_CODE (lhs);
+
+ if (code == ARRAY_REF || code == ARRAY_RANGE_REF || code == MEM_REF
+ || code == COMPONENT_REF || code == BIT_FIELD_REF)
+ return true;
+
+ return false;
+}
+
+/* Return true if the tree RHS is a constant we want to consider
+ during store merging. In practice accept all codes that
+ native_encode_expr accepts. */
+
+static bool
+rhs_valid_for_store_merging_p (tree rhs)
+{
+ tree type = TREE_TYPE (rhs);
+ if (TREE_CODE_CLASS (TREE_CODE (rhs)) != tcc_constant
+ || !can_native_encode_type_p (type))
+ return false;
+
+ return true;
+}
+
+/* Entry point for the pass. Go over each basic block recording chains of
+ immediate stores. Upon encountering a terminating statement (as defined
+ by stmt_terminates_chain_p) process the recorded stores and emit the widened
+ variants. */
+
+unsigned int
+pass_store_merging::execute (function *fun)
+{
+ basic_block bb;
+ hash_set<gimple *> orig_stmts;
+
+ FOR_EACH_BB_FN (bb, fun)
+ {
+ gimple_stmt_iterator gsi;
+ unsigned HOST_WIDE_INT num_statements = 0;
+ /* Count the non-debug statements in the basic block: there is no
+ point in running the pass on blocks with fewer than two of
+ them. */
+ for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+ {
+ if (is_gimple_debug (gsi_stmt (gsi)))
+ continue;
+
+ if (++num_statements > 2)
+ break;
+ }
+
+ if (num_statements < 2)
+ continue;
+
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "Processing basic block <%d>:\n", bb->index);
+
+ for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+ {
+ gimple *stmt = gsi_stmt (gsi);
+
+ if (gimple_has_volatile_ops (stmt))
+ {
+ /* Terminate all chains. */
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "Volatile access terminates "
+ "all chains\n");
+ terminate_and_process_all_chains ();
+ continue;
+ }
+
+ if (is_gimple_debug (stmt))
+ continue;
+
+ if (gimple_assign_single_p (stmt) && gimple_vdef (stmt)
+ && !stmt_can_throw_internal (stmt)
+ && lhs_valid_for_store_merging_p (gimple_assign_lhs (stmt)))
+ {
+ tree lhs = gimple_assign_lhs (stmt);
+ tree rhs = gimple_assign_rhs1 (stmt);
+
+ HOST_WIDE_INT bitsize, bitpos;
+ machine_mode mode;
+ int unsignedp = 0, reversep = 0, volatilep = 0;
+ tree offset, base_addr;
+ base_addr
+ = get_inner_reference (lhs, &bitsize, &bitpos, &offset, &mode,
+ &unsignedp, &reversep, &volatilep);
+ /* As a future enhancement we could handle stores with the same
+ base and offset. */
+ bool invalid = offset || reversep
+ || ((bitsize > MAX_BITSIZE_MODE_ANY_INT)
+ && (TREE_CODE (rhs) != INTEGER_CST))
+ || !rhs_valid_for_store_merging_p (rhs)
+ /* An access may not be volatile itself but base_addr may be
+ a volatile decl i.e. MEM[&volatile-decl]. The hashing for
+ tree_operand_hash won't consider such stores equal to each
+ other so we can't track chains on them. */
+ || TREE_THIS_VOLATILE (base_addr);
+
+ /* In some cases get_inner_reference may return a
+ MEM_REF [ptr + byteoffset]. For the purposes of this pass
+ canonicalize the base_addr to MEM_REF [ptr] and take
+ byteoffset into account in the bitpos. This occurs in
+ PR 23684 and this way we can catch more chains. */
+ if (TREE_CODE (base_addr) == MEM_REF
+ && POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND (base_addr, 0))))
+ {
+ offset_int bit_off, byte_off = mem_ref_offset (base_addr);
+ bit_off = byte_off << LOG2_BITS_PER_UNIT;
+ if (!wi::neg_p (bit_off) && wi::fits_shwi_p (bit_off))
+ {
+ bitpos += bit_off.to_shwi ();
+
+ base_addr = copy_node (base_addr);
+ TREE_OPERAND (base_addr, 1)
+ = build_zero_cst (TREE_TYPE (TREE_OPERAND (
+ base_addr, 1)));
+ }
+ else
+ invalid = true;
+ }
+
+ struct imm_store_chain_info **chain_info
+ = m_stores.get (base_addr);
+
+ if (!invalid)
+ {
+ store_immediate_info *info;
+ if (chain_info)
+ {
+ info = new store_immediate_info (
+ bitsize, bitpos, rhs, lhs, stmt,
+ (*chain_info)->m_store_info.length ());
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ fprintf (dump_file,
+ "Recording immediate store from stmt:\n");
+ print_gimple_stmt (dump_file, stmt, 0, 0);
+ }
+ (*chain_info)->m_store_info.safe_push (info);
+ continue;
+ }
+
+ /* Does this store alias any existing chain? */
+ terminate_all_aliasing_chains (lhs, base_addr, stmt);
+ /* Start a new chain. */
+ struct imm_store_chain_info *new_chain
+ = new imm_store_chain_info;
+ info = new store_immediate_info (bitsize, bitpos, rhs, lhs,
+ stmt, 0);
+ new_chain->m_store_info.safe_push (info);
+ m_stores.put (base_addr, new_chain);
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ fprintf (dump_file,
+ "Starting new chain with statement:\n");
+ print_gimple_stmt (dump_file, stmt, 0, 0);
+ fprintf (dump_file, "The base object is:\n");
+ print_generic_expr (dump_file, base_addr, 0);
+ fprintf (dump_file, "\n");
+ }
+ }
+ else
+ terminate_all_aliasing_chains (lhs, base_addr, stmt);
+
+ continue;
+ }
+
+ terminate_all_aliasing_chains (NULL_TREE, NULL_TREE, stmt);
+ }
+ terminate_and_process_all_chains ();
+ }
+ return 0;
+}
+
+} // anon namespace
+
+/* Construct and return a store merging pass object. */
+
+gimple_opt_pass *
+make_pass_store_merging (gcc::context *ctxt)
+{
+ return new pass_store_merging (ctxt);
+}
diff --git a/gcc/opts.c b/gcc/opts.c
index 45f1f89cd16cbc1fc061e50f62fc8a1cffca5075..1cf474bee1839b406f1ce6feee5f784cc2ca9057 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -522,6 +522,7 @@ static const struct default_options default_options_table[] =
{ OPT_LEVELS_2_PLUS, OPT_fisolate_erroneous_paths_dereference, NULL, 1 },
{ OPT_LEVELS_2_PLUS, OPT_fipa_ra, NULL, 1 },
{ OPT_LEVELS_2_PLUS, OPT_flra_remat, NULL, 1 },
+ { OPT_LEVELS_2_PLUS, OPT_fstore_merging, NULL, 1 },
/* -O3 optimizations. */
{ OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
diff --git a/gcc/params.def b/gcc/params.def
index 8907aa4a0ffa5619b2b4022faa81db4fad89ce52..e63e5948089e1a6569cce386292d0d0890612256 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1100,6 +1100,12 @@ DEFPARAM (PARAM_MAX_TAIL_MERGE_COMPARISONS,
"Maximum amount of similar bbs to compare a bb with.",
10, 0, 0)
+DEFPARAM (PARAM_STORE_MERGING_ALLOW_UNALIGNED,
+ "store-merging-allow-unaligned",
+ "Allow the store merging pass to introduce unaligned stores "
+ "if it is legal to do so.",
+ 1, 0, 1)
+
DEFPARAM (PARAM_MAX_TAIL_MERGE_ITERATIONS,
"max-tail-merge-iterations",
"Maximum amount of iterations of the pass over a function.",
diff --git a/gcc/passes.def b/gcc/passes.def
index 2830421a277a25b92506ce09e8b21067b6e1a3e2..ee7dd5032730e8741a379b1ed3a2cfa7e5ab12ef 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -329,6 +329,7 @@ along with GCC; see the file COPYING3. If not see
NEXT_PASS (pass_phiopt);
NEXT_PASS (pass_fold_builtins);
NEXT_PASS (pass_optimize_widening_mul);
+ NEXT_PASS (pass_store_merging);
NEXT_PASS (pass_tail_calls);
/* If DCE is not run before checking for uninitialized uses,
we may get false warnings (e.g., testsuite/gcc.dg/uninit-5.c).
diff --git a/gcc/testsuite/g++.dg/init/new17.C b/gcc/testsuite/g++.dg/init/new17.C
index a7b1659b97393b11c5ef043ca355808bfa0e22ce..f6a3231f9aa7eb8abdf061b50073176bc0cd37c3 100644
--- a/gcc/testsuite/g++.dg/init/new17.C
+++ b/gcc/testsuite/g++.dg/init/new17.C
@@ -1,5 +1,5 @@
// { dg-do compile }
-// { dg-options "-O2 -fstrict-aliasing -fdump-tree-optimized" }
+// { dg-options "-O2 -fstrict-aliasing -fno-store-merging -fdump-tree-optimized" }
// Test that placement new does not introduce an unnecessary memory
// barrier.
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr22141-1.c b/gcc/testsuite/gcc.c-torture/execute/pr22141-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..7c888b469cf39f00ced8ddb8cc6dc245fa30b97b
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr22141-1.c
@@ -0,0 +1,122 @@
+/* PR middle-end/22141 */
+
+extern void abort (void);
+
+struct S
+{
+ struct T
+ {
+ char a;
+ char b;
+ char c;
+ char d;
+ } t;
+} u;
+
+struct U
+{
+ struct S s[4];
+};
+
+void __attribute__((noinline))
+c1 (struct T *p)
+{
+ if (p->a != 1 || p->b != 2 || p->c != 3 || p->d != 4)
+ abort ();
+ __builtin_memset (p, 0xaa, sizeof (*p));
+}
+
+void __attribute__((noinline))
+c2 (struct S *p)
+{
+ c1 (&p->t);
+}
+
+void __attribute__((noinline))
+c3 (struct U *p)
+{
+ c2 (&p->s[2]);
+}
+
+void __attribute__((noinline))
+f1 (void)
+{
+ u = (struct S) { { 1, 2, 3, 4 } };
+}
+
+void __attribute__((noinline))
+f2 (void)
+{
+ u.t.a = 1;
+ u.t.b = 2;
+ u.t.c = 3;
+ u.t.d = 4;
+}
+
+void __attribute__((noinline))
+f3 (void)
+{
+ u.t.d = 4;
+ u.t.b = 2;
+ u.t.a = 1;
+ u.t.c = 3;
+}
+
+void __attribute__((noinline))
+f4 (void)
+{
+ struct S v;
+ v.t.a = 1;
+ v.t.b = 2;
+ v.t.c = 3;
+ v.t.d = 4;
+ c2 (&v);
+}
+
+void __attribute__((noinline))
+f5 (struct S *p)
+{
+ p->t.a = 1;
+ p->t.c = 3;
+ p->t.d = 4;
+ p->t.b = 2;
+}
+
+void __attribute__((noinline))
+f6 (void)
+{
+ struct U v;
+ v.s[2].t.a = 1;
+ v.s[2].t.b = 2;
+ v.s[2].t.c = 3;
+ v.s[2].t.d = 4;
+ c3 (&v);
+}
+
+void __attribute__((noinline))
+f7 (struct U *p)
+{
+ p->s[2].t.a = 1;
+ p->s[2].t.c = 3;
+ p->s[2].t.d = 4;
+ p->s[2].t.b = 2;
+}
+
+int
+main (void)
+{
+ struct U w;
+ f1 ();
+ c2 (&u);
+ f2 ();
+ c1 (&u.t);
+ f3 ();
+ c2 (&u);
+ f4 ();
+ f5 (&u);
+ c2 (&u);
+ f6 ();
+ f7 (&w);
+ c3 (&w);
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr22141-2.c b/gcc/testsuite/gcc.c-torture/execute/pr22141-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..cb9cc79026310260ffc3a83bfdf9bfc92f998a86
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr22141-2.c
@@ -0,0 +1,122 @@
+/* PR middle-end/22141 */
+
+extern void abort (void);
+
+struct S
+{
+ struct T
+ {
+ char a;
+ char b;
+ char c;
+ char d;
+ } t;
+} u __attribute__((aligned));
+
+struct U
+{
+ struct S s[4];
+};
+
+void __attribute__((noinline))
+c1 (struct T *p)
+{
+ if (p->a != 1 || p->b != 2 || p->c != 3 || p->d != 4)
+ abort ();
+ __builtin_memset (p, 0xaa, sizeof (*p));
+}
+
+void __attribute__((noinline))
+c2 (struct S *p)
+{
+ c1 (&p->t);
+}
+
+void __attribute__((noinline))
+c3 (struct U *p)
+{
+ c2 (&p->s[2]);
+}
+
+void __attribute__((noinline))
+f1 (void)
+{
+ u = (struct S) { { 1, 2, 3, 4 } };
+}
+
+void __attribute__((noinline))
+f2 (void)
+{
+ u.t.a = 1;
+ u.t.b = 2;
+ u.t.c = 3;
+ u.t.d = 4;
+}
+
+void __attribute__((noinline))
+f3 (void)
+{
+ u.t.d = 4;
+ u.t.b = 2;
+ u.t.a = 1;
+ u.t.c = 3;
+}
+
+void __attribute__((noinline))
+f4 (void)
+{
+ struct S v __attribute__((aligned));
+ v.t.a = 1;
+ v.t.b = 2;
+ v.t.c = 3;
+ v.t.d = 4;
+ c2 (&v);
+}
+
+void __attribute__((noinline))
+f5 (struct S *p)
+{
+ p->t.a = 1;
+ p->t.c = 3;
+ p->t.d = 4;
+ p->t.b = 2;
+}
+
+void __attribute__((noinline))
+f6 (void)
+{
+ struct U v __attribute__((aligned));
+ v.s[2].t.a = 1;
+ v.s[2].t.b = 2;
+ v.s[2].t.c = 3;
+ v.s[2].t.d = 4;
+ c3 (&v);
+}
+
+void __attribute__((noinline))
+f7 (struct U *p)
+{
+ p->s[2].t.a = 1;
+ p->s[2].t.c = 3;
+ p->s[2].t.d = 4;
+ p->s[2].t.b = 2;
+}
+
+int
+main (void)
+{
+ struct U w __attribute__((aligned));
+ f1 ();
+ c2 (&u);
+ f2 ();
+ c1 (&u.t);
+ f3 ();
+ c2 (&u);
+ f4 ();
+ f5 (&u);
+ c2 (&u);
+ f6 ();
+ f7 (&w);
+ c3 (&w);
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/store_merging_1.c b/gcc/testsuite/gcc.dg/store_merging_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..35f4d82e6b22a231f1d7c6b3688a4bbcb57d2510
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/store_merging_1.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target non_strict_align } */
+/* { dg-options "-O2 -fdump-tree-store-merging" } */
+
+struct bar {
+ int a;
+ char b;
+ char c;
+ char d;
+ char e;
+ char f;
+ char g;
+};
+
+void
+foo1 (struct bar *p)
+{
+ p->b = 0;
+ p->a = 0;
+ p->c = 0;
+ p->d = 0;
+ p->e = 0;
+}
+
+void
+foo2 (struct bar *p)
+{
+ p->b = 0;
+ p->a = 0;
+ p->c = 1;
+ p->d = 0;
+ p->e = 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Merging successful" 2 "store-merging" } } */
diff --git a/gcc/testsuite/gcc.dg/store_merging_2.c b/gcc/testsuite/gcc.dg/store_merging_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..8e2acf390891284d96f646efd2b025a2ad7cb87d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/store_merging_2.c
@@ -0,0 +1,80 @@
+/* { dg-do run } */
+/* { dg-require-effective-target non_strict_align } */
+/* { dg-options "-O2 -fdump-tree-store-merging" } */
+
+struct bar
+{
+ int a;
+ unsigned char b;
+ unsigned char c;
+ short d;
+ unsigned char e;
+ unsigned char f;
+ unsigned char g;
+};
+
+__attribute__ ((noinline)) void
+foozero (struct bar *p)
+{
+ p->b = 0;
+ p->a = 0;
+ p->c = 0;
+ p->d = 0;
+ p->e = 0;
+ p->f = 0;
+ p->g = 0;
+}
+
+__attribute__ ((noinline)) void
+foo1 (struct bar *p)
+{
+ p->b = 1;
+ p->a = 2;
+ p->c = 3;
+ p->d = 4;
+ p->e = 5;
+ p->f = 0;
+ p->g = 0xff;
+}
+
+__attribute__ ((noinline)) void
+foo2 (struct bar *p, struct bar *p2)
+{
+ p->b = 0xff;
+ p2->b = 0xa;
+ p->a = 0xfffff;
+ p2->c = 0xc;
+ p->c = 0xff;
+ p2->d = 0xbf;
+ p->d = 0xfff;
+}
+
+int
+main (void)
+{
+ struct bar b1, b2;
+ foozero (&b1);
+ foozero (&b2);
+
+ foo1 (&b1);
+ if (b1.b != 1 || b1.a != 2 || b1.c != 3 || b1.d != 4 || b1.e != 5
+ || b1.f != 0 || b1.g != 0xff)
+ __builtin_abort ();
+
+ foozero (&b1);
+ /* Make sure writes to aliasing struct pointers preserve the
+ correct order. */
+ foo2 (&b1, &b1);
+ if (b1.b != 0xa || b1.a != 0xfffff || b1.c != 0xff || b1.d != 0xfff)
+ __builtin_abort ();
+
+ foozero (&b1);
+ foo2 (&b1, &b2);
+ if (b1.a != 0xfffff || b1.b != 0xff || b1.c != 0xff || b1.d != 0xfff
+ || b2.b != 0xa || b2.c != 0xc || b2.d != 0xbf)
+ __builtin_abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Merging successful" 2 "store-merging" } } */
diff --git a/gcc/testsuite/gcc.dg/store_merging_3.c b/gcc/testsuite/gcc.dg/store_merging_3.c
new file mode 100644
index 0000000000000000000000000000000000000000..caf356da98159074488186dba6cad02233fa3aa2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/store_merging_3.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target non_strict_align } */
+/* { dg-options "-O2 -fdump-tree-store-merging-details" } */
+
+/* Make sure stores to volatile addresses don't get combined with
+ other accesses. */
+
+struct bar
+{
+ int a;
+ char b;
+ char c;
+ volatile short d;
+ char e;
+ char f;
+ char g;
+};
+
+void
+foozero (struct bar *p)
+{
+ p->b = 0xa;
+ p->a = 0xb;
+ p->c = 0xc;
+ p->d = 0;
+ p->e = 0xd;
+ p->f = 0xe;
+ p->g = 0xf;
+}
+
+/* { dg-final { scan-tree-dump "Volatile access terminates all chains" "store-merging" } } */
+/* { dg-final { scan-tree-dump-times "=\{v\} 0;" 1 "store-merging" } } */
diff --git a/gcc/testsuite/gcc.dg/store_merging_4.c b/gcc/testsuite/gcc.dg/store_merging_4.c
new file mode 100644
index 0000000000000000000000000000000000000000..a3d67697d6418ba0cd8aaad2f92d9ea720ec7ffc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/store_merging_4.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target non_strict_align } */
+/* { dg-options "-O2 -fdump-tree-store-merging" } */
+
+/* Check that we can merge interleaving stores that are guaranteed
+ to be non-aliasing. */
+
+struct bar
+{
+ int a;
+ char b;
+ char c;
+ short d;
+ char e;
+ char f;
+ char g;
+};
+
+void
+foozero (struct bar *restrict p, struct bar *restrict p2)
+{
+ p->b = 0xff;
+ p2->b = 0xa;
+ p->a = 0xfffff;
+ p2->a = 0xab;
+ p2->c = 0xc;
+ p->c = 0xff;
+ p2->d = 0xbf;
+ p->d = 0xfff;
+}
+
+/* { dg-final { scan-tree-dump-times "Merging successful" 2 "store-merging" } } */
diff --git a/gcc/testsuite/gcc.dg/store_merging_5.c b/gcc/testsuite/gcc.dg/store_merging_5.c
new file mode 100644
index 0000000000000000000000000000000000000000..4ffe512b842b81d97d0158cb765669758c5ff898
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/store_merging_5.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target non_strict_align } */
+/* { dg-options "-O2 -fdump-tree-store-merging" } */
+
+/* Make sure that non-aliasing non-constant interspersed stores do not
+ stop chains. */
+
+struct bar {
+ int a;
+ char b;
+ char c;
+ char d;
+ char e;
+ char g;
+};
+
+void
+foo1 (struct bar *p, char tmp)
+{
+ p->a = 0;
+ p->b = 0;
+ p->g = tmp;
+ p->c = 0;
+ p->d = 0;
+ p->e = 0;
+}
+
+
+/* { dg-final { scan-tree-dump-times "Merging successful" 1 "store-merging" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[.*\\\]" 1 "store-merging" } } */
diff --git a/gcc/testsuite/gcc.dg/store_merging_6.c b/gcc/testsuite/gcc.dg/store_merging_6.c
new file mode 100644
index 0000000000000000000000000000000000000000..42b5c4f92dc8e99ec1a84cec4ce557eeaeab8d18
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/store_merging_6.c
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+/* { dg-require-effective-target non_strict_align } */
+/* { dg-options "-O2 -fdump-tree-store-merging" } */
+
+/* Check that we can widen accesses to bitfields. */
+
+struct bar {
+ int a : 3;
+ unsigned char b : 4;
+ unsigned char c : 1;
+ char d;
+ char e;
+ char f;
+ char g;
+};
+
+__attribute__ ((noinline)) void
+foozero (struct bar *p)
+{
+ p->b = 0;
+ p->a = 0;
+ p->c = 0;
+ p->d = 0;
+ p->e = 0;
+ p->f = 0;
+ p->g = 0;
+}
+
+__attribute__ ((noinline)) void
+foo1 (struct bar *p)
+{
+ p->b = 3;
+ p->a = 2;
+ p->c = 1;
+ p->d = 4;
+ p->e = 5;
+}
+
+int
+main (void)
+{
+ struct bar p;
+ foozero (&p);
+ foo1 (&p);
+ if (p.a != 2 || p.b != 3 || p.c != 1 || p.d != 4 || p.e != 5
+ || p.f != 0 || p.g != 0)
+ __builtin_abort ();
+
+ return 0;
+}
+
+
+/* { dg-final { scan-tree-dump-times "Merging successful" 2 "store-merging" } } */
diff --git a/gcc/testsuite/gcc.dg/store_merging_7.c b/gcc/testsuite/gcc.dg/store_merging_7.c
new file mode 100644
index 0000000000000000000000000000000000000000..4be352fef4a61d97f0e95784f5061f7fc124b937
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/store_merging_7.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target non_strict_align } */
+/* { dg-options "-O2 -fdump-tree-store-merging" } */
+
+/* Check that we can merge consecutive array members through the pointer.
+ PR rtl-optimization/23684. */
+
+void
+foo (char *input)
+{
+ input = __builtin_assume_aligned (input, 8);
+ input[0] = 'H';
+ input[1] = 'e';
+ input[2] = 'l';
+ input[3] = 'l';
+ input[4] = 'o';
+ input[5] = ' ';
+ input[6] = 'w';
+ input[7] = 'o';
+ input[8] = 'r';
+ input[9] = 'l';
+ input[10] = 'd';
+ input[11] = '\0';
+}
+
+/* { dg-final { scan-tree-dump-times "Merging successful" 1 "store-merging" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_1.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_1.c
index f02e55f1cc2f01063aea35b3b88f793bb2f7c532..9de4e771ab1e73bce960d4038f8ec5b49b5c612c 100644
--- a/gcc/testsuite/gcc.target/aarch64/ldp_stp_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_1.c
@@ -3,22 +3,22 @@
int arr[4][4];
void
-foo ()
+foo (int x, int y)
{
- arr[0][1] = 1;
- arr[1][0] = -1;
- arr[2][0] = 1;
- arr[1][1] = -1;
- arr[0][2] = 1;
- arr[0][3] = -1;
- arr[1][2] = 1;
- arr[2][1] = -1;
- arr[3][0] = 1;
- arr[3][1] = -1;
- arr[2][2] = 1;
- arr[1][3] = -1;
- arr[2][3] = 1;
- arr[3][2] = -1;
+ arr[0][1] = x;
+ arr[1][0] = y;
+ arr[2][0] = x;
+ arr[1][1] = y;
+ arr[0][2] = x;
+ arr[0][3] = y;
+ arr[1][2] = x;
+ arr[2][1] = y;
+ arr[3][0] = x;
+ arr[3][1] = y;
+ arr[2][2] = x;
+ arr[1][3] = y;
+ arr[2][3] = x;
+ arr[3][2] = y;
}
/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]" 7 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_4.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_4.c
index 40056b1adebd3fe3e473e378e123ee62041da9a2..824f0d2e81bc250f40ffb71b3e39cde76c9ff28d 100644
--- a/gcc/testsuite/gcc.target/aarch64/ldp_stp_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_4.c
@@ -3,22 +3,22 @@
float arr[4][4];
void
-foo ()
+foo (float x, float y)
{
- arr[0][1] = 1;
- arr[1][0] = -1;
- arr[2][0] = 1;
- arr[1][1] = -1;
- arr[0][2] = 1;
- arr[0][3] = -1;
- arr[1][2] = 1;
- arr[2][1] = -1;
- arr[3][0] = 1;
- arr[3][1] = -1;
- arr[2][2] = 1;
- arr[1][3] = -1;
- arr[2][3] = 1;
- arr[3][2] = -1;
+ arr[0][1] = x;
+ arr[1][0] = y;
+ arr[2][0] = x;
+ arr[1][1] = y;
+ arr[0][2] = x;
+ arr[0][3] = y;
+ arr[1][2] = x;
+ arr[2][1] = y;
+ arr[3][0] = x;
+ arr[3][1] = y;
+ arr[2][2] = x;
+ arr[1][3] = y;
+ arr[2][3] = x;
+ arr[3][2] = y;
}
/* { dg-final { scan-assembler-times "stp\ts\[0-9\]+, s\[0-9\]" 7 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr22141.c b/gcc/testsuite/gcc.target/i386/pr22141.c
new file mode 100644
index 0000000000000000000000000000000000000000..036422e8ccf3a60c8dde10b7ce90dd391afe7f1d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr22141.c
@@ -0,0 +1,126 @@
+/* PR middle-end/22141 */
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+
+extern void abort (void);
+
+struct S
+{
+ struct T
+ {
+ char a;
+ char b;
+ char c;
+ char d;
+ } t;
+} u;
+
+struct U
+{
+ struct S s[4];
+};
+
+void __attribute__((noinline))
+c1 (struct T *p)
+{
+ if (p->a != 1 || p->b != 2 || p->c != 3 || p->d != 4)
+ abort ();
+ __builtin_memset (p, 0xaa, sizeof (*p));
+}
+
+void __attribute__((noinline))
+c2 (struct S *p)
+{
+ c1 (&p->t);
+}
+
+void __attribute__((noinline))
+c3 (struct U *p)
+{
+ c2 (&p->s[2]);
+}
+
+void __attribute__((noinline))
+f1 (void)
+{
+ u = (struct S) { { 1, 2, 3, 4 } };
+}
+
+void __attribute__((noinline))
+f2 (void)
+{
+ u.t.a = 1;
+ u.t.b = 2;
+ u.t.c = 3;
+ u.t.d = 4;
+}
+
+void __attribute__((noinline))
+f3 (void)
+{
+ u.t.d = 4;
+ u.t.b = 2;
+ u.t.a = 1;
+ u.t.c = 3;
+}
+
+void __attribute__((noinline))
+f4 (void)
+{
+ struct S v;
+ v.t.a = 1;
+ v.t.b = 2;
+ v.t.c = 3;
+ v.t.d = 4;
+ c2 (&v);
+}
+
+void __attribute__((noinline))
+f5 (struct S *p)
+{
+ p->t.a = 1;
+ p->t.c = 3;
+ p->t.d = 4;
+ p->t.b = 2;
+}
+
+void __attribute__((noinline))
+f6 (void)
+{
+ struct U v;
+ v.s[2].t.a = 1;
+ v.s[2].t.b = 2;
+ v.s[2].t.c = 3;
+ v.s[2].t.d = 4;
+ c3 (&v);
+}
+
+void __attribute__((noinline))
+f7 (struct U *p)
+{
+ p->s[2].t.a = 1;
+ p->s[2].t.c = 3;
+ p->s[2].t.d = 4;
+ p->s[2].t.b = 2;
+}
+
+int
+main (void)
+{
+ struct U w;
+ f1 ();
+ c2 (&u);
+ f2 ();
+ c1 (&u.t);
+ f3 ();
+ c2 (&u);
+ f4 ();
+ f5 (&u);
+ c2 (&u);
+ f6 ();
+ f7 (&w);
+ c3 (&w);
+ return 0;
+}
+
+/* { dg-final { scan-assembler-times "67305985\|4030201" 7 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr34012.c b/gcc/testsuite/gcc.target/i386/pr34012.c
index 00b1240d1b9fb96fa558459bd4f357ad13a3881d..d0cffa052905a1c11e0927e92ab295d71517f746 100644
--- a/gcc/testsuite/gcc.target/i386/pr34012.c
+++ b/gcc/testsuite/gcc.target/i386/pr34012.c
@@ -1,7 +1,7 @@
/* PR rtl-optimization/34012 */
/* { dg-do compile } */
/* { dg-require-effective-target lp64 } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -fno-store-merging" } */
void bar (long int *);
void
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index a706729fcff3675fe9c915dc1146cf67e9c88133..b5373a3df157874e91c7dc727d5164007467e7d1 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -425,6 +425,7 @@ extern gimple_opt_pass *make_pass_late_warn_uninitialized (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_cse_reciprocals (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_cse_sincos (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_optimize_bswap (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_store_merging (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_optimize_widening_mul (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_warn_function_return (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_warn_function_noreturn (gcc::context *ctxt);
* Re: [PATCH][v5] GIMPLE store merging pass
2016-10-10 8:05 [PATCH][v5] GIMPLE store merging pass Kyrill Tkachov
@ 2016-10-29 8:07 ` Andreas Schwab
2016-10-29 15:57 ` [committed] Fix bootstrap with ada x86_64-linux and -fcompare-debug failure on ppc64le-linux (PR target/78148) Jakub Jelinek
0 siblings, 1 reply; 5+ messages in thread
From: Andreas Schwab @ 2016-10-29 8:07 UTC (permalink / raw)
To: Kyrill Tkachov; +Cc: GCC Patches, Richard Biener
That breaks Ada:
a-teioed.adb: In function 'Ada.Text_Io.Editing.Format_Number':
a-teioed.adb:127:4: error: alignment of array elements is greater than element size
a-teioed.adb:127:4: error: alignment of array elements is greater than element size
a-teioed.adb:127:4: error: alignment of array elements is greater than element size
a-teioed.adb:127:4: error: alignment of array elements is greater than element size
a-teioed.adb:127:4: error: alignment of array elements is greater than element size
a-teioed.adb:127:4: error: alignment of array elements is greater than element size
a-teioed.adb:127:4: error: alignment of array elements is greater than element size
a-teioed.adb:127:4: error: alignment of array elements is greater than element size
a-teioed.adb:127:4: error: alignment of array elements is greater than element size
a-teioed.adb: In function 'Ada.Text_Io.Editing.To_Picture':
a-teioed.adb:2724:4: error: alignment of array elements is greater than element size
a-teioed.adb: In function 'Ada.Text_Io.Editing.Valid':
a-teioed.adb:2751:4: error: alignment of array elements is greater than element size
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* [committed] Fix bootstrap with ada x86_64-linux and -fcompare-debug failure on ppc64le-linux (PR target/78148)
2016-10-29 8:07 ` Andreas Schwab
@ 2016-10-29 15:57 ` Jakub Jelinek
2016-10-29 19:17 ` Richard Biener
2016-10-31 9:37 ` Kyrill Tkachov
0 siblings, 2 replies; 5+ messages in thread
From: Jakub Jelinek @ 2016-10-29 15:57 UTC (permalink / raw)
To: Richard Biener, Andreas Schwab, Kyrill Tkachov; +Cc: gcc-patches
On Sat, Oct 29, 2016 at 10:07:22AM +0200, Andreas Schwab wrote:
> That breaks Ada:
>
> a-teioed.adb: In function 'Ada.Text_Io.Editing.Format_Number':
> a-teioed.adb:127:4: error: alignment of array elements is greater than element size
> a-teioed.adb:127:4: error: alignment of array elements is greater than element size
> a-teioed.adb:127:4: error: alignment of array elements is greater than element size
> a-teioed.adb:127:4: error: alignment of array elements is greater than element size
> a-teioed.adb:127:4: error: alignment of array elements is greater than element size
> a-teioed.adb:127:4: error: alignment of array elements is greater than element size
> a-teioed.adb:127:4: error: alignment of array elements is greater than element size
> a-teioed.adb:127:4: error: alignment of array elements is greater than element size
> a-teioed.adb:127:4: error: alignment of array elements is greater than element size
> a-teioed.adb: In function 'Ada.Text_Io.Editing.To_Picture':
> a-teioed.adb:2724:4: error: alignment of array elements is greater than element size
> a-teioed.adb: In function 'Ada.Text_Io.Editing.Valid':
> a-teioed.adb:2751:4: error: alignment of array elements is greater than element size
The bug is the same one that makes the PR78148 testcase fail with -fcompare-debug:
the return values of build_nonstandard_integer_type are shared (all calls
with the same arguments return the same type), so using SET_TYPE_ALIGN on
the result is wrong.
As it is a bootstrap failure on a primary target and the fix is obvious,
I've committed it to trunk after bootstrapping/regtesting on x86_64-linux
(where it fixed the Ada bootstrap) and i686-linux.
2016-10-29 Jakub Jelinek <jakub@redhat.com>
PR target/78148
* gimple-ssa-store-merging.c
(imm_store_chain_info::output_merged_store): Use build_aligned_type
instead of SET_TYPE_ALIGN on shared integral type.
* gcc.dg/pr78148.c: New test.
--- gcc/gimple-ssa-store-merging.c.jj 2016-10-29 14:39:24.000000000 +0200
+++ gcc/gimple-ssa-store-merging.c 2016-10-29 15:09:34.650749175 +0200
@@ -1130,7 +1130,7 @@ imm_store_chain_info::output_merged_stor
location_t loc = get_location_for_stmts (split_store->orig_stmts);
tree int_type = build_nonstandard_integer_type (try_size, UNSIGNED);
- SET_TYPE_ALIGN (int_type, align);
+ int_type = build_aligned_type (int_type, align);
tree addr = build_fold_addr_expr (base);
tree dest = fold_build2 (MEM_REF, int_type, addr,
build_int_cst (offset_type, try_pos));
--- gcc/testsuite/gcc.dg/pr78148.c.jj 2016-10-29 15:10:05.432358626 +0200
+++ gcc/testsuite/gcc.dg/pr78148.c 2016-10-29 15:09:09.000000000 +0200
@@ -0,0 +1,31 @@
+/* PR target/78148 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fcompare-debug" } */
+
+struct A { int a, b; };
+struct B { char c, d; };
+extern void bar (struct A, struct B);
+struct C { char e, f; } a;
+struct D
+{
+ int g;
+ struct C h[4];
+};
+struct D *e;
+
+struct D
+foo (void)
+{
+ int b;
+ struct B c;
+ struct A d;
+ d.b = c.c = c.d = 0;
+ bar (d, c);
+}
+
+void
+baz ()
+{
+ e->h[0].e = e->h[0].f = 0;
+ foo ();
+}
Jakub
* Re: [committed] Fix bootstrap with ada x86_64-linux and -fcompare-debug failure on ppc64le-linux (PR target/78148)
2016-10-29 15:57 ` [committed] Fix bootstrap with ada x86_64-linux and -fcompare-debug failure on ppc64le-linux (PR target/78148) Jakub Jelinek
@ 2016-10-29 19:17 ` Richard Biener
2016-10-31 9:37 ` Kyrill Tkachov
1 sibling, 0 replies; 5+ messages in thread
From: Richard Biener @ 2016-10-29 19:17 UTC (permalink / raw)
To: Jakub Jelinek, Andreas Schwab, Kyrill Tkachov; +Cc: gcc-patches
On October 29, 2016 5:57:17 PM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote:
>On Sat, Oct 29, 2016 at 10:07:22AM +0200, Andreas Schwab wrote:
>> That breaks Ada:
>>
>> a-teioed.adb: In function 'Ada.Text_Io.Editing.Format_Number':
>> a-teioed.adb:127:4: error: alignment of array elements is greater
>than element size
>> a-teioed.adb:127:4: error: alignment of array elements is greater
>than element size
>> a-teioed.adb:127:4: error: alignment of array elements is greater
>than element size
>> a-teioed.adb:127:4: error: alignment of array elements is greater
>than element size
>> a-teioed.adb:127:4: error: alignment of array elements is greater
>than element size
>> a-teioed.adb:127:4: error: alignment of array elements is greater
>than element size
>> a-teioed.adb:127:4: error: alignment of array elements is greater
>than element size
>> a-teioed.adb:127:4: error: alignment of array elements is greater
>than element size
>> a-teioed.adb:127:4: error: alignment of array elements is greater
>than element size
>> a-teioed.adb: In function 'Ada.Text_Io.Editing.To_Picture':
>> a-teioed.adb:2724:4: error: alignment of array elements is greater
>than element size
>> a-teioed.adb: In function 'Ada.Text_Io.Editing.Valid':
>> a-teioed.adb:2751:4: error: alignment of array elements is greater
>than element size
>
>The bug is the same why PR78148 testcase fails with -fcompare-debug,
>build_nonstandard_integer_type return values are shared, all such calls
>with the same arguments return the same types, so using SET_TYPE_ALIGN on it
>is wrong.
>
>As it is a bootstrap failure on primary target and the fix is obvious,
>I've committed it to trunk after bootstrapping/regtesting it on x86_64-linux
>(where it fixed ada bootstrap) and i686-linux.
Whoops, sorry for not noticing during review.
Richard.
>2016-10-29 Jakub Jelinek <jakub@redhat.com>
>
> PR target/78148
> * gimple-ssa-store-merging.c
> (imm_store_chain_info::output_merged_store): Use build_aligned_type
> instead of SET_TYPE_ALIGN on shared integral type.
>
> * gcc.dg/pr78148.c: New test.
>
>--- gcc/gimple-ssa-store-merging.c.jj	2016-10-29 14:39:24.000000000 +0200
>+++ gcc/gimple-ssa-store-merging.c 2016-10-29 15:09:34.650749175 +0200
>@@ -1130,7 +1130,7 @@ imm_store_chain_info::output_merged_stor
> location_t loc = get_location_for_stmts (split_store->orig_stmts);
>
> tree int_type = build_nonstandard_integer_type (try_size, UNSIGNED);
>- SET_TYPE_ALIGN (int_type, align);
>+ int_type = build_aligned_type (int_type, align);
> tree addr = build_fold_addr_expr (base);
> tree dest = fold_build2 (MEM_REF, int_type, addr,
> build_int_cst (offset_type, try_pos));
>--- gcc/testsuite/gcc.dg/pr78148.c.jj	2016-10-29 15:10:05.432358626 +0200
>+++ gcc/testsuite/gcc.dg/pr78148.c 2016-10-29 15:09:09.000000000 +0200
>@@ -0,0 +1,31 @@
>+/* PR target/78148 */
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -fcompare-debug" } */
>+
>+struct A { int a, b; };
>+struct B { char c, d; };
>+extern void bar (struct A, struct B);
>+struct C { char e, f; } a;
>+struct D
>+{
>+ int g;
>+ struct C h[4];
>+};
>+struct D *e;
>+
>+struct D
>+foo (void)
>+{
>+ int b;
>+ struct B c;
>+ struct A d;
>+ d.b = c.c = c.d = 0;
>+ bar (d, c);
>+}
>+
>+void
>+baz ()
>+{
>+ e->h[0].e = e->h[0].f = 0;
>+ foo ();
>+}
>
> Jakub
* Re: [committed] Fix bootstrap with ada x86_64-linux and -fcompare-debug failure on ppc64le-linux (PR target/78148)
2016-10-29 15:57 ` [committed] Fix bootstrap with ada x86_64-linux and -fcompare-debug failure on ppc64le-linux (PR target/78148) Jakub Jelinek
2016-10-29 19:17 ` Richard Biener
@ 2016-10-31 9:37 ` Kyrill Tkachov
1 sibling, 0 replies; 5+ messages in thread
From: Kyrill Tkachov @ 2016-10-31 9:37 UTC (permalink / raw)
To: Jakub Jelinek, Richard Biener, Andreas Schwab; +Cc: gcc-patches
On 29/10/16 16:57, Jakub Jelinek wrote:
> On Sat, Oct 29, 2016 at 10:07:22AM +0200, Andreas Schwab wrote:
>> That breaks Ada:
>>
>> a-teioed.adb: In function 'Ada.Text_Io.Editing.Format_Number':
>> a-teioed.adb:127:4: error: alignment of array elements is greater than element size
>> a-teioed.adb:127:4: error: alignment of array elements is greater than element size
>> a-teioed.adb:127:4: error: alignment of array elements is greater than element size
>> a-teioed.adb:127:4: error: alignment of array elements is greater than element size
>> a-teioed.adb:127:4: error: alignment of array elements is greater than element size
>> a-teioed.adb:127:4: error: alignment of array elements is greater than element size
>> a-teioed.adb:127:4: error: alignment of array elements is greater than element size
>> a-teioed.adb:127:4: error: alignment of array elements is greater than element size
>> a-teioed.adb:127:4: error: alignment of array elements is greater than element size
>> a-teioed.adb: In function 'Ada.Text_Io.Editing.To_Picture':
>> a-teioed.adb:2724:4: error: alignment of array elements is greater than element size
>> a-teioed.adb: In function 'Ada.Text_Io.Editing.Valid':
>> a-teioed.adb:2751:4: error: alignment of array elements is greater than element size
> The bug is the same why PR78148 testcase fails with -fcompare-debug,
> build_nonstandard_integer_type return values are shared, all such calls
> with the same arguments return the same types, so using SET_TYPE_ALIGN on it
> is wrong.
>
> As it is a bootstrap failure on primary target and the fix is obvious,
> I've committed it to trunk after bootstrapping/regtesting it on x86_64-linux
> (where it fixed ada bootstrap) and i686-linux.
Thanks for fixing this, Jakub!
Sorry for the breakage.
Kyrill
> 2016-10-29 Jakub Jelinek <jakub@redhat.com>
>
> PR target/78148
> * gimple-ssa-store-merging.c
> (imm_store_chain_info::output_merged_store): Use build_aligned_type
> instead of SET_TYPE_ALIGN on shared integral type.
>
> * gcc.dg/pr78148.c: New test.
>
> --- gcc/gimple-ssa-store-merging.c.jj 2016-10-29 14:39:24.000000000 +0200
> +++ gcc/gimple-ssa-store-merging.c 2016-10-29 15:09:34.650749175 +0200
> @@ -1130,7 +1130,7 @@ imm_store_chain_info::output_merged_stor
> location_t loc = get_location_for_stmts (split_store->orig_stmts);
>
> tree int_type = build_nonstandard_integer_type (try_size, UNSIGNED);
> - SET_TYPE_ALIGN (int_type, align);
> + int_type = build_aligned_type (int_type, align);
> tree addr = build_fold_addr_expr (base);
> tree dest = fold_build2 (MEM_REF, int_type, addr,
> build_int_cst (offset_type, try_pos));
> --- gcc/testsuite/gcc.dg/pr78148.c.jj 2016-10-29 15:10:05.432358626 +0200
> +++ gcc/testsuite/gcc.dg/pr78148.c 2016-10-29 15:09:09.000000000 +0200
> @@ -0,0 +1,31 @@
> +/* PR target/78148 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fcompare-debug" } */
> +
> +struct A { int a, b; };
> +struct B { char c, d; };
> +extern void bar (struct A, struct B);
> +struct C { char e, f; } a;
> +struct D
> +{
> + int g;
> + struct C h[4];
> +};
> +struct D *e;
> +
> +struct D
> +foo (void)
> +{
> + int b;
> + struct B c;
> + struct A d;
> + d.b = c.c = c.d = 0;
> + bar (d, c);
> +}
> +
> +void
> +baz ()
> +{
> + e->h[0].e = e->h[0].f = 0;
> + foo ();
> +}
>
> Jakub