Hi all,

This v3 of the patch addresses the feedback I received on the version posted at [1].
The merged store buffer is now represented as a char array that we splat values onto with native_encode_expr and native_interpret_expr. This allows us to merge anything that native_encode_expr accepts, including floating point values and short vectors. So this version extends the functionality of the previous one in that it handles floating point values as well.

The first phase of the algorithm, which detects the contiguous stores, has also been slightly refactored according to feedback so that it reads more fluently.

Richi, I experimented with merging up to MOVE_MAX bytes rather than word size, but I got worse results on aarch64. MOVE_MAX there is 16 (because it has load/store register pair instructions), but the 128-bit immediates that we ended up synthesising were too complex. Perhaps the TImode immediate store RTL expansions could be improved, but for now I've left the maximum merge size at BITS_PER_WORD.

I've disabled the pass for PDP-endian targets, as the merging code proved to be quite fiddly to get right for different endiannesses and I didn't feel comfortable writing logic for BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN targets without serious testing capabilities. I hope that's ok (I note the bswap pass also doesn't try to do anything on such targets).

Tested on arm, aarch64, x86_64 and on big-endian arm and aarch64.

How does this version look?

Thanks,
Kyrill

[1] https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01512.html

2016-09-06  Kyrylo Tkachov

    PR middle-end/22141
    * Makefile.in (OBJS): Add gimple-ssa-store-merging.o.
    * common.opt (fstore-merging): New Optimization option.
    * opts.c (default_options_table): Add entry for
    OPT_ftree_store_merging.
    * params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define.
    * passes.def: Insert pass_tree_store_merging.
    * tree-pass.h (make_pass_store_merging): Declare extern
    prototype.
    * gimple-ssa-store-merging.c: New file.
    * doc/invoke.texi (Optimization Options): Document
    -fstore-merging.

2016-09-06  Kyrylo Tkachov
            Jakub Jelinek

    PR middle-end/22141
    * gcc.c-torture/execute/pr22141-1.c: New test.
    * gcc.c-torture/execute/pr22141-2.c: Likewise.
    * gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging.
    * gcc.target/aarch64/ldp_stp_4.c: Likewise.
    * gcc.dg/store_merging_1.c: New test.
    * gcc.dg/store_merging_2.c: Likewise.
    * gcc.dg/store_merging_3.c: Likewise.
    * gcc.dg/store_merging_4.c: Likewise.
    * gcc.dg/store_merging_5.c: Likewise.
    * gcc.dg/store_merging_6.c: Likewise.
    * gcc.target/i386/pr22141.c: Likewise.
    * gcc.target/i386/pr34012.c: Add -fno-store-merging to dg-options.
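
P.S. For anyone who wants the idea behind the char-array representation without reading the pass, here is a rough standalone sketch in plain C. It is not code from the patch and does not use GCC internals: struct const_store, encode_bytes, interpret_bytes and merge_stores are hypothetical names that only mimic the roles of native_encode_expr and native_interpret_expr, it handles integer constants only, and it models just the two uniform byte orders (no PDP-endian).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one recorded constant store: a byte offset
   from a common base, a size in bytes and the constant being stored.  */
struct const_store
{
  unsigned offset;
  unsigned size;
  uint64_t value;
};

/* Write the low SIZE bytes of VALUE into BUF in the requested byte order
   (loosely analogous to encoding a constant into target bytes).  */
static void
encode_bytes (unsigned char *buf, uint64_t value, unsigned size,
              int big_endian)
{
  for (unsigned i = 0; i < size; i++)
    {
      unsigned char byte = (value >> (8 * i)) & 0xff;
      buf[big_endian ? size - 1 - i : i] = byte;
    }
}

/* Read SIZE bytes from BUF back as one integer in the requested byte
   order (loosely analogous to interpreting target bytes as a constant).  */
static uint64_t
interpret_bytes (const unsigned char *buf, unsigned size, int big_endian)
{
  uint64_t value = 0;
  for (unsigned i = 0; i < size; i++)
    {
      unsigned char byte = buf[big_endian ? size - 1 - i : i];
      value |= (uint64_t) byte << (8 * i);
    }
  return value;
}

/* Splat N constant stores onto a WIDTH-byte buffer and read the buffer
   back as a single WIDTH-byte constant.  Returns 1 and sets *MERGED on
   success, 0 if a store falls outside the buffer or a byte is left
   uncovered (i.e. the stores are not contiguous).  */
static int
merge_stores (const struct const_store *stores, unsigned n, unsigned width,
              int big_endian, uint64_t *merged)
{
  unsigned char buf[8] = { 0 };
  unsigned char covered[8] = { 0 };

  assert (merged != NULL);
  if (width > sizeof buf)
    return 0;

  for (unsigned i = 0; i < n; i++)
    {
      if (stores[i].offset + stores[i].size > width)
        return 0;
      encode_bytes (buf + stores[i].offset, stores[i].value,
                    stores[i].size, big_endian);
      memset (covered + stores[i].offset, 1, stores[i].size);
    }

  for (unsigned b = 0; b < width; b++)
    if (!covered[b])
      return 0;

  *merged = interpret_bytes (buf, width, big_endian);
  return 1;
}
```

In this sketch, four byte stores of 0x11, 0x22, 0x33 and 0x44 at offsets 0 to 3 merge into the 32-bit constant 0x44332211 for little-endian and 0x11223344 for big-endian. Because the merged value is only ever read back from the byte buffer, anything that can be encoded into bytes can in principle take part, which is the property the char-array approach is meant to give us.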