From: Jan Hubicka
To: Richard Biener
Cc: GCC Patches
Subject: Re: [RFC, PR 80689] Copy small aggregates element-wise
Date: Fri, 27 Oct 2017 12:27:00 -0000
Message-ID: <20171027121954.GC64719@kam.mff.cuni.cz>
References: <20171013161353.uvlix6gfxz7ir4y7@virgil.suse.cz> <20171026121840.yxylasrref3supuy@virgil.suse.cz> <20171026125515.GA73191@kam.mff.cuni.cz>

> On Thu, Oct 26, 2017 at 2:55 PM, Jan Hubicka wrote:
> >> I think the limit should be on the number of generated copies and not
> >> the overall size of the structure...  If the struct were composed of
> >> 32 individual chars we wouldn't want to emit 32 loads and 32 stores...
> >>
> >> I wonder how rep; movb; interacts with store to load forwarding?  Is
> >> that maybe optimized well on some archs?  movb should always
> >> forward and wasn't the setup cost for small N reasonable on modern
> >> CPUs?
> > rep mov is a win over a loop for blocks over 128 bytes on Core, and for
> > blocks in the range 24-128 bytes on Zen.  This is w/o store/load
> > forwarding, but I doubt those provide a cheap way around.
> >
> >> It probably depends on the width of the entries in the store buffer,
> >> if they appear in-order and the alignment of the stores (if they are
> >> larger than 8 bytes they are surely aligned).  IIRC CPUs had smaller
> >> store buffer entries than cache line size.
> >>
> >> Given that load bandwidth is usually higher than store bandwidth it
> >> might make sense to do the store combining in our copying sequence,
> >> like for the 8 byte entry case use sth like
> >>
> >>   movq 0(%eax), %xmm0
> >>   movhps 8(%eax), %xmm0  // or vpinsert
> >>   mov[au]ps %xmm0, 0(%ebx)
> >> ...
> >>
> >> thus do two loads per store and perform the stores in wider
> >> mode?
> >
> > This may be somewhat faster indeed.  I am not sure if store-to-load
> > forwarding will work for the latter half when read again by halves.
> > It would not happen on older CPUs :)
>
> Yes, forwarding larger stores to smaller loads generally works fine
> since forever, with the usual restrictions of alignment/size being
> power-of-two "halves".
>
> The question is of course what to do for 4 byte or smaller elements or
> mixed-size elements.  We can do zero-extending loads
> (do we have them for QI and HI mode loads as well?) and
> do shifts and ors.  I'm quite sure the CPUs wouldn't like to
> see vpinserts with different vector mode destinations.  So it
> would be 8-byte stores from GPRs and values built up via
> shift & or.
>
> As said, the important part is that IIRC CPUs can usually
> have more loads in flight than stores.  Esp. Bulldozer
> with the split core was store-buffer-size limited (but it
> could do merging of store buffer entries IIRC).

In a way store forwarding seems an independent optimization to me,
because it can certainly also help user code that does not originate
from a copy sequence.
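[The shift-and-or store combining described above can be sketched in C; `pair4` and `copy_combined` are hypothetical names, and the sketch assumes a little-endian target, where the OR-merged GPR value has exactly the struct's byte layout:]

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical aggregate of four 2-byte elements (8 bytes total).  */
struct pair4 { uint16_t a, b, c, d; };

/* Copy SRC to DST with one wide store: four zero-extending 2-byte
   loads are merged into a single 64-bit GPR value with shifts and ORs,
   then written out with a single 8-byte store.  On a little-endian
   target the merged value's bytes land exactly where the struct's
   fields live, so the copy is exact.  */
static void
copy_combined (struct pair4 *dst, const struct pair4 *src)
{
  uint64_t v = (uint64_t) src->a
	       | ((uint64_t) src->b << 16)
	       | ((uint64_t) src->c << 32)
	       | ((uint64_t) src->d << 48);
  memcpy (dst, &v, sizeof v);	/* single 8-byte store */
}
```

[The four narrow loads can issue in parallel while only one store-buffer entry is consumed, which matches the observation that CPUs typically sustain more loads in flight than stores.]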
Seems like something a bit tricky to implement on top of RTL, though.

Honza
> Richard.
>
> > Honza
> >>
> >> As said, a general concern was you not copying padding.  If you
> >> put this into an even more common place you surely will break
> >> stuff, no?
> >>
> >> Richard.
> >>
> >> >
> >> > Martin
> >> >
> >> >
> >> >>
> >> >> Richard.
> >> >>
> >> >> > Martin
> >> >> >
> >> >> >
> >> >> > 2017-10-12  Martin Jambor
> >> >> >
> >> >> >	PR target/80689
> >> >> >	* tree-sra.h: New file.
> >> >> >	* ipa-prop.h: Moved declaration of build_ref_for_offset to
> >> >> >	tree-sra.h.
> >> >> >	* expr.c: Include params.h and tree-sra.h.
> >> >> >	(emit_move_elementwise): New function.
> >> >> >	(store_expr_with_bounds): Optionally use it.
> >> >> >	* ipa-cp.c: Include tree-sra.h.
> >> >> >	* params.def (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY): New.
> >> >> >	* config/i386/i386.c (ix86_option_override_internal): Set
> >> >> >	PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY to 35.
> >> >> >	* tree-sra.c: Include tree-sra.h.
> >> >> >	(scalarizable_type_p): Renamed to
> >> >> >	simple_mix_of_records_and_arrays_p, made public, renamed the
> >> >> >	second parameter to allow_char_arrays.
> >> >> >	(extract_min_max_idx_from_array): New function.
> >> >> >	(completely_scalarize): Moved bits of the function to
> >> >> >	extract_min_max_idx_from_array.
> >> >> >
> >> >> > testsuite/
> >> >> >	* gcc.target/i386/pr80689-1.c: New test.
> >> >> > ---
> >> >> >  gcc/config/i386/i386.c                    |   4 ++
> >> >> >  gcc/expr.c                                | 103 ++++++++++++++++++++++++++++--
> >> >> >  gcc/ipa-cp.c                              |   1 +
> >> >> >  gcc/ipa-prop.h                            |   4 --
> >> >> >  gcc/params.def                            |   6 ++
> >> >> >  gcc/testsuite/gcc.target/i386/pr80689-1.c |  38 +++++++++++
> >> >> >  gcc/tree-sra.c                            |  86 +++++++++++++++----------
> >> >> >  gcc/tree-sra.h                            |  33 ++++++++++
> >> >> >  8 files changed, 233 insertions(+), 42 deletions(-)
> >> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr80689-1.c
> >> >> >  create mode 100644 gcc/tree-sra.h
> >> >> >
> >> >> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> >> >> > index 1ee8351c21f..87f602e7ead 100644
> >> >> > --- a/gcc/config/i386/i386.c
> >> >> > +++ b/gcc/config/i386/i386.c
> >> >> > @@ -6511,6 +6511,10 @@ ix86_option_override_internal (bool main_args_p,
> >> >> >                            ix86_tune_cost->l2_cache_size,
> >> >> >                            opts->x_param_values,
> >> >> >                            opts_set->x_param_values);
> >> >> > +  maybe_set_param_value (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
> >> >> > +                         35,
> >> >> > +                         opts->x_param_values,
> >> >> > +                         opts_set->x_param_values);
> >> >> >
> >> >> >    /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
> >> >> >    if (opts->x_flag_prefetch_loop_arrays < 0
> >> >> > diff --git a/gcc/expr.c b/gcc/expr.c
> >> >> > index 134ee731c29..dff24e7f166 100644
> >> >> > --- a/gcc/expr.c
> >> >> > +++ b/gcc/expr.c
> >> >> > @@ -61,7 +61,8 @@ along with GCC; see the file COPYING3.  If not see
> >> >> >  #include "tree-chkp.h"
> >> >> >  #include "rtl-chkp.h"
> >> >> >  #include "ccmp.h"
> >> >> > -
> >> >> > +#include "params.h"
> >> >> > +#include "tree-sra.h"
> >> >> >
> >> >> >  /* If this is nonzero, we do not bother generating VOLATILE
> >> >> >     around volatile memory references, and we are willing to
> >> >> > @@ -5340,6 +5341,80 @@ emit_storent_insn (rtx to, rtx from)
> >> >> >    return maybe_expand_insn (code, 2, ops);
> >> >> >  }
> >> >> >
> >> >> > +/* Generate code for copying data of type TYPE at SOURCE plus OFFSET to TARGET
> >> >> > +   plus OFFSET, but do so element-wise and/or field-wise for each record and
> >> >> > +   array within TYPE.  TYPE must either be a register type or an aggregate
> >> >> > +   complying with simple_mix_of_records_and_arrays_p.
> >> >> > +
> >> >> > +   If CALL_PARAM_P is nonzero, this is a store into a call param on the
> >> >> > +   stack, and block moves may need to be treated specially.  */
> >> >> > +
> >> >> > +static void
> >> >> > +emit_move_elementwise (tree type, rtx target, rtx source, HOST_WIDE_INT offset,
> >> >> > +                       int call_param_p)
> >> >> > +{
> >> >> > +  switch (TREE_CODE (type))
> >> >> > +    {
> >> >> > +    case RECORD_TYPE:
> >> >> > +      for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
> >> >> > +        if (TREE_CODE (fld) == FIELD_DECL)
> >> >> > +          {
> >> >> > +            HOST_WIDE_INT fld_offset = offset + int_bit_position (fld);
> >> >> > +            tree ft = TREE_TYPE (fld);
> >> >> > +            emit_move_elementwise (ft, target, source, fld_offset,
> >> >> > +                                   call_param_p);
> >> >> > +          }
> >> >> > +      break;
> >> >> > +
> >> >> > +    case ARRAY_TYPE:
> >> >> > +      {
> >> >> > +        tree elem_type = TREE_TYPE (type);
> >> >> > +        HOST_WIDE_INT el_size = tree_to_shwi (TYPE_SIZE (elem_type));
> >> >> > +        gcc_assert (el_size > 0);
> >> >> > +
> >> >> > +        offset_int idx, max;
> >> >> > +        /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1.  */
> >> >> > +        if (extract_min_max_idx_from_array (type, &idx, &max))
> >> >> > +          {
> >> >> > +            HOST_WIDE_INT el_offset = offset;
> >> >> > +            for (; idx <= max; ++idx)
> >> >> > +              {
> >> >> > +                emit_move_elementwise (elem_type, target, source, el_offset,
> >> >> > +                                       call_param_p);
> >> >> > +                el_offset += el_size;
> >> >> > +              }
> >> >> > +          }
> >> >> > +      }
> >> >> > +      break;
> >> >> > +    default:
> >> >> > +      machine_mode mode = TYPE_MODE (type);
> >> >> > +
> >> >> > +      rtx ntgt = adjust_address (target, mode, offset / BITS_PER_UNIT);
> >> >> > +      rtx nsrc = adjust_address (source, mode, offset / BITS_PER_UNIT);
> >> >> > +
> >> >> > +      /* TODO: Figure out whether the following is actually necessary.  */
> >> >> > +      if (target == ntgt)
> >> >> > +        ntgt = copy_rtx (target);
> >> >> > +      if (source == nsrc)
> >> >> > +        nsrc = copy_rtx (source);
> >> >> > +
> >> >> > +      gcc_assert (mode != VOIDmode);
> >> >> > +      if (mode != BLKmode)
> >> >> > +        emit_move_insn (ntgt, nsrc);
> >> >> > +      else
> >> >> > +        {
> >> >> > +          /* For example vector gimple registers can end up here.  */
> >> >> > +          rtx size = expand_expr (TYPE_SIZE_UNIT (type), NULL_RTX,
> >> >> > +                                  TYPE_MODE (sizetype), EXPAND_NORMAL);
> >> >> > +          emit_block_move (ntgt, nsrc, size,
> >> >> > +                           (call_param_p
> >> >> > +                            ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
> >> >> > +        }
> >> >> > +      break;
> >> >> > +    }
> >> >> > +  return;
> >> >> > +}
> >> >> > +
> >> >> >  /* Generate code for computing expression EXP,
> >> >> >     and storing the value into TARGET.
> >> >> >
> >> >> > @@ -5713,9 +5788,29 @@ store_expr_with_bounds (tree exp, rtx target, int call_param_p,
> >> >> >          emit_group_store (target, temp, TREE_TYPE (exp),
> >> >> >                            int_size_in_bytes (TREE_TYPE (exp)));
> >> >> >        else if (GET_MODE (temp) == BLKmode)
> >> >> > -        emit_block_move (target, temp, expr_size (exp),
> >> >> > -                         (call_param_p
> >> >> > -                          ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
> >> >> > +        {
> >> >> > +          /* Copying smallish BLKmode structures with emit_block_move and thus
> >> >> > +             by-pieces can result in store-to-load stalls.  So copy some simple
> >> >> > +             small aggregates element or field-wise.  */
> >> >> > +          if (GET_MODE (target) == BLKmode
> >> >> > +              && AGGREGATE_TYPE_P (TREE_TYPE (exp))
> >> >> > +              && !TREE_ADDRESSABLE (TREE_TYPE (exp))
> >> >> > +              && tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (exp)))
> >> >> > +              && (tree_to_shwi (TYPE_SIZE (TREE_TYPE (exp)))
> >> >> > +                  <= (PARAM_VALUE (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY)
> >> >> > +                      * BITS_PER_UNIT))
> >> >> > +              && simple_mix_of_records_and_arrays_p (TREE_TYPE (exp), false))
> >> >> > +            {
> >> >> > +              /* FIXME: Can this happen?  What would it mean?  */
> >> >> > +              gcc_assert (!reverse);
> >> >> > +              emit_move_elementwise (TREE_TYPE (exp), target, temp, 0,
> >> >> > +                                     call_param_p);
> >> >> > +            }
> >> >> > +          else
> >> >> > +            emit_block_move (target, temp, expr_size (exp),
> >> >> > +                             (call_param_p
> >> >> > +                              ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
> >> >> > +        }
> >> >> >        /* If we emit a nontemporal store, there is nothing else to do.  */
> >> >> >        else if (nontemporal && emit_storent_insn (target, temp))
> >> >> >          ;
> >> >> > diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> >> >> > index 6b3d8d7364c..7d6019bbd30 100644
> >> >> > --- a/gcc/ipa-cp.c
> >> >> > +++ b/gcc/ipa-cp.c
> >> >> > @@ -124,6 +124,7 @@ along with GCC; see the file COPYING3.  If not see
> >> >> >  #include "tree-ssa-ccp.h"
> >> >> >  #include "stringpool.h"
> >> >> >  #include "attribs.h"
> >> >> > +#include "tree-sra.h"
> >> >> >
> >> >> >  template class ipcp_value;
> >> >> >
> >> >> > diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
> >> >> > index fa5bed49ee0..2313cc884ed 100644
> >> >> > --- a/gcc/ipa-prop.h
> >> >> > +++ b/gcc/ipa-prop.h
> >> >> > @@ -877,10 +877,6 @@ ipa_parm_adjustment *ipa_get_adjustment_candidate (tree **, bool *,
> >> >> >  void ipa_release_body_info (struct ipa_func_body_info *);
> >> >> >  tree ipa_get_callee_param_type (struct cgraph_edge *e, int i);
> >> >> >
> >> >> > -/* From tree-sra.c:  */
> >> >> > -tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree,
> >> >> > -                           gimple_stmt_iterator *, bool);
> >> >> > -
> >> >> >  /* In ipa-cp.c  */
> >> >> >  void ipa_cp_c_finalize (void);
> >> >> >
> >> >> > diff --git a/gcc/params.def b/gcc/params.def
> >> >> > index e55afc28053..5e19f1414a0 100644
> >> >> > --- a/gcc/params.def
> >> >> > +++ b/gcc/params.def
> >> >> > @@ -1294,6 +1294,12 @@ DEFPARAM (PARAM_VECT_EPILOGUES_NOMASK,
> >> >> >            "Enable loop epilogue vectorization using smaller vector size.",
> >> >> >            0, 0, 1)
> >> >> >
> >> >> > +DEFPARAM (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
> >> >> > +          "max-size-for-elementwise-copy",
> >> >> > +          "Maximum size in bytes of a structure or array to be considered for "
> >> >> > +          "copying by its individual fields or elements",
> >> >> > +          0, 0, 512)
> >> >> > +
> >> >> >  /*
> >> >> >
> >> >> >  Local variables:
> >> >> > diff --git a/gcc/testsuite/gcc.target/i386/pr80689-1.c b/gcc/testsuite/gcc.target/i386/pr80689-1.c
> >> >> > new file mode 100644
> >> >> > index 00000000000..4156d4fba45
> >> >> > --- /dev/null
> >> >> > +++ b/gcc/testsuite/gcc.target/i386/pr80689-1.c
> >> >> > @@ -0,0 +1,38 @@
> >> >> > +/* { dg-do compile } */
> >> >> > +/* { dg-options "-O2" } */
> >> >> > +
> >> >> > +typedef struct st1
> >> >> > +{
> >> >> > +  long unsigned int a, b;
> >> >> > +  long int c, d;
> >> >> > +} R;
> >> >> > +
> >> >> > +typedef struct st2
> >> >> > +{
> >> >> > +  int t;
> >> >> > +  R reg;
> >> >> > +} N;
> >> >> > +
> >> >> > +void Set (const R *region, N *n_info);
> >> >> > +
> >> >> > +void test (N *n_obj, const long unsigned int a, const long unsigned int b,
> >> >> > +           const long int c, const long int d)
> >> >> > +{
> >> >> > +  R reg;
> >> >> > +
> >> >> > +  reg.a = a;
> >> >> > +  reg.b = b;
> >> >> > +  reg.c = c;
> >> >> > +  reg.d = d;
> >> >> > +  Set (&reg, n_obj);
> >> >> > +}
> >> >> > +
> >> >> > +void Set (const R *reg, N *n_obj)
> >> >> > +{
> >> >> > +  n_obj->reg = (*reg);
> >> >> > +}
> >> >> > +
> >> >> > +
> >> >> > +/* { dg-final { scan-assembler-not "%(x|y|z)mm\[0-9\]+" } } */
> >> >> > +/* { dg-final { scan-assembler-not "movdqu" } } */
> >> >> > +/* { dg-final { scan-assembler-not "movups" } } */
> >> >> > diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> >> >> > index bac593951e7..ade97964205 100644
> >> >> > --- a/gcc/tree-sra.c
> >> >> > +++ b/gcc/tree-sra.c
> >> >> > @@ -104,6 +104,7 @@ along with GCC; see the file COPYING3.  If not see
> >> >> >  #include "ipa-fnsummary.h"
> >> >> >  #include "ipa-utils.h"
> >> >> >  #include "builtins.h"
> >> >> > +#include "tree-sra.h"
> >> >> >
> >> >> >  /* Enumeration of all aggregate reductions we can do.  */
> >> >> >  enum sra_mode { SRA_MODE_EARLY_IPA, /* early call regularization */
> >> >> > @@ -952,14 +953,14 @@ create_access (tree expr, gimple *stmt, bool write)
> >> >> >  }
> >> >> >
> >> >> >
> >> >> > -/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or fixed-length
> >> >> > -   ARRAY_TYPE with fields that are either of gimple register types (excluding
> >> >> > -   bit-fields) or (recursively) scalarizable types.  CONST_DECL must be true if
> >> >> > -   we are considering a decl from constant pool.  If it is false, char arrays
> >> >> > -   will be refused.  */
> >> >> > +/* Return true if TYPE consists of RECORD_TYPEs or fixed-length ARRAY_TYPEs
> >> >> > +   with fields/elements that are not bit-fields and are either register types
> >> >> > +   or recursively comply with simple_mix_of_records_and_arrays_p.
> >> >> > +   Furthermore, if ALLOW_CHAR_ARRAYS is false, the function also returns
> >> >> > +   false if TYPE contains an array of single-byte elements.  */
> >> >> >
> >> >> > -static bool
> >> >> > -scalarizable_type_p (tree type, bool const_decl)
> >> >> > +bool
> >> >> > +simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays)
> >> >> >  {
> >> >> >    gcc_assert (!is_gimple_reg_type (type));
> >> >> >    if (type_contains_placeholder_p (type))
> >> >> > @@ -977,7 +978,7 @@ scalarizable_type_p (tree type, bool const_decl)
> >> >> >            return false;
> >> >> >
> >> >> >          if (!is_gimple_reg_type (ft)
> >> >> > -            && !scalarizable_type_p (ft, const_decl))
> >> >> > +            && !simple_mix_of_records_and_arrays_p (ft, allow_char_arrays))
> >> >> >            return false;
> >> >> >        }
> >> >> >
> >> >> > @@ -986,7 +987,7 @@ scalarizable_type_p (tree type, bool const_decl)
> >> >> >      case ARRAY_TYPE:
> >> >> >        {
> >> >> >          HOST_WIDE_INT min_elem_size;
> >> >> > -        if (const_decl)
> >> >> > +        if (allow_char_arrays)
> >> >> >            min_elem_size = 0;
> >> >> >          else
> >> >> >            min_elem_size = BITS_PER_UNIT;
> >> >> > @@ -1008,7 +1009,7 @@ scalarizable_type_p (tree type, bool const_decl)
> >> >> >
> >> >> >          tree elem = TREE_TYPE (type);
> >> >> >          if (!is_gimple_reg_type (elem)
> >> >> > -            && !scalarizable_type_p (elem, const_decl))
> >> >> > +            && !simple_mix_of_records_and_arrays_p (elem, allow_char_arrays))
> >> >> >            return false;
> >> >> >          return true;
> >> >> >        }
> >> >> > @@ -1017,10 +1018,38 @@ scalarizable_type_p (tree type, bool const_decl)
> >> >> >      }
> >> >> >  }
> >> >> >
> >> >> > -static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree, tree);
> >> >> > +static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree,
> >> >> > +                            tree);
> >> >> > +
> >> >> > +/* For a given array TYPE, return false if its domain does not have any maximum
> >> >> > +   value.  Otherwise calculate the MIN and MAX indices of the first and the
> >> >> > +   last element.  */
> >> >> > +
> >> >> > +bool
> >> >> > +extract_min_max_idx_from_array (tree type, offset_int *min, offset_int *max)
> >> >> > +{
> >> >> > +  tree domain = TYPE_DOMAIN (type);
> >> >> > +  tree minidx = TYPE_MIN_VALUE (domain);
> >> >> > +  gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
> >> >> > +  tree maxidx = TYPE_MAX_VALUE (domain);
> >> >> > +  if (!maxidx)
> >> >> > +    return false;
> >> >> > +  gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
> >> >> > +
> >> >> > +  /* MINIDX and MAXIDX are inclusive, and must be interpreted in
> >> >> > +     DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
> >> >> > +  *min = wi::to_offset (minidx);
> >> >> > +  *max = wi::to_offset (maxidx);
> >> >> > +  if (!TYPE_UNSIGNED (domain))
> >> >> > +    {
> >> >> > +      *min = wi::sext (*min, TYPE_PRECISION (domain));
> >> >> > +      *max = wi::sext (*max, TYPE_PRECISION (domain));
> >> >> > +    }
> >> >> > +  return true;
> >> >> > +}
> >> >> >
> >> >> >  /* Create total_scalarization accesses for all scalar fields of a member
> >> >> > -   of type DECL_TYPE conforming to scalarizable_type_p.  BASE
> >> >> > +   of type DECL_TYPE conforming to simple_mix_of_records_and_arrays_p.  BASE
> >> >> >     must be the top-most VAR_DECL representing the variable; within that,
> >> >> >     OFFSET locates the member and REF must be the memory reference expression for
> >> >> >     the member.  */
> >> >> > @@ -1047,27 +1076,14 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
> >> >> >      {
> >> >> >        tree elemtype = TREE_TYPE (decl_type);
> >> >> >        tree elem_size = TYPE_SIZE (elemtype);
> >> >> > -      gcc_assert (elem_size && tree_fits_shwi_p (elem_size));
> >> >> >        HOST_WIDE_INT el_size = tree_to_shwi (elem_size);
> >> >> >        gcc_assert (el_size > 0);
> >> >> >
> >> >> > -      tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (decl_type));
> >> >> > -      gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
> >> >> > -      tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (decl_type));
> >> >> > +      offset_int idx, max;
> >> >> >        /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1.  */
> >> >> > -      if (maxidx)
> >> >> > +      if (extract_min_max_idx_from_array (decl_type, &idx, &max))
> >> >> >          {
> >> >> > -          gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
> >> >> >            tree domain = TYPE_DOMAIN (decl_type);
> >> >> > -          /* MINIDX and MAXIDX are inclusive, and must be interpreted in
> >> >> > -             DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
> >> >> > -          offset_int idx = wi::to_offset (minidx);
> >> >> > -          offset_int max = wi::to_offset (maxidx);
> >> >> > -          if (!TYPE_UNSIGNED (domain))
> >> >> > -            {
> >> >> > -              idx = wi::sext (idx, TYPE_PRECISION (domain));
> >> >> > -              max = wi::sext (max, TYPE_PRECISION (domain));
> >> >> > -            }
> >> >> >            for (int el_off = offset; idx <= max; ++idx)
> >> >> >              {
> >> >> >                tree nref = build4 (ARRAY_REF, elemtype,
> >> >> > @@ -1088,10 +1104,10 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
> >> >> >  }
> >> >> >
> >> >> >  /* Create total_scalarization accesses for a member of type TYPE, which must
> >> >> > -   satisfy either is_gimple_reg_type or scalarizable_type_p.  BASE must be the
> >> >> > -   top-most VAR_DECL representing the variable; within that, POS and SIZE locate
> >> >> > -   the member, REVERSE gives its torage order. and REF must be the reference
> >> >> > -   expression for it.  */
> >> >> > +   satisfy either is_gimple_reg_type or simple_mix_of_records_and_arrays_p.
> >> >> > +   BASE must be the top-most VAR_DECL representing the variable; within that,
> >> >> > +   POS and SIZE locate the member, REVERSE gives its storage order, and REF
> >> >> > +   must be the reference expression for it.  */
> >> >> >
> >> >> >  static void
> >> >> >  scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse,
> >> >> > @@ -1111,7 +1127,8 @@ scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse,
> >> >> >  }
> >> >> >
> >> >> >  /* Create a total_scalarization access for VAR as a whole.  VAR must be of a
> >> >> > -   RECORD_TYPE or ARRAY_TYPE conforming to scalarizable_type_p.  */
> >> >> > +   RECORD_TYPE or ARRAY_TYPE conforming to
> >> >> > +   simple_mix_of_records_and_arrays_p.  */
> >> >> >
> >> >> >  static void
> >> >> >  create_total_scalarization_access (tree var)
> >> >> > @@ -2803,8 +2820,9 @@ analyze_all_variable_accesses (void)
> >> >> >      {
> >> >> >        tree var = candidate (i);
> >> >> >
> >> >> > -      if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var),
> >> >> > -                                              constant_decl_p (var)))
> >> >> > +      if (VAR_P (var)
> >> >> > +          && simple_mix_of_records_and_arrays_p (TREE_TYPE (var),
> >> >> > +                                                 constant_decl_p (var)))
> >> >> >          {
> >> >> >            if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
> >> >> >                <= max_scalarization_size)
> >> >> > diff --git a/gcc/tree-sra.h b/gcc/tree-sra.h
> >> >> > new file mode 100644
> >> >> > index 00000000000..dc901385994
> >> >> > --- /dev/null
> >> >> > +++ b/gcc/tree-sra.h
> >> >> > @@ -0,0 +1,33 @@
> >> >> > +/* tree-sra.h - Scalar replacement of aggregates.
> >> >> > +   Copyright (C) 2017 Free Software Foundation, Inc.
> >> >> > +
> >> >> > +This file is part of GCC.
> >> >> > +
> >> >> > +GCC is free software; you can redistribute it and/or modify it under
> >> >> > +the terms of the GNU General Public License as published by the Free
> >> >> > +Software Foundation; either version 3, or (at your option) any later
> >> >> > +version.
> >> >> > +
> >> >> > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> >> >> > +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> >> >> > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> >> >> > +for more details.
> >> >> > +
> >> >> > +You should have received a copy of the GNU General Public License
> >> >> > +along with GCC; see the file COPYING3.  If not see
> >> >> > +<http://www.gnu.org/licenses/>.  */
> >> >> > +
> >> >> > +#ifndef TREE_SRA_H
> >> >> > +#define TREE_SRA_H
> >> >> > +
> >> >> > +
> >> >> > +bool simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays);
> >> >> > +bool extract_min_max_idx_from_array (tree type, offset_int *idx,
> >> >> > +                                     offset_int *max);
> >> >> > +tree build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT offset,
> >> >> > +                           bool reverse, tree exp_type,
> >> >> > +                           gimple_stmt_iterator *gsi, bool insert_after);
> >> >> > +
> >> >> > +
> >> >> > +#endif /* TREE_SRA_H */
> >> >> > --
> >> >> > 2.14.1
> >> >> >
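
[As an aside on the patch's extract_min_max_idx_from_array: the wi::sext calls re-interpret the array-domain bounds in the domain's (possibly signed) precision. That operation boils down to the classic xor-and-subtract idiom, sketched here standalone in C; `sext_64` is a hypothetical name mimicking wi::sext for a value held in a uint64_t:]

```c
#include <stdint.h>

/* Sign-extend the low PRECISION bits of VAL to a full 64-bit signed
   value (1 <= PRECISION <= 64).  XORing with the sign bit and then
   subtracting it turns the bit pattern into the value it denotes in
   PRECISION-bit two's complement.  */
static int64_t
sext_64 (uint64_t val, unsigned precision)
{
  uint64_t mask = precision < 64 ? (((uint64_t) 1 << precision) - 1)
				 : ~(uint64_t) 0;
  uint64_t sign_bit = (uint64_t) 1 << (precision - 1);
  val &= mask;			/* discard bits above the precision */
  return (int64_t) ((val ^ sign_bit) - sign_bit);
}
```

[For example, the 32-bit pattern 0xFFFFFFFF denotes -1 in a signed 32-bit domain, which is exactly what an inclusive MAXIDX of "-1" (the zero-length-array case mentioned in the patch comments) relies on.]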