From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 08 Nov 2013 13:07:00 -0000
From: Richard Biener
To: Zoran Jovanovic
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside)
References: <386B40EC5E8DBF459FD11A754D868AD922E31112@BADAG02.ba.imgtec.org> <386B40EC5E8DBF459FD11A754D868AD922E333D4@BADAG02.ba.imgtec.org> <386B40EC5E8DBF459FD11A754D868AD922E34DA7@BADAG02.ba.imgtec.org>
X-SW-Source: 2013-11/txt/msg00892.txt.bz2

> Hello,
> This is the new patch version.
> Comments from Bernhard Reutner-Fischer's review have been applied.
> Also, test case bitfldmrg2.c was modified - it is now an execute test.
>
> Example:
>
> Original code:
>   D.1351;
>   D.1350;
>   D.1349;
>   D.1349_2 = p1_1(D)->f1;
>   p2_3(D)->f1 = D.1349_2;
>   D.1350_4 = p1_1(D)->f2;
>   p2_3(D)->f2 = D.1350_4;
>   D.1351_5 = p1_1(D)->f3;
>   p2_3(D)->f3 = D.1351_5;
>
> Optimized code:
>   D.1358;
>   _16 = pr1_2(D)->_field0;
>   pr2_4(D)->_field0 = _16;
>
> The algorithm works at the basic-block level and consists of the following
> three major steps:
> 1.
Go through the basic-block statement list. If there are statement pairs
> that implement a copy of bit-field content from one memory location to
> another, record the statement pointers and other necessary data in a
> corresponding data structure.
> 2. Identify records that represent adjacent bit-field accesses and
> mark them as merged.
> 3. Modify trees accordingly.
>
> A new command-line option "-ftree-bitfield-merge" is introduced.
>
> Tested - passed gcc regression tests.

Comments inline (sorry for the late reply...)

> Changelog -
>
> gcc/ChangeLog:
> 2013-09-24  Zoran Jovanovic  (zoran.jovanovic@imgtec.com)
>   * Makefile.in: Added tree-sra.c to GTFILES.
>   * common.opt (ftree-bitfield-merge): New option.
>   * doc/invoke.texi: Added reference to "-ftree-bitfield-merge".
>   * tree-sra.c (ssa_bitfield_merge): New function.
>   Entry for (-ftree-bitfield-merge).
>   (bitfield_stmt_access_pair_htab_hash): New function.
>   (bitfield_stmt_access_pair_htab_eq): New function.
>   (cmp_access): New function.
>   (create_and_insert_access): New function.
>   (get_bit_offset): New function.
>   (get_merged_bit_field_size): New function.
>   (add_stmt_access_pair): New function.
>   * dwarf2out.c (simple_type_size_in_bits): Moved to tree.c.
>   (field_byte_offset): Declaration moved to tree.h, static removed.
>   * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test.
>   * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test.
>   * tree-ssa-sccvn.c (expressions_equal_p): Moved to tree.c.
>   * tree-ssa-sccvn.h (expressions_equal_p): Declaration moved to tree.h.
>   * tree.c (expressions_equal_p): Moved from tree-ssa-sccvn.c.
>   (simple_type_size_in_bits): Moved from dwarf2out.c.
>   * tree.h (expressions_equal_p): Declaration added.
>   (field_byte_offset): Declaration added.
> > Patch - > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in > index a2e3f7a..54aa8e7 100644 > --- a/gcc/Makefile.in > +++ b/gcc/Makefile.in > @@ -3847,6 +3847,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h > $(srcdir)/coretypes.h \ > $(srcdir)/asan.c \ > $(srcdir)/ubsan.c \ > $(srcdir)/tsan.c $(srcdir)/ipa-devirt.c \ > + $(srcdir)/tree-sra.c \ > @all_gtfiles@ > > # Compute the list of GT header files from the corresponding C sources, > diff --git a/gcc/common.opt b/gcc/common.opt > index 202e169..afac514 100644 > --- a/gcc/common.opt > +++ b/gcc/common.opt > @@ -2164,6 +2164,10 @@ ftree-sra > Common Report Var(flag_tree_sra) Optimization > Perform scalar replacement of aggregates > > +ftree-bitfield-merge > +Common Report Var(flag_tree_bitfield_merge) Init(0) Optimization > +Enable bit-field merge on trees > + Drop the tree- prefix for new options, it doesn't tell users anything. I suggest fmerge-bitfields Common Report Var(flag_tree_bitfield_merge) Init(0) Optimization Merge loads and stores of consecutive bitfields and adjust docs with regarding to the flag name change of course. > ftree-ter > Common Report Var(flag_tree_ter) Optimization > Replace temporary expressions in the SSA->normal pass > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index aa0f4ed..e588cae 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -412,7 +412,7 @@ Objective-C and Objective-C++ Dialects}. 
> -fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol > -fstack-protector-all -fstack-protector-strong -fstrict-aliasing @gol > -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol > --ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol > +-ftree-bitfield-merge -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol > -ftree-coalesce-inline-vars -ftree-coalesce-vars -ftree-copy-prop @gol > -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol > -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol > @@ -7679,6 +7679,11 @@ pointer alignment information. > This pass only operates on local scalar variables and is enabled by default > at @option{-O} and higher. It requires that @option{-ftree-ccp} is enabled. > > +@item -ftree-bitfield-merge > +@opindex ftree-bitfield-merge > +Combines several adjacent bit-field accesses that copy values > +from one memory location to another into one single bit-field access. > + > @item -ftree-ccp > @opindex ftree-ccp > Perform sparse conditional constant propagation (CCP) on trees. 
This > diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c > index 95049e4..e74db17 100644 > --- a/gcc/dwarf2out.c > +++ b/gcc/dwarf2out.c > @@ -3108,8 +3108,6 @@ static HOST_WIDE_INT ceiling (HOST_WIDE_INT, > unsigned int); > static tree field_type (const_tree); > static unsigned int simple_type_align_in_bits (const_tree); > static unsigned int simple_decl_align_in_bits (const_tree); > -static unsigned HOST_WIDE_INT simple_type_size_in_bits (const_tree); > -static HOST_WIDE_INT field_byte_offset (const_tree); > static void add_AT_location_description (dw_die_ref, enum > dwarf_attribute, > dw_loc_list_ref); > static void add_data_member_location_attribute (dw_die_ref, tree); > @@ -10149,25 +10147,6 @@ is_base_type (tree type) > return 0; > } > > -/* Given a pointer to a tree node, assumed to be some kind of a ..._TYPE > - node, return the size in bits for the type if it is a constant, or else > - return the alignment for the type if the type's size is not constant, or > - else return BITS_PER_WORD if the type actually turns out to be an > - ERROR_MARK node. */ > - > -static inline unsigned HOST_WIDE_INT > -simple_type_size_in_bits (const_tree type) > -{ > - if (TREE_CODE (type) == ERROR_MARK) > - return BITS_PER_WORD; > - else if (TYPE_SIZE (type) == NULL_TREE) > - return 0; > - else if (host_integerp (TYPE_SIZE (type), 1)) > - return tree_low_cst (TYPE_SIZE (type), 1); > - else > - return TYPE_ALIGN (type); > -} > - > /* Similarly, but return a double_int instead of UHWI. */ > > static inline double_int > @@ -14521,7 +14500,7 @@ round_up_to_align (double_int t, unsigned int align) > because the offset is actually variable. (We can't handle the latter case > just yet). 
*/ > -static HOST_WIDE_INT > +HOST_WIDE_INT > field_byte_offset (const_tree decl) > { > double_int object_offset_in_bits; > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitfldmrg1.c > b/gcc/testsuite/gcc.dg/tree-ssa/bitfldmrg1.c > new file mode 100644 > index 0000000..e9e96b7 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/bitfldmrg1.c > @@ -0,0 +1,30 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -ftree-bitfield-merge -fdump-tree-esra" } */ > + > +struct S > +{ > + unsigned f1:7; > + unsigned f2:9; > + unsigned f3:3; > + unsigned f4:5; > + unsigned f5:1; > + unsigned f6:2; > +}; > + > +unsigned > +foo (struct S *p1, struct S *p2, int *ptr) > +{ > + p2->f1 = p1->f1; > + p2->f2 = p1->f2; > + p2->f3 = p1->f3; > + *ptr = 7; > + p2->f4 = p1->f4; > + p2->f5 = p1->f5; > + p2->f6 = p1->f6; > + return 0; > +} > + > +/* { dg-final { scan-tree-dump "19" "esra" } } */ > +/* { dg-final { scan-tree-dump "8" "esra"} } */ That's an awfully unspecific dump scan ;) Please make it more specific and add a comment describing what code generation you expect. > +/* { dg-final { cleanup-tree-dump "esra" } } */ > + > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitfldmrg2.c > b/gcc/testsuite/gcc.dg/tree-ssa/bitfldmrg2.c > new file mode 100644 > index 0000000..653e904 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/bitfldmrg2.c > @@ -0,0 +1,90 @@ > +/* Check whether use of -ftree-bitfield-merge in the presence of overlapping > + unions results in incorrect code.
*/ > +/* { dg-options "-O2 -ftree-bitfield-merge" } */ > +/* { dg-do run } */ > +#include > +extern void abort (void); > + > +struct s1 > +{ > + unsigned f1:4; > + unsigned f2:4; > + unsigned f3:4; > +}; > + > +struct s2 > +{ > + unsigned char c; > + unsigned f1:4; > + unsigned f2:4; > + unsigned f3:4; > +}; > + > +struct s3 > +{ > + unsigned f1:3; > + unsigned f2:3; > + unsigned f3:3; > +}; > + > +struct s4 > +{ > + unsigned f0:3; > + unsigned f1:3; > + unsigned f2:3; > + unsigned f3:3; > +}; > + > +union un_1 > +{ > + struct s1 a; > + struct s2 b; > +}; > + > +union un_2 > +{ > + struct s3 a; > + struct s4 b; > +}; > + > +void f1 (union un_1 *p1, union un_1 *p2) > +{ > + p2->a.f3 = p1->b.f3; > + p2->a.f2 = p1->b.f2; > + p2->a.f1 = p1->b.f1; > + > + if (p1->b.f1 != 3) > + abort (); > +} > + > +void f2 (union un_2 *p1, union un_2 *p2) > +{ > + p2->b.f1 = p1->a.f1; > + p2->b.f2 = p1->a.f2; > + p2->b.f3 = p1->a.f3; > + > + if (p2->b.f1 != 0 || p2->b.f2 != 0 || p2->b.f3 != 0) > + abort (); > +} > + > +int main () > +{ > + union un_1 u1; > + union un_2 u2; > + > + u1.b.f1 = 1; > + u1.b.f2 = 2; > + u1.b.f3 = 3; > + u1.b.c = 0; > + > + f1 (&u1, &u1); > + > + u2.b.f0 = 0; > + u2.b.f1 = 1; > + u2.b.f2 = 2; > + u2.b.f3 = 3; > + > + f2 (&u2, &u2); > + > + return 0; > +} > diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c > index 58c7565..610245b 100644 > --- a/gcc/tree-sra.c > +++ b/gcc/tree-sra.c > @@ -92,6 +92,7 @@ along with GCC; see the file COPYING3. If not see > #include "gimple-pretty-print.h" > #include "ipa-inline.h" > #include "ipa-utils.h" > +#include "ggc.h" You shouldn't need that I think. > /* Enumeration of all aggregate reductions we can do. */ > enum sra_mode { SRA_MODE_EARLY_IPA, /* early call regularization */ > @@ -3424,12 +3425,476 @@ perform_intra_sra (void) > return ret; > } > > +/* This optimization combines several adjacent bit-field accesses that copy > + values from one memory location to another into one single bit-field > + access. 
*/ > + > +/* Data for single bit-field read/write sequence. */ > +struct GTY (()) bitfield_access_d { > + gimple load_stmt; /* Bit-field load statement. */ > + gimple store_stmt; /* Bit-field store statement. */ > + unsigned src_offset_words; /* Bit-field offset at src in words. */ > + unsigned src_bit_offset; /* Bit-field offset inside source word. */ > + unsigned src_bit_size; /* Size of bit-field in source word. */ > + unsigned dst_offset_words; /* Bit-field offset at dst in words. */ > + unsigned dst_bit_offset; /* Bit-field offset inside destination > + word. */ > + unsigned src_field_offset; /* Source field offset. */ > + unsigned dst_bit_size; /* Size of bit-field in destination word. */ > + tree src_addr; /* Address of source memory access. */ > + tree dst_addr; /* Address of destination memory access. */ > + bool merged; /* True if access is merged with another > + one. */ > + bool modified; /* True if bit-field size is modified. */ > + bool is_barrier; /* True if access is barrier (call or mem > + access). */ > + struct bitfield_access_d *next; /* Access with which this one is merged. */ > + tree bitfield_type; /* Field type. */ > + tree bitfield_representative; /* Bit field representative > of original > + declaration. */ > + tree field_decl_context; /* Context of original bit-field > + declaration. */ > +}; > + > +typedef struct bitfield_access_d bitfield_access_o; > +typedef struct bitfield_access_d *bitfield_access; > + > +/* Connecting register with bit-field access sequence that defines value in > + that register. */ > +struct GTY (()) bitfield_stmt_access_pair_d > +{ > + gimple stmt; > + bitfield_access access; > +}; > + > +typedef struct bitfield_stmt_access_pair_d bitfield_stmt_access_pair_o; > +typedef struct bitfield_stmt_access_pair_d *bitfield_stmt_access_pair; > + > +static GTY ((param_is (struct bitfield_stmt_access_pair_d))) > + htab_t bitfield_stmt_access_htab; Nor does this need to be registered with the garbage collector. 
Its lifetime is not greater than that of the SRA pass, right? > +/* Hash table callbacks for bitfield_stmt_access_htab. */ > + > +static hashval_t > +bitfield_stmt_access_pair_htab_hash (const void *p) > +{ > + const bitfield_stmt_access_pair entry = (const bitfield_stmt_access_pair)p; > + return (hashval_t) (uintptr_t) entry->stmt; > +} > + > +static int > +bitfield_stmt_access_pair_htab_eq (const void *p1, const void *p2) > +{ > + const struct bitfield_stmt_access_pair_d *entry1 = > + (const bitfield_stmt_access_pair)p1; > + const struct bitfield_stmt_access_pair_d *entry2 = > + (const bitfield_stmt_access_pair)p2; > + return entry1->stmt == entry2->stmt; > +} > + > +/* Counter used for generating unique names for new fields. */ > +static unsigned new_field_no; Must be stale...? > +/* Compare two bit-field access records. */ > + > +static int > +cmp_access (const void *p1, const void *p2) > +{ > + const bitfield_access a1 = (*(const bitfield_access*)p1); > + const bitfield_access a2 = (*(const bitfield_access*)p2); > + > + if (a1->bitfield_representative - a2->bitfield_representative) > + return a1->bitfield_representative - a2->bitfield_representative; Subtracting two unrelated pointers is undefined - I suggest you use DECL_UID (a1->bitfield_representative). > + if (!expressions_equal_p (a1->src_addr, a2->src_addr)) > + return a1 - a2; Same - the comparison ends up depending on memory layout, that's bad. > + > + if (!expressions_equal_p (a1->dst_addr, a2->dst_addr)) > + return a1 - a2; > + if (a1->src_offset_words - a2->src_offset_words) > + return a1->src_offset_words - a2->src_offset_words; > + > + return a1->src_bit_offset - a2->src_bit_offset; > +} > + > +/* Create new bit-field access structure and add it to given bitfield_accesses > + htab. 
*/ > + > +static bitfield_access > +create_and_insert_access (vec > + *bitfield_accesses) > +{ > + bitfield_access access = ggc_alloc_bitfield_access_d (); > + memset (access, 0, sizeof (struct bitfield_access_d)); > + bitfield_accesses->safe_push (access); > + return access; > +} > + > +/* Slightly modified add_bit_offset_attribute from dwarf2out.c. */ > + > +static inline HOST_WIDE_INT > +get_bit_offset (tree decl) > +{ > + tree type = DECL_BIT_FIELD_TYPE (decl); > + HOST_WIDE_INT bitpos_int; > + > + /* Must be a field and a bit-field. */ > + gcc_assert (type && TREE_CODE (decl) == FIELD_DECL); > + /* Bit position and decl size should be integer constants that can be > + represented in a single HOST_WIDE_INT. */ > + if (! host_integerp (bit_position (decl), 0) > + || ! host_integerp (DECL_SIZE (decl), 1)) > + return -1; > + > + bitpos_int = int_bit_position (decl); > + return bitpos_int; > +} > + > +/* Returns the size of the combined bit-fields. The size cannot be larger > + than the size of the largest directly accessible memory unit. */ > + > +static int > +get_merged_bit_field_size (bitfield_access access) > +{ > + bitfield_access tmp_access = access; > + int size = 0; > + > + while (tmp_access) > + { > + size += tmp_access->src_bit_size; > + tmp_access = tmp_access->next; > + } > + return size; > +} > + > +/* Adds a new pair consisting of a statement and the bit-field access > structure that contains it.
*/ > + > +static bool add_stmt_access_pair (bitfield_access access, gimple stmt) > +{ > + bitfield_stmt_access_pair new_pair; > + void **slot; > + if (!bitfield_stmt_access_htab) > + bitfield_stmt_access_htab = > + htab_create_ggc (128, bitfield_stmt_access_pair_htab_hash, > + bitfield_stmt_access_pair_htab_eq, NULL); > + new_pair = ggc_alloc_bitfield_stmt_access_pair_o (); > + new_pair->stmt = stmt; > + new_pair->access = access; > + slot = htab_find_slot (bitfield_stmt_access_htab, new_pair, INSERT); > + if (*slot == HTAB_EMPTY_ENTRY) > + { > + *slot = new_pair; > + return true; > + } > + return false; > +} > + > +/* Returns true if given COMPONENT_REF is part of an union. */ > + > +static bool part_of_union_p (tree component) > +{ > + tree tmp = component; > + bool res = false; > + while (TREE_CODE (tmp) == COMPONENT_REF) > + { > + if (TREE_CODE (TREE_TYPE (tmp)) == UNION_TYPE) > + { > + res = true; > + break; > + } > + tmp = TREE_OPERAND (tmp, 0); > + } > + if (tmp && (TREE_CODE (TREE_TYPE (tmp)) == UNION_TYPE)) > + res = true; > + return res; > +} > + > +/* Main entry point for the bit-field merge optimization. */ Ok, I'm skipping to here (I was wondering what you need all the above functions for) > +static unsigned int > +ssa_bitfield_merge (void) > +{ > + basic_block bb; > + unsigned int todoflags = 0; > + vec bitfield_accesses; > + int ix, iy; > + bitfield_access access; > + bool cfg_changed = false; > + > + /* In the strict volatile bitfields case, doing code changes here may prevent > + other optimizations, in particular in a SLOW_BYTE_ACCESS setting. */ > + if (flag_strict_volatile_bitfields > 0) > + return 0; Hmm, I'm not sure we should care ... - but well, I don't care ;) > + FOR_EACH_BB (bb) > + { > + gimple_stmt_iterator gsi; > + vec bitfield_accesses_merge = vNULL; > + tree prev_representative = NULL_TREE; > + bitfield_accesses.create (0); > + > + /* Identify all bitfield copy sequences in the basic-block. 
*/ > + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);) > + { > + gimple stmt = gsi_stmt (gsi); > + tree lhs, rhs; > + void **slot; > + struct bitfield_stmt_access_pair_d asdata; > + > + if (!is_gimple_assign (stmt)) Instead of checking TREE_THIS_VOLATILE below check here || gimple_has_volatile_ops (stmt) also you can narrow the assigns to visit by checking for !is_gimple_assign_single (stmt) instead > + { > + gsi_next (&gsi); > + continue; > + } > + > + lhs = gimple_assign_lhs (stmt); > + rhs = gimple_assign_rhs1 (stmt); > + > + if (TREE_CODE (rhs) == COMPONENT_REF) > + { > + use_operand_p use; > + gimple use_stmt; > + tree op0 = TREE_OPERAND (rhs, 0); > + tree op1 = TREE_OPERAND (rhs, 1); > + > + if (TREE_CODE (op1) == FIELD_DECL && DECL_BIT_FIELD_TYPE (op1) op1 is always a FIELD_DECL > + && !TREE_THIS_VOLATILE (op1) && !part_of_union_p (rhs)) I wonder what's the issue with unions ... sure, for union { int field1 : 3; int field2 : 2; }; but for union { struct { int field1 : 3; int field2 : 2; } a; ... }; ? Thus I'd simply check && TREE_CODE (DECL_CONTEXT (op1)) != UNION_TYPE && TREE_CODE (DECL_CONTEXT (op1)) != QUAL_UNION_TYPE maybe introduce #define UNION_TYPE_P(TYPE) (TREE_CODE (TYPE) == UNION_TYPE \ || TREE_CODE (TYPE) == QUAL_UNION_TYPE) alongside RECORD_OR_UNION_TYPE_P in tree.h. > + { > + if (single_imm_use (lhs, &use, &use_stmt) > + && is_gimple_assign (use_stmt)) > + { > + tree use_lhs = gimple_assign_lhs (use_stmt); > + if (TREE_CODE (use_lhs) == COMPONENT_REF) I'm not sure I follow the logic here, but it seems that you match a very specific pattern only - a bitfield copy. I'd have applied the lowering (this is really a lowering pass with a cost model you make up here) if the same DECL_BIT_FIELD_REPRESENTATIVE is used more than once in a BB on the same underlying base object. That is, we are reasonably sure we can remove a redundant load and eventually redundant stores. 
Thus, very simplistic you'd just record a op0, DECL_BIT_FIELD_REPRESENTATIVE pair into the hashtable, using iterative_hash_expr for op0 and DECL_UID for the representative, counting the number of times you see them. Make sure to apply the same for stores to bitfields, of course. > + { > + tree use_op0 = TREE_OPERAND (use_lhs, 0); > + tree use_op1 = TREE_OPERAND (use_lhs, 1); > + tree tmp_repr = DECL_BIT_FIELD_REPRESENTATIVE (op1); > + > + if (TREE_CODE (use_op1) == FIELD_DECL > + && DECL_BIT_FIELD_TYPE (use_op1) > + && !TREE_THIS_VOLATILE (use_op1)) > + { > + if (prev_representative > + && (prev_representative != tmp_repr)) > + { > + /* If previous access has different > + representative then barrier is needed > + between it and new access. */ > + access = create_and_insert_access > + (&bitfield_accesses); > + access->is_barrier = true; > + } > + prev_representative = tmp_repr; > + /* Create new bit-field access structure. */ > + access = create_and_insert_access > + (&bitfield_accesses); > + /* Collect access data - load instruction. */ > + access->src_bit_size = tree_low_cst > + (DECL_SIZE (op1), 1); > + access->src_bit_offset = get_bit_offset (op1); > + access->src_offset_words = > + field_byte_offset (op1) / UNITS_PER_WORD; > + access->src_field_offset = > + tree_low_cst (DECL_FIELD_OFFSET (op1),1); > + access->src_addr = op0; > + access->load_stmt = gsi_stmt (gsi); > + /* Collect access data - store instruction. 
*/ > + access->dst_bit_size = > + tree_low_cst (DECL_SIZE (use_op1), 1); > + access->dst_bit_offset = > + get_bit_offset (use_op1); > + access->dst_offset_words = > + field_byte_offset (use_op1) / UNITS_PER_WORD; > + access->dst_addr = use_op0; > + access->store_stmt = use_stmt; > + add_stmt_access_pair (access, stmt); > + add_stmt_access_pair (access, use_stmt); > + access->bitfield_type > + = DECL_BIT_FIELD_TYPE (use_op1); > + access->bitfield_representative = tmp_repr; > + access->field_decl_context = > + DECL_FIELD_CONTEXT (op1); > + } > + } > + } > + } > + } > + > + /* Insert barrier for merging if statement is function call or memory > + access. */ > + if (bitfield_stmt_access_htab) > + { > + asdata.stmt = stmt; > + slot > + = htab_find_slot (bitfield_stmt_access_htab, &asdata, > + NO_INSERT); > + if (!slot > + && ((gimple_code (stmt) == GIMPLE_CALL) > + || (gimple_has_mem_ops (stmt)))) > + { > + /* Create new bit-field access structure. */ > + access = create_and_insert_access (&bitfield_accesses); > + /* Mark it as barrier. */ > + access->is_barrier = true; > + } > + } > + gsi_next (&gsi); > + } > + > + /* If there are no at least two accesses go to the next basic block. */ > + if (bitfield_accesses.length () <= 1) > + { > + bitfield_accesses.release (); > + continue; > + } > + > + /* Find bit-field accesses that can be merged. 
*/ > + for (ix = 0; bitfield_accesses.iterate (ix, &access); ix++) > + { > + bitfield_access head_access; > + bitfield_access mrg_access; > + bitfield_access prev_access; > + if (!bitfield_accesses_merge.exists ()) > + bitfield_accesses_merge.create (0); > + > + bitfield_accesses_merge.safe_push (access); > + > + if (!access->is_barrier > + && !(access == bitfield_accesses.last () > + && !bitfield_accesses_merge.is_empty ())) > + continue; > + > + bitfield_accesses_merge.qsort (cmp_access); > + > + head_access = prev_access = NULL; > + for (iy = 0; bitfield_accesses_merge.iterate (iy, &mrg_access); iy++) > + { > + if (head_access > + && expressions_equal_p (head_access->src_addr, > + mrg_access->src_addr) > + && expressions_equal_p (head_access->dst_addr, > + mrg_access->dst_addr) > + && prev_access->src_offset_words > + == mrg_access->src_offset_words > + && prev_access->dst_offset_words > + == mrg_access->dst_offset_words > + && prev_access->src_bit_offset + prev_access->src_bit_size > + == mrg_access->src_bit_offset > + && prev_access->dst_bit_offset + prev_access->dst_bit_size > + == mrg_access->dst_bit_offset > + && prev_access->bitfield_representative > + == mrg_access->bitfield_representative) > + { > + /* Merge conditions are satisfied - merge accesses. */ > + mrg_access->merged = true; > + prev_access->next = mrg_access; > + head_access->modified = true; > + prev_access = mrg_access; > + } > + else > + head_access = prev_access = mrg_access; > + } > + bitfield_accesses_merge.release (); > + bitfield_accesses_merge = vNULL; > + } > + > + /* Modify generated code. */ Ick, so you are actually applying an optimization instead of just lowering the accesses to DECL_BIT_FIELD_REPRESENTATIVE accesses and letting CSE and DSE do their job. Hmm. I don't think this is a good idea. 
Instead what I'd like to see is more something like the following (bah, I never merged BIT_FIELD_EXPR which performs the bit-field-insert in a single stmt); quickly hacked together, without a cost model, just the lowering piece (likely not endian clean - but who knows). Index: gcc/tree-sra.c =================================================================== --- gcc/tree-sra.c (revision 204561) +++ gcc/tree-sra.c (working copy) @@ -3445,10 +3445,121 @@ perform_intra_sra (void) return ret; } +static void +lower_bitfields (void) +{ + basic_block bb; + FOR_EACH_BB (bb) + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); + !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple stmt = gsi_stmt (gsi); + if (!gimple_assign_single_p (stmt) + || gimple_has_volatile_ops (stmt)) + continue; + + /* Lower a bitfield read. */ + tree ref = gimple_assign_rhs1 (stmt); + if (TREE_CODE (ref) == COMPONENT_REF + && DECL_BIT_FIELD_TYPE (TREE_OPERAND (ref, 1)) + && DECL_BIT_FIELD_REPRESENTATIVE (TREE_OPERAND (ref, 1))) + { + tree field = TREE_OPERAND (ref, 1); + tree rep = DECL_BIT_FIELD_REPRESENTATIVE (field); + unsigned HOST_WIDE_INT off; + if (host_integerp (DECL_FIELD_OFFSET (field), 1) + && host_integerp (DECL_FIELD_OFFSET (rep), 1)) + off = (tree_low_cst (DECL_FIELD_OFFSET (field), 1) + - tree_low_cst (DECL_FIELD_OFFSET (rep), 1)) * BITS_PER_UNIT; + else + off = 0; + off += (tree_low_cst (DECL_FIELD_BIT_OFFSET (field), 1) + - tree_low_cst (DECL_FIELD_BIT_OFFSET (rep), 1)); + tree loadres = make_ssa_name (TREE_TYPE (rep), NULL); + gimple load + = gimple_build_assign (loadres, + build3 (COMPONENT_REF, TREE_TYPE (rep), + TREE_OPERAND (ref, 0), rep, + NULL_TREE)); + gimple_set_vuse (load, gimple_vuse (stmt)); + gsi_insert_before (&gsi, load, GSI_SAME_STMT); + gimple_assign_set_rhs1 (stmt, + build3 (BIT_FIELD_REF, TREE_TYPE (ref), + loadres, + DECL_SIZE (field), + bitsize_int (off))); + update_stmt (stmt); + } + ref = gimple_assign_lhs (stmt); + if (TREE_CODE (ref) == COMPONENT_REF + && 
DECL_BIT_FIELD_TYPE (TREE_OPERAND (ref, 1)) + && DECL_BIT_FIELD_REPRESENTATIVE (TREE_OPERAND (ref, 1))) + { + tree field = TREE_OPERAND (ref, 1); + tree rep = DECL_BIT_FIELD_REPRESENTATIVE (field); + unsigned HOST_WIDE_INT off; + if (host_integerp (DECL_FIELD_OFFSET (field), 1) + && host_integerp (DECL_FIELD_OFFSET (rep), 1)) + off = (tree_low_cst (DECL_FIELD_OFFSET (field), 1) + - tree_low_cst (DECL_FIELD_OFFSET (rep), 1)) * BITS_PER_UNIT; + else + off = 0; + off += (tree_low_cst (DECL_FIELD_BIT_OFFSET (field), 1) + - tree_low_cst (DECL_FIELD_BIT_OFFSET (rep), 1)); + tree loadres = make_ssa_name (TREE_TYPE (rep), NULL); + gimple load + = gimple_build_assign (loadres, + build3 (COMPONENT_REF, TREE_TYPE (rep), + unshare_expr (TREE_OPERAND (ref, 0)), + rep, + NULL_TREE)); + gimple_set_vuse (load, gimple_vuse (stmt)); + gsi_insert_before (&gsi, load, GSI_SAME_STMT); + /* FIXME: BIT_FIELD_EXPR. */ + /* Mask out bits. */ + tree masked = make_ssa_name (TREE_TYPE (rep), NULL); + gimple tems + = gimple_build_assign_with_ops (BIT_AND_EXPR, + masked, loadres, + double_int_to_tree (TREE_TYPE (rep), + double_int::mask (TREE_INT_CST_LOW (DECL_SIZE (field))).lshift (off))); + gsi_insert_before (&gsi, tems, GSI_SAME_STMT); + /* Zero-extend the value to representative size. */ + tree tem2 = make_ssa_name (unsigned_type_for (TREE_TYPE (field)), NULL); + tems = gimple_build_assign_with_ops (NOP_EXPR, tem2, + gimple_assign_rhs1 (stmt), + NULL_TREE); + gsi_insert_before (&gsi, tems, GSI_SAME_STMT); + tree tem = make_ssa_name (TREE_TYPE (rep), NULL); + tems = gimple_build_assign_with_ops (NOP_EXPR, tem, tem2, NULL_TREE); + gsi_insert_before (&gsi, tems, GSI_SAME_STMT); + /* Shift the value into place. */ + tem2 = make_ssa_name (TREE_TYPE (rep), NULL); + tems = gimple_build_assign_with_ops (LSHIFT_EXPR, tem2, tem, + size_int (off)); + gsi_insert_before (&gsi, tems, GSI_SAME_STMT); + /* Merge masked loaded value and value. 
*/ + tree modres = make_ssa_name (TREE_TYPE (rep), NULL); + gimple mod + = gimple_build_assign_with_ops (BIT_IOR_EXPR, modres, + masked, tem2); + gsi_insert_before (&gsi, mod, GSI_SAME_STMT); + /* Finally adjust the store. */ + gimple_assign_set_rhs1 (stmt, modres); + gimple_assign_set_lhs (stmt, + build3 (COMPONENT_REF, TREE_TYPE (rep), + TREE_OPERAND (ref, 0), rep, + NULL_TREE)); + update_stmt (stmt); + } + } +} + /* Perform early intraprocedural SRA. */ static unsigned int early_intra_sra (void) { + lower_bitfields (); sra_mode = SRA_MODE_EARLY_INTRA; return perform_intra_sra (); } The idea is that this lowering then makes ESRA able to handle it (you can see that for example in the result from gcc.c-torture/execute/20000113-1.c which is miscompiled by the above, eh ... what did I say about not testing it??). I'll try to get the above working and cleaned up a bit today and maybe early next week. So stay tuned, I'll hand it over to make you test if it works as advertised for you. Richard. > + for (ix = 0; bitfield_accesses.iterate (ix, &access); ix++) > + { > + if (access->merged) > + { > + /* Access merged - remove instructions. */ > + gimple_stmt_iterator tmp_gsi; > + tmp_gsi = gsi_for_stmt (access->load_stmt); > + gsi_remove (&tmp_gsi, true); > + tmp_gsi = gsi_for_stmt (access->store_stmt); > + gsi_remove (&tmp_gsi, true); > + } > + else if (access->modified) > + { > + /* Access modified - modify generated code. */ > + gimple_stmt_iterator tmp_gsi; > + tree tmp_ssa; > + tree itype = make_node (INTEGER_TYPE); > + tree new_rhs; > + tree new_lhs; > + gimple new_stmt; > + char new_field_name [15]; > + int decl_size; > + > + /* Bit-field size changed - modify load statement. */ > + access->src_bit_size = get_merged_bit_field_size (access); > + > + TYPE_PRECISION (itype) = access->src_bit_size; > + fixup_unsigned_type (itype); > + > + /* Create new declaration. 
*/ > + tree new_field = make_node (FIELD_DECL); > + sprintf (new_field_name, "_field%x", new_field_no++); > + DECL_NAME (new_field) = get_identifier (new_field_name); > + TREE_TYPE (new_field) = itype; > + DECL_BIT_FIELD (new_field) = 1; > + DECL_BIT_FIELD_TYPE (new_field) = access->bitfield_type; > + DECL_BIT_FIELD_REPRESENTATIVE (new_field) = > + access->bitfield_representative; > + DECL_FIELD_CONTEXT (new_field) = access->field_decl_context; > + DECL_NONADDRESSABLE_P (new_field) = 1; > + DECL_FIELD_OFFSET (new_field) = > + build_int_cst (unsigned_type_node, access->src_field_offset); > + DECL_FIELD_BIT_OFFSET (new_field) = > + build_int_cst (unsigned_type_node, access->src_bit_offset); > + DECL_SIZE (new_field) = build_int_cst (unsigned_type_node, > + access->src_bit_size); > + decl_size = access->src_bit_size / BITS_PER_UNIT > + + (access->src_bit_size % BITS_PER_UNIT ? 1 : 0); > + DECL_SIZE_UNIT (new_field) = > + build_int_cst (unsigned_type_node, decl_size); > + > + tmp_ssa = make_ssa_name (create_tmp_var (itype, NULL), NULL); > + > + /* Create new component ref. */ > + new_rhs = build3 (COMPONENT_REF, itype, access->src_addr, > + new_field, NULL); > + tmp_gsi = gsi_for_stmt (access->load_stmt); > + new_stmt = gimple_build_assign (tmp_ssa, new_rhs); > + gsi_insert_after (&tmp_gsi, new_stmt, GSI_SAME_STMT); > + SSA_NAME_DEF_STMT (tmp_ssa) = new_stmt; > + gsi_remove (&tmp_gsi, true); > + > + /* Bit-field size changed - modify store statement. */ > + /* Create new component ref. */ > + new_lhs = build3 (COMPONENT_REF, itype, access->dst_addr, > + new_field, NULL); > + new_stmt = gimple_build_assign (new_lhs, tmp_ssa); > + tmp_gsi = gsi_for_stmt (access->store_stmt); > + gsi_insert_after (&tmp_gsi, new_stmt, GSI_SAME_STMT); > + gsi_remove (&tmp_gsi, true); > + cfg_changed = true; > + } > + } > + /* Empty or delete data structures used for basic block. 
*/
> +  htab_empty (bitfield_stmt_access_htab);
> +  bitfield_stmt_access_htab = NULL;
> +  bitfield_accesses.release ();
> +  }
> +
> +  if (cfg_changed)
> +    todoflags |= TODO_cleanup_cfg;
> +
> +  return todoflags;
> +}
> +
>  /* Perform early intraprocedural SRA.  */
>  static unsigned int
>  early_intra_sra (void)
>  {
> +  unsigned int todoflags = 0;
>    sra_mode = SRA_MODE_EARLY_INTRA;
> -  return perform_intra_sra ();
> +  if (flag_tree_bitfield_merge)
> +    todoflags = ssa_bitfield_merge ();
> +  return todoflags | perform_intra_sra ();
>  }
> 
>  /* Perform "late" intraprocedural SRA. */
> @@ -5105,3 +5570,5 @@ make_pass_early_ipa_sra (gcc::context *ctxt)
>  {
>    return new pass_early_ipa_sra (ctxt);
>  }
> +
> +#include "gt-tree-sra.h"
> diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
> index bd2feb4..683fd76 100644
> --- a/gcc/tree-ssa-sccvn.c
> +++ b/gcc/tree-ssa-sccvn.c
> @@ -4176,29 +4176,6 @@ get_next_value_id (void)
>    return next_value_id++;
>  }
> 
> -
> -/* Compare two expressions E1 and E2 and return true if they are equal. */
> -
> -bool
> -expressions_equal_p (tree e1, tree e2)
> -{
> -  /* The obvious case.  */
> -  if (e1 == e2)
> -    return true;
> -
> -  /* If only one of them is null, they cannot be equal.  */
> -  if (!e1 || !e2)
> -    return false;
> -
> -  /* Now perform the actual comparison.  */
> -  if (TREE_CODE (e1) == TREE_CODE (e2)
> -      && operand_equal_p (e1, e2, OEP_PURE_SAME))
> -    return true;
> -
> -  return false;
> -}
> -
> -
>  /* Return true if the nary operation NARY may trap.  This is a copy
>     of stmt_could_throw_1_p adjusted to the SCCVN IL.  */
> 
> diff --git a/gcc/tree-ssa-sccvn.h b/gcc/tree-ssa-sccvn.h
> index 94e3603..707b18c 100644
> --- a/gcc/tree-ssa-sccvn.h
> +++ b/gcc/tree-ssa-sccvn.h
> @@ -21,10 +21,6 @@
>  #ifndef TREE_SSA_SCCVN_H
>  #define TREE_SSA_SCCVN_H
> 
> -/* In tree-ssa-sccvn.c */
> -bool expressions_equal_p (tree, tree);
> -
> -
>  /* TOP of the VN lattice.
*/
>  extern tree VN_TOP;
> 
> diff --git a/gcc/tree.c b/gcc/tree.c
> index 6593cf8..6683957 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -12155,4 +12155,44 @@ contains_bitfld_component_ref_p (const_tree ref)
>    return false;
>  }
> 
> +/* Compare two expressions E1 and E2 and return true if they are equal.  */
> +
> +bool
> +expressions_equal_p (tree e1, tree e2)
> +{
> +  /* The obvious case.  */
> +  if (e1 == e2)
> +    return true;
> +
> +  /* If only one of them is null, they cannot be equal.  */
> +  if (!e1 || !e2)
> +    return false;
> +
> +  /* Now perform the actual comparison.  */
> +  if (TREE_CODE (e1) == TREE_CODE (e2)
> +      && operand_equal_p (e1, e2, OEP_PURE_SAME))
> +    return true;
> +
> +  return false;
> +}
> +
> +/* Given a pointer to a tree node, assumed to be some kind of a ..._TYPE
> +   node, return the size in bits for the type if it is a constant, or else
> +   return the alignment for the type if the type's size is not constant, or
> +   else return BITS_PER_WORD if the type actually turns out to be an
> +   ERROR_MARK node.
*/
> +
> +unsigned HOST_WIDE_INT
> +simple_type_size_in_bits (const_tree type)
> +{
> +  if (TREE_CODE (type) == ERROR_MARK)
> +    return BITS_PER_WORD;
> +  else if (TYPE_SIZE (type) == NULL_TREE)
> +    return 0;
> +  else if (host_integerp (TYPE_SIZE (type), 1))
> +    return tree_low_cst (TYPE_SIZE (type), 1);
> +  else
> +    return TYPE_ALIGN (type);
> +}
> +
>  #include "gt-tree.h"
> diff --git a/gcc/tree.h b/gcc/tree.h
> index a263a2c..b2bd481 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -4528,6 +4528,7 @@ extern tree get_ref_base_and_extent (tree, HOST_WIDE_INT *,
>  				     HOST_WIDE_INT *, HOST_WIDE_INT *);
>  extern bool contains_bitfld_component_ref_p (const_tree);
>  extern bool type_in_anonymous_namespace_p (tree);
> +extern bool expressions_equal_p (tree e1, tree e2);
> 
>  /* In tree-nested.c */
>  extern tree build_addr (tree, tree);
> @@ -4693,6 +4694,10 @@ extern tree resolve_asm_operand_names (tree, tree, tree, tree);
>  extern tree tree_overlaps_hard_reg_set (tree, HARD_REG_SET *);
>  #endif
> 
> +/* In dwarf2out.c.  */
> +HOST_WIDE_INT
> +field_byte_offset (const_tree decl);
> +
> ?
>  /* In tree-inline.c */
> 
> @@ -4979,5 +4984,6 @@ builtin_decl_implicit_p (enum built_in_function fncode)
>  #endif /* NO_DOLLAR_IN_LABEL */
>  #endif /* NO_DOT_IN_LABEL */
> 
> +extern unsigned HOST_WIDE_INT simple_type_size_in_bits (const_tree type);
> 
>  #endif /* GCC_TREE_H */
> 
> 
> Regards,
> Zoran Jovanovic

-- 
Richard Biener
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend