public inbox for gcc-patches@gcc.gnu.org
* [C++0x] contiguous bitfields race implementation
@ 2011-05-09 17:12 Aldy Hernandez
  2011-05-09 18:04 ` Jeff Law
  0 siblings, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-05-09 17:12 UTC (permalink / raw)
  To: Jason Merrill; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1324 bytes --]

Seeing that the current C++ draft has been approved, I'd like to submit 
this for mainline, and get the proper review everyone's been quietly 
avoiding :).

To refresh everyone's memory, here is the problem:

struct
{
     unsigned int a : 4;
     unsigned char b;
     unsigned int c: 6;
} var;


void seta(){
       var.a = 12;
}


In the new C++ standard, stores into <a> cannot touch <b>, so we can't 
store with anything wider (e.g., a 32-bit store) that would touch <b>. 
This problem can be seen on strictly aligned targets such as ARM, where 
we store the above sequence with a 32-bit store, or on x86-64 with <a> 
being volatile (PR48124).
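
To make the race concrete, here is a sketch in C (my illustration, not 
GCC output) of what such a widened read-modify-write of <a> amounts to, 
assuming <a> occupies the low four bits:

   unsigned int tmp;
   tmp = *(unsigned int *) &var;  /* reads <a>, <b> and <c> */
   tmp = (tmp & ~0xfU) | 12;      /* update the 4-bit field <a> */
   *(unsigned int *) &var = tmp;  /* writes <b> back; a store to <b>
                                     by another thread in between is
                                     silently lost */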

This patch fixes both problems, but only for the C++ memory model.  This 
is NOT a generic fix for PR48124, only a fix when using "--param 
allow-store-data-races=0".  I will gladly change the parameter name if 
another is preferred.
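
For reference, the parameter is passed like any other --param; e.g. 
(file name made up):

   gcc -O2 --param allow-store-data-races=0 -S seta.c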

The gist of this patch is in max_field_size(), where we calculate the 
maximum number of bits we can store into.  In doing this calculation I 
assume we can store into the padding without causing any races.  So, 
padding between fields and at the end of the structure is included.
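
To make that concrete with the struct above: when storing into <a>, the 
next field <b> is a non-bit-field starting at bit 8, so max_field_size() 
returns 8 (the 4 bits of <a> plus 4 bits of padding).  When storing into 
<c>, the last field, it returns everything from <c>'s bit position to 
the end of the structure, trailing padding included.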

Tested on x86-64 both with and without "--param 
allow-store-data-races=0", and visually inspecting the assembly on 
arm-linux and ia64-linux.

OK for trunk?
Aldy

[-- Attachment #2: curr --]
[-- Type: text/plain, Size: 26957 bytes --]

	* params.h (ALLOW_STORE_DATA_RACES): New.
	* params.def (PARAM_ALLOW_STORE_DATA_RACES): New.
	* Makefile.in (expr.o): Depend on PARAMS_H.
	* machmode.h (get_best_mode): Add argument.
	* fold-const.c (optimize_bit_field_compare): Add argument to
	get_best_mode.
	(fold_truthop): Same.
	* ifcvt.c (noce_emit_move_insn): Add argument to store_bit_field.
	* expr.c (emit_group_store): Same.
	(copy_blkmode_from_reg): Same.
	(write_complex_part): Same.
	(optimize_bitfield_assignment_op): Add argument.
	Add argument to get_best_mode.
	(max_field_size): New.
	(expand_assignment): Calculate maxbits and pass it down
	accordingly.
	(store_field): New argument.
	(expand_expr_real_2): New argument to store_field.
	Include params.h.
	* expr.h (store_bit_field): New argument.
	* stor-layout.c (get_best_mode): Restrict mode expansion by taking
	into account maxbits.
	* calls.c (store_unaligned_arguments_into_pseudos): New argument
	to store_bit_field.
	* expmed.c (store_bit_field_1): New argument.  Use it.
	(store_bit_field): Same.
	(store_fixed_bit_field): Same.
	(store_split_bit_field): Same.
	(extract_bit_field_1): Pass new argument to get_best_mode.
	(extract_fixed_bit_field): Same.
	* stmt.c (expand_return): Pass new argument to store_bit_field.
	* tree.h (DECL_THREAD_VISIBLE_P): New.
	* doc/invoke.texi: Document parameter allow-store-data-races.

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 173263)
+++ doc/invoke.texi	(working copy)
@@ -8886,6 +8886,11 @@ The maximum number of conditional stores
 if either vectorization (@option{-ftree-vectorize}) or if-conversion
 (@option{-ftree-loop-if-convert}) is disabled.  The default is 2.
 
+@item allow-store-data-races
+Allow optimizers to introduce new data races on stores.
+Set to 1 to allow, otherwise to 0.  This option is enabled by default
+unless implicitly set by the @option{-fmemory-model=} option.
+
 @end table
 @end table
 
Index: machmode.h
===================================================================
--- machmode.h	(revision 173263)
+++ machmode.h	(working copy)
@@ -248,7 +248,9 @@ extern enum machine_mode mode_for_vector
 
 /* Find the best mode to use to access a bit field.  */
 
-extern enum machine_mode get_best_mode (int, int, unsigned int,
+extern enum machine_mode get_best_mode (int, int,
+					unsigned HOST_WIDE_INT,
+					unsigned int,
 					enum machine_mode, int);
 
 /* Determine alignment, 1<=result<=BIGGEST_ALIGNMENT.  */
Index: tree.h
===================================================================
--- tree.h	(revision 173263)
+++ tree.h	(working copy)
@@ -3156,6 +3156,10 @@ struct GTY(()) tree_parm_decl {
 #define DECL_THREAD_LOCAL_P(NODE) \
   (VAR_DECL_CHECK (NODE)->decl_with_vis.tls_model >= TLS_MODEL_REAL)
 
+/* Return true if a VAR_DECL is visible from another thread.  */
+#define DECL_THREAD_VISIBLE_P(NODE) \
+  (TREE_STATIC (NODE) && !DECL_THREAD_LOCAL_P (NODE))
+
 /* In a non-local VAR_DECL with static storage duration, true if the
    variable has an initialization priority.  If false, the variable
    will be initialized at the DEFAULT_INIT_PRIORITY.  */
Index: fold-const.c
===================================================================
--- fold-const.c	(revision 173263)
+++ fold-const.c	(working copy)
@@ -3409,7 +3409,7 @@ optimize_bit_field_compare (location_t l
       && flag_strict_volatile_bitfields > 0)
     nmode = lmode;
   else
-    nmode = get_best_mode (lbitsize, lbitpos,
+    nmode = get_best_mode (lbitsize, lbitpos, 0,
 			   const_p ? TYPE_ALIGN (TREE_TYPE (linner))
 			   : MIN (TYPE_ALIGN (TREE_TYPE (linner)),
 				  TYPE_ALIGN (TREE_TYPE (rinner))),
@@ -5237,7 +5237,7 @@ fold_truthop (location_t loc, enum tree_
      to be relative to a field of that size.  */
   first_bit = MIN (ll_bitpos, rl_bitpos);
   end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
-  lnmode = get_best_mode (end_bit - first_bit, first_bit,
+  lnmode = get_best_mode (end_bit - first_bit, first_bit, 0,
 			  TYPE_ALIGN (TREE_TYPE (ll_inner)), word_mode,
 			  volatilep);
   if (lnmode == VOIDmode)
@@ -5302,7 +5302,7 @@ fold_truthop (location_t loc, enum tree_
 
       first_bit = MIN (lr_bitpos, rr_bitpos);
       end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
-      rnmode = get_best_mode (end_bit - first_bit, first_bit,
+      rnmode = get_best_mode (end_bit - first_bit, first_bit, 0,
 			      TYPE_ALIGN (TREE_TYPE (lr_inner)), word_mode,
 			      volatilep);
       if (rnmode == VOIDmode)
Index: params.h
===================================================================
--- params.h	(revision 173263)
+++ params.h	(working copy)
@@ -206,4 +206,6 @@ extern void init_param_values (int *para
   PARAM_VALUE (PARAM_MIN_NONDEBUG_INSN_UID)
 #define MAX_STORES_TO_SINK \
   PARAM_VALUE (PARAM_MAX_STORES_TO_SINK)
+#define ALLOW_STORE_DATA_RACES \
+  PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES)
 #endif /* ! GCC_PARAMS_H */
Index: ifcvt.c
===================================================================
--- ifcvt.c	(revision 173263)
+++ ifcvt.c	(working copy)
@@ -885,7 +885,7 @@ noce_emit_move_insn (rtx x, rtx y)
 		}
 
 	      gcc_assert (start < (MEM_P (op) ? BITS_PER_UNIT : BITS_PER_WORD));
-	      store_bit_field (op, size, start, GET_MODE (x), y);
+	      store_bit_field (op, size, start, 0, GET_MODE (x), y);
 	      return;
 	    }
 
@@ -939,7 +939,7 @@ noce_emit_move_insn (rtx x, rtx y)
   inner = XEXP (outer, 0);
   outmode = GET_MODE (outer);
   bitpos = SUBREG_BYTE (outer) * BITS_PER_UNIT;
-  store_bit_field (inner, GET_MODE_BITSIZE (outmode), bitpos, outmode, y);
+  store_bit_field (inner, GET_MODE_BITSIZE (outmode), bitpos, 0, outmode, y);
 }
 
 /* Return sequence of instructions generated by if conversion.  This
Index: expr.c
===================================================================
--- expr.c	(revision 173263)
+++ expr.c	(working copy)
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  
 #include "diagnostic.h"
 #include "ssaexpand.h"
 #include "target-globals.h"
+#include "params.h"
 
 /* Decide whether a function's arguments should be processed
    from first to last or from last to first.
@@ -142,7 +143,8 @@ static void store_constructor_field (rtx
 				     HOST_WIDE_INT, enum machine_mode,
 				     tree, tree, int, alias_set_type);
 static void store_constructor (tree, rtx, int, HOST_WIDE_INT);
-static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT, enum machine_mode,
+static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT,
+			unsigned HOST_WIDE_INT, enum machine_mode,
 			tree, tree, alias_set_type, bool);
 
 static unsigned HOST_WIDE_INT highest_pow2_factor_for_target (const_tree, const_tree);
@@ -2063,7 +2065,7 @@ emit_group_store (rtx orig_dst, rtx src,
 	emit_move_insn (adjust_address (dest, mode, bytepos), tmps[i]);
       else
 	store_bit_field (dest, bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
-			 mode, tmps[i]);
+			 0, mode, tmps[i]);
     }
 
   /* Copy from the pseudo into the (probable) hard reg.  */
@@ -2157,7 +2159,7 @@ copy_blkmode_from_reg (rtx tgtblk, rtx s
 
       /* Use xbitpos for the source extraction (right justified) and
 	 bitpos for the destination store (left justified).  */
-      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, copy_mode,
+      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, 0, copy_mode,
 		       extract_bit_field (src, bitsize,
 					  xbitpos % BITS_PER_WORD, 1, false,
 					  NULL_RTX, copy_mode, copy_mode));
@@ -2794,7 +2796,7 @@ write_complex_part (rtx cplx, rtx val, b
 	gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
     }
 
-  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, imode, val);
+  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, 0, imode, val);
 }
 
 /* Extract one of the components of the complex value CPLX.  Extract the
@@ -3929,6 +3931,7 @@ get_subtarget (rtx x)
 static bool
 optimize_bitfield_assignment_op (unsigned HOST_WIDE_INT bitsize,
 				 unsigned HOST_WIDE_INT bitpos,
+				 unsigned HOST_WIDE_INT maxbits,
 				 enum machine_mode mode1, rtx str_rtx,
 				 tree to, tree src)
 {
@@ -3989,7 +3992,7 @@ optimize_bitfield_assignment_op (unsigne
 
       if (str_bitsize == 0 || str_bitsize > BITS_PER_WORD)
 	str_mode = word_mode;
-      str_mode = get_best_mode (bitsize, bitpos,
+      str_mode = get_best_mode (bitsize, bitpos, maxbits,
 				MEM_ALIGN (str_rtx), str_mode, 0);
       if (str_mode == VOIDmode)
 	return false;
@@ -4098,6 +4101,92 @@ optimize_bitfield_assignment_op (unsigne
   return false;
 }
 
+/* In the C++ memory model, consecutive bit fields in a structure are
+   considered one memory location.
+
+   Given a COMPONENT_REF, this function returns the maximum number of
+   bits we are allowed to store into, when storing into the
+   COMPONENT_REF.  We return 0, if there is no restriction.
+
+   EXP is the COMPONENT_REF.
+
+   BITPOS is the position in bits where the bit starts within the structure.
+   BITSIZE is size in bits of the field being referenced in EXP.
+
+   For example, while storing into FOO.A here...
+
+      struct {
+        BIT 0:
+          unsigned int a : 4;
+	  unsigned int b : 1;
+	BIT 8:
+	  unsigned char c;
+	  unsigned int d : 6;
+      } foo;
+
+   ...we are not allowed to store past <b>, so for the layout above,
+   we would return 8 maximum bits (because who cares if we store into
+   the padding).  */
+
+
+static unsigned HOST_WIDE_INT
+max_field_size (tree exp, HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize)
+{
+  tree field, record_type, fld;
+  HOST_WIDE_INT maxbits = bitsize;
+
+  gcc_assert (TREE_CODE (exp) == COMPONENT_REF);
+
+  /* If other threads can't see this value, no need to restrict stores.  */
+  if (ALLOW_STORE_DATA_RACES
+      || !DECL_THREAD_VISIBLE_P (TREE_OPERAND (exp, 0)))
+    return 0;
+
+  field = TREE_OPERAND (exp, 1);
+  record_type = DECL_FIELD_CONTEXT (field);
+
+  /* Find the original field within the structure.  */
+  for (fld = TYPE_FIELDS (record_type); fld; fld = DECL_CHAIN (fld))
+    if (fld == field)
+      break;
+  gcc_assert (fld == field);
+
+  /* If this is the last element in the structure, we can touch from
+     BITPOS to the end of the structure (including the padding).  */
+  if (!DECL_CHAIN (fld))
+    return TREE_INT_CST_LOW (TYPE_SIZE (record_type)) - bitpos;
+
+  /* Count contiguous bit fields not separated by a 0-length bit-field.  */
+  for (fld = DECL_CHAIN (fld); fld; fld = DECL_CHAIN (fld))
+    {
+      tree t, offset;
+      enum machine_mode mode;
+      int unsignedp, volatilep;
+
+      if (TREE_CODE (fld) != FIELD_DECL)
+	continue;
+
+      t = build3 (COMPONENT_REF, TREE_TYPE (exp),
+		  unshare_expr (TREE_OPERAND (exp, 0)),
+		  fld, NULL_TREE);
+      get_inner_reference (t, &bitsize, &bitpos, &offset,
+			   &mode, &unsignedp, &volatilep, true);
+
+      /* Only count contiguous bit fields, that are not separated by a
+	 zero-length bit field.  */
+      if (!DECL_BIT_FIELD (fld)
+	  || bitsize == 0)
+	{
+	  /* Include the padding up to the next field.  */
+	  maxbits += bitpos - maxbits;
+	  break;
+	}
+
+      maxbits += bitsize;
+    }
+
+  return maxbits;
+}
 
 /* Expand an assignment that stores the value of FROM into TO.  If NONTEMPORAL
    is true, try generating a nontemporal store.  */
@@ -4197,6 +4286,9 @@ expand_assignment (tree to, tree from, b
     {
       enum machine_mode mode1;
       HOST_WIDE_INT bitsize, bitpos;
+      /* Max consecutive bits we are allowed to touch while storing
+	 into TO.  */
+      HOST_WIDE_INT maxbits = 0;
       tree offset;
       int unsignedp;
       int volatilep = 0;
@@ -4206,6 +4298,10 @@ expand_assignment (tree to, tree from, b
       tem = get_inner_reference (to, &bitsize, &bitpos, &offset, &mode1,
 				 &unsignedp, &volatilep, true);
 
+      if (TREE_CODE (to) == COMPONENT_REF
+	  && DECL_BIT_FIELD (TREE_OPERAND (to, 1)))
+	maxbits = max_field_size (to, bitpos, bitsize);
+
       /* If we are going to use store_bit_field and extract_bit_field,
 	 make sure to_rtx will be safe for multiple use.  */
 
@@ -4286,12 +4382,13 @@ expand_assignment (tree to, tree from, b
 	    result = store_expr (from, XEXP (to_rtx, bitpos != 0), false,
 				 nontemporal);
 	  else if (bitpos + bitsize <= mode_bitsize / 2)
-	    result = store_field (XEXP (to_rtx, 0), bitsize, bitpos,
+	    result = store_field (XEXP (to_rtx, 0), bitsize, bitpos, maxbits,
 				  mode1, from, TREE_TYPE (tem),
 				  get_alias_set (to), nontemporal);
 	  else if (bitpos >= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 1), bitsize,
-				  bitpos - mode_bitsize / 2, mode1, from,
+				  bitpos - mode_bitsize / 2, maxbits,
+				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
 	  else if (bitpos == 0 && bitsize == mode_bitsize)
@@ -4312,7 +4409,8 @@ expand_assignment (tree to, tree from, b
 					    0);
 	      write_complex_part (temp, XEXP (to_rtx, 0), false);
 	      write_complex_part (temp, XEXP (to_rtx, 1), true);
-	      result = store_field (temp, bitsize, bitpos, mode1, from,
+	      result = store_field (temp, bitsize, bitpos, maxbits,
+				    mode1, from,
 				    TREE_TYPE (tem), get_alias_set (to),
 				    nontemporal);
 	      emit_move_insn (XEXP (to_rtx, 0), read_complex_part (temp, false));
@@ -4337,11 +4435,12 @@ expand_assignment (tree to, tree from, b
 		MEM_KEEP_ALIAS_SET_P (to_rtx) = 1;
 	    }
 
-	  if (optimize_bitfield_assignment_op (bitsize, bitpos, mode1,
+	  if (optimize_bitfield_assignment_op (bitsize, bitpos, maxbits, mode1,
 					       to_rtx, to, from))
 	    result = NULL;
 	  else
-	    result = store_field (to_rtx, bitsize, bitpos, mode1, from,
+	    result = store_field (to_rtx, bitsize, bitpos, maxbits,
+				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
 	}
@@ -4734,7 +4833,7 @@ store_expr (tree exp, rtx target, int ca
 			      : BLOCK_OP_NORMAL));
 	  else if (GET_MODE (target) == BLKmode)
 	    store_bit_field (target, INTVAL (expr_size (exp)) * BITS_PER_UNIT,
-			     0, GET_MODE (temp), temp);
+			     0, 0, GET_MODE (temp), temp);
 	  else
 	    convert_move (target, temp, unsignedp);
 	}
@@ -5177,7 +5276,8 @@ store_constructor_field (rtx target, uns
       store_constructor (exp, target, cleared, bitsize / BITS_PER_UNIT);
     }
   else
-    store_field (target, bitsize, bitpos, mode, exp, type, alias_set, false);
+    store_field (target, bitsize, bitpos, 0, mode, exp, type, alias_set,
+		 false);
 }
 
 /* Store the value of constructor EXP into the rtx TARGET.
@@ -5751,6 +5851,8 @@ store_constructor (tree exp, rtx target,
    BITSIZE bits, starting BITPOS bits from the start of TARGET.
    If MODE is VOIDmode, it means that we are storing into a bit-field.
 
+   MAXBITS is the number of bits we can store into, 0 if no limit.
+
    Always return const0_rtx unless we have something particular to
    return.
 
@@ -5764,6 +5866,7 @@ store_constructor (tree exp, rtx target,
 
 static rtx
 store_field (rtx target, HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
+	     unsigned HOST_WIDE_INT maxbits,
 	     enum machine_mode mode, tree exp, tree type,
 	     alias_set_type alias_set, bool nontemporal)
 {
@@ -5796,8 +5899,8 @@ store_field (rtx target, HOST_WIDE_INT b
       if (bitsize != (HOST_WIDE_INT) GET_MODE_BITSIZE (GET_MODE (target)))
 	emit_move_insn (object, target);
 
-      store_field (blk_object, bitsize, bitpos, mode, exp, type, alias_set,
-		   nontemporal);
+      store_field (blk_object, bitsize, bitpos, maxbits,
+		   mode, exp, type, alias_set, nontemporal);
 
       emit_move_insn (target, object);
 
@@ -5911,7 +6014,7 @@ store_field (rtx target, HOST_WIDE_INT b
 	}
 
       /* Store the value in the bitfield.  */
-      store_bit_field (target, bitsize, bitpos, mode, temp);
+      store_bit_field (target, bitsize, bitpos, maxbits, mode, temp);
 
       return const0_rtx;
     }
@@ -7323,7 +7426,7 @@ expand_expr_real_2 (sepops ops, rtx targ
 						    (treeop0))
 				 * BITS_PER_UNIT),
 				(HOST_WIDE_INT) GET_MODE_BITSIZE (mode)),
-			   0, TYPE_MODE (valtype), treeop0,
+			   0, 0, TYPE_MODE (valtype), treeop0,
 			   type, 0, false);
 	    }
 
Index: expr.h
===================================================================
--- expr.h	(revision 173263)
+++ expr.h	(working copy)
@@ -665,7 +665,8 @@ extern enum machine_mode
 mode_for_extraction (enum extraction_pattern, int);
 
 extern void store_bit_field (rtx, unsigned HOST_WIDE_INT,
-			     unsigned HOST_WIDE_INT, enum machine_mode, rtx);
+			     unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
+			     enum machine_mode, rtx);
 extern rtx extract_bit_field (rtx, unsigned HOST_WIDE_INT,
 			      unsigned HOST_WIDE_INT, int, bool, rtx,
 			      enum machine_mode, enum machine_mode);
Index: stor-layout.c
===================================================================
--- stor-layout.c	(revision 173263)
+++ stor-layout.c	(working copy)
@@ -2428,6 +2428,9 @@ fixup_unsigned_type (tree type)
 /* Find the best machine mode to use when referencing a bit field of length
    BITSIZE bits starting at BITPOS.
 
+   MAXBITS is the maximum number of bits we are allowed to touch, when
+   referencing this bit field.  MAXBITS is 0 if there is no limit.
+
    The underlying object is known to be aligned to a boundary of ALIGN bits.
    If LARGEST_MODE is not VOIDmode, it means that we should not use a mode
    larger than LARGEST_MODE (usually SImode).
@@ -2445,7 +2448,8 @@ fixup_unsigned_type (tree type)
    decide which of the above modes should be used.  */
 
 enum machine_mode
-get_best_mode (int bitsize, int bitpos, unsigned int align,
+get_best_mode (int bitsize, int bitpos, unsigned HOST_WIDE_INT maxbits,
+	       unsigned int align,
 	       enum machine_mode largest_mode, int volatilep)
 {
   enum machine_mode mode;
@@ -2484,6 +2488,7 @@ get_best_mode (int bitsize, int bitpos, 
 	  if (bitpos / unit == (bitpos + bitsize - 1) / unit
 	      && unit <= BITS_PER_WORD
 	      && unit <= MIN (align, BIGGEST_ALIGNMENT)
+	      && (!maxbits || unit <= maxbits)
 	      && (largest_mode == VOIDmode
 		  || unit <= GET_MODE_BITSIZE (largest_mode)))
 	    wide_mode = tmode;
Index: calls.c
===================================================================
--- calls.c	(revision 173263)
+++ calls.c	(working copy)
@@ -909,7 +909,7 @@ store_unaligned_arguments_into_pseudos (
 	    emit_move_insn (reg, const0_rtx);
 
 	    bytes -= bitsize / BITS_PER_UNIT;
-	    store_bit_field (reg, bitsize, endian_correction, word_mode,
+	    store_bit_field (reg, bitsize, endian_correction, 0, word_mode,
 			     word);
 	  }
       }
Index: expmed.c
===================================================================
--- expmed.c	(revision 173263)
+++ expmed.c	(working copy)
@@ -47,9 +47,13 @@ struct target_expmed *this_target_expmed
 
 static void store_fixed_bit_field (rtx, unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT, rtx);
+				   unsigned HOST_WIDE_INT,
+				   unsigned HOST_WIDE_INT,
+				   rtx);
 static void store_split_bit_field (rtx, unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT, rtx);
+				   unsigned HOST_WIDE_INT,
+				   unsigned HOST_WIDE_INT,
+				   rtx);
 static rtx extract_fixed_bit_field (enum machine_mode, rtx,
 				    unsigned HOST_WIDE_INT,
 				    unsigned HOST_WIDE_INT,
@@ -333,7 +337,9 @@ mode_for_extraction (enum extraction_pat
 
 static bool
 store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
-		   unsigned HOST_WIDE_INT bitnum, enum machine_mode fieldmode,
+		   unsigned HOST_WIDE_INT bitnum,
+		   unsigned HOST_WIDE_INT maxbits,
+		   enum machine_mode fieldmode,
 		   rtx value, bool fallback_p)
 {
   unsigned int unit
@@ -547,7 +553,9 @@ store_bit_field_1 (rtx str_rtx, unsigned
 
 	  if (!store_bit_field_1 (op0, MIN (BITS_PER_WORD,
 					    bitsize - i * BITS_PER_WORD),
-				  bitnum + bit_offset, word_mode,
+				  bitnum + bit_offset,
+				  maxbits,
+				  word_mode,
 				  value_word, fallback_p))
 	    {
 	      delete_insns_since (last);
@@ -718,9 +726,10 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	 mode. Otherwise, use the smallest mode containing the field.  */
 
       if (GET_MODE (op0) == BLKmode
+	  || (maxbits && GET_MODE_BITSIZE (GET_MODE (op0)) > maxbits)
 	  || (op_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (op_mode)))
-	bestmode = get_best_mode (bitsize, bitnum, MEM_ALIGN (op0),
+	bestmode = get_best_mode (bitsize, bitnum, maxbits, MEM_ALIGN (op0),
 				  (op_mode == MAX_MACHINE_MODE
 				   ? VOIDmode : op_mode),
 				  MEM_VOLATILE_P (op0));
@@ -748,7 +757,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	  /* Fetch that unit, store the bitfield in it, then store
 	     the unit.  */
 	  tempreg = copy_to_reg (xop0);
-	  if (store_bit_field_1 (tempreg, bitsize, xbitpos,
+	  if (store_bit_field_1 (tempreg, bitsize, xbitpos, maxbits,
 				 fieldmode, orig_value, false))
 	    {
 	      emit_move_insn (xop0, tempreg);
@@ -761,21 +770,28 @@ store_bit_field_1 (rtx str_rtx, unsigned
   if (!fallback_p)
     return false;
 
-  store_fixed_bit_field (op0, offset, bitsize, bitpos, value);
+  store_fixed_bit_field (op0, offset, bitsize, bitpos, maxbits, value);
   return true;
 }
 
 /* Generate code to store value from rtx VALUE
    into a bit-field within structure STR_RTX
    containing BITSIZE bits starting at bit BITNUM.
+
+   MAXBITS is the maximum number of bits we are allowed to store into,
+   0 if no limit.
+
    FIELDMODE is the machine-mode of the FIELD_DECL node for this field.  */
 
 void
 store_bit_field (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
-		 unsigned HOST_WIDE_INT bitnum, enum machine_mode fieldmode,
+		 unsigned HOST_WIDE_INT bitnum,
+		 unsigned HOST_WIDE_INT maxbits,
+		 enum machine_mode fieldmode,
 		 rtx value)
 {
-  if (!store_bit_field_1 (str_rtx, bitsize, bitnum, fieldmode, value, true))
+  if (!store_bit_field_1 (str_rtx, bitsize, bitnum, maxbits,
+			  fieldmode, value, true))
     gcc_unreachable ();
 }
 \f
@@ -791,7 +807,9 @@ store_bit_field (rtx str_rtx, unsigned H
 static void
 store_fixed_bit_field (rtx op0, unsigned HOST_WIDE_INT offset,
 		       unsigned HOST_WIDE_INT bitsize,
-		       unsigned HOST_WIDE_INT bitpos, rtx value)
+		       unsigned HOST_WIDE_INT bitpos,
+		       unsigned HOST_WIDE_INT maxbits,
+		       rtx value)
 {
   enum machine_mode mode;
   unsigned int total_bits = BITS_PER_WORD;
@@ -812,7 +830,7 @@ store_fixed_bit_field (rtx op0, unsigned
       /* Special treatment for a bit field split across two registers.  */
       if (bitsize + bitpos > BITS_PER_WORD)
 	{
-	  store_split_bit_field (op0, bitsize, bitpos, value);
+	  store_split_bit_field (op0, bitsize, bitpos, maxbits, value);
 	  return;
 	}
     }
@@ -830,10 +848,12 @@ store_fixed_bit_field (rtx op0, unsigned
 
       if (MEM_VOLATILE_P (op0)
           && GET_MODE_BITSIZE (GET_MODE (op0)) > 0
+	  && GET_MODE_BITSIZE (GET_MODE (op0)) <= maxbits
 	  && flag_strict_volatile_bitfields > 0)
 	mode = GET_MODE (op0);
       else
 	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
+			      maxbits,
 			      MEM_ALIGN (op0), mode, MEM_VOLATILE_P (op0));
 
       if (mode == VOIDmode)
@@ -841,7 +861,7 @@ store_fixed_bit_field (rtx op0, unsigned
 	  /* The only way this should occur is if the field spans word
 	     boundaries.  */
 	  store_split_bit_field (op0, bitsize, bitpos + offset * BITS_PER_UNIT,
-				 value);
+				 maxbits, value);
 	  return;
 	}
 
@@ -961,7 +981,9 @@ store_fixed_bit_field (rtx op0, unsigned
 
 static void
 store_split_bit_field (rtx op0, unsigned HOST_WIDE_INT bitsize,
-		       unsigned HOST_WIDE_INT bitpos, rtx value)
+		       unsigned HOST_WIDE_INT bitpos,
+		       unsigned HOST_WIDE_INT maxbits,
+		       rtx value)
 {
   unsigned int unit;
   unsigned int bitsdone = 0;
@@ -1076,7 +1098,7 @@ store_split_bit_field (rtx op0, unsigned
 	 it is just an out-of-bounds access.  Ignore it.  */
       if (word != const0_rtx)
 	store_fixed_bit_field (word, offset * unit / BITS_PER_UNIT, thissize,
-			       thispos, part);
+			       thispos, maxbits, part);
       bitsdone += thissize;
     }
 }
@@ -1520,7 +1542,7 @@ extract_bit_field_1 (rtx str_rtx, unsign
       if (GET_MODE (op0) == BLKmode
 	  || (ext_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (ext_mode)))
-	bestmode = get_best_mode (bitsize, bitnum, MEM_ALIGN (op0),
+	bestmode = get_best_mode (bitsize, bitnum, 0, MEM_ALIGN (op0),
 				  (ext_mode == MAX_MACHINE_MODE
 				   ? VOIDmode : ext_mode),
 				  MEM_VOLATILE_P (op0));
@@ -1646,7 +1668,7 @@ extract_fixed_bit_field (enum machine_mo
 	    mode = tmode;
 	}
       else
-	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
+	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT, 0,
 			      MEM_ALIGN (op0), word_mode, MEM_VOLATILE_P (op0));
 
       if (mode == VOIDmode)
Index: Makefile.in
===================================================================
--- Makefile.in	(revision 173263)
+++ Makefile.in	(working copy)
@@ -2916,7 +2916,7 @@ expr.o : expr.c $(CONFIG_H) $(SYSTEM_H) 
    typeclass.h hard-reg-set.h toplev.h $(DIAGNOSTIC_CORE_H) hard-reg-set.h $(EXCEPT_H) \
    reload.h langhooks.h intl.h $(TM_P_H) $(TARGET_H) \
    tree-iterator.h gt-expr.h $(MACHMODE_H) $(TIMEVAR_H) $(TREE_FLOW_H) \
-   $(TREE_PASS_H) $(DF_H) $(DIAGNOSTIC_H) vecprim.h $(SSAEXPAND_H)
+   $(TREE_PASS_H) $(DF_H) $(DIAGNOSTIC_H) vecprim.h $(SSAEXPAND_H) $(PARAMS_H)
 dojump.o : dojump.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(TREE_H) \
    $(FLAGS_H) $(FUNCTION_H) $(EXPR_H) $(OPTABS_H) $(INSN_ATTR_H) insn-config.h \
    langhooks.h $(GGC_H) gt-dojump.h vecprim.h $(BASIC_BLOCK_H) output.h
Index: stmt.c
===================================================================
--- stmt.c	(revision 173263)
+++ stmt.c	(working copy)
@@ -1758,7 +1758,7 @@ expand_return (tree retval)
 
 	  /* Use bitpos for the source extraction (left justified) and
 	     xbitpos for the destination store (right justified).  */
-	  store_bit_field (dst, bitsize, xbitpos % BITS_PER_WORD, word_mode,
+	  store_bit_field (dst, bitsize, xbitpos % BITS_PER_WORD, 0, word_mode,
 			   extract_bit_field (src, bitsize,
 					      bitpos % BITS_PER_WORD, 1, false,
 					      NULL_RTX, word_mode, word_mode));
Index: params.def
===================================================================
--- params.def	(revision 173263)
+++ params.def	(working copy)
@@ -884,6 +884,13 @@ DEFPARAM (PARAM_MAX_STORES_TO_SINK,
           "Maximum number of conditional store pairs that can be sunk",
           2, 0, 0)
 
+/* Data race flags for C++0x memory model compliance.  */
+
+DEFPARAM (PARAM_ALLOW_STORE_DATA_RACES,
+	  "allow-store-data-races",
+	  "Allow new data races on stores to be introduced",
+	  1, 0, 1)
+
 
 /*
 Local variables:


* Re: [C++0x] contiguous bitfields race implementation
  2011-05-09 17:12 [C++0x] contiguous bitfields race implementation Aldy Hernandez
@ 2011-05-09 18:04 ` Jeff Law
  2011-05-09 18:05   ` Aldy Hernandez
  2011-05-09 20:11   ` Aldy Hernandez
  0 siblings, 2 replies; 81+ messages in thread
From: Jeff Law @ 2011-05-09 18:04 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Jason Merrill, gcc-patches

On 05/09/11 10:24, Aldy Hernandez wrote:
> Seeing that the current C++ draft has been approved, I'd like to submit
> this for mainline, and get the proper review everyone's being quietly
> avoiding :).
> 
> To refresh everyone's memory, here is the problem:
> 
> struct
> {
>     unsigned int a : 4;
>     unsigned char b;
>     unsigned int c: 6;
> } var;
> 
> 
> void seta(){
>       var.a = 12;
> }
> 
> 
> In the new C++ standard, stores into <a> cannot touch <b>, so we can't
> store with anything wider (e.g., a 32-bit store) that would touch <b>.
> This problem can be seen on strictly aligned targets such as ARM, where
> we store the above sequence with a 32-bit store, or on x86-64 with <a>
> being volatile (PR48124).
> 
> This patch fixes both problems, but only for the C++ memory model. This
> is NOT a generic fix for PR48124, only a fix when using "--param
> allow-store-data-races=0".  I will gladly change the parameter name if
> another is preferred.
> 
> The gist of this patch is in max_field_size(), where we calculate the
> maximum number of bits we can store into. In doing this calculation I
> assume we can store into the padding without causing any races. So,
> padding between fields and at the end of the structure is included.
Well, the kernel guys would like to be able to preserve the
padding bits too.  It's a long long sad story that I won't repeat...
And I don't think we should further complicate this stuff with the
desire to not clobber padding bits :-)  Though be aware the request
might come one day....


> 
> Tested on x86-64 both with and without "--param
> allow-store-data-races=0", and visually inspecting the assembly on
> arm-linux and ia64-linux.
Any way to add a test to the testsuite?

General approach seems OK; I didn't dive deeply into the implementation.
 I'll leave that for rth & jason :-)

jeff


* Re: [C++0x] contiguous bitfields race implementation
  2011-05-09 18:04 ` Jeff Law
@ 2011-05-09 18:05   ` Aldy Hernandez
  2011-05-09 19:19     ` Jeff Law
  2011-05-09 20:11   ` Aldy Hernandez
  1 sibling, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-05-09 18:05 UTC (permalink / raw)
  To: Jeff Law; +Cc: Jason Merrill, gcc-patches


>> struct
>> {
>>      unsigned int a : 4;
>>      unsigned char b;
>>      unsigned int c: 6;
>> } var;


> Well, the kernel guys would like to be able to preserve the
> padding bits too.  It's a long long sad story that I won't repeat...
> And I don't think we should further complicate this stuff with the
> desire to not clobber padding bits :-)  Though be aware the request
> might come one day....

Woah, let me see if I got this right.  If we were to store in VAR.C 
above, the default for this memory model would be NOT to clobber the 
padding bits past <c>?  That definitely makes my implementation simpler, 
so I won't complain, but that's just weird.

>> Tested on x86-64 both with and without "--param
>> allow-store-data-races=0", and visually inspecting the assembly on
>> arm-linux and ia64-linux.
> Any way to add a test to the testsuite?

Arghhh... I was afraid you'd ask for one.  It was much easier with the 
test harness on cxx-memory-model.  I'll whip one up though...

Aldy


* Re: [C++0x] contiguous bitfields race implementation
  2011-05-09 18:05   ` Aldy Hernandez
@ 2011-05-09 19:19     ` Jeff Law
  0 siblings, 0 replies; 81+ messages in thread
From: Jeff Law @ 2011-05-09 19:19 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Jason Merrill, gcc-patches

On 05/09/11 11:26, Aldy Hernandez wrote:
> 
>>> struct
>>> {
>>>      unsigned int a : 4;
>>>      unsigned char b;
>>>      unsigned int c: 6;
>>> } var;
> 
> 
>> Well, the kernel guys would like to be able to preserve the
>> padding bits too.  It's a long long sad story that I won't repeat...
>> And I don't think we should further complicate this stuff with the
>> desire to not clobber padding bits :-)  Though be aware the request
>> might come one day....
> 
> Woah, let me see if I got this right.  If we were to store in VAR.C
> above, the default for this memory model would be NOT to clobber the
> padding bits past <c>?  That definitely makes my implementation simpler,
> so I won't complain, but that's just weird.
Just to be clear, it's something I've discussed with the kernel guys and
is completely separate from the C++ memory model.  I don't think we
should wrap this into your current work.

Consider if the kernel team wanted to add some information to a
structure without growing the structure.  Furthermore, assume that the
structure escapes, say into modules that aren't necessarily going to be
rebuilt, but those modules won't need to ever access this new
information.  And assume there happens to be enough padding bits to hold
this auxiliary information.

This has actually occurred and the kernel team wanted to use the padding
bits to hold the auxiliary information and maintain kernel ABI/API
compatibility.  Unfortunately, a store to a nearby bitfield can
overwrite the padding, thus if the structure escaped to a module that
still thought the bits were padding, that module would/could clobber
those padding bits, destroying the auxiliary data.

If GCC had a mode where it would preserve the padding bits (when
possible), it'd help the kernel team in these situations.
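
To sketch the scenario (hypothetical layout, not an actual kernel
structure):

  struct widget
  {
    unsigned int flags : 4;
    /* 28 bits of padding -- a newer kernel could stash auxiliary
       data here without growing the structure.  */
  };

  void old_module (struct widget *w)
  {
    /* Code compiled before the auxiliary data existed: a 32-bit
       read-modify-write of <flags> rewrites the "padding" too,
       wiping out the stashed data.  */
    w->flags = 3;
  }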



> 
> Arghhh... I was afraid you'd ask for one.  It was much easier with the
> test harness on cxx-memory-model.  I'll whip one up though...
Given others have (rightly) called me out on it a lot recently, I
figured I'd pass along the love :-)

jeff



* Re: [C++0x] contiguous bitfields race implementation
  2011-05-09 18:04 ` Jeff Law
  2011-05-09 18:05   ` Aldy Hernandez
@ 2011-05-09 20:11   ` Aldy Hernandez
  2011-05-09 20:28     ` Jakub Jelinek
  2011-05-09 20:49     ` Jason Merrill
  1 sibling, 2 replies; 81+ messages in thread
From: Aldy Hernandez @ 2011-05-09 20:11 UTC (permalink / raw)
  To: Jeff Law; +Cc: Jason Merrill, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 579 bytes --]


>> Tested on x86-64 both with and without "--param
>> allow-store-data-races=0", and visually inspecting the assembly on
>> arm-linux and ia64-linux.
> Any way to add a test to the testsuite?

I was able to find a testcase for i386/x86_64 by making the bitfield 
volatile (similar to the problem in PR48124).  So there you go... 
testcase and all :).

Jakub also gave me a testcase which triggered a buglet in 
max_field_size.  I have now added a parameter INNERDECL, which is the 
inner reference, so we can properly determine whether the inner decl is 
thread visible.
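
(Jakub's testcase isn't attached; the kind of case the INNERDECL change
addresses is a nested reference, where TREE_OPERAND (exp, 0) is itself
a COMPONENT_REF rather than the underlying VAR_DECL.  A hypothetical
illustration:)

  struct inner { unsigned int a : 4; unsigned char b; };
  struct outer { struct inner i; } var;

  /* For var.i.a, TREE_OPERAND (exp, 0) is the COMPONENT_REF "var.i",
     not the VAR_DECL "var"; the thread-visibility check needs the
     base object that get_inner_reference returns.  */
  void f (void) { var.i.a = 12; }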

Aldy

[-- Attachment #2: curr --]
[-- Type: text/plain, Size: 27584 bytes --]

	* params.h (ALLOW_STORE_DATA_RACES): New.
	* params.def (PARAM_ALLOW_STORE_DATA_RACES): New.
	* Makefile.in (expr.o): Depend on PARAMS_H.
	* machmode.h (get_best_mode): Add argument.
	* fold-const.c (optimize_bit_field_compare): Add argument to
	get_best_mode.
	(fold_truthop): Same.
	* ifcvt.c (noce_emit_move_insn): Add argument to store_bit_field.
	* expr.c (emit_group_store): Same.
	(copy_blkmode_from_reg): Same.
	(write_complex_part): Same.
	(optimize_bitfield_assignment_op): Add argument.
	Add argument to get_best_mode.
	(max_field_size): New.
	(expand_assignment): Calculate maxbits and pass it down
	accordingly.
	(store_field): New argument.
	(expand_expr_real_2): New argument to store_field.
	Include params.h.
	* expr.h (store_bit_field): New argument.
	* stor-layout.c (get_best_mode): Restrict mode expansion by taking
	into account maxbits.
	* calls.c (store_unaligned_arguments_into_pseudos): New argument
	to store_bit_field.
	* expmed.c (store_bit_field_1): New argument.  Use it.
	(store_bit_field): Same.
	(store_fixed_bit_field): Same.
	(store_split_bit_field): Same.
	(extract_bit_field_1): Pass new argument to get_best_mode.
	(extract_fixed_bit_field): Same.
	* stmt.c (expand_return): Pass new argument to store_bit_field.
	* tree.h (DECL_THREAD_VISIBLE_P): New.
	* doc/invoke.texi: Document parameter allow-store-data-races.

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 173263)
+++ doc/invoke.texi	(working copy)
@@ -8886,6 +8886,11 @@ The maximum number of conditional stores
 if either vectorization (@option{-ftree-vectorize}) or if-conversion
 (@option{-ftree-loop-if-convert}) is disabled.  The default is 2.
 
+@item allow-store-data-races
+Allow optimizers to introduce new data races on stores.
+Set to 1 to allow, otherwise to 0.  This option is enabled by default
+unless implicitly set by the @option{-fmemory-model=} option.
+
 @end table
 @end table
 
Index: machmode.h
===================================================================
--- machmode.h	(revision 173263)
+++ machmode.h	(working copy)
@@ -248,7 +248,9 @@ extern enum machine_mode mode_for_vector
 
 /* Find the best mode to use to access a bit field.  */
 
-extern enum machine_mode get_best_mode (int, int, unsigned int,
+extern enum machine_mode get_best_mode (int, int,
+					unsigned HOST_WIDE_INT,
+					unsigned int,
 					enum machine_mode, int);
 
 /* Determine alignment, 1<=result<=BIGGEST_ALIGNMENT.  */
Index: tree.h
===================================================================
--- tree.h	(revision 173263)
+++ tree.h	(working copy)
@@ -3156,6 +3156,10 @@ struct GTY(()) tree_parm_decl {
 #define DECL_THREAD_LOCAL_P(NODE) \
   (VAR_DECL_CHECK (NODE)->decl_with_vis.tls_model >= TLS_MODEL_REAL)
 
+/* Return true if a VAR_DECL is visible from another thread.  */
+#define DECL_THREAD_VISIBLE_P(NODE) \
+  (TREE_STATIC (NODE) && !DECL_THREAD_LOCAL_P (NODE))
+
 /* In a non-local VAR_DECL with static storage duration, true if the
    variable has an initialization priority.  If false, the variable
    will be initialized at the DEFAULT_INIT_PRIORITY.  */
Index: fold-const.c
===================================================================
--- fold-const.c	(revision 173263)
+++ fold-const.c	(working copy)
@@ -3409,7 +3409,7 @@ optimize_bit_field_compare (location_t l
       && flag_strict_volatile_bitfields > 0)
     nmode = lmode;
   else
-    nmode = get_best_mode (lbitsize, lbitpos,
+    nmode = get_best_mode (lbitsize, lbitpos, 0,
 			   const_p ? TYPE_ALIGN (TREE_TYPE (linner))
 			   : MIN (TYPE_ALIGN (TREE_TYPE (linner)),
 				  TYPE_ALIGN (TREE_TYPE (rinner))),
@@ -5237,7 +5237,7 @@ fold_truthop (location_t loc, enum tree_
      to be relative to a field of that size.  */
   first_bit = MIN (ll_bitpos, rl_bitpos);
   end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
-  lnmode = get_best_mode (end_bit - first_bit, first_bit,
+  lnmode = get_best_mode (end_bit - first_bit, first_bit, 0,
 			  TYPE_ALIGN (TREE_TYPE (ll_inner)), word_mode,
 			  volatilep);
   if (lnmode == VOIDmode)
@@ -5302,7 +5302,7 @@ fold_truthop (location_t loc, enum tree_
 
       first_bit = MIN (lr_bitpos, rr_bitpos);
       end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
-      rnmode = get_best_mode (end_bit - first_bit, first_bit,
+      rnmode = get_best_mode (end_bit - first_bit, first_bit, 0,
 			      TYPE_ALIGN (TREE_TYPE (lr_inner)), word_mode,
 			      volatilep);
       if (rnmode == VOIDmode)
Index: params.h
===================================================================
--- params.h	(revision 173263)
+++ params.h	(working copy)
@@ -206,4 +206,6 @@ extern void init_param_values (int *para
   PARAM_VALUE (PARAM_MIN_NONDEBUG_INSN_UID)
 #define MAX_STORES_TO_SINK \
   PARAM_VALUE (PARAM_MAX_STORES_TO_SINK)
+#define ALLOW_STORE_DATA_RACES \
+  PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES)
 #endif /* ! GCC_PARAMS_H */
Index: testsuite/gcc.dg/20110509.c
===================================================================
--- testsuite/gcc.dg/20110509.c	(revision 0)
+++ testsuite/gcc.dg/20110509.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+/* Test that we don't store past VAR.A.  */
+
+struct S
+{
+  volatile unsigned int a : 4;
+  unsigned char b;
+  unsigned int c : 6;
+} var;
+
+void set_a()
+{
+  var.a = 12;
+}
+
+/* { dg-final { scan-assembler-not "movl.*, var" } } */
Index: ifcvt.c
===================================================================
--- ifcvt.c	(revision 173263)
+++ ifcvt.c	(working copy)
@@ -885,7 +885,7 @@ noce_emit_move_insn (rtx x, rtx y)
 		}
 
 	      gcc_assert (start < (MEM_P (op) ? BITS_PER_UNIT : BITS_PER_WORD));
-	      store_bit_field (op, size, start, GET_MODE (x), y);
+	      store_bit_field (op, size, start, 0, GET_MODE (x), y);
 	      return;
 	    }
 
@@ -939,7 +939,7 @@ noce_emit_move_insn (rtx x, rtx y)
   inner = XEXP (outer, 0);
   outmode = GET_MODE (outer);
   bitpos = SUBREG_BYTE (outer) * BITS_PER_UNIT;
-  store_bit_field (inner, GET_MODE_BITSIZE (outmode), bitpos, outmode, y);
+  store_bit_field (inner, GET_MODE_BITSIZE (outmode), bitpos, 0, outmode, y);
 }
 
 /* Return sequence of instructions generated by if conversion.  This
Index: expr.c
===================================================================
--- expr.c	(revision 173263)
+++ expr.c	(working copy)
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  
 #include "diagnostic.h"
 #include "ssaexpand.h"
 #include "target-globals.h"
+#include "params.h"
 
 /* Decide whether a function's arguments should be processed
    from first to last or from last to first.
@@ -142,7 +143,8 @@ static void store_constructor_field (rtx
 				     HOST_WIDE_INT, enum machine_mode,
 				     tree, tree, int, alias_set_type);
 static void store_constructor (tree, rtx, int, HOST_WIDE_INT);
-static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT, enum machine_mode,
+static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT,
+			unsigned HOST_WIDE_INT, enum machine_mode,
 			tree, tree, alias_set_type, bool);
 
 static unsigned HOST_WIDE_INT highest_pow2_factor_for_target (const_tree, const_tree);
@@ -2063,7 +2065,7 @@ emit_group_store (rtx orig_dst, rtx src,
 	emit_move_insn (adjust_address (dest, mode, bytepos), tmps[i]);
       else
 	store_bit_field (dest, bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
-			 mode, tmps[i]);
+			 0, mode, tmps[i]);
     }
 
   /* Copy from the pseudo into the (probable) hard reg.  */
@@ -2157,7 +2159,7 @@ copy_blkmode_from_reg (rtx tgtblk, rtx s
 
       /* Use xbitpos for the source extraction (right justified) and
 	 bitpos for the destination store (left justified).  */
-      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, copy_mode,
+      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, 0, copy_mode,
 		       extract_bit_field (src, bitsize,
 					  xbitpos % BITS_PER_WORD, 1, false,
 					  NULL_RTX, copy_mode, copy_mode));
@@ -2794,7 +2796,7 @@ write_complex_part (rtx cplx, rtx val, b
 	gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
     }
 
-  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, imode, val);
+  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, 0, imode, val);
 }
 
 /* Extract one of the components of the complex value CPLX.  Extract the
@@ -3929,6 +3931,7 @@ get_subtarget (rtx x)
 static bool
 optimize_bitfield_assignment_op (unsigned HOST_WIDE_INT bitsize,
 				 unsigned HOST_WIDE_INT bitpos,
+				 unsigned HOST_WIDE_INT maxbits,
 				 enum machine_mode mode1, rtx str_rtx,
 				 tree to, tree src)
 {
@@ -3989,7 +3992,7 @@ optimize_bitfield_assignment_op (unsigne
 
       if (str_bitsize == 0 || str_bitsize > BITS_PER_WORD)
 	str_mode = word_mode;
-      str_mode = get_best_mode (bitsize, bitpos,
+      str_mode = get_best_mode (bitsize, bitpos, maxbits,
 				MEM_ALIGN (str_rtx), str_mode, 0);
       if (str_mode == VOIDmode)
 	return false;
@@ -4098,6 +4101,93 @@ optimize_bitfield_assignment_op (unsigne
   return false;
 }
 
+/* In the C++ memory model, consecutive bit fields in a structure are
+   considered one memory location.
+
+   Given a COMPONENT_REF, this function returns the maximum number of
+   bits we are allowed to store into, when storing into the
+   COMPONENT_REF.  We return 0, if there is no restriction.
+
+   EXP is the COMPONENT_REF.
+   INNERDECL is actual object being referenced.
+   BITPOS is the position in bits where the bit starts within the structure.
+   BITSIZE is size in bits of the field being referenced in EXP.
+
+   For example, while storing into FOO.A here...
+
+      struct {
+        BIT 0:
+          unsigned int a : 4;
+	  unsigned int b : 1;
+	BIT 8:
+	  unsigned char c;
+	  unsigned int d : 6;
+      } foo;
+
+   ...we are not allowed to store past <b>, so for the layout above,
+   we would return 8 maximum bits (because who cares if we store into
+   the padding).  */
+
+
+static unsigned HOST_WIDE_INT
+max_field_size (tree exp, tree innerdecl,
+		HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize)
+{
+  tree field, record_type, fld;
+  HOST_WIDE_INT maxbits = bitsize;
+
+  gcc_assert (TREE_CODE (exp) == COMPONENT_REF);
+
+  /* If other threads can't see this value, no need to restrict stores.  */
+  if (ALLOW_STORE_DATA_RACES
+      || !DECL_THREAD_VISIBLE_P (innerdecl))
+    return 0;
+
+  field = TREE_OPERAND (exp, 1);
+  record_type = DECL_FIELD_CONTEXT (field);
+
+  /* Find the original field within the structure.  */
+  for (fld = TYPE_FIELDS (record_type); fld; fld = DECL_CHAIN (fld))
+    if (fld == field)
+      break;
+  gcc_assert (fld == field);
+
+  /* If this is the last element in the structure, we can touch from
+     BITPOS to the end of the structure (including the padding).  */
+  if (!DECL_CHAIN (fld))
+    return TREE_INT_CST_LOW (TYPE_SIZE (record_type)) - bitpos;
+
+  /* Count contiguous bit fields not separated by a 0-length bit-field.  */
+  for (fld = DECL_CHAIN (fld); fld; fld = DECL_CHAIN (fld))
+    {
+      tree t, offset;
+      enum machine_mode mode;
+      int unsignedp, volatilep;
+
+      if (TREE_CODE (fld) != FIELD_DECL)
+	continue;
+
+      t = build3 (COMPONENT_REF, TREE_TYPE (exp),
+		  unshare_expr (TREE_OPERAND (exp, 0)),
+		  fld, NULL_TREE);
+      get_inner_reference (t, &bitsize, &bitpos, &offset,
+			   &mode, &unsignedp, &volatilep, true);
+
+      /* Only count contiguous bit fields, that are not separated by a
+	 zero-length bit field.  */
+      if (!DECL_BIT_FIELD (fld)
+	  || bitsize == 0)
+	{
+	  /* Include the padding up to the next field.  */
+	  maxbits += bitpos - maxbits;
+	  break;
+	}
+
+      maxbits += bitsize;
+    }
+
+  return maxbits;
+}
 
 /* Expand an assignment that stores the value of FROM into TO.  If NONTEMPORAL
    is true, try generating a nontemporal store.  */
@@ -4197,6 +4287,9 @@ expand_assignment (tree to, tree from, b
     {
       enum machine_mode mode1;
       HOST_WIDE_INT bitsize, bitpos;
+      /* Max consecutive bits we are allowed to touch while storing
+	 into TO.  */
+      HOST_WIDE_INT maxbits = 0;
       tree offset;
       int unsignedp;
       int volatilep = 0;
@@ -4206,6 +4299,10 @@ expand_assignment (tree to, tree from, b
       tem = get_inner_reference (to, &bitsize, &bitpos, &offset, &mode1,
 				 &unsignedp, &volatilep, true);
 
+      if (TREE_CODE (to) == COMPONENT_REF
+	  && DECL_BIT_FIELD (TREE_OPERAND (to, 1)))
+	maxbits = max_field_size (to, tem, bitpos, bitsize);
+
       /* If we are going to use store_bit_field and extract_bit_field,
 	 make sure to_rtx will be safe for multiple use.  */
 
@@ -4286,12 +4383,13 @@ expand_assignment (tree to, tree from, b
 	    result = store_expr (from, XEXP (to_rtx, bitpos != 0), false,
 				 nontemporal);
 	  else if (bitpos + bitsize <= mode_bitsize / 2)
-	    result = store_field (XEXP (to_rtx, 0), bitsize, bitpos,
+	    result = store_field (XEXP (to_rtx, 0), bitsize, bitpos, maxbits,
 				  mode1, from, TREE_TYPE (tem),
 				  get_alias_set (to), nontemporal);
 	  else if (bitpos >= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 1), bitsize,
-				  bitpos - mode_bitsize / 2, mode1, from,
+				  bitpos - mode_bitsize / 2, maxbits,
+				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
 	  else if (bitpos == 0 && bitsize == mode_bitsize)
@@ -4312,7 +4410,8 @@ expand_assignment (tree to, tree from, b
 					    0);
 	      write_complex_part (temp, XEXP (to_rtx, 0), false);
 	      write_complex_part (temp, XEXP (to_rtx, 1), true);
-	      result = store_field (temp, bitsize, bitpos, mode1, from,
+	      result = store_field (temp, bitsize, bitpos, maxbits,
+				    mode1, from,
 				    TREE_TYPE (tem), get_alias_set (to),
 				    nontemporal);
 	      emit_move_insn (XEXP (to_rtx, 0), read_complex_part (temp, false));
@@ -4337,11 +4436,12 @@ expand_assignment (tree to, tree from, b
 		MEM_KEEP_ALIAS_SET_P (to_rtx) = 1;
 	    }
 
-	  if (optimize_bitfield_assignment_op (bitsize, bitpos, mode1,
+	  if (optimize_bitfield_assignment_op (bitsize, bitpos, maxbits, mode1,
 					       to_rtx, to, from))
 	    result = NULL;
 	  else
-	    result = store_field (to_rtx, bitsize, bitpos, mode1, from,
+	    result = store_field (to_rtx, bitsize, bitpos, maxbits,
+				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
 	}
@@ -4734,7 +4834,7 @@ store_expr (tree exp, rtx target, int ca
 			      : BLOCK_OP_NORMAL));
 	  else if (GET_MODE (target) == BLKmode)
 	    store_bit_field (target, INTVAL (expr_size (exp)) * BITS_PER_UNIT,
-			     0, GET_MODE (temp), temp);
+			     0, 0, GET_MODE (temp), temp);
 	  else
 	    convert_move (target, temp, unsignedp);
 	}
@@ -5177,7 +5277,8 @@ store_constructor_field (rtx target, uns
       store_constructor (exp, target, cleared, bitsize / BITS_PER_UNIT);
     }
   else
-    store_field (target, bitsize, bitpos, mode, exp, type, alias_set, false);
+    store_field (target, bitsize, bitpos, 0, mode, exp, type, alias_set,
+		 false);
 }
 
 /* Store the value of constructor EXP into the rtx TARGET.
@@ -5751,6 +5852,8 @@ store_constructor (tree exp, rtx target,
    BITSIZE bits, starting BITPOS bits from the start of TARGET.
    If MODE is VOIDmode, it means that we are storing into a bit-field.
 
+   MAXBITS is the number of bits we can store into, 0 if no limit.
+
    Always return const0_rtx unless we have something particular to
    return.
 
@@ -5764,6 +5867,7 @@ store_constructor (tree exp, rtx target,
 
 static rtx
 store_field (rtx target, HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
+	     unsigned HOST_WIDE_INT maxbits,
 	     enum machine_mode mode, tree exp, tree type,
 	     alias_set_type alias_set, bool nontemporal)
 {
@@ -5796,8 +5900,8 @@ store_field (rtx target, HOST_WIDE_INT b
       if (bitsize != (HOST_WIDE_INT) GET_MODE_BITSIZE (GET_MODE (target)))
 	emit_move_insn (object, target);
 
-      store_field (blk_object, bitsize, bitpos, mode, exp, type, alias_set,
-		   nontemporal);
+      store_field (blk_object, bitsize, bitpos, maxbits,
+		   mode, exp, type, alias_set, nontemporal);
 
       emit_move_insn (target, object);
 
@@ -5911,7 +6015,7 @@ store_field (rtx target, HOST_WIDE_INT b
 	}
 
       /* Store the value in the bitfield.  */
-      store_bit_field (target, bitsize, bitpos, mode, temp);
+      store_bit_field (target, bitsize, bitpos, maxbits, mode, temp);
 
       return const0_rtx;
     }
@@ -7323,7 +7427,7 @@ expand_expr_real_2 (sepops ops, rtx targ
 						    (treeop0))
 				 * BITS_PER_UNIT),
 				(HOST_WIDE_INT) GET_MODE_BITSIZE (mode)),
-			   0, TYPE_MODE (valtype), treeop0,
+			   0, 0, TYPE_MODE (valtype), treeop0,
 			   type, 0, false);
 	    }
 
Index: expr.h
===================================================================
--- expr.h	(revision 173263)
+++ expr.h	(working copy)
@@ -665,7 +665,8 @@ extern enum machine_mode
 mode_for_extraction (enum extraction_pattern, int);
 
 extern void store_bit_field (rtx, unsigned HOST_WIDE_INT,
-			     unsigned HOST_WIDE_INT, enum machine_mode, rtx);
+			     unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
+			     enum machine_mode, rtx);
 extern rtx extract_bit_field (rtx, unsigned HOST_WIDE_INT,
 			      unsigned HOST_WIDE_INT, int, bool, rtx,
 			      enum machine_mode, enum machine_mode);
Index: stor-layout.c
===================================================================
--- stor-layout.c	(revision 173263)
+++ stor-layout.c	(working copy)
@@ -2428,6 +2428,9 @@ fixup_unsigned_type (tree type)
 /* Find the best machine mode to use when referencing a bit field of length
    BITSIZE bits starting at BITPOS.
 
+   MAXBITS is the maximum number of bits we are allowed to touch, when
+   referencing this bit field.  MAXBITS is 0 if there is no limit.
+
    The underlying object is known to be aligned to a boundary of ALIGN bits.
    If LARGEST_MODE is not VOIDmode, it means that we should not use a mode
    larger than LARGEST_MODE (usually SImode).
@@ -2445,7 +2448,8 @@ fixup_unsigned_type (tree type)
    decide which of the above modes should be used.  */
 
 enum machine_mode
-get_best_mode (int bitsize, int bitpos, unsigned int align,
+get_best_mode (int bitsize, int bitpos, unsigned HOST_WIDE_INT maxbits,
+	       unsigned int align,
 	       enum machine_mode largest_mode, int volatilep)
 {
   enum machine_mode mode;
@@ -2484,6 +2488,7 @@ get_best_mode (int bitsize, int bitpos, 
 	  if (bitpos / unit == (bitpos + bitsize - 1) / unit
 	      && unit <= BITS_PER_WORD
 	      && unit <= MIN (align, BIGGEST_ALIGNMENT)
+	      && (!maxbits || unit <= maxbits)
 	      && (largest_mode == VOIDmode
 		  || unit <= GET_MODE_BITSIZE (largest_mode)))
 	    wide_mode = tmode;
Index: calls.c
===================================================================
--- calls.c	(revision 173263)
+++ calls.c	(working copy)
@@ -909,7 +909,7 @@ store_unaligned_arguments_into_pseudos (
 	    emit_move_insn (reg, const0_rtx);
 
 	    bytes -= bitsize / BITS_PER_UNIT;
-	    store_bit_field (reg, bitsize, endian_correction, word_mode,
+	    store_bit_field (reg, bitsize, endian_correction, 0, word_mode,
 			     word);
 	  }
       }
Index: expmed.c
===================================================================
--- expmed.c	(revision 173263)
+++ expmed.c	(working copy)
@@ -47,9 +47,13 @@ struct target_expmed *this_target_expmed
 
 static void store_fixed_bit_field (rtx, unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT, rtx);
+				   unsigned HOST_WIDE_INT,
+				   unsigned HOST_WIDE_INT,
+				   rtx);
 static void store_split_bit_field (rtx, unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT, rtx);
+				   unsigned HOST_WIDE_INT,
+				   unsigned HOST_WIDE_INT,
+				   rtx);
 static rtx extract_fixed_bit_field (enum machine_mode, rtx,
 				    unsigned HOST_WIDE_INT,
 				    unsigned HOST_WIDE_INT,
@@ -333,7 +337,9 @@ mode_for_extraction (enum extraction_pat
 
 static bool
 store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
-		   unsigned HOST_WIDE_INT bitnum, enum machine_mode fieldmode,
+		   unsigned HOST_WIDE_INT bitnum,
+		   unsigned HOST_WIDE_INT maxbits,
+		   enum machine_mode fieldmode,
 		   rtx value, bool fallback_p)
 {
   unsigned int unit
@@ -547,7 +553,9 @@ store_bit_field_1 (rtx str_rtx, unsigned
 
 	  if (!store_bit_field_1 (op0, MIN (BITS_PER_WORD,
 					    bitsize - i * BITS_PER_WORD),
-				  bitnum + bit_offset, word_mode,
+				  bitnum + bit_offset,
+				  maxbits,
+				  word_mode,
 				  value_word, fallback_p))
 	    {
 	      delete_insns_since (last);
@@ -718,9 +726,10 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	 mode. Otherwise, use the smallest mode containing the field.  */
 
       if (GET_MODE (op0) == BLKmode
+	  || (maxbits && GET_MODE_BITSIZE (GET_MODE (op0)) > maxbits)
 	  || (op_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (op_mode)))
-	bestmode = get_best_mode (bitsize, bitnum, MEM_ALIGN (op0),
+	bestmode = get_best_mode (bitsize, bitnum, maxbits, MEM_ALIGN (op0),
 				  (op_mode == MAX_MACHINE_MODE
 				   ? VOIDmode : op_mode),
 				  MEM_VOLATILE_P (op0));
@@ -748,7 +757,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	  /* Fetch that unit, store the bitfield in it, then store
 	     the unit.  */
 	  tempreg = copy_to_reg (xop0);
-	  if (store_bit_field_1 (tempreg, bitsize, xbitpos,
+	  if (store_bit_field_1 (tempreg, bitsize, xbitpos, maxbits,
 				 fieldmode, orig_value, false))
 	    {
 	      emit_move_insn (xop0, tempreg);
@@ -761,21 +770,28 @@ store_bit_field_1 (rtx str_rtx, unsigned
   if (!fallback_p)
     return false;
 
-  store_fixed_bit_field (op0, offset, bitsize, bitpos, value);
+  store_fixed_bit_field (op0, offset, bitsize, bitpos, maxbits, value);
   return true;
 }
 
 /* Generate code to store value from rtx VALUE
    into a bit-field within structure STR_RTX
    containing BITSIZE bits starting at bit BITNUM.
+
+   MAXBITS is the maximum number of bits we are allowed to store into,
+   0 if no limit.
+
    FIELDMODE is the machine-mode of the FIELD_DECL node for this field.  */
 
 void
 store_bit_field (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
-		 unsigned HOST_WIDE_INT bitnum, enum machine_mode fieldmode,
+		 unsigned HOST_WIDE_INT bitnum,
+		 unsigned HOST_WIDE_INT maxbits,
+		 enum machine_mode fieldmode,
 		 rtx value)
 {
-  if (!store_bit_field_1 (str_rtx, bitsize, bitnum, fieldmode, value, true))
+  if (!store_bit_field_1 (str_rtx, bitsize, bitnum, maxbits,
+			  fieldmode, value, true))
     gcc_unreachable ();
 }
 \f
@@ -791,7 +807,9 @@ store_bit_field (rtx str_rtx, unsigned H
 static void
 store_fixed_bit_field (rtx op0, unsigned HOST_WIDE_INT offset,
 		       unsigned HOST_WIDE_INT bitsize,
-		       unsigned HOST_WIDE_INT bitpos, rtx value)
+		       unsigned HOST_WIDE_INT bitpos,
+		       unsigned HOST_WIDE_INT maxbits,
+		       rtx value)
 {
   enum machine_mode mode;
   unsigned int total_bits = BITS_PER_WORD;
@@ -812,7 +830,7 @@ store_fixed_bit_field (rtx op0, unsigned
       /* Special treatment for a bit field split across two registers.  */
       if (bitsize + bitpos > BITS_PER_WORD)
 	{
-	  store_split_bit_field (op0, bitsize, bitpos, value);
+	  store_split_bit_field (op0, bitsize, bitpos, maxbits, value);
 	  return;
 	}
     }
@@ -830,10 +848,12 @@ store_fixed_bit_field (rtx op0, unsigned
 
       if (MEM_VOLATILE_P (op0)
           && GET_MODE_BITSIZE (GET_MODE (op0)) > 0
+	  && (!maxbits || GET_MODE_BITSIZE (GET_MODE (op0)) <= maxbits)
 	  && flag_strict_volatile_bitfields > 0)
 	mode = GET_MODE (op0);
       else
 	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
+			      maxbits,
 			      MEM_ALIGN (op0), mode, MEM_VOLATILE_P (op0));
 
       if (mode == VOIDmode)
@@ -841,7 +861,7 @@ store_fixed_bit_field (rtx op0, unsigned
 	  /* The only way this should occur is if the field spans word
 	     boundaries.  */
 	  store_split_bit_field (op0, bitsize, bitpos + offset * BITS_PER_UNIT,
-				 value);
+				 maxbits, value);
 	  return;
 	}
 
@@ -961,7 +981,9 @@ store_fixed_bit_field (rtx op0, unsigned
 
 static void
 store_split_bit_field (rtx op0, unsigned HOST_WIDE_INT bitsize,
-		       unsigned HOST_WIDE_INT bitpos, rtx value)
+		       unsigned HOST_WIDE_INT bitpos,
+		       unsigned HOST_WIDE_INT maxbits,
+		       rtx value)
 {
   unsigned int unit;
   unsigned int bitsdone = 0;
@@ -1076,7 +1098,7 @@ store_split_bit_field (rtx op0, unsigned
 	 it is just an out-of-bounds access.  Ignore it.  */
       if (word != const0_rtx)
 	store_fixed_bit_field (word, offset * unit / BITS_PER_UNIT, thissize,
-			       thispos, part);
+			       thispos, maxbits, part);
       bitsdone += thissize;
     }
 }
@@ -1520,7 +1542,7 @@ extract_bit_field_1 (rtx str_rtx, unsign
       if (GET_MODE (op0) == BLKmode
 	  || (ext_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (ext_mode)))
-	bestmode = get_best_mode (bitsize, bitnum, MEM_ALIGN (op0),
+	bestmode = get_best_mode (bitsize, bitnum, 0, MEM_ALIGN (op0),
 				  (ext_mode == MAX_MACHINE_MODE
 				   ? VOIDmode : ext_mode),
 				  MEM_VOLATILE_P (op0));
@@ -1646,7 +1668,7 @@ extract_fixed_bit_field (enum machine_mo
 	    mode = tmode;
 	}
       else
-	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
+	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT, 0,
 			      MEM_ALIGN (op0), word_mode, MEM_VOLATILE_P (op0));
 
       if (mode == VOIDmode)
Index: Makefile.in
===================================================================
--- Makefile.in	(revision 173263)
+++ Makefile.in	(working copy)
@@ -2916,7 +2916,7 @@ expr.o : expr.c $(CONFIG_H) $(SYSTEM_H) 
    typeclass.h hard-reg-set.h toplev.h $(DIAGNOSTIC_CORE_H) hard-reg-set.h $(EXCEPT_H) \
    reload.h langhooks.h intl.h $(TM_P_H) $(TARGET_H) \
    tree-iterator.h gt-expr.h $(MACHMODE_H) $(TIMEVAR_H) $(TREE_FLOW_H) \
-   $(TREE_PASS_H) $(DF_H) $(DIAGNOSTIC_H) vecprim.h $(SSAEXPAND_H)
+   $(TREE_PASS_H) $(DF_H) $(DIAGNOSTIC_H) vecprim.h $(SSAEXPAND_H) $(PARAMS_H)
 dojump.o : dojump.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(TREE_H) \
    $(FLAGS_H) $(FUNCTION_H) $(EXPR_H) $(OPTABS_H) $(INSN_ATTR_H) insn-config.h \
    langhooks.h $(GGC_H) gt-dojump.h vecprim.h $(BASIC_BLOCK_H) output.h
Index: stmt.c
===================================================================
--- stmt.c	(revision 173263)
+++ stmt.c	(working copy)
@@ -1758,7 +1758,7 @@ expand_return (tree retval)
 
 	  /* Use bitpos for the source extraction (left justified) and
 	     xbitpos for the destination store (right justified).  */
-	  store_bit_field (dst, bitsize, xbitpos % BITS_PER_WORD, word_mode,
+	  store_bit_field (dst, bitsize, xbitpos % BITS_PER_WORD, 0, word_mode,
 			   extract_bit_field (src, bitsize,
 					      bitpos % BITS_PER_WORD, 1, false,
 					      NULL_RTX, word_mode, word_mode));
Index: params.def
===================================================================
--- params.def	(revision 173263)
+++ params.def	(working copy)
@@ -884,6 +884,13 @@ DEFPARAM (PARAM_MAX_STORES_TO_SINK,
           "Maximum number of conditional store pairs that can be sunk",
           2, 0, 0)
 
+/* Data race flags for C++0x memory model compliance.  */
+
+DEFPARAM (PARAM_ALLOW_STORE_DATA_RACES,
+	  "allow-store-data-races",
+	  "Allow new data races on stores to be introduced",
+	  1, 0, 1)
+
 
 /*
 Local variables:

* Re: [C++0x] contiguous bitfields race implementation
  2011-05-09 20:11   ` Aldy Hernandez
@ 2011-05-09 20:28     ` Jakub Jelinek
  2011-05-10 11:42       ` Richard Guenther
  2011-05-09 20:49     ` Jason Merrill
  1 sibling, 1 reply; 81+ messages in thread
From: Jakub Jelinek @ 2011-05-09 20:28 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Jeff Law, Jason Merrill, gcc-patches

On Mon, May 09, 2011 at 01:41:13PM -0500, Aldy Hernandez wrote:
> Jakub also gave me a testcase which triggered a buglet in
> max_field_size.  I have now added a parameter INNERDECL which is the
> inner reference, so we can properly determine if the inner decl is
> thread visible or not.

What I meant actually was something different: if max_field_size
and get_inner_reference were called on, say,
COMPONENT_REF <ARRAY_REF <x, 4>, bitfld>,
then get_inner_reference returns the whole x, and bitpos
is the relative bit position of bitfld within the struct plus
4 * sizeof the containing struct.  Then
TREE_INT_CST_LOW (TYPE_SIZE (record_type)) - bitpos
might get negative (well, it is unsigned, so huge).
Maybe with MEM_REF such nested handled components shouldn't appear;
if that's the case, you should assert that somewhere.
If they do appear, you should probably use TREE_OPERAND (component_ref, 2)
instead of bitpos.
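
For concreteness, a minimal example (layout and names made up) of the
kind of access I mean:

struct S { unsigned bitfld : 3; unsigned rest : 29; } x[8];

void
f (void)
{
  /* COMPONENT_REF <ARRAY_REF <x, 4>, bitfld>: get_inner_reference
     returns all of <x>, and bitpos is 4 * sizeof (struct S) in bits,
     which already exceeds TYPE_SIZE of the record type.  */
  x[4].bitfld = 1;
}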

BTW, shouldn't BIT_FIELD_REF also be handled similarly to the COMPONENT_REF?
And, probably some coordination with Richi is needed with his bitfield tree
lowering.

	Jakub

* Re: [C++0x] contiguous bitfields race implementation
  2011-05-09 20:11   ` Aldy Hernandez
  2011-05-09 20:28     ` Jakub Jelinek
@ 2011-05-09 20:49     ` Jason Merrill
  2011-05-13 22:35       ` Aldy Hernandez
  1 sibling, 1 reply; 81+ messages in thread
From: Jason Merrill @ 2011-05-09 20:49 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Jeff Law, gcc-patches

From a quick look it seems that this patch considers bitfields
following the one we're deliberately touching, but not previous
bitfields in the same memory location; we need to include those as well.
With your struct foo, the bits touched are the same regardless of
whether we name .a or .b.
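
To illustrate with a made-up layout (not the struct from your mail):
given

struct bits { unsigned x : 3; unsigned y : 5; unsigned char z; };

<x> and <y> form a single memory location, so a store to either .x or
.y may touch bits 0..7 but must never touch <z>; the limit computation
therefore has to start from the first bitfield of the run even when the
store names a later one.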

Jason

* Re: [C++0x] contiguous bitfields race implementation
  2011-05-09 20:28     ` Jakub Jelinek
@ 2011-05-10 11:42       ` Richard Guenther
  0 siblings, 0 replies; 81+ messages in thread
From: Richard Guenther @ 2011-05-10 11:42 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Aldy Hernandez, Jeff Law, Jason Merrill, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1561 bytes --]

On Mon, May 9, 2011 at 8:54 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, May 09, 2011 at 01:41:13PM -0500, Aldy Hernandez wrote:
>> Jakub also gave me a testcase which triggered a buglet in
>> max_field_size.  I have now added a parameter INNERDECL which is the
>> inner reference, so we can properly determine if the inner decl is
>> thread visible or not.
>
> What I meant actually was something different: if max_field_size
> and get_inner_reference were called on, say,
> COMPONENT_REF <ARRAY_REF <x, 4>, bitfld>,
> then get_inner_reference returns the whole x, and bitpos
> is the relative bit position of bitfld within the struct plus
> 4 * sizeof the containing struct.  Then
> TREE_INT_CST_LOW (TYPE_SIZE (record_type)) - bitpos
> might get negative (well, it is unsigned, so huge).
> Maybe with MEM_REF such nested handled components shouldn't appear;
> if that's the case, you should assert that somewhere.
> If they do appear, you should probably use TREE_OPERAND (component_ref, 2)
> instead of bitpos.
>
> BTW, shouldn't BIT_FIELD_REF also be handled similarly to the COMPONENT_REF?
> And, probably some coordination with Richi is needed with his bitfield tree
> lowering.

Yes, we would need to handle BIT_FIELD_REFs similarly (fold can introduce
them for example).  I attached a work-in-progress patch that does
bitfield lowering at the tree level.  There are interesting issues when
trying to work out the underlying object, as bitfield layout can deliberately
obfuscate things a lot.
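
To give an idea, for a store the lowering emits an explicit
read-modify-write cycle on the containing word; roughly, in C (the
container size, bit offset and names below are only illustrative):

  unsigned char BF = *(unsigned char *) &s;    /* load the word */
  BF = (BF & ~(0x7 << 2)) | ((v & 0x7) << 2);  /* clear field, insert v */
  *(unsigned char *) &s = BF;                  /* store it back */

for a 3-bit field at bit offset 2 in an 8-bit container.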

Richard.

>        Jakub
>

[-- Attachment #2: lower-bitfields-to-mem-ref --]
[-- Type: application/octet-stream, Size: 23825 bytes --]

2011-05-06  Richard Guenther  <rguenther@suse.de>

	PR rtl-optimization/48696
	PR tree-optimization/45144


Index: gcc/gimple-low.c
===================================================================
*** gcc/gimple-low.c.orig	2011-05-06 10:46:48.000000000 +0200
--- gcc/gimple-low.c	2011-05-06 15:01:09.000000000 +0200
*************** along with GCC; see the file COPYING3.
*** 32,37 ****
--- 32,38 ----
  #include "function.h"
  #include "diagnostic-core.h"
  #include "tree-pass.h"
+ #include "tree-pretty-print.h"
  
  /* The differences between High GIMPLE and Low GIMPLE are the
     following:
*************** record_vars (tree vars)
*** 950,952 ****
--- 951,1194 ----
  {
    record_vars_into (vars, current_function_decl);
  }
+ 
+ 
+ /* From the bit-field reference tree REF get the offset and size of the
+    underlying non-bit-field object in *OFF and *SIZE, relative to
+    TREE_OPERAND (ref, 0), and the bits referenced of that object in
+    *BIT_OFFSET and *BIT_SIZE.
+ 
+    Return false if this is a reference tree we cannot handle.  */
+ 
+ static bool
+ get_underlying_offset_and_size (tree ref, tree *off, unsigned *size,
+ 				unsigned *bit_offset, unsigned *bit_size,
+ 				tree *type)
+ {
+   tree field;
+ 
+   /* ???  Handle BIT_FIELD_REF as well.  */
+   if (!REFERENCE_CLASS_P (ref)
+       || TREE_CODE (ref) != COMPONENT_REF
+       || !DECL_BIT_FIELD (TREE_OPERAND (ref, 1)))
+     return false;
+ 
+   /* ???  It's surely not for optimization, but we eventually want
+      to canonicalize all bitfield accesses anyway.  */
+   if (TREE_THIS_VOLATILE (ref))
+     return false;
+ 
+   field = TREE_OPERAND (ref, 1);
+ 
+   *off = component_ref_field_offset (ref);
+   *bit_offset = TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (field));
+   *bit_size = TREE_INT_CST_LOW (DECL_SIZE (field));
+ 
+   /* We probably need to walk adjacent preceding FIELD_DECLs of the
+      aggregate, looking for the beginning of the bit-field.  */
+   /* For non-packed structs we could also guess based on
+      DECL_BIT_FIELD_TYPEs size and alignment (if it matches).  */
+   /* Ok, for the DECL_PACKED just allocate a byte-aligned minimal-size
+      chunk that covers the field, for !DECL_PACKED assume we can
+      use the alignment of DECL_BIT_FIELD_TYPE to guess the start of
+      the underlying object and return an aligned chunk of memory.  */
+ 
+   if (!DECL_PACKED (field))
+     {
+       /* What to do for bool bitfields or enum bitfields?
+ 	 For both, expansion does not perform bitfield reduction ...  */
+       /* Maybe just always use a mode-based type?  */
+       if (TREE_CODE (DECL_BIT_FIELD_TYPE (field)) != INTEGER_TYPE)
+ 	*type = build_nonstandard_integer_type
+ 	          (GET_MODE_PRECISION
+ 		     (TYPE_MODE (DECL_BIT_FIELD_TYPE (field))), 1);
+       else
+ 	*type = DECL_BIT_FIELD_TYPE (field);
+ 
+       *size = TREE_INT_CST_LOW (TYPE_SIZE (*type));
+       *off = fold_build2 (PLUS_EXPR, sizetype, *off,
+ 			  size_int ((*bit_offset & ~(*size - 1))
+ 				    / BITS_PER_UNIT));
+       *bit_offset &= (*size - 1);
+ 
+       /* ???  If we have to do two loads, give up for now.  */
+       if (*bit_offset + *bit_size > *size)
+ 	return false;
+ 
+       if (dump_file && (dump_flags & TDF_DETAILS))
+ 	{
+ 	  fprintf (dump_file, "For ");
+ 	  print_generic_expr (dump_file, ref, 0);
+ 	  fprintf (dump_file, " use ");
+ 	  print_generic_expr (dump_file, *off, 0);
+ 	  fprintf (dump_file, " size %d, bit offset %d size %d\n",
+ 		   *size, *bit_offset, *bit_size);
+ 	}
+ 
+       return true;
+     }
+   else
+     {
+       /* FIXME */
+       return false;
+     }
+ }
+ 
+ /* Lower a bitfield store at *GSI to a read-modify-write cycle.  */
+ 
+ static void
+ lower_mem_lhs (gimple_stmt_iterator *gsi)
+ {
+   gimple stmt = gsi_stmt (*gsi);
+   tree lhs;
+   tree off;
+   unsigned size, bit_size, bit_offset;
+   gimple load;
+   tree type, tem, ref, val;
+ 
+   lhs = gimple_assign_lhs (stmt);
+   if (!get_underlying_offset_and_size (lhs, &off, &size, &bit_offset, &bit_size,
+ 				       &type))
+     return;
+ 
+   /* Build a MEM_REF tree that can be used to load/store the word we
+      want to manipulate.
+      ???  Gimplifying here saves us from explicitly handling address-taken
+      stuff in case the access was variable.  */
+   tem = create_tmp_reg (type, "BF");
+   ref = fold_build2 (MEM_REF, type,
+ 		     build_fold_addr_expr
+ 		       (unshare_expr (TREE_OPERAND (lhs, 0))),
+ 		     /* TBAA and bitfields is tricky - various ABIs pack
+ 		        different underlying typed bit-fields together.
+ 			So use the type of the bit-field container instead.  */
+ 		     fold_convert (reference_alias_ptr_type (lhs), off));
+   ref = force_gimple_operand_gsi (gsi, ref, false,
+ 				  NULL_TREE, true, GSI_SAME_STMT);
+ 
+   /* Load the word.  */
+   load = gimple_build_assign (tem, ref);
+   gsi_insert_before (gsi, load, GSI_SAME_STMT);
+ 
+   /* Or the shifted and zero-extended val to the partially cleared
+      loaded value.
+      ???  The old mem-ref branch had BIT_FIELD_EXPR for this, but it
+      had four operands ...
+      ???  Using all the fold stuff makes us handle constants nicely
+      and transparently ...
+      ???  Do we need to think about BITS/BYTES_BIG_ENDIAN here?  */
+   val = gimple_assign_rhs1 (stmt);
+   tem = force_gimple_operand_gsi
+     (gsi,
+      fold_build2 (BIT_IOR_EXPR, type,
+ 		  /* Mask out existing bits.  */
+ 		  fold_build2 (BIT_AND_EXPR, type,
+ 			       tem,
+ 			       double_int_to_tree
+ 			         (type, double_int_not
+ 				         (double_int_lshift
+ 					   (double_int_mask (bit_size),
+ 					    bit_offset, size, false)))),
+ 		  /* Shift val into place.  */
+ 		  fold_build2 (LSHIFT_EXPR, type,
+ 			       /* Zero-extend val to type.  */
+ 			       fold_convert
+ 			         (type,
+ 				  fold_convert
+ 				    (build_nonstandard_integer_type
+ 				       (bit_size, 1), val)),
+ 			       build_int_cst (integer_type_node, bit_offset))),
+      true, tem, true, GSI_SAME_STMT);
+ 
+   /* Modify the old store.  */
+   gimple_assign_set_lhs (stmt, unshare_expr (ref));
+   gimple_assign_set_rhs1 (stmt, tem);
+ }
+ 
+ /* Lower a bitfield load at *GSI.  */
+ 
+ static void
+ lower_mem_rhs (gimple_stmt_iterator *gsi)
+ {
+   gimple stmt = gsi_stmt (*gsi);
+   tree rhs;
+   tree off;
+   unsigned size, bit_size, bit_offset;
+   gimple load;
+   tree type, tem, ref;
+ 
+   rhs = gimple_assign_rhs1 (stmt);
+   if (!get_underlying_offset_and_size (rhs, &off, &size, &bit_offset, &bit_size,
+ 				       &type))
+     return;
+ 
+   /* Build a MEM_REF tree that can be used to load/store the word we
+      want to manipulate.
+      ???  Gimplifying here saves us from explicitly handling address-taken
+      stuff in case the access was variable.  */
+   tem = create_tmp_reg (type, "BF");
+   ref = fold_build2 (MEM_REF, type,
+ 		     build_fold_addr_expr
+ 		       (unshare_expr (TREE_OPERAND (rhs, 0))),
+ 		     /* TBAA and bitfields is tricky - various ABIs pack
+ 		        different underlying typed bit-fields together.
+ 			So use the type of the bit-field container instead.  */
+ 		     fold_convert (reference_alias_ptr_type (rhs), off));
+   ref = force_gimple_operand_gsi (gsi, ref, false,
+ 				  NULL_TREE, true, GSI_SAME_STMT);
+ 
+   /* Load the word.  */
+   load = gimple_build_assign (tem, ref);
+   gsi_insert_before (gsi, load, GSI_SAME_STMT);
+ 
+   /* Shift the value into place and properly zero-/sign-extend it.  */
+   tem = force_gimple_operand_gsi
+     (gsi,
+      fold_convert (TREE_TYPE (rhs),
+ 		   fold_build2 (RSHIFT_EXPR, type, tem,
+ 				build_int_cst (integer_type_node, bit_offset))),
+      false, NULL_TREE, true, GSI_SAME_STMT);
+ 
+   /* Modify the old load.  */
+   gimple_assign_set_rhs_from_tree (gsi, tem);
+ }
+ 
+ /* Lower (some) bitfield accesses.  */
+ 
+ static unsigned int
+ lower_mem_exprs (void)
+ {
+   gimple_seq body = gimple_body (current_function_decl);
+   gimple_stmt_iterator gsi;
+ 
+   for (gsi = gsi_start (body); !gsi_end_p (gsi); gsi_next (&gsi))
+     {
+       gimple stmt = gsi_stmt (gsi);
+       if (gimple_assign_single_p (stmt))
+ 	{
+ 	  lower_mem_lhs (&gsi);
+ 	  lower_mem_rhs (&gsi);
+ 	}
+     }
+ 
+   return 0;
+ }
+ 
+ struct gimple_opt_pass pass_lower_mem =
+ {
+  {
+   GIMPLE_PASS,
+   "memlower",				/* name */
+   NULL,					/* gate */
+   lower_mem_exprs,			/* execute */
+   NULL,					/* sub */
+   NULL,					/* next */
+   0,					/* static_pass_number */
+   TV_NONE,				/* tv_id */
+   PROP_gimple_lcf,			/* properties_required */
+   0/*PROP_gimple_lmem*/,			/* properties_provided */
+   0,					/* properties_destroyed */
+   0,					/* todo_flags_start */
+   TODO_dump_func 			/* todo_flags_finish */
+  }
+ };
Index: gcc/passes.c
===================================================================
*** gcc/passes.c.orig	2011-05-06 10:46:48.000000000 +0200
--- gcc/passes.c	2011-05-06 12:16:07.000000000 +0200
*************** init_optimization_passes (void)
*** 727,732 ****
--- 727,733 ----
    NEXT_PASS (pass_lower_cf);
    NEXT_PASS (pass_refactor_eh);
    NEXT_PASS (pass_lower_eh);
+   NEXT_PASS (pass_lower_mem);
    NEXT_PASS (pass_build_cfg);
    NEXT_PASS (pass_warn_function_return);
    NEXT_PASS (pass_build_cgraph_edges);
Index: gcc/tree-pass.h
===================================================================
*** gcc/tree-pass.h.orig	2011-05-06 10:46:48.000000000 +0200
--- gcc/tree-pass.h	2011-05-06 12:16:07.000000000 +0200
*************** extern void tree_lowering_passes (tree d
*** 352,357 ****
--- 352,358 ----
  extern struct gimple_opt_pass pass_mudflap_1;
  extern struct gimple_opt_pass pass_mudflap_2;
  extern struct gimple_opt_pass pass_lower_cf;
+ extern struct gimple_opt_pass pass_lower_mem;
  extern struct gimple_opt_pass pass_refactor_eh;
  extern struct gimple_opt_pass pass_lower_eh;
  extern struct gimple_opt_pass pass_lower_eh_dispatch;
Index: gcc/fold-const.c
===================================================================
*** gcc/fold-const.c.orig	2011-05-06 10:46:48.000000000 +0200
--- gcc/fold-const.c	2011-05-06 12:16:07.000000000 +0200
*************** fold_binary_loc (location_t loc,
*** 12432,12437 ****
--- 12432,12438 ----
        /* If this is a comparison of a field, we may be able to simplify it.  */
        if ((TREE_CODE (arg0) == COMPONENT_REF
  	   || TREE_CODE (arg0) == BIT_FIELD_REF)
+ 	  && 0
  	  /* Handle the constant case even without -O
  	     to make sure the warnings are given.  */
  	  && (optimize || TREE_CODE (arg1) == INTEGER_CST))
Index: gcc/Makefile.in
===================================================================
*** gcc/Makefile.in.orig	2011-05-06 10:46:48.000000000 +0200
--- gcc/Makefile.in	2011-05-06 12:16:07.000000000 +0200
*************** gimple-low.o : gimple-low.c $(CONFIG_H)
*** 2669,2675 ****
     $(DIAGNOSTIC_H) $(GIMPLE_H) $(TREE_INLINE_H) langhooks.h \
     $(LANGHOOKS_DEF_H) $(TREE_FLOW_H) $(TIMEVAR_H) $(TM_H) coretypes.h \
     $(EXCEPT_H) $(FLAGS_H) $(RTL_H) $(FUNCTION_H) $(EXPR_H) $(TREE_PASS_H) \
!    $(HASHTAB_H) $(DIAGNOSTIC_CORE_H) tree-iterator.h
  omp-low.o : omp-low.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \
     $(RTL_H) $(GIMPLE_H) $(TREE_INLINE_H) langhooks.h $(DIAGNOSTIC_CORE_H) \
     $(TREE_FLOW_H) $(TIMEVAR_H) $(FLAGS_H) $(EXPR_H) $(DIAGNOSTIC_CORE_H) \
--- 2669,2675 ----
     $(DIAGNOSTIC_H) $(GIMPLE_H) $(TREE_INLINE_H) langhooks.h \
     $(LANGHOOKS_DEF_H) $(TREE_FLOW_H) $(TIMEVAR_H) $(TM_H) coretypes.h \
     $(EXCEPT_H) $(FLAGS_H) $(RTL_H) $(FUNCTION_H) $(EXPR_H) $(TREE_PASS_H) \
!    $(HASHTAB_H) $(DIAGNOSTIC_CORE_H) tree-iterator.h tree-pretty-print.h
  omp-low.o : omp-low.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \
     $(RTL_H) $(GIMPLE_H) $(TREE_INLINE_H) langhooks.h $(DIAGNOSTIC_CORE_H) \
     $(TREE_FLOW_H) $(TIMEVAR_H) $(FLAGS_H) $(EXPR_H) $(DIAGNOSTIC_CORE_H) \
Index: gcc/expr.c
===================================================================
*** gcc/expr.c.orig	2011-05-06 10:46:48.000000000 +0200
--- gcc/expr.c	2011-05-06 12:16:07.000000000 +0200
*************** expand_expr_real_2 (sepops ops, rtx targ
*** 7264,7270 ****
    /* An operation in what may be a bit-field type needs the
       result to be reduced to the precision of the bit-field type,
       which is narrower than that of the type's mode.  */
!   reduce_bit_field = (TREE_CODE (type) == INTEGER_TYPE
  		      && GET_MODE_PRECISION (mode) > TYPE_PRECISION (type));
  
    if (reduce_bit_field && modifier == EXPAND_STACK_PARM)
--- 7264,7270 ----
    /* An operation in what may be a bit-field type needs the
       result to be reduced to the precision of the bit-field type,
       which is narrower than that of the type's mode.  */
!   reduce_bit_field = (INTEGRAL_TYPE_P (type)
  		      && GET_MODE_PRECISION (mode) > TYPE_PRECISION (type));
  
    if (reduce_bit_field && modifier == EXPAND_STACK_PARM)
*************** expand_expr_real_1 (tree exp, rtx target
*** 8330,8336 ****
       result to be reduced to the precision of the bit-field type,
       which is narrower than that of the type's mode.  */
    reduce_bit_field = (!ignore
! 		      && TREE_CODE (type) == INTEGER_TYPE
  		      && GET_MODE_PRECISION (mode) > TYPE_PRECISION (type));
  
    /* If we are going to ignore this result, we need only do something
--- 8330,8336 ----
       result to be reduced to the precision of the bit-field type,
       which is narrower than that of the type's mode.  */
    reduce_bit_field = (!ignore
! 		      && INTEGRAL_TYPE_P (type)
  		      && GET_MODE_PRECISION (mode) > TYPE_PRECISION (type));
  
    /* If we are going to ignore this result, we need only do something
Index: gcc/gimple.c
===================================================================
*** gcc/gimple.c.orig	2011-04-21 16:33:55.000000000 +0200
--- gcc/gimple.c	2011-05-06 14:14:01.000000000 +0200
*************** canonicalize_cond_expr_cond (tree t)
*** 3137,3152 ****
        && truth_value_p (TREE_CODE (TREE_OPERAND (t, 0))))
      t = TREE_OPERAND (t, 0);
  
-   /* For (bool)x use x != 0.  */
-   if (CONVERT_EXPR_P (t)
-       && TREE_CODE (TREE_TYPE (t)) == BOOLEAN_TYPE)
-     {
-       tree top0 = TREE_OPERAND (t, 0);
-       t = build2 (NE_EXPR, TREE_TYPE (t),
- 		  top0, build_int_cst (TREE_TYPE (top0), 0));
-     }
    /* For !x use x == 0.  */
!   else if (TREE_CODE (t) == TRUTH_NOT_EXPR)
      {
        tree top0 = TREE_OPERAND (t, 0);
        t = build2 (EQ_EXPR, TREE_TYPE (t),
--- 3137,3144 ----
        && truth_value_p (TREE_CODE (TREE_OPERAND (t, 0))))
      t = TREE_OPERAND (t, 0);
  
    /* For !x use x == 0.  */
!   if (TREE_CODE (t) == TRUTH_NOT_EXPR)
      {
        tree top0 = TREE_OPERAND (t, 0);
        t = build2 (EQ_EXPR, TREE_TYPE (t),
Index: gcc/testsuite/gcc.dg/tree-ssa/pr45144.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/pr45144.c.orig	2011-04-20 10:52:16.000000000 +0200
--- gcc/testsuite/gcc.dg/tree-ssa/pr45144.c	2011-05-06 13:51:29.000000000 +0200
*************** union TMP
*** 22,47 ****
  static unsigned
  foo (struct A *p)
  {
!   union TMP t;
!   struct A x;
    
!   x = *p;
!   t.a = x;
!   return t.b;
  }
  
  void
  bar (unsigned orig, unsigned *new)
  {
!   struct A a;
!   union TMP s;
  
!   s.b = orig;
!   a = s.a;
!   if (a.a1)
!     baz (a.a2);
!   *new = foo (&a);
  }
  
! /* { dg-final { scan-tree-dump " = VIEW_CONVERT_EXPR<unsigned int>\\(a\\);" "optimized"} } */
  /* { dg-final { cleanup-tree-dump "optimized" } } */
--- 22,48 ----
  static unsigned
  foo (struct A *p)
  {
!   union TMP tmpvar;
!   struct A avar;
    
!   avar = *p;
!   tmpvar.a = avar;
!   return tmpvar.b;
  }
  
  void
  bar (unsigned orig, unsigned *new)
  {
!   struct A avar;
!   union TMP tmpvar;
  
!   tmpvar.b = orig;
!   avar = tmpvar.a;
!   if (avar.a1)
!     baz (avar.a2);
!   *new = foo (&avar);
  }
  
! /* { dg-final { scan-tree-dump-not "avar" "optimized"} } */
! /* { dg-final { scan-tree-dump-not "tmpvar" "optimized"} } */
  /* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.target/i386/bitfield4.c
===================================================================
*** /dev/null	1970-01-01 00:00:00.000000000 +0000
--- gcc/testsuite/gcc.target/i386/bitfield4.c	2011-05-06 15:07:41.000000000 +0200
***************
*** 0 ****
--- 1,19 ----
+ /* PR48696 */
+ /* { dg-do compile } */
+ /* { dg-options "-O" } */
+ 
+ struct bad_gcc_code_generation {
+     unsigned type:6,
+ 	     pos:16,
+ 	     stream:10;
+ };
+ 
+ int
+ show_bug(struct bad_gcc_code_generation *a)
+ {
+   /* Avoid store-forwarding failure due to access size mismatch.  */
+   a->type = 0;
+   return a->pos;
+ }
+ 
+ /* { dg-final { scan-assembler-not "andb" } } */
Index: gcc/tree-ssa-forwprop.c
===================================================================
*** gcc/tree-ssa-forwprop.c.orig	2011-05-04 11:07:08.000000000 +0200
--- gcc/tree-ssa-forwprop.c	2011-05-06 17:00:44.000000000 +0200
*************** out:
*** 1938,1943 ****
--- 1938,2103 ----
    return false;
  }
  
+ /* Combine two conversions in a row for the second conversion at *GSI.
+    Returns true if there were any changes made.  */
+  
+ static bool
+ combine_conversions (gimple_stmt_iterator *gsi)
+ {
+   gimple stmt = gsi_stmt (*gsi);
+   gimple def_stmt;
+   tree op0, lhs;
+   enum tree_code code = gimple_assign_rhs_code (stmt);
+ 
+   gcc_checking_assert (CONVERT_EXPR_CODE_P (code)
+ 		       || code == FLOAT_EXPR
+ 		       || code == FIX_TRUNC_EXPR);
+ 
+   lhs = gimple_assign_lhs (stmt);
+   op0 = gimple_assign_rhs1 (stmt);
+   if (useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (op0)))
+     {
+       gimple_assign_set_rhs_code (stmt, TREE_CODE (op0));
+       return true;
+     }
+ 
+   if (TREE_CODE (op0) != SSA_NAME)
+     return false;
+ 
+   def_stmt = SSA_NAME_DEF_STMT (op0);
+   if (!is_gimple_assign (def_stmt))
+     return false;
+ 
+   if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt)))
+     {
+       tree defop0 = gimple_assign_rhs1 (def_stmt);
+       tree type = TREE_TYPE (lhs);
+       tree inside_type = TREE_TYPE (defop0);
+       tree inter_type = TREE_TYPE (op0);
+       int inside_int = INTEGRAL_TYPE_P (inside_type);
+       int inside_ptr = POINTER_TYPE_P (inside_type);
+       int inside_float = FLOAT_TYPE_P (inside_type);
+       int inside_vec = TREE_CODE (inside_type) == VECTOR_TYPE;
+       unsigned int inside_prec = TYPE_PRECISION (inside_type);
+       int inside_unsignedp = TYPE_UNSIGNED (inside_type);
+       int inter_int = INTEGRAL_TYPE_P (inter_type);
+       int inter_ptr = POINTER_TYPE_P (inter_type);
+       int inter_float = FLOAT_TYPE_P (inter_type);
+       int inter_vec = TREE_CODE (inter_type) == VECTOR_TYPE;
+       unsigned int inter_prec = TYPE_PRECISION (inter_type);
+       int inter_unsignedp = TYPE_UNSIGNED (inter_type);
+       int final_int = INTEGRAL_TYPE_P (type);
+       int final_ptr = POINTER_TYPE_P (type);
+       int final_float = FLOAT_TYPE_P (type);
+       int final_vec = TREE_CODE (type) == VECTOR_TYPE;
+       unsigned int final_prec = TYPE_PRECISION (type);
+       int final_unsignedp = TYPE_UNSIGNED (type);
+ 
+       /* In addition to the cases of two conversions in a row
+ 	 handled below, if we are converting something to its own
+ 	 type via an object of identical or wider precision, neither
+ 	 conversion is needed.  */
+       if (useless_type_conversion_p (type, inside_type)
+ 	  && (((inter_int || inter_ptr) && final_int)
+ 	      || (inter_float && final_float))
+ 	  && inter_prec >= final_prec)
+ 	{
+ 	  gimple_assign_set_rhs1 (stmt, unshare_expr (defop0));
+ 	  gimple_assign_set_rhs_code (stmt, TREE_CODE (defop0));
+ 	  update_stmt (stmt);
+ 	  return true;
+ 	}
+ 
+       /* Likewise, if the intermediate and initial types are either both
+ 	 float or both integer, we don't need the middle conversion if the
+ 	 former is wider than the latter and doesn't change the signedness
+ 	 (for integers).  Avoid this if the final type is a pointer since
+ 	 then we sometimes need the middle conversion.  Likewise if the
+ 	 final type has a precision not equal to the size of its mode.  */
+       if (((inter_int && inside_int)
+ 	   || (inter_float && inside_float)
+ 	   || (inter_vec && inside_vec))
+ 	  && inter_prec >= inside_prec
+ 	  && (inter_float || inter_vec
+ 	      || inter_unsignedp == inside_unsignedp)
+ 	  && ! (final_prec != GET_MODE_BITSIZE (TYPE_MODE (type))
+ 		&& TYPE_MODE (type) == TYPE_MODE (inter_type))
+ 	  && ! final_ptr
+ 	  && (! final_vec || inter_prec == inside_prec))
+ 	{
+ 	  gimple_assign_set_rhs1 (stmt, defop0);
+ 	  update_stmt (stmt);
+ 	  return true;
+ 	}
+ 
+       /* If we have a sign-extension of a zero-extended value, we can
+ 	 replace that by a single zero-extension.  */
+       if (inside_int && inter_int && final_int
+ 	  && inside_prec < inter_prec && inter_prec < final_prec
+ 	  && inside_unsignedp && !inter_unsignedp)
+ 	{
+ 	  gimple_assign_set_rhs1 (stmt, defop0);
+ 	  update_stmt (stmt);
+ 	  return true;
+ 	}
+ 
+       /* Two conversions in a row are not needed unless:
+ 	 - some conversion is floating-point (overstrict for now), or
+ 	 - some conversion is a vector (overstrict for now), or
+ 	 - the intermediate type is narrower than both initial and
+ 	 final, or
+ 	 - the intermediate type and innermost type differ in signedness,
+ 	 and the outermost type is wider than the intermediate, or
+ 	 - the initial type is a pointer type and the precisions of the
+ 	 intermediate and final types differ, or
+ 	 - the final type is a pointer type and the precisions of the
+ 	 initial and intermediate types differ.  */
+       if (! inside_float && ! inter_float && ! final_float
+ 	  && ! inside_vec && ! inter_vec && ! final_vec
+ 	  && (inter_prec >= inside_prec || inter_prec >= final_prec)
+ 	  && ! (inside_int && inter_int
+ 		&& inter_unsignedp != inside_unsignedp
+ 		&& inter_prec < final_prec)
+ 	  && ((inter_unsignedp && inter_prec > inside_prec)
+ 	      == (final_unsignedp && final_prec > inter_prec))
+ 	  && ! (inside_ptr && inter_prec != final_prec)
+ 	  && ! (final_ptr && inside_prec != inter_prec)
+ 	  && ! (final_prec != GET_MODE_BITSIZE (TYPE_MODE (type))
+ 		&& TYPE_MODE (type) == TYPE_MODE (inter_type)))
+ 	{
+ 	  gimple_assign_set_rhs1 (stmt, defop0);
+ 	  update_stmt (stmt);
+ 	  return true;
+ 	}
+ 
+       /* A truncation to an unsigned type should be canonicalized as
+ 	 bitwise and of a mask.  */
+       if (final_int && inter_int && inside_int
+ 	  && final_prec == inside_prec
+ 	  && final_prec > inter_prec
+ 	  && inter_unsignedp)
+ 	{
+ 	  tree tem;
+ 	  tem = fold_build2 (BIT_AND_EXPR, inside_type,
+ 			     defop0,
+ 			     double_int_to_tree
+ 			       (inside_type, double_int_mask (inter_prec)));
+ 	  if (!useless_type_conversion_p (type, inside_type))
+ 	    {
+ 	      tem = force_gimple_operand_gsi (gsi, tem, true, NULL_TREE, true,
+ 					      GSI_SAME_STMT);
+ 	      gimple_assign_set_rhs1 (stmt, tem);
+ 	    }
+ 	  else
+ 	    gimple_assign_set_rhs_from_tree (gsi, tem);
+ 	  update_stmt (gsi_stmt (*gsi));
+ 	  return true;
+ 	}
+     }
+ 
+   return false;
+ }
+ 
  /* Main entry point for the forward propagation optimizer.  */
  
  static unsigned int
*************** tree_ssa_forward_propagate_single_use_va
*** 2061,2066 ****
--- 2221,2233 ----
  		  cfg_changed |= associate_plusminus (stmt);
  		  gsi_next (&gsi);
  		}
+ 	      else if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt))
+ 		       || gimple_assign_rhs_code (stmt) == FLOAT_EXPR
+ 		       || gimple_assign_rhs_code (stmt) == FIX_TRUNC_EXPR)
+ 		{
+ 		  if (!combine_conversions (&gsi))
+ 		    gsi_next (&gsi);
+ 		}
  	      else
  		gsi_next (&gsi);
  	    }


* Re: [C++0x] contiguous bitfields race implementation
  2011-05-09 20:49     ` Jason Merrill
@ 2011-05-13 22:35       ` Aldy Hernandez
  2011-05-16 21:20         ` Aldy Hernandez
  2011-05-19  7:17         ` Jason Merrill
  0 siblings, 2 replies; 81+ messages in thread
From: Aldy Hernandez @ 2011-05-13 22:35 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Jeff Law, gcc-patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 1148 bytes --]

On 05/09/11 14:23, Jason Merrill wrote:
>  From a quick look it seems that this patch considers bitfields
> following the one we're deliberately touching, but not previous
> bitfields in the same memory location; we need to include those as well.
> With your struct foo, the bits touched are the same regardless of
> whether we name .a or .b.

Thanks all for looking into this.

Attached is a new patch that takes into account the previous bitfields 
as well.

If I understand Jakub correctly, this patch also fixes the relative bit 
position problem he pointed out, by virtue of no longer using a relative 
bit position but the difference between the start of the memory region 
and the end.
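
In other words, for a run of contiguous bitfields starting at bit S,
with the next non-bitfield member (or the end of the structure) at bit
E, the limit is just E - S.  A worked example with made-up positions:

  start_bitpos = 0;                 /* first bitfield of the run */
  bitpos = 8;                       /* start of the next non-bitfield */
  maxbits = bitpos - start_bitpos;  /* = 8: bits 0..7 may be touched */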

I have hand-tested various bitfield combinations (at the beginning, at
the end, in the middle, with and without zero-length bitfields, and so
on).  There is also the generic x86 test from the previous incarnation
of the patch.

Bootstrapped without any issues.  Running the entire testsuite with 
--param=allow-store-data-races=0 is still in progress.

How does this one look?

p.s. I would like to address BIT_FIELD_REF in a followup patch to keep 
things simple.

[-- Attachment #2: curr --]
[-- Type: text/plain, Size: 27777 bytes --]

	* params.h (ALLOW_STORE_DATA_RACES): New.
	* params.def (PARAM_ALLOW_STORE_DATA_RACES): New.
	* Makefile.in (expr.o): Depend on PARAMS_H.
	* machmode.h (get_best_mode): Add argument.
	* fold-const.c (optimize_bit_field_compare): Add argument to
	get_best_mode.
	(fold_truthop): Same.
	* ifcvt.c (noce_emit_move_insn): Add argument to store_bit_field.
	* expr.c (emit_group_store): Same.
	(copy_blkmode_from_reg): Same.
	(write_complex_part): Same.
	(optimize_bitfield_assignment_op): Add argument.
	Add argument to get_best_mode.
	(max_field_size): New.
	(expand_assignment): Calculate maxbits and pass it down
	accordingly.
	(store_field): New argument.
	(expand_expr_real_2): New argument to store_field.
	Include params.h.
	* expr.h (store_bit_field): New argument.
	* stor-layout.c (get_best_mode): Restrict mode expansion by taking
	into account maxbits.
	* calls.c (store_unaligned_arguments_into_pseudos): New argument
	to store_bit_field.
	* expmed.c (store_bit_field_1): New argument.  Use it.
	(store_bit_field): Same.
	(store_fixed_bit_field): Same.
	(store_split_bit_field): Same.
	(extract_bit_field_1): Pass new argument to get_best_mode.
	(extract_bit_field): Same.
	* stmt.c (store_bit_field): Pass new argument to store_bit_field.
	* tree.h (DECL_THREAD_VISIBLE_P): New.
	* doc/invoke.texi: Document parameter allow-store-data-races.

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 173263)
+++ doc/invoke.texi	(working copy)
@@ -8886,6 +8886,11 @@ The maximum number of conditional stores
 if either vectorization (@option{-ftree-vectorize}) or if-conversion
 (@option{-ftree-loop-if-convert}) is disabled.  The default is 2.
 
+@item allow-store-data-races
+Allow optimizers to introduce new data races on stores.
+Set to 1 to allow, otherwise to 0.  This option is enabled by default
+unless implicitly set by the @option{-fmemory-model=} option.
+
 @end table
 @end table
 
Index: machmode.h
===================================================================
--- machmode.h	(revision 173263)
+++ machmode.h	(working copy)
@@ -248,7 +248,9 @@ extern enum machine_mode mode_for_vector
 
 /* Find the best mode to use to access a bit field.  */
 
-extern enum machine_mode get_best_mode (int, int, unsigned int,
+extern enum machine_mode get_best_mode (int, int,
+					unsigned HOST_WIDE_INT,
+					unsigned int,
 					enum machine_mode, int);
 
 /* Determine alignment, 1<=result<=BIGGEST_ALIGNMENT.  */
Index: tree.h
===================================================================
--- tree.h	(revision 173263)
+++ tree.h	(working copy)
@@ -3156,6 +3156,10 @@ struct GTY(()) tree_parm_decl {
 #define DECL_THREAD_LOCAL_P(NODE) \
   (VAR_DECL_CHECK (NODE)->decl_with_vis.tls_model >= TLS_MODEL_REAL)
 
+/* Return true if a VAR_DECL is visible from another thread.  */
+#define DECL_THREAD_VISIBLE_P(NODE) \
+  (TREE_STATIC (NODE) && !DECL_THREAD_LOCAL_P (NODE))
+
 /* In a non-local VAR_DECL with static storage duration, true if the
    variable has an initialization priority.  If false, the variable
    will be initialized at the DEFAULT_INIT_PRIORITY.  */
Index: fold-const.c
===================================================================
--- fold-const.c	(revision 173263)
+++ fold-const.c	(working copy)
@@ -3409,7 +3409,7 @@ optimize_bit_field_compare (location_t l
       && flag_strict_volatile_bitfields > 0)
     nmode = lmode;
   else
-    nmode = get_best_mode (lbitsize, lbitpos,
+    nmode = get_best_mode (lbitsize, lbitpos, 0,
 			   const_p ? TYPE_ALIGN (TREE_TYPE (linner))
 			   : MIN (TYPE_ALIGN (TREE_TYPE (linner)),
 				  TYPE_ALIGN (TREE_TYPE (rinner))),
@@ -5237,7 +5237,7 @@ fold_truthop (location_t loc, enum tree_
      to be relative to a field of that size.  */
   first_bit = MIN (ll_bitpos, rl_bitpos);
   end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
-  lnmode = get_best_mode (end_bit - first_bit, first_bit,
+  lnmode = get_best_mode (end_bit - first_bit, first_bit, 0,
 			  TYPE_ALIGN (TREE_TYPE (ll_inner)), word_mode,
 			  volatilep);
   if (lnmode == VOIDmode)
@@ -5302,7 +5302,7 @@ fold_truthop (location_t loc, enum tree_
 
       first_bit = MIN (lr_bitpos, rr_bitpos);
       end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
-      rnmode = get_best_mode (end_bit - first_bit, first_bit,
+      rnmode = get_best_mode (end_bit - first_bit, first_bit, 0,
 			      TYPE_ALIGN (TREE_TYPE (lr_inner)), word_mode,
 			      volatilep);
       if (rnmode == VOIDmode)
Index: params.h
===================================================================
--- params.h	(revision 173263)
+++ params.h	(working copy)
@@ -206,4 +206,6 @@ extern void init_param_values (int *para
   PARAM_VALUE (PARAM_MIN_NONDEBUG_INSN_UID)
 #define MAX_STORES_TO_SINK \
   PARAM_VALUE (PARAM_MAX_STORES_TO_SINK)
+#define ALLOW_STORE_DATA_RACES \
+  PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES)
 #endif /* ! GCC_PARAMS_H */
Index: testsuite/gcc.dg/20110509.c
===================================================================
--- testsuite/gcc.dg/20110509.c	(revision 0)
+++ testsuite/gcc.dg/20110509.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+/* Test that we don't store past VAR.A.  */
+
+struct S
+{
+  volatile unsigned int a : 4;
+  unsigned char b;
+  unsigned int c : 6;
+} var;
+
+void set_a()
+{
+  var.a = 12;
+}
+
+/* { dg-final { scan-assembler-not "movl.*, var" } } */
Index: ifcvt.c
===================================================================
--- ifcvt.c	(revision 173263)
+++ ifcvt.c	(working copy)
@@ -885,7 +885,7 @@ noce_emit_move_insn (rtx x, rtx y)
 		}
 
 	      gcc_assert (start < (MEM_P (op) ? BITS_PER_UNIT : BITS_PER_WORD));
-	      store_bit_field (op, size, start, GET_MODE (x), y);
+	      store_bit_field (op, size, start, 0, GET_MODE (x), y);
 	      return;
 	    }
 
@@ -939,7 +939,7 @@ noce_emit_move_insn (rtx x, rtx y)
   inner = XEXP (outer, 0);
   outmode = GET_MODE (outer);
   bitpos = SUBREG_BYTE (outer) * BITS_PER_UNIT;
-  store_bit_field (inner, GET_MODE_BITSIZE (outmode), bitpos, outmode, y);
+  store_bit_field (inner, GET_MODE_BITSIZE (outmode), bitpos, 0, outmode, y);
 }
 
 /* Return sequence of instructions generated by if conversion.  This
Index: expr.c
===================================================================
--- expr.c	(revision 173263)
+++ expr.c	(working copy)
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  
 #include "diagnostic.h"
 #include "ssaexpand.h"
 #include "target-globals.h"
+#include "params.h"
 
 /* Decide whether a function's arguments should be processed
    from first to last or from last to first.
@@ -142,7 +143,8 @@ static void store_constructor_field (rtx
 				     HOST_WIDE_INT, enum machine_mode,
 				     tree, tree, int, alias_set_type);
 static void store_constructor (tree, rtx, int, HOST_WIDE_INT);
-static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT, enum machine_mode,
+static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT,
+			unsigned HOST_WIDE_INT, enum machine_mode,
 			tree, tree, alias_set_type, bool);
 
 static unsigned HOST_WIDE_INT highest_pow2_factor_for_target (const_tree, const_tree);
@@ -2063,7 +2065,7 @@ emit_group_store (rtx orig_dst, rtx src,
 	emit_move_insn (adjust_address (dest, mode, bytepos), tmps[i]);
       else
 	store_bit_field (dest, bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
-			 mode, tmps[i]);
+			 0, mode, tmps[i]);
     }
 
   /* Copy from the pseudo into the (probable) hard reg.  */
@@ -2157,7 +2159,7 @@ copy_blkmode_from_reg (rtx tgtblk, rtx s
 
       /* Use xbitpos for the source extraction (right justified) and
 	 bitpos for the destination store (left justified).  */
-      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, copy_mode,
+      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, 0, copy_mode,
 		       extract_bit_field (src, bitsize,
 					  xbitpos % BITS_PER_WORD, 1, false,
 					  NULL_RTX, copy_mode, copy_mode));
@@ -2794,7 +2796,7 @@ write_complex_part (rtx cplx, rtx val, b
 	gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
     }
 
-  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, imode, val);
+  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, 0, imode, val);
 }
 
 /* Extract one of the components of the complex value CPLX.  Extract the
@@ -3929,6 +3931,7 @@ get_subtarget (rtx x)
 static bool
 optimize_bitfield_assignment_op (unsigned HOST_WIDE_INT bitsize,
 				 unsigned HOST_WIDE_INT bitpos,
+				 unsigned HOST_WIDE_INT maxbits,
 				 enum machine_mode mode1, rtx str_rtx,
 				 tree to, tree src)
 {
@@ -3989,7 +3992,7 @@ optimize_bitfield_assignment_op (unsigne
 
       if (str_bitsize == 0 || str_bitsize > BITS_PER_WORD)
 	str_mode = word_mode;
-      str_mode = get_best_mode (bitsize, bitpos,
+      str_mode = get_best_mode (bitsize, bitpos, maxbits,
 				MEM_ALIGN (str_rtx), str_mode, 0);
       if (str_mode == VOIDmode)
 	return false;
@@ -4098,6 +4101,103 @@ optimize_bitfield_assignment_op (unsigne
   return false;
 }
 
+/* In the C++ memory model, consecutive bit fields in a structure are
+   considered one memory location.
+
+   Given a COMPONENT_REF, this function returns the maximum number of
+   bits we are allowed to store into when storing into the
+   COMPONENT_REF.  We return 0 if there is no restriction.
+
+   EXP is the COMPONENT_REF.
+   INNERDECL is the actual object being referenced.
+   BITPOS is the position in bits where the bit field starts within the structure.
+   BITSIZE is the size in bits of the field being referenced in EXP.
+
+   For example, while storing into FOO.A here...
+
+      struct {
+        BIT 0:
+          unsigned int a : 4;
+	  unsigned int b : 1;
+	BIT 8:
+	  unsigned char c;
+	  unsigned int d : 6;
+      } foo;
+
+   ...we are not allowed to store past <b>, so for the layout above,
+   we would return a maximum of 8 bits (because who cares if we store
+   into the padding).  */
+
+static unsigned HOST_WIDE_INT
+max_field_size (tree exp, tree innerdecl,
+		HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize)
+{
+  tree field, record_type, fld;
+  bool found_field = false;
+  bool prev_field_is_bitfield;
+  /* Starting bitpos for the current memory location.  */
+  int start_bitpos;
+
+  gcc_assert (TREE_CODE (exp) == COMPONENT_REF);
+
+  /* If other threads can't see this value, no need to restrict stores.  */
+  if (ALLOW_STORE_DATA_RACES
+      || !DECL_THREAD_VISIBLE_P (innerdecl))
+    return 0;
+
+  /* Bit field we're storing into.  */
+  field = TREE_OPERAND (exp, 1);
+  record_type = DECL_FIELD_CONTEXT (field);
+
+  /* Count the contiguous bitfields for the memory location that
+     contains FIELD.  */
+  start_bitpos = 0;
+  prev_field_is_bitfield = true;
+  for (fld = TYPE_FIELDS (record_type); fld; fld = DECL_CHAIN (fld))
+    {
+      tree t, offset;
+      enum machine_mode mode;
+      int unsignedp, volatilep;
+
+      if (TREE_CODE (fld) != FIELD_DECL)
+	continue;
+
+      t = build3 (COMPONENT_REF, TREE_TYPE (exp),
+		  unshare_expr (TREE_OPERAND (exp, 0)),
+		  fld, NULL_TREE);
+      get_inner_reference (t, &bitsize, &bitpos, &offset,
+			   &mode, &unsignedp, &volatilep, true);
+
+      if (field == fld)
+	found_field = true;
+
+      if (DECL_BIT_FIELD (fld) && bitsize > 0)
+	{
+	  if (prev_field_is_bitfield == false)
+	    {
+	      start_bitpos = bitpos;
+	      prev_field_is_bitfield = true;
+	    }
+	}
+      else
+	{
+	  prev_field_is_bitfield = false;
+	  if (found_field)
+	    break;
+	}
+    }
+  gcc_assert (found_field);
+
+  if (fld)
+    {
+      /* We found the end of the bit field sequence.  Include the
+	 padding up to the next field and be done.  */
+      return bitpos - start_bitpos;
+    }
+  /* If this is the last element in the structure, include the padding
+     at the end of the structure.  */
+  return TREE_INT_CST_LOW (TYPE_SIZE (record_type)) - start_bitpos;
+}
 
 /* Expand an assignment that stores the value of FROM into TO.  If NONTEMPORAL
    is true, try generating a nontemporal store.  */
@@ -4197,6 +4297,9 @@ expand_assignment (tree to, tree from, b
     {
       enum machine_mode mode1;
       HOST_WIDE_INT bitsize, bitpos;
+      /* Max consecutive bits we are allowed to touch while storing
+	 into TO.  */
+      HOST_WIDE_INT maxbits = 0;
       tree offset;
       int unsignedp;
       int volatilep = 0;
@@ -4206,6 +4309,10 @@ expand_assignment (tree to, tree from, b
       tem = get_inner_reference (to, &bitsize, &bitpos, &offset, &mode1,
 				 &unsignedp, &volatilep, true);
 
+      if (TREE_CODE (to) == COMPONENT_REF
+	  && DECL_BIT_FIELD (TREE_OPERAND (to, 1)))
+	maxbits = max_field_size (to, tem, bitpos, bitsize);
+
       /* If we are going to use store_bit_field and extract_bit_field,
 	 make sure to_rtx will be safe for multiple use.  */
 
@@ -4286,12 +4393,13 @@ expand_assignment (tree to, tree from, b
 	    result = store_expr (from, XEXP (to_rtx, bitpos != 0), false,
 				 nontemporal);
 	  else if (bitpos + bitsize <= mode_bitsize / 2)
-	    result = store_field (XEXP (to_rtx, 0), bitsize, bitpos,
+	    result = store_field (XEXP (to_rtx, 0), bitsize, bitpos, maxbits,
 				  mode1, from, TREE_TYPE (tem),
 				  get_alias_set (to), nontemporal);
 	  else if (bitpos >= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 1), bitsize,
-				  bitpos - mode_bitsize / 2, mode1, from,
+				  bitpos - mode_bitsize / 2, maxbits,
+				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
 	  else if (bitpos == 0 && bitsize == mode_bitsize)
@@ -4312,7 +4420,8 @@ expand_assignment (tree to, tree from, b
 					    0);
 	      write_complex_part (temp, XEXP (to_rtx, 0), false);
 	      write_complex_part (temp, XEXP (to_rtx, 1), true);
-	      result = store_field (temp, bitsize, bitpos, mode1, from,
+	      result = store_field (temp, bitsize, bitpos, maxbits,
+				    mode1, from,
 				    TREE_TYPE (tem), get_alias_set (to),
 				    nontemporal);
 	      emit_move_insn (XEXP (to_rtx, 0), read_complex_part (temp, false));
@@ -4337,11 +4446,12 @@ expand_assignment (tree to, tree from, b
 		MEM_KEEP_ALIAS_SET_P (to_rtx) = 1;
 	    }
 
-	  if (optimize_bitfield_assignment_op (bitsize, bitpos, mode1,
+	  if (optimize_bitfield_assignment_op (bitsize, bitpos, maxbits, mode1,
 					       to_rtx, to, from))
 	    result = NULL;
 	  else
-	    result = store_field (to_rtx, bitsize, bitpos, mode1, from,
+	    result = store_field (to_rtx, bitsize, bitpos, maxbits,
+				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
 	}
@@ -4734,7 +4844,7 @@ store_expr (tree exp, rtx target, int ca
 			      : BLOCK_OP_NORMAL));
 	  else if (GET_MODE (target) == BLKmode)
 	    store_bit_field (target, INTVAL (expr_size (exp)) * BITS_PER_UNIT,
-			     0, GET_MODE (temp), temp);
+			     0, 0, GET_MODE (temp), temp);
 	  else
 	    convert_move (target, temp, unsignedp);
 	}
@@ -5177,7 +5287,8 @@ store_constructor_field (rtx target, uns
       store_constructor (exp, target, cleared, bitsize / BITS_PER_UNIT);
     }
   else
-    store_field (target, bitsize, bitpos, mode, exp, type, alias_set, false);
+    store_field (target, bitsize, bitpos, 0, mode, exp, type, alias_set,
+		 false);
 }
 
 /* Store the value of constructor EXP into the rtx TARGET.
@@ -5751,6 +5862,8 @@ store_constructor (tree exp, rtx target,
    BITSIZE bits, starting BITPOS bits from the start of TARGET.
    If MODE is VOIDmode, it means that we are storing into a bit-field.
 
+   MAXBITS is the number of bits we can store into, 0 if no limit.
+
    Always return const0_rtx unless we have something particular to
    return.
 
@@ -5764,6 +5877,7 @@ store_constructor (tree exp, rtx target,
 
 static rtx
 store_field (rtx target, HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
+	     unsigned HOST_WIDE_INT maxbits,
 	     enum machine_mode mode, tree exp, tree type,
 	     alias_set_type alias_set, bool nontemporal)
 {
@@ -5796,8 +5910,8 @@ store_field (rtx target, HOST_WIDE_INT b
       if (bitsize != (HOST_WIDE_INT) GET_MODE_BITSIZE (GET_MODE (target)))
 	emit_move_insn (object, target);
 
-      store_field (blk_object, bitsize, bitpos, mode, exp, type, alias_set,
-		   nontemporal);
+      store_field (blk_object, bitsize, bitpos, maxbits,
+		   mode, exp, type, alias_set, nontemporal);
 
       emit_move_insn (target, object);
 
@@ -5911,7 +6025,7 @@ store_field (rtx target, HOST_WIDE_INT b
 	}
 
       /* Store the value in the bitfield.  */
-      store_bit_field (target, bitsize, bitpos, mode, temp);
+      store_bit_field (target, bitsize, bitpos, maxbits, mode, temp);
 
       return const0_rtx;
     }
@@ -7323,7 +7437,7 @@ expand_expr_real_2 (sepops ops, rtx targ
 						    (treeop0))
 				 * BITS_PER_UNIT),
 				(HOST_WIDE_INT) GET_MODE_BITSIZE (mode)),
-			   0, TYPE_MODE (valtype), treeop0,
+			   0, 0, TYPE_MODE (valtype), treeop0,
 			   type, 0, false);
 	    }
 
Index: expr.h
===================================================================
--- expr.h	(revision 173263)
+++ expr.h	(working copy)
@@ -665,7 +665,8 @@ extern enum machine_mode
 mode_for_extraction (enum extraction_pattern, int);
 
 extern void store_bit_field (rtx, unsigned HOST_WIDE_INT,
-			     unsigned HOST_WIDE_INT, enum machine_mode, rtx);
+			     unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
+			     enum machine_mode, rtx);
 extern rtx extract_bit_field (rtx, unsigned HOST_WIDE_INT,
 			      unsigned HOST_WIDE_INT, int, bool, rtx,
 			      enum machine_mode, enum machine_mode);
Index: stor-layout.c
===================================================================
--- stor-layout.c	(revision 173263)
+++ stor-layout.c	(working copy)
@@ -2428,6 +2428,9 @@ fixup_unsigned_type (tree type)
 /* Find the best machine mode to use when referencing a bit field of length
    BITSIZE bits starting at BITPOS.
 
+   MAXBITS is the maximum number of bits we are allowed to touch, when
+   referencing this bit field.  MAXBITS is 0 if there is no limit.
+
    The underlying object is known to be aligned to a boundary of ALIGN bits.
    If LARGEST_MODE is not VOIDmode, it means that we should not use a mode
    larger than LARGEST_MODE (usually SImode).
@@ -2445,7 +2448,8 @@ fixup_unsigned_type (tree type)
    decide which of the above modes should be used.  */
 
 enum machine_mode
-get_best_mode (int bitsize, int bitpos, unsigned int align,
+get_best_mode (int bitsize, int bitpos, unsigned HOST_WIDE_INT maxbits,
+	       unsigned int align,
 	       enum machine_mode largest_mode, int volatilep)
 {
   enum machine_mode mode;
@@ -2484,6 +2488,7 @@ get_best_mode (int bitsize, int bitpos, 
 	  if (bitpos / unit == (bitpos + bitsize - 1) / unit
 	      && unit <= BITS_PER_WORD
 	      && unit <= MIN (align, BIGGEST_ALIGNMENT)
+	      && (!maxbits || unit <= maxbits)
 	      && (largest_mode == VOIDmode
 		  || unit <= GET_MODE_BITSIZE (largest_mode)))
 	    wide_mode = tmode;
Index: calls.c
===================================================================
--- calls.c	(revision 173263)
+++ calls.c	(working copy)
@@ -909,7 +909,7 @@ store_unaligned_arguments_into_pseudos (
 	    emit_move_insn (reg, const0_rtx);
 
 	    bytes -= bitsize / BITS_PER_UNIT;
-	    store_bit_field (reg, bitsize, endian_correction, word_mode,
+	    store_bit_field (reg, bitsize, endian_correction, 0, word_mode,
 			     word);
 	  }
       }
Index: expmed.c
===================================================================
--- expmed.c	(revision 173263)
+++ expmed.c	(working copy)
@@ -47,9 +47,13 @@ struct target_expmed *this_target_expmed
 
 static void store_fixed_bit_field (rtx, unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT, rtx);
+				   unsigned HOST_WIDE_INT,
+				   unsigned HOST_WIDE_INT,
+				   rtx);
 static void store_split_bit_field (rtx, unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT, rtx);
+				   unsigned HOST_WIDE_INT,
+				   unsigned HOST_WIDE_INT,
+				   rtx);
 static rtx extract_fixed_bit_field (enum machine_mode, rtx,
 				    unsigned HOST_WIDE_INT,
 				    unsigned HOST_WIDE_INT,
@@ -333,7 +337,9 @@ mode_for_extraction (enum extraction_pat
 
 static bool
 store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
-		   unsigned HOST_WIDE_INT bitnum, enum machine_mode fieldmode,
+		   unsigned HOST_WIDE_INT bitnum,
+		   unsigned HOST_WIDE_INT maxbits,
+		   enum machine_mode fieldmode,
 		   rtx value, bool fallback_p)
 {
   unsigned int unit
@@ -547,7 +553,9 @@ store_bit_field_1 (rtx str_rtx, unsigned
 
 	  if (!store_bit_field_1 (op0, MIN (BITS_PER_WORD,
 					    bitsize - i * BITS_PER_WORD),
-				  bitnum + bit_offset, word_mode,
+				  bitnum + bit_offset,
+				  maxbits,
+				  word_mode,
 				  value_word, fallback_p))
 	    {
 	      delete_insns_since (last);
@@ -718,9 +726,10 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	 mode. Otherwise, use the smallest mode containing the field.  */
 
       if (GET_MODE (op0) == BLKmode
+	  || (maxbits && GET_MODE_BITSIZE (GET_MODE (op0)) > maxbits)
 	  || (op_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (op_mode)))
-	bestmode = get_best_mode (bitsize, bitnum, MEM_ALIGN (op0),
+	bestmode = get_best_mode (bitsize, bitnum, maxbits, MEM_ALIGN (op0),
 				  (op_mode == MAX_MACHINE_MODE
 				   ? VOIDmode : op_mode),
 				  MEM_VOLATILE_P (op0));
@@ -748,7 +757,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	  /* Fetch that unit, store the bitfield in it, then store
 	     the unit.  */
 	  tempreg = copy_to_reg (xop0);
-	  if (store_bit_field_1 (tempreg, bitsize, xbitpos,
+	  if (store_bit_field_1 (tempreg, bitsize, xbitpos, maxbits,
 				 fieldmode, orig_value, false))
 	    {
 	      emit_move_insn (xop0, tempreg);
@@ -761,21 +770,28 @@ store_bit_field_1 (rtx str_rtx, unsigned
   if (!fallback_p)
     return false;
 
-  store_fixed_bit_field (op0, offset, bitsize, bitpos, value);
+  store_fixed_bit_field (op0, offset, bitsize, bitpos, maxbits, value);
   return true;
 }
 
 /* Generate code to store value from rtx VALUE
    into a bit-field within structure STR_RTX
    containing BITSIZE bits starting at bit BITNUM.
+
+   MAXBITS is the maximum number of bits we are allowed to store into,
+   or 0 if there is no limit.
+
    FIELDMODE is the machine-mode of the FIELD_DECL node for this field.  */
 
 void
 store_bit_field (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
-		 unsigned HOST_WIDE_INT bitnum, enum machine_mode fieldmode,
+		 unsigned HOST_WIDE_INT bitnum,
+		 unsigned HOST_WIDE_INT maxbits,
+		 enum machine_mode fieldmode,
 		 rtx value)
 {
-  if (!store_bit_field_1 (str_rtx, bitsize, bitnum, fieldmode, value, true))
+  if (!store_bit_field_1 (str_rtx, bitsize, bitnum, maxbits,
+			  fieldmode, value, true))
     gcc_unreachable ();
 }
 \f
@@ -791,7 +807,9 @@ store_bit_field (rtx str_rtx, unsigned H
 static void
 store_fixed_bit_field (rtx op0, unsigned HOST_WIDE_INT offset,
 		       unsigned HOST_WIDE_INT bitsize,
-		       unsigned HOST_WIDE_INT bitpos, rtx value)
+		       unsigned HOST_WIDE_INT bitpos,
+		       unsigned HOST_WIDE_INT maxbits,
+		       rtx value)
 {
   enum machine_mode mode;
   unsigned int total_bits = BITS_PER_WORD;
@@ -812,7 +830,7 @@ store_fixed_bit_field (rtx op0, unsigned
       /* Special treatment for a bit field split across two registers.  */
       if (bitsize + bitpos > BITS_PER_WORD)
 	{
-	  store_split_bit_field (op0, bitsize, bitpos, value);
+	  store_split_bit_field (op0, bitsize, bitpos, maxbits, value);
 	  return;
 	}
     }
@@ -830,10 +848,12 @@ store_fixed_bit_field (rtx op0, unsigned
 
       if (MEM_VOLATILE_P (op0)
           && GET_MODE_BITSIZE (GET_MODE (op0)) > 0
+	  && GET_MODE_BITSIZE (GET_MODE (op0)) <= maxbits
 	  && flag_strict_volatile_bitfields > 0)
 	mode = GET_MODE (op0);
       else
 	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
+			      maxbits,
 			      MEM_ALIGN (op0), mode, MEM_VOLATILE_P (op0));
 
       if (mode == VOIDmode)
@@ -841,7 +861,7 @@ store_fixed_bit_field (rtx op0, unsigned
 	  /* The only way this should occur is if the field spans word
 	     boundaries.  */
 	  store_split_bit_field (op0, bitsize, bitpos + offset * BITS_PER_UNIT,
-				 value);
+				 maxbits, value);
 	  return;
 	}
 
@@ -961,7 +981,9 @@ store_fixed_bit_field (rtx op0, unsigned
 
 static void
 store_split_bit_field (rtx op0, unsigned HOST_WIDE_INT bitsize,
-		       unsigned HOST_WIDE_INT bitpos, rtx value)
+		       unsigned HOST_WIDE_INT bitpos,
+		       unsigned HOST_WIDE_INT maxbits,
+		       rtx value)
 {
   unsigned int unit;
   unsigned int bitsdone = 0;
@@ -1076,7 +1098,7 @@ store_split_bit_field (rtx op0, unsigned
 	 it is just an out-of-bounds access.  Ignore it.  */
       if (word != const0_rtx)
 	store_fixed_bit_field (word, offset * unit / BITS_PER_UNIT, thissize,
-			       thispos, part);
+			       thispos, maxbits, part);
       bitsdone += thissize;
     }
 }
@@ -1520,7 +1542,7 @@ extract_bit_field_1 (rtx str_rtx, unsign
       if (GET_MODE (op0) == BLKmode
 	  || (ext_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (ext_mode)))
-	bestmode = get_best_mode (bitsize, bitnum, MEM_ALIGN (op0),
+	bestmode = get_best_mode (bitsize, bitnum, 0, MEM_ALIGN (op0),
 				  (ext_mode == MAX_MACHINE_MODE
 				   ? VOIDmode : ext_mode),
 				  MEM_VOLATILE_P (op0));
@@ -1646,7 +1668,7 @@ extract_fixed_bit_field (enum machine_mo
 	    mode = tmode;
 	}
       else
-	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
+	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT, 0,
 			      MEM_ALIGN (op0), word_mode, MEM_VOLATILE_P (op0));
 
       if (mode == VOIDmode)
Index: Makefile.in
===================================================================
--- Makefile.in	(revision 173263)
+++ Makefile.in	(working copy)
@@ -2916,7 +2916,7 @@ expr.o : expr.c $(CONFIG_H) $(SYSTEM_H) 
    typeclass.h hard-reg-set.h toplev.h $(DIAGNOSTIC_CORE_H) hard-reg-set.h $(EXCEPT_H) \
    reload.h langhooks.h intl.h $(TM_P_H) $(TARGET_H) \
    tree-iterator.h gt-expr.h $(MACHMODE_H) $(TIMEVAR_H) $(TREE_FLOW_H) \
-   $(TREE_PASS_H) $(DF_H) $(DIAGNOSTIC_H) vecprim.h $(SSAEXPAND_H)
+   $(TREE_PASS_H) $(DF_H) $(DIAGNOSTIC_H) vecprim.h $(SSAEXPAND_H) $(PARAMS_H)
 dojump.o : dojump.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(TREE_H) \
    $(FLAGS_H) $(FUNCTION_H) $(EXPR_H) $(OPTABS_H) $(INSN_ATTR_H) insn-config.h \
    langhooks.h $(GGC_H) gt-dojump.h vecprim.h $(BASIC_BLOCK_H) output.h
Index: stmt.c
===================================================================
--- stmt.c	(revision 173263)
+++ stmt.c	(working copy)
@@ -1758,7 +1758,7 @@ expand_return (tree retval)
 
 	  /* Use bitpos for the source extraction (left justified) and
 	     xbitpos for the destination store (right justified).  */
-	  store_bit_field (dst, bitsize, xbitpos % BITS_PER_WORD, word_mode,
+	  store_bit_field (dst, bitsize, xbitpos % BITS_PER_WORD, 0, word_mode,
 			   extract_bit_field (src, bitsize,
 					      bitpos % BITS_PER_WORD, 1, false,
 					      NULL_RTX, word_mode, word_mode));
Index: params.def
===================================================================
--- params.def	(revision 173263)
+++ params.def	(working copy)
@@ -884,6 +884,13 @@ DEFPARAM (PARAM_MAX_STORES_TO_SINK,
           "Maximum number of conditional store pairs that can be sunk",
           2, 0, 0)
 
+/* Data race flags for C++0x memory model compliance.  */
+
+DEFPARAM (PARAM_ALLOW_STORE_DATA_RACES,
+	  "allow-store-data-races",
+	  "Allow new data races on stores to be introduced",
+	  1, 0, 1)
+
 
 /*
 Local variables:


* Re: [C++0x] contiguous bitfields race implementation
  2011-05-13 22:35       ` Aldy Hernandez
@ 2011-05-16 21:20         ` Aldy Hernandez
  2011-05-19  7:17         ` Jason Merrill
  1 sibling, 0 replies; 81+ messages in thread
From: Aldy Hernandez @ 2011-05-16 21:20 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Jeff Law, gcc-patches, Jakub Jelinek


> Bootstrapped without any issues. Running the entire testsuite with
> --param=allow-store-data-races=0 is still in progress.

BTW, no regressions, even running the entire thing at 
--param=allow-store-data-races=0 to force testing this new bitfield 
implementation on all tests.


* Re: [C++0x] contiguous bitfields race implementation
  2011-05-13 22:35       ` Aldy Hernandez
  2011-05-16 21:20         ` Aldy Hernandez
@ 2011-05-19  7:17         ` Jason Merrill
  2011-05-20  9:21           ` Aldy Hernandez
  1 sibling, 1 reply; 81+ messages in thread
From: Jason Merrill @ 2011-05-19  7:17 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Jeff Law, gcc-patches, Jakub Jelinek

It seems like you're calculating maxbits correctly now, but an access 
doesn't necessarily start from the beginning of the sequence of 
bit-fields, especially given store_split_bit_field.  That is,

struct A
{
   int i;
   int j: 32;
   int k: 8;
   char c[2];
};

Here maxbits would be 40, so we decide that it's OK to use SImode to 
access the word starting with k, and clobber c in the process.  Am I wrong?
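
(Sketching the layout, assuming the usual x86-64 ABI:

   bits  0..31 : i
   bits 32..63 : j
   bits 64..71 : k
   bits 72..87 : c
   bits 88..95 : tail padding

maxbits is 40 because the j/k run spans bits 32..71, and the aligned
SImode word holding k is bits 64..95, so a size-only "unit <= maxbits"
check accepts SImode (32 <= 40) even though the store overlaps c.)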

Jason


* Re: [C++0x] contiguous bitfields race implementation
  2011-05-19  7:17         ` Jason Merrill
@ 2011-05-20  9:21           ` Aldy Hernandez
  2011-05-26 18:05             ` Jason Merrill
  0 siblings, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-05-20  9:21 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Jeff Law, gcc-patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 1110 bytes --]

On 05/18/11 16:58, Jason Merrill wrote:
> It seems like you're calculating maxbits correctly now, but an access
> doesn't necessarily start from the beginning of the sequence of
> bit-fields, especially given store_split_bit_field. That is,

This is what I was trying to explain to you on irc.  And I obviously 
muffed up the whole explanation :).

>
> struct A
> {
> int i;
> int j: 32;
> int k: 8;
> char c[2];
> };
>
> Here maxbits would be 40, so we decide that it's OK to use SImode to
> access the word starting with k, and clobber c in the process. Am I wrong?

You are correct.  I have redesigned the patch to pass around starting 
and ending bit positions, so get_best_mode() can make a more informed 
decision.

I also started using DECL_BIT_FIELD_TYPE instead of DECL_BIT_FIELD to 
determine if a DECL is a bit field.  It turns out DECL_BIT_FIELD is not 
set for bit fields whose width is exactly a mode size (32 bits, 16 bits, etc.).
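
A minimal illustration of that quirk (hypothetical example, not taken
from the patch):

struct
{
  unsigned int x : 32;	/* Exactly SImode-sized, so DECL_BIT_FIELD ends
			   up unset, but DECL_BIT_FIELD_TYPE still
			   records the declared bit-field type.  */
  unsigned int y : 4;	/* Narrower than any machine mode, so
			   DECL_BIT_FIELD is set as usual.  */
} s;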

Furthermore, I added another test to check the above scenario.

Bootstrapped and tested on x86-64 with --param=allow-store-data-races=0.

How do you like these apples?

[-- Attachment #2: curr --]
[-- Type: text/plain, Size: 31414 bytes --]

	* params.h (ALLOW_STORE_DATA_RACES): New.
	* params.def (PARAM_ALLOW_STORE_DATA_RACES): New.
	* Makefile.in (expr.o): Depend on PARAMS_H.
	* machmode.h (get_best_mode): Add argument.
	* fold-const.c (optimize_bit_field_compare): Add argument to
	get_best_mode.
	(fold_truthop): Same.
	* ifcvt.c (noce_emit_move_insn): Add argument to store_bit_field.
	* expr.c (emit_group_store): Same.
	(copy_blkmode_from_reg): Same.
	(write_complex_part): Same.
	(optimize_bitfield_assignment_op): Add argument.
	Add argument to get_best_mode.
	(get_bit_range): New.
	(expand_assignment): Calculate maxbits and pass it down
	accordingly.
	(store_field): New argument.
	(expand_expr_real_2): New argument to store_field.
	Include params.h.
	* expr.h (store_bit_field): New argument.
	* stor-layout.c (get_best_mode): Restrict mode expansion by taking
	into account maxbits.
	* calls.c (store_unaligned_arguments_into_pseudos): New argument
	to store_bit_field.
	* expmed.c (store_bit_field_1): New argument.  Use it.
	(store_bit_field): Same.
	(store_fixed_bit_field): Same.
	(store_split_bit_field): Same.
	(extract_bit_field_1): Pass new argument to get_best_mode.
	(extract_bit_field): Same.
	* stmt.c (store_bit_field): Pass new argument to store_bit_field.
	* tree.h (DECL_THREAD_VISIBLE_P): New.
	* doc/invoke.texi: Document parameter allow-store-data-races.

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 173263)
+++ doc/invoke.texi	(working copy)
@@ -8886,6 +8886,11 @@ The maximum number of conditional stores
 if either vectorization (@option{-ftree-vectorize}) or if-conversion
 (@option{-ftree-loop-if-convert}) is disabled.  The default is 2.
 
+@item allow-store-data-races
+Allow optimizers to introduce new data races on stores.
+Set to 1 to allow, otherwise to 0.  This option is enabled by default
+unless implicitly set by the @option{-fmemory-model=} option.
+
 @end table
 @end table
 
Index: machmode.h
===================================================================
--- machmode.h	(revision 173263)
+++ machmode.h	(working copy)
@@ -248,7 +248,10 @@ extern enum machine_mode mode_for_vector
 
 /* Find the best mode to use to access a bit field.  */
 
-extern enum machine_mode get_best_mode (int, int, unsigned int,
+extern enum machine_mode get_best_mode (int, int,
+					unsigned HOST_WIDE_INT,
+					unsigned HOST_WIDE_INT,
+					unsigned int,
 					enum machine_mode, int);
 
 /* Determine alignment, 1<=result<=BIGGEST_ALIGNMENT.  */
Index: tree.h
===================================================================
--- tree.h	(revision 173263)
+++ tree.h	(working copy)
@@ -3156,6 +3156,10 @@ struct GTY(()) tree_parm_decl {
 #define DECL_THREAD_LOCAL_P(NODE) \
   (VAR_DECL_CHECK (NODE)->decl_with_vis.tls_model >= TLS_MODEL_REAL)
 
+/* Return true if a VAR_DECL is visible from another thread.  */
+#define DECL_THREAD_VISIBLE_P(NODE) \
+  (TREE_STATIC (NODE) && !DECL_THREAD_LOCAL_P (NODE))
+
 /* In a non-local VAR_DECL with static storage duration, true if the
    variable has an initialization priority.  If false, the variable
    will be initialized at the DEFAULT_INIT_PRIORITY.  */
Index: fold-const.c
===================================================================
--- fold-const.c	(revision 173263)
+++ fold-const.c	(working copy)
@@ -3409,7 +3409,7 @@ optimize_bit_field_compare (location_t l
       && flag_strict_volatile_bitfields > 0)
     nmode = lmode;
   else
-    nmode = get_best_mode (lbitsize, lbitpos,
+    nmode = get_best_mode (lbitsize, lbitpos, 0, 0,
 			   const_p ? TYPE_ALIGN (TREE_TYPE (linner))
 			   : MIN (TYPE_ALIGN (TREE_TYPE (linner)),
 				  TYPE_ALIGN (TREE_TYPE (rinner))),
@@ -5237,7 +5237,7 @@ fold_truthop (location_t loc, enum tree_
      to be relative to a field of that size.  */
   first_bit = MIN (ll_bitpos, rl_bitpos);
   end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
-  lnmode = get_best_mode (end_bit - first_bit, first_bit,
+  lnmode = get_best_mode (end_bit - first_bit, first_bit, 0, 0,
 			  TYPE_ALIGN (TREE_TYPE (ll_inner)), word_mode,
 			  volatilep);
   if (lnmode == VOIDmode)
@@ -5302,7 +5302,7 @@ fold_truthop (location_t loc, enum tree_
 
       first_bit = MIN (lr_bitpos, rr_bitpos);
       end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
-      rnmode = get_best_mode (end_bit - first_bit, first_bit,
+      rnmode = get_best_mode (end_bit - first_bit, first_bit, 0, 0,
 			      TYPE_ALIGN (TREE_TYPE (lr_inner)), word_mode,
 			      volatilep);
       if (rnmode == VOIDmode)
Index: params.h
===================================================================
--- params.h	(revision 173263)
+++ params.h	(working copy)
@@ -206,4 +206,6 @@ extern void init_param_values (int *para
   PARAM_VALUE (PARAM_MIN_NONDEBUG_INSN_UID)
 #define MAX_STORES_TO_SINK \
   PARAM_VALUE (PARAM_MAX_STORES_TO_SINK)
+#define ALLOW_STORE_DATA_RACES \
+  PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES)
 #endif /* ! GCC_PARAMS_H */
Index: testsuite/gcc.dg/20110509.c
===================================================================
--- testsuite/gcc.dg/20110509.c	(revision 0)
+++ testsuite/gcc.dg/20110509.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+/* Test that we don't store past VAR.A.  */
+
+struct S
+{
+  volatile unsigned int a : 4;
+  unsigned char b;
+  unsigned int c : 6;
+} var;
+
+void set_a()
+{
+  var.a = 12;
+}
+
+/* { dg-final { scan-assembler-not "movl.*, var" } } */
Index: testsuite/gcc.dg/20110509-2.c
===================================================================
--- testsuite/gcc.dg/20110509-2.c	(revision 0)
+++ testsuite/gcc.dg/20110509-2.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+/* Test that we don't store past VAR.K.  */
+
+struct S
+{
+  volatile int i;
+  volatile int j: 32;
+  volatile int k: 15;
+  volatile char c[2];
+} var;
+
+void setit()
+{
+  var.k = 13;
+}
+
+/* { dg-final { scan-assembler-not "movl.*, var" } } */
Index: ifcvt.c
===================================================================
--- ifcvt.c	(revision 173263)
+++ ifcvt.c	(working copy)
@@ -885,7 +885,7 @@ noce_emit_move_insn (rtx x, rtx y)
 		}
 
 	      gcc_assert (start < (MEM_P (op) ? BITS_PER_UNIT : BITS_PER_WORD));
-	      store_bit_field (op, size, start, GET_MODE (x), y);
+	      store_bit_field (op, size, start, 0, 0, GET_MODE (x), y);
 	      return;
 	    }
 
@@ -939,7 +939,8 @@ noce_emit_move_insn (rtx x, rtx y)
   inner = XEXP (outer, 0);
   outmode = GET_MODE (outer);
   bitpos = SUBREG_BYTE (outer) * BITS_PER_UNIT;
-  store_bit_field (inner, GET_MODE_BITSIZE (outmode), bitpos, outmode, y);
+  store_bit_field (inner, GET_MODE_BITSIZE (outmode), bitpos,
+		   0, 0, outmode, y);
 }
 
 /* Return sequence of instructions generated by if conversion.  This
Index: expr.c
===================================================================
--- expr.c	(revision 173263)
+++ expr.c	(working copy)
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  
 #include "diagnostic.h"
 #include "ssaexpand.h"
 #include "target-globals.h"
+#include "params.h"
 
 /* Decide whether a function's arguments should be processed
    from first to last or from last to first.
@@ -142,7 +143,9 @@ static void store_constructor_field (rtx
 				     HOST_WIDE_INT, enum machine_mode,
 				     tree, tree, int, alias_set_type);
 static void store_constructor (tree, rtx, int, HOST_WIDE_INT);
-static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT, enum machine_mode,
+static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT,
+			unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
+			enum machine_mode,
 			tree, tree, alias_set_type, bool);
 
 static unsigned HOST_WIDE_INT highest_pow2_factor_for_target (const_tree, const_tree);
@@ -2063,7 +2066,7 @@ emit_group_store (rtx orig_dst, rtx src,
 	emit_move_insn (adjust_address (dest, mode, bytepos), tmps[i]);
       else
 	store_bit_field (dest, bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
-			 mode, tmps[i]);
+			 0, 0, mode, tmps[i]);
     }
 
   /* Copy from the pseudo into the (probable) hard reg.  */
@@ -2157,7 +2160,7 @@ copy_blkmode_from_reg (rtx tgtblk, rtx s
 
       /* Use xbitpos for the source extraction (right justified) and
 	 bitpos for the destination store (left justified).  */
-      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, copy_mode,
+      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, 0, 0, copy_mode,
 		       extract_bit_field (src, bitsize,
 					  xbitpos % BITS_PER_WORD, 1, false,
 					  NULL_RTX, copy_mode, copy_mode));
@@ -2794,7 +2797,7 @@ write_complex_part (rtx cplx, rtx val, b
 	gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
     }
 
-  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, imode, val);
+  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, 0, 0, imode, val);
 }
 
 /* Extract one of the components of the complex value CPLX.  Extract the
@@ -3929,6 +3932,8 @@ get_subtarget (rtx x)
 static bool
 optimize_bitfield_assignment_op (unsigned HOST_WIDE_INT bitsize,
 				 unsigned HOST_WIDE_INT bitpos,
+				 unsigned HOST_WIDE_INT bitregion_start,
+				 unsigned HOST_WIDE_INT bitregion_end,
 				 enum machine_mode mode1, rtx str_rtx,
 				 tree to, tree src)
 {
@@ -3990,6 +3995,7 @@ optimize_bitfield_assignment_op (unsigne
       if (str_bitsize == 0 || str_bitsize > BITS_PER_WORD)
 	str_mode = word_mode;
       str_mode = get_best_mode (bitsize, bitpos,
+				bitregion_start, bitregion_end,
 				MEM_ALIGN (str_rtx), str_mode, 0);
       if (str_mode == VOIDmode)
 	return false;
@@ -4098,6 +4104,111 @@ optimize_bitfield_assignment_op (unsigne
   return false;
 }
 
+/* In the C++ memory model, consecutive bit fields in a structure are
+   considered one memory location.
+
+   Given a COMPONENT_REF, this function returns the bit range of
+   consecutive bits in which this COMPONENT_REF belongs.  The
+   values are returned in *BITSTART and *BITEND.  If either the C++
+   memory model is not activated, or this memory access is not
+   thread visible, 0 is returned in *BITSTART and *BITEND.
+
+   EXP is the COMPONENT_REF.
+   INNERDECL is the actual object being referenced.
+   BITPOS is the position in bits where the bit starts within the structure.
+   BITSIZE is size in bits of the field being referenced in EXP.
+
+   For example, while storing into FOO.A here...
+
+      struct {
+        BIT 0:
+          unsigned int a : 4;
+	  unsigned int b : 1;
+	BIT 8:
+	  unsigned char c;
+	  unsigned int d : 6;
+      } foo;
+
+   ...we are not allowed to store past <b>, so for the layout above we
+   return a range of 0..7 (because no one cares if we store into the
+   padding).  */
+
+static void
+get_bit_range (unsigned HOST_WIDE_INT *bitstart,
+	       unsigned HOST_WIDE_INT *bitend,
+	       tree exp, tree innerdecl,
+	       HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize)
+{
+  tree field, record_type, fld;
+  bool found_field = false;
+  bool prev_field_is_bitfield;
+
+  gcc_assert (TREE_CODE (exp) == COMPONENT_REF);
+
+  /* If other threads can't see this value, no need to restrict stores.  */
+  if (ALLOW_STORE_DATA_RACES
+      || !DECL_THREAD_VISIBLE_P (innerdecl))
+    {
+      *bitstart = *bitend = 0;
+      return;
+    }
+
+  /* Bit field we're storing into.  */
+  field = TREE_OPERAND (exp, 1);
+  record_type = DECL_FIELD_CONTEXT (field);
+
+  /* Count the contiguous bitfields for the memory location that
+     contains FIELD.  */
+  *bitstart = 0;
+  prev_field_is_bitfield = true;
+  for (fld = TYPE_FIELDS (record_type); fld; fld = DECL_CHAIN (fld))
+    {
+      tree t, offset;
+      enum machine_mode mode;
+      int unsignedp, volatilep;
+
+      if (TREE_CODE (fld) != FIELD_DECL)
+	continue;
+
+      t = build3 (COMPONENT_REF, TREE_TYPE (exp),
+		  unshare_expr (TREE_OPERAND (exp, 0)),
+		  fld, NULL_TREE);
+      get_inner_reference (t, &bitsize, &bitpos, &offset,
+			   &mode, &unsignedp, &volatilep, true);
+
+      if (field == fld)
+	found_field = true;
+
+      if (DECL_BIT_FIELD_TYPE (fld) && bitsize > 0)
+	{
+	  if (prev_field_is_bitfield == false)
+	    {
+	      *bitstart = bitpos;
+	      prev_field_is_bitfield = true;
+	    }
+	}
+      else
+	{
+	  prev_field_is_bitfield = false;
+	  if (found_field)
+	    break;
+	}
+    }
+  gcc_assert (found_field);
+
+  if (fld)
+    {
+      /* We found the end of the bit field sequence.  Include the
+	 padding up to the next field and be done.  */
+      *bitend = bitpos - 1;
+    }
+  else
+    {
+      /* If this is the last element in the structure, include the padding
+	 at the end of structure.  */
+      *bitend = TREE_INT_CST_LOW (TYPE_SIZE (record_type));
+    }
+}
 
 /* Expand an assignment that stores the value of FROM into TO.  If NONTEMPORAL
    is true, try generating a nontemporal store.  */
@@ -4197,6 +4308,8 @@ expand_assignment (tree to, tree from, b
     {
       enum machine_mode mode1;
       HOST_WIDE_INT bitsize, bitpos;
+      unsigned HOST_WIDE_INT bitregion_start = 0;
+      unsigned HOST_WIDE_INT bitregion_end = 0;
       tree offset;
       int unsignedp;
       int volatilep = 0;
@@ -4206,6 +4319,11 @@ expand_assignment (tree to, tree from, b
       tem = get_inner_reference (to, &bitsize, &bitpos, &offset, &mode1,
 				 &unsignedp, &volatilep, true);
 
+      if (TREE_CODE (to) == COMPONENT_REF
+	  && DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1)))
+	get_bit_range (&bitregion_start, &bitregion_end,
+		       to, tem, bitpos, bitsize);
+
       /* If we are going to use store_bit_field and extract_bit_field,
 	 make sure to_rtx will be safe for multiple use.  */
 
@@ -4287,11 +4405,14 @@ expand_assignment (tree to, tree from, b
 				 nontemporal);
 	  else if (bitpos + bitsize <= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 0), bitsize, bitpos,
+				  bitregion_start, bitregion_end,
 				  mode1, from, TREE_TYPE (tem),
 				  get_alias_set (to), nontemporal);
 	  else if (bitpos >= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 1), bitsize,
-				  bitpos - mode_bitsize / 2, mode1, from,
+				  bitpos - mode_bitsize / 2,
+				  bitregion_start, bitregion_end,
+				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
 	  else if (bitpos == 0 && bitsize == mode_bitsize)
@@ -4312,7 +4433,9 @@ expand_assignment (tree to, tree from, b
 					    0);
 	      write_complex_part (temp, XEXP (to_rtx, 0), false);
 	      write_complex_part (temp, XEXP (to_rtx, 1), true);
-	      result = store_field (temp, bitsize, bitpos, mode1, from,
+	      result = store_field (temp, bitsize, bitpos,
+				    bitregion_start, bitregion_end,
+				    mode1, from,
 				    TREE_TYPE (tem), get_alias_set (to),
 				    nontemporal);
 	      emit_move_insn (XEXP (to_rtx, 0), read_complex_part (temp, false));
@@ -4337,11 +4460,15 @@ expand_assignment (tree to, tree from, b
 		MEM_KEEP_ALIAS_SET_P (to_rtx) = 1;
 	    }
 
-	  if (optimize_bitfield_assignment_op (bitsize, bitpos, mode1,
+	  if (optimize_bitfield_assignment_op (bitsize, bitpos,
+					       bitregion_start, bitregion_end,
+					       mode1,
 					       to_rtx, to, from))
 	    result = NULL;
 	  else
-	    result = store_field (to_rtx, bitsize, bitpos, mode1, from,
+	    result = store_field (to_rtx, bitsize, bitpos,
+				  bitregion_start, bitregion_end,
+				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
 	}
@@ -4734,7 +4861,7 @@ store_expr (tree exp, rtx target, int ca
 			      : BLOCK_OP_NORMAL));
 	  else if (GET_MODE (target) == BLKmode)
 	    store_bit_field (target, INTVAL (expr_size (exp)) * BITS_PER_UNIT,
-			     0, GET_MODE (temp), temp);
+			     0, 0, 0, GET_MODE (temp), temp);
 	  else
 	    convert_move (target, temp, unsignedp);
 	}
@@ -5177,7 +5304,8 @@ store_constructor_field (rtx target, uns
       store_constructor (exp, target, cleared, bitsize / BITS_PER_UNIT);
     }
   else
-    store_field (target, bitsize, bitpos, mode, exp, type, alias_set, false);
+    store_field (target, bitsize, bitpos, 0, 0, mode, exp, type, alias_set,
+		 false);
 }
 
 /* Store the value of constructor EXP into the rtx TARGET.
@@ -5751,6 +5879,11 @@ store_constructor (tree exp, rtx target,
    BITSIZE bits, starting BITPOS bits from the start of TARGET.
    If MODE is VOIDmode, it means that we are storing into a bit-field.
 
+   BITREGION_START is the bitpos of the first bitfield in this region.
+   BITREGION_END is the bitpos of the ending bitfield in this region.
+   These two fields are 0 if the C++ memory model does not apply,
+   or we are not interested in keeping track of bitfield regions.
+
    Always return const0_rtx unless we have something particular to
    return.
 
@@ -5764,6 +5897,8 @@ store_constructor (tree exp, rtx target,
 
 static rtx
 store_field (rtx target, HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
+	     unsigned HOST_WIDE_INT bitregion_start,
+	     unsigned HOST_WIDE_INT bitregion_end,
 	     enum machine_mode mode, tree exp, tree type,
 	     alias_set_type alias_set, bool nontemporal)
 {
@@ -5796,8 +5931,9 @@ store_field (rtx target, HOST_WIDE_INT b
       if (bitsize != (HOST_WIDE_INT) GET_MODE_BITSIZE (GET_MODE (target)))
 	emit_move_insn (object, target);
 
-      store_field (blk_object, bitsize, bitpos, mode, exp, type, alias_set,
-		   nontemporal);
+      store_field (blk_object, bitsize, bitpos,
+		   bitregion_start, bitregion_end,
+		   mode, exp, type, alias_set, nontemporal);
 
       emit_move_insn (target, object);
 
@@ -5911,7 +6047,9 @@ store_field (rtx target, HOST_WIDE_INT b
 	}
 
       /* Store the value in the bitfield.  */
-      store_bit_field (target, bitsize, bitpos, mode, temp);
+      store_bit_field (target, bitsize, bitpos,
+		       bitregion_start, bitregion_end,
+		       mode, temp);
 
       return const0_rtx;
     }
@@ -7323,7 +7461,7 @@ expand_expr_real_2 (sepops ops, rtx targ
 						    (treeop0))
 				 * BITS_PER_UNIT),
 				(HOST_WIDE_INT) GET_MODE_BITSIZE (mode)),
-			   0, TYPE_MODE (valtype), treeop0,
+			   0, 0, 0, TYPE_MODE (valtype), treeop0,
 			   type, 0, false);
 	    }
 
Index: expr.h
===================================================================
--- expr.h	(revision 173263)
+++ expr.h	(working copy)
@@ -665,7 +665,10 @@ extern enum machine_mode
 mode_for_extraction (enum extraction_pattern, int);
 
 extern void store_bit_field (rtx, unsigned HOST_WIDE_INT,
-			     unsigned HOST_WIDE_INT, enum machine_mode, rtx);
+			     unsigned HOST_WIDE_INT,
+			     unsigned HOST_WIDE_INT,
+			     unsigned HOST_WIDE_INT,
+			     enum machine_mode, rtx);
 extern rtx extract_bit_field (rtx, unsigned HOST_WIDE_INT,
 			      unsigned HOST_WIDE_INT, int, bool, rtx,
 			      enum machine_mode, enum machine_mode);
Index: stor-layout.c
===================================================================
--- stor-layout.c	(revision 173263)
+++ stor-layout.c	(working copy)
@@ -2428,6 +2428,13 @@ fixup_unsigned_type (tree type)
 /* Find the best machine mode to use when referencing a bit field of length
    BITSIZE bits starting at BITPOS.
 
+   BITREGION_START is the bit position of the first bit in this
+   sequence of bit fields.  BITREGION_END is the last bit in this
+   sequence.  If these two fields are non-zero, we should restrict the
+   memory access to a chunk of at most
+   BITREGION_END - BITREGION_START + 1 bits.  Otherwise, we are allowed to touch
+   any adjacent non bit-fields.
+
    The underlying object is known to be aligned to a boundary of ALIGN bits.
    If LARGEST_MODE is not VOIDmode, it means that we should not use a mode
    larger than LARGEST_MODE (usually SImode).
@@ -2445,11 +2452,23 @@ fixup_unsigned_type (tree type)
    decide which of the above modes should be used.  */
 
 enum machine_mode
-get_best_mode (int bitsize, int bitpos, unsigned int align,
+get_best_mode (int bitsize, int bitpos,
+	       unsigned HOST_WIDE_INT bitregion_start,
+	       unsigned HOST_WIDE_INT bitregion_end,
+	       unsigned int align,
 	       enum machine_mode largest_mode, int volatilep)
 {
   enum machine_mode mode;
   unsigned int unit = 0;
+  unsigned HOST_WIDE_INT maxbits;
+
+  /* If unset, no restriction.  */
+  if (!bitregion_end)
+    maxbits = 0;
+  else if ((unsigned) bitpos < bitregion_start)
+    maxbits = bitregion_end - bitregion_start + 1;
+  else
+    maxbits = bitregion_end - bitpos + 1;
 
   /* Find the narrowest integer mode that contains the bit field.  */
   for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
@@ -2484,6 +2503,7 @@ get_best_mode (int bitsize, int bitpos, 
 	  if (bitpos / unit == (bitpos + bitsize - 1) / unit
 	      && unit <= BITS_PER_WORD
 	      && unit <= MIN (align, BIGGEST_ALIGNMENT)
+	      && (!maxbits || unit <= maxbits)
 	      && (largest_mode == VOIDmode
 		  || unit <= GET_MODE_BITSIZE (largest_mode)))
 	    wide_mode = tmode;
Index: calls.c
===================================================================
--- calls.c	(revision 173263)
+++ calls.c	(working copy)
@@ -909,8 +909,8 @@ store_unaligned_arguments_into_pseudos (
 	    emit_move_insn (reg, const0_rtx);
 
 	    bytes -= bitsize / BITS_PER_UNIT;
-	    store_bit_field (reg, bitsize, endian_correction, word_mode,
-			     word);
+	    store_bit_field (reg, bitsize, endian_correction, 0, 0,
+			     word_mode, word);
 	  }
       }
 }
Index: expmed.c
===================================================================
--- expmed.c	(revision 173263)
+++ expmed.c	(working copy)
@@ -47,9 +47,15 @@ struct target_expmed *this_target_expmed
 
 static void store_fixed_bit_field (rtx, unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT, rtx);
+				   unsigned HOST_WIDE_INT,
+				   unsigned HOST_WIDE_INT,
+				   unsigned HOST_WIDE_INT,
+				   rtx);
 static void store_split_bit_field (rtx, unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT, rtx);
+				   unsigned HOST_WIDE_INT,
+				   unsigned HOST_WIDE_INT,
+				   unsigned HOST_WIDE_INT,
+				   rtx);
 static rtx extract_fixed_bit_field (enum machine_mode, rtx,
 				    unsigned HOST_WIDE_INT,
 				    unsigned HOST_WIDE_INT,
@@ -333,7 +339,10 @@ mode_for_extraction (enum extraction_pat
 
 static bool
 store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
-		   unsigned HOST_WIDE_INT bitnum, enum machine_mode fieldmode,
+		   unsigned HOST_WIDE_INT bitnum,
+		   unsigned HOST_WIDE_INT bitregion_start,
+		   unsigned HOST_WIDE_INT bitregion_end,
+		   enum machine_mode fieldmode,
 		   rtx value, bool fallback_p)
 {
   unsigned int unit
@@ -547,7 +556,9 @@ store_bit_field_1 (rtx str_rtx, unsigned
 
 	  if (!store_bit_field_1 (op0, MIN (BITS_PER_WORD,
 					    bitsize - i * BITS_PER_WORD),
-				  bitnum + bit_offset, word_mode,
+				  bitnum + bit_offset,
+				  bitregion_start, bitregion_end,
+				  word_mode,
 				  value_word, fallback_p))
 	    {
 	      delete_insns_since (last);
@@ -711,6 +722,12 @@ store_bit_field_1 (rtx str_rtx, unsigned
   if (HAVE_insv && MEM_P (op0))
     {
       enum machine_mode bestmode;
+      unsigned HOST_WIDE_INT maxbits;
+
+      if (bitnum < bitregion_start)
+	maxbits = bitregion_end - bitregion_start + 1;
+      else
+	maxbits = bitregion_end - bitnum + 1;
 
       /* Get the mode to use for inserting into this field.  If OP0 is
 	 BLKmode, get the smallest mode consistent with the alignment. If
@@ -718,9 +735,12 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	 mode. Otherwise, use the smallest mode containing the field.  */
 
       if (GET_MODE (op0) == BLKmode
+	  || (bitregion_end && GET_MODE_BITSIZE (GET_MODE (op0)) > maxbits)
 	  || (op_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (op_mode)))
-	bestmode = get_best_mode (bitsize, bitnum, MEM_ALIGN (op0),
+	bestmode = get_best_mode (bitsize, bitnum,
+				  bitregion_start, bitregion_end,
+				  MEM_ALIGN (op0),
 				  (op_mode == MAX_MACHINE_MODE
 				   ? VOIDmode : op_mode),
 				  MEM_VOLATILE_P (op0));
@@ -749,6 +769,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	     the unit.  */
 	  tempreg = copy_to_reg (xop0);
 	  if (store_bit_field_1 (tempreg, bitsize, xbitpos,
+				 bitregion_start, bitregion_end,
 				 fieldmode, orig_value, false))
 	    {
 	      emit_move_insn (xop0, tempreg);
@@ -761,21 +782,33 @@ store_bit_field_1 (rtx str_rtx, unsigned
   if (!fallback_p)
     return false;
 
-  store_fixed_bit_field (op0, offset, bitsize, bitpos, value);
+  store_fixed_bit_field (op0, offset, bitsize, bitpos,
+			 bitregion_start, bitregion_end, value);
   return true;
 }
 
 /* Generate code to store value from rtx VALUE
    into a bit-field within structure STR_RTX
    containing BITSIZE bits starting at bit BITNUM.
+
+   BITREGION_START is the bitpos of the first bitfield in this region.
+   BITREGION_END is the bitpos of the ending bitfield in this region.
+   These two fields are 0 if the C++ memory model does not apply,
+   or we are not interested in keeping track of bitfield regions.
+
    FIELDMODE is the machine-mode of the FIELD_DECL node for this field.  */
 
 void
 store_bit_field (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
-		 unsigned HOST_WIDE_INT bitnum, enum machine_mode fieldmode,
+		 unsigned HOST_WIDE_INT bitnum,
+		 unsigned HOST_WIDE_INT bitregion_start,
+		 unsigned HOST_WIDE_INT bitregion_end,
+		 enum machine_mode fieldmode,
 		 rtx value)
 {
-  if (!store_bit_field_1 (str_rtx, bitsize, bitnum, fieldmode, value, true))
+  if (!store_bit_field_1 (str_rtx, bitsize, bitnum,
+			  bitregion_start, bitregion_end,
+			  fieldmode, value, true))
     gcc_unreachable ();
 }
 \f
@@ -791,7 +824,10 @@ store_bit_field (rtx str_rtx, unsigned H
 static void
 store_fixed_bit_field (rtx op0, unsigned HOST_WIDE_INT offset,
 		       unsigned HOST_WIDE_INT bitsize,
-		       unsigned HOST_WIDE_INT bitpos, rtx value)
+		       unsigned HOST_WIDE_INT bitpos,
+		       unsigned HOST_WIDE_INT bitregion_start,
+		       unsigned HOST_WIDE_INT bitregion_end,
+		       rtx value)
 {
   enum machine_mode mode;
   unsigned int total_bits = BITS_PER_WORD;
@@ -812,12 +848,23 @@ store_fixed_bit_field (rtx op0, unsigned
       /* Special treatment for a bit field split across two registers.  */
       if (bitsize + bitpos > BITS_PER_WORD)
 	{
-	  store_split_bit_field (op0, bitsize, bitpos, value);
+	  store_split_bit_field (op0, bitsize, bitpos,
+				 bitregion_start, bitregion_end,
+				 value);
 	  return;
 	}
     }
   else
     {
+      unsigned HOST_WIDE_INT maxbits;
+
+      if (!bitregion_end)
+	maxbits = 0;
+      else if (bitpos + offset * BITS_PER_UNIT < bitregion_start)
+	maxbits = bitregion_end - bitregion_start + 1;
+      else
+	maxbits = bitregion_end - (bitpos + offset * BITS_PER_UNIT) + 1;
+
       /* Get the proper mode to use for this field.  We want a mode that
 	 includes the entire field.  If such a mode would be larger than
 	 a word, we won't be doing the extraction the normal way.
@@ -830,10 +877,12 @@ store_fixed_bit_field (rtx op0, unsigned
 
       if (MEM_VOLATILE_P (op0)
           && GET_MODE_BITSIZE (GET_MODE (op0)) > 0
+	  && GET_MODE_BITSIZE (GET_MODE (op0)) <= maxbits
 	  && flag_strict_volatile_bitfields > 0)
 	mode = GET_MODE (op0);
       else
 	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
+			      bitregion_start, bitregion_end,
 			      MEM_ALIGN (op0), mode, MEM_VOLATILE_P (op0));
 
       if (mode == VOIDmode)
@@ -841,7 +890,7 @@ store_fixed_bit_field (rtx op0, unsigned
 	  /* The only way this should occur is if the field spans word
 	     boundaries.  */
 	  store_split_bit_field (op0, bitsize, bitpos + offset * BITS_PER_UNIT,
-				 value);
+				 bitregion_start, bitregion_end, value);
 	  return;
 	}
 
@@ -961,7 +1010,10 @@ store_fixed_bit_field (rtx op0, unsigned
 
 static void
 store_split_bit_field (rtx op0, unsigned HOST_WIDE_INT bitsize,
-		       unsigned HOST_WIDE_INT bitpos, rtx value)
+		       unsigned HOST_WIDE_INT bitpos,
+		       unsigned HOST_WIDE_INT bitregion_start,
+		       unsigned HOST_WIDE_INT bitregion_end,
+		       rtx value)
 {
   unsigned int unit;
   unsigned int bitsdone = 0;
@@ -1076,7 +1128,7 @@ store_split_bit_field (rtx op0, unsigned
 	 it is just an out-of-bounds access.  Ignore it.  */
       if (word != const0_rtx)
 	store_fixed_bit_field (word, offset * unit / BITS_PER_UNIT, thissize,
-			       thispos, part);
+			       thispos, bitregion_start, bitregion_end, part);
       bitsdone += thissize;
     }
 }
@@ -1520,7 +1572,7 @@ extract_bit_field_1 (rtx str_rtx, unsign
       if (GET_MODE (op0) == BLKmode
 	  || (ext_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (ext_mode)))
-	bestmode = get_best_mode (bitsize, bitnum, MEM_ALIGN (op0),
+	bestmode = get_best_mode (bitsize, bitnum, 0, 0, MEM_ALIGN (op0),
 				  (ext_mode == MAX_MACHINE_MODE
 				   ? VOIDmode : ext_mode),
 				  MEM_VOLATILE_P (op0));
@@ -1646,7 +1698,7 @@ extract_fixed_bit_field (enum machine_mo
 	    mode = tmode;
 	}
       else
-	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
+	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT, 0, 0,
 			      MEM_ALIGN (op0), word_mode, MEM_VOLATILE_P (op0));
 
       if (mode == VOIDmode)
Index: Makefile.in
===================================================================
--- Makefile.in	(revision 173263)
+++ Makefile.in	(working copy)
@@ -2916,7 +2916,7 @@ expr.o : expr.c $(CONFIG_H) $(SYSTEM_H) 
    typeclass.h hard-reg-set.h toplev.h $(DIAGNOSTIC_CORE_H) hard-reg-set.h $(EXCEPT_H) \
    reload.h langhooks.h intl.h $(TM_P_H) $(TARGET_H) \
    tree-iterator.h gt-expr.h $(MACHMODE_H) $(TIMEVAR_H) $(TREE_FLOW_H) \
-   $(TREE_PASS_H) $(DF_H) $(DIAGNOSTIC_H) vecprim.h $(SSAEXPAND_H)
+   $(TREE_PASS_H) $(DF_H) $(DIAGNOSTIC_H) vecprim.h $(SSAEXPAND_H) $(PARAMS_H)
 dojump.o : dojump.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(TREE_H) \
    $(FLAGS_H) $(FUNCTION_H) $(EXPR_H) $(OPTABS_H) $(INSN_ATTR_H) insn-config.h \
    langhooks.h $(GGC_H) gt-dojump.h vecprim.h $(BASIC_BLOCK_H) output.h
Index: stmt.c
===================================================================
--- stmt.c	(revision 173263)
+++ stmt.c	(working copy)
@@ -1758,7 +1758,8 @@ expand_return (tree retval)
 
 	  /* Use bitpos for the source extraction (left justified) and
 	     xbitpos for the destination store (right justified).  */
-	  store_bit_field (dst, bitsize, xbitpos % BITS_PER_WORD, word_mode,
+	  store_bit_field (dst, bitsize, xbitpos % BITS_PER_WORD,
+			   0, 0, word_mode,
 			   extract_bit_field (src, bitsize,
 					      bitpos % BITS_PER_WORD, 1, false,
 					      NULL_RTX, word_mode, word_mode));
Index: params.def
===================================================================
--- params.def	(revision 173263)
+++ params.def	(working copy)
@@ -884,6 +884,13 @@ DEFPARAM (PARAM_MAX_STORES_TO_SINK,
           "Maximum number of conditional store pairs that can be sunk",
           2, 0, 0)
 
+/* Data race flags for C++0x memory model compliance.  */
+
+DEFPARAM (PARAM_ALLOW_STORE_DATA_RACES,
+	  "allow-store-data-races",
+	  "Allow new data races on stores to be introduced",
+	  1, 0, 1)
+
 
 /*
 Local variables:


* Re: [C++0x] contiguous bitfields race implementation
  2011-05-20  9:21           ` Aldy Hernandez
@ 2011-05-26 18:05             ` Jason Merrill
  2011-05-26 18:28               ` Aldy Hernandez
  0 siblings, 1 reply; 81+ messages in thread
From: Jason Merrill @ 2011-05-26 18:05 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Jeff Law, gcc-patches, Jakub Jelinek

I'm afraid I think this is still wrong; the computation of maxbits in 
various places assumes that the bitfield is at the start of the unit 
we're going to access, so given

struct A
{
   int i: 4;
   int j: 28;
};

we won't use SImode to access A::j because we're setting maxbits to 28.

Jason


* Re: [C++0x] contiguous bitfields race implementation
  2011-05-26 18:05             ` Jason Merrill
@ 2011-05-26 18:28               ` Aldy Hernandez
  2011-05-26 19:07                 ` Jason Merrill
  0 siblings, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-05-26 18:28 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Jeff Law, gcc-patches, Jakub Jelinek

On 05/26/11 12:24, Jason Merrill wrote:
> I'm afraid I think this is still wrong; the computation of maxbits in
> various places assumes that the bitfield is at the start of the unit
> we're going to access, so given
>
> struct A
> {
> int i: 4;
> int j: 28;
> };
>
> we won't use SImode to access A::j because we're setting maxbits to 28.

No, maxbits is actually 32, because we include padding.  So it's correct 
in this case.


* Re: [C++0x] contiguous bitfields race implementation
  2011-05-26 18:28               ` Aldy Hernandez
@ 2011-05-26 19:07                 ` Jason Merrill
  2011-05-26 20:19                   ` Aldy Hernandez
  0 siblings, 1 reply; 81+ messages in thread
From: Jason Merrill @ 2011-05-26 19:07 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Jeff Law, gcc-patches, Jakub Jelinek

On 05/26/2011 01:39 PM, Aldy Hernandez wrote:
> On 05/26/11 12:24, Jason Merrill wrote:
>> struct A
>> {
>>   int i: 4;
>>   int j: 28;
>> };
>>
>> we won't use SImode to access A::j because we're setting maxbits to 28.
>
> No, maxbits is actually 32, because we include padding. So it's correct
> in this case.

What padding?  bitregion_end-bitregion_start+1 will be 32, but in 
get_best_mode I see

> +    maxbits = bitregion_end - bitpos + 1;

which is 28.  No?

Incidentally, I would expect _end to be one past the end rather than the 
index of the last element, but perhaps I just expect that because C++ 
iterators work that way.

Jason


* Re: [C++0x] contiguous bitfields race implementation
  2011-05-26 19:07                 ` Jason Merrill
@ 2011-05-26 20:19                   ` Aldy Hernandez
  2011-05-27 20:41                     ` Jason Merrill
  0 siblings, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-05-26 20:19 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Jeff Law, gcc-patches, Jakub Jelinek


> What padding? bitregion_end-bitregion_start+1 will be 32, but in

Poop, I misread your example.

> get_best_mode I see
>
>> + maxbits = bitregion_end - bitpos + 1;
>
> which is 28. No?

Yes, but if you look at the next few lines you'll see:

   /* Find the narrowest integer mode that contains the bit field.  */
   for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
        mode = GET_MODE_WIDER_MODE (mode))
     {
       unit = GET_MODE_BITSIZE (mode);
       if ((bitpos % unit) + bitsize <= unit)
	break;
     }

The narrowest integer mode containing the bit field is still 32, so we 
access the bitfield with an SI instruction as expected.
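
(Concretely, for A::j: bitpos is 4 and bitsize is 28, so QImode fails
the check (4 + 28 > 8), HImode fails it too (4 + 28 > 16), and the
loop stops at SImode, where (4 % 32) + 28 <= 32 holds.)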

> Incidentally, I would expect _end to be one past the end rather than the
> index of the last element, but perhaps I just expect that because C++
> iterators work that way.

I can fix that.

Aldy


* Re: [C++0x] contiguous bitfields race implementation
  2011-05-26 20:19                   ` Aldy Hernandez
@ 2011-05-27 20:41                     ` Jason Merrill
  2011-07-18 13:10                       ` Aldy Hernandez
  0 siblings, 1 reply; 81+ messages in thread
From: Jason Merrill @ 2011-05-27 20:41 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Jeff Law, gcc-patches, Jakub Jelinek

On 05/26/2011 02:37 PM, Aldy Hernandez wrote:
> The narrowest integer mode containing the bit field is still 32, so we
> access the bitfield with an SI instruction as expected.

OK, then:

struct A
{
   int i: 4;
   int j: 4;
   int k: 8;
   int l: 8;
   int m: 8;
};

now the narrowest mode containing 'j' is QI/8, but it would still be 
safe to use SI.
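
(Spelled out: the five fields pack into bits 0..31, exactly one
aligned SImode word.  For 'j' the access starts at bitpos 4, so
maxbits again comes out as 28, and the "unit <= maxbits" test rejects
SImode (32 > 28) even though an aligned SImode access at offset 0
never leaves the region.)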

Jason


* Re: [C++0x] contiguous bitfields race implementation
  2011-05-27 20:41                     ` Jason Merrill
@ 2011-07-18 13:10                       ` Aldy Hernandez
  2011-07-22 19:16                         ` Jason Merrill
  0 siblings, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-07-18 13:10 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Jeff Law, gcc-patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 742 bytes --]

On 05/27/11 14:18, Jason Merrill wrote:
> On 05/26/2011 02:37 PM, Aldy Hernandez wrote:
>> The narrowest integer mode containing the bit field is still 32, so we
>> access the bitfield with an SI instruction as expected.
>
> OK, then:
>
> struct A
> {
> int i: 4;
> int j: 4;
> int k: 8;
> int l: 8;
> int m: 8;
> };
>
> now the narrowest mode containing 'j' is QI/8, but it would still be
> safe to use SI.

Hi Jason.

Sorry to have dropped the ball on this.  Your last review coincided with 
me going on vacation.

Here is another stab at it.  I am now taking into account alignment, 
which I believe addresses your issue.  I have also added the new 
testcase above, which the patch also fixes.
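
(The new test, 20110509-3.c below, checks the flip side of the first
two: with the whole bit region inside one aligned word, the store to
VAR.J must still be a movl rather than a narrower store.)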

Tested on x86-64 Linux.

How is this?

Aldy

[-- Attachment #2: curr --]
[-- Type: text/plain, Size: 31959 bytes --]

	* params.h (ALLOW_STORE_DATA_RACES): New.
	* params.def (PARAM_ALLOW_STORE_DATA_RACES): New.
	* Makefile.in (expr.o): Depend on PARAMS_H.
	* machmode.h (get_best_mode): Add argument.
	* fold-const.c (optimize_bit_field_compare): Add argument to
	get_best_mode.
	(fold_truthop): Same.
	* ifcvt.c (noce_emit_move_insn): Add argument to store_bit_field.
	* expr.c (emit_group_store): Same.
	(copy_blkmode_from_reg): Same.
	(write_complex_part): Same.
	(optimize_bitfield_assignment_op): Add argument.
	Add argument to get_best_mode.
	(get_bit_range): New.
	(expand_assignment): Calculate maxbits and pass it down
	accordingly.
	(store_field): New argument.
	(expand_expr_real_2): New argument to store_field.
	Include params.h.
	* expr.h (store_bit_field): New argument.
	* stor-layout.c (get_best_mode): Restrict mode expansion by taking
	into account maxbits.
	* calls.c (store_unaligned_arguments_into_pseudos): New argument
	to store_bit_field.
	* expmed.c (store_bit_field_1): New argument.  Use it.
	(store_bit_field): Same.
	(store_fixed_bit_field): Same.
	(store_split_bit_field): Same.
	(extract_bit_field_1): Pass new argument to get_best_mode.
	(extract_bit_field): Same.
	* stmt.c (store_bit_field): Pass new argument to store_bit_field.
	* tree.h (DECL_THREAD_VISIBLE_P): New.
	* doc/invoke.texi: Document parameter allow-store-data-races.

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 176280)
+++ doc/invoke.texi	(working copy)
@@ -9027,6 +9027,11 @@ The maximum number of conditional stores
 if either vectorization (@option{-ftree-vectorize}) or if-conversion
 (@option{-ftree-loop-if-convert}) is disabled.  The default is 2.
 
+@item allow-store-data-races
+Allow optimizers to introduce new data races on stores.
+Set to 1 to allow, otherwise to 0.  This option is enabled by default
+unless implicitly set by the @option{-fmemory-model=} option.
+
 @item case-values-threshold
 The smallest number of different values for which it is best to use a
 jump-table instead of a tree of conditional branches.  If the value is
Index: machmode.h
===================================================================
--- machmode.h	(revision 176280)
+++ machmode.h	(working copy)
@@ -248,7 +248,10 @@ extern enum machine_mode mode_for_vector
 
 /* Find the best mode to use to access a bit field.  */
 
-extern enum machine_mode get_best_mode (int, int, unsigned int,
+extern enum machine_mode get_best_mode (int, int,
+					unsigned HOST_WIDE_INT,
+					unsigned HOST_WIDE_INT,
+					unsigned int,
 					enum machine_mode, int);
 
 /* Determine alignment, 1<=result<=BIGGEST_ALIGNMENT.  */
Index: tree.h
===================================================================
--- tree.h	(revision 176280)
+++ tree.h	(working copy)
@@ -3213,6 +3213,10 @@ struct GTY(()) tree_parm_decl {
 #define DECL_THREAD_LOCAL_P(NODE) \
   (VAR_DECL_CHECK (NODE)->decl_with_vis.tls_model >= TLS_MODEL_REAL)
 
+/* Return true if a VAR_DECL is visible from another thread.  */
+#define DECL_THREAD_VISIBLE_P(NODE) \
+  (TREE_STATIC (NODE) && !DECL_THREAD_LOCAL_P (NODE))
+
 /* In a non-local VAR_DECL with static storage duration, true if the
    variable has an initialization priority.  If false, the variable
    will be initialized at the DEFAULT_INIT_PRIORITY.  */
Index: fold-const.c
===================================================================
--- fold-const.c	(revision 176280)
+++ fold-const.c	(working copy)
@@ -3394,7 +3394,7 @@ optimize_bit_field_compare (location_t l
       && flag_strict_volatile_bitfields > 0)
     nmode = lmode;
   else
-    nmode = get_best_mode (lbitsize, lbitpos,
+    nmode = get_best_mode (lbitsize, lbitpos, 0, 0,
 			   const_p ? TYPE_ALIGN (TREE_TYPE (linner))
 			   : MIN (TYPE_ALIGN (TREE_TYPE (linner)),
 				  TYPE_ALIGN (TREE_TYPE (rinner))),
@@ -5222,7 +5222,7 @@ fold_truthop (location_t loc, enum tree_
      to be relative to a field of that size.  */
   first_bit = MIN (ll_bitpos, rl_bitpos);
   end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
-  lnmode = get_best_mode (end_bit - first_bit, first_bit,
+  lnmode = get_best_mode (end_bit - first_bit, first_bit, 0, 0,
 			  TYPE_ALIGN (TREE_TYPE (ll_inner)), word_mode,
 			  volatilep);
   if (lnmode == VOIDmode)
@@ -5287,7 +5287,7 @@ fold_truthop (location_t loc, enum tree_
 
       first_bit = MIN (lr_bitpos, rr_bitpos);
       end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
-      rnmode = get_best_mode (end_bit - first_bit, first_bit,
+      rnmode = get_best_mode (end_bit - first_bit, first_bit, 0, 0,
 			      TYPE_ALIGN (TREE_TYPE (lr_inner)), word_mode,
 			      volatilep);
       if (rnmode == VOIDmode)
Index: params.h
===================================================================
--- params.h	(revision 176280)
+++ params.h	(working copy)
@@ -211,4 +211,6 @@ extern void init_param_values (int *para
   PARAM_VALUE (PARAM_MIN_NONDEBUG_INSN_UID)
 #define MAX_STORES_TO_SINK \
   PARAM_VALUE (PARAM_MAX_STORES_TO_SINK)
+#define ALLOW_STORE_DATA_RACES \
+  PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES)
 #endif /* ! GCC_PARAMS_H */
Index: testsuite/gcc.dg/20110509.c
===================================================================
--- testsuite/gcc.dg/20110509.c	(revision 0)
+++ testsuite/gcc.dg/20110509.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+/* Test that we don't store past VAR.A.  */
+
+struct S
+{
+  volatile unsigned int a : 4;
+  unsigned char b;
+  unsigned int c : 6;
+} var;
+
+void set_a()
+{
+  var.a = 12;
+}
+
+/* { dg-final { scan-assembler-not "movl.*, var" } } */
Index: testsuite/gcc.dg/20110509-2.c
===================================================================
--- testsuite/gcc.dg/20110509-2.c	(revision 0)
+++ testsuite/gcc.dg/20110509-2.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+/* Test that we don't store past VAR.K.  */
+
+struct S
+{
+  volatile int i;
+  volatile int j: 32;
+  volatile int k: 15;
+  volatile char c[2];
+} var;
+
+void setit()
+{
+  var.k = 13;
+}
+
+/* { dg-final { scan-assembler-not "movl.*, var" } } */
Index: testsuite/gcc.dg/20110509-3.c
===================================================================
--- testsuite/gcc.dg/20110509-3.c	(revision 0)
+++ testsuite/gcc.dg/20110509-3.c	(revision 0)
@@ -0,0 +1,21 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+/* Make sure we don't narrow down to a QI or HI to store into VAR.J,
+   but instead use an SI.  */
+
+struct S
+{ 
+  volatile int i: 4;
+  volatile int j: 4;
+  volatile int k: 8;
+  volatile int l: 8;
+  volatile int m: 8;
+} var;
+
+void setit()
+{ 
+  var.j = 5;
+}
+
+/* { dg-final { scan-assembler "movl.*, var" } } */
Index: ifcvt.c
===================================================================
--- ifcvt.c	(revision 176280)
+++ ifcvt.c	(working copy)
@@ -885,7 +885,7 @@ noce_emit_move_insn (rtx x, rtx y)
 		}
 
 	      gcc_assert (start < (MEM_P (op) ? BITS_PER_UNIT : BITS_PER_WORD));
-	      store_bit_field (op, size, start, GET_MODE (x), y);
+	      store_bit_field (op, size, start, 0, 0, GET_MODE (x), y);
 	      return;
 	    }
 
@@ -939,7 +939,8 @@ noce_emit_move_insn (rtx x, rtx y)
   inner = XEXP (outer, 0);
   outmode = GET_MODE (outer);
   bitpos = SUBREG_BYTE (outer) * BITS_PER_UNIT;
-  store_bit_field (inner, GET_MODE_BITSIZE (outmode), bitpos, outmode, y);
+  store_bit_field (inner, GET_MODE_BITSIZE (outmode), bitpos,
+		   0, 0, outmode, y);
 }
 
 /* Return sequence of instructions generated by if conversion.  This
Index: expr.c
===================================================================
--- expr.c	(revision 176280)
+++ expr.c	(working copy)
@@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.  
 #include "diagnostic.h"
 #include "ssaexpand.h"
 #include "target-globals.h"
+#include "params.h"
 
 /* Decide whether a function's arguments should be processed
    from first to last or from last to first.
@@ -143,7 +144,9 @@ static void store_constructor_field (rtx
 				     HOST_WIDE_INT, enum machine_mode,
 				     tree, tree, int, alias_set_type);
 static void store_constructor (tree, rtx, int, HOST_WIDE_INT);
-static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT, enum machine_mode,
+static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT,
+			unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
+			enum machine_mode,
 			tree, tree, alias_set_type, bool);
 
 static unsigned HOST_WIDE_INT highest_pow2_factor_for_target (const_tree, const_tree);
@@ -2074,7 +2077,7 @@ emit_group_store (rtx orig_dst, rtx src,
 	emit_move_insn (adjust_address (dest, mode, bytepos), tmps[i]);
       else
 	store_bit_field (dest, bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
-			 mode, tmps[i]);
+			 0, 0, mode, tmps[i]);
     }
 
   /* Copy from the pseudo into the (probable) hard reg.  */
@@ -2168,7 +2171,7 @@ copy_blkmode_from_reg (rtx tgtblk, rtx s
 
       /* Use xbitpos for the source extraction (right justified) and
 	 bitpos for the destination store (left justified).  */
-      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, copy_mode,
+      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, 0, 0, copy_mode,
 		       extract_bit_field (src, bitsize,
 					  xbitpos % BITS_PER_WORD, 1, false,
 					  NULL_RTX, copy_mode, copy_mode));
@@ -2805,7 +2808,7 @@ write_complex_part (rtx cplx, rtx val, b
 	gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
     }
 
-  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, imode, val);
+  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, 0, 0, imode, val);
 }
 
 /* Extract one of the components of the complex value CPLX.  Extract the
@@ -3940,6 +3943,8 @@ get_subtarget (rtx x)
 static bool
 optimize_bitfield_assignment_op (unsigned HOST_WIDE_INT bitsize,
 				 unsigned HOST_WIDE_INT bitpos,
+				 unsigned HOST_WIDE_INT bitregion_start,
+				 unsigned HOST_WIDE_INT bitregion_end,
 				 enum machine_mode mode1, rtx str_rtx,
 				 tree to, tree src)
 {
@@ -4001,6 +4006,7 @@ optimize_bitfield_assignment_op (unsigne
       if (str_bitsize == 0 || str_bitsize > BITS_PER_WORD)
 	str_mode = word_mode;
       str_mode = get_best_mode (bitsize, bitpos,
+				bitregion_start, bitregion_end,
 				MEM_ALIGN (str_rtx), str_mode, 0);
       if (str_mode == VOIDmode)
 	return false;
@@ -4109,6 +4115,111 @@ optimize_bitfield_assignment_op (unsigne
   return false;
 }
 
+/* In the C++ memory model, consecutive bit fields in a structure are
+   considered one memory location.
+
+   Given a COMPONENT_REF, this function returns the bit range of
+   consecutive bits to which this COMPONENT_REF belongs.  The
+   values are returned in *BITSTART and *BITEND.  If either the C++
+   memory model is not activated, or this memory access is not thread
+   visible, 0 is returned in *BITSTART and *BITEND.
+
+   EXP is the COMPONENT_REF.
+   INNERDECL is the actual object being referenced.
+   BITPOS is the position in bits where the bit starts within the structure.
+   BITSIZE is size in bits of the field being referenced in EXP.
+
+   For example, while storing into FOO.A here...
+
+      struct {
+        BIT 0:
+          unsigned int a : 4;
+	  unsigned int b : 1;
+	BIT 8:
+	  unsigned char c;
+	  unsigned int d : 6;
+      } foo;
+
+   ...we are not allowed to store past <b>, so for the layout above, we
+   return a range of 0..7 (because no one cares if we store into the
+   padding).  */
+
+static void
+get_bit_range (unsigned HOST_WIDE_INT *bitstart,
+	       unsigned HOST_WIDE_INT *bitend,
+	       tree exp, tree innerdecl,
+	       HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize)
+{
+  tree field, record_type, fld;
+  bool found_field = false;
+  bool prev_field_is_bitfield;
+
+  gcc_assert (TREE_CODE (exp) == COMPONENT_REF);
+
+  /* If other threads can't see this value, no need to restrict stores.  */
+  if (ALLOW_STORE_DATA_RACES
+      || !DECL_THREAD_VISIBLE_P (innerdecl))
+    {
+      *bitstart = *bitend = 0;
+      return;
+    }
+
+  /* Bit field we're storing into.  */
+  field = TREE_OPERAND (exp, 1);
+  record_type = DECL_FIELD_CONTEXT (field);
+
+  /* Count the contiguous bitfields for the memory location that
+     contains FIELD.  */
+  *bitstart = 0;
+  prev_field_is_bitfield = true;
+  for (fld = TYPE_FIELDS (record_type); fld; fld = DECL_CHAIN (fld))
+    {
+      tree t, offset;
+      enum machine_mode mode;
+      int unsignedp, volatilep;
+
+      if (TREE_CODE (fld) != FIELD_DECL)
+	continue;
+
+      t = build3 (COMPONENT_REF, TREE_TYPE (exp),
+		  unshare_expr (TREE_OPERAND (exp, 0)),
+		  fld, NULL_TREE);
+      get_inner_reference (t, &bitsize, &bitpos, &offset,
+			   &mode, &unsignedp, &volatilep, true);
+
+      if (field == fld)
+	found_field = true;
+
+      if (DECL_BIT_FIELD_TYPE (fld) && bitsize > 0)
+	{
+	  if (prev_field_is_bitfield == false)
+	    {
+	      *bitstart = bitpos;
+	      prev_field_is_bitfield = true;
+	    }
+	}
+      else
+	{
+	  prev_field_is_bitfield = false;
+	  if (found_field)
+	    break;
+	}
+    }
+  gcc_assert (found_field);
+
+  if (fld)
+    {
+      /* We found the end of the bit field sequence.  Include the
+	 padding up to the next field and be done.  */
+      *bitend = bitpos - 1;
+    }
+  else
+    {
+      /* If this is the last element in the structure, include the padding
+	 at the end of structure.  */
+      *bitend = TREE_INT_CST_LOW (TYPE_SIZE (record_type));
+    }
+}
 
 /* Expand an assignment that stores the value of FROM into TO.  If NONTEMPORAL
    is true, try generating a nontemporal store.  */
@@ -4208,6 +4319,8 @@ expand_assignment (tree to, tree from, b
     {
       enum machine_mode mode1;
       HOST_WIDE_INT bitsize, bitpos;
+      unsigned HOST_WIDE_INT bitregion_start = 0;
+      unsigned HOST_WIDE_INT bitregion_end = 0;
       tree offset;
       int unsignedp;
       int volatilep = 0;
@@ -4217,6 +4330,11 @@ expand_assignment (tree to, tree from, b
       tem = get_inner_reference (to, &bitsize, &bitpos, &offset, &mode1,
 				 &unsignedp, &volatilep, true);
 
+      if (TREE_CODE (to) == COMPONENT_REF
+	  && DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1)))
+	get_bit_range (&bitregion_start, &bitregion_end,
+		       to, tem, bitpos, bitsize);
+
       /* If we are going to use store_bit_field and extract_bit_field,
 	 make sure to_rtx will be safe for multiple use.  */
 
@@ -4298,11 +4416,14 @@ expand_assignment (tree to, tree from, b
 				 nontemporal);
 	  else if (bitpos + bitsize <= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 0), bitsize, bitpos,
+				  bitregion_start, bitregion_end,
 				  mode1, from, TREE_TYPE (tem),
 				  get_alias_set (to), nontemporal);
 	  else if (bitpos >= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 1), bitsize,
-				  bitpos - mode_bitsize / 2, mode1, from,
+				  bitpos - mode_bitsize / 2,
+				  bitregion_start, bitregion_end,
+				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
 	  else if (bitpos == 0 && bitsize == mode_bitsize)
@@ -4323,7 +4444,9 @@ expand_assignment (tree to, tree from, b
 					    0);
 	      write_complex_part (temp, XEXP (to_rtx, 0), false);
 	      write_complex_part (temp, XEXP (to_rtx, 1), true);
-	      result = store_field (temp, bitsize, bitpos, mode1, from,
+	      result = store_field (temp, bitsize, bitpos,
+				    bitregion_start, bitregion_end,
+				    mode1, from,
 				    TREE_TYPE (tem), get_alias_set (to),
 				    nontemporal);
 	      emit_move_insn (XEXP (to_rtx, 0), read_complex_part (temp, false));
@@ -4348,11 +4471,15 @@ expand_assignment (tree to, tree from, b
 		MEM_KEEP_ALIAS_SET_P (to_rtx) = 1;
 	    }
 
-	  if (optimize_bitfield_assignment_op (bitsize, bitpos, mode1,
+	  if (optimize_bitfield_assignment_op (bitsize, bitpos,
+					       bitregion_start, bitregion_end,
+					       mode1,
 					       to_rtx, to, from))
 	    result = NULL;
 	  else
-	    result = store_field (to_rtx, bitsize, bitpos, mode1, from,
+	    result = store_field (to_rtx, bitsize, bitpos,
+				  bitregion_start, bitregion_end,
+				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
 	}
@@ -4745,7 +4872,7 @@ store_expr (tree exp, rtx target, int ca
 			      : BLOCK_OP_NORMAL));
 	  else if (GET_MODE (target) == BLKmode)
 	    store_bit_field (target, INTVAL (expr_size (exp)) * BITS_PER_UNIT,
-			     0, GET_MODE (temp), temp);
+			     0, 0, 0, GET_MODE (temp), temp);
 	  else
 	    convert_move (target, temp, unsignedp);
 	}
@@ -5210,7 +5337,8 @@ store_constructor_field (rtx target, uns
       store_constructor (exp, target, cleared, bitsize / BITS_PER_UNIT);
     }
   else
-    store_field (target, bitsize, bitpos, mode, exp, type, alias_set, false);
+    store_field (target, bitsize, bitpos, 0, 0, mode, exp, type, alias_set,
+		 false);
 }
 
 /* Store the value of constructor EXP into the rtx TARGET.
@@ -5784,6 +5912,11 @@ store_constructor (tree exp, rtx target,
    BITSIZE bits, starting BITPOS bits from the start of TARGET.
    If MODE is VOIDmode, it means that we are storing into a bit-field.
 
+   BITREGION_START is the bitpos of the first bitfield in this region.
+   BITREGION_END is the bitpos of the last bitfield in this region.
+   These two fields are 0 if the C++ memory model does not apply,
+   or we are not interested in keeping track of bitfield regions.
+
    Always return const0_rtx unless we have something particular to
    return.
 
@@ -5797,6 +5930,8 @@ store_constructor (tree exp, rtx target,
 
 static rtx
 store_field (rtx target, HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
+	     unsigned HOST_WIDE_INT bitregion_start,
+	     unsigned HOST_WIDE_INT bitregion_end,
 	     enum machine_mode mode, tree exp, tree type,
 	     alias_set_type alias_set, bool nontemporal)
 {
@@ -5829,8 +5964,9 @@ store_field (rtx target, HOST_WIDE_INT b
       if (bitsize != (HOST_WIDE_INT) GET_MODE_BITSIZE (GET_MODE (target)))
 	emit_move_insn (object, target);
 
-      store_field (blk_object, bitsize, bitpos, mode, exp, type, alias_set,
-		   nontemporal);
+      store_field (blk_object, bitsize, bitpos,
+		   bitregion_start, bitregion_end,
+		   mode, exp, type, alias_set, nontemporal);
 
       emit_move_insn (target, object);
 
@@ -5944,7 +6080,9 @@ store_field (rtx target, HOST_WIDE_INT b
 	}
 
       /* Store the value in the bitfield.  */
-      store_bit_field (target, bitsize, bitpos, mode, temp);
+      store_bit_field (target, bitsize, bitpos,
+		       bitregion_start, bitregion_end,
+		       mode, temp);
 
       return const0_rtx;
     }
@@ -7354,7 +7492,7 @@ expand_expr_real_2 (sepops ops, rtx targ
 						    (treeop0))
 				 * BITS_PER_UNIT),
 				(HOST_WIDE_INT) GET_MODE_BITSIZE (mode)),
-			   0, TYPE_MODE (valtype), treeop0,
+			   0, 0, 0, TYPE_MODE (valtype), treeop0,
 			   type, 0, false);
 	    }
 
Index: expr.h
===================================================================
--- expr.h	(revision 176280)
+++ expr.h	(working copy)
@@ -665,7 +665,10 @@ extern enum machine_mode
 mode_for_extraction (enum extraction_pattern, int);
 
 extern void store_bit_field (rtx, unsigned HOST_WIDE_INT,
-			     unsigned HOST_WIDE_INT, enum machine_mode, rtx);
+			     unsigned HOST_WIDE_INT,
+			     unsigned HOST_WIDE_INT,
+			     unsigned HOST_WIDE_INT,
+			     enum machine_mode, rtx);
 extern rtx extract_bit_field (rtx, unsigned HOST_WIDE_INT,
 			      unsigned HOST_WIDE_INT, int, bool, rtx,
 			      enum machine_mode, enum machine_mode);
Index: stor-layout.c
===================================================================
--- stor-layout.c	(revision 176280)
+++ stor-layout.c	(working copy)
@@ -2361,6 +2361,13 @@ fixup_unsigned_type (tree type)
 /* Find the best machine mode to use when referencing a bit field of length
    BITSIZE bits starting at BITPOS.
 
+   BITREGION_START is the bit position of the first bit in this
+   sequence of bit fields.  BITREGION_END is the bit position of the
+   last bit in this sequence.  If these two fields are non-zero, we
+   should restrict the memory access to a chunk of at most
+   BITREGION_END - BITREGION_START + 1 bits.  Otherwise, we are
+   allowed to touch any adjacent non bit-fields.
+
    The underlying object is known to be aligned to a boundary of ALIGN bits.
    If LARGEST_MODE is not VOIDmode, it means that we should not use a mode
    larger than LARGEST_MODE (usually SImode).
@@ -2378,11 +2385,21 @@ fixup_unsigned_type (tree type)
    decide which of the above modes should be used.  */
 
 enum machine_mode
-get_best_mode (int bitsize, int bitpos, unsigned int align,
+get_best_mode (int bitsize, int bitpos,
+	       unsigned HOST_WIDE_INT bitregion_start,
+	       unsigned HOST_WIDE_INT bitregion_end,
+	       unsigned int align,
 	       enum machine_mode largest_mode, int volatilep)
 {
   enum machine_mode mode;
   unsigned int unit = 0;
+  unsigned HOST_WIDE_INT maxbits;
+
+  /* If unset, no restriction.  */
+  if (!bitregion_end)
+    maxbits = 0;
+  else
+    maxbits = (bitregion_end - bitregion_start) % align;
 
   /* Find the narrowest integer mode that contains the bit field.  */
   for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
@@ -2419,6 +2436,7 @@ get_best_mode (int bitsize, int bitpos, 
 	      && bitpos / unit == (bitpos + bitsize - 1) / unit
 	      && unit <= BITS_PER_WORD
 	      && unit <= MIN (align, BIGGEST_ALIGNMENT)
+	      && (!maxbits || unit <= maxbits)
 	      && (largest_mode == VOIDmode
 		  || unit <= GET_MODE_BITSIZE (largest_mode)))
 	    wide_mode = tmode;
Index: calls.c
===================================================================
--- calls.c	(revision 176280)
+++ calls.c	(working copy)
@@ -924,8 +924,8 @@ store_unaligned_arguments_into_pseudos (
 	    emit_move_insn (reg, const0_rtx);
 
 	    bytes -= bitsize / BITS_PER_UNIT;
-	    store_bit_field (reg, bitsize, endian_correction, word_mode,
-			     word);
+	    store_bit_field (reg, bitsize, endian_correction, 0, 0,
+			     word_mode, word);
 	  }
       }
 }
Index: expmed.c
===================================================================
--- expmed.c	(revision 176280)
+++ expmed.c	(working copy)
@@ -47,9 +47,15 @@ struct target_expmed *this_target_expmed
 
 static void store_fixed_bit_field (rtx, unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT, rtx);
+				   unsigned HOST_WIDE_INT,
+				   unsigned HOST_WIDE_INT,
+				   unsigned HOST_WIDE_INT,
+				   rtx);
 static void store_split_bit_field (rtx, unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT, rtx);
+				   unsigned HOST_WIDE_INT,
+				   unsigned HOST_WIDE_INT,
+				   unsigned HOST_WIDE_INT,
+				   rtx);
 static rtx extract_fixed_bit_field (enum machine_mode, rtx,
 				    unsigned HOST_WIDE_INT,
 				    unsigned HOST_WIDE_INT,
@@ -333,7 +339,10 @@ mode_for_extraction (enum extraction_pat
 
 static bool
 store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
-		   unsigned HOST_WIDE_INT bitnum, enum machine_mode fieldmode,
+		   unsigned HOST_WIDE_INT bitnum,
+		   unsigned HOST_WIDE_INT bitregion_start,
+		   unsigned HOST_WIDE_INT bitregion_end,
+		   enum machine_mode fieldmode,
 		   rtx value, bool fallback_p)
 {
   unsigned int unit
@@ -547,7 +556,9 @@ store_bit_field_1 (rtx str_rtx, unsigned
 
 	  if (!store_bit_field_1 (op0, MIN (BITS_PER_WORD,
 					    bitsize - i * BITS_PER_WORD),
-				  bitnum + bit_offset, word_mode,
+				  bitnum + bit_offset,
+				  bitregion_start, bitregion_end,
+				  word_mode,
 				  value_word, fallback_p))
 	    {
 	      delete_insns_since (last);
@@ -710,6 +721,12 @@ store_bit_field_1 (rtx str_rtx, unsigned
   if (HAVE_insv && MEM_P (op0))
     {
       enum machine_mode bestmode;
+      unsigned HOST_WIDE_INT maxbits;
+
+      if (!bitregion_end)
+	maxbits = 0;
+      else
+	maxbits = bitregion_end - bitregion_start;
 
       /* Get the mode to use for inserting into this field.  If OP0 is
 	 BLKmode, get the smallest mode consistent with the alignment. If
@@ -717,9 +734,12 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	 mode. Otherwise, use the smallest mode containing the field.  */
 
       if (GET_MODE (op0) == BLKmode
+	  || (bitregion_end && GET_MODE_BITSIZE (GET_MODE (op0)) > maxbits)
 	  || (op_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (op_mode)))
-	bestmode = get_best_mode (bitsize, bitnum, MEM_ALIGN (op0),
+	bestmode = get_best_mode (bitsize, bitnum,
+				  bitregion_start, bitregion_end,
+				  MEM_ALIGN (op0),
 				  (op_mode == MAX_MACHINE_MODE
 				   ? VOIDmode : op_mode),
 				  MEM_VOLATILE_P (op0));
@@ -748,6 +768,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	     the unit.  */
 	  tempreg = copy_to_reg (xop0);
 	  if (store_bit_field_1 (tempreg, bitsize, xbitpos,
+				 bitregion_start, bitregion_end,
 				 fieldmode, orig_value, false))
 	    {
 	      emit_move_insn (xop0, tempreg);
@@ -760,21 +781,33 @@ store_bit_field_1 (rtx str_rtx, unsigned
   if (!fallback_p)
     return false;
 
-  store_fixed_bit_field (op0, offset, bitsize, bitpos, value);
+  store_fixed_bit_field (op0, offset, bitsize, bitpos,
+			 bitregion_start, bitregion_end, value);
   return true;
 }
 
 /* Generate code to store value from rtx VALUE
    into a bit-field within structure STR_RTX
    containing BITSIZE bits starting at bit BITNUM.
+
+   BITREGION_START is the bitpos of the first bitfield in this region.
+   BITREGION_END is the bitpos of the last bitfield in this region.
+   These two fields are 0 if the C++ memory model does not apply,
+   or we are not interested in keeping track of bitfield regions.
+
    FIELDMODE is the machine-mode of the FIELD_DECL node for this field.  */
 
 void
 store_bit_field (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
-		 unsigned HOST_WIDE_INT bitnum, enum machine_mode fieldmode,
+		 unsigned HOST_WIDE_INT bitnum,
+		 unsigned HOST_WIDE_INT bitregion_start,
+		 unsigned HOST_WIDE_INT bitregion_end,
+		 enum machine_mode fieldmode,
 		 rtx value)
 {
-  if (!store_bit_field_1 (str_rtx, bitsize, bitnum, fieldmode, value, true))
+  if (!store_bit_field_1 (str_rtx, bitsize, bitnum,
+			  bitregion_start, bitregion_end,
+			  fieldmode, value, true))
     gcc_unreachable ();
 }
 \f
@@ -790,7 +823,10 @@ store_bit_field (rtx str_rtx, unsigned H
 static void
 store_fixed_bit_field (rtx op0, unsigned HOST_WIDE_INT offset,
 		       unsigned HOST_WIDE_INT bitsize,
-		       unsigned HOST_WIDE_INT bitpos, rtx value)
+		       unsigned HOST_WIDE_INT bitpos,
+		       unsigned HOST_WIDE_INT bitregion_start,
+		       unsigned HOST_WIDE_INT bitregion_end,
+		       rtx value)
 {
   enum machine_mode mode;
   unsigned int total_bits = BITS_PER_WORD;
@@ -811,12 +847,23 @@ store_fixed_bit_field (rtx op0, unsigned
       /* Special treatment for a bit field split across two registers.  */
       if (bitsize + bitpos > BITS_PER_WORD)
 	{
-	  store_split_bit_field (op0, bitsize, bitpos, value);
+	  store_split_bit_field (op0, bitsize, bitpos,
+				 bitregion_start, bitregion_end,
+				 value);
 	  return;
 	}
     }
   else
     {
+      unsigned HOST_WIDE_INT maxbits;
+
+      if (!bitregion_end)
+	maxbits = 0;
+      else if (1||bitpos + offset * BITS_PER_UNIT < bitregion_start)
+	maxbits = bitregion_end - bitregion_start;
+      else
+	maxbits = bitregion_end - (bitpos + offset * BITS_PER_UNIT) + 1;
+
       /* Get the proper mode to use for this field.  We want a mode that
 	 includes the entire field.  If such a mode would be larger than
 	 a word, we won't be doing the extraction the normal way.
@@ -829,10 +876,12 @@ store_fixed_bit_field (rtx op0, unsigned
 
       if (MEM_VOLATILE_P (op0)
           && GET_MODE_BITSIZE (GET_MODE (op0)) > 0
+	  && GET_MODE_BITSIZE (GET_MODE (op0)) <= maxbits
 	  && flag_strict_volatile_bitfields > 0)
 	mode = GET_MODE (op0);
       else
 	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
+			      bitregion_start, bitregion_end,
 			      MEM_ALIGN (op0), mode, MEM_VOLATILE_P (op0));
 
       if (mode == VOIDmode)
@@ -840,7 +889,7 @@ store_fixed_bit_field (rtx op0, unsigned
 	  /* The only way this should occur is if the field spans word
 	     boundaries.  */
 	  store_split_bit_field (op0, bitsize, bitpos + offset * BITS_PER_UNIT,
-				 value);
+				 bitregion_start, bitregion_end, value);
 	  return;
 	}
 
@@ -960,7 +1009,10 @@ store_fixed_bit_field (rtx op0, unsigned
 
 static void
 store_split_bit_field (rtx op0, unsigned HOST_WIDE_INT bitsize,
-		       unsigned HOST_WIDE_INT bitpos, rtx value)
+		       unsigned HOST_WIDE_INT bitpos,
+		       unsigned HOST_WIDE_INT bitregion_start,
+		       unsigned HOST_WIDE_INT bitregion_end,
+		       rtx value)
 {
   unsigned int unit;
   unsigned int bitsdone = 0;
@@ -1075,7 +1127,7 @@ store_split_bit_field (rtx op0, unsigned
 	 it is just an out-of-bounds access.  Ignore it.  */
       if (word != const0_rtx)
 	store_fixed_bit_field (word, offset * unit / BITS_PER_UNIT, thissize,
-			       thispos, part);
+			       thispos, bitregion_start, bitregion_end, part);
       bitsdone += thissize;
     }
 }
@@ -1515,7 +1567,7 @@ extract_bit_field_1 (rtx str_rtx, unsign
       if (GET_MODE (op0) == BLKmode
 	  || (ext_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (ext_mode)))
-	bestmode = get_best_mode (bitsize, bitnum, MEM_ALIGN (op0),
+	bestmode = get_best_mode (bitsize, bitnum, 0, 0, MEM_ALIGN (op0),
 				  (ext_mode == MAX_MACHINE_MODE
 				   ? VOIDmode : ext_mode),
 				  MEM_VOLATILE_P (op0));
@@ -1641,7 +1693,7 @@ extract_fixed_bit_field (enum machine_mo
 	    mode = tmode;
 	}
       else
-	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
+	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT, 0, 0,
 			      MEM_ALIGN (op0), word_mode, MEM_VOLATILE_P (op0));
 
       if (mode == VOIDmode)
Index: Makefile.in
===================================================================
--- Makefile.in	(revision 176280)
+++ Makefile.in	(working copy)
@@ -2908,7 +2908,7 @@ expr.o : expr.c $(CONFIG_H) $(SYSTEM_H) 
    reload.h langhooks.h intl.h $(TM_P_H) $(TARGET_H) \
    tree-iterator.h gt-expr.h $(MACHMODE_H) $(TIMEVAR_H) $(TREE_FLOW_H) \
    $(TREE_PASS_H) $(DF_H) $(DIAGNOSTIC_H) vecprim.h $(SSAEXPAND_H) \
-   $(COMMON_TARGET_H)
+   $(PARAMS_H) $(COMMON_TARGET_H)
 dojump.o : dojump.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(TREE_H) \
    $(FLAGS_H) $(FUNCTION_H) $(EXPR_H) $(OPTABS_H) $(INSN_ATTR_H) insn-config.h \
    langhooks.h $(GGC_H) gt-dojump.h vecprim.h $(BASIC_BLOCK_H) output.h
Index: stmt.c
===================================================================
--- stmt.c	(revision 176280)
+++ stmt.c	(working copy)
@@ -1759,7 +1759,8 @@ expand_return (tree retval)
 
 	  /* Use bitpos for the source extraction (left justified) and
 	     xbitpos for the destination store (right justified).  */
-	  store_bit_field (dst, bitsize, xbitpos % BITS_PER_WORD, word_mode,
+	  store_bit_field (dst, bitsize, xbitpos % BITS_PER_WORD,
+			   0, 0, word_mode,
 			   extract_bit_field (src, bitsize,
 					      bitpos % BITS_PER_WORD, 1, false,
 					      NULL_RTX, word_mode, word_mode));
Index: params.def
===================================================================
--- params.def	(revision 176280)
+++ params.def	(working copy)
@@ -902,6 +902,12 @@ DEFPARAM (PARAM_CASE_VALUES_THRESHOLD,
 	  "if 0, use the default for the machine",
           0, 0, 0)
 
+/* Data race flags for C++0x memory model compliance.  */
+DEFPARAM (PARAM_ALLOW_STORE_DATA_RACES,
+	  "allow-store-data-races",
+	  "Allow new data races on stores to be introduced",
+	  1, 0, 1)
+
 
 /*
 Local variables:

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-18 13:10                       ` Aldy Hernandez
@ 2011-07-22 19:16                         ` Jason Merrill
  2011-07-25 17:41                           ` Aldy Hernandez
  0 siblings, 1 reply; 81+ messages in thread
From: Jason Merrill @ 2011-07-22 19:16 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Jeff Law, gcc-patches, Jakub Jelinek

On 07/18/2011 08:02 AM, Aldy Hernandez wrote:
> +  /* If other threads can't see this value, no need to restrict stores.  */
> +  if (ALLOW_STORE_DATA_RACES
> +      || !DECL_THREAD_VISIBLE_P (innerdecl))
> +    {
> +      *bitstart = *bitend = 0;
> +      return;
> +    }

What if get_inner_reference returns something that isn't a DECL, such as 
an INDIRECT_REF?
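
For instance (a hypothetical reduction, not from the testsuite), a
store through a pointer leaves us with an INDIRECT_REF as the
innermost object:

  struct S { unsigned int a : 4; unsigned char b; };

  void f (struct S *p)
  {
    p->a = 7;  /* get_inner_reference sees *p here, not a DECL,
                  so DECL_THREAD_VISIBLE_P can't be applied as-is.  */
  }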

> +  if (fld)
> +    {
> +      /* We found the end of the bit field sequence.  Include the
> +        padding up to the next field and be done.  */
> +      *bitend = bitpos - 1;
> +    }

bitpos is the position of "field", and it seems to me we want the 
position of "fld" here.

> +  /* If unset, no restriction.  */
> +  if (!bitregion_end)
> +    maxbits = 0;
> +  else
> +    maxbits = (bitregion_end - bitregion_start) % align;

Maybe use MAX_FIXED_MODE_SIZE so you don't have to test it against 0?
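
Roughly this (an untested sketch), so the zero special case and the
"!maxbits" tests go away:

  if (!bitregion_end)
    maxbits = MAX_FIXED_MODE_SIZE;
  else
    maxbits = bitregion_end - bitregion_start + 1;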

> +      if (!bitregion_end)
> +       maxbits = 0;
> +      else if (1||bitpos + offset * BITS_PER_UNIT < bitregion_start)
> +       maxbits = bitregion_end - bitregion_start;
> +      else
> +       maxbits = bitregion_end - (bitpos + offset * BITS_PER_UNIT) + 1;

I assume the 1|| was there for debugging?

Surely bitpos+offset*BITS_PER_UNIT, which would be the bit position of 
the bit-field, must be within [bitregion_start,bitregion_end)?

Jason

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-22 19:16                         ` Jason Merrill
@ 2011-07-25 17:41                           ` Aldy Hernandez
  2011-07-26  5:28                             ` Jason Merrill
  2011-07-27 18:24                             ` H.J. Lu
  0 siblings, 2 replies; 81+ messages in thread
From: Aldy Hernandez @ 2011-07-25 17:41 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Jeff Law, gcc-patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 2132 bytes --]

On 07/22/11 13:44, Jason Merrill wrote:
> On 07/18/2011 08:02 AM, Aldy Hernandez wrote:
>> +  /* If other threads can't see this value, no need to restrict
>> +     stores.  */
>> +  if (ALLOW_STORE_DATA_RACES
>> +      || !DECL_THREAD_VISIBLE_P (innerdecl))
>> +    {
>> +      *bitstart = *bitend = 0;
>> +      return;
>> +    }
>
> What if get_inner_reference returns something that isn't a DECL, such as
> an INDIRECT_REF?

I had changed this already to take into account aliasing, so if we get 
an INDIRECT_REF, ptr_deref_may_alias_global_p() returns true, and we 
proceed with the restriction:

+  /* If other threads can't see this value, no need to restrict stores.  */
+  if (ALLOW_STORE_DATA_RACES
+      || (!ptr_deref_may_alias_global_p (innerdecl)
+         && (DECL_THREAD_LOCAL_P (innerdecl)
+             || !TREE_STATIC (innerdecl))))
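
With that test, a sketch like this (my example, not in the testsuite)
still gets the old unrestricted code, since the object is neither
static nor reachable from other threads:

  void g (void)
  {
    struct S { unsigned int a : 4; unsigned char b; } local;
    local.a = 12;  /* Not visible to other threads; a wide store
                      touching <b> is still fine here.  */
  }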


>> +  if (fld)
>> +    {
>> +      /* We found the end of the bit field sequence.  Include the
>> +         padding up to the next field and be done.  */
>> +      *bitend = bitpos - 1;
>> +    }
>
> bitpos is the position of "field", and it seems to me we want the
> position of "fld" here.

Notice that bitpos gets recalculated at each iteration by 
get_inner_reference, so bitpos is actually the position of fld.
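
I.e., with

  struct { unsigned a : 4; unsigned b : 1; unsigned char c; } x;

the loop breaks when FLD is <c>, at which point bitpos has already
been updated to <c>'s position (bit 8, assuming that layout), so
*bitend = bitpos - 1 = 7 is the end of the region.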

>> +  /* If unset, no restriction.  */
>> +  if (!bitregion_end)
>> +    maxbits = 0;
>> +  else
>> +    maxbits = (bitregion_end - bitregion_start) % align;
>
> Maybe use MAX_FIXED_MODE_SIZE so you don't have to test it against 0?

Fixed everywhere.

>> +  if (!bitregion_end)
>> +    maxbits = 0;
>> +  else if (1||bitpos + offset * BITS_PER_UNIT < bitregion_start)
>> +    maxbits = bitregion_end - bitregion_start;
>> +  else
>> +    maxbits = bitregion_end - (bitpos + offset * BITS_PER_UNIT) + 1;
>
> I assume the 1|| was there for debugging?

Fixed, plus I adjusted the calculation of maxbits everywhere because I
found an off-by-one error: a region spanning bits START..END is
END - START + 1 bits wide, not END - START.

I have also overhauled store_bit_field() to adjust the address to
point to the beginning of the bit region.  This fixed a myriad of
corner cases pointed out by a test Hans Boehm was kind enough to
provide.
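
For instance (my own reduced example, not from Hans' test):

  struct T
  {
    int i;       /* bytes 0..3 */
    int j : 7;   /* the bit region for <j>/<k> starts at bit 32 */
    int k : 9;
  } t;

  void setj (void) { t.j = 5; }

The store into <j> is now based at byte 4 (bitregion_start /
BITS_PER_UNIT), with bitnum and bitregion_end rebased relative to
that, instead of being based at the start of T.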

I have added more tests.

How does this look?  (Pending tests.)

[-- Attachment #2: curr --]
[-- Type: text/plain, Size: 32883 bytes --]

	* params.h (ALLOW_STORE_DATA_RACES): New.
	* params.def (PARAM_ALLOW_STORE_DATA_RACES): New.
	* Makefile.in (expr.o): Depend on PARAMS_H.
	* machmode.h (get_best_mode): Add argument.
	* fold-const.c (optimize_bit_field_compare): Add argument to
	get_best_mode.
	(fold_truthop): Same.
	* ifcvt.c (noce_emit_move_insn): Add argument to store_bit_field.
	* expr.c (emit_group_store): Same.
	(copy_blkmode_from_reg): Same.
	(write_complex_part): Same.
	(optimize_bitfield_assignment_op): Add argument.
	Add argument to get_best_mode.
	(get_bit_range): New.
	(expand_assignment): Calculate maxbits and pass it down
	accordingly.
	(store_field): New argument.
	(expand_expr_real_2): New argument to store_field.
	Include params.h.
	* expr.h (store_bit_field): New argument.
	* stor-layout.c (get_best_mode): Restrict mode expansion by taking
	into account maxbits.
	* calls.c (store_unaligned_arguments_into_pseudos): New argument
	to store_bit_field.
	* expmed.c (store_bit_field_1): New argument.  Use it.
	(store_bit_field): Same.
	(store_fixed_bit_field): Same.
	(store_split_bit_field): Same.
	(extract_bit_field_1): Pass new argument to get_best_mode.
	(extract_bit_field): Same.
	* stmt.c (store_bit_field): Pass new argument to store_bit_field.
	* doc/invoke.texi: Document parameter allow-store-data-races.

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 176280)
+++ doc/invoke.texi	(working copy)
@@ -9027,6 +9027,11 @@ The maximum number of conditional stores
 if either vectorization (@option{-ftree-vectorize}) or if-conversion
 (@option{-ftree-loop-if-convert}) is disabled.  The default is 2.
 
+@item allow-store-data-races
+Allow optimizers to introduce new data races on stores.
+Set to 1 to allow, otherwise to 0.  This option is enabled by default
+unless implicitly set by the @option{-fmemory-model=} option.
+
 @item case-values-threshold
 The smallest number of different values for which it is best to use a
 jump-table instead of a tree of conditional branches.  If the value is
Index: machmode.h
===================================================================
--- machmode.h	(revision 176280)
+++ machmode.h	(working copy)
@@ -248,7 +248,10 @@ extern enum machine_mode mode_for_vector
 
 /* Find the best mode to use to access a bit field.  */
 
-extern enum machine_mode get_best_mode (int, int, unsigned int,
+extern enum machine_mode get_best_mode (int, int,
+					unsigned HOST_WIDE_INT,
+					unsigned HOST_WIDE_INT,
+					unsigned int,
 					enum machine_mode, int);
 
 /* Determine alignment, 1<=result<=BIGGEST_ALIGNMENT.  */
Index: fold-const.c
===================================================================
--- fold-const.c	(revision 176280)
+++ fold-const.c	(working copy)
@@ -3394,7 +3394,7 @@ optimize_bit_field_compare (location_t l
       && flag_strict_volatile_bitfields > 0)
     nmode = lmode;
   else
-    nmode = get_best_mode (lbitsize, lbitpos,
+    nmode = get_best_mode (lbitsize, lbitpos, 0, 0,
 			   const_p ? TYPE_ALIGN (TREE_TYPE (linner))
 			   : MIN (TYPE_ALIGN (TREE_TYPE (linner)),
 				  TYPE_ALIGN (TREE_TYPE (rinner))),
@@ -5222,7 +5222,7 @@ fold_truthop (location_t loc, enum tree_
      to be relative to a field of that size.  */
   first_bit = MIN (ll_bitpos, rl_bitpos);
   end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
-  lnmode = get_best_mode (end_bit - first_bit, first_bit,
+  lnmode = get_best_mode (end_bit - first_bit, first_bit, 0, 0,
 			  TYPE_ALIGN (TREE_TYPE (ll_inner)), word_mode,
 			  volatilep);
   if (lnmode == VOIDmode)
@@ -5287,7 +5287,7 @@ fold_truthop (location_t loc, enum tree_
 
       first_bit = MIN (lr_bitpos, rr_bitpos);
       end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
-      rnmode = get_best_mode (end_bit - first_bit, first_bit,
+      rnmode = get_best_mode (end_bit - first_bit, first_bit, 0, 0,
 			      TYPE_ALIGN (TREE_TYPE (lr_inner)), word_mode,
 			      volatilep);
       if (rnmode == VOIDmode)
Index: params.h
===================================================================
--- params.h	(revision 176280)
+++ params.h	(working copy)
@@ -211,4 +211,6 @@ extern void init_param_values (int *para
   PARAM_VALUE (PARAM_MIN_NONDEBUG_INSN_UID)
 #define MAX_STORES_TO_SINK \
   PARAM_VALUE (PARAM_MAX_STORES_TO_SINK)
+#define ALLOW_STORE_DATA_RACES \
+  PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES)
 #endif /* ! GCC_PARAMS_H */
Index: testsuite/gcc.dg/20110509-4.c
===================================================================
--- testsuite/gcc.dg/20110509-4.c	(revision 0)
+++ testsuite/gcc.dg/20110509-4.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+struct bits
+{
+  char a;
+  int b:7;
+  int c:9;
+  unsigned char d;
+} x;
+
+/* Store into <c> should not clobber <d>.  */
+void update_c(struct bits *p, int val) 
+{
+  p->c = val;
+}
+
+/* { dg-final { scan-assembler-not "movl" } } */
Index: testsuite/gcc.dg/20110509.c
===================================================================
--- testsuite/gcc.dg/20110509.c	(revision 0)
+++ testsuite/gcc.dg/20110509.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+/* Test that we don't store past VAR.A.  */
+
+struct S
+{
+  volatile unsigned int a : 4;
+  unsigned char b;
+  unsigned int c : 6;
+} var;
+
+void set_a()
+{
+  var.a = 12;
+}
+
+/* { dg-final { scan-assembler-not "movl.*, var" } } */
Index: testsuite/gcc.dg/20110509-2.c
===================================================================
--- testsuite/gcc.dg/20110509-2.c	(revision 0)
+++ testsuite/gcc.dg/20110509-2.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+/* Test that we don't store past VAR.K.  */
+
+struct S
+{
+  volatile int i;
+  volatile int j: 32;
+  volatile int k: 15;
+  volatile char c[2];
+} var;
+
+void setit()
+{
+  var.k = 13;
+}
+
+/* { dg-final { scan-assembler-not "movl.*, var" } } */
Index: testsuite/gcc.dg/20110509-3.c
===================================================================
--- testsuite/gcc.dg/20110509-3.c	(revision 0)
+++ testsuite/gcc.dg/20110509-3.c	(revision 0)
@@ -0,0 +1,21 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+/* Make sure we don't narrow down to a QI or HI to store into VAR.J,
+   but instead use an SI.  */
+
+struct S
+{ 
+  volatile int i: 4;
+  volatile int j: 4;
+  volatile int k: 8;
+  volatile int l: 8;
+  volatile int m: 8;
+} var;
+
+void setit()
+{ 
+  var.j = 5;
+}
+
+/* { dg-final { scan-assembler "movl.*, var" } } */
Index: ifcvt.c
===================================================================
--- ifcvt.c	(revision 176280)
+++ ifcvt.c	(working copy)
@@ -885,7 +885,7 @@ noce_emit_move_insn (rtx x, rtx y)
 		}
 
 	      gcc_assert (start < (MEM_P (op) ? BITS_PER_UNIT : BITS_PER_WORD));
-	      store_bit_field (op, size, start, GET_MODE (x), y);
+	      store_bit_field (op, size, start, 0, 0, GET_MODE (x), y);
 	      return;
 	    }
 
@@ -939,7 +939,8 @@ noce_emit_move_insn (rtx x, rtx y)
   inner = XEXP (outer, 0);
   outmode = GET_MODE (outer);
   bitpos = SUBREG_BYTE (outer) * BITS_PER_UNIT;
-  store_bit_field (inner, GET_MODE_BITSIZE (outmode), bitpos, outmode, y);
+  store_bit_field (inner, GET_MODE_BITSIZE (outmode), bitpos,
+		   0, 0, outmode, y);
 }
 
 /* Return sequence of instructions generated by if conversion.  This
Index: expr.c
===================================================================
--- expr.c	(revision 176280)
+++ expr.c	(working copy)
@@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.  
 #include "diagnostic.h"
 #include "ssaexpand.h"
 #include "target-globals.h"
+#include "params.h"
 
 /* Decide whether a function's arguments should be processed
    from first to last or from last to first.
@@ -143,7 +144,9 @@ static void store_constructor_field (rtx
 				     HOST_WIDE_INT, enum machine_mode,
 				     tree, tree, int, alias_set_type);
 static void store_constructor (tree, rtx, int, HOST_WIDE_INT);
-static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT, enum machine_mode,
+static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT,
+			unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
+			enum machine_mode,
 			tree, tree, alias_set_type, bool);
 
 static unsigned HOST_WIDE_INT highest_pow2_factor_for_target (const_tree, const_tree);
@@ -2074,7 +2077,7 @@ emit_group_store (rtx orig_dst, rtx src,
 	emit_move_insn (adjust_address (dest, mode, bytepos), tmps[i]);
       else
 	store_bit_field (dest, bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
-			 mode, tmps[i]);
+			 0, 0, mode, tmps[i]);
     }
 
   /* Copy from the pseudo into the (probable) hard reg.  */
@@ -2168,7 +2171,7 @@ copy_blkmode_from_reg (rtx tgtblk, rtx s
 
       /* Use xbitpos for the source extraction (right justified) and
 	 bitpos for the destination store (left justified).  */
-      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, copy_mode,
+      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, 0, 0, copy_mode,
 		       extract_bit_field (src, bitsize,
 					  xbitpos % BITS_PER_WORD, 1, false,
 					  NULL_RTX, copy_mode, copy_mode));
@@ -2805,7 +2808,7 @@ write_complex_part (rtx cplx, rtx val, b
 	gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
     }
 
-  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, imode, val);
+  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, 0, 0, imode, val);
 }
 
 /* Extract one of the components of the complex value CPLX.  Extract the
@@ -3940,6 +3943,8 @@ get_subtarget (rtx x)
 static bool
 optimize_bitfield_assignment_op (unsigned HOST_WIDE_INT bitsize,
 				 unsigned HOST_WIDE_INT bitpos,
+				 unsigned HOST_WIDE_INT bitregion_start,
+				 unsigned HOST_WIDE_INT bitregion_end,
 				 enum machine_mode mode1, rtx str_rtx,
 				 tree to, tree src)
 {
@@ -4001,6 +4006,7 @@ optimize_bitfield_assignment_op (unsigne
       if (str_bitsize == 0 || str_bitsize > BITS_PER_WORD)
 	str_mode = word_mode;
       str_mode = get_best_mode (bitsize, bitpos,
+				bitregion_start, bitregion_end,
 				MEM_ALIGN (str_rtx), str_mode, 0);
       if (str_mode == VOIDmode)
 	return false;
@@ -4109,6 +4115,113 @@ optimize_bitfield_assignment_op (unsigne
   return false;
 }
 
+/* In the C++ memory model, consecutive bit fields in a structure are
+   considered one memory location.
+
+   Given a COMPONENT_REF, this function returns the bit range of
+   consecutive bits in which this COMPONENT_REF belongs in.  The
+   values are returned in *BITSTART and *BITEND.  If either the C++
+   memory model is not activated, or this memory access is not thread
+   visible, 0 is returned in *BITSTART and *BITEND.
+
+   EXP is the COMPONENT_REF.
+   INNERDECL is the actual object being referenced.
+   BITPOS is the position in bits where the bit starts within the structure.
+   BITSIZE is size in bits of the field being referenced in EXP.
+
+   For example, while storing into FOO.A here...
+
+      struct {
+        BIT 0:
+          unsigned int a : 4;
+	  unsigned int b : 1;
+	BIT 8:
+	  unsigned char c;
+	  unsigned int d : 6;
+      } foo;
+
+   ...we are not allowed to store past <b>, so for the layout above, we
+   return a range of 0..7 (because no one cares if we store into the
+   padding).  */
+
+static void
+get_bit_range (unsigned HOST_WIDE_INT *bitstart,
+	       unsigned HOST_WIDE_INT *bitend,
+	       tree exp, tree innerdecl,
+	       HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize)
+{
+  tree field, record_type, fld;
+  bool found_field = false;
+  bool prev_field_is_bitfield;
+
+  gcc_assert (TREE_CODE (exp) == COMPONENT_REF);
+
+  /* If other threads can't see this value, no need to restrict stores.  */
+  if (ALLOW_STORE_DATA_RACES
+      || (!ptr_deref_may_alias_global_p (innerdecl)
+	  && (DECL_THREAD_LOCAL_P (innerdecl)
+	      || !TREE_STATIC (innerdecl))))
+    {
+      *bitstart = *bitend = 0;
+      return;
+    }
+
+  /* Bit field we're storing into.  */
+  field = TREE_OPERAND (exp, 1);
+  record_type = DECL_FIELD_CONTEXT (field);
+
+  /* Count the contiguous bitfields for the memory location that
+     contains FIELD.  */
+  *bitstart = 0;
+  prev_field_is_bitfield = true;
+  for (fld = TYPE_FIELDS (record_type); fld; fld = DECL_CHAIN (fld))
+    {
+      tree t, offset;
+      enum machine_mode mode;
+      int unsignedp, volatilep;
+
+      if (TREE_CODE (fld) != FIELD_DECL)
+	continue;
+
+      t = build3 (COMPONENT_REF, TREE_TYPE (exp),
+		  unshare_expr (TREE_OPERAND (exp, 0)),
+		  fld, NULL_TREE);
+      get_inner_reference (t, &bitsize, &bitpos, &offset,
+			   &mode, &unsignedp, &volatilep, true);
+
+      if (field == fld)
+	found_field = true;
+
+      if (DECL_BIT_FIELD_TYPE (fld) && bitsize > 0)
+	{
+	  if (prev_field_is_bitfield == false)
+	    {
+	      *bitstart = bitpos;
+	      prev_field_is_bitfield = true;
+	    }
+	}
+      else
+	{
+	  prev_field_is_bitfield = false;
+	  if (found_field)
+	    break;
+	}
+    }
+  gcc_assert (found_field);
+
+  if (fld)
+    {
+      /* We found the end of the bit field sequence.  Include the
+	 padding up to the next field and be done.  */
+      *bitend = bitpos - 1;
+    }
+  else
+    {
+      /* If this is the last element in the structure, include the padding
+	 at the end of structure.  */
+      *bitend = TREE_INT_CST_LOW (TYPE_SIZE (record_type)) - 1;
+    }
+}
 
 /* Expand an assignment that stores the value of FROM into TO.  If NONTEMPORAL
    is true, try generating a nontemporal store.  */
@@ -4208,6 +4321,8 @@ expand_assignment (tree to, tree from, b
     {
       enum machine_mode mode1;
       HOST_WIDE_INT bitsize, bitpos;
+      unsigned HOST_WIDE_INT bitregion_start = 0;
+      unsigned HOST_WIDE_INT bitregion_end = 0;
       tree offset;
       int unsignedp;
       int volatilep = 0;
@@ -4217,6 +4332,11 @@ expand_assignment (tree to, tree from, b
       tem = get_inner_reference (to, &bitsize, &bitpos, &offset, &mode1,
 				 &unsignedp, &volatilep, true);
 
+      if (TREE_CODE (to) == COMPONENT_REF
+	  && DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1)))
+	get_bit_range (&bitregion_start, &bitregion_end,
+		       to, tem, bitpos, bitsize);
+
       /* If we are going to use store_bit_field and extract_bit_field,
 	 make sure to_rtx will be safe for multiple use.  */
 
@@ -4298,11 +4418,14 @@ expand_assignment (tree to, tree from, b
 				 nontemporal);
 	  else if (bitpos + bitsize <= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 0), bitsize, bitpos,
+				  bitregion_start, bitregion_end,
 				  mode1, from, TREE_TYPE (tem),
 				  get_alias_set (to), nontemporal);
 	  else if (bitpos >= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 1), bitsize,
-				  bitpos - mode_bitsize / 2, mode1, from,
+				  bitpos - mode_bitsize / 2,
+				  bitregion_start, bitregion_end,
+				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
 	  else if (bitpos == 0 && bitsize == mode_bitsize)
@@ -4323,7 +4446,9 @@ expand_assignment (tree to, tree from, b
 					    0);
 	      write_complex_part (temp, XEXP (to_rtx, 0), false);
 	      write_complex_part (temp, XEXP (to_rtx, 1), true);
-	      result = store_field (temp, bitsize, bitpos, mode1, from,
+	      result = store_field (temp, bitsize, bitpos,
+				    bitregion_start, bitregion_end,
+				    mode1, from,
 				    TREE_TYPE (tem), get_alias_set (to),
 				    nontemporal);
 	      emit_move_insn (XEXP (to_rtx, 0), read_complex_part (temp, false));
@@ -4348,11 +4473,15 @@ expand_assignment (tree to, tree from, b
 		MEM_KEEP_ALIAS_SET_P (to_rtx) = 1;
 	    }
 
-	  if (optimize_bitfield_assignment_op (bitsize, bitpos, mode1,
+	  if (optimize_bitfield_assignment_op (bitsize, bitpos,
+					       bitregion_start, bitregion_end,
+					       mode1,
 					       to_rtx, to, from))
 	    result = NULL;
 	  else
-	    result = store_field (to_rtx, bitsize, bitpos, mode1, from,
+	    result = store_field (to_rtx, bitsize, bitpos,
+				  bitregion_start, bitregion_end,
+				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
 	}
@@ -4745,7 +4874,7 @@ store_expr (tree exp, rtx target, int ca
 			      : BLOCK_OP_NORMAL));
 	  else if (GET_MODE (target) == BLKmode)
 	    store_bit_field (target, INTVAL (expr_size (exp)) * BITS_PER_UNIT,
-			     0, GET_MODE (temp), temp);
+			     0, 0, 0, GET_MODE (temp), temp);
 	  else
 	    convert_move (target, temp, unsignedp);
 	}
@@ -5210,7 +5339,8 @@ store_constructor_field (rtx target, uns
       store_constructor (exp, target, cleared, bitsize / BITS_PER_UNIT);
     }
   else
-    store_field (target, bitsize, bitpos, mode, exp, type, alias_set, false);
+    store_field (target, bitsize, bitpos, 0, 0, mode, exp, type, alias_set,
+		 false);
 }
 
 /* Store the value of constructor EXP into the rtx TARGET.
@@ -5784,6 +5914,11 @@ store_constructor (tree exp, rtx target,
    BITSIZE bits, starting BITPOS bits from the start of TARGET.
    If MODE is VOIDmode, it means that we are storing into a bit-field.
 
+   BITREGION_START is the bitpos of the first bitfield in this region.
+   BITREGION_END is the bitpos of the last bitfield in this region.
+   These two fields are 0 if the C++ memory model does not apply,
+   or we are not interested in keeping track of bitfield regions.
+
    Always return const0_rtx unless we have something particular to
    return.
 
@@ -5797,6 +5932,8 @@ store_constructor (tree exp, rtx target,
 
 static rtx
 store_field (rtx target, HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
+	     unsigned HOST_WIDE_INT bitregion_start,
+	     unsigned HOST_WIDE_INT bitregion_end,
 	     enum machine_mode mode, tree exp, tree type,
 	     alias_set_type alias_set, bool nontemporal)
 {
@@ -5829,8 +5966,9 @@ store_field (rtx target, HOST_WIDE_INT b
       if (bitsize != (HOST_WIDE_INT) GET_MODE_BITSIZE (GET_MODE (target)))
 	emit_move_insn (object, target);
 
-      store_field (blk_object, bitsize, bitpos, mode, exp, type, alias_set,
-		   nontemporal);
+      store_field (blk_object, bitsize, bitpos,
+		   bitregion_start, bitregion_end,
+		   mode, exp, type, alias_set, nontemporal);
 
       emit_move_insn (target, object);
 
@@ -5944,7 +6082,9 @@ store_field (rtx target, HOST_WIDE_INT b
 	}
 
       /* Store the value in the bitfield.  */
-      store_bit_field (target, bitsize, bitpos, mode, temp);
+      store_bit_field (target, bitsize, bitpos,
+		       bitregion_start, bitregion_end,
+		       mode, temp);
 
       return const0_rtx;
     }
@@ -7354,7 +7494,7 @@ expand_expr_real_2 (sepops ops, rtx targ
 						    (treeop0))
 				 * BITS_PER_UNIT),
 				(HOST_WIDE_INT) GET_MODE_BITSIZE (mode)),
-			   0, TYPE_MODE (valtype), treeop0,
+			   0, 0, 0, TYPE_MODE (valtype), treeop0,
 			   type, 0, false);
 	    }
 
Index: expr.h
===================================================================
--- expr.h	(revision 176280)
+++ expr.h	(working copy)
@@ -665,7 +665,10 @@ extern enum machine_mode
 mode_for_extraction (enum extraction_pattern, int);
 
 extern void store_bit_field (rtx, unsigned HOST_WIDE_INT,
-			     unsigned HOST_WIDE_INT, enum machine_mode, rtx);
+			     unsigned HOST_WIDE_INT,
+			     unsigned HOST_WIDE_INT,
+			     unsigned HOST_WIDE_INT,
+			     enum machine_mode, rtx);
 extern rtx extract_bit_field (rtx, unsigned HOST_WIDE_INT,
 			      unsigned HOST_WIDE_INT, int, bool, rtx,
 			      enum machine_mode, enum machine_mode);
Index: stor-layout.c
===================================================================
--- stor-layout.c	(revision 176280)
+++ stor-layout.c	(working copy)
@@ -2361,6 +2361,13 @@ fixup_unsigned_type (tree type)
 /* Find the best machine mode to use when referencing a bit field of length
    BITSIZE bits starting at BITPOS.
 
+   BITREGION_START is the bit position of the first bit in this
+   sequence of bit fields.  BITREGION_END is the bit position of the
+   last bit in this sequence.  If these two fields are non-zero, we
+   should restrict the memory access to a chunk of at most
+   BITREGION_END - BITREGION_START + 1 bits.  Otherwise, we are
+   allowed to touch any adjacent non bit-fields.
+
    The underlying object is known to be aligned to a boundary of ALIGN bits.
    If LARGEST_MODE is not VOIDmode, it means that we should not use a mode
    larger than LARGEST_MODE (usually SImode).
@@ -2378,11 +2385,21 @@ fixup_unsigned_type (tree type)
    decide which of the above modes should be used.  */
 
 enum machine_mode
-get_best_mode (int bitsize, int bitpos, unsigned int align,
+get_best_mode (int bitsize, int bitpos,
+	       unsigned HOST_WIDE_INT bitregion_start,
+	       unsigned HOST_WIDE_INT bitregion_end,
+	       unsigned int align,
 	       enum machine_mode largest_mode, int volatilep)
 {
   enum machine_mode mode;
   unsigned int unit = 0;
+  unsigned HOST_WIDE_INT maxbits;
+
+  /* If unset, no restriction.  */
+  if (!bitregion_end)
+    maxbits = MAX_FIXED_MODE_SIZE;
+  else
+    maxbits = (bitregion_end - bitregion_start) % align + 1;
 
   /* Find the narrowest integer mode that contains the bit field.  */
   for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
@@ -2419,6 +2436,7 @@ get_best_mode (int bitsize, int bitpos, 
 	      && bitpos / unit == (bitpos + bitsize - 1) / unit
 	      && unit <= BITS_PER_WORD
 	      && unit <= MIN (align, BIGGEST_ALIGNMENT)
+	      && unit <= maxbits
 	      && (largest_mode == VOIDmode
 		  || unit <= GET_MODE_BITSIZE (largest_mode)))
 	    wide_mode = tmode;
Index: calls.c
===================================================================
--- calls.c	(revision 176280)
+++ calls.c	(working copy)
@@ -924,8 +924,8 @@ store_unaligned_arguments_into_pseudos (
 	    emit_move_insn (reg, const0_rtx);
 
 	    bytes -= bitsize / BITS_PER_UNIT;
-	    store_bit_field (reg, bitsize, endian_correction, word_mode,
-			     word);
+	    store_bit_field (reg, bitsize, endian_correction, 0, 0,
+			     word_mode, word);
 	  }
       }
 }
Index: expmed.c
===================================================================
--- expmed.c	(revision 176280)
+++ expmed.c	(working copy)
@@ -47,9 +47,15 @@ struct target_expmed *this_target_expmed
 
 static void store_fixed_bit_field (rtx, unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT, rtx);
+				   unsigned HOST_WIDE_INT,
+				   unsigned HOST_WIDE_INT,
+				   unsigned HOST_WIDE_INT,
+				   rtx);
 static void store_split_bit_field (rtx, unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT, rtx);
+				   unsigned HOST_WIDE_INT,
+				   unsigned HOST_WIDE_INT,
+				   unsigned HOST_WIDE_INT,
+				   rtx);
 static rtx extract_fixed_bit_field (enum machine_mode, rtx,
 				    unsigned HOST_WIDE_INT,
 				    unsigned HOST_WIDE_INT,
@@ -333,7 +339,10 @@ mode_for_extraction (enum extraction_pat
 
 static bool
 store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
-		   unsigned HOST_WIDE_INT bitnum, enum machine_mode fieldmode,
+		   unsigned HOST_WIDE_INT bitnum,
+		   unsigned HOST_WIDE_INT bitregion_start,
+		   unsigned HOST_WIDE_INT bitregion_end,
+		   enum machine_mode fieldmode,
 		   rtx value, bool fallback_p)
 {
   unsigned int unit
@@ -455,6 +464,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 
   /* We may be accessing data outside the field, which means
      we can alias adjacent data.  */
+  /* ?? not always for C++0x memory model ?? */
   if (MEM_P (op0))
     {
       op0 = shallow_copy_rtx (op0);
@@ -547,7 +557,9 @@ store_bit_field_1 (rtx str_rtx, unsigned
 
 	  if (!store_bit_field_1 (op0, MIN (BITS_PER_WORD,
 					    bitsize - i * BITS_PER_WORD),
-				  bitnum + bit_offset, word_mode,
+				  bitnum + bit_offset,
+				  bitregion_start, bitregion_end,
+				  word_mode,
 				  value_word, fallback_p))
 	    {
 	      delete_insns_since (last);
@@ -710,6 +722,10 @@ store_bit_field_1 (rtx str_rtx, unsigned
   if (HAVE_insv && MEM_P (op0))
     {
       enum machine_mode bestmode;
+      unsigned HOST_WIDE_INT maxbits = MAX_FIXED_MODE_SIZE;
+
+      if (bitregion_end)
+	maxbits = bitregion_end - bitregion_start + 1;
 
       /* Get the mode to use for inserting into this field.  If OP0 is
 	 BLKmode, get the smallest mode consistent with the alignment. If
@@ -717,9 +733,12 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	 mode. Otherwise, use the smallest mode containing the field.  */
 
       if (GET_MODE (op0) == BLKmode
+	  || GET_MODE_BITSIZE (GET_MODE (op0)) > maxbits
 	  || (op_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (op_mode)))
-	bestmode = get_best_mode (bitsize, bitnum, MEM_ALIGN (op0),
+	bestmode = get_best_mode (bitsize, bitnum,
+				  bitregion_start, bitregion_end,
+				  MEM_ALIGN (op0),
 				  (op_mode == MAX_MACHINE_MODE
 				   ? VOIDmode : op_mode),
 				  MEM_VOLATILE_P (op0));
@@ -748,6 +767,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	     the unit.  */
 	  tempreg = copy_to_reg (xop0);
 	  if (store_bit_field_1 (tempreg, bitsize, xbitpos,
+				 bitregion_start, bitregion_end,
 				 fieldmode, orig_value, false))
 	    {
 	      emit_move_insn (xop0, tempreg);
@@ -760,21 +780,59 @@ store_bit_field_1 (rtx str_rtx, unsigned
   if (!fallback_p)
     return false;
 
-  store_fixed_bit_field (op0, offset, bitsize, bitpos, value);
+  store_fixed_bit_field (op0, offset, bitsize, bitpos,
+			 bitregion_start, bitregion_end, value);
   return true;
 }
 
 /* Generate code to store value from rtx VALUE
    into a bit-field within structure STR_RTX
    containing BITSIZE bits starting at bit BITNUM.
+
+   BITREGION_START is the bitpos of the first bitfield in this region.
+   BITREGION_END is the bitpos of the last bitfield in this region.
+   These two fields are 0 if the C++ memory model does not apply,
+   or we are not interested in keeping track of bitfield regions.
+
    FIELDMODE is the machine-mode of the FIELD_DECL node for this field.  */
 
 void
 store_bit_field (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
-		 unsigned HOST_WIDE_INT bitnum, enum machine_mode fieldmode,
+		 unsigned HOST_WIDE_INT bitnum,
+		 unsigned HOST_WIDE_INT bitregion_start,
+		 unsigned HOST_WIDE_INT bitregion_end,
+		 enum machine_mode fieldmode,
 		 rtx value)
 {
-  if (!store_bit_field_1 (str_rtx, bitsize, bitnum, fieldmode, value, true))
+  /* Under the C++0x memory model, we must not touch bits outside the
+     bit region.  Adjust the address to start at the beginning of the
+     bit region.  */
+  if (MEM_P (str_rtx)
+      && bitregion_start > 0)
+    {
+      enum machine_mode bestmode;
+      enum machine_mode op_mode;
+      unsigned HOST_WIDE_INT offset;
+
+      op_mode = mode_for_extraction (EP_insv, 3);
+      if (op_mode == MAX_MACHINE_MODE)
+	op_mode = VOIDmode;
+
+      offset = bitregion_start / BITS_PER_UNIT;
+      bitnum -= bitregion_start;
+      bitregion_end -= bitregion_start;
+      bitregion_start = 0;
+      bestmode = get_best_mode (bitsize, bitnum,
+				bitregion_start, bitregion_end,
+				MEM_ALIGN (str_rtx),
+				op_mode,
+				MEM_VOLATILE_P (str_rtx));
+      str_rtx = adjust_address (str_rtx, bestmode, offset);
+    }
+
+  if (!store_bit_field_1 (str_rtx, bitsize, bitnum,
+			  bitregion_start, bitregion_end,
+			  fieldmode, value, true))
     gcc_unreachable ();
 }
 \f
@@ -790,7 +848,10 @@ store_bit_field (rtx str_rtx, unsigned H
 static void
 store_fixed_bit_field (rtx op0, unsigned HOST_WIDE_INT offset,
 		       unsigned HOST_WIDE_INT bitsize,
-		       unsigned HOST_WIDE_INT bitpos, rtx value)
+		       unsigned HOST_WIDE_INT bitpos,
+		       unsigned HOST_WIDE_INT bitregion_start,
+		       unsigned HOST_WIDE_INT bitregion_end,
+		       rtx value)
 {
   enum machine_mode mode;
   unsigned int total_bits = BITS_PER_WORD;
@@ -811,12 +872,19 @@ store_fixed_bit_field (rtx op0, unsigned
       /* Special treatment for a bit field split across two registers.  */
       if (bitsize + bitpos > BITS_PER_WORD)
 	{
-	  store_split_bit_field (op0, bitsize, bitpos, value);
+	  store_split_bit_field (op0, bitsize, bitpos,
+				 bitregion_start, bitregion_end,
+				 value);
 	  return;
 	}
     }
   else
     {
+      unsigned HOST_WIDE_INT maxbits = MAX_FIXED_MODE_SIZE;
+
+      if (bitregion_end)
+	maxbits = bitregion_end - bitregion_start + 1;
+
       /* Get the proper mode to use for this field.  We want a mode that
 	 includes the entire field.  If such a mode would be larger than
 	 a word, we won't be doing the extraction the normal way.
@@ -829,10 +897,12 @@ store_fixed_bit_field (rtx op0, unsigned
 
       if (MEM_VOLATILE_P (op0)
           && GET_MODE_BITSIZE (GET_MODE (op0)) > 0
+	  && GET_MODE_BITSIZE (GET_MODE (op0)) <= maxbits
 	  && flag_strict_volatile_bitfields > 0)
 	mode = GET_MODE (op0);
       else
 	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
+			      bitregion_start, bitregion_end,
 			      MEM_ALIGN (op0), mode, MEM_VOLATILE_P (op0));
 
       if (mode == VOIDmode)
@@ -840,7 +910,7 @@ store_fixed_bit_field (rtx op0, unsigned
 	  /* The only way this should occur is if the field spans word
 	     boundaries.  */
 	  store_split_bit_field (op0, bitsize, bitpos + offset * BITS_PER_UNIT,
-				 value);
+				 bitregion_start, bitregion_end, value);
 	  return;
 	}
 
@@ -960,7 +1030,10 @@ store_fixed_bit_field (rtx op0, unsigned
 
 static void
 store_split_bit_field (rtx op0, unsigned HOST_WIDE_INT bitsize,
-		       unsigned HOST_WIDE_INT bitpos, rtx value)
+		       unsigned HOST_WIDE_INT bitpos,
+		       unsigned HOST_WIDE_INT bitregion_start,
+		       unsigned HOST_WIDE_INT bitregion_end,
+		       rtx value)
 {
   unsigned int unit;
   unsigned int bitsdone = 0;
@@ -1075,7 +1148,7 @@ store_split_bit_field (rtx op0, unsigned
 	 it is just an out-of-bounds access.  Ignore it.  */
       if (word != const0_rtx)
 	store_fixed_bit_field (word, offset * unit / BITS_PER_UNIT, thissize,
-			       thispos, part);
+			       thispos, bitregion_start, bitregion_end, part);
       bitsdone += thissize;
     }
 }
@@ -1515,7 +1588,7 @@ extract_bit_field_1 (rtx str_rtx, unsign
       if (GET_MODE (op0) == BLKmode
 	  || (ext_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (ext_mode)))
-	bestmode = get_best_mode (bitsize, bitnum, MEM_ALIGN (op0),
+	bestmode = get_best_mode (bitsize, bitnum, 0, 0, MEM_ALIGN (op0),
 				  (ext_mode == MAX_MACHINE_MODE
 				   ? VOIDmode : ext_mode),
 				  MEM_VOLATILE_P (op0));
@@ -1641,7 +1714,7 @@ extract_fixed_bit_field (enum machine_mo
 	    mode = tmode;
 	}
       else
-	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
+	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT, 0, 0,
 			      MEM_ALIGN (op0), word_mode, MEM_VOLATILE_P (op0));
 
       if (mode == VOIDmode)
Index: Makefile.in
===================================================================
--- Makefile.in	(revision 176280)
+++ Makefile.in	(working copy)
@@ -2908,7 +2908,7 @@ expr.o : expr.c $(CONFIG_H) $(SYSTEM_H) 
    reload.h langhooks.h intl.h $(TM_P_H) $(TARGET_H) \
    tree-iterator.h gt-expr.h $(MACHMODE_H) $(TIMEVAR_H) $(TREE_FLOW_H) \
    $(TREE_PASS_H) $(DF_H) $(DIAGNOSTIC_H) vecprim.h $(SSAEXPAND_H) \
-   $(COMMON_TARGET_H)
+   $(PARAMS_H) $(COMMON_TARGET_H)
 dojump.o : dojump.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(TREE_H) \
    $(FLAGS_H) $(FUNCTION_H) $(EXPR_H) $(OPTABS_H) $(INSN_ATTR_H) insn-config.h \
    langhooks.h $(GGC_H) gt-dojump.h vecprim.h $(BASIC_BLOCK_H) output.h
Index: stmt.c
===================================================================
--- stmt.c	(revision 176280)
+++ stmt.c	(working copy)
@@ -1759,7 +1759,8 @@ expand_return (tree retval)
 
 	  /* Use bitpos for the source extraction (left justified) and
 	     xbitpos for the destination store (right justified).  */
-	  store_bit_field (dst, bitsize, xbitpos % BITS_PER_WORD, word_mode,
+	  store_bit_field (dst, bitsize, xbitpos % BITS_PER_WORD,
+			   0, 0, word_mode,
 			   extract_bit_field (src, bitsize,
 					      bitpos % BITS_PER_WORD, 1, false,
 					      NULL_RTX, word_mode, word_mode));
Index: params.def
===================================================================
--- params.def	(revision 176280)
+++ params.def	(working copy)
@@ -902,6 +902,12 @@ DEFPARAM (PARAM_CASE_VALUES_THRESHOLD,
 	  "if 0, use the default for the machine",
           0, 0, 0)
 
+/* Data race flags for C++0x memory model compliance.  */
+DEFPARAM (PARAM_ALLOW_STORE_DATA_RACES,
+	  "allow-store-data-races",
+	  "Allow new data races on stores to be introduced",
+	  1, 0, 1)
+
 
 /*
 Local variables:

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-25 17:41                           ` Aldy Hernandez
@ 2011-07-26  5:28                             ` Jason Merrill
  2011-07-26 18:37                               ` Aldy Hernandez
  2011-07-26 20:05                               ` Aldy Hernandez
  2011-07-27 18:24                             ` H.J. Lu
  1 sibling, 2 replies; 81+ messages in thread
From: Jason Merrill @ 2011-07-26  5:28 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Jeff Law, gcc-patches, Jakub Jelinek

On 07/25/2011 10:07 AM, Aldy Hernandez wrote:
> I had changed this already to take into account aliasing, so if we get
> an INDIRECT_REF, ptr_deref_may_alias_global_p() returns true, and we
> proceed with the restriction:

Sounds good.  "global" includes malloc'd memory, right?  There don't 
seem to be any tests for that.

Speaking of tests, please put them in c-c++-common.

> +      bitnum -= bitregion_start;
> +      bitregion_end -= bitregion_start;
> +      bitregion_start = 0;

Why is this necessary/useful?

Jason

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-26 17:54                                 ` Jason Merrill
@ 2011-07-26 17:51                                   ` Aldy Hernandez
  2011-07-26 18:05                                     ` Jason Merrill
  0 siblings, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-07-26 17:51 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Jeff Law, gcc-patches, Jakub Jelinek


> I think the adjustment above is intended to match the adjustment of the
> address by bitregion_start/BITS_PER_UNIT, but the above seems to assume
> that bitregion_start%BITS_PER_UNIT == 0.

That was intentional.  bitregion_start always falls on a byte boundary, 
does it not?

struct {
	stuff;
	unsigned int b:3;
	unsigned int other_bits:22;
	other_stuff;
}

Does not "b" always start at a byte boundary?

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-26 18:37                               ` Aldy Hernandez
@ 2011-07-26 17:54                                 ` Jason Merrill
  2011-07-26 17:51                                   ` Aldy Hernandez
  0 siblings, 1 reply; 81+ messages in thread
From: Jason Merrill @ 2011-07-26 17:54 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Jeff Law, gcc-patches, Jakub Jelinek

On 07/26/2011 09:36 AM, Aldy Hernandez wrote:
>
>>> + bitnum -= bitregion_start;
>>> + bitregion_end -= bitregion_start;
>>> + bitregion_start = 0;
>>
>> Why is this necessary/useful?
>
> You mean, why am I resetting these values?  (Because the call to
> get_best_mode() following it needs the adjusted values.)  Or why am I
> adjusting the address to point to the beginning of the region?

I think the adjustment above is intended to match the adjustment of the 
address by bitregion_start/BITS_PER_UNIT, but the above seems to assume 
that bitregion_start%BITS_PER_UNIT == 0.

Jason

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-26 17:51                                   ` Aldy Hernandez
@ 2011-07-26 18:05                                     ` Jason Merrill
  2011-07-27 15:03                                       ` Richard Guenther
  0 siblings, 1 reply; 81+ messages in thread
From: Jason Merrill @ 2011-07-26 18:05 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Jeff Law, gcc-patches, Jakub Jelinek

On 07/26/2011 10:32 AM, Aldy Hernandez wrote:
>
>> I think the adjustment above is intended to match the adjustment of the
>> address by bitregion_start/BITS_PER_UNIT, but the above seems to assume
>> that bitregion_start%BITS_PER_UNIT == 0.
>
> That was intentional. bitregion_start always falls on a byte boundary,
> does it not?

Ah, yes, of course, it's bitnum that might not.  The code changes look 
good, then.

Jason

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-26  5:28                             ` Jason Merrill
@ 2011-07-26 18:37                               ` Aldy Hernandez
  2011-07-26 17:54                                 ` Jason Merrill
  2011-07-26 20:05                               ` Aldy Hernandez
  1 sibling, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-07-26 18:37 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Jeff Law, gcc-patches, Jakub Jelinek


>> + bitnum -= bitregion_start;
>> + bitregion_end -= bitregion_start;
>> + bitregion_start = 0;
>
> Why is this necessary/useful?

You mean, why am I resetting these values?  (Because the call to 
get_best_mode() following it needs the adjusted values.)  Or why am I 
adjusting the address to point to the beginning of the region?

A

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-26  5:28                             ` Jason Merrill
  2011-07-26 18:37                               ` Aldy Hernandez
@ 2011-07-26 20:05                               ` Aldy Hernandez
  1 sibling, 0 replies; 81+ messages in thread
From: Aldy Hernandez @ 2011-07-26 20:05 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Jeff Law, gcc-patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 406 bytes --]

On 07/25/11 18:55, Jason Merrill wrote:
> On 07/25/2011 10:07 AM, Aldy Hernandez wrote:
>> I had changed this already to take into account aliasing, so if we get
>> an INDIRECT_REF, ptr_deref_may_alias_global_p() returns true, and we
>> proceed with the restriction:
>
> Sounds good. "global" includes malloc'd memory, right? There don't seem
> to be any tests for that.

Is the attached test appropriate?

[-- Attachment #2: cxxbitfields-5.c --]
[-- Type: text/plain, Size: 403 bytes --]

/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
/* { dg-options "-O2 --param allow-store-data-races=0" } */

#include <stdlib.h>

struct bits
{
  char a;
  int b:7;
  int c:9;
  unsigned char d;
} x;

struct bits *p;

static void allocit()
{
  p = (struct bits *) malloc (sizeof (struct bits));
}

void foo()
{
  allocit();
  p -> c = 55;
}

/* { dg-final { scan-assembler-not "movl\t\\(" } } */

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-26 18:05                                     ` Jason Merrill
@ 2011-07-27 15:03                                       ` Richard Guenther
  2011-07-27 15:12                                         ` Richard Guenther
                                                           ` (2 more replies)
  0 siblings, 3 replies; 81+ messages in thread
From: Richard Guenther @ 2011-07-27 15:03 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Aldy Hernandez, Jeff Law, gcc-patches, Jakub Jelinek

On Tue, Jul 26, 2011 at 7:38 PM, Jason Merrill <jason@redhat.com> wrote:
> On 07/26/2011 10:32 AM, Aldy Hernandez wrote:
>>
>>> I think the adjustment above is intended to match the adjustment of the
>>> address by bitregion_start/BITS_PER_UNIT, but the above seems to assume
>>> that bitregion_start%BITS_PER_UNIT == 0.
>>
>> That was intentional. bitregion_start always falls on a byte boundary,
>> does it not?
>
> Ah, yes, of course, it's bitnum that might not.  The code changes look good,
> then.

Looks like this was an approval ...

Anyway, I don't think a --param is appropriate to control whether
to allow store data-races to be created.  Why not use a regular option instead?

I believe that any after-the-fact attempt to recover bitfield boundaries is
going to fail unless you preserve more information during bitfield layout.

Consider

struct {
  char : 8;
  char : 0;
  char : 8;
};

where the : 0 isn't preserved in any way and you can't distinguish
it from struct { char : 8; char : 8; }.
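
A minimal standalone check of this, assuming an ABI such as x86-64 SysV
where a char : 0 after a complete char-sized unit forces no extra
alignment, so the two layouts come out identical and the : 0 leaves no
trace in the laid-out type:

#include <assert.h>
#include <stddef.h>

struct with_zero    { char : 8; char : 0; char : 8; char tail; };
struct without_zero { char : 8;           char : 8; char tail; };

int main (void)
{
  /* If both hold, nothing observable about the laid-out type
     records the : 0.  */
  assert (sizeof (struct with_zero) == sizeof (struct without_zero));
  assert (offsetof (struct with_zero, tail)
          == offsetof (struct without_zero, tail));
  return 0;
}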

Richard.

> Jason
>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-27 15:03                                       ` Richard Guenther
@ 2011-07-27 15:12                                         ` Richard Guenther
  2011-07-27 15:53                                           ` Richard Guenther
  2011-07-27 18:22                                           ` Aldy Hernandez
  2011-07-27 17:29                                         ` Aldy Hernandez
  2011-07-28 22:26                                         ` Aldy Hernandez
  2 siblings, 2 replies; 81+ messages in thread
From: Richard Guenther @ 2011-07-27 15:12 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Aldy Hernandez, Jeff Law, gcc-patches, Jakub Jelinek

On Wed, Jul 27, 2011 at 4:52 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Tue, Jul 26, 2011 at 7:38 PM, Jason Merrill <jason@redhat.com> wrote:
>> On 07/26/2011 10:32 AM, Aldy Hernandez wrote:
>>>
>>>> I think the adjustment above is intended to match the adjustment of the
>>>> address by bitregion_start/BITS_PER_UNIT, but the above seems to assume
>>>> that bitregion_start%BITS_PER_UNIT == 0.
>>>
>>> That was intentional. bitregion_start always falls on a byte boundary,
>>> does it not?
>>
>> Ah, yes, of course, it's bitnum that might not.  The code changes look good,
>> then.
>
> Looks like this was an approval ...
>
> Anyway, I don't think a --param is appropriate to control a flag whether
> to allow store data-races to be created.  Why not use a regular option instead?
>
> I believe that any after-the-fact attempt to recover bitfield boundaries is
> going to fail unless you preserve more information during bitfield layout.
>
> Consider
>
> struct {
>  char : 8;
>  char : 0;
>  char : 8;
> };
>
> where the : 0 isn't preserved in any way and you can't distinguish
> it from struct { char : 8; char : 8; }.

Oh, and

   INNERDECL is the actual object being referenced.

      || (!ptr_deref_may_alias_global_p (innerdecl)

is surely not what you want.  That asks if *innerdecl is global memory.
I suppose you want is_global_var (innerdecl)?  But with

          && (DECL_THREAD_LOCAL_P (innerdecl)
              || !TREE_STATIC (innerdecl))))

you can simply skip this test.  Or what was it supposed to do?

Richard.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-27 15:12                                         ` Richard Guenther
@ 2011-07-27 15:53                                           ` Richard Guenther
  2011-07-28 13:00                                             ` Richard Guenther
  2011-07-28 19:42                                             ` Aldy Hernandez
  2011-07-27 18:22                                           ` Aldy Hernandez
  1 sibling, 2 replies; 81+ messages in thread
From: Richard Guenther @ 2011-07-27 15:53 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Aldy Hernandez, Jeff Law, gcc-patches, Jakub Jelinek

On Wed, Jul 27, 2011 at 4:56 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Wed, Jul 27, 2011 at 4:52 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Tue, Jul 26, 2011 at 7:38 PM, Jason Merrill <jason@redhat.com> wrote:
>>> On 07/26/2011 10:32 AM, Aldy Hernandez wrote:
>>>>
>>>>> I think the adjustment above is intended to match the adjustment of the
>>>>> address by bitregion_start/BITS_PER_UNIT, but the above seems to assume
>>>>> that bitregion_start%BITS_PER_UNIT == 0.
>>>>
>>>> That was intentional. bitregion_start always falls on a byte boundary,
>>>> does it not?
>>>
>>> Ah, yes, of course, it's bitnum that might not.  The code changes look good,
>>> then.
>>
>> Looks like this was an approval ...
>>
>> Anyway, I don't think a --param is appropriate to control a flag whether
>> to allow store data-races to be created.  Why not use a regular option instead?
>>
>> I believe that any after-the-fact attempt to recover bitfield boundaries is
>> going to fail unless you preserve more information during bitfield layout.
>>
>> Consider
>>
>> struct {
>>  char : 8;
>>  char : 0;
>>  char : 8;
>> };
>>
>> where the : 0 isn't preserved in any way and you can't distinguish
>> it from struct { char : 8; char : 8; }.
>
> Oh, and
>
>   INNERDECL is the actual object being referenced.
>
>      || (!ptr_deref_may_alias_global_p (innerdecl)
>
> is surely not what you want.  That asks if *innerdecl is global memory.
> I suppose you want is_global_var (innerdecl)?  But with
>
>          && (DECL_THREAD_LOCAL_P (innerdecl)
>              || !TREE_STATIC (innerdecl))))
>
> you can simply skip this test.  Or what was it supposed to do?

And

      t = build3 (COMPONENT_REF, TREE_TYPE (exp),
                  unshare_expr (TREE_OPERAND (exp, 0)),
                  fld, NULL_TREE);
      get_inner_reference (t, &bitsize, &bitpos, &offset,
                           &mode, &unsignedp, &volatilep, true);

for each field of a struct type is of course ... gross!  In fact you already
have the FIELD_DECL in the single caller!  Yes I know there is not
enough information preserved by bitfield layout - see my previous reply.

      if (TREE_CODE (to) == COMPONENT_REF
          && DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1)))
        get_bit_range (&bitregion_start, &bitregion_end,
                       to, tem, bitpos, bitsize);

and shouldn't this test DECL_BIT_FIELD instead of DECL_BIT_FIELD_TYPE?

Richard.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-27 15:03                                       ` Richard Guenther
  2011-07-27 15:12                                         ` Richard Guenther
@ 2011-07-27 17:29                                         ` Aldy Hernandez
  2011-07-27 17:57                                           ` Andrew MacLeod
  2011-07-28 22:26                                         ` Aldy Hernandez
  2 siblings, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-07-27 17:29 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jason Merrill, Jeff Law, gcc-patches, Jakub Jelinek


> Anyway, I don't think a --param is appropriate to control a flag whether
> to allow store data-races to be created.  Why not use a regular option instead?

I don't care either way.  What -foption-name do you suggest?

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-27 17:29                                         ` Aldy Hernandez
@ 2011-07-27 17:57                                           ` Andrew MacLeod
  2011-07-27 22:27                                             ` Joseph S. Myers
  2011-07-28  8:58                                             ` Richard Guenther
  0 siblings, 2 replies; 81+ messages in thread
From: Andrew MacLeod @ 2011-07-27 17:57 UTC (permalink / raw)
  To: Aldy Hernandez
  Cc: Richard Guenther, Jason Merrill, Jeff Law, gcc-patches, Jakub Jelinek

On 07/27/2011 01:08 PM, Aldy Hernandez wrote:
>
>> Anyway, I don't think a --param is appropriate to control a flag whether
>> to allow store data-races to be created.  Why not use a regular 
>> option instead?
>
> I don't care either way.  What -foption-name do you suggest?
Well, I suggested a -f option set last year when this was laid out, and 
Ian suggested that it should be a --param

http://gcc.gnu.org/ml/gcc/2010-05/msg00118.html

"I don't agree with your proposed command line options.  They seem fine
for internal use, but I think very very few users would know when or
whether they should use -fno-data-race-stores.  I think you should
downgrade those options to a --param value, and think about a
multi-layered -fmemory-model option. "

Andrew

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-27 15:12                                         ` Richard Guenther
  2011-07-27 15:53                                           ` Richard Guenther
@ 2011-07-27 18:22                                           ` Aldy Hernandez
  2011-07-28  8:52                                             ` Richard Guenther
  1 sibling, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-07-27 18:22 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jason Merrill, Jeff Law, gcc-patches, Jakub Jelinek


> Oh, and
>
>     INNERDECL is the actual object being referenced.
>
>        || (!ptr_deref_may_alias_global_p (innerdecl)
>
> is surely not what you want.  That asks if *innerdecl is global memory.
> I suppose you want is_global_var (innerdecl)?  But with
>
>            &&  (DECL_THREAD_LOCAL_P (innerdecl)
>                || !TREE_STATIC (innerdecl))))
>
> you can simply skip this test.  Or what was it supposed to do?

The test was there because neither DECL_THREAD_LOCAL_P nor is_global_var 
can handle MEM_REF's.

Would you prefer an explicit check for a *_DECL?

    if (ALLOW_STORE_DATA_RACES
-      || (!ptr_deref_may_alias_global_p (innerdecl)
+      || (DECL_P (innerdecl)
           && (DECL_THREAD_LOCAL_P (innerdecl)
               || !TREE_STATIC (innerdecl))))

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-25 17:41                           ` Aldy Hernandez
  2011-07-26  5:28                             ` Jason Merrill
@ 2011-07-27 18:24                             ` H.J. Lu
  2011-07-27 20:39                               ` Aldy Hernandez
  1 sibling, 1 reply; 81+ messages in thread
From: H.J. Lu @ 2011-07-27 18:24 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Jason Merrill, Jeff Law, gcc-patches, Jakub Jelinek

On Mon, Jul 25, 2011 at 10:07 AM, Aldy Hernandez <aldyh@redhat.com> wrote:
> On 07/22/11 13:44, Jason Merrill wrote:
>>
>> On 07/18/2011 08:02 AM, Aldy Hernandez wrote:
>>>
>>> + /* If other threads can't see this value, no need to restrict
>>> stores. */
>>> + if (ALLOW_STORE_DATA_RACES
>>> + || !DECL_THREAD_VISIBLE_P (innerdecl))
>>> + {
>>> + *bitstart = *bitend = 0;
>>> + return;
>>> + }
>>
>> What if get_inner_reference returns something that isn't a DECL, such as
>> an INDIRECT_REF?
>
> I had changed this already to take into account aliasing, so if we get an
> INDIRECT_REF, ptr_deref_may_alias_global_p() returns true, and we proceed
> with the restriction:
>
> +  /* If other threads can't see this value, no need to restrict stores.  */
> +  if (ALLOW_STORE_DATA_RACES
> +      || (!ptr_deref_may_alias_global_p (innerdecl)
> +         && (DECL_THREAD_LOCAL_P (innerdecl)
> +             || !TREE_STATIC (innerdecl))))
>
>
>>> + if (fld)
>>> + {
>>> + /* We found the end of the bit field sequence. Include the
>>> + padding up to the next field and be done. */
>>> + *bitend = bitpos - 1;
>>> + }
>>
>> bitpos is the position of "field", and it seems to me we want the
>> position of "fld" here.
>
> Notice that bitpos gets recalculated at each iteration by
> get_inner_reference, so bitpos is actually the position of fld.
>
>>> + /* If unset, no restriction. */
>>> + if (!bitregion_end)
>>> + maxbits = 0;
>>> + else
>>> + maxbits = (bitregion_end - bitregion_start) % align;
>>
>> Maybe use MAX_FIXED_MODE_SIZE so you don't have to test it against 0?
>
> Fixed everywhere.
>
>>> + if (!bitregion_end)
>>> + maxbits = 0;
>>> + else if (1||bitpos + offset * BITS_PER_UNIT < bitregion_start)
>>> + maxbits = bitregion_end - bitregion_start;
>>> + else
>>> + maxbits = bitregion_end - (bitpos + offset * BITS_PER_UNIT) + 1;
>>
>> I assume the 1|| was there for debugging?
>
> Fixed, plus I adjusted the calculation of maxbits everywhere because I found
> an off-by-one error.
>
> I have also overhauled store_bit_field() to adjust the address to point
> to the beginning of the bit region.  This fixed a myriad of corner
> cases pointed out by a test Hans Boehm was kind enough to provide.
>
> I have added more tests.
>
> How does this look?  (Pending tests.)
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49875

-- 
H.J.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-27 18:24                             ` H.J. Lu
@ 2011-07-27 20:39                               ` Aldy Hernandez
  2011-07-27 20:54                                 ` Jakub Jelinek
  0 siblings, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-07-27 20:39 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Jason Merrill, Jeff Law, gcc-patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 219 bytes --]


> This caused:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49875

The assembler sequence on ia32 was a bit different.

H.J.  Can you try this on your end?  If it fixes the problem, I will 
commit as obvious.

Aldy

[-- Attachment #2: curr --]
[-- Type: text/plain, Size: 481 bytes --]

	PR middle-end/49875
	* c-c++-common/cxxbitfields-4.c: Check for smaller than long
	moves.

Index: c-c++-common/cxxbitfields-4.c
===================================================================
--- c-c++-common/cxxbitfields-4.c	(revision 176824)
+++ c-c++-common/cxxbitfields-4.c	(working copy)
@@ -15,4 +15,4 @@ void update_c(struct bits *p, int val) 
     p -> c = val;
 }
 
-/* { dg-final { scan-assembler-not "movl" } } */
+/* { dg-final { scan-assembler "mov\[bw\]" } } */

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-27 20:39                               ` Aldy Hernandez
@ 2011-07-27 20:54                                 ` Jakub Jelinek
  2011-07-27 21:00                                   ` Aldy Hernandez
  0 siblings, 1 reply; 81+ messages in thread
From: Jakub Jelinek @ 2011-07-27 20:54 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: H.J. Lu, Jason Merrill, Jeff Law, gcc-patches

On Wed, Jul 27, 2011 at 01:51:04PM -0500, Aldy Hernandez wrote:
> >This caused:
> >
> >http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49875
> 
> The assembler sequence on ia32 was a bit different.
> 
> H.J.  Can you try this on your end?  If it fixes the problem, I will
> commit as obvious.

You could test it yourself on x86_64-linux too with
make check -k RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} dg.exp=cxxbit*'

> 	PR middle-end/49875
> 	* c-c++-common/cxxbitfields-4.c: Check for smaller than long
> 	moves.

	Jakub

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-27 20:54                                 ` Jakub Jelinek
@ 2011-07-27 21:00                                   ` Aldy Hernandez
  0 siblings, 0 replies; 81+ messages in thread
From: Aldy Hernandez @ 2011-07-27 21:00 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: H.J. Lu, Jason Merrill, Jeff Law, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 483 bytes --]

On 07/27/11 13:55, Jakub Jelinek wrote:
> On Wed, Jul 27, 2011 at 01:51:04PM -0500, Aldy Hernandez wrote:
>>> This caused:
>>>
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49875
>>
>> The assembler sequence on ia32 was a bit different.
>>
>> H.J.  Can you try this on your end?  If it fixes the problem, I will
>> commit as obvious.
>
> You could test it yourself on x86_64-linux too with
> make check -k RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} dg.exp=cxxbit*'

Committed.

[-- Attachment #2: curr --]
[-- Type: text/plain, Size: 883 bytes --]

	PR middle-end/49875
	* c-c++-common/cxxbitfields-4.c: Check for smaller than long
	moves.
	* c-c++-common/cxxbitfields-5.c: Same.

Index: c-c++-common/cxxbitfields-4.c
===================================================================
--- c-c++-common/cxxbitfields-4.c	(revision 176824)
+++ c-c++-common/cxxbitfields-4.c	(working copy)
@@ -15,4 +15,4 @@ void update_c(struct bits *p, int val) 
     p -> c = val;
 }
 
-/* { dg-final { scan-assembler-not "movl" } } */
+/* { dg-final { scan-assembler "mov\[bw\]" } } */
Index: c-c++-common/cxxbitfields-5.c
===================================================================
--- c-c++-common/cxxbitfields-5.c	(revision 176824)
+++ c-c++-common/cxxbitfields-5.c	(working copy)
@@ -26,4 +26,4 @@ void foo()
   p -> c = 55;
 }
 
-/* { dg-final { scan-assembler-not "movl\t\\(" } } */
+/* { dg-final { scan-assembler "mov\[bw\]" } } */

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-27 17:57                                           ` Andrew MacLeod
@ 2011-07-27 22:27                                             ` Joseph S. Myers
  2011-07-28  8:58                                             ` Richard Guenther
  1 sibling, 0 replies; 81+ messages in thread
From: Joseph S. Myers @ 2011-07-27 22:27 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Aldy Hernandez, Richard Guenther, Jason Merrill, Jeff Law,
	gcc-patches, Jakub Jelinek

On Wed, 27 Jul 2011, Andrew MacLeod wrote:

> On 07/27/2011 01:08 PM, Aldy Hernandez wrote:
> > 
> > > Anyway, I don't think a --param is appropriate to control a flag whether
> > > to allow store data-races to be created.  Why not use a regular option
> > > instead?
> > 
> > I don't care either way.  What -foption-name do you suggest?
> Well, I suggested a -f option set last year when this was laid out, and Ian
> suggested that it should be a --param
> 
> http://gcc.gnu.org/ml/gcc/2010-05/msg00118.html
> 
> "I don't agree with your proposed command line options.  They seem fine
> for internal use, but I think very very few users would know when or
> whether they should use -fno-data-race-stores.  I think you should
> downgrade those options to a --param value, and think about a
> multi-layered -fmemory-model option. "

The documentation says --param is for "various constants to control the 
amount of optimization that is done".  I don't think it should be used for 
anything that affects the semantics of the program; I think -f options are 
what's appropriate here (with appropriate warnings in the documentation if 
most of the options should not generally be used directly by users).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-27 18:22                                           ` Aldy Hernandez
@ 2011-07-28  8:52                                             ` Richard Guenther
  2011-07-29 12:05                                               ` Aldy Hernandez
  0 siblings, 1 reply; 81+ messages in thread
From: Richard Guenther @ 2011-07-28  8:52 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Jason Merrill, Jeff Law, gcc-patches, Jakub Jelinek

On Wed, Jul 27, 2011 at 7:36 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>> Oh, and
>>
>>    INNERDECL is the actual object being referenced.
>>
>>       || (!ptr_deref_may_alias_global_p (innerdecl)
>>
>> is surely not what you want.  That asks if *innerdecl is global memory.
>> I suppose you want is_global_var (innerdecl)?  But with
>>
>>           &&  (DECL_THREAD_LOCAL_P (innerdecl)
>>               || !TREE_STATIC (innerdecl))))
>>
>> you can simply skip this test.  Or what was it supposed to do?
>
> The test was there because neither DECL_THREAD_LOCAL_P nor is_global_var can
> handle MEM_REF's.

Ok, in that case you want

  (TREE_CODE (innerdecl) == MEM_REF || TREE_CODE (innerdecl) == TARGET_MEM_REF)
  && !ptr_deref_may_alias_global_p (TREE_OPERAND (innerdecl, 0)))

which gets you at the actual pointer.

> Would you prefer an explicit check for a *_DECL?
>
>   if (ALLOW_STORE_DATA_RACES
> -      || (!ptr_deref_may_alias_global_p (innerdecl)
> +      || (DECL_P (innerdecl)
>          && (DECL_THREAD_LOCAL_P (innerdecl)
>              || !TREE_STATIC (innerdecl))))

Yes.  Together with the above it looks then optimal.

Richard.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-27 17:57                                           ` Andrew MacLeod
  2011-07-27 22:27                                             ` Joseph S. Myers
@ 2011-07-28  8:58                                             ` Richard Guenther
  1 sibling, 0 replies; 81+ messages in thread
From: Richard Guenther @ 2011-07-28  8:58 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Aldy Hernandez, Jason Merrill, Jeff Law, gcc-patches, Jakub Jelinek

On Wed, Jul 27, 2011 at 7:19 PM, Andrew MacLeod <amacleod@redhat.com> wrote:
> On 07/27/2011 01:08 PM, Aldy Hernandez wrote:
>>
>>> Anyway, I don't think a --param is appropriate to control a flag whether
>>> to allow store data-races to be created.  Why not use a regular option
>>> instead?
>>
>> I don't care either way.  What -foption-name do you suggest?
>
> Well, I suggested a -f option set last year when this was laid out, and Ian
> suggested that it should be a --param
>
> http://gcc.gnu.org/ml/gcc/2010-05/msg00118.html
>
> "I don't agree with your proposed command line options.  They seem fine
> for internal use, but I think very very few users would know when or
> whether they should use -fno-data-race-stores.  I think you should
> downgrade those options to a --param value, and think about a
> multi-layered -fmemory-model option. "

Hm, ok.  I suppose we can revisit this when implementing such a
-fmemory-model option then.  --params we can at least freely remove between releases.

Richard.

> Andrew
>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-27 15:53                                           ` Richard Guenther
@ 2011-07-28 13:00                                             ` Richard Guenther
  2011-07-29  2:58                                               ` Jason Merrill
                                                                 ` (2 more replies)
  2011-07-28 19:42                                             ` Aldy Hernandez
  1 sibling, 3 replies; 81+ messages in thread
From: Richard Guenther @ 2011-07-28 13:00 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Aldy Hernandez, Jeff Law, gcc-patches, Jakub Jelinek

On Wed, Jul 27, 2011 at 5:03 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Wed, Jul 27, 2011 at 4:56 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Wed, Jul 27, 2011 at 4:52 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Tue, Jul 26, 2011 at 7:38 PM, Jason Merrill <jason@redhat.com> wrote:
>>>> On 07/26/2011 10:32 AM, Aldy Hernandez wrote:
>>>>>
>>>>>> I think the adjustment above is intended to match the adjustment of the
>>>>>> address by bitregion_start/BITS_PER_UNIT, but the above seems to assume
>>>>>> that bitregion_start%BITS_PER_UNIT == 0.
>>>>>
>>>>> That was intentional. bitregion_start always falls on a byte boundary,
>>>>> does it not?
>>>>
>>>> Ah, yes, of course, it's bitnum that might not.  The code changes look good,
>>>> then.
>>>
>>> Looks like this was an approval ...
>>>
>>> Anyway, I don't think a --param is appropriate to control a flag whether
>>> to allow store data-races to be created.  Why not use a regular option instead?
>>>
>>> I believe that any after-the-fact attempt to recover bitfield boundaries is
>>> going to fail unless you preserve more information during bitfield layout.
>>>
>>> Consider
>>>
>>> struct {
>>>  char : 8;
>>>  char : 0;
>>>  char : 8;
>>> };
>>>
>>> where the : 0 isn't preserved in any way and you can't distinguish
>>> it from struct { char : 8; char : 8; }.
>>
>> Oh, and
>>
>>   INNERDECL is the actual object being referenced.
>>
>>      || (!ptr_deref_may_alias_global_p (innerdecl)
>>
>> is surely not what you want.  That asks if *innerdecl is global memory.
>> I suppose you want is_global_var (innerdecl)?  But with
>>
>>          && (DECL_THREAD_LOCAL_P (innerdecl)
>>              || !TREE_STATIC (innerdecl))))
>>
>> you can simply skip this test.  Or what was it supposed to do?
>
> And
>
>      t = build3 (COMPONENT_REF, TREE_TYPE (exp),
>                  unshare_expr (TREE_OPERAND (exp, 0)),
>                  fld, NULL_TREE);
>      get_inner_reference (t, &bitsize, &bitpos, &offset,
>                           &mode, &unsignedp, &volatilep, true);
>
> for each field of a struct type is of course ... gross!  In fact you already
> have the FIELD_DECL in the single caller!  Yes I know there is not
> enough information preserved by bitfield layout - see my previous reply.

Looking at the C++ memory model, what you need is indeed simple enough
to recover here.  Still this loop does quadratic work for a struct with
N bitfield members and a function which stores into all of them.
And that with a big constant factor as you build a component-ref
and even unshare trees (which isn't necessary here anyway).  In fact
you could easily manually keep track of bitpos when walking adjacent
bitfield members.  An initial call to get_inner_reference on
TREE_OPERAND (exp, 0) would give you the starting position of the record.

That would still be quadratic of course.
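
Something along these lines is what I have in mind - an untested sketch
against the internal field-decl accessors, with a made-up helper name.
It deliberately leaves out the DECL_FIELD_OFFSET byte part of the
position and the trailing-padding computation, both of which a real
version has to handle, and FIELD is assumed to satisfy
DECL_BIT_FIELD_TYPE:

static void
find_bitfield_group (tree record_type, tree field,
                     unsigned HOST_WIDE_INT *group_start,
                     unsigned HOST_WIDE_INT *group_end)
{
  tree fld;
  unsigned HOST_WIDE_INT start = 0;
  bool in_group = false;

  /* One linear walk, tracking the running bit position by hand
     instead of building a COMPONENT_REF and calling
     get_inner_reference for every member.  */
  for (fld = TYPE_FIELDS (record_type); fld; fld = DECL_CHAIN (fld))
    {
      unsigned HOST_WIDE_INT pos;

      if (TREE_CODE (fld) != FIELD_DECL)
        continue;

      pos = TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (fld));

      if (DECL_BIT_FIELD_TYPE (fld))
        {
          if (!in_group)
            {
              /* First member of a new bitfield group.  */
              start = pos;
              in_group = true;
            }
        }
      else
        in_group = false;

      if (fld == field)
        {
          *group_start = start;
          /* Last bit of the member itself; extending over the
             padding up to the next member is omitted here.  */
          *group_end = pos + TREE_INT_CST_LOW (DECL_SIZE (fld)) - 1;
          return;
        }
    }
}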

For bitfield lowering I'd like to preserve a way to get from a field-decl to
the first field-decl of a group of bitfield members that occupy an aligned
amount of storage (as place_field assigns it).  That wouldn't necessarily
match the first bitfield field in the C++ bitfield group sense but would
probably be sensible enough for conforming accesses (and you'd only
need to search forward from that first field looking for a zero-size
field).  Now, the question is of course what to do for DECL_PACKED
fields (I suppose, simply ignore the C++ memory model as C++ doesn't
have a notion of packed or specially (mis-)aligned structs or bitfields).

Richard.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-27 15:53                                           ` Richard Guenther
  2011-07-28 13:00                                             ` Richard Guenther
@ 2011-07-28 19:42                                             ` Aldy Hernandez
  1 sibling, 0 replies; 81+ messages in thread
From: Aldy Hernandez @ 2011-07-28 19:42 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jason Merrill, Jeff Law, gcc-patches, Jakub Jelinek


>        if (TREE_CODE (to) == COMPONENT_REF
>            &&  DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1)))
>          get_bit_range (&bitregion_start,&bitregion_end,
>                         to, tem, bitpos, bitsize);
>
> and shouldn't this test DECL_BIT_FIELD instead of DECL_BIT_FIELD_TYPE?

As I mentioned here:

http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01416.html

I am using DECL_BIT_FIELD_TYPE instead of DECL_BIT_FIELD to determine if 
a DECL is a bit field because DECL_BIT_FIELD is not set for bit fields 
with a mode-sized number of bits (32 bits, 16 bits, etc.).
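
For instance (an illustration only, field names made up):

/* For a mode-sized bitfield like <b>, layout may clear DECL_BIT_FIELD
   once the field can be accessed as a plain SImode integer, but
   DECL_BIT_FIELD_TYPE still records that it was declared as a
   bitfield, which is what matters here.  */
struct s
{
  unsigned int b : 32;  /* DECL_BIT_FIELD may be 0;
                           DECL_BIT_FIELD_TYPE is unsigned int.  */
  unsigned int c;       /* Never a bitfield:
                           DECL_BIT_FIELD_TYPE is NULL_TREE.  */
};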

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-29 12:05                                               ` Aldy Hernandez
@ 2011-07-28 19:58                                                 ` Richard Guenther
  0 siblings, 0 replies; 81+ messages in thread
From: Richard Guenther @ 2011-07-28 19:58 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Jason Merrill, Jeff Law, gcc-patches, Jakub Jelinek

On Thu, Jul 28, 2011 at 9:12 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>> Yes.  Together with the above it looks then optimal.
>
> Attached patch tested on x86-64 Linux.
>
> OK for mainline?

Ok with the || moved to the next line as per coding-standards.

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-27 15:03                                       ` Richard Guenther
  2011-07-27 15:12                                         ` Richard Guenther
  2011-07-27 17:29                                         ` Aldy Hernandez
@ 2011-07-28 22:26                                         ` Aldy Hernandez
  2 siblings, 0 replies; 81+ messages in thread
From: Aldy Hernandez @ 2011-07-28 22:26 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jason Merrill, Jeff Law, gcc-patches, Jakub Jelinek


> I believe that any after-the-fact attempt to recover bitfield boundaries is
> going to fail unless you preserve more information during bitfield layout.
>
> Consider
>
> struct {
>    char : 8;
>    char : 0;
>    char : 8;
> };
>
> where the : 0 isn't preserved in any way and you can't distinguish
> it from struct { char : 8; char : 8; }.

Huh?  In my tests the :0 is preserved; it just doesn't have a DECL_NAME.

(gdb) p fld
$41 = (tree) 0x7ffff7778130
(gdb) pt
  <field_decl 0x7ffff7778130 D.1593
...

I have tried the following scenario, and we calculate the beginning of 
the bit region correctly (bit 32).

struct bits
{
   char a;
   int b:7;
   int :0;		<-- bitregion start
   int c:9;		<-- bitregion start
   unsigned char d;
} *p;

void foo() { p -> c = 55; }

Am I misunderstanding?  Why do you suggest we need to preserve more 
information during bitfield layout?

FWIW, I should add a zero-length bit test.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-28 13:00                                             ` Richard Guenther
@ 2011-07-29  2:58                                               ` Jason Merrill
  2011-07-29 12:02                                               ` Aldy Hernandez
  2011-08-05 17:28                                               ` Aldy Hernandez
  2 siblings, 0 replies; 81+ messages in thread
From: Jason Merrill @ 2011-07-29  2:58 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Aldy Hernandez, Jeff Law, gcc-patches, Jakub Jelinek

On 07/28/2011 04:40 AM, Richard Guenther wrote:
> field).  Now, the question is of course what to do for DECL_PACKED
> fields (I suppose, simply ignore the C++ memory model as C++ doesn't
> have a notion of packed or specially (mis-)aligned structs or bitfields).

I think treat them as bitfields for this purpose.

Jason

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-29 12:02                                               ` Aldy Hernandez
@ 2011-07-29 11:00                                                 ` Richard Guenther
  2011-08-01 13:51                                                   ` Richard Guenther
  0 siblings, 1 reply; 81+ messages in thread
From: Richard Guenther @ 2011-07-29 11:00 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Jason Merrill, Jeff Law, gcc-patches, Jakub Jelinek

On Fri, Jul 29, 2011 at 4:12 AM, Aldy Hernandez <aldyh@redhat.com> wrote:
> On 07/28/11 06:40, Richard Guenther wrote:
>
>> Looking at the C++ memory model what you need is indeed simple enough
>> to recover here.  Still this loop does quadratic work for a struct with
>> N bitfield members and a function which stores into all of them.
>> And that with a big constant factor as you build a component-ref
>> and even unshare trees (which isn't necessary here anyway).  In fact
>> you could easily manually keep track of bitpos when walking adjacent
>> bitfield members.  An initial call to get_inner_reference on
>> TREE_OPERAND (exp, 0) would give you the starting position of the record.
>>
>> That would still be quadratic of course.
>
> Actually, we don't need to call get_inner_reference at all.  It seems
> DECL_FIELD_BIT_OFFSET has all the information we need.
>
> How about we simplify things further as in the attached patch?
>
> Tested on x86-64 Linux.
>
> OK for mainline?

Well ... byte pieces of the offset can be in the tree offset
(DECL_FIELD_OFFSET).  Only up to DECL_OFFSET_ALIGN bits
are tracked in DECL_FIELD_BIT_OFFSET (and DECL_FIELD_OFFSET
can be a non-constant - at least for Ada, not sure about C++).
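
For instance (an illustration only, assuming the usual normalization
where the byte part of a field's position migrates into
DECL_FIELD_OFFSET):

/* The full bit position of <bf> is
     DECL_FIELD_OFFSET (bf) * BITS_PER_UNIT + DECL_FIELD_BIT_OFFSET (bf)
   and for a field this far in, the byte part is carried by
   DECL_FIELD_OFFSET, so reading DECL_FIELD_BIT_OFFSET alone is
   not enough.  */
struct big
{
  char pad[1 << 20];
  int bf : 3;
};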

But - can you please expand a bit on the desired semantics of
get_bit_range?  Especially, relative to what are *bitstart / *bitend
supposed to be?  Why do you pass in bitpos and bitsize - they
seem to be used as local variables only.  Why is the check for
thread-local storage in this function and not in the caller (and
what's the magic [0,0] bit-range relative to?)?

The existing get_inner_reference calls give you a bitpos relative
to the start of the containing object - but

      /* If this is the last element in the structure, include the padding
         at the end of structure.  */
      *bitend = TREE_INT_CST_LOW (TYPE_SIZE (record_type)) - 1;

will set *bitend to the size of the direct parent structure, not the
size of the underlying object.  Your proposed patch changes
bitpos to be relative to the direct parent structure.

So - I guess you need to play with some testcases like

struct {
   int some_padding;
   struct {
      int bitfield :1;
   } x;
};

and split / clarify some of get_bit_range comments.

Thanks,
Richard.

>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-28 13:00                                             ` Richard Guenther
  2011-07-29  2:58                                               ` Jason Merrill
@ 2011-07-29 12:02                                               ` Aldy Hernandez
  2011-07-29 11:00                                                 ` Richard Guenther
  2011-08-05 17:28                                               ` Aldy Hernandez
  2 siblings, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-07-29 12:02 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jason Merrill, Jeff Law, gcc-patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 885 bytes --]

On 07/28/11 06:40, Richard Guenther wrote:

> Looking at the C++ memory model what you need is indeed simple enough
> to recover here.  Still this loop does quadratic work for a struct with
> N bitfield members and a function which stores into all of them.
> And that with a big constant factor as you build a component-ref
> and even unshare trees (which isn't necessary here anyway).  In fact
> you could easily manually keep track of bitpos when walking adjacent
> bitfield members.  An initial call to get_inner_reference on
> TREE_OPERAND (exp, 0) would give you the starting position of the record.
>
> That would still be quadratic of course.

Actually, we don't need to call get_inner_reference at all.  It seems 
DECL_FIELD_BIT_OFFSET has all the information we need.

How about we simplify things further as in the attached patch?

Tested on x86-64 Linux.

OK for mainline?


[-- Attachment #2: curr --]
[-- Type: text/plain, Size: 869 bytes --]

	* expr.c (get_bit_range): Get field bit offset from
	DECL_FIELD_BIT_OFFSET.

Index: expr.c
===================================================================
--- expr.c	(revision 176891)
+++ expr.c	(working copy)
@@ -4179,18 +4179,10 @@ get_bit_range (unsigned HOST_WIDE_INT *b
   prev_field_is_bitfield = true;
   for (fld = TYPE_FIELDS (record_type); fld; fld = DECL_CHAIN (fld))
     {
-      tree t, offset;
-      enum machine_mode mode;
-      int unsignedp, volatilep;
-
       if (TREE_CODE (fld) != FIELD_DECL)
 	continue;
 
-      t = build3 (COMPONENT_REF, TREE_TYPE (exp),
-		  unshare_expr (TREE_OPERAND (exp, 0)),
-		  fld, NULL_TREE);
-      get_inner_reference (t, &bitsize, &bitpos, &offset,
-			   &mode, &unsignedp, &volatilep, true);
+      bitpos = TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (fld));
 
       if (field == fld)
 	found_field = true;

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-28  8:52                                             ` Richard Guenther
@ 2011-07-29 12:05                                               ` Aldy Hernandez
  2011-07-28 19:58                                                 ` Richard Guenther
  0 siblings, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-07-29 12:05 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jason Merrill, Jeff Law, gcc-patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 114 bytes --]


> Yes.  Together with the above it looks then optimal.

Attached patch tested on x86-64 Linux.

OK for mainline?

[-- Attachment #2: curr --]
[-- Type: text/plain, Size: 688 bytes --]

	* expr.c (get_bit_range): Handle *MEM_REF's.

Index: expr.c
===================================================================
--- expr.c	(revision 176824)
+++ expr.c	(working copy)
@@ -4158,7 +4158,10 @@ get_bit_range (unsigned HOST_WIDE_INT *b
 
   /* If other threads can't see this value, no need to restrict stores.  */
   if (ALLOW_STORE_DATA_RACES
-      || (!ptr_deref_may_alias_global_p (innerdecl)
+      || ((TREE_CODE (innerdecl) == MEM_REF ||
+	   TREE_CODE (innerdecl) == TARGET_MEM_REF)
+	  && !ptr_deref_may_alias_global_p (TREE_OPERAND (innerdecl, 0)))
+      || (DECL_P (innerdecl)
 	  && (DECL_THREAD_LOCAL_P (innerdecl)
 	      || !TREE_STATIC (innerdecl))))
     {

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-29 11:00                                                 ` Richard Guenther
@ 2011-08-01 13:51                                                   ` Richard Guenther
  0 siblings, 0 replies; 81+ messages in thread
From: Richard Guenther @ 2011-08-01 13:51 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Jason Merrill, Jeff Law, gcc-patches, Jakub Jelinek

On Fri, Jul 29, 2011 at 11:37 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Fri, Jul 29, 2011 at 4:12 AM, Aldy Hernandez <aldyh@redhat.com> wrote:
>> On 07/28/11 06:40, Richard Guenther wrote:
>>
>>> Looking at the C++ memory model what you need is indeed simple enough
>>> to recover here.  Still this loop does quadratic work for a struct with
>>> N bitfield members and a function which stores into all of them.
>>> And that with a big constant factor as you build a component-ref
>>> and even unshare trees (which isn't necessary here anyway).  In fact
>>> you could easily manually keep track of bitpos when walking adjacent
>>> bitfield members.  An initial call to get_inner_reference on
>>> TREE_OPERAND (exp, 0) would give you the starting position of the record.
>>>
>>> That would still be quadratic of course.
>>
>> Actually, we don't need to call get_inner_reference at all.  It seems
>> DECL_FIELD_BIT_OFFSET has all the information we need.
>>
>> How about we simplify things further as in the attached patch?
>>
>> Tested on x86-64 Linux.
>>
>> OK for mainline?
>
> Well ... byte pieces of the offset can be in the tree offset
> (DECL_FIELD_OFFSET).  Only up to DECL_OFFSET_ALIGN bits
> are tracked in DECL_FIELD_BIT_OFFSET (and DECL_FIELD_OFFSET
> can be a non-constant - at least for Ada, not sure about C++).
>
> But - can you please expand a bit on the desired semantics of
> get_bit_range?  Especially, relative to what is *bitstart / *bitend
> supposed to be?  Why do you pass in bitpos and bitsize - they
> seem to be used as local variables only.  Why is the check for
> thread-local storage in this function and not in the caller (and
> what's the magic [0,0] bit-range relative to?)?
>
> The existing get_inner_reference calls give you a bitpos relative
> to the start of the containing object - but
>
>      /* If this is the last element in the structure, include the padding
>         at the end of structure.  */
>      *bitend = TREE_INT_CST_LOW (TYPE_SIZE (record_type)) - 1;
>
> will set *bitend to the size of the direct parent structure, not the
> size of the underlying object.  Your proposed patch changes
> bitpos to be relative to the direct parent structure.

Using TYPE_SIZE can also run into issues with C++ tail packing;
you need to use DECL_SIZE of the respective field instead.  Consider

struct A {
  int : 17;
};
struct B : public A {
  char c;
};

where I'm not sure we are not allowed to pack c into the tail padding
in A.  Also neither TYPE_SIZE nor DECL_SIZE have to be constant,
at least in Ada you can have a variable-sized array before, and in
C you can have a trailing one.

Richard.

> So - I guess you need to play with some testcases like
>
> struct {
>   int some_padding;
>   struct {
>      int bitfield :1;
>   } x;
> };
>
> and split / clarify some of get_bit_range comments.
>
> Thanks,
> Richard.
>
>>
>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [C++0x] contiguous bitfields race implementation
  2011-07-28 13:00                                             ` Richard Guenther
  2011-07-29  2:58                                               ` Jason Merrill
  2011-07-29 12:02                                               ` Aldy Hernandez
@ 2011-08-05 17:28                                               ` Aldy Hernandez
  2011-08-09 10:52                                                 ` Richard Guenther
  2 siblings, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-08-05 17:28 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jason Merrill, gcc-patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 2782 bytes --]

Alright, I'm back and bearing patches.  Firmly ready for the crucifixion 
you will likely subject me to. :)

I've pretty much rewritten everything, taking into account all your 
suggestions, and adding a handful of tests for corner cases we will now 
handle correctly.

It seems the minimum needed is to calculate the byte offset of the start 
of the bit region, and the length of the bit region.  (Notice I say BYTE 
offset, as the start of any bit region will happily coincide with a byte 
boundary).  These will of course be adjusted as various parts of the 
bitfield infrastructure adjust offsets and memory addresses throughout.

First, it's not as easy as calling get_inner_reference() only once as 
you've suggested.  The only way to determine the padding at the end of a 
field is to get the bit position of the field following the field in 
question (or the size of the direct parent structure in the case where 
the field in question is the last field in the structure).  So we need 
two calls to get_inner_reference for the general case, which is at 
least better than my original call to get_inner_reference() for every field.
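
In shape, the computation looks like this (a rough sketch with made-up
names: this_ref is the reference to the field being stored, next_ref
the reference to the field following its bit region, if any; the real
code also has to cope with non-constant offsets):

  HOST_WIDE_INT this_size, this_pos, next_size, next_pos;
  tree this_off, next_off;
  enum machine_mode mode;
  int unsignedp, volatilep;

  /* One call for the field being stored...  */
  get_inner_reference (this_ref, &this_size, &this_pos, &this_off,
                       &mode, &unsignedp, &volatilep, true);

  if (next_ref)
    {
      /* ... and one for the field after the bit region, so the
         region ends where that field starts and the padding in
         between is included.  */
      get_inner_reference (next_ref, &next_size, &next_pos, &next_off,
                           &mode, &unsignedp, &volatilep, true);
      bitregion_end = next_pos - 1;
    }
  else
    /* Last field: the region runs to the end of the direct parent
       structure, padding included.  */
    bitregion_end = TREE_INT_CST_LOW (TYPE_SIZE (parent_type)) - 1;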

I have clarified the comments and made it clear what the offsets are 
relative to.

I am now handling large offsets that may appear as a tree OFFSET from 
get_inner_reference, and have added a test for one such corner case, 
including nested structures with head padding as you suggested.  I am 
still unsure whether a variable-length offset can happen before a bit field 
region.  So currently we assert that the final offset is host integer 
representable.  If you have a testcase that invalidates my assumption, I 
will gladly add a test and fix the code.

Honestly, the code isn't pretty, but neither is the rest of the bit 
field machinery.  I tried to make due, but I'll gladly take suggestions 
that are not in the form of "the entire bit field code needs to be 
rewritten" :-).

To aid in reviewing, the crux of everything is in the rewritten 
get_bit_range() and the first block of store_bit_field().  Everything 
else is mostly noise.  I have attached all of get_bit_range() as a 
separate attachment, since that's the main engine, and it has been 
largely rewritten.

This patch handles all the testcases I could come up with, mostly 
inspired by your suggestions.  Eventually I would like to replace these 
target-specific tests with target-agnostic tests using the gdb simulated 
thread test harness in the cxx-mem-model branch.

Finally, you had mentioned possible problems with tail padding in C++, 
and suggested I use DECL_SIZE instead of calculating the padding using 
the size of the direct parent structure.  DECL_SIZE doesn't include padding, 
so I'm open to suggestions.

Fire away, but please be kind :).

[-- Attachment #2: curr --]
[-- Type: text/plain, Size: 33421 bytes --]

	* machmode.h (get_best_mode): Remove 2 arguments.
	* fold-const.c (optimize_bit_field_compare): Same.
	(fold_truthop): Same.
	* expr.c (store_field): Change argument types in prototype.
	(emit_group_store): Change argument types to store_bit_field call.
	(copy_blkmode_from_reg): Same.
	(write_complex_part): Same.
	(optimize_bitfield_assignment_op): Change argument types.
	Change arguments to get_best_mode.
	(get_bit_range): Rewrite.
	(expand_assignment): Adjust new call to get_bit_range.
	Adjust bitregion_offset when to_rtx is changed.
	Adjust calls to store_field with new argument types.
	(store_field): New argument types.
	Adjust calls to store_bit_field with new arguments.
	* expr.h (store_bit_field): Change argument types.
	* stor-layout.c (get_best_mode): Remove use of bitregion* arguments.
	* expmed.c (store_bit_field_1): Change argument types.
	Do not calculate maxbits.
	Adjust bitregion_maxbits if offset changes.
	(store_bit_field): Change argument types.
	Adjust address taking into account bitregion_offset.
	(store_fixed_bit_field): Change argument types.
	Do not calculate maxbits.
	(store_split_bit_field): Change argument types.
	(extract_bit_field_1): Adjust arguments to get_best_mode.
	(extract_fixed_bit_field): Same.

Index: machmode.h
===================================================================
--- machmode.h	(revision 176891)
+++ machmode.h	(working copy)
@@ -249,8 +249,6 @@ extern enum machine_mode mode_for_vector
 /* Find the best mode to use to access a bit field.  */
 
 extern enum machine_mode get_best_mode (int, int,
-					unsigned HOST_WIDE_INT,
-					unsigned HOST_WIDE_INT,
 					unsigned int,
 					enum machine_mode, int);
 
Index: fold-const.c
===================================================================
--- fold-const.c	(revision 176891)
+++ fold-const.c	(working copy)
@@ -3394,7 +3394,7 @@ optimize_bit_field_compare (location_t l
       && flag_strict_volatile_bitfields > 0)
     nmode = lmode;
   else
-    nmode = get_best_mode (lbitsize, lbitpos, 0, 0,
+    nmode = get_best_mode (lbitsize, lbitpos,
 			   const_p ? TYPE_ALIGN (TREE_TYPE (linner))
 			   : MIN (TYPE_ALIGN (TREE_TYPE (linner)),
 				  TYPE_ALIGN (TREE_TYPE (rinner))),
@@ -5221,7 +5221,7 @@ fold_truthop (location_t loc, enum tree_
      to be relative to a field of that size.  */
   first_bit = MIN (ll_bitpos, rl_bitpos);
   end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
-  lnmode = get_best_mode (end_bit - first_bit, first_bit, 0, 0,
+  lnmode = get_best_mode (end_bit - first_bit, first_bit,
 			  TYPE_ALIGN (TREE_TYPE (ll_inner)), word_mode,
 			  volatilep);
   if (lnmode == VOIDmode)
@@ -5286,7 +5286,7 @@ fold_truthop (location_t loc, enum tree_
 
       first_bit = MIN (lr_bitpos, rr_bitpos);
       end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
-      rnmode = get_best_mode (end_bit - first_bit, first_bit, 0, 0,
+      rnmode = get_best_mode (end_bit - first_bit, first_bit,
 			      TYPE_ALIGN (TREE_TYPE (lr_inner)), word_mode,
 			      volatilep);
       if (rnmode == VOIDmode)
Index: testsuite/c-c++-common/cxxbitfields-6.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-6.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-6.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+struct bits
+{
+  char a;
+  int b:7;
+  int :0;
+  volatile int c:7;
+  unsigned char d;
+} x;
+
+/* Store into <c> should not clobber <d>.  */
+void update_c(struct bits *p, int val) 
+{
+    p->c = val;
+}
+
+/* { dg-final { scan-assembler "movb" } } */
Index: testsuite/c-c++-common/cxxbitfields-8.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-8.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-8.c	(revision 0)
@@ -0,0 +1,29 @@
+/* { dg-do compile { target x86_64-*-* } } */
+/* { dg-options "-O --param allow-store-data-races=0" } */
+
+struct bits {
+  /* Make sure the bit position of the bitfield is larger than what
+     can be represented in an unsigned HOST_WIDE_INT, to force
+     get_inner_reference() to return something in POFFSET.  */
+      
+  struct {
+    int some_padding[1<<30];
+    char more_padding;
+  } pad[1<<29];
+
+  struct {
+    volatile char bitfield :1;
+  } x;
+  char b;
+};
+
+struct bits *p;
+
+/* Test that the store into <bitfield> is not done with something
+   wider than a byte move.  */
+void foo()
+{
+  p->x.bitfield = 1;
+}
+
+/* { dg-final { scan-assembler "movb" } } */
Index: testsuite/c-c++-common/cxxbitfields-7.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-7.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-7.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+struct bits
+{
+  int some_padding;
+  struct {
+    volatile char bitfield :1;
+  } x;
+  char b;
+};
+
+/* Store into <bitfield> should not clobber <b>.  */
+void update(struct bits *p)
+{
+    p->x.bitfield = 1;
+}
+
+/* { dg-final { scan-assembler "movb" } } */
Index: expr.c
===================================================================
--- expr.c	(revision 176891)
+++ expr.c	(working copy)
@@ -145,7 +145,7 @@ static void store_constructor_field (rtx
 				     tree, tree, int, alias_set_type);
 static void store_constructor (tree, rtx, int, HOST_WIDE_INT);
 static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT,
-			unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
+			tree, HOST_WIDE_INT,
 			enum machine_mode,
 			tree, tree, alias_set_type, bool);
 
@@ -2077,7 +2077,8 @@ emit_group_store (rtx orig_dst, rtx src,
 	emit_move_insn (adjust_address (dest, mode, bytepos), tmps[i]);
       else
 	store_bit_field (dest, bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
-			 0, 0, mode, tmps[i]);
+			 integer_zero_node, MAX_FIXED_MODE_SIZE,
+			 mode, tmps[i]);
     }
 
   /* Copy from the pseudo into the (probable) hard reg.  */
@@ -2171,7 +2172,8 @@ copy_blkmode_from_reg (rtx tgtblk, rtx s
 
       /* Use xbitpos for the source extraction (right justified) and
 	 bitpos for the destination store (left justified).  */
-      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, 0, 0, copy_mode,
+      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD,
+		       integer_zero_node, MAX_FIXED_MODE_SIZE, copy_mode,
 		       extract_bit_field (src, bitsize,
 					  xbitpos % BITS_PER_WORD, 1, false,
 					  NULL_RTX, copy_mode, copy_mode));
@@ -2808,7 +2810,8 @@ write_complex_part (rtx cplx, rtx val, b
 	gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
     }
 
-  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, 0, 0, imode, val);
+  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0,
+		   integer_zero_node, MAX_FIXED_MODE_SIZE, imode, val);
 }
 
 /* Extract one of the components of the complex value CPLX.  Extract the
@@ -3943,8 +3946,8 @@ get_subtarget (rtx x)
 static bool
 optimize_bitfield_assignment_op (unsigned HOST_WIDE_INT bitsize,
 				 unsigned HOST_WIDE_INT bitpos,
-				 unsigned HOST_WIDE_INT bitregion_start,
-				 unsigned HOST_WIDE_INT bitregion_end,
+				 tree bitregion_offset ATTRIBUTE_UNUSED,
+				 HOST_WIDE_INT bitregion_maxbits,
 				 enum machine_mode mode1, rtx str_rtx,
 				 tree to, tree src)
 {
@@ -4005,8 +4008,9 @@ optimize_bitfield_assignment_op (unsigne
 
       if (str_bitsize == 0 || str_bitsize > BITS_PER_WORD)
 	str_mode = word_mode;
+      if (bitregion_maxbits < GET_MODE_BITSIZE (str_mode))
+	str_mode = smallest_mode_for_size (bitregion_maxbits, MODE_INT);
       str_mode = get_best_mode (bitsize, bitpos,
-				bitregion_start, bitregion_end,
 				MEM_ALIGN (str_rtx), str_mode, 0);
       if (str_mode == VOIDmode)
 	return false;
@@ -4118,18 +4122,31 @@ optimize_bitfield_assignment_op (unsigne
 /* In the C++ memory model, consecutive bit fields in a structure are
    considered one memory location.
 
-   Given a COMPONENT_REF, this function returns the bit range of
-   consecutive bits in which this COMPONENT_REF belongs in.  The
-   values are returned in *BITSTART and *BITEND.  If either the C++
-   memory model is not activated, or this memory access is not thread
-   visible, 0 is returned in *BITSTART and *BITEND.
+   Given a COMPONENT_REF, this function calculates the byte offset of
+   the beginning of the memory location containing the bit field being
+   referenced.  The byte offset is returned in *OFFSET and is the byte
+   offset from the beginning of the containing object (INNERDECL).
+
+   The largest mode that can be used to write into the bit field will
+   be returned in *LARGEST_MODE.
+
+   For example, in the following structure, the bit region starts in
+   byte 4.  In an architecture where the size of BITS gets padded to
+   32-bits, SImode will be returned in *LARGEST_MODE.
+
+     struct bits {
+       int some_padding;
+       struct {
+         volatile char bitfield :1;
+       } bits;
+       char b;
+     };
 
    EXP is the COMPONENT_REF.
-   INNERDECL is the actual object being referenced.
-   BITPOS is the position in bits where the bit starts within the structure.
-   BITSIZE is size in bits of the field being referenced in EXP.
 
-   For example, while storing into FOO.A here...
+   Examples.
+
+   While storing into FOO.A here...
 
       struct {
         BIT 0:
@@ -4140,67 +4157,99 @@ optimize_bitfield_assignment_op (unsigne
 	  unsigned int d : 6;
       } foo;
 
-   ...we are not allowed to store past <b>, so for the layout above, a
-   range of 0..7 (because no one cares if we store into the
-   padding).  */
+   ...we are not allowed to store past <b>, so for the layout above,
+   *OFFSET will be byte 0, and *LARGEST_MODE will be QImode.
+
+   Here we have 3 distinct memory locations because of the zero-sized
+   bit-field separating the bits:
+   
+     struct bits
+     {
+       char a;
+       int b:7;
+       int :0;
+       int c:7;
+     } foo;
+
+   Here we also have 3 distinct memory locations because
+   structure/union boundaries will separate contiguous bit-field
+   sequences:
+
+     struct {
+       char a:3;
+       struct { char b:4; } x;
+       char c:5;
+     } foo;  */
 
 static void
-get_bit_range (unsigned HOST_WIDE_INT *bitstart,
-	       unsigned HOST_WIDE_INT *bitend,
-	       tree exp, tree innerdecl,
-	       HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize)
+get_bit_range (tree exp, tree *offset, HOST_WIDE_INT *maxbits)
 {
   tree field, record_type, fld;
   bool found_field = false;
   bool prev_field_is_bitfield;
+  tree start_offset, end_offset, maxbits_tree;
+  tree start_bitpos_direct_parent = NULL_TREE;
+  HOST_WIDE_INT start_bitpos, end_bitpos;
+  HOST_WIDE_INT cumulative_bitsize = 0;
 
   gcc_assert (TREE_CODE (exp) == COMPONENT_REF);
 
-  /* If other threads can't see this value, no need to restrict stores.  */
-  if (ALLOW_STORE_DATA_RACES
-      || ((TREE_CODE (innerdecl) == MEM_REF
-	   || TREE_CODE (innerdecl) == TARGET_MEM_REF)
-	  && !ptr_deref_may_alias_global_p (TREE_OPERAND (innerdecl, 0)))
-      || (DECL_P (innerdecl)
-	  && (DECL_THREAD_LOCAL_P (innerdecl)
-	      || !TREE_STATIC (innerdecl))))
-    {
-      *bitstart = *bitend = 0;
-      return;
-    }
-
   /* Bit field we're storing into.  */
   field = TREE_OPERAND (exp, 1);
   record_type = DECL_FIELD_CONTEXT (field);
 
   /* Count the contiguous bitfields for the memory location that
      contains FIELD.  */
-  *bitstart = 0;
-  prev_field_is_bitfield = true;
+  start_offset = size_zero_node;
+  start_bitpos = 0;
+  prev_field_is_bitfield = false;
   for (fld = TYPE_FIELDS (record_type); fld; fld = DECL_CHAIN (fld))
     {
-      tree t, offset;
-      enum machine_mode mode;
-      int unsignedp, volatilep;
-
       if (TREE_CODE (fld) != FIELD_DECL)
 	continue;
 
-      t = build3 (COMPONENT_REF, TREE_TYPE (exp),
-		  unshare_expr (TREE_OPERAND (exp, 0)),
-		  fld, NULL_TREE);
-      get_inner_reference (t, &bitsize, &bitpos, &offset,
-			   &mode, &unsignedp, &volatilep, true);
-
       if (field == fld)
 	found_field = true;
 
-      if (DECL_BIT_FIELD_TYPE (fld) && bitsize > 0)
+      /* If we have a bit-field with a bitsize > 0... */
+      if (DECL_BIT_FIELD_TYPE (fld)
+	  && (!host_integerp (DECL_SIZE (fld), 1)
+	      || tree_low_cst (DECL_SIZE (fld), 1) > 0))
 	{
+	  /* Start of a new bit region.  */
 	  if (prev_field_is_bitfield == false)
 	    {
-	      *bitstart = bitpos;
+	      HOST_WIDE_INT bitsize;
+	      enum machine_mode mode;
+	      int unsignedp, volatilep;
+
+	      /* Save starting bitpos and offset.  */
+	      get_inner_reference (build3 (COMPONENT_REF,
+					   TREE_TYPE (exp),
+					   TREE_OPERAND (exp, 0),
+					   fld, NULL_TREE),
+				   &bitsize, &start_bitpos, &start_offset,
+				   &mode, &unsignedp, &volatilep, true);
+	      /* Save the bit offset of the current structure.  */
+	      start_bitpos_direct_parent = DECL_FIELD_BIT_OFFSET (fld);
 	      prev_field_is_bitfield = true;
+	      cumulative_bitsize = 0;
+	    }
+
+	  cumulative_bitsize += tree_low_cst (DECL_SIZE (fld), 1);
+
+	  /* Short-circuit out if we have the max bits allowed.  */
+	  /* ?? Is this even worth it?  ?? */
+	  if (cumulative_bitsize >= MAX_FIXED_MODE_SIZE)
+	    {
+	      *maxbits = MAX_FIXED_MODE_SIZE;
+	      /* Calculate byte offset to the beginning of the bit region.  */
+	      gcc_assert (start_bitpos % BITS_PER_UNIT == 0);
+	      *offset = fold_build2 (PLUS_EXPR, TREE_TYPE (start_offset),
+				     start_offset,
+				     build_int_cst (integer_type_node,
+						    start_bitpos / BITS_PER_UNIT));
+	      return;
 	    }
 	}
       else
@@ -4212,17 +4261,58 @@ get_bit_range (unsigned HOST_WIDE_INT *b
     }
   gcc_assert (found_field);
 
+  /* Calculate byte offset to the beginning of the bit region.  */
+  /* OFFSET = START_OFFSET + (START_BITPOS / BITS_PER_UNIT) */
+  gcc_assert (start_bitpos % BITS_PER_UNIT == 0);
+  if (!start_offset)
+    start_offset = size_zero_node;
+  *offset = fold_build2 (PLUS_EXPR, TREE_TYPE (start_offset),
+			 start_offset,
+			 build_int_cst (integer_type_node,
+					start_bitpos / BITS_PER_UNIT));
   if (fld)
     {
+      HOST_WIDE_INT bitsize;
+      enum machine_mode mode;
+      int unsignedp, volatilep;
+
       /* We found the end of the bit field sequence.  Include the
-	 padding up to the next field and be done.  */
-      *bitend = bitpos - 1;
+	 padding up to the next field.  */
+
+      /* Calculate bitpos and offset of the next field.  */
+      get_inner_reference (build3 (COMPONENT_REF,
+				   TREE_TYPE (exp),
+				   TREE_OPERAND (exp, 0),
+				   fld, NULL_TREE),
+			   &bitsize, &end_bitpos, &end_offset,
+			   &mode, &unsignedp, &volatilep, true);
+      gcc_assert (end_bitpos % BITS_PER_UNIT == 0);
+
+      if (end_offset)
+	{
+	  tree type = TREE_TYPE (end_offset), end;
+
+	  /* Calculate byte offset to the end of the bit region.  */
+	  end = fold_build2 (PLUS_EXPR, type,
+			     end_offset,
+			     build_int_cst (type,
+					    end_bitpos / BITS_PER_UNIT));
+	  maxbits_tree = fold_build2 (MINUS_EXPR, type, end, *offset);
+	}
+      else
+	maxbits_tree = build_int_cst (integer_type_node,
+				      end_bitpos - start_bitpos);
+
+      /* ?? Can we get a variable-length offset here ?? */
+      gcc_assert (host_integerp (maxbits_tree, 1));
+      *maxbits = TREE_INT_CST_LOW (maxbits_tree);
     }
   else
     {
       /* If this is the last element in the structure, include the padding
 	 at the end of structure.  */
-      *bitend = TREE_INT_CST_LOW (TYPE_SIZE (record_type)) - 1;
+      *maxbits = TREE_INT_CST_LOW (TYPE_SIZE (record_type))
+	- TREE_INT_CST_LOW (start_bitpos_direct_parent);
     }
 }
 
@@ -4324,8 +4414,8 @@ expand_assignment (tree to, tree from, b
     {
       enum machine_mode mode1;
       HOST_WIDE_INT bitsize, bitpos;
-      unsigned HOST_WIDE_INT bitregion_start = 0;
-      unsigned HOST_WIDE_INT bitregion_end = 0;
+      tree bitregion_offset = size_zero_node;
+      HOST_WIDE_INT bitregion_maxbits = MAX_FIXED_MODE_SIZE;
       tree offset;
       int unsignedp;
       int volatilep = 0;
@@ -4337,8 +4427,23 @@ expand_assignment (tree to, tree from, b
 
       if (TREE_CODE (to) == COMPONENT_REF
 	  && DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1)))
-	get_bit_range (&bitregion_start, &bitregion_end,
-		       to, tem, bitpos, bitsize);
+	{
+	  /* If other threads can't see this value, no need to
+	     restrict stores.  */
+	  if (ALLOW_STORE_DATA_RACES
+	      || ((TREE_CODE (tem) == MEM_REF
+		   || TREE_CODE (tem) == TARGET_MEM_REF)
+		  && !ptr_deref_may_alias_global_p (TREE_OPERAND (tem, 0)))
+	      || (DECL_P (tem)
+		  && (DECL_THREAD_LOCAL_P (tem)
+		      || !TREE_STATIC (tem))))
+	    {
+	      bitregion_offset = size_zero_node;
+	      bitregion_maxbits = MAX_FIXED_MODE_SIZE;
+	    }
+	  else
+	    get_bit_range (to, &bitregion_offset, &bitregion_maxbits);
+	}
 
       /* If we are going to use store_bit_field and extract_bit_field,
 	 make sure to_rtx will be safe for multiple use.  */
@@ -4388,12 +4493,19 @@ expand_assignment (tree to, tree from, b
 	      && MEM_ALIGN (to_rtx) == GET_MODE_ALIGNMENT (mode1))
 	    {
 	      to_rtx = adjust_address (to_rtx, mode1, bitpos / BITS_PER_UNIT);
+	      bitregion_offset = fold_build2 (MINUS_EXPR, integer_type_node,
+					      bitregion_offset,
+					      build_int_cst (integer_type_node,
+							     bitpos / BITS_PER_UNIT));
 	      bitpos = 0;
 	    }
 
 	  to_rtx = offset_address (to_rtx, offset_rtx,
 				   highest_pow2_factor_for_target (to,
 				   				   offset));
+	  bitregion_offset = fold_build2 (MINUS_EXPR, integer_type_node,
+					  bitregion_offset,
+					  offset);
 	}
 
       /* No action is needed if the target is not a memory and the field
@@ -4421,13 +4533,13 @@ expand_assignment (tree to, tree from, b
 				 nontemporal);
 	  else if (bitpos + bitsize <= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 0), bitsize, bitpos,
-				  bitregion_start, bitregion_end,
+				  bitregion_offset, bitregion_maxbits,
 				  mode1, from, TREE_TYPE (tem),
 				  get_alias_set (to), nontemporal);
 	  else if (bitpos >= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 1), bitsize,
 				  bitpos - mode_bitsize / 2,
-				  bitregion_start, bitregion_end,
+				  bitregion_offset, bitregion_maxbits,
 				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
@@ -4450,7 +4562,7 @@ expand_assignment (tree to, tree from, b
 	      write_complex_part (temp, XEXP (to_rtx, 0), false);
 	      write_complex_part (temp, XEXP (to_rtx, 1), true);
 	      result = store_field (temp, bitsize, bitpos,
-				    bitregion_start, bitregion_end,
+				    bitregion_offset, bitregion_maxbits,
 				    mode1, from,
 				    TREE_TYPE (tem), get_alias_set (to),
 				    nontemporal);
@@ -4477,13 +4589,14 @@ expand_assignment (tree to, tree from, b
 	    }
 
 	  if (optimize_bitfield_assignment_op (bitsize, bitpos,
-					       bitregion_start, bitregion_end,
+					       bitregion_offset,
+					       bitregion_maxbits,
 					       mode1,
 					       to_rtx, to, from))
 	    result = NULL;
 	  else
 	    result = store_field (to_rtx, bitsize, bitpos,
-				  bitregion_start, bitregion_end,
+				  bitregion_offset, bitregion_maxbits,
 				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
@@ -5917,10 +6030,10 @@ store_constructor (tree exp, rtx target,
    BITSIZE bits, starting BITPOS bits from the start of TARGET.
    If MODE is VOIDmode, it means that we are storing into a bit-field.
 
-   BITREGION_START is bitpos of the first bitfield in this region.
-   BITREGION_END is the bitpos of the ending bitfield in this region.
-   These two fields are 0, if the C++ memory model does not apply,
-   or we are not interested in keeping track of bitfield regions.
+   BITREGION_OFFSET is the byte offset from the beginning of the
+   containing object to the start of the bit region.
+   BITREGION_MAXBITS is the size in bits of the largest mode that can
+   be used to set the bit-field in question.
 
    Always return const0_rtx unless we have something particular to
    return.
@@ -5935,8 +6048,8 @@ store_constructor (tree exp, rtx target,
 
 static rtx
 store_field (rtx target, HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
-	     unsigned HOST_WIDE_INT bitregion_start,
-	     unsigned HOST_WIDE_INT bitregion_end,
+	     tree bitregion_offset,
+	     HOST_WIDE_INT bitregion_maxbits,
 	     enum machine_mode mode, tree exp, tree type,
 	     alias_set_type alias_set, bool nontemporal)
 {
@@ -5970,7 +6083,7 @@ store_field (rtx target, HOST_WIDE_INT b
 	emit_move_insn (object, target);
 
       store_field (blk_object, bitsize, bitpos,
-		   bitregion_start, bitregion_end,
+		   bitregion_offset, bitregion_maxbits,
 		   mode, exp, type, alias_set, nontemporal);
 
       emit_move_insn (target, object);
@@ -6086,7 +6199,7 @@ store_field (rtx target, HOST_WIDE_INT b
 
       /* Store the value in the bitfield.  */
       store_bit_field (target, bitsize, bitpos,
-		       bitregion_start, bitregion_end,
+		       bitregion_offset, bitregion_maxbits,
 		       mode, temp);
 
       return const0_rtx;
Index: expr.h
===================================================================
--- expr.h	(revision 176891)
+++ expr.h	(working copy)
@@ -666,8 +666,8 @@ mode_for_extraction (enum extraction_pat
 
 extern void store_bit_field (rtx, unsigned HOST_WIDE_INT,
 			     unsigned HOST_WIDE_INT,
-			     unsigned HOST_WIDE_INT,
-			     unsigned HOST_WIDE_INT,
+			     tree,
+			     HOST_WIDE_INT,
 			     enum machine_mode, rtx);
 extern rtx extract_bit_field (rtx, unsigned HOST_WIDE_INT,
 			      unsigned HOST_WIDE_INT, int, bool, rtx,
Index: stor-layout.c
===================================================================
--- stor-layout.c	(revision 176891)
+++ stor-layout.c	(working copy)
@@ -2361,13 +2361,6 @@ fixup_unsigned_type (tree type)
 /* Find the best machine mode to use when referencing a bit field of length
    BITSIZE bits starting at BITPOS.
 
-   BITREGION_START is the bit position of the first bit in this
-   sequence of bit fields.  BITREGION_END is the last bit in this
-   sequence.  If these two fields are non-zero, we should restrict the
-   memory access to a maximum sized chunk of
-   BITREGION_END - BITREGION_START + 1.  Otherwise, we are allowed to touch
-   any adjacent non bit-fields.
-
    The underlying object is known to be aligned to a boundary of ALIGN bits.
    If LARGEST_MODE is not VOIDmode, it means that we should not use a mode
    larger than LARGEST_MODE (usually SImode).
@@ -2386,20 +2379,11 @@ fixup_unsigned_type (tree type)
 
 enum machine_mode
 get_best_mode (int bitsize, int bitpos,
-	       unsigned HOST_WIDE_INT bitregion_start,
-	       unsigned HOST_WIDE_INT bitregion_end,
 	       unsigned int align,
 	       enum machine_mode largest_mode, int volatilep)
 {
   enum machine_mode mode;
   unsigned int unit = 0;
-  unsigned HOST_WIDE_INT maxbits;
-
-  /* If unset, no restriction.  */
-  if (!bitregion_end)
-    maxbits = MAX_FIXED_MODE_SIZE;
-  else
-    maxbits = (bitregion_end - bitregion_start) % align + 1;
 
   /* Find the narrowest integer mode that contains the bit field.  */
   for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
@@ -2436,7 +2420,6 @@ get_best_mode (int bitsize, int bitpos,
 	      && bitpos / unit == (bitpos + bitsize - 1) / unit
 	      && unit <= BITS_PER_WORD
 	      && unit <= MIN (align, BIGGEST_ALIGNMENT)
-	      && unit <= maxbits
 	      && (largest_mode == VOIDmode
 		  || unit <= GET_MODE_BITSIZE (largest_mode)))
 	    wide_mode = tmode;
Index: expmed.c
===================================================================
--- expmed.c	(revision 176891)
+++ expmed.c	(working copy)
@@ -48,13 +48,11 @@ struct target_expmed *this_target_expmed
 static void store_fixed_bit_field (rtx, unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT,
+				   tree, HOST_WIDE_INT,
 				   rtx);
 static void store_split_bit_field (rtx, unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT,
+				   tree, HOST_WIDE_INT,
 				   rtx);
 static rtx extract_fixed_bit_field (enum machine_mode, rtx,
 				    unsigned HOST_WIDE_INT,
@@ -340,8 +338,8 @@ mode_for_extraction (enum extraction_pat
 static bool
 store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 		   unsigned HOST_WIDE_INT bitnum,
-		   unsigned HOST_WIDE_INT bitregion_start,
-		   unsigned HOST_WIDE_INT bitregion_end,
+		   tree bitregion_offset,
+		   HOST_WIDE_INT bitregion_maxbits,
 		   enum machine_mode fieldmode,
 		   rtx value, bool fallback_p)
 {
@@ -558,7 +556,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	  if (!store_bit_field_1 (op0, MIN (BITS_PER_WORD,
 					    bitsize - i * BITS_PER_WORD),
 				  bitnum + bit_offset,
-				  bitregion_start, bitregion_end,
+				  bitregion_offset, bitregion_maxbits,
 				  word_mode,
 				  value_word, fallback_p))
 	    {
@@ -722,10 +720,6 @@ store_bit_field_1 (rtx str_rtx, unsigned
   if (HAVE_insv && MEM_P (op0))
     {
       enum machine_mode bestmode;
-      unsigned HOST_WIDE_INT maxbits = MAX_FIXED_MODE_SIZE;
-
-      if (bitregion_end)
-	maxbits = bitregion_end - bitregion_start + 1;
 
       /* Get the mode to use for inserting into this field.  If OP0 is
 	 BLKmode, get the smallest mode consistent with the alignment. If
@@ -733,15 +727,18 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	 mode. Otherwise, use the smallest mode containing the field.  */
 
       if (GET_MODE (op0) == BLKmode
-	  || GET_MODE_BITSIZE (GET_MODE (op0)) > maxbits
+	  || GET_MODE_BITSIZE (GET_MODE (op0)) > bitregion_maxbits
 	  || (op_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (op_mode)))
-	bestmode = get_best_mode  (bitsize, bitnum,
-				  bitregion_start, bitregion_end,
-				  MEM_ALIGN (op0),
-				  (op_mode == MAX_MACHINE_MODE
-				   ? VOIDmode : op_mode),
-				  MEM_VOLATILE_P (op0));
+	{
+	  bestmode = (op_mode == MAX_MACHINE_MODE ? VOIDmode : op_mode);
+	  if (bitregion_maxbits < GET_MODE_BITSIZE (op_mode))
+	    bestmode = smallest_mode_for_size (bitregion_maxbits, MODE_INT);
+	  bestmode = get_best_mode  (bitsize, bitnum,
+				     MEM_ALIGN (op0),
+				     bestmode,
+				     MEM_VOLATILE_P (op0));
+	}
       else
 	bestmode = GET_MODE (op0);
 
@@ -767,7 +764,8 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	     the unit.  */
 	  tempreg = copy_to_reg (xop0);
 	  if (store_bit_field_1 (tempreg, bitsize, xbitpos,
-				 bitregion_start, bitregion_end,
+				 bitregion_offset,
+				 bitregion_maxbits - xoffset * BITS_PER_UNIT,
 				 fieldmode, orig_value, false))
 	    {
 	      emit_move_insn (xop0, tempreg);
@@ -780,8 +778,9 @@ store_bit_field_1 (rtx str_rtx, unsigned
   if (!fallback_p)
     return false;
 
+  bitregion_maxbits -= offset * BITS_PER_UNIT;
   store_fixed_bit_field (op0, offset, bitsize, bitpos,
-			 bitregion_start, bitregion_end, value);
+			 bitregion_offset, bitregion_maxbits, value);
   return true;
 }
 
@@ -789,18 +788,17 @@ store_bit_field_1 (rtx str_rtx, unsigned
    into a bit-field within structure STR_RTX
    containing BITSIZE bits starting at bit BITNUM.
 
-   BITREGION_START is bitpos of the first bitfield in this region.
-   BITREGION_END is the bitpos of the ending bitfield in this region.
-   These two fields are 0, if the C++ memory model does not apply,
-   or we are not interested in keeping track of bitfield regions.
+   BITREGION_OFFSET is the byte offset from STR_RTX to the start of the bit
+   region.  BITREGION_MAXBITS is the number of bits of the largest
+   mode that can be used to set the bit-field in question.
 
    FIELDMODE is the machine-mode of the FIELD_DECL node for this field.  */
 
 void
 store_bit_field (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 		 unsigned HOST_WIDE_INT bitnum,
-		 unsigned HOST_WIDE_INT bitregion_start,
-		 unsigned HOST_WIDE_INT bitregion_end,
+		 tree bitregion_offset,
+		 HOST_WIDE_INT bitregion_maxbits,
 		 enum machine_mode fieldmode,
 		 rtx value)
 {
@@ -808,30 +806,23 @@ store_bit_field (rtx str_rtx, unsigned H
      bit region.  Adjust the address to start at the beginning of the
      bit region.  */
   if (MEM_P (str_rtx)
-      && bitregion_start > 0)
+      && bitregion_maxbits < MAX_FIXED_MODE_SIZE)
     {
-      enum machine_mode bestmode;
-      enum machine_mode op_mode;
-      unsigned HOST_WIDE_INT offset;
+      HOST_WIDE_INT offset;
 
-      op_mode = mode_for_extraction (EP_insv, 3);
-      if (op_mode == MAX_MACHINE_MODE)
-	op_mode = VOIDmode;
-
-      offset = bitregion_start / BITS_PER_UNIT;
-      bitnum -= bitregion_start;
-      bitregion_end -= bitregion_start;
-      bitregion_start = 0;
-      bestmode = get_best_mode (bitsize, bitnum,
-				bitregion_start, bitregion_end,
-				MEM_ALIGN (str_rtx),
-				op_mode,
-				MEM_VOLATILE_P (str_rtx));
-      str_rtx = adjust_address (str_rtx, bestmode, offset);
+      /* ?? Can we get a variable length offset here ?? */
+      gcc_assert (host_integerp (bitregion_offset, 1));
+      offset = tree_low_cst (bitregion_offset, 1);
+
+      /* Adjust the bit position accordingly.  */
+      bitnum -= offset * BITS_PER_UNIT;
+      bitregion_offset = integer_zero_node;
+      /* Adjust the actual address.  */
+      str_rtx = adjust_address (str_rtx, GET_MODE (str_rtx), offset);
     }
 
   if (!store_bit_field_1 (str_rtx, bitsize, bitnum,
-			  bitregion_start, bitregion_end,
+			  bitregion_offset, bitregion_maxbits,
 			  fieldmode, value, true))
     gcc_unreachable ();
 }
@@ -849,8 +840,8 @@ static void
 store_fixed_bit_field (rtx op0, unsigned HOST_WIDE_INT offset,
 		       unsigned HOST_WIDE_INT bitsize,
 		       unsigned HOST_WIDE_INT bitpos,
-		       unsigned HOST_WIDE_INT bitregion_start,
-		       unsigned HOST_WIDE_INT bitregion_end,
+		       tree bitregion_offset,
+		       HOST_WIDE_INT bitregion_maxbits,
 		       rtx value)
 {
   enum machine_mode mode;
@@ -873,17 +864,14 @@ store_fixed_bit_field (rtx op0, unsigned
       if (bitsize + bitpos > BITS_PER_WORD)
 	{
 	  store_split_bit_field (op0, bitsize, bitpos,
-				 bitregion_start, bitregion_end,
+				 bitregion_offset, bitregion_maxbits,
 				 value);
 	  return;
 	}
     }
   else
     {
-      unsigned HOST_WIDE_INT maxbits = MAX_FIXED_MODE_SIZE;
-
-      if (bitregion_end)
-	maxbits = bitregion_end - bitregion_start + 1;
+      HOST_WIDE_INT maxbits = bitregion_maxbits;
 
       /* Get the proper mode to use for this field.  We want a mode that
 	 includes the entire field.  If such a mode would be larger than
@@ -901,16 +889,19 @@ store_fixed_bit_field (rtx op0, unsigned
 	  && flag_strict_volatile_bitfields > 0)
 	mode = GET_MODE (op0);
       else
-	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
-			      bitregion_start, bitregion_end,
-			      MEM_ALIGN (op0), mode, MEM_VOLATILE_P (op0));
+	{
+	  if (bitregion_maxbits < GET_MODE_BITSIZE (mode))
+	    mode = smallest_mode_for_size (bitregion_maxbits, MODE_INT);
+	  mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
+				MEM_ALIGN (op0), mode, MEM_VOLATILE_P (op0));
+	}
 
       if (mode == VOIDmode)
 	{
 	  /* The only way this should occur is if the field spans word
 	     boundaries.  */
 	  store_split_bit_field (op0, bitsize, bitpos + offset * BITS_PER_UNIT,
-				 bitregion_start, bitregion_end, value);
+				 bitregion_offset, bitregion_maxbits, value);
 	  return;
 	}
 
@@ -1031,8 +1022,8 @@ store_fixed_bit_field (rtx op0, unsigned
 static void
 store_split_bit_field (rtx op0, unsigned HOST_WIDE_INT bitsize,
 		       unsigned HOST_WIDE_INT bitpos,
-		       unsigned HOST_WIDE_INT bitregion_start,
-		       unsigned HOST_WIDE_INT bitregion_end,
+		       tree bitregion_offset,
+		       HOST_WIDE_INT bitregion_maxbits,
 		       rtx value)
 {
   unsigned int unit;
@@ -1148,7 +1139,8 @@ store_split_bit_field (rtx op0, unsigned
 	 it is just an out-of-bounds access.  Ignore it.  */
       if (word != const0_rtx)
 	store_fixed_bit_field (word, offset * unit / BITS_PER_UNIT, thissize,
-			       thispos, bitregion_start, bitregion_end, part);
+			       thispos, bitregion_offset, bitregion_maxbits,
+			       part);
       bitsdone += thissize;
     }
 }
@@ -1588,7 +1580,7 @@ extract_bit_field_1 (rtx str_rtx, unsign
       if (GET_MODE (op0) == BLKmode
 	  || (ext_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (ext_mode)))
-	bestmode = get_best_mode (bitsize, bitnum, 0, 0, MEM_ALIGN (op0),
+	bestmode = get_best_mode (bitsize, bitnum, MEM_ALIGN (op0),
 				  (ext_mode == MAX_MACHINE_MODE
 				   ? VOIDmode : ext_mode),
 				  MEM_VOLATILE_P (op0));
@@ -1714,7 +1706,7 @@ extract_fixed_bit_field (enum machine_mo
 	    mode = tmode;
 	}
       else
-	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT, 0, 0,
+	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
 			      MEM_ALIGN (op0), word_mode, MEM_VOLATILE_P (op0));
 
       if (mode == VOIDmode)

[-- Attachment #3: get-bit-range --]
[-- Type: text/plain, Size: 5785 bytes --]

/* In the C++ memory model, consecutive bit fields in a structure are
   considered one memory location.

   Given a COMPONENT_REF, this function calculates the byte offset of
   the beginning of the memory location containing the bit field being
   referenced.  The byte offset is returned in *OFFSET and is the byte
   offset from the beginning of the containing object (INNERDECL).

   The largest mode that can be used to write into the bit field will
   be returned in *LARGEST_MODE.

   For example, in the following structure, the bit region starts in
   byte 4.  In an architecture where the size of BITS gets padded to
   32-bits, SImode will be returned in *LARGEST_MODE.

     struct bits {
       int some_padding;
       struct {
         volatile char bitfield :1;
       } bits;
       char b;
     };

   EXP is the COMPONENT_REF.

   Examples.

   While storing into FOO.A here...

      struct {
        BIT 0:
          unsigned int a : 4;
	  unsigned int b : 1;
	BIT 8:
	  unsigned char c;
	  unsigned int d : 6;
      } foo;

   ...we are not allowed to store past <b>, so for the layout above,
   *OFFSET will be byte 0, and *LARGEST_MODE will be QImode.

   Here we have 3 distinct memory locations because of the zero-sized
   bit-field separating the bits:
   
     struct bits
     {
       char a;
       int b:7;
       int :0;
       int c:7;
     } foo;

   Here we also have 3 distinct memory locations because
   structure/union boundaries will separate contiguous bit-field
   sequences:

     struct {
       char a:3;
       struct { char b:4; } x;
       char c:5;
     } foo;  */

static void
get_bit_range (tree exp, tree *offset, HOST_WIDE_INT *maxbits)
{
  tree field, record_type, fld;
  bool found_field = false;
  bool prev_field_is_bitfield;
  tree start_offset, end_offset, maxbits_tree;
  tree start_bitpos_direct_parent = NULL_TREE;
  HOST_WIDE_INT start_bitpos, end_bitpos;
  HOST_WIDE_INT cumulative_bitsize = 0;

  gcc_assert (TREE_CODE (exp) == COMPONENT_REF);

  /* Bit field we're storing into.  */
  field = TREE_OPERAND (exp, 1);
  record_type = DECL_FIELD_CONTEXT (field);

  /* Count the contiguous bitfields for the memory location that
     contains FIELD.  */
  start_offset = size_zero_node;
  start_bitpos = 0;
  prev_field_is_bitfield = false;
  for (fld = TYPE_FIELDS (record_type); fld; fld = DECL_CHAIN (fld))
    {
      if (TREE_CODE (fld) != FIELD_DECL)
	continue;

      if (field == fld)
	found_field = true;

      /* If we have a bit-field with a bitsize > 0... */
      if (DECL_BIT_FIELD_TYPE (fld)
	  && (!host_integerp (DECL_SIZE (fld), 1)
	      || tree_low_cst (DECL_SIZE (fld), 1) > 0))
	{
	  /* Start of a new bit region.  */
	  if (prev_field_is_bitfield == false)
	    {
	      HOST_WIDE_INT bitsize;
	      enum machine_mode mode;
	      int unsignedp, volatilep;

	      /* Save starting bitpos and offset.  */
	      get_inner_reference (build3 (COMPONENT_REF,
					   TREE_TYPE (exp),
					   TREE_OPERAND (exp, 0),
					   fld, NULL_TREE),
				   &bitsize, &start_bitpos, &start_offset,
				   &mode, &unsignedp, &volatilep, true);
	      /* Save the bit offset of the current structure.  */
	      start_bitpos_direct_parent = DECL_FIELD_BIT_OFFSET (fld);
	      prev_field_is_bitfield = true;
	      cumulative_bitsize = 0;
	    }

	  cumulative_bitsize += tree_low_cst (DECL_SIZE (fld), 1);

	  /* Short-circuit out if we have the max bits allowed.  */
 	  /* ?? Is this even worth it?  ?? */
	  if (cumulative_bitsize >= MAX_FIXED_MODE_SIZE)
	    {
	      *maxbits = MAX_FIXED_MODE_SIZE;
	      /* Calculate byte offset to the beginning of the bit region.  */
	      gcc_assert (start_bitpos % BITS_PER_UNIT == 0);
	      *offset = fold_build2 (PLUS_EXPR, TREE_TYPE (start_offset),
				     start_offset,
				     build_int_cst (integer_type_node,
						    start_bitpos / BITS_PER_UNIT));
	      return;
	    }
	}
      else
	{
	  prev_field_is_bitfield = false;
	  if (found_field)
	    break;
	}
    }
  gcc_assert (found_field);

  /* Calculate byte offset to the beginning of the bit region.  */
  /* OFFSET = START_OFFSET + (START_BITPOS / BITS_PER_UNIT) */
  gcc_assert (start_bitpos % BITS_PER_UNIT == 0);
  if (!start_offset)
    start_offset = size_zero_node;
  *offset = fold_build2 (PLUS_EXPR, TREE_TYPE (start_offset),
			 start_offset,
			 build_int_cst (integer_type_node,
					start_bitpos / BITS_PER_UNIT));
  if (fld)
    {
      HOST_WIDE_INT bitsize;
      enum machine_mode mode;
      int unsignedp, volatilep;

      /* We found the end of the bit field sequence.  Include the
	 padding up to the next field.  */

      /* Calculate bitpos and offset of the next field.  */
      get_inner_reference (build3 (COMPONENT_REF,
				   TREE_TYPE (exp),
				   TREE_OPERAND (exp, 0),
				   fld, NULL_TREE),
			   &bitsize, &end_bitpos, &end_offset,
			   &mode, &unsignedp, &volatilep, true);
      gcc_assert (end_bitpos % BITS_PER_UNIT == 0);

      if (end_offset)
	{
	  tree type = TREE_TYPE (end_offset), end;

	  /* Calculate byte offset to the end of the bit region.  */
	  end = fold_build2 (PLUS_EXPR, type,
			     end_offset,
			     build_int_cst (type,
					    end_bitpos / BITS_PER_UNIT));
	  maxbits_tree = fold_build2 (MINUS_EXPR, type, end, *offset);
	}
      else
	maxbits_tree = build_int_cst (integer_type_node,
				      end_bitpos - start_bitpos);

      /* ?? Can we get a variable-length offset here ?? */
      gcc_assert (host_integerp (maxbits_tree, 1));
      *maxbits = TREE_INT_CST_LOW (maxbits_tree);
    }
  else
    {
      /* If this is the last element in the structure, include the padding
	 at the end of structure.  */
      *maxbits = TREE_INT_CST_LOW (TYPE_SIZE (record_type))
	- TREE_INT_CST_LOW (start_bitpos_direct_parent);
    }
}


* Re: [C++0x] contiguous bitfields race implementation
  2011-08-05 17:28                                               ` Aldy Hernandez
@ 2011-08-09 10:52                                                 ` Richard Guenther
  2011-08-09 20:53                                                   ` Aldy Hernandez
  0 siblings, 1 reply; 81+ messages in thread
From: Richard Guenther @ 2011-08-09 10:52 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Jason Merrill, gcc-patches, Jakub Jelinek

On Fri, Aug 5, 2011 at 7:25 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> Alright, I'm back and bearing patches.  Firmly ready for the crucifixion you
> will likely submit me to. :)
>
> I've pretty much rewritten everything, taking into account all your
> suggestions, and adding a handful of tests for corner cases we will now
> handle correctly.
>
> It seems the minimum needed is to calculate the byte offset of the start of
> the bit region, and the length of the bit region.  (Notice I say BYTE
> offset, as the start of any bit region will happily coincide with a byte
> boundary).  These will of course be adjusted as various parts of the
> bitfield infrastructure adjust offsets and memory addresses throughout.
>
> First, it's not as easy as calling get_inner_reference() only once as you've
> suggested.  The only way to determine the padding at the end of a field is
> to get the bit position of the field following the field in question (or
> the size of the direct parent structure when the field in question is the
> last field in the structure).  So we need two calls to get_inner_reference()
> for the general case, which is at least better than my original approach of
> calling get_inner_reference() for every field.
>
> I have clarified the comments and made it clear what the offsets are
> relative to.
>
> I am now handling large offsets that may appear as a tree OFFSET from
> get_inner_reference, and have added a test for one such corner case,
> including nested structures with head padding as you suggested.  I am still
> unsure whether a variable-length offset can occur before a bit field region,
> so currently we assert that the final offset is host integer representable.
>  If you have a testcase that invalidates my assumption, I will gladly add a
> test and fix the code.
>
> Honestly, the code isn't pretty, but neither is the rest of the bit field
> machinery.  I tried to make do, but I'll gladly take suggestions that are
> not in the form of "the entire bit field code needs to be rewritten" :-).
>
> To aid in reviewing, the crux of everything is in the rewritten
> get_bit_range() and the first block of store_bit_field().  Everything else
> is mostly noise.  I have attached all of get_bit_range() as a separate
> attachment to aid in reviewing, since that's the main engine, and it has
> been largely rewritten.
>
> This patch handles all the testcases I could come up with, mostly inspired
> by your suggestions.  Eventually I would like to replace these
> target-specific tests with target-agnostic tests using the gdb simulated thread
> test harness in the cxx-mem-model branch.
>
> Finally, you had mentioned possible problems with tail padding in C++, and
> suggested I use DECL_SIZE instead of calculating the padding using the size
> of direct parent structure.  DECL_SIZE doesn't include padding, so I'm open
> to suggestions.
>
> Fire away, but please be kind :).

Just reading and commenting top-down on the new get_bit_range function.

      /* If we have a bit-field with a bitsize > 0... */
      if (DECL_BIT_FIELD_TYPE (fld)
	  && (!host_integerp (DECL_SIZE (fld), 1)
	      || tree_low_cst (DECL_SIZE (fld), 1) > 0))

DECL_SIZE should always be host_integerp for bitfields.

	      /* Save starting bitpos and offset.  */
	      get_inner_reference (build3 (COMPONENT_REF,
					   TREE_TYPE (exp),
					   TREE_OPERAND (exp, 0),
					   fld, NULL_TREE),
				   &bitsize, &start_bitpos, &start_offset,
				   &mode, &unsignedp, &volatilep, true);

ok, so now you do this only for the first field in a bitfield group.  But you
do it for _all_ bitfield groups in a struct, not only for the interesting one.

May I suggest to split the loop into two, first searching the first field
in the bitfield group that contains fld and then in a separate loop computing
the bitwidth?

Backing up, considering one of my earlier questions.  What is *offset
supposed to be relative to?  The docs say sth like "relative to INNERDECL",
but the code doesn't contain a reference to INNERDECL anymore.

I think if the offset is really supposed to be relative to INNERDECL then
you should return a split offset, similar to get_inner_reference itself.
Thus, return a byte tree offset plus a HWI bit offset and maxbits
(that HWI bit offset is the offset to the start of the bitfield group, right?
Not the offset of the field that is referenced?)
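
I.e., an interface more along the lines of (just a sketch):

  static void
  get_bit_range (tree exp, tree *byte_offset,
                 HOST_WIDE_INT *bit_offset, HOST_WIDE_INT *maxbits);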

It really feels like you should do something like

  /* Get the offset to our parent structure.  */
  get_inner_reference (TREE_OPERAND (exp, 0), &offset, &bit_offset....);

  for (fld = TYPE_FIELDS (...) ...)
    /* Search for the starting field of the bitfield group of
TREE_OPERAND (exp, 1) */

  offset += DECL_FIELD_OFFSET (first_field_of_group);
  bit_offset += DECL_FIELD_BIT_OFFSET (first_field_of_group);
  (well, basically copy what get_inner_reference would do here)

  for (...)
    accumulate bit-offsets of the group (mind they'll eventually wrap
    when hitting DECL_OFFSET_ALIGN) to compute maxbits
    (that also always will fit in a HWI)

Now we come to that padding thing.  What's the C++ memory model
semantic for re-used tail padding?  Consider

  struct A
  {
     int i;
     bool a:1;
  };
  struct B : public A
  {
     bool b:1;
  };

The tail-padding of A is 3 bytes that may be used by b.  Now, is
accessing a allowed to race with accessing b?  Then the group for
a may include the 3 bytes tail padding.  If not, then it may not
(in which case using DECL_SIZE would be appropriate).

There is too much get_inner_reference and tree folding stuff in this
patch (which makes it expensive given that the algorithm is still
inherently quadratic).  You can rely on the bitfield group advancing
by integer-cst bits (but the start offset may be non-constant, so
may the size of the underlying record).

Now seeing all this - and considering that this is purely C++ frontend
semantics.  Why can't the C++ frontend itself constrain accesses
according to the required semantics?  It could simply create
BIT_FIELD_REF <MEM_REF <&containing_record,
byte-offset-to-start-of-group>, bit-size, bit-offset> for all bitfield
references (with a proper
type for the MEM_REF, specifying the size of the group).  That would
also avoid issues during tree optimization and would at least allow
optimizing the bitfield accesses according to the desired C++ semantics.
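
For instance, for the cxxbitfields-7.c testcase the store to
p->x.bitfield could then be emitted as something like the following
(a sketch in abbreviated tree-dump syntax; the byte offset 4 and the
1-byte group size assume that test's layout):

  /* The bitfield group is the single byte at offset 4 from the
     start of the record.  */
  BIT_FIELD_REF <MEM_REF <(unsigned char *) p, 4>, 1, 0> = 1;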

Richard.


* Re: [C++0x] contiguous bitfields race implementation
  2011-08-09 10:52                                                 ` Richard Guenther
@ 2011-08-09 20:53                                                   ` Aldy Hernandez
  2011-08-10 13:34                                                     ` Richard Guenther
  0 siblings, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-08-09 20:53 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jason Merrill, gcc-patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 2364 bytes --]


> ok, so now you do this only for the first field in a bitfield group.  But you
> do it for _all_ bitfield groups in a struct, not only for the interesting one.
>
> May I suggest to split the loop into two, first searching the first field
> in the bitfield group that contains fld and then in a separate loop computing
> the bitwidth?

Excellent idea.  Done!  Now there are at most two calls to 
get_inner_reference, and in many cases, only one.

> Backing up, considering one of my earlier questions.  What is *offset
> supposed to be relative to?  The docs say sth like "relative to INNERDECL",
> but the code doesn't contain a reference to INNERDECL anymore.

Sorry, I see your confusion.  The comments at the top were completely 
out of date.  I have simplified and rewritten them accordingly.  I am 
attaching get_bit_range() with these and other changes you suggested. 
See if it makes sense now.

> Now we come to that padding thing.  What's the C++ memory model
> semantic for re-used tail padding?  Consider

Andrew addressed this elsewhere.

> There is too much get_inner_reference and tree folding stuff in this
> patch (which makes it expensive given that the algorithm is still
> inherently quadratic).  You can rely on the bitfield group advancing
> by integer-cst bits (but the start offset may be non-constant, so
> may the size of the underlying record).

Now there are only two tree folding calls (apart from 
get_inner_reference), and the common case has very simple arithmetic 
tuples.  I see no clear way of removing the last call to 
get_inner_reference(), as the padding after the field can only be 
calculated by calling get_inner_reference() on the subsequent field.

> Now seeing all this - and considering that this is purely C++ frontend
> semantics.  Why can't the C++ frontend itself constrain accesses
> according to the required semantics?  It could simply create
> BIT_FIELD_REF<MEM_REF<&containing_record,
> byte-offset-to-start-of-group>, bit-size, bit-offset>  for all bitfield
> references (with a proper
> type for the MEM_REF, specifying the size of the group).  That would
> also avoid issues during tree optimization and would at least allow
> optimizing the bitfield accesses according to the desired C++ semantics.

Andrew addressed this as well.  Could you respond to his email if you 
think it is unsatisfactory?

a



[-- Attachment #2: stuff --]
[-- Type: text/plain, Size: 5150 bytes --]

/* In the C++ memory model, consecutive non-zero bit fields in a
   structure are considered one memory location.

   Given a COMPONENT_REF, this function calculates the byte offset
   from the containing object to the start of the contiguous bit
   region containing the field in question.  This byte offset is
   returned in *OFFSET.

   The maximum number of bits that can be addressed while storing into
   the COMPONENT_REF is returned in *MAXBITS.  This number is the
   number of bits in the contiguous bit region, up to a maximum of
   MAX_FIXED_MODE_SIZE.  */

static void
get_bit_range (tree exp, tree *offset, HOST_WIDE_INT *maxbits)
{
  tree field, record_type, fld;
  bool prev_field_is_bitfield;
  tree start_offset;
  tree start_bitpos_direct_parent = NULL_TREE;
  HOST_WIDE_INT start_bitpos;
  HOST_WIDE_INT cumulative_bitsize = 0;
  /* First field of the bitfield group containing the bitfield we are
     referencing.  */
  tree bitregion_start;

  HOST_WIDE_INT tbitsize;
  enum machine_mode tmode;
  int tunsignedp, tvolatilep;
  bool found;

  gcc_assert (TREE_CODE (exp) == COMPONENT_REF);

  /* Bit field we're storing into.  */
  field = TREE_OPERAND (exp, 1);
  record_type = DECL_FIELD_CONTEXT (field);

  /* Find the bitfield group containing the field in question, and set
     BITREGION_START to the start of the group.  */
  prev_field_is_bitfield = false;
  bitregion_start = NULL_TREE;
  for (fld = TYPE_FIELDS (record_type); fld; fld = DECL_CHAIN (fld))
    {
      if (TREE_CODE (fld) != FIELD_DECL)
	continue;
      /* If we have a bit-field with a bitsize > 0... */
      if (DECL_BIT_FIELD_TYPE (fld)
	  && tree_low_cst (DECL_SIZE (fld), 1) > 0)
	{
	  if (!prev_field_is_bitfield)
	    {
	      bitregion_start = fld;
	      prev_field_is_bitfield = true;
	    }
	}
      else
	prev_field_is_bitfield = false;
      if (fld == field)
	break;
    }
  gcc_assert (bitregion_start);
  gcc_assert (fld);

  /* Save the starting position of the bitregion.  */
  get_inner_reference (build3 (COMPONENT_REF,
			       TREE_TYPE (exp),
			       TREE_OPERAND (exp, 0),
			       bitregion_start, NULL_TREE),
		       &tbitsize, &start_bitpos, &start_offset,
		       &tmode, &tunsignedp, &tvolatilep, true);
  if (!start_offset)
    start_offset = size_zero_node;
  /* Calculate byte offset to the beginning of the bit region.  */
  /* OFFSET = START_OFFSET + (START_BITPOS / BITS_PER_UNIT) */
  gcc_assert (start_bitpos % BITS_PER_UNIT == 0);
  *offset = fold_build2 (PLUS_EXPR, TREE_TYPE (start_offset),
			 start_offset,
			 build_int_cst (integer_type_node,
					start_bitpos / BITS_PER_UNIT));
  /* Save the bit offset of the current structure.  */
  start_bitpos_direct_parent = DECL_FIELD_BIT_OFFSET (bitregion_start);

  /* Count the bitsize of the bitregion containing the field in question.  */
  found = false;
  cumulative_bitsize = 0;
  for (fld = bitregion_start; fld; fld = DECL_CHAIN (fld))
    {
      if (TREE_CODE (fld) != FIELD_DECL)
	continue;
      if (fld == field)
	found = true;

      if (DECL_BIT_FIELD_TYPE (fld)
	  && tree_low_cst (DECL_SIZE (fld), 1) > 0)
	{
	  cumulative_bitsize += tree_low_cst (DECL_SIZE (fld), 1);

	  /* Short-circuit out if we have the max bits allowed.  */
	  if (cumulative_bitsize >= MAX_FIXED_MODE_SIZE)
	    {
	      *maxbits = MAX_FIXED_MODE_SIZE;
	      /* Calculate byte offset to the beginning of the bit region.  */
	      gcc_assert (start_bitpos % BITS_PER_UNIT == 0);
	      *offset = fold_build2 (PLUS_EXPR, TREE_TYPE (start_offset),
				     start_offset,
				     build_int_cst (integer_type_node,
						    start_bitpos / BITS_PER_UNIT));
	      return;
	    }
	}
      else if (found)
	break;
    }

  /* If we found the end of the bit field sequence, include the
     padding up to the next field...  */
  if (fld)
    {
      tree end_offset, maxbits_tree;
      HOST_WIDE_INT end_bitpos;

      /* Calculate bitpos and offset of the next field.  */
      get_inner_reference (build3 (COMPONENT_REF,
				   TREE_TYPE (exp),
				   TREE_OPERAND (exp, 0),
				   fld, NULL_TREE),
			   &tbitsize, &end_bitpos, &end_offset,
			   &tmode, &tunsignedp, &tvolatilep, true);
      gcc_assert (end_bitpos % BITS_PER_UNIT == 0);

      if (end_offset)
	{
	  tree type = TREE_TYPE (end_offset), end;

	  /* Calculate byte offset to the end of the bit region.  */
	  end = fold_build2 (PLUS_EXPR, type,
			     end_offset,
			     build_int_cst (type,
					    end_bitpos / BITS_PER_UNIT));
	  maxbits_tree = fold_build2 (MINUS_EXPR, type, end, *offset);
	}
      else
	maxbits_tree = build_int_cst (integer_type_node,
				      end_bitpos - start_bitpos);

      /* ?? Can we get a variable-length offset here ?? */
      gcc_assert (host_integerp (maxbits_tree, 1));
      *maxbits = TREE_INT_CST_LOW (maxbits_tree);
    }
  /* ...otherwise, this is the last element in the structure.  */
  else
    {
      /* Include the padding at the end of structure.  */
      *maxbits = TREE_INT_CST_LOW (TYPE_SIZE (record_type))
	- TREE_INT_CST_LOW (start_bitpos_direct_parent);
      if (*maxbits > MAX_FIXED_MODE_SIZE)
	*maxbits = MAX_FIXED_MODE_SIZE;
    }
}


* Re: [C++0x] contiguous bitfields race implementation
  2011-08-09 20:53                                                   ` Aldy Hernandez
@ 2011-08-10 13:34                                                     ` Richard Guenther
  2011-08-15 19:26                                                       ` Aldy Hernandez
  0 siblings, 1 reply; 81+ messages in thread
From: Richard Guenther @ 2011-08-10 13:34 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Jason Merrill, gcc-patches, Jakub Jelinek

On Tue, Aug 9, 2011 at 8:39 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>> ok, so now you do this only for the first field in a bitfield group.  But
>> you
>> do it for _all_ bitfield groups in a struct, not only for the interesting
>> one.
>>
>> May I suggest to split the loop into two, first searching the first field
>> in the bitfield group that contains fld and then in a separate loop
>> computing
>> the bitwidth?
>
> Excellent idea.  Done!  Now there are at most two calls to
> get_inner_reference, and in many cases, only one.
>
>> Backing up, considering one of my earlier questions.  What is *offset
>> supposed to be relative to?  The docs say sth like "relative to
>> INNERDECL",
>> but the code doesn't contain a reference to INNERDECL anymore.
>
> Sorry, I see your confusion.  The comments at the top were completely out of
> date.  I have simplified and rewritten them accordingly.  I am attaching
> get_bit_range() with these and other changes you suggested. See if it makes
> sense now.
>
>> Now we come to that padding thing.  What's the C++ memory model
>> semantic for re-used tail padding?  Consider
>
> Andrew addressed this elsewhere.
>
>> There is too much get_inner_reference and tree folding stuff in this
>> patch (which makes it expensive given that the algorithm is still
>> inherently quadratic).  You can rely on the bitfield group advancing
>> by integer-cst bits (but the start offset may be non-constant, so
>> may the size of the underlying record).
>
> Now there are only two tree folding calls (apart from get_inner_reference),
> and the common case has very simple arithmetic tuples.  I see no clear way
> of removing the last call to get_inner_reference(), as the padding after the
> field can only be calculated by calling get_inner_reference() on the
> subsequent field.
>
>> Now seeing all this - and considering that this is purely C++ frontend
>> semantics.  Why can't the C++ frontend itself constrain accesses
>> according to the required semantics?  It could simply create
>> BIT_FIELD_REF<MEM_REF<&containing_record,
>> byte-offset-to-start-of-group>, bit-size, bit-offset>  for all bitfield
>> references (with a proper
>> type for the MEM_REF, specifying the size of the group).  That would
>> also avoid issues during tree optimization and would at least allow
>> optimizing the bitfield accesses according to the desired C++ semantics.
>
> Andrew addressed this as well.  Could you respond to his email if you think
> it is unsatisfactory?

Some comments.

      /* If we have a bit-field with a bitsize > 0... */
      if (DECL_BIT_FIELD_TYPE (fld)
	  && tree_low_cst (DECL_SIZE (fld), 1) > 0)

I think we can check bitsize != 0, thus

&& !integer_zerop (DECL_SIZE (fld))

instead.  You don't break groups here with MAX_FIXED_MODE_SIZE, so
I don't think it's ok to do that in the 2nd loop:

	  /* Short-circuit out if we have the max bits allowed.  */
	  if (cumulative_bitsize >= MAX_FIXED_MODE_SIZE)
	    {
	      *maxbits = MAX_FIXED_MODE_SIZE;
	      /* Calculate byte offset to the beginning of the bit region.  */
	      gcc_assert (start_bitpos % BITS_PER_UNIT == 0);
	      *offset = fold_build2 (PLUS_EXPR, TREE_TYPE (start_offset),
				     start_offset,
				     build_int_cst (integer_type_node,
						    start_bitpos / BITS_PER_UNIT));
	      return;

apart from the *offset calculation being redundant, *offset + maxbits
may not include the referenced field.  How do you plan to find
an "optimal" window for such access? (*)

  /* Count the bitsize of the bitregion containing the field in question.  */
  found = false;
  cumulative_bitsize = 0;
  for (fld = bitregion_start; fld; fld = DECL_CHAIN (fld))
    {
      if (TREE_CODE (fld) != FIELD_DECL)
	continue;
      if (fld == field)
	found = true;

      if (DECL_BIT_FIELD_TYPE (fld)
	  && tree_low_cst (DECL_SIZE (fld), 1) > 0)
	{
...
         }
      else if (found)
	break;

should probably be

      if (!DECL_BIT_FIELD_TYPE (fld)
         || integer_zerop (DECL_SIZE (fld)))
        break;

we know that we'll eventually find field.

  /* If we found the end of the bit field sequence, include the
     padding up to the next field...  */
  if (fld)
    {

FLD could be a non-FIELD_DECL; you have to skip those first.

     /* Calculate bitpos and offset of the next field.  */
      get_inner_reference (build3 (COMPONENT_REF,
				   TREE_TYPE (exp),
				   TREE_OPERAND (exp, 0),
				   fld, NULL_TREE),
			   &tbitsize, &end_bitpos, &end_offset,
			   &tmode, &tunsignedp, &tvolatilep, true);
      gcc_assert (end_bitpos % BITS_PER_UNIT == 0);

      if (end_offset)
	{
	  tree type = TREE_TYPE (end_offset), end;

	  /* Calculate byte offset to the end of the bit region.  */
	  end = fold_build2 (PLUS_EXPR, type,
			     end_offset,
			     build_int_cst (type,
					    end_bitpos / BITS_PER_UNIT));
	  maxbits_tree = fold_build2 (MINUS_EXPR, type, end, *offset);
	}
      else
	maxbits_tree = build_int_cst (integer_type_node,
				      end_bitpos - start_bitpos);

      /* ?? Can we get a variable-lengthened offset here ?? */
      gcc_assert (host_integerp (maxbits_tree, 1));
      *maxbits = TREE_INT_CST_LOW (maxbits_tree);

I think you may end up enlarging maxbits to more than
MAX_FIXED_MODE_SIZE here.  What you should do instead (I think)
is something along the lines of

   *maxbits = MIN (MAX_FIXED_MODE_SIZE,
                   *maxbits
                   + (operand_equal_p (DECL_FIELD_OFFSET (fld),
                                       DECL_FIELD_OFFSET (field))
                      ? DECL_FIELD_BIT_OFFSET (fld) - DECL_FIELD_BIT_OFFSET (field)
                      : DECL_OFFSET_ALIGN (field) - DECL_FIELD_BIT_OFFSET (field)));

Note that another complication comes to my mind now - the offset
field of a COMPONENT_REF is used to specify a variable offset
and has to be used, if present, instead of DECL_FIELD_OFFSET.
Thus your building of COMPONENT_REFs to then pass them to
get_inner_reference is broken.  As you are in generic code and not
in the C++ frontend I believe you have to properly handle this case
(may I suggest, at the start of the function, simply returning a
minimum byte-aligned blob when there is a variable offset to the
bitfield?)
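
E.g. something along these lines at the top of get_bit_range (just a
minimal sketch, using the names from your patch):

  /* Variable offset: be maximally conservative and return a
     minimum, byte-aligned region.  */
  if (TREE_OPERAND (exp, 2)
      && !host_integerp (TREE_OPERAND (exp, 2), 1))
    {
      *offset = TREE_OPERAND (exp, 2);
      *maxbits = BITS_PER_UNIT;
      return;
    }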

  /* ...otherwise, this is the last element in the structure.  */
  else
    {
      /* Include the padding at the end of structure.  */
      *maxbits = TREE_INT_CST_LOW (TYPE_SIZE (record_type))
	- TREE_INT_CST_LOW (start_bitpos_direct_parent);
      if (*maxbits > MAX_FIXED_MODE_SIZE)
	*maxbits = MAX_FIXED_MODE_SIZE;
    }

with Andrew's answer this is invalid.  You can (and should) at most do

  else
    *maxbits = (*maxbits + BITS_PER_UNIT - 1) & ~(BITS_PER_UNIT - 1);

thus, round *maxbits up to the next byte.

There is still the general issue of packed bitfields, which will probably
make the issue of the computed group not covering all of the field more
prominent (especially if you limit to MAX_FIXED_MODE_SIZE - consider
struct __attribute__((packed)) { long long : 1; long long a : 64; char
c; }, where a does not fit in a DImode mem but crosses it).  Why constrain
*maxbits to MAX_FIXED_MODE_SIZE at all?  Shouldn't the *offset,
*maxbits pair just constrain what the caller does, not force it to actually
use an access covering that full range (does it?)?
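
For reference, here is that example as a self-contained snippet (the
bit numbers assume the usual layout where DImode is 64 bits wide):

/* With the packed attribute there is no padding: the unnamed 1-bit
   field occupies bit 0 and <a> occupies bits 1..64, so <a> starts
   inside the first 64-bit (DImode) unit but ends one bit into the
   second - no single DImode access can cover it.  */
struct __attribute__ ((packed)) s
{
  long long : 1;      /* bit 0 */
  long long a : 64;   /* bits 1..64: crosses the DImode boundary */
  char c;             /* bits 65..72 */
};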

Richard.

(*) For bitfield lowering we discussed this a bit, and the solution would be
to mirror what place_field does: fill groups until the space for the mode of
the largest field so far is filled (this doesn't work for packed bitfields,
of course).



* Re: [C++0x] contiguous bitfields race implementation
  2011-08-10 13:34                                                     ` Richard Guenther
@ 2011-08-15 19:26                                                       ` Aldy Hernandez
  2011-08-27  0:05                                                         ` Aldy Hernandez
  0 siblings, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-08-15 19:26 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jason Merrill, gcc-patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 4241 bytes --]


> Some comments.
>
>        /* If we have a bit-field with a bitsize > 0... */
>        if (DECL_BIT_FIELD_TYPE (fld)
> 	  && tree_low_cst (DECL_SIZE (fld), 1) > 0)
>
> I think we can check bitsize != 0, thus
>
> && !integer_zerop (DECL_SIZE (fld))

Done.

> 	  /* Short-circuit out if we have the max bits allowed.  */
> 	  if (cumulative_bitsize >= MAX_FIXED_MODE_SIZE)
> 	    {
> 	      *maxbits = MAX_FIXED_MODE_SIZE;
> 	      /* Calculate byte offset to the beginning of the bit region.  */
> 	      gcc_assert (start_bitpos % BITS_PER_UNIT == 0);
> 	      *offset = fold_build2 (PLUS_EXPR, TREE_TYPE (start_offset),
> 				     start_offset,
> 				     build_int_cst (integer_type_node,
> 						    start_bitpos / BITS_PER_UNIT));
> 	      return;
>
> apart from the *offset calculation being redundant, *offset + maxbits
> may not include the referenced field.  How do you plan to find
> an "optimal" window for such access? (*)

Actually, offset is always needed because we use it in store_bit_field()
to adjust the memory reference up to the bit region.  However... I have
removed the MAX_FIXED_MODE_SIZE constraint throughout.  See the
explanation below.
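
For reference, the adjustment on the RTL side now looks roughly like
this (a simplified excerpt from store_bit_field in the patch below):

      /* Re-base the MEM and the bit position so that both are
         relative to the start of the bit region.  */
      offset = tree_low_cst (bitregion_offset, 1);
      bitnum -= offset * BITS_PER_UNIT;
      str_rtx = adjust_address (str_rtx, GET_MODE (str_rtx), offset);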

>        if (!DECL_BIT_FIELD_TYPE (fld)
>           || integer_zerop (DECL_SIZE (fld)))
>          break;
>
> we know that we'll eventually find field.

Good catch.  Done.

>    /* If we found the end of the bit field sequence, include the
>       padding up to the next field...  */
>    if (fld)
>      {
>
> FLD could be a non-FIELD_DECL; you have to skip those first.

I can't find an example of a non-FIELD_DECL here for the life of me. 
I've run numerous tests and can't trigger one.  Do you have one in mind 
so I can handle it?

> Note that another complication comes to my mind now - the offset
> field of a COMPONENT_REF is used to specify a variable offset
> and has to be used, if present, instead of DECL_FIELD_OFFSET.
> Thus your building of COMPONENT_REFs to then pass them to
> get_inner_reference is broken.  As you are in generic code and not
> in the C++ frontend I believe you have to properly handle this case
> (may I suggest, at the start of the function, simply returning a
> minimum byte-aligned blob when there is a variable offset to the
> bitfield?)

Done.

BTW, where would this happen?  I'd like an actual test to stick in
there.  Is this for non-C/C++?

>    /* ...otherwise, this is the last element in the structure.  */
>    else
>      {
>        /* Include the padding at the end of structure.  */
>        *maxbits = TREE_INT_CST_LOW (TYPE_SIZE (record_type))
> 	- TREE_INT_CST_LOW (start_bitpos_direct_parent);
>        if (*maxbits > MAX_FIXED_MODE_SIZE)
> 	*maxbits = MAX_FIXED_MODE_SIZE;
>      }
>
> with Andrew's answer this is invalid.  You can (and should) at most do
>
>    else
>      *maxbits = (*maxbits + BITS_PER_UNIT - 1) & ~(BITS_PER_UNIT - 1);

I have done this, but I have also removed the constraint to 
MAX_FIXED_MODE_SIZE.  See below.

> There is still the general issue of packed bitfields, which will probably
> make the issue of the computed group not covering all of the field more
> prominent (especially if you limit to MAX_FIXED_MODE_SIZE - consider
> struct __attribute__((packed)) { long long : 1; long long a : 64; char
> c; }, where a does not fit in a DImode mem but crosses it).  Why constrain
> *maxbits to MAX_FIXED_MODE_SIZE at all?  Shouldn't the *offset,
> *maxbits pair just constrain what the caller does, not force it to actually
> use an access covering that full range (does it?)?

Well, I have removed the MAX_FIXED_MODE_SIZE restriction, since every 
offset adjustment in the bit field machinery needs a corresponding 
MAXBITS adjustment, and we could otherwise end up reducing MAXBITS into 
negative territory.  So we really have to keep track of the actual 
number of bits.  So, we've converged, just for different reasons :).
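
In other words, whenever the bit field machinery advances the address,
the remaining room in the region shrinks with it, roughly:

  /* Advancing the address by OFFSET bytes leaves that much less
     room in the bit region (0 means no restriction is in play).  */
  if (bitregion_maxbits)
    bitregion_maxbits -= offset * BITS_PER_UNIT;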

I have bootstrapped the compiler with the bitfield restrictions in 
place.  This unearthed a few corner cases, which we are now handling 
correctly with this revision.  So, please take a look at the entire 
patch, as there are small changes throughout.

I am including both the entire patch, and the get_bit_range() function 
separately, to make it easier to review.

Thanks.

[-- Attachment #2: get-bit-range-function --]
[-- Type: text/plain, Size: 4700 bytes --]

/* In the C++ memory model, consecutive non-zero bit fields in a
   structure are considered one memory location.

   Given a COMPONENT_REF, this function calculates the byte offset
   from the containing object to the start of the contiguous bit
   region containing the field in question.  This byte offset is
   returned in *OFFSET.

   The maximum number of bits that can be addressed while storing into
   the COMPONENT_REF is returned in *MAXBITS.  This number is the
   number of bits in the contiguous bit region, including any
   padding.  */
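
/* For instance, given

     struct { char a; int b : 7; int c : 9; char d; } s;

   a store to s.c lies in the bit region that starts at <b> and runs
   up to (but not including) <d>: *OFFSET is the byte offset of that
   region within the struct, and *MAXBITS is its size in bits, padding
   included.  */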

static void
get_bit_range (tree exp, tree *offset, HOST_WIDE_INT *maxbits)
{
  tree field, record_type, fld;
  bool prev_field_is_bitfield;
  tree start_offset;
  tree start_bitpos_direct_parent = NULL_TREE;
  HOST_WIDE_INT start_bitpos;
  HOST_WIDE_INT cumulative_bitsize = 0;
  /* First field of the bitfield group containing the bitfield we are
     referencing.  */
  tree bitregion_start;

  HOST_WIDE_INT tbitsize;
  enum machine_mode tmode;
  int tunsignedp, tvolatilep;

  gcc_assert (TREE_CODE (exp) == COMPONENT_REF);

  /* Be as conservative as possible on variable offsets.  */
  if (TREE_OPERAND (exp, 2)
      && !host_integerp (TREE_OPERAND (exp, 2), 1))
    {
      *offset = TREE_OPERAND (exp, 2);
      *maxbits = BITS_PER_UNIT;
      return;
    }

  /* Bit field we're storing into.  */
  field = TREE_OPERAND (exp, 1);
  record_type = DECL_FIELD_CONTEXT (field);

  /* Find the bitfield group containing the field in question, and set
     BITREGION_START to the start of the group.  */
  prev_field_is_bitfield = false;
  bitregion_start = NULL_TREE;
  for (fld = TYPE_FIELDS (record_type); fld; fld = DECL_CHAIN (fld))
    {
      if (TREE_CODE (fld) != FIELD_DECL)
	continue;

      /* If we have a non-zero bit-field.  */
      if (DECL_BIT_FIELD_TYPE (fld)
	  && !integer_zerop (DECL_SIZE (fld)))
	{
	  if (!prev_field_is_bitfield)
	    {
	      bitregion_start = fld;
	      prev_field_is_bitfield = true;
	    }
	}
      else
	prev_field_is_bitfield = false;
      if (fld == field)
	break;
    }
  gcc_assert (bitregion_start);
  gcc_assert (fld);

  /* Save the starting position of the bitregion.  */
  get_inner_reference (build3 (COMPONENT_REF,
			       TREE_TYPE (exp),
			       TREE_OPERAND (exp, 0),
			       bitregion_start, NULL_TREE),
		       &tbitsize, &start_bitpos, &start_offset,
		       &tmode, &tunsignedp, &tvolatilep, true);

  if (!start_offset)
    start_offset = size_zero_node;
  /* Calculate byte offset to the beginning of the bit region.  */
  /* OFFSET = START_OFFSET + (START_BITPOS / BITS_PER_UNIT) */
  gcc_assert (start_bitpos % BITS_PER_UNIT == 0);
  *offset = fold_build2 (PLUS_EXPR, TREE_TYPE (start_offset),
			 start_offset,
			 build_int_cst (integer_type_node,
					start_bitpos / BITS_PER_UNIT));

  /* Save the bit offset of the current structure.  */
  start_bitpos_direct_parent = DECL_FIELD_BIT_OFFSET (bitregion_start);

  /* Count the bitsize of the bitregion containing the field in question.  */
  cumulative_bitsize = 0;
  for (fld = bitregion_start; fld; fld = DECL_CHAIN (fld))
    {
      if (TREE_CODE (fld) != FIELD_DECL)
	continue;

      if (!DECL_BIT_FIELD_TYPE (fld)
	  || integer_zerop (DECL_SIZE (fld)))
	break;

      cumulative_bitsize += tree_low_cst (DECL_SIZE (fld), 1);
    }

  /* If we found the end of the bit field sequence, include the
     padding up to the next field...  */
  if (fld)
    {
      tree end_offset, maxbits_tree;
      HOST_WIDE_INT end_bitpos;

      /* Calculate bitpos and offset of the next field.  */
      get_inner_reference (build3 (COMPONENT_REF,
				   TREE_TYPE (exp),
				   TREE_OPERAND (exp, 0),
				   fld, NULL_TREE),
			   &tbitsize, &end_bitpos, &end_offset,
			   &tmode, &tunsignedp, &tvolatilep, true);
      gcc_assert (end_bitpos % BITS_PER_UNIT == 0);

      if (end_offset)
	{
	  tree type = TREE_TYPE (end_offset), end;

	  /* Calculate byte offset to the end of the bit region.  */
	  end = fold_build2 (PLUS_EXPR, type,
			     end_offset,
			     build_int_cst (type,
					    end_bitpos / BITS_PER_UNIT));
	  maxbits_tree = fold_build2 (MINUS_EXPR, type, end, *offset);
	}
      else
	maxbits_tree = build_int_cst (integer_type_node,
				      end_bitpos - start_bitpos);

      *maxbits = TREE_INT_CST_LOW (maxbits_tree);
    }
  /* ...otherwise, this is the last element in the structure.  */
  else
    {
      /* Include the padding at the end of structure.  */
      *maxbits = TREE_INT_CST_LOW (TYPE_SIZE (record_type))
	- TREE_INT_CST_LOW (start_bitpos_direct_parent);
      /* Round up to the next byte.  */
      *maxbits = (*maxbits + BITS_PER_UNIT - 1) & ~(BITS_PER_UNIT - 1);
    }
}

[-- Attachment #3: curr --]
[-- Type: text/plain, Size: 36103 bytes --]

	* machmode.h (get_best_mode): Remove 2 arguments.
	* fold-const.c (optimize_bit_field_compare): Same.
	(fold_truthop): Same.
	* expr.c (store_field): Change argument types in prototype.
	(emit_group_store): Change argument types to store_bit_field call.
	(copy_blkmode_from_reg): Same.
	(write_complex_part): Same.
	(optimize_bitfield_assignment_op): Change argument types.
	Change arguments to get_best_mode.
	(get_bit_range): Rewrite.
	(expand_assignment): Adjust new call to get_bit_range.
	Adjust bitregion_offset when to_rtx is changed.
	Adjust calls to store_field with new argument types.
	(store_field): New argument types.
	Adjust calls to store_bit_field with new arguments.
	* expr.h (store_bit_field): Change argument types.
	* stor-layout.c (get_best_mode): Remove use of bitregion* arguments.
	* expmed.c (store_bit_field_1): Change argument types.
	Do not calculate maxbits.
	Adjust bitregion_maxbits if offset changes.
	(store_bit_field): Change argument types.
	Adjust address taking into account bitregion_offset.
	(store_fixed_bit_field): Change argument types.
	Do not calculate maxbits.
	(store_split_bit_field): Change argument types.
	(extract_bit_field_1): Adjust arguments to get_best_mode.
	(extract_fixed_bit_field): Same.

Index: machmode.h
===================================================================
--- machmode.h	(revision 176891)
+++ machmode.h	(working copy)
@@ -249,8 +249,6 @@ extern enum machine_mode mode_for_vector
 /* Find the best mode to use to access a bit field.  */
 
 extern enum machine_mode get_best_mode (int, int,
-					unsigned HOST_WIDE_INT,
-					unsigned HOST_WIDE_INT,
 					unsigned int,
 					enum machine_mode, int);
 
Index: fold-const.c
===================================================================
--- fold-const.c	(revision 176891)
+++ fold-const.c	(working copy)
@@ -3394,7 +3394,7 @@ optimize_bit_field_compare (location_t l
       && flag_strict_volatile_bitfields > 0)
     nmode = lmode;
   else
-    nmode = get_best_mode (lbitsize, lbitpos, 0, 0,
+    nmode = get_best_mode (lbitsize, lbitpos,
 			   const_p ? TYPE_ALIGN (TREE_TYPE (linner))
 			   : MIN (TYPE_ALIGN (TREE_TYPE (linner)),
 				  TYPE_ALIGN (TREE_TYPE (rinner))),
@@ -5221,7 +5221,7 @@ fold_truthop (location_t loc, enum tree_
      to be relative to a field of that size.  */
   first_bit = MIN (ll_bitpos, rl_bitpos);
   end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
-  lnmode = get_best_mode (end_bit - first_bit, first_bit, 0, 0,
+  lnmode = get_best_mode (end_bit - first_bit, first_bit,
 			  TYPE_ALIGN (TREE_TYPE (ll_inner)), word_mode,
 			  volatilep);
   if (lnmode == VOIDmode)
@@ -5286,7 +5286,7 @@ fold_truthop (location_t loc, enum tree_
 
       first_bit = MIN (lr_bitpos, rr_bitpos);
       end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
-      rnmode = get_best_mode (end_bit - first_bit, first_bit, 0, 0,
+      rnmode = get_best_mode (end_bit - first_bit, first_bit,
 			      TYPE_ALIGN (TREE_TYPE (lr_inner)), word_mode,
 			      volatilep);
       if (rnmode == VOIDmode)
Index: testsuite/c-c++-common/cxxbitfields-6.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-6.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-6.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+struct bits
+{
+  char a;
+  int b:7;
+  int :0;
+  volatile int c:7;
+  unsigned char d;
+} x;
+
+/* Store into <c> should not clobber <d>.  */
+void update_c(struct bits *p, int val) 
+{
+    p -> c = val;
+}
+
+/* { dg-final { scan-assembler "movb" } } */
Index: testsuite/c-c++-common/cxxbitfields-8.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-8.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-8.c	(revision 0)
@@ -0,0 +1,29 @@
+/* { dg-do compile { target x86_64-*-* } } */
+/* { dg-options "-O --param allow-store-data-races=0" } */
+
+struct bits {
+  /* Make sure the bit position of the bitfield is larger than what
+     can be represented in an unsigned HOST_WIDE_INT, to force
+     get_inner_reference() to return something in POFFSET.  */
+      
+  struct {
+    int some_padding[1<<30];
+    char more_padding;
+  } pad[1<<29];
+
+  struct {
+    volatile char bitfield :1;
+  } x;
+  char b;
+};
+
+struct bits *p;
+
+/* Test that the store into <bitfield> is not done with something
+   wider than a byte move.  */
+void foo()
+{
+  p->x.bitfield = 1;
+}
+
+/* { dg-final { scan-assembler "movb" } } */
Index: testsuite/c-c++-common/cxxbitfields-7.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-7.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-7.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+struct bits
+{
+  int some_padding;
+  struct {
+    volatile char bitfield :1;
+  } x;
+  char b;
+};
+
+/* Store into <bitfield> should not clobber <b>.  */
+void update(struct bits *p)
+{
+    p->x.bitfield = 1;
+}
+
+/* { dg-final { scan-assembler "movb" } } */
Index: ifcvt.c
===================================================================
--- ifcvt.c	(revision 176891)
+++ ifcvt.c	(working copy)
@@ -885,7 +885,8 @@ noce_emit_move_insn (rtx x, rtx y)
 		}
 
 	      gcc_assert (start < (MEM_P (op) ? BITS_PER_UNIT : BITS_PER_WORD));
-	      store_bit_field (op, size, start, 0, 0, GET_MODE (x), y);
+	      store_bit_field (op, size, start, integer_zero_node, 0,
+			       GET_MODE (x), y);
 	      return;
 	    }
 
@@ -940,7 +941,7 @@ noce_emit_move_insn (rtx x, rtx y)
   outmode = GET_MODE (outer);
   bitpos = SUBREG_BYTE (outer) * BITS_PER_UNIT;
   store_bit_field (inner, GET_MODE_BITSIZE (outmode), bitpos,
-		   0, 0, outmode, y);
+		   integer_zero_node, 0, outmode, y);
 }
 
 /* Return sequence of instructions generated by if conversion.  This
Index: expr.c
===================================================================
--- expr.c	(revision 176891)
+++ expr.c	(working copy)
@@ -145,7 +145,7 @@ static void store_constructor_field (rtx
 				     tree, tree, int, alias_set_type);
 static void store_constructor (tree, rtx, int, HOST_WIDE_INT);
 static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT,
-			unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
+			tree, HOST_WIDE_INT,
 			enum machine_mode,
 			tree, tree, alias_set_type, bool);
 
@@ -2077,7 +2077,7 @@ emit_group_store (rtx orig_dst, rtx src,
 	emit_move_insn (adjust_address (dest, mode, bytepos), tmps[i]);
       else
 	store_bit_field (dest, bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
-			 0, 0, mode, tmps[i]);
+			 integer_zero_node, 0, mode, tmps[i]);
     }
 
   /* Copy from the pseudo into the (probable) hard reg.  */
@@ -2171,7 +2171,8 @@ copy_blkmode_from_reg (rtx tgtblk, rtx s
 
       /* Use xbitpos for the source extraction (right justified) and
 	 bitpos for the destination store (left justified).  */
-      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, 0, 0, copy_mode,
+      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD,
+		       integer_zero_node, 0, copy_mode,
 		       extract_bit_field (src, bitsize,
 					  xbitpos % BITS_PER_WORD, 1, false,
 					  NULL_RTX, copy_mode, copy_mode));
@@ -2808,7 +2809,8 @@ write_complex_part (rtx cplx, rtx val, b
 	gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
     }
 
-  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, 0, 0, imode, val);
+  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0,
+		   integer_zero_node, 0, imode, val);
 }
 
 /* Extract one of the components of the complex value CPLX.  Extract the
@@ -3943,8 +3945,7 @@ get_subtarget (rtx x)
 static bool
 optimize_bitfield_assignment_op (unsigned HOST_WIDE_INT bitsize,
 				 unsigned HOST_WIDE_INT bitpos,
-				 unsigned HOST_WIDE_INT bitregion_start,
-				 unsigned HOST_WIDE_INT bitregion_end,
+				 HOST_WIDE_INT bitregion_maxbits,
 				 enum machine_mode mode1, rtx str_rtx,
 				 tree to, tree src)
 {
@@ -4005,8 +4006,9 @@ optimize_bitfield_assignment_op (unsigne
 
       if (str_bitsize == 0 || str_bitsize > BITS_PER_WORD)
 	str_mode = word_mode;
+      if (bitregion_maxbits && bitregion_maxbits < GET_MODE_BITSIZE (str_mode))
+	str_mode = smallest_mode_for_size (bitregion_maxbits, MODE_INT);
       str_mode = get_best_mode (bitsize, bitpos,
-				bitregion_start, bitregion_end,
 				MEM_ALIGN (str_rtx), str_mode, 0);
       if (str_mode == VOIDmode)
 	return false;
@@ -4115,57 +4117,44 @@ optimize_bitfield_assignment_op (unsigne
   return false;
 }
 
-/* In the C++ memory model, consecutive bit fields in a structure are
-   considered one memory location.
+/* In the C++ memory model, consecutive non-zero bit fields in a
+   structure are considered one memory location.
 
-   Given a COMPONENT_REF, this function returns the bit range of
-   consecutive bits in which this COMPONENT_REF belongs in.  The
-   values are returned in *BITSTART and *BITEND.  If either the C++
-   memory model is not activated, or this memory access is not thread
-   visible, 0 is returned in *BITSTART and *BITEND.
-
-   EXP is the COMPONENT_REF.
-   INNERDECL is the actual object being referenced.
-   BITPOS is the position in bits where the bit starts within the structure.
-   BITSIZE is size in bits of the field being referenced in EXP.
-
-   For example, while storing into FOO.A here...
-
-      struct {
-        BIT 0:
-          unsigned int a : 4;
-	  unsigned int b : 1;
-	BIT 8:
-	  unsigned char c;
-	  unsigned int d : 6;
-      } foo;
-
-   ...we are not allowed to store past <b>, so for the layout above, a
-   range of 0..7 (because no one cares if we store into the
-   padding).  */
+   Given a COMPONENT_REF, this function calculates the byte offset
+   from the containing object to the start of the contiguous bit
+   region containing the field in question.  This byte offset is
+   returned in *OFFSET.
+
+   The maximum number of bits that can be addressed while storing into
+   the COMPONENT_REF is returned in *MAXBITS.  This number is the
+   number of bits in the contiguous bit region, including any
+   padding.  */
 
 static void
-get_bit_range (unsigned HOST_WIDE_INT *bitstart,
-	       unsigned HOST_WIDE_INT *bitend,
-	       tree exp, tree innerdecl,
-	       HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize)
+get_bit_range (tree exp, tree *offset, HOST_WIDE_INT *maxbits)
 {
   tree field, record_type, fld;
-  bool found_field = false;
   bool prev_field_is_bitfield;
+  tree start_offset;
+  tree start_bitpos_direct_parent = NULL_TREE;
+  HOST_WIDE_INT start_bitpos;
+  HOST_WIDE_INT cumulative_bitsize = 0;
+  /* First field of the bitfield group containing the bitfield we are
+     referencing.  */
+  tree bitregion_start;
+
+  HOST_WIDE_INT tbitsize;
+  enum machine_mode tmode;
+  int tunsignedp, tvolatilep;
 
   gcc_assert (TREE_CODE (exp) == COMPONENT_REF);
 
-  /* If other threads can't see this value, no need to restrict stores.  */
-  if (ALLOW_STORE_DATA_RACES
-      || ((TREE_CODE (innerdecl) == MEM_REF
-	   || TREE_CODE (innerdecl) == TARGET_MEM_REF)
-	  && !ptr_deref_may_alias_global_p (TREE_OPERAND (innerdecl, 0)))
-      || (DECL_P (innerdecl)
-	  && (DECL_THREAD_LOCAL_P (innerdecl)
-	      || !TREE_STATIC (innerdecl))))
+  /* Be as conservative as possible on variable offsets.  */
+  if (TREE_OPERAND (exp, 2)
+      && !host_integerp (TREE_OPERAND (exp, 2), 1))
     {
-      *bitstart = *bitend = 0;
+      *offset = TREE_OPERAND (exp, 2);
+      *maxbits = BITS_PER_UNIT;
       return;
     }
 
@@ -4173,56 +4162,109 @@ get_bit_range (unsigned HOST_WIDE_INT *b
   field = TREE_OPERAND (exp, 1);
   record_type = DECL_FIELD_CONTEXT (field);
 
-  /* Count the contiguous bitfields for the memory location that
-     contains FIELD.  */
-  *bitstart = 0;
-  prev_field_is_bitfield = true;
+  /* Find the bitfield group containing the field in question, and set
+     BITREGION_START to the start of the group.  */
+  prev_field_is_bitfield = false;
+  bitregion_start = NULL_TREE;
   for (fld = TYPE_FIELDS (record_type); fld; fld = DECL_CHAIN (fld))
     {
-      tree t, offset;
-      enum machine_mode mode;
-      int unsignedp, volatilep;
-
       if (TREE_CODE (fld) != FIELD_DECL)
 	continue;
 
-      t = build3 (COMPONENT_REF, TREE_TYPE (exp),
-		  unshare_expr (TREE_OPERAND (exp, 0)),
-		  fld, NULL_TREE);
-      get_inner_reference (t, &bitsize, &bitpos, &offset,
-			   &mode, &unsignedp, &volatilep, true);
-
-      if (field == fld)
-	found_field = true;
-
-      if (DECL_BIT_FIELD_TYPE (fld) && bitsize > 0)
+      /* If we have a non-zero bit-field.  */
+      if (DECL_BIT_FIELD_TYPE (fld)
+	  && !integer_zerop (DECL_SIZE (fld)))
 	{
-	  if (prev_field_is_bitfield == false)
+	  if (!prev_field_is_bitfield)
 	    {
-	      *bitstart = bitpos;
+	      bitregion_start = fld;
 	      prev_field_is_bitfield = true;
 	    }
 	}
       else
-	{
-	  prev_field_is_bitfield = false;
-	  if (found_field)
-	    break;
-	}
+	prev_field_is_bitfield = false;
+      if (fld == field)
+	break;
     }
-  gcc_assert (found_field);
+  gcc_assert (bitregion_start);
+  gcc_assert (fld);
 
+  /* Save the starting position of the bitregion.  */
+  get_inner_reference (build3 (COMPONENT_REF,
+			       TREE_TYPE (exp),
+			       TREE_OPERAND (exp, 0),
+			       bitregion_start, NULL_TREE),
+		       &tbitsize, &start_bitpos, &start_offset,
+		       &tmode, &tunsignedp, &tvolatilep, true);
+
+  if (!start_offset)
+    start_offset = size_zero_node;
+  /* Calculate byte offset to the beginning of the bit region.  */
+  /* OFFSET = START_OFFSET + (START_BITPOS / BITS_PER_UNIT) */
+  gcc_assert (start_bitpos % BITS_PER_UNIT == 0);
+  *offset = fold_build2 (PLUS_EXPR, TREE_TYPE (start_offset),
+			 start_offset,
+			 build_int_cst (integer_type_node,
+					start_bitpos / BITS_PER_UNIT));
+
+  /* Save the bit offset of the current structure.  */
+  start_bitpos_direct_parent = DECL_FIELD_BIT_OFFSET (bitregion_start);
+
+  /* Count the bitsize of the bitregion containing the field in question.  */
+  cumulative_bitsize = 0;
+  for (fld = bitregion_start; fld; fld = DECL_CHAIN (fld))
+    {
+      if (TREE_CODE (fld) != FIELD_DECL)
+	continue;
+
+      if (!DECL_BIT_FIELD_TYPE (fld)
+	  || integer_zerop (DECL_SIZE (fld)))
+	break;
+
+      cumulative_bitsize += tree_low_cst (DECL_SIZE (fld), 1);
+    }
+
+  /* If we found the end of the bit field sequence, include the
+     padding up to the next field...  */
   if (fld)
     {
-      /* We found the end of the bit field sequence.  Include the
-	 padding up to the next field and be done.  */
-      *bitend = bitpos - 1;
+      tree end_offset, maxbits_tree;
+      HOST_WIDE_INT end_bitpos;
+
+      /* Calculate bitpos and offset of the next field.  */
+      get_inner_reference (build3 (COMPONENT_REF,
+				   TREE_TYPE (exp),
+				   TREE_OPERAND (exp, 0),
+				   fld, NULL_TREE),
+			   &tbitsize, &end_bitpos, &end_offset,
+			   &tmode, &tunsignedp, &tvolatilep, true);
+      gcc_assert (end_bitpos % BITS_PER_UNIT == 0);
+
+      if (end_offset)
+	{
+	  tree type = TREE_TYPE (end_offset), end;
+
+	  /* Calculate byte offset to the end of the bit region.  */
+	  end = fold_build2 (PLUS_EXPR, type,
+			     end_offset,
+			     build_int_cst (type,
+					    end_bitpos / BITS_PER_UNIT));
+	  maxbits_tree = fold_build2 (MINUS_EXPR, type, end, *offset);
+	}
+      else
+	maxbits_tree = build_int_cst (integer_type_node,
+				      end_bitpos - start_bitpos);
+
+      *maxbits = TREE_INT_CST_LOW (maxbits_tree);
     }
+  /* ...otherwise, this is the last element in the structure.  */
   else
     {
-      /* If this is the last element in the structure, include the padding
-	 at the end of structure.  */
-      *bitend = TREE_INT_CST_LOW (TYPE_SIZE (record_type)) - 1;
+      /* Include the padding at the end of structure.  */
+      *maxbits = TREE_INT_CST_LOW (TYPE_SIZE (record_type))
+	- TREE_INT_CST_LOW (start_bitpos_direct_parent);
+      /* Round up to the next byte.  */
+      *maxbits = (*maxbits + BITS_PER_UNIT - 1) & ~(BITS_PER_UNIT - 1);
     }
 }
 
@@ -4324,12 +4366,14 @@ expand_assignment (tree to, tree from, b
     {
       enum machine_mode mode1;
       HOST_WIDE_INT bitsize, bitpos;
-      unsigned HOST_WIDE_INT bitregion_start = 0;
-      unsigned HOST_WIDE_INT bitregion_end = 0;
       tree offset;
       int unsignedp;
       int volatilep = 0;
       tree tem;
+      tree bitregion_offset = size_zero_node;
+      /* Set to 0 for the special case where there is no restriction
+	 in play.  */
+      HOST_WIDE_INT bitregion_maxbits = 0;
 
       push_temp_slots ();
       tem = get_inner_reference (to, &bitsize, &bitpos, &offset, &mode1,
@@ -4337,8 +4381,26 @@ expand_assignment (tree to, tree from, b
 
       if (TREE_CODE (to) == COMPONENT_REF
 	  && DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1)))
-	get_bit_range (&bitregion_start, &bitregion_end,
-		       to, tem, bitpos, bitsize);
+	{
+	  /* If other threads can't see this value, no need to
+	     restrict stores.  */
+	  if (ALLOW_STORE_DATA_RACES
+	      || ((TREE_CODE (tem) == MEM_REF
+		   || TREE_CODE (tem) == TARGET_MEM_REF)
+		  && !ptr_deref_may_alias_global_p (TREE_OPERAND (tem, 0)))
+	      || TREE_CODE (tem) == RESULT_DECL
+	      || (DECL_P (tem)
+		  && (DECL_THREAD_LOCAL_P (tem)
+		      || !TREE_STATIC (tem))))
+	    {
+	      bitregion_offset = size_zero_node;
+	      /* Set to 0 for the special case where there is no
+		 restriction in play.  */
+	      bitregion_maxbits = 0;
+	    }
+	  else
+	    get_bit_range (to, &bitregion_offset, &bitregion_maxbits);
+	}
 
       /* If we are going to use store_bit_field and extract_bit_field,
 	 make sure to_rtx will be safe for multiple use.  */
@@ -4388,12 +4450,19 @@ expand_assignment (tree to, tree from, b
 	      && MEM_ALIGN (to_rtx) == GET_MODE_ALIGNMENT (mode1))
 	    {
 	      to_rtx = adjust_address (to_rtx, mode1, bitpos / BITS_PER_UNIT);
+	      bitregion_offset = fold_build2 (MINUS_EXPR, integer_type_node,
+					      bitregion_offset,
+					      build_int_cst (integer_type_node,
+							     bitpos / BITS_PER_UNIT));
 	      bitpos = 0;
 	    }
 
 	  to_rtx = offset_address (to_rtx, offset_rtx,
 				   highest_pow2_factor_for_target (to,
 				   				   offset));
+	  bitregion_offset = fold_build2 (MINUS_EXPR, integer_type_node,
+					  bitregion_offset,
+					  offset);
 	}
 
       /* No action is needed if the target is not a memory and the field
@@ -4421,13 +4490,13 @@ expand_assignment (tree to, tree from, b
 				 nontemporal);
 	  else if (bitpos + bitsize <= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 0), bitsize, bitpos,
-				  bitregion_start, bitregion_end,
+				  bitregion_offset, bitregion_maxbits,
 				  mode1, from, TREE_TYPE (tem),
 				  get_alias_set (to), nontemporal);
 	  else if (bitpos >= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 1), bitsize,
 				  bitpos - mode_bitsize / 2,
-				  bitregion_start, bitregion_end,
+				  bitregion_offset, bitregion_maxbits,
 				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
@@ -4450,7 +4519,7 @@ expand_assignment (tree to, tree from, b
 	      write_complex_part (temp, XEXP (to_rtx, 0), false);
 	      write_complex_part (temp, XEXP (to_rtx, 1), true);
 	      result = store_field (temp, bitsize, bitpos,
-				    bitregion_start, bitregion_end,
+				    bitregion_offset, bitregion_maxbits,
 				    mode1, from,
 				    TREE_TYPE (tem), get_alias_set (to),
 				    nontemporal);
@@ -4477,13 +4546,13 @@ expand_assignment (tree to, tree from, b
 	    }
 
 	  if (optimize_bitfield_assignment_op (bitsize, bitpos,
-					       bitregion_start, bitregion_end,
+					       bitregion_maxbits,
 					       mode1,
 					       to_rtx, to, from))
 	    result = NULL;
 	  else
 	    result = store_field (to_rtx, bitsize, bitpos,
-				  bitregion_start, bitregion_end,
+				  bitregion_offset, bitregion_maxbits,
 				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
@@ -4877,7 +4946,7 @@ store_expr (tree exp, rtx target, int ca
 			      : BLOCK_OP_NORMAL));
 	  else if (GET_MODE (target) == BLKmode)
 	    store_bit_field (target, INTVAL (expr_size (exp)) * BITS_PER_UNIT,
-			     0, 0, 0, GET_MODE (temp), temp);
+			     0, integer_zero_node, 0, GET_MODE (temp), temp);
 	  else
 	    convert_move (target, temp, unsignedp);
 	}
@@ -5342,8 +5411,8 @@ store_constructor_field (rtx target, uns
       store_constructor (exp, target, cleared, bitsize / BITS_PER_UNIT);
     }
   else
-    store_field (target, bitsize, bitpos, 0, 0, mode, exp, type, alias_set,
-		 false);
+    store_field (target, bitsize, bitpos, integer_zero_node, 0, mode, exp,
+		 type, alias_set, false);
 }
 
 /* Store the value of constructor EXP into the rtx TARGET.
@@ -5917,10 +5986,10 @@ store_constructor (tree exp, rtx target,
    BITSIZE bits, starting BITPOS bits from the start of TARGET.
    If MODE is VOIDmode, it means that we are storing into a bit-field.
 
-   BITREGION_START is bitpos of the first bitfield in this region.
-   BITREGION_END is the bitpos of the ending bitfield in this region.
-   These two fields are 0, if the C++ memory model does not apply,
-   or we are not interested in keeping track of bitfield regions.
+   BITREGION_OFFSET is the byte offset from the beginning of the
+   containing object to the start of the bit region.
+   BITREGION_MAXBITS is the size in bits of the largest mode that can
+   be used to set the bit-field in question.
 
    Always return const0_rtx unless we have something particular to
    return.
@@ -5935,8 +6004,8 @@ store_constructor (tree exp, rtx target,
 
 static rtx
 store_field (rtx target, HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
-	     unsigned HOST_WIDE_INT bitregion_start,
-	     unsigned HOST_WIDE_INT bitregion_end,
+	     tree bitregion_offset,
+	     HOST_WIDE_INT bitregion_maxbits,
 	     enum machine_mode mode, tree exp, tree type,
 	     alias_set_type alias_set, bool nontemporal)
 {
@@ -5970,7 +6039,7 @@ store_field (rtx target, HOST_WIDE_INT b
 	emit_move_insn (object, target);
 
       store_field (blk_object, bitsize, bitpos,
-		   bitregion_start, bitregion_end,
+		   bitregion_offset, bitregion_maxbits,
 		   mode, exp, type, alias_set, nontemporal);
 
       emit_move_insn (target, object);
@@ -6086,7 +6155,7 @@ store_field (rtx target, HOST_WIDE_INT b
 
       /* Store the value in the bitfield.  */
       store_bit_field (target, bitsize, bitpos,
-		       bitregion_start, bitregion_end,
+		       bitregion_offset, bitregion_maxbits,
 		       mode, temp);
 
       return const0_rtx;
Index: expr.h
===================================================================
--- expr.h	(revision 176891)
+++ expr.h	(working copy)
@@ -666,8 +666,8 @@ mode_for_extraction (enum extraction_pat
 
 extern void store_bit_field (rtx, unsigned HOST_WIDE_INT,
 			     unsigned HOST_WIDE_INT,
-			     unsigned HOST_WIDE_INT,
-			     unsigned HOST_WIDE_INT,
+			     tree,
+			     HOST_WIDE_INT,
 			     enum machine_mode, rtx);
 extern rtx extract_bit_field (rtx, unsigned HOST_WIDE_INT,
 			      unsigned HOST_WIDE_INT, int, bool, rtx,
Index: stor-layout.c
===================================================================
--- stor-layout.c	(revision 176891)
+++ stor-layout.c	(working copy)
@@ -2361,13 +2361,6 @@ fixup_unsigned_type (tree type)
 /* Find the best machine mode to use when referencing a bit field of length
    BITSIZE bits starting at BITPOS.
 
-   BITREGION_START is the bit position of the first bit in this
-   sequence of bit fields.  BITREGION_END is the last bit in this
-   sequence.  If these two fields are non-zero, we should restrict the
-   memory access to a maximum sized chunk of
-   BITREGION_END - BITREGION_START + 1.  Otherwise, we are allowed to touch
-   any adjacent non bit-fields.
-
    The underlying object is known to be aligned to a boundary of ALIGN bits.
    If LARGEST_MODE is not VOIDmode, it means that we should not use a mode
    larger than LARGEST_MODE (usually SImode).
@@ -2386,20 +2379,11 @@ fixup_unsigned_type (tree type)
 
 enum machine_mode
 get_best_mode (int bitsize, int bitpos,
-	       unsigned HOST_WIDE_INT bitregion_start,
-	       unsigned HOST_WIDE_INT bitregion_end,
 	       unsigned int align,
 	       enum machine_mode largest_mode, int volatilep)
 {
   enum machine_mode mode;
   unsigned int unit = 0;
-  unsigned HOST_WIDE_INT maxbits;
-
-  /* If unset, no restriction.  */
-  if (!bitregion_end)
-    maxbits = MAX_FIXED_MODE_SIZE;
-  else
-    maxbits = (bitregion_end - bitregion_start) % align + 1;
 
   /* Find the narrowest integer mode that contains the bit field.  */
   for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
@@ -2436,7 +2420,6 @@ get_best_mode (int bitsize, int bitpos,
 	      && bitpos / unit == (bitpos + bitsize - 1) / unit
 	      && unit <= BITS_PER_WORD
 	      && unit <= MIN (align, BIGGEST_ALIGNMENT)
-	      && unit <= maxbits
 	      && (largest_mode == VOIDmode
 		  || unit <= GET_MODE_BITSIZE (largest_mode)))
 	    wide_mode = tmode;
Index: expmed.c
===================================================================
--- expmed.c	(revision 176891)
+++ expmed.c	(working copy)
@@ -48,13 +48,11 @@ struct target_expmed *this_target_expmed
 static void store_fixed_bit_field (rtx, unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT,
+				   HOST_WIDE_INT,
 				   rtx);
 static void store_split_bit_field (rtx, unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT,
+				   HOST_WIDE_INT,
 				   rtx);
 static rtx extract_fixed_bit_field (enum machine_mode, rtx,
 				    unsigned HOST_WIDE_INT,
@@ -340,8 +338,7 @@ mode_for_extraction (enum extraction_pat
 static bool
 store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 		   unsigned HOST_WIDE_INT bitnum,
-		   unsigned HOST_WIDE_INT bitregion_start,
-		   unsigned HOST_WIDE_INT bitregion_end,
+		   HOST_WIDE_INT bitregion_maxbits,
 		   enum machine_mode fieldmode,
 		   rtx value, bool fallback_p)
 {
@@ -558,7 +555,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	  if (!store_bit_field_1 (op0, MIN (BITS_PER_WORD,
 					    bitsize - i * BITS_PER_WORD),
 				  bitnum + bit_offset,
-				  bitregion_start, bitregion_end,
+				  bitregion_maxbits,
 				  word_mode,
 				  value_word, fallback_p))
 	    {
@@ -722,10 +719,6 @@ store_bit_field_1 (rtx str_rtx, unsigned
   if (HAVE_insv && MEM_P (op0))
     {
       enum machine_mode bestmode;
-      unsigned HOST_WIDE_INT maxbits = MAX_FIXED_MODE_SIZE;
-
-      if (bitregion_end)
-	maxbits = bitregion_end - bitregion_start + 1;
 
       /* Get the mode to use for inserting into this field.  If OP0 is
 	 BLKmode, get the smallest mode consistent with the alignment. If
@@ -733,15 +726,19 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	 mode. Otherwise, use the smallest mode containing the field.  */
 
       if (GET_MODE (op0) == BLKmode
-	  || GET_MODE_BITSIZE (GET_MODE (op0)) > maxbits
+	  || (bitregion_maxbits
+	      && GET_MODE_BITSIZE (GET_MODE (op0)) > bitregion_maxbits)
 	  || (op_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (op_mode)))
-	bestmode = get_best_mode  (bitsize, bitnum,
-				  bitregion_start, bitregion_end,
-				  MEM_ALIGN (op0),
-				  (op_mode == MAX_MACHINE_MODE
-				   ? VOIDmode : op_mode),
-				  MEM_VOLATILE_P (op0));
+	{
+	  bestmode = (op_mode == MAX_MACHINE_MODE ? VOIDmode : op_mode);
+	  if (bitregion_maxbits && bitregion_maxbits < GET_MODE_SIZE (op_mode))
+	    bestmode = smallest_mode_for_size (bitregion_maxbits, MODE_INT);
+	  bestmode = get_best_mode  (bitsize, bitnum,
+				     MEM_ALIGN (op0),
+				     bestmode,
+				     MEM_VOLATILE_P (op0));
+	}
       else
 	bestmode = GET_MODE (op0);
 
@@ -752,6 +749,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	{
 	  rtx last, tempreg, xop0;
 	  unsigned HOST_WIDE_INT xoffset, xbitpos;
+	  HOST_WIDE_INT xmaxbits = bitregion_maxbits;
 
 	  last = get_last_insn ();
 
@@ -762,12 +760,13 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	  xoffset = (bitnum / unit) * GET_MODE_SIZE (bestmode);
 	  xbitpos = bitnum % unit;
 	  xop0 = adjust_address (op0, bestmode, xoffset);
+	  if (xmaxbits)
+	    xmaxbits -= xoffset * BITS_PER_UNIT;
 
 	  /* Fetch that unit, store the bitfield in it, then store
 	     the unit.  */
 	  tempreg = copy_to_reg (xop0);
-	  if (store_bit_field_1 (tempreg, bitsize, xbitpos,
-				 bitregion_start, bitregion_end,
+	  if (store_bit_field_1 (tempreg, bitsize, xbitpos, xmaxbits,
 				 fieldmode, orig_value, false))
 	    {
 	      emit_move_insn (xop0, tempreg);
@@ -780,8 +779,10 @@ store_bit_field_1 (rtx str_rtx, unsigned
   if (!fallback_p)
     return false;
 
+  if (bitregion_maxbits)
+    bitregion_maxbits -= offset * BITS_PER_UNIT;
   store_fixed_bit_field (op0, offset, bitsize, bitpos,
-			 bitregion_start, bitregion_end, value);
+			 bitregion_maxbits, value);
   return true;
 }
 
@@ -789,18 +790,17 @@ store_bit_field_1 (rtx str_rtx, unsigned
    into a bit-field within structure STR_RTX
    containing BITSIZE bits starting at bit BITNUM.
 
-   BITREGION_START is bitpos of the first bitfield in this region.
-   BITREGION_END is the bitpos of the ending bitfield in this region.
-   These two fields are 0, if the C++ memory model does not apply,
-   or we are not interested in keeping track of bitfield regions.
+   BITREGION_OFFSET is the byte offset STR_RTX to the start of the bit
+   region.  BITREGION_MAXBITS is the number of bits of the largest
+   mode that can be used to set the bit-field in question.
 
    FIELDMODE is the machine-mode of the FIELD_DECL node for this field.  */
 
 void
 store_bit_field (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 		 unsigned HOST_WIDE_INT bitnum,
-		 unsigned HOST_WIDE_INT bitregion_start,
-		 unsigned HOST_WIDE_INT bitregion_end,
+		 tree bitregion_offset,
+		 HOST_WIDE_INT bitregion_maxbits,
 		 enum machine_mode fieldmode,
 		 rtx value)
 {
@@ -808,30 +808,29 @@ store_bit_field (rtx str_rtx, unsigned H
      bit region.  Adjust the address to start at the beginning of the
      bit region.  */
   if (MEM_P (str_rtx)
-      && bitregion_start > 0)
+      && bitregion_maxbits
+      && !integer_zerop (bitregion_offset))
     {
-      enum machine_mode bestmode;
-      enum machine_mode op_mode;
-      unsigned HOST_WIDE_INT offset;
+      HOST_WIDE_INT offset;
 
-      op_mode = mode_for_extraction (EP_insv, 3);
-      if (op_mode == MAX_MACHINE_MODE)
-	op_mode = VOIDmode;
-
-      offset = bitregion_start / BITS_PER_UNIT;
-      bitnum -= bitregion_start;
-      bitregion_end -= bitregion_start;
-      bitregion_start = 0;
-      bestmode = get_best_mode (bitsize, bitnum,
-				bitregion_start, bitregion_end,
-				MEM_ALIGN (str_rtx),
-				op_mode,
-				MEM_VOLATILE_P (str_rtx));
-      str_rtx = adjust_address (str_rtx, bestmode, offset);
+      if (host_integerp (bitregion_offset, 1))
+	{
+	  /* Adjust the bit position accordingly.  */
+	  offset = tree_low_cst (bitregion_offset, 1);
+	  bitnum -= offset * BITS_PER_UNIT;
+	  /* Adjust the actual address.  */
+	  str_rtx = adjust_address (str_rtx, GET_MODE (str_rtx), offset);
+	}
+      else
+	{
+	  /* Handle variable length offsets.  */
+	  str_rtx = offset_address (str_rtx,
+				    expand_normal (bitregion_offset), 1);
+	}
+      bitregion_offset = integer_zero_node;
     }
 
-  if (!store_bit_field_1 (str_rtx, bitsize, bitnum,
-			  bitregion_start, bitregion_end,
+  if (!store_bit_field_1 (str_rtx, bitsize, bitnum, bitregion_maxbits,
 			  fieldmode, value, true))
     gcc_unreachable ();
 }
@@ -849,8 +848,7 @@ static void
 store_fixed_bit_field (rtx op0, unsigned HOST_WIDE_INT offset,
 		       unsigned HOST_WIDE_INT bitsize,
 		       unsigned HOST_WIDE_INT bitpos,
-		       unsigned HOST_WIDE_INT bitregion_start,
-		       unsigned HOST_WIDE_INT bitregion_end,
+		       HOST_WIDE_INT bitregion_maxbits,
 		       rtx value)
 {
   enum machine_mode mode;
@@ -872,19 +870,12 @@ store_fixed_bit_field (rtx op0, unsigned
       /* Special treatment for a bit field split across two registers.  */
       if (bitsize + bitpos > BITS_PER_WORD)
 	{
-	  store_split_bit_field (op0, bitsize, bitpos,
-				 bitregion_start, bitregion_end,
-				 value);
+	  store_split_bit_field (op0, bitsize, bitpos, bitregion_maxbits, value);
 	  return;
 	}
     }
   else
     {
-      unsigned HOST_WIDE_INT maxbits = MAX_FIXED_MODE_SIZE;
-
-      if (bitregion_end)
-	maxbits = bitregion_end - bitregion_start + 1;
-
       /* Get the proper mode to use for this field.  We want a mode that
 	 includes the entire field.  If such a mode would be larger than
 	 a word, we won't be doing the extraction the normal way.
@@ -897,20 +888,26 @@ store_fixed_bit_field (rtx op0, unsigned
 
       if (MEM_VOLATILE_P (op0)
           && GET_MODE_BITSIZE (GET_MODE (op0)) > 0
-	  && GET_MODE_BITSIZE (GET_MODE (op0)) <= maxbits
+	  && (!bitregion_maxbits
+	      || GET_MODE_BITSIZE (GET_MODE (op0)) <= bitregion_maxbits)
 	  && flag_strict_volatile_bitfields > 0)
 	mode = GET_MODE (op0);
       else
-	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
-			      bitregion_start, bitregion_end,
-			      MEM_ALIGN (op0), mode, MEM_VOLATILE_P (op0));
+	{
+	  if (bitregion_maxbits && bitregion_maxbits < GET_MODE_BITSIZE (mode))
+	    mode = smallest_mode_for_size (bitregion_maxbits, MODE_INT);
+	  mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
+				MEM_ALIGN (op0), mode, MEM_VOLATILE_P (op0));
+	}
 
       if (mode == VOIDmode)
 	{
+	  if (bitregion_maxbits)
+	    bitregion_maxbits -= offset * BITS_PER_UNIT;
 	  /* The only way this should occur is if the field spans word
 	     boundaries.  */
 	  store_split_bit_field (op0, bitsize, bitpos + offset * BITS_PER_UNIT,
-				 bitregion_start, bitregion_end, value);
+				 bitregion_maxbits, value);
 	  return;
 	}
 
@@ -1031,8 +1028,7 @@ store_fixed_bit_field (rtx op0, unsigned
 static void
 store_split_bit_field (rtx op0, unsigned HOST_WIDE_INT bitsize,
 		       unsigned HOST_WIDE_INT bitpos,
-		       unsigned HOST_WIDE_INT bitregion_start,
-		       unsigned HOST_WIDE_INT bitregion_end,
+		       HOST_WIDE_INT bitregion_maxbits,
 		       rtx value)
 {
   unsigned int unit;
@@ -1147,8 +1143,13 @@ store_split_bit_field (rtx op0, unsigned
 	 store_fixed_bit_field wants offset in bytes.  If WORD is const0_rtx,
 	 it is just an out-of-bounds access.  Ignore it.  */
       if (word != const0_rtx)
-	store_fixed_bit_field (word, offset * unit / BITS_PER_UNIT, thissize,
-			       thispos, bitregion_start, bitregion_end, part);
+	{
+	  HOST_WIDE_INT xmaxbits = bitregion_maxbits;
+	  if (bitregion_maxbits)
+	    xmaxbits -= offset * unit / BITS_PER_UNIT;
+	  store_fixed_bit_field (word, offset * unit / BITS_PER_UNIT, thissize,
+				 thispos, xmaxbits, part);
+	}
       bitsdone += thissize;
     }
 }
@@ -1588,7 +1589,7 @@ extract_bit_field_1 (rtx str_rtx, unsign
       if (GET_MODE (op0) == BLKmode
 	  || (ext_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (ext_mode)))
-	bestmode = get_best_mode (bitsize, bitnum, 0, 0, MEM_ALIGN (op0),
+	bestmode = get_best_mode (bitsize, bitnum, MEM_ALIGN (op0),
 				  (ext_mode == MAX_MACHINE_MODE
 				   ? VOIDmode : ext_mode),
 				  MEM_VOLATILE_P (op0));
@@ -1714,7 +1715,7 @@ extract_fixed_bit_field (enum machine_mo
 	    mode = tmode;
 	}
       else
-	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT, 0, 0,
+	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
 			      MEM_ALIGN (op0), word_mode, MEM_VOLATILE_P (op0));
 
       if (mode == VOIDmode)


* Re: [C++0x] contiguous bitfields race implementation
  2011-08-15 19:26                                                       ` Aldy Hernandez
@ 2011-08-27  0:05                                                         ` Aldy Hernandez
  2011-08-29 12:54                                                           ` Richard Guenther
  0 siblings, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-08-27  0:05 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jason Merrill, gcc-patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 2028 bytes --]

This is a "slight" update from the last revision, with your issues 
addressed as I explained in the last email.  However, everything turned 
out to be much trickier than I expected (variable-length offsets with 
arrays, bit fields spanning multiple words, surprising padding 
gymnastics by GCC, etc.).

It turns out that what we need is to know the precise bit region size at 
all times, and adjust it as we rearrange and cut things into pieces 
throughout the RTL bit field machinery.

I enabled the C++ memory model and forced a bootstrap and regression 
test with it.  This turned up many interesting cases, which I was 
able to distill and add to the testsuite.

Of particular interest were the struct-layout-1.exp tests.  Since many of 
the tests set a global bit field, only to later check it against a local 
variable containing the same value, they are the perfect stressor: 
globals are restricted under the memory model, while locals are not.  So 
we can check that we interoperate with the less restrictive model, and 
that the patch does not introduce ABI inconsistencies (a distilled 
example follows below).  After much grief, we are now passing all the 
struct-layout-1.exp tests.  Eventually, I'd like to force the 
struct-layout-1.exp tests to also run with "--param 
allow-store-data-races=0".  Unfortunately, this will increase testing 
time.
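
To illustrate the shape of those tests, here is a hand-distilled,
hypothetical example (not one of the generated tests):

struct S { int a : 7; int b : 9; char c; } g;  /* global: stores restricted */

int
main (void)
{
  struct S l;            /* local: wide stores are still allowed */
  g.b = 33;
  l.b = 33;
  if (g.b != l.b)        /* both models must agree on the values */
    __builtin_abort ();
  return 0;
}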

I have (unfortunately) introduced an additional call to 
get_inner_reference(), but only for the field itself (one time).  I 
can't remember the details, but it was something to the effect of the 
bit position plus padding being impossible to calculate in one 
variable-array-reference case.  I can dig up the case if you'd like.

I am currently tackling a reload miscompilation failure while building a 
32-bit library.  I am secretly hoping your review will uncover the flaw 
without me having to pick this up.  Otherwise, this is a much more 
comprehensive approach than what is currently in mainline, and we now 
pass all the bitfield tests the GCC testsuite could throw at it.

Fire away.

[-- Attachment #2: curr --]
[-- Type: text/plain, Size: 44911 bytes --]

	* machmode.h (get_best_mode): Remove 2 arguments.
	* fold-const.c (optimize_bit_field_compare): Same.
	(fold_truthop): Same.
	* expr.c (store_field): Change argument types in prototype.
	(emit_group_store): Change argument types to store_bit_field call.
	(copy_blkmode_from_reg): Same.
	(write_complex_part): Same.
	(optimize_bitfield_assignment_op): Change argument types.
	Change arguments to get_best_mode.
	(get_bit_range): Rewrite.
	(expand_assignment): Adjust new call to get_bit_range.
	Adjust bitregion_offset when to_rtx is changed.
	Adjust calls to store_field with new argument types.
	(store_field): New argument types.
	Adjust calls to store_bit_field with new arguments.
	* expr.h (store_bit_field): Change argument types.
	* stor-layout.c (get_best_mode): Remove use of bitregion* arguments.
	* expmed.c (store_bit_field_1): Change argument types.
	Do not calculate maxbits.
	Adjust bitregion_maxbits if offset changes.
	(store_bit_field): Change argument types.
	Adjust address taking into account bitregion_offset.
	(store_fixed_bit_field): Change argument types.
	Do not calculate maxbits.
	(store_split_bit_field): Change argument types.
	(extract_bit_field_1): Adjust arguments to get_best_mode.
	(extract_fixed_bit_field): Same.

Index: machmode.h
===================================================================
--- machmode.h	(revision 176891)
+++ machmode.h	(working copy)
@@ -249,8 +249,6 @@ extern enum machine_mode mode_for_vector
 /* Find the best mode to use to access a bit field.  */
 
 extern enum machine_mode get_best_mode (int, int,
-					unsigned HOST_WIDE_INT,
-					unsigned HOST_WIDE_INT,
 					unsigned int,
 					enum machine_mode, int);
 
Index: fold-const.c
===================================================================
--- fold-const.c	(revision 176891)
+++ fold-const.c	(working copy)
@@ -3394,7 +3394,7 @@ optimize_bit_field_compare (location_t l
       && flag_strict_volatile_bitfields > 0)
     nmode = lmode;
   else
-    nmode = get_best_mode (lbitsize, lbitpos, 0, 0,
+    nmode = get_best_mode (lbitsize, lbitpos,
 			   const_p ? TYPE_ALIGN (TREE_TYPE (linner))
 			   : MIN (TYPE_ALIGN (TREE_TYPE (linner)),
 				  TYPE_ALIGN (TREE_TYPE (rinner))),
@@ -5221,7 +5221,7 @@ fold_truthop (location_t loc, enum tree_
      to be relative to a field of that size.  */
   first_bit = MIN (ll_bitpos, rl_bitpos);
   end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
-  lnmode = get_best_mode (end_bit - first_bit, first_bit, 0, 0,
+  lnmode = get_best_mode (end_bit - first_bit, first_bit,
 			  TYPE_ALIGN (TREE_TYPE (ll_inner)), word_mode,
 			  volatilep);
   if (lnmode == VOIDmode)
@@ -5286,7 +5286,7 @@ fold_truthop (location_t loc, enum tree_
 
       first_bit = MIN (lr_bitpos, rr_bitpos);
       end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
-      rnmode = get_best_mode (end_bit - first_bit, first_bit, 0, 0,
+      rnmode = get_best_mode (end_bit - first_bit, first_bit,
 			      TYPE_ALIGN (TREE_TYPE (lr_inner)), word_mode,
 			      volatilep);
       if (rnmode == VOIDmode)
Index: testsuite/c-c++-common/cxxbitfields-9.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-9.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-9.c	(revision 0)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+
+enum bigenum
+{ bigee = 12345678901LL
+};
+
+struct objtype
+{
+  enum bigenum a;
+  int b:25;
+  int c:15;
+  signed char d;
+  unsigned int e[3] __attribute__ ((aligned));
+  int f;
+};
+
+struct objtype obj;
+
+void foo(){
+  obj.c = 33;
+}
Index: testsuite/c-c++-common/cxxbitfields-10.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-10.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-10.c	(revision 0)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+
+/* Variable length offsets with the bit field not ending the record.  */
+   
+typedef struct
+{
+  short f:3, g:3, h:10;
+  char xxx;
+} small;
+
+struct sometype
+{
+  int i;
+  small s[10];
+} x;
+
+int main ()
+{
+  int i;
+  for (i = 0; i < 10; i++)
+    x.s[i].f = 0;
+  return 0;
+}
Index: testsuite/c-c++-common/cxxbitfields-12.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-12.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-12.c	(revision 0)
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+
+struct stuff_type
+{
+  double a;
+  int b:27;
+  int c:9;
+  int d:9;
+  unsigned char e;
+} stuff;
+
+void foo(){
+stuff.d = 3;
+}
Index: testsuite/c-c++-common/cxxbitfields-14.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-14.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-14.c	(revision 0)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "--param allow-store-data-races=0" } */
+
+enum E0 { e0_0 };
+
+enum E2 { e2_m3 = -3, e2_m2, e2_m1, e2_0, e2_1, e2_2, e2_3 };
+
+struct S757
+{ 
+  enum E0 a;
+  enum E2 b:17;
+  enum E2 c:17;
+  unsigned char d;
+};
+
+struct S757 s757;
+
+int main()
+{   
+    s757.c = e2_m2;
+    return 0;
+}
+
+/* Make sure we don't load/store a full 32-bits.  */
+/* { dg-final { scan-assembler "movb" } } */
Index: testsuite/c-c++-common/cxxbitfields-6.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-6.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-6.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+struct bits
+{
+  char a;
+  int b:7;
+  int :0;
+  volatile int c:7;
+  unsigned char d;
+} x;
+
+/* Store into <c> should not clobber <d>.  */
+void update_c(struct bits *p, int val) 
+{
+    p -> c = val;
+}
+
+/* { dg-final { scan-assembler "movb" } } */
Index: testsuite/c-c++-common/cxxbitfields-8.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-8.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-8.c	(revision 0)
@@ -0,0 +1,29 @@
+/* { dg-do compile { target x86_64-*-* } } */
+/* { dg-options "-O --param allow-store-data-races=0" } */
+
+struct bits {
+  /* Make sure the bit position of the bitfield is larger than what
+     can be represented in an unsigned HOST_WIDE_INT, to force
+     get_inner_reference() to return something in POFFSET.  */
+      
+  struct {
+    int some_padding[1<<30];
+    char more_padding;
+  } pad[1<<29];
+
+  struct {
+    volatile char bitfield :1;
+  } x;
+  char b;
+};
+
+struct bits *p;
+
+/* Test that the store into <bitfield> is not done with something
+   wider than a byte move.  */
+void foo()
+{
+  p->x.bitfield = 1;
+}
+
+/* { dg-final { scan-assembler "movb" } } */
Index: testsuite/c-c++-common/cxxbitfields-11.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-11.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-11.c	(revision 0)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+
+struct S1075
+{
+  unsigned short int a;
+  unsigned long long int b:29;
+  unsigned long long int c:35;
+  unsigned long long int d:31;
+  unsigned long long int e:50;
+  char *f;
+};
+
+struct S1075 blob;
+void foo(){
+blob.d=55;
+}
Index: testsuite/c-c++-common/cxxbitfields-13.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-13.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-13.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "--param allow-store-data-races=0" } */
+
+/* Test bit fields that are split across word boundaries.  */
+
+struct footype
+{
+    int c:9;
+    int d:9;
+    char e;
+} foo;
+
+void funky()
+{
+    foo.d = 88;
+}
+
+/* Make sure we don't load/store a full 32-bits.  */
+/* { dg-final { scan-assembler-not "movl\[ \t\]foo" } } */
Index: testsuite/c-c++-common/cxxbitfields-7.c
===================================================================
--- testsuite/c-c++-common/cxxbitfields-7.c	(revision 0)
+++ testsuite/c-c++-common/cxxbitfields-7.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 --param allow-store-data-races=0" } */
+
+struct bits
+{
+  int some_padding;
+  struct {
+    volatile char bitfield :1;
+  } x;
+  char b;
+};
+
+/* Store into <bitfield> should not clobber <b>.  */
+void update(struct bits *p)
+{
+    p->x.bitfield = 1;
+}
+
+/* { dg-final { scan-assembler "movb" } } */
Index: ifcvt.c
===================================================================
--- ifcvt.c	(revision 176891)
+++ ifcvt.c	(working copy)
@@ -885,7 +885,8 @@ noce_emit_move_insn (rtx x, rtx y)
 		}
 
 	      gcc_assert (start < (MEM_P (op) ? BITS_PER_UNIT : BITS_PER_WORD));
-	      store_bit_field (op, size, start, 0, 0, GET_MODE (x), y);
+	      store_bit_field (op, size, start, integer_zero_node, 0, 0,
+			       GET_MODE (x), y);
 	      return;
 	    }
 
@@ -940,7 +941,7 @@ noce_emit_move_insn (rtx x, rtx y)
   outmode = GET_MODE (outer);
   bitpos = SUBREG_BYTE (outer) * BITS_PER_UNIT;
   store_bit_field (inner, GET_MODE_BITSIZE (outmode), bitpos,
-		   0, 0, outmode, y);
+		   integer_zero_node, 0, 0, outmode, y);
 }
 
 /* Return sequence of instructions generated by if conversion.  This
Index: expr.c
===================================================================
--- expr.c	(revision 176891)
+++ expr.c	(working copy)
@@ -145,7 +145,7 @@ static void store_constructor_field (rtx
 				     tree, tree, int, alias_set_type);
 static void store_constructor (tree, rtx, int, HOST_WIDE_INT);
 static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT,
-			unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
+			tree, HOST_WIDE_INT, HOST_WIDE_INT,
 			enum machine_mode,
 			tree, tree, alias_set_type, bool);
 
@@ -2077,7 +2077,7 @@ emit_group_store (rtx orig_dst, rtx src,
 	emit_move_insn (adjust_address (dest, mode, bytepos), tmps[i]);
       else
 	store_bit_field (dest, bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT,
-			 0, 0, mode, tmps[i]);
+			 integer_zero_node, 0, 0, mode, tmps[i]);
     }
 
   /* Copy from the pseudo into the (probable) hard reg.  */
@@ -2171,7 +2171,8 @@ copy_blkmode_from_reg (rtx tgtblk, rtx s
 
       /* Use xbitpos for the source extraction (right justified) and
 	 bitpos for the destination store (left justified).  */
-      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD, 0, 0, copy_mode,
+      store_bit_field (dst, bitsize, bitpos % BITS_PER_WORD,
+		       integer_zero_node, 0, 0, copy_mode,
 		       extract_bit_field (src, bitsize,
 					  xbitpos % BITS_PER_WORD, 1, false,
 					  NULL_RTX, copy_mode, copy_mode));
@@ -2808,7 +2809,8 @@ write_complex_part (rtx cplx, rtx val, b
 	gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
     }
 
-  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, 0, 0, imode, val);
+  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0,
+		   integer_zero_node, 0, 0, imode, val);
 }
 
 /* Extract one of the components of the complex value CPLX.  Extract the
@@ -3943,8 +3945,7 @@ get_subtarget (rtx x)
 static bool
 optimize_bitfield_assignment_op (unsigned HOST_WIDE_INT bitsize,
 				 unsigned HOST_WIDE_INT bitpos,
-				 unsigned HOST_WIDE_INT bitregion_start,
-				 unsigned HOST_WIDE_INT bitregion_end,
+				 HOST_WIDE_INT bitregion_maxbits,
 				 enum machine_mode mode1, rtx str_rtx,
 				 tree to, tree src)
 {
@@ -4005,8 +4006,9 @@ optimize_bitfield_assignment_op (unsigne
 
       if (str_bitsize == 0 || str_bitsize > BITS_PER_WORD)
 	str_mode = word_mode;
+      if (bitregion_maxbits && bitregion_maxbits < GET_MODE_BITSIZE (str_mode))
+	str_mode = get_max_mode (bitregion_maxbits);
       str_mode = get_best_mode (bitsize, bitpos,
-				bitregion_start, bitregion_end,
 				MEM_ALIGN (str_rtx), str_mode, 0);
       if (str_mode == VOIDmode)
 	return false;
@@ -4115,114 +4117,184 @@ optimize_bitfield_assignment_op (unsigne
   return false;
 }
 
-/* In the C++ memory model, consecutive bit fields in a structure are
-   considered one memory location.
+/* In the C++ memory model, consecutive non-zero bit fields in a
+   structure are considered one memory location.
 
-   Given a COMPONENT_REF, this function returns the bit range of
-   consecutive bits in which this COMPONENT_REF belongs in.  The
-   values are returned in *BITSTART and *BITEND.  If either the C++
-   memory model is not activated, or this memory access is not thread
-   visible, 0 is returned in *BITSTART and *BITEND.
-
-   EXP is the COMPONENT_REF.
-   INNERDECL is the actual object being referenced.
-   BITPOS is the position in bits where the bit starts within the structure.
-   BITSIZE is size in bits of the field being referenced in EXP.
-
-   For example, while storing into FOO.A here...
-
-      struct {
-        BIT 0:
-          unsigned int a : 4;
-	  unsigned int b : 1;
-	BIT 8:
-	  unsigned char c;
-	  unsigned int d : 6;
-      } foo;
-
-   ...we are not allowed to store past <b>, so for the layout above, a
-   range of 0..7 (because no one cares if we store into the
-   padding).  */
+   Given a COMPONENT_REF, this function calculates the byte offset
+   from the containing object to the start of the contiguous bit
+   region containing the field in question.  This byte offset is
+   returned in *BYTE_OFFSET.
+
+   The bit offset from the start of the bit region to the bit field in
+   question is returned in *BIT_OFFSET.
+
+   The maximum number of bits that can be addressed while storing into
+   the COMPONENT_REF is returned in *MAXBITS.  This number is the
+   number of bits in the contiguous bit region, including any
+   padding.  */
 
 static void
-get_bit_range (unsigned HOST_WIDE_INT *bitstart,
-	       unsigned HOST_WIDE_INT *bitend,
-	       tree exp, tree innerdecl,
-	       HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize)
+get_bit_range (tree exp, tree *byte_offset, HOST_WIDE_INT *bit_offset,
+	       HOST_WIDE_INT *maxbits)
 {
   tree field, record_type, fld;
-  bool found_field = false;
   bool prev_field_is_bitfield;
+  tree start_offset;
+  HOST_WIDE_INT start_bitpos;
+  /* First field of the bitfield group containing the bitfield we are
+     referencing.  */
+  tree bitregion_start;
 
-  gcc_assert (TREE_CODE (exp) == COMPONENT_REF);
+  HOST_WIDE_INT tbitsize;
+  enum machine_mode tmode;
+  int tunsignedp, tvolatilep;
 
-  /* If other threads can't see this value, no need to restrict stores.  */
-  if (ALLOW_STORE_DATA_RACES
-      || ((TREE_CODE (innerdecl) == MEM_REF
-	   || TREE_CODE (innerdecl) == TARGET_MEM_REF)
-	  && !ptr_deref_may_alias_global_p (TREE_OPERAND (innerdecl, 0)))
-      || (DECL_P (innerdecl)
-	  && (DECL_THREAD_LOCAL_P (innerdecl)
-	      || !TREE_STATIC (innerdecl))))
-    {
-      *bitstart = *bitend = 0;
-      return;
-    }
+  gcc_assert (TREE_CODE (exp) == COMPONENT_REF);
 
   /* Bit field we're storing into.  */
   field = TREE_OPERAND (exp, 1);
   record_type = DECL_FIELD_CONTEXT (field);
 
-  /* Count the contiguous bitfields for the memory location that
-     contains FIELD.  */
-  *bitstart = 0;
-  prev_field_is_bitfield = true;
+  /* Find the bitfield group containing the field in question, and set
+     BITREGION_START to the start of the group.  */
+  prev_field_is_bitfield = false;
+  bitregion_start = NULL_TREE;
   for (fld = TYPE_FIELDS (record_type); fld; fld = DECL_CHAIN (fld))
     {
-      tree t, offset;
-      enum machine_mode mode;
-      int unsignedp, volatilep;
-
       if (TREE_CODE (fld) != FIELD_DECL)
 	continue;
 
-      t = build3 (COMPONENT_REF, TREE_TYPE (exp),
-		  unshare_expr (TREE_OPERAND (exp, 0)),
-		  fld, NULL_TREE);
-      get_inner_reference (t, &bitsize, &bitpos, &offset,
-			   &mode, &unsignedp, &volatilep, true);
-
-      if (field == fld)
-	found_field = true;
-
-      if (DECL_BIT_FIELD_TYPE (fld) && bitsize > 0)
+      /* If we have a non-zero bit-field.  */
+      if (DECL_BIT_FIELD_TYPE (fld)
+	  && !integer_zerop (DECL_SIZE (fld)))
 	{
-	  if (prev_field_is_bitfield == false)
+	  if (!prev_field_is_bitfield)
 	    {
-	      *bitstart = bitpos;
+	      bitregion_start = fld;
 	      prev_field_is_bitfield = true;
 	    }
 	}
       else
+	prev_field_is_bitfield = false;
+      if (fld == field)
+	break;
+    }
+  gcc_assert (bitregion_start);
+  gcc_assert (fld);
+
+  /* Save the starting position of the bitregion.  */
+  get_inner_reference (build3 (COMPONENT_REF,
+			       TREE_TYPE (exp),
+			       TREE_OPERAND (exp, 0),
+			       bitregion_start, NULL_TREE),
+		       &tbitsize, &start_bitpos, &start_offset,
+		       &tmode, &tunsignedp, &tvolatilep, true);
+
+  if (!start_offset)
+    start_offset = size_zero_node;
+  /* Calculate byte offset to the beginning of the bit region.  */
+  /* BYTE_OFFSET = START_OFFSET + (START_BITPOS / BITS_PER_UNIT) */
+  gcc_assert (start_bitpos % BITS_PER_UNIT == 0);
+  *byte_offset = fold_build2 (PLUS_EXPR, TREE_TYPE (start_offset),
+			      start_offset,
+			      build_int_cst (integer_type_node,
+					     start_bitpos / BITS_PER_UNIT));
+
+  /* Calculate the starting bit offset and find the end of the bit
+     region.  */
+  for (fld = bitregion_start; fld; fld = DECL_CHAIN (fld))
+    {
+      if (TREE_CODE (fld) != FIELD_DECL)
+	continue;
+
+      if (!DECL_BIT_FIELD_TYPE (fld)
+	  || integer_zerop (DECL_SIZE (fld)))
+	break;
+
+      if (fld == field)
 	{
-	  prev_field_is_bitfield = false;
-	  if (found_field)
-	    break;
+	  tree t = DECL_FIELD_OFFSET (fld);
+	  tree bits = build_int_cst (integer_type_node, BITS_PER_UNIT);
+	  HOST_WIDE_INT tbitpos;
+	  tree toffset;
+
+	  get_inner_reference (build3 (COMPONENT_REF,
+				       TREE_TYPE (exp),
+				       TREE_OPERAND (exp, 0),
+				       fld, NULL_TREE),
+			       &tbitsize, &tbitpos, &toffset,
+			       &tmode, &tunsignedp, &tvolatilep, true);
+
+	  if (!toffset)
+	    toffset = size_zero_node;
+
+	  /* bitoff = start_byte * 8 - (fld.byteoff * 8 + fld.bitoff) */
+	  t = fold_build2 (MINUS_EXPR, size_type_node,
+			   fold_build2 (PLUS_EXPR, size_type_node,
+					fold_build2 (MULT_EXPR, size_type_node,
+						     toffset, bits),
+					build_int_cst (integer_type_node,
+						       tbitpos)),
+			   fold_build2 (MULT_EXPR, size_type_node,
+					*byte_offset, bits));
+
+	  *bit_offset = tree_low_cst (t, 1);
 	}
     }
-  gcc_assert (found_field);
 
+  /* Be as conservative as possible on variable offsets.  */
+  if (TREE_OPERAND (exp, 2)
+      && !host_integerp (TREE_OPERAND (exp, 2), 1))
+    {
+      *byte_offset = TREE_OPERAND (exp, 2);
+      *maxbits = BITS_PER_UNIT;
+      return;
+    }
+
+  /* If we found the end of the bit field sequence, include the
+     padding up to the next field...  */
   if (fld)
     {
-      /* We found the end of the bit field sequence.  Include the
-	 padding up to the next field and be done.  */
-      *bitend = bitpos - 1;
+      tree end_offset, maxbits_tree;
+      HOST_WIDE_INT end_bitpos;
+
+      /* Calculate bitpos and offset of the next field.  */
+      get_inner_reference (build3 (COMPONENT_REF,
+				   TREE_TYPE (exp),
+				   TREE_OPERAND (exp, 0),
+				   fld, NULL_TREE),
+			   &tbitsize, &end_bitpos, &end_offset,
+			   &tmode, &tunsignedp, &tvolatilep, true);
+      gcc_assert (end_bitpos % BITS_PER_UNIT == 0);
+
+      if (end_offset)
+	{
+	  tree type = TREE_TYPE (end_offset);
+
+	  maxbits_tree = fold_build2 (PLUS_EXPR, type,
+				      build2 (MULT_EXPR, type,
+					      build2 (MINUS_EXPR, type,
+						      end_offset,
+						      *byte_offset),
+					      build_int_cst (size_type_node,
+							     BITS_PER_UNIT)),
+				      build_int_cst (size_type_node,
+						     end_bitpos));
+	}
+      else
+	maxbits_tree = build_int_cst (integer_type_node,
+				      end_bitpos - start_bitpos);
+
+      *maxbits = TREE_INT_CST_LOW (maxbits_tree);
     }
+  /* ...otherwise, this is the last element in the structure.  */
   else
     {
-      /* If this is the last element in the structure, include the padding
-	 at the end of structure.  */
-      *bitend = TREE_INT_CST_LOW (TYPE_SIZE (record_type)) - 1;
+      /* Include the padding at the end of structure.  */
+      *maxbits = TREE_INT_CST_LOW (TYPE_SIZE (record_type))
+	- TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (bitregion_start));
+      /* Round up to the next byte.  */
+      *maxbits = (*maxbits + BITS_PER_UNIT - 1) & ~(BITS_PER_UNIT - 1);
     }
 }
 
@@ -4324,12 +4396,15 @@ expand_assignment (tree to, tree from, b
     {
       enum machine_mode mode1;
       HOST_WIDE_INT bitsize, bitpos;
-      unsigned HOST_WIDE_INT bitregion_start = 0;
-      unsigned HOST_WIDE_INT bitregion_end = 0;
       tree offset;
       int unsignedp;
       int volatilep = 0;
       tree tem;
+      tree bitregion_byte_offset = size_zero_node;
+      HOST_WIDE_INT bitregion_bit_offset = 0;
+      /* Set to 0 for the special case where there is no restriction
+	 in play.  */
+      HOST_WIDE_INT bitregion_maxbits = 0;
 
       push_temp_slots ();
       tem = get_inner_reference (to, &bitsize, &bitpos, &offset, &mode1,
@@ -4337,8 +4412,30 @@ expand_assignment (tree to, tree from, b
 
       if (TREE_CODE (to) == COMPONENT_REF
 	  && DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1)))
-	get_bit_range (&bitregion_start, &bitregion_end,
-		       to, tem, bitpos, bitsize);
+	{
+	  /* If other threads can't see this value, no need to
+	     restrict stores.  */
+	  if (ALLOW_STORE_DATA_RACES
+	      || ((TREE_CODE (tem) == MEM_REF
+		   || TREE_CODE (tem) == TARGET_MEM_REF)
+		  && !ptr_deref_may_alias_global_p (TREE_OPERAND (tem, 0)))
+	      || TREE_CODE (tem) == RESULT_DECL
+	      || TREE_CODE (tem) == PARM_DECL
+	      || (DECL_P (tem)
+		  && ((TREE_CODE (tem) == VAR_DECL
+		       && DECL_THREAD_LOCAL_P (tem))
+		      || !TREE_STATIC (tem))))
+	    {
+	      bitregion_byte_offset = size_zero_node;
+	      bitregion_bit_offset = 0;
+	      /* Set to 0 for the special case where there is no
+		 restriction in play.  */
+	      bitregion_maxbits = 0;
+	    }
+	  else
+	    get_bit_range (to, &bitregion_byte_offset,
+			   &bitregion_bit_offset, &bitregion_maxbits);
+	}
 
       /* If we are going to use store_bit_field and extract_bit_field,
 	 make sure to_rtx will be safe for multiple use.  */
@@ -4388,6 +4485,10 @@ expand_assignment (tree to, tree from, b
 	      && MEM_ALIGN (to_rtx) == GET_MODE_ALIGNMENT (mode1))
 	    {
 	      to_rtx = adjust_address (to_rtx, mode1, bitpos / BITS_PER_UNIT);
+	      bitregion_byte_offset = fold_build2 (MINUS_EXPR, integer_type_node,
+					      bitregion_byte_offset,
+					      build_int_cst (integer_type_node,
+							     bitpos / BITS_PER_UNIT));
 	      bitpos = 0;
 	    }
 
@@ -4421,13 +4522,15 @@ expand_assignment (tree to, tree from, b
 				 nontemporal);
 	  else if (bitpos + bitsize <= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 0), bitsize, bitpos,
-				  bitregion_start, bitregion_end,
+				  bitregion_byte_offset, bitregion_bit_offset,
+				  bitregion_maxbits,
 				  mode1, from, TREE_TYPE (tem),
 				  get_alias_set (to), nontemporal);
 	  else if (bitpos >= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 1), bitsize,
 				  bitpos - mode_bitsize / 2,
-				  bitregion_start, bitregion_end,
+				  bitregion_byte_offset, bitregion_bit_offset,
+				  bitregion_maxbits,
 				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
@@ -4450,7 +4553,8 @@ expand_assignment (tree to, tree from, b
 	      write_complex_part (temp, XEXP (to_rtx, 0), false);
 	      write_complex_part (temp, XEXP (to_rtx, 1), true);
 	      result = store_field (temp, bitsize, bitpos,
-				    bitregion_start, bitregion_end,
+				    bitregion_byte_offset, bitregion_bit_offset,
+				    bitregion_maxbits,
 				    mode1, from,
 				    TREE_TYPE (tem), get_alias_set (to),
 				    nontemporal);
@@ -4477,13 +4581,14 @@ expand_assignment (tree to, tree from, b
 	    }
 
 	  if (optimize_bitfield_assignment_op (bitsize, bitpos,
-					       bitregion_start, bitregion_end,
+					       bitregion_maxbits,
 					       mode1,
 					       to_rtx, to, from))
 	    result = NULL;
 	  else
 	    result = store_field (to_rtx, bitsize, bitpos,
-				  bitregion_start, bitregion_end,
+				  bitregion_byte_offset, bitregion_bit_offset,
+				  bitregion_maxbits,
 				  mode1, from,
 				  TREE_TYPE (tem), get_alias_set (to),
 				  nontemporal);
@@ -4877,7 +4982,7 @@ store_expr (tree exp, rtx target, int ca
 			      : BLOCK_OP_NORMAL));
 	  else if (GET_MODE (target) == BLKmode)
 	    store_bit_field (target, INTVAL (expr_size (exp)) * BITS_PER_UNIT,
-			     0, 0, 0, GET_MODE (temp), temp);
+			     0, integer_zero_node, 0, 0, GET_MODE (temp), temp);
 	  else
 	    convert_move (target, temp, unsignedp);
 	}
@@ -5342,8 +5447,8 @@ store_constructor_field (rtx target, uns
       store_constructor (exp, target, cleared, bitsize / BITS_PER_UNIT);
     }
   else
-    store_field (target, bitsize, bitpos, 0, 0, mode, exp, type, alias_set,
-		 false);
+    store_field (target, bitsize, bitpos, integer_zero_node, 0, 0, mode, exp,
+		 type, alias_set, false);
 }
 
 /* Store the value of constructor EXP into the rtx TARGET.
@@ -5917,10 +6022,14 @@ store_constructor (tree exp, rtx target,
    BITSIZE bits, starting BITPOS bits from the start of TARGET.
    If MODE is VOIDmode, it means that we are storing into a bit-field.
 
-   BITREGION_START is bitpos of the first bitfield in this region.
-   BITREGION_END is the bitpos of the ending bitfield in this region.
-   These two fields are 0, if the C++ memory model does not apply,
-   or we are not interested in keeping track of bitfield regions.
+   BITREGION_BYTE_OFFSET is the byte offset from the beginning of the
+   containing object to the start of the bit region.
+
+   BITREGION_BIT_OFFSET is the bit offset from the start of the bit
+   region.
+
+   BITREGION_MAXBITS is the size of the bit region containing the bit
+   field in question.
 
    Always return const0_rtx unless we have something particular to
    return.
@@ -5935,8 +6044,9 @@ store_constructor (tree exp, rtx target,
 
 static rtx
 store_field (rtx target, HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
-	     unsigned HOST_WIDE_INT bitregion_start,
-	     unsigned HOST_WIDE_INT bitregion_end,
+	     tree bitregion_byte_offset,
+	     HOST_WIDE_INT bitregion_bit_offset,
+	     HOST_WIDE_INT bitregion_maxbits,
 	     enum machine_mode mode, tree exp, tree type,
 	     alias_set_type alias_set, bool nontemporal)
 {
@@ -5970,7 +6080,8 @@ store_field (rtx target, HOST_WIDE_INT b
 	emit_move_insn (object, target);
 
       store_field (blk_object, bitsize, bitpos,
-		   bitregion_start, bitregion_end,
+		   bitregion_byte_offset, bitregion_bit_offset,
+		   bitregion_maxbits,
 		   mode, exp, type, alias_set, nontemporal);
 
       emit_move_insn (target, object);
@@ -6086,7 +6197,8 @@ store_field (rtx target, HOST_WIDE_INT b
 
       /* Store the value in the bitfield.  */
       store_bit_field (target, bitsize, bitpos,
-		       bitregion_start, bitregion_end,
+		       bitregion_byte_offset, bitregion_bit_offset,
+		       bitregion_maxbits,
 		       mode, temp);
 
       return const0_rtx;
@@ -7497,7 +7609,8 @@ expand_expr_real_2 (sepops ops, rtx targ
 						    (treeop0))
 				 * BITS_PER_UNIT),
 				(HOST_WIDE_INT) GET_MODE_BITSIZE (mode)),
-			   0, 0, 0, TYPE_MODE (valtype), treeop0,
+			   0, integer_zero_node, 0, 0,
+			   TYPE_MODE (valtype), treeop0,
 			   type, 0, false);
 	    }
 
Index: expr.h
===================================================================
--- expr.h	(revision 176891)
+++ expr.h	(working copy)
@@ -664,10 +664,12 @@ enum extraction_pattern { EP_insv, EP_ex
 extern enum machine_mode
 mode_for_extraction (enum extraction_pattern, int);
 
+extern enum machine_mode get_max_mode (HOST_WIDE_INT);
 extern void store_bit_field (rtx, unsigned HOST_WIDE_INT,
 			     unsigned HOST_WIDE_INT,
-			     unsigned HOST_WIDE_INT,
-			     unsigned HOST_WIDE_INT,
+			     tree,
+			     HOST_WIDE_INT,
+			     HOST_WIDE_INT,
 			     enum machine_mode, rtx);
 extern rtx extract_bit_field (rtx, unsigned HOST_WIDE_INT,
 			      unsigned HOST_WIDE_INT, int, bool, rtx,
Index: stor-layout.c
===================================================================
--- stor-layout.c	(revision 176891)
+++ stor-layout.c	(working copy)
@@ -2361,13 +2361,6 @@ fixup_unsigned_type (tree type)
 /* Find the best machine mode to use when referencing a bit field of length
    BITSIZE bits starting at BITPOS.
 
-   BITREGION_START is the bit position of the first bit in this
-   sequence of bit fields.  BITREGION_END is the last bit in this
-   sequence.  If these two fields are non-zero, we should restrict the
-   memory access to a maximum sized chunk of
-   BITREGION_END - BITREGION_START + 1.  Otherwise, we are allowed to touch
-   any adjacent non bit-fields.
-
    The underlying object is known to be aligned to a boundary of ALIGN bits.
    If LARGEST_MODE is not VOIDmode, it means that we should not use a mode
    larger than LARGEST_MODE (usually SImode).
@@ -2386,20 +2379,11 @@ fixup_unsigned_type (tree type)
 
 enum machine_mode
 get_best_mode (int bitsize, int bitpos,
-	       unsigned HOST_WIDE_INT bitregion_start,
-	       unsigned HOST_WIDE_INT bitregion_end,
 	       unsigned int align,
 	       enum machine_mode largest_mode, int volatilep)
 {
   enum machine_mode mode;
   unsigned int unit = 0;
-  unsigned HOST_WIDE_INT maxbits;
-
-  /* If unset, no restriction.  */
-  if (!bitregion_end)
-    maxbits = MAX_FIXED_MODE_SIZE;
-  else
-    maxbits = (bitregion_end - bitregion_start) % align + 1;
 
   /* Find the narrowest integer mode that contains the bit field.  */
   for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
@@ -2436,7 +2420,6 @@ get_best_mode (int bitsize, int bitpos,
 	      && bitpos / unit == (bitpos + bitsize - 1) / unit
 	      && unit <= BITS_PER_WORD
 	      && unit <= MIN (align, BIGGEST_ALIGNMENT)
-	      && unit <= maxbits
 	      && (largest_mode == VOIDmode
 		  || unit <= GET_MODE_BITSIZE (largest_mode)))
 	    wide_mode = tmode;
Index: calls.c
===================================================================
--- calls.c	(revision 176891)
+++ calls.c	(working copy)
@@ -924,7 +924,8 @@ store_unaligned_arguments_into_pseudos (
 	    emit_move_insn (reg, const0_rtx);
 
 	    bytes -= bitsize / BITS_PER_UNIT;
-	    store_bit_field (reg, bitsize, endian_correction, 0, 0,
+	    store_bit_field (reg, bitsize, endian_correction,
+			     integer_zero_node, 0, 0,
 			     word_mode, word);
 	  }
       }
Index: expmed.c
===================================================================
--- expmed.c	(revision 176891)
+++ expmed.c	(working copy)
@@ -48,13 +48,11 @@ struct target_expmed *this_target_expmed
 static void store_fixed_bit_field (rtx, unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT,
+				   HOST_WIDE_INT,
 				   rtx);
 static void store_split_bit_field (rtx, unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT,
+				   HOST_WIDE_INT,
 				   rtx);
 static rtx extract_fixed_bit_field (enum machine_mode, rtx,
 				    unsigned HOST_WIDE_INT,
@@ -340,8 +338,7 @@ mode_for_extraction (enum extraction_pat
 static bool
 store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 		   unsigned HOST_WIDE_INT bitnum,
-		   unsigned HOST_WIDE_INT bitregion_start,
-		   unsigned HOST_WIDE_INT bitregion_end,
+		   HOST_WIDE_INT bitregion_maxbits,
 		   enum machine_mode fieldmode,
 		   rtx value, bool fallback_p)
 {
@@ -558,7 +555,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	  if (!store_bit_field_1 (op0, MIN (BITS_PER_WORD,
 					    bitsize - i * BITS_PER_WORD),
 				  bitnum + bit_offset,
-				  bitregion_start, bitregion_end,
+				  bitregion_maxbits,
 				  word_mode,
 				  value_word, fallback_p))
 	    {
@@ -722,10 +719,6 @@ store_bit_field_1 (rtx str_rtx, unsigned
   if (HAVE_insv && MEM_P (op0))
     {
       enum machine_mode bestmode;
-      unsigned HOST_WIDE_INT maxbits = MAX_FIXED_MODE_SIZE;
-
-      if (bitregion_end)
-	maxbits = bitregion_end - bitregion_start + 1;
 
       /* Get the mode to use for inserting into this field.  If OP0 is
 	 BLKmode, get the smallest mode consistent with the alignment. If
@@ -733,15 +726,20 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	 mode. Otherwise, use the smallest mode containing the field.  */
 
       if (GET_MODE (op0) == BLKmode
-	  || GET_MODE_BITSIZE (GET_MODE (op0)) > maxbits
+	  || (bitregion_maxbits
+	      && GET_MODE_BITSIZE (GET_MODE (op0)) > bitregion_maxbits)
 	  || (op_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (op_mode)))
-	bestmode = get_best_mode  (bitsize, bitnum,
-				  bitregion_start, bitregion_end,
-				  MEM_ALIGN (op0),
-				  (op_mode == MAX_MACHINE_MODE
-				   ? VOIDmode : op_mode),
-				  MEM_VOLATILE_P (op0));
+	{
+	  bestmode = (op_mode == MAX_MACHINE_MODE ? VOIDmode : op_mode);
+	  if (bitregion_maxbits
+	      && bitregion_maxbits < GET_MODE_BITSIZE (op_mode))
+	    bestmode = get_max_mode (bitregion_maxbits);
+	  bestmode = get_best_mode  (bitsize, bitnum,
+				     MEM_ALIGN (op0),
+				     bestmode,
+				     MEM_VOLATILE_P (op0));
+	}
       else
 	bestmode = GET_MODE (op0);
 
@@ -752,6 +750,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	{
 	  rtx last, tempreg, xop0;
 	  unsigned HOST_WIDE_INT xoffset, xbitpos;
+	  HOST_WIDE_INT xmaxbits = bitregion_maxbits;
 
 	  last = get_last_insn ();
 
@@ -762,13 +761,24 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	  xoffset = (bitnum / unit) * GET_MODE_SIZE (bestmode);
 	  xbitpos = bitnum % unit;
 	  xop0 = adjust_address (op0, bestmode, xoffset);
+	  if (xmaxbits)
+	    xmaxbits -= xoffset * BITS_PER_UNIT;
 
 	  /* Fetch that unit, store the bitfield in it, then store
 	     the unit.  */
 	  tempreg = copy_to_reg (xop0);
-	  if (store_bit_field_1 (tempreg, bitsize, xbitpos,
-				 bitregion_start, bitregion_end,
-				 fieldmode, orig_value, false))
+	  if (xmaxbits && unit > xmaxbits)
+	    {
+	      /* Do not allow reading past the bit region.
+		 Technically, you can read past the bitregion, because
+		 load data races are allowed.  You just can't write
+		 past the bit region.
+
+		 ?? Perhaps allow reading, and adjust everything else
+		 accordingly.  Ughh. */
+	    }
+	  else if (store_bit_field_1 (tempreg, bitsize, xbitpos, xmaxbits,
+				      fieldmode, orig_value, false))
 	    {
 	      emit_move_insn (xop0, tempreg);
 	      return true;
@@ -781,7 +791,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
     return false;
 
   store_fixed_bit_field (op0, offset, bitsize, bitpos,
-			 bitregion_start, bitregion_end, value);
+			 bitregion_maxbits, value);
   return true;
 }
 
@@ -789,18 +799,22 @@ store_bit_field_1 (rtx str_rtx, unsigned
    into a bit-field within structure STR_RTX
    containing BITSIZE bits starting at bit BITNUM.
 
-   BITREGION_START is bitpos of the first bitfield in this region.
-   BITREGION_END is the bitpos of the ending bitfield in this region.
-   These two fields are 0, if the C++ memory model does not apply,
-   or we are not interested in keeping track of bitfield regions.
+   BITREGION_BYTE_OFFSET is the byte offset from STR_RTX to the start
+   of the bit region.
+
+   BITREGION_BIT_OFFSET is the field's bit offset from the start of
+   the bit region.
+
+   BITREGION_MAXBITS is the number of bits in the bit region.
 
    FIELDMODE is the machine-mode of the FIELD_DECL node for this field.  */
 
 void
 store_bit_field (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 		 unsigned HOST_WIDE_INT bitnum,
-		 unsigned HOST_WIDE_INT bitregion_start,
-		 unsigned HOST_WIDE_INT bitregion_end,
+		 tree bitregion_byte_offset,
+		 HOST_WIDE_INT bitregion_bit_offset,
+		 HOST_WIDE_INT bitregion_maxbits,
 		 enum machine_mode fieldmode,
 		 rtx value)
 {
@@ -808,33 +822,51 @@ store_bit_field (rtx str_rtx, unsigned H
      bit region.  Adjust the address to start at the beginning of the
      bit region.  */
   if (MEM_P (str_rtx)
-      && bitregion_start > 0)
+      && bitregion_maxbits
+      && !integer_zerop (bitregion_byte_offset))
     {
-      enum machine_mode bestmode;
-      enum machine_mode op_mode;
-      unsigned HOST_WIDE_INT offset;
+      HOST_WIDE_INT offset;
 
-      op_mode = mode_for_extraction (EP_insv, 3);
-      if (op_mode == MAX_MACHINE_MODE)
-	op_mode = VOIDmode;
-
-      offset = bitregion_start / BITS_PER_UNIT;
-      bitnum -= bitregion_start;
-      bitregion_end -= bitregion_start;
-      bitregion_start = 0;
-      bestmode = get_best_mode (bitsize, bitnum,
-				bitregion_start, bitregion_end,
-				MEM_ALIGN (str_rtx),
-				op_mode,
-				MEM_VOLATILE_P (str_rtx));
-      str_rtx = adjust_address (str_rtx, bestmode, offset);
+      if (host_integerp (bitregion_byte_offset, 1))
+	{
+	  /* Adjust the bit position accordingly.  */
+	  offset = tree_low_cst (bitregion_byte_offset, 1);
+	  /* Adjust the actual address.  */
+	  str_rtx = adjust_address (str_rtx, GET_MODE (str_rtx), offset);
+	}
+      else
+	{
+	  /* Handle variable length offsets.  */
+	  str_rtx = offset_address (str_rtx,
+				    expand_normal (bitregion_byte_offset), 1);
+	}
+      bitregion_byte_offset = integer_zero_node;
+      bitnum = bitregion_bit_offset;
     }
 
-  if (!store_bit_field_1 (str_rtx, bitsize, bitnum,
-			  bitregion_start, bitregion_end,
+  if (!store_bit_field_1 (str_rtx, bitsize, bitnum, bitregion_maxbits,
 			  fieldmode, value, true))
     gcc_unreachable ();
 }
+
+/* Return the largest mode that can be used to address a bit field of
+   size BITS.  This is basically a MODE whose bit size is <= BITS.  */
+enum machine_mode
+get_max_mode (HOST_WIDE_INT bits)
+{
+  enum machine_mode mode, prev;
+
+  for (prev = mode = GET_CLASS_NARROWEST_MODE (MODE_INT); mode != VOIDmode;
+       mode = GET_MODE_WIDER_MODE (mode))
+    {
+      if (GET_MODE_BITSIZE (mode) > bits
+	  || GET_MODE_BITSIZE (mode) > MAX_FIXED_MODE_SIZE)
+	return prev;
+      prev = mode;
+    }
+  gcc_unreachable ();
+  return VOIDmode;
+}
 \f
 /* Use shifts and boolean operations to store VALUE
    into a bit field of width BITSIZE
@@ -843,14 +875,16 @@ store_bit_field (rtx str_rtx, unsigned H
    The field starts at position BITPOS within the byte.
     (If OP0 is a register, it may be a full word or a narrower mode,
      but BITPOS still counts within a full word,
-     which is significant on bigendian machines.)  */
+     which is significant on bigendian machines.)
+
+     BITREGION_MAXBITS is the number of bits in the bit region, which
+     starts at OP0.  */
 
 static void
 store_fixed_bit_field (rtx op0, unsigned HOST_WIDE_INT offset,
 		       unsigned HOST_WIDE_INT bitsize,
 		       unsigned HOST_WIDE_INT bitpos,
-		       unsigned HOST_WIDE_INT bitregion_start,
-		       unsigned HOST_WIDE_INT bitregion_end,
+		       HOST_WIDE_INT bitregion_maxbits,
 		       rtx value)
 {
   enum machine_mode mode;
@@ -872,19 +906,12 @@ store_fixed_bit_field (rtx op0, unsigned
       /* Special treatment for a bit field split across two registers.  */
       if (bitsize + bitpos > BITS_PER_WORD)
 	{
-	  store_split_bit_field (op0, bitsize, bitpos,
-				 bitregion_start, bitregion_end,
-				 value);
+	  store_split_bit_field (op0, bitsize, bitpos, bitregion_maxbits, value);
 	  return;
 	}
     }
   else
     {
-      unsigned HOST_WIDE_INT maxbits = MAX_FIXED_MODE_SIZE;
-
-      if (bitregion_end)
-	maxbits = bitregion_end - bitregion_start + 1;
-
       /* Get the proper mode to use for this field.  We want a mode that
 	 includes the entire field.  If such a mode would be larger than
 	 a word, we won't be doing the extraction the normal way.
@@ -897,20 +924,26 @@ store_fixed_bit_field (rtx op0, unsigned
 
       if (MEM_VOLATILE_P (op0)
           && GET_MODE_BITSIZE (GET_MODE (op0)) > 0
-	  && GET_MODE_BITSIZE (GET_MODE (op0)) <= maxbits
+	  && (!bitregion_maxbits
+	      || GET_MODE_BITSIZE (GET_MODE (op0)) <= bitregion_maxbits)
 	  && flag_strict_volatile_bitfields > 0)
 	mode = GET_MODE (op0);
       else
-	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
-			      bitregion_start, bitregion_end,
-			      MEM_ALIGN (op0), mode, MEM_VOLATILE_P (op0));
+	{
+	  if (bitregion_maxbits
+	      && (bitregion_maxbits - offset * BITS_PER_UNIT
+		  < GET_MODE_BITSIZE (mode)))
+	    mode = get_max_mode (bitregion_maxbits - offset * BITS_PER_UNIT);
+	  mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
+				MEM_ALIGN (op0), mode, MEM_VOLATILE_P (op0));
+	}
 
       if (mode == VOIDmode)
 	{
 	  /* The only way this should occur is if the field spans word
 	     boundaries.  */
 	  store_split_bit_field (op0, bitsize, bitpos + offset * BITS_PER_UNIT,
-				 bitregion_start, bitregion_end, value);
+				 bitregion_maxbits, value);
 	  return;
 	}
 
@@ -932,6 +965,14 @@ store_fixed_bit_field (rtx op0, unsigned
 	 Then alter OP0 to refer to that word.  */
       bitpos += (offset % (total_bits / BITS_PER_UNIT)) * BITS_PER_UNIT;
       offset -= (offset % (total_bits / BITS_PER_UNIT));
+      if (bitregion_maxbits)
+	{
+	  enum machine_mode tmode;
+	  bitregion_maxbits -= offset * BITS_PER_UNIT;
+	  tmode = get_max_mode (bitregion_maxbits);
+	  if (GET_MODE_SIZE (mode) > GET_MODE_SIZE (tmode))
+	    mode = tmode;
+	}
       op0 = adjust_address (op0, mode, offset);
     }
 
@@ -1031,8 +1072,7 @@ store_fixed_bit_field (rtx op0, unsigned
 static void
 store_split_bit_field (rtx op0, unsigned HOST_WIDE_INT bitsize,
 		       unsigned HOST_WIDE_INT bitpos,
-		       unsigned HOST_WIDE_INT bitregion_start,
-		       unsigned HOST_WIDE_INT bitregion_end,
+		       HOST_WIDE_INT bitregion_maxbits,
 		       rtx value)
 {
   unsigned int unit;
@@ -1043,7 +1083,14 @@ store_split_bit_field (rtx op0, unsigned
   if (REG_P (op0) || GET_CODE (op0) == SUBREG)
     unit = BITS_PER_WORD;
   else
-    unit = MIN (MEM_ALIGN (op0), BITS_PER_WORD);
+    {
+      unit = MIN (MEM_ALIGN (op0), BITS_PER_WORD);
+
+      /* ?? Ideally we should do as much as we can with the wider
+	 mode, and use BITS_PER_UNIT for the remaining bits.  */
+      if (bitregion_maxbits % unit)
+	unit = BITS_PER_UNIT;
+    }
 
   /* If VALUE is a constant other than a CONST_INT, get it into a register in
      WORD_MODE.  If we can do this using gen_lowpart_common, do so.  Note
@@ -1148,7 +1195,7 @@ store_split_bit_field (rtx op0, unsigned
 	 it is just an out-of-bounds access.  Ignore it.  */
       if (word != const0_rtx)
 	store_fixed_bit_field (word, offset * unit / BITS_PER_UNIT, thissize,
-			       thispos, bitregion_start, bitregion_end, part);
+			       thispos, bitregion_maxbits, part);
       bitsdone += thissize;
     }
 }
@@ -1588,7 +1635,7 @@ extract_bit_field_1 (rtx str_rtx, unsign
       if (GET_MODE (op0) == BLKmode
 	  || (ext_mode != MAX_MACHINE_MODE
 	      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (ext_mode)))
-	bestmode = get_best_mode (bitsize, bitnum, 0, 0, MEM_ALIGN (op0),
+	bestmode = get_best_mode (bitsize, bitnum, MEM_ALIGN (op0),
 				  (ext_mode == MAX_MACHINE_MODE
 				   ? VOIDmode : ext_mode),
 				  MEM_VOLATILE_P (op0));
@@ -1714,7 +1761,7 @@ extract_fixed_bit_field (enum machine_mo
 	    mode = tmode;
 	}
       else
-	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT, 0, 0,
+	mode = get_best_mode (bitsize, bitpos + offset * BITS_PER_UNIT,
 			      MEM_ALIGN (op0), word_mode, MEM_VOLATILE_P (op0));
 
       if (mode == VOIDmode)
Index: stmt.c
===================================================================
--- stmt.c	(revision 176891)
+++ stmt.c	(working copy)
@@ -1760,7 +1760,7 @@ expand_return (tree retval)
 	  /* Use bitpos for the source extraction (left justified) and
 	     xbitpos for the destination store (right justified).  */
 	  store_bit_field (dst, bitsize, xbitpos % BITS_PER_WORD,
-			   0, 0, word_mode,
+			   integer_zero_node, 0, 0, word_mode,
 			   extract_bit_field (src, bitsize,
 					      bitpos % BITS_PER_WORD, 1, false,
 					      NULL_RTX, word_mode, word_mode));
Index: params.def
===================================================================
--- params.def	(revision 176891)
+++ params.def	(working copy)
@@ -912,7 +912,9 @@ DEFPARAM (PARAM_CASE_VALUES_THRESHOLD,
 DEFPARAM (PARAM_ALLOW_STORE_DATA_RACES,
 	  "allow-store-data-races",
 	  "Allow new data races on stores to be introduced",
-	  1, 0, 1)
+	  /* TESTING TESTING */
+	  /* TESTING: Enable the memory model by default.  */
+	  0, 0, 1)
 
 
 /*


* Re: [C++0x] contiguous bitfields race implementation
  2011-08-27  0:05                                                         ` Aldy Hernandez
@ 2011-08-29 12:54                                                           ` Richard Guenther
  2011-08-30 16:07                                                             ` Aldy Hernandez
                                                                               ` (3 more replies)
  0 siblings, 4 replies; 81+ messages in thread
From: Richard Guenther @ 2011-08-29 12:54 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Jason Merrill, gcc-patches, Jakub Jelinek

On Fri, Aug 26, 2011 at 8:54 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> This is a "slight" update from the last revision, with your issues addressed
> as I explained in the last email.  However, everything turned out to be much
> trickier than I expected (variable length offsets with arrays, bit fields
> spanning multiple words, surprising padding gymnastics by GCC, etc etc).
>
> It turns out that what we need is to know the precise bit region size at all
> times, and adjust it as we rearrange and cut things into pieces throughout
> the RTL bit field machinery.
>
> I enabled the C++ memory model, and forced a bootstrap and regression test
> with it.  This brought about many interesting cases, which I was able to
> distill and add to the testsuite.
>
> Of particular interest was the struct-layout-1.exp tests.  Since many of the
> tests set a global bit field, only to later check it against a local
> variable containing the same value, it is the perfect stressor because,
> while globals are restricted under the memory model, locals are not.  So we
> can check that we can interoperate with the less restrictive model, and that
> the patch does not introduce ABI inconsistencies.  After much grief, we are
> now passing all the struct-layout-1.exp tests. Eventually, I'd like to force
> the struct-layout-1.exp tests to run for "--param allow-store-data-races=0"
> as well.  Unfortunately, this will increase testing time.
>
> I have (unfortunately) introduced an additional call to
> get_inner_reference(), but only for the field itself (one time).  I can't
> remember the details, but it was something to the effect of the bit position +
> padding being impossible to calculate in one variable array reference case.
>  I can dig up the case if you'd like.
>
> I am currently tackling a reload miscompilation failure while building a
> 32-bit library.  I am secretly hoping your review will uncover the flaw
> without me having to pick this up.  Otherwise, this is a much more
> comprehensive approach than what is currently in mainline, and we now pass
> all the bitfield tests the GCC testsuite could throw at it.
>
> Fire away.

+  /* Be as conservative as possible on variable offsets.  */
+  if (TREE_OPERAND (exp, 2)
+      && !host_integerp (TREE_OPERAND (exp, 2), 1))
+    {
+      *byte_offset = TREE_OPERAND (exp, 2);
+      *maxbits = BITS_PER_UNIT;
+      return;
+    }

shouldn't this be at the very beginning of the function?  Because
you've set *bit_offset to an offset that was _not_ calculated relative
to TREE_OPERAND (exp, 2).  And you'll avoid ICEing

+	  /* bitoff = start_byte * 8 - (fld.byteoff * 8 + fld.bitoff) */
+	  t = fold_build2 (MINUS_EXPR, size_type_node,
+			   fold_build2 (PLUS_EXPR, size_type_node,
+					fold_build2 (MULT_EXPR, size_type_node,
+						     toffset, bits),
+					build_int_cst (integer_type_node,
+						       tbitpos)),
+			   fold_build2 (MULT_EXPR, size_type_node,
+					*byte_offset, bits));
+
+	  *bit_offset = tree_low_cst (t, 1);

here in case t isn't an INTEGER_CST.  The comment before the
tree formula above doesn't match it; please update it.  If
*bit_offset is supposed to be relative to *byte_offset then it should
be easy to calculate it without another get_inner_reference.

Btw, *byte_offset is still not relative to the containing object as
documented, but relative to the base object of the exp reference
tree (thus, to a in a.i.j.k.l instead of to a.i.j.k).  If it were supposed
to be relative to a.i.j.k get_inner_reference would be not needed
either.  Can you clarify what "containing object" means in the
overall comment please?

If it is really relative to the innermost reference of exp you can
"CSE" the offset of TREE_OPERAND (exp, 0) and do relative
adjustments for all the other get_inner_reference calls.  For
example the

+  /* If we found the end of the bit field sequence, include the
+     padding up to the next field...  */
   if (fld)
     {
...
+      /* Calculate bitpos and offset of the next field.  */
+      get_inner_reference (build3 (COMPONENT_REF,
+				   TREE_TYPE (exp),
+				   TREE_OPERAND (exp, 0),
+				   fld, NULL_TREE),
+			   &tbitsize, &end_bitpos, &end_offset,
+			   &tmode, &tunsignedp, &tvolatilep, true);

case is not correct anyway, fld may have variable position
(non-INTEGER_CST DECL_FIELD_OFFSET), you can't
assume

+      *maxbits = TREE_INT_CST_LOW (maxbits_tree);

this thus.

+  /* ...otherwise, this is the last element in the structure.  */
   else
     {
-      /* If this is the last element in the structure, include the padding
-	 at the end of structure.  */
-      *bitend = TREE_INT_CST_LOW (TYPE_SIZE (record_type)) - 1;
+      /* Include the padding at the end of structure.  */
+      *maxbits = TREE_INT_CST_LOW (TYPE_SIZE (record_type))
+	- TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (bitregion_start));
+      /* Round up to the next byte.  */
+      *maxbits = (*maxbits + BITS_PER_UNIT - 1) & ~(BITS_PER_UNIT - 1);
     }

so you weren't convinced about my worries about tail-padding re-use?
And you blindly assume a constant-size record_type ...
and you don't account for DECL_FIELD_OFFSET of bitregion_start
(shouldn't you simply use (and compute) a byte_offset relative to
the start of the record)?  Well, I still think you cannot include the
padding at the end of the structure (if TREE_OPERAND (exp, 0) is
a COMPONENT_REF as well then its DECL_SIZE can be different
than its TYPE_SIZE).
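
To illustrate the tail-padding point (an illustrative C++ sketch, not
taken from the patch):

  struct A { int i; char c; A (); };  /* non-POD; TYPE_SIZE (A) is 8 bytes,
                                         but only 5 bytes are really used */
  struct B : A { char d; };           /* d can be laid out in A's tail
                                         padding, so sizeof (B) == 8 */

Here the DECL_SIZE of the A base field inside B is only 5 bytes while
TYPE_SIZE (A) is 8, so storing into A's "padding" would clobber B::d.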

+	      bitregion_byte_offset = fold_build2 (MINUS_EXPR, integer_type_node,
+					      bitregion_byte_offset,
+					      build_int_cst (integer_type_node,
+							     bitpos / BITS_PER_UNIT));

general remark - you should be using sizetype for byte offsets,
bitsizetype for bit offset trees and size_binop for computations, instead
of fold_build2 (applies everywhere).  And thus pass size_zero_node
to store_field bitregion_byte_offset.
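
E.g. the byte-offset computation above would then read something like
this (an untested sketch in that style):

  /* BYTE_OFFSET = START_OFFSET + START_BITPOS / BITS_PER_UNIT,
     computed in sizetype via size_binop.  */
  *byte_offset = size_binop (PLUS_EXPR,
                             fold_convert (sizetype, start_offset),
                             size_int (start_bitpos / BITS_PER_UNIT));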

Can you split out the get_best_mode two param removal pieces?  Consider
them pre-approved.

Why do you need to adjust store_bit_field with the extra param - can't
you simply pass an adjusted str_rtx from the single caller that can
have that non-zero?

Thanks,
Richard.


* Re: [C++0x] contiguous bitfields race implementation
  2011-08-29 12:54                                                           ` Richard Guenther
@ 2011-08-30 16:07                                                             ` Aldy Hernandez
  2011-08-31  8:38                                                               ` Richard Guenther
  2011-08-30 16:53                                                             ` Aldy Hernandez
                                                                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-08-30 16:07 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jason Merrill, gcc-patches, Jakub Jelinek

[I'm going to respond to this piece-meal, to make sure I don't drop 
anything.  My apologies for the long thread, but I'm pretty sure it's in 
everybody's kill file by now.]

> +  /* Be as conservative as possible on variable offsets.  */
> +  if (TREE_OPERAND (exp, 2)
> +      && !host_integerp (TREE_OPERAND (exp, 2), 1))
> +    {
> +      *byte_offset = TREE_OPERAND (exp, 2);
> +      *maxbits = BITS_PER_UNIT;
> +      return;
> +    }
>
> shouldn't this be at the very beginning of the function?  Because
> you've set *bit_offset to an offset that was _not_ calculated relative

Sure.  I assume in this case, *bit_offset would be 0, right?


* Re: [C++0x] contiguous bitfields race implementation
  2011-08-29 12:54                                                           ` Richard Guenther
  2011-08-30 16:07                                                             ` Aldy Hernandez
@ 2011-08-30 16:53                                                             ` Aldy Hernandez
  2011-08-31  8:55                                                               ` Richard Guenther
  2011-08-30 21:33                                                             ` Aldy Hernandez
  2011-09-01 14:53                                                             ` Aldy Hernandez
  3 siblings, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-08-30 16:53 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches


> *bit_offset is supposed to be relative to *byte_offset then it should
> be easy to calculate it without another get_inner_reference.

Since, as you suggested, we will terminate early on variable length 
offsets, we can assume both DECL_FIELD_OFFSET and DECL_FIELD_BIT_OFFSET 
will be constants by now.  So, I assume we can calculate the bit offset 
like this:

*bit_offset = (TREE_INT_CST_LOW (DECL_FIELD_OFFSET (fld))
	       * BITS_PER_UNIT
	       + TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (fld)))
   - (TREE_INT_CST_LOW (DECL_FIELD_OFFSET (bitregion_start))
      * BITS_PER_UNIT
      + TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (bitregion_start)));

(Yes, I know we can factor out the BITS_PER_UNIT and only do one 
multiplication; it's just easier to read this way.)

Is this what you had in mind?


* Re: [C++0x] contiguous bitfields race implementation
  2011-08-29 12:54                                                           ` Richard Guenther
  2011-08-30 16:07                                                             ` Aldy Hernandez
  2011-08-30 16:53                                                             ` Aldy Hernandez
@ 2011-08-30 21:33                                                             ` Aldy Hernandez
  2011-08-31  8:55                                                               ` Richard Guenther
  2011-09-01 14:53                                                             ` Aldy Hernandez
  3 siblings, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-08-30 21:33 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches


> Btw, *byte_offset is still not relative to the containing object as
> documented, but relative to the base object of the exp reference
> tree (thus, to a in a.i.j.k.l instead of to a.i.j.k).  If it were supposed
> to be relative to a.i.j.k get_inner_reference would be not needed
> either.  Can you clarify what "containing object" means in the
> overall comment please?

I'm thoroughly confused here.  Originally I had "inner decl", then we 
changed the nomenclature to "containing object", and now there's this 
"innermost reference".

What I mean to say is the "a" in a.i.j.k.l.  What would you like me to 
call that?  The innermost reference?  The inner decl?  Would this 
comment be acceptable:

    Given a COMPONENT_REF, this function calculates the byte offset
    from the innermost reference ("a" in a.i.j.k.l) to the start of the
    contiguous bit region containing the field in question.

>
> If it is really relative to the innermost reference of exp you can
> "CSE" the offset of TREE_OPERAND (exp, 0) and do relative
> adjustments for all the other get_inner_reference calls.  For
> example the
>
> +  /* If we found the end of the bit field sequence, include the
> +     padding up to the next field...  */
>     if (fld)
>       {
> ...
> +      /* Calculate bitpos and offset of the next field.  */
> +      get_inner_reference (build3 (COMPONENT_REF,
> +				   TREE_TYPE (exp),
> +				   TREE_OPERAND (exp, 0),
> +				   fld, NULL_TREE),
> +			&tbitsize,&end_bitpos,&end_offset,
> +			   &tbitsize, &end_bitpos, &end_offset,
> +			   &tmode, &tunsignedp, &tvolatilep, true);
> case is not correct anyway, fld may have variable position
> (non-INTEGER_CST DECL_FIELD_OFFSET), you can't
> assume

Innermost here means "a" in a.i.j.k.l?  If so, this is what we're 
currently doing: *byte_offset is the start of the bit region, and 
*bit_offset is the offset from that.

First, I thought we couldn't get a variable position here because we are 
now handling that case at the beginning of the function with:

   /* Be as conservative as possible on variable offsets.  */
   if (TREE_OPERAND (exp, 2)
       && !host_integerp (TREE_OPERAND (exp, 2), 1))
     {
       *byte_offset = TREE_OPERAND (exp, 2);
       *maxbits = BITS_PER_UNIT;
       *bit_offset = 0;
       return;
     }

And even if we do get a variable position, I have so far been able to 
get away with this...

>
> +      *maxbits = TREE_INT_CST_LOW (maxbits_tree);
>
> this thus.

...because the call to fold_build2 immediately preceding this will fold 
away the variable offset.

Is what you want that we call get_inner_reference once, and then use 
DECL_FIELD_OFFSET+DECL_FIELD_BIT_OFFSET to calculate any subsequent bit 
offset?  I found this to be quite tricky with padding and such, but am 
willing to give it a whirl again.

However, could I beg you to reconsider this, and get something working 
first, only later concentrating on removing the get_inner_reference() 
calls, and performing any other tweaks/optimizations?

Aldy


* Re: [C++0x] contiguous bitfields race implementation
  2011-08-30 16:07                                                             ` Aldy Hernandez
@ 2011-08-31  8:38                                                               ` Richard Guenther
  2011-08-31 13:56                                                                 ` Richard Guenther
  0 siblings, 1 reply; 81+ messages in thread
From: Richard Guenther @ 2011-08-31  8:38 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Jason Merrill, gcc-patches, Jakub Jelinek

On Tue, Aug 30, 2011 at 5:01 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> [I'm going to respond to this piece-meal, to make sure I don't drop
> anything.  My apologies for the long thread, but I'm pretty sure it's in
> everybody's kill file by now.]
>
>> +  /* Be as conservative as possible on variable offsets.  */
>> +  if (TREE_OPERAND (exp, 2)
>> +      && !host_integerp (TREE_OPERAND (exp, 2), 1))
>> +    {
>> +      *byte_offset = TREE_OPERAND (exp, 2);
>> +      *maxbits = BITS_PER_UNIT;
>> +      return;
>> +    }
>>
>> shouldn't this be at the very beginning of the function?  Because
>> you've set *bit_offset to an offset that was _not_ calculated relative
>
> Sure.  I assume in this case, *bit_offset would be 0, right?

It would be DECL_FIELD_BIT_OFFSET of that field.  Oh, and
*byte_offset would be

*byte_offset = size_binop (MULT_EXPR, TREE_OPERAND (exp, 2),
                           size_int (DECL_OFFSET_ALIGN (field)
                                     / BITS_PER_UNIT));

see expr.c:component_ref_field_offset () (which you conveniently
could use here).
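
That is, in the variable-offset case the whole thing should boil down
to something like (untested):

  *byte_offset = component_ref_field_offset (exp);
  *bit_offset = tree_low_cst (DECL_FIELD_BIT_OFFSET (field), 1);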

Note that both TREE_OPERAND (exp, 2) and compoment_ref_field_offset
return offsets relative to the immediate containing struct type, not
relative to the base object like get_inner_reference does ...
(where it is still unclear to me what we are supposed to return from this
function ...)

Thus, conservative would be using get_inner_reference here, if the
offset is supposed to be relative to the base object.

Richard.


* Re: [C++0x] contiguous bitfields race implementation
  2011-08-30 21:33                                                             ` Aldy Hernandez
@ 2011-08-31  8:55                                                               ` Richard Guenther
  2011-08-31 20:37                                                                 ` Aldy Hernandez
  0 siblings, 1 reply; 81+ messages in thread
From: Richard Guenther @ 2011-08-31  8:55 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: gcc-patches

On Tue, Aug 30, 2011 at 8:13 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>> Btw, *byte_offset is still not relative to the containing object as
>> documented, but relative to the base object of the exp reference
>> tree (thus, to a in a.i.j.k.l instead of to a.i.j.k).  If it were supposed
>> to be relative to a.i.j.k get_inner_reference would not be needed
>> either.  Can you clarify what "containing object" means in the
>> overall comment please?
>
> I'm thoroughly confused here.  Originally I had "inner decl", then we
> changed the nomenclature to "containing object", and now there's this
> "innermost reference".

Well, the nomenclature is not so important once the function only
computes one variant.  It's only because it doesn't right now that I'm
confused by the nomenclature, trying to figure out what it is supposed
to be relative to ...

The containing object of a component-ref is TREE_OPERAND (exp, 0)
to me.  The base object would be get_base_address (exp), which is
eventually what we want, right?

> What I mean to say is the "a" in a.i.j.k.l.  How would you like me to call
> that?  The innermost reference?  The inner decl?  Would this comment be
> acceptable:
>
>   Given a COMPONENT_REF, this function calculates the byte offset
>   from the innermost reference ("a" in a.i.j.k.l) to the start of the
>   contiguous bit region containing the field in question.

  from the base object ("a" in a.i.j.k.l) ...

would be fine with me.

>>
>> If it is really relative to the innermost reference of exp you can
>> "CSE" the offset of TREE_OPERAND (exp, 0) and do relative
>> adjustments for all the other get_inner_reference calls.  For
>> example the
>>
>> +  /* If we found the end of the bit field sequence, include the
>> +     padding up to the next field...  */
>>    if (fld)
>>      {
>> ...
>> +      /* Calculate bitpos and offset of the next field.  */
>> +      get_inner_reference (build3 (COMPONENT_REF,
>> +                                  TREE_TYPE (exp),
>> +                                  TREE_OPERAND (exp, 0),
>> +                                  fld, NULL_TREE),
>> +                       &tbitsize, &end_bitpos, &end_offset,
>> +                       &tmode, &tunsignedp, &tvolatilep, true);
>>
>> case is not correct anyway, fld may have variable position
>> (non-INTEGER_CST DECL_FIELD_OFFSET), you can't
>> assume
>
> Innermost here means "a" in a.i.j.k.l?  If so, this is what we're currently
> doing: *byte_offset is the start of the bit region, and *bit_offset is the
> offset from that.
>
> First, I thought we couldn't get a variable position here because we are now
> handling that case at the beginning of the function with:
>
>  /* Be as conservative as possible on variable offsets.  */
>  if (TREE_OPERAND (exp, 2)
>      && !host_integerp (TREE_OPERAND (exp, 2), 1))
>    {
>      *byte_offset = TREE_OPERAND (exp, 2);
>      *maxbits = BITS_PER_UNIT;
>      *bit_offset = 0;
>      return;
>    }
>
> And even if we do get a variable position, I have so far been able to get
> away with this...

Did you test Ada and enable the C++ memory model? ;)

Btw, even if the bitfield we access (and thus the whole region) is at a
constant offset, the field _following_ the bitregion (the one you query
above with get_inner_reference) can be at variable offset.  I suggest
simply not including any padding in that case (which would be,
TREE_CODE (DECL_FIELD_OFFSET (fld)) != INTEGER_CST).

>>
>> +      *maxbits = TREE_INT_CST_LOW (maxbits_tree);
>>
>> this thus.
>
> ...because the call to fold_build2 immediately preceding this will fold away
> the variable offset.

You hope so ;)

> Is what you want, that we call get_inner_reference once, and then use
> DECL_FIELD_OFFSET+DECL_FIELD_BIT_OFFSET to calculate any subsequent bit
> offset?  I found this to be quite tricky with padding, and such, but am
> willing to give it a whirl again.

Yes.

> However, could I beg you to reconsider this, and get something working
> first, only later concentrating on removing the get_inner_reference() calls,
> and performing any other tweaks/optimizations?

Sure, it's fine to tweak this in a followup.

Thanks,
Richard.

> Aldy
>


* Re: [C++0x] contiguous bitfields race implementation
  2011-08-30 16:53                                                             ` Aldy Hernandez
@ 2011-08-31  8:55                                                               ` Richard Guenther
  2011-08-31 17:24                                                                 ` Aldy Hernandez
  0 siblings, 1 reply; 81+ messages in thread
From: Richard Guenther @ 2011-08-31  8:55 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: gcc-patches

On Tue, Aug 30, 2011 at 6:15 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>> If *bit_offset is supposed to be relative to *byte_offset then it should
>> be easy to calculate it without another get_inner_reference.
>
> Since, as you suggested, we will terminate early on variable length offsets,
> we can assume both DECL_FIELD_OFFSET and DECL_FIELD_BIT_OFFSET will be
> constants by now.

Yes.

>  So, I assume we can calculate the bit offset like this:
>
> *bit_offset = (TREE_INT_CST_LOW (DECL_FIELD_OFFSET (fld))
>               * BITS_PER_UNIT
>               + TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (fld)))
>  - (TREE_INT_CST_LOW (DECL_FIELD_OFFSET (bitregion_start))
>     * BITS_PER_UNIT
>     + TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (bitregion_start)));
>
> (Yes, I know we can factor out the BITS_PER_UNIT and only do one
> multiplication, it's just easier to read this way.)
>
> Is this what you had in mind?

Yes.  For convenience I'd simply use double_ints for the intermediate
calculations.
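
A rough sketch of that computation with double_ints, assuming (per the
early-out discussed above) that both fields have constant positions;
the helper name field_bitpos is made up:

  /* Sketch only, using the double-int.h API.  FLD's DECL_FIELD_OFFSET
     and DECL_FIELD_BIT_OFFSET are assumed to be INTEGER_CSTs.  */
  static double_int
  field_bitpos (tree fld)
  {
    /* DECL_FIELD_OFFSET * BITS_PER_UNIT + DECL_FIELD_BIT_OFFSET.  */
    return double_int_add
      (double_int_mul (tree_to_double_int (DECL_FIELD_OFFSET (fld)),
                       uhwi_to_double_int (BITS_PER_UNIT)),
       tree_to_double_int (DECL_FIELD_BIT_OFFSET (fld)));
  }

The subtraction is then double_int_sub (field_bitpos (fld),
field_bitpos (bitregion_start)), with the final HOST_WIDE_INT taken
from the result's low word.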

Richard.


* Re: [C++0x] contiguous bitfields race implementation
  2011-08-31  8:38                                                               ` Richard Guenther
@ 2011-08-31 13:56                                                                 ` Richard Guenther
  2011-08-31 20:37                                                                   ` Aldy Hernandez
  0 siblings, 1 reply; 81+ messages in thread
From: Richard Guenther @ 2011-08-31 13:56 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Jason Merrill, gcc-patches, Jakub Jelinek

On Wed, Aug 31, 2011 at 9:45 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Tue, Aug 30, 2011 at 5:01 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>> [I'm going to respond to this piecemeal, to make sure I don't drop
>> anything.  My apologies for the long thread, but I'm pretty sure it's in
>> everybody's kill file by now.]
>>
>>> +  /* Be as conservative as possible on variable offsets.  */
>>> +  if (TREE_OPERAND (exp, 2)
>>> +      && !host_integerp (TREE_OPERAND (exp, 2), 1))
>>> +    {
>>> +      *byte_offset = TREE_OPERAND (exp, 2);
>>> +      *maxbits = BITS_PER_UNIT;
>>> +      return;
>>> +    }
>>>
>>> shouldn't this be at the very beginning of the function?  Because
>>> you've set *bit_offset to an offset that was _not_ calculated relative
>>
>> Sure.  I assume in this case, *bit_offset would be 0, right?
>
> It would be DECL_FIELD_BIT_OFFSET of that field.  Oh, and
> *byte_offset would be
>
> *byte_offset = size_binop (MULT_EXPR, TREE_OPERAND (exp, 2),
>                            size_int (DECL_OFFSET_ALIGN (field)
>                                      / BITS_PER_UNIT));
>
> see expr.c:component_ref_field_offset () (which you conveniently
> could use here).
>
> Note that both TREE_OPERAND (exp, 2) and component_ref_field_offset
> return offsets relative to the immediate containing struct type, not
> relative to the base object like get_inner_reference does ...
> (where it is still unclear to me what we are supposed to return from this
> function ...)
>
> Thus, conservative would be using get_inner_reference here, if the
> offset is supposed to be relative to the base object.

That said, shouldn't *maxbits at least make sure to cover the field itself?

> Richard.
>


* Re: [C++0x] contiguous bitfields race implementation
  2011-08-31  8:55                                                               ` Richard Guenther
@ 2011-08-31 17:24                                                                 ` Aldy Hernandez
  0 siblings, 0 replies; 81+ messages in thread
From: Aldy Hernandez @ 2011-08-31 17:24 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches


>> *bit_offset = (TREE_INT_CST_LOW (DECL_FIELD_OFFSET (fld))
>>                * BITS_PER_UNIT
>>                + TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (fld)))
>>   - (TREE_INT_CST_LOW (DECL_FIELD_OFFSET (bitregion_start))
>>      * BITS_PER_UNIT
>>      + TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (bitregion_start)));
>>
>> (Yes, I know we can factor out the BITS_PER_UNIT and only do one
>> multiplication, it's just easier to read this way.)
>>
>> Is this what you had in mind?
>
> Yes.  For convenience I'd simply use double_ints for the intermediate
> calculations.

Ok, let's leave it like this for now.  I have added a FIXME note, and we 
can optimize this after we get everything working.


* Re: [C++0x] contiguous bitfields race implementation
  2011-08-31  8:55                                                               ` Richard Guenther
@ 2011-08-31 20:37                                                                 ` Aldy Hernandez
  2011-09-01  7:02                                                                   ` Richard Guenther
  0 siblings, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-08-31 20:37 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches


> Did you test Ada and enable the C++ memory model? ;)

See my earlier comment on Ada.  Who would ever use the C++ memory model 
on Ada?

> Btw, even if the bitfield we access (and thus the whole region) is at a
> constant offset, the field _following_ the bitregion (the one you query
> above with get_inner_reference) can be at variable offset.  I suggest
> to simply not include any padding in that case (which would be,
> TREE_CODE (DECL_FIELD_OFFSET (fld)) != INTEGER_CST).

I still have not found a place where we get a variable offset here 
(after folding the computation).  How about we put a gcc_assert() along 
with a big fat comment with your above suggestion when we encounter 
this?  Or can you give me an example of this case?

>> Is what you want, that we call get_inner_reference once, and then use
>> DECL_FIELD_OFFSET+DECL_FIELD_BIT_OFFSET to calculate any subsequent bit
>> offset?  I found this to be quite tricky with padding, and such, but am
>> willing to give it a whirl again.
>
> Yes.

I have added a comment to this effect, and will address it along with 
the get_inner_reference() removal you have suggested as a followup.


* Re: [C++0x] contiguous bitfields race implementation
  2011-08-31 13:56                                                                 ` Richard Guenther
@ 2011-08-31 20:37                                                                   ` Aldy Hernandez
  2011-09-01  6:58                                                                     ` Richard Guenther
  0 siblings, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-08-31 20:37 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches


>>> Sure.  I assume in this case, *bit_offset would be 0, right?
>>
>> It would be DECL_FIELD_BIT_OFFSET of that field.  Oh, and
>> *byte_offset would be
>>
>> *byte_offset = size_binop (MULT_EXPR, TREE_OPERAND (exp, 2),
>>                            size_int (DECL_OFFSET_ALIGN (field)
>>                                      / BITS_PER_UNIT));
>>
>> see expr.c:component_ref_field_offset () (which you conveniently
>> could use here).
>>
>> Note that both TREE_OPERAND (exp, 2) and component_ref_field_offset
>> return offsets relative to the immediate containing struct type, not
>> relative to the base object like get_inner_reference does ...
>> (where it is still unclear to me what we are supposed to return from this
>> function ...)

Ok, I see where your confusion lies.  The function is supposed to return 
a byte offset from the base object, none of this containing object or 
immediate struct, or whatever.  Base object, as in "a" in a.i.j.k, as in 
what you get back from get_base_address().

Originally everything was calculated with get_inner_reference(), which 
is relative to the base object, but now we have this hodgepodge of 
get_inner_reference() calls with ad-hoc calculations and optimizations. 
Thankfully, we've agreed to use get_inner_reference() and optimize at a 
later time.

So... base object throughout, anything else is a mistake on my part.

BTW, this whole variable length offset I still can't trigger.  I know 
you want to cater to Ada, but does it even make sense to enable the C++ 
memory model in Ada?  Who would ever do this?  Be that as it may, I'll 
humor you and handle it.

>> Thus, conservative would be using get_inner_reference here, if the
>> offset is supposed to be relative to the base object.
>
> That said, shouldn't *maxbits at least make sure to cover the field itself?

Is this what you want?

   /* Be as conservative as possible on variable offsets.  */
   if (TREE_OPERAND (exp, 2)
       && !host_integerp (TREE_OPERAND (exp, 2), 1))
     {
       get_inner_reference (build3 (COMPONENT_REF,
				   TREE_TYPE (exp),
				   TREE_OPERAND (exp, 0),
				   field, NULL_TREE),
			   &tbitsize, &start_bitpos, &start_offset,
			   &tmode, &tunsignedp, &tvolatilep, true);

       *byte_offset = start_offset ? start_offset : size_zero_node;
       *bit_offset = start_bitpos;
       *maxbits = tbitsize;
       return;
     }


* Re: [C++0x] contiguous bitfields race implementation
  2011-08-31 20:37                                                                   ` Aldy Hernandez
@ 2011-09-01  6:58                                                                     ` Richard Guenther
  0 siblings, 0 replies; 81+ messages in thread
From: Richard Guenther @ 2011-09-01  6:58 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: gcc-patches

On Wed, Aug 31, 2011 at 6:53 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>>>> Sure.  I assume in this case, *bit_offset would be 0, right?
>>>
>>> It would be DECL_FIELD_BIT_OFFSET of that field.  Oh, and
>>> *byte_offset would be
>>>
>>> *byte_offset = size_binop (MULT_EXPR, TREE_OPERAND (exp, 2),
>>>                            size_int (DECL_OFFSET_ALIGN (field)
>>>                                      / BITS_PER_UNIT));
>>>
>>> see expr.c:component_ref_field_offset () (which you conveniently
>>> could use here).
>>>
>>> Note that both TREE_OPERAND (exp, 2) and component_ref_field_offset
>>> return offsets relative to the immediate containing struct type, not
>>> relative to the base object like get_inner_reference does ...
>>> (where it is still unclear to me what we are supposed to return from this
>>> function ...)
>
> Ok, I see where your confusion lies.  The function is supposed to return a
> byte offset from the base object, none of this containing object or
> immediate struct, or whatever.  Base object, as in "a" in a.i.j.k, as in
> what you get back from get_base_address().
>
> Originally everything was calculated with get_inner_reference(), which is
> relative to the base object, but now we have this hodgepodge of
> get_inner_reference() calls with ad-hoc calculations and optimizations.
> Thankfully, we've agreed to use get_inner_reference() and optimize at a later
> time.
>
> So... base object throughout, anything else is a mistake on my part.
>
> BTW, this whole variable length offset I still can't trigger.  I know you
> want to cater to Ada, but does it even make sense to enable the C++ memory
> model in Ada?  Who would ever do this?  Be that as it may, I'll humor you
> and handle it.
>
>>> Thus, conservative would be using get_inner_reference here, if the
>>> offset is supposed to be relative to the base object.
>>
>> That said, shouldn't *maxbits at least make sure to cover the field
>> itself?
>
> Is this what you want?
>
>  /* Be as conservative as possible on variable offsets.  */
>  if (TREE_OPERAND (exp, 2)
>      && !host_integerp (TREE_OPERAND (exp, 2), 1))
>    {
>      get_inner_reference (build3 (COMPONENT_REF,
>                                   TREE_TYPE (exp),
>                                   TREE_OPERAND (exp, 0),
>                                   field, NULL_TREE),
>                           &tbitsize, &start_bitpos, &start_offset,
>                           &tmode, &tunsignedp, &tvolatilep, true);
>
>      *byte_offset = start_offset ? start_offset : size_zero_node;
>      *bit_offset = start_bitpos;
>      *maxbits = tbitsize;
>      return;
>    }

Yes, exactly.

Richard.


* Re: [C++0x] contiguous bitfields race implementation
  2011-08-31 20:37                                                                 ` Aldy Hernandez
@ 2011-09-01  7:02                                                                   ` Richard Guenther
  2011-09-01  7:05                                                                     ` Arnaud Charlet
  2011-09-01 14:16                                                                     ` Aldy Hernandez
  0 siblings, 2 replies; 81+ messages in thread
From: Richard Guenther @ 2011-09-01  7:02 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: gcc-patches

On Wed, Aug 31, 2011 at 8:09 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>> Did you test Ada and enable the C++ memory model? ;)
>
> See my earlier comment on Ada.  Who would ever use the C++ memory model on
> Ada?

People interoperating Ada with C++.  Or our bug triager Zdenek, who
figures out the --param?

>> Btw, even if the bitfield we access (and thus the whole region) is at a
>> constant offset, the field _following_ the bitregion (the one you query
>> above with get_inner_reference) can be at variable offset.  I suggest
>> simply not including any padding in that case (which would be,
>> TREE_CODE (DECL_FIELD_OFFSET (fld)) != INTEGER_CST).
>
> I still have not found a place where we get a variable offset here (after
> folding the computation).  How about we put a gcc_assert() along with a big
> fat comment with your above suggestion when we encounter this?  Or can you
> give me an example of this case?

My point is, the middle-end infrastructure makes it possible for this
case to appear, and it seems to be easy to handle conservatively.
There isn't a need to wait for users to run into an ICE or an assert we put
there IMHO.  If I were fluent in Ada I'd write you a testcase, but I ain't.

>>> Is what you want, that we call get_inner_reference once, and then use
>>> DECL_FIELD_OFFSET+DECL_FIELD_BIT_OFFSET to calculate any subsequent bit
>>> offset?  I found this to be quite tricky with padding, and such, but am
>>> willing to give it a whirl again.
>>
>> Yes.
>
> I have added a comment to this effect, and will address it along with the
> get_inner_reference() removal you have suggested as a followup.

Thanks,
Richard.


* Re: [C++0x] contiguous bitfields race implementation
  2011-09-01  7:02                                                                   ` Richard Guenther
@ 2011-09-01  7:05                                                                     ` Arnaud Charlet
  2011-09-01 14:16                                                                     ` Aldy Hernandez
  1 sibling, 0 replies; 81+ messages in thread
From: Arnaud Charlet @ 2011-09-01  7:05 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Aldy Hernandez, gcc-patches, Eric Botcazou

> >> Did you test Ada and enable the C++ memory model? ;)
> >
> > See my earlier comment on Ada.  Who would ever use the C++ memory model on
> > Ada?
> 
> People interoperating Ada with C++.  Or our bug triager Zdenek, who
> figures out the --param?

Right, that's one example. There are also actually some similarities between
the C++ memory model and the Ada language, so it's not so inconceivable
that Ada would like to take advantage of some of these capabilities.

Arno


* Re: [C++0x] contiguous bitfields race implementation
  2011-09-01  7:02                                                                   ` Richard Guenther
  2011-09-01  7:05                                                                     ` Arnaud Charlet
@ 2011-09-01 14:16                                                                     ` Aldy Hernandez
  2011-09-02  8:48                                                                       ` Richard Guenther
  1 sibling, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-09-01 14:16 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches


> My point is, the middle-end infrastructure makes it possible for this
> case to appear, and it seems to be easy to handle conservatively.
> There isn't a need to wait for users to run into an ICE or an assert we put
> there IMHO.  If I were fluent in Ada I'd write you a testcase, but I ain't.

Ughh, this is getting messier.

Ok, I propose keeping track of the field prior (lastfld), calling 
get_inner_reference() and adding DECL_SIZE (or tbitsize if you prefer) 
to calculate maxbits without the padding.

Notice the comment at the top.  We can get rid of yet another call to 
get_inner_reference later.

Is this what you had in mind?

BTW, we don't need to round up to the next byte here, do we?

Thanks.
Aldy

   /* If we found the end of the bit field sequence, include the
      padding up to the next field...  */
   if (fld)
     {
       tree end_offset, t;
       HOST_WIDE_INT end_bitpos;

       /* FIXME: Only call get_inner_reference once (at the beginning
	 of the bit region), and use
	 DECL_FIELD_OFFSET+DECL_FIELD_BIT_OFFSET throughout to
	 calculate any subsequent bit offset.  */

       /* Even if the bitfield we access (and thus the whole region) is
	 at a constant offset, the field _following_ the bitregion can
	 be at variable offset.  In this case, do not include any
	 padding.  This is mostly for Ada.  */
       if (TREE_CODE (DECL_FIELD_OFFSET (fld)) != INTEGER_CST)
	{
	  get_inner_reference (build3 (COMPONENT_REF,
				       TREE_TYPE (exp),
				       TREE_OPERAND (exp, 0),
				       lastfld, NULL_TREE),
			       &tbitsize, &end_bitpos, &end_offset,
			       &tmode, &tunsignedp, &tvolatilep, true);

	  /* Calculate the size of the bit region up to the last
	     bitfield, excluding any subsequent padding.

	     t = (end_byte_off - start_byte_off) * 8 + end_bit_off  */
	  end_offset = end_offset ? end_offset : size_zero_node;
	  t = fold_build2 (PLUS_EXPR, size_type_node,
			   fold_build2 (MULT_EXPR, size_type_node,
					fold_build2 (MINUS_EXPR, size_type_node,
						     end_offset,
						     *byte_offset),
					build_int_cst (size_type_node,
						       BITS_PER_UNIT)),
			   build_int_cst (size_type_node,
					  end_bitpos));
	  /* Add the bitsize of the last field.  */
	  t = fold_build2 (PLUS_EXPR, size_type_node,
			   t, DECL_SIZE (lastfld));

	  *maxbits = tree_low_cst (t, 1);
	  return;
	}
...
...
...


* Re: [C++0x] contiguous bitfields race implementation
  2011-08-29 12:54                                                           ` Richard Guenther
                                                                               ` (2 preceding siblings ...)
  2011-08-30 21:33                                                             ` Aldy Hernandez
@ 2011-09-01 14:53                                                             ` Aldy Hernandez
  2011-09-01 15:01                                                               ` Jason Merrill
  3 siblings, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-09-01 14:53 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Jason Merrill, gcc-patches

[Jason, can you pontificate on tail-padding and the upcoming C++ 
standard with regard to bitfields?]

> so you weren't convinced about my worries about tail-padding re-use?

To answer your question, I believe we can't touch past the last field 
(into the padding) if the subsequent record will be packed into the 
first's padding.

struct A {
   int a : 17;
};
struct B : public A {
   char c;
};

So here, if <c> gets packed into the tail-padding of A, we can't touch 
the padding of A when storing into <a>.  These are different structures, 
and I assume they would be treated as nested structures, which are distinct 
memory locations.

Is there a way of distinguishing this particular variant (possible 
tail-packing), or will we have to disallow storing into the record tail 
padding altogether?  That would seriously suck.

Aldy


* Re: [C++0x] contiguous bitfields race implementation
  2011-09-01 14:53                                                             ` Aldy Hernandez
@ 2011-09-01 15:01                                                               ` Jason Merrill
  2011-09-01 15:10                                                                 ` Aldy Hernandez
  0 siblings, 1 reply; 81+ messages in thread
From: Jason Merrill @ 2011-09-01 15:01 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Richard Guenther, gcc-patches

On 09/01/2011 10:52 AM, Aldy Hernandez wrote:
> To answer your question, I believe we can't touch past the last field
> (into the padding) if the subsequent record will be packed into the
> first's padding.

Right.

> struct A {
>   int a : 17;
> };
> struct B : public A {
>   char c;
> };
>
> So here, if <c> gets packed into the tail-padding of A, we can't touch
> the padding of A when storing into <a>.

But that doesn't apply to this testcase because A is a POD class, so we 
don't mess with its tail padding.
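
For contrast, a hypothetical non-POD variant of the testcase
(illustration only; the placement described follows the Itanium C++
ABI, where a user-declared constructor makes the tail padding
reusable):

  struct A2 {
    A2 () : a (0) {}   // user-declared ctor: A2 is no longer POD
    int a : 17;        // data size 3 bytes, 1 byte of tail padding
  };
  struct B2 : public A2 {
    char c;            // may be placed at offset 3, in A2's padding
  };

Here a 4-byte store to <a> could clobber <c>, so the bit region must
stop at the data size rather than at sizeof (A2).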

> Is there a way of distinguishing this particular variant (possible
> tail-packing), or will we have to disallow storing into the record tail
> padding altogether? That would seriously suck.

Basically you can only touch the size of the CLASSTYPE_AS_BASE variant. 
For many classes this will be the same as the size of the class itself.

Jason


* Re: [C++0x] contiguous bitfields race implementation
  2011-09-01 15:01                                                               ` Jason Merrill
@ 2011-09-01 15:10                                                                 ` Aldy Hernandez
  2011-09-01 15:20                                                                   ` Jason Merrill
  0 siblings, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-09-01 15:10 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Richard Guenther, gcc-patches


>> Is there a way of distinguishing this particular variant (possible
>> tail-packing), or will we have to disallow storing into the record tail
>> padding altogether? That would seriously suck.
>
> Basically you can only touch the size of the CLASSTYPE_AS_BASE variant.
> For many classes this will be the same as the size of the class itself.

All this code is in the middle end, so we're language agnostic.

What do we need here, a hook to query the front-end, or is it too late? 
Or will we have to play it conservative and never touch the padding 
(regardless of language)?

Aldy


* Re: [C++0x] contiguous bitfields race implementation
  2011-09-01 15:10                                                                 ` Aldy Hernandez
@ 2011-09-01 15:20                                                                   ` Jason Merrill
  2011-09-02  8:53                                                                     ` Richard Guenther
  0 siblings, 1 reply; 81+ messages in thread
From: Jason Merrill @ 2011-09-01 15:20 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Richard Guenther, gcc-patches

On 09/01/2011 11:10 AM, Aldy Hernandez wrote:
>> Basically you can only touch the size of the CLASSTYPE_AS_BASE variant.
>> For many classes this will be the same as the size of the class itself.
>
> All this code is in the middle end, so we're language agnostic.
>
> What do we need here, a hook to query the front-end, or is it too late?
> Or will we have to play it conservative and never touch the padding
> (regardless of language)?

I think it would make sense to expose this information to the back end 
somehow.  A hook would do the trick: call it type_data_size or 
type_min_size or some such, which in the C++ front end would return 
TYPE_SIZE (CLASSTYPE_AS_BASE (t)) for classes or just TYPE_SIZE for 
other types.
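
A rough sketch of the C++ side of such a hook (the names are
hypothetical, following the description above, not an existing
langhook):

  /* Sketch only, cp/ context: CLASS_TYPE_P and CLASSTYPE_AS_BASE are
     from cp-tree.h.  */
  static tree
  cxx_type_data_size (tree t)
  {
    if (CLASS_TYPE_P (t))
      return TYPE_SIZE (CLASSTYPE_AS_BASE (t));
    return TYPE_SIZE (t);
  }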

Jason


* Re: [C++0x] contiguous bitfields race implementation
  2011-09-01 14:16                                                                     ` Aldy Hernandez
@ 2011-09-02  8:48                                                                       ` Richard Guenther
  2011-09-02 12:49                                                                         ` Aldy Hernandez
  2011-09-02 20:34                                                                         ` Jeff Law
  0 siblings, 2 replies; 81+ messages in thread
From: Richard Guenther @ 2011-09-02  8:48 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: gcc-patches

On Thu, Sep 1, 2011 at 4:16 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>> My point is, the middle-end infrastructure makes it possible for this
>> case to appear, and it seems to be easy to handle conservatively.
>> There isn't a need to wait for users to run into an ICE or an assert we
>> put
>> there IMHO.  If I were fluent in Ada I'd write you a testcase, but I
>> ain't.
>
> Ughh, this is getting messier.
>
> Ok, I propose keeping track of the field prior (lastfld), calling
> get_inner_reference() and adding DECL_SIZE (or tbitsize if you prefer) to
> calculate maxbits without the padding.
>
> Notice the comment at the top.  We can get rid of yet another call to
> get_inner_reference later.
>
> Is this what you had in mind?

That could also work for the tail-padding re-use case, yes.  Note that
DECL_SIZE of the field is just the last field's bit-precision, so ...

> BTW, we don't need to round up to the next byte here, do we?

... rounding up to the next byte cannot hurt (depending on what the
caller will do with that value).
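
For reference, a minimal sketch of that rounding, assuming
BITS_PER_UNIT is a power of two (as it is on all in-tree targets):

  static unsigned HOST_WIDE_INT
  round_bits_up_to_byte (unsigned HOST_WIDE_INT bits)
  {
    return ((bits + BITS_PER_UNIT - 1)
            & ~(unsigned HOST_WIDE_INT) (BITS_PER_UNIT - 1));
  }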

Note that with all this mess I'll reiterate some of my initial thoughts.
1) why not do this C++ (or C) specific stuff in the frontends, maybe
at gimplifying/genericization time?  That way you wouldn't need to
worry about middle-end features but you could rely solely on what
C/C++ permit.  It is, after all, C++ _frontend_ semantics that we
enforce here, in the middle-end, which looks out-of-place.
2) all this information we try to reconstruct here is sort-of readily
available when we lay out the record (thus, from layout_type and
friends).  We should really, really try to preserve it there, rather
than jumping through hoops here (ideally we'd have an
(unused?) FIELD_DECL that covers the whole "bitfield group"
followed by the individual FIELD_DECLS for the bits (yep, they'd
overlap that group FIELD_DECL), and they would refer back to
that group FIELD_DECL)

Is the C++ memory model stuff going to be "ready" for 4.7?

Thanks,
Richard.

> Thanks.
> Aldy
>
>  /* If we found the end of the bit field sequence, include the
>     padding up to the next field...  */
>  if (fld)
>    {
>      tree end_offset, t;
>      HOST_WIDE_INT end_bitpos;
>
>      /* FIXME: Only call get_inner_reference once (at the beginning
>         of the bit region), and use
>         DECL_FIELD_OFFSET+DECL_FIELD_BIT_OFFSET throughout to
>         calculate any subsequent bit offset.  */
>
>      /* Even if the bitfield we access (and thus the whole region) is
>         at a constant offset, the field _following_ the bitregion can
>         be at variable offset.  In this case, do not include any
>         padding.  This is mostly for Ada.  */
>      if (TREE_CODE (DECL_FIELD_OFFSET (fld)) != INTEGER_CST)
>        {
>          get_inner_reference (build3 (COMPONENT_REF,
>                                       TREE_TYPE (exp),
>                                       TREE_OPERAND (exp, 0),
>                                       lastfld, NULL_TREE),
>                               &tbitsize, &end_bitpos, &end_offset,
>                               &tmode, &tunsignedp, &tvolatilep, true);
>
>          /* Calculate the size of the bit region up to the last
>             bitfield, excluding any subsequent padding.
>
>             t = (end_byte_off - start_byte_off) * 8 + end_bit_off  */
>          end_offset = end_offset ? end_offset : size_zero_node;
>          t = fold_build2 (PLUS_EXPR, size_type_node,
>                           fold_build2 (MULT_EXPR, size_type_node,
>                                        fold_build2 (MINUS_EXPR, size_type_node,
>                                                     end_offset,
>                                                     *byte_offset),
>                                        build_int_cst (size_type_node,
>                                                       BITS_PER_UNIT)),
>                           build_int_cst (size_type_node,
>                                          end_bitpos));
>          /* Add the bitsize of the last field.  */
>          t = fold_build2 (PLUS_EXPR, size_type_node,
>                           t, DECL_SIZE (lastfld));
>
>          *maxbits = tree_low_cst (t, 1);
>          return;
>        }
> ...
> ...
> ...
>


* Re: [C++0x] contiguous bitfields race implementation
  2011-09-01 15:20                                                                   ` Jason Merrill
@ 2011-09-02  8:53                                                                     ` Richard Guenther
  2011-09-02 14:10                                                                       ` Jason Merrill
  0 siblings, 1 reply; 81+ messages in thread
From: Richard Guenther @ 2011-09-02  8:53 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Aldy Hernandez, gcc-patches

On Thu, Sep 1, 2011 at 5:19 PM, Jason Merrill <jason@redhat.com> wrote:
> On 09/01/2011 11:10 AM, Aldy Hernandez wrote:
>>>
>>> Basically you can only touch the size of the CLASSTYPE_AS_BASE variant.
>>> For many classes this will be the same as the size of the class itself.
>>
>> All this code is in the middle end, so we're language agnostic.
>>
>> What do we need here, a hook to query the front-end, or is it too late?
>> Or will we have to play it conservative and never touch the padding
>> (regardless of language)?
>
> I think it would make sense to expose this information to the back end
> somehow.  A hook would do the trick: call it type_data_size or type_min_size
> or some such, which in the C++ front end would return TYPE_SIZE
> (CLASSTYPE_AS_BASE (t)) for classes or just TYPE_SIZE for other types.

That's too late to work with LTO; you'd need to store that information
permanently somewhere.

Maybe move this whole C++ specific bitfield handling where it belongs,
namely to the C++ frontend?

I suggest never re-using tail padding for now (I believe if your
parent object is a COMPONENT_REF, thus, x.parent.bitfield,
you can use the TYPE_SIZE vs. field-decl DECL_SIZE discrepancy
to decide about whether the tail-padding was reused, but please
double-check that ;)))
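
A sketch of that check, with the caveat above that it is unverified;
include_padding stands in for whatever flag the caller would use:

  /* Sketch only.  For x.parent.bitfield, operand 0 of EXP is the
     parent COMPONENT_REF; include_padding is a caller-provided flag.  */
  tree parent = TREE_OPERAND (exp, 0);
  if (TREE_CODE (parent) == COMPONENT_REF)
    {
      tree parent_field = TREE_OPERAND (parent, 1);
      /* If the field is narrower than its type, assume its tail
         padding may have been reused by a following field.  */
      if (simple_cst_equal (DECL_SIZE (parent_field),
                            TYPE_SIZE (TREE_TYPE (parent_field))) != 1)
        include_padding = false;
    }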

Richard.

> Jason
>
>


* Re: [C++0x] contiguous bitfields race implementation
  2011-09-02  8:48                                                                       ` Richard Guenther
@ 2011-09-02 12:49                                                                         ` Aldy Hernandez
  2011-09-02 13:05                                                                           ` Richard Guenther
  2011-09-02 20:34                                                                         ` Jeff Law
  1 sibling, 1 reply; 81+ messages in thread
From: Aldy Hernandez @ 2011-09-02 12:49 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches


> Note that with all this mess I'll reiterate some of my initial thoughts.
> 1) why not do this C++ (or C) specific stuff in the frontends, maybe
> at gimplifying/genericization time?  That way you wouldn't need to
> worry about middle-end features but you could rely solely on what
> C/C++ permit.  It is, after all, C++ _frontend_ semantics that we
> enforce here, in the middle-end, which looks out-of-place.

The front-end, really?  After all this going back and forth?  After you 
were all so worried about Ada, and now you're ditching it in favor of 
handling only C++?

> Is the C++ memory model stuff going to be "ready" for 4.7?

No, not if you expect me to rewrite things every day.


* Re: [C++0x] contiguous bitfields race implementation
  2011-09-02 12:49                                                                         ` Aldy Hernandez
@ 2011-09-02 13:05                                                                           ` Richard Guenther
  0 siblings, 0 replies; 81+ messages in thread
From: Richard Guenther @ 2011-09-02 13:05 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: gcc-patches

On Fri, Sep 2, 2011 at 2:49 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>> Note that with all this mess I'll re-iterate some of my initial thoughts.
>> 1) why not do this C++ (or C) specific stuff in the frontends, maybe
>> at gimplifying/genericization time?  That way you wouldn't need to
>> worry about middle-end features but you could rely solely on what
>> C/C++ permit.  It is, after all, C++ _frontend_ semantics that we
>> enforce here, in the middle-end, which looks out-of-place.
>
> The front-end, really?  After all this going back and forth?

Well, I'm fine with handling it in the middle-end if it's correct there.

> After you were
> all so worried about Ada, and now you're ditching it in favor of handling
> only C++?

I'm just showing you a possible solution for where you'd not need to
worry ;)  Consider LTOing an Ada and a C++ module - you need to
enable the C++ memory model at link-time so it is in effect when we
process bit-fields.  That will automatically enable it for the Ada pieces, too.

>> Is the C++ memory model stuff going to be "ready" for 4.7?
>
> No, not if you expect me to rewrite things every day.

I don't expect you to rewrite things every day.

Don't read every comment I make as a definite decision and order to
you.  I am a mere mortal, too, and the bitfield thing is, I must admit,
still partially a mystery to myself (which is why I keep asking questions
instead of simply providing you with definite answers).  After all, I pushed
back my idea of lowering bitfield accesses somewhere on GIMPLE and
I'm not sure if I'll get back to it for 4.7.  And I definitely would consider
2) for that work.

Btw, it would be nice if I weren't the only one reading your updated
patches :/  I'm just punching holes where I see them and hope you and
I learn something in the process.

Richard.


* Re: [C++0x] contiguous bitfields race implementation
  2011-09-02  8:53                                                                     ` Richard Guenther
@ 2011-09-02 14:10                                                                       ` Jason Merrill
  2011-09-02 14:38                                                                         ` Richard Guenther
  0 siblings, 1 reply; 81+ messages in thread
From: Jason Merrill @ 2011-09-02 14:10 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Aldy Hernandez, gcc-patches

On 09/02/2011 04:53 AM, Richard Guenther wrote:
> On Thu, Sep 1, 2011 at 5:19 PM, Jason Merrill<jason@redhat.com>  wrote:
>> I think it would make sense to expose this information to the back end
>> somehow.  A hook would do the trick: call it type_data_size or type_min_size
>> or some such, which in the C++ front end would return TYPE_SIZE
>> (CLASSTYPE_AS_BASE (t)) for classes or just TYPE_SIZE for other types.
>
> That's too late to work with LTO; you'd need to store that information
> permanently somewhere.

OK.

> Maybe move this whole C++ specific bitfield handling where it belongs,
> namely to the C++ frontend?

I don't think that is the way to go; C is adopting the same memory 
model, and this is the only sane thing to do with bit-fields.

> I suggest never re-using tail padding for now (I believe if your
> parent object is a COMPONENT_REF, thus, x.parent.bitfield,
> you can use the TYPE_SIZE vs. field-decl DECL_SIZE discrepancy
> to decide about whether the tail-padding was reused, but please
> double-check that ;)))

But you don't always have a COMPONENT_REF; you still need to avoid 
touching the tail padding when you just have a pointer to the type 
because it might be a base sub-object.

I wonder what would break if C++ just set TYPE_SIZE to the as-base size?

Jason


* Re: [C++0x] contiguous bitfields race implementation
  2011-09-02 14:10                                                                       ` Jason Merrill
@ 2011-09-02 14:38                                                                         ` Richard Guenther
  2011-09-07 18:12                                                                           ` Jason Merrill
  0 siblings, 1 reply; 81+ messages in thread
From: Richard Guenther @ 2011-09-02 14:38 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Aldy Hernandez, gcc-patches

On Fri, Sep 2, 2011 at 4:10 PM, Jason Merrill <jason@redhat.com> wrote:
> On 09/02/2011 04:53 AM, Richard Guenther wrote:
>>
>> On Thu, Sep 1, 2011 at 5:19 PM, Jason Merrill<jason@redhat.com>  wrote:
>>>
>>> I think it would make sense to expose this information to the back end
>>> somehow.  A hook would do the trick: call it type_data_size or
>>> type_min_size
>>> or some such, which in the C++ front end would return TYPE_SIZE
>>> (CLASSTYPE_AS_BASE (t)) for classes or just TYPE_SIZE for other types.
>>
>> That's too late to work with LTO; you'd need to store that information
>> permanently somewhere.
>
> OK.
>
>> Maybe move this whole C++ specific bitfield handling where it belongs,
>> namely to the C++ frontend?
>
> I don't think that is the way to go; C is adopting the same memory model,
> and this is the only sane thing to do with bit-fields.
>
>> I suggest never re-using tail padding for now (I believe if your
>> parent object is a COMPONENT_REF, thus, x.parent.bitfield,
>> you can use the TYPE_SIZE vs. field-decl DECL_SIZE discrepancy
>> to decide about whether the tail-padding was reused, but please
>> double-check that ;)))
>
> But you don't always have a COMPONENT_REF; you still need to avoid touching
> the tail padding when you just have a pointer to the type because it might
> be a base sub-object.
>
> I wonder what would break if C++ just set TYPE_SIZE to the as-base size?

Good question.  Probably argument passing, as the as-base size wouldn't
get a proper mode assigned from layout_type then(?) for small structs?

Maybe worth a try ...

Richard.

> Jason
>


* Re: [C++0x] contiguous bitfields race implementation
  2011-09-02  8:48                                                                       ` Richard Guenther
  2011-09-02 12:49                                                                         ` Aldy Hernandez
@ 2011-09-02 20:34                                                                         ` Jeff Law
  1 sibling, 0 replies; 81+ messages in thread
From: Jeff Law @ 2011-09-02 20:34 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Aldy Hernandez, gcc-patches

On 09/02/11 02:48, Richard Guenther wrote:
> 
> Note that with all this mess I'll reiterate some of my initial
> thoughts. 1) why not do this C++ (or C) specific stuff in the
> frontends, maybe at gimplifying/genericization time?  That way you
> wouldn't need to worry about middle-end features but you could rely
> solely on what C/C++ permit.  It is, after all, C++ _frontend_
> semantics that we enforce here, in the middle-end, which looks
> out-of-place.
Well, it's worth keeping in mind that fixing the way we handle bitfields
is just one piece of a larger project.  Furthermore, many of the ideas
in the C++ memory model are applicable to other languages.

However, I must admit, I'm somewhat at a loss; I thought we were doing
all this in stor-layout.c at the time we lay out the structure's memory
form, then just trying to keep the code generator and optimizers from
mucking things up by combining accesses and the like.

Clearly I'm going to need to sit down and review the code as well.
Which means learning about a part of GCC I've largely been able to
ignore...  Sigh...

jeff


* Re: [C++0x] contiguous bitfields race implementation
  2011-09-02 14:38                                                                         ` Richard Guenther
@ 2011-09-07 18:12                                                                           ` Jason Merrill
  0 siblings, 0 replies; 81+ messages in thread
From: Jason Merrill @ 2011-09-07 18:12 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Aldy Hernandez, gcc-patches

On 09/02/2011 10:38 AM, Richard Guenther wrote:
> On Fri, Sep 2, 2011 at 4:10 PM, Jason Merrill<jason@redhat.com>  wrote:
>> I wonder what would break if C++ just set TYPE_SIZE to the as-base size?
>
> Good question.  Probably argument passing, as the as-base size wouldn't
> get a proper mode assigned from layout_type then(?) for small structs?

Classes for which the as-base size is different are passed by invisible 
reference, so that wouldn't be an issue.

But layout_decl would get the wrong size for variables and fields of the 
type, so that won't work.

Perhaps it's time to get serious about the change I talked about in 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22488#c42 ...

Jason


end of thread

Thread overview: 81+ messages
2011-05-09 17:12 [C++0x] contiguous bitfields race implementation Aldy Hernandez
2011-05-09 18:04 ` Jeff Law
2011-05-09 18:05   ` Aldy Hernandez
2011-05-09 19:19     ` Jeff Law
2011-05-09 20:11   ` Aldy Hernandez
2011-05-09 20:28     ` Jakub Jelinek
2011-05-10 11:42       ` Richard Guenther
2011-05-09 20:49     ` Jason Merrill
2011-05-13 22:35       ` Aldy Hernandez
2011-05-16 21:20         ` Aldy Hernandez
2011-05-19  7:17         ` Jason Merrill
2011-05-20  9:21           ` Aldy Hernandez
2011-05-26 18:05             ` Jason Merrill
2011-05-26 18:28               ` Aldy Hernandez
2011-05-26 19:07                 ` Jason Merrill
2011-05-26 20:19                   ` Aldy Hernandez
2011-05-27 20:41                     ` Jason Merrill
2011-07-18 13:10                       ` Aldy Hernandez
2011-07-22 19:16                         ` Jason Merrill
2011-07-25 17:41                           ` Aldy Hernandez
2011-07-26  5:28                             ` Jason Merrill
2011-07-26 18:37                               ` Aldy Hernandez
2011-07-26 17:54                                 ` Jason Merrill
2011-07-26 17:51                                   ` Aldy Hernandez
2011-07-26 18:05                                     ` Jason Merrill
2011-07-27 15:03                                       ` Richard Guenther
2011-07-27 15:12                                         ` Richard Guenther
2011-07-27 15:53                                           ` Richard Guenther
2011-07-28 13:00                                             ` Richard Guenther
2011-07-29  2:58                                               ` Jason Merrill
2011-07-29 12:02                                               ` Aldy Hernandez
2011-07-29 11:00                                                 ` Richard Guenther
2011-08-01 13:51                                                   ` Richard Guenther
2011-08-05 17:28                                               ` Aldy Hernandez
2011-08-09 10:52                                                 ` Richard Guenther
2011-08-09 20:53                                                   ` Aldy Hernandez
2011-08-10 13:34                                                     ` Richard Guenther
2011-08-15 19:26                                                       ` Aldy Hernandez
2011-08-27  0:05                                                         ` Aldy Hernandez
2011-08-29 12:54                                                           ` Richard Guenther
2011-08-30 16:07                                                             ` Aldy Hernandez
2011-08-31  8:38                                                               ` Richard Guenther
2011-08-31 13:56                                                                 ` Richard Guenther
2011-08-31 20:37                                                                   ` Aldy Hernandez
2011-09-01  6:58                                                                     ` Richard Guenther
2011-08-30 16:53                                                             ` Aldy Hernandez
2011-08-31  8:55                                                               ` Richard Guenther
2011-08-31 17:24                                                                 ` Aldy Hernandez
2011-08-30 21:33                                                             ` Aldy Hernandez
2011-08-31  8:55                                                               ` Richard Guenther
2011-08-31 20:37                                                                 ` Aldy Hernandez
2011-09-01  7:02                                                                   ` Richard Guenther
2011-09-01  7:05                                                                     ` Arnaud Charlet
2011-09-01 14:16                                                                     ` Aldy Hernandez
2011-09-02  8:48                                                                       ` Richard Guenther
2011-09-02 12:49                                                                         ` Aldy Hernandez
2011-09-02 13:05                                                                           ` Richard Guenther
2011-09-02 20:34                                                                         ` Jeff Law
2011-09-01 14:53                                                             ` Aldy Hernandez
2011-09-01 15:01                                                               ` Jason Merrill
2011-09-01 15:10                                                                 ` Aldy Hernandez
2011-09-01 15:20                                                                   ` Jason Merrill
2011-09-02  8:53                                                                     ` Richard Guenther
2011-09-02 14:10                                                                       ` Jason Merrill
2011-09-02 14:38                                                                         ` Richard Guenther
2011-09-07 18:12                                                                           ` Jason Merrill
2011-07-28 19:42                                             ` Aldy Hernandez
2011-07-27 18:22                                           ` Aldy Hernandez
2011-07-28  8:52                                             ` Richard Guenther
2011-07-29 12:05                                               ` Aldy Hernandez
2011-07-28 19:58                                                 ` Richard Guenther
2011-07-27 17:29                                         ` Aldy Hernandez
2011-07-27 17:57                                           ` Andrew MacLeod
2011-07-27 22:27                                             ` Joseph S. Myers
2011-07-28  8:58                                             ` Richard Guenther
2011-07-28 22:26                                         ` Aldy Hernandez
2011-07-26 20:05                               ` Aldy Hernandez
2011-07-27 18:24                             ` H.J. Lu
2011-07-27 20:39                               ` Aldy Hernandez
2011-07-27 20:54                                 ` Jakub Jelinek
2011-07-27 21:00                                   ` Aldy Hernandez
