public inbox for gcc-patches@gcc.gnu.org
* [RFC] Implement Undefined Behavior Sanitizer
@ 2013-06-05 17:57 Marek Polacek
  2013-06-05 18:44 ` Andrew Pinski
                   ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Marek Polacek @ 2013-06-05 17:57 UTC
  To: GCC Patches

Hi!

This is an attempt to add the Undefined Behavior Sanitizer to GCC.
Note that it's a very early alpha version; so far it doesn't do much.
At the moment it should handle division by zero, INT_MIN / -1,
and various shift cases (shifting by a negative value, shifting when
the second operand is >= TYPE_PRECISION (first_operand), and suchlike),
on integer types only so far.
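
For illustration only (this is not code from the patch), here is a minimal
C sketch of the two division cases listed above, written as a plain
predicate on int operands.  The names div_is_undefined and checked_div are
made up for the example, and assert () merely stands in for a call into the
ubsan runtime.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True when a / b (or a % b) is undefined for int operands:
   division by zero, or INT_MIN / -1 whose result does not fit in int.  */
static bool
div_is_undefined (int a, int b)
{
  return b == 0 || (a == INT_MIN && b == -1);
}

/* Hypothetical wrapper: check first, then divide.  The assert is a
   stand-in for the runtime handler, not the real ubsan interface.  */
static int
checked_div (int a, int b)
{
  assert (!div_is_undefined (a, b));
  return a / b;
}
```

The check is evaluated before the division, which is the same ordering the
instrumentation aims for.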

It works by creating a COMPOUND_EXPR around the original expression, so e.g.
it creates:

if (b < 0 || (b > 31 || a < 0))
  {
    __builtin___ubsan_handle_shift_out_of_bounds ();
  }
else
  {
    0
  }, a << b;

from the original "a <<= b;".
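
As a rough source-level equivalent (a sketch under assumptions, not what the
front end actually emits), the condition above can be written as an ordinary
C predicate.  INT_BITS, shl_is_flagged and checked_shl are hypothetical
names; the literal 31 in the example corresponds to INT_BITS - 1 when int is
32 bits wide, and assert () again stands in for the runtime handler.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Width of int in bits; assumes int has no padding bits.  */
#define INT_BITS ((int) (sizeof (int) * CHAR_BIT))

/* Mirrors "b < 0 || (b > 31 || a < 0)" for a left shift a << b.  */
static bool
shl_is_flagged (int a, int b)
{
  return b < 0 || b > INT_BITS - 1 || a < 0;
}

/* Hypothetical wrapper: check first, then shift.  The shift is done on
   unsigned so that the sketch itself cannot overflow; A is known to be
   non-negative and B in range once the check has passed.  */
static unsigned
checked_shl (int a, int b)
{
  assert (!shl_is_flagged (a, b));
  return (unsigned) a << b;
}
```

Like the condition in the example, this predicate does not look at whether
the shifted value still fits in a signed int.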

There is of course a lot of stuff that needs to be done, more
specifically:
  0) fix an ICE which I've noticed right now ;(
        long a = 1;
        int b = 3;
        a <<= b;
      (error: mismatching comparison operand types)
      temporarily solved by surrounding "doing_shift = true;"
      with if (comptypes (type0, type1))
      But that needs a better solution I'm afraid.  Bah.
  1) import & build the ubsan library from LLVM
     I've already spent some time on this, but failed miserably.  I thought
     that importing ubsan/ from LLVM into libsanitizer/, adding
     libsanitizer/ubsan/Makefile.{am,in}, editing libsanitizer/Makefile.am
     and libsanitizer/configure.ac, then something like aclocal && automake
     could be sufficient, but no.  I'd very much appreciate any help with
     this; is someone willing to help me with this one?  And it seemed so easy...
  2) construct arguments for ubsan library
     I guess that if we want to call for instance
     void __ubsan::__ubsan_handle_shift_out_of_bounds(ShiftOutOfBoundsData *Data,
                                                 ValueHandle LHS, ValueHandle RHS)
     from GCC, we need to construct arguments compatible with
     ShiftOutOfBoundsData/ValueHandle.  
     So, perhaps we need some helper function that constructs the CALL_EXPR
     for the builtin; so far I haven't spent much time on this and don't know
     what exactly to do here.  Time to look at what asan/tsan do.
  3) add parsing of -fsanitize=<...>
     LLVM supports e.g. -fsanitize=shift,divbyzero combination, we should too.
     This doesn't sound like a big deal; just parse the arguments and set
     various flags, or error out on invalid combinations.
  4) and of course, more instrumentation (C/C++ FE, gimple level)
     What comes to mind is:
     - float/double to integer conversions,
     - integer overflows (a long list of various cases here),
     - invalid conversions of int to bool,
     - reaching a __builtin_unreachable() call,
     - VLAs size (e.g. negative size),
     - store to/load of misaligned address,
     - store to/load of null pointer,
     - etc.
     For the time being, I plan to work on overflows instrumentation.
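
To make item 3) a bit more concrete, here is a minimal, self-contained C
sketch of parsing a comma-separated -fsanitize= argument into a bit mask.
It does not use GCC's option machinery; the enum values and the function
parse_sanitize_arg are hypothetical, and the two item names are simply the
ones from the example above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag bits, made up for this sketch.  */
enum
{
  SAN_SHIFT = 1 << 0,
  SAN_DIVBYZERO = 1 << 1
};

/* Parse ARG, the text after "-fsanitize=", as a comma-separated list.
   On success store the combined mask in *FLAGS and return true.
   Return false, leaving *FLAGS untouched, on an empty or unknown item.  */
static bool
parse_sanitize_arg (const char *arg, unsigned *flags)
{
  unsigned mask = 0;

  assert (arg != NULL && flags != NULL);
  while (true)
    {
      /* Length of the current item, up to the next comma or the end.  */
      size_t len = strcspn (arg, ",");

      if (len == 5 && strncmp (arg, "shift", len) == 0)
        mask |= SAN_SHIFT;
      else if (len == 9 && strncmp (arg, "divbyzero", len) == 0)
        mask |= SAN_DIVBYZERO;
      else
        return false;

      if (arg[len] == '\0')
        break;
      arg += len + 1;  /* Skip the item and the comma after it.  */
    }

  *flags = mask;
  return true;
}
```

A caller would report an error when the function returns false and
otherwise set the individual flags from the mask.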

Regtested/bootstrapped on x86_64-linux.

Comments, please?

2013-06-05  Marek Polacek  <polacek@redhat.com>

	* Makefile.in: Add ubsan.c
	* common.opt: Add -fsanitize=undefined option.
	* doc/invoke.texi: Document the new flag.
	* ubsan.h: New file.
	* ubsan.c: New file.
	* sanitizer.def: Define BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW and
	BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS.
	* builtins.def (DEF_SANITIZER_BUILTIN): Also enable the builtins
	for flag_ubsan.
	* cp/typeck.c (cp_build_binary_op): Add division by zero and shift
	instrumentation.
	* c/c-typeck.c (build_binary_op): Likewise.
	* builtin-attrs.def: Define ATTR_COLD and
	ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST.
	* asan.c (initialize_sanitizer_builtins): Build
	BT_FN_VOID_PTR_PTR_PTR.

--- gcc/sanitizer.def.mp	2013-06-05 18:23:41.077439836 +0200
+++ gcc/sanitizer.def	2013-06-05 18:26:04.749921181 +0200
@@ -283,3 +283,13 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOM
 DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC_SIGNAL_FENCE,
 		      "__tsan_atomic_signal_fence",
 		      BT_FN_VOID_INT, ATTR_NOTHROW_LEAF_LIST)
+
+/* Undefined Behavior Sanitizer */
+DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW,
+		      "__ubsan_handle_divrem_overflow",
+		      BT_FN_VOID_PTR_PTR_PTR,
+		      ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS,
+		      "__ubsan_handle_shift_out_of_bounds",
+		      BT_FN_VOID_PTR_PTR_PTR,
+		      ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
--- gcc/builtins.def.mp	2013-06-05 18:23:41.072439816 +0200
+++ gcc/builtins.def	2013-06-05 18:26:04.728921097 +0200
@@ -155,7 +155,7 @@ along with GCC; see the file COPYING3.
 #define DEF_SANITIZER_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,    \
 	       true, true, true, ATTRS, true, \
-	       (flag_asan || flag_tsan))
+	       (flag_asan || flag_tsan || flag_ubsan))
 
 #undef DEF_CILKPLUS_BUILTIN
 #define DEF_CILKPLUS_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
--- gcc/Makefile.in.mp	2013-06-05 18:23:25.807388466 +0200
+++ gcc/Makefile.in	2013-06-05 18:26:04.723921077 +0200
@@ -1377,6 +1377,7 @@ OBJS = \
 	tree-affine.o \
 	asan.o \
 	tsan.o \
+	ubsan.o \
 	tree-call-cdce.o \
 	tree-cfg.o \
 	tree-cfgcleanup.o \
@@ -2259,6 +2260,10 @@ tsan.o : $(CONFIG_H) $(SYSTEM_H) $(TREE_
    $(TM_P_H) $(TREE_FLOW_H) $(DIAGNOSTIC_CORE_H) $(GIMPLE_H) tree-iterator.h \
    intl.h cfghooks.h output.h options.h c-family/c-common.h tsan.h asan.h \
    tree-ssa-propagate.h
+ubsan.o : ubsan.c $(CONFIG_H) $(SYSTEM_H) $(GIMPLE_H) \
+   output.h coretypes.h $(GIMPLE_PRETTY_PRINT_H) $(CFGLOOP_H) \
+   tree-iterator.h $(TREE_FLOW_H) $(TREE_PASS_H) \
+   $(TARGET_H) $(EXPR_H) $(OPTABS_H) $(TM_P_H) langhooks.h
 tree-ssa-tail-merge.o: tree-ssa-tail-merge.c \
    $(SYSTEM_H) $(CONFIG_H) coretypes.h $(TM_H) $(BITMAP_H) \
    $(FLAGS_H) $(TM_P_H) $(BASIC_BLOCK_H) $(CFGLOOP_H) \
--- gcc/doc/invoke.texi.mp	2013-06-05 18:29:18.301611796 +0200
+++ gcc/doc/invoke.texi	2013-06-05 18:33:53.756623280 +0200
@@ -5143,6 +5143,11 @@ Memory access instructions will be instr
 data race bugs.
 See @uref{http://code.google.com/p/data-race-test/wiki/ThreadSanitizer} for more details.
 
+@item -fsanitize=undefined
+Enable UndefinedBehaviorSanitizer, a fast undefined behavior detector.
+Various computations will be instrumented to detect
+undefined behavior, e.g. division by zero or various overflows.
+
 @item -fdump-final-insns@r{[}=@var{file}@r{]}
 @opindex fdump-final-insns
 Dump the final internal representation (RTL) to @var{file}.  If the
--- gcc/ubsan.h.mp	2013-06-05 18:23:55.083486235 +0200
+++ gcc/ubsan.h	2013-06-05 18:10:21.284693807 +0200
@@ -0,0 +1,27 @@
+/* UndefinedBehaviorSanitizer, undefined behavior detector.
+   Copyright (C) 2013 Free Software Foundation, Inc.
+   Contributed by Marek Polacek <polacek@redhat.com>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_UBSAN_H
+#define GCC_UBSAN_H
+
+extern tree ubsan_instrument_division (location_t, enum tree_code, tree, tree);
+extern tree ubsan_instrument_shift (location_t, enum tree_code, tree, tree);
+
+#endif  /* GCC_UBSAN_H  */
--- gcc/ubsan.c.mp	2013-06-05 18:23:49.411467508 +0200
+++ gcc/ubsan.c	2013-06-05 18:00:25.000000000 +0200
@@ -0,0 +1,107 @@
+/* UndefinedBehaviorSanitizer, undefined behavior detector.
+   Copyright (C) 2013 Free Software Foundation, Inc.
+   Contributed by Marek Polacek <polacek@redhat.com>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tree.h"
+#include "c-family/c-common.h"
+
+/* Instrument division by zero and INT_MIN / -1.  */
+
+tree
+ubsan_instrument_division (location_t loc, enum tree_code code,
+			   tree op0, tree op1)
+{
+  tree t, tt;
+  tree orig = build2 (code, TREE_TYPE (op0), op0, op1);
+
+  if (TREE_CODE (TREE_TYPE (op0)) != INTEGER_TYPE
+      || TREE_CODE (TREE_TYPE (op1)) != INTEGER_TYPE)
+    return orig;
+
+  /* If we *know* that the divisor is not -1 or 0, we don't have to
+     instrument this expression.
+     ??? We could use decl_constant_value to cover up more cases.  */
+  if (TREE_CODE (op1) == INTEGER_CST
+      && integer_nonzerop (op1)
+      && !integer_minus_onep (op1))
+    return orig;
+
+  tt = fold_build2 (EQ_EXPR, boolean_type_node, op1,
+		    integer_minus_one_node);
+  t = fold_build2 (EQ_EXPR, boolean_type_node, op0,
+		   TYPE_MIN_VALUE (TREE_TYPE (op0)));
+  t = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, t, tt);
+  tt = build2 (EQ_EXPR, boolean_type_node,
+	       op1, integer_zero_node);
+  t = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, tt, t);
+  tt = builtin_decl_explicit (BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW);
+  // XXX Do we want _loc version here?
+  tt = build_call_expr_loc (loc, tt, 0);
+  t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node);
+  t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (orig), t, orig);
+
+  return t;
+}
+
+/* Instrument left and right shifts.  */
+
+tree
+ubsan_instrument_shift (location_t loc, enum tree_code code,
+			tree op0, tree op1)
+{
+  tree t, tt;
+  tree orig = build2 (code, TREE_TYPE (op0), op0, op1);
+  tree prec = build_int_cst (TREE_TYPE (op0),
+			     TYPE_PRECISION (TREE_TYPE (op0)));
+
+  t = fold_build2 (LT_EXPR, boolean_type_node, op1, integer_zero_node);
+  tt = fold_build2 (GE_EXPR, boolean_type_node, op1, prec);
+
+  /* int a = 1;
+     a <<= 31;
+     is undefined in C99/C11.  */
+  if (code == LSHIFT_EXPR
+      && !TYPE_UNSIGNED (TREE_TYPE (op0))
+      && (flag_isoc99 || flag_isoc11))
+    {
+      tree prec1 = build_int_cst (TREE_TYPE (op1),
+				  TYPE_PRECISION (TREE_TYPE (op0)) - 1);
+      tree x = fold_build2 (EQ_EXPR, boolean_type_node, op1, prec1);
+      tt = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, tt, x);
+    }
+
+  /* For left shift, shifting a negative value is undefined.  */
+  if (code == LSHIFT_EXPR)
+    {
+      tree x = fold_build2 (LT_EXPR, boolean_type_node, op0,
+			    integer_zero_node);
+      tt = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, tt, x);
+    }
+
+  t = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, t, tt);
+  tt = builtin_decl_explicit (BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS);
+  tt = build_call_expr_loc (loc, tt, 0);
+  t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node);
+  t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (orig), t, orig);
+
+  return t;
+}
--- gcc/cp/typeck.c.mp	2013-06-05 18:23:41.076439832 +0200
+++ gcc/cp/typeck.c	2013-06-05 18:26:04.746921169 +0200
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.
 #include "intl.h"
 #include "target.h"
 #include "convert.h"
+#include "ubsan.h"
 #include "c-family/c-common.h"
 #include "c-family/c-objc.h"
 #include "params.h"
@@ -3891,6 +3892,12 @@ cp_build_binary_op (location_t location,
   op0 = orig_op0;
   op1 = orig_op1;
 
+  /* Remember whether we're doing / or %.  */
+  bool doing_div_or_mod = false;
+
+  /* Remember whether we're doing << or >>.  */
+  bool doing_shift = false;
+
   if (code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR
       || code == TRUTH_OR_EXPR || code == TRUTH_ORIF_EXPR
       || code == TRUTH_XOR_EXPR)
@@ -4070,8 +4077,15 @@ cp_build_binary_op (location_t location,
 	{
 	  enum tree_code tcode0 = code0, tcode1 = code1;
 	  tree cop1 = fold_non_dependent_expr_sfinae (op1, tf_none);
+	  cop1 = maybe_constant_value (cop1);
+
+	  if (!processing_template_decl && tcode0 == INTEGER_TYPE
+	      && (TREE_CODE (cop1) != INTEGER_CST
+		  || integer_zerop (cop1)
+		  || integer_minus_onep (cop1)))
+	    doing_div_or_mod = true;
 
-	  warn_for_div_by_zero (location, maybe_constant_value (cop1));
+	  warn_for_div_by_zero (location, cop1);
 
 	  if (tcode0 == COMPLEX_TYPE || tcode0 == VECTOR_TYPE)
 	    tcode0 = TREE_CODE (TREE_TYPE (TREE_TYPE (op0)));
@@ -4109,8 +4123,14 @@ cp_build_binary_op (location_t location,
     case FLOOR_MOD_EXPR:
       {
 	tree cop1 = fold_non_dependent_expr_sfinae (op1, tf_none);
+	cop1 = maybe_constant_value (cop1);
 
-	warn_for_div_by_zero (location, maybe_constant_value (cop1));
+	if (!processing_template_decl && code0 == INTEGER_TYPE
+	    && (TREE_CODE (cop1) != INTEGER_CST
+		|| integer_zerop (cop1)
+		|| integer_minus_onep (cop1)))
+	  doing_div_or_mod = true;
+	warn_for_div_by_zero (location, cop1);
       }
 
       if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE
@@ -4164,6 +4184,7 @@ cp_build_binary_op (location_t location,
 	  if (TREE_CODE (const_op1) != INTEGER_CST)
 	    const_op1 = op1;
 	  result_type = type0;
+	  doing_shift = true;
 	  if (TREE_CODE (const_op1) == INTEGER_CST)
 	    {
 	      if (tree_int_cst_lt (const_op1, integer_zero_node))
@@ -4211,6 +4232,7 @@ cp_build_binary_op (location_t location,
 	  if (TREE_CODE (const_op1) != INTEGER_CST)
 	    const_op1 = op1;
 	  result_type = type0;
+	  doing_shift = true;
 	  if (TREE_CODE (const_op1) == INTEGER_CST)
 	    {
 	      if (tree_int_cst_lt (const_op1, integer_zero_node))
@@ -4607,6 +4629,18 @@ cp_build_binary_op (location_t location,
       break;
     }
 
+  if (flag_ubsan && doing_div_or_mod)
+    {
+      resultcode = COMPOUND_EXPR;
+      return ubsan_instrument_division (location, code, op0, op1);
+    }
+
+  if (flag_ubsan && doing_shift)
+    {
+      resultcode = COMPOUND_EXPR;
+      return ubsan_instrument_shift (location, code, op0, op1);
+    }
+
   if (((code0 == INTEGER_TYPE || code0 == REAL_TYPE || code0 == COMPLEX_TYPE
 	|| code0 == ENUMERAL_TYPE)
        && (code1 == INTEGER_TYPE || code1 == REAL_TYPE
--- gcc/common.opt.mp	2013-06-05 18:23:41.075439828 +0200
+++ gcc/common.opt	2013-06-05 18:26:04.740921145 +0200
@@ -858,6 +858,10 @@ fsanitize=thread
 Common Report Var(flag_tsan)
 Enable ThreadSanitizer, a data race detector
 
+fsanitize=undefined
+Common Report Var(flag_ubsan)
+Enable UndefinedBehaviorSanitizer, an undefined behavior detector
+
 fasynchronous-unwind-tables
 Common Report Var(flag_asynchronous_unwind_tables) Optimization
 Generate unwind tables that are exact at each instruction boundary
--- gcc/builtin-attrs.def.mp	2013-06-05 18:23:41.071439812 +0200
+++ gcc/builtin-attrs.def	2013-06-05 18:26:04.727921093 +0200
@@ -83,6 +83,7 @@ DEF_LIST_INT_INT (5,6)
 #undef DEF_LIST_INT_INT
 
 /* Construct trees for identifiers.  */
+DEF_ATTR_IDENT (ATTR_COLD, "cold")
 DEF_ATTR_IDENT (ATTR_CONST, "const")
 DEF_ATTR_IDENT (ATTR_FORMAT, "format")
 DEF_ATTR_IDENT (ATTR_FORMAT_ARG, "format_arg")
@@ -130,6 +131,8 @@ DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHRO
 			ATTR_NULL, ATTR_NOTHROW_LIST)
 DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHROW_LEAF_LIST, ATTR_NORETURN,\
 			ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
+DEF_ATTR_TREE_LIST (ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST, ATTR_COLD,\
+			ATTR_NULL, ATTR_NORETURN_NOTHROW_LEAF_LIST)
 DEF_ATTR_TREE_LIST (ATTR_CONST_NORETURN_NOTHROW_LEAF_LIST, ATTR_CONST,\
 			ATTR_NULL, ATTR_NORETURN_NOTHROW_LEAF_LIST)
 DEF_ATTR_TREE_LIST (ATTR_MALLOC_NOTHROW_LIST, ATTR_MALLOC,	\
--- gcc/c/c-typeck.c.mp	2013-06-05 18:23:41.073439820 +0200
+++ gcc/c/c-typeck.c	2013-06-05 18:26:04.736921129 +0200
@@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.
 #include "tree-iterator.h"
 #include "bitmap.h"
 #include "gimple.h"
+#include "ubsan.h"
 #include "c-family/c-objc.h"
 #include "c-family/c-common.h"
 
@@ -9542,6 +9543,12 @@ build_binary_op (location_t location, en
      operands to truth-values.  */
   bool boolean_op = false;
 
+  /* Remember whether we're doing / or %.  */
+  bool doing_div_or_mod = false;
+
+  /* Remember whether we're doing << or >>.  */
+  bool doing_shift = false;
+
   if (location == UNKNOWN_LOCATION)
     location = input_location;
 
@@ -9743,6 +9750,7 @@ build_binary_op (location_t location, en
     case FLOOR_DIV_EXPR:
     case ROUND_DIV_EXPR:
     case EXACT_DIV_EXPR:
+      doing_div_or_mod = true;
       warn_for_div_by_zero (location, op1);
 
       if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
@@ -9790,6 +9798,7 @@ build_binary_op (location_t location, en
 
     case TRUNC_MOD_EXPR:
     case FLOOR_MOD_EXPR:
+      doing_div_or_mod = true;
       warn_for_div_by_zero (location, op1);
 
       if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE
@@ -9888,6 +9897,7 @@ build_binary_op (location_t location, en
       else if ((code0 == INTEGER_TYPE || code0 == FIXED_POINT_TYPE)
 	  && code1 == INTEGER_TYPE)
 	{
+	  doing_shift = true;
 	  if (TREE_CODE (op1) == INTEGER_CST)
 	    {
 	      if (tree_int_cst_sgn (op1) < 0)
@@ -9940,6 +9950,7 @@ build_binary_op (location_t location, en
       else if ((code0 == INTEGER_TYPE || code0 == FIXED_POINT_TYPE)
 	  && code1 == INTEGER_TYPE)
 	{
+	  doing_shift = true;
 	  if (TREE_CODE (op1) == INTEGER_CST)
 	    {
 	      if (tree_int_cst_sgn (op1) < 0)
@@ -10224,6 +10235,20 @@ build_binary_op (location_t location, en
       return error_mark_node;
     }
 
+  if (flag_ubsan && doing_div_or_mod)
+    {
+      ret = ubsan_instrument_division (location, code, op0, op1);
+      resultcode = COMPOUND_EXPR;
+      goto return_build_binary_op;
+    }
+
+  if (flag_ubsan && doing_shift)
+    {
+      ret = ubsan_instrument_shift (location, code, op0, op1);
+      resultcode = COMPOUND_EXPR;
+      goto return_build_binary_op;
+    }
+
   if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE || code0 == COMPLEX_TYPE
        || code0 == FIXED_POINT_TYPE || code0 == VECTOR_TYPE)
       &&
--- gcc/asan.c.mp	2013-06-05 18:23:41.070439808 +0200
+++ gcc/asan.c	2013-06-05 18:26:04.726921089 +0200
@@ -2034,6 +2034,9 @@ initialize_sanitizer_builtins (void)
   tree BT_FN_VOID = build_function_type_list (void_type_node, NULL_TREE);
   tree BT_FN_VOID_PTR
     = build_function_type_list (void_type_node, ptr_type_node, NULL_TREE);
+  tree BT_FN_VOID_PTR_PTR_PTR
+    = build_function_type_list (void_type_node, ptr_type_node,
+				ptr_type_node, ptr_type_node, NULL_TREE);
   tree BT_FN_VOID_PTR_PTRMODE
     = build_function_type_list (void_type_node, ptr_type_node,
 				build_nonstandard_integer_type (POINTER_SIZE,
@@ -2099,6 +2102,9 @@ initialize_sanitizer_builtins (void)
 #undef ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST
 #define ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST \
   ECF_TM_PURE | ATTR_NORETURN_NOTHROW_LEAF_LIST
+#undef ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST
+#define ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST \
+  /* ECF_COLD missing */ ATTR_NORETURN_NOTHROW_LEAF_LIST
 #undef DEF_SANITIZER_BUILTIN
 #define DEF_SANITIZER_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
   decl = add_builtin_function ("__builtin_" NAME, TYPE, ENUM,		\

	Marek


* Re: [RFC] Implement Undefined Behavior Sanitizer
  2013-06-05 17:57 [RFC] Implement Undefined Behavior Sanitizer Marek Polacek
@ 2013-06-05 18:44 ` Andrew Pinski
  2013-06-05 19:23   ` Jakub Jelinek
  2013-06-05 19:19 ` Jakub Jelinek
  2013-06-05 19:51 ` [RFC] Implement Undefined Behavior Sanitizer Joseph S. Myers
  2 siblings, 1 reply; 46+ messages in thread
From: Andrew Pinski @ 2013-06-05 18:44 UTC
  To: Marek Polacek; +Cc: GCC Patches

On Wed, Jun 5, 2013 at 10:57 AM, Marek Polacek <polacek@redhat.com> wrote:
> Hi!
>
> This is an attempt to add the Undefined Behavior Sanitizer to GCC.
> Note that it's very alpha version; so far it doesn't do that much,
> at the moment it should handle division by zero cases, INT_MIN / -1,
> and various shift cases (shifting by a negative value, shifting when
> second operand is >= than TYPE_PRECISION (first_operand) and suchlike.
> (On integer types, so far.)
>
> It works by creating a COMPOUND_EXPR around original expression, so e.g.
> it creates:
>
> if (b < 0 || (b > 31 || a < 0))
>   {
>     __builtin___ubsan_handle_shift_out_of_bounds ();
>   }
> else
>   {
>     0
>   }, a << b;
>
> from original "a <<= b;".
>
> There is of course a lot of stuff that needs to be done, more
> specifically:
>   0) fix an ICE which I've noticed right now ;(
>         long a = 1;
>         int b = 3;
>         a <<= b;
>       (error: mismatching comparison operand types)
>       temporarily solved by surrounding "doing_shift = true;"
>       with if (comptypes (type0, type1))
>       But that needs a better solution I'm afraid.  Bah.
>   1) import & build the ubsan library from LLVM
>      I've already spent some time on this, but failed miserably.  I've thought
>      that importing ubsan/ from LLVM into libsanitizer/, adding
>      libsanitizer/ubsan/Makefile.{am,in}, editing libsanitizer/Makefile.am
>      and libsanitizer/configure.ac, then something like aclocal && automake
>      could be sufficient, but no.  I'd very much appreciate any help with
>      this; is someone willing to help me with this one?  And it seemed so easy...
>   2) construct arguments for ubsan library
>      I guess that if we want to call for instance
>      void __ubsan::__ubsan_handle_shift_out_of_bounds(ShiftOutOfBoundsData *Data,
>                                                  ValueHandle LHS, ValueHandle RHS)
>      from GCC, we need to construct arguments compatible with
>      ShiftOutOfBoundsData/ValueHandle.
>      So, perhaps we need some helper function that constructs the CALL_EXPR
>      for the builtin; so far I haven't spent much time on this and don't know
>      what exactly to do here.  Time to look at what asan/tsan do.
>   3) add parsing of -fsanitize=<...>
>      LLVM supports e.g. -fsanitize=shift,divbyzero combination, we should too.
>      This doesn't sound like a big deal; just parse the arguments and set
>      various flags, or error out on invalid combinations.
>   4) and of course, more instrumentation (C/C++ FE, gimple level)
>      What comes to mind is:
>      - float/double to integer conversions,
>      - integer overflows (a long list of various cases here),
>      - invalid conversions of int to bool,
>      - reaching a __builtin_unreachable() call,
>      - VLAs size (e.g. negative size),
>      - store to/load of misaligned address,
>      - store to/load of null pointer,
>      - etc.
>      For the time being, I plan to work on overflows instrumentation.
>
> Regtested/bootstrapped on x86_64-linux.
>
> Comments, please?
I think it might be better to handle this during gimplification
rather than while parsing.  The main reason is that constexpr
evaluation might fail due to the added function calls.

Also, please don't shorten file names like ubsan; we already have file
names which don't fit in the older POSIX tar format and need extended
length support.

Thanks,
Andrew Pinski


>
> 2013-06-05  Marek Polacek  <polacek@redhat.com>
>
>         * Makefile.in: Add ubsan.c
>         * common.opt: Add -fsanitize=undefined option.
>         * doc/invoke.texi: Document the new flag.
>         * ubsan.h: New file.
>         * ubsan.c): New file.
>         * sanitizer.def (DEF_SANITIZER_BUILTIN):
>         * builtins.def: Define BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW and
>         BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS.
>         * cp/typeck.c (cp_build_binary_op): Add division by zero and shift
>         instrumentation.
>         * c/c-typeck.c (build_binary_op): Likewise.
>         * builtin-attrs.def: Define ATTR_COLD.
>         * asan.c (initialize_sanitizer_builtins): Build
>         BT_FN_VOID_PTR_PTR_PTR.
>
> --- gcc/sanitizer.def.mp        2013-06-05 18:23:41.077439836 +0200
> +++ gcc/sanitizer.def   2013-06-05 18:26:04.749921181 +0200
> @@ -283,3 +283,13 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOM
>  DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC_SIGNAL_FENCE,
>                       "__tsan_atomic_signal_fence",
>                       BT_FN_VOID_INT, ATTR_NOTHROW_LEAF_LIST)
> +
> +/* Undefined Behavior Sanitizer */
> +DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW,
> +                     "__ubsan_handle_divrem_overflow",
> +                     BT_FN_VOID_PTR_PTR_PTR,
> +                     ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS,
> +                     "__ubsan_handle_shift_out_of_bounds",
> +                     BT_FN_VOID_PTR_PTR_PTR,
> +                     ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
> --- gcc/builtins.def.mp 2013-06-05 18:23:41.072439816 +0200
> +++ gcc/builtins.def    2013-06-05 18:26:04.728921097 +0200
> @@ -155,7 +155,7 @@ along with GCC; see the file COPYING3.
>  #define DEF_SANITIZER_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
>    DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,    \
>                true, true, true, ATTRS, true, \
> -              (flag_asan || flag_tsan))
> +              (flag_asan || flag_tsan || flag_ubsan))
>
>  #undef DEF_CILKPLUS_BUILTIN
>  #define DEF_CILKPLUS_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
> --- gcc/Makefile.in.mp  2013-06-05 18:23:25.807388466 +0200
> +++ gcc/Makefile.in     2013-06-05 18:26:04.723921077 +0200
> @@ -1377,6 +1377,7 @@ OBJS = \
>         tree-affine.o \
>         asan.o \
>         tsan.o \
> +       ubsan.o \
>         tree-call-cdce.o \
>         tree-cfg.o \
>         tree-cfgcleanup.o \
> @@ -2259,6 +2260,10 @@ tsan.o : $(CONFIG_H) $(SYSTEM_H) $(TREE_
>     $(TM_P_H) $(TREE_FLOW_H) $(DIAGNOSTIC_CORE_H) $(GIMPLE_H) tree-iterator.h \
>     intl.h cfghooks.h output.h options.h c-family/c-common.h tsan.h asan.h \
>     tree-ssa-propagate.h
> +ubsan.o : ubsan.c $(CONFIG_H) $(SYSTEM_H) $(GIMPLE_H) \
> +   output.h coretypes.h $(GIMPLE_PRETTY_PRINT_H) $(CFGLOOP_H) \
> +   tree-iterator.h $(TREE_FLOW_H) $(TREE_PASS_H) \
> +   $(TARGET_H) $(EXPR_H) $(OPTABS_H) $(TM_P_H) langhooks.h
>  tree-ssa-tail-merge.o: tree-ssa-tail-merge.c \
>     $(SYSTEM_H) $(CONFIG_H) coretypes.h $(TM_H) $(BITMAP_H) \
>     $(FLAGS_H) $(TM_P_H) $(BASIC_BLOCK_H) $(CFGLOOP_H) \
> --- gcc/doc/invoke.texi.mp      2013-06-05 18:29:18.301611796 +0200
> +++ gcc/doc/invoke.texi 2013-06-05 18:33:53.756623280 +0200
> @@ -5143,6 +5143,11 @@ Memory access instructions will be instr
>  data race bugs.
>  See @uref{http://code.google.com/p/data-race-test/wiki/ThreadSanitizer} for more details.
>
> +@item -fsanitize=undefined
> +Enable UndefinedBehaviorSanitizer, a fast undefined behavior detector
> +Various computations will be instrumented to detect
> +undefined behavior, e.g. division by zero or various overflows.
> +
>  @item -fdump-final-insns@r{[}=@var{file}@r{]}
>  @opindex fdump-final-insns
>  Dump the final internal representation (RTL) to @var{file}.  If the
> --- gcc/ubsan.h.mp      2013-06-05 18:23:55.083486235 +0200
> +++ gcc/ubsan.h 2013-06-05 18:10:21.284693807 +0200
> @@ -0,0 +1,27 @@
> +/* UndefinedBehaviorSanitizer, undefined behavior detector.
> +   Copyright (C) 2013 Free Software Foundation, Inc.
> +   Contributed by Marek Polacek <polacek@redhat.com>
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_UBSAN_H
> +#define GCC_UBSAN_H
> +
> +extern tree ubsan_instrument_division (location_t, enum tree_code, tree, tree);
> +extern tree ubsan_instrument_shift (location_t, enum tree_code, tree, tree);
> +
> +#endif  /* GCC_UBSAN_H  */
> --- gcc/ubsan.c.mp      2013-06-05 18:23:49.411467508 +0200
> +++ gcc/ubsan.c 2013-06-05 18:00:25.000000000 +0200
> @@ -0,0 +1,107 @@
> +/* UndefinedBehaviorSanitizer, undefined behavior detector.
> +   Copyright (C) 2013 Free Software Foundation, Inc.
> +   Contributed by Marek Polacek <polacek@redhat.com>
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tree.h"
> +#include "c-family/c-common.h"
> +
> +/* Instrument division by zero and INT_MIN / -1.  */
> +
> +tree
> +ubsan_instrument_division (location_t loc, enum tree_code code,
> +                          tree op0, tree op1)
> +{
> +  tree t, tt;
> +  tree orig = build2 (code, TREE_TYPE (op0), op0, op1);
> +
> +  if (TREE_CODE (TREE_TYPE (op0)) != INTEGER_TYPE
> +      || TREE_CODE (TREE_TYPE (op1)) != INTEGER_TYPE)
> +    return orig;
> +
> +  /* If we *know* that the divisor is not -1 or 0, we don't have to
> +     instrument this expression.
> +     ??? We could use decl_constant_value to cover up more cases.  */
> +  if (TREE_CODE (op1) == INTEGER_CST
> +      && integer_nonzerop (op1)
> +      && !integer_minus_onep (op1))
> +    return orig;
> +
> +  tt = fold_build2 (EQ_EXPR, boolean_type_node, op1,
> +                   integer_minus_one_node);
> +  t = fold_build2 (EQ_EXPR, boolean_type_node, op0,
> +                  TYPE_MIN_VALUE (TREE_TYPE (op0)));
> +  t = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, t, tt);
> +  tt = build2 (EQ_EXPR, boolean_type_node,
> +              op1, integer_zero_node);
> +  t = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, tt, t);
> +  tt = builtin_decl_explicit (BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW);
> +  // XXX Do we want _loc version here?
> +  tt = build_call_expr_loc (loc, tt, 0);
> +  t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node);
> +  t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (orig), t, orig);
> +
> +  return t;
> +}
> +
> +/* Instrument left and right shifts.  */
> +
> +tree
> +ubsan_instrument_shift (location_t loc, enum tree_code code,
> +                       tree op0, tree op1)
> +{
> +  tree t, tt;
> +  tree orig = build2 (code, TREE_TYPE (op0), op0, op1);
> +  tree prec = build_int_cst (TREE_TYPE (op0),
> +                            TYPE_PRECISION (TREE_TYPE (op0)));
> +
> +  t = fold_build2 (LT_EXPR, boolean_type_node, op1, integer_zero_node);
> +  tt = fold_build2 (GE_EXPR, boolean_type_node, op1, prec);
> +
> +  /* int a = 1;
> +     a <<= 31;
> +     is undefined in C99/C11.  */
> +  if (code == LSHIFT_EXPR
> +      && !TYPE_UNSIGNED (TREE_TYPE (op0))
> +      && (flag_isoc99 || flag_isoc11))
> +    {
> +      tree prec1 = build_int_cst (TREE_TYPE (op1),
> +                                 TYPE_PRECISION (TREE_TYPE (op1)) - 1);
> +      tree x = fold_build2 (EQ_EXPR, boolean_type_node, op1, prec1);
> +      tt = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, tt, x);
> +    }
> +
> +  /* For left shift, shifting a negative value is undefined.  */
> +  if (code == LSHIFT_EXPR)
> +    {
> +      tree x = fold_build2 (LT_EXPR, boolean_type_node, op0,
> +                           integer_zero_node);
> +      tt = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, tt, x);
> +    }
> +
> +  t = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, t, tt);
> +  tt = builtin_decl_explicit (BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS);
> +  tt = build_call_expr_loc (loc, tt, 0);
> +  t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node);
> +  t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (orig), t, orig);
> +
> +  return t;
> +}
> --- gcc/cp/typeck.c.mp  2013-06-05 18:23:41.076439832 +0200
> +++ gcc/cp/typeck.c     2013-06-05 18:26:04.746921169 +0200
> @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.
>  #include "intl.h"
>  #include "target.h"
>  #include "convert.h"
> +#include "ubsan.h"
>  #include "c-family/c-common.h"
>  #include "c-family/c-objc.h"
>  #include "params.h"
> @@ -3891,6 +3892,12 @@ cp_build_binary_op (location_t location,
>    op0 = orig_op0;
>    op1 = orig_op1;
>
> +  /* Remember whether we're doing / or %.  */
> +  bool doing_div_or_mod = false;
> +
> +  /* Remember whether we're doing << or >>.  */
> +  bool doing_shift = false;
> +
>    if (code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR
>        || code == TRUTH_OR_EXPR || code == TRUTH_ORIF_EXPR
>        || code == TRUTH_XOR_EXPR)
> @@ -4070,8 +4077,15 @@ cp_build_binary_op (location_t location,
>         {
>           enum tree_code tcode0 = code0, tcode1 = code1;
>           tree cop1 = fold_non_dependent_expr_sfinae (op1, tf_none);
> +         cop1 = maybe_constant_value (cop1);
> +
> +         if (!processing_template_decl && tcode0 == INTEGER_TYPE
> +             && (TREE_CODE (cop1) != INTEGER_CST
> +                 || integer_zerop (cop1)
> +                 || integer_minus_onep (cop1)))
> +           doing_div_or_mod = true;
>
> -         warn_for_div_by_zero (location, maybe_constant_value (cop1));
> +         warn_for_div_by_zero (location, cop1);
>
>           if (tcode0 == COMPLEX_TYPE || tcode0 == VECTOR_TYPE)
>             tcode0 = TREE_CODE (TREE_TYPE (TREE_TYPE (op0)));
> @@ -4109,8 +4123,14 @@ cp_build_binary_op (location_t location,
>      case FLOOR_MOD_EXPR:
>        {
>         tree cop1 = fold_non_dependent_expr_sfinae (op1, tf_none);
> +       cop1 = maybe_constant_value (cop1);
>
> -       warn_for_div_by_zero (location, maybe_constant_value (cop1));
> +       if (!processing_template_decl && code0 == INTEGER_TYPE
> +           && (TREE_CODE (cop1) != INTEGER_CST
> +               || integer_zerop (cop1)
> +               || integer_minus_onep (cop1)))
> +         doing_div_or_mod = true;
> +       warn_for_div_by_zero (location, cop1);
>        }
>
>        if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE
> @@ -4164,6 +4184,7 @@ cp_build_binary_op (location_t location,
>           if (TREE_CODE (const_op1) != INTEGER_CST)
>             const_op1 = op1;
>           result_type = type0;
> +         doing_shift = true;
>           if (TREE_CODE (const_op1) == INTEGER_CST)
>             {
>               if (tree_int_cst_lt (const_op1, integer_zero_node))
> @@ -4211,6 +4232,7 @@ cp_build_binary_op (location_t location,
>           if (TREE_CODE (const_op1) != INTEGER_CST)
>             const_op1 = op1;
>           result_type = type0;
> +         doing_shift = true;
>           if (TREE_CODE (const_op1) == INTEGER_CST)
>             {
>               if (tree_int_cst_lt (const_op1, integer_zero_node))
> @@ -4607,6 +4629,18 @@ cp_build_binary_op (location_t location,
>        break;
>      }
>
> +  if (flag_ubsan && doing_div_or_mod)
> +    {
> +      resultcode = COMPOUND_EXPR;
> +      return ubsan_instrument_division (location, code, op0, op1);
> +    }
> +
> +  if (flag_ubsan && doing_shift)
> +    {
> +      resultcode = COMPOUND_EXPR;
> +      return ubsan_instrument_shift (location, code, op0, op1);
> +    }
> +
>    if (((code0 == INTEGER_TYPE || code0 == REAL_TYPE || code0 == COMPLEX_TYPE
>         || code0 == ENUMERAL_TYPE)
>         && (code1 == INTEGER_TYPE || code1 == REAL_TYPE
> --- gcc/common.opt.mp   2013-06-05 18:23:41.075439828 +0200
> +++ gcc/common.opt      2013-06-05 18:26:04.740921145 +0200
> @@ -858,6 +858,10 @@ fsanitize=thread
>  Common Report Var(flag_tsan)
>  Enable ThreadSanitizer, a data race detector
>
> +fsanitize=undefined
> +Common Report Var(flag_ubsan)
> +Enable UndefinedBehaviorSanitizer, an undefined behavior detector
> +
>  fasynchronous-unwind-tables
>  Common Report Var(flag_asynchronous_unwind_tables) Optimization
>  Generate unwind tables that are exact at each instruction boundary
> --- gcc/builtin-attrs.def.mp    2013-06-05 18:23:41.071439812 +0200
> +++ gcc/builtin-attrs.def       2013-06-05 18:26:04.727921093 +0200
> @@ -83,6 +83,7 @@ DEF_LIST_INT_INT (5,6)
>  #undef DEF_LIST_INT_INT
>
>  /* Construct trees for identifiers.  */
> +DEF_ATTR_IDENT (ATTR_COLD, "cold")
>  DEF_ATTR_IDENT (ATTR_CONST, "const")
>  DEF_ATTR_IDENT (ATTR_FORMAT, "format")
>  DEF_ATTR_IDENT (ATTR_FORMAT_ARG, "format_arg")
> @@ -130,6 +131,8 @@ DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHRO
>                         ATTR_NULL, ATTR_NOTHROW_LIST)
>  DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHROW_LEAF_LIST, ATTR_NORETURN,\
>                         ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
> +DEF_ATTR_TREE_LIST (ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST, ATTR_COLD,\
> +                       ATTR_NULL, ATTR_NORETURN_NOTHROW_LEAF_LIST)
>  DEF_ATTR_TREE_LIST (ATTR_CONST_NORETURN_NOTHROW_LEAF_LIST, ATTR_CONST,\
>                         ATTR_NULL, ATTR_NORETURN_NOTHROW_LEAF_LIST)
>  DEF_ATTR_TREE_LIST (ATTR_MALLOC_NOTHROW_LIST, ATTR_MALLOC,     \
> --- gcc/c/c-typeck.c.mp 2013-06-05 18:23:41.073439820 +0200
> +++ gcc/c/c-typeck.c    2013-06-05 18:26:04.736921129 +0200
> @@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-iterator.h"
>  #include "bitmap.h"
>  #include "gimple.h"
> +#include "ubsan.h"
>  #include "c-family/c-objc.h"
>  #include "c-family/c-common.h"
>
> @@ -9542,6 +9543,12 @@ build_binary_op (location_t location, en
>       operands to truth-values.  */
>    bool boolean_op = false;
>
> +  /* Remember whether we're doing / or %.  */
> +  bool doing_div_or_mod = false;
> +
> +  /* Remember whether we're doing << or >>.  */
> +  bool doing_shift = false;
> +
>    if (location == UNKNOWN_LOCATION)
>      location = input_location;
>
> @@ -9743,6 +9750,7 @@ build_binary_op (location_t location, en
>      case FLOOR_DIV_EXPR:
>      case ROUND_DIV_EXPR:
>      case EXACT_DIV_EXPR:
> +      doing_div_or_mod = true;
>        warn_for_div_by_zero (location, op1);
>
>        if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
> @@ -9790,6 +9798,7 @@ build_binary_op (location_t location, en
>
>      case TRUNC_MOD_EXPR:
>      case FLOOR_MOD_EXPR:
> +      doing_div_or_mod = true;
>        warn_for_div_by_zero (location, op1);
>
>        if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE
> @@ -9888,6 +9897,7 @@ build_binary_op (location_t location, en
>        else if ((code0 == INTEGER_TYPE || code0 == FIXED_POINT_TYPE)
>           && code1 == INTEGER_TYPE)
>         {
> +         doing_shift = true;
>           if (TREE_CODE (op1) == INTEGER_CST)
>             {
>               if (tree_int_cst_sgn (op1) < 0)
> @@ -9940,6 +9950,7 @@ build_binary_op (location_t location, en
>        else if ((code0 == INTEGER_TYPE || code0 == FIXED_POINT_TYPE)
>           && code1 == INTEGER_TYPE)
>         {
> +         doing_shift = true;
>           if (TREE_CODE (op1) == INTEGER_CST)
>             {
>               if (tree_int_cst_sgn (op1) < 0)
> @@ -10224,6 +10235,20 @@ build_binary_op (location_t location, en
>        return error_mark_node;
>      }
>
> +  if (flag_ubsan && doing_div_or_mod)
> +    {
> +      ret = ubsan_instrument_division (location, code, op0, op1);
> +      resultcode = COMPOUND_EXPR;
> +      goto return_build_binary_op;
> +    }
> +
> +  if (flag_ubsan && doing_shift)
> +    {
> +      ret = ubsan_instrument_shift (location, code, op0, op1);
> +      resultcode = COMPOUND_EXPR;
> +      goto return_build_binary_op;
> +    }
> +
>    if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE || code0 == COMPLEX_TYPE
>         || code0 == FIXED_POINT_TYPE || code0 == VECTOR_TYPE)
>        &&
> --- gcc/asan.c.mp       2013-06-05 18:23:41.070439808 +0200
> +++ gcc/asan.c  2013-06-05 18:26:04.726921089 +0200
> @@ -2034,6 +2034,9 @@ initialize_sanitizer_builtins (void)
>    tree BT_FN_VOID = build_function_type_list (void_type_node, NULL_TREE);
>    tree BT_FN_VOID_PTR
>      = build_function_type_list (void_type_node, ptr_type_node, NULL_TREE);
> +  tree BT_FN_VOID_PTR_PTR_PTR
> +    = build_function_type_list (void_type_node, ptr_type_node,
> +                               ptr_type_node, ptr_type_node, NULL_TREE);
>    tree BT_FN_VOID_PTR_PTRMODE
>      = build_function_type_list (void_type_node, ptr_type_node,
>                                 build_nonstandard_integer_type (POINTER_SIZE,
> @@ -2099,6 +2102,9 @@ initialize_sanitizer_builtins (void)
>  #undef ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST
>  #define ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST \
>    ECF_TM_PURE | ATTR_NORETURN_NOTHROW_LEAF_LIST
> +#undef ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST
> +#define ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST \
> +  /* ECF_COLD missing */ ATTR_NORETURN_NOTHROW_LEAF_LIST
>  #undef DEF_SANITIZER_BUILTIN
>  #define DEF_SANITIZER_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
>    decl = add_builtin_function ("__builtin_" NAME, TYPE, ENUM,          \
>
>         Marek

* Re: [RFC] Implement Undefined Behavior Sanitizer
  2013-06-05 17:57 [RFC] Implement Undefined Behavior Sanitizer Marek Polacek
  2013-06-05 18:44 ` Andrew Pinski
@ 2013-06-05 19:19 ` Jakub Jelinek
  2013-06-05 19:35   ` Jakub Jelinek
  2013-06-08 16:43   ` [RFC] Implement Undefined Behavior Sanitizer (take 2) Marek Polacek
  2013-06-05 19:51 ` [RFC] Implement Undefined Behavior Sanitizer Joseph S. Myers
  2 siblings, 2 replies; 46+ messages in thread
From: Jakub Jelinek @ 2013-06-05 19:19 UTC (permalink / raw)
  To: Marek Polacek; +Cc: GCC Patches

On Wed, Jun 05, 2013 at 07:57:28PM +0200, Marek Polacek wrote:
> There is of course a lot of stuff that needs to be done, more
> specifically:
>   0) fix an ICE which I've noticed right now ;(
>         long a = 1;
>         int b = 3;
>         a <<= b;
>       (error: mismatching comparison operand types)
>       temporarily solved by surrounding "doing_shift = true;"
>       with if (comptypes (type0, type1))
>       But that needs a better solution I'm afraid.  Bah.

> +  tree t, tt;
> +  tree orig = build2 (code, TREE_TYPE (op0), op0, op1);
> +  tree prec = build_int_cst (TREE_TYPE (op0),
> +			     TYPE_PRECISION (TREE_TYPE (op0)));

You compare prec with op1, thus they should have the same type, shifts
are one of the few binary ops that can have different types of the
operands (result type is the same as first argument, second argument
is something else).  So, if you use TREE_TYPE (op1) as the type of prec,
you should be fine.  More importantly, perhaps you can just use
precm1 in all the places and just use GT_EXPR for tt with precm1, and
use it in EQ.

That said, the C99 rules look somewhat different.  0 << 31 is perfectly
valid, int x = 0; x << 31 is as well.  What is undefined in C99 (likely C11
too and maybe C89 as well?) is the usual shift count out of bounds (negative
or >= prec), a first operand that is signed and negative, or a first operand
that is signed and positive where, for x << y, the expression x * 2^y
overflows in the type of x.
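
As a rough illustration of the three cases just listed (a sketch only, not
part of the patch: the function name is hypothetical, the operands are
assumed to be plain int, and int is assumed to have no padding bits), the
C99 predicate could be written as:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if "x << y" would be undefined
   under the C99 rules summarized above, for int operands.  */
static bool
c99_lshift_is_undefined (int x, int y)
{
  /* Width of int in bits, assuming no padding bits.  */
  const int prec = (int) (sizeof (int) * CHAR_BIT);

  /* Case 1: shift count negative or >= the width of the left operand.  */
  if (y < 0 || y >= prec)
    return true;

  /* Case 2: signed and negative left operand.  */
  if (x < 0)
    return true;

  /* Case 3: x * 2^y is not representable in int.  For nonnegative x
     this holds exactly when x > INT_MAX / 2^y, and INT_MAX / 2^y
     equals INT_MAX >> y.  */
  return x > (INT_MAX >> y);
}
```

With this predicate 0 << 31 is accepted while 1 << 31 is flagged, which
matches the distinction drawn above.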

>   1) import & build the ubsan library from LLVM
>      I've already spent some time on this, but failed miserably.  I've thought
>      that importing ubsan/ from LLVM into libsanitizer/, adding
>      libsanitizer/ubsan/Makefile.{am,in}, editing libsanitizer/Makefile.am
>      and libsanitizer/configure.ac, then something like aclocal && automake
>      could be sufficient, but no.  I'd very much appreciate any help with
>      this; is someone willing to help me with this one?  And it seemed so easy...

I'll look at this tomorrow.

>   2) construct arguments for ubsan library
>      I guess that if we want to call for instance
>      void __ubsan::__ubsan_handle_shift_out_of_bounds(ShiftOutOfBoundsData *Data,
>                                                  ValueHandle LHS, ValueHandle RHS)
>      from GCC, we need to construct arguments compatible with
>      ShiftOutOfBoundsData/ValueHandle.  
>      So, perhaps we need some helper function that constructs the CALL_EXPR
>      for the builtin; so far I haven't spent much time on this and don't know
>      what exactly to do here.  Time to look at what asan/tsan do.
>   3) add parsing of -fsanitize=<...>
>      LLVM supports e.g. -fsanitize=shift,divbyzero combination, we should too.
>      This doesn't sound like a big deal; just parse the arguments and set
>      various flags, or error out on invalid combinations.
>   4) and of course, more instrumentation (C/C++ FE, gimple level)
>      What comes to mind is:
>      - float/double to integer conversions,
>      - integer overflows (a long list of various cases here),
>      - invalid conversions of int to bool,
>      - reaching a __builtin_unreachable() call,
>      - VLAs size (e.g. negative size),
>      - store to/load of misaligned address,
>      - store to/load of null pointer,
>      - etc.
>      For the time being, I plan to work on overflows instrumentation.

For at least signed addition, subtraction and multiplication overflow we
ideally want to handle it very efficiently on CPUs that can do so,
e.g. on x86_64/i386 an addl followed by jo.
We need some builtin for that, either one with two return values
(this can be done right now say by returning a vector or complex int,
one integer will be the result of the addition/subtraction/multiplication,
another one a flag whether we've overflowed), or maybe we want new tree code
for that or something.
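
To make the "two return values" idea concrete, here is a minimal portable
sketch.  It is not the builtin being proposed: struct checked_int and
checked_add are hypothetical names, and the overflow test is done with
plain comparisons rather than with a CPU overflow flag.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical pair of "return values": the sum and an overflow flag.  */
struct checked_int
{
  int value;        /* The sum; 0 as a placeholder when overflowed.  */
  bool overflowed;  /* True if a + b is not representable in int.  */
};

/* Hypothetical helper: add two ints without ever executing an
   overflowing signed addition.  */
static struct checked_int
checked_add (int a, int b)
{
  struct checked_int r;

  /* Test against the limits first, so that the addition below is only
     evaluated when it cannot overflow.  */
  r.overflowed = (b > 0 && a > INT_MAX - b)
		 || (b < 0 && a < INT_MIN - b);
  r.value = r.overflowed ? 0 : a + b;
  return r;
}
```

An instrumented a + b would then call the ubsan handler when the flag is
set and otherwise use the value; a real builtin would let the backend emit
the add-and-branch sequence directly.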

> 2013-06-05  Marek Polacek  <polacek@redhat.com>
> 
> 	* Makefile.in: Add ubsan.c

Missing dot at end of line.

> 	* common.opt: Add -fsanitize=undefined option.
> 	* doc/invoke.texi: Document the new flag.
> 	* ubsan.h: New file.
> 	* ubsan.c): New file.

Extra ).  I'd prefer if the support routines for ubsan instrumentation
done in the C/C++ FEs only would live in c-family/c-ubsan.[ch] or so.
ubsan.[ch] can perhaps then be used for any instrumentation done at the
gimplification level (if anything is suitable for that), or as support code
for both of that and c-ubsan.c.

> 	* sanitizer.def (DEF_SANITIZER_BUILTIN):

Define. ?

> 	* builtins.def: Define BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW and
> 	BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS.

	* builtins.def (BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW,
	BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS): Define.

cp/ stuff goes into cp/ ChangeLog, without cp/ paths.

> 	* cp/typeck.c (cp_build_binary_op): Add division by zero and shift
> 	instrumentation.

Please make sure you only add it for !processing_template_decl.

Again, c/ ChangeLog.

> 	* c/c-typeck.c (build_binary_op): Likewise.
> 	* builtin-attrs.def: Define ATTR_COLD.

(ATTR_COLD): Define.

Also, the question is where exactly to place these calls to c-ubsan.c
functions.  You generally want it before stuff like short_compare and
similar handling, but on the other side you want it after type promotion
(seems ok already) but e.g. for the division also after conversion to a
single result_type.  Say the ubsan division libcall wants both arguments
to have the same type (unlike ubsan shift call, which has two types of
course), so if you have long long l; char c; l / c or c / l you want
both arguments converted to long long already.
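
A small sketch of why the common type matters (the function name is
hypothetical, and this only models the check, not the actual libcall):
once both operands have been converted to long long, the overflow case is
LLONG_MIN / -1, not INT_MIN / -1.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: the division/remainder check as it would look
   after both operands have been converted to the common result type,
   here long long (as for "l / c" with long long l and char c).  */
static bool
divrem_is_undefined (long long lhs, long long rhs)
{
  /* Division by zero, or the single quotient that is not representable
     in the result type.  */
  return rhs == 0 || (lhs == LLONG_MIN && rhs == -1);
}
```

For example, with char c = -1 the call divrem_is_undefined (LLONG_MIN, c)
reports a problem, while divrem_is_undefined (INT_MIN, c) does not,
because -(long long) INT_MIN fits in long long.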

	Jakub

* Re: [RFC] Implement Undefined Behavior Sanitizer
  2013-06-05 18:44 ` Andrew Pinski
@ 2013-06-05 19:23   ` Jakub Jelinek
  2013-06-05 19:40     ` Andrew Pinski
  2013-06-05 19:57     ` Joseph S. Myers
  0 siblings, 2 replies; 46+ messages in thread
From: Jakub Jelinek @ 2013-06-05 19:23 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Marek Polacek, GCC Patches

On Wed, Jun 05, 2013 at 11:44:07AM -0700, Andrew Pinski wrote:
> On Wed, Jun 5, 2013 at 10:57 AM, Marek Polacek <polacek@redhat.com> wrote:
> > Comments, please?
> I think it might be better to do handle this while gimplification
> happens rather than while parsing.  The main reason is that constexpr
> might fail due to the added function calls.

Gimplification is too late, the FEs perform various operation shortenings
etc. in many cases, and what exactly is undefined behavior is apparently
heavily dependent on the particular language (C has different rules from
C++).  Yes, constexpr is something to consider in this light, but not
something that can't be handled (recognizing ubsan builtins and just
handling them specially).

> Also please don't shorten file names like ubsan,  we already have file
> names which don't fit in the older POSIX tar format and needs extended
> length support.

We already have asan.c and tsan.c, and that is how it is commonly called.

	Jakub

* Re: [RFC] Implement Undefined Behavior Sanitizer
  2013-06-05 19:19 ` Jakub Jelinek
@ 2013-06-05 19:35   ` Jakub Jelinek
  2013-06-06  6:07     ` Jakub Jelinek
  2013-06-08 16:43   ` [RFC] Implement Undefined Behavior Sanitizer (take 2) Marek Polacek
  1 sibling, 1 reply; 46+ messages in thread
From: Jakub Jelinek @ 2013-06-05 19:35 UTC (permalink / raw)
  To: Marek Polacek; +Cc: GCC Patches

On Wed, Jun 05, 2013 at 09:19:10PM +0200, Jakub Jelinek wrote:
> On Wed, Jun 05, 2013 at 07:57:28PM +0200, Marek Polacek wrote:
> > +  tree t, tt;
> > +  tree orig = build2 (code, TREE_TYPE (op0), op0, op1);
> > +  tree prec = build_int_cst (TREE_TYPE (op0),
> > +			     TYPE_PRECISION (TREE_TYPE (op0)));

BTW, also, to check that the shift count is not < 0 or >= prec, you can
just test that fold_convert_loc (loc, unsigned_type_for (TREE_TYPE (op1)), op1)
is less than or equal to (LE_EXPR) precm1 (also using the unsigned type).
While optimizers often fold it to that, you might very well just create
fewer trees from the start.

The C99 undefined behavior of left signed shift can be tested by
testing if ((unsigned type for op0's type) op0) >> (precm1 - y) is
non-zero.
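
Both tests can be tried out in plain C.  The following is a sketch under
stated assumptions, not the tree-building code: INT_PREC and the function
names are hypothetical, the operands are int, and int is assumed to have
no padding bits.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical macro: width of int in bits, assuming no padding bits.  */
#define INT_PREC ((int) (sizeof (int) * CHAR_BIT))

/* True if 0 <= y < INT_PREC.  A negative y converts to a huge unsigned
   value, so one unsigned comparison covers both bounds.  */
static bool
shift_count_in_range (int y)
{
  return (unsigned int) y <= (unsigned int) (INT_PREC - 1);
}

/* C99 test for "x << y": undefined if any bit of x would be shifted
   into or past the sign bit.  Only call this with an in-range y.
   A negative x has its top bit set, so it is always flagged.  */
static bool
c99_lshift_overflows (int x, int y)
{
  return ((unsigned int) x >> (INT_PREC - 1 - y)) != 0;
}
```

The second function relies on the first having passed, since otherwise
INT_PREC - 1 - y would itself be an out-of-range shift count.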

	Jakub

* Re: [RFC] Implement Undefined Behavior Sanitizer
  2013-06-05 19:23   ` Jakub Jelinek
@ 2013-06-05 19:40     ` Andrew Pinski
  2013-06-06  7:46       ` Konstantin Serebryany
  2013-06-05 19:57     ` Joseph S. Myers
  1 sibling, 1 reply; 46+ messages in thread
From: Andrew Pinski @ 2013-06-05 19:40 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Marek Polacek, GCC Patches

On Wed, Jun 5, 2013 at 12:23 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Jun 05, 2013 at 11:44:07AM -0700, Andrew Pinski wrote:
>> On Wed, Jun 5, 2013 at 10:57 AM, Marek Polacek <polacek@redhat.com> wrote:
>> > Comments, please?
>> I think it might be better to do handle this while gimplification
>> happens rather than while parsing.  The main reason is that constexpr
>> might fail due to the added function calls.
>
> Gimplification is too late, the FEs perform various operation shortenings
> etc. in many cases, and what exactly is undefined behavior is apparently
> heavily dependent on the particular language (C has different rules from
> C++).  Yes, constexpr is something to consider in this light, but not
> something that can't be handled (recognizing ubsan builtins and just
> handling them specially).
>
>> Also please don't shorten file names like ubsan,  we already have file
>> names which don't fit in the older POSIX tar format and needs extended
>> length support.
>
> We already have asan.c and tsan.c, and that is how it is commonly called.

Can we just move them to array-sanitizer and thread-sanitizer?  I
think those are better names than asan and tsan.  Shortened names are
not useful when a new person is learning the code.

Thanks,
Andrew

>
>         Jakub

* Re: [RFC] Implement Undefined Behavior Sanitizer
  2013-06-05 17:57 [RFC] Implement Undefined Behavior Sanitizer Marek Polacek
  2013-06-05 18:44 ` Andrew Pinski
  2013-06-05 19:19 ` Jakub Jelinek
@ 2013-06-05 19:51 ` Joseph S. Myers
  2013-06-07 12:38   ` Marek Polacek
  2 siblings, 1 reply; 46+ messages in thread
From: Joseph S. Myers @ 2013-06-05 19:51 UTC (permalink / raw)
  To: Marek Polacek; +Cc: GCC Patches

On Wed, 5 Jun 2013, Marek Polacek wrote:

> It works by creating a COMPOUND_EXPR around original expression, so e.g.
> it creates:
> 
> if (b < 0 || (b > 31 || a < 0))
>   {
>     __builtin___ubsan_handle_shift_out_of_bounds ();
>   }
> else
>   {
>     0
>   }, a << b;
> 
> from original "a <<= b;".

For the "a < 0" here, and signed left shift of a positive value shifting a 
1 into or past the sign bit, I think it should be possible to control the 
checks separately from other checks on shifts - both because those cases 
were implementation-defined in C90, only undefined in C99/C11, and because 
they are widely used in practice.

> There is of course a lot of stuff that needs to be done, more
> specifically:

5) Testcases (or if applicable, running existing testcases coming with the 
library).

6) Map -ftrapv onto an appropriate subset of this option that handles the 
cases -ftrapv was meant to handle (so arithmetic overflow, which I'd say 
should include INT_MIN / -1).

>   4) and of course, more instrumentation (C/C++ FE, gimple level)
>      What comes to mind is:
>      - float/double to integer conversions,

Under Annex F, these return an unspecified value rather than being 
undefined behavior.

>      - integer overflows (a long list of various cases here),

Strictly, including INT_MIN % -1 (both / and % are undefined if the result 
of either is unrepresentable) - it appears you've already got that.  Of 
course INT_MIN % -1 and INT_MIN / -1 should *work* reliably with -fwrapv, 
which is another bug (30484).

>      - invalid conversions of int to bool,

What do you mean?  Conversion to bool is just a comparison != 0.

>      - VLAs size (e.g. negative size),

Or the multiplication used to compute the size in bytes overflows (really, 
there should be some code generated expanding the stack bit by bit to 
avoid it accidentally overflowing into another allocated area of memory, I 
suppose).

> +@item -fsanitize=undefined
> +Enable UndefinedBehaviorSanitizer, a fast undefined behavior detector
> +Various computations will be instrumented to detect
> +undefined behavior, e.g. division by zero or various overflows.

e.g.@:

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [RFC] Implement Undefined Behavior Sanitizer
  2013-06-05 19:23   ` Jakub Jelinek
  2013-06-05 19:40     ` Andrew Pinski
@ 2013-06-05 19:57     ` Joseph S. Myers
  1 sibling, 0 replies; 46+ messages in thread
From: Joseph S. Myers @ 2013-06-05 19:57 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Andrew Pinski, Marek Polacek, GCC Patches

On Wed, 5 Jun 2013, Jakub Jelinek wrote:

> On Wed, Jun 05, 2013 at 11:44:07AM -0700, Andrew Pinski wrote:
> > On Wed, Jun 5, 2013 at 10:57 AM, Marek Polacek <polacek@redhat.com> wrote:
> > > Comments, please?
> > I think it might be better to do handle this while gimplification
> > happens rather than while parsing.  The main reason is that constexpr
> > might fail due to the added function calls.
> 
> Gimplification is too late, the FEs perform various operation shortenings
> etc. in many cases, and what exactly is undefined behavior is apparently
> heavily dependent on the particular language (C has different rules from
> C++).  Yes, constexpr is something to consider in this light, but not
> something that can't be handled (recognizing ubsan builtins and just
> handling them specially).

Agreed, this needs handling before folding and other optimizations in the 
front ends to have predictable results.

It may make sense to try running the whole testsuite with this option, 
minus tests of -fwrapv, to make sure it doesn't break any corner cases of 
valid tests (of course it may well show up some invalid tests).  In 
particular, gcc.dg/*const-expr* and gcc.dg/overflow-warn*.  Generating 
extra diagnostics for code in those tests that already gets a diagnostic 
is OK, as long as it doesn't generate diagnostics for non-overflow cases 
in those tests that aren't meant to be treated as overflow, or lose 
diagnostics for cases that are required to be diagnosed.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [RFC] Implement Undefined Behavior Sanitizer
  2013-06-05 19:35   ` Jakub Jelinek
@ 2013-06-06  6:07     ` Jakub Jelinek
  2013-06-06 12:17       ` Jason Merrill
  2013-06-06 13:26       ` Segher Boessenkool
  0 siblings, 2 replies; 46+ messages in thread
From: Jakub Jelinek @ 2013-06-06  6:07 UTC (permalink / raw)
  To: Marek Polacek, Jason Merrill, Joseph S. Myers; +Cc: GCC Patches

On Wed, Jun 05, 2013 at 09:35:08PM +0200, Jakub Jelinek wrote:
> On Wed, Jun 05, 2013 at 09:19:10PM +0200, Jakub Jelinek wrote:
> > On Wed, Jun 05, 2013 at 07:57:28PM +0200, Marek Polacek wrote:
> > > +  tree t, tt;
> > > +  tree orig = build2 (code, TREE_TYPE (op0), op0, op1);
> > > +  tree prec = build_int_cst (TREE_TYPE (op0),
> > > +			     TYPE_PRECISION (TREE_TYPE (op0)));
> 
> BTW, also, to check that the shift count is not < 0 or >= prec, you can
> just test that fold_convert_loc (loc, unsigned_type_for (TREE_TYPE (op1)), op1)
> is LE_EXPR than precm1 (also using the unsigned type).
> While optimizers often fold it to that, you might very well just create
> fewer trees from the start.
> 
> The C99 undefined behavior of left signed shift can be tested by
> testing if ((unsigned type for op0's type) op0) >> (precm1 - y) is
> non-zero.

The C++11/C++14 undefined behavior of left signed shift can be tested
similarly, if ((unsigned type for op0's type) op0) >> (precm1 - y)
is greater than one, then it is undefined behavior.
Jason, does
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3675.html#1457
apply just to C++11/C++14, or to C++03 too?
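
The "greater than one" variant can be modeled in C as well.  This is a
sketch only: the names are hypothetical, int is assumed to have no padding
bits, and the explicit x < 0 test is an addition made here, because for a
negative x and y == 0 the shifted value is exactly 1 and the comparison
alone would not flag it.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical macro: width of int in bits, assuming no padding bits.  */
#define INT_PREC ((int) (sizeof (int) * CHAR_BIT))

/* Models the post-DR1457 rule for "x << y" with int operands and an
   in-range shift count y: a set bit may land in the sign bit position,
   but nothing may be shifted past it.  */
static bool
cxx11_lshift_is_undefined (int x, int y)
{
  /* Negative left operands are tested separately (see the note above).  */
  if (x < 0)
    return true;

  /* A value of 0 or 1 here means at most the sign bit position of the
     result is reached; anything larger means bits are lost.  */
  return ((unsigned int) x >> (INT_PREC - 1 - y)) > 1u;
}
```

Under this model 1 << 31 is accepted and 2 << 31 is flagged, whereas the
C99 test (non-zero instead of greater than one) flags both.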

In C++03 I see in [expr.shift]/2
"The value of E1 << E2 is E1 (interpreted as a bit pattern) left-shifted E2
bit positions; vacated bits are zero-filled. If E1 has an unsigned type,
the value of the result is E1 multiplied by the quantity 2 raised to
the power E2, reduced modulo ULONG_MAX+1 if E1 has type unsigned long,
UINT_MAX+1 otherwise."  Is that the same case as C90 then, the wording seems
to be pretty much the same?

As for controlling the C99 (or even C++11?) warning individually, either it
can be a separate suboption of -fsanitize=, like -fsanitize=shift,lshiftc99
(but then, would lshiftc99 be included in undefined and similar option
groups?), or IMHO better, we just convince ubsan upstream to have an env var
for controlling the lshift diagnostics.  GCC would then always emit checks
for precisely what the current -std= makes undefined behavior (though,
because of DRs, that is somewhat fuzzy: pre-DR1457 C++11 vs. post-DR1457
C++11), and users would choose through the env var: OK, please ignore left
shift warnings of the 1 << 31 style, or ignore those and also the 2 << 31
style.

	Jakub

* Re: [RFC] Implement Undefined Behavior Sanitizer
  2013-06-05 19:40     ` Andrew Pinski
@ 2013-06-06  7:46       ` Konstantin Serebryany
  2013-06-06  8:21         ` Jakub Jelinek
  0 siblings, 1 reply; 46+ messages in thread
From: Konstantin Serebryany @ 2013-06-06  7:46 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Jakub Jelinek, Marek Polacek, GCC Patches

On Wed, Jun 5, 2013 at 11:40 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Wed, Jun 5, 2013 at 12:23 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Wed, Jun 05, 2013 at 11:44:07AM -0700, Andrew Pinski wrote:
>>> On Wed, Jun 5, 2013 at 10:57 AM, Marek Polacek <polacek@redhat.com> wrote:
>>> > Comments, please?
>>> I think it might be better to do handle this while gimplification
>>> happens rather than while parsing.  The main reason is that constexpr
>>> might fail due to the added function calls.
>>
>> Gimplification is too late, the FEs perform various operation shortenings
>> etc. in many cases, and what exactly is undefined behavior is apparently
>> heavily dependent on the particular language (C has different rules from
>> C++).  Yes, constexpr is something to consider in this light, but not
>> something that can't be handled (recognizing ubsan builtins and just
>> handling them specially).
>>
>>> Also please don't shorten file names like ubsan,  we already have file
>>> names which don't fit in the older POSIX tar format and needs extended
>>> length support.
>>
>> We already have asan.c and tsan.c, and that is how it is commonly called.
>
> Can we just move them to array-sanitizer and thread-sanitizer?  I

s/array-sanitizer/address-sanitizer/

If we are going to import the ubsan run-time from LLVM's
projects/compiler-rt/lib/ubsan,
we may also need to update the contents of
libsanitizer/sanitizer_common and keep them in sync afterwards.
(ubsan shares a few bits of code with asan/tsan/msan.)
The simplest way to do that is to extend libsanitizer/merge.sh.

--kcc


> think those are better names than asan and tsan.  Shorten names are
> not useful when a new person is learning the code.
>
> Thanks,
> Andrew
>
>>
>>         Jakub

* Re: [RFC] Implement Undefined Behavior Sanitizer
  2013-06-06  7:46       ` Konstantin Serebryany
@ 2013-06-06  8:21         ` Jakub Jelinek
  2013-06-06  8:26           ` Andrew Pinski
  2013-06-06  8:47           ` Konstantin Serebryany
  0 siblings, 2 replies; 46+ messages in thread
From: Jakub Jelinek @ 2013-06-06  8:21 UTC (permalink / raw)
  To: Konstantin Serebryany; +Cc: Andrew Pinski, Marek Polacek, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 896 bytes --]

On Thu, Jun 06, 2013 at 11:46:19AM +0400, Konstantin Serebryany wrote:
> If we are going to import the ubsan run-time from LLVM's
> projects/compiler-rt/lib/ubsan,
> we may also need to update the contents of
> libsanitizer/sanitizer_common and keep them in sync afterwards.
> (ubsan shares few bits of code with asan/tsan/msan)
> The simplest way to do that is to extend libsanitizer/merge.sh

Sure.  I've done so far just a partial merge by hand (only 3 changed files
for the minimum of changes required to get ubsan to build), and have tested just
that it compiles, not that libubsan actually works.

P1 patch is the toplevel stuff to add ubsan into GCC libsanitizer, plus
ubsan/Makefile* and ubsan/libtool-version (i.e. gcc owned files).
P2 is the actual merge of the ubsan files.
P3 is something I'd propose for ubsan upstream, without it g++ warns about
__int128 in -pedantic mode.

	Jakub

[-- Attachment #2: P1 --]
[-- Type: text/plain, Size: 30115 bytes --]

2013-06-06  Jakub Jelinek  <jakub@redhat.com>

	* Makefile.am (SUBDIRS): Add ubsan.
	* configure.ac (AC_CONFIG_FILES): Add ubsan/Makefile.
	* merge.sh: Merge ubsan.
	* sanitizer_common/sanitizer_report_decorator.h: Partial merge from trunk.
	* sanitizer_common/sanitizer_printf.cc: Likewise.
	* sanitizer_common/sanitizer_common.h: Likewise.
	* ubsan: New directory.  Import ubsan runtime from LLVM.

--- libsanitizer/Makefile.am.jj	2012-12-12 23:32:04.000000000 +0100
+++ libsanitizer/Makefile.am	2013-06-06 09:04:07.194055649 +0200
@@ -1,13 +1,13 @@
 ACLOCAL_AMFLAGS = -I .. -I ../config
 
 if TSAN_SUPPORTED
-SUBDIRS = interception sanitizer_common asan tsan 
+SUBDIRS = interception sanitizer_common asan tsan ubsan
 else
-SUBDIRS = interception sanitizer_common asan 
+SUBDIRS = interception sanitizer_common asan ubsan
 endif
 
 if USING_MAC_INTERPOSE
-SUBDIRS = sanitizer_common asan
+SUBDIRS = sanitizer_common asan ubsan
 endif
 
 # Work around what appears to be a GNU make bug handling MAKEFLAGS
--- libsanitizer/configure.ac.jj	2013-03-22 15:13:50.000000000 +0100
+++ libsanitizer/configure.ac	2013-06-06 09:07:09.123734487 +0200
@@ -89,7 +89,7 @@ AM_CONDITIONAL(USING_MAC_INTERPOSE, $MAC
 
 AC_CONFIG_FILES([Makefile])
 
-AC_CONFIG_FILES(AC_FOREACH([DIR], [interception sanitizer_common asan], [DIR/Makefile ]),
+AC_CONFIG_FILES(AC_FOREACH([DIR], [interception sanitizer_common asan ubsan], [DIR/Makefile ]),
   [cat > vpsed$$ << \_EOF
 s!`test -f '$<' || echo '$(srcdir)/'`!!
 _EOF
--- libsanitizer/merge.sh.jj	2013-02-21 20:10:41.000000000 +0100
+++ libsanitizer/merge.sh	2013-06-06 09:08:11.278370545 +0200
@@ -69,6 +69,7 @@ merge lib/asan asan
 merge lib/tsan/rtl tsan
 merge lib/sanitizer_common sanitizer_common
 merge lib/interception interception
+merge lib/ubsan ubsan
 
 rm -rf upstream
 
--- libsanitizer/sanitizer_common/sanitizer_report_decorator.h.jj	2013-01-10 14:35:27.000000000 +0100
+++ libsanitizer/sanitizer_common/sanitizer_report_decorator.h	2013-06-06 08:51:12.000000000 +0200
@@ -12,24 +14,26 @@
 //
 //===----------------------------------------------------------------------===//
 
-#ifndef SANITIZER_ALLOCATOR_H
-#define SANITIZER_ALLOCATOR_H
+#ifndef SANITIZER_REPORT_DECORATOR_H
+#define SANITIZER_REPORT_DECORATOR_H
 
 namespace __sanitizer {
 class AnsiColorDecorator {
  public:
   explicit AnsiColorDecorator(bool use_ansi_colors) : ansi_(use_ansi_colors) { }
-  const char *Black()        { return ansi_ ? "\033[1m\033[30m" : ""; }
-  const char *Red()          { return ansi_ ? "\033[1m\033[31m" : ""; }
-  const char *Green()        { return ansi_ ? "\033[1m\033[32m" : ""; }
-  const char *Yellow()       { return ansi_ ? "\033[1m\033[33m" : ""; }
-  const char *Blue()         { return ansi_ ? "\033[1m\033[34m" : ""; }
-  const char *Magenta()      { return ansi_ ? "\033[1m\033[35m" : ""; }
-  const char *Cyan()         { return ansi_ ? "\033[1m\033[36m" : ""; }
-  const char *White()        { return ansi_ ? "\033[1m\033[37m" : ""; }
-  const char *Default()      { return ansi_ ? "\033[1m\033[0m"  : ""; }
+  const char *Bold()    const { return ansi_ ? "\033[1m" : ""; }
+  const char *Black()   const { return ansi_ ? "\033[1m\033[30m" : ""; }
+  const char *Red()     const { return ansi_ ? "\033[1m\033[31m" : ""; }
+  const char *Green()   const { return ansi_ ? "\033[1m\033[32m" : ""; }
+  const char *Yellow()  const { return ansi_ ? "\033[1m\033[33m" : ""; }
+  const char *Blue()    const { return ansi_ ? "\033[1m\033[34m" : ""; }
+  const char *Magenta() const { return ansi_ ? "\033[1m\033[35m" : ""; }
+  const char *Cyan()    const { return ansi_ ? "\033[1m\033[36m" : ""; }
+  const char *White()   const { return ansi_ ? "\033[1m\033[37m" : ""; }
+  const char *Default() const { return ansi_ ? "\033[1m\033[0m"  : ""; }
  private:
   bool ansi_;
 };
 }  // namespace __sanitizer
-#endif  // SANITIZER_ALLOCATOR_H
+
+#endif  // SANITIZER_REPORT_DECORATOR_H
--- libsanitizer/sanitizer_common/sanitizer_printf.cc.jj	2013-01-10 14:35:27.000000000 +0100
+++ libsanitizer/sanitizer_common/sanitizer_printf.cc	2013-06-06 09:30:57.012700637 +0200
@@ -21,6 +21,8 @@
 
 namespace __sanitizer {
 
+StaticSpinMutex CommonSanitizerReportMutex;
+
 static int AppendChar(char **buff, const char *buff_end, char c) {
   if (*buff < buff_end) {
     **buff = c;
--- libsanitizer/sanitizer_common/sanitizer_common.h.jj	2013-02-21 14:10:41.000000000 +0100
+++ libsanitizer/sanitizer_common/sanitizer_common.h	2013-06-06 09:30:28.399184514 +0200
@@ -15,6 +15,7 @@
 #define SANITIZER_COMMON_H
 
 #include "sanitizer_internal_defs.h"
+#include "sanitizer_mutex.h"
 
 namespace __sanitizer {
 struct StackTrace;
@@ -105,6 +106,8 @@ bool PrintsToTty();
 void Printf(const char *format, ...);
 void Report(const char *format, ...);
 void SetPrintfAndReportCallback(void (*callback)(const char *));
+// Can be used to prevent mixing error reports from different sanitizers.
+extern StaticSpinMutex CommonSanitizerReportMutex;
 
 fd_t OpenFile(const char *filename, bool write);
 // Opens the file 'file_name" and reads up to 'max_len' bytes.
--- libsanitizer/ubsan/libtool-version.jj	2013-06-06 10:04:49.662190892 +0200
+++ libsanitizer/ubsan/libtool-version	2012-11-23 00:14:41.000000000 +0100
@@ -0,0 +1,6 @@
+# This file is used to maintain libtool version info for libubsan.  See
+# the libtool manual to understand the meaning of the fields.  This is
+# a separate file so that version updates don't involve re-running
+# automake.
+# CURRENT:REVISION:AGE
+0:0:0
--- libsanitizer/ubsan/Makefile.am.jj	2013-06-06 10:04:40.326579930 +0200
+++ libsanitizer/ubsan/Makefile.am	2013-06-06 10:04:08.001835829 +0200
@@ -0,0 +1,65 @@
+AM_CPPFLAGS = -I $(top_srcdir) -I $(top_srcdir)/include
+
+# May be used by toolexeclibdir.
+gcc_version := $(shell cat $(top_srcdir)/../gcc/BASE-VER)
+
+DEFS = -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS 
+AM_CXXFLAGS = -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long  -fPIC -fno-builtin -fno-exceptions -fomit-frame-pointer -funwind-tables -fvisibility=hidden -Wno-variadic-macros
+AM_CXXFLAGS += $(LIBSTDCXX_RAW_CXX_CXXFLAGS)
+ACLOCAL_AMFLAGS = -I m4
+
+toolexeclib_LTLIBRARIES = libubsan.la
+
+ubsan_files = \
+	ubsan_diag.cc \
+	ubsan_handlers.cc \
+	ubsan_handlers_cxx.cc \
+	ubsan_type_hash.cc \
+	ubsan_value.cc
+
+libubsan_la_SOURCES = $(ubsan_files) 
+libubsan_la_LIBADD = $(top_builddir)/sanitizer_common/libsanitizer_common.la $(top_builddir)/interception/libinterception.la $(LIBSTDCXX_RAW_CXX_LDFLAGS)
+libubsan_la_LDFLAGS = -version-info `grep -v '^\#' $(srcdir)/libtool-version` -lpthread -ldl
+
+# Work around what appears to be a GNU make bug handling MAKEFLAGS
+# values defined in terms of make variables, as is the case for CC and
+# friends when we are called from the top level Makefile.
+AM_MAKEFLAGS = \
+	"AR_FLAGS=$(AR_FLAGS)" \
+	"CC_FOR_BUILD=$(CC_FOR_BUILD)" \
+	"CFLAGS=$(CFLAGS)" \
+	"CXXFLAGS=$(CXXFLAGS)" \
+	"CFLAGS_FOR_BUILD=$(CFLAGS_FOR_BUILD)" \
+	"CFLAGS_FOR_TARGET=$(CFLAGS_FOR_TARGET)" \
+	"INSTALL=$(INSTALL)" \
+	"INSTALL_DATA=$(INSTALL_DATA)" \
+	"INSTALL_PROGRAM=$(INSTALL_PROGRAM)" \
+	"INSTALL_SCRIPT=$(INSTALL_SCRIPT)" \
+	"JC1FLAGS=$(JC1FLAGS)" \
+	"LDFLAGS=$(LDFLAGS)" \
+	"LIBCFLAGS=$(LIBCFLAGS)" \
+	"LIBCFLAGS_FOR_TARGET=$(LIBCFLAGS_FOR_TARGET)" \
+	"MAKE=$(MAKE)" \
+	"MAKEINFO=$(MAKEINFO) $(MAKEINFOFLAGS)" \
+	"PICFLAG=$(PICFLAG)" \
+	"PICFLAG_FOR_TARGET=$(PICFLAG_FOR_TARGET)" \
+	"SHELL=$(SHELL)" \
+	"RUNTESTFLAGS=$(RUNTESTFLAGS)" \
+	"exec_prefix=$(exec_prefix)" \
+	"infodir=$(infodir)" \
+	"libdir=$(libdir)" \
+	"prefix=$(prefix)" \
+	"includedir=$(includedir)" \
+	"AR=$(AR)" \
+	"AS=$(AS)" \
+	"LD=$(LD)" \
+	"LIBCFLAGS=$(LIBCFLAGS)" \
+	"NM=$(NM)" \
+	"PICFLAG=$(PICFLAG)" \
+	"RANLIB=$(RANLIB)" \
+	"DESTDIR=$(DESTDIR)"
+
+MAKEOVERRIDES=
+
+## ################################################################
+
--- libsanitizer/ubsan/Makefile.in.jj	2013-06-06 10:04:44.061932728 +0200
+++ libsanitizer/ubsan/Makefile.in	2013-06-06 10:04:24.991501999 +0200
@@ -0,0 +1,578 @@
+# Makefile.in generated by automake 1.11.1 from Makefile.am.
+# @configure_input@
+
+# Copyright (C) 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002,
+# 2003, 2004, 2005, 2006, 2007, 2008, 2009  Free Software Foundation,
+# Inc.
+# This Makefile.in is free software; the Free Software Foundation
+# gives unlimited permission to copy and/or distribute it,
+# with or without modifications, as long as this notice is preserved.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY, to the extent permitted by law; without
+# even the implied warranty of MERCHANTABILITY or FITNESS FOR A
+# PARTICULAR PURPOSE.
+
+@SET_MAKE@
+
+VPATH = @srcdir@
+pkgdatadir = $(datadir)/@PACKAGE@
+pkgincludedir = $(includedir)/@PACKAGE@
+pkglibdir = $(libdir)/@PACKAGE@
+pkglibexecdir = $(libexecdir)/@PACKAGE@
+am__cd = CDPATH="$${ZSH_VERSION+.}$(PATH_SEPARATOR)" && cd
+install_sh_DATA = $(install_sh) -c -m 644
+install_sh_PROGRAM = $(install_sh) -c
+install_sh_SCRIPT = $(install_sh) -c
+INSTALL_HEADER = $(INSTALL_DATA)
+transform = $(program_transform_name)
+NORMAL_INSTALL = :
+PRE_INSTALL = :
+POST_INSTALL = :
+NORMAL_UNINSTALL = :
+PRE_UNINSTALL = :
+POST_UNINSTALL = :
+build_triplet = @build@
+host_triplet = @host@
+target_triplet = @target@
+subdir = ubsan
+DIST_COMMON = $(srcdir)/Makefile.in $(srcdir)/Makefile.am
+ACLOCAL_M4 = $(top_srcdir)/aclocal.m4
+am__aclocal_m4_deps = $(top_srcdir)/../config/acx.m4 \
+	$(top_srcdir)/../config/depstand.m4 \
+	$(top_srcdir)/../config/lead-dot.m4 \
+	$(top_srcdir)/../config/libstdc++-raw-cxx.m4 \
+	$(top_srcdir)/../config/multi.m4 \
+	$(top_srcdir)/../config/override.m4 \
+	$(top_srcdir)/../ltoptions.m4 $(top_srcdir)/../ltsugar.m4 \
+	$(top_srcdir)/../ltversion.m4 $(top_srcdir)/../lt~obsolete.m4 \
+	$(top_srcdir)/acinclude.m4 $(top_srcdir)/../libtool.m4 \
+	$(top_srcdir)/configure.ac
+am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \
+	$(ACLOCAL_M4)
+mkinstalldirs = $(SHELL) $(top_srcdir)/../mkinstalldirs
+CONFIG_CLEAN_FILES =
+CONFIG_CLEAN_VPATH_FILES =
+am__vpath_adj_setup = srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`;
+am__vpath_adj = case $$p in \
+    $(srcdir)/*) f=`echo "$$p" | sed "s|^$$srcdirstrip/||"`;; \
+    *) f=$$p;; \
+  esac;
+am__strip_dir = f=`echo $$p | sed -e 's|^.*/||'`;
+am__install_max = 40
+am__nobase_strip_setup = \
+  srcdirstrip=`echo "$(srcdir)" | sed 's/[].[^$$\\*|]/\\\\&/g'`
+am__nobase_strip = \
+  for p in $$list; do echo "$$p"; done | sed -e "s|$$srcdirstrip/||"
+am__nobase_list = $(am__nobase_strip_setup); \
+  for p in $$list; do echo "$$p $$p"; done | \
+  sed "s| $$srcdirstrip/| |;"' / .*\//!s/ .*/ ./; s,\( .*\)/[^/]*$$,\1,' | \
+  $(AWK) 'BEGIN { files["."] = "" } { files[$$2] = files[$$2] " " $$1; \
+    if (++n[$$2] == $(am__install_max)) \
+      { print $$2, files[$$2]; n[$$2] = 0; files[$$2] = "" } } \
+    END { for (dir in files) print dir, files[dir] }'
+am__base_list = \
+  sed '$$!N;$$!N;$$!N;$$!N;$$!N;$$!N;$$!N;s/\n/ /g' | \
+  sed '$$!N;$$!N;$$!N;$$!N;s/\n/ /g'
+am__installdirs = "$(DESTDIR)$(toolexeclibdir)"
+LTLIBRARIES = $(toolexeclib_LTLIBRARIES)
+am__DEPENDENCIES_1 =
+libubsan_la_DEPENDENCIES =  \
+	$(top_builddir)/sanitizer_common/libsanitizer_common.la \
+	$(top_builddir)/interception/libinterception.la \
+	$(am__DEPENDENCIES_1)
+am__objects_1 = ubsan_diag.lo ubsan_handlers.lo ubsan_handlers_cxx.lo \
+	ubsan_type_hash.lo ubsan_value.lo
+am_libubsan_la_OBJECTS = $(am__objects_1)
+libubsan_la_OBJECTS = $(am_libubsan_la_OBJECTS)
+libubsan_la_LINK = $(LIBTOOL) --tag=CXX $(AM_LIBTOOLFLAGS) \
+	$(LIBTOOLFLAGS) --mode=link $(CXXLD) $(AM_CXXFLAGS) \
+	$(CXXFLAGS) $(libubsan_la_LDFLAGS) $(LDFLAGS) -o $@
+DEFAULT_INCLUDES = -I.@am__isrc@
+depcomp = $(SHELL) $(top_srcdir)/../depcomp
+am__depfiles_maybe = depfiles
+am__mv = mv -f
+CXXCOMPILE = $(CXX) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) \
+	$(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CXXFLAGS) $(CXXFLAGS)
+LTCXXCOMPILE = $(LIBTOOL) --tag=CXX $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
+	--mode=compile $(CXX) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) \
+	$(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CXXFLAGS) $(CXXFLAGS)
+CXXLD = $(CXX)
+CXXLINK = $(LIBTOOL) --tag=CXX $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
+	--mode=link $(CXXLD) $(AM_CXXFLAGS) $(CXXFLAGS) $(AM_LDFLAGS) \
+	$(LDFLAGS) -o $@
+SOURCES = $(libubsan_la_SOURCES)
+ETAGS = etags
+CTAGS = ctags
+ACLOCAL = @ACLOCAL@
+AMTAR = @AMTAR@
+AR = @AR@
+AUTOCONF = @AUTOCONF@
+AUTOHEADER = @AUTOHEADER@
+AUTOMAKE = @AUTOMAKE@
+AWK = @AWK@
+CC = @CC@
+CCAS = @CCAS@
+CCASDEPMODE = @CCASDEPMODE@
+CCASFLAGS = @CCASFLAGS@
+CCDEPMODE = @CCDEPMODE@
+CFLAGS = @CFLAGS@
+CPP = @CPP@
+CPPFLAGS = @CPPFLAGS@
+CXX = @CXX@
+CXXCPP = @CXXCPP@
+CXXDEPMODE = @CXXDEPMODE@
+CXXFLAGS = @CXXFLAGS@
+CYGPATH_W = @CYGPATH_W@
+DEFS = -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS 
+DEPDIR = @DEPDIR@
+DSYMUTIL = @DSYMUTIL@
+DUMPBIN = @DUMPBIN@
+ECHO_C = @ECHO_C@
+ECHO_N = @ECHO_N@
+ECHO_T = @ECHO_T@
+EGREP = @EGREP@
+EXEEXT = @EXEEXT@
+FGREP = @FGREP@
+GREP = @GREP@
+INSTALL = @INSTALL@
+INSTALL_DATA = @INSTALL_DATA@
+INSTALL_PROGRAM = @INSTALL_PROGRAM@
+INSTALL_SCRIPT = @INSTALL_SCRIPT@
+INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@
+LD = @LD@
+LDFLAGS = @LDFLAGS@
+LIBOBJS = @LIBOBJS@
+LIBS = @LIBS@
+LIBSTDCXX_RAW_CXX_CXXFLAGS = @LIBSTDCXX_RAW_CXX_CXXFLAGS@
+LIBSTDCXX_RAW_CXX_LDFLAGS = @LIBSTDCXX_RAW_CXX_LDFLAGS@
+LIBTOOL = @LIBTOOL@
+LIPO = @LIPO@
+LN_S = @LN_S@
+LTLIBOBJS = @LTLIBOBJS@
+MAINT = @MAINT@
+MAKEINFO = @MAKEINFO@
+MKDIR_P = @MKDIR_P@
+NM = @NM@
+NMEDIT = @NMEDIT@
+OBJDUMP = @OBJDUMP@
+OBJEXT = @OBJEXT@
+OTOOL = @OTOOL@
+OTOOL64 = @OTOOL64@
+PACKAGE = @PACKAGE@
+PACKAGE_BUGREPORT = @PACKAGE_BUGREPORT@
+PACKAGE_NAME = @PACKAGE_NAME@
+PACKAGE_STRING = @PACKAGE_STRING@
+PACKAGE_TARNAME = @PACKAGE_TARNAME@
+PACKAGE_URL = @PACKAGE_URL@
+PACKAGE_VERSION = @PACKAGE_VERSION@
+PATH_SEPARATOR = @PATH_SEPARATOR@
+RANLIB = @RANLIB@
+SED = @SED@
+SET_MAKE = @SET_MAKE@
+SHELL = @SHELL@
+STRIP = @STRIP@
+VERSION = @VERSION@
+abs_builddir = @abs_builddir@
+abs_srcdir = @abs_srcdir@
+abs_top_builddir = @abs_top_builddir@
+abs_top_srcdir = @abs_top_srcdir@
+ac_ct_CC = @ac_ct_CC@
+ac_ct_CXX = @ac_ct_CXX@
+ac_ct_DUMPBIN = @ac_ct_DUMPBIN@
+am__include = @am__include@
+am__leading_dot = @am__leading_dot@
+am__quote = @am__quote@
+am__tar = @am__tar@
+am__untar = @am__untar@
+bindir = @bindir@
+build = @build@
+build_alias = @build_alias@
+build_cpu = @build_cpu@
+build_os = @build_os@
+build_vendor = @build_vendor@
+builddir = @builddir@
+datadir = @datadir@
+datarootdir = @datarootdir@
+docdir = @docdir@
+dvidir = @dvidir@
+enable_shared = @enable_shared@
+enable_static = @enable_static@
+exec_prefix = @exec_prefix@
+host = @host@
+host_alias = @host_alias@
+host_cpu = @host_cpu@
+host_os = @host_os@
+host_vendor = @host_vendor@
+htmldir = @htmldir@
+includedir = @includedir@
+infodir = @infodir@
+install_sh = @install_sh@
+libdir = @libdir@
+libexecdir = @libexecdir@
+localedir = @localedir@
+localstatedir = @localstatedir@
+mandir = @mandir@
+mkdir_p = @mkdir_p@
+multi_basedir = @multi_basedir@
+oldincludedir = @oldincludedir@
+pdfdir = @pdfdir@
+prefix = @prefix@
+program_transform_name = @program_transform_name@
+psdir = @psdir@
+sbindir = @sbindir@
+sharedstatedir = @sharedstatedir@
+srcdir = @srcdir@
+sysconfdir = @sysconfdir@
+target = @target@
+target_alias = @target_alias@
+target_cpu = @target_cpu@
+target_noncanonical = @target_noncanonical@
+target_os = @target_os@
+target_vendor = @target_vendor@
+toolexecdir = @toolexecdir@
+toolexeclibdir = @toolexeclibdir@
+top_build_prefix = @top_build_prefix@
+top_builddir = @top_builddir@
+top_srcdir = @top_srcdir@
+AM_CPPFLAGS = -I $(top_srcdir) -I $(top_srcdir)/include
+
+# May be used by toolexeclibdir.
+gcc_version := $(shell cat $(top_srcdir)/../gcc/BASE-VER)
+AM_CXXFLAGS = -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic \
+	-Wno-long-long -fPIC -fno-builtin -fno-exceptions \
+	-fomit-frame-pointer -funwind-tables -fvisibility=hidden \
+	-Wno-variadic-macros $(LIBSTDCXX_RAW_CXX_CXXFLAGS)
+ACLOCAL_AMFLAGS = -I m4
+toolexeclib_LTLIBRARIES = libubsan.la
+ubsan_files = \
+	ubsan_diag.cc \
+	ubsan_handlers.cc \
+	ubsan_handlers_cxx.cc \
+	ubsan_type_hash.cc \
+	ubsan_value.cc
+
+libubsan_la_SOURCES = $(ubsan_files) 
+libubsan_la_LIBADD = $(top_builddir)/sanitizer_common/libsanitizer_common.la $(top_builddir)/interception/libinterception.la $(LIBSTDCXX_RAW_CXX_LDFLAGS)
+libubsan_la_LDFLAGS = -version-info `grep -v '^\#' $(srcdir)/libtool-version` -lpthread -ldl
+
+# Work around what appears to be a GNU make bug handling MAKEFLAGS
+# values defined in terms of make variables, as is the case for CC and
+# friends when we are called from the top level Makefile.
+AM_MAKEFLAGS = \
+	"AR_FLAGS=$(AR_FLAGS)" \
+	"CC_FOR_BUILD=$(CC_FOR_BUILD)" \
+	"CFLAGS=$(CFLAGS)" \
+	"CXXFLAGS=$(CXXFLAGS)" \
+	"CFLAGS_FOR_BUILD=$(CFLAGS_FOR_BUILD)" \
+	"CFLAGS_FOR_TARGET=$(CFLAGS_FOR_TARGET)" \
+	"INSTALL=$(INSTALL)" \
+	"INSTALL_DATA=$(INSTALL_DATA)" \
+	"INSTALL_PROGRAM=$(INSTALL_PROGRAM)" \
+	"INSTALL_SCRIPT=$(INSTALL_SCRIPT)" \
+	"JC1FLAGS=$(JC1FLAGS)" \
+	"LDFLAGS=$(LDFLAGS)" \
+	"LIBCFLAGS=$(LIBCFLAGS)" \
+	"LIBCFLAGS_FOR_TARGET=$(LIBCFLAGS_FOR_TARGET)" \
+	"MAKE=$(MAKE)" \
+	"MAKEINFO=$(MAKEINFO) $(MAKEINFOFLAGS)" \
+	"PICFLAG=$(PICFLAG)" \
+	"PICFLAG_FOR_TARGET=$(PICFLAG_FOR_TARGET)" \
+	"SHELL=$(SHELL)" \
+	"RUNTESTFLAGS=$(RUNTESTFLAGS)" \
+	"exec_prefix=$(exec_prefix)" \
+	"infodir=$(infodir)" \
+	"libdir=$(libdir)" \
+	"prefix=$(prefix)" \
+	"includedir=$(includedir)" \
+	"AR=$(AR)" \
+	"AS=$(AS)" \
+	"LD=$(LD)" \
+	"LIBCFLAGS=$(LIBCFLAGS)" \
+	"NM=$(NM)" \
+	"PICFLAG=$(PICFLAG)" \
+	"RANLIB=$(RANLIB)" \
+	"DESTDIR=$(DESTDIR)"
+
+MAKEOVERRIDES = 
+all: all-am
+
+.SUFFIXES:
+.SUFFIXES: .cc .lo .o .obj
+$(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am  $(am__configure_deps)
+	@for dep in $?; do \
+	  case '$(am__configure_deps)' in \
+	    *$$dep*) \
+	      ( cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh ) \
+	        && { if test -f $@; then exit 0; else break; fi; }; \
+	      exit 1;; \
+	  esac; \
+	done; \
+	echo ' cd $(top_srcdir) && $(AUTOMAKE) --foreign ubsan/Makefile'; \
+	$(am__cd) $(top_srcdir) && \
+	  $(AUTOMAKE) --foreign ubsan/Makefile
+.PRECIOUS: Makefile
+Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status
+	@case '$?' in \
+	  *config.status*) \
+	    cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh;; \
+	  *) \
+	    echo ' cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe)'; \
+	    cd $(top_builddir) && $(SHELL) ./config.status $(subdir)/$@ $(am__depfiles_maybe);; \
+	esac;
+
+$(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES)
+	cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
+
+$(top_srcdir)/configure: @MAINTAINER_MODE_TRUE@ $(am__configure_deps)
+	cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
+$(ACLOCAL_M4): @MAINTAINER_MODE_TRUE@ $(am__aclocal_m4_deps)
+	cd $(top_builddir) && $(MAKE) $(AM_MAKEFLAGS) am--refresh
+$(am__aclocal_m4_deps):
+install-toolexeclibLTLIBRARIES: $(toolexeclib_LTLIBRARIES)
+	@$(NORMAL_INSTALL)
+	test -z "$(toolexeclibdir)" || $(MKDIR_P) "$(DESTDIR)$(toolexeclibdir)"
+	@list='$(toolexeclib_LTLIBRARIES)'; test -n "$(toolexeclibdir)" || list=; \
+	list2=; for p in $$list; do \
+	  if test -f $$p; then \
+	    list2="$$list2 $$p"; \
+	  else :; fi; \
+	done; \
+	test -z "$$list2" || { \
+	  echo " $(LIBTOOL) $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=install $(INSTALL) $(INSTALL_STRIP_FLAG) $$list2 '$(DESTDIR)$(toolexeclibdir)'"; \
+	  $(LIBTOOL) $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=install $(INSTALL) $(INSTALL_STRIP_FLAG) $$list2 "$(DESTDIR)$(toolexeclibdir)"; \
+	}
+
+uninstall-toolexeclibLTLIBRARIES:
+	@$(NORMAL_UNINSTALL)
+	@list='$(toolexeclib_LTLIBRARIES)'; test -n "$(toolexeclibdir)" || list=; \
+	for p in $$list; do \
+	  $(am__strip_dir) \
+	  echo " $(LIBTOOL) $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=uninstall rm -f '$(DESTDIR)$(toolexeclibdir)/$$f'"; \
+	  $(LIBTOOL) $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=uninstall rm -f "$(DESTDIR)$(toolexeclibdir)/$$f"; \
+	done
+
+clean-toolexeclibLTLIBRARIES:
+	-test -z "$(toolexeclib_LTLIBRARIES)" || rm -f $(toolexeclib_LTLIBRARIES)
+	@list='$(toolexeclib_LTLIBRARIES)'; for p in $$list; do \
+	  dir="`echo $$p | sed -e 's|/[^/]*$$||'`"; \
+	  test "$$dir" != "$$p" || dir=.; \
+	  echo "rm -f \"$${dir}/so_locations\""; \
+	  rm -f "$${dir}/so_locations"; \
+	done
+libubsan.la: $(libubsan_la_OBJECTS) $(libubsan_la_DEPENDENCIES) 
+	$(libubsan_la_LINK) -rpath $(toolexeclibdir) $(libubsan_la_OBJECTS) $(libubsan_la_LIBADD) $(LIBS)
+
+mostlyclean-compile:
+	-rm -f *.$(OBJEXT)
+
+distclean-compile:
+	-rm -f *.tab.c
+
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ubsan_diag.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ubsan_handlers.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ubsan_handlers_cxx.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ubsan_type_hash.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ubsan_value.Plo@am__quote@
+
+.cc.o:
+@am__fastdepCXX_TRUE@	$(CXXCOMPILE) -MT $@ -MD -MP -MF $(DEPDIR)/$*.Tpo -c -o $@ $<
+@am__fastdepCXX_TRUE@	$(am__mv) $(DEPDIR)/$*.Tpo $(DEPDIR)/$*.Po
+@AMDEP_TRUE@@am__fastdepCXX_FALSE@	source='$<' object='$@' libtool=no @AMDEPBACKSLASH@
+@AMDEP_TRUE@@am__fastdepCXX_FALSE@	DEPDIR=$(DEPDIR) $(CXXDEPMODE) $(depcomp) @AMDEPBACKSLASH@
+@am__fastdepCXX_FALSE@	$(CXXCOMPILE) -c -o $@ $<
+
+.cc.obj:
+@am__fastdepCXX_TRUE@	$(CXXCOMPILE) -MT $@ -MD -MP -MF $(DEPDIR)/$*.Tpo -c -o $@ `$(CYGPATH_W) '$<'`
+@am__fastdepCXX_TRUE@	$(am__mv) $(DEPDIR)/$*.Tpo $(DEPDIR)/$*.Po
+@AMDEP_TRUE@@am__fastdepCXX_FALSE@	source='$<' object='$@' libtool=no @AMDEPBACKSLASH@
+@AMDEP_TRUE@@am__fastdepCXX_FALSE@	DEPDIR=$(DEPDIR) $(CXXDEPMODE) $(depcomp) @AMDEPBACKSLASH@
+@am__fastdepCXX_FALSE@	$(CXXCOMPILE) -c -o $@ `$(CYGPATH_W) '$<'`
+
+.cc.lo:
+@am__fastdepCXX_TRUE@	$(LTCXXCOMPILE) -MT $@ -MD -MP -MF $(DEPDIR)/$*.Tpo -c -o $@ $<
+@am__fastdepCXX_TRUE@	$(am__mv) $(DEPDIR)/$*.Tpo $(DEPDIR)/$*.Plo
+@AMDEP_TRUE@@am__fastdepCXX_FALSE@	source='$<' object='$@' libtool=yes @AMDEPBACKSLASH@
+@AMDEP_TRUE@@am__fastdepCXX_FALSE@	DEPDIR=$(DEPDIR) $(CXXDEPMODE) $(depcomp) @AMDEPBACKSLASH@
+@am__fastdepCXX_FALSE@	$(LTCXXCOMPILE) -c -o $@ $<
+
+mostlyclean-libtool:
+	-rm -f *.lo
+
+clean-libtool:
+	-rm -rf .libs _libs
+
+ID: $(HEADERS) $(SOURCES) $(LISP) $(TAGS_FILES)
+	list='$(SOURCES) $(HEADERS) $(LISP) $(TAGS_FILES)'; \
+	unique=`for i in $$list; do \
+	    if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \
+	  done | \
+	  $(AWK) '{ files[$$0] = 1; nonempty = 1; } \
+	      END { if (nonempty) { for (i in files) print i; }; }'`; \
+	mkid -fID $$unique
+tags: TAGS
+
+TAGS:  $(HEADERS) $(SOURCES)  $(TAGS_DEPENDENCIES) \
+		$(TAGS_FILES) $(LISP)
+	set x; \
+	here=`pwd`; \
+	list='$(SOURCES) $(HEADERS)  $(LISP) $(TAGS_FILES)'; \
+	unique=`for i in $$list; do \
+	    if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \
+	  done | \
+	  $(AWK) '{ files[$$0] = 1; nonempty = 1; } \
+	      END { if (nonempty) { for (i in files) print i; }; }'`; \
+	shift; \
+	if test -z "$(ETAGS_ARGS)$$*$$unique"; then :; else \
+	  test -n "$$unique" || unique=$$empty_fix; \
+	  if test $$# -gt 0; then \
+	    $(ETAGS) $(ETAGSFLAGS) $(AM_ETAGSFLAGS) $(ETAGS_ARGS) \
+	      "$$@" $$unique; \
+	  else \
+	    $(ETAGS) $(ETAGSFLAGS) $(AM_ETAGSFLAGS) $(ETAGS_ARGS) \
+	      $$unique; \
+	  fi; \
+	fi
+ctags: CTAGS
+CTAGS:  $(HEADERS) $(SOURCES)  $(TAGS_DEPENDENCIES) \
+		$(TAGS_FILES) $(LISP)
+	list='$(SOURCES) $(HEADERS)  $(LISP) $(TAGS_FILES)'; \
+	unique=`for i in $$list; do \
+	    if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \
+	  done | \
+	  $(AWK) '{ files[$$0] = 1; nonempty = 1; } \
+	      END { if (nonempty) { for (i in files) print i; }; }'`; \
+	test -z "$(CTAGS_ARGS)$$unique" \
+	  || $(CTAGS) $(CTAGSFLAGS) $(AM_CTAGSFLAGS) $(CTAGS_ARGS) \
+	     $$unique
+
+GTAGS:
+	here=`$(am__cd) $(top_builddir) && pwd` \
+	  && $(am__cd) $(top_srcdir) \
+	  && gtags -i $(GTAGS_ARGS) "$$here"
+
+distclean-tags:
+	-rm -f TAGS ID GTAGS GRTAGS GSYMS GPATH tags
+check-am: all-am
+check: check-am
+all-am: Makefile $(LTLIBRARIES)
+installdirs:
+	for dir in "$(DESTDIR)$(toolexeclibdir)"; do \
+	  test -z "$$dir" || $(MKDIR_P) "$$dir"; \
+	done
+install: install-am
+install-exec: install-exec-am
+install-data: install-data-am
+uninstall: uninstall-am
+
+install-am: all-am
+	@$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am
+
+installcheck: installcheck-am
+install-strip:
+	$(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \
+	  install_sh_PROGRAM="$(INSTALL_STRIP_PROGRAM)" INSTALL_STRIP_FLAG=-s \
+	  `test -z '$(STRIP)' || \
+	    echo "INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'"` install
+mostlyclean-generic:
+
+clean-generic:
+
+distclean-generic:
+	-test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES)
+	-test . = "$(srcdir)" || test -z "$(CONFIG_CLEAN_VPATH_FILES)" || rm -f $(CONFIG_CLEAN_VPATH_FILES)
+
+maintainer-clean-generic:
+	@echo "This command is intended for maintainers to use"
+	@echo "it deletes files that may require special tools to rebuild."
+clean: clean-am
+
+clean-am: clean-generic clean-libtool clean-toolexeclibLTLIBRARIES \
+	mostlyclean-am
+
+distclean: distclean-am
+	-rm -rf ./$(DEPDIR)
+	-rm -f Makefile
+distclean-am: clean-am distclean-compile distclean-generic \
+	distclean-tags
+
+dvi: dvi-am
+
+dvi-am:
+
+html: html-am
+
+html-am:
+
+info: info-am
+
+info-am:
+
+install-data-am:
+
+install-dvi: install-dvi-am
+
+install-dvi-am:
+
+install-exec-am: install-toolexeclibLTLIBRARIES
+
+install-html: install-html-am
+
+install-html-am:
+
+install-info: install-info-am
+
+install-info-am:
+
+install-man:
+
+install-pdf: install-pdf-am
+
+install-pdf-am:
+
+install-ps: install-ps-am
+
+install-ps-am:
+
+installcheck-am:
+
+maintainer-clean: maintainer-clean-am
+	-rm -rf ./$(DEPDIR)
+	-rm -f Makefile
+maintainer-clean-am: distclean-am maintainer-clean-generic
+
+mostlyclean: mostlyclean-am
+
+mostlyclean-am: mostlyclean-compile mostlyclean-generic \
+	mostlyclean-libtool
+
+pdf: pdf-am
+
+pdf-am:
+
+ps: ps-am
+
+ps-am:
+
+uninstall-am: uninstall-toolexeclibLTLIBRARIES
+
+.MAKE: install-am install-strip
+
+.PHONY: CTAGS GTAGS all all-am check check-am clean clean-generic \
+	clean-libtool clean-toolexeclibLTLIBRARIES ctags distclean \
+	distclean-compile distclean-generic distclean-libtool \
+	distclean-tags dvi dvi-am html html-am info info-am install \
+	install-am install-data install-data-am install-dvi \
+	install-dvi-am install-exec install-exec-am install-html \
+	install-html-am install-info install-info-am install-man \
+	install-pdf install-pdf-am install-ps install-ps-am \
+	install-strip install-toolexeclibLTLIBRARIES installcheck \
+	installcheck-am installdirs maintainer-clean \
+	maintainer-clean-generic mostlyclean mostlyclean-compile \
+	mostlyclean-generic mostlyclean-libtool pdf pdf-am ps ps-am \
+	tags uninstall uninstall-am uninstall-toolexeclibLTLIBRARIES
+
+
+# Tell versions [3.59,3.63) of GNU make to not export all variables.
+# Otherwise a system limit (for SysV at least) may be exceeded.
+.NOEXPORT:
--- libsanitizer/configure.jj	2013-01-08 09:08:49.000000000 +0100
+++ libsanitizer/configure	2013-06-06 09:07:24.127865389 +0200
@@ -14543,7 +14543,7 @@ fi
 ac_config_files="$ac_config_files Makefile"
 
 
-ac_config_files="$ac_config_files interception/Makefile sanitizer_common/Makefile asan/Makefile"
+ac_config_files="$ac_config_files interception/Makefile sanitizer_common/Makefile asan/Makefile ubsan/Makefile"
 
 
 if test "x$TSAN_SUPPORTED" = "xyes"; then
@@ -15674,6 +15674,7 @@ do
     "interception/Makefile") CONFIG_FILES="$CONFIG_FILES interception/Makefile" ;;
     "sanitizer_common/Makefile") CONFIG_FILES="$CONFIG_FILES sanitizer_common/Makefile" ;;
     "asan/Makefile") CONFIG_FILES="$CONFIG_FILES asan/Makefile" ;;
+    "ubsan/Makefile") CONFIG_FILES="$CONFIG_FILES ubsan/Makefile" ;;
     "tsan/Makefile") CONFIG_FILES="$CONFIG_FILES tsan/Makefile" ;;
 
   *) as_fn_error "invalid argument: \`$ac_config_target'" "$LINENO" 5;;
@@ -17032,6 +17033,17 @@ _EOF
 s!`test -f '$<' || echo '$(srcdir)/'`!!
 _EOF
    sed -f vpsed$$ $ac_file > tmp$$
+   mv tmp$$ $ac_file
+   rm vpsed$$
+   echo 'MULTISUBDIR =' >> $ac_file
+   ml_norecursion=yes
+   . ${multi_basedir}/config-ml.in
+   { ml_norecursion=; unset ml_norecursion;}
+ ;;
+    "ubsan/Makefile":F) cat > vpsed$$ << \_EOF
+s!`test -f '$<' || echo '$(srcdir)/'`!!
+_EOF
+   sed -f vpsed$$ $ac_file > tmp$$
    mv tmp$$ $ac_file
    rm vpsed$$
    echo 'MULTISUBDIR =' >> $ac_file
--- libsanitizer/Makefile.in.jj	2013-03-22 15:13:50.000000000 +0100
+++ libsanitizer/Makefile.in	2013-06-06 09:07:26.707894720 +0200
@@ -76,7 +76,7 @@ AM_RECURSIVE_TARGETS = $(RECURSIVE_TARGE
 	$(RECURSIVE_CLEAN_TARGETS:-recursive=) tags TAGS ctags CTAGS
 ETAGS = etags
 CTAGS = ctags
-DIST_SUBDIRS = interception sanitizer_common asan tsan
+DIST_SUBDIRS = interception sanitizer_common asan ubsan tsan
 ACLOCAL = @ACLOCAL@
 AMTAR = @AMTAR@
 AR = @AR@
@@ -209,9 +209,9 @@ top_build_prefix = @top_build_prefix@
 top_builddir = @top_builddir@
 top_srcdir = @top_srcdir@
 ACLOCAL_AMFLAGS = -I .. -I ../config
-@TSAN_SUPPORTED_FALSE@SUBDIRS = interception sanitizer_common asan 
-@TSAN_SUPPORTED_TRUE@SUBDIRS = interception sanitizer_common asan tsan 
-@USING_MAC_INTERPOSE_TRUE@SUBDIRS = sanitizer_common asan
+@TSAN_SUPPORTED_FALSE@SUBDIRS = interception sanitizer_common asan ubsan
+@TSAN_SUPPORTED_TRUE@SUBDIRS = interception sanitizer_common asan tsan ubsan
+@USING_MAC_INTERPOSE_TRUE@SUBDIRS = sanitizer_common asan ubsan
 
 # Work around what appears to be a GNU make bug handling MAKEFLAGS
 # values defined in terms of make variables, as is the case for CC and
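
A note on the sanitizer_report_decorator.h hunk in P1: the const qualifiers
added to the AnsiColorDecorator accessors allow them to be called through a
const object or const reference.  Below is a minimal standalone sketch of
that effect.  The class is trimmed to three accessors and placed in a
made-up namespace "sketch", and the helper DecorateError is hypothetical
(it is not part of libsanitizer).

```cpp
#include <string>

namespace sketch {

// Trimmed copy of the decorator from the hunk, kept only to show the
// effect of the added const qualifiers.
class AnsiColorDecorator {
 public:
  explicit AnsiColorDecorator(bool use_ansi_colors) : ansi_(use_ansi_colors) { }
  const char *Bold()    const { return ansi_ ? "\033[1m" : ""; }
  const char *Red()     const { return ansi_ ? "\033[1m\033[31m" : ""; }
  const char *Default() const { return ansi_ ? "\033[1m\033[0m"  : ""; }
 private:
  bool ansi_;
};

// Hypothetical helper: wraps msg in the Red()/Default() sequences.
// Taking the decorator by const reference compiles only because the
// accessors are const member functions.
inline std::string DecorateError(const AnsiColorDecorator &d,
                                 const std::string &msg) {
  return std::string(d.Red()) + msg + d.Default();
}

}  // namespace sketch
```

With the pre-patch, non-const accessors, the calls d.Red() and d.Default()
inside DecorateError would be rejected by the compiler, because d is a
reference to const.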

[-- Attachment #3: P2 --]
[-- Type: text/plain, Size: 55131 bytes --]

--- libsanitizer/ubsan/ubsan_diag.cc	2013-05-30 10:36:17.967496945 +0200
+++ libsanitizer/ubsan/ubsan_diag.cc	2013-06-06 08:51:12.000000000 +0200
@@ -0,0 +1,261 @@
+//===-- ubsan_diag.cc -----------------------------------------------------===//
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// Diagnostic reporting for the UBSan runtime.
+//
+//===----------------------------------------------------------------------===//
+
+#include "ubsan_diag.h"
+#include "sanitizer_common/sanitizer_common.h"
+#include "sanitizer_common/sanitizer_libc.h"
+#include "sanitizer_common/sanitizer_report_decorator.h"
+#include "sanitizer_common/sanitizer_stacktrace.h"
+#include "sanitizer_common/sanitizer_symbolizer.h"
+#include <stdio.h>
+
+using namespace __ubsan;
+
+Location __ubsan::getCallerLocation(uptr CallerLoc) {
+  if (!CallerLoc)
+    return Location();
+
+  uptr Loc = StackTrace::GetPreviousInstructionPc(CallerLoc);
+
+  AddressInfo Info;
+  if (!SymbolizeCode(Loc, &Info, 1) || !Info.module || !*Info.module)
+    return Location(Loc);
+
+  if (!Info.file)
+    return ModuleLocation(Info.module, Info.module_offset);
+
+  return SourceLocation(Info.file, Info.line, Info.column);
+}
+
+Diag &Diag::operator<<(const TypeDescriptor &V) {
+  return AddArg(V.getTypeName());
+}
+
+Diag &Diag::operator<<(const Value &V) {
+  if (V.getType().isSignedIntegerTy())
+    AddArg(V.getSIntValue());
+  else if (V.getType().isUnsignedIntegerTy())
+    AddArg(V.getUIntValue());
+  else if (V.getType().isFloatTy())
+    AddArg(V.getFloatValue());
+  else
+    AddArg("<unknown>");
+  return *this;
+}
+
+/// Hexadecimal printing for numbers too large for Printf to handle directly.
+static void PrintHex(UIntMax Val) {
+#if HAVE_INT128_T
+  Printf("0x%08x%08x%08x%08x",
+          (unsigned int)(Val >> 96),
+          (unsigned int)(Val >> 64),
+          (unsigned int)(Val >> 32),
+          (unsigned int)(Val));
+#else
+  UNREACHABLE("long long smaller than 64 bits?");
+#endif
+}
+
+static void renderLocation(Location Loc) {
+  switch (Loc.getKind()) {
+  case Location::LK_Source: {
+    SourceLocation SLoc = Loc.getSourceLocation();
+    if (SLoc.isInvalid())
+      Printf("<unknown>:");
+    else {
+      Printf("%s:%d:", SLoc.getFilename(), SLoc.getLine());
+      if (SLoc.getColumn())
+        Printf("%d:", SLoc.getColumn());
+    }
+    break;
+  }
+  case Location::LK_Module:
+    Printf("%s:0x%zx:", Loc.getModuleLocation().getModuleName(),
+           Loc.getModuleLocation().getOffset());
+    break;
+  case Location::LK_Memory:
+    Printf("%p:", Loc.getMemoryLocation());
+    break;
+  case Location::LK_Null:
+    Printf("<unknown>:");
+    break;
+  }
+}
+
+static void renderText(const char *Message, const Diag::Arg *Args) {
+  for (const char *Msg = Message; *Msg; ++Msg) {
+    if (*Msg != '%') {
+      char Buffer[64];
+      unsigned I;
+      for (I = 0; Msg[I] && Msg[I] != '%' && I != 63; ++I)
+        Buffer[I] = Msg[I];
+      Buffer[I] = '\0';
+      Printf(Buffer);
+      Msg += I - 1;
+    } else {
+      const Diag::Arg &A = Args[*++Msg - '0'];
+      switch (A.Kind) {
+      case Diag::AK_String:
+        Printf("%s", A.String);
+        break;
+      case Diag::AK_Mangled: {
+        Printf("'%s'", Demangle(A.String));
+        break;
+      }
+      case Diag::AK_SInt:
+        // 'long long' is guaranteed to be at least 64 bits wide.
+        if (A.SInt >= INT64_MIN && A.SInt <= INT64_MAX)
+          Printf("%lld", (long long)A.SInt);
+        else
+          PrintHex(A.SInt);
+        break;
+      case Diag::AK_UInt:
+        if (A.UInt <= UINT64_MAX)
+          Printf("%llu", (unsigned long long)A.UInt);
+        else
+          PrintHex(A.UInt);
+        break;
+      case Diag::AK_Float: {
+        // FIXME: Support floating-point formatting in sanitizer_common's
+        //        printf, and stop using snprintf here.
+        char Buffer[32];
+        snprintf(Buffer, sizeof(Buffer), "%Lg", (long double)A.Float);
+        Printf("%s", Buffer);
+        break;
+      }
+      case Diag::AK_Pointer:
+        Printf("%p", A.Pointer);
+        break;
+      }
+    }
+  }
+}
+
+/// Find the earliest-starting range in Ranges which ends after Loc.
+static Range *upperBound(MemoryLocation Loc, Range *Ranges,
+                         unsigned NumRanges) {
+  Range *Best = 0;
+  for (unsigned I = 0; I != NumRanges; ++I)
+    if (Ranges[I].getEnd().getMemoryLocation() > Loc &&
+        (!Best ||
+         Best->getStart().getMemoryLocation() >
+         Ranges[I].getStart().getMemoryLocation()))
+      Best = &Ranges[I];
+  return Best;
+}
+
+/// Render a snippet of the address space near a location.
+static void renderMemorySnippet(const __sanitizer::AnsiColorDecorator &Decor,
+                                MemoryLocation Loc,
+                                Range *Ranges, unsigned NumRanges,
+                                const Diag::Arg *Args) {
+  const unsigned BytesToShow = 32;
+  const unsigned MinBytesNearLoc = 4;
+
+  // Show at least the 8 bytes surrounding Loc.
+  MemoryLocation Min = Loc - MinBytesNearLoc, Max = Loc + MinBytesNearLoc;
+  for (unsigned I = 0; I < NumRanges; ++I) {
+    Min = __sanitizer::Min(Ranges[I].getStart().getMemoryLocation(), Min);
+    Max = __sanitizer::Max(Ranges[I].getEnd().getMemoryLocation(), Max);
+  }
+
+  // If we have too many interesting bytes, prefer to show bytes after Loc.
+  if (Max - Min > BytesToShow)
+    Min = __sanitizer::Min(Max - BytesToShow, Loc - MinBytesNearLoc);
+  Max = Min + BytesToShow;
+
+  // Emit data.
+  for (uptr P = Min; P != Max; ++P) {
+    // FIXME: Check that the address is readable before printing it.
+    unsigned char C = *reinterpret_cast<const unsigned char*>(P);
+    Printf("%s%02x", (P % 8 == 0) ? "  " : " ", C);
+  }
+  Printf("\n");
+
+  // Emit highlights.
+  Printf(Decor.Green());
+  Range *InRange = upperBound(Min, Ranges, NumRanges);
+  for (uptr P = Min; P != Max; ++P) {
+    char Pad = ' ', Byte = ' ';
+    if (InRange && InRange->getEnd().getMemoryLocation() == P)
+      InRange = upperBound(P, Ranges, NumRanges);
+    if (!InRange && P > Loc)
+      break;
+    if (InRange && InRange->getStart().getMemoryLocation() < P)
+      Pad = '~';
+    if (InRange && InRange->getStart().getMemoryLocation() <= P)
+      Byte = '~';
+    char Buffer[] = { Pad, Pad, P == Loc ? '^' : Byte, Byte, 0 };
+    Printf((P % 8 == 0) ? Buffer : &Buffer[1]);
+  }
+  Printf("%s\n", Decor.Default());
+
+  // Go over the line again, and print names for the ranges.
+  InRange = 0;
+  unsigned Spaces = 0;
+  for (uptr P = Min; P != Max; ++P) {
+    if (!InRange || InRange->getEnd().getMemoryLocation() == P)
+      InRange = upperBound(P, Ranges, NumRanges);
+    if (!InRange)
+      break;
+
+    Spaces += (P % 8) == 0 ? 2 : 1;
+
+    if (InRange && InRange->getStart().getMemoryLocation() == P) {
+      while (Spaces--)
+        Printf(" ");
+      renderText(InRange->getText(), Args);
+      Printf("\n");
+      // FIXME: We only support naming one range for now!
+      break;
+    }
+
+    Spaces += 2;
+  }
+
+  // FIXME: Print names for anything we can identify within the line:
+  //
+  //  * If we can identify the memory itself as belonging to a particular
+  //    global, stack variable, or dynamic allocation, then do so.
+  //
+  //  * If we have a pointer-size, pointer-aligned range highlighted,
+  //    determine whether the value of that range is a pointer to an
+  //    entity which we can name, and if so, print that name.
+  //
+  // This needs an external symbolizer, or (preferably) ASan instrumentation.
+}
+
+Diag::~Diag() {
+  __sanitizer::AnsiColorDecorator Decor(PrintsToTty());
+  SpinMutexLock l(&CommonSanitizerReportMutex);
+  Printf(Decor.Bold());
+
+  renderLocation(Loc);
+
+  switch (Level) {
+  case DL_Error:
+    Printf("%s runtime error: %s%s",
+           Decor.Red(), Decor.Default(), Decor.Bold());
+    break;
+
+  case DL_Note:
+    Printf("%s note: %s", Decor.Black(), Decor.Default());
+    break;
+  }
+
+  renderText(Message, Args);
+
+  Printf("%s\n", Decor.Default());
+
+  if (Loc.isMemoryLocation())
+    renderMemorySnippet(Decor, Loc.getMemoryLocation(), Ranges,
+                        NumRanges, Args);
+}
--- libsanitizer/ubsan/ubsan_diag.h	2013-05-30 10:36:17.967496945 +0200
+++ libsanitizer/ubsan/ubsan_diag.h	2013-01-10 11:34:39.000000000 +0100
@@ -0,0 +1,200 @@
+//===-- ubsan_diag.h --------------------------------------------*- C++ -*-===//
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// Diagnostics emission for Clang's undefined behavior sanitizer.
+//
+//===----------------------------------------------------------------------===//
+#ifndef UBSAN_DIAG_H
+#define UBSAN_DIAG_H
+
+#include "ubsan_value.h"
+
+namespace __ubsan {
+
+/// \brief A location within a loaded module in the program. These are used when
+/// the location can't be resolved to a SourceLocation.
+class ModuleLocation {
+  const char *ModuleName;
+  uptr Offset;
+
+public:
+  ModuleLocation() : ModuleName(0), Offset(0) {}
+  ModuleLocation(const char *ModuleName, uptr Offset)
+    : ModuleName(ModuleName), Offset(Offset) {}
+  const char *getModuleName() const { return ModuleName; }
+  uptr getOffset() const { return Offset; }
+};
+
+/// A location of some data within the program's address space.
+typedef uptr MemoryLocation;
+
+/// \brief Location at which a diagnostic can be emitted. Either a
+/// SourceLocation, a ModuleLocation, or a MemoryLocation.
+class Location {
+public:
+  enum LocationKind { LK_Null, LK_Source, LK_Module, LK_Memory };
+
+private:
+  LocationKind Kind;
+  // FIXME: In C++11, wrap these in an anonymous union.
+  SourceLocation SourceLoc;
+  ModuleLocation ModuleLoc;
+  MemoryLocation MemoryLoc;
+
+public:
+  Location() : Kind(LK_Null) {}
+  Location(SourceLocation Loc) :
+    Kind(LK_Source), SourceLoc(Loc) {}
+  Location(ModuleLocation Loc) :
+    Kind(LK_Module), ModuleLoc(Loc) {}
+  Location(MemoryLocation Loc) :
+    Kind(LK_Memory), MemoryLoc(Loc) {}
+
+  LocationKind getKind() const { return Kind; }
+
+  bool isSourceLocation() const { return Kind == LK_Source; }
+  bool isModuleLocation() const { return Kind == LK_Module; }
+  bool isMemoryLocation() const { return Kind == LK_Memory; }
+
+  SourceLocation getSourceLocation() const {
+    CHECK(isSourceLocation());
+    return SourceLoc;
+  }
+  ModuleLocation getModuleLocation() const {
+    CHECK(isModuleLocation());
+    return ModuleLoc;
+  }
+  MemoryLocation getMemoryLocation() const {
+    CHECK(isMemoryLocation());
+    return MemoryLoc;
+  }
+};
+
+/// Try to obtain a location for the caller. This might fail, and produce either
+/// an invalid location or a module location for the caller.
+Location getCallerLocation(uptr CallerLoc = GET_CALLER_PC());
+
+/// A diagnostic severity level.
+enum DiagLevel {
+  DL_Error, ///< An error.
+  DL_Note   ///< A note, attached to a prior diagnostic.
+};
+
+/// \brief Annotation for a range of locations in a diagnostic.
+class Range {
+  Location Start, End;
+  const char *Text;
+
+public:
+  Range() : Start(), End(), Text() {}
+  Range(MemoryLocation Start, MemoryLocation End, const char *Text)
+    : Start(Start), End(End), Text(Text) {}
+  Location getStart() const { return Start; }
+  Location getEnd() const { return End; }
+  const char *getText() const { return Text; }
+};
+
+/// \brief A mangled C++ name. Really just a strong typedef for 'const char*'.
+class MangledName {
+  const char *Name;
+public:
+  MangledName(const char *Name) : Name(Name) {}
+  const char *getName() const { return Name; }
+};
+
+/// \brief Representation of an in-flight diagnostic.
+///
+/// Temporary \c Diag instances are created by the handler routines to
+/// accumulate arguments for a diagnostic. The destructor emits the diagnostic
+/// message.
+class Diag {
+  /// The location at which the problem occurred.
+  Location Loc;
+
+  /// The diagnostic level.
+  DiagLevel Level;
+
+  /// The message which will be emitted, with %0, %1, ... placeholders for
+  /// arguments.
+  const char *Message;
+
+public:
+  /// Kinds of arguments, corresponding to members of \c Arg's union.
+  enum ArgKind {
+    AK_String, ///< A string argument, displayed as-is.
+    AK_Mangled,///< A C++ mangled name, demangled before display.
+    AK_UInt,   ///< An unsigned integer argument.
+    AK_SInt,   ///< A signed integer argument.
+    AK_Float,  ///< A floating-point argument.
+    AK_Pointer ///< A pointer argument, displayed in hexadecimal.
+  };
+
+  /// An individual diagnostic message argument.
+  struct Arg {
+    Arg() {}
+    Arg(const char *String) : Kind(AK_String), String(String) {}
+    Arg(MangledName MN) : Kind(AK_Mangled), String(MN.getName()) {}
+    Arg(UIntMax UInt) : Kind(AK_UInt), UInt(UInt) {}
+    Arg(SIntMax SInt) : Kind(AK_SInt), SInt(SInt) {}
+    Arg(FloatMax Float) : Kind(AK_Float), Float(Float) {}
+    Arg(const void *Pointer) : Kind(AK_Pointer), Pointer(Pointer) {}
+
+    ArgKind Kind;
+    union {
+      const char *String;
+      UIntMax UInt;
+      SIntMax SInt;
+      FloatMax Float;
+      const void *Pointer;
+    };
+  };
+
+private:
+  static const unsigned MaxArgs = 5;
+  static const unsigned MaxRanges = 1;
+
+  /// The arguments which have been added to this diagnostic so far.
+  Arg Args[MaxArgs];
+  unsigned NumArgs;
+
+  /// The ranges which have been added to this diagnostic so far.
+  Range Ranges[MaxRanges];
+  unsigned NumRanges;
+
+  Diag &AddArg(Arg A) {
+    CHECK(NumArgs != MaxArgs);
+    Args[NumArgs++] = A;
+    return *this;
+  }
+
+  Diag &AddRange(Range A) {
+    CHECK(NumRanges != MaxRanges);
+    Ranges[NumRanges++] = A;
+    return *this;
+  }
+
+  /// \c Diag objects are not copyable.
+  Diag(const Diag &); // NOT IMPLEMENTED
+  Diag &operator=(const Diag &);
+
+public:
+  Diag(Location Loc, DiagLevel Level, const char *Message)
+    : Loc(Loc), Level(Level), Message(Message), NumArgs(0), NumRanges(0) {}
+  ~Diag();
+
+  Diag &operator<<(const char *Str) { return AddArg(Str); }
+  Diag &operator<<(MangledName MN) { return AddArg(MN); }
+  Diag &operator<<(unsigned long long V) { return AddArg(UIntMax(V)); }
+  Diag &operator<<(const void *V) { return AddArg(V); }
+  Diag &operator<<(const TypeDescriptor &V);
+  Diag &operator<<(const Value &V);
+  Diag &operator<<(const Range &R) { return AddRange(R); }
+};
+
+} // namespace __ubsan
+
+#endif // UBSAN_DIAG_H
--- libsanitizer/ubsan/ubsan_handlers.cc	2013-05-30 10:36:17.967496945 +0200
+++ libsanitizer/ubsan/ubsan_handlers.cc	2013-06-06 08:51:12.000000000 +0200
@@ -0,0 +1,258 @@
+//===-- ubsan_handlers.cc -------------------------------------------------===//
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// Error logging entry points for the UBSan runtime.
+//
+//===----------------------------------------------------------------------===//
+
+#include "ubsan_handlers.h"
+#include "ubsan_diag.h"
+
+#include "sanitizer_common/sanitizer_common.h"
+
+using namespace __sanitizer;
+using namespace __ubsan;
+
+namespace __ubsan {
+  const char *TypeCheckKinds[] = {
+    "load of", "store to", "reference binding to", "member access within",
+    "member call on", "constructor call on", "downcast of", "downcast of"
+  };
+}
+
+static void handleTypeMismatchImpl(TypeMismatchData *Data, ValueHandle Pointer,
+                                   Location FallbackLoc) {
+  Location Loc = Data->Loc.acquire();
+
+  // Use the SourceLocation from Data to track deduplication, even if 'invalid'
+  if (Loc.getSourceLocation().isDisabled())
+    return;
+  if (Data->Loc.isInvalid())
+    Loc = FallbackLoc;
+
+  if (!Pointer)
+    Diag(Loc, DL_Error, "%0 null pointer of type %1")
+      << TypeCheckKinds[Data->TypeCheckKind] << Data->Type;
+  else if (Data->Alignment && (Pointer & (Data->Alignment - 1)))
+    Diag(Loc, DL_Error, "%0 misaligned address %1 for type %3, "
+                        "which requires %2 byte alignment")
+      << TypeCheckKinds[Data->TypeCheckKind] << (void*)Pointer
+      << Data->Alignment << Data->Type;
+  else
+    Diag(Loc, DL_Error, "%0 address %1 with insufficient space "
+                        "for an object of type %2")
+      << TypeCheckKinds[Data->TypeCheckKind] << (void*)Pointer << Data->Type;
+  if (Pointer)
+    Diag(Pointer, DL_Note, "pointer points here");
+}
+void __ubsan::__ubsan_handle_type_mismatch(TypeMismatchData *Data,
+                                           ValueHandle Pointer) {
+  handleTypeMismatchImpl(Data, Pointer, getCallerLocation());
+}
+void __ubsan::__ubsan_handle_type_mismatch_abort(TypeMismatchData *Data,
+                                                 ValueHandle Pointer) {
+  handleTypeMismatchImpl(Data, Pointer, getCallerLocation());
+  Die();
+}
+
+/// \brief Common diagnostic emission for various forms of integer overflow.
+template<typename T> static void HandleIntegerOverflow(OverflowData *Data,
+                                                       ValueHandle LHS,
+                                                       const char *Operator,
+                                                       T RHS) {
+  SourceLocation Loc = Data->Loc.acquire();
+  if (Loc.isDisabled())
+    return;
+
+  Diag(Loc, DL_Error, "%0 integer overflow: "
+                      "%1 %2 %3 cannot be represented in type %4")
+    << (Data->Type.isSignedIntegerTy() ? "signed" : "unsigned")
+    << Value(Data->Type, LHS) << Operator << RHS << Data->Type;
+}
+
+void __ubsan::__ubsan_handle_add_overflow(OverflowData *Data,
+                                          ValueHandle LHS, ValueHandle RHS) {
+  HandleIntegerOverflow(Data, LHS, "+", Value(Data->Type, RHS));
+}
+void __ubsan::__ubsan_handle_add_overflow_abort(OverflowData *Data,
+                                                 ValueHandle LHS,
+                                                 ValueHandle RHS) {
+  __ubsan_handle_add_overflow(Data, LHS, RHS);
+  Die();
+}
+
+void __ubsan::__ubsan_handle_sub_overflow(OverflowData *Data,
+                                          ValueHandle LHS, ValueHandle RHS) {
+  HandleIntegerOverflow(Data, LHS, "-", Value(Data->Type, RHS));
+}
+void __ubsan::__ubsan_handle_sub_overflow_abort(OverflowData *Data,
+                                                 ValueHandle LHS,
+                                                 ValueHandle RHS) {
+  __ubsan_handle_sub_overflow(Data, LHS, RHS);
+  Die();
+}
+
+void __ubsan::__ubsan_handle_mul_overflow(OverflowData *Data,
+                                          ValueHandle LHS, ValueHandle RHS) {
+  HandleIntegerOverflow(Data, LHS, "*", Value(Data->Type, RHS));
+}
+void __ubsan::__ubsan_handle_mul_overflow_abort(OverflowData *Data,
+                                                 ValueHandle LHS,
+                                                 ValueHandle RHS) {
+  __ubsan_handle_mul_overflow(Data, LHS, RHS);
+  Die();
+}
+
+void __ubsan::__ubsan_handle_negate_overflow(OverflowData *Data,
+                                             ValueHandle OldVal) {
+  SourceLocation Loc = Data->Loc.acquire();
+  if (Loc.isDisabled())
+    return;
+
+  if (Data->Type.isSignedIntegerTy())
+    Diag(Loc, DL_Error,
+         "negation of %0 cannot be represented in type %1; "
+         "cast to an unsigned type to negate this value to itself")
+      << Value(Data->Type, OldVal) << Data->Type;
+  else
+    Diag(Loc, DL_Error,
+         "negation of %0 cannot be represented in type %1")
+      << Value(Data->Type, OldVal) << Data->Type;
+}
+void __ubsan::__ubsan_handle_negate_overflow_abort(OverflowData *Data,
+                                                    ValueHandle OldVal) {
+  __ubsan_handle_negate_overflow(Data, OldVal);
+  Die();
+}
+
+void __ubsan::__ubsan_handle_divrem_overflow(OverflowData *Data,
+                                             ValueHandle LHS, ValueHandle RHS) {
+  SourceLocation Loc = Data->Loc.acquire();
+  if (Loc.isDisabled())
+    return;
+
+  Value LHSVal(Data->Type, LHS);
+  Value RHSVal(Data->Type, RHS);
+  if (RHSVal.isMinusOne())
+    Diag(Loc, DL_Error,
+         "division of %0 by -1 cannot be represented in type %1")
+      << LHSVal << Data->Type;
+  else
+    Diag(Loc, DL_Error, "division by zero");
+}
+void __ubsan::__ubsan_handle_divrem_overflow_abort(OverflowData *Data,
+                                                    ValueHandle LHS,
+                                                    ValueHandle RHS) {
+  __ubsan_handle_divrem_overflow(Data, LHS, RHS);
+  Die();
+}
+
+void __ubsan::__ubsan_handle_shift_out_of_bounds(ShiftOutOfBoundsData *Data,
+                                                 ValueHandle LHS,
+                                                 ValueHandle RHS) {
+  SourceLocation Loc = Data->Loc.acquire();
+  if (Loc.isDisabled())
+    return;
+
+  Value LHSVal(Data->LHSType, LHS);
+  Value RHSVal(Data->RHSType, RHS);
+  if (RHSVal.isNegative())
+    Diag(Loc, DL_Error, "shift exponent %0 is negative") << RHSVal;
+  else if (RHSVal.getPositiveIntValue() >= Data->LHSType.getIntegerBitWidth())
+    Diag(Loc, DL_Error,
+         "shift exponent %0 is too large for %1-bit type %2")
+      << RHSVal << Data->LHSType.getIntegerBitWidth() << Data->LHSType;
+  else if (LHSVal.isNegative())
+    Diag(Loc, DL_Error, "left shift of negative value %0") << LHSVal;
+  else
+    Diag(Loc, DL_Error,
+         "left shift of %0 by %1 places cannot be represented in type %2")
+      << LHSVal << RHSVal << Data->LHSType;
+}
+void __ubsan::__ubsan_handle_shift_out_of_bounds_abort(
+                                                     ShiftOutOfBoundsData *Data,
+                                                     ValueHandle LHS,
+                                                     ValueHandle RHS) {
+  __ubsan_handle_shift_out_of_bounds(Data, LHS, RHS);
+  Die();
+}
+
+void __ubsan::__ubsan_handle_out_of_bounds(OutOfBoundsData *Data,
+                                           ValueHandle Index) {
+  SourceLocation Loc = Data->Loc.acquire();
+  if (Loc.isDisabled())
+    return;
+
+  Value IndexVal(Data->IndexType, Index);
+  Diag(Loc, DL_Error, "index %0 out of bounds for type %1")
+    << IndexVal << Data->ArrayType;
+}
+void __ubsan::__ubsan_handle_out_of_bounds_abort(OutOfBoundsData *Data,
+                                                 ValueHandle Index) {
+  __ubsan_handle_out_of_bounds(Data, Index);
+  Die();
+}
+
+void __ubsan::__ubsan_handle_builtin_unreachable(UnreachableData *Data) {
+  Diag(Data->Loc, DL_Error, "execution reached a __builtin_unreachable() call");
+  Die();
+}
+
+void __ubsan::__ubsan_handle_missing_return(UnreachableData *Data) {
+  Diag(Data->Loc, DL_Error,
+       "execution reached the end of a value-returning function "
+       "without returning a value");
+  Die();
+}
+
+void __ubsan::__ubsan_handle_vla_bound_not_positive(VLABoundData *Data,
+                                                    ValueHandle Bound) {
+  SourceLocation Loc = Data->Loc.acquire();
+  if (Loc.isDisabled())
+    return;
+
+  Diag(Loc, DL_Error, "variable length array bound evaluates to "
+                      "non-positive value %0")
+    << Value(Data->Type, Bound);
+}
+void __ubsan::__ubsan_handle_vla_bound_not_positive_abort(VLABoundData *Data,
+                                                           ValueHandle Bound) {
+  __ubsan_handle_vla_bound_not_positive(Data, Bound);
+  Die();
+}
+
+
+void __ubsan::__ubsan_handle_float_cast_overflow(FloatCastOverflowData *Data,
+                                                 ValueHandle From) {
+  // TODO: Add deduplication once a SourceLocation is generated for this check.
+  Diag(getCallerLocation(), DL_Error,
+       "value %0 is outside the range of representable values of type %2")
+    << Value(Data->FromType, From) << Data->FromType << Data->ToType;
+}
+void __ubsan::__ubsan_handle_float_cast_overflow_abort(
+                                                    FloatCastOverflowData *Data,
+                                                    ValueHandle From) {
+  Diag(getCallerLocation(), DL_Error,
+       "value %0 is outside the range of representable values of type %2")
+    << Value(Data->FromType, From) << Data->FromType << Data->ToType;
+  Die();
+}
+
+void __ubsan::__ubsan_handle_load_invalid_value(InvalidValueData *Data,
+                                                ValueHandle Val) {
+  // TODO: Add deduplication once a SourceLocation is generated for this check.
+  Diag(getCallerLocation(), DL_Error,
+       "load of value %0, which is not a valid value for type %1")
+    << Value(Data->Type, Val) << Data->Type;
+}
+void __ubsan::__ubsan_handle_load_invalid_value_abort(InvalidValueData *Data,
+                                                      ValueHandle Val) {
+  Diag(getCallerLocation(), DL_Error,
+       "load of value %0, which is not a valid value for type %1")
+    << Value(Data->Type, Val) << Data->Type;
+  Die();
+}
--- libsanitizer/ubsan/ubsan_handlers.h	2013-05-30 10:36:17.967496945 +0200
+++ libsanitizer/ubsan/ubsan_handlers.h	2013-06-06 08:51:12.000000000 +0200
@@ -0,0 +1,115 @@
+//===-- ubsan_handlers.h ----------------------------------------*- C++ -*-===//
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// Entry points to the runtime library for Clang's undefined behavior sanitizer.
+//
+//===----------------------------------------------------------------------===//
+#ifndef UBSAN_HANDLERS_H
+#define UBSAN_HANDLERS_H
+
+#include "ubsan_value.h"
+
+namespace __ubsan {
+
+struct TypeMismatchData {
+  SourceLocation Loc;
+  const TypeDescriptor &Type;
+  uptr Alignment;
+  unsigned char TypeCheckKind;
+};
+
+#define RECOVERABLE(checkname, ...) \
+  extern "C" SANITIZER_INTERFACE_ATTRIBUTE \
+    void __ubsan_handle_ ## checkname( __VA_ARGS__ ); \
+  extern "C" SANITIZER_INTERFACE_ATTRIBUTE \
+    void __ubsan_handle_ ## checkname ## _abort( __VA_ARGS__ );
+
+/// \brief Handle a runtime type check failure, caused by either a misaligned
+/// pointer, a null pointer, or a pointer to insufficient storage for the
+/// type.
+RECOVERABLE(type_mismatch, TypeMismatchData *Data, ValueHandle Pointer)
+
+struct OverflowData {
+  SourceLocation Loc;
+  const TypeDescriptor &Type;
+};
+
+/// \brief Handle an integer addition overflow.
+RECOVERABLE(add_overflow, OverflowData *Data, ValueHandle LHS, ValueHandle RHS)
+
+/// \brief Handle an integer subtraction overflow.
+RECOVERABLE(sub_overflow, OverflowData *Data, ValueHandle LHS, ValueHandle RHS)
+
+/// \brief Handle an integer multiplication overflow.
+RECOVERABLE(mul_overflow, OverflowData *Data, ValueHandle LHS, ValueHandle RHS)
+
+/// \brief Handle a signed integer overflow for a unary negate operator.
+RECOVERABLE(negate_overflow, OverflowData *Data, ValueHandle OldVal)
+
+/// \brief Handle an INT_MIN/-1 overflow or division by zero.
+RECOVERABLE(divrem_overflow, OverflowData *Data,
+            ValueHandle LHS, ValueHandle RHS)
+
+struct ShiftOutOfBoundsData {
+  SourceLocation Loc;
+  const TypeDescriptor &LHSType;
+  const TypeDescriptor &RHSType;
+};
+
+/// \brief Handle a shift where the RHS is out of bounds or a left shift where
+/// the LHS is negative or overflows.
+RECOVERABLE(shift_out_of_bounds, ShiftOutOfBoundsData *Data,
+            ValueHandle LHS, ValueHandle RHS)
+
+struct OutOfBoundsData {
+  SourceLocation Loc;
+  const TypeDescriptor &ArrayType;
+  const TypeDescriptor &IndexType;
+};
+
+/// \brief Handle an array index out of bounds error.
+RECOVERABLE(out_of_bounds, OutOfBoundsData *Data, ValueHandle Index)
+
+struct UnreachableData {
+  SourceLocation Loc;
+};
+
+/// \brief Handle a __builtin_unreachable which is reached.
+extern "C" SANITIZER_INTERFACE_ATTRIBUTE
+void __ubsan_handle_builtin_unreachable(UnreachableData *Data);
+/// \brief Handle reaching the end of a value-returning function.
+extern "C" SANITIZER_INTERFACE_ATTRIBUTE
+void __ubsan_handle_missing_return(UnreachableData *Data);
+
+struct VLABoundData {
+  SourceLocation Loc;
+  const TypeDescriptor &Type;
+};
+
+/// \brief Handle a VLA with a non-positive bound.
+RECOVERABLE(vla_bound_not_positive, VLABoundData *Data, ValueHandle Bound)
+
+struct FloatCastOverflowData {
+  // FIXME: SourceLocation Loc;
+  const TypeDescriptor &FromType;
+  const TypeDescriptor &ToType;
+};
+
+/// \brief Handle overflow in a conversion to or from a floating-point type.
+RECOVERABLE(float_cast_overflow, FloatCastOverflowData *Data, ValueHandle From)
+
+struct InvalidValueData {
+  // FIXME: SourceLocation Loc;
+  const TypeDescriptor &Type;
+};
+
+/// \brief Handle a load of an invalid value for the type.
+RECOVERABLE(load_invalid_value, InvalidValueData *Data, ValueHandle Val)
+
+}
+
+#endif // UBSAN_HANDLERS_H
--- libsanitizer/ubsan/ubsan_handlers_cxx.cc	2013-05-30 10:36:17.967496945 +0200
+++ libsanitizer/ubsan/ubsan_handlers_cxx.cc	2013-02-14 09:35:03.000000000 +0100
@@ -0,0 +1,72 @@
+//===-- ubsan_handlers_cxx.cc ---------------------------------------------===//
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// Error logging entry points for the UBSan runtime, which are only used for C++
+// compilations. This file is permitted to use language features which require
+// linking against a C++ ABI library.
+//
+//===----------------------------------------------------------------------===//
+
+#include "ubsan_handlers_cxx.h"
+#include "ubsan_diag.h"
+#include "ubsan_type_hash.h"
+
+#include "sanitizer_common/sanitizer_common.h"
+
+using namespace __sanitizer;
+using namespace __ubsan;
+
+namespace __ubsan {
+  extern const char *TypeCheckKinds[];
+}
+
+static void HandleDynamicTypeCacheMiss(
+    DynamicTypeCacheMissData *Data, ValueHandle Pointer, ValueHandle Hash,
+    bool Abort) {
+  if (checkDynamicType((void*)Pointer, Data->TypeInfo, Hash))
+    // Just a cache miss. The type matches after all.
+    return;
+
+  SourceLocation Loc = Data->Loc.acquire();
+  if (Loc.isDisabled())
+    return;
+
+  Diag(Loc, DL_Error,
+       "%0 address %1 which does not point to an object of type %2")
+    << TypeCheckKinds[Data->TypeCheckKind] << (void*)Pointer << Data->Type;
+
+  // If possible, say what type it actually points to.
+  DynamicTypeInfo DTI = getDynamicTypeInfo((void*)Pointer);
+  if (!DTI.isValid())
+    Diag(Pointer, DL_Note, "object has invalid vptr")
+      << MangledName(DTI.getMostDerivedTypeName())
+      << Range(Pointer, Pointer + sizeof(uptr), "invalid vptr");
+  else if (!DTI.getOffset())
+    Diag(Pointer, DL_Note, "object is of type %0")
+      << MangledName(DTI.getMostDerivedTypeName())
+      << Range(Pointer, Pointer + sizeof(uptr), "vptr for %0");
+  else
+    // FIXME: Find the type at the specified offset, and include that
+    //        in the note.
+    Diag(Pointer - DTI.getOffset(), DL_Note,
+         "object is base class subobject at offset %0 within object of type %1")
+      << DTI.getOffset() << MangledName(DTI.getMostDerivedTypeName())
+      << MangledName(DTI.getSubobjectTypeName())
+      << Range(Pointer, Pointer + sizeof(uptr), "vptr for %2 base class of %1");
+
+  if (Abort)
+    Die();
+}
+
+void __ubsan::__ubsan_handle_dynamic_type_cache_miss(
+    DynamicTypeCacheMissData *Data, ValueHandle Pointer, ValueHandle Hash) {
+  HandleDynamicTypeCacheMiss(Data, Pointer, Hash, false);
+}
+void __ubsan::__ubsan_handle_dynamic_type_cache_miss_abort(
+    DynamicTypeCacheMissData *Data, ValueHandle Pointer, ValueHandle Hash) {
+  HandleDynamicTypeCacheMiss(Data, Pointer, Hash, true);
+}
--- libsanitizer/ubsan/ubsan_handlers_cxx.h	2013-05-30 10:36:17.967496945 +0200
+++ libsanitizer/ubsan/ubsan_handlers_cxx.h	2013-01-24 09:12:50.000000000 +0100
@@ -0,0 +1,38 @@
+//===-- ubsan_handlers_cxx.h ------------------------------------*- C++ -*-===//
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// Entry points to the runtime library for Clang's undefined behavior sanitizer,
+// for C++-specific checks. This code is not linked into C binaries.
+//
+//===----------------------------------------------------------------------===//
+#ifndef UBSAN_HANDLERS_CXX_H
+#define UBSAN_HANDLERS_CXX_H
+
+#include "ubsan_value.h"
+
+namespace __ubsan {
+
+struct DynamicTypeCacheMissData {
+  SourceLocation Loc;
+  const TypeDescriptor &Type;
+  void *TypeInfo;
+  unsigned char TypeCheckKind;
+};
+
+/// \brief Handle a runtime type check failure, caused by an incorrect vptr.
+/// When this handler is called, all we know is that the type was not in the
+/// cache; this does not necessarily imply the existence of a bug.
+extern "C" SANITIZER_INTERFACE_ATTRIBUTE
+void __ubsan_handle_dynamic_type_cache_miss(
+  DynamicTypeCacheMissData *Data, ValueHandle Pointer, ValueHandle Hash);
+extern "C" SANITIZER_INTERFACE_ATTRIBUTE
+void __ubsan_handle_dynamic_type_cache_miss_abort(
+  DynamicTypeCacheMissData *Data, ValueHandle Pointer, ValueHandle Hash);
+
+}
+
+#endif // UBSAN_HANDLERS_CXX_H
--- libsanitizer/ubsan/ubsan_type_hash.cc	2013-05-30 10:36:17.967496945 +0200
+++ libsanitizer/ubsan/ubsan_type_hash.cc	2013-06-06 08:51:12.000000000 +0200
@@ -0,0 +1,246 @@
+//===-- ubsan_type_hash.cc ------------------------------------------------===//
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// Implementation of a hash table for fast checking of inheritance
+// relationships. This file is only linked into C++ compilations, and is
+// permitted to use language features which require a C++ ABI library.
+//
+//===----------------------------------------------------------------------===//
+
+#include "ubsan_type_hash.h"
+
+#include "sanitizer_common/sanitizer_common.h"
+
+// The following are intended to be binary compatible with the definitions
+// given in the Itanium ABI. We make no attempt to be ODR-compatible with
+// those definitions, since existing ABI implementations aren't.
+
+namespace std {
+  class type_info {
+  public:
+    virtual ~type_info();
+
+    const char *__type_name;
+  };
+}
+
+namespace __cxxabiv1 {
+
+/// Type info for classes with no bases, and base class for type info for
+/// classes with bases.
+class __class_type_info : public std::type_info {
+  virtual ~__class_type_info();
+};
+
+/// Type info for classes with simple single public inheritance.
+class __si_class_type_info : public __class_type_info {
+public:
+  virtual ~__si_class_type_info();
+
+  const __class_type_info *__base_type;
+};
+
+class __base_class_type_info {
+public:
+  const __class_type_info *__base_type;
+  long __offset_flags;
+
+  enum __offset_flags_masks {
+    __virtual_mask = 0x1,
+    __public_mask = 0x2,
+    __offset_shift = 8
+  };
+};
+
+/// Type info for classes with multiple, virtual, or non-public inheritance.
+class __vmi_class_type_info : public __class_type_info {
+public:
+  virtual ~__vmi_class_type_info();
+
+  unsigned int flags;
+  unsigned int base_count;
+  __base_class_type_info base_info[1];
+};
+
+}
+
+namespace abi = __cxxabiv1;
+
+// We implement a simple two-level cache for type-checking results. For each
+// (vptr,type) pair, a hash is computed. This hash is assumed to be globally
+// unique; if it collides, we will get false negatives, but:
+//  * such a collision would have to occur on the *first* bad access,
+//  * the probability of such a collision is low (and for a 64-bit target, is
+//    negligible), and
+//  * the vptr, and thus the hash, can be affected by ASLR, so multiple runs
+//    give better coverage.
+//
+// The first caching layer is a small hash table with no chaining; buckets are
+// reused as needed. The second caching layer is a large hash table with open
+// chaining. We can freely evict from either layer since this is just a cache.
+//
+// FIXME: Make these hash table accesses thread-safe. The races here are benign
+//        (worst-case, we could miss a bug or see a slowdown) but we should
+//        avoid upsetting race detectors.
+
+/// Find a bucket to store the given hash value in.
+static __ubsan::HashValue *getTypeCacheHashTableBucket(__ubsan::HashValue V) {
+  static const unsigned HashTableSize = 65537;
+  static __ubsan::HashValue __ubsan_vptr_hash_set[HashTableSize] = { 1 };
+
+  unsigned Probe = V & 65535;
+  for (int Tries = 5; Tries; --Tries) {
+    if (!__ubsan_vptr_hash_set[Probe] || __ubsan_vptr_hash_set[Probe] == V)
+      return &__ubsan_vptr_hash_set[Probe];
+    Probe += ((V >> 16) & 65535) + 1;
+    if (Probe >= HashTableSize)
+      Probe -= HashTableSize;
+  }
+  // FIXME: Pick a random entry from the probe sequence to evict rather than
+  //        just taking the first.
+  return &__ubsan_vptr_hash_set[V & 65535];
+}
+
+/// A cache of recently-checked hashes. Mini hash table with "random" evictions.
+__ubsan::HashValue
+__ubsan::__ubsan_vptr_type_cache[__ubsan::VptrTypeCacheSize] = { 1 };
+
+/// \brief Determine whether \p Derived has a \p Base base class subobject at
+/// offset \p Offset.
+static bool isDerivedFromAtOffset(const abi::__class_type_info *Derived,
+                                  const abi::__class_type_info *Base,
+                                  sptr Offset) {
+  if (Derived->__type_name == Base->__type_name)
+    return Offset == 0;
+
+  if (const abi::__si_class_type_info *SI =
+        dynamic_cast<const abi::__si_class_type_info*>(Derived))
+    return isDerivedFromAtOffset(SI->__base_type, Base, Offset);
+
+  const abi::__vmi_class_type_info *VTI =
+    dynamic_cast<const abi::__vmi_class_type_info*>(Derived);
+  if (!VTI)
+    // No base class subobjects.
+    return false;
+
+  // Look for a base class which is derived from \p Base at the right offset.
+  for (unsigned int base = 0; base != VTI->base_count; ++base) {
+    // FIXME: Curtail the recursion if this base can't possibly contain the
+    //        given offset.
+    sptr OffsetHere = VTI->base_info[base].__offset_flags >>
+                      abi::__base_class_type_info::__offset_shift;
+    if (VTI->base_info[base].__offset_flags &
+          abi::__base_class_type_info::__virtual_mask)
+      // For now, just punt on virtual bases and say 'yes'.
+      // FIXME: OffsetHere is the offset in the vtable of the virtual base
+      //        offset. Read the vbase offset out of the vtable and use it.
+      return true;
+    if (isDerivedFromAtOffset(VTI->base_info[base].__base_type,
+                              Base, Offset - OffsetHere))
+      return true;
+  }
+
+  return false;
+}
+
+/// \brief Find the derived-most dynamic base class of \p Derived at offset
+/// \p Offset.
+static const abi::__class_type_info *findBaseAtOffset(
+    const abi::__class_type_info *Derived, sptr Offset) {
+  if (!Offset)
+    return Derived;
+
+  if (const abi::__si_class_type_info *SI =
+        dynamic_cast<const abi::__si_class_type_info*>(Derived))
+    return findBaseAtOffset(SI->__base_type, Offset);
+
+  const abi::__vmi_class_type_info *VTI =
+    dynamic_cast<const abi::__vmi_class_type_info*>(Derived);
+  if (!VTI)
+    // No base class subobjects.
+    return 0;
+
+  for (unsigned int base = 0; base != VTI->base_count; ++base) {
+    sptr OffsetHere = VTI->base_info[base].__offset_flags >>
+                      abi::__base_class_type_info::__offset_shift;
+    if (VTI->base_info[base].__offset_flags &
+          abi::__base_class_type_info::__virtual_mask)
+      // FIXME: Can't handle virtual bases yet.
+      continue;
+    if (const abi::__class_type_info *Base =
+          findBaseAtOffset(VTI->base_info[base].__base_type,
+                           Offset - OffsetHere))
+      return Base;
+  }
+
+  return 0;
+}
+
+namespace {
+
+struct VtablePrefix {
+  /// The offset from the vptr to the start of the most-derived object.
+  /// This should never be greater than zero, and will usually be exactly
+  /// zero.
+  sptr Offset;
+  /// The type_info object describing the most-derived class type.
+  std::type_info *TypeInfo;
+};
+VtablePrefix *getVtablePrefix(void *Object) {
+  VtablePrefix **VptrPtr = reinterpret_cast<VtablePrefix**>(Object);
+  if (!*VptrPtr)
+    return 0;
+  VtablePrefix *Prefix = *VptrPtr - 1;
+  if (Prefix->Offset > 0 || !Prefix->TypeInfo)
+    // This can't possibly be a valid vtable.
+    return 0;
+  return Prefix;
+}
+
+}
+
+bool __ubsan::checkDynamicType(void *Object, void *Type, HashValue Hash) {
+  // A crash anywhere within this function probably means the vptr is corrupted.
+  // FIXME: Perform these checks more cautiously.
+
+  // Check whether this is something we've evicted from the cache.
+  HashValue *Bucket = getTypeCacheHashTableBucket(Hash);
+  if (*Bucket == Hash) {
+    __ubsan_vptr_type_cache[Hash % VptrTypeCacheSize] = Hash;
+    return true;
+  }
+
+  VtablePrefix *Vtable = getVtablePrefix(Object);
+  if (!Vtable)
+    return false;
+
+  // Check that this is actually a type_info object for a class type.
+  abi::__class_type_info *Derived =
+    dynamic_cast<abi::__class_type_info*>(Vtable->TypeInfo);
+  if (!Derived)
+    return false;
+
+  abi::__class_type_info *Base = (abi::__class_type_info*)Type;
+  if (!isDerivedFromAtOffset(Derived, Base, -Vtable->Offset))
+    return false;
+
+  // Success. Cache this result.
+  __ubsan_vptr_type_cache[Hash % VptrTypeCacheSize] = Hash;
+  *Bucket = Hash;
+  return true;
+}
+
+__ubsan::DynamicTypeInfo __ubsan::getDynamicTypeInfo(void *Object) {
+  VtablePrefix *Vtable = getVtablePrefix(Object);
+  if (!Vtable)
+    return DynamicTypeInfo(0, 0, 0);
+  const abi::__class_type_info *ObjectType = findBaseAtOffset(
+    static_cast<const abi::__class_type_info*>(Vtable->TypeInfo),
+    -Vtable->Offset);
+  return DynamicTypeInfo(Vtable->TypeInfo->__type_name, -Vtable->Offset,
+                         ObjectType ? ObjectType->__type_name : "<unknown>");
+}
--- libsanitizer/ubsan/ubsan_type_hash.h	2013-05-30 10:36:17.967496945 +0200
+++ libsanitizer/ubsan/ubsan_type_hash.h	2013-01-24 09:12:50.000000000 +0100
@@ -0,0 +1,61 @@
+//===-- ubsan_type_hash.h ---------------------------------------*- C++ -*-===//
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// Hashing of types for Clang's undefined behavior checker.
+//
+//===----------------------------------------------------------------------===//
+#ifndef UBSAN_TYPE_HASH_H
+#define UBSAN_TYPE_HASH_H
+
+#include "sanitizer_common/sanitizer_common.h"
+
+namespace __ubsan {
+
+typedef uptr HashValue;
+
+/// \brief Information about the dynamic type of an object (extracted from its
+/// vptr).
+class DynamicTypeInfo {
+  const char *MostDerivedTypeName;
+  sptr Offset;
+  const char *SubobjectTypeName;
+
+public:
+  DynamicTypeInfo(const char *MDTN, sptr Offset, const char *STN)
+    : MostDerivedTypeName(MDTN), Offset(Offset), SubobjectTypeName(STN) {}
+
+  /// Determine whether the object had a valid dynamic type.
+  bool isValid() const { return MostDerivedTypeName; }
+  /// Get the name of the most-derived type of the object.
+  const char *getMostDerivedTypeName() const { return MostDerivedTypeName; }
+  /// Get the offset from the most-derived type to this base class.
+  sptr getOffset() const { return Offset; }
+  /// Get the name of the most-derived type at the specified offset.
+  const char *getSubobjectTypeName() const { return SubobjectTypeName; }
+};
+
+/// \brief Get information about the dynamic type of an object.
+DynamicTypeInfo getDynamicTypeInfo(void *Object);
+
+/// \brief Check whether the dynamic type of \p Object has a \p Type subobject
+/// at offset 0.
+/// \return \c true if the type matches, \c false if not.
+bool checkDynamicType(void *Object, void *Type, HashValue Hash);
+
+const unsigned VptrTypeCacheSize = 128;
+
+/// \brief A cache of the results of checkDynamicType. \c checkDynamicType would
+/// return \c true (modulo hash collisions) if
+/// \code
+///   __ubsan_vptr_type_cache[Hash % VptrTypeCacheSize] == Hash
+/// \endcode
+extern "C" SANITIZER_INTERFACE_ATTRIBUTE
+HashValue __ubsan_vptr_type_cache[VptrTypeCacheSize];
+
+} // namespace __ubsan
+
+#endif // UBSAN_TYPE_HASH_H
--- libsanitizer/ubsan/ubsan_value.cc	2013-05-30 10:36:17.967496945 +0200
+++ libsanitizer/ubsan/ubsan_value.cc	2013-06-06 08:51:12.000000000 +0200
@@ -0,0 +1,99 @@
+//===-- ubsan_value.cc ----------------------------------------------------===//
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// Representation of a runtime value, as marshaled from the generated code to
+// the ubsan runtime.
+//
+//===----------------------------------------------------------------------===//
+
+#include "ubsan_value.h"
+#include "sanitizer_common/sanitizer_common.h"
+#include "sanitizer_common/sanitizer_libc.h"
+
+using namespace __ubsan;
+
+SIntMax Value::getSIntValue() const {
+  CHECK(getType().isSignedIntegerTy());
+  if (isInlineInt()) {
+    // Val was zero-extended to ValueHandle. Sign-extend from original width
+    // to SIntMax.
+    const unsigned ExtraBits =
+      sizeof(SIntMax) * 8 - getType().getIntegerBitWidth();
+    return SIntMax(Val) << ExtraBits >> ExtraBits;
+  }
+  if (getType().getIntegerBitWidth() == 64)
+    return *reinterpret_cast<s64*>(Val);
+#if HAVE_INT128_T
+  if (getType().getIntegerBitWidth() == 128)
+    return *reinterpret_cast<s128*>(Val);
+#else
+  if (getType().getIntegerBitWidth() == 128)
+    UNREACHABLE("libclang_rt.ubsan was built without __int128 support");
+#endif
+  UNREACHABLE("unexpected bit width");
+}
+
+UIntMax Value::getUIntValue() const {
+  CHECK(getType().isUnsignedIntegerTy());
+  if (isInlineInt())
+    return Val;
+  if (getType().getIntegerBitWidth() == 64)
+    return *reinterpret_cast<u64*>(Val);
+#if HAVE_INT128_T
+  if (getType().getIntegerBitWidth() == 128)
+    return *reinterpret_cast<u128*>(Val);
+#else
+  if (getType().getIntegerBitWidth() == 128)
+    UNREACHABLE("libclang_rt.ubsan was built without __int128 support");
+#endif
+  UNREACHABLE("unexpected bit width");
+}
+
+UIntMax Value::getPositiveIntValue() const {
+  if (getType().isUnsignedIntegerTy())
+    return getUIntValue();
+  SIntMax Val = getSIntValue();
+  CHECK(Val >= 0);
+  return Val;
+}
+
+/// Get the floating-point value of this object, extended to a long double.
+/// These are always passed by address (our calling convention doesn't allow
+/// them to be passed in floating-point registers, so this has little cost).
+FloatMax Value::getFloatValue() const {
+  CHECK(getType().isFloatTy());
+  if (isInlineFloat()) {
+    switch (getType().getFloatBitWidth()) {
+#if 0
+      // FIXME: OpenCL / NEON 'half' type. LLVM can't lower the conversion
+      //        from '__fp16' to 'long double'.
+      case 16: {
+        __fp16 Value;
+        internal_memcpy(&Value, &Val, 4);
+        return Value;
+      }
+#endif
+      case 32: {
+        float Value;
+        internal_memcpy(&Value, &Val, 4);
+        return Value;
+      }
+      case 64: {
+        double Value;
+        internal_memcpy(&Value, &Val, 8);
+        return Value;
+      }
+    }
+  } else {
+    switch (getType().getFloatBitWidth()) {
+    case 64: return *reinterpret_cast<double*>(Val);
+    case 80: return *reinterpret_cast<long double*>(Val);
+    case 128: return *reinterpret_cast<long double*>(Val);
+    }
+  }
+  UNREACHABLE("unexpected floating point bit width");
+}
--- libsanitizer/ubsan/ubsan_value.h	2013-05-30 10:36:17.967496945 +0200
+++ libsanitizer/ubsan/ubsan_value.h	2013-06-06 09:18:31.784050370 +0200
@@ -0,0 +1,202 @@
+//===-- ubsan_value.h -------------------------------------------*- C++ -*-===//
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// Representation of data which is passed from the compiler-generated calls into
+// the ubsan runtime.
+//
+//===----------------------------------------------------------------------===//
+#ifndef UBSAN_VALUE_H
+#define UBSAN_VALUE_H
+
+// For now, only support linux and darwin. Other platforms should be easy to
+// add, and probably work as-is.
+#if !defined(__linux__) && !defined(__APPLE__)
+#error "UBSan not supported for this platform!"
+#endif
+
+#include "sanitizer_common/sanitizer_atomic.h"
+#include "sanitizer_common/sanitizer_common.h"
+
+// FIXME: Move this out to a config header.
+#if __SIZEOF_INT128__
+typedef __int128 s128;
+typedef unsigned __int128 u128;
+#define HAVE_INT128_T 1
+#else
+#define HAVE_INT128_T 0
+#endif
+
+
+namespace __ubsan {
+
+/// \brief Largest integer types we support.
+#if HAVE_INT128_T
+typedef s128 SIntMax;
+typedef u128 UIntMax;
+#else
+typedef s64 SIntMax;
+typedef u64 UIntMax;
+#endif
+
+/// \brief Largest floating-point type we support.
+typedef long double FloatMax;
+
+/// \brief A description of a source location. This corresponds to Clang's
+/// \c PresumedLoc type.
+class SourceLocation {
+  const char *Filename;
+  u32 Line;
+  u32 Column;
+
+public:
+  SourceLocation() : Filename(), Line(), Column() {}
+  SourceLocation(const char *Filename, unsigned Line, unsigned Column)
+    : Filename(Filename), Line(Line), Column(Column) {}
+
+  /// \brief Determine whether the source location is known.
+  bool isInvalid() const { return !Filename; }
+
+  /// \brief Atomically acquire a copy, disabling original in-place.
+  /// Exactly one call to acquire() returns a copy that isn't disabled.
+  SourceLocation acquire() {
+    u32 OldColumn = __sanitizer::atomic_exchange(
+                        (__sanitizer::atomic_uint32_t *)&Column, ~u32(0),
+                        __sanitizer::memory_order_relaxed);
+    return SourceLocation(Filename, Line, OldColumn);
+  }
+
+  /// \brief Determine if this Location has been disabled.
+  /// Disabled SourceLocations are invalid to use.
+  bool isDisabled() {
+    return Column == ~u32(0);
+  }
+
+  /// \brief Get the presumed filename for the source location.
+  const char *getFilename() const { return Filename; }
+  /// \brief Get the presumed line number.
+  unsigned getLine() const { return Line; }
+  /// \brief Get the column within the presumed line.
+  unsigned getColumn() const { return Column; }
+};
+
+
+/// \brief A description of a type.
+class TypeDescriptor {
+  /// A value from the \c Kind enumeration, specifying what flavor of type we
+  /// have.
+  u16 TypeKind;
+
+  /// A \c Type-specific value providing information which allows us to
+  /// interpret the meaning of a ValueHandle of this type.
+  u16 TypeInfo;
+
+  /// The name of the type follows, in a format suitable for including in
+  /// diagnostics.
+  char TypeName[1];
+
+public:
+  enum Kind {
+    /// An integer type. Lowest bit is 1 for a signed value, 0 for an unsigned
+    /// value. Remaining bits are log_2(bit width). The value representation is
+    /// the integer itself if it fits into a ValueHandle, and a pointer to the
+    /// integer otherwise.
+    TK_Integer = 0x0000,
+    /// A floating-point type. Low 16 bits are bit width. The value
+    /// representation is that of bitcasting the floating-point value to an
+    /// integer type.
+    TK_Float = 0x0001,
+    /// Any other type. The value representation is unspecified.
+    TK_Unknown = 0xffff
+  };
+
+  const char *getTypeName() const { return TypeName; }
+
+  Kind getKind() const {
+    return static_cast<Kind>(TypeKind);
+  }
+
+  bool isIntegerTy() const { return getKind() == TK_Integer; }
+  bool isSignedIntegerTy() const {
+    return isIntegerTy() && (TypeInfo & 1);
+  }
+  bool isUnsignedIntegerTy() const {
+    return isIntegerTy() && !(TypeInfo & 1);
+  }
+  unsigned getIntegerBitWidth() const {
+    CHECK(isIntegerTy());
+    return 1 << (TypeInfo >> 1);
+  }
+
+  bool isFloatTy() const { return getKind() == TK_Float; }
+  unsigned getFloatBitWidth() const {
+    CHECK(isFloatTy());
+    return TypeInfo;
+  }
+};
+
+/// \brief An opaque handle to a value.
+typedef uptr ValueHandle;
+
+
+/// \brief Representation of an operand value provided by the instrumented code.
+///
+/// This is a combination of a TypeDescriptor (which is emitted as constant data
+/// as an operand to a handler function) and a ValueHandle (which is passed at
+/// runtime when a check failure occurs).
+class Value {
+  /// The type of the value.
+  const TypeDescriptor &Type;
+  /// The encoded value itself.
+  ValueHandle Val;
+
+  /// Is \c Val a (zero-extended) integer?
+  bool isInlineInt() const {
+    CHECK(getType().isIntegerTy());
+    const unsigned InlineBits = sizeof(ValueHandle) * 8;
+    const unsigned Bits = getType().getIntegerBitWidth();
+    return Bits <= InlineBits;
+  }
+
+  /// Is \c Val a (zero-extended) integer representation of a float?
+  bool isInlineFloat() const {
+    CHECK(getType().isFloatTy());
+    const unsigned InlineBits = sizeof(ValueHandle) * 8;
+    const unsigned Bits = getType().getFloatBitWidth();
+    return Bits <= InlineBits;
+  }
+
+public:
+  Value(const TypeDescriptor &Type, ValueHandle Val) : Type(Type), Val(Val) {}
+
+  const TypeDescriptor &getType() const { return Type; }
+
+  /// \brief Get this value as a signed integer.
+  SIntMax getSIntValue() const;
+
+  /// \brief Get this value as an unsigned integer.
+  UIntMax getUIntValue() const;
+
+  /// \brief Decode this value, which must be a positive or unsigned integer.
+  UIntMax getPositiveIntValue() const;
+
+  /// Is this an integer with value -1?
+  bool isMinusOne() const {
+    return getType().isSignedIntegerTy() && getSIntValue() == -1;
+  }
+
+  /// Is this a negative integer?
+  bool isNegative() const {
+    return getType().isSignedIntegerTy() && getSIntValue() < 0;
+  }
+
+  /// \brief Get this value as a floating-point quantity.
+  FloatMax getFloatValue() const;
+};
+
+} // namespace __ubsan
+
+#endif // UBSAN_VALUE_H
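
As a reading aid for Value::getSIntValue() and
TypeDescriptor::getIntegerBitWidth() in the patch above, here is a minimal
standalone sketch of the same decoding.  It assumes a 64-bit handle, and the
helper names (integerBitWidth, isSignedInteger, signExtendInline) are made up
for illustration; they are not part of ubsan.  Unlike the patch, the sketch
does the left shift on an unsigned value, so the shift itself cannot overflow.

```cpp
#include <cassert>
#include <cstdint>

// TypeInfo layout for integers, as described in ubsan_value.h:
// lowest bit = signedness, remaining bits = log2(bit width).
inline unsigned integerBitWidth(std::uint16_t typeInfo) {
  return 1u << (typeInfo >> 1);
}

inline bool isSignedInteger(std::uint16_t typeInfo) {
  return (typeInfo & 1) != 0;
}

// Recover a signed value that was zero-extended into a 64-bit handle.
// The conversion to int64_t and the right shift of a negative value are
// implementation-defined before C++20; GCC and Clang wrap and use an
// arithmetic shift, which is what this sketch relies on.
inline std::int64_t signExtendInline(std::uint64_t handle, unsigned bitWidth) {
  assert(bitWidth >= 1 && bitWidth <= 64);
  const unsigned extraBits = 64 - bitWidth;
  return static_cast<std::int64_t>(handle << extraBits) >> extraBits;
}
```

For example, a signed 32-bit type would carry TypeInfo (5 << 1) | 1 == 11, and
an 8-bit handle value of 0xFF decodes to -1.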

[-- Attachment #4: P3 --]
[-- Type: text/plain, Size: 436 bytes --]

--- libsanitizer/ubsan/ubsan_value.h.jj	2013-06-06 08:51:12.000000000 +0200
+++ libsanitizer/ubsan/ubsan_value.h	2013-06-06 09:18:31.784050370 +0200
@@ -25,8 +25,8 @@
 
 // FIXME: Move this out to a config header.
 #if __SIZEOF_INT128__
-typedef __int128 s128;
-typedef unsigned __int128 u128;
+__extension__ typedef __int128 s128;
+__extension__ typedef unsigned __int128 u128;
 #define HAVE_INT128_T 1
 #else
 #define HAVE_INT128_T 0


* Re: [RFC] Implement Undefined Behavior Sanitizer
  2013-06-06  8:21         ` Jakub Jelinek
@ 2013-06-06  8:26           ` Andrew Pinski
  2013-06-06  8:40             ` Jakub Jelinek
  2013-06-06  8:42             ` Konstantin Serebryany
  2013-06-06  8:47           ` Konstantin Serebryany
  1 sibling, 2 replies; 46+ messages in thread
From: Andrew Pinski @ 2013-06-06  8:26 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Konstantin Serebryany, Marek Polacek, GCC Patches

On Thu, Jun 6, 2013 at 1:21 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Jun 06, 2013 at 11:46:19AM +0400, Konstantin Serebryany wrote:
>> If we are going to import the ubsan run-time from LLVM's
>> projects/compiler-rt/lib/ubsan,
>> we may also need to update the contents of
>> libsanitizer/sanitizer_common and keep them in sync afterwards.
>> (ubsan shares few bits of code with asan/tsan/msan)
>> The simplest way to do that is to extend libsanitizer/merge.sh
>
> Sure.  I've done so far just a partial merge by hand (only 3 changed files
> for the minimum of changes required to get ubsan to build), and have tested just
> that it compiles, not that libubsan actually works.
>
> P1 patch is the toplevel stuff to add ubsan into GCC libsanitizer, plus
> ubsan/Makefile* and ubsan/libtool-version (i.e. gcc owned files).
> P2 is the actual merge of the ubsan files.
> P3 is something I'd propose for ubsan upstream, without it g++ warns about
> __int128 in -pedantic mode.

Is there a reason why the ubsan runtime is in C++?  It seems like a bad
idea to require linking against libstdc++ when developing a C-only
program.

Also it seems easy enough to write a GCC specific runtime that does
not depend on the rest of libsanitizer stuff anyways.

Thanks,
Andrew Pinski


* Re: [RFC] Implement Undefined Behavior Sanitizer
  2013-06-06  8:26           ` Andrew Pinski
@ 2013-06-06  8:40             ` Jakub Jelinek
  2013-06-06  8:42             ` Konstantin Serebryany
  1 sibling, 0 replies; 46+ messages in thread
From: Jakub Jelinek @ 2013-06-06  8:40 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Konstantin Serebryany, Marek Polacek, GCC Patches

On Thu, Jun 06, 2013 at 01:26:06AM -0700, Andrew Pinski wrote:
> On Thu, Jun 6, 2013 at 1:21 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Thu, Jun 06, 2013 at 11:46:19AM +0400, Konstantin Serebryany wrote:
> >> If we are going to import the ubsan run-time from LLVM's
> >> projects/compiler-rt/lib/ubsan,
> >> we may also need to update the contents of
> >> libsanitizer/sanitizer_common and keep them in sync afterwards.
> >> (ubsan shares few bits of code with asan/tsan/msan)
> >> The simplest way to do that is to extend libsanitizer/merge.sh
> >
> > Sure.  I've done so far just a partial merge by hand (only 3 changed files
> > for the minimum of changes required to get ubsan to build), and have tested just
> > that it compiles, not that libubsan actually works.
> >
> > P1 patch is the toplevel stuff to add ubsan into GCC libsanitizer, plus
> > ubsan/Makefile* and ubsan/libtool-version (i.e. gcc owned files).
> > P2 is the actual merge of the ubsan files.
> > P3 is something I'd propose for ubsan upstream, without it g++ warns about
> > __int128 in -pedantic mode.
> 
> Is there a reason why the ubsan runtime is in C++?  It seems like a bad
> idea to require linking against libstdc++ when developing a C-only
> program.

-fsanitize=undefined etc. are debugging modes, not something meant for
release versions of programs, so I think it is not a big deal.
C++ is the implementation language for the libraries.  Why exactly we
link libasan and libtsan against -lstdc++ I don't really remember; maybe we
don't have to, since they are compiled with -fno-exceptions and don't use any
libstdc++ symbols.  libubsan apparently has two files which actually use some
libstdc++ symbols and are for some C++ sanitization.
BTW, all the libs link against -ldl too (not that big a deal) and -lpthread
(IMHO a more serious problem than -lstdc++).

> Also it seems easy enough to write a GCC specific runtime that does
> not depend on the rest of libsanitizer stuff anyways.

We already have libasan and libtsan in gcc; ubsan is just a thin layer on
top of the sanitizer_common infrastructure.  We'd have to write from scratch
not just the handlers, but some infrastructure too, for what gain?

	Jakub


* Re: [RFC] Implement Undefined Behavior Sanitizer
  2013-06-06  8:26           ` Andrew Pinski
  2013-06-06  8:40             ` Jakub Jelinek
@ 2013-06-06  8:42             ` Konstantin Serebryany
  2013-06-06  8:45               ` Jakub Jelinek
  1 sibling, 1 reply; 46+ messages in thread
From: Konstantin Serebryany @ 2013-06-06  8:42 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Jakub Jelinek, Marek Polacek, GCC Patches

On Thu, Jun 6, 2013 at 12:26 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Thu, Jun 6, 2013 at 1:21 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Thu, Jun 06, 2013 at 11:46:19AM +0400, Konstantin Serebryany wrote:
>>> If we are going to import the ubsan run-time from LLVM's
>>> projects/compiler-rt/lib/ubsan,
>>> we may also need to update the contents of
>>> libsanitizer/sanitizer_common and keep them in sync afterwards.
>>> (ubsan shares few bits of code with asan/tsan/msan)
>>> The simplest way to do that is to extend libsanitizer/merge.sh
>>
>> Sure.  I've done so far just a partial merge by hand (only 3 changed files
>> for the minimum of changes required to get ubsan to build), and have tested just
>> that it compiles, not that libubsan actually works.
>>
>> P1 patch is the toplevel stuff to add ubsan into GCC libsanitizer, plus
>> ubsan/Makefile* and ubsan/libtool-version (i.e. gcc owned files).
>> P2 is the actual merge of the ubsan files.
>> P3 is something I'd propose for ubsan upstream, without it g++ warns about
>> __int128 in -pedantic mode.
>
> Is there a reason why the ubsan runtime is in C++?  It seems like a bad
> idea to require linking against libstdc++ when developing a C-only
> program.

For asan/tsan/msan/lsan the reason is that C++ is a better language
(in the author's humble opinion :).
For ubsan, I think the reason is the same, plus ubsan shares some C++
code with asan/tsan/msan/lsan.

As for libstdc++, I completely agree, we don't want to depend on it,
and we don't.
None of the sanitizer run-times uses C++ features that require libstdc++.

--kcc

>
> Also it seems easy enough to write a GCC specific runtime that does
> not depend on the rest of libsanitizer stuff anyways.
>
> Thanks,
> Andrew Pinski


* Re: [RFC] Implement Undefined Behavior Sanitizer
  2013-06-06  8:42             ` Konstantin Serebryany
@ 2013-06-06  8:45               ` Jakub Jelinek
  2013-06-06  8:55                 ` Konstantin Serebryany
  0 siblings, 1 reply; 46+ messages in thread
From: Jakub Jelinek @ 2013-06-06  8:45 UTC (permalink / raw)
  To: Konstantin Serebryany; +Cc: Andrew Pinski, Marek Polacek, GCC Patches

On Thu, Jun 06, 2013 at 12:41:56PM +0400, Konstantin Serebryany wrote:
> As for libstdc++, I completely agree, we don't want to depend on it,
> and we don't.

ubsan actually needs
                 U _ZTIN10__cxxabiv117__class_type_infoE@@CXXABI_1.3
                 U _ZTIN10__cxxabiv120__si_class_type_infoE@@CXXABI_1.3
                 U _ZTIN10__cxxabiv121__vmi_class_type_infoE@@CXXABI_1.3
                 U _ZTISt9type_info@@GLIBCXX_3.4
                 U __dynamic_cast@@CXXABI_1.3
plus all the libs have:
                 w __cxa_demangle@@CXXABI_1.3
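
To illustrate where references like the ones listed above come from, here is
a minimal sketch in the spirit of isDerivedFromAtOffset() from
ubsan_type_hash.cc.  It assumes libstdc++'s <cxxabi.h>, which exposes the
Itanium ABI type_info classes with public members; derivesFrom and the four
demo structs are hypothetical names, and offsets and virtual bases are
ignored.  Each dynamic_cast to an abi:: class needs that class's typeinfo
object plus __dynamic_cast at link time.

```cpp
#include <cassert>
#include <cxxabi.h>
#include <typeinfo>

// Walk the base-class graph recorded in the ABI type_info objects.
inline bool derivesFrom(const std::type_info &derived,
                        const std::type_info &base) {
  if (derived == base)
    return true;

  // Single, public, non-virtual inheritance.
  if (const abi::__si_class_type_info *si =
          dynamic_cast<const abi::__si_class_type_info *>(&derived))
    return derivesFrom(*si->__base_type, base);

  // Multiple, virtual, or non-public inheritance.
  if (const abi::__vmi_class_type_info *vmi =
          dynamic_cast<const abi::__vmi_class_type_info *>(&derived)) {
    for (unsigned i = 0; i != vmi->__base_count; ++i)
      if (derivesFrom(*vmi->__base_info[i].__base_type, base))
        return true;
  }

  // A class with no bases, or not a class type at all.
  return false;
}

// Demo hierarchy (hypothetical).
struct Base { virtual ~Base() {} };
struct Other { virtual ~Other() {} };
struct Single : Base {};
struct Multi : Base, Other {};
```

Running nm -u on an object file built from this sketch with g++ should show
undefined typeinfo symbols for the abi:: classes and __dynamic_cast, much
like the list above.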

	Jakub


* Re: [RFC] Implement Undefined Behavior Sanitizer
  2013-06-06  8:21         ` Jakub Jelinek
  2013-06-06  8:26           ` Andrew Pinski
@ 2013-06-06  8:47           ` Konstantin Serebryany
  1 sibling, 0 replies; 46+ messages in thread
From: Konstantin Serebryany @ 2013-06-06  8:47 UTC (permalink / raw)
  To: Jakub Jelinek, richard; +Cc: Andrew Pinski, Marek Polacek, GCC Patches

+richard@metafoo.co.uk

On Thu, Jun 6, 2013 at 12:21 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Jun 06, 2013 at 11:46:19AM +0400, Konstantin Serebryany wrote:
>> If we are going to import the ubsan run-time from LLVM's
>> projects/compiler-rt/lib/ubsan,
>> we may also need to update the contents of
>> libsanitizer/sanitizer_common and keep them in sync afterwards.
>> (ubsan shares few bits of code with asan/tsan/msan)
>> The simplest way to do that is to extend libsanitizer/merge.sh
>
> Sure.  I've done so far just a partial merge by hand (only 3 changed files
> for the minimum of changes required to get ubsan to build), and have tested just
> that it compiles, not that libubsan actually works.
>
> P1 patch is the toplevel stuff to add ubsan into GCC libsanitizer, plus
> ubsan/Makefile* and ubsan/libtool-version (i.e. gcc owned files).

The trivial patch to merge.sh is ok.
The partial merge is ok if it doesn't break asan/tsan build.

> P2 is the actual merge of the ubsan files.

Ok too.

> P3 is something I'd propose for ubsan upstream, without it g++ warns about
> __int128 in -pedantic mode.

Looks good.

richard@metafoo.co.uk, do you agree to apply this upstream?

ubsan/ubsan_value.h:
-typedef __int128 s128;
-typedef unsigned __int128 u128;
+__extension__ typedef __int128 s128;
+__extension__ typedef unsigned __int128 u128;
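
For reference, a small self-contained sketch of the pattern.  It assumes GCC
or Clang, where __extension__ in front of a declaration is expected to
suppress the -pedantic diagnostic about the non-ISO __int128 type.  The helper
mulHigh64 is hypothetical and only there to give the typedef a use; the
fallback branch covers targets without __int128.

```cpp
#include <cassert>
#include <cstdint>

#ifdef __SIZEOF_INT128__
__extension__ typedef __int128 s128;
__extension__ typedef unsigned __int128 u128;
#define HAVE_INT128_T 1
#else
#define HAVE_INT128_T 0
#endif

// Upper 64 bits of the full 128-bit product a * b.
inline std::uint64_t mulHigh64(std::uint64_t a, std::uint64_t b) {
#if HAVE_INT128_T
  return static_cast<std::uint64_t>((static_cast<u128>(a) * b) >> 64);
#else
  // Schoolbook multiplication on 32-bit halves.
  const std::uint64_t aLo = a & 0xffffffffu, aHi = a >> 32;
  const std::uint64_t bLo = b & 0xffffffffu, bHi = b >> 32;
  const std::uint64_t ll = aLo * bLo;
  const std::uint64_t lh = aLo * bHi;
  const std::uint64_t hl = aHi * bLo;
  const std::uint64_t hh = aHi * bHi;
  const std::uint64_t mid =
      (ll >> 32) + (lh & 0xffffffffu) + (hl & 0xffffffffu);
  return hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
#endif
}
```

Compiling this with g++ -pedantic should stay quiet about the typedefs, and
mulHigh64(1ull << 63, 2) evaluates to 1.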

--kcc


>
>         Jakub


* Re: [RFC] Implement Undefined Behavior Sanitizer
  2013-06-06  8:45               ` Jakub Jelinek
@ 2013-06-06  8:55                 ` Konstantin Serebryany
  2013-06-06  8:59                   ` Jakub Jelinek
       [not found]                   ` <CAOfiQqkyXWpS-hM=FONEsZqSaWhd3UmUHb=sD71Cb96Df7er4w@mail.gmail.com>
  0 siblings, 2 replies; 46+ messages in thread
From: Konstantin Serebryany @ 2013-06-06  8:55 UTC (permalink / raw)
  To: Jakub Jelinek, richard; +Cc: Andrew Pinski, Marek Polacek, GCC Patches

On Thu, Jun 6, 2013 at 12:44 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Jun 06, 2013 at 12:41:56PM +0400, Konstantin Serebryany wrote:
>> As for libstdc++, I completely agree, we don't want to depend on it,
>> and we don't.
>
> ubsan actually needs
>                  U _ZTIN10__cxxabiv117__class_type_infoE@@CXXABI_1.3
>                  U _ZTIN10__cxxabiv120__si_class_type_infoE@@CXXABI_1.3
>                  U _ZTIN10__cxxabiv121__vmi_class_type_infoE@@CXXABI_1.3
>                  U _ZTISt9type_info@@GLIBCXX_3.4
>                  U __dynamic_cast@@CXXABI_1.3

These things are needed only for the C++-specific undefined behavior checking.
At least, if I compile a C test using clang -fsanitize=undefined I
don't see any of  these.

Richard, am I right?


> plus all the libs have:
>                  w __cxa_demangle@@CXXABI_1.3

This beast is declared as weak:
sanitizer_common/sanitizer_symbolizer_itanium.cc
  extern "C" char *__cxa_demangle(const char *mangled, char *buffer,
                                  size_t *length, int *status)
    SANITIZER_WEAK_ATTRIBUTE;

If we have the C++ run-time linked-in, we can use __cxa_demangle.
If we don't have the C++ run-time, we most likely don't need
__cxa_demangle either.

You can confirm this by building some C program with "clang
-fsanitize=address" -- it will not depend on libc++.
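
For illustration, the weak-declaration pattern described above can be
sketched in C roughly as follows.  This is only a minimal sketch, not the
libsanitizer code: the helper name demangle_or_copy is hypothetical, and it
assumes a GCC- or Clang-compatible toolchain that supports
__attribute__ ((weak)).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Weak declaration: if no C++ runtime is linked in, the address of the
   symbol resolves to null instead of causing a link error.  */
extern char *__cxa_demangle (const char *mangled, char *buffer,
			     size_t *length, int *status)
  __attribute__ ((weak));

/* Hypothetical helper: copy a readable form of NAME into OUT, which has
   room for OUT_SIZE bytes and is always NUL-terminated.  Returns 1 if the
   name was demangled, 0 if the mangled name was copied unchanged.  */
int
demangle_or_copy (const char *name, char *out, size_t out_size)
{
  char *pretty = NULL;
  int demangled = 0;

  assert (name != NULL && out != NULL && out_size > 0);

  /* Only call the routine when the C++ runtime is actually present.  */
  if (__cxa_demangle)
    {
      int status = -1;
      pretty = __cxa_demangle (name, NULL, NULL, &status);
      demangled = (pretty != NULL && status == 0);
    }

  strncpy (out, demangled ? pretty : name, out_size - 1);
  out[out_size - 1] = '\0';
  free (pretty);	/* free (NULL) is a no-op.  */
  return demangled;
}
```

In a plain C link the null test fails and the mangled name is passed
through unchanged; when the C++ runtime happens to be linked in, the same
object code picks up the demangler.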

--kcc

>
>         Jakub


* Re: [RFC] Implement Undefined Behavior Sanitizer
  2013-06-06  8:55                 ` Konstantin Serebryany
@ 2013-06-06  8:59                   ` Jakub Jelinek
  2013-06-06  9:03                     ` Konstantin Serebryany
       [not found]                   ` <CAOfiQqkyXWpS-hM=FONEsZqSaWhd3UmUHb=sD71Cb96Df7er4w@mail.gmail.com>
  1 sibling, 1 reply; 46+ messages in thread
From: Jakub Jelinek @ 2013-06-06  8:59 UTC (permalink / raw)
  To: Konstantin Serebryany; +Cc: richard, Andrew Pinski, Marek Polacek, GCC Patches

On Thu, Jun 06, 2013 at 12:55:17PM +0400, Konstantin Serebryany wrote:
> > ubsan actually needs
> >                  U _ZTIN10__cxxabiv117__class_type_infoE@@CXXABI_1.3
> >                  U _ZTIN10__cxxabiv120__si_class_type_infoE@@CXXABI_1.3
> >                  U _ZTIN10__cxxabiv121__vmi_class_type_infoE@@CXXABI_1.3
> >                  U _ZTISt9type_info@@GLIBCXX_3.4
> >                  U __dynamic_cast@@CXXABI_1.3
> 
> These things are needed only for the C++-specific undefined behavior checking.
> At least, if I compile a C test using clang -fsanitize=undefined I
> don't see any of  these.

But that is only because of the approach of statically linking everything.
When libubsan is a shared library and any part of the library needs
libstdc++, you need it for everything, unless you do some weakref tricks and
use it only conditionally (but that might be harder when you use C++
dynamic_cast; you'd need to call the runtime routine through a weak symbol
by hand instead).

> > plus all the libs have:
> >                  w __cxa_demangle@@CXXABI_1.3
> 
> This beast is declared as weak:

Sure, I know very well what w means ;), was listing this just for
completeness.

	Jakub


* Re: [RFC] Implement Undefined Behavior Sanitizer
  2013-06-06  8:59                   ` Jakub Jelinek
@ 2013-06-06  9:03                     ` Konstantin Serebryany
  0 siblings, 0 replies; 46+ messages in thread
From: Konstantin Serebryany @ 2013-06-06  9:03 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: richard, Andrew Pinski, Marek Polacek, GCC Patches

On Thu, Jun 6, 2013 at 12:59 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Jun 06, 2013 at 12:55:17PM +0400, Konstantin Serebryany wrote:
>> > ubsan actually needs
>> >                  U _ZTIN10__cxxabiv117__class_type_infoE@@CXXABI_1.3
>> >                  U _ZTIN10__cxxabiv120__si_class_type_infoE@@CXXABI_1.3
>> >                  U _ZTIN10__cxxabiv121__vmi_class_type_infoE@@CXXABI_1.3
>> >                  U _ZTISt9type_info@@GLIBCXX_3.4
>> >                  U __dynamic_cast@@CXXABI_1.3
>>
>> These things are needed only for the C++-specific undefined behavior checking.
>> At least, if I compile a C test using clang -fsanitize=undefined I
>> don't see any of  these.
>
> But that is only because of the statically linking everything approach.

Err. Yes, right.
We don't link any of the sanitizers dynamically.


> When libubsan is a shared library, when any part of the library needs
> libstdc++, you need it for everything, unless you do some weakref tricks and
> use it only conditionally (but that might be harder when you use C++
> dynamic_cast, you'd need to call the runtime routine through weak symbol
> instead by hand).
>
>> > plus all the libs have:
>> >                  w __cxa_demangle@@CXXABI_1.3
>>
>> This beast is declared as weak:
>
> Sure, I know very well what w means ;), was listing this just for
> completeness.
>
>         Jakub


* Re: [RFC] Implement Undefined Behavior Sanitizer
       [not found]                   ` <CAOfiQqkyXWpS-hM=FONEsZqSaWhd3UmUHb=sD71Cb96Df7er4w@mail.gmail.com>
@ 2013-06-06 10:23                     ` Richard Smith
  0 siblings, 0 replies; 46+ messages in thread
From: Richard Smith @ 2013-06-06 10:23 UTC (permalink / raw)
  To: Konstantin Serebryany
  Cc: Jakub Jelinek, Andrew Pinski, Marek Polacek, GCC Patches

[Resending with less text/html]

On Thu, Jun 6, 2013 at 1:55 AM, Konstantin Serebryany
<konstantin.s.serebryany@gmail.com> wrote:
> On Thu, Jun 6, 2013 at 12:44 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Thu, Jun 06, 2013 at 12:41:56PM +0400, Konstantin Serebryany wrote:
> >> As for libstdc++, I completely agree, we don't want to depend on it,
> >> and we don't.
> >
> > ubsan actually needs
> >                  U _ZTIN10__cxxabiv117__class_type_infoE@@CXXABI_1.3
> >                  U _ZTIN10__cxxabiv120__si_class_type_infoE@@CXXABI_1.3
> >                  U _ZTIN10__cxxabiv121__vmi_class_type_infoE@@CXXABI_1.3
> >                  U _ZTISt9type_info@@GLIBCXX_3.4
> >                  U __dynamic_cast@@CXXABI_1.3
>
> These things are needed only for the C++-specific undefined behavior checking.
> At least, if I compile a C test using clang -fsanitize=undefined I
> don't see any of  these.
>
> Richard, am I right?

Yes. We build two different runtimes, one which needs these bits (for
C++) and one which doesn't (for C).

Adding __extension__ to the __int128 typedefs is fine too.


* Re: [RFC] Implement Undefined Behavior Sanitizer
  2013-06-06  6:07     ` Jakub Jelinek
@ 2013-06-06 12:17       ` Jason Merrill
  2013-06-06 13:26       ` Segher Boessenkool
  1 sibling, 0 replies; 46+ messages in thread
From: Jason Merrill @ 2013-06-06 12:17 UTC (permalink / raw)
  To: Jakub Jelinek, Marek Polacek, Joseph S. Myers; +Cc: GCC Patches

On 06/06/2013 02:07 AM, Jakub Jelinek wrote:
> Jason, does
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3675.html#1457
> apply just to C++11/C++14, or to C++03 too?

The committee hasn't said anything about which DRs since C++03 apply to 
it.  I take the position that most do, but not this one, since it is a 
change to wording that doesn't exist in C++03.

> In C++03 I see in [expr.shift]/2
> "The value of E1 << E2 is E1 (interpreted as a bit pattern) left-shifted E2
> bit positions; vacated bits are zero-filled. If E1 has an unsigned type,
> the value of the result is E1 multiplied by the quantity 2 raised to
> the power E2, reduced modulo ULONG_MAX+1 if E1 has type unsigned long,
> UINT_MAX+1 otherwise."  Is that the same case as C90 then, the wording seems
> to be pretty much the same?

Yes, that's the same as C90.

> what the current -std= makes as undefined behavior (though, because of DRs
> that is somewhat fuzzy, pre-DR1457 C++11 vs. post-DR1457 C++11)

In contrast to the C++03 situation, the committee has been clear about 
which DRs apply to C++11 and which to C++1y, and this one does apply to 
C++11.

It's unfortunate that C and C++ have different rules here.  I'm actually 
inclined to agree with comment 48 from 
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n834.htm that we should 
have left the C90/C++98 rules alone, but I guess that comment was rejected.

Jason


* Re: [RFC] Implement Undefined Behavior Sanitizer
  2013-06-06  6:07     ` Jakub Jelinek
  2013-06-06 12:17       ` Jason Merrill
@ 2013-06-06 13:26       ` Segher Boessenkool
  2013-06-06 13:35         ` Jakub Jelinek
  1 sibling, 1 reply; 46+ messages in thread
From: Segher Boessenkool @ 2013-06-06 13:26 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Marek Polacek, Jason Merrill, Joseph S. Myers, GCC Patches

> The C++11/C++14 undefined behavior of left signed shift can be tested
> similarly, if ((unsigned type for op0's type) op0) >> (precm1 - y)
> is greater than one, then it is undefined behavior.
> Jason, does
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3675.html#1457
> apply just to C++11/C++14, or to C++03 too?

Doesn't DR1457 also leave

    neg << 0

as undefined, where "neg" is a negative value?  That isn't caught by
your "greater than one" expression.


Segher


* Re: [RFC] Implement Undefined Behavior Sanitizer
  2013-06-06 13:26       ` Segher Boessenkool
@ 2013-06-06 13:35         ` Jakub Jelinek
  0 siblings, 0 replies; 46+ messages in thread
From: Jakub Jelinek @ 2013-06-06 13:35 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Marek Polacek, Jason Merrill, Joseph S. Myers, GCC Patches

On Thu, Jun 06, 2013 at 03:26:19PM +0200, Segher Boessenkool wrote:
> >The C++11/C++14 undefined behavior of left signed shift can be tested
> >similarly, if ((unsigned type for op0's type) op0) >> (precm1 - y)
> >is greater than one, then it is undefined behavior.
> >Jason, does
> >http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3675.html#1457
> >apply just to C++11/C++14, or to C++03 too?
> 
> Doesn't DR1457 also leave
> 
>    neg << 0
> 
> as undefined, where "neg" is a negative value?  That isn't caught by
> your "greater than one" expression.

Yeah, of course, it needs to be for any shift x << y or x >> y (signed or unsigned):
1) if ((unsigned) y > precm1) ub
plus for signed x << y:
2) for C99/C11 if ((unsigned) x >> (precm1 - y)) ub
3) for C++11/C++14 if (x < 0 || ((unsigned) x >> (precm1 - y)) > 1) ub
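
Spelled out as plain C predicates, the three checks above could look
roughly like this.  It is only a sketch for the type int: the function
names and the PRECM1 macro are made up for illustration, and it assumes
int has no padding bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* precm1: the precision of int minus one (assumes no padding bits).  */
#define PRECM1 ((int) (sizeof (int) * CHAR_BIT) - 1)

/* 1) Any shift x << y or x >> y: the count must be in [0, precm1].
   The unsigned comparison also catches negative counts.  */
bool
shift_count_is_ub (int y)
{
  return (unsigned) y > (unsigned) PRECM1;
}

/* 2) C99/C11, signed x << y: undefined if any bit would be shifted into
   or past the sign bit, which also covers every negative x.  */
bool
lshift_is_ub_c99 (int x, int y)
{
  return shift_count_is_ub (y) || ((unsigned) x >> (PRECM1 - y)) != 0;
}

/* 3) C++11/C++14 after DR1457, signed x << y: negative x is undefined,
   but a 1 may be shifted into (not past) the sign bit.  */
bool
lshift_is_ub_cxx11 (int x, int y)
{
  return shift_count_is_ub (y)
	 || x < 0
	 || ((unsigned) x >> (PRECM1 - y)) > 1;
}

/* Hypothetical usage: abort instead of performing an undefined shift.  */
int
checked_lshift_c99 (int x, int y)
{
  assert (!lshift_is_ub_c99 (x, y));
  return x << y;
}
```

With a 32-bit int, 1 << 31 is flagged by the C99/C11 check but not by the
C++11/C++14 one, while -1 << 0 is flagged by both.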

	Jakub


* Re: [RFC] Implement Undefined Behavior Sanitizer
  2013-06-05 19:51 ` [RFC] Implement Undefined Behavior Sanitizer Joseph S. Myers
@ 2013-06-07 12:38   ` Marek Polacek
  0 siblings, 0 replies; 46+ messages in thread
From: Marek Polacek @ 2013-06-07 12:38 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: GCC Patches

On Wed, Jun 05, 2013 at 07:50:52PM +0000, Joseph S. Myers wrote:
> On Wed, 5 Jun 2013, Marek Polacek wrote:
> 
> > It works by creating a COMPOUND_EXPR around original expression, so e.g.
> > it creates:
> > 
> > if (b < 0 || (b > 31 || a < 0))
> >   {
> >     __builtin___ubsan_handle_shift_out_of_bounds ();
> >   }
> > else
> >   {
> >     0
> >   }, a << b;
> > 
> > from original "a <<= b;".
> 
> For the "a < 0" here, and signed left shift of a positive value shifting a 
> 1 into or past the sign bit, I think it should be possible to control the 
> checks separately from other checks on shifts - both because those cases 
> were implementation-defined in C90, only undefined in C99/C11, and because 
> they are widely used in practice.

Ok, I see.

> > There is of course a lot of stuff that needs to be done, more
> > specifically:
> 
> 5) Testcases (or if applicable, running existing testcases coming with the 
> library).

Yeah -- we definitely want to have some testcases; the trouble is
that, like for tsan, we don't have any infrastructure for that yet.
Probably we could just put new tests into gcc.dg and put
-fsanitize=undefined into dg-options?  Or maybe tweak .exp files and
run some testcases also with -fsanitize=undefined, but the thing is
that we can't use dg-do compile tests, we need dg-do run tests.

> 6) Map -ftrapv onto an appropriate subset of this option that handles the 
> cases -ftrapv was meant to handle (so arithmetic overflow, which I'd say 
> should include INT_MIN / -1).

Ok, we can look at this maybe later when ubsan is more mature.

> >   4) and of course, more instrumentation (C/C++ FE, gimple level)
> >      What comes to mind is:
> >      - float/double to integer conversions,
> 
> Under Annex F, these return an unspecified value rather than being 
> undefined behavior.

Aha, good to know.  I've mentioned it because clang instruments that.

> >      - integer overflows (a long list of various cases here),
> 
> Strictly, including INT_MIN % -1 (both / and % are undefined if the result 
> of either is unrepresentable) - it appears you've already got that.  Of 
> course INT_MIN % -1 and INT_MIN / -1 should *work* reliably with -fwrapv, 
> which is another bug (30484).
> 
> >      - invalid conversions of int to bool,
> 
> What do you mean?  Conversion to bool is just a comparison != 0.

Something like e.g.:

unsigned char c = 42;
int
main (void)
{
  _Bool *b = (_Bool *) &c;
  return *b;
}

(clang catches this.)

> >      - VLAs size (e.g. negative size),
> 
> Or the multiplication used to compute the size in bytes overflows (really, 
> there should be some code generated expanding the stack bit by bit to 
> avoid it accidentally overflowing into another allocated area of memory, I 
> suppose).

Yeah, that sounds interesting as well.
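
To make the two VLA cases concrete, a run-time check could be sketched
along these lines.  The helper name vla_size_is_invalid is hypothetical
and this is not what any existing instrumentation emits; it only shows the
conditions being discussed (a non-positive element count, or a byte size
that does not fit in size_t).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return true if declaring "T a[n]" with
   sizeof (T) == elem_size would be invalid, either because n is not
   greater than zero or because n * elem_size overflows size_t.  */
bool
vla_size_is_invalid (long n, size_t elem_size)
{
  assert (elem_size > 0);

  /* A VLA bound must evaluate to a value greater than zero.  */
  if (n <= 0)
    return true;

  /* Division-based test, so the check itself cannot overflow.  */
  return (uintmax_t) n > SIZE_MAX / elem_size;
}
```

The division form is used here only to keep the sketch portable; it does
not address the separate problem of the allocation running past the end of
the stack.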

> > +@item -fsanitize=undefined
> > +Enable UndefinedBehaviorSanitizer, a fast undefined behavior detector
> > +Various computations will be instrumented to detect
> > +undefined behavior, e.g. division by zero or various overflows.
> 
> e.g.@:

Fixed.  Thanks!

	Marek


* Re: [RFC] Implement Undefined Behavior Sanitizer (take 2)
  2013-06-05 19:19 ` Jakub Jelinek
  2013-06-05 19:35   ` Jakub Jelinek
@ 2013-06-08 16:43   ` Marek Polacek
  2013-06-08 17:48     ` Marc Glisse
  2013-06-10 14:29     ` Joseph S. Myers
  1 sibling, 2 replies; 46+ messages in thread
From: Marek Polacek @ 2013-06-08 16:43 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches

Thanks for the reviews, here is another version.  I haven't touched
the division by zero instrumentation, but the shift instrumentation is
revamped; what it should instrument now is, as Jakub wrote:
1) if ((unsigned) y > precm1) ub
plus for signed x << y:
2) for C99/C11 if ((unsigned) x >> (precm1 - y)) ub
3) for C++11/C++14 if (x < 0 || ((unsigned) x >> (precm1 - y)) > 1) ub
Is there anything left to do for shifts?
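
For reference, the division check that this version leaves untouched
(division by zero and INT_MIN / -1, as built by ubsan_instrument_division
in the patch below) corresponds roughly to the following C predicate.  This
is a sketch for the type int only; the function names are made up for
illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True if x / y or x % y is undefined for int operands: either the
   divisor is zero, or the quotient INT_MIN / -1 is not representable.  */
bool
int_divrem_is_ub (int x, int y)
{
  return y == 0 || (x == INT_MIN && y == -1);
}

/* Hypothetical usage: abort instead of performing an undefined division.  */
int
checked_div (int x, int y)
{
  assert (!int_divrem_is_ub (x, y));
  return x / y;
}
```

The same predicate applies to %, since both / and % are undefined when the
quotient is unrepresentable.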

The c-ubsan.c now resides in c-family/.
I also fixed the ICE with long/long long type (the resulting type of
a RSHIFT_EXPR should really be that of the left operand).
But I have no clue how it behaves with e.g. the constexpr specifier.

When/if this is ok enough, I think we could put this on a branch,
together with Jakub's patches from
http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00291.html

The next part will be to construct arguments for the ubsan library;
hopefully I can (mis)use parts of asan.c code.  Then the command-line
parsing (parse options into the flag_sanitize bitmask).  And after this
is done, I'd say we should add a testsuite, ideally something in
c-c++-common/ and something in g++.dg/ (templates and other stuff).

Regtested/bootstrapped on x86_64-linux.

2013-06-07  Marek Polacek  <polacek@redhat.com>

	* Makefile.in (C_COMMON_OBJS): Add c-family/c-ubsan.o.
	(c-family/c-ubsan.o): New rule.
	* common.opt: Add -fsanitize=undefined option.
	* doc/invoke.texi: Document the new flag.
	* sanitizer.def (BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW,
	BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS): Define.
	* builtin-attrs.def (ATTR_COLD): Define.
	(ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST): Define.
	* asan.c (initialize_sanitizer_builtins): Build
	BT_FN_VOID_PTR_PTR_PTR.
	* builtins.def (DEF_SANITIZER_BUILTIN): Also enable for flag_ubsan.

c-family/
	* c-ubsan.c: New file.
	* c-ubsan.h: New file.

cp/
	* typeck.c (cp_build_binary_op): Add division by zero and shift
	instrumentation.

c/
	* c-typeck.c (build_binary_op): Add division by zero and shift
	instrumentation.

--- gcc/c-family/c-ubsan.c.mp	2013-06-07 22:58:18.084990548 +0200
+++ gcc/c-family/c-ubsan.c	2013-06-07 22:29:34.677588474 +0200
@@ -0,0 +1,120 @@
+/* UndefinedBehaviorSanitizer, undefined behavior detector.
+   Copyright (C) 2013 Free Software Foundation, Inc.
+   Contributed by Marek Polacek <polacek@redhat.com>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tree.h"
+#include "c-family/c-common.h"
+#include "c-family/c-ubsan.h"
+
+/* Instrument division by zero and INT_MIN / -1.  */
+
+tree
+ubsan_instrument_division (location_t loc, enum tree_code code,
+			   tree op0, tree op1)
+{
+  tree t, tt;
+  tree orig = build2 (code, TREE_TYPE (op0), op0, op1);
+
+  if (TREE_CODE (TREE_TYPE (op0)) != INTEGER_TYPE
+      || TREE_CODE (TREE_TYPE (op1)) != INTEGER_TYPE)
+    return orig;
+
+  /* If we *know* that the divisor is not -1 or 0, we don't have to
+     instrument this expression.
+     ??? We could use decl_constant_value to cover up more cases.  */
+  if (TREE_CODE (op1) == INTEGER_CST
+      && integer_nonzerop (op1)
+      && !integer_minus_onep (op1))
+    return orig;
+
+  tt = fold_build2 (EQ_EXPR, boolean_type_node, op1,
+		    integer_minus_one_node);
+  t = fold_build2 (EQ_EXPR, boolean_type_node, op0,
+		   TYPE_MIN_VALUE (TREE_TYPE (op0)));
+  t = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, t, tt);
+  tt = build2 (EQ_EXPR, boolean_type_node,
+	       op1, integer_zero_node);
+  t = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, tt, t);
+  tt = builtin_decl_explicit (BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW);
+  tt = build_call_expr_loc (loc, tt, 0);
+  t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node);
+  t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (orig), t, orig);
+
+  return t;
+}
+
+/* Instrument left and right shifts.  */
+
+tree
+ubsan_instrument_shift (location_t loc, enum tree_code code,
+			tree op0, tree op1)
+{
+  tree t, tt = NULL_TREE;
+  tree orig = build2 (code, TREE_TYPE (op0), op0, op1);
+  tree uprecm1 = build_int_cst (unsigned_type_for (TREE_TYPE (op1)),
+			       TYPE_PRECISION (TREE_TYPE (op0)) - 1);
+  tree precm1 = build_int_cst (TREE_TYPE (op1),
+			       TYPE_PRECISION (TREE_TYPE (op0)) - 1);
+
+  t = fold_convert_loc (loc, unsigned_type_for (TREE_TYPE (op1)), op1);
+  t = fold_build2 (GT_EXPR, boolean_type_node, t, uprecm1);
+
+  /* For signed x << y, in C99/C11, the following:
+     (unsigned) x >> (precm1 - y)
+     if non-zero, is undefined.  */
+  if (code == LSHIFT_EXPR
+      && !TYPE_UNSIGNED (TREE_TYPE (op0))
+      && (flag_isoc99 || flag_isoc11))
+    {
+      tree x = fold_build2 (MINUS_EXPR, integer_type_node, precm1, op1);
+      tt = fold_convert_loc (loc, unsigned_type_for (TREE_TYPE (op0)), op0);
+      tt = fold_build2 (RSHIFT_EXPR, TREE_TYPE (tt), tt, x);
+      tt = fold_build2 (NE_EXPR, boolean_type_node, tt,
+			build_int_cst (TREE_TYPE (tt), 0));
+    }
+
+  /* For signed x << y, in C++11/C++14, the following:
+     x < 0 || ((unsigned) x >> (precm1 - y)) > 1
+     if true, is undefined.  */
+  if (code == LSHIFT_EXPR
+      && !TYPE_UNSIGNED (TREE_TYPE (op0))
+      && (cxx_dialect == cxx11 || cxx_dialect == cxx1y))
+    {
+      tree x = fold_build2 (MINUS_EXPR, integer_type_node, precm1, op1);
+      tt = fold_convert_loc (loc, unsigned_type_for (TREE_TYPE (op0)), op0);
+      tt = fold_build2 (RSHIFT_EXPR, TREE_TYPE (tt), tt, x);
+      tt = fold_build2 (GT_EXPR, boolean_type_node, tt,
+			build_int_cst (TREE_TYPE (tt), 1));
+      x = fold_build2 (LT_EXPR, boolean_type_node, op0,
+		       build_int_cst (TREE_TYPE (op0), 0));
+      tt = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, x, tt);
+    }
+
+  t = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, t,
+		   tt ? tt : integer_zero_node);
+  tt = builtin_decl_explicit (BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS);
+  tt = build_call_expr_loc (loc, tt, 0);
+  t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node);
+  t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (orig), t, orig);
+
+  return t;
+}
--- gcc/c-family/c-ubsan.h.mp	2013-06-07 22:58:23.009004841 +0200
+++ gcc/c-family/c-ubsan.h	2013-06-05 18:10:21.284693807 +0200
@@ -0,0 +1,27 @@
+/* UndefinedBehaviorSanitizer, undefined behavior detector.
+   Copyright (C) 2013 Free Software Foundation, Inc.
+   Contributed by Marek Polacek <polacek@redhat.com>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_UBSAN_H
+#define GCC_UBSAN_H
+
+extern tree ubsan_instrument_division (location_t, enum tree_code, tree, tree);
+extern tree ubsan_instrument_shift (location_t, enum tree_code, tree, tree);
+
+#endif  /* GCC_UBSAN_H  */
--- gcc/sanitizer.def.mp	2013-06-07 23:01:16.783536178 +0200
+++ gcc/sanitizer.def	2013-06-07 23:04:47.646161427 +0200
@@ -283,3 +283,13 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOM
 DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC_SIGNAL_FENCE,
 		      "__tsan_atomic_signal_fence",
 		      BT_FN_VOID_INT, ATTR_NOTHROW_LEAF_LIST)
+
+/* Undefined Behavior Sanitizer */
+DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW,
+		      "__ubsan_handle_divrem_overflow",
+		      BT_FN_VOID_PTR_PTR_PTR,
+		      ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS,
+		      "__ubsan_handle_shift_out_of_bounds",
+		      BT_FN_VOID_PTR_PTR_PTR,
+		      ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
--- gcc/builtins.def.mp	2013-06-07 23:01:16.775536150 +0200
+++ gcc/builtins.def	2013-06-07 23:04:47.611161301 +0200
@@ -155,7 +155,7 @@ along with GCC; see the file COPYING3.
 #define DEF_SANITIZER_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,    \
 	       true, true, true, ATTRS, true, \
-	       (flag_asan || flag_tsan))
+	       (flag_asan || flag_tsan || flag_ubsan))
 
 #undef DEF_CILKPLUS_BUILTIN
 #define DEF_CILKPLUS_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
--- gcc/Makefile.in.mp	2013-06-07 23:01:16.771536135 +0200
+++ gcc/Makefile.in	2013-06-07 23:04:47.602161268 +0200
@@ -1150,7 +1150,7 @@ C_COMMON_OBJS = c-family/c-common.o c-fa
   c-family/c-omp.o c-family/c-opts.o c-family/c-pch.o \
   c-family/c-ppoutput.o c-family/c-pragma.o c-family/c-pretty-print.o \
   c-family/c-semantics.o c-family/c-ada-spec.o tree-mudflap.o \
-  c-family/array-notation-common.o
+  c-family/array-notation-common.o c-family/c-ubsan.o
 
 # Language-independent object files.
 # We put the insn-*.o files first so that a parallel make will build
@@ -2021,6 +2021,9 @@ c-family/array-notation-common.o : c-fam
 c-family/stub-objc.o : c-family/stub-objc.c $(CONFIG_H) $(SYSTEM_H) \
 	coretypes.h $(TREE_H) $(C_COMMON_H) c-family/c-objc.h
 
+c-family/c-ubsan.o : c-family/c-ubsan.c $(CONFIG_H) $(SYSTEM_H) \
+	coretypes.h $(TREE_H) $(C_COMMON_H) c-family/c-ubsan.h
+
 default-c.o: config/default-c.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
   $(C_TARGET_H) $(C_TARGET_DEF_H)
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) \
--- gcc/doc/invoke.texi.mp	2013-06-07 23:01:16.781536172 +0200
+++ gcc/doc/invoke.texi	2013-06-07 23:04:47.639161403 +0200
@@ -5143,6 +5143,11 @@ Memory access instructions will be instr
 data race bugs.
 See @uref{http://code.google.com/p/data-race-test/wiki/ThreadSanitizer} for more details.
 
+@item -fsanitize=undefined
+Enable UndefinedBehaviorSanitizer, a fast undefined behavior detector.
+Various computations will be instrumented to detect
+undefined behavior, e.g.@: division by zero or various overflows.
+
 @item -fdump-final-insns@r{[}=@var{file}@r{]}
 @opindex fdump-final-insns
 Dump the final internal representation (RTL) to @var{file}.  If the
--- gcc/cp/typeck.c.mp	2013-06-07 23:01:16.779536165 +0200
+++ gcc/cp/typeck.c	2013-06-07 23:04:47.625161351 +0200
@@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.
 #include "convert.h"
 #include "c-family/c-common.h"
 #include "c-family/c-objc.h"
+#include "c-family/c-ubsan.h"
 #include "params.h"
 
 static tree pfn_from_ptrmemfunc (tree);
@@ -3891,6 +3892,12 @@ cp_build_binary_op (location_t location,
   op0 = orig_op0;
   op1 = orig_op1;
 
+  /* Remember whether we're doing / or %.  */
+  bool doing_div_or_mod = false;
+
+  /* Remember whether we're doing << or >>.  */
+  bool doing_shift = false;
+
   if (code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR
       || code == TRUTH_OR_EXPR || code == TRUTH_ORIF_EXPR
       || code == TRUTH_XOR_EXPR)
@@ -4070,8 +4077,15 @@ cp_build_binary_op (location_t location,
 	{
 	  enum tree_code tcode0 = code0, tcode1 = code1;
 	  tree cop1 = fold_non_dependent_expr_sfinae (op1, tf_none);
+	  cop1 = maybe_constant_value (cop1);
 
-	  warn_for_div_by_zero (location, maybe_constant_value (cop1));
+	  if (!processing_template_decl && tcode0 == INTEGER_TYPE
+	      && (TREE_CODE (cop1) != INTEGER_CST
+		  || integer_zerop (cop1)
+		  || integer_minus_onep (cop1)))
+	    doing_div_or_mod = true;
+
+	  warn_for_div_by_zero (location, cop1);
 
 	  if (tcode0 == COMPLEX_TYPE || tcode0 == VECTOR_TYPE)
 	    tcode0 = TREE_CODE (TREE_TYPE (TREE_TYPE (op0)));
@@ -4109,8 +4123,14 @@ cp_build_binary_op (location_t location,
     case FLOOR_MOD_EXPR:
       {
 	tree cop1 = fold_non_dependent_expr_sfinae (op1, tf_none);
+	cop1 = maybe_constant_value (cop1);
 
-	warn_for_div_by_zero (location, maybe_constant_value (cop1));
+	if (!processing_template_decl && code0 == INTEGER_TYPE
+	    && (TREE_CODE (cop1) != INTEGER_CST
+		|| integer_zerop (cop1)
+		|| integer_minus_onep (cop1)))
+	  doing_div_or_mod = true;
+	warn_for_div_by_zero (location, cop1);
       }
 
       if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE
@@ -4164,6 +4184,7 @@ cp_build_binary_op (location_t location,
 	  if (TREE_CODE (const_op1) != INTEGER_CST)
 	    const_op1 = op1;
 	  result_type = type0;
+	  doing_shift = true;
 	  if (TREE_CODE (const_op1) == INTEGER_CST)
 	    {
 	      if (tree_int_cst_lt (const_op1, integer_zero_node))
@@ -4211,6 +4232,7 @@ cp_build_binary_op (location_t location,
 	  if (TREE_CODE (const_op1) != INTEGER_CST)
 	    const_op1 = op1;
 	  result_type = type0;
+	  doing_shift = true;
 	  if (TREE_CODE (const_op1) == INTEGER_CST)
 	    {
 	      if (tree_int_cst_lt (const_op1, integer_zero_node))
@@ -4607,6 +4629,17 @@ cp_build_binary_op (location_t location,
       break;
     }
 
+  if (flag_ubsan && doing_div_or_mod && !processing_template_decl)
+    {
+      resultcode = COMPOUND_EXPR;
+      return ubsan_instrument_division (location, code, op0, op1);
+    }
+  else if (flag_ubsan && doing_shift && !processing_template_decl)
+    {
+      resultcode = COMPOUND_EXPR;
+      return ubsan_instrument_shift (location, code, op0, op1);
+    }
+
   if (((code0 == INTEGER_TYPE || code0 == REAL_TYPE || code0 == COMPLEX_TYPE
 	|| code0 == ENUMERAL_TYPE)
        && (code1 == INTEGER_TYPE || code1 == REAL_TYPE
--- gcc/common.opt.mp	2013-06-07 23:01:16.778536161 +0200
+++ gcc/common.opt	2013-06-07 23:04:47.621161337 +0200
@@ -858,6 +858,10 @@ fsanitize=thread
 Common Report Var(flag_tsan)
 Enable ThreadSanitizer, a data race detector
 
+fsanitize=undefined
+Common Report Var(flag_ubsan)
+Enable UndefinedBehaviorSanitizer, an undefined behavior detector
+
 fasynchronous-unwind-tables
 Common Report Var(flag_asynchronous_unwind_tables) Optimization
 Generate unwind tables that are exact at each instruction boundary
--- gcc/builtin-attrs.def.mp	2013-06-07 23:01:16.774536147 +0200
+++ gcc/builtin-attrs.def	2013-06-07 23:04:47.609161293 +0200
@@ -83,6 +83,7 @@ DEF_LIST_INT_INT (5,6)
 #undef DEF_LIST_INT_INT
 
 /* Construct trees for identifiers.  */
+DEF_ATTR_IDENT (ATTR_COLD, "cold")
 DEF_ATTR_IDENT (ATTR_CONST, "const")
 DEF_ATTR_IDENT (ATTR_FORMAT, "format")
 DEF_ATTR_IDENT (ATTR_FORMAT_ARG, "format_arg")
@@ -130,6 +131,8 @@ DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHRO
 			ATTR_NULL, ATTR_NOTHROW_LIST)
 DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHROW_LEAF_LIST, ATTR_NORETURN,\
 			ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
+DEF_ATTR_TREE_LIST (ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST, ATTR_COLD,\
+			ATTR_NULL, ATTR_NORETURN_NOTHROW_LEAF_LIST)
 DEF_ATTR_TREE_LIST (ATTR_CONST_NORETURN_NOTHROW_LEAF_LIST, ATTR_CONST,\
 			ATTR_NULL, ATTR_NORETURN_NOTHROW_LEAF_LIST)
 DEF_ATTR_TREE_LIST (ATTR_MALLOC_NOTHROW_LIST, ATTR_MALLOC,	\
--- gcc/c/c-typeck.c.mp	2013-06-07 23:01:16.776536153 +0200
+++ gcc/c/c-typeck.c	2013-06-07 23:04:47.617161321 +0200
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.
 #include "gimple.h"
 #include "c-family/c-objc.h"
 #include "c-family/c-common.h"
+#include "c-family/c-ubsan.h"
 
 /* Possible cases of implicit bad conversions.  Used to select
    diagnostic messages in convert_for_assignment.  */
@@ -9527,6 +9528,12 @@ build_binary_op (location_t location, en
      operands to truth-values.  */
   bool boolean_op = false;
 
+  /* Remember whether we're doing / or %.  */
+  bool doing_div_or_mod = false;
+
+  /* Remember whether we're doing << or >>.  */
+  bool doing_shift = false;
+
   if (location == UNKNOWN_LOCATION)
     location = input_location;
 
@@ -9728,6 +9735,7 @@ build_binary_op (location_t location, en
     case FLOOR_DIV_EXPR:
     case ROUND_DIV_EXPR:
     case EXACT_DIV_EXPR:
+      doing_div_or_mod = true;
       warn_for_div_by_zero (location, op1);
 
       if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
@@ -9775,6 +9783,7 @@ build_binary_op (location_t location, en
 
     case TRUNC_MOD_EXPR:
     case FLOOR_MOD_EXPR:
+      doing_div_or_mod = true;
       warn_for_div_by_zero (location, op1);
 
       if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE
@@ -9873,6 +9882,7 @@ build_binary_op (location_t location, en
       else if ((code0 == INTEGER_TYPE || code0 == FIXED_POINT_TYPE)
 	  && code1 == INTEGER_TYPE)
 	{
+	  doing_shift = true;
 	  if (TREE_CODE (op1) == INTEGER_CST)
 	    {
 	      if (tree_int_cst_sgn (op1) < 0)
@@ -9925,6 +9935,7 @@ build_binary_op (location_t location, en
       else if ((code0 == INTEGER_TYPE || code0 == FIXED_POINT_TYPE)
 	  && code1 == INTEGER_TYPE)
 	{
+	  doing_shift = true;
 	  if (TREE_CODE (op1) == INTEGER_CST)
 	    {
 	      if (tree_int_cst_sgn (op1) < 0)
@@ -10209,6 +10220,19 @@ build_binary_op (location_t location, en
       return error_mark_node;
     }
 
+  if (flag_ubsan && doing_div_or_mod)
+    {
+      ret = ubsan_instrument_division (location, code, op0, op1);
+      resultcode = COMPOUND_EXPR;
+      goto return_build_binary_op;
+    }
+  else if (flag_ubsan && doing_shift)
+    {
+      ret = ubsan_instrument_shift (location, code, op0, op1);
+      resultcode = COMPOUND_EXPR;
+      goto return_build_binary_op;
+    }
+
   if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE || code0 == COMPLEX_TYPE
        || code0 == FIXED_POINT_TYPE || code0 == VECTOR_TYPE)
       &&
--- gcc/asan.c.mp	2013-06-07 23:01:16.773536143 +0200
+++ gcc/asan.c	2013-06-07 23:04:47.603161271 +0200
@@ -2034,6 +2034,9 @@ initialize_sanitizer_builtins (void)
   tree BT_FN_VOID = build_function_type_list (void_type_node, NULL_TREE);
   tree BT_FN_VOID_PTR
     = build_function_type_list (void_type_node, ptr_type_node, NULL_TREE);
+  tree BT_FN_VOID_PTR_PTR_PTR
+    = build_function_type_list (void_type_node, ptr_type_node,
+				ptr_type_node, ptr_type_node, NULL_TREE);
   tree BT_FN_VOID_PTR_PTRMODE
     = build_function_type_list (void_type_node, ptr_type_node,
 				build_nonstandard_integer_type (POINTER_SIZE,
@@ -2099,6 +2102,9 @@ initialize_sanitizer_builtins (void)
 #undef ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST
 #define ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST \
   ECF_TM_PURE | ATTR_NORETURN_NOTHROW_LEAF_LIST
+#undef ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST
+#define ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST \
+  /* ECF_COLD missing */ ATTR_NORETURN_NOTHROW_LEAF_LIST
 #undef DEF_SANITIZER_BUILTIN
 #define DEF_SANITIZER_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
   decl = add_builtin_function ("__builtin_" NAME, TYPE, ENUM,		\
 
	Marek

* Re: [RFC] Implement Undefined Behavior Sanitizer (take 2)
  2013-06-08 16:43   ` [RFC] Implement Undefined Behavior Sanitizer (take 2) Marek Polacek
@ 2013-06-08 17:48     ` Marc Glisse
  2013-06-08 18:22       ` Jakub Jelinek
  2013-06-10  9:24       ` Marek Polacek
  2013-06-10 14:29     ` Joseph S. Myers
  1 sibling, 2 replies; 46+ messages in thread
From: Marc Glisse @ 2013-06-08 17:48 UTC (permalink / raw)
  To: Marek Polacek; +Cc: GCC Patches

Hello,

thanks for working on this. Just a few questions inline:

On Sat, 8 Jun 2013, Marek Polacek wrote:

> +/* Instrument division by zero and INT_MIN / -1.  */
> +
> +tree
> +ubsan_instrument_division (location_t loc, enum tree_code code,
> +			   tree op0, tree op1)
> +{
> +  tree t, tt;
> +  tree orig = build2 (code, TREE_TYPE (op0), op0, op1);
> +
> +  if (TREE_CODE (TREE_TYPE (op0)) != INTEGER_TYPE
> +      || TREE_CODE (TREE_TYPE (op1)) != INTEGER_TYPE)
> +    return orig;

Type promotion means that INTEGRAL_TYPE_P wouldn't catch anything this 
doesn't?

> +  /* If we *know* that the divisor is not -1 or 0, we don't have to
> +     instrument this expression.
> +     ??? We could use decl_constant_value to cover up more cases.  */
> +  if (TREE_CODE (op1) == INTEGER_CST
> +      && integer_nonzerop (op1)
> +      && !integer_minus_onep (op1))
> +    return orig;

This is just to speed up compilation a bit, I assume, since fold would 
remove the instrumentation anyway.

> +  tt = fold_build2 (EQ_EXPR, boolean_type_node, op1,
> +		    integer_minus_one_node);

Don't we usually try to have both operands of a comparison of the same 
type?

> +  t = fold_build2 (EQ_EXPR, boolean_type_node, op0,
> +		   TYPE_MIN_VALUE (TREE_TYPE (op0)));

I didn't see where this test was restricted to the signed case (0u/-1 
is well defined)?

> +  t = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, t, tt);
> +  tt = build2 (EQ_EXPR, boolean_type_node,
> +	       op1, integer_zero_node);

Why not fold this one?

> +  t = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, tt, t);
> +  tt = builtin_decl_explicit (BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW);
> +  tt = build_call_expr_loc (loc, tt, 0);
> +  t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node);
> +  t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (orig), t, orig);
> +
> +  return t;
> +}
> +
> +/* Instrument left and right shifts.  */
> +
> +tree
> +ubsan_instrument_shift (location_t loc, enum tree_code code,
> +			tree op0, tree op1)
> +{
> +  tree t, tt = NULL_TREE;
> +  tree orig = build2 (code, TREE_TYPE (op0), op0, op1);
> +  tree uprecm1 = build_int_cst (unsigned_type_for (TREE_TYPE (op1)),
> +			       TYPE_PRECISION (TREE_TYPE (op0)) - 1);
> +  tree precm1 = build_int_cst (TREE_TYPE (op1),
> +			       TYPE_PRECISION (TREE_TYPE (op0)) - 1);

(if we later want to extend this to vector-scalar shifts, 
element_precision will be better than TYPE_PRECISION)

Name unsigned_type_for (TREE_TYPE (op1)) and TYPE_PRECISION (TREE_TYPE 
(op0)) that are used several times?

> +  t = fold_convert_loc (loc, unsigned_type_for (TREE_TYPE (op1)), op1);
> +  t = fold_build2 (GT_EXPR, boolean_type_node, t, uprecm1);
[...]
> --- gcc/cp/typeck.c.mp	2013-06-07 23:01:16.779536165 +0200
> +++ gcc/cp/typeck.c	2013-06-07 23:04:47.625161351 +0200
> @@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.
> #include "convert.h"
> #include "c-family/c-common.h"
> #include "c-family/c-objc.h"
> +#include "c-family/c-ubsan.h"
> #include "params.h"
>
> static tree pfn_from_ptrmemfunc (tree);
> @@ -3891,6 +3892,12 @@ cp_build_binary_op (location_t location,
>   op0 = orig_op0;
>   op1 = orig_op1;
>
> +  /* Remember whether we're doing / or %.  */
> +  bool doing_div_or_mod = false;
> +
> +  /* Remember whether we're doing << or >>.  */
> +  bool doing_shift = false;
> +
>   if (code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR
>       || code == TRUTH_OR_EXPR || code == TRUTH_ORIF_EXPR
>       || code == TRUTH_XOR_EXPR)
> @@ -4070,8 +4077,15 @@ cp_build_binary_op (location_t location,
> 	{
> 	  enum tree_code tcode0 = code0, tcode1 = code1;
> 	  tree cop1 = fold_non_dependent_expr_sfinae (op1, tf_none);
> +	  cop1 = maybe_constant_value (cop1);
>
> -	  warn_for_div_by_zero (location, maybe_constant_value (cop1));
> +	  if (!processing_template_decl && tcode0 == INTEGER_TYPE
> +	      && (TREE_CODE (cop1) != INTEGER_CST
> +		  || integer_zerop (cop1)
> +		  || integer_minus_onep (cop1)))
> +	    doing_div_or_mod = true;

Aren't you already doing this test in ubsan_instrument_division?


-- 
Marc Glisse

* Re: [RFC] Implement Undefined Behavior Sanitizer (take 2)
  2013-06-08 17:48     ` Marc Glisse
@ 2013-06-08 18:22       ` Jakub Jelinek
  2013-06-11 18:44         ` Marek Polacek
  2013-06-10  9:24       ` Marek Polacek
  1 sibling, 1 reply; 46+ messages in thread
From: Jakub Jelinek @ 2013-06-08 18:22 UTC (permalink / raw)
  To: GCC Patches; +Cc: Marek Polacek

On Sat, Jun 08, 2013 at 07:48:27PM +0200, Marc Glisse wrote:
> >+/* Instrument division by zero and INT_MIN / -1.  */
> >+
> >+tree
> >+ubsan_instrument_division (location_t loc, enum tree_code code,
> >+			   tree op0, tree op1)
> >+{
> >+  tree t, tt;
> >+  tree orig = build2 (code, TREE_TYPE (op0), op0, op1);
> >+
> >+  if (TREE_CODE (TREE_TYPE (op0)) != INTEGER_TYPE
> >+      || TREE_CODE (TREE_TYPE (op1)) != INTEGER_TYPE)
> >+    return orig;
> 
> Type promotion means that INTEGRAL_TYPE_P wouldn't catch anything
> this doesn't?

This is after type promotion, and e.g. cp_build_binary_op already checks
for == INTEGER_TYPE, so I think enums/bools won't show up here.

> >+  /* If we *know* that the divisor is not -1 or 0, we don't have to
> >+     instrument this expression.
> >+     ??? We could use decl_constant_value to cover up more cases.  */
> >+  if (TREE_CODE (op1) == INTEGER_CST
> >+      && integer_nonzerop (op1)
> >+      && !integer_minus_onep (op1))
> >+    return orig;
> 
> This is just to speed up compilation a bit, I assume, since fold
> would remove the instrumentation anyway.

Yeah.
> 
> >+  tt = fold_build2 (EQ_EXPR, boolean_type_node, op1,
> >+		    integer_minus_one_node);
> 
> Don't we usually try to have both operands of a comparison of the
> same type?

Not just usually, it really has to be build_int_cst (TREE_TYPE (op1), -1).
And, more importantly, at least in cp_build_binary_op the calls need to be
moved further down in the function, at least after if (processing_template_decl)
but e.g. for division the trouble is that shorten_binary_op is performed
before actually promoting one or both operands to the result_type.  I guess
for the diagnostics which prints the types, it would be best to diagnose
using the promoted types and result_type constructed out of that, but
without shorten_binary_op etc., that is just an optimization I think.
So, maybe record the original result_type before shortening, and if
shortening changed that, convert the arguments for the instrumentation only
to the original result_type, otherwise use the conversion done normally.
For shifts this isn't a big deal, because they always use result_type of the
first operand after promotion, and the ubsan handler wants to see two types
there (the question is, does it want for the shift amount look for the
original shift count type, or the one converted to int)?

Also, perhaps it would be better if these ubsan_instrument* functions
didn't return a COMPOUND_EXPR, but instead just the lhs of that (i.e. the
actual instrumentation) and let the caller set some var to that and if that
var is non-NULL, after building the binary operation build a COMPOUND_EXPR
with lhs being the instrumentation and rhs the binary operation itself.
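
A minimal sketch of that shape in plain C, for illustration only: the comma
operator plays the role of COMPOUND_EXPR, with the instrumentation on the
left and the binary operation on the right.  The names checked_div,
report_divrem_overflow and ubsan_reports are hypothetical stand-ins, not
part of the patch or of the ubsan runtime.  The stand-in handler only counts
and returns, so the sketch yields 0 instead of dividing when the check fires.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the runtime handler: it only counts reports.  */
static int ubsan_reports = 0;

static void
report_divrem_overflow (void)
{
  ubsan_reports++;
}

/* Division with the instrumentation kept separate from the operation.  */
static int
checked_div (int a, int b)
{
  /* The instrumentation condition, computed on its own.  */
  bool bad = b == 0 || (a == INT_MIN && b == -1);

  /* Left operand: the instrumentation, whose value is discarded.
     Right operand: the binary operation itself.  Because this sketch's
     handler returns, the division is skipped when BAD is set.  */
  return ((bad ? report_divrem_overflow () : (void) 0),
	  bad ? 0 : a / b);
}
```

With this split the caller decides whether to wrap the operation at all: if
no instrumentation was produced, the plain operation is used unchanged.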

> 
> >+  t = fold_build2 (EQ_EXPR, boolean_type_node, op0,
> >+		   TYPE_MIN_VALUE (TREE_TYPE (op0)));
> 
> I didn't see where this test was restricted to the signed case
> (0u/-1 is well defined)?
> 
> >+  t = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, t, tt);
> >+  tt = build2 (EQ_EXPR, boolean_type_node,
> >+	       op1, integer_zero_node);
> 
> Why not fold this one?

Sure.  And yeah, the INT_MIN/-1 checking needs to be done for signed types
only.
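
To illustrate the signed/unsigned distinction, here is a small sketch in
plain C (not GCC tree code).  The helper names sdiv_is_undefined and
udiv_is_undefined are made up for this example.  For unsigned operands -1
converts to the maximum value, so 0u / -1 is simply 0 and only a zero
divisor needs a check.

```c
#include <limits.h>
#include <stdbool.h>

/* Signed division is undefined for a zero divisor and for INT_MIN / -1,
   whose mathematical result is not representable in int.  */
static bool
sdiv_is_undefined (int a, int b)
{
  return b == 0 || (a == INT_MIN && b == -1);
}

/* Unsigned division is undefined only for a zero divisor.  */
static bool
udiv_is_undefined (unsigned int a, unsigned int b)
{
  (void) a;	/* The dividend never matters in the unsigned case.  */
  return b == 0;
}
```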

> >+tree
> >+ubsan_instrument_shift (location_t loc, enum tree_code code,
> >+			tree op0, tree op1)
> >+{
> >+  tree t, tt = NULL_TREE;
> >+  tree orig = build2 (code, TREE_TYPE (op0), op0, op1);
> >+  tree uprecm1 = build_int_cst (unsigned_type_for (TREE_TYPE (op1)),
> >+			       TYPE_PRECISION (TREE_TYPE (op0)) - 1);
> >+  tree precm1 = build_int_cst (TREE_TYPE (op1),
> >+			       TYPE_PRECISION (TREE_TYPE (op0)) - 1);
> 
> (if we later want to extend this to vector-scalar shifts,
> element_precision will be better than TYPE_PRECISION)
> 
> Name unsigned_type_for (TREE_TYPE (op1)) and TYPE_PRECISION
> (TREE_TYPE (op0)) that are used several times?

Yes.  Note that for vector-scalar shifts (and even vector-vector) we'd need
library support to handle that.
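
For the scalar case, the range check on the shift count can be sketched in
plain C as below.  This is only an illustration of the comparison the patch
builds with uprecm1: converting the count to unsigned lets a single
comparison reject both negative and too-large counts.  INT_PRECM1 and
shift_count_out_of_range are hypothetical names, and the sketch assumes int
has no padding bits, so its precision is sizeof (int) * CHAR_BIT.

```c
#include <limits.h>
#include <stdbool.h>

/* Precision of int minus one, assuming int has no padding bits.  */
#define INT_PRECM1 ((int) (sizeof (int) * CHAR_BIT) - 1)

/* Return true if COUNT is not a valid shift count for an int operand.
   A negative COUNT becomes a huge unsigned value, so one unsigned
   comparison covers both "negative" and ">= precision".  */
static bool
shift_count_out_of_range (int count)
{
  return (unsigned int) count > (unsigned int) INT_PRECM1;
}
```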

	Jakub

* Re: [RFC] Implement Undefined Behavior Sanitizer (take 2)
  2013-06-08 17:48     ` Marc Glisse
  2013-06-08 18:22       ` Jakub Jelinek
@ 2013-06-10  9:24       ` Marek Polacek
  2013-06-10  9:32         ` Jakub Jelinek
  1 sibling, 1 reply; 46+ messages in thread
From: Marek Polacek @ 2013-06-10  9:24 UTC (permalink / raw)
  To: Marc Glisse; +Cc: GCC Patches

On Sat, Jun 08, 2013 at 07:48:27PM +0200, Marc Glisse wrote:
> >+  tt = fold_build2 (EQ_EXPR, boolean_type_node, op1,
> >+		    integer_minus_one_node);
> 
> Don't we usually try to have both operands of a comparison of the
> same type?

Will fix.

> >+  t = fold_build2 (EQ_EXPR, boolean_type_node, op0,
> >+		   TYPE_MIN_VALUE (TREE_TYPE (op0)));
> 
> I didn't see where this test was restricted to the signed case
> (0u/-1 is well defined)?

Will fix.

> >+  t = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, t, tt);
> >+  tt = build2 (EQ_EXPR, boolean_type_node,
> >+	       op1, integer_zero_node);
> 
> Why not fold this one?

Sure, will do.

> Name unsigned_type_for (TREE_TYPE (op1)) and TYPE_PRECISION
> (TREE_TYPE (op0)) that are used several times?

Yeah.

> >@@ -4070,8 +4077,15 @@ cp_build_binary_op (location_t location,
> >	{
> >	  enum tree_code tcode0 = code0, tcode1 = code1;
> >	  tree cop1 = fold_non_dependent_expr_sfinae (op1, tf_none);
> >+	  cop1 = maybe_constant_value (cop1);
> >
> >-	  warn_for_div_by_zero (location, maybe_constant_value (cop1));
> >+	  if (!processing_template_decl && tcode0 == INTEGER_TYPE
> >+	      && (TREE_CODE (cop1) != INTEGER_CST
> >+		  || integer_zerop (cop1)
> >+		  || integer_minus_onep (cop1)))
> >+	    doing_div_or_mod = true;
> 
> Aren't you already doing this test in ubsan_instrument_division?

Yep, I'll throw it out of cp/typeck.c.

Thanks for the review!

	Marek

* Re: [RFC] Implement Undefined Behavior Sanitizer (take 2)
  2013-06-10  9:24       ` Marek Polacek
@ 2013-06-10  9:32         ` Jakub Jelinek
  2013-06-10  9:49           ` Marek Polacek
  0 siblings, 1 reply; 46+ messages in thread
From: Jakub Jelinek @ 2013-06-10  9:32 UTC (permalink / raw)
  To: Marek Polacek; +Cc: Marc Glisse, GCC Patches

On Mon, Jun 10, 2013 at 11:24:16AM +0200, Marek Polacek wrote:
> > >@@ -4070,8 +4077,15 @@ cp_build_binary_op (location_t location,
> > >	{
> > >	  enum tree_code tcode0 = code0, tcode1 = code1;
> > >	  tree cop1 = fold_non_dependent_expr_sfinae (op1, tf_none);
> > >+	  cop1 = maybe_constant_value (cop1);
> > >
> > >-	  warn_for_div_by_zero (location, maybe_constant_value (cop1));
> > >+	  if (!processing_template_decl && tcode0 == INTEGER_TYPE
> > >+	      && (TREE_CODE (cop1) != INTEGER_CST
> > >+		  || integer_zerop (cop1)
> > >+		  || integer_minus_onep (cop1)))
> > >+	    doing_div_or_mod = true;
> > 
> > Aren't you already doing this test in ubsan_instrument_division?
> 
> Yep, I'll throw it out of cp/typeck.c.

Note that the above one actually performs more than what you do in
ubsan_instrument_division, because it works on maybe_constant_value result.
So, perhaps typeck.c should ensure that the ubsan functions are always
called with arguments passed through
maybe_constant_value (fold_non_dependent_expr_sfinae (opX, tf_none)).

	Jakub

* Re: [RFC] Implement Undefined Behavior Sanitizer (take 2)
  2013-06-10  9:32         ` Jakub Jelinek
@ 2013-06-10  9:49           ` Marek Polacek
  0 siblings, 0 replies; 46+ messages in thread
From: Marek Polacek @ 2013-06-10  9:49 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Marc Glisse, GCC Patches

On Mon, Jun 10, 2013 at 11:32:22AM +0200, Jakub Jelinek wrote:
> On Mon, Jun 10, 2013 at 11:24:16AM +0200, Marek Polacek wrote:
> > > >@@ -4070,8 +4077,15 @@ cp_build_binary_op (location_t location,
> > > >	{
> > > >	  enum tree_code tcode0 = code0, tcode1 = code1;
> > > >	  tree cop1 = fold_non_dependent_expr_sfinae (op1, tf_none);
> > > >+	  cop1 = maybe_constant_value (cop1);
> > > >
> > > >-	  warn_for_div_by_zero (location, maybe_constant_value (cop1));
> > > >+	  if (!processing_template_decl && tcode0 == INTEGER_TYPE
> > > >+	      && (TREE_CODE (cop1) != INTEGER_CST
> > > >+		  || integer_zerop (cop1)
> > > >+		  || integer_minus_onep (cop1)))
> > > >+	    doing_div_or_mod = true;
> > > 
> > > Aren't you already doing this test in ubsan_instrument_division?
> > 
> > Yep, I'll throw it out of cp/typeck.c.
> 
> Note that the above one actually performs more than what you do in
> ubsan_instrument_division, because it works on maybe_constant_value result.
> So, perhaps typeck.c should ensure that the ubsan functions are always
> called with arguments passed through
> maybe_constant_value (fold_non_dependent_expr_sfinae (opX, tf_none)).

Ah, ok, will add it there.  Thanks.

	Marek

* Re: [RFC] Implement Undefined Behavior Sanitizer (take 2)
  2013-06-08 16:43   ` [RFC] Implement Undefined Behavior Sanitizer (take 2) Marek Polacek
  2013-06-08 17:48     ` Marc Glisse
@ 2013-06-10 14:29     ` Joseph S. Myers
  2013-06-11  1:48       ` Marek Polacek
  1 sibling, 1 reply; 46+ messages in thread
From: Joseph S. Myers @ 2013-06-10 14:29 UTC (permalink / raw)
  To: Marek Polacek; +Cc: Jakub Jelinek, GCC Patches

On Sat, 8 Jun 2013, Marek Polacek wrote:

> +  if (code == LSHIFT_EXPR
> +      && !TYPE_UNSIGNED (TREE_TYPE (op0))
> +      && (flag_isoc99 || flag_isoc11))

flag_isoc11 implies flag_isoc99, you only need to check flag_isoc99 here.
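
For reference, the check this condition guards (signed x << y under C99)
can be sketched in plain C as follows.  It mirrors the expression the patch
describes, (unsigned) x >> (precm1 - y), and reports a problem when that is
non-zero.  The names INT_PRECM1 and c99_lshift_is_undefined are made up for
this sketch; it assumes int has no padding bits and that the count was
already checked to lie in 0 .. INT_PRECM1.

```c
#include <limits.h>
#include <stdbool.h>

/* Precision of int minus one, assuming int has no padding bits.  */
#define INT_PRECM1 ((int) (sizeof (int) * CHAR_BIT) - 1)

/* Return true if x << y is undefined in C99 for a signed x.
   Precondition: 0 <= y <= INT_PRECM1.
   Any bit of x at position INT_PRECM1 - y or above would be shifted into
   or past the sign bit; a negative x always has its top bit set after
   conversion to unsigned, so it is flagged as well.  */
static bool
c99_lshift_is_undefined (int x, int y)
{
  return ((unsigned int) x >> (INT_PRECM1 - y)) != 0;
}
```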

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [RFC] Implement Undefined Behavior Sanitizer (take 2)
  2013-06-10 14:29     ` Joseph S. Myers
@ 2013-06-11  1:48       ` Marek Polacek
  0 siblings, 0 replies; 46+ messages in thread
From: Marek Polacek @ 2013-06-11  1:48 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Jakub Jelinek, GCC Patches

On Mon, Jun 10, 2013 at 02:29:25PM +0000, Joseph S. Myers wrote:
> On Sat, 8 Jun 2013, Marek Polacek wrote:
> 
> > +  if (code == LSHIFT_EXPR
> > +      && !TYPE_UNSIGNED (TREE_TYPE (op0))
> > +      && (flag_isoc99 || flag_isoc11))
> 
> flag_isoc11 implies flag_isoc99, you only need to check flag_isoc99 here.

Ah, sure.  Thanks,

	Marek

* Re: [RFC] Implement Undefined Behavior Sanitizer (take 2)
  2013-06-08 18:22       ` Jakub Jelinek
@ 2013-06-11 18:44         ` Marek Polacek
  2013-06-11 19:14           ` Marc Glisse
  0 siblings, 1 reply; 46+ messages in thread
From: Marek Polacek @ 2013-06-11 18:44 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches

On Sat, Jun 08, 2013 at 08:22:33PM +0200, Jakub Jelinek wrote:
> > >+  tt = fold_build2 (EQ_EXPR, boolean_type_node, op1,
> > >+		    integer_minus_one_node);
> > 
> > Don't we usually try to have both operands of a comparison of the
> > same type?
> 
> Not just usually, it really has to be build_int_cst (TREE_TYPE (op1), -1).
> And, more importantly, at least in cp_build_binary_op the calls need to be
> moved further down in the function, at least after if (processing_template_decl)
> but e.g. for division the trouble is that shorten_binary_op is performed
> before actually promoting one or both operands to the result_type.  I guess
> for the diagnostics which prints the types, it would be best to diagnose
> using the promoted types and result_type constructed out of that, but
> without shorten_binary_op etc., that is just an optimization I think.
> So, maybe record the original result_type before shortening, and if
> shortening changed that, convert the arguments for the instrumentation only
> to the original result_type, otherwise use the conversion done normally.
> For shifts this isn't a big deal, because they always use result_type of the
> first operand after promotion, and the ubsan handler wants to see two types
> there (the question is, does it want for the shift amount look for the
> original shift count type, or the one converted to int)?
> 
> Also, perhaps it would be better if these ubsan_instrument* functions
> didn't return a COMPOUND_EXPR, but instead just the lhs of that (i.e. the
> actual instrumentation) and let the caller set some var to that and if that
> var is non-NULL, after building the binary operation build a COMPOUND_EXPR
> with lhs being the instrumentation and rhs the binary operation itself.

All should be resolved.

> > >+  t = fold_build2 (EQ_EXPR, boolean_type_node, op0,
> > >+		   TYPE_MIN_VALUE (TREE_TYPE (op0)));
> > 
> > I didn't see where this test was restricted to the signed case
> > (0u/-1 is well defined)?
> > 
> > >+  t = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, t, tt);
> > >+  tt = build2 (EQ_EXPR, boolean_type_node,
> > >+	       op1, integer_zero_node);
> > 
> > Why not fold this one?
> 
> Sure.  And yeah, the INT_MIN/-1 checking needs to be done for signed types
> only.

Done.

> > >+tree
> > >+ubsan_instrument_shift (location_t loc, enum tree_code code,
> > >+			tree op0, tree op1)
> > >+{
> > >+  tree t, tt = NULL_TREE;
> > >+  tree orig = build2 (code, TREE_TYPE (op0), op0, op1);
> > >+  tree uprecm1 = build_int_cst (unsigned_type_for (TREE_TYPE (op1)),
> > >+			       TYPE_PRECISION (TREE_TYPE (op0)) - 1);
> > >+  tree precm1 = build_int_cst (TREE_TYPE (op1),
> > >+			       TYPE_PRECISION (TREE_TYPE (op0)) - 1);
> > 
> > (if we later want to extend this to vector-scalar shifts,
> > element_precision will be better than TYPE_PRECISION)
> > 
> > Name unsigned_type_for (TREE_TYPE (op1)) and TYPE_PRECISION
> > (TREE_TYPE (op0)) that are used several times?

Done.

Thanks for the review.
Here's another version, hopefully all issues are fixed.  During
the rewriting I had to fix a few ICEs, so this patch took more time.
I guess I might've misunderstood the cp_convert part, so sorry if
I did it wrong.

Lightly tested, I'm really starting to miss the ubsan testsuite ;).

Regtested on x86_64-linux.

2013-06-11  Marek Polacek  <polacek@redhat.com>

	* Makefile.in: Add ubsan.c.
	* common.opt: Add -fsanitize=undefined option.
	* doc/invoke.texi: Document the new flag.
	* sanitizer.def (DEF_SANITIZER_BUILTIN): Define.
	* builtin-attrs.def (ATTR_COLD): Define.
	* asan.c (initialize_sanitizer_builtins): Build
	BT_FN_VOID_PTR_PTR_PTR.
	* builtins.def (BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW,
	BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS): Define.

c-family/
	* c-ubsan.c: New file.
	* c-ubsan.h: New file.

cp/
	* typeck.c (cp_build_binary_op): Add division by zero and shift
	instrumentation.

c/
	* c-typeck.c (build_binary_op): Add division by zero and shift
	instrumentation.


--- gcc/c-family/c-ubsan.c.mp	2013-06-11 19:51:55.555492466 +0200
+++ gcc/c-family/c-ubsan.c	2013-06-11 19:29:16.925551907 +0200
@@ -0,0 +1,126 @@
+/* UndefinedBehaviorSanitizer, undefined behavior detector.
+   Copyright (C) 2013 Free Software Foundation, Inc.
+   Contributed by Marek Polacek <polacek@redhat.com>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tree.h"
+#include "c-family/c-common.h"
+#include "c-family/c-ubsan.h"
+
+/* Instrument division by zero and INT_MIN / -1.  If not instrumenting,
+   return NULL_TREE.  */
+
+tree
+ubsan_instrument_division (location_t loc, tree op0, tree op1)
+{
+  tree t, tt;
+  tree type0 = TREE_TYPE (op0);
+  tree type1 = TREE_TYPE (op1);
+  tree type1_zero_cst = build_int_cst (type1, 0);
+
+  if (TREE_CODE (type0) != INTEGER_TYPE
+      || TREE_CODE (type1) != INTEGER_TYPE)
+    return NULL_TREE;
+
+  /* If we *know* that the divisor is not -1 or 0, we don't have to
+     instrument this expression.
+     ??? We could use decl_constant_value to cover up more cases.  */
+  if (TREE_CODE (op1) == INTEGER_CST
+      && integer_nonzerop (op1)
+      && !integer_minus_onep (op1))
+    return NULL_TREE;
+
+  /* We check INT_MIN / -1 only for signed types.  */
+  if (!TYPE_UNSIGNED (type0) && !TYPE_UNSIGNED (type1))
+    {
+      tt = fold_build2 (EQ_EXPR, boolean_type_node, op1,
+			build_int_cst (type1, -1));
+      t = fold_build2 (EQ_EXPR, boolean_type_node, op0,
+		       TYPE_MIN_VALUE (type0));
+      t = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, t, tt);
+    }
+  else
+    t = type1_zero_cst;
+  tt = fold_build2 (EQ_EXPR, boolean_type_node,
+		    op1, type1_zero_cst);
+  t = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, tt, t);
+  tt = builtin_decl_explicit (BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW);
+  tt = build_call_expr_loc (loc, tt, 0);
+  t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node);
+
+  return t;
+}
+
+/* Instrument left and right shifts.  If not instrumenting, return
+   NULL_TREE.  */
+
+tree
+ubsan_instrument_shift (location_t loc, enum tree_code code,
+			tree op0, tree op1)
+{
+  tree t, tt = NULL_TREE;
+  tree op1_utype = unsigned_type_for (TREE_TYPE (op1));
+  HOST_WIDE_INT op0_prec = TYPE_PRECISION (TREE_TYPE (op0));
+  tree uprecm1 = build_int_cst (op1_utype, op0_prec - 1);
+  tree precm1 = build_int_cst (TREE_TYPE (op1), op0_prec - 1);
+
+  t = fold_convert_loc (loc, op1_utype, op1);
+  t = fold_build2 (GT_EXPR, boolean_type_node, t, uprecm1);
+
+  /* For signed x << y, in C99/C11, the following:
+     (unsigned) x >> (precm1 - y)
+     if non-zero, is undefined.  */
+  if (code == LSHIFT_EXPR
+      && !TYPE_UNSIGNED (TREE_TYPE (op0))
+      && flag_isoc99)
+    {
+      tree x = fold_build2 (MINUS_EXPR, integer_type_node, precm1, op1);
+      tt = fold_convert_loc (loc, unsigned_type_for (TREE_TYPE (op0)), op0);
+      tt = fold_build2 (RSHIFT_EXPR, TREE_TYPE (tt), tt, x);
+      tt = fold_build2 (NE_EXPR, boolean_type_node, tt,
+			build_int_cst (TREE_TYPE (tt), 0));
+    }
+
+  /* For signed x << y, in C++11/C++14, the following:
+     x < 0 || ((unsigned) x >> (precm1 - y))
+     if > 1, is undefined.  */
+  if (code == LSHIFT_EXPR
+      && !TYPE_UNSIGNED (TREE_TYPE (op0))
+      && (cxx_dialect == cxx11 || cxx_dialect == cxx1y))
+    {
+      tree x = fold_build2 (MINUS_EXPR, integer_type_node, precm1, op1);
+      tt = fold_convert_loc (loc, unsigned_type_for (TREE_TYPE (op0)), op0);
+      tt = fold_build2 (RSHIFT_EXPR, TREE_TYPE (tt), tt, x);
+      tt = fold_build2 (GT_EXPR, boolean_type_node, tt,
+			build_int_cst (TREE_TYPE (tt), 1));
+      x = fold_build2 (LT_EXPR, boolean_type_node, op0,
+		       build_int_cst (TREE_TYPE (op0), 0));
+      tt = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, x, tt);
+    }
+
+  t = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, t,
+		   tt ? tt : integer_zero_node);
+  tt = builtin_decl_explicit (BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS);
+  tt = build_call_expr_loc (loc, tt, 0);
+  t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node);
+
+  return t;
+}
--- gcc/c-family/c-ubsan.h.mp	2013-06-11 19:51:50.616457500 +0200
+++ gcc/c-family/c-ubsan.h	2013-06-11 16:51:38.297942275 +0200
@@ -0,0 +1,27 @@
+/* UndefinedBehaviorSanitizer, undefined behavior detector.
+   Copyright (C) 2013 Free Software Foundation, Inc.
+   Contributed by Marek Polacek <polacek@redhat.com>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_UBSAN_H
+#define GCC_UBSAN_H
+
+extern tree ubsan_instrument_division (location_t, tree, tree);
+extern tree ubsan_instrument_shift (location_t, enum tree_code, tree, tree);
+
+#endif  /* GCC_UBSAN_H  */
--- gcc/sanitizer.def.mp	2013-06-11 19:51:43.781408808 +0200
+++ gcc/sanitizer.def	2013-06-11 19:53:37.768224970 +0200
@@ -283,3 +283,13 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOM
 DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC_SIGNAL_FENCE,
 		      "__tsan_atomic_signal_fence",
 		      BT_FN_VOID_INT, ATTR_NOTHROW_LEAF_LIST)
+
+/* Undefined Behavior Sanitizer */
+DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW,
+		      "__ubsan_handle_divrem_overflow",
+		      BT_FN_VOID_PTR_PTR_PTR,
+		      ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS,
+		      "__ubsan_handle_shift_out_of_bounds",
+		      BT_FN_VOID_PTR_PTR_PTR,
+		      ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
--- gcc/builtins.def.mp	2013-06-11 19:51:43.790408877 +0200
+++ gcc/builtins.def	2013-06-11 19:53:37.721224606 +0200
@@ -155,7 +155,7 @@ along with GCC; see the file COPYING3.
 #define DEF_SANITIZER_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,    \
 	       true, true, true, ATTRS, true, \
-	       (flag_asan || flag_tsan))
+	       (flag_asan || flag_tsan || flag_ubsan))
 
 #undef DEF_CILKPLUS_BUILTIN
 #define DEF_CILKPLUS_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
--- gcc/Makefile.in.mp	2013-06-11 19:51:43.780408801 +0200
+++ gcc/Makefile.in	2013-06-11 19:53:37.710224521 +0200
@@ -1150,7 +1150,7 @@ C_COMMON_OBJS = c-family/c-common.o c-fa
   c-family/c-omp.o c-family/c-opts.o c-family/c-pch.o \
   c-family/c-ppoutput.o c-family/c-pragma.o c-family/c-pretty-print.o \
   c-family/c-semantics.o c-family/c-ada-spec.o tree-mudflap.o \
-  c-family/array-notation-common.o
+  c-family/array-notation-common.o c-family/c-ubsan.o
 
 # Language-independent object files.
 # We put the insn-*.o files first so that a parallel make will build
@@ -2021,6 +2021,9 @@ c-family/array-notation-common.o : c-fam
 c-family/stub-objc.o : c-family/stub-objc.c $(CONFIG_H) $(SYSTEM_H) \
 	coretypes.h $(TREE_H) $(C_COMMON_H) c-family/c-objc.h
 
+c-family/c-ubsan.o : c-family/c-ubsan.c $(CONFIG_H) $(SYSTEM_H) \
+	coretypes.h $(TREE_H) $(C_COMMON_H) c-family/c-ubsan.h
+
 default-c.o: config/default-c.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
   $(C_TARGET_H) $(C_TARGET_DEF_H)
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) \
--- gcc/doc/invoke.texi.mp	2013-06-11 19:51:43.784408831 +0200
+++ gcc/doc/invoke.texi	2013-06-11 19:53:37.761224914 +0200
@@ -5143,6 +5143,11 @@ Memory access instructions will be instr
 data race bugs.
 See @uref{http://code.google.com/p/data-race-test/wiki/ThreadSanitizer} for more details.
 
+@item -fsanitize=undefined
+Enable UndefinedBehaviorSanitizer, a fast undefined behavior detector.
+Various computations will be instrumented to detect
+undefined behavior, e.g.@: division by zero or various overflows.
+
 @item -fdump-final-insns@r{[}=@var{file}@r{]}
 @opindex fdump-final-insns
 Dump the final internal representation (RTL) to @var{file}.  If the
--- gcc/cp/typeck.c.mp	2013-06-11 19:51:43.785408839 +0200
+++ gcc/cp/typeck.c	2013-06-11 19:53:37.747224808 +0200
@@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.
 #include "convert.h"
 #include "c-family/c-common.h"
 #include "c-family/c-objc.h"
+#include "c-family/c-ubsan.h"
 #include "params.h"
 
 static tree pfn_from_ptrmemfunc (tree);
@@ -3867,6 +3868,7 @@ cp_build_binary_op (location_t location,
   tree final_type = 0;
 
   tree result;
+  tree orig_type = NULL;
 
   /* Nonzero if this is an operation like MIN or MAX which can
      safely be computed in short if both args are promoted shorts.
@@ -3891,6 +3893,15 @@ cp_build_binary_op (location_t location,
   op0 = orig_op0;
   op1 = orig_op1;
 
+  /* Remember whether we're doing / or %.  */
+  bool doing_div_or_mod = false;
+
+  /* Remember whether we're doing << or >>.  */
+  bool doing_shift = false;
+
+  /* Tree holding instrumentation expression.  */
+  tree instrument_expr = NULL;
+
   if (code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR
       || code == TRUTH_OR_EXPR || code == TRUTH_ORIF_EXPR
       || code == TRUTH_XOR_EXPR)
@@ -4070,8 +4081,12 @@ cp_build_binary_op (location_t location,
 	{
 	  enum tree_code tcode0 = code0, tcode1 = code1;
 	  tree cop1 = fold_non_dependent_expr_sfinae (op1, tf_none);
+	  cop1 = maybe_constant_value (cop1);
 
-	  warn_for_div_by_zero (location, maybe_constant_value (cop1));
+	  if (!processing_template_decl && tcode0 == INTEGER_TYPE)
+	    doing_div_or_mod = true;
+
+	  warn_for_div_by_zero (location, cop1);
 
 	  if (tcode0 == COMPLEX_TYPE || tcode0 == VECTOR_TYPE)
 	    tcode0 = TREE_CODE (TREE_TYPE (TREE_TYPE (op0)));
@@ -4109,8 +4124,11 @@ cp_build_binary_op (location_t location,
     case FLOOR_MOD_EXPR:
       {
 	tree cop1 = fold_non_dependent_expr_sfinae (op1, tf_none);
+	cop1 = maybe_constant_value (cop1);
 
-	warn_for_div_by_zero (location, maybe_constant_value (cop1));
+	if (!processing_template_decl && code0 == INTEGER_TYPE)
+	  doing_div_or_mod = true;
+	warn_for_div_by_zero (location, cop1);
       }
 
       if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE
@@ -4164,6 +4182,7 @@ cp_build_binary_op (location_t location,
 	  if (TREE_CODE (const_op1) != INTEGER_CST)
 	    const_op1 = op1;
 	  result_type = type0;
+	  doing_shift = true;
 	  if (TREE_CODE (const_op1) == INTEGER_CST)
 	    {
 	      if (tree_int_cst_lt (const_op1, integer_zero_node))
@@ -4211,6 +4230,7 @@ cp_build_binary_op (location_t location,
 	  if (TREE_CODE (const_op1) != INTEGER_CST)
 	    const_op1 = op1;
 	  result_type = type0;
+	  doing_shift = true;
 	  if (TREE_CODE (const_op1) == INTEGER_CST)
 	    {
 	      if (tree_int_cst_lt (const_op1, integer_zero_node))
@@ -4765,8 +4785,9 @@ cp_build_binary_op (location_t location,
 
       if (shorten && none_complex)
 	{
+	  orig_type = result_type;
 	  final_type = result_type;
-	  result_type = shorten_binary_op (result_type, op0, op1, 
+	  result_type = shorten_binary_op (result_type, op0, op1,
 					   shorten == -1);
 	}
 
@@ -4814,6 +4835,31 @@ cp_build_binary_op (location_t location,
 	}
     }
 
+  if (flag_ubsan && doing_div_or_mod && !processing_template_decl)
+    {
+      op0 = maybe_constant_value (fold_non_dependent_expr_sfinae (op0,
+								  tf_none));
+      op1 = maybe_constant_value (fold_non_dependent_expr_sfinae (op1,
+								  tf_none));
+      /* For diagnostics we want to use the promoted types without
+	 shorten_binary_op.  So convert the arguments to the
+	 original result_type.  */
+      if (orig_type != NULL && result_type != orig_type)
+        {
+	  op0 = cp_convert (orig_type, op0, complain);
+	  op1 = cp_convert (orig_type, op1, complain);
+	}
+      instrument_expr = ubsan_instrument_division (location, op0, op1);
+    }
+  else if (flag_ubsan && doing_shift && !processing_template_decl)
+    {
+      op0 = maybe_constant_value (fold_non_dependent_expr_sfinae (op0,
+								  tf_none));
+      op1 = maybe_constant_value (fold_non_dependent_expr_sfinae (op1,
+								  tf_none));
+      instrument_expr = ubsan_instrument_shift (location, code, op0, op1);
+    }
+
   /* If CONVERTED is zero, both args will be converted to type RESULT_TYPE.
      Then the expression will be built.
      It will be given type FINAL_TYPE if that is nonzero;
@@ -4842,6 +4888,10 @@ cp_build_binary_op (location_t location,
       && !TREE_OVERFLOW_P (op1))
     overflow_warning (location, result);
 
+  if (flag_ubsan && instrument_expr != NULL)
+    result = fold_build2 (COMPOUND_EXPR, TREE_TYPE (result),
+			  instrument_expr, result);
+
   return result;
 }
 \f
--- gcc/common.opt.mp	2013-06-11 19:51:43.787408855 +0200
+++ gcc/common.opt	2013-06-11 19:53:37.742224768 +0200
@@ -858,6 +858,10 @@ fsanitize=thread
 Common Report Var(flag_tsan)
 Enable ThreadSanitizer, a data race detector
 
+fsanitize=undefined
+Common Report Var(flag_ubsan)
+Enable UndefinedBehaviorSanitizer, an undefined behavior detector
+
 fasynchronous-unwind-tables
 Common Report Var(flag_asynchronous_unwind_tables) Optimization
 Generate unwind tables that are exact at each instruction boundary
--- gcc/builtin-attrs.def.mp	2013-06-11 19:51:43.791408885 +0200
+++ gcc/builtin-attrs.def	2013-06-11 19:53:37.717224576 +0200
@@ -83,6 +83,7 @@ DEF_LIST_INT_INT (5,6)
 #undef DEF_LIST_INT_INT
 
 /* Construct trees for identifiers.  */
+DEF_ATTR_IDENT (ATTR_COLD, "cold")
 DEF_ATTR_IDENT (ATTR_CONST, "const")
 DEF_ATTR_IDENT (ATTR_FORMAT, "format")
 DEF_ATTR_IDENT (ATTR_FORMAT_ARG, "format_arg")
@@ -130,6 +131,8 @@ DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHRO
 			ATTR_NULL, ATTR_NOTHROW_LIST)
 DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHROW_LEAF_LIST, ATTR_NORETURN,\
 			ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
+DEF_ATTR_TREE_LIST (ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST, ATTR_COLD,\
+			ATTR_NULL, ATTR_NORETURN_NOTHROW_LEAF_LIST)
 DEF_ATTR_TREE_LIST (ATTR_CONST_NORETURN_NOTHROW_LEAF_LIST, ATTR_CONST,\
 			ATTR_NULL, ATTR_NORETURN_NOTHROW_LEAF_LIST)
 DEF_ATTR_TREE_LIST (ATTR_MALLOC_NOTHROW_LIST, ATTR_MALLOC,	\
--- gcc/c/c-typeck.c.mp	2013-06-11 19:51:43.789408869 +0200
+++ gcc/c/c-typeck.c	2013-06-11 19:53:37.737224731 +0200
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.
 #include "gimple.h"
 #include "c-family/c-objc.h"
 #include "c-family/c-common.h"
+#include "c-family/c-ubsan.h"
 
 /* Possible cases of implicit bad conversions.  Used to select
    diagnostic messages in convert_for_assignment.  */
@@ -9527,6 +9528,15 @@ build_binary_op (location_t location, en
      operands to truth-values.  */
   bool boolean_op = false;
 
+  /* Remember whether we're doing / or %.  */
+  bool doing_div_or_mod = false;
+
+  /* Remember whether we're doing << or >>.  */
+  bool doing_shift = false;
+
+  /* Tree holding instrumentation expression.  */
+  tree instrument_expr = NULL;
+
   if (location == UNKNOWN_LOCATION)
     location = input_location;
 
@@ -9728,6 +9738,7 @@ build_binary_op (location_t location, en
     case FLOOR_DIV_EXPR:
     case ROUND_DIV_EXPR:
     case EXACT_DIV_EXPR:
+      doing_div_or_mod = true;
       warn_for_div_by_zero (location, op1);
 
       if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
@@ -9775,6 +9786,7 @@ build_binary_op (location_t location, en
 
     case TRUNC_MOD_EXPR:
     case FLOOR_MOD_EXPR:
+      doing_div_or_mod = true;
       warn_for_div_by_zero (location, op1);
 
       if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE
@@ -9873,6 +9885,7 @@ build_binary_op (location_t location, en
       else if ((code0 == INTEGER_TYPE || code0 == FIXED_POINT_TYPE)
 	  && code1 == INTEGER_TYPE)
 	{
+	  doing_shift = true;
 	  if (TREE_CODE (op1) == INTEGER_CST)
 	    {
 	      if (tree_int_cst_sgn (op1) < 0)
@@ -9925,6 +9938,7 @@ build_binary_op (location_t location, en
       else if ((code0 == INTEGER_TYPE || code0 == FIXED_POINT_TYPE)
 	  && code1 == INTEGER_TYPE)
 	{
+	  doing_shift = true;
 	  if (TREE_CODE (op1) == INTEGER_CST)
 	    {
 	      if (tree_int_cst_sgn (op1) < 0)
@@ -10209,6 +10223,11 @@ build_binary_op (location_t location, en
       return error_mark_node;
     }
 
+  if (flag_ubsan && doing_div_or_mod)
+    instrument_expr = ubsan_instrument_division (location, op0, op1);
+  else if (flag_ubsan && doing_shift)
+    instrument_expr = ubsan_instrument_shift (location, code, op0, op1);
+
   if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE || code0 == COMPLEX_TYPE
        || code0 == FIXED_POINT_TYPE || code0 == VECTOR_TYPE)
       &&
@@ -10492,6 +10511,11 @@ build_binary_op (location_t location, en
   if (semantic_result_type)
     ret = build1 (EXCESS_PRECISION_EXPR, semantic_result_type, ret);
   protected_set_expr_location (ret, location);
+
+  if (flag_ubsan && instrument_expr != NULL)
+    ret = fold_build2 (COMPOUND_EXPR, TREE_TYPE (ret),
+		       instrument_expr, ret);
+
   return ret;
 }
 
--- gcc/asan.c.mp	2013-06-11 19:51:43.793408901 +0200
+++ gcc/asan.c	2013-06-11 19:53:37.713224545 +0200
@@ -2034,6 +2034,9 @@ initialize_sanitizer_builtins (void)
   tree BT_FN_VOID = build_function_type_list (void_type_node, NULL_TREE);
   tree BT_FN_VOID_PTR
     = build_function_type_list (void_type_node, ptr_type_node, NULL_TREE);
+  tree BT_FN_VOID_PTR_PTR_PTR
+    = build_function_type_list (void_type_node, ptr_type_node,
+				ptr_type_node, ptr_type_node, NULL_TREE);
   tree BT_FN_VOID_PTR_PTRMODE
     = build_function_type_list (void_type_node, ptr_type_node,
 				build_nonstandard_integer_type (POINTER_SIZE,
@@ -2099,6 +2102,9 @@ initialize_sanitizer_builtins (void)
 #undef ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST
 #define ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST \
   ECF_TM_PURE | ATTR_NORETURN_NOTHROW_LEAF_LIST
+#undef ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST
+#define ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST \
+  /* ECF_COLD missing */ ATTR_NORETURN_NOTHROW_LEAF_LIST
 #undef DEF_SANITIZER_BUILTIN
 #define DEF_SANITIZER_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
   decl = add_builtin_function ("__builtin_" NAME, TYPE, ENUM,		\

   	Marek

* Re: [RFC] Implement Undefined Behavior Sanitizer (take 2)
  2013-06-11 18:44         ` Marek Polacek
@ 2013-06-11 19:14           ` Marc Glisse
  2013-06-11 19:44             ` Marek Polacek
  0 siblings, 1 reply; 46+ messages in thread
From: Marc Glisse @ 2013-06-11 19:14 UTC (permalink / raw)
  To: Marek Polacek; +Cc: Jakub Jelinek, GCC Patches

Hello,

a couple of comments (not a true review)

On Tue, 11 Jun 2013, Marek Polacek wrote:

> +tree
> +ubsan_instrument_division (location_t loc, tree op0, tree op1)
> +{
> +  tree t, tt;
> +  tree type0 = TREE_TYPE (op0);
> +  tree type1 = TREE_TYPE (op1);

Can the 2 types be different? I thought divisions had homogeneous 
arguments, and the instrumentation was done late enough to avoid any 
potential issue, but maybe not...

> +  tree type1_zero_cst = build_int_cst (type1, 0);

It is a bit funny to do that before the following test ;-)

> +  if (TREE_CODE (type0) != INTEGER_TYPE
> +      || TREE_CODE (type1) != INTEGER_TYPE)
> +    return NULL_TREE;
> +
> +  /* If we *know* that the divisor is not -1 or 0, we don't have to
> +     instrument this expression.
> +     ??? We could use decl_constant_value to cover more cases.  */
> +  if (TREE_CODE (op1) == INTEGER_CST
> +      && integer_nonzerop (op1)
> +      && !integer_minus_onep (op1))
> +    return NULL_TREE;
> +
> +  /* We check INT_MIN / -1 only for signed types.  */
> +  if (!TYPE_UNSIGNED (type0) && !TYPE_UNSIGNED (type1))
> +    {
> +      tt = fold_build2 (EQ_EXPR, boolean_type_node, op1,
> +			build_int_cst (type1, -1));
> +      t = fold_build2 (EQ_EXPR, boolean_type_node, op0,
> +		       TYPE_MIN_VALUE (type0));
> +      t = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, t, tt);
> +    }
> +  else
> +    t = type1_zero_cst;
> +  tt = fold_build2 (EQ_EXPR, boolean_type_node,
> +		    op1, type1_zero_cst);
> +  t = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, tt, t);

If you wrote the comparison with 0 first, you could put the OR in the 
signed branch instead of relying on folding |0, no?

> +  tt = builtin_decl_explicit (BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW);
> +  tt = build_call_expr_loc (loc, tt, 0);
> +  t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node);
> +
> +  return t;
> +}
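
A minimal C sketch of the ordering suggested above, written as plain runtime
code rather than GCC trees.  The function names are hypothetical and not part
of the patch, and assert merely stands in for the call to the runtime handler.
The divisor is tested against zero first, and the INT_MIN / -1 test is ORed in
only on the signed path, so the unsigned path needs no "| 0" to fold away:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true when op0 / op1 (or op0 % op1) on int
   would be undefined.  */
bool
signed_div_is_undefined (int op0, int op1)
{
  /* Comparison with zero comes first...  */
  bool t = (op1 == 0);
  /* ...and the overflow case is ORed in only for signed types.  */
  t = t || (op0 == INT_MIN && op1 == -1);
  return t;
}

/* Hypothetical helper: for unsigned types only a zero divisor matters.  */
bool
unsigned_div_is_undefined (unsigned op0, unsigned op1)
{
  (void) op0;
  return op1 == 0;
}

/* Division guarded by the check; assert stands in for the handler.  */
int
checked_div (int op0, int op1)
{
  assert (!signed_div_is_undefined (op0, op1));
  return op0 / op1;
}
```

With this shape, checked_div (INT_MIN, -1) and checked_div (1, 0) both trip
the guard, while checked_div (7, -1) goes through.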

-- 
Marc Glisse

* Re: [RFC] Implement Undefined Behavior Sanitizer (take 2)
  2013-06-11 19:14           ` Marc Glisse
@ 2013-06-11 19:44             ` Marek Polacek
  2013-06-11 20:09               ` Jakub Jelinek
  0 siblings, 1 reply; 46+ messages in thread
From: Marek Polacek @ 2013-06-11 19:44 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek

Hi!

On Tue, Jun 11, 2013 at 09:14:36PM +0200, Marc Glisse wrote:
> Hello,
> 
> a couple of comments (not a true review)

Thanks anyway ;).

> On Tue, 11 Jun 2013, Marek Polacek wrote:
> 
> >+tree
> >+ubsan_instrument_division (location_t loc, tree op0, tree op1)
> >+{
> >+  tree t, tt;
> >+  tree type0 = TREE_TYPE (op0);
> >+  tree type1 = TREE_TYPE (op1);
> 
> Can the 2 types be different? I thought divisions had homogeneous
> arguments, and the instrumentation was done late enough to avoid any
> potential issue, but maybe not...

Yeah, they can; they only have to be of arithmetic type.

> >+  tree type1_zero_cst = build_int_cst (type1, 0);
> 
> It is a bit funny to do that before the following test ;-)

Perhaps, yes.  Moved it...

> >+  if (TREE_CODE (type0) != INTEGER_TYPE
> >+      || TREE_CODE (type1) != INTEGER_TYPE)
> >+    return NULL_TREE;

..here.

> >+  /* We check INT_MIN / -1 only for signed types.  */
> >+  if (!TYPE_UNSIGNED (type0) && !TYPE_UNSIGNED (type1))
> >+    {
> >+      tt = fold_build2 (EQ_EXPR, boolean_type_node, op1,
> >+			build_int_cst (type1, -1));
> >+      t = fold_build2 (EQ_EXPR, boolean_type_node, op0,
> >+		       TYPE_MIN_VALUE (type0));
> >+      t = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, t, tt);
> >+    }
> >+  else
> >+    t = type1_zero_cst;
> >+  tt = fold_build2 (EQ_EXPR, boolean_type_node,
> >+		    op1, type1_zero_cst);
> >+  t = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, tt, t);
> 
> If you wrote the comparison with 0 first, you could put the OR in
> the signed branch instead of relying on folding |0, no?

Duh, indeed.  Will adjust.  Thanks!

	Marek

* Re: [RFC] Implement Undefined Behavior Sanitizer (take 2)
  2013-06-11 19:44             ` Marek Polacek
@ 2013-06-11 20:09               ` Jakub Jelinek
  2013-06-11 20:20                 ` Marek Polacek
  0 siblings, 1 reply; 46+ messages in thread
From: Jakub Jelinek @ 2013-06-11 20:09 UTC (permalink / raw)
  To: Marek Polacek; +Cc: gcc-patches

On Tue, Jun 11, 2013 at 09:44:40PM +0200, Marek Polacek wrote:
> > >+  tree type0 = TREE_TYPE (op0);
> > >+  tree type1 = TREE_TYPE (op1);
> > 
> > Can the 2 types be different? I thought divisions had homogeneous
> > arguments, and the instrumentation was done late enough to avoid any
> > potential issue, but maybe not...
> 
> Yeah, they can; they only have to be of arithmetic type.

Nope, if this is after conversion to result_type (resp. orig_type),
then they both have result_type resp. orig_type type.

Shift is different; there the two arguments can have different types.
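
To illustrate the difference: in C the result type of a shift is that of the
promoted left operand, and the shift count keeps its own type.  A sketch of a
count check for an unsigned long value shifted by an int count follows.  The
names are hypothetical, assert stands in for the runtime handler, and the
precision is assumed to be sizeof * CHAR_BIT (no padding bits):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true when COUNT is negative or not smaller than
   the precision of the left operand (unsigned long here).  */
bool
shift_count_out_of_bounds (int count)
{
  unsigned prec_m1 = (unsigned) (sizeof (unsigned long) * CHAR_BIT) - 1u;
  /* Converting the count to unsigned folds the "count < 0" test into
     the single "count > precision - 1" comparison.  */
  return (unsigned) count > prec_m1;
}

/* Left operand and count deliberately have different types.  */
unsigned long
checked_shl (unsigned long value, int count)
{
  assert (!shift_count_out_of_bounds (count));
  return value << count;
}
```

The bound comes from the left operand's type, not from the type of the count,
which is why the two operands cannot simply be converted to a common type as
for division.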

	Jakub

* Re: [RFC] Implement Undefined Behavior Sanitizer (take 2)
  2013-06-11 20:09               ` Jakub Jelinek
@ 2013-06-11 20:20                 ` Marek Polacek
  2013-06-11 20:33                   ` Jakub Jelinek
  0 siblings, 1 reply; 46+ messages in thread
From: Marek Polacek @ 2013-06-11 20:20 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches

On Tue, Jun 11, 2013 at 10:09:00PM +0200, Jakub Jelinek wrote:
> On Tue, Jun 11, 2013 at 09:44:40PM +0200, Marek Polacek wrote:
> > > >+  tree type0 = TREE_TYPE (op0);
> > > >+  tree type1 = TREE_TYPE (op1);
> > > 
> > > Can the 2 types be different? I thought divisions had homogeneous
> > > arguments, and the instrumentation was done late enough to avoid any
> > > potential issue, but maybe not...
> > 
> > Yeah, they can; they only have to be of arithmetic type.
> 
> Nope, if this is after conversion to result_type (resp. orig_type),
> then they both have result_type resp. orig_type type.
> 
> Shift is different; there the two arguments can have different types.

But currently I'm cp_convert-ing the arguments to orig_type only if
we were performing the shortening which changed the result_type.
If, with the current patch, I put debug_tree (type0); debug_tree (type1);
into ubsan_instrument_division, I see different types (int vs.
unsigned int etc.).

	Marek

* Re: [RFC] Implement Undefined Behavior Sanitizer (take 2)
  2013-06-11 20:20                 ` Marek Polacek
@ 2013-06-11 20:33                   ` Jakub Jelinek
  2013-06-11 20:40                     ` Marek Polacek
  0 siblings, 1 reply; 46+ messages in thread
From: Jakub Jelinek @ 2013-06-11 20:33 UTC (permalink / raw)
  To: Marek Polacek; +Cc: gcc-patches

On Tue, Jun 11, 2013 at 10:20:24PM +0200, Marek Polacek wrote:
> On Tue, Jun 11, 2013 at 10:09:00PM +0200, Jakub Jelinek wrote:
> > On Tue, Jun 11, 2013 at 09:44:40PM +0200, Marek Polacek wrote:
> > > > >+  tree type0 = TREE_TYPE (op0);
> > > > >+  tree type1 = TREE_TYPE (op1);
> > > > 
> > > > Can the 2 types be different? I thought divisions had homogeneous
> > > > arguments, and the instrumentation was done late enough to avoid any
> > > > potential issue, but maybe not...
> > > 
> > > Yeah, they can; they only have to be of arithmetic type.
> > 
> > Nope, if this is after conversion to result_type (resp. orig_type),
> > then they both have result_type resp. orig_type type.
> > 
> > Shift is different; there the two arguments can have different types.
> 
> But currently I'm cp_convert-ing the arguments to orig_type only if
> we were performing the shortening which changed the result_type.
> If, with the current patch, I put debug_tree (type0); debug_tree (type1);
> into ubsan_instrument_division, I see different types (int vs.
> unsigned int etc.).

That means you probably should move the function call down in
cp_build_binary_op (resp. C counterpart), after the arguments are converted
to result_type?
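
A small C illustration of why the check belongs after the conversion (the
function name is hypothetical; assert stands in for the runtime handler).
When an int is divided by an unsigned int, the usual arithmetic conversions
make the whole division unsigned, so only the zero-divisor test applies and
there is no INT_MIN / -1 case to look for:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical example: A is converted to unsigned before the division,
   exactly as the language requires for int / unsigned int.  */
unsigned
mixed_div (int a, unsigned b)
{
  /* In the converted (unsigned) type only b == 0 is undefined.  */
  assert (b != 0);
  return a / b;
}
```

For instance, mixed_div (INT_MIN, UINT_MAX) is well defined and yields 0,
although the bit pattern of the divisor would read as -1 if it were
interpreted as a signed int.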

	Jakub

* Re: [RFC] Implement Undefined Behavior Sanitizer (take 2)
  2013-06-11 20:33                   ` Jakub Jelinek
@ 2013-06-11 20:40                     ` Marek Polacek
  2013-06-11 20:44                       ` Jakub Jelinek
  0 siblings, 1 reply; 46+ messages in thread
From: Marek Polacek @ 2013-06-11 20:40 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches

On Tue, Jun 11, 2013 at 10:33:25PM +0200, Jakub Jelinek wrote:
> That means you probably should move the function call down in
> cp_build_binary_op (resp. C counterpart), after the arguments are converted
> to result_type?

Ok, certainly.  Seems the arguments are converted here:

  if (! converted)
    {
      if (TREE_TYPE (op0) != result_type)
	op0 = cp_convert_and_check (result_type, op0, complain);
      if (TREE_TYPE (op1) != result_type)
	op1 = cp_convert_and_check (result_type, op1, complain);

      if (op0 == error_mark_node || op1 == error_mark_node)
	return error_mark_node;
    }

I'll move the instrumentation after the hunk above.  And then
in ubsan_instrument_division I might want to have just
tree type = TREE_TYPE (op0);, maybe together with an assert like
gcc_assert (TREE_TYPE (op0) == TREE_TYPE (op1)).

	Marek

* Re: [RFC] Implement Undefined Behavior Sanitizer (take 2)
  2013-06-11 20:40                     ` Marek Polacek
@ 2013-06-11 20:44                       ` Jakub Jelinek
  2013-06-11 20:52                         ` Marek Polacek
  2013-06-12 13:48                         ` Marek Polacek
  0 siblings, 2 replies; 46+ messages in thread
From: Jakub Jelinek @ 2013-06-11 20:44 UTC (permalink / raw)
  To: Marek Polacek; +Cc: gcc-patches

On Tue, Jun 11, 2013 at 10:40:12PM +0200, Marek Polacek wrote:
> On Tue, Jun 11, 2013 at 10:33:25PM +0200, Jakub Jelinek wrote:
> > That means you probably should move the function call down in
> > cp_build_binary_op (resp. C counterpart), after the arguments are converted
> > to result_type?
> 
> Ok, certainly.  Seems the arguments are converted here:
> 
>   if (! converted)
>     {
>       if (TREE_TYPE (op0) != result_type)
> 	op0 = cp_convert_and_check (result_type, op0, complain);
>       if (TREE_TYPE (op1) != result_type)
> 	op1 = cp_convert_and_check (result_type, op1, complain);
> 
>       if (op0 == error_mark_node || op1 == error_mark_node)
> 	return error_mark_node;
>     }
> 
> I'll move the instrumentation after the hunk above.  And then
> in ubsan_instrument_division I might want to have just
> tree type = TREE_TYPE (op0);, maybe together with an assert like
> gcc_assert (TREE_TYPE (op0) == TREE_TYPE (op1)).

There is another thing to solve BTW: op0 and/or op1 might have side effects.
If you are going to evaluate them more than once, they need to be wrapped
in cp_save_expr or c_save_expr, respectively.
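
The problem can be reproduced in plain C, outside the compiler.  In this
sketch (all names are hypothetical) the macro plays the role of an
instrumented expression whose operand appears both in the check and in the
operation, and the function plays the role of a SAVE_EXPR-style wrapper that
evaluates the operand once and reuses the value.  Only the zero-divisor check
is shown:

```c
#include <assert.h>

/* Counts how often the operand expression is evaluated.  */
int calls;

/* An operand with a side effect: returns 1, 2, 3, ... on successive calls.  */
int
next_divisor (void)
{
  return ++calls;
}

/* Naive instrumentation: B appears twice, so it is evaluated twice.  */
#define CHECKED_DIV_NAIVE(a, b) (((b) == 0) ? 0 : (a) / (b))

/* Saved operands: A and B are evaluated once by the caller, and both the
   check and the division reuse those values.  */
int
checked_div_saved (int a, int b)
{
  return (b == 0) ? 0 : a / b;
}

void
demo (void)
{
  calls = 0;
  int naive = CHECKED_DIV_NAIVE (10, next_divisor ());
  /* The check saw 1, but the division used 2.  */
  assert (calls == 2 && naive == 5);

  calls = 0;
  int saved = checked_div_saved (10, next_divisor ());
  /* One evaluation; check and division agree on the value 1.  */
  assert (calls == 1 && saved == 10);
}
```

In the naive form the check and the division do not even see the same
divisor, so the instrumentation can both duplicate the side effect and test
the wrong value.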

	Jakub

* Re: [RFC] Implement Undefined Behavior Sanitizer (take 2)
  2013-06-11 20:44                       ` Jakub Jelinek
@ 2013-06-11 20:52                         ` Marek Polacek
  2013-06-12 13:48                         ` Marek Polacek
  1 sibling, 0 replies; 46+ messages in thread
From: Marek Polacek @ 2013-06-11 20:52 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches

On Tue, Jun 11, 2013 at 10:44:12PM +0200, Jakub Jelinek wrote:
> There is another thing to solve BTW: op0 and/or op1 might have side effects.
> If you are going to evaluate them more than once, they need to be wrapped
> in cp_save_expr or c_save_expr, respectively.

I see.  Thanks for the notice.

	Marek

* Re: [RFC] Implement Undefined Behavior Sanitizer (take 2)
  2013-06-11 20:44                       ` Jakub Jelinek
  2013-06-11 20:52                         ` Marek Polacek
@ 2013-06-12 13:48                         ` Marek Polacek
  2013-06-12 13:52                           ` Jakub Jelinek
  1 sibling, 1 reply; 46+ messages in thread
From: Marek Polacek @ 2013-06-12 13:48 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches

On Tue, Jun 11, 2013 at 10:44:12PM +0200, Jakub Jelinek wrote:
> There is another thing to solve BTW: op0 and/or op1 might have side effects.
> If you are going to evaluate them more than once, they need to be wrapped
> in cp_save_expr or c_save_expr, respectively.

There's that unpleasant thing that cp_save_expr is declared in
cp/cp-tree.h, but we don't want to include cp/*.h or c/*.h files
in c-family/c-ubsan.c.  Should I use save_expr from tree.c instead?
I seem to recall that that isn't the best thing to do...

	Marek

* Re: [RFC] Implement Undefined Behavior Sanitizer (take 2)
  2013-06-12 13:48                         ` Marek Polacek
@ 2013-06-12 13:52                           ` Jakub Jelinek
  2013-06-12 15:17                             ` Marek Polacek
  0 siblings, 1 reply; 46+ messages in thread
From: Jakub Jelinek @ 2013-06-12 13:52 UTC (permalink / raw)
  To: Marek Polacek; +Cc: gcc-patches

On Wed, Jun 12, 2013 at 03:48:24PM +0200, Marek Polacek wrote:
> On Tue, Jun 11, 2013 at 10:44:12PM +0200, Jakub Jelinek wrote:
> > There is another thing to solve BTW: op0 and/or op1 might have side effects.
> > If you are going to evaluate them more than once, they need to be wrapped
> > in cp_save_expr or c_save_expr, respectively.
> 
> There's that unpleasant thing that cp_save_expr is declared in
> cp/cp-tree.h, but we don't want to include cp/*.h or c/*.h files
> in c-family/c-ubsan.c.  Should I use save_expr from tree.c instead?
> I seem to recall that that isn't the best thing to do...

No, you really need to use cp_save_expr/c_save_expr; the C variant in
particular also fully folds the expression, for example.  You want to call
that in cp_build_binary_op etc., also because you want both the
instrument_expr itself and the original binary expression to use the
SAVE_EXPRs if they are created.

	Jakub

* Re: [RFC] Implement Undefined Behavior Sanitizer (take 2)
  2013-06-12 13:52                           ` Jakub Jelinek
@ 2013-06-12 15:17                             ` Marek Polacek
  2013-06-12 15:29                               ` Jakub Jelinek
  0 siblings, 1 reply; 46+ messages in thread
From: Marek Polacek @ 2013-06-12 15:17 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches, Marc Glisse, Jason Merrill

On Wed, Jun 12, 2013 at 03:52:08PM +0200, Jakub Jelinek wrote:
> No, you really need to use cp_save_expr/c_save_expr; the C variant in
> particular also fully folds the expression, for example.  You want to call
> that in cp_build_binary_op etc., also because you want both the
> instrument_expr itself and the original binary expression to use the
> SAVE_EXPRs if they are created.

I see.  Here's a somewhat tweaked version; it uses
c_save_expr/cp_save_expr and contains a few fixes suggested by Marc.
How does it look now?  Jason, does the cp/typeck.c part look sane?
Thanks.

2013-06-12  Marek Polacek  <polacek@redhat.com>

	* Makefile.in: Add ubsan.c.
	* common.opt: Add -fsanitize=undefined option.
	* doc/invoke.texi: Document the new flag.
	* sanitizer.def (DEF_SANITIZER_BUILTIN): Define.
	* builtin-attrs.def (ATTR_COLD): Define.
	* asan.c (initialize_sanitizer_builtins): Build
	BT_FN_VOID_PTR_PTR_PTR.
	* builtins.def (BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW,
	BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS): Define.

c-family/
	* c-ubsan.c: New file.
	* c-ubsan.h: New file.

cp/
	* typeck.c (cp_build_binary_op): Add division by zero and shift
	instrumentation.

c/
	* c-typeck.c (build_binary_op): Add division by zero and shift
	instrumentation.

--- gcc/c-family/c-ubsan.c.mp	2013-06-11 19:51:55.555492466 +0200
+++ gcc/c-family/c-ubsan.c	2013-06-12 17:05:20.800370083 +0200
@@ -0,0 +1,127 @@
+/* UndefinedBehaviorSanitizer, undefined behavior detector.
+   Copyright (C) 2013 Free Software Foundation, Inc.
+   Contributed by Marek Polacek <polacek@redhat.com>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tree.h"
+#include "c-family/c-common.h"
+#include "c-family/c-ubsan.h"
+
+/* Instrument division by zero and INT_MIN / -1.  If not instrumenting,
+   return NULL_TREE.  */
+
+tree
+ubsan_instrument_division (location_t loc, tree op0, tree op1)
+{
+  tree t, tt;
+  tree type = TREE_TYPE (op0);
+
+  /* At this point both operands should have the same type,
+     because they are already converted to RESULT_TYPE.  */
+  gcc_assert (type == TREE_TYPE (op1));
+
+  if (TREE_CODE (type) != INTEGER_TYPE)
+    return NULL_TREE;
+
+  /* If we *know* that the divisor is not -1 or 0, we don't have to
+     instrument this expression.
+     ??? We could use decl_constant_value to cover more cases.  */
+  if (TREE_CODE (op1) == INTEGER_CST
+      && integer_nonzerop (op1)
+      && !integer_minus_onep (op1))
+    return NULL_TREE;
+
+  t = fold_build2 (EQ_EXPR, boolean_type_node,
+		    op1, build_int_cst (type, 0));
+
+  /* We check INT_MIN / -1 only for signed types.  */
+  if (!TYPE_UNSIGNED (type))
+    {
+      tree x;
+      tt = fold_build2 (EQ_EXPR, boolean_type_node, op1,
+			build_int_cst (type, -1));
+      x = fold_build2 (EQ_EXPR, boolean_type_node, op0,
+		       TYPE_MIN_VALUE (type));
+      x = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, x, tt);
+      t = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, t, x);
+    }
+  tt = builtin_decl_explicit (BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW);
+  tt = build_call_expr_loc (loc, tt, 0);
+  t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node);
+
+  return t;
+}
+
+/* Instrument left and right shifts.  If not instrumenting, return
+   NULL_TREE.  */
+
+tree
+ubsan_instrument_shift (location_t loc, enum tree_code code,
+			tree op0, tree op1)
+{
+  tree t, tt = NULL_TREE;
+  tree op1_utype = unsigned_type_for (TREE_TYPE (op1));
+  HOST_WIDE_INT op0_prec = TYPE_PRECISION (TREE_TYPE (op0));
+  tree uprecm1 = build_int_cst (op1_utype, op0_prec - 1);
+  tree precm1 = build_int_cst (TREE_TYPE (op1), op0_prec - 1);
+
+  t = fold_convert_loc (loc, op1_utype, op1);
+  t = fold_build2 (GT_EXPR, boolean_type_node, t, uprecm1);
+
+  /* For signed x << y, in C99/C11, the following:
+     (unsigned) x >> (precm1 - y)
+     if non-zero, is undefined.  */
+  if (code == LSHIFT_EXPR
+      && !TYPE_UNSIGNED (TREE_TYPE (op0))
+      && flag_isoc99)
+    {
+      tree x = fold_build2 (MINUS_EXPR, integer_type_node, precm1, op1);
+      tt = fold_convert_loc (loc, unsigned_type_for (TREE_TYPE (op0)), op0);
+      tt = fold_build2 (RSHIFT_EXPR, TREE_TYPE (tt), tt, x);
+      tt = fold_build2 (NE_EXPR, boolean_type_node, tt,
+			build_int_cst (TREE_TYPE (tt), 0));
+    }
+
+  /* For signed x << y, in C++11/C++14, the following:
+     x < 0 || ((unsigned) x >> (precm1 - y))
+     if > 1, is undefined.  */
+  if (code == LSHIFT_EXPR
+      && !TYPE_UNSIGNED (TREE_TYPE (op0))
+      && (cxx_dialect == cxx11 || cxx_dialect == cxx1y))
+    {
+      tree x = fold_build2 (MINUS_EXPR, integer_type_node, precm1, op1);
+      tt = fold_convert_loc (loc, unsigned_type_for (TREE_TYPE (op0)), op0);
+      tt = fold_build2 (RSHIFT_EXPR, TREE_TYPE (tt), tt, x);
+      tt = fold_build2 (GT_EXPR, boolean_type_node, tt,
+			build_int_cst (TREE_TYPE (tt), 1));
+      x = fold_build2 (LT_EXPR, boolean_type_node, op0,
+		       build_int_cst (TREE_TYPE (op0), 0));
+      tt = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, x, tt);
+    }
+
+  t = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, t,
+		   tt ? tt : integer_zero_node);
+  tt = builtin_decl_explicit (BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS);
+  tt = build_call_expr_loc (loc, tt, 0);
+  t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node);
+
+  return t;
+}
--- gcc/c-family/c-ubsan.h.mp	2013-06-11 19:51:50.616457500 +0200
+++ gcc/c-family/c-ubsan.h	2013-06-11 16:51:38.297942275 +0200
@@ -0,0 +1,27 @@
+/* UndefinedBehaviorSanitizer, undefined behavior detector.
+   Copyright (C) 2013 Free Software Foundation, Inc.
+   Contributed by Marek Polacek <polacek@redhat.com>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_UBSAN_H
+#define GCC_UBSAN_H
+
+extern tree ubsan_instrument_division (location_t, tree, tree);
+extern tree ubsan_instrument_shift (location_t, enum tree_code, tree, tree);
+
+#endif  /* GCC_UBSAN_H  */
--- gcc/sanitizer.def.mp	2013-06-11 19:51:43.781408808 +0200
+++ gcc/sanitizer.def	2013-06-11 19:53:37.768224970 +0200
@@ -283,3 +283,13 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOM
 DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC_SIGNAL_FENCE,
 		      "__tsan_atomic_signal_fence",
 		      BT_FN_VOID_INT, ATTR_NOTHROW_LEAF_LIST)
+
+/* Undefined Behavior Sanitizer */
+DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW,
+		      "__ubsan_handle_divrem_overflow",
+		      BT_FN_VOID_PTR_PTR_PTR,
+		      ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS,
+		      "__ubsan_handle_shift_out_of_bounds",
+		      BT_FN_VOID_PTR_PTR_PTR,
+		      ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
--- gcc/builtins.def.mp	2013-06-11 19:51:43.790408877 +0200
+++ gcc/builtins.def	2013-06-11 19:53:37.721224606 +0200
@@ -155,7 +155,7 @@ along with GCC; see the file COPYING3.
 #define DEF_SANITIZER_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,    \
 	       true, true, true, ATTRS, true, \
-	       (flag_asan || flag_tsan))
+	       (flag_asan || flag_tsan || flag_ubsan))
 
 #undef DEF_CILKPLUS_BUILTIN
 #define DEF_CILKPLUS_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
--- gcc/Makefile.in.mp	2013-06-11 19:51:43.780408801 +0200
+++ gcc/Makefile.in	2013-06-11 19:53:37.710224521 +0200
@@ -1150,7 +1150,7 @@ C_COMMON_OBJS = c-family/c-common.o c-fa
   c-family/c-omp.o c-family/c-opts.o c-family/c-pch.o \
   c-family/c-ppoutput.o c-family/c-pragma.o c-family/c-pretty-print.o \
   c-family/c-semantics.o c-family/c-ada-spec.o tree-mudflap.o \
-  c-family/array-notation-common.o
+  c-family/array-notation-common.o c-family/c-ubsan.o
 
 # Language-independent object files.
 # We put the insn-*.o files first so that a parallel make will build
@@ -2021,6 +2021,9 @@ c-family/array-notation-common.o : c-fam
 c-family/stub-objc.o : c-family/stub-objc.c $(CONFIG_H) $(SYSTEM_H) \
 	coretypes.h $(TREE_H) $(C_COMMON_H) c-family/c-objc.h
 
+c-family/c-ubsan.o : c-family/c-ubsan.c $(CONFIG_H) $(SYSTEM_H) \
+	coretypes.h $(TREE_H) $(C_COMMON_H) c-family/c-ubsan.h
+
 default-c.o: config/default-c.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
   $(C_TARGET_H) $(C_TARGET_DEF_H)
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) \
--- gcc/doc/invoke.texi.mp	2013-06-11 19:51:43.784408831 +0200
+++ gcc/doc/invoke.texi	2013-06-11 19:53:37.761224914 +0200
@@ -5143,6 +5143,11 @@ Memory access instructions will be instr
 data race bugs.
 See @uref{http://code.google.com/p/data-race-test/wiki/ThreadSanitizer} for more details.
 
+@item -fsanitize=undefined
+Enable UndefinedBehaviorSanitizer, a fast undefined behavior detector.
+Various computations will be instrumented to detect
+undefined behavior, e.g.@: division by zero or various overflows.
+
 @item -fdump-final-insns@r{[}=@var{file}@r{]}
 @opindex fdump-final-insns
 Dump the final internal representation (RTL) to @var{file}.  If the
--- gcc/cp/typeck.c.mp	2013-06-11 19:51:43.785408839 +0200
+++ gcc/cp/typeck.c	2013-06-12 17:03:19.635943275 +0200
@@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.
 #include "convert.h"
 #include "c-family/c-common.h"
 #include "c-family/c-objc.h"
+#include "c-family/c-ubsan.h"
 #include "params.h"
 
 static tree pfn_from_ptrmemfunc (tree);
@@ -3867,6 +3868,7 @@ cp_build_binary_op (location_t location,
   tree final_type = 0;
 
   tree result;
+  tree orig_type = NULL;
 
   /* Nonzero if this is an operation like MIN or MAX which can
      safely be computed in short if both args are promoted shorts.
@@ -3891,6 +3893,15 @@ cp_build_binary_op (location_t location,
   op0 = orig_op0;
   op1 = orig_op1;
 
+  /* Remember whether we're doing / or %.  */
+  bool doing_div_or_mod = false;
+
+  /* Remember whether we're doing << or >>.  */
+  bool doing_shift = false;
+
+  /* Tree holding instrumentation expression.  */
+  tree instrument_expr = NULL;
+
   if (code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR
       || code == TRUTH_OR_EXPR || code == TRUTH_ORIF_EXPR
       || code == TRUTH_XOR_EXPR)
@@ -4070,8 +4081,12 @@ cp_build_binary_op (location_t location,
 	{
 	  enum tree_code tcode0 = code0, tcode1 = code1;
 	  tree cop1 = fold_non_dependent_expr_sfinae (op1, tf_none);
+	  cop1 = maybe_constant_value (cop1);
 
-	  warn_for_div_by_zero (location, maybe_constant_value (cop1));
+	  if (!processing_template_decl && tcode0 == INTEGER_TYPE)
+	    doing_div_or_mod = true;
+
+	  warn_for_div_by_zero (location, cop1);
 
 	  if (tcode0 == COMPLEX_TYPE || tcode0 == VECTOR_TYPE)
 	    tcode0 = TREE_CODE (TREE_TYPE (TREE_TYPE (op0)));
@@ -4109,8 +4124,11 @@ cp_build_binary_op (location_t location,
     case FLOOR_MOD_EXPR:
       {
 	tree cop1 = fold_non_dependent_expr_sfinae (op1, tf_none);
+	cop1 = maybe_constant_value (cop1);
 
-	warn_for_div_by_zero (location, maybe_constant_value (cop1));
+	if (!processing_template_decl && code0 == INTEGER_TYPE)
+	  doing_div_or_mod = true;
+	warn_for_div_by_zero (location, cop1);
       }
 
       if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE
@@ -4164,6 +4182,7 @@ cp_build_binary_op (location_t location,
 	  if (TREE_CODE (const_op1) != INTEGER_CST)
 	    const_op1 = op1;
 	  result_type = type0;
+	  doing_shift = true;
 	  if (TREE_CODE (const_op1) == INTEGER_CST)
 	    {
 	      if (tree_int_cst_lt (const_op1, integer_zero_node))
@@ -4211,6 +4230,7 @@ cp_build_binary_op (location_t location,
 	  if (TREE_CODE (const_op1) != INTEGER_CST)
 	    const_op1 = op1;
 	  result_type = type0;
+	  doing_shift = true;
 	  if (TREE_CODE (const_op1) == INTEGER_CST)
 	    {
 	      if (tree_int_cst_lt (const_op1, integer_zero_node))
@@ -4765,8 +4785,9 @@ cp_build_binary_op (location_t location,
 
       if (shorten && none_complex)
 	{
+	  orig_type = result_type;
 	  final_type = result_type;
-	  result_type = shorten_binary_op (result_type, op0, op1, 
+	  result_type = shorten_binary_op (result_type, op0, op1,
 					   shorten == -1);
 	}
 
@@ -4832,6 +4853,31 @@ cp_build_binary_op (location_t location,
   if (build_type == NULL_TREE)
     build_type = result_type;
 
+  if (flag_ubsan && !processing_template_decl)
+    {
+      /* OP0 and/or OP1 might have side-effects.  */
+      op0 = cp_save_expr (op0);
+      op1 = cp_save_expr (op1);
+      op0 = maybe_constant_value (fold_non_dependent_expr_sfinae (op0,
+								  tf_none));
+      op1 = maybe_constant_value (fold_non_dependent_expr_sfinae (op1,
+								  tf_none));
+      if (doing_div_or_mod)
+	{
+	  /* For diagnostics we want to use the promoted types without
+	     shorten_binary_op.  So convert the arguments to the
+	     original result_type.  */
+	  if (orig_type != NULL && result_type != orig_type)
+	    {
+	      op0 = cp_convert (orig_type, op0, complain);
+	      op1 = cp_convert (orig_type, op1, complain);
+	    }
+	  instrument_expr = ubsan_instrument_division (location, op0, op1);
+	}
+      else if (doing_shift)
+	instrument_expr = ubsan_instrument_shift (location, code, op0, op1);
+    }
+
   result = build2 (resultcode, build_type, op0, op1);
   result = fold_if_not_in_template (result);
   if (final_type != 0)
@@ -4842,6 +4888,10 @@ cp_build_binary_op (location_t location,
       && !TREE_OVERFLOW_P (op1))
     overflow_warning (location, result);
 
+  if (flag_ubsan && instrument_expr != NULL)
+    result = fold_build2 (COMPOUND_EXPR, TREE_TYPE (result),
+			  instrument_expr, result);
+
   return result;
 }
 \f
--- gcc/common.opt.mp	2013-06-11 19:51:43.787408855 +0200
+++ gcc/common.opt	2013-06-11 19:53:37.742224768 +0200
@@ -858,6 +858,10 @@ fsanitize=thread
 Common Report Var(flag_tsan)
 Enable ThreadSanitizer, a data race detector
 
+fsanitize=undefined
+Common Report Var(flag_ubsan)
+Enable UndefinedBehaviorSanitizer, an undefined behavior detector
+
 fasynchronous-unwind-tables
 Common Report Var(flag_asynchronous_unwind_tables) Optimization
 Generate unwind tables that are exact at each instruction boundary
--- gcc/builtin-attrs.def.mp	2013-06-11 19:51:43.791408885 +0200
+++ gcc/builtin-attrs.def	2013-06-11 19:53:37.717224576 +0200
@@ -83,6 +83,7 @@ DEF_LIST_INT_INT (5,6)
 #undef DEF_LIST_INT_INT
 
 /* Construct trees for identifiers.  */
+DEF_ATTR_IDENT (ATTR_COLD, "cold")
 DEF_ATTR_IDENT (ATTR_CONST, "const")
 DEF_ATTR_IDENT (ATTR_FORMAT, "format")
 DEF_ATTR_IDENT (ATTR_FORMAT_ARG, "format_arg")
@@ -130,6 +131,8 @@ DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHRO
 			ATTR_NULL, ATTR_NOTHROW_LIST)
 DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHROW_LEAF_LIST, ATTR_NORETURN,\
 			ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
+DEF_ATTR_TREE_LIST (ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST, ATTR_COLD,\
+			ATTR_NULL, ATTR_NORETURN_NOTHROW_LEAF_LIST)
 DEF_ATTR_TREE_LIST (ATTR_CONST_NORETURN_NOTHROW_LEAF_LIST, ATTR_CONST,\
 			ATTR_NULL, ATTR_NORETURN_NOTHROW_LEAF_LIST)
 DEF_ATTR_TREE_LIST (ATTR_MALLOC_NOTHROW_LIST, ATTR_MALLOC,	\
--- gcc/c/c-typeck.c.mp	2013-06-11 19:51:43.789408869 +0200
+++ gcc/c/c-typeck.c	2013-06-12 17:03:32.582989258 +0200
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.
 #include "gimple.h"
 #include "c-family/c-objc.h"
 #include "c-family/c-common.h"
+#include "c-family/c-ubsan.h"
 
 /* Possible cases of implicit bad conversions.  Used to select
    diagnostic messages in convert_for_assignment.  */
@@ -9527,6 +9528,15 @@ build_binary_op (location_t location, en
      operands to truth-values.  */
   bool boolean_op = false;
 
+  /* Remember whether we're doing / or %.  */
+  bool doing_div_or_mod = false;
+
+  /* Remember whether we're doing << or >>.  */
+  bool doing_shift = false;
+
+  /* Tree holding instrumentation expression.  */
+  tree instrument_expr = NULL;
+
   if (location == UNKNOWN_LOCATION)
     location = input_location;
 
@@ -9728,6 +9738,7 @@ build_binary_op (location_t location, en
     case FLOOR_DIV_EXPR:
     case ROUND_DIV_EXPR:
     case EXACT_DIV_EXPR:
+      doing_div_or_mod = true;
       warn_for_div_by_zero (location, op1);
 
       if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
@@ -9775,6 +9786,7 @@ build_binary_op (location_t location, en
 
     case TRUNC_MOD_EXPR:
     case FLOOR_MOD_EXPR:
+      doing_div_or_mod = true;
       warn_for_div_by_zero (location, op1);
 
       if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE
@@ -9873,6 +9885,7 @@ build_binary_op (location_t location, en
       else if ((code0 == INTEGER_TYPE || code0 == FIXED_POINT_TYPE)
 	  && code1 == INTEGER_TYPE)
 	{
+	  doing_shift = true;
 	  if (TREE_CODE (op1) == INTEGER_CST)
 	    {
 	      if (tree_int_cst_sgn (op1) < 0)
@@ -9925,6 +9938,7 @@ build_binary_op (location_t location, en
       else if ((code0 == INTEGER_TYPE || code0 == FIXED_POINT_TYPE)
 	  && code1 == INTEGER_TYPE)
 	{
+	  doing_shift = true;
 	  if (TREE_CODE (op1) == INTEGER_CST)
 	    {
 	      if (tree_int_cst_sgn (op1) < 0)
@@ -10469,6 +10483,17 @@ build_binary_op (location_t location, en
 	return error_mark_node;
     }
 
+  if (flag_ubsan)
+    {
+      /* OP0 and/or OP1 might have side-effects.  */
+      op0 = c_save_expr (op0);
+      op1 = c_save_expr (op1);
+      if (doing_div_or_mod)
+	instrument_expr = ubsan_instrument_division (location, op0, op1);
+      else if (doing_shift)
+	instrument_expr = ubsan_instrument_shift (location, code, op0, op1);
+    }
+
   /* Treat expressions in initializers specially as they can't trap.  */
   if (int_const_or_overflow)
     ret = (require_constant_value
@@ -10492,6 +10517,11 @@ build_binary_op (location_t location, en
   if (semantic_result_type)
     ret = build1 (EXCESS_PRECISION_EXPR, semantic_result_type, ret);
   protected_set_expr_location (ret, location);
+
+  if (flag_ubsan && instrument_expr != NULL)
+    ret = fold_build2 (COMPOUND_EXPR, TREE_TYPE (ret),
+		       instrument_expr, ret);
+
   return ret;
 }
 
--- gcc/asan.c.mp	2013-06-11 19:51:43.793408901 +0200
+++ gcc/asan.c	2013-06-11 19:53:37.713224545 +0200
@@ -2034,6 +2034,9 @@ initialize_sanitizer_builtins (void)
   tree BT_FN_VOID = build_function_type_list (void_type_node, NULL_TREE);
   tree BT_FN_VOID_PTR
     = build_function_type_list (void_type_node, ptr_type_node, NULL_TREE);
+  tree BT_FN_VOID_PTR_PTR_PTR
+    = build_function_type_list (void_type_node, ptr_type_node,
+				ptr_type_node, ptr_type_node, NULL_TREE);
   tree BT_FN_VOID_PTR_PTRMODE
     = build_function_type_list (void_type_node, ptr_type_node,
 				build_nonstandard_integer_type (POINTER_SIZE,
@@ -2099,6 +2102,9 @@ initialize_sanitizer_builtins (void)
 #undef ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST
 #define ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST \
   ECF_TM_PURE | ATTR_NORETURN_NOTHROW_LEAF_LIST
+#undef ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST
+#define ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST \
+  /* ECF_COLD missing */ ATTR_NORETURN_NOTHROW_LEAF_LIST
 #undef DEF_SANITIZER_BUILTIN
 #define DEF_SANITIZER_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
   decl = add_builtin_function ("__builtin_" NAME, TYPE, ENUM,		\


	Marek
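
For readers following the patch: the COMPOUND_EXPR built at the end of
build_binary_op/cp_build_binary_op corresponds roughly to a C comma
expression of the form "(check ? handler () : (void) 0), a / b".  The
block below is only a sketch of that shape in plain C, not the code GCC
generates.  The names handle_divrem_overflow and checked_div are
hypothetical, and longjmp stands in for the noreturn runtime handler so
that the sketch can observe the report instead of aborting.

```c
#include <assert.h>
#include <limits.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-in for the noreturn runtime handler: instead of
   aborting, it jumps back so the caller can see that it fired.  */
static jmp_buf report_point;

static void
handle_divrem_overflow (void)
{
  longjmp (report_point, 1);
}

/* Models "(check ? handler () : (void) 0), a / b".  Sets *REPORTED to 1
   and returns 0 when the handler fires; otherwise sets it to 0 and
   returns a / b.  */
int
checked_div (int a, int b, int *reported)
{
  assert (reported != NULL);
  if (setjmp (report_point))
    {
      *reported = 1;
      return 0;
    }
  *reported = 0;
  /* The check is evaluated first; the division is only reached when
     the handler was not called.  */
  return ((b == 0 || (a == INT_MIN && b == -1))
          ? handle_divrem_overflow ()
          : (void) 0),
         a / b;
}
```

Calling checked_div (7, 2, &r) yields 3 with r == 0, while a zero
divisor or INT_MIN / -1 sets r to 1 without executing the division.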

* Re: [RFC] Implement Undefined Behavior Sanitizer (take 2)
  2013-06-12 15:17                             ` Marek Polacek
@ 2013-06-12 15:29                               ` Jakub Jelinek
  2013-06-12 15:46                                 ` Marek Polacek
  0 siblings, 1 reply; 46+ messages in thread
From: Jakub Jelinek @ 2013-06-12 15:29 UTC (permalink / raw)
  To: Marek Polacek; +Cc: GCC Patches, Marc Glisse, Jason Merrill

On Wed, Jun 12, 2013 at 05:17:45PM +0200, Marek Polacek wrote:
> @@ -3867,6 +3868,7 @@ cp_build_binary_op (location_t location,
>    tree final_type = 0;
>  
>    tree result;
> +  tree orig_type = NULL;
>  
>    /* Nonzero if this is an operation like MIN or MAX which can
>       safely be computed in short if both args are promoted shorts.
> @@ -3891,6 +3893,15 @@ cp_build_binary_op (location_t location,
>    op0 = orig_op0;
>    op1 = orig_op1;
>  
> +  /* Remember whether we're doing / or %.  */
> +  bool doing_div_or_mod = false;
> +
> +  /* Remember whether we're doing << or >>.  */
> +  bool doing_shift = false;
> +
> +  /* Tree holding instrumentation expression.  */
> +  tree instrument_expr = NULL;
> +
>    if (code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR
>        || code == TRUTH_OR_EXPR || code == TRUTH_ORIF_EXPR
>        || code == TRUTH_XOR_EXPR)
> @@ -4070,8 +4081,12 @@ cp_build_binary_op (location_t location,
>  	{
>  	  enum tree_code tcode0 = code0, tcode1 = code1;
>  	  tree cop1 = fold_non_dependent_expr_sfinae (op1, tf_none);
> +	  cop1 = maybe_constant_value (cop1);
>  
> -	  warn_for_div_by_zero (location, maybe_constant_value (cop1));
> +	  if (!processing_template_decl && tcode0 == INTEGER_TYPE)
> +	    doing_div_or_mod = true;

Either the !processing_template_decl check here is unneeded, or, if you
check it (and perhaps flag_ubsan too) in this part of the code, you
don't need to check it again later.

> @@ -4832,6 +4853,31 @@ cp_build_binary_op (location_t location,
>    if (build_type == NULL_TREE)
>      build_type = result_type;
>  
> +  if (flag_ubsan && !processing_template_decl)

But, I'd certainly avoid doing the cp_save_expr/maybe_constant_value
etc. for all the binary operations you don't want to instrument
(thus check (doing_div_or_mod || doing_shift) also).

> +    {
> +      /* OP0 and/or OP1 might have side-effects.  */
> +      op0 = cp_save_expr (op0);
> +      op1 = cp_save_expr (op1);
> +      op0 = maybe_constant_value (fold_non_dependent_expr_sfinae (op0,
> +								  tf_none));
> +      op1 = maybe_constant_value (fold_non_dependent_expr_sfinae (op1,
> +								  tf_none));
> +      if (doing_div_or_mod)
> +	{
> +	  /* For diagnostics we want to use the promoted types without
> +	     shorten_binary_op.  So convert the arguments to the
> +	     original result_type.  */
> +	  if (orig_type != NULL && result_type != orig_type)
> +	    {
> +	      op0 = cp_convert (orig_type, op0, complain);
> +	      op1 = cp_convert (orig_type, op1, complain);

And you don't want to change op0/op1.  Have your own tree vars, assign
op{0,1} to them, convert those here if result_type is not orig_type,
and then pass those vars to ubsan_instrument_division.
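
One way to see why the type used for the check matters, and why the
conversion should happen on copies rather than on op0/op1 themselves,
is the plain C sketch below.  It is an illustration only: the function
names are hypothetical, "short" merely stands in for whatever narrower
type an operation might be shortened to, and int is assumed to be wider
than short.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* The division check as it would look for the promoted type (int).  */
bool
div_check_promoted (int op0, int op1)
{
  return op1 == 0 || (op0 == INT_MIN && op1 == -1);
}

/* The same check, hypothetically built for a narrower type.  */
bool
div_check_narrow (short op0, short op1)
{
  return op1 == 0 || (op0 == SHRT_MIN && op1 == -1);
}

/* SHRT_MIN / -1 is computed in int after promotion, so it is well
   defined; only the narrow check would flag it.  */
void
demo_promoted_check (void)
{
  short a = SHRT_MIN, b = -1;
  /* Converted copies, used only by the check.  */
  int cop0 = a, cop1 = b;

  assert (!div_check_promoted (cop0, cop1));
  assert (div_check_narrow (a, b));
  /* a and b are untouched and still used for the real operation.  */
  assert (a / b == -(int) SHRT_MIN);
}
```

The copies cop0/cop1 play the role of the separate tree variables
suggested above: the check sees the promoted values while the operation
itself keeps its original operands.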

	Jakub

* Re: [RFC] Implement Undefined Behavior Sanitizer (take 2)
  2013-06-12 15:29                               ` Jakub Jelinek
@ 2013-06-12 15:46                                 ` Marek Polacek
  0 siblings, 0 replies; 46+ messages in thread
From: Marek Polacek @ 2013-06-12 15:46 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches, Marc Glisse, Jason Merrill

On Wed, Jun 12, 2013 at 05:29:21PM +0200, Jakub Jelinek wrote:
> > @@ -4070,8 +4081,12 @@ cp_build_binary_op (location_t location,
> >  	{
> >  	  enum tree_code tcode0 = code0, tcode1 = code1;
> >  	  tree cop1 = fold_non_dependent_expr_sfinae (op1, tf_none);
> > +	  cop1 = maybe_constant_value (cop1);
> >  
> > -	  warn_for_div_by_zero (location, maybe_constant_value (cop1));
> > +	  if (!processing_template_decl && tcode0 == INTEGER_TYPE)
> > +	    doing_div_or_mod = true;
> 
> Either the !processing_template_decl check here is unneeded, or, if you
> check it (and perhaps flag_ubsan too) in this part of the code, you
> don't need to check it again later.

Fixed.

> > @@ -4832,6 +4853,31 @@ cp_build_binary_op (location_t location,
> >    if (build_type == NULL_TREE)
> >      build_type = result_type;
> >  
> > +  if (flag_ubsan && !processing_template_decl)
> 
> But, I'd certainly avoid doing the cp_save_expr/maybe_constant_value
> etc. for all the binary operations you don't want to instrument
> (thus check (doing_div_or_mod || doing_shift) also).

Of course.  Fixed.

> > +    {
> > +      /* OP0 and/or OP1 might have side-effects.  */
> > +      op0 = cp_save_expr (op0);
> > +      op1 = cp_save_expr (op1);
> > +      op0 = maybe_constant_value (fold_non_dependent_expr_sfinae (op0,
> > +								  tf_none));
> > +      op1 = maybe_constant_value (fold_non_dependent_expr_sfinae (op1,
> > +								  tf_none));
> > +      if (doing_div_or_mod)
> > +	{
> > +	  /* For diagnostics we want to use the promoted types without
> > +	     shorten_binary_op.  So convert the arguments to the
> > +	     original result_type.  */
> > +	  if (orig_type != NULL && result_type != orig_type)
> > +	    {
> > +	      op0 = cp_convert (orig_type, op0, complain);
> > +	      op1 = cp_convert (orig_type, op1, complain);
> 
> And you don't want to change op0/op1.  Have your own tree vars, assign
> op{0,1} to them, convert those here if result_type is not orig_type,
> and then pass those vars to ubsan_instrument_division.

Like this?

2013-06-12  Marek Polacek  <polacek@redhat.com>

	* Makefile.in (C_COMMON_OBJS): Add c-family/c-ubsan.o.
	* common.opt: Add -fsanitize=undefined option.
	* doc/invoke.texi: Document the new flag.
	* sanitizer.def (BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW,
	BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS): Define.
	* builtin-attrs.def (ATTR_COLD,
	ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST): Define.
	* asan.c (initialize_sanitizer_builtins): Build
	BT_FN_VOID_PTR_PTR_PTR.
	* builtins.def (DEF_SANITIZER_BUILTIN): Enable for flag_ubsan too.

c-family/
	* c-ubsan.c: New file.
	* c-ubsan.h: New file.

cp/
	* typeck.c (cp_build_binary_op): Add division by zero and shift
	instrumentation.

c/
	* c-typeck.c (build_binary_op): Add division by zero and shift
	instrumentation.
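
As a reading aid for the shift instrumentation in c-ubsan.c below, here
is a sketch of the two left-shift predicates written as ordinary C
functions for int operands.  It is not the code the patch emits: the
function names and the PRECM1 constant are made up for illustration,
int is assumed to have no padding bits, and the sketch uses
short-circuit || so that the inner right shift is never evaluated with
an out-of-range count.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Precision of int minus one; assumes int has no padding bits.  */
enum { PRECM1 = sizeof (int) * CHAR_BIT - 1 };

/* True when the count is negative or >= the precision of int.  The
   unsigned comparison catches both cases at once.  */
static bool
bad_shift_count (int y)
{
  return (unsigned) y > (unsigned) PRECM1;
}

/* C99/C11 rule for signed x << y: undefined when x is negative or a
   set bit would be shifted into or past the sign bit.  */
bool
c99_lshift_undefined (int x, int y)
{
  return bad_shift_count (y) || ((unsigned) x >> (PRECM1 - y)) != 0;
}

/* C++11/C++14 rule: like the above, but shifting a one into the sign
   bit is tolerated, hence the comparison against 1.  */
bool
cxx11_lshift_undefined (int x, int y)
{
  return bad_shift_count (y)
         || x < 0
         || ((unsigned) x >> (PRECM1 - y)) > 1;
}

void
demo_shift_checks (void)
{
  assert (!c99_lshift_undefined (1, PRECM1 - 1));
  assert (c99_lshift_undefined (1, PRECM1));
  assert (!cxx11_lshift_undefined (1, PRECM1));
  assert (cxx11_lshift_undefined (2, PRECM1));
  assert (c99_lshift_undefined (-1, 0));
}
```

The only difference between the two dialect rules in this sketch is the
1 << PRECM1 case, which the C99 predicate flags and the C++11 one does
not.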

--- gcc/c-family/c-ubsan.c.mp	2013-06-11 19:51:55.555492466 +0200
+++ gcc/c-family/c-ubsan.c	2013-06-12 17:05:20.800370083 +0200
@@ -0,0 +1,127 @@
+/* UndefinedBehaviorSanitizer, undefined behavior detector.
+   Copyright (C) 2013 Free Software Foundation, Inc.
+   Contributed by Marek Polacek <polacek@redhat.com>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tree.h"
+#include "c-family/c-common.h"
+#include "c-family/c-ubsan.h"
+
+/* Instrument division by zero and INT_MIN / -1.  If not instrumenting,
+   return NULL_TREE.  */
+
+tree
+ubsan_instrument_division (location_t loc, tree op0, tree op1)
+{
+  tree t, tt;
+  tree type = TREE_TYPE (op0);
+
+  /* At this point both operands should have the same type,
+     because they are already converted to RESULT_TYPE.  */
+  gcc_assert (type == TREE_TYPE (op1));
+
+  if (TREE_CODE (type) != INTEGER_TYPE)
+    return NULL_TREE;
+
+  /* If we *know* that the divisor is not -1 or 0, we don't have to
+     instrument this expression.
+     ??? We could use decl_constant_value to cover more cases.  */
+  if (TREE_CODE (op1) == INTEGER_CST
+      && integer_nonzerop (op1)
+      && !integer_minus_onep (op1))
+    return NULL_TREE;
+
+  t = fold_build2 (EQ_EXPR, boolean_type_node,
+		    op1, build_int_cst (type, 0));
+
+  /* We check INT_MIN / -1 only for signed types.  */
+  if (!TYPE_UNSIGNED (type))
+    {
+      tree x;
+      tt = fold_build2 (EQ_EXPR, boolean_type_node, op1,
+			build_int_cst (type, -1));
+      x = fold_build2 (EQ_EXPR, boolean_type_node, op0,
+		       TYPE_MIN_VALUE (type));
+      x = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, x, tt);
+      t = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, t, x);
+    }
+  tt = builtin_decl_explicit (BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW);
+  tt = build_call_expr_loc (loc, tt, 0);
+  t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node);
+
+  return t;
+}
+
+/* Instrument left and right shifts.  If not instrumenting, return
+   NULL_TREE.  */
+
+tree
+ubsan_instrument_shift (location_t loc, enum tree_code code,
+			tree op0, tree op1)
+{
+  tree t, tt = NULL_TREE;
+  tree op1_utype = unsigned_type_for (TREE_TYPE (op1));
+  HOST_WIDE_INT op0_prec = TYPE_PRECISION (TREE_TYPE (op0));
+  tree uprecm1 = build_int_cst (op1_utype, op0_prec - 1);
+  tree precm1 = build_int_cst (TREE_TYPE (op1), op0_prec - 1);
+
+  t = fold_convert_loc (loc, op1_utype, op1);
+  t = fold_build2 (GT_EXPR, boolean_type_node, t, uprecm1);
+
+  /* For signed x << y, in C99/C11, the following:
+     (unsigned) x >> (precm1 - y)
+     if non-zero, is undefined.  */
+  if (code == LSHIFT_EXPR
+      && !TYPE_UNSIGNED (TREE_TYPE (op0))
+      && flag_isoc99)
+    {
+      tree x = fold_build2 (MINUS_EXPR, integer_type_node, precm1, op1);
+      tt = fold_convert_loc (loc, unsigned_type_for (TREE_TYPE (op0)), op0);
+      tt = fold_build2 (RSHIFT_EXPR, TREE_TYPE (tt), tt, x);
+      tt = fold_build2 (NE_EXPR, boolean_type_node, tt,
+			build_int_cst (TREE_TYPE (tt), 0));
+    }
+
+  /* For signed x << y, in C++11/C++14, the following:
+     x < 0 || ((unsigned) x >> (precm1 - y)) > 1
+     if true, is undefined.  */
+  if (code == LSHIFT_EXPR
+      && !TYPE_UNSIGNED (TREE_TYPE (op0))
+      && (cxx_dialect == cxx11 || cxx_dialect == cxx1y))
+    {
+      tree x = fold_build2 (MINUS_EXPR, integer_type_node, precm1, op1);
+      tt = fold_convert_loc (loc, unsigned_type_for (TREE_TYPE (op0)), op0);
+      tt = fold_build2 (RSHIFT_EXPR, TREE_TYPE (tt), tt, x);
+      tt = fold_build2 (GT_EXPR, boolean_type_node, tt,
+			build_int_cst (TREE_TYPE (tt), 1));
+      x = fold_build2 (LT_EXPR, boolean_type_node, op0,
+		       build_int_cst (TREE_TYPE (op0), 0));
+      tt = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, x, tt);
+    }
+
+  t = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, t,
+		   tt ? tt : integer_zero_node);
+  tt = builtin_decl_explicit (BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS);
+  tt = build_call_expr_loc (loc, tt, 0);
+  t = fold_build3 (COND_EXPR, void_type_node, t, tt, void_zero_node);
+
+  return t;
+}
--- gcc/c-family/c-ubsan.h.mp	2013-06-11 19:51:50.616457500 +0200
+++ gcc/c-family/c-ubsan.h	2013-06-11 16:51:38.297942275 +0200
@@ -0,0 +1,27 @@
+/* UndefinedBehaviorSanitizer, undefined behavior detector.
+   Copyright (C) 2013 Free Software Foundation, Inc.
+   Contributed by Marek Polacek <polacek@redhat.com>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_UBSAN_H
+#define GCC_UBSAN_H
+
+extern tree ubsan_instrument_division (location_t, tree, tree);
+extern tree ubsan_instrument_shift (location_t, enum tree_code, tree, tree);
+
+#endif  /* GCC_UBSAN_H  */
--- gcc/sanitizer.def.mp	2013-06-11 19:51:43.781408808 +0200
+++ gcc/sanitizer.def	2013-06-11 19:53:37.768224970 +0200
@@ -283,3 +283,13 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOM
 DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC_SIGNAL_FENCE,
 		      "__tsan_atomic_signal_fence",
 		      BT_FN_VOID_INT, ATTR_NOTHROW_LEAF_LIST)
+
+/* Undefined Behavior Sanitizer */
+DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_DIVREM_OVERFLOW,
+		      "__ubsan_handle_divrem_overflow",
+		      BT_FN_VOID_PTR_PTR_PTR,
+		      ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_SHIFT_OUT_OF_BOUNDS,
+		      "__ubsan_handle_shift_out_of_bounds",
+		      BT_FN_VOID_PTR_PTR_PTR,
+		      ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST)
--- gcc/builtins.def.mp	2013-06-11 19:51:43.790408877 +0200
+++ gcc/builtins.def	2013-06-11 19:53:37.721224606 +0200
@@ -155,7 +155,7 @@ along with GCC; see the file COPYING3.
 #define DEF_SANITIZER_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,    \
 	       true, true, true, ATTRS, true, \
-	       (flag_asan || flag_tsan))
+	       (flag_asan || flag_tsan || flag_ubsan))
 
 #undef DEF_CILKPLUS_BUILTIN
 #define DEF_CILKPLUS_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
--- gcc/Makefile.in.mp	2013-06-11 19:51:43.780408801 +0200
+++ gcc/Makefile.in	2013-06-11 19:53:37.710224521 +0200
@@ -1150,7 +1150,7 @@ C_COMMON_OBJS = c-family/c-common.o c-fa
   c-family/c-omp.o c-family/c-opts.o c-family/c-pch.o \
   c-family/c-ppoutput.o c-family/c-pragma.o c-family/c-pretty-print.o \
   c-family/c-semantics.o c-family/c-ada-spec.o tree-mudflap.o \
-  c-family/array-notation-common.o
+  c-family/array-notation-common.o c-family/c-ubsan.o
 
 # Language-independent object files.
 # We put the insn-*.o files first so that a parallel make will build
@@ -2021,6 +2021,9 @@ c-family/array-notation-common.o : c-fam
 c-family/stub-objc.o : c-family/stub-objc.c $(CONFIG_H) $(SYSTEM_H) \
 	coretypes.h $(TREE_H) $(C_COMMON_H) c-family/c-objc.h
 
+c-family/c-ubsan.o : c-family/c-ubsan.c $(CONFIG_H) $(SYSTEM_H) \
+	coretypes.h $(TREE_H) $(C_COMMON_H) c-family/c-ubsan.h
+
 default-c.o: config/default-c.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
   $(C_TARGET_H) $(C_TARGET_DEF_H)
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) \
--- gcc/doc/invoke.texi.mp	2013-06-11 19:51:43.784408831 +0200
+++ gcc/doc/invoke.texi	2013-06-11 19:53:37.761224914 +0200
@@ -5143,6 +5143,11 @@ Memory access instructions will be instr
 data race bugs.
 See @uref{http://code.google.com/p/data-race-test/wiki/ThreadSanitizer} for more details.
 
+@item -fsanitize=undefined
+Enable UndefinedBehaviorSanitizer, a fast undefined behavior detector.
+Various computations will be instrumented to detect
+undefined behavior, e.g.@: division by zero or various overflows.
+
 @item -fdump-final-insns@r{[}=@var{file}@r{]}
 @opindex fdump-final-insns
 Dump the final internal representation (RTL) to @var{file}.  If the
--- gcc/cp/typeck.c.mp	2013-06-11 19:51:43.785408839 +0200
+++ gcc/cp/typeck.c	2013-06-12 17:42:40.599293416 +0200
@@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.
 #include "convert.h"
 #include "c-family/c-common.h"
 #include "c-family/c-objc.h"
+#include "c-family/c-ubsan.h"
 #include "params.h"
 
 static tree pfn_from_ptrmemfunc (tree);
@@ -3867,6 +3868,7 @@ cp_build_binary_op (location_t location,
   tree final_type = 0;
 
   tree result;
+  tree orig_type = NULL;
 
   /* Nonzero if this is an operation like MIN or MAX which can
      safely be computed in short if both args are promoted shorts.
@@ -3891,6 +3893,15 @@ cp_build_binary_op (location_t location,
   op0 = orig_op0;
   op1 = orig_op1;
 
+  /* Remember whether we're doing / or %.  */
+  bool doing_div_or_mod = false;
+
+  /* Remember whether we're doing << or >>.  */
+  bool doing_shift = false;
+
+  /* Tree holding instrumentation expression.  */
+  tree instrument_expr = NULL;
+
   if (code == TRUTH_AND_EXPR || code == TRUTH_ANDIF_EXPR
       || code == TRUTH_OR_EXPR || code == TRUTH_ORIF_EXPR
       || code == TRUTH_XOR_EXPR)
@@ -4070,8 +4081,12 @@ cp_build_binary_op (location_t location,
 	{
 	  enum tree_code tcode0 = code0, tcode1 = code1;
 	  tree cop1 = fold_non_dependent_expr_sfinae (op1, tf_none);
+	  cop1 = maybe_constant_value (cop1);
 
-	  warn_for_div_by_zero (location, maybe_constant_value (cop1));
+	  if (tcode0 == INTEGER_TYPE)
+	    doing_div_or_mod = true;
+
+	  warn_for_div_by_zero (location, cop1);
 
 	  if (tcode0 == COMPLEX_TYPE || tcode0 == VECTOR_TYPE)
 	    tcode0 = TREE_CODE (TREE_TYPE (TREE_TYPE (op0)));
@@ -4109,8 +4124,11 @@ cp_build_binary_op (location_t location,
     case FLOOR_MOD_EXPR:
       {
 	tree cop1 = fold_non_dependent_expr_sfinae (op1, tf_none);
+	cop1 = maybe_constant_value (cop1);
 
-	warn_for_div_by_zero (location, maybe_constant_value (cop1));
+	if (code0 == INTEGER_TYPE)
+	  doing_div_or_mod = true;
+	warn_for_div_by_zero (location, cop1);
       }
 
       if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE
@@ -4164,6 +4182,7 @@ cp_build_binary_op (location_t location,
 	  if (TREE_CODE (const_op1) != INTEGER_CST)
 	    const_op1 = op1;
 	  result_type = type0;
+	  doing_shift = true;
 	  if (TREE_CODE (const_op1) == INTEGER_CST)
 	    {
 	      if (tree_int_cst_lt (const_op1, integer_zero_node))
@@ -4211,6 +4230,7 @@ cp_build_binary_op (location_t location,
 	  if (TREE_CODE (const_op1) != INTEGER_CST)
 	    const_op1 = op1;
 	  result_type = type0;
+	  doing_shift = true;
 	  if (TREE_CODE (const_op1) == INTEGER_CST)
 	    {
 	      if (tree_int_cst_lt (const_op1, integer_zero_node))
@@ -4765,8 +4785,9 @@ cp_build_binary_op (location_t location,
 
       if (shorten && none_complex)
 	{
+	  orig_type = result_type;
 	  final_type = result_type;
-	  result_type = shorten_binary_op (result_type, op0, op1, 
+	  result_type = shorten_binary_op (result_type, op0, op1,
 					   shorten == -1);
 	}
 
@@ -4832,6 +4853,34 @@ cp_build_binary_op (location_t location,
   if (build_type == NULL_TREE)
     build_type = result_type;
 
+  if (flag_ubsan && !processing_template_decl
+      && (doing_div_or_mod || doing_shift))
+    {
+      /* OP0 and/or OP1 might have side-effects.  */
+      op0 = cp_save_expr (op0);
+      op1 = cp_save_expr (op1);
+      op0 = maybe_constant_value (fold_non_dependent_expr_sfinae (op0,
+								  tf_none));
+      op1 = maybe_constant_value (fold_non_dependent_expr_sfinae (op1,
+								  tf_none));
+      if (doing_div_or_mod)
+	{
+	  /* For diagnostics we want to use the promoted types without
+	     shorten_binary_op.  So convert the arguments to the
+	     original result_type.  */
+	  tree cop0 = op0;
+	  tree cop1 = op1;
+	  if (orig_type != NULL && result_type != orig_type)
+	    {
+	      cop0 = cp_convert (orig_type, op0, complain);
+	      cop1 = cp_convert (orig_type, op1, complain);
+	    }
+	  instrument_expr = ubsan_instrument_division (location, cop0, cop1);
+	}
+      else if (doing_shift)
+	instrument_expr = ubsan_instrument_shift (location, code, op0, op1);
+    }
+
   result = build2 (resultcode, build_type, op0, op1);
   result = fold_if_not_in_template (result);
   if (final_type != 0)
@@ -4842,6 +4891,10 @@ cp_build_binary_op (location_t location,
       && !TREE_OVERFLOW_P (op1))
     overflow_warning (location, result);
 
+  if (flag_ubsan && instrument_expr != NULL)
+    result = fold_build2 (COMPOUND_EXPR, TREE_TYPE (result),
+			  instrument_expr, result);
+
   return result;
 }
 \f
--- gcc/common.opt.mp	2013-06-11 19:51:43.787408855 +0200
+++ gcc/common.opt	2013-06-11 19:53:37.742224768 +0200
@@ -858,6 +858,10 @@ fsanitize=thread
 Common Report Var(flag_tsan)
 Enable ThreadSanitizer, a data race detector
 
+fsanitize=undefined
+Common Report Var(flag_ubsan)
+Enable UndefinedBehaviorSanitizer, an undefined behavior detector
+
 fasynchronous-unwind-tables
 Common Report Var(flag_asynchronous_unwind_tables) Optimization
 Generate unwind tables that are exact at each instruction boundary
--- gcc/builtin-attrs.def.mp	2013-06-11 19:51:43.791408885 +0200
+++ gcc/builtin-attrs.def	2013-06-11 19:53:37.717224576 +0200
@@ -83,6 +83,7 @@ DEF_LIST_INT_INT (5,6)
 #undef DEF_LIST_INT_INT
 
 /* Construct trees for identifiers.  */
+DEF_ATTR_IDENT (ATTR_COLD, "cold")
 DEF_ATTR_IDENT (ATTR_CONST, "const")
 DEF_ATTR_IDENT (ATTR_FORMAT, "format")
 DEF_ATTR_IDENT (ATTR_FORMAT_ARG, "format_arg")
@@ -130,6 +131,8 @@ DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHRO
 			ATTR_NULL, ATTR_NOTHROW_LIST)
 DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHROW_LEAF_LIST, ATTR_NORETURN,\
 			ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
+DEF_ATTR_TREE_LIST (ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST, ATTR_COLD,\
+			ATTR_NULL, ATTR_NORETURN_NOTHROW_LEAF_LIST)
 DEF_ATTR_TREE_LIST (ATTR_CONST_NORETURN_NOTHROW_LEAF_LIST, ATTR_CONST,\
 			ATTR_NULL, ATTR_NORETURN_NOTHROW_LEAF_LIST)
 DEF_ATTR_TREE_LIST (ATTR_MALLOC_NOTHROW_LIST, ATTR_MALLOC,	\
--- gcc/c/c-typeck.c.mp	2013-06-11 19:51:43.789408869 +0200
+++ gcc/c/c-typeck.c	2013-06-12 17:03:32.582989258 +0200
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.
 #include "gimple.h"
 #include "c-family/c-objc.h"
 #include "c-family/c-common.h"
+#include "c-family/c-ubsan.h"
 
 /* Possible cases of implicit bad conversions.  Used to select
    diagnostic messages in convert_for_assignment.  */
@@ -9527,6 +9528,15 @@ build_binary_op (location_t location, en
      operands to truth-values.  */
   bool boolean_op = false;
 
+  /* Remember whether we're doing / or %.  */
+  bool doing_div_or_mod = false;
+
+  /* Remember whether we're doing << or >>.  */
+  bool doing_shift = false;
+
+  /* Tree holding instrumentation expression.  */
+  tree instrument_expr = NULL;
+
   if (location == UNKNOWN_LOCATION)
     location = input_location;
 
@@ -9728,6 +9738,7 @@ build_binary_op (location_t location, en
     case FLOOR_DIV_EXPR:
     case ROUND_DIV_EXPR:
     case EXACT_DIV_EXPR:
+      doing_div_or_mod = true;
       warn_for_div_by_zero (location, op1);
 
       if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
@@ -9775,6 +9786,7 @@ build_binary_op (location_t location, en
 
     case TRUNC_MOD_EXPR:
     case FLOOR_MOD_EXPR:
+      doing_div_or_mod = true;
       warn_for_div_by_zero (location, op1);
 
       if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE
@@ -9873,6 +9885,7 @@ build_binary_op (location_t location, en
       else if ((code0 == INTEGER_TYPE || code0 == FIXED_POINT_TYPE)
 	  && code1 == INTEGER_TYPE)
 	{
+	  doing_shift = true;
 	  if (TREE_CODE (op1) == INTEGER_CST)
 	    {
 	      if (tree_int_cst_sgn (op1) < 0)
@@ -9925,6 +9938,7 @@ build_binary_op (location_t location, en
       else if ((code0 == INTEGER_TYPE || code0 == FIXED_POINT_TYPE)
 	  && code1 == INTEGER_TYPE)
 	{
+	  doing_shift = true;
 	  if (TREE_CODE (op1) == INTEGER_CST)
 	    {
 	      if (tree_int_cst_sgn (op1) < 0)
@@ -10469,6 +10483,17 @@ build_binary_op (location_t location, en
 	return error_mark_node;
     }
 
+  if (flag_ubsan)
+    {
+      /* OP0 and/or OP1 might have side-effects.  */
+      op0 = c_save_expr (op0);
+      op1 = c_save_expr (op1);
+      if (doing_div_or_mod)
+	instrument_expr = ubsan_instrument_division (location, op0, op1);
+      else if (doing_shift)
+	instrument_expr = ubsan_instrument_shift (location, code, op0, op1);
+    }
+
   /* Treat expressions in initializers specially as they can't trap.  */
   if (int_const_or_overflow)
     ret = (require_constant_value
@@ -10492,6 +10517,11 @@ build_binary_op (location_t location, en
   if (semantic_result_type)
     ret = build1 (EXCESS_PRECISION_EXPR, semantic_result_type, ret);
   protected_set_expr_location (ret, location);
+
+  if (flag_ubsan && instrument_expr != NULL)
+    ret = fold_build2 (COMPOUND_EXPR, TREE_TYPE (ret),
+		       instrument_expr, ret);
+
   return ret;
 }
 
--- gcc/asan.c.mp	2013-06-11 19:51:43.793408901 +0200
+++ gcc/asan.c	2013-06-11 19:53:37.713224545 +0200
@@ -2034,6 +2034,9 @@ initialize_sanitizer_builtins (void)
   tree BT_FN_VOID = build_function_type_list (void_type_node, NULL_TREE);
   tree BT_FN_VOID_PTR
     = build_function_type_list (void_type_node, ptr_type_node, NULL_TREE);
+  tree BT_FN_VOID_PTR_PTR_PTR
+    = build_function_type_list (void_type_node, ptr_type_node,
+				ptr_type_node, ptr_type_node, NULL_TREE);
   tree BT_FN_VOID_PTR_PTRMODE
     = build_function_type_list (void_type_node, ptr_type_node,
 				build_nonstandard_integer_type (POINTER_SIZE,
@@ -2099,6 +2102,9 @@ initialize_sanitizer_builtins (void)
 #undef ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST
 #define ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST \
   ECF_TM_PURE | ATTR_NORETURN_NOTHROW_LEAF_LIST
+#undef ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST
+#define ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST \
+  /* ECF_COLD missing */ ATTR_NORETURN_NOTHROW_LEAF_LIST
 #undef DEF_SANITIZER_BUILTIN
 #define DEF_SANITIZER_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
   decl = add_builtin_function ("__builtin_" NAME, TYPE, ENUM,		\

	Marek
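
For readers following the patch, here is a rough source-level sketch of what
the COMPOUND_EXPR built in build_binary_op amounts to. It is only a model
under stated assumptions, not the code the patch generates: checked_div,
checked_shl, report and ubsan_reports are hypothetical names, and report()
merely stands in for the ubsan runtime handlers. The function parameters play
the role of c_save_expr, so each operand is evaluated once. Unlike the
instrumented tree, which still performs the original operation after the
check, the sketch skips it so that the example stays well defined.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical stand-in for the ubsan runtime: counts and prints reports. */
static int ubsan_reports = 0;

static void report(const char *what)
{
    ubsan_reports++;
    fprintf(stderr, "runtime error: %s\n", what);
}

/* Models the division check: flags x / 0 and INT_MIN / -1.
   a and b are already evaluated here, as with c_save_expr. */
static int checked_div(int a, int b)
{
    if (b == 0 || (a == INT_MIN && b == -1)) {
        report("division by zero or INT_MIN / -1");
        return 0; /* assumption of this sketch: skip the undefined operation */
    }
    return a / b;
}

/* Models the shift check quoted in the RFC:
   b < 0 || b > precision - 1 || a < 0.
   Like that check, it does not catch a non-negative a whose shifted
   value does not fit in int. */
static int checked_shl(int a, int b)
{
    int width = (int)(sizeof(int) * CHAR_BIT);

    if (b < 0 || b >= width || a < 0) {
        report("shift out of bounds");
        return 0; /* assumption of this sketch: skip the undefined operation */
    }
    return a << b;
}
```

With these helpers, checked_div (7, 2) and checked_shl (1, 3) return the
ordinary results, while checked_div (1, 0) or checked_shl (1, 40) each produce
one report instead of executing the undefined operation.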


Thread overview: 46+ messages
2013-06-05 17:57 [RFC] Implement Undefined Behavior Sanitizer Marek Polacek
2013-06-05 18:44 ` Andrew Pinski
2013-06-05 19:23   ` Jakub Jelinek
2013-06-05 19:40     ` Andrew Pinski
2013-06-06  7:46       ` Konstantin Serebryany
2013-06-06  8:21         ` Jakub Jelinek
2013-06-06  8:26           ` Andrew Pinski
2013-06-06  8:40             ` Jakub Jelinek
2013-06-06  8:42             ` Konstantin Serebryany
2013-06-06  8:45               ` Jakub Jelinek
2013-06-06  8:55                 ` Konstantin Serebryany
2013-06-06  8:59                   ` Jakub Jelinek
2013-06-06  9:03                     ` Konstantin Serebryany
     [not found]                   ` <CAOfiQqkyXWpS-hM=FONEsZqSaWhd3UmUHb=sD71Cb96Df7er4w@mail.gmail.com>
2013-06-06 10:23                     ` Richard Smith
2013-06-06  8:47           ` Konstantin Serebryany
2013-06-05 19:57     ` Joseph S. Myers
2013-06-05 19:19 ` Jakub Jelinek
2013-06-05 19:35   ` Jakub Jelinek
2013-06-06  6:07     ` Jakub Jelinek
2013-06-06 12:17       ` Jason Merrill
2013-06-06 13:26       ` Segher Boessenkool
2013-06-06 13:35         ` Jakub Jelinek
2013-06-08 16:43   ` [RFC] Implement Undefined Behavior Sanitizer (take 2) Marek Polacek
2013-06-08 17:48     ` Marc Glisse
2013-06-08 18:22       ` Jakub Jelinek
2013-06-11 18:44         ` Marek Polacek
2013-06-11 19:14           ` Marc Glisse
2013-06-11 19:44             ` Marek Polacek
2013-06-11 20:09               ` Jakub Jelinek
2013-06-11 20:20                 ` Marek Polacek
2013-06-11 20:33                   ` Jakub Jelinek
2013-06-11 20:40                     ` Marek Polacek
2013-06-11 20:44                       ` Jakub Jelinek
2013-06-11 20:52                         ` Marek Polacek
2013-06-12 13:48                         ` Marek Polacek
2013-06-12 13:52                           ` Jakub Jelinek
2013-06-12 15:17                             ` Marek Polacek
2013-06-12 15:29                               ` Jakub Jelinek
2013-06-12 15:46                                 ` Marek Polacek
2013-06-10  9:24       ` Marek Polacek
2013-06-10  9:32         ` Jakub Jelinek
2013-06-10  9:49           ` Marek Polacek
2013-06-10 14:29     ` Joseph S. Myers
2013-06-11  1:48       ` Marek Polacek
2013-06-05 19:51 ` [RFC] Implement Undefined Behavior Sanitizer Joseph S. Myers
2013-06-07 12:38   ` Marek Polacek
