public inbox for gcc-patches@gcc.gnu.org
* [PATCH] FMA on trees
@ 2010-11-02 13:35 Richard Guenther
  2010-11-02 15:40 ` Richard Henderson
From: Richard Guenther @ 2010-11-02 13:35 UTC
  To: gcc-patches; +Cc: rth


This accumulates the patches from RTH and me and implements the suggestion
made by Joseph for -ffp-contract.  The patch passes bootstrap & regtest
on x86_64-unknown-linux-gnu with the following fallout:

FAIL: gcc.target/i386/fma3-builtin-2.c scan-assembler-times vfmadd...ps 1
FAIL: gcc.target/i386/fma3-builtin-2.c scan-assembler-times vfmadd...pd 1
FAIL: gcc.target/i386/fma3-builtin-2.c scan-assembler-times vfmsub...ps 1
FAIL: gcc.target/i386/fma3-builtin-2.c scan-assembler-times vfmsub...pd 1
FAIL: gcc.target/i386/fma3-builtin-2.c scan-assembler-times vfnmadd...ps 1
FAIL: gcc.target/i386/fma3-builtin-2.c scan-assembler-times vfnmadd...pd 1
FAIL: gcc.target/i386/fma3-builtin-2.c scan-assembler-times vfnmsub...ps 1
FAIL: gcc.target/i386/fma3-builtin-2.c scan-assembler-times vfnmsub...pd 1
FAIL: gcc.target/i386/fma4-builtin-2.c scan-assembler-times vfmaddps 1
FAIL: gcc.target/i386/fma4-builtin-2.c scan-assembler-times vfmaddpd 1
FAIL: gcc.target/i386/fma4-builtin-2.c scan-assembler-times vfmsubps 1
FAIL: gcc.target/i386/fma4-builtin-2.c scan-assembler-times vfmsubpd 1
FAIL: gcc.target/i386/fma4-builtin-2.c scan-assembler-times vfnmaddps 1
FAIL: gcc.target/i386/fma4-builtin-2.c scan-assembler-times vfnmaddpd 1
FAIL: gcc.target/i386/fma4-builtin-2.c scan-assembler-times vfnmsubps 1
FAIL: gcc.target/i386/fma4-builtin-2.c scan-assembler-times vfnmsubpd 1
FAIL: gcc.target/i386/fma4-vector-2.c scan-assembler vfmaddps
FAIL: gcc.target/i386/fma4-vector-2.c scan-assembler vfmsubps

I have not investigated this yet, but it probably means that folding
__builtin_fma to FMA_EXPR breaks vectorization here.  I'll fix that and
re-submit.

Posting this now anyway to make the stage1 deadline.

Richard.

2010-10-22  Richard Guenther  <rguenther@suse.de>
	Richard Henderson  <rth@redhat.com>

	* tree.def (FMA_EXPR): New tree code.
	* expr.c (expand_expr_real_2): Add FMA_EXPR expansion code.
	* gimple.c (gimple_rhs_class_table): FMA_EXPR is a GIMPLE_TERNARY_RHS.
	* tree-cfg.c (verify_gimple_assign_ternary): Verify FMA_EXPR types.
	* tree-inline.c (estimate_operator_cost): Handle FMA_EXPR.
	* gimple-pretty-print.c (dump_ternary_rhs): Likewise.
	* tree-ssa-math-opts.c (convert_mult_to_fma): New function.
	(execute_optimize_widening_mul): Call it.  Reorganize to allow
	dead stmt removal.  Move TODO flags ...
	(pass_optimize_widening_mul): ... here.
	* flag-types.h (enum fp_contract_mode): New enum.
	* common.opt (flag_fp_contract_mode): New variable.
	(-ffp-contract): New option.
	* opts.c (common_handle_option): Handle it.
	(set_unsafe_math_optimizations_flags): Enable -ffp-contract=fast.
	(fast_math_flags_set_p): Check flag_fp_contract_mode.
	* doc/invoke.texi (-ffp-contract): Document.
	(-funsafe-math-optimizations): Adjust.
	* builtins.c (fold_builtin_fma): New function.
	(fold_builtin_3): Call it for fma.
	* config/i386/sse.md (fms<mode>4, fnma<mode>4, fnms<mode>4):
	New expanders.
	* doc/md.texi (fms<mode>4, fnma<mode>4, fnms<mode>4): Document new
	named patterns.
	* genopinit.c (optabs): Initialize fms_optab, fnma_optab and fnms_optab.
	* optabs.h (enum optab_index): Add OTI_fms, OTI_fnma and OTI_fnms.
	(fms_optab, fnma_optab, fnms_optab): New defines.
	* gimplify.c (gimplify_expr): Handle binary truth expressions
	explicitly.  Handle FMA_EXPR.

	* gcc.target/i386/fma4-vector-2.c: New testcase.

Index: gcc/tree.def
===================================================================
*** gcc/tree.def.orig	2010-10-22 17:06:23.000000000 +0200
--- gcc/tree.def	2010-11-02 12:08:20.000000000 +0100
*************** DEFTREECODE (WIDEN_MULT_PLUS_EXPR, "wide
*** 1092,1097 ****
--- 1092,1103 ----
     is subtracted from t3.  */
  DEFTREECODE (WIDEN_MULT_MINUS_EXPR, "widen_mult_plus_expr", tcc_expression, 3)
  
+ /* Fused multiply-add.
+    All operands and the result are of the same type.  No intermediate
+    rounding is performed after multiplying operand one with operand two
+    before adding operand three.  */
+ DEFTREECODE (FMA_EXPR, "fma_expr", tcc_expression, 3)
+ 
  /* Whole vector left/right shift in bits.
     Operand 0 is a vector to be shifted.
     Operand 1 is an integer shift amount in bits.  */
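
For illustration only (not part of the patch): a small sketch of what
"no intermediate rounding" in the FMA_EXPR comment above means in
practice.  The function names are made up, and the single-rounding
result is emulated with 64-bit integer arithmetic rather than by
calling fma (), so the example does not depend on libm or on the target
having an FMA instruction.  The inputs are chosen so that the exact
product, 2^54 - 1, does not fit in the 53-bit significand of a double.
The expected values assume IEEE double and the default
round-to-nearest mode.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply, round the product to double, then add.  The volatile
   temporary forces the intermediate rounding step and keeps the
   compiler from contracting the two operations itself.  */
static double
mul_then_add (double a, double b, double c)
{
  volatile double product = a * b;
  return product + c;
}

/* Reference result with no intermediate rounding, computed exactly in
   integers.  Hypothetical helper; only valid while the product fits
   in an int64_t.  */
static int64_t
exact_mul_add (int64_t a, int64_t b, int64_t c)
{
  return a * b + c;
}

/* (2^27 + 1) * (2^27 - 1) = 2^54 - 1, which rounds to 2^54 as a double.  */
static void
demo_intermediate_rounding (void)
{
  const int64_t a = 134217729;             /* 2^27 + 1 */
  const int64_t b = 134217727;             /* 2^27 - 1 */
  const int64_t c = -18014398509481984LL;  /* -2^54 */

  /* Two roundings: the -1 is lost when the product is rounded.  */
  assert (mul_then_add ((double) a, (double) b, (double) c) == 0.0);
  /* One rounding, as a fused operation would do: the -1 survives.  */
  assert (exact_mul_add (a, b, c) == -1);
}
```

Calling demo_intermediate_rounding () from a test driver exercises both
paths; the two results differ by exactly the bit that the intermediate
rounding discards.
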
Index: gcc/expr.c
===================================================================
*** gcc/expr.c.orig	2010-10-22 17:06:23.000000000 +0200
--- gcc/expr.c	2010-11-02 12:18:39.000000000 +0100
*************** expand_expr_real_2 (sepops ops, rtx targ
*** 7254,7260 ****
    int ignore;
    bool reduce_bit_field;
    location_t loc = ops->location;
!   tree treeop0, treeop1;
  #define REDUCE_BIT_FIELD(expr)	(reduce_bit_field			  \
  				 ? reduce_to_bit_field_precision ((expr), \
  								  target, \
--- 7254,7260 ----
    int ignore;
    bool reduce_bit_field;
    location_t loc = ops->location;
!   tree treeop0, treeop1, treeop2;
  #define REDUCE_BIT_FIELD(expr)	(reduce_bit_field			  \
  				 ? reduce_to_bit_field_precision ((expr), \
  								  target, \
*************** expand_expr_real_2 (sepops ops, rtx targ
*** 7267,7272 ****
--- 7267,7273 ----
  
    treeop0 = ops->op0;
    treeop1 = ops->op1;
+   treeop2 = ops->op2;
  
    /* We should be called only on simple (binary or unary) expressions,
       exactly those that are valid in gimple expressions that aren't
*************** expand_expr_real_2 (sepops ops, rtx targ
*** 7624,7630 ****
      case WIDEN_MULT_PLUS_EXPR:
      case WIDEN_MULT_MINUS_EXPR:
        expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, EXPAND_NORMAL);
!       op2 = expand_normal (ops->op2);
        target = expand_widen_pattern_expr (ops, op0, op1, op2,
  					  target, unsignedp);
        return target;
--- 7625,7631 ----
      case WIDEN_MULT_PLUS_EXPR:
      case WIDEN_MULT_MINUS_EXPR:
        expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, EXPAND_NORMAL);
!       op2 = expand_normal (treeop2);
        target = expand_widen_pattern_expr (ops, op0, op1, op2,
  					  target, unsignedp);
        return target;
*************** expand_expr_real_2 (sepops ops, rtx targ
*** 7711,7716 ****
--- 7712,7757 ----
        expand_operands (treeop0, treeop1, subtarget, &op0, &op1, EXPAND_NORMAL);
        return REDUCE_BIT_FIELD (expand_mult (mode, op0, op1, target, unsignedp));
  
+     case FMA_EXPR:
+       {
+ 	optab opt = fma_optab;
+ 	gimple def0, def2;
+ 
+ 	def0 = get_def_for_expr (treeop0, NEGATE_EXPR);
+ 	def2 = get_def_for_expr (treeop2, NEGATE_EXPR);
+ 
+ 	op0 = op2 = NULL;
+ 
+ 	if (def0 && def2
+ 	    && optab_handler (fnms_optab, mode) != CODE_FOR_nothing)
+ 	  {
+ 	    opt = fnms_optab;
+ 	    op0 = expand_normal (gimple_assign_rhs1 (def0));
+ 	    op2 = expand_normal (gimple_assign_rhs1 (def2));
+ 	  }
+ 	else if (def0
+ 		 && optab_handler (fnma_optab, mode) != CODE_FOR_nothing)
+ 	  {
+ 	    opt = fnma_optab;
+ 	    op0 = expand_normal (gimple_assign_rhs1 (def0));
+ 	  }
+ 	else if (def2
+ 		 && optab_handler (fms_optab, mode) != CODE_FOR_nothing)
+ 	  {
+ 	    opt = fms_optab;
+ 	    op2 = expand_normal (gimple_assign_rhs1 (def2));
+ 	  }
+ 
+ 	if (op0 == NULL)
+ 	  op0 = expand_expr (treeop0, subtarget, VOIDmode, EXPAND_NORMAL);
+ 	if (op2 == NULL)
+ 	  op2 = expand_normal (treeop2);
+ 	op1 = expand_normal (treeop1);
+ 
+ 	return expand_ternary_op (TYPE_MODE (type), opt,
+ 				  op0, op1, op2, target, 0);
+       }
+ 
      case MULT_EXPR:
        /* If this is a fixed-point operation, then we cannot use the code
  	 below because "expand_mult" doesn't support sat/no-sat fixed-point
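
The variant selection in the FMA_EXPR case above can be modelled in
isolation.  This is a sketch only: enum fma_variant, struct target_caps
and choose_fma_variant are hypothetical names standing in for the
optabs and the "optab_handler (...) != CODE_FOR_nothing" checks, and
the two boolean inputs stand in for whether get_def_for_expr found a
NEGATE_EXPR feeding operand 0 or operand 2.

```c
#include <assert.h>
#include <stdbool.h>

enum fma_variant { VARIANT_FMA, VARIANT_FMS, VARIANT_FNMA, VARIANT_FNMS };

/* Hypothetical stand-in for the per-mode optab_handler checks.  */
struct target_caps
{
  bool has_fms, has_fnma, has_fnms;
};

/* Pick the variant and report which negations it absorbs.  A negation
   that is not absorbed stays a separate operation on that operand.  */
static enum fma_variant
choose_fma_variant (bool op0_negated, bool op2_negated,
                    const struct target_caps *caps,
                    bool *strip_neg0, bool *strip_neg2)
{
  *strip_neg0 = *strip_neg2 = false;

  if (op0_negated && op2_negated && caps->has_fnms)
    {
      *strip_neg0 = *strip_neg2 = true;
      return VARIANT_FNMS;      /* -a * b - c */
    }
  else if (op0_negated && caps->has_fnma)
    {
      *strip_neg0 = true;
      return VARIANT_FNMA;      /* -a * b + c */
    }
  else if (op2_negated && caps->has_fms)
    {
      *strip_neg2 = true;
      return VARIANT_FMS;       /* a * b - c */
    }
  return VARIANT_FMA;           /* a * b + c */
}

static void
demo_choose_fma_variant (void)
{
  const struct target_caps all = { true, true, true };
  const struct target_caps no_fnms = { true, true, false };
  bool s0, s2;

  assert (choose_fma_variant (true, true, &all, &s0, &s2) == VARIANT_FNMS);
  assert (s0 && s2);
  /* Without fnms, only the negation of operand 0 is absorbed.  */
  assert (choose_fma_variant (true, true, &no_fnms, &s0, &s2)
          == VARIANT_FNMA);
  assert (s0 && !s2);
}
```

The order of the tests mirrors the patch: fnms first, then fnma, then
fms, with plain fma as the fallback.
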
Index: gcc/gimple.c
===================================================================
*** gcc/gimple.c.orig	2010-11-02 11:16:39.000000000 +0100
--- gcc/gimple.c	2010-11-02 12:08:20.000000000 +0100
*************** get_gimple_rhs_num_ops (enum tree_code c
*** 2530,2536 ****
        || (SYM) == TRUTH_XOR_EXPR) ? GIMPLE_BINARY_RHS			    \
     : (SYM) == TRUTH_NOT_EXPR ? GIMPLE_UNARY_RHS				    \
     : ((SYM) == WIDEN_MULT_PLUS_EXPR					    \
!       || (SYM) == WIDEN_MULT_MINUS_EXPR) ? GIMPLE_TERNARY_RHS		    \
     : ((SYM) == COND_EXPR						    \
        || (SYM) == CONSTRUCTOR						    \
        || (SYM) == OBJ_TYPE_REF						    \
--- 2530,2537 ----
        || (SYM) == TRUTH_XOR_EXPR) ? GIMPLE_BINARY_RHS			    \
     : (SYM) == TRUTH_NOT_EXPR ? GIMPLE_UNARY_RHS				    \
     : ((SYM) == WIDEN_MULT_PLUS_EXPR					    \
!       || (SYM) == WIDEN_MULT_MINUS_EXPR					    \
!       || (SYM) == FMA_EXPR) ? GIMPLE_TERNARY_RHS			    \
     : ((SYM) == COND_EXPR						    \
        || (SYM) == CONSTRUCTOR						    \
        || (SYM) == OBJ_TYPE_REF						    \
Index: gcc/tree-cfg.c
===================================================================
*** gcc/tree-cfg.c.orig	2010-10-22 17:06:23.000000000 +0200
--- gcc/tree-cfg.c	2010-11-02 12:08:20.000000000 +0100
*************** verify_gimple_assign_ternary (gimple stm
*** 3748,3753 ****
--- 3748,3767 ----
  	}
        break;
  
+     case FMA_EXPR:
+       if (!useless_type_conversion_p (lhs_type, rhs1_type)
+ 	  || !useless_type_conversion_p (lhs_type, rhs2_type)
+ 	  || !useless_type_conversion_p (lhs_type, rhs3_type))
+ 	{
+ 	  error ("type mismatch in fused multiply-add expression");
+ 	  debug_generic_expr (lhs_type);
+ 	  debug_generic_expr (rhs1_type);
+ 	  debug_generic_expr (rhs2_type);
+ 	  debug_generic_expr (rhs3_type);
+ 	  return true;
+ 	}
+       break;
+ 
      default:
        gcc_unreachable ();
      }
Index: gcc/tree-inline.c
===================================================================
*** gcc/tree-inline.c.orig	2010-10-22 17:06:23.000000000 +0200
--- gcc/tree-inline.c	2010-11-02 12:08:20.000000000 +0100
*************** estimate_operator_cost (enum tree_code c
*** 3284,3289 ****
--- 3284,3290 ----
      case POINTER_PLUS_EXPR:
      case MINUS_EXPR:
      case MULT_EXPR:
+     case FMA_EXPR:
  
      case ADDR_SPACE_CONVERT_EXPR:
      case FIXED_CONVERT_EXPR:
Index: gcc/gimple-pretty-print.c
===================================================================
*** gcc/gimple-pretty-print.c.orig	2010-10-22 17:06:23.000000000 +0200
--- gcc/gimple-pretty-print.c	2010-11-02 12:08:20.000000000 +0100
*************** dump_ternary_rhs (pretty_printer *buffer
*** 400,405 ****
--- 400,413 ----
        pp_character (buffer, '>');
        break;
  
+     case FMA_EXPR:
+       dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
+       pp_string (buffer, " * ");
+       dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false);
+       pp_string (buffer, " + ");
+       dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
+       break;
+ 
      default:
        gcc_unreachable ();
      }
Index: gcc/tree-ssa-math-opts.c
===================================================================
*** gcc/tree-ssa-math-opts.c.orig	2010-10-22 17:06:23.000000000 +0200
--- gcc/tree-ssa-math-opts.c	2010-11-02 12:33:53.000000000 +0100
*************** convert_plusminus_to_widen (gimple_stmt_
*** 1494,1499 ****
--- 1494,1616 ----
    return true;
  }
  
+ /* Combine the multiplication at MUL_STMT with uses in additions and
+    subtractions to form fused multiply-add operations.  Returns true
+    if successful and MUL_STMT should be removed.  */
+ 
+ static bool
+ convert_mult_to_fma (gimple mul_stmt)
+ {
+   tree mul_result = gimple_assign_lhs (mul_stmt);
+   tree type = TREE_TYPE (mul_result);
+   gimple use_stmt, fma_stmt;
+   use_operand_p use_p;
+   imm_use_iterator imm_iter;
+ 
+   if (FLOAT_TYPE_P (type)
+       && flag_fp_contract_mode == FP_CONTRACT_OFF)
+     return false;
+ 
+   /* We don't want to do bitfield reduction ops.  */
+   if (INTEGRAL_TYPE_P (type)
+       && (TYPE_PRECISION (type)
+ 	  != GET_MODE_PRECISION (TYPE_MODE (type))))
+     return false;
+ 
+   /* If the target doesn't support it, don't generate it.  We assume that
+      if fma isn't available then fms, fnma or fnms are not either.  */
+   if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing)
+     return false;
+ 
+   /* Make sure that the multiplication statement becomes dead after
+      the transformation, i.e. that all uses are transformed to FMAs.
+      This means we assume that an FMA operation has the same cost
+      as an addition.  */
+   FOR_EACH_IMM_USE_FAST (use_p, imm_iter, mul_result)
+     {
+       enum tree_code use_code;
+ 
+       use_stmt = USE_STMT (use_p);
+ 
+       if (!is_gimple_assign (use_stmt))
+ 	return false;
+       use_code = gimple_assign_rhs_code (use_stmt);
+       /* ???  We need to handle NEGATE_EXPR to eventually form fnms.  */
+       if (use_code != PLUS_EXPR
+ 	  && use_code != MINUS_EXPR)
+ 	return false;
+ 
+       /* For now restrict this operation to single basic blocks.  In theory
+ 	 we would want to support sinking the multiplication in
+ 	 m = a*b;
+ 	 if ()
+ 	   ma = m + c;
+ 	 else
+ 	   d = m;
+ 	 to form a fma in the then block and sink the multiplication to the
+ 	 else block.  */
+       if (gimple_bb (use_stmt) != gimple_bb (mul_stmt))
+ 	return false;
+ 
+       /* We can't handle a * b + a * b.  */
+       if (gimple_assign_rhs1 (use_stmt) == gimple_assign_rhs2 (use_stmt))
+ 	return false;
+ 
+       /* If the target doesn't support a * b - c then drop the ball.  */
+       if (gimple_assign_rhs1 (use_stmt) == mul_result
+ 	  && use_code == MINUS_EXPR
+ 	  && optab_handler (fms_optab, TYPE_MODE (type)) == CODE_FOR_nothing)
+ 	return false;
+ 
+       /* If the target doesn't support -a * b + c then drop the ball.  */
+       if (gimple_assign_rhs2 (use_stmt) == mul_result
+ 	  && use_code == MINUS_EXPR
+ 	  && optab_handler (fnma_optab, TYPE_MODE (type)) == CODE_FOR_nothing)
+ 	return false;
+ 
+       /* We don't generate -a * b - c below yet.  */
+     }
+ 
+   FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, mul_result)
+     {
+       tree addop, mulop1;
+       gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+ 
+       mulop1 = gimple_assign_rhs1 (mul_stmt);
+       if (gimple_assign_rhs1 (use_stmt) == mul_result)
+ 	{
+ 	  addop = gimple_assign_rhs2 (use_stmt);
+ 	  /* a * b - c -> a * b + (-c)  */
+ 	  if (gimple_assign_rhs_code (use_stmt) == MINUS_EXPR)
+ 	    addop = force_gimple_operand_gsi (&gsi,
+ 					      build1 (NEGATE_EXPR,
+ 						      type, addop),
+ 					      true, NULL_TREE, true,
+ 					      GSI_SAME_STMT);
+ 	}
+       else
+ 	{
+ 	  addop = gimple_assign_rhs1 (use_stmt);
+ 	  /* a - b * c -> (-b) * c + a */
+ 	  if (gimple_assign_rhs_code (use_stmt) == MINUS_EXPR)
+ 	    mulop1 = force_gimple_operand_gsi (&gsi,
+ 					       build1 (NEGATE_EXPR,
+ 						       type, mulop1),
+ 					       true, NULL_TREE, true,
+ 					       GSI_SAME_STMT);
+ 	}
+ 
+       fma_stmt = gimple_build_assign_with_ops3 (FMA_EXPR,
+ 						gimple_assign_lhs (use_stmt),
+ 						mulop1,
+ 						gimple_assign_rhs2 (mul_stmt),
+ 						addop);
+       gsi_replace (&gsi, fma_stmt, true);
+     }
+ 
+   return true;
+ }
+ 
  /* Find integer multiplications where the operands are extended from
     smaller types, and replace the MULT_EXPR with a WIDEN_MULT_EXPR
     where appropriate.  */
*************** convert_plusminus_to_widen (gimple_stmt_
*** 1501,1531 ****
  static unsigned int
  execute_optimize_widening_mul (void)
  {
-   bool changed = false;
    basic_block bb;
  
    FOR_EACH_BB (bb)
      {
        gimple_stmt_iterator gsi;
  
!       for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
          {
  	  gimple stmt = gsi_stmt (gsi);
  	  enum tree_code code;
  
! 	  if (!is_gimple_assign (stmt))
! 	    continue;
! 
! 	  code = gimple_assign_rhs_code (stmt);
! 	  if (code == MULT_EXPR)
! 	    changed |= convert_mult_to_widen (stmt);
! 	  else if (code == PLUS_EXPR || code == MINUS_EXPR)
! 	    changed |= convert_plusminus_to_widen (&gsi, stmt, code);
  	}
      }
  
!   return (changed ? TODO_dump_func | TODO_update_ssa | TODO_verify_ssa
! 	  | TODO_verify_stmts : 0);
  }
  
  static bool
--- 1618,1662 ----
  static unsigned int
  execute_optimize_widening_mul (void)
  {
    basic_block bb;
  
    FOR_EACH_BB (bb)
      {
        gimple_stmt_iterator gsi;
  
!       for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi);)
          {
  	  gimple stmt = gsi_stmt (gsi);
  	  enum tree_code code;
  
! 	  if (is_gimple_assign (stmt))
! 	    {
! 	      code = gimple_assign_rhs_code (stmt);
! 	      switch (code)
! 		{
! 		case MULT_EXPR:
! 		  if (!convert_mult_to_widen (stmt)
! 		      && convert_mult_to_fma (stmt))
! 		    {
! 		      gsi_remove (&gsi, true);
! 		      release_defs (stmt);
! 		      continue;
! 		    }
! 		  break;
! 
! 		case PLUS_EXPR:
! 		case MINUS_EXPR:
! 		  convert_plusminus_to_widen (&gsi, stmt, code);
! 		  break;
! 
! 		default:;
! 		}
! 	    }
! 	  gsi_next (&gsi);
  	}
      }
  
!   return 0;
  }
  
  static bool
*************** struct gimple_opt_pass pass_optimize_wid
*** 1549,1554 ****
    0,					/* properties_provided */
    0,					/* properties_destroyed */
    0,					/* todo_flags_start */
!   0                                     /* todo_flags_finish */
   }
  };
--- 1680,1688 ----
    0,					/* properties_provided */
    0,					/* properties_destroyed */
    0,					/* todo_flags_start */
!   TODO_verify_ssa
!   | TODO_verify_stmts
!   | TODO_dump_func
!   | TODO_update_ssa                     /* todo_flags_finish */
   }
  };
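
The second loop of convert_mult_to_fma above rewrites every use of the
product.  A minimal model of that rewriting, using plain doubles
instead of GIMPLE statements, might look as follows.  All names are
illustrative, and "fused" is an ordinary multiply-add, so the sketch
shows only the operand and sign bookkeeping, not the rounding
behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for FMA_EXPR <x, y, z>.  A real fused operation would round
   only once; for the small integers used below both agree exactly.  */
static double
fused (double x, double y, double z)
{
  return x * y + z;
}

/* Rewrite one use of m = a * b.  OTHER is the non-product operand of
   the addition or subtraction, MUL_IS_FIRST says whether m was the
   first operand, IS_MINUS whether the use was a MINUS_EXPR.  */
static double
rewrite_use (double a, double b, double other,
             bool mul_is_first, bool is_minus)
{
  if (mul_is_first)
    /* m + c -> FMA (a, b, c);  m - c -> FMA (a, b, -c).  */
    return fused (a, b, is_minus ? -other : other);
  else
    /* c + m -> FMA (a, b, c);  c - m -> FMA (-a, b, c).  */
    return fused (is_minus ? -a : a, b, other);
}

static void
demo_rewrite_use (void)
{
  /* m = 3 * 4 = 12, c = 5.  */
  assert (rewrite_use (3.0, 4.0, 5.0, true, false) == 17.0);  /* m + c */
  assert (rewrite_use (3.0, 4.0, 5.0, true, true) == 7.0);    /* m - c */
  assert (rewrite_use (3.0, 4.0, 5.0, false, false) == 17.0); /* c + m */
  assert (rewrite_use (3.0, 4.0, 5.0, false, true) == -7.0);  /* c - m */
}
```

The negations introduced here are what the expander later looks for
when it chooses between the fma, fms and fnma patterns.
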
Index: gcc/testsuite/gcc.target/i386/fma4-vector-2.c
===================================================================
*** /dev/null	1970-01-01 00:00:00.000000000 +0000
--- gcc/testsuite/gcc.target/i386/fma4-vector-2.c	2010-11-02 12:08:20.000000000 +0100
***************
*** 0 ****
--- 1,21 ----
+ /* { dg-do compile } */
+ /* { dg-require-effective-target lp64 } */
+ /* { dg-options "-O2 -mfma4 -ftree-vectorize -mtune=generic" } */
+ 
+ float r[256], s[256];
+ float x[256];
+ float y[256];
+ float z[256];
+ 
+ void foo (void)
+ {
+   int i;
+   for (i = 0; i < 256; ++i)
+     {
+       r[i] = x[i] * y[i] - z[i];
+       s[i] = x[i] * y[i] + z[i];
+     }
+ }
+ 
+ /* { dg-final { scan-assembler "vfmaddps" } } */
+ /* { dg-final { scan-assembler "vfmsubps" } } */
Index: gcc/common.opt
===================================================================
*** gcc/common.opt.orig	2010-11-02 11:16:39.000000000 +0100
--- gcc/common.opt	2010-11-02 12:45:41.000000000 +0100
*************** bool flag_warn_unused_result = false
*** 58,63 ****
--- 58,67 ----
  Variable
  int *param_values
  
+ ; Floating-point contraction mode, fast by default.
+ Variable
+ enum fp_contract_mode flag_fp_contract_mode = FP_CONTRACT_FAST
+ 
  ###
  Driver
  
*************** fforward-propagate
*** 857,862 ****
--- 861,870 ----
  Common Report Var(flag_forward_propagate) Optimization
  Perform a forward propagation pass on RTL
  
+ ffp-contract=
+ Common Joined RejectNegative
+ -ffp-contract=[off|on|fast] Perform floating-point expression contraction.
+ 
  ; Nonzero means don't put addresses of constant functions in registers.
  ; Used for compiling the Unix kernel, where strange substitutions are
  ; done on the assembly output.
Index: gcc/doc/invoke.texi
===================================================================
*** gcc/doc/invoke.texi.orig	2010-11-02 11:16:10.000000000 +0100
--- gcc/doc/invoke.texi	2010-11-02 12:14:37.000000000 +0100
*************** Objective-C and Objective-C++ Dialects}.
*** 342,348 ****
  -fdelayed-branch -fdelete-null-pointer-checks -fdse -fdse @gol
  -fearly-inlining -fipa-sra -fexpensive-optimizations -ffast-math @gol
  -ffinite-math-only -ffloat-store -fexcess-precision=@var{style} @gol
! -fforward-propagate -ffunction-sections @gol
  -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol
  -fgcse-sm -fif-conversion -fif-conversion2 -findirect-inlining @gol
  -finline-functions -finline-functions-called-once -finline-limit=@var{n} @gol
--- 342,348 ----
  -fdelayed-branch -fdelete-null-pointer-checks -fdse -fdse @gol
  -fearly-inlining -fipa-sra -fexpensive-optimizations -ffast-math @gol
  -ffinite-math-only -ffloat-store -fexcess-precision=@var{style} @gol
! -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol
  -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol
  -fgcse-sm -fif-conversion -fif-conversion2 -findirect-inlining @gol
  -finline-functions -finline-functions-called-once -finline-limit=@var{n} @gol
*************** loop unrolling.
*** 5980,5985 ****
--- 5980,5997 ----
  This option is enabled by default at optimization levels @option{-O},
  @option{-O2}, @option{-O3}, @option{-Os}.
  
+ @item -ffp-contract=@var{style}
+ @opindex ffp-contract
+ @option{-ffp-contract=off} disables floating-point expression contraction.
+ @option{-ffp-contract=fast} enables floating-point expression contraction
+ such as forming of fused multiply-add operations if the target has
+ native support for them.
+ @option{-ffp-contract=on} enables floating-point expression contraction
+ if allowed by the language standard.  This is currently not implemented
+ and is treated like @option{-ffp-contract=off}.
+ 
+ The default is @option{-ffp-contract=fast}.
+ 
  @item -fomit-frame-pointer
  @opindex fomit-frame-pointer
  Don't keep the frame pointer in a register for functions that
*************** an exact implementation of IEEE or ISO r
*** 7816,7822 ****
  math functions. It may, however, yield faster code for programs
  that do not require the guarantees of these specifications.
  Enables @option{-fno-signed-zeros}, @option{-fno-trapping-math},
! @option{-fassociative-math} and @option{-freciprocal-math}.
  
  The default is @option{-fno-unsafe-math-optimizations}.
  
--- 7828,7835 ----
  math functions. It may, however, yield faster code for programs
  that do not require the guarantees of these specifications.
  Enables @option{-fno-signed-zeros}, @option{-fno-trapping-math},
! @option{-fassociative-math}, @option{-freciprocal-math} and
! @option{-ffp-contract=fast}.
  
  The default is @option{-fno-unsafe-math-optimizations}.
  
Index: gcc/opts.c
===================================================================
*** gcc/opts.c.orig	2010-10-22 17:06:23.000000000 +0200
--- gcc/opts.c	2010-11-02 12:45:34.000000000 +0100
*************** common_handle_option (struct gcc_options
*** 1901,1906 ****
--- 1901,1918 ----
  	return false;
        break;
  
+     case OPT_ffp_contract_:
+       if (!strcmp (arg, "on"))
+ 	/* Not implemented, fall back to conservative FP_CONTRACT_OFF.  */
+ 	flag_fp_contract_mode = FP_CONTRACT_OFF;
+       else if (!strcmp (arg, "off"))
+ 	flag_fp_contract_mode = FP_CONTRACT_OFF;
+       else if (!strcmp (arg, "fast"))
+ 	flag_fp_contract_mode = FP_CONTRACT_FAST;
+       else
+ 	error ("unknown floating point contraction style \"%s\"", arg);
+       break;
+ 
      case OPT_fexcess_precision_:
        if (!strcmp (arg, "fast"))
  	flag_excess_precision_cmdline = EXCESS_PRECISION_FAST;
*************** set_unsafe_math_optimizations_flags (int
*** 2289,2294 ****
--- 2301,2307 ----
    flag_signed_zeros = !set;
    flag_associative_math = set;
    flag_reciprocal_math = set;
+   flag_fp_contract_mode = set ? FP_CONTRACT_FAST : FP_CONTRACT_OFF;
  }
  
  /* Return true iff flags are set as if -ffast-math.  */
*************** fast_math_flags_set_p (void)
*** 2299,2305 ****
  	  && flag_unsafe_math_optimizations
  	  && flag_finite_math_only
  	  && !flag_signed_zeros
! 	  && !flag_errno_math);
  }
  
  /* Return true iff flags are set as if -ffast-math but using the flags stored
--- 2312,2319 ----
  	  && flag_unsafe_math_optimizations
  	  && flag_finite_math_only
  	  && !flag_signed_zeros
! 	  && !flag_errno_math
! 	  && flag_fp_contract_mode == FP_CONTRACT_FAST);
  }
  
  /* Return true iff flags are set as if -ffast-math but using the flags stored
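
As a standalone illustration of the option handling above (a sketch:
parse_fp_contract is a made-up helper, whereas the patch sets
flag_fp_contract_mode directly in common_handle_option and reports bad
values through error ()):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum fp_contract_mode
{
  FP_CONTRACT_OFF = 0,
  FP_CONTRACT_ON = 1,
  FP_CONTRACT_FAST = 2
};

/* Map the argument of -ffp-contract= to a mode.  Returns false for an
   unknown style and leaves *MODE untouched.  "on" is accepted but, as
   in the patch, falls back to the conservative FP_CONTRACT_OFF.  */
static bool
parse_fp_contract (const char *arg, enum fp_contract_mode *mode)
{
  if (!strcmp (arg, "on") || !strcmp (arg, "off"))
    *mode = FP_CONTRACT_OFF;
  else if (!strcmp (arg, "fast"))
    *mode = FP_CONTRACT_FAST;
  else
    return false;
  return true;
}

static void
demo_parse_fp_contract (void)
{
  enum fp_contract_mode mode = FP_CONTRACT_FAST;

  assert (parse_fp_contract ("on", &mode) && mode == FP_CONTRACT_OFF);
  assert (parse_fp_contract ("fast", &mode) && mode == FP_CONTRACT_FAST);
  /* An unknown style is rejected and the previous mode is kept.  */
  assert (!parse_fp_contract ("strict", &mode) && mode == FP_CONTRACT_FAST);
}
```

Note that FP_CONTRACT_ON exists in the enum but is never selected,
which matches the "not implemented" wording in the documentation hunk.
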
Index: gcc/builtins.c
===================================================================
*** gcc/builtins.c.orig	2010-11-02 11:16:39.000000000 +0100
--- gcc/builtins.c	2010-11-02 12:19:44.000000000 +0100
*************** fold_builtin_abs (location_t loc, tree a
*** 9262,9267 ****
--- 9262,9288 ----
    return fold_build1_loc (loc, ABS_EXPR, type, arg);
  }
  
+ /* Fold a call to fma, fmaf, or fmal with arguments ARG[012].  */
+ 
+ static tree
+ fold_builtin_fma (location_t loc, tree arg0, tree arg1, tree arg2, tree type)
+ {
+   if (validate_arg (arg0, REAL_TYPE)
+       && validate_arg (arg1, REAL_TYPE)
+       && validate_arg (arg2, REAL_TYPE))
+     {
+       if (TREE_CODE (arg0) == REAL_CST
+ 	  && TREE_CODE (arg1) == REAL_CST
+ 	  && TREE_CODE (arg2) == REAL_CST)
+ 	return do_mpfr_arg3 (arg0, arg1, arg2, type, mpfr_fma);
+ 
+       /* ??? Only expand to FMA_EXPR if it's directly supported.  */
+       if (optab_handler (fma_optab, TYPE_MODE (type)) != CODE_FOR_nothing)
+         return fold_build3_loc (loc, FMA_EXPR, type, arg0, arg1, arg2);
+     }
+   return NULL_TREE;
+ }
+ 
  /* Fold a call to builtin fmin or fmax.  */
  
  static tree
*************** fold_builtin_3 (location_t loc, tree fnd
*** 10536,10545 ****
        return fold_builtin_sincos (loc, arg0, arg1, arg2);
  
      CASE_FLT_FN (BUILT_IN_FMA):
!       if (validate_arg (arg0, REAL_TYPE)
! 	  && validate_arg(arg1, REAL_TYPE)
! 	  && validate_arg(arg2, REAL_TYPE))
! 	return do_mpfr_arg3 (arg0, arg1, arg2, type, mpfr_fma);
      break;
  
      CASE_FLT_FN (BUILT_IN_REMQUO):
--- 10557,10563 ----
        return fold_builtin_sincos (loc, arg0, arg1, arg2);
  
      CASE_FLT_FN (BUILT_IN_FMA):
!       return fold_builtin_fma (loc, arg0, arg1, arg2, type);
      break;
  
      CASE_FLT_FN (BUILT_IN_REMQUO):
Index: gcc/config/i386/sse.md
===================================================================
*** gcc/config/i386/sse.md.orig	2010-11-02 11:16:36.000000000 +0100
--- gcc/config/i386/sse.md	2010-11-02 12:18:39.000000000 +0100
***************
*** 1859,1865 ****
  
  ;; Intrinsic FMA operations.
  
! ;; The standard name for fma is only available with SSE math enabled.
  (define_expand "fma<mode>4"
    [(set (match_operand:FMAMODE 0 "register_operand")
  	(fma:FMAMODE
--- 1859,1865 ----
  
  ;; Intrinsic FMA operations.
  
! ;; The standard names for fma are only available with SSE math enabled.
  (define_expand "fma<mode>4"
    [(set (match_operand:FMAMODE 0 "register_operand")
  	(fma:FMAMODE
***************
*** 1869,1874 ****
--- 1869,1901 ----
    "(TARGET_FMA || TARGET_FMA4) && TARGET_SSE_MATH"
    "")
  
+ (define_expand "fms<mode>4"
+   [(set (match_operand:FMAMODE 0 "register_operand")
+ 	(fma:FMAMODE
+ 	  (match_operand:FMAMODE 1 "nonimmediate_operand")
+ 	  (match_operand:FMAMODE 2 "nonimmediate_operand")
+ 	  (neg:FMAMODE (match_operand:FMAMODE 3 "nonimmediate_operand"))))]
+   "(TARGET_FMA || TARGET_FMA4) && TARGET_SSE_MATH"
+   "")
+ 
+ (define_expand "fnma<mode>4"
+   [(set (match_operand:FMAMODE 0 "register_operand")
+ 	(fma:FMAMODE
+ 	  (neg:FMAMODE (match_operand:FMAMODE 1 "nonimmediate_operand"))
+ 	  (match_operand:FMAMODE 2 "nonimmediate_operand")
+ 	  (match_operand:FMAMODE 3 "nonimmediate_operand")))]
+   "(TARGET_FMA || TARGET_FMA4) && TARGET_SSE_MATH"
+   "")
+ 
+ (define_expand "fnms<mode>4"
+   [(set (match_operand:FMAMODE 0 "register_operand")
+ 	(fma:FMAMODE
+ 	  (neg:FMAMODE (match_operand:FMAMODE 1 "nonimmediate_operand"))
+ 	  (match_operand:FMAMODE 2 "nonimmediate_operand")
+ 	  (neg:FMAMODE (match_operand:FMAMODE 3 "nonimmediate_operand"))))]
+   "(TARGET_FMA || TARGET_FMA4) && TARGET_SSE_MATH"
+   "")
+ 
  ;; The builtin for fma4intrin.h is not constrained by SSE math enabled.
  (define_expand "fma4i_fmadd_<mode>"
    [(set (match_operand:FMAMODE 0 "register_operand")
Index: gcc/doc/md.texi
===================================================================
*** gcc/doc/md.texi.orig	2010-10-18 11:11:47.000000000 +0200
--- gcc/doc/md.texi	2010-11-02 12:18:39.000000000 +0100
*************** pattern is used to implement the @code{f
*** 3958,3963 ****
--- 3958,3993 ----
  multiply followed by the add if the machine does not perform a
  rounding step between the operations.
  
+ @cindex @code{fms@var{m}4} instruction pattern
+ @item @samp{fms@var{m}4}
+ Like @code{fma@var{m}4}, except operand 3 is subtracted from the
+ product instead of added to the product.  This is represented
+ in the rtl as
+ 
+ @smallexample
+ (fma:@var{m} @var{op1} @var{op2} (neg:@var{m} @var{op3}))
+ @end smallexample
+ 
+ @cindex @code{fnma@var{m}4} instruction pattern
+ @item @samp{fnma@var{m}4}
+ Like @code{fma@var{m}4} except that the intermediate product
+ is negated before being added to operand 3.  This is represented
+ in the rtl as
+ 
+ @smallexample
+ (fma:@var{m} (neg:@var{m} @var{op1}) @var{op2} @var{op3})
+ @end smallexample
+ 
+ @cindex @code{fnms@var{m}4} instruction pattern
+ @item @samp{fnms@var{m}4}
+ Like @code{fms@var{m}4} except that the intermediate product
+ is negated before subtracting operand 3.  This is represented
+ in the rtl as
+ 
+ @smallexample
+ (fma:@var{m} (neg:@var{m} @var{op1}) @var{op2} (neg:@var{m} @var{op3}))
+ @end smallexample
+ 
  @cindex @code{min@var{m}3} instruction pattern
  @cindex @code{max@var{m}3} instruction pattern
  @item @samp{smin@var{m}3}, @samp{smax@var{m}3}
Index: gcc/flag-types.h
===================================================================
*** gcc/flag-types.h.orig	2010-10-12 14:57:13.000000000 +0200
--- gcc/flag-types.h	2010-11-02 12:12:45.000000000 +0100
*************** enum warn_strict_overflow_code
*** 152,155 ****
--- 152,162 ----
    WARN_STRICT_OVERFLOW_MAGNITUDE = 5
  };
  
+ /* Floating-point contraction mode.  */
+ enum fp_contract_mode {
+   FP_CONTRACT_OFF = 0,
+   FP_CONTRACT_ON = 1,
+   FP_CONTRACT_FAST = 2
+ };
+ 
  #endif /* ! GCC_FLAG_TYPES_H */
Index: gcc/genopinit.c
===================================================================
*** gcc/genopinit.c.orig	2010-10-18 11:11:51.000000000 +0200
--- gcc/genopinit.c	2010-11-02 12:18:39.000000000 +0100
*************** static const char * const optabs[] =
*** 160,165 ****
--- 160,168 ----
    "set_optab_handler (floor_optab, $A, CODE_FOR_$(floor$a2$))",
    "set_convert_optab_handler (lfloor_optab, $B, $A, CODE_FOR_$(lfloor$F$a$I$b2$))",
    "set_optab_handler (fma_optab, $A, CODE_FOR_$(fma$a4$))",
+   "set_optab_handler (fms_optab, $A, CODE_FOR_$(fms$a4$))",
+   "set_optab_handler (fnma_optab, $A, CODE_FOR_$(fnma$a4$))",
+   "set_optab_handler (fnms_optab, $A, CODE_FOR_$(fnms$a4$))",
    "set_optab_handler (ceil_optab, $A, CODE_FOR_$(ceil$a2$))",
    "set_convert_optab_handler (lceil_optab, $B, $A, CODE_FOR_$(lceil$F$a$I$b2$))",
    "set_optab_handler (round_optab, $A, CODE_FOR_$(round$a2$))",
Index: gcc/gimplify.c
===================================================================
*** gcc/gimplify.c.orig	2010-11-02 11:16:39.000000000 +0100
--- gcc/gimplify.c	2010-11-02 12:19:44.000000000 +0100
*************** gimplify_expr (tree *expr_p, gimple_seq
*** 7170,7175 ****
--- 7170,7185 ----
  	  ret = gimplify_omp_atomic (expr_p, pre_p);
  	  break;
  
+ 	case TRUTH_AND_EXPR:
+ 	case TRUTH_OR_EXPR:
+ 	case TRUTH_XOR_EXPR:
+ 	  /* Classified as tcc_expression.  */
+ 	  goto expr_2;
+ 
+ 	case FMA_EXPR:
+ 	  /* Classified as tcc_expression.  */
+ 	  goto expr_3;
+ 
  	case POINTER_PLUS_EXPR:
            /* Convert ((type *)A)+offset into &A->field_of_type_and_offset.
  	     The second is gimple immediate saving a need for extra statement.
*************** gimplify_expr (tree *expr_p, gimple_seq
*** 7249,7264 ****
  		break;
  	      }
  
  	    case tcc_declaration:
  	    case tcc_constant:
  	      ret = GS_ALL_DONE;
  	      goto dont_recalculate;
  
  	    default:
! 	      gcc_assert (TREE_CODE (*expr_p) == TRUTH_AND_EXPR
! 			  || TREE_CODE (*expr_p) == TRUTH_OR_EXPR
! 			  || TREE_CODE (*expr_p) == TRUTH_XOR_EXPR);
! 	      goto expr_2;
  	    }
  
  	  recalculate_side_effects (*expr_p);
--- 7259,7286 ----
  		break;
  	      }
  
+ 	    expr_3:
+ 	      {
+ 		enum gimplify_status r0, r1, r2;
+ 
+ 		r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
+ 		                    post_p, is_gimple_val, fb_rvalue);
+ 		r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p,
+ 				    post_p, is_gimple_val, fb_rvalue);
+ 		r2 = gimplify_expr (&TREE_OPERAND (*expr_p, 2), pre_p,
+ 				    post_p, is_gimple_val, fb_rvalue);
+ 
+ 		ret = MIN (MIN (r0, r1), r2);
+ 		break;
+ 	      }
+ 
  	    case tcc_declaration:
  	    case tcc_constant:
  	      ret = GS_ALL_DONE;
  	      goto dont_recalculate;
  
  	    default:
! 	      gcc_unreachable ();
  	    }
  
  	  recalculate_side_effects (*expr_p);
Index: gcc/optabs.h
===================================================================
*** gcc/optabs.h.orig	2010-10-18 11:11:51.000000000 +0200
--- gcc/optabs.h	2010-11-02 12:18:39.000000000 +0100
*************** enum optab_index
*** 192,197 ****
--- 192,200 ----
    OTI_atan2,
    /* Floating multiply/add */
    OTI_fma,
+   OTI_fms,
+   OTI_fnma,
+   OTI_fnms,
  
    /* Move instruction.  */
    OTI_mov,
*************** enum optab_index
*** 435,440 ****
--- 438,446 ----
  #define pow_optab (&optab_table[OTI_pow])
  #define atan2_optab (&optab_table[OTI_atan2])
  #define fma_optab (&optab_table[OTI_fma])
+ #define fms_optab (&optab_table[OTI_fms])
+ #define fnma_optab (&optab_table[OTI_fnma])
+ #define fnms_optab (&optab_table[OTI_fnms])
  
  #define mov_optab (&optab_table[OTI_mov])
  #define movstrict_optab (&optab_table[OTI_movstrict])
