public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Index expmed.c's shift_cost by machine mode
@ 2004-06-13  3:27 Roger Sayle
  2004-06-13 14:08 ` Richard Henderson
  2004-06-13 19:47 ` Eric Botcazou
  0 siblings, 2 replies; 8+ messages in thread
From: Roger Sayle @ 2004-06-13  3:27 UTC (permalink / raw)
  To: gcc-patches


The following patch is the next step in my efforts to improve code
generation on targets where shifts are several times more expensive
than additions (including the Pentium4).  This change adds an extra
machine mode index to the shift_cost, shiftadd_cost and shiftsub_cost
arrays used by expmed.c.  The rest of expmed.c is then updated to
use the operation's mode when determining the cost of right shifts,
allowing improved synthetic multiplcation sequences for suitably
parameterized backends.

There was no response to my question of whether it would be
preferrable to initialize these tables dynamically.  I decided that
typically there will be at most six integer modes (BI, QI, HI, SI,
DI and TI) and the additional start-up overhead should be negligible
compared to the more complex logic required for lazy initialization.
I also intend to export and make use of these tables in the RTL
optimizers (e.g. simplify_rtx), which will improve that cost/benefit
ratio further.

The following patch has been tested on i686-pc-linux-gnu with a full
"make bootstrap", all default languages, and regression tested with
a top-level "make -k check" with no new failures.

Ok for mainline?



2004-06-12  Roger Sayle  <roger@eyesopen.com>

	* expmed.c (shift_cost, shiftadd_cost, shiftsub_cost): Additionally
	index by machine mode.
	(init_expmed): Initialize shift_cost, shiftadd_cost and shiftsub_cost
	tables inside the loop over machine modes.
	(synth_mult, expand_mult_highpart_optab, expand_mult_highpart,
	expand_divmod): Index shift*_cost by the appropriate machine mode.


Index: expmed.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/expmed.c,v
retrieving revision 1.164
diff -c -3 -p -r1.164 expmed.c
*** expmed.c	11 Jun 2004 21:34:23 -0000	1.164
--- expmed.c	12 Jun 2004 03:54:05 -0000
*************** static int smod_pow2_cheap[NUM_MACHINE_M
*** 94,102 ****
  static int zero_cost;
  static int add_cost[NUM_MACHINE_MODES];
  static int neg_cost[NUM_MACHINE_MODES];
! static int shift_cost[MAX_BITS_PER_WORD];
! static int shiftadd_cost[MAX_BITS_PER_WORD];
! static int shiftsub_cost[MAX_BITS_PER_WORD];
  static int mul_cost[NUM_MACHINE_MODES];
  static int div_cost[NUM_MACHINE_MODES];
  static int mul_widen_cost[NUM_MACHINE_MODES];
--- 94,102 ----
  static int zero_cost;
  static int add_cost[NUM_MACHINE_MODES];
  static int neg_cost[NUM_MACHINE_MODES];
! static int shift_cost[NUM_MACHINE_MODES][MAX_BITS_PER_WORD];
! static int shiftadd_cost[NUM_MACHINE_MODES][MAX_BITS_PER_WORD];
! static int shiftsub_cost[NUM_MACHINE_MODES][MAX_BITS_PER_WORD];
  static int mul_cost[NUM_MACHINE_MODES];
  static int div_cost[NUM_MACHINE_MODES];
  static int mul_widen_cost[NUM_MACHINE_MODES];
*************** void
*** 106,143 ****
  init_expmed (void)
  {
    rtx reg, shift_insn, shiftadd_insn, shiftsub_insn;
    int dummy;
!   int m;
    enum machine_mode mode, wider_mode;

    start_sequence ();

-   /* This is "some random pseudo register" for purposes of calling recog
-      to see what insns exist.  */
-   reg = gen_rtx_REG (word_mode, 10000);
-
    zero_cost = rtx_cost (const0_rtx, 0);

-   shift_insn = emit_insn (gen_rtx_SET (VOIDmode, reg,
- 				       gen_rtx_ASHIFT (word_mode, reg,
- 						       const0_rtx)));
-
-   shiftadd_insn
-     = emit_insn (gen_rtx_SET (VOIDmode, reg,
- 			      gen_rtx_PLUS (word_mode,
- 					    gen_rtx_MULT (word_mode,
- 							  reg, const0_rtx),
- 					    reg)));
-
-   shiftsub_insn
-     = emit_insn (gen_rtx_SET (VOIDmode, reg,
- 			      gen_rtx_MINUS (word_mode,
- 					     gen_rtx_MULT (word_mode,
- 							   reg, const0_rtx),
- 					     reg)));
-
    init_recog ();


    for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT);
         mode != VOIDmode;
--- 106,129 ----
  init_expmed (void)
  {
    rtx reg, shift_insn, shiftadd_insn, shiftsub_insn;
+   rtx shift_pat, shiftadd_pat, shiftsub_pat;
+   rtx pow2[MAX_BITS_PER_WORD];
+   rtx cint[MAX_BITS_PER_WORD];
    int dummy;
!   int m, n;
    enum machine_mode mode, wider_mode;

    start_sequence ();

    zero_cost = rtx_cost (const0_rtx, 0);

    init_recog ();

+   for (m = 1; m < MAX_BITS_PER_WORD; m++)
+     {
+       pow2[m] = GEN_INT ((HOST_WIDE_INT) 1 << m);
+       cint[m] = GEN_INT (m);
+     }

    for (mode = GET_CLASS_NARROWEST_MODE (MODE_INT);
         mode != VOIDmode;
*************** init_expmed (void)
*** 176,202 ****
  					   GEN_INT (GET_MODE_BITSIZE (mode)))),
  			SET);
  	}
-     }

!   shift_cost[0] = 0;
!   shiftadd_cost[0] = shiftsub_cost[0] = add_cost[word_mode];

!   for (m = 1; m < MAX_BITS_PER_WORD; m++)
!     {
!       rtx c_int = GEN_INT ((HOST_WIDE_INT) 1 << m);
!       shift_cost[m] = shiftadd_cost[m] = shiftsub_cost[m] = 32000;

!       XEXP (SET_SRC (PATTERN (shift_insn)), 1) = GEN_INT (m);
!       if (recog (PATTERN (shift_insn), shift_insn, &dummy) >= 0)
! 	shift_cost[m] = rtx_cost (SET_SRC (PATTERN (shift_insn)), SET);
!
!       XEXP (XEXP (SET_SRC (PATTERN (shiftadd_insn)), 0), 1) = c_int;
!       if (recog (PATTERN (shiftadd_insn), shiftadd_insn, &dummy) >= 0)
! 	shiftadd_cost[m] = rtx_cost (SET_SRC (PATTERN (shiftadd_insn)), SET);
!
!       XEXP (XEXP (SET_SRC (PATTERN (shiftsub_insn)), 0), 1) = c_int;
!       if (recog (PATTERN (shiftsub_insn), shiftsub_insn, &dummy) >= 0)
! 	shiftsub_cost[m] = rtx_cost (SET_SRC (PATTERN (shiftsub_insn)), SET);
      }

    end_sequence ();
--- 162,213 ----
  					   GEN_INT (GET_MODE_BITSIZE (mode)))),
  			SET);
  	}

! 	shift_insn = emit_insn (gen_rtx_SET (VOIDmode, reg,
! 					     gen_rtx_ASHIFT (mode, reg,
! 							     const0_rtx)));
!
! 	shiftadd_insn
! 	  = emit_insn (gen_rtx_SET (VOIDmode, reg,
! 				    gen_rtx_PLUS (mode,
! 						  gen_rtx_MULT (mode,
! 								reg,
! 								const0_rtx),
! 						  reg)));
!
! 	shiftsub_insn
! 	  = emit_insn (gen_rtx_SET (VOIDmode, reg,
! 				    gen_rtx_MINUS (mode,
! 						   gen_rtx_MULT (mode,
! 								 reg,
! 								 const0_rtx),
! 						   reg)));
!
! 	shift_pat = PATTERN (shift_insn);
! 	shiftadd_pat = PATTERN (shiftadd_insn);
! 	shiftsub_pat = PATTERN (shiftsub_insn);

! 	shift_cost[mode][0] = 0;
! 	shiftadd_cost[mode][0] = shiftsub_cost[mode][0] = add_cost[mode];

! 	n = MIN (MAX_BITS_PER_WORD, GET_MODE_BITSIZE (mode));
! 	for (m = 1; m < n; m++)
! 	  {
! 	    shift_cost[mode][m] = 32000;
! 	    XEXP (SET_SRC (shift_pat), 1) = cint[m];
! 	    if (recog (shift_pat, shift_insn, &dummy) >= 0)
! 	      shift_cost[mode][m] = rtx_cost (SET_SRC (shift_pat), SET);
!
! 	    shiftadd_cost[mode][m] = 32000;
! 	    XEXP (XEXP (SET_SRC (shiftadd_pat), 0), 1) = pow2[m];
! 	    if (recog (shiftadd_pat, shiftadd_insn, &dummy) >= 0)
! 	      shiftadd_cost[mode][m] = rtx_cost (SET_SRC (shiftadd_pat), SET);
!
! 	    shiftsub_cost[mode][m] = 32000;
! 	    XEXP (XEXP (SET_SRC (shiftsub_pat), 0), 1) = pow2[m];
! 	    if (recog (shiftsub_pat, shiftsub_insn, &dummy) >= 0)
! 	      shiftsub_cost[mode][m] = rtx_cost (SET_SRC (shiftsub_pat), SET);
! 	  }
      }

    end_sequence ();
*************** synth_mult (struct algorithm *alg_out, u
*** 2226,2232 ****
        if (m < BITS_PER_WORD)
  	{
  	  q = t >> m;
! 	  cost = shift_cost[m];
  	  synth_mult (alg_in, q, cost_limit - cost, mode);

  	  cost += alg_in->cost;
--- 2237,2243 ----
        if (m < BITS_PER_WORD)
  	{
  	  q = t >> m;
! 	  cost = shift_cost[mode][m];
  	  synth_mult (alg_in, q, cost_limit - cost, mode);

  	  cost += alg_in->cost;
*************** synth_mult (struct algorithm *alg_out, u
*** 2310,2318 ****
        d = ((unsigned HOST_WIDE_INT) 1 << m) + 1;
        if (t % d == 0 && t > d && m < BITS_PER_WORD)
  	{
! 	  cost = add_cost[mode] + shift_cost[m];
! 	  if (shiftadd_cost[m] < cost)
! 	    cost = shiftadd_cost[m];
  	  synth_mult (alg_in, t / d, cost_limit - cost, mode);

  	  cost += alg_in->cost;
--- 2321,2329 ----
        d = ((unsigned HOST_WIDE_INT) 1 << m) + 1;
        if (t % d == 0 && t > d && m < BITS_PER_WORD)
  	{
! 	  cost = add_cost[mode] + shift_cost[mode][m];
! 	  if (shiftadd_cost[mode][m] < cost)
! 	    cost = shiftadd_cost[mode][m];
  	  synth_mult (alg_in, t / d, cost_limit - cost, mode);

  	  cost += alg_in->cost;
*************** synth_mult (struct algorithm *alg_out, u
*** 2331,2339 ****
        d = ((unsigned HOST_WIDE_INT) 1 << m) - 1;
        if (t % d == 0 && t > d && m < BITS_PER_WORD)
  	{
! 	  cost = add_cost[mode] + shift_cost[m];
! 	  if (shiftsub_cost[m] < cost)
! 	    cost = shiftsub_cost[m];
  	  synth_mult (alg_in, t / d, cost_limit - cost, mode);

  	  cost += alg_in->cost;
--- 2342,2350 ----
        d = ((unsigned HOST_WIDE_INT) 1 << m) - 1;
        if (t % d == 0 && t > d && m < BITS_PER_WORD)
  	{
! 	  cost = add_cost[mode] + shift_cost[mode][m];
! 	  if (shiftsub_cost[mode][m] < cost)
! 	    cost = shiftsub_cost[mode][m];
  	  synth_mult (alg_in, t / d, cost_limit - cost, mode);

  	  cost += alg_in->cost;
*************** synth_mult (struct algorithm *alg_out, u
*** 2358,2364 ****
        m = exact_log2 (q);
        if (m >= 0 && m < BITS_PER_WORD)
  	{
! 	  cost = shiftadd_cost[m];
  	  synth_mult (alg_in, (t - 1) >> m, cost_limit - cost, mode);

  	  cost += alg_in->cost;
--- 2369,2375 ----
        m = exact_log2 (q);
        if (m >= 0 && m < BITS_PER_WORD)
  	{
! 	  cost = shiftadd_cost[mode][m];
  	  synth_mult (alg_in, (t - 1) >> m, cost_limit - cost, mode);

  	  cost += alg_in->cost;
*************** synth_mult (struct algorithm *alg_out, u
*** 2377,2383 ****
        m = exact_log2 (q);
        if (m >= 0 && m < BITS_PER_WORD)
  	{
! 	  cost = shiftsub_cost[m];
  	  synth_mult (alg_in, (t + 1) >> m, cost_limit - cost, mode);

  	  cost += alg_in->cost;
--- 2388,2394 ----
        m = exact_log2 (q);
        if (m >= 0 && m < BITS_PER_WORD)
  	{
! 	  cost = shiftsub_cost[mode][m];
  	  synth_mult (alg_in, (t + 1) >> m, cost_limit - cost, mode);

  	  cost += alg_in->cost;
*************** expand_mult_highpart_optab (enum machine
*** 2911,2917 ****
    /* Secondly, same as above, but use sign flavor opposite of unsignedp.
       Need to adjust the result after the multiplication.  */
    if (size - 1 < BITS_PER_WORD
!       && (mul_highpart_cost[mode] + 2 * shift_cost[size-1]
  	  + 4 * add_cost[mode] < max_cost))
      {
        moptab = unsignedp ? smul_highpart_optab : umul_highpart_optab;
--- 2922,2928 ----
    /* Secondly, same as above, but use sign flavor opposite of unsignedp.
       Need to adjust the result after the multiplication.  */
    if (size - 1 < BITS_PER_WORD
!       && (mul_highpart_cost[mode] + 2 * shift_cost[mode][size-1]
  	  + 4 * add_cost[mode] < max_cost))
      {
        moptab = unsignedp ? smul_highpart_optab : umul_highpart_optab;
*************** expand_mult_highpart_optab (enum machine
*** 2938,2944 ****
    moptab = smul_optab;
    if (smul_optab->handlers[wider_mode].insn_code != CODE_FOR_nothing
        && size - 1 < BITS_PER_WORD
!       && mul_cost[wider_mode] + shift_cost[size-1] < max_cost)
      {
        tem = expand_binop (wider_mode, moptab, op0, op1, 0,
  			  unsignedp, OPTAB_WIDEN);
--- 2949,2955 ----
    moptab = smul_optab;
    if (smul_optab->handlers[wider_mode].insn_code != CODE_FOR_nothing
        && size - 1 < BITS_PER_WORD
!       && mul_cost[wider_mode] + shift_cost[mode][size-1] < max_cost)
      {
        tem = expand_binop (wider_mode, moptab, op0, op1, 0,
  			  unsignedp, OPTAB_WIDEN);
*************** expand_mult_highpart_optab (enum machine
*** 2950,2956 ****
    moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
    if (moptab->handlers[wider_mode].insn_code != CODE_FOR_nothing
        && size - 1 < BITS_PER_WORD
!       && (mul_widen_cost[wider_mode] + 2 * shift_cost[size-1]
  	  + 4 * add_cost[mode] < max_cost))
      {
        tem = expand_binop (wider_mode, moptab, op0, narrow_op1,
--- 2961,2967 ----
    moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
    if (moptab->handlers[wider_mode].insn_code != CODE_FOR_nothing
        && size - 1 < BITS_PER_WORD
!       && (mul_widen_cost[wider_mode] + 2 * shift_cost[mode][size-1]
  	  + 4 * add_cost[mode] < max_cost))
      {
        tem = expand_binop (wider_mode, moptab, op0, narrow_op1,
*************** expand_mult_highpart (enum machine_mode
*** 3004,3010 ****
      return expand_mult_highpart_optab (mode, op0, op1, target,
  				       unsignedp, max_cost);

!   extra_cost = shift_cost[GET_MODE_BITSIZE (mode) - 1];

    /* Check whether we try to multiply by a negative constant.  */
    if (!unsignedp && ((cnst1 >> (GET_MODE_BITSIZE (mode) - 1)) & 1))
--- 3015,3021 ----
      return expand_mult_highpart_optab (mode, op0, op1, target,
  				       unsignedp, max_cost);

!   extra_cost = shift_cost[mode][GET_MODE_BITSIZE (mode) - 1];

    /* Check whether we try to multiply by a negative constant.  */
    if (!unsignedp && ((cnst1 >> (GET_MODE_BITSIZE (mode) - 1)) & 1))
*************** expand_divmod (int rem_flag, enum tree_c
*** 3344,3352 ****
  			    if (post_shift - 1 >= BITS_PER_WORD)
  			      goto fail1;

! 			    extra_cost = (shift_cost[post_shift - 1]
! 					  + shift_cost[1]
! 					  + 2 * add_cost[compute_mode]);
  			    t1 = expand_mult_highpart (compute_mode, op0, ml,
  						       NULL_RTX, 1,
  						       max_cost - extra_cost);
--- 3355,3364 ----
  			    if (post_shift - 1 >= BITS_PER_WORD)
  			      goto fail1;

! 			    extra_cost
! 			      = (shift_cost[compute_mode][post_shift - 1]
! 				 + shift_cost[compute_mode][1]
! 				 + 2 * add_cost[compute_mode]);
  			    t1 = expand_mult_highpart (compute_mode, op0, ml,
  						       NULL_RTX, 1,
  						       max_cost - extra_cost);
*************** expand_divmod (int rem_flag, enum tree_c
*** 3376,3383 ****
  			    t1 = expand_shift (RSHIFT_EXPR, compute_mode, op0,
  					       build_int_2 (pre_shift, 0),
  					       NULL_RTX, 1);
! 			    extra_cost = (shift_cost[pre_shift]
! 					  + shift_cost[post_shift]);
  			    t2 = expand_mult_highpart (compute_mode, t1, ml,
  						       NULL_RTX, 1,
  						       max_cost - extra_cost);
--- 3388,3396 ----
  			    t1 = expand_shift (RSHIFT_EXPR, compute_mode, op0,
  					       build_int_2 (pre_shift, 0),
  					       NULL_RTX, 1);
! 			    extra_cost
! 			      = (shift_cost[compute_mode][pre_shift]
! 				 + shift_cost[compute_mode][post_shift]);
  			    t2 = expand_mult_highpart (compute_mode, t1, ml,
  						       NULL_RTX, 1,
  						       max_cost - extra_cost);
*************** expand_divmod (int rem_flag, enum tree_c
*** 3511,3518 ****
  			    || size - 1 >= BITS_PER_WORD)
  			  goto fail1;

! 			extra_cost = (shift_cost[post_shift]
! 				      + shift_cost[size - 1]
  				      + add_cost[compute_mode]);
  			t1 = expand_mult_highpart (compute_mode, op0, ml,
  						   NULL_RTX, 0,
--- 3524,3531 ----
  			    || size - 1 >= BITS_PER_WORD)
  			  goto fail1;

! 			extra_cost = (shift_cost[compute_mode][post_shift]
! 				      + shift_cost[compute_mode][size - 1]
  				      + add_cost[compute_mode]);
  			t1 = expand_mult_highpart (compute_mode, op0, ml,
  						   NULL_RTX, 0,
*************** expand_divmod (int rem_flag, enum tree_c
*** 3543,3550 ****
  			  goto fail1;

  			ml |= (~(unsigned HOST_WIDE_INT) 0) << (size - 1);
! 			extra_cost = (shift_cost[post_shift]
! 				      + shift_cost[size - 1]
  				      + 2 * add_cost[compute_mode]);
  			t1 = expand_mult_highpart (compute_mode, op0, ml,
  						   NULL_RTX, 0,
--- 3556,3563 ----
  			  goto fail1;

  			ml |= (~(unsigned HOST_WIDE_INT) 0) << (size - 1);
! 			extra_cost = (shift_cost[compute_mode][post_shift]
! 				      + shift_cost[compute_mode][size - 1]
  				      + 2 * add_cost[compute_mode]);
  			t1 = expand_mult_highpart (compute_mode, op0, ml,
  						   NULL_RTX, 0,
*************** expand_divmod (int rem_flag, enum tree_c
*** 3634,3641 ****
  					   NULL_RTX, 0);
  			t2 = expand_binop (compute_mode, xor_optab, op0, t1,
  					   NULL_RTX, 0, OPTAB_WIDEN);
! 			extra_cost = (shift_cost[post_shift]
! 				      + shift_cost[size - 1]
  				      + 2 * add_cost[compute_mode]);
  			t3 = expand_mult_highpart (compute_mode, t2, ml,
  						   NULL_RTX, 1,
--- 3647,3654 ----
  					   NULL_RTX, 0);
  			t2 = expand_binop (compute_mode, xor_optab, op0, t1,
  					   NULL_RTX, 0, OPTAB_WIDEN);
! 			extra_cost = (shift_cost[compute_mode][post_shift]
! 				      + shift_cost[compute_mode][size - 1]
  				      + 2 * add_cost[compute_mode]);
  			t3 = expand_mult_highpart (compute_mode, t2, ml,
  						   NULL_RTX, 1,

Roger
--
Roger Sayle,                         E-mail: roger@eyesopen.com
OpenEye Scientific Software,         WWW: http://www.eyesopen.com/
Suite 1107, 3600 Cerrillos Road,     Tel: (+1) 505-473-7385
Santa Fe, New Mexico, 87507.         Fax: (+1) 505-473-0833

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Index expmed.c's shift_cost by machine mode
  2004-06-13  3:27 [PATCH] Index expmed.c's shift_cost by machine mode Roger Sayle
@ 2004-06-13 14:08 ` Richard Henderson
  2004-06-13 19:47 ` Eric Botcazou
  1 sibling, 0 replies; 8+ messages in thread
From: Richard Henderson @ 2004-06-13 14:08 UTC (permalink / raw)
  To: Roger Sayle; +Cc: gcc-patches

On Sat, Jun 12, 2004 at 12:20:50PM -0600, Roger Sayle wrote:
> 	* expmed.c (shift_cost, shiftadd_cost, shiftsub_cost): Additionally
> 	index by machine mode.
> 	(init_expmed): Initialize shift_cost, shiftadd_cost and shiftsub_cost
> 	tables inside the loop over machine modes.
> 	(synth_mult, expand_mult_highpart_optab, expand_mult_highpart,
> 	expand_divmod): Index shift*_cost by the appropriate machine mode.

Ok.


r~

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Index expmed.c's shift_cost by machine mode
  2004-06-13  3:27 [PATCH] Index expmed.c's shift_cost by machine mode Roger Sayle
  2004-06-13 14:08 ` Richard Henderson
@ 2004-06-13 19:47 ` Eric Botcazou
  2004-06-13 21:43   ` Roger Sayle
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Botcazou @ 2004-06-13 19:47 UTC (permalink / raw)
  To: Roger Sayle; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2632 bytes --]

> 2004-06-12  Roger Sayle  <roger@eyesopen.com>
>
> 	* expmed.c (shift_cost, shiftadd_cost, shiftsub_cost): Additionally
> 	index by machine mode.
> 	(init_expmed): Initialize shift_cost, shiftadd_cost and shiftsub_cost
> 	tables inside the loop over machine modes.
> 	(synth_mult, expand_mult_highpart_optab, expand_mult_highpart,
> 	expand_divmod): Index shift*_cost by the appropriate machine mode.

Broke bootstrap on SPARC 64-bit: lshift_significand is miscompiled.

It's specifically these lines:

> ! 	n = MIN (MAX_BITS_PER_WORD, GET_MODE_BITSIZE (mode));
> 	for (m = 1; m < n; m++)

This means that, on 64-bit targets, shifts by 32 in SImode have zero cost.  
So they are chosen in synt_mult when multiplying by -1 because the clamp is 
on BITS_PER_WORD, not GET_MODE_BITSIZE (mode):

      if (t % d == 0 && t > d && m < BITS_PER_WORD)

Then expand_shift enters the game and returns its unmodified argument.  So we 
end up with this RTL in 09.loop at -O2:

(insn 176 165 177 (set (reg:SI 169 [ ofs ])
        (reg/v:SI 113 [ ofs ])) -1 (nil)
    (nil))

(insn 177 176 178 (set (reg:SI 170)
        (minus:SI (reg:SI 169 [ ofs ])
            (reg/v:SI 113 [ ofs ]))) -1 (nil)
    (expr_list:REG_EQUAL (mult:SI (reg/v:SI 113 [ ofs ])
            (const_int 4294967295 [0xffffffff]))
        (nil)))

(insn 178 177 179 (set (reg:SI 171)
        (plus:SI (reg:SI 170)
            (reg:SI 170))) -1 (nil)
    (expr_list:REG_EQUAL (mult:SI (reg/v:SI 113 [ ofs ])
            (const_int -1 [0xffffffffffffffff]))
        (nil)))

(insn 179 178 181 (set (reg:SI 168)
        (plus:SI (reg:SI 171)
            (const_int 1 [0x1]))) -1 (nil)
    (nil))

(insn 181 179 183 (set (reg:SI 172)
        (neg:SI (reg/v:SI 113 [ ofs ]))) -1 (nil)
    (nil))

(insn 183 181 184 (set (reg:SI 174 [ ofs ])
        (reg/v:SI 113 [ ofs ])) -1 (nil)
    (nil))

(insn 184 183 185 (set (reg:SI 175)
        (minus:SI (reg:SI 174 [ ofs ])
            (reg/v:SI 113 [ ofs ]))) -1 (nil)
    (expr_list:REG_EQUAL (mult:SI (reg/v:SI 113 [ ofs ])
            (const_int 4294967295 [0xffffffff]))
        (nil)))

(insn 185 184 186 (set (reg:SI 176)
        (plus:SI (reg:SI 175)
            (reg:SI 175))) -1 (nil)
    (expr_list:REG_EQUAL (mult:SI (reg/v:SI 113 [ ofs ])
            (const_int -1 [0xffffffffffffffff]))
        (nil)))

for

  (mult:SI (reg/v:SI 113 [ ofs ]) (const_int -1))


Testcase attached, miscompiled by cross to sparc64-sun-solaris2.9 at -O2.


Restoring the original line

	for (m = 1; m < MAX_BITS_PER_WORD; m++)

fixes the problem.


Do you want to try anything else or can I commit the fix after testing?

-- 
Eric Botcazou

[-- Attachment #2: real_bug.c --]
[-- Type: text/x-csrc, Size: 1582 bytes --]

#define HOST_BITS_PER_LONG  64

#define SIGNIFICAND_BITS	(128 + HOST_BITS_PER_LONG)
#define EXP_BITS		(32 - 5)
#define MAX_EXP			((1 << (EXP_BITS - 1)) - 1)
#define SIGSZ			(SIGNIFICAND_BITS / HOST_BITS_PER_LONG)
#define SIG_MSB			((unsigned long)1 << (HOST_BITS_PER_LONG - 1))

struct real_value
{
  /* Use the same underlying type for all bit-fields, so as to make
     sure they're packed together, otherwise REAL_VALUE_TYPE_SIZE will
     be miscomputed.  */
  unsigned int /* ENUM_BITFIELD (real_value_class) */ class : 2;
  unsigned int sign : 1;
  unsigned int signalling : 1;
  unsigned int canonical : 1;
  unsigned int uexp : EXP_BITS;
  unsigned long sig[SIGSZ];
};

/* Various headers condition prototypes on #ifdef REAL_VALUE_TYPE, so it
   needs to be a macro.  We do need to continue to have a structure tag
   so that other headers can forward declare it.  */
#define REAL_VALUE_TYPE struct real_value

/* Left-shift the significand of A by N bits; put the result in the
   significand of R.  */

void
lshift_significand (REAL_VALUE_TYPE *r, const REAL_VALUE_TYPE *a,
		    unsigned int n)
{
  unsigned int i, ofs = n / HOST_BITS_PER_LONG;

  n &= HOST_BITS_PER_LONG - 1;
  if (n == 0)
    {
      for (i = 0; ofs + i < SIGSZ; ++i)
	r->sig[SIGSZ-1-i] = a->sig[SIGSZ-1-i-ofs];
      for (; i < SIGSZ; ++i)
	r->sig[SIGSZ-1-i] = 0;
    }
  else
    for (i = 0; i < SIGSZ; ++i)
      {
	r->sig[SIGSZ-1-i]
	  = (((ofs + i >= SIGSZ ? 0 : a->sig[SIGSZ-1-i-ofs]) << n)
	     | ((ofs + i + 1 >= SIGSZ ? 0 : a->sig[SIGSZ-1-i-ofs-1])
		>> (HOST_BITS_PER_LONG - n)));
      }
}

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Index expmed.c's shift_cost by machine mode
  2004-06-13 19:47 ` Eric Botcazou
@ 2004-06-13 21:43   ` Roger Sayle
  2004-06-13 22:48     ` Eric Botcazou
  0 siblings, 1 reply; 8+ messages in thread
From: Roger Sayle @ 2004-06-13 21:43 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches


Hi Eric,

On Sun, 13 Jun 2004, Eric Botcazou wrote:
> > 2004-06-12  Roger Sayle  <roger@eyesopen.com>
> > 	* expmed.c (shift_cost, shiftadd_cost, shiftsub_cost): Additionally
> > 	index by machine mode.
> > 	(init_expmed): Initialize shift_cost, shiftadd_cost and shiftsub_cost
> > 	tables inside the loop over machine modes.
> > 	(synth_mult, expand_mult_highpart_optab, expand_mult_highpart,
> > 	expand_divmod): Index shift*_cost by the appropriate machine mode.
>
> Broke bootstrap on SPARC 64-bit: lshift_significand is miscompiled.
>
> It's specifically these lines:
>
> > ! 	n = MIN (MAX_BITS_PER_WORD, GET_MODE_BITSIZE (mode));
> > 	for (m = 1; m < n; m++)
>
> This means that, on 64-bit targets, shifts by 32 in SImode have zero cost.
> So they are chosen in synt_mult when multiplying by -1 because the clamp is
> on BITS_PER_WORD, not GET_MODE_BITSIZE (mode):
>
>       if (t % d == 0 && t > d && m < BITS_PER_WORD)

Doh!  Obviously it makes no sense to shift a register by more than it's
mode's width.  The correct fix is to change each of these tests in
expmed.c to test against GET_MODE_BITSIZE (mode) instead of against
BITS_PER_WORD.  Or more safely, the lower of the two.

> Restoring the original line
>
> 	for (m = 1; m < MAX_BITS_PER_WORD; m++)
>
> fixes the problem.
>
> Do you want to try anything else or can I commit the fix after testing?


I think this change is reasonable if you wish to restore bootstrap as
quickly as possible.  However my preference would be to add

  int maxm = MIN (MAX_BITS_PER_WORD, GET_MODE_BITSIZE (mode));

to the top of synth_mult, and then replace all but the last use of
MAX_BITS_PER_WORD in synth_mult with maxm.


Note that SImode multiplication by 0xffffffff should generate the same
sequence of shifts/adds as -1, even on host's where these aren't the
same value!

It's impressive that the backend's have never encountered problems
being asked to estimate the costs of clearly invalid shifts.  There
may even be a few platforms the benefit as synth_mult is no longer
misguided by cost estimates based on operand value etc...


If you'd prefer to commit your fix, whilst I bootstrap and regression
test the proposed solution above, let me know.  I'd appreciate it if
you could confirm that it also restores sparc64 bootstrap.

Sorry for the breakage.

Roger
--

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Index expmed.c's shift_cost by machine mode
  2004-06-13 21:43   ` Roger Sayle
@ 2004-06-13 22:48     ` Eric Botcazou
  2004-06-14  4:14       ` [PATCH] Fix sparc64 bootstrap failure in synth_mult Roger Sayle
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Botcazou @ 2004-06-13 22:48 UTC (permalink / raw)
  To: Roger Sayle; +Cc: gcc-patches

> Hi Eric,

Hi Roger,

> Doh!  Obviously it makes no sense to shift a register by more than it's
> mode's width.  The correct fix is to change each of these tests in
> expmed.c to test against GET_MODE_BITSIZE (mode) instead of against
> BITS_PER_WORD.  Or more safely, the lower of the two.

Agreed.

> If you'd prefer to commit your fix, whilst I bootstrap and regression
> test the proposed solution above, let me know.  I'd appreciate it if
> you could confirm that it also restores sparc64 bootstrap.

I'll use my fix locally for the time being because I need to test other 
things.  I'll test your patch tomorrow on the SPARC boxes, but I can test it 
now with a cross.

> Sorry for the breakage.

No problem, Sunday was not very interesting anyway ;-)

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] Fix sparc64 bootstrap failure in synth_mult
  2004-06-13 22:48     ` Eric Botcazou
@ 2004-06-14  4:14       ` Roger Sayle
  2004-06-15 12:27         ` Eric Botcazou
  0 siblings, 1 reply; 8+ messages in thread
From: Roger Sayle @ 2004-06-14  4:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: Eric Botcazou, Richard Henderson



The following patch should address the current bootstrap failure
on sparc64 caused by my recent patch to index shift_cost by the
multiplication's machine mode.

In my opinion, the underlying problem is a signedness issue with
synth_mult's multiplier co-efficient, that has been harmlessly(?)
latent for a long time.  This function's "t" argument is an unsigned
HOST_WIDE_INT, whose bits represent the value the multiplicand.
On many platforms, multiplications are typically performed in
word_mode, which matches the width of HOST_WIDE_INT, so there's
no distintion between passing signed vs. unsigned multipliers
to synth_mult.

However, on sparc64 and other platforms where the host's HOST_WIDE_INT
is wider than the mode of an operation, the sign extension of negative
multipliers produces extra one-bits with confuse the synth_mult algorithm.
For years on these platforms, synth_mult has been relying on the
backend's rtx_cost to return "reasonable" values for invalid RTL such as
(ASHIFT:SI (REG:SI x) (const_int 40)), [where the shift is by a value
wider than the presumed 32-bit SImode].


With the recent transition to look-up tables, that sensibly avoid
determining shift_cost[SImode][40], this latent problem has been
exposed, and trigger's a miscompilation on sparc64.


The fix below is two fold.  The main "obvious" fix is at the start of
synth_mult function to mask the multiplier to width of the machine
mode, using GET_MODE_MASK.  Hence for QImode multiplications by -1,
we consider the multiplier to have 8 bits set, not 32 or 64.  This
also improves termination conditions on the algorithm.  The second part
of the fix, just to be doubly safe, is to tweak the sanity checks in
synth_mult such that we never access shift_cost[mode][m] for invalid
m, i.e. m must be < GET_MODE_BITSIZE (mode).


The following patch has been tested on i686-pc-linux-gnu with a full
"make bootstrap", all default languages, and regression tested with a
top-level "make -k check" with no new failures.  From Eric Botcazou's
analysis and description of the sparc64 bootstrap failure, either of
these changes in the patch should (in theory) resolve the issue.


Ok for mainline?



2004-06-13  Roger Sayle  <roger@eyesopen.com>

	* expmed.c (synth_mult): Mask bits of the multiplier to the
	machine mode of the multiplication.  Don't consider shifts
	by more than (or equal two) the width of the operation's mode.


Index: expmed.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/expmed.c,v
retrieving revision 1.165
diff -c -3 -p -r1.165 expmed.c
*** expmed.c	13 Jun 2004 02:46:08 -0000	1.165
--- expmed.c	13 Jun 2004 21:04:36 -0000
*************** synth_mult (struct algorithm *alg_out, u
*** 2191,2196 ****
--- 2191,2197 ----
    struct algorithm *alg_in, *best_alg;
    int cost;
    unsigned HOST_WIDE_INT q;
+   int maxm = MIN (BITS_PER_WORD, GET_MODE_BITSIZE (mode));

    /* Indicate that no algorithm is yet found.  If no algorithm
       is found, this value will be returned and indicate failure.  */
*************** synth_mult (struct algorithm *alg_out, u
*** 2199,2204 ****
--- 2200,2208 ----
    if (cost_limit <= 0)
      return;

+   /* Restrict the bits of "t" to the multiplication's mode.  */
+   t &= GET_MODE_MASK (mode);
+
    /* t == 1 can be done in zero cost.  */
    if (t == 1)
      {
*************** synth_mult (struct algorithm *alg_out, u
*** 2234,2240 ****
    if ((t & 1) == 0)
      {
        m = floor_log2 (t & -t);	/* m = number of low zero bits */
!       if (m < BITS_PER_WORD)
  	{
  	  q = t >> m;
  	  cost = shift_cost[mode][m];
--- 2238,2244 ----
    if ((t & 1) == 0)
      {
        m = floor_log2 (t & -t);	/* m = number of low zero bits */
!       if (m < maxm)
  	{
  	  q = t >> m;
  	  cost = shift_cost[mode][m];
*************** synth_mult (struct algorithm *alg_out, u
*** 2319,2325 ****
        unsigned HOST_WIDE_INT d;

        d = ((unsigned HOST_WIDE_INT) 1 << m) + 1;
!       if (t % d == 0 && t > d && m < BITS_PER_WORD)
  	{
  	  cost = add_cost[mode] + shift_cost[mode][m];
  	  if (shiftadd_cost[mode][m] < cost)
--- 2323,2329 ----
        unsigned HOST_WIDE_INT d;

        d = ((unsigned HOST_WIDE_INT) 1 << m) + 1;
!       if (t % d == 0 && t > d && m < maxm)
  	{
  	  cost = add_cost[mode] + shift_cost[mode][m];
  	  if (shiftadd_cost[mode][m] < cost)
*************** synth_mult (struct algorithm *alg_out, u
*** 2340,2346 ****
  	}

        d = ((unsigned HOST_WIDE_INT) 1 << m) - 1;
!       if (t % d == 0 && t > d && m < BITS_PER_WORD)
  	{
  	  cost = add_cost[mode] + shift_cost[mode][m];
  	  if (shiftsub_cost[mode][m] < cost)
--- 2344,2350 ----
  	}

        d = ((unsigned HOST_WIDE_INT) 1 << m) - 1;
!       if (t % d == 0 && t > d && m < maxm)
  	{
  	  cost = add_cost[mode] + shift_cost[mode][m];
  	  if (shiftsub_cost[mode][m] < cost)
*************** synth_mult (struct algorithm *alg_out, u
*** 2367,2373 ****
        q = t - 1;
        q = q & -q;
        m = exact_log2 (q);
!       if (m >= 0 && m < BITS_PER_WORD)
  	{
  	  cost = shiftadd_cost[mode][m];
  	  synth_mult (alg_in, (t - 1) >> m, cost_limit - cost, mode);
--- 2371,2377 ----
        q = t - 1;
        q = q & -q;
        m = exact_log2 (q);
!       if (m >= 0 && m < maxm)
  	{
  	  cost = shiftadd_cost[mode][m];
  	  synth_mult (alg_in, (t - 1) >> m, cost_limit - cost, mode);
*************** synth_mult (struct algorithm *alg_out, u
*** 2386,2392 ****
        q = t + 1;
        q = q & -q;
        m = exact_log2 (q);
!       if (m >= 0 && m < BITS_PER_WORD)
  	{
  	  cost = shiftsub_cost[mode][m];
  	  synth_mult (alg_in, (t + 1) >> m, cost_limit - cost, mode);
--- 2390,2396 ----
        q = t + 1;
        q = q & -q;
        m = exact_log2 (q);
!       if (m >= 0 && m < maxm)
  	{
  	  cost = shiftsub_cost[mode][m];
  	  synth_mult (alg_in, (t + 1) >> m, cost_limit - cost, mode);


Roger
--
Roger Sayle,                         E-mail: roger@eyesopen.com
OpenEye Scientific Software,         WWW: http://www.eyesopen.com/
Suite 1107, 3600 Cerrillos Road,     Tel: (+1) 505-473-7385
Santa Fe, New Mexico, 87507.         Fax: (+1) 505-473-0833

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix sparc64 bootstrap failure in synth_mult
  2004-06-14  4:14       ` [PATCH] Fix sparc64 bootstrap failure in synth_mult Roger Sayle
@ 2004-06-15 12:27         ` Eric Botcazou
  2004-06-15 16:24           ` Roger Sayle
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Botcazou @ 2004-06-15 12:27 UTC (permalink / raw)
  To: Roger Sayle; +Cc: Richard Henderson, gcc-patches

> The following patch should address the current bootstrap failure
> on sparc64 caused by my recent patch to index shift_cost by the
> multiplication's machine mode.

Confirmed.  It even doesn't break SPARC 32-bit so I'm happy :-)

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix sparc64 bootstrap failure in synth_mult
  2004-06-15 12:27         ` Eric Botcazou
@ 2004-06-15 16:24           ` Roger Sayle
  0 siblings, 0 replies; 8+ messages in thread
From: Roger Sayle @ 2004-06-15 16:24 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: Richard Henderson, gcc-patches


On Tue, 15 Jun 2004, Eric Botcazou wrote:
> > The following patch should address the current bootstrap failure
> > on sparc64 caused by my recent patch to index shift_cost by the
> > multiplication's machine mode.
>
> Confirmed.  It even doesn't break SPARC 32-bit so I'm happy :-)

Excellent.  That's good enough for me.  Commited to mainline CVS.
Thanks for your help, and sorry again for the inconvenience.

Roger
--

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2004-06-15 15:09 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-06-13  3:27 [PATCH] Index expmed.c's shift_cost by machine mode Roger Sayle
2004-06-13 14:08 ` Richard Henderson
2004-06-13 19:47 ` Eric Botcazou
2004-06-13 21:43   ` Roger Sayle
2004-06-13 22:48     ` Eric Botcazou
2004-06-14  4:14       ` [PATCH] Fix sparc64 bootstrap failure in synth_mult Roger Sayle
2004-06-15 12:27         ` Eric Botcazou
2004-06-15 16:24           ` Roger Sayle

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).