public inbox for gcc-patches@gcc.gnu.org
* divide 64-bit by constant for 32-bit target machines
@ 2012-04-20 12:57 Dinar Temirbulatov
  2012-04-23 14:30 ` Andrew Haley
  2012-04-24  1:49 ` Michael Hope
  0 siblings, 2 replies; 19+ messages in thread
From: Dinar Temirbulatov @ 2012-04-20 12:57 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 550 bytes --]

Hi,
Here is a patch that adds support for dividing 64-bit values by a
constant on 32-bit target machines. The patch was tested on arm-7a with
no new regressions. I am also not sure how to avoid targets such as
i686, where the div operation is fast compared to other targets: there
__udivdi3 from libc/sysdeps/wordsize-32/divdi3.c showed better
performance than my implementation on the compiler side, and it is not
clear to me how to tell this from the udiv_cost[speed][SImode] value.
                                                              thanks, Dinar.
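[For context, the transformation being implemented replaces the runtime
divide with a multiply by a precomputed reciprocal, keeping only the high
half of the double-width product. A minimal sketch for x / 3, not taken
from the patch: it uses GCC's __int128 to stand in for the double-word
high-part multiply the patch expands, with the standard magic constant
ceil(2^65 / 3) = 0xAAAAAAAAAAAAAAAB and a post-shift of 1.]

```c
#include <stdint.h>

/* Divide by the constant 3 without a divide instruction: take the
   high 64 bits of x * ceil(2^65 / 3), then shift right once more.
   __int128 stands in here for the double-word multiply that the
   patch builds out of 32-bit operations.  */
static uint64_t
udiv3 (uint64_t x)
{
  uint64_t hi = (uint64_t) (((unsigned __int128) x
                             * 0xAAAAAAAAAAAAAAABULL) >> 64);
  return hi >> 1;
}
```

For every 64-bit x this agrees with x / 3; the expmed.c code under
discussion generates the equivalent sequence from four 32-bit multiplies.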

[-- Attachment #2: 11.patch --]
[-- Type: application/octet-stream, Size: 2498 bytes --]

diff -rup gcc-20120418-orig/gcc/expmed.c gcc-20120418/gcc/expmed.c
--- gcc-20120418-orig/gcc/expmed.c	2012-04-20 14:00:49.125256428 +0400
+++ gcc-20120418/gcc/expmed.c	2012-04-20 16:15:08.305042269 +0400
@@ -3523,6 +3523,68 @@ expand_mult_highpart_optab (enum machine
 	}
     }
 
+  if (size - 1 > BITS_PER_WORD
+      && BITS_PER_WORD == 32 && mode == DImode)
+    {
+      unsigned HOST_WIDE_INT d;
+      rtx x1, x0, y1, y0, z2, z0, tmp, tmp1, u0, u0tmp, u1, c, c1, ccst, cres, result, seq;
+
+      d = (INTVAL (op1) & GET_MODE_MASK (DImode));
+      start_sequence ();
+      x1 = gen_highpart (SImode, op0);
+      x1 = force_reg (SImode, x1);
+      x0 = gen_lowpart (SImode, op0);
+      x0 = force_reg (SImode, x0);
+
+      x1 = convert_to_mode (DImode, x1, 0);
+      x0 = convert_to_mode (DImode, x0, 0);
+
+      y0 = gen_rtx_CONST_INT (DImode, d&UINT_MAX);
+      y1 = gen_rtx_CONST_INT (DImode, d>>32);
+
+      z2 = gen_reg_rtx (DImode);
+      u0 = gen_reg_rtx (DImode);
+
+      z2 = expand_mult(DImode, x1, y1, z2, 0);
+      u0 = expand_mult(DImode, x0, y1, u0, 0);
+
+      z0 = gen_reg_rtx (DImode);
+      u1 = gen_reg_rtx (DImode);
+      z0 = expand_mult(DImode, x0, y0, z0, 0);
+      u1 = expand_mult(DImode, x1, y0, u1, 0);
+
+      u0tmp = gen_reg_rtx (DImode);
+      u0tmp = expand_shift (RSHIFT_EXPR, DImode, z0, 32, u0tmp, 1);
+      expand_inc (u0, u0tmp);
+      tmp = gen_reg_rtx (DImode);
+      tmp = expand_binop (DImode, add_optab, u0, u1, tmp, 1, OPTAB_LIB_WIDEN);
+      if (!tmp)
+             return 0;
+
+      c = gen_reg_rtx (DImode);
+      c1 = gen_reg_rtx (DImode);
+      cres = gen_reg_rtx (DImode);
+      emit_store_flag_force (c, GT, u0, tmp, DImode, 1, -1);
+      emit_store_flag_force (c1, GT, u1, tmp, DImode, 1, -1);
+      result = expand_binop (DImode, ior_optab, c, c1, cres, 1, OPTAB_LIB_WIDEN);
+      if (!result)
+           return 0;
+
+      ccst = gen_reg_rtx (DImode);
+      ccst = expand_shift (LSHIFT_EXPR, DImode, cres, 32, ccst, 1);
+
+      expand_inc (z2, ccst);
+
+      tmp1 = gen_reg_rtx (DImode);
+      tmp1 = expand_shift (RSHIFT_EXPR, DImode, tmp, 32, NULL_RTX, 1);
+      expand_inc (z2, tmp1);
+      seq = get_insns ();
+      end_sequence ();
+      emit_insn (seq);
+      return z2;
+
+    }
+
   /* Try widening multiplication of opposite signedness, and adjust.  */
   moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
   if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: divide 64-bit by constant for 32-bit target machines
  2012-04-20 12:57 divide 64-bit by constant for 32-bit target machines Dinar Temirbulatov
@ 2012-04-23 14:30 ` Andrew Haley
  2012-04-24  1:49 ` Michael Hope
  1 sibling, 0 replies; 19+ messages in thread
From: Andrew Haley @ 2012-04-23 14:30 UTC (permalink / raw)
  To: gcc-patches

On 04/20/2012 01:57 PM, Dinar Temirbulatov wrote:
> Here is a patch that adds support for dividing 64-bit values by a
> constant on 32-bit target machines. The patch was tested on arm-7a with
> no new regressions. I am also not sure how to avoid targets such as
> i686, where the div operation is fast compared to other targets: there
> __udivdi3 from libc/sysdeps/wordsize-32/divdi3.c showed better
> performance than my implementation on the compiler side, and it is not
> clear to me how to tell this from the udiv_cost[speed][SImode] value.

I can't approve this patch.  However, I do think that a comment
which explains the algorithm is needed.

Andrew.


* Re: divide 64-bit by constant for 32-bit target machines
  2012-04-20 12:57 divide 64-bit by constant for 32-bit target machines Dinar Temirbulatov
  2012-04-23 14:30 ` Andrew Haley
@ 2012-04-24  1:49 ` Michael Hope
  2012-05-03 10:28   ` Dinar Temirbulatov
  1 sibling, 1 reply; 19+ messages in thread
From: Michael Hope @ 2012-04-24  1:49 UTC (permalink / raw)
  To: Dinar Temirbulatov; +Cc: gcc-patches

On 21 April 2012 00:57, Dinar Temirbulatov <dtemirbulatov@gmail.com> wrote:
> Hi,
> Here is a patch that adds support for dividing 64-bit values by a
> constant on 32-bit target machines. The patch was tested on arm-7a with
> no new regressions. I am also not sure how to avoid targets such as
> i686, where the div operation is fast compared to other targets: there
> __udivdi3 from libc/sysdeps/wordsize-32/divdi3.c showed better
> performance than my implementation on the compiler side, and it is not
> clear to me how to tell this from the udiv_cost[speed][SImode] value.

Hi Dinar.  I'm afraid it gives the wrong results for some dividends:
 * 82625484914982912 / 2023346444509052928: gives 4096, should be zero
 * 18317463604061229328 / 2023346444509052928: gives 4109, should be 9
 * 12097415865295708879 / 4545815675034402816: gives 130, should be 2
 * 18195490362097456014 / 6999635335417036800: gives 10, should be 2
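
[The expected quotients can be confirmed with the plain C `/` operator,
which on a 32-bit target lowers to libgcc's __udivdi3; a trivial
reference helper for checking a candidate expansion against it:]

```c
/* Reference quotient: whatever the C `/` operator produces is what
   any inline expansion of the division must reproduce.  */
static unsigned long long
ref_udiv (unsigned long long n, unsigned long long d)
{
  return n / d;
}
```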

The expanded version is very large.  Perhaps it should only turn on at
-O2 and always turn off at -Os?

The speed increase is quite impressive - I'm seeing between 2.7x and
20x faster on a Cortex-A9 for things like x / 3.

-- Michael


* Re: divide 64-bit by constant for 32-bit target machines
  2012-04-24  1:49 ` Michael Hope
@ 2012-05-03 10:28   ` Dinar Temirbulatov
  2012-05-03 13:41     ` Richard Earnshaw
  0 siblings, 1 reply; 19+ messages in thread
From: Dinar Temirbulatov @ 2012-05-03 10:28 UTC (permalink / raw)
  To: Michael Hope; +Cc: gcc-patches, aph

[-- Attachment #1: Type: text/plain, Size: 643 bytes --]

Hi,
Here is an updated version of the patch. I added comments describing the algorithm.

> Hi Dinar.  I'm afraid it gives the wrong results for some dividends
>  * 82625484914982912 / 2023346444509052928: gives 4096, should be zero
>  * 18317463604061229328 / 2023346444509052928: gives 4109, should be 9
>  * 12097415865295708879 / 4545815675034402816: gives 130, should be 2
>  * 18195490362097456014 / 6999635335417036800: gives 10, should be 2
Oh, I had used a signed multiplication instead of an unsigned one, and
that was the reason for the errors above; fixed that typo.
Tested on arm-7l with no new regressions.
Ok for trunk?

thanks, Dinar.
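
[For illustration, not part of the patch: the low half of a 64x64
product is the same for signed and unsigned multiplication, but the
high half differs as soon as an operand has its top bit set, which is
why the wrong expand_mult signedness corrupted the quotients above.
A sketch using GCC's __int128:]

```c
#include <stdint.h>

/* High 64 bits of the unsigned 64x64->128 product.  */
static uint64_t
umulhi (uint64_t a, uint64_t b)
{
  return (uint64_t) (((unsigned __int128) a * b) >> 64);
}

/* High 64 bits of the signed product of the same bit patterns
   (relies on GCC's arithmetic right shift of signed values).  */
static uint64_t
smulhi (uint64_t a, uint64_t b)
{
  return (uint64_t) (((__int128) (int64_t) a * (int64_t) b) >> 64);
}
```

When a is "negative" as a signed value and b is a small positive
constant, smulhi (a, b) comes out as umulhi (a, b) - b, so every
quotient built from the high part is off.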

[-- Attachment #2: 14.patch --]
[-- Type: application/octet-stream, Size: 4513 bytes --]

diff -rup gcc-20120418-orig/gcc/expmed.c gcc-20120418/gcc/expmed.c
--- gcc-20120418-orig/gcc/expmed.c	2012-04-20 14:00:49.125256428 +0400
+++ gcc-20120418/gcc/expmed.c	2012-05-03 14:16:13.755219789 +0400
@@ -3523,6 +3523,106 @@ expand_mult_highpart_optab (enum machine
 	}
     }
 
+  if ((size - 1 > BITS_PER_WORD
+      && BITS_PER_WORD == 32 && mode == DImode)
+      && (!optimize_size) && (optimize>1))
+    {
+      unsigned HOST_WIDE_INT d;
+      rtx x1, x0, y1, y0, z2, z0, tmp, u0, u0tmp, u1, c, c1, ccst, cres, result;
+
+      d = (INTVAL (op1) & GET_MODE_MASK (DImode));
+     
+      /* Extracting the higher part of the 64-bit multiplier */
+      x1 = gen_highpart (SImode, op0);
+      x1 = force_reg (SImode, x1);
+
+      /* Extracting the lower part of the 64-bit multiplier */
+      x0 = gen_lowpart (SImode, op0);
+      x0 = force_reg (SImode, x0);
+
+      x1 = convert_to_mode (DImode, x1, 1);
+      x0 = convert_to_mode (DImode, x0, 1);
+
+      /* Splitting 64-bit constant for higher and lower parts */ 
+      y0 = gen_rtx_CONST_INT (DImode, d&UINT_MAX);
+      y1 = gen_rtx_CONST_INT (DImode, d>>32);
+
+      z2 = gen_reg_rtx (DImode);
+      u0 = gen_reg_rtx (DImode);
+
+      /* Unsigned multiplication of the higher mulitplier part 
+	 and the higher constant part */
+      z2 = expand_mult(DImode, x1, y1, z2, 1);
+      /* Unsigned multiplication of the lower mulitplier part 
+         and the higher constant part */	
+      u0 = expand_mult(DImode, x0, y1, u0, 1);
+
+      z0 = gen_reg_rtx (DImode);
+      u1 = gen_reg_rtx (DImode);
+
+      /* Unsigned multiplication of the lower mulitplier part 
+         and the lower constant part */
+      z0 = expand_mult(DImode, x0, y0, z0, 1);
+
+      /* Unsigned multiplication of the higher mulitplier part 
+         and the lower constant part */
+      u1 = expand_mult(DImode, x1, y0, u1, 1);
+
+      /* Getting the higher part of multiplication between the lower mulitplier 
+         part and the lower constant part, the lower part is not interesting 
+         for the final result */
+      u0tmp = gen_highpart (SImode, z0);
+      u0tmp = force_reg (SImode, u0tmp);
+
+      /* Adding the higher part of multiplication between the lower mulitplier 
+         part and the lower constant part to the result of mutiliplication between
+	 the lower mulitplier part and the higher constant part. Please note, 
+	 that we couldn't get overflow here since in the worst case 
+         (0xffffffff*0xffffffff)+0xffffffff we get 0xffffffff00000000L */
+      expand_inc (u0, u0tmp);
+      tmp = gen_reg_rtx (DImode);
+      
+      /* Adding mutiliplication between the lower mulitplier part and the higher 
+         constant part with the higher part of multiplication between the lower 
+         mulitplier part and the lower constant part to the result of mutiliplication 
+         between the higher mulitplier part and the lower constant part */
+      tmp = expand_binop (DImode, add_optab, u0, u1, tmp, 1, OPTAB_LIB_WIDEN);
+      if (!tmp)
+             return 0;
+
+      /* Checking for owerflow, please not that we couldn't user carry-flag
+         here before the reload pass .
+	 cres = (tmp < u0) || (tmp < u1); */
+      c = gen_reg_rtx (DImode);
+      c1 = gen_reg_rtx (DImode);
+      cres = gen_reg_rtx (DImode);
+      
+      emit_store_flag_force (c, GT, u0, tmp, DImode, 1, -1);
+      emit_store_flag_force (c1, GT, u1, tmp, DImode, 1, -1);
+      result = expand_binop (DImode, ior_optab, c, c1, cres, 1, OPTAB_LIB_WIDEN);
+      if (!result)
+           return 0;
+
+      ccst = gen_reg_rtx (DImode);
+      ccst = expand_shift (LSHIFT_EXPR, DImode, cres, 32, ccst, 1);
+
+      /* Adding 0x10000000 in case of overflow to result of multiplication 
+         higher mulitplier part and higher constant part. Please note that
+         we don't have to check for overflow here because 
+         (0xffffffff*0xffffffff) + 0x100000000 equals to 0xffffffff00000001L */
+      expand_inc (z2, ccst);
+
+
+      /* Extrating the higher part of the sum */
+      tmp = gen_highpart (SImode, tmp);
+      tmp = force_reg (SImode, tmp);
+      
+      /* The final result, again we don't have to check for overflow here */
+      expand_inc (z2, tmp);
+      return z2;
+
+    }
+
   /* Try widening multiplication of opposite signedness, and adjust.  */
   moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
   if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing


* Re: divide 64-bit by constant for 32-bit target machines
  2012-05-03 10:28   ` Dinar Temirbulatov
@ 2012-05-03 13:41     ` Richard Earnshaw
  2012-05-22 14:05       ` Dinar Temirbulatov
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Earnshaw @ 2012-05-03 13:41 UTC (permalink / raw)
  To: Dinar Temirbulatov; +Cc: Michael Hope, gcc-patches, aph

On 03/05/12 11:27, Dinar Temirbulatov wrote:
> Hi,
> Here is an updated version of the patch. I added comments describing the algorithm.
> 
>> Hi Dinar.  I'm afraid it gives the wrong results for some dividends
>>  * 82625484914982912 / 2023346444509052928: gives 4096, should be zero
>>  * 18317463604061229328 / 2023346444509052928: gives 4109, should be 9
>>  * 12097415865295708879 / 4545815675034402816: gives 130, should be 2
>>  * 18195490362097456014 / 6999635335417036800: gives 10, should be 2
> Oh, I had used a signed multiplication instead of an unsigned one, and
> that was the reason for the errors above; fixed that typo.
> Tested on arm-7l with no new regressions.
> Ok for trunk?
> 


This clearly needs more work.

Comments:  Need to end with a period and two spaces.
Binary Operators: Need to be surrounded with white space.

As Andrew Haley has already said, some documentation of the algorithm is
needed.

Why is this restricted to BITS_PER_WORD == 32?

Costs: This code clearly needs to understand the relative cost of doing
division this way; there is a limit to the amount of inline expansion
that we should tolerate, particularly at -O2, and it's not clear that
this will be much better than a library call if we don't have a widening
multiply operation (as is true for older ARM chips, and presumably some
other CPUs).  In essence, you need to work out the cost of a divide
instruction (just use the rtx costs for this) and the approximate cost
of the expanded algorithm.

Another issue that worries me is the number of live DImode values; on
machines with few registers this could cause excessive spilling to start
happening.

I also wonder whether it would be beneficial to generate custom
functions for division by specific constants (using this algorithm) and
then call those functions rather than the standard lib-call.  On ELF
systems the functions can be marked as hidden and put into common sections
so that we only end up with one instance of each function in a program.
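
[A rough sketch of this idea at the C level; all names here are
hypothetical.  A real implementation would have the compiler emit one
such helper per divisor and place it in a link-once section so the
linker keeps a single copy per program:]

```c
/* Hypothetical out-of-line helper for division by the fixed constant
   3, hidden so the symbol never escapes the shared object.  The body
   is the reciprocal-multiply sequence this thread's patch would
   otherwise expand inline at every call site.  */
__attribute__ ((visibility ("hidden")))
unsigned long long
__udivdi3_by3 (unsigned long long x)
{
  unsigned long long hi =
    (unsigned long long) (((unsigned __int128) x
                           * 0xAAAAAAAAAAAAAAABULL) >> 64);
  return hi >> 1;
}
```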

+      /* Checking for owerflow, please not that we couldn't user carry-flag
+         here before the reload pass .
+	 cres = (tmp < u0) || (tmp < u1); */

Generic code can't assume the presence of a carry flag either before or
after reload, so the latter part of the comment is somewhat meaningless.
 Also spelling error in comment.

Finally, do you have a copyright assignment with the FSF?  We can't use
this code without one.

R.


* Re: divide 64-bit by constant for 32-bit target machines
  2012-05-03 13:41     ` Richard Earnshaw
@ 2012-05-22 14:05       ` Dinar Temirbulatov
  2012-05-22 15:46         ` Richard Henderson
  0 siblings, 1 reply; 19+ messages in thread
From: Dinar Temirbulatov @ 2012-05-22 14:05 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: Michael Hope, gcc-patches, aph, Alexey Kravets

[-- Attachment #1: Type: text/plain, Size: 3438 bytes --]

Hi,
Here is the new version of the patch. I have fixed two errors in the
previous version.
On mips32 the compiler could not expand the division and terminated
with an ICE; this change fixed the issue:
      /* Extracting the higher part of the sum.  */
      tmp = gen_highpart (SImode, tmp);
      tmp = force_reg (SImode, tmp);
+      tmp = convert_to_mode (DImode, tmp, 1);

       And another error on i686 and mips32r2: I found that the
overflow check in the multiplication was not working.  For example,
0xfe34b4190a392b23/257 produced a wrong result.  The following change
resolved the issue:
       -emit_store_flag_force (c, GT, u0, tmp, DImode, 1, -1);
       +emit_store_flag_force (c, GT, u0, tmp, DImode, 1, 1);
I tested this new version of the patch on -none-linux-gnu with arm-7l,
mips-32r2 (74k), and i686 without new regressions.
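
[The corrected check is the portable idiom for recovering a carry when
generic RTL cannot assume a carry flag: after an unsigned wrapping add,
the sum overflowed exactly when it is smaller than an operand.  In C
terms, a sketch mirroring the `cres = (tmp < u0) || (tmp < u1)` comment
in the patch:]

```c
#include <stdint.h>

/* Carry-out of a 64-bit unsigned add, without a carry flag: the
   wrapped sum is below u0 iff the true sum needed a 65th bit.
   (Testing against u1 instead works equally well, which is why the
   patch ORs the two store-flag results.)  */
static unsigned
carry_out (uint64_t u0, uint64_t u1)
{
  return (uint64_t) (u0 + u1) < u0;
}
```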

On Thu, May 3, 2012 at 5:40 PM, Richard Earnshaw <rearnsha@arm.com> wrote:
> On 03/05/12 11:27, Dinar Temirbulatov wrote:
>
>
> This clearly needs more work.
>
> Comments:  Need to end with a period and two spaces.
> Binary Operators: Need to be surrounded with white space.
Sorry for this; I hope I resolved these issues in the new version.
>
> As Andrew Haley has already said, some documentation of the algorithm is
> needed.
General documentation for the algorithm can be found at
gmplib.org/~tege/divcnst-pldi94.pdf.  The multiplication to get the
higher 128-bit part was developed by me and Alexey Kravets; I attached
a C version of the algorithm.
>
> Why is this restricted to BITS_PER_WORD == 32?
I am checking here that we are generating code for a 32-bit target with
32-bit wide general purpose registers; with 64-bit wide registers there
is usually an instruction to get the higher 64 bits of a 64x64-bit
multiplication cheaply.

>
> Costs: This code clearly needs to understand the relative cost of doing
> division this way; there is a limit to the amount of inline expansion
> that we should tolerate, particularly at -O2 and it's not clear that
> this will be much better than a library call if we don't have a widening
> multiply operation (as is true for older ARM chips, and presumably some
> other CPUs).  In essence, you need to work out the cost of a divide
> instruction (just as rtx costs for this) and the approximate cost of the
> expanded algorithm.
Added cost calculation.
>
> Another issue that worries me is the number of live DImode values; on
> machines with few registers this could cause excessive spilling to start
> happening.
The cost calculation is supposed to take this into account.
>
> I also wonder whether it would be beneficial to generate custom
> functions for division by specific constants (using this algorithm) and
> then call those functions rather than the standard lib-call.  On ELF
> systems the functions can be marked as hidden and put into common sections
> so that we only end up with one instance of each function in a program.
Yes, I think it is a good approach. I could redo my patch with such an
intrinsic function implementation, taking the pre-shift, the
post-shift, and the 64-bit constant as function parameters.
>
> Finally, do you have a copyright assignment with the FSF?  We can't use
> this code without one.
Yes, I do have a copyright assignment with the FSF.
Also, I am in the process of implementing signed 64-bit integer division by constant.

                    thanks, Dinar.

[-- Attachment #2: 18.patch --]
[-- Type: application/octet-stream, Size: 5883 bytes --]

diff -rup gcc-20120418-orig/gcc/config/arm/arm.c gcc-20120418/gcc/config/arm/arm.c
--- gcc-20120418-orig/gcc/config/arm/arm.c	2012-04-20 13:59:17.521258861 +0400
+++ gcc-20120418/gcc/config/arm/arm.c	2012-05-14 15:38:44.980815823 +0400
@@ -7131,6 +7131,8 @@ arm_rtx_costs_1 (rtx x, enum rtx_code ou
 	*total = COSTS_N_INSNS (2);
       else if (TARGET_HARD_FLOAT && mode == DFmode && !TARGET_VFP_SINGLE)
 	*total = COSTS_N_INSNS (4);
+      else if (mode == DImode) 
+        *total = COSTS_N_INSNS (50);
       else
 	*total = COSTS_N_INSNS (20);
       return false;
diff -rup gcc-20120418-orig/gcc/config/mips/mips.c gcc-20120418/gcc/config/mips/mips.c
--- gcc-20120418-orig/gcc/config/mips/mips.c	2012-04-20 13:59:16.417258891 +0400
+++ gcc-20120418/gcc/config/mips/mips.c	2012-05-14 15:41:05.132812098 +0400
@@ -3845,8 +3845,13 @@ mips_rtx_costs (rtx x, int code, int out
 	    }
 	  *total = COSTS_N_INSNS (mips_idiv_insns ());
 	}
-      else if (mode == DImode)
+      else if (mode == DImode) {
+	if (!TARGET_64BIT)
+	   /* divide double integer libcall is expensive.  */
+	   *total = COSTS_N_INSNS (200);
+	  else
         *total = mips_cost->int_div_di;
+	}
       else
 	*total = mips_cost->int_div_si;
       return false;
diff -rup gcc-20120418-orig/gcc/expmed.c gcc-20120418/gcc/expmed.c
--- gcc-20120418-orig/gcc/expmed.c	2012-04-20 14:00:49.125256428 +0400
+++ gcc-20120418/gcc/expmed.c	2012-05-22 17:17:16.618291346 +0400
@@ -3523,6 +3523,110 @@ expand_mult_highpart_optab (enum machine
 	}
     }
 
+  if ((size - 1 > BITS_PER_WORD
+       && BITS_PER_WORD == 32 && mode == DImode)
+      && unsignedp
+      && (!optimize_size && (optimize>1))
+      && (4 * mul_cost[speed][mode] + 4 * add_cost[speed][mode]
+          + shift_cost[speed][mode][31] < max_cost))
+    {
+      unsigned HOST_WIDE_INT d;
+      rtx x1, x0, y1, y0, z2, z0, tmp, u0, u0tmp, u1, c, c1, ccst, cres, result;
+
+      d = (INTVAL (op1) & GET_MODE_MASK (DImode));
+
+      /* Extracting the higher part of the 64-bit multiplier.  */
+      x1 = gen_highpart (SImode, op0);
+      x1 = force_reg (SImode, x1);
+
+      /* Extracting the lower part of the 64-bit multiplier.  */
+      x0 = gen_lowpart (SImode, op0);
+      x0 = force_reg (SImode, x0);
+
+      x1 = convert_to_mode (DImode, x1, 1);
+      x0 = convert_to_mode (DImode, x0, 1);
+
+      /* Splitting the 64-bit constant for the higher and the lower parts.  */
+      y0 = gen_rtx_CONST_INT (DImode, d&UINT32_MAX);
+      y1 = gen_rtx_CONST_INT (DImode, d>>32);
+
+      z2 = gen_reg_rtx (DImode);
+      u0 = gen_reg_rtx (DImode);
+
+      /* Unsigned multiplication of the higher multiplier part
+	 and the higher constant part.  */
+      z2 = expand_mult(DImode, x1, y1, z2, 1);
+      /* Unsigned multiplication of the lower multiplier part
+         and the higher constant part.  */
+      u0 = expand_mult(DImode, x0, y1, u0, 1);
+
+      z0 = gen_reg_rtx (DImode);
+      u1 = gen_reg_rtx (DImode);
+
+      /* Unsigned multiplication of the lower multiplier part
+         and the lower constant part.  */
+      z0 = expand_mult (DImode, x0, y0, z0, 1);
+
+      /* Unsigned multiplication of the higher multiplier part
+         and the lower constant part.  */
+      u1 = expand_mult (DImode, x1, y0, u1, 1);
+
+      /* Getting the higher part of multiplication between the lower multiplier
+         part and the lower constant part, the lower part is not interesting
+         for the final result.  */
+      u0tmp = gen_highpart (SImode, z0);
+      u0tmp = force_reg (SImode, u0tmp);
+      u0tmp = convert_to_mode (DImode, u0tmp, 1);
+
+      /* Adding the higher part of multiplication between the lower multiplier
+         part and the lower constant part to the result of multiplication between
+	 the lower multiplier part and the higher constant part. Please note,
+	 that we couldn't get overflow here since in the worst case
+         (0xffffffff*0xffffffff)+0xffffffff we get 0xffffffff00000000L.  */
+      expand_inc (u0, u0tmp);
+      tmp = gen_reg_rtx (DImode);
+
+      /* Adding multiplication between the lower multiplier part and the higher
+         constant part with the higher part of multiplication between the lower
+         multiplier part and the lower constant part to the result of multiplication
+         between the higher multiplier part and the lower constant part.  */
+      tmp = expand_binop (DImode, add_optab, u0, u1, tmp, 1, OPTAB_LIB_WIDEN);
+      if (!tmp)
+             return 0;
+
+      /* Checking for overflow.  */
+      c = gen_reg_rtx (DImode);
+      c1 = gen_reg_rtx (DImode);
+      cres = gen_reg_rtx (DImode);
+
+      emit_store_flag_force (c, GT, u0, tmp, DImode, 1, 1);
+      emit_store_flag_force (c1, GT, u1, tmp, DImode, 1, 1);
+      result = expand_binop (DImode, ior_optab, c, c1, cres, 1, OPTAB_LIB_WIDEN);
+      if (!result)
+           return 0;
+
+      ccst = gen_reg_rtx (DImode);
+      ccst = expand_shift (LSHIFT_EXPR, DImode, cres, 32, ccst, 1);
+
+      /* Adding 0x100000000 in case of overflow to the result of multiplying
+         the higher multiplier part and the higher constant part.  Please note
+         that we don't have to check for overflow here because in the worst case
+         (0xffffffff*0xffffffff) + 0x100000000 equals 0xffffffff00000001L.  */
+      expand_inc (z2, ccst);
+
+
+      /* Extracting the higher part of the sum.  */
+      tmp = gen_highpart (SImode, tmp);
+      tmp = force_reg (SImode, tmp);
+      tmp = convert_to_mode (DImode, tmp, 1);
+
+      /* The final result, again we don't have to check for overflow here.  */
+      expand_inc (z2, tmp);
+
+      return z2;
+
+    }
+
   /* Try widening multiplication of opposite signedness, and adjust.  */
   moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
   if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing

[-- Attachment #3: mul.c --]
[-- Type: text/x-csrc, Size: 526 bytes --]

unsigned long long mul(unsigned long long a, unsigned long long b)
{
        unsigned long long x1=a>>32;
        unsigned long long x0=a&0x00000000ffffffff;
        unsigned long long y1=b>>32;
        unsigned long long y0=b&0x00000000ffffffff;
        unsigned long long z2, z0, tmp, u0, u1;
        unsigned char c=0;
        z2=x1*y1;
        z0=x0*y0;
        u0=x0*y1+(z0>>32);
        u1=x1*y0;
        tmp = (u0+u1);
        c = (tmp < u0) || (tmp < u1);
        return z2+(tmp>>32)+(((unsigned long long)c)<<32);
}
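
[The attachment can be cross-checked against GCC's native 128-bit
multiply on a 64-bit host; a sketch of such a check, with the routine
reproduced from the attachment (whitespace added) so it is
self-contained.  The test values include dividends from the failing
cases earlier in the thread:]

```c
/* mul () from the attachment above: high 64 bits of the 128-bit
   product a * b, built from four 32x32 multiplies plus carry
   propagation.  */
static unsigned long long
mul (unsigned long long a, unsigned long long b)
{
  unsigned long long x1 = a >> 32;
  unsigned long long x0 = a & 0xffffffffULL;
  unsigned long long y1 = b >> 32;
  unsigned long long y0 = b & 0xffffffffULL;
  unsigned long long z2 = x1 * y1;
  unsigned long long z0 = x0 * y0;
  unsigned long long u0 = x0 * y1 + (z0 >> 32);
  unsigned long long u1 = x1 * y0;
  unsigned long long tmp = u0 + u1;
  unsigned char c = (tmp < u0) || (tmp < u1);
  return z2 + (tmp >> 32) + ((unsigned long long) c << 32);
}

/* Native reference using GCC's __int128.  */
static unsigned long long
mulhi_ref (unsigned long long a, unsigned long long b)
{
  return (unsigned long long) (((unsigned __int128) a * b) >> 64);
}
```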



* Re: divide 64-bit by constant for 32-bit target machines
  2012-05-22 14:05       ` Dinar Temirbulatov
@ 2012-05-22 15:46         ` Richard Henderson
  2012-05-25 10:20           ` Dinar Temirbulatov
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Henderson @ 2012-05-22 15:46 UTC (permalink / raw)
  To: Dinar Temirbulatov
  Cc: Richard Earnshaw, Michael Hope, gcc-patches, aph, Alexey Kravets

On 05/22/12 07:05, Dinar Temirbulatov wrote:
> +  if ((size - 1 > BITS_PER_WORD
> +       && BITS_PER_WORD == 32 && mode == DImode)

Do note that this patch will not go in with hard-coded
SImode and DImode references.

Which, really, you do not even need.

  && GET_MODE_BITSIZE (mode) == 2*BITS_PER_WORD

is what you wanted for testing for double-word-ness,
and word_mode is a good substitute for SImode here.

+      /* Splitting the 64-bit constant for the higher and the lower parts.  */
+      y0 = gen_rtx_CONST_INT (DImode, d&UINT32_MAX);
+      y1 = gen_rtx_CONST_INT (DImode, d>>32);

Use gen_int_mode.

> +      x1 = convert_to_mode (DImode, x1, 1);
> +      x0 = convert_to_mode (DImode, x0, 1);
> +
> +      /* Splitting the 64-bit constant for the higher and the lower parts.  */
> +      y0 = gen_rtx_CONST_INT (DImode, d&UINT32_MAX);
> +      y1 = gen_rtx_CONST_INT (DImode, d>>32);
> +
> +      z2 = gen_reg_rtx (DImode);
> +      u0 = gen_reg_rtx (DImode);
> +
> +      /* Unsigned multiplication of the higher multiplier part
> +	 and the higher constant part.  */
> +      z2 = expand_mult(DImode, x1, y1, z2, 1);
> +      /* Unsigned multiplication of the lower multiplier part
> +         and the higher constant part.  */
> +      u0 = expand_mult(DImode, x0, y1, u0, 1);

I'm fairly sure you really want to be using expand_widening_mult
here, rather than using convert_to_mode first.  While combine may
be able to re-generate a widening multiply out of this sequence,
there's no sense making it work too hard.
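
[At the C level, the widening multiply Richard means is the 32x32->64
pattern that single instructions such as ARM's umull implement;
expand_widening_mult emits that form directly instead of a full 64x64
multiply of zero-extended operands that combine would then have to
shrink back down.  A sketch:]

```c
#include <stdint.h>

/* The widening form: one 32x32->64 multiply.  Compilers map this
   (uint64_t) a * b shape onto umull/mulhwu-class instructions where
   the target has them.  */
static uint64_t
widen_mul_u32 (uint32_t a, uint32_t b)
{
  return (uint64_t) a * b;
}
```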



r~


* Re: divide 64-bit by constant for 32-bit target machines
  2012-05-22 15:46         ` Richard Henderson
@ 2012-05-25 10:20           ` Dinar Temirbulatov
  2012-05-26 12:35             ` Paolo Bonzini
  2012-05-26 12:39             ` Paolo Bonzini
  0 siblings, 2 replies; 19+ messages in thread
From: Dinar Temirbulatov @ 2012-05-25 10:20 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Richard Earnshaw, Michael Hope, gcc-patches, aph, Alexey Kravets

[-- Attachment #1: Type: text/plain, Size: 2056 bytes --]

Hi,
I have replaced "expand_mult" with "expand_widening_mult" and removed
all direct references to the DImode and SImode modes in the
expand_mult_highpart_optab function. The attached patch was tested on
arm-7l, mips-32r2 (74k), and i686 without new regressions. Richard, do
you think it is ready now?
                    thanks, Dinar.

On Tue, May 22, 2012 at 7:45 PM, Richard Henderson <rth@redhat.com> wrote:
> On 05/22/12 07:05, Dinar Temirbulatov wrote:
>> +  if ((size - 1 > BITS_PER_WORD
>> +       && BITS_PER_WORD == 32 && mode == DImode)
>
> Do note that this patch will not go in with hard-coded
> SImode and DImode references.
>
> Which, really, you do not even need.
>
>  && GET_MODE_BITSIZE (mode) == 2*BITS_PER_WORD
>
> is what you wanted for testing for double-word-ness,
> and word_mode is a good substitute for SImode here.
>
> +      /* Splitting the 64-bit constant for the higher and the lower parts.  */
> +      y0 = gen_rtx_CONST_INT (DImode, d&UINT32_MAX);
> +      y1 = gen_rtx_CONST_INT (DImode, d>>32);
>
> Use gen_int_mode.
>
>> +      x1 = convert_to_mode (DImode, x1, 1);
>> +      x0 = convert_to_mode (DImode, x0, 1);
>> +
>> +      /* Splitting the 64-bit constant for the higher and the lower parts.  */
>> +      y0 = gen_rtx_CONST_INT (DImode, d&UINT32_MAX);
>> +      y1 = gen_rtx_CONST_INT (DImode, d>>32);
>> +
>> +      z2 = gen_reg_rtx (DImode);
>> +      u0 = gen_reg_rtx (DImode);
>> +
>> +      /* Unsigned multiplication of the higher multiplier part
>> +      and the higher constant part.  */
>> +      z2 = expand_mult(DImode, x1, y1, z2, 1);
>> +      /* Unsigned multiplication of the lower multiplier part
>> +         and the higher constant part.  */
>> +      u0 = expand_mult(DImode, x0, y1, u0, 1);
>
> I'm fairly sure you really want to be using expand_widening_mult
> here, rather than using convert_to_mode first.  While combine may
> be able to re-generate a widening multiply out of this sequence,
> there's no sense making it work too hard.
>
>
>
> r~

[-- Attachment #2: ChangeLog.txt --]
[-- Type: text/plain, Size: 503 bytes --]

2012-05-25  Dinar Temirbulatov  <dtemirbulatov@gmail.com>
	    Alexey Kravets  <mr.kayrick@gmail.com>
	  
	* config/arm/arm.c (arm_rtx_costs_1): Add cost estimate for the integer
	double-word division operation.
	* config/mips/mips.c (mips_rtx_costs): Extend cost estimate for the integer
	double-word division operation for 32-bit targets.
	* expmed.c (expand_mult_highpart_optab): Allow generating the higher
	multiplication product for unsigned double-word integers using 32-bit
	wide registers.

[-- Attachment #3: 22.patch --]
[-- Type: application/octet-stream, Size: 5697 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 2cecf45..9d6983b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7131,6 +7131,8 @@ arm_rtx_costs_1 (rtx x, enum rtx_code outer, int* total, bool speed)
 	*total = COSTS_N_INSNS (2);
       else if (TARGET_HARD_FLOAT && mode == DFmode && !TARGET_VFP_SINGLE)
 	*total = COSTS_N_INSNS (4);
+      else if (mode == DImode)
+        *total = COSTS_N_INSNS (50);
       else
 	*total = COSTS_N_INSNS (20);
       return false;
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index d48a465..b5627c2 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -3846,7 +3846,13 @@ mips_rtx_costs (rtx x, int code, int outer_code, int opno ATTRIBUTE_UNUSED,
 	  *total = COSTS_N_INSNS (mips_idiv_insns ());
 	}
       else if (mode == DImode)
-        *total = mips_cost->int_div_di;
+        {
+          if (!TARGET_64BIT)
+            /* Divide double integer library call is expensive.  */
+            *total = COSTS_N_INSNS (200);
+          else
+            *total = mips_cost->int_div_di;
+        }
       else
 	*total = mips_cost->int_div_si;
       return false;
diff --git a/gcc/expmed.c b/gcc/expmed.c
index aa24fbf..5f4c921 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -3523,6 +3523,105 @@ expand_mult_highpart_optab (enum machine_mode mode, rtx op0, rtx op1,
 	}
     }
 
+  if (unsignedp && (!optimize_size && (optimize>1))
+      && (size - 1 > BITS_PER_WORD
+          && BITS_PER_WORD == 32 && GET_MODE_BITSIZE (mode) == 2*BITS_PER_WORD)
+      && (4 * mul_cost[speed][mode] + 4 * add_cost[speed][mode]
+          + shift_cost[speed][mode][31] < max_cost))
+    {
+      unsigned HOST_WIDE_INT d;
+      rtx x1, x0, y1, y0, z2, z0, tmp, u0, u0tmp, u1, c, c1, ccst, cres, result;
+
+      d = (INTVAL (op1) & GET_MODE_MASK (mode));
+
+      /* Extracting the higher part of the 64-bit multiplier.  */
+      x1 = gen_highpart (word_mode, op0);
+      x1 = force_reg (word_mode, x1);
+
+      /* Extracting the lower part of the 64-bit multiplier.  */
+      x0 = gen_lowpart (word_mode, op0);
+      x0 = force_reg (word_mode, x0);
+
+      /* Splitting the 64-bit constant for the higher and the lower parts.  */
+      y0 = gen_int_mode(d & UINT32_MAX, word_mode);
+      y1 = gen_int_mode(d >> 32, word_mode);
+
+      z2 = gen_reg_rtx (mode);
+      u0 = gen_reg_rtx (mode);
+
+      /* Unsigned multiplication of the higher multiplier part
+	 and the higher constant part.  */
+      z2 = expand_widening_mult (mode, x1, y1, z2, 1, umul_widen_optab);
+      /* Unsigned multiplication of the lower multiplier part
+	 and the higher constant part.  */
+      u0 = expand_widening_mult (mode, x0, y1, u0, 1, umul_widen_optab);
+
+      z0 = gen_reg_rtx (mode);
+      u1 = gen_reg_rtx (mode);
+
+      /* Unsigned multiplication of the lower multiplier part
+	 and the lower constant part.  */
+      z0 = expand_widening_mult (mode, x0, y0, z0, 1, umul_widen_optab);
+
+      /* Unsigned multiplication of the higher multiplier part
+	 the lower constant part.  */
+      u1 = expand_widening_mult (mode, x1, y0, u1, 1, umul_widen_optab);
+
+      /* Getting the higher part of multiplication between the lower multiplier
+	 part and the lower constant part, the lower part is not interesting
+	 for the final result.  */
+      u0tmp = gen_highpart (word_mode, z0);
+      u0tmp = force_reg (word_mode, u0tmp);
+      u0tmp = convert_to_mode (mode, u0tmp, 1);
+
+      /* Adding the higher part of multiplication between the lower multiplier
+	 part and the lower constant part to the result of multiplication between
+	 the lower multiplier part and the higher constant part. Please note,
+	 that we couldn't get overflow here since in the worst case
+	 (0xffffffff*0xffffffff)+0xffffffff we get 0xffffffff00000000L.  */
+      expand_inc (u0, u0tmp);
+      tmp = gen_reg_rtx (mode);
+
+      /* Adding multiplication between the lower multiplier part and the higher
+	 constant part with the higher part of multiplication between the lower
+	 multiplier part and the lower constant part to the result of multiplication
+	 between the higher multiplier part and the lower constant part.  */
+      tmp = expand_binop (mode, add_optab, u0, u1, tmp, 1, OPTAB_LIB_WIDEN);
+      if (!tmp)
+             return 0;
+
+      /* Checking for overflow.  */
+      c = gen_reg_rtx (mode);
+      c1 = gen_reg_rtx (mode);
+      cres = gen_reg_rtx (mode);
+
+      emit_store_flag_force (c, GT, u0, tmp, mode, 1, 1);
+      emit_store_flag_force (c1, GT, u1, tmp, mode, 1, 1);
+      result = expand_binop (mode, ior_optab, c, c1, cres, 1, OPTAB_LIB_WIDEN);
+      if (!result)
+           return 0;
+
+      ccst = gen_reg_rtx (mode);
+      ccst = expand_shift (LSHIFT_EXPR, mode, cres, 32, ccst, 1);
+
+      /* Adding 0x10000000 in case of overflow to the result of multiplication
+	 between the higher multiplier part and the higher constant part. Please note,
+	 that we don't have to check for overflow here because in the worst case
+	 (0xffffffff*0xffffffff) + 0x100000000 equals to 0xffffffff00000001L.  */
+      expand_inc (z2, ccst);
+
+      /* Extracting the higher part of the sum.  */
+      tmp = gen_highpart (word_mode, tmp);
+      tmp = force_reg (word_mode, tmp);
+      tmp = convert_to_mode (mode, tmp, 1);
+
+      /* The final result, again we don't have to check for overflow here.  */
+      expand_inc (z2, tmp);
+
+      return z2;
+
+    }
+
   /* Try widening multiplication of opposite signedness, and adjust.  */
   moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
   if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing

^ permalink raw reply	[flat|nested] 19+ messages in thread
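For context on why a cheap double-word high-part multiply is worth all this trouble: once it exists, a 64-bit division by a constant never needs __udivdi3 at all, because the quotient becomes a multiply by a precomputed reciprocal plus a shift. A minimal C sketch for the concrete case n / 10 (illustrative only, not part of the patch: `udiv10` is a made-up helper, the magic constant is ceil(2^67 / 10), and the sketch leans on GCC's `unsigned __int128` for the 64x64->128 product):

```c
#include <stdint.h>

/* Unsigned 64-bit division by the constant 10, rewritten as a
   high-part multiply by a precomputed reciprocal plus a shift:
   q = floor (n * ceil (2^67 / 10) / 2^67).  No divide instruction
   and no __udivdi3 libcall.  */
uint64_t
udiv10 (uint64_t n)
{
  const uint64_t magic = 0xCCCCCCCCCCCCCCCDull;  /* ceil (2^67 / 10) */
  /* High 64 bits of the 128-bit product n * magic.  */
  uint64_t hi = (uint64_t) (((unsigned __int128) n * magic) >> 64);
  return hi >> 3;  /* the remaining 2^3 of the 2^67 scaling */
}
```

On a 32-bit target the `unsigned __int128` product above is exactly what expand_mult_highpart has to synthesize from 32x32->64 widening multiplies, which is what the rest of the thread works out.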

* Re: divide 64-bit by constant for 32-bit target machines
  2012-05-25 10:20           ` Dinar Temirbulatov
@ 2012-05-26 12:35             ` Paolo Bonzini
  2012-05-26 12:46               ` Paolo Bonzini
  2012-05-26 12:39             ` Paolo Bonzini
  1 sibling, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2012-05-26 12:35 UTC (permalink / raw)
  To: Dinar Temirbulatov
  Cc: Richard Henderson, Richard Earnshaw, Michael Hope, gcc-patches,
	aph, Alexey Kravets


> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 2cecf45..9d6983b 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -7131,6 +7131,8 @@ arm_rtx_costs_1 (rtx x, enum rtx_code outer, int* total, bool speed)
>  	*total = COSTS_N_INSNS (2);
>        else if (TARGET_HARD_FLOAT && mode == DFmode && !TARGET_VFP_SINGLE)
>  	*total = COSTS_N_INSNS (4);
> +      else if (mode == DImode)
> +        *total = COSTS_N_INSNS (50);
>        else
>  	*total = COSTS_N_INSNS (20);
>        return false;
> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index d48a465..b5627c2 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -3846,7 +3846,13 @@ mips_rtx_costs (rtx x, int code, int outer_code, int opno ATTRIBUTE_UNUSED,
>  	  *total = COSTS_N_INSNS (mips_idiv_insns ());
>  	}
>        else if (mode == DImode)
> -        *total = mips_cost->int_div_di;
> +        {
> +          if (!TARGET_64BIT)
> +            /* Divide double integer library call is expensive.  */
> +            *total = COSTS_N_INSNS (200);
> +          else
> +            *total = mips_cost->int_div_di;
> +        }
>        else
>  	*total = mips_cost->int_div_si;
>        return false;
> diff --git a/gcc/expmed.c b/gcc/expmed.c
> index aa24fbf..5f4c921 100644
> --- a/gcc/expmed.c
> +++ b/gcc/expmed.c
> @@ -3523,6 +3523,105 @@ expand_mult_highpart_optab (enum machine_mode mode, rtx op0, rtx op1,
>  	}
>      }
>  
> +  if (unsignedp && (!optimize_size && (optimize>1))
> +      && (size - 1 > BITS_PER_WORD
> +          && BITS_PER_WORD == 32 && GET_MODE_BITSIZE (mode) == 2*BITS_PER_WORD)

These references to 32 bits are still wrong (and unnecessary; just
remove them).

> +      && (4 * mul_cost[speed][mode] + 4 * add_cost[speed][mode]
> +          + shift_cost[speed][mode][31] < max_cost))
> +    {
> +      unsigned HOST_WIDE_INT d;
> +      rtx x1, x0, y1, y0, z2, z0, tmp, u0, u0tmp, u1, c, c1, ccst, cres, result;
> +
> +      d = (INTVAL (op1) & GET_MODE_MASK (mode));

This could be a CONST_DOUBLE.  But you don't need "d", because you can...

> +      /* Extracting the higher part of the 64-bit multiplier.  */
> +      x1 = gen_highpart (word_mode, op0);
> +      x1 = force_reg (word_mode, x1);
> +
> +      /* Extracting the lower part of the 64-bit multiplier.  */
> +      x0 = gen_lowpart (word_mode, op0);
> +      x0 = force_reg (word_mode, x0);
> +
> +      /* Splitting the 64-bit constant for the higher and the lower parts.  */
> +      y0 = gen_int_mode(d & UINT32_MAX, word_mode);
> +      y1 = gen_int_mode(d >> 32, word_mode);

... use gen_lowpart and gen_highpart directly on op1.

> +
> +      z2 = gen_reg_rtx (mode);
> +      u0 = gen_reg_rtx (mode);
> +
> +      /* Unsigned multiplication of the higher multiplier part
> +	 and the higher constant part.  */
> +      z2 = expand_widening_mult (mode, x1, y1, z2, 1, umul_widen_optab);
> +      /* Unsigned multiplication of the lower multiplier part
> +	 and the higher constant part.  */
> +      u0 = expand_widening_mult (mode, x0, y1, u0, 1, umul_widen_optab);
> +
> +      z0 = gen_reg_rtx (mode);
> +      u1 = gen_reg_rtx (mode);
> +
> +      /* Unsigned multiplication of the lower multiplier part
> +	 and the lower constant part.  */
> +      z0 = expand_widening_mult (mode, x0, y0, z0, 1, umul_widen_optab);
> +
> +      /* Unsigned multiplication of the higher multiplier part
> +	 the lower constant part.  */
> +      u1 = expand_widening_mult (mode, x1, y0, u1, 1, umul_widen_optab);

Up to here the comments are not necessary.

> +      /* Getting the higher part of multiplication between the lower multiplier
> +	 part and the lower constant part, the lower part is not interesting
> +	 for the final result.  */
> +      u0tmp = gen_highpart (word_mode, z0);
> +      u0tmp = force_reg (word_mode, u0tmp);
> +      u0tmp = convert_to_mode (mode, u0tmp, 1);
> +
> +      /* Adding the higher part of multiplication between the lower multiplier
> +	 part and the lower constant part to the result of multiplication between
> +	 the lower multiplier part and the higher constant part. Please note,
> +	 that we couldn't get overflow here since in the worst case
> +	 (0xffffffff*0xffffffff)+0xffffffff we get 0xffffffff00000000L.  */

The comment can simply be "compute the middle word of the three-word
intermediate result."  Also it's not overflow, it's carry.

> +      expand_inc (u0, u0tmp);
> +      tmp = gen_reg_rtx (mode);
> +
> +      /* Adding multiplication between the lower multiplier part and the higher
> +	 constant part with the higher part of multiplication between the lower
> +	 multiplier part and the lower constant part to the result of multiplication
> +	 between the higher multiplier part and the lower constant part.  */

Here you have to explain:

       /* We have to return

            z2 + ((u0 + u1) >> GET_MODE_BITSIZE (word_mode)).

          u0 + u1 are the upper two words of the three-word
          intermediate result and they could have up to
          2 * GET_MODE_BITSIZE (word_mode) + 1 bits of precision.
          We compute the extra bit by checking for carry, and add
          1 << GET_MODE_BITSIZE (word_mode) to z2 if there is carry.  */

> +      tmp = expand_binop (mode, add_optab, u0, u1, tmp, 1, OPTAB_LIB_WIDEN);
> +      if (!tmp)
> +             return 0;

         /* We have to return z2 + (tmp >> 32).  We need
> +      /* Checking for overflow.  */

This is not overflow, it's carry (see above).

> +      c = gen_reg_rtx (mode);
> +      c1 = gen_reg_rtx (mode);
> +      cres = gen_reg_rtx (mode);
> +
> +      emit_store_flag_force (c, GT, u0, tmp, mode, 1, 1);
> +      emit_store_flag_force (c1, GT, u1, tmp, mode, 1, 1);
> +      result = expand_binop (mode, ior_optab, c, c1, cres, 1, OPTAB_LIB_WIDEN);
> +      if (!result)
> +           return 0;
> +
> +      ccst = gen_reg_rtx (mode);
> +      ccst = expand_shift (LSHIFT_EXPR, mode, cres, 32, ccst, 1);

This 32 should be GET_MODE_BITSIZE (word_mode).

> +
> +      /* Adding 0x10000000 in case of overflow to the result of multiplication

One 0 missing in the constant.

> +	 between the higher multiplier part and the higher constant part. Please note,
> +	 that we don't have to check for overflow here because in the worst case
> +	 (0xffffffff*0xffffffff) + 0x100000000 equals to 0xffffffff00000001L.  */

Again, s/overflow/carry/.

> +      expand_inc (z2, ccst);
> +      /* Extracting the higher part of the sum.  */
> +      tmp = gen_highpart (word_mode, tmp);
> +      tmp = force_reg (word_mode, tmp);
> +      tmp = convert_to_mode (mode, tmp, 1);
> +
> +      /* The final result, again we don't have to check for overflow here.  */
> +      expand_inc (z2, tmp);
> +
> +      return z2;
> +
> +    }
> +
>    /* Try widening multiplication of opposite signedness, and adjust.  */
>    moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
>    if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: divide 64-bit by constant for 32-bit target machines
  2012-05-25 10:20           ` Dinar Temirbulatov
  2012-05-26 12:35             ` Paolo Bonzini
@ 2012-05-26 12:39             ` Paolo Bonzini
  1 sibling, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2012-05-26 12:39 UTC (permalink / raw)
  To: Dinar Temirbulatov
  Cc: Richard Henderson, Richard Earnshaw, Michael Hope, gcc-patches,
	aph, Alexey Kravets

On 25/05/2012 12:20, Dinar Temirbulatov wrote:
> +      emit_store_flag_force (c, GT, u0, tmp, mode, 1, 1);
> +      emit_store_flag_force (c1, GT, u1, tmp, mode, 1, 1);
> +      result = expand_binop (mode, ior_optab, c, c1, cres, 1, OPTAB_LIB_WIDEN);
> +      if (!result)
> +           return 0;

Ah, you don't need the OR.  Comparing u0 with tmp already gives the carry.

Paolo
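The observation generalizes: in unsigned modular arithmetic the truncated sum is smaller than an addend exactly when the addition wrapped, so a single comparison recovers the carry bit. A small illustrative C sketch (not GCC code; `add_carry` is a hypothetical helper name):

```c
#include <stdint.h>

/* Carry out of an unsigned 32-bit addition: the wrapped sum is
   smaller than either addend iff a carry occurred, so a single
   comparison is enough -- no OR of two flags needed.  */
uint32_t
add_carry (uint32_t a, uint32_t b)
{
  uint32_t sum = a + b;   /* wraps modulo 2^32 on carry */
  return sum < a;         /* 1 iff the addition carried */
}
```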

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: divide 64-bit by constant for 32-bit target machines
  2012-05-26 12:35             ` Paolo Bonzini
@ 2012-05-26 12:46               ` Paolo Bonzini
  2012-06-07 10:21                 ` Dinar Temirbulatov
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2012-05-26 12:46 UTC (permalink / raw)
  Cc: Dinar Temirbulatov, Richard Henderson, Richard Earnshaw,
	Michael Hope, gcc-patches, aph, Alexey Kravets

On 26/05/2012 14:35, Paolo Bonzini wrote:
>        /* We have to return
> 
>             z2 + ((u0 + u1) >> GET_MODE_BITSIZE (word_mode)).
> 
>           u0 + u1 are the upper two words of the three-word
>           intermediate result and they could have up to
>           2 * GET_MODE_BITSIZE (word_mode) + 1 bits of precision.
>           We compute the extra bit by checking for carry, and add
>           1 << GET_MODE_BITSIZE (word_mode) to z2 if there is carry.  */

Oops, GET_MODE_BITSIZE (word_mode) is more concisely BITS_PER_WORD.

>> > +      tmp = expand_binop (mode, add_optab, u0, u1, tmp, 1, OPTAB_LIB_WIDEN);
>> > +      if (!tmp)
>> > +             return 0;
>          /* We have to return z2 + (tmp >> 32).  We need
>> > +      /* Checking for overflow.  */
> This is not overflow, it's carry (see above).
> 
>> > +      c = gen_reg_rtx (mode);
>> > +      c1 = gen_reg_rtx (mode);
>> > +      cres = gen_reg_rtx (mode);
>> > +
>> > +      emit_store_flag_force (c, GT, u0, tmp, mode, 1, 1);
>> > +      emit_store_flag_force (c1, GT, u1, tmp, mode, 1, 1);
>> > +      result = expand_binop (mode, ior_optab, c, c1, cres, 1, OPTAB_LIB_WIDEN);
>> > +      if (!result)
>> > +           return 0;
>> > +
>> > +      ccst = gen_reg_rtx (mode);
>> > +      ccst = expand_shift (LSHIFT_EXPR, mode, cres, 32, ccst, 1);
> This 32 should be GET_MODE_BITSIZE (word_mode).

Here, too.

Paolo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: divide 64-bit by constant for 32-bit target machines
  2012-05-26 12:46               ` Paolo Bonzini
@ 2012-06-07 10:21                 ` Dinar Temirbulatov
  2012-06-07 10:43                   ` Dinar Temirbulatov
  0 siblings, 1 reply; 19+ messages in thread
From: Dinar Temirbulatov @ 2012-06-07 10:21 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Richard Henderson, Richard Earnshaw, Michael Hope, gcc-patches,
	aph, Alexey Kravets

[-- Attachment #1: Type: text/plain, Size: 1735 bytes --]

Hi,
Here is a new version of the patch based on Paolo's review; again tested
on arm-7l, mips-32r2 (74k), and i686 with no new regressions.
                          thanks, Dinar.

On Sat, May 26, 2012 at 4:45 PM, Paolo Bonzini <bonzini@gnu.org> wrote:
> On 26/05/2012 14:35, Paolo Bonzini wrote:
>>        /* We have to return
>>
>>             z2 + ((u0 + u1) >> GET_MODE_BITSIZE (word_mode)).
>>
>>           u0 + u1 are the upper two words of the three-word
>>           intermediate result and they could have up to
>>           2 * GET_MODE_BITSIZE (word_mode) + 1 bits of precision.
>>           We compute the extra bit by checking for carry, and add
>>           1 << GET_MODE_BITSIZE (word_mode) to z2 if there is carry.  */
>
> Oops, GET_MODE_BITSIZE (word_mode) is more concisely BITS_PER_WORD.
>
>>> > +      tmp = expand_binop (mode, add_optab, u0, u1, tmp, 1, OPTAB_LIB_WIDEN);
>>> > +      if (!tmp)
>>> > +             return 0;
>>          /* We have to return z2 + (tmp >> 32).  We need
>>> > +      /* Checking for overflow.  */
>> This is not overflow, it's carry (see above).
>>
>>> > +      c = gen_reg_rtx (mode);
>>> > +      c1 = gen_reg_rtx (mode);
>>> > +      cres = gen_reg_rtx (mode);
>>> > +
>>> > +      emit_store_flag_force (c, GT, u0, tmp, mode, 1, 1);
>>> > +      emit_store_flag_force (c1, GT, u1, tmp, mode, 1, 1);
>>> > +      result = expand_binop (mode, ior_optab, c, c1, cres, 1, OPTAB_LIB_WIDEN);
>>> > +      if (!result)
>>> > +           return 0;
>>> > +
>>> > +      ccst = gen_reg_rtx (mode);
>>> > +      ccst = expand_shift (LSHIFT_EXPR, mode, cres, 32, ccst, 1);
>> This 32 should be GET_MODE_BITSIZE (word_mode).
>
> Here, too.
>
> Paolo
>

[-- Attachment #2: ChangeLog.txt --]
[-- Type: text/plain, Size: 503 bytes --]

2012-06-07  Dinar Temirbulatov  <dtemirbulatov@gmail.com>
	    Alexey Kravets  <mr.kayrick@gmail.com>
	  
	* config/arm/arm.c (arm_rtx_costs_1): Add cost estimate for the integer
	double-word division operation.
	* config/mips/mips.c (mips_rtx_costs): Extend cost estimate for the integer
	double-word division operation for 32-bit targets.
	* expmed.c (expand_mult_highpart_optab): Allow generating the high
	multiplication product for unsigned double-word integers using
	32-bit wide registers.

[-- Attachment #3: 28.patch --]
[-- Type: application/octet-stream, Size: 4181 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 8a86227..0f8120f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7130,6 +7130,8 @@ arm_rtx_costs_1 (rtx x, enum rtx_code outer, int* total, bool speed)
 	*total = COSTS_N_INSNS (2);
       else if (TARGET_HARD_FLOAT && mode == DFmode && !TARGET_VFP_SINGLE)
 	*total = COSTS_N_INSNS (4);
+      else if (mode == DImode) 
+        *total = COSTS_N_INSNS (50);
       else
 	*total = COSTS_N_INSNS (20);
       return false;
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 5bcb7a8..57bb4cc 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -3879,8 +3879,13 @@ mips_rtx_costs (rtx x, int code, int outer_code, int opno ATTRIBUTE_UNUSED,
 	    }
 	  *total = COSTS_N_INSNS (mips_idiv_insns ());
 	}
-      else if (mode == DImode)
+      else if (mode == DImode) {
+	if (!TARGET_64BIT)
+	   /* divide double integer libcall is expensive.  */
+	   *total = COSTS_N_INSNS (200);
+	  else
         *total = mips_cost->int_div_di;
+	}
       else
 	*total = mips_cost->int_div_si;
       return false;
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 98f7c09..5108df9 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -3539,6 +3539,84 @@ expand_mult_highpart_optab (enum machine_mode mode, rtx op0, rtx op1,
 	}
     }
 
+  if (unsignedp
+      && size - 1 > BITS_PER_WORD 
+      && (!optimize_size && (optimize>1))
+      && (4 * mul_cost[speed][mode] + 4 * add_cost[speed][mode]
+          + shift_cost[speed][mode][31] < max_cost))
+    {
+      rtx x1, x0, y1, y0, z2, z0, tmp, u0, u0tmp, u1, carry, carry_result, result;
+
+      /* Extracting the higher part of the 64-bit multiplier.  */
+      x1 = gen_highpart (word_mode, op0);
+      x1 = force_reg (word_mode, x1);
+
+      /* Extracting the lower part of the 64-bit multiplier.  */
+      x0 = gen_lowpart (word_mode, op0);
+      x0 = force_reg (word_mode, x0);
+
+      /* Splitting the 64-bit constant for the higher and the lower parts.  */
+      y0 = gen_lowpart (word_mode, op1);
+      y0 = force_reg (word_mode, y0);
+      y1 = gen_highpart_mode (word_mode, mode, op1);
+
+      z2 = gen_reg_rtx (mode);
+      u0 = gen_reg_rtx (mode);
+
+      z2 = expand_widening_mult (mode, x1, y1, z2, 1, umul_widen_optab);
+
+      u0 = expand_widening_mult (mode, x0, y1, u0, 1, umul_widen_optab);
+
+      z0 = gen_reg_rtx (mode);
+      u1 = gen_reg_rtx (mode);
+
+      z0 = expand_widening_mult (mode, x0, y0, z0, 1, umul_widen_optab);
+
+      u1 = expand_widening_mult (mode, x1, y0, u1, 1, umul_widen_optab);
+
+      /* Compute the middle word of the three-word intermediate result.  */
+      u0tmp = gen_highpart (word_mode, z0);
+      u0tmp = force_reg (word_mode, u0tmp);
+      u0tmp = convert_to_mode (mode, u0tmp, 1);
+
+      /* We have to return
+	 z2 + ((u0 + u1) >> BITS_PER_WORD)
+	 u0 + u1 are the upper two words of the three-word
+	 intermediate result and they could have up to
+	 2 * BITS_PER_WORD + 1 bits of precision.
+	 We compute the extra bit by checking for carry, and add
+	 1 << BITS_PER_WORD to z2 if there is carry.  */
+
+      expand_inc (u0, u0tmp);
+      tmp = gen_reg_rtx (mode);
+
+      tmp = expand_binop (mode, add_optab, u0, u1, tmp, 1, OPTAB_LIB_WIDEN);
+      if (!tmp)
+             return 0;
+
+      /* Checking for carry here.  */
+      carry = gen_reg_rtx (mode);
+
+      emit_store_flag_force (carry, GT, u0, tmp, mode, 1, 1);
+
+      carry_result = gen_reg_rtx (mode);
+      carry_result = expand_shift (LSHIFT_EXPR, mode, carry, BITS_PER_WORD, carry_result, 1);
+
+      /* Adding 0x100000000 as carry here if required.  */
+      expand_inc (z2, carry_result);
+
+      /* Extracting the higher part of the sum.  */
+      tmp = gen_highpart (word_mode, tmp);
+      tmp = force_reg (word_mode, tmp);
+      tmp = convert_to_mode (mode, tmp, 1);
+
+      /* The final result */
+      expand_inc (z2, tmp);
+
+      return z2;
+
+    }
+
   /* Try widening multiplication of opposite signedness, and adjust.  */
   moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
   if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: divide 64-bit by constant for 32-bit target machines
  2012-06-07 10:21                 ` Dinar Temirbulatov
@ 2012-06-07 10:43                   ` Dinar Temirbulatov
  2012-06-07 14:36                     ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: Dinar Temirbulatov @ 2012-06-07 10:43 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Richard Henderson, Richard Earnshaw, Michael Hope, gcc-patches,
	aph, Alexey Kravets

[-- Attachment #1: Type: text/plain, Size: 1968 bytes --]

Oh, I found a typo in a comment at the end of the patch; fixed.
                        thanks, Dinar.

On Thu, Jun 7, 2012 at 2:14 PM, Dinar Temirbulatov
<dtemirbulatov@gmail.com> wrote:
> Hi,
> Here is a new version of the patch based on Paolo's review; again tested
> on arm-7l, mips-32r2 (74k), and i686 with no new regressions.
>                          thanks, Dinar.
>
> On Sat, May 26, 2012 at 4:45 PM, Paolo Bonzini <bonzini@gnu.org> wrote:
>> On 26/05/2012 14:35, Paolo Bonzini wrote:
>>>        /* We have to return
>>>
>>>             z2 + ((u0 + u1) >> GET_MODE_BITSIZE (word_mode)).
>>>
>>>           u0 + u1 are the upper two words of the three-word
>>>           intermediate result and they could have up to
>>>           2 * GET_MODE_BITSIZE (word_mode) + 1 bits of precision.
>>>           We compute the extra bit by checking for carry, and add
>>>           1 << GET_MODE_BITSIZE (word_mode) to z2 if there is carry.  */
>>
>> Oops, GET_MODE_BITSIZE (word_mode) is more concisely BITS_PER_WORD.
>>
>>>> > +      tmp = expand_binop (mode, add_optab, u0, u1, tmp, 1, OPTAB_LIB_WIDEN);
>>>> > +      if (!tmp)
>>>> > +             return 0;
>>>          /* We have to return z2 + (tmp >> 32).  We need
>>>> > +      /* Checking for overflow.  */
>>> This is not overflow, it's carry (see above).
>>>
>>>> > +      c = gen_reg_rtx (mode);
>>>> > +      c1 = gen_reg_rtx (mode);
>>>> > +      cres = gen_reg_rtx (mode);
>>>> > +
>>>> > +      emit_store_flag_force (c, GT, u0, tmp, mode, 1, 1);
>>>> > +      emit_store_flag_force (c1, GT, u1, tmp, mode, 1, 1);
>>>> > +      result = expand_binop (mode, ior_optab, c, c1, cres, 1, OPTAB_LIB_WIDEN);
>>>> > +      if (!result)
>>>> > +           return 0;
>>>> > +
>>>> > +      ccst = gen_reg_rtx (mode);
>>>> > +      ccst = expand_shift (LSHIFT_EXPR, mode, cres, 32, ccst, 1);
>>> This 32 should be GET_MODE_BITSIZE (word_mode).
>>
>> Here, too.
>>
>> Paolo
>>

[-- Attachment #2: 28.patch --]
[-- Type: application/octet-stream, Size: 4183 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 8a86227..0f8120f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7130,6 +7130,8 @@ arm_rtx_costs_1 (rtx x, enum rtx_code outer, int* total, bool speed)
 	*total = COSTS_N_INSNS (2);
       else if (TARGET_HARD_FLOAT && mode == DFmode && !TARGET_VFP_SINGLE)
 	*total = COSTS_N_INSNS (4);
+      else if (mode == DImode) 
+        *total = COSTS_N_INSNS (50);
       else
 	*total = COSTS_N_INSNS (20);
       return false;
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 5bcb7a8..57bb4cc 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -3879,8 +3879,13 @@ mips_rtx_costs (rtx x, int code, int outer_code, int opno ATTRIBUTE_UNUSED,
 	    }
 	  *total = COSTS_N_INSNS (mips_idiv_insns ());
 	}
-      else if (mode == DImode)
+      else if (mode == DImode) {
+	if (!TARGET_64BIT)
+	   /* divide double integer libcall is expensive.  */
+	   *total = COSTS_N_INSNS (200);
+	  else
         *total = mips_cost->int_div_di;
+	}
       else
 	*total = mips_cost->int_div_si;
       return false;
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 98f7c09..bb4d7cd 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -3539,6 +3539,84 @@ expand_mult_highpart_optab (enum machine_mode mode, rtx op0, rtx op1,
 	}
     }
 
+  if (unsignedp
+      && size - 1 > BITS_PER_WORD 
+      && (!optimize_size && (optimize>1))
+      && (4 * mul_cost[speed][mode] + 4 * add_cost[speed][mode]
+          + shift_cost[speed][mode][31] < max_cost))
+    {
+      rtx x1, x0, y1, y0, z2, z0, tmp, u0, u0tmp, u1, carry, carry_result, result;
+
+      /* Extracting the higher part of the 64-bit multiplier.  */
+      x1 = gen_highpart (word_mode, op0);
+      x1 = force_reg (word_mode, x1);
+
+      /* Extracting the lower part of the 64-bit multiplier.  */
+      x0 = gen_lowpart (word_mode, op0);
+      x0 = force_reg (word_mode, x0);
+
+      /* Splitting the 64-bit constant for the higher and the lower parts.  */
+      y0 = gen_lowpart (word_mode, op1);
+      y0 = force_reg (word_mode, y0);
+      y1 = gen_highpart_mode (word_mode, mode, op1);
+
+      z2 = gen_reg_rtx (mode);
+      u0 = gen_reg_rtx (mode);
+
+      z2 = expand_widening_mult (mode, x1, y1, z2, 1, umul_widen_optab);
+
+      u0 = expand_widening_mult (mode, x0, y1, u0, 1, umul_widen_optab);
+
+      z0 = gen_reg_rtx (mode);
+      u1 = gen_reg_rtx (mode);
+
+      z0 = expand_widening_mult (mode, x0, y0, z0, 1, umul_widen_optab);
+
+      u1 = expand_widening_mult (mode, x1, y0, u1, 1, umul_widen_optab);
+
+      /* Compute the middle word of the three-word intermediate result.  */
+      u0tmp = gen_highpart (word_mode, z0);
+      u0tmp = force_reg (word_mode, u0tmp);
+      u0tmp = convert_to_mode (mode, u0tmp, 1);
+
+      /* We have to return
+	 z2 + ((u0 + u1) >> BITS_PER_WORD)
+	 u0 + u1 are the upper two words of the three-word
+	 intermediate result and they could have up to
+	 2 * BITS_PER_WORD + 1 bits of precision.
+	 We compute the extra bit by checking for carry, and add
+	 1 << BITS_PER_WORD to z2 if there is carry.  */
+
+      expand_inc (u0, u0tmp);
+      tmp = gen_reg_rtx (mode);
+
+      tmp = expand_binop (mode, add_optab, u0, u1, tmp, 1, OPTAB_LIB_WIDEN);
+      if (!tmp)
+             return 0;
+
+      /* Checking for carry here.  */
+      carry = gen_reg_rtx (mode);
+
+      emit_store_flag_force (carry, GT, u0, tmp, mode, 1, 1);
+
+      carry_result = gen_reg_rtx (mode);
+      carry_result = expand_shift (LSHIFT_EXPR, mode, carry, BITS_PER_WORD, carry_result, 1);
+
+      /* Adding 0x100000000 as carry here if required.  */
+      expand_inc (z2, carry_result);
+
+      /* Extracting the higher part of the sum.  */
+      tmp = gen_highpart (word_mode, tmp);
+      tmp = force_reg (word_mode, tmp);
+      tmp = convert_to_mode (mode, tmp, 1);
+
+      /* The final result.  */
+      expand_inc (z2, tmp);
+
+      return z2;
+
+    }
+
   /* Try widening multiplication of opposite signedness, and adjust.  */
   moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
   if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: divide 64-bit by constant for 32-bit target machines
  2012-06-07 10:43                   ` Dinar Temirbulatov
@ 2012-06-07 14:36                     ` Paolo Bonzini
  2012-06-08 18:37                       ` Dinar Temirbulatov
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2012-06-07 14:36 UTC (permalink / raw)
  To: Dinar Temirbulatov
  Cc: Richard Henderson, Richard Earnshaw, Michael Hope, gcc-patches,
	aph, Alexey Kravets

On 07/06/2012 12:21, Dinar Temirbulatov wrote:
> Oh, I found a typo in a comment at the end of the patch; fixed.

Great improvement, thanks!

Unfortunately we're not there yet, but much closer!  I could understand
the new code much better so I suggest some more improvements below, both
to the comments and to the code generation.

> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 8a86227..0f8120f 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -7130,6 +7130,8 @@ arm_rtx_costs_1 (rtx x, enum rtx_code outer, int* total, bool speed)
>  	*total = COSTS_N_INSNS (2);
>        else if (TARGET_HARD_FLOAT && mode == DFmode && !TARGET_VFP_SINGLE)
>  	*total = COSTS_N_INSNS (4);
> +      else if (mode == DImode) 
> +        *total = COSTS_N_INSNS (50);
>        else
>  	*total = COSTS_N_INSNS (20);
>        return false;
> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index 5bcb7a8..57bb4cc 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -3879,8 +3879,13 @@ mips_rtx_costs (rtx x, int code, int outer_code, int opno ATTRIBUTE_UNUSED,
>  	    }
>  	  *total = COSTS_N_INSNS (mips_idiv_insns ());
>  	}
> -      else if (mode == DImode)
> +      else if (mode == DImode) {
> +	if (!TARGET_64BIT)
> +	   /* divide double integer libcall is expensive.  */
> +	   *total = COSTS_N_INSNS (200);
> +	  else
>          *total = mips_cost->int_div_di;
> +	}
>        else
>  	*total = mips_cost->int_div_si;
>        return false;
> diff --git a/gcc/expmed.c b/gcc/expmed.c
> index 98f7c09..bb4d7cd 100644
> --- a/gcc/expmed.c
> +++ b/gcc/expmed.c
> @@ -3539,6 +3539,84 @@ expand_mult_highpart_optab (enum machine_mode mode, rtx op0, rtx op1,
>  	}
>      }
>  
> +  if (unsignedp
> +      && size - 1 > BITS_PER_WORD 
> +      && (!optimize_size && (optimize>1))

Coding style: "(!optimize_size && optimize > 1)".

> +      && (4 * mul_cost[speed][mode] + 4 * add_cost[speed][mode]
> +          + shift_cost[speed][mode][31] < max_cost))

Thanks, this is now much cleaner and I could see other improvements.

This should be

  3 * mul_widen_cost[speed][mode] + mul_highpart_cost[speed][mode] +
  4 * add_cost[speed][mode] + add_cost[speed][word_mode]

That is because there is no shift really: a shift by 32 is simply moving
the operand to the higher word, and an add of that value will ignore the
lower word.  Hence, summing carry_result is cheaper: that is
add_cost[speed][word_mode].

On the other hand you also have to consider the comparison emitted by
emit_store_flag_force, which will usually cost the same as an addition.
 That is the fourth add_cost[speed][mode].

> +    {
> +      rtx x1, x0, y1, y0, z2, z0, tmp, u0, u0tmp, u1, carry, carry_result, result;
> +      /* Extracting the higher part of the 64-bit multiplier.  */
> +      x1 = gen_highpart (word_mode, op0);
> +      x1 = force_reg (word_mode, x1);
> +
> +      /* Extracting the lower part of the 64-bit multiplier.  */
> +      x0 = gen_lowpart (word_mode, op0);
> +      x0 = force_reg (word_mode, x0);
> +
> +      /* Splitting the 64-bit constant for the higher and the lower parts.  */
> +      y0 = gen_lowpart (word_mode, op1);
> +      y0 = force_reg (word_mode, y0);
> +      y1 = gen_highpart_mode (word_mode, mode, op1);
> +
> +      z2 = gen_reg_rtx (mode);
> +      u0 = gen_reg_rtx (mode);

You do not need the gen_reg_rtx; just pass a NULL_RTX target to
expand_widening_mult.

> +      z2 = expand_widening_mult (mode, x1, y1, z2, 1, umul_widen_optab);
> +

Remove the empty line.  Also, let's rename the values to make clear
where is the multiplication of what:

z2 -> u11
u0 -> u01
z0 -> u00
u1 -> u10


> +      u0 = expand_widening_mult (mode, x0, y1, u0, 1, umul_widen_optab);
> +
> +      z0 = gen_reg_rtx (mode);
> +      u1 = gen_reg_rtx (mode);

gen_reg_rtx is not needed here too.

> +      z0 = expand_widening_mult (mode, x0, y0, z0, 1, umul_widen_optab);
> +

And neither is this blank line.

> +      u1 = expand_widening_mult (mode, x1, y0, u1, 1, umul_widen_optab);
> +      /* Compute the middle word of the three-word intermediate result.  */
                        ^^^^^^

Oops, this is the low word, not the middle.  But let's improve the
comment to explain the algorithm.

       /* u00, u01, u10, u11 form a three-word value with the result
          in the top word, so we want to return this:

            ((u11 << 2*BITS_PER_WORD) +
                     (u01 + u10 << BITS_PER_WORD) +
                            u00) >> 3 * BITS_PER_WORD

          We then rewrite it this way:

            (u11 + ((u01 + u10 + (u00 >> BITS_PER_WORD))
               >> BITS_PER_WORD) >> BITS_PER_WORD

          where the shifts are realized with gen_highpart and a
          conversion back to the wider mode.  */
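
As an editorial aside, the word decomposition this comment describes can
be modeled in plain C.  The sketch below is only an illustration of the
arithmetic, not the RTL expansion itself; the helper name
umul_highpart_64 is invented here:

```c
#include <assert.h>
#include <stdint.h>

/* High 64 bits of the 128-bit product x * y, using only 32x32->64
   widening multiplies, mirroring the expansion discussed above.  */
static uint64_t
umul_highpart_64 (uint64_t x, uint64_t y)
{
  uint32_t x0 = (uint32_t) x, x1 = (uint32_t) (x >> 32);
  uint32_t y0 = (uint32_t) y, y1 = (uint32_t) (y >> 32);

  uint64_t u00 = (uint64_t) x0 * y0;
  uint64_t u01 = (uint64_t) x0 * y1;
  uint64_t u10 = (uint64_t) x1 * y0;
  uint64_t u11 = (uint64_t) x1 * y1;

  /* u01 += high word of u00; this cannot overflow, since
     (2^32-1)^2 + (2^32-1) < 2^64.  */
  u01 += u00 >> 32;

  /* u01 += u10 may carry out of 64 bits; wraparound makes the
     truncated sum smaller than the operand, so compare to detect it.  */
  uint64_t sum = u01 + u10;
  if (u10 > sum)
    u11 += (uint64_t) 1 << 32;  /* add 1 << BITS_PER_WORD */

  return u11 + (sum >> 32);
}
```

With BITS_PER_WORD == 32 this computes the same value as the expanded
RTL sequence, step for step.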

> +      u0tmp = gen_highpart (word_mode, z0);
> +      u0tmp = force_reg (word_mode, u0tmp);
> +      u0tmp = convert_to_mode (mode, u0tmp, 1);

u0tmp -> u00h

Put the expand_inc (u01, u00h); before the comment.  The formula is
now above so we can say more simply:

         /* Summing u01, u10 and u00h together could have up to
	    2 * BITS_PER_WORD + 1 bits of precision.
	    We compute the extra bit by checking for carry, and add
	    1 << BITS_PER_WORD to u11 if there is carry.  */

> +
> +      expand_inc (u0, u0tmp);
> +      tmp = gen_reg_rtx (mode);
> +      tmp = expand_binop (mode, add_optab, u0, u1, tmp, 1, OPTAB_LIB_WIDEN);

Now that you have a single emit_store_flag_force, you can avoid "tmp =
gen_reg_rtx (mode)" and just use

         expand_inc (u01, u10);

> +      if (!tmp)
> +             return 0;

This cannot fail; you can remove the "if".

> +
> +      /* Checking for carry here.  */
> +      carry = gen_reg_rtx (mode);
> +      emit_store_flag_force (carry, GT, u0, tmp, mode, 1, 1);

Since above you will use u01 as the target, you have to use u10 instead
here:

         carry = emit_store_flag_force (NULL_RTX, GT, u10, u01,
                                        mode, 1, 1);

i.e. operand > result.  That's a nice improvement, and should generate
optimal code like:

        add  r0, r4      ; r0:r1 += 0:r4           u0 += u0h
        adc  r1, 0
        add  r0, r2      ; r0:r1 += r2:r3
        adc  r1, r3
        sub  r2, r0      ; flags = r2:r3 CMP r0:r1
        sbc  r3, r1
        it   hi          ; if r2:r3 > r0:r1
         add r6, #1      ;    ... r6:r7 += 1:0
        add  r6, r0      ; r6:r7 += r0:r1
        adc  r7, r1

for everything after the multiplications.  This matches nicely the cost
estimation above.
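
The comparison works because unsigned addition wraps modulo 2^64: a
carried-out add leaves a result smaller than either operand.  A
standalone illustration of the idiom (not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* If a + b wraps past 2^64, the truncated sum is smaller than either
   operand, so a single unsigned compare recovers the carry bit.  */
static unsigned
carry_of_add (uint64_t a, uint64_t b)
{
  uint64_t sum = a + b;	/* computed modulo 2^64 */
  return b > sum;	/* 1 iff the addition carried */
}
```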

> +      carry_result = gen_reg_rtx (mode);

No need for this gen_reg_rtx, either, by passing a NULL_RTX target below.

> +      carry_result = expand_shift (LSHIFT_EXPR, mode, carry, BITS_PER_WORD, carry_result, 1);
> +
> +      /* Adding 0x100000000 as carry here if required.  */

Oops, a remnant of 32-bit specific code.

/* Adding 1 << BITS_PER_WORD as carry here if required.  */

> +      expand_inc (z2, carry_result);
> +
> +      /* Extracting the higher part of the sum.  */
> +      tmp = gen_highpart (word_mode, tmp);
> +      tmp = force_reg (word_mode, tmp);
> +      tmp = convert_to_mode (mode, tmp, 1);

And these will use u01 instead of tmp.

> +      /* The final result.  */
> +      expand_inc (z2, tmp);
> +      return z2;
> +
> +    }
> +
>    /* Try widening multiplication of opposite signedness, and adjust.  */
>    moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
>    if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing
> 

I hope you appreciate the improvements!

Paolo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: divide 64-bit by constant for 32-bit target machines
  2012-06-07 14:36                     ` Paolo Bonzini
@ 2012-06-08 18:37                       ` Dinar Temirbulatov
  2012-06-11  8:03                         ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: Dinar Temirbulatov @ 2012-06-08 18:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Richard Henderson, Richard Earnshaw, Michael Hope, gcc-patches,
	aph, Alexey Kravets

[-- Attachment #1: Type: text/plain, Size: 2545 bytes --]

Hi, Paolo.
Here is the new version of the patch. I have tested this version with
the GCC testsuite only on i686, with no new regressions so far. MIPS
and ARM tests are in progress.

One strange thing I noticed:
>
> No need for this gen_reg_rtx, either, by passing a NULL_RTX target below.
>
>> +      carry_result = expand_shift (LSHIFT_EXPR, mode, carry, BITS_PER_WORD, carry_result, 1);
>> +
>> +      /* Adding 0x100000000 as carry here if required.  */
>
> Oops, a remnant of 32-bit specific code.
>

that I have to add a convert_to_mode () to DImode after
emit_store_flag_force (), since emit_store_flag_force () returns
"carry" in SImode, and without the convert_to_mode () call the
compiler fails with this error:

Breakpoint 2, simplify_subreg (outermode=SImode, op=0x7ffff56cdf20,
innermode=DImode, byte=0) at
../../gcc-20120418-1/gcc/simplify-rtx.c:5423
5423      gcc_assert (GET_MODE (op) == innermode
(gdb) bt
#0  simplify_subreg (outermode=SImode, op=0x7ffff56cdf20,
innermode=DImode, byte=0) at
../../gcc-20120418-1/gcc/simplify-rtx.c:5423
#1  0x0000000000aea223 in simplify_gen_subreg (outermode=SImode,
op=0x7ffff56cdf20, innermode=DImode, byte=0) at
../../gcc-20120418-1/gcc/simplify-rtx.c:5763
#2  0x0000000000733c99 in operand_subword (op=0x7ffff56cdf20,
offset=0, validate_address=1, mode=DImode) at
../../gcc-20120418-1/gcc/emit-rtl.c:1427
#3  0x0000000000733cc6 in operand_subword_force (op=0x7ffff56cdf20,
offset=0, mode=DImode) at ../../gcc-20120418-1/gcc/emit-rtl.c:1440
#4  0x0000000000a016b3 in expand_binop (mode=DImode,
binoptab=0x195f580, op0=0x7ffff56cdf20, op1=0x7ffff583d670,
target=0x7ffff56cdfa0, unsignedp=1, methods=OPTAB_DIRECT)
    at ../../gcc-20120418-1/gcc/optabs.c:1779
#5  0x00000000007525af in expand_shift_1 (code=LSHIFT_EXPR,
mode=DImode, shifted=0x7ffff56cdf20, amount=0x7ffff583d670,
target=0x0, unsignedp=1)
    at ../../gcc-20120418-1/gcc/expmed.c:2273
#6  0x00000000007526b6 in expand_shift (code=LSHIFT_EXPR, mode=DImode,
shifted=0x7ffff56cdf20, amount=32, target=0x0, unsignedp=1) at
../../gcc-20120418-1/gcc/expmed.c:2318
#7  0x00000000007563e6 in expand_mult_highpart_optab (mode=DImode,
op0=0x7ffff56cdcc0, op1=0x7ffff56b1e00, target=0x0, unsignedp=1,
max_cost=188)
    at ../../gcc-20120418-1/gcc/expmed.c:3581
#8  0x0000000000756747 in expand_mult_highpart (mode=DImode,
op0=0x7ffff56cdcc0, op1=0x7ffff56b1e00, target=0x0, unsignedp=1,
max_cost=188)
    at ../../gcc-20120418-1/gcc/expmed.c:3654

                                  thanks, Dinar.

[-- Attachment #2: 30.patch --]
[-- Type: application/octet-stream, Size: 4331 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 8a86227..0f8120f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7130,6 +7130,8 @@ arm_rtx_costs_1 (rtx x, enum rtx_code outer, int* total, bool speed)
 	*total = COSTS_N_INSNS (2);
       else if (TARGET_HARD_FLOAT && mode == DFmode && !TARGET_VFP_SINGLE)
 	*total = COSTS_N_INSNS (4);
+      else if (mode == DImode) 
+        *total = COSTS_N_INSNS (50);
       else
 	*total = COSTS_N_INSNS (20);
       return false;
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 5bcb7a8..57bb4cc 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -3879,8 +3879,13 @@ mips_rtx_costs (rtx x, int code, int outer_code, int opno ATTRIBUTE_UNUSED,
 	    }
 	  *total = COSTS_N_INSNS (mips_idiv_insns ());
 	}
-      else if (mode == DImode)
+      else if (mode == DImode) {
+	if (!TARGET_64BIT)
+	   /* divide double integer libcall is expensive.  */
+	   *total = COSTS_N_INSNS (200);
+	  else
         *total = mips_cost->int_div_di;
+	}
       else
 	*total = mips_cost->int_div_si;
       return false;
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 98f7c09..51a0d9b 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -3539,6 +3539,78 @@ expand_mult_highpart_optab (enum machine_mode mode, rtx op0, rtx op1,
 	}
     }
 
+  if (unsignedp
+      && size - 1 > BITS_PER_WORD 
+      && (!optimize_size && optimize > 1)
+      && (3 * mul_widen_cost[speed][mode] + mul_highpart_cost[speed][mode] +
+          4 * add_cost[speed][mode] + add_cost[speed][word_mode] < max_cost))
+    {
+      rtx x1, x0, y1, y0, u00, u01, u10, u11, u00h, carry, carry_result;
+
+      /* Extracting the higher part of the 64-bit multiplier.  */
+      x1 = gen_highpart (word_mode, op0);
+      x1 = force_reg (word_mode, x1);
+
+      /* Extracting the lower part of the 64-bit multiplier.  */
+      x0 = gen_lowpart (word_mode, op0);
+      x0 = force_reg (word_mode, x0);
+
+      /* Splitting the 64-bit constant for the higher and the lower parts.  */
+      y0 = gen_lowpart (word_mode, op1);
+      y0 = force_reg (word_mode, y0);
+      y1 = gen_highpart_mode (word_mode, mode, op1);
+
+      u11 = expand_widening_mult (mode, x1, y1, NULL_RTX, 1, umul_widen_optab);
+      u01 = expand_widening_mult (mode, x0, y1, NULL_RTX, 1, umul_widen_optab);
+      u00 = expand_widening_mult (mode, x0, y0, NULL_RTX, 1, umul_widen_optab);
+      u10 = expand_widening_mult (mode, x1, y0, NULL_RTX, 1, umul_widen_optab);
+
+      /* u00, u01, u10, u11 form a three-word value with the result
+          in the top word, so we want to return this:
+
+            ((u11 << 2*BITS_PER_WORD) +
+                     (u01 + u10 << BITS_PER_WORD) +
+                            u00) >> 3 * BITS_PER_WORD
+
+          We then rewrite it this way:
+
+            (u11 + ((u01 + u10 + (u00 >> BITS_PER_WORD))
+               >> BITS_PER_WORD) >> BITS_PER_WORD
+
+          where the shifts are realized with gen_highpart and a
+          conversion back to the wider mode.  */
+      u00h = gen_highpart (word_mode, u00);
+      u00h = force_reg (word_mode, u00h);
+      u00h = convert_to_mode (mode, u00h, 1);
+      expand_inc (u01, u00h);
+
+      /* Summing u01, u10 and u00h together could have up to
+	 2 * BITS_PER_WORD + 1 bits of precision.
+	 We compute the extra bit by checking for carry, and add
+	 1 << BITS_PER_WORD to u11 if there is carry.  */
+      expand_inc (u01, u10);
+
+      /* Checking for carry here.  */
+      carry = emit_store_flag_force (NULL_RTX, GT, u10, u01,
+				     mode, 1, 1);
+      carry = convert_to_mode (mode, carry, 1);
+      carry_result = expand_shift (LSHIFT_EXPR, mode, carry, BITS_PER_WORD, NULL_RTX, 1);
+
+      /* Adding 1 << BITS_PER_WORD as carry here if required.  */
+      expand_inc (u11, carry_result);
+
+      /* Extracting the higher part of the sum.  */
+      u01 = gen_highpart (word_mode, u01);
+      u01 = force_reg (word_mode, u01);
+      u01 = convert_to_mode (mode, u01, 1);
+
+      /* The final result.  */
+      expand_inc (u11, u01);
+
+      return u11;
+
+    }
+
   /* Try widening multiplication of opposite signedness, and adjust.  */
   moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
   if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: divide 64-bit by constant for 32-bit target machines
  2012-06-08 18:37                       ` Dinar Temirbulatov
@ 2012-06-11  8:03                         ` Paolo Bonzini
       [not found]                           ` <CAMnfPmOaL2x4yi0AYOG7KQbjugCM6J5WsAjYg9eY2mELEVfJTw@mail.gmail.com>
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2012-06-11  8:03 UTC (permalink / raw)
  To: Dinar Temirbulatov
  Cc: Richard Henderson, Richard Earnshaw, Michael Hope, gcc-patches,
	aph, Alexey Kravets

On 08/06/2012 20:13, Dinar Temirbulatov wrote:
> that I have to add convert_to_mode () to DImode after
> emit_store_flag_force (), since emit_store_flag_force () returns
> "carry" in SImode and without convert_to_mode () call compiler fails
> with this error:

Yes, that makes sense.  The new patch looks OK to me.  Just one
question: do you have proof that this:

> +      /* u00, u01, u10, u11 form a three-word value with the result
> +          in the top word, so we want to return this:
> +
> +            ((u11 << 2*BITS_PER_WORD) +
> +                     (u01 + u10 << BITS_PER_WORD) +
> +                            u00) >> 3 * BITS_PER_WORD
> +
> +          We then rewrite it this way:
> +
> +            (u11 + ((u01 + u10 + (u00 >> BITS_PER_WORD))
> +               >> BITS_PER_WORD) >> BITS_PER_WORD

is safe?  That is, that the underflows cannot produce a wrong result?
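
Short of a formal proof, the sequence can at least be checked
exhaustively at a reduced word size.  The sketch below (highpart8 is an
invented name) mirrors the same add/carry steps with 4-bit words and
8-bit double words, so all 65536 input pairs can be compared against
the exact 16-bit product:

```c
#include <assert.h>

/* Same add/carry sequence as the patch, with BITS_PER_WORD == 4 and
   all intermediate arithmetic reduced modulo 2^8.  */
static unsigned
highpart8 (unsigned x, unsigned y)
{
  unsigned x0 = x & 0xF, x1 = x >> 4;
  unsigned y0 = y & 0xF, y1 = y >> 4;
  unsigned u00 = x0 * y0, u01 = x0 * y1;
  unsigned u10 = x1 * y0, u11 = x1 * y1;

  u01 = (u01 + (u00 >> 4)) & 0xFF;   /* u01 += u00h (no carry possible) */
  unsigned sum = (u01 + u10) & 0xFF; /* u01 += u10, may wrap */
  if (u10 > sum)                     /* operand > result => carry */
    u11 = (u11 + (1 << 4)) & 0xFF;
  return (u11 + (sum >> 4)) & 0xFF;
}
```

Checking highpart8 (x, y) == (x * y) >> 8 over all x, y in 0..255 never
finds a mismatch, which is at least strong evidence that the
rewriting cannot underflow.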

Paolo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: divide 64-bit by constant for 32-bit target machines
       [not found]                             ` <4FD6E900.9010903@gnu.org>
@ 2012-06-14 19:05                               ` Dinar Temirbulatov
  2012-06-15  8:16                                 ` Richard Earnshaw
  0 siblings, 1 reply; 19+ messages in thread
From: Dinar Temirbulatov @ 2012-06-14 19:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Alexey Kravets, gcc-patches, Michael Hope, Richard Henderson,
	Richard Earnshaw, aph

[-- Attachment #1: Type: text/plain, Size: 307 bytes --]

Hi,
OK for trunk?
                thanks, Dinar.

On Tue, Jun 12, 2012 at 11:00 AM, Paolo Bonzini <bonzini@gnu.org> wrote:
> On 12/06/2012 08:52, Dinar Temirbulatov wrote:
>>> is safe?  That is, that the underflows cannot produce a wrong result?
>
> [snip]
>
> Thanks very much!
>
> Paolo

[-- Attachment #2: ChangeLog.txt --]
[-- Type: text/plain, Size: 541 bytes --]

2012-06-14  Dinar Temirbulatov  <dtemirbulatov@gmail.com>
	    Alexey Kravets  <mr.kayrick@gmail.com>
	    Paolo Bonzini  <bonzini@gnu.org>
	  
	* config/arm/arm.c (arm_rtx_costs_1): Add cost estimate for the integer
	double-word division operation.
	* config/mips/mips.c (mips_rtx_costs): Extend cost estimate for the integer
	double-word division operation for 32-bit targets.
	* gcc/expmed.c (expand_mult_highpart_optab): Allow generating the high
	multiplication product for unsigned double-word integers using 32-bit
	wide registers.

[-- Attachment #3: 30.patch --]
[-- Type: application/octet-stream, Size: 4331 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 8a86227..0f8120f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7130,6 +7130,8 @@ arm_rtx_costs_1 (rtx x, enum rtx_code outer, int* total, bool speed)
 	*total = COSTS_N_INSNS (2);
       else if (TARGET_HARD_FLOAT && mode == DFmode && !TARGET_VFP_SINGLE)
 	*total = COSTS_N_INSNS (4);
+      else if (mode == DImode) 
+        *total = COSTS_N_INSNS (50);
       else
 	*total = COSTS_N_INSNS (20);
       return false;
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 5bcb7a8..57bb4cc 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -3879,8 +3879,13 @@ mips_rtx_costs (rtx x, int code, int outer_code, int opno ATTRIBUTE_UNUSED,
 	    }
 	  *total = COSTS_N_INSNS (mips_idiv_insns ());
 	}
-      else if (mode == DImode)
+      else if (mode == DImode) {
+	if (!TARGET_64BIT)
+	   /* divide double integer libcall is expensive.  */
+	   *total = COSTS_N_INSNS (200);
+	  else
         *total = mips_cost->int_div_di;
+	}
       else
 	*total = mips_cost->int_div_si;
       return false;
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 98f7c09..51a0d9b 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -3539,6 +3539,78 @@ expand_mult_highpart_optab (enum machine_mode mode, rtx op0, rtx op1,
 	}
     }
 
+  if (unsignedp
+      && size - 1 > BITS_PER_WORD 
+      && (!optimize_size && optimize > 1)
+      && (3 * mul_widen_cost[speed][mode] + mul_highpart_cost[speed][mode] +
+          4 * add_cost[speed][mode] + add_cost[speed][word_mode] < max_cost))
+    {
+      rtx x1, x0, y1, y0, u00, u01, u10, u11, u00h, carry, carry_result;
+
+      /* Extracting the higher part of the 64-bit multiplier.  */
+      x1 = gen_highpart (word_mode, op0);
+      x1 = force_reg (word_mode, x1);
+
+      /* Extracting the lower part of the 64-bit multiplier.  */
+      x0 = gen_lowpart (word_mode, op0);
+      x0 = force_reg (word_mode, x0);
+
+      /* Splitting the 64-bit constant for the higher and the lower parts.  */
+      y0 = gen_lowpart (word_mode, op1);
+      y0 = force_reg (word_mode, y0);
+      y1 = gen_highpart_mode (word_mode, mode, op1);
+
+      u11 = expand_widening_mult (mode, x1, y1, NULL_RTX, 1, umul_widen_optab);
+      u01 = expand_widening_mult (mode, x0, y1, NULL_RTX, 1, umul_widen_optab);
+      u00 = expand_widening_mult (mode, x0, y0, NULL_RTX, 1, umul_widen_optab);
+      u10 = expand_widening_mult (mode, x1, y0, NULL_RTX, 1, umul_widen_optab);
+
+      /* u00, u01, u10, u11 form a three-word value with the result
+          in the top word, so we want to return this:
+
+            ((u11 << 2*BITS_PER_WORD) +
+                     (u01 + u10 << BITS_PER_WORD) +
+                            u00) >> 3 * BITS_PER_WORD
+
+          We then rewrite it this way:
+
+            (u11 + ((u01 + u10 + (u00 >> BITS_PER_WORD))
+               >> BITS_PER_WORD) >> BITS_PER_WORD
+
+          where the shifts are realized with gen_highpart and a
+          conversion back to the wider mode.  */
+      u00h = gen_highpart (word_mode, u00);
+      u00h = force_reg (word_mode, u00h);
+      u00h = convert_to_mode (mode, u00h, 1);
+      expand_inc (u01, u00h);
+
+      /* Summing u01, u10 and u00h together could have up to
+	 2 * BITS_PER_WORD + 1 bits of precision.
+	 We compute the extra bit by checking for carry, and add
+	 1 << BITS_PER_WORD to u11 if there is carry.  */
+      expand_inc (u01, u10);
+
+      /* Checking for carry here.  */
+      carry = emit_store_flag_force (NULL_RTX, GT, u10, u01,
+				     mode, 1, 1);
+      carry = convert_to_mode (mode, carry, 1);
+      carry_result = expand_shift (LSHIFT_EXPR, mode, carry, BITS_PER_WORD, NULL_RTX, 1);
+
+      /* Adding 1 << BITS_PER_WORD as carry here if required.  */
+      expand_inc (u11, carry_result);
+
+      /* Extracting the higher part of the sum.  */
+      u01 = gen_highpart (word_mode, u01);
+      u01 = force_reg (word_mode, u01);
+      u01 = convert_to_mode (mode, u01, 1);
+
+      /* The final result.  */
+      expand_inc (u11, u01);
+
+      return u11;
+
+    }
+
   /* Try widening multiplication of opposite signedness, and adjust.  */
   moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
   if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: divide 64-bit by constant for 32-bit target machines
  2012-06-14 19:05                               ` Dinar Temirbulatov
@ 2012-06-15  8:16                                 ` Richard Earnshaw
  2012-06-15 18:03                                   ` Dinar Temirbulatov
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Earnshaw @ 2012-06-15  8:16 UTC (permalink / raw)
  To: Dinar Temirbulatov
  Cc: Paolo Bonzini, Alexey Kravets, gcc-patches, Michael Hope,
	Richard Henderson, aph

On 14/06/12 19:46, Dinar Temirbulatov wrote:
> Hi,
> OK for trunk?
>                 thanks, Dinar.
> 

I'm still not comfortable with the code bloat that this is likely to
incur at -O2.

R.

> On Tue, Jun 12, 2012 at 11:00 AM, Paolo Bonzini <bonzini@gnu.org> wrote:
>> On 12/06/2012 08:52, Dinar Temirbulatov wrote:
>>>> is safe?  That is, that the underflows cannot produce a wrong result?
>>
>> [snip]
>>
>> Thanks very much!
>>
>> Paolo
>>
>> ChangeLog.txt
>>
>>
>> 2012-06-14  Dinar Temirbulatov  <dtemirbulatov@gmail.com>
>> 	    Alexey Kravets  <mr.kayrick@gmail.com>
>> 	    Paolo Bonzini  <bonzini@gnu.org>
>> 	  
>> 	* config/arm/arm.c (arm_rtx_costs_1): Add cost estimate for the integer
>> 	double-word division operation.
>> 	* config/mips/mips.c (mips_rtx_costs): Extend cost estimate for the integer
>> 	double-word division operation for 32-bit targets.
>> 	* gcc/expmed.c (expand_mult_highpart_optab): Allow generating the high
>> 	multiplication product for unsigned double-word integers using 32-bit
>> 	wide registers.
>>
>> 30.patch
>>
>>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: divide 64-bit by constant for 32-bit target machines
  2012-06-15  8:16                                 ` Richard Earnshaw
@ 2012-06-15 18:03                                   ` Dinar Temirbulatov
  0 siblings, 0 replies; 19+ messages in thread
From: Dinar Temirbulatov @ 2012-06-15 18:03 UTC (permalink / raw)
  To: Richard Earnshaw
  Cc: Paolo Bonzini, Alexey Kravets, gcc-patches, Michael Hope,
	Richard Henderson, aph

Hi, Richard,
How about if I add a "umul_highpart_di" helper to libgcc and use it
instead of expanding the multiplication for the high part directly, or
add my own function, taking the pre-shift, the post-shift, the 64-bit
constant, and the 64-bit operand as parameters, for division at less
than "-O3"?
                     thanks, Dinar.
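
For context, division by a constant is exactly where this high-part
multiply gets used: multiply by a precomputed reciprocal, keep the high
word, post-shift.  A sketch for d = 10, using the well-known multiplier
ceil(2^67 / 10) = 0xCCCCCCCCCCCCCCCD (GCC's __uint128_t stands in here
for the expanded 32-bit word sequence):

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned n / 10 without a divide instruction: multiply by
   ceil(2^67/10), keep the high 64 bits, post-shift by 3.  */
static uint64_t
div10 (uint64_t n)
{
  uint64_t hi = (uint64_t) (((__uint128_t) n
			     * 0xCCCCCCCCCCCCCCCDULL) >> 64);
  return hi >> 3;
}
```

On a 32-bit target, the 128-bit multiply above is what the patched
expand_mult_highpart_optab would open-code from four 32x32->64
products.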

On Fri, Jun 15, 2012 at 12:12 PM, Richard Earnshaw <rearnsha@arm.com> wrote:
> On 14/06/12 19:46, Dinar Temirbulatov wrote:
>> Hi,
>> OK for trunk?
>>                 thanks, Dinar.
>>
>
> I'm still not comfortable with the code bloat that this is likely to
> incur at -O2.
>
> R.
>
>> On Tue, Jun 12, 2012 at 11:00 AM, Paolo Bonzini <bonzini@gnu.org> wrote:
>>> On 12/06/2012 08:52, Dinar Temirbulatov wrote:
>>>>> is safe?  That is, that the underflows cannot produce a wrong result?
>>>
>>> [snip]
>>>
>>> Thanks very much!
>>>
>>> Paolo
>>>
>>> ChangeLog.txt
>>>
>>>
>>> 2012-06-14  Dinar Temirbulatov  <dtemirbulatov@gmail.com>
>>>          Alexey Kravets  <mr.kayrick@gmail.com>
>>>          Paolo Bonzini  <bonzini@gnu.org>
>>>
>>>      * config/arm/arm.c (arm_rtx_costs_1): Add cost estimate for the integer
>>>      double-word division operation.
>>>      * config/mips/mips.c (mips_rtx_costs): Extend cost estimate for the integer
>>>      double-word division operation for 32-bit targets.
>>>      * gcc/expmed.c (expand_mult_highpart_optab): Allow generating the high
>>>      multiplication product for unsigned double-word integers using 32-bit
>>>      wide registers.
>>>
>>> 30.patch
>>>
>>>
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2012-06-15 17:53 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-20 12:57 divide 64-bit by constant for 32-bit target machines Dinar Temirbulatov
2012-04-23 14:30 ` Andrew Haley
2012-04-24  1:49 ` Michael Hope
2012-05-03 10:28   ` Dinar Temirbulatov
2012-05-03 13:41     ` Richard Earnshaw
2012-05-22 14:05       ` Dinar Temirbulatov
2012-05-22 15:46         ` Richard Henderson
2012-05-25 10:20           ` Dinar Temirbulatov
2012-05-26 12:35             ` Paolo Bonzini
2012-05-26 12:46               ` Paolo Bonzini
2012-06-07 10:21                 ` Dinar Temirbulatov
2012-06-07 10:43                   ` Dinar Temirbulatov
2012-06-07 14:36                     ` Paolo Bonzini
2012-06-08 18:37                       ` Dinar Temirbulatov
2012-06-11  8:03                         ` Paolo Bonzini
     [not found]                           ` <CAMnfPmOaL2x4yi0AYOG7KQbjugCM6J5WsAjYg9eY2mELEVfJTw@mail.gmail.com>
     [not found]                             ` <4FD6E900.9010903@gnu.org>
2012-06-14 19:05                               ` Dinar Temirbulatov
2012-06-15  8:16                                 ` Richard Earnshaw
2012-06-15 18:03                                   ` Dinar Temirbulatov
2012-05-26 12:39             ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).