* divide 64-bit by constant for 32-bit target machines
@ 2012-04-20 12:57 Dinar Temirbulatov
2012-04-23 14:30 ` Andrew Haley
2012-04-24 1:49 ` Michael Hope
0 siblings, 2 replies; 19+ messages in thread
From: Dinar Temirbulatov @ 2012-04-20 12:57 UTC (permalink / raw)
To: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 550 bytes --]
Hi,
Here is a patch that adds support for dividing 64-bit integers by a
constant on 32-bit target machines. The patch was tested on arm-7a with
no new regressions. I am also not sure how to avoid targets such as
i686, where the div operation is fast compared to other targets: there,
libc's __udivdi3 (libc/sysdeps/wordsize-32/divdi3.c) performed better
than my compiler-side implementation, and it is not clear to me how to
tell that from the udiv_cost[speed][SImode] value.
thanks, Dinar.
[-- Attachment #2: 11.patch --]
[-- Type: application/octet-stream, Size: 2498 bytes --]
diff -rup gcc-20120418-orig/gcc/expmed.c gcc-20120418/gcc/expmed.c
--- gcc-20120418-orig/gcc/expmed.c 2012-04-20 14:00:49.125256428 +0400
+++ gcc-20120418/gcc/expmed.c 2012-04-20 16:15:08.305042269 +0400
@@ -3523,6 +3523,68 @@ expand_mult_highpart_optab (enum machine
}
}
+ if (size - 1 > BITS_PER_WORD
+ && BITS_PER_WORD == 32 && mode == DImode)
+ {
+ unsigned HOST_WIDE_INT d;
+ rtx x1, x0, y1, y0, z2, z0, tmp, tmp1, u0, u0tmp, u1, c, c1, ccst, cres, result, seq;
+
+ d = (INTVAL (op1) & GET_MODE_MASK (DImode));
+ start_sequence ();
+ x1 = gen_highpart (SImode, op0);
+ x1 = force_reg (SImode, x1);
+ x0 = gen_lowpart (SImode, op0);
+ x0 = force_reg (SImode, x0);
+
+ x1 = convert_to_mode (DImode, x1, 0);
+ x0 = convert_to_mode (DImode, x0, 0);
+
+ y0 = gen_rtx_CONST_INT (DImode, d&UINT_MAX);
+ y1 = gen_rtx_CONST_INT (DImode, d>>32);
+
+ z2 = gen_reg_rtx (DImode);
+ u0 = gen_reg_rtx (DImode);
+
+ z2 = expand_mult(DImode, x1, y1, z2, 0);
+ u0 = expand_mult(DImode, x0, y1, u0, 0);
+
+ z0 = gen_reg_rtx (DImode);
+ u1 = gen_reg_rtx (DImode);
+ z0 = expand_mult(DImode, x0, y0, z0, 0);
+ u1 = expand_mult(DImode, x1, y0, u1, 0);
+
+ u0tmp = gen_reg_rtx (DImode);
+ u0tmp = expand_shift (RSHIFT_EXPR, DImode, z0, 32, u0tmp, 1);
+ expand_inc (u0, u0tmp);
+ tmp = gen_reg_rtx (DImode);
+ tmp = expand_binop (DImode, add_optab, u0, u1, tmp, 1, OPTAB_LIB_WIDEN);
+ if (!tmp)
+ return 0;
+
+ c = gen_reg_rtx (DImode);
+ c1 = gen_reg_rtx (DImode);
+ cres = gen_reg_rtx (DImode);
+ emit_store_flag_force (c, GT, u0, tmp, DImode, 1, -1);
+ emit_store_flag_force (c1, GT, u1, tmp, DImode, 1, -1);
+ result = expand_binop (DImode, ior_optab, c, c1, cres, 1, OPTAB_LIB_WIDEN);
+ if (!result)
+ return 0;
+
+ ccst = gen_reg_rtx (DImode);
+ ccst = expand_shift (LSHIFT_EXPR, DImode, cres, 32, ccst, 1);
+
+ expand_inc (z2, ccst);
+
+ tmp1 = gen_reg_rtx (DImode);
+ tmp1 = expand_shift (RSHIFT_EXPR, DImode, tmp, 32, NULL_RTX, 1);
+ expand_inc (z2, tmp1);
+ seq = get_insns ();
+ end_sequence ();
+ emit_insn (seq);
+ return z2;
+
+ }
+
/* Try widening multiplication of opposite signedness, and adjust. */
moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: divide 64-bit by constant for 32-bit target machines
2012-04-20 12:57 divide 64-bit by constant for 32-bit target machines Dinar Temirbulatov
@ 2012-04-23 14:30 ` Andrew Haley
2012-04-24 1:49 ` Michael Hope
1 sibling, 0 replies; 19+ messages in thread
From: Andrew Haley @ 2012-04-23 14:30 UTC (permalink / raw)
To: gcc-patches
On 04/20/2012 01:57 PM, Dinar Temirbulatov wrote:
> Here is a patch that adds support for dividing 64-bit integers by a
> constant on 32-bit target machines. The patch was tested on arm-7a with
> no new regressions. I am also not sure how to avoid targets such as
> i686, where the div operation is fast compared to other targets: there,
> libc's __udivdi3 (libc/sysdeps/wordsize-32/divdi3.c) performed better
> than my compiler-side implementation, and it is not clear to me how to
> tell that from the udiv_cost[speed][SImode] value.
I can't approve this patch. However, I do think that a comment
which explains the algorithm is needed.
Andrew.
* Re: divide 64-bit by constant for 32-bit target machines
2012-04-20 12:57 divide 64-bit by constant for 32-bit target machines Dinar Temirbulatov
2012-04-23 14:30 ` Andrew Haley
@ 2012-04-24 1:49 ` Michael Hope
2012-05-03 10:28 ` Dinar Temirbulatov
1 sibling, 1 reply; 19+ messages in thread
From: Michael Hope @ 2012-04-24 1:49 UTC (permalink / raw)
To: Dinar Temirbulatov; +Cc: gcc-patches
On 21 April 2012 00:57, Dinar Temirbulatov <dtemirbulatov@gmail.com> wrote:
> Hi,
> Here is a patch that adds support for dividing 64-bit integers by a
> constant on 32-bit target machines. The patch was tested on arm-7a with
> no new regressions. I am also not sure how to avoid targets such as
> i686, where the div operation is fast compared to other targets: there,
> libc's __udivdi3 (libc/sysdeps/wordsize-32/divdi3.c) performed better
> than my compiler-side implementation, and it is not clear to me how to
> tell that from the udiv_cost[speed][SImode] value.
Hi Dinar. I'm afraid it gives the wrong results for some dividends:
* 82625484914982912 / 2023346444509052928: gives 4096, should be zero
* 18317463604061229328 / 2023346444509052928: gives 4109, should be 9
* 12097415865295708879 / 4545815675034402816: gives 130, should be 2
* 18195490362097456014 / 6999635335417036800: gives 10, should be 2
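The expected quotients above are easy to confirm with the compiler's own
64-bit unsigned division; a small plain-C sanity check (nothing here is
taken from the patch itself):

```c
#include <assert.h>
#include <stdint.h>

/* Expected quotients for the failing dividends, computed with the
   compiler's own (libgcc/libc) 64-bit unsigned division.  */
static void check_expected_quotients (void)
{
  assert (82625484914982912ULL    / 2023346444509052928ULL == 0);
  assert (18317463604061229328ULL / 2023346444509052928ULL == 9);
  assert (12097415865295708879ULL / 4545815675034402816ULL == 2);
  assert (18195490362097456014ULL / 6999635335417036800ULL == 2);
}
```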
The expanded version is very large. Perhaps it should only turn on at
-O2 and always turn off at -Os?
The speed increase is quite impressive: I'm seeing between 2.7x and
20x faster on a Cortex-A9 for things like x / 3.
-- Michael
* Re: divide 64-bit by constant for 32-bit target machines
2012-04-24 1:49 ` Michael Hope
@ 2012-05-03 10:28 ` Dinar Temirbulatov
2012-05-03 13:41 ` Richard Earnshaw
0 siblings, 1 reply; 19+ messages in thread
From: Dinar Temirbulatov @ 2012-05-03 10:28 UTC (permalink / raw)
To: Michael Hope; +Cc: gcc-patches, aph
[-- Attachment #1: Type: text/plain, Size: 643 bytes --]
Hi,
Here is an updated version of the patch; I have added comments describing the algorithm.
> Hi Dinar. I'm afraid it gives the wrong results for some dividends
> * 82625484914982912 / 2023346444509052928: gives 4096, should be zero
> * 18317463604061229328 / 2023346444509052928: gives 4109, should be 9
> * 12097415865295708879 / 4545815675034402816: gives 130, should be 2
> * 18195490362097456014 / 6999635335417036800: gives 10, should be 2
Oh, I had used signed multiplication instead of unsigned; that was
the cause of the errors above. I have fixed that.
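The difference is easy to see in plain C: widening a 32-bit half with a
signed conversion sign-extends it, which corrupts any partial product
whose half has bit 31 set (a small illustration, not from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Widening the low 32-bit half of an operand: an unsigned conversion
   zero-extends, a signed one sign-extends.  With bit 31 set the two
   results differ, so partial products built from the signed widening
   come out wrong.  */
static void show_signed_widening_bug (void)
{
  uint32_t half = 0x80000000u;                         /* bit 31 set */
  uint64_t zext = (uint64_t) half;                     /* 0x0000000080000000 */
  uint64_t sext = (uint64_t) (int64_t) (int32_t) half; /* 0xffffffff80000000 */
  assert (zext == 0x80000000ULL);
  assert (sext == 0xFFFFFFFF80000000ULL);
  /* Any multiply then diverges, e.g. by 2: */
  assert (zext * 2 == 0x100000000ULL);
  assert (sext * 2 == 0xFFFFFFFF00000000ULL);
}
```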
Tested on arm-7l with no new regressions.
Ok for trunk?
thanks, Dinar.
[-- Attachment #2: 14.patch --]
[-- Type: application/octet-stream, Size: 4513 bytes --]
diff -rup gcc-20120418-orig/gcc/expmed.c gcc-20120418/gcc/expmed.c
--- gcc-20120418-orig/gcc/expmed.c 2012-04-20 14:00:49.125256428 +0400
+++ gcc-20120418/gcc/expmed.c 2012-05-03 14:16:13.755219789 +0400
@@ -3523,6 +3523,106 @@ expand_mult_highpart_optab (enum machine
}
}
+ if ((size - 1 > BITS_PER_WORD
+ && BITS_PER_WORD == 32 && mode == DImode)
+ && (!optimize_size) && (optimize>1))
+ {
+ unsigned HOST_WIDE_INT d;
+ rtx x1, x0, y1, y0, z2, z0, tmp, u0, u0tmp, u1, c, c1, ccst, cres, result;
+
+ d = (INTVAL (op1) & GET_MODE_MASK (DImode));
+
+ /* Extracting the higher part of the 64-bit multiplier */
+ x1 = gen_highpart (SImode, op0);
+ x1 = force_reg (SImode, x1);
+
+ /* Extracting the lower part of the 64-bit multiplier */
+ x0 = gen_lowpart (SImode, op0);
+ x0 = force_reg (SImode, x0);
+
+ x1 = convert_to_mode (DImode, x1, 1);
+ x0 = convert_to_mode (DImode, x0, 1);
+
+ /* Splitting 64-bit constant for higher and lower parts */
+ y0 = gen_rtx_CONST_INT (DImode, d&UINT_MAX);
+ y1 = gen_rtx_CONST_INT (DImode, d>>32);
+
+ z2 = gen_reg_rtx (DImode);
+ u0 = gen_reg_rtx (DImode);
+
+ /* Unsigned multiplication of the higher multiplier part
+ and the higher constant part */
+ z2 = expand_mult(DImode, x1, y1, z2, 1);
+ /* Unsigned multiplication of the lower multiplier part
+ and the higher constant part */
+ u0 = expand_mult(DImode, x0, y1, u0, 1);
+
+ z0 = gen_reg_rtx (DImode);
+ u1 = gen_reg_rtx (DImode);
+
+ /* Unsigned multiplication of the lower multiplier part
+ and the lower constant part */
+ z0 = expand_mult(DImode, x0, y0, z0, 1);
+
+ /* Unsigned multiplication of the higher multiplier part
+ and the lower constant part */
+ u1 = expand_mult(DImode, x1, y0, u1, 1);
+
+ /* Getting the higher part of multiplication between the lower multiplier
+ part and the lower constant part, the lower part is not interesting
+ for the final result */
+ u0tmp = gen_highpart (SImode, z0);
+ u0tmp = force_reg (SImode, u0tmp);
+
+ /* Adding the higher part of multiplication between the lower multiplier
+ part and the lower constant part to the result of multiplication between
+ the lower multiplier part and the higher constant part. Please note
+ that we couldn't get overflow here since in the worst case
+ (0xffffffff*0xffffffff)+0xffffffff we get 0xffffffff00000000L */
+ expand_inc (u0, u0tmp);
+ tmp = gen_reg_rtx (DImode);
+
+ /* Adding multiplication between the lower multiplier part and the higher
+ constant part with the higher part of multiplication between the lower
+ multiplier part and the lower constant part to the result of multiplication
+ between the higher multiplier part and the lower constant part */
+ tmp = expand_binop (DImode, add_optab, u0, u1, tmp, 1, OPTAB_LIB_WIDEN);
+ if (!tmp)
+ return 0;
+
+ /* Checking for owerflow, please not that we couldn't user carry-flag
+ here before the reload pass .
+ cres = (tmp < u0) || (tmp < u1); */
+ c = gen_reg_rtx (DImode);
+ c1 = gen_reg_rtx (DImode);
+ cres = gen_reg_rtx (DImode);
+
+ emit_store_flag_force (c, GT, u0, tmp, DImode, 1, -1);
+ emit_store_flag_force (c1, GT, u1, tmp, DImode, 1, -1);
+ result = expand_binop (DImode, ior_optab, c, c1, cres, 1, OPTAB_LIB_WIDEN);
+ if (!result)
+ return 0;
+
+ ccst = gen_reg_rtx (DImode);
+ ccst = expand_shift (LSHIFT_EXPR, DImode, cres, 32, ccst, 1);
+
+ /* Adding 0x100000000 in case of overflow to result of multiplication
+ between the higher multiplier part and the higher constant part. Please note that
+ we don't have to check for overflow here because
+ (0xffffffff*0xffffffff) + 0x100000000 equals 0xffffffff00000001L */
+ expand_inc (z2, ccst);
+
+
+ /* Extracting the higher part of the sum */
+ tmp = gen_highpart (SImode, tmp);
+ tmp = force_reg (SImode, tmp);
+
+ /* The final result, again we don't have to check for overflow here */
+ expand_inc (z2, tmp);
+ return z2;
+
+ }
+
/* Try widening multiplication of opposite signedness, and adjust. */
moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing
* Re: divide 64-bit by constant for 32-bit target machines
2012-05-03 10:28 ` Dinar Temirbulatov
@ 2012-05-03 13:41 ` Richard Earnshaw
2012-05-22 14:05 ` Dinar Temirbulatov
0 siblings, 1 reply; 19+ messages in thread
From: Richard Earnshaw @ 2012-05-03 13:41 UTC (permalink / raw)
To: Dinar Temirbulatov; +Cc: Michael Hope, gcc-patches, aph
On 03/05/12 11:27, Dinar Temirbulatov wrote:
> Hi,
> Here is an updated version of the patch; I have added comments describing the algorithm.
>
>> Hi Dinar. I'm afraid it gives the wrong results for some dividends
>> * 82625484914982912 / 2023346444509052928: gives 4096, should be zero
>> * 18317463604061229328 / 2023346444509052928: gives 4109, should be 9
>> * 12097415865295708879 / 4545815675034402816: gives 130, should be 2
>> * 18195490362097456014 / 6999635335417036800: gives 10, should be 2
> Oh, I had used signed multiplication instead of unsigned; that was
> the cause of the errors above. I have fixed that.
> Tested on arm-7l with no new regressions.
> Ok for trunk?
>
This clearly needs more work.
Comments: Need to end with a period and two spaces.
Binary Operators: Need to be surrounded with white space.
As Andrew Haley has already said, some documentation of the algorithm is
needed.
Why is this restricted to BITS_PER_WORD == 32?
Costs: This code clearly needs to understand the relative cost of doing
division this way; there is a limit to the amount of inline expansion
that we should tolerate, particularly at -O2 and it's not clear that
this will be much better than a library call if we don't have a widening
multiply operation (as is true for older ARM chips, and presumably some
other CPUs). In essence, you need to work out the cost of a divide
instruction (just as rtx costs for this) and the approximate cost of the
expanded algorithm.
Another issue that worries me is the number of live DImode values; on
machines with few registers this could cause excessive spilling to start
happening.
I also wonder whether it would be beneficial to generate custom
functions for division by specific constants (using this algorithm) and
then call those functions rather than the standard lib-call. On ELF
systems the functions can marked as hidden and put into common sections
so that we only end up with one instance of each function in a program.
+ /* Checking for owerflow, please not that we couldn't user carry-flag
+ here before the reload pass .
+ cres = (tmp < u0) || (tmp < u1); */
Generic code can't assume the presence of a carry flag either before or
after reload, so the latter part of the comment is somewhat meaningless.
Also spelling error in comment.
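For reference, the portable carry check needs only a single comparison:
an unsigned add wrapped if and only if the sum is smaller than either
operand, so comparing against one of them suffices (a sketch, not the
patch's code):

```c
#include <assert.h>
#include <stdint.h>

/* Portable carry-out detection for an unsigned double-word add:
   the sum wrapped iff it is smaller than (either) operand.  */
static unsigned add_carries (uint64_t a, uint64_t b, uint64_t *sum)
{
  *sum = a + b;
  return *sum < a;   /* same truth value as *sum < b */
}
```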
Finally, do you have a copyright assignment with the FSF? We can't use
this code without one.
R.
* Re: divide 64-bit by constant for 32-bit target machines
2012-05-03 13:41 ` Richard Earnshaw
@ 2012-05-22 14:05 ` Dinar Temirbulatov
2012-05-22 15:46 ` Richard Henderson
0 siblings, 1 reply; 19+ messages in thread
From: Dinar Temirbulatov @ 2012-05-22 14:05 UTC (permalink / raw)
To: Richard Earnshaw; +Cc: Michael Hope, gcc-patches, aph, Alexey Kravets
[-- Attachment #1: Type: text/plain, Size: 3438 bytes --]
Hi,
Here is the new version of the patch. I have fixed two errors in the
previous version. First, on mips32 the compiler could not expand the
division and terminated with an ICE; this change fixed the issue:
/* Extracting the higher part of the sum */
tmp = gen_highpart (SImode, tmp);
tmp = force_reg (SImode, tmp);
+ tmp = force_reg (SImode, tmp);
+ tmp = convert_to_mode (DImode, tmp, 1);
Second, on i686 and mips32r2 I found that the overflow check in the
multiplication was not working; for example, 0xfe34b4190a392b23 / 257
produced a wrong result. The following change resolved the issue:
-emit_store_flag_force (c, GT, u0, tmp, DImode, 1, -1);
+emit_store_flag_force (c, GT, u0, tmp, DImode, 1, 1);
Tested this new version of the patch on -none-linux-gnu with arm-7l,
mips-32r2 (74k), and i686, with no new regressions.
On Thu, May 3, 2012 at 5:40 PM, Richard Earnshaw <rearnsha@arm.com> wrote:
> On 03/05/12 11:27, Dinar Temirbulatov wrote:
>
>
> This clearly needs more work.
>
> Comments: Need to end with a period and two spaces.
> Binary Operators: Need to be surrounded with white space.
Sorry about that; I hope I have resolved these issues in the new version.
>
> As Andrew Haley has already said, some documentation of the algorithm is
> needed.
General documentation for the algorithm can be found at
gmplib.org/~tege/divcnst-pldi94.pdf.
The multiplication to obtain the upper half of the 128-bit product was
developed by me and Alexey Kravets; I have attached a C version of the
algorithm.
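The cited paper's core transformation replaces x / d by a high multiply
with a precomputed reciprocal followed by a shift. A minimal C sketch
for d == 3 (the magic constant and post-shift are the standard ones for
this divisor; unsigned __int128 stands in for the double-word multiply):

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned division by the constant 3 via a precomputed reciprocal,
   after Granlund & Montgomery: q = floor(x * m / 2^65), where
   m = (2^65 + 1) / 3 = 0xAAAAAAAAAAAAAAAB.  */
static uint64_t udiv_by_3 (uint64_t x)
{
  const uint64_t m = 0xAAAAAAAAAAAAAAABULL;
  /* High 64 bits of the 128-bit product x * m.  */
  uint64_t hi = (uint64_t) (((unsigned __int128) x * m) >> 64);
  return hi >> 1;   /* post-shift */
}
```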
>
> Why is this restricted to BITS_PER_WORD == 32?
I am checking here that we are generating code for a 32-bit target with
32-bit-wide general purpose registers; targets with 64-bit-wide
registers usually have an instruction to get the upper 64 bits of a
64x64-bit multiplication cheaply.
>
> Costs: This code clearly needs to understand the relative cost of doing
> division this way; there is a limit to the amount of inline expansion
> that we should tolerate, particularly at -O2 and it's not clear that
> this will be much better than a library call if we don't have a widening
> multiply operation (as is true for older ARM chips, and presumably some
> other CPUs). In essence, you need to work out the cost of a divide
> instruction (just as rtx costs for this) and the approximate cost of the
> expanded algorithm.
Added cost calculation.
>
> Another issue that worries me is the number of live DImode values; on
> machines with few registers this could cause excessive spilling to start
> happening.
The cost calculation is supposed to take this into account.
>
> I also wonder whether it would be beneficial to generate custom
> functions for division by specific constants (using this algorithm) and
> then call those functions rather than the standard lib-call. On ELF
> systems the functions can marked as hidden and put into common sections
> so that we only end up with one instance of each function in a program.
Yes, I think that is a good approach; I could redo my patch with such an
intrinsic function implementation, taking the pre-shift, the post-shift,
and the 64-bit constant as function parameters.
>
> Finally, do you have a copyright assignment with the FSF? We can't use
> this code without one.
Yes, I do have a copyright assignment with the FSF.
I am also in the process of implementing signed 64-bit integer division by a constant.
thanks, Dinar.
[-- Attachment #2: 18.patch --]
[-- Type: application/octet-stream, Size: 5883 bytes --]
diff -rup gcc-20120418-orig/gcc/config/arm/arm.c gcc-20120418/gcc/config/arm/arm.c
--- gcc-20120418-orig/gcc/config/arm/arm.c 2012-04-20 13:59:17.521258861 +0400
+++ gcc-20120418/gcc/config/arm/arm.c 2012-05-14 15:38:44.980815823 +0400
@@ -7131,6 +7131,8 @@ arm_rtx_costs_1 (rtx x, enum rtx_code ou
*total = COSTS_N_INSNS (2);
else if (TARGET_HARD_FLOAT && mode == DFmode && !TARGET_VFP_SINGLE)
*total = COSTS_N_INSNS (4);
+ else if (mode == DImode)
+ *total = COSTS_N_INSNS (50);
else
*total = COSTS_N_INSNS (20);
return false;
diff -rup gcc-20120418-orig/gcc/config/mips/mips.c gcc-20120418/gcc/config/mips/mips.c
--- gcc-20120418-orig/gcc/config/mips/mips.c 2012-04-20 13:59:16.417258891 +0400
+++ gcc-20120418/gcc/config/mips/mips.c 2012-05-14 15:41:05.132812098 +0400
@@ -3845,8 +3845,13 @@ mips_rtx_costs (rtx x, int code, int out
}
*total = COSTS_N_INSNS (mips_idiv_insns ());
}
- else if (mode == DImode)
+ else if (mode == DImode) {
+ if (!TARGET_64BIT)
+ /* divide double integer libcall is expensive. */
+ *total = COSTS_N_INSNS (200);
+ else
*total = mips_cost->int_div_di;
+ }
else
*total = mips_cost->int_div_si;
return false;
diff -rup gcc-20120418-orig/gcc/expmed.c gcc-20120418/gcc/expmed.c
--- gcc-20120418-orig/gcc/expmed.c 2012-04-20 14:00:49.125256428 +0400
+++ gcc-20120418/gcc/expmed.c 2012-05-22 17:17:16.618291346 +0400
@@ -3523,6 +3523,110 @@ expand_mult_highpart_optab (enum machine
}
}
+ if ((size - 1 > BITS_PER_WORD
+ && BITS_PER_WORD == 32 && mode == DImode)
+ && unsignedp
+ && (!optimize_size && (optimize>1))
+ && (4 * mul_cost[speed][mode] + 4 * add_cost[speed][mode]
+ + shift_cost[speed][mode][31] < max_cost))
+ {
+ unsigned HOST_WIDE_INT d;
+ rtx x1, x0, y1, y0, z2, z0, tmp, u0, u0tmp, u1, c, c1, ccst, cres, result;
+
+ d = (INTVAL (op1) & GET_MODE_MASK (DImode));
+
+ /* Extracting the higher part of the 64-bit multiplier. */
+ x1 = gen_highpart (SImode, op0);
+ x1 = force_reg (SImode, x1);
+
+ /* Extracting the lower part of the 64-bit multiplier. */
+ x0 = gen_lowpart (SImode, op0);
+ x0 = force_reg (SImode, x0);
+
+ x1 = convert_to_mode (DImode, x1, 1);
+ x0 = convert_to_mode (DImode, x0, 1);
+
+ /* Splitting the 64-bit constant for the higher and the lower parts. */
+ y0 = gen_rtx_CONST_INT (DImode, d&UINT32_MAX);
+ y1 = gen_rtx_CONST_INT (DImode, d>>32);
+
+ z2 = gen_reg_rtx (DImode);
+ u0 = gen_reg_rtx (DImode);
+
+ /* Unsigned multiplication of the higher multiplier part
+ and the higher constant part. */
+ z2 = expand_mult(DImode, x1, y1, z2, 1);
+ /* Unsigned multiplication of the lower multiplier part
+ and the higher constant part. */
+ u0 = expand_mult(DImode, x0, y1, u0, 1);
+
+ z0 = gen_reg_rtx (DImode);
+ u1 = gen_reg_rtx (DImode);
+
+ /* Unsigned multiplication of the lower multiplier part
+ and the lower constant part. */
+ z0 = expand_mult (DImode, x0, y0, z0, 1);
+
+ /* Unsigned multiplication of the higher multiplier part
+ and the lower constant part. */
+ u1 = expand_mult (DImode, x1, y0, u1, 1);
+
+ /* Getting the higher part of multiplication between the lower multiplier
+ part and the lower constant part, the lower part is not interesting
+ for the final result. */
+ u0tmp = gen_highpart (SImode, z0);
+ u0tmp = force_reg (SImode, u0tmp);
+ u0tmp = convert_to_mode (DImode, u0tmp, 1);
+
+ /* Adding the higher part of multiplication between the lower multiplier
+ part and the lower constant part to the result of multiplication between
+ the lower multiplier part and the higher constant part. Please note
+ that we couldn't get overflow here since in the worst case
+ (0xffffffff*0xffffffff)+0xffffffff we get 0xffffffff00000000L. */
+ expand_inc (u0, u0tmp);
+ tmp = gen_reg_rtx (DImode);
+
+ /* Adding multiplication between the lower multiplier part and the higher
+ constant part with the higher part of multiplication between the lower
+ multiplier part and the lower constant part to the result of multiplication
+ between the higher multiplier part and the lower constant part. */
+ tmp = expand_binop (DImode, add_optab, u0, u1, tmp, 1, OPTAB_LIB_WIDEN);
+ if (!tmp)
+ return 0;
+
+ /* Checking for overflow. */
+ c = gen_reg_rtx (DImode);
+ c1 = gen_reg_rtx (DImode);
+ cres = gen_reg_rtx (DImode);
+
+ emit_store_flag_force (c, GT, u0, tmp, DImode, 1, 1);
+ emit_store_flag_force (c1, GT, u1, tmp, DImode, 1, 1);
+ result = expand_binop (DImode, ior_optab, c, c1, cres, 1, OPTAB_LIB_WIDEN);
+ if (!result)
+ return 0;
+
+ ccst = gen_reg_rtx (DImode);
+ ccst = expand_shift (LSHIFT_EXPR, DImode, cres, 32, ccst, 1);
+
+ /* Adding 0x100000000 in case of overflow to the result of multiplication
+ between the higher multiplier part and the higher constant part. Please note that
+ we don't have to check for overflow here because in the worst case
+ (0xffffffff*0xffffffff) + 0x100000000 equals 0xffffffff00000001L. */
+ expand_inc (z2, ccst);
+
+
+ /* Extracting the higher part of the sum. */
+ tmp = gen_highpart (SImode, tmp);
+ tmp = force_reg (SImode, tmp);
+ tmp = convert_to_mode (DImode, tmp, 1);
+
+ /* The final result, again we don't have to check for overflow here. */
+ expand_inc (z2, tmp);
+
+ return z2;
+
+ }
+
/* Try widening multiplication of opposite signedness, and adjust. */
moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing
[-- Attachment #3: mul.c --]
[-- Type: text/x-csrc, Size: 526 bytes --]
unsigned long long mul (unsigned long long a, unsigned long long b)
{
  /* Split both operands into 32-bit halves.  */
  unsigned long long x1 = a >> 32;
  unsigned long long x0 = a & 0xffffffffULL;
  unsigned long long y1 = b >> 32;
  unsigned long long y0 = b & 0xffffffffULL;
  unsigned long long z2, z0, tmp, u0, u1;
  unsigned char c = 0;

  z2 = x1 * y1;                  /* high x high */
  z0 = x0 * y0;                  /* low x low */
  u0 = x0 * y1 + (z0 >> 32);     /* low x high plus the carry out of z0 */
  u1 = x1 * y0;                  /* high x low */
  tmp = u0 + u1;
  c = (tmp < u0) || (tmp < u1);  /* carry out of the 64-bit add */
  return z2 + (tmp >> 32) + (((unsigned long long) c) << 32);
}
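The attached routine can be cross-checked against a native 128-bit
product where one is available; a self-contained sketch (the limb-based
multiply is restated here, and unsigned __int128 is assumed):

```c
#include <assert.h>
#include <stdint.h>

/* The attachment's limb-based high-part multiply, restated.  */
static uint64_t mulhi (uint64_t a, uint64_t b)
{
  uint64_t x1 = a >> 32, x0 = a & 0xFFFFFFFFULL;
  uint64_t y1 = b >> 32, y0 = b & 0xFFFFFFFFULL;
  uint64_t z2 = x1 * y1;              /* high x high */
  uint64_t z0 = x0 * y0;              /* low x low */
  uint64_t u0 = x0 * y1 + (z0 >> 32); /* low x high plus carry from z0 */
  uint64_t u1 = x1 * y0;              /* high x low */
  uint64_t tmp = u0 + u1;
  uint64_t c = tmp < u0;              /* carry out of u0 + u1 */
  return z2 + (tmp >> 32) + (c << 32);
}

/* Reference: the top half of a native 128-bit product.  */
static uint64_t mulhi_ref (uint64_t a, uint64_t b)
{
  return (uint64_t) (((unsigned __int128) a * b) >> 64);
}
```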
* Re: divide 64-bit by constant for 32-bit target machines
2012-05-22 14:05 ` Dinar Temirbulatov
@ 2012-05-22 15:46 ` Richard Henderson
2012-05-25 10:20 ` Dinar Temirbulatov
0 siblings, 1 reply; 19+ messages in thread
From: Richard Henderson @ 2012-05-22 15:46 UTC (permalink / raw)
To: Dinar Temirbulatov
Cc: Richard Earnshaw, Michael Hope, gcc-patches, aph, Alexey Kravets
On 05/22/12 07:05, Dinar Temirbulatov wrote:
> + if ((size - 1 > BITS_PER_WORD
> + && BITS_PER_WORD == 32 && mode == DImode)
Do note that this patch will not go in with hard-coded
SImode and DImode references.
Which, really, you do not even need.
&& GET_MODE_BITSIZE (mode) == 2*BITS_PER_WORD
is what you wanted for testing for double-word-ness,
and word_mode is a good substitute for SImode here.
+ /* Splitting the 64-bit constant for the higher and the lower parts. */
+ y0 = gen_rtx_CONST_INT (DImode, d&UINT32_MAX);
+ y1 = gen_rtx_CONST_INT (DImode, d>>32);
Use gen_int_mode.
> + x1 = convert_to_mode (DImode, x1, 1);
> + x0 = convert_to_mode (DImode, x0, 1);
> +
> + /* Splitting the 64-bit constant for the higher and the lower parts. */
> + y0 = gen_rtx_CONST_INT (DImode, d&UINT32_MAX);
> + y1 = gen_rtx_CONST_INT (DImode, d>>32);
> +
> + z2 = gen_reg_rtx (DImode);
> + u0 = gen_reg_rtx (DImode);
> +
> + /* Unsigned multiplication of the higher multiplier part
> + and the higher constant part. */
> + z2 = expand_mult(DImode, x1, y1, z2, 1);
> + /* Unsigned multiplication of the lower multiplier part
> + and the higher constant part. */
> + u0 = expand_mult(DImode, x0, y1, u0, 1);
I'm fairly sure you really want to be using expand_widening_mult
here, rather than using convert_to_mode first. While combine may
be able to re-generate a widening multiply out of this sequence,
there's no sense making it work too hard.
r~
* Re: divide 64-bit by constant for 32-bit target machines
2012-05-22 15:46 ` Richard Henderson
@ 2012-05-25 10:20 ` Dinar Temirbulatov
2012-05-26 12:35 ` Paolo Bonzini
2012-05-26 12:39 ` Paolo Bonzini
0 siblings, 2 replies; 19+ messages in thread
From: Dinar Temirbulatov @ 2012-05-25 10:20 UTC (permalink / raw)
To: Richard Henderson
Cc: Richard Earnshaw, Michael Hope, gcc-patches, aph, Alexey Kravets
[-- Attachment #1: Type: text/plain, Size: 2056 bytes --]
Hi,
I have replaced "expand_mult" with "expand_widening_mult" and removed
all direct references to the DImode and SImode modes in the
expand_mult_highpart_optab function. The attached patch was tested on
arm-7l, mips-32r2 (74k), and i686 with no new regressions. Richard, do
you think it is ready now?
thanks, Dinar.
On Tue, May 22, 2012 at 7:45 PM, Richard Henderson <rth@redhat.com> wrote:
> On 05/22/12 07:05, Dinar Temirbulatov wrote:
>> + if ((size - 1 > BITS_PER_WORD
>> + && BITS_PER_WORD == 32 && mode == DImode)
>
> Do note that this patch will not go in with hard-coded
> SImode and DImode references.
>
> Which, really, you do not even need.
>
> && GET_MODE_BITSIZE (mode) == 2*BITS_PER_WORD
>
> is what you wanted for testing for double-word-ness,
> and word_mode is a good substitute for SImode here.
>
> + /* Splitting the 64-bit constant for the higher and the lower parts. */
> + y0 = gen_rtx_CONST_INT (DImode, d&UINT32_MAX);
> + y1 = gen_rtx_CONST_INT (DImode, d>>32);
>
> Use gen_int_mode.
>
>> + x1 = convert_to_mode (DImode, x1, 1);
>> + x0 = convert_to_mode (DImode, x0, 1);
>> +
>> + /* Splitting the 64-bit constant for the higher and the lower parts. */
>> + y0 = gen_rtx_CONST_INT (DImode, d&UINT32_MAX);
>> + y1 = gen_rtx_CONST_INT (DImode, d>>32);
>> +
>> + z2 = gen_reg_rtx (DImode);
>> + u0 = gen_reg_rtx (DImode);
>> +
>> + /* Unsigned multiplication of the higher multiplier part
>> + and the higher constant part. */
>> + z2 = expand_mult(DImode, x1, y1, z2, 1);
>> + /* Unsigned multiplication of the lower multiplier part
>> + and the higher constant part. */
>> + u0 = expand_mult(DImode, x0, y1, u0, 1);
>
> I'm fairly sure you really want to be using expand_widening_mult
> here, rather than using convert_to_mode first. While combine may
> be able to re-generate a widening multiply out of this sequence,
> there's no sense making it work too hard.
>
>
>
> r~
[-- Attachment #2: ChangeLog.txt --]
[-- Type: text/plain, Size: 503 bytes --]
2012-05-25 Dinar Temirbulatov <dtemirbulatov@gmail.com>
Alexey Kravets <mr.kayrick@gmail.com>
* config/arm/arm.c (arm_rtx_costs_1): Add cost estimate for the integer
double-word division operation.
* config/mips/mips.c (mips_rtx_costs): Extend cost estimate for the integer
double-word division operation for 32-bit targets.
* expmed.c (expand_mult_highpart_optab): Allow generating the high part
of the multiplication product for unsigned double-word integers using
32-bit wide registers.
[-- Attachment #3: 22.patch --]
[-- Type: application/octet-stream, Size: 5697 bytes --]
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 2cecf45..9d6983b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7131,6 +7131,8 @@ arm_rtx_costs_1 (rtx x, enum rtx_code outer, int* total, bool speed)
*total = COSTS_N_INSNS (2);
else if (TARGET_HARD_FLOAT && mode == DFmode && !TARGET_VFP_SINGLE)
*total = COSTS_N_INSNS (4);
+ else if (mode == DImode)
+ *total = COSTS_N_INSNS (50);
else
*total = COSTS_N_INSNS (20);
return false;
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index d48a465..b5627c2 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -3846,7 +3846,13 @@ mips_rtx_costs (rtx x, int code, int outer_code, int opno ATTRIBUTE_UNUSED,
*total = COSTS_N_INSNS (mips_idiv_insns ());
}
else if (mode == DImode)
- *total = mips_cost->int_div_di;
+ {
+ if (!TARGET_64BIT)
+ /* Divide double integer library call is expensive. */
+ *total = COSTS_N_INSNS (200);
+ else
+ *total = mips_cost->int_div_di;
+ }
else
*total = mips_cost->int_div_si;
return false;
diff --git a/gcc/expmed.c b/gcc/expmed.c
index aa24fbf..5f4c921 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -3523,6 +3523,105 @@ expand_mult_highpart_optab (enum machine_mode mode, rtx op0, rtx op1,
}
}
+ if (unsignedp && (!optimize_size && (optimize > 1))
+ && (size - 1 > BITS_PER_WORD
+ && BITS_PER_WORD == 32 && GET_MODE_BITSIZE (mode) == 2*BITS_PER_WORD)
+ && (4 * mul_cost[speed][mode] + 4 * add_cost[speed][mode]
+ + shift_cost[speed][mode][31] < max_cost))
+ {
+ unsigned HOST_WIDE_INT d;
+ rtx x1, x0, y1, y0, z2, z0, tmp, u0, u0tmp, u1, c, c1, ccst, cres, result;
+
+ d = (INTVAL (op1) & GET_MODE_MASK (mode));
+
+ /* Extracting the higher part of the 64-bit multiplier. */
+ x1 = gen_highpart (word_mode, op0);
+ x1 = force_reg (word_mode, x1);
+
+ /* Extracting the lower part of the 64-bit multiplier. */
+ x0 = gen_lowpart (word_mode, op0);
+ x0 = force_reg (word_mode, x0);
+
+ /* Splitting the 64-bit constant for the higher and the lower parts. */
+ y0 = gen_int_mode (d & UINT32_MAX, word_mode);
+ y1 = gen_int_mode (d >> 32, word_mode);
+
+ z2 = gen_reg_rtx (mode);
+ u0 = gen_reg_rtx (mode);
+
+ /* Unsigned multiplication of the higher multiplier part
+ and the higher constant part. */
+ z2 = expand_widening_mult (mode, x1, y1, z2, 1, umul_widen_optab);
+ /* Unsigned multiplication of the lower multiplier part
+ and the higher constant part. */
+ u0 = expand_widening_mult (mode, x0, y1, u0, 1, umul_widen_optab);
+
+ z0 = gen_reg_rtx (mode);
+ u1 = gen_reg_rtx (mode);
+
+ /* Unsigned multiplication of the lower multiplier part
+ and the lower constant part. */
+ z0 = expand_widening_mult (mode, x0, y0, z0, 1, umul_widen_optab);
+
+ /* Unsigned multiplication of the higher multiplier part
+ and the lower constant part. */
+ u1 = expand_widening_mult (mode, x1, y0, u1, 1, umul_widen_optab);
+
+ /* Getting the higher part of multiplication between the lower multiplier
+ part and the lower constant part, the lower part is not interesting
+ for the final result. */
+ u0tmp = gen_highpart (word_mode, z0);
+ u0tmp = force_reg (word_mode, u0tmp);
+ u0tmp = convert_to_mode (mode, u0tmp, 1);
+
+ /* Adding the higher part of multiplication between the lower multiplier
+ part and the lower constant part to the result of multiplication between
+ the lower multiplier part and the higher constant part. Please note
+ that we couldn't get overflow here since in the worst case
+ (0xffffffff*0xffffffff)+0xffffffff we get 0xffffffff00000000L. */
+ expand_inc (u0, u0tmp);
+ tmp = gen_reg_rtx (mode);
+
+ /* Adding multiplication between the lower multiplier part and the higher
+ constant part with the higher part of multiplication between the lower
+ multiplier part and the lower constant part to the result of multiplication
+ between the higher multiplier part and the lower constant part. */
+ tmp = expand_binop (mode, add_optab, u0, u1, tmp, 1, OPTAB_LIB_WIDEN);
+ if (!tmp)
+ return 0;
+
+ /* Checking for overflow. */
+ c = gen_reg_rtx (mode);
+ c1 = gen_reg_rtx (mode);
+ cres = gen_reg_rtx (mode);
+
+ emit_store_flag_force (c, GT, u0, tmp, mode, 1, 1);
+ emit_store_flag_force (c1, GT, u1, tmp, mode, 1, 1);
+ result = expand_binop (mode, ior_optab, c, c1, cres, 1, OPTAB_LIB_WIDEN);
+ if (!result)
+ return 0;
+
+ ccst = gen_reg_rtx (mode);
+ ccst = expand_shift (LSHIFT_EXPR, mode, cres, 32, ccst, 1);
+
+ /* Adding 0x10000000 in case of overflow to the result of multiplication
+ between the higher multiplier part and the higher constant part. Please note,
+ that we don't have to check for overflow here because in the worst case
+ (0xffffffff*0xffffffff) + 0x100000000 equals to 0xffffffff00000001L. */
+ expand_inc (z2, ccst);
+
+ /* Extracting the higher part of the sum. */
+ tmp = gen_highpart (word_mode, tmp);
+ tmp = force_reg (word_mode, tmp);
+ tmp = convert_to_mode (mode, tmp, 1);
+
+ /* The final result, again we don't have to check for overflow here. */
+ expand_inc (z2, tmp);
+
+ return z2;
+
+ }
+
/* Try widening multiplication of opposite signedness, and adjust. */
moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing
^ permalink raw reply [flat|nested] 19+ messages in thread
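For readers following the hunk above, the whole expansion can be modeled in plain C. This is only an illustrative sketch (the function name `umulhi64` and the use of 64-bit host arithmetic are assumptions, not part of the patch); it mirrors the patch's variable names, including the two ORed carry comparisons that a later review simplifies to one.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of the patch's expansion: the high 64 bits of an
   unsigned 64x64-bit product, built only from 32x32->64 widening
   multiplies, word extractions, and additions, the way a 32-bit
   target would.  Variable names follow the patch.  */
static uint64_t umulhi64 (uint64_t x, uint64_t y)
{
  uint64_t x0 = (uint32_t) x, x1 = x >> 32;   /* words of the multiplier */
  uint64_t y0 = (uint32_t) y, y1 = y >> 32;   /* words of the constant */

  uint64_t z2 = x1 * y1;        /* high * high */
  uint64_t u0 = x0 * y1;        /* low * high */
  uint64_t z0 = x0 * y0;        /* low * low */
  uint64_t u1 = x1 * y0;        /* high * low */

  uint64_t u0tmp = z0 >> 32;    /* middle word contributed by z0 */
  u0 += u0tmp;                  /* cannot carry: worst case is
                                   0xffffffff00000000 */

  uint64_t tmp = u0 + u1;       /* may carry out of 64 bits */
  uint64_t carry = (u0 > tmp) | (u1 > tmp);  /* the patch's ORed flags */
  z2 += carry << 32;

  return z2 + (tmp >> 32);      /* add the high word of the sum */
}
```

The model is easy to check against a 128-bit reference multiply, which is how the corner cases (for example all-ones times all-ones) can be exercised.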
* Re: divide 64-bit by constant for 32-bit target machines
2012-05-25 10:20 ` Dinar Temirbulatov
@ 2012-05-26 12:35 ` Paolo Bonzini
2012-05-26 12:46 ` Paolo Bonzini
2012-05-26 12:39 ` Paolo Bonzini
1 sibling, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2012-05-26 12:35 UTC (permalink / raw)
To: Dinar Temirbulatov
Cc: Richard Henderson, Richard Earnshaw, Michael Hope, gcc-patches,
aph, Alexey Kravets
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 2cecf45..9d6983b 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -7131,6 +7131,8 @@ arm_rtx_costs_1 (rtx x, enum rtx_code outer, int* total, bool speed)
> *total = COSTS_N_INSNS (2);
> else if (TARGET_HARD_FLOAT && mode == DFmode && !TARGET_VFP_SINGLE)
> *total = COSTS_N_INSNS (4);
> + else if (mode == DImode)
> + *total = COSTS_N_INSNS (50);
> else
> *total = COSTS_N_INSNS (20);
> return false;
> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index d48a465..b5627c2 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -3846,7 +3846,13 @@ mips_rtx_costs (rtx x, int code, int outer_code, int opno ATTRIBUTE_UNUSED,
> *total = COSTS_N_INSNS (mips_idiv_insns ());
> }
> else if (mode == DImode)
> - *total = mips_cost->int_div_di;
> + {
> + if (!TARGET_64BIT)
> + /* Divide double integer library call is expensive. */
> + *total = COSTS_N_INSNS (200);
> + else
> + *total = mips_cost->int_div_di;
> + }
> else
> *total = mips_cost->int_div_si;
> return false;
> diff --git a/gcc/expmed.c b/gcc/expmed.c
> index aa24fbf..5f4c921 100644
> --- a/gcc/expmed.c
> +++ b/gcc/expmed.c
> @@ -3523,6 +3523,105 @@ expand_mult_highpart_optab (enum machine_mode mode, rtx op0, rtx op1,
> }
> }
>
> + if (unsignedp && (!optimize_size && (optimize>1))
> + && (size - 1 > BITS_PER_WORD
> + && BITS_PER_WORD == 32 && GET_MODE_BITSIZE (mode) == 2*BITS_PER_WORD)
These references to 32-bits are still wrong (and unnecessary, just
remove them).
> + && (4 * mul_cost[speed][mode] + 4 * add_cost[speed][mode]
> + + shift_cost[speed][mode][31] < max_cost))
> + {
> + unsigned HOST_WIDE_INT d;
> + rtx x1, x0, y1, y0, z2, z0, tmp, u0, u0tmp, u1, c, c1, ccst, cres, result;
> +
> + d = (INTVAL (op1) & GET_MODE_MASK (mode));
This could be a CONST_DOUBLE. But you don't need "d", because you can...
> + /* Extracting the higher part of the 64-bit multiplier. */
> + x1 = gen_highpart (word_mode, op0);
> + x1 = force_reg (word_mode, x1);
> +
> + /* Extracting the lower part of the 64-bit multiplier. */
> + x0 = gen_lowpart (word_mode, op0);
> + x0 = force_reg (word_mode, x0);
> +
> + /* Splitting the 64-bit constant for the higher and the lower parts. */
> + y0 = gen_int_mode(d & UINT32_MAX, word_mode);
> + y1 = gen_int_mode(d >> 32, word_mode);
... use gen_lowpart and gen_highpart directly on op1.
> +
> + z2 = gen_reg_rtx (mode);
> + u0 = gen_reg_rtx (mode);
> +
> + /* Unsigned multiplication of the higher multiplier part
> + and the higher constant part. */
> + z2 = expand_widening_mult (mode, x1, y1, z2, 1, umul_widen_optab);
> + /* Unsigned multiplication of the lower multiplier part
> + and the higher constant part. */
> + u0 = expand_widening_mult (mode, x0, y1, u0, 1, umul_widen_optab);
> +
> + z0 = gen_reg_rtx (mode);
> + u1 = gen_reg_rtx (mode);
> +
> + /* Unsigned multiplication of the lower multiplier part
> + and the lower constant part. */
> + z0 = expand_widening_mult (mode, x0, y0, z0, 1, umul_widen_optab);
> +
> + /* Unsigned multiplication of the higher multiplier part
> + the lower constant part. */
> + u1 = expand_widening_mult (mode, x1, y0, u1, 1, umul_widen_optab);
Up to here the comments are not necessary.
> + /* Getting the higher part of multiplication between the lower multiplier
> + part and the lower constant part, the lower part is not interesting
> + for the final result. */
> + u0tmp = gen_highpart (word_mode, z0);
> + u0tmp = force_reg (word_mode, u0tmp);
> + u0tmp = convert_to_mode (mode, u0tmp, 1);
> +
> + /* Adding the higher part of multiplication between the lower multiplier
> + part and the lower constant part to the result of multiplication between
> + the lower multiplier part and the higher constant part. Please note,
> + that we couldn't get overflow here since in the worst case
> + (0xffffffff*0xffffffff)+0xffffffff we get 0xffffffff00000000L. */
The comment can simply be "compute the middle word of the three-word
intermediate result." Also it's not overflow, it's carry.
> + expand_inc (u0, u0tmp);
> + tmp = gen_reg_rtx (mode);
> +
> + /* Adding multiplication between the lower multiplier part and the higher
> + constant part with the higher part of multiplication between the lower
> + multiplier part and the lower constant part to the result of multiplication
> + between the higher multiplier part and the lower constant part. */
Here you have to explain:
/* We have to return
z2 + ((u0 + u1) >> GET_MODE_BITSIZE (word_mode)).
u0 + u1 are the upper two words of the three-word
intermediate result and they could have up to
2 * GET_MODE_BITSIZE (word_mode) + 1 bits of precision.
We compute the extra bit by checking for carry, and add
1 << GET_MODE_BITSIZE (word_mode) to z2 if there is carry. */
> + tmp = expand_binop (mode, add_optab, u0, u1, tmp, 1, OPTAB_LIB_WIDEN);
> + if (!tmp)
> + return 0;
/* We have to return z2 + (tmp >> 32). We need
> + /* Checking for overflow. */
This is not overflow, it's carry (see above).
> + c = gen_reg_rtx (mode);
> + c1 = gen_reg_rtx (mode);
> + cres = gen_reg_rtx (mode);
> +
> + emit_store_flag_force (c, GT, u0, tmp, mode, 1, 1);
> + emit_store_flag_force (c1, GT, u1, tmp, mode, 1, 1);
> + result = expand_binop (mode, ior_optab, c, c1, cres, 1, OPTAB_LIB_WIDEN);
> + if (!result)
> + return 0;
> +
> + ccst = gen_reg_rtx (mode);
> + ccst = expand_shift (LSHIFT_EXPR, mode, cres, 32, ccst, 1);
This 32 should be GET_MODE_BITSIZE (word_mode).
> +
> + /* Adding 0x10000000 in case of overflow to the result of multiplication
One 0 missing in the constant.
> + between the higher multiplier part and the higher constant part. Please note,
> + that we don't have to check for overflow here because in the worst case
> + (0xffffffff*0xffffffff) + 0x100000000 equals to 0xffffffff00000001L. */
Again, s/overflow/carry/.
> + expand_inc (z2, ccst);
> + /* Extracting the higher part of the sum. */
> + tmp = gen_highpart (word_mode, tmp);
> + tmp = force_reg (word_mode, tmp);
> + tmp = convert_to_mode (mode, tmp, 1);
> +
> + /* The final result, again we don't have to check for overflow here. */
> + expand_inc (z2, tmp);
> +
> + return z2;
> +
> + }
> +
> /* Try widening multiplication of opposite signedness, and adjust. */
> moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
> if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing
>
* Re: divide 64-bit by constant for 32-bit target machines
2012-05-25 10:20 ` Dinar Temirbulatov
2012-05-26 12:35 ` Paolo Bonzini
@ 2012-05-26 12:39 ` Paolo Bonzini
1 sibling, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2012-05-26 12:39 UTC (permalink / raw)
To: Dinar Temirbulatov
Cc: Richard Henderson, Richard Earnshaw, Michael Hope, gcc-patches,
aph, Alexey Kravets
Il 25/05/2012 12:20, Dinar Temirbulatov ha scritto:
> + emit_store_flag_force (c, GT, u0, tmp, mode, 1, 1);
> + emit_store_flag_force (c1, GT, u1, tmp, mode, 1, 1);
> + result = expand_binop (mode, ior_optab, c, c1, cres, 1, OPTAB_LIB_WIDEN);
> + if (!result)
> + return 0;
Ah, you don't need the or. u0 < tmp is already giving the overflow.
Paolo
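Paolo's observation can be checked in plain C (an illustrative sketch, not code from the patch): after an unsigned add that wraps, the sum is smaller than either addend, so one comparison against a single addend already yields the carry bit.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 64-bit add returning the carry-out through *carry.
   If the sum wrapped modulo 2^64, it is smaller than both addends,
   so comparing against either one of them is sufficient -- the OR of
   two comparisons in the original patch is redundant.  */
static uint64_t add_carry (uint64_t a, uint64_t b, unsigned *carry)
{
  uint64_t sum = a + b;
  *carry = sum < a;     /* equivalently: sum < b */
  return sum;
}
```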
* Re: divide 64-bit by constant for 32-bit target machines
2012-05-26 12:35 ` Paolo Bonzini
@ 2012-05-26 12:46 ` Paolo Bonzini
2012-06-07 10:21 ` Dinar Temirbulatov
0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2012-05-26 12:46 UTC (permalink / raw)
Cc: Dinar Temirbulatov, Richard Henderson, Richard Earnshaw,
Michael Hope, gcc-patches, aph, Alexey Kravets
Il 26/05/2012 14:35, Paolo Bonzini ha scritto:
> /* We have to return
>
> z2 + ((u0 + u1) >> GET_MODE_BITSIZE (word_mode)).
>
> u0 + u1 are the upper two words of the three-word
> intermediate result and they could have up to
> 2 * GET_MODE_BITSIZE (word_mode) + 1 bits of precision.
> We compute the extra bit by checking for carry, and add
> 1 << GET_MODE_BITSIZE (word_mode) to z2 if there is carry. */
Oops, GET_MODE_BITSIZE (word_mode) is more concisely BITS_PER_WORD.
>> > + tmp = expand_binop (mode, add_optab, u0, u1, tmp, 1, OPTAB_LIB_WIDEN);
>> > + if (!tmp)
>> > + return 0;
> /* We have to return z2 + (tmp >> 32). We need
>> > + /* Checking for overflow. */
> This is not overflow, it's carry (see above).
>
>> > + c = gen_reg_rtx (mode);
>> > + c1 = gen_reg_rtx (mode);
>> > + cres = gen_reg_rtx (mode);
>> > +
>> > + emit_store_flag_force (c, GT, u0, tmp, mode, 1, 1);
>> > + emit_store_flag_force (c1, GT, u1, tmp, mode, 1, 1);
>> > + result = expand_binop (mode, ior_optab, c, c1, cres, 1, OPTAB_LIB_WIDEN);
>> > + if (!result)
>> > + return 0;
>> > +
>> > + ccst = gen_reg_rtx (mode);
>> > + ccst = expand_shift (LSHIFT_EXPR, mode, cres, 32, ccst, 1);
> This 32 should be GET_MODE_BITSIZE (word_mode).
Here, too.
Paolo
* Re: divide 64-bit by constant for 32-bit target machines
2012-05-26 12:46 ` Paolo Bonzini
@ 2012-06-07 10:21 ` Dinar Temirbulatov
2012-06-07 10:43 ` Dinar Temirbulatov
0 siblings, 1 reply; 19+ messages in thread
From: Dinar Temirbulatov @ 2012-06-07 10:21 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Richard Henderson, Richard Earnshaw, Michael Hope, gcc-patches,
aph, Alexey Kravets
[-- Attachment #1: Type: text/plain, Size: 1735 bytes --]
Hi,
Here is a new version of the patch based on Paolo's review, again tested on
arm-7l, mips-32r2 (74k), and i686 with no new regressions.
thanks, Dinar.
On Sat, May 26, 2012 at 4:45 PM, Paolo Bonzini <bonzini@gnu.org> wrote:
> Il 26/05/2012 14:35, Paolo Bonzini ha scritto:
>> /* We have to return
>>
>> z2 + ((u0 + u1) >> GET_MODE_BITSIZE (word_mode)).
>>
>> u0 + u1 are the upper two words of the three-word
>> intermediate result and they could have up to
>> 2 * GET_MODE_BITSIZE (word_mode) + 1 bits of precision.
>> We compute the extra bit by checking for carry, and add
>> 1 << GET_MODE_BITSIZE (word_mode) to z2 if there is carry. */
>
> Oops, GET_MODE_BITSIZE (word_mode) is more concisely BITS_PER_WORD.
>
>>> > + tmp = expand_binop (mode, add_optab, u0, u1, tmp, 1, OPTAB_LIB_WIDEN);
>>> > + if (!tmp)
>>> > + return 0;
>> /* We have to return z2 + (tmp >> 32). We need
>>> > + /* Checking for overflow. */
>> This is not overflow, it's carry (see above).
>>
>>> > + c = gen_reg_rtx (mode);
>>> > + c1 = gen_reg_rtx (mode);
>>> > + cres = gen_reg_rtx (mode);
>>> > +
>>> > + emit_store_flag_force (c, GT, u0, tmp, mode, 1, 1);
>>> > + emit_store_flag_force (c1, GT, u1, tmp, mode, 1, 1);
>>> > + result = expand_binop (mode, ior_optab, c, c1, cres, 1, OPTAB_LIB_WIDEN);
>>> > + if (!result)
>>> > + return 0;
>>> > +
>>> > + ccst = gen_reg_rtx (mode);
>>> > + ccst = expand_shift (LSHIFT_EXPR, mode, cres, 32, ccst, 1);
>> This 32 should be GET_MODE_BITSIZE (word_mode).
>
> Here, too.
>
> Paolo
>
[-- Attachment #2: ChangeLog.txt --]
[-- Type: text/plain, Size: 503 bytes --]
2012-06-07 Dinar Temirbulatov <dtemirbulatov@gmail.com>
Alexey Kravets <mr.kayrick@gmail.com>
* config/arm/arm.c (arm_rtx_costs_1): Add cost estimate for the integer
double-word division operation.
* config/mips/mips.c (mips_rtx_costs): Extend cost estimate for the integer
double-word division operation for 32-bit targets.
* gcc/expmed.c (expand_mult_highpart_optab): Allow generating the higher multiplication
product for unsigned double-word integers using 32-bit wide registers.
[-- Attachment #3: 28.patch --]
[-- Type: application/octet-stream, Size: 4181 bytes --]
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 8a86227..0f8120f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7130,6 +7130,8 @@ arm_rtx_costs_1 (rtx x, enum rtx_code outer, int* total, bool speed)
*total = COSTS_N_INSNS (2);
else if (TARGET_HARD_FLOAT && mode == DFmode && !TARGET_VFP_SINGLE)
*total = COSTS_N_INSNS (4);
+ else if (mode == DImode)
+ *total = COSTS_N_INSNS (50);
else
*total = COSTS_N_INSNS (20);
return false;
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 5bcb7a8..57bb4cc 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -3879,8 +3879,13 @@ mips_rtx_costs (rtx x, int code, int outer_code, int opno ATTRIBUTE_UNUSED,
}
*total = COSTS_N_INSNS (mips_idiv_insns ());
}
- else if (mode == DImode)
+ else if (mode == DImode) {
+ if (!TARGET_64BIT)
+ /* divide double integer libcall is expensive. */
+ *total = COSTS_N_INSNS (200);
+ else
*total = mips_cost->int_div_di;
+ }
else
*total = mips_cost->int_div_si;
return false;
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 98f7c09..5108df9 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -3539,6 +3539,84 @@ expand_mult_highpart_optab (enum machine_mode mode, rtx op0, rtx op1,
}
}
+ if (unsignedp
+ && size - 1 > BITS_PER_WORD
+ && (!optimize_size && (optimize>1))
+ && (4 * mul_cost[speed][mode] + 4 * add_cost[speed][mode]
+ + shift_cost[speed][mode][31] < max_cost))
+ {
+ rtx x1, x0, y1, y0, z2, z0, tmp, u0, u0tmp, u1, carry, carry_result, result;
+
+ /* Extracting the higher part of the 64-bit multiplier. */
+ x1 = gen_highpart (word_mode, op0);
+ x1 = force_reg (word_mode, x1);
+
+ /* Extracting the lower part of the 64-bit multiplier. */
+ x0 = gen_lowpart (word_mode, op0);
+ x0 = force_reg (word_mode, x0);
+
+ /* Splitting the 64-bit constant for the higher and the lower parts. */
+ y0 = gen_lowpart (word_mode, op1);
+ y0 = force_reg (word_mode, y0);
+ y1 = gen_highpart_mode (word_mode, mode, op1);
+
+ z2 = gen_reg_rtx (mode);
+ u0 = gen_reg_rtx (mode);
+
+ z2 = expand_widening_mult (mode, x1, y1, z2, 1, umul_widen_optab);
+
+ u0 = expand_widening_mult (mode, x0, y1, u0, 1, umul_widen_optab);
+
+ z0 = gen_reg_rtx (mode);
+ u1 = gen_reg_rtx (mode);
+
+ z0 = expand_widening_mult (mode, x0, y0, z0, 1, umul_widen_optab);
+
+ u1 = expand_widening_mult (mode, x1, y0, u1, 1, umul_widen_optab);
+
+ /* Compute the middle word of the three-word intermediate result. */
+ u0tmp = gen_highpart (word_mode, z0);
+ u0tmp = force_reg (word_mode, u0tmp);
+ u0tmp = convert_to_mode (mode, u0tmp, 1);
+
+ /* We have to return
+ z2 + ((u0 + u1) >> BITS_PER_WORD)
+ u0 + u1 are the upper two words of the three-word
+ intermediate result and they could have up to
+ 2 * BITS_PER_WORD + 1 bits of precision.
+ We compute the extra bit by checking for carry, and add
+ 1 << BITS_PER_WORD to z2 if there is carry. */
+
+ expand_inc (u0, u0tmp);
+ tmp = gen_reg_rtx (mode);
+
+ tmp = expand_binop (mode, add_optab, u0, u1, tmp, 1, OPTAB_LIB_WIDEN);
+ if (!tmp)
+ return 0;
+
+ /* Checking for carry here. */
+ carry = gen_reg_rtx (mode);
+
+ emit_store_flag_force (carry, GT, u0, tmp, mode, 1, 1);
+
+ carry_result = gen_reg_rtx (mode);
+ carry_result = expand_shift (LSHIFT_EXPR, mode, carry, BITS_PER_WORD, carry_result, 1);
+
+ /* Adding 0x100000000 as carry here if required. */
+ expand_inc (z2, carry_result);
+
+ /* Extracting the higher part of the sum. */
+ tmp = gen_highpart (word_mode, tmp);
+ tmp = force_reg (word_mode, tmp);
+ tmp = convert_to_mode (mode, tmp, 1);
+
+ /* The final result */
+ expand_inc (z2, tmp);
+
+ return z2;
+
+ }
+
/* Try widening multiplication of opposite signedness, and adjust. */
moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing
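For context on what this high-part multiply buys: division by an invariant constant is expanded as a multiply-high followed by a shift. The sketch below is illustrative only; the magic constant shown is the well-known one for dividing by 10 and does not come from the patch (the compiler derives a constant and shift per divisor).

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 64-bit division by the constant 10, rewritten as a
   multiply-high plus a shift.  0xCCCCCCCCCCCCCCCD is ceil(2^67 / 10);
   taking the high 64 bits of the 128-bit product and shifting right
   by 3 gives floor(n / 10) for every 64-bit n.  */
static uint64_t udiv10 (uint64_t n)
{
  unsigned __int128 p = (unsigned __int128) n * 0xCCCCCCCCCCCCCCCDULL;
  return (uint64_t) (p >> 64) >> 3;
}
```

On a 32-bit target the `p >> 64` step is exactly the double-word high-part multiply this thread implements.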
* Re: divide 64-bit by constant for 32-bit target machines
2012-06-07 10:21 ` Dinar Temirbulatov
@ 2012-06-07 10:43 ` Dinar Temirbulatov
2012-06-07 14:36 ` Paolo Bonzini
0 siblings, 1 reply; 19+ messages in thread
From: Dinar Temirbulatov @ 2012-06-07 10:43 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Richard Henderson, Richard Earnshaw, Michael Hope, gcc-patches,
aph, Alexey Kravets
[-- Attachment #1: Type: text/plain, Size: 1968 bytes --]
Oh, I found a typo in a comment at the end of the patch; fixed.
thanks, Dinar.
On Thu, Jun 7, 2012 at 2:14 PM, Dinar Temirbulatov
<dtemirbulatov@gmail.com> wrote:
> Hi,
> Here is a new version of the patch based on Paolo's review, again tested on
> arm-7l, mips-32r2 (74k), and i686 with no new regressions.
> thanks, Dinar.
>
> On Sat, May 26, 2012 at 4:45 PM, Paolo Bonzini <bonzini@gnu.org> wrote:
>> Il 26/05/2012 14:35, Paolo Bonzini ha scritto:
>>> /* We have to return
>>>
>>> z2 + ((u0 + u1) >> GET_MODE_BITSIZE (word_mode)).
>>>
>>> u0 + u1 are the upper two words of the three-word
>>> intermediate result and they could have up to
>>> 2 * GET_MODE_BITSIZE (word_mode) + 1 bits of precision.
>>> We compute the extra bit by checking for carry, and add
>>> 1 << GET_MODE_BITSIZE (word_mode) to z2 if there is carry. */
>>
>> Oops, GET_MODE_BITSIZE (word_mode) is more concisely BITS_PER_WORD.
>>
>>>> > + tmp = expand_binop (mode, add_optab, u0, u1, tmp, 1, OPTAB_LIB_WIDEN);
>>>> > + if (!tmp)
>>>> > + return 0;
>>> /* We have to return z2 + (tmp >> 32). We need
>>>> > + /* Checking for overflow. */
>>> This is not overflow, it's carry (see above).
>>>
>>>> > + c = gen_reg_rtx (mode);
>>>> > + c1 = gen_reg_rtx (mode);
>>>> > + cres = gen_reg_rtx (mode);
>>>> > +
>>>> > + emit_store_flag_force (c, GT, u0, tmp, mode, 1, 1);
>>>> > + emit_store_flag_force (c1, GT, u1, tmp, mode, 1, 1);
>>>> > + result = expand_binop (mode, ior_optab, c, c1, cres, 1, OPTAB_LIB_WIDEN);
>>>> > + if (!result)
>>>> > + return 0;
>>>> > +
>>>> > + ccst = gen_reg_rtx (mode);
>>>> > + ccst = expand_shift (LSHIFT_EXPR, mode, cres, 32, ccst, 1);
>>> This 32 should be GET_MODE_BITSIZE (word_mode).
>>
>> Here, too.
>>
>> Paolo
>>
[-- Attachment #2: 28.patch --]
[-- Type: application/octet-stream, Size: 4183 bytes --]
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 8a86227..0f8120f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7130,6 +7130,8 @@ arm_rtx_costs_1 (rtx x, enum rtx_code outer, int* total, bool speed)
*total = COSTS_N_INSNS (2);
else if (TARGET_HARD_FLOAT && mode == DFmode && !TARGET_VFP_SINGLE)
*total = COSTS_N_INSNS (4);
+ else if (mode == DImode)
+ *total = COSTS_N_INSNS (50);
else
*total = COSTS_N_INSNS (20);
return false;
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 5bcb7a8..57bb4cc 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -3879,8 +3879,13 @@ mips_rtx_costs (rtx x, int code, int outer_code, int opno ATTRIBUTE_UNUSED,
}
*total = COSTS_N_INSNS (mips_idiv_insns ());
}
- else if (mode == DImode)
+ else if (mode == DImode) {
+ if (!TARGET_64BIT)
+ /* divide double integer libcall is expensive. */
+ *total = COSTS_N_INSNS (200);
+ else
*total = mips_cost->int_div_di;
+ }
else
*total = mips_cost->int_div_si;
return false;
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 98f7c09..bb4d7cd 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -3539,6 +3539,84 @@ expand_mult_highpart_optab (enum machine_mode mode, rtx op0, rtx op1,
}
}
+ if (unsignedp
+ && size - 1 > BITS_PER_WORD
+ && (!optimize_size && (optimize>1))
+ && (4 * mul_cost[speed][mode] + 4 * add_cost[speed][mode]
+ + shift_cost[speed][mode][31] < max_cost))
+ {
+ rtx x1, x0, y1, y0, z2, z0, tmp, u0, u0tmp, u1, carry, carry_result, result;
+
+ /* Extracting the higher part of the 64-bit multiplier. */
+ x1 = gen_highpart (word_mode, op0);
+ x1 = force_reg (word_mode, x1);
+
+ /* Extracting the lower part of the 64-bit multiplier. */
+ x0 = gen_lowpart (word_mode, op0);
+ x0 = force_reg (word_mode, x0);
+
+ /* Splitting the 64-bit constant for the higher and the lower parts. */
+ y0 = gen_lowpart (word_mode, op1);
+ y0 = force_reg (word_mode, y0);
+ y1 = gen_highpart_mode (word_mode, mode, op1);
+
+ z2 = gen_reg_rtx (mode);
+ u0 = gen_reg_rtx (mode);
+
+ z2 = expand_widening_mult (mode, x1, y1, z2, 1, umul_widen_optab);
+
+ u0 = expand_widening_mult (mode, x0, y1, u0, 1, umul_widen_optab);
+
+ z0 = gen_reg_rtx (mode);
+ u1 = gen_reg_rtx (mode);
+
+ z0 = expand_widening_mult (mode, x0, y0, z0, 1, umul_widen_optab);
+
+ u1 = expand_widening_mult (mode, x1, y0, u1, 1, umul_widen_optab);
+
+ /* Compute the middle word of the three-word intermediate result. */
+ u0tmp = gen_highpart (word_mode, z0);
+ u0tmp = force_reg (word_mode, u0tmp);
+ u0tmp = convert_to_mode (mode, u0tmp, 1);
+
+ /* We have to return
+ z2 + ((u0 + u1) >> BITS_PER_WORD)
+ u0 + u1 are the upper two words of the three-word
+ intermediate result and they could have up to
+ 2 * BITS_PER_WORD + 1 bits of precision.
+ We compute the extra bit by checking for carry, and add
+ 1 << BITS_PER_WORD to z2 if there is carry. */
+
+ expand_inc (u0, u0tmp);
+ tmp = gen_reg_rtx (mode);
+
+ tmp = expand_binop (mode, add_optab, u0, u1, tmp, 1, OPTAB_LIB_WIDEN);
+ if (!tmp)
+ return 0;
+
+ /* Checking for carry here. */
+ carry = gen_reg_rtx (mode);
+
+ emit_store_flag_force (carry, GT, u0, tmp, mode, 1, 1);
+
+ carry_result = gen_reg_rtx (mode);
+ carry_result = expand_shift (LSHIFT_EXPR, mode, carry, BITS_PER_WORD, carry_result, 1);
+
+ /* Adding 0x100000000 as carry here if required. */
+ expand_inc (z2, carry_result);
+
+ /* Extracting the higher part of the sum. */
+ tmp = gen_highpart (word_mode, tmp);
+ tmp = force_reg (word_mode, tmp);
+ tmp = convert_to_mode (mode, tmp, 1);
+
+ /* The final result. */
+ expand_inc (z2, tmp);
+
+ return z2;
+
+ }
+
/* Try widening multiplication of opposite signedness, and adjust. */
moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing
* Re: divide 64-bit by constant for 32-bit target machines
2012-06-07 10:43 ` Dinar Temirbulatov
@ 2012-06-07 14:36 ` Paolo Bonzini
2012-06-08 18:37 ` Dinar Temirbulatov
0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2012-06-07 14:36 UTC (permalink / raw)
To: Dinar Temirbulatov
Cc: Richard Henderson, Richard Earnshaw, Michael Hope, gcc-patches,
aph, Alexey Kravets
Il 07/06/2012 12:21, Dinar Temirbulatov ha scritto:
> Oh, I found a typo in a comment at the end of the patch; fixed.
Great improvement, thanks!
Unfortunately we're not there yet, but much closer! I could understand
the new code much better so I suggest some more improvements below, both
to the comments and to the code generation.
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 8a86227..0f8120f 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -7130,6 +7130,8 @@ arm_rtx_costs_1 (rtx x, enum rtx_code outer, int* total, bool speed)
> *total = COSTS_N_INSNS (2);
> else if (TARGET_HARD_FLOAT && mode == DFmode && !TARGET_VFP_SINGLE)
> *total = COSTS_N_INSNS (4);
> + else if (mode == DImode)
> + *total = COSTS_N_INSNS (50);
> else
> *total = COSTS_N_INSNS (20);
> return false;
> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index 5bcb7a8..57bb4cc 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -3879,8 +3879,13 @@ mips_rtx_costs (rtx x, int code, int outer_code, int opno ATTRIBUTE_UNUSED,
> }
> *total = COSTS_N_INSNS (mips_idiv_insns ());
> }
> - else if (mode == DImode)
> + else if (mode == DImode) {
> + if (!TARGET_64BIT)
> + /* divide double integer libcall is expensive. */
> + *total = COSTS_N_INSNS (200);
> + else
> *total = mips_cost->int_div_di;
> + }
> else
> *total = mips_cost->int_div_si;
> return false;
> diff --git a/gcc/expmed.c b/gcc/expmed.c
> index 98f7c09..bb4d7cd 100644
> --- a/gcc/expmed.c
> +++ b/gcc/expmed.c
> @@ -3539,6 +3539,84 @@ expand_mult_highpart_optab (enum machine_mode mode, rtx op0, rtx op1,
> }
> }
>
> + if (unsignedp
> + && size - 1 > BITS_PER_WORD
> + && (!optimize_size && (optimize>1))
Coding style: "(!optimize_size && optimize > 1)".
> + && (4 * mul_cost[speed][mode] + 4 * add_cost[speed][mode]
> + + shift_cost[speed][mode][31] < max_cost))
Thanks, this is now much cleaner and I could see other improvements.
This should be
3 * mul_widen_cost[speed][mode] + mul_highpart_cost[speed][mode] +
4 * add_cost[speed][mode] + add_cost[speed][word_mode]
That is because there is no shift really: a shift by 32 is simply moving
the operand to the higher word, and an add of that value will ignore the
lower word. Hence, summing carry_result is cheaper: that is
add_cost[speed][word_mode].
On the other hand you also have to consider the comparison emitted by
emit_store_flag_force, which will usually cost the same as an addition.
That is the fourth add_cost[speed][mode].
> + {
> + rtx x1, x0, y1, y0, z2, z0, tmp, u0, u0tmp, u1, carry, carry_result, result;
> + /* Extracting the higher part of the 64-bit multiplier. */
> + x1 = gen_highpart (word_mode, op0);
> + x1 = force_reg (word_mode, x1);
> +
> + /* Extracting the lower part of the 64-bit multiplier. */
> + x0 = gen_lowpart (word_mode, op0);
> + x0 = force_reg (word_mode, x0);
> +
> + /* Splitting the 64-bit constant for the higher and the lower parts. */
> + y0 = gen_lowpart (word_mode, op1);
> + y0 = force_reg (word_mode, y0);
> + y1 = gen_highpart_mode (word_mode, mode, op1);
> +
> + z2 = gen_reg_rtx (mode);
> + u0 = gen_reg_rtx (mode);
You do not need the gen_reg_rtx; just pass a NULL_RTX target to
expand_widening_mult.
> + z2 = expand_widening_mult (mode, x1, y1, z2, 1, umul_widen_optab);
> +
Remove the empty line. Also, let's rename the values to make it clear
which multiplication is which:
z2 -> u11
u0 -> u01
z0 -> u00
u1 -> u10
> + u0 = expand_widening_mult (mode, x0, y1, u0, 1, umul_widen_optab);
> +
> + z0 = gen_reg_rtx (mode);
> + u1 = gen_reg_rtx (mode);
gen_reg_rtx is not needed here too.
> + z0 = expand_widening_mult (mode, x0, y0, z0, 1, umul_widen_optab);
> +
And neither is this blank line.
> + u1 = expand_widening_mult (mode, x1, y0, u1, 1, umul_widen_optab);
> + /* Compute the middle word of the three-word intermediate result. */
^^^^^^
Oops, this is the low word, not the middle. But let's improve the
comment to explain the algorithm.
/* u00, u01, u10, u11 form a four-word value with the result
in the top two words, so we want to return this:
((u11 << 2*BITS_PER_WORD) +
((u01 + u10) << BITS_PER_WORD) +
u00) >> 2 * BITS_PER_WORD
We then rewrite it this way:
u11 + ((u01 + u10 + (u00 >> BITS_PER_WORD))
>> BITS_PER_WORD)
where the shifts are realized with gen_highpart and a
conversion back to the wider mode. */
> + u0tmp = gen_highpart (word_mode, z0);
> + u0tmp = force_reg (word_mode, u0tmp);
> + u0tmp = convert_to_mode (mode, u0tmp, 1);
u0tmp -> u00h
Put the expand_inc (u01, u00h); before the comment. The formula is
now above so we can say more simply:
/* Summing u01, u10 and u00h together could have up to
2 * BITS_PER_WORD + 1 bits of precision.
We compute the extra bit by checking for carry, and add
1 << BITS_PER_WORD to u11 if there is carry. */
> +
> + expand_inc (u0, u0tmp);
> + tmp = gen_reg_rtx (mode);
> + tmp = expand_binop (mode, add_optab, u0, u1, tmp, 1, OPTAB_LIB_WIDEN);
Now that you have a single emit_store_flag_force, you can avoid "tmp =
gen_reg_rtx (mode)" and just use
expand_inc (u01, u10);
> + if (!tmp)
> + return 0;
This cannot fail, you can remove the "if".
> +
> + /* Checking for carry here. */
> + carry = gen_reg_rtx (mode);
> + emit_store_flag_force (carry, GT, u0, tmp, mode, 1, 1);
Since above you will use u01 as the target, you have to use u10 instead
here:
carry = emit_store_flag_force (NULL_RTX, GT, u10, u01,
mode, 1, 1);
i.e. operand > result. That's a nice improvement, and should generate
optimal code like:
add r0, r4 ; r0:r1 += 0:r4 u0 += u0h
adc r1, 0
add r0, r2 ; r0:r1 += r2:r3
adc r1, r3
sub r2, r0 ; flags = r2:r3 CMP r0:r1
sbc r3, r1
it hi ; if r2:r3 > r0:r1
add r6, #1 ; ... r6:r7 += 1:0
add r6, r0 ; r6:r7 += r0:r1
adc r7, r1
for everything after the multiplications. This matches nicely the cost
estimation above.
> + carry_result = gen_reg_rtx (mode);
No need for this gen_reg_rtx, either, by passing a NULL_RTX target below.
> + carry_result = expand_shift (LSHIFT_EXPR, mode, carry, BITS_PER_WORD, carry_result, 1);
> +
> + /* Adding 0x100000000 as carry here if required. */
Oops, a remnant of 32-bit specific code.
/* Adding 1 << BITS_PER_WORD as carry here if required. */
> + expand_inc (z2, carry_result);
> +
> + /* Extracting the higher part of the sum. */
> + tmp = gen_highpart (word_mode, tmp);
> + tmp = force_reg (word_mode, tmp);
> + tmp = convert_to_mode (mode, tmp, 1);
And these will use u01 instead of tmp.
> + /* The final result. */
> + expand_inc (z2, tmp);
> + return z2;
> +
> + }
> +
> /* Try widening multiplication of opposite signedness, and adjust. */
> moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
> if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing
>
I hope you appreciate the improvements!
Paolo
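The net effect of the suggestions above can be summarized in a short C model (illustrative only; the u00/u01/u10/u11 names follow the proposed renaming, and the single comparison replaces the earlier ORed pair).

```c
#include <assert.h>
#include <stdint.h>

/* High 64 bits of an unsigned 64x64 product after the review:
   accumulate into u01 and detect carry with one comparison against
   the addend u10 (operand > result implies the add wrapped).  */
static uint64_t mulhi_u64 (uint64_t x, uint64_t y)
{
  uint64_t x0 = (uint32_t) x, x1 = x >> 32;
  uint64_t y0 = (uint32_t) y, y1 = y >> 32;

  uint64_t u11 = x1 * y1;
  uint64_t u01 = x0 * y1;
  uint64_t u00 = x0 * y0;
  uint64_t u10 = x1 * y0;

  u01 += u00 >> 32;             /* add the middle word of u00; no carry */
  u01 += u10;                   /* may wrap */
  if (u10 > u01)                /* single-comparison carry check */
    u11 += (uint64_t) 1 << 32;

  return u11 + (u01 >> 32);
}
```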
* Re: divide 64-bit by constant for 32-bit target machines
2012-06-07 14:36 ` Paolo Bonzini
@ 2012-06-08 18:37 ` Dinar Temirbulatov
2012-06-11 8:03 ` Paolo Bonzini
0 siblings, 1 reply; 19+ messages in thread
From: Dinar Temirbulatov @ 2012-06-08 18:37 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Richard Henderson, Richard Earnshaw, Michael Hope, gcc-patches,
aph, Alexey Kravets
[-- Attachment #1: Type: text/plain, Size: 2545 bytes --]
Hi, Paolo.
Here is the new version of the patch. So far I have tested it with the gcc
testsuite only on i686, with no new regressions; MIPS and ARM tests
are in progress.
One strange thing I noticed:
>
> No need for this gen_reg_rtx, either, by passing a NULL_RTX target below.
>
>> + carry_result = expand_shift (LSHIFT_EXPR, mode, carry, BITS_PER_WORD, carry_result, 1);
>> +
>> + /* Adding 0x100000000 as carry here if required. */
>
> Oops, a remnant of 32-bit specific code.
>
it turned out that I have to add a convert_to_mode () to DImode after
emit_store_flag_force (), since emit_store_flag_force () returns
"carry" in SImode; without the convert_to_mode () call the compiler
fails with this assertion:
Breakpoint 2, simplify_subreg (outermode=SImode, op=0x7ffff56cdf20,
innermode=DImode, byte=0) at
../../gcc-20120418-1/gcc/simplify-rtx.c:5423
5423 gcc_assert (GET_MODE (op) == innermode
(gdb) bt
#0 simplify_subreg (outermode=SImode, op=0x7ffff56cdf20,
innermode=DImode, byte=0) at
../../gcc-20120418-1/gcc/simplify-rtx.c:5423
#1 0x0000000000aea223 in simplify_gen_subreg (outermode=SImode,
op=0x7ffff56cdf20, innermode=DImode, byte=0) at
../../gcc-20120418-1/gcc/simplify-rtx.c:5763
#2 0x0000000000733c99 in operand_subword (op=0x7ffff56cdf20,
offset=0, validate_address=1, mode=DImode) at
../../gcc-20120418-1/gcc/emit-rtl.c:1427
#3 0x0000000000733cc6 in operand_subword_force (op=0x7ffff56cdf20,
offset=0, mode=DImode) at ../../gcc-20120418-1/gcc/emit-rtl.c:1440
#4 0x0000000000a016b3 in expand_binop (mode=DImode,
binoptab=0x195f580, op0=0x7ffff56cdf20, op1=0x7ffff583d670,
target=0x7ffff56cdfa0, unsignedp=1, methods=OPTAB_DIRECT)
at ../../gcc-20120418-1/gcc/optabs.c:1779
#5 0x00000000007525af in expand_shift_1 (code=LSHIFT_EXPR,
mode=DImode, shifted=0x7ffff56cdf20, amount=0x7ffff583d670,
target=0x0, unsignedp=1)
at ../../gcc-20120418-1/gcc/expmed.c:2273
#6 0x00000000007526b6 in expand_shift (code=LSHIFT_EXPR, mode=DImode,
shifted=0x7ffff56cdf20, amount=32, target=0x0, unsignedp=1) at
../../gcc-20120418-1/gcc/expmed.c:2318
#7 0x00000000007563e6 in expand_mult_highpart_optab (mode=DImode,
op0=0x7ffff56cdcc0, op1=0x7ffff56b1e00, target=0x0, unsignedp=1,
max_cost=188)
at ../../gcc-20120418-1/gcc/expmed.c:3581
#8 0x0000000000756747 in expand_mult_highpart (mode=DImode,
op0=0x7ffff56cdcc0, op1=0x7ffff56b1e00, target=0x0, unsignedp=1,
max_cost=188)
at ../../gcc-20120418-1/gcc/expmed.c:3654
thanks, Dinar.
[-- Attachment #2: 30.patch --]
[-- Type: application/octet-stream, Size: 4331 bytes --]
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 8a86227..0f8120f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7130,6 +7130,8 @@ arm_rtx_costs_1 (rtx x, enum rtx_code outer, int* total, bool speed)
*total = COSTS_N_INSNS (2);
else if (TARGET_HARD_FLOAT && mode == DFmode && !TARGET_VFP_SINGLE)
*total = COSTS_N_INSNS (4);
+ else if (mode == DImode)
+ *total = COSTS_N_INSNS (50);
else
*total = COSTS_N_INSNS (20);
return false;
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 5bcb7a8..57bb4cc 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -3879,8 +3879,13 @@ mips_rtx_costs (rtx x, int code, int outer_code, int opno ATTRIBUTE_UNUSED,
}
*total = COSTS_N_INSNS (mips_idiv_insns ());
}
- else if (mode == DImode)
+ else if (mode == DImode) {
+ if (!TARGET_64BIT)
+ /* The double-word integer division libcall is expensive.  */
+ *total = COSTS_N_INSNS (200);
+ else
*total = mips_cost->int_div_di;
+ }
else
*total = mips_cost->int_div_si;
return false;
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 98f7c09..51a0d9b 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -3539,6 +3539,78 @@ expand_mult_highpart_optab (enum machine_mode mode, rtx op0, rtx op1,
}
}
+ if (unsignedp
+ && size - 1 > BITS_PER_WORD
+ && (!optimize_size && optimize > 1)
+ && (3 * mul_widen_cost[speed][mode] + mul_highpart_cost[speed][mode] +
+ 4 * add_cost[speed][mode] + add_cost[speed][word_mode] < max_cost))
+ {
+ rtx x1, x0, y1, y0, u00, u01, u10, u11, u00h, carry, carry_result;
+
+ /* Extracting the higher part of the 64-bit multiplier. */
+ x1 = gen_highpart (word_mode, op0);
+ x1 = force_reg (word_mode, x1);
+
+ /* Extracting the lower part of the 64-bit multiplier. */
+ x0 = gen_lowpart (word_mode, op0);
+ x0 = force_reg (word_mode, x0);
+
+ /* Splitting the 64-bit constant for the higher and the lower parts. */
+ y0 = gen_lowpart (word_mode, op1);
+ y0 = force_reg (word_mode, y0);
+ y1 = gen_highpart_mode (word_mode, mode, op1);
+
+ u11 = expand_widening_mult (mode, x1, y1, NULL_RTX, 1, umul_widen_optab);
+ u01 = expand_widening_mult (mode, x0, y1, NULL_RTX, 1, umul_widen_optab);
+ u00 = expand_widening_mult (mode, x0, y0, NULL_RTX, 1, umul_widen_optab);
+ u10 = expand_widening_mult (mode, x1, y0, NULL_RTX, 1, umul_widen_optab);
+
+ /* u00, u01, u10, u11 form a four-word (2 * mode wide) product
+ whose high double word is the result, so we want to return:
+
+ ((u11 << 2*BITS_PER_WORD) +
+ ((u01 + u10) << BITS_PER_WORD) +
+ u00) >> 2 * BITS_PER_WORD
+
+ We then rewrite it this way:
+
+ u11 + ((u01 + u10 + (u00 >> BITS_PER_WORD))
+ >> BITS_PER_WORD)
+
+ where the shifts are realized with gen_highpart and a
+ conversion back to the wider mode. */
+ u00h = gen_highpart (word_mode, u00);
+ u00h = force_reg (word_mode, u00h);
+ u00h = convert_to_mode (mode, u00h, 1);
+ expand_inc (u01, u00h);
+
+ /* Summing u01, u10 and u00h together could have up to
+ 2 * BITS_PER_WORD + 1 bits of precision.
+ We compute the extra bit by checking for carry, and add
+ 1 << BITS_PER_WORD to u11 if there is carry. */
+ expand_inc (u01, u10);
+
+ /* Checking for carry here. */
+ carry = emit_store_flag_force (NULL_RTX, GT, u10, u01,
+ mode, 1, 1);
+ carry = convert_to_mode (mode, carry, 1);
+ carry_result = expand_shift (LSHIFT_EXPR, mode, carry, BITS_PER_WORD, NULL_RTX, 1);
+
+ /* Adding 1 << BITS_PER_WORD as carry here if required. */
+ expand_inc (u11, carry_result);
+
+ /* Extracting the higher part of the sum. */
+ u01 = gen_highpart (word_mode, u01);
+ u01 = force_reg (word_mode, u01);
+ u01 = convert_to_mode (mode, u01, 1);
+
+ /* The final result. */
+ expand_inc (u11, u01);
+
+ return u11;
+
+ }
+
/* Try widening multiplication of opposite signedness, and adjust. */
moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing
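For reference, the word-level scheme this hunk emits as RTL can be sketched in plain C. This is a hedged illustration, not the patch itself: the function name mulhi64 is invented here, and the variables mirror the patch's x1/x0, y1/y0, u00..u11.

```c
#include <assert.h>
#include <stdint.h>

/* Compute the high 64 bits of a 64x64-bit unsigned product using only
   32x32->64 widening multiplies and 64-bit adds, as the patch does with
   expand_widening_mult, expand_inc and emit_store_flag_force.  */
static uint64_t
mulhi64 (uint64_t x, uint64_t y)
{
  uint64_t x1 = x >> 32, x0 = (uint32_t) x;
  uint64_t y1 = y >> 32, y0 = (uint32_t) y;

  /* The four widening partial products.  */
  uint64_t u11 = x1 * y1;
  uint64_t u01 = x0 * y1;
  uint64_t u00 = x0 * y0;
  uint64_t u10 = x1 * y0;

  u01 += u00 >> 32;		/* add high word of u00; cannot overflow */
  u01 += u10;			/* may carry out of 64 bits... */
  if (u10 > u01)		/* ...detected exactly as the patch's GT test */
    u11 += (uint64_t) 1 << 32;

  return u11 + (u01 >> 32);	/* high word of the sum, added into u11 */
}
```

Divide-by-constant then reduces to this high-part multiply plus shifts; for example, with the standard magic constant for unsigned division by 10 (0xCCCCCCCCCCCCCCCD, post-shift 3, not a value taken from the patch), `mulhi64 (n, 0xCCCCCCCCCCCCCCCDULL) >> 3` equals `n / 10`.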
* Re: divide 64-bit by constant for 32-bit target machines
2012-06-08 18:37 ` Dinar Temirbulatov
@ 2012-06-11 8:03 ` Paolo Bonzini
[not found] ` <CAMnfPmOaL2x4yi0AYOG7KQbjugCM6J5WsAjYg9eY2mELEVfJTw@mail.gmail.com>
0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2012-06-11 8:03 UTC (permalink / raw)
To: Dinar Temirbulatov
Cc: Richard Henderson, Richard Earnshaw, Michael Hope, gcc-patches,
aph, Alexey Kravets
On 08/06/2012 20:13, Dinar Temirbulatov wrote:
> that I have to add convert_to_mode () to DImode after
> emit_store_flag_force (), since emit_store_flag_force () returns
> "carry" in SImode and without convert_to_mode () call compiler fails
> with this error:
Yes, that makes sense. The new patch looks ok to me. Just one
question, do you have proof that this:
> + /* u00, u01, u10, u11 form a four-word (2 * mode wide) product
> + whose high double word is the result, so we want to return:
> +
> + ((u11 << 2*BITS_PER_WORD) +
> + ((u01 + u10) << BITS_PER_WORD) +
> + u00) >> 2 * BITS_PER_WORD
> +
> + We then rewrite it this way:
> +
> + u11 + ((u01 + u10 + (u00 >> BITS_PER_WORD))
> + >> BITS_PER_WORD)
is safe? That is, that the underflows cannot produce a wrong result?
Paolo
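Paolo's safety question can at least be checked exhaustively at a scaled-down word size. The sketch below (an editorial illustration, not part of the thread's patch; the function name is invented) mirrors the patch's structure with 8-bit words and 16-bit double words and compares against a wide reference multiply:

```c
#include <assert.h>
#include <stdint.h>

/* Exhaustively verify the rewritten formula with 8-bit words and 16-bit
   double words: the intermediate additions never wrap in a way the
   carry fixup misses.  This is evidence at a reduced size, not a proof
   for 32-bit words.  */
static int
check_mulhi16_all (void)
{
  for (uint32_t x = 0; x < 0x10000; x++)
    for (uint32_t y = 0; y < 0x10000; y += 257)  /* stride keeps it fast; hits 0 and 0xffff */
      {
	uint16_t x1 = x >> 8, x0 = x & 0xff;
	uint16_t y1 = y >> 8, y0 = y & 0xff;
	uint16_t u11 = x1 * y1, u01 = x0 * y1;
	uint16_t u00 = x0 * y0, u10 = x1 * y0;

	u01 += u00 >> 8;	/* at most 255*255 + 255: no wrap */
	u01 += u10;		/* may wrap mod 2^16 */
	if (u10 > u01)		/* carry detection, as in the patch */
	  u11 += 1 << 8;
	u11 += u01 >> 8;

	if (u11 != (uint16_t) (((uint32_t) x * y) >> 16))
	  return 0;		/* mismatch against the wide reference */
      }
  return 1;
}
```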
* Re: divide 64-bit by constant for 32-bit target machines
[not found] ` <4FD6E900.9010903@gnu.org>
@ 2012-06-14 19:05 ` Dinar Temirbulatov
2012-06-15 8:16 ` Richard Earnshaw
0 siblings, 1 reply; 19+ messages in thread
From: Dinar Temirbulatov @ 2012-06-14 19:05 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Alexey Kravets, gcc-patches, Michael Hope, Richard Henderson,
Richard Earnshaw, aph
[-- Attachment #1: Type: text/plain, Size: 307 bytes --]
Hi,
OK for trunk?
thanks, Dinar.
On Tue, Jun 12, 2012 at 11:00 AM, Paolo Bonzini <bonzini@gnu.org> wrote:
> On 12/06/2012 08:52, Dinar Temirbulatov wrote:
>>> is safe? That is, that the underflows cannot produce a wrong result?
>
> [snip]
>
> Thanks very much!
>
> Paolo
[-- Attachment #2: ChangeLog.txt --]
[-- Type: text/plain, Size: 541 bytes --]
2012-06-14 Dinar Temirbulatov <dtemirbulatov@gmail.com>
Alexey Kravets <mr.kayrick@gmail.com>
Paolo Bonzini <bonzini@gnu.org>
* config/arm/arm.c (arm_rtx_costs_1): Add cost estimate for the integer
double-word division operation.
* config/mips/mips.c (mips_rtx_costs): Extend cost estimate for the integer
double-word division operation for 32-bit targets.
* expmed.c (expand_mult_highpart_optab): Allow generating the high-part multiplication
product for unsigned double-word integers using 32-bit wide registers.
[-- Attachment #3: 30.patch --]
[-- Type: application/octet-stream, Size: 4331 bytes --]
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 8a86227..0f8120f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7130,6 +7130,8 @@ arm_rtx_costs_1 (rtx x, enum rtx_code outer, int* total, bool speed)
*total = COSTS_N_INSNS (2);
else if (TARGET_HARD_FLOAT && mode == DFmode && !TARGET_VFP_SINGLE)
*total = COSTS_N_INSNS (4);
+ else if (mode == DImode)
+ *total = COSTS_N_INSNS (50);
else
*total = COSTS_N_INSNS (20);
return false;
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 5bcb7a8..57bb4cc 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -3879,8 +3879,13 @@ mips_rtx_costs (rtx x, int code, int outer_code, int opno ATTRIBUTE_UNUSED,
}
*total = COSTS_N_INSNS (mips_idiv_insns ());
}
- else if (mode == DImode)
+ else if (mode == DImode) {
+ if (!TARGET_64BIT)
+ /* The double-word integer division libcall is expensive.  */
+ *total = COSTS_N_INSNS (200);
+ else
*total = mips_cost->int_div_di;
+ }
else
*total = mips_cost->int_div_si;
return false;
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 98f7c09..51a0d9b 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -3539,6 +3539,78 @@ expand_mult_highpart_optab (enum machine_mode mode, rtx op0, rtx op1,
}
}
+ if (unsignedp
+ && size - 1 > BITS_PER_WORD
+ && (!optimize_size && optimize > 1)
+ && (3 * mul_widen_cost[speed][mode] + mul_highpart_cost[speed][mode] +
+ 4 * add_cost[speed][mode] + add_cost[speed][word_mode] < max_cost))
+ {
+ rtx x1, x0, y1, y0, u00, u01, u10, u11, u00h, carry, carry_result;
+
+ /* Extracting the higher part of the 64-bit multiplier. */
+ x1 = gen_highpart (word_mode, op0);
+ x1 = force_reg (word_mode, x1);
+
+ /* Extracting the lower part of the 64-bit multiplier. */
+ x0 = gen_lowpart (word_mode, op0);
+ x0 = force_reg (word_mode, x0);
+
+ /* Splitting the 64-bit constant for the higher and the lower parts. */
+ y0 = gen_lowpart (word_mode, op1);
+ y0 = force_reg (word_mode, y0);
+ y1 = gen_highpart_mode (word_mode, mode, op1);
+
+ u11 = expand_widening_mult (mode, x1, y1, NULL_RTX, 1, umul_widen_optab);
+ u01 = expand_widening_mult (mode, x0, y1, NULL_RTX, 1, umul_widen_optab);
+ u00 = expand_widening_mult (mode, x0, y0, NULL_RTX, 1, umul_widen_optab);
+ u10 = expand_widening_mult (mode, x1, y0, NULL_RTX, 1, umul_widen_optab);
+
+ /* u00, u01, u10, u11 form a four-word (2 * mode wide) product
+ whose high double word is the result, so we want to return:
+
+ ((u11 << 2*BITS_PER_WORD) +
+ ((u01 + u10) << BITS_PER_WORD) +
+ u00) >> 2 * BITS_PER_WORD
+
+ We then rewrite it this way:
+
+ u11 + ((u01 + u10 + (u00 >> BITS_PER_WORD))
+ >> BITS_PER_WORD)
+
+ where the shifts are realized with gen_highpart and a
+ conversion back to the wider mode. */
+ u00h = gen_highpart (word_mode, u00);
+ u00h = force_reg (word_mode, u00h);
+ u00h = convert_to_mode (mode, u00h, 1);
+ expand_inc (u01, u00h);
+
+ /* Summing u01, u10 and u00h together could have up to
+ 2 * BITS_PER_WORD + 1 bits of precision.
+ We compute the extra bit by checking for carry, and add
+ 1 << BITS_PER_WORD to u11 if there is carry. */
+ expand_inc (u01, u10);
+
+ /* Checking for carry here. */
+ carry = emit_store_flag_force (NULL_RTX, GT, u10, u01,
+ mode, 1, 1);
+ carry = convert_to_mode (mode, carry, 1);
+ carry_result = expand_shift (LSHIFT_EXPR, mode, carry, BITS_PER_WORD, NULL_RTX, 1);
+
+ /* Adding 1 << BITS_PER_WORD as carry here if required. */
+ expand_inc (u11, carry_result);
+
+ /* Extracting the higher part of the sum. */
+ u01 = gen_highpart (word_mode, u01);
+ u01 = force_reg (word_mode, u01);
+ u01 = convert_to_mode (mode, u01, 1);
+
+ /* The final result. */
+ expand_inc (u11, u01);
+
+ return u11;
+
+ }
+
/* Try widening multiplication of opposite signedness, and adjust. */
moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing
* Re: divide 64-bit by constant for 32-bit target machines
2012-06-14 19:05 ` Dinar Temirbulatov
@ 2012-06-15 8:16 ` Richard Earnshaw
2012-06-15 18:03 ` Dinar Temirbulatov
0 siblings, 1 reply; 19+ messages in thread
From: Richard Earnshaw @ 2012-06-15 8:16 UTC (permalink / raw)
To: Dinar Temirbulatov
Cc: Paolo Bonzini, Alexey Kravets, gcc-patches, Michael Hope,
Richard Henderson, aph
On 14/06/12 19:46, Dinar Temirbulatov wrote:
> Hi,
> OK for trunk?
> thanks, Dinar.
>
I'm still not comfortable with the code bloat that this is likely to
incur at -O2.
R.
> On Tue, Jun 12, 2012 at 11:00 AM, Paolo Bonzini <bonzini@gnu.org> wrote:
>> On 12/06/2012 08:52, Dinar Temirbulatov wrote:
>>>> is safe? That is, that the underflows cannot produce a wrong result?
>>
>> [snip]
>>
>> Thanks very much!
>>
>> Paolo
>>
>> ChangeLog.txt
>>
>>
>> 2012-06-14 Dinar Temirbulatov <dtemirbulatov@gmail.com>
>> Alexey Kravets <mr.kayrick@gmail.com>
>> Paolo Bonzini <bonzini@gnu.org>
>>
>> * config/arm/arm.c (arm_rtx_costs_1): Add cost estimate for the integer
>> double-word division operation.
>> * config/mips/mips.c (mips_rtx_costs): Extend cost estimate for the integer
>> double-word division operation for 32-bit targets.
>> * expmed.c (expand_mult_highpart_optab): Allow generating the high-part multiplication
>> product for unsigned double-word integers using 32-bit wide registers.
>>
>> 30.patch
>>
>>
* Re: divide 64-bit by constant for 32-bit target machines
2012-06-15 8:16 ` Richard Earnshaw
@ 2012-06-15 18:03 ` Dinar Temirbulatov
0 siblings, 0 replies; 19+ messages in thread
From: Dinar Temirbulatov @ 2012-06-15 18:03 UTC (permalink / raw)
To: Richard Earnshaw
Cc: Paolo Bonzini, Alexey Kravets, gcc-patches, Michael Hope,
Richard Henderson, aph
Hi, Richard,
How about if I add a "umul_highpart_di" function to libgcc and call it
instead of expanding the high-part multiplication directly, or add my own
function, taking the pre-shift, post-shift, 64-bit constant, and 64-bit
operand as parameters, to do the division at optimization levels below
-O3?
thanks, Dinar.
On Fri, Jun 15, 2012 at 12:12 PM, Richard Earnshaw <rearnsha@arm.com> wrote:
> On 14/06/12 19:46, Dinar Temirbulatov wrote:
>> Hi,
>> OK for trunk?
>> thanks, Dinar.
>>
>
> I'm still not comfortable with the code bloat that this is likely to
> incur at -O2.
>
> R.
>
>> On Tue, Jun 12, 2012 at 11:00 AM, Paolo Bonzini <bonzini@gnu.org> wrote:
>>> On 12/06/2012 08:52, Dinar Temirbulatov wrote:
>>>>> is safe? That is, that the underflows cannot produce a wrong result?
>>>
>>> [snip]
>>>
>>> Thanks very much!
>>>
>>> Paolo
>>>
>>> ChangeLog.txt
>>>
>>>
>>> 2012-06-14 Dinar Temirbulatov <dtemirbulatov@gmail.com>
>>> Alexey Kravets <mr.kayrick@gmail.com>
>>> Paolo Bonzini <bonzini@gnu.org>
>>>
>>> * config/arm/arm.c (arm_rtx_costs_1): Add cost estimate for the integer
>>> double-word division operation.
>>> * config/mips/mips.c (mips_rtx_costs): Extend cost estimate for the integer
>>> double-word division operation for 32-bit targets.
>>> * expmed.c (expand_mult_highpart_optab): Allow generating the high-part multiplication
>>> product for unsigned double-word integers using 32-bit wide registers.
>>>
>>> 30.patch
>>>
>>>
>
>
end of thread, other threads:[~2012-06-15 17:53 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-20 12:57 divide 64-bit by constant for 32-bit target machines Dinar Temirbulatov
2012-04-23 14:30 ` Andrew Haley
2012-04-24 1:49 ` Michael Hope
2012-05-03 10:28 ` Dinar Temirbulatov
2012-05-03 13:41 ` Richard Earnshaw
2012-05-22 14:05 ` Dinar Temirbulatov
2012-05-22 15:46 ` Richard Henderson
2012-05-25 10:20 ` Dinar Temirbulatov
2012-05-26 12:35 ` Paolo Bonzini
2012-05-26 12:46 ` Paolo Bonzini
2012-06-07 10:21 ` Dinar Temirbulatov
2012-06-07 10:43 ` Dinar Temirbulatov
2012-06-07 14:36 ` Paolo Bonzini
2012-06-08 18:37 ` Dinar Temirbulatov
2012-06-11 8:03 ` Paolo Bonzini
[not found] ` <CAMnfPmOaL2x4yi0AYOG7KQbjugCM6J5WsAjYg9eY2mELEVfJTw@mail.gmail.com>
[not found] ` <4FD6E900.9010903@gnu.org>
2012-06-14 19:05 ` Dinar Temirbulatov
2012-06-15 8:16 ` Richard Earnshaw
2012-06-15 18:03 ` Dinar Temirbulatov
2012-05-26 12:39 ` Paolo Bonzini