public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/37443] New: fast 64-bit divide by constant 32-bit
@ 2008-09-09 13:59 ajrobb at bigfoot dot com
2008-09-13 12:28 ` [Bug middle-end/37443] " rguenth at gcc dot gnu dot org
` (4 more replies)
0 siblings, 5 replies; 7+ messages in thread
From: ajrobb at bigfoot dot com @ 2008-09-09 13:59 UTC (permalink / raw)
To: gcc-bugs
consider the code fragment:
uint64_t slow(uint64_t x) {
return x / 1220703125u;
}
This can be replaced by:
uint64_t fast(uint64_t x) {
uint32_t a = ((x >> 32) * 1270091284u) >> 32;
uint32_t b = ((x & 0xffffffffu) * 3777893186u) >> 32;
return ((x >> 32) * 3777893186ull + a + b) >> 30;
}
The 'fast' code runs 50% faster than 'slow'. However, removing the redundant
multiplies (see my earlier bug - fixed in 4.4 trunk) and tidying up storage, I
can use the following assembler to run nearly 100% faster than 'slow':
.p2align 4,,15
.globl _fast
.def _fast; .scl 2; .type 32; .endef
_fast:
pushl %ebx
movl $1270091284, %eax
mull 12(%esp)
movl $-517074110, %eax
movl %edx, %ebx
mull 8(%esp)
movl $-517074110, %eax
movl %edx, %ecx
mull 12(%esp)
addl %ebx, %ecx
popl %ebx
adcl $0, %edx
addl %ecx, %eax
adcl $0, %edx
shrdl $30, %edx, %eax
shrl $30, %edx
ret
NOTE: the 2 multipliers are derived using 96-bit arithmetic:
d = 1220703125u
-517074110 = 0xffe12e1342u = (((1u << 94) + d - 1) / d) >> 32
1270091284 = (((1u << 94) + d - 1) / d) & 0xffffffffu
--
Summary: fast 64-bit divide by constant 32-bit
Product: gcc
Version: 4.2.3
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: ajrobb at bigfoot dot com
GCC build triplet: i686-pc-cygwin
GCC host triplet: i686-pc-cygwin
GCC target triplet: i686-pc-cygwin
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37443
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug middle-end/37443] fast 64-bit divide by constant 32-bit
2008-09-09 13:59 [Bug c/37443] New: fast 64-bit divide by constant 32-bit ajrobb at bigfoot dot com
@ 2008-09-13 12:28 ` rguenth at gcc dot gnu dot org
2008-09-16 0:18 ` ajrobb at bigfoot dot com
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-09-13 12:28 UTC (permalink / raw)
To: gcc-bugs
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever Confirmed|0 |1
Keywords| |missed-optimization
Last reconfirmed|0000-00-00 00:00:00 |2008-09-13 12:26:41
date| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37443
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug middle-end/37443] fast 64-bit divide by constant 32-bit
2008-09-09 13:59 [Bug c/37443] New: fast 64-bit divide by constant 32-bit ajrobb at bigfoot dot com
2008-09-13 12:28 ` [Bug middle-end/37443] " rguenth at gcc dot gnu dot org
@ 2008-09-16 0:18 ` ajrobb at bigfoot dot com
2008-09-16 7:16 ` [Bug middle-end/37443] fast 64-bit divide by constant on 32-bit platform ajrobb at bigfoot dot com
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: ajrobb at bigfoot dot com @ 2008-09-16 0:18 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from ajrobb at bigfoot dot com 2008-09-16 00:17 -------
I have tested my initial routine with more values and it turns out that the
short-cut of omitting the least significant multiplication is wrong. I added
the 4th multiply to generate the full 128-bit precision. I also took the
multiplier used by gcc 4.2.3 on x86_64-linux-gnu (ubuntu) (4056481920730334085)
and (28+64) bit shift to achieve the division by 1220703125u (largest 32-bit
power of 5):
uint64_t fast(uint64_t x) {
static const uint64_t fact = 4056481920730334085ull;
uint32_t c = ((x & 0xffffffffu) * (fact & 0xffffffffu)) >> 32;
uint64_t a = ((x & 0xffffffffu) * (fact >> 32) + c);
/*
* Split the addition of the two middle 64-bit intermediate results
* into 2 stages to avoid possible overflow.
* (not actually an issue with this value of 'fact')
*/
uint32_t b = ((x >> 32) * (fact & 0xffffffffu) + (a & 0xffffffffu)) >> 32;
return ((x >> 32) * (fact >> 32) + (a >> 32) + b) >> 28;
}
After hand coding the assembler on 32-bit x86, the extra multiply increased the
run time from 15s to 17s - still much faster than 'slow' (64-bit divide API
call).
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37443
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug middle-end/37443] fast 64-bit divide by constant on 32-bit platform
2008-09-09 13:59 [Bug c/37443] New: fast 64-bit divide by constant 32-bit ajrobb at bigfoot dot com
2008-09-13 12:28 ` [Bug middle-end/37443] " rguenth at gcc dot gnu dot org
2008-09-16 0:18 ` ajrobb at bigfoot dot com
@ 2008-09-16 7:16 ` ajrobb at bigfoot dot com
2008-09-20 17:34 ` ajrobb at bigfoot dot com
2008-09-21 2:41 ` ajrobb at bigfoot dot com
4 siblings, 0 replies; 7+ messages in thread
From: ajrobb at bigfoot dot com @ 2008-09-16 7:16 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from ajrobb at bigfoot dot com 2008-09-16 07:15 -------
I changed the summary as it is equally applicable to all 64-bit constant
divides on 32-bit x86 that are already compiled as multiplies on 64-bit x64. It
might be worth implementing as another API call, replacing the 64-bit divisor
with a 64-bit multiplier and a shift count.
--
ajrobb at bigfoot dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|fast 64-bit divide by |fast 64-bit divide by
|constant 32-bit |constant on 32-bit platform
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37443
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug middle-end/37443] fast 64-bit divide by constant on 32-bit platform
2008-09-09 13:59 [Bug c/37443] New: fast 64-bit divide by constant 32-bit ajrobb at bigfoot dot com
` (2 preceding siblings ...)
2008-09-16 7:16 ` [Bug middle-end/37443] fast 64-bit divide by constant on 32-bit platform ajrobb at bigfoot dot com
@ 2008-09-20 17:34 ` ajrobb at bigfoot dot com
2008-09-21 2:41 ` ajrobb at bigfoot dot com
4 siblings, 0 replies; 7+ messages in thread
From: ajrobb at bigfoot dot com @ 2008-09-20 17:34 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from ajrobb at bigfoot dot com 2008-09-20 17:32 -------
Created an attachment (id=16369)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16369&action=view)
proposed 32-bit API calls for 64-bit constant divison
I have attached a sample assembler code for a API calls for the 2 types of
64-bit division through multiplication:
mul0shift - for multiplier with 64 or fewer significant bits
mul1shift - for multiplier with 65 significant bits (implicit high bit)
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37443
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug middle-end/37443] fast 64-bit divide by constant on 32-bit platform
2008-09-09 13:59 [Bug c/37443] New: fast 64-bit divide by constant 32-bit ajrobb at bigfoot dot com
` (3 preceding siblings ...)
2008-09-20 17:34 ` ajrobb at bigfoot dot com
@ 2008-09-21 2:41 ` ajrobb at bigfoot dot com
4 siblings, 0 replies; 7+ messages in thread
From: ajrobb at bigfoot dot com @ 2008-09-21 2:41 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from ajrobb at bigfoot dot com 2008-09-21 02:39 -------
Created an attachment (id=16370)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16370&action=view)
proposed 32-bit API calls for 64-bit constant divison
The original attachment did not handle shift greater than 31.
--
ajrobb at bigfoot dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #16369|0 |1
is obsolete| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37443
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug middle-end/37443] fast 64-bit divide by constant on 32-bit platform
[not found] <bug-37443-4@http.gcc.gnu.org/bugzilla/>
@ 2021-08-15 11:28 ` pinskia at gcc dot gnu.org
0 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-15 11:28 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37443
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Build|i686-pc-cygwin |
Host|i686-pc-cygwin |
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Must be a cost model issue because I can get /10u working but not /1220703125u
(in GCC 11+).
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2021-08-15 11:28 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-09-09 13:59 [Bug c/37443] New: fast 64-bit divide by constant 32-bit ajrobb at bigfoot dot com
2008-09-13 12:28 ` [Bug middle-end/37443] " rguenth at gcc dot gnu dot org
2008-09-16 0:18 ` ajrobb at bigfoot dot com
2008-09-16 7:16 ` [Bug middle-end/37443] fast 64-bit divide by constant on 32-bit platform ajrobb at bigfoot dot com
2008-09-20 17:34 ` ajrobb at bigfoot dot com
2008-09-21 2:41 ` ajrobb at bigfoot dot com
[not found] <bug-37443-4@http.gcc.gnu.org/bugzilla/>
2021-08-15 11:28 ` pinskia at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).