* Option to make unsigned->signed conversion always well-defined? @ 2011-10-05 22:56 Ulf Magnusson 2011-10-06 8:25 ` Ulf Magnusson ` (3 more replies) 0 siblings, 4 replies; 18+ messages in thread From: Ulf Magnusson @ 2011-10-05 22:56 UTC (permalink / raw) To: gcc Hi, I've been experimenting with different methods for emulating the signed overflow of an 8-bit CPU. The method I've found that seems to generate the most efficient code on both ARM and x86 is bool overflow(unsigned int a, unsigned int b) { const unsigned int sum = (int8_t)a + (int8_t)b; return (int8_t)sum != sum; } (The real function would probably be 'inline', of course. Regs are stored in overlong variables, hence 'unsigned int'.) Looking at the spec, it unfortunately seems the behavior of this function is undefined, as it relies on signed int addition wrapping, and that (int8_t)sum truncates bits. Is there some way to make this guaranteed safe with GCC without resorting to inline asm? Locally enabling -fwrap takes care of the addition, but that still leaves the conversion. /Ulf ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Option to make unsigned->signed conversion always well-defined? 2011-10-05 22:56 Option to make unsigned->signed conversion always well-defined? Ulf Magnusson @ 2011-10-06 8:25 ` Ulf Magnusson 2011-10-06 9:04 ` Pedro Pedruzzi ` (2 subsequent siblings) 3 siblings, 0 replies; 18+ messages in thread From: Ulf Magnusson @ 2011-10-06 8:25 UTC (permalink / raw) To: gcc On Wed, Oct 5, 2011 at 10:11 PM, Ulf Magnusson <ulfalizer@gmail.com> wrote: > Hi, > > I've been experimenting with different methods for emulating the > signed overflow of an 8-bit CPU. The method I've found that seems to > generate the most efficient code on both ARM and x86 is > > bool overflow(unsigned int a, unsigned int b) { > const unsigned int sum = (int8_t)a + (int8_t)b; > return (int8_t)sum != sum; > } > > (The real function would probably be 'inline', of course. Regs are > stored in overlong variables, hence 'unsigned int'.) > > Looking at the spec, it unfortunately seems the behavior of this > function is undefined, as it relies on signed int addition wrapping, > and that (int8_t)sum truncates bits. Is there some way to make this > guaranteed safe with GCC without resorting to inline asm? Locally > enabling -fwrap takes care of the addition, but that still leaves the > conversion. > > /Ulf > Is *((int8_t*)&sum) safe (assuming little endian)? Unfortunately that seems to generate worse code. On X86 it generates the following (GCC 4.5.2): 00000050 <_Z9overflow4jj>: 50: 83 ec 10 sub $0x10,%esp 53: 0f be 54 24 18 movsbl 0x18(%esp),%edx 58: 0f be 44 24 14 movsbl 0x14(%esp),%eax 5d: 8d 04 02 lea (%edx,%eax,1),%eax 60: 0f be d0 movsbl %al,%edx 63: 39 d0 cmp %edx,%eax 65: 0f 95 c0 setne %al 68: 83 c4 10 add $0x10,%esp 6b: c3 ret With the straight (int8_t) cast you get 50: 0f be 54 24 08 movsbl 0x8(%esp),%edx 55: 0f be 44 24 04 movsbl 0x4(%esp),%eax 5a: 8d 04 02 lea (%edx,%eax,1),%eax 5d: 0f be d0 movsbl %al,%edx 60: 39 c2 cmp %eax,%edx 62: 0f 95 c0 setne %al 65: c3 ret What's with the extra add/sub of ESP? /Ulf ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Option to make unsigned->signed conversion always well-defined? 2011-10-05 22:56 Option to make unsigned->signed conversion always well-defined? Ulf Magnusson 2011-10-06 8:25 ` Ulf Magnusson @ 2011-10-06 9:04 ` Pedro Pedruzzi 2011-10-06 10:52 ` Ulf Magnusson [not found] ` <4E8DBF3C.1020700@redhat.com> 2011-10-06 22:46 ` Florian Weimer 3 siblings, 1 reply; 18+ messages in thread From: Pedro Pedruzzi @ 2011-10-06 9:04 UTC (permalink / raw) To: gcc Em 05-10-2011 17:11, Ulf Magnusson escreveu: > Hi, > > I've been experimenting with different methods for emulating the > signed overflow of an 8-bit CPU. You would like to check whether a 8-bit signed addition will overflow or not, given the two operands. Is that correct? As you used the word `emulating', I am assuming that your function will not run by the mentioned CPU. Does this 8-bit CPU use two's complement representation? > The method I've found that seems to > generate the most efficient code on both ARM and x86 is > > bool overflow(unsigned int a, unsigned int b) { > const unsigned int sum = (int8_t)a + (int8_t)b; > return (int8_t)sum != sum; > } > > (The real function would probably be 'inline', of course. Regs are > stored in overlong variables, hence 'unsigned int'.) > > Looking at the spec, it unfortunately seems the behavior of this > function is undefined, as it relies on signed int addition wrapping, > and that (int8_t)sum truncates bits. Is there some way to make this > guaranteed safe with GCC without resorting to inline asm? Locally > enabling -fwrap takes care of the addition, but that still leaves the > conversion. I believe the cast from unsigned int to int8_t is implementation-defined for values that can't be represented in int8_t (e.g. 0xff). A kind of `undefined behavior' as well. I tried: bool overflow(unsigned int a, unsigned int b) { const unsigned int sum = a + b; return ((a & 0x80) == (b & 0x80)) && ((a & 0x80) != (sum & 0x80)); } But it is not as efficient as yours. -- Pedro Pedruzzi ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Option to make unsigned->signed conversion always well-defined? 2011-10-06 9:04 ` Pedro Pedruzzi @ 2011-10-06 10:52 ` Ulf Magnusson 2011-10-06 14:42 ` Ulf Magnusson 0 siblings, 1 reply; 18+ messages in thread From: Ulf Magnusson @ 2011-10-06 10:52 UTC (permalink / raw) To: Pedro Pedruzzi; +Cc: gcc On Thu, Oct 6, 2011 at 12:55 AM, Pedro Pedruzzi <pedro.pedruzzi@gmail.com> wrote: > Em 05-10-2011 17:11, Ulf Magnusson escreveu: >> Hi, >> >> I've been experimenting with different methods for emulating the >> signed overflow of an 8-bit CPU. > > You would like to check whether a 8-bit signed addition will overflow or > not, given the two operands. Is that correct? > > As you used the word `emulating', I am assuming that your function will > not run by the mentioned CPU. > No, it'll most likely only run on systems with a wider bitness. > Does this 8-bit CPU use two's complement representation? Yes, and the criterion for signed overflow is "both numbers have the same sign, but the sign of the sum is different". Should have made that more clear. > >> The method I've found that seems to >> generate the most efficient code on both ARM and x86 is >> >> bool overflow(unsigned int a, unsigned int b) { >> const unsigned int sum = (int8_t)a + (int8_t)b; >> return (int8_t)sum != sum; >> } >> >> (The real function would probably be 'inline', of course. Regs are >> stored in overlong variables, hence 'unsigned int'.) >> >> Looking at the spec, it unfortunately seems the behavior of this >> function is undefined, as it relies on signed int addition wrapping, >> and that (int8_t)sum truncates bits. Is there some way to make this >> guaranteed safe with GCC without resorting to inline asm? Locally >> enabling -fwrap takes care of the addition, but that still leaves the >> conversion. > > I believe the cast from unsigned int to int8_t is implementation-defined > for values that can't be represented in int8_t (e.g. 0xff). A kind of > `undefined behavior' as well. > > I tried: > > bool overflow(unsigned int a, unsigned int b) { > const unsigned int sum = a + b; > return ((a & 0x80) == (b & 0x80)) && ((a & 0x80) != (sum & 0x80)); > } > > But it is not as efficient as yours. > > -- > Pedro Pedruzzi > Yeah, I tried similar bit-trickery along the lines of bool overflow(unsigned int a, unsigned int b) { const uint8_t ab = (uint8_t)a; const uint8_t bb = (uint8_t)b; const uint8_t sum = ab + bb; return (ab ^ bb) & ~(ab ^ sum) & 0x80; } , but it doesn't seem to generate very efficient code. /Ulf ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Option to make unsigned->signed conversion always well-defined? 2011-10-06 10:52 ` Ulf Magnusson @ 2011-10-06 14:42 ` Ulf Magnusson 2011-10-06 14:45 ` Miles Bader 0 siblings, 1 reply; 18+ messages in thread From: Ulf Magnusson @ 2011-10-06 14:42 UTC (permalink / raw) To: Pedro Pedruzzi; +Cc: gcc On Thu, Oct 6, 2011 at 10:25 AM, Ulf Magnusson <ulfalizer@gmail.com> wrote: > On Thu, Oct 6, 2011 at 12:55 AM, Pedro Pedruzzi > <pedro.pedruzzi@gmail.com> wrote: >> Em 05-10-2011 17:11, Ulf Magnusson escreveu: >>> Hi, >>> >>> I've been experimenting with different methods for emulating the >>> signed overflow of an 8-bit CPU. >> >> You would like to check whether a 8-bit signed addition will overflow or >> not, given the two operands. Is that correct? >> >> As you used the word `emulating', I am assuming that your function will >> not run by the mentioned CPU. >> > > No, it'll most likely only run on systems with a wider bitness. > >> Does this 8-bit CPU use two's complement representation? > > Yes, and the criterion for signed overflow is "both numbers have the > same sign, but the sign of the sum is different". Should have made > that more clear. > >> >>> The method I've found that seems to >>> generate the most efficient code on both ARM and x86 is >>> >>> bool overflow(unsigned int a, unsigned int b) { >>> const unsigned int sum = (int8_t)a + (int8_t)b; >>> return (int8_t)sum != sum; >>> } >>> >>> (The real function would probably be 'inline', of course. Regs are >>> stored in overlong variables, hence 'unsigned int'.) >>> >>> Looking at the spec, it unfortunately seems the behavior of this >>> function is undefined, as it relies on signed int addition wrapping, >>> and that (int8_t)sum truncates bits. Is there some way to make this >>> guaranteed safe with GCC without resorting to inline asm? Locally >>> enabling -fwrap takes care of the addition, but that still leaves the >>> conversion. >> >> I believe the cast from unsigned int to int8_t is implementation-defined >> for values that can't be represented in int8_t (e.g. 0xff). A kind of >> `undefined behavior' as well. >> >> I tried: >> >> bool overflow(unsigned int a, unsigned int b) { >> const unsigned int sum = a + b; >> return ((a & 0x80) == (b & 0x80)) && ((a & 0x80) != (sum & 0x80)); >> } >> >> But it is not as efficient as yours. >> >> -- >> Pedro Pedruzzi >> > > Yeah, I tried similar bit-trickery along the lines of > > bool overflow(unsigned int a, unsigned int b) { > const uint8_t ab = (uint8_t)a; > const uint8_t bb = (uint8_t)b; > const uint8_t sum = ab + bb; > return (ab ^ bb) & ~(ab ^ sum) & 0x80; > } > > , but it doesn't seem to generate very efficient code. > > /Ulf > Might as well do bool overflowbit(unsigned int a, unsigned int b) { const unsigned int sum = a + b; return (a ^ b) & ~(a ^ sum) & 0x80; } But still not very good output compared to other approaches as expected. /Ulf ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Option to make unsigned->signed conversion always well-defined? 2011-10-06 14:42 ` Ulf Magnusson @ 2011-10-06 14:45 ` Miles Bader 2011-10-06 14:48 ` Ulf Magnusson 0 siblings, 1 reply; 18+ messages in thread From: Miles Bader @ 2011-10-06 14:45 UTC (permalink / raw) To: Ulf Magnusson; +Cc: Pedro Pedruzzi, gcc Ulf Magnusson <ulfalizer@gmail.com> writes: > Might as well do > > bool overflowbit(unsigned int a, unsigned int b) { > const unsigned int sum = a + b; > return (a ^ b) & ~(a ^ sum) & 0x80; > } > > But still not very good output compared to other approaches as expected. How about: bool overflowbit2(unsigned int a, unsigned int b) { const unsigned int sum = a + b; return ~(a ^ b) & sum & 0x80; } ? I thinnnnk it has the same results as your function... [I just made a table of all 8 possibilities, and checked!] -miles -- Circus, n. A place where horses, ponies and elephants are permitted to see men, women and children acting the fool. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Option to make unsigned->signed conversion always well-defined? 2011-10-06 14:45 ` Miles Bader @ 2011-10-06 14:48 ` Ulf Magnusson 2011-10-06 21:31 ` Pedro Pedruzzi 0 siblings, 1 reply; 18+ messages in thread From: Ulf Magnusson @ 2011-10-06 14:48 UTC (permalink / raw) To: Miles Bader; +Cc: Pedro Pedruzzi, gcc On Thu, Oct 6, 2011 at 11:04 AM, Miles Bader <miles@gnu.org> wrote: > Ulf Magnusson <ulfalizer@gmail.com> writes: >> Might as well do >> >> bool overflowbit(unsigned int a, unsigned int b) { >> const unsigned int sum = a + b; >> return (a ^ b) & ~(a ^ sum) & 0x80; >> } >> >> But still not very good output compared to other approaches as expected. > > How about: > > bool overflowbit2(unsigned int a, unsigned int b) > { > const unsigned int sum = a + b; > return ~(a ^ b) & sum & 0x80; > } > > ? > > I thinnnnk it has the same results as your function... > [I just made a table of all 8 possibilities, and checked!] > > -miles > > -- > Circus, n. A place where horses, ponies and elephants are permitted to see > men, women and children acting the fool. > Ops, should have been return ~(a ^ b) & (a ^ sum) & 0x80 ~(a ^ b) gives 1 in the sign bit position if the signs are the same, and (a ^ sum) gives 1 if it's different in the sum. A clearer way of writing it (that also generates suboptimal code) is bool overflow(unsigned int a, unsigned int b) { const unsigned asign = a & 0x80; const unsigned bsign = b & 0x80; const unsigned sumsign = (a + b) & 0x80; return (asign == bsign) && (asign != sumsign); } Seems bit-fiddling isn't the way to go. Maybe I should take this to gnu-help as it isn't really development-related. /Ulf ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Option to make unsigned->signed conversion always well-defined? 2011-10-06 14:48 ` Ulf Magnusson @ 2011-10-06 21:31 ` Pedro Pedruzzi 2011-10-07 11:52 ` Miles Bader 0 siblings, 1 reply; 18+ messages in thread From: Pedro Pedruzzi @ 2011-10-06 21:31 UTC (permalink / raw) To: Ulf Magnusson; +Cc: Miles Bader, gcc On Thu, Oct 6, 2011 at 11:04 AM, Miles Bader <miles@gnu.org> wrote: > How about: > > bool overflowbit2(unsigned int a, unsigned int b) > { > const unsigned int sum = a + b; > return ~(a ^ b) & sum & 0x80; > } > > ? > > I thinnnnk it has the same results as your function... > [I just made a table of all 8 possibilities, and checked!] Miles, it is not the same. Take for example (0xff, 0xff). In 8-bit 2's complement, this is (-1, -1) and does not overflow. Your function says it does. Em 06-10-2011 12:23, Jeremy Hall escreveu: > bool overflow(int16_t a, int16_t b) > { > const int16_t sum = a + b; > return sum > INT8_MAX || sum < INT8_MIN; > } Jeremy, here you are ignoring the problem of converting from the unsigned int (in the range 0 to 0xff) to the signed integer that it represents in 8-bit two's complement. Example: 0xff -> -1. In practice, casting the unsigned int to int8_t works in most cases, but it is compiler-defined. We are trying to find a always well-defined approach that is efficient as well. > Ops, should have been > > return ~(a ^ b) & (a ^ sum) & 0x80 > > ~(a ^ b) gives 1 in the sign bit position if the signs are the same, > and (a ^ sum) gives 1 if it's different in the sum. This is good. Do you think this is suboptimal? How are you evaluating efficiency? In x86 this generates pretty small code. <overflow2>: 400524: 8d 04 3e lea (%rsi,%rdi,1),%eax 400527: 31 f8 xor %edi,%eax 400529: 31 f7 xor %esi,%edi 40052b: f7 d7 not %edi 40052d: 21 f8 and %edi,%eax 40052f: 25 80 00 00 00 and $0x80,%eax 400534: c3 retq -- Pedro Pedruzzi ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Option to make unsigned->signed conversion always well-defined? 2011-10-06 21:31 ` Pedro Pedruzzi @ 2011-10-07 11:52 ` Miles Bader 2011-10-07 17:20 ` Pedro Pedruzzi 0 siblings, 1 reply; 18+ messages in thread From: Miles Bader @ 2011-10-07 11:52 UTC (permalink / raw) To: Pedro Pedruzzi; +Cc: Ulf Magnusson, gcc Pedro Pedruzzi <pedro.pedruzzi@gmail.com> writes: > On Thu, Oct 6, 2011 at 11:04 AM, Miles Bader <miles@gnu.org> wrote: >> How about: >> >> bool overflowbit2(unsigned int a, unsigned int b) >> { >> const unsigned int sum = a + b; >> return ~(a ^ b) & sum & 0x80; >> } > > Miles, it is not the same. Take for example (0xff, 0xff). In 8-bit > 2's complement, this is (-1, -1) and does not overflow. Your > function says it does. Negative overflow isn't considered overflow...? wacky... -miles -- ===== (^o^; (())) *This is the cute octopus virus, please copy it into your sig so it can spread. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Option to make unsigned->signed conversion always well-defined? 2011-10-07 11:52 ` Miles Bader @ 2011-10-07 17:20 ` Pedro Pedruzzi 2011-10-07 17:35 ` Miles Bader 0 siblings, 1 reply; 18+ messages in thread From: Pedro Pedruzzi @ 2011-10-07 17:20 UTC (permalink / raw) To: Miles Bader; +Cc: Ulf Magnusson, gcc Em 07-10-2011 02:35, Miles Bader escreveu: > Pedro Pedruzzi <pedro.pedruzzi@gmail.com> writes: >> On Thu, Oct 6, 2011 at 11:04 AM, Miles Bader <miles@gnu.org> wrote: >>> How about: >>> >>> bool overflowbit2(unsigned int a, unsigned int b) >>> { >>> const unsigned int sum = a + b; >>> return ~(a ^ b) & sum & 0x80; >>> } >> >> Miles, it is not the same. Take for example (0xff, 0xff). In 8-bit >> 2's complement, this is (-1, -1) and does not overflow. Your >> function says it does. > > Negative overflow isn't considered overflow...? wacky... It is. For example -100 + -100 = -200 (less than INT8_MIN; does not fit). But -1 + -1 = -2, is ok. -- Pedro ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Option to make unsigned->signed conversion always well-defined? 2011-10-07 17:20 ` Pedro Pedruzzi @ 2011-10-07 17:35 ` Miles Bader 0 siblings, 0 replies; 18+ messages in thread From: Miles Bader @ 2011-10-07 17:35 UTC (permalink / raw) To: Pedro Pedruzzi; +Cc: Ulf Magnusson, gcc 2011/10/7 Pedro Pedruzzi <pedro.pedruzzi@gmail.com>: > It is. For example -100 + -100 = -200 (less than INT8_MIN; does not > fit). But -1 + -1 = -2, is ok. Ah, now I see... -miles -- Cat is power. Cat is peace. ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <4E8DBF3C.1020700@redhat.com>]
* Re: Option to make unsigned->signed conversion always well-defined? [not found] ` <4E8DBF3C.1020700@redhat.com> @ 2011-10-06 22:08 ` Ulf Magnusson 0 siblings, 0 replies; 18+ messages in thread From: Ulf Magnusson @ 2011-10-06 22:08 UTC (permalink / raw) To: Andrew Haley; +Cc: gcc-help, gcc (I'll cross-post this to gcc and keep it on gcc-help after that.) On Thu, Oct 6, 2011 at 4:46 PM, Andrew Haley <aph@redhat.com> wrote: > > inline int8_t as_signed_8 (unsigned int a) { > a &= 0xff; > return a & 0x80 ? (int)a - 0x100 : a; > } > > int overflow(unsigned int a, unsigned int b) { > int sum = as_signed_8(a) + as_signed_8(b); > return as_signed_8(sum) != sum; > } > > Andrew. > That's a really neat trick, and seems to generate identical code. Thanks! I'd be interesting to know if this version produces equally efficient code with MSVC. To summarize what we have so far, here's four different methods along with the code generated for X86 and ARM (GCC 4.5.2): #include <inttypes.h> inline int8_t as_signed_8(unsigned int a) { a &= 0xff; return a & 0x80 ? (int)a - 0x100 : a; } bool overflow_range(unsigned int a, unsigned int b) { const int sum = as_signed_8(a) + as_signed_8(b); return sum < -128 || sum > 127; } bool overflow_bit(unsigned int a, unsigned int b) { const unsigned int sum = a + b; return ~(a ^ b) & (a ^ sum) & 0x80; } bool overflow_unsafe(unsigned int a, unsigned int b) { const unsigned int sum = (int8_t)a + (int8_t)b; return (int8_t)sum != sum; } bool overflow_safe(unsigned int a, unsigned int b) { const int sum = as_signed_8(a) + as_signed_8(b); return as_signed_8(sum) != sum; } Output for X86 with -O3 -fomit-frame-pointer: 00000000 <_Z14overflow_rangejj>: 0: 0f be 54 24 04 movsbl 0x4(%esp),%edx 5: 0f be 44 24 08 movsbl 0x8(%esp),%eax a: 8d 84 02 80 00 00 00 lea 0x80(%edx,%eax,1),%eax 11: 3d ff 00 00 00 cmp $0xff,%eax 16: 0f 97 c0 seta %al 19: c3 ret 1a: 8d b6 00 00 00 00 lea 0x0(%esi),%esi 00000020 <_Z12overflow_bitjj>: 20: 8b 54 24 08 mov 0x8(%esp),%edx 24: 8b 4c 24 04 mov 0x4(%esp),%ecx 28: 89 d0 mov %edx,%eax 2a: 31 c8 xor %ecx,%eax 2c: 01 ca add %ecx,%edx 2e: 31 ca xor %ecx,%edx 30: f7 d0 not %eax 32: 21 d0 and %edx,%eax 34: a8 80 test $0x80,%al 36: 0f 95 c0 setne %al 39: c3 ret 3a: 8d b6 00 00 00 00 lea 0x0(%esi),%esi 00000040 <_Z15overflow_unsafejj>: 40: 0f be 54 24 08 movsbl 0x8(%esp),%edx 45: 0f be 44 24 04 movsbl 0x4(%esp),%eax 4a: 8d 04 02 lea (%edx,%eax,1),%eax 4d: 0f be d0 movsbl %al,%edx 50: 39 c2 cmp %eax,%edx 52: 0f 95 c0 setne %al 55: c3 ret 56: 8d 76 00 lea 0x0(%esi),%esi 59: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi 00000060 <_Z13overflow_safejj>: 60: 0f be 54 24 08 movsbl 0x8(%esp),%edx 65: 0f be 44 24 04 movsbl 0x4(%esp),%eax 6a: 8d 04 02 lea (%edx,%eax,1),%eax 6d: 0f be d0 movsbl %al,%edx 70: 39 c2 cmp %eax,%edx 72: 0f 95 c0 setne %al 75: c3 ret Output for ARM with -O3 -fomit-frame-pointer -mthumb -march=armv7: 00000000 <_Z14overflow_rangejj>: 0: b249 sxtb r1, r1 2: b240 sxtb r0, r0 4: 1808 adds r0, r1, r0 6: 3080 adds r0, #128 ; 0x80 8: 28ff cmp r0, #255 ; 0xff a: bf94 ite ls c: 2000 movls r0, #0 e: 2001 movhi r0, #1 10: 4770 bx lr 12: bf00 nop 14: f3af 8000 nop.w 18: f3af 8000 nop.w 1c: f3af 8000 nop.w 00000020 <_Z12overflow_bitjj>: 20: 180b adds r3, r1, r0 22: 4041 eors r1, r0 24: ea83 0200 eor.w r2, r3, r0 28: ea22 0001 bic.w r0, r2, r1 2c: f3c0 10c0 ubfx r0, r0, #7, #1 30: 4770 bx lr 32: bf00 nop 34: f3af 8000 nop.w 38: f3af 8000 nop.w 3c: f3af 8000 nop.w 00000040 <_Z15overflow_unsafejj>: 40: b242 sxtb r2, r0 42: b249 sxtb r1, r1 44: 1888 adds r0, r1, r2 46: b243 sxtb r3, r0 48: 1a18 subs r0, r3, r0 4a: bf18 it ne 4c: 2001 movne r0, #1 4e: 4770 bx lr 00000050 <_Z13overflow_safejj>: 50: b242 sxtb r2, r0 52: b249 sxtb r1, r1 54: 1888 adds r0, r1, r2 56: b243 sxtb r3, r0 58: 1a18 subs r0, r3, r0 5a: bf18 it ne 5c: 2001 movne r0, #1 5e: 4770 bx lr Not sure which version would be fastest on ARM (no device to benchmark on handy). By the way, what's a nice way to benchmark snippets like this with optimization on? If you call each function in a loop from a different compilation unit the call overhead tends to dominate. If you instead put it in the same compilation unit and inline, the compiler might do things you do not expect that renders the benchmark useless. /Ulf ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Option to make unsigned->signed conversion always well-defined? 2011-10-05 22:56 Option to make unsigned->signed conversion always well-defined? Ulf Magnusson ` (2 preceding siblings ...) [not found] ` <4E8DBF3C.1020700@redhat.com> @ 2011-10-06 22:46 ` Florian Weimer 2011-10-07 19:36 ` Ulf Magnusson 3 siblings, 1 reply; 18+ messages in thread From: Florian Weimer @ 2011-10-06 22:46 UTC (permalink / raw) To: Ulf Magnusson; +Cc: gcc * Ulf Magnusson: > I've been experimenting with different methods for emulating the > signed overflow of an 8-bit CPU. The method I've found that seems to > generate the most efficient code on both ARM and x86 is > > bool overflow(unsigned int a, unsigned int b) { > const unsigned int sum = (int8_t)a + (int8_t)b; > return (int8_t)sum != sum; > } There's a GCC extension which is relevant here: | For conversion to a type of width N, the value is reduced modulo 2^N | to be within range of the type; no signal is raised. <http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html#Integers-implementation> Using that, you can replace the final "& 0x80" with a signed comparison to zero, which should be give you the best possible code (for the generic RISC). You only need to hunt down a copy of Hacker's Delight or find the right bit twiddling by other means. 8-) ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Option to make unsigned->signed conversion always well-defined? 2011-10-06 22:46 ` Florian Weimer @ 2011-10-07 19:36 ` Ulf Magnusson 2011-10-07 22:24 ` Florian Weimer 0 siblings, 1 reply; 18+ messages in thread From: Ulf Magnusson @ 2011-10-07 19:36 UTC (permalink / raw) To: Florian Weimer; +Cc: gcc On Thu, Oct 6, 2011 at 11:31 PM, Florian Weimer <fw@deneb.enyo.de> wrote: > * Ulf Magnusson: > >> I've been experimenting with different methods for emulating the >> signed overflow of an 8-bit CPU. The method I've found that seems to >> generate the most efficient code on both ARM and x86 is >> >> bool overflow(unsigned int a, unsigned int b) { >> const unsigned int sum = (int8_t)a + (int8_t)b; >> return (int8_t)sum != sum; >> } > > There's a GCC extension which is relevant here: > > | For conversion to a type of width N, the value is reduced modulo 2^N > | to be within range of the type; no signal is raised. > > <http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html#Integers-implementation> > > Using that, you can replace the final "& 0x80" with a signed > comparison to zero, which should be give you the best possible code > (for the generic RISC). You only need to hunt down a copy of Hacker's > Delight or find the right bit twiddling by other means. 8-) > Are you thinking of something like this? bool overflow_bit2(unsigned int a, unsigned int b) { const unsigned int ashift = a << 24; const unsigned int bshift = b << 24; const unsigned int sum = a + b; return (int)(~(a ^ b) & (a ^ sum)) < 0; } That version generates 80: 180b adds r3, r1, r0 82: 4041 eors r1, r0 84: ea83 0200 eor.w r2, r3, r0 88: ea22 0001 bic.w r0, r2, r1 8c: 0fc0 lsrs r0, r0, #31 8e: 4770 bx lr Whereas the unshifted version generates 40: 180b adds r3, r1, r0 42: 4041 eors r1, r0 44: ea83 0200 eor.w r2, r3, r0 48: ea22 0001 bic.w r0, r2, r1 4c: f3c0 10c0 ubfx r0, r0, #7, #1 50: 4770 bx lr So maybe a bit better. (I'm no ARM pro, but the compiler does seem to take advantage of the fact that it's testing the real sign bit at least.) Btw, & 0x80000000 generates the same code. /Ulf ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Option to make unsigned->signed conversion always well-defined? 2011-10-07 19:36 ` Ulf Magnusson @ 2011-10-07 22:24 ` Florian Weimer 2011-10-07 22:48 ` Ulf Magnusson 0 siblings, 1 reply; 18+ messages in thread From: Florian Weimer @ 2011-10-07 22:24 UTC (permalink / raw) To: Ulf Magnusson; +Cc: gcc * Ulf Magnusson: > Are you thinking of something like this? > > bool overflow_bit2(unsigned int a, unsigned int b) { > const unsigned int ashift = a << 24; > const unsigned int bshift = b << 24; > const unsigned int sum = a + b; > return (int)(~(a ^ b) & (a ^ sum)) < 0; > } Yes, but rather like : bool overflow_bit2(unsigned char a, unsigned char b) { const unsigned char sum = a + b; return ((signed char)(~(a ^ b) & (a ^ sum))) < 0; } It still results in abysmal code, given that this should result in two or three instructions on most architectures. Are machine code insertions an option? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Option to make unsigned->signed conversion always well-defined? 2011-10-07 22:24 ` Florian Weimer @ 2011-10-07 22:48 ` Ulf Magnusson 2011-10-07 23:36 ` Florian Weimer 0 siblings, 1 reply; 18+ messages in thread From: Ulf Magnusson @ 2011-10-07 22:48 UTC (permalink / raw) To: Florian Weimer; +Cc: gcc On Fri, Oct 7, 2011 at 7:35 PM, Florian Weimer <fw@deneb.enyo.de> wrote: > * Ulf Magnusson: > >> Are you thinking of something like this? >> >> bool overflow_bit2(unsigned int a, unsigned int b) { >> const unsigned int ashift = a << 24; >> const unsigned int bshift = b << 24; >> const unsigned int sum = a + b; >> return (int)(~(a ^ b) & (a ^ sum)) < 0; >> } > > Yes, but rather like : > > bool overflow_bit2(unsigned char a, unsigned char b) { > const unsigned char sum = a + b; > return ((signed char)(~(a ^ b) & (a ^ sum))) < 0; > } > > It still results in abysmal code, given that this should result in two > or three instructions on most architectures. > > Are machine code insertions an option? > Tried that version, but it seems to generate worse (or bigger anyway - haven't benchmarked it) code: 90: eb01 0c00 add.w ip, r1, r0 94: b2c2 uxtb r2, r0 96: ea82 030c eor.w r3, r2, ip 9a: ea82 0101 eor.w r1, r2, r1 9e: ea23 0001 bic.w r0, r3, r1 a2: f3c0 10c0 ubfx r0, r0, #7, #1 a6: 4770 bx lr a8: f3af 8000 nop.w ac: f3af 8000 nop.w Good machine code would be fun to see, though I might need to brush up on my ARM. /Ulf ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Option to make unsigned->signed conversion always well-defined? 2011-10-07 22:48 ` Ulf Magnusson @ 2011-10-07 23:36 ` Florian Weimer 0 siblings, 0 replies; 18+ messages in thread From: Florian Weimer @ 2011-10-07 23:36 UTC (permalink / raw) To: Ulf Magnusson; +Cc: gcc * Ulf Magnusson: > Good machine code would be fun to see, though I might need to brush up > on my ARM. It turns out that ARM doesn't seem to have 8-bit overflow detection, so something like this has to be used (with the arguments in R1 and R2, result in the lower bit of R3): MOV R3, R1, LSL #24 ADDS R3, R2, R3, LSL #24 ADDVS R3, R3, #1 Not sure if this is correct, I don't really know ARM assembly. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Option to make unsigned->signed conversion always well-defined? @ 2011-10-06 18:24 Jeremy Hall 0 siblings, 0 replies; 18+ messages in thread From: Jeremy Hall @ 2011-10-06 18:24 UTC (permalink / raw) To: gcc Hi. Instead of all the clever bit twiddling I have used code similar to sum > UINT8_MAX which just generates cmp ax,255 seta al which seems to be far more efficient (even the signed version gets optimized down to the above single check). Please could someone tell me if I have missed something here???? Signed check: bool overflow(int16_t a, int16_t b) { const int16_t sum = a + b; return sum > INT8_MAX || sum < INT8_MIN; } Unsigned check bool overflow(uint16_t a, uint16_t b) { const uint16_t sum = a + b; return sum > UINT8_MAX; } Jeremy ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2011-10-07 19:36 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2011-10-05 22:56 Option to make unsigned->signed conversion always well-defined? Ulf Magnusson 2011-10-06 8:25 ` Ulf Magnusson 2011-10-06 9:04 ` Pedro Pedruzzi 2011-10-06 10:52 ` Ulf Magnusson 2011-10-06 14:42 ` Ulf Magnusson 2011-10-06 14:45 ` Miles Bader 2011-10-06 14:48 ` Ulf Magnusson 2011-10-06 21:31 ` Pedro Pedruzzi 2011-10-07 11:52 ` Miles Bader 2011-10-07 17:20 ` Pedro Pedruzzi 2011-10-07 17:35 ` Miles Bader [not found] ` <4E8DBF3C.1020700@redhat.com> 2011-10-06 22:08 ` Ulf Magnusson 2011-10-06 22:46 ` Florian Weimer 2011-10-07 19:36 ` Ulf Magnusson 2011-10-07 22:24 ` Florian Weimer 2011-10-07 22:48 ` Ulf Magnusson 2011-10-07 23:36 ` Florian Weimer 2011-10-06 18:24 Jeremy Hall
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).