* optimizer discards sign information @ 2024-04-10 8:52 stefan 2024-04-10 9:16 ` Alexander Monakov 0 siblings, 1 reply; 20+ messages in thread From: stefan @ 2024-04-10 8:52 UTC (permalink / raw) To: gcc-help Hi all, I just stumbled over an issue, which is present in almost all gcc versions. I worked around using inline assembly Maybe gcc behaves correct and I am wrong? Here is the code: https://godbolt.org/z/cW8jcdh56 typedef unsigned long long int u64; typedef unsigned int u32; typedef unsigned short u16; u64 foo(u16 a, u16 b) { u32 x = a * b; u64 r = x; return r; } And on gcc 13.2 x86.64 you get foo: movzx esi, si movzx edi, di imul edi, esi movsx rax, edi ret There is a sign extension! The optimizer step discards the information x_6 = (u32) _3; and uses _3 directly instead, which is signed. Am I wrong or is it gcc? Since this is a tree pass, all targets are affected. And its already present in gcc-6.5 Best regards Stefan ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: optimizer discards sign information 2024-04-10 8:52 optimizer discards sign information stefan @ 2024-04-10 9:16 ` Alexander Monakov 2024-04-10 9:19 ` Xi Ruoyao 2024-04-10 9:24 ` stefan 0 siblings, 2 replies; 20+ messages in thread From: Alexander Monakov @ 2024-04-10 9:16 UTC (permalink / raw) To: stefan; +Cc: gcc-help [-- Attachment #1: Type: text/plain, Size: 1446 bytes --] On Wed, 10 Apr 2024, stefan@franke.ms wrote: > Hi all, > > I just stumbled over an issue, which is present in almost all gcc versions. > I worked around using inline assembly… > Maybe gcc behaves correct and I am wrong? Here is the code: > > https://godbolt.org/z/cW8jcdh56 > > typedef unsigned long long int u64; > typedef unsigned int u32; > typedef unsigned short u16; > > u64 foo(u16 a, u16 b) { > u32 x = a * b; > u64 r = x; > return r; > } > > And on gcc 13.2 x86.64 you get > > foo: > movzx esi, si > movzx edi, di > imul edi, esi > movsx rax, edi > ret > > > There is a sign extension! The optimizer step discards the information > > x_6 = (u32) _3; > > and uses _3 directly instead, which is signed. > > Am I wrong or is it gcc? GCC is not wrong. When your code computes x: u32 x = a * b; 'a' and 'b' are first promoted to int according to C language rules, and the multiplication happens in the signed int type, with UB on overflow. The compiler deduces the range of signed int temporary holding the result of the multiplication is [0, 0x7fffffff], which allows to propagate it to the assignment of 'r' (which in the end produces a sign extension, as you observed, so the propagation did not turn out to be useful). u16 * u16 is a famous footgun for sure. I'd suggest 'x = 1u * a * b' as a fix for the code. Alexander ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: optimizer discards sign information 2024-04-10 9:16 ` Alexander Monakov @ 2024-04-10 9:19 ` Xi Ruoyao 2024-04-10 9:40 ` LIU Hao 2024-04-10 9:24 ` stefan 1 sibling, 1 reply; 20+ messages in thread From: Xi Ruoyao @ 2024-04-10 9:19 UTC (permalink / raw) To: Alexander Monakov, stefan; +Cc: gcc-help On Wed, 2024-04-10 at 12:16 +0300, Alexander Monakov wrote: > > On Wed, 10 Apr 2024, stefan@franke.ms wrote: > > > Hi all, > > > > I just stumbled over an issue, which is present in almost all gcc versions. > > I worked around using inline assembly… > > Maybe gcc behaves correct and I am wrong? Here is the code: > > > > https://godbolt.org/z/cW8jcdh56 > > > > typedef unsigned long long int u64; > > typedef unsigned int u32; > > typedef unsigned short u16; > > > > u64 foo(u16 a, u16 b) { > > u32 x = a * b; > > u64 r = x; > > return r; > > } > > > > And on gcc 13.2 x86.64 you get > > > > foo: > > movzx esi, si > > movzx edi, di > > imul edi, esi > > movsx rax, edi > > ret > > > > > > There is a sign extension! The optimizer step discards the information > > > > x_6 = (u32) _3; > > > > and uses _3 directly instead, which is signed. > > > > Am I wrong or is it gcc? > > GCC is not wrong. When your code computes x: > > u32 x = a * b; > > 'a' and 'b' are first promoted to int according to C language rules, and > the multiplication happens in the signed int type, with UB on overflow. > The compiler deduces the range of signed int temporary holding the result > of the multiplication is [0, 0x7fffffff], which allows to propagate it > to the assignment of 'r' (which in the end produces a sign extension, > as you observed, so the propagation did not turn out to be useful). > > u16 * u16 is a famous footgun for sure. I'd suggest 'x = 1u * a * b' > as a fix for the code. Also note that -fsanitize=undefined detects the issue properly: $ cat t.c typedef unsigned long long int u64; typedef unsigned int u32; typedef unsigned short u16; __attribute__((noipa)) u64 foo(u16 a, u16 b) { u32 x = a * b; u64 r = x; return r; } int main() { __builtin_printf("%llx\n", foo(65535, 65535)); } $ cc t.c -O2 -fsanitize=undefined $ ./a.out t.c:7:15: runtime error: signed integer overflow: 65535 * 65535 cannot be represented in type 'int' fffe0001 -- Xi Ruoyao <xry111@xry111.site> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: optimizer discards sign information 2024-04-10 9:19 ` Xi Ruoyao @ 2024-04-10 9:40 ` LIU Hao 2024-04-10 9:44 ` Xi Ruoyao 0 siblings, 1 reply; 20+ messages in thread From: LIU Hao @ 2024-04-10 9:40 UTC (permalink / raw) To: Xi Ruoyao, Alexander Monakov, stefan; +Cc: gcc-help [-- Attachment #1.1: Type: text/plain, Size: 839 bytes --] 在 2024-04-10 17:19, Xi Ruoyao via Gcc-help 写道: > $ cc t.c -O2 -fsanitize=undefined > $ ./a.out > t.c:7:15: runtime error: signed integer overflow: 65535 * 65535 cannot be represented in type 'int' > fffe0001 Undefined behavior is not a valid point, as it never happens. It's a real bug. There are many PRs on bugzilla. The sign extension could have been eliminated completely: movzx eax, si movzx edi, di imul eax, edi ret And here is a similar issue: typedef unsigned long long int u64; typedef unsigned int u32; u64 foo(u64 a) { return (u32) __builtin_ctzll(a); } which results in xor eax, eax rep bsf rax, rdi // effectively `tzcnt rax, rdi` cdqe // unnecessary sign-extension ret -- Best regards, LIU Hao [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 840 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: optimizer discards sign information 2024-04-10 9:40 ` LIU Hao @ 2024-04-10 9:44 ` Xi Ruoyao 2024-04-10 9:51 ` LIU Hao 0 siblings, 1 reply; 20+ messages in thread From: Xi Ruoyao @ 2024-04-10 9:44 UTC (permalink / raw) To: LIU Hao, Alexander Monakov, stefan; +Cc: gcc-help On Wed, 2024-04-10 at 17:40 +0800, LIU Hao wrote: > 在 2024-04-10 17:19, Xi Ruoyao via Gcc-help 写道: > > $ cc t.c -O2 -fsanitize=undefined > > $ ./a.out > > t.c:7:15: runtime error: signed integer overflow: 65535 * 65535 > > cannot be represented in type 'int' > > fffe0001 > > Undefined behavior is not a valid point, as it never happens. You only get a "different result" when an undefined behavior happens, thus it **is** a valid point to say there is no wrong-code issue. > It's a real bug. There are many PRs on bugzilla. You may argue it's a missed-optimization, but we were discussing about wrong-code or not. > The sign extension could have been eliminated completely: > > movzx eax, si > movzx edi, di > imul eax, edi > ret > > > And here is a similar issue: > > typedef unsigned long long int u64; > typedef unsigned int u32; > > u64 foo(u64 a) { > return (u32) __builtin_ctzll(a); > } > > which results in > > xor eax, eax > rep bsf rax, rdi // effectively `tzcnt rax, rdi` > cdqe // unnecessary sign-extension > ret -- Xi Ruoyao <xry111@xry111.site> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: optimizer discards sign information 2024-04-10 9:44 ` Xi Ruoyao @ 2024-04-10 9:51 ` LIU Hao 2024-04-10 9:52 ` Xi Ruoyao 2024-04-10 10:03 ` AW: " stefan 0 siblings, 2 replies; 20+ messages in thread From: LIU Hao @ 2024-04-10 9:51 UTC (permalink / raw) To: Xi Ruoyao, Alexander Monakov, stefan; +Cc: gcc-help [-- Attachment #1.1: Type: text/plain, Size: 549 bytes --] 在 2024-04-10 17:44, Xi Ruoyao 写道: > You only get a "different result" when an undefined behavior happens, > thus it **is** a valid point to say there is no wrong-code issue. > >> It's a real bug. There are many PRs on bugzilla. > > You may argue it's a missed-optimization, but we were discussing about > wrong-code or not. Nobody in this thread has been thinking it's wrong code. When there is no overflow, it's missed optimization. When there is an overflow, I don't care. Do you agree? -- Best regards, LIU Hao [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 840 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: optimizer discards sign information 2024-04-10 9:51 ` LIU Hao @ 2024-04-10 9:52 ` Xi Ruoyao 2024-04-10 10:07 ` LIU Hao 2024-04-10 10:03 ` AW: " stefan 1 sibling, 1 reply; 20+ messages in thread From: Xi Ruoyao @ 2024-04-10 9:52 UTC (permalink / raw) To: LIU Hao, Alexander Monakov, stefan; +Cc: gcc-help On Wed, 2024-04-10 at 17:51 +0800, LIU Hao wrote: > 在 2024-04-10 17:44, Xi Ruoyao 写道: > > You only get a "different result" when an undefined behavior > > happens, > > thus it **is** a valid point to say there is no wrong-code issue. > > > > > It's a real bug. There are many PRs on bugzilla. > > > > You may argue it's a missed-optimization, but we were discussing > > about > > wrong-code or not. > > Nobody in this thread has been thinking it's wrong code. No, the OP is still thinking it's a wrong-code. See the latest reply from him. -- Xi Ruoyao <xry111@xry111.site> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: optimizer discards sign information 2024-04-10 9:52 ` Xi Ruoyao @ 2024-04-10 10:07 ` LIU Hao 2024-04-10 10:17 ` Xi Ruoyao 0 siblings, 1 reply; 20+ messages in thread From: LIU Hao @ 2024-04-10 10:07 UTC (permalink / raw) To: Xi Ruoyao; +Cc: gcc-help [-- Attachment #1.1: Type: text/plain, Size: 571 bytes --] 在 2024-04-10 17:52, Xi Ruoyao via Gcc-help 写道: > No, the OP is still thinking it's a wrong-code. Would you read the Subject please? u32 x = a * b; u64 r = x; return r; This is same as u32 x = (int) a * (int) b; u64 r = x; return r; and return (u64)(u32) ((int) a * (int) b); The code requests an `int` be zero-extended to a `u64` (if the result is written to EAX then this is no-op), but GCC performs sign extension anyway. Do you still consider it not a bug? -- Best regards, LIU Hao [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 840 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: optimizer discards sign information 2024-04-10 10:07 ` LIU Hao @ 2024-04-10 10:17 ` Xi Ruoyao 0 siblings, 0 replies; 20+ messages in thread From: Xi Ruoyao @ 2024-04-10 10:17 UTC (permalink / raw) To: LIU Hao; +Cc: gcc-help On Wed, 2024-04-10 at 18:07 +0800, LIU Hao wrote: > 在 2024-04-10 17:52, Xi Ruoyao via Gcc-help 写道: > > No, the OP is still thinking it's a wrong-code. > > Would you read the Subject please? > > > u32 x = a * b; > u64 r = x; > return r; > > This is same as > > u32 x = (int) a * (int) b; > u64 r = x; > return r; > > and > > return (u64)(u32) ((int) a * (int) b); > > > The code requests an `int` be zero-extended to a `u64` (if the result is written to EAX then this is > no-op), but GCC performs sign extension anyway. Do you still consider it not a bug? It is a missed-optimization, but not wrong code. The code does not requests an *arbitrary* int to be zero-extended to a u64. It requests a value which can be proven as non-negative to be zero-extended. Thus doing a sign-extension may be sub-optimal (depending on the context and the target feature etc), but not wrong. u32 x = (int) a * (int) b; Here a and b are u16, so both (int) a and (int) b are non-negative. And since signed-overflow is UB, x is non-negative too. And for a non- negative int x, (u64)x is just same as (u64)(u32)x. -- Xi Ruoyao <xry111@xry111.site> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 20+ messages in thread
* AW: optimizer discards sign information 2024-04-10 9:51 ` LIU Hao 2024-04-10 9:52 ` Xi Ruoyao @ 2024-04-10 10:03 ` stefan 2024-04-10 10:34 ` Xi Ruoyao 1 sibling, 1 reply; 20+ messages in thread From: stefan @ 2024-04-10 10:03 UTC (permalink / raw) To: gcc-help > -----Ursprüngliche Nachricht----- > Von: LIU Hao <lh_mouse@126.com> > Gesendet: Mittwoch, 10. April 2024 11:51 > An: Xi Ruoyao <xry111@xry111.site>; Alexander Monakov > <amonakov@ispras.ru>; stefan@franke.ms > Cc: gcc-help@gcc.gnu.org > Betreff: Re: optimizer discards sign information > > 在 2024-04-10 17:44, Xi Ruoyao 写道: > > You only get a "different result" when an undefined behavior happens, > > thus it **is** a valid point to say there is no wrong-code issue. > > > >> It's a real bug. There are many PRs on bugzilla. > > > > You may argue it's a missed-optimization, but we were discussing about > > wrong-code or not. > > Nobody in this thread has been thinking it's wrong code. > > When there is no overflow, it's missed optimization. When there is an > overflow, I don't care. > > Do you agree? > > > -- > Best regards, > LIU Hao Yes, there is an overflow when the value gets assigned to x u32 x = *a * *b; And after that line of code, x is a valid unsigned int, no matter what value was assigned. And the compiler must not throw away that unsignedness. Also an add can overflow: u64 faa(int a, int b) { u32 x = a + b; u64 r = x; And in this case the optimizer doesn't discard the variable x With the multiplication the optimizer kills it: int _5; _5 = _2 * _4; x_9 = (u32) _5; r_10 = (u64) x_9; becomes: int _5; _5 = _2 * _4; r_10 = (u64) _5; The cast to (u32) is an important information, which IMHO must not be discarded. Regards Stefan ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: AW: optimizer discards sign information 2024-04-10 10:03 ` AW: " stefan @ 2024-04-10 10:34 ` Xi Ruoyao 0 siblings, 0 replies; 20+ messages in thread From: Xi Ruoyao @ 2024-04-10 10:34 UTC (permalink / raw) To: stefan, gcc-help On Wed, 2024-04-10 at 12:03 +0200, stefan@franke.ms wrote: > Yes, there is an overflow when the value gets assigned to x > > u32 x = *a * *b; > > And after that line of code, x is a valid unsigned int, no matter what > value was assigned. And the compiler must not throw away that > unsignedness. If *a * *b does not overflow, x is a valid unsigned int. But if *a * *b overflows, x is not valid, at all. Its type does not matter. > Also an add can overflow: > > u64 faa(int a, int b) { > u32 x = a + b; > u64 r = x; > > And in this case the optimizer doesn't discard the variable x Only an overflow on *arithmetic operation* is undefined behavior. An overflow on *conversion* is not. In fact an "overflow on conversion" is even not referred as "overflow" in the standard. So in this case if a and b are both -1, x *must* be 0xfffffffdU, r *must* be 0xfffffffdULL, and there's no undefined behavior. So it will be incorrect (i.e. violating the standard, not "different from what a person thinks") to use a signed extension here, and the compiler does not do that. But for u16 a, b; u32 x = (int)a * (int)b; (int)a and (int)b must be non-negative, and since an overflow on multiplication is UB, (int)a * (int)b must be non-negative too. So it's valid (i.e. allowed by the standard, not "doing exactly what a person thinks") to use a signed extension (though maybe it's not optimal, and we may have a missed-optimization here). -- Xi Ruoyao <xry111@xry111.site> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 20+ messages in thread
* AW: optimizer discards sign information 2024-04-10 9:16 ` Alexander Monakov 2024-04-10 9:19 ` Xi Ruoyao @ 2024-04-10 9:24 ` stefan 2024-04-10 9:49 ` stefan 2024-04-10 11:52 ` David Brown 1 sibling, 2 replies; 20+ messages in thread From: stefan @ 2024-04-10 9:24 UTC (permalink / raw) To: gcc-help > -----Ursprüngliche Nachricht----- > Von: Alexander Monakov <amonakov@ispras.ru> > Gesendet: Mittwoch, 10. April 2024 11:17 > An: stefan@franke.ms > Cc: gcc-help@gcc.gnu.org > Betreff: Re: optimizer discards sign information > > > On Wed, 10 Apr 2024, stefan@franke.ms wrote: > > > Hi all, > > > > I just stumbled over an issue, which is present in almost all gcc versions. > > I worked around using inline assembly… Maybe gcc behaves correct and I > > am wrong? Here is the code: > > > > https://godbolt.org/z/cW8jcdh56 > > > > typedef unsigned long long int u64; > > typedef unsigned int u32; > > typedef unsigned short u16; > > > > u64 foo(u16 a, u16 b) { > > u32 x = a * b; > > u64 r = x; > > return r; > > } > > > > And on gcc 13.2 x86.64 you get > > > > foo: > > movzx esi, si > > movzx edi, di > > imul edi, esi > > movsx rax, edi > > ret > > > > > > There is a sign extension! The optimizer step discards the information > > > > x_6 = (u32) _3; > > > > and uses _3 directly instead, which is signed. > > > > Am I wrong or is it gcc? > > GCC is not wrong. When your code computes x: > > u32 x = a * b; > > 'a' and 'b' are first promoted to int according to C language rules, and the > multiplication happens in the signed int type, with UB on overflow. > The compiler deduces the range of signed int temporary holding the result of > the multiplication is [0, 0x7fffffff], which allows to propagate it to the > assignment of 'r' (which in the end produces a sign extension, as you > observed, so the propagation did not turn out to be useful). > > u16 * u16 is a famous footgun for sure. I'd suggest 'x = 1u * a * b' > as a fix for the code. > > Alexander Thank you for the fix 😊 I considered u32 x = a * b; as good enough, since from my understanding, x *is* unsigned. Adding the multiplication with 1u resolves this. regards Stefan ^ permalink raw reply [flat|nested] 20+ messages in thread
* AW: optimizer discards sign information 2024-04-10 9:24 ` stefan @ 2024-04-10 9:49 ` stefan 2024-04-10 9:54 ` Xi Ruoyao 2024-04-10 9:57 ` LIU Hao 2024-04-10 11:52 ` David Brown 1 sibling, 2 replies; 20+ messages in thread From: stefan @ 2024-04-10 9:49 UTC (permalink / raw) To: gcc-help > -----Ursprüngliche Nachricht----- > Von: Gcc-help <gcc-help-bounces+bebbo=bejy.net@gcc.gnu.org> Im Auftrag > von stefan@franke.ms > Gesendet: Mittwoch, 10. April 2024 11:25 > An: gcc-help@gcc.gnu.org > Betreff: AW: optimizer discards sign information > > > -----Ursprüngliche Nachricht----- > > Von: Alexander Monakov <amonakov@ispras.ru> > > Gesendet: Mittwoch, 10. April 2024 11:17 > > An: stefan@franke.ms > > Cc: gcc-help@gcc.gnu.org > > Betreff: Re: optimizer discards sign information > > > > > > On Wed, 10 Apr 2024, stefan@franke.ms wrote: > > > > > Hi all, > > > > > > I just stumbled over an issue, which is present in almost all gcc versions. > > > I worked around using inline assembly… Maybe gcc behaves correct and > > > I am wrong? Here is the code: > > > > > > https://godbolt.org/z/cW8jcdh56 > > > > > > typedef unsigned long long int u64; > > > typedef unsigned int u32; > > > typedef unsigned short u16; > > > > > > u64 foo(u16 a, u16 b) { > > > u32 x = a * b; > > > u64 r = x; > > > return r; > > > } > > > > > > And on gcc 13.2 x86.64 you get > > > > > > foo: > > > movzx esi, si > > > movzx edi, di > > > imul edi, esi > > > movsx rax, edi > > > ret > > > > > > > > > There is a sign extension! The optimizer step discards the > > > information > > > > > > x_6 = (u32) _3; > > > > > > and uses _3 directly instead, which is signed. > > > > > > Am I wrong or is it gcc? > > > > GCC is not wrong. When your code computes x: > > > > u32 x = a * b; > > > > 'a' and 'b' are first promoted to int according to C language rules, > > and the multiplication happens in the signed int type, with UB on overflow. > > The compiler deduces the range of signed int temporary holding the > > result of the multiplication is [0, 0x7fffffff], which allows to > > propagate it to the assignment of 'r' (which in the end produces a > > sign extension, as you observed, so the propagation did not turn out to be > useful). > > > > u16 * u16 is a famous footgun for sure. I'd suggest 'x = 1u * a * b' > > as a fix for the code. > > > > Alexander > > Thank you for the fix 😊 > I considered > > u32 x = a * b; > > as good enough, since from my understanding, x *is* unsigned. > > Adding the multiplication with 1u resolves this. > > regards > > Stefan But I keep considering this as a bug. And clang behaves correctly! https://godbolt.org/z/az8WqboET typedef unsigned long long int u64; typedef unsigned int u32; typedef unsigned short u16; u64 foo(u16 *a, u16 *b) { u32 x = *a * *b; u64 r = x; return r >> 31; } gcc yields foo: xor eax, eax ret clang yields foo: # @foo movzx ecx, word ptr [rdi] movzx eax, word ptr [rsi] imul eax, ecx shr eax, 31 ret regards Stefan ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: AW: optimizer discards sign information 2024-04-10 9:49 ` stefan @ 2024-04-10 9:54 ` Xi Ruoyao 2024-04-10 9:57 ` LIU Hao 1 sibling, 0 replies; 20+ messages in thread From: Xi Ruoyao @ 2024-04-10 9:54 UTC (permalink / raw) To: stefan, gcc-help On Wed, 2024-04-10 at 11:49 +0200, stefan@franke.ms wrote: > > But I keep considering this as a bug. And clang behaves correctly! No it is not. Both compilers are correct as anything can happen for an undefined behavior. > > https://godbolt.org/z/az8WqboET > > typedef unsigned long long int u64; > typedef unsigned int u32; > typedef unsigned short u16; > > u64 foo(u16 *a, u16 *b) { > u32 x = *a * *b; > u64 r = x; > return r >> 31; > } > > gcc yields > > foo: > xor eax, eax > ret In this case GCC can assume (*a * *b) must be in [0, 2147483647] because otherwise there is an undefined behavior. Thus r is in [0, 2147483647], so r >> 31 must be 0. -- Xi Ruoyao <xry111@xry111.site> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: AW: optimizer discards sign information 2024-04-10 9:49 ` stefan 2024-04-10 9:54 ` Xi Ruoyao @ 2024-04-10 9:57 ` LIU Hao 2024-04-10 10:03 ` Xi Ruoyao 1 sibling, 1 reply; 20+ messages in thread From: LIU Hao @ 2024-04-10 9:57 UTC (permalink / raw) To: stefan, gcc-help [-- Attachment #1.1: Type: text/plain, Size: 1001 bytes --] 在 2024-04-10 17:49, stefan@franke.ms 写道: > But I keep considering this as a bug. And clang behaves correctly! Yes there have been many reports [1]. It's a missed optimization. You may work around it by using 32-bit parameters, or casting either operand to u32; casting the result will not help. [1] https://gcc.gnu.org/bugzilla/buglist.cgi?bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=SUSPENDED&bug_status=WAITING&bug_status=REOPENED&cf_gcctarget=x86_64&cf_gcctarget_type=allwordssubstr&cf_known_to_fail_type=allwords&cf_known_to_work_type=allwords&f0=OP&f1=OP&f2=product&f3=component&f4=alias&f5=short_desc&f7=content&f8=CP&f9=CP&j1=OR&list_id=423756&o2=substring&o3=substring&o4=substring&o5=substring&o6=substring&o7=matches&query_format=advanced&short_desc=sign%20extension&short_desc_type=allwordssubstr&v2=sign-extension&v3=sign-extension&v4=sign-extension&v5=sign-extension&v6=sign-extension&v7="sign-extension" -- Best regards, LIU Hao [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 840 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: AW: optimizer discards sign information 2024-04-10 9:57 ` LIU Hao @ 2024-04-10 10:03 ` Xi Ruoyao 0 siblings, 0 replies; 20+ messages in thread From: Xi Ruoyao @ 2024-04-10 10:03 UTC (permalink / raw) To: LIU Hao, stefan, gcc-help On Wed, 2024-04-10 at 17:57 +0800, LIU Hao via Gcc-help wrote: > 在 2024-04-10 17:49, stefan@franke.ms 写道: > > But I keep considering this as a bug. And clang behaves correctly! > > Yes there have been many reports [1]. It's a missed optimization. Note that for this specific case: typedef unsigned long long int u64; typedef unsigned int u32; typedef unsigned short u16; u64 foo(u16 *a, u16 *b) { u32 x = *a * *b; u64 r = x; return r >> 31; } gcc yields foo: xor eax, eax ret clang yields foo: # @foo movzx ecx, word ptr [rdi] movzx eax, word ptr [rsi] imul eax, ecx shr eax, 31 ret It's actually a missed-optimization of **clang**. Optimizing this function to always return 0 **is** correct. But for the general case: u64 foo(u16 a, u16 b) { u32 x = a * b; u64 r = x; return r; } there is a missed-optimization of GCC (redundant sign extension). > You may work around it by using 32-bit parameters, or casting either > operand to u32; casting the result will not help. Indeed. -- Xi Ruoyao <xry111@xry111.site> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: AW: optimizer discards sign information 2024-04-10 9:24 ` stefan 2024-04-10 9:49 ` stefan @ 2024-04-10 11:52 ` David Brown 2024-04-10 14:25 ` Stefan Franke 1 sibling, 1 reply; 20+ messages in thread From: David Brown @ 2024-04-10 11:52 UTC (permalink / raw) To: stefan, gcc-help On 10/04/2024 11:24, stefan@franke.ms wrote: >> -----Ursprüngliche Nachricht----- >> Von: Alexander Monakov <amonakov@ispras.ru> >> Gesendet: Mittwoch, 10. April 2024 11:17 >> An: stefan@franke.ms >> Cc: gcc-help@gcc.gnu.org >> Betreff: Re: optimizer discards sign information >> >> >> On Wed, 10 Apr 2024, stefan@franke.ms wrote: >> >>> Hi all, >>> >>> I just stumbled over an issue, which is present in almost all gcc versions. >>> I worked around using inline assembly… Maybe gcc behaves correct and I >>> am wrong? Here is the code: >>> >>> https://godbolt.org/z/cW8jcdh56 >>> >>> typedef unsigned long long int u64; >>> typedef unsigned int u32; >>> typedef unsigned short u16; >>> >>> u64 foo(u16 a, u16 b) { >>> u32 x = a * b; >>> u64 r = x; >>> return r; >>> } >>> >>> And on gcc 13.2 x86.64 you get >>> >>> foo: >>> movzx esi, si >>> movzx edi, di >>> imul edi, esi >>> movsx rax, edi >>> ret >>> >>> >>> There is a sign extension! The optimizer step discards the information >>> >>> x_6 = (u32) _3; >>> >>> and uses _3 directly instead, which is signed. >>> >>> Am I wrong or is it gcc? >> >> GCC is not wrong. When your code computes x: >> >> u32 x = a * b; >> >> 'a' and 'b' are first promoted to int according to C language rules, and the >> multiplication happens in the signed int type, with UB on overflow. >> The compiler deduces the range of signed int temporary holding the result of >> the multiplication is [0, 0x7fffffff], which allows to propagate it to the >> assignment of 'r' (which in the end produces a sign extension, as you >> observed, so the propagation did not turn out to be useful). >> >> u16 * u16 is a famous footgun for sure. I'd suggest 'x = 1u * a * b' >> as a fix for the code. >> >> Alexander > > Thank you for the fix 😊 > I considered > > u32 x = a * b; > > as good enough, since from my understanding, x *is* unsigned. > > Adding the multiplication with 1u resolves this. > From the wording you use, I think perhaps you have (or had) two misunderstandings about the way C works here. First, when you have an expression "x = y", the type of "x" is irrelevant to the evaluation of the expression "y", its value, and the validity of the value. Secondly, if you hit undefined behaviour somewhere, /everything/ afterwards is undefined behaviour. You cannot "correct" it, or force it in some ways to establish properties about it. And the compiler can assume it does not happen (or you don't care about anything that might happen). This lets it optimise based on these assumptions. So the fact that "x" is declared as an unsigned type is basically irrelevant. The semantics of the code "u32 x = a * b;", for the type sizes of x86-64, are that the compiler must take the u16 value in "a" and convert it into a 32-bit "int" preserving its value. It does the same with "b". It knows these are two 32-bit signed ints with values between 0 and 0xffff. Then it is asked to multiply them. It knows the result does not overflow - because overflow in the operation would be UB. Thus it knows the result is between 0 and 0x7fff'ffff. This is then converted to a u32. The compiler can do this any way it likes, as long as the value is conserved for all possible values (0 to 0x7fff'ffff). Typically it will be a no-op. Then this is converted to a u64, and again the compiler can do so in any manner that is value conserving for all possible values. Since it knows the value is in the range 0 to 0x7fff'ffff, then either a sign-extension from 32-bit to 64-bit, or a zero extension, will work fine. (It could also explicitly set the upper 32-bit to 0, or bitwise-and the register with 0x0000'0000'7fff'ffff, or shift it 33 bits left then 33 bits right, or any other operation that gives the same result in the end. That's a question of optimisation and efficiency, not correctness.) If it were to matter whether the compiler used sign extension or zero extension, you would have already have executed undefined behaviour - and all bets are off. Thus it /doesn't/ matter which is used - the compiler can use either method, whichever it thinks is most efficient. (And if it picked the less efficient one, that's a missed optimisation bug, not a code correctness bug.) And if you try to rely on the effects of UB in your code, then your code has a bug. Different compilers and different options may give you different end results - undefined behaviour means /anything/ can happen, including the thing you wanted to happen! This particular case is a bit subtle, and it's easy to get caught out - after all, it's using unsigned types everywhere, and the common myth is that unsigned types don't have undefined behaviour. The reality is that it is /operations/ that have defined or undefined behaviour, not the types, and there are no arithmetic operations on types smaller than "int". "uint8_t" and "uint16_t" silently promote to "int" (on 32-bit or 64-bit systems), and operations on /int/ can have undefined behaviour. I hope that clears up any remaining misunderstandings - and I hope it didn't sound patronising about things that you already knew. David ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: AW: optimizer discards sign information 2024-04-10 11:52 ` David Brown @ 2024-04-10 14:25 ` Stefan Franke 2024-04-10 16:51 ` David Brown 2024-04-11 0:32 ` Oleg Endo 0 siblings, 2 replies; 20+ messages in thread From: Stefan Franke @ 2024-04-10 14:25 UTC (permalink / raw) To: gcc-help [-- Attachment #1: Type: text/plain, Size: 5828 bytes --] Am 10. April 2024 13:52:39 MESZ schrieb David Brown via Gcc-help <gcc-help@gcc.gnu.org>: >On 10/04/2024 11:24, stefan@franke.ms wrote: >>> -----Ursprüngliche Nachricht----- >>> Von: Alexander Monakov <amonakov@ispras.ru> >>> Gesendet: Mittwoch, 10. April 2024 11:17 >>> An: stefan@franke.ms >>> Cc: gcc-help@gcc.gnu.org >>> Betreff: Re: optimizer discards sign information >>> >>> >>> On Wed, 10 Apr 2024, stefan@franke.ms wrote: >>> >>>> Hi all, >>>> >>>> I just stumbled over an issue, which is present in almost all gcc versions. >>>> I worked around using inline assembly… Maybe gcc behaves correct and I >>>> am wrong? Here is the code: >>>> >>>> https://godbolt.org/z/cW8jcdh56 >>>> >>>> typedef unsigned long long int u64; >>>> typedef unsigned int u32; >>>> typedef unsigned short u16; >>>> >>>> u64 foo(u16 a, u16 b) { >>>> u32 x = a * b; >>>> u64 r = x; >>>> return r; >>>> } >>>> >>>> And on gcc 13.2 x86.64 you get >>>> >>>> foo: >>>> movzx esi, si >>>> movzx edi, di >>>> imul edi, esi >>>> movsx rax, edi >>>> ret >>>> >>>> >>>> There is a sign extension! The optimizer step discards the information >>>> >>>> x_6 = (u32) _3; >>>> >>>> and uses _3 directly instead, which is signed. >>>> >>>> Am I wrong or is it gcc? >>> >>> GCC is not wrong. When your code computes x: >>> >>> u32 x = a * b; >>> >>> 'a' and 'b' are first promoted to int according to C language rules, and the >>> multiplication happens in the signed int type, with UB on overflow. >>> The compiler deduces the range of signed int temporary holding the result of >>> the multiplication is [0, 0x7fffffff], which allows to propagate it to the >>> assignment of 'r' (which in the end produces a sign extension, as you >>> observed, so the propagation did not turn out to be useful). >>> >>> u16 * u16 is a famous footgun for sure. I'd suggest 'x = 1u * a * b' >>> as a fix for the code. >>> >>> Alexander >> >> Thank you for the fix 😊 >> I considered >> >> u32 x = a * b; >> >> as good enough, since from my understanding, x *is* unsigned. >> >> Adding the multiplication with 1u resolves this. >> > >From the wording you use, I think perhaps you have (or had) two misunderstandings about the way C works here. First, when you have an expression "x = y", the type of "x" is irrelevant to the evaluation of the expression "y", its value, and the validity of the value. > >Secondly, if you hit undefined behaviour somewhere, /everything/ afterwards is undefined behaviour. You cannot "correct" it, or force it in some ways to establish properties about it. And the compiler can assume it does not happen (or you don't care about anything that might happen). This lets it optimise based on these assumptions. > >So the fact that "x" is declared as an unsigned type is basically irrelevant. The semantics of the code "u32 x = a * b;", for the type sizes of x86-64, are that the compiler must take the u16 value in "a" and convert it into a 32-bit "int" preserving its value. It does the same with "b". It knows these are two 32-bit signed ints with values between 0 and 0xffff. > >Then it is asked to multiply them. It knows the result does not overflow - because overflow in the operation would be UB. Thus it knows the result is between 0 and 0x7fff'ffff. > >This is then converted to a u32. The compiler can do this any way it likes, as long as the value is conserved for all possible values (0 to 0x7fff'ffff). Typically it will be a no-op. > >Then this is converted to a u64, and again the compiler can do so in any manner that is value conserving for all possible values. Since it knows the value is in the range 0 to 0x7fff'ffff, then either a sign-extension from 32-bit to 64-bit, or a zero extension, will work fine. (It could also explicitly set the upper 32-bit to 0, or bitwise-and the register with 0x0000'0000'7fff'ffff, or shift it 33 bits left then 33 bits right, or any other operation that gives the same result in the end. That's a question of optimisation and efficiency, not correctness.) > >If it were to matter whether the compiler used sign extension or zero extension, you would have already have executed undefined behaviour - and all bets are off. Thus it /doesn't/ matter which is used - the compiler can use either method, whichever it thinks is most efficient. (And if it picked the less efficient one, that's a missed optimisation bug, not a code correctness bug.) > >And if you try to rely on the effects of UB in your code, then your code has a bug. Different compilers and different options may give you different end results - undefined behaviour means /anything/ can happen, including the thing you wanted to happen! > > >This particular case is a bit subtle, and it's easy to get caught out - after all, it's using unsigned types everywhere, and the common myth is that unsigned types don't have undefined behaviour. The reality is that it is /operations/ that have defined or undefined behaviour, not the types, and there are no arithmetic operations on types smaller than "int". "uint8_t" and "uint16_t" silently promote to "int" (on 32-bit or 64-bit systems), and operations on /int/ can have undefined behaviour. > > >I hope that clears up any remaining misunderstandings - and I hope it didn't sound patronising about things that you already knew. > >David > > Thank you for the detailed explanation. It still doesn't fit together, since if the compiler knows that it's unsigned, why does it emit code for sign extension on the m68k target? Well. Since there are enough explanations and my expectations are simply wrong consider this as closed. Kind regards Stefan ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: AW: optimizer discards sign information 2024-04-10 14:25 ` Stefan Franke @ 2024-04-10 16:51 ` David Brown 2024-04-11 0:32 ` Oleg Endo 1 sibling, 0 replies; 20+ messages in thread From: David Brown @ 2024-04-10 16:51 UTC (permalink / raw) To: Stefan Franke, gcc-help On 10/04/2024 16:25, Stefan Franke wrote: > > Thank you for the detailed explanation. I'm glad if it helped you a little. > It still doesn't fit together, since if the compiler knows that it's unsigned, why does it emit code for sign extension on the m68k target? > It's nothing more than a sub-optimal code generation issue, since sign extension and zero extension are equally correct. gcc's code generation is pretty good by any standard, but it is not always perfectly optimal. Maybe it's hard to improve this one without a lot of effort or knock-on effects. Maybe the circumstances occur so rarely it is not worth the effort. Maybe the sign extension is actually faster (or no slower) on some targets - sometimes real-world speeds are not what we expect by looking at the assembly. (I thought you were using x86-64, not m68k? The m68k target is not nearly as popular as it used to be in the early days of gcc. My first use of gcc was building a cross-compiler of gcc 2.7 to run on DOS for a 68332 target. Happy days :-) ) > Well. Since there are enough explanations and my expectations are simply wrong consider this as closed. > > Kind regards > > Stefan > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: AW: optimizer discards sign information 2024-04-10 14:25 ` Stefan Franke 2024-04-10 16:51 ` David Brown @ 2024-04-11 0:32 ` Oleg Endo 1 sibling, 0 replies; 20+ messages in thread From: Oleg Endo @ 2024-04-11 0:32 UTC (permalink / raw) To: Stefan Franke, gcc-help On Wed, 2024-04-10 at 16:25 +0200, Stefan Franke wrote: > > Thank you for the detailed explanation. > It still doesn't fit together, since if the compiler knows that it's unsigned, why does it emit code for sign extension on the m68k target? > > Well. Since there are enough explanations and my expectations are simply wrong consider this as closed. > > By the way, on targets that can do widening multiplication the sign extensions are omitted. E.g. on SH -- https://godbolt.org/z/7hr4njEx8 _foo: mulu.w r4,r5 sts macl,r1 sts macl,r0 shll r1 rts subc r1,r1 Best regards, Oleg Endo ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2024-04-11 0:32 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2024-04-10 8:52 optimizer discards sign information stefan 2024-04-10 9:16 ` Alexander Monakov 2024-04-10 9:19 ` Xi Ruoyao 2024-04-10 9:40 ` LIU Hao 2024-04-10 9:44 ` Xi Ruoyao 2024-04-10 9:51 ` LIU Hao 2024-04-10 9:52 ` Xi Ruoyao 2024-04-10 10:07 ` LIU Hao 2024-04-10 10:17 ` Xi Ruoyao 2024-04-10 10:03 ` AW: " stefan 2024-04-10 10:34 ` Xi Ruoyao 2024-04-10 9:24 ` stefan 2024-04-10 9:49 ` stefan 2024-04-10 9:54 ` Xi Ruoyao 2024-04-10 9:57 ` LIU Hao 2024-04-10 10:03 ` Xi Ruoyao 2024-04-10 11:52 ` David Brown 2024-04-10 14:25 ` Stefan Franke 2024-04-10 16:51 ` David Brown 2024-04-11 0:32 ` Oleg Endo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).