public inbox for gcc-help@gcc.gnu.org
 help / color / mirror / Atom feed
* optimizer discards sign information
@ 2024-04-10  8:52 stefan
  2024-04-10  9:16 ` Alexander Monakov
  0 siblings, 1 reply; 20+ messages in thread
From: stefan @ 2024-04-10  8:52 UTC (permalink / raw)
  To: gcc-help

Hi all,

I just stumbled over an issue, which is present in almost all gcc versions.
I worked around using inline assembly…
Maybe gcc behaves correct and I am wrong? Here is the code:

https://godbolt.org/z/cW8jcdh56

typedef unsigned long long int u64;
typedef unsigned int u32;
typedef unsigned short u16;

u64 foo(u16 a, u16 b) {
    u32 x = a * b;
    u64 r = x;
    return r;
}

And on gcc 13.2 x86.64 you get

foo:
        movzx   esi, si
        movzx   edi, di
        imul    edi, esi
        movsx   rax, edi
        ret


There is a sign extension! The optimizer step discards the information

	 x_6 = (u32) _3;

and uses _3 directly instead, which is signed.

Am I wrong or is it gcc?

Since this is a tree pass, all targets are affected. And it’s already
present in gcc-6.5

Best regards

Stefan 



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimizer discards sign information
  2024-04-10  8:52 optimizer discards sign information stefan
@ 2024-04-10  9:16 ` Alexander Monakov
  2024-04-10  9:19   ` Xi Ruoyao
  2024-04-10  9:24   ` stefan
  0 siblings, 2 replies; 20+ messages in thread
From: Alexander Monakov @ 2024-04-10  9:16 UTC (permalink / raw)
  To: stefan; +Cc: gcc-help

[-- Attachment #1: Type: text/plain, Size: 1446 bytes --]


On Wed, 10 Apr 2024, stefan@franke.ms wrote:

> Hi all,
> 
> I just stumbled over an issue, which is present in almost all gcc versions.
> I worked around using inline assembly…
> Maybe gcc behaves correct and I am wrong? Here is the code:
> 
> https://godbolt.org/z/cW8jcdh56
> 
> typedef unsigned long long int u64;
> typedef unsigned int u32;
> typedef unsigned short u16;
> 
> u64 foo(u16 a, u16 b) {
>     u32 x = a * b;
>     u64 r = x;
>     return r;
> }
> 
> And on gcc 13.2 x86.64 you get
> 
> foo:
>         movzx   esi, si
>         movzx   edi, di
>         imul    edi, esi
>         movsx   rax, edi
>         ret
> 
> 
> There is a sign extension! The optimizer step discards the information
> 
> 	 x_6 = (u32) _3;
> 
> and uses _3 directly instead, which is signed.
> 
> Am I wrong or is it gcc?

GCC is not wrong. When your code computes x:

    u32 x = a * b;

'a' and 'b' are first promoted to int according to C language rules, and
the multiplication happens in the signed int type, with UB on overflow.
The compiler deduces the range of signed int temporary holding the result
of the multiplication is [0, 0x7fffffff], which allows to propagate it
to the assignment of 'r' (which in the end produces a sign extension,
as you observed, so the propagation did not turn out to be useful).

u16 * u16 is a famous footgun for sure. I'd suggest 'x = 1u * a * b'
as a fix for the code.

Alexander

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimizer discards sign information
  2024-04-10  9:16 ` Alexander Monakov
@ 2024-04-10  9:19   ` Xi Ruoyao
  2024-04-10  9:40     ` LIU Hao
  2024-04-10  9:24   ` stefan
  1 sibling, 1 reply; 20+ messages in thread
From: Xi Ruoyao @ 2024-04-10  9:19 UTC (permalink / raw)
  To: Alexander Monakov, stefan; +Cc: gcc-help

On Wed, 2024-04-10 at 12:16 +0300, Alexander Monakov wrote:
> 
> On Wed, 10 Apr 2024, stefan@franke.ms wrote:
> 
> > Hi all,
> > 
> > I just stumbled over an issue, which is present in almost all gcc versions.
> > I worked around using inline assembly…
> > Maybe gcc behaves correct and I am wrong? Here is the code:
> > 
> > https://godbolt.org/z/cW8jcdh56
> > 
> > typedef unsigned long long int u64;
> > typedef unsigned int u32;
> > typedef unsigned short u16;
> > 
> > u64 foo(u16 a, u16 b) {
> >     u32 x = a * b;
> >     u64 r = x;
> >     return r;
> > }
> > 
> > And on gcc 13.2 x86.64 you get
> > 
> > foo:
> >         movzx   esi, si
> >         movzx   edi, di
> >         imul    edi, esi
> >         movsx   rax, edi
> >         ret
> > 
> > 
> > There is a sign extension! The optimizer step discards the information
> > 
> > 	 x_6 = (u32) _3;
> > 
> > and uses _3 directly instead, which is signed.
> > 
> > Am I wrong or is it gcc?
> 
> GCC is not wrong. When your code computes x:
> 
>     u32 x = a * b;
> 
> 'a' and 'b' are first promoted to int according to C language rules, and
> the multiplication happens in the signed int type, with UB on overflow.
> The compiler deduces the range of signed int temporary holding the result
> of the multiplication is [0, 0x7fffffff], which allows to propagate it
> to the assignment of 'r' (which in the end produces a sign extension,
> as you observed, so the propagation did not turn out to be useful).
> 
> u16 * u16 is a famous footgun for sure. I'd suggest 'x = 1u * a * b'
> as a fix for the code.

Also note that -fsanitize=undefined detects the issue properly:

$ cat t.c
typedef unsigned long long int u64;
typedef unsigned int u32;
typedef unsigned short u16;

__attribute__((noipa))
u64 foo(u16 a, u16 b) {
    u32 x = a * b;
    u64 r = x;
    return r;
}

int main()
{
	__builtin_printf("%llx\n", foo(65535, 65535));
}
$ cc t.c -O2 -fsanitize=undefined
$ ./a.out
t.c:7:15: runtime error: signed integer overflow: 65535 * 65535 cannot be represented in type 'int'
fffe0001

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 20+ messages in thread

* AW: optimizer discards sign information
  2024-04-10  9:16 ` Alexander Monakov
  2024-04-10  9:19   ` Xi Ruoyao
@ 2024-04-10  9:24   ` stefan
  2024-04-10  9:49     ` stefan
  2024-04-10 11:52     ` David Brown
  1 sibling, 2 replies; 20+ messages in thread
From: stefan @ 2024-04-10  9:24 UTC (permalink / raw)
  To: gcc-help

> -----Ursprüngliche Nachricht-----
> Von: Alexander Monakov <amonakov@ispras.ru>
> Gesendet: Mittwoch, 10. April 2024 11:17
> An: stefan@franke.ms
> Cc: gcc-help@gcc.gnu.org
> Betreff: Re: optimizer discards sign information
> 
> 
> On Wed, 10 Apr 2024, stefan@franke.ms wrote:
> 
> > Hi all,
> >
> > I just stumbled over an issue, which is present in almost all gcc versions.
> > I worked around using inline assembly… Maybe gcc behaves correct and I
> > am wrong? Here is the code:
> >
> > https://godbolt.org/z/cW8jcdh56
> >
> > typedef unsigned long long int u64;
> > typedef unsigned int u32;
> > typedef unsigned short u16;
> >
> > u64 foo(u16 a, u16 b) {
> >     u32 x = a * b;
> >     u64 r = x;
> >     return r;
> > }
> >
> > And on gcc 13.2 x86.64 you get
> >
> > foo:
> >         movzx   esi, si
> >         movzx   edi, di
> >         imul    edi, esi
> >         movsx   rax, edi
> >         ret
> >
> >
> > There is a sign extension! The optimizer step discards the information
> >
> > 	 x_6 = (u32) _3;
> >
> > and uses _3 directly instead, which is signed.
> >
> > Am I wrong or is it gcc?
> 
> GCC is not wrong. When your code computes x:
> 
>     u32 x = a * b;
> 
> 'a' and 'b' are first promoted to int according to C language rules, and the
> multiplication happens in the signed int type, with UB on overflow.
> The compiler deduces the range of signed int temporary holding the result of
> the multiplication is [0, 0x7fffffff], which allows to propagate it to the
> assignment of 'r' (which in the end produces a sign extension, as you
> observed, so the propagation did not turn out to be useful).
> 
> u16 * u16 is a famous footgun for sure. I'd suggest 'x = 1u * a * b'
> as a fix for the code.
> 
> Alexander

Thank you for the fix 😊
I considered

	u32 x = a * b;

as good enough, since from my understanding, x *is* unsigned.

Adding the multiplication with 1u resolves this.

regards

Stefan



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimizer discards sign information
  2024-04-10  9:19   ` Xi Ruoyao
@ 2024-04-10  9:40     ` LIU Hao
  2024-04-10  9:44       ` Xi Ruoyao
  0 siblings, 1 reply; 20+ messages in thread
From: LIU Hao @ 2024-04-10  9:40 UTC (permalink / raw)
  To: Xi Ruoyao, Alexander Monakov, stefan; +Cc: gcc-help


[-- Attachment #1.1: Type: text/plain, Size: 839 bytes --]

在 2024-04-10 17:19, Xi Ruoyao via Gcc-help 写道:
> $ cc t.c -O2 -fsanitize=undefined
> $ ./a.out
> t.c:7:15: runtime error: signed integer overflow: 65535 * 65535 cannot be represented in type 'int'
> fffe0001

Undefined behavior is not a valid point, as it never happens. It's a real bug. There are many PRs on 
bugzilla.


The sign extension could have been eliminated completely:

    movzx eax, si
    movzx edi, di
    imul eax, edi
    ret


And here is a similar issue:

    typedef unsigned long long int u64;
    typedef unsigned int u32;

    u64 foo(u64 a) {
      return (u32) __builtin_ctzll(a);
    }

which results in

    xor eax, eax
    rep bsf rax, rdi   // effectively `tzcnt rax, rdi`
    cdqe               // unnecessary sign-extension
    ret


-- 
Best regards,
LIU Hao


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimizer discards sign information
  2024-04-10  9:40     ` LIU Hao
@ 2024-04-10  9:44       ` Xi Ruoyao
  2024-04-10  9:51         ` LIU Hao
  0 siblings, 1 reply; 20+ messages in thread
From: Xi Ruoyao @ 2024-04-10  9:44 UTC (permalink / raw)
  To: LIU Hao, Alexander Monakov, stefan; +Cc: gcc-help

On Wed, 2024-04-10 at 17:40 +0800, LIU Hao wrote:
> 在 2024-04-10 17:19, Xi Ruoyao via Gcc-help 写道:
> > $ cc t.c -O2 -fsanitize=undefined
> > $ ./a.out
> > t.c:7:15: runtime error: signed integer overflow: 65535 * 65535
> > cannot be represented in type 'int'
> > fffe0001
> 
> Undefined behavior is not a valid point, as it never happens.

You only get a "different result" when an undefined behavior happens,
thus it **is** a valid point to say there is no wrong-code issue.

> It's a real bug. There are many PRs on bugzilla.

You may argue it's a missed-optimization, but we were discussing about
wrong-code or not.

> The sign extension could have been eliminated completely:
> 
>     movzx eax, si
>     movzx edi, di
>     imul eax, edi
>     ret
> 
> 
> And here is a similar issue:
> 
>     typedef unsigned long long int u64;
>     typedef unsigned int u32;
> 
>     u64 foo(u64 a) {
>       return (u32) __builtin_ctzll(a);
>     }
> 
> which results in
> 
>     xor eax, eax
>     rep bsf rax, rdi   // effectively `tzcnt rax, rdi`
>     cdqe               // unnecessary sign-extension
>     ret

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 20+ messages in thread

* AW: optimizer discards sign information
  2024-04-10  9:24   ` stefan
@ 2024-04-10  9:49     ` stefan
  2024-04-10  9:54       ` Xi Ruoyao
  2024-04-10  9:57       ` LIU Hao
  2024-04-10 11:52     ` David Brown
  1 sibling, 2 replies; 20+ messages in thread
From: stefan @ 2024-04-10  9:49 UTC (permalink / raw)
  To: gcc-help

> -----Ursprüngliche Nachricht-----
> Von: Gcc-help <gcc-help-bounces+bebbo=bejy.net@gcc.gnu.org> Im Auftrag
> von stefan@franke.ms
> Gesendet: Mittwoch, 10. April 2024 11:25
> An: gcc-help@gcc.gnu.org
> Betreff: AW: optimizer discards sign information
> 
> > -----Ursprüngliche Nachricht-----
> > Von: Alexander Monakov <amonakov@ispras.ru>
> > Gesendet: Mittwoch, 10. April 2024 11:17
> > An: stefan@franke.ms
> > Cc: gcc-help@gcc.gnu.org
> > Betreff: Re: optimizer discards sign information
> >
> >
> > On Wed, 10 Apr 2024, stefan@franke.ms wrote:
> >
> > > Hi all,
> > >
> > > I just stumbled over an issue, which is present in almost all gcc versions.
> > > I worked around using inline assembly… Maybe gcc behaves correct and
> > > I am wrong? Here is the code:
> > >
> > > https://godbolt.org/z/cW8jcdh56
> > >
> > > typedef unsigned long long int u64;
> > > typedef unsigned int u32;
> > > typedef unsigned short u16;
> > >
> > > u64 foo(u16 a, u16 b) {
> > >     u32 x = a * b;
> > >     u64 r = x;
> > >     return r;
> > > }
> > >
> > > And on gcc 13.2 x86.64 you get
> > >
> > > foo:
> > >         movzx   esi, si
> > >         movzx   edi, di
> > >         imul    edi, esi
> > >         movsx   rax, edi
> > >         ret
> > >
> > >
> > > There is a sign extension! The optimizer step discards the
> > > information
> > >
> > > 	 x_6 = (u32) _3;
> > >
> > > and uses _3 directly instead, which is signed.
> > >
> > > Am I wrong or is it gcc?
> >
> > GCC is not wrong. When your code computes x:
> >
> >     u32 x = a * b;
> >
> > 'a' and 'b' are first promoted to int according to C language rules,
> > and the multiplication happens in the signed int type, with UB on overflow.
> > The compiler deduces the range of signed int temporary holding the
> > result of the multiplication is [0, 0x7fffffff], which allows to
> > propagate it to the assignment of 'r' (which in the end produces a
> > sign extension, as you observed, so the propagation did not turn out to be
> useful).
> >
> > u16 * u16 is a famous footgun for sure. I'd suggest 'x = 1u * a * b'
> > as a fix for the code.
> >
> > Alexander
> 
> Thank you for the fix 😊
> I considered
> 
> 	u32 x = a * b;
> 
> as good enough, since from my understanding, x *is* unsigned.
> 
> Adding the multiplication with 1u resolves this.
> 
> regards
> 
> Stefan

But I keep considering this as a bug. And clang behaves correctly!

https://godbolt.org/z/az8WqboET

typedef unsigned long long int u64;
typedef unsigned int u32;
typedef unsigned short u16;

u64 foo(u16 *a, u16 *b) {
    u32 x = *a * *b;
    u64 r = x;
    return r >> 31;
}

gcc yields

foo:
        xor     eax, eax
        ret

clang yields

foo:                                    # @foo
        movzx   ecx, word ptr [rdi]
        movzx   eax, word ptr [rsi]
        imul    eax, ecx
        shr     eax, 31
        ret

regards

Stefan





^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimizer discards sign information
  2024-04-10  9:44       ` Xi Ruoyao
@ 2024-04-10  9:51         ` LIU Hao
  2024-04-10  9:52           ` Xi Ruoyao
  2024-04-10 10:03           ` AW: " stefan
  0 siblings, 2 replies; 20+ messages in thread
From: LIU Hao @ 2024-04-10  9:51 UTC (permalink / raw)
  To: Xi Ruoyao, Alexander Monakov, stefan; +Cc: gcc-help


[-- Attachment #1.1: Type: text/plain, Size: 549 bytes --]

在 2024-04-10 17:44, Xi Ruoyao 写道:
> You only get a "different result" when an undefined behavior happens,
> thus it **is** a valid point to say there is no wrong-code issue.
> 
>> It's a real bug. There are many PRs on bugzilla.
> 
> You may argue it's a missed-optimization, but we were discussing about
> wrong-code or not.

Nobody in this thread has been thinking it's wrong code.

When there is no overflow, it's missed optimization. When there is an overflow, I don't care.

Do you agree?


-- 
Best regards,
LIU Hao


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimizer discards sign information
  2024-04-10  9:51         ` LIU Hao
@ 2024-04-10  9:52           ` Xi Ruoyao
  2024-04-10 10:07             ` LIU Hao
  2024-04-10 10:03           ` AW: " stefan
  1 sibling, 1 reply; 20+ messages in thread
From: Xi Ruoyao @ 2024-04-10  9:52 UTC (permalink / raw)
  To: LIU Hao, Alexander Monakov, stefan; +Cc: gcc-help

On Wed, 2024-04-10 at 17:51 +0800, LIU Hao wrote:
> 在 2024-04-10 17:44, Xi Ruoyao 写道:
> > You only get a "different result" when an undefined behavior
> > happens,
> > thus it **is** a valid point to say there is no wrong-code issue.
> > 
> > > It's a real bug. There are many PRs on bugzilla.
> > 
> > You may argue it's a missed-optimization, but we were discussing
> > about
> > wrong-code or not.
> 
> Nobody in this thread has been thinking it's wrong code.

No, the OP is still thinking it's a wrong-code.

See the latest reply from him.


-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: AW: optimizer discards sign information
  2024-04-10  9:49     ` stefan
@ 2024-04-10  9:54       ` Xi Ruoyao
  2024-04-10  9:57       ` LIU Hao
  1 sibling, 0 replies; 20+ messages in thread
From: Xi Ruoyao @ 2024-04-10  9:54 UTC (permalink / raw)
  To: stefan, gcc-help

On Wed, 2024-04-10 at 11:49 +0200, stefan@franke.ms wrote:

> 
> But I keep considering this as a bug. And clang behaves correctly!

No it is not.  Both compilers are correct as anything can happen for an
undefined behavior.

> 
> https://godbolt.org/z/az8WqboET
> 
> typedef unsigned long long int u64;
> typedef unsigned int u32;
> typedef unsigned short u16;
> 
> u64 foo(u16 *a, u16 *b) {
>     u32 x = *a * *b;
>     u64 r = x;
>     return r >> 31;
> }
> 
> gcc yields
> 
> foo:
>         xor     eax, eax
>         ret

In this case GCC can assume (*a * *b) must be in [0, 2147483647] because
otherwise there is an undefined behavior.  Thus r is in [0, 2147483647],
so r >> 31 must be 0.

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: AW: optimizer discards sign information
  2024-04-10  9:49     ` stefan
  2024-04-10  9:54       ` Xi Ruoyao
@ 2024-04-10  9:57       ` LIU Hao
  2024-04-10 10:03         ` Xi Ruoyao
  1 sibling, 1 reply; 20+ messages in thread
From: LIU Hao @ 2024-04-10  9:57 UTC (permalink / raw)
  To: stefan, gcc-help


[-- Attachment #1.1: Type: text/plain, Size: 1001 bytes --]

在 2024-04-10 17:49, stefan@franke.ms 写道:
> But I keep considering this as a bug. And clang behaves correctly!

Yes there have been many reports [1]. It's a missed optimization.

You may work around it by using 32-bit parameters, or casting either operand to u32; casting the 
result will not help.


[1] 
https://gcc.gnu.org/bugzilla/buglist.cgi?bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=SUSPENDED&bug_status=WAITING&bug_status=REOPENED&cf_gcctarget=x86_64&cf_gcctarget_type=allwordssubstr&cf_known_to_fail_type=allwords&cf_known_to_work_type=allwords&f0=OP&f1=OP&f2=product&f3=component&f4=alias&f5=short_desc&f7=content&f8=CP&f9=CP&j1=OR&list_id=423756&o2=substring&o3=substring&o4=substring&o5=substring&o6=substring&o7=matches&query_format=advanced&short_desc=sign%20extension&short_desc_type=allwordssubstr&v2=sign-extension&v3=sign-extension&v4=sign-extension&v5=sign-extension&v6=sign-extension&v7="sign-extension"


-- 
Best regards,
LIU Hao


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* AW: optimizer discards sign information
  2024-04-10  9:51         ` LIU Hao
  2024-04-10  9:52           ` Xi Ruoyao
@ 2024-04-10 10:03           ` stefan
  2024-04-10 10:34             ` Xi Ruoyao
  1 sibling, 1 reply; 20+ messages in thread
From: stefan @ 2024-04-10 10:03 UTC (permalink / raw)
  To: gcc-help



> -----Ursprüngliche Nachricht-----
> Von: LIU Hao <lh_mouse@126.com>
> Gesendet: Mittwoch, 10. April 2024 11:51
> An: Xi Ruoyao <xry111@xry111.site>; Alexander Monakov
> <amonakov@ispras.ru>; stefan@franke.ms
> Cc: gcc-help@gcc.gnu.org
> Betreff: Re: optimizer discards sign information
> 
> 在 2024-04-10 17:44, Xi Ruoyao 写道:
> > You only get a "different result" when an undefined behavior happens,
> > thus it **is** a valid point to say there is no wrong-code issue.
> >
> >> It's a real bug. There are many PRs on bugzilla.
> >
> > You may argue it's a missed-optimization, but we were discussing about
> > wrong-code or not.
> 
> Nobody in this thread has been thinking it's wrong code.
> 
> When there is no overflow, it's missed optimization. When there is an
> overflow, I don't care.
> 
> Do you agree?
> 
> 
> --
> Best regards,
> LIU Hao


Yes, there is an overflow when the value gets assigned to x

	u32 x = *a * *b;

And after that line of code, x is a valid unsigned int, no matter what value was assigned. And the compiler must not throw away that unsignedness.

Also an add can overflow:

u64 faa(int a, int b) {
    u32 x = a + b;
    u64 r = x;

And in this case the optimizer doesn't discard the variable x

With the multiplication the optimizer kills it:

int _5;

  _5 = _2 * _4;
  x_9 = (u32) _5;
  r_10 = (u64) x_9;

becomes:

  int _5;

  _5 = _2 * _4;
  r_10 = (u64) _5;

The cast to (u32) is an important information, which IMHO must not be discarded.


Regards

Stefan



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: AW: optimizer discards sign information
  2024-04-10  9:57       ` LIU Hao
@ 2024-04-10 10:03         ` Xi Ruoyao
  0 siblings, 0 replies; 20+ messages in thread
From: Xi Ruoyao @ 2024-04-10 10:03 UTC (permalink / raw)
  To: LIU Hao, stefan, gcc-help

On Wed, 2024-04-10 at 17:57 +0800, LIU Hao via Gcc-help wrote:
> 在 2024-04-10 17:49, stefan@franke.ms 写道:
> > But I keep considering this as a bug. And clang behaves correctly!
> 
> Yes there have been many reports [1]. It's a missed optimization.

Note that for this specific case:

   typedef unsigned long long int u64;
   typedef unsigned int u32;
   typedef unsigned short u16;
   
   u64 foo(u16 *a, u16 *b) {
       u32 x = *a * *b;
       u64 r = x;
       return r >> 31;
   }
   
   gcc yields
   
   foo:
           xor     eax, eax
           ret
   
   clang yields
   
   foo:                                    # @foo
           movzx   ecx, word ptr [rdi]
           movzx   eax, word ptr [rsi]
           imul    eax, ecx
           shr     eax, 31
           ret
   
It's actually a missed-optimization of **clang**.  Optimizing this
function to always return 0 **is** correct.

But for the general case:

u64 foo(u16 a, u16 b) {
    u32 x = a * b;
    u64 r = x;
    return r;
}

there is a missed-optimization of GCC (redundant sign extension).

> You may work around it by using 32-bit parameters, or casting either
> operand to u32; casting the result will not help.

Indeed.


-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimizer discards sign information
  2024-04-10  9:52           ` Xi Ruoyao
@ 2024-04-10 10:07             ` LIU Hao
  2024-04-10 10:17               ` Xi Ruoyao
  0 siblings, 1 reply; 20+ messages in thread
From: LIU Hao @ 2024-04-10 10:07 UTC (permalink / raw)
  To: Xi Ruoyao; +Cc: gcc-help


[-- Attachment #1.1: Type: text/plain, Size: 571 bytes --]

在 2024-04-10 17:52, Xi Ruoyao via Gcc-help 写道:
> No, the OP is still thinking it's a wrong-code.

Would you read the Subject please?


     u32 x = a * b;
     u64 r = x;
     return r;

This is same as

     u32 x = (int) a * (int) b;
     u64 r = x;
     return r;

and

     return (u64)(u32) ((int) a * (int) b);


The code requests an `int` be zero-extended to a `u64` (if the result is written to EAX then this is 
no-op), but GCC performs sign extension anyway. Do you still consider it not a bug?



-- 
Best regards,
LIU Hao


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimizer discards sign information
  2024-04-10 10:07             ` LIU Hao
@ 2024-04-10 10:17               ` Xi Ruoyao
  0 siblings, 0 replies; 20+ messages in thread
From: Xi Ruoyao @ 2024-04-10 10:17 UTC (permalink / raw)
  To: LIU Hao; +Cc: gcc-help

On Wed, 2024-04-10 at 18:07 +0800, LIU Hao wrote:
> 在 2024-04-10 17:52, Xi Ruoyao via Gcc-help 写道:
> > No, the OP is still thinking it's a wrong-code.
> 
> Would you read the Subject please?
> 
> 
>      u32 x = a * b;
>      u64 r = x;
>      return r;
> 
> This is same as
> 
>      u32 x = (int) a * (int) b;
>      u64 r = x;
>      return r;
> 
> and
> 
>      return (u64)(u32) ((int) a * (int) b);
> 
> 
> The code requests an `int` be zero-extended to a `u64` (if the result is written to EAX then this is 
> no-op), but GCC performs sign extension anyway. Do you still consider it not a bug?

It is a missed-optimization, but not wrong code.

The code does not requests an *arbitrary* int to be zero-extended to a
u64.  It requests a value which can be proven as non-negative to be
zero-extended.  Thus doing a sign-extension may be sub-optimal
(depending on the context and the target feature etc), but not wrong.

u32 x = (int) a * (int) b;

Here a and b are u16, so both (int) a and (int) b are non-negative.  And
since signed-overflow is UB, x is non-negative too.  And for a non-
negative int x, (u64)x is just same as (u64)(u32)x.

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: AW: optimizer discards sign information
  2024-04-10 10:03           ` AW: " stefan
@ 2024-04-10 10:34             ` Xi Ruoyao
  0 siblings, 0 replies; 20+ messages in thread
From: Xi Ruoyao @ 2024-04-10 10:34 UTC (permalink / raw)
  To: stefan, gcc-help

On Wed, 2024-04-10 at 12:03 +0200, stefan@franke.ms wrote:
> Yes, there is an overflow when the value gets assigned to x
> 
> 	u32 x = *a * *b;
> 
> And after that line of code, x is a valid unsigned int, no matter what
> value was assigned. And the compiler must not throw away that
> unsignedness.

If *a * *b does not overflow, x is a valid unsigned int.  But if *a * *b
overflows, x is not valid, at all.  Its type does not matter.

> Also an add can overflow:
> 
> u64 faa(int a, int b) {
>     u32 x = a + b;
>     u64 r = x;
> 
> And in this case the optimizer doesn't discard the variable x

Only an overflow on *arithmetic operation* is undefined behavior.  An
overflow on *conversion* is not.  In fact an "overflow on conversion" is
even not referred as "overflow" in the standard.

So in this case if a and b are both -1, x *must* be 0xfffffffdU, r
*must* be 0xfffffffdULL, and there's no undefined behavior.  So it will
be incorrect (i.e. violating the standard, not "different from what a
person thinks") to use a signed extension here, and the compiler does
not do that.

But for

u16 a, b;
u32 x = (int)a * (int)b;

(int)a and (int)b must be non-negative, and since an overflow on
multiplication is UB, (int)a * (int)b must be non-negative too.  So it's
valid (i.e. allowed by the standard, not "doing exactly what a person
thinks") to use a signed extension (though maybe it's not optimal, and
we may have a missed-optimization here).

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: AW: optimizer discards sign information
  2024-04-10  9:24   ` stefan
  2024-04-10  9:49     ` stefan
@ 2024-04-10 11:52     ` David Brown
  2024-04-10 14:25       ` Stefan Franke
  1 sibling, 1 reply; 20+ messages in thread
From: David Brown @ 2024-04-10 11:52 UTC (permalink / raw)
  To: stefan, gcc-help

On 10/04/2024 11:24, stefan@franke.ms wrote:
>> -----Ursprüngliche Nachricht-----
>> Von: Alexander Monakov <amonakov@ispras.ru>
>> Gesendet: Mittwoch, 10. April 2024 11:17
>> An: stefan@franke.ms
>> Cc: gcc-help@gcc.gnu.org
>> Betreff: Re: optimizer discards sign information
>>
>>
>> On Wed, 10 Apr 2024, stefan@franke.ms wrote:
>>
>>> Hi all,
>>>
>>> I just stumbled over an issue, which is present in almost all gcc versions.
>>> I worked around using inline assembly… Maybe gcc behaves correct and I
>>> am wrong? Here is the code:
>>>
>>> https://godbolt.org/z/cW8jcdh56
>>>
>>> typedef unsigned long long int u64;
>>> typedef unsigned int u32;
>>> typedef unsigned short u16;
>>>
>>> u64 foo(u16 a, u16 b) {
>>>      u32 x = a * b;
>>>      u64 r = x;
>>>      return r;
>>> }
>>>
>>> And on gcc 13.2 x86.64 you get
>>>
>>> foo:
>>>          movzx   esi, si
>>>          movzx   edi, di
>>>          imul    edi, esi
>>>          movsx   rax, edi
>>>          ret
>>>
>>>
>>> There is a sign extension! The optimizer step discards the information
>>>
>>> 	 x_6 = (u32) _3;
>>>
>>> and uses _3 directly instead, which is signed.
>>>
>>> Am I wrong or is it gcc?
>>
>> GCC is not wrong. When your code computes x:
>>
>>      u32 x = a * b;
>>
>> 'a' and 'b' are first promoted to int according to C language rules, and the
>> multiplication happens in the signed int type, with UB on overflow.
>> The compiler deduces the range of signed int temporary holding the result of
>> the multiplication is [0, 0x7fffffff], which allows to propagate it to the
>> assignment of 'r' (which in the end produces a sign extension, as you
>> observed, so the propagation did not turn out to be useful).
>>
>> u16 * u16 is a famous footgun for sure. I'd suggest 'x = 1u * a * b'
>> as a fix for the code.
>>
>> Alexander
> 
> Thank you for the fix 😊
> I considered
> 
> 	u32 x = a * b;
> 
> as good enough, since from my understanding, x *is* unsigned.
> 
> Adding the multiplication with 1u resolves this.
> 

 From the wording you use, I think perhaps you have (or had) two 
misunderstandings about the way C works here.  First, when you have an 
expression "x = y", the type of "x" is irrelevant to the evaluation of 
the expression "y", its value, and the validity of the value.

Secondly, if you hit undefined behaviour somewhere, /everything/ 
afterwards is undefined behaviour.  You cannot "correct" it, or force it 
in some ways to establish properties about it.  And the compiler can 
assume it does not happen (or you don't care about anything that might 
happen).  This lets it optimise based on these assumptions.

So the fact that "x" is declared as an unsigned type is basically 
irrelevant.  The semantics of the code "u32 x = a * b;", for the type 
sizes of x86-64, are that the compiler must take the u16 value in "a" 
and convert it into a 32-bit "int" preserving its value.  It does the 
same with "b".  It knows these are two 32-bit signed ints with values 
between 0 and 0xffff.

Then it is asked to multiply them.  It knows the result does not 
overflow - because overflow in the operation would be UB.  Thus it knows 
the result is between 0 and 0x7fff'ffff.

This is then converted to a u32.  The compiler can do this any way it 
likes, as long as the value is conserved for all possible values (0 to 
0x7fff'ffff).  Typically it will be a no-op.

Then this is converted to a u64, and again the compiler can do so in any 
manner that is value conserving for all possible values.  Since it knows 
the value is in the range 0 to 0x7fff'ffff, then either a sign-extension 
from 32-bit to 64-bit, or a zero extension, will work fine.  (It could 
also explicitly set the upper 32-bit to 0, or bitwise-and the register 
with 0x0000'0000'7fff'ffff, or shift it 33 bits left then 33 bits right, 
or any other operation that gives the same result in the end.  That's a 
question of optimisation and efficiency, not correctness.)

If it were to matter whether the compiler used sign extension or zero 
extension, you would have already have executed undefined behaviour - 
and all bets are off.  Thus it /doesn't/ matter which is used - the 
compiler can use either method, whichever it thinks is most efficient. 
(And if it picked the less efficient one, that's a missed optimisation 
bug, not a code correctness bug.)

And if you try to rely on the effects of UB in your code, then your code 
has a bug.  Different compilers and different options may give you 
different end results - undefined behaviour means /anything/ can happen, 
including the thing you wanted to happen!


This particular case is a bit subtle, and it's easy to get caught out - 
after all, it's using unsigned types everywhere, and the common myth is 
that unsigned types don't have undefined behaviour.  The reality is that 
it is /operations/ that have defined or undefined behaviour, not the 
types, and there are no arithmetic operations on types smaller than 
"int".  "uint8_t" and "uint16_t" silently promote to "int" (on 32-bit or 
64-bit systems), and operations on /int/ can have undefined behaviour.


I hope that clears up any remaining misunderstandings - and I hope it 
didn't sound patronising about things that you already knew.

David




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: AW: optimizer discards sign information
  2024-04-10 11:52     ` David Brown
@ 2024-04-10 14:25       ` Stefan Franke
  2024-04-10 16:51         ` David Brown
  2024-04-11  0:32         ` Oleg Endo
  0 siblings, 2 replies; 20+ messages in thread
From: Stefan Franke @ 2024-04-10 14:25 UTC (permalink / raw)
  To: gcc-help

[-- Attachment #1: Type: text/plain, Size: 5828 bytes --]



Am 10. April 2024 13:52:39 MESZ schrieb David Brown via Gcc-help <gcc-help@gcc.gnu.org>:
>On 10/04/2024 11:24, stefan@franke.ms wrote:
>>> -----Ursprüngliche Nachricht-----
>>> Von: Alexander Monakov <amonakov@ispras.ru>
>>> Gesendet: Mittwoch, 10. April 2024 11:17
>>> An: stefan@franke.ms
>>> Cc: gcc-help@gcc.gnu.org
>>> Betreff: Re: optimizer discards sign information
>>> 
>>> 
>>> On Wed, 10 Apr 2024, stefan@franke.ms wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I just stumbled over an issue, which is present in almost all gcc versions.
>>>> I worked around using inline assembly… Maybe gcc behaves correct and I
>>>> am wrong? Here is the code:
>>>> 
>>>> https://godbolt.org/z/cW8jcdh56
>>>> 
>>>> typedef unsigned long long int u64;
>>>> typedef unsigned int u32;
>>>> typedef unsigned short u16;
>>>> 
>>>> u64 foo(u16 a, u16 b) {
>>>>      u32 x = a * b;
>>>>      u64 r = x;
>>>>      return r;
>>>> }
>>>> 
>>>> And on gcc 13.2 x86.64 you get
>>>> 
>>>> foo:
>>>>          movzx   esi, si
>>>>          movzx   edi, di
>>>>          imul    edi, esi
>>>>          movsx   rax, edi
>>>>          ret
>>>> 
>>>> 
>>>> There is a sign extension! The optimizer step discards the information
>>>> 
>>>> 	 x_6 = (u32) _3;
>>>> 
>>>> and uses _3 directly instead, which is signed.
>>>> 
>>>> Am I wrong or is it gcc?
>>> 
>>> GCC is not wrong. When your code computes x:
>>> 
>>>      u32 x = a * b;
>>> 
>>> 'a' and 'b' are first promoted to int according to C language rules, and the
>>> multiplication happens in the signed int type, with UB on overflow.
>>> The compiler deduces the range of signed int temporary holding the result of
>>> the multiplication is [0, 0x7fffffff], which allows to propagate it to the
>>> assignment of 'r' (which in the end produces a sign extension, as you
>>> observed, so the propagation did not turn out to be useful).
>>> 
>>> u16 * u16 is a famous footgun for sure. I'd suggest 'x = 1u * a * b'
>>> as a fix for the code.
>>> 
>>> Alexander
>> 
>> Thank you for the fix 😊
>> I considered
>> 
>> 	u32 x = a * b;
>> 
>> as good enough, since from my understanding, x *is* unsigned.
>> 
>> Adding the multiplication with 1u resolves this.
>> 
>
>From the wording you use, I think perhaps you have (or had) two misunderstandings about the way C works here.  First, when you have an expression "x = y", the type of "x" is irrelevant to the evaluation of the expression "y", its value, and the validity of the value.
>
>Secondly, if you hit undefined behaviour somewhere, /everything/ afterwards is undefined behaviour.  You cannot "correct" it, or force it in some ways to establish properties about it.  And the compiler can assume it does not happen (or you don't care about anything that might happen).  This lets it optimise based on these assumptions.
>
>So the fact that "x" is declared as an unsigned type is basically irrelevant.  The semantics of the code "u32 x = a * b;", for the type sizes of x86-64, are that the compiler must take the u16 value in "a" and convert it into a 32-bit "int" preserving its value.  It does the same with "b".  It knows these are two 32-bit signed ints with values between 0 and 0xffff.
>
>Then it is asked to multiply them.  It knows the result does not overflow - because overflow in the operation would be UB.  Thus it knows the result is between 0 and 0x7fff'ffff.
>
>This is then converted to a u32.  The compiler can do this any way it likes, as long as the value is conserved for all possible values (0 to 0x7fff'ffff).  Typically it will be a no-op.
>
>Then this is converted to a u64, and again the compiler can do so in any manner that is value conserving for all possible values.  Since it knows the value is in the range 0 to 0x7fff'ffff, then either a sign-extension from 32-bit to 64-bit, or a zero extension, will work fine.  (It could also explicitly set the upper 32-bit to 0, or bitwise-and the register with 0x0000'0000'7fff'ffff, or shift it 33 bits left then 33 bits right, or any other operation that gives the same result in the end.  That's a question of optimisation and efficiency, not correctness.)
>
>If it were to matter whether the compiler used sign extension or zero extension, you would have already have executed undefined behaviour - and all bets are off.  Thus it /doesn't/ matter which is used - the compiler can use either method, whichever it thinks is most efficient. (And if it picked the less efficient one, that's a missed optimisation bug, not a code correctness bug.)
>
>And if you try to rely on the effects of UB in your code, then your code has a bug.  Different compilers and different options may give you different end results - undefined behaviour means /anything/ can happen, including the thing you wanted to happen!
>
>
>This particular case is a bit subtle, and it's easy to get caught out - after all, it's using unsigned types everywhere, and the common myth is that unsigned types don't have undefined behaviour.  The reality is that it is /operations/ that have defined or undefined behaviour, not the types, and there are no arithmetic operations on types smaller than "int".  "uint8_t" and "uint16_t" silently promote to "int" (on 32-bit or 64-bit systems), and operations on /int/ can have undefined behaviour.
>
>
>I hope that clears up any remaining misunderstandings - and I hope it didn't sound patronising about things that you already knew.
>
>David
>
>

Thank you for the detailed explanation.
It still doesn't fit together, since if the compiler knows that it's unsigned, why does it emit code for sign extension on the m68k target?

Well. Since there are enough explanations and my expectations are simply wrong consider this as closed.

Kind regards

Stefan 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: AW: optimizer discards sign information
  2024-04-10 14:25       ` Stefan Franke
@ 2024-04-10 16:51         ` David Brown
  2024-04-11  0:32         ` Oleg Endo
  1 sibling, 0 replies; 20+ messages in thread
From: David Brown @ 2024-04-10 16:51 UTC (permalink / raw)
  To: Stefan Franke, gcc-help

On 10/04/2024 16:25, Stefan Franke wrote:

> 
> Thank you for the detailed explanation.

I'm glad if it helped you a little.

> It still doesn't fit together, since if the compiler knows that it's unsigned, why does it emit code for sign extension on the m68k target?
> 

It's nothing more than a sub-optimal code generation issue, since sign 
extension and zero extension are equally correct.  gcc's code generation 
is pretty good by any standard, but it is not always perfectly optimal. 
Maybe it's hard to improve this one without a lot of effort or knock-on 
effects.  Maybe the circumstances occur so rarely it is not worth the 
effort.  Maybe the sign extension is actually faster (or no slower) on 
some targets - sometimes real-world speeds are not what we expect by 
looking at the assembly.

(I thought you were using x86-64, not m68k?  The m68k target is not 
nearly as popular as it used to be in the early days of gcc.  My first 
use of gcc was building a cross-compiler of gcc 2.7 to run on DOS for a 
68332 target.  Happy days :-) )

> Well. Since there are enough explanations and my expectations are simply wrong consider this as closed.
> 
> Kind regards
> 
> Stefan
> 
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: AW: optimizer discards sign information
  2024-04-10 14:25       ` Stefan Franke
  2024-04-10 16:51         ` David Brown
@ 2024-04-11  0:32         ` Oleg Endo
  1 sibling, 0 replies; 20+ messages in thread
From: Oleg Endo @ 2024-04-11  0:32 UTC (permalink / raw)
  To: Stefan Franke, gcc-help


On Wed, 2024-04-10 at 16:25 +0200, Stefan Franke wrote:
> 
> Thank you for the detailed explanation.
> It still doesn't fit together, since if the compiler knows that it's unsigned, why does it emit code for sign extension on the m68k target?
> 
> Well. Since there are enough explanations and my expectations are simply wrong consider this as closed.
> 
> 

By the way, on targets that can do widening multiplication the sign
extensions are omitted.  E.g. on SH -- https://godbolt.org/z/7hr4njEx8

_foo:
        mulu.w  r4,r5
        sts     macl,r1
        sts     macl,r0
        shll    r1
        rts     
        subc    r1,r1


Best regards,
Oleg Endo

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2024-04-11  0:32 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-10  8:52 optimizer discards sign information stefan
2024-04-10  9:16 ` Alexander Monakov
2024-04-10  9:19   ` Xi Ruoyao
2024-04-10  9:40     ` LIU Hao
2024-04-10  9:44       ` Xi Ruoyao
2024-04-10  9:51         ` LIU Hao
2024-04-10  9:52           ` Xi Ruoyao
2024-04-10 10:07             ` LIU Hao
2024-04-10 10:17               ` Xi Ruoyao
2024-04-10 10:03           ` AW: " stefan
2024-04-10 10:34             ` Xi Ruoyao
2024-04-10  9:24   ` stefan
2024-04-10  9:49     ` stefan
2024-04-10  9:54       ` Xi Ruoyao
2024-04-10  9:57       ` LIU Hao
2024-04-10 10:03         ` Xi Ruoyao
2024-04-10 11:52     ` David Brown
2024-04-10 14:25       ` Stefan Franke
2024-04-10 16:51         ` David Brown
2024-04-11  0:32         ` Oleg Endo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).