[PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed _

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
@ 2020-11-10 17:59 Stefan Kanthak
  2020-11-10 18:08 ` Eric Botcazou
                   ` (3 more replies)
  0 siblings, 4 replies; 32+ messages in thread
From: Stefan Kanthak @ 2020-11-10 17:59 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 381 bytes --]

The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() functions
is rather bad, it yields bad machine code at least on i386 and AMD64.
Since GCC knows how to shift integers twice the register size these functions
can be written as one-liners.

The implementation of the __bswapsi2() function uses SIGNED instead of
unsigned mask values; cf. __bswapdi2()

Stefan Kanthak

[-- Attachment #2: libgcc2.diff --]
[-- Type: application/octet-stream, Size: 2702 bytes --]

--- -/libgcc/libgcc2.h
+++ +/libgcc/libgcc2.h
@@ -391,7 +391,7 @@
 extern DWtype __negdi2 (DWtype);
 #endif
 
-extern DWtype __lshrdi3 (DWtype, shift_count_type);
+extern UDWtype __lshrdi3 (UDWtype, shift_count_type);
 extern DWtype __ashldi3 (DWtype, shift_count_type);
 extern DWtype __ashrdi3 (DWtype, shift_count_type);
 
--- -/libgcc/libgcc2.c
+++ +/libgcc/libgcc2.c
@@ -398,30 +398,10 @@
 /* Unless shift functions are defined with full ANSI prototypes,
    parameter b will be promoted to int if shift_count_type is smaller than an int.  */
 #ifdef L_lshrdi3
-DWtype
-__lshrdi3 (DWtype u, shift_count_type b)
+UDWtype
+__lshrdi3 (UDWtype u, shift_count_type b)
 {
-  if (b == 0)
-    return u;
-
-  const DWunion uu = {.ll = u};
-  const shift_count_type bm = W_TYPE_SIZE - b;
-  DWunion w;
-
-  if (bm <= 0)
-    {
-      w.s.high = 0;
-      w.s.low = (UWtype) uu.s.high >> -bm;
-    }
-  else
-    {
-      const UWtype carries = (UWtype) uu.s.high << bm;
-
-      w.s.high = (UWtype) uu.s.high >> b;
-      w.s.low = ((UWtype) uu.s.low >> b) | carries;
-    }
-
-  return w.ll;
+  return u >> b;
 }
 #endif
 
@@ -429,27 +409,7 @@
 DWtype
 __ashldi3 (DWtype u, shift_count_type b)
 {
-  if (b == 0)
-    return u;
-
-  const DWunion uu = {.ll = u};
-  const shift_count_type bm = W_TYPE_SIZE - b;
-  DWunion w;
-
-  if (bm <= 0)
-    {
-      w.s.low = 0;
-      w.s.high = (UWtype) uu.s.low << -bm;
-    }
-  else
-    {
-      const UWtype carries = (UWtype) uu.s.low >> bm;
-
-      w.s.low = (UWtype) uu.s.low << b;
-      w.s.high = ((UWtype) uu.s.high << b) | carries;
-    }
-
-  return w.ll;
+  return u << b;
 }
 #endif
 
@@ -457,28 +417,7 @@
 DWtype
 __ashrdi3 (DWtype u, shift_count_type b)
 {
-  if (b == 0)
-    return u;
-
-  const DWunion uu = {.ll = u};
-  const shift_count_type bm = W_TYPE_SIZE - b;
-  DWunion w;
-
-  if (bm <= 0)
-    {
-      /* w.s.high = 1..1 or 0..0 */
-      w.s.high = uu.s.high >> (W_TYPE_SIZE - 1);
-      w.s.low = uu.s.high >> -bm;
-    }
-  else
-    {
-      const UWtype carries = (UWtype) uu.s.high << bm;
-
-      w.s.high = uu.s.high >> b;
-      w.s.low = ((UWtype) uu.s.low >> b) | carries;
-    }
-
-  return w.ll;
+  return u >> b;
 }
 #endif
 \f
@@ -486,10 +425,10 @@
 SItype
 __bswapsi2 (SItype u)
 {
-  return ((((u) & 0xff000000) >> 24)
-	  | (((u) & 0x00ff0000) >>  8)
-	  | (((u) & 0x0000ff00) <<  8)
-	  | (((u) & 0x000000ff) << 24));
+  return ((((u) & 0xff000000u) >> 24)
+	  | (((u) & 0x00ff0000u) >>  8)
+	  | (((u) & 0x0000ff00u) <<  8)
+	  | (((u) & 0x000000ffu) << 24));
 }
 #endif
 #ifdef L_bswapdi2

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-10 17:59 [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function Stefan Kanthak
@ 2020-11-10 18:08 ` Eric Botcazou
  2020-11-10 19:44   ` Stefan Kanthak
  2020-11-10 18:09 ` Jakub Jelinek
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 32+ messages in thread
From: Eric Botcazou @ 2020-11-10 18:08 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: gcc-patches

> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() functions
> is rather bad, it yields bad machine code at least on i386 and AMD64. Since
> GCC knows how to shift integers twice the register size these functions can
> be written as one-liners.

These functions are precisely meant to be used when GCC cannot do that.

-- 
Eric Botcazou



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-10 17:59 [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function Stefan Kanthak
  2020-11-10 18:08 ` Eric Botcazou
@ 2020-11-10 18:09 ` Jakub Jelinek
  2020-11-10 18:32   ` Jeff Law
  2020-11-10 18:26 ` Jakub Jelinek
  2020-11-10 23:48 ` Jeff Law
  3 siblings, 1 reply; 32+ messages in thread
From: Jakub Jelinek @ 2020-11-10 18:09 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: gcc-patches

On Tue, Nov 10, 2020 at 06:59:30PM +0100, Stefan Kanthak via Gcc-patches wrote:
> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() functions
> is rather bad, it yields bad machine code at least on i386 and AMD64.
> Since GCC knows how to shift integers twice the register size these functions
> can be written as one-liners.

This looks wrong.  If gcc knows how to do that inline, it will not call the
out of line function at all.  The functions are there for the cases where
gcc can't do that.
So, your patch will shorten the routines on targets where those are never
called, and completely break them on any other (resulting in infinite
recursion there).

	Jakub


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-10 17:59 [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function Stefan Kanthak
  2020-11-10 18:08 ` Eric Botcazou
  2020-11-10 18:09 ` Jakub Jelinek
@ 2020-11-10 18:26 ` Jakub Jelinek
  2020-11-10 23:48 ` Jeff Law
  3 siblings, 0 replies; 32+ messages in thread
From: Jakub Jelinek @ 2020-11-10 18:26 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: gcc-patches

On Tue, Nov 10, 2020 at 06:59:30PM +0100, Stefan Kanthak via Gcc-patches wrote:
> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() functions
> is rather bad, it yields bad machine code at least on i386 and AMD64.
> Since GCC knows how to shift integers twice the register size these functions
> can be written as one-liners.
> 
> The implementation of the __bswapsi2() function uses SIGNED instead of
> unsigned mask values; cf. __bswapdi2()

The 0xff000000 constant is unsigned, the others are signed, but that doesn't
really matter as the high bits are masked off.

	Jakub


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-10 18:09 ` Jakub Jelinek
@ 2020-11-10 18:32   ` Jeff Law
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff Law @ 2020-11-10 18:32 UTC (permalink / raw)
  To: Jakub Jelinek, Stefan Kanthak; +Cc: gcc-patches


On 11/10/20 11:09 AM, Jakub Jelinek via Gcc-patches wrote:
> On Tue, Nov 10, 2020 at 06:59:30PM +0100, Stefan Kanthak via Gcc-patches wrote:
>> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() functions
>> is rather bad, it yields bad machine code at least on i386 and AMD64.
>> Since GCC knows how to shift integers twice the register size these functions
>> can be written as one-liners.
> This looks wrong.  If gcc knows how to do that inline, it will not call the
> out of line function at all.  The functions are there for the cases where
> gcc can't do that.
> So, your patch will shorten the routines on targets where those are never
> called, and completely break them on any other (resulting in infinite
> recursion there).

There's a way to test that.  It's possible the expanders know how to
handle these cases now.  I'd be surprised (as I was with the [u]cmpdi2),
but it's possible.  I can throw it into the tester which will spin all
those pesky embedded targets.  If we get infinite recursion it should
show up quite clearly.


Of course, if the expanders now handle all the cases, then these
routines in libgcc2 are dead code and this is a fairly pointless exercise.


jeff


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-10 18:08 ` Eric Botcazou
@ 2020-11-10 19:44   ` Stefan Kanthak
  2020-11-10 20:14     ` Jakub Jelinek
                       ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Stefan Kanthak @ 2020-11-10 19:44 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches

Eric Botcazou <botcazou@adacore.com> wrote:

>> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() functions
>> is rather bad, it yields bad machine code at least on i386 and AMD64. Since
>> GCC knows how to shift integers twice the register size these functions can
>> be written as one-liners.
> 
> These functions are precisely meant to be used when GCC cannot do that.

On which processor(s) is GCC unable to generate code for DWtype shifts?

JFTR: if GCC were not able to generate code for DWtype addition and subtraction
      it would also need __[u]addDI3() and __[u]subDI3() functions ... which
      are but missing from libgcc.a

Stefan Kanthak

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-10 19:44   ` Stefan Kanthak
@ 2020-11-10 20:14     ` Jakub Jelinek
  2020-11-10 21:09       ` Jeff Law
  2020-11-10 22:12       ` Jeff Law
  2020-11-10 22:01     ` Jeff Law
  2020-11-11  0:00     ` Andreas Schwab
  2 siblings, 2 replies; 32+ messages in thread
From: Jakub Jelinek @ 2020-11-10 20:14 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: Eric Botcazou, gcc-patches

On Tue, Nov 10, 2020 at 08:44:32PM +0100, Stefan Kanthak wrote:
> Eric Botcazou <botcazou@adacore.com> wrote:
> 
> >> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() functions
> >> is rather bad, it yields bad machine code at least on i386 and AMD64. Since
> >> GCC knows how to shift integers twice the register size these functions can
> >> be written as one-liners.
> > 
> > These functions are precisely meant to be used when GCC cannot do that.
> 
> On which processor(s) is GCC unable to generate code for DWtype shifts?

E.g. avr-none, msp430-elf, pdp11-aout.
And I see recursive __cmpdi2 calls on avr-none too.

	Jakub


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-10 20:14     ` Jakub Jelinek
@ 2020-11-10 21:09       ` Jeff Law
  2020-11-10 21:17         ` Jakub Jelinek
  2020-11-24  0:21         ` Jeff Law
  2020-11-10 22:12       ` Jeff Law
  1 sibling, 2 replies; 32+ messages in thread
From: Jeff Law @ 2020-11-10 21:09 UTC (permalink / raw)
  To: Jakub Jelinek, Stefan Kanthak; +Cc: gcc-patches, Eric Botcazou


On 11/10/20 1:14 PM, Jakub Jelinek via Gcc-patches wrote:
> On Tue, Nov 10, 2020 at 08:44:32PM +0100, Stefan Kanthak wrote:
>> Eric Botcazou <botcazou@adacore.com> wrote:
>>
>>>> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() functions
>>>> is rather bad, it yields bad machine code at least on i386 and AMD64. Since
>>>> GCC knows how to shift integers twice the register size these functions can
>>>> be written as one-liners.
>>> These functions are precisely meant to be used when GCC cannot do that.
>> On which processor(s) is GCC unable to generate code for DWtype shifts?
> E.g. avr-none, msp430-elf, pdp11-aout.
> And I see recursive __cmpdi2 calls on avr-none too.

ACK.  I'll pull those [u]cmpdi changes.  They were iffy at best, this
confirms the concerns we both had.

jeff


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-10 21:09       ` Jeff Law
@ 2020-11-10 21:17         ` Jakub Jelinek
  2020-11-10 23:42           ` Jeff Law
  2020-11-24  0:21         ` Jeff Law
  1 sibling, 1 reply; 32+ messages in thread
From: Jakub Jelinek @ 2020-11-10 21:17 UTC (permalink / raw)
  To: Jeff Law; +Cc: Stefan Kanthak, gcc-patches, Eric Botcazou

On Tue, Nov 10, 2020 at 02:09:20PM -0700, Jeff Law wrote:
> 
> On 11/10/20 1:14 PM, Jakub Jelinek via Gcc-patches wrote:
> > On Tue, Nov 10, 2020 at 08:44:32PM +0100, Stefan Kanthak wrote:
> >> Eric Botcazou <botcazou@adacore.com> wrote:
> >>
> >>>> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() functions
> >>>> is rather bad, it yields bad machine code at least on i386 and AMD64. Since
> >>>> GCC knows how to shift integers twice the register size these functions can
> >>>> be written as one-liners.
> >>> These functions are precisely meant to be used when GCC cannot do that.
> >> On which processor(s) is GCC unable to generate code for DWtype shifts?
> > E.g. avr-none, msp430-elf, pdp11-aout.
> > And I see recursive __cmpdi2 calls on avr-none too.
> 
> ACK.Â  I'll pull those [u]cmpdi changes.Â  They were iffy at best, this
> confirms the concerns we both had.

To be precise, I haven't tried to build libgcc for all of those, just
checked what code is produced for
unsigned long long foo (unsigned long long x) { return x << 3; }
etc. by all trunk cross-compilers I have lying around.
It might be that their __ashlDI3 etc. are actually __ashlsi3 and they
have some other implementation of __ashldi3 etc.

Anyway, I'm not sure if we really need to optimize too much functions which
are never used; and the -ftrapv stuff should one day be reimplemented using
-fsanitize=signed-integer-overflow with an abort behavior on overflows.

libgcc functions which are used heavily of course should be optimized as
much as possible.

	Jakub


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-10 19:44   ` Stefan Kanthak
  2020-11-10 20:14     ` Jakub Jelinek
@ 2020-11-10 22:01     ` Jeff Law
  2020-11-11  0:00     ` Andreas Schwab
  2 siblings, 0 replies; 32+ messages in thread
From: Jeff Law @ 2020-11-10 22:01 UTC (permalink / raw)
  To: Stefan Kanthak, Eric Botcazou; +Cc: gcc-patches


On 11/10/20 12:44 PM, Stefan Kanthak wrote:
> Eric Botcazou <botcazou@adacore.com> wrote:
>
>>> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() functions
>>> is rather bad, it yields bad machine code at least on i386 and AMD64. Since
>>> GCC knows how to shift integers twice the register size these functions can
>>> be written as one-liners.
>> These functions are precisely meant to be used when GCC cannot do that.
> On which processor(s) is GCC unable to generate code for DWtype shifts?

We'd have to look at them all, and it's just not worth the headache.  
Just to pick one off the top of my head that I know well would be the H8
which only has QI, HI and SI shifts/rotates.  I'd expect inherent
support for double-word shifts to be the exception rather than the rule
for 32bit embedded targets.


>
> JFTR: if GCC were not able to generate code for DWtype addition and subtraction
>       it would also need __[u]addDI3() and __[u]subDI3() functions ... which
>       are but missing from libgcc.a

Umm, no.  Non-overflow trapping addition/subtraction needs just one
version, signedness doesn't matter.


jeff


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-10 20:14     ` Jakub Jelinek
  2020-11-10 21:09       ` Jeff Law
@ 2020-11-10 22:12       ` Jeff Law
  1 sibling, 0 replies; 32+ messages in thread
From: Jeff Law @ 2020-11-10 22:12 UTC (permalink / raw)
  To: Jakub Jelinek, Stefan Kanthak; +Cc: gcc-patches, Eric Botcazou


On 11/10/20 1:14 PM, Jakub Jelinek via Gcc-patches wrote:
> On Tue, Nov 10, 2020 at 08:44:32PM +0100, Stefan Kanthak wrote:
>> Eric Botcazou <botcazou@adacore.com> wrote:
>>
>>>> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() functions
>>>> is rather bad, it yields bad machine code at least on i386 and AMD64. Since
>>>> GCC knows how to shift integers twice the register size these functions can
>>>> be written as one-liners.
>>> These functions are precisely meant to be used when GCC cannot do that.
>> On which processor(s) is GCC unable to generate code for DWtype shifts?
> E.g. avr-none, msp430-elf, pdp11-aout.
> And I see recursive __cmpdi2 calls on avr-none too.

FWIW, the tester only builds libgcc for avr-elf as there's no newlib
port.  msp430-elf builds newlib and will run the testsuite using the
simulator from binutils/gdb.  I don't test pdp11 at all.

While the avr port may advertise DImode shifts and such, under the hood
they're actually implemented as calls into hand written assembly in
libgcc and probably implement an internal ABI, particularly WRT register
usage.


jeff


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-10 21:17         ` Jakub Jelinek
@ 2020-11-10 23:42           ` Jeff Law
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff Law @ 2020-11-10 23:42 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Stefan Kanthak, gcc-patches, Eric Botcazou


On 11/10/20 2:17 PM, Jakub Jelinek wrote:
> On Tue, Nov 10, 2020 at 02:09:20PM -0700, Jeff Law wrote:
>> On 11/10/20 1:14 PM, Jakub Jelinek via Gcc-patches wrote:
>>> On Tue, Nov 10, 2020 at 08:44:32PM +0100, Stefan Kanthak wrote:
>>>> Eric Botcazou <botcazou@adacore.com> wrote:
>>>>
>>>>>> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() functions
>>>>>> is rather bad, it yields bad machine code at least on i386 and AMD64. Since
>>>>>> GCC knows how to shift integers twice the register size these functions can
>>>>>> be written as one-liners.
>>>>> These functions are precisely meant to be used when GCC cannot do that.
>>>> On which processor(s) is GCC unable to generate code for DWtype shifts?
>>> E.g. avr-none, msp430-elf, pdp11-aout.
>>> And I see recursive __cmpdi2 calls on avr-none too.
>> ACK.  I'll pull those [u]cmpdi changes.  They were iffy at best, this
>> confirms the concerns we both had.
> To be precise, I haven't tried to build libgcc for all of those, just
> checked what code is produced for
> unsigned long long foo (unsigned long long x) { return x << 3; }
> etc. by all trunk cross-compilers I have lying around.
> It might be that their __ashlDI3 etc. are actually __ashlsi3 and they
> have some other implementation of __ashldi3 etc.

Well for avr libgcc's cmpdi compiles into something reasonable, but
ucmpdi calls cmpdi and I can't convince myself it's right.  Regardless,
the [u]cmpdi stuff was very much borderline from a correctness
standpoint and I'm going to revert that part.



>
> Anyway, I'm not sure if we really need to optimize too much functions which
> are never used; and the -ftrapv stuff should one day be reimplemented using
> -fsanitize=signed-integer-overflow with an abort behavior on overflows.
>
> libgcc functions which are used heavily of course should be optimized as
> much as possible.

Agreed.  The improvements to use the overflow intrinsics are somewhat
useful -- they make the code clearer and that's a win.  The potential
self-recursive stuff in [u]cmpdi and the shifts probably aren't useful
(I haven't looked closely at the shifts yet).  Stefan has a fundamental
mis-understanding of how those routines are used and more importantly
the constraints they have to work within.

Jeff

Jeff

>
> 	Jakub
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-10 17:59 [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function Stefan Kanthak
                   ` (2 preceding siblings ...)
  2020-11-10 18:26 ` Jakub Jelinek
@ 2020-11-10 23:48 ` Jeff Law
  2020-11-10 23:53   ` Jakub Jelinek
  3 siblings, 1 reply; 32+ messages in thread
From: Jeff Law @ 2020-11-10 23:48 UTC (permalink / raw)
  To: Stefan Kanthak, gcc-patches


On 11/10/20 10:59 AM, Stefan Kanthak via Gcc-patches wrote:
> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() functions
> is rather bad, it yields bad machine code at least on i386 and AMD64.
> Since GCC knows how to shift integers twice the register size these functions
> can be written as one-liners.
>
> The implementation of the __bswapsi2() function uses SIGNED instead of
> unsigned mask values; cf. __bswapdi2()
>
> Stefan Kanthak
>
> libgcc2.diff
>
> --- -/libgcc/libgcc2.h
> +++ +/libgcc/libgcc2.h
> @@ -391,7 +391,7 @@
>  extern DWtype __negdi2 (DWtype);
>  #endif
>  
> -extern DWtype __lshrdi3 (DWtype, shift_count_type);
> +extern UDWtype __lshrdi3 (UDWtype, shift_count_type);
>  extern DWtype __ashldi3 (DWtype, shift_count_type);
>  extern DWtype __ashrdi3 (DWtype, shift_count_type);
>  
> --- -/libgcc/libgcc2.c
> +++ +/libgcc/libgcc2.c
> @@ -398,30 +398,10 @@
>  /* Unless shift functions are defined with full ANSI prototypes,
>     parameter b will be promoted to int if shift_count_type is smaller than an int.  */
>  #ifdef L_lshrdi3
> -DWtype
> -__lshrdi3 (DWtype u, shift_count_type b)
> +UDWtype
> +__lshrdi3 (UDWtype u, shift_count_type b)

As has been pointed out, you can't implement these routines by doing the
operation in the same mode as the argument.Â  It's potentially
self-recursive.


THe whole point of these routines is to provide double-word capabilities
on targets that don't have intrinsic double-word capabilities. That's
why they're written in a non-obvious way using word sized operations.


>  \f
> @@ -486,10 +425,10 @@
>  SItype
>  __bswapsi2 (SItype u)
>  {
> -  return ((((u) & 0xff000000) >> 24)
> -	  | (((u) & 0x00ff0000) >>  8)
> -	  | (((u) & 0x0000ff00) <<  8)
> -	  | (((u) & 0x000000ff) << 24));
> +  return ((((u) & 0xff000000u) >> 24)
> +	  | (((u) & 0x00ff0000u) >>  8)
> +	  | (((u) & 0x0000ff00u) <<  8)
> +	  | (((u) & 0x000000ffu) << 24));

What's the point of this change?Â  I'm not sure how the signedness of the
constant really matters here.

jeff


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-10 23:48 ` Jeff Law
@ 2020-11-10 23:53   ` Jakub Jelinek
  2020-11-11  8:33     ` Stefan Kanthak
  0 siblings, 1 reply; 32+ messages in thread
From: Jakub Jelinek @ 2020-11-10 23:53 UTC (permalink / raw)
  To: Jeff Law; +Cc: Stefan Kanthak, gcc-patches

On Tue, Nov 10, 2020 at 04:48:10PM -0700, Jeff Law via Gcc-patches wrote:
> > @@ -486,10 +425,10 @@
> >  SItype
> >  __bswapsi2 (SItype u)
> >  {
> > -  return ((((u) & 0xff000000) >> 24)
> > -	  | (((u) & 0x00ff0000) >>  8)
> > -	  | (((u) & 0x0000ff00) <<  8)
> > -	  | (((u) & 0x000000ff) << 24));
> > +  return ((((u) & 0xff000000u) >> 24)
> > +	  | (((u) & 0x00ff0000u) >>  8)
> > +	  | (((u) & 0x0000ff00u) <<  8)
> > +	  | (((u) & 0x000000ffu) << 24));
> 
> What's the point of this change?Â  I'm not sure how the signedness of the
> constant really matters here.

Note 0xff000000 is implicitly 0xff000000U because it doesn't fit into signed
int, and that is the only one where the logical vs. arithmetic right shift
really matters for correct behavior.
Whether (((u) & 0x00ff0000) >> 8) is expanded as logical or arithmetic shift
doesn't really matter for code behavior, though perhaps we could during
expansion check if the most significant bit is known to be zero as in this
case and ask target about arithmetic vs. logical right shift costs and
choose the less costly if they aren't even.

	Jakub

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-10 19:44   ` Stefan Kanthak
  2020-11-10 20:14     ` Jakub Jelinek
  2020-11-10 22:01     ` Jeff Law
@ 2020-11-11  0:00     ` Andreas Schwab
  2020-11-24 13:57       ` Stefan Kanthak
  2 siblings, 1 reply; 32+ messages in thread
From: Andreas Schwab @ 2020-11-11  0:00 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: Eric Botcazou, gcc-patches

On Nov 10 2020, Stefan Kanthak wrote:

> Eric Botcazou <botcazou@adacore.com> wrote:
>
>>> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() functions
>>> is rather bad, it yields bad machine code at least on i386 and AMD64. Since
>>> GCC knows how to shift integers twice the register size these functions can
>>> be written as one-liners.
>> 
>> These functions are precisely meant to be used when GCC cannot do that.
>
> On which processor(s) is GCC unable to generate code for DWtype shifts?

On most 32-bit targets with -Os.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-10 23:53   ` Jakub Jelinek
@ 2020-11-11  8:33     ` Stefan Kanthak
  2020-11-11  9:55       ` Jakub Jelinek
  2020-11-30  1:06       ` Jeff Law
  0 siblings, 2 replies; 32+ messages in thread
From: Stefan Kanthak @ 2020-11-11  8:33 UTC (permalink / raw)
  To: Jakub Jelinek, Jeff Law; +Cc: gcc-patches

Jakub Jelinek <jakub@redhat.com> wrote:

> On Tue, Nov 10, 2020 at 04:48:10PM -0700, Jeff Law via Gcc-patches wrote:
>> > @@ -486,10 +425,10 @@
>> >  SItype
>> >  __bswapsi2 (SItype u)
>> >  {
>> > -  return ((((u) & 0xff000000) >> 24)
>> > -   | (((u) & 0x00ff0000) >>  8)
>> > -   | (((u) & 0x0000ff00) <<  8)
>> > -   | (((u) & 0x000000ff) << 24));
>> > +  return ((((u) & 0xff000000u) >> 24)
>> > +   | (((u) & 0x00ff0000u) >>  8)
>> > +   | (((u) & 0x0000ff00u) <<  8)
>> > +   | (((u) & 0x000000ffu) << 24));
>>
>> What's the point of this change?Â I'm not sure how the signedness of the
>> constant really matters here.
>
> Note 0xff000000 is implicitly 0xff000000U because it doesn't fit into signed
> int, and that is the only one where the logical vs. arithmetic right shift
> really matters for correct behavior.

Ouch: that's but not the point here; what matters is the undefined behaviour of
      ((u) & 0x000000ff) << 24

0x000000ff is a signed int, so (u) & 0x000000ff is signed too -- and producing
a negative value (or overflow) from the left-shift of a signed int, i.e.
shifting into (or beyond) the sign bit, is undefined behaviour!

JFTR: both -fsanitize=signed-integer-overflow and -fsanitize=undefined fail
      to catch this BUGBUGBUG, which surfaces on i386 and AMD64 with -O1 or
      -O0!
Stefan Kanthak

PS: even worse, -fsanitize=signed-integer-overflow fails to catch 1 << 31
    or 128 << 24!


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-11  8:33     ` Stefan Kanthak
@ 2020-11-11  9:55       ` Jakub Jelinek
  2020-11-11 17:54         ` Joseph Myers
  2020-11-23 23:01         ` Jeff Law
  2020-11-30  1:06       ` Jeff Law
  1 sibling, 2 replies; 32+ messages in thread
From: Jakub Jelinek @ 2020-11-11  9:55 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: Jeff Law, gcc-patches

On Wed, Nov 11, 2020 at 09:33:00AM +0100, Stefan Kanthak wrote:
> Ouch: that's but not the point here; what matters is the undefined behaviour of
>       ((u) & 0x000000ff) << 24
> 
> 0x000000ff is a signed int, so (u) & 0x000000ff is signed too -- and producing
> a negative value (or overflow) from the left-shift of a signed int, i.e.
> shifting into (or beyond) the sign bit, is undefined behaviour!

Only in some language dialects.
It is caught by -fsanitize=shift.
In C++20, if the shift count is within bounds, all signed as well as
unsigned left shifts well defined.
In C99/C11 there is one extra rule:
For signed x << y, in C99/C11, the following:
     (unsigned) x >> (uprecm1 - y)
     if non-zero, is undefined.
and for C++11 to C++17 another one:
  /* For signed x << y, in C++11 and later, the following:
     x < 0 || ((unsigned) x >> (uprecm1 - y)) > 1
     is undefined.  */
So indeed, 0x80 << 24 is UB in C99/C11 and C++98, unclear in C89 and
well defined in C++11 and later.  I don't know if C2X is considering
mandating two's complement and making it well defined like C++20 did.

Guess we should fix that, though because different languages have different
rules, GCC itself except for sanitization doesn't consider it UB and only
treats shifts by negative value or shifts by bitsize or more UB.

	Jakub


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-11  9:55       ` Jakub Jelinek
@ 2020-11-11 17:54         ` Joseph Myers
  2020-11-23 23:01         ` Jeff Law
  1 sibling, 0 replies; 32+ messages in thread
From: Joseph Myers @ 2020-11-11 17:54 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Stefan Kanthak, gcc-patches

On Wed, 11 Nov 2020, Jakub Jelinek via Gcc-patches wrote:

> So indeed, 0x80 << 24 is UB in C99/C11 and C++98, unclear in C89 and
> well defined in C++11 and later.  I don't know if C2X is considering
> mandating two's complement and making it well defined like C++20 did.

C2x requires two's complement but that's only about representation; there 
are no changes so far to what shifts are undefined.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-11  9:55       ` Jakub Jelinek
  2020-11-11 17:54         ` Joseph Myers
@ 2020-11-23 23:01         ` Jeff Law
  1 sibling, 0 replies; 32+ messages in thread
From: Jeff Law @ 2020-11-23 23:01 UTC (permalink / raw)
  To: Jakub Jelinek, Stefan Kanthak; +Cc: gcc-patches



On 11/11/20 2:55 AM, Jakub Jelinek wrote:
> On Wed, Nov 11, 2020 at 09:33:00AM +0100, Stefan Kanthak wrote:
>> Ouch: that's but not the point here; what matters is the undefined behaviour of
>>       ((u) & 0x000000ff) << 24
>>
>> 0x000000ff is a signed int, so (u) & 0x000000ff is signed too -- and producing
>> a negative value (or overflow) from the left-shift of a signed int, i.e.
>> shifting into (or beyond) the sign bit, is undefined behaviour!
> Only in some language dialects.
> It is caught by -fsanitize=shift.
> In C++20, if the shift count is within bounds, all signed as well as
> unsigned left shifts well defined.
> In C99/C11 there is one extra rule:
> For signed x << y, in C99/C11, the following:
>      (unsigned) x >> (uprecm1 - y)
>      if non-zero, is undefined.
> and for C++11 to C++17 another one:
>   /* For signed x << y, in C++11 and later, the following:
>      x < 0 || ((unsigned) x >> (uprecm1 - y)) > 1
>      is undefined.  */
> So indeed, 0x80 << 24 is UB in C99/C11 and C++98, unclear in C89 and
> well defined in C++11 and later.  I don't know if C2X is considering
> mandating two's complement and making it well defined like C++20 did.
>
> Guess we should fix that, though because different languages have different
> rules, GCC itself except for sanitization doesn't consider it UB and only
> treats shifts by negative value or shifts by bitsize or more UB.
Even if it's well defined by C++20, I don't think we can rely on those
semantics within libgcc2.  At best we might be able to claim C99 and we
might even be stuck at C89, regardless of how GCC treats a shift into
the sign bit.

I'll do a bit of testing here, but I'm inclined to explicitly treat all
those constants as unsigned for the sake of consistency.  Thanks Stefan
for pointing out what I missed (shift into the sign bit).

jeff


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-10 21:09       ` Jeff Law
  2020-11-10 21:17         ` Jakub Jelinek
@ 2020-11-24  0:21         ` Jeff Law
  1 sibling, 0 replies; 32+ messages in thread
From: Jeff Law @ 2020-11-24  0:21 UTC (permalink / raw)
  To: Jakub Jelinek, Stefan Kanthak; +Cc: gcc-patches, Eric Botcazou



On 11/10/20 2:09 PM, Jeff Law wrote:
> On 11/10/20 1:14 PM, Jakub Jelinek via Gcc-patches wrote:
>> On Tue, Nov 10, 2020 at 08:44:32PM +0100, Stefan Kanthak wrote:
>>> Eric Botcazou <botcazou@adacore.com> wrote:
>>>
>>>>> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() functions
>>>>> is rather bad, it yields bad machine code at least on i386 and AMD64. Since
>>>>> GCC knows how to shift integers twice the register size these functions can
>>>>> be written as one-liners.
>>>> These functions are precisely meant to be used when GCC cannot do that.
>>> On which processor(s) is GCC unable to generate code for DWtype shifts?
>> E.g. avr-none, msp430-elf, pdp11-aout.
>> And I see recursive __cmpdi2 calls on avr-none too.
> ACK.  I'll pull those [u]cmpdi changes.  They were iffy at best, this
> confirms the concerns we both had.
I'm not seeing recursive calls.  I'm seeing ucmpdi calling cmpdi on avr
(and cmpdi is simple straightline code with no calls, so no recursion). 
msp looks OK WRT [u]cmpdi as does pdp11.

So I think unless we see something clearly wrong we should leave the
committed patch as-is.  There's the follow-up for abs and more multiply
stuff that still needs a looksie as well as subsequent independent patches.

Jeff


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-11  0:00     ` Andreas Schwab
@ 2020-11-24 13:57       ` Stefan Kanthak
  2020-11-24 14:34         ` Andreas Schwab
  0 siblings, 1 reply; 32+ messages in thread
From: Stefan Kanthak @ 2020-11-24 13:57 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Eric Botcazou, gcc-patches

Andreas Schwab wrote 2020-11-11:

> On Nov 10 2020, Stefan Kanthak wrote:
> 
>> Eric Botcazou <botcazou@adacore.com> wrote:
>>
>>>> The implementation of the __ashlDI3(), __ashrDI3() and __lshrDI3() functions
>>>> is rather bad, it yields bad machine code at least on i386 and AMD64. Since
>>>> GCC knows how to shift integers twice the register size these functions can
>>>> be written as one-liners.
>>> 
>>> These functions are precisely meant to be used when GCC cannot do that.
>>
>> On which processor(s) is GCC unable to generate code for DWtype shifts?
> 
> On most 32-bit targets with -Os.

--- counter-FUD-example.c ---
long long __ashldi3 (long long value, int count) {
    return value << count;
}

long long __ashrdi3 (unsigned long long value, int count) {
    return value >> count;
}

unsigned long long __lshrdi3 (unsigned long long value, int count) {
    return value >> count;
}

// just for completeness sake:

unsigned long long __lshldi3 (unsigned long long value, int count) {
    return value << count;
}

extern   signed long long  left,  right;
extern unsigned long long uleft, uright;

int main(int argc, char **argv) {
   left <<= argc;
   right >>= argc;
   uleft <<= argc;
   uright >>= argc;
}
--- EOF ---

lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
...
Model name:            AMD EPYC 7262 8-Core Processor
...
gcc -m32          -o- -Os -S counter-FUD-example.c | fgrep 'call'
gcc -m32 -mno-sse -o- -Os -S counter-FUD-example.c | fgrep 'call'

'nuff said
Stefan

JTFR: without -mno-sse, GCC (at least version 8.4) generates rather
      awful and DEFINITELY LONGER code (42 vs. 38) bytes than with
      -mno-sse, i.e. -Os is buggy too!

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-24 13:57       ` Stefan Kanthak
@ 2020-11-24 14:34         ` Andreas Schwab
  2020-11-24 15:40           ` Stefan Kanthak
  0 siblings, 1 reply; 32+ messages in thread
From: Andreas Schwab @ 2020-11-24 14:34 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: Eric Botcazou, gcc-patches

On Nov 24 2020, Stefan Kanthak wrote:

> 'nuff said

What's your point?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-24 14:34         ` Andreas Schwab
@ 2020-11-24 15:40           ` Stefan Kanthak
  2020-11-24 15:49             ` Andreas Schwab
  2020-11-25 18:53             ` Jeff Law
  0 siblings, 2 replies; 32+ messages in thread
From: Stefan Kanthak @ 2020-11-24 15:40 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Eric Botcazou, gcc-patches

Andreas Schwab wrote:

> On Nov 24 2020, Stefan Kanthak wrote:
> 
>> 'nuff said
> 
> What's your point?

Pinpoint deficiencies and bugs in GCC and libgcc, plus a counter
example to your "argument"!
I recommend careful reading.

Stefan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-24 15:40           ` Stefan Kanthak
@ 2020-11-24 15:49             ` Andreas Schwab
  2020-11-25 18:53             ` Jeff Law
  1 sibling, 0 replies; 32+ messages in thread
From: Andreas Schwab @ 2020-11-24 15:49 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: Eric Botcazou, gcc-patches

On Nov 24 2020, Stefan Kanthak wrote:

> Pinpoint deficiencies and bugs in GCC and libgcc, plus a counter
> example to your "argument"!

In which way?

> I recommend careful reading.

Yes, I do.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-24 15:40           ` Stefan Kanthak
  2020-11-24 15:49             ` Andreas Schwab
@ 2020-11-25 18:53             ` Jeff Law
  2020-11-25 20:22               ` Stefan Kanthak
  1 sibling, 1 reply; 32+ messages in thread
From: Jeff Law @ 2020-11-25 18:53 UTC (permalink / raw)
  To: Stefan Kanthak, Andreas Schwab; +Cc: gcc-patches, Eric Botcazou

On 11/24/20 8:40 AM, Stefan Kanthak wrote:
> Andreas Schwab wrote:
>
>> On Nov 24 2020, Stefan Kanthak wrote:
>>
>>> 'nuff said
>> What's your point?
> Pinpoint deficiencies and bugs in GCC and libgcc, plus a counter
> example to your "argument"!
> I recommend careful reading.
Umm, you should broaden your horizons.Â  The world is not an x86.Â  I'm
pretty sure Andreas was referring to non-x86 targets.

As Jakub has already indicated, your change will result in infinite
recursion on avr.Â  I happened to have a cr16 handy and it looks like
it'd generate infinite recursion there too.

On other targets the routines you're changing won't be used because they
either have 64 bit shifts or the compiler can synthesize them from other
primitives that are available.

It's pointless to keep arguing on the shift stuff.Â  What you've
submitted is fundamentally wrong in the context of gcc's libgcc2
routines.Â  It's that simple.Â  If you keep arguing about it you're likely
just going to annoy those who can help you to the point where they won't
bother.

I think the bswapsi2 change will go forward, but it needs to be tested.

jeff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-25 18:53             ` Jeff Law
@ 2020-11-25 20:22               ` Stefan Kanthak
  2020-11-25 20:42                 ` Jakub Jelinek
  2020-11-25 20:44                 ` Jeff Law
  0 siblings, 2 replies; 32+ messages in thread
From: Stefan Kanthak @ 2020-11-25 20:22 UTC (permalink / raw)
  To: Andreas Schwab, Jeff Law; +Cc: gcc-patches, Eric Botcazou

Jeff Law <law@redhat.com> wrote:

> On 11/24/20 8:40 AM, Stefan Kanthak wrote:
>> Andreas Schwab wrote:
>>
>>> On Nov 24 2020, Stefan Kanthak wrote:
>>>
>>>> 'nuff said
>>> What's your point?
>> Pinpoint deficiencies and bugs in GCC and libgcc, plus a counter
>> example to your "argument"!
>> I recommend careful reading.
> Umm, you should broaden your horizons.

My horizon is as wide as the cap^Winability of GCC available for the
machines I use.

> The world is not an x86.

The GCC available for the machines I use is only able to generate x86
code.

> I'm pretty sure Andreas was referring to non-x86 targets.

| On most 32-bit targets with -Os.

Are avr or cr16 32-bit processors?

> As Jakub has already indicated, your change will result in infinite
> recursion on avr.Â I happened to have a cr16 handy and it looks like
> it'd generate infinite recursion there too.

JFTR: does GCC emit a warning then? If not: why not?

Since I neither have an avr nor a cr16 here, and also no TR-440, no S/3x0,
no Spectra-70, no PDP-11, no VAX, no SPARC, no MIPS, no PowerPC, no MC68k,
no NSC16xxx and no NSC32xxx any more, GCC only gives me access to the x86
code it generates.

> On other targets the routines you're changing won't be used because they
> either have 64 bit shifts or the compiler can synthesize them from other
> primitives that are available.

These routines are documented in
<https://gcc.gnu.org/onlinedocs/gccint/Integer-library-routines.html>
and might be called by your users.

> It's pointless to keep arguing on the shift stuff.Â What you've
> submitted is fundamentally wrong in the context of gcc's libgcc2
> routines.Â It's that simple.Â If you keep arguing about it you're likely
> just going to annoy those who can help you to the point where they won't
> bother.

Is there any documentation for (the design and restrictions of) libgcc2?
The patches I sent for the shift and the comparision routines are based
on the assumption that GCC generates code for "double-word" arithmetic
inline.
If it doesn't: where are __addDI3 and __subDI3 defined, and where are
__adddi3, __addti3, __subdi3 and __subti3 documented?

Since libgcc2.[hc] don't define them, and
<https://gcc.gnu.org/onlinedocs/gccint/Integer-library-routines.html>
doesn't document them, I had reason to believe that my assumption holds.

> I think the bswapsi2 change will go forward, but it needs to be tested.

Stefan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-25 20:22               ` Stefan Kanthak
@ 2020-11-25 20:42                 ` Jakub Jelinek
  2020-11-25 21:22                   ` Stefan Kanthak
  2020-11-25 20:44                 ` Jeff Law
  1 sibling, 1 reply; 32+ messages in thread
From: Jakub Jelinek @ 2020-11-25 20:42 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: Andreas Schwab, Jeff Law, gcc-patches, Eric Botcazou

On Wed, Nov 25, 2020 at 09:22:53PM +0100, Stefan Kanthak wrote:
> > As Jakub has already indicated, your change will result in infinite
> > recursion on avr.Ã‚ I happened to have a cr16 handy and it looks like
> > it'd generate infinite recursion there too.
> 
> JFTR: does GCC emit a warning then? If not: why not?

Why should it.  libgcc is something GCC has full control over and can assume
it is written in a way that it can't recurse infinitely, that is one of
libgcc design goals.

> Since I neither have an avr nor a cr16 here, and also no TR-440, no S/3x0,
> no Spectra-70, no PDP-11, no VAX, no SPARC, no MIPS, no PowerPC, no MC68k,
> no NSC16xxx and no NSC32xxx any more, GCC only gives me access to the x86
> code it generates.

You can always use cross-compilers.

> > On other targets the routines you're changing won't be used because they
> > either have 64 bit shifts or the compiler can synthesize them from other
> > primitives that are available.
> 
> These routines are documented in
> <https://gcc.gnu.org/onlinedocs/gccint/Integer-library-routines.html>
> and might be called by your users.

That is just theory, if people call them by hand instead of using normal C
arithmetics they will get worse code in any case (at least because of the
library call).
As has been said multiple times, trying to optimize routines that are never
called on x86 for x86 is just wasted energy, better invest your time in
functions that are actually ever called.  And even for those, care has to be
taken so that it doesn't break any other of the > 50 target architectures
GCC supports.

	Jakub

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-25 20:22               ` Stefan Kanthak
  2020-11-25 20:42                 ` Jakub Jelinek
@ 2020-11-25 20:44                 ` Jeff Law
  1 sibling, 0 replies; 32+ messages in thread
From: Jeff Law @ 2020-11-25 20:44 UTC (permalink / raw)
  To: Stefan Kanthak, Andreas Schwab; +Cc: gcc-patches, Eric Botcazou



On 11/25/20 1:22 PM, Stefan Kanthak wrote:
> Jeff Law <law@redhat.com> wrote:
>
>> On 11/24/20 8:40 AM, Stefan Kanthak wrote:
>>> Andreas Schwab wrote:
>>>
>>>> On Nov 24 2020, Stefan Kanthak wrote:
>>>>
>>>>> 'nuff said
>>>> What's your point?
>>> Pinpoint deficiencies and bugs in GCC and libgcc, plus a counter
>>> example to your "argument"!
>>> I recommend careful reading.
>> Umm, you should broaden your horizons.
> My horizon is as wide as the cap^Winability of GCC available for the
> machines I use.
Then you should expect that some of your patches are going to simply be
inappropriate for GCC.Â  When people with 20-30 years of experience with
GCC tell you what you're doing is wrong in the context of GCC, it would
be wise to listen.Â  Continuing to argue doesn't change the fact that
what you're trying to do is fundamentally wrong.

>
>> The world is not an x86.
> The GCC available for the machines I use is only able to generate x86
> code.
You can certainly take GCC sources and use them to create cross
compilers for any architecture supported by GCC.Â  People do it all the time.

>
>> I'm pretty sure Andreas was referring to non-x86 targets.
> | On most 32-bit targets with -Os.
>
> Are avr or cr16 32-bit processors?
I don't know offhand.Â  And ultimately it doesn't matter, the shift
changes are simply wrong and will not be applied.

>
>> As Jakub has already indicated, your change will result in infinite
>> recursion on avr.Ã‚ I happened to have a cr16 handy and it looks like
>> it'd generate infinite recursion there too.
> JFTR: does GCC emit a warning then? If not: why not?
GCC doesn't try to warn for self-recursion.

>
> Since I neither have an avr nor a cr16 here, and also no TR-440, no S/3x0,
> no Spectra-70, no PDP-11, no VAX, no SPARC, no MIPS, no PowerPC, no MC68k,
> no NSC16xxx and no NSC32xxx any more, GCC only gives me access to the x86
> code it generates.
Read about building cross compilers.Â  I don't have any of those things
either.Â  But I'm still able to build and test them without anything
other than a generic x86_64 laptop.

Hell, I can *bootstrap* on things like m68k, mips, hppa, riscv, alpha,
andÂ  others using qemu emulation on a generic x86_64 laptop.


>
>> On other targets the routines you're changing won't be used because they
>> either have 64 bit shifts or the compiler can synthesize them from other
>> primitives that are available.
> These routines are documented in
> <https://gcc.gnu.org/onlinedocs/gccint/Integer-library-routines.html>
> and might be called by your users.
THose routines are not for direct use by users.Â Â  I've already stated
that.Â  In fact, you're referring to the "gcc internals" manual which is
for developers working on GCC itself, not for end users.Â  There's
separate documentation for end users and I'd fully expect it not to
document any of the routines in libgcc, because they're not for end users.


Also note the first sentence on the page you've referenced:

"The integer arithmetic routines are used on platforms that donâ€™t
provide hardware support for arithmetic operations on some modes."

Which again is what Jakub, myself and others have been telling you.Â  You
can't (for example) use a 64bit shift in any of the 64bit shift routines
in libgcc2.Â  It's fundamentally wrong.
>> It's pointless to keep arguing on the shift stuff.Ã‚ What you've
>> submitted is fundamentally wrong in the context of gcc's libgcc2
>> routines.Ã‚ It's that simple.Ã‚ If you keep arguing about it you're likely
>> just going to annoy those who can help you to the point where they won't
>> bother.
> Is there any documentation for (the design and restrictions of) libgcc2?
Not that I'm immediately aware of.

> The patches I sent for the shift and the comparision routines are based
> on the assumption that GCC generates code for "double-word" arithmetic
> inline.
And that's a fundamentally incorrect assumption as myself and others
have pointed out.


Jeff


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-25 20:42                 ` Jakub Jelinek
@ 2020-11-25 21:22                   ` Stefan Kanthak
  2020-11-25 22:06                     ` Jeff Law
  0 siblings, 1 reply; 32+ messages in thread
From: Stefan Kanthak @ 2020-11-25 21:22 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Andreas Schwab, Jeff Law, gcc-patches, Eric Botcazou

Jakub Jelinek <jakub@redhat.com> wrote:

> On Wed, Nov 25, 2020 at 09:22:53PM +0100, Stefan Kanthak wrote:
>> > As Jakub has already indicated, your change will result in infinite
>> > recursion on avr.Ã, I happened to have a cr16 handy and it looks like
>> > it'd generate infinite recursion there too.
>>
>> JFTR: does GCC emit a warning then? If not: why not?
>
> Why should it.  libgcc is something GCC has full control over and can assume
> it is written in a way that it can't recurse infinitely, that is one of
> libgcc design goals.

Where are these design goals documented?

>> Since I neither have an avr nor a cr16 here, and also no TR-440, no S/3x0,
>> no Spectra-70, no PDP-11, no VAX, no SPARC, no MIPS, no PowerPC, no MC68k,
>> no NSC16xxx and no NSC32xxx any more, GCC only gives me access to the x86
>> code it generates.
>
> You can always use cross-compilers.

There are no cross-compilers available here.
Why should I waste time and energy to build cross-compilers for processors
I don't use?

[...]

> As has been said multiple times, trying to optimize routines that are never
> called on x86 for x86

According to Andreas Schwab, the 64-bit shift routines are called on 32-bit
processors when compiling with -Os!
x86 is a 32-bit processor.

> is just wasted energy, better invest your time in
> functions that are actually ever called.

Where is the documentation that names the routines (not) called on the >50
target architectures?
And why do you build and ship routines which are never called at all?
This is a waste of time, space and energy!

Stefan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-25 21:22                   ` Stefan Kanthak
@ 2020-11-25 22:06                     ` Jeff Law
  2020-11-25 23:06                       ` Stefan Kanthak
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff Law @ 2020-11-25 22:06 UTC (permalink / raw)
  To: Stefan Kanthak, Jakub Jelinek; +Cc: Andreas Schwab, gcc-patches, Eric Botcazou



On 11/25/20 2:22 PM, Stefan Kanthak wrote:
> Jakub Jelinek <jakub@redhat.com> wrote:
>
>> On Wed, Nov 25, 2020 at 09:22:53PM +0100, Stefan Kanthak wrote:
>>>> As Jakub has already indicated, your change will result in infinite
>>>> recursion on avr.Ã, I happened to have a cr16 handy and it looks like
>>>> it'd generate infinite recursion there too.
>>> JFTR: does GCC emit a warning then? If not: why not?
>> Why should it.  libgcc is something GCC has full control over and can assume
>> it is written in a way that it can't recurse infinitely, that is one of
>> libgcc design goals.
> Where are these design goals documented?
The internals manual is the best source of information on these
routines.  But the implications of what you're going to read there won't
be clear unless you think hard and probably spend some time working with
embedded targets.  Jakub is absolutely, 100% correct here.

>>> Since I neither have an avr nor a cr16 here, and also no TR-440, no S/3x0,
>>> no Spectra-70, no PDP-11, no VAX, no SPARC, no MIPS, no PowerPC, no MC68k,
>>> no NSC16xxx and no NSC32xxx any more, GCC only gives me access to the x86
>>> code it generates.
>> You can always use cross-compilers.
> There are no cross-compilers available here.
> Why should I waste time and energy to build cross-compilers for processors
> I don't use?
If all we had to care about was x86, then your changes would likely be
fine, but GCC targets dozens of distinct processors from deeply embedded
to mainframes and we have to consider how changes impact all of them,
both from a correctness and from a performance standpoint.

By understanding how your proposed changes affect other processors, you
can write better changes that are more likely to get included. 
Furthermore you can focus efforts on things that matter more in the real
world.  DImode shifts in libgcc are _not_ useful to try and optimize on
x86_64 as it has instructions to implement 64 bit shifts.  DImode shifts
in libgcc are not useful to try and optimize on i686 as the compiler can
synthesize them on i686.  DImode shifts can't be easily synthesized on
other targets and on those targets we call the routines in libgcc2. 
Similarly for other routines you find in libgcc2.



>
> [...]
>
>> As has been said multiple times, trying to optimize routines that are never
>> called on x86 for x86
> According to Andreas Schwab, the 64-bit shift routines are called on 32-bit
> processors when compiling with -Os!
> x86 is a 32-bit processor.
He's making a generalization.  It's not going to apply to every 32bit
processor.  Nor would it even necessarily apply to every 16 bit
processor.  But I'm not sure what the point of arguing about this is. 
What you're trying to do with the shift routines is wrong.

>
>> is just wasted energy, better invest your time in
>> functions that are actually ever called.
> Where is the documentation that names the routines (not) called on the >50
> target architectures?
Nobody's bothered to document that.  I don't think doing so would
ultimately be useful.   I do think there is some documentation on what
we used to call libgcc1 as those routines have typically been provided
as assembly code on the targets where they're needed (think 32bit
arithmetic on something like an 8 or 16 bit target).  I haven't looked
at or needed to look for that documentation in over 20 years, so I don't
know if it's still around.


> And why do you build and ship routines which are never called at all?
> This is a waste of time, space and energy!
They are used on some and not others.

I'm sorry you think it's a waste of time space and energy.  These bits
have a purpose and it would be more work than it's worth to try and
selectively build and ship these routines on a target by target basis. 
They're part of the source tree, they get built into the library and
when they are actually needed, they will be used.  As a generality
you're going to find that most are not needed on x86.  So spending time
optimizing them for x86 just isn't all that useful.

jeff


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-25 22:06                     ` Jeff Law
@ 2020-11-25 23:06                       ` Stefan Kanthak
  0 siblings, 0 replies; 32+ messages in thread
From: Stefan Kanthak @ 2020-11-25 23:06 UTC (permalink / raw)
  To: Jakub Jelinek, Jeff Law; +Cc: Andreas Schwab, gcc-patches, Eric Botcazou

Jeff Law <law@redhat.com> wrote:

[...]

> By understanding how your proposed changes affect other processors, you
> can write better changes that are more likely to get included. 
> Furthermore you can focus efforts on things that matter more in the real
> world. DImode shifts in libgcc are _not_ useful to try and optimize on
> x86_64 as it has instructions to implement 64 bit shifts. DImode shifts
> in libgcc are not useful to try and optimize on i686 as the compiler can
> synthesize them on i686. DImode shifts can't be easily synthesized on
> other targets and on those targets we call the routines in libgcc2. 
> Similarly for other routines you find in libgcc2.

What makes you think that my patches addressed only i386 and AMD64?

Again: from the absence of __addDI3/__subDI3 in libgcc2.[ch] I had reason
to assume that GCC synthesizes "double-word" addition/subtraction on all
processors, not just on x86.
Since "double-word" comparision and shifts are likewise simple operations
I further assumed that GCC synthesizes them too on all processors.

What's the fundamental difference between subtraction and comparision?
Why does GCC generate calls to __[u]cmpDI2 for a simple comparision
instead to synthesize it?
And: as shown in libgcc2.c, "double-word" shifts can easily be synthesized
using "single-word" shifts plus logical OR on any target.
I expected GCC to synthesize these operations on non-x86 processors too,
just like "double-word" addition and subtraction.

A possible/reasonable explanation would be code size, i.e. if the synthesized
instructions need significantly more memory than the function call (including
the argument setup of course).

Stefan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function
  2020-11-11  8:33     ` Stefan Kanthak
  2020-11-11  9:55       ` Jakub Jelinek
@ 2020-11-30  1:06       ` Jeff Law
  1 sibling, 0 replies; 32+ messages in thread
From: Jeff Law @ 2020-11-30  1:06 UTC (permalink / raw)
  To: Stefan Kanthak, Jakub Jelinek; +Cc: gcc-patches



On 11/11/20 1:33 AM, Stefan Kanthak wrote:
> Jakub Jelinek <jakub@redhat.com> wrote:
>
>> On Tue, Nov 10, 2020 at 04:48:10PM -0700, Jeff Law via Gcc-patches wrote:
>>>> @@ -486,10 +425,10 @@
>>>>  SItype
>>>>  __bswapsi2 (SItype u)
>>>>  {
>>>> -  return ((((u) & 0xff000000) >> 24)
>>>> -   | (((u) & 0x00ff0000) >>  8)
>>>> -   | (((u) & 0x0000ff00) <<  8)
>>>> -   | (((u) & 0x000000ff) << 24));
>>>> +  return ((((u) & 0xff000000u) >> 24)
>>>> +   | (((u) & 0x00ff0000u) >>  8)
>>>> +   | (((u) & 0x0000ff00u) <<  8)
>>>> +   | (((u) & 0x000000ffu) << 24));
>>> What's the point of this change?Â I'm not sure how the signedness of the
>>> constant really matters here.
>> Note 0xff000000 is implicitly 0xff000000U because it doesn't fit into signed
>> int, and that is the only one where the logical vs. arithmetic right shift
>> really matters for correct behavior.
> Ouch: that's but not the point here; what matters is the undefined behaviour of
>       ((u) & 0x000000ff) << 24
>
> 0x000000ff is a signed int, so (u) & 0x000000ff is signed too -- and producing
> a negative value (or overflow) from the left-shift of a signed int, i.e.
> shifting into (or beyond) the sign bit, is undefined behaviour!
I've pushed the bswapsi2 changes to the trunk.

Thanks.
jeff


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2020-11-30  1:06 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-10 17:59 [PATCH] Better __ashlDI3, __ashrDI3 and __lshrDI3 functions, plus fixed __bswapsi2 function Stefan Kanthak
2020-11-10 18:08 ` Eric Botcazou
2020-11-10 19:44   ` Stefan Kanthak
2020-11-10 20:14     ` Jakub Jelinek
2020-11-10 21:09       ` Jeff Law
2020-11-10 21:17         ` Jakub Jelinek
2020-11-10 23:42           ` Jeff Law
2020-11-24  0:21         ` Jeff Law
2020-11-10 22:12       ` Jeff Law
2020-11-10 22:01     ` Jeff Law
2020-11-11  0:00     ` Andreas Schwab
2020-11-24 13:57       ` Stefan Kanthak
2020-11-24 14:34         ` Andreas Schwab
2020-11-24 15:40           ` Stefan Kanthak
2020-11-24 15:49             ` Andreas Schwab
2020-11-25 18:53             ` Jeff Law
2020-11-25 20:22               ` Stefan Kanthak
2020-11-25 20:42                 ` Jakub Jelinek
2020-11-25 21:22                   ` Stefan Kanthak
2020-11-25 22:06                     ` Jeff Law
2020-11-25 23:06                       ` Stefan Kanthak
2020-11-25 20:44                 ` Jeff Law
2020-11-10 18:09 ` Jakub Jelinek
2020-11-10 18:32   ` Jeff Law
2020-11-10 18:26 ` Jakub Jelinek
2020-11-10 23:48 ` Jeff Law
2020-11-10 23:53   ` Jakub Jelinek
2020-11-11  8:33     ` Stefan Kanthak
2020-11-11  9:55       ` Jakub Jelinek
2020-11-11 17:54         ` Joseph Myers
2020-11-23 23:01         ` Jeff Law
2020-11-30  1:06       ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).