public inbox for gcc@gcc.gnu.org
* libgcc: strange optimization
@ 2011-08-01 20:30 Michael Walle
  2011-08-01 20:51 ` Georg-Johann Lay
  2011-08-01 21:30 ` Richard Henderson
  0 siblings, 2 replies; 47+ messages in thread
From: Michael Walle @ 2011-08-01 20:30 UTC (permalink / raw)
  To: gcc

Hi list,


consider the following test code:
 static void inline f1(int arg)
 {
   register int a1 asm("r8") = 10;
   register int a2 asm("r1") = arg;

   asm("scall" : : "r"(a1), "r"(a2));
 }

 void f2(int arg)
 {
   f1(arg >> 10);
 }


If you compile this code with 'lm32-gcc -O1 -S -c test.c' (see end of this
email), the a1 = 10; assignment is optimized away. According to my
understanding the following happens:

 1) function inlining
 2) deferred argument evaluation
 3) because our target has no barrel shifter, (arg >> 10) is emitted as a
function call to libgcc's __ashrsi3 (_in place_!)
 4) BAM! dead code elimination optimizes r8 assignment away because calli
may clobber r1-r10 (caller-saved registers on lm32).

If you use:
 void f2(int arg)
 {
   f1(__ashrsi3(arg, 10));
 }
everything works as expected, __ashrsi3 is evaluated before the body of f1.

According to Wikipedia [1], function calls are sequence points and all
side effects for the arguments are completed before entering the function.
So in my understanding the deferred argument evaluation is wrong if that
operation is emitted as a call to a libgcc helper.
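
To make the sequence-point argument concrete, here is a minimal,
target-independent sketch in plain C. The names trace, record,
shift_helper, callee and run_trace are made up for illustration, and
shift_helper merely stands in for a libgcc routine such as __ashrsi3.
It only shows the abstract-machine guarantee (the argument's side effects
are complete before the callee body runs); it says nothing about what
happens to hard register variables once the callee is inlined.

```c
/* Hypothetical trace buffer: records the order in which things happen. */
static char trace[8];
static int trace_len;

static void record(char tag)
{
    if (trace_len < 7)
        trace[trace_len++] = tag;
}

/* Stands in for a helper call with a visible side effect. */
static int shift_helper(int value)
{
    record('S');            /* 'S' = shift helper ran */
    return value >> 10;
}

/* Stands in for the body of f1. */
static void callee(int arg)
{
    (void)arg;
    record('B');            /* 'B' = callee body ran */
}

/* Returns the recorded order as a string. */
const char *run_trace(int arg)
{
    trace_len = 0;
    callee(shift_helper(arg));
    trace[trace_len] = '\0';
    return trace;
}
```

The argument expression is always recorded before the callee body, which is
the behaviour I would expect to carry over to the inlined case as well.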

I tried that on other architectures too (microblaze and avr). All show the
same behaviour. If an integer arithmetic opcode is translated to a call to
libgcc, every assignment to a register which is clobbered by the call is
optimized away.
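
For readers who have not met these helpers: the sketch below shows roughly
what a software arithmetic right shift has to compute on a target without a
barrel shifter. It is an illustration only, not the libgcc source; the name
soft_ashr32 is hypothetical, and the real __ashrsi3 may be implemented quite
differently. The relevant point is that an innocent-looking ">>" becomes an
ordinary function call, with the usual call-clobber rules.

```c
#include <stdint.h>

/* Hypothetical stand-in for a libgcc-style helper such as __ashrsi3:
   arithmetic shift right built from single-bit steps. */
int32_t soft_ashr32(int32_t value, int count)
{
    uint32_t bits = (uint32_t)value;
    uint32_t sign = bits & 0x80000000u;     /* remember the sign bit */

    /* Shift one place at a time and refill the top bit with the sign.
       Counts of 32 or more saturate to all sign bits in this sketch. */
    for (int i = 0; i < count && i < 32; i++)
        bits = (bits >> 1) | sign;

    /* Conversion back to a signed type assumes two's complement
       wrap-around, which is what GCC implements. */
    return (int32_t)bits;
}
```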

The GCC manual mentions some caveats when using explicit register variables [2]:
  In the above example, beware that a register that is call-clobbered by
  the target ABI will be overwritten by any function call in the
  assignment, including library calls for arithmetic operators. Also a
  register may be clobbered when generating some operations, like variable
  shift, memory copy or memory move on x86. Assuming it is a call-clobbered
  register, this may happen to r0 above by the assignment to p2. If you
  have to use such a register, use temporary variables for expressions
  between the register assignment.

But I think this may not apply to the case above, where the arithmetic
operator is an argument of the called function. I.e., there is a sequence
point and the statements must not be reordered.


Assembler output (lm32-gcc -O1 -S -c test.c):
f2:
	addi     sp, sp, -4
	sw       (sp+4), ra
	addi     r2, r0, 10
	calli    __ashrsi3
	scall
	lw       ra, (sp+4)
	addi     sp, sp, 4
	b        ra

Assembler output with no DCE (lm32-gcc -O1 -S -fno-dce -c test.c)
f2:
	addi     sp, sp, -4
	sw       (sp+4), ra
	addi     r8, r0, 10
	addi     r2, r0, 10
	calli    __ashrsi3
	scall
	lw       ra, (sp+4)
	addi     sp, sp, 4
	b        ra

[1] http://en.wikipedia.org/wiki/Sequence_point
[2] http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Example%20of%20asm%20with%20clobbered%20asm%20reg

-- 
Michael


* Re: libgcc: strange optimization
  2011-08-01 20:30 libgcc: strange optimization Michael Walle
@ 2011-08-01 20:51 ` Georg-Johann Lay
  2011-08-01 21:14   ` Michael Walle
  2011-08-02  6:29   ` Hans-Peter Nilsson
  2011-08-01 21:30 ` Richard Henderson
  1 sibling, 2 replies; 47+ messages in thread
From: Georg-Johann Lay @ 2011-08-01 20:51 UTC (permalink / raw)
  To: Michael Walle; +Cc: gcc

Michael Walle schrieb:
> Hi list,
> 
> consider the following test code:
>  static void inline f1(int arg)
>  {
>    register int a1 asm("r8") = 10;
>    register int a2 asm("r1") = arg;
> 
>    asm("scall" : : "r"(a1), "r"(a2));
>  }
> 
>  void f2(int arg)
>  {
>    f1(arg >> 10);
>  }
> 
> 
> If you compile this code with 'lm32-gcc -O1 -S -c test.c' (see end of this
> email), the a1 = 10; assignment is optimized away.

Your asm has no output operands and no side effects; with more
aggressive optimization the whole asm would disappear.

What you want is maybe something like

    asm volatile ("scall" : : "r"(a1), "r"(a2));

Johann


* Re: libgcc: strange optimization
  2011-08-01 20:51 ` Georg-Johann Lay
@ 2011-08-01 21:14   ` Michael Walle
  2011-08-02  6:47     ` Georg-Johann Lay
  2011-08-02  6:29   ` Hans-Peter Nilsson
  1 sibling, 1 reply; 47+ messages in thread
From: Michael Walle @ 2011-08-01 21:14 UTC (permalink / raw)
  To: Georg-Johann Lay; +Cc: gcc


Hi,

That was quick :)

> Your asm has no output operands and no side effects, with more
> aggressive optimization the whole asm would disappear.
Sorry, that was just a small test file, the original code has output operands.

The new test code:
 static int inline f1(int arg)
 {
   register int ret asm("r1");
   register int a1 asm("r8") = 10;
   register int a2 asm("r1") = arg;

   asm volatile ("scall" : "=r"(ret) : "r"(a1), "r"(a2) : "memory");

   return ret;
 }

 int f2(int arg1, int arg2)
 {
   return f1(arg1 >> 10);
 }

translates to the same assembly:
f2:
        addi     sp, sp, -4
        sw       (sp+4), ra
        addi     r2, r0, 10
        calli    __ashrsi3
        scall
        lw       ra, (sp+4)
        addi     sp, sp, 4
        b        ra

PS. R1 is the return register in the target architecture ABI.

-- 
Michael


* Re: libgcc: strange optimization
  2011-08-01 20:30 libgcc: strange optimization Michael Walle
  2011-08-01 20:51 ` Georg-Johann Lay
@ 2011-08-01 21:30 ` Richard Henderson
  2011-08-02  6:37   ` Hans-Peter Nilsson
  1 sibling, 1 reply; 47+ messages in thread
From: Richard Henderson @ 2011-08-01 21:30 UTC (permalink / raw)
  To: Michael Walle; +Cc: gcc

On 08/01/2011 01:30 PM, Michael Walle wrote:
>  1) function inlining
>  2) deferred argument evaluation
>  3) because our target has no barrel shifter, (arg >> 10) is emitted as a
> function call to libgcc's __ashrsi3 (_in place_!)
>  4) BAM! dead code elimination optimizes r8 assignment away because calli
> may clobber r1-r10 (caller-saved registers on lm32).

I'm afraid the only solution I can think of is to force F1 out-of-line.
That's the only safe way to make sure that arguments are completely
evaluated before forcing them into hard register variables.

Alternately, expose new constraints such that you don't need the
hard register variables at all.  E.g.

  asm("scall" : : "R08"(a1), "R01"(a2));

where Rxx is defined in constraints.md for every relevant register.
That'll prevent a reference to the hard register until register
allocation, at which point we'll have done the right thing with
the shift.


r~


* Re: libgcc: strange optimization
  2011-08-01 20:51 ` Georg-Johann Lay
  2011-08-01 21:14   ` Michael Walle
@ 2011-08-02  6:29   ` Hans-Peter Nilsson
  1 sibling, 0 replies; 47+ messages in thread
From: Hans-Peter Nilsson @ 2011-08-02  6:29 UTC (permalink / raw)
  To: Georg-Johann Lay; +Cc: Michael Walle, gcc

On Mon, 1 Aug 2011, Georg-Johann Lay wrote:
> Michael Walle schrieb:
> > Hi list,
> >
> > consider the following test code:
> >  static void inline f1(int arg)
> >  {
> >    register int a1 asm("r8") = 10;
> >    register int a2 asm("r1") = arg;
> >
> >    asm("scall" : : "r"(a1), "r"(a2));
> >  }
> >
> >  void f2(int arg)
> >  {
> >    f1(arg >> 10);
> >  }
> >
> >
> > If you compile this code with 'lm32-gcc -O1 -S -c test.c' (see end of this
> > email), the a1 = 10; assignment is optimized away.
>
> Your asm has no output operands and no side effects, with more aggressive
> optimization the whole asm would disappear.

No, for the record that's not supposed to happen for asms
*without outputs*.

"If an @code{asm} has output operands, GCC assumes for
optimization purposes the instruction has no side effects except
to change the output
operands."

> What you want is maybe something like
>
>    asm volatile ("scall" : : "r"(a1), "r"(a2));

For the code at hand, the scall should be described to both have
an output and be marked volatile, since the system call is a
side effect that GCC can't see and might otherwise optimize away
if the system call return value is unused.  A plain volatile
marking as the above should not be necessary, modulo gcc bugs.

The real problem is quite worrisome.  I don't think a port
(lm32) should have to solve it with constraints; the (inline)
function parameter *should* cause a non-clobbering temporary to
hold any intermediate operations, but it looks as if you'll
otherwise have to debug it yourself.

brgds, H-P


* Re: libgcc: strange optimization
  2011-08-01 21:30 ` Richard Henderson
@ 2011-08-02  6:37   ` Hans-Peter Nilsson
  2011-08-02  8:49     ` Mikael Pettersson
  0 siblings, 1 reply; 47+ messages in thread
From: Hans-Peter Nilsson @ 2011-08-02  6:37 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Michael Walle, gcc

On Mon, 1 Aug 2011, Richard Henderson wrote:

> On 08/01/2011 01:30 PM, Michael Walle wrote:
> >  1) function inlining
> >  2) deferred argument evaluation
> >  3) because our target has no barrel shifter, (arg >> 10) is emitted as a
> > function call to libgcc's __ashrsi3 (_in place_!)
> >  4) BAM! dead code elimination optimizes r8 assignment away because calli
> > may clobber r1-r10 (caller-saved registers on lm32).
>
> I'm afraid the only solution I can think of is to force F1 out-of-line.

Or another temporary - but the parameter should already have
that effect.

brgds, H-P


* Re: libgcc: strange optimization
  2011-08-01 21:14   ` Michael Walle
@ 2011-08-02  6:47     ` Georg-Johann Lay
  0 siblings, 0 replies; 47+ messages in thread
From: Georg-Johann Lay @ 2011-08-02  6:47 UTC (permalink / raw)
  To: Michael Walle; +Cc: gcc

Michael Walle wrote:
> Hi,
> 
> That was quick :)
> 
>> Your asm has no output operands and no side effects, with more
>> aggressive optimization the whole asm would disappear.
> Sorry, that was just a small test file, the original code has output operands.
> 
> The new test code:
>  static int inline f1(int arg)
>  {
>    register int ret asm("r1");
>    register int a1 asm("r8") = 10;
>    register int a2 asm("r1") = arg;
> 
>    asm volatile ("scall" : "=r"(ret) : "r"(a1), "r"(a2) : "memory");
> 
>    return ret;
>  }
> 
>  int f2(int arg1, int arg2)
>  {
>    return f1(arg1 >> 10);
>  }
> 
> translates to the same assembly:
> f2:
>         addi     sp, sp, -4
>         sw       (sp+4), ra
>         addi     r2, r0, 10
>         calli    __ashrsi3
>         scall
>         lw       ra, (sp+4)
>         addi     sp, sp, 4
>         b        ra
> 
> PS. R1 is the return register in the target architecture ABI.

I'd guess you ran into

http://gcc.gnu.org/onlinedocs/gcc/Local-Reg-Vars.html#Local-Reg-Vars

A common pitfall is to initialize multiple call-clobbered registers
with arbitrary expressions,  where a function call or  library call
for an arithmetic operator will overwrite a register value from a
previous assignment, for example r0 below:

     register int *p1 asm ("r0") = ...;
     register int *p2 asm ("r1") = ...;

In those cases, a solution is to use a temporary variable for each
arbitrary expression.

So I'd try to rewrite it as

static int inline f1 (int arg0)
{
    int arg = arg0;
    register int ret asm("r1");
    register int a1 asm("r8") = 10;
    register int a2 asm("r1") = arg;

    asm volatile ("scall" : "=r"(ret) : "r"(a1), "r"(a2) : "memory");

    return ret;
}

and if that does not help the rather hackish

static int inline f1 (int arg0)
{
    int arg = arg0;
    register int ret asm("r1");
    register int a1 asm("r8");
    register int a2 asm("r1");

    asm ("" : "+r" (arg));

    a1 = 10;
    a2 = arg;

    asm volatile ("scall" : "=r"(ret) : "r"(a1), "r"(a2) : "memory");

    return ret;
}
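
The empty asm in the second variant can be tried in isolation on any GCC
target. As far as I understand it, the "+r" operand makes GCC assume that the
asm both reads and rewrites the value, so whatever computes it (including a
library call) has to be finished before that point. The helper names
materialize and shifted_then_pinned below are hypothetical, and the sketch
uses no hard register variables at all; it only demonstrates the barrier
idiom and that it leaves the value unchanged.

```c
/* Force `value` to be fully computed here: GCC has to assume the empty
   asm reads and modifies it, so the computation cannot sink past it. */
static inline int materialize(int value)
{
#if defined(__GNUC__)
    __asm__ ("" : "+r"(value));
#endif
    return value;
}

/* Mirrors the shape of f2/f1: shift first, pin the result, then use it. */
int shifted_then_pinned(int arg)
{
    int tmp = materialize(arg >> 10);   /* shift is complete after this */
    return tmp + 10;                    /* stands in for the later asm  */
}
```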


* Re: libgcc: strange optimization
  2011-08-02  6:37   ` Hans-Peter Nilsson
@ 2011-08-02  8:49     ` Mikael Pettersson
  2011-08-02  9:47       ` Richard Guenther
  0 siblings, 1 reply; 47+ messages in thread
From: Mikael Pettersson @ 2011-08-02  8:49 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: Richard Henderson, Michael Walle, gcc

Hans-Peter Nilsson writes:
 > On Mon, 1 Aug 2011, Richard Henderson wrote:
 > 
 > > On 08/01/2011 01:30 PM, Michael Walle wrote:
 > > >  1) function inlining
 > > >  2) deferred argument evaluation
 > > >  3) because our target has no barrel shifter, (arg >> 10) is emitted as a
 > > > function call to libgcc's __ashrsi3 (_in place_!)
 > > >  4) BAM! dead code elimination optimizes r8 assignment away because calli
 > > > may clobber r1-r10 (caller-saved registers on lm32).
 > >
 > > I'm afraid the only solution I can think of is to force F1 out-of-line.
 > 
 > Or another temporary - but the parameter should already have
 > that effect.

It should, but doesn't.  See PR48863 for similar breakage on ARM.

/Mikael


* Re: libgcc: strange optimization
  2011-08-02  8:49     ` Mikael Pettersson
@ 2011-08-02  9:47       ` Richard Guenther
  2011-08-02 10:02         ` Georg-Johann Lay
  0 siblings, 1 reply; 47+ messages in thread
From: Richard Guenther @ 2011-08-02  9:47 UTC (permalink / raw)
  To: Mikael Pettersson
  Cc: Hans-Peter Nilsson, Richard Henderson, Michael Walle, gcc

On Tue, Aug 2, 2011 at 10:48 AM, Mikael Pettersson <mikpe@it.uu.se> wrote:
> Hans-Peter Nilsson writes:
>  > On Mon, 1 Aug 2011, Richard Henderson wrote:
>  >
>  > > On 08/01/2011 01:30 PM, Michael Walle wrote:
>  > > >  1) function inlining
>  > > >  2) deferred argument evaluation
>  > > >  3) because our target has no barrel shifter, (arg >> 10) is emitted as a
>  > > > function call to libgcc's __ashrsi3 (_in place_!)
>  > > >  4) BAM! dead code elimination optimizes r8 assignment away because calli
>  > > > may clobber r1-r10 (caller-saved registers on lm32).
>  > >
>  > > I'm afraid the only solution I can think of is to force F1 out-of-line.
>  >
>  > Or another temporary - but the parameter should already have
>  > that effect.
>
> It should, but doesn't.  See PR48863 for similar breakage on ARM.

On GIMPLE we don't see either the libcall nor those "dependencies".

Don't use register vars.

Richard.

> /Mikael
>


* Re: libgcc: strange optimization
  2011-08-02  9:47       ` Richard Guenther
@ 2011-08-02 10:02         ` Georg-Johann Lay
  2011-08-02 10:11           ` Richard Guenther
  0 siblings, 1 reply; 47+ messages in thread
From: Georg-Johann Lay @ 2011-08-02 10:02 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Mikael Pettersson, Hans-Peter Nilsson, Richard Henderson,
	Michael Walle, gcc

Richard Guenther wrote:
> On Tue, Aug 2, 2011 at 10:48 AM, Mikael Pettersson <mikpe@it.uu.se> wrote:
>> Hans-Peter Nilsson writes:
>>  > On Mon, 1 Aug 2011, Richard Henderson wrote:
>>  >
>>  > > On 08/01/2011 01:30 PM, Michael Walle wrote:
>>  > > >  1) function inlining
>>  > > >  2) deferred argument evaluation
>>  > > >  3) because our target has no barrel shifter, (arg >> 10) is emitted as a
>>  > > > function call to libgcc's __ashrsi3 (_in place_!)
>>  > > >  4) BAM! dead code elimination optimizes r8 assignment away because calli
>>  > > > may clobber r1-r10 (caller-saved registers on lm32).
>>  > >
>>  > > I'm afraid the only solution I can think of is to force F1 out-of-line.
>>  >
>>  > Or another temporary - but the parameter should already have
>>  > that effect.
>>
>> It should, but doesn't.  See PR48863 for similar breakage on ARM.
> 
> On GIMPLE we don't see either the libcall nor those "dependencies".
> 
> Don't use register vars.

IMO such code is supposed to work, e.g. in order to write an interface
to a non-ABI assembler function.  In general this cannot be expressed
by means of constraints because that would imply plethora of different
constraints for each register/mode.

From the documentation a user will expect that he wrote correct code
and is not supposed to bother with GCC internals like implicit library
calls or GIMPLE or whatever.

Johann

> Richard.



* Re: libgcc: strange optimization
  2011-08-02 10:02         ` Georg-Johann Lay
@ 2011-08-02 10:11           ` Richard Guenther
  2011-08-02 10:55             ` Michael Walle
  0 siblings, 1 reply; 47+ messages in thread
From: Richard Guenther @ 2011-08-02 10:11 UTC (permalink / raw)
  To: Georg-Johann Lay
  Cc: Mikael Pettersson, Hans-Peter Nilsson, Richard Henderson,
	Michael Walle, gcc

On Tue, Aug 2, 2011 at 12:01 PM, Georg-Johann Lay <avr@gjlay.de> wrote:
> Richard Guenther wrote:
>> On Tue, Aug 2, 2011 at 10:48 AM, Mikael Pettersson <mikpe@it.uu.se> wrote:
>>> Hans-Peter Nilsson writes:
>>>  > On Mon, 1 Aug 2011, Richard Henderson wrote:
>>>  >
>>>  > > On 08/01/2011 01:30 PM, Michael Walle wrote:
>>>  > > >  1) function inlining
>>>  > > >  2) deferred argument evaluation
>>>  > > >  3) because our target has no barrel shifter, (arg >> 10) is emitted as a
>>>  > > > function call to libgcc's __ashrsi3 (_in place_!)
>>>  > > >  4) BAM! dead code elimination optimizes r8 assignment away because calli
>>>  > > > may clobber r1-r10 (caller-saved registers on lm32).
>>>  > >
>>>  > > I'm afraid the only solution I can think of is to force F1 out-of-line.
>>>  >
>>>  > Or another temporary - but the parameter should already have
>>>  > that effect.
>>>
>>> It should, but doesn't.  See PR48863 for similar breakage on ARM.
>>
>> On GIMPLE we don't see either the libcall nor those "dependencies".
>>
>> Don't use register vars.
>
> IMO such code is supposed to work, e.g. in order to write an interface
> to a non-ABI assembler function.  In general this cannot be expressed
> by means of constraints because that would imply plethora of different
> constraints for each register/mode.
>
> From the documentation a user will expect that he wrote correct code
> and is not supposed to bother with GCC internals like implicit library
> calls or GIMPLE or whatever.

Well.  I suppose what is happening is that we expand from

  register int a1 __asm__ (*edi);
  register int a2 __asm__ (*eax);
  int D.2700;

<bb 2>:
  D.2700_2 = arg_1(D) >> 10;
  a1 = 10;
  a2 = D.2700_2;
  __asm__ __volatile__("scall" :  : "r" a1, "r" a2);
  return;

and end up TERing D.2700_2 = arg_1(D) >> 10, materializing a
libcall after the setting of a1 and before the asm.  To confirm that
try -fno-tree-ter.  I don't see how we can easily avoid this without
exposing libcalls at the gimple level.  Maybe disable TER if
we see any register variable use.
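
A toy model in plain C may help to see why the position of the libcall
matters. Everything here is hypothetical: regs is a pretend register file,
model_libcall_ashr pretends to be __ashrsi3 and trashes every call-clobbered
register, and model_scall pretends to be the asm by summing r8 and r1. In the
real compiler the r8 store was deleted as dead rather than overwritten, but
the visible effect is the same: the asm no longer sees 10 in r8.

```c
#include <string.h>

enum { NUM_REGS = 11 };             /* r0..r10 in this toy model */
static int regs[NUM_REGS];

/* Models a libcall: result in r1, all of r1..r10 clobbered. */
static void model_libcall_ashr(int value, int count)
{
    for (int r = 1; r < NUM_REGS; r++)
        regs[r] = -1;               /* garbage after the call */
    regs[1] = value >> count;
}

/* Models the "scall" asm, which consumes r8 and r1. */
static int model_scall(void)
{
    return regs[8] + regs[1];
}

/* Order after forwarding the shift into its use:
   a1 is set, then the libcall runs, then the asm. */
int run_forwarded_order(int arg)
{
    memset(regs, 0, sizeof regs);
    regs[8] = 10;                   /* a1 = 10 */
    model_libcall_ashr(arg, 10);    /* a2 = arg >> 10, clobbers r8 */
    return model_scall();
}

/* Order of the GIMPLE statements as written:
   the shift completes first, then a1 and a2 are set. */
int run_statement_order(int arg)
{
    memset(regs, 0, sizeof regs);
    model_libcall_ashr(arg, 10);    /* D.2700 = arg >> 10 */
    regs[8] = 10;                   /* a1 = 10 */
    return model_scall();
}
```

With arg = 20480 the statement order yields 20 + 10, while the forwarded
order sees the clobbered r8 and yields 20 + (-1).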

Richard.

> Johann
>
>> Richard.
>
>
>


* Re: libgcc: strange optimization
  2011-08-02 10:11           ` Richard Guenther
@ 2011-08-02 10:55             ` Michael Walle
  2011-08-02 12:06               ` Mikael Pettersson
  0 siblings, 1 reply; 47+ messages in thread
From: Michael Walle @ 2011-08-02 10:55 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Georg-Johann Lay, Mikael Pettersson, Hans-Peter Nilsson,
	Richard Henderson, Michael Walle, gcc


Hi,

> To confirm that try -fno-tree-ter.

"lm32-gcc -O1 -fno-tree-ter -S -c test.c" generates the following working
assembly code:

f2:
	addi     sp, sp, -4
	sw       (sp+4), ra
	addi     r2, r0, 10
	calli    __ashrsi3
	addi     r8, r0, 10
	scall
	lw       ra, (sp+4)
	addi     sp, sp, 4
	b        ra

-- 
Michael


* Re: libgcc: strange optimization
  2011-08-02 10:55             ` Michael Walle
@ 2011-08-02 12:06               ` Mikael Pettersson
  2011-08-02 12:23                 ` Richard Guenther
  0 siblings, 1 reply; 47+ messages in thread
From: Mikael Pettersson @ 2011-08-02 12:06 UTC (permalink / raw)
  To: Michael Walle
  Cc: Richard Guenther, Georg-Johann Lay, Mikael Pettersson,
	Hans-Peter Nilsson, Richard Henderson, gcc

Michael Walle writes:
 > 
 > Hi,
 > 
 > > To confirm that try -fno-tree-ter.
 > 
 > "lm32-gcc -O1 -fno-tree-ter -S -c test.c" generates the following working
 > assembly code:
 > 
 > f2:
 > 	addi     sp, sp, -4
 > 	sw       (sp+4), ra
 > 	addi     r2, r0, 10
 > 	calli    __ashrsi3
 > 	addi     r8, r0, 10
 > 	scall
 > 	lw       ra, (sp+4)
 > 	addi     sp, sp, 4
 > 	b        ra

-fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.


* Re: libgcc: strange optimization
  2011-08-02 12:06               ` Mikael Pettersson
@ 2011-08-02 12:23                 ` Richard Guenther
  2011-08-02 12:36                   ` Georg-Johann Lay
                                     ` (5 more replies)
  0 siblings, 6 replies; 47+ messages in thread
From: Richard Guenther @ 2011-08-02 12:23 UTC (permalink / raw)
  To: Mikael Pettersson
  Cc: Michael Walle, Georg-Johann Lay, Hans-Peter Nilsson,
	Richard Henderson, gcc

On Tue, Aug 2, 2011 at 2:06 PM, Mikael Pettersson <mikpe@it.uu.se> wrote:
> Michael Walle writes:
>  >
>  > Hi,
>  >
>  > > To confirm that try -fno-tree-ter.
>  >
>  > "lm32-gcc -O1 -fno-tree-ter -S -c test.c" generates the following working
>  > assembly code:
>  >
>  > f2:
>  >      addi     sp, sp, -4
>  >      sw       (sp+4), ra
>  >      addi     r2, r0, 10
>  >      calli    __ashrsi3
>  >      addi     r8, r0, 10
>  >      scall
>  >      lw       ra, (sp+4)
>  >      addi     sp, sp, 4
>  >      b        ra
>
> -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.

It's of course only a workaround, not a real fix as nothing prevents
other optimizers from performing the re-scheduling TER does.

I suggest amending the documentation for local call-clobbered register
variables to say that the only valid sequence using them is from a
non-inlinable function that contains only direct initializations of the
register variables from constants or parameters.
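
A sketch of what such a sequence could look like follows. It is deliberately
not lm32 code: the x86-64 register names r8 and rax are an assumption chosen
only so the example can be built on a common host, the lea instruction merely
stands in for scall, and the function name pinned_add is hypothetical. Other
targets fall back to plain C. The function is marked noinline and its
register variables are initialized directly from a constant and a parameter,
with nothing in between that could turn into a call.

```c
#if defined(__GNUC__) && defined(__x86_64__)

/* noinline: the caller finishes evaluating the argument (including any
   libcall) before control reaches the register setup below. */
__attribute__((noinline)) long pinned_add(long arg)
{
    register long first  __asm__("r8")  = 10;   /* direct: constant  */
    register long second __asm__("rax") = arg;  /* direct: parameter */
    long sum;

    /* Stand-in for "scall": adds the two pinned inputs. */
    __asm__ volatile ("lea (%1,%2), %0"
                      : "=r"(sum)
                      : "r"(first), "r"(second)
                      : "memory");
    return sum;
}

#else

/* Portable fallback so the sketch still builds elsewhere. */
long pinned_add(long arg)
{
    return arg + 10;
}

#endif
```

A caller would then write pinned_add(arg >> 10), and the shift, libcall or
not, is complete before r8 is written.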

Or go one step further and deprecate local register variables altogether
(they IMHO don't make much sense, and rather the targets should provide
a way to properly constrain asm inputs and outputs).

Richard.


* Re: libgcc: strange optimization
  2011-08-02 12:23                 ` Richard Guenther
@ 2011-08-02 12:36                   ` Georg-Johann Lay
  2011-08-02 12:54                   ` Hans-Peter Nilsson
                                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 47+ messages in thread
From: Georg-Johann Lay @ 2011-08-02 12:36 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Mikael Pettersson, Michael Walle, Hans-Peter Nilsson,
	Richard Henderson, gcc

Richard Guenther wrote:
> On Tue, Aug 2, 2011 at 2:06 PM, Mikael Pettersson <mikpe@it.uu.se> wrote:
>> Michael Walle writes:
>>  >
>>  > Hi,
>>  >
>>  > > To confirm that try -fno-tree-ter.
>>  >
>>  > "lm32-gcc -O1 -fno-tree-ter -S -c test.c" generates the following working
>>  > assembly code:
>>  >
>>  > f2:
>>  >      addi     sp, sp, -4
>>  >      sw       (sp+4), ra
>>  >      addi     r2, r0, 10
>>  >      calli    __ashrsi3
>>  >      addi     r8, r0, 10
>>  >      scall
>>  >      lw       ra, (sp+4)
>>  >      addi     sp, sp, 4
>>  >      b        ra
>>
>> -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.
> 
> It's of course only a workaround, not a real fix as nothing prevents
> other optimizers from performing the re-scheduling TER does.
> 
> I suggest to amend the documentation for local call-clobbered register
> variables to say that the only valid sequence using them is from a
> non-inlinable function that contains only direct initializations of the
> register variables from constants or parameters.
> 
> Or go one step further and deprecate local register variables altogether
> (they IMHO don't make much sense, and rather the targets should provide
> a way to properly constrain asm inputs and outputs).

Strongly oppose.

Local register variables are very useful; maybe not on a linux machine
but on embedded systems there are situations you cannot do without.

Have you ever counted the constraint alternatives that would be needed?

You'd need different constraints for QI/HI/SI/DI for each register
resulting in myriads of register classes increasing register allocation
time and dumps would become impossible to read.  I once tried it for
a target...never again.

Besides that, with local register vars the developer can write code that
meets his requirements, whereas for constraints you will always have to
change the compiler and make existing sources incompatible.

If GCC provides such a feature it must work properly and not be sacrificed
for this or that optimization.

Correct code is always preferred over non-functional code.

Johann

> 
> Richard.
> 


* Re: libgcc: strange optimization
  2011-08-02 12:23                 ` Richard Guenther
  2011-08-02 12:36                   ` Georg-Johann Lay
@ 2011-08-02 12:54                   ` Hans-Peter Nilsson
  2011-08-02 13:09                     ` Richard Guenther
  2011-08-02 13:23                   ` Ian Lance Taylor
                                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 47+ messages in thread
From: Hans-Peter Nilsson @ 2011-08-02 12:54 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Mikael Pettersson, Michael Walle, Georg-Johann Lay,
	Richard Henderson, gcc


On Tue, 2 Aug 2011, Richard Guenther wrote:
> On Tue, Aug 2, 2011 at 2:06 PM, Mikael Pettersson <mikpe@it.uu.se> wrote:
> > Michael Walle writes:
> >  >
> >  > Hi,
> >  >
> >  > > To confirm that try -fno-tree-ter.
> >  >
> >  > "lm32-gcc -O1 -fno-tree-ter -S -c test.c" generates the following working
> >  > assembly code:
> >  >
> >  > f2:
> >  >      addi     sp, sp, -4
> >  >      sw       (sp+4), ra
> >  >      addi     r2, r0, 10
> >  >      calli    __ashrsi3
> >  >      addi     r8, r0, 10
> >  >      scall
> >  >      lw       ra, (sp+4)
> >  >      addi     sp, sp, 4
> >  >      b        ra
> >
> > -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.
>
> It's of course only a workaround, not a real fix as nothing prevents
> other optimizers from performing the re-scheduling TER does.
>
> I suggest to amend the documentation for local call-clobbered register
> variables to say that the only valid sequence using them is from a
> non-inlinable function that contains only direct initializations of the
> register variables from constants or parameters.

I'd be ok with that, FWIW; I see the problem with keeping the
scheduling of operations in a working order (yuck) and I don't
see how else to keep it working ...except perhaps make gcc flag
functions with register asms as non-inlinable, maybe even flag
down any of the dangerous re-scheduling?

Maybe I can do that with some hand-holding?

> Or go one step further and deprecate local register variables altogether
> (they IMHO don't make much sense, and rather the targets should provide
> a way to properly constrain asm inputs and outputs).

They do make sense when implementing e.g. system calls, and
they're documented to work as discussed.  (I almost regret
making that happen, though.)  Fortunately such functions are
small, and not relatively much helped by inlining (it's a
*syscall*; much more happening beyond the call than is affected
by inlining some parameter initialization).  Sure, new targets
are much better off by implementing that through other means,
but preferably intrinsic functions to asms.

brgds, H-P


* Re: libgcc: strange optimization
  2011-08-02 12:54                   ` Hans-Peter Nilsson
@ 2011-08-02 13:09                     ` Richard Guenther
  2011-08-02 13:16                       ` Hans-Peter Nilsson
  2011-08-03  4:59                       ` Miles Bader
  0 siblings, 2 replies; 47+ messages in thread
From: Richard Guenther @ 2011-08-02 13:09 UTC (permalink / raw)
  To: Hans-Peter Nilsson
  Cc: Mikael Pettersson, Michael Walle, Georg-Johann Lay,
	Richard Henderson, gcc

On Tue, Aug 2, 2011 at 2:53 PM, Hans-Peter Nilsson <hp@bitrange.com> wrote:
> On Tue, 2 Aug 2011, Richard Guenther wrote:
>> On Tue, Aug 2, 2011 at 2:06 PM, Mikael Pettersson <mikpe@it.uu.se> wrote:
>> > Michael Walle writes:
>> >  >
>> >  > Hi,
>> >  >
>> >  > > To confirm that try -fno-tree-ter.
>> >  >
>> >  > "lm32-gcc -O1 -fno-tree-ter -S -c test.c" generates the following working
>> >  > assembly code:
>> >  >
>> >  > f2:
>> >  >      addi     sp, sp, -4
>> >  >      sw       (sp+4), ra
>> >  >      addi     r2, r0, 10
>> >  >      calli    __ashrsi3
>> >  >      addi     r8, r0, 10
>> >  >      scall
>> >  >      lw       ra, (sp+4)
>> >  >      addi     sp, sp, 4
>> >  >      b        ra
>> >
>> > -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.
>>
>> It's of course only a workaround, not a real fix as nothing prevents
>> other optimizers from performing the re-scheduling TER does.
>>
>> I suggest to amend the documentation for local call-clobbered register
>> variables to say that the only valid sequence using them is from a
>> non-inlinable function that contains only direct initializations of the
>> register variables from constants or parameters.
>
> I'd be ok with that, FWIW; I see the problem with keeping the
> scheduling of operations in a working order (yuck) and I don't
> see how else to keep it working ...except perhaps make gcc flag
> functions with register asms as non-inlinable, maybe even flag
> down any of the dangerous re-scheduling?

But then can't people use a pure assembler stub instead?  Without
inlining there isn't much benefit left from writing

 void f1(int arg)
 {
  register int a1 asm("r8") = 10;
  register int a2 asm("r1") = arg;

  asm("scall" : : "r"(a1), "r"(a2));
 }

instead of

f1:
 mov r8, 10
 mov r1, rX
 scall
 ret

in a .s file, no?  I doubt much prologue/epilogue is needed.

Or even write

void f1(int arg)
{
 asm("mov r8, %0; mov r1, %1; scall;" : : "g"(10), "g"(arg) : "r8", "r1");
}

which should be inlinable again (yes, in inlined form not optimally
register-allocated, but compared to the non-inline routine?).

Richard.

> Maybe I can do that with some hand-holding?
>
>> Or go one step further and deprecate local register variables altogether
>> (they IMHO don't make much sense, and rather the targets should provide
>> a way to properly constrain asm inputs and outputs).
>
> They do make sense when implementing e.g. system calls, and
> they're documented to work as discussed.  (I almost regret
> making that happen, though.)  Fortunately such functions are
> small, and not relatively much helped by inlining (it's a
> *syscall*; much more happening beyond the call than is affected
> by inlining some parameter initialization).  Sure, new targets
> are much better off by implementing that through other means,
> but preferably intrinsic functions to asms.
>
> brgds, H-P


* Re: libgcc: strange optimization
  2011-08-02 13:09                     ` Richard Guenther
@ 2011-08-02 13:16                       ` Hans-Peter Nilsson
  2011-08-03  4:59                       ` Miles Bader
  1 sibling, 0 replies; 47+ messages in thread
From: Hans-Peter Nilsson @ 2011-08-02 13:16 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Mikael Pettersson, Michael Walle, Georg-Johann Lay,
	Richard Henderson, gcc

On Tue, 2 Aug 2011, Richard Guenther wrote:
> > I'd be ok with that, FWIW; I see the problem with keeping the
> > scheduling of operations in a working order (yuck) and I don't
> > see how else to keep it working ...except perhaps make gcc flag
> > functions with register asms as non-inlinable, maybe even flag
> > down any of the dangerous re-scheduling?
>
> But then can't people use a pure assembler stub instead?

I see your point, but you're thinking of new code; I'm thinking
let's keep existing code, used by several targets and documented
as working for the last seven years, working.  Maybe breakable with a
major release; in gcc-5.  (Oh no, I see what's coming. :)

brgds, H-P

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: libgcc: strange optimization
  2011-08-02 12:23                 ` Richard Guenther
  2011-08-02 12:36                   ` Georg-Johann Lay
  2011-08-02 12:54                   ` Hans-Peter Nilsson
@ 2011-08-02 13:23                   ` Ian Lance Taylor
  2011-08-02 13:42                     ` Richard Guenther
  2011-08-02 16:03                   ` Richard Henderson
                                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 47+ messages in thread
From: Ian Lance Taylor @ 2011-08-02 13:23 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Mikael Pettersson, Michael Walle, Georg-Johann Lay,
	Hans-Peter Nilsson, Richard Henderson, gcc

Richard Guenther <richard.guenther@gmail.com> writes:

> Or go one step further and deprecate local register variables altogether
> (they IMHO don't make much sense, and rather the targets should provide
> a way to properly constrain asm inputs and outputs).

No, local register variables are documented as working and many programs
rely on them.  They are a straightforward way to get an asm argument in
a specific register, and I don't see any reason to break that.
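
As a sketch of the idiom in question, here is a one-argument syscall
wrapper on x86-64 Linux (an assumed target; the name regvar_syscall1 is
hypothetical).  Each register variable is initialized directly from a
parameter and then handed to the asm as an operand, which is the use
the documentation describes.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: the local register variables pin the asm
   operands to rax and rdi; the asm body is only the trap instruction. */
static inline long regvar_syscall1(long nr, long a1)
{
#if defined(__x86_64__) && defined(__linux__)
    register long r_nr __asm__("rax") = nr;  /* syscall number, result */
    register long r_a1 __asm__("rdi") = a1;  /* first argument */
    __asm__ volatile("syscall"
                     : "+r"(r_nr)
                     : "r"(r_a1)
                     : "rcx", "r11", "memory", "cc");
    return r_nr;
#else
    /* Assumption: other targets go through libc and mimic the
       kernel's negative-errno return convention. */
    long r = syscall(nr, a1);
    return r == -1 ? -(long)errno : r;
#endif
}
```

Because the initializers are plain parameters, nothing that could turn
into a function call sits between the register setup and the asm.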

> I suggest to amend the documentation for local call-clobbered register
> variables to say that the only valid sequence using them is from a
> non-inlinable function that contains only direct initializations of the
> register variables from constants or parameters.

Let's just implement those requirements in the compiler itself.

Ian

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: libgcc: strange optimization
  2011-08-02 13:23                   ` Ian Lance Taylor
@ 2011-08-02 13:42                     ` Richard Guenther
  2011-08-02 14:35                       ` Ian Lance Taylor
  2011-08-03  9:12                       ` Ulrich Weigand
  0 siblings, 2 replies; 47+ messages in thread
From: Richard Guenther @ 2011-08-02 13:42 UTC (permalink / raw)
  To: Ian Lance Taylor
  Cc: Mikael Pettersson, Michael Walle, Georg-Johann Lay,
	Hans-Peter Nilsson, Richard Henderson, gcc

On Tue, Aug 2, 2011 at 3:23 PM, Ian Lance Taylor <iant@google.com> wrote:
> Richard Guenther <richard.guenther@gmail.com> writes:
>
>> Or go one step further and deprecate local register variables altogether
>> (they IMHO don't make much sense, and rather the targets should provide
>> a way to properly constrain asm inputs and outputs).
>
> No, local register variables are documented as working and many programs
> rely on them.  They are a straightforward way to get an asm argument in
> a specific register, and I don't see any reason to break that.

Well, maybe they look like so.  But in reality there is _no_ connection
from the register setup to the actual asm.  Which is the problem the
compiler faces here (apart from the libcall issue).  If there should be
an implicit dependence of all asms to all local register var setters
and users then this isn't implemented on gimple (or rather it works
by chance there as we treat register vars as memory and do not
disambiguate anything across asms (yet)).

>> I suggest to amend the documentation for local call-clobbered register
>> variables to say that the only valid sequence using them is from a
>> non-inlinable function that contains only direct initializations of the
>> register variables from constants or parameters.
>
> Let's just implement those requirements in the compiler itself.

Doesn't work for existing code, no?  And if thinking new code then
I'd rather have explicit dependences (and a way to represent them).
Thus, for example

asm ("scall" : : "asm("r0")" (10), ...)

thus, why force new constraints when we already can figure out
local register vars by register name?  Why not extend the constraint
syntax somehow to allow specifying the same effect?

Richard.

> Ian
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: libgcc: strange optimization
  2011-08-02 13:42                     ` Richard Guenther
@ 2011-08-02 14:35                       ` Ian Lance Taylor
  2011-08-03  9:12                       ` Ulrich Weigand
  1 sibling, 0 replies; 47+ messages in thread
From: Ian Lance Taylor @ 2011-08-02 14:35 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Mikael Pettersson, Michael Walle, Georg-Johann Lay,
	Hans-Peter Nilsson, Richard Henderson, gcc

Richard Guenther <richard.guenther@gmail.com> writes:

> On Tue, Aug 2, 2011 at 3:23 PM, Ian Lance Taylor <iant@google.com> wrote:
>> Richard Guenther <richard.guenther@gmail.com> writes:
>>
>>> Or go one step further and deprecate local register variables altogether
>>> (they IMHO don't make much sense, and rather the targets should provide
>>> a way to properly constrain asm inputs and outputs).
>>
>> No, local register variables are documented as working and many programs
>> rely on them.  They are a straightforward way to get an asm argument in
>> a specific register, and I don't see any reason to break that.
>
> Well, maybe they look like so.  But in reality there is _no_ connection
> from the register setup to the actual asm.  Which is the problem the
> compiler faces here (apart from the libcall issue).  If there should be
> an implicit dependence of all asms to all local register var setters
> and users then this isn't implemented on gimple (or rather it works
> by chance there as we treat register vars as memory and do not
> disambiguate anything across asms (yet)).

I'm not sure why we need to do anything at the GIMPLE level other than
disable some optimizations.  There is a connection from the register
variable to the asm--the asm refers to the variable.  There is nothing
specific about the register in there, but at the GIMPLE level there
doesn't have to be.

We should not break a useful existing feature because we find it
inconvenient.  Let's just disable some optimizations so that it
continues to work.


>>> I suggest to amend the documentation for local call-clobbered register
>>> variables to say that the only valid sequence using them is from a
>>> non-inlinable function that contains only direct initializations of the
>>> register variables from constants or parameters.
>>
>> Let's just implement those requirements in the compiler itself.
>
> Doesn't work for existing code, no?

Why not?


> And if thinking new code then
> I'd rather have explicit dependences (and a way to represent them).
> Thus, for example
>
> asm ("scall" : : "asm("r0")" (10), ...)
>
> thus, why force new constraints when we already can figure out
> local register vars by register name?  Why not extend the constraint
> syntax somehow to allow specifying the same effect?

I agree that it would be a good idea to permit asms to indicate the
specific register the operand should go into.

Ian

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: libgcc: strange optimization
  2011-08-02 12:23                 ` Richard Guenther
                                     ` (2 preceding siblings ...)
  2011-08-02 13:23                   ` Ian Lance Taylor
@ 2011-08-02 16:03                   ` Richard Henderson
  2011-08-02 20:10                     ` Richard Guenther
  2011-08-02 17:21                   ` Georg-Johann Lay
  2011-08-09 16:55                   ` Richard Earnshaw
  5 siblings, 1 reply; 47+ messages in thread
From: Richard Henderson @ 2011-08-02 16:03 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Mikael Pettersson, Michael Walle, Georg-Johann Lay,
	Hans-Peter Nilsson, gcc

On 08/02/2011 05:22 AM, Richard Guenther wrote:
>> -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.
> 
> It's of course only a workaround, not a real fix as nothing prevents
> other optimizers from performing the re-scheduling TER does.
> 
> I suggest to amend the documentation for local call-clobbered register
> variables to say that the only valid sequence using them is from a
> non-inlinable function that contains only direct initializations of the
> register variables from constants or parameters.
> 
> Or go one step further and deprecate local register variables altogether
> (they IMHO don't make much sense, and rather the targets should provide
> a way to properly constrain asm inputs and outputs).

Neither of these is a viable option.

What we might be able to do is throttle TER when the destination
is a local register variable.  This should unbreak the common case
of local regs immediately surrounding an asm.
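
Independent of any compiler change, the source-level discipline quoted
above can be sketched as follows: keep the function with the register
variables out of line, initialize them only from constants or
parameters, and evaluate any expression in the caller first.  This is
only an illustration under assumptions: it targets x86-64, where r8/r9
stand in for the lm32 registers, and a plain "lea" stands in for
"scall" so that the result can be checked.

```c
/* Out-of-line function: the register variables are initialized
   directly from a constant and a parameter, nothing else. */
static __attribute__((noinline)) long f1(long arg)
{
#if defined(__x86_64__)
    register long a1 __asm__("r8") = 10;
    register long a2 __asm__("r9") = arg;
    long sum;
    /* Stand-in for "scall": adds the two fixed registers. */
    __asm__("lea (%%r8,%%r9), %0" : "=r"(sum) : "r"(a1), "r"(a2));
    return sum;
#else
    /* Assumption: portable fallback with the same result. */
    return 10 + arg;
#endif
}

long f2(long arg)
{
    /* The shift (a possible libgcc call on targets without a barrel
       shifter) is fully evaluated here, before f1 touches r8/r9. */
    long shifted = arg >> 10;
    return f1(shifted);
}
```

With this shape f2(10240) yields 20, at the cost of a real call to f1,
which is the overhead the inlining discussion is about.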


r~

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: libgcc: strange optimization
  2011-08-02 12:23                 ` Richard Guenther
                                     ` (3 preceding siblings ...)
  2011-08-02 16:03                   ` Richard Henderson
@ 2011-08-02 17:21                   ` Georg-Johann Lay
  2011-08-09 16:55                   ` Richard Earnshaw
  5 siblings, 0 replies; 47+ messages in thread
From: Georg-Johann Lay @ 2011-08-02 17:21 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Mikael Pettersson, Michael Walle, Hans-Peter Nilsson,
	Richard Henderson, gcc

Richard Guenther schrieb:
> I suggest to amend the documentation for local call-clobbered register
> variables to say that the only valid sequence using them is from a
> non-inlinable function that contains only direct initializations of the
> register variables from constants or parameters.
> 
> Richard.

That's completely counterproductive.

If a developer resorts to asm or local register variables, he has a
very good reason for that choice, like meeting hard (with hard as
in HARD) real-time constraints.  Disabling inlining of a function
that uses local register vars would make local register vars
unusable in many places because there would no longer be a way to
write down the exact register usage footprint of a piece of
asm.  Typical use cases are interfacing to a function that has a
smaller register footprint than an ordinary black-box function, or
doing some arithmetic that needs special hard regs for which there
are no fitting constraints.  All this will be impossible if inlining
is disabled for such functions; then it is no longer possible to
describe such a low-overhead piece of code without calling a black-box
function, clobbering all call-clobbered registers, turning what might
be a tail call into a non-tail call, etc.

Embedded systems, like those based on ARM or PowerPC, have become more
and more important over the years, and GCC should pay more attention to
them and their needs, not only to high-end servers/PCs.
This includes systems with hard real-time constraints.

Johann

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: libgcc: strange optimization
  2011-08-02 16:03                   ` Richard Henderson
@ 2011-08-02 20:10                     ` Richard Guenther
  0 siblings, 0 replies; 47+ messages in thread
From: Richard Guenther @ 2011-08-02 20:10 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Mikael Pettersson, Michael Walle, Georg-Johann Lay,
	Hans-Peter Nilsson, gcc

On Tue, Aug 2, 2011 at 6:02 PM, Richard Henderson <rth@redhat.com> wrote:
> On 08/02/2011 05:22 AM, Richard Guenther wrote:
>>> -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.
>>
>> It's of course only a workaround, not a real fix as nothing prevents
>> other optimizers from performing the re-scheduling TER does.
>>
>> I suggest to amend the documentation for local call-clobbered register
>> variables to say that the only valid sequence using them is from a
>> non-inlinable function that contains only direct initializations of the
>> register variables from constants or parameters.
>>
>> Or go one step further and deprecate local register variables altogether
>> (they IMHO don't make much sense, and rather the targets should provide
>> a way to properly constrain asm inputs and outputs).
>
> Neither of these is a viable option.
>
> What we might be able to do is throttle TER when the destination
> is a local register variable.  This should unbreak the common case
> of local regs immediately surrounding an asm.

Sure, similar to disabling TER for functions containing such vars.
But it isn't a solution for the general issue that nothing prevents
scheduling gimple statements between register variable def and use.

Richard.

>
> r~
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: libgcc: strange optimization
  2011-08-02 13:09                     ` Richard Guenther
  2011-08-02 13:16                       ` Hans-Peter Nilsson
@ 2011-08-03  4:59                       ` Miles Bader
  1 sibling, 0 replies; 47+ messages in thread
From: Miles Bader @ 2011-08-03  4:59 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Hans-Peter Nilsson, Mikael Pettersson, Michael Walle,
	Georg-Johann Lay, Richard Henderson, gcc

Richard Guenther <richard.guenther@gmail.com> writes:
> But then can't people use a pure assembler stub instead?  Without
> inlining there isn't much benefit left from writing
>
>  void f1(int arg)
>  {
>   register int a1 asm("r8") = 10;
>   register int a2 asm("r1") = arg;
>
>   asm("scall" : : "r"(a1), "r"(a2));
>  }
>
> instead of
>
> f1:
>  mov r8, 10
>  mov r1, rX
>  scall
>  ret
>
> in a .s file no?  I doubt much prologue/epilogue is needed.
>
> Or even write
>
> void f1(int arg)
> {
>  asm("mov r8, %0; mov r1, %1; scall;" : : "g"(10), "g"(arg) : "r8", "r1");
> }

Of course in practice people _do_ want to use it with f1 inlined, where
using reg variables (or alternatively, some expanded constraint language
for the asm parameters) can really get rid of tons of unnecessary asm
moves, and they want the compiler to guard against conflicts.

-Miles

-- 
"Whatever you do will be insignificant, but it is very important that
 you do it."  Mahatma Gandhi

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: libgcc: strange optimization
  2011-08-02 13:42                     ` Richard Guenther
  2011-08-02 14:35                       ` Ian Lance Taylor
@ 2011-08-03  9:12                       ` Ulrich Weigand
  2011-08-03  9:51                         ` Georg-Johann Lay
  2011-08-04  0:20                         ` Hans-Peter Nilsson
  1 sibling, 2 replies; 47+ messages in thread
From: Ulrich Weigand @ 2011-08-03  9:12 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Ian Lance Taylor, Mikael Pettersson, Michael Walle,
	Georg-Johann Lay, Hans-Peter Nilsson, Richard Henderson, gcc

Richard Guenther wrote:
> On Tue, Aug 2, 2011 at 3:23 PM, Ian Lance Taylor <iant@google.com> wrote:
> > Richard Guenther <richard.guenther@gmail.com> writes:
> >> I suggest to amend the documentation for local call-clobbered register
> >> variables to say that the only valid sequence using them is from a
> >> non-inlinable function that contains only direct initializations of the
> >> register variables from constants or parameters.
> >
> > Let's just implement those requirements in the compiler itself.
> 
> Doesn't work for existing code, no?  And if thinking new code then
> I'd rather have explicit dependences (and a way to represent them).
> Thus, for example
> 
> asm ("scall" : : "asm("r0")" (10), ...)
> 
> thus, why force new constraints when we already can figure out
> local register vars by register name?  Why not extend the constraint
> syntax somehow to allow specifying the same effect?

Maybe it would be possible to implement this while keeping the syntax
of existing code by (re-)defining the semantics of register asm to
basically say that:

 If a variable X is declared as register asm for register Y, and X
 is later on used as operand to an inline asm, the register allocator
 will choose register Y to hold that asm operand.  (And this is the
 full specification of register asm semantics, nothing beyond this
 is guaranteed.)

It seems this semantics could be implemented very early on, probably
in the frontend itself.  The frontend would mark the *asm* statement
as using the specified register (there would be no special handling
of the *variable* as such, after the frontend is done).  The optimizers
would then simply be required to pass the asm-statement register
annotations through, much like today they pass constraints through.
At the point where register allocation decisions are made, those
register annotations would then be acted on.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: libgcc: strange optimization
  2011-08-03  9:12                       ` Ulrich Weigand
@ 2011-08-03  9:51                         ` Georg-Johann Lay
  2011-08-03 10:04                           ` Richard Guenther
  2011-08-04  0:20                         ` Hans-Peter Nilsson
  1 sibling, 1 reply; 47+ messages in thread
From: Georg-Johann Lay @ 2011-08-03  9:51 UTC (permalink / raw)
  To: Ulrich Weigand
  Cc: Richard Guenther, Ian Lance Taylor, Mikael Pettersson,
	Michael Walle, Hans-Peter Nilsson, Richard Henderson, gcc

Ulrich Weigand wrote:
> Richard Guenther wrote:
>> On Tue, Aug 2, 2011 at 3:23 PM, Ian Lance Taylor <iant@google.com> wrote:
>>> Richard Guenther <richard.guenther@gmail.com> writes:
>>>> I suggest to amend the documentation for local call-clobbered register
>>>> variables to say that the only valid sequence using them is from a
>>>> non-inlinable function that contains only direct initializations of the
>>>> register variables from constants or parameters.
>>> Let's just implement those requirements in the compiler itself.
>> Doesn't work for existing code, no?  And if thinking new code then
>> I'd rather have explicit dependences (and a way to represent them).
>> Thus, for example
>>
>> asm ("scall" : : "asm("r0")" (10), ...)
>>
>> thus, why force new constraints when we already can figure out
>> local register vars by register name?  Why not extend the constraint
>> syntax somehow to allow specifying the same effect?

Yes, this would be exactly equivalent to

  register int var asm ("r0") = 10;
  ...
  asm ("scall" : : "r" (var), ...)


> Maybe it would be possible to implement this while keeping the syntax
> of existing code by (re-)defining the semantics of register asm to
> basically say that:
> 
>  If a variable X is declared as register asm for register Y, and X
>  is later on used as operand to an inline asm, the register allocator
>  will choose register Y to hold that asm operand.  (And this is the
>  full specification of register asm semantics, nothing beyond this
>  is guaranteed.)

Yes, that's reasonable.  As I understand the docs, in code like

void foo ()
{
   register int var asm ("r1") = 10;
   asm (";; use r1");
}

there is nothing that connects var to the asm and assuming that
r1 holds 10 in the asm is a user error.
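
For contrast, a sketch of the case that is covered: the variable is
passed to the asm as an operand, so the asm body may rely on the named
register holding its value.  x86-64 and r10 are assumptions chosen to
make the example runnable, and the function name is hypothetical.

```c
/* Hypothetical example: var is an input operand of the asm, so the
   asm text may refer to r10 and expect to find 10 there. */
static inline long read_r10_via_operand(void)
{
#if defined(__x86_64__)
    register long var __asm__("r10") = 10;
    long out;
    __asm__("mov %%r10, %0" : "=r"(out) : "r"(var));
    return out;
#else
    /* Assumption: on other targets just return the same value. */
    return 10;
#endif
}
```

Dropping the "r"(var) operand would turn this back into the foo() case
above, where nothing ties the asm to the variable.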

The only places where the asm attached to a variable needs to have
an effect are the inline asm sequences that explicitly refer to
the respective variables.  If there is no inline asm referencing a
local register variable, there is no difference to a non-register
auto variable; there could even be a warning in such a case
that

   register int var asm ("r1") = 10;

is equivalent to

   int var = 10;

> It seems this semantics could be implemented very early on, probably
> in the frontend itself.  The frontend would mark the *asm* statement
> as using the specified register (there would be no special handling
> of the *variable* as such, after the frontend is done).  The optimizers
> would then simply be required to pass the asm-statement register
> annotations through, much like today they pass constraints through.
> At the point where register allocation decisions are made, those
> register annotations would then be acted on.
> 
> Bye,
> Ulrich

I wonder why it does not work like that in the current implementation.
A local register variable is just like using a similar constraint
(with the only difference that in general there is no such constraint,
otherwise the developer would use it). A pass like .asmcons could
take care of it just the same way it does for constraints, and no
optimizer pass would have to bother whether a variable is a local
register or not.

This would render local register variables even more functional
because no one would need to care whether there were implicit library
calls or things like that.

Johann

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: libgcc: strange optimization
  2011-08-03  9:51                         ` Georg-Johann Lay
@ 2011-08-03 10:04                           ` Richard Guenther
  2011-08-03 13:27                             ` Michael Matz
  0 siblings, 1 reply; 47+ messages in thread
From: Richard Guenther @ 2011-08-03 10:04 UTC (permalink / raw)
  To: Georg-Johann Lay
  Cc: Ulrich Weigand, Ian Lance Taylor, Mikael Pettersson,
	Michael Walle, Hans-Peter Nilsson, Richard Henderson, gcc

On Wed, Aug 3, 2011 at 11:50 AM, Georg-Johann Lay <avr@gjlay.de> wrote:
> Ulrich Weigand wrote:
>> Richard Guenther wrote:
>>> On Tue, Aug 2, 2011 at 3:23 PM, Ian Lance Taylor <iant@google.com> wrote:
>>>> Richard Guenther <richard.guenther@gmail.com> writes:
>>>>> I suggest to amend the documentation for local call-clobbered register
>>>>> variables to say that the only valid sequence using them is from a
>>>>> non-inlinable function that contains only direct initializations of the
>>>>> register variables from constants or parameters.
>>>> Let's just implement those requirements in the compiler itself.
>>> Doesn't work for existing code, no?  And if thinking new code then
>>> I'd rather have explicit dependences (and a way to represent them).
>>> Thus, for example
>>>
>>> asm ("scall" : : "asm("r0")" (10), ...)
>>>
>>> thus, why force new constraints when we already can figure out
>>> local register vars by register name?  Why not extend the constraint
>>> syntax somehow to allow specifying the same effect?
>
> Yes this would be exact equivalence of
>
>  register int var asm ("r0") = 10;
>  ...
>  asm ("scall" : : "r" (var), ...)
>
>
>> Maybe it would be possible to implement this while keeping the syntax
>> of existing code by (re-)defining the semantics of register asm to
>> basically say that:
>>
>>  If a variable X is declared as register asm for register Y, and X
>>  is later on used as operand to an inline asm, the register allocator
>>  will choose register Y to hold that asm operand.  (And this is the
>>  full specification of register asm semantics, nothing beyond this
>>  is guaranteed.)
>
> Yes, that's reasonable.  As I understand the docs, in code like
>
> void foo ()
> {
>   register int var asm ("r1") = 10;
>   asm (";; use r1");
> }
>
> there is nothing that connects var to the asm and assuming that
> r1 holds 10 in the asm is a user error.
>
> The only place where the asm attached to a variable needs to have
> effect are the inline asm sequences that explicitly refer to
> respective variables.  If there is no inline asm referencing a
> local register variable, there is no difference to a non-register
> auto variable; there could even be a warning that in such a case
> that
>
>   register int var asm ("r1") = 10;
>
> is equivalent to
>
>   int var = 10;
>
>> It seems this semantics could be implemented very early on, probably
>> in the frontend itself.  The frontend would mark the *asm* statement
>> as using the specified register (there would be no special handling
>> of the *variable* as such, after the frontend is done).  The optimizers
>> would then simply be required to pass the asm-statement register
>> annotations through, much like today they pass constraints through.
>> At the point where register allocation decisions are made, those
>> register annotations would then be acted on.
>>
>> Bye,
>> Ulrich
>
> I wonder why it does not work like that in the current implementation.
> Local register variable is just like using a similar constraint
> (with the only difference that in general there is no such constraint,
> otherwise the developer would use it). A pass like .asmcons could
> take care of it just the same way it does for constraints and no
> optimizer pass would have to bother if a variable is a local register
> or not.
>
> This would render local register variables even more functional
> because no one needed to care if there were implicit library calls
> or things like that.

Yes, I like that idea.

Richard.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: libgcc: strange optimization
  2011-08-03 10:04                           ` Richard Guenther
@ 2011-08-03 13:27                             ` Michael Matz
  2011-08-03 14:02                               ` Richard Guenther
  0 siblings, 1 reply; 47+ messages in thread
From: Michael Matz @ 2011-08-03 13:27 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Georg-Johann Lay, Ulrich Weigand, Ian Lance Taylor,
	Mikael Pettersson, Michael Walle, Hans-Peter Nilsson,
	Richard Henderson, gcc

Hi,

On Wed, 3 Aug 2011, Richard Guenther wrote:

> > Yes, that's reasonable.  As I understand the docs, in code like
> >
> > void foo ()
> > {
> >   register int var asm ("r1") = 10;
> >   asm (";; use r1");
> > }
> >
> > there is nothing that connects var to the asm and assuming that
> > r1 holds 10 in the asm is a user error.
> >
> > The only place where the asm attached to a variable needs to have
> > effect are the inline asm sequences that explicitly refer to
> > respective variables.  If there is no inline asm referencing a
> > local register variable, there is no difference to a non-register
> > auto variable; there could even be a warning that in such a case
> > that
> >
> >   register int var asm ("r1") = 10;
> >
> > is equivalent to
> >
> >   int var = 10;
> >
> > This would render local register variables even more functional 
> > because no one needed to care if there were implicit library calls or 
> > things like that.
> 
> Yes, I like that idea.

I do too.  Except it doesn't work :)

There's a common idiom of accessing registers read-only by declaring local 
register vars.  E.g. to (*gasp*) the stack pointer.  There won't be a DEF
for that register var, and hence at use-points we couldn't reload any 
sensible values into those registers (and we really shouldn't clobber the 
stack pointer in this way).

We could introduce that special semantic only for non-reserved registers, 
and require no writes to register vars for reserved registers.

Or we could simply do:

  if (any_local_reg_vars)
    optimize = 0;

But I already see people wanting to _do_ optimization also with local reg 
vars, "just not the wrong optimizations" ;-/


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: libgcc: strange optimization
  2011-08-03 13:27                             ` Michael Matz
@ 2011-08-03 14:02                               ` Richard Guenther
  2011-08-03 14:55                                 ` Georg-Johann Lay
  2011-08-03 15:05                                 ` Richard Henderson
  0 siblings, 2 replies; 47+ messages in thread
From: Richard Guenther @ 2011-08-03 14:02 UTC (permalink / raw)
  To: Michael Matz
  Cc: Georg-Johann Lay, Ulrich Weigand, Ian Lance Taylor,
	Mikael Pettersson, Michael Walle, Hans-Peter Nilsson,
	Richard Henderson, gcc

On Wed, Aug 3, 2011 at 3:27 PM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Wed, 3 Aug 2011, Richard Guenther wrote:
>
>> > Yes, that's reasonable.  As I understand the docs, in code like
>> >
>> > void foo ()
>> > {
>> >   register int var asm ("r1") = 10;
>> >   asm (";; use r1");
>> > }
>> >
>> > there is nothing that connects var to the asm and assuming that
>> > r1 holds 10 in the asm is a user error.
>> >
>> > The only place where the asm attached to a variable needs to have
>> > effect are the inline asm sequences that explicitly refer to
>> > respective variables.  If there is no inline asm referencing a
>> > local register variable, there is no difference to a non-register
>> > auto variable; there could even be a warning that in such a case
>> > that
>> >
>> >   register int var asm ("r1") = 10;
>> >
>> > is equivalent to
>> >
>> >   int var = 10;
>> >
>> > This would render local register variables even more functional
>> > because no one needed to care if there were implicit library calls or
>> > things like that.
>>
>> Yes, I like that idea.
>
> I do too.  Except it doesn't work :)
>
> There's a common idiom of accessing registers read-only by declaring local
> register vars.  E.g. to (*gasp*) the stack pointer.  There won't be a DEF
> for that register var, and hence at use-points we couldn't reload any
> sensible values into those registers (and we really shouldn't clobber the
> stack pointer in this way).
>
> We could introduce that special semantic only for non-reserved registers,
> and require no writes to register vars for reserved registers.
>
> Or we could simply do:
>
>  if (any_local_reg_vars)
>    optimize = 0;
>
> But I already see people wanting to _do_ optimization also with local reg
> vars, "just not the wrong optimizations" ;-/

I'd say we should start rejecting all these bogus constructs by default
(maybe accepting them with -fpermissive and then, well, maybe generate
some dwim code).  That is, local register var decls are only valid
with an initializer, they are implicitly constant (you can't re-assign to them).
Reserved registers are a no-go (like %esp), either global or local.

Richard.

>
> Ciao,
> Michael.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: libgcc: strange optimization
  2011-08-03 14:02                               ` Richard Guenther
@ 2011-08-03 14:55                                 ` Georg-Johann Lay
  2011-08-03 15:05                                 ` Richard Henderson
  1 sibling, 0 replies; 47+ messages in thread
From: Georg-Johann Lay @ 2011-08-03 14:55 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Michael Matz, Ulrich Weigand, Ian Lance Taylor,
	Mikael Pettersson, Michael Walle, Hans-Peter Nilsson,
	Richard Henderson, gcc

Richard Guenther wrote:
> On Wed, Aug 3, 2011 at 3:27 PM, Michael Matz <matz@suse.de> wrote:
>> Hi,
>>
>> On Wed, 3 Aug 2011, Richard Guenther wrote:
>>
>>>> Yes, that's reasonable.  As I understand the docs, in code like
>>>>
>>>> void foo ()
>>>> {
>>>>   register int var asm ("r1") = 10;
>>>>   asm (";; use r1");
>>>> }
>>>>
>>>> there is nothing that connects var to the asm and assuming that
>>>> r1 holds 10 in the asm is a user error.
>>>>
>>>> The only place where the asm attached to a variable needs to have
>>>> effect are the inline asm sequences that explicitly refer to
>>>> respective variables.  If there is no inline asm referencing a
>>>> local register variable, there is no difference to a non-register
>>>> auto variable; there could even be a warning that in such a case
>>>> that
>>>>
>>>>   register int var asm ("r1") = 10;
>>>>
>>>> is equivalent to
>>>>
>>>>   int var = 10;
>>>>
>>>> This would render local register variables even more functional
>>>> because no one needed to care if there were implicit library calls or
>>>> things like that.
>>> Yes, I like that idea.
>> I do too.  Except it doesn't work :)
>>
>> There's a common idiom of accessing registers read-only by declaring local
>> register vars.  E.g. to (*gasp*) the stack pointer.  There won't be a DEF
>> for that register var, and hence at use-points we couldn't reload any
>> sensible values into those registers (and we really shouldn't clobber the
>> stack pointer in this way).
>>
>> We could introduce that special semantic only for non-reserved registers,
>> and require no writes to register vars for reserved registers.
>>
>> Or we could simply do:
>>
>>  if (any_local_reg_vars)
>>    optimize = 0;
>>
>> But I already see people wanting to _do_ optimization also with local reg
>> vars, "just not the wrong optimizations" ;-/

Definitely yes.  As I wrote above, if you see asm it's not unlikely that it
is a piece of performance-critical code.

> I'd say we should start rejecting all these bogus constructs by default
> (maybe accepting them with -fpermissive and then, well, maybe generate
> some dwim code).  That is, local register var decls are only valid
> with an initializer, they are implicitly constant (you can't re-assign to them).
> Reserved registers are a no-go (like %esp), either global or local.

Would that help? Like in code

static inline void foo (int arg)
{
   register const int reg asm ("r1") = arg;
   asm ("..."::"r"(reg));
}

And with output constraints like "=r,0" or "+r".  Or in local blocks:

static inline void foo (int arg)
{
   register const int reg asm ("r1") = arg;

   ...
   {
       register const int reg2 asm ("r1") = reg;
       asm ("..."::"r"(reg2));
   }
}



Do the current optimizers shred inline asm with ordinary constraints
but without local registers?

If yes, there is a considerable problem in the optimizers and/or in GCC.

If not, why can't local register variables work similarly, i.e. propagate
the register information into respective asms and forget about it for
the variables?
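
To make the comparison concrete, here is a minimal sketch, assuming an
x86-64 target and GNU C extended asm.  The function names are made up for
illustration and do not come from this thread.  The first function pins its
operands with the x86 machine constraints "a" and "c"; the second gets the
same placement from local register variables feeding plain "r" operands.
Other targets fall back to plain C.

```c
#include <assert.h>

/* Variant 1: machine-specific constraints name the registers directly.
   On x86, "a" is the accumulator (rax) and "c" is rcx. */
long add_with_constraints(long x, long y)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ ("addq %%rcx, %%rax" : "+a"(x) : "c"(y) : "cc");
    return x;
#else
    return x + y;   /* portable fallback, no inline asm */
#endif
}

/* Variant 2: local register variables carry the register choice into
   ordinary "r" operands of the asm that uses them. */
long add_with_register_vars(long x, long y)
{
#if defined(__GNUC__) && defined(__x86_64__)
    register long acc __asm__("rax") = x;
    register long inc __asm__("rcx") = y;
    __asm__ ("addq %1, %0" : "+r"(acc) : "r"(inc) : "cc");
    return acc;
#else
    return x + y;   /* portable fallback, no inline asm */
#endif
}

/* Both variants are expected to compute the same sum. */
void check_variants_agree(long x, long y)
{
    assert(add_with_constraints(x, y) == add_with_register_vars(x, y));
}
```

Targets without per-register constraint letters (lm32, for instance) only
have the second form available, which is why the question matters there.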

Johann

> Richard.
> 
>> Ciao,
>> Michael.


* Re: libgcc: strange optimization
  2011-08-03 14:02                               ` Richard Guenther
  2011-08-03 14:55                                 ` Georg-Johann Lay
@ 2011-08-03 15:05                                 ` Richard Henderson
  1 sibling, 0 replies; 47+ messages in thread
From: Richard Henderson @ 2011-08-03 15:05 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Michael Matz, Georg-Johann Lay, Ulrich Weigand, Ian Lance Taylor,
	Mikael Pettersson, Michael Walle, Hans-Peter Nilsson, gcc

On 08/03/2011 07:02 AM, Richard Guenther wrote:
> Reserved registers are a no-go (like %esp), either global or local.

Local register variables referring to anything in fixed_regs
are trivial to handle -- continue to treat them exactly as we
currently do.  They won't be clobbered by random code movement
because they're fixed.
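
For illustration, a minimal sketch of the read-only idiom on a fixed
register, assuming GCC on x86-64 (the function names are hypothetical).
The variable has no initializer and is never written, so there is no DEF
that code movement could separate from its use.  Other compilers and
targets take a fallback that only approximates the stack pointer.

```c
#include <assert.h>
#include <stdint.h>

uintptr_t read_stack_pointer(void)
{
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__)
    /* Read-only access to a fixed register: declared, never assigned. */
    register uintptr_t sp __asm__("rsp");
    return sp;
#else
    /* Fallback (an assumption of this sketch): the address of a local
       is used as a rough stand-in for the stack pointer. */
    volatile char marker = 0;
    return (uintptr_t)&marker;
#endif
}

/* Sanity check: a running program has a non-null stack pointer. */
void check_stack_pointer_plausible(void)
{
    assert(read_stack_pointer() != 0);
}
```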


r~


* Re: libgcc: strange optimization
  2011-08-03  9:12                       ` Ulrich Weigand
  2011-08-03  9:51                         ` Georg-Johann Lay
@ 2011-08-04  0:20                         ` Hans-Peter Nilsson
  2011-08-04  7:29                           ` Andreas Schwab
  2011-08-04  9:51                           ` Andrew Haley
  1 sibling, 2 replies; 47+ messages in thread
From: Hans-Peter Nilsson @ 2011-08-04  0:20 UTC (permalink / raw)
  To: Ulrich Weigand
  Cc: Richard Guenther, Ian Lance Taylor, Mikael Pettersson,
	Michael Walle, Georg-Johann Lay, Richard Henderson, gcc

On Wed, 3 Aug 2011, Ulrich Weigand wrote:
> Richard Guenther wrote:
> > asm ("scall" : : "asm("r0")" (10), ...)
> Maybe it would be possible to implement this while keeping the syntax
> of existing code by (re-)defining the semantics of register asm to
> basically say that:
>
>  If a variable X is declared as register asm for register Y, and X
>  is later on used as operand to an inline asm, the register allocator
>  will choose register Y to hold that asm operand.

"me too": Nice idea!

>  (And this is the
>  full specification of register asm semantics, nothing beyond this
>  is guaranteed.)

You'd have to handle global registers differently, and local
fixed registers not feeding into asms.  For everything else,
error or warning.  That should be ok, because local asm
registers are wonderfully already documented to have that
restriction: "Local register variables in specific registers do
not reserve the registers, except at the point where they are
used as input or output operands in an @code{asm} statement and
the @code{asm} statement itself is not deleted."

So, it's just a small matter of programming to make that happen
for real. :-)
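
As a sketch of what that documented restriction means for user code today,
assuming x86-64 and GNU C (the helper and function names are hypothetical):
evaluate anything that might turn into a call into a plain temporary first,
then initialize the register variables from constants or temporaries
immediately before the asm that consumes them.

```c
#include <assert.h>

/* Hypothetical stand-in for a libgcc helper such as __ashrsi3. */
static long shift_right(long value, int count)
{
    return value >> count;
}

long sum_with_pinned_operands(long arg)
{
    /* Right-shifting a negative value is implementation-defined. */
    assert(arg >= 0);

    /* Step 1: do all calls up front, into an ordinary variable. */
    long shifted = shift_right(arg, 10);

#if defined(__GNUC__) && defined(__x86_64__)
    /* Step 2: no call can land between these initializations and the
       asm, so neither register can be clobbered in between. */
    register long a1 __asm__("r8")  = 10;
    register long a2 __asm__("rcx") = shifted;
    long result;
    __asm__ ("leaq (%1,%2), %0" : "=r"(result) : "r"(a1), "r"(a2));
    return result;
#else
    return 10 + shifted;   /* portable fallback, no inline asm */
#endif
}
```

With the semantics proposed above, step 1 would no longer be needed,
because the register choice would attach to the asm operands rather than
to the variables.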

To make sure, it'd be nice if someone could perhaps grep an
entire GNU/Linux-or-other distribution including the kernel for
uses of asm-declared *local* registers that don't directly feed
into asms and not being the stack-pointer?  Or can we get away
with just saying that local asm registers haven't had any other
documented meaning for the last seven years?

> It seems this semantics could be implemented very early on, probably
> in the frontend itself.  The frontend would mark the *asm* statement
> as using the specified register (there would be no special handling
> of the *variable* as such, after the frontend is done).  The optimizers
> would then simply be required to pass the asm-statement register
> annotations through, much like today they pass constraints through.
> At the point where register allocation decisions are made, those
> register annotations would then be acted on.

People ask why it's not already like that, probably because they
assume the ideal sequence of events.  At least the quote above
is a late addition (close to seven years now).  IIUC, asms and
register asms weren't originally tied together and the current
implementation with early register tying just happened to work
well together, well, that is until the SSA revolution. ;)

brgds, H-P


* Re: libgcc: strange optimization
  2011-08-04  0:20                         ` Hans-Peter Nilsson
@ 2011-08-04  7:29                           ` Andreas Schwab
  2011-08-04 13:04                             ` Hans-Peter Nilsson
  2011-08-04  9:51                           ` Andrew Haley
  1 sibling, 1 reply; 47+ messages in thread
From: Andreas Schwab @ 2011-08-04  7:29 UTC (permalink / raw)
  To: Hans-Peter Nilsson
  Cc: Ulrich Weigand, Richard Guenther, Ian Lance Taylor,
	Mikael Pettersson, Michael Walle, Georg-Johann Lay,
	Richard Henderson, gcc

Hans-Peter Nilsson <hp@bitrange.com> writes:

> To make sure, it'd be nice if someone could perhaps grep an
> entire GNU/Linux-or-other distribution including the kernel for
> uses of asm-declared *local* registers that don't directly feed
> into asms and not being the stack-pointer?

One frequent candidate is the global pointer.

Andreas.

-- 
Andreas Schwab, schwab@redhat.com
GPG Key fingerprint = D4E8 DBE3 3813 BB5D FA84  5EC7 45C6 250E 6F00 984E
"And now for something completely different."


* Re: libgcc: strange optimization
  2011-08-04  0:20                         ` Hans-Peter Nilsson
  2011-08-04  7:29                           ` Andreas Schwab
@ 2011-08-04  9:51                           ` Andrew Haley
  2011-08-04  9:52                             ` Richard Guenther
  1 sibling, 1 reply; 47+ messages in thread
From: Andrew Haley @ 2011-08-04  9:51 UTC (permalink / raw)
  To: gcc

On 08/04/2011 01:19 AM, Hans-Peter Nilsson wrote:

> To make sure, it'd be nice if someone could perhaps grep an
> entire GNU/Linux-or-other distribution including the kernel for
> uses of asm-declared *local* registers that don't directly feed
> into asms and not being the stack-pointer?  Or can we get away
> with just saying that local asm registers haven't had any other
> documented meaning for the last seven years?

It's the sort of thing that gets done in threaded interpreters,
where you really need to keep a few pointers in registers and
the interpreter itself is a very long function.  gcc has always
done a dreadful job of register allocation in such cases.

Andrew.


* Re: libgcc: strange optimization
  2011-08-04  9:51                           ` Andrew Haley
@ 2011-08-04  9:52                             ` Richard Guenther
  2011-08-04 11:11                               ` Andrew Haley
  0 siblings, 1 reply; 47+ messages in thread
From: Richard Guenther @ 2011-08-04  9:52 UTC (permalink / raw)
  To: Andrew Haley; +Cc: gcc

On Thu, Aug 4, 2011 at 11:50 AM, Andrew Haley <aph@redhat.com> wrote:
> On 08/04/2011 01:19 AM, Hans-Peter Nilsson wrote:
>
>> To make sure, it'd be nice if someone could perhaps grep an
>> entire GNU/Linux-or-other distribution including the kernel for
>> uses of asm-declared *local* registers that don't directly feed
>> into asms and not being the stack-pointer?  Or can we get away
>> with just saying that local asm registers haven't had any other
>> documented meaning for the last seven years?
>
> It's the sort of thing that gets done in threaded interpreters,
> where you really need to keep a few pointers in registers and
> the interpreter itself is a very long function.  gcc has always
> done a dreadful job of register allocation in such cases.

Sure, but I have seen people use global register variables
for this (which means they get taken away from the register allocator).

Richard.

> Andrew.
>


* Re: libgcc: strange optimization
  2011-08-04  9:52                             ` Richard Guenther
@ 2011-08-04 11:11                               ` Andrew Haley
  2011-08-04 11:20                                 ` Richard Guenther
  2011-08-06 15:00                                 ` Paolo Bonzini
  0 siblings, 2 replies; 47+ messages in thread
From: Andrew Haley @ 2011-08-04 11:11 UTC (permalink / raw)
  To: gcc

On 08/04/2011 10:52 AM, Richard Guenther wrote:
> On Thu, Aug 4, 2011 at 11:50 AM, Andrew Haley <aph@redhat.com> wrote:
>> On 08/04/2011 01:19 AM, Hans-Peter Nilsson wrote:
>>
>>> To make sure, it'd be nice if someone could perhaps grep an
>>> entire GNU/Linux-or-other distribution including the kernel for
>>> uses of asm-declared *local* registers that don't directly feed
>>> into asms and not being the stack-pointer?  Or can we get away
>>> with just saying that local asm registers haven't had any other
>>> documented meaning for the last seven years?
>>
>> It's the sort of thing that gets done in threaded interpreters,
>> where you really need to keep a few pointers in registers and
>> the interpreter itself is a very long function.  gcc has always
>> done a dreadful job of register allocation in such cases.
> 
> Sure, but what I have seen people use global register variables
> for this (which means they get taken away from the register allocator).

Not always though, and the x86 has so few registers that using a
global register variable is very problematic.  I suppose you could
compile the threaded interpreter in a file of its own, but I'm not
sure that has quite the same semantics as local register variables.

The problem is that people who care about this stuff very much don't
always read gcc@gcc.gnu.org so won't be heard.  But in their own world
(LISP, Forth) nice features like register variables and labels as
values have led to gcc being the preferred compiler for this kind of
work.

Andrew.


* Re: libgcc: strange optimization
  2011-08-04 11:11                               ` Andrew Haley
@ 2011-08-04 11:20                                 ` Richard Guenther
  2011-08-04 14:46                                   ` Andrew Haley
  2011-08-06 15:00                                 ` Paolo Bonzini
  1 sibling, 1 reply; 47+ messages in thread
From: Richard Guenther @ 2011-08-04 11:20 UTC (permalink / raw)
  To: Andrew Haley; +Cc: gcc

On Thu, Aug 4, 2011 at 1:10 PM, Andrew Haley <aph@redhat.com> wrote:
> On 08/04/2011 10:52 AM, Richard Guenther wrote:
>> On Thu, Aug 4, 2011 at 11:50 AM, Andrew Haley <aph@redhat.com> wrote:
>>> On 08/04/2011 01:19 AM, Hans-Peter Nilsson wrote:
>>>
>>>> To make sure, it'd be nice if someone could perhaps grep an
>>>> entire GNU/Linux-or-other distribution including the kernel for
>>>> uses of asm-declared *local* registers that don't directly feed
>>>> into asms and not being the stack-pointer?  Or can we get away
>>>> with just saying that local asm registers haven't had any other
>>>> documented meaning for the last seven years?
>>>
>>> It's the sort of thing that gets done in threaded interpreters,
>>> where you really need to keep a few pointers in registers and
>>> the interpreter itself is a very long function.  gcc has always
>>> done a dreadful job of register allocation in such cases.
>>
>> Sure, but what I have seen people use global register variables
>> for this (which means they get taken away from the register allocator).
>
> Not always though, and the x86 has so few registers that using a
> global register variable is very problematic.  I suppose you could
> compile the threaded interpreter in a file of its own, but I'm not
> sure that has quite the same semantics as local register variables.
>
> The problem is that people who care about this stuff very much don't
> always read gcc@gcc.gnu.org so won't be heard.  But in their own world
> (LISP, Forth) nice features like register variables and labels as
> values have led to gcc being the preferred compiler for this kind of
> work.

Well, the uses won't break with the idea - they would simply work
as if they were not using local register variables.

Richard.

> Andrew.
>


* Re: libgcc: strange optimization
  2011-08-04  7:29                           ` Andreas Schwab
@ 2011-08-04 13:04                             ` Hans-Peter Nilsson
  0 siblings, 0 replies; 47+ messages in thread
From: Hans-Peter Nilsson @ 2011-08-04 13:04 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Ulrich Weigand, Richard Guenther, Ian Lance Taylor,
	Mikael Pettersson, Michael Walle, Georg-Johann Lay,
	Richard Henderson, gcc

On Thu, 4 Aug 2011, Andreas Schwab wrote:
> Hans-Peter Nilsson <hp@bitrange.com> writes:
>
> > To make sure, it'd be nice if someone could perhaps grep an
> > entire GNU/Linux-or-other distribution including the kernel for
> > uses of asm-declared *local* registers that don't directly feed
> > into asms and not being the stack-pointer?
>
> One frequent candidate is the global pointer.

Yes, that too, but it's usually fixed isn't it?  What I really
meant was "not being a fixed register" but I don't think many
willing to grep a whole distro can tell which registers in which
gcc port are fixed and remember to look for uses of
"-ffixed-reg-".

brgds, H-P


* Re: libgcc: strange optimization
  2011-08-04 11:20                                 ` Richard Guenther
@ 2011-08-04 14:46                                   ` Andrew Haley
  0 siblings, 0 replies; 47+ messages in thread
From: Andrew Haley @ 2011-08-04 14:46 UTC (permalink / raw)
  To: gcc

On 08/04/2011 12:19 PM, Richard Guenther wrote:
> On Thu, Aug 4, 2011 at 1:10 PM, Andrew Haley <aph@redhat.com> wrote:
>> On 08/04/2011 10:52 AM, Richard Guenther wrote:
>>> On Thu, Aug 4, 2011 at 11:50 AM, Andrew Haley <aph@redhat.com> wrote:
>>>> On 08/04/2011 01:19 AM, Hans-Peter Nilsson wrote:
>>>>
>>>>> To make sure, it'd be nice if someone could perhaps grep an
>>>>> entire GNU/Linux-or-other distribution including the kernel for
>>>>> uses of asm-declared *local* registers that don't directly feed
>>>>> into asms and not being the stack-pointer?  Or can we get away
>>>>> with just saying that local asm registers haven't had any other
>>>>> documented meaning for the last seven years?
>>>>
>>>> It's the sort of thing that gets done in threaded interpreters,
>>>> where you really need to keep a few pointers in registers and
>>>> the interpreter itself is a very long function.  gcc has always
>>>> done a dreadful job of register allocation in such cases.
>>>
>>> Sure, but what I have seen people use global register variables
>>> for this (which means they get taken away from the register allocator).
>>
>> Not always though, and the x86 has so few registers that using a
>> global register variable is very problematic.  I suppose you could
>> compile the threaded interpreter in a file of its own, but I'm not
>> sure that has quite the same semantics as local register variables.
>>
>> The problem is that people who care about this stuff very much don't
>> always read gcc@gcc.gnu.org so won't be heard.  But in their own world
>> (LISP, Forth) nice features like register variables and labels as
>> values have led to gcc being the preferred compiler for this kind of
>> work.
> 
> Well, the uses won't break with the idea - they would simply work
> like if they were not using local register variables.

I don't understand this remark.  Surely if they work like they were
not using local register variables, you'll get dreadful register
allocation.  But this is a big reason to use gcc.  Efficient code
really does matter to people writing this kind of thing.

Andrew.


* Re: libgcc: strange optimization
  2011-08-04 11:11                               ` Andrew Haley
  2011-08-04 11:20                                 ` Richard Guenther
@ 2011-08-06 15:00                                 ` Paolo Bonzini
  2011-08-08  8:06                                   ` Richard Guenther
  1 sibling, 1 reply; 47+ messages in thread
From: Paolo Bonzini @ 2011-08-06 15:00 UTC (permalink / raw)
  To: Andrew Haley; +Cc: gcc

On 08/04/2011 01:10 PM, Andrew Haley wrote:
>>> >>  It's the sort of thing that gets done in threaded interpreters,
>>> >>  where you really need to keep a few pointers in registers and
>>> >>  the interpreter itself is a very long function.  gcc has always
>>> >>  done a dreadful job of register allocation in such cases.
>> >
>> >  Sure, but what I have seen people use global register variables
>> >  for this (which means they get taken away from the register allocator).
>
> Not always though, and the x86 has so few registers that using a
> global register variable is very problematic.  I suppose you could
> compile the threaded interpreter in a file of its own, but I'm not
> sure that has quite the same semantics as local register variables.

Indeed, local register variables give almost the same benefit as globals 
with half the burden.  The idea is that you don't care about the exact 
register that holds the contents but, by specifying a callee-save 
register, GCC will use those instead of memory across calls.  This
reduces the number of spills _a lot_.

> The problem is that people who care about this stuff very much don't
> always read gcc@gcc.gnu.org so won't be heard.  But in their own world
> (LISP, Forth) nice features like register variables and labels as
> values have led to gcc being the preferred compiler for this kind of
> work.

/me raises hands.

For GNU Smalltalk, using

#if defined(__i386__)
# define __DECL_REG1 __asm("%esi")
# define __DECL_REG2 __asm("%edi")
# define __DECL_REG3 /* no more callee-save regs if PIC is in use!  */
#endif

#if defined(__x86_64__)
# define __DECL_REG1 __asm("%r12")
# define __DECL_REG2 __asm("%r13")
# define __DECL_REG3 __asm("%rbx")
#endif

...

   register unsigned char *ip __DECL_REG1;
   register OOP * sp __DECL_REG2;
   register intptr_t arg __DECL_REG3;

improves performance by up to 20% if I remember correctly.  I can 
benchmark it if desired.

It does not come for free; in some cases the register allocator does
some stupid things due to the hard register declaration.  But it gets 
much better code overall, so who cares about the microoptimization.
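
A minimal sketch of the pattern, not the GNU Smalltalk source: a tiny
switch-based bytecode loop whose instruction and stack pointers are local
register variables pinned to callee-saved registers.  The opcode set, the
PIN_REG macros and run() are assumptions made up for this illustration;
the pinning is only applied with GNU C on x86-64 and is a no-op elsewhere.

```c
#include <assert.h>

#if defined(__GNUC__) && defined(__x86_64__)
# define PIN_REG1 __asm__("r12")   /* callee-saved on x86-64 */
# define PIN_REG2 __asm__("r13")   /* callee-saved on x86-64 */
#else
# define PIN_REG1                  /* no pinning on other targets */
# define PIN_REG2
#endif

enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };
#define STACK_MAX 16

/* Runs a bytecode program and returns the value on top of the stack. */
long run(const unsigned char *code)
{
    long stack[STACK_MAX];
    register const unsigned char *ip PIN_REG1 = code;   /* next opcode  */
    register long *sp PIN_REG2 = stack;                 /* first free slot */

    for (;;) {
        switch (*ip++) {
        case OP_PUSH:                 /* operand byte follows the opcode */
            assert(sp < stack + STACK_MAX);
            *sp++ = *ip++;
            break;
        case OP_ADD:
            assert(sp - stack >= 2);
            sp--;
            sp[-1] += sp[0];
            break;
        case OP_MUL:
            assert(sp - stack >= 2);
            sp--;
            sp[-1] *= sp[0];
            break;
        case OP_HALT:
            assert(sp > stack);
            return sp[-1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}
```

A real interpreter has hundreds of arms and calls out of most of them,
which is where keeping ip and sp in callee-saved registers is meant to
pay off.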

Of course, if the register allocator did the right thing, or if I could 
use simply

   unsigned char *ip __attribute__((__do_not_spill_me__(20)));
   OOP *sp __attribute__((__do_not_spill_me__(10)));
   intptr_t arg __attribute__((__do_not_spill_me__(0)));

that would be just fine.

Paolo


* Re: libgcc: strange optimization
  2011-08-06 15:00                                 ` Paolo Bonzini
@ 2011-08-08  8:06                                   ` Richard Guenther
  2011-08-08 10:59                                     ` Paolo Bonzini
  0 siblings, 1 reply; 47+ messages in thread
From: Richard Guenther @ 2011-08-08  8:06 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Andrew Haley, gcc, Vladimir N. Makarov

On Sat, Aug 6, 2011 at 5:00 PM, Paolo Bonzini <bonzini@gnu.org> wrote:
> On 08/04/2011 01:10 PM, Andrew Haley wrote:
>>>>
>>>> >>  It's the sort of thing that gets done in threaded interpreters,
>>>> >>  where you really need to keep a few pointers in registers and
>>>> >>  the interpreter itself is a very long function.  gcc has always
>>>> >>  done a dreadful job of register allocation in such cases.
>>>
>>> >
>>> >  Sure, but what I have seen people use global register variables
>>> >  for this (which means they get taken away from the register
>>> > allocator).
>>
>> Not always though, and the x86 has so few registers that using a
>> global register variable is very problematic.  I suppose you could
>> compile the threaded interpreter in a file of its own, but I'm not
>> sure that has quite the same semantics as local register variables.
>
> Indeed, local register variables give almost the same benefit as globals
> with half the burden.  The idea is that you don't care about the exact
> register that holds the contents but, by specifying a callee-save register,
> GCC will use those instead of memory across calls.  This reduces _a lot_ the
> number of spills.
>
>> The problem is that people who care about this stuff very much don't
>> always read gcc@gcc.gnu.org so won't be heard.  But in their own world
>> (LISP, Forth) nice features like register variables and labels as
>> values have led to gcc being the preferred compiler for this kind of
>> work.
>
> /me raises hands.
>
> For GNU Smalltalk, using
>
> #if defined(__i386__)
> # define __DECL_REG1 __asm("%esi")
> # define __DECL_REG2 __asm("%edi")
> # define __DECL_REG3 /* no more caller-save regs if PIC is in use!  */
> #endif
>
> #if defined(__x86_64__)
> # define __DECL_REG1 __asm("%r12")
> # define __DECL_REG2 __asm("%r13")
> # define __DECL_REG3 __asm("%rbx")
> #endif
>
> ...
>
>  register unsigned char *ip __DECL_REG1;
>  register OOP * sp __DECL_REG2;
>  register intptr_t arg __DECL_REG3;
>
> improves performance by up to 20% if I remember correctly.  I can benchmark
> it if desired.
>
> It does not come for free, in some cases the register allocator does some
> stupid things due to the hard register declaration.  But it gets much better
> code overall, so who cares about the microoptimization.
>
> Of course, if the register allocator did the right thing, or if I could use
> simply
>
>  unsigned char *ip __attribute__(__do_not_spill_me__(20)));
>  OOP *sp __attribute__(__do_not_spill_me__(10)));
>  intptr_t arg __attrbite__(__do_not_spill_me__(0)));
>
> that would be just fine.

Like if

register unsigned char *ip;

would increase spill cost of ip compared to

unsigned char *ip;

?  It is, after all, a cost issue - forcefully pinning down registers can
lead to problems.  We'd of course have to somehow "preserve" the
register state of ip for all relevant pseudos (and avoid coalescing with
non-register ones).

Richard.

> Paolo
>


* Re: libgcc: strange optimization
  2011-08-08  8:06                                   ` Richard Guenther
@ 2011-08-08 10:59                                     ` Paolo Bonzini
  0 siblings, 0 replies; 47+ messages in thread
From: Paolo Bonzini @ 2011-08-08 10:59 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Andrew Haley, gcc, Vladimir N. Makarov

On 08/08/2011 10:06 AM, Richard Guenther wrote:
> Like if
>
> register unsigned char *ip;
>
> would increase spill cost of ip compared to
>
> unsigned char *ip;
>
> ?

Remember we're talking about a function with 11000 pseudos and 4000 
allocnos (not to mention 1500 basic blocks).  You cannot really blame
IRA for not doing the right thing.  And actually, ip and sp are live 
everywhere, so there's no hope of reserving a register for them, 
especially since all x86 callee-save registers have special uses in 
string functions.

If I understand the huge dumps correctly, the missing part is trying to 
use callee-save registers for spilling, rather than memory.  However, 
perhaps another way to do it is a specialized region management scheme 
for large switch statements, treating each switch arm as a separate 
region??  There are few registers live across the switch, and all of 
them are used either "a lot" or "almost never" (and always in cold blocks).

BTW, here are some measurements on x86-64:

1) with regalloc hints: 450060432 bytecodes/sec; 12819996 calls/sec
2) without regalloc hints: 263002439 bytecodes/sec; 9458816 sends/sec

Probably even worse on x86-32.

None of -fira-region=all, -fira-region=one, -fira-algorithm=priority had 
significant changes.  In fact, it's pretty much a "binary" result: I'd 
expect register allocation results to be either on par with (1) or 
similar to (2); everything else is mostly noise.

Paolo


* Re: libgcc: strange optimization
  2011-08-02 12:23                 ` Richard Guenther
                                     ` (4 preceding siblings ...)
  2011-08-02 17:21                   ` Georg-Johann Lay
@ 2011-08-09 16:55                   ` Richard Earnshaw
  2011-08-09 17:24                     ` Ulrich Weigand
  2011-08-10  0:40                     ` Hans-Peter Nilsson
  5 siblings, 2 replies; 47+ messages in thread
From: Richard Earnshaw @ 2011-08-09 16:55 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Mikael Pettersson, Michael Walle, Georg-Johann Lay,
	Hans-Peter Nilsson, Richard Henderson, gcc

On 02/08/11 13:22, Richard Guenther wrote:
> On Tue, Aug 2, 2011 at 2:06 PM, Mikael Pettersson <mikpe@it.uu.se> wrote:
>> Michael Walle writes:
>>  >
>>  > Hi,
>>  >
>>  > > To confirm that try -fno-tree-ter.
>>  >
>>  > "lm32-gcc -O1 -fno-tree-ter -S -c test.c" generates the following working
>>  > assembly code:
>>  >
>>  > f2:
>>  >      addi     sp, sp, -4
>>  >      sw       (sp+4), ra
>>  >      addi     r2, r0, 10
>>  >      calli    __ashrsi3
>>  >      addi     r8, r0, 10
>>  >      scall
>>  >      lw       ra, (sp+4)
>>  >      addi     sp, sp, 4
>>  >      b        ra
>>
>> -fno-tree-ter also unbreaks the ARM test case in PR48863 comment #4.
> 
> It's of course only a workaround, not a real fix as nothing prevents
> other optimizers from performing the re-scheduling TER does.
> 
> I suggest to amend the documentation for local call-clobbered register
> variables to say that the only valid sequence using them is from a
> non-inlinable function that contains only direct initializations of the
> register variables from constants or parameters.
> 
> Or go one step further and deprecate local register variables altogether
> (they IMHO don't make much sense, and rather the targets should provide
> a way to properly constrain asm inputs and outputs).
> 
> Richard.
> 
> 

Better still would be to change the specification and implementation of
local register variables to only guarantee them at the beginning of ASM
statements.  At other times they are simply the same as other local
variables.  Now we have a problem that the register allocator knows how
to solve.

In other words, if the user writes

void bar (int y)
{
  register int x asm ("r0") = y;

  foo ();

  asm volatile ("mov r1, r0");

}

The compiler will generate
(set (reg:SI 999 <x>) (reg:SI <y>))
(call "foo")
(set (reg:SI 0 "r0") (reg:SI 999 <x>))
(asm "mov r1, r0")
(set (reg:SI 999 <x>) (reg:SI 0 "r0"))

That is, it inserts appropriate set insns around asm blocks.  Of course,
the register allocator can try to allocate reg 999 to r0 and if it
succeeds, then the sets become dead.  But if it fails then at least the
code will continue to execute as intended.
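
The same copy-in/copy-out can be written by hand today.  A minimal sketch,
assuming x86-64 and GNU C, with rax standing in for r0; unlike the example
above, the asm names the variable as an operand, and foo() here is just a
hypothetical placeholder for any call:

```c
#include <assert.h>
#include <limits.h>

static int foo_calls;

/* Stands in for any call; it is free to clobber caller-saved registers
   such as rax. */
static void foo(void)
{
    foo_calls++;
}

long bar(long y)
{
    assert(y < LONG_MAX);   /* keep the increment below from overflowing */

    long x = y;             /* ordinary variable, like pseudo 999 */

    foo();                  /* x is not tied to rax across the call */

#if defined(__GNUC__) && defined(__x86_64__)
    {
        register long x_reg __asm__("rax") = x;   /* set rax <- x */
        __asm__ volatile ("addq $1, %0" : "+r"(x_reg) : : "cc");
        x = x_reg;                                /* set x <- rax */
    }
#else
    x = x + 1;              /* portable fallback, no inline asm */
#endif
    return x;
}
```

The register variable lives only in the small block around the asm, so
nothing that can clobber rax sits between its initialization and its use.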

R.




* Re: libgcc: strange optimization
  2011-08-09 16:55                   ` Richard Earnshaw
@ 2011-08-09 17:24                     ` Ulrich Weigand
  2011-08-09 19:48                       ` Hans-Peter Nilsson
  2011-08-10  0:40                     ` Hans-Peter Nilsson
  1 sibling, 1 reply; 47+ messages in thread
From: Ulrich Weigand @ 2011-08-09 17:24 UTC (permalink / raw)
  To: Richard Earnshaw
  Cc: Richard Guenther, Mikael Pettersson, Michael Walle,
	Georg-Johann Lay, Hans-Peter Nilsson, Richard Henderson, gcc

Richard Earnshaw wrote:

> Better still would be to change the specification and implementation of
> local register variables to only guarantee them at the beginning of ASM
> statements.  At other times they are simply the same as other local
> variables.  Now we have a problem that the register allocator knows how
> to solve.

This seems to be pretty much the same as my proposal here:
http://gcc.gnu.org/ml/gcc/2011-08/msg00064.html

But there was some push-back on requiring additional semantics
by some users ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com


* Re: libgcc: strange optimization
  2011-08-09 17:24                     ` Ulrich Weigand
@ 2011-08-09 19:48                       ` Hans-Peter Nilsson
  0 siblings, 0 replies; 47+ messages in thread
From: Hans-Peter Nilsson @ 2011-08-09 19:48 UTC (permalink / raw)
  To: Ulrich Weigand
  Cc: Richard Earnshaw, Richard Guenther, Mikael Pettersson,
	Michael Walle, Georg-Johann Lay, Richard Henderson, gcc

On Tue, 9 Aug 2011, Ulrich Weigand wrote:
> Richard Earnshaw wrote:
>
> > Better still would be to change the specification and implementation of
> > local register variables to only guarantee them at the beginning of ASM
> > statements.  At other times they are simply the same as other local
> > variables.  Now we have a problem that the register allocator knows how
> > to solve.
>
> This seems to be pretty much the same as my proposal here:
> http://gcc.gnu.org/ml/gcc/2011-08/msg00064.html
>
> But there was some push-back on requiring additional semantics
> by some users ...

Don't feel bad, at least we seem to have overwhelming consensus
on what to do for local asm-declared register variables when
they feed asm statements! :)

I found an example where I have an asm-declared register that
was used not just for the primary asm statement, but I'm ok with
those other uses not using the declared register, just as warned
by the documentation.  (I don't think gcc can better assign
another register, but that's beside the point.)

brgds, H-P


* Re: libgcc: strange optimization
  2011-08-09 16:55                   ` Richard Earnshaw
  2011-08-09 17:24                     ` Ulrich Weigand
@ 2011-08-10  0:40                     ` Hans-Peter Nilsson
  1 sibling, 0 replies; 47+ messages in thread
From: Hans-Peter Nilsson @ 2011-08-10  0:40 UTC (permalink / raw)
  To: Richard Earnshaw
  Cc: Richard Guenther, Mikael Pettersson, Michael Walle,
	Georg-Johann Lay, Richard Henderson, gcc

On Tue, 9 Aug 2011, Richard Earnshaw wrote:
> Better still would be to change the specification and implementation of
> local register variables to only guarantee them at the beginning of ASM
> statements.

Only for those asm statements taking the same asm-register
variables as arguments.

>  At other times they are simply the same as other local
> variables.  Now we have a problem that the register allocator knows how
> to solve.
>
> In other words, if the user writes
>
> bar (int y)
> {
>   register int x asm ("r0") = y;
>
>   foo()
>
>   asm volatile ("mov r1, r0");
>
> }
>
> The compiler will generate
> (set (reg:SI 999 <x>) (reg:SI <y>))
> (call "foo")
> (set (reg:SI 0 "r0") (reg:SI 999 <x>))
> (asm "mov r1, r0")
> (set (reg:SI 999 <x>) (reg:SI 0 "r0"))

It should rather eliminate the variable x and its assignment as
it isn't used in a way properly conveyed to gcc: the occurrence
of the string "r0" in the asm should not be considered.
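For illustration only, here is a minimal sketch of the usage that is
conveyed to gcc: the asm-declared register variable is passed to the asm
as an operand instead of being named inside the asm string. It assumes
GCC (or a compatible compiler) on x86-64, which is not the target
discussed in this thread; the register "eax" and the names helper, bar
and check_bar are hypothetical and chosen only so the example can be
compiled and run. On other targets the sketch falls back to plain C.

```c
#include <assert.h>

/* Hypothetical stand-in for any call that may overwrite call-clobbered
   registers (for example a libgcc helper emitted for an operator). */
static int helper(int v)
{
    return v * 2;
}

/* Returns helper(y) + 1. */
int bar(int y)
{
    /* Do the call first and keep the result in an ordinary variable, so
       no call sits between the register assignment and the asm. */
    int tmp = helper(y);

#if defined(__GNUC__) && defined(__x86_64__)
    /* Assumption: x86-64 with "eax" as the explicit register. */
    register int x __asm__("eax") = tmp;

    /* x is an operand ("+r"), so the compiler knows the asm reads and
       writes it and places it in the declared register here. */
    __asm__ volatile ("addl $1, %0" : "+r"(x) : : "cc");
    return x;
#else
    /* Portable fallback with the same result, for other targets. */
    return tmp + 1;
#endif
}

/* Small self-check of the sketch above. */
void check_bar(void)
{
    assert(bar(5) == 11);
}
```

Because x appears in the operand list, the assignment to it cannot be
treated as dead, which is not the case when the register is only
mentioned by name in the asm text.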

I like Ulrich Weigand's proposal better, not least because
it's how it's already documented to work.

brgds, H-P
