Spurious register spill with volatile function argument

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* Spurious register spill with volatile function argument
       [not found]   ` <CA++6G0CK3Yu-w9fhOrvpRP49qz9=Vs20J_=R28Mq=zCJOH+-UA@mail.gmail.com>
@ 2016-03-27  5:58     ` Michael Clark
  2016-03-28  9:16       ` Andrew Haley
  0 siblings, 1 reply; 5+ messages in thread
From: Michael Clark @ 2016-03-27  5:58 UTC (permalink / raw)
  To: gcc, cfe-dev, Andrew Waterman; +Cc: RISC-V Software Developers

Seems I had misused volatile. I removed ‘volatile’ from the function argument on test_0 and it prevented the spill through the stack.

I added volatile because I was trying to avoid the compiler optimising away the call to test_0 (as it has no side effects) but it appeared that volatile was unnecessary and was a misuse of volatile (intended to indicate storage may change outside of the control of the compiler). However it is an interesting case… as a register arguments don’t have storage.

GCC, Clang folk, any ideas on why there is a stack spill for a volatile register argument passed in esi? Does volatile force the argument to have storage allocated on the stack? Is this a corner case in the C standard? This argument in the x86_64 calling convention only has a register, so technically it can’t change outside the control of the C "virtual machine” so volatile has a vague meaning here. This seems to be a case of interpreting the C standard in such a was as to make sure that a volatile argument “can be changed” outside the control of the C "virtual machine” by explicitly giving it a storage location on the stack. I think volatile scalar arguments are a special case and that the volatile type label shouldn’t widen the scope beyond the register unless it actually *needs* storage to spill. This is not a volatile stack scoped variable unless the C standard interprets ABI register parameters as actually having ‘storage’ so this is questionable… Maybe I should have gotten a warning… or the volatile type qualifier on a scalar register argument should have been ignored…

volatile for scalar function arguments seems to mean: “make this volatile and subject to change outside of the compiler” rather than being a qualifier for its storage (which is a register).

# gcc
test_0:
     mov     DWORD PTR [rsp-4], esi
     mov     ecx, DWORD PTR [rsp-4]
     mov     eax, edi
     cdq
     idiv    ecx
     mov     eax, edx
     ret

# clang
test_0:
	mov	dword ptr [rsp - 4], esi
	xor	edx, edx
	mov	eax, edi
	div	dword ptr [rsp - 4]
	mov	eax, edx
	ret

/* Test program compiled on x86_64 with: cc -O3 -fomit-frame-pointer -masm=intel -S test.c -o test.S  */

#include <stdio.h>
#include <limits.h>

static const int p = 8191;
static const int s = 13;

int __attribute__ ((noinline)) test_0(unsigned int k, volatile int p)
{
     return k % p;
}

int __attribute__ ((noinline)) test_1(unsigned int k)
{
     return k % p;
}

int __attribute__ ((noinline)) test_2(unsigned int k)
{
     int i = (k&p) + (k>>s);
     i = (i&p) + (i>>s);
     if (i>=p) i -= p;
     return i;
}

int main()
{
     test_0(1, 8191); /* control */
     for (int i = INT_MIN; i < INT_MAX; i++) {
             int r1 = test_1(i), r2 = test_2(i);
             if (r1 != r2) printf("%d %d %d\n", i, r1, r2);
     }
}

> On 27 Mar 2016, at 2:32 PM, Andrew Waterman <andrew@sifive.com> wrote:
> 
> It would be good to figure out how to get rid of the spurious register spills.
> 
> The strength reduction optimization isn't always profitable on Rocket,
> as it increases instruction count and code size.  The divider has an
> early out and for small numbers is quite fast.
> 
> On Fri, Mar 25, 2016 at 5:43 PM, Michael Clark <michaeljclark@mac.com> wrote:
>> Now considering I have no idea how many cycles it takes for an integer divide on the Rocket so the optimisation may not be a win.
>> 
>> Trying to read MuDiv in multiplier.scala, and will at some point run some timings in the cycle-accurate simulator.
>> 
>> In either case, the spurious stack moves emitted by GCC are curious...
>> 
>>> On 26 Mar 2016, at 9:42 AM, Michael Clark <michaeljclark@mac.com> wrote:
>>> 
>>> Hi All,
>>> 
>>> I have found an interesting case where an optimisation is not being applied by GCC on RISC-V. And also some strange assembly output from GCC on RISC-V.
>>> 
>>> Both GCC and Clang appear to optimise division by a constant Mersenne prime on x86_64 however GCC on RISC-V is not applying this optimisation.
>>> 
>>> See test program and assembly output for these platforms:
>>> 
>>> * GCC -O3 on RISC-V
>>> * GCC -O3 on x86_64
>>> * LLVM/Clang -O3 on x86_64
>>> 
>>> Another strange observation is GCC on RISC-V is moving a1 to a5 via a stack store followed by a stack load. Odd? GCC 5 also seems to be doing odd stuff with stack ‘moves' on x86_64, moving esi to ecx via the stack (I think recent x86 micro-architecture treats tip of the stack like an extended register file so this may only have a small penalty on x86).
>>> 
>>> See GCC on RISC-V is emitting this:
>>> 
>>> test_0:
>>>      add     sp,sp,-16
>>>      sw      a1,12(sp)
>>>      lw      a5,12(sp)
>>>      add     sp,sp,16
>>>      remuw   a0,a0,a5
>>>      jr      ra
>>> 
>>> instead of this:
>>> 
>>> test_0:
>>>      remuw   a0,a0,a1
>>>      jr      ra
>>> 
>>> Compiler devs, please read Test program and assembly output. I have not yet tested LLVM/Clang on RISC-V yet… I will do that next… I have not had time to dig into compiler code yet...
>>> 
>>> Regards,
>>> Michael.
>>> 
>>> 
>>> /* Test program */
>>> 
>>> #include <stdio.h>
>>> #include <limits.h>
>>> 
>>> static const int p = 8191;
>>> static const int s = 13;
>>> 
>>> int __attribute__ ((noinline)) test_0(unsigned int k, volatile int p)
>>> {
>>>      return k % p;
>>> }
>>> 
>>> int __attribute__ ((noinline)) test_1(unsigned int k)
>>> {
>>>      return k % p;
>>> }
>>> 
>>> int __attribute__ ((noinline)) test_2(unsigned int k)
>>> {
>>>      int i = (k&p) + (k>>s);
>>>      i = (i&p) + (i>>s);
>>>      if (i>=p) i -= p;
>>>      return i;
>>> }
>>> 
>>> int main()
>>> {
>>>      test_0(1, 8191); /* control */
>>>      for (int i = INT_MIN; i < INT_MAX; i++) {
>>>              int r1 = test_1(i), r2 = test_2(i);
>>>              if (r1 != r2) printf("%d %d %d\n", i, r1, r2);
>>>      }
>>> }
>>> 
>>> 
>>> 
>>> /* RISC-V GCC */
>>> 
>>> $ riscv64-unknown-elf-gcc --version
>>> riscv64-unknown-elf-gcc (GCC) 5.2.0
>>> 
>>> test_0:
>>>      add     sp,sp,-16
>>>      sw      a1,12(sp)
>>>      lw      a5,12(sp)
>>>      add     sp,sp,16
>>>      remuw   a0,a0,a5
>>>      jr      ra
>>> test_1:
>>>      li      a5,8192
>>>      addw    a5,a5,-1
>>>      remuw   a0,a0,a5
>>>      ret
>>> test_2:
>>>      li      a3,8192
>>>      addw    a2,a3,-1
>>>      and     a4,a0,a2
>>>      srlw    a0,a0,13
>>>      addw    a5,a4,a0
>>>      and     a0,a5,a2
>>>      sraw    a5,a5,13
>>>      addw    a0,a0,a5
>>>      addw    a3,a3,-2
>>>      ble     a0,a3,.L5
>>>      subw    a0,a0,a2
>>> .L5:
>>>      ret
>>> 
>>> 
>>> /* Linux x86_64 GCC */
>>> 
>>> $ gcc --version
>>> gcc (Debian 5.2.1-23) 5.2.1 20151028
>>> 
>>> test_0:
>>>      mov     DWORD PTR [rsp-4], esi
>>>      mov     ecx, DWORD PTR [rsp-4]
>>>      mov     eax, edi
>>>      cdq
>>>      idiv    ecx
>>>      mov     eax, edx
>>>      ret
>>> test_1:
>>>      mov     eax, edi
>>>      mov     rcx, rax
>>>      mov     rdx, rax
>>>      sal     rcx, 6
>>>      sal     rdx, 19
>>>      add     rdx, rcx
>>>      add     rax, rdx
>>>      mov     edx, edi
>>>      shr     rax, 32
>>>      sub     edx, eax
>>>      shr     edx
>>>      add     eax, edx
>>>      shr     eax, 12
>>>      mov     edx, eax
>>>      sal     edx, 13
>>>      sub     edx, eax
>>>      sub     edi, edx
>>>      mov     eax, edi
>>>      ret
>>> test_2:
>>>      mov     eax, edi
>>>      shr     edi, 13
>>>      and     eax, 8191
>>>      add     eax, edi
>>>      mov     edx, eax
>>>      sar     eax, 13
>>>      and     edx, 8191
>>>      add     eax, edx
>>>      lea     edx, [rax-8191]
>>>      cmp     eax, 8191
>>>      cmovge  eax, edx
>>>      ret
>>> 
>>> 
>>> /* Darwin x86_64 LLVM Clang */
>>> 
>>> $ cc --version
>>> Apple LLVM version 7.3.0 (clang-703.0.29)
>>> 
>>> _test_0:
>>>       mov     dword ptr [rsp - 4], esi
>>>       xor     edx, edx
>>>       mov     eax, edi
>>>       div     dword ptr [rsp - 4]
>>>       mov     eax, edx
>>>       ret
>>> _test_1:
>>>       mov     eax, edi
>>>       imul    rax, rax, 524353
>>>       shr     rax, 32
>>>       mov     ecx, edi
>>>       sub     ecx, eax
>>>       shr     ecx
>>>       add     ecx, eax
>>>       shr     ecx, 12
>>>       imul    eax, ecx, 8191
>>>       sub     edi, eax
>>>       mov     eax, edi
>>>       ret
>>> _test_2:
>>>       mov     eax, edi
>>>       and     eax, 8191
>>>       mov     ecx, edi
>>>       shr     ecx, 13
>>>       add     eax, ecx
>>>       add     ecx, edi
>>>       and     ecx, 8191
>>>       shr     eax, 13
>>>       lea     edx, [rcx + rax]
>>>       cmp     edx, 8190
>>>       lea     eax, [rcx + rax - 8191]
>>>       cmovbe  eax, edx
>>>       ret
>>> 
>>> --
>>> You received this message because you are subscribed to the Google Groups "RISC-V SW Dev" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to sw-dev+unsubscribe@groups.riscv.org.
>>> To post to this group, send email to sw-dev@groups.riscv.org.
>>> Visit this group at https://groups.google.com/a/groups.riscv.org/group/sw-dev/.
>>> To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/sw-dev/2600D96D-94BC-4259-9D39-DE4993859281%40mac.com.
>> 
>> --
>> You received this message because you are subscribed to the Google Groups "RISC-V SW Dev" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to sw-dev+unsubscribe@groups.riscv.org.
>> To post to this group, send email to sw-dev@groups.riscv.org.
>> Visit this group at https://groups.google.com/a/groups.riscv.org/group/sw-dev/.
>> To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/sw-dev/9F3C9DE6-F00B-4402-A83B-354455DEAFFA%40mac.com.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Spurious register spill with volatile function argument
  2016-03-27  5:58     ` Spurious register spill with volatile function argument Michael Clark
@ 2016-03-28  9:16       ` Andrew Haley
  2016-03-28 12:12         ` Florian Weimer
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Haley @ 2016-03-28  9:16 UTC (permalink / raw)
  To: Michael Clark, gcc, cfe-dev, Andrew Waterman; +Cc: RISC-V Software Developers

On 27/03/16 06:57, Michael Clark wrote:

> GCC, Clang folk, any ideas on why there is a stack spill for a
> volatile register argument passed in esi? Does volatile force the
> argument to have storage allocated on the stack? Is this a corner
> case in the C standard? This argument in the x86_64 calling
> convention only has a register, so technically it canâ€™t change
> outside the control of the C "virtual machineâ€ so volatile has a
> vague meaning here.

"volatile" doesn't really mean very much, formally speaking.  Sure, the
standard says "accesses to volatile objects are evaluated
strictly according to the rules of the abstract machine," but nowhere
is it specified exactly what constitutes an access.  (To be precise,
"what constitutes an access to an object that has volatile-qualified
type is implementation-defined.")

So, we have to fall back to tradition.  Traditionally, all volatile
objects are allocated stack slots and all accesses to them are memory
accesses.  This is consistent behaviour, and has been for a long time.
It is also extremely useful when debugging optimized code.

> volatile for scalar function arguments seems to mean: â€œmake this
> volatile and subject to change outside of the compilerâ€ rather than
> being a qualifier for its storage (which is a register).

No, arguments are not necessarily stored in registers: they're passed
in registers, but after function entry function they're just auto
variables and are stored wherever the compiler likes.

Andrew.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Spurious register spill with volatile function argument
  2016-03-28  9:16       ` Andrew Haley
@ 2016-03-28 12:12         ` Florian Weimer
  2016-03-28 14:34           ` Paul_Koning
  0 siblings, 1 reply; 5+ messages in thread
From: Florian Weimer @ 2016-03-28 12:12 UTC (permalink / raw)
  To: Andrew Haley
  Cc: Michael Clark, gcc, cfe-dev, Andrew Waterman, RISC-V Software Developers

* Andrew Haley:

> "volatile" doesn't really mean very much, formally speaking.  Sure, the
> standard says "accesses to volatile objects are evaluated
> strictly according to the rules of the abstract machine," but nowhere
> is it specified exactly what constitutes an access.

Reading or modifying an object is defined as “access”.

The problem is that “reading” is either not defined, or the existing
flatly contradicts existing practice.

For example, if p is a pointer to a struct, will the expression &p->m
read *p?

Previously, this wasn't very interesting.  But under the model memory,
it's suddenly quite relevant.  If reading p->m implies a read of the
entire object *p, you cannot use a member to synchronize access to
other members of the struct.  For example, if m is a mutex, and
carefully acquire the mutex before you read or write other members,
you still have data race between a write to some other member and the
acquisition of the mutex because the mutex acquisition reads the
entire struct (including the member written to ).

One possible cure is to take the address of the mutex and keep track
of it separately.  Or you could construct a pointer using offsetof.
But no one is doing that, obviously.

This is not entirely hypothetical.  Even today, GCC's aliasing
analysis requires that those implicit whole-object reads take place,
to make certain forms of type-punning invalid which would otherwise be
well-defined (and for which GCC would generate invalid code).

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Spurious register spill with volatile function argument
  2016-03-28 12:12         ` Florian Weimer
@ 2016-03-28 14:34           ` Paul_Koning
  2016-03-28 15:10             ` Florian Weimer
  0 siblings, 1 reply; 5+ messages in thread
From: Paul_Koning @ 2016-03-28 14:34 UTC (permalink / raw)
  To: fw; +Cc: gcc, cfe-dev, sw-dev


> On Mar 28, 2016, at 8:11 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
> 
> ...
> The problem is that “reading” is either not defined, or the existing
> flatly contradicts existing practice.
> 
> For example, if p is a pointer to a struct, will the expression &p->m
> read *p?

Presumably the offset of m is substantially larger than 0?  If so, my answer would be "it had better not".  Does any compiler treat that statement as an access to *p ?

	paul

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Spurious register spill with volatile function argument
  2016-03-28 14:34           ` Paul_Koning
@ 2016-03-28 15:10             ` Florian Weimer
  0 siblings, 0 replies; 5+ messages in thread
From: Florian Weimer @ 2016-03-28 15:10 UTC (permalink / raw)
  To: Paul_Koning; +Cc: gcc, cfe-dev, sw-dev

* Paul Koning:

>> On Mar 28, 2016, at 8:11 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>> 
>> ...
>> The problem is that “reading” is either not defined, or the existing
>> flatly contradicts existing practice.
>> 
>> For example, if p is a pointer to a struct, will the expression &p->m
>> read *p?
>
> Presumably the offset of m is substantially larger than 0?  If so, my
> answer would be "it had better not".  Does any compiler treat that
> statement as an access to *p ?

As I tried to explain, GCC does, for aliasing purposes.  For the
expression p->m, there is an implicit read of *p, asserting that the
static and dynamic types match.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-03-28 15:10 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <2600D96D-94BC-4259-9D39-DE4993859281@mac.com>
     [not found] ` <9F3C9DE6-F00B-4402-A83B-354455DEAFFA@mac.com>
     [not found]   ` <CA++6G0CK3Yu-w9fhOrvpRP49qz9=Vs20J_=R28Mq=zCJOH+-UA@mail.gmail.com>
2016-03-27  5:58     ` Spurious register spill with volatile function argument Michael Clark
2016-03-28  9:16       ` Andrew Haley
2016-03-28 12:12         ` Florian Weimer
2016-03-28 14:34           ` Paul_Koning
2016-03-28 15:10             ` Florian Weimer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).