* Spurious register spill with volatile function argument
[not found] ` <CA++6G0CK3Yu-w9fhOrvpRP49qz9=Vs20J_=R28Mq=zCJOH+-UA@mail.gmail.com>
@ 2016-03-27 5:58 ` Michael Clark
2016-03-28 9:16 ` Andrew Haley
0 siblings, 1 reply; 5+ messages in thread
From: Michael Clark @ 2016-03-27 5:58 UTC (permalink / raw)
To: gcc, cfe-dev, Andrew Waterman; +Cc: RISC-V Software Developers
Seems I had misused volatile. I removed ‘volatile’ from the function argument on test_0 and it prevented the spill through the stack.
I added volatile because I was trying to avoid the compiler optimising away the call to test_0 (as it has no side effects) but it appeared that volatile was unnecessary and was a misuse of volatile (intended to indicate storage may change outside of the control of the compiler). However it is an interesting case… as a register arguments don’t have storage.
GCC, Clang folk, any ideas on why there is a stack spill for a volatile register argument passed in esi? Does volatile force the argument to have storage allocated on the stack? Is this a corner case in the C standard? This argument in the x86_64 calling convention only has a register, so technically it can’t change outside the control of the C "virtual machine” so volatile has a vague meaning here. This seems to be a case of interpreting the C standard in such a was as to make sure that a volatile argument “can be changed” outside the control of the C "virtual machine” by explicitly giving it a storage location on the stack. I think volatile scalar arguments are a special case and that the volatile type label shouldn’t widen the scope beyond the register unless it actually *needs* storage to spill. This is not a volatile stack scoped variable unless the C standard interprets ABI register parameters as actually having ‘storage’ so this is questionable… Maybe I should have gotten a warning… or the volatile type qualifier on a scalar register argument should have been ignored…
volatile for scalar function arguments seems to mean: “make this volatile and subject to change outside of the compiler” rather than being a qualifier for its storage (which is a register).
# gcc
test_0:
mov DWORD PTR [rsp-4], esi
mov ecx, DWORD PTR [rsp-4]
mov eax, edi
cdq
idiv ecx
mov eax, edx
ret
# clang
test_0:
mov dword ptr [rsp - 4], esi
xor edx, edx
mov eax, edi
div dword ptr [rsp - 4]
mov eax, edx
ret
/* Test program compiled on x86_64 with: cc -O3 -fomit-frame-pointer -masm=intel -S test.c -o test.S */
#include <stdio.h>
#include <limits.h>
static const int p = 8191;
static const int s = 13;
int __attribute__ ((noinline)) test_0(unsigned int k, volatile int p)
{
return k % p;
}
int __attribute__ ((noinline)) test_1(unsigned int k)
{
return k % p;
}
int __attribute__ ((noinline)) test_2(unsigned int k)
{
int i = (k&p) + (k>>s);
i = (i&p) + (i>>s);
if (i>=p) i -= p;
return i;
}
int main()
{
test_0(1, 8191); /* control */
for (int i = INT_MIN; i < INT_MAX; i++) {
int r1 = test_1(i), r2 = test_2(i);
if (r1 != r2) printf("%d %d %d\n", i, r1, r2);
}
}
> On 27 Mar 2016, at 2:32 PM, Andrew Waterman <andrew@sifive.com> wrote:
>
> It would be good to figure out how to get rid of the spurious register spills.
>
> The strength reduction optimization isn't always profitable on Rocket,
> as it increases instruction count and code size. The divider has an
> early out and for small numbers is quite fast.
>
> On Fri, Mar 25, 2016 at 5:43 PM, Michael Clark <michaeljclark@mac.com> wrote:
>> Now considering I have no idea how many cycles it takes for an integer divide on the Rocket so the optimisation may not be a win.
>>
>> Trying to read MuDiv in multiplier.scala, and will at some point run some timings in the cycle-accurate simulator.
>>
>> In either case, the spurious stack moves emitted by GCC are curious...
>>
>>> On 26 Mar 2016, at 9:42 AM, Michael Clark <michaeljclark@mac.com> wrote:
>>>
>>> Hi All,
>>>
>>> I have found an interesting case where an optimisation is not being applied by GCC on RISC-V. And also some strange assembly output from GCC on RISC-V.
>>>
>>> Both GCC and Clang appear to optimise division by a constant Mersenne prime on x86_64 however GCC on RISC-V is not applying this optimisation.
>>>
>>> See test program and assembly output for these platforms:
>>>
>>> * GCC -O3 on RISC-V
>>> * GCC -O3 on x86_64
>>> * LLVM/Clang -O3 on x86_64
>>>
>>> Another strange observation is GCC on RISC-V is moving a1 to a5 via a stack store followed by a stack load. Odd? GCC 5 also seems to be doing odd stuff with stack ‘moves' on x86_64, moving esi to ecx via the stack (I think recent x86 micro-architecture treats tip of the stack like an extended register file so this may only have a small penalty on x86).
>>>
>>> See GCC on RISC-V is emitting this:
>>>
>>> test_0:
>>> add sp,sp,-16
>>> sw a1,12(sp)
>>> lw a5,12(sp)
>>> add sp,sp,16
>>> remuw a0,a0,a5
>>> jr ra
>>>
>>> instead of this:
>>>
>>> test_0:
>>> remuw a0,a0,a1
>>> jr ra
>>>
>>> Compiler devs, please read Test program and assembly output. I have not yet tested LLVM/Clang on RISC-V yet… I will do that next… I have not had time to dig into compiler code yet...
>>>
>>> Regards,
>>> Michael.
>>>
>>>
>>> /* Test program */
>>>
>>> #include <stdio.h>
>>> #include <limits.h>
>>>
>>> static const int p = 8191;
>>> static const int s = 13;
>>>
>>> int __attribute__ ((noinline)) test_0(unsigned int k, volatile int p)
>>> {
>>> return k % p;
>>> }
>>>
>>> int __attribute__ ((noinline)) test_1(unsigned int k)
>>> {
>>> return k % p;
>>> }
>>>
>>> int __attribute__ ((noinline)) test_2(unsigned int k)
>>> {
>>> int i = (k&p) + (k>>s);
>>> i = (i&p) + (i>>s);
>>> if (i>=p) i -= p;
>>> return i;
>>> }
>>>
>>> int main()
>>> {
>>> test_0(1, 8191); /* control */
>>> for (int i = INT_MIN; i < INT_MAX; i++) {
>>> int r1 = test_1(i), r2 = test_2(i);
>>> if (r1 != r2) printf("%d %d %d\n", i, r1, r2);
>>> }
>>> }
>>>
>>>
>>>
>>> /* RISC-V GCC */
>>>
>>> $ riscv64-unknown-elf-gcc --version
>>> riscv64-unknown-elf-gcc (GCC) 5.2.0
>>>
>>> test_0:
>>> add sp,sp,-16
>>> sw a1,12(sp)
>>> lw a5,12(sp)
>>> add sp,sp,16
>>> remuw a0,a0,a5
>>> jr ra
>>> test_1:
>>> li a5,8192
>>> addw a5,a5,-1
>>> remuw a0,a0,a5
>>> ret
>>> test_2:
>>> li a3,8192
>>> addw a2,a3,-1
>>> and a4,a0,a2
>>> srlw a0,a0,13
>>> addw a5,a4,a0
>>> and a0,a5,a2
>>> sraw a5,a5,13
>>> addw a0,a0,a5
>>> addw a3,a3,-2
>>> ble a0,a3,.L5
>>> subw a0,a0,a2
>>> .L5:
>>> ret
>>>
>>>
>>> /* Linux x86_64 GCC */
>>>
>>> $ gcc --version
>>> gcc (Debian 5.2.1-23) 5.2.1 20151028
>>>
>>> test_0:
>>> mov DWORD PTR [rsp-4], esi
>>> mov ecx, DWORD PTR [rsp-4]
>>> mov eax, edi
>>> cdq
>>> idiv ecx
>>> mov eax, edx
>>> ret
>>> test_1:
>>> mov eax, edi
>>> mov rcx, rax
>>> mov rdx, rax
>>> sal rcx, 6
>>> sal rdx, 19
>>> add rdx, rcx
>>> add rax, rdx
>>> mov edx, edi
>>> shr rax, 32
>>> sub edx, eax
>>> shr edx
>>> add eax, edx
>>> shr eax, 12
>>> mov edx, eax
>>> sal edx, 13
>>> sub edx, eax
>>> sub edi, edx
>>> mov eax, edi
>>> ret
>>> test_2:
>>> mov eax, edi
>>> shr edi, 13
>>> and eax, 8191
>>> add eax, edi
>>> mov edx, eax
>>> sar eax, 13
>>> and edx, 8191
>>> add eax, edx
>>> lea edx, [rax-8191]
>>> cmp eax, 8191
>>> cmovge eax, edx
>>> ret
>>>
>>>
>>> /* Darwin x86_64 LLVM Clang */
>>>
>>> $ cc --version
>>> Apple LLVM version 7.3.0 (clang-703.0.29)
>>>
>>> _test_0:
>>> mov dword ptr [rsp - 4], esi
>>> xor edx, edx
>>> mov eax, edi
>>> div dword ptr [rsp - 4]
>>> mov eax, edx
>>> ret
>>> _test_1:
>>> mov eax, edi
>>> imul rax, rax, 524353
>>> shr rax, 32
>>> mov ecx, edi
>>> sub ecx, eax
>>> shr ecx
>>> add ecx, eax
>>> shr ecx, 12
>>> imul eax, ecx, 8191
>>> sub edi, eax
>>> mov eax, edi
>>> ret
>>> _test_2:
>>> mov eax, edi
>>> and eax, 8191
>>> mov ecx, edi
>>> shr ecx, 13
>>> add eax, ecx
>>> add ecx, edi
>>> and ecx, 8191
>>> shr eax, 13
>>> lea edx, [rcx + rax]
>>> cmp edx, 8190
>>> lea eax, [rcx + rax - 8191]
>>> cmovbe eax, edx
>>> ret
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "RISC-V SW Dev" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to sw-dev+unsubscribe@groups.riscv.org.
>>> To post to this group, send email to sw-dev@groups.riscv.org.
>>> Visit this group at https://groups.google.com/a/groups.riscv.org/group/sw-dev/.
>>> To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/sw-dev/2600D96D-94BC-4259-9D39-DE4993859281%40mac.com.
>>
>> --
>> You received this message because you are subscribed to the Google Groups "RISC-V SW Dev" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to sw-dev+unsubscribe@groups.riscv.org.
>> To post to this group, send email to sw-dev@groups.riscv.org.
>> Visit this group at https://groups.google.com/a/groups.riscv.org/group/sw-dev/.
>> To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/sw-dev/9F3C9DE6-F00B-4402-A83B-354455DEAFFA%40mac.com.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Spurious register spill with volatile function argument
2016-03-27 5:58 ` Spurious register spill with volatile function argument Michael Clark
@ 2016-03-28 9:16 ` Andrew Haley
2016-03-28 12:12 ` Florian Weimer
0 siblings, 1 reply; 5+ messages in thread
From: Andrew Haley @ 2016-03-28 9:16 UTC (permalink / raw)
To: Michael Clark, gcc, cfe-dev, Andrew Waterman; +Cc: RISC-V Software Developers
On 27/03/16 06:57, Michael Clark wrote:
> GCC, Clang folk, any ideas on why there is a stack spill for a
> volatile register argument passed in esi? Does volatile force the
> argument to have storage allocated on the stack? Is this a corner
> case in the C standard? This argument in the x86_64 calling
> convention only has a register, so technically it canât change
> outside the control of the C "virtual machineâ so volatile has a
> vague meaning here.
"volatile" doesn't really mean very much, formally speaking. Sure, the
standard says "accesses to volatile objects are evaluated
strictly according to the rules of the abstract machine," but nowhere
is it specified exactly what constitutes an access. (To be precise,
"what constitutes an access to an object that has volatile-qualified
type is implementation-defined.")
So, we have to fall back to tradition. Traditionally, all volatile
objects are allocated stack slots and all accesses to them are memory
accesses. This is consistent behaviour, and has been for a long time.
It is also extremely useful when debugging optimized code.
> volatile for scalar function arguments seems to mean: âmake this
> volatile and subject to change outside of the compilerâ rather than
> being a qualifier for its storage (which is a register).
No, arguments are not necessarily stored in registers: they're passed
in registers, but after function entry function they're just auto
variables and are stored wherever the compiler likes.
Andrew.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Spurious register spill with volatile function argument
2016-03-28 9:16 ` Andrew Haley
@ 2016-03-28 12:12 ` Florian Weimer
2016-03-28 14:34 ` Paul_Koning
0 siblings, 1 reply; 5+ messages in thread
From: Florian Weimer @ 2016-03-28 12:12 UTC (permalink / raw)
To: Andrew Haley
Cc: Michael Clark, gcc, cfe-dev, Andrew Waterman, RISC-V Software Developers
* Andrew Haley:
> "volatile" doesn't really mean very much, formally speaking. Sure, the
> standard says "accesses to volatile objects are evaluated
> strictly according to the rules of the abstract machine," but nowhere
> is it specified exactly what constitutes an access.
Reading or modifying an object is defined as “access”.
The problem is that “reading” is either not defined, or the existing
flatly contradicts existing practice.
For example, if p is a pointer to a struct, will the expression &p->m
read *p?
Previously, this wasn't very interesting. But under the model memory,
it's suddenly quite relevant. If reading p->m implies a read of the
entire object *p, you cannot use a member to synchronize access to
other members of the struct. For example, if m is a mutex, and
carefully acquire the mutex before you read or write other members,
you still have data race between a write to some other member and the
acquisition of the mutex because the mutex acquisition reads the
entire struct (including the member written to ).
One possible cure is to take the address of the mutex and keep track
of it separately. Or you could construct a pointer using offsetof.
But no one is doing that, obviously.
This is not entirely hypothetical. Even today, GCC's aliasing
analysis requires that those implicit whole-object reads take place,
to make certain forms of type-punning invalid which would otherwise be
well-defined (and for which GCC would generate invalid code).
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Spurious register spill with volatile function argument
2016-03-28 12:12 ` Florian Weimer
@ 2016-03-28 14:34 ` Paul_Koning
2016-03-28 15:10 ` Florian Weimer
0 siblings, 1 reply; 5+ messages in thread
From: Paul_Koning @ 2016-03-28 14:34 UTC (permalink / raw)
To: fw; +Cc: gcc, cfe-dev, sw-dev
> On Mar 28, 2016, at 8:11 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>
> ...
> The problem is that “reading” is either not defined, or the existing
> flatly contradicts existing practice.
>
> For example, if p is a pointer to a struct, will the expression &p->m
> read *p?
Presumably the offset of m is substantially larger than 0? If so, my answer would be "it had better not". Does any compiler treat that statement as an access to *p ?
paul
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Spurious register spill with volatile function argument
2016-03-28 14:34 ` Paul_Koning
@ 2016-03-28 15:10 ` Florian Weimer
0 siblings, 0 replies; 5+ messages in thread
From: Florian Weimer @ 2016-03-28 15:10 UTC (permalink / raw)
To: Paul_Koning; +Cc: gcc, cfe-dev, sw-dev
* Paul Koning:
>> On Mar 28, 2016, at 8:11 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>>
>> ...
>> The problem is that “reading” is either not defined, or the existing
>> flatly contradicts existing practice.
>>
>> For example, if p is a pointer to a struct, will the expression &p->m
>> read *p?
>
> Presumably the offset of m is substantially larger than 0? If so, my
> answer would be "it had better not". Does any compiler treat that
> statement as an access to *p ?
As I tried to explain, GCC does, for aliasing purposes. For the
expression p->m, there is an implicit read of *p, asserting that the
static and dynamic types match.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2016-03-28 15:10 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <2600D96D-94BC-4259-9D39-DE4993859281@mac.com>
[not found] ` <9F3C9DE6-F00B-4402-A83B-354455DEAFFA@mac.com>
[not found] ` <CA++6G0CK3Yu-w9fhOrvpRP49qz9=Vs20J_=R28Mq=zCJOH+-UA@mail.gmail.com>
2016-03-27 5:58 ` Spurious register spill with volatile function argument Michael Clark
2016-03-28 9:16 ` Andrew Haley
2016-03-28 12:12 ` Florian Weimer
2016-03-28 14:34 ` Paul_Koning
2016-03-28 15:10 ` Florian Weimer
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).