public inbox for gcc@gcc.gnu.org
* naked functions on x86 architecture
@ 2009-06-12 16:20 Zachary Turner
From: Zachary Turner @ 2009-06-12 16:20 UTC
  To: gcc

Hi,

I know this has been discussed before; I have read through some of the
archives and read about some of the rationale.  I want to raise it
again, however, because I don't think anyone has ever presented a good
example of where it is really really useful on x86 architectures.

In general, naked functions are very useful for selecting different
versions of an instruction (byte, word, dword, qword) through template
specialization.  I'll post some code that works under Visual C++ 9.0
to demonstrate what I mean.  The following function finds the index of
the first nonzero "element" of an arbitrarily sized array (or, with
similar template specializations that use repne instead of rep, the
first zero), and is the fastest way I know to do so.

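#include <boost/cstdint.hpp>

// Naked: no prologue or epilogue, so each specialization sees EAX, EDI and
// ECX exactly as the caller set them and returns the final EDI in EAX.
template<typename T> int __declspec(naked) scas();

template<> int __declspec(naked) scas<boost::uint8_t>()
{ __asm rep scasb __asm mov eax, edi __asm ret }
template<> int __declspec(naked) scas<boost::uint16_t>()
{ __asm rep scasw __asm mov eax, edi __asm ret }
template<> int __declspec(naked) scas<boost::uint32_t>()
{ __asm rep scasd __asm mov eax, edi __asm ret }
// The preprocessor cannot evaluate sizeof, so test the target macro instead.
// (MSVC rejects __asm when targeting x64, so this branch cannot build there.)
#if defined(_M_X64)
template<> int __declspec(naked) scas<boost::uint64_t>()
{ __asm rep scasq __asm mov rax, rdi __asm ret }
#endif

template<typename T>
int find_first_nonzero_scas(T* x, int cnt)
{
    int result = 0;
    __asm {
        xor eax, eax    ; value to compare against
        mov edi, x      ; start of the array
        mov ecx, cnt    ; number of elements
    }
    result = scas<T>();                  // EDI just past the element that stopped the scan
    result -= reinterpret_cast<int>(x);  // bytes scanned
    result /= sizeof(T);                 // elements scanned
    return --result;                     // index of the first nonzero (cnt - 1 if none)
}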


This is one example, but it illustrates a general concept that I think
is really useful and I personally have used numerous times for lots of
other instructions than SCAS.  If there is a way to achieve this
without using a naked function then please advise.  I'd rather not
resort to an if/then/else when the value of every test is known at
compile time.
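The index arithmetic above is easiest to check against a portable model. This is a sketch, not code from the thread: the name find_first_nonzero_model is hypothetical, and it assumes cnt >= 0. It steps a pointer and a counter the way EDI and ECX move under rep scas with EAX = 0, then applies the same "elements scanned minus one" arithmetic; because typed pointer subtraction already counts elements, no division by sizeof(T) is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of "xor eax, eax / rep scas" followed by the index
// arithmetic in find_first_nonzero_scas.  Assumes cnt >= 0.
template <typename T>
int find_first_nonzero_model(const T* x, int cnt)
{
    assert(cnt >= 0);
    const T* edi = x;                                   // stands in for EDI
    std::size_t ecx = static_cast<std::size_t>(cnt);    // stands in for ECX
    while (ecx != 0) {
        --ecx;
        const bool equal = (*edi == T(0));  // compare with EAX = 0
        ++edi;                              // EDI advances even on a mismatch
        if (!equal)
            break;                          // REPE stops at the first mismatch
    }
    // Typed pointer subtraction already counts elements, so no / sizeof(T).
    return static_cast<int>(edi - x) - 1;
}
```

For {0, 0, 5, 0} the model returns 2. It also returns 2 for {0, 0, 0}: with this arithmetic an all-zero array is indistinguishable from one whose only nonzero element is the last.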


* Re: naked functions on x86 architecture
From: Paolo Bonzini @ 2009-06-12 16:32 UTC
  To: Zachary Turner; +Cc: gcc

> This is one example, but it illustrates a general concept that I think
> is really useful and I personally have used numerous times for lots of
> other instructions than SCAS.  If there is a way to achieve this
> without using a naked function then please advise.

Keeping the __asm syntax, I'd be surprised if this did not work:

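template<typename T>
int find_first_nonzero_scas(T* x, int cnt)
{
     int result = 0;
     __asm {
         xor eax, eax
         mov edi, x
         mov ecx, cnt
     }
     // sizeof (T) is a compile-time constant, so the optimizer can drop the
     // untaken branches.  Each __asm keyword starts one instruction: inside
     // __asm a ';' begins a comment, which would swallow the "mov".  MASM
     // spells the doubleword form scasd (scasl is the AT&T name).
     if (sizeof (T) == 1)
       { __asm rep scasb __asm mov result, edi }
     if (sizeof (T) == 2)
       { __asm rep scasw __asm mov result, edi }
     if (sizeof (T) == 4)
       { __asm rep scasd __asm mov result, edi }
     result -= reinterpret_cast<int>(x);
     result /= sizeof(T);
     return --result;
}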

Paolo


* Re: naked functions on x86 architecture
From: Zachary Turner @ 2009-06-12 17:25 UTC
  To: gcc

On Fri, Jun 12, 2009 at 11:32 AM, Paolo Bonzini<paolo.bonzini@gmail.com> wrote:
> Keeping the __asm syntax, I'd be surprised if this did not work:
> [code snipped]

Sorry about the asm syntax, I still haven't used inline assembly in
gcc so I haven't looked at the syntax yet.  I was just going to start
porting over some code to work on gcc when I started looking into the
naked issue.

That being said, what you suggest will indeed work, and be optimized
to be as efficient as the template method.  It's what I'll probably
end up doing as a fallback.  But it's very ugly, and there are a
couple of cases where I have much more inline assembly than in this
particular example.  So I have to litter segments of code like that
all throughout the function.  I suppose I could wrap it in a macro for
readability, but it's nicer if it's just integrated with C++ like
everything else.  It's supported on many other platforms; it just
seems a little odd to explicitly not support it on the most common
platform.


* Re: naked functions on x86 architecture
From: Andrew Haley @ 2009-06-12 17:39 UTC
  To: Zachary Turner; +Cc: gcc

Zachary Turner wrote:
> Sorry about the asm syntax, I still haven't used inline assembly in
> gcc so I haven't looked at the syntax yet.  I was just going to start
> porting over some code to work on gcc when I started looking into the
> naked issue.
> 
> That being said, what you suggest will indeed work, and be optimized
> to be as efficient as the template method.  It's what I'll probably
> end up doing as a fallback.  But it's very ugly, and there are a
> couple of cases where I have much more inline assembly than in this
> particular example.  So I have to litter segments of code like that
> all throughout the function.  I suppose I could wrap it in a macro for
> readability, but it's nicer if it's just integrated with C++ like
> everything else.  It's supported on many other platforms; it just
> seems a little odd to explicitly not support it on the most common
> platform.

I've never quite understood why anyone would want naked asm functions in
C source code.  You have an assembler, and it's trivial to write the
functions you want in assembly language.  Well, apart from the name
mangling, but that's pretty simple.

And even then, surely you'd be much better off with an inline asm than
calling a naked function.

Andrew.
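Andrew's suggestion can be sketched as follows, assuming GCC or Clang targeting x86-64 ELF with the System V calling convention. To keep the example in one file it uses GCC's file-scope (basic) asm, whose text is passed to the assembler unchanged; the same lines could live in a separate .s file. The symbol first_nonzero_u8 is hypothetical, and declaring it extern "C" avoids the name-mangling issue. Other targets get a plain C++ fallback with the same contract.

```cpp
#include <cstdint>

// Hypothetical symbol; extern "C" disables C++ name mangling, so the
// assembly can use the plain name.  Assumes cnt >= 0.
extern "C" long first_nonzero_u8(const std::uint8_t* x, long cnt);

#if defined(__GNUC__) && defined(__x86_64__) && defined(__ELF__)
// File-scope basic asm: the same text could live in a separate .s file.
// System V AMD64 ABI: x arrives in %rdi, cnt in %rsi, the result goes in
// %rax, and the direction flag is clear on entry.
__asm__(
    ".pushsection .text\n"
    ".globl first_nonzero_u8\n"
    ".type first_nonzero_u8, @function\n"
    "first_nonzero_u8:\n"
    "    movq %rdi, %rdx\n"    // remember the start of the array
    "    movq %rsi, %rcx\n"    // element count for REPE
    "    xorl %eax, %eax\n"    // AL = 0 is the comparison value
    "    repe scasb\n"         // stops just past the first nonzero byte
    "    movq %rdi, %rax\n"
    "    subq %rdx, %rax\n"    // bytes scanned
    "    decq %rax\n"          // index of the byte that ended the scan
    "    ret\n"
    ".size first_nonzero_u8, .-first_nonzero_u8\n"
    ".popsection\n");
#else
// Portable fallback with the same contract on other targets.
extern "C" long first_nonzero_u8(const std::uint8_t* x, long cnt)
{
    long i = 0;
    while (i < cnt && x[i++] == 0) {
    }
    return i - 1;
}
#endif
```

Called on {0, 0, 9, 0} with cnt = 4, first_nonzero_u8 returns 2.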



* Re: naked functions on x86 architecture
From: Zachary Turner @ 2009-06-12 17:56 UTC
  To: Andrew Haley; +Cc: gcc

On Fri, Jun 12, 2009 at 12:39 PM, Andrew Haley<aph@redhat.com> wrote:
> I've never quite understood why anyone would want naked asm functions in
> C source code.  You have an assembler, and it's trivial to write the
> functions you want in assembly language.  Well, apart from the name
> mangling, but that's pretty simple.
>
> And even then, surely you'd be much better off with an inline asm than
> calling a naked function.

I guess for the same reason people would want any asm functions in C
source code.  Sometimes it's just the best way to express something.
Like in the example I mentioned, I could write 4 different functions
in assembly, one for each size suffix, wrap them all up in a separate
assembly language file but IMHO it's more readable, quicker to code,
and more expressive to use a template switch like I've done.  C++ is
built on the philosophy of giving you enough rope to hang yourself
with.

I don't think there's a better way to express the selection of an
instruction based on operand size than through a naked template
specialization.

Using a .s file is more difficult to port across different compilers.
Many compilers provide support for naked functions and it's easy to
just use a #ifdef to check which compiler you're running on and define
the appropriate naked declaration string.

Besides, it's supported on embedded architectures.  It's frustrating
because it feels like back in the days of the 486SX, where the
processors had working FPUs on them but they were switched off "just
because".  All the investment has already been done to add support for
naked functions, so I think people should be "permitted" to use it,
even if other people feel like they should be using something else.


* Re: naked functions on x86 architecture
From: Andrew Haley @ 2009-06-12 18:47 UTC
  To: Zachary Turner; +Cc: gcc

Zachary Turner wrote:

> I don't think there's a better way to express the selection of an
> instruction based on operand size than through a naked template
> specialization.

I still don't get it.  A gcc asm version of this is

-------------------------------------------------------------------------
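#include <stdint.h>   // intptr_t, uint8_t

template<typename T> intptr_t scas(T *a, T val, int len);

template<> intptr_t scas<uint8_t>(uint8_t *a, uint8_t val, int len)
{
  intptr_t result;
  // rep scasb also counts ECX down, so len must be an in/out operand,
  // and the "memory" clobber tells GCC that the array is read.
  __asm__ ("rep scasb"
           : "=D"(result), "+c"(len)
           : "a"(val), "D"(a)
           : "memory");
  return result;
}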

template<typename T>
int find_first_nonzero_scas(T* x, int cnt)
{
    intptr_t  result = 0;
    result = scas<T>(x, 0, cnt);
    result -= reinterpret_cast<intptr_t>(x);
    result /= sizeof(T);
    return --result;
}
-------------------------------------------------------------------------

which, when instantiated for unsigned char on x86-64, generates

int find_first_nonzero_scas<unsigned char>(unsigned char*, int):
	movq	%rdi, %rdx
	xorl	%eax, %eax
	movl	%esi, %ecx
	notq	%rdx
	rep scasb
	leaq	(%rdx,%rdi), %rax
	ret

How is this not better in every way?

I can understand that you want something compatible with your source.  But
you said "I don't think anyone has ever presented a good example of where
[naked asms are] really really useful on x86 architectures."

Baffled,
Andrew.
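Andrew's sketch specializes scas only for uint8_t, so instantiating find_first_nonzero_scas for a wider type would fail to link. A fuller version along the same lines might look like this, assuming GCC-style extended asm on x86 or x86-64. The DEFINE_SCAS macro, the pointer-returning signature and the portable fallback are illustrative choices, not from the thread. Returning a typed pointer lets ordinary pointer subtraction produce the element index, replacing the intptr_t cast and the division by sizeof(T).

```cpp
#include <cstddef>
#include <cstdint>

// Returns a pointer just past the element that ended the scan, or x + n if
// all n elements are zero: where EDI/RDI ends up after "repe scas".
template <typename T> const T* scas(const T* x, std::size_t n);

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
// One specialization per operand size.  REPE SCAS advances EDI/RDI and
// decrements ECX/RCX, so both are read-write ("+") operands; "memory" tells
// the compiler that the array contents are read.
#define DEFINE_SCAS(TYPE, INSN)                                     \
    template <> inline const TYPE* scas<TYPE>(const TYPE* x,        \
                                              std::size_t n)        \
    {                                                               \
        __asm__("repe " INSN                                        \
                : "+D"(x), "+c"(n)                                  \
                : "a"(TYPE(0))                                      \
                : "cc", "memory");                                  \
        return x;                                                   \
    }
DEFINE_SCAS(std::uint8_t, "scasb")
DEFINE_SCAS(std::uint16_t, "scasw")
DEFINE_SCAS(std::uint32_t, "scasl")
#if defined(__x86_64__)
DEFINE_SCAS(std::uint64_t, "scasq")
#endif
#undef DEFINE_SCAS
#else
// Portable fallback with the same contract on other targets.
template <typename T> const T* scas(const T* x, std::size_t n)
{
    while (n-- != 0)
        if (*x++ != 0)
            break;
    return x;
}
#endif

template <typename T>
std::ptrdiff_t find_first_nonzero_scas(const T* x, std::size_t cnt)
{
    // Elements scanned minus one: the index of the element that ended the scan.
    return scas<T>(x, cnt) - x - 1;
}
```

With the uint32_t array {0, 0, 0, 7, 0} and the uint16_t array {0, 4}, find_first_nonzero_scas returns 3 and 1 respectively. The operand size is still chosen by template specialization, as in the naked-function version, but each specialization is an ordinary inline function the compiler can inline.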
