* naked functions on x86 architecture
@ 2009-06-12 16:20 Zachary Turner
  2009-06-12 16:32 ` Paolo Bonzini
  0 siblings, 1 reply; 6+ messages in thread
From: Zachary Turner @ 2009-06-12 16:20 UTC
To: gcc

Hi,

I know this has been discussed before; I have read through some of
the archives and read about some of the rationale.  I want to raise it
again however, because I don't think anyone has ever presented a good
example of where it is really really useful on x86 architectures.

In general, it is very useful for selecting different versions of
instructions (byte, word, dword, qword) with a template
specialization.  I'll post some code that works under Visual C++ 9.0
to demonstrate what I mean.  The following function finds the index of
the first nonzero (or zero, with similar template specializations
replacing rep with repne) "element" of an arbitrarily sized array (and
is the fastest way I know to do so).

template<typename T>
int __declspec(naked) scas();

template<>
int __declspec(naked) scas<boost::uint8_t>()
{
    __asm rep scasb
    __asm mov eax, edi
    __asm ret
}

template<>
int __declspec(naked) scas<boost::uint16_t>()
{
    __asm rep scasw
    __asm mov eax, edi
    __asm ret
}

template<>
int __declspec(naked) scas<boost::uint32_t>()
{
    __asm rep scasd
    __asm mov eax, edi
    __asm ret
}

#if (sizeof(void*) == sizeof(boost::uint64_t))
template<>
int __declspec(naked) scas<boost::uint64_t>()
{
    __asm rep scasq
    __asm mov rax, rdi
    __asm ret
}
#endif

template<typename T>
int find_first_nonzero_scas(T* x, int cnt)
{
    int result = 0;
    __asm {
        xor eax, eax
        mov edi, x
        mov ecx, cnt
    }
    result = scas<T>();
    result -= reinterpret_cast<int>(x);
    result /= sizeof(T);
    return --result;
}

This is one example, but it illustrates a general concept that I think
is really useful and I personally have used numerous times for lots of
other instructions than SCAS.  If there is a way to achieve this
without using a naked function then please advise.  I'd rather not
resort to an if/then/else when the value of every test is known at
compile time.
* Re: naked functions on x86 architecture
  2009-06-12 16:20 naked functions on x86 architecture Zachary Turner
@ 2009-06-12 16:32 ` Paolo Bonzini
  2009-06-12 17:25   ` Zachary Turner
  0 siblings, 1 reply; 6+ messages in thread
From: Paolo Bonzini @ 2009-06-12 16:32 UTC
To: Zachary Turner; +Cc: gcc

> This is one example, but it illustrates a general concept that I think
> is really useful and I personally have used numerous times for lots of
> other instructions than SCAS.  If there is a way to achieve this
> without using a naked function then please advise.

Keeping the __asm syntax, I'd be surprised if this did not work:

template<typename T>
int find_first_nonzero_scas(T* x, int cnt)
{
    int result = 0;
    __asm {
        xor eax, eax
        mov edi, x
        mov ecx, cnt
    }
    if (sizeof (T) == 1)
        __asm { rep scasb; mov result, edi }
    if (sizeof (T) == 2)
        __asm { rep scasw; mov result, edi }
    if (sizeof (T) == 4)
        __asm { rep scasl; mov result, edi }
    result -= reinterpret_cast<int>(x);
    result /= sizeof(T);
    return --result;
}

Paolo
* Re: naked functions on x86 architecture
  2009-06-12 16:32 ` Paolo Bonzini
@ 2009-06-12 17:25   ` Zachary Turner
  2009-06-12 17:39     ` Andrew Haley
  0 siblings, 1 reply; 6+ messages in thread
From: Zachary Turner @ 2009-06-12 17:25 UTC
To: gcc

On Fri, Jun 12, 2009 at 11:32 AM, Paolo Bonzini<paolo.bonzini@gmail.com> wrote:
>> This is one example, but it illustrates a general concept that I think
>> is really useful and I personally have used numerous times for lots of
>> other instructions than SCAS.  If there is a way to achieve this
>> without using a naked function then please advise.
>
> Keeping the __asm syntax, I'd be surprised if this did not work:
>
> template<typename T>
> int find_first_nonzero_scas(T* x, int cnt)
> {
>     int result = 0;
>     __asm {
>         xor eax, eax
>         mov edi, x
>         mov ecx, cnt
>     }
>     if (sizeof (T) == 1)
>         __asm { rep scasb; mov result, edi }
>     if (sizeof (T) == 2)
>         __asm { rep scasw; mov result, edi }
>     if (sizeof (T) == 4)
>         __asm { rep scasl; mov result, edi }
>     result -= reinterpret_cast<int>(x);
>     result /= sizeof(T);
>     return --result;
> }
>
> Paolo
>

Sorry about the asm syntax, I still haven't used inline assembly in
gcc so I haven't looked at the syntax yet.  I was just going to start
porting over some code to work on gcc when I started looking into the
naked issue.

That being said, what you suggest will indeed work, and be optimized
to be as efficient as the template method.  It's what I'll probably
end up doing as a fallback.  But it's very ugly, and there are a
couple of cases where I have much more inline assembly than in this
particular example.  So I have to litter segments of code like that
all throughout the function.  I suppose I could wrap it in a macro for
readability, but it's nicer if it's just integrated with C++ like
everything else.  It's supported for many other platforms; it just
seems a little odd to explicitly not support it on the most common
platform.
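On the point that the sizeof(T) tests are known at compile time: the
same selection can also be written so that no if statement appears at
all, by letting overload resolution pick a helper from a width tag.
The following is only a minimal sketch and is not code from the
thread.  The names width_tag, scas_suffix and scas_suffix_for are
hypothetical, and the helpers merely return the Intel-syntax suffix
letter; in real code each overload would be the place where the
width-specific inline asm lives.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical tag type carrying the element width as a compile-time constant.
template <std::size_t N>
using width_tag = std::integral_constant<std::size_t, N>;

// One overload per operand width. Each one stands in for the place where
// width-specific inline asm would go; here it only returns the Intel-syntax
// suffix (b, w, d, q) so that the sketch stays portable.
constexpr char scas_suffix(width_tag<1>) { return 'b'; }
constexpr char scas_suffix(width_tag<2>) { return 'w'; }
constexpr char scas_suffix(width_tag<4>) { return 'd'; }
constexpr char scas_suffix(width_tag<8>) { return 'q'; }

// Overload resolution selects the helper from sizeof(T). A width with no
// matching overload is rejected at compile time, not at run time.
template <typename T>
constexpr char scas_suffix_for() {
    return scas_suffix(width_tag<sizeof(T)>());
}

// Compile-time checks: the selection involves no runtime branch.
static_assert(scas_suffix_for<std::uint8_t>() == 'b', "byte width");
static_assert(scas_suffix_for<std::uint32_t>() == 'd', "dword width");
```

Whether this reads better than a chain of sizeof tests is a matter of
taste; the sketch only shows that the choice can be made by the type
system without naked functions.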
* Re: naked functions on x86 architecture
  2009-06-12 17:25   ` Zachary Turner
@ 2009-06-12 17:39     ` Andrew Haley
  2009-06-12 17:56       ` Zachary Turner
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Haley @ 2009-06-12 17:39 UTC
To: Zachary Turner; +Cc: gcc

Zachary Turner wrote:
> On Fri, Jun 12, 2009 at 11:32 AM, Paolo Bonzini<paolo.bonzini@gmail.com> wrote:
>>> This is one example, but it illustrates a general concept that I think
>>> is really useful and I personally have used numerous times for lots of
>>> other instructions than SCAS.  If there is a way to achieve this
>>> without using a naked function then please advise.
>> Keeping the __asm syntax, I'd be surprised if this did not work:
>>
>> template<typename T>
>> int find_first_nonzero_scas(T* x, int cnt)
>> {
>>     int result = 0;
>>     __asm {
>>         xor eax, eax
>>         mov edi, x
>>         mov ecx, cnt
>>     }
>>     if (sizeof (T) == 1)
>>         __asm { rep scasb; mov result, edi }
>>     if (sizeof (T) == 2)
>>         __asm { rep scasw; mov result, edi }
>>     if (sizeof (T) == 4)
>>         __asm { rep scasl; mov result, edi }
>>     result -= reinterpret_cast<int>(x);
>>     result /= sizeof(T);
>>     return --result;
>> }

> Sorry about the asm syntax, I still haven't used inline assembly in
> gcc so I haven't looked at the syntax yet.  I was just going to start
> porting over some code to work on gcc when I started looking into the
> naked issue.
>
> That being said, what you suggest will indeed work, and be optimized
> to be as efficient as the template method.  It's what I'll probably
> end up doing as a fallback.  But it's very ugly, and there are a
> couple of cases where I have much more inline assembly than in this
> particular example.  So I have to litter segments of code like that
> all throughout the function.  I suppose I could wrap it in a macro for
> readability, but it's nicer if it's just integrated with C++ like
> everything else.  It's supported for many other platforms; it just
> seems a little odd to explicitly not support it on the most common
> platform.

I've never quite understood why anyone would want naked asm functions in
C source code.  You have an assembler, and it's trivial to write the
functions you want in assembly language.  Well, apart from the name
mangling, but that's pretty simple.

And even then, surely you'd be much better off with an inline asm than
calling a naked function.

Andrew.
* Re: naked functions on x86 architecture
  2009-06-12 17:39     ` Andrew Haley
@ 2009-06-12 17:56       ` Zachary Turner
  2009-06-12 18:47         ` Andrew Haley
  0 siblings, 1 reply; 6+ messages in thread
From: Zachary Turner @ 2009-06-12 17:56 UTC
To: Andrew Haley; +Cc: gcc

On Fri, Jun 12, 2009 at 12:39 PM, Andrew Haley<aph@redhat.com> wrote:
> Zachary Turner wrote:
>> On Fri, Jun 12, 2009 at 11:32 AM, Paolo Bonzini<paolo.bonzini@gmail.com> wrote:
>>>> This is one example, but it illustrates a general concept that I think
>>>> is really useful and I personally have used numerous times for lots of
>>>> other instructions than SCAS.  If there is a way to achieve this
>>>> without using a naked function then please advise.
>>> Keeping the __asm syntax, I'd be surprised if this did not work:
>>>
>>> template<typename T>
>>> int find_first_nonzero_scas(T* x, int cnt)
>>> {
>>>     int result = 0;
>>>     __asm {
>>>         xor eax, eax
>>>         mov edi, x
>>>         mov ecx, cnt
>>>     }
>>>     if (sizeof (T) == 1)
>>>         __asm { rep scasb; mov result, edi }
>>>     if (sizeof (T) == 2)
>>>         __asm { rep scasw; mov result, edi }
>>>     if (sizeof (T) == 4)
>>>         __asm { rep scasl; mov result, edi }
>>>     result -= reinterpret_cast<int>(x);
>>>     result /= sizeof(T);
>>>     return --result;
>>> }
>
>> Sorry about the asm syntax, I still haven't used inline assembly in
>> gcc so I haven't looked at the syntax yet.  I was just going to start
>> porting over some code to work on gcc when I started looking into the
>> naked issue.
>>
>> That being said, what you suggest will indeed work, and be optimized
>> to be as efficient as the template method.  It's what I'll probably
>> end up doing as a fallback.  But it's very ugly, and there are a
>> couple of cases where I have much more inline assembly than in this
>> particular example.  So I have to litter segments of code like that
>> all throughout the function.  I suppose I could wrap it in a macro for
>> readability, but it's nicer if it's just integrated with C++ like
>> everything else.  It's supported for many other platforms; it just
>> seems a little odd to explicitly not support it on the most common
>> platform.
>
> I've never quite understood why anyone would want naked asm functions in
> C source code.  You have an assembler, and it's trivial to write the
> functions you want in assembly language.  Well, apart from the name
> mangling, but that's pretty simple.
>
> And even then, surely you'd be much better off with an inline asm than
> calling a naked function.
>
> Andrew.
>

I guess the same reason people would want any asm functions in C
source code.  Sometimes it's just the best way to express something.
Like in the example I mentioned, I could write 4 different functions
in assembly, one for each size suffix, wrap them all up in a separate
assembly language file but IMHO it's more readable, quicker to code,
and more expressive to use a template switch like I've done.  C++ is
built on the philosophy of giving you enough rope to hang yourself
with.

I don't think there's a better way to express the selection of an
instruction based on operand size than through a naked template
specialization.

Using a .s file is more difficult to port across different compilers.
Many compilers provide support for naked functions and it's easy to
just use a #ifdef to check which compiler you're running on and define
the appropriate naked declaration string.

Besides, it's supported for embedded architectures.  It's frustrating
because it feels like back in the days of the 386SX, where the
processors had working FPUs on them but they were switched off "just
because".  All the investment has already been done to add support for
naked functions, so I think people should be "permitted" to use it,
even if other people feel like they should be using something else.
* Re: naked functions on x86 architecture
  2009-06-12 17:56       ` Zachary Turner
@ 2009-06-12 18:47         ` Andrew Haley
  0 siblings, 0 replies; 6+ messages in thread
From: Andrew Haley @ 2009-06-12 18:47 UTC
To: Zachary Turner; +Cc: gcc

Zachary Turner wrote:

> I guess the same reason people would want any asm functions in C
> source code.  Sometimes it's just the best way to express something.
> Like in the example I mentioned, I could write 4 different functions
> in assembly, one for each size suffix, wrap them all up in a separate
> assembly language file but IMHO it's more readable, quicker to code,
> and more expressive to use a template switch like I've done.  C++ is
> built on the philosophy of giving you enough rope to hang yourself
> with.
>
> I don't think there's a better way to express the selection of an
> instruction based on operand size than through a naked template
> specialization.
>
> Using a .s file is more difficult to port across different compilers.
> Many compilers provide support for naked functions and it's easy to
> just use a #ifdef to check which compiler you're running on and define
> the appropriate naked declaration string.
>
> Besides, it's supported for embedded architectures.  It's frustrating
> because it feels like back in the days of the 386SX, where the
> processors had working FPUs on them but they were switched off "just
> because".  All the investment has already been done to add support for
> naked functions, so I think people should be "permitted" to use it,
> even if other people feel like they should be using something else.

I still don't get it.  A gcc asm version of this is

-------------------------------------------------------------------------
template<typename T> intptr_t scas(T *a, T val, int len);

template<> intptr_t scas<uint8_t>(uint8_t *a, uint8_t val, int len)
{
  intptr_t result;
  __asm__ ("rep scasb" : "=D"(result): "a"(val), "D"(a), "c"(len));
  return result;
}

template<typename T>
int find_first_nonzero_scas(T* x, int cnt)
{
  intptr_t result = 0;
  result = scas<T>(x, 0, cnt);
  result -= reinterpret_cast<intptr_t>(x);
  result /= sizeof(T);
  return --result;
}
-------------------------------------------------------------------------

which, when instantiated, generates

int find_first_nonzero_scas<unsigned char>(unsigned char*, int):
        movq    %rdi, %rdx
        xorl    %eax, %eax
        movl    %esi, %ecx
        notq    %rdx
        rep scasb
        leaq    (%rdx,%rdi), %rax
        ret

How is this not better in every way?

I can understand that you want something compatible with your source.
But you said "I don't think anyone has ever presented a good example of
where [naked asms are] really really useful on x86 architectures."

Baffled,
Andrew.
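The gcc version in this message only spells out the uint8_t case.  As
a rough sketch of how the same idea might extend to the other operand
widths, assuming GCC or Clang extended asm on x86 with AT&T mnemonics
(scasb, scasw, scasl, scasq), something like the following could be
used.  None of this is code from the thread: scan_while_zero,
find_first_nonzero and DEFINE_SCAS_OVERLOAD are hypothetical names, a
plain loop is used on other targets, and the sketch returns cnt when
every element is zero, which differs from the cnt - 1 that the
thread's arithmetic yields.  It also marks the count register as
modified and adds a "memory" clobber, both choices of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable fallback for any element type: skip leading zero elements and
// return a pointer one past the last element examined, which mirrors where
// EDI/RDI ends up after REPE SCAS.
template <typename T>
inline const T* scan_while_zero(const T* p, std::size_t n) {
    while (n != 0) {
        --n;
        if (*p++ != 0) break;
    }
    return p;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
// Non-template overloads are preferred over the template for these exact
// types. "+D" and "+c" mark the pointer and count registers as read and
// modified; "a" holds the zero that each element is compared against.
#define DEFINE_SCAS_OVERLOAD(T, INSN) \
    inline const T* scan_while_zero(const T* p, std::size_t n) { \
        __asm__ volatile(INSN \
                         : "+D"(p), "+c"(n) \
                         : "a"(static_cast<T>(0)) \
                         : "cc", "memory"); \
        return p; \
    }
DEFINE_SCAS_OVERLOAD(std::uint8_t, "repe scasb")
DEFINE_SCAS_OVERLOAD(std::uint16_t, "repe scasw")
DEFINE_SCAS_OVERLOAD(std::uint32_t, "repe scasl")
#if defined(__x86_64__)
DEFINE_SCAS_OVERLOAD(std::uint64_t, "repe scasq")
#endif
#undef DEFINE_SCAS_OVERLOAD
#endif

// Index of the first nonzero element, or cnt if there is none.
template <typename T>
std::size_t find_first_nonzero(const T* x, std::size_t cnt) {
    assert(x != nullptr || cnt == 0);
    const T* end = scan_while_zero(x, cnt);
    const std::size_t scanned = static_cast<std::size_t>(end - x);
    if (scanned == 0) return cnt;            // empty input, nothing examined
    const std::size_t idx = scanned - 1;     // last element examined
    return x[idx] != 0 ? idx : cnt;          // it stopped the scan only if nonzero
}
```

For example, with std::uint32_t d[5] = {0, 0, 7, 0, 1}, the call
find_first_nonzero(d, 5) is expected to return 2.  The final check on
x[idx] is there because the scan can also stop by running out of
elements, and the register values alone do not distinguish that case
from a nonzero last element.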