public inbox for gcc-help@gcc.gnu.org
* Inline assembly - how to get gcc to clear the full rcx register in x86-64 mode
From: Jeroen van Bemmel @ 2012-06-13 21:59 UTC
  To: gcc-help

Hi,

I have ported an SSE4 strcmp function from 
http://www.strchr.com/strcmp_and_strlen_using_sse_4.2
to GCC inline assembly:

long __res;
__asm__ __volatile__(
         "sub        $16, %4                 \n"
         "1:\n"
         "add        $16, %4                 \n"
         "movdqu     (%4), %%xmm0            \n"    // Could use any XMM, using register constraint "x"
         // ".byte 0x48                      \n"    // REX prefix with REX.w=1, to get result in RCX
         "pcmpistri  $0x18, (%4,%0), %%xmm0  \n"    // EQUAL_EACH(0x08) + NEGATIVE_POLARITY(0x10)
         "ja 1b                              \n"
         "jc 2f                              \n"
         "xor %0, %0                         \n"
         "jmp 3f                             \n"    // XXX Extra jump could be avoided in pure asm
         "2:\n"
         "add %4, %0                         \n"
         "movzxb (%0,%1), %0                 \n"
         "movzxb (%4,%1), %4                 \n"
         "sub %4, %0                         \n"
         "3:\n"
     : "=a"(__res), "=c"(cs) : "0"(cs-ct), "1"(0L), "r"(ct) : "xmm0" );

     return (int) __res;

The problem with this code is that "pcmpistri" returns its result in ECX 
(i.e. the lower 32 bits of RCX), while the "movzxb" instructions use the 
full RCX register.
One solution is to insert a REX prefix with the REX.W bit set (is there a
gas directive for this?).

However, I'd prefer to have gcc clear RCX at the beginning of the 
function. The above code loads the "c" register with 0, but the 
resulting asm code is
"xorl    %ecx, %ecx"

Is this a bug in GCC? Or how do I get it to clear the full RCX, without 
doing it 'manually' in the asm block?

Thanks,
Jeroen


* Re: Inline assembly - how to get gcc to clear the full rcx register in x86-64 mode
From: Ian Lance Taylor @ 2012-06-14  0:37 UTC
  To: Jeroen van Bemmel; +Cc: gcc-help

Jeroen van Bemmel <jbemmel@zonnet.nl> writes:

> I have ported an SSE4 strcmp function from
> http://www.strchr.com/strcmp_and_strlen_using_sse_4.2
> to GCC inline assembly:
>
> long __res;
> __asm__ __volatile__(
>         "sub        $16, %4                    \n"
>         "1:\n"
>         "add        $16, %4                     \n"
>         "movdqu        (%4), %%xmm0 \n"    // Could use any XMM, using
> register constraint "x"
>         // ".byte 0x48                           \n"    // REX prefix
> with REX.w=1, to get result in RCX
>         "pcmpistri    $0x18, (%4,%0), %%xmm0  \n"    //
> EQUAL_EACH(0x08) + NEGATIVE_POLARITY(0x10)
>         "ja 1b                                \n"
>         "jc 2f                                 \n"
>         "xor %0, %0                     \n"
>         "jmp 3f                              \n"    // XXX Extra jump
> could be avoided in pure asm
>         "2:\n"
>         "add %4, %0                     \n"
>         "movzxb (%0,%1), %0      \n"
>         "movzxb (%4,%1), %4      \n"
>         "sub %4, %0                     \n"
>         "3:\n"
>     : "=a"(__res), "=c"(cs) : "0"(cs-ct), "1"(0L), "r"(ct) : "xmm0" );
>
>     return (int) __res;
>
> The problem with this code is that "pcmpistri" returns its result in
> ECX (i.e. the lower 32 bits of RCX), while the "movzxb" instructions
> use the full RCX register.
> One solution is to insert a REX prefix with REX.w bit set ( any gas
> directive for this? )

Normally setting the low 32 bits of an x86 register will zero out the
upper 32 bits.  Is that not true for pcmpistri?

Otherwise, it sounds like you want the addressing mode (%rax,%ecx).
Does x86 really have that addressing mode?

Why not just zero extend %ecx to %rcx?

> However, I'd prefer to have gcc clear RCX at the beginning of the
> function. The above code loads the "c" register with 0, but the
> resulting asm code is
> "xorl    ecx, ecx"

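That instruction will indeed set %ecx to zero, and in 64-bit mode any
write to a 32-bit register also zeroes the upper 32 bits of the full
register, so it clears all of %rcx.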
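
The zero-extension rule can be checked from C without a debugger. What
follows is only a minimal sketch: the helper names are hypothetical (they
are not part of the code posted in this thread), and the #else branches
exist only so the file still builds on non-x86-64 targets, where there is
nothing to demonstrate.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: place a 64-bit value in RCX, then clear only ECX. */
static uint64_t clear_ecx_only(uint64_t rcx_value)
{
#if defined(__x86_64__)
    /* "+c" pins the operand to RCX; the template names only ECX. */
    __asm__ ("xorl %%ecx, %%ecx" : "+c"(rcx_value) : : "cc");
#else
    rcx_value = 0;              /* not x86-64: mimic the architectural result */
#endif
    return rcx_value;
}

/* Hypothetical helper: write a 32-bit immediate to ECX over a "dirty" RCX. */
static uint64_t write_ecx_only(uint64_t rcx_value)
{
#if defined(__x86_64__)
    __asm__ ("movl $0x12345678, %%ecx" : "+c"(rcx_value));
#else
    rcx_value = 0x12345678u;    /* not x86-64: mimic the architectural result */
#endif
    return rcx_value;
}

/* Both assertions hold only if the upper 32 bits of RCX were cleared. */
static void check_zero_extension(void)
{
    assert(clear_ecx_only(UINT64_MAX) == 0);
    assert(write_ecx_only(UINT64_MAX) == UINT64_C(0x12345678));
}
```

Calling check_zero_extension() from main() with RCX preloaded with all
ones is enough to show that neither instruction leaves stale upper bits
behind.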

Ian


* Re: Inline assembly - how to get gcc to clear the full rcx register in x86-64 mode
From: Jeroen van Bemmel @ 2012-06-14  5:37 UTC
  To: Ian Lance Taylor; +Cc: gcc-help

Hi Ian,

I was confused by the Intel documentation, which states that pcmpistri 
provides its result in ECX. I stepped through a debugger, and verified 
that the upper 32 bits in RCX indeed get set to 0.

I also found that while I cannot distinguish between 32-bit and 64-bit 
forms of registers in constraints, GCC will choose the optimal flavor 
for me.
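
As an illustration of how the register width gets chosen: GCC picks the
register name from the C type of each operand, and the x86 operand
modifiers %k (32-bit name) and %q (64-bit name) can override that inside
the template. This is a minimal sketch, not the strcmp code from this
thread; the function names are hypothetical, and the #else branches are
plain-C stand-ins for non-x86-64 targets.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit operands: GCC substitutes 32-bit names such as %eax or %ecx. */
static uint32_t add_u32(uint32_t a, uint32_t b)
{
#if defined(__x86_64__)
    __asm__ ("add %1, %0" : "+r"(a) : "r"(b) : "cc");
#else
    a += b;                     /* plain-C stand-in */
#endif
    return a;
}

/* Same template, 64-bit operands: GCC substitutes %rax, %rcx, and so on. */
static uint64_t add_u64(uint64_t a, uint64_t b)
{
#if defined(__x86_64__)
    __asm__ ("add %1, %0" : "+r"(a) : "r"(b) : "cc");
#else
    a += b;                     /* plain-C stand-in */
#endif
    return a;
}

/* %k prints the 32-bit name of a register that holds a 64-bit operand;
   the 32-bit move then zero-extends into the full destination register. */
static uint64_t low_half(uint64_t x)
{
    uint64_t out;
#if defined(__x86_64__)
    __asm__ ("movl %k1, %k0" : "=r"(out) : "r"(x));
#else
    out = (uint32_t)x;          /* plain-C stand-in */
#endif
    return out;
}
```

With these definitions, add_u32(0xFFFFFFFF, 1) wraps to 0 while
add_u64(0xFFFFFFFF, 1) carries into bit 32, although both use the same
template string.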

Thanks,
Jeroen

