* Inline assembly - how to get gcc to clear the full rcx register in x86-64 mode
From: Jeroen van Bemmel @ 2012-06-13 21:59 UTC
To: gcc-help
Hi,
I have ported an SSE4 strcmp function from
http://www.strchr.com/strcmp_and_strlen_using_sse_4.2
to GCC inline assembly:
    long __res;
    __asm__ __volatile__(
        "sub $16, %4 \n"
        "1:\n"
        "add $16, %4 \n"
        "movdqu (%4), %%xmm0 \n"  // Could use any XMM, using register constraint "x"
        // ".byte 0x48 \n"        // REX prefix with REX.w=1, to get result in RCX
        "pcmpistri $0x18, (%4,%0), %%xmm0 \n"  // EQUAL_EACH(0x08) + NEGATIVE_POLARITY(0x10)
        "ja 1b \n"
        "jc 2f \n"
        "xor %0, %0 \n"
        "jmp 3f \n"               // XXX Extra jump could be avoided in pure asm
        "2:\n"
        "add %4, %0 \n"
        "movzxb (%0,%1), %0 \n"
        "movzxb (%4,%1), %4 \n"
        "sub %4, %0 \n"
        "3:\n"
        : "=a"(__res), "=c"(cs) : "0"(cs-ct), "1"(0L), "r"(ct) : "xmm0" );
    return (int) __res;
The problem with this code is that "pcmpistri" returns its result in ECX
(i.e. the lower 32 bits of RCX), while the "movzxb" instructions use the
full RCX register.
One solution is to insert a REX prefix with the REX.W bit set (is there a
gas directive for this?).
However, I'd prefer to have gcc clear RCX at the beginning of the
function. The above code loads the "c" register with 0, but the
resulting asm code is
"xorl %ecx, %ecx"
Is this a bug in GCC? If not, how do I get it to clear the full RCX without
doing it 'manually' in the asm block?
Thanks,
Jeroen
* Re: Inline assembly - how to get gcc to clear the full rcx register in x86-64 mode
From: Ian Lance Taylor @ 2012-06-14 0:37 UTC
To: Jeroen van Bemmel; +Cc: gcc-help
Jeroen van Bemmel <jbemmel@zonnet.nl> writes:
> I have ported an SSE4 strcmp function from
> http://www.strchr.com/strcmp_and_strlen_using_sse_4.2
> to GCC inline assembly:
>
> long __res;
> __asm__ __volatile__(
> "sub $16, %4 \n"
> "1:\n"
> "add $16, %4 \n"
> "movdqu (%4), %%xmm0 \n" // Could use any XMM, using
> register constraint "x"
> // ".byte 0x48 \n" // REX prefix
> with REX.w=1, to get result in RCX
> "pcmpistri $0x18, (%4,%0), %%xmm0 \n" //
> EQUAL_EACH(0x08) + NEGATIVE_POLARITY(0x10)
> "ja 1b \n"
> "jc 2f \n"
> "xor %0, %0 \n"
> "jmp 3f \n" // XXX Extra jump
> could be avoided in pure asm
> "2:\n"
> "add %4, %0 \n"
> "movzxb (%0,%1), %0 \n"
> "movzxb (%4,%1), %4 \n"
> "sub %4, %0 \n"
> "3:\n"
> : "=a"(__res), "=c"(cs) : "0"(cs-ct), "1"(0L), "r"(ct) : "xmm0" );
>
> return (int) __res;
>
> The problem with this code is that "pcmpistri" returns its result in
> ECX (i.e. the lower 32 bits of RCX), while the "movzxb" instructions
> use the full RCX register.
> One solution is to insert a REX prefix with REX.w bit set ( any gas
> directive for this? )
Normally setting the low 32 bits of an x86 register will zero out the
upper 32 bits. Is that not true for pcmpistri?
Otherwise, it sounds like you want the addressing mode (%rax,%ecx).
Does x86 really have that addressing mode?
Why not just zero extend %ecx to %rcx?
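For reference, an explicit zero extension can be sketched in inline asm as below. This is only an illustration, assuming GCC on x86-64; the helper name zero_extend_low32 is made up for this example. It uses the documented x86 operand modifier "k" to print the 32-bit name of the operand's register, so the template becomes a 32-bit move of a register onto itself.

```c
#include <stdint.h>

/* Hypothetical helper: keep the low 32 bits of v and clear the upper 32.
   "%k0" prints the 32-bit name of operand 0's register (e.g. %ecx for
   %rcx), so this assembles to something like "movl %ecx, %ecx". */
static uint64_t zero_extend_low32(uint64_t v)
{
    __asm__ ("movl %k0, %k0"   /* 32-bit write zero-extends into bits 63:32 */
             : "+r"(v));       /* read-write operand in any general register */
    return v;
}
```

Calling zero_extend_low32(0xDEADBEEF12345678ULL) should give 0x12345678, since only the low half survives.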
> However, I'd prefer to have gcc clear RCX at the beginning of the
> function. The above code loads the "c" register with 0, but the
> resulting asm code is
> "xorl ecx, ecx"
That instruction will indeed set all of %rcx to zero: in 64-bit mode, a
write to a 32-bit register zero-extends into the upper 32 bits.
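A small check of this, as a sketch assuming GCC on x86-64 (the function name rcx_after_xorl is hypothetical): preload RCX with all ones through the "c" constraint, execute the same 32-bit xor that gcc generated, and hand the full 64-bit register back to C.

```c
#include <stdint.h>

/* Hypothetical helper: returns the full RCX after "xorl %ecx, %ecx". */
static uint64_t rcx_after_xorl(void)
{
    uint64_t rcx = UINT64_MAX;        /* all 64 bits set before the asm */
    __asm__ ("xorl %%ecx, %%ecx"      /* 32-bit xor, as emitted by gcc */
             : "+c"(rcx)              /* "c" pins the operand to RCX */
             :
             : "cc");                 /* xor modifies the flags */
    return rcx;                       /* 64-bit value, not just ECX */
}
```

If the upper half were left alone, this would return 0xFFFFFFFF00000000; on x86-64 it returns 0.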
Ian
* Re: Inline assembly - how to get gcc to clear the full rcx register in x86-64 mode
From: Jeroen van Bemmel @ 2012-06-14 5:37 UTC
To: Ian Lance Taylor; +Cc: gcc-help
Hi Ian,
I was confused by the Intel documentation, which states that pcmpistri
returns its result in ECX. I stepped through the code in a debugger and
verified that the upper 32 bits of RCX do indeed get set to 0.
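The same check can be done without a debugger. The following is a rough sketch, assuming GCC on x86-64 and a CPU with SSE4.2; the function name rcx_after_pcmpistri and the vector typedef are invented for the example. It copies two short strings into zero-padded 16-byte vectors, presets RCX to all ones, runs pcmpistri with the same 0x18 control byte as above, and returns the whole 64-bit register.

```c
#include <stdint.h>
#include <string.h>

/* 16-byte vector type so the operands can live in XMM registers ("x"). */
typedef char v16qi_t __attribute__((vector_size(16)));

/* Hypothetical helper: returns the full RCX after pcmpistri $0x18.
   Only the first 15 bytes of each string are considered. */
static uint64_t rcx_after_pcmpistri(const char *a, const char *b)
{
    char abuf[16] = {0}, bbuf[16] = {0};  /* zero padding terminates both */
    v16qi_t va, vb;
    uint64_t rcx = UINT64_MAX;            /* upper 32 bits set on entry */

    strncpy(abuf, a, 15);
    strncpy(bbuf, b, 15);
    memcpy(&va, abuf, sizeof va);
    memcpy(&vb, bbuf, sizeof vb);

    /* EQUAL_EACH (0x08) + NEGATIVE_POLARITY (0x10): index of the first
       differing byte, or 16 if the two vectors match. */
    __asm__ ("pcmpistri $0x18, %2, %1"
             : "+c"(rcx)                  /* result arrives in ECX */
             : "x"(va), "x"(vb)
             : "cc");
    return rcx;
}
```

For "hello" against "help!" the first mismatch is at index 3, so the function returns exactly 3, which can only happen if bits 63:32 of RCX were cleared by the instruction.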
I also found that although the constraints cannot distinguish between the
32-bit and 64-bit forms of a register, GCC chooses the appropriate form
for me.
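As far as I understand it, the form GCC substitutes follows the C type of the operand. A minimal sketch of that, assuming GCC on x86-64 (add32 and add64 are names made up here): both functions use the identical template "add %1, %0", and only the operand types differ.

```c
#include <stdint.h>

/* 32-bit operands: GCC substitutes 32-bit register names (e.g. %eax),
   so the assembler encodes a 32-bit add that wraps at 2^32. */
static uint32_t add32(uint32_t a, uint32_t b)
{
    __asm__ ("add %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
}

/* 64-bit operands: the same template gets 64-bit names (e.g. %rax),
   so the carry out of bit 31 is kept. */
static uint64_t add64(uint64_t a, uint64_t b)
{
    __asm__ ("add %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
}
```

With these, add32(0xFFFFFFFFu, 1u) wraps to 0, while add64(0xFFFFFFFFull, 1ull) yields 0x100000000. Where a specific width is needed regardless of type, the operand modifiers %k0 (32-bit) and %q0 (64-bit) can be used in the template.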
Thanks,
Jeroen