* [GNU as / amd64] Forcing disp32 offset for MODRM
@ 2024-07-11 22:50 Mason
2024-07-11 23:25 ` H.J. Lu
0 siblings, 1 reply; 3+ messages in thread
From: Mason @ 2024-07-11 22:50 UTC (permalink / raw)
To: binutils
Hello everyone,
Hope this is an appropriate place for my question.
If not, I'd appreciate being pointed in the right
direction. Please CC me on answers.
Using GNU as on an amd64 target.
My example code:
movq 48*1(%rsi), %rax
movq 48*2(%rsi), %rax
movq 48*3(%rsi), %rax
movq 48*4(%rsi), %rax
which is assembled to
0: 48 8b 46 30 mov 0x30(%rsi),%rax
4: 48 8b 46 60 mov 0x60(%rsi),%rax
8: 48 8b 86 90 00 00 00 mov 0x90(%rsi),%rax
f: 48 8b 86 c0 00 00 00 mov 0xc0(%rsi),%rax
Since offsets 0x30 and 0x60 are less than 0x80,
they fit in an s8, thus gas logically uses disp8.
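To make the size difference concrete, here is a minimal sketch in C of how the two encodings come about. The helper encode_mov_load and its force_disp32 flag are hypothetical names made up for illustration, not part of gas. It only covers registers 0..7 with no SIB byte, and it does not model the displacement-free mod=00 form that an assembler may pick for a zero offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode "movq disp(%base), %dst" for registers 0..7.
   base must not be 4 (%rsp), which would need a SIB byte.
   force_disp32 stands in for "always use the 4-byte displacement".
   Returns the number of bytes written to out (4 or 7). */
size_t encode_mov_load(uint8_t *out, unsigned dst, unsigned base,
                       int32_t disp, int force_disp32)
{
    assert(dst < 8 && base < 8 && base != 4);

    /* disp8 is only possible when the offset fits in a signed byte. */
    int use_disp8 = !force_disp32 && disp >= -128 && disp <= 127;
    size_t n = 0;

    out[n++] = 0x48;  /* REX.W: 64-bit operand size */
    out[n++] = 0x8b;  /* MOV r64, r/m64 */
    /* ModRM: mod=01 (disp8) or mod=10 (disp32), reg=dst, rm=base */
    out[n++] = (uint8_t)((use_disp8 ? 0x40 : 0x80) | (dst << 3) | base);

    if (use_disp8) {
        out[n++] = (uint8_t)disp;
    } else {
        uint32_t d = (uint32_t)disp;
        for (int i = 0; i < 4; ++i)
            out[n++] = (uint8_t)(d >> (8 * i));  /* little-endian */
    }
    return n;
}
```

With dst = 0 (%rax) and base = 6 (%rsi), an offset of 0x30 gives the 4-byte form 48 8b 46 30, while forcing the long form gives the 7-byte 48 8b 86 30 00 00 00.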
I am looking for a way to force gas to generate the
disp32 variant for the first two instructions:
0: 48 8b 86 30 00 00 00 mov 0x30(%rsi),%rax
7: 48 8b 86 60 00 00 00 mov 0x60(%rsi),%rax
e: 48 8b 86 90 00 00 00 mov 0x90(%rsi),%rax
15: 48 8b 86 c0 00 00 00 mov 0xc0(%rsi),%rax
Is there some magical syntax to say "s32 not s8" ?
Regards
* Re: [GNU as / amd64] Forcing disp32 offset for MODRM
2024-07-11 22:50 [GNU as / amd64] Forcing disp32 offset for MODRM Mason
@ 2024-07-11 23:25 ` H.J. Lu
2024-07-12 1:00 ` Mason
0 siblings, 1 reply; 3+ messages in thread
From: H.J. Lu @ 2024-07-11 23:25 UTC (permalink / raw)
To: Mason; +Cc: Binutils
On Fri, Jul 12, 2024, 6:50 AM Mason <slash.tmp@free.fr> wrote:
> Is there some magical syntax to say "s32 not s8" ?
>
Try the {disp32} ({d32}?) pseudo prefix.
> Regards
>
H.J.
* Re: [GNU as / amd64] Forcing disp32 offset for MODRM
2024-07-11 23:25 ` H.J. Lu
@ 2024-07-12 1:00 ` Mason
0 siblings, 0 replies; 3+ messages in thread
From: Mason @ 2024-07-12 1:00 UTC (permalink / raw)
To: H.J. Lu; +Cc: Binutils
On 12/07/2024 01:25, H.J. Lu wrote:
> On Fri, Jul 12, 2024, 6:50 AM Mason <slash.tmp@free.fr> wrote:
> Is there some magical syntax to say "s32 not s8" ?
>
>
> Try the {disp32} ({d32}?) pseudo prefix.
Yes, you are the man/woman/boss!
https://stackoverflow.com/questions/47673177/how-do-gnu-assembler-x86-instruction-suffixes-like-s-in-mov-s-work
https://sourceware.org/binutils/docs/as/i386_002dMnemonics.html
{disp32} movq 48*1(%rsi), %rax
{disp32} movq 48*2(%rsi), %rax
{disp32} movq 48*3(%rsi), %rax
{disp32} movq 48*4(%rsi), %rax
is assembled into
0: 48 8b 86 30 00 00 00 mov 0x30(%rsi),%rax
7: 48 8b 86 60 00 00 00 mov 0x60(%rsi),%rax
e: 48 8b 86 90 00 00 00 mov 0x90(%rsi),%rax
15: 48 8b 86 c0 00 00 00 mov 0xc0(%rsi),%rax
I had a crazy idea that has probably been proposed many
times already. It sounds good on paper, but apparently
fails because of branch (mis)prediction.
I have a simple loop:
for (i = 0; i < n; ++i) foo(i);
with n known only at run-time.
In assembly I am doing a Duff's device of sorts, with a computed
goto for the loop tail.
//unroll 64 times
L1: foo(0) .. foo(63)
loop_count -= 64;
if (loop_count >= 64) goto L1;
// otherwise adjust pointers and jump in the middle of the loop
// compute correct jump offset
jmp *%rax
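For readers who prefer C, the shape of this scheme can be sketched roughly as below. This is an illustration under assumptions, not the assembly from this thread: the unroll factor is 4 rather than 64 to keep it short, foo is a hypothetical stand-in that only counts its calls, and the switch plays the role of the computed jump (a compiler is free to lower it to a jump table, a branch chain, or something else).

```c
#include <stddef.h>

/* Hypothetical stand-in for foo(i): records how often it ran and on what. */
static size_t calls;
static size_t index_sum;
static void foo(size_t i) { calls++; index_sum += i; }

/* Calls foo(0) .. foo(n-1) with an unrolled main loop and a jump-in tail. */
void run_unrolled(size_t n)
{
    size_t i = 0;

    /* Main loop: full unrolled bodies while at least 4 iterations remain. */
    while (n - i >= 4) {
        foo(i); foo(i + 1); foo(i + 2); foo(i + 3);
        i += 4;
    }

    /* Tail: 0..3 iterations left. Enter the fall-through chain at the
       depth that runs exactly the remaining iterations. */
    switch (n - i) {
    case 3: foo(i++); /* fall through */
    case 2: foo(i++); /* fall through */
    case 1: foo(i++); /* fall through */
    default: break;
    }
}
```

The tail is a single data-dependent indirect branch, which is the kind of branch the misprediction numbers below are about.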
But the performance is disappointing.
Branch mispredicts grow by 50% compared with a simple loop
handling the tail :(
So my genius idea falls flat...
Will try increasing the unroll factor
(but I needed the {disp32} trick to make all
iterations the same size)
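To spell out why equal sizes matter: with every unrolled iteration taking the same number of bytes, the entry point for the tail is a plain multiplication. The sketch below is only my reading of the scheme; tail_entry_offset and tail_pointer_bias are hypothetical names, and the pointer adjustment assumes iteration k of the body addresses its data at a fixed multiple of a stride, as in the 48*k(%rsi) example.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset from the start of an unrolled body of `unroll` equally sized
   iterations at which to enter so that exactly `remaining` of them run. */
size_t tail_entry_offset(size_t unroll, size_t iter_size, size_t remaining)
{
    assert(remaining <= unroll);
    return (unroll - remaining) * iter_size;
}

/* Amount to subtract from the base pointer so that the first executed
   iteration, whose displacement is hard-coded for slot (unroll - remaining),
   touches the first remaining element (assumed addressing scheme). */
size_t tail_pointer_bias(size_t unroll, size_t stride, size_t remaining)
{
    assert(remaining <= unroll);
    return (unroll - remaining) * stride;
}
```

For example, with 64 iterations of 7 bytes each and one element left, the entry offset is 63 * 7 = 441 bytes. If the first two iterations used the 4-byte disp8 form instead, the offsets would no longer be a simple product and a lookup table would be needed.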
Open to suggestions.
Regards