public inbox for binutils@sourceware.org
* [GNU as / amd64] Forcing disp32 offset for MODRM
From: Mason @ 2024-07-11 22:50 UTC (permalink / raw)
  To: binutils

Hello everyone,

Hope this is an appropriate place for my question.
If not, I'd appreciate being pointed in the right
direction. Please CC me on answers.

I am using GNU as on an amd64 target.

My example code:

	movq 48*1(%rsi), %rax
	movq 48*2(%rsi), %rax
	movq 48*3(%rsi), %rax
	movq 48*4(%rsi), %rax

which is assembled to

   0:	48 8b 46 30          	mov    0x30(%rsi),%rax
   4:	48 8b 46 60          	mov    0x60(%rsi),%rax
   8:	48 8b 86 90 00 00 00 	mov    0x90(%rsi),%rax
   f:	48 8b 86 c0 00 00 00 	mov    0xc0(%rsi),%rax

Since offsets 0x30 and 0x60 are less than 0x80,
they fit in an s8, thus gas logically uses disp8.
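
The size rule described above can be sketched in C. This is only an illustration that assumes the assembler always picks the shortest displacement form; the function names are made up for this example and are not part of gas.

```c
#include <assert.h>
#include <stdint.h>

/* Displacement bytes chosen by default for disp(%base) addressing,
   assuming the shortest encoding always wins.
   base_needs_disp is nonzero for %rbp/%r13, which cannot be encoded
   as a plain base register without a displacement. */
int default_disp_bytes(int32_t disp, int base_needs_disp)
{
    if (disp == 0 && !base_needs_disp)
        return 0;   /* ModRM.mod = 00: no displacement */
    if (disp >= -128 && disp <= 127)
        return 1;   /* ModRM.mod = 01: disp8, sign-extended */
    return 4;       /* ModRM.mod = 10: disp32 */
}

/* Length of `movq disp(%rsi), %rax`:
   REX.W prefix + opcode + ModRM byte + displacement. */
int mov_rsi_rax_length(int32_t disp)
{
    return 3 + default_disp_bytes(disp, 0);
}

/* Usage: reproduces the instruction sizes in the listing above. */
void check_lengths(void)
{
    assert(mov_rsi_rax_length(48 * 1) == 4);   /* 0x30 fits in s8 */
    assert(mov_rsi_rax_length(48 * 2) == 4);   /* 0x60 fits in s8 */
    assert(mov_rsi_rax_length(48 * 3) == 7);   /* 0x90 does not */
    assert(mov_rsi_rax_length(48 * 4) == 7);   /* 0xc0 does not */
}
```

The 4/4/7/7 pattern is what makes the unrolled iterations differ in size.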

I am looking for a way to force gas to generate the
disp32 variant for the first two instructions:

   0:	48 8b 86 30 00 00 00	mov    0x30(%rsi),%rax
   4:	48 8b 86 60 00 00 00	mov    0x60(%rsi),%rax
   8:	48 8b 86 90 00 00 00	mov    0x90(%rsi),%rax
   f:	48 8b 86 c0 00 00 00	mov    0xc0(%rsi),%rax

Is there some magical syntax to say "s32 not s8" ?

Regards


* Re: [GNU as / amd64] Forcing disp32 offset for MODRM
From: H.J. Lu @ 2024-07-11 23:25 UTC (permalink / raw)
  To: Mason; +Cc: Binutils

On Fri, Jul 12, 2024, 6:50 AM Mason <slash.tmp@free.fr> wrote:

> Is there some magical syntax to say "s32 not s8" ?
>

Try the {disp32} (or {d32}?) pseudo prefix.


> Regards

H.J.


* Re: [GNU as / amd64] Forcing disp32 offset for MODRM
From: Mason @ 2024-07-12  1:00 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

On 12/07/2024 01:25, H.J. Lu wrote:

> On Fri, Jul 12, 2024, 6:50 AM Mason <slash.tmp@free.fr> wrote:
> 
>     Is there some magical syntax to say "s32 not s8" ?
> 
> 
> Try {disp32} ({d32})? pseudo prefix.

Yes, you are the man/woman/boss!

https://stackoverflow.com/questions/47673177/how-do-gnu-assembler-x86-instruction-suffixes-like-s-in-mov-s-work
https://sourceware.org/binutils/docs/as/i386_002dMnemonics.html

	{disp32} movq 48*1(%rsi), %rax
	{disp32} movq 48*2(%rsi), %rax
	{disp32} movq 48*3(%rsi), %rax
	{disp32} movq 48*4(%rsi), %rax

is assembled into

   0:	48 8b 86 30 00 00 00 	mov    0x30(%rsi),%rax
   7:	48 8b 86 60 00 00 00 	mov    0x60(%rsi),%rax
   e:	48 8b 86 90 00 00 00 	mov    0x90(%rsi),%rax
  15:	48 8b 86 c0 00 00 00 	mov    0xc0(%rsi),%rax
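
For reference, the two encodings differ only in the ModRM byte and the displacement width. Below is a minimal C sketch of that difference for this one instruction; encode_mov_rsi_rax and its force_disp32 flag are hypothetical names used to mimic the effect of {disp32}, not anything taken from gas itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode `movq disp(%rsi), %rax` into out[] and return the byte count.
   force_disp32 is an illustrative stand-in for the {disp32} prefix. */
size_t encode_mov_rsi_rax(int32_t disp, int force_disp32, uint8_t out[7])
{
    size_t n = 0;

    out[n++] = 0x48;                      /* REX.W: 64-bit operand size */
    out[n++] = 0x8b;                      /* MOV r64, r/m64 */

    if (!force_disp32 && disp == 0) {
        out[n++] = 0x06;                  /* mod=00 reg=rax rm=rsi */
    } else if (!force_disp32 && disp >= -128 && disp <= 127) {
        out[n++] = 0x46;                  /* mod=01 reg=rax rm=rsi */
        out[n++] = (uint8_t)disp;         /* disp8 */
    } else {
        uint32_t u = (uint32_t)disp;
        out[n++] = 0x86;                  /* mod=10 reg=rax rm=rsi */
        for (int i = 0; i < 4; i++)       /* disp32, little-endian */
            out[n++] = (uint8_t)(u >> (8 * i));
    }
    return n;
}

/* Usage: compare against the bytes shown in the two listings. */
void check_encode(void)
{
    static const uint8_t short_form[] = { 0x48, 0x8b, 0x46, 0x30 };
    static const uint8_t long_form[]  = { 0x48, 0x8b, 0x86, 0x30,
                                          0x00, 0x00, 0x00 };
    uint8_t buf[7];

    assert(encode_mov_rsi_rax(0x30, 0, buf) == 4);
    assert(memcmp(buf, short_form, sizeof short_form) == 0);
    assert(encode_mov_rsi_rax(0x30, 1, buf) == 7);
    assert(memcmp(buf, long_form, sizeof long_form) == 0);
}
```

With the flag set, every displacement from 0x30 to 0xc0 yields a 7-byte instruction, which matches the offsets 0, 7, e, 15 above.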


I had a crazy idea that has probably been proposed many
times already. It sounds good on paper, but apparently
fails because of branch (mis)prediction.


I have a simple loop:

	for (i = 0; i < n; ++i) foo(i);

with n known only at run-time.

In assembly I am doing a Duff's device of sorts, with a computed
goto for the loop tail.

	//unroll 64 times
L1:	foo(0) .. foo(63)
	loop_count -= 64;
	if (loop_count >= 64) goto L1;
	// otherwise adjust pointers and jump in the middle of the loop
	// compute correct jump offset
	jmp *%rax
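
For readers who have not met the technique, here is a portable C sketch of the same idea, unrolled 8 times instead of 64 to keep it short. It is the classic switch-based form, so the partial block runs first rather than last, which differs from the assembly outlined above; foo and the two counters are placeholders for the real loop body.

```c
#include <assert.h>

/* Hypothetical loop body: it only records how it was called. */
static long calls;      /* number of foo() invocations */
static long index_sum;  /* sum of all indices passed to foo() */

static void foo(long i)
{
    calls++;
    index_sum += i;
}

/* Call foo(0) .. foo(n-1) with the body unrolled 8 times.
   The switch jumps into the middle of the unrolled block, so the
   partial block runs first and every later pass runs all 8 copies. */
void run_unrolled(long n)
{
    long i = 0;
    long passes = (n + 7) / 8;   /* number of trips through the block */

    if (n <= 0)
        return;

    switch (n % 8) {
    case 0: do { foo(i++);
    case 7:      foo(i++);
    case 6:      foo(i++);
    case 5:      foo(i++);
    case 4:      foo(i++);
    case 3:      foo(i++);
    case 2:      foo(i++);
    case 1:      foo(i++);
            } while (--passes > 0);
    }
}

/* Usage: every n must produce exactly n calls, with indices 0 .. n-1. */
void check_run_unrolled(void)
{
    for (long n = 0; n <= 200; n++) {
        calls = 0;
        index_sum = 0;
        run_unrolled(n);
        assert(calls == n);
        assert(index_sum == n * (n - 1) / 2);
    }
}
```

A compiler typically turns the switch into an indirect jump through a table, which is the same kind of branch as the `jmp *%rax` above.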

But the performance is disappointing.

Branch mispredicts are up 50% compared with using
a simple loop to handle the tail :(

So my genius idea falls flat...

Will try increasing the unroll factor
(but I needed the {disp32} trick to make all
iterations the same size)
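
To spell out why equal sizes matter: once every unrolled iteration occupies the same number of bytes, the entry point for the tail is a simple multiplication. The C sketch below shows that arithmetic under assumed values (a single 7-byte mov per iteration and a 48-byte data stride); the struct and function names are invented for this example, and a real loop body would be larger.

```c
#include <assert.h>
#include <stddef.h>

/* Where to enter the unrolled block, and how to bias the data pointer,
   so that only the last `remaining` iterations execute. */
struct tail_entry {
    size_t    code_offset;  /* bytes from the start of the unrolled block */
    ptrdiff_t base_adjust;  /* bytes to add to the data pointer (<= 0) */
};

struct tail_entry tail_entry_for(size_t unroll, size_t iter_bytes,
                                 size_t stride, size_t remaining)
{
    assert(remaining <= unroll);

    size_t skipped = unroll - remaining;  /* iterations jumped over */
    struct tail_entry e;

    /* Valid only if all iterations have the same encoded size. */
    e.code_offset = skipped * iter_bytes;
    /* Iteration k reads at k*stride from the base, so move the base
       back to make iteration `skipped` touch the first element. */
    e.base_adjust = -(ptrdiff_t)(skipped * stride);
    return e;
}

/* Usage with assumed numbers: 64-way unroll, 7-byte iterations,
   48-byte stride, 5 iterations left over. */
void check_tail_entry(void)
{
    struct tail_entry e = tail_entry_for(64, 7, 48, 5);

    assert(e.code_offset == 59 * 7);      /* 413 bytes into the block */
    assert(e.base_adjust == -(59 * 48));  /* -2832 bytes */
}
```

With mixed disp8 and disp32 encodings the code offset would need a lookup table instead of a multiplication.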

Open to suggestions.

Regards


