public inbox for libc-alpha@sourceware.org
* [PATCH v1] x86-64: Replace `%ah` write with `%eax` read
@ 2023-03-10  2:44 Noah Goldstein
  2023-03-10 16:38 ` H.J. Lu
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Noah Goldstein @ 2023-03-10  2:44 UTC (permalink / raw)
  To: libc-alpha; +Cc: goldstein.w.n, hjl.tools, carlos

High8 partial registers can incur a stall when being modified (if not
renamed separately), or at the very least incur extra backend uops (if
renamed separately). Either way `testl $0x0400, %eax` is preferable to
`andb $0x04, %ah`.

Function size is unchanged when accounting for 16-byte padding.
---
 sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
index d754668bce..d45f984e1a 100644
--- a/sysdeps/x86_64/fpu/e_fmodl.S
+++ b/sysdeps/x86_64/fpu/e_fmodl.S
@@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
 	fldt	8(%rsp)
 1:	fprem
 	fstsw	%ax
-	and	$04,%ah
+	testl	$0x400,%eax
 	jnz	1b
 	fstp	%st(1)
 	ret
-- 
2.34.1

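The rewrite in the patch above relies on `fprem` reporting an incomplete reduction through the C2 flag, which is bit 10 of the x87 status word that `fstsw %ax` copies into `%ax`. The following is a minimal C sketch, not part of the patch, of why masking `%eax` with `0x400` selects the same bit as masking `%ah` with `0x04`. The names `X87_SW_C2`, `c2_set_via_eax`, `c2_set_via_ah`, and `check_equivalence` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* C2 is bit 10 of the x87 status word (hypothetical macro name). */
#define X87_SW_C2 0x0400u

/* Models `testl $0x400,%eax`: mask the full 32-bit register. */
static int c2_set_via_eax(uint32_t eax)
{
    return (eax & X87_SW_C2) != 0;
}

/* Models `testb $0x4,%ah`: %ah is bits 8..15 of %eax, so bit 10
   of the register is bit 2 of %ah. */
static int c2_set_via_ah(uint32_t eax)
{
    uint8_t ah = (uint8_t)(eax >> 8);
    return (ah & 0x04u) != 0;
}

/* Checks every 16-bit status word; the upper half of %eax is filled
   with arbitrary bits to show that it does not affect the result. */
static void check_equivalence(void)
{
    for (uint32_t sw = 0; sw <= 0xFFFFu; sw++) {
        uint32_t eax = 0xABCD0000u | sw;
        assert(c2_set_via_eax(eax) == c2_set_via_ah(eax));
    }
}
```

Calling `check_equivalence()` exercises all status-word values. The sketch only covers the logical result of the test; it says nothing about the microarchitectural cost of either form.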


* Re: [PATCH v1] x86-64: Replace `%ah` write with `%eax` read
  2023-03-10  2:44 [PATCH v1] x86-64: Replace `%ah` write with `%eax` read Noah Goldstein
@ 2023-03-10 16:38 ` H.J. Lu
  2023-03-13  8:03 ` Florian Weimer
  2023-03-13 20:50 ` [PATCH v2] x86-64: Replace `and %ah` write with `test %ah` read Noah Goldstein
  2 siblings, 0 replies; 8+ messages in thread
From: H.J. Lu @ 2023-03-10 16:38 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: libc-alpha, carlos

On Thu, Mar 9, 2023 at 6:44 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> High8 partial registers can incur a stall when being modified (if not
> renamed separately), or at the very least incur extra backend uops (if
> renamed separately). Either way `testl $0x0400, %eax` is preferable to
> `andb $0x04, %ah`.
>
> Function size is unchanged when accounting for 16-byte padding.
> ---
>  sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
> index d754668bce..d45f984e1a 100644
> --- a/sysdeps/x86_64/fpu/e_fmodl.S
> +++ b/sysdeps/x86_64/fpu/e_fmodl.S
> @@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
>         fldt    8(%rsp)
>  1:     fprem
>         fstsw   %ax
> -       and     $04,%ah
> +       testl   $0x400,%eax
>         jnz     1b
>         fstp    %st(1)
>         ret
> --
> 2.34.1
>

OK.

Thanks.

-- 
H.J.


* Re: [PATCH v1] x86-64: Replace `%ah` write with `%eax` read
  2023-03-10  2:44 [PATCH v1] x86-64: Replace `%ah` write with `%eax` read Noah Goldstein
  2023-03-10 16:38 ` H.J. Lu
@ 2023-03-13  8:03 ` Florian Weimer
  2023-03-13 16:59   ` Noah Goldstein
  2023-03-13 20:50 ` [PATCH v2] x86-64: Replace `and %ah` write with `test %ah` read Noah Goldstein
  2 siblings, 1 reply; 8+ messages in thread
From: Florian Weimer @ 2023-03-13  8:03 UTC (permalink / raw)
  To: Noah Goldstein via Libc-alpha; +Cc: Noah Goldstein, hjl.tools, carlos

* Noah Goldstein via Libc-alpha:

> High8 partial registers can incur a stall when being modified (if not
> renamed separately), or at the very least incur extra backend uops (if
> renamed separately). Either way `testl $0x0400, %eax` is preferable to
> `andb $0x04, %ah`.
>
> Function size is unchanged when accounting for 16-byte padding.
> ---
>  sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
> index d754668bce..d45f984e1a 100644
> --- a/sysdeps/x86_64/fpu/e_fmodl.S
> +++ b/sysdeps/x86_64/fpu/e_fmodl.S
> @@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
>  	fldt	8(%rsp)
>  1:	fprem
>  	fstsw	%ax
> -	and	$04,%ah
> +	testl	$0x400,%eax

Why not test $0x400,%ax or test $04,%ah?

Thanks,
Florian



* Re: [PATCH v1] x86-64: Replace `%ah` write with `%eax` read
  2023-03-13  8:03 ` Florian Weimer
@ 2023-03-13 16:59   ` Noah Goldstein
  2023-03-13 17:30     ` Florian Weimer
  0 siblings, 1 reply; 8+ messages in thread
From: Noah Goldstein @ 2023-03-13 16:59 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Noah Goldstein via Libc-alpha, hjl.tools, carlos

On Mon, Mar 13, 2023 at 3:03 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Noah Goldstein via Libc-alpha:
>
> > High8 partial registers can incur a stall when being modified (if not
> > renamed separately), or at the very least incur extra backend uops (if
> > renamed separately). Either way `testl $0x0400, %eax` is preferable to
> > `andb $0x04, %ah`.
> >
> > Function size is unchanged when accounting for 16-byte padding.
> > ---
> >  sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
> > index d754668bce..d45f984e1a 100644
> > --- a/sysdeps/x86_64/fpu/e_fmodl.S
> > +++ b/sysdeps/x86_64/fpu/e_fmodl.S
> > @@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
> >       fldt    8(%rsp)
> >  1:   fprem
> >       fstsw   %ax
> > -     and     $04,%ah
> > +     testl   $0x400,%eax
>
> Why not test $0x400,%ax or test $04,%ah?
`test $0x400,%ax` uses an imm16, and the `0x66` operand-size prefix it
needs can cause length-changing-prefix (LCP) stalls.
`test $0x4,%ah` is better, but partial-register usage has several
delays associated with it (even for pure reads). It depends on the
microarchitecture, but Haswell/Skylake, for example, add 2c of latency
(in this case, where %ah is not being renamed separately).
In general, if you don't need the code size, it is best to stick with
32/64-bit instructions.
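
To make the encoding trade-off concrete, here is a rough C sketch listing the machine-code bytes of the four candidate instructions. The byte sequences are what the Intel SDM opcode tables give for these forms, assuming the assembler picks the short accumulator encodings for `test` on `%ax`/`%eax`; they were not taken from the glibc build. The array names and the helper `has_opsize_prefix` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* andb $0x4,%ah: 80 /4 ib, ModRM 0xE4 selects %ah */
static const uint8_t andb_ah[] = { 0x80, 0xE4, 0x04 };

/* testb $0x4,%ah: F6 /0 ib, ModRM 0xC4 selects %ah */
static const uint8_t testb_ah[] = { 0xF6, 0xC4, 0x04 };

/* testw $0x400,%ax: 0x66 prefix, then A9 with a 2-byte immediate */
static const uint8_t testw_ax[] = { 0x66, 0xA9, 0x00, 0x04 };

/* testl $0x400,%eax: A9 with a 4-byte immediate */
static const uint8_t testl_eax[] = { 0xA9, 0x00, 0x04, 0x00, 0x00 };

/* Returns 1 if the encoding starts with the 0x66 operand-size prefix.
   In the testw form that prefix shrinks the immediate from 4 bytes to
   2, which is what makes it a length-changing prefix. */
static int has_opsize_prefix(const uint8_t *enc)
{
    return enc[0] == 0x66;
}
```

Under these assumptions the `%ah` forms are 3 bytes each, the `%ax` form is 4 bytes and is the only one carrying the `0x66` prefix, and the `%eax` form is 5 bytes, i.e. two bytes longer than the original `andb`.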

>
> Thanks,
> Florian
>


* Re: [PATCH v1] x86-64: Replace `%ah` write with `%eax` read
  2023-03-13 16:59   ` Noah Goldstein
@ 2023-03-13 17:30     ` Florian Weimer
  2023-03-13 20:49       ` Noah Goldstein
  0 siblings, 1 reply; 8+ messages in thread
From: Florian Weimer @ 2023-03-13 17:30 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: Noah Goldstein via Libc-alpha, hjl.tools, carlos

* Noah Goldstein:

> On Mon, Mar 13, 2023 at 3:03 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * Noah Goldstein via Libc-alpha:
>>
>> > High8 partial registers can incur a stall when being modified (if not
>> > renamed separately), or at the very least incur extra backend uops (if
>> > renamed separately). Either way `testl $0x0400, %eax` is preferable to
>> > `andb $0x04, %ah`.
>> >
>> > Function size is unchanged when accounting for 16-byte padding.
>> > ---
>> >  sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
>> > index d754668bce..d45f984e1a 100644
>> > --- a/sysdeps/x86_64/fpu/e_fmodl.S
>> > +++ b/sysdeps/x86_64/fpu/e_fmodl.S
>> > @@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
>> >       fldt    8(%rsp)
>> >  1:   fprem
>> >       fstsw   %ax
>> > -     and     $04,%ah
>> > +     testl   $0x400,%eax
>>
>> Why not test $0x400,%ax or test $04,%ah?
> `test $0x400,%ax` uses imm16 which can cause length-changing-prefix
> (`0x66` in the opcode) stalls.
> `test $0x4,%ah` is more okay, but partial register usage has several
> delays associated with it (even pure
> reads), depends on arch but for example hwl/skl have 2c latency added
> (in this case where %ah is not
> being renamed separately).
> In general, if you don't need the code size, best to stick with
> 32/64-bit instructions.

Do we need to clear %eax first to avoid a false dependency?

Thanks,
Florian



* Re: [PATCH v1] x86-64: Replace `%ah` write with `%eax` read
  2023-03-13 17:30     ` Florian Weimer
@ 2023-03-13 20:49       ` Noah Goldstein
  0 siblings, 0 replies; 8+ messages in thread
From: Noah Goldstein @ 2023-03-13 20:49 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Noah Goldstein via Libc-alpha, hjl.tools, carlos

On Mon, Mar 13, 2023 at 12:30 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Noah Goldstein:
>
> > On Mon, Mar 13, 2023 at 3:03 AM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * Noah Goldstein via Libc-alpha:
> >>
> >> > High8 partial registers can incur a stall when being modified (if not
> >> > renamed separately), or at the very least incur extra backend uops (if
> >> > renamed separately). Either way `testl $0x0400, %eax` is preferable to
> >> > `andb $0x04, %ah`.
> >> >
> >> > Function size is unchanged when accounting for 16-byte padding.
> >> > ---
> >> >  sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >> >
> >> > diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
> >> > index d754668bce..d45f984e1a 100644
> >> > --- a/sysdeps/x86_64/fpu/e_fmodl.S
> >> > +++ b/sysdeps/x86_64/fpu/e_fmodl.S
> >> > @@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
> >> >       fldt    8(%rsp)
> >> >  1:   fprem
> >> >       fstsw   %ax
> >> > -     and     $04,%ah
> >> > +     testl   $0x400,%eax
> >>
> >> Why not test $0x400,%ax or test $04,%ah?
> > `test $0x400,%ax` uses imm16 which can cause length-changing-prefix
> > (`0x66` in the opcode) stalls.
> > `test $0x4,%ah` is more okay, but partial register usage has several
> > delays associated with it (even pure
> > reads), depends on arch but for example hwl/skl have 2c latency added
> > (in this case where %ah is not
> > being renamed separately).
> > In general, if you don't need the code size, best to stick with
> > 32/64-bit instructions.
>
> Do we need to clear %eax first to avoid a false dependency?

Oh yeah, guess you're right; `test %ah` is probably best.
>
> Thanks,
> Florian
>


* [PATCH v2] x86-64: Replace `and %ah` write with `test %ah` read
  2023-03-10  2:44 [PATCH v1] x86-64: Replace `%ah` write with `%eax` read Noah Goldstein
  2023-03-10 16:38 ` H.J. Lu
  2023-03-13  8:03 ` Florian Weimer
@ 2023-03-13 20:50 ` Noah Goldstein
  2023-03-14  2:00   ` H.J. Lu
  2 siblings, 1 reply; 8+ messages in thread
From: Noah Goldstein @ 2023-03-13 20:50 UTC (permalink / raw)
  To: libc-alpha; +Cc: goldstein.w.n, hjl.tools, carlos

High8 partial registers can incur a stall when being modified (if not
renamed separately), or at the very least incur extra backend uops (if
renamed separately). Either way read-only `testb $0x4, %ah` is preferable
to `andb $0x4, %ah`.

Function size is unchanged when accounting for 16-byte padding.
---
 sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
index d754668bce..e9a76178f9 100644
--- a/sysdeps/x86_64/fpu/e_fmodl.S
+++ b/sysdeps/x86_64/fpu/e_fmodl.S
@@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
 	fldt	8(%rsp)
 1:	fprem
 	fstsw	%ax
-	and	$04,%ah
+	testb	$0x4,%ah
 	jnz	1b
 	fstp	%st(1)
 	ret
-- 
2.34.1



* Re: [PATCH v2] x86-64: Replace `and %ah` write with `test %ah` read
  2023-03-13 20:50 ` [PATCH v2] x86-64: Replace `and %ah` write with `test %ah` read Noah Goldstein
@ 2023-03-14  2:00   ` H.J. Lu
  0 siblings, 0 replies; 8+ messages in thread
From: H.J. Lu @ 2023-03-14  2:00 UTC (permalink / raw)
  To: Noah Goldstein; +Cc: libc-alpha, carlos

On Mon, Mar 13, 2023 at 1:51 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> High8 partial registers can incur a stall when being modified (if not
> renamed separately), or at the very least incur extra backend uops (if
> renamed separately). Either way read-only `testb $0x4, %ah` is preferable
> to `andb $0x4, %ah`.
>
> Function size is unchanged when accounting for 16-byte padding.
> ---
>  sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
> index d754668bce..e9a76178f9 100644
> --- a/sysdeps/x86_64/fpu/e_fmodl.S
> +++ b/sysdeps/x86_64/fpu/e_fmodl.S
> @@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
>         fldt    8(%rsp)
>  1:     fprem
>         fstsw   %ax
> -       and     $04,%ah
> +       testb   $0x4,%ah
>         jnz     1b
>         fstp    %st(1)
>         ret
> --
> 2.34.1
>

OK.

Thanks.

-- 
H.J.
