* [PATCH v1] x86-64: Replace `%ah` write with `%eax` read
@ 2023-03-10 2:44 Noah Goldstein
2023-03-10 16:38 ` H.J. Lu
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Noah Goldstein @ 2023-03-10 2:44 UTC (permalink / raw)
To: libc-alpha; +Cc: goldstein.w.n, hjl.tools, carlos
Modifying a high8 partial register can incur a stall (if it is not
renamed separately) or, at the very least, extra backend uops (if it is
renamed separately). Either way, `testl $0x0400, %eax` is preferable to
`andb $0x04, %ah`.
Function size is unchanged when accounting for 16-byte padding.
---
sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
index d754668bce..d45f984e1a 100644
--- a/sysdeps/x86_64/fpu/e_fmodl.S
+++ b/sysdeps/x86_64/fpu/e_fmodl.S
@@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
         fldt    8(%rsp)
 1:      fprem
         fstsw   %ax
-        and     $04,%ah
+        testl   $0x400,%eax
         jnz     1b
         fstp    %st(1)
         ret
--
2.34.1
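For reference, the equivalence of the two masks in the patch above can be checked with a small C sketch. It assumes only that `fstsw %ax` leaves the x87 status word in `%ax`, so that `%ah` holds bits 8 to 15 and the C2 flag (bit 10, which `fprem` keeps set while the reduction is incomplete) shows up as `0x04` in `%ah` and as `0x0400` in `%ax`/`%eax`. The helper names are hypothetical and not taken from glibc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* C2 condition flag: bit 10 of the x87 status word. */
#define X87_SW_C2 0x0400u

/* The AH mask, shifted into its position within AX, must be the C2 bit. */
static_assert((0x04u << 8) == X87_SW_C2, "AH mask and AX mask must match");

/* Models `and $0x04, %ah` / `testb $0x4, %ah`: look only at the high byte. */
static bool c2_via_ah(uint16_t status_word)
{
    uint8_t ah = (uint8_t)(status_word >> 8);
    return (ah & 0x04u) != 0;
}

/* Models `testl $0x400, %eax`: look at the full register. */
static bool c2_via_eax(uint32_t eax)
{
    return (eax & X87_SW_C2) != 0;
}

/* Returns true if both forms agree for every possible 16-bit status word. */
static bool masks_agree(void)
{
    for (uint32_t sw = 0; sw <= 0xFFFFu; sw++) {
        if (c2_via_ah((uint16_t)sw) != c2_via_eax(sw))
            return false;
    }
    return true;
}
```

Both forms set ZF from the same bit, so the following `jnz 1b` behaves identically; the difference is only that `and` also writes `%ah`, while `test` discards the result.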
* Re: [PATCH v1] x86-64: Replace `%ah` write with `%eax` read
2023-03-10 2:44 [PATCH v1] x86-64: Replace `%ah` write with `%eax` read Noah Goldstein
@ 2023-03-10 16:38 ` H.J. Lu
2023-03-13 8:03 ` Florian Weimer
2023-03-13 20:50 ` [PATCH v2] x86-64: Replace `and %ah` write with `test %ah` read Noah Goldstein
2 siblings, 0 replies; 8+ messages in thread
From: H.J. Lu @ 2023-03-10 16:38 UTC (permalink / raw)
To: Noah Goldstein; +Cc: libc-alpha, carlos
On Thu, Mar 9, 2023 at 6:44 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> Modifying a high8 partial register can incur a stall (if it is not
> renamed separately) or, at the very least, extra backend uops (if it is
> renamed separately). Either way, `testl $0x0400, %eax` is preferable to
> `andb $0x04, %ah`.
>
> Function size is unchanged when accounting for 16-byte padding.
> ---
> sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
> index d754668bce..d45f984e1a 100644
> --- a/sysdeps/x86_64/fpu/e_fmodl.S
> +++ b/sysdeps/x86_64/fpu/e_fmodl.S
> @@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
> fldt 8(%rsp)
> 1: fprem
> fstsw %ax
> - and $04,%ah
> + testl $0x400,%eax
> jnz 1b
> fstp %st(1)
> ret
> --
> 2.34.1
>
OK.
Thanks.
--
H.J.
* Re: [PATCH v1] x86-64: Replace `%ah` write with `%eax` read
2023-03-10 2:44 [PATCH v1] x86-64: Replace `%ah` write with `%eax` read Noah Goldstein
2023-03-10 16:38 ` H.J. Lu
@ 2023-03-13 8:03 ` Florian Weimer
2023-03-13 16:59 ` Noah Goldstein
2023-03-13 20:50 ` [PATCH v2] x86-64: Replace `and %ah` write with `test %ah` read Noah Goldstein
2 siblings, 1 reply; 8+ messages in thread
From: Florian Weimer @ 2023-03-13 8:03 UTC (permalink / raw)
To: Noah Goldstein via Libc-alpha; +Cc: Noah Goldstein, hjl.tools, carlos
* Noah Goldstein via Libc-alpha:
> Modifying a high8 partial register can incur a stall (if it is not
> renamed separately) or, at the very least, extra backend uops (if it is
> renamed separately). Either way, `testl $0x0400, %eax` is preferable to
> `andb $0x04, %ah`.
>
> Function size is unchanged when accounting for 16-byte padding.
> ---
> sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
> index d754668bce..d45f984e1a 100644
> --- a/sysdeps/x86_64/fpu/e_fmodl.S
> +++ b/sysdeps/x86_64/fpu/e_fmodl.S
> @@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
> fldt 8(%rsp)
> 1: fprem
> fstsw %ax
> - and $04,%ah
> + testl $0x400,%eax
Why not test $0x400,%ax or test $04,%ah?
Thanks,
Florian
* Re: [PATCH v1] x86-64: Replace `%ah` write with `%eax` read
2023-03-13 8:03 ` Florian Weimer
@ 2023-03-13 16:59 ` Noah Goldstein
2023-03-13 17:30 ` Florian Weimer
0 siblings, 1 reply; 8+ messages in thread
From: Noah Goldstein @ 2023-03-13 16:59 UTC (permalink / raw)
To: Florian Weimer; +Cc: Noah Goldstein via Libc-alpha, hjl.tools, carlos
On Mon, Mar 13, 2023 at 3:03 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Noah Goldstein via Libc-alpha:
>
> > Modifying a high8 partial register can incur a stall (if it is not
> > renamed separately) or, at the very least, extra backend uops (if it is
> > renamed separately). Either way, `testl $0x0400, %eax` is preferable to
> > `andb $0x04, %ah`.
> >
> > Function size is unchanged when accounting for 16-byte padding.
> > ---
> > sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
> > index d754668bce..d45f984e1a 100644
> > --- a/sysdeps/x86_64/fpu/e_fmodl.S
> > +++ b/sysdeps/x86_64/fpu/e_fmodl.S
> > @@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
> > fldt 8(%rsp)
> > 1: fprem
> > fstsw %ax
> > - and $04,%ah
> > + testl $0x400,%eax
>
> Why not test $0x400,%ax or test $04,%ah?
`test $0x400,%ax` uses an imm16, which can cause length-changing-prefix
stalls (from the `0x66` operand-size prefix).
`test $0x4,%ah` is more acceptable, but partial-register usage has several
delays associated with it, even for pure reads. It depends on the
microarchitecture, but Haswell/Skylake, for example, add 2 cycles of
latency (in this case, where %ah is not being renamed separately).
In general, if you don't need the code-size savings, it is best to stick
with 32/64-bit instructions.
>
> Thanks,
> Florian
>
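To make the size trade-off concrete, the sketch below lists the machine-code bytes an assembler would be expected to emit for the four candidate instructions, and flags the one that pairs the `0x66` operand-size prefix with a 16-bit immediate (the length-changing-prefix case). The byte sequences follow the standard `TEST`/`AND` opcode encodings and assume the assembler picks the short accumulator forms; treat them as an illustration and confirm with `objdump -d`. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct encoding {
    const char *asm_text;  /* AT&T syntax, as written in the thread */
    uint8_t     bytes[5];  /* expected machine code (assumed short forms) */
    size_t      len;       /* number of valid bytes in `bytes` */
    size_t      imm_len;   /* size of the immediate operand in bytes */
};

static const struct encoding candidates[] = {
    { "and   $0x4,%ah",    { 0x80, 0xE4, 0x04 },             3, 1 },
    { "testb $0x4,%ah",    { 0xF6, 0xC4, 0x04 },             3, 1 },
    { "test  $0x400,%ax",  { 0x66, 0xA9, 0x00, 0x04 },       4, 2 },
    { "testl $0x400,%eax", { 0xA9, 0x00, 0x04, 0x00, 0x00 }, 5, 4 },
};

static_assert(sizeof(candidates) / sizeof(candidates[0]) == 4,
              "one entry per instruction discussed in the thread");

/* True when a 0x66 prefix shrinks the immediate from 4 bytes to 2,
   which is what changes the instruction length for the decoder. */
static bool has_length_changing_prefix(const struct encoding *e)
{
    return e->len > 0 && e->bytes[0] == 0x66 && e->imm_len == 2;
}
```

Only the `%ax` form carries the prefix. The `%ah` forms are 3 bytes and the `%eax` form is 5, a difference that the commit message says is absorbed by the 16-byte padding.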
* Re: [PATCH v1] x86-64: Replace `%ah` write with `%eax` read
2023-03-13 16:59 ` Noah Goldstein
@ 2023-03-13 17:30 ` Florian Weimer
2023-03-13 20:49 ` Noah Goldstein
0 siblings, 1 reply; 8+ messages in thread
From: Florian Weimer @ 2023-03-13 17:30 UTC (permalink / raw)
To: Noah Goldstein; +Cc: Noah Goldstein via Libc-alpha, hjl.tools, carlos
* Noah Goldstein:
> On Mon, Mar 13, 2023 at 3:03 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * Noah Goldstein via Libc-alpha:
>>
>> > Modifying a high8 partial register can incur a stall (if it is not
>> > renamed separately) or, at the very least, extra backend uops (if it is
>> > renamed separately). Either way, `testl $0x0400, %eax` is preferable to
>> > `andb $0x04, %ah`.
>> >
>> > Function size is unchanged when accounting for 16-byte padding.
>> > ---
>> > sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
>> > 1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
>> > index d754668bce..d45f984e1a 100644
>> > --- a/sysdeps/x86_64/fpu/e_fmodl.S
>> > +++ b/sysdeps/x86_64/fpu/e_fmodl.S
>> > @@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
>> > fldt 8(%rsp)
>> > 1: fprem
>> > fstsw %ax
>> > - and $04,%ah
>> > + testl $0x400,%eax
>>
>> Why not test $0x400,%ax or test $04,%ah?
> `test $0x400,%ax` uses an imm16, which can cause length-changing-prefix
> stalls (from the `0x66` operand-size prefix).
> `test $0x4,%ah` is more acceptable, but partial-register usage has several
> delays associated with it, even for pure reads. It depends on the
> microarchitecture, but Haswell/Skylake, for example, add 2 cycles of
> latency (in this case, where %ah is not being renamed separately).
> In general, if you don't need the code-size savings, it is best to stick
> with 32/64-bit instructions.
Do we need to clear %eax first to avoid a false dependency?
Thanks,
Florian
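The false-dependency concern raised above can be illustrated with a small C model of the architectural behaviour, under the assumption that `fstsw %ax` writes only the low 16 bits of `%eax` and leaves the upper 16 bits untouched. The tested bit never depends on the old upper half, but a 32-bit read such as `testl $0x400,%eax` still names the whole register, so a core that tracks `%ax` separately may have to merge in (and wait for) the previous `%eax` value. The function names are hypothetical, and the model says nothing about actual timing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models a 16-bit write to %ax: the upper half of %eax is preserved. */
static uint32_t write_ax(uint32_t old_eax, uint16_t ax)
{
    return (old_eax & 0xFFFF0000u) | ax;
}

/* Models `testl $0x400, %eax` executed right after `fstsw %ax`. */
static bool c2_after_fstsw(uint32_t old_eax, uint16_t status_word)
{
    uint32_t eax = write_ax(old_eax, status_word);
    return (eax & 0x0400u) != 0;
}

/* The result is the same whatever %eax held before; only the
   dependency on the old value (and hence timing) can differ. */
static void check_result_ignores_old_eax(void)
{
    assert(c2_after_fstsw(0x00000000u, 0x0400) ==
           c2_after_fstsw(0xFFFFFFFFu, 0x0400));
    assert(c2_after_fstsw(0x00000000u, 0x0000) ==
           c2_after_fstsw(0xFFFFFFFFu, 0x0000));
}
```

A read of `%ah` or `%ax` alone touches only bits that `fstsw %ax` has just written, which is why it sidesteps the question of what the upper half previously held.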
* Re: [PATCH v1] x86-64: Replace `%ah` write with `%eax` read
2023-03-13 17:30 ` Florian Weimer
@ 2023-03-13 20:49 ` Noah Goldstein
0 siblings, 0 replies; 8+ messages in thread
From: Noah Goldstein @ 2023-03-13 20:49 UTC (permalink / raw)
To: Florian Weimer; +Cc: Noah Goldstein via Libc-alpha, hjl.tools, carlos
On Mon, Mar 13, 2023 at 12:30 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Noah Goldstein:
>
> > On Mon, Mar 13, 2023 at 3:03 AM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * Noah Goldstein via Libc-alpha:
> >>
> >> > Modifying a high8 partial register can incur a stall (if it is not
> >> > renamed separately) or, at the very least, extra backend uops (if it is
> >> > renamed separately). Either way, `testl $0x0400, %eax` is preferable to
> >> > `andb $0x04, %ah`.
> >> >
> >> > Function size is unchanged when accounting for 16-byte padding.
> >> > ---
> >> > sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
> >> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >> >
> >> > diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
> >> > index d754668bce..d45f984e1a 100644
> >> > --- a/sysdeps/x86_64/fpu/e_fmodl.S
> >> > +++ b/sysdeps/x86_64/fpu/e_fmodl.S
> >> > @@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
> >> > fldt 8(%rsp)
> >> > 1: fprem
> >> > fstsw %ax
> >> > - and $04,%ah
> >> > + testl $0x400,%eax
> >>
> >> Why not test $0x400,%ax or test $04,%ah?
> > `test $0x400,%ax` uses an imm16, which can cause length-changing-prefix
> > stalls (from the `0x66` operand-size prefix).
> > `test $0x4,%ah` is more acceptable, but partial-register usage has several
> > delays associated with it, even for pure reads. It depends on the
> > microarchitecture, but Haswell/Skylake, for example, add 2 cycles of
> > latency (in this case, where %ah is not being renamed separately).
> > In general, if you don't need the code-size savings, it is best to stick
> > with 32/64-bit instructions.
>
> Do we need to clear %eax first to avoid a false dependency?
Oh yeah, I guess you're right; `test %ah` is probably best.
>
> Thanks,
> Florian
>
* [PATCH v2] x86-64: Replace `and %ah` write with `test %ah` read
2023-03-10 2:44 [PATCH v1] x86-64: Replace `%ah` write with `%eax` read Noah Goldstein
2023-03-10 16:38 ` H.J. Lu
2023-03-13 8:03 ` Florian Weimer
@ 2023-03-13 20:50 ` Noah Goldstein
2023-03-14 2:00 ` H.J. Lu
2 siblings, 1 reply; 8+ messages in thread
From: Noah Goldstein @ 2023-03-13 20:50 UTC (permalink / raw)
To: libc-alpha; +Cc: goldstein.w.n, hjl.tools, carlos
Modifying a high8 partial register can incur a stall (if it is not
renamed separately) or, at the very least, extra backend uops (if it is
renamed separately). Either way, the read-only `testb $0x4, %ah` is
preferable to `andb $0x4, %ah`.
Function size is unchanged when accounting for 16-byte padding.
---
sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
index d754668bce..e9a76178f9 100644
--- a/sysdeps/x86_64/fpu/e_fmodl.S
+++ b/sysdeps/x86_64/fpu/e_fmodl.S
@@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
         fldt    8(%rsp)
 1:      fprem
         fstsw   %ax
-        and     $04,%ah
+        testb   $0x4,%ah
         jnz     1b
         fstp    %st(1)
         ret
--
2.34.1
* Re: [PATCH v2] x86-64: Replace `and %ah` write with `test %ah` read
2023-03-13 20:50 ` [PATCH v2] x86-64: Replace `and %ah` write with `test %ah` read Noah Goldstein
@ 2023-03-14 2:00 ` H.J. Lu
0 siblings, 0 replies; 8+ messages in thread
From: H.J. Lu @ 2023-03-14 2:00 UTC (permalink / raw)
To: Noah Goldstein; +Cc: libc-alpha, carlos
On Mon, Mar 13, 2023 at 1:51 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> Modifying a high8 partial register can incur a stall (if it is not
> renamed separately) or, at the very least, extra backend uops (if it is
> renamed separately). Either way, the read-only `testb $0x4, %ah` is
> preferable to `andb $0x4, %ah`.
>
> Function size is unchanged when accounting for 16-byte padding.
> ---
> sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
> index d754668bce..e9a76178f9 100644
> --- a/sysdeps/x86_64/fpu/e_fmodl.S
> +++ b/sysdeps/x86_64/fpu/e_fmodl.S
> @@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
> fldt 8(%rsp)
> 1: fprem
> fstsw %ax
> - and $04,%ah
> + testb $0x4,%ah
> jnz 1b
> fstp %st(1)
> ret
> --
> 2.34.1
>
OK.
Thanks.
--
H.J.