* [PATCH v1] x86-64: Replace `%ah` write with `%eax` read
From: Noah Goldstein @ 2023-03-10  2:44 UTC
  To: libc-alpha; +Cc: goldstein.w.n, hjl.tools, carlos

High8 partial registers can incur a stall when being modified (if not
renamed separately), or at the very least incur extra backend uops (if
renamed separately). Either way `testl $0x0400, %eax` is preferable to
`andb $0x04, %ah`.

Function size is unchanged when accounting for 16-byte padding.
---
 sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
index d754668bce..d45f984e1a 100644
--- a/sysdeps/x86_64/fpu/e_fmodl.S
+++ b/sysdeps/x86_64/fpu/e_fmodl.S
@@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
 	fldt	8(%rsp)
 1:	fprem
 	fstsw	%ax
-	and	$04,%ah
+	testl	$0x400,%eax
 	jnz	1b
 	fstp	%st(1)
 	ret
-- 
2.34.1

* Re: [PATCH v1] x86-64: Replace `%ah` write with `%eax` read
From: H.J. Lu @ 2023-03-10 16:38 UTC
  To: Noah Goldstein; +Cc: libc-alpha, carlos

On Thu, Mar 9, 2023 at 6:44 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> High8 partial registers can incur a stall when being modified (if not
> renamed separately), or at the very least incur extra backend uops (if
> renamed separately). Either way `testl $0x0400, %eax` is preferable to
> `andb $0x04, %ah`.
>
> Function size is unchanged when accounting for 16-byte padding.
> ---
>  sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
> index d754668bce..d45f984e1a 100644
> --- a/sysdeps/x86_64/fpu/e_fmodl.S
> +++ b/sysdeps/x86_64/fpu/e_fmodl.S
> @@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
>  	fldt	8(%rsp)
>  1:	fprem
>  	fstsw	%ax
> -	and	$04,%ah
> +	testl	$0x400,%eax
>  	jnz	1b
>  	fstp	%st(1)
>  	ret
> --
> 2.34.1
>

OK.

Thanks.

-- 
H.J.

* Re: [PATCH v1] x86-64: Replace `%ah` write with `%eax` read
From: Florian Weimer @ 2023-03-13  8:03 UTC
  To: Noah Goldstein via Libc-alpha; +Cc: Noah Goldstein, hjl.tools, carlos

* Noah Goldstein via Libc-alpha:

> High8 partial registers can incur a stall when being modified (if not
> renamed separately), or at the very least incur extra backend uops (if
> renamed separately). Either way `testl $0x0400, %eax` is preferable to
> `andb $0x04, %ah`.
>
> Function size is unchanged when accounting for 16-byte padding.
> ---
>  sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
> index d754668bce..d45f984e1a 100644
> --- a/sysdeps/x86_64/fpu/e_fmodl.S
> +++ b/sysdeps/x86_64/fpu/e_fmodl.S
> @@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
>  	fldt	8(%rsp)
>  1:	fprem
>  	fstsw	%ax
> -	and	$04,%ah
> +	testl	$0x400,%eax

Why not test $0x400,%ax or test $04,%ah?

Thanks,
Florian

* Re: [PATCH v1] x86-64: Replace `%ah` write with `%eax` read
From: Noah Goldstein @ 2023-03-13 16:59 UTC
  To: Florian Weimer; +Cc: Noah Goldstein via Libc-alpha, hjl.tools, carlos

On Mon, Mar 13, 2023 at 3:03 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Noah Goldstein via Libc-alpha:
>
> > High8 partial registers can incur a stall when being modified (if not
> > renamed separately), or at the very least incur extra backend uops (if
> > renamed separately). Either way `testl $0x0400, %eax` is preferable to
> > `andb $0x04, %ah`.
> >
> > Function size is unchanged when accounting for 16-byte padding.
> > ---
> >  sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
> > index d754668bce..d45f984e1a 100644
> > --- a/sysdeps/x86_64/fpu/e_fmodl.S
> > +++ b/sysdeps/x86_64/fpu/e_fmodl.S
> > @@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
> >  	fldt	8(%rsp)
> >  1:	fprem
> >  	fstsw	%ax
> > -	and	$04,%ah
> > +	testl	$0x400,%eax
>
> Why not test $0x400,%ax or test $04,%ah?

`test $0x400,%ax` uses an imm16, which can cause length-changing-prefix
(LCP) stalls because of the `0x66` operand-size prefix.
`test $0x4,%ah` is more okay, but partial-register usage has several
delays associated with it, even for pure reads. The cost depends on the
arch, but for example hwl/skl add 2c of latency (in this case, where %ah
is not being renamed separately).
In general, if you don't need the code size, it is best to stick with
32/64-bit instructions.

>
> Thanks,
> Florian
>

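To make the encoding argument above concrete, the sketch below (Python, not from the thread) tabulates the machine-code bytes for the four candidate instructions and flags the one that carries the `0x66` operand-size prefix. The byte sequences are what one would expect an assembler such as GNU as to emit for these operands; they are written out by hand here and are worth confirming with objdump before relying on them. The prefix check is deliberately simplistic and only inspects the first byte.

```python
# Sketch only: hand-written encodings for the instructions discussed above.
# Confirm against a real assembler/disassembler before relying on them.

OPERAND_SIZE_PREFIX = 0x66

ENCODINGS = {
    # and r/m8, imm8:   80 /4 ib   (ModRM 0xE4 selects %ah)
    "andb $0x04, %ah": bytes([0x80, 0xE4, 0x04]),
    # test r/m8, imm8:  F6 /0 ib   (ModRM 0xC4 selects %ah)
    "testb $0x04, %ah": bytes([0xF6, 0xC4, 0x04]),
    # test %ax, imm16:  66 A9 iw   (prefix shrinks the immediate to 2 bytes)
    "testw $0x0400, %ax": bytes([0x66, 0xA9, 0x00, 0x04]),
    # test %eax, imm32: A9 id
    "testl $0x0400, %eax": bytes([0xA9, 0x00, 0x04, 0x00, 0x00]),
}


def has_operand_size_prefix(encoding: bytes) -> bool:
    """Simplistic check: does the encoding start with the 0x66 prefix?"""
    return encoding[0] == OPERAND_SIZE_PREFIX


if __name__ == "__main__":
    for text, encoding in ENCODINGS.items():
        note = "  <- 0x66 prefix + imm16" if has_operand_size_prefix(encoding) else ""
        print(f"{text:22} {encoding.hex(' '):16} {len(encoding)} bytes{note}")
```

Only the 16-bit form combines the `0x66` prefix with an immediate whose length the prefix changes, which is the pattern the reply above associates with LCP stalls. The 32-bit form avoids it at the cost of two extra bytes over the `%ah` forms.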
* Re: [PATCH v1] x86-64: Replace `%ah` write with `%eax` read
From: Florian Weimer @ 2023-03-13 17:30 UTC
  To: Noah Goldstein; +Cc: Noah Goldstein via Libc-alpha, hjl.tools, carlos

* Noah Goldstein:

> On Mon, Mar 13, 2023 at 3:03 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * Noah Goldstein via Libc-alpha:
>>
>> > High8 partial registers can incur a stall when being modified (if not
>> > renamed separately), or at the very least incur extra backend uops (if
>> > renamed separately). Either way `testl $0x0400, %eax` is preferable to
>> > `andb $0x04, %ah`.
>> >
>> > Function size is unchanged when accounting for 16-byte padding.
>> > ---
>> >  sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
>> > index d754668bce..d45f984e1a 100644
>> > --- a/sysdeps/x86_64/fpu/e_fmodl.S
>> > +++ b/sysdeps/x86_64/fpu/e_fmodl.S
>> > @@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
>> >  	fldt	8(%rsp)
>> >  1:	fprem
>> >  	fstsw	%ax
>> > -	and	$04,%ah
>> > +	testl	$0x400,%eax
>>
>> Why not test $0x400,%ax or test $04,%ah?
> `test $0x400,%ax` uses an imm16, which can cause length-changing-prefix
> (LCP) stalls because of the `0x66` operand-size prefix.
> `test $0x4,%ah` is more okay, but partial-register usage has several
> delays associated with it, even for pure reads. The cost depends on the
> arch, but for example hwl/skl add 2c of latency (in this case, where %ah
> is not being renamed separately).
> In general, if you don't need the code size, it is best to stick with
> 32/64-bit instructions.

Do we need to clear %eax first to avoid a false dependency?

Thanks,
Florian

* Re: [PATCH v1] x86-64: Replace `%ah` write with `%eax` read
From: Noah Goldstein @ 2023-03-13 20:49 UTC
  To: Florian Weimer; +Cc: Noah Goldstein via Libc-alpha, hjl.tools, carlos

On Mon, Mar 13, 2023 at 12:30 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Noah Goldstein:
>
> > On Mon, Mar 13, 2023 at 3:03 AM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * Noah Goldstein via Libc-alpha:
> >>
> >> > High8 partial registers can incur a stall when being modified (if not
> >> > renamed separately), or at the very least incur extra backend uops (if
> >> > renamed separately). Either way `testl $0x0400, %eax` is preferable to
> >> > `andb $0x04, %ah`.
> >> >
> >> > Function size is unchanged when accounting for 16-byte padding.
> >> > ---
> >> >  sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >> >
> >> > diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
> >> > index d754668bce..d45f984e1a 100644
> >> > --- a/sysdeps/x86_64/fpu/e_fmodl.S
> >> > +++ b/sysdeps/x86_64/fpu/e_fmodl.S
> >> > @@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
> >> >  	fldt	8(%rsp)
> >> >  1:	fprem
> >> >  	fstsw	%ax
> >> > -	and	$04,%ah
> >> > +	testl	$0x400,%eax
> >>
> >> Why not test $0x400,%ax or test $04,%ah?
> > `test $0x400,%ax` uses an imm16, which can cause length-changing-prefix
> > (LCP) stalls because of the `0x66` operand-size prefix.
> > `test $0x4,%ah` is more okay, but partial-register usage has several
> > delays associated with it, even for pure reads. The cost depends on the
> > arch, but for example hwl/skl add 2c of latency (in this case, where %ah
> > is not being renamed separately).
> > In general, if you don't need the code size, it is best to stick with
> > 32/64-bit instructions.
>
> Do we need to clear %eax first to avoid a false dependency?

Oh yeah, guess you're right; `test %ah` is probably best.

>
> Thanks,
> Florian
>

* [PATCH v2] x86-64: Replace `and %ah` write with `test %ah` read
From: Noah Goldstein @ 2023-03-13 20:50 UTC
  To: libc-alpha; +Cc: goldstein.w.n, hjl.tools, carlos

High8 partial registers can incur a stall when being modified (if not
renamed separately), or at the very least incur extra backend uops (if
renamed separately). Either way the read-only `testb $0x4, %ah` is
preferable to `andb $0x4, %ah`.

Function size is unchanged when accounting for 16-byte padding.
---
 sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
index d754668bce..e9a76178f9 100644
--- a/sysdeps/x86_64/fpu/e_fmodl.S
+++ b/sysdeps/x86_64/fpu/e_fmodl.S
@@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
 	fldt	8(%rsp)
 1:	fprem
 	fstsw	%ax
-	and	$04,%ah
+	testb	$0x4,%ah
 	jnz	1b
 	fstp	%st(1)
 	ret
-- 
2.34.1

* Re: [PATCH v2] x86-64: Replace `and %ah` write with `test %ah` read
From: H.J. Lu @ 2023-03-14  2:00 UTC
  To: Noah Goldstein; +Cc: libc-alpha, carlos

On Mon, Mar 13, 2023 at 1:51 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> High8 partial registers can incur a stall when being modified (if not
> renamed separately), or at the very least incur extra backend uops (if
> renamed separately). Either way the read-only `testb $0x4, %ah` is
> preferable to `andb $0x4, %ah`.
>
> Function size is unchanged when accounting for 16-byte padding.
> ---
>  sysdeps/x86_64/fpu/e_fmodl.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sysdeps/x86_64/fpu/e_fmodl.S b/sysdeps/x86_64/fpu/e_fmodl.S
> index d754668bce..e9a76178f9 100644
> --- a/sysdeps/x86_64/fpu/e_fmodl.S
> +++ b/sysdeps/x86_64/fpu/e_fmodl.S
> @@ -13,7 +13,7 @@ ENTRY(__ieee754_fmodl)
>  	fldt	8(%rsp)
>  1:	fprem
>  	fstsw	%ax
> -	and	$04,%ah
> +	testb	$0x4,%ah
>  	jnz	1b
>  	fstp	%st(1)
>  	ret
> --
> 2.34.1
>

OK.

Thanks.

-- 
H.J.
