From: "Stefan Kanthak" <stefan.kanthak@nexgo.de>
To: "Gabriel Paubert" <paubert@iram.es>
Cc: <gcc@gcc.gnu.org>
Subject: Re: Suboptimal code generated for __builtin_trunc on AMD64 without SSE4.1
Date: Fri, 6 Aug 2021 16:37:59 +0200
Message-ID: <6ED6CDB5360246A1A9670B99892C354D@H270>
In-Reply-To: <20210806132052.GA565@lt-gp.iram.es>
Gabriel Paubert <paubert@iram.es> wrote:
> On Fri, Aug 06, 2021 at 02:43:34PM +0200, Stefan Kanthak wrote:
>> Gabriel Paubert <paubert@iram.es> wrote:
>>
>> > Hi,
>> >
>> > On Thu, Aug 05, 2021 at 01:58:12PM +0200, Stefan Kanthak wrote:
[...]
>> >> The whole idea behind these implementations is to get rid of loading
>> >> floating-point constants to perform comparisons.
>> >
>> > Indeed, but what I had in mind was something along the following lines:
>> >
>> > movq rax,xmm0 # and copy rax to say rcx, if needed later
>> > shrq rax,52 # move sign and exponent to 12 LSBs
>> > andl eax,0x7ff # mask the sign
>> > cmpl eax,0x434 # value to be checked
>> > ja return # exponent too large, we're done (what about NaNs?)
>> > cvttsd2si rax,xmm0 # safe after exponent check
>> > cvtsi2sd xmm0,rax # conversion done
>> >
>> > and a bit more to handle the corner cases (essentially preserve the
>> > sign to be correct between -1 and -0.0).
>>
>> The sign of -0.0 is the only corner case and already handled in my code.
>> Both SNAN and QNAN (which have an exponent 0x7ff) are handled and
>> preserved, as in the code GCC generates as well as my code.
>
> I don't know what the standard says about NaNs in this case; I seem to
> remember that arithmetic instructions typically produce QNaN when one of
> the inputs is a NaN, whether signaling or not.
<https://pubs.opengroup.org/onlinepubs/9699919799/functions/trunc.html>
and its cousins as well as the C standard say
| If x is NaN, a NaN shall be returned.
That's why I mentioned that the code GCC generates also doesn't quiet SNaNs.
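As an illustration (not part of the original exchange), here is a minimal C sketch of why both kinds of NaN pass through the exponent check untouched. The function names and the sample bit patterns are assumptions made for this example. The threshold is BIAS + 53, the same value as the 0x434 in the quoted sequence, and it is compared with >= as in the complete listing in this message.

```c
#include <assert.h>
#include <stdint.h>

enum { BIAS = 1023 };

/* BIAS + 53 is the 0x434 that the quoted sequence compares against. */
static_assert(BIAS + 53 == 0x434, "threshold for |x| >= 2^53");

/* Biased exponent field of an IEEE 754 binary64 bit pattern.
   The left shift discards the sign bit; the right shift drops the mantissa. */
unsigned biased_exponent(uint64_t bits)
{
    return (unsigned)((bits << 1) >> 53);
}

/* Nonzero when the early-exit branch would be taken, i.e. the argument is
   returned as-is: |x| >= 2^53, an infinity, or any NaN (exponent 0x7ff). */
int returned_unchanged(uint64_t bits)
{
    return biased_exponent(bits) >= BIAS + 53;
}
```

With the hypothetical patterns 0x7FF8000000000000 (a quiet NaN) and 0x7FF0000000000001 (a signaling NaN), returned_unchanged() is nonzero in both cases, so no arithmetic instruction ever sees the value and a signaling NaN is not quieted.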
>> > But the CPU can (speculatively) start the conversions early, so the
>> > dependency chain is rather short.
>>
>> Correct.
>>
>> > I don't know if it's faster than your new code,
>>
>> It should be faster.
>>
>> > I'm almost sure that it's shorter.
>>
>> "neg rax; jo ...; neg rax" is 3+2+3=8 bytes, the above sequence has but
>> 5+4+5+5+2=21 bytes.
>>
>> JFTR: better use "add rax,rax; shr rax,53" instead of
>> "shr rax,52; and eax,0x7ff" and save 2 bytes.
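A small C sketch, added for illustration, of why the two instruction pairs compute the same value: doubling the 64-bit pattern discards the sign bit, so a right shift by 53 leaves only the 11 exponent bits. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* "shr rax,52; and eax,0x7ff": shift sign and exponent down, mask off the sign. */
uint32_t exponent_shift_mask(uint64_t bits)
{
    return (uint32_t)(bits >> 52) & 0x7ffu;
}

/* "add rax,rax; shr rax,53": the unsigned addition wraps modulo 2^64,
   which drops the sign bit before the shift. */
uint32_t exponent_add_shift(uint64_t bits)
{
    return (uint32_t)((bits + bits) >> 53);
}

/* Returns the exponent field and asserts that both forms agree. */
uint32_t exponent_field(uint64_t bits)
{
    uint32_t e = exponent_add_shift(bits);
    assert(e == exponent_shift_mask(bits));
    return e;
}
```

For example, exponent_field(0xC340000000000000) (the bit pattern of -0x1.0p53) yields 0x434 with either form.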
>
> Indeed, I don't have the exact size of instructions in my head,
> especially since I've not written x86 assembly since the mid 90s.
>
> In any case, with your last improvement, the code is now down to a
> single 32-bit immediate constant. And I don't see how to eliminate it...
>
>>
>> Complete properly optimized code for __builtin_trunc is then as follows
>> (11 instructions, 44 bytes):
>>
>> .code64
>> .intel_syntax
>> .equ BIAS, 1023
>> .text
>> movq rax, xmm0 # rax = argument
>> add rax, rax
>> shr rax, 53 # rax = exponent of |argument|
>> cmp eax, BIAS + 53
>> jae .Lexit # argument indefinite?
>
> Maybe s/.Lexit/.L0/
Surely!
>> # |argument| >= 0x1.0p53?
>> cvttsd2si rax, xmm0 # rax = trunc(argument)
>> cvtsi2sd xmm1, rax # xmm1 = trunc(argument)
>> psrlq xmm0, 63
>> psllq xmm0, 63 # xmm0 = (argument & -0.0) ? -0.0 : 0.0
>> orpd xmm0, xmm1 # xmm0 = trunc(argument)
>> .L0: ret
>> .end
>>
>
> This looks nice.
Let's see how to convince GCC to generate such code sequences...
Stefan
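For readers who prefer C, here is a rough sketch of what the assembly listing above computes. It assumes a 64-bit IEEE 754 double and a conversion to int64_t that truncates toward zero. The name trunc_sketch is made up for this example; it is neither what GCC emits nor a proposed patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t),
              "sketch assumes a 64-bit IEEE 754 double");

enum { BIAS = 1023 };

double trunc_sketch(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);              /* movq rax, xmm0 */

    /* add rax, rax; shr rax, 53: biased exponent without the sign */
    uint32_t exponent = (uint32_t)((bits + bits) >> 53);

    /* cmp eax, BIAS + 53; jae: |x| >= 2^53, infinity or NaN.
       Such values are already integral (or NaN) and are returned as-is. */
    if (exponent >= BIAS + 53)
        return x;

    /* cvttsd2si; cvtsi2sd: safe because |x| < 2^53 fits in int64_t */
    double truncated = (double)(int64_t)x;

    /* psrlq; psllq; orpd: OR the argument's sign bit back in, so that
       arguments in (-1, -0.0] yield -0.0 rather than +0.0 */
    uint64_t result;
    memcpy(&result, &truncated, sizeof result);
    result |= bits & 0x8000000000000000ULL;
    memcpy(&truncated, &result, sizeof truncated);
    return truncated;
}
```

Under these assumptions trunc_sketch(-2.75) gives -2.0 and trunc_sketch(-0.5) gives -0.0. Whether a compiler turns this source into the 11-instruction sequence is a separate question.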