Re: Suboptimal code generated for __builtin_trunc on AMD64 without SSE4.1

public inbox for gcc@gcc.gnu.org

From: "Stefan Kanthak" <stefan.kanthak@nexgo.de>
To: "Gabriel Paubert" <paubert@iram.es>
Cc: <gcc@gcc.gnu.org>
Subject: Re: Suboptimal code generated for __builtin_trunc on AMD64 without SSE4.1
Date: Fri, 6 Aug 2021 16:37:59 +0200
Message-ID: <6ED6CDB5360246A1A9670B99892C354D@H270>
In-Reply-To: <20210806132052.GA565@lt-gp.iram.es>

Gabriel Paubert <paubert@iram.es> wrote:


> On Fri, Aug 06, 2021 at 02:43:34PM +0200, Stefan Kanthak wrote:
>> Gabriel Paubert <paubert@iram.es> wrote:
>> 
>> > Hi,
>> > 
>> > On Thu, Aug 05, 2021 at 01:58:12PM +0200, Stefan Kanthak wrote:

[...]

>> >> The whole idea behind these implementations is to get rid of loading
>> >> floating-point constants to perform comparisons.
>> > 
>> > Indeed, but what I had in mind was something along the following lines:
>> > 
>> > movq rax,xmm0   # and copy rax to say rcx, if needed later
>> > shrq rax,52     # move sign and exponent to 12 LSBs 
>> > andl eax,0x7ff  # mask the sign
>> > cmpl eax,0x434  # value to be checked
>> > ja return       # exponent too large, we're done (what about NaNs?)
>> > cvttsd2si rax,xmm0 # safe after exponent check
>> > cvtsi2sd xmm0,rax  # conversion done
>> > 
>> > and a bit more to handle the corner cases (essentially preserve the
>> > sign to be correct between -1 and -0.0).
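The outline above can be sketched in portable C roughly as follows. This is an illustration only, assuming IEEE-754 binary64 doubles and a 64-bit `int64_t`; the name `trunc_exponent_check` is hypothetical. It keeps the quoted cutoff 0x434 (1023 + 53), uses `memcpy` for the `movq`, and restores the sign for results that become zero, which is the corner case mentioned above.

```c
#include <math.h>     /* signbit, NAN, INFINITY */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: integer test on the biased exponent first,
   then the int64 round trip (cvttsd2si + cvtsi2sd). */
static double trunc_exponent_check(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);                      /* movq rax, xmm0 */
    unsigned exponent = (unsigned)(bits >> 52) & 0x7ff;  /* shr 52; and 0x7ff */
    if (exponent > 0x434)   /* 0x434 = 1023 + 53; also true for Inf and NaN */
        return x;           /* already integral, infinite or NaN */
    double t = (double)(int64_t)x;                       /* truncating round trip */
    /* The round trip yields +0.0 for -1 < x <= -0.0; restore the sign. */
    return (signbit(x) && t == 0.0) ? -0.0 : t;
}
```

Any |x| below 2^54 fits comfortably in an `int64_t`, so the cast is well defined on the path that reaches it.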
>> 
>> The sign of -0.0 is the only corner case and already handled in my code.
>> Both SNaN and QNaN (which have an exponent of 0x7ff) are handled and
>> preserved, in the code GCC generates as well as in my code.
> 
> I don't know what the standard says about NaNs in this case; I seem to
> remember that arithmetic instructions typically produce a QNaN when one of
> the inputs is a NaN, whether signaling or not.

<https://pubs.opengroup.org/onlinepubs/9699919799/functions/trunc.html>
and its cousins as well as the C standard say

| If x is NaN, a NaN shall be returned.

That's why I mentioned that the code GCC generates also doesn't quiet SNaNs.
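Why the exponent test leaves NaNs alone can be shown on raw bit patterns. The sketch below assumes IEEE-754 binary64 with the x86 convention that a set bit 51 marks a quiet NaN; the helper names are hypothetical. A NaN of either sign has biased exponent 0x7ff and therefore takes the early exit before any conversion instruction touches it, so its bits, including the quiet bit, come back unchanged. That is consistent with the quoted requirement that a NaN be returned, without quieting an SNaN.

```c
#include <stdint.h>

/* Hypothetical helpers operating on the binary64 bit pattern. */
static int is_nan_bits(uint64_t b)
{
    return ((b >> 52) & 0x7ff) == 0x7ff              /* exponent all ones */
        && (b & UINT64_C(0x000FFFFFFFFFFFFF)) != 0;  /* non-zero fraction */
}

static int is_signaling_nan_bits(uint64_t b)
{
    return is_nan_bits(b) && !(b & (UINT64_C(1) << 51));  /* quiet bit clear */
}

/* Mirrors "add rax,rax; shr rax,53; cmp eax,BIAS+53; jae": true when the
   input is returned unchanged (|x| >= 2^53, infinity or NaN). */
static int takes_early_exit(uint64_t b)
{
    return ((b + b) >> 53) >= 1023 + 53;
}
```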

>> > But the CPU can (speculatively) start the conversions early, so the
>> > dependency chain is rather short.
>> 
>> Correct.
>>  
>> > I don't know if it's faster than your new code,
>> 
>> It should be faster.
>> 
>> > I'm almost sure that it's shorter.
>> 
>> "neg rax; jo ...; neg rax" is 3+2+3=8 bytes, whereas the above sequence has
>> 5+4+5+5+2=21 bytes.
>> 
>> JFTR: better use "add rax,rax; shr rax,53" instead of
>>       "shr rax,52; and eax,0x7ff" and save 2 bytes.
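The two exponent extractions compared above compute the same value. In C terms (a sketch with hypothetical function names, assuming the binary64 bit pattern is already in a `uint64_t`), `add rax,rax` is a left shift by one that pushes the sign bit out of the register, so the following `shr rax,53` leaves exactly the 11 exponent bits and no mask is needed.

```c
#include <stdint.h>

/* shr rax,52 ; and eax,0x7ff */
static unsigned exponent_shift_mask(uint64_t bits)
{
    return (unsigned)(bits >> 52) & 0x7ff;
}

/* add rax,rax ; shr rax,53 : the addition discards the sign bit */
static unsigned exponent_add_shift(uint64_t bits)
{
    return (unsigned)((bits + bits) >> 53);
}
```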
> 
> Indeed, I don't have the exact sizes of instructions in my head,
> especially since I've not written x86 assembly since the mid 90s.
> 
> In any case, with your last improvement, the code is now down to a
> single 32 bit immediate constant. And I don't see how to eliminate it...
> 
>> 
>> Complete properly optimized code for __builtin_trunc is then as follows
>> (11 instructions, 44 bytes):
>> 
>> .code64
>> .intel_syntax
>> .equ    BIAS, 1023
>> .text
>>         movq    rax, xmm0    # rax = argument
>>         add     rax, rax
>>         shr     rax, 53      # rax = exponent of |argument|
>>         cmp     eax, BIAS + 53
>>         jae     .Lexit       # argument indefinite?
> 
> Maybe s/.Lexit/.L0/

Surely!

>>                              # |argument| >= 0x1.0p53?
>>         cvttsd2si rax, xmm0  # rax = trunc(argument)
>>         cvtsi2sd xmm1, rax   # xmm1 = trunc(argument)
>>         psrlq   xmm0, 63
>>         psllq   xmm0, 63     # xmm0 = (argument & -0.0) ? -0.0 : 0.0
>>         orpd    xmm0, xmm1   # xmm0 = trunc(argument)
>> .L0:    ret
>> .end
>> 
> 
> This looks nice.

Let's see how to convince GCC to generate such code sequences...
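As a starting point, the routine quoted above can be written in C with one SSE2 intrinsic from <emmintrin.h> per instruction. This is a sketch, not a claim about what GCC will emit from it: the name `trunc_sse2` is hypothetical, it assumes x86-64 (the 64-bit conversion intrinsics do not exist on 32-bit x86), and the compiler is free to schedule or select different instructions.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Hypothetical C rendering of the 11-instruction routine (x86-64 only). */
static double trunc_sse2(double arg)
{
    __m128d x = _mm_set_sd(arg);
    /* movq rax, xmm0 ; add rax, rax ; shr rax, 53 */
    uint64_t bits = (uint64_t)_mm_cvtsi128_si64(_mm_castpd_si128(x));
    unsigned exponent = (unsigned)((bits + bits) >> 53);
    /* cmp eax, BIAS + 53 ; jae .L0 */
    if (exponent >= 1023 + 53)
        return arg;                 /* |arg| >= 2^53, infinity or NaN */
    /* cvttsd2si rax, xmm0 ; cvtsi2sd xmm1, rax */
    __m128d t = _mm_cvtsi64_sd(_mm_setzero_pd(), _mm_cvttsd_si64(x));
    /* psrlq xmm0, 63 ; psllq xmm0, 63 : keep only the sign bit */
    __m128i s = _mm_slli_epi64(_mm_srli_epi64(_mm_castpd_si128(x), 63), 63);
    /* orpd xmm0, xmm1 : -0.0 for -1 < arg <= -0.0, otherwise unchanged */
    return _mm_cvtsd_f64(_mm_or_pd(_mm_castsi128_pd(s), t));
}
```

ORing in the sign bit only matters when the truncated value is zero; for any other negative input the converted result is already negative.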

Stefan

Thread overview: 19+ messages
2021-08-05  7:25 Stefan Kanthak
2021-08-05  9:42 ` Gabriel Paubert
2021-08-05 10:19   ` Richard Biener
2021-08-05 11:58   ` Stefan Kanthak
2021-08-05 13:59     ` Gabriel Paubert
2021-08-06 12:43       ` Stefan Kanthak
2021-08-06 12:59         ` Richard Biener
2021-08-06 13:20         ` Gabriel Paubert
2021-08-06 14:37           ` Stefan Kanthak [this message]
2021-08-06 17:44             ` Joseph Myers
2021-08-07 12:32               ` Stefan Kanthak
2021-08-08 22:58                 ` Vincent Lefevre
2021-08-09 17:19                 ` Joseph Myers
2021-08-06 13:31         ` Michael Matz
2021-08-06 14:32           ` Stefan Kanthak
2021-08-06 15:04             ` Michael Matz
2021-08-06 15:16             ` Richard Biener
2021-08-06 16:57               ` Stefan Kanthak
2021-08-05 13:18   ` Gabriel Ravier
