public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/56944] New: Better isfinite in some cases?
@ 2013-04-13  9:24 glisse at gcc dot gnu.org
  2013-04-14 10:21 ` [Bug target/56944] " glisse at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-04-13  9:24 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56944

             Bug #: 56944
           Summary: Better isfinite in some cases?
    Classification: Unclassified
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: glisse@gcc.gnu.org
            Target: x86_64-linux-gnu


Hello,

for isfinite, gcc typically generates this sequence:

    movsd    .LC0(%rip), %xmm1
    andpd    %xmm1, %xmm0
    movsd    .LC1(%rip), %xmm1
    ucomisd    %xmm0, %xmm1
    setae    %al

With -fno-trapping-math, I tried this shorter sequence instead, which should be
valid:

    subsd    %xmm0, %xmm0
    ucomisd    %xmm0, %xmm0
    setnp    %al

Depending on the tests, it seemed to be either the same speed or 15% faster,
whether the argument is normal, infinite or nan. For a denormal argument, it is
15% slower (but then both codes take 100 times as long as the normal case). The
results might also be different on a more recent processor.

I don't know if we want to try and generate this code when -fno-trapping-math
is present.

(related to PR 30652)


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/56944] Better isfinite in some cases?
  2013-04-13  9:24 [Bug target/56944] New: Better isfinite in some cases? glisse at gcc dot gnu.org
@ 2013-04-14 10:21 ` glisse at gcc dot gnu.org
  2013-04-15 10:26 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-04-14 10:21 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56944

--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> 2013-04-14 10:21:53 UTC ---
Maybe in C terms:
isnan(x) -> x!=x
isinf(x) -> fabs(x)>DBL_MAX
isfinite(x) -> fabs(x)<=DBL_MAX

what I am suggesting (with -fno-trapping-math, and maybe -Os) is:
isfinite(x) -> !isnan(x-x)

but even without that, something like:
isfinite(x) -> x & mask1 != mask1 (or <) where mask1 is +inf
Since fabs(x) is implemented as x & mask2, this saves one constant, and it has
the nice advantage of forcing denormals to zero before the comparison, so the
running time of the function is the same for denormals instead of being 100
times longer. Note that writing a C program that convinces gcc to generate
exactly this code is very hard (even using intrinsics or inline asm, I
basically have to write the whole function in asm).


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/56944] Better isfinite in some cases?
  2013-04-13  9:24 [Bug target/56944] New: Better isfinite in some cases? glisse at gcc dot gnu.org
  2013-04-14 10:21 ` [Bug target/56944] " glisse at gcc dot gnu.org
@ 2013-04-15 10:26 ` rguenth at gcc dot gnu.org
  2013-04-18 22:47 ` glisse at gcc dot gnu.org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-04-15 10:26 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56944

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-04-15
     Ever Confirmed|0                           |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> 2013-04-15 10:26:12 UTC ---
Confirmed.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/56944] Better isfinite in some cases?
  2013-04-13  9:24 [Bug target/56944] New: Better isfinite in some cases? glisse at gcc dot gnu.org
  2013-04-14 10:21 ` [Bug target/56944] " glisse at gcc dot gnu.org
  2013-04-15 10:26 ` rguenth at gcc dot gnu.org
@ 2013-04-18 22:47 ` glisse at gcc dot gnu.org
  2013-04-19  9:53 ` glisse at gcc dot gnu.org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-04-18 22:47 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56944

--- Comment #3 from Marc Glisse <glisse at gcc dot gnu.org> 2013-04-18 22:47:55 UTC ---
Created attachment 29900
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29900
(x & inf) < inf

I only vaguely checked that the resulting asm was what I expected.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/56944] Better isfinite in some cases?
  2013-04-13  9:24 [Bug target/56944] New: Better isfinite in some cases? glisse at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2013-04-18 22:47 ` glisse at gcc dot gnu.org
@ 2013-04-19  9:53 ` glisse at gcc dot gnu.org
  2013-04-19 13:43 ` jakub at gcc dot gnu.org
  2013-04-19 15:12 ` glisse at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-04-19  9:53 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56944

Marc Glisse <glisse at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #29900|0                           |1
        is obsolete|                            |

--- Comment #4 from Marc Glisse <glisse at gcc dot gnu.org> 2013-04-19 09:53:42 UTC ---
Created attachment 29903
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29903
(x & inf) < inf

Slightly improved version of the patch, but there are still issues. Attaching a
log of an IRC discussion, so I don't forget all the nice advice:

marcg    Hello. I added an optab for __builtin_finite and an expander for x86,
and it seems to work. To avoid hurting the vectorizer, I also did a vector
version, and a builtin and told ix86_builtin_vectorized_function about it, and
it works for float.
marcg    For double, finite takes a double and returns an int, so the
vectorizer tries to find a vector version that takes a V2DF and returns a V4SI,
whereas I added a version that returns V2DI.
richi    make sure to also add constant folding
marcg    (good point about constant folding, thanks)
marcg    How is the vectorization supposed to work in that case? Should the
vectorizer try V2DI as well?
richi    it's not that easy. I think the target should provide a V2SI variant
instead. But of course the vectorizer will ask for V4SI here, so it won't work
either.
richi    the vectorizer simply cannot handle this at the moment (multiple types
and calls, in vectorizer speak)
richi    it would need to go through vectorizable_conversion-like code
marcg    A V2SI variant would be ugly, mixing vector sizes hurts.
richi    special-casing the FP classification builtins
marcg    Ok, I guess I should avoid adding this special 'finite' implementation
then
richi    you can hande it in vectorizer pattern matching, vectorizing it as it
was before
marcg    Ah, indeed, I could have a look (though it is only papering over
something, and it is becoming complicated for such a small optimization).
Thanks for the explanations.
richi    yes, it would be a hack
jakub    marcg: perhaps pattern recognition for that?
marcg    I think pattern recognition in the vectorizer is the hack richi was
talking about? Or do you mean something else?
jakub    marcg: if you find the target has V{2,4}DF -> V{2,4}DI finite builtin,
and code uses DF->SI, then rewrite the __builtin_finite call as
__builtin_finitedfdi casted to (int), and the vectorizer will handle the cast
richi    of course you then need such a builtin
richi    alternatively you can teach this to call vectorization
richi    that is, look for a vectorized call that preserves the number of
vector elements
richi    and do the appropriate widening/shortening


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/56944] Better isfinite in some cases?
  2013-04-13  9:24 [Bug target/56944] New: Better isfinite in some cases? glisse at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2013-04-19  9:53 ` glisse at gcc dot gnu.org
@ 2013-04-19 13:43 ` jakub at gcc dot gnu.org
  2013-04-19 15:12 ` glisse at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-04-19 13:43 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56944

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> 2013-04-19 13:43:18 UTC ---
For the vectorization, vectorizable_call should handle DF -> SI builtin
vectorization, see e.g. how on i?86/x86_64 BUILT_IN_IFLOOR is vectorized.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/56944] Better isfinite in some cases?
  2013-04-13  9:24 [Bug target/56944] New: Better isfinite in some cases? glisse at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2013-04-19 13:43 ` jakub at gcc dot gnu.org
@ 2013-04-19 15:12 ` glisse at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-04-19 15:12 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56944

--- Comment #6 from Marc Glisse <glisse at gcc dot gnu.org> 2013-04-19 15:11:58 UTC ---
(In reply to comment #5)
> For the vectorization, vectorizable_call should handle DF -> SI builtin
> vectorization, see e.g. how on i?86/x86_64 BUILT_IN_IFLOOR is vectorized.

I may try that. For cases where the result of __builtin_finite (really a bool)
is immediately used in a condition, that's going to be a waste compared to
pretending it returns a DI. However, detecting situations where we have a long
l and we use (int) l in an expression where we can remove the cast to int, and
making the necessary adjustments seems hard. Maybe restricting to the case
where the result is used once, as the first argument of a cond_expr where the
other arguments have a wider type would be doable...

Another thing worth trying would be to do this new __builtin_finite expansion
in the middle-end (same place as currently), and implement whatever is
necessary so that the back-end generates the right code from it. Even after the
patch for PR 54716, we are still not very good at turning integer logic
operations into float logic operations.


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2013-04-19 15:12 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-13  9:24 [Bug target/56944] New: Better isfinite in some cases? glisse at gcc dot gnu.org
2013-04-14 10:21 ` [Bug target/56944] " glisse at gcc dot gnu.org
2013-04-15 10:26 ` rguenth at gcc dot gnu.org
2013-04-18 22:47 ` glisse at gcc dot gnu.org
2013-04-19  9:53 ` glisse at gcc dot gnu.org
2013-04-19 13:43 ` jakub at gcc dot gnu.org
2013-04-19 15:12 ` glisse at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).