public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/56944] New: Better isfinite in some cases?
@ 2013-04-13 9:24 glisse at gcc dot gnu.org
2013-04-14 10:21 ` [Bug target/56944] " glisse at gcc dot gnu.org
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-04-13 9:24 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56944
Bug #: 56944
Summary: Better isfinite in some cases?
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: target
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: glisse@gcc.gnu.org
Target: x86_64-linux-gnu
Hello,
for isfinite, gcc typically generates this sequence:
movsd .LC0(%rip), %xmm1
andpd %xmm1, %xmm0
movsd .LC1(%rip), %xmm1
ucomisd %xmm0, %xmm1
setae %al
With -fno-trapping-math, I tried this shorter sequence instead, which should be
valid:
subsd %xmm0, %xmm0
ucomisd %xmm0, %xmm0
setnp %al
Depending on the tests, it seemed to be either the same speed or 15% faster,
whether the argument is normal, infinite or nan. For a denormal argument, it is
15% slower (but then both codes take 100 times as long as the normal case). The
results might also be different on a more recent processor.
I don't know if we want to try and generate this code when -fno-trapping-math
is present.
(related to PR 30652)
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug target/56944] Better isfinite in some cases?
2013-04-13 9:24 [Bug target/56944] New: Better isfinite in some cases? glisse at gcc dot gnu.org
@ 2013-04-14 10:21 ` glisse at gcc dot gnu.org
2013-04-15 10:26 ` rguenth at gcc dot gnu.org
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-04-14 10:21 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56944
--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> 2013-04-14 10:21:53 UTC ---
Maybe in C terms:
isnan(x) -> x!=x
isinf(x) -> fabs(x)>DBL_MAX
isfinite(x) -> fabs(x)<=DBL_MAX
what I am suggesting (with -fno-trapping-math, and maybe -Os) is:
isfinite(x) -> !isnan(x-x)
but even without that, something like:
isfinite(x) -> x & mask1 != mask1 (or <) where mask1 is +inf
Since fabs(x) is implemented as x & mask2, this saves one constant, and it has
the nice advantage of forcing denormals to zero before the comparison, so the
running time of the function is the same for denormals instead of being 100
times longer. Note that writing a C program that convinces gcc to generate
exactly this code is very hard (even using intrinsics or inline asm, I
basically have to write the whole function in asm).
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug target/56944] Better isfinite in some cases?
2013-04-13 9:24 [Bug target/56944] New: Better isfinite in some cases? glisse at gcc dot gnu.org
2013-04-14 10:21 ` [Bug target/56944] " glisse at gcc dot gnu.org
@ 2013-04-15 10:26 ` rguenth at gcc dot gnu.org
2013-04-18 22:47 ` glisse at gcc dot gnu.org
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-04-15 10:26 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56944
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2013-04-15
Ever Confirmed|0 |1
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> 2013-04-15 10:26:12 UTC ---
Confirmed.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug target/56944] Better isfinite in some cases?
2013-04-13 9:24 [Bug target/56944] New: Better isfinite in some cases? glisse at gcc dot gnu.org
2013-04-14 10:21 ` [Bug target/56944] " glisse at gcc dot gnu.org
2013-04-15 10:26 ` rguenth at gcc dot gnu.org
@ 2013-04-18 22:47 ` glisse at gcc dot gnu.org
2013-04-19 9:53 ` glisse at gcc dot gnu.org
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-04-18 22:47 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56944
--- Comment #3 from Marc Glisse <glisse at gcc dot gnu.org> 2013-04-18 22:47:55 UTC ---
Created attachment 29900
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29900
(x & inf) < inf
I only vaguely checked that the resulting asm was what I expected.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug target/56944] Better isfinite in some cases?
2013-04-13 9:24 [Bug target/56944] New: Better isfinite in some cases? glisse at gcc dot gnu.org
` (2 preceding siblings ...)
2013-04-18 22:47 ` glisse at gcc dot gnu.org
@ 2013-04-19 9:53 ` glisse at gcc dot gnu.org
2013-04-19 13:43 ` jakub at gcc dot gnu.org
2013-04-19 15:12 ` glisse at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-04-19 9:53 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56944
Marc Glisse <glisse at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #29900|0 |1
is obsolete| |
--- Comment #4 from Marc Glisse <glisse at gcc dot gnu.org> 2013-04-19 09:53:42 UTC ---
Created attachment 29903
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29903
(x & inf) < inf
Slightly improved version of the patch, but there are still issues. Attaching a
log of an IRC discussion, so I don't forget all the nice advice:
marcg Hello. I added an optab for __builtin_finite and an expander for x86,
and it seems to work. To avoid hurting the vectorizer, I also did a vector
version, and a builtin and told ix86_builtin_vectorized_function about it, and
it works for float.
marcg For double, finite takes a double and returns an int, so the
vectorizer tries to find a vector version that takes a V2DF and returns a V4SI,
whereas I added a version that returns V2DI.
richi make sure to also add constant folding
marcg (good point about constant folding, thanks)
marcg How is the vectorization supposed to work in that case? Should the
vectorizer try V2DI as well?
richi it's not that easy. I think the target should provide a V2SI variant
instead. But of course the vectorizer will ask for V4SI here, so it won't work
either.
richi the vectorizer simply cannot handle this at the moment (multiple types
and calls, in vectorizer speak)
richi it would need to go through vectorizable_conversion-like code
marcg A V2SI variant would be ugly, mixing vector sizes hurts.
richi special-casing the FP classification builtins
marcg Ok, I guess I should avoid adding this special 'finite' implementation
then
richi you can hande it in vectorizer pattern matching, vectorizing it as it
was before
marcg Ah, indeed, I could have a look (though it is only papering over
something, and it is becoming complicated for such a small optimization).
Thanks for the explanations.
richi yes, it would be a hack
jakub marcg: perhaps pattern recognition for that?
marcg I think pattern recognition in the vectorizer is the hack richi was
talking about? Or do you mean something else?
jakub marcg: if you find the target has V{2,4}DF -> V{2,4}DI finite builtin,
and code uses DF->SI, then rewrite the __builtin_finite call as
__builtin_finitedfdi casted to (int), and the vectorizer will handle the cast
richi of course you then need such a builtin
richi alternatively you can teach this to call vectorization
richi that is, look for a vectorized call that preserves the number of
vector elements
richi and do the appropriate widening/shortening
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug target/56944] Better isfinite in some cases?
2013-04-13 9:24 [Bug target/56944] New: Better isfinite in some cases? glisse at gcc dot gnu.org
` (3 preceding siblings ...)
2013-04-19 9:53 ` glisse at gcc dot gnu.org
@ 2013-04-19 13:43 ` jakub at gcc dot gnu.org
2013-04-19 15:12 ` glisse at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-04-19 13:43 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56944
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> 2013-04-19 13:43:18 UTC ---
For the vectorization, vectorizable_call should handle DF -> SI builtin
vectorization, see e.g. how on i?86/x86_64 BUILT_IN_IFLOOR is vectorized.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug target/56944] Better isfinite in some cases?
2013-04-13 9:24 [Bug target/56944] New: Better isfinite in some cases? glisse at gcc dot gnu.org
` (4 preceding siblings ...)
2013-04-19 13:43 ` jakub at gcc dot gnu.org
@ 2013-04-19 15:12 ` glisse at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-04-19 15:12 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56944
--- Comment #6 from Marc Glisse <glisse at gcc dot gnu.org> 2013-04-19 15:11:58 UTC ---
(In reply to comment #5)
> For the vectorization, vectorizable_call should handle DF -> SI builtin
> vectorization, see e.g. how on i?86/x86_64 BUILT_IN_IFLOOR is vectorized.
I may try that. For cases where the result of __builtin_finite (really a bool)
is immediately used in a condition, that's going to be a waste compared to
pretending it returns a DI. However, detecting situations where we have a long
l and we use (int) l in an expression where we can remove the cast to int, and
making the necessary adjustments seems hard. Maybe restricting to the case
where the result is used once, as the first argument of a cond_expr where the
other arguments have a wider type would be doable...
Another thing worth trying would be to do this new __builtin_finite expansion
in the middle-end (same place as currently), and implement whatever is
necessary so that the back-end generates the right code from it. Even after the
patch for PR 54716, we are still not very good at turning integer logic
operations into float logic operations.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2013-04-19 15:12 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-13 9:24 [Bug target/56944] New: Better isfinite in some cases? glisse at gcc dot gnu.org
2013-04-14 10:21 ` [Bug target/56944] " glisse at gcc dot gnu.org
2013-04-15 10:26 ` rguenth at gcc dot gnu.org
2013-04-18 22:47 ` glisse at gcc dot gnu.org
2013-04-19 9:53 ` glisse at gcc dot gnu.org
2013-04-19 13:43 ` jakub at gcc dot gnu.org
2013-04-19 15:12 ` glisse at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).