* [Bug target/56253] fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3)
2013-02-08 12:15 [Bug tree-optimization/56253] New: fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3) kretz at kde dot org
@ 2013-02-08 12:39 ` rguenth at gcc dot gnu.org
2013-02-08 12:53 ` rguenth at gcc dot gnu.org
` (11 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-02-08 12:39 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target| |x86_64-*-*, i?86-*-*
Status|UNCONFIRMED |NEW
Last reconfirmed| |2013-02-08
Component|tree-optimization |target
Ever Confirmed|0 |1
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-08 12:38:50 UTC ---
Confirmed. That's because we have
__m256 foo(__m256, __m256, __m256) (__m256 a, __m256 b, __m256 c)
{
__m256 D.6689;
__m256 D.6686;
__m256 _5;
__m256 _6;
<bb 2>:
_5 = __builtin_ia32_mulps256 (a_1(D), b_2(D));
_6 = __builtin_ia32_addps256 (_5, c_3(D));
return _6;
instead of
_5 = a_1(D) * b_2(D);
_6 = _5 + c_3(D);
not sure why we use builtins for these basic operations...
_mm256_add_ps could for example be simply
extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__,
__artificial__))
_mm256_add_ps (__m256 __A, __m256 __B)
{
return (__m256) ((__v8sf)__A + (__v8sf)__B);
}
with the caveat of using a GNU extension.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug target/56253] fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3)
2013-02-08 12:15 [Bug tree-optimization/56253] New: fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3) kretz at kde dot org
2013-02-08 12:39 ` [Bug target/56253] " rguenth at gcc dot gnu.org
@ 2013-02-08 12:53 ` rguenth at gcc dot gnu.org
2013-02-08 12:59 ` ubizjak at gmail dot com
` (10 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-02-08 12:53 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-08 12:52:32 UTC ---
For _mm256_fmadd_pd and friends the only possibility is to fold the target
builtins via targetm.fold_builtin to FMA_EXPR. Which is of course also
possible for the simple add/muls (but getting rid of builtins is a good
idea - they are quite heavy in compiler startup time).
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug target/56253] fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3)
2013-02-08 12:15 [Bug tree-optimization/56253] New: fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3) kretz at kde dot org
2013-02-08 12:39 ` [Bug target/56253] " rguenth at gcc dot gnu.org
2013-02-08 12:53 ` rguenth at gcc dot gnu.org
@ 2013-02-08 12:59 ` ubizjak at gmail dot com
2013-02-08 13:30 ` rguenth at gcc dot gnu.org
` (9 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: ubizjak at gmail dot com @ 2013-02-08 12:59 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253
--- Comment #3 from Uros Bizjak <ubizjak at gmail dot com> 2013-02-08 12:58:49 UTC ---
(In reply to comment #1)
> not sure why we use builtins for these basic operations...
Because they have to be emitted also for non-SSE math.
>From config/i386/sse.md:
;; The standard names for fma is only available with SSE math enabled.
(define_expand "fma<mode>4"
[(set (match_operand:FMAMODE 0 "register_operand")
(fma:FMAMODE
(match_operand:FMAMODE 1 "nonimmediate_operand")
(match_operand:FMAMODE 2 "nonimmediate_operand")
(match_operand:FMAMODE 3 "nonimmediate_operand")))]
"(TARGET_FMA || TARGET_FMA4) && TARGET_SSE_MATH")
...
;; The builtin for intrinsics is not constrained by SSE math enabled.
(define_expand "fma4i_fmadd_<mode>"
[(set (match_operand:FMAMODE 0 "register_operand")
(fma:FMAMODE
(match_operand:FMAMODE 1 "nonimmediate_operand")
(match_operand:FMAMODE 2 "nonimmediate_operand")
(match_operand:FMAMODE 3 "nonimmediate_operand")))]
"TARGET_FMA || TARGET_FMA4")
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug target/56253] fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3)
2013-02-08 12:15 [Bug tree-optimization/56253] New: fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3) kretz at kde dot org
` (2 preceding siblings ...)
2013-02-08 12:59 ` ubizjak at gmail dot com
@ 2013-02-08 13:30 ` rguenth at gcc dot gnu.org
2013-02-08 13:43 ` rguenth at gcc dot gnu.org
` (8 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-02-08 13:30 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-08 13:30:14 UTC ---
(In reply to comment #3)
> (In reply to comment #1)
>
> > not sure why we use builtins for these basic operations...
>
> Because they have to be emitted also for non-SSE math.
>
> From config/i386/sse.md:
>
> ;; The standard names for fma is only available with SSE math enabled.
> (define_expand "fma<mode>4"
> [(set (match_operand:FMAMODE 0 "register_operand")
> (fma:FMAMODE
> (match_operand:FMAMODE 1 "nonimmediate_operand")
> (match_operand:FMAMODE 2 "nonimmediate_operand")
> (match_operand:FMAMODE 3 "nonimmediate_operand")))]
> "(TARGET_FMA || TARGET_FMA4) && TARGET_SSE_MATH")
>
> ...
>
> ;; The builtin for intrinsics is not constrained by SSE math enabled.
>
> (define_expand "fma4i_fmadd_<mode>"
> [(set (match_operand:FMAMODE 0 "register_operand")
> (fma:FMAMODE
> (match_operand:FMAMODE 1 "nonimmediate_operand")
> (match_operand:FMAMODE 2 "nonimmediate_operand")
> (match_operand:FMAMODE 3 "nonimmediate_operand")))]
> "TARGET_FMA || TARGET_FMA4")
Ah, of course ...
That leaves the option of folding in targetm.fold_builtin (when
the standard names are available, of course).
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug target/56253] fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3)
2013-02-08 12:15 [Bug tree-optimization/56253] New: fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3) kretz at kde dot org
` (3 preceding siblings ...)
2013-02-08 13:30 ` rguenth at gcc dot gnu.org
@ 2013-02-08 13:43 ` rguenth at gcc dot gnu.org
2013-02-08 14:59 ` ubizjak at gmail dot com
` (7 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-02-08 13:43 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-08 13:42:23 UTC ---
(In reply to comment #3)
> (In reply to comment #1)
>
> > not sure why we use builtins for these basic operations...
>
> Because they have to be emitted also for non-SSE math.
>
> From config/i386/sse.md:
>
> ;; The standard names for fma is only available with SSE math enabled.
> (define_expand "fma<mode>4"
> [(set (match_operand:FMAMODE 0 "register_operand")
> (fma:FMAMODE
> (match_operand:FMAMODE 1 "nonimmediate_operand")
> (match_operand:FMAMODE 2 "nonimmediate_operand")
> (match_operand:FMAMODE 3 "nonimmediate_operand")))]
> "(TARGET_FMA || TARGET_FMA4) && TARGET_SSE_MATH")
>
> ...
>
> ;; The builtin for intrinsics is not constrained by SSE math enabled.
>
> (define_expand "fma4i_fmadd_<mode>"
> [(set (match_operand:FMAMODE 0 "register_operand")
> (fma:FMAMODE
> (match_operand:FMAMODE 1 "nonimmediate_operand")
> (match_operand:FMAMODE 2 "nonimmediate_operand")
> (match_operand:FMAMODE 3 "nonimmediate_operand")))]
> "TARGET_FMA || TARGET_FMA4")
Hmm, I wonder how the vectorizer then accesses add/sub patterns without
SSE math. It just queries optabs ...
We cannot handle the FMA case with standard operations anyway. But
if SSE modes are used, why should convert_mult_to_fma have to back off
(it also just looks at standard optabs)?
That said - should the above TARGET_SSE_MATH restriction not only
apply to scalar modes?
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug target/56253] fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3)
2013-02-08 12:15 [Bug tree-optimization/56253] New: fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3) kretz at kde dot org
` (4 preceding siblings ...)
2013-02-08 13:43 ` rguenth at gcc dot gnu.org
@ 2013-02-08 14:59 ` ubizjak at gmail dot com
2013-02-11 11:36 ` rguenth at gcc dot gnu.org
` (6 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: ubizjak at gmail dot com @ 2013-02-08 14:59 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253
--- Comment #6 from Uros Bizjak <ubizjak at gmail dot com> 2013-02-08 14:58:42 UTC ---
(In reply to comment #5)
> Hmm, I wonder how the vectorizer then accesses add/sub patterns without
> SSE math. It just queries optabs ...
>
> We cannot handle the FMA case with standard operations anyway. But
> if SSE modes are used, why should convert_mult_to_fma have to back off
> (it also just looks at standard optabs)?
>
> That said - should the above TARGET_SSE_MATH restriction not only
> apply to scalar modes?
Ugh, you are right. TARGET_SSE_MATH should apply to scalar modes only.
Patch in works.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug target/56253] fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3)
2013-02-08 12:15 [Bug tree-optimization/56253] New: fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3) kretz at kde dot org
` (5 preceding siblings ...)
2013-02-08 14:59 ` ubizjak at gmail dot com
@ 2013-02-11 11:36 ` rguenth at gcc dot gnu.org
2014-09-23 16:40 ` agner at agner dot org
` (5 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-02-11 11:36 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> 2013-02-11 11:36:29 UTC ---
So to re-cap, for vector intrinsics we can use the vector GCC extensions
while for the scalar intrinsics we would have to use target builtin folding.
Not sure if it's worth the asymmetry (thus instead do everything with
target builtin folding).
Testcase that still needs to "work" (emit addsd) with -m32 -msse2
and without -mfpmath=sse:
#include <emmintrin.h>
double foo (double x)
{
return _mm_cvtsd_f64 (_mm_add_sd (_mm_set_sd (x), _mm_set_sd (1.0)));
}
Exposing the intrinsics internals to the compiler also would cause us to
not "literally" emitting the code the user may have asked for (though
RTL optimization already might do that - but RTL for example never
re-associates FP, even with -ffast-math).
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug target/56253] fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3)
2013-02-08 12:15 [Bug tree-optimization/56253] New: fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3) kretz at kde dot org
` (6 preceding siblings ...)
2013-02-11 11:36 ` rguenth at gcc dot gnu.org
@ 2014-09-23 16:40 ` agner at agner dot org
2014-09-23 19:14 ` agner at agner dot org
` (4 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: agner at agner dot org @ 2014-09-23 16:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253
Agner Fog <agner at agner dot org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |agner at agner dot org
--- Comment #8 from Agner Fog <agner at agner dot org> ---
The same problem applies to other kinds of optimizations, such as algebraic
reductions and constant propagation.
The method of using operators such as * and + is not portable to other
compilers, and it doesn't work with integer vectors for other integer sizes
than 64-bits. (I know that there is no integer FMA on Intel CPUs, but I am also
talking about other optimizations).
Here are some other examples of optimizations I would like gcc to do:
#include "x86intrin.h"
void dummy2(__m128 a, __m128 b);
void dummyi2(__m128i a, __m128i b);
void commutative(__m128 a, __m128 b) {
// expect reduce a+b = b+a. This is the only reduction that actually works!
dummy2(_mm_add_ps(a,b), _mm_add_ps(b,a));
}
void associative(__m128i a, __m128i b, __m128i c) {
// expect reduce (a+b)+c = a+(b+c)
dummy2i(_mm_add_epi32(_mm_add_epi32(a,b),c),
_mm_add_epi32(a,_mm_add_epi32(b,c)));
}
void distributive(__m128i a, __m128i b, __m128i c) {
// expect reduce a*b+a*c = a*(b+c)
dummy2i(_mm_add_epi32(_mm_mul_epi32(a,b),_mm_mul_epi32(a,c)),
_mm_mul_epi32(a,_mm_add_epi32(b,c)));
}
void constant_propagation() {
// expect store c and d as precalculated constants
__m128i a = _mm_setr_epi32(1,2,3,4);
__m128i b = _mm_set1_epi32(5);
__m128i c = _mm_add_epi32(a,b);
__m128i d = _mm_mul_epi32(a,b);
dummyi2(c,d);
}
Of course, the same applies to 256-bit and 512-bit vectors.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug target/56253] fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3)
2013-02-08 12:15 [Bug tree-optimization/56253] New: fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3) kretz at kde dot org
` (7 preceding siblings ...)
2014-09-23 16:40 ` agner at agner dot org
@ 2014-09-23 19:14 ` agner at agner dot org
2014-09-23 19:22 ` glisse at gcc dot gnu.org
` (3 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: agner at agner dot org @ 2014-09-23 19:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253
--- Comment #9 from Agner Fog <agner at agner dot org> ---
Many programmers are using a vector class library rather than writing intrinsic
functions directly. Such libraries have overloaded operators which are inlined
to produce intrinsic functions. Therefore, we cannot expect programmers to make
optimizations like FMA contraction, algebraic reduction, constant propagation,
etc. manually.
I don't know if this more general discussion of optimizations on code with
intrinsics fit into this bug or they need to be discussed elsewhere?
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug target/56253] fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3)
2013-02-08 12:15 [Bug tree-optimization/56253] New: fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3) kretz at kde dot org
` (8 preceding siblings ...)
2014-09-23 19:14 ` agner at agner dot org
@ 2014-09-23 19:22 ` glisse at gcc dot gnu.org
2014-09-24 5:14 ` agner at agner dot org
` (2 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: glisse at gcc dot gnu.org @ 2014-09-23 19:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253
--- Comment #10 from Marc Glisse <glisse at gcc dot gnu.org> ---
Two random links into the latest conversation:
https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01812.html
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02288.html
That version of the patch doesn't handle integer vectors IIRC, but it is the
principle that matters, if the first step is accepted I'll extend it.
I suppose I should ping it again soon...
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug target/56253] fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3)
2013-02-08 12:15 [Bug tree-optimization/56253] New: fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3) kretz at kde dot org
` (9 preceding siblings ...)
2014-09-23 19:22 ` glisse at gcc dot gnu.org
@ 2014-09-24 5:14 ` agner at agner dot org
2014-09-25 6:37 ` glisse at gcc dot gnu.org
2014-09-25 6:39 ` glisse at gcc dot gnu.org
12 siblings, 0 replies; 14+ messages in thread
From: agner at agner dot org @ 2014-09-24 5:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253
--- Comment #11 from Agner Fog <agner at agner dot org> ---
Thanks for the links Marc.
You are right, the discussion in the gcc-patches mailing list ignores integer
vectors. You need a solution that also allows optimizations on integer
intrinsic functions (perhaps cast the vector type?). I am not on any internal
mailing list, so please post it there for me.
The proposed solution of using vector extensions will not work on masked vector
intrinsics in AVX512, so it wouldn't enable e.g. constant propagation through a
masked intrinsic, but that is probably too much to ask for :) I will add a new
bug report for contraction of broadcast with AVX512.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug target/56253] fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3)
2013-02-08 12:15 [Bug tree-optimization/56253] New: fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3) kretz at kde dot org
` (10 preceding siblings ...)
2014-09-24 5:14 ` agner at agner dot org
@ 2014-09-25 6:37 ` glisse at gcc dot gnu.org
2014-09-25 6:39 ` glisse at gcc dot gnu.org
12 siblings, 0 replies; 14+ messages in thread
From: glisse at gcc dot gnu.org @ 2014-09-25 6:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253
--- Comment #14 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Agner Fog from comment #13)
> Is it possible to do the same optimization with boolean vector intrinsics,
> such as _mm_and_epi32 and _mm_or_ps to enable optimizations such as
> algebraic reduction and constant propagation?
Anything we already do with vector extensions should be easy, and that includes
constant propagation in & and |. The sightly harder part is transformations
that are only valid if v is a "bool vector" (like replacing v!=0 with just v),
i.e. each component is either 0 or -1. We can test constants, we know the
result of comparisons is boolean, we know &, | and ^ preserve that property,
but it isn't a purely local property so it requires a bit more work.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug target/56253] fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3)
2013-02-08 12:15 [Bug tree-optimization/56253] New: fp-contract does not work with SSE and AVX FMAs (neither FMA4 nor FMA3) kretz at kde dot org
` (11 preceding siblings ...)
2014-09-25 6:37 ` glisse at gcc dot gnu.org
@ 2014-09-25 6:39 ` glisse at gcc dot gnu.org
12 siblings, 0 replies; 14+ messages in thread
From: glisse at gcc dot gnu.org @ 2014-09-25 6:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253
--- Comment #15 from Marc Glisse <glisse at gcc dot gnu.org> ---
Oups, sorry, or_ps may be harder, because representing it with vector
extensions requires casts to integer vectors, which makes it much harder to
actually generate or_ps in the backend (there is at least one PR about that),
so we probably won't do it soon.
^ permalink raw reply [flat|nested] 14+ messages in thread