* [Bug tree-optimization/54855] Unnecessary duplication when performing scalar operation on vector element
2012-10-08 13:58 [Bug tree-optimization/54855] New: Unnecessary duplication when performing scalar operation on vector element drepper.fsp at gmail dot com
@ 2012-10-08 14:19 ` rguenth at gcc dot gnu.org
2012-10-12 13:41 ` glisse at gcc dot gnu.org
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-10-08 14:19 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855
Richard Guenther <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-10-08
     Ever Confirmed|0                           |1
--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-10-08 14:19:27 UTC ---
Confirmed. Does not work for + though, as -0.0 + 0.0 is 0.0. At least
if I remember the signed-zero mess correctly ;)
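The signed-zero remark can be checked with a small C sketch. It assumes IEEE 754 arithmetic in the default round-to-nearest mode; the function names are illustrative only and do not come from GCC. The volatile zero is there only to keep the compiler from folding the operation away.

```c
#include <math.h>
#include <stdbool.h>

/* True if a and b are equal and also agree on the sign bit, so that
   +0.0 and -0.0 are told apart (a plain == treats them as equal). */
static bool same_value_and_sign(double a, double b)
{
    return a == b && !signbit(a) == !signbit(b);
}

/* Does x + 0.0 give back exactly x?  Not for x = -0.0, since
   -0.0 + 0.0 is +0.0 in round-to-nearest. */
bool add_zero_preserves(double x)
{
    volatile double zero = 0.0;   /* volatile: prevent constant folding */
    return same_value_and_sign(x + zero, x);
}

/* Does x - 0.0 give back exactly x?  Yes, including for x = -0.0,
   since -0.0 - 0.0 is -0.0. */
bool sub_zero_preserves(double x)
{
    volatile double zero = 0.0;
    return same_value_and_sign(x - zero, x);
}
```

Under those assumptions, `add_zero_preserves(-0.0)` is false while `sub_zero_preserves(-0.0)` is true, which is why padding the untouched element with 0.0 is only safe for subtraction.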
On the tree level we see in-memory v because of the component modification:
_7 = BIT_FIELD_REF <v, 64, 0>;
_8 = _7 - 1.0e+0;
BIT_FIELD_REF <v, 64, 0> = _8;
v.0_10 = v;
v.1_11 = v.0_10 * { 2.0e+0, 2.0e+0 };
v = v.1_11;
so either lowering this differently in the first place or detecting
this kind of pattern would fix it.
A similar trick may be used for multiplication and division.
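For reference, a source loop of roughly the following shape would lower to the GIMPLE above. This is a reconstruction from the dump, not the reporter's actual test case; the function name `iterate` and the loop bound `n` are assumptions.

```c
/* GCC/Clang vector extension: two doubles in one 16-byte vector. */
typedef double vec __attribute__((vector_size(16)));

/* Hypothetical reconstruction: a scalar update of element 0 followed
   by a full-vector operation, repeated n times. */
vec iterate(vec v, int n)
{
    for (int i = 0; i < n; ++i) {
        v[0] -= 1.0;                /* BIT_FIELD_REF <v, 64, 0> update */
        v *= (vec){ 2.0, 2.0 };     /* v = v * { 2.0, 2.0 } */
    }
    return v;
}
```

Starting from { 3.0, 5.0 }, one iteration gives { 4.0, 10.0 }: only element 0 sees the subtraction, while both elements are doubled.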
From: glisse at gcc dot gnu.org @ 2012-10-12 13:41 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855
--- Comment #2 from Marc Glisse <glisse at gcc dot gnu.org> 2012-10-12 13:41:35 UTC ---
(In reply to comment #1)
> Does not work for + though, as -0.0 + 0.0 is 0.0.
[...]
> On the tree level we see in-memory v because of the component modification:
>
> _7 = BIT_FIELD_REF <v, 64, 0>;
> _8 = _7 - 1.0e+0;
> BIT_FIELD_REF <v, 64, 0> = _8;
> v.0_10 = v;
> v.1_11 = v.0_10 * { 2.0e+0, 2.0e+0 };
> v = v.1_11;
>
> so either lowering this differently in the first place or detecting
> this kind of pattern would fix it.
Do you mean that at the tree level v[0] -= 1.0 could be changed to
v -= {1., 0.}? That's not exactly what Ulrich was suggesting. It could be nice
too, but then we would need a different optimization in the back-end that
detects the special case of a vector subtraction where the second element of
one argument is 0, in order to produce the optimal code.
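A minimal sketch of the two source-level forms being compared, written with the GCC vector extension. The function and helper names are made up for illustration, and the equivalence is only claimed for IEEE arithmetic in the default rounding mode with non-NaN inputs.

```c
#include <math.h>
#include <stdbool.h>

typedef double vec __attribute__((vector_size(16)));

/* Element-wise form: v[0] -= c, element 1 untouched. */
vec sub_elementwise(vec v, double c)
{
    v[0] -= c;
    return v;
}

/* Vector form: v -= { c, 0. }.  Element 1 becomes v[1] - 0.0, which
   equals v[1] even when v[1] is -0.0. */
vec sub_vectorwise(vec v, double c)
{
    vec d = { c, 0.0 };
    return v - d;
}

/* Compare both lanes, including the sign of zero (helper for checking). */
bool lanes_identical(vec a, vec b)
{
    for (int i = 0; i < 2; ++i)
        if (a[i] != b[i] || !signbit(a[i]) != !signbit(b[i]))
            return false;
    return true;
}
```

For v = { 3.0, -0.0 } and c = 1.0 both functions return { 2.0, -0.0 }. The same rewrite with + in place of - would turn the second element into +0.0.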
In the x86 machine description, the scalar (sd) add/subtract instruction is
represented as:
  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
        (vec_merge:VF_128
          (plusminus:VF_128
            (match_operand:VF_128 1 "register_operand" "0,x")
            (match_operand:VF_128 2 "nonimmediate_operand" "xm,xm"))
          (match_dup 1)
          (const_int 1)))]
which is going to be hard to recognize from:
(insn 26 53 28 4 (set (reg:DF 81 [ D.2546 ])
        (vec_select:DF (reg/v:V2DF 73 [ v ])
            (parallel [
                    (const_int 0 [0])
                ]))) d.c:12 1408 {sse2_storelpd}
     (nil))
(insn 28 26 29 4 (set (reg:DF 82 [ D.2546 ])
        (minus:DF (reg:DF 81 [ D.2546 ])
            (reg:DF 84))) d.c:12 760 {*fop_df_1_sse}
     (expr_list:REG_DEAD (reg:DF 81 [ D.2546 ])
        (nil)))
(insn 29 28 30 4 (set (reg/v:V2DF 73 [ v ])
        (vec_concat:V2DF (reg:DF 82 [ D.2546 ])
            (vec_select:DF (reg/v:V2DF 73 [ v ])
                (parallel [
                        (const_int 1 [0x1])
                    ])))) d.c:12 1411 {sse2_loadlpd}
     (expr_list:REG_DEAD (reg:DF 82 [ D.2546 ])
        (nil)))
However, since that's only 3 insns, providing an additional define_insn for the
same instruction, but with a pattern built from vec_select and vec_concat,
might be enough for combine to match it.
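To make the relation between the two RTL shapes concrete, here is a plain-C model of the operators involved. It is only an illustration: the struct `v2df` and the helper functions are stand-ins written for this sketch, not GCC code. The model follows the documented RTL meaning of vec_merge, where bit i of the mask selects element i from the first operand and a clear bit selects it from the second.

```c
/* Plain-C stand-in for a V2DF value (illustrative, not a GCC type). */
typedef struct { double e[2]; } v2df;

/* vec_merge: bit i of mask set -> element i of a, otherwise of b. */
static v2df vec_merge(v2df a, v2df b, unsigned mask)
{
    v2df r;
    for (int i = 0; i < 2; ++i)
        r.e[i] = ((mask >> i) & 1u) ? a.e[i] : b.e[i];
    return r;
}

/* vec_select with a single index: extract one element. */
static double vec_select1(v2df v, int i)
{
    return v.e[i];
}

/* vec_concat of two scalars into a two-element vector. */
static v2df vec_concat(double lo, double hi)
{
    v2df r = { { lo, hi } };
    return r;
}

/* Shape of the existing pattern:
   (vec_merge (minus op1 op2) op1 (const_int 1)). */
v2df sub_via_merge(v2df op1, v2df op2)
{
    v2df diff = { { op1.e[0] - op2.e[0], op1.e[1] - op2.e[1] } };
    return vec_merge(diff, op1, 1u);
}

/* Shape of insns 26, 28 and 29: extract element 0, subtract a scalar,
   then rebuild the vector with the original element 1. */
v2df sub_via_select_concat(v2df v, double s)
{
    double lo = vec_select1(v, 0) - s;           /* insns 26 and 28 */
    return vec_concat(lo, vec_select1(v, 1));    /* insn 29 */
}
```

Both functions yield the same vector whenever element 0 of op2 equals s; element 1 of op2 is ignored by the merge, just as the upper half of the source operand is ignored by subsd.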
From: glisse at gcc dot gnu.org @ 2012-10-12 17:08 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855
--- Comment #3 from Marc Glisse <glisse at gcc dot gnu.org> 2012-10-12 17:08:20 UTC ---
The following patch gives this loop:
.L7:
subsd %xmm0, %xmm1
subl $1, %eax
addpd %xmm1, %xmm1
jne .L7
I guess I should add the same for mul and div at the same time, but I don't
know if it is the right approach.
--- config/i386/sse.md (revision 192405)
+++ config/i386/sse.md (working copy)
@@ -812,20 +812,38 @@
(const_int 1)))]
"TARGET_SSE"
"@
<plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %0|%0, %2}
v<plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
[(set_attr "isa" "noavx,avx")
(set_attr "type" "sseadd")
(set_attr "prefix" "orig,vex")
(set_attr "mode" "<ssescalarmode>")])
+(define_insn "*sse2_vm<plusminus_insn>v2df3"
+ [(set (match_operand:V2DF 0 "register_operand" "=x,x")
+ (vec_concat:V2DF
+ (plusminus:DF
+ (vec_select:DF
+ (match_operand:V2DF 1 "register_operand" "0,x")
+ (parallel [(const_int 0)]))
+ (match_operand:DF 2 "nonimmediate_operand" "xm,xm"))
+ (vec_select:DF (match_dup 1) (parallel [(const_int 1)]))))]
+ "TARGET_SSE2"
+ "@
+ <plusminus_mnemonic>sd\t{%2, %0|%0, %2}
+ v<plusminus_mnemonic>sd\t{%2, %1, %0|%0, %1, %2}"
+ [(set_attr "isa" "noavx,avx")
+ (set_attr "type" "sseadd")
+ (set_attr "prefix" "orig,vex")
+ (set_attr "mode" "DF")])
+
(define_expand "mul<mode>3"
[(set (match_operand:VF 0 "register_operand")
(mult:VF
(match_operand:VF 1 "nonimmediate_operand")
(match_operand:VF 2 "nonimmediate_operand")))]
"TARGET_SSE"
"ix86_fixup_binary_operands_no_copy (MULT, <MODE>mode, operands);")
(define_insn "*mul<mode>3"
[(set (match_operand:VF 0 "register_operand" "=x,x")
From: glisse at gcc dot gnu.org @ 2012-10-12 17:34 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855
--- Comment #4 from Marc Glisse <glisse at gcc dot gnu.org> 2012-10-12 17:33:53 UTC ---
Note that a V4SF version should be doable, since it is 3 insns there as well,
although the pattern is different:
(insn 34 61 36 4 (set (reg:SF 103 [ D.2551 ])
        (vec_select:SF (reg/v:V4SF 87 [ v ])
            (parallel [
                    (const_int 0 [0])
                ]))) d.c:13 1380 {*vec_extractv4sf_0}
     (nil))
(insn 36 34 37 4 (set (reg:SF 104 [ D.2551 ])
        (minus:SF (reg:SF 103 [ D.2551 ])
            (reg:SF 106))) d.c:13 759 {*fop_sf_1_sse}
     (expr_list:REG_DEAD (reg:SF 103 [ D.2551 ])
        (nil)))
(insn 37 36 38 4 (set (reg/v:V4SF 87 [ v ])
        (vec_merge:V4SF (vec_duplicate:V4SF (reg:SF 104 [ D.2551 ]))
            (reg/v:V4SF 87 [ v ])
            (const_int 1 [0x1]))) d.c:13 1377 {vec_setv4sf_0}
     (expr_list:REG_DEAD (reg:SF 104 [ D.2551 ])
        (nil)))
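A single-precision source analogue that would plausibly produce RTL of this shape is sketched below. The actual contents of d.c are not shown in this thread, so the type and function names are assumptions.

```c
/* GCC/Clang vector extension: four floats in one 16-byte vector. */
typedef float vec4 __attribute__((vector_size(16)));

/* Hypothetical V4SF counterpart: update element 0 only; elements
   1 to 3 must come back unchanged. */
vec4 sub_lane0_v4sf(vec4 x, float a)
{
    x[0] -= a;
    return x;
}
```

With x = { 1, 2, 3, 4 } and a = 0.5f the result is { 0.5, 2, 3, 4 }, which is the effect a single subss on the low element would have.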
From: glisse at gcc dot gnu.org @ 2012-10-12 18:09 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855
--- Comment #5 from Marc Glisse <glisse at gcc dot gnu.org> 2012-10-12 18:08:45 UTC ---
Doing the optimization that late is a bit fragile though. For instance:
v[0] += 3.0;
v[0] -= 1.0;
is back to decomposing the vector, doing the operations and reconstructing it
(I didn't use -fassociative-math so it couldn't turn that into += 2.0). But
doing more looks too complicated.
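The reason the two updates cannot be merged into += 2.0 without -fassociative-math is that floating-point addition is not associative. A small sketch, assuming IEEE double precision with round-to-nearest-even; the function names are illustrative.

```c
/* x + 3.0 followed by - 1.0, rounding to double after each step.
   The volatile temporaries force each intermediate result to be
   stored as a double, so excess precision cannot hide the effect. */
double two_steps(double x)
{
    volatile double t = x + 3.0;
    volatile double r = t - 1.0;
    return r;
}

/* The reassociated form, x + 2.0, with a single rounding. */
double one_step(double x)
{
    return x + 2.0;
}
```

For x = 0x1p53 (2^53) only even integers are representable nearby: `two_steps` rounds up at both steps and returns 2^53 + 4, while `one_step` returns 2^53 + 2 exactly. For small values such as x = 1.0 the two agree.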
From: glisse at gcc dot gnu.org @ 2012-10-20 17:44 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855
--- Comment #6 from Marc Glisse <glisse at gcc dot gnu.org> 2012-10-20 17:43:44 UTC ---
Uros' reply at http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01327.html copied
here for convenience:
"But, we _do_ have vec_merge pattern that describes the operation.
Adding another one to each operation just to satisfy combine is IMO
not correct approach. I'd rather see generic RTX simplification that
simplifies your proposed pattern to vec_merge pattern. Also, as you
mention in PR54855, Comment #5, the approach is too fragile..."
From: glisse at gcc dot gnu.org @ 2012-11-30 1:31 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855
--- Comment #7 from Marc Glisse <glisse at gcc dot gnu.org> 2012-11-30 01:31:25 UTC ---
Created attachment 28832
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28832
simplify-rtx patch
With the patch, the code at the end of this comment takes a single instruction.
However, if I replace 'a' with 1., it doesn't work (paradoxical subreg of
mem...). I also see some weird things in the combine dump, like:
Failed to match this instruction:
(set (reg:V2DF 66 [ D.2211 ])
    (zero_extend:V2DF (mem/u/c:DF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [2 S8 A64])))
The test case:
typedef double vec __attribute__((vector_size(16)));
vec f(vec x, double a)
{
  x[0] -= a;
  return x;
}
From: glisse at gcc dot gnu.org @ 2013-09-06 8:21 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855
--- Comment #8 from Marc Glisse <glisse at gcc dot gnu.org> ---
Just adding a link to the latest message of the conversation, for future
reference:
http://gcc.gnu.org/ml/gcc-patches/2012-12/msg00824.html