[Bug tree-optimization/54855] New: Unnecessary duplication when performing scalar operation on vector element

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug tree-optimization/54855] New: Unnecessary duplication when performing scalar operation on vector element
@ 2012-10-08 13:58 drepper.fsp at gmail dot com
  2012-10-08 14:19 ` [Bug tree-optimization/54855] " rguenth at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: drepper.fsp at gmail dot com @ 2012-10-08 13:58 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855

             Bug #: 54855
           Summary: Unnecessary duplication when performing scalar
                    operation on vector element
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: drepper.fsp@gmail.com


Take the following code:


#include <stdio.h>

typedef double v2df __attribute__((vector_size(16)));

int
main(int argc, char *argv[])
{
  v2df v = { 2.0, 2.0 };
  v2df v2 = { 2.0, 2.0 };
  while (argc-- > 1)
    {
      v[0] -= 1.0;
      v *= v2;
    }
  printf("%g\n", v[0] + v[1]);
  return 0;
}

It compiles as C and C++, both compilers behave the same.

When compiling on x86-64 (therefore with SSE enabled) it generates for the loop
this code:


  4003f0:       66 0f 28 c1             movapd %xmm1,%xmm0
  4003f4:       83 e8 01                sub    $0x1,%eax
  4003f7:       f2 0f 5c c2             subsd  %xmm2,%xmm0
  4003fb:       f2 0f 10 c8             movsd  %xmm0,%xmm1
  4003ff:       66 0f 58 c9             addpd  %xmm1,%xmm1
  400403:       75 eb                   jne    4003f0 <main+0x20>


I.e., the value is pulled out of the vector, the subtraction is performed, and
then the scalar value is put back into the vector.

Instead the following sequence would have been completely sufficient:

sub    $0x1,%eax
subsd  %xmm2,%xmm1
addpd  %xmm1,%xmm1
jne    ...back

The subsd instruction doesn't touch the high parts of the register.


I know this is a special case, it only works if the scalar operation is for the
element zero of the vector.  But code can be designed like that.  I have some
code which would work nicely like this.  I don't know whether this translates
to other architectures as well.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/54855] Unnecessary duplication when performing scalar operation on vector element
  2012-10-08 13:58 [Bug tree-optimization/54855] New: Unnecessary duplication when performing scalar operation on vector element drepper.fsp at gmail dot com
@ 2012-10-08 14:19 ` rguenth at gcc dot gnu.org
  2012-10-12 13:41 ` glisse at gcc dot gnu.org
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-10-08 14:19 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-10-08
     Ever Confirmed|0                           |1

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-10-08 14:19:27 UTC ---
Confirmed.  Does not work for + though, as -0.0 + 0.0 is 0.0.  At least
if I remember the signed-zero mess correctly ;)

On the tree level we see in-memory v because of the component modification:

  _7 = BIT_FIELD_REF <v, 64, 0>;
  _8 = _7 - 1.0e+0;
  BIT_FIELD_REF <v, 64, 0> = _8;
  v.0_10 = v;
  v.1_11 = v.0_10 * { 2.0e+0, 2.0e+0 };
  v = v.1_11;

so either lowering this differently in the first place or detecting
this kind of pattern would fix it.

Similar trick may be used for multiplication and division.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/54855] Unnecessary duplication when performing scalar operation on vector element
  2012-10-08 13:58 [Bug tree-optimization/54855] New: Unnecessary duplication when performing scalar operation on vector element drepper.fsp at gmail dot com
  2012-10-08 14:19 ` [Bug tree-optimization/54855] " rguenth at gcc dot gnu.org
@ 2012-10-12 13:41 ` glisse at gcc dot gnu.org
  2012-10-12 17:08 ` glisse at gcc dot gnu.org
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-10-12 13:41 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855

--- Comment #2 from Marc Glisse <glisse at gcc dot gnu.org> 2012-10-12 13:41:35 UTC ---
(In reply to comment #1)
> Does not work for + though, as -0.0 + 0.0 is 0.0.
[...]
> On the tree level we see in-memory v because of the component modification:
> 
>   _7 = BIT_FIELD_REF <v, 64, 0>;
>   _8 = _7 - 1.0e+0;
>   BIT_FIELD_REF <v, 64, 0> = _8;
>   v.0_10 = v;
>   v.1_11 = v.0_10 * { 2.0e+0, 2.0e+0 };
>   v = v.1_11;
> 
> so either lowering this differently in the first place or detecting
> this kind of pattern would fix it.

Do you mean that at the tree level v[0] -= 1.0 could be changed to v -= {1.,
0.} ? That's not exactly what Ulrich was suggesting. It could be nice too, but
then we would need a different optimization in the back-end that detects the
special case of a vector subtraction where the second part of one argument is
0, in order to produce the optimal code.

In the x86 md, the sd instruction is represented as:
  [(set (match_operand:VF_128 0 "register_operand" "=x,x")
        (vec_merge:VF_128
          (plusminus:VF_128
            (match_operand:VF_128 1 "register_operand" "0,x")
            (match_operand:VF_128 2 "nonimmediate_operand" "xm,xm"))
          (match_dup 1)
          (const_int 1)))]

which is going to be hard to recognize from:
(insn 26 53 28 4 (set (reg:DF 81 [ D.2546 ])
        (vec_select:DF (reg/v:V2DF 73 [ v ])
            (parallel [
                    (const_int 0 [0])
                ]))) d.c:12 1408 {sse2_storelpd}
     (nil))
(insn 28 26 29 4 (set (reg:DF 82 [ D.2546 ])
        (minus:DF (reg:DF 81 [ D.2546 ])
            (reg:DF 84))) d.c:12 760 {*fop_df_1_sse}
     (expr_list:REG_DEAD (reg:DF 81 [ D.2546 ])
        (nil)))
(insn 29 28 30 4 (set (reg/v:V2DF 73 [ v ])
        (vec_concat:V2DF (reg:DF 82 [ D.2546 ])
            (vec_select:DF (reg/v:V2DF 73 [ v ])
                (parallel [
                        (const_int 1 [0x1])
                    ])))) d.c:12 1411 {sse2_loadlpd}
     (expr_list:REG_DEAD (reg:DF 82 [ D.2546 ])
        (nil)))

However, since that's only 3 insn, providing an additional define_insn for the
same instruction but with a pattern of vec_select and vec_concat might be
enough for combine.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/54855] Unnecessary duplication when performing scalar operation on vector element
  2012-10-08 13:58 [Bug tree-optimization/54855] New: Unnecessary duplication when performing scalar operation on vector element drepper.fsp at gmail dot com
  2012-10-08 14:19 ` [Bug tree-optimization/54855] " rguenth at gcc dot gnu.org
  2012-10-12 13:41 ` glisse at gcc dot gnu.org
@ 2012-10-12 17:08 ` glisse at gcc dot gnu.org
  2012-10-12 17:34 ` glisse at gcc dot gnu.org
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-10-12 17:08 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855

--- Comment #3 from Marc Glisse <glisse at gcc dot gnu.org> 2012-10-12 17:08:20 UTC ---
The following patch gives this loop:
.L7:
    subsd    %xmm0, %xmm1
    subl    $1, %eax
    addpd    %xmm1, %xmm1
    jne    .L7

I guess I should add the same for mul and div at the same time, but I don't
know if it is the right approach.

--- config/i386/sse.md    (revision 192405)
+++ config/i386/sse.md    (working copy)
@@ -812,20 +812,38 @@
       (const_int 1)))]
   "TARGET_SSE"
   "@
    <plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %0|%0, %2}
    v<plusminus_mnemonic><ssescalarmodesuffix>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
    (set_attr "type" "sseadd")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "<ssescalarmode>")])

+(define_insn "*sse2_vm<plusminus_insn>v2df3"
+  [(set (match_operand:V2DF 0 "register_operand" "=x,x")
+    (vec_concat:V2DF
+      (plusminus:DF
+        (vec_select:DF 
+          (match_operand:V2DF 1 "register_operand" "0,x")
+          (parallel [(const_int 0)]))
+        (match_operand:DF 2 "nonimmediate_operand" "xm,xm"))
+      (vec_select:DF (match_dup 1) (parallel [(const_int 1)]))))]
+  "TARGET_SSE2"
+  "@
+   <plusminus_mnemonic>sd\t{%2, %0|%0, %2}
+   v<plusminus_mnemonic>sd\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sseadd")
+   (set_attr "prefix" "orig,vex")
+   (set_attr "mode" "DF")])
+
 (define_expand "mul<mode>3"
   [(set (match_operand:VF 0 "register_operand")
     (mult:VF
       (match_operand:VF 1 "nonimmediate_operand")
       (match_operand:VF 2 "nonimmediate_operand")))]
   "TARGET_SSE"
   "ix86_fixup_binary_operands_no_copy (MULT, <MODE>mode, operands);")

 (define_insn "*mul<mode>3"
   [(set (match_operand:VF 0 "register_operand" "=x,x")


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/54855] Unnecessary duplication when performing scalar operation on vector element
  2012-10-08 13:58 [Bug tree-optimization/54855] New: Unnecessary duplication when performing scalar operation on vector element drepper.fsp at gmail dot com
                   ` (2 preceding siblings ...)
  2012-10-12 17:08 ` glisse at gcc dot gnu.org
@ 2012-10-12 17:34 ` glisse at gcc dot gnu.org
  2012-10-12 18:09 ` glisse at gcc dot gnu.org
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-10-12 17:34 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855

--- Comment #4 from Marc Glisse <glisse at gcc dot gnu.org> 2012-10-12 17:33:53 UTC ---
Note that a V4SF version should be doable, since it is 3 insn there as well,
although the pattern is different.

(insn 34 61 36 4 (set (reg:SF 103 [ D.2551 ])
        (vec_select:SF (reg/v:V4SF 87 [ v ])
            (parallel [
                    (const_int 0 [0])
                ]))) d.c:13 1380 {*vec_extractv4sf_0}
     (nil))
(insn 36 34 37 4 (set (reg:SF 104 [ D.2551 ])
        (minus:SF (reg:SF 103 [ D.2551 ])
            (reg:SF 106))) d.c:13 759 {*fop_sf_1_sse}
     (expr_list:REG_DEAD (reg:SF 103 [ D.2551 ])
        (nil)))
(insn 37 36 38 4 (set (reg/v:V4SF 87 [ v ])
        (vec_merge:V4SF (vec_duplicate:V4SF (reg:SF 104 [ D.2551 ]))
            (reg/v:V4SF 87 [ v ])
            (const_int 1 [0x1]))) d.c:13 1377 {vec_setv4sf_0}
     (expr_list:REG_DEAD (reg:SF 104 [ D.2551 ])
        (nil)))


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/54855] Unnecessary duplication when performing scalar operation on vector element
  2012-10-08 13:58 [Bug tree-optimization/54855] New: Unnecessary duplication when performing scalar operation on vector element drepper.fsp at gmail dot com
                   ` (3 preceding siblings ...)
  2012-10-12 17:34 ` glisse at gcc dot gnu.org
@ 2012-10-12 18:09 ` glisse at gcc dot gnu.org
  2012-10-20 17:44 ` glisse at gcc dot gnu.org
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-10-12 18:09 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855

--- Comment #5 from Marc Glisse <glisse at gcc dot gnu.org> 2012-10-12 18:08:45 UTC ---
Doing the optimization that late is a bit fragile though. For instance:
    v[0] += 3.0;
    v[0] -= 1.0;
is back to decomposing the vector, doing the operations and reconstructing it
(I didn't use -fassociative-math so it couldn't turn that into += 2.0). But
doing more looks too complicated.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/54855] Unnecessary duplication when performing scalar operation on vector element
  2012-10-08 13:58 [Bug tree-optimization/54855] New: Unnecessary duplication when performing scalar operation on vector element drepper.fsp at gmail dot com
                   ` (4 preceding siblings ...)
  2012-10-12 18:09 ` glisse at gcc dot gnu.org
@ 2012-10-20 17:44 ` glisse at gcc dot gnu.org
  2012-11-30  1:31 ` glisse at gcc dot gnu.org
  2013-09-06  8:21 ` glisse at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-10-20 17:44 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855

--- Comment #6 from Marc Glisse <glisse at gcc dot gnu.org> 2012-10-20 17:43:44 UTC ---
Uros' reply at http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01327.html copied
here for convenience:
"But, we _do_ have vec_merge pattern that describes the operation.
Adding another one to each operation just to satisfy combine is IMO
not correct approach. I'd rather see generic RTX simplification that
simplifies your proposed pattern to vec_merge pattern. Also, as you
mention in PR54855, Comment #5, the approach is too fragile..."


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/54855] Unnecessary duplication when performing scalar operation on vector element
  2012-10-08 13:58 [Bug tree-optimization/54855] New: Unnecessary duplication when performing scalar operation on vector element drepper.fsp at gmail dot com
                   ` (5 preceding siblings ...)
  2012-10-20 17:44 ` glisse at gcc dot gnu.org
@ 2012-11-30  1:31 ` glisse at gcc dot gnu.org
  2013-09-06  8:21 ` glisse at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-11-30  1:31 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855

--- Comment #7 from Marc Glisse <glisse at gcc dot gnu.org> 2012-11-30 01:31:25 UTC ---
Created attachment 28832
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28832
simplify-rtx patch

With the patch, this code takes a single instruction. However, if I replace 'a'
with 1., it doesn't work (paradoxical subreg of mem...). I also see in the
combine dump some weird things like:
Failed to match this instruction:
(set (reg:V2DF 66 [ D.2211 ])
    (zero_extend:V2DF (mem/u/c:DF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [2 S8
A64])))



typedef double vec __attribute__((vector_size(16)));
vec f(vec x,double a)
{
  x[0] -= a;
  return x;
}


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug tree-optimization/54855] Unnecessary duplication when performing scalar operation on vector element
  2012-10-08 13:58 [Bug tree-optimization/54855] New: Unnecessary duplication when performing scalar operation on vector element drepper.fsp at gmail dot com
                   ` (6 preceding siblings ...)
  2012-11-30  1:31 ` glisse at gcc dot gnu.org
@ 2013-09-06  8:21 ` glisse at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-09-06  8:21 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54855

--- Comment #8 from Marc Glisse <glisse at gcc dot gnu.org> ---
Just adding a link to the latest message of the conversation, for future
reference:
http://gcc.gnu.org/ml/gcc-patches/2012-12/msg00824.html


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2013-09-06  8:21 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-08 13:58 [Bug tree-optimization/54855] New: Unnecessary duplication when performing scalar operation on vector element drepper.fsp at gmail dot com
2012-10-08 14:19 ` [Bug tree-optimization/54855] " rguenth at gcc dot gnu.org
2012-10-12 13:41 ` glisse at gcc dot gnu.org
2012-10-12 17:08 ` glisse at gcc dot gnu.org
2012-10-12 17:34 ` glisse at gcc dot gnu.org
2012-10-12 18:09 ` glisse at gcc dot gnu.org
2012-10-20 17:44 ` glisse at gcc dot gnu.org
2012-11-30  1:31 ` glisse at gcc dot gnu.org
2013-09-06  8:21 ` glisse at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).